ResearchSpace

Empirical methods for the estimation of Southern Ocean CO2: support vector and random forest regression

dc.contributor.author Gregor, Luke
dc.contributor.author Kok, S
dc.contributor.author Monteiro, Pedro MS
dc.date.accessioned 2018-05-11T11:47:33Z
dc.date.available 2018-05-11T11:47:33Z
dc.date.issued 2017-12
dc.identifier.citation Gregor, L., Kok, S. and Monteiro, P.M.S. 2017. Empirical methods for the estimation of Southern Ocean CO2: support vector and random forest regression. Biogeosciences, vol. 14: 5551-5569 en_US
dc.identifier.issn 1726-4170
dc.identifier.issn 1726-4189
dc.identifier.uri https://doi.org/10.5194/bg-14-5551-2017
dc.identifier.uri https://www.biogeosciences.net/14/5551/2017/
dc.identifier.uri http://hdl.handle.net/10204/10192
dc.description © Author(s) 2017. This work is distributed under the Creative Commons Attribution 3.0 License. en_US
dc.description.abstract The Southern Ocean accounts for 40 % of oceanic CO2 uptake, but the estimates are bound by large uncertainties due to a paucity of observations. Gap-filling empirical methods have been used to good effect to approximate pCO2 from satellite-observable variables in other parts of the ocean, but many of these methods do not agree in the Southern Ocean. In this study we propose two additional methods that perform well in the Southern Ocean: support vector regression (SVR) and random forest regression (RFR). The methods are used to estimate pCO2 in the Southern Ocean based on SOCAT v3, achieving similar trends to the SOM-FFN method by Landschützer et al. (2014). Results show that the SOM-FFN and RFR approaches have RMSEs of similar magnitude (14.84 and 16.45 µatm, respectively, where 1 atm = 101 325 Pa), whereas the SVR method has a larger RMSE (24.40 µatm). However, the larger errors for SVR and RFR are, in part, due to an increase in coastal observations from SOCAT v2 to v3; the SOM-FFN method used v2 data. The success of both SOM-FFN and RFR depends on the ability to adapt to different modes of variability. The SOM-FFN achieves this by having independent regression models for each cluster, while this flexibility is intrinsic to the RFR method. Analyses of the estimates show that the SVR's sensitivity and the RFR's robustness to outliers shape the outcome significantly. Further analyses of the methods were performed using a synthetic dataset to address three questions: Which method (RFR or SVR) performs best? What is the effect on the pCO2 estimates of using time, latitude and longitude as proxy variables? What is the impact of the sampling bias in the SOCAT v3 dataset on the estimates? We find that while RFR is indeed better than SVR, the ensemble of the two methods outperforms either one, due to the complementary strengths and weaknesses of the methods. Results also show that for the RFR and SVR implementations, it is better to include coordinates as proxy variables, as RMSE scores are lowered and the phasing of the seasonal cycle is more accurate. Lastly, we show that there is only a weak bias due to undersampling. The synthetic data provide a useful framework to test methods in regions of sparse data coverage and show potential as a useful tool to evaluate methods in future studies. en_US
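The abstract compares methods by RMSE and reports that an ensemble of two methods can outperform either member when their errors are complementary. The sketch below illustrates that arithmetic only. It is not the authors' implementation: the abstract does not say how the ensemble is formed, so a simple pointwise mean is assumed here, the sine/cosine encoding of time and longitude is one common way to use coordinates as proxy variables and is likewise an assumption, and all numbers and names (model_a, model_b, cyclic_features) are hypothetical.

```python
import math


def rmse(estimates, observations):
    """Root-mean-square error between two equal-length sequences."""
    if len(estimates) != len(observations):
        raise ValueError("sequences must have the same length")
    squared = [(e - o) ** 2 for e, o in zip(estimates, observations)]
    return math.sqrt(sum(squared) / len(squared))


def ensemble_mean(*members):
    """Pointwise mean of several estimate sequences (assumed ensemble rule)."""
    return [sum(values) / len(values) for values in zip(*members)]


def cyclic_features(day_of_year, longitude_deg):
    """Encode periodic coordinates as sine/cosine pairs (assumed encoding).

    The encoding keeps 31 December close to 1 January and
    longitude -180 identical to longitude +180.
    """
    t = 2 * math.pi * day_of_year / 365.0
    lon = math.radians(longitude_deg)
    return (math.sin(t), math.cos(t), math.sin(lon), math.cos(lon))


# Hypothetical toy values in µatm; they are not taken from the study.
observed = [350.0, 360.0, 370.0, 380.0]
model_a = [346.0, 366.0, 366.0, 386.0]  # errors: -4, +6, -4, +6
model_b = [356.0, 356.0, 376.0, 376.0]  # errors: +6, -4, +6, -4
combined = ensemble_mean(model_a, model_b)

print("RMSE model_a :", rmse(model_a, observed))
print("RMSE model_b :", rmse(model_b, observed))
print("RMSE ensemble:", rmse(combined, observed))
```

In this toy case the two members err in opposite directions at every point, so averaging cancels most of the error. With real estimates the gain depends on how weakly correlated the members' errors are.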
dc.language.iso en en_US
dc.publisher Copernicus Gesellschaft MBH en_US
dc.relation.ispartofseries Worklist;20237
dc.subject CO2 en_US
dc.subject Random forest regression en_US
dc.subject Support vector regression en_US
dc.subject Southern Ocean en_US
dc.title Empirical methods for the estimation of Southern Ocean CO2: support vector and random forest regression en_US
dc.type Article en_US
dc.identifier.apacitation Gregor, L., Kok, S., & Monteiro, P. M. (2017). Empirical methods for the estimation of Southern Ocean CO2: support vector and random forest regression. http://hdl.handle.net/10204/10192 en_ZA
dc.identifier.chicagocitation Gregor, Luke, S Kok, and Pedro MS Monteiro. "Empirical methods for the estimation of Southern Ocean CO2: support vector and random forest regression." (2017). http://hdl.handle.net/10204/10192 en_ZA
dc.identifier.vancouvercitation Gregor L, Kok S, Monteiro PM. Empirical methods for the estimation of Southern Ocean CO2: support vector and random forest regression. 2017; http://hdl.handle.net/10204/10192. en_ZA
dc.identifier.ris TY - Article AU - Gregor, Luke AU - Kok, S AU - Monteiro, Pedro MS DA - 2017-12 DB - ResearchSpace DP - CSIR KW - CO2 KW - Random forest regression KW - Support vector regression KW - Southern Ocean LK - https://researchspace.csir.co.za PY - 2017 SM - 1726-4170 SM - 1726-4189 T1 - Empirical methods for the estimation of Southern Ocean CO2: support vector and random forest regression TI - Empirical methods for the estimation of Southern Ocean CO2: support vector and random forest regression UR - http://hdl.handle.net/10204/10192 ER - en_ZA

