Improved Infilling of Missing Metadata from Expendable Bathythermographs (XBTs) Using Multiple Machine Learning Methods

Historical in situ ocean temperature profile measurements are important for a wide range of ocean and climate research activities. A large proportion of the profile observations have been recorded using expendable bathythermographs (XBTs), which require bias corrections for use in climate change studies. It is generally accepted that the bias, and therefore the bias correction, depends on the type of XBT used. However, poor historical metadata collection practices mean that the XBT probe type information is often missing (for 59% of profiles between 1967 and 2000), limiting the development of reliable bias corrections. We develop a process for systematically estimating missing instrument type metadata (the combination of model and manufacturer), constructing a machine learning pipeline based on thorough data exploration to inform these choices. The predicted instrument type, where missing, will facilitate improved XBT bias corrections. The new approach improves the accuracy of XBT type classification compared to previous approaches, raising the recall value from 0.75 to 0.94. We also develop an approach to account for the uncertainty associated with metadata assignments using ensembles of decision trees, which could feed into an ensemble approach to creating ocean temperature datasets. We describe the challenges that the nature of the dataset poses for applying standard machine learning techniques to the problem. We have implemented this in a portable, reproducible way using standard data science tools, with a view to these techniques being applied to other similar problems in climate science.
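As a rough illustration of two ideas mentioned in the abstract, the sketch below shows how the votes of an ensemble of classifiers might be turned into a probability-like distribution over instrument types, and how per-class recall is computed. It is not the authors' pipeline: the function names `vote_fractions` and `recall_per_class`, the probe labels, and all example values are hypothetical and chosen only for illustration.

```python
from collections import Counter


def vote_fractions(votes):
    """Convert per-tree votes for one profile into the fraction of votes per label."""
    counts = Counter(votes)
    total = len(votes)
    return {label: n / total for label, n in counts.items()}


def recall_per_class(y_true, y_pred):
    """Recall per class: correctly predicted members divided by all true members."""
    totals = Counter(y_true)
    hits = Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            hits[truth] += 1
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical votes from a five-member ensemble for one profile whose
# instrument type (model plus manufacturer) is missing.
votes = [
    "T4 (Sippican)",
    "T4 (Sippican)",
    "T7 (Sippican)",
    "T4 (Sippican)",
    "T4 (TSK)",
]
fractions = vote_fractions(votes)

# Hypothetical labelled validation profiles and the corresponding predictions.
y_true = ["T4", "T4", "T7", "T7", "T7", "T5"]
y_pred = ["T4", "T7", "T7", "T7", "T4", "T5"]
recalls = recall_per_class(y_true, y_pred)

print(fractions)
print(recalls)
```

Under these assumptions, the vote fractions could be sampled to assign different plausible instrument types across the members of an ensemble of temperature datasets, instead of committing to a single best guess.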
