Machine-Learning-Based Comparative Study to Detect Suspect Temperature Gradient Error in Ocean Data.

Chouai, Mohamed¹, Reimers, F.¹, Vredenborg, M.¹, Pinkernell, S.¹, Mieruch-Schnülle, S.¹

Alfred-Wegener-Institut - Helmholtz-Zentrum für Polar- und Meeresforschung

Thousands of ocean temperature and salinity measurements are collected every day around the world. Controlling the quality of these data is labor-intensive, because automatic control procedures still produce many false alarms that only a human expert can identify. Quality control (QC) procedures have not yet benefited from recent advances in machine learning methods that predict simple targets from complex multi-dimensional features. As data volumes grow, algorithmic help is urgently needed, and artificial intelligence (AI) could play a dominant role. Data mining and machine learning have the potential to transform automatic oceanographic data quality control: supervised learning provides a convenient framework for improving automatic QC by reducing its discrepancy with the human expert's evaluation.

This work presents a comparative analysis of machine learning classification algorithms for ocean data quality control, with the aim of detecting the suspect temperature gradient error. The objective is to obtain an effective QC classification method for ocean data using a representative set of supervised machine learning algorithms. The work presented here is the second step of our overall system: the first step uses a deep convolutional neural network to classify profiles as good or bad, and the second step locates the bad samples within a profile. For this reason, the dataset used to train the benchmarked models is composed only of bad profiles.

The following algorithms are compared in this study: Multilayer Perceptron (MLP), Support Vector Machine (SVM) with different kernels, Random Forest (RF), Naive Bayes (NB), K-Nearest Neighbors (KNN), and Linear Discriminant Analysis (LDA). The hyperparameters of each algorithm are optimized with grid search to obtain the best classification results.
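The grid-search comparison described above could look roughly like the following sketch. It assumes scikit-learn, uses a synthetic dataset in place of the ocean data, and covers only two of the listed algorithms. The parameter grids, the feature count, and the choice of balanced accuracy as the selection metric are illustrative assumptions, not the settings used in the study.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for per-sample features (NOT the UDASH data).
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical candidate models and grids; the study's grids are not given.
candidates = {
    "svm": (SVC(), {"clf__kernel": ["linear", "rbf"], "clf__C": [0.1, 1.0, 10.0]}),
    "knn": (KNeighborsClassifier(), {"clf__n_neighbors": [3, 5, 9]}),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Scale features inside the pipeline so each CV fold is scaled independently.
    pipe = Pipeline([("scale", StandardScaler()), ("clf", estimator)])
    search = GridSearchCV(pipe, grid, cv=3, scoring="balanced_accuracy")
    search.fit(X_train, y_train)
    # Store the best grid point and the held-out score of the refitted model.
    results[name] = (search.best_params_, search.score(X_test, y_test))

for name, (params, score) in results.items():
    print(name, params, round(score, 3))
```

Keeping the scaler inside the pipeline avoids leaking test-fold statistics into the cross-validation, which matters for distance-based models such as SVM and KNN.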

The results obtained on the Unified Database for the Arctic and Subarctic Hydrography (UDASH) dataset are promising, especially for the MLP algorithm, which achieved an accuracy of 86.64% in detecting good samples and 88.84% in detecting bad samples, although room for improvement remains. This system has the potential to be used as a semi-automatic quality control system.
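Reporting separate accuracies for good and bad samples amounts to a per-class figure. The sketch below shows one common way to compute it, under the assumption that each figure is the fraction of samples of that true class that were predicted correctly. The label encoding (0 = good, 1 = bad) and the toy labels are hypothetical and do not come from the study.

```python
def per_class_accuracy(y_true, y_pred, labels=(0, 1)):
    """Return {label: fraction of samples with that true label predicted correctly}."""
    rates = {}
    for label in labels:
        # Indices of all samples whose true class is `label`.
        idx = [i for i, t in enumerate(y_true) if t == label]
        if not idx:
            rates[label] = float("nan")  # class absent from y_true
            continue
        correct = sum(1 for i in idx if y_pred[i] == label)
        rates[label] = correct / len(idx)
    return rates


# Toy labels with a hypothetical encoding: 0 = good sample, 1 = bad sample.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 1, 1, 0, 0, 1, 0]

rates = per_class_accuracy(y_true, y_pred)
print(rates)
```

Reporting both rates side by side is useful for imbalanced QC data, where a single overall accuracy can hide poor detection of the rarer class.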

Data Science Symposium No. 7

Posters and Live Demos
