
Community members

Data conversion for the MOSAiC webODV

Freier, Julia¹, Mieruch-Schnülle, S.¹, Schlitzer, R.¹
  ¹ Alfred-Wegener-Institut - Helmholtz-Zentrum für Polar- und Meeresforschung

This contribution introduces the data management process in the MOSAiC - Virtual Research Environment (M-VRE) project. The M-VRE project aims to make the unique and interdisciplinary MOSAiC data set easily accessible to scientists from different research areas [ADD22]. In addition, it provides a virtual environment in which the data can be analyzed and visualized directly online. This supports research by improving transparency, traceability, reproducibility and visibility.
One tool incorporated in M-VRE is webODV, the online version of Ocean Data View (ODV) [Sch22]. ODV has been used for the visualization of oceanographic data for almost 30 years. Given its software structure, it is equally suitable for data from the atmosphere, land and ice. However, ODV imposes requirements on the format of a data set, which is why the data have to be converted.
The workflow that takes the data from the archive to webODV is described in the following.
First, the data source had to be defined. As part of the MOSAiC project, it was agreed through the MOSAiC Data Policy that the data are uploaded to the long-term archive PANGAEA [Imm+19]. For this reason, PANGAEA is used as the data source for the webODV implementation in M-VRE.
Second, the MOSAiC data are queried and downloaded automatically. The search for entries with the tag "mosaic20192020" is automated, and the PANGAEA Request Results service [PAN22b] is used to access the metadata.
The third step is the conversion of the data format. It is based on pangaea2odv, a Python script written by R. Koppe (AWI Bremerhaven) that converts the PANGAEA .tab format to an ASCII format readable by ODV. The target format is a .txt file consisting of a header and data in tab-separated columns. The following meta variables are defined: Basis, Cruise, Event, Station, Project, URL, RIS and BibTeX citation, Version, Last modified, Scientists, main scientist, Contact, Method, Bot. Depth [m], Original file URL, Longitude and Latitude. The data variables include all data variables defined in PANGAEA. The primary variable is selected depending on the data types of the collection.
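The conversion step can be illustrated with a minimal sketch. It is not the pangaea2odv script. It assumes that a PANGAEA .tab file consists of a comment block enclosed in /* and */ followed by a tab-separated table with one header row, and it writes the meta variables as constant leading columns. The function names, the sample text and the metadata values are hypothetical, and the exact column labels that ODV expects are not reproduced here.

```python
def split_pangaea_tab(text):
    """Split a .tab text into (comment block, list of table rows).

    Assumption: the metadata sits in one leading /* ... */ block and the
    rest of the file is a tab-separated table with a header row.
    """
    header, body = "", text
    if text.lstrip().startswith("/*"):
        start = text.index("/*")
        end = text.index("*/", start)
        header = text[start + 2:end].strip()
        body = text[end + 2:]
    rows = [line.split("\t") for line in body.strip("\r\n").splitlines()]
    return header, rows


def to_spreadsheet(rows, meta):
    """Prepend the meta variables as constant columns to every data row."""
    labels, data = rows[0], rows[1:]
    meta_labels = list(meta)
    lines = ["\t".join(meta_labels + labels)]
    for row in data:
        lines.append("\t".join([meta[k] for k in meta_labels] + row))
    return "\n".join(lines) + "\n"


# Invented sample input, for illustration only.
sample = (
    "/* DATA DESCRIPTION:\nEvent(s):\tEXAMPLE_EVENT\n*/\n"
    "Depth water [m]\tTemp [°C]\n10\t-1.5\n20\t-1.6\n"
)
header, rows = split_pangaea_tab(sample)
meta = {"Cruise": "EXAMPLE", "Station": "EXAMPLE_EVENT",
        "Longitude": "120.0", "Latitude": "85.0"}
spreadsheet_text = to_spreadsheet(rows, meta)
```

Repeating the metadata on every row keeps the output a single flat table, which is the simplest form a tab-separated spreadsheet import can take; a real converter would also have to parse the comment block to fill in the metadata values.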
Furthermore, the collections are supposed to resemble the PANGAEA data sets as closely as possible. However, similar measurements have to be combined in the same collection. For instance, 89 data sets were uploaded by [Aka+21]. Each record is an event, and the variables and much of the metadata are identical. A Python routine generates collection names based on the titles of the PANGAEA entries, removing, among other things, dates and leg numbers. Finally, to build collections readable by webODV, the spreadsheet files first have to be imported into ODV and then saved as a collection (consisting of an .odv file and a .data folder). This step is automated from the terminal.
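The idea behind the collection names can be sketched as follows. This is an illustration under stated assumptions, not the routine used in M-VRE: the regular expressions, the function name collection_name and the example titles are hypothetical, and the actual PANGAEA titles may require different patterns.

```python
import re
from collections import defaultdict

# Hypothetical patterns for the parts of a title that differ between
# otherwise similar data sets.
PATTERNS = [
    r"\b\d{4}-\d{2}-\d{2}\b",  # ISO dates such as 2019-11-02
    r"\bleg\s*\d+\b",          # leg numbers such as "leg 1"
    r"\bPS\d+/\d+\b",          # cruise/leg labels such as PS122/2
]


def collection_name(title):
    """Derive a collection name by stripping dates and leg numbers."""
    name = title
    for pattern in PATTERNS:
        name = re.sub(pattern, " ", name, flags=re.IGNORECASE)
    # Collapse every run of non-word characters into one underscore.
    name = re.sub(r"\W+", "_", name)
    return name.strip("_")


# Invented titles, for illustration only.
titles = [
    "Snow pit temperature profile, 2019-11-02, leg 1",
    "Snow pit temperature profile, 2020-03-14, leg 3",
    "Sea ice thickness PS122/2",
]

# Data sets whose titles reduce to the same name share a collection.
groups = defaultdict(list)
for title in titles:
    groups[collection_name(title)].append(title)
```

In this sketch the two snow pit titles reduce to the same name and therefore end up in one collection, while the sea ice title forms its own.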
The M-VRE webODV deployment is accessible through the M-VRE project website [ADD22] or directly through the URL [AWI22]. However, the MOSAiC Data Policy [Imm+19] established that the public release will be on 1 January 2023. Until then, login requires an AWI account and membership in the MOSAiC consortium. The data structure in which the collections are embedded is based on the structure of the science teams during the expedition.

