• June 27, 2022, 13:15 – 15:30

All material
  • Slides_NFDI4Earth A… (July 15, 2022)
  • Slides_Development … (July 15, 2022)

Collaborations, initiatives and data strategies (POF IV, DataHub, etc.)

June 27, 2022

Current and future data-driven scientific applications, from explorative methods to operational systems, require suitable data strategies. The requirements for such strategies are presented in the context of Earth system science and their embedding in POF IV and the NFDI. In particular, examples are presented that stand for consistent data management as a data strategy spanning data collection, the processing steps and the resulting data product.

Those who operate autonomous vehicles under or above water want to know where the vehicles are, whether they are doing well and whether they are doing what they are supposed to do. Sometimes it makes sense to intervene in a mission or to request a vehicle's status. When the GEOMAR AUV team was asked to bring two hover AUVs (part of MOSES, the "Modular Observation Solutions for Earth Systems" Helmholtz research infrastructure program) into operational service, that was exactly what we wanted. We decided to create our own tool: BELUGA.
After almost three years, BELUGA is a core tool for our operational work with our GIRONA 500 AUVs and their acoustic seafloor beacons. BELUGA enables communication and data exchange between these devices and the ship.
Every component inside this ad hoc network has an extended driver installed. The driver handles the messages and their content and decides which communication channel to use: Wi-Fi, satellite or acoustic communication. The driver of the shipborne component additionally has a data-model module, which allows more network devices, sensors and messages to be added in the future. The user works on a web-based graphical interface, which visualizes all the …
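The channel-selection behaviour of such a driver can be sketched in a few lines. This is a hypothetical illustration of the idea described above, not BELUGA's actual API; the names Channel, Driver and the bandwidth figures are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Channel:
    name: str
    bandwidth_bps: int   # rough usable bandwidth of the link
    available: bool      # whether the link currently reaches the vehicle

class Driver:
    """Routes each outgoing message over the fastest currently available channel."""
    def __init__(self, channels):
        self.channels = channels

    def select_channel(self):
        usable = [c for c in self.channels if c.available]
        if not usable:
            raise RuntimeError("no communication channel available")
        return max(usable, key=lambda c: c.bandwidth_bps)

channels = [
    Channel("wifi", 10_000_000, available=False),   # vehicle submerged
    Channel("satellite", 240_000, available=False),
    Channel("acoustic", 1_000, available=True),     # only acoustics reach underwater
]
driver = Driver(channels)
print(driver.select_channel().name)  # acoustic
```

When the vehicle surfaces, marking the Wi-Fi channel as available would make the driver switch to it automatically, since it simply prefers the highest-bandwidth usable link.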

Long-term archiving of bathymetry data from multibeam echosounders in compliance with the FAIR data principles (a high added value for the data life cycle) is a challenging task for the data information system PANGAEA. To cope with the increasing amount of data ("bathymetry puzzle pieces") acquired from research vessels and the demand for easy, map-based means of finding valuable data, new semi-automated processes and standard operating procedures (SOPs) for publishing and simultaneously visualizing bathymetry data are currently being developed.

This research is part of the "Underway Research Data" project, an initiative of the German Marine Research Alliance (Deutsche Allianz Meeresforschung e.V., DAM). The cross-institutional project started in mid-2019. Its aim is to improve and unify the constant data flow from German research vessels to data repositories such as PANGAEA. This covers multibeam echosounders and other permanently installed scientific devices and sensors, following FAIR data management principles and thus exploiting the full potential of German research vessels as instant "underway" mobile scientific measuring platforms.

In an ongoing effort within the Helmholtz Association to make research data FAIR, i.e. findable, accessible, interoperable and reusable, we also need to make biological samples visible and searchable. A first crucial step is to inventory the samples already available, connect them to relevant metadata and assess the requirements of the various sample types (e.g. experimental, time-series and cruise samples). This high diversity is challenging when creating standardized workflows that provide a uniform, complete and meaningful metadata collection for each sample. As part of the Helmholtz DataHub at GEOMAR, the Biosample Information System (BIS) has been set up, turning the former decentralized sample management into a fully digital and centrally managed long-term sample storage.

The BIS is based on the open-source research data management system CaosDB, which offers a framework for managing diverse and heterogeneous data. It supports fine-grained access permissions and regular backups, and has a powerful search engine, several APIs and an extendable WebUI. We have designed a flexible data model and multiple WebUI modules to support scientists, technicians and data managers in digitizing and centralizing sample metadata and making the metadata visible in data portals (e.g. https://marine-data.de).
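The shape of such a flexible data model can be illustrated with a small sketch: shared core fields for every sample plus a free-form block for type-specific metadata that is flattened on export to a portal. This is an illustrative stand-in, not the actual CaosDB data model; the class and field names, and the example values, are assumptions.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional

@dataclass
class BioSample:
    igsn: Optional[str]          # persistent sample identifier, if registered
    sample_type: str             # e.g. "experimental", "time series", "cruise"
    storage_location: str        # slot in the centrally managed long-term storage
    extra: dict = field(default_factory=dict)  # type-specific metadata

    def to_portal_record(self):
        """Flatten to a plain dict, e.g. for export to a data portal."""
        record = asdict(self)
        record.update(record.pop("extra"))  # merge type-specific fields in
        return record

sample = BioSample(
    igsn=None,                      # not yet registered
    sample_type="cruise",
    storage_location="freezer-3/rack-12",
    extra={"cruise_id": "SO289", "depth_m": 1450},
)
print(sample.to_portal_record())
```

The split between fixed core fields and an open extension block is what lets one workflow serve experimental, time-series and cruise samples alike.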

The system allows us to manage …

The International Generic Sample Number (IGSN) is a globally unique and persistent identifier for physical objects such as samples. IGSNs make it possible to cite, track and locate physical samples and to link samples to corresponding data and publications. The metadata schema is modular and can be extended with domain-specific metadata.

Within the FAIR WISH project, funded by the Helmholtz Metadata Collaboration, domain-specific metadata schemes for different sample types within Earth and environmental sciences are being developed based on three use cases. These use cases represent all stages of digitization, from hand-written notes to information stored in relational databases. For all stages, workflows and templates are being developed that generate machine-readable IGSN metadata and allow automatic IGSN registration. The workflows and templates will be published and will contribute to the standardization of IGSN metadata.
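The template idea can be sketched as a mapping from a flat row (as it might come from a digitized spreadsheet or notebook) to a machine-readable metadata record. The field names below follow the spirit, not the letter, of the IGSN schema, and the required-field list is an assumption for illustration.

```python
import json

# Minimal required fields before a record may be registered (assumed set).
REQUIRED = ("name", "sample_type", "collector")

def row_to_igsn_metadata(row: dict) -> str:
    """Map one flat row to a machine-readable metadata record (JSON)."""
    missing = [f for f in REQUIRED if not row.get(f)]
    if missing:
        raise ValueError(f"cannot register IGSN, missing fields: {missing}")
    record = {
        "title": row["name"],
        "sampleType": row["sample_type"],
        "collector": row["collector"],
        # everything else lands in a domain-specific extension block
        "extensions": {k: v for k, v in row.items() if k not in REQUIRED},
    }
    return json.dumps(record, indent=2)

print(row_to_igsn_metadata({
    "name": "Sediment core (example)",
    "sample_type": "core",
    "collector": "J. Doe",
    "water_depth_m": 2100,
}))
```

Validating required fields up front is what makes fully automatic registration feasible even for records that started as hand-written notes.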

Facilitating and monitoring the ingestion and processing of continuous data streams is a challenging exercise that is often addressed only for individual scientific projects and/or stations, resulting in a heterogeneous data environment.

To reduce duplication and enhance data quality, we built a prototypical data ingestion pipeline using open-source frameworks with the goals of a) unifying the data flow for various data sources, b) enhancing observability at all stages of the pipeline, and c) introducing a multi-stage QA/QC procedure to increase data quality and shorten the time until data degradation or failure is detected. The system is orchestrated using Prefect, QA/QC is handled by Great Expectations and SaQC, and the SensorThings API and THREDDS Data Server are used to facilitate data access and integration with other services.

The prototype workflow also features a human-in-the-loop aspect, so scientific PIs can act on incoming data problems early and with little effort. The framework is flexible enough that the specific needs of individual projects can be addressed while still using a common platform. The final outcomes of the pipeline are aggregated data products that are served to scientists and/or the public via data catalogues. …
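The multi-stage QC idea can be sketched with two simple stages whose flags are combined. This is a simplified stand-in for the checks the pipeline delegates to Great Expectations and SaQC: values are flagged rather than dropped, so later stages and human reviewers can see what was rejected and why. The check names and thresholds are illustrative.

```python
def range_check(values, lo, hi):
    """Stage 1: flag values outside the physically plausible range."""
    return ["BAD_RANGE" if not (lo <= v <= hi) else "OK" for v in values]

def spike_check(values, max_jump):
    """Stage 2: flag sudden jumps between consecutive values."""
    flags = ["OK"] * len(values)
    for i in range(1, len(values)):
        if abs(values[i] - values[i - 1]) > max_jump:
            flags[i] = "SPIKE"
    return flags

def combine(*flag_lists):
    """A value is good only if every stage accepted it; else keep the first complaint."""
    return ["OK" if all(f == "OK" for f in fs)
            else next(f for f in fs if f != "OK")
            for fs in zip(*flag_lists)]

temps = [10.1, 10.2, 35.0, 10.3, -99.9]   # e.g. water temperature in deg C
flags = combine(range_check(temps, -2, 30), spike_check(temps, 5))
print(flags)  # ['OK', 'OK', 'BAD_RANGE', 'SPIKE', 'BAD_RANGE']
```

Keeping per-stage flags alongside the data is also what enables the human-in-the-loop step: a PI reviewing flagged values can see which check fired without re-running the pipeline.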

The NFDI4Earth Academy is a network of early-career scientists interested in linking Earth system and data sciences beyond institutional borders. The research networks Geo.X, Geoverbund ABC/J, and DAM offer an open science and learning environment that covers specialized training courses, collaborations within the NFDI4Earth consortium and access to all NFDI4Earth innovations and services. Fellows of the Academy advance their research projects by exploring and integrating new methods and connect with like-minded scientists in an agile, bottom-up, and peer-mentored community. We support young scientists in developing the skills and mindset for open and data-driven science across disciplinary boundaries.
