• June 28, 2022, 09:00 – 09:45


Posters and Live Demos

Not yet scheduled
Towards the detection of ocean carbon regimes
Layerwise Relevance Propagation for Echo State Networks applied to Earth System Variability
HELMI – The Hereon Layer For Managing Incoming Data
HARMONise – Enhancing the interoperability of marine biomolecular (meta)data across Helmholtz Centres
Fast and Accurate Physics-constrained Learning with Symmetry Constraints for the Shallow Water Equations
Data conversion for the MOSAiC webODV
The Coastal Pollution Toolbox – data services and products in support of knowledge for action
New approaches for distributed data analysis with the DASF Messaging Framework
Robust Detection of Marine Life with Label-free Image Feature Learning and Probability Calibration
MOSAiC webODV – An online service for the exploration, analysis and visualization of MOSAiC data
Assessing the Feasibility of Self-Supervised Video Frame Interpolation and Forecasting on the Cloudcast Dataset
Approximation and Optimization of Environmental Simulations in High Spatio-Temporal Resolution through Machine Learning Methods
Machine-Learning-Based Comparative Study to Detect Suspect Temperature Gradient Error in Ocean Data
Machine Learning Parameterization for Cloud Microphysics
MuSSeL project data management and outreach
Investigating the coastal impacts of riverine flood events with the River Plume Workflow
Low-Carbon Routing using Genetic Stochastic Optimization and Global Ocean Weather
Learning deep emulators for the interpolation of satellite altimetry data
AI4FoodSecurity: Identifying crops from space
Automatic low-dimension explainable feature extraction of climatic drivers leading to forest mortality
Model evaluation method affects the interpretation of machine learning models for identifying compound drivers of maize variability

In the context of global climate change and environmental challenges, one research question is how different ocean regions take up carbon dioxide and which bio-physical drivers are responsible for these patterns. Carbon uptake at the sea surface differs between regions. It depends on several drivers (sea surface temperature, salinity, alkalinity, dissolved inorganic carbon, phytoplankton, etc.), which vary enormously on both spatial and seasonal scales. We define a carbon regime as a region with common relationships (on a seasonal and spatial scale) between carbon uptake and its drivers (sea surface temperature, etc.).


We use the output of a global ocean biogeochemistry model providing surface fields of carbon uptake and its drivers on a monthly time scale, and we aim to exploit spatial and seasonal correlations to detect the regimes. We take advantage of both supervised and unsupervised machine learning methods to find different carbon states, with the aim of determining individual local correlations in each carbon state. We build a top-down grid-based algorithm that incorporates both regression and clustering algorithms. The technique divides the entire ocean surface into smaller grids. The regression model detects a linear relationship between carbon uptake and other ocean …
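A minimal sketch of the grid-based combination of regression and clustering described above might look like the following; the array shapes, synthetic data and number of clusters are illustrative assumptions, not the authors' actual model output or configuration.

```python
# Sketch: fit a local linear regression of carbon uptake vs. drivers per grid
# cell, then cluster the regression coefficients into "regimes".
# Synthetic data and sizes are placeholders for illustration only.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

n_cells, n_months, n_drivers = 500, 120, 4                # hypothetical grid
drivers = np.random.rand(n_cells, n_months, n_drivers)    # e.g. SST, salinity, ...
uptake = np.random.rand(n_cells, n_months)                # CO2 uptake per cell

coefs = np.empty((n_cells, n_drivers))
for i in range(n_cells):
    reg = LinearRegression().fit(drivers[i], uptake[i])   # local seasonal fit
    coefs[i] = reg.coef_

# Cells with similar uptake-driver relationships fall into the same regime.
regimes = KMeans(n_clusters=6, n_init=10).fit_predict(coefs)
```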

Artificial neural networks (ANNs) are known to be powerful methods for many hard problems (e.g. image classification or time series prediction). However, these models tend to produce black-box results and are often difficult to interpret. Here we present Echo State Networks (ESNs), a particular type of recurrent ANN also known as reservoir computing. ESNs are easy to train and only require a small number of trainable parameters. They can be used not only for time series prediction but also for image classification, as shown here: our ESN model serves as a detector for the El Niño Southern Oscillation (ENSO) from sea-surface temperature anomalies. ENSO is a well-known problem that has been widely discussed before, but we use this simple problem to open the black box and apply layerwise relevance propagation to Echo State Networks.
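To illustrate the reservoir-computing idea, a minimal Echo State Network sketch is shown below: the input and reservoir weights stay fixed and random, and only the linear readout is trained. Reservoir size, spectral radius, ridge parameter and the placeholder data are arbitrary choices, not those of the presented model.

```python
# Minimal Echo State Network: fixed random reservoir, trained linear readout.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res = 3, 200                                   # hypothetical sizes
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))        # spectral radius < 1

def run_reservoir(inputs):
    """Collect reservoir states for an input sequence."""
    x = np.zeros(n_res)
    states = []
    for u in inputs:
        x = np.tanh(W_in @ u + W @ x)
        states.append(x.copy())
    return np.array(states)

# Train only the readout with ridge regression on (state, target) pairs.
inputs = rng.standard_normal((1000, n_in))             # placeholder drivers
targets = rng.standard_normal(1000)                    # placeholder targets
S = run_reservoir(inputs)
ridge = 1e-6
W_out = np.linalg.solve(S.T @ S + ridge * np.eye(n_res), S.T @ targets)
prediction = run_reservoir(inputs) @ W_out
```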

Hereon operates a large number of continuously measuring sensors on mobile and stationary platforms outside the Hereon campus. Transferring the data from the sensors to the internal network is a critical step, as the data often need to be accessible to researchers in near-real-time (NRT) and must be retrieved from the outside in a secure way that does not pose a threat to the internal infrastructure.

Therefore, we developed the Hereon Layer for Managing Incoming data (HELMI). Using HELMI, data from external sensor systems are moved securely to the Hereon internal infrastructure via a Virtual Private Network solution (WireGuard). The WireGuard client can be installed either directly on the sensor system or on a dedicated piece of data-transfer hardware connected to the sensor. Data are transferred as files via RSYNC or as NRT data via the Message Queuing Telemetry Transport (MQTT) protocol. After transfer, researchers can retrieve their files and access telemetry from an internal endpoint. NRT data can be automatically visualized using web applications, and parameters at the client can be remotely controlled.

The automatic data transfer via HELMI minimizes the risk of data loss, …
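For the NRT path described above, a sensor-side client could publish readings over MQTT roughly as sketched below; the broker address, topic name and payload fields are invented placeholders and not the actual HELMI configuration.

```python
# Minimal sketch of publishing near-real-time sensor readings via MQTT
# (paho-mqtt 1.x style constructor; 2.x additionally needs a CallbackAPIVersion).
import json
import time

import paho.mqtt.client as mqtt

client = mqtt.Client()
client.connect("broker.example.org", 1883)   # broker reachable through the VPN tunnel
client.loop_start()

for _ in range(3):                           # publish a few dummy readings
    reading = {"timestamp": time.time(), "temperature_c": 12.7}
    client.publish("sensors/ferrybox/temperature", json.dumps(reading), qos=1)
    time.sleep(10)

client.loop_stop()
client.disconnect()
```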

Biomolecules, such as DNA and RNA, make up all ocean life, and biomolecular research in the marine realm is pursued across several Helmholtz Centres. Biomolecular (meta)data (i.e. DNA and RNA sequences and all steps involved in their creation) provide a wealth of information about the distribution and function of marine organisms. However, high-quality (meta)data management of biomolecular data is not yet well developed in environmentally focused Helmholtz Centres. This impedes every aspect of FAIR data exchange internally and externally, and the pursuit of scientific objectives that depend on this data. In this Helmholtz Metadata Collaboration project between the Alfred-Wegener-Institut Helmholtz Zentrum für Polar- und Meeresforschung and the GEOMAR Helmholtz-Zentrum für Ozeanforschung Kiel (with scientific PIs rooted in POFIV Topic 6), we will develop sustainable solutions and digital cultures to enable high-quality, standards-compliant curation and management of marine biomolecular metadata, to better embed biomolecular science in broader digital ecosystems and research domains. The approach will build on locally administered relational databases and establish a web-based hub to exchange metadata compliant with domain-specific standards, such as the MIxS (Minimum Information about any (x) Sequence). To interface with and enhance the Helmholtz digital ecosystem, we aim to link the operations and archiving workflows …

The shallow water equations (SWEs) are widely employed to model large-scale fluid flow systems, for example in coastal regions, oceans, estuaries, and rivers. These partial differential equations (PDEs) are often solved using semi-implicit schemes that solve a linear system iteratively at each time step, resulting in high computational costs. Here we use physics-constrained deep learning to train a convolutional network to solve the SWEs, training on the discretized PDE directly without any need for numerical simulations as training data. To improve accuracy and stability over longer integration times, we utilise group-equivariant convolutional networks, so that the learned model respects rotational and translational symmetries of the PDEs as hard constraints at every point in the training process. After training, our networks accurately predict the evolution of the SWEs for freely chosen initial conditions and multiple time steps. Overall, we find that symmetry constraints significantly improve performance compared to standard convolutional networks.
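The core idea of training on the discretized PDE rather than on simulation data can be sketched with a deliberately simplified example: a small CNN advances a 1D linearized shallow water state one step, and the loss is the residual of an explicit finite-difference discretization. The grid size, time step and explicit-Euler residual are illustrative choices; the actual work uses a semi-implicit 2D scheme and group-equivariant networks.

```python
# Physics-constrained training sketch for 1D linearized shallow water equations:
#   dh/dt = -H du/dx,   du/dt = -g dh/dx   (periodic domain)
import torch
import torch.nn as nn

g, H, dt, dx = 9.81, 10.0, 0.01, 1.0

net = nn.Sequential(
    nn.Conv1d(2, 32, 5, padding=2, padding_mode="circular"),
    nn.ReLU(),
    nn.Conv1d(32, 2, 5, padding=2, padding_mode="circular"),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def ddx(f):
    # centered finite difference on a periodic domain
    return (torch.roll(f, -1, dims=-1) - torch.roll(f, 1, dims=-1)) / (2 * dx)

for step in range(1000):
    state = torch.randn(16, 2, 128)           # random initial (h, u) fields
    h0, u0 = state[:, 0], state[:, 1]
    pred = state + net(state)                  # network predicts the increment
    h1, u1 = pred[:, 0], pred[:, 1]
    res_h = (h1 - h0) / dt + H * ddx(u0)       # mass equation residual
    res_u = (u1 - u0) / dt + g * ddx(h0)       # momentum equation residual
    loss = (res_h ** 2 + res_u ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```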

This contribution is an introduction to the data management process in the MOSAiC Virtual Research Environment (M-VRE) project. The M-VRE project aims to make the unique and interdisciplinary MOSAiC data set easily accessible to scientists from different research areas [ADD22]. In addition, a virtual environment is available to analyze and visualize the data directly online. This supports research by improving transparency, traceability, reproducibility and visibility.
One tool incorporated within M-VRE is webODV, the online version of Ocean Data View (ODV) [Sch22]. ODV has been used for the visualization of oceanographic data for almost 30 years. Given its software structure, it is equally suitable for data from the atmosphere, on land, and on ice. However, ODV imposes requirements on the format of the data set, which is why a conversion of the data is required.
In the following, the workflow of data from archive to webODV is described.
First of all, the data source needed to be defined. As part of the MOSAiC project, an agreement was reached through the MOSAiC Data Policy to upload the data to the long-term archive PANGAEA [Imm+19]. For this reason, PANGAEA is used as the data …

Knowledge transfer requires, first, meaningful approaches and products to transfer knowledge amongst different users and, second, appropriate measures for the creation of knowledge across scientific disciplines. The Coastal Pollution Toolbox (https://www.coastalpollutiontoolbox.org/index.php.en), a central product of the program-oriented funding topic on “Coastal Transition Zones under Natural and Human Pressures”, serves as a digital working environment for scientists and as a knowledge hub and information platform for decision-makers. It supports action and the optimisation of scientific concepts to investigate pollution in the land-to-sea continuum.

In order to address the demands of various users, the toolbox comprises three compartments: Science Tools provide expert users with information on new methods, approaches or indicators for baseline assessments or for the re-evaluation of complex problems. Synthesis Tools address challenges of global environmental change; they are information-rich products based on consolidated data of different types and origins and provide expert users with knowledge. Management Tools provide usable information and options for action: ready-to-use tools grounded in evidence-based science are available to those involved in the planning and management of coastal and marine challenges.

As part of the development process, coastal pollution information services will be created and co-developed with stakeholders and end-users. This will ensure optimal interest and use …

The Data Analytics Software Framework (DASF, https://doi.org/10.5880/GFZ.1.4.2021.004) supports scientists in conducting data analysis in distributed IT infrastructures by sharing data analysis tools and data. For this purpose, DASF defines a remote procedure call (RPC) messaging protocol that uses a central message broker instance. Scientists can augment their tools and data with this protocol to share them with others or re-use them in different contexts.

Our framework takes standard Python code developed by a scientist and automatically transforms the functions and classes of the scientist's code into an abstract layer. This abstraction, called the server stub in RPC terminology, is connected to the message broker and can be accessed by submitting JSON-formatted data through a websocket in the so-called client stub. The DASF RPC messaging protocol is therefore language-independent in general, so any language with websocket support can be used. To start with, DASF provides two ready-to-use language bindings for the messaging protocol, one for Python and one for the TypeScript programming language.
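To give a feel for the client side, a bare-bones sketch of submitting a JSON-formatted request through a websocket is shown below; the broker URL and the message fields are invented placeholders, not the real DASF message schema.

```python
# Sketch of a websocket client sending a JSON request to a message broker.
import asyncio
import json

import websockets

async def call_remote():
    async with websockets.connect("wss://broker.example.org/ws") as ws:
        request = {"module": "analysis_tool", "function": "run", "args": [42]}
        await ws.send(json.dumps(request))        # forwarded to the server stub
        response = json.loads(await ws.recv())    # result returned by the tool
        print(response)

asyncio.run(call_remote())
```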

DASF is developed at the GFZ German Research Centre for Geosciences and was funded by the Initiative and Networking Fund of the Helmholtz Association through the Digital Earth project (https://www.digitalearth-hgf.de/). In this …

Advances in imaging technology for in situ observation of marine life have significantly increased the size and quality of available datasets, but methods for automatic image analysis have not kept pace with these advances. At the same time, knowing the distributions of different plankton species, for example, would help us better understand their life cycles, their interactions with each other, and the influence of environmental changes on different species. While machine learning methods have proven useful in solving and automating many image processing tasks, three major challenges currently limit their effectiveness in practice. First, expert-labeled training data are difficult to obtain in practice, requiring a high time investment whenever the marine species, imaging technology or environmental conditions change. Second, overconfidence in learned models often prevents efficient allocation of human time. Third, human experts can exhibit considerable disagreement in categorizing images, resulting in noisy labels for training. To overcome these obstacles, we combine recent developments in self-supervised feature learning with temperature scaling and divergence-based loss functions. We show how these techniques can reduce the required amount of labeled data by ~100-fold, reduce overconfidence, cope with disagreement among experts and improve the efficiency of human-machine interactions. Compared to existing methods, these techniques …
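One of the ingredients named above, temperature scaling, can be sketched in a few lines: a single temperature parameter is fitted on held-out logits so that the softmax probabilities become less overconfident. The array shapes, optimizer settings and synthetic data are arbitrary placeholders.

```python
# Post-hoc temperature scaling: fit a scalar T on validation logits so that
# softmax(logits / T) is better calibrated.
import torch
import torch.nn.functional as F

val_logits = torch.randn(1000, 10)            # placeholder network outputs
val_labels = torch.randint(0, 10, (1000,))    # placeholder expert labels

log_T = torch.zeros(1, requires_grad=True)    # optimize log(T) to keep T > 0
opt = torch.optim.LBFGS([log_T], lr=0.1, max_iter=50)

def closure():
    opt.zero_grad()
    loss = F.cross_entropy(val_logits / log_T.exp(), val_labels)
    loss.backward()
    return loss

opt.step(closure)
calibrated_probs = F.softmax(val_logits / log_T.exp(), dim=1)
print("fitted temperature:", log_T.exp().item())
```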

Introduction

MOSAiC (https://mosaic-expedition.org/) has been the largest polar expedition in history. The German icebreaker Polarstern was trapped in the ice from October 2019 to October 2020, and rich data were collected during the polar year. The M-VRE project (MOSAiC Virtual Research Environment, https://mosaic-vre.org/) aims to support the analysis and exploitation of the MOSAiC data by providing online software tools for the easy, interdisciplinary and efficient exploration and visualization of the data. One service provided by M-VRE is webODV, the online version of the Ocean Data View software (ODV, https://odv.awi.de/).

Setup

The MOSAiC webODV is available via https://mosaic-vre.org/services/ or directly at https://mvre.webodv.cloud.awi.de/. Due to a moratorium until the end of 2022, the data can only be accessed by the MOSAiC consortium. From 2023 on, MOSAiC data and thus webODV will be available to the science community and the general public. In the webODV configuration, datasets as well as the ODV software reside and run on a server machine, not on the client computer. The browser client communicates with the server over the Internet using secure websockets. So far, we provide two webODV services, which are described in the following.

Data Extraction

The Data Extraction service is …

Cloud dynamics are integral to forecasting and monitoring weather and climate processes. Due to a scarcity of high-quality datasets, limited research has been done on realistically modelling clouds. This proposal applies state-of-the-art machine-learning techniques to address this shortage, using a real-life dataset, CloudCast.

Potential techniques, such as RNNs and CNNs paired with data augmentation, are explored. Preliminary results show promise for the tasks of supervised video frame interpolation and video prediction. High performance is achieved with a supervised approach.

These video techniques demonstrate a potential to lower the cost for satellite capture, restoration, and calibration of errors in remote sensing data. Future work is proposed to develop more robust video predictions on this and other similar datasets. With these additions, climate scientists and other practitioners could successfully work at a higher frequency.

Environmental simulations of large-scale dynamic systems in high spatio-temporal resolution are compute-intensive and thus usually demand parallelization as well as high-performance computing (HPC) resources. Furthermore, parallelizing existing sequential simulations potentially involves a large configuration overhead and requires advanced programming expertise from domain scientists. On the other hand, despite the availability of modern powerful computing technologies, and with a view to saving energy, there is a need to address issues such as the complexity and scale reduction of large-scale system simulations. In order to tackle these issues, we propose two approaches: 1. approximation of simulations by model order reduction and unsupervised machine learning methods, and 2. approximation of simulations by supervised machine learning methods.

In the first approach, we approximate large-scale and high-resolution environmental simulations and reduce their computational complexity by employing model order reduction techniques and unsupervised machine learning algorithms. In detail, we cluster functionally similar model units to reduce model redundancies, in particular similarities in the functionality of simulation model units, and thereby the computational complexity. The underlying principle is that the simulation dynamics depend on the model units' static properties, current state, and external forcing. Based on this, we assume that similar model units' settings lead …
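A minimal sketch of the clustering step described above is given below; the feature names and sizes are hypothetical, and the point is only that model units with similar static properties, state and forcing are grouped so that one representative unit per group can be simulated.

```python
# Sketch: group functionally similar model units by clustering their
# static properties, current state and external forcing.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

n_units = 10_000
features = np.column_stack([
    np.random.rand(n_units),   # e.g. a static property such as a soil type index
    np.random.rand(n_units),   # e.g. a current state variable
    np.random.rand(n_units),   # e.g. an external forcing value
])

scaled = StandardScaler().fit_transform(features)
labels = KMeans(n_clusters=50, n_init=10).fit_predict(scaled)
# Simulate one representative unit per cluster and map results back to members.
```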

Thousands of ocean temperature and salinity measurements are collected every day around the world. Controlling the quality of these data is a labour-intensive task because the control procedures still produce many false alarms that only a human expert can identify. Indeed, quality control (QC) procedures have not yet benefited from the recent development of efficient machine learning methods that predict simple targets from complex multi-dimensional features. With increasing amounts of data, algorithmic help is urgently needed, and artificial intelligence (AI) could play a dominant role. Developments in data mining and machine learning could revolutionize automatic oceanographic data quality control. Such techniques provide a convenient framework to improve automatic QC by using supervised learning to reduce the discrepancy with the human expert evaluation.

This work proposes a comparative analysis of machine learning classification algorithms for ocean data quality control to detect the suspect temperature gradient error. The objective is to obtain a highly effective QC classification method for ocean data using a representative set of supervised machine learning algorithms. The work to be presented constitutes the second step of our overall system, of which the first is based on a deep convolutional …
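A comparative study of this kind typically amounts to benchmarking several supervised classifiers on the same labelled profiles; a generic sketch follows, in which the features, labels and classifier list are placeholders rather than the study's actual setup.

```python
# Generic sketch: compare classifiers for a binary QC flag
# ("suspect temperature gradient" vs. "good") with cross-validation.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

X = np.random.rand(5000, 12)        # placeholder profile features
y = np.random.randint(0, 2, 5000)   # placeholder expert QC decisions

models = {
    "logreg": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200),
    "gboost": GradientBoostingClassifier(),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="f1")
    print(f"{name}: F1 = {scores.mean():.3f} +/- {scores.std():.3f}")
```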

In weather and climate models, physical processes that cannot be explicitly resolved are often parameterized. Among them is cloud microphysics, which often works in tandem with the convective parameterization to control the formation of clouds and rain.

Existing parameterization schemes available for cloud microphysics suffer from an accuracy/speed trade-off. The most accurate schemes based on Lagrangian droplet methods are computationally expensive and are only used for research and development. On the other hand, more widely used approaches such as bulk moment schemes simplify the particle size distributions into the total mass and number density of cloud and rain droplets. While these approximations are fairly realistic in many cases, they struggle to represent more complex microphysical scenarios.

We develop a machine-learning-based parameterization to emulate the warm rain formation process of the superdroplet scheme (a type of Lagrangian scheme) in a dimensionless control volume. We show that the ML-based emulator matches the Lagrangian simulations better than the bulk moment schemes, especially for very skewed droplet distributions. Compared to previous attempts at emulating warm rain, our ML model shows better performance. ML model inference runs fast, thereby reducing the computational time otherwise needed for …
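As an illustration of such an emulator, one could regress the bulk-moment tendencies from the current moments with a small neural network, using superdroplet-scheme output as the training target. Everything below (the choice of moments, network size and the synthetic stand-in data) is a placeholder sketch, not the presented model.

```python
# Sketch: emulate warm-rain moment tendencies with a small MLP regressor.
import numpy as np
from sklearn.neural_network import MLPRegressor

# columns: cloud mass, cloud number, rain mass, rain number (placeholders)
moments = np.random.rand(20_000, 4)
tendencies = np.random.rand(20_000, 4)   # would come from Lagrangian runs

emulator = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=300)
emulator.fit(moments, tendencies)
predicted = emulator.predict(moments[:5])   # fast inference replaces the scheme
```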

The collaborative project “MuSSeL” investigates various natural and anthropogenic changes, such as climate change, increased fishing and the development of offshore wind farms, and the effect these changes have on the biodiversity and well-being of benthic communities in the North Sea. A central part of this project is to make any data gathered easily accessible to stakeholders and the general public alike. To streamline this process, it was decided to use ESRI software solutions for data management and public outreach. The live demo will demonstrate how data can be visualised, analysed and made available on the project website, all using ESRI solutions.

The River Plume Workflow is part of the Digital Earth Flood Event Explorer (FEE), which was designed to compile different aspects of riverine flood events.

The focus of the River Plume Workflow is the impact of riverine flood events on the marine environment, when, at the end of a flood event chain, an unusual amount of nutrients and pollutants is washed into the coastal waters. The River Plume Workflow provides scientists with tools to detect river plumes in marine data during or after an extreme event and to investigate their spatio-temporal extent, their propagation and impact. This is achieved through the combination of in-situ data from autonomous measuring devices, drift model data produced especially for the observational data and satellite data of the observed area. In the North Sea, we use measurements from the FerryBox mounted on the Büsum-Helgoland ferry to obtain regular in-situ data and offer model trajectories from drift simulations around the time of extreme events in the Elbe River.

The River Plume Workflow helps scientists identify river plume candidates either manually within a visual interface or through an automatic anomaly detection algorithm, using Gaussian regression. Combining the observational data with model trajectories that show the position …
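Reading the "Gaussian regression" mentioned above as a Gaussian-process-based anomaly detection, a sketch of the idea could look as follows: fit the background signal, then flag observations that fall far outside the predictive uncertainty. The kernel, threshold and synthetic FerryBox-like series are illustrative assumptions, not the workflow's actual algorithm.

```python
# Sketch: flag observations as river-plume candidates when they deviate
# strongly from a Gaussian process regression fit of the background signal.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

t = np.linspace(0, 30, 400)[:, None]                # days along the transect
salinity = 32 + 0.1 * np.sin(t[:, 0]) + 0.05 * np.random.randn(400)
salinity[200:210] -= 2.0                             # injected plume-like anomaly

gp = GaussianProcessRegressor(kernel=RBF(5.0) + WhiteKernel(0.01))
gp.fit(t, salinity)
mean, std = gp.predict(t, return_std=True)

candidates = np.abs(salinity - mean) > 3 * std       # boolean anomaly mask
```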

Introduction

Existing marine technology enables every marine vehicle to integrate, in a timely manner, a large collection of global ship positioning data along with recommended routes offering higher safety and reduced emission intensity. New AI-assisted navigational devices could quickly integrate unsupervised routing predictions based on assimilated and forecasted global ocean and weather conditions.

Description of goals

Here we design a route optimization algorithm that takes advantage of current predictions from ocean circulation models. We develop validation scenarios for a marine vehicle and show how it can travel in safer conditions. Our optimization is designed to fulfil emission goals by achieving lower fuel use.

Results and Achievements

The weather data from the European Observational Marine Copernicus Center allow leveraging both satellite observations and high-resolution wind, wave and current predictions at any position in the global ocean. We propose a low-fuel-consumption, low-carbon-emission ship routing optimization that employs these real-time, high-resolution data in a stochastic optimization algorithm.

The proposed optimization method is based on local continuous random modifications successively applied to an initial shortest-distance route between two points. It is parallelised using a genetic approach. The model is validated using both vessel noon reports …
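The optimization described above (random local modifications of an initial shortest-distance route, selected by a genetic scheme) can be sketched as follows; the cost function, perturbation sizes and population settings are invented for illustration and a real version would evaluate fuel use against wind, wave and current forecasts along each leg.

```python
# Sketch of a genetic/stochastic route optimization: perturb intermediate
# waypoints of a straight route and keep the candidates with the lowest cost.
import numpy as np

def fuel_cost(route):
    # Placeholder: distance-based cost instead of a weather-dependent fuel model.
    return np.sum(np.linalg.norm(np.diff(route, axis=0), axis=1))

start, goal = np.array([0.0, 0.0]), np.array([10.0, 5.0])
n_waypoints, pop_size, n_generations = 20, 40, 200

base = np.linspace(start, goal, n_waypoints)          # shortest-distance route
population = [base.copy()] + [
    base + np.vstack([np.zeros(2), 0.5 * np.random.randn(n_waypoints - 2, 2), np.zeros(2)])
    for _ in range(pop_size - 1)
]

for _ in range(n_generations):
    population.sort(key=fuel_cost)
    parents = population[: pop_size // 4]              # keep the best quarter
    children = []
    for p in parents:
        for _ in range(3):                              # local random mutations
            noise = np.vstack([np.zeros(2), 0.1 * np.random.randn(n_waypoints - 2, 2), np.zeros(2)])
            children.append(p + noise)
    population = parents + children

best_route = min(population, key=fuel_cost)
```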

Over the last few years, a very active field of research has aimed at exploring new data-driven and learning-based methodologies to propose computationally efficient strategies able to benefit from the large amount of observational remote sensing data and numerical simulations for the reconstruction, interpolation and prediction of high-resolution derived products of geophysical fields. These approaches now reach state-of-the-art performance for the reconstruction of satellite-derived geophysical fields. In this context, deep emulators emerge as new means to bridge model-driven and learning-based frameworks. Here, we focus on deep emulators for reconstruction and data assimilation issues, and more specifically on 4DVarNet schemes. These schemes bridge the variational data assimilation formulation and deep learning schemes to learn 4DVar models and solvers from data. We present an application of 4DVarNet schemes to the reconstruction of sea surface dynamics. More specifically, we aim at learning deep emulators for the interpolation of altimetry data. Similarly to a classic optimal interpolation, we leverage a minimization-based strategy, but we benefit from the modeling flexibility of deep learning frameworks to embed non-linear and multi-scale priors and to jointly learn a gradient-based solver for the underlying variational cost. Overall, the proposed 4DVarNet scheme defines an end-to-end neural architecture which uses irregularly-sampled altimetry data …

The European Space Agency (ESA) launched the AI4EO initiative to bridge the gap between the artificial intelligence (AI) and the Earth observation (EO) communities [1]. In the AI4FoodSecurity challenge [2], the goal is to identify crops in agricultural fields using time series remote sensing data from different satellites. In the first challenge track, predictions were made for a region in South Africa, including a spatial domain shift. In the second challenge track, predictions were made for a region in Germany (Brandenburg), including a spatio-temporal domain shift.

We here present our contribution to the AI4FoodSecurity challenge. As data sources, we selected radar images from the Sentinel-1 satellite as well as visual and infrared images from the Planet Fusion Monitoring satellites. We implemented a Pixel-Set Encoder with Lightweight Temporal Attention (PseLTae) [3]. Samples are constructed by randomly selecting pixels from a given agricultural field. We train separate encoders for each data source. Attention heads are used to extract characteristic changes of the distinct crop types throughout the growing season. At the decoder stage, both sources are combined to yield a prediction for the crop type. We used data augmentation by oversampling the agricultural fields, as well as cross-validation.
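The sample construction described above (random pixel sets drawn from each field) can be illustrated in a few lines; the field size, the number of sampled pixels and the mean/std pooling are placeholders, not the PseLTae implementation.

```python
# Sketch: build a fixed-size pixel-set sample for one agricultural field from a
# (time, channels, n_pixels) stack by random sampling with replacement.
import numpy as np

n_times, n_channels, n_field_pixels, set_size = 52, 10, 317, 64
field_stack = np.random.rand(n_times, n_channels, n_field_pixels)

idx = np.random.choice(n_field_pixels, size=set_size, replace=True)
pixel_set = field_stack[:, :, idx]                 # (time, channels, set_size)

# A pixel-set encoder summarizes the set dimension, e.g. by mean/std pooling,
# before temporal attention is applied.
summary = np.concatenate([pixel_set.mean(-1), pixel_set.std(-1)], axis=1)
```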

The …

Forest mortality is a complex phenomenon because of the interaction of multiple drivers over a long period. Understanding these interactions and their relevant time scales is important for forest management. Unlike climate data, which are continuous (daily or hourly resolution), forest-mortality-related observations are discrete (e.g. the number of trees, mortality fraction) and of lower frequency (once or twice a year). Forests also have a persistent memory with a large buffering capacity. All of the above makes the analysis of forest mortality difficult with conventional tools. Deep learning is well suited for modelling multivariate time series with persistent non-linear interactions. In this study, we generate 200,000 years of hourly climate data using a weather generator (AWE-GEN). We aggregate the hourly data to daily values and feed them to a process-based forest model (FORMIND). The forest model gives us mortality fractions per year, in line with the forest-mortality-related observations. For the method development phase, we use these simulated data. First, we use a variational autoencoder to extract climatic features and use them for the prediction of forest mortality. In the second stage, we perform the prediction of forest mortality and the feature extraction together and illustrate the difference between the extracted features. …

Extreme impacts can be caused by the compounding effects of multiple drivers, such as weather events that might not individually be considered extreme. An example of this is the phenomenon of ‘false spring’, where a combination of a warm late winter or early spring, followed by a frost once the plants have entered a vulnerable stage of development, results in severe crop damage. The relationships between growing-season climate conditions and end-of-season crop yield are complex and nonlinear, and improving our understanding of such interactions could aid in narrowing the uncertainty in estimates of climate risk to food security. Additionally, data-driven methods that are capable of identifying such compounding effects could be useful for the study of other sectoral impacts.

Machine learning is an option for capturing such complex and nonlinear relationships for yield prediction. In order to extract these relationships, explainable or interpretable machine learning has been identified as a potential tool. However, the usefulness of those extracted interpretations is dependent on the assumption that the model has learned the expected relationships. One prerequisite for this assumption is that the model has sufficient predictive skill. The method chosen for measuring model performance is therefore an important methodological decision, but as …