Live demo of a tool

New approaches for distributed data analysis with the DASF Messaging Framework

Sommer, Philipp Sebastian (1); Eggert, D. (2); Wichert, V. (1); Baldewein, L. (1); Dinter, T. (3); Werner, C. (4)
  1. Helmholtz-Zentrum Hereon
  2. GFZ German Research Centre for Geosciences
  3. Alfred-Wegener-Institut - Helmholtz-Zentrum für Polar- und Meeresforschung
  4. Karlsruhe Institute of Technology

The Data Analytics Software Framework (DASF, https://doi.org/10.5880/GFZ.1.4.2021.004) helps scientists conduct data analysis in distributed IT infrastructures by sharing data analysis tools and data. For this purpose, DASF defines a remote procedure call (RPC) messaging protocol that uses a central message broker instance. Scientists can augment their tools and data with this protocol to share them with others or to re-use them in different contexts.
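
The request and reply pattern through a central broker can be illustrated with a minimal sketch. It uses only the Python standard library and an in-memory stand-in for the broker; the class `InMemoryBroker`, the helpers `serve_once` and `call`, the topic name `stats`, and the message fields `id`, `reply_to`, and `payload` are all hypothetical and do not represent the DASF API or its wire format.

```python
import json
import queue
import threading


class InMemoryBroker:
    """Toy stand-in for a central message broker: one queue per topic."""

    def __init__(self):
        self._topics = {}
        self._lock = threading.Lock()

    def _queue(self, topic):
        # Create the topic queue on first use; the lock keeps this thread-safe.
        with self._lock:
            return self._topics.setdefault(topic, queue.Queue())

    def publish(self, topic, message):
        # Messages travel as JSON text, as they would over a network.
        self._queue(topic).put(json.dumps(message))

    def consume(self, topic, timeout=2.0):
        return json.loads(self._queue(topic).get(timeout=timeout))


def serve_once(broker, topic, handler):
    """Backend side: take one request off the topic and publish the reply."""
    request = broker.consume(topic)
    result = handler(**request["payload"])
    broker.publish(request["reply_to"], {"id": request["id"], "result": result})


def call(broker, topic, payload, request_id="req-1"):
    """Client side: publish a request and wait for the matching reply."""
    reply_topic = f"{topic}.reply.{request_id}"
    broker.publish(topic, {"id": request_id, "reply_to": reply_topic, "payload": payload})
    return broker.consume(reply_topic)["result"]


def mean(values):
    """Example analysis function shared by the backend."""
    return sum(values) / len(values)


broker = InMemoryBroker()
worker = threading.Thread(target=serve_once, args=(broker, "stats", mean))
worker.start()
answer = call(broker, "stats", {"values": [1, 2, 3, 4]})
worker.join()
print(answer)
```

Neither side talks to the other directly: the client and the backend only know the broker and a topic name, which is what allows a tool to be shared or moved without changing its callers.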

Our framework takes standard Python code developed by a scientist and automatically transforms the functions and classes of the scientist's code into an abstract layer. This abstraction, called the server stub in RPC terminology, is connected to the message broker and can be accessed by submitting JSON-formatted data through a WebSocket in the so-called client stub. The DASF RPC messaging protocol is therefore language independent in general, so any language with WebSocket support can be used. As a start, DASF provides two ready-to-use language bindings for the messaging protocol: one for Python and one for TypeScript.
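
The idea of turning ordinary functions into a server stub that answers JSON requests can be sketched as follows. This is a simplified illustration with the standard library only, not the DASF implementation: the class `ServerStub`, its methods, and the request fields `function` and `args` are assumptions made for this example, and the WebSocket transport is left out.

```python
import inspect
import json


class ServerStub:
    """Exposes plain Python functions through JSON requests (illustrative only)."""

    def __init__(self):
        self._functions = {}

    def register(self, func):
        # Usable as a decorator; the function itself is returned unchanged.
        self._functions[func.__name__] = func
        return func

    def describe(self):
        """Return a JSON-serializable summary: function name -> parameter names."""
        return {
            name: list(inspect.signature(func).parameters)
            for name, func in self._functions.items()
        }

    def handle(self, raw_request):
        """Decode a JSON request, call the matching function, encode the reply."""
        request = json.loads(raw_request)
        name = request.get("function")
        func = self._functions.get(name)
        if func is None:
            return json.dumps({"error": f"unknown function: {name}"})
        try:
            # Check the arguments against the signature before calling.
            bound = inspect.signature(func).bind(**request.get("args", {}))
        except TypeError as exc:
            return json.dumps({"error": str(exc)})
        return json.dumps({"result": func(*bound.args, **bound.kwargs)})


stub = ServerStub()


@stub.register
def anomaly(value, baseline):
    """Example analysis function written without any knowledge of the stub."""
    return value - baseline


reply = stub.handle(json.dumps({"function": "anomaly", "args": {"value": 15, "baseline": 12}}))
print(reply)
```

Because the request and the reply are plain JSON text, a client written in any language could produce the same request, which is the property that makes such a protocol language independent.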

DASF is developed at the GFZ German Research Centre for Geosciences and was funded by the Initiative and Networking Fund of the Helmholtz Association through the Digital Earth project (https://www.digitalearth-hgf.de/). In this talk, we present the framework with some simple examples and introduce two new approaches. The first is an alternative lightweight message broker based on the Python web framework Django that supports containerization, user management, and token authentication. The second is an approach for easily applicable end-to-end encryption in the messaging framework, together with user authentication in the backend module, for secure federation of data analysis between research centers.