Most projects in this CRC need to handle considerable amounts of data, run experiments on that data, build models and software artifacts, and eventually produce experimental results. Handling these diverse and possibly large datasets and models is a challenge in itself. We believe that considerable synergies exist among the projects in terms of data and information management. Therefore, rather than having each subproject (re-)invent its own data management solution, we consider it sensible to provide a unified platform for data and information management.
The main purpose of this project is to provide a unified framework and infrastructure for all data, meta-data, models, and software artifacts used and created in this CRC. This serves multiple goals. It allows for long-term preservation and versioning of experimental data (raw, preprocessed, and results; anonymized where required), models, and software (virtualized if necessary). It lowers the effort that individual subprojects spend on data curation, conversion, and (pre-)processing. It fosters the coordination of modeling efforts through common storage of documents and models and through meta-data annotation and querying facilities. It provides reusable workflows for data processing (e.g., using Apache Flink). It provides a common infrastructure for benchmarking and enables reproducibility of experimental results (where applicable). We will provide a publicly available web front end that gives research groups outside this CRC (partial) access to the repository, subject to privacy safeguards where relevant. We will support the three task forces by providing repositories that store and organize their shared technical artifacts: documentation, papers, and literature; usability design and testing tools; and computer-processable models (data models, analytic/predictive models, privacy properties, …) as well as their interrelations (abstraction hierarchies, concept-wise matchings, purpose and scope, …). We will also provide technical support for coordinating the task forces.
This project is looking for an administrator to build and operate the technical infrastructure. Applicants should have at least a Bachelor’s degree in Computer Science and a track record in DevOps engineering. To apply for this position, please send a short CV to the project’s PI.