Designing a data access and integration workflow for medical data science: a use case of compiling a reusable data set for primary tumor discovery at MeDIC Cologne

Gehrmann, Julia; Decker, Stefan Josef (Thesis advisor); Beyan, Oya Deniz (Thesis advisor)

Aachen : RWTH Aachen University (2022, 2023)
Master's Thesis

Master's thesis, RWTH Aachen University, 2022


Recent advances in Machine Learning (ML) and Artificial Intelligence (AI) have also led to considerable progress in Medical Data Science (MDS). They enable a better understanding of diseases and treatment responses as well as the application of personalized therapy, which improves the prognosis for many patients. MDS projects often need data from several medical data systems to obtain enough features per patient for well-founded decision-making. However, this Data Access and Integration (DAI) is complicated by several issues. For instance, clinical data is often not digitized in a structured manner and generally shows a high degree of heterogeneity. Additional challenges include the strict data protection rules that follow from the high sensitivity of medical data, and the multi-actor architecture of data systems. Data Integration Centers (DICs) such as the Medical Data Integration Center (MeDIC) Cologne are therefore currently being established all over Germany to ease the DAI process for researchers and make medical data available for MDS. These DICs, however, still face technical, legal, ethical, and organizational problems. This thesis aims to support MDS teams, including DICs, in defining concrete DAI processes. To this end, it proposes a DAI workflow for MDS that serves as a basis for discussing concrete DAI project designs and for optimizing DAI processes at university hospitals. The work was done in five steps. First, I performed a literature review on current technical, legal, ethical, and organizational challenges in the DAI process. Second, I gained personal DAI experience in a clinical use case of the Center for Integrated Oncology (CIO) Cologne. Third, I used the results of the first two steps to propose a DAI workflow for MDS, accompanied by a surrounding DAI framework and a performance estimation guideline. Fourth, employees of University Hospital Cologne (UHC) assessed this proposal in expert interviews. Finally, I incorporated the feedback received into the proposal. The result is a widely applicable DAI workflow for MDS, accompanied by a classification of involved roles, system concepts, data security measures, and a performance estimation guideline.


  • Department of Computer Science [120000]
  • Chair of Computer Science 5 (Information Systems and Databases) [121810]
  • Chair of Computer Science 5 (Information Systems and Databases) [124510]