An approach for global and local data lifecycle management with provenance and persistent identifiers

  • Ein Ansatz für globales und lokales Datenlebenszyklusmanagement mit Provenienz und persistenten Identifikatoren

Gleim, Lars Christoph; Decker, Stefan Josef (Thesis advisor); Sure-Vetter, York (Thesis advisor)

Aachen : RWTH Aachen University (2023)
Dissertation / PhD Thesis

Dissertation, RWTH Aachen University, 2023


Data lifecycle management is gaining in complexity in times of increasing interorganizational collaboration, agile product development, and growing information sharing throughout the enterprise supply chain. Current data management practices based on data lake and data warehouse systems are struggling to scale to the industrial requirements of tomorrow. Building on the foundations of Linked Data technology and on the principles and best practices for findable, accessible, interoperable, and reusable (FAIR) data management, we describe an approach for global and local data lifecycle management with provenance and persistent identifiers. We refer to this approach as World Wide Data Management (WWDM). Throughout this work, we define key services, principles, and best practices for WWDM and provide a reference architecture and implementation. We present:

  • FactID, a novel approach for persistent resource identification and archiving that enables globally distributed resource persistence.
  • Extended Memento, a Hypertext Transfer Protocol (HTTP) extension that enables uniform, infrastructure-independent data persistence at industrial scale.
  • FactDAG, a data integration and interoperability model that uses provenance links to connect data across system and organizational boundaries and throughout the data and product lifecycle.
  • FactStack, an implementation of the FactDAG model based on the principles of Linked Data and FAIR data management.
  • ReShare, an approach that enables verifiable accountability for data sharing across organizational boundaries using the novel concept of Digital Transmission Contracts.
  • FactFUSE, an approach that integrates the WWDM paradigm, based on the FactStack system, with traditional hierarchical file systems to support the practical usability and adoption of the overall paradigm.

Through the combination of these contributions, the suite of data management services at the core of the WWDM paradigm is realized in an interoperable and sustainable fashion, guiding and supporting data management throughout the data lifecycle. As such, the presented results provide a practical foundation for interorganizational data management and collaboration with minimal operational overhead, promoting the adoption of FAIR data management principles and best practices throughout the product lifecycle and the enterprise supply chain. Consequently, the described WWDM solution provides the foundation for next-generation data management practices for agile product development and collaboration in Industry 4.0 and beyond.


  • Department of Computer Science [120000]
  • Chair of Computer Science 5 (Information Systems and Databases) [124510]