Syncpal: a simple and iterative reconciliation algorithm for file synchronizers
- Syncpal: Ein schlichter, iterativer Abgleichungsalgorithmus für File Synchronizer
Shekow, Marius Alwin; Jarke, Matthias (Thesis advisor); Prinz, Wolfgang (Thesis advisor); Gross, Tom (Thesis advisor)
Aachen (2019, 2020)
Dissertation / PhD Thesis
Dissertation, RWTH Aachen University, 2019
Abstract
File synchronizers are tools with the goal to facilitate collaboration scenarios and data management across multiple devices. They replicate the file system, e.g. from a cloud storage to a device disk, achieving convergence by only transmitting detected changes. A popular variant available in a plethora of widely adopted products are state-based file synchronizers such as Dropbox. They detect operations by computing the difference between a previously persisted state and the respective current state, performing a bi-directional synchronization of two replicas. While users rely on synchronization to run without errors, bugs in industrial synchronizers make this difficult. Users often have to detect and fix synchronization errors themselves, and some errors remain undetected for a long time. This results in cost-intensive iterations in cooperation processes, which should be avoided in academia and industry. This work identifies three core challenges of state-based file synchronization. The first challenge is the heterogeneity of different file systems, which requires the file synchronizer to detect and handle incompatible capabilities. Second, a synchronizer needs to detect and resolve conflicting operations that result from a group of users working on their replica in isolation. Third, non-conflicting operations computed from state differencing are not immediately suitable for propagation. The operation order is not available and operations may be affected by consolidation, such that important intermediate operations are missing. This problem most notably affects file systems that support the move operation which changes an object's parent folder. The goal of this work is to design and analyze an algorithm and develop an implementation of a file synchronizer that solves these challenges. To address heterogeneity we analyze existing real-world implementations (such as the NTFS file system) and their compatibility issues, and also examine related academic works. We identify six file system capabilities relevant to file synchronizers, formally define a file system model F that is in large parts compatible to the various existing definitions and suggest several alternatives for handling incompatible differences. To detect conflicts we perform a precondition analysis of the operations of F. Resolving conflicts is an open problem where the right approach depends on the context. Our related work analysis finds that most academic works present arbitrary resolution methods that lack a rationale for their decisions. To determine a reflected conflict resolution approach we design a four-step framework which starts with an informal definition of consistency properties and iteratively refines it to a set of formal and detailed steps for resolving concrete conflicts. Apart from F and a conflict resolution approach the main contribution of this work is Syncpal, an iterative algorithm that reconciles two divergent file systems, solving all of the above challenges. It first handles conflicts, one at a time, such that resolving one conflict does not negatively affect others. Whenever possible, conflicts are avoided. It then finds a valid propagation order for the remaining non-conflicting operations, breaking cyclic dependencies if necessary. The iterative nature of Syncpal reduces the overall complexity and the probability of bugs. The technical evaluation of our implementation of Syncpal includes its complexity analysis, automated testing and a comparison with five industrial-grade file synchronizers. We find that our algorithm improves the handling of file system heterogeneity and synchronizes changes from long offline periods correctly, where other implementations fail and may even cause data loss. Our implementation has been in operation by 30 users over a period of over 18 months, providing valuable insights for further research regarding usage patterns and practical requirements.
Institutions
- Department of Computer Science [120000]
- Chair of Computer Science 5 (Information Systems and Databases) [121810]
Identifier
- DOI: 10.18154/RWTH-2019-11267
- RWTH PUBLICATIONS: RWTH-2019-11267