About
Introduction
Everyone who uses more than one computer is aware of the data management problem this creates: keeping multiple copies of files means the copies must be synchronized whenever some of them change, so that all of them are up to date again. The simplest and perhaps most widespread method is manual synchronization, in which users remember which files they have changed on which computers and manually copy those files to the other computers.
To do this job automatically we use a tool called a file synchronizer, which reconciles disconnected modifications to a replicated directory structure. Trustworthy synchronizers are difficult to build, since they must deal correctly both with the semantic complexities of file systems and with the unpredictable failure modes of distributed operation. We present here a detailed specification of a particular file synchronizer called csync.
Data Replication
Traditional replication techniques try to maintain single-copy consistency: users believe they have a single, highly available copy of their data. This goal can be achieved in many ways, but the basic concept remains the same: traditional techniques block access to a replica unless it is provably up to date. We call these techniques “pessimistic”. What we want instead are “optimistic” replication strategies.
What is optimistic replication? It is a technique for sharing data efficiently in wide-area or mobile environments. The key feature that separates optimistic replication algorithms from their pessimistic counterparts is their approach to concurrency control. Pessimistic algorithms coordinate replicas synchronously during accesses and block other users while an update is in progress. In contrast, optimistic algorithms let data be read or written without a priori synchronization, based on the “optimistic” assumption that conflicting updates will occur only rarely, if at all. Updates are propagated in the background. Such optimistic strategies were described as early as 1975 in RFC 677, which discusses the maintenance of duplicated databases in ARPA-like networks.
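The difference in concurrency control can be illustrated with a toy model. This Python sketch is hypothetical throughout: `Replica`, its `reachable` and `pending` fields, and the functions `pessimistic_write`, `optimistic_write` and `propagate` are invented for illustration, there is no real network, and conflicting concurrent updates are ignored. The pessimistic write refuses to proceed unless every replica can be updated at once; the optimistic write always succeeds locally and defers propagation.

```python
class Replica:
    """Hypothetical replica holding a single value."""

    def __init__(self, name, value=None):
        self.name = name
        self.value = value
        self.reachable = True   # hypothetical: can this replica be contacted now?
        self.pending = []       # updates accepted locally, not yet propagated


def pessimistic_write(replicas, value):
    """Apply the write only if every replica can be updated synchronously.

    If any replica is unreachable the write is refused, so all copies
    always stay identical, at the cost of availability.
    """
    if not all(r.reachable for r in replicas):
        return False
    for r in replicas:
        r.value = value
    return True


def optimistic_write(local, value):
    """Apply the write to the local replica at once, without coordination.

    The update is queued and propagated later, in the background.
    """
    local.value = value
    local.pending.append(value)
    return True


def propagate(local, replicas):
    """Background step: push queued local updates to every reachable peer.

    The queue is cleared only once all replicas were reachable, so an
    offline peer still receives the updates on a later call.
    """
    for r in replicas:
        if r is not local and r.reachable:
            for value in local.pending:
                r.value = value
    if all(r.reachable for r in replicas):
        local.pending.clear()
```

For example, with a "desktop" replica marked unreachable, `pessimistic_write` returns `False` and changes nothing, while `optimistic_write` updates the "laptop" copy immediately; once the desktop is reachable again, `propagate` brings it up to date.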
Design
The most important goal of a file synchronizer is correctness. A synchronizer changes scattered and potentially large parts of the user’s filesystem, which may contain sensitive and valuable information. Moreover, this work is largely unsupervised by the user! This puts a synchronizer in a unique position to harm the system, so it must ensure fail-safe behavior in all situations. Doing so requires several different sorts of bulletproofing.
An issue closely related to safety is the treatment of conflicts. A synchronizer tries to propagate non-conflicting changes between replicas, ideally making them identical at the end of the synchronization, but designs differ in what happens when this is not possible. Some insist on consistency: all replicas must be identical after the run, even if this means that some of the changes the user made to one replica or another must be discarded or overwritten. Others treat the user’s changes as sacred: if a file has changed in incompatible ways on two replicas, the synchronizer leaves it alone unless the user provides guidance. In that case the replicas may still differ after synchronization.
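The two conflict-handling designs can be contrasted in a minimal sketch. It assumes a hypothetical model in which each file is reduced to a single value (for example, its content) on replicas `a` and `b`, plus the value recorded at the last synchronization (`base`); a missing file is represented by `None`. `Policy`, `reconcile_file` and the `prefer` parameter are illustrative names, not csync's API.

```python
from enum import Enum


class Policy(Enum):
    CONSISTENCY = "consistency"   # replicas must be identical after the run
    PRESERVE = "preserve"         # user changes are sacred; conflicts stay unresolved


def reconcile_file(base, a, b, policy, prefer="a"):
    """Decide the outcome for one file.

    `base` is the file's value at the last synchronization, `a` and `b` its
    current values on the two replicas (None = file absent). Returns a
    (new_a, new_b, conflict) triple.
    """
    a_changed = a != base
    b_changed = b != base
    if not a_changed and not b_changed:
        return a, b, False            # nothing to do
    if a_changed and not b_changed:
        return a, a, False            # propagate a -> b
    if b_changed and not a_changed:
        return b, b, False            # propagate b -> a
    if a == b:
        return a, b, False            # the same change was made on both sides
    # Both replicas changed the file in different ways: a conflict.
    if policy is Policy.CONSISTENCY:
        winner = a if prefer == "a" else b
        return winner, winner, True   # the losing change is overwritten
    return a, b, True                 # PRESERVE: leave both versions for the user
```

Only the last two branches depend on the policy: non-conflicting changes are propagated the same way under either design, which is why the choice can be offered as a configurable reconciler rather than a different synchronizer.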
csync is a user-level file synchronizer, implemented as a library, whose behavior is specified exactly. It uses filesystem metadata to detect changes, and it implements several reconciler algorithms so that users can decide whether their changes are treated as sacred or not.
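Metadata-based update detection can be sketched as follows. This is a minimal sketch under an assumption of our own: a file counts as unchanged if its modification time, size and inode number all match the values recorded in a previous snapshot. Which metadata csync actually compares is not stated here, and `snapshot` and `detect_updates` are hypothetical helpers.

```python
import os
import stat


def snapshot(root):
    """Walk a replica and record (mtime_ns, size, inode) for each regular file.

    Keys are paths relative to `root`. Symlinks and special files are
    skipped in this sketch.
    """
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)          # lstat: do not follow symlinks
            if stat.S_ISREG(st.st_mode):
                rel = os.path.relpath(path, root)
                state[rel] = (st.st_mtime_ns, st.st_size, st.st_ino)
    return state


def detect_updates(old, new):
    """Classify paths as "new", "removed" or "modified" by comparing two
    snapshots; unchanged paths are omitted from the result."""
    changes = {}
    for path in old.keys() | new.keys():
        if path not in old:
            changes[path] = "new"
        elif path not in new:
            changes[path] = "removed"
        elif old[path] != new[path]:
            changes[path] = "modified"
    return changes
```

Comparing metadata avoids reading file contents, which keeps update detection cheap on large replicas; the trade-off is that a change which leaves modification time, size and inode number all unchanged would go unnoticed.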
The synchronization task is divided into three conceptually separate phases: update detection, reconciliation and propagation.

Update Detection: walks over each replica and collects information about its files.
Reconciliation: decides what to do with each detected change.
Propagation: commits the changes to the replicas.
…
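The three phases can be tied together in a toy, in-memory synchronizer. This Python sketch rests on hypothetical simplifications: each replica is a dict mapping paths to contents (a missing key means the file does not exist), `base` holds the state recorded at the previous run, and the function names are illustrative rather than csync's. Conflicting paths are reported and left untouched, which corresponds to the design that treats the user's changes as sacred.

```python
def update_detection(replica, base):
    """Phase 1: report the paths whose state differs from the previous run."""
    return {p for p in replica.keys() | base.keys()
            if replica.get(p) != base.get(p)}


def reconciliation(changed_a, changed_b, a, b):
    """Phase 2: build a plan of actions; nothing is written here."""
    plan = []
    for p in sorted(changed_a | changed_b):
        if p in changed_a and p in changed_b:
            if a.get(p) != b.get(p):
                plan.append(("conflict", p))
            # otherwise the same change was made on both sides: nothing to do
        elif p in changed_a:
            plan.append(("a_to_b", p))
        else:
            plan.append(("b_to_a", p))
    return plan


def propagation(plan, a, b):
    """Phase 3: commit the planned changes and return the conflicting paths."""
    conflicts = []
    for action, p in plan:
        if action == "conflict":
            conflicts.append(p)     # left untouched for the user to resolve
            continue
        src, dst = (a, b) if action == "a_to_b" else (b, a)
        if p in src:
            dst[p] = src[p]         # created or modified on the source side
        else:
            dst.pop(p, None)        # deleted on the source side
    return conflicts


def synchronize(a, b, base):
    """Run the three phases, then record the new base state."""
    plan = reconciliation(update_detection(a, base),
                          update_detection(b, base), a, b)
    conflicts = propagation(plan, a, b)
    for p in set(a) | set(b) | set(base):
        if p in conflicts:
            continue                # keep the old base so the conflict is seen again
        if p in a:
            base[p] = a[p]
        else:
            base.pop(p, None)
    return conflicts
```

Keeping reconciliation free of side effects means the plan can be inspected, or shown to the user, before propagation commits anything.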
