Rsync synchronizes two files, bringing an old version of a file on the target up to date with a new version of a file on the source. Rsync
sends as few bytes as possible by detecting common sections of the two
versions and using the common data in the old version when building the new
version. To detect the common sections, the target generates a weak and
strong checksum for blocks in the target file. The target transfers the
checksums
to the source. On the source, rsync stores the weak checksums in a hash
table. The checksums take approximately 100 times less space and bandwidth
than the file. The source file is scanned, calculating the weak checksum
at each offset. Rsync probes the hash table with each checksum. Upon
finding a matching checksum, a strong checksum is computed and compared.
Matching
strong checksums indicate matching data with high probability.
Rsync encodes matched data as a COPY command.
The COPY
encodes an action on the target that duplicates data from the
reference file in the updated file.
The algorithm encodes unmatched data as an ADD command, which
includes the data to be added.
The target receives encodings from the
source. Encodings describe the new file sequentially (from first byte to
last) so that the target can reconstruct the new version in a single pass.
The use of a weak and strong checksum saves computation by allowing
the source to generate and compute a strong checksum only when the weak
checksum already matches. Also, the chosen weak checksum saves computation
by rolling from offset to offset
without recomputing the checksum
based on all
bytes [7].
Rolling checksums observe that strings of length
at offsets
and
differ in only two bytes - the first byte of the
string starting at
and the last byte of the string starting
at
. The algorithm computes a rolling checksum for offset
by subtracting the contribution of the first byte
of the checksum at offset
and adding the contribution of the
last byte of checksum at offset
.