This article describes a system that facilitates simple synchronization (sync) between databases. The system is designed for simple sync between a heterogeneous collection of databases in which it is difficult to implement a prepackaged replication solution. To add to the simplicity of the system, the sync system is stateless.
The sync system was originally designed for relatively low transaction slave databases and an aggregation master database. An example would be syncing user preferences, browsing history and bookmarks between computers in Google Chrome.
The system is designed to sync individual tables, rather than entire databases at a time. This allows simple, flexible control over what is synced.
The architecture required for this system is:
Each table in the database to be synced has a
SequenceId column of integer type and an
IsDeleted column of boolean type.
SequenceId is not a timestamp, but a greater SequenceId indicates a record has been modified more recently. It is analogous to the rowversion data type in SQL Server. Every time a row is Inserted, Updated or logically Deleted (via the
IsDeleted column), SequenceId is set to:
MAX(SequenceId) + 1
This allows the client to pick up all changes to a table the next time it requests a table sync by passing in the most recent
SequenceId it knows about.
IsDeleted column allows logical deletion of rows. This is a core requirement of the sync strategy. It allows us to sync Deletes which would otherwise be lost if we simply deleted the row. Logical Deletes provide a number of other benefits of their own accord:
For sending changes from slaves -> master
For pulling changes from master -> slave
The following query is essentially the core of the sync strategy. It is best encapsulated in a "Master" HTTP sync service for maximum flexibility and modularization:
SELECT * FROM [Table] WHERE SequenceId > [Current_Slave_SequenceId]
The slave client will receive a collection of rows in whatever format has been requested from the master HTTP sync service.
Because we don't know which of the rows in the new collection the slave already has, we have a few choices:
An example of how deletes could be dealt with on a client if storage capacity was a concern:
IsDeleted == 1, perform
DELETE FROM [Table] WHERE Id = [Deleted_Row_Id]
Because a service sits between the slaves and the master database, high traffic tables could be moved onto their own shard without changing the interface of the HTTP sync service. These changes would be transparent to slave clients.