Full Stack Development - by Ryan Kirkman

Simple Relational Database Sync

This article describes a system that facilitates simple synchronization (sync) between databases. The system is designed for simple sync between a heterogeneous collection of databases in which it is difficult to implement a prepackaged replication solution. To add to the simplicity of the system, the sync system is stateless.

The sync system was originally designed for relatively low transaction slave databases and an aggregation master database. An example would be syncing user preferences, browsing history and bookmarks between computers in Google Chrome.

Architecture

The system is designed to sync individual tables, rather than entire databases at a time. This allows simple, flexible control over what is synced.

The architecture required for this system is:

Common Table Structure

Each table in the database to be synced has a SequenceId column of integer type and an IsDeleted column of boolean type.

SequenceId is not a timestamp, but a greater SequenceId indicates a record has been modified more recently. It is analogous to the rowversion data type in SQL Server. Every time a row is Inserted, Updated or logically Deleted (via the IsDeleted column), SequenceId is set to:

MAX(SequenceId) + 1

This allows the client to pick up all changes to a table the next time it requests a table sync by passing in the most recent SequenceId it knows about.

The IsDeleted column allows logical deletion of rows. This is a core requirement of the sync strategy. It allows us to sync Deletes which would otherwise be lost if we simply deleted the row. Logical Deletes provide a number of other benefits of their own accord:

General Overview of the Sync Strategy

For sending changes from slaves -> master

For pulling changes from master -> slave

The following query is essentially the core of the sync strategy. It is best encapsulated in a "Master" HTTP sync service for maximum flexibility and modularization:

SELECT *
FROM [Table]
WHERE SequenceId > [Current_Slave_SequenceId]

Syncing Changes to Slave Clients

The slave client will receive a collection of rows in whatever format has been requested from the master HTTP sync service.

Because we don't know which of the rows in the new collection the slave already has, we have a few choices:

An example of how deletes could be dealt with on a client if storage capacity was a concern:

Potential Performance Optimizations

Because a service sits between the slaves and the master database, high traffic tables could be moved onto their own shard without changing the interface of the HTTP sync service. These changes would be transparent to slave clients.

comments powered by Disqus