I was recently asked by a client to set up an easy way to monitor Replication Latency in SQL Server. The client wanted to be notified whenever data took more than 10 minutes to reach its destination. SQL Server does provide some built-in methods to help with this, namely Tracer Tokens and Replication Monitor. However, it is widely known that these methods are not always accurate and come with performance overhead. A simpler and accurate approach is to use Canary tables, which act as a simple check to ensure changes on the Publisher are being replicated to the Subscriber in a reasonable amount of time.

The term “canary” is derived from the use of these birds in coal mines to detect toxic gases. Canaries have higher sensitivity to gases than humans, so their changes in behavior would alert the miners when gas levels became too high and unsafe. Applying this concept to Replication, the Canary tables act as an early warning system, focused on verifying that data is being copied efficiently between two servers. This process involves creating a table with a regularly updated timestamp, replicating that table, then checking the values in both the Publisher table and the Subscriber table to determine whether there is a lag. Follow the steps below to set up this simple and effective method.

1) Create a table in the Publisher database.

CREATE TABLE [dbo].[Canary_PubName]
  	([PubTime] DATETIME NOT NULL
  	PRIMARY KEY CLUSTERED
    	(PubTime] ASC ON [PRIMARY]

2) Create a SQL Agent job on the Publisher to update the table at regular intervals. The job should run every minute and execute the following statement.

UPDATE dbo.Canary_PubName
 SET PubTime = GETDATE()

3) Add the new Canary table article to your Publication. This will ensure the table exists on both the Publisher and the Subscriber, and the table’s contents are being Replicated. If you do not want to generate a new Snapshot at this time, follow these instructions to get the article added without generating a new Snapshot.

Read more on: Adding an Article to Replication Without a Full Snapshot

4) Create a SQL Agent job on the Subscriber that compares the value in the Canary_PubName table on the Subscriber with the current date and time. If Replication is working correctly, these values should always be within seconds of each other. Include in the job a command to send an email if these values are more than 10 minutes apart (or whatever time interval you choose). Schedule this job to run every minute.

DECLARE @pubTime DATETIME
    	,@minBehind INT
    	,@body NVARCHAR(300)
  	  ,@recipients NVARCHAR(200)

 SELECT @recipients = 'email1, email2'

 SELECT @pubTime = PubTime
 FROM [DatabaseName].[dbo].[Canary_PubName]

 SET @minBehind = DATEDIFF(mi,@pubTime,GETDATE())

 SELECT @body = 'Replication is behind in publication ' + 'PublicationName' + '  by ' + ISNULL(CONVERT(NVARCHAR(32),@minBehind),'10') + ' minutes.'

 IF (@minBehind > 10)
 BEGIN
 -- Email only every 5  min when failure happens to prevent excessive emails
  	IF (@minBehind % 10) = 0
  	BEGIN
     	EXEC msdb.dbo.sp_send_dbmail
     	@profile_name='ProfileName',
     	@body = @body,
     	@recipients = @recipients,
     	@subject = 'Replication over 10 min behind'
   	END
 END

Repeat these steps for every Publication in your environment. You do not need to create new SQL Agent Jobs for each Publication, you can just add a step for each publication to the existing jobs you just created above.

Replication can be a critical process in your environment, so it is important to know when data is not flowing properly. Using a simple monitoring method, such as Canary tables, will ensure you are notified early, thereby helping to prevent a possible large-scale issue.

For more topics, check out our other SQL Server blogs.

Contact us for any questions.