Merge Replication from SQL2014 publisher to SQL2008R2 subscriber

My company replaced an aging SQL2008R2 server with a new SQL2014 server. The server, both in its old and new forms, acted as both distributor and publisher in the replication of about 40 merge publications, 100-150 articles per publication. The publications are scheduled to synchronize nightly to a subscriber SQL2008R2 server (on our LAN) and their start times are staggered to reduce server load. All subscriptions are pull subscriptions.

We recently did an update that affected every record of one of the highest row count articles in each publication. In some publications this approaches hundreds of millions of rows. Typically this article synchronizes changes at a speed of about 4000 rows per second. When attempting to synchronize these large changes from our new server the speed of dropped as low as 25 rows per second, even when we ensured that only one synchronization was running at a time and there was low activity on the publisher and no other processing on the subscriber. However updates affecting lower volumes of records also continue to synchronize at about 4000 rows per second. As I've continued to test the cutoff point is between 20 million and 28 million records (20 million updates normally, and 28 million is slow).

Because we still have the old 2008R2 server on our network we have been able to test replicating large changes from that server as well, but we do not see the same issue. We have been able to verify that the 2014 server's publication and article settings exactly match those of the 2008R2 server it replaced. We have also checked to ensure that the Agent Profile settings for each replication agent are the same between old and new servers.

While we recognize that such a high update volume is not an ideal scenario for merge replication, we do not see the same behavior when replicating from 2008R2 to 2008R2 machines, only when replicating from 2014 to 2008R2. Therefore we are wondering if this issue should be reported as a bug? If that is the case, how would one go about doing that?




  • Edited by GSSDaring Monday, March 30, 2015 4:38 PM update problem detail in light of further testing
March 26th, 2015 6:13pm

Hi GSSDaring,

According to your description, it is slow to synchronize large changes from SQL Server 2014 to SQL Server 2008 R2. You can perform the following steps to enhance the replication performance.

1. Monitor Merge replication performance with Replication Monitor or other tools to check the bottleneck of such issue.

2. Set the minimum and maximum amount of memory allocated to new SQL Server 2014 Database Engine properly. And consider adding memory to servers used in replication.

3. Make changes of the following settings which affect Merge performance.
   -DownloadGenerationsPerBatch
   ParallelUploadDownload 
    -SrcThreads 
    -DestThreads
   -MakeGenerationInterval
   -DownloadGenerationsPerBatch
   -UploadGenerationsPerBatch

Reference:
SQL Server Merge Replication Best Practices
Enhance General Replication Performance



Thanks,
Lydia

Free Windows Admin Tool Kit Click here and download it now
March 27th, 2015 8:53am

You will find it faster to reinitialize for such a large volume of changes.
March 30th, 2015 2:11pm

Thank you for taking the time to comment, Lydia.

I have reviewed all of the suggestions in your comment and in the articles you listed. Although I can't guarantee I have the optimal settings for the Merge performance settings you enumerated, I can assure you that they are at the same values we used when replicating from the old 2008R2 server. The new 2014 server has 3x the number of cores, 4x the amount of memory assigned to the DB engine, and an equivalent I/O capability as the server it replaced. Meanwhile the subscriber server is unchanged and has 28GB of RAM devoted to the DB engine, which should be enough for the entire update many times over.

In monitoring the performance of merge synchronization I have noticed what seems to me to be a higher number of locks at the subscriber when replicating from the new server.

Thanks again,

Joel

Free Windows Admin Tool Kit Click here and download it now
March 30th, 2015 4:19pm

Thank you Hilary,

I've already come across your name maybe a dozen times in researching this issue, so I appreciate very much that you would take the time to write.

I very much agree that a reinitialize would be faster for this sort of change. We are updating our procedures for making large changes like this to account for that fact (although the fact that the table allows bidirectional changes makes reinitializations a bit trickier).

However, the fact remains that we are able to merge this type of change with our old 2008R2 publisher in a reasonable (although lengthy) amount of time, and with our newer, far more powerful 2014 server, we can't (we calculated that our largest table would update in just under 93 days versus the 14 hours it took from the old server).

I think anyone who's ever invested in new technology, only to discover a "gotcha" of this nature, can empathize with our frustration.

Thanks again though,

Joel

March 30th, 2015 4:33pm

In that case, while reinitialization might be a quick fix and might get you out of the problem, there is likely something else causing it.

Have you updated statistics, looked at waitstats to try to find out where SQL Server is spending all its time. The missing indexes DMV might make helpful indexing suggestions. I have not run into anything in SQL 2014 which might be causing this issue.

When I encounter these issues I normally see if I can't figure out what the problem articles are and see which ones are generating all the merge replication traffic. You might see that there is some trigger causing more than the normal amount of merge activity.

Free Windows Admin Tool Kit Click here and download it now
March 30th, 2015 4:40pm

Thanks Hilary,

The missing indexes DMV does suggest an index on [rowguid] with all the other columns in the table included. It seems like if the merge synchronization really needed that index you would expect the index on [rowguid] that is generated as part of the merge ([MSmerge_index_...]) to be designed so that it would fulfill that function. I will test with that index present to see if it solves the issue.

Through testing I've discovered that I can reproduce this behavior in a publication with only the one article. The published table has no triggers (other than those added by replication), contains only a primary key and a simple non-clustered index. Its only foreign key constraint is marked "not for replication" (it is a leaf-level child in its relationship). It has no partitions and uses no filters. Other than being merge replicated, it really couldn't be a more ordinary relational database table.

I've been testing with verbose logging, and although there are differences in the ordering of logged items, I don't see a profound difference in the actual processes being logged between the smaller/faster and larger/slower merges. One change that is noticeable is that there are there are a number of items indicating calls to sp_MSenumchangesdirect on the publisher that are executing very quickly (in milliseconds) at the start of the smaller/faster merge. Other than that, one is just vastly slower than the other.


March 31st, 2015 10:25pm

After working on a support incident with Microsoft over the course of a couple of weeks they identified that the Merge Publication's setting for 'generation_leveling_threshold' was zero. This setting was recommended at one point as a workaround for a since-resolved replication bug. Setting 'generation_leveling_threshold' to a non-zero value (we chose 100K given the fairly large row counts) allowed the SQL2014 server to merge synchronize at speeds equaling or exceeding our previous server.

Thank you for your comments and suggestions. I hope our solution will be of help to someone else experiencing this issue.


  • Marked as answer by GSSDaring 5 hours 31 minutes ago
  • Edited by GSSDaring 5 hours 30 minutes ago
Free Windows Admin Tool Kit Click here and download it now
April 23rd, 2015 9:37pm

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics