Database Replication Link State between Parent Site and Child Site shows "Link Failed"

I'm not sure what this means or how to fix it. In the console, under Monitoring\Database Replication, the link icon has a red X and the Link State says "Link Failed". Under Summary there are two entries. One says the Parent Site to Child Site state is "Link Failed" and the last Global Synchronization Time was 2 weeks ago. The other says the Child Site to Parent Site Global State is "Link Active", the Last Synchronization Time is the current date/time, and the percentage is 100%.

Is this something to worry about?

August 19th, 2015 11:51am

Yes this is something to worry about. Have you already used Replication Link Analyzer?
August 19th, 2015 12:07pm

How to fix it long term: if you have fewer than 150k clients, plan a future migration away from your "why did you have a CAS and a hierarchy to begin with" setup to a standalone primary site, so you don't have to deal with replication anymore.

Short term: the first thing to try is in your console. Under Monitoring, Database Replication, right-click the link that shows "Link Failed" and run "Replication Link Analyzer". It will likely find problems and try to fix them, usually through a reinit. All you can do is try it and hope.

Another thing to look at. This is purely looking; you can't affect anything by doing it. In SQL Server Management Studio, connect to both SQL servers, one on each side of the failed link.

On each CM_xxx database, open a New Query window, make sure the CM_xxx database is selected, and run:   Exec spdiagdrs
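If you would rather run this from a command prompt than from Management Studio, here is a rough sketch in Python that builds a sqlcmd call for it. It assumes the sqlcmd utility is installed and that your Windows account has rights on the site database. The function names and the server and database names are made up for illustration. Note that sqlcmd returns plain text rather than the separate result panes you get in Management Studio.

```python
import subprocess


def build_spdiagdrs_command(server, database):
    """Build a sqlcmd command line that runs spDiagDRS (read-only diagnostics).

    `server` and `database` are placeholders, for example the SQL Server
    hosting the CAS and its CM_xxx site database.
    """
    return [
        "sqlcmd",
        "-S", server,             # SQL Server instance to connect to
        "-d", database,           # site database (CM_xxx)
        "-E",                     # use Windows authentication
        "-Q", "EXEC spDiagDRS",   # run the query, then exit
    ]


def run_spdiagdrs(server, database):
    """Run the command and return whatever sqlcmd printed (hypothetical helper)."""
    result = subprocess.run(
        build_spdiagdrs_command(server, database),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

You would call something like run_spdiagdrs("CAS-SQL", "CM_CAS") once for each side of the link, swapping in your own server and database names.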

In the 4th or 5th pane of those results, you'll see *exactly* which replication group has failed, and the 'lastsynctime' for that group (on that server). In general, if the last sync time is within the last 30-60 minutes, for us that's fine, EVEN IF it says degraded (or even failed). The difference between degraded and failed is actually a setting. I forget the default, but if you have a huge obnoxious site with bad links, your sync times might run longer than the defaults, and even if replication is actually working, just delayed, it might trigger a degraded or failed status.
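To make that rule of thumb concrete, here is a minimal sketch in Python. The 60-minute cutoff is just the upper end of the "30-60 minutes" range mentioned above, not a product default, and the function name is invented for this example.

```python
from datetime import datetime, timedelta


def sync_is_recent(last_sync_time, now, max_age_minutes=60):
    """Return True if the group's last sync is within the tolerated window.

    max_age_minutes is an assumed personal threshold, not a ConfigMgr setting.
    """
    return now - last_sync_time <= timedelta(minutes=max_age_minutes)


# A group that synced half an hour ago is fine by this rule of thumb,
# while one that last synced two weeks ago is not.
now = datetime(2015, 8, 19, 12, 0)
recent = sync_is_recent(datetime(2015, 8, 19, 11, 30), now)
stale = sync_is_recent(datetime(2015, 8, 5, 12, 0), now)
```

A group that passes this check is probably just delayed, whatever the console label says. One that fails it, like the link last synced 2 weeks ago, is worth chasing.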

The other thing I look at to see whether there is an issue is the 3rd pane of the exec spdiagdrs results, "IncomingMessagesInQueue" and/or "OutgoingMessagesInQueue". On a busy site there will always be a count in there; it's rarely 0. But if it's 100,000 or more, AND there hasn't just been a hotfix release (in which case that might be normal), AND it keeps growing and not shrinking (just re-run exec spdiagdrs to watch that count), there might be some process in SQL that is blocking messages from being processed. If you aren't an awesome guru at diagnosing SQL, the easiest thing would be to simply reboot your servers. If whatever-it-was is some chronic issue that will just recur 15 minutes after the reboot, it won't help, but it also just might kick loose whatever was blocking replication. And if it doesn't help and you have to open a call with Microsoft, at least when their first suggestion is "have you tried rebooting" you can say yes.
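The "big and still growing" check can be written down as a small sketch in Python. It assumes you note the queue count by hand each time you re-run exec spdiagdrs and pass the readings in oldest first. The function name is hypothetical, and the 100,000 figure is the rough number from above, not an official limit.

```python
def queue_looks_stuck(samples, threshold=100_000):
    """Return True if the queue count is large and has grown on every re-run.

    samples: counts read from successive spdiagdrs runs, oldest first.
    threshold: assumed rule-of-thumb size at which a backlog is worrying.
    """
    if len(samples) < 2:
        return False  # one reading cannot show growth
    growing = all(later > earlier for earlier, later in zip(samples, samples[1:]))
    return growing and samples[-1] >= threshold
```

For example, readings of 90000, 120000, and 150000 would count as stuck. A backlog that is large but shrinking would not, and neither would a small queue that happens to be growing. This does not account for the hotfix case, so you still need to rule that out yourself.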

August 19th, 2015 12:20pm

Thank you for the info. I will give it a try and let you know if it works.

August 19th, 2015 3:02pm

I ran the Replication Link Analyzer from the CAS server. It says replication initialization is in progress for site (Primary) from site (CAS). It also says to check the rcmctrl.log file, which I have opened using the "Trace Log Tool". Let me know what I am supposed to see if this is working properly.

thanks

August 20th, 2015 1:15pm

Looks like it is working now after running the Replication Link Analyzer from the CAS server a few times. The first two runs stated that it had link errors; the third gave me all green checks. Not sure what happened, but the link status now reads "Link is active".

thanks Sherry for your assistance.

 
August 20th, 2015 4:45pm

Spoke too soon. Looks like the issue (Link has failed) came back. I ran the Replication Link Analyzer again from the Primary site and the message states: Replication link analyzer detected the link failed on site "CAS database".

I would like to run "Exec spdiagdrs" but don't know how; I am not a SQL guy. I have also rebooted both the CAS and Primary servers.

thanks

 
August 20th, 2015 5:30pm

Yes, I know this is an old post, but I'm trying to clean them up. Did you solve this problem? If so, what was the solution?

Since no one has answered this post, I recommend opening a support case with Microsoft Customer Support Services (CSS), as they can work with you to solve this problem.

August 29th, 2015 1:08pm

