Issue with SQL Server Always on Availability

Hi, I am hoping someone will be able to help me.

For the second time in as many weeks, the AlwaysOn Availability Groups (AOA) cluster has failed completely. Within 5 seconds, all the nodes lose contact with each other, the cluster loses quorum, and it shuts down. Seven to eight minutes later everything comes back up. Last week I am fairly sure the cause was one of our server admins doing work on the DNS servers that required a reboot, and there were event log entries that supported this. This week I cannot find any DNS-related issues, yet the error is exactly the same as last week:

Cluster node {same on all nodes} was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.

And:

The Cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk.

Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.

I have run the network section of the cluster validation wizard. It reported no issues other than warnings about HostRecordTTL and RegisterAllProvidersIP, two of the four adapters not being connected, and what looks to it like a single network adapter, which is in fact a teamed adapter.

Any help would be greatly appreciated, as I'm not sure whether this is a cluster/AOA configuration issue or a network/DNS issue.

Thanks,

Steven.

February 19th, 2014 9:49am

Steven,

This is not an issue with AlwaysOn Availability Groups. Your WSFC (Windows Server Failover Cluster) is failing and the cluster is shutting down, which takes your AGs offline. What WSFC configuration do you have? That is, how many nodes are there, which quorum model is configured, and is the WSFC configured on physical or virtual servers?
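
To illustrate why the node count and quorum model matter, here is a rough Python sketch of simple majority voting. It is only a simplified model: it assumes a static majority of all configured votes and ignores features such as dynamic quorum, and the function names are made up for this example rather than taken from any cluster API.

```python
def votes_needed(total_votes: int) -> int:
    """Return the smallest number of votes that is more than half of the total."""
    return total_votes // 2 + 1


def has_quorum(nodes_up: int, total_nodes: int,
               witness_configured: bool = False,
               witness_up: bool = False) -> bool:
    """Simplified static-majority check (assumption: one vote per node,
    plus one vote for the witness when a witness is configured)."""
    total_votes = total_nodes + (1 if witness_configured else 0)
    votes_present = nodes_up + (1 if witness_configured and witness_up else 0)
    return votes_present >= votes_needed(total_votes)


if __name__ == "__main__":
    # Two nodes, no witness: losing one node leaves 1 of 2 votes.
    print(has_quorum(nodes_up=1, total_nodes=2))
    # Two nodes plus a reachable witness: 2 of 3 votes remain.
    print(has_quorum(nodes_up=1, total_nodes=2,
                     witness_configured=True, witness_up=True))
```

Under this model, an even number of nodes with no witness cannot survive an even split, which is why the quorum configuration is worth checking alongside the network.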

For further investigation, I would get your server and network admins to look into the communications between the nodes, and definitely into DNS and the cluster's ability to communicate with your AD.
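
As a quick first check on the DNS side, something like the Python sketch below could be run from each node to see whether the other node names resolve. The host names in it are placeholders, not your real servers, and it only tests name resolution, not cluster heartbeat traffic or AD connectivity.

```python
import socket
from typing import Callable, Dict, Iterable, Optional


def check_resolution(hostnames: Iterable[str],
                     resolver: Callable[[str], str] = socket.gethostbyname
                     ) -> Dict[str, Optional[str]]:
    """Map each host name to its resolved IPv4 address, or None on failure."""
    results: Dict[str, Optional[str]] = {}
    for name in hostnames:
        try:
            results[name] = resolver(name)
        except OSError:
            # socket.gaierror (lookup failure) is a subclass of OSError.
            results[name] = None
    return results


if __name__ == "__main__":
    # Hypothetical node names; replace with the actual cluster node names.
    for host, address in check_resolution(["sqlnode1", "sqlnode2"]).items():
        print(host, "->", address if address else "FAILED to resolve")
```

Running it at intervals around the time of an outage may help show whether name resolution is failing when the nodes drop out.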

Thanks

February 19th, 2014 6:26pm

Hi Warick,

Thanks for your reply. It turns out that it was a network infrastructure problem, but there was also an issue with the WSFC quorum: the server admins hadn't set up the file share witness (FSW) as they had promised. Both are being resolved today.

Thanks,

Steven.

February 20th, 2014 6:03am

