Failed to failover SQL instance from one node to another
Hi,
I am experiencing an issue with SQL Server 2012 clustering. Briefly, when I move the SQL instance from node SQ1 to node SQ2, the process fails.
At the beginning of the process, the cluster tries to change the owner to SQ2, but it fails back automatically to SQ1 and the following errors are logged in the cluster event viewer:
Event ID:1205
The Cluster service failed to bring clustered role 'SQL Server (MSSQLSERVER)' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.
Event ID:1069
Cluster resource 'SQL Server' of type 'SQL Server' in clustered role 'SQL Server (MSSQLSERVER)' failed.
Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.
Any ideas that could help resolve this issue would be appreciated.
Thanks,
TF
-
Edited by
TarekF
20 hours 21 minutes ago
correction
September 12th, 2015 6:17am
Can you share the entries of the cluster log from the two nodes? Also, please look into the SQL Errorlog and see if anything is recorded there.
September 12th, 2015 8:10am
Here are the logs
https://www.dropbox.com/s/xy4mfs6zbjaaydt/clusterlogs.zip?dl=0
thanks for your help
September 12th, 2015 10:03am
Here is what I see in the cluster log on Node 2:
00000c84.0000101c::2015/09/12-09:18:30.982 ERR [RES] SQL Server <SQL Server>: [sqsrvres] Failed to start service with error 1069. Please try again
Error 1069 is normally a logon failure... Is it possible that the password of the service account was changed after installation? (Or maybe it was mistyped during setup?)
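In case it helps with going through the rest of the cluster logs, here is a minimal Python sketch that pulls the timestamp, level, and error code out of lines shaped like the one quoted above. The regular expression is only inferred from that single line, and the function names (`parse_cluster_log_line`, `find_errors`) are made up for illustration, so treat it as a starting point rather than a full cluster log parser.

```python
import re
from datetime import datetime

# Layout assumed from the one line quoted above:
#   <process id>.<thread id>::<yyyy/mm/dd-hh:mm:ss.mmm> <LEVEL> <message>
LINE_RE = re.compile(
    r"^(?P<pid>[0-9a-f]{8})\.(?P<tid>[0-9a-f]{8})::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<msg>.*)$"
)
# Picks up phrases such as "with error 1069" inside the message text.
ERROR_CODE_RE = re.compile(r"\berror (\d+)\b")


def parse_cluster_log_line(line):
    """Return a dict for a recognised log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    code = ERROR_CODE_RE.search(m["msg"])
    return {
        "timestamp": datetime.strptime(m["ts"], "%Y/%m/%d-%H:%M:%S.%f"),
        "level": m["level"],
        "message": m["msg"],
        "error_code": int(code.group(1)) if code else None,
    }


def find_errors(lines):
    """Keep only the parsed entries whose level is ERR."""
    parsed = (parse_cluster_log_line(line) for line in lines)
    return [entry for entry in parsed if entry and entry["level"] == "ERR"]
```

Feeding it the lines of each node's log file and printing the `error_code` values should make it quick to see whether 1069 is the only error reported around the failed move.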
September 12th, 2015 10:08am
No, that is not possible. The password is set to never expire, and the same user is used on Node1.
Also, the SQL service is running on Node2.
September 12th, 2015 10:21am
I would agree with PrinceLucifer
C:\Users\system>net helpmsg 1069
The service did not start due to a logon failure.
We are not saying the password has expired; it looks like it was changed OR entered incorrectly on Node2.
September 12th, 2015 10:34am
I tried to restart the SQL service on node 2 and it is working fine with the password already set.
If it had been entered incorrectly, I would not be able to restart the SQL/Agent services. Am I right?
I can confirm we haven't changed the SQL service password.
September 12th, 2015 11:28am
And does the failover work now or not?
September 12th, 2015 11:29am
No, it is not working.
I tried to fail over to node2; it fails back automatically to node1 and generates the errors with ID 1205 and 1069.
Any suggestions? This is really weird. It was working fine before.
September 12th, 2015 11:34am
Does the system start writing a SQL Errorlog? Or is there any other service in the failover cluster group? 1069 is a very clear error message... There is little wiggle room when this one pops up...
September 12th, 2015 11:53am
One more thing: 1069 should lead to two event log entries: one in the System Event log of the SQL Server and another one in the Security Event Log on the Domain Controller. Can you please check if you find any of these entries at 09:18:30 UTC today? The Domain controller was DC1.xchange.local.
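Since the time above is in UTC and Event Viewer normally shows local time, here is a small Python sketch for working out which local time range to look at. The function name `search_window`, the one-minute slack, and the UTC+2 offset in the usage lines are assumptions for illustration only; substitute the real offset of the server.

```python
from datetime import datetime, timedelta, timezone


def search_window(utc_text, local_offset_hours, slack_seconds=60):
    """Convert a cluster log timestamp (UTC) into a local-time search window."""
    utc = datetime.strptime(utc_text, "%Y/%m/%d-%H:%M:%S.%f")
    utc = utc.replace(tzinfo=timezone.utc)
    # Shift into the server's local zone, given as a fixed offset from UTC.
    local = utc.astimezone(timezone(timedelta(hours=local_offset_hours)))
    slack = timedelta(seconds=slack_seconds)
    return local - slack, local + slack


# Hypothetical offset of UTC+2; replace with the server's actual offset.
start, end = search_window("2015/09/12-09:18:30.982", 2)
print(start.strftime("%H:%M:%S"), "to", end.strftime("%H:%M:%S"))
```

Any logon failure recorded in the System log on the SQL node or in the Security log on the Domain Controller should fall inside that range.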
September 12th, 2015 1:40pm