Hi All,
Failover in my Hyper-V cluster doesn't work when one specific NIC (the Office network, on which the DC is reachable) is disconnected.
My cluster:
- 3 nodes running Hyper-V 2012 R2 Core (Dell R630)
- Direct-attached shared storage connected to all 3 nodes through MPIO (Dell MD3200)
- 5 networks:
  - Office network (this is a 192.168.20.x network; via this network, the nodes are joined to the domain)
  - Factory 1 network (this is a 192.168.0.x network, no DC, no gateway, no DNS. Factory PLCs use this network)
  - Factory 2 network (this is a 192.168.1.x network, again no DC, no gateway, no DNS, also used for PLCs)
  - Migration network (this is a 10.0.0.x network, dedicated switch, no DC, no gateway, no DNS, only used for migration)
  - Cluster Heartbeat network (this is a 172.16.0.x network, dedicated switch, no DC, no gateway, no DNS, used for the cluster)
The cluster is up, running, and validated. When testing failover, a couple of scenarios work all right:
- Killing power to one of the nodes: the cluster senses this and restarts the VMs on a different node.
- Removing all network connections from one node: the VMs are restarted on a different node.
- A couple of VMs use 2 networks (Factory 2 and Office). When the Factory 2 network is disconnected, the cluster senses this and the VMs are live migrated to a different node that still has both network connections alive.
So far so good, but if I:
- Disconnect the Office network on a node that hosts the VMs that use 2 networks (Factory 2 and Office)
I can see that the cluster has sensed this and wants to live migrate the VMs to a different host.
After a few seconds the VMs have the status 'Migration Queued', instead of migrating directly.
After a while, an error reports that the VMs were not migrated; they are still running on the original node, which no longer has all of its network connections.
After a lot of digging through the logs, I found this: No authority could be contacted for authentication 0x80090311
I understand why this happens: that node no longer has a connection to the Office network, on which the DC resides, so it can no longer authenticate through Kerberos constrained delegation.
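To make my reading of the failure concrete, here is a small toy model in Python. It is only a sketch of the reasoning, not a Hyper-V or cluster API: the network names come from my setup above, while the function names and the rule "live migration needs the DC to be reachable from the source node" are my own assumptions based on the 0x80090311 error.

```python
# Toy model of the observed behaviour; nothing here talks to a real cluster.
# Network names are taken from the setup described in this post.
ALL_NETWORKS = frozenset(
    {"Office", "Factory 1", "Factory 2", "Migration", "Cluster Heartbeat"}
)
DC_NETWORK = "Office"            # the only network on which the DC is reachable
MIGRATION_NETWORK = "Migration"  # dedicated live migration network


def can_reach_dc(connected):
    """A node can authenticate only while it is still on the DC's network."""
    return DC_NETWORK in connected


def can_live_migrate(connected):
    """Assumption: live migration needs the migration network AND the DC."""
    return MIGRATION_NETWORK in connected and can_reach_dc(connected)


if __name__ == "__main__":
    for lost in ("Factory 2", "Office"):
        connected = ALL_NETWORKS - {lost}
        print(f"{lost} disconnected -> live migration possible: "
              f"{can_live_migrate(connected)}")
```

Under this assumption, losing Factory 2 still allows live migration, while losing Office blocks it even though the Migration network itself is fine, which matches what I see.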
What I don't get is how to resolve this issue.
How can I have all the nodes in the cluster trust each other and accept live migration even when the DC cannot be reached at all?
Looking forward to your responses!
BR,
Mark