Network Drops for 30 seconds During Hyper-V Live Migration

I have 3 physical Hyper-V hosts set up with clustered storage. I disabled VMQ because I was getting errors when trying to do live migrations. I have also run the network portion of the cluster validation tests without errors. When I do a live migration from any host to any other host, I lose network connectivity to every VM running on those hosts. During this time a SQL application that is running locks up and freezes for all the users. Many have to use Task Manager to kill the application to get back in, or even reboot their machines to free it up.

I have been doing a ton of reading on network settings and configurations and have made no progress. Any help to point me in a direction to get this solved will be appreciated. I need to be able to do Live Migrations on my cluster storage.

Thanks for any help.


  • Edited by BrianEhMVP Monday, May 18, 2015 3:51 PM clarity
  • Moved by BrianEhMVP Monday, May 18, 2015 3:51 PM
May 18th, 2015 2:25pm

There has always been a momentary drop in networking at the tail end of a Live Migration.

It has frequently been demonstrated to be 3 to 5 pings lost.

Now, having supported a SQL client/server application in the past, I know there are some applications that lack reconnection logic if the connection to the SQL Server is interrupted in any way whatsoever. (I had the pleasure of managing one.)

I am guessing that this is the root of your issue. My recommendation: don't migrate during business hours.
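For anyone who maintains such a client, the missing reconnection logic usually amounts to a retry loop with a growing delay. The following is only a minimal Python sketch of that idea, not the behavior of any application in this thread: `connect_with_retry` and the `connect` callable are hypothetical names, and the attempt count and delays are assumed values.

```python
import time


def connect_with_retry(connect, attempts=5, base_delay=1.0, max_delay=30.0,
                       sleep=time.sleep):
    """Call connect() until it succeeds or the attempts run out.

    connect is a hypothetical zero-argument callable that opens the
    database session and raises OSError (or a subclass such as
    ConnectionError) when the network is unavailable.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except OSError:
            if attempt == attempts:
                raise  # give up and surface the last error to the caller
            sleep(delay)
            # Double the wait each time, capped at max_delay.
            delay = min(delay * 2, max_delay)
```

With the assumed defaults, the waits between attempts add up to 15 seconds, so a client would ride out a drop of a few pings but would need more attempts or a larger delay cap to survive a 30 to 45 second outage.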

May 18th, 2015 3:19pm

Yes, I agree that the momentary loss is expected during the migration, and I tracked that down to the step where the host hands off the NIC and MAC to the new host. But this isn't a 5 ping loss. This is a 30-45 second disconnect: all RDP sessions drop into retry mode, all VNC sessions disconnect, and access to file shares is interrupted. This is a substantial loss of network connectivity, far more than the normal 5 ping loss I would expect to see.
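To put a number on a drop like this rather than counting pings by eye, a small probe loop can timestamp each connection attempt and report the longest gap. This is a minimal Python sketch and not a tool used in this thread: the function names are hypothetical, the host and port in the usage note are placeholders, and a TCP connect is assumed to be an acceptable stand-in for an ICMP ping.

```python
import socket
import time


def probe(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def longest_outage(samples):
    """Return the longest outage in seconds.

    samples is a time-ordered list of (timestamp_seconds, reachable).
    An outage runs from the first failed probe to the next successful
    one; an outage still open at the end of the list is not counted.
    """
    longest = 0.0
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            longest = max(longest, ts - start)
            start = None
    return longest


def monitor(host, port, duration=120.0, interval=1.0):
    """Probe host:port every interval seconds and return the longest outage."""
    samples = []
    end = time.monotonic() + duration
    while time.monotonic() < end:
        samples.append((time.monotonic(), probe(host, port)))
        time.sleep(interval)
    return longest_outage(samples)
```

Running something like `monitor("sqlvm01", 3389)` (hypothetical VM name, RDP port) from a client while a migration is in progress would show whether the gap is a few seconds or the 30-45 seconds described above.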
May 18th, 2015 3:37pm

Thanks, that adds a bit more clarity to your issue.

May 18th, 2015 3:51pm

I assume it's some sort of cluster or network setting. If I live migrate a machine from HostA to HostB while my SQL server is on HostC, and I lose network connectivity on HostC, which is not involved in the migration, I can only assume it's some system-wide setting.
For clarity, I am from the VMware world and this is the first Hyper-V cluster I have been responsible for, so I got dropped in on these existing issues and am trying to figure them out.

May 18th, 2015 5:11pm

Hi,

Update your drivers to the latest version. Is your network fast enough? When you copy a file from one node to another, is there also a freeze?

And did you configure the live migration network? Right-click Networks, open Live Migration Settings, and tell the cluster which network to use. If you do all of this on the same NIC, your config is wrong.

Below is a live migration on a dedicated SMB3 network (2x 10Gb). As you can see, I moved 4 VMs, otherwise I could not have taken the screenshot ;-)

Hope this helps

May 19th, 2015 4:44pm

Thank you for the reply. I tested a file copy with a 2.2GB file. I copied from A to B and C simultaneously, then from B to A and C, then from C to A and B. No matter which combination I tried, I never lost a single ping on any host or virtual machine running in the cluster.

I went into the Live Migration settings and there are 3 networks selected for live migration: the Live Migration Network, the Cluster Shared Volumes Network, and the Management Network. None of the iSCSI networks are selected.

Anything else I should look at?

Thanks.

May 20th, 2015 1:55pm

Hi,

You have a dedicated live migration network, but you are also using the management NIC?

You should change that. LM should go only through the LM NIC.

And during the copy you were using the management NIC, right?

And the LM/CSV NIC should have only an IP, with no DNS or gateway.

Test this and watch the load on the NIC like I showed you.

May 20th, 2015 4:00pm

Forgive me for sounding ignorant, but should I modify the LM settings so it only uses the LM network, then test a migration and watch the load? How do I get to the screen that you posted above?

When I go onto a host and open the network properties of the LM adapter, it only has an IP and mask, with no default gateway and no DNS.

Yes, the copies I tested went through the Mgmt adapter, which has IP, mask, default gateway, and DNS all configured.

May 20th, 2015 5:54pm

I enabled only the LM network for Live Migrations and verified the IP setup. I tried another migration and it crashed the SQL box again. Also when the migration completed it said: "There currently are no network adapters with network optimization available on host pc.domain.com."
  • Edited by JaxIsland Thursday, May 21, 2015 1:38 PM
May 21st, 2015 12:53pm

Is the SQL Server (IP) on the management network only, or does it have multiple IPs?

But you are sure you can run the VM on the other node?

May 21st, 2015 4:32pm

The SQL server has a single network adapter and it is on the management network only.

Yes, the VM runs without issue after it is moved. We have moved it like this for a couple of years, just scheduling downtime, moving it, and then waiting for the applications to come back. But it's not a matter of moving the SQL VM. It's moving other VMs to or from the host that locks up the machines.

I did another test migration today after upgrading the firmware on the switch that connects to the servers. I moved a VM from HostC to HostB. SQL is on HostB. During the migration I lost network connectivity to all servers on the destination host (HostB). As soon as the migration completed, access was restored. I watched Task Manager and the LM network was pegged at 1Gbps.
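As a rough back-of-the-envelope check on how long a 1Gbps link stays pegged, the time to copy a VM's memory once is just its size in bits divided by the link rate. The Python sketch below makes that arithmetic explicit. The 16 GiB memory size is a hypothetical example and not a figure from my environment, `transfer_seconds` is a made-up helper name, and the estimate ignores re-copying of dirtied pages, compression, and protocol overhead.

```python
def transfer_seconds(memory_gib, link_gbps, efficiency=1.0):
    """Estimate seconds to copy memory_gib of RAM once over a link.

    memory_gib: VM memory in GiB (2**30 bytes each).
    link_gbps:  link speed in gigabits per second (10**9 bits/s).
    efficiency: assumed usable fraction of the link, between 0 and 1.
    """
    bits = memory_gib * (2 ** 30) * 8
    return bits / (link_gbps * 1e9 * efficiency)


# Hypothetical 16 GiB VM: roughly 137 seconds at 1Gbps, a tenth of that at 10Gbps.
one_gig = transfer_seconds(16, 1.0)
ten_gig = transfer_seconds(16, 10.0)
print(f"1Gbps: {one_gig:.1f}s, 10Gbps: {ten_gig:.1f}s")
```

So for a VM of that assumed size, the link would be saturated for a couple of minutes, which is the whole window in which I lose connectivity to the VMs on HostB.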

May 21st, 2015 6:19pm

Hi,

I think the problem is that you are hitting the limits of your server and configuration, as the live migration is pushing the server to the limit. You could limit the LM.

Hyper-V: Live Migration Network Configuration Guide

https://technet.microsoft.com/en-us/library/ff428137(v=ws.10).aspx

 

May 22nd, 2015 12:22pm

That document is for 2008 R2 and my environment is 2012 R2. Is there a document compatible with 2012 R2?

Thank you.

May 22nd, 2015 1:10pm

Hi JaxIsland,

Based on my experience, 2012 R2 still uses the same live migration requirements, so you can check your network settings against that guide. Second, please install the following hotfixes to narrow down the problem area.

Recommended hotfixes and updates for Windows Server 2012 R2-based failover clusters

https://support.microsoft.com/en-us/kb/2920151

I'm glad to be of help to you!

May 24th, 2015 3:02am

This topic is archived. No further replies will be accepted.
