"Update the Hyper-V Network Virtualization Policy" system job gets stuck

Hi,

I have a problem with Hyper-V Network Virtualization: the SCVMM system job
with the name "Update the Hyper-V Network Virtualization Policy" sometimes
keeps running forever. Some of the VMs then lose their network connection
until I restart the SCVMM service. The service restart causes the stuck
system job to fail, and afterwards the update of the network virtualization
policy works again.
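
For anyone who wants to check for this automatically, below is a rough sketch of the workaround I use. It assumes it runs directly on the VMM management server with admin rights, that the VMM PowerShell module (Get-SCJob) is available there, and that the service is named SCVMMService; the 30-minute threshold is just my own guess, not anything official.

import subprocess

# Rough sketch: run on the VMM management server with admin rights.
# Assumes the VMM PowerShell module (Get-SCJob) is installed and that the
# VMM service is named "SCVMMService" -- adjust if yours differs.

JOB_NAME = "Update the Hyper-V Network Virtualization Policy"
MAX_MINUTES = 30  # arbitrary threshold for calling the job "stuck"

def run_ps(command: str) -> str:
    """Run a PowerShell command and return its stdout as text."""
    result = subprocess.run(
        ["powershell.exe", "-NoProfile", "-Command", command],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

# How many minutes the newest matching job has been running (empty if none).
check = (
    "Import-Module virtualmachinemanager; "
    "Get-SCVMMServer -ComputerName localhost | Out-Null; "
    f"$j = Get-SCJob | Where-Object {{ $_.Name -eq '{JOB_NAME}' -and $_.Status -eq 'Running' }} "
    "| Sort-Object StartTime | Select-Object -Last 1; "
    "if ($j) { [int]((Get-Date) - $j.StartTime).TotalMinutes }"
)

minutes = run_ps(check)
if minutes and int(minutes) > MAX_MINUTES:
    print(f"Policy job running for {minutes} minutes, restarting the VMM service...")
    run_ps("Restart-Service SCVMMService -Force")
else:
    print("No stuck policy job found.")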

Has anyone else seen the same problem? What could be the reason for this?

Thx




July 15th, 2014 10:51am

Are you using NVGRE in your environment? Is it configured correctly?

Mostly this is related to issues with the virtualization gateways/hosts.

Can you verify your configuration with this whitepaper that we created? http://gallery.technet.microsoft.com/Hybrid-Cloud-with-NVGRE-aa6e1e9a

-kn

July 17th, 2014 9:25pm

Yes, I'm using NVGRE and it works correctly most of the time. But from time to time the system job in SCVMM gets stuck, and then the VMs that use NVGRE sometimes lose their connection to the internet.

But I have an assumption now. I have to run the SCVMM management server and the HNV gateways on the same host cluster (a 2-node cluster). When I run SCVMM on one node and the HNV gateway on the other node, it works. But I think that if both run on the same host, the system job gets stuck after a while. Is this possible? Is this maybe a known issue?

I know that the best practice is to use a separate cluster only for HNV gateways, but because of the size of the environment I have to run the gateways on the cluster that also runs all the other management VMs.
I will also verify my configuration against your whitepaper to make sure everything is configured correctly.

July 18th, 2014 10:13am

Yes, this is not best practice. Where do you run the VMs that use NVGRE? Hopefully they are living on another host/cluster.

Note that I have never tried the scenario you describe here, but NVGRE as implemented by VMM is mainly policy-driven management of the hosts and guests that are participating in this game.

If you experience these errors, it is likely that something is wrong with the provider addresses/policies in your environment.

Is the connection string to the virtualization gateway (network service) correct, or have you pinned it to a specific host? If so, you will see similar errors when the gateway VM is moved; a virtualization gateway should never be live migrated.
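
If you want to double-check what VMM has stored, something like the sketch below lists the network services and their connection strings. It is only a sketch: I'm assuming it runs on the VMM server and that the Get-SCNetworkService cmdlet and its ConnectionString property behave as in VMM 2012 R2; verify this against your own environment.

import subprocess

# Sketch: list the network services VMM knows about together with their
# connection strings, to check whether the gateway entry points at the
# cluster name or is pinned to a single host.
# Assumption: runs on the VMM server with the VMM 2012 R2 module installed.

command = (
    "Import-Module virtualmachinemanager; "
    "Get-SCVMMServer -ComputerName localhost | Out-Null; "
    "Get-SCNetworkService | Select-Object Name, ConnectionString | Format-List"
)

result = subprocess.run(
    ["powershell.exe", "-NoProfile", "-Command", command],
    capture_output=True, text=True, check=True,
)
print(result.stdout)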

-kn

July 18th, 2014 8:20pm

The VMs using NVGRE are running on another cluster. Only the gateways and the VMs without NVGRE are running on the same cluster.
The connection string points to the name of the host cluster and the guest cluster of the gateways. The gateway VMs were never live migrated.
But I think I have now found the reason for the sporadic disconnects of the VMs with NVGRE. My gateway VMs had two default gateways: one on the management NIC and one on the frontend NIC. I configured the metrics as you described on your blog, but sometimes the gateway still tried to connect through the management NIC. So I have now removed the default gateway on the management NIC completely. Since then the random disconnects seem to be gone, and I haven't seen the stuck system job anymore either.
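
In case someone runs into the same thing, this is roughly how one can check the default routes inside the gateway VM and drop the one on the management NIC. It is only a sketch: the interface alias "Management" is a placeholder for whatever your management NIC is called, and it relies on the built-in Get-NetRoute/Remove-NetRoute cmdlets of Server 2012 R2 with admin rights.

import subprocess

# Sketch: inside the gateway VM, show every default route (0.0.0.0/0) with
# the NIC it belongs to, then optionally remove the one on the management
# NIC. "Management" is a placeholder interface alias; needs admin rights.

MGMT_NIC = "Management"  # replace with the real alias of the management NIC

def run_ps(command: str) -> str:
    result = subprocess.run(
        ["powershell.exe", "-NoProfile", "-Command", command],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# 1. List all default routes and the interfaces they sit on.
print(run_ps(
    "Get-NetRoute -DestinationPrefix 0.0.0.0/0 | "
    "Select-Object InterfaceAlias, NextHop, RouteMetric | Format-Table"
))

# 2. Remove the default route on the management NIC (uncomment to apply).
# run_ps(
#     f"Remove-NetRoute -DestinationPrefix 0.0.0.0/0 "
#     f"-InterfaceAlias '{MGMT_NIC}' -Confirm:$false"
# )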

I hope the issue is solved now, but I will still keep an eye on it for a while...
Thanks for your tips and hints so far!

July 21st, 2014 9:51am

