Poor IO on fixed VHDX disks 2012 R2

I have a seven-node 2012 R2 Hyper-V failover cluster managed by SCVMM.  The backend storage is a NetApp 2040-2 in cluster-mode, and I'm using SMB 3.0 as my storage protocol.  Each host has four 1 Gb NICs teamed with LACP using the TransportPorts load-balancing mode.

When I first set up the cluster, I did some testing from the guests and found that I could write to the storage at or near the speed of the network, 4 Gb/s (512 MB/s).  This past week, applications started performing slowly, and I found that I could now write to the storage at only about 30 MB/s.

I checked the storage and the network and found no problems or bottlenecks.

Finally, I created a VM directly on one of the Hyper-V hosts; it has not been added as a resource to the failover cluster and was not created via SCVMM.  This VM can write to the same storage (same volume, share, spindles, etc.) at almost 4 Gb/s.  I can run the test simultaneously with any VM on any of my seven Hyper-V hosts: the non-clustered VM keeps its excellent I/O performance, while the VMs inside the cluster continue to have poor I/O performance.
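In case it helps anyone reproduce the measurement: a rough guest-side sequential-write test can be done with a few lines of PowerShell like the sketch below. The T:\ path and the 4 GB size are placeholders, not my exact test setup; Iometer or diskspd will give more rigorous numbers.

    # Rough sequential-write throughput check from inside a guest.
    $path   = 'T:\iotest.bin'    # placeholder path on the volume under test
    $bufMB  = 4                  # write in 4 MB chunks
    $sizeMB = 4096               # write 4 GB total to get past caching
    $buf    = New-Object byte[] ($bufMB * 1MB)

    $fs = [System.IO.File]::Create($path)
    $sw = [System.Diagnostics.Stopwatch]::StartNew()
    for ($written = 0; $written -lt $sizeMB; $written += $bufMB) {
        $fs.Write($buf, 0, $buf.Length)
    }
    $fs.Flush($true)             # flush through to the physical disk
    $sw.Stop()
    $fs.Dispose()

    "{0:N0} MB/s" -f ($sizeMB / $sw.Elapsed.TotalSeconds)
    Remove-Item $path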

I have run packet traces to try to determine whether anything could be wrong in the conversation between Hyper-V and the storage, but found no issues.


February 22nd, 2014 1:30am

I've narrowed this down to a performance problem with fixed VHDX disks.

To remove network and storage from the equation, I created a VM on local disk on one of my cluster nodes.

I added two disks to that VM, both on local storage:

1. A fixed VHDX

2. A dynamic VHDX

Writing to the dynamic disk, I consistently get ~630 MB/s, and on the Hyper-V host the disk queue length stays at 0.1.

Writing to the fixed disk, I consistently get ~180 MB/s, and on the Hyper-V host the disk queue length climbs to 100.
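For anyone repeating this comparison, the two test disks can be created and attached with the standard Hyper-V cmdlets; the D:\IOTest paths and the "IOTest" VM name below are placeholders:

    # Create one fixed and one dynamic VHDX on local storage.
    New-VHD -Path 'D:\IOTest\fixed.vhdx'   -SizeBytes 50GB -Fixed
    New-VHD -Path 'D:\IOTest\dynamic.vhdx' -SizeBytes 50GB -Dynamic

    # Attach both to the test VM on its SCSI controller.
    Add-VMHardDiskDrive -VMName 'IOTest' -ControllerType SCSI -Path 'D:\IOTest\fixed.vhdx'
    Add-VMHardDiskDrive -VMName 'IOTest' -ControllerType SCSI -Path 'D:\IOTest\dynamic.vhdx'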

I'm wondering whether a recent update caused this, since the problem only began for us recently.

February 23rd, 2014 4:27pm

Hi Interloper,

For troubleshooting, please try creating one fixed and one dynamic VHDX file and attaching them to the host (on the same volume), then copy files from the host to the VHDs.

If that performs normally, please try allocating a separate IDE or SCSI controller for each disk type in the VM settings, then re-test the file-copy speed.
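Something like this covers both steps, assuming the two VHDX files already exist (paths, the E: drive letter, and the "IOTest" VM name are placeholders):

    # Step 1: mount a VHDX on the host itself and time a large copy.
    Mount-VHD -Path 'D:\IOTest\fixed.vhdx'
    # (initialize and format the new disk once, e.g. via Disk Management)
    Measure-Command { Copy-Item 'D:\bigfile.iso' 'E:\' }   # E: = mounted VHDX
    Dismount-VHD -Path 'D:\IOTest\fixed.vhdx'

    # Step 2: give each disk type its own SCSI controller in the VM.
    Add-VMScsiController -VMName 'IOTest'
    Add-VMHardDiskDrive -VMName 'IOTest' -ControllerType SCSI `
        -ControllerNumber 1 -Path 'D:\IOTest\dynamic.vhdx'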

If there is any further information, please feel free to let us know.

Best Regards

El

February 26th, 2014 8:39am

Did you ever end up having any luck with this, Interloper?

I'm having the exact same issues with fixed-size VHDX files in a 2012 R2 cluster, and I can't seem to find much info on the net about it.

March 24th, 2014 9:38pm

I didn't have any luck.  I did some more testing, using dd on Linux (RHEL 6.4, 64-bit) and Iometer on Windows.  Because the testing methods differ, disregard the absolute throughput numbers and instead look at the variance between the disk types and buses.  I ran each test three times to make sure nothing else was interfering.  These tests were run on VMs on local disk, on a host doing nothing else but this testing:

OS       Version    Test tool  Disk type  Emulation  Write MB/s  Read MB/s
Linux    RHEL 6.4   dd         Fixed      SCSI       180         270
Linux    RHEL 6.4   dd         Fixed      IDE        160         236
Linux    RHEL 6.4   dd         Dynamic    SCSI       600         370
Linux    RHEL 6.4   dd         Dynamic    IDE        600         370
Windows  2008 R2    Iometer    Fixed      SCSI       22.83       11.46
Windows  2008 R2    Iometer    Fixed      IDE        6.17        6.27
Windows  2008 R2    Iometer    Dynamic    SCSI       3.92        3.89
Windows  2008 R2    Iometer    Dynamic    IDE        3.30        3.28

Everything I'm finding says we should see only about an 8% difference between fixed and dynamic.  These numbers are far outside that range, and it's interesting that under Linux the dynamic disk is MUCH faster.


March 26th, 2014 1:02pm

Thanks for the reply, mate. Have you disabled ODX (Offloaded Data Transfer), assuming your storage doesn't support it? Apparently it is enabled by default in 2012 R2 even if your storage doesn't support it.

I have also found with my drives here that the bigger the drive and the more data on it, the worse the performance becomes.
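For reference, ODX can be checked and disabled on the host through the documented FilterSupportedFeaturesMode registry value (0 = ODX enabled, 1 = disabled); run this on each node and then re-test:

    # Check the current ODX setting on this host.
    Get-ItemProperty 'HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem' |
        Select-Object FilterSupportedFeaturesMode

    # Disable ODX, then re-run the disk tests.
    Set-ItemProperty 'HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem' `
        -Name 'FilterSupportedFeaturesMode' -Value 1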

March 26th, 2014 8:42pm

I'm seeing something similar.  I've just converted some dynamic VHDXs to fixed, and the storage traffic is now being redirected to the co-ordinator node (the one that owns the CSV where the VHDXs reside).

This means that the storage traffic goes over the cluster network, to that co-ordinator node, and then to the SAN, rather than directly to the SAN from the node running the VM in question :-(

Not ideal! Particularly in our case, as that cluster network is 1 Gbps compared with the 10 Gbps iSCSI network.
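If anyone wants to confirm the same thing on their cluster, redirected access shows up directly in the CSV state (FailoverClusters module; StateInfo should read Direct on every node):

    # Show the I/O mode of every CSV, queried from any cluster node.
    Get-ClusterSharedVolumeState |
        Select-Object Name, Node, StateInfo, FileSystemRedirectedIOReason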

July 14th, 2014 8:18pm

Anyone had any luck resolving this? I'm having issues with IBM XIV Gen 2 SANs that are Fibre Channel attached. The host can do 200 MB/s+, but the VMs are only hitting 5 MB/s max...
July 9th, 2015 11:02am
