Dedupe CSV for VDI

Hello,

I have a 2-node Win 2012 R2 Hyper-V RDS Cluster hosting Pooled Virtual Desktops. The Cluster Shared Volume is presented directly from the EMC Fibre SAN to the Hyper-V hosts. I want to enable Deduplication on the CSV. All of the documentation suggests not doing this directly on the Hyper-V hosts, but creating a SOFS Cluster that presents the deduped disk to the Hyper-V hosts. This will add a level of complexity to the environment that I would like to avoid. Is the only reason for this possible performance issues? Can I enable dedupe directly on the Hyper-V hosts and the attached CSV volume using the PowerShell command Enable-DedupVolume C:\ClusterStorage\Volume1 -UsageType HyperV? What are the ramifications?

Thank you.

July 31st, 2015 12:33pm

Is the only reason for this possible performance issues?

- Not possible but real performance issues, and they will affect your RDS deployment. The dedup process needs a lot of memory and CPU. You can tune it to use fewer resources, but then deduplication throughput will be lower than expected.
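
As a rough sketch of what tuning it down can look like, a manual optimization job can be capped on memory and told to back off when the host is busy. The volume path is the one from the question, and the 25 percent cap and Low priority are illustrative assumptions, not recommended values.

```powershell
# Assumption: volume path taken from the question; adjust to your CSV.
$volume = 'C:\ClusterStorage\Volume1'

# Start an optimization job limited to 25% of memory, at low priority,
# that stops when the system is busy (illustrative values only).
Start-DedupJob -Volume $volume -Type Optimization -Memory 25 -Priority Low -StopWhenSystemBusy

# List running or queued dedup jobs for that volume.
Get-DedupJob -Volume $volume
```

The trade-off is the one described above: the tighter the limits, the longer the job takes to work through the volume.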

Can I enable dedupe directly on the Hyper-V hosts and the attached CSV volume using the PowerShell command Enable-DedupVolume C:\ClusterStorage\Volume1 -UsageType HyperV?

- Sure you can, but as you mentioned, it is not recommended.
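
For reference, a sketch of that command with the parameter names spelled out (note the dash in front of UsageType). It assumes the Data Deduplication feature is already installed on the node and that the path is the CSV mount point from the question.

```powershell
# Assumption: CSV mount point from the question.
$volume = 'C:\ClusterStorage\Volume1'

# Enable deduplication with the Hyper-V (VDI) usage profile.
Enable-DedupVolume -Volume $volume -UsageType HyperV

# Confirm the setting took effect.
Get-DedupVolume -Volume $volume | Format-List Volume, Enabled, UsageType
```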

Make sure that when you talk about creating a SOFS cluster, you don't mean creating it on the same hosts that run the Hyper-V cluster role. That is neither recommended nor supported. In fact, the hosts would not be able to use that SOFS share.

IMHO, if something is not recommended by the vendor, don't try to work around it. I would suggest using best-practice solutions to get the performance, stability and productivity you want.

July 31st, 2015 1:08pm

Our VDI deployment has the RDVHs deduping SAN-based storage on a CSV the way that you describe. Our before/after baseline performance comparisons aren't turning up anything. We tossed around the idea of disabling the background sweeps but ultimately decided to just let it go and see what happened. Whatever performance differences there may be, the user experience is not impacted, and nothing else really matters. We're certainly not upset about it when we pull the space savings reports. No matter what all the FUD is out there around it, dedupe on CSV is a fully supported solution that many people have in production. If you're not sure, I would recommend that you set up a proof-of-concept LUN and run a half dozen or so deduped VMs from it and see what you think.
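
If it helps with a proof of concept like that, here is a minimal sketch of pulling the space savings numbers once optimization has run. The volume path is an assumption borrowed from the question; point it at the test LUN instead.

```powershell
# Assumption: replace with the mount point of your proof-of-concept volume.
$volume = 'C:\ClusterStorage\Volume1'

# File counts, saved space and the time of the last optimization run.
Get-DedupStatus -Volume $volume |
    Format-List Volume, OptimizedFilesCount, InPolicyFilesCount, SavedSpace, FreeSpace, LastOptimizationTime

# Overall savings rate for the volume.
Get-DedupVolume -Volume $volume | Format-List Volume, SavedSpace, SavingsRate
```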
July 31st, 2015 9:52pm

1) Microsoft dedupe plays nice with cold data (mostly read-intensive workloads, OK...) and does not work well with write-intensive content. See:

Plan to deploy data deduplication

https://technet.microsoft.com/en-us/library/hh831700.aspx

"Does the data access pattern allow for sufficient time for deduplication? 

Files that change often and are constantly accessed by users or applications are not good candidates for deduplication. The constant access and change to the data are likely to cancel any optimization gains made by deduplication, and deduplication may not be able to process the files. 

  • A good candidate for deduplication is a file share that hosts user documents, virtual files, or software deployment files that contain data that is modified infrequently and read frequently.
  • Poor candidates for deduplication are a constantly-mounted SQL Server database that is running virtual machines, and live Exchange Server databases.
Good candidates allow time to deduplicate the files. File age policies can be applied to control when files are deduplicated to prevent early or frequent deduplication of files that are still likely to be modified significantly."

2) Run PerfMon and see how many writes you do on your current CSV compared to reads. If reads are around 80%, you can enable dedupe on the CSV just fine! If writes make up a larger share (the critical point is maybe 40% writes), you'll be in trouble, as you'll waste your IOPS for nothing :( See how to work with PerfMon:

Windows PerfMon Counters Explained

http://blogs.technet.com/b/askcore/archive/2012/03/16/windows-performance-monitor-disk-counters-explained.aspx
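
The read/write comparison from point 2 can also be roughed out from PowerShell. This is only a sketch: it assumes English counter names, samples every physical disk the host sees (narrow the instance to the disk behind your CSV), and a one-minute window is far too short to be representative of a working day.

```powershell
# Sample read and write rates: 12 samples, 5 seconds apart.
# Assumption: (*) covers all physical disks; filter to the CSV's disk as needed.
$counters = '\PhysicalDisk(*)\Disk Reads/sec', '\PhysicalDisk(*)\Disk Writes/sec'
$samples  = Get-Counter -Counter $counters -SampleInterval 5 -MaxSamples 12

# Drop the _Total instance so each disk is only counted once.
$all = $samples.CounterSamples | Where-Object { $_.InstanceName -ne '_total' }

# Add up the sampled rates for reads and for writes.
$reads  = ($all | Where-Object { $_.Path -like '*disk reads/sec'  } | Measure-Object CookedValue -Sum).Sum
$writes = ($all | Where-Object { $_.Path -like '*disk writes/sec' } | Measure-Object CookedValue -Sum).Sum

# Report writes as a share of all I/O operations seen in the window.
if (($reads + $writes) -gt 0) {
    'Write share: {0:P1}' -f ($writes / ($reads + $writes))
}
```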

3) Make sure you run the MSFT dedupe evaluation tool and find out how much space you'll actually save. Do this before doing anything else :) See:

Evaluate dedupe savings with DDPEVAL

http://blogs.technet.com/b/klince/archive/2012/08/09/evaluate-savings-with-the-deduplication-evaluation-tool-ddpeval-exe.aspx
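
A minimal sketch of running the tool, assuming DDPEval.exe is in System32 (where it lands once the Data Deduplication feature is installed) and that the path below is the data you want evaluated. Both are assumptions; the tool can also be copied to another machine and pointed at a different path or share.

```powershell
# Assumption: default location after the Data Deduplication feature is installed.
$ddpeval = Join-Path $env:SystemRoot 'System32\DDPEval.exe'

# Assumption: path from the question; run this before enabling dedupe on the volume.
& $ddpeval 'C:\ClusterStorage\Volume1'
```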

You may well find the juice is not worth the squeeze :) Even with uber-expensive EMC SAN disk space.

Good luck!

August 1st, 2015 5:40am

Thank you Eric.   I have an environment that I can test this in but was interested in hearing from anyone else doing it.   I would think by scheduling optimization after hours we'd avoid performance issues. 
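
For anyone testing the same idea, a rough sketch of an after-hours optimization window. The schedule name, start time, duration and days are made-up values, and the name BackgroundOptimization for the default schedule is an assumption to confirm with Get-DedupSchedule first.

```powershell
# See which schedules exist before changing anything.
Get-DedupSchedule

# Hypothetical schedule: optimize on weeknights from 20:00 for up to 8 hours.
New-DedupSchedule -Name 'AfterHoursOptimization' -Type Optimization `
    -Start '20:00' -DurationHours 8 `
    -Days Monday, Tuesday, Wednesday, Thursday, Friday

# Optional: turn off the built-in background optimization so work only happens in the window.
# Assumption: the default schedule is named BackgroundOptimization on your hosts.
Set-DedupSchedule -Name 'BackgroundOptimization' -Enabled $false
```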
August 3rd, 2015 11:03pm

This topic is archived. No further replies will be accepted.
