What’s New in Data Deduplication?

If I had to pick two words to sum up the major changes for Data Deduplication coming in the next version of Windows Server, they would be “scale” and “performance”. In this post, I’ll explain what these changes are and provide some recommendations for what to evaluate in Windows Server Technical Preview 2.

In Windows Server 2016, we are making major investments to enable Data Deduplication (or “dedup” for short) to scale more effectively to larger amounts of data. For example, customers have been telling us that they use dedup for scenarios such as backing up all the tenant VMs in a hosting business, with data sets ranging from hundreds of terabytes to petabytes. For these cases, they want to use larger volumes and files while still getting the great space savings they see from Windows Server today.

Dedup Improvement #1: Use the volume size you need, up to 64TB

Dedup in Windows Server 2012 R2 optimizes data using a single-threaded job and a single I/O queue for each volume. It works great, but you do have to be careful not to make the volumes so big that dedup processing can’t keep up with the rate of data changes, or “churn”. In a previous blog post (Sizing Volumes for Data Deduplication in Windows Server), we explained in detail how to determine the right volume size for your workload, and we have typically recommended keeping volumes under 10TB.

That all changes in Windows Server 2016 with a full redesign of dedup optimization processing. We now run multiple threads in parallel using multiple I/O queues on a single volume, resulting in performance that was previously only possible by dividing your data into multiple, smaller volumes.

The result is that our volume guidance changes to a very simple statement: Use the volume size you need, up to 64TB.
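If you want to see how this scales on one of these larger volumes, you can kick off an optimization job manually and watch it run. A minimal sketch, assuming dedup is already enabled on a volume at E: (the drive letter is just an example):

# Start an optimization job on the volume (E: is a hypothetical example)
Start-DedupJob -Volume "E:" -Type Optimization

# List running and queued dedup jobs to watch progress
Get-DedupJob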

Dedup Improvement #2: File sizes up to 1TB are good for dedup

While the current version of Windows Server supports file sizes up to 1TB, files “approaching” this size are noted as “not good candidates” for dedup. The reasons have to do with how the current algorithms scale: operations such as scanning for and inserting changes slow down as the total data set grows. This has all been redesigned for Windows Server 2016 with new stream map structures and improved partial file optimization, with the result that you can go ahead and dedup files up to 1TB without worrying about whether they are good candidates. These changes also improve overall optimization performance, by the way, adding to the “performance” part of the story for Windows Server 2016.

Dedup Improvement #3: Virtualized backup is a new usage type

At TechEd last November, we announced support for using dedup with virtualized backup applications on Windows Server 2012 R2, and there has been a lot of customer interest in this scenario since then. We also published a TechNet article with the DPM Team (see Deduplicating DPM Storage) with a reference configuration that lists the specific dedup settings that make the scenario optimal.

With a new release we can do more interesting things to simplify these kinds of deployments, and in Windows Server 2016 we have combined all of these dedup configuration settings into a new usage type called, as you might expect, “Backup”. This both simplifies the deployment and helps “future proof” your configuration, since any future changes to the recommended settings can be applied automatically through this usage type.

Suggestions for What to Check Out in Windows Server TP2

What should you try out in Windows Server TP2? Of course, we encourage you to evaluate the new version of dedup overall on your own workloads and datasets (and this applies to any deployment you may be using or interested in evaluating for dedup, including volumes for general file shares or volumes supporting a VDI deployment, as described in our previous blog article on Large Scale VDI Deployment).

But specifically for the new features, here are a couple of areas we think it would be great for you to try.

Volume Sizes

Try larger volume sizes, up to 64TB. This is especially interesting if you have wanted to use larger volumes in the past but were limited by the need to keep volumes small enough for optimization processing to keep up.
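Enabling dedup on a larger volume works the same way as on a smaller one. A minimal sketch, assuming a general file share volume at E: (the drive letter and usage type are just examples):

# Enable dedup on the volume (E: is a hypothetical example; pick the usage type that matches your workload)
Enable-DedupVolume -Volume "E:" -UsageType Default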

Basically, the guidance for this evaluation is to follow only the first section of our previous blog article Sizing Volumes for Data Deduplication in Windows Server, “Checking Your Current Configuration”, which describes how to verify that dedup optimization is completing successfully on your volume. Use the volume size that works best for your overall storage configuration and verify that dedup is scaling as expected.
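As a quick way to do that check, the dedup cmdlets report space savings and the outcome of the last optimization pass. A minimal sketch, again assuming a volume at E::

# Check space savings for the volume
Get-DedupVolume -Volume "E:" | Format-List

# Check optimization status; look at fields such as LastOptimizationTime and LastOptimizationResult
Get-DedupStatus -Volume "E:" | Format-List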

Virtualized Backup

In the TechNet article I mentioned above, Deduplicating DPM Storage, there are two changes you can make to the configuration guidance.

Change #1: Use the new “Backup” usage type to configure dedup

In the section “Plan and set up deduplicated volumes” and in the following section “Plan and set up the Windows File Server cluster”, replace all the dedup configuration commands with the single command to set the new “Backup” usage type.

Specifically, replace all these commands in the article:

# For each volume

Enable-DedupVolume -Volume <volume> -UsageType HyperV

Set-DedupVolume -Volume <volume> -MinimumFileAgeDays 0 -OptimizePartialFiles:$false

 

# For each cluster node

Set-ItemProperty -Path HKLM:\Cluster\Dedup -Name DeepGCInterval -Value 0xFFFFFFFF

Set-ItemProperty -Path HKLM:\Cluster\Dedup -Name HashIndexFullKeyReservationPercent -Value 70

Set-ItemProperty -Path HKLM:\Cluster\Dedup -Name EnablePriorityOptimization -Value 1

…with this one new command:

# For each volume

Enable-DedupVolume -Volume <volume> -UsageType Backup
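Once the usage type is set, you can review the dedup settings now in effect on the volume to confirm the change, for example:

# Review the dedup settings in effect for the volume; the new usage type should be reflected here
Get-DedupVolume -Volume <volume> | Format-List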

Change #2: Use the volume size you need for the DPM backup data

In the article section “Plan and set up deduplicated volumes”, a volume size of 7.2TB is specified for the volumes that hold the deduplicated VHDX files containing the DPM backup data. For evaluating Windows Server TP2, the guidance is to use the volume size you need, up to 64TB. Note that you still need to follow the other configuration guidance, e.g., for configuring Storage Spaces and NTFS, but go ahead and use larger volumes as needed, up to 64TB.
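As a reference point, here is a minimal sketch of formatting one of these larger volumes as NTFS. The drive letter, allocation unit size, and large file record segment switch below are illustrative assumptions; use the exact Storage Spaces and NTFS settings from the Deduplicating DPM Storage article for a real deployment:

# Format the backup volume as NTFS (E:, the 64KB allocation unit size, and -UseLargeFRS are illustrative;
# follow the settings in the Deduplicating DPM Storage article)
Format-Volume -DriveLetter E -FileSystem NTFS -AllocationUnitSize 65536 -UseLargeFRS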

Conclusion

We think that these improvements to Data Deduplication coming in Windows Server 2016 and available for you to try out in Windows Server Technical Preview 2 will give you great results as you scale up your data sizes and deploy dedup with virtualized backup solutions.

And we would love to hear your feedback and results. Please send email to dedupfeedback@microsoft.com and let us know how your evaluation goes and, of course, any questions you may have.

Thanks!

The Storage Team at Microsoft – File Cabinet Blog
