Deduplication in SharePoint 2013

Hi everybody,

I'm interested in deduplication in SharePoint 2013: what is possible out of the box, and what functionality does and does not exist. I'm not interested in integration with third-party products.

For instance:

- Consecutive backups of the same document residing in a document library will be deduplicated

Thanks.

December 24th, 2012 12:27pm

This support is known as "Shredded Storage". You can learn more at the link below.

http://blogs.technet.com/b/wbaer/archive/2012/11/12/introduction-to-shredded-storage-in-sharepoint-2013.aspx

December 24th, 2012 4:17pm

Note that Shredded Storage is not deduplication. If there are two copies of the same document in the same content database, SharePoint will store two copies of that document. Shredded Storage is used for versioning of a single document (Office documents): it stores only the differences between versions, unlike previous versions of SharePoint, which stored a complete document for every version.

If you want deduplication, you're going to have to look for a third-party RBS provider, like Metalogix's StoragePoint, which will store only a single copy in the RBS data store no matter how many times the item is referenced in SharePoint, provided the content is identical.
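To make the distinction concrete, here is a minimal sketch of the single-instance idea: blobs are keyed by a hash of their content, so identical files are stored once however many paths reference them. This is only an illustration, not how StoragePoint or any RBS provider is actually implemented; the class name, the paths, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib


class SingleInstanceStore:
    """Toy content-addressed blob store: identical bytes are kept once."""

    def __init__(self):
        self.blobs = {}       # content digest -> bytes
        self.references = {}  # document path -> content digest

    def put(self, path, data):
        # Hypothetical choice: SHA-256 of the full content identifies a blob.
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # store only if not already present
        self.references[path] = digest
        return digest

    def get(self, path):
        return self.blobs[self.references[path]]

    def stored_bytes(self):
        return sum(len(blob) for blob in self.blobs.values())


store = SingleInstanceStore()
report = b"quarterly report" * 1000

# The same document uploaded to two (hypothetical) sites, plus one other file.
store.put("/sites/a/report.docx", report)
store.put("/sites/b/report.docx", report)
store.put("/sites/b/notes.docx", b"different content")

print(len(store.references), "references,", len(store.blobs), "blobs stored")
```

Three paths are referenced but only two blobs are kept, which is the behavior the thread calls deduplication and which Shredded Storage on its own does not provide.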

December 24th, 2012 7:30pm

There seems to be some confusion about the term de-duplication and SP2013. Hardware-based de-duplication using RBS existed in SP2010. But conceptually, shredded storage can be referred to as de-duplication for documents within the same library if versioning is turned on: complete versions are no longer stored when changes are made to a document. Secondly, with regard to eDiscovery/Holds in SP2013, complete versions are not created, so de-duplication is being implemented here as well. These represent two new places where SP2013 provides de-duplication out of the box.

Below is an interesting article on possible problems and benefits with shredded storage.

http://www.sharepointpromag.com/blog/dan-holmes-viewpoint-on-sharepoint-blog-24/sharepoint-2013/sharepoint-2013-shredded-storage-144987

December 24th, 2012 8:51pm

I would consider dedupe to be single instance storage, a la Exchange 2003.  A single binary blob used in multiple locations -- something which, OOTB, SharePoint does not provide.  Shredded Storage also has limitations (only good for Office OpenXML documents).
December 24th, 2012 9:03pm

Hi all.

De-duplication - SAN manufacturers have provided de-duplication for years. However, Windows Server 2012 provides de-duplication to those of us who don't have SANs except at client sites. You will find significant drive space improvements: I'm saving 39% out of 6 TB, and I don't have a SAN with built-in de-duplication for my small office.

Exchange SIS (Single Instance Storage) - Consider sending a large (30 MB+) attachment to 20 users. Even if only 5 of the 20 recipients were on the same database, in Exchange 2003 the 30 MB attachment was stored once instead of 5 times on that database. In Exchange 2010, that attachment is stored 5 times (150 MB for that database) and isn't compressed. But depending on your storage architecture, the capacity to handle this should be there. Your email retention requirements will also help here, by forcing the removal of the data after a certain period of time.
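The arithmetic in that example can be written out as a small sketch. The function name and parameters are made up for illustration, and the calculation ignores compression and per-item database overhead.

```python
def attachment_footprint_mb(attachment_mb, recipients_on_db, single_instance):
    """Space one attachment takes on one mailbox database, in MB."""
    # With single instance storage the attachment is kept once per database;
    # without it, every recipient on that database gets a separate copy.
    copies = 1 if single_instance else recipients_on_db
    return attachment_mb * copies


# 30 MB attachment, 5 of the 20 recipients on the same database.
with_sis = attachment_footprint_mb(30, 5, single_instance=True)
without_sis = attachment_footprint_mb(30, 5, single_instance=False)

print("With SIS (Exchange 2003 style):", with_sis, "MB")
print("Without SIS (Exchange 2010 style):", without_sis, "MB")
```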

Cobalt - In SharePoint 2010, when saving a document, such as a document opened from SharePoint with the Office 2010 client, only the incremental changes to the document are submitted over the network from the client to the server. However, the document is coalesced on the Web server, which requires a full read from the database server, and the new file, inclusive of the changes, is then written back to the database server.

Shredded Storage both improves I/O and reduces compute utilization when making incremental changes to documents or storing documents in SharePoint 2013. Shredded Storage builds upon the Cobalt (i.e., File Synchronization via SOAP over HTTP) protocol introduced in SharePoint 2010.

Shredded Storage - In SharePoint 2010, when a file is uploaded to a Document Library/List, a single row is created in AllDocStreams to host the BLOB associated with the upload. As previously discussed, on subsequent edits to the file only the changed bytes (the incremental change) are sent to the server across the network, reducing the client's overall bandwidth utilization. However, in order to coalesce the changes, the file is read from the database server by the web server, where the merge occurs, and the file is then sent back to the database server for storage. In SharePoint 2010 this process improved the reliability of file I/O operations, but the web server incurred a penalty as a result of the change.

Shredded Storage improves on the SharePoint 2010 model by breaking an individual BLOB into shredded BLOBs that are stored in a new database table, DocStreams. Each shredded BLOB contains a numerical Id representing its place in the source BLOB when coalesced. When a client updates a file, only the shredded BLOB that corresponds to the change is updated, with the update occurring on the database server as opposed to the Web server. As a result, file I/O operations are reduced by ~2x when compared to FSSHTTP in SharePoint 2010, and the storage footprint is significantly reduced.
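A rough sketch of the idea, for anyone who finds code easier to follow: a file is split into numbered chunks, and a save rewrites only the chunks whose bytes changed. This is not SharePoint's actual algorithm or schema. The fixed 4-byte chunk size, the class name, and the in-memory dictionary standing in for the DocStreams table are all assumptions made to keep the example small.

```python
CHUNK_SIZE = 4  # hypothetical and tiny, so the example is easy to follow


def shred(data, size=CHUNK_SIZE):
    """Split a BLOB into fixed-size chunks (an assumed chunking scheme)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class ShreddedStore:
    """Toy stand-in for a table of shredded BLOBs keyed by (doc_id, index)."""

    def __init__(self):
        self.chunks = {}  # (doc_id, chunk index) -> bytes

    def save(self, doc_id, data):
        """Write only the chunks that changed; return how many were written."""
        new_chunks = shred(data)
        written = 0
        for index, chunk in enumerate(new_chunks):
            if self.chunks.get((doc_id, index)) != chunk:
                self.chunks[(doc_id, index)] = chunk
                written += 1
        # Remove leftover chunks if the document became shorter.
        stale = [key for key in self.chunks
                 if key[0] == doc_id and key[1] >= len(new_chunks)]
        for key in stale:
            del self.chunks[key]
        return written

    def load(self, doc_id):
        """Coalesce the chunks back into the full BLOB, in index order."""
        indexes = sorted(i for (d, i) in self.chunks if d == doc_id)
        return b"".join(self.chunks[(doc_id, i)] for i in indexes)


store = ShreddedStore()
first_write = store.save("doc1", b"AAAABBBBCCCCDDDD")   # initial upload
second_write = store.save("doc1", b"AAAABBBBXXXXDDDD")  # edit one region

print("chunks written on upload:", first_write)
print("chunks written on edit:", second_write)
```

Note that nothing here compares chunks across different documents, which is why two identical files still take up space twice under this model.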

Reference:

Dude where's my SIS by The Exchange Team  http://blogs.technet.com/b/exchange/archive/2010/02/22/dude-where-s-my-single-instance.aspx

Intro to Shredded Storage by Bill Baer - http://blogs.technet.com/b/wbaer/archive/2012/11/12/introduction-to-shredded-storage-in-sharepoint-2013.aspx

Plan to Deploy Data De-Duplication http://technet.microsoft.com/en-us/library/hh831700.aspx

-Ivan

December 27th, 2012 1:01am

If one looks closely at the fine print, Metalogix StoragePoint doesn't support single-instance storage on SP2013, because of shredded storage. Shredded storage tends to complicate everything RBS related. Even if it were supported on SP2013, single-instance storage on StoragePoint is per Storage Profile, so depending on how you create your RBS profiles and endpoints, it doesn't necessarily provide as much benefit as you would hope. I have a couple of large site collections and would like to dedupe between them, but that would essentially require externalizing everything in my web application, which isn't what I want to do right now. That leaves you with hardware or OS-level deduplication. Windows Server 2012 R2 has much-improved, block-level dedupe integrated, but it appears to have its limitations as well. So, the search continues.
April 24th, 2015 4:45pm

