ReFS, Integrity and extra resiliency on CSV

Hi,

Firstly, to explain my setup. Server 2012 R2 hyper-V hosts in a cluster (2 node) using a CSV NTFS.

A Guest Cluster for File Services using 2012 R2 and a shared VHDX, would like to use ReFS.

Current File Server is a single VM 2008 R2, data stored on NTFS (VHDX)

I am migrating a file server (approx. 1 TB of data) and I want my new file server to be resilient and highly available. I have a few questions I am not too clear on the answers to, and I was wondering if anyone can shed some light on them. So far I have determined that deduplication is not supported on ReFS, which is fine; data integrity is my top priority and I'd only save about 10% anyway.

I have also read that DFSR is not supported, but I am not clear on whether that means replication TO the ReFS volume, FROM the ReFS volume, or both; my existing file system is NTFS. I am unsure how to move 1 TB of data from NTFS to ReFS while retaining all ACLs and the folder structure. My idea was to use DFSR and, once the data is replicated, kill the original connection and switch over out of hours, but now I am not sure. Robocopy is out of the question for me: the rate of data change at the source is too frequent, the number of files is just ridiculous to trawl through, and the time it would take is far too long. I'm assuming an image-based backup of a volume will restore the original file system; I'm thinking Windows Server Backup might work just to restore the root folder share, but I've never gone from one file system to another using it.

I also want to use DFS as a namespace for connecting to the folder share: instead of \\servername\sharename I want to use \\DFSNamespace\Sharename in the mapped drives on the clients. Will either replication or the namespace work on ReFS, or neither? The TechNet documentation isn't too clear to me.
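Whichever migration method wins out (DFSR, image restore, or something else), it is worth verifying the copy independently before cutover. A minimal sketch in Python that hashes both trees and diffs them; this only checks file content, not ACLs, and the paths are placeholders:

```python
import hashlib
import os

def tree_hashes(root):
    """Walk a directory tree and return {relative_path: sha256_hex} for every file."""
    hashes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(full, "rb") as f:
                # Read in 1 MiB chunks so large files don't need to fit in memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            hashes[os.path.relpath(full, root)] = digest.hexdigest()
    return hashes

def compare_trees(source, destination):
    """Return (files missing from destination, files whose content differs)."""
    src, dst = tree_hashes(source), tree_hashes(destination)
    missing = sorted(set(src) - set(dst))
    changed = sorted(p for p in src if p in dst and src[p] != dst[p])
    return missing, changed
```

Running `compare_trees(r"\\oldserver\share", r"\\newserver\share")` out of hours, after replication settles, would flag anything the copy missed.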

Secondly, I understand that a checksum on a file proves something about the file's integrity. Is that right? If so, is this something ReFS does an equivalent of, or am I confusing the two things? If checksums would still be useful for certain files even on ReFS, how would I generate them and use them when opening and saving those files? (There is a subset of files, such as finance and design files, which are critical to our business; we really want to make sure these files are not corrupted.)
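For that critical-file subset, one low-tech option that works on any file system is to keep your own manifest of hashes and re-verify it on a schedule. A minimal Python sketch; the manifest location and file list here are placeholders:

```python
import hashlib
import json
import os

def file_sha256(path):
    """SHA-256 of a file's content, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def write_manifest(paths, manifest_path):
    """Record the current hash of each critical file."""
    manifest = {p: file_sha256(p) for p in paths}
    with open(manifest_path, "w") as f:
        json.dump(manifest, f, indent=2)

def verify_manifest(manifest_path):
    """Return the files that are missing or whose content no longer matches."""
    with open(manifest_path) as f:
        manifest = json.load(f)
    return [p for p, digest in manifest.items()
            if not os.path.isfile(p) or file_sha256(p) != digest]
```

The difference from ReFS integrity streams is that this only detects changes since the manifest was written (including legitimate edits), whereas ReFS checksums every cluster transparently on each read.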

Finally, an idea to make a CSV or cluster available storage more resilient to failures or corruption:

I will be using both a general file server and an SOFS for application data. The general file server uses an available storage disk in the cluster attached to the file server role, whereas the SOFS uses a CSV; both will be based on shared VHDX files. In both cases I am wondering how to make the underlying VHDX more resilient to corruption.

I have read that one way ReFS corrects corrupted blocks is by fetching clean blocks from a mirrored disk in Storage Spaces, so my questions are these (and are they technically possible, and a supported configuration?):

Can I attach two shared VHDX files as available storage in the cluster, create a mirrored storage space from them, and add this space to the cluster instead of a single disk? Will this work?

Can I do the same as above, but then add that mirrored space as a CSV?

My process would be this:

Add shared VHDX to both VMs

Using server manager: Create a storage space

Create a mirrored disk within the space

Create a volume

Format it as ReFS

Add it to the cluster as available storage

Add it as a CSV (for the SOFS scenario)

Will this work?

I have also noticed there is an area in Failover Cluster Manager to create and add storage pools, a feature I have never been able to use before because of not having the correct disks. Is this what I need to do the above?

Many thanks in anticipation

Steve

May 19th, 2015 8:35pm

Steve, one tiny remark about ReFS and integrity streams... You cannot have data checksummed (metadata IS checksummed) with ReFS on live VMs. That's a limitation of Windows Server 2012 R2 ReFS, and it's hopefully being ironed out in the upcoming Windows Server 2016. So there's no integrity benefit for ReFS over NTFS in the CSV scenario. Plus ReFS is still slower for small writes (because of metadata processing overhead) compared to NTFS. Good discussion here:

ReFS, live VMs and integrity data

http://serverfault.com/questions/567759/is-refs-ready-to-host-production-vhdxs-on-hyper-v-2012-r2-clusters

Good luck!


May 20th, 2015 9:36am

Hi Steve,

I'll try to address these questions.

1. It will simply not work. If you try to add a folder on ReFS to a DFSR replication group, it will fail, so the replication group cannot be created in the first place.

2. As stated in many articles, ReFS corrects logical errors automatically. No additional steps are needed; it's transparent to users.

3. Technically it will work (and yes, you need to configure the CSV in Failover Cluster Manager), but if these two VHDX files are located on one physical disk and it becomes corrupted, you will still lose all the data. It is better to use physical disks in the storage pool.

May 20th, 2015 10:18am

Sorry Anton, just to be clear, I am referring to ReFS being the file system inside the VM itself... not checking the integrity of the VHDX files, just normal files inside the VHDX... is your point still valid?

thanks

Steve

May 20th, 2015 2:03pm

ok thanks for your answers, just to expand on a couple of bits.

1. Can I still link to a DFS namespace with the folders on ReFS (no replication involved, just the namespace)?

3. The VHDX files will reside on two separate underlying physical disks. You said technically it will work; is it a valid, supported scenario though?

cheers

Steve

May 20th, 2015 2:08pm


Whoops... My fault! You CAN use ReFS inside your VMs to checksum your data and not only the metadata, but because the VMs are not layered on top of Storage Spaces directly, ReFS "healing" is not propagated down the storage stack. Long story short: no benefit :(
May 20th, 2015 10:09pm

So if I am understanding this correctly, using ReFS inside a VM has no real benefit, because the CSV (StarWind in our case) is NTFS and the physical disk that StarWind sits on is NTFS...

which leads me to think I need ReFS applied from the physical disk up... but if that's the case, would I really need it applied to the CSV and inside the VMs as well? If there is no benefit to this then I may as well use NTFS inside the VM and gain DFSR and dedupe inside the VM?

Thanks

Steve

May 21st, 2015 7:17am


It has! You get data corruption detected inside the VM. ReFS cannot do anything to fix it, but at least it will shout about it.

StarWind may (or may not, depending on settings...) keep its own hashes for data & metadata. RAID controllers may have them as well (DDN does this for sure). So the last thing you want is to have data checksummed at the hardware, logical volume manager, and file system layers, all at the same time! Too much overhead.

You may enable dedupe inside a VM (but not with ReFS; NTFS is the only supported file system so far), but you're not going to get anything beyond per-volume deduplication. So the only usable scenario is maybe a file server.

May 21st, 2015 5:18pm

A file server is exactly what I am aiming the dedupe at, but with a 130 GB saving on 1 TB it may be worth sacrificing the feature for better data integrity.

You said ReFS inside a VM won't be able to fix the corruption, but it will let you know about it:

  1. How does ReFS inform you of an error? Where is this logged?
  2. What do I need to do to get ReFS to fix it inside a VM? I am assuming I'd need ReFS on both the physical local disks AND the StarWind CSV... as any corruption higher up the chain won't get fixed lower down? If corruption is detected inside the VM, though, and ReFS is on both the physical disks and the StarWind CSV... wouldn't those lower levels fix my VM first anyway? I'm not clear on how the chain of events pans out in these configurations.

thanks

Steve

May 22nd, 2015 7:12am


0) Side note: I guess you got your estimate by running the MSFT-supplied dedupe calculator, DDPEVAL? See:

DDPEVAL (MSFT Dedupe Calculator)

http://blogs.technet.com/b/klince/archive/2012/08/09/evaluate-savings-with-the-deduplication-evaluation-tool-ddpeval-exe.aspx
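For intuition, DDPEVAL's core idea — chunk the data, hash the chunks, count the unique ones — can be sketched as below. This toy uses fixed-size chunks, whereas Windows dedupe uses variable-size chunking plus compression, so treat the result only as a rough indication:

```python
import hashlib
import os

CHUNK = 64 * 1024  # toy fixed-size chunks; real Windows dedupe chunks are variable-size

def estimate_dedupe(root):
    """Rough dedupe estimate: total bytes vs bytes held in unique chunks."""
    seen = set()
    total = unique = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            with open(os.path.join(dirpath, name), "rb") as f:
                for chunk in iter(lambda: f.read(CHUNK), b""):
                    total += len(chunk)
                    digest = hashlib.sha256(chunk).digest()
                    if digest not in seen:
                        # First time we've seen this chunk content: it costs real space.
                        seen.add(digest)
                        unique += len(chunk)
    return total, unique, total - unique
```

The third return value is the bytes that duplicate chunks would save, which is the same quantity DDPEVAL reports (it just finds more of them).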

1) They get written to the event log. The entries appear under

Microsoft\Windows\DataIntegrityScan

in the "Applications and Services Logs" section.

You can use any event log monitoring app. Good reviews here:

Event Log Monitoring Tools (Review)

http://community.spiceworks.com/topic/157251-windows-server-event-monitoring-tool-recommendations

We use SolarWinds (NOT StarWind!!) app. See:

SolarWinds Event Log Consolidator

http://www.solarwinds.com/products/freetools/event-log-consolidator.aspx

3) You need compatible underlying storage: some form of redundant Storage Spaces. If ReFS hits a bad hash on data, it treats the block as dead, gets another copy from the underlying block storage, re-allocates the block elsewhere, and marks the old address as "gone forever". Something ZFS has done for years.
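That detect-and-repair flow can be illustrated with a toy two-way mirror; this is only a conceptual sketch of the idea, not how ReFS or Storage Spaces is actually implemented:

```python
import hashlib

class MirroredStore:
    """Toy two-way mirror with per-block checksums, illustrating a self-healing read."""

    def __init__(self):
        self.copies = [{}, {}]   # two independent block maps (the "mirror")
        self.checksums = {}      # block_id -> expected sha256 of the data

    def write(self, block_id, data):
        for copy in self.copies:
            copy[block_id] = data
        self.checksums[block_id] = hashlib.sha256(data).hexdigest()

    def read(self, block_id):
        expected = self.checksums[block_id]
        for copy in self.copies:
            data = copy.get(block_id)
            if data is not None and hashlib.sha256(data).hexdigest() == expected:
                # Found an intact copy: repair any sibling copy that disagrees.
                for other in self.copies:
                    if other.get(block_id) != data:
                        other[block_id] = data
                return data
        raise IOError("block %r: no intact copy left" % block_id)
```

The key point for the discussion above: the repair only works because the checksum layer and the mirror live in the same stack. ReFS inside a VM has the checksums but no mirror underneath it to fetch a clean copy from.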

May 22nd, 2015 10:33am

Thanks for all that; the data integrity log was what I was after, I just couldn't remember the folder :)

So, in my case I have RAID 10 consisting of 14 SAS disks, with StarWind sitting on top of it... do you think I would be better off making a mirrored storage space out of these 14 SAS disks? I have heard Storage Spaces gives better performance compared to hardware RAID... there was some demo a while ago using JBOD where they got some millions of IOPS... but the question is, would 14 disks work better in mirrored spaces or hardware RAID?

Since my disks are currently in use, I can't test performance between the two.

Steve

May 22nd, 2015 11:43am


In this particular case hardware RAID10 would be preferred. 
May 22nd, 2015 1:51pm

OK, on the face of it hardware RAID sounds the better option. Would the answer be different if I said I had a 1 TB SATA SSD? I'm not sure I can use a SATA SSD in a storage space alongside the 14 SAS disks... but would this work better in Storage Spaces using tiering, or using the SSD as an L2 cache in StarWind? Which would give the better performance?

thanks

Steve

May 24th, 2015 3:54pm

This topic is archived. No further replies will be accepted.
