Cannot replace failed RAID5 member disk
My setup: Windows Server 2008 SP2 64-bit, with a RAID5 volume that consists of six 1 TB SATA member disks.
One of the physical disks has bad clusters. During RAID5 volume sync it is marked as "error". I need to replace this bad physical disk, and I have connected a new spare disk of the same size.
As far as I know, I should use the "repair volume" procedure. I have tried this many times (through the GUI and the DISKPART command shell), but each time the procedure replaces a different, healthy member of the volume: any member except the bad disk. Then the volume resyncs and the bad physical disk gets the "error" mark again.
1. Like this: http://img503.imageshack.us/img503/3377/dman1.gif
2. Trying to repair: http://img130.imageshack.us/img130/1340/dpart1.gif
3. Result of the repair command: http://img694.imageshack.us/img694/9042/dman2.gif
4. Then go to (1)
If I try to bring the bad disk offline (to do a volume repair), the volume status becomes failed. Bringing some of the other disks offline does the same; for the remaining disks I get a failed redundancy status.
I have spent 2 weeks on this problem and have run out of ideas.
Please help me replace this bad disk while it has not died completely and the RAID5 volume is still alive.
MCITP: Enterprise Administrator, MCSA
January 1st, 2010 4:01pm
Hi,
Yes, if you replace one disk in a RAID 5 volume, you should only need to click "Repair Volume", and the system will then rebuild the RAID itself. I ran a test in a Windows Server 2008 virtual machine on my Hyper-V computer with 6 disks and had no problems. I suspect there is an issue with your new disk. If possible, please try replacing the bad disk with a different new disk. By the way, as your disks are large, the repair will take some time.
In addition, please check whether there are any errors in Event Viewer.
Best Regards,
Vincent Hu
January 4th, 2010 12:55pm
Thank you for taking part ;)
The new disk is 100% healthy.
I did some further investigation, and I suspect the issue is this: the RAID5 volume has 6 members (disks 2 to 7 in my setup; 5 healthy disks and 1 bad disk). One healthy member (for example, disk 6) was taken offline, and the whole array went to "failed redundancy". Then disk 6 came online again and the resync procedure started. Because disk 4 has bad sectors, the resync can never finish successfully. When the resync gets an error from disk 4, the array status is "failed redundancy" and disk 4 is marked as "errors". The status of healthy disk 6 is "online", but it does not contain correct data, because the resync was interrupted by the error from disk 4 [system event: The device, \Device\Harddisk4\DR10, has a bad block.]
I connected spare disk 8. When I try to repair the RAID5 array, it tries to replace the disk with incorrect data (disk 6). Now disk 6 is clean and disk 8 needs a resync. The resync procedure starts and is interrupted by the error from bad disk 4. The next "repair" replaces disk 8 with disk 6, and so on...
In my first post I wrote "any member except the bad disk". That is not correct, because Windows does not keep the same disk numbers between reboots if the cables are swapped. The truth is that the repair procedure only swaps two disks (disk 6 and disk 8).
To get rid of the error from disk 4 during resync, I tried to clone disk 4 to a new disk (ignoring read errors). I tried Acronis True Image and Ghost in disk-to-disk mode. Neither program supports dynamic disks.
What should I do now? Can you emulate a disk read error in a virtual environment?
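The failure mode described above can be illustrated with a small simulation. This is only a sketch of generic RAID5 XOR parity, not of how Windows dynamic disks work internally: the names (`xor_blocks`, `make_stripe`, `rebuild_member`, `ReadError`) are hypothetical, parity is kept on the last member instead of being rotated, and mapping index 2 to "disk 4" and index 4 to "disk 6" is just for illustration.

```python
from functools import reduce


class ReadError(Exception):
    """Raised when a block needed for the rebuild cannot be read."""


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks plus one parity block (assumption: parity is last)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild_member(stripes, stale):
    """Recompute member `stale` in every stripe from all the other members.

    A block of None stands for an unreadable (bad) sector.
    """
    rebuilt = []
    for number, stripe in enumerate(stripes):
        others = [blk for i, blk in enumerate(stripe) if i != stale]
        if any(blk is None for blk in others):
            # One member is stale and another is unreadable: not enough data left.
            raise ReadError(f"stripe {number}: a surviving member is unreadable")
        rebuilt.append(xor_blocks(others))
    return rebuilt


# Six members: five data blocks plus parity, three stripes of 4-byte blocks.
stripes = [make_stripe([bytes([s * 10 + m] * 4) for m in range(5)]) for s in range(3)]

# Case 1: only member 4 ("disk 6") is stale, so it can be fully recomputed.
print(rebuild_member(stripes, stale=4) == [s[4] for s in stripes])

# Case 2: member 2 ("disk 4") also has a bad block in stripe 1, so the resync stops.
damaged = [list(s) for s in stripes]
damaged[1][2] = None
try:
    rebuild_member(damaged, stale=4)
except ReadError as err:
    print("resync stopped:", err)
```

In this model, swapping the stale member for a fresh spare changes nothing, because the rebuild still has to read the member with the bad block.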
MCITP: Enterprise Administrator, MCSA
January 4th, 2010 1:36pm
I'm having this same problem. I have five 1.5 TB drives in a RAID 5 array. The status came up as 'failed redundancy' one day, so I bought two new drives (just in case). I put the first one in, repaired the volume, and the status was still 'failed redundancy'. I put the second one in, repaired again, and it just took the first new disk out of the array. Repairing again just keeps cycling one of the new drives out of the array. Is there a way to specify which disk to remove from the array when repairing?
February 5th, 2010 7:49am
I could not solve this problem and simply rebuilt a new array.
There were some suggestions in this Russian thread; you can read it using Google Translate if you are interested.
The first problem with MS RAID is that you cannot determine which physical drive corresponds to which disk number in Disk Management; there is no way to view a drive's serial number without additional software. The second problem: you cannot force a resync that skips errors. The third: there are no documented ways to clone dynamic disks.
The first drive I disconnected was healthy. That was error number one, and the array went to failed redundancy. Another member of the array contained a bad sector. That was error number two. A RAID5 topology cannot work with errors on two drives.
There is no way to "specify which disk to remove from the array when repairing", because only one disk in a RAID5 array can be unreliable at any one time. A workaround, if the array is in a healthy state: set the disk you want to replace to "offline", then repair the array onto the newly connected disk.
MCITP: Enterprise Administrator, MCSA
February 5th, 2010 5:14pm


