Replace RAID 1 Hard drive

donte10

Let me tell you what I got. I have a fairly new client that began to have issues with their Symantec Backup Exec running backups all night long and into the late morning around 11am. This was not normal, they used to be finished before the staff returned to the office in the morning at 7am. In addition, the client would also get blue screens, but seemed to be caused when running Symantec backups. So the client decided to troubleshoot with a Symantec support tech without success (they remoted into the server).

I happened to drop in for a different workstation problem that was reported and thought I'd take a peek at the server. I noticed errors in the event viewer log like:

Under System Log:
error: the device\hardisk0\DR0, has a bad block.
warning: error detected harddisk1\DR2 during a paging operation.

Under Applications Log:
Source: Symantec "Job failed with error: corrupt data encountered"
Source: VSS "VSS Writer has rejected an event -- probably caused by faulty hardware."

Note: This server is using RAID 1 with 2x 500GB SATA drives (400GB used). The server is 4-5 years old and runs Server 2008. Keep in mind the client has had no good backup for about a month.

I think it's a bad hard drive, so I ran 'Crystal Disk Info' and it turns out that Drive A is reporting 'CAUTION' with reallocated sectors and Drive B is reporting 'BAD' with reallocated sectors.

My plan was to rebuild one drive at a time, starting with the worse of the two drives (crossing fingers that Drive A would not be too bad to copy from). So I took out Drive B and inserted a new enterprise SATA 1TB drive, hoping that it would rebuild and be OK. The LSI Logic RAID card said it was resyncing/rebuilding the drive, so I let it run overnight. I returned the next day to find that Drive B was reporting 'failed' in the RAID controller firmware. (I tried again with another new drive and got the same result.) I believe the rebuild is not working because of the faulty Drive A.

Hopefully I explained this well...what do you think? One question: can I install a bigger drive into the RAID to rebuild? It looks like it auto-adjusted down to 500GB.

Also, what is the next step for me to get this system up and running with fresh drives? I was thinking of cloning with some other software, but wasn't sure if that's my next step. How would I do this anyway?
 
That's tough. I would never have attempted a rebuild under those circumstances. The best thing for you to do at this point is to get a good forensic image of one or both drives on a bench machine for you to work with. I like ddrescue or R-Studio for imaging. Once you have a clean image, you could install two clean drives, transfer your image to a USB HDD, then use a live CD like Parted Magic to write the image to the new array.

You may not get a clean enough image to boot from, but at least this way you will minimize actual data loss. If you are lucky enough to be able to boot, you'll definitely want to run CHKDSK and SFC on the machine to clean up as many errors as possible.

NOTE: Depending on the value of the data, it may be worth recommending that they send the drives out for professional recovery. You don't want to make a bad situation worse by messing up their chances of a successful recovery. I'm pretty good at data recovery, but there are certain types of data that I refuse to mess around with.
 
RAID 1....entry level hardware...with one tanked drive, and with one drive with one foot off the cliff ready to jump. Not a comfy situation. The "not so enterprise" grade RAID controllers aren't as good at rebuilding as higher-end, more enterprise-level hardware. So a slightly flaky drive can tank things.

First...confirm those backups are good. What kind of data is on this server? Is it running more complicated databases like SQL and MS Exchange e-mail? Or is it running basically just file/print sharing? Flat file storage, or easy/basic databases? Running as a domain controller?

How large is this network? What I'm starting to get at...picture doing a rebuild...from scratch...how difficult will this be?

To answer your question about size of the drives in a RAID volume....yes you can shove a larger drive into an existing RAID volume....the "extra" space of that larger drive simply won't be used (with most RAID types..there are some unconventional types which can utilize that extra space, but for the purpose of conventional servers....no).
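As a rough illustration of that sizing rule, here is a minimal sketch. It assumes a conventional two-way mirror where the usable size is simply the smallest member; the function names are made up for the example and the controller may reserve a little extra space for its own metadata.

```python
def raid1_usable_gb(member_sizes_gb):
    """Usable capacity of a conventional mirror: the smallest member."""
    if len(member_sizes_gb) < 2:
        raise ValueError("a mirror needs at least two members")
    return min(member_sizes_gb)


def unused_gb(member_sizes_gb):
    """Space on each member that the mirror leaves unused."""
    usable = raid1_usable_gb(member_sizes_gb)
    return [size - usable for size in member_sizes_gb]


# A 500GB original paired with a 1TB replacement.
print(raid1_usable_gb([500, 1000]))  # 500
print(unused_gb([500, 1000]))        # [0, 500]
```

That matches what was seen here: the 1TB replacement gets treated as a 500GB member.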

What brand of server? I'd consider renewing support on it if possible.

What kind of Symantec backup is it? Does it do image based backups? If so...might consider doing a full backup, confirming the image backup, confirming the data is intact, take out the drives, put in a pair of new ones, fresh RAID 1, push image back to it.

Another thing to consider...server is already 5+ years old...IMO not worth putting a ton of money into. Might consider a new server, install a hypervisor..make it easier to import/restore an image based backup. You're going to easily be over a thousand bucks into this thing ....I don't see the value of putting that much cash into an old server.

Another option is some cloning software to clone drive A...and then work from that drive. But to be honest....if a drive is already about to jump off the cliff...sometimes pulling it for the purpose of cloning...the RAID controller might not get along with it or the clone if you plug it back in. And cloning the drive can push it over the edge.

This isn't a situation I get a warm 'n fuzzy from.
 
I would first talk to the client about the data and the value.

If it's critical data I would go ahead and send it to a data recovery place.

If you are going to try to image it, here is a guide:
http://www.technibble.com/guide-using-ddrescue-recover-data/

Just remember that next time it's best to clone it first while the drive is still working. Then try the rebuild.

The server runs and the data is still accessible from Drive A. How do I go about cloning a drive that is used in a RAID 1 array? Would I simply pull Drive A, connect it to another system, and use software to clone it? Once the clone is complete, place the cloned drive back into the system and see if it fires up? What cloning software would you recommend?
 
This is what I was going to recommend as well. R-Studio is very popular amongst data recovery specialists. Personally I would clone to a disk image, duplicate the image, and then restore that image. That way if the source drive dies unexpectedly you have an original and a backup.
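For the "duplicate the image" step, a minimal sketch of how the copy could be checked before you rely on it. This is not anything R-Studio does for you; the function names and file paths are hypothetical, and it assumes the image is an ordinary file on a healthy disk.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so a large image never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def duplicate_and_verify(image_path, backup_path):
    """Copy the image, then confirm both files hash to the same value."""
    shutil.copyfile(image_path, backup_path)
    original = sha256_of(image_path)
    copy = sha256_of(backup_path)
    if original != copy:
        raise IOError("backup image does not match the original")
    return original
```

Keep the returned hash with the images; re-hashing later tells you whether either copy has changed since it was made.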
 
First...confirm those backups are good. What kind of data is on this server? Is it running more complicated databases like SQL and MS Exchange e-mail? Or is it running basically just file/print sharing? Flat file storage, or easy/basic databases? Running as a domain controller?
This is a file server with no special SQL or Exchange. The network has 18 computers and this 1 server, which runs as a domain controller. It's a custom-built Nobilis server that was built by a different company roughly 5 years ago. Replacing the server is something I've been contemplating, and I will be putting together a proposal. I'd rather go this route, but I still need to protect them in the meantime. They were thinking of upgrading in the first quarter of next year.


What kind of Symantec backup is it? Does it do image based backups? If so...might consider doing a full backup, confirming the image backup, confirming the data is intact, take out the drives, put in a pair of new ones, fresh RAID 1, push image back to it.
The latest good image would be a month old; the client would be fine with that. I'm not sure I'm comfortable with them losing a month of data.

Another thing to consider...server is already 5+ years old...IMO not worth putting a ton of money into. Might consider a new server, install a hypervisor..make it easier to import/restore an image based backup. You're going to easily be over a thousand bucks into this thing ....I don't see the value of putting that much cash into an old server.
Replacing/upgrading the server with Server 2012 is what I would like to do, and I will be looking to sell them one, especially with this hard drive situation. I've never used a hypervisor, though.

Another option is some cloning software to clone drive A...and then work from that drive. But to be honest....if a drive is already about to jump off the cliff...sometimes pulling it for the purpose of cloning...the RAID controller might not get along with it or the clone if you plug it back in.
What software would you suggest using to make the clone of drive A?
 
Restoring from a month ago will end up with broken active directory relationships between the server and a good number of workstations (the trust). But it's an option...granted one that involves work (remove those workstations from the domain, and join again...things will fall back into place).

Nobilis/Equus stuff...yeah we see that in lots of dental offices, we even have a 2 or 3 year old one we just brought back from a client (along with over a dozen workstations) that we replaced all that gear with HP stuff. Yeah unfortunately was similar....SATA drives RAID 1. UGH!

Ditto on R-Studio...we purchased that for some situations in the past.
 
If I didn't have the tools I have available to me now, I'd do the following:

1. Use ddrescue to get the cleanest copy of both drives, to fresh new, known healthy drives (if possible)
2. Compare the clones to figure out which drive has the most current data on it
3. If the drive with the current data was cloned without any read errors within the files, I'd try to get the system working with it
4. Add a new drive and rebuild the mirror from the working clone
5. If the clones aren't 100% clean, I'd set them aside, put two new drives in the system, build from scratch and import data and settings (much longer path, but a good excuse to set up things the right way, with a solid backup routine in place)
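For step 2, a minimal sketch of one way to see where two clones disagree, assuming both have been saved as image files. The function name and the 512-byte block size are assumptions for the example. It only shows where the images differ; deciding which side is more current still means mounting the images and looking at the files in those areas.

```python
def differing_blocks(path_a, path_b, block_size=512):
    """Return indexes of blocks whose contents differ between two images."""
    diffs = []
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        index = 0
        while True:
            block_a = a.read(block_size)
            block_b = b.read(block_size)
            if not block_a and not block_b:
                break  # both images exhausted
            if block_a != block_b:
                diffs.append(index)
            index += 1
    return diffs
```

On a healthy mirror the list should be empty or close to it; long runs of differing blocks suggest one drive dropped out of the array some time ago.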

I don't recommend cloning with any software that doesn't log and allow you to re-read unreadable sectors. You may only have one or two kicks at the can and if the good drive fails mid-stream, you are screwed.
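To show what "log and re-read" means in practice, here is a minimal sketch of the idea, not a replacement for ddrescue, which handles far more (read ordering, shrinking block sizes, a persistent map file). The function name, parameters, and block size are made up for the example.

```python
import os


def copy_with_log(src_path, dst_path, block_size=4096, retry_offsets=None):
    """Copy src to dst block by block and return the offsets that failed.

    Pass the returned list back in as retry_offsets to re-read only the
    blocks that could not be read on an earlier pass.
    """
    failed = []
    # Reopen an existing destination without truncating it, so a retry
    # pass fills in gaps instead of wiping what was already recovered.
    mode = "r+b" if os.path.exists(dst_path) else "w+b"
    with open(src_path, "rb", buffering=0) as src, open(dst_path, mode) as dst:
        size = src.seek(0, os.SEEK_END)
        if retry_offsets is None:
            offsets = range(0, size, block_size)
        else:
            offsets = retry_offsets
        for offset in offsets:
            try:
                src.seek(offset)
                data = src.read(min(block_size, size - offset))
            except OSError:
                failed.append(offset)  # note the bad block and move on
                continue
            dst.seek(offset)
            dst.write(data)
    return failed
```

A first call covers the whole source; a second call such as `copy_with_log(src, dst, retry_offsets=failed)` goes back over only the logged offsets, so a failing drive isn't asked to re-read the areas that were already copied.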
 