Strange way to fix a server - break the RAID!

AndyM

Just a project that I had in recently, but I thought I'd share.

Dell T110 server running Server2008 Foundation x64. RAID 1 on an S100 controller.

For the life of me, I could not get this server to boot. Tried using an MSDaRT Win 7 x64 CD (a Server 2008 DaRT CD is something I have yet to create), but it wouldn't see the drives. Could not download the driver from the Dell website - it's simply not listed. So one thing led to another, and I removed one of the drives to slave into my tech PC. Recognised without any problem, but no partitions - strange :confused: Anyway, as it was RAID 1, and all I need is the data (the server is not in a production environment anymore), I thought I'd just power the server on with a broken RAID 1 array. It booted! Performed a disc image, and now the image and the server are safe. Haven't bothered to run diags on the drive that was removed, as I know I won't need to revisit this project for at least a couple of months, and as said, I only need the data, not a working server.

Andy
 
Had the same thing with a crappy Acer server running Intel fake RAID 1. The server was having performance issues, mainly disk read/write speed. Anyway, all of a sudden it stopped booting and just hung or crashed on startup. I broke the RAID and it booted fine; turned out disk 2 was failing.
 
Yeah, "Breaking the mirror" was a pretty common need with software RAID...when done within the OS. Had to do that plenty of times.

Seen messed up RAID on fake-RAID controllers quite often also....one of the many reasons I hate entry level RAID controllers with SATA drives on servers.
 
The above-mentioned process may have worked, but it was not the safe approach when it comes to protecting the client's data. If your client's data isn't worth the time to play it safe, then knock your socks off, but be sure to have them sign a waiver saying that the data on the drives is not worth more than $300....this way, if and when things go south, they can't come back and sue you for the cost of the data and the cost of downtime.

Anyway, assuming that the data has any value at all, you should have removed both drives and cloned each one with ddrescue (or something better that can handle bad sectors), giving you a full backup and testing every sector on both drives. Then, once you have the clones, check the file system on each clone, determine which drive was last online, and determine how far out of sync the drives are. Once you know the condition of the drives and which one has the most current data, proceed as needed.
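To give a rough idea of the "how far out of sync" check, here is a minimal Python sketch that compares two clone images block by block. The function name and the image file names in the usage note are made up for illustration, and it only counts differences. It cannot tell you which clone is newer; that still comes from checking the file system on each clone.

```python
def compare_images(path_a, path_b, block_size=1024 * 1024):
    """Compare two clone images block by block.

    Returns (differing_blocks, total_blocks, first_diff_offset).
    first_diff_offset is None when the images are identical.
    """
    differing = 0
    total = 0
    first_diff = None
    offset = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            chunk_a = a.read(block_size)
            chunk_b = b.read(block_size)
            if not chunk_a and not chunk_b:
                break  # both images fully read
            total += 1
            if chunk_a != chunk_b:
                differing += 1
                if first_diff is None:
                    first_diff = offset
            offset += block_size
    return differing, total, first_diff
```

Something like compare_images("disk0.img", "disk1.img") on the two ddrescue clones would do. A handful of differing blocks suggests the mirror was nearly in sync; large differing regions suggest one member dropped out of the array a while ago.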
 
What are your favorite hardware RAID controllers? Is the Perc H370 any good? I have one that I think is having problems mirroring... Which leads me to ask, what are the symptoms of a bad RAID controller?
 

Dell's mid to upper range ones, and my favorite ones...HP's SmartArray controllers...those IMO are the best.

RAID controllers themselves rarely go bad...there are many weird symptoms that you'll never come across again...so hard to make a list of typical symptoms. 99% of the time it's a drive that tanks. And with a hardware RAID controller..you have no need to break a sweat...you just swap out the failed drive with a fresh one and let her rebuild. Or if the server is acting up, as Andy did, just pull the tanked drive...and the server will usually boot up no problem. With lesser grade RAID controllers sometimes a tanked drive will cause it to hang. Sometimes just reseating a drive will allow it to come back fine.
 
NEVER just swap out a failed drive on a RAID controller (software or hardware) without confirming that the data is 100% backed up and readable. I get a huge number of RAID recoveries in where the rebuild of the one drive causes another drive to go offline when it hits a bad sector on that other drive. In order to rebuild a failed drive, every sector on every remaining drive must be read, the sectors XOR'd and that data written to the new drive. If one sector is bad on another drive, you just caused the array to go offline and potentially started a downward spiral of serious data loss.
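To illustrate why a single bad sector can stop a rebuild, here is a simplified Python model of an XOR rebuild. It is only a sketch: the names are hypothetical, drives are modelled as lists of sectors with None standing in for an unreadable sector, and real controllers differ in how they react to a read error during a rebuild.

```python
from functools import reduce


class BadSectorError(Exception):
    """Raised when a surviving member cannot supply a sector."""


def xor_blocks(blocks):
    # XOR the blocks together byte by byte.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_member(surviving, sector_count):
    """Rebuild the missing member from all surviving members.

    surviving: list of drives, each a list of sectors (bytes),
    with None marking a sector that cannot be read.
    """
    rebuilt = []
    for sector in range(sector_count):
        blocks = []
        for index, drive in enumerate(surviving):
            data = drive[sector]
            if data is None:
                # Every surviving sector is needed; one failure is fatal here.
                raise BadSectorError(
                    f"drive {index}, sector {sector} is unreadable"
                )
            blocks.append(data)
        rebuilt.append(xor_blocks(blocks))
    return rebuilt
```

With a two-drive mirror there is only one surviving member, so the XOR reduces to a straight copy, but the same rule holds: the rebuild needs every sector of the remaining drive to be readable.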

That said, I don't have a preference between Dell, HP or IBM...though, Dell and IBM are usually a lot easier to recover. Although the idea of 2 drive redundancy is nice, RAID 6 is much more complex to recover from and I don't recommend it either.
 
NEVER just swap out a failed drive on a RAID controller (software or hardware) without confirming that the data is 100% backed up and readable.

Actually, there was a customer backup prior to doing this. If everything would have failed, and their backup wasn't readable, I would just have taken it to Sean at PCImage. Worth the 5 minute drive :)

Andy
 
Not my first year on the job...we can skip the recovery sales pitch...and look, I can type without yelling in caps.
Not a sales pitch. In fact, it is contrary to a sales pitch as I'm trying to ensure that my services aren't needed.

That said, I'm sure that you know and do check that the data is backed up first. However, you didn't mention it, and your post reads as if it isn't needed. You would be amazed at how many times I get botched RAIDs and the tech or user says, "I read it on the internet that it was safe."

So, I'll risk being offensive in hopes to help save users from unnecessary data loss.
 
Good on you for checking the backup. It is such a simple step, frequently ignored.
 
Getting back to the original problem... Are you sure it's a RAID 1 and not a RAID 0 configuration?

You'd be shocked at how often that mistake is made when the controllers are configured.

However, some controllers do offset the start of the array volume to make room for metadata. So the MBR may actually be at sector 2048 or something like that, which Windows won't recognize outright.
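One rough way to check for an offset like that is to scan the first few thousand sectors of a clone image for the 0x55 0xAA boot signature that sits in the last two bytes of an MBR or boot sector. This is a minimal sketch assuming 512-byte sectors; the function name is hypothetical, and the hits are only candidates, since file system boot sectors carry the same signature.

```python
SECTOR_SIZE = 512               # assumption: 512-byte logical sectors
BOOT_SIGNATURE = b"\x55\xaa"    # last two bytes of an MBR or boot sector


def find_boot_sectors(image_path, max_sectors=4096):
    """Return the sector numbers (LBAs) that end with the boot signature."""
    hits = []
    with open(image_path, "rb") as image:
        for lba in range(max_sectors):
            sector = image.read(SECTOR_SIZE)
            if len(sector) < SECTOR_SIZE:
                break  # reached the end of the image
            if sector[510:512] == BOOT_SIGNATURE:
                hits.append(lba)
    return hits
```

Run it against an image of the drive rather than the drive itself. If nothing turns up at sector 0 but something does at 2048 or similar, that points to a controller offset rather than a blank disk.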

If all you need back is the data, try scanning the drive with some data recovery software like R-Studio and see if it discovers a file system.
 
Getting back to the original problem... Are you sure it's a RAID 1 and not a RAID 0 configuration?

Yes. If it was RAID 0, it wouldn't have booted from a single drive :)

If all you need back is the data, try scanning the drive with some data recovery software like R-Studio and see if it discovers a file system.

Didn't need to do this. The system booted from a single drive, and I just did an image using Active Image Protector. Come to think of it, I should really take another image just in case, and it's all chargeable to the customer, so no worries there either :) But thanks for the tip on R-Studio - fantastic program :)

Andy
 
Good on you for checking the backup. It is such a simple step, frequently ignored.

In my younger days, I used to be heavily involved in D.R. testing for a major UK financial. IBM ES9000's, AS400's, DEC VAX and NT4 servers. One thing we knew - you can never have too many backups! D.R. Testing on the ES9000's used to take 5 days at a dedicated DR Centre, AS400's took 3 days, and the DEC VAX only took 2 days.

Andy
 