Supermicro server X11SPL-F - RAID OS drive failed

Rigo

One of the OS SSDs failed completely; the one in the first bay is also slightly degraded but probably OK.
I created a clone of the still-good one, but that failed to load the OS with a "Reboot and select proper boot device" error message.
I can see that the drive is detected during POST.
I assume it's a RAID issue.
Putting back the original SSD1, I still get the same error.
Reading from: https://www.supermicro.com/support/... (the linked FAQ says that once the array is built, the red LED should turn off).
It appears the RAID could rebuild when hot-plugging the replacement drive; I assume the OS needs to be running for that to happen?
Or do I plug in the replacement drive after POST fails to load the OS drive?
Or is there some other way? Open to suggestions.
Thx folks
 
I'm pretty sure on my Dell servers they rebuild at the BIOS level, but they have real RAID cards, meaning the RAID card itself does the work. Looking at the specs for the board indicates the RAID is handled by the onboard C621 chipset. That means it's most likely a software RAID, so the OS does the heavy lifting. If you can't boot to the OS, you can't do anything. Assuming (and we all know what that means) they are mirrored, you might be able to repair the OS. Which OS?
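If it turns out to be a Linux software mirror (mdadm), a minimal sketch of checking the array and hot-adding a replacement could look like this; the device names /dev/md0 and /dev/sdb1 are hypothetical placeholders, not values from this thread:

```python
# Minimal sketch, assuming a Linux software (mdadm) mirror.
# /dev/md0 and /dev/sdb1 are hypothetical placeholders for the real devices.
import subprocess

def mirror_status() -> str:
    """Read /proc/mdstat; a degraded mirror shows up as [U_] or [_U]."""
    with open("/proc/mdstat") as f:
        return f.read()

def hot_add(array: str = "/dev/md0", disk: str = "/dev/sdb1") -> None:
    """Add a replacement member; mdadm then resyncs the mirror in the background."""
    subprocess.run(["mdadm", "--manage", array, "--add", disk], check=True)

if __name__ == "__main__":
    print(mirror_status())
```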
 
I'm pretty sure on my Dell servers they rebuild at the BIOS level, but they have real RAID cards, meaning the RAID card itself does the work. Looking at the specs for the board indicates the RAID is handled by the onboard C621 chipset. That means it's most likely a software RAID, so the OS does the heavy lifting. If you can't boot to the OS, you can't do anything. Assuming (and we all know what that means) they are mirrored, you might be able to repair the OS. Which OS?
Might get an answer about the OS 😏
 
If there are only 2 drives, then your choices are RAID 1 or RAID 0. If it's RAID 1, then because it's software RAID you'll have to get the OS working to rebuild the array, like @Markverhyden said. If it's RAID 0, then you're SOL. Do they have backups?
 
Bold to assume it had any RAID at all. Two drives in the system could be two separate disks, and one of them is just bad now.
 
Bold to assume it had any RAID at all. Two drives in the system could be two separate disks, and one of them is just bad now.
Much depends on the OS. If it's MS, it should automatically detect the soft RAID and show the options. If it's any kind of *nix, it won't do that automatically. When I looked up that mobo, it indicated it was used in storage servers. Those do use the C621 for the 2 rear SATA drives, which are most likely the ones in question. But I do know what assume spells. LOL
 
Much depends on the OS. If it's MS, it should automatically detect the soft RAID and show the options. If it's any kind of *nix, it won't do that automatically. When I looked up that mobo, it indicated it was used in storage servers. Those do use the C621 for the 2 rear SATA drives, which are most likely the ones in question.
The above link indicates hardware-level rebuilding, not software rebuilding.

So the question boils down to whatever the RAID utility reports when accessed...

If it was a software mirror, the mainboard won't see anything, and the only requirement for booting the platform is, at worst, removing the faulted drive so the EFI/BIOS boot order will settle on the remaining disk.

Then you'd attach a new disk, use Disk Management to import it into the array, and watch Windows rebuild it.
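As a rough sketch of that Windows-side step (assuming a dynamic-disk mirror; the volume letter and disk number below are examples, not values from this thread), the rebuild can also be driven through diskpart:

```python
# Sketch only: re-adding a replacement disk to a Windows dynamic-disk mirror.
# Volume letter (C) and disk number (1) are examples, not values from this thread.
import os
import subprocess
import tempfile

DISKPART_SCRIPT = """\
select volume C
add disk=1
"""  # diskpart's 'add' mirrors the selected volume onto disk 1 and resyncs

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write(DISKPART_SCRIPT)
    script_path = f.name
try:
    # /s runs diskpart non-interactively; requires an elevated prompt
    subprocess.run(["diskpart", "/s", script_path], check=True)
finally:
    os.unlink(script_path)
```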

But that doesn't appear to be the case here with the X11SPL-F mainboard, as seen here: https://www.supermicro.com/en/products/motherboard/x11spl-f

That mainboard reports the Intel C621 controller, which can do RAID 0, 1, 5, and 10. If I recall correctly these controllers were soft RAID, where the driver configured Windows to do the lifting. There should be some utility in the BIOS to control its behavior.

BUT that's where, in my mind, RAID was never configured: because again, if it was a RAID 1 and a single drive failed, this chipset would simply disable the dead drive and the platform would boot to the 2nd disk. The fact that it doesn't boot tells me the drives were never configured in a redundant way, and the server is dead.

The chipset is 7 years old... and the platform has already consumed an SSD. I question if it's young enough to be worth fixing, too.
 
The above link indicates hardware-level rebuilding, not software rebuilding.

BUT that's where, in my mind, RAID was never configured: because again, if it was a RAID 1 and a single drive failed, this chipset would simply disable the dead drive and the platform would boot to the 2nd disk. The fact that it doesn't boot tells me the drives were never configured in a redundant way, and the server is dead.
@Rigo's original outside link points to a totally different server and motherboard. That post references a SuperServer 6023P-8R, which supposedly uses an X5DP8-G2 dual Xeon mainboard: https://www.supermicro.com/manuals/superserver/2U/MNL-0688.pdf. Not to mention the post is dated '05.

But your second point is very valid. I've seen many times where a software RAID 1 boot volume has one failed drive and still boots to the OS. But he did not indicate what he saw, if anything, when he cloned the drive.
 
But your second point is very valid. I've seen many times where a software RAID 1 boot volume has one failed drive and still boots to the OS. But he did not indicate what he saw, if anything, when he cloned the drive.
Yep, and therefore we're stuck. The inconsistency of the provided details aside, we don't know what RAID level was configured, if any was configured at all, nor what the working drive configuration on this platform was prior to the fault.

I really hope I'm wrong, but my gut is positively screaming... get the backups... she's dead, Jim!
 
Thank you all for the input, folks.
I got a bit more info in the attached pics.
There were a total of 8 drives installed:
2x SATA 240 GB SSDs - the OS, I was told
6x SAS HDDs - storage, I assume
There's a backup drive with a .nbd file, but I'm still trying to find out what was getting backed up.
 

Attachments: Pic1.jpg, Pic2.jpg, Pic3.jpg, Pic4.jpg
The screen grab of the RAID configuration utility is only indicating 4 drives, as two mirrors are configured: one ~500 GB, the other ~5 TB.

Drive Group 0 showing as "RAID 1" is really odd... it would make more sense for that to be RAID 10 or RAID 5. I wonder if it's reporting RAID 1 but then doing RAID 0 separately? That is valid too... but again, odd.

But you did confirm that's a MegaRAID controller, which means the hardware controls the rebuild process. 100% of your issue is defined within the first screenshot if a drive failure has happened. Sadly, this is indicative of a multiple-drive failure...
 
If that's the case, would that be correctable?
No, that would be by design.

RAID 10 is RAID 1 (mirror) that is RAID 0'd (striped).

So assuming there are 4 drives in a RAID 10 configuration, it's technically correct to report two RAID 1s that are in turn RAID 0'd together... because that's what RAID 10 is, and why in some old literature you'll see it referred to as RAID 1+0. For that matter, TECHNICALLY RAID 0+1 is possible too. Though why you'd do that... makes my head hurt.
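A toy sketch of that layout (purely illustrative; the disk names d0..d3 are made up, and this is not how the controller actually addresses blocks):

```python
# Toy model of RAID 10: a RAID 0 stripe across two RAID 1 mirrors.
# Disk names d0..d3 are illustrative only.
MIRRORS = [("d0", "d1"), ("d2", "d3")]  # two RAID 1 pairs

def raid10_copies(block: int) -> tuple[str, str]:
    """Map a logical block to the two disks holding its copies."""
    return MIRRORS[block % len(MIRRORS)]  # RAID 0 picks the mirror;
                                          # RAID 1 duplicates the write

for b in range(4):
    print(f"block {b} -> {raid10_copies(b)}")
```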

Anyway... with a RAID 10 array, you're still OK as long as you don't lose both halves of the same mirror.

But if the thing won't BOOT, that means the boot mirror is having issues, and that's the larger concern. That should be the two smaller drives... BUT you said 2x 240 GB disks... and that screen clearly reports ~500 GB. But it also reports RAID 1... that's very confusing.
 
In the original setup, yes.
The pic was probably taken after the complete death of one of them.
Would that maybe explain the other one not showing up, since that RAID no longer exists?
Drive Group 1 is mirrored, so it can't be the boot drives. You can't get ~500 GB from two mirrored 240 GB drives. I'd bet the 240s were installed in the back, which means they should be using the C621 soft RAID for the boot drives.
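A quick back-of-the-envelope check of that, in a throwaway snippet (sizes in GB, taken from the thread):

```python
# Rough usable-capacity check; sizes in GB.
def usable(level: str, disks: list[int]) -> int:
    if level == "raid0":
        return sum(disks)       # striping: all capacity, no redundancy
    if level == "raid1":
        return min(disks)       # mirroring: smallest member only
    if level == "raid10":
        return sum(disks) // 2  # half of raw, assuming equal-size pairs
    raise ValueError(level)

print(usable("raid1", [240, 240]))  # 240 -> two mirrored 240s can't show ~500 GB
print(usable("raid1", [500, 500]))  # 500 -> consistent with Drive Group 0
```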
 
@Rigo, @Markverhyden brings up a great point I hadn't considered... I haven't worked with the C621 specifically, and it's entirely possible that the platform has TWO RAID controllers: the Intel and the MegaRAID.

Does the system have a RAID expansion card in it? You'll have to open the case, trace the cables connecting the drives to either the mainboard or a card, and see how many devices you're working with.
 
Would live-ejecting the only still-running OS drive corrupt the RAID?
Apparently the end-user did so when the server started throwing warning beeps about the failed drive.
To achieve what? That's the question 🤔
 