RAID failing on ML350p?

freedomit

I have an HP ML350p Gen8 that partially crashed last night. The server has Server 2012 installed with only the Hyper-V role and runs 3 guests. Last night at 2am I got an email from my RMM saying no data received from the host, and at 8am I got the same message for one of the guests; the other two guests were still running fine. I could not ping or remotely connect to the crashed host or guest, so I shut down the working VMs and power cycled the server. Everything is now working as normal.

So checking the event logs, the host and guest both appear to have crashed, as there is nothing in either event log after they went offline. The host has loads of warning messages before the crash...

Event ID: 129
Source: HPCISSs2
Reset to device, \Device\RaidPort1, was issued

Event ID: 153
Source: disk
The IO operation at logical block address 0x88b16f0 for Disk 0 was retried.

The event 129 is always the same, but the 153 events reference different disks and blocks that are part of different arrays. I have run the HP Insight Diagnostics, which all passed, and since rebooting the logs have been clean.
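
To check whether the retries really are spread across several disks, the 153 messages can be tallied per disk. This is only a rough sketch: it assumes the message text has been exported from the event log as plain strings, the function name `group_retries_by_disk` is made up for illustration, and the second sample line is a hypothetical entry, not one from this server.

```python
import re
from collections import defaultdict

# Matches the Event ID 153 message text quoted above.
RETRY_RE = re.compile(
    r"logical block address (0x[0-9a-fA-F]+) for Disk (\d+) was retried"
)


def group_retries_by_disk(messages):
    """Return {disk number: [retried LBAs as ints]} for matching messages."""
    retries = defaultdict(list)
    for msg in messages:
        match = RETRY_RE.search(msg)
        if match:
            lba, disk = match.groups()
            retries[int(disk)].append(int(lba, 16))
    return dict(retries)


sample = [
    # Real message from the log above.
    "The IO operation at logical block address 0x88b16f0 for Disk 0 was retried.",
    # Hypothetical second entry, only here to show the grouping.
    "The IO operation at logical block address 0x1a2b3c for Disk 1 was retried.",
]

for disk, lbas in sorted(group_retries_by_disk(sample).items()):
    print(f"Disk {disk}: {len(lbas)} retried IO(s)")
```

If the retries land on several disks in different arrays, that would point away from a single failing drive and towards something the disks share.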

Any idea what happened?
 
Since it's talking "logical" disks....sounds like it's referencing a RAID volume. And since it's talking about different disks...makes me think higher up the chain than just physical disks.
*The cable bridle that connects 'tween the RAID controller and the drive bay
*The drive cage itself
*The RAID controller itself

May want to update firmware on the RAID controller
Since she's a Gen8...still under support..no?

What does the RAID management utility show when you launch that? Assuming you've got her booted up again. If not...go in through iLO.
 
Thanks for your reply

I'm just downloading the latest HP Service Pack and will install that over the weekend. Yep, the server is still under support; I just wanted to see what others thought before calling HP. At the moment it's stable and running fine, with no further events in the logs, and we have adjusted our RMM to alert if any pop up. The RAID management utility is showing everything as normal.
 
So yeah they'll probably want you to:
*Update BIOS on the Proliant
*Update firmware on the RAID controller
*Update drivers and management utility for the RAID controller
*Update firmware for the HDDs
*Reseat RAID controller, battery cache pack, and the cable bridle from the RAID controller to the HDD cage
*Reseat HDDs

I'd be pretty much demanding a new RAID controller, cables, and HDD cage. Don't see HP RAID controller issues much....but it's not something you want to have repeat a few more times. If the RAID controller isn't doing its job and writing data to disk properly on rude shutdowns...(which they are normally the best at)...you do risk corruption.
 
Since updating the server with the latest HP ProLiant Service Pack I haven't seen this issue again. Fingers crossed it was just a one-off bug.
 