Problem with HP ProLiant ML150G6 - RAID, Backup

BadBoy House

Hi all.

I've got a new client who has an HP ProLiant ML150G6 server running SBS 2008.

They've been having some RAID and Backup related issues.

I've not experienced RAID issues before on HP servers so any help on this would be greatly appreciated.

On Tuesday and Wednesday night this week their server crashed (I believe during the daily Windows Server Backup routine).

After rebooting the server further to Tuesday night's crash, the RAID details on first boot gave the following details:

B110i Sata Raid Controller
1786 -slot 0 - Drive Array Recovery Needed

The following disk drives need automatic data recovery rebuild. Port 1i - box 1 - bay 1

Select F1 for recovery of data to drives
Select F2 to continue without recovery of data to drives.

I selected F1 and the server booted back up fine. It was ok until the following night.

After rebooting the server further to Wednesday night's crash, the RAID details on first boot gave the following details:

HP Smart Array B110i SATA Controller (v1.10) 1 Logical Drives

1779 - Slot 0 Drive Array - Replacement Drive(s) detected or previously failed drive(s) now appear to be operational.

Port 1i - Box 1 - Bay 1
Port 2i - Box 1 - Bay 2

Logical drive(s) disabled due to possible data loss.

Press F1 to continue with logical drive(s) disabled
Press F2 to accept data loss and continue

I pressed F2 and the server booted up fine and as normal.

Last night (Thursday) the server did not crash; however, the daily Windows Server Backup failed approximately 80 minutes into the backup. They back up to external USB backup drives (WD Elements).

The error message that was given by the backup program was:

'Failed - Incorrect Function'

The Backup event log reported the following error approximately 80 minutes into the daily backup:

Backup started at '27/09/2012 18:00:11' failed with following error code '2147942401'.
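For reference, that decimal error code can be decoded by hand. A minimal Python sketch, assuming the value is a standard 32-bit Windows HRESULT (the helper name `decode_hresult` is made up for illustration):

```python
def decode_hresult(value):
    """Split a 32-bit HRESULT into its failure bit, facility and code fields."""
    value &= 0xFFFFFFFF
    return {
        "hex": "0x%08X" % value,
        "failure": bool(value >> 31),       # top bit set means failure
        "facility": (value >> 16) & 0x7FF,  # 11-bit facility field
        "code": value & 0xFFFF,             # low 16 bits: the underlying error code
    }

info = decode_hresult(2147942401)
print(info)
```

The value works out to 0x80070001. Facility 7 is the Win32 facility, and Win32 error 1 is ERROR_INVALID_FUNCTION ("Incorrect function"), so the event log code and the 'Incorrect Function' message from the backup program appear to be the same error in two forms rather than two separate problems.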

As I've done each day since these problems began, I ran the HP Array Diagnostic Utility and the HP Array Configuration Utility to see if any errors, warnings, or problems were detailed but I could find no problems at all.

Neither utility mentioned any errors - the Array Configuration Utility had no warnings, Array Status ok etc. There is nothing that I can see in either utility that would indicate any problems.

At this point I'm not sure whether the problem with the daily backup failing is because of a RAID issue or something else entirely. Unfortunately the server is out of warranty.

Any ideas on this?? I'm going to get them to change the backup drive to rule out a problem with the external backup drive being used.
 
Ugh.....painful server to take on for a client.

First...I do not believe in "100" series servers....from HP, or from Dell. Real servers start at the 300 class...like the ProLiant ML350. 100 series are just glorified desktops trying to run as a server. Onboard "fake RAID"...and SATA drives...hardware that has no business trying to run a server. I have just one or two exceptions for SATA drives on a server...but SBS is certainly not one of them.

OK rant over....I know, doesn't help you with your problem.

From experience of being put into similar issues such as you have...taking on a new client that has an underspec'd server...and running SBS....I'll make a few bullets for you to consider.

*Update BIOS of the server
*Update firmware of the RAID controller if this model allows that
*If this model has a Windows based RAID manager...update that.

Now...it looks like you have a single RAID 1 volume. SBS may be installed entirely on a C drive...or that volume may be split up into two partitions...C and D or C and E, whatever. I have a huge need for SBS to be installed across two different spindles. This means two different HDDs....or two different RAID volumes. A RAID 1 volume is still just 1 spindle. So the cheapest RAID setup to have two spindles is two RAID 1 volumes. Next ideal would be RAID 1 for the C drive and RAID 5 for the data...that is still two spindles. The reason is concurrent drive usage of two different partitions...performance goes up many times by having C and D across two different spindles.

I also dislike SATA...I'd say 90% of the server problems that are drive related that I have to deal with...they're on cheaper SATA drives, notably with fake RAID controllers. SBS is "very heavy" on the drives...so having it on a weaker drive setup...and you're more likely to frequently deal with issues.

Trying to sell the client a new server is not realistic. However, you can express to him the need to "beef up" the server to give more reliability. What I have done, when coming across weak servers like this...is upgrade the hard drives and RAID setup. You have a pair of drives in RAID 1. I'd purchase 4x new drives...and create two RAID 1 volumes. Get some good enterprise grade SATA drives like Western Digital Black Edition or better yet their RE series...with 64 megs of cache.
Replace the first drive of the existing RAID 1...let it rebuild. The next day, once you've confirmed the mirror has rebuilt, yank the second existing drive, replace it...let the RAID 1 rebuild. Now take your third and fourth brand new drives...put them in...and create a brand new RAID 1 volume...so you now have two RAID 1 volumes.

I'm guessing that your SBS install is on two partitions...so clone that second partition to the new RAID 1 volume you made. Now your second partition will be on a second spindle. Ensure that the server's pagefile is Windows managed and on BOTH C and D.

Run a checkdisk on both volumes.
Ensure that the antivirus has all of the correct exclusions in its real time file protection, and is setup correctly for small business server. I see this overlooked a lot..and it causes issues. SBS has a HUGE long list of files/directories to exclude...as well as slim down the file extension types to scan. Most antivirus installs don't do all of this correctly by default...and many techs miss this. I have a thread around here listing those settings.

Your server should perform quite a bit better now, and be less likely to encounter drive/RAID related issues. My hunch is that your backup issues are related to VSS issues...and that probably came from file corruption from the crash. You can just run a checkdisk and defrag and it will probably go away...and then return in a few months. Or...you can go through the steps above to do your best to ensure it doesn't come back.
 
Thanks for replying.

I'm the same - I tend to stick with Dell servers if I'm supplying - T410's normally.

Anyway,

It's a single raid volume and everything is installed on the C: drive. One big partition. Not sure why it was not split between C: and D: drive when originally installed.

They're only a very small business - 4 workstations plus the server.

No new RAID volumes have been made. Up to this point it's all been F1 or F2 at start-up to get the o/s to load.

The fact that the HP Configuration and Diagnostics Utilities have not reported or shown any errors whatsoever makes me doubt there is an actual problem with the RAID array.


I need to rule out the external backup drive first - I'll get the drive changed over with the other one they use and let a backup run over night.

If it still fails then it must be an issue with the drives somewhere. Is it safe to run a checkdisk on a drive that's part of a RAID array?

thanks again.
 
I've got an old ML 3 series in the office I retired from a client several years ago. For the last year it was in production, you had to hit the F2 on every boot. Finicky, but it worked ok.
 
The alerts could be false....and sometimes updating the firmware on RAID controllers fixes those false "cries wolf" alerts. Like there was a bug. That's why a firmware check was one of my suggestions.
Also, with many servers you can update the firmware on the drives themselves.

You can change SBS to utilize a second volume...move the infostore and user shares/redirects and apps shares over to it after you've added a drive.

Checkdisk on RAID is fine. The OS doesn't know it's on multiple drives; the idea of a hardware RAID controller is to present a single drive to the OS. The RAID controller doesn't know or care what OS it's hosting. The OS doesn't care what/how many drives are put together to make that RAID volume...it just sees one big hard drive and treats it as such. Yes, even "fake RAID".

It's best to run hardware tools and diags that the RAID controller has first...such as consistency checks, etc.
 
The server hasn't crashed for the last few days but I've pinpointed the problems to the backup routine that runs every night at 7PM. The backup has been failing each night pretty much at the start of the backup.

The errors generated are as follows:-

Error reported by Windows Server backup:
Failed - Creating the shared protection point on the source volumes failed.
Detailed Error: The volume shadow copy operation failed with error 0x80042306.
The volume of the backup was stopped before the backup started running.

Errors reported by Event Viewer Backup Log:
Event ID 9:
Failed - Creating the shared protection point on the source volumes failed.
Detailed Error: The volume shadow copy operation failed with error 0x80042306.

Errors reported by Event Viewer System Log:
Event ID 28:
The Shadow Copy of volume C: could not be created due to a failure in creating the necessary on disk structures.


This seems to point towards a problem with the shadow copy process.
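As a quick cross-check, the hex code in those messages can be looked up against the documented VSS result codes. A small Python sketch; the table is deliberately incomplete and the helper name `describe_vss_error` is hypothetical:

```python
# Small, incomplete lookup of VSS result codes; extend as needed.
VSS_CODES = {
    0x80042306: "VSS_E_PROVIDER_VETO",
    0x8004230F: "VSS_E_UNEXPECTED_PROVIDER_ERROR",
    0x80042316: "VSS_E_SNAPSHOT_SET_IN_PROGRESS",
}

def describe_vss_error(text):
    """Return the symbolic name for a hex VSS error code, if it is in the table."""
    code = int(text, 16)
    return VSS_CODES.get(code, "unknown (0x%08X)" % code)

print(describe_vss_error("0x80042306"))
```

0x80042306 maps to VSS_E_PROVIDER_VETO, meaning the shadow copy provider itself refused or failed the snapshot. That would fit the Event ID 28 message about on-disk structures, though it does not say on its own why the provider failed.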
 
I ran VSSADMIN List Writers from command prompt.

A number of entries have an error state.

Any ideas??


vssadmin 1.1 - Volume Shadow Copy Service administrative command-line tool
(C) Copyright 2001-2005 Microsoft Corp.

Writer name: 'System Writer'
Writer Id: {e8132975-6f93-4464-a53e-1050253ae220}
Writer Instance Id: {06c1536d-5f63-4dcc-a47f-7d37b2998200}
State: [7] Failed
Last error: No error

Writer name: 'FRS Writer'
Writer Id: {d76f5a28-3092-4589-ba48-2958fb88ce29}
Writer Instance Id: {5ddb0aed-129e-4219-8372-b8384aafaa11}
State: [7] Failed
Last error: No error

Writer name: 'SqlServerWriter'
Writer Id: {a65faa63-5ea8-4ebc-9dbd-a0c4db26912a}
Writer Instance Id: {1d2e555c-a9de-40a3-9f6a-298e197f8c83}
State: [7] Failed
Last error: No error

Writer name: 'SharePoint Services Writer'
Writer Id: {c2f52614-5e53-4858-a589-38eeb25c6184}
Writer Instance Id: {3b6cfa18-ffa7-4909-ba0e-fae0ccded1c3}
State: [7] Failed
Last error: No error

Writer name: 'ASR Writer'
Writer Id: {be000cbe-11fe-4426-9c58-531aa6355fc4}
Writer Instance Id: {8b756676-9a74-453f-881d-5a5c9fd17023}
State: [1] Stable
Last error: No error

Writer name: 'FSRM Writer'
Writer Id: {12ce4370-5bb7-4c58-a76a-e5d5097e3674}
Writer Instance Id: {e91b36fe-33f5-4245-8796-ac3558c6714e}
State: [7] Failed
Last error: No error

Writer name: 'Microsoft Exchange Writer'
Writer Id: {76fe1ac4-15f7-4bcd-987e-8e1acb462fb7}
Writer Instance Id: {14ea807e-6057-49db-8350-4e7b7446e2db}
State: [7] Failed
Last error: No error

Writer name: 'TS Gateway Writer'
Writer Id: {368753ec-572e-4fc7-b4b9-ccd9bdc624cb}
Writer Instance Id: {9295cea8-b033-4615-ad2a-4186029955a8}
State: [7] Failed
Last error: No error

Writer name: 'SPSearch VSS Writer'
Writer Id: {57af97e4-4a76-4ace-a756-d11e8f0294c7}
Writer Instance Id: {9dc80630-98bd-4ce8-90df-90dd065ef9e8}
State: [7] Failed
Last error: No error

Writer name: 'IIS Metabase Writer'
Writer Id: {59b1f0cf-90ef-465f-9609-6ca8b2938366}
Writer Instance Id: {6b9decb3-3e3f-44e5-9ae6-b3f77292e2ae}
State: [7] Failed
Last error: No error

Writer name: 'BITS Writer'
Writer Id: {4969d978-be47-48b0-b100-f328f07ac1e0}
Writer Instance Id: {43b49b25-af5d-44d7-8105-f0d9027f86e6}
State: [1] Stable
Last error: No error

Writer name: 'IIS Config Writer'
Writer Id: {2a40fd15-dfca-4aa8-a654-1f8c654603f6}
Writer Instance Id: {62bbf8c8-c9cb-4b1d-a91d-0ef7c894a2d9}
State: [7] Failed
Last error: No error

Writer name: 'NPS VSS Writer'
Writer Id: {35e81631-13e1-48db-97fc-d5bc721bb18a}
Writer Instance Id: {50b9b650-e7f6-4ed4-b5da-cea33e466625}
State: [7] Failed
Last error: No error

Writer name: 'WMI Writer'
Writer Id: {a6ad56c2-b509-4e6c-bb19-49d8f43532f0}
Writer Instance Id: {a1c49899-649e-401a-ad2a-1f2d33ccbe1a}
State: [7] Failed
Last error: No error

Writer name: 'Certificate Authority'
Writer Id: {6f5b15b5-da24-4d88-b737-63063e3a1f86}
Writer Instance Id: {729876a8-a1d4-4273-a688-2f1999dc4ec9}
State: [7] Failed
Last error: No error

Writer name: 'Shadow Copy Optimization Writer'
Writer Id: {4dc3bdd4-ab48-4d07-adb0-3bee2926fd7f}
Writer Instance Id: {73af9271-df46-4eb0-8295-42ba6b9f45b5}
State: [1] Stable
Last error: No error

Writer name: 'Registry Writer'
Writer Id: {afbab4a2-367d-4d15-a586-71dbb18f8485}
Writer Instance Id: {828b1f47-6ec2-41f5-bb95-0eeb90fc8a7a}
State: [1] Stable
Last error: No error

Writer name: 'COM+ REGDB Writer'
Writer Id: {542da469-d3e1-473c-9f4f-7847f01fc64f}
Writer Instance Id: {bf46d207-f4df-46a4-9ccb-2846e43b280f}
State: [1] Stable
Last error: No error

Writer name: 'Dhcp Jet Writer'
Writer Id: {be9ac81e-3619-421f-920f-4c6fea9e93ad}
Writer Instance Id: {9e1e61b0-4ffc-494a-aa17-98437c693fe9}
State: [7] Failed
Last error: No error

Writer name: 'NTDS'
Writer Id: {b2014c9e-8711-4c5c-a5a9-3cf384484757}
Writer Instance Id: {56611b9c-d217-4620-98e4-451f9978e131}
State: [7] Failed
Last error: No error
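With a list this long it is easier to filter the output than to read it by eye. A rough Python sketch that pulls out every writer whose state is not "[1] Stable"; the function name `failed_writers` and the shortened `sample` text are illustrative only, and the parsing assumes the line layout shown in the listing above:

```python
import re

def failed_writers(report):
    """Return the names of writers whose state is anything other than [1]."""
    failed = []
    name = None
    for line in report.splitlines():
        line = line.strip()
        m = re.match(r"Writer name: '(.+)'$", line)
        if m:
            name = m.group(1)  # remember the writer until its State line
            continue
        m = re.match(r"State: \[(\d+)\]", line)
        if m and name is not None:
            if m.group(1) != "1":
                failed.append(name)
            name = None
    return failed

# Shortened sample in the same layout as the vssadmin output.
sample = """\
Writer name: 'System Writer'
State: [7] Failed
Last error: No error

Writer name: 'ASR Writer'
State: [1] Stable
Last error: No error

Writer name: 'NTDS'
State: [7] Failed
Last error: No error
"""

print(failed_writers(sample))
```

On the server itself the text could be captured from an elevated prompt, for example with `subprocess.check_output(["vssadmin", "list", "writers"])`, and passed to the same function. Applied to the full listing above, 15 of the 20 writers are in the Failed state and only ASR, BITS, Shadow Copy Optimization, Registry and COM+ REGDB are Stable.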
 
That's fine - I will set it to run chkdsk /r on the next reboot and get them to restart the server when they finish tonight.

Does the chkdsk /r routine save a log anywhere when you set it to run on boot up?
 
chkdsk didn't make any difference - no errors found.

the next backup failed - I've posted the errors below. Looks like I might have to call Microsoft server support £$.

Event Log:
Backup started at 02/10/2012 failed with the following error code '2147942401'

Windows Server Backup Log:
Status: Failed – Incorrect Function


Application Log:
Log Name: Application
Source: Microsoft-Windows-Backup
Date: 02/10/2012 20:10:43
Event ID: 517
Task Category: None
Level: Error
Keywords:
User: SYSTEM
Computer: (their server name)
Description:
The description for Event ID 517 from source Microsoft-Windows-Backup cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event:

EV_RenderedValue_0.00
2147942401
%%2147942401

The locale specific resource for the desired message is not present
 
That......is an odd error I've not seen. The errors you put on the first page of this thread...I've seen those plenty of times. But wow..not this last one.

I've seen a script Microsoft support once used on a server I had them working on remotely, which basically reinstalls all of the shadow copy components and backup components.

Perhaps you can find that script around, or the commands of it just to type in at command prompt yourself....may be worth a shot.
 
Yeah it's not one I've seen before either.

I'm going to give chkdsk /r another go - just to be doubly sure that there are no errors to be fixed.

Failing that I'll be calling MS.
 
RAID 1 can still let a server hang. The RAID card usually only checks checksums on writes, not reads. If there is a bad spot on one drive, it could be getting bad read data and sending it to the OS.

SATA drives and RAID - I just had a "discussion" with HP about this; I could not get a tech who understood the difference between error handling on a consumer drive versus an enterprise drive. Enterprise class drives will not try for 30-60 seconds to read a bad spot on a drive (TLER, Time-Limited Error Recovery); they report the error to the controller and let the controller decide the best action. A consumer drive will sit there and keep trying to read the data, which does not make RAID controllers happy.

What kind of drives are you using for the windows backup? I have learned it is VERY picky about drives/enclosures. I've had 2 servers that gave me absolute fits until I replaced the backup drives and suddenly it worked like a charm.

Good luck, sounds like a fun one.
 
Cheers bluecoast.

The backup drives are Western Digital Elements. Although today I tried a Freecom 250GB drive and the backup failed with that one too.

It's certainly a fun one - I treat problems like this as puzzles that just need figuring out although with this one I'm about out of things to try. Hopefully MS will be able to help!
 
I got the chance to run chkdsk /r on the hard drive yesterday and left it running overnight.

All results ok - no errors or bad sectors reported in any of the 5 stages.

HOWEVER the following message was shown after results of stage 5:

THE SECOND NTFS BOOT SECTOR IS UNWRITEABLE

I'm sure I've read that this area is somewhere that the Windows Server Backup program needs access to - which would explain why the backup keeps failing.

I'm probably going to have to call MS about this one.

Any ideas?
 