Troubleshooting an intermittently rebooting server ...

thecomputerguy

I have a client who, against all advice, decided to buy himself a piece of garbage server to run his insurance organization with 10 employees. What I recommended to him was upwards of $6000-$7000; he settled on a low low end Dell server with 8GB RAM and 2x500GB HDs in a RAID1 for use as an AD/DC/DNS, Folder Redirection, File sharing, Print sharing & a couple of small databases.

I told him I'd still install whatever he bought but I didn't realize he was going to skimp so hard. Either way I kept my promise and installed it.

About 2 years later now the server just randomly reboots. I asked him what happens that makes him know the server rebooted and he says that the internet goes down, and occasionally everything on his desktop disappears (folder redirection). The server is randomly rebooting between 2-5 times a day, even in the middle of the night at 2AM when no one is using anything.

So I hop into the logs and see that yes the server is rebooting, and after filtering the logs to show critical, error, and warning, what I'm left with is basically ...

A bunch of:
Log size is full
Log type: ESM

A whole bunch of:
Kernel-Power criticals, the server was not shut down properly
Previous shutdown was unexpected at XX:XX

There are some Disk Errors but they are directed at DR2 which I'm pretty sure is an external drive since there are only 3 total drives in the system

DR0 & DR1 should be the RAID1
DR2 Should be the External

Other than that ... nothing else shows up in the log that would identify why it's rebooting.
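
One way to get more out of those Kernel-Power entries is to export them and look at when the reboots happen and how far apart they are. A regular interval or a fixed hour points at something scheduled or power related, while a random spread fits flaky hardware better. Below is a rough Python sketch, assuming the events were exported to CSV with hypothetical column names `TimeCreated` (ISO 8601) and `Id`. Event ID 41 is the Kernel-Power "rebooted without cleanly shutting down" event; note that it is logged when the server comes back up, so the timestamps are boot times, not the exact moment of the crash.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Event ID 41 (Kernel-Power): the system rebooted without cleanly shutting down.
KERNEL_POWER_ID = 41


def unclean_boots(csv_text, event_id=KERNEL_POWER_ID):
    """Return sorted timestamps of matching events from a CSV export.

    Assumes hypothetical column names 'TimeCreated' and 'Id'.
    """
    stamps = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if int(row["Id"]) == event_id:
            stamps.append(datetime.fromisoformat(row["TimeCreated"]))
    return sorted(stamps)


def boots_by_hour(stamps):
    """Count events per hour of day to spot clustering."""
    return Counter(ts.hour for ts in stamps)


def gaps_in_hours(stamps):
    """Hours between consecutive events; a steady gap hints at a pattern."""
    return [(b - a).total_seconds() / 3600 for a, b in zip(stamps, stamps[1:])]
```

Feed it the text of the export, then eyeball `boots_by_hour` and `gaps_in_hours` for anything that repeats.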

Dell OpenManage reports that the server is healthy.

Any hints?

I've thought about manually changing the DNS so that the workstations aren't using the server for DNS to keep the internet going, but I'm not sure how that would affect login times. Then there are the folder redirection issues with a rebooting server.

They said they might just continue to deal with it because I told them I might have to just reformat the server and start from scratch which is going to be pretty expensive, I'd pop new drives in, or buy a new server, they just don't want to spend significant money on the tools that make them money, crazy crazy crazy.
 
If it's an overburdened server (8 gigs and a single SATA-based RAID volume will be), DNS will not respond fast enough for the workstations. When they have an alternate (secondary) DNS server, they will end up querying public DNS, and Active Directory will be broken. The workstations wait only about a second for a response from the primary DNS before they turn and ask the secondary DNS. So the primary DNS rarely gets a chance to do its job when people add public DNS servers as secondary. That's one of the reasons doing that is poor practice.

Reboots out of the blue...gotta be hardware based. Get rid of external peripherals for a while...just to rule out.
Don't feel bad and volunteer your time because this guy squeezes a nickel so hard he makes the buffalo cry. Get Dell support involved right away. If it's out of warranty (cuz he got a cheap 1x year warranty model 13 or more months ago)...purchase an extended warranty.
And go through the routine...update latest BIOS, firmware, drivers, and then run the diag tools. Dunno which model/gen server you had but for example...some here. http://www.dell.com/support/article...-troubleshooting-on-poweredge-servers?lang=EN
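
On the DNS point above, it is easy enough to measure how quickly the server answers instead of guessing. Here is a minimal Python sketch using only the standard library: it builds a bare A-record query and times the reply over UDP. The 1.0 second timeout is an assumption chosen for illustration, and any server IP or hostname you pass in is your own; nothing here comes from the Dell or Windows tooling.

```python
import socket
import struct
import time


def build_query(hostname, query_id=0x1234):
    """Build a minimal DNS query packet for an A record."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack(">HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack(">HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def time_query(server_ip, hostname, timeout=1.0):
    """Return seconds until the server answers, or None on timeout/error."""
    packet = build_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        try:
            sock.sendto(packet, (server_ip, 53))
            sock.recvfrom(512)
        except OSError:  # includes socket timeouts
            return None
        return time.perf_counter() - start
```

Calling `time_query` in a loop from a workstation against the server's IP while the office is busy would show whether replies come back slowly or time out (`None`).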
 
Yeah I'm thinking the same thing ... thankfully no one uses it, but it started happening out of the blue a few months ago. To call it overburdened would be an understatement.
 
First thing I'd do is swap out the RAM. If that doesn't fix it, go from there with stress testing, memory diagnostics, etc. to see what makes it fail. Given that it's failing only 2-5x/day, probably best to have it in shop for that.
 
Yeah except they would be non-operational without it onsite....
 
They need to be told how bad these sudden reboots are for the server, and that corruption WILL occur. Could be just plain files they store, but they could also be running some shared LOB app, or accounting software.
Multiple hard reboots (basically pulling the power cord from the server) will corrupt something. Just a matter of "when".

How is their backup?
Are they prepared to run for a period of time without the server WHEN it does eventually get to the point of not being able to boot up? Missing \system32\config or something like that will likely be showing up on the monitor of the server any day now....
 
It's been mentioned by others but I would also be leaning towards hardware such as RAM or PSU. I would not budge for anything less than 16GB RAM in that rig anyway, so I would push to get that installed and see how it goes. Also, is this hooked up to a UPS?
 
Yep, but it's just a basic BestBuy residential grade home UPS ($120 or so). So are you all saying I'm supposed to have a server laying around that I can virtualize this to?
 
Oof, with so much crappy stuff... I would add that to the list of potentials. So my gut feeling would be RAM, PSU, or UPS now.

Regarding the server... generally I'll keep whatever the latest generation server is that I've replaced (tearing out the old stuff at a client's location) hanging around just in case. I have a Dell R410 in the shop on standby, and it didn't cost me anything. Fortunately I've only had to bring a loaner in once, and that was actually just a dinky desktop. To be honest, in a pinch, a decent business grade workstation can do the job of a hypervisor just fine. It should only be in there for a short period of time, and you should be using a nice snapshot backup system to a spare disk while it is. Hell, I even have Hyper-V installed on my ThinkPad for those really big "I NEED TO GET SOMETHING WORKING NOW" scenarios.
 
This is saying duff RAM to me - and given that it's one of the easiest things to check, I suggest you at least rule it out as a first step.
 
There's another thing to look at. I've seen a UPS with too small a battery for the load keep flipping out and causing reboots. Seen this quite a few times.
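
A quick back-of-the-envelope check for the undersized UPS idea, sketched in Python. The wattages in the usage note are made-up placeholders, and the 80% ceiling is a common rule of thumb, not a figure from any vendor. Use the UPS's watt rating here, not its VA rating, since the watt number is usually the lower of the two.

```python
def ups_load_percent(load_watts, ups_watt_rating):
    """Load as a percentage of the UPS's rated watt capacity."""
    return 100.0 * load_watts / ups_watt_rating


def is_undersized(load_watts, ups_watt_rating, max_load_percent=80.0):
    """True if the load exceeds the chosen ceiling (80% is an assumed rule of thumb)."""
    return ups_load_percent(load_watts, ups_watt_rating) > max_load_percent
```

For example, `is_undersized(350, 330)` flags a hypothetical 350 W load on a 330 W unit, while `is_undersized(200, 500)` does not. Measure the real draw with a watt meter and read the rating off the UPS label before drawing conclusions.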
 
So are you all saying I'm supposed to have a server laying around that I can Virtualize this to?
Yes. One of the reasons you charge a whole lot more for server work is to cover that kind of expense. Everybody says that their computer is top priority. Until they get the bill. Most will never spend that much on a mere workstation, but they will for the server when they realize that not having it shutters the business until it is repaired. I have an old Dell PE2900 that I can deploy in a pinch for situations like that.
 