How to screw up with class - :-)

HCHTech

I had another "learning opportunity" yesterday - haha. I have a very busy Chiropractor's office as a client. They are a tricky client because they are open until 7pm 6 days a week, so the only time available to work on things is way after-hours or Sundays. I don't love this bit, but they are a good customer, pay their bills on time and don't hold the fact that I don't believe in Chiropractic against me (too much).

So, they have 2 servers, a domain controller and a SQL server for their practice management application. Unfortunately, they were not "created" together, so they are about 3 years different in age. As a result when it's time to replace the DC, the SQL server is only 3 years old. The last time we replaced the DC in late 2019, we pitched the idea of doing a HyperV host with only a single guest (the DC), but with enough horsepower to hold a 2nd guest (the SQL server) when the time next came for that box's replacement.

They thought this was a good idea (!) but since they were also buying about 15 workstations at the time, wanted to hold off on the actual bits necessary to provide that extra horsepower at the time of initial purchase. I shouldn't have let them do this, but well, I did. So the plan was to set it up, then in 3 years, purchase and install more RAM, a 2nd Processor, and more disk space; then build the 2nd VM for the SQL server when its time was up in 2022.

Then the pandemic happened.....which resulted in a year's wait to migrate the SQL server. Not to mention that the old SQL server is on 2012, so EOL is here. The HyperV host is luckily on Server 2019, but is now 4 years old so realistically it should be replaced in 2 years. Unluckily, during 2019 I was still building servers, so I'm now stuck maintaining a box (all Intel parts, still have warranty for 1 more year) that is no longer in production.

After finally getting approval to go forward with this plan last month, I purchase the required RAM, CPU and SSDs. The stupid heat sink for that Xeon Silver was the hardest to get, 3 weeks waiting on that. Finally everything is in and I schedule a Sunday afternoon to go install the parts - I hate doing this stuff in situ, but since everything has to be back running by Monday morning, I didn't really have a choice.

I get all of the bits installed, get the server back into the rack, hit the power button and say a little prayer to the tech universe. Ugh - no go. Starts to spin up, then we're down again; no messages on the monitor. BMC says "power issue". Not a failed supply, just "issue". I dig back through the MB manual to make sure the existing dual PS setup will support dual CPUs - it says yes. I double check the RAM was mounted in the right slots, and the 2nd CPU power cable was connected all the way, nothing wrong there. I reseat both hotswap power supplies, no good. Damn. Maybe if I try just one more time. Double-Damn.

I drag the thing back out of the rack, don't see anything wrong. I pull the CPU back out and there it is. 3 bent pins along one edge of the socket. I have no idea how that could have happened. The CPU gets mounted to the heatsink first, using a fixed-in-place plastic guide for alignment. There are 1.25" long alignment pins on two corners of the CPU socket that the heatsink slides over - it only fits one way - there just isn't any way to do it wrong. Well, I mean apparently there is, and I must have found it.

The socket has had its little plastic cover on it right from the start since we only mounted one CPU when this box was built in 2019. So either it was a manufacturing issue or something I did. I put the socket cover back on and remounted the RAM so it was all applied to the first CPU and what do you know, the box fired right up again. All of the RAM is detected, as well as the new storage drives. I created the new array and started it formatting before leaving for the day (after confirming that the DC was still doing its job correctly, of course).

I spent the rest of the afternoon thinking about how to break this news to the client. I already knew what I was going to do, but I always have to spend some penance worrying about things like this first.

I wrote a pretty comprehensive email to the 3 doctors who own the practice this morning, laying out what happened and that the problem must have been my fault despite my precautions & careful effort. Unfortunately, that motherboard is no longer sold, and since this was post-manufacturing damage, I cannot get a warranty replacement. The retail on the board when it was last sold was about $800 - I did ask Intel this morning whether they would just sell me one of their remaining stock, but they didn't think that would be possible. Amazon scalpers have one or two at double the retail price, but I'm not keen to take that risk either.

I believe we can proceed with the project, but we'll just have to make do with the single existing processor. I proposed this in my email of contrition to the client this morning as well. To my surprise, they responded positively - thanking me for my honesty and basically saying "these things happen". I just hope the performance ends up being acceptable. Time will tell.

Anyway, that's my story. Mark's rules for business: 1. Never lie 2. When you screw up, take ownership 3. Be prepared to buy your mistake. Oh, and probably 4. Don't build your own servers. TN already convinced me of this in 2020, but I still have 3 or 4 of them out there, can't wait until that number falls to zero.
 
#5 check new equipment upgrade slots for damage on delivery instead of at time of upgrade

Like you said it could have been shipped like that since it was never checked.
 
Ebay perhaps? Lots of good reputation & professional sellers there...
Perhaps - those good reputation & professional sellers are joined by just as many (says I) of poor reputation & non-professional sellers, unfortunately. I guess I'll cross that bridge if I have to. I'll know more after I get the new VM up and running.
 
Yeah, honesty is the best policy, always.
Had one or two "Ooops..my bad" situations myself over the decades in this game. It happens, we're human.

For older server parts, and for older biz grade desktop parts...THIS place located right in NY is excellent, fast shipping too.

Any parts we need for older HPs and Dell and Lenovos...they rock...good prices.
 
Yes, I love Server Supply as well - I've used them in the past a few times. They are my second-favorite vendor right under laptopscreens.com - haha. They don't have the board I would need, unfortunately.
 