[SOLVED] Server 2012 R2 0x00000139 BSOD

Moltuae

Rest In Peace
Reaction score
3,669
Location
Lancs, UK
Okaaaay, this one is really beginning to pee me off now :confused: ....



[screenshot of the BSOD]


Before I start backtracking and uninstalling/disabling everything, one by one (or setting fire to the damn thing!), has anyone got any ideas what might be causing this?

Roughly once every day, same BSOD every time - BSOD 139, parameter 3 (only the address parameters change):

The computer has rebooted from a bugcheck. The bugcheck was: 0x00000139 (0x0000000000000003, 0xffffd00089d6d450, 0xffffd00089d6d3a8, 0x0000000000000000). A dump was saved in: C:\Windows\MEMORY.DMP. Report Id: 033015-8984-01.

I believe it's a TCP connection issue, but the only fixes I can find apply to 2012, not 2012 R2:
https://support.microsoft.com/en-us/kb/2883658

Hardware is an HP ProLiant DL380 G7, with integrated quad HP NC382i DP NICs and a quad-port Intel Pro/1000 PT LP card, running Server 2012 R2 (GUI) Datacenter edition + the Hyper-V role.

Worked perfectly for several weeks. Had 4 of the NICs teamed and dedicated to host use, and the other 4 NICs teamed and dedicated to VM use. All 8 NICs were connected, straddling 2 HP 1920 switches (split, 2 of each team on each switch, for redundancy).
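
For anyone picturing the layout, the two-team setup looks roughly like this in the native LBFO cmdlets (team and adapter names below are placeholders, and switch-independent mode is shown purely as an example for ports split across two switches):

```python
# Rough shape of the two 4-port teams, expressed with the native Windows (LBFO)
# teaming cmdlets. Team and adapter names are placeholders; SwitchIndependent
# mode is shown as an example for ports that straddle two switches. Run elevated.
import subprocess

def ps(command: str) -> None:
    """Run one PowerShell command and raise if it fails."""
    subprocess.run(["powershell", "-NoProfile", "-Command", command], check=True)

# Team dedicated to the host (management) traffic.
ps('New-NetLbfoTeam -Name "HostTeam" '
   '-TeamMembers "HostNIC1","HostNIC2","HostNIC3","HostNIC4" '
   '-TeamingMode SwitchIndependent -Confirm:$false')

# Team dedicated to the VMs (the Hyper-V virtual switch binds to this one).
ps('New-NetLbfoTeam -Name "VMTeam" '
   '-TeamMembers "VMNIC1","VMNIC2","VMNIC3","VMNIC4" '
   '-TeamingMode SwitchIndependent -Confirm:$false')
```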

Back-tracking, I think the only thing that changed before this started happening was that I had just attached a NAS, using the iSCSI initiator.

I have since removed the iSCSI connection (and disabled the service) and tried numerous teaming configurations, including 2 teams of 2 on each of the quad interfaces (disabling the other) and team configurations that don't straddle the switches, yet the problem remains.
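
For anyone following along, "removed the iSCSI connection and disabled the service" amounts to something like this (the target IQN is a placeholder; I'm assuming the built-in iSCSI cmdlets and the MSiSCSI service here):

```python
# Roughly what removing the iSCSI connection and disabling the service looks like.
# The target IQN below is a placeholder; MSiSCSI is the built-in Microsoft iSCSI
# Initiator service. Run elevated.
import subprocess

def ps(command: str) -> None:
    subprocess.run(["powershell", "-NoProfile", "-Command", command], check=True)

# Drop the session to the NAS (placeholder IQN).
ps('Disconnect-IscsiTarget -NodeAddress "iqn.2000-01.com.example:nas-target" -Confirm:$false')

# Stop the initiator service and keep it from starting again.
ps('Stop-Service -Name MSiSCSI')
ps('Set-Service -Name MSiSCSI -StartupType Disabled')
```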

I have also tried updating all drivers, initially manually, then using SDI (which found a few more).

The frustrating thing is, the bleedin' BSOD is so infrequent, it takes about a day to find out whether the problem remains, and it would take several days to know (with any certainty) whether it's resolved.
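
To save trawling Event Viewer every day, something like this quick Python sketch (assuming Python is on the box; it just shells out to wevtutil) will count the "rebooted from a bugcheck" entries and show their dates:

```python
# Counts the "rebooted from a bugcheck" entries (Event ID 1001) in the System
# log and prints their dates, so I don't have to trawl Event Viewer every day.
# Assumes Python is installed on the host; wevtutil ships with Windows.
import subprocess

out = subprocess.run(
    ["wevtutil", "qe", "System",
     "/q:*[System[(EventID=1001)]]",   # only Event ID 1001 entries
     "/f:text", "/c:50", "/rd:true"],  # newest 50, rendered as readable text
    capture_output=True, text=True, check=True
).stdout

crash_dates = []
for chunk in out.split("Event["):
    # Other sources also use ID 1001, so keep only the bugcheck reboots.
    if "rebooted from a bugcheck" in chunk:
        for line in chunk.splitlines():
            if line.strip().startswith("Date:"):
                crash_dates.append(line.split("Date:", 1)[1].strip())
                break

print(f"{len(crash_dates)} bugcheck reboot(s) in the last 50 ID-1001 entries:")
for when in crash_dates:
    print(" ", when)
```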

This is the bugcheck analysis, for what it's worth:
[see attached file]
---------



My next move will be to begin disabling heaps of stuff, then waiting (several days) before re-enabling or disabling more. Anyone know what the cause might be before I do?

Thanks in advance.
 

Attachments

  • code.txt
    12.6 KB
I seem to remember something about that NIC when looking around the HP website recently for another quad port card. Anyways.... firmware update? I know there have been some.

Andy
 
Thanks Andy!

I'm pretty sure all firmware is up-to-date, but good point, I'll recheck ...


As part of my diagnostics though, I have disabled the ports of both quad port cards in turn (integral and add-on), such that only one or the other is being used. Either way, it still crashes. You'd think that if the NIC firmware was the culprit, disabling the ports would put an end to the BSODs.
 
Is there any antivirus on the host? Or even the guests? I'd uninstall it for a while.
Remove and rebuild the virtual switches.

That model onboard is Broadcom based, right? If you're not using it, disable it....haven't had issues with Broadcoms in a few years...but ...just cuz.
 
Thanks Stonecat. Great suggestions!

The integrated NICs are indeed Broadcom-based. However, they're presently disabled and I've had a couple of BSODs since.

No anti-virus on the host right now, but the VMs, presently 3 x Server 2012 R2 (i.e. Generation 2 VMs), are each running System Center Endpoint Protection. I'll try uninstalling that and redoing the virtual switches next (once I've determined whether other recent changes have made any difference).
 
OK, I see that you meant that in your prior post....my misunderstanding (regarding disabling the onboard). I didn't catch that on the first skim.
I had horrible issues with Broadcoms in the Server 03 days. So I dislike them. Pretty sure haven't seen issues since '08.

Your dump (heh....I said "dump")....points to netio.sys...which is network-stack related...but unfortunately it's still very vague. It can be more hardware related...such as NIC drivers, or it could be some issue causing the TCP/IP stack to get wonky...such as a software firewall or some AV software. And I'd include virtual switches. And you've already covered the other thing that makes sense...your iscuzzer mount.

The hyper-v host in workgroup mode?
ANY other software on the hyper-v host? Some battery UPS monitoring software?
RMM agents? (I know you want those on there....but do you have iLo licensed for additional remote access?)

Gah..this one must be frustrating..guessing she's quite important since she's skinned with Datacenter and you had all that teaming....so very few windows of opportunity for downtime and testing stuff out.

Hmmm....lets circle around that teaming a little bit....
What you have for virtual switches?
How many guests is she hosting?

Might you be able to experiment by ditching the teaming....and making a vswitch (1 to 1 per NIC port)...for each guest. So each guest just has a dedicated NIC to the switch...share the lightest server's NIC with your host if you run out.
 
The hyper-v host in workgroup mode?
Hmmm, no. Recently added it to the domain so that it could act as a GUI manager for a couple of other physical servers on the domain, both running Hyper-V Server 2012 R2 (Core). That was one of a number of configuration steps I was looking to undo if I can't pinpoint the cause. Ya thinking that could be the reason?

ANY other software on the hyper-v host? Some battery UPS monitoring software?
RMM agents? (I know you want those on there....but do you have iLo licensed for additional remote access?)
You know what, you might be on to something here! ...

Actually, there wasn't a great deal of superfluous software on there before it started crashing -- the vast majority of what's on there now came from 'Windows Kits' and the associated debugging tools that I installed more recently in an attempt to diagnose the issue. I did have the HP Insight Management Agents, iLO drivers, etc installed previously, but it had been running stably with that stuff on there for 3 weeks or more. There is the Broadcom management suite which, again, has been on there for weeks, but I think I'll uninstall that anyway.

However! ..... comparing installation dates, I have just noticed something: My ScreenConnect client was installed (due to an update/re-installation) the day before the first crash. Nothing else for weeks before that. Gotta wonder if that could be it! ....


Gah..this one must be frustrating..guessing she's quite important since she's skinned with Datacenter and you had all that teaming....so very few windows of opportunity for downtime and testing stuff out.
Thankfully, there's another few servers on site to take the load. This server was temporarily decommissioned for upgrades; the plan being to upgrade it and, once it's running smoothly, upgrade/replace the remaining servers. There's still a bit of urgency though because, without this server, there's not a great deal of redundancy on site and we're now pushing some of the remaining servers a little, running a VM or two more than they can comfortably handle.

So, at this stage, reboots are not a problem, and even hardware changes can easily be made, if necessary. The site is a 20-minute drive away, but I'm usually there at least once a week.

Hmmm....lets circle around that teaming a little bit....
What you have for virtual switches?
How many guests is she hosting?
Presently just a single simple virtual switch, connecting all VMs to one of the 2 teams (the other team is exclusively used for the host). This was configured as 4 ports per team but, for diagnostic purposes, these have been reduced to 2 each, using only the Intel Pro/1000 PT card.
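
For reference, something like this will dump the current team/vSwitch layout in one go between changes (a thin Python wrapper around the standard cmdlets; it assumes native LBFO teaming and the built-in Hyper-V PowerShell module):

```python
# One-shot dump of the current team / virtual switch layout, handy for
# comparing the config between changes. Assumes native LBFO teaming and the
# built-in Hyper-V PowerShell module; run elevated.
import subprocess

def ps(command: str) -> str:
    """Run a PowerShell command and return its text output."""
    return subprocess.run(
        ["powershell", "-NoProfile", "-Command", command],
        capture_output=True, text=True, check=True,
    ).stdout

print(ps('Get-NetLbfoTeam | Format-List Name, TeamingMode, LoadBalancingAlgorithm, Status'))
print(ps('Get-NetLbfoTeamMember | Format-Table Team, Name, OperationalStatus -AutoSize'))
print(ps('Get-VMSwitch | Format-Table Name, SwitchType, NetAdapterInterfaceDescription -AutoSize'))
```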

Just 3 guests running right now. None of which are production VMs, so shut down/restart is not a problem.

Might you be able to experiment by ditching the teaming....and making a vswitch (1 to 1 per NIC port)...for each guest. So each guest just has a dedicated NIC to the switch...share the lightest servers NIC with your host if you run out.

That's what I was about to try next. I'll probably just shut down the VMs and delete the teams, then leave it for a few days and see how it is. Thing is, though, this team/VM setup ran great for weeks. This all seemed to start about the time I installed the iSCSI initiator service.
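
If I do go that route, the experiment would look something like this sketch (VM, NIC and switch names are placeholders, and it's untested as written):

```python
# Sketch of the "ditch the teams, one external vSwitch per physical NIC" test.
# VM, NIC and switch names are placeholders; untested as written. Run elevated.
import subprocess

def ps(command: str) -> None:
    subprocess.run(["powershell", "-NoProfile", "-Command", command], check=True)

# 1. Shut the guests down cleanly.
for vm in ("Guest1", "Guest2", "Guest3"):
    ps(f'Stop-VM -Name "{vm}"')

# 2. Drop the existing LBFO teams.
for team in ("HostTeam", "VMTeam"):
    ps(f'Remove-NetLbfoTeam -Name "{team}" -Confirm:$false')

# 3. One external virtual switch per physical NIC, one dedicated NIC per guest.
for vm, nic in [("Guest1", "IntelPort1"), ("Guest2", "IntelPort2"), ("Guest3", "IntelPort3")]:
    ps(f'New-VMSwitch -Name "vSwitch-{nic}" -NetAdapterName "{nic}" -AllowManagementOS $false')
    ps(f'Connect-VMNetworkAdapter -VMName "{vm}" -SwitchName "vSwitch-{nic}"')
    ps(f'Start-VM -Name "{vm}"')
```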




Thanks again Stonecat :)
 
Just an update: After nearly 4 weeks of total stability, I'm going to declare this one fixed! :)

I tried removing the teams, disabling NICs, changing NICs and drivers, disabling and uninstalling AV -- everything suggested and everything else I could think of -- but nothing short of totally disabling essential networking services would stop the BSODs.

In the end, I removed the Hyper-V role and started again from scratch. I'm not sure exactly what the cause was, but I can only assume it was a team/virtual-switch issue (as I think you were hinting at, Stonecat). I suspect the gremlin was awoken by re-configuring teaming after configuring the virtual switch. Now that the gremlin has become a mogwai again, I'll have to remember not to feed it by messing with the teaming.
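
For anyone who lands on this thread later, "started again from scratch" meant roughly: remove the role, reboot, re-add it, then recreate the team before the virtual switch. Something like this sketch (names are placeholders, and the -Restart switches reboot the box, so treat it as an outline rather than a one-shot script):

```python
# Rough outline of the rebuild: remove the Hyper-V role, reboot, re-add it,
# then recreate the team *before* the virtual switch. The -Restart switches
# reboot the box, so treat this as an outline, not a one-shot script.
import subprocess

def ps(command: str) -> None:
    subprocess.run(["powershell", "-NoProfile", "-Command", command], check=True)

# 1. Remove the role (reboots when done).
ps('Remove-WindowsFeature -Name Hyper-V -Restart')

# --- after the reboot ---

# 2. Re-add the role and its management tools (reboots again).
ps('Install-WindowsFeature -Name Hyper-V -IncludeManagementTools -Restart')

# --- after the second reboot ---

# 3. Team first, then bind the virtual switch to the team interface
#    (team and member names are placeholders).
ps('New-NetLbfoTeam -Name "VMTeam" -TeamMembers "VMNIC1","VMNIC2" '
   '-TeamingMode SwitchIndependent -Confirm:$false')
ps('New-VMSwitch -Name "VMSwitch" -NetAdapterName "VMTeam" -AllowManagementOS $false')
```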

Thanks guys for your suggestions.
 
Wow...forgot about this one.
I've gotten a bit wary of Hyper-V lately, esp when Broadcoms are in place. You probably recall a post or two of mine a month or so ago about the vswitches falling asleep.
 
Yeah, I remember that. I may disable the integrated Broadcom NICs if I have any further issues and replace them with another Intel quad-port card, but so far they're working fine. And, to be fair to them, they don't seem to have been the culprit in this case. At one point the server was still BSODing even with the Broadcom NICs disabled.
 