Hyper-V monitoring replication

HCHTech

I have a new client that is replicating a Hyper-V server to an offsite server. I got a horror story about the previous tech, who missed that replication had stopped working. That caused the onsite server's disk to fill with differencing files, which crashed the VMs, which caused a big slug of downtime while things were sorted out. Surprisingly, that was NOT when the previous tech got fired - haha.

Anyway, I'm sure the client told me this story to impress upon me that he never wants this to happen again, so I'd like to oblige him. :D

It doesn't look to me like Hyper-V includes much in the way of notification settings. I believe I can put an event log check into my monitoring software for event #29292 or 32022 in the Hyper-V-VMMS log, but I'm wondering what else I might do.

How are others monitoring Hyper-V Replication?
 
I use Hyper-V replication on all of my multi-server setups, along with DFS replication, but I don't have any kind of automated monitoring/notifications in place, mainly because everything is frequently backed up anyway. I really only use it as a convenience feature (to allow for quick migrations) and to give a belt-and-braces solution for VMs that are already being backed up.

Every couple of weeks I just manually check replication health as part of my maintenance schedule. Barely a day goes by that I'm not working on one of the servers though, so I'll usually glance at the 'Replication Health' column in Hyper-V Manager while I'm in there. In my experience, Hyper-V replication is very reliable anyway. In general, the only time it gets out of whack is when one of the hosts has been offline or rebooted for updates, after which I usually give the status another quick check.
 
Ok, got it. I think I might have hit a wall with the event log checks. SW doesn't seem to allow you to write a check for the Hyper-V-VMMS log, which is where the replication events get stored. I'm going to try to get a PowerShell script together anyway. The other thing I could do is tighten up my "space used" checks so a smaller delta would flag, as well as my "free space" checks so a larger threshold would flag. I don't want to build a whole system around the exception, but since this is my first one, I'm still researching.
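
As a rough illustration of the "free space" idea above, here is a minimal PowerShell sketch. `Test-LowFreeSpace` is a hypothetical helper name and the 25 percent default is an arbitrary assumption, not a recommendation; the threshold would need tuning to how fast the differencing files grow on the host in question.

```powershell
# Hypothetical helper: returns $true when free space is below a minimum percentage.
# The 25 percent default is an assumed placeholder; raise it to get warned earlier.
function Test-LowFreeSpace {
    param(
        [Parameter(Mandatory = $true)][double]$FreeBytes,
        [Parameter(Mandatory = $true)][double]$SizeBytes,
        [double]$MinFreePercent = 25
    )
    # Guard against disks that report no size (avoids divide-by-zero).
    if ($SizeBytes -le 0) { return $false }
    (($FreeBytes / $SizeBytes) * 100) -lt $MinFreePercent
}

# Example use on the host (commented out so the function can be loaded anywhere):
# Get-CimInstance Win32_LogicalDisk -Filter 'DriveType=3' |
#     Where-Object { Test-LowFreeSpace -FreeBytes $_.FreeSpace -SizeBytes $_.Size } |
#     Select-Object DeviceID, FreeSpace, Size
```

A monitoring tool could run this on a schedule and alert whenever the pipeline returns any disks.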

This setup is using Windows Server Backup on the host to back up the VMs nightly, so he's ripe for a more robust solution on that front. He is price sensitive at the moment because he is moving his office into a new building in the spring and spending all his pennies on the renovation of the new space. I'll plant the seed and then just keep a closer eye than usual on things until there is more budget for a better solution.
 
Do you have an RMM? If so, which one do you use? You should be able to add a check, and even an auto-fix, for Hyper-V Replica to your RMM. I've done it in the past for Kaseya, using PowerShell scripts.
With PowerShell, you can use these commands to resume replication and reset the replication statistics (Get-VMReplication will show you the current status):

Code:
# Resume replication for every VM on the host
Get-VM | Resume-VMReplication
# Reset each VM's replication statistics
Get-VM | Reset-VMReplicationStatistics
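
Neither command above reports status on its own. As a hedged sketch of a check an RMM could run, the following assumes the Hyper-V module's `Get-VMReplication` cmdlet and that its output objects expose `VMName`, `State`, and `Health` properties. `Select-UnhealthyReplica` is a hypothetical helper name, not part of the Hyper-V module.

```powershell
# Hypothetical helper: pass through only replication objects whose Health is not 'Normal'.
# Assumes each input object has a Health property, as Get-VMReplication output is expected to.
function Select-UnhealthyReplica {
    param(
        [Parameter(ValueFromPipeline = $true)]
        $Replication
    )
    process {
        # Cast to string so the comparison works for enum values and plain strings alike.
        if ("$($Replication.Health)" -ne 'Normal') {
            $Replication
        }
    }
}

# Example use on a Hyper-V host (commented out so the function can be loaded anywhere):
# $bad = @(Get-VMReplication | Select-UnhealthyReplica)
# $bad | Format-Table VMName, State, Health
# if ($bad.Count -gt 0) { exit 2 }   # non-zero exit code for the RMM to alert on
```

Alerting first and resuming second keeps a failed replica from being quietly retried without anyone noticing.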

Also, here are the EventID codes related to Hyper-V Replica:

32596 - Error - Not Enabled
32086 - Error - Suspended due to failure
29292 - Error - Could not, timeout, unreachable
32022 - Error - Could not, not resolved
32082 - Error - Could not, cancelled
32315 - Warning - Failed, will retry
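
If the monitoring tool cannot read that log directly, a script can query it instead. This is a minimal sketch using `Get-WinEvent -FilterHashtable`; the full log name `Microsoft-Windows-Hyper-V-VMMS-Admin` is an assumption (confirm it with `Get-WinEvent -ListLog *VMMS*` on the host), and `Get-ReplicaEventFilter` is a hypothetical helper name. The IDs and descriptions are copied from the list above.

```powershell
# Event IDs and descriptions taken from the list above.
$ReplicaEventIds = @{
    32596 = 'Error - Not Enabled'
    32086 = 'Error - Suspended due to failure'
    29292 = 'Error - Could not, timeout, unreachable'
    32022 = 'Error - Could not, not resolved'
    32082 = 'Error - Could not, cancelled'
    32315 = 'Warning - Failed, will retry'
}

# Hypothetical helper: build a Get-WinEvent filter for the IDs above.
# The default log name is an assumption; verify it on the host.
function Get-ReplicaEventFilter {
    param(
        [int]$HoursBack = 24,
        [string]$LogName = 'Microsoft-Windows-Hyper-V-VMMS-Admin'
    )
    @{
        LogName   = $LogName
        Id        = [int[]]@($ReplicaEventIds.Keys)
        StartTime = (Get-Date).AddHours(-$HoursBack)
    }
}

# Example use on the Hyper-V host (commented out so the block can be loaded anywhere):
# $hits = @(Get-WinEvent -FilterHashtable (Get-ReplicaEventFilter) -ErrorAction SilentlyContinue)
# $hits | ForEach-Object { '{0}  {1}  {2}' -f $_.TimeCreated, $_.Id, $ReplicaEventIds[$_.Id] }
# if ($hits.Count -gt 0) { exit 2 }   # non-zero exit code for the RMM to alert on
```

`-ErrorAction SilentlyContinue` is there because `Get-WinEvent` reports an error when no events match, which is the healthy case here.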

Also, I have a PS script that I found that was originally written for Nagios. You can find it in my github repo:
https://github.com/flatlinebb/MSPscripts/blob/master/Check-HyperVReplica.ps1

Hope that helps.
 