The Hyper-V host has gone down — or it's repeatedly bluescreening, or VMs are showing as "saved" or "missing", or storage has dropped out and the VMs are paused-critical. Whichever flavour of crash it is, you have a stack of virtual machines that aren't running and a business that can't wait.
This guide walks through Hyper-V host recovery exactly as a senior infrastructure engineer would run it: stabilise the host, recover the VMs, deal with file-system damage, and identify what actually caused it so the same fault doesn't bite again next month. Written for IT managers and engineers running Hyper-V on Windows Server 2019, 2022 or 2025 — standalone hosts and clustered (Failover Cluster / CSV) deployments.
If your VMs are down right now and the business is stopped, call 01923 372471 — senior engineer responds quickly, on-site within 2 hours.
Step 1: Get to the host (don't reboot yet)
Console access is essential — RDP is unreliable when a host is unstable, and may be impossible if the management OS itself is the problem.
- Physical: keyboard and monitor.
- Dell PowerEdge: iDRAC virtual console. Login via the iDRAC IP, launch the virtual console, take control.
- HPE ProLiant: iLO remote console.
- Lenovo ThinkSystem: XCC.
Once you have a console, resist the urge to reboot. Volatile state — process lists, network connections, in-flight I/O, dump-pending state — is what tells you what failed. Capture it first.
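One way to capture that state is sketched below. It is a minimal example, not a complete forensic procedure: the output folder is a hypothetical path, and the choice of what to export (processes, TCP connections, VM state, recent System events) is an assumption about what is useful on a typical host.

```powershell
# Sketch: snapshot volatile host state before any reboot.
# $OutDir is a hypothetical location; point it at a volume you still trust.
$OutDir = "C:\Triage\$(Get-Date -Format 'yyyyMMdd-HHmmss')"
New-Item -ItemType Directory -Path $OutDir -Force | Out-Null

# Running processes and established network connections
Get-Process | Select-Object Name, Id, WorkingSet64 |
    Export-Csv "$OutDir\processes.csv" -NoTypeInformation
Get-NetTCPConnection -State Established |
    Select-Object LocalAddress, LocalPort, RemoteAddress, RemotePort, OwningProcess |
    Export-Csv "$OutDir\tcp-connections.csv" -NoTypeInformation

# VM state as the host currently reports it
Get-VM | Select-Object Name, State, Status, Uptime |
    Export-Csv "$OutDir\vm-state.csv" -NoTypeInformation

# Last 500 System log entries at Warning level or worse (levels 1-3)
Get-WinEvent -FilterHashtable @{ LogName = 'System'; Level = 1, 2, 3 } -MaxEvents 500 |
    Select-Object TimeCreated, Id, ProviderName, LevelDisplayName, Message |
    Export-Csv "$OutDir\system-events.csv" -NoTypeInformation
```

If local storage is the suspected fault, write to a USB stick or a network share instead of C:.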
Step 2: Triage the host
# Is the Hyper-V management service responding?
Get-Service vmms, vmcompute, hvhost
# What VMs does the host think exist, and what state are they in?
Get-VM | Select-Object Name, State, Status, Uptime, ComputerName
# Are there any cluster issues (clustered hosts only)?
Get-ClusterNode
Get-ClusterResource | Where-Object State -ne 'Online'
Get-ClusterSharedVolume | Select-Object Name, State, OwnerNode, SharedVolumeInfo
# Storage health
Get-PhysicalDisk | Where-Object HealthStatus -ne 'Healthy'
Get-VirtualDisk | Select-Object FriendlyName, HealthStatus, OperationalStatus
Get-Disk | Where-Object IsOffline -eq $true
The output of those commands tells you which of four scenarios you're in:
| Scenario | Symptom | Section to read |
|---|---|---|
| A | Host is up, VMs in Saved or Off state, no storage issues | Step 3 |
| B | Host is up but storage missing — CSV offline, LUN dropped, disk failed | Step 4 |
| C | Host won't boot, or BSODs on boot | Step 5 |
| D | Cluster split-brain — multiple nodes claim ownership | Step 6 |
Multiple scenarios can apply simultaneously (a cluster node failure with storage issues, for example). Work through them in order.
Step 3: VMs need to be brought back online — host is healthy
Most common scenario after an unscheduled host reboot. Hyper-V automatically starts VMs configured with Automatic Start Action = Always (or restarts them if they were running before the host crashed), but failures during this auto-start cycle leave VMs Saved or Off.
Bring them back:
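# Start every VM that is currently Off or Saved. This includes VMs that were deliberately powered off, so review the Get-VM output first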
Get-VM | Where-Object State -in 'Off','Saved' | Start-VM
# If a saved-state file is corrupt and the VM won't start:
Get-VM "VMName" | Remove-VMSavedState
Start-VM -Name "VMName"
Remove-VMSavedState discards the in-memory state captured when the host went down. The VM will boot fresh, which is equivalent to a hard power-cycle from the guest's perspective. This is fine for stateless services and most production workloads. Be careful with domain controllers: resuming an old saved state on a Hyper-V version older than 2012, or reverting a snapshot without VM-GenerationID support, causes USN rollback. On modern combinations (Server 2012+ Hyper-V, Server 2012+ guest DC), this is safe.
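A more selective start is sketched here, on the assumption that VMs with Automatic Start Action set to Nothing were powered off on purpose. Select-VMToStart is a hypothetical helper name, not a Hyper-V cmdlet.

```powershell
# Sketch: pick only VMs that are Off or Saved AND configured to start automatically.
function Select-VMToStart {
    param([Parameter(ValueFromPipeline = $true)] $VM)
    process {
        $stopped   = $VM.State -in 'Off', 'Saved'
        $autoStart = $VM.AutomaticStartAction -ne 'Nothing'
        if ($stopped -and $autoStart) { $VM }
    }
}

# Preview first; remove -WhatIf once the list looks right.
# Get-VM | Select-VMToStart | Start-VM -WhatIf
```

Running with -WhatIf prints what would be started without changing anything, which is a cheap check before touching production VMs.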
If a VM starts but the guest is unstable:
- Check the guest's event log in-VM
- Check Microsoft-Windows-Hyper-V-Worker / Admin on the host — VMs that crash report errors here
- Check storage I/O times during boot — slow storage causes boot timeouts that look like guest failures
Step 4: Storage has dropped out
This is the scenario that turns a brief outage into a long one. Symptoms:
- VMs in Paused-Critical state
- CSV showing Offline or Failed
- Hyper-V Manager shows VMs but you can't start them; the error references missing files
- Event log: Microsoft-Windows-Hyper-V-VMMS events 22050, 22052, 18560 (storage-related)
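The symptoms above can be checked from PowerShell. This is a sketch: the 24-hour window is an arbitrary assumption, and the log is queried under the channel name Microsoft-Windows-Hyper-V-VMMS-Admin.

```powershell
# VMs the host has paused because their storage went away
Get-VM | Where-Object State -eq 'PausedCritical' | Select-Object Name, State, Status

# Storage-related VMMS events from the last 24 hours (assumed window)
Get-WinEvent -FilterHashtable @{
    LogName   = 'Microsoft-Windows-Hyper-V-VMMS-Admin'
    Id        = 22050, 22052, 18560
    StartTime = (Get-Date).AddDays(-1)
} -ErrorAction SilentlyContinue |      # Get-WinEvent errors when nothing matches
    Sort-Object TimeCreated |
    Select-Object TimeCreated, Id, Message -First 20
```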
Standalone host with local storage
# What disks does the host see?
Get-Disk
Get-PhysicalDisk
# Bring offline disks back online
Get-Disk | Where-Object IsOffline -eq $true | Set-Disk -IsOffline $false
Get-Disk | Where-Object IsReadOnly -eq $true | Set-Disk -IsReadOnly $false
If a RAID array has degraded, check it from iDRAC / iLO / RAID controller management. A degraded RAID-1 or RAID-5 will continue to serve data; a RAID-5 with two disks down will not. Replace the failed disk(s) before any further work, because additional load on the array can finish off marginal disks.
Standalone host with SAN-attached storage (iSCSI / FC)
Check the iSCSI initiator — iscsicli or Get-IscsiTarget, Get-IscsiConnection. The storage path may have dropped; reconnect:
Get-IscsiTarget | Connect-IscsiTarget -IsPersistent $true
Update-IscsiTarget
Get-Disk | Where-Object PartitionStyle -eq 'RAW' # any uninitialised disks? Never initialise a LUN that should already hold VM data
For FC, check the HBA in the BIOS or the vendor utility (Emulex OneCommand, QLogic QConvergeConsole). LUN masking changes on the SAN side will cause the host to lose visibility of LUNs that were working five minutes ago — check with the SAN admin.
Clustered host with CSV
# Is the CSV available to the cluster?
Get-ClusterSharedVolume
# Move the CSV to a healthy node
Get-ClusterSharedVolume "Cluster Disk N" | Move-ClusterSharedVolume -Node <healthy-node>
# If the CSV refuses to come online, check the underlying storage
Get-ClusterResource "Cluster Disk N" | Get-ClusterParameter
Test-Cluster -Node <node> -Include 'Storage'
CSV failures are usually one of: physical disk failure beneath the CSV, network failure on the CSV/cluster network, witness failure on a 2-node cluster causing loss of quorum, or the cluster service (clussvc) itself crashing. The Microsoft-Windows-FailoverClustering / Operational event log will tell you which.
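To narrow down which of those causes applies, something like the following may help. It is a sketch: the four-hour and 60-minute windows and the C:\Triage destination are assumptions, and the destination folder must already exist.

```powershell
# Current quorum configuration and witness resource
Get-ClusterQuorum | Format-List Cluster, QuorumResource, QuorumType

# Recent warnings and errors from the cluster Operational channel
Get-WinEvent -FilterHashtable @{
    LogName   = 'Microsoft-Windows-FailoverClustering/Operational'
    Level     = 1, 2, 3
    StartTime = (Get-Date).AddHours(-4)
} -ErrorAction SilentlyContinue |
    Select-Object TimeCreated, Id, Message

# Generate cluster.log from each node, covering the last 60 minutes
Get-ClusterLog -Destination 'C:\Triage' -TimeSpan 60
```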
Step 5: Host won't boot, or BSODs on boot
Recover the VMs first, post-mortem the host afterwards.
Recover the VM files to a different host
If the host's OS is dead but the storage is intact (separate disk for OS vs VMs is the canonical Hyper-V layout for this exact reason):
- Pull the storage drives, or remap the SAN LUNs to a healthy host.
- On the healthy host, attach the storage. The VHDX files and VM configuration files (.vmcx, .vmgs) are intact.
- Import the VMs:
# In Hyper-V Manager: Action > Import Virtual Machine
# Or via PowerShell:
Import-VM -Path "D:\HyperV\VMs\<VMName>\Virtual Machines\<GUID>.vmcx" -Copy -GenerateNewId
# If the VM is to keep its identity (preferred where the original host is staying offline):
Import-VM -Path "D:\HyperV\VMs\<VMName>\Virtual Machines\<GUID>.vmcx" -Register
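-Register registers the VM in place and keeps the original VM ID. Use it for cluster scenarios and for anything tied to a specific VM ID, such as licensing. -Copy -GenerateNewId creates a copy with a fresh VM ID, for situations where the original host might come back and you don't want two VMs with the same ID. The new ID applies only at the hypervisor level: the guest keeps its hostname and domain computer account, so never run the copy and the original at the same time.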
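Before importing, it can be worth checking whether the healthy host is compatible with the VM's configuration (a missing virtual switch is a typical mismatch). The sketch below uses Compare-VM for that; the path is the same placeholder used above.

```powershell
# Placeholder path, as in the import commands above
$vmcx = 'D:\HyperV\VMs\<VMName>\Virtual Machines\<GUID>.vmcx'

# Dry run: report what would stop this VM registering on this host
$report = Compare-VM -Path $vmcx -Register
$report.Incompatibilities | Format-Table Source, MessageId, Message -AutoSize -Wrap

# Import only when the report is clean; otherwise fix each item first
if (-not $report.Incompatibilities) {
    Import-VM -CompatibilityReport $report
}
```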
Diagnose the broken host in parallel
Capture the memory dump if there is one (%SystemRoot%\MEMORY.DMP — or minidumps in %SystemRoot%\Minidump\). Boot the host from Windows Server installation media and choose Repair:
- Startup Repair: cheap, occasionally works
- Command prompt: for sfc /scannow /offbootdir=C:\ /offwindir=C:\Windows, dism /image:C:\ /cleanup-image /restorehealth /source:wim:D:\sources\install.wim:1, or chkdsk C: /f if a dirty shutdown left disk corruption. Drive letters can differ inside the recovery environment, so confirm which letter holds Windows before running these.
- Roll back updates: if the host BSODed after a recent Windows Update, run dism /image:C:\ /get-packages, then dism /image:C:\ /remove-package /packagename:<name>
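If a recent Datto RMM, AV agent, or backup agent install correlates with the start of instability, that is your suspect. Boot to safe mode (bcdedit /set {default} safeboot minimal from the recovery prompt; if bcdedit cannot find the store, point it at the file with /store, which is C:\Boot\BCD only on legacy BIOS installs and \EFI\Microsoft\Boot\BCD on the EFI system partition of UEFI hosts) and uninstall the agent. Afterwards, clear the flag with bcdedit /deletevalue {default} safeboot, or the host will keep booting into safe mode.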
If the OS disk is too damaged to repair, the answer is to rebuild — fresh Windows Server install on a new disk, install Hyper-V role, reattach the VM storage, import the VMs.
Step 6: Cluster split-brain or partition
If two nodes both believe they own a CSV or a clustered VM, you have a split-brain. This is the worst-case Hyper-V cluster scenario because both nodes will write to the same VHDX, corrupting it.
Symptoms:
- Same VM showing as running on two different nodes
- CSV disk online on multiple nodes
- VHDX file showing recent writes from multiple hosts in audit logs
Recovery:
- Stop one node entirely. Pick the node you trust less, or the node that was offline more recently. Run Stop-Computer -Force on it, or power it off from the iDRAC.
- On the remaining node: Test-Cluster, repair quorum, validate disks.
- Once stable, run chkdsk on the CSV (offline) before bringing VMs back.
- Restore VHDX files from backup if any are corrupt. There is no in-place repair for a VHDX written to by two hosts simultaneously.
- Investigate the cluster network. Split-brain almost always means heartbeats failed — investigate network team configuration, switch port flaps, or quorum witness reachability.
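The first symptom (the same VM running on two nodes) can be checked with a short script. This is a sketch: Find-DuplicateRunningVM is a hypothetical helper, and it assumes both nodes still answer Get-VM remotely, which may not hold during a network partition.

```powershell
# Sketch: report any VM ID that is in the Running state on more than one host.
function Find-DuplicateRunningVM {
    param([Parameter(Mandatory)] [object[]]$VM)

    $VM | Where-Object { $_.State -eq 'Running' } |
        Group-Object -Property VMId |
        Where-Object { $_.Count -gt 1 } |
        ForEach-Object {
            [pscustomobject]@{
                VMId  = $_.Name
                Name  = $_.Group[0].Name
                Hosts = ($_.Group.ComputerName | Sort-Object) -join ', '
            }
        }
}

# Usage on a cluster node (needs the Hyper-V and FailoverClusters modules):
# Find-DuplicateRunningVM -VM (Get-VM -ComputerName (Get-ClusterNode).Name)
```

An empty result does not prove there is no split-brain; it only means no duplicate was visible from where the script ran.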
Step 7: Repairing damaged VHDX files
If a VM won't boot post-recovery, or shows file-system corruption inside the guest, the VHDX may have been damaged.
# Check VHDX integrity
Get-VHD "D:\HyperV\<VMName>\Virtual Hard Disks\<disk>.vhdx" | Format-List
# Look at FragmentationPercentage, FileSize vs Size, ParentPath if differencing,
# and Attached state. A VHDX showing as Attached when no VM should be using it
# indicates a stale lock — usually fixable by stopping/starting vmms.
# For differencing disks where the parent has gone walkabout:
Set-VHD "D:\HyperV\<VMName>\<disk>.avhdx" -ParentPath "D:\HyperV\Parent.vhdx" -IgnoreIdMismatch
# To merge a chain of differencing disks back to the parent (after a snapshot/checkpoint mess):
Merge-VHD -Path "D:\HyperV\<VMName>\<disk>.avhdx" -DestinationPath "D:\HyperV\<VMName>\<parent>.vhdx"
Hyper-V does not have a built-in VHDX repair tool akin to chkdsk. If a VHDX header is damaged, your options are:
- Restore from backup (Veeam, Datto, Windows Server Backup)
- Mount the VHDX read-only and copy intact data files out (Mount-VHD -Path <file> -ReadOnly)
- Third-party VHDX repair tools (StarWind, DiskInternals): last resort, with limited success rates
This is one of several reasons the discipline of "snapshots are not backups" matters. A VM that has lived in a checkpointed state for months is one host crash away from an unrecoverable VHDX chain.
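The read-only mount option listed above might look like this in practice. It is a sketch only: E:\Rescue and the X:\Data source folder are assumptions, and a damaged VHDX may mount without any of its volumes receiving a drive letter.

```powershell
$vhdx = 'D:\HyperV\<VMName>\Virtual Hard Disks\<disk>.vhdx'   # placeholder path
$dest = 'E:\Rescue'                                           # hypothetical target folder
New-Item -ItemType Directory -Path $dest -Force | Out-Null

# Mount read-only and list the volumes Windows can see inside the disk
$disk = Mount-VHD -Path $vhdx -ReadOnly -Passthru | Get-Disk
$disk | Get-Partition | Get-Volume |
    Select-Object DriveLetter, FileSystemLabel, FileSystem, Size

# Copy data out; X: is an assumption, use the letter shown in the listing above
robocopy 'X:\Data' $dest /E /R:1 /W:1 /LOG:"$dest\robocopy.log"

# Always dismount when finished so the file is released
Dismount-VHD -Path $vhdx
```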
What NOT to do
- Do not delete .avhdx files manually to "clean up" old checkpoints. They are differencing disks; the parent is incomplete without them. Use Hyper-V Manager's checkpoint deletion or Remove-VMSnapshot, which merges them back properly.
- Do not run chkdsk on a CSV from inside a VM. CSV file system repair must be done on the host with the CSV taken offline first.
- Do not run a copy imported with -GenerateNewId alongside the original VM if the original comes back online. The new ID only avoids a clash at the hypervisor level; the guest inside still has the same hostname, domain computer account and licences, so two running copies will fight over them.
- Do not assume a "successful" Hyper-V Replica means recoverable VMs. Replica is not a backup: if the source corrupts, the replica corrupts. Verify with periodic test failovers, not just the green status icon.
- Do not restore a domain controller VM from an old VHDX backup unless your hypervisor and guest both support VM-GenerationID. See How to recover a failed domain controller.
Prevention
Once stable:
- Separate the OS disk from the VM storage disk. Always. A blown OS disk is a 1-hour rebuild. A blown OS-and-VM-storage combined disk is a week.
- Patch the host on a controlled cadence, not the same day as the patch release. Cumulative updates on Hyper-V hosts have a track record of breaking VMs unpredictably.
- Maintain Veeam / Datto / Windows Server Backup with application-aware processing for guest VMs — backup of the VHDX file alone does not capture in-flight database writes cleanly.
- Test restore once a quarter. Pick a non-critical VM, restore it to an isolated network, boot it, log in. Until you've done this you don't know your backup works.
- Monitor host hardware proactively — iDRAC / iLO / XCC SNMP into your monitoring system. A failing PSU or DIMM warns you for weeks before it crashes the host.
- Keep at least 20% free space on every Hyper-V volume. Dynamic VHDX expansion failures from a full volume are a leading cause of guest VM crashes that look like host crashes.
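The free-space rule is easy to check on a schedule. The sketch below flags volumes under the 20% threshold; Get-LowSpaceVolume is a hypothetical helper, and it works on whatever Get-Volume returns, so CSV mount points under C:\ClusterStorage may need checking separately.

```powershell
# Sketch: emit volumes whose free space is below a given fraction (default 20%).
function Get-LowSpaceVolume {
    param(
        [Parameter(ValueFromPipeline = $true)] $Volume,
        [double]$MinFreeFraction = 0.20
    )
    process {
        if ($Volume.Size -gt 0) {                      # skip zero-size entries
            $free = $Volume.SizeRemaining / $Volume.Size
            if ($free -lt $MinFreeFraction) {
                [pscustomobject]@{
                    DriveLetter = $Volume.DriveLetter
                    Label       = $Volume.FileSystemLabel
                    FreePercent = [math]::Round($free * 100, 1)
                }
            }
        }
    }
}

# Usage on the host:
# Get-Volume | Get-LowSpaceVolume
```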
When to call us
For Hyper-V specifically:
- A host is BSODing repeatedly and you can't isolate the cause.
- Storage has dropped out of a cluster and you can't get the CSV back online.
- A VHDX is corrupt and your backup doesn't have what you need.
- You suspect split-brain or have evidence of two hosts writing to the same VM.
- A VM hosting a domain controller is in saved state and you're unsure whether to start it.
- You're rebuilding a host and migrating VMs under time pressure.
Engineerdirect.co.uk has senior engineers across Hyper-V, VMware, Failover Clustering, SAN/iSCSI/FC storage, and the underlying server hardware — Dell, HPE, Lenovo. On-site response across London and the South East within 2 hours.
Call 01923 372471 — senior engineer answers directly. We respond quickly.
FAQ
Why are my VMs in "Paused-Critical" state? The host has lost access to the storage holding the VHDX, or the volume is full. Check disk status (Get-Disk, Get-VirtualDisk) and free space on the host. Once storage returns, Resume-VM on each VM. If storage is permanently lost, you'll need to restore from backup.
Can I copy a running VM's VHDX file to a backup location? No. A VHDX in use by a running VM is locked, and copying via tools that bypass the lock produces at best a crash-consistent file, which applications such as databases may not recover from cleanly. Use Hyper-V checkpoints, Hyper-V export, or a backup tool that uses VSS for application-aware backup.
My host bluescreens every few hours. How do I find the cause? Check %SystemRoot%\MEMORY.DMP (or minidumps in %SystemRoot%\Minidump\) with WinDbg or BlueScreenView. Common causes on Hyper-V hosts: NIC driver bugs (especially Broadcom, Intel i40e), storage driver bugs, third-party agents (older Datto RMM, AV with kernel hooks). Roll back recent driver updates and recent agent installations as a first move.
What's the difference between Hyper-V Replica and a backup? Replica is a near-real-time copy of running VMs to another host, primarily for disaster recovery (host or site failure). Backup is a point-in-time copy with retention, primarily for data recovery (corruption, accidental deletion, ransomware). Replica protects you from infrastructure loss; backup protects you from data loss. You need both.
Is it safe to extend a VHDX live while the VM is running? Yes, with caveats: the disk must be a VHDX (not a VHD) attached to a virtual SCSI controller. The host expands the file (Resize-VHD, or Edit Disk in Hyper-V Manager); the guest then needs to extend the partition (diskmgmt.msc or diskpart inside the guest). Run a backup first. Don't shrink a VHDX live; treat that as a guest-down operation.
Should I use VMware instead of Hyper-V? Both are mature platforms. Hyper-V's licensing is included with Windows Server (Standard or Datacenter) which is materially cheaper for most SMBs after the Broadcom/VMware licence changes. VMware has a deeper ecosystem for very large environments. For 5–50 VM businesses on Windows-heavy stacks, Hyper-V is usually the right answer.
A crashed hypervisor with production VMs on it is exactly the kind of failure our emergency server & infrastructure recovery handles directly.
Part of a series of disaster-recovery references. If your VMs are down right now: 01923 372471.