The Hyper-V host has gone down — or it's repeatedly bluescreening, or VMs are showing as "saved" or "missing", or storage has dropped out and the VMs are paused-critical. Whichever flavour of crash it is, you have a stack of virtual machines that aren't running and a business that can't wait.

This guide walks through Hyper-V host recovery exactly as a senior infrastructure engineer would run it: stabilise the host, recover the VMs, deal with file-system damage, and identify what actually caused it so the same fault doesn't bite again next month. Written for IT managers and engineers running Hyper-V on Windows Server 2019, 2022 or 2025 — standalone hosts and clustered (Failover Cluster / CSV) deployments.

If your VMs are down right now and the business is stopped, call 01923 372471. A senior engineer responds quickly and can be on-site within 2 hours.

Step 1: Get to the host (don't reboot yet)

Console access is essential — RDP is unreliable when a host is unstable, and may be impossible if the management OS itself is the problem.

Once you have a console, resist the urge to reboot. Volatile state — process lists, network connections, in-flight I/O, dump-pending state — is what tells you what failed. Capture it first.
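A minimal sketch of capturing that volatile state before any reboot, assuming PowerShell still runs on the management OS. The function name Save-HostTriageSnapshot and the C:\Triage output folder are hypothetical; point the output at a volume that is not the one in trouble.

```powershell
# Hypothetical helper: write volatile host state to timestamped CSV files.
function Save-HostTriageSnapshot {
    param(
        # Assumed output location; change to a healthy volume or a USB stick.
        [string]$OutDir = 'C:\Triage'
    )
    $stamp  = Get-Date -Format 'yyyyMMdd-HHmmss'
    $target = Join-Path $OutDir $stamp
    New-Item -ItemType Directory -Path $target -Force | Out-Null

    # Running processes, largest working set first
    Get-Process | Sort-Object WorkingSet64 -Descending |
        Select-Object Name, Id, WorkingSet64, StartTime |
        Export-Csv (Join-Path $target 'processes.csv') -NoTypeInformation

    # Established TCP connections
    Get-NetTCPConnection -State Established -ErrorAction SilentlyContinue |
        Select-Object LocalAddress, LocalPort, RemoteAddress, RemotePort, OwningProcess |
        Export-Csv (Join-Path $target 'tcp.csv') -NoTypeInformation

    # The most recent 500 System log events
    Get-WinEvent -LogName System -MaxEvents 500 -ErrorAction SilentlyContinue |
        Select-Object TimeCreated, Id, LevelDisplayName, ProviderName, Message |
        Export-Csv (Join-Path $target 'system-events.csv') -NoTypeInformation

    # Return the folder so the caller knows where the files went
    $target
}

# Usage on the host:
# Save-HostTriageSnapshot -OutDir 'E:\Triage'
```

This only reads state and writes a few small files, so it is unlikely to make a marginal host worse, but skip it if the host is too unstable to run commands at all.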

Step 2: Triage the host

# Is the Hyper-V management service responding?
Get-Service vmms, vmcompute, hvhost

# What VMs does the host think exist, and what state are they in?
Get-VM | Select-Object Name, State, Status, Uptime, ComputerName

# Are there any cluster issues (clustered hosts only)?
Get-ClusterNode
Get-ClusterResource | Where-Object State -ne 'Online'
Get-ClusterSharedVolume | Select-Object Name, State, OwnerNode, SharedVolumeInfo

# Storage health
Get-PhysicalDisk | Where-Object HealthStatus -ne 'Healthy'
Get-VirtualDisk | Select-Object FriendlyName, HealthStatus, OperationalStatus
Get-Disk | Where-Object IsOffline -eq $true

The output of those commands tells you which of four scenarios you're in:

Scenario | Symptom | Section to read
A | Host is up, VMs in Saved or Off state, no storage issues | Step 3
B | Host is up but storage missing: CSV offline, LUN dropped, disk failed | Step 4
C | Host won't boot, or BSODs on boot | Step 5
D | Cluster split-brain: multiple nodes claim ownership | Step 6

Multiple scenarios can apply simultaneously (a cluster node failure with storage issues, for example). Work through them in order.
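The scenario mapping can be sketched as a small lookup. This is an illustration only: the function name Get-RecoveryStep and its four switches are hypothetical, and you set them by hand from what the triage commands showed.

```powershell
# Hypothetical helper: map triage findings to the steps of this guide, in order.
function Get-RecoveryStep {
    param(
        [bool]$HostBoots      = $true,   # management OS boots and stays up
        [bool]$VMsDown        = $false,  # VMs sitting in Saved or Off
        [bool]$StorageMissing = $false,  # CSV offline, LUN dropped, disk failed
        [bool]$SplitBrain     = $false   # more than one node claims ownership
    )
    # Each matching scenario emits its step; several can apply at once.
    if ($HostBoots -and $VMsDown -and -not $StorageMissing) { 'Step 3' }
    if ($HostBoots -and $StorageMissing)                    { 'Step 4' }
    if (-not $HostBoots)                                    { 'Step 5' }
    if ($SplitBrain)                                        { 'Step 6' }
}

# Example: a cluster node with storage trouble and a suspected split-brain
# Get-RecoveryStep -StorageMissing $true -SplitBrain $true
```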

Step 3: VMs need to be brought back online — host is healthy

This is the most common scenario after an unscheduled host reboot. Hyper-V automatically starts VMs whose Automatic Start Action is set to always start, and restarts those set to start only if they were running when the host went down, but failures during this auto-start cycle leave VMs Saved or Off.

Bring them back:

# Start VMs that should be running (this starts every Off or Saved VM, including any that were deliberately left off)
Get-VM | Where-Object State -in 'Off','Saved' | Start-VM

# If a saved-state file is corrupt and the VM won't start:
Get-VM "VMName" | Remove-VMSavedState
Start-VM -Name "VMName"

Remove-VMSavedState discards the in-memory state captured when the host went down. The VM will boot fresh, which from the guest's perspective is equivalent to a hard power-cycle. This is fine for stateless services and most production workloads. Be careful with domain controllers: discarding saved state does not itself roll the directory back, but restoring an older state (a stale saved state, or a snapshot revert) on a platform without VM-GenerationID support, meaning Hyper-V older than Server 2012 or a guest DC older than Server 2012, causes USN rollback. On modern combinations (Server 2012+ Hyper-V, Server 2012+ guest DC), this is safe.
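Starting every Off or Saved VM will also start machines that were deliberately powered down. A hedged sketch of a narrower filter follows, using the AutomaticStartAction property that Get-VM returns; the function name Select-VMToRestart is hypothetical. A VM set to StartIfRunning that was already off before the crash will still pass this filter, so review the list before starting anything.

```powershell
# Hypothetical filter: keep VMs that are down AND configured to start automatically.
function Select-VMToRestart {
    param(
        [Parameter(ValueFromPipeline)]
        $VM
    )
    process {
        # Compare as strings so this works with the Hyper-V enums and with plain objects
        $isDown   = "$($VM.State)" -in 'Off', 'Saved'
        $isWanted = "$($VM.AutomaticStartAction)" -ne 'Nothing'
        if ($isDown -and $isWanted) { $VM }
    }
}

# Usage on the host: review first, then start
# Get-VM | Select-VMToRestart | Select-Object Name, State, AutomaticStartAction
# Get-VM | Select-VMToRestart | Start-VM
```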

If a VM starts but the guest is unstable:

Step 4: Storage has dropped out

This is the scenario that turns a brief outage into a long one. Symptoms:

Standalone host with local storage

# What disks does the host see?
Get-Disk
Get-PhysicalDisk

# Bring offline disks back online
Get-Disk | Where-Object IsOffline -eq $true | Set-Disk -IsOffline $false
Get-Disk | Where-Object IsReadOnly -eq $true | Set-Disk -IsReadOnly $false

If a RAID array has degraded, check it from iDRAC / iLO / the RAID controller's management utility. A degraded RAID-1 or RAID-5 will continue to serve data; a failed RAID-5 with two disks down will not. Replace the failed disk(s) before any further work, because additional load on the array can finish off marginal disks.

Standalone host with SAN-attached storage (iSCSI / FC)

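Check the iSCSI initiator with iscsicli, or with Get-IscsiTarget and Get-IscsiConnection. The storage path may have dropped; reconnect:

# Reconnect only targets that are currently disconnected (connecting an already-connected target raises an error)
Get-IscsiTarget | Where-Object IsConnected -eq $false | Connect-IscsiTarget -IsPersistent $true
Update-IscsiTarget
# Any disks showing as RAW? Do not initialise a disk that is supposed to hold data.
Get-Disk | Where-Object PartitionStyle -eq 'RAW'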

For FC, check the HBA in the BIOS or the vendor utility (Emulex OneCommand, QLogic QConvergeConsole). LUN masking changes on the SAN side will cause the host to lose visibility of LUNs that were working five minutes ago — check with the SAN admin.

Clustered host with CSV

# Is the CSV available to the cluster?
Get-ClusterSharedVolume

# Move the CSV to a healthy node
Get-ClusterSharedVolume "Cluster Disk N" | Move-ClusterSharedVolume -Node <healthy-node>

# If the CSV refuses to come online, check the underlying storage
Get-ClusterResource "Cluster Disk N" | Get-ClusterParameter
Test-Cluster -Node <node> -Include 'Storage'

CSV failures usually come down to one of four causes: physical disk failure beneath the CSV, network failure on the CSV/cluster network, witness failure on a 2-node cluster losing quorum, or clussvc itself having crashed. The failover clustering operational log (Microsoft-Windows-FailoverClustering/Operational, visible in Event Viewer) will tell you which.
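A short sketch of pulling recent warnings and errors from that log with Get-WinEvent. The function name Get-RecentClusterEvent and the default four-hour window are assumptions; widen the window to cover the time the outage started.

```powershell
# Hypothetical helper: recent critical/error/warning events from the cluster operational log.
function Get-RecentClusterEvent {
    param(
        [int]$Hours     = 4,    # how far back to look
        [int]$MaxEvents = 50    # cap on returned events
    )
    Get-WinEvent -FilterHashtable @{
        LogName   = 'Microsoft-Windows-FailoverClustering/Operational'
        Level     = 1, 2, 3                      # Critical, Error, Warning
        StartTime = (Get-Date).AddHours(-$Hours)
    } -MaxEvents $MaxEvents -ErrorAction SilentlyContinue |
        Select-Object TimeCreated, Id, LevelDisplayName, Message
}

# Usage on a cluster node:
# Get-RecentClusterEvent -Hours 12 | Format-List
```

Cluster events are also written to the System log, and Get-ClusterLog can generate the detailed cluster debug log for each node if the event log alone does not explain the failure.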

Step 5: Host won't boot, or BSODs on boot

Recover the VMs first, post-mortem the host afterwards.

Recover the VM files to a different host

If the host's OS is dead but the storage is intact (separate disk for OS vs VMs is the canonical Hyper-V layout for this exact reason):

  1. Pull the storage drives, or remap the SAN LUNs to a healthy host.
  2. On the healthy host, attach the storage. The VHDX files and VM configuration files (.vmcx, plus the accompanying .vmrs and .vmgs state files) should be intact.
  3. Import the VMs:
# In Hyper-V Manager: Action > Import Virtual Machine
# Or via PowerShell:
Import-VM -Path "D:\HyperV\VMs\<VMName>\Virtual Machines\<GUID>.vmcx" -Copy -GenerateNewId

# If the VM is to keep its identity (preferred where the original host is staying offline):
Import-VM -Path "D:\HyperV\VMs\<VMName>\Virtual Machines\<GUID>.vmcx" -Register

-Register registers the VM in place and keeps the original VM ID. That is required for cluster scenarios and matters wherever something tracks the VM by its ID, such as licensing tied to a specific VM ID. -Copy with -GenerateNewId is for situations where the original host might come back and you don't want a duplicate.
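Imports onto a different host often fail on small mismatches, typically a virtual switch name that does not exist on the new host. A hedged sketch of checking first with Compare-VM and importing only when the report is clean follows; the function name Import-VMInPlaceChecked is hypothetical.

```powershell
# Hypothetical helper: in-place import that stops if the compatibility report has issues.
function Import-VMInPlaceChecked {
    param(
        [Parameter(Mandatory)]
        [string]$VmcxPath   # path to the VM's .vmcx file
    )
    # Compare-VM builds a compatibility report without importing anything
    $report = Compare-VM -Path $VmcxPath -Register

    if ($report.Incompatibilities) {
        Write-Warning "Import blocked by $(@($report.Incompatibilities).Count) incompatibility item(s)."
        # Hand the report back so the issues can be inspected and fixed on it
        return $report
    }

    # Clean report: import from it
    Import-VM -CompatibilityReport $report
}

# Usage (placeholder path as in the examples above):
# $result = Import-VMInPlaceChecked -VmcxPath "D:\HyperV\VMs\<VMName>\Virtual Machines\<GUID>.vmcx"
# $result.Incompatibilities | Format-Table MessageId, Message -AutoSize
```

If the report lists a network adapter problem, one approach is to connect or disconnect that adapter on the report object (the Source property of the incompatibility entry) and then run Import-VM -CompatibilityReport against the corrected report.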

Diagnose the broken host in parallel

Capture the memory dump if there is one (%SystemRoot%\MEMORY.DMP — or minidumps in %SystemRoot%\Minidump\). Boot the host from Windows Server installation media and choose Repair:

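If a recent Datto RMM, AV agent, or backup agent install correlates with the start of instability, that is your suspect. Boot to safe mode (bcdedit /store C:\Boot\BCD /set {default} safeboot minimal from the recovery prompt; that store path applies to BIOS/MBR installs, while on UEFI hosts the BCD store lives on the EFI system partition and the drive letters inside the recovery environment may differ) and uninstall the agent. Afterwards, clear the flag with bcdedit /deletevalue {default} safeboot, or the host will keep booting into safe mode.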

If the OS disk is too damaged to repair, the answer is to rebuild — fresh Windows Server install on a new disk, install Hyper-V role, reattach the VM storage, import the VMs.

Step 6: Cluster split-brain or partition

If two nodes both believe they own a CSV or a clustered VM, you have a split-brain. This is the worst-case Hyper-V cluster scenario because both nodes will write to the same VHDX, corrupting it.

Symptoms:

Recovery:

  1. Stop one node entirely. Pick the node you trust less, or the node that was offline more recently. Power it off from the iDRAC / iLO, or run Stop-Computer -Force on the node itself if it still responds.
  2. On the remaining node: Test-Cluster, repair quorum, validate disks.
  3. Once stable, run chkdsk on the CSV (offline) before bringing VMs back.
  4. Restore VHDX files from backup if any are corrupt — there is no in-place repair for a VHDX written to by two hosts simultaneously.
  5. Investigate the cluster network. Split-brain almost always means heartbeats failed — investigate network team configuration, switch port flaps, or quorum witness reachability.
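For the quorum and network checks on the surviving node, a minimal sketch using the standard FailoverClusters cmdlets is below. The wrapper name Get-ClusterQuorumSummary is hypothetical; it only reads state and changes nothing.

```powershell
# Hypothetical helper: gather quorum, node vote and cluster network state in one object.
function Get-ClusterQuorumSummary {
    [pscustomobject]@{
        # Which witness (if any) the cluster is using
        Quorum   = Get-ClusterQuorum | Select-Object Cluster, QuorumResource, QuorumType
        # Node state and current vote assignment
        Nodes    = Get-ClusterNode | Select-Object Name, State, NodeWeight, DynamicWeight
        # Cluster networks: a network that is Down or Partitioned points at the heartbeat path
        Networks = Get-ClusterNetwork | Select-Object Name, State, Role, Address
    }
}

# Usage on the remaining node:
# $summary = Get-ClusterQuorumSummary
# $summary.Nodes    | Format-Table -AutoSize
# $summary.Networks | Format-Table -AutoSize
```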

Step 7: Repairing damaged VHDX files

If a VM won't boot post-recovery, or shows file-system corruption inside the guest, the VHDX may have been damaged.

# Inspect VHDX metadata (Get-VHD reads the header and reports on the file; it is not a full integrity scan)
Get-VHD "D:\HyperV\<VMName>\Virtual Hard Disks\<disk>.vhdx" | Format-List

# Look at FragmentationPercentage, FileSize vs Size, ParentPath if differencing,
# and Attached state. A VHDX showing as Attached when no VM should be using it
# indicates a stale lock — usually fixable by stopping/starting vmms.

# For differencing disks where the parent has gone walkabout:
Set-VHD "D:\HyperV\<VMName>\<disk>.avhdx" -ParentPath "D:\HyperV\Parent.vhdx" -IgnoreIdMismatch

# To merge a chain of differencing disks back to the parent (after a snapshot/checkpoint mess):
Merge-VHD -Path "D:\HyperV\<VMName>\<disk>.avhdx" -DestinationPath "D:\HyperV\<VMName>\<parent>.vhdx"
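Before repointing or merging anything, it helps to see the whole differencing chain. A minimal sketch that walks from the newest disk back to the base follows; the function name Get-VhdChain is hypothetical. It stops with an error at the first link whose parent cannot be opened, which is usually the link that needs Set-VHD.

```powershell
# Hypothetical helper: list every disk in a differencing chain, child first.
function Get-VhdChain {
    param(
        [Parameter(Mandatory)]
        [string]$Path   # newest .avhdx / .vhdx in the chain
    )
    while ($Path) {
        # -ErrorAction Stop makes a missing or unreadable parent fail loudly at that link
        $vhd = Get-VHD -Path $Path -ErrorAction Stop
        $vhd | Select-Object Path, VhdType, ParentPath, Attached
        # A base disk has no ParentPath, which ends the loop
        $Path = $vhd.ParentPath
    }
}

# Usage (placeholder path as in the examples above):
# Get-VhdChain -Path "D:\HyperV\<VMName>\<disk>.avhdx" | Format-Table -AutoSize
```

Test-VHD on the same path returns True or False for whether the disk and its parent chain are usable, which makes a quick yes/no check before attempting a merge. Copy the files somewhere safe before merging, since a merge cannot be undone.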

Hyper-V does not have a built-in VHDX repair tool akin to chkdsk. If a VHDX header is damaged, your options are:

This is one of several reasons the discipline of "snapshots are not backups" matters. A VM that has lived in a checkpointed state for months is one host crash away from an unrecoverable VHDX chain.

What NOT to do

Prevention

Once stable:

When to call us

For Hyper-V specifically:

Engineerdirect.co.uk has senior engineers across Hyper-V, VMware, Failover Clustering, SAN/iSCSI/FC storage, and the underlying server hardware — Dell, HPE, Lenovo. On-site response across London and the South East within 2 hours.

Call 01923 372471 — senior engineer answers directly. We respond quickly.

FAQ

Why are my VMs in "Paused-Critical" state? The host has lost access to the storage holding the VHDX, or the volume is full. Check disk status (Get-Disk, Get-VirtualDisk) and free space on the host. Once storage returns, Resume-VM on each VM. If storage is permanently lost, you'll need to restore from backup.
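A small sketch of finding and resuming those VMs once storage is back. It assumes the state is reported by Get-VM as PausedCritical; the filter name Select-PausedCriticalVM is hypothetical.

```powershell
# Hypothetical filter: keep only VMs whose state is PausedCritical.
function Select-PausedCriticalVM {
    param(
        [Parameter(ValueFromPipeline)]
        $VM
    )
    process {
        # String comparison works with the Hyper-V state enum and with plain objects
        if ("$($VM.State)" -eq 'PausedCritical') { $VM }
    }
}

# Usage on the host, after confirming the volume is back and has free space:
# Get-Volume | Select-Object DriveLetter, FileSystemLabel, SizeRemaining, Size
# Get-VM | Select-PausedCriticalVM | Resume-VM
```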

Can I copy a running VM's VHDX file to a backup location? No. A VHDX in use by a running VM is locked, and copying it with tools that bypass the lock produces a file that is crash-consistent at best and may not be usable at all. Use Hyper-V export (Export-VM) or a backup tool that uses VSS for application-aware backup. A checkpoint on its own is not a backup.

My host bluescreens every few hours. How do I find the cause? Check %SystemRoot%\MEMORY.DMP (or minidumps in %SystemRoot%\Minidump\) with WinDbg or BlueScreenView. Common causes on Hyper-V hosts: NIC driver bugs (especially Broadcom, Intel i40e), storage driver bugs, third-party agents (older Datto RMM, AV with kernel hooks). Roll back recent driver updates and recent agent installations as a first move.
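To see how often the host has bugchecked and with which stop codes, the System event log can be queried before opening any dump. This is a sketch under one assumption: that bugcheck reboots are recorded as event ID 1001 from the Microsoft-Windows-WER-SystemErrorReporting provider (shown as "BugCheck" in Event Viewer). The function name Get-BugCheckHistory is hypothetical.

```powershell
# Hypothetical helper: list recorded bugcheck reboots from the System log.
function Get-BugCheckHistory {
    param(
        [int]$Days = 30   # how far back to look
    )
    Get-WinEvent -FilterHashtable @{
        LogName      = 'System'
        ProviderName = 'Microsoft-Windows-WER-SystemErrorReporting'  # assumed provider name
        Id           = 1001
        StartTime    = (Get-Date).AddDays(-$Days)
    } -ErrorAction SilentlyContinue |
        Select-Object TimeCreated, Message
}

# Usage on the host:
# Get-BugCheckHistory -Days 14 | Format-List
# Get-ChildItem "$env:SystemRoot\Minidump" -ErrorAction SilentlyContinue | Select-Object Name, Length, LastWriteTime
```

A repeating stop code with a regular interval points towards a driver or agent; varying stop codes at random intervals point more towards hardware, memory in particular.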

What's the difference between Hyper-V Replica and a backup? Replica is a near-real-time copy of running VMs to another host, primarily for disaster recovery (host or site failure). Backup is a point-in-time copy with retention, primarily for data recovery (corruption, accidental deletion, ransomware). Replica protects you from infrastructure loss; backup protects you from data loss. You need both.

Is it safe to extend a VHDX live while the VM is running? Yes, with caveats: the disk must be a VHDX attached to a virtual SCSI controller (not IDE), which is supported from Windows Server 2012 R2 onwards. The host expands the file; the guest then needs to extend the partition (diskmgmt.msc or diskpart inside the guest). Run a backup first. Don't shrink a VHDX live; treat that as a guest-down operation.
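A hedged sketch of the host-side half of that operation, with a guard for the SCSI requirement. The function name Expand-VMDiskOnline and its parameters are hypothetical; Get-VMHardDiskDrive and Resize-VHD are the standard Hyper-V cmdlets.

```powershell
# Hypothetical helper: expand a VHDX while the VM runs, but only if it is SCSI-attached.
function Expand-VMDiskOnline {
    param(
        [Parameter(Mandatory)][string]$VMName,
        [Parameter(Mandatory)][string]$Path,          # full path to the .vhdx
        [Parameter(Mandatory)][uint64]$NewSizeBytes   # target size, e.g. 200GB
    )
    # Find the drive entry for this file on the VM
    $drive = Get-VMHardDiskDrive -VMName $VMName | Where-Object Path -eq $Path
    if (-not $drive) {
        throw "Disk '$Path' is not attached to VM '$VMName'."
    }
    # Online resize is only supported on the virtual SCSI controller
    if ("$($drive.ControllerType)" -ne 'SCSI') {
        throw "Disk is on a $($drive.ControllerType) controller; shut the VM down before resizing."
    }
    Resize-VHD -Path $Path -SizeBytes $NewSizeBytes
}

# Usage (placeholder names and an illustrative size):
# Expand-VMDiskOnline -VMName "VMName" -Path "D:\HyperV\<VMName>\Virtual Hard Disks\<disk>.vhdx" -NewSizeBytes 200GB
```

The guest still sees the old partition size afterwards. Inside a Windows guest, Resize-Partition together with Get-PartitionSupportedSize does the same job as diskmgmt.msc.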

Should I use VMware instead of Hyper-V? Both are mature platforms. Hyper-V's licensing is included with Windows Server (Standard or Datacenter) which is materially cheaper for most SMBs after the Broadcom/VMware licence changes. VMware has a deeper ecosystem for very large environments. For 5–50 VM businesses on Windows-heavy stacks, Hyper-V is usually the right answer.


A crashed hypervisor with production VMs on it is exactly the kind of failure our emergency server & infrastructure recovery handles directly.

Part of a series of disaster-recovery references. If your VMs are down right now: 01923 372471.
