A Veeam backup job has been failing for days. Or weeks. Or — and this is the call we get most — the customer has just discovered the backups have been red on the dashboard for two months, and now the server they wanted to restore from is the one that died.
Backup failures are unique in that they are the disaster you don't notice until the disaster you actually feared has already happened. The cure is not to wait for a restore — it's to read backup failures the same day they appear. This guide walks through how a senior engineer diagnoses Veeam backup failures across Veeam Backup & Replication 12.x, the common error families, the meaningful repair steps, and the verification routine that turns "we have backups" into "we have backups that actually restore". Written for IT managers and backup admins running Veeam in Hyper-V or VMware environments.
If you have a backup failure now and a recovery scenario looming, call 01923 372471 — senior engineer answers directly, on-site within 2 hours.
Step 1: Read the actual error, not the colour
The Veeam dashboard shows job status as Success / Warning / Failed. Don't stop at the colour — open the job report and read the error text. Veeam's errors are precise; the cure follows directly from the wording.
In the Veeam console: Home > Jobs > [job] > right-click > Statistics > [latest session] > Errors tab.
Or from PowerShell:
Import-Module Veeam.Backup.PowerShell   # v11 and later ship a module; the old VeeamPSSnapin snap-in is gone
Get-VBRJob -Name "<job name>" | Get-VBRBackupSession | Sort-Object EndTime -Descending |
Select-Object -First 5 | Get-VBRTaskSession |
Select-Object Name, Status, @{n="Reason";e={$_.Result.Reason}}
The Reason field is the answer to "why did this fail?". Capture it; it tells you which class of issue you're in:
| Error category | Common Reason text | Section |
|---|---|---|
| VSS / guest processing | "Failed to create snapshot", "VSS provider failed", "Cannot freeze guest" | Step 2 |
| Network / repository | "Cannot connect to host", "Failed to write block", "I/O error" | Step 3 |
| Storage / capacity | "Not enough disk space", "Backup repository is full" | Step 4 |
| Retention / chain | "Cannot find required backup file", "Backup chain is corrupted" | Step 5 |
| Authentication / agent | "Logon failure", "Cannot establish connection" | Step 6 |
| Hypervisor | "VSphere snapshot failed", "Hyper-V change tracking failed" | Step 7 |
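The lookup in the table above can be scripted once you have the Reason text. A minimal sketch, assuming the Reason strings contain the phrases listed in the table; `CATEGORY_PATTERNS` and `classify_reason` are hypothetical names for illustration, not part of any Veeam tooling, and real Reason text will vary more than this.

```python
from typing import Optional, Tuple

# Phrases taken from the table above; matching is case-insensitive and
# the first category that matches wins. All names here are illustrative.
CATEGORY_PATTERNS = [
    ("Step 2", "VSS / guest processing",
     ("failed to create snapshot", "vss provider failed", "cannot freeze guest")),
    ("Step 3", "Network / repository",
     ("cannot connect to host", "failed to write block", "i/o error")),
    ("Step 4", "Storage / capacity",
     ("not enough disk space", "backup repository is full")),
    ("Step 5", "Retention / chain",
     ("cannot find required backup file", "backup chain is corrupted")),
    ("Step 6", "Authentication / agent",
     ("logon failure", "cannot establish connection")),
    ("Step 7", "Hypervisor",
     ("vsphere snapshot failed", "hyper-v change tracking failed")),
]


def classify_reason(reason: str) -> Optional[Tuple[str, str]]:
    """Return (step, category) for a Reason string, or None if nothing matches."""
    text = reason.lower()
    for step, category, phrases in CATEGORY_PATTERNS:
        if any(phrase in text for phrase in phrases):
            return step, category
    return None
```

An unmatched Reason returns None on purpose: an error that fits no category should be read by a person rather than forced into the nearest bucket.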
Step 2: VSS / guest processing failures
The Volume Shadow Copy Service is responsible for taking application-consistent snapshots of running services (SQL, Exchange, AD). When VSS fails, the backup either falls back to crash-consistent (worse for databases) or fails outright.
Common errors
- Failed to take a snapshot of the volume: usually a writer in a failed state.
- VSS_E_WRITERERROR_NONRETRYABLE (0x800423F4): a specific writer (SQL, Exchange) hit a non-transient error during the freeze.
- Cannot freeze guest: VSS inside the guest VM is not responding to Veeam's request.
Diagnose
On the affected VM (or physical server):
vssadmin list writers
Every writer should show State: [1] Stable and Last error: No error. Any writer in Failed state, or with a non-zero last error, is the cause.
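When there are many servers to check, the "every writer Stable, no last error" rule can be applied to saved vssadmin output. A minimal sketch, assuming English-language output in the usual "Writer name / State / Last error" layout; `parse_writers` and `unhealthy_writers` are hypothetical helper names.

```python
from typing import Dict, List


def parse_writers(output: str) -> List[Dict[str, str]]:
    """Parse 'vssadmin list writers' text into one dict per writer."""
    writers: List[Dict[str, str]] = []
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Writer name:"):
            name = line.split(":", 1)[1].strip().strip("'")
            current = {"name": name, "state": "", "last_error": ""}
            writers.append(current)
        elif current is not None and line.startswith("State:"):
            current["state"] = line.split(":", 1)[1].strip()
        elif current is not None and line.startswith("Last error:"):
            current["last_error"] = line.split(":", 1)[1].strip()
    return writers


def unhealthy_writers(output: str) -> List[Dict[str, str]]:
    """Return writers that are not '[1] Stable' or that report a last error."""
    return [
        w for w in parse_writers(output)
        if w["state"] != "[1] Stable" or w["last_error"] != "No error"
    ]
```

Feed it the captured text of vssadmin list writers (for example, redirected to a file on the affected server); an empty result means every writer passed the check.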
# Reset failed writers by restarting the service that hosts each one (no reboot needed):
# - SQL writer: restart the "SQL Server VSS Writer" service (SQLWriter).
# - Exchange writer: restart the "Microsoft Exchange Replication" service (MSExchangeRepl).
# - System / shadow copy writers: restart "Volume Shadow Copy" (VSS) and "COM+ System Application" (COMSysApp).
# Use the short service names with -Name:
Restart-Service -Name VSS, COMSysApp
For persistent VSS failures:
# Re-register VSS components (run from an elevated cmd.exe prompt; cd /d is cmd syntax, not PowerShell):
cd /d %windir%\system32
net stop vss
net stop swprv
regsvr32 /s ole32.dll
regsvr32 /s oleaut32.dll
regsvr32 /s vss_ps.dll
vssvc /register
regsvr32 /s /i swprv.dll
regsvr32 /s /i eventcls.dll
regsvr32 /s es.dll
regsvr32 /s stdprov.dll
regsvr32 /s vssui.dll
regsvr32 /s msxml.dll
regsvr32 /s msxml3.dll
regsvr32 /s msxml4.dll
net start swprv
net start vss
vssadmin list writers
If writers stay stuck, a reboot is sometimes the only fix; schedule it. Before that, check vssadmin list shadows for stale shadow copies. When no backup is running, vssadmin delete shadows /all clears them, but note that it removes every shadow copy on the machine, including any Previous Versions restore points.
VMware-specific VSS
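For VMware-hosted VMs, Veeam's application-aware processing drives Microsoft VSS inside the guest through its own guest runtime; the VMware Tools VSS provider is used when the job relies on VMware Tools quiescence instead. In both cases keep VMware Tools healthy and current. If Tools are missing or broken: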
- Confirm VMware Tools is installed and current (Get-VMGuest from PowerCLI, or in the vSphere client).
- Check that vmtoolsd is running on Linux guests.
- For Windows, the VMware Tools install option VSS Service must be selected; re-run the installer if it's missing.
Hyper-V-specific VSS
Hyper-V uses the Hyper-V Volume Shadow Copy Requestor integration component on the guest. Check:
- VM Settings → Integration Services → Backup (volume checkpoint) is enabled
- The guest's Hyper-V Volume Shadow Copy Requestor service is running
- The guest has the latest Integration Services (older versions on newer hosts cause silent failures)
Step 3: Repository connectivity / I/O failures
Common errors
- Failed to write block / I/O error: the repository is unreachable, the share has lost its connection, or the underlying disk has had an error.
- Cannot connect to repository: credentials, network, or the repository's host is down.
- Failed to upload disk: for cloud / object-storage repositories, often timeouts or expired auth tokens.
Diagnose
For SMB repositories:
# Test from the Veeam server / proxy:
Test-NetConnection -ComputerName <repo-host> -Port 445
net use \\<repo-host>\<share> /user:<account> *
The account Veeam uses to connect to the repository must have the rights — for SMB repositories on a Windows file server, the simplest reliable model is a dedicated veeam-repo service account with full control on the share and on the underlying NTFS folder. If that account password has rotated and Veeam is using the old one, the job fails; recreate the credential in Veeam under Credentials Manager.
For dedicated-Linux hardened repositories (the recommended modern Veeam pattern):
# On the Linux repo:
df -h /mnt/veeam-repo # free space
ls -la /mnt/veeam-repo # ownership
sudo journalctl -u veeamtransport --since "1 hour ago" # transport service log
Veeam Transport runs as the user veeamtransport (or whatever was specified at config). If the underlying disk has filled, gone read-only (filesystem hit a fault and remounted RO), or the filesystem has degraded, transport fails immediately.
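The two conditions named above (disk full, filesystem remounted read-only) can be checked from a script on the repository host. A minimal sketch, assuming a Linux host where /proc/mounts is readable; `mount_is_read_only` and `free_fraction` are hypothetical helpers, and /mnt/veeam-repo is only the example path used in this guide.

```python
import shutil


def mount_is_read_only(mount_point: str, mounts_text: str) -> bool:
    """Return True if mount_point carries the 'ro' option in /proc/mounts-style text.

    Each line has the form: device mount_point fstype options dump pass.
    If the mount point is listed more than once, the last entry wins.
    """
    read_only = False
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            read_only = "ro" in fields[3].split(",")
    return read_only


def free_fraction(path: str) -> float:
    """Return the fraction of the filesystem holding 'path' that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total
```

On the repository you would pass the contents of /proc/mounts as `mounts_text` and alert when the mount is read-only or `free_fraction` drops below a threshold you choose; the threshold itself is a local decision, not something Veeam prescribes.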
Object-storage repositories
Capacity Tier (S3, Wasabi, Azure Blob, Backblaze B2) failures usually trace to:
- Expired API keys
- IAM policy changes that revoked write/delete permissions
- Bucket-level immutability settings preventing Veeam from updating block descriptors
- Network egress blocked at the customer's firewall
Test the credentials independently with aws s3 ls (S3-compatible) or equivalent before assuming Veeam is broken.
Step 4: Repository capacity / chain length
Common errors
- Not enough free space on the target disk
- Failed to allocate processed blocks
- Backup file size limit reached
What's actually happening
A Veeam backup chain consists of a full backup file (.vbk) and a sequence of incrementals (.vib for forward incremental, .vrb for reverse). The repository must hold enough space for the whole chain plus the next incremental, plus headroom for Veeam's working space.
A common pattern: the customer set retention to 30 days, never sized the repository to support 30 days of data growth, and one busy quarter the repository fills.
# From the Veeam PowerShell module (on the backup server):
Get-VBRBackup -Name "<job name>" | Select-Object Name, FullSize, IncrementSize, Files
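The space requirement described above (whole chain, plus the next incremental, plus working headroom) is simple arithmetic. A minimal sketch, assuming you already know the file sizes in GB; `chain_space_needed_gb` is a hypothetical helper, and the 30% default headroom is an assumption borrowed from the sizing advice later in this guide, not a Veeam-mandated figure.

```python
from typing import Sequence


def chain_space_needed_gb(full_gb: float,
                          incremental_gbs: Sequence[float],
                          next_incremental_gb: float,
                          headroom_fraction: float = 0.30) -> float:
    """Space needed for the full, all incrementals, the next incremental, and headroom."""
    chain_gb = full_gb + sum(incremental_gbs)
    return (chain_gb + next_incremental_gb) * (1.0 + headroom_fraction)


def repository_fits(capacity_gb: float, needed_gb: float) -> bool:
    """True if the repository capacity covers the computed requirement."""
    return capacity_gb >= needed_gb
```

For example, a 1,000 GB full with 29 incrementals of 50 GB each and another 50 GB incremental due comes to 2,500 GB of chain, or 3,250 GB with 30% headroom, so a 3 TB (3,000 GB) repository would already be too small.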
To free space without breaking chains:
- Don't manually delete .vbk or .vib files. Veeam tracks them in its database; manual deletion creates orphan records and corrupts subsequent restores.
- Reduce retention in the job's settings → Storage → Retention. Veeam will trim old restore points on the next successful run.
- Run a Health Check on the chain (Job → Edit → Storage → Advanced → Maintenance → Run health check periodically) to remove damaged blocks.
- Move backups to a larger repository via Veeam's Backup Copy Job, then point the job at the new target.
- Add capacity tier (cloud) for older restore points — keep recent ones local, tier old ones to S3-compatible storage. Veeam handles this transparently.
Step 5: Backup chain corruption
The error you don't want: Cannot find required backup file or Backup chain is corrupted. The job knows it needs a specific .vbk or .vib file; the file is missing, damaged, or unreadable.
Causes
- Manual deletion of files in the repository folder
- Repository disk fault (silent bit-rot — common on cheap NAS without ZFS or ReFS)
- Veeam configuration database restored from backup that's out of sync with the actual repository contents
- Antivirus on the repository quarantined backup files (it does happen; AV exclusions for .vbk, .vib, .vrb, .vlb are not optional)
Recovery options
Option A — Health check / repair
In the Veeam console, right-click the chain → Backup files > Health Check. Veeam scans every block, identifies corrupt blocks, and (if forward-incremental) writes good blocks into the next backup run. This rebuilds the chain over the next few backup cycles without losing all history.
Option B — Active full
Force the job's next run to be a new full backup, starting a fresh chain. Old chain remains for restore until retention removes it. This is the cleanest path forward if the corruption isn't restorable.
Option C — Restore from copy
If you have a Backup Copy Job to a secondary repository (or to capacity tier in the cloud), you can restore from that — even if the primary chain is destroyed. This is the value proposition of 3-2-1.
Option D — Rebuild the configuration
If the Veeam configuration database itself is the problem (vbm files exist on disk but Veeam doesn't see them), use Import Backup to rebuild the chain entry in the database from the on-disk files.
What you don't have
If the chain is corrupt and you have no copy: that range of historical backups is gone. The future chain (from the next active full forward) will be intact, but you have lost the ability to restore from any point during the corrupt period. This is precisely why 3-2-1 exists.
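The 3-2-1 rule (three copies of the data, on two different media, one offsite) can be written down as a check against your own inventory. The sketch below is illustrative only: `BackupCopy` and `meets_3_2_1` are hypothetical names, and the inventory is assumed to be maintained by hand, not read from any Veeam API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BackupCopy:
    name: str
    media: str     # e.g. "disk", "tape", "object-storage" (labels are up to you)
    offsite: bool  # True if the copy lives outside the primary site


def meets_3_2_1(copies: List[BackupCopy]) -> bool:
    """Check for at least 3 copies, on at least 2 media types, with at least 1 offsite."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.media for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite
```

Production data counts as one of the three copies, so a typical passing inventory is production on disk, a local repository on disk, and a cloud or tape copy offsite.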
Step 6: Authentication / agent failures
Common errors
- Logon failure: unknown user name or bad password
- Failed to start agent
- Cannot install runtime components
The Veeam server pushes a runtime agent to each protected machine over administrative shares (\\server\admin$) using an account with local admin rights. If the account password has changed, the local admin shares are disabled, or a firewall rule is blocking SMB, agent push fails.
Fix
- Update the credential in Veeam (Credentials Manager).
- Test the credential against the protected machine: psexec \\<server> -u <account> -p <password> hostname, or simply net use \\<server>\admin$ /user:<account> *.
- For domain machines, the simplest reliable model is a dedicated veeam-svc domain account with local admin rights on every protected server (delegated via Restricted Groups GPO).
- For non-domain or DMZ servers, add a local admin account and use it specifically for that target.
Step 7: Hypervisor-side failures
VMware
- Failed to create snapshot on the VM: usually because a previous Veeam snapshot wasn't cleaned up. In vSphere, find the VM, then Snapshots > Manage > Delete All. Then retry the job.
- Change Block Tracking failed: CBT corruption. Disable CBT (Edit Settings > VM Options > Advanced > Configuration Parameters, set ctkEnabled = FALSE and scsi0:0.ctkEnabled = FALSE per disk), reboot, re-enable. Or run Veeam's Reset CBT option in the job settings.
- VM has too many snapshots: sometimes Veeam fails to consolidate. Manually delete stale snapshots (carefully); investigate why consolidation isn't completing automatically (storage I/O latency is the usual culprit).
Hyper-V
- Resilient Change Tracking (RCT) errors: RCT is Hyper-V's equivalent of CBT. RCT corruption is rare but real. Resolution: stop the VM, delete the RCT files (.rct, .mrt) from the VM folder, restart the VM. The next backup will be a full, but RCT will rebuild.
- Cluster-aware backup failures: if the VM lives on a CSV, ensure the Veeam proxy has access to the CSV. The proxy account needs Cluster Read rights at minimum.
- Production checkpoints vs Standard checkpoints: Hyper-V VMs configured for Standard checkpoints (memory included) cause Veeam to fall back to crash-consistent backups for application-aware workloads. Configure VMs for Production checkpoints (VSS-based) in VM settings.
Verifying backups actually restore
A backup that hasn't been verified is hope, not insurance.
Manual: scheduled test restore
Once a quarter, restore one VM (or one critical file set) to an isolated network. Boot it. Log in. Confirm services start. Document the test in your ticketing system. If a restore fails, you've found a problem before the disaster does.
Automated: Veeam SureBackup
SureBackup runs scheduled test restores in a sandboxed virtual lab, boots the VMs, runs heartbeat tests, and reports pass/fail. Available in Veeam Enterprise and higher editions. The single most valuable feature in Veeam for businesses without the time for manual quarterly tests.
Automated: Veeam Backup Validator
Free utility (separate executable on the Veeam server) that checksums every backup file against its stored hash. Doesn't test restorability of the application, but catches silent file corruption before it becomes catastrophic.
"C:\Program Files\Veeam\Backup and Replication\Backup\Veeam.Backup.Validator.exe" /backup:"<job name>"
Run weekly via Task Scheduler. Alert on any failure.
What NOT to do
- Don't ignore yellow / warning states. Veeam warnings are early indicators; the next state is red.
- Don't disable AV on the repository as a generic fix. Add AV exclusions for .vbk, .vib, .vrb, .vlb, the Veeam program directory, and the repository folder. Do not disable AV entirely.
- Don't manually delete .vbk or .vib files to free space. Use Veeam's retention policy or repository move.
- Don't run Veeam jobs on a single repository sized exactly for the data. Sizing for current data + 30% headroom + retention growth is the practical minimum.
- Don't rely on a backup repository accessible from a domain admin account during a ransomware incident. That repository is compromised. See First steps after a ransomware attack. Immutable repositories are essential for ransomware resilience.
Prevention
- 3-2-1 backup rule. Three copies of data, on two different media, one offsite. Tape, immutable cloud (S3 Object Lock, Veeam hardened Linux repository, Datto SIRIS Cloud), or both.
- Run backup health checks weekly. Schedule via Veeam's job settings.
- Run test restores quarterly. Manual or via SureBackup.
- Monitor backup completion and alert on failure same-day. Veeam Email Notifications, plus a separate "no email received" watchdog (Healthchecks.io, dead-man's-switch monitoring) — because if Veeam itself fails to email, you'll never know.
- Document your RPO/RTO. Recovery Point Objective is how much data you can afford to lose; Recovery Time Objective is how long you can afford to be down. Backups are designed to meet these — not the other way around.
- Don't bolt cloud backup retention onto a poorly architected primary. If the primary backup chain is fragile, the cloud copy is fragile.
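The "no email received" watchdog and the RPO definition above reduce to one comparison: how old is the last successful backup, against how much data loss you can tolerate. A minimal sketch, assuming you already record the time of the last successful run somewhere; `backup_is_stale` is a hypothetical helper, not part of Veeam or of any monitoring product.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def backup_is_stale(last_success: datetime,
                    rpo: timedelta,
                    now: Optional[datetime] = None) -> bool:
    """Return True if the newest good backup is older than the RPO allows.

    Both datetimes should be timezone-aware; 'now' defaults to the current UTC time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - last_success) > rpo
```

Run from a machine other than the Veeam server, a check like this fires even when Veeam itself is down and cannot send its failure email, which is the point of a dead-man's switch.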
When to call us
Call us if:
- A backup chain has been failing for weeks and you need to recover what restorability is left.
- A repository disk has failed and you're not sure what's salvageable.
- A ransomware incident has destroyed primary backups and you need to identify what's recoverable from immutable / copy / cloud.
- Veeam is broken in a way the documentation hasn't helped with.
- You want a backup architecture review — sizing, repository design, immutability, 3-2-1 implementation.
Engineerdirect.co.uk has senior engineers across Veeam Backup & Replication, Datto BCDR, Microsoft Azure Backup, and the underlying VMware / Hyper-V / NAS / object-storage layers.
Call 01923 372471 — senior engineer answers directly. Same-day on-site across London and the South East.
FAQ
My job has been showing "Success with warnings" for weeks. Is that OK? No. Warning indicates the backup ran but with degraded state — most often falling back from application-aware to crash-consistent. For SQL/Exchange/AD this means the backup is taken but not in a properly recoverable state. Investigate the warning, fix the root cause, get back to clean Success.
Should I delete old backup files manually if the repo is full? No — Veeam tracks files in its database, and manual deletion creates orphan entries. Reduce retention in the job, run an Active Full to start a new chain, or move data to a larger repository.
Can I run backups during business hours? Yes for most workloads — VSS snapshots are quick (typically under a minute) and the backup itself runs against the snapshot, not the live data. SQL/Exchange-heavy environments benefit from out-of-hours runs to avoid log truncation timing issues. Check vssadmin list writers during a backup window to spot any writer that's struggling.
My backup repository is on the same domain as my production environment. Is that a problem? For ransomware resilience, yes — a domain admin compromise reaches your backups. The modern Veeam pattern is: production data on the domain, backups on a hardened Linux repository in a workgroup, with credentials only known to Veeam, immutability enabled. This survives a domain-wide compromise that an in-domain repository would not.
Do I need both Veeam Replication and Veeam Backup? Replication gives near-real-time copies of running VMs to a secondary host (low RPO/RTO for infrastructure failure). Backup gives point-in-time copies with retention (recovery from corruption, deletion, ransomware). Most businesses need both — replication for "the host died" and backup for "someone deleted important files three weeks ago".
How big should my Veeam repository be? Rule of thumb: roughly the size of your protected production data, multiplied by 1.5 for a year of forward-incremental retention with monthly active fulls and 4 weekly retention points. So 5 TB of production = roughly 7-8 TB of repository. Tune by watching real growth over a few months.
If backups are failing with a restore looming, our emergency backup & server recovery can step in before it becomes a data-loss event.
Part of a series of disaster-recovery references. If your backups are broken and a restore is approaching: 01923 372471.