How to Recover a Failed Domain Controller

Written by a Senior Infrastructure Engineer, engineerdirect.co.uk

Applies to: Windows Server 2012 R2, 2016, 2019, 2022, 2025
Severity: Critical
Published: 3 May 2026
Last reviewed: May 2026

Before you start: This guide is for general information. In a live business outage, avoid making repeated changes without a rollback plan. If identity, backup or ransomware systems are involved, preserve logs and evidence before attempting recovery.

When to stop and call an engineer

If the failed DC holds FSMO roles, is your only domain controller, or AD-integrated DNS is failing domain-wide — stop before running ntdsutil or metadata cleanup. A wrong move here can break authentication for the entire business.

📞 Call 01923 372471 Emergency Active Directory & server recovery →

A domain controller has stopped responding. Authentication is failing across the business. Group Policy isn't applying. Email might still be working — for now — because clients have cached credentials, but you're on a clock.

This guide walks through the recovery procedure as a senior engineer would run it: assess, attempt repair, decide whether to recover or replace, and clean up without leaving orphaned metadata that will haunt you for years. It is written for IT managers and infrastructure engineers who have inherited, or are responsible for, a Windows AD domain.

If your business is offline right now and you need this resolved today, call 01923 372471 — a senior engineer answers directly. If you've got time to work through it, read on.

How to recognise a failed domain controller

You probably already know, but the symptoms cluster like this:

Users cannot authenticate. New logins fail with "The trust relationship between this workstation and the primary domain failed" or "There are currently no logon servers available."
Group Policy stops applying. gpupdate /force returns errors referencing the DC name or 0x80070005.
Other DCs flag the failed one in Event Viewer (Directory Service log, event IDs 1311, 1865, 1925, 2042).
dcdiag and repadmin /showrepl show replication failures pointing at the broken DC.
The DC is unreachable on its IP or hostname, or it's reachable but services like NTDS, Netlogon and DNS Server won't start.

Don't panic-restart. The first move is always to assess, not to act.

Step 1: Assess — is it dead, or just unwell?

From a healthy domain controller (you do have more than one — right?), run:

dcdiag /e /v /f:dcdiag-output.txt
repadmin /showrepl * /csv > replication.csv
repadmin /replsummary

These three commands tell you, in order: what's failing across all DCs, the exact replication state of every partition between every pair, and a summary view of who hasn't talked to whom and for how long.
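As a rough sketch of how the CSV export could be triaged, the PowerShell below keeps only the replication links that report failures and sorts the worst first. The function name Get-FailingReplLink is hypothetical, and the column headers are assumed from typical repadmin /showrepl /csv output, so check them against your own file before relying on it.

```powershell
# Hypothetical helper: return replication links with one or more failures.
# Column names ('Number of Failures', 'Source DSA', ...) are an assumption
# based on typical repadmin CSV output.
function Get-FailingReplLink {
    param(
        [Parameter(Mandatory)]
        [object[]]$Row          # objects from Import-Csv or ConvertFrom-Csv
    )
    $Row |
        Where-Object { [int]$_.'Number of Failures' -gt 0 } |
        Sort-Object -Property { [int]$_.'Number of Failures' } -Descending |
        Select-Object 'Source DSA', 'Destination DSA', 'Naming Context',
                      'Number of Failures', 'Last Success Time'
}

# Usage against the file produced by the repadmin export:
# Get-FailingReplLink -Row (Import-Csv .\replication.csv) | Format-Table -AutoSize
```

The failure count is cast to an integer so that "10" sorts above "9"; the CSV delivers every field as text.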

The single most important number to find is how long ago the failed DC last replicated successfully. Compare it with the forest's tombstone lifetime: 180 days by default in forests created on Server 2003 SP1 or later, 60 days in older forests or wherever the attribute is unset. Confirm yours with:

Get-ADObject -Identity "CN=Directory Service,CN=Windows NT,CN=Services,$((Get-ADRootDSE).configurationNamingContext)" -Properties tombstoneLifetime

If the DC has been offline longer than the tombstone lifetime, you cannot bring it back. Its database still holds objects that the rest of the forest has since deleted and garbage-collected, and if it replicates outbound those lingering objects are reintroduced on other DCs. Don't do it. Demote and rebuild instead. If the gap is only approaching the limit, treat the remaining days as your deadline.
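The comparison is simple date arithmetic, sketched below. Get-TombstoneMargin is a hypothetical helper, the dates in the example are made up, and the 60-day default is used only when no lifetime is supplied (the value that applies when the attribute is unset).

```powershell
# Hypothetical helper: how much tombstone lifetime an offline DC has left.
function Get-TombstoneMargin {
    param(
        [Parameter(Mandatory)]
        [datetime]$LastSuccess,              # last successful replication
        [int]$TombstoneLifetimeDays = 60,    # pass the value read from AD
        [datetime]$Now = (Get-Date)
    )
    # Whole days elapsed since the DC last replicated.
    $offlineDays = [int][math]::Floor(($Now - $LastSuccess).TotalDays)
    [pscustomobject]@{
        OfflineDays    = $offlineDays
        RemainingDays  = $TombstoneLifetimeDays - $offlineDays
        WithinLifetime = $offlineDays -lt $TombstoneLifetimeDays
    }
}

# Illustrative dates only: 170 days offline against a 180-day lifetime.
Get-TombstoneMargin -LastSuccess '2025-11-01' -TombstoneLifetimeDays 180 -Now '2026-04-20'
```

A negative RemainingDays means the DC is past the limit and belongs in the demote-and-rebuild path.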

If the DC is within tombstone lifetime, you have options.

Step 2: Try to recover the DC in place

Get console access — physical, iDRAC, iLO, Hyper-V console, vSphere console. RDP will not help if the box is in a failed state.

Boot the DC and check Event Viewer:

Directory Service log
DNS Server log
System log around the time of failure

You're looking for the proximate cause. Common ones:

Symptom	Likely cause	First thing to try
NTDS service won't start, JET errors in event log	Database corruption	Boot to Directory Services Restore Mode (DSRM) and run the `ntdsutil` integrity check described below
Disk full / out of space	C:\ or the log drive is full	Free space, restart NTDS
Time skew > 5 min	W32Time broken	Run `w32tm /resync /rediscover` on the affected DC so it resyncs from a healthy DC, then fix the PDC's time source
Secure channel broken	Machine account password mismatch	Don't reset it as you would on a member server; see Step 3
Windows update failed, boot loop	Failed update	Boot to last known good, uninstall the update via `wusa /uninstall /kb:<KB number>`
Hardware fault (RAID degraded, bad RAM)	Physical failure	Stop. Image the disks before any further action

If you can boot the DC and get to a command prompt, run:

dcdiag /e /v
repadmin /showrepl

Run these from the broken DC itself this time. If replication is flowing both inbound and outbound, and dcdiag passes the critical tests (Connectivity, Advertising, Replications, MachineAccount, Services, SystemLog), you've recovered. Reboot, monitor for 24 hours, document what failed in your ticketing system (Autotask or similar), and move on.

If replication is one-way, partial, or completely broken, proceed to Step 3.
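Verbose dcdiag output is long, so one way to check the critical tests is to pull out only the failures. The sketch below assumes English-language summary lines of the form "... DC01 failed test Replications"; Get-DcdiagFailure is a hypothetical name, and the critical list simply mirrors the six tests named in this step.

```powershell
# Hypothetical helper: list the tests that dcdiag reported as failed.
function Get-DcdiagFailure {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyString()]
        [string[]]$Line          # e.g. Get-Content .\dcdiag-output.txt
    )
    # The tests this guide treats as critical.
    $critical = 'Connectivity', 'Advertising', 'Replications',
                'MachineAccount', 'Services', 'SystemLog'

    foreach ($text in $Line) {
        # Capture the server name and the test name around "failed test".
        if ($text -match '(\S+)\s+failed test\s+(\S+)') {
            [pscustomobject]@{
                Server   = $Matches[1]
                Test     = $Matches[2]
                Critical = $critical -contains $Matches[2]
            }
        }
    }
}

# Usage against a saved report:
# Get-DcdiagFailure -Line (Get-Content .\dcdiag-output.txt) | Where-Object Critical
```

An empty result from the final filter means none of the six critical tests failed in that report; it says nothing about tests dcdiag could not run at all.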

NTDS database integrity check

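If the DC boots but NTDS won't start, drop into DSRM. Press F8 at boot and choose Directory Services Restore Mode; where F8 is not offered, run bcdedit /set safeboot dsrepair from an elevated prompt and reboot, then run bcdedit /deletevalue safeboot when you are finished. You'll need the DSRM password that was set when the DC was promoted. If you don't know it, reset it from another DC with ntdsutil "set dsrm password" "reset password on server <dcname>" q q.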

In DSRM, check database integrity:

ntdsutil
activate instance ntds
files
integrity

If integrity passes but the database won't mount, try a semantic database analysis:

ntdsutil
activate instance ntds
semantic database analysis
go fixup

If integrity fails, you have a corrupt NTDS.dit. Do not keep trying to repair in place beyond a single recover pass — every attempt risks more damage. Treat the DC as lost (Step 4).

Step 3: When secure channel or trust is broken

Don't use Reset-ComputerMachinePassword on a domain controller, and don't run netdom resetpwd the way you would on a member server. The DC's machine account is also its identity in AD, and while its own KDC service is running it keeps authenticating against itself with the stale password. A plain reset leaves the directory in an inconsistent state.

Instead, on a broken DC that will still boot:

Stop and disable the Kerberos Key Distribution Center service.
Reboot.
Run netdom resetpwd /server:<healthy-dc> /userd:<domain>\<admin> /passwordd:* from the broken DC.
Re-enable and start KDC.
Reboot again.
Force replication: repadmin /syncall /AeP.

This procedure resets the DC's secure channel without breaking its directory identity. If it doesn't restore replication within 30 minutes, the DC is too far gone — move to Step 4.

Step 4: Demote and rebuild

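If the DC is unrecoverable but still boots and can reach a healthy DC, try a graceful demotion first:

Uninstall-ADDSDomainController -LocalAdministratorPassword (Read-Host -Prompt "New local Administrator password" -AsSecureString)

If that fails because the DC can no longer replicate, force the removal:

Uninstall-ADDSDomainController -ForceRemoval -DemoteOperationMasterRole -LocalAdministratorPassword (Read-Host -Prompt "New local Administrator password" -AsSecureString)

The -ForceRemoval flag strips the role locally without coordinating with the rest of the domain, and -DemoteOperationMasterRole lets the demotion continue even if the DC holds FSMO roles. Prompt for the password rather than typing it into the command, where it would sit in console history. After a forced removal the server reboots as a standalone server in a workgroup, not as a domain member. Power it off immediately and run metadata cleanup from a healthy DC (Step 5).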

If the DC is completely dead — no boot, no network, no console — you cannot demote it. Skip directly to metadata cleanup. The DC's record exists in AD but the DC itself never will again.

Build the replacement

On a fresh Server 2022 or 2025 VM (sized to match your existing DCs), join the domain, then promote:

Install-WindowsFeature AD-Domain-Services -IncludeManagementTools
Install-ADDSDomainController `
  -InstallDns `
  -DomainName "yourdomain.local" `
  -SiteName "Default-First-Site-Name" `
  -DatabasePath "C:\Windows\NTDS" `
  -LogPath "C:\Windows\NTDS" `
  -SysvolPath "C:\Windows\SYSVOL" `
  -NoRebootOnCompletion:$false `
  -Credential (Get-Credential)

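Give the server a static IP address before you promote it. For DNS, point it at another DC first and itself second; a DC that lists itself first can stall at boot waiting for its own DNS Server service, and can end up isolated from its replication partners if its copy of the zone is stale. Configure an external time source on the PDC Emulator only, and let every other DC sync from the domain hierarchy. Verify with dcdiag /e once it's up.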
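The DNS ordering rule can be captured in a few lines, shown here as a sketch. Get-DcDnsServerOrder is a hypothetical helper, and the interface alias and addresses in the usage comment are placeholders for your own values.

```powershell
# Hypothetical helper: build a DNS server list with partner DCs first
# and the DC's own address last.
function Get-DcDnsServerOrder {
    param(
        [Parameter(Mandatory)]
        [string[]]$PartnerDc,        # IP addresses of other DCs running DNS
        [string]$Self = '127.0.0.1'  # this DC's own address or loopback
    )
    # Drop duplicates and any accidental copy of the DC's own address.
    $partners = @($PartnerDc | Where-Object { $_ -ne $Self } | Select-Object -Unique)
    if ($partners.Count -eq 0) {
        throw 'At least one partner DC address is required.'
    }
    $partners + $Self
}

# Usage (alias and addresses are placeholders):
# Set-DnsClientServerAddress -InterfaceAlias 'Ethernet' `
#     -ServerAddresses (Get-DcDnsServerOrder -PartnerDc '10.0.0.10')
```

Refusing to return a list with no partner is deliberate: a DC that can only ask itself is the configuration this step warns against.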

Step 5: Metadata cleanup

Whether you demoted with -ForceRemoval or the DC died without warning, AD still has its records — DC object, NTDS Settings, replication links, FRS/DFSR membership, DNS entries. Leave these in place and you'll see permanent replication errors and ghost DCs in repadmin /replsummary for years.

The modern way (Server 2008+ AD Users and Computers): right-click the dead DC in the Domain Controllers OU, choose Delete. AD detects the DC isn't responding and offers to remove the metadata. Confirm. This handles the bulk.

The thorough way (always run this even after the GUI cleanup):

ntdsutil
metadata cleanup
connections
connect to server <healthy-dc>
quit
select operation target
list domains
select domain <number>
list sites
select site <number>
list servers in site
select server <dead-dc-number>
quit
remove selected server

Then check DNS and remove the dead DC's records:

_msdcs.<domain> zone — remove the dead DC's CNAME (the GUID-prefixed record) and any A/AAAA records.
Forward lookup zone — remove A/AAAA records for the dead DC's hostname.
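SRV records that still point at the dead DC under _sites, _tcp, _udp and gc._msdcs, plus any NS records naming it. In Active Directory Sites and Services, confirm that the dead DC's server object and its connection objects are gone, and delete them if they are not.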
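To spot leftover SRV records, one approach is to query the DC locator name and filter on the target host, as sketched below. Find-StaleSrvRecord is a hypothetical helper; it assumes records shaped like Resolve-DnsName output, where SRV entries carry a NameTarget property. The domain and DC names in the usage comment are placeholders.

```powershell
# Hypothetical helper: return SRV records whose target is the dead DC.
function Find-StaleSrvRecord {
    param(
        [Parameter(Mandatory)]
        [object[]]$Record,       # output of Resolve-DnsName -Type SRV
        [Parameter(Mandatory)]
        [string]$DeadDc          # short name or FQDN of the dead DC
    )
    $short = ($DeadDc -split '\.')[0]
    # Records without NameTarget (for example additional A records) are skipped.
    $Record | Where-Object { $_.NameTarget -and $_.NameTarget -like "$short.*" }
}

# Usage (names are placeholders):
# $srv = Resolve-DnsName -Name '_ldap._tcp.dc._msdcs.yourdomain.local' -Type SRV
# Find-StaleSrvRecord -Record $srv -DeadDc 'DC03'
```

Repeat the query against each DNS server if you want to confirm the deletion has replicated, since a single resolver may answer from cache.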

Finally, if the dead DC held FSMO roles, see How to seize FSMO roles from a dead domain controller.

Step 6: Verify

After replacement and cleanup, run from any healthy DC:

dcdiag /e /v
repadmin /showrepl
repadmin /replsummary
Get-ADDomainController -Filter * | Select-Object Name, Site, OperatingSystem, IPv4Address
netdom query fsmo

All five should be clean. dcdiag should pass every test on every DC. repadmin /replsummary should show no failures and recent replication times across the board. The dead DC should appear nowhere.

Document the incident, the root cause, and the recovery steps in your ticketing system. If this was a hardware failure, capture the failed components' serials for warranty.

What NOT to do

These are the mistakes that turn a recoverable DC failure into a multi-day forest-wide cleanup:

Don't restore a DC from a backup older than the tombstone lifetime. Lingering objects will replicate out and corrupt the directory.
Don't bring a "fixed" DC back online after you've seized FSMO roles to another DC. The returning DC still believes it owns those roles, so two DCs act as role holder at once: a duplicate RID Master can hand out overlapping RID pools and produce duplicate SIDs, and a duplicate Schema Master or Domain Naming Master can commit conflicting changes. If you've seized FSMO, the original DC must be permanently retired.
Don't dcpromo /forceremoval on a DC that's still talking to AD. Use Uninstall-ADDSDomainController -ForceRemoval only as a last resort, and follow it with metadata cleanup.
Don't restore a DC with VM snapshots / checkpoints unless your hypervisor and AD version support VM-GenerationID (Server 2012+ on Hyper-V 2012+ / vSphere 5.0+). Older combinations cause USN rollback. Even on supported versions, snapshot-restoring a DC is a last resort.
Don't reset a DC's machine account password the way you would for a member server. See Step 3.
Don't clean up metadata before you're sure the DC is staying dead. If there's any chance you can recover it, hold off. Once metadata is removed you are committed to a rebuild.

Prevention going forward

Once you're past the immediate fire, the prevention checklist:

Always have at least two DCs per domain. Single-DC domains are a disaster waiting to happen — you've just had one.
Run dcdiag and repadmin /replsummary from a scheduled script weekly. Email the results. Most failed DCs were unhealthy for weeks before they died.
Monitor Directory Service log event IDs 1311, 1865, 1925, 1988, 2042. Any of these is a precursor to bigger problems.
Test your DSRM password annually. A DSRM password no one knows is the same as no DSRM password.
Document FSMO role placement. When the PDC dies at 4pm on a Friday, you don't want to be discovering which DC held what.
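Snapshot your DCs before any patching, but only where VM-GenerationID is supported (Server 2012+ on a supported hypervisor). Treat the snapshot as a short-lived rollback point for the patch window, not as a substitute for system state backups. It will save you one day.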
Take regular system state backups of at least two DCs — Windows Server Backup or Veeam with application-aware processing. The backup must be of the DC system state itself, not just a VM image.
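For the event-ID monitoring item, a small summary function is one way to turn a week of Directory Service events into something worth emailing. This is a sketch: Get-AdWarningSummary is a hypothetical name, and the default ID list is the one given in this checklist.

```powershell
# Hypothetical helper: count watched Directory Service events per ID.
function Get-AdWarningSummary {
    param(
        [object[]]$Record = @(),     # event records exposing an Id property
        [int[]]$WatchId = @(1311, 1865, 1925, 1988, 2042)
    )
    $Record |
        Where-Object { $WatchId -contains $_.Id } |
        Group-Object -Property Id |
        Sort-Object -Property Count -Descending |
        Select-Object @{ Name = 'EventId'; Expression = { [int]$_.Name } }, Count
}

# Usage on a DC (last seven days):
# $events = Get-WinEvent -FilterHashtable @{
#     LogName = 'Directory Service'; StartTime = (Get-Date).AddDays(-7)
# } -ErrorAction SilentlyContinue
# Get-AdWarningSummary -Record $events
```

The -ErrorAction SilentlyContinue in the usage example is there because Get-WinEvent reports an error when nothing matches, which on a healthy DC is the outcome you want.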

When to call us

Call us if any of the following apply:

The DC has been offline longer than your tombstone lifetime.
You've already attempted recovery, and replication is now broken across multiple DCs.
The dead DC held the Schema Master or PDC Emulator role.
You are seeing USN rollback events (Directory Service event ID 2095 or 2103) on any DC.
The domain has only one DC and it is the one that failed.
This is the second DC failure in 90 days — there's a root cause beyond hardware that needs finding.

These are scenarios where a wrong move propagates across the forest and is much harder to undo than to do right the first time.

Call 01923 372471 — senior engineer answers directly, responds quickly, on-site across London and the South East within 2 hours. No call queues, no junior triage, ceiling price agreed before any work begins.

FAQ

How long can a domain controller be offline before it can't be recovered? For as long as the tombstone lifetime: by default 180 days in forests created on Server 2003 SP1 or later, 60 days in older forests. Confirm yours with Get-ADObject -Identity "CN=Directory Service,CN=Windows NT,CN=Services,$((Get-ADRootDSE).configurationNamingContext)" -Properties tombstoneLifetime. A DC offline longer than this cannot be brought back online without risking lingering-object corruption.

Can I just restore the domain controller from a backup? Yes, but only from a system state backup younger than the tombstone lifetime, restored to the same hardware or via VM-GenerationID-safe procedures. A backup older than the tombstone lifetime reintroduces lingering objects, and a "VM snapshot revert" on an unsupported platform causes USN rollback and silent replication failure.

Do I have to seize FSMO roles when a DC dies? Only if the dead DC held FSMO roles and is not coming back. The PDC Emulator and RID Master roles affect day-to-day operations (password changes, time sync, new account creation) and should be moved within hours; the Schema Master and Domain Naming Master are less time-critical. Once seized, the original DC must never return to the network.

Will users notice a domain controller failure if there's a second DC? Mostly no — Windows clients fail over to a healthy DC automatically, usually within 30 seconds. Some side effects: GPO that targets a specific DC can fail; tools that resolve a DC by name (rather than _ldap._tcp.<domain>) may break; first logons after the failure may be slightly slow as clients refresh DC discovery.

Is it safe to demote a domain controller using Uninstall-ADDSDomainController -ForceRemoval? Only when normal demotion isn't possible — for example, when the DC can't reach the rest of the forest. The -ForceRemoval flag bypasses replication coordination, so AD never learns the DC is gone. You must run metadata cleanup afterwards from a healthy DC.

How do I know which FSMO roles the failed DC held? From a healthy DC: netdom query fsmo. This shows all five FSMO role holders. Run it now, before you have an incident, and save the output somewhere accessible.
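If you keep the role holders in a script rather than a text file, a short lookup answers the question for any DC name. The sketch below uses a hypothetical helper, Get-RoleHeldBy; the usage comment fills the table from Get-ADDomain and Get-ADForest, and all names in it are placeholders.

```powershell
# Hypothetical helper: list the FSMO roles held by a given DC.
function Get-RoleHeldBy {
    param(
        [Parameter(Mandatory)]
        [hashtable]$RoleHolder,   # role name -> holder (short name or FQDN)
        [Parameter(Mandatory)]
        [string]$Dc               # short name or FQDN to look up
    )
    $short = ($Dc -split '\.')[0]
    $RoleHolder.GetEnumerator() |
        Where-Object { (([string]$_.Value) -split '\.')[0] -eq $short } |
        ForEach-Object { $_.Key } |
        Sort-Object
}

# Usage from a healthy DC:
# $d = Get-ADDomain; $f = Get-ADForest
# $roles = @{
#     PDCEmulator          = $d.PDCEmulator
#     RIDMaster            = $d.RIDMaster
#     InfrastructureMaster = $d.InfrastructureMaster
#     SchemaMaster         = $f.SchemaMaster
#     DomainNamingMaster   = $f.DomainNamingMaster
# }
# Get-RoleHeldBy -RoleHolder $roles -Dc 'DC03'
```

Comparing on the short host name avoids a false match between, say, DC1 and DC10.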

If a domain controller is down across your business and you'd rather hand it to someone who does this every week, that is exactly what our emergency Active Directory support is for.

This guide is one of a series of disaster-recovery references for IT managers and infrastructure engineers. If your domain is offline right now: 01923 372471.
