A domain controller has stopped responding. Authentication is failing across the business. Group Policy isn't applying. Email might still be working — for now — because clients have cached credentials, but you're on a clock.

This guide walks through the recovery procedure exactly as a senior engineer would run it: assess, attempt repair, decide whether to recover or replace, and clean up without leaving orphaned metadata that will haunt you for years. It is written for IT managers and infrastructure engineers who have inherited, or are responsible for, a Windows Active Directory domain.

If your business is offline right now and you need this resolved today, call 01923 372471 — a senior engineer answers directly. If you've got time to work through it, read on.

How to recognise a failed domain controller

You probably already know, but the symptoms cluster like this:

Don't panic-restart. The first move is always to assess, not to act.

Step 1: Assess — is it dead, or just unwell?

From a healthy domain controller (you do have more than one — right?), run:

dcdiag /e /v /f:dcdiag-output.txt
repadmin /showrepl * /csv > replication.csv
repadmin /replsummary

These three commands tell you, in order: what's failing across all DCs, the exact replication state of every partition between every pair, and a summary view of who hasn't talked to whom and for how long.

The single most important number to find is how long it has been since the failed DC last replicated successfully. Compare it with your tombstone lifetime: 180 days by default in forests created on Server 2003 SP1 or later, 60 days in older forests. Confirm yours with Get-ADObject -Identity "CN=Directory Service,CN=Windows NT,CN=Services,$((Get-ADRootDSE).configurationNamingContext)" -Properties tombstoneLifetime (if the attribute is empty, the 60-day default applies). If the DC's time offline is approaching that limit, you cannot bring this DC back. Replicating a DC that has been offline longer than the tombstone lifetime reintroduces lingering objects, and that corruption propagates across the forest. Don't do it. Demote and rebuild instead.
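If you'd rather pull that number out of replication.csv than eyeball it, a short script can do the arithmetic. The Python below is a sketch, not a supported tool: it assumes the column names (Source DSA, Last Success Time) and the YYYY-MM-DD HH:MM:SS timestamp format that repadmin /showrepl /csv typically produces, so check them against your file's header row. The 14-day margin_days standing in for "approaching" is a hypothetical value; pick your own.

```python
import csv
import io
from datetime import datetime, timedelta
from typing import Optional

# Assumed column names and timestamp format for `repadmin /showrepl * /csv`
# output; verify them against the header row of your own replication.csv.
SOURCE_COL = "Source DSA"
SUCCESS_COL = "Last Success Time"
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def last_success_from(csv_text: str, failed_dc: str) -> Optional[datetime]:
    """Most recent time any partner successfully replicated *from* failed_dc."""
    latest: Optional[datetime] = None
    for row in csv.DictReader(io.StringIO(csv_text)):
        if (row.get(SOURCE_COL) or "").strip().lower() != failed_dc.strip().lower():
            continue
        try:
            seen = datetime.strptime((row.get(SUCCESS_COL) or "").strip(),
                                     TIMESTAMP_FORMAT)
        except ValueError:
            continue  # blank, "0" or an unexpected format: skip this row
        if latest is None or seen > latest:
            latest = seen
    return latest


def recovery_verdict(last_success: Optional[datetime], now: datetime,
                     tombstone_days: int, margin_days: int = 14) -> str:
    """Return "recover", "rebuild" or "unknown" for the failed DC.

    margin_days is a hypothetical safety buffer standing in for
    "approaching the tombstone lifetime"; choose your own value.
    """
    if last_success is None:
        return "unknown"  # no matching rows: check the DC name and the CSV first
    if now - last_success >= timedelta(days=tombstone_days - margin_days):
        return "rebuild"
    return "recover"
```

Open the file with the encoding your shell wrote it in (Windows PowerShell's > redirection produces UTF-16; cmd.exe does not), then call last_success_from(text, "DC02") followed by recovery_verdict(..., datetime.now(), 180). "unknown" usually means the DC name doesn't match the Source DSA column.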

If the DC is within tombstone lifetime, you have options.

Step 2: Try to recover the DC in place

Get console access — physical, iDRAC, iLO, Hyper-V console, vSphere console. RDP will not help if the box is in a failed state.

Boot the DC and check Event Viewer:

You're looking for the proximate cause. Common ones:

Symptom | Likely cause | First thing to try
NTDS service won't start, JET errors in event log | Database corruption | Boot to Directory Services Restore Mode (DSRM) and check the database with ntdsutil (see NTDS database integrity check below)
Disk full / out of space | C:\ is full, or the log drive is full | Free space, then restart NTDS
Time skew > 5 min | W32Time broken | Run w32tm /resync /rediscover on the skewed DC, then fix the PDC Emulator's time source
Secure channel broken | Machine account password mismatch | Don't reset it as you would on a member server; see Step 3
Boot loop after patching | Failed update | Boot to WinRE and remove the update offline with DISM /Image:<OS drive>:\ /Remove-Package /PackageName:<package>, or run wusa /uninstall /kb:<number> if Windows still starts
Hardware fault (RAID degraded, bad RAM) | Physical failure | Stop. Image the disks before any further action

If you can boot the DC and get to a command prompt, run:

dcdiag /e /v
repadmin /showrepl

Run them from the broken DC itself this time. If replication is flowing both inbound and outbound, and dcdiag passes the critical tests (Connectivity, Advertising, Replications, MachineAccount, Services, SystemLog), you've recovered. Reboot, monitor for 24 hours, record what failed in Autotask or whichever ticketing system you use, and move on.
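When you're checking several DCs, it helps to test for those six results mechanically instead of scrolling through verbose output. A minimal Python sketch, assuming English-language dcdiag output saved with /f:<file> and its usual result lines of the form "......................... DC01 passed test Connectivity". A critical test that never ran is treated as a failure, which is the cautious reading.

```python
import re
from typing import Dict, Iterable, List

# The critical tests named in the guide.
CRITICAL_TESTS = ("Connectivity", "Advertising", "Replications",
                  "MachineAccount", "Services", "SystemLog")

# Result lines look like "......................... DC01 passed test Connectivity".
# [ \t] instead of \s keeps each match on a single line.
RESULT_LINE = re.compile(
    r"\.{3,}[ \t]*(\S+)[ \t]+(passed|failed)[ \t]+test[ \t]+(\S+)",
    re.IGNORECASE)


def parse_dcdiag(output: str) -> Dict[str, Dict[str, bool]]:
    """Map lower-cased entity name (DC, domain or partition) -> {test: passed?}."""
    results: Dict[str, Dict[str, bool]] = {}
    for match in RESULT_LINE.finditer(output):
        entity, verdict, test = match.groups()
        results.setdefault(entity.lower(), {})[test] = verdict.lower() == "passed"
    return results


def critical_failures(output: str, dc: str,
                      critical: Iterable[str] = CRITICAL_TESTS) -> List[str]:
    """Critical tests that failed for dc; a test that never ran counts as failed."""
    tests = parse_dcdiag(output).get(dc.lower(), {})
    return [name for name in critical if not tests.get(name, False)]
```

An empty list from critical_failures(open("dcdiag-output.txt").read(), "DC02") corresponds to the "you've recovered" case; anything else points you to Step 3.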

If replication is one-way, partial, or completely broken, proceed to Step 3.

NTDS database integrity check

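If the DC boots but NTDS won't start, drop into Directory Services Restore Mode (DSRM). Press F8 at boot and choose Directory Services Restore Mode; if the F8 menu doesn't appear (common on newer hardware and VMs), run bcdedit /set safeboot dsrepair and reboot, then run bcdedit /deletevalue safeboot when you're done so the server boots normally again. You'll need the DSRM password set when the role was promoted. If you don't know it, reset it from another DC with ntdsutil "set dsrm password" "reset password on server <dcname>".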

In DSRM, check database integrity:

ntdsutil
activate instance ntds
files
integrity

If integrity passes but the database won't mount, try a semantic database analysis:

ntdsutil
activate instance ntds
semantic database analysis
go fixup

If integrity fails, you have a corrupt NTDS.dit. Do not keep trying to repair in place beyond a single recover pass — every attempt risks more damage. Treat the DC as lost (Step 4).

Step 3: When secure channel or trust is broken

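Don't reset a DC's machine account password the way you would on a member server, with Reset-ComputerMachinePassword or a plain netdom resetpwd while the DC's own Kerberos Key Distribution Center (KDC) is still running. The DC's machine account is also its identity in AD, and a reset done that way can leave the directory in an inconsistent state. The procedure below stops the local KDC first, so the broken DC has to authenticate against a healthy DC while its password is reset.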

Instead, on a broken DC that will still boot:

  1. Stop and disable the Kerberos Key Distribution Center service.
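  2. Reboot, which also clears the DC's cached Kerberos tickets.
  3. Run netdom resetpwd /server:<healthy-dc> /userd:<domain>\<admin> /passwordd:* from the broken DC, pointing /server at a healthy DC (the PDC Emulator, if it is available).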
  4. Re-enable and start KDC.
  5. Reboot again.
  6. Force replication: repadmin /syncall /AeP.

This procedure resets the DC's secure channel without breaking its directory identity. If it doesn't restore replication within 30 minutes, the DC is too far gone — move to Step 4.
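The 30-minute rule is easy to apply inconsistently at 2am, so it can be worth scripting the wait. This Python sketch only implements the timing: is_healthy is a hypothetical callable you'd supply yourself (for example, one that runs repadmin /replsummary and reports whether the broken DC shows zero failures). The clock and sleep parameters exist so the loop can be tested without actually waiting.

```python
import time
from typing import Callable


def wait_for_replication(is_healthy: Callable[[], bool],
                         timeout_s: float = 30 * 60,
                         interval_s: float = 60,
                         clock: Callable[[], float] = time.monotonic,
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll is_healthy() until it returns True or timeout_s elapses.

    True means replication recovered inside the window; False means treat
    the DC as lost and move on to demotion and rebuild.
    """
    deadline = clock() + timeout_s
    while True:
        if is_healthy():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline, so the final check happens on time.
        sleep(min(interval_s, remaining))
```

The design choice is a fixed polling interval rather than waiting for a single replication event, because a one-off success can hide a partner that is still failing.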

Step 4: Demote and rebuild

If the DC is unrecoverable but still boots and is on the network, demote it cleanly:

Uninstall-ADDSDomainController -DemoteOperationMasterRole -ForceRemoval -LocalAdministratorPassword (Read-Host -AsSecureString -Prompt "New local Administrator password")

The -ForceRemoval flag removes AD DS from this server without trying to contact or coordinate with the rest of the domain, so the remaining DCs never learn it has gone. Once it reboots as a member server, immediately power it off and run metadata cleanup from a healthy DC (Step 5).

If the DC is completely dead (no boot, no network, no console), you cannot demote it. Skip directly to metadata cleanup: AD still holds the DC's records even though the DC itself will never return.

Build the replacement

On a fresh Server 2022 or 2025 VM (sized to match your existing DCs), join the domain, then promote:

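# Add the AD DS role and its management tools
Install-WindowsFeature AD-Domain-Services -IncludeManagementTools

# Promote into the existing domain. Replace yourdomain.local and the site name
# with your own values. You'll be prompted for domain admin credentials and for
# the new DSRM (Safe Mode Administrator) password; the server reboots when done.
Install-ADDSDomainController `
  -InstallDns `
  -DomainName "yourdomain.local" `
  -SiteName "Default-First-Site-Name" `
  -DatabasePath "C:\Windows\NTDS" `
  -LogPath "C:\Windows\NTDS" `
  -SysvolPath "C:\Windows\SYSVOL" `
  -NoRebootOnCompletion:$false `
  -Credential (Get-Credential)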

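Give it a static IP address (ideally before promotion). Point its preferred DNS server at another DC and list itself second. Never list itself first: a DC that queries only itself first after a reboot may be unable to find its replication partners. Configure the external time source on the PDC Emulator only, and let every other DC inherit time through the domain hierarchy. Verify with dcdiag /e once it's up.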
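The DNS ordering rule is simple enough to check mechanically across several DCs. A minimal Python sketch, assuming you've already collected each DC's own addresses and its ordered DNS server list (for example from Get-DnsClientServerAddress output). The function name and messages are illustrative, and it cannot tell whether the first server is actually another DC, only that it isn't this one.

```python
import ipaddress
from typing import List, Sequence


def dns_order_problems(own_ips: Sequence[str], dns_servers: Sequence[str]) -> List[str]:
    """Check a DC's DNS client list: another DC first, itself later, never first.

    own_ips: the DC's own addresses; dns_servers: its DNS servers in order.
    Loopback addresses (127.0.0.1, ::1) also count as "itself".
    """
    if not dns_servers:
        return ["no DNS servers configured"]
    own = {ipaddress.ip_address(ip) for ip in own_ips}

    def is_self(addr: str) -> bool:
        ip = ipaddress.ip_address(addr)
        return ip in own or ip.is_loopback

    problems = []
    if is_self(dns_servers[0]):
        problems.append("first DNS server is the DC itself")
    if not any(is_self(a) for a in dns_servers[1:]):
        problems.append("the DC itself is not listed after another DC")
    return problems
```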

Step 5: Metadata cleanup

Whether you demoted with -ForceRemoval or the DC died without warning, AD still has its records — DC object, NTDS Settings, replication links, FRS/DFSR membership, DNS entries. Leave these in place and you'll see permanent replication errors and ghost DCs in repadmin /replsummary for years.

The modern way (Server 2008+ AD Users and Computers): right-click the dead DC in the Domain Controllers OU, choose Delete. AD detects the DC isn't responding and offers to remove the metadata. Confirm. This handles the bulk.

The thorough way (run it even after the GUI cleanup, as a check; if the dead DC no longer appears under list servers in site, the GUI has already removed it):

ntdsutil
metadata cleanup
connections
connect to server <healthy-dc>
quit
select operation target
list domains
select domain <number>
list sites
select site <number>
list servers in site
select server <dead-dc-number>
quit
remove selected server

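Then check DNS and remove the dead DC's records: its host (A) record, its name server (NS) entries in the zones it hosted, its SRV records (_ldap, _kerberos, and _gc if it was a global catalog), and the <DSA-GUID>._msdcs CNAME that other DCs use to locate it for replication.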
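In a zone with many sites, stale records are easy to miss by hand. The Python sketch below assumes you've exported the zones and flattened each record into a hypothetical DnsRecord (owner name, type, zone-file style data), for example from Get-DnsServerResourceRecord output. It flags records that point at the dead DC by name or IP; review the hits before deleting anything, because if the replacement DC reused the old IP its records will be flagged too. IPv6 addresses are compared as plain strings, not normalised.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class DnsRecord:
    """Hypothetical flattened DNS record, e.g. built from an exported zone."""
    name: str   # owner name, e.g. "_ldap._tcp.dc._msdcs.corp.local."
    rtype: str  # "A", "AAAA", "NS", "SRV", "CNAME", ...
    data: str   # zone-file style data, e.g. "0 100 389 dc02.corp.local."


def _host(value: str) -> str:
    """Normalise a host name: trim, drop the trailing root dot, lower-case."""
    return value.strip().rstrip(".").lower()


def records_for_dead_dc(records: Iterable[DnsRecord], dead_fqdn: str,
                        dead_ips: Iterable[str]) -> List[DnsRecord]:
    """Return records that still point at the dead DC by name or by IP."""
    fqdn = _host(dead_fqdn)
    ips = {ip.strip() for ip in dead_ips}
    hits = []
    for rec in records:
        rtype = rec.rtype.upper()
        if rtype in ("A", "AAAA"):
            # Its own host record, or same-as-parent records carrying its IP.
            match = _host(rec.name) == fqdn or rec.data.strip() in ips
        elif rtype == "SRV":
            # SRV data is "priority weight port target"; compare the target.
            parts = rec.data.split()
            match = bool(parts) and _host(parts[-1]) == fqdn
        elif rtype in ("NS", "CNAME"):
            match = _host(rec.data) == fqdn
        else:
            match = False
        if match:
            hits.append(rec)
    return hits
```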

Finally, if the dead DC held FSMO roles, see "How to seize FSMO roles from a dead domain controller".

Step 6: Verify

After replacement and cleanup, run from any healthy DC:

dcdiag /e /v
repadmin /showrepl
repadmin /replsummary
Get-ADDomainController -Filter * | Select-Object Name, Site, OperatingSystem, IPv4Address
netdom query fsmo

All five should be clean. dcdiag should pass every test on every DC. repadmin /replsummary should show no failures and recent replication times across the board. The dead DC should appear nowhere.
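For the repadmin part of that check, a fresh repadmin /showrepl * /csv export can be scanned for both conditions at once: any link with failures, and any link that still names the dead DC. A Python sketch, assuming the same typical CSV column names (Source DSA, Destination DSA, Naming Context, Number of Failures); confirm them against your export's header row.

```python
import csv
import io
from typing import List


def replication_problems(csv_text: str, dead_dc: str) -> List[str]:
    """List replication links that still mention dead_dc or report failures."""
    problems = []
    dead = dead_dc.strip().lower()
    for row in csv.DictReader(io.StringIO(csv_text)):
        src = (row.get("Source DSA") or "").strip()
        dst = (row.get("Destination DSA") or "").strip()
        nc = (row.get("Naming Context") or "").strip()
        link = f"{src} -> {dst} [{nc}]"
        if dead in (src.lower(), dst.lower()):
            problems.append(f"dead DC still present: {link}")
        failures = (row.get("Number of Failures") or "0").strip()
        if failures.isdigit() and int(failures) > 0:
            problems.append(f"{failures} failure(s): {link}")
    return problems
```

An empty list matches "no failures, and the dead DC appears nowhere" for replication; dcdiag, Get-ADDomainController and netdom query fsmo still need checking separately.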

Document the incident, the root cause, and the recovery steps in your ticketing system. If this was a hardware failure, capture the failed components' serials for warranty.

What NOT to do

These are the mistakes that turn a recoverable DC failure into a multi-day forest-wide cleanup:

Prevention going forward

Once you're past the immediate fire, the prevention checklist:

When to call us

Call us if any of the following apply:

These are scenarios where a wrong move propagates across the forest and is much harder to undo than to do right the first time.

Call 01923 372471 — senior engineer answers directly, responds quickly, on-site across London and the South East within 2 hours. No call queues, no junior triage, ceiling price agreed before any work begins.

FAQ

How long can a domain controller be offline before it can't be recovered? The tombstone lifetime: by default 180 days in forests created on Server 2003 SP1 or later, 60 days in older forests. Confirm yours with Get-ADObject -Identity "CN=Directory Service,CN=Windows NT,CN=Services,$((Get-ADRootDSE).configurationNamingContext)" -Properties tombstoneLifetime. A DC offline longer than this cannot be brought back online without risking lingering-object corruption.

Can I just restore the domain controller from a backup? Yes, but only from a system state backup taken less than the tombstone lifetime ago, restored to the same hardware or via VM-GenerationID-safe procedures. A backup older than the tombstone lifetime risks reintroducing lingering objects, and reverting a VM snapshot on a hypervisor that doesn't support VM-GenerationID causes USN rollback and silent replication failure.

Do I have to seize FSMO roles when a DC dies? Only if the dead DC held FSMO roles and is not coming back. The PDC Emulator and Infrastructure Master roles affect day-to-day operations and should be moved within hours; the Schema Master and Domain Naming Master are less time-critical. Once seized, the original DC must never return to the network.

Will users notice a domain controller failure if there's a second DC? Mostly no — Windows clients fail over to a healthy DC automatically, usually within 30 seconds. Some side effects: GPO that targets a specific DC can fail; tools that resolve a DC by name (rather than _ldap._tcp.<domain>) may break; first logons after the failure may be slightly slow as clients refresh DC discovery.

Is it safe to demote a domain controller using Uninstall-ADDSDomainController -ForceRemoval? Only when normal demotion isn't possible — for example, when the DC can't reach the rest of the forest. The -ForceRemoval flag bypasses replication coordination, so AD never learns the DC is gone. You must run metadata cleanup afterwards from a healthy DC.

How do I know which FSMO roles the failed DC held? From a healthy DC: netdom query fsmo. This shows all five FSMO role holders. Run it now, before you have an incident, and save the output somewhere accessible.
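If you've saved that output, a few lines of Python can tell you which roles the failed DC held. This sketch assumes the English-language netdom query fsmo layout, one role label per line followed by the holder's name, and matches holders by short host name so "DC02" and "DC02.corp.local" both work.

```python
from typing import Dict, List

# Role labels as printed by `netdom query fsmo` (assumes English-language output).
ROLE_LABELS = ("Schema master", "Domain naming master", "PDC",
               "RID pool manager", "Infrastructure master")


def parse_fsmo(output: str) -> Dict[str, str]:
    """Map each FSMO role label to the holder name printed next to it."""
    holders: Dict[str, str] = {}
    for line in output.splitlines():
        line = line.strip()
        for label in ROLE_LABELS:
            if line.startswith(label):
                holder = line[len(label):].strip()
                if holder:
                    holders[label] = holder
                break
    return holders


def roles_held_by(output: str, dc: str) -> List[str]:
    """Roles whose holder matches dc (short host name, case-insensitive)."""
    short = dc.strip().lower().split(".")[0]
    holders = parse_fsmo(output)
    return [role for role in ROLE_LABELS
            if role in holders and holders[role].lower().split(".")[0] == short]
```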


If a domain controller is down across your business and you'd rather hand it to someone who does this every week, that is exactly what our emergency Active Directory support is for.

This guide is one of a series of disaster-recovery references for IT managers and infrastructure engineers. If your domain is offline right now: 01923 372471.

