Users can't log in. Some users, all users, just one site, just one PC — the symptoms vary, but the call is the same: "the network's broken". What's broken is authentication, and the cause is almost always one of a handful of root issues sitting underneath a longer list of confusing surface symptoms.
This guide walks through how a senior engineer diagnoses AD authentication problems systematically, without firefighting individual symptoms or making things worse by guessing. It is written for IT managers and infrastructure engineers responsible for Windows AD environments on Server 2016, 2019, 2022 or 2025.
If authentication is down across your business right now, call 01923 372471 — a senior engineer answers directly and responds quickly.
Step 1: Define the scope precisely
The first question is not "why is this user broken" — it's "how broken is the environment". The scope of the failure tells you where to look.
Ask:
- One user, or many? One user → user-account problem (locked, expired, disabled, password). Many users → infrastructure problem.
- One PC, or many? One PC → endpoint problem (secure channel, time, DNS, local cache). Many PCs → server-side problem.
- One site, or all? One site → site-local DC problem, replication problem, WAN problem. All sites → forest-level problem.
- Cached creds working but new logons failing? Classic DC unreachable / DNS / authentication path problem.
- Local admin login on the PC works, but domain accounts fail? Confirms the issue is the DC connection, not the PC itself.
- Worked this morning, broken this afternoon? Look at what changed: GPO, patching, time, certificate, network, password resets, DC reboot.
Write down the scope before troubleshooting. This is the difference between five minutes' diagnosis and three hours' guessing.
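The scoping questions above amount to a small decision table. The Python sketch below is one way to encode it. The function name and the label strings are hypothetical and do not belong to any tool. It also deliberately simplifies: a single failing user is always treated as an account problem first.

```python
# Hypothetical labels for the layer worth investigating first.
USER_ACCOUNT = "user account: locked, expired, disabled or password"
ENDPOINT = "endpoint: secure channel, time, DNS or local cache"
SITE = "site: local DC, replication or WAN"
FOREST = "forest-level: time hierarchy, DNS or DCs"
DC_PATH = "DC path: DC unreachable, DNS or authentication path"


def triage(many_users: bool, many_pcs: bool, all_sites: bool,
           cached_logons_work: bool = False,
           local_admin_works: bool = False) -> list:
    """Return the layers worth checking first, most specific first."""
    if not many_users:
        hints = [USER_ACCOUNT]
    elif not many_pcs:
        hints = [ENDPOINT]
    elif not all_sites:
        hints = [SITE]
    else:
        hints = [FOREST]
    # Cached or local logons succeeding while fresh domain logons fail
    # points at the client-to-DC path rather than the PC itself.
    if cached_logons_work or local_admin_works:
        hints.append(DC_PATH)
    return hints
```

For example, `triage(True, True, True, cached_logons_work=True)` suggests starting at forest level with the DC path as the second suspect.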
Step 2: The five things that cause 95% of AD authentication failures
In rough order of frequency:
- Time skew — Kerberos requires client and DC clocks within 5 minutes (default). One drifted clock takes the user offline.
- Secure channel break — the computer account password and the DC's stored copy have drifted, often after a VM restore, system clone, or sysprep mishap.
- DNS misconfiguration — clients pointing at internet DNS, or DCs with the wrong DNS settings, or stale DC records.
- DC unreachable / replication failure — a DC that's down, isolated, or no longer replicating.
- GPO / NTLM / LDAP signing changes — recent security policy that broke a previously-working configuration.
Almost everything you'll see on a typical SMB call is one of these five. Investigate them in order — the cheap checks first.
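"Cheap checks first" can be read as a short-circuiting checklist. The Python sketch below shows that idea under one assumption: each check is a callable you supply that returns True when healthy. The names are illustrative. Nothing after the first failure runs, which keeps a triage session focused on a single cause.

```python
from typing import Callable, Optional, Sequence, Tuple

# Each entry pairs a cause from the list above with a check returning True when healthy.
Check = Tuple[str, Callable[[], bool]]


def first_failure(checks: Sequence[Check]) -> Optional[str]:
    """Run the checks in order (cheapest first) and stop at the first failure."""
    for name, is_healthy in checks:
        if not is_healthy():
            return name
    return None
```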
Step 3: Time
# On the affected client:
w32tm /query /status
w32tm /query /source
# Compare to the DC's time:
# (run on a DC)
w32tm /query /status
# Expected source on a domain-joined client: a DC name
# Expected source on the PDC Emulator: an external NTP source like time.windows.com or pool.ntp.org
If the client's time source is "Local CMOS Clock" or "Free-running System Clock", time sync is broken. If the client's clock differs from the DC's by more than the Kerberos tolerance (5 minutes by default), Kerberos will fail with errors like KRB_AP_ERR_SKEW or "the security database on the server does not have a computer account for this workstation trust relationship".
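Both checks in the paragraph above can be scripted against captured output. The Python sketch below assumes you have saved the text of `w32tm /query /status` and have the client and DC clock readings as datetimes. It assumes the output contains a `Source:` line, and the sample in the usage note is abbreviated and illustrative. The 5-minute constant mirrors the default Kerberos "Maximum tolerance for computer clock synchronization" policy.

```python
from datetime import datetime, timedelta

# Sources w32tm reports when a machine is not following any time hierarchy.
UNSYNCED_SOURCES = {"local cmos clock", "free-running system clock"}
# Default Kerberos "Maximum tolerance for computer clock synchronization".
KERBEROS_MAX_SKEW = timedelta(minutes=5)


def parse_w32tm_status(text: str) -> dict:
    """Split 'Key: value' lines from `w32tm /query /status` into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def time_problems(status_text: str, client_now: datetime, dc_now: datetime) -> list:
    """Flag a broken time source and a clock skew beyond the Kerberos tolerance."""
    problems = []
    source = parse_w32tm_status(status_text).get("Source", "")
    if source.lower() in UNSYNCED_SOURCES:
        problems.append(f"time sync broken (Source: {source})")
    skew = abs(client_now - dc_now)
    if skew > KERBEROS_MAX_SKEW:
        problems.append(f"clock skew {skew} exceeds the Kerberos tolerance")
    return problems
```

Usage: `time_problems("Source: Local CMOS Clock", t, t)` returns a single "time sync broken" finding, even though the two clocks agree.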
Fix:
# On the client, force resync:
w32tm /resync /rediscover
# If still wrong, reset W32Time and re-sync:
net stop w32time
w32tm /unregister
w32tm /register
net start w32time
w32tm /resync /rediscover
# On the PDC Emulator, set an external authoritative source:
w32tm /config /manualpeerlist:"0.uk.pool.ntp.org,1.uk.pool.ntp.org,2.uk.pool.ntp.org" /syncfromflags:manual /reliable:yes /update
net stop w32time
net start w32time
In a single-domain forest, every other DC takes its time from the PDC Emulator and domain members take theirs from a DC. In a multi-domain forest, the hierarchy leads back to the forest-root PDC Emulator. The PDC Emulator takes its time from the configured external source. Get this hierarchy right and the rest of the domain stays in sync automatically. If the PDC's time itself is wrong, the entire domain's time is wrong, and authentication fails everywhere.
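To check the hierarchy across several DCs at once, you can collect `w32tm /query /source` from each DC and compare the results. The Python sketch below assumes a single-domain forest and a dict you build yourself that maps each DC's short name to that output. "VM IC Time Synchronization Provider" is the source a Hyper-V guest reports when host time sync is in control. It is flagged here as well, because of the virtualisation issue described next.

```python
# Sources that mean a DC is not following the domain time hierarchy.
BAD_SOURCES = {
    "local cmos clock",
    "free-running system clock",
    "vm ic time synchronization provider",  # Hyper-V host time sync in control
}


def _host(source: str) -> str:
    """Normalise a w32tm source: drop ',0x8'-style flags and lower-case it."""
    return source.strip().split(",")[0].lower()


def check_time_hierarchy(sources: dict, pdc: str) -> list:
    """Check the PDC Emulator uses an external peer and other DCs sync inside the domain.

    `sources` maps each DC's short name to the text `w32tm /query /source`
    printed on that DC (collected however suits you).
    """
    dcs = {name.lower() for name in sources}
    issues = []
    for dc, raw in sources.items():
        host = _host(raw)
        short_name = host.split(".")[0]
        is_pdc = dc.lower() == pdc.lower()
        if host in BAD_SOURCES:
            issues.append(f"{dc}: not following the hierarchy ({raw.strip()})")
        elif is_pdc and short_name in dcs:
            issues.append(f"{dc}: PDC Emulator syncs from another DC, not an external source")
        elif not is_pdc and short_name not in dcs:
            issues.append(f"{dc}: syncs from {host}, outside the domain")
    return issues
```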
In virtualised environments, disable host-to-guest time synchronisation for DC guests. Hyper-V or VMware host time sync fights with w32time's domain hierarchy and makes the clock oscillate. On Hyper-V, disable it in VM settings → Integration Services → Time synchronization. On VMware, set tools.syncTime = "FALSE" in the VMX file.
Step 4: Secure channel
The secure channel is the trust relationship between a domain-joined computer and AD. The computer has its own account password, set when it joined and rotated automatically by the computer itself (every 30 days by default). If the computer's copy and AD's copy drift apart, you get the classic errors:
- "The trust relationship between this workstation and the primary domain failed"
- "The security database on the server does not have a computer account for this workstation trust relationship"
- Login works for cached credentials but fails for any uncached operation
- \\server\share access fails with "the trust relationship..."
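When a restore or clone is suspected, it helps to know when AD last recorded a machine password change. `Get-ADComputer <name> -Properties pwdLastSet` returns the raw value as a Windows FILETIME, which counts 100-nanosecond ticks since 1 January 1601 UTC. The friendlier `PasswordLastSet` property shows the same moment as a date. The Python sketch below converts the raw value and compares it with a snapshot time you supply. The helper names are illustrative, and the snapshot datetime is assumed to be timezone-aware UTC.

```python
from datetime import datetime, timedelta, timezone

# Windows FILETIME counts 100-nanosecond ticks since 1601-01-01 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(value: int) -> datetime:
    """Convert a raw pwdLastSet value to an aware UTC datetime."""
    return FILETIME_EPOCH + timedelta(microseconds=value // 10)


def password_age_days(pwd_last_set: int, now: datetime) -> int:
    """Days since AD recorded the computer's last machine password change."""
    return (now - filetime_to_datetime(pwd_last_set)).days


def rotated_after(pwd_last_set: int, snapshot_taken: datetime) -> bool:
    """True if AD holds a newer machine password than a snapshot taken at `snapshot_taken`."""
    return filetime_to_datetime(pwd_last_set) > snapshot_taken
```

A True result from `rotated_after` matches the "restored from a snapshot older than the last password rotation" scenario described below.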
Test:
# On the affected client (logged in as local admin or with cached creds):
Test-ComputerSecureChannel -Verbose
True means the channel is healthy. False means it's broken.
Fix without leaving and rejoining the domain:
# Reset the secure channel using a domain admin credential:
Test-ComputerSecureChannel -Repair -Credential (Get-Credential)
# Older PowerShell / pre-Windows 10 endpoints:
Reset-ComputerMachinePassword -Server <DCname> -Credential (Get-Credential)
# Or via netdom (Windows 7 / Server 2008 R2 era):
netdom resetpwd /server:<DCname> /userd:<domain>\<admin> /passwordd:*
Reboot the client after the repair. The reset only takes effect on next authentication and can leave a transient broken state.
For domain controllers themselves, do not use these commands — see the dedicated DC procedure in How to recover a failed domain controller. DC machine accounts have to be reset differently because they are also AD identities.
Why secure channels break
- VM restored from a snapshot older than the last password rotation
- VM cloned without sysprep
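- Computer account reset, disabled or deleted in AD (for example by a stale-computer cleanup) while the machine was offline. Being offline for >30 days doesn't break the channel on its own, because the client drives the rotation and changes its password when it reconnects.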
- Image restored from a backup taken before the domain join was clean
- Tampering by a user who reinstalled / re-imaged
If you're seeing repeated secure channel breaks across many machines, the underlying cause is a bigger issue — usually backup or imaging discipline.
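One way to decide between "one bad machine" and "a discipline problem" is to count distinct machines failing within a recent window. The Python sketch below assumes you have already collected secure-channel failures from your logs as (computer name, timestamp) pairs. The 7-day window and the threshold of 5 machines are arbitrary starting points, not Microsoft guidance.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def classify_failures(events: Iterable[Tuple[str, datetime]],
                      window: timedelta = timedelta(days=7),
                      threshold: int = 5) -> Tuple[str, List[str]]:
    """Decide whether secure-channel failures look isolated or systemic.

    Returns (verdict, machines). The verdict is "systemic" when more than
    `threshold` distinct machines failed within `window` of the latest failure.
    """
    events = list(events)
    if not events:
        return "none", []
    latest = max(ts for _, ts in events)
    machines = sorted({name.lower() for name, ts in events if latest - ts <= window})
    verdict = "systemic" if len(machines) > threshold else "isolated"
    return verdict, machines
```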
Step 5: DNS
AD authentication is built on DNS. Wrong DNS = no authentication.
# On the client:
ipconfig /all
# Expected: DNS servers are domain controllers (or DNS servers that host the AD zones).
# NEVER the router. NEVER public DNS like 8.8.8.8 or 1.1.1.1.
# Test name resolution:
nslookup <domain.local> # should return DC IPs
nslookup <DCname>.<domain.local> # should return that DC's IP
nslookup -type=srv _ldap._tcp.<domain.local> # should return DC names with port 389
# Test reverse lookup:
nslookup <DC-IP> # should return DC's hostname
The single most common DNS failure is a client (or worse, a DC) configured with public DNS as primary or secondary. AD records don't exist in public DNS, so the client can find google.com but not its own domain. Fix: every domain-joined machine should have only DCs (or DNS servers serving the AD zones) in its DNS settings. Public resolvers belong in the forwarder settings of the DNS servers on your DCs, not on clients.
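The rule in the paragraph above is easy to check against an inventory. The Python sketch below assumes two inputs: a client's DNS server list (for example, read from `ipconfig /all`) and the IPs of your AD DNS servers. It uses the standard `ipaddress` module to tell public resolvers apart from other internal addresses such as a router. The function name is illustrative.

```python
import ipaddress
from typing import List, Sequence


def audit_client_dns(dns_servers: Sequence[str], ad_dns_servers: Sequence[str]) -> List[str]:
    """List problems with a domain-joined client's configured DNS servers."""
    if not dns_servers:
        return ["no DNS servers configured"]
    allowed = {ipaddress.ip_address(ip) for ip in ad_dns_servers}
    problems = []
    for raw in dns_servers:
        ip = ipaddress.ip_address(raw)
        if ip in allowed:
            continue
        if ip.is_global:
            problems.append(f"{raw} is a public resolver: AD records won't resolve there")
        else:
            problems.append(f"{raw} is not an AD DNS server (router or other internal resolver?)")
    return problems
```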
For DCs themselves: DNS should point to another DC first and to itself (via loopback) second. A DC that points only to itself risks the well-known "DC island" problem, where it believes it has the truth but can't validate against its peers.
# Set DC DNS correctly (replace IPs):
Set-DnsClientServerAddress -InterfaceAlias "Ethernet" -ServerAddresses ("10.0.0.20","127.0.0.1")
Where 10.0.0.20 is another DC and 127.0.0.1 is loopback (for itself). Never 127.0.0.1 only.
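The DC-side rule can be checked the same way. The Python sketch below takes the DNS list configured on a DC, the DC's own IPs and the IPs of its peer DCs, all supplied by you, and reports each deviation from "another DC first, itself later, nothing else". The function name and messages are illustrative.

```python
from typing import List, Sequence

LOOPBACKS = {"127.0.0.1", "::1"}


def check_dc_dns(dns_servers: Sequence[str], own_ips: Sequence[str],
                 other_dc_ips: Sequence[str]) -> List[str]:
    """Check a DC's DNS client list: another DC first, itself (loopback) later."""
    if not dns_servers:
        return ["no DNS servers configured"]
    self_refs = LOOPBACKS | set(own_ips)
    peers = set(other_dc_ips)
    problems = []
    if all(s in self_refs for s in dns_servers):
        problems.append("points only at itself: risk of the 'DC island' problem")
    elif dns_servers[0] not in peers:
        problems.append(f"first DNS server {dns_servers[0]} is not another DC")
    if not any(s in self_refs for s in dns_servers):
        problems.append("this DC is not listed after its peer (add loopback)")
    for s in dns_servers:
        if s not in self_refs and s not in peers:
            problems.append(f"{s} is neither this DC nor another DC")
    return problems
```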
Stale DNS records from a removed DC
After a DC is decommissioned (or dies), its DNS records persist in the AD zones until the scavenging schedule catches them — and clients that resolve the dead DC's name to its old IP will fail authentication intermittently.
# List DC SRV records for the domain:
Resolve-DnsName -Type SRV -Name "_ldap._tcp.<domain.local>"
# List the records hosted in the zone:
Get-DnsServerResourceRecord -ZoneName "<domain.local>" -RRType A
Get-DnsServerResourceRecord -ZoneName "_msdcs.<domain.local>"
Remove any A, AAAA, SRV, or CNAME records pointing to a DC that no longer exists. See the metadata-cleanup section of the DC recovery guide.
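Finding the records to remove is a set difference. Compare the `NameTarget` values returned by `Resolve-DnsName -Type SRV` with the `HostName` values from `Get-ADDomainController -Filter *`. The Python sketch below assumes you have exported both lists as plain strings. Any name that remains is a candidate stale record and should be checked before you delete it.

```python
from typing import Iterable, List


def _norm(name: str) -> str:
    """DNS names compare case-insensitively; drop any trailing root dot."""
    return name.strip().rstrip(".").lower()


def stale_srv_targets(srv_targets: Iterable[str], live_dc_hostnames: Iterable[str]) -> List[str]:
    """SRV targets that don't match any DC AD still knows about."""
    live = {_norm(n) for n in live_dc_hostnames}
    return sorted({_norm(t) for t in srv_targets} - live)
```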
Step 6: DC reachability and replication
If clients have the right DNS but still can't authenticate, check the DCs themselves.
# From any DC:
dcdiag /e /v
repadmin /showrepl
repadmin /replsummary
Get-ADDomainController -Filter * | Select Name, Site, Enabled, IPv4Address
# Check that critical services are running on each DC:
Get-Service -ComputerName <DCname> -Name NTDS, Netlogon, "DNS", "kdc", "w32time"
Failing dcdiag tests such as MachineAccount, Connectivity, Advertising or Replications are your signal. Walk through the failures one DC at a time.
A DC that's online and reachable but not advertising itself (Advertising test failing) won't be used by clients, even though it appears healthy. Common cause: SYSVOL missing or unhealthy. Run Get-DfsrState and check the DFS Replication event log on the DC.
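Grouping dcdiag failures by DC is mechanical once the output is saved to a file. The Python sketch below assumes the usual dcdiag result lines of the form `......................... DC01 failed test Advertising`. With `/e`, some lines name a partition or the domain instead of a DC, and those are grouped under that name.

```python
import re
from collections import defaultdict

# dcdiag prints one result line per test, e.g.
#   ......................... DC01 failed test Advertising
RESULT_LINE = re.compile(r"\.{3,}\s+(\S+)\s+(passed|failed)\s+test\s+(\S+)")


def dcdiag_failures(output: str) -> dict:
    """Map each DC (or partition) name to the dcdiag tests it failed."""
    failures = defaultdict(list)
    for line in output.splitlines():
        match = RESULT_LINE.search(line)
        if match and match.group(2) == "failed":
            failures[match.group(1)].append(match.group(3))
    return dict(failures)
```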
Step 7: NTLM, LDAP signing, and channel binding (security policy changes)
Microsoft has progressively hardened the defaults for LDAP and NTLM authentication across Server 2019, 2022 and 2025. If authentication started failing after a Windows Update or a security tightening exercise, the cause may be:
- LDAP signing required but a legacy app or appliance still makes unsigned LDAP binds — look for Directory Service events 2887 (unsigned binds seen) and 2888 (unsigned binds rejected) on DCs
- LDAP channel binding required — same kind of issue, different mechanism
- NTLMv1 disabled — old printers, MFPs, NAS boxes, and certain SAML federation gateways still use NTLMv1
- SMBv1 disabled — old NAS, old scan-to-SMB on MFPs, old backup software fails
These changes are usually intentional — but they break things. Check Group Policy:
Computer Configuration > Policies > Windows Settings > Security Settings > Local Policies > Security Options
- Domain controller: LDAP server signing requirements
- Network security: LAN Manager authentication level
- Microsoft network server: Digitally sign communications (always)
Audit before enforcing — Microsoft provides audit modes for both LDAP signing and NTLMv1 deprecation. Run audit, identify the noisy clients, fix or replace them, then enforce.
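The audit, fix, then enforce loop comes down to one decision: has any client shown up in the audit data recently? The Python sketch below assumes you have reduced your audit events (LDAP signing diagnostics or NTLM audit logs) to (client, last-seen timestamp) pairs. That record shape is hypothetical, and the 30-day quiet period is an arbitrary example, not Microsoft guidance.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def clients_still_failing(findings: Iterable[Tuple[str, datetime]], now: datetime,
                          quiet_period: timedelta = timedelta(days=30)) -> List[str]:
    """Clients that produced an audit finding inside the quiet period."""
    cutoff = now - quiet_period
    return sorted({client for client, seen in findings if seen >= cutoff})


def safe_to_enforce(findings: Iterable[Tuple[str, datetime]], now: datetime,
                    quiet_period: timedelta = timedelta(days=30)) -> bool:
    """Enforce only once no client has appeared in the audit data for a full quiet period."""
    return not clients_still_failing(findings, now, quiet_period)
```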
Step 8: One stubborn user only
If 99% of users work and one specific user can't log in:
- Account locked or disabled?
Get-ADUser <user> -Properties LockedOut, Enabled, AccountExpirationDate, PasswordExpired, PasswordLastSet
- Logon hours / workstation restrictions?
Get-ADUser <user> -Properties logonHours, userWorkstations
- Group membership changed? Check recent changes: a user removed from "Domain Users" or a security group required by Conditional Access will fail.
- UPN or sAMAccountName changed? Cached credentials on the client may still want the old name.
- Password contains characters that confuse a downstream system? Some VPN appliances and SAML IdPs choke on non-ASCII characters. Reset to a known-good test password.
- Roaming profile corrupt? Look for logon errors that mention the profile rather than authentication itself.
- Conditional Access policy excluding their device, location, or sign-in risk? If hybrid-joined to Entra ID, check the Entra sign-in logs.
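The first two checks in this list can be interpreted offline once the Get-ADUser properties have been exported and parsed into Python values, with dates as datetimes. The Python sketch below turns those flags into readable reasons and decodes the 21-byte logonHours value. It assumes the commonly documented layout of one bit per hour of the week in UTC, starting Sunday 00:00, least significant bit first in each byte. An unset logonHours means logon is allowed at any hour.

```python
from datetime import datetime


def account_blockers(props: dict, now: datetime) -> list:
    """Readable reasons an account can't log on, from exported Get-ADUser properties."""
    reasons = []
    if props.get("LockedOut"):
        reasons.append("account locked out")
    if props.get("Enabled") is False:
        reasons.append("account disabled")
    if props.get("PasswordExpired"):
        reasons.append("password expired")
    expires = props.get("AccountExpirationDate")
    if expires is not None and expires <= now:
        reasons.append("account expired")
    return reasons


def logon_allowed(logon_hours: bytes, when_utc: datetime) -> bool:
    """Check a 21-byte logonHours value for the given UTC time.

    Assumes one bit per hour starting Sunday 00:00 UTC, least significant
    bit first within each byte.
    """
    if len(logon_hours) != 21:
        raise ValueError("logonHours should be 21 bytes (168 hours)")
    sunday_based_day = (when_utc.weekday() + 1) % 7  # Python: Monday=0 ... Sunday=6
    hour_index = sunday_based_day * 24 + when_utc.hour
    return bool((logon_hours[hour_index // 8] >> (hour_index % 8)) & 1)
```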
What NOT to do
- Don't disjoin and rejoin every PC with secure channel issues at scale. Use Test-ComputerSecureChannel -Repair instead. Disjoin/rejoin needs working local admin credentials and a reboot cycle on every machine, and is unnecessary.
- Don't reset every user's password as a first response. You'll create a help desk wave that hides the actual root cause.
- Don't change DNS settings on the firewall as a first move. AD DNS lives on DCs; firewall changes there don't help.
- Don't disable security policies (LDAP signing, NTLMv1) to "fix" auth. Fix the broken client/app instead.
- Don't reboot all your DCs at once under stress. Restart them one at a time and keep at least one DC up so clients always have an authentication target.
Prevention
- Monitor your DCs. Free options: PRTG (free up to 100 sensors), Nagios, Zabbix. Watch services, replication, time, disk, SYSVOL.
- Audit weekly: schedule dcdiag /e /v and repadmin /replsummary to run and email their output. The DCs that fail catastrophically were almost always unhealthy for weeks first.
- Fix DNS for every domain-joined machine as part of standard build / GPO. No machine should have public DNS in its config.
- Investigate every secure-channel error on a client — they're a symptom of upstream discipline (cloning, snapshots, backup) gone wrong.
- Keep your DC patching cadence current but not bleeding-edge. Two weeks behind release is the SMB sweet spot.
- Don't run more than one critical FSMO role on the same DC unless the environment is small enough to justify it.
When to call us
Call us if:
- Authentication is broken across the entire business and the cause isn't obvious from the steps above.
- You've fixed time/DNS/secure channel and authentication is still failing.
- Replication is broken across multiple DCs and repadmin is showing errors you don't recognise.
- You're seeing LDAP signing or NTLMv1 deprecation rejections and a critical legacy app or appliance can't be updated.
- A recent security policy change (Group Policy, Defender, Intune) appears to have broken auth and the culprit is unclear.
Engineerdirect.co.uk handles AD environments daily across regulated and high-stakes verticals. On-site response across London and the South East within two hours.
Call 01923 372471 — senior engineer answers directly. No call queues.
FAQ
Why does the user get "the trust relationship between this workstation and the primary domain failed"? The computer's secure channel password has drifted from what AD has stored. Most often this follows a VM restore from a snapshot taken before the last password rotation, a clone without sysprep, or the computer account being reset or deleted in AD. Fix with Test-ComputerSecureChannel -Repair -Credential (Get-Credential) and reboot.
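Should I disjoin and rejoin the PC to fix it? No, not as a first move. Disjoin/rejoin needs working local admin credentials and a reboot cycle on every machine. If the computer account is deleted and recreated along the way, it also loses its group memberships and AD history. Test-ComputerSecureChannel -Repair is non-destructive and works for the same scenarios.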
My users can log in but get no Group Policy. Is that an AD failure? Authentication is fine; GPO processing is broken. Common causes: SYSVOL replication failure (DFSR), GPO permission issues (after MS16-072 and the "Authenticated Users read" requirement), or the client reading from a DC that hasn't yet received the GPO through replication. Run gpresult /h gpresult.html and read the failures.
My DC clock is wrong and won't sync. What's the priority? The PDC Emulator is the time authority for the domain. Find which DC holds it (netdom query fsmo), fix its time source first (external NTP, e.g. 0.uk.pool.ntp.org), then resync the rest. If the PDC's clock has drifted so far that w32time refuses to apply the correction (beyond its configured maximum phase correction), you may need to set the time on the PDC manually before the hierarchy will resync.
Can I use nltest /sc_reset instead? Yes — nltest /sc_reset:<domain>\<DC> is functionally equivalent to Test-ComputerSecureChannel -Repair for resetting the secure channel against a specific DC. Both work; Test-ComputerSecureChannel is the modern, scriptable form.
My users can authenticate but Outlook keeps prompting for credentials. Is that AD? Possibly, but the cause is more likely a stale Modern Authentication token, a missing autodiscover record, or (for hybrid-joined devices) a Conditional Access policy issue. Check Windows Credential Manager for stale Office entries, clear cached identities, and check the Entra ID sign-in logs. If the user authenticates to Windows fine, the AD authentication path is working.
When no one can log in, the whole business stops — getting authentication back is the core of our emergency Active Directory support.
Part of a series of disaster-recovery references. If your business is locked out of AD right now: 01923 372471.
References
Authoritative vendor documentation behind this guide:
- Microsoft Learn — repadmin command reference
- Microsoft Learn — dcdiag command reference
- Microsoft Learn — Windows Time Service
- Microsoft Learn — Troubleshoot AD replication
Don't read guides when your systems are down. Call and get a senior engineer on the phone directly.