DNS Failover: Automatic Traffic Switching for High Availability

What Is DNS Failover?

DNS failover is an automated mechanism that redirects traffic from a failing server or data center to a healthy one by modifying DNS responses in real time. When a health check detects that the primary server is down, the DNS provider automatically updates the record to point to a backup server, ensuring continuous service availability.

Traditional DNS is static — records are set manually and only change when an administrator updates them. DNS failover adds an intelligence layer: continuous health monitoring combined with automatic record switching. The result is that users are seamlessly routed to working infrastructure without manual intervention.

How DNS Failover Works

The process involves three components working together:

  • Health checks — the DNS provider continuously monitors your servers by sending HTTP requests, TCP connection attempts, or ICMP pings at regular intervals (typically every 30–60 seconds)
  • Failover logic — when a health check fails a configured number of times (e.g., 3 consecutive failures), the system marks the server as unhealthy
  • DNS record update — the DNS response changes to return the IP of the backup server instead of the failed primary

Example flow:

Normal operation:
  User → DNS query → api.example.com → 203.0.113.10 (primary)

Primary fails health check (3 consecutive failures):
  System marks 203.0.113.10 as DOWN

Failover activated:
  User → DNS query → api.example.com → 203.0.113.20 (backup)

Primary recovers (3 consecutive successes):
  System marks 203.0.113.10 as UP
  User → DNS query → api.example.com → 203.0.113.10 (primary)
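
The threshold logic in the flow above can be sketched as a small state tracker. This is a minimal illustration, not any provider's actual implementation; the class name `HealthTracker` and its fields are hypothetical, and the default thresholds of 3 simply mirror the example flow.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    """Tracks one server's UP/DOWN state using consecutive-result thresholds."""

    unhealthy_threshold: int = 3  # consecutive failures needed to mark DOWN
    healthy_threshold: int = 3    # consecutive successes needed to mark UP
    is_up: bool = True
    _streak: int = 0              # consecutive results that contradict the current state

    def record(self, check_passed: bool) -> bool:
        """Feed in one health check result and return the resulting state."""
        if check_passed == self.is_up:
            # The result agrees with the current state, so any opposing streak is broken.
            self._streak = 0
            return self.is_up
        self._streak += 1
        threshold = self.healthy_threshold if check_passed else self.unhealthy_threshold
        if self._streak >= threshold:
            # Enough consecutive contradicting results: flip the state.
            self.is_up = check_passed
            self._streak = 0
        return self.is_up
```

A single failed check leaves the server marked UP; only the third consecutive failure flips it to DOWN, and recovery needs three consecutive successes in the same way.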

TTL and Failover Speed

The critical factor in DNS failover speed is TTL (Time to Live). When a DNS record has a TTL of 3600 seconds (1 hour), resolvers cache the old IP for up to an hour after failover triggers. During this period, users still reach the failed server.

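To make failover effective, choose a TTL that matches how quickly the service must recover:

| TTL Value | Failover Speed | DNS Query Load | Use Case |
|-----------|----------------|----------------|----------|
| 30s | ~30–90s | Very high | Critical services requiring fast failover |
| 60s | ~1–3 min | High | Production APIs and websites |
| 300s | ~5–10 min | Moderate | General web properties |
| 3600s | ~1 hour | Low | Not suitable for failover |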

Trade-off: a low TTL means faster failover but higher DNS query volume. Resolvers must refresh the record more often, which puts more load on DNS infrastructure and adds lookup latency to any request that arrives after the cached record has expired.
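
As a rough way to reason about these numbers, the worst-case switchover time can be estimated as detection time plus one full TTL. The sketch below assumes a simplified model (detection takes `check_interval * unhealthy_threshold` seconds, check timeouts are ignored, and resolvers honor the TTL); the function name and parameters are illustrative only.

```python
def worst_case_failover_seconds(check_interval: int, unhealthy_threshold: int, ttl: int) -> int:
    """Upper bound on the time from primary failure until caches drop the old record.

    Simplified model: detection takes up to interval * threshold seconds, and a
    resolver that cached the old record just before the switch keeps it for up
    to one more TTL. Check timeouts and resolvers that ignore TTL are not modeled.
    """
    detection = check_interval * unhealthy_threshold
    return detection + ttl


for ttl in (30, 60, 300, 3600):
    total = worst_case_failover_seconds(check_interval=30, unhealthy_threshold=3, ttl=ttl)
    print(f"TTL {ttl:>4}s: up to {total}s until well-behaved caches switch")
```

Because this estimate includes detection time, it can come out higher than figures that count only cache expiry.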

Failover Strategies

Active-Passive

One primary server handles all traffic. One or more backup servers stay idle until failover. Simple and cost-effective, but backup capacity is wasted during normal operation.

# DNS configuration example
api.example.com  A  203.0.113.10  (primary, active)
api.example.com  A  203.0.113.20  (backup, returned only on failover)
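
One way to picture the active-passive answer selection is a priority list where the first healthy entry wins. This is a hedged sketch; `select_active_passive` is a hypothetical helper, and the set of healthy IPs is assumed to come from the health check system.

```python
from typing import List, Optional, Set


def select_active_passive(records: List[str], healthy: Set[str]) -> Optional[str]:
    """Return the first healthy IP in priority order (primary first, then backups)."""
    for ip in records:
        if ip in healthy:
            return ip
    # Nothing is healthy. What to return here is a policy decision this sketch leaves open.
    return None


records = ["203.0.113.10", "203.0.113.20"]  # primary, then backup
print(select_active_passive(records, {"203.0.113.10", "203.0.113.20"}))
print(select_active_passive(records, {"203.0.113.20"}))
```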

Active-Active

Multiple servers share traffic during normal operation (round-robin or weighted). When one fails, its share is distributed among remaining servers. More efficient — no idle capacity — but requires all servers to handle extra load during failover.

# Active-active with health checks
api.example.com  A  203.0.113.10  weight=70  (primary region)
api.example.com  A  203.0.113.20  weight=30  (secondary region)
# If .10 fails, all traffic goes to .20
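
The redistribution of traffic can be illustrated by dropping unhealthy servers and renormalizing the remaining weights. The sketch below is an assumption about how such weighting might work, not a description of a specific provider; `healthy_weights` and `pick_server` are hypothetical names.

```python
import random
from typing import Dict, Optional, Set


def healthy_weights(weights: Dict[str, int], healthy: Set[str]) -> Dict[str, float]:
    """Return each healthy server's share of traffic, renormalized to sum to 1."""
    live = {ip: w for ip, w in weights.items() if ip in healthy and w > 0}
    total = sum(live.values())
    return {ip: w / total for ip, w in live.items()} if total else {}


def pick_server(weights: Dict[str, int], healthy: Set[str], rng=random) -> Optional[str]:
    """Pick one healthy server at random, in proportion to its weight."""
    shares = healthy_weights(weights, healthy)
    if not shares:
        return None
    ips = list(shares)
    return rng.choices(ips, weights=[shares[ip] for ip in ips], k=1)[0]


weights = {"203.0.113.10": 70, "203.0.113.20": 30}
print(healthy_weights(weights, {"203.0.113.10", "203.0.113.20"}))
print(healthy_weights(weights, {"203.0.113.20"}))
```

With both servers healthy the shares stay at 70/30; with only the secondary healthy it receives the full share, which is why it needs enough capacity for the combined load.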

Geographic (GeoDNS) Failover

Route users to the nearest data center based on their location. If a regional server fails, users are redirected to the next closest healthy region. Combines latency optimization with high availability.
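
A simple way to model "next closest healthy region" is an ordered fallback list per user region. In the sketch below, the region names, IP addresses, and fallback orders are all hypothetical placeholders; a real GeoDNS service derives proximity from its own location data.

```python
from typing import Dict, List, Optional, Set

# Hypothetical region-to-IP mapping; addresses come from documentation ranges.
REGION_IPS: Dict[str, str] = {
    "us-east": "203.0.113.10",
    "eu-west": "203.0.113.20",
    "ap-southeast": "198.51.100.30",
}

# Hypothetical fallback order for each user region, nearest first.
FALLBACK_ORDER: Dict[str, List[str]] = {
    "us-east": ["us-east", "eu-west", "ap-southeast"],
    "eu-west": ["eu-west", "us-east", "ap-southeast"],
    "ap-southeast": ["ap-southeast", "us-east", "eu-west"],
}


def resolve_geo(user_region: str, healthy_regions: Set[str]) -> Optional[str]:
    """Return the IP of the nearest healthy region for a user, or None if all are down."""
    for region in FALLBACK_ORDER.get(user_region, []):
        if region in healthy_regions:
            return REGION_IPS[region]
    return None


print(resolve_geo("eu-west", {"us-east", "eu-west", "ap-southeast"}))
print(resolve_geo("eu-west", {"us-east", "ap-southeast"}))
```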

Health Check Configuration

Effective health checks must be:

  • Specific — check the actual service, not just that the server responds to ping. An HTTP check to /health that verifies database connectivity is better than an ICMP ping
  • Frequent — every 30–60 seconds for critical services
  • Resilient — require multiple consecutive failures before triggering failover (prevent flapping from transient network issues)
  • From multiple locations — a health check from a single location may fail due to network path issues, not actual downtime. Use checks from 3+ geographic locations

# Health check configuration example
endpoint: https://api.example.com/health
method: GET
interval: 30s
timeout: 10s
healthy_threshold: 3    # 3 successes to mark UP
unhealthy_threshold: 3  # 3 failures to mark DOWN
expected_status: 200
expected_body: "ok"
check_regions:
  - us-east
  - eu-west
  - ap-southeast
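
To show how results from several regions might be combined, the sketch below evaluates a single response against the expected status and body, then treats the endpoint as down only when a majority of regions report failure. Both function names are hypothetical, and two details are assumptions rather than anything the configuration specifies: the body check is a substring match, and the quorum rule is "more than half".

```python
from typing import Dict


def response_is_healthy(status: int, body: str,
                        expected_status: int = 200, expected_body: str = "ok") -> bool:
    """Check one response against the expected status code and body text.

    Assumption: the body check is a substring match.
    """
    return status == expected_status and expected_body in body


def is_down(region_results: Dict[str, bool], quorum: float = 0.5) -> bool:
    """Return True when more than `quorum` of the regions report a failed check."""
    if not region_results:
        return False  # no data: do not trigger failover on missing results
    failures = sum(1 for ok in region_results.values() if not ok)
    return failures / len(region_results) > quorum


# One failing region out of three is treated as a network path issue, not an outage.
print(is_down({"us-east": False, "eu-west": True, "ap-southeast": True}))
print(is_down({"us-east": False, "eu-west": False, "ap-southeast": True}))
```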

DNS Failover vs Load Balancer Failover

Both provide failover, but at different layers:

| Feature | DNS Failover | Load Balancer |
|---------|--------------|---------------|
| Layer | DNS (before connection) | Network/Application (L4/L7) |
| Speed | Seconds to minutes (TTL dependent) | Milliseconds to seconds |
| Scope | Cross-region, cross-provider | Within a cluster or region |
| Cost | Low (DNS service fee) | Higher (LB infrastructure) |
| Granularity | Server/IP level | Request level |
| Session persistence | Not possible | Supported |

Best practice: use both. Rely on a load balancer for fast failover within a region and on DNS failover for cross-region disaster recovery.

Common Pitfalls

  • High TTL — the most common mistake. A 3600s TTL makes DNS failover nearly useless. Lower to 60–300s for services requiring failover
  • Flapping — aggressive health check thresholds cause rapid switching between primary and backup, confusing caches and users. Use 3+ consecutive failures before failover
  • Untested backup — the backup server has not been tested under production load. Failover activates, and the backup immediately collapses. Test backup capacity regularly
  • Sticky resolvers — some ISP resolvers ignore TTL and cache records longer. You cannot fully control client-side caching behavior
  • No failback plan — once the primary recovers, when and how do you switch back? Automatic failback can be risky if the primary is still unstable
  • Single point of failure in DNS — if the DNS provider itself goes down, failover does not work. Consider multi-provider DNS setups for critical services
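
One possible approach to the failback question is a hold-down period: the primary must stay healthy for a continuous stretch before traffic returns to it. The sketch below is only an illustration of that idea; the `FailbackGate` class and the 300-second value are hypothetical, and timestamps are passed in explicitly so the logic is easy to test.

```python
from typing import Optional


class FailbackGate:
    """Allows failback only after the primary has been continuously healthy for a hold-down period."""

    def __init__(self, hold_down_seconds: float):
        self.hold_down_seconds = hold_down_seconds
        self._healthy_since: Optional[float] = None

    def observe(self, primary_healthy: bool, now: float) -> bool:
        """Record one observation and return True when failback is allowed."""
        if not primary_healthy:
            # Any failure resets the clock, so an unstable primary never qualifies.
            self._healthy_since = None
            return False
        if self._healthy_since is None:
            self._healthy_since = now
        return now - self._healthy_since >= self.hold_down_seconds


gate = FailbackGate(hold_down_seconds=300)
print(gate.observe(True, now=0))    # healthy, but the hold-down has just started
print(gate.observe(True, now=300))  # continuously healthy for 300s
```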

Monitoring DNS Failover

Continuously verify your failover setup:

  • Monitor health check status and response times from multiple regions
  • Track DNS propagation time after failover events — use tools like Enterno.io to verify records from global locations
  • Alert on failover events (both activation and recovery)
  • Regularly test failover by simulating primary failure
  • Monitor TTL compliance across major resolvers

Conclusion

DNS failover is an essential component of high availability architecture. It provides cross-region resilience at low cost, complementing load balancers that handle intra-region failover. Configure low TTLs, implement robust health checks from multiple locations, test your backup infrastructure under load, and monitor failover events continuously. Combined with proper monitoring, DNS failover ensures your services remain accessible even when individual servers or entire data centers fail.
