DNS Failover: Automatic Traffic Switching for High Availability

Anatoly Oshmanovsky


Published: 16.03.2026 · ~5 min · 259 views

What Is DNS Failover?

DNS failover is an automated mechanism that redirects traffic from a failing server or data center to a healthy one by modifying DNS responses in real time. When a health check detects that the primary server is down, the DNS provider automatically updates the record to point to a backup server, ensuring continuous service availability.

Traditional DNS is static — records are set manually and only change when an administrator updates them. DNS failover adds an intelligence layer: continuous health monitoring combined with automatic record switching. The result is that users are seamlessly routed to working infrastructure without manual intervention.

How DNS Failover Works

The process involves three components working together:

Health checks: the DNS provider continuously monitors your servers by sending HTTP requests, TCP connection attempts, or ICMP pings at regular intervals (typically every 30–60 seconds)
Failover logic: when a health check fails a configured number of times in a row (e.g., 3 consecutive failures), the system marks the server as unhealthy
DNS record update: the DNS response changes to return the IP of the backup server instead of the failed primary

Example flow:

Normal operation:
  User → DNS query → api.example.com → 203.0.113.10 (primary)

Primary fails health check (3 consecutive failures):
  System marks 203.0.113.10 as DOWN

Failover activated:
  User → DNS query → api.example.com → 203.0.113.20 (backup)

Primary recovers (3 consecutive successes):
  System marks 203.0.113.10 as UP
  User → DNS query → api.example.com → 203.0.113.10 (primary)
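The threshold behaviour in this flow can be sketched as a small state machine. This is a minimal illustration, not any provider's implementation: the `HealthTracker` class and `answer_for` helper are hypothetical names, and the thresholds of 3 simply mirror the example above.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    """Tracks one endpoint and flips its state only after N consecutive results."""

    unhealthy_threshold: int = 3  # consecutive failures before marking DOWN
    healthy_threshold: int = 3    # consecutive successes before marking UP
    is_up: bool = True
    _streak: int = 0              # consecutive results that contradict the current state

    def record(self, check_passed: bool) -> bool:
        """Feed one health check result and return the (possibly updated) state."""
        if check_passed == self.is_up:
            # The result agrees with the current state, so any opposing streak is broken.
            self._streak = 0
            return self.is_up
        self._streak += 1
        needed = self.unhealthy_threshold if self.is_up else self.healthy_threshold
        if self._streak >= needed:
            self.is_up = not self.is_up
            self._streak = 0
        return self.is_up


def answer_for(primary: HealthTracker, primary_ip: str, backup_ip: str) -> str:
    """Return the IP an active-passive DNS response would carry."""
    return primary_ip if primary.is_up else backup_ip
```

Because a single passing check resets the failure streak, one dropped probe does not move traffic; only an unbroken run of failures does.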

TTL and Failover Speed

The critical factor in DNS failover speed is TTL (Time to Live). When a DNS record has a TTL of 3600 seconds (1 hour), resolvers cache the old IP for up to an hour after failover triggers. During this period, users still reach the failed server.

To make failover effective, pick a TTL that matches how quickly traffic needs to move:

TTL Value	Failover Speed	DNS Query Load	Use Case
30s	~30–90s	Very high	Critical services requiring fast failover
60s	~1–3 min	High	Production APIs and websites
300s	~5–10 min	Moderate	General web properties
3600s	~1 hour	Low	Not suitable for failover

Trade-off: a low TTL means faster failover but higher DNS query volume. Cached answers expire sooner, so resolvers and clients look the name up more often, which puts more load on DNS infrastructure and adds lookup latency to a larger share of requests.
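As a rough back-of-the-envelope aid, the trade-off can be put into numbers. The sketch below assumes a simplified model: detection takes one check interval per required failure, every resolver honors the TTL, and check timeouts are ignored. Both function names are made up for illustration.

```python
def worst_case_failover_seconds(check_interval: int, unhealthy_threshold: int, ttl: int) -> int:
    """Upper-bound estimate: time to detect the outage plus time for caches to expire."""
    # Assumption: each of the consecutive failures costs one full check interval.
    detection = check_interval * unhealthy_threshold
    # Assumption: a resolver cached the old answer just before the record changed.
    return detection + ttl


def max_queries_per_resolver_per_hour(ttl: int) -> int:
    """A TTL-honoring resolver re-queries upstream at most once per TTL window."""
    return 3600 // ttl


print(worst_case_failover_seconds(30, 3, 60))    # 150
print(worst_case_failover_seconds(30, 3, 300))   # 390
print(max_queries_per_resolver_per_hour(60))     # 60
print(max_queries_per_resolver_per_hour(3600))   # 1
```

With 30-second checks and a threshold of 3, detection alone costs up to 90 seconds, so lowering the TTL much below the detection time brings diminishing returns.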

Failover Strategies

Active-Passive

One primary server handles all traffic. One or more backup servers stay idle until failover. Simple and cost-effective, but backup capacity is wasted during normal operation.

# DNS configuration example
api.example.com  A  203.0.113.10  (primary, active)
api.example.com  A  203.0.113.20  (backup, returned only on failover)

Active-Active

Multiple servers share traffic during normal operation (round-robin or weighted). When one fails, its share is distributed among remaining servers. More efficient — no idle capacity — but requires all servers to handle extra load during failover.

# Active-active with health checks
api.example.com  A  203.0.113.10  weight=70  (primary region)
api.example.com  A  203.0.113.20  weight=30  (secondary region)
# If .10 fails, all traffic goes to .20
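
The weighted behaviour can be sketched as follows. This is an illustrative model of how a DNS service might choose an answer, not a real provider API: `pick_record` is a hypothetical helper, and the weights are the ones from the example above.

```python
import random


def pick_record(records, healthy, rng=random):
    """Pick one IP by weight among healthy records.

    records: list of (ip, weight) tuples
    healthy: set of IPs currently marked UP
    rng:     source of randomness (the random module or a random.Random instance)
    """
    candidates = [(ip, weight) for ip, weight in records if ip in healthy]
    if not candidates:
        return None  # nothing healthy to return
    ips, weights = zip(*candidates)
    # Unhealthy records are filtered out, so their share falls to the survivors.
    return rng.choices(ips, weights=weights, k=1)[0]


RECORDS = [("203.0.113.10", 70), ("203.0.113.20", 30)]
```

Calling `pick_record(RECORDS, {"203.0.113.20"})` always returns the secondary, which is why that server must be sized for the full load and not only for its usual 30%.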

Geographic (GeoDNS) Failover

Route users to the nearest data center based on their location. If a regional server fails, users are redirected to the next closest healthy region. Combines latency optimization with high availability.
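
One way to picture the routing decision is a per-region preference list that is walked until a healthy region is found. The sketch below is a simplified assumption of how such logic could look: the client region keys, the preference orders, and the `resolve_region` helper are all hypothetical. Real GeoDNS services derive the client location from the resolver or client subnet.

```python
# Hypothetical preference orders: nearest region first, then the next closest.
REGION_PREFERENCE = {
    "us": ["us-east", "eu-west", "ap-southeast"],
    "eu": ["eu-west", "us-east", "ap-southeast"],
    "ap": ["ap-southeast", "us-east", "eu-west"],
}
DEFAULT_ORDER = ["us-east", "eu-west", "ap-southeast"]  # used for unknown locations


def resolve_region(client_region, healthy_regions):
    """Return the closest healthy region for a client, or None if all are down."""
    for region in REGION_PREFERENCE.get(client_region, DEFAULT_ORDER):
        if region in healthy_regions:
            return region
    return None
```

A European client gets `eu-west` while it is healthy and falls through to `us-east` when it is not.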

Health Check Configuration

Effective health checks must be:

Specific — check the actual service, not just that the server responds to ping. An HTTP check to /health that verifies database connectivity is better than an ICMP ping
Frequent — every 30–60 seconds for critical services
Resilient — require multiple consecutive failures before triggering failover (prevent flapping from transient network issues)
From multiple locations — a health check from a single location may fail due to network path issues, not actual downtime. Use checks from 3+ geographic locations

# Health check configuration example
endpoint: https://api.example.com/health
method: GET
interval: 30s
timeout: 10s
healthy_threshold: 3    # 3 successes to mark UP
unhealthy_threshold: 3  # 3 failures to mark DOWN
expected_status: 200
expected_body: "ok"
check_regions:
  - us-east
  - eu-west
  - ap-southeast
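
How the per-region results are combined is up to the provider. A minimal sketch of one plausible approach is shown below, assuming the body check is a substring match and that an endpoint counts as failing only when at least two regions agree. Both assumptions, and both function names, are illustrative.

```python
from typing import Mapping


def check_passed(status: int, body: str,
                 expected_status: int = 200, expected_body: str = "ok") -> bool:
    """Evaluate one probe: the status must match and the body must contain the marker."""
    return status == expected_status and expected_body in body


def endpoint_is_failing(region_results: Mapping[str, bool],
                        min_failing_regions: int = 2) -> bool:
    """Treat the endpoint as failing only if enough regions see the failure.

    region_results maps a region name to whether its latest probe passed.
    """
    failing = sum(1 for passed in region_results.values() if not passed)
    return failing >= min_failing_regions
```

With this rule, a single region reporting a failure (for example, because of a broken network path from `eu-west`) does not count against the endpoint on its own.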

DNS Failover vs Load Balancer Failover

Both provide failover, but at different layers:

Feature	DNS Failover	Load Balancer
Layer	DNS (before connection)	Transport/Application (L4/L7)
Speed	Seconds to minutes (TTL dependent)	Milliseconds to seconds
Scope	Cross-region, cross-provider	Within a cluster or region
Cost	Low (DNS service fee)	Higher (LB infrastructure)
Granularity	Server/IP level	Request level
Session persistence	Not possible	Supported

Best practice: use both. Use a load balancer for fast failover within a region and DNS failover for cross-region disaster recovery.

Common Pitfalls

High TTL — the most common mistake. A 3600s TTL makes DNS failover nearly useless. Lower to 60–300s for services requiring failover
Flapping — aggressive health check thresholds cause rapid switching between primary and backup, confusing caches and users. Use 3+ consecutive failures before failover
Untested backup — the backup server has not been tested under production load. Failover activates, and the backup immediately collapses. Test backup capacity regularly
Sticky resolvers — some ISP resolvers ignore TTL and cache records longer. You cannot fully control client-side caching behavior
No failback plan — once the primary recovers, when and how do you switch back? Automatic failback can be risky if the primary is still unstable
Single point of failure in DNS — if the DNS provider itself goes down, failover does not work. Consider multi-provider DNS setups for critical services
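
For the failback question, one common-sense safeguard is a hold-down period: the primary must stay healthy for some time before traffic returns to it. The sketch below is only an illustration of that idea. The `should_fail_back` helper and the 10-minute default are assumptions, not values from any particular provider.

```python
from typing import Optional


def should_fail_back(primary_up_since: Optional[float], now: float,
                     hold_down_seconds: float = 600.0) -> bool:
    """Allow failback only after the primary has been continuously UP long enough.

    primary_up_since: timestamp when the primary was last marked UP,
                      or None if it is currently DOWN.
    now:              current timestamp in the same units (seconds).
    """
    if primary_up_since is None:
        return False
    return now - primary_up_since >= hold_down_seconds
```

If the primary goes down again during the hold-down window, `primary_up_since` is reset and the clock starts over, which keeps an unstable primary from pulling traffic back too early.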

Monitoring DNS Failover

Continuously verify your failover setup:

Monitor health check status and response times from multiple regions
Track DNS propagation after failover events, using tools like Enterno.io to verify records from global locations
Alert on failover events (both activation and recovery)
Regularly test failover by simulating primary failure
Monitor TTL compliance across major resolvers
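
A simple probe for the checks above might resolve the name and report which side of an active-passive pair is being served. The sketch below uses only the Python standard library. `resolve_ipv4` and `failover_state` are hypothetical helper names, and the result reflects only the local system resolver, so it is a single vantage point and not a global view.

```python
import socket
from typing import Callable, Set


def resolve_ipv4(hostname: str) -> Set[str]:
    """Resolve a hostname to its IPv4 addresses through the system resolver."""
    infos = socket.getaddrinfo(hostname, None,
                               family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def failover_state(hostname: str, primary_ip: str, backup_ip: str,
                   resolve: Callable[[str], Set[str]] = resolve_ipv4) -> str:
    """Report which record is currently served: 'primary', 'backup', or 'unknown'."""
    answers = resolve(hostname)
    if primary_ip in answers:
        return "primary"
    if backup_ip in answers:
        return "backup"
    return "unknown"
```

Running `failover_state("api.example.com", "203.0.113.10", "203.0.113.20")` on a schedule and alerting when the result changes covers both activation and recovery events. The `resolve` parameter can be swapped for a function that queries a specific public resolver.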

Conclusion

DNS failover is an essential component of high availability architecture. It provides cross-region resilience at low cost, complementing load balancers that handle intra-region failover. Configure low TTLs, implement robust health checks from multiple locations, test your backup infrastructure under load, and monitor failover events continuously. Combined with proper monitoring, DNS failover ensures your services remain accessible even when individual servers or entire data centers fail.
