Zero-Downtime Deployment Strategies

Anatoly Oshmanovsky

DevOps

Published: 16.03.2026

Zero-downtime deployment is the practice of releasing new versions of an application without any interruption to end users. In a world where even seconds of downtime can mean lost revenue, damaged reputation, and broken SLAs, mastering zero-downtime deployment is essential for any production-grade web service.

Why Downtime Happens During Deployments

Traditional deployments cause downtime because the application must be stopped so its code can be replaced, then restarted to load the new version. During this window, incoming requests are refused, time out, or receive error responses. Common causes include:

Stopping the old version before the new one is ready
Database migrations that lock tables
Configuration changes that require restarts
Dependency updates that break compatibility
Cold start time for the new application instance

Blue-Green Deployment

Blue-green deployment maintains two identical production environments. At any time, one (blue) serves live traffic while the other (green) is idle or being updated. When the new version is deployed to the green environment and validated, traffic is switched from blue to green.

# Conceptual flow
1. Blue environment serves production traffic
2. Deploy new version to Green environment
3. Run smoke tests on Green
4. Switch load balancer from Blue → Green
5. Green now serves production traffic
6. Blue becomes the staging/rollback environment

# Nginx switch (simplified)
upstream app {
    # Before switch:
    # server blue.internal:8080;
    # After switch:
    server green.internal:8080;
}
# Apply the change without dropping connections: nginx -t && nginx -s reload

Advantages: Instant rollback by switching back to blue. Full environment validation before traffic switch. No mixed-version traffic.

Disadvantages: Requires double the infrastructure. Because both environments usually share one database, schema changes must stay backward compatible with the version running in each.
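The switching logic behind the six-step flow can be modeled in a few lines of Python. This is a sketch, not a real load balancer: `BlueGreenRouter` is a hypothetical class, the environment names are fixed to "blue" and "green", and deploying the new build to the idle environment (step 2) is assumed to have happened before `deploy()` is called.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class BlueGreenRouter:
    """Hypothetical model of a load balancer that fronts two environments."""

    upstreams: Dict[str, str]  # {"blue": address, "green": address}
    live: str = "blue"

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def route(self) -> str:
        """Address that receives production traffic right now."""
        return self.upstreams[self.live]

    def deploy(self, smoke_test: Callable[[str], bool]) -> bool:
        """Steps 3-6: test the idle environment, then switch only if it passes."""
        if not smoke_test(self.upstreams[self.idle]):
            return False  # the live environment keeps serving; nothing to undo
        self.live = self.idle  # a single swap: every later route() hits the new env
        return True

    def rollback(self) -> None:
        """Point traffic back at the previous environment, which is still running."""
        self.live = self.idle
```

Because the previous environment is never torn down during the switch, `rollback()` is the same one-line swap as the deployment itself, which is where the "instant rollback" advantage comes from.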

Rolling Deployment

Rolling deployment gradually replaces instances of the old version with the new version, one (or a few) at a time. At any point during the rollout, both versions may be serving traffic simultaneously.

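# Kubernetes rolling update
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 4
  selector:                # required in apps/v1
    matchLabels:
      app: web-app         # must match the pod template labels below
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1    # At most 1 pod below the desired count during update
      maxSurge: 1          # At most 1 pod above the desired count during update
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: app
        image: myapp:v2.0
        readinessProbe:    # a pod receives traffic only after this succeeds
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5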

Advantages: No need for double infrastructure. Gradual rollout reduces blast radius. Works natively with container orchestrators.

Disadvantages: Both versions run simultaneously, requiring backward compatibility. Rollback is slower (must roll back each instance). Session management across versions needs careful handling.
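To see what maxSurge and maxUnavailable mean for the manifest above, here is a simplified, step-based Python model. It is not the Deployment controller's actual algorithm: it assumes, pessimistically, that pods started in a step only become Ready at the end of that step, and `simulate_rolling_update` is a hypothetical name. With replicas: 4 and both limits at 1, the model never runs more than 5 pods and never has fewer than 3 Ready.

```python
from typing import List, Tuple


def simulate_rolling_update(
    replicas: int, max_surge: int, max_unavailable: int
) -> List[Tuple[int, int]]:
    """Return (old_pods, ready_new_pods) after each step of a simplified rollout."""
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    max_total = replicas + max_surge            # ceiling on pods that exist at once
    min_available = replicas - max_unavailable  # floor on pods that are Ready
    old, new_ready = replicas, 0
    steps = []
    while old > 0 or new_ready < replicas:
        # 1. Surge: start new pods without exceeding replicas + maxSurge.
        room = max_total - (old + new_ready)
        starting = max(0, min(room, replicas - new_ready))
        assert old + new_ready + starting <= max_total
        # 2. Drain: stop old pods while keeping at least min_available Ready
        #    (the pods just started are treated as not Ready yet).
        old -= max(0, min(old, old + new_ready - min_available))
        assert old + new_ready >= min_available
        # 3. Starting pods pass their readiness probe and begin receiving traffic.
        new_ready += starting
        steps.append((old, new_ready))
    return steps
```

For the manifest's values, `simulate_rolling_update(4, 1, 1)` returns `[(3, 1), (2, 2), (1, 3), (0, 4)]`: one old pod is replaced per step. Setting maxUnavailable to 0 keeps full capacity at all times but makes the rollout take more steps.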

Canary Deployment

Canary deployment routes a small percentage of traffic to the new version while the majority continues to hit the old version. If metrics look healthy, the percentage is gradually increased until 100% of traffic reaches the new version.

# Nginx canary with weighted upstream
upstream app {
    server old-version.internal:8080 weight=90;
    server new-version.internal:8080 weight=10;  # 10% canary
}

# Gradual rollout stages:
# Stage 1: 5% to canary → monitor for 15 minutes
# Stage 2: 25% to canary → monitor for 30 minutes
# Stage 3: 50% to canary → monitor for 30 minutes
# Stage 4: 100% to canary → deployment complete
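Weights in an nginx upstream are applied per request, so the configuration above sends about 10% of requests, not 10% of users, to the canary; without sticky sessions the same user can alternate between versions. The Python sketch below models the smooth weighted round-robin selection nginx uses for weighted upstreams, ignoring health state and `max_fails`; `smooth_weighted_round_robin` is a hypothetical name.

```python
from typing import Dict, List


def smooth_weighted_round_robin(weights: Dict[str, int], requests: int) -> List[str]:
    """Return which server handles each of `requests` consecutive requests."""
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(requests):
        # Every server gains its weight; the highest running score wins the request.
        for name, weight in weights.items():
            current[name] += weight
        best = max(current, key=current.get)  # ties go to the first server listed
        current[best] -= total                # the winner pays back the total weight
        picks.append(best)
    return picks
```

With weights 9 and 1, ten requests go to the old version five times, then to the canary once, then to the old version four times, so canary requests are spread out rather than sent in a burst.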

Advantages: Lowest risk — only a fraction of users are exposed to the new version. Real production traffic validation. Issues are detected before full rollout.

Disadvantages: Requires traffic-splitting infrastructure. Metrics and monitoring must be granular enough to detect issues in the canary pool. Slower overall deployment time.
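The staged rollout can be driven by a small control loop. This is a sketch under assumptions: `run_canary` is hypothetical, `set_weight` and `healthy` are callbacks you would wire to your traffic router and metrics system, and `healthy` is expected to block for the soak period of each stage (15 or 30 minutes in the stages above) before answering.

```python
from typing import Callable, Sequence


def run_canary(
    stages: Sequence[int],
    set_weight: Callable[[int], None],
    healthy: Callable[[int], bool],
) -> int:
    """Promote a canary through percentage stages; return the final canary weight."""
    for weight in stages:
        set_weight(weight)       # e.g. rewrite the upstream weights and reload nginx
        if not healthy(weight):  # e.g. compare canary error rate and latency to baseline
            set_weight(0)        # roll back: all traffic returns to the old version
            return 0
    return stages[-1]
```

Keeping rollback inside the loop means a failed stage is undone automatically, rather than waiting for someone to notice an alert.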

Database Migration Strategies

Database schema changes are the hardest part of zero-downtime deployment. A migration that holds a long table lock blocks every query against that table until it finishes, and one that drops or renames a column breaks any instance still running the old code. The solution is the expand-and-contract pattern:

Expand Phase

Add new columns, tables, or indexes without removing anything. The old application version continues to work because nothing it depends on has changed.

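-- Step 1: Add the new column with a default (a metadata-only change in MySQL 8.0.12+;
-- ALGORITHM=INSTANT makes the statement fail instead of silently copying the table)
ALTER TABLE users ADD COLUMN email_verified TINYINT(1) DEFAULT 0, ALGORITHM=INSTANT;

-- Step 2: Deploy new application version that writes to both old and new columns
-- Step 3: Backfill existing rows in batches; repeat until 0 rows are affected
UPDATE users SET email_verified = 1
WHERE verified_at IS NOT NULL AND email_verified = 0
LIMIT 10000;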
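Steps 2 and 3 can be sketched in Python against SQLite (standard library `sqlite3`) so the example runs anywhere. The table and column names follow the example above; `mark_verified` and `backfill_email_verified` are hypothetical helpers. Default SQLite builds do not support `UPDATE ... LIMIT`, so each batch selects its ids in a subquery, and every batch commits on its own so no single transaction holds locks for long.

```python
import sqlite3


def mark_verified(conn: sqlite3.Connection, user_id: int, ts: str) -> None:
    """Step 2 (dual write): the new release keeps both columns in sync."""
    with conn:
        conn.execute(
            "UPDATE users SET verified_at = ?, email_verified = 1 WHERE id = ?",
            (ts, user_id),
        )


def backfill_email_verified(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Step 3: copy verified_at into email_verified in small batches; return rows updated."""
    total = 0
    while True:
        with conn:  # commits after each batch, keeping transactions short
            cur = conn.execute(
                """
                UPDATE users SET email_verified = 1
                WHERE id IN (
                    SELECT id FROM users
                    WHERE verified_at IS NOT NULL AND email_verified = 0
                    LIMIT ?
                )
                """,
                (batch_size,),
            )
        if cur.rowcount == 0:
            return total
        total += cur.rowcount
```

The `email_verified = 0` filter makes the backfill safe to re-run: rows already handled by a previous batch or by the dual write are skipped.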

Contract Phase

After the new application version is stable and all data has been migrated, remove the old columns or tables in a subsequent deployment.

-- Step 4: Deploy version that only reads from new column
-- Step 5: Remove old column (in a separate migration)
ALTER TABLE users DROP COLUMN verified_at;
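Before running Step 5 it is worth confirming that the backfill and dual writes left no row where the two columns disagree, because the drop cannot be undone. A minimal sketch, again using SQLite for portability; `safe_to_drop_verified_at` is a hypothetical name, and the same query shape should also work in MySQL.

```python
import sqlite3


def safe_to_drop_verified_at(conn: sqlite3.Connection) -> bool:
    """Return True only when no row disagrees between the old and new columns."""
    (mismatches,) = conn.execute(
        """
        SELECT COUNT(*) FROM users
        WHERE (verified_at IS NOT NULL) <> (COALESCE(email_verified, 0) = 1)
        """
    ).fetchone()
    return mismatches == 0
```

Running this as a gate in the migration pipeline turns "all data has been migrated" from an assumption into a checked precondition.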

Graceful Shutdown

Applications must handle in-flight requests during shutdown. A graceful shutdown sequence looks like this:

1. Receive SIGTERM signal
2. Stop accepting new connections
3. Complete all in-flight requests (with a timeout)
4. Close database connections and other resources
5. Exit with code 0
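The sequence above can be sketched in Python with the standard library. `GracefulServer` is a hypothetical stand-in for a real server: it counts in-flight requests, refuses new ones once shutdown starts, and waits on a `threading.Condition` until the count reaches zero or the timeout expires. Returning exit code 1 when the timeout cuts requests off is an assumption of this sketch; the list above only specifies exit code 0 for a clean drain.

```python
import signal
import threading
import time


class GracefulServer:
    """Hypothetical request tracker that drains in-flight work before exiting."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._in_flight = 0
        self.accepting = True

    def handle(self, work_seconds: float) -> int:
        with self._cond:
            if not self.accepting:
                return 503  # step 2: new requests are refused during shutdown
            self._in_flight += 1
        try:
            time.sleep(work_seconds)  # stand-in for real request processing
            return 200
        finally:
            with self._cond:
                self._in_flight -= 1
                self._cond.notify_all()

    def shutdown(self, timeout: float) -> int:
        """Steps 2-5: stop accepting, wait for in-flight requests, return an exit code."""
        with self._cond:
            self.accepting = False
            drained = self._cond.wait_for(lambda: self._in_flight == 0, timeout=timeout)
        # Step 4 would close database connections and other resources here.
        return 0 if drained else 1


def install_sigterm_handler(server: GracefulServer, timeout: float = 30.0) -> None:
    """Step 1: turn SIGTERM into a graceful shutdown followed by process exit."""
    def on_sigterm(signum, frame):
        raise SystemExit(server.shutdown(timeout))
    signal.signal(signal.SIGTERM, on_sigterm)
```

The timeout should be shorter than the grace period the orchestrator allows before it force-kills the process (30 seconds by default in Kubernetes), otherwise the drain never gets to finish.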

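; PHP-FPM graceful shutdown (php-fpm.conf, global section)
; Wait up to 30s for workers to finish on a graceful stop (SIGQUIT) or reload (SIGUSR2).
; PHP-FPM treats SIGTERM as an immediate shutdown, so send SIGQUIT to stop it gracefully.
process_control_timeout = 30

# Nginx graceful shutdown
nginx -s quit  # Waits for active connections to complete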

Health Checks and Readiness

Health checks are the glue that holds zero-downtime deployment together. The load balancer must know when a new instance is ready to receive traffic and when an old instance should stop receiving it.

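<?php
// PHP health check endpoint
// GET /health
// checkDatabaseConnection() and checkRedisConnection() are application helpers,
// assumed to return true when the dependency responds
$checks = [
    'database' => checkDatabaseConnection(),
    'redis' => checkRedisConnection(),
    'disk' => disk_free_space('/') > 100 * 1024 * 1024, // more than 100 MB free
];

$healthy = !in_array(false, $checks, true);
http_response_code($healthy ? 200 : 503);
header('Content-Type: application/json');
echo json_encode(['status' => $healthy ? 'ok' : 'unhealthy', 'checks' => $checks]);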
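The same check can be expressed in Python. This sketch adds two assumptions: a check that raises an exception counts as failed (in PHP an uncaught exception would typically produce a 500 rather than a structured 503), and a `draining` flag makes the endpoint return 503 once shutdown begins, so the load balancer stops sending new traffic to an instance that is going away. `health_response` and `disk_has_space` are hypothetical names.

```python
import json
import shutil
from typing import Callable, Dict, Tuple


def disk_has_space(path: str = "/", min_free: int = 100 * 1024 * 1024) -> bool:
    """Mirror of the PHP 'disk' check: more than 100 MB free."""
    return shutil.disk_usage(path).free > min_free


def health_response(
    checks: Dict[str, Callable[[], bool]], draining: bool = False
) -> Tuple[int, str]:
    """Run every check and build the (HTTP status, JSON body) pair."""
    results: Dict[str, bool] = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False  # a check that raises counts as a failure
    healthy = all(results.values()) and not draining
    if healthy:
        status = "ok"
    elif draining:
        status = "draining"  # shutting down: ask the load balancer to route elsewhere
    else:
        status = "unhealthy"
    return (200 if healthy else 503), json.dumps({"status": status, "checks": results})
```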

Deployment Checklist

Database migrations are backward compatible (expand-and-contract)
Application supports graceful shutdown (handles SIGTERM)
Health check endpoints are implemented and tested
Load balancer is configured to check health before routing
Rollback plan is documented and tested
Feature flags are in place for risky changes
Monitoring alerts are set for error rate, latency, and availability
Static assets are versioned (cache busting)
Session storage is externalized (Redis, database)
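Versioned assets matter during rolling and canary deployments because pages rendered by the old and new versions request their assets at the same time. Naming each file after a hash of its content gives every release distinct URLs, so neither version receives the other's CSS or JavaScript from a cache, provided the previous release's files stay deployed until its pages are gone. A minimal sketch, assuming a build step that renames files; `hashed_asset_name` is a hypothetical name.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_asset_name(path: str, content: bytes, length: int = 8) -> str:
    """Return a content-addressed name such as 'css/app.<hash>.css'."""
    p = PurePosixPath(path)
    digest = hashlib.sha256(content).hexdigest()[:length]
    # Same content -> same name (caches stay warm); changed content -> new URL.
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))
```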

Monitoring During Deployment

Key metrics to watch during and after deployment:

Metric	Normal Range	Alert Threshold
Error rate (5xx)	< 0.1%	> 1%
P95 latency	< 500ms	> 2x baseline
Request rate	Stable	> 20% drop
CPU usage	< 70%	> 90%
Memory usage	< 80%	> 95%
Active connections	Stable	Sudden spike or drop
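The numeric thresholds in the table translate directly into an automated check that a deployment pipeline or canary gate could call. A minimal sketch: `Snapshot` and `deployment_alerts` are hypothetical names, rates and utilization are fractions (0.01 means 1%), and active connections are left out because the table gives no numeric threshold for a "sudden" change.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snapshot:
    error_rate: float      # fraction of responses that are 5xx (0.01 == 1%)
    p95_latency_ms: float
    request_rate: float    # requests per second
    cpu: float             # utilization, 0.0 to 1.0
    memory: float          # utilization, 0.0 to 1.0


def deployment_alerts(baseline: Snapshot, current: Snapshot) -> List[str]:
    """Compare current metrics with the pre-deployment baseline and the fixed limits."""
    alerts = []
    if current.error_rate > 0.01:
        alerts.append("error rate above 1%")
    if current.p95_latency_ms > 2 * baseline.p95_latency_ms:
        alerts.append("P95 latency above 2x baseline")
    if current.request_rate < 0.8 * baseline.request_rate:
        alerts.append("request rate dropped more than 20%")
    if current.cpu > 0.90:
        alerts.append("CPU above 90%")
    if current.memory > 0.95:
        alerts.append("memory above 95%")
    return alerts
```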

Summary

Zero-downtime deployment is achievable with any infrastructure stack through a combination of strategies: blue-green for instant switchover, rolling for gradual replacement, and canary for risk-minimized validation. The keys to success are backward-compatible database migrations, graceful shutdown handling, robust health checks, and comprehensive monitoring. Start with the simplest strategy that meets your needs and evolve as your application and team grow.
