
Zero-Downtime Deployment Strategies

Zero-downtime deployment is the practice of releasing new versions of an application without any interruption to end users. In a world where even seconds of downtime can mean lost revenue, damaged reputation, and broken SLAs, mastering zero-downtime deployment is essential for any production-grade web service.

Why Downtime Happens During Deployments

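Traditional deployments cause downtime because the application must be stopped to replace its code and restarted to load the new version. During this window, incoming requests either fail or receive errors. Common causes, each addressed in the sections below, include process restarts that drop in-flight requests, schema migrations that lock tables or remove columns still in use, and load balancers that send traffic to instances that are not yet ready.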

Blue-Green Deployment

Blue-green deployment maintains two identical production environments. At any time, one (blue) serves live traffic while the other (green) is idle or being updated. When the new version is deployed to the green environment and validated, traffic is switched from blue to green.

# Conceptual flow
1. Blue environment serves production traffic
2. Deploy new version to Green environment
3. Run smoke tests on Green
4. Switch load balancer from Blue → Green
5. Green now serves production traffic
6. Blue becomes the staging/rollback environment

# Nginx switch (simplified)
upstream app {
    # Before switch:
    # server blue.internal:8080;
    # After switch:
    server green.internal:8080;
}

Advantages: Instant rollback by switching back to blue. Full environment validation before traffic switch. No mixed-version traffic.

Disadvantages: Requires double the infrastructure. Database schema changes must be backward compatible across both environments.
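
The conceptual flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `BlueGreenRouter` is a hypothetical helper that stands in for the load balancer, exactly two environments exist, and the smoke test is passed in as a plain function rather than a real HTTP probe.

```python
from typing import Callable, Dict


class BlueGreenRouter:
    """Tracks which environment receives live traffic (hypothetical helper)."""

    def __init__(self, environments: Dict[str, str], live: str) -> None:
        self.environments = environments  # name -> upstream address
        self.live = live

    def idle(self) -> str:
        # The environment that is not serving traffic right now.
        return next(name for name in self.environments if name != self.live)

    def cut_over(self, smoke_test: Callable[[str], bool]) -> bool:
        """Switch traffic to the idle environment only if it passes smoke tests."""
        candidate = self.idle()
        if not smoke_test(self.environments[candidate]):
            return False  # live traffic is left untouched
        self.live = candidate
        return True

    def roll_back(self) -> None:
        # Rollback is the same pointer flip in the other direction.
        self.live = self.idle()


router = BlueGreenRouter(
    {"blue": "blue.internal:8080", "green": "green.internal:8080"}, live="blue"
)
# Stand-in smoke test: a real one would call the candidate's health endpoint.
switched = router.cut_over(lambda address: address.startswith("green"))
print(router.live, switched)
```

The point of the sketch is that the switch and the rollback are the same cheap operation, which is why blue-green rollback is effectively instant.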

Rolling Deployment

Rolling deployment gradually replaces instances of the old version with the new version, one (or a few) at a time. At any point during the rollout, both versions may be serving traffic simultaneously.

# Kubernetes rolling update
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
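spec:
  replicas: 4
  selector:
    matchLabels:
      app: web-app         # Required in apps/v1; must match the template labels
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1    # At most 1 pod down during update
      maxSurge: 1           # At most 1 extra pod during update
  template:
    metadata:
      labels:
        app: web-app       # Example label (an assumption); any consistent label works
    spec: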
      containers:
      - name: app
        image: myapp:v2.0
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5

Advantages: No need for double infrastructure. Gradual rollout reduces blast radius. Works natively with container orchestrators.

Disadvantages: Both versions run simultaneously, requiring backward compatibility. Rollback is slower (must roll back each instance). Session management across versions needs careful handling.
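
To see how `maxUnavailable` and `maxSurge` bound a rollout, the following Python sketch simulates the pod counts step by step. It is a simplification, not the Kubernetes controller's actual algorithm: it assumes new pods become ready instantly, and `simulate_rolling_update` is a hypothetical name.

```python
from typing import List, Tuple


def simulate_rolling_update(
    replicas: int, max_unavailable: int, max_surge: int
) -> List[Tuple[int, int]]:
    """Return (old_pods, new_pods) after each step of a simulated rollout."""
    if max_unavailable == 0 and max_surge == 0:
        raise ValueError("maxUnavailable and maxSurge cannot both be 0")
    old, new = replicas, 0
    history = [(old, new)]
    min_ready = replicas - max_unavailable  # never fewer ready pods than this
    max_total = replicas + max_surge        # never more pods in total than this
    while old > 0 or new < replicas:
        # Start as many new pods as the surge budget allows.
        new += min(replicas - new, max_total - (old + new))
        # Stop old pods while keeping at least min_ready pods available.
        old -= min(old, old + new - min_ready)
        history.append((old, new))
    return history


history = simulate_rolling_update(replicas=4, max_unavailable=1, max_surge=1)
print(history)  # [(4, 0), (2, 1), (0, 3), (0, 4)]
```

With the values from the manifest above, the total never drops below 3 pods or rises above 5, and both versions serve traffic during the intermediate step.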

Canary Deployment

Canary deployment routes a small percentage of traffic to the new version while the majority continues to hit the old version. If metrics look healthy, the percentage is gradually increased until 100% of traffic reaches the new version.

# Nginx canary with weighted upstream
upstream app {
    server old-version.internal:8080 weight=90;
    server new-version.internal:8080 weight=10;  # 10% canary
}

# Gradual rollout stages:
# Stage 1: 5% to canary → monitor for 15 minutes
# Stage 2: 25% to canary → monitor for 30 minutes
# Stage 3: 50% to canary → monitor for 30 minutes
# Stage 4: 100% to canary → deployment complete

Advantages: Lowest risk — only a fraction of users are exposed to the new version. Real production traffic validation. Issues are detected before full rollout.

Disadvantages: Requires traffic-splitting infrastructure. Metrics and monitoring must be granular enough to detect issues in the canary pool. Slower overall deployment time.
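
The staged rollout can also be expressed in application or proxy logic. The Python sketch below is one possible approach, not what the Nginx weights above do: Nginx weights split per request, while this example hashes a user ID so each user consistently sees the same version. The function names and the 1% error-rate gate are assumptions chosen for illustration.

```python
import hashlib

STAGES = [5, 25, 50, 100]  # canary percentages from the rollout stages above


def routes_to_canary(user_id: str, canary_percent: int) -> bool:
    """Deterministically place a user in one of 100 buckets."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < canary_percent


def next_percent(current: int, canary_error_rate: float,
                 max_error_rate: float = 0.01) -> int:
    """Advance to the next stage if healthy, otherwise drop the canary to 0%."""
    if canary_error_rate > max_error_rate:
        return 0  # abort: send all traffic back to the old version
    later = [percent for percent in STAGES if percent > current]
    return later[0] if later else current


# Roughly 10% of synthetic users should land on the canary at a 10% setting.
share = sum(routes_to_canary(f"user-{i}", 10) for i in range(10_000)) / 10_000
print(share)
print(next_percent(5, canary_error_rate=0.001))   # healthy: promote to 25
print(next_percent(25, canary_error_rate=0.05))   # unhealthy: back to 0
```

Because the bucket depends only on the user ID, a user who is in the canary at 5% stays in it at 25% and beyond, which avoids flapping between versions mid-session.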

Database Migration Strategies

Database schema changes are the hardest part of zero-downtime deployment. A migration that locks a table for a long time, or removes a column the running version still reads, will break the running application. The solution is the expand-and-contract pattern:

Expand Phase

Add new columns, tables, or indexes without removing anything. The old application version continues to work because nothing it depends on has changed.

-- Step 1: Add new column with a default (an instant, metadata-only change in MySQL 8.0.12+ with InnoDB)
ALTER TABLE users ADD COLUMN email_verified TINYINT(1) DEFAULT 0;

-- Step 2: Deploy new application version that writes to both old and new columns
-- Step 3: Backfill existing data
UPDATE users SET email_verified = 1 WHERE verified_at IS NOT NULL;
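
On a large table, a single backfill `UPDATE` can hold locks for a long time. One common way around this is to backfill in primary-key batches with a commit after each. The Python sketch below illustrates the idea under loud assumptions: it uses an in-memory SQLite database as a stand-in for MySQL, the sample rows are made up, and `backfill` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, verified_at TEXT)")
# Made-up sample data: users with even IDs are verified.
conn.executemany(
    "INSERT INTO users (id, verified_at) VALUES (?, ?)",
    [(i, "2026-01-01" if i % 2 == 0 else None) for i in range(1, 11)],
)

# Expand: add the new column; existing rows receive the default.
conn.execute("ALTER TABLE users ADD COLUMN email_verified INTEGER DEFAULT 0")


def backfill(conn: sqlite3.Connection, batch_size: int) -> int:
    """Set email_verified from verified_at in primary-key batches."""
    last_id, updated = 0, 0
    while True:
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM users WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        )]
        if not ids:
            return updated
        cursor = conn.execute(
            "UPDATE users SET email_verified = 1 "
            "WHERE id > ? AND id <= ? AND verified_at IS NOT NULL",
            (last_id, ids[-1]),
        )
        conn.commit()  # short transactions keep lock time per batch small
        updated += cursor.rowcount
        last_id = ids[-1]


updated_rows = backfill(conn, batch_size=3)
print(updated_rows)  # 5
```

The batch size is a tuning knob: smaller batches hold locks for less time but take longer overall.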

Contract Phase

After the new application version is stable and all data has been migrated, remove the old columns or tables in a subsequent deployment.

-- Step 4: Deploy version that only reads from new column
-- Step 5: Remove old column (in a separate migration)
ALTER TABLE users DROP COLUMN verified_at;

Graceful Shutdown

Applications must handle in-flight requests during shutdown. A graceful shutdown sequence looks like this:

1. Receive SIGTERM signal
2. Stop accepting new connections
3. Complete all in-flight requests (with a timeout)
4. Close database connections and other resources
5. Exit with code 0

# PHP-FPM graceful shutdown
; Wait up to 30s for workers to finish after a graceful stop (SIGQUIT)
process_control_timeout = 30s

# Nginx graceful shutdown
nginx -s quit  # Waits for active connections to complete
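
For an application that manages its own worker loop, the shutdown sequence above might look like the following Python sketch. `GracefulShutdown` is a hypothetical helper, not part of any framework. The example calls the handler directly instead of sending a real signal so that it stays self-contained.

```python
import signal
import threading


class GracefulShutdown:
    """Tracks in-flight requests and refuses new ones once SIGTERM arrives."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._in_flight = 0
        self.shutting_down = False

    def install(self) -> None:
        # Step 1: register for SIGTERM (must be called from the main thread).
        signal.signal(signal.SIGTERM, self.handle_signal)

    def handle_signal(self, signum, frame) -> None:
        # Step 2: only set a flag here; signal handlers should stay minimal.
        self.shutting_down = True

    def try_begin(self) -> bool:
        """Return False if the request must be rejected because of shutdown."""
        with self._cond:
            if self.shutting_down:
                return False
            self._in_flight += 1
            return True

    def end(self) -> None:
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def drain(self, timeout: float) -> bool:
        """Step 3: wait for in-flight requests; False means the timeout expired."""
        with self._cond:
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout=timeout)


shutdown = GracefulShutdown()
accepted_before = shutdown.try_begin()        # a request arrives and is accepted
shutdown.handle_signal(signal.SIGTERM, None)  # simulate SIGTERM delivery
accepted_after = shutdown.try_begin()         # new work is now refused
shutdown.end()                                # the in-flight request completes
drained = shutdown.drain(timeout=30.0)
print(accepted_before, accepted_after, drained)  # True False True
```

After `drain` returns, the process would close database connections and other resources (step 4) and exit with code 0 (step 5).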

Health Checks and Readiness

Health checks are the glue that holds zero-downtime deployment together. The load balancer must know when a new instance is ready to receive traffic and when an old instance should stop receiving it.

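// PHP health check endpoint
// GET /health
// checkDatabaseConnection() and checkRedisConnection() are application-defined
// helpers that return true on success and false on failure.
$checks = [
    'database' => checkDatabaseConnection(),
    'redis' => checkRedisConnection(),
    // Require at least 100 MB free; disk_free_space() returns false on failure,
    // which also fails this check
    'disk' => disk_free_space('/') > 100 * 1024 * 1024,
];

// Healthy only if no check returned false
$healthy = !in_array(false, $checks, true);
http_response_code($healthy ? 200 : 503);
header('Content-Type: application/json');
echo json_encode(['status' => $healthy ? 'ok' : 'unhealthy', 'checks' => $checks]);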
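
The same aggregation logic, written as a small Python sketch for services outside PHP. The names are hypothetical, and the individual checks are stand-ins rather than real database or Redis probes. One design choice worth noting: a check that raises an exception is treated as a failure, so a crashing dependency check yields a 503 instead of an unhandled error.

```python
import json
from typing import Callable, Dict, Tuple


def run_health_checks(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, str]:
    """Run each check and return (HTTP status code, JSON body)."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False  # a crashing check counts as unhealthy
    healthy = all(results.values())
    body = json.dumps(
        {"status": "ok" if healthy else "unhealthy", "checks": results}
    )
    return (200 if healthy else 503), body


def redis_unreachable() -> bool:
    # Stand-in for a real probe that fails to connect.
    raise ConnectionError("redis unreachable")


status, body = run_health_checks(
    {"database": lambda: True, "redis": redis_unreachable}
)
print(status, body)
```

A load balancer or readiness probe only needs the status code; the JSON body is there to help a human see which dependency failed.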

Deployment Checklist

Monitoring During Deployment

Key metrics to watch during and after deployment:

| Metric | Normal Range | Alert Threshold |
|---|---|---|
| Error rate (5xx) | < 0.1% | > 1% |
| P95 latency | < 500ms | > 2x baseline |
| Request rate | Stable | > 20% drop |
| CPU usage | < 70% | > 90% |
| Memory usage | < 80% | > 95% |
| Active connections | Stable | Sudden spike or drop |
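
The alert thresholds in the table can be turned into an automated gate that compares post-deployment metrics to a pre-deployment baseline. The Python sketch below is one way to do it. `Snapshot` and `fired_alerts` are hypothetical names, the sample numbers are invented, and active connections are left out because the table gives no numeric threshold for them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snapshot:
    error_rate: float      # fraction of responses that are 5xx
    p95_latency_ms: float
    request_rate: float    # requests per second
    cpu: float             # fraction of CPU in use
    memory: float          # fraction of memory in use


def fired_alerts(baseline: Snapshot, current: Snapshot) -> List[str]:
    """Return the names of metrics that crossed their alert threshold."""
    fired = []
    if current.error_rate > 0.01:                            # > 1%
        fired.append("error_rate")
    if current.p95_latency_ms > 2 * baseline.p95_latency_ms:  # > 2x baseline
        fired.append("p95_latency")
    if current.request_rate < 0.8 * baseline.request_rate:    # > 20% drop
        fired.append("request_rate")
    if current.cpu > 0.90:                                   # > 90%
        fired.append("cpu")
    if current.memory > 0.95:                                # > 95%
        fired.append("memory")
    return fired


baseline = Snapshot(0.0005, 200.0, 1000.0, 0.50, 0.60)
after_deploy = Snapshot(0.02, 450.0, 700.0, 0.60, 0.70)
print(fired_alerts(baseline, after_deploy))
# ['error_rate', 'p95_latency', 'request_rate']
```

A non-empty result would be a reasonable trigger for pausing a rollout or starting a rollback.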

Summary

Zero-downtime deployment is achievable with any infrastructure stack through a combination of strategies: blue-green for instant switchover, rolling for gradual replacement, and canary for risk-minimized validation. The keys to success are backward-compatible database migrations, graceful shutdown handling, robust health checks, and comprehensive monitoring. Start with the simplest strategy that meets your needs and evolve as your application and team grow.
