API Rate Limiting: Why and How to Implement
Why Rate Limiting Matters
Rate Limiting is a mechanism that restricts the number of API requests allowed within a given time period. Without it, your API is vulnerable to several problems:
- DDoS and brute-force attacks — an attacker can send thousands of requests per second
- Scraping and crawling — bots can mass-extract your data
- Accidental overload — a bug in client code can trigger a request avalanche
- Unfair usage — one client can consume all server resources
Rate Limiting protects the server, ensures fair resource distribution, and is standard practice for any public API.
Rate Limiting Algorithms
Fixed Window
The simplest algorithm: the request counter resets at fixed time intervals (e.g., every minute). A client can make N requests in the current window.
Problem: at window boundaries, a client can send 2N requests in a short period — N at the end of one window and N at the beginning of the next.
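The counter-reset behavior described above can be sketched in a few lines. This is an illustrative in-memory version (the function name and the `_counters` dict are ours, not a standard API); production code would use a shared store such as Redis:

```python
import time

# (window index, key) -> count; illustrative in-memory store
_counters = {}

def fixed_window_allow(key: str, limit: int, window_secs: int, now: float = None) -> bool:
    """Allow the request if `key` has made fewer than `limit`
    requests in the current fixed window."""
    now = time.time() if now is None else now
    window = int(now // window_secs)        # which window are we in?
    count = _counters.get((key, window), 0)
    if count >= limit:
        return False
    _counters[(key, window)] = count + 1
    return True
```

Because the window index changes abruptly, a client that exhausts its limit at the end of one window gets a fresh limit the moment the next window starts, which is exactly the 2N boundary problem noted above.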
Sliding Window
A refinement of the fixed window: the previous window's request count is weighted by how much of that window still overlaps the rolling period. More accurate than fixed window with minimal overhead.
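The weighted count can be sketched as follows (an illustrative in-memory version; the names are ours). The previous window's count decays linearly as the current window progresses:

```python
_windows = {}  # key -> {window index: count}; illustrative in-memory store

def sliding_window_allow(key: str, limit: int, window_secs: int, now: float) -> bool:
    """Approximate the rolling-window count by weighting the previous
    fixed window by how much of it still overlaps the rolling period."""
    idx = int(now // window_secs)
    counts = _windows.setdefault(key, {})
    elapsed = (now % window_secs) / window_secs   # fraction of current window used
    prev = counts.get(idx - 1, 0)
    curr = counts.get(idx, 0)
    estimated = prev * (1 - elapsed) + curr
    if estimated >= limit:
        return False
    counts[idx] = curr + 1
    return True
```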
Token Bucket
A bucket fills with tokens at a fixed rate. Each request consumes one token. If the bucket is empty, the request is rejected. Allows short traffic bursts if the bucket has accumulated tokens.
Parameters: bucket size (burst capacity) and refill rate.
Leaky Bucket
Requests are processed at a fixed rate regardless of arrival speed. Incoming requests enter a queue. If the queue is full, they're rejected. Provides the most stable server load.
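The queue-plus-fixed-drain behavior can be sketched like this (an illustrative model, not a production implementation; in reality the drain would run in a worker that actually processes the dequeued requests):

```python
from collections import deque

class LeakyBucket:
    """Leaky bucket modeled as a bounded queue drained at a fixed rate."""
    def __init__(self, queue_size: int, leak_rate: float):
        self.queue = deque()
        self.queue_size = queue_size
        self.leak_rate = leak_rate      # requests processed per second
        self.last_leak = 0.0

    def offer(self, request, now: float) -> bool:
        """Try to enqueue a request; reject it if the queue is full."""
        self._leak(now)
        if len(self.queue) >= self.queue_size:
            return False
        self.queue.append(request)
        return True

    def _leak(self, now: float):
        # Dequeue as many requests as the elapsed time allows
        drained = int((now - self.last_leak) * self.leak_rate)
        if drained > 0:
            for _ in range(min(drained, len(self.queue))):
                self.queue.popleft()
            self.last_leak = now
```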
Sliding Window Log
Stores the timestamp of every request. Counts requests within the last N seconds. Most accurate but most memory-intensive.
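A sketch makes the memory cost visible: every accepted request leaves a timestamp in the log, so memory grows with the limit per key (names here are illustrative):

```python
from collections import deque

class SlidingWindowLog:
    """Exact limiter: keep a timestamp per request, count those in the window."""
    def __init__(self, limit: int, window_secs: float):
        self.limit = limit
        self.window = window_secs
        self.log = deque()   # timestamps of accepted requests

    def allow(self, now: float) -> bool:
        # Evict timestamps that have aged out of the window
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) >= self.limit:
            return False
        self.log.append(now)
        return True
```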
Rate Limiting Headers
Widely used headers inform clients about limits. The 429 status and Retry-After are standardized in RFC 6585; the X-RateLimit-* family is a de facto convention, with a standardized successor proposed in draft-ietf-httpapi-ratelimit-headers:
X-RateLimit-Limit: 100        # Max requests per window
X-RateLimit-Remaining: 42     # Requests remaining
X-RateLimit-Reset: 1640995200 # Reset time (Unix timestamp)
Retry-After: 30               # Seconds until retry (on 429)
When limits are exceeded, return HTTP 429 Too Many Requests with the Retry-After header.
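On the client side, a well-behaved consumer should honor that response. A minimal helper might look like this (the function name and fallback value are ours, chosen for illustration):

```python
from typing import Optional

def retry_delay(status: int, headers: dict, default: float = 1.0) -> Optional[float]:
    """Return seconds to wait before retrying, or None if no retry is needed.
    Follows the header conventions described above."""
    if status != 429:
        return None
    retry_after = headers.get("Retry-After")
    if retry_after is not None and retry_after.isdigit():
        return float(retry_after)
    return default   # no usable header: fall back to a default backoff
```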
Limiting Strategies
By IP Address
The simplest strategy. Good for bot protection but doesn't distinguish users behind NAT — dozens of legitimate users may share one IP.
By API Key or Token
More precise client identification. Allows different limits for different pricing tiers. Standard for commercial APIs.
By User
Limits tied to an authenticated user. Works for SaaS applications with user accounts.
By Endpoint
Different limits for different API endpoints. A search endpoint might have 10 requests/min while a profile endpoint gets 100 requests/min.
Implementation at Different Levels
Web Server Level (nginx)
http {
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
server {
location /api/ {
limit_req zone=api burst=20 nodelay;
limit_req_status 429;
}
}
}
Application Level (PHP with Redis)
function checkRateLimit(string $key, int $limit, int $window): bool {
    $redis = getRedis();
    // Fixed-window counter: atomically increment the per-key counter
    $current = $redis->incr($key);
    // First request in the window starts the TTL
    if ($current === 1) {
        $redis->expire($key, $window);
    }
    return $current <= $limit;
}
API Gateway Level
Cloud API Gateways (AWS, Google Cloud, Azure) provide built-in rate limiting with flexible configuration. This is often the most convenient choice for microservice architectures.
Best Practices
- Always return limit headers — clients should know about limits before exceeding them
- Use 429 status — not 403 or 503. HTTP 429 is specifically designed for rate limiting
- Document limits — specify limits for each endpoint in your API documentation
- Allow bursts — Token Bucket permits brief spikes for legitimate usage
- Separate limits by criticality — authentication endpoints should have stricter limits
- Monitor triggers — track 429 response counts to detect attacks or overly strict limits
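The first three practices above can be combined in one small response builder. This is a framework-agnostic sketch (the function signature is ours): limit headers go on every reply, and Retry-After is added only on 429:

```python
def rate_limit_response(allowed: bool, limit: int, remaining: int,
                        reset_at: int, retry_after: int = 30):
    """Build (status, headers): limit headers on every response,
    429 plus Retry-After when the limit is exceeded."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(reset_at),
    }
    if allowed:
        return 200, headers
    headers["Retry-After"] = str(retry_after)
    return 429, headers
```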
Testing Rate Limiting
Verify your rate limiting with the Enterno.io HTTP Checker — send several consecutive requests to your API and confirm that limit headers are present in responses and 429 is returned when exceeded.
Summary
Rate Limiting is a mandatory component of any API. Choose an algorithm that fits your needs (Token Bucket for flexibility, Sliding Window for accuracy), implement it at the appropriate level (nginx, application, or API Gateway), and always inform clients about limits through standard headers.