
API Rate Limiting: Why and How to Implement

Why Rate Limiting Matters

Rate limiting is a mechanism that restricts the number of API requests a client can make within a given time period. Without it, your API is vulnerable to several problems: denial of service from request floods, resource exhaustion by a single aggressive client, and degraded service for everyone else.

Rate Limiting protects the server, ensures fair resource distribution, and is standard practice for any public API.

Rate Limiting Algorithms

Fixed Window

The simplest algorithm: the request counter resets at fixed time intervals (e.g., every minute). A client can make N requests in the current window.

Problem: at window boundaries, a client can send 2N requests in a short period — N at the end of one window and N at the beginning of the next.

Sliding Window

A combination of fixed window and weighted counting. It considers requests from the previous window proportionally to the remaining time. More accurate than fixed window with minimal overhead.

Token Bucket

A bucket fills with tokens at a fixed rate. Each request consumes one token. If the bucket is empty, the request is rejected. Allows short traffic bursts if the bucket has accumulated tokens.

Parameters: bucket size (burst capacity) and refill rate.

Leaky Bucket

Requests are processed at a fixed rate regardless of arrival speed. Incoming requests enter a queue. If the queue is full, they're rejected. Provides the most stable server load.
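The admission side of a leaky bucket can be modeled as a queue level that drains at a constant rate. This sketch (class and method names are my own) only decides accept/reject; a real implementation would also hold queued requests and process them at the drain rate.

```python
class LeakyBucket:
    """Leaky bucket admission: the queue drains at `drain_rate` requests/sec;
    requests arriving at a full queue are rejected."""

    def __init__(self, queue_size: int, drain_rate: float):
        self.queue_size = queue_size
        self.drain_rate = drain_rate
        self.level = 0.0  # current queue occupancy
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # The queue drains continuously, regardless of arrival speed.
        self.level = max(0.0, self.level - (now - self.last) * self.drain_rate)
        self.last = now
        if self.level + 1 > self.queue_size:
            return False  # queue full, reject
        self.level += 1
        return True
```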

Sliding Window Log

Stores the timestamp of every request. Counts requests within the last N seconds. Most accurate but most memory-intensive.
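A minimal sketch makes the memory cost obvious: one stored timestamp per request in the window (illustrative implementation; names are my own).

```python
from collections import deque

class SlidingWindowLog:
    """Sliding-window log: stores every request timestamp and counts those
    within the last `window` seconds. Exact, but O(limit) memory per client."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.log = deque()  # timestamps of accepted requests, oldest first

    def allow(self, now: float) -> bool:
        # Evict timestamps that have aged out of the window.
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) >= self.limit:
            return False
        self.log.append(now)
        return True
```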

Rate Limiting Headers

Rate-limit headers inform clients about limits. The X-RateLimit-* family is a widely used de facto convention; HTTP 429 itself is defined in RFC 6585, and standardized RateLimit headers are being specified in draft-ietf-httpapi-ratelimit-headers:

X-RateLimit-Limit: 100        # Max requests per window
X-RateLimit-Remaining: 42     # Requests remaining
X-RateLimit-Reset: 1640995200 # Reset time (Unix timestamp)
Retry-After: 30               # Seconds until retry (on 429)

When limits are exceeded, return HTTP 429 Too Many Requests with the Retry-After header.

Limiting Strategies

By IP Address

The simplest strategy. Good for bot protection but doesn't distinguish users behind NAT — dozens of legitimate users may share one IP.

By API Key or Token

More precise client identification. Allows different limits for different pricing tiers. Standard for commercial APIs.

By User

Limits tied to an authenticated user. Works for SaaS applications with user accounts.

By Endpoint

Different limits for different API endpoints. A search endpoint might have 10 requests/min while a profile endpoint gets 100 requests/min.
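Per-endpoint configuration often reduces to a simple lookup table. A minimal sketch (the endpoint paths and limit values below are illustrative, not from any real API):

```python
# Per-endpoint limits as (requests, window_seconds); paths are hypothetical.
ENDPOINT_LIMITS = {
    "/api/search":  (10, 60),   # expensive endpoint, stricter limit
    "/api/profile": (100, 60),  # cheap endpoint, looser limit
}
DEFAULT_LIMIT = (60, 60)

def limit_for(path: str) -> tuple[int, int]:
    """Look up the rate limit for an endpoint, falling back to a default."""
    return ENDPOINT_LIMITS.get(path, DEFAULT_LIMIT)
```

Each returned pair would then parameterize whichever algorithm you chose above.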

Implementation at Different Levels

Web Server Level (nginx)

http {
    limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

    server {
        location /api/ {
            limit_req zone=api burst=20 nodelay;
            limit_req_status 429;
        }
    }
}

Application Level (PHP with Redis)

// Fixed-window limiter: one Redis counter per client key per window.
function checkRateLimit(string $key, int $limit, int $window): bool {
    $redis = getRedis();
    $current = $redis->incr($key);
    if ($current === 1) {
        // First request in this window: start the expiry timer.
        // Caveat: if the process dies between INCR and EXPIRE, the key never
        // expires; an atomic Lua script avoids this race.
        $redis->expire($key, $window);
    }
    return $current <= $limit;
}

API Gateway Level

Cloud API gateways (AWS, Google Cloud, Azure) provide built-in rate limiting with flexible configuration. For microservice architectures this is often the most practical level to enforce limits, since the policy stays out of individual services.

Best Practices

  1. Always return limit headers — clients should know about limits before exceeding them
  2. Use 429 status — not 403 or 503. HTTP 429 is specifically designed for rate limiting
  3. Document limits — specify limits for each endpoint in your API documentation
  4. Allow bursts — Token Bucket permits brief spikes for legitimate usage
  5. Separate limits by criticality — authentication endpoints should have stricter limits
  6. Monitor triggers — track 429 response counts to detect attacks or overly strict limits

Testing Rate Limiting

Verify your rate limiting with the Enterno.io HTTP Checker — send several consecutive requests to your API and confirm that limit headers are present in responses and 429 is returned when exceeded.

Summary

Rate Limiting is a mandatory component of any API. Choose an algorithm that fits your needs (Token Bucket for flexibility, Sliding Window for accuracy), implement it at the appropriate level (nginx, application, or API Gateway), and always inform clients about limits through standard headers.
