API Rate Limiting: Why and How to Implement
Why Rate Limiting Matters
Rate Limiting is a mechanism that restricts the number of API requests allowed within a given time period. Without it, your API is vulnerable to several problems:
- DDoS and brute-force attacks — an attacker can send thousands of requests per second
- Scraping and crawling — bots can mass-extract your data
- Accidental overload — a bug in client code can trigger a request avalanche
- Unfair usage — one client can consume all server resources
Rate Limiting protects the server, ensures fair resource distribution, and is standard practice for any public API.
Rate Limiting Algorithms
Fixed Window
The simplest algorithm: the request counter resets at fixed time intervals (e.g., every minute). A client can make N requests in the current window.
Problem: at window boundaries, a client can send 2N requests in a short period — N at the end of one window and N at the beginning of the next.
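The counter-reset behavior described above can be sketched in a few lines. This is an illustrative in-memory version (the function name and the `_counters` dict are ours, not a standard API); production code would use a shared store such as Redis:

```python
import time

# (window index, key) -> count; illustrative in-memory store
_counters = {}

def fixed_window_allow(key: str, limit: int, window_secs: int, now: float = None) -> bool:
    """Allow the request if `key` has made fewer than `limit`
    requests in the current fixed window."""
    now = time.time() if now is None else now
    window = int(now // window_secs)        # which window are we in?
    count = _counters.get((key, window), 0)
    if count >= limit:
        return False
    _counters[(key, window)] = count + 1
    return True
```

Because the window index changes abruptly, a client that exhausts its limit at the end of one window gets a fresh limit the moment the next window starts, which is exactly the 2N boundary problem noted above.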
Sliding Window
A refinement of the fixed window: the previous window's request count is weighted by how much of that window still overlaps the rolling period. More accurate than fixed window with minimal overhead.
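The weighted count can be sketched as follows (an illustrative in-memory version; the names are ours). The previous window's count decays linearly as the current window progresses:

```python
_windows = {}  # key -> {window index: count}; illustrative in-memory store

def sliding_window_allow(key: str, limit: int, window_secs: int, now: float) -> bool:
    """Approximate the rolling-window count by weighting the previous
    fixed window by how much of it still overlaps the rolling period."""
    idx = int(now // window_secs)
    counts = _windows.setdefault(key, {})
    elapsed = (now % window_secs) / window_secs   # fraction of current window used
    prev = counts.get(idx - 1, 0)
    curr = counts.get(idx, 0)
    estimated = prev * (1 - elapsed) + curr
    if estimated >= limit:
        return False
    counts[idx] = curr + 1
    return True
```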
Token Bucket
A bucket fills with tokens at a fixed rate. Each request consumes one token. If the bucket is empty, the request is rejected. Allows short traffic bursts if the bucket has accumulated tokens.
Parameters: bucket size (burst capacity) and refill rate.
Leaky Bucket
Requests are processed at a fixed rate regardless of arrival speed. Incoming requests enter a queue. If the queue is full, they're rejected. Provides the most stable server load.
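The queue-plus-fixed-drain behavior can be sketched like this (an illustrative model, not a production implementation; in reality the drain would run in a worker that actually processes the dequeued requests):

```python
from collections import deque

class LeakyBucket:
    """Leaky bucket modeled as a bounded queue drained at a fixed rate."""
    def __init__(self, queue_size: int, leak_rate: float):
        self.queue = deque()
        self.queue_size = queue_size
        self.leak_rate = leak_rate      # requests processed per second
        self.last_leak = 0.0

    def offer(self, request, now: float) -> bool:
        """Try to enqueue a request; reject it if the queue is full."""
        self._leak(now)
        if len(self.queue) >= self.queue_size:
            return False
        self.queue.append(request)
        return True

    def _leak(self, now: float):
        # Dequeue as many requests as the elapsed time allows
        drained = int((now - self.last_leak) * self.leak_rate)
        if drained > 0:
            for _ in range(min(drained, len(self.queue))):
                self.queue.popleft()
            self.last_leak = now
```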
Sliding Window Log
Stores the timestamp of every request. Counts requests within the last N seconds. Most accurate but most memory-intensive.
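A sketch makes the memory cost visible: every accepted request leaves a timestamp in the log, so memory grows with the limit per key (names here are illustrative):

```python
from collections import deque

class SlidingWindowLog:
    """Exact limiter: keep a timestamp per request, count those in the window."""
    def __init__(self, limit: int, window_secs: float):
        self.limit = limit
        self.window = window_secs
        self.log = deque()   # timestamps of accepted requests

    def allow(self, now: float) -> bool:
        # Evict timestamps that have aged out of the window
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) >= self.limit:
            return False
        self.log.append(now)
        return True
```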
Rate Limiting Headers
Widely used headers inform clients about limits. The 429 status and Retry-After are standardized in RFC 6585; the X-RateLimit-* family is a de facto convention, with a standardized successor proposed in draft-ietf-httpapi-ratelimit-headers:
X-RateLimit-Limit: 100        # Max requests per window
X-RateLimit-Remaining: 42     # Requests remaining
X-RateLimit-Reset: 1640995200 # Reset time (Unix timestamp)
Retry-After: 30               # Seconds until retry (on 429)
When limits are exceeded, return HTTP 429 Too Many Requests with the Retry-After header.
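On the client side, a well-behaved consumer should honor that response. A minimal helper might look like this (the function name and fallback value are ours, chosen for illustration):

```python
from typing import Optional

def retry_delay(status: int, headers: dict, default: float = 1.0) -> Optional[float]:
    """Return seconds to wait before retrying, or None if no retry is needed.
    Follows the header conventions described above."""
    if status != 429:
        return None
    retry_after = headers.get("Retry-After")
    if retry_after is not None and retry_after.isdigit():
        return float(retry_after)
    return default   # no usable header: fall back to a default backoff
```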
Limiting Strategies
By IP Address
The simplest strategy. Good for bot protection but doesn't distinguish users behind NAT — dozens of legitimate users may share one IP.
By API Key or Token
More precise client identification. Allows different limits for different pricing tiers. Standard for commercial APIs.
By User
Limits tied to an authenticated user. Works for SaaS applications with user accounts.
By Endpoint
Different limits for different API endpoints. A search endpoint might have 10 requests/min while a profile endpoint gets 100 requests/min.
Implementation at Different Levels
Web Server Level (nginx)
http {
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
server {
location /api/ {
limit_req zone=api burst=20 nodelay;
limit_req_status 429;
}
}
}
Application Level (PHP with Redis)
function checkRateLimit(string $key, int $limit, int $window): bool {
    $redis = getRedis();
    // Fixed-window counter: atomically increment the per-key counter
    $current = $redis->incr($key);
    // First request in the window starts the TTL
    if ($current === 1) {
        $redis->expire($key, $window);
    }
    return $current <= $limit;
}
API Gateway Level
Cloud API Gateways (AWS, Google Cloud, Azure) provide built-in rate limiting with flexible configuration. This is often the most convenient choice for microservice architectures.
Best Practices
- Always return limit headers — clients should know about limits before exceeding them
- Use 429 status — not 403 or 503. HTTP 429 is specifically designed for rate limiting
- Document limits — specify limits for each endpoint in your API documentation
- Allow bursts — Token Bucket permits brief spikes for legitimate usage
- Separate limits by criticality — authentication endpoints should have stricter limits
- Monitor triggers — track 429 response counts to detect attacks or overly strict limits
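The first three practices above can be combined in one small response builder. This is a framework-agnostic sketch (the function signature is ours): limit headers go on every reply, and Retry-After is added only on 429:

```python
def rate_limit_response(allowed: bool, limit: int, remaining: int,
                        reset_at: int, retry_after: int = 30):
    """Build (status, headers): limit headers on every response,
    429 plus Retry-After when the limit is exceeded."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(reset_at),
    }
    if allowed:
        return 200, headers
    headers["Retry-After"] = str(retry_after)
    return 429, headers
```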
Testing Rate Limiting
Verify your rate limiting with the Enterno.io HTTP Checker — send several consecutive requests to your API and confirm that limit headers are present in responses and 429 is returned when exceeded.
Summary
Rate Limiting is a mandatory component of any API. Choose an algorithm that fits your needs (Token Bucket for flexibility, Sliding Window for accuracy), implement it at the appropriate level (nginx, application, or API Gateway), and always inform clients about limits through standard headers.