Bloom filter: a probabilistic data structure

Igor Verentsov

What is a Bloom filter

Author: Igor Verentsov · Updated May 23, 2026

In short:

A Bloom filter is a probabilistic data structure that answers a membership query with either "the element is probably in the set" or "the element is definitely not in the set". It uses a bit array and several hash functions. It is memory-efficient (about 10 bits per element at a 1% false positive rate, so roughly 1.2 MB for a million items, regardless of item size), but false positives are possible; false negatives are not. Used in: Cassandra (skipping disk reads for missing keys), CDNs (deduplication), database indexes, Chrome URL safety checks.

Below: details, an example, related terms, FAQ.

Details

The false positive rate (FPR) is tunable (typically 1-5%)
No false negatives: "not in set" means definitely not in the set
Elements cannot be removed (a counting Bloom filter supports removal)
Redis: BF.ADD, BF.EXISTS (RedisBloom module)
Typical size: about 10 bits per element for a 1% FPR

Example

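// Redis with the RedisBloom module (redis-cli session)
// BF.RESERVE arguments: key, target error rate (0.01 = 1%), expected capacity
> BF.RESERVE users 0.01 100000
> BF.ADD users alice
> BF.EXISTS users alice    → 1 (probably present)
> BF.EXISTS users bob      → 0 (definitely not present)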

Related terms

How Bloom filters work

A Bloom filter is a probabilistic data structure used to quickly check whether an element might be in a set. To add an element, the filter hashes it with several hash functions and sets the corresponding bits in a bit array. To query an element, it hashes the element the same way and checks those bits. If all of them are set, the element might be in the set, with some chance of a false positive (reporting presence for an element that was never added). If any bit is not set, the element is definitely not in the set.
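The add and query steps described above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the class name `BloomFilter`, the method names, and the choice of deriving all bit positions from a single SHA-256 digest (double hashing) are assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch: a bit array plus k derived hash positions."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # all bits start at 0

    def _positions(self, item: str):
        # Assumption: derive k positions from one SHA-256 digest via double
        # hashing (h1 + i * h2). Real systems usually pick a faster hash.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd, never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        # Set every bit the item hashes to.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True means "probably present"; False means "definitely not present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


bf = BloomFilter(num_bits=1_000, num_hashes=7)
bf.add("alice")
print(bf.might_contain("alice"))  # an added item always finds all its bits set
print(bf.might_contain("bob"))    # very likely False with a single stored item
```

Bits are only ever set, never cleared, which is why an item that was added can never be reported as absent.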

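False positives occur because different elements can map to the same bits. As more elements are added, more bits are set, and it becomes more likely that every bit for an element that was never added has already been set by other elements. There are no false negatives, however: bits are never cleared, so an element that was added always finds all of its bits set.

To reduce the chance of false positives, increase the size of the bit array, which costs memory. The number of hash functions does not affect memory, but it has an optimum of about (m/n) ln 2 for m bits and n elements (roughly 7 at 10 bits per element). Using more hash functions than that sets too many bits and raises the false positive rate, and every extra hash function adds work to each operation.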
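How the number of hash functions affects the false positive rate can be checked numerically. The sketch below uses the standard approximation p ≈ (1 − e^(−kn/m))^k, where m is the number of bits, n the number of elements, and k the number of hash functions; the function names are hypothetical and chosen only for this example.

```python
import math


def false_positive_rate(bits_per_element: float, num_hashes: int) -> float:
    """Standard approximation p = (1 - e^(-k / (m/n)))^k."""
    return (1.0 - math.exp(-num_hashes / bits_per_element)) ** num_hashes


def optimal_num_hashes(bits_per_element: float) -> int:
    """Optimum k = (m/n) * ln 2, rounded to a whole number of hash functions."""
    return max(1, round(bits_per_element * math.log(2)))


# Fixed memory budget of 10 bits per element, varying k.
for k in (1, 4, 7, 14, 28):
    print(f"k={k:2d}  approx. FPR={false_positive_rate(10, k):.4f}")

print("optimal k for 10 bits/element:", optimal_num_hashes(10))
```

At a fixed 10 bits per element the approximate rate bottoms out near k = 7 (a little under 1%) and climbs again for larger k, so adding hash functions past the optimum makes the filter worse, not better.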

Advantages and disadvantages of Bloom filters

Bloom filters have several advantages, including:

They are memory-efficient: about 10 bits per element at a 1% false positive rate, regardless of element size (roughly 1.2 MB for a million items).
They provide a fast way to check if an element is in a set, without needing to scan the entire set.
They are useful in situations where a small chance of false positives is acceptable.

However, Bloom filters also have disadvantages, including:

They cannot definitively say that an element is in the set: a positive answer always carries a small chance of being a false positive.
They require careful tuning of the bit array size and number of hash functions to balance memory usage and false positive rate.
A standard Bloom filter does not support removing elements; variants such as counting Bloom filters do, at the cost of more memory.
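Removal becomes possible if each bit is replaced by a small counter, which is the idea behind a counting Bloom filter. The sketch below is one way to illustrate it; the class name, the unbounded Python integers used as counters, and the SHA-256 double-hashing scheme are assumptions for the example (real implementations typically use fixed-width counters of a few bits).

```python
import hashlib


class CountingBloomFilter:
    """Sketch of a counting Bloom filter: counters instead of single bits."""

    def __init__(self, num_counters: int, num_hashes: int) -> None:
        self.num_counters = num_counters
        self.num_hashes = num_hashes
        self.counters = [0] * num_counters

    def _positions(self, item: str) -> set:
        # Assumption: positions come from one SHA-256 digest via double hashing.
        # A set is used so each counter is touched at most once per item.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return {(h1 + i * h2) % self.num_counters for i in range(self.num_hashes)}

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.counters[pos] += 1

    def remove(self, item: str) -> bool:
        # Callers must only remove items they actually added: removing a
        # never-added item that happens to test positive would cause false
        # negatives for other items.
        positions = self._positions(item)
        if any(self.counters[pos] == 0 for pos in positions):
            return False  # definitely not present, nothing to remove
        for pos in positions:
            self.counters[pos] -= 1
        return True

    def might_contain(self, item: str) -> bool:
        return all(self.counters[pos] > 0 for pos in self._positions(item))


cbf = CountingBloomFilter(num_counters=1_000, num_hashes=5)
cbf.add("alice")
cbf.add("bob")
cbf.remove("alice")
print(cbf.might_contain("bob"))    # bob's counters are still positive
print(cbf.might_contain("alice"))  # very likely False after removal
```

The price is memory: each position now needs several bits instead of one.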

Use cases for Bloom filters

Bloom filters are used in a variety of applications, including:

Cassandra: to skip disk reads for missing keys, improving performance.
CDNs: to deduplicate content, reducing bandwidth usage.
Database indices: to quickly check if a key exists in an index, without needing to scan the entire index.
Chrome URL safety checks: to quickly check if a URL is known to be unsafe, without needing to check a full blacklist.
Network traffic analysis: to quickly identify known malicious IP addresses or domains, without needing to check a full list.

In each of these cases, the ability to quickly check for the presence of an element is more important than the small chance of a false positive.

Frequently asked questions

When should you use a Bloom filter?

As a pre-filter in front of expensive operations: check the filter before a database query or a CDN cache miss. If more than 95% of lookups are for elements that are not in the set, this saves a large amount of load.
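The pre-filter pattern can be sketched as a thin wrapper around a slow store. Everything named here is hypothetical: `PrefilteredStore`, the dict standing in for a database, the `backend_reads` counter, and the fixed filter parameters are illustration-only assumptions.

```python
import hashlib

NUM_BITS = 8_192   # assumed filter size for this sketch
NUM_HASHES = 7     # assumed number of hash functions


def positions(key: str) -> list:
    # Assumption: k positions derived from one SHA-256 digest (double hashing).
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    h1 = int.from_bytes(digest[:8], "big")
    h2 = int.from_bytes(digest[8:16], "big") | 1
    return [(h1 + i * h2) % NUM_BITS for i in range(NUM_HASHES)]


class PrefilteredStore:
    """Consults a Bloom filter before touching the (expensive) backing store."""

    def __init__(self, backing: dict) -> None:
        self.backing = backing   # stand-in for a database or origin server
        self.bits = 0            # Python int used as the bit array
        self.backend_reads = 0   # counts how often the slow path is taken
        for key in backing:
            for pos in positions(key):
                self.bits |= 1 << pos

    def get(self, key: str):
        if any(not (self.bits >> pos) & 1 for pos in positions(key)):
            return None          # definitely absent: skip the expensive read
        self.backend_reads += 1  # probably present: pay for the real lookup
        return self.backing.get(key)  # still None if it was a false positive


store = PrefilteredStore({"alice": 1, "bob": 2})
print(store.get("alice"))
for i in range(1_000):
    store.get(f"missing-{i}")
print("backend reads:", store.backend_reads)
```

A false positive only costs one unnecessary backend read; the answer returned to the caller is still correct because the backing store has the final word.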

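How do you choose the false positive rate?

It is a tradeoff: a lower false positive rate means a larger filter. A 1% rate takes about 10 bits per element; 0.1% takes about 14 bits per element. Choose based on the downstream cost of a false positive.

Is HyperLogLog an alternative?

HyperLogLog (HLL) estimates the number of unique elements (cardinality); it does not test membership. The two solve different problems.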
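The bits-per-element figures follow from the standard sizing formula for an optimally tuned filter, m/n = −ln(p) / (ln 2)². A small sketch, with a hypothetical helper name, that turns a target rate into a memory estimate:

```python
import math


def bits_per_element(target_fpr: float) -> float:
    """m/n = -ln(p) / (ln 2)^2, assuming the optimal number of hash functions."""
    return -math.log(target_fpr) / (math.log(2) ** 2)


for p in (0.05, 0.01, 0.001):
    bpe = bits_per_element(p)
    megabytes = bpe * 1_000_000 / 8 / 1_000_000  # for one million elements
    print(f"FPR {p:.1%}: {bpe:.1f} bits/element, "
          f"{megabytes:.2f} MB per million elements")
```

The formula gives about 9.6 bits per element for 1% and about 14.4 for 0.1%: each tenfold reduction in the false positive rate costs roughly 4.8 extra bits per element.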
