Bloom Filter — Probabilistic Data Structure · Definition & Examples

Igor Verentsov

What is a Bloom Filter

By Igor Verentsov · Updated May 23, 2026

Key idea:

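Bloom filter: a probabilistic data structure that answers either "element is probably in the set" or "element is definitely not in the set". It uses a bit array plus multiple hash functions. It is memory-efficient (about 1.2 MB for a million items at a 1% false positive rate, regardless of item size), but false positives are possible; false negatives are not. Used in Cassandra (skipping disk reads for missing keys), CDNs (keeping one-hit-wonder objects out of the cache), database indexes, and earlier versions of Chrome's Safe Browsing URL checks.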



Details

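False positive rate (FPR) is tunable through filter size and hash count (typically 1-5%)
No false negatives: "not in set" means definitely not in the set
Standard Bloom filters cannot remove elements (counting Bloom filters can)
Redis: BF.RESERVE, BF.ADD, BF.EXISTS (RedisBloom module)
Typical size: about 9.6 bits per element (often rounded to 10) for 1% FPR, with 7 hash functions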
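Deletion is the one operation the list above rules out for a standard filter. A counting Bloom filter replaces each bit with a small counter: adding increments the k counters, removing decrements them, and a membership check requires all k counters to be non-zero. The Python sketch below is illustrative only: it assumes SHA-256 over "<seed>:<item>" as the k hash functions and uses unbounded Python ints as counters, whereas practical implementations usually use 4-bit counters and must guard against overflow. `CountingBloomFilter` and its methods are hypothetical names.

```python
import hashlib


class CountingBloomFilter:
    """Counting Bloom filter sketch: one counter per slot instead of one bit,
    so items can be removed. Counters are unbounded Python ints (no overflow cap)."""

    def __init__(self, size: int, hash_count: int) -> None:
        self.size = size
        self.hash_count = hash_count
        self.counters = [0] * size

    def _indexes(self, item: str) -> list[int]:
        # Assumption: SHA-256 of "<seed>:<item>" stands in for k independent hash functions.
        positions = []
        for i in range(self.hash_count):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.size)
        return positions

    def add(self, item: str) -> None:
        for idx in self._indexes(item):
            self.counters[idx] += 1

    def check(self, item: str) -> bool:
        # "Probably present" only if every one of the k counters is non-zero.
        return all(self.counters[idx] > 0 for idx in self._indexes(item))

    def remove(self, item: str) -> None:
        # Only remove items that were actually added: removing a false positive
        # decrements counters shared with other items and can cause false negatives.
        if not self.check(item):
            raise KeyError(item)
        for idx in self._indexes(item):
            self.counters[idx] -= 1


cbf = CountingBloomFilter(size=1000, hash_count=7)
cbf.add("alice")
print(cbf.check("alice"))  # True
cbf.remove("alice")
print(cbf.check("alice"))  # False: every counter is back to zero
```

The cost of deletion is memory: with 4-bit counters, a counting filter is about four times the size of a plain Bloom filter with the same number of slots.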

Example

// Redis with the RedisBloom module (redis-cli)
> BF.RESERVE users 0.01 100000   → OK (1% error rate, capacity 100,000)
> BF.ADD users alice              → 1
> BF.EXISTS users alice           → 1 (probably in set)
> BF.EXISTS users bob             → 0 (definitely not)


How Bloom Filters Work: Mechanism Explained

A Bloom filter is a space-efficient probabilistic data structure for fast set-membership checks. It combines a bit array with multiple hash functions. Here’s how it works:

Bit Array Initialization: A Bloom filter starts with a bit array of size m, initialized to all zeros.
Hash Functions: It employs k independent hash functions, each producing a hash value that maps to an index in the bit array.
Adding Elements: To add an element x to the Bloom filter:

Compute the k hash values for x.
Set the bits at the resulting indices in the bit array to 1.

Membership Query: To check if an element y is in the set:

Compute the k hash values for y.
If all the bits at these indices are 1, y is probably in the set (those bits may have been set by other elements, which is where false positives come from); if any bit is 0, y is definitely not in the set.

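Because the filter stores only bits and never the elements themselves, its size depends on the number of elements and the target false positive rate, not on how large each element is. This makes a Bloom filter far smaller than a hash set or tree holding the same keys, at the cost of occasional false positives.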
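The steps above call for k independent hash functions. A common shortcut, analyzed by Kirsch and Mitzenmacher, derives all k positions from two base hashes as g_i(x) = (h1(x) + i * h2(x)) mod m, with no asymptotic increase in the false positive rate. This sketch assumes SHA-256 split into two 64-bit halves as h1 and h2, and uses a Python int as a tiny 64-bit array; `bloom_indexes` and `might_contain` are hypothetical names.

```python
import hashlib


def bloom_indexes(item: str, k: int, m: int) -> list[int]:
    """k bit positions for `item` in an m-bit array via double hashing:
    g_i(x) = (h1(x) + i * h2(x)) mod m, for i = 0 .. k-1."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    h1 = int.from_bytes(digest[:8], "big")        # first 64 bits of the digest
    h2 = int.from_bytes(digest[8:16], "big") | 1  # next 64 bits, forced odd
    # With a power-of-two m, an odd step is coprime to m, so the k positions are distinct.
    return [(h1 + i * h2) % m for i in range(k)]


m, k = 64, 3
bits = 0  # a Python int used as a 64-bit array
for word in ["apple", "banana"]:
    for pos in bloom_indexes(word, k, m):
        bits |= 1 << pos  # set bit `pos`


def might_contain(word: str) -> bool:
    # All k bits must be set; a single zero bit means "definitely not".
    return all((bits >> pos) & 1 for pos in bloom_indexes(word, k, m))


print(might_contain("apple"))   # True
print(might_contain("cherry"))  # False unless all 3 bits happen to be set (a false positive)
```

Only one digest is computed per item, no matter how large k is, which is why many production filters use this scheme rather than k separate hash functions.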

Practical Examples of Bloom Filter Implementation

The following are minimal implementations in Python and Java:

Python Example

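# Requires third-party packages: pip install bitarray mmh3
from bitarray import bitarray
import mmh3

class BloomFilter:
    def __init__(self, size, hash_count):
        self.size = size                # m: number of bits
        self.hash_count = hash_count    # k: number of hash functions
        self.bit_array = bitarray(size)
        self.bit_array.setall(0)        # start with every bit cleared

    def add(self, item):
        # MurmurHash3 with seeds 0..k-1 acts as k different hash functions.
        # mmh3.hash returns a signed int; Python's % still yields a valid index.
        for i in range(self.hash_count):
            index = mmh3.hash(item, i) % self.size
            self.bit_array[index] = 1

    def check(self, item):
        for i in range(self.hash_count):
            index = mmh3.hash(item, i) % self.size
            if self.bit_array[index] == 0:
                return False  # a cleared bit: definitely not in the set
        return True  # all k bits set: probably in the set

# Usage
bloom = BloomFilter(1000, 7)
bloom.add('example_item')
print(bloom.check('example_item'))  # Output: True
print(bloom.check('not_in_set'))  # Output: False (True here would be a false positive)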

Java Example

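import java.util.BitSet;
import java.util.Random;

public class BloomFilter {
    private final BitSet bitSet;
    private final int[] hashSeeds;
    private final int size;

    public BloomFilter(int size, int hashCount) {
        this.size = size;
        this.bitSet = new BitSet(size);
        this.hashSeeds = new int[hashCount];
        // Fixed seed so filters built with the same parameters use the same hash functions
        Random rand = new Random(42);
        for (int i = 0; i < hashCount; i++) {
            hashSeeds[i] = rand.nextInt();
        }
    }

    // Simplified hashing: each "hash function" is String.hashCode() XOR a seed,
    // so strings with equal hashCode() collide on all k positions.
    // Math.floorMod keeps the index non-negative; Math.abs(Integer.MIN_VALUE) stays negative
    // and would make BitSet throw IndexOutOfBoundsException.
    private int index(String item, int seed) {
        return Math.floorMod(item.hashCode() ^ seed, size);
    }

    public void add(String item) {
        for (int seed : hashSeeds) {
            bitSet.set(index(item, seed));
        }
    }

    public boolean check(String item) {
        for (int seed : hashSeeds) {
            if (!bitSet.get(index(item, seed))) {
                return false;  // a cleared bit: definitely not in the set
            }
        }
        return true;  // all bits set: probably in the set
    }

    // Usage
    public static void main(String[] args) {
        BloomFilter bloom = new BloomFilter(1000, 7);
        bloom.add("example_item");
        System.out.println(bloom.check("example_item"));  // Output: true
        System.out.println(bloom.check("not_in_set"));  // Output: false (true would be a false positive)
    }
}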

Common Use Cases of Bloom Filters

Bloom filters are useful wherever a fast, compact membership test can save more expensive work. Common use cases:

Database Indexing: Databases such as Apache Cassandra keep a Bloom filter per SSTable, so a read can skip data files that definitely do not contain the requested key instead of reading them from disk.
Web Caching: Content Delivery Networks (CDNs) such as Akamai have used Bloom filters for cache admission: an object is cached only on its second request, so content requested just once ("one-hit wonders") never takes up cache space.
Network Security: Earlier versions of Google Chrome used a local Bloom filter of known-malicious URL prefixes for Safe Browsing. A URL that missed the filter was treated as safe without a network request; only filter hits were confirmed with Google's servers.
Distributed Systems: In peer-to-peer networks and distributed hash tables (DHTs), nodes can exchange compact Bloom filter summaries of the keys they hold, so a query is forwarded only to peers that might have the key.
Spell Checking: Some early spell checkers stored their dictionary as a Bloom filter to save memory, accepting that a small fraction of misspelled words would pass as valid.

These use cases illustrate the practical benefits of Bloom filters, particularly in scenarios where memory conservation and rapid lookup times are essential.
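One cache-admission pattern reported for CDNs such as Akamai caches an object only on its second request, using a Bloom filter to remember which URLs have been requested once. The sketch below is a hypothetical in-process illustration of that idea, not any CDN's actual code: `OneHitWonderCache`, `fetch_from_origin`, and the filter parameters (m = 8192 bits, k = 5) are assumptions.

```python
import hashlib
from typing import Callable


class BloomFilter:
    """Minimal Bloom filter: m bits in a bytearray, k positions via double hashing."""

    def __init__(self, m: int, k: int) -> None:
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, key: str) -> list[int]:
        d = hashlib.sha256(key.encode()).digest()
        h1, h2 = int.from_bytes(d[:8], "big"), int.from_bytes(d[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class OneHitWonderCache:
    """Hypothetical admission policy: store an object only on its second request."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self.fetch = fetch                    # loads from the origin (the expensive path)
        self.seen = BloomFilter(m=8192, k=5)  # URLs requested at least once
        self.cache: dict[str, bytes] = {}

    def get(self, url: str) -> bytes:
        if url in self.cache:
            return self.cache[url]
        body = self.fetch(url)
        if url in self.seen:
            self.cache[url] = body            # seen before (or a false positive): admit
        else:
            self.seen.add(url)                # first request: remember it, don't cache yet
        return body


origin_calls: list[str] = []


def fetch_from_origin(url: str) -> bytes:
    origin_calls.append(url)
    return f"body of {url}".encode()


cdn = OneHitWonderCache(fetch_from_origin)
for _ in range(3):
    cdn.get("/logo.png")
print(len(origin_calls))         # 2: two origin fetches, then a cache hit
print("/logo.png" in cdn.cache)  # True
```

A false positive here only means an object is admitted one request early, which is why a small filter is acceptable for this job.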


Frequently Asked Questions

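When should I use a Bloom filter?

As a cheap pre-filter in front of an expensive operation, such as a disk read, a database query, or an origin fetch after a CDN cache miss. It pays off most when the majority of lookups are for keys that are not in the set: every "definitely not" answer skips the expensive operation entirely.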
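A minimal sketch of the pre-filter pattern, assuming an in-memory dict stands in for the expensive store (disk, database, or remote service). `GuardedStore` and its parameters (about 10 bits per key, k = 7) are hypothetical; the point is that only true hits and false positives ever reach the backing store.

```python
import hashlib


class BloomFilter:
    """Minimal in-memory Bloom filter (m bits, k positions via double hashing)."""

    def __init__(self, m: int, k: int) -> None:
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, key: str) -> list[int]:
        d = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(key))


class GuardedStore:
    """Hypothetical wrapper: consult the filter before an expensive lookup."""

    def __init__(self, backing: dict) -> None:
        self.backing = backing  # stands in for disk or a remote database
        self.filter = BloomFilter(m=10 * max(len(backing), 1), k=7)  # ~10 bits per key
        for key in backing:
            self.filter.add(key)
        self.expensive_lookups = 0

    def get(self, key: str):
        if not self.filter.might_contain(key):
            return None  # definitely absent: the expensive lookup is skipped
        self.expensive_lookups += 1
        return self.backing.get(key)  # still None if this was a false positive


store = GuardedStore({f"user:{i}": i for i in range(1000)})
misses = [store.get(f"ghost:{i}") for i in range(10_000)]
print(all(v is None for v in misses))  # True
print(store.expensive_lookups)  # only the false positives reached the backing store
```

With these parameters the expected false positive rate is under 1%, so roughly 99% of the 10,000 absent-key lookups never touch the backing store.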

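How do I choose the false positive rate?

Trade-off: a lower FPR needs a bigger filter. 1% FPR needs about 9.6 bits per element and 0.1% about 14.4 bits per element; each 10x reduction in FPR costs roughly 4.8 extra bits per element. Choose based on what a false positive costs downstream.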
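The numbers above come from the standard sizing formulas for n elements at target false positive rate p: m = -n * ln(p) / (ln 2)^2 bits and k = (m / n) * ln 2 hash functions, with p ≈ (1 - e^(-kn/m))^k as the usual approximation of the resulting rate. This sketch just evaluates them; `bloom_parameters` and `false_positive_rate` are illustrative names.

```python
import math


def bloom_parameters(n: int, p: float) -> tuple[int, int]:
    """Optimal bit count m and hash count k for n items at target false positive rate p."""
    m = math.ceil(-n * math.log(p) / math.log(2) ** 2)
    k = max(1, round(m / n * math.log(2)))
    return m, k


def false_positive_rate(m: int, n: int, k: int) -> float:
    """Standard approximation p ≈ (1 - e^(-k*n/m))^k."""
    return (1 - math.exp(-k * n / m)) ** k


n = 1_000_000
for p in (0.05, 0.01, 0.001):
    m, k = bloom_parameters(n, p)
    print(f"p={p:<6} bits/item={m / n:.1f} k={k} memory={m / 8 / 1024:.0f} KiB "
          f"actual_p={false_positive_rate(m, n, k):.4f}")
```

For one million elements this gives about 9.6 bits per element with k = 7 at 1% (roughly 1.2 MB), and about 14.4 bits per element with k = 10 at 0.1%.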

Is HyperLogLog an alternative?

No. HyperLogLog estimates the number of unique elements (cardinality); it cannot answer membership queries. The two structures solve different problems.
