Preface

Redis's Bloom filter module is also known as rebloom or RedisBloom.

Some relevant links first:

RedisBloom official site: https://redis.io/bloom/
RedisBloom on GitHub: https://github.com/RedisBloom/RedisBloom
RedisBloom documentation: https://redis.io/docs/latest/develop/data-types/probabilistic...
RedisBloom on Docker Hub: https://hub.docker.com/r/redislabs/rebloom

For a deeper dive, see: 硬核 | Redis 布隆(Bloom Filter)过滤器原理与实战

Installation and Deployment

We deploy RedisBloom with Docker to get a Bloom filter running quickly; the demo below uses docker-compose.

If you deploy on Kubernetes instead, you may run into this issue: rebloom k8s 报错

docker-compose.yaml

version: "3"

services: 
  redis:
    container_name: rebloom2
    restart: always
    image: redislabs/rebloom:2.6.12
    ports:
      - "6377:6377"
    volumes:
      - ./volumes:/data # ./volumes is the host-side path; /data is the path inside the container where Redis persists its data
      - ./redis.conf:/etc/redis/redis.conf # redis.conf is provided below; keep it at this host path, or adjust the left-hand side (e.g. to a path inside your project)

redis.conf

# The host-side location of this file must match the left-hand side of ./redis.conf:/etc/redis/redis.conf in docker-compose.yaml
# Redis supports two persistence mechanisms: RDB & AOF
# https://juejin.cn/post/6844903716290576392

appendonly yes
# default AOF file name
appendfilename "appendonly.aof"
# fsync once per second
appendfsync everysec

port 6377
# bind to all interfaces; otherwise clients outside the container may not be able to connect
bind 0.0.0.0


# maxmemory 100mb
# appendonly yes
# appendfilename "appendonly.aof"
# appendfsync everysec

Memory Control

Memory parameters

After the container starts, the log contains the following:

─➤  docker-compose logs -f -n 1000
rebloom2  | 1:C 05 May 2024 07:24:48.405 # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can also cause failures without low memory condition, see https://github.com/jemalloc/jemalloc/issues/1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
rebloom2  | 1:C 05 May 2024 07:24:48.405 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
rebloom2  | 1:C 05 May 2024 07:24:48.405 * Redis version=7.2.4, bits=64, commit=00000000, modified=0, pid=1, just started
rebloom2  | 1:C 05 May 2024 07:24:48.405 * Configuration loaded
rebloom2  | 1:M 05 May 2024 07:24:48.406 * monotonic clock: POSIX clock_gettime
rebloom2  | 1:M 05 May 2024 07:24:48.407 * Running mode=standalone, port=6379.
rebloom2  | 1:M 05 May 2024 07:24:48.407 * <bf> RedisBloom version 2.6.12 (Git=unknown)
rebloom2  | 1:M 05 May 2024 07:24:48.407 * Module 'bf' loaded from /usr/lib/redis/modules/redisbloom.so
rebloom2  | 1:M 05 May 2024 07:24:48.408 * Server initialized
rebloom2  | 1:M 05 May 2024 07:24:48.409 * Loading RDB produced by version 7.2.4
rebloom2  | 1:M 05 May 2024 07:24:48.409 * RDB age 2 seconds
rebloom2  | 1:M 05 May 2024 07:24:48.409 * RDB memory usage when created 0.88 Mb
rebloom2  | 1:M 05 May 2024 07:24:48.409 * Done loading RDB, keys loaded: 0, keys expired: 0.
rebloom2  | 1:M 05 May 2024 07:24:48.409 * DB loaded from disk: 0.000 seconds
rebloom2  | 1:M 05 May 2024 07:24:48.409 * Ready to accept connections tcp

Note the first line in particular:

WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can also cause failures without low memory condition, see https://github.com/jemalloc/jemalloc/issues/1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.


For background on this setting, see: Redis 优化之内存分配控制 vm.overcommit_memory
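Persisting the fix the warning itself suggests means adding one line to /etc/sysctl.conf on the Docker host (then reboot, or run `sysctl vm.overcommit_memory=1` for immediate effect):

```conf
vm.overcommit_memory = 1
```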

Capacity estimation to prevent OOM

Reference: redis rebloom error "Maximum expansions reached"

RedisBloom uses more memory as the number of keys it deduplicates grows; the test below measures how much RAM a given number of keys actually costs.
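Before measuring, the classic Bloom-filter sizing formulas give a theoretical baseline: a filter for n items at false-positive rate p needs m = -n·ln(p)/(ln 2)² bits and k = (m/n)·ln 2 hash functions. A minimal sketch in plain Python (independent of RedisBloom; the function names are ours):

```python
import math


def bloom_bits(n: int, p: float) -> int:
    """Optimal bit-array size m for n items at false-positive rate p."""
    return math.ceil(-n * math.log(p) / (math.log(2) ** 2))


def bloom_hashes(n: int, m: int) -> int:
    """Optimal number of hash functions k = (m / n) * ln 2."""
    return max(1, round(m / n * math.log(2)))


m = bloom_bits(150_000_000, 0.01)  # ≈ 1.44e9 bits ≈ 171 MB
k = bloom_hashes(150_000_000, m)   # 7 hash functions
```

For 150 million keys at a 1% error rate, a single flat filter would need about 171 MB. The measured RedisBloom footprint below is several times larger, because an auto-expanding filter stacks sub-filters with progressively tighter error rates instead of allocating one flat bit array.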

Test code

Insert concurrently from 20 processes:

from uuid import uuid4
from loguru import logger
from multiprocessing import Process

from redisbloom.client import Client


def get_uuid() -> str:
    return uuid4().hex


def insert_keys(bloom_filter_name, start, end):
    # Create the client inside the worker: a connection created in the
    # parent process cannot safely be shared across forked processes.
    bf = Client(host='127.0.0.1', port=6377, db=0)
    try:
        for i in range(start, end):
            if i % 10000 == 0:
                logger.debug(i)
            key = get_uuid()
            bf.bfAdd(bloom_filter_name, key)
    except Exception as error:
        logger.exception(error)


if __name__ == "__main__":
    # Name of the Bloom filter key
    bloom_filter_name = 'my_bloom_filter'

    processes = []
    num_processes = 20  # number of worker processes

    # How many keys each process should insert
    total_keys = 10000000000
    keys_per_process = total_keys // num_processes

    # Create and start the workers
    for i in range(num_processes):
        start = i * keys_per_process
        end = start + keys_per_process
        process = Process(target=insert_keys, args=(bloom_filter_name, start, end))
        processes.append(process)
        process.start()

    # Wait for all workers to finish
    for process in processes:
        process.join()

    print("Insertion finished.")
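The script above lets BF.ADD auto-create the filter with the default capacity (100) and then relies on repeated expansion. When the target cardinality is known up front, reserving capacity once avoids that cascade of sub-filters. A sketch of the idea; the helper name reserve_command is ours, and the commented live call assumes the container above on 127.0.0.1:6377:

```python
from typing import List


def reserve_command(name: str, error_rate: float, capacity: int) -> List[str]:
    # Raw arguments of BF.RESERVE, the command behind redisbloom's bfCreate().
    if not 0 < error_rate < 1 or capacity <= 0:
        raise ValueError("error_rate must be in (0, 1) and capacity positive")
    return ["BF.RESERVE", name, str(error_rate), str(capacity)]


# Against a live instance this would be sent as, e.g.:
#   from redisbloom.client import Client
#   Client(host='127.0.0.1', port=6377).bfCreate('my_bloom_filter', 0.01, 150_000_000)
```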

RAM usage

With roughly 150 million keys inserted (size: 155801108), memory usage was 966.61 MB:

127.0.0.1:6377> bf.debug my_bloom_filter
 1) "size:155801108"
 2) "bytes:144 bits:1152 hashes:8 hashwidth:64 capacity:100 size:100 ratio:0.005"
 3) "bytes:312 bits:2496 hashes:9 hashwidth:64 capacity:200 size:200 ratio:0.0025"
 4) "bytes:696 bits:5568 hashes:10 hashwidth:64 capacity:400 size:400 ratio:0.00125"
 5) "bytes:1536 bits:12288 hashes:11 hashwidth:64 capacity:800 size:800 ratio:0.000625"
 6) "bytes:3360 bits:26880 hashes:12 hashwidth:64 capacity:1600 size:1600 ratio:0.0003125"
 7) "bytes:7304 bits:58432 hashes:13 hashwidth:64 capacity:3200 size:3200 ratio:0.00015625"
 8) "bytes:15752 bits:126016 hashes:14 hashwidth:64 capacity:6400 size:6400 ratio:7.8125e-05"
 9) "bytes:33808 bits:270464 hashes:15 hashwidth:64 capacity:12800 size:12800 ratio:3.90625e-05"
10) "bytes:72224 bits:577792 hashes:16 hashwidth:64 capacity:25600 size:25600 ratio:1.95313e-05"
11) "bytes:153680 bits:1229440 hashes:17 hashwidth:64 capacity:51200 size:51200 ratio:9.76563e-06"
12) "bytes:325824 bits:2606592 hashes:18 hashwidth:64 capacity:102400 size:102400 ratio:4.88281e-06"
13) "bytes:688576 bits:5508608 hashes:19 hashwidth:64 capacity:204800 size:204800 ratio:2.44141e-06"
14) "bytes:1451016 bits:11608128 hashes:20 hashwidth:64 capacity:409600 size:409600 ratio:1.2207e-06"
15) "bytes:3049760 bits:24398080 hashes:21 hashwidth:64 capacity:819200 size:819200 ratio:6.10352e-07"
16) "bytes:6394984 bits:51159872 hashes:22 hashwidth:64 capacity:1638400 size:1638400 ratio:3.05176e-07"
17) "bytes:13380888 bits:107047104 hashes:23 hashwidth:64 capacity:3276800 size:3276800 ratio:1.52588e-07"
18) "bytes:27943632 bits:223549056 hashes:24 hashwidth:64 capacity:6553600 size:6553600 ratio:7.62939e-08"
19) "bytes:58250968 bits:466007744 hashes:25 hashwidth:64 capacity:13107200 size:13107200 ratio:3.8147e-08"
20) "bytes:121229360 bits:969834880 hashes:26 hashwidth:64 capacity:26214400 size:26214400 ratio:1.90735e-08"
21) "bytes:251913568 bits:2015308544 hashes:27 hashwidth:64 capacity:52428800 size:52428800 ratio:9.53674e-09"
22) "bytes:522736824 bits:4181894592 hashes:28 hashwidth:64 capacity:104857600 size:50943608 ratio:4.76837e-09"


At 300 million keys, the filter occupies roughly 3.22 GB of RAM.
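Each numbered line of the BF.DEBUG output reports one sub-filter's allocation, so the filter's total footprint is the sum of the bytes fields; for the 21 sub-filters above that comes to 1,007,654,216 bytes ≈ 961 MB, consistent with the observed 966.61 MB once Redis's own overhead is included. A small parser, with the first three sub-filter lines as sample data:

```python
import re
from typing import List

# First three sub-filter lines from the BF.DEBUG output above.
DEBUG_LINES = [
    "bytes:144 bits:1152 hashes:8 hashwidth:64 capacity:100 size:100 ratio:0.005",
    "bytes:312 bits:2496 hashes:9 hashwidth:64 capacity:200 size:200 ratio:0.0025",
    "bytes:696 bits:5568 hashes:10 hashwidth:64 capacity:400 size:400 ratio:0.00125",
]


def total_bytes(lines: List[str]) -> int:
    # Sum the per-sub-filter allocations to get the filter's footprint.
    return sum(int(re.search(r"bytes:(\d+)", line).group(1)) for line in lines)
```

Note also how capacity doubles and ratio halves from one sub-filter to the next: that expansion policy is why the stacked filter ends up much larger than a single flat filter sized for the same key count.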


universe_king