Redis高级数据结构详解

背景

Redis作为最为人熟知的key-value数据库，他的基本数据结构包括String、list、hash、set、sorted set等等，大家都已经耳熟能详了。但是Redis还有一些不是很常用的高级数据结构，如Bitmap、HyperLogLog、GEO，特殊场景下这些数据结构也能排上大用场。下面我们来看一下这些数据结构的底层原理、使用场景，以及如何使用。

1、 bitmap

1.1 介绍：

bitmap 存储的是连续的二进制数字（0 和 1），通过 bitmap, 只需要一个 bit 位来表示某个元素对应的值或者状态，key 就是对应元素本身。我们知道 8 个 bit 可以组成一个 byte，所以 bitmap 本身会极大的节省储存空间。

常用命令： setbit 、getbit 、bitcount、bitop

应用场景：适合需要保存状态信息（比如是否签到、是否登录...）并需要进一步对这些信息进行分析的场景。比如用户签到情况、活跃用户情况、用户行为统计（比如是否点赞过某个视频）。可以用于解决缓存穿透问题

1.2、布隆过滤器

1970 年布隆提出了一种布隆过滤器的算法，用来判断一个元素是否在一个集合中。这种算法由一个二进制数组和一个 Hash 算法组成。基于bitmap实现。

1.2.1 如何使用

存储：首先将给定集合的元素计算hash值，找到对应的位，映射到bitmap中。
使用：然后给一个元素，计算hash值，找到对应的位，如果是1表示存在，否则不存在

1.2.2 误判问题

布隆过滤器存在误判问题，本质上是hash冲突导致的，不同元素的hash值可能重复。
比如A元素在集合中，B元素不在集合中，但是B元素的hash值和A元素相同，那么按照判断逻辑，B元素也会被判断在集合中。
判断的两种结果：

通过hash计算在数组上不一定在集合
通过hash计算不在数组的一定不在集合

解决方法：
解决办法是通过各种方式较少hash冲突，但是不能完全避免。
1）增大数组：增大数组减少hash冲突
2）增加hash函数：增加hash函数较少hash冲突，多次hash是结果更加离散

1.2.3 布隆过滤器的使用

实现方式：

Redis+ Redission
Redis+自主实现
guava

下面我们通过 Redis+自主实现的方式来实现一个过滤器。

public class TestRedisBloomFilter {

    private static final int DAY_SEC = 60 * 60 * 24;

    @Autowired
    private RedisBloomFilter redisBloomFilter;

    @Test
    public void testInsert() throws Exception {
       // System.out.println(redisBloomFilter);
        redisBloomFilter.insert("bloom:user", "20210001", DAY_SEC);
        redisBloomFilter.insert("bloom:user", "20210002", DAY_SEC);
        redisBloomFilter.insert("bloom:user", "20210003", DAY_SEC);
        redisBloomFilter.insert("bloom:user", "20210004", DAY_SEC);
        redisBloomFilter.insert("bloom:user", "20210005", DAY_SEC);
    }

    @Test
    public void testMayExist() throws Exception {
        System.out.println(redisBloomFilter.mayExist("bloom:user", "20210001"));
        System.out.println(redisBloomFilter.mayExist("bloom:user", "20210002"));
        System.out.println(redisBloomFilter.mayExist("bloom:user", "20210003"));
        System.out.println(redisBloomFilter.mayExist("bloom:user", "20211001"));
    }

}

RedisBloomFilter实现

/**
 * 插入元素
 *
 * @param key       原始Redis键，会自动加上前缀
 * @param element   元素值，字符串类型
 * @param expireSec 过期时间（秒）
 */
public void insert(String key, String element, int expireSec) {
    if (key == null || element == null) {
        throw new RuntimeException("键值均不能为空");
    }
    String actualKey = RS_BF_NS.concat(key);

    try (Jedis jedis = jedisPool.getResource()) {
        try (Pipeline pipeline = jedis.pipelined()) {
            //getBitIndices 通过hash函数确定位置
            for (long index : getBitIndices(element)) {
                pipeline.setbit(actualKey, index, true);
            }
            pipeline.syncAndReturnAll();
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        jedis.expire(actualKey, expireSec);
    }
}

/**
 * 检查元素在集合中是否（可能）存在
 *
 * @param key     原始Redis键，会自动加上前缀
 * @param element 元素值，字符串类型
 */
public boolean mayExist(String key, String element) {
    if (key == null || element == null) {
        throw new RuntimeException("键值均不能为空");
    }
    String actualKey = RS_BF_NS.concat(key);
    boolean result = false;

    try (Jedis jedis = jedisPool.getResource()) {
        try (Pipeline pipeline = jedis.pipelined()) {
            for (long index : getBitIndices(element)) {
                pipeline.getbit(actualKey, index);
            }
            result = !pipeline.syncAndReturnAll().contains(false);
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
    return result;
}

2 HyperLogLog

HyperLogLog（字符串类型）

大型网站每个网页每天的 UV 数据(独立访客Unique Visitor)，然后让你来开发这个统计模块，你会如何实现？(尽量少的占用存储空间)

Redis 提供了 HyperLogLog 数据结构就是用来解决这种统计问题的。
HyperLogLog 提供不精确的去重计数方案，虽然不精确但是也不是非常不精确，标准误差是 0.81%，这样的精确度已经可以满足上面的 UV统计需求了。

HyperLogLog提供了3个命令: pfadd、pfcount、pfmerge

处理方式\存储	1天	1个月	1年
集合	80M	2.4G	28G
HyperLogLog	15K	450K	5M

HyperLogLog的使用：

添加了8个字符串，有两个相同的，最终计算出来7个。

127.0.0.1:6379> pfadd uv918 u1 u1 u3 u4 u5 u6 u7 u8
(integer) 0
127.0.0.1:6379> pfcount uv918
(integer) 7

下面是向HyperLogLog存入一万个元素，然后获取元素个数，误差<0.81%:

public void count() {
    Jedis jedis = null;
    try {
        jedis = jedisPool.getResource();
        for(int i=0;i<10000;i++){
            jedis.pfadd(RS_HLL_NS+"countest","user"+i);
        }
        long total = jedis.pfcount(RS_HLL_NS+"countest");
        System.out.println("实际次数:" + 10000 + "，HyperLogLog统计次数:"+total);
    } catch (Exception e) {

    } finally {
        jedis.close();
    }
}

3 GEO

Redis 3.2版本提供了GEO(地理信息定位)功能，支持存储地理位置信息用来实现诸如附近位置、摇一摇这类依赖于地理位置信息的功能。
业界比较通用的地理位置距离排序算法是 GeoHash 算法，Redis 也使用 GeoHash 算法。

地图元素的位置数据使用二维的经纬度表示
经度范围 (-180, 180)，纬度范围 (-90, 90)
纬度正负以赤道为界，北正南负
经度正负以本初子午线 (英国格林尼治天文台) 为界，东正西负。

GEO在Java的使用

下面是存储几个地点及它们的经纬度，然后查询beijing附近150公里的地点。

void testAddLocations(){
    Map<String, GeoCoordinate> memberCoordinateMap = new HashMap<>();
    //添加地点及经纬度
    memberCoordinateMap.put("tianjin",new GeoCoordinate(117.12,39.08));
    memberCoordinateMap.put("shijiazhuang",new GeoCoordinate(114.29,38.02));
    memberCoordinateMap.put("tangshan",new GeoCoordinate(118.01,39.38));
    memberCoordinateMap.put("baoding",new GeoCoordinate(115.29,38.51));
    System.out.println(redisGEO.addLocations("cities",memberCoordinateMap));
}

@Test
void testNearby(){
    //查找附近150公里地点
    List<GeoRadiusResponse> responses = redisGEO.nearbyMore("cities","beijing",150,
            true,true);
    for(GeoRadiusResponse city:responses){
        System.out.println(city.getMemberByString());
        System.out.println(city.getDistance());
        //System.out.println(city.getCoordinate());
        System.out.println("-------------------------");
    }
}

RedisGEO的实现

public Long addLocations(String key, Map<String, GeoCoordinate> memberCoordinateMap) {
    Jedis jedis = null;
    try {
        jedis = jedisPool.getResource();
        return jedis.geoadd(RS_GEO_NS+key,memberCoordinateMap);
    } catch (Exception e) {
        return null;
    } finally {
        jedis.close();
    }
}

public List<GeoRadiusResponse> nearbyMore(String key, String member, double radius,
                                          boolean withDist, boolean isASC) {
    Jedis jedis = null;
    try {
        jedis = jedisPool.getResource();
        GeoRadiusParam geoRadiusParam = new GeoRadiusParam();
        if (withDist) geoRadiusParam.withDist();
        if(isASC) geoRadiusParam.sortAscending();
        else geoRadiusParam.sortDescending();
        return jedis.georadiusByMember(RS_GEO_NS+key, member, radius, GeoUnit.KM,geoRadiusParam);
    } catch (Exception e) {
        return null;
    } finally {
        jedis.close();
    }
}

Redis高级数据结构详解

背景

1、 bitmap

1.1 介绍：

1.2、布隆过滤器

1.2.1 如何使用

1.2.2 误判问题

1.2.3 布隆过滤器的使用

2 HyperLogLog

HyperLogLog的使用：

3 GEO

GEO在Java的使用

杜若

引用和评论

Netty源码-业务流程之请求处理

Linux Redis 安装、配置、教程（一）

redis学习之事务与事件

基于k3s部署Nginx、MySQL、PHP和Redis的详细教程

10k star！一款轻量级的Redis客户端工具，贼好用！

如何基于Redis实现延时任务

当 Redis 集群说"分手"：Redis 集群脑裂问题及解决方案