1 Preface
Before writing this, a friend sitting next to me asked me about hot data. After giving him a rough walkthrough of some Redis data-skew cases, I also reviewed the common methodology for handling hot data, and remembered the JD open-source project hotkey that I studied last year - a framework dedicated to solving hot-data problems. Combining the two, this article uses a few small diagrams and some plain explanations to walk through the relevant methodology and an analysis of the hotkey source code.
2 Redis data skew
2.1 Definition and harm
Let's start with the definition of data skew, borrowing the explanation from the Baidu encyclopedia entry:
In a cluster system, the cache is usually distributed, i.e. different nodes are responsible for different ranges of cached data. When the cached data is not dispersed enough and a large amount of it ends up concentrated on one or a few service nodes, we call it data skew. Generally speaking, data skew is the result of a poorly performing load-balancing implementation.
As the definition suggests, data skew generally occurs because load balancing is not effective, leaving data heavily concentrated on a few nodes.
So what harm does that cause?
When data skew occurs, the instances holding large amounts of data or hotspot data come under greater pressure and slow down; in the worst case their memory is exhausted and they crash. This is exactly what we want to avoid when using a sliced (sharded) cluster.
2.2 Classification of data skew
2.2.1 Data volume skew (write skew)
1. Diagram
As shown in the figure, in some cases the data is not evenly distributed across instances, and one instance holds a disproportionately large amount of data.
2. bigkey leads to skew
If a bigkey happens to live on one instance - either its value is very large (String type) or it holds a huge number of collection elements (collection type) - the amount of data on that instance grows, and its memory consumption grows accordingly.
Solution
When generating data in the business layer, try to avoid storing too much data under a single key-value pair.
If the bigkey happens to be a collection type, another option is to split it into many smaller collections and store them on different instances, as sketched below.
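As a rough illustration of that splitting idea, here is a minimal sketch (not from the original article) that shards a big hash into N smaller keys so its elements spread across different slots and instances. A Jedis-style client is assumed; the shard count and key names are hypothetical.
import redis.clients.jedis.Jedis;

// Sketch: split one big hash into SHARDS smaller hashes keyed by a shard suffix.
public class BigKeySplitter {
    private static final int SHARDS = 16; // assumed shard count

    // e.g. bigKey "user:1:follow", field "u42" -> "user:1:follow:7"
    static String shardKey(String bigKey, String field) {
        int shard = Math.abs(field.hashCode()) % SHARDS;
        return bigKey + ":" + shard;
    }

    static void put(Jedis jedis, String bigKey, String field, String value) {
        jedis.hset(shardKey(bigKey, field), field, value);
    }

    static String get(Jedis jedis, String bigKey, String field) {
        return jedis.hget(shardKey(bigKey, field), field);
    }
}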
3. Uneven slot allocation leads to skew
Let's briefly introduce the concept of a slot. Its full name is Hash Slot, and a Redis Cluster sliced cluster has 16384 of them. Hash slots are similar to data partitions: every key is mapped to a hash slot according to its hash value, and the Redis Cluster scheme uses hash slots to manage the mapping between data and instances.
The figure illustrates the mapping between data, hash slots, and instances.
The CRC16(city) % 16384 there can simply be understood as taking the CRC16 hash of the key and then taking it modulo the number of slots; in the example the result is slot 14484, which corresponds to the third instance node.
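To make the calculation concrete, here is a minimal Java sketch of slot = CRC16(key) % 16384. The CRC16 variant used below is CRC16/XMODEM, which is the one Redis Cluster uses; hash-tag handling is omitted here (see the Hash Tag section below), so treat it as an illustration rather than a full reimplementation.
// Minimal sketch of the slot calculation described above: slot = CRC16(key) % 16384.
public class SlotCalculator {
    // CRC16/XMODEM (poly 0x1021, init 0x0000), the variant used by Redis Cluster
    static int crc16(byte[] bytes) {
        int crc = 0x0000;
        for (byte b : bytes) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    static int slotFor(String key) {
        return crc16(key.getBytes(java.nio.charset.StandardCharsets.UTF_8)) % 16384;
    }

    public static void main(String[] args) {
        System.out.println("slot of city = " + slotFor("city"));
    }
}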
When building a sliced cluster, operations staff need to assign the hash slots manually, and all 16384 slots must be allocated, otherwise the Redis cluster will not work. Because the allocation is manual, some instances may be given too many slots, and data skews toward them.
Solution
Use the CLUSTER SLOTS command to check the slot distribution.
Based on that distribution, use the CLUSTER SETSLOT, CLUSTER GETKEYSINSLOT, and MIGRATE commands to migrate slot data. We will not go into the details here; interested readers can study them on their own.
4. Hash Tag leads to skew
Hash Tag definition: when a key contains {}, the whole key is not hashed; only the string inside the {} is hashed.
Suppose the hash algorithm is sha1. For user:{user1}:ids and user:{user1}:tweets, the hash value is equal to sha1(user1).
Hash Tag advantage: If the Hash Tag content of different keys is the same, then the data corresponding to these keys will be mapped to the same Slot, and will be assigned to the same instance at the same time.
Disadvantages of Hash Tag: If it is not used reasonably, a large amount of data may be concentrated on one instance, data skew may occur, and the load in the cluster will be unbalanced.
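The following small sketch (not from the article) shows which part of a key actually gets hashed when a hash tag is present, mirroring the rule described above.
// Sketch: given a key, return the substring that should actually be hashed.
// Only the content inside the first "{...}" is hashed, provided it is non-empty;
// otherwise the whole key is hashed.
static String hashTagPart(String key) {
    int open = key.indexOf('{');
    if (open >= 0) {
        int close = key.indexOf('}', open + 1);
        if (close > open + 1) {              // non-empty tag content
            return key.substring(open + 1, close);
        }
    }
    return key;
}

// hashTagPart("user:{user1}:ids")    -> "user1"
// hashTagPart("user:{user1}:tweets") -> "user1"   (same slot as the key above)
// hashTagPart("user:{}:ids")         -> "user:{}:ids" (empty tag, whole key hashed)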
2.2.2 Data access skew (read skew-hot key problem)
Generally speaking, skewed data access is caused by the hot key problem, and how to handle Redis hot keys is a frequent interview question, so understanding the related concepts and methodology is essential.
1. Diagram
As shown in the figure, although the amount of data on each cluster instance is similar, the data on one instance is hot and is accessed very frequently.
So why does hot data arise in the first place?
2. Causes and hazards of hot keys
1) The data users consume is far greater than the data produced (hot-selling products, breaking news, hot comments, celebrity live streams).
During unexpected events in daily life - for example, price cuts on popular products during Double Eleven - a single product being viewed or purchased tens of thousands of times creates a large concentrated demand, which produces a hotspot problem.
In the same way, breaking news, hot comments, celebrity live streams, and other widely published and browsed content - typical read-heavy, write-light scenarios - also produce hotspot issues.
2) Requests for the same shard exceed the performance limit of a single server.
When the server reads data, the data is usually sharded, so a given key is served by one particular host. Once access to that key exceeds the server's limit, a hot key problem arises.
If hotspots are too concentrated and the hot keys occupy more cache than the current capacity allows, the cache shard service can be overwhelmed. Once the cache service crashes, the incoming requests fall through to the backing DB. Because the DB itself is much weaker, large request volumes easily penetrate it, which can further lead to an avalanche and seriously affect overall performance.
3. Commonly used hot key problem solutions:
Solution 1: Backup hot key
You can make multiple copies of the hot data and append a random suffix to each copy's key, so the copies are not mapped to the same slot.
This effectively replicates the data onto other instances; a random suffix is also appended at access time, so the access pressure on one instance is spread evenly across the others.
For example, when writing to the cache, we split the business cache key into several different keys. As shown in the figure below, on the cache-update side we first split the key into N parts: if the key name is "good_100", we can split it into "good_100_copy1", "good_100_copy2", "good_100_copy3", "good_100_copy4". Every update or insert must then touch all N keys. This is the key-splitting step.
On the serving side, we need to make the access traffic spread evenly enough.
How do we decide which suffix to append to the hot key about to be accessed? There are a few ways: hash the local machine's IP or MAC address and take it modulo the number of split keys, which determines the suffix and therefore which instance is hit; or generate a random number when the service starts and take it modulo the number of split keys.
The pseudo code is as follows:
const M = N * 2
// generate a random number in [0, M)
random = GenRandom(0, M)
// build the backup key name
bakHotKey = hotKey + "_" + random
data = redis.GET(bakHotKey)
if data == NULL {
    data = GetFromDB()
    // write back with a slightly randomized TTL so the copies do not expire together
    redis.SET(bakHotKey, data, expireTime + GenRandom(0,5))
}
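The pseudocode above covers the read side. Below is a hedged Java sketch of the update side mentioned earlier: every copy of the key must be rewritten when the underlying data changes. A Jedis-style client is assumed, and the copy naming follows the good_100_copyN example from the figure description; both are illustrative.
import redis.clients.jedis.Jedis;

// Sketch of the update side of "backup hot key": every suffixed copy is written
// whenever the data changes, so all copies stay in sync.
public class HotKeyCopies {
    static final int N = 4; // number of copies, e.g. good_100_copy1 .. good_100_copy4

    static void updateAllCopies(Jedis jedis, String hotKey, String value, int baseTtlSeconds) {
        for (int i = 1; i <= N; i++) {
            String copyKey = hotKey + "_copy" + i;
            // slightly randomized TTL so the copies do not all expire at once
            int ttl = baseTtlSeconds + (int) (Math.random() * 5);
            jedis.setex(copyKey, ttl, value);
        }
    }
}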
Solution 2: Local cache + dynamic calculation to automatically discover hotspot cache
This solution addresses hot keys by actively discovering hotspots and caching them. The Client still accesses the SLB, which distributes requests to the Proxies, and each Proxy forwards the request to the back-end Redis according to its routing.
The hot key fix is to add a cache on the server side: a local cache is added in the Proxy, which uses an LRU algorithm to cache hot data, while the back-end nodes gain a hot-data calculation module that reports hot data back.
The main advantages of the Proxy architecture are as follows:
Proxy local cache hotspot, read capability can be scaled horizontally
DB node regularly calculates hot data collection
The DB nodes feed hotspot data back to the Proxy; this is completely transparent to the client, which needs no extra work to discover or store hotspot data
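As a rough illustration of the Proxy-side local cache mentioned above, here is a minimal LRU cache sketch. It is not the Proxy's actual implementation, just the standard access-ordered LinkedHashMap trick.
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache sketch: the most recently used entries are kept,
// and the least recently used one is evicted once capacity is exceeded.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}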
For hotspot discovery, request statistics are first collected per key within a cycle. Once a key's request volume reaches the configured level, it is identified as a hot key and placed in a small LRU list. Then, when a request comes in through the Proxy, if Redis finds that the key being accessed is a hotspot, it enters a feedback stage and marks the data.
An etcd or zk cluster can be used to store the fed-back hotspot data; every node then watches that hotspot data and loads it into its local JVM cache.
Acquisition of hotspot data
Hot key handling is split into a write path and a read path. On the write path, when the SLB receives data K1 and writes it to Redis through a Proxy, the write is complete.
If the back-end hotspot module later determines that K1 is a hot key, the Proxy caches it locally, and the next time a client accesses K1 the request no longer needs to reach Redis.
Finally, since the proxy can be expanded horizontally, the access capability of hotspot data can be enhanced arbitrarily.
Most mature solution: JD's open-source hotkey. This is a relatively mature solution for automatic hot key detection with a distributed, consistent cache. The principle is to collect key-access statistics on the client side and report candidate hot keys; once the worker side detects a hot key, it pushes it to the relevant servers for local caching, while keeping the local cache consistent with the remote cache.
We won't go into details here. The third part of this article: JD open source hotkey source code analysis will lead you to understand its overall principle.
3 JD open source hotkey - automatic detection of hot keys, distributed consistent cache solution
3.1 Pain points it solves
As we saw above, the hot key problem mostly occurs in systems with high concurrency (especially during flash-sale/seckill activities), and it is very harmful to the system.
So what is hotkey designed for, which pain points does it address, and how does it work?
Here is a summary quoted from the project: for any sudden hot data that cannot be anticipated in advance - including but not limited to hot keys (such as a burst of requests for the same product), hot users (such as malicious crawlers), and hot interfaces (a burst of massive requests to the same interface) - it can be accurately detected within milliseconds. The detected hot data, hot users, etc. are then pushed into the JVM memory of every server, greatly reducing the impact on the back-end data storage layer. Users decide how to handle these hot keys (for example, local caching for hot products, denying access to hot users, circuit-breaking hot interfaces, or returning a default value). The hot data is kept consistent across the whole server cluster, and services are isolated from one another.
Core function: hot data detection and push to each server in the cluster
3.2 Integration
The integration steps are not covered here; interested readers can look them up.
3.3 Source code analysis
3.3.1 Introduction to Architecture
1. Overall architecture diagram
Process introduction:
By including the hotkey client package, the client reports its own information to the workers at startup and establishes long connections with them. It also periodically pulls the rule configuration and worker cluster information from the configuration center.
The client calls hotkey's isHotKey() method, which first matches the key against the rules and then counts whether it is hot.
A scheduled task uploads the candidate hot key data to the worker nodes.
After a worker receives all the data for a key (a hash determines which worker each key is uploaded to, so the same key always lands on the same worker node), it matches the data against the configured rules to decide whether the key is hot; if so, it pushes the key to the clients, which cache it locally.
2. Role composition
Here is a direct borrowing from the author's description:
1) etcd cluster: as a high-performance configuration center, etcd provides efficient watch and subscription services with minimal resource usage. It stores the rule configuration, the IP addresses of the workers, and both the detected hot keys and the manually added hot keys.
2) client-side jar: the jar added to the business service. Once included, it lets you conveniently check whether a key is hot. The jar also handles key reporting, watching rule changes, worker changes and hot key changes in etcd, and the local caffeine caching of hot keys.
3) worker-side cluster: the worker is an independently deployed Java program. After startup it connects to etcd and periodically reports its own IP so clients can obtain its address and establish long connections. Its main job is to accumulate and compute the candidate keys sent by the clients; when the rule threshold configured in etcd is reached, it pushes the hot key to every client.
4) dashboard console: the console is a Java program with a visual interface, also connected to etcd. In it you configure the key rules of each APP, for example "20 hits within 2 seconds counts as hot". When a worker detects a hot key it writes it to etcd; the dashboard also watches the hot key information and stores it in a database as a record. In addition, the dashboard can manually add and delete hot keys for each client.
3. Hotkey project structure
3.3.2 Client side
The source code is mainly analyzed from the following three aspects:
1. Client Launcher
1) Startup method
@PostConstruct
public void init() {
ClientStarter.Builder builder = new ClientStarter.Builder();
ClientStarter starter = builder.setAppName(appName).setEtcdServer(etcd).build();
starter.startPipeline();
}
appName: the name of the application, usually the value of ${spring.application.name}; all subsequent configuration is keyed off this name.
etcd: the address of the etcd cluster (comma-separated), i.e. the configuration center.
You can also see that ClientStarter uses the builder pattern, which keeps the construction code clean.
2) Core entry
com.jd.platform.hotkey.client.ClientStarter#startPipeline
/**
 * Start listening to etcd
 */
public void startPipeline() {
    JdLogger.info(getClass(), "etcdServer:" + etcdServer);
    // set caffeine's maximum capacity
    Context.CAFFEINE_SIZE = caffeineSize;
    // set the etcd address
    EtcdConfigFactory.buildConfigCenter(etcdServer);
    // start the scheduled pushers
    PushSchedulerStarter.startPusher(pushPeriod);
    PushSchedulerStarter.startCountPusher(10);
    // start the worker reconnector
    WorkerRetryConnector.retryConnectWorkers();
    registEventBus();
    EtcdStarter starter = new EtcdStarter();
    // start all etcd-related watchers
    starter.start();
}
This method has five main functions:
① Set the maximum value of the local cache (caffeine) and create an etcd instance
// set caffeine's maximum capacity
Context.CAFFEINE_SIZE = caffeineSize;
// set the etcd address
EtcdConfigFactory.buildConfigCenter(etcdServer);
caffeineSize is the maximum value of the local cache, which can be set at startup, and defaults to 200000 if it is not set.
etcdServer is the etcd cluster address mentioned above.
Context can be understood as a configuration class, which contains two fields:
public class Context {
public static String APP_NAME;
public static int CAFFEINE_SIZE;
}
EtcdConfigFactory is the factory class for the etcd configuration center.
public class EtcdConfigFactory {
private static IConfigCenter configCenter;
private EtcdConfigFactory() {}
public static IConfigCenter configCenter() {
return configCenter;
}
public static void buildConfigCenter(String etcdServer) {
// when connecting to multiple nodes, separate them with commas
configCenter = JdEtcdBuilder.build(etcdServer);
}
}
Its configCenter() method returns the etcd client instance built above; the IConfigCenter interface encapsulates the behavior of that etcd client (basic CRUD, watching, lease renewal, and so on).
② Create and start timed tasks: PushSchedulerStarter
// start the scheduled pushers
PushSchedulerStarter.startPusher(pushPeriod);   // push candidate keys every 0.5s
PushSchedulerStarter.startCountPusher(10);      // push hit-count statistics every 10s (not configurable)
pushPeriod is the push interval, configurable at startup with a minimum of 0.05s. The shorter it is, the denser and faster the detection, but the client consumes correspondingly more resources.
PushSchedulerStarter class
/**
* 每0.5秒推送一次待测key
*/
public static void startPusher(Long period) {
if (period == null || period <= 0) {
period = 500L;
}
@SuppressWarnings("PMD.ThreadPoolCreationRule")
ScheduledExecutorService scheduledExecutorService = Executors.newSingleThreadScheduledExecutor(new NamedThreadFactory("hotkey-pusher-service-executor", true));
scheduledExecutorService.scheduleAtFixedRate(() -> {
//热key的收集器
IKeyCollector<HotKeyModel, HotKeyModel> collectHK = KeyHandlerFactory.getCollector();
//这里相当于每0.5秒,通过netty来给worker来推送收集到的热key的信息,主要是一些热key的元数据信息(热key来源的app和key的类型和是否是删除事件,还有该热key的上报次数)
//这里面还有就是该热key在每次上报的时候都会生成一个全局的唯一id,还有该热key每次上报的创建时间是在netty发送的时候来生成,同一批次的热key时间是相同的
List<HotKeyModel> hotKeyModels = collectHK.lockAndGetResult();
if(CollectionUtil.isNotEmpty(hotKeyModels)){
//积攒了半秒的key集合,按照hash分发到不同的worker
KeyHandlerFactory.getPusher().send(Context.APP_NAME, hotKeyModels);
collectHK.finishOnce();
}
},0, period, TimeUnit.MILLISECONDS);
}
/**
* 每10秒推送一次数量统计
*/
public static void startCountPusher(Integer period) {
if (period == null || period <= 0) {
period = 10;
}
@SuppressWarnings("PMD.ThreadPoolCreationRule")
ScheduledExecutorService scheduledExecutorService = Executors.newSingleThreadScheduledExecutor(new NamedThreadFactory("hotkey-count-pusher-service-executor", true));
scheduledExecutorService.scheduleAtFixedRate(() -> {
IKeyCollector<KeyHotModel, KeyCountModel> collectHK = KeyHandlerFactory.getCounter();
List<KeyCountModel> keyCountModels = collectHK.lockAndGetResult();
if(CollectionUtil.isNotEmpty(keyCountModels)){
//积攒了10秒的数量,按照hash分发到不同的worker
KeyHandlerFactory.getPusher().sendCount(Context.APP_NAME, keyCountModels);
collectHK.finishOnce();
}
},0, period, TimeUnit.SECONDS);
}
From these two methods we can see that the scheduled tasks run on single-threaded scheduled executors, and both use daemon threads.
Let's focus on the KeyHandlerFactory class, a neat piece of design on the client side - literally the "key handler factory". Its concrete instance is DefaultKeyHandler:
public class DefaultKeyHandler {
// pusher that sends HotKeyMsg messages over Netty
private IKeyPusher iKeyPusher = new NettyKeyPusher();
// collector for candidate keys; it holds two maps whose key is the hot key name and whose
// value is the hot key's metadata (source app, key type, whether it is a delete event)
private IKeyCollector<HotKeyModel, HotKeyModel> iKeyCollector = new TurnKeyCollector();
// hit-count collector; it holds two maps whose key is the rule, with a HitCount value
// recording the rule's total hits and post-hot hits
private IKeyCollector<KeyHotModel, KeyCountModel> iKeyCounter = new TurnCountCollector();
public IKeyPusher keyPusher() {
return iKeyPusher;
}
public IKeyCollector<HotKeyModel, HotKeyModel> keyCollector() {
return iKeyCollector;
}
public IKeyCollector<KeyHotModel, KeyCountModel> keyCounter() {
return iKeyCounter;
}
}
It has three member objects: NettyKeyPusher, which encapsulates pushing messages over netty; TurnKeyCollector, the candidate-key collector; and TurnCountCollector, the hit-count collector. The latter two implement the IKeyCollector interface, which aggregates hotkey handling cleanly and keeps the code highly cohesive.
Let's first take a look at the NettyKeyPusher that encapsulates the push message to netty:
/**
* 将msg推送到netty的pusher
* @author wuweifeng wrote on 2020-01-06
* @version 1.0
*/
public class NettyKeyPusher implements IKeyPusher {
@Override
public void send(String appName, List<HotKeyModel> list) {
//积攒了半秒的key集合,按照hash分发到不同的worker
long now = System.currentTimeMillis();
Map<Channel, List<HotKeyModel>> map = new HashMap<>();
for(HotKeyModel model : list) {
model.setCreateTime(now);
Channel channel = WorkerInfoHolder.chooseChannel(model.getKey());
if (channel == null) {
continue;
}
List<HotKeyModel> newList = map.computeIfAbsent(channel, k -> new ArrayList<>());
newList.add(model);
}
for (Channel channel : map.keySet()) {
try {
List<HotKeyModel> batch = map.get(channel);
HotKeyMsg hotKeyMsg = new HotKeyMsg(MessageType.REQUEST_NEW_KEY, Context.APP_NAME);
hotKeyMsg.setHotKeyModels(batch);
channel.writeAndFlush(hotKeyMsg).sync();
} catch (Exception e) {
try {
InetSocketAddress insocket = (InetSocketAddress) channel.remoteAddress();
JdLogger.error(getClass(),"flush error " + insocket.getAddress().getHostAddress());
} catch (Exception ex) {
JdLogger.error(getClass(),"flush error");
}
}
}
}
@Override
public void sendCount(String appName, List<KeyCountModel> list) {
//积攒了10秒的数量,按照hash分发到不同的worker
long now = System.currentTimeMillis();
Map<Channel, List<KeyCountModel>> map = new HashMap<>();
for(KeyCountModel model : list) {
model.setCreateTime(now);
Channel channel = WorkerInfoHolder.chooseChannel(model.getRuleKey());
if (channel == null) {
continue;
}
List<KeyCountModel> newList = map.computeIfAbsent(channel, k -> new ArrayList<>());
newList.add(model);
}
for (Channel channel : map.keySet()) {
try {
List<KeyCountModel> batch = map.get(channel);
HotKeyMsg hotKeyMsg = new HotKeyMsg(MessageType.REQUEST_HIT_COUNT, Context.APP_NAME);
hotKeyMsg.setKeyCountModels(batch);
channel.writeAndFlush(hotKeyMsg).sync();
} catch (Exception e) {
try {
InetSocketAddress insocket = (InetSocketAddress) channel.remoteAddress();
JdLogger.error(getClass(),"flush error " + insocket.getAddress().getHostAddress());
} catch (Exception ex) {
JdLogger.error(getClass(),"flush error");
}
}
}
}
}
send(String appName, List<HotKeyModel> list)
Pushes the candidate keys collected by TurnKeyCollector to the worker via netty. A HotKeyModel mainly carries a hot key's metadata (the source app, the key type, whether it is a delete event, and the key's report count).
sendCount(String appName, List<KeyCountModel> list)
Pushes the keys (grouped by rule) collected by TurnCountCollector to the worker via netty. A KeyCountModel mainly carries the rule information for a key and its access count.
WorkerInfoHolder.chooseChannel(model.getRuleKey())
Uses a hash to pick the worker responsible for the key and sends the data down that worker's Channel connection, so the worker side can scale horizontally without pressure; a rough sketch of the idea follows.
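The following is only a rough sketch of that routing idea (not the project's exact chooseChannel code): hash the key against the ordered worker list so the same key always routes to the same worker connection.
import java.util.List;

// Sketch: pick the worker responsible for a key by hashing the key
// against the ordered worker list.
static <T> T chooseByHash(List<T> workers, String key) {
    if (workers == null || workers.isEmpty()) {
        return null;
    }
    int index = Math.abs(key.hashCode()) % workers.size();
    return workers.get(index);
}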
Let's analyze the key collectors: TurnKeyCollector and TurnCountCollector:
Implement the IKeyCollector interface:
/**
 * Aggregates hot keys
 * @author wuweifeng wrote on 2020-01-06
 * @version 1.0
 */
public interface IKeyCollector<T, V> {
    /**
     * Result returned after locking
     */
    List<V> lockAndGetResult();
    /**
     * Collect the input
     */
    void collect(T t);

    void finishOnce();
}
lockAndGetResult()
Returns the information gathered by the collect method and clears the locally buffered data, so accumulation can start fresh in the next statistical cycle.
collect(T t)
As the name suggests, it stores the key information gathered from API calls locally.
finishOnce()
The current implementation of this method is empty and can be ignored.
Key collector to be tested: TurnKeyCollector
public class TurnKeyCollector implements IKeyCollector<HotKeyModel, HotKeyModel> {
//这map里面的key主要是热key的名字,value主要是热key的元数据信息(比如:热key来源的app和key的类型和是否是删除事件)
private ConcurrentHashMap<String, HotKeyModel> map0 = new ConcurrentHashMap<>();
private ConcurrentHashMap<String, HotKeyModel> map1 = new ConcurrentHashMap<>();
private AtomicLong atomicLong = new AtomicLong(0);
@Override
public List<HotKeyModel> lockAndGetResult() {
//自增后,对应的map就会停止被写入,等待被读取
atomicLong.addAndGet(1);
List<HotKeyModel> list;
//可以观察这里与collect方法里面的相同位置,会发现一个是操作map0一个是操作map1,这样保证在读map的时候,不会阻塞写map,
//两个map同时提供轮流提供读写能力,设计的很巧妙,值得学习
if (atomicLong.get() % 2 == 0) {
list = get(map1);
map1.clear();
} else {
list = get(map0);
map0.clear();
}
return list;
}
private List<HotKeyModel> get(ConcurrentHashMap<String, HotKeyModel> map) {
return CollectionUtil.list(false, map.values());
}
@Override
public void collect(HotKeyModel hotKeyModel) {
String key = hotKeyModel.getKey();
if (StrUtil.isEmpty(key)) {
return;
}
if (atomicLong.get() % 2 == 0) {
//不存在时返回null并将key-value放入,已有相同key时,返回该key对应的value,并且不覆盖
HotKeyModel model = map0.putIfAbsent(key, hotKeyModel);
if (model != null) {
//增加该hotMey上报的次数
model.add(hotKeyModel.getCount());
}
} else {
HotKeyModel model = map1.putIfAbsent(key, hotKeyModel);
if (model != null) {
model.add(hotKeyModel.getCount());
}
}
}
@Override
public void finishOnce() {}
}
This class holds two ConcurrentHashMaps and one AtomicLong. By incrementing the AtomicLong and taking it modulo 2, reads and writes are steered to the two maps alternately: while one map is being read, the other accepts writes, so the same map is never read and written at the same time. This avoids blocking between concurrent collection reads and writes; the lock-free double-buffer design is very clever and greatly improves collection throughput.
Key number collector: TurnCountCollector
Its design is similar to TurnKeyCollector, so we will not repeat it. One thing worth mentioning is its parallel processing mechanism: when the number of collected entries exceeds the threshold DATA_CONVERT_SWITCH_THRESHOLD = 5000, lockAndGetResult switches to Java parallel streams to improve processing efficiency, as sketched below.
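Here is a hedged sketch of that threshold switch. The threshold name follows the article; the generic types and mapper are placeholders, not the project's actual conversion code.
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Sketch: small batches are converted with an ordinary stream, large batches
// (over DATA_CONVERT_SWITCH_THRESHOLD) with a parallel stream.
public class ThresholdConvert {
    static final int DATA_CONVERT_SWITCH_THRESHOLD = 5000;

    static <K, V, R> List<R> convert(Map<K, V> buffer, Function<V, R> mapper) {
        if (buffer.size() > DATA_CONVERT_SWITCH_THRESHOLD) {
            return buffer.values().parallelStream().map(mapper).collect(Collectors.toList());
        }
        return buffer.values().stream().map(mapper).collect(Collectors.toList());
    }
}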
③ Open the worker reconnector
// start the worker reconnector
WorkerRetryConnector.retryConnectWorkers();
public class WorkerRetryConnector {
/**
* Periodically retry connecting to workers that are not yet connected
*/
public static void retryConnectWorkers() {
@SuppressWarnings("PMD.ThreadPoolCreationRule")
ScheduledExecutorService scheduledExecutorService = Executors.newSingleThreadScheduledExecutor(new NamedThreadFactory("worker-retry-connector-service-executor", true));
//开启拉取etcd的worker信息,如果拉取失败,则定时继续拉取
scheduledExecutorService.scheduleAtFixedRate(WorkerRetryConnector::reConnectWorkers, 30, 30, TimeUnit.SECONDS);
}
private static void reConnectWorkers() {
List<String> nonList = WorkerInfoHolder.getNonConnectedWorkers();
if (nonList.size() == 0) {
return;
}
JdLogger.info(WorkerRetryConnector.class, "trying to reConnect to these workers :" + nonList);
NettyClient.getInstance().connect(nonList);//这里会触发netty连接方法channelActive
}
}
This also runs on a scheduled thread; the interval is fixed at 30s and cannot be configured.
The client's worker connection information is managed by WorkerInfoHolder. The connections are kept in a CopyOnWriteArrayList - after all, this is a read-heavy, write-light scenario, similar to metadata.
/**
* 保存worker的ip地址和Channel的映射关系,这是有序的。每次client发送消息时,都会根据该map的size进行hash
* 如key-1就发送到workerHolder的第1个Channel去,key-2就发到第2个Channel去
*/
private static final List<Server> WORKER_HOLDER = new CopyOnWriteArrayList<>();
④ Register the EventBus event subscribers
private void registEventBus() {
    // the netty connector subscribes to WorkerInfoChangeEvent
    EventBusCenter.register(new WorkerChangeSubscriber());
    // the hot key detection callback subscribes to hot key events
    EventBusCenter.register(new ReceiveNewKeySubscribe());
    // rule change events
    EventBusCenter.register(new KeyRuleHolder());
}
The client uses Guava's EventBus as an event message bus to decouple components via the publish/subscribe pattern, achieving multi-component communication with very little code.
The basic schematic is as follows:
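A minimal Guava EventBus example illustrates the register / @Subscribe / post pattern the client relies on; the event and subscriber names here are illustrative, not the project's.
import com.google.common.eventbus.EventBus;
import com.google.common.eventbus.Subscribe;

// Minimal Guava EventBus demo: register a subscriber, post an event,
// and the @Subscribe method matching the event type is invoked.
public class EventBusDemo {
    static class DemoEvent {
        final String payload;
        DemoEvent(String payload) { this.payload = payload; }
    }

    static class DemoSubscriber {
        @Subscribe
        public void onEvent(DemoEvent event) {
            System.out.println("received: " + event.payload);
        }
    }

    public static void main(String[] args) {
        EventBus eventBus = new EventBus();
        eventBus.register(new DemoSubscriber()); // like EventBusCenter.register(...)
        eventBus.post(new DemoEvent("hello"));   // like EventBusCenter.getInstance().post(...)
    }
}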
Monitor worker information changes: WorkerChangeSubscriber
/**
* 监听worker信息变动
*/
@Subscribe
public void connectAll(WorkerInfoChangeEvent event) {
List<String> addresses = event.getAddresses();
if (addresses == null) {
addresses = new ArrayList<>();
}
WorkerInfoHolder.mergeAndConnectNew(addresses);
}
/**
* 当client与worker的连接断开后,删除
*/
@Subscribe
public void channelInactive(ChannelInactiveEvent inactiveEvent) {
//获取断线的channel
Channel channel = inactiveEvent.getChannel();
InetSocketAddress socketAddress = (InetSocketAddress) channel.remoteAddress();
String address = socketAddress.getHostName() + ":" + socketAddress.getPort();
JdLogger.warn(getClass(), "this channel is inactive : " + socketAddress + " trying to remove this connection");
WorkerInfoHolder.dealChannelInactive(address);
}
Listen to the hot key callback event: ReceiveNewKeySubscribe
private ReceiveNewKeyListener receiveNewKeyListener = new DefaultNewKeyListener();
@Subscribe
public void newKeyComing(ReceiveNewKeyEvent event) {
HotKeyModel hotKeyModel = event.getModel();
if (hotKeyModel == null) {
return;
}
//收到新key推送
if (receiveNewKeyListener != null) {
receiveNewKeyListener.newKey(hotKeyModel);
}
}
When this method receives a new hot key event, it hands the HotKeyModel to the registered ReceiveNewKeyListener (DefaultNewKeyListener) for processing.
Core processing logic: DefaultNewKeyListener#newKey:
@Override
public void newKey(HotKeyModel hotKeyModel) {
long now = System.currentTimeMillis();
//如果key到达时已经过去1秒了,记录一下。手工删除key时,没有CreateTime
if (hotKeyModel.getCreateTime() != 0 && Math.abs(now - hotKeyModel.getCreateTime()) > 1000) {
JdLogger.warn(getClass(), "the key comes too late : " + hotKeyModel.getKey() + " now " +
+now + " keyCreateAt " + hotKeyModel.getCreateTime());
}
if (hotKeyModel.isRemove()) {
//如果是删除事件,就直接删除
deleteKey(hotKeyModel.getKey());
return;
}
//已经是热key了,又推过来同样的热key,做个日志记录,并刷新一下
if (JdHotKeyStore.isHot(hotKeyModel.getKey())) {
JdLogger.warn(getClass(), "receive repeat hot key :" + hotKeyModel.getKey() + " at " + now);
}
addKey(hotKeyModel.getKey());
}
private void deleteKey(String key) {
CacheFactory.getNonNullCache(key).delete(key);
}
private void addKey(String key) {
ValueModel valueModel = ValueModel.defaultValue(key);
if (valueModel == null) {
//不符合任何规则
deleteKey(key);
return;
}
//If the original key already exists, then the value will be reset, and the expiration time will also be reset. If the original does not exist, the newly added hot key
JdHotKeyStore.setValueDirectly(key, valueModel);
}
If the HotKeyModel is a delete event, look up the caffeine cache for the key's timeout in RULE_CACHE_MAP, remove the key from it, and return (this is equivalent to deleting the local cache entry).
If it is not a delete event, add the key to the caffeine cache corresponding to its rule in RULE_CACHE_MAP.
One thing to note: for a non-delete event, the value that addKey() puts into caffeine is only a default placeholder (ValueModel.defaultValue), which marks the key as hot; it is not the real business value, so a get() on the key still returns null until the business code sets the value.
Listen to Rule change events: KeyRuleHolder
You can see that there are two member attributes: RULE_CACHE_MAP, KEY_RULES
/**
* 保存超时时间和caffeine的映射,key是超时时间,value是caffeine[(String,Object)]
*/
private static final ConcurrentHashMap<Integer, LocalCache> RULE_CACHE_MAP = new ConcurrentHashMap<>();
/**
* 这里KEY_RULES是保存etcd里面该appName所对应的所有rule
*/
private static final List<KeyRule> KEY_RULES = new ArrayList<>();
ConcurrentHashMap<Integer, LocalCache> RULE_CACHE_MAP:
Holds the mapping from timeout to caffeine cache: the key is the timeout and the value is a caffeine cache of (String, Object).
Clever design: the key's expiration time is used as the bucketing strategy, so keys with the same expiration time land in the same bucket (one caffeine cache). Each of these caffeine caches is the client's local cache for hot keys - the cached KV pairs actually live there. A sketch of this idea follows below.
List<KeyRule> KEY_RULES:
KEY_RULES holds all the rules configured in etcd for this appName.
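To make the bucketing idea concrete, here is a hedged sketch using Caffeine directly; it is illustrative only and does not use the project's LocalCache/CacheFactory wrappers.
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// Sketch of "bucket caches by expiration time": one Caffeine cache per rule
// duration, created lazily on first use.
public class DurationBucketedCaches {
    private final ConcurrentHashMap<Integer, Cache<String, Object>> buckets = new ConcurrentHashMap<>();

    Cache<String, Object> cacheFor(int durationSeconds) {
        return buckets.computeIfAbsent(durationSeconds, d ->
                Caffeine.newBuilder()
                        .expireAfterWrite(d, TimeUnit.SECONDS)
                        .maximumSize(200_000)   // the article mentions a default of 200000
                        .build());
    }
}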
Specifically listening to the KeyRuleInfoChangeEvent event method:
@Subscribe
public void ruleChange(KeyRuleInfoChangeEvent event) {
JdLogger.info(getClass(), "new rules info is :" + event.getKeyRules());
List<KeyRule> ruleList = event.getKeyRules();
if (ruleList == null) {
return;
}
putRules(ruleList);
}
Core processing logic: KeyRuleHolder#putRules:
/**
* 所有的规则,如果规则的超时时间变化了,会重建caffeine
*/
public static void putRules(List<KeyRule> keyRules) {
synchronized (KEY_RULES) {
//如果规则为空,清空规则表
if (CollectionUtil.isEmpty(keyRules)) {
KEY_RULES.clear();
RULE_CACHE_MAP.clear();
return;
}
KEY_RULES.clear();
KEY_RULES.addAll(keyRules);
Set<Integer> durationSet = keyRules.stream().map(KeyRule::getDuration).collect(Collectors.toSet());
for (Integer duration : RULE_CACHE_MAP.keySet()) {
//先清除掉那些在RULE_CACHE_MAP里存的,但是rule里已没有的
if (!durationSet.contains(duration)) {
RULE_CACHE_MAP.remove(duration);
}
}
//遍历所有的规则
for (KeyRule keyRule : keyRules) {
int duration = keyRule.getDuration();
//这里如果RULE_CACHE_MAP里面没有超时时间为duration的value,则新建一个放入到RULE_CACHE_MAP里面
//比如RULE_CACHE_MAP本来就是空的,则在这里来构建RULE_CACHE_MAP的映射关系
//TODO 如果keyRules里面包含相同duration的keyRule,则也只会建一个key为duration,value为caffeine,其中caffeine是(string,object)
if (RULE_CACHE_MAP.get(duration) == null) {
LocalCache cache = CacheFactory.build(duration);
RULE_CACHE_MAP.put(duration, cache);
}
}
}
}
Use the synchronized keyword to ensure thread safety;
If the rule is empty, clear the rule table (RULE_CACHE_MAP, KEY_RULES);
Override KEY_RULES with the passed in keyRules;
Clear the mapping relationship that is not in keyRules in RULE_CACHE_MAP;
Traverse all keyRules; if RULE_CACHE_MAP has no caffeine cache for a rule's timeout, create one and put it in.
⑤ Start EtcdStarter (etcd connection manager)
EtcdStarter starter = new EtcdStarter();
// start all etcd-related watchers
starter.start();
public void start() {
fetchWorkerInfo();
fetchRule();
startWatchRule();
// watch hot key events; only manually added/deleted keys are watched here
startWatchHotKey();
}
fetchWorkerInfo()
Pull worker cluster address information allAddress from etcd and update WORKER_HOLDER in WorkerInfoHolder
/**
* 每隔30秒拉取worker信息
*/
private void fetchWorkerInfo() {
ScheduledExecutorService scheduledExecutorService = Executors.newSingleThreadScheduledExecutor();
//开启拉取etcd的worker信息,如果拉取失败,则定时继续拉取
scheduledExecutorService.scheduleAtFixedRate(() -> {
JdLogger.info(getClass(), "trying to connect to etcd and fetch worker info");
fetch();
}, 0, 30, TimeUnit.SECONDS);
}
Runs on a single-threaded scheduled executor.
It periodically pulls from etcd at the path /jd/workers/+$appName (or default); the interval is fixed at 30 seconds and cannot be configured. The value stored there is each worker's ip+port.
It then posts a WorkerInfoChangeEvent.
Note: the path ends with either $appName or default, which is configured on the worker side. If a worker is placed under a specific appName, it only participates in computation for that app.
fetchRule()
Runs on a single-threaded scheduled executor with a fixed, non-configurable 5-second interval. Once the rule configuration and the manually configured hot keys are pulled successfully, the scheduled executor shuts itself down (in other words, it only needs to succeed once); on failure it keeps retrying.
private void fetchRule() {
ScheduledExecutorService scheduledExecutorService = Executors.newSingleThreadScheduledExecutor();
//开启拉取etcd的worker信息,如果拉取失败,则定时继续拉取
scheduledExecutorService.scheduleAtFixedRate(() -> {
JdLogger.info(getClass(), "trying to connect to etcd and fetch rule info");
boolean success = fetchRuleFromEtcd();
if (success) {
//拉取已存在的热key
fetchExistHotKey();
//这里如果拉取规则和拉取手动配置的hotKey成功之后,则该定时执行线程停止
scheduledExecutorService.shutdown();
}
}, 0, 5, TimeUnit.SECONDS);
}
fetchRuleFromEtcd()
Gets the rules configured for the appName from etcd at the path /jd/rules/+$appName.
If the rules turn out to be empty, the local rule configuration cache and all rule key caches are cleared via the published KeyRuleInfoChangeEvent.
Post the KeyRuleInfoChangeEvent event.
fetchExistHotKey()
Obtain the hot key manually configured by the appName from etcd, the address is /jd/hotkeys/+$appName.
The ReceiveNewKeyEvent event is posted, and the content HotKeyModel is not a delete event.
startWatchRule()
/**
* 异步监听rule规则变化
*/
private void startWatchRule() {
ExecutorService executorService = Executors.newSingleThreadExecutor();
executorService.submit(() -> {
JdLogger.info(getClass(), "--- begin watch rule change ----");
try {
IConfigCenter configCenter = EtcdConfigFactory.configCenter();
KvClient.WatchIterator watchIterator = configCenter.watch(ConfigConstant.rulePath + Context.APP_NAME);
//如果有新事件,即rule的变更,就重新拉取所有的信息
while (watchIterator.hasNext()) {
//这句必须写,next会让他卡住,除非真的有新rule变更
WatchUpdate watchUpdate = watchIterator.next();
List<Event> eventList = watchUpdate.getEvents();
JdLogger.info(getClass(), "rules info changed. begin to fetch new infos. rule change is " + eventList);
//全量拉取rule信息
fetchRuleFromEtcd();
}
} catch (Exception e) {
JdLogger.error(getClass(), "watch err");
}
});
}
Asynchronously monitor rule changes, and use etcd to monitor changes in nodes whose address is /jd/rules/+$appName.
Use a thread pool, single thread, and asynchronously monitor rule changes. If there is an event change, call the fetchRuleFromEtcd() method.
startWatchHotKey()
Asynchronously start monitoring hot key change information, use etcd to listen to the address prefix as /jd/hotkeys/+$appName
/**
* 异步开始监听热key变化信息,该目录里只有手工添加的key信息
*/
private void startWatchHotKey() {
ExecutorService executorService = Executors.newSingleThreadExecutor();
executorService.submit(() -> {
JdLogger.info(getClass(), "--- begin watch hotKey change ----");
IConfigCenter configCenter = EtcdConfigFactory.configCenter();
try {
KvClient.WatchIterator watchIterator = configCenter.watchPrefix(ConfigConstant.hotKeyPath + Context.APP_NAME);
//如果有新事件,即新key产生或删除
while (watchIterator.hasNext()) {
WatchUpdate watchUpdate = watchIterator.next();
List<Event> eventList = watchUpdate.getEvents();
KeyValue keyValue = eventList.get(0).getKv();
Event.EventType eventType = eventList.get(0).getType();
try {
//从这个地方可以看出,etcd给的返回是节点的全路径,而我们需要的key要去掉前缀
String key = keyValue.getKey().toStringUtf8().replace(ConfigConstant.hotKeyPath + Context.APP_NAME + "/", "");
//如果是删除key,就立刻删除
if (Event.EventType.DELETE == eventType) {
HotKeyModel model = new HotKeyModel();
model.setRemove(true);
model.setKey(key);
EventBusCenter.getInstance().post(new ReceiveNewKeyEvent(model));
} else {
HotKeyModel model = new HotKeyModel();
model.setRemove(false);
String value = keyValue.getValue().toStringUtf8();
//新增热key
JdLogger.info(getClass(), "etcd receive new key : " + key + " --value:" + value);
//如果这是一个删除指令,就什么也不干
//TODO 这里有个疑问,监听到worker自动探测发出的惰性删除指令,这里之间跳过了,但是本地缓存没有更新吧?
//TODO 所以我猜测在客户端使用判断缓存是否存在的api里面,应该会判断相关缓存的value值是否为"#[DELETE]#"删除标记
//解疑:这里确实只监听手工配置的hotKey,etcd的/jd/hotkeys/+$appName该地址只是手动配置hotKey,worker自动探测的hotKey是直接通过netty通道来告知client的
if (Constant.DEFAULT_DELETE_VALUE.equals(value)) {
continue;
}
//手工创建的value是时间戳
model.setCreateTime(Long.valueOf(keyValue.getValue().toStringUtf8()));
model.setKey(key);
EventBusCenter.getInstance().post(new ReceiveNewKeyEvent(model));
}
} catch (Exception e) {
JdLogger.error(getClass(), "new key err :" + keyValue);
}
}
} catch (Exception e) {
JdLogger.error(getClass(), "watch err");
}
});
}
It uses a single-threaded executor to asynchronously monitor hot key changes, watching the prefix path in etcd so that changes to the node and all of its children are observed.
On a delete-node event, it publishes a ReceiveNewKeyEvent whose HotKeyModel is marked as a delete event.
On a put (add/update) event, it first checks whether the value is the delete marker #[DELETE]#.
If the value is the delete marker, the node was written as part of a delete instruction (whether from the worker's automatic detection or from a client deleting the key), and nothing is done - it is simply skipped. As HotKeyPusher#push shows, on a delete the client first writes a node under /jd/hotkeys/+$appName whose value is the delete marker, and then deletes the node at the same path, so that the delete-node event above can fire; that is why the delete marker itself is skipped here.
For values that are not the delete marker, a ReceiveNewKeyEvent is published, and the createTime in the HotKeyModel is the timestamp stored in the kv. Question: the code comment says only manually added or deleted hot keys are watched here - does that mean the /jd/hotkeys/+$appName path only holds manually configured keys?
2. API analysis
1) Flow chart ① Query flow
② Deletion process:
From the flow charts above you should have a rough idea of how a hot key flows through the code. Below I explain the core API from a source-code perspective; due to space limits we will not paste every piece of source code, but simply describe the internal logic.
2) Core class: JdHotKeyStore
JdHotKeyStore is the core API class that wraps client calls. It contains the 10 public methods shown above; we focus on 6 of them:
① isHotKey(String key)
Determine whether it is in the rule, if not, return false
Determine whether the key is currently hot. If it is not hot, or it is hot but within 2s of expiring, collect it via TurnKeyCollector#collect, and finally record statistics via TurnCountCollector#collect.
② get(String key)
If the value read from the local caffeine is the magic placeholder value, it only means the key has been marked hot in the caffeine cache; the query result is still null.
③ smartSet(String key, Object value)
Check whether the key is a hot key and whether it is covered by a rule. If it is a hot key, set its value; if not, do nothing.
④ forceSet(String key, Object value)
Forcefully set the value. If the key is not covered by any rule configuration, the passed value does not take effect, and the local cache value is set to null instead.
⑤ getValue(String key, KeyType keyType)
Get the value, if the value does not exist, call the HotKeyPusher#push method to send it to netty
If there is no rule configured for the key, there is no need to report the key and return null directly
If the obtained value is a magic value, it only means that it is added to the caffeine cache, and the query is null
⑥ remove(String key)
Deletes the key from the local caffeine cache and notifies the whole cluster to delete it (the notification is propagated via etcd).
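Putting the methods above together, a typical business read path might look like the hedged sketch below. Only isHotKey / get / smartSet are taken from the analysis above; queryFromDb and the key naming are placeholders.
// Sketch of typical business usage of the API analyzed above.
public Object getGoods(String goodsId) {
    String key = "goods:" + goodsId;          // key naming is illustrative
    if (JdHotKeyStore.isHotKey(key)) {        // also reports the key for detection
        Object cached = JdHotKeyStore.get(key);
        if (cached != null) {
            return cached;                    // served from the local caffeine cache
        }
        Object value = queryFromDb(goodsId);  // placeholder for the real data source
        JdHotKeyStore.smartSet(key, value);   // only caches if the key is really hot
        return value;
    }
    return queryFromDb(goodsId);              // not hot: go to Redis/DB as usual
}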
3) Entry class for the client to report hot keys: HotKeyPusher
Core method:
public static void push(String key, KeyType keyType, int count, boolean remove) {
if (count <= 0) {
count = 1;
}
if (keyType == null) {
keyType = KeyType.REDIS_KEY;
}
if (key == null) {
return;
}
//这里之所以用LongAdder是为了保证多线程计数的线程安全性,虽然这里是在方法内调用的,但是在TurnKeyCollector的两个map里面,
//存储了HotKeyModel的实例对象,这样在多个线程同时修改count的计数属性时,会存在线程安全计数不准确问题
LongAdder adderCnt = new LongAdder();
adderCnt.add(count);
HotKeyModel hotKeyModel = new HotKeyModel();
hotKeyModel.setAppName(Context.APP_NAME);
hotKeyModel.setKeyType(keyType);
hotKeyModel.setCount(adderCnt);
hotKeyModel.setRemove(remove);
hotKeyModel.setKey(key);
if (remove) {
//如果是删除key,就直接发到etcd去,不用做聚合。但是有点问题现在,这个删除只能删手工添加的key,不能删worker探测出来的
//因为各个client都在监听手工添加的那个path,没监听自动探测的path。所以如果手工的那个path下,没有该key,那么是删除不了的。
//删不了,就达不到集群监听删除事件的效果,怎么办呢?可以通过新增的方式,新增一个热key,然后删除它
//TODO 这里为啥不直接删除该节点,难道worker自动探测处理的hotKey不会往该节点增加新增事件吗?
//释疑:worker根据探测配置的规则,当判断出某个key为hotKey后,确实不会往keyPath里面加入节点,他只是单纯的往本地缓存里面加入一个空值,代表是热点key
EtcdConfigFactory.configCenter().putAndGrant(HotKeyPathTool.keyPath(hotKeyModel), Constant.DEFAULT_DELETE_VALUE, 1);
EtcdConfigFactory.configCenter().delete(HotKeyPathTool.keyPath(hotKeyModel));//TODO 这里很巧妙待补充描述
//也删worker探测的目录
EtcdConfigFactory.configCenter().delete(HotKeyPathTool.keyRecordPath(hotKeyModel));
} else {
//如果key是规则内的要被探测的key,就积累等待传送
if (KeyRuleHolder.isKeyInRule(key)) {
//积攒起来,等待每半秒发送一次
KeyHandlerFactory.getCollector().collect(hotKeyModel);
}
}
}
From the source code above:
LongAdder is used to keep the multi-threaded counting thread-safe. Although the adder is created inside the method, the HotKeyModel instances are stored in TurnKeyCollector's two maps, so multiple threads may update the same instance's count concurrently; without it the count would be inaccurate.
For a remove (delete) event, in addition to deleting the manually configured hot key path, the path the dashboard uses to display hot keys is also deleted.
Only the key configured in the rule will be detected and sent to the worker for calculation.
3. Communication mechanism (interacting with workers)
1) NettyClient:netty connector
public class NettyClient {
private static final NettyClient nettyClient = new NettyClient();
private Bootstrap bootstrap;
public static NettyClient getInstance() {
return nettyClient;
}
private NettyClient() {
if (bootstrap == null) {
bootstrap = initBootstrap();
}
}
private Bootstrap initBootstrap() {
//少线程
EventLoopGroup group = new NioEventLoopGroup(2);
Bootstrap bootstrap = new Bootstrap();
NettyClientHandler nettyClientHandler = new NettyClientHandler();
bootstrap.group(group).channel(NioSocketChannel.class)
.option(ChannelOption.SO_KEEPALIVE, true)
.option(ChannelOption.TCP_NODELAY, true)
.handler(new ChannelInitializer<SocketChannel>() {
@Override
protected void initChannel(SocketChannel ch) {
ByteBuf delimiter = Unpooled.copiedBuffer(Constant.DELIMITER.getBytes());
ch.pipeline()
.addLast(new DelimiterBasedFrameDecoder(Constant.MAX_LENGTH, delimiter))//这里就是定义TCP多个包之间的分隔符,为了更好的做拆包
.addLast(new MsgDecoder())
.addLast(new MsgEncoder())
//30秒没消息时,就发心跳包过去
.addLast(new IdleStateHandler(0, 0, 30))
.addLast(nettyClientHandler);
}
});
return bootstrap;
}
}
It uses the Reactor thread model with only 2 worker threads; no separate boss thread group is configured (this is a client-side long connection), and TCP_NODELAY is enabled.
Netty's delimiter "$( )$" acts as the boundary between TCP messages, which makes frame splitting easier.
Protobuf is used for serialization and deserialization.
When no message has been exchanged with the peer for 30s, a heartbeat packet is sent; idle events are handled by the worker-thread handler NettyClientHandler.
JD hotkey's TCP protocol sends and receives strings, with each TCP message delimited by the special character sequence $( )$. Advantage: this is very simple to implement.
After a message frame is obtained, it is deserialized with json or protobuf.
Disadvantage: each message goes through two layers of deserialization - byte stream → string → message object - which costs some performance.
Fortunately protobuf serialization is fast; json serialization only reaches hundreds of thousands of operations per second, which does consume some performance.
2) NettyClientHandler: Worker thread processor
@ChannelHandler.Sharable
public class NettyClientHandler extends SimpleChannelInboundHandler<HotKeyMsg> {
@Override
public void userEventTriggered(ChannelHandlerContext ctx, Object evt) throws Exception {
if (evt instanceof IdleStateEvent) {
IdleStateEvent idleStateEvent = (IdleStateEvent) evt;
//这里表示如果读写都挂了
if (idleStateEvent.state() == IdleState.ALL_IDLE) {
//向服务端发送消息
ctx.writeAndFlush(new HotKeyMsg(MessageType.PING, Context.APP_NAME));
}
}
super.userEventTriggered(ctx, evt);
}
//在Channel注册EventLoop、绑定SocketAddress和连接ChannelFuture的时候都有可能会触发ChannelInboundHandler的channelActive方法的调用
//类似TCP三次握手成功之后触发
@Override
public void channelActive(ChannelHandlerContext ctx) {
JdLogger.info(getClass(), "channelActive:" + ctx.name());
ctx.writeAndFlush(new HotKeyMsg(MessageType.APP_NAME, Context.APP_NAME));
}
//类似TCP四次挥手之后,等待2MSL时间之后触发(大概180s),比如channel通道关闭会触发(channel.close())
//客户端channel主动关闭连接时,会向服务端发送一个写请求,然后服务端channel所在的selector会监听到一个OP_READ事件,然后
//执行数据读取操作,而读取时发现客户端channel已经关闭了,则读取数据字节个数返回-1,然后执行close操作,关闭该channel对应的底层socket,
//并在pipeline中,从head开始,往下将InboundHandler,并触发handler的channelInactive和channelUnregistered方法的执行,以及移除pipeline中的handlers一系列操作。
@Override
public void channelInactive(ChannelHandlerContext ctx) throws Exception {
super.channelInactive(ctx);
//断线了,可能只是client和server断了,但都和etcd没断。也可能是client自己断网了,也可能是server断了
//发布断线事件。后续10秒后进行重连,根据etcd里的worker信息来决定是否重连,如果etcd里没了,就不重连。如果etcd里有,就重连
notifyWorkerChange(ctx.channel());
}
private void notifyWorkerChange(Channel channel) {
EventBusCenter.getInstance().post(new ChannelInactiveEvent(channel));
}
@Override
protected void channelRead0(ChannelHandlerContext channelHandlerContext, HotKeyMsg msg) {
if (MessageType.PONG == msg.getMessageType()) {
JdLogger.info(getClass(), "heart beat");
return;
}
if (MessageType.RESPONSE_NEW_KEY == msg.getMessageType()) {
JdLogger.info(getClass(), "receive new key : " + msg);
if (CollectionUtil.isEmpty(msg.getHotKeyModels())) {
return;
}
for (HotKeyModel model : msg.getHotKeyModels()) {
EventBusCenter.getInstance().post(new ReceiveNewKeyEvent(model));
}
}
}
}
userEventTriggered
When the connection has been idle (no reads or writes) for the configured period, a heartbeat new HotKeyMsg(MessageType.PING, Context.APP_NAME) is sent to the peer.
channelActive
channelActive may be triggered when the Channel registers with an EventLoop, binds a SocketAddress, or connects a ChannelFuture - roughly, after the TCP three-way handshake succeeds. It sends new HotKeyMsg(MessageType.APP_NAME, Context.APP_NAME) to the peer.
channelInactive
Roughly analogous to TCP's four-way close, triggered after the 2MSL wait (about 180s), for example when the channel is closed (channel.close()). It publishes a ChannelInactiveEvent; reconnection is attempted 10 seconds later.
channelRead0
On a PONG message it just logs and returns; on a RESPONSE_NEW_KEY message it publishes a ReceiveNewKeyEvent for each pushed HotKeyModel.
3.3.3 Worker side
1. Startup entry points: 7 @PostConstruct methods
1) The worker side handles etcd-related processing: EtcdStarter
① The first @PostConstruct: watchLog()
@PostConstruct
public void watchLog() {
AsyncPool.asyncDo(() -> {
try {
//取etcd的是否开启日志配置,地址/jd/logOn
String loggerOn = configCenter.get(ConfigConstant.logToggle);
LOGGER_ON = "true".equals(loggerOn) || "1".equals(loggerOn);
} catch (StatusRuntimeException ex) {
logger.error(ETCD_DOWN);
}
//监听etcd地址/jd/logOn是否开启日志配置,并实时更改开关
KvClient.WatchIterator watchIterator = configCenter.watch(ConfigConstant.logToggle);
while (watchIterator.hasNext()) {
WatchUpdate watchUpdate = watchIterator.next();
List<Event> eventList = watchUpdate.getEvents();
KeyValue keyValue = eventList.get(0).getKv();
logger.info("log toggle changed : " + keyValue);
String value = keyValue.getValue().toStringUtf8();
LOGGER_ON = "true".equals(value) || "1".equals(value);
}
});
}
The task runs asynchronously in a thread pool: it reads the log switch from etcd at /jd/logOn (default true).
It then watches /jd/logOn in etcd and updates the switch in real time. Because of the etcd watch, this task keeps running rather than executing just once.
② The second @PostConstruct: watch()
/**
* 启动回调监听器,监听rule变化
*/
@PostConstruct
public void watch() {
AsyncPool.asyncDo(() -> {
KvClient.WatchIterator watchIterat