
Currently, the most commonly used registries are Eureka, Zookeeper, Consul, and Nacos. I recently studied the overall architecture and implementation of these four registries, and for Nacos I also worked through the source code to see exactly how service registration and subscription are implemented. At the end, the differences between the four registries are compared.

One. Eureka

(Figure: Eureka architecture)

The Eureka Client in the upper-left corner of the figure is the service provider: it registers and renews its own information with the Eureka Server, and at the same time obtains information about other services from the Eureka Server registry. There are four concrete operations (a minimal client sketch follows the list):

  • Register: the client registers its own metadata with the server for service discovery;
  • Renew: the client maintains the validity of its instance metadata in the registry by sending heartbeats to the server; if the server receives no heartbeat within a certain period, it takes the instance offline by default and removes the instance information from the registry;
  • Cancel: when shutting down, the client actively deregisters its instance metadata from the server, and the instance data is then removed from the server's registry;
  • Get Registry: the client fetches registry information from the server for service discovery, so that it can initiate remote calls to other services.
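As a rough client-side illustration (a minimal sketch assuming Spring Cloud Netflix Eureka is on the classpath and eureka.client.service-url.defaultZone points at the Eureka Server; the class name is made up), enabling the client is usually enough, since register, renew and cancel are then handled automatically by the Eureka client library:

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.eureka.EnableEurekaClient;

// Hypothetical service provider: with the Eureka client starter on the classpath,
// registration, heartbeat renewal and deregistration on shutdown happen in the background.
@SpringBootApplication
@EnableEurekaClient
public class ProviderApplication {

    public static void main(String[] args) {
        SpringApplication.run(ProviderApplication.class, args);
    }
}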

Eureka Server (service registry): provides service registration and discovery. Each Eureka Client registers its own information with the Eureka Server and can also obtain the information of other services from it, so as to discover and invoke them.

Eureka Client (service consumer): obtains the information of the other services registered on the Eureka Server, then uses that information to find the required service and initiate remote calls.

Replicate (synchronous replication): registry information is replicated between Eureka Servers, so that the service instance data held by the different registries in an Eureka Server cluster stays consistent. Because replication between cluster members is performed over HTTP and the network is unreliable, there will inevitably be moments when the registries of different Eureka Servers are out of sync, so Eureka does not satisfy the C (consistency) in CAP.

Make Remote Call: the remote calls made between service clients.

Two. Zookeeper

2.1 Zookeeper overall framework

(Figure: Zookeeper cluster architecture)

  • Leader: the core of a Zookeeper cluster and the only scheduler and processor of transactional requests (write operations), which guarantees the ordering of transaction processing in the cluster; it is also the scheduler of the services inside the cluster. Write requests such as create, setData and delete must be forwarded to the Leader, which assigns them a sequence number and performs the operation; this process is called a transaction.
  • Follower: handles non-transactional (read) requests from clients, forwards transactional requests to the Leader, and takes part in electing the Leader.
  • Observer: a role added for Zookeeper clusters with heavy traffic. An Observer observes the latest state changes of the cluster and synchronizes them, handles non-transactional requests independently, and forwards transactional requests to the Leader. It does not take part in any form of voting and only serves requests. It is usually used to raise the cluster's non-transactional throughput and handle more concurrent requests without affecting its transactional processing capacity.

2.2 Zookeeper storage structure

The figure below shows the tree structure that ZooKeeper uses as its in-memory representation. ZooKeeper nodes are called znodes. Each znode is identified by a name, with path components separated by "/". At the top there is the root znode "/", and under the root there are two logical namespaces, config and workers: the config namespace is used for centralized configuration management and the workers namespace is used for naming.

Under the config namespace, each znode can store up to 1 MB of data. This is similar to a UNIX file system, except that a parent znode can also store data. The main purpose of this structure is to store synchronized data and to describe the znode's metadata. The structure is called the ZooKeeper data model, and every node in the ZooKeeper namespace is identified by its path.
(Figure: ZooKeeper tree storage structure)

A znode has the characteristics of both a file and a directory: like a file, it maintains data, length, meta-information, ACLs, timestamps and so on; like a directory, it can also serve as part of a path and have children:

  • Version number: every znode has a version number, which is incremented whenever the data associated with the znode changes. Version numbers are important when multiple ZooKeeper clients try to operate on the same znode.
  • Access Control List (ACL): essentially the authentication mechanism for accessing a znode; it governs all read and write operations on the znode.
  • Timestamp: records when the znode was created and last modified, usually in milliseconds. ZooKeeper identifies each change to a znode by its transaction ID (zxid). The zxid is unique and carries the time of each transaction, so the elapsed time from one request to another can easily be determined.
  • Data length: the total amount of data stored in the znode; at most 1 MB can be stored.

ZooKeeper also has the concept of ephemeral znodes: these znodes exist as long as the session that created them is active, and when the session ends they are deleted.
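As a small illustration of the data model (a hedged sketch using the standard org.apache.zookeeper Java client; the connection string and paths are made up for the example), a persistent and an ephemeral znode can be created like this:

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZnodeSketch {

    public static void main(String[] args) throws Exception {
        // connect to a (hypothetical) local ZooKeeper ensemble; no default watcher is registered here
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 30_000, null);

        // persistent znodes for the two namespaces shown in the figure; a parent znode can hold data too
        zk.create("/config", "central configuration".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        zk.create("/workers", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        // an ephemeral znode: it is removed automatically when this session ends
        zk.create("/workers/worker-1", "192.168.0.11:8080".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        // read back the data (version, timestamps, etc. could be fetched via a Stat object as well)
        byte[] data = zk.getData("/config", false, null);
        System.out.println(new String(data));

        zk.close();
    }
}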

2.3 Zookeeper monitoring function

ZooKeeper supports the concept of watches: a client can set a watch on a znode. When the znode changes, the watch is triggered and then removed, and the client receives a packet stating that the znode has changed. If the connection between the client and one of the ZooKeeper servers is broken, the client receives a local notification. New in 3.6.0: clients can also set permanent, recursive watches on a znode; these are not removed when triggered, and they fire recursively for changes to the znode and to all of its child znodes.
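Both watch styles can be sketched with the same Java client (again a hedged illustration; the paths are assumed to already exist): a classic one-shot watch registered through getData, and a 3.6.0+ permanent recursive watch registered through addWatch:

import org.apache.zookeeper.AddWatchMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class WatchSketch {

    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 30_000, null);

        // classic one-shot watch: fires once when /config changes, then it is removed
        Watcher oneShot = (WatchedEvent event) ->
                System.out.println("znode changed: " + event.getPath() + " " + event.getType());
        zk.getData("/config", oneShot, null);

        // 3.6.0+: a permanent recursive watch that keeps firing for /workers and all of its children
        zk.addWatch("/workers",
                event -> System.out.println("workers subtree changed: " + event.getPath()),
                AddWatchMode.PERSISTENT_RECURSIVE);

        Thread.sleep(60_000); // keep the session alive for a while so the watches can fire
    }
}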

2.4 Zookeeper election process

(Figure: Zookeeper node states)

A Zookeeper cluster generally requires at least three nodes to work. A Zookeeper node is normally in one of four states:

  • LOOKING: the node is looking for a Leader; in this state it needs to go through the election process
  • LEADING: the Leader state; a node in this state already has the Leader role
  • FOLLOWING: the Follower state, meaning a Leader has been elected and the current node's role is Follower
  • OBSERVER: the Observer state, meaning the current node's role is Observer. An Observer does not take part in elections and only accepts the election result; that is, it will never become the Leader, but it serves requests just like a Follower.

The leader selection process is shown in the figure below:
(Figure: Leader election process)

During cluster initialization, when a single server ZK1 starts, it cannot conduct and complete a Leader election on its own. When the second server ZK2 starts, the two machines can communicate with each other and each tries to find the Leader, so they enter the Leader election process, which proceeds as follows:

(1) Each server casts a vote. Since this is the initial state, ZK1 and ZK2 each vote for themselves as the Leader server. Each vote contains the proposed server's ID and ZXID (transaction ID), written as (ID, ZXID). At this point ZK1's vote is (1, 0) and ZK2's vote is (2, 0); each then sends its vote to the other machines in the cluster.

(2) Accept votes from the other servers. After a server in the cluster receives a vote, it first checks the vote's validity, for example whether it belongs to the current round of voting and whether it comes from a server in the LOOKING state.

(3) Process the votes. For each vote received, a server compares it against its own vote according to the following rules:

  • Compare the ZXIDs first: the server with the larger ZXID is preferred as the Leader.
  • If the ZXIDs are equal, compare the server IDs: the server with the larger ID becomes the Leader server.

For ZK1, its own vote is (1, 0) and ZK2's vote is (2, 0). The ZXIDs are compared first and are both 0, so the server IDs are compared next; ZK2's ID is larger, so ZK2 wins. ZK1 updates its vote to (2, 0) and resends it to ZK2.

(4) Count the votes. After each round of voting, a server tallies the voting information to determine whether more than half of the machines have accepted the same vote. For ZK1 and ZK2, both machines in the cluster have now accepted the (2, 0) vote, so ZK2 is considered elected as the Leader.

(5) Change server states. Once the Leader is determined, each server updates its own state: Followers change to FOLLOWING and the Leader changes to LEADING. When a new Zookeeper node ZK3 starts and finds that a Leader already exists, it does not start another election; its state simply changes from LOOKING to FOLLOWING.
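The vote-comparison rule from step (3) can be condensed into a few lines (a simplified illustration only, not ZooKeeper's actual FastLeaderElection code):

// Simplified illustration of the vote-comparison rule:
// prefer the larger ZXID; when ZXIDs are equal, prefer the larger server ID.
public class VoteRuleSketch {

    static final class Vote {
        final long serverId;
        final long zxid;

        Vote(long serverId, long zxid) {
            this.serverId = serverId;
            this.zxid = zxid;
        }
    }

    static boolean challengerWins(Vote mine, Vote received) {
        if (received.zxid != mine.zxid) {
            return received.zxid > mine.zxid;
        }
        return received.serverId > mine.serverId;
    }

    public static void main(String[] args) {
        Vote zk1 = new Vote(1, 0);
        Vote zk2 = new Vote(2, 0);
        // ZK1 compares ZK2's vote with its own: equal ZXID, larger ID, so ZK1 switches to (2, 0)
        System.out.println(challengerWins(zk1, zk2)); // true
    }
}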

Three. Consul

3.1 Consul overall framework

(Figure: Consul multi-data-center architecture)

Consul supports multiple data centers. In the figure above there are two data centers, connected over the Internet through WAN gossip. To keep communication efficient, only Server nodes take part in cross-data-center communication; this is how Consul supports WAN-based synchronization across multiple data centers.

Within a single data center, Consul nodes are divided into two types, Client and Server (all nodes are also called Agents).

  • Server node: participates in consensus quorum, stores the cluster state (log storage), handles queries, and maintains relationships with neighboring (LAN/WAN) nodes
  • Client node: responsible for health-checking the microservices registered through it, turning registration requests and queries into RPC calls to the Servers, and maintaining relationships with the surrounding (LAN/WAN) nodes

Clients and Servers communicate via RPC. In addition, there is LAN gossip between Servers and Clients, so that when the topology inside the LAN changes, the surviving nodes can sense it in time; for example, when a Server node goes down, Clients remove that Server from their list of available servers. All Server nodes together form a cluster and run the Raft protocol among themselves, electing a Leader through consensus quorum. All business data is written to the cluster through the Leader for persistence; once more than half of the nodes have stored the data, the Server cluster returns an ACK, which guarantees strong data consistency. Of course, a larger number of Servers also lowers write efficiency. All Followers follow the Leader so that they always hold the latest copy of the data. The Consul nodes in the cluster maintain membership through the gossip protocol, for example which nodes are still in the cluster and whether they are Clients or Servers.

The gossip protocol within a single data center uses both TCP and UDP on port 8301. The gossip protocol across data centers also uses both TCP and UDP, on port 8302. Read and write requests to the cluster can either be sent directly to a Server or be forwarded to a Server by a Client via RPC; the request ultimately reaches the Leader node.
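For reference, registering a service instance with a local Consul agent goes through the agent's HTTP API (a minimal sketch; the service name, addresses and health-check URL are made up, and error handling is omitted):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ConsulRegisterSketch {

    public static void main(String[] args) throws Exception {
        // JSON payload for a hypothetical "user-service" instance with an HTTP health check
        String payload = "{"
                + "\"ID\": \"user-service-1\","
                + "\"Name\": \"user-service\","
                + "\"Address\": \"192.168.0.21\","
                + "\"Port\": 8080,"
                + "\"Check\": {\"HTTP\": \"http://192.168.0.21:8080/health\", \"Interval\": \"10s\"}"
                + "}";

        // PUT the registration to the local agent (default HTTP port 8500 assumed)
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://127.0.0.1:8500/v1/agent/service/register"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(payload))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("register status: " + response.statusCode());
    }
}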

Four. Nacos

4.1 Nacos overall framework

(Figure: Nacos overall architecture)

When a service registers, the client polls the addresses of the registry cluster nodes and registers with one of them. On the registry side, the Nacos Server keeps instance information in a Map, and services configured as persistent are also saved to the database. On the service caller's side, to keep the local list of service instances up to date, Nacos differs from the other registries in that it uses pull and push at the same time.

4.2 Nacos election

A Nacos cluster is similar to a Zookeeper cluster in that it is divided into a Leader role and Follower roles. The role names alone suggest that the cluster has an election mechanism; without elections, the roles would more likely be named master/slave.

Election algorithm:

The Nacos cluster is implemented with the Raft algorithm, which is a simpler election algorithm than Zookeeper's. The core of the election algorithm is in RaftCore, which also covers data processing and data synchronization.

In Raft, nodes have three roles:

  • Leader: responsible for handling client requests
  • Candidate: the role a node takes while campaigning to become Leader
  • Follower: responsible for responding to requests from the Leader or a Candidate

When the nodes start, they are all in the Follower state. If a Follower does not receive the Leader's heartbeat within a certain period of time (perhaps there is no Leader yet, or the Leader is down), it becomes a Candidate and initiates an election. Before the election it increments its term, which plays the same role as the epoch in Zookeeper.

The Candidate votes for itself and sends vote requests to the other nodes. While waiting for the other nodes to reply, several situations can occur:

  • if it receives more than half of the votes, it becomes the Leader;
  • if it learns that another node has already become Leader, it switches back to Follower;
  • if it does not receive more than half of the votes within a period of time, it re-initiates the election. The constraint is that in any given term a single node can cast at most one vote.

In the first case, after winning the election, the Leader sends heartbeat messages to all nodes to prevent other nodes from triggering new elections.

In the second case, suppose there are three nodes A, B and C. A and B initiate elections at the same time, and A's election message reaches C first, so C votes for A. When B's message reaches C, the constraint above can no longer be satisfied, so C will not vote for B, and A and B obviously will not vote for each other. After A wins, it sends heartbeat messages to B and C. When B sees that A's term is not lower than its own, it knows that there is already a Leader and converts to Follower.

In the third case, no node obtains a majority of the votes, which may be a split vote. For example, suppose there are four nodes A, B, C and D. Nodes C and D become Candidates at the same time, but node A votes for D and node B votes for C, which results in a tie. At that point everyone waits, and a new election is initiated after the timeout. Since a tie prolongs the time the system is unavailable, Raft introduces randomized election timeouts to make ties unlikely.
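The randomized election timeout can be pictured as follows (an illustrative sketch only, not Nacos's RaftCore; the 150 to 300 ms range is the example range from the Raft paper):

import java.util.concurrent.ThreadLocalRandom;

public class ElectionTimeoutSketch {

    // each follower picks its own timeout in [150, 300) ms, so two nodes rarely
    // time out at exactly the same moment and split votes become unlikely
    static long nextElectionTimeoutMillis() {
        return ThreadLocalRandom.current().nextLong(150, 300);
    }

    public static void main(String[] args) {
        for (int node = 0; node < 4; node++) {
            System.out.println("node " + node + " will wait " + nextElectionTimeoutMillis()
                    + " ms for a heartbeat before becoming a candidate");
        }
    }
}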

4.3 Nacos service registration process source code

The Nacos source code analyzed here is the latest version, 2.0.0-bugfix (Mar 30th, 2021).

When registration is required, Spring Cloud injects the NacosServiceRegistry instance, and the registration eventually reaches registerInstance:

@Override
    public void registerInstance(String serviceName, String groupName, Instance instance) throws NacosException {
        NamingUtils.checkInstanceIsLegal(instance);
        String groupedServiceName = NamingUtils.getGroupedName(serviceName, groupName);
        // add heartbeat information for the ephemeral instance
        if (instance.isEphemeral()) {
            BeatInfo beatInfo = beatReactor.buildBeatInfo(groupedServiceName, instance);
            beatReactor.addBeatInfo(groupedServiceName, beatInfo);
        }
        // call the server proxy to register the service
        serverProxy.registerService(groupedServiceName, groupName, instance);
    }

The registerService method is then called to do the registration: it builds the request parameters and sends the request.

public void registerService(String serviceName, String groupName, Instance instance) throws NacosException {

        NAMING_LOGGER.info("[REGISTER-SERVICE] {} registering service {} with instance: {}", namespaceId, serviceName,
                instance);

        final Map<String, String> params = new HashMap<String, String>(16);
        params.put(CommonParams.NAMESPACE_ID, namespaceId);
        params.put(CommonParams.SERVICE_NAME, serviceName);
        params.put(CommonParams.GROUP_NAME, groupName);
        params.put(CommonParams.CLUSTER_NAME, instance.getClusterName());
        params.put("ip", instance.getIp());
        params.put("port", String.valueOf(instance.getPort()));
        params.put("weight", String.valueOf(instance.getWeight()));
        params.put("enable", String.valueOf(instance.isEnabled()));
        params.put("healthy", String.valueOf(instance.isHealthy()));
        params.put("ephemeral", String.valueOf(instance.isEphemeral()));
        params.put("metadata", JacksonUtils.toJson(instance.getMetadata()));

        reqApi(UtilAndComs.nacosUrlInstance, params, HttpMethod.POST);

    }

Entering the reqApi method, we can see that the client polls the configured registry addresses when registering:

public String reqApi(String api, Map<String, String> params, Map<String, String> body, List<String> servers,
            String method) throws NacosException {

        params.put(CommonParams.NAMESPACE_ID, getNamespaceId());

        if (CollectionUtils.isEmpty(servers) && StringUtils.isBlank(nacosDomain)) {
            throw new NacosException(NacosException.INVALID_PARAM, "no server available");
        }

        NacosException exception = new NacosException();
        // case where only a single nacos domain is configured
        if (StringUtils.isNotBlank(nacosDomain)) {
            for (int i = 0; i < maxRetry; i++) {
                try {
                    return callServer(api, params, body, nacosDomain, method);
                } catch (NacosException e) {
                    exception = e;
                    if (NAMING_LOGGER.isDebugEnabled()) {
                        NAMING_LOGGER.debug("request {} failed.", nacosDomain, e);
                    }
                }
            }
        } else {
            Random random = new Random(System.currentTimeMillis());
            int index = random.nextInt(servers.size());

            for (int i = 0; i < servers.size(); i++) {
                String server = servers.get(index);
                try {
                    return callServer(api, params, body, server, method);
                } catch (NacosException e) {
                    exception = e;
                    if (NAMING_LOGGER.isDebugEnabled()) {
                        NAMING_LOGGER.debug("request {} failed.", server, e);
                    }
                }
                // round-robin: move on to the next server
                index = (index + 1) % servers.size();
            }
        }

        // if every server fails, the saved NacosException is finally rethrown (remainder omitted)
    }

Finally, the call is made through callServer(api, params, server, method):

public String callServer(String api, Map<String, String> params, Map<String, String> body, String curServer,
            String method) throws NacosException {
        long start = System.currentTimeMillis();
        long end = 0;
        injectSecurityInfo(params);
        Header header = builderHeader();

        String url;
        // build the URL for the HTTP request
        if (curServer.startsWith(UtilAndComs.HTTPS) || curServer.startsWith(UtilAndComs.HTTP)) {
            url = curServer + api;
        } else {
            if (!IPUtil.containsPort(curServer)) {
                curServer = curServer + IPUtil.IP_PORT_SPLITER + serverPort;
            }
            url = NamingHttpClientManager.getInstance().getPrefix() + curServer + api;
        }
        // ... the HTTP request is then sent to url and the response body is returned (remainder omitted)
    }

Nacos server processing:

The server provides an InstanceController class, which exposes the APIs related to service registration:

@CanDistro
    @PostMapping
    @Secured(parser = NamingResourceParser.class, action = ActionTypes.WRITE)
    public String register(HttpServletRequest request) throws Exception {

        final String namespaceId = WebUtils
                .optional(request, CommonParams.NAMESPACE_ID, Constants.DEFAULT_NAMESPACE_ID);
        final String serviceName = WebUtils.required(request, CommonParams.SERVICE_NAME);
        NamingUtils.checkServiceNameFormat(serviceName);
        // parse the Instance from the request
        final Instance instance = parseInstance(request);

        serviceManager.registerInstance(namespaceId, serviceName, instance);
        return "ok";
    }

Then ServiceManager is called to register the service:

public void registerInstance(String namespaceId, String serviceName, Instance instance) throws NacosException {
        // create an empty service (the service shown in the Nacos console service list); in effect this initializes serviceMap, a ConcurrentHashMap
        createEmptyService(namespaceId, serviceName, instance.isEphemeral());
        // get the Service object from serviceMap by namespaceId and serviceName
        Service service = getService(namespaceId, serviceName);

        if (service == null) {
            throw new NacosException(NacosException.INVALID_PARAM,
                    "service not found, namespace: " + namespaceId + ", service: " + serviceName);
        }
        // call addInstance to create a service instance
        addInstance(namespaceId, serviceName, instance.isEphemeral(), instance);
    }

When the empty service is created (createServiceIfAbsent):

public void createServiceIfAbsent(String namespaceId, String serviceName, boolean local, Cluster cluster)
            throws NacosException {
        // get the Service object from serviceMap
        Service service = getService(namespaceId, serviceName);
        // if it does not exist yet, initialize it
        if (service == null) {
            Loggers.SRV_LOG.info("creating empty service {}:{}", namespaceId, serviceName);
            service = new Service();
            service.setName(serviceName);
            service.setNamespaceId(namespaceId);
            service.setGroupName(NamingUtils.getGroupName(serviceName));
            // now validate the service. if failed, exception will be thrown
            service.setLastModifiedMillis(System.currentTimeMillis());
            service.recalculateChecksum();
            if (cluster != null) {
                cluster.setService(service);
                service.getClusterMap().put(cluster.getName(), cluster);
            }
            service.validate();

            putServiceAndInit(service);
            if (!local) {
                addOrReplaceService(service);
            }
        }
    }

The storage behind getService is a Map:

private final Map<String, Map<String, Service>> serviceMap = new ConcurrentHashMap<>();

Nacos organizes services by namespace: under each namespace there are different groups, and only under a group is there a corresponding Service, which is then identified by its serviceName. Accordingly, the outer key of serviceMap is the namespaceId and the inner key is the grouped service name. The first time a service comes in, it goes through initialization (the shape of the lookup is sketched below).
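Conceptually, resolving a Service is a two-level map lookup, roughly as sketched here (a hedged illustration; the class and helper names are made up, and the grouped service name is assumed to be built by NamingUtils as group + "@@" + serviceName):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ServiceMapSketch {

    // stand-in for com.alibaba.nacos.naming.core.Service, just for this sketch
    static class ServiceStub { }

    // outer key: namespaceId; inner key: grouped service name, e.g. "DEFAULT_GROUP@@order-service"
    private final Map<String, Map<String, ServiceStub>> serviceMap = new ConcurrentHashMap<>();

    ServiceStub getServiceSketch(String namespaceId, String groupedServiceName) {
        Map<String, ServiceStub> services = serviceMap.get(namespaceId);
        return services == null ? null : services.get(groupedServiceName);
    }
}

After initialization, putServiceAndInit is called: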

private void putServiceAndInit(Service service) throws NacosException {
        // save the service information into the serviceMap collection
        putService(service);
        service = getService(service.getNamespaceId(), service.getName());
        // set up the heartbeat / health-check mechanism
        service.init();
        // register data-consistency listeners; ephemeral marks whether the service instance is temporary
        // (ephemeral=true, the default, uses the Distro protocol; ephemeral=false, persistent, uses Raft)
        consistencyService
                .listen(KeyBuilder.buildInstanceListKey(service.getNamespaceId(), service.getName(), true), service);
        consistencyService
                .listen(KeyBuilder.buildInstanceListKey(service.getNamespaceId(), service.getName(), false), service);
        Loggers.SRV_LOG.info("[NEW-SERVICE] {}", service.toJson());
    }

After the Service is obtained, the service instance is added to its collection, and the data is then synchronized based on the consistency protocol. Next, addInstance is called:

public void addInstance(String namespaceId, String serviceName, boolean ephemeral, Instance... ips)
            throws NacosException {
        // build the key
        String key = KeyBuilder.buildInstanceListKey(namespaceId, serviceName, ephemeral);
        // get the service that was just created
        Service service = getService(namespaceId, serviceName);

        synchronized (service) {
            List<Instance> instanceList = addIpAddresses(service, ephemeral, ips);

            Instances instances = new Instances();
            instances.setInstanceList(instanceList);
            // hand the registration to the consistency service that was registered as a listener in the previous step
            consistencyService.put(key, instances);
        }
    }

4.4 Nacos service subscription source code

Node subscriptions are implemented differently in different registries, and are generally divided into two types: pull and push.

Push means that when the subscribed node is updated, the registry actively pushes the update to the subscriber. ZK is an implementation of push: the client and the server establish a long-lived TCP connection, the client registers a watcher, and when data is updated the server pushes the change over that connection. Maintaining long connections in this way consumes a lot of server resources, so when there are many watchers and updates are frequent, Zookeeper's performance becomes very low, and it may even go down.

Pull means that the subscriber periodically fetches the node information from the server, compares it with the local copy, and applies any changes. Consul also has a watch mechanism, but unlike ZK it is implemented through HTTP long polling: if the requested URL carries the wait parameter (a blocking query), the Consul server holds the request and returns as soon as the watched service changes or the wait time expires; otherwise it returns immediately. The throughput of this approach can be high, but its real-time behavior may not be.
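For comparison, a Consul-style long poll looks roughly like this (a minimal sketch; the service name and agent address are made up; the X-Consul-Index header returned by one call is fed into the next call's index parameter, so the request blocks until the service changes or the wait time expires):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ConsulBlockingQuerySketch {

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        String index = "0";
        while (true) {
            // blocks for up to 30s, or returns early when the watched service changes
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://127.0.0.1:8500/v1/health/service/user-service?index="
                            + index + "&wait=30s"))
                    .GET()
                    .build();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            index = response.headers().firstValue("X-Consul-Index").orElse(index);
            System.out.println("service list (JSON): " + response.body());
        }
    }
}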

In Nacos, these two ideas are combined to provide both pull and active push.

For the pull part, the specific operation of obtaining a ServiceInfo from HostReactor is as follows:

public ServiceInfo getServiceInfo(final String serviceName, final String clusters) {

        NAMING_LOGGER.debug("failover-mode: " + failoverReactor.isFailoverSwitch());
        // concatenate service name + cluster name (empty by default)
        String key = ServiceInfo.getKey(serviceName, clusters);
        if (failoverReactor.isFailoverSwitch()) {
            return failoverReactor.getService(key);
        }
        // look up the provider list by key in serviceInfoMap, the client's local cache of service addresses
        ServiceInfo serviceObj = getServiceInfo0(serviceName, clusters);
        // if null, there is no local cache entry
        if (null == serviceObj) {
            serviceObj = new ServiceInfo(serviceName, clusters);
            // if not found, create a new one and put it into serviceInfoMap; also put it into updatingMap, call updateServiceNow, then remove it from updatingMap
            serviceInfoMap.put(serviceObj.getKey(), serviceObj);

            updatingMap.put(serviceName, new Object());
            // load the service address information from the Nacos server right away
            updateServiceNow(serviceName, clusters);
            updatingMap.remove(serviceName);

        } else if (updatingMap.containsKey(serviceName)) {
            // if the serviceObj found in serviceInfoMap is in updatingMap, wait up to UPDATE_HOLD_INTERVAL
            if (UPDATE_HOLD_INTERVAL > 0) {
                // hold a moment waiting for update finish
                synchronized (serviceObj) {
                    try {
                        serviceObj.wait(UPDATE_HOLD_INTERVAL);
                    } catch (InterruptedException e) {
                        NAMING_LOGGER
                                .error("[getServiceInfo] serviceName:" + serviceName + ", clusters:" + clusters, e);
                    }
                }
            }
        }
        // start scheduled polling: query the service addresses every 10 seconds
        // if a local cache entry exists, scheduleUpdateIfAbsent starts the scheduled task, and the serviceInfo is then taken from serviceInfoMap
        scheduleUpdateIfAbsent(serviceName, clusters);
        return serviceInfoMap.get(serviceObj.getKey());
    }

As for the Nacos push function: Nacos records the subscribers described above in its PushService.

The PushService class implements ApplicationListener<ServiceChangeEvent>, so it listens for service change events; when a service's state changes, it traverses all the clients and broadcasts the message over the UDP protocol:

public void onApplicationEvent(ServiceChangeEvent event) {
        Service service = event.getService(); // get the service
        String serviceName = service.getName(); // service name
        String namespaceId = service.getNamespaceId(); // namespace
        // schedule and execute the push task
        Future future = GlobalExecutor.scheduleUdpSender(() -> {
            try {
                Loggers.PUSH.info(serviceName + " is changed, add it to push queue.");
                ConcurrentMap<String, PushClient> clients = clientMap
                        .get(UtilsAndCommons.assembleFullServiceName(namespaceId, serviceName));
                if (MapUtils.isEmpty(clients)) {
                    return;
                }
                Map<String, Object> cache = new HashMap<>(16);
                long lastRefTime = System.nanoTime();
                for (PushClient client : clients.values()) {
                    if (client.zombie()) {
                        Loggers.PUSH.debug("client is zombie: " + client.toString());
                        clients.remove(client.toString());
                        Loggers.PUSH.debug("client is zombie: " + client.toString());
                        continue;
                    }
                    Receiver.AckEntry ackEntry;
                    Loggers.PUSH.debug("push serviceName: {} to client: {}", serviceName, client.toString());
                    String key = getPushCacheKey(serviceName, client.getIp(), client.getAgent());
                    byte[] compressData = null;
                    Map<String, Object> data = null;
                    if (switchDomain.getDefaultPushCacheMillis() >= 20000 && cache.containsKey(key)) {
                        org.javatuples.Pair pair = (org.javatuples.Pair) cache.get(key);
                        compressData = (byte[]) (pair.getValue0());
                        data = (Map<String, Object>) pair.getValue1();
                        Loggers.PUSH.debug("[PUSH-CACHE] cache hit: {}:{}", serviceName, client.getAddrStr());
                    }
                    if (compressData != null) {
                        ackEntry = prepareAckEntry(client, compressData, data, lastRefTime);
                    } else {
                        ackEntry = prepareAckEntry(client, prepareHostsData(client), lastRefTime);
                        if (ackEntry != null) {
                            cache.put(key, new org.javatuples.Pair<>(ackEntry.origin.getData(), ackEntry.data));
                        }
                    }
                    Loggers.PUSH.info("serviceName: {} changed, schedule push for: {}, agent: {}, key: {}",
                            client.getServiceName(), client.getAddrStr(), client.getAgent(),
                            (ackEntry == null ? null : ackEntry.key));
                    // perform the UDP push
                    udpPush(ackEntry);
                }
            } catch (Exception e) {
                Loggers.PUSH.error("[NACOS-PUSH] failed to push serviceName: {} to client, error: {}", serviceName, e);

            } finally {
                futureMap.remove(UtilsAndCommons.assembleFullServiceName(namespaceId, serviceName));
            }

        }, 1000, TimeUnit.MILLISECONDS);

        futureMap.put(UtilsAndCommons.assembleFullServiceName(namespaceId, serviceName), future);

    }

The service consumer needs to set up a UDP listener at this point, otherwise the server cannot push data to it. This listener is initialized in the constructor of HostReactor.

Compared with Zookeeper's long-lived TCP connections, Nacos's push mode saves a lot of resources; even a large number of node updates will not create much of a performance bottleneck for Nacos. When a Nacos client receives a UDP message it returns an ACK; if the Nacos server does not receive the ACK within a certain time it retransmits, and it stops retransmitting once the retransmission window exceeds a certain period. Although UDP does not guarantee that the message reaches the subscriber, Nacos also has periodic polling as a fallback, so there is no need to worry about data never being updated.
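The client-side UDP listener plus ACK can be pictured with a few lines of plain Java (an illustrative sketch only, not Nacos's actual PushReceiver; the real push payload may be compressed and the ACK format differs):

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.nio.charset.StandardCharsets;

public class UdpPushReceiverSketch {

    public static void main(String[] args) throws Exception {
        try (DatagramSocket socket = new DatagramSocket(0)) { // ephemeral port, reported to the server on subscribe
            byte[] buffer = new byte[64 * 1024];
            while (true) {
                DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
                socket.receive(packet); // blocks until the server pushes a service change

                String json = new String(packet.getData(), 0, packet.getLength(), StandardCharsets.UTF_8);
                System.out.println("received push: " + json);
                // ... update the local serviceInfoMap cache here ...

                // reply with an ACK so the server stops retransmitting this push
                byte[] ack = "{\"type\":\"push-ack\"}".getBytes(StandardCharsets.UTF_8);
                socket.send(new DatagramPacket(ack, ack.length, packet.getSocketAddress()));
            }
        }
    }
}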

By combining these two approaches, Nacos both ensures timeliness and ensures that no data update is missed.

Five. Comparison of the four registries

Each of the four registries has its own characteristics, and their differences can be compared clearly in the following table:
(Table: comparison of the four registries)

Text/hz

Follow Dewu Technology, and let's walk hand in hand toward the cloud of technology.

