The examples in this article are collected in https://github.com/chengxy-nds/Springboot-Notebook

Hello everyone, I am Xiaofu~

Everyone is familiar with Nacos. It is a well-known tool for dynamic service discovery and configuration management. But the more widely a technology is used, the more likely it is to come up in interviews, and if your knowledge stops at how to use it, you may pay for that in the interview.

Take today's topic: when Nacos is used as a configuration center, is configuration data pushed by the server, or does the client actively pull it?

Let me give the answer up front: the client actively pulls it!

Next, let's look at the Nacos source code to see how.

Configuration Center

Before getting into Nacos, let's briefly review where configuration centers come from.

Put simply, a configuration center manages configuration in one place, and after a configuration is modified, applications pick up the change dynamically without restarting.

Traditional projects mostly use static configuration: the settings are written in yml or properties files inside the application, and changing one usually requires restarting the application for it to take effect.

In some scenarios, though, we want to switch a feature on or off in real time by changing a configuration item while the application is running, and restarting the application frequently is clearly unacceptable.

This is especially true under a microservice architecture, where applications are split at a fine granularity into dozens or even hundreds of services, each with its own specific or shared configuration. If a shared setting has to change, do I really have to edit the configuration of hundreds of services one by one? Obviously that is not practical, and the configuration center was created to solve exactly this kind of problem.

[Figure: Configuration center]

Push and pull model

There are two ways for the client and the configuration center to exchange data: push or pull.

push model

The client establishes a long-lived TCP connection with the server. When the configuration data on the server changes, the data is immediately pushed to the client over that connection.

Advantages: a long-lived connection delivers changes in real time. As soon as the data changes, it is pushed to the client. This is also simpler for the client, which only needs to establish the connection and receive data, without any logic for checking whether the data has changed.

Disadvantages: a long-lived connection can become unusable because of network problems, commonly called a half-dead connection: the connection state looks normal, but no communication is actually possible. A heartbeat mechanism (KeepAlive) is therefore needed to keep the connection usable and make sure configuration data is pushed successfully.

pull model

The client actively sends a request to the server to pull configuration data. A common method is polling, such as requesting configuration data from the server every 3s.

The advantage of polling is that it is relatively simple to implement. The drawbacks are just as obvious: polling cannot guarantee that the data is fresh, you have to decide when to request and how often, and polling puts considerable pressure on the server.

Long polling

We gave the answer at the start: Nacos uses the pull model, in which the client actively pulls, and it obtains configuration data through long polling (Long Polling).

Huh? I had only heard of polling before. What on earth is long polling, and how does it differ from polling in the traditional sense (call it short polling for now, for comparison)?

short polling

With short polling, the client keeps requesting the configuration whether or not the configuration data on the server has changed, much like front-end JS polling for the order payment status in a payment scenario.

The drawback is obvious. Configuration data does not change often, so constant requests inevitably put heavy pressure on the server. Delivery is also delayed: if the configuration is requested every 10s and is updated at the 11th second, the client only sees the change 9s later, on the next request.
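The delay arithmetic in that example can be checked with a few lines of Java. This is only an illustrative sketch: the class and method names are hypothetical and not part of Nacos, and it assumes polls fire at t = 0, interval, 2 × interval, and so on.

```java
// Hypothetical helper, not part of Nacos: how stale a change is under short polling.
final class ShortPollDelay {

    /**
     * Returns how many seconds a change made at changeAtSec waits until the
     * next poll, assuming polls fire at t = 0, interval, 2 * interval, ...
     */
    static long delaySeconds(long intervalSec, long changeAtSec) {
        long remainder = changeAtSec % intervalSec;
        // A change that lands exactly on a poll tick is picked up immediately.
        return remainder == 0 ? 0 : intervalSec - remainder;
    }

    public static void main(String[] args) {
        // Polling every 10s, change at second 11: the next poll is at second 20.
        System.out.println(delaySeconds(10, 11)); // prints 9
    }
}
```

In the worst case the delay approaches one full polling interval, which is why shortening the interval trades server load for freshness.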

Long polling was introduced to solve these problems with short polling.

long polling

Long polling is not a new technology. It is simply an optimization in which the server controls when it responds to a client request, in order to reduce pointless requests from the client. From the client's point of view, it is used in essentially the same way as short polling.

After the client sends a request, the server does not return a result immediately but holds the request for a period of time. If the server's data changes within that period, it responds to the client right away; if nothing changes, it responds once the specified timeout is reached, and the client then issues a new long-polling request.
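The hold-then-respond behaviour can be modelled in memory with a CompletableFuture per held request. This is a minimal sketch, not Nacos code: the class LongPollSketch and its methods are hypothetical, no HTTP is involved, and it relies on completeOnTimeout, which requires Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical in-memory model of long polling; no HTTP and no Nacos classes involved.
final class LongPollSketch {

    private String current;
    // Requests that are currently being held open by the "server".
    private final List<CompletableFuture<String>> held = new ArrayList<>();

    LongPollSketch(String initial) {
        this.current = initial;
    }

    /** The client sends the value it already has; the answer may be delayed. */
    synchronized CompletableFuture<String> poll(String clientValue, long timeoutMs) {
        if (!current.equals(clientValue)) {
            // The client is already stale: answer immediately.
            return CompletableFuture.completedFuture(current);
        }
        CompletableFuture<String> request = new CompletableFuture<>();
        held.add(request);
        // If nothing changes in time, answer with the unchanged value.
        // Timed-out futures stay in the list until the next publish; fine for a sketch.
        return request.completeOnTimeout(clientValue, timeoutMs, TimeUnit.MILLISECONDS);
    }

    /** A change answers every held request right away. */
    synchronized void publish(String newValue) {
        current = newValue;
        for (CompletableFuture<String> request : held) {
            request.complete(newValue); // no effect if the request already timed out
        }
        held.clear();
    }
}
```

A client would call poll() in a loop, passing the last value it received, so at most one request per client is outstanding at any time.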

A first look at Nacos

To make the later demonstrations easier, I set up a Nacos instance. Note: I hit a small pitfall when running it. Nacos starts in cluster mode by default, while a local setup is usually standalone, so the startup mode has to be changed manually in the startup script startup.X.

Then just run /bin/startup.X directly. The default username and password are both nacos.

Several concepts

Nacos has several core concepts, dataId, group and namespace, whose hierarchy is as follows:

dataId: the most basic unit in the configuration center, a key-value structure. The key is usually the configuration file name, such as application.yml or mybatis.xml, and the value is the entire content of that file.

Multiple configuration formats are currently supported, such as JSON, XML and YAML.

group: groups dataId configurations. For example, development may happen in the same dev environment while different branches need different configuration data; groups can be used to isolate them. The default group is DEFAULT_GROUP.

namespace: a project goes through several environments during development, such as dev, test and pro, and namespaces isolate these environments from one another. By default, all configuration lives in the public namespace.

Architecture design

The following figure briefly describes the architecture flow of Nacos.

The client and the console register configuration data with the server through HTTP requests, and the server persists the data to MySQL.

To pull configuration data, the client sends long-polling requests covering a batch of dataIds. If a configuration item on the server has changed, the server responds immediately; if nothing has changed, the request is held until the timeout is reached. To reduce pressure on the server and keep the configuration center available, a client that has pulled configuration data saves a snapshot in a local file and reads from it first.

I have left out many details here, such as authentication, load balancing and the high-availability design (that part is well worth studying, and I will write a separate article on it later); the goal is to make the data interaction model between client and server clear.

Below we analyze the source code of Nacos 2.0.1. Versions from 2.0 onward changed a great deal, so it differs somewhat from much of the material found online.
Address: https://github.com/alibaba/nacos/releases/tag/2.0.1

Client source code analysis

The Nacos client source code is mainly in the nacos-client project, where the NacosConfigService implementation class is the core entry point for all operations.

Before going further, take note of one client data structure, cacheMap, because it runs through almost every operation of the Nacos client. To keep data consistent under multi-threading, cacheMap is held in an AtomicReference.

/**
 * groupKey -> cacheData.
 */
private final AtomicReference<Map<String, CacheData>> cacheMap = new AtomicReference<Map<String, CacheData>>(new HashMap<>());

cacheMap is a Map. The key is groupKey, a string concatenated from dataId, group and tenant; the value is a CacheData object, and each dataId holds one CacheData object.
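To illustrate the idea of a map held in an AtomicReference and keyed by a composite string, here is a minimal sketch. It is not the Nacos implementation: the class CacheMapSketch is hypothetical, the value is a plain String standing in for CacheData, the "+" separator is an assumption of this sketch (a real implementation would also need to escape separators inside the parts), and the compare-and-set loop is just one way to publish a new copy of the map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of a copy-on-write map keyed by groupKey; not Nacos code.
final class CacheMapSketch {

    // groupKey -> content (a plain String stands in for CacheData here).
    private final AtomicReference<Map<String, String>> cacheMap =
            new AtomicReference<>(new HashMap<>());

    /** Joins dataId, group and tenant; the "+" separator is an assumption. */
    static String groupKey(String dataId, String group, String tenant) {
        String key = dataId + "+" + group;
        return (tenant == null || tenant.isEmpty()) ? key : key + "+" + tenant;
    }

    /** Returns the existing entry, or stores and returns the given content. */
    String addIfAbsent(String dataId, String group, String tenant, String content) {
        String key = groupKey(dataId, group, tenant);
        while (true) {
            Map<String, String> old = cacheMap.get();
            String existing = old.get(key);
            if (existing != null) {
                return existing;
            }
            // Never mutate the published map: copy it, then swap the reference.
            Map<String, String> copy = new HashMap<>(old);
            copy.put(key, content);
            if (cacheMap.compareAndSet(old, copy)) {
                return content;
            }
            // Another thread swapped the map first; retry against the new one.
        }
    }

    int size() {
        return cacheMap.get().size();
    }
}
```

Readers always see a complete, immutable snapshot of the map, so iterating over it needs no lock.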

get configuration

The logic Nacos uses to obtain configuration data is fairly simple. It first reads the configuration from the local snapshot file. If the local file does not exist or is empty, it pulls the configuration for the dataId from the server through an HTTP request and saves it to the local snapshot. By default the request is retried 3 times with a timeout of 3s.
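The flow just described (local snapshot first, then a remote fetch with retries, then saving the snapshot) can be sketched as follows. Everything here is hypothetical: SnapshotFirstLoader is not a Nacos class, an in-memory map stands in for the snapshot file, and a Function stands in for the HTTP call.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of "snapshot first, then remote with retries"; not Nacos code.
final class SnapshotFirstLoader {

    // Stands in for the local snapshot file: dataId -> content.
    private final Map<String, String> snapshot = new ConcurrentHashMap<>();
    // Stands in for the HTTP request to the server.
    private final Function<String, String> remote;
    private final int maxAttempts;

    SnapshotFirstLoader(Function<String, String> remote, int maxAttempts) {
        this.remote = remote;
        this.maxAttempts = maxAttempts;
    }

    String getConfig(String dataId) {
        String local = snapshot.get(dataId);
        if (local != null && !local.isEmpty()) {
            return local; // snapshot hit: no remote call at all
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                String content = remote.apply(dataId);
                if (content != null) {
                    snapshot.put(dataId, content); // save for the next read
                }
                return content;
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw new IllegalStateException("failed after " + maxAttempts + " attempts", last);
    }
}
```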

There are two interfaces for obtaining configuration, getConfig() and getConfigAndSignListener(). getConfig() just sends an ordinary HTTP request, while getConfigAndSignListener() additionally calls addTenantListenersWithContent() to start long polling and register a listener for changes to the dataId.

@Override
public String getConfig(String dataId, String group, long timeoutMs) throws NacosException {
    return getConfigInner(namespace, dataId, group, timeoutMs);
}

@Override
public String getConfigAndSignListener(String dataId, String group, long timeoutMs, Listener listener)
        throws NacosException {
    String content = getConfig(dataId, group, timeoutMs);
    worker.addTenantListenersWithContent(dataId, group, content, Arrays.asList(listener));
    return content;
}

Registering a listener

To register a listener, the client first looks up the CacheData object for the dataId in cacheMap.

public void addTenantListenersWithContent(String dataId, String group, String content,
                                          List<? extends Listener> listeners) throws NacosException {
    group = blank2defaultGroup(group);
    String tenant = agent.getTenant();
    // 1. Get the CacheData for the dataId; if absent, send a long-polling request to the server to fetch the configuration
    CacheData cache = addCacheDataIfAbsent(dataId, group, tenant);
    synchronized (cache) {
        // 2. Register the listener for data changes on the dataId
        cache.setContent(content);
        for (Listener listener : listeners) {
            cache.addListener(listener);
        }
        cache.setSyncWithServer(false);
        agent.notifyListenConfig();
    }
}

If it is absent, the client sends a long-polling request to the server to fetch the configuration, with a default timeout of 30s. The returned configuration data is filled back into the CacheData object, and an MD5 value is generated from the content; the listener is then registered through addListener().

CacheData is another class that shows up very often. Besides the basic attributes dataId, group, tenant and content, it has several more important ones such as listeners and md5 (the MD5 value computed from the actual configuration content), and the operations for registering listeners, comparing data and notifying about server-side data changes all live here.

Here, listeners is the collection of all listeners registered for the dataId. Each ManagerListenerWrap object in it holds, in addition to the Listener itself, a lastCallMd5 field. This attribute is very important: it is a key condition for deciding whether the server data has changed.

When a listener is added, the current md5 value of the CacheData object is assigned to the lastCallMd5 attribute of its ManagerListenerWrap object.

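public void addListener(Listener listener) {
    // The wrapper records the current md5 as this listener's lastCallMd5.
    ManagerListenerWrap wrap =
        (listener instanceof AbstractConfigChangeListener) ? new ManagerListenerWrap(listener, md5, content)
            : new ManagerListenerWrap(listener, md5);
    // Excerpt: the full method goes on to add wrap to the listeners collection.
}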

Is that all there is to setting up a listener for a dataId? Notice that every operation revolves around the CacheData objects in cacheMap, so here is a bold guess: there must be a dedicated task that processes this data structure.

change notification

How does the client perceive that the server data has changed?

Starting from the top, the constructor of NacosConfigService initializes a ClientWorker, and the constructor of ClientWorker in turn starts a thread pool that polls cacheMap.

The executeConfigListen() method contains the following logic: for the CacheData object of each dataId in cacheMap, it checks whether the md5 field matches the lastCallMd5 value of each registered listener. A mismatch means the configuration data has changed, and the safeNotifyListener method is called to send a data change notification.

void checkListenerMd5() {
    for (ManagerListenerWrap wrap : listeners) {
        if (!md5.equals(wrap.lastCallMd5)) {
            safeNotifyListener(dataId, group, content, type, md5, encryptedDataKey, wrap);
        }
    }
}
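The md5 versus lastCallMd5 comparison can be reproduced in a small stand-alone sketch. The class Md5ListenerSketch below is hypothetical and heavily simplified: a Consumer stands in for the Nacos Listener, and the callback runs on the calling thread rather than on a separate one.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical, simplified stand-in for CacheData and its listener wrappers.
final class Md5ListenerSketch {

    static final class Wrap {
        final Consumer<String> listener;
        String lastCallMd5; // md5 of the content this listener last saw

        Wrap(Consumer<String> listener, String md5) {
            this.listener = listener;
            this.lastCallMd5 = md5;
        }
    }

    private final List<Wrap> listeners = new CopyOnWriteArrayList<>();
    private String content;
    private String md5;

    Md5ListenerSketch(String content) {
        setContent(content);
    }

    void setContent(String content) {
        this.content = content;
        this.md5 = md5Hex(content);
    }

    void addListener(Consumer<String> listener) {
        // A new listener starts in sync with the current content.
        listeners.add(new Wrap(listener, md5));
    }

    void checkListenerMd5() {
        for (Wrap wrap : listeners) {
            if (!md5.equals(wrap.lastCallMd5)) {
                wrap.listener.accept(content);
                wrap.lastCallMd5 = md5; // avoid notifying twice for the same content
            }
        }
    }

    static String md5Hex(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

Comparing digests instead of full content keeps the check cheap even for large configuration files.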

The safeNotifyListener() method starts a separate thread that delivers the changed content to every listener registered for the dataId.

To receive the notification, the client simply implements the receiveConfigInfo() method, which is called back with the new data, and then runs its own business logic.

configService.addListener(dataId, group, new Listener() {
    @Override
    public void receiveConfigInfo(String configInfo) {
        System.out.println("receive:" + configInfo);
    }

    @Override
    public Executor getExecutor() {
        return null;
    }
});

To make this more concrete, here is a test demo that fetches the server configuration and sets up a listener. Whenever the configuration data on the server changes, the client's listener receives a notification. Let's look at the result.

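import com.alibaba.nacos.api.NacosFactory;
import com.alibaba.nacos.api.config.ConfigService;
import com.alibaba.nacos.api.config.listener.Listener;
import com.alibaba.nacos.api.exception.NacosException;

import java.util.Properties;
import java.util.concurrent.Executor;

// The class name is illustrative; the original snippet shows only main().
public class NacosListenerDemo {
    public static void main(String[] args) throws NacosException, InterruptedException {
        String serverAddr = "localhost";
        String dataId = "test";
        String group = "DEFAULT_GROUP";
        Properties properties = new Properties();
        properties.put("serverAddr", serverAddr);
        ConfigService configService = NacosFactory.createConfigService(properties);
        // Read the current configuration once.
        String content = configService.getConfig(dataId, group, 5000);
        System.out.println(content);
        // Register a listener that is called back whenever the configuration changes.
        configService.addListener(dataId, group, new Listener() {
            @Override
            public void receiveConfigInfo(String configInfo) {
                System.out.println("Data changed, receive:" + configInfo);
            }
            @Override
            public Executor getExecutor() {
                // Returning null lets the client run the callback on its own thread.
                return null;
            }
        });

        // Publish new content, which should trigger the listener above.
        boolean isPublishOk = configService.publishConfig(dataId, group, "I am the new configuration content~");
        System.out.println(isPublishOk);

        Thread.sleep(3000);
        content = configService.getConfig(dataId, group, 5000);
        System.out.println(content);
    }
}

The result is as expected: after publishConfig is called, the client perceives the change almost immediately. Using the active pull model, Nacos achieves the effect of real-time server push.

Data changed, receive:I am the new configuration content~
true
I am the new configuration content~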

Server source code analysis

The Nacos server source code is mainly in the ConfigController class of the nacos-config project. The server-side logic is somewhat more complex than the client's, so we will focus on it here.

handling long polling

The server exposes the listener endpoint /v1/cs/configs/listener. The method itself contains very little; go on to doPollingConfig.

The server uses the Long-Pulling-Timeout attribute in the request header to tell whether a request is a long poll or a short poll. Here we only care about the long-polling branch, so let's look at how the addLongPollingClient() method of the LongPollingService class (remember this service, it is critical) handles the client's long-polling request.

A normal client sets a default request timeout of 30s, but here we find that the server "secretly" subtracts 500ms, leaving a timeout of only 29.5s. Why?

According to the official explanation, the response is sent 500ms early so that the client does not time out because of network latency, given that the request may also spend some time in load balancing. After all, Nacos was originally designed around the scale of Alibaba's own business!
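The timeout arithmetic is simple enough to write down. The sketch below is an illustration only: the class and constant names are hypothetical, the 500ms margin comes from the text above, and the 10s lower bound is an assumption of this sketch, added so that a very small header value cannot produce a zero or negative hold time.

```java
// Hypothetical sketch of the server-side hold time; names are not from Nacos.
final class LongPollTimeout {

    // Margin the server answers early by, as described in the text.
    static final long FIXED_DELAY_MS = 500;
    // Lower bound for the hold time: an assumption of this sketch.
    static final long MIN_HOLD_MS = 10_000;

    /** Computes how long the server holds a request, given the header value in ms. */
    static long serverHoldMillis(String timeoutHeader) {
        long clientTimeoutMs = Long.parseLong(timeoutHeader);
        return Math.max(MIN_HOLD_MS, clientTimeoutMs - FIXED_DELAY_MS);
    }

    public static void main(String[] args) {
        System.out.println(serverHoldMillis("30000")); // prints 29500
    }
}
```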

Next, the server compares the MD5 of each groupKey submitted by the client with the server's current MD5. If they differ, the configuration item has changed on the server, so the groupKey is put straight into the changedGroupKeys collection and returned to the client.

MD5Util.compareMd5(req, rsp, clientMd5Map)
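The comparison itself can be pictured as a loop over the client's groupKey to MD5 map. This is a hedged sketch rather than the body of MD5Util.compareMd5: the class Md5CompareSketch is hypothetical, and a plain map stands in for wherever the server keeps its current MD5 values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of comparing client MD5 values with the server's; not Nacos code.
final class Md5CompareSketch {

    /** Returns the groupKeys whose MD5 on the server differs from what the client sent. */
    static List<String> changedGroupKeys(Map<String, String> clientMd5Map,
                                         Map<String, String> serverMd5Map) {
        List<String> changedGroupKeys = new ArrayList<>();
        for (Map.Entry<String, String> entry : clientMd5Map.entrySet()) {
            String serverMd5 = serverMd5Map.get(entry.getKey());
            // A key missing on the server also counts as changed.
            if (!Objects.equals(entry.getValue(), serverMd5)) {
                changedGroupKeys.add(entry.getKey());
            }
        }
        return changedGroupKeys;
    }
}
```

A non-empty result is answered immediately; an empty result is what leads to the request being held.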

If nothing has changed, the client's request is suspended: the server creates a Runnable task named ClientLongPolling and submits it to the scheduler thread pool to run after a 29.5s delay.

ConfigExecutor.executeLongPolling(
                new ClientLongPolling(asyncContext, clientMd5Map, ip, probeRequestSize, timeout, appName, tag));

Each long-polling task carries an asyncContext object, which is what allows each request's response to be delayed. Once the delay expires or the configuration changes, asyncContext.complete() is called to finish the response.

asyncContext is the asynchronous processing feature introduced in Servlet 3.0. With it, the Servlet thread does not have to stay blocked until the business processing finishes before writing the response: the thread and related resources that the container allocated to the request can be released first, reducing the load on the system, and the response is deferred until the business processing or computation is done.

While the ClientLongPolling task is submitted to the delayed thread pool, the server also adds it to the allSubs queue. This completes the process of a client registering its listener.

If the configuration data does not change during the delay, the long-polling task is removed from the allSubs queue when the delay expires and the request is answered, which amounts to cancelling the listener. After receiving the response, the client starts another long poll, and the cycle repeats.

[Figure: Handling long polling]

At this point we know how the server suspends a client's long-polling request. Now suppose that, while the request is suspended, a user changes a configuration item through the management console, or the server receives a configuration change request from another client node.

How is the corresponding suspended task cancelled immediately, and the client notified promptly that the data has changed?

data change

When the management console or a client changes a configuration item, the request goes to the publishConfig method in ConfigController.

Notably, publishConfig contains the following logic: when the configuration data of a dataId is modified, a data change event is triggered.

ConfigChangePublisher.notifyConfigChange(new ConfigDataChangeEvent(false, dataId, group, tenant, time.getTime()));

A closer look at LongPollingService shows that its constructor subscribes to exactly this data change event and, when the event fires, runs a data change task, DataChangeTask.

[Figure: Subscribing to the data change event]

The main logic of DataChangeTask is to traverse the allSubs queue, which, as we saw above, holds the long-polling tasks of all clients. It finds the ClientLongPolling tasks that contain the changed groupKey, pushes the changed data to those clients, and removes those tasks from allSubs.

[Figure: DataChangeTask]

Looking at the code that responds to the client, we can see that asyncContext.complete() is called to end the asynchronous request.
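Putting the server side together, the interplay between the allSubs queue, the delayed timeout task and the data change task can be sketched in plain Java. This is a simplified model, not the Nacos implementation: AllSubsSketch and HeldRequest are hypothetical names, a CompletableFuture stands in for the Servlet AsyncContext, and the response body is just the changed groupKey or an empty string.

```java
import java.util.Iterator;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical model of held long-polling requests; a future stands in for AsyncContext.
final class AllSubsSketch {

    static final class HeldRequest {
        final Set<String> groupKeys; // the groupKeys this client is listening to
        final CompletableFuture<String> response = new CompletableFuture<>();
        ScheduledFuture<?> timeoutTask;

        HeldRequest(Set<String> groupKeys) {
            this.groupKeys = groupKeys;
        }
    }

    private final Queue<HeldRequest> allSubs = new ConcurrentLinkedQueue<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread thread = new Thread(runnable, "long-poll-timeout");
                thread.setDaemon(true); // do not keep the JVM alive for this sketch
                return thread;
            });

    /** Holds a request: queue it and schedule the "nothing changed" answer. */
    synchronized HeldRequest hold(Set<String> groupKeys, long timeoutMs) {
        HeldRequest request = new HeldRequest(groupKeys);
        allSubs.add(request);
        request.timeoutTask = scheduler.schedule(() -> {
            allSubs.remove(request);       // cancel the subscription first
            request.response.complete(""); // empty body: no change
        }, timeoutMs, TimeUnit.MILLISECONDS);
        return request;
    }

    /** Data change: answer every held request that listens to this groupKey. */
    synchronized void onDataChange(String groupKey) {
        for (Iterator<HeldRequest> it = allSubs.iterator(); it.hasNext();) {
            HeldRequest request = it.next();
            if (request.groupKeys.contains(groupKey)) {
                it.remove();
                request.timeoutTask.cancel(false);
                request.response.complete(groupKey); // no effect if already timed out
            }
        }
    }

    int pending() {
        return allSubs.size();
    }
}
```

Because completing a future a second time has no effect, a timeout and a data change racing for the same request cannot produce two responses.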

Concluding remarks

The above only scratches the surface of the Nacos configuration center; many important technical details have not been covered. I recommend reading the source code when you have spare time. You do not need to read all of it; the core parts are enough. I never paid much attention to this topic myself, and when I was suddenly asked about it I was caught off guard, so I went straight to the source code, and it stuck much better (knowledge that someone else has chewed up and fed to you is never quite as satisfying as what you chew yourself).

Personally I think the Nacos source code is quite plain: it has few fancy tricks and is fairly easy to read. Don't be put off by reading source code; it is just business code written by other people, just so so!


I am Xiaofu~ If this was useful to you, please like and follow to show your support. See you next time~

I have compiled hundreds of technical e-books for anyone who needs them. The tech group is almost full; if you would like to join, add me as a friend and talk tech with the experts.


Personal WeChat official account: Programmer's point of affairs. Feel free to get in touch.


程序员小富