High-performance data access middleware OBProxy (4): One article explains connection management

In the last article, we talked about the troubleshooting of OBProxy, analyzed the problems that you may encounter when using OBProxy, and gave practical solutions. Starting from this article, I will introduce the role and principle of OBProxy in the OceanBase distributed architecture, to help you understand OBProxy more thoroughly and achieve "easy use" and "good use". At the same time, with the continuous operation of OBProxy in hundreds of enterprises, we have accumulated a lot of engineering practice experience, and we will also use the problems encountered as cases to share with you the principle of OBProxy for your reference.

This article starts with connection management, an important feature of OBProxy. For connection mapping relationship, session state synchronization, etc., you can compare it with a stand-alone database and experience the similarities and differences between a distributed system and a stand-alone system.

OBProxy connection management steps and principles

Before the formal explanation, let me introduce the technical background of this content. We know that OBProxy provides database access and routing functions for users, and users can use the OceanBase database normally when they connect to OBProxy. When a user uses the database function, OBProxy interacts with OBServer, and the interaction process is transparent to the user. Connection management is one of the key points in the interaction process.
OBProxy's connection management has three features:

Proxy features: OBProxy is both a client and a server, and it also needs to ensure that the interaction behavior conforms to the MySQL protocol specification.
Features: OBProxy implements many connection features, such as accessing different clusters and different tenants, supporting active and standby databases, distributed ps functions, and compatibility with commands such as kill and show processlist.
High- availability features: OBProxy can handle issues such as timeouts, machine status changes, and network status changes, shielding back-end exceptions and making users unaware.

Next, we will explain the connection management content step by step according to the operation steps of using OBProxy.

Step 1: User login

1. Login Information

When logging in to OBProxy, we need to fill in the database IP and PORT, user name, password and other information. For the user name, the format of OBServer is user_name@tenant_name. Because OBProxy can proxy different clusters, the format changes to user_name@tenant_name#cluster_name, and the meanings of the fields are as follows:

user_name: The login user name. The password is stored in OBServer. OBProxy only forwards login packets and does not perform password verification.
tenant_name: OBServer is a multi-tenant architecture, and tenant_name indicates the name of the accessed tenant.
cluster_name: Cluster name. OBProxy supports accessing multiple clusters. Different clusters are distinguished by cluster_name.

With this information, we can connect to OBProxy to access the database through JDBC driver, MySQL command line, Navicat and other tools.

You may be wondering, how does OBProxy find the corresponding machine when a user logs in? This depends on the OCP system (the url address of the OCP is specified through the obproxy_config_server_url configuration item). OCP will save the cluster name and the machine list of the cluster. OBProxy obtains this information by accessing the OCP. The whole process is as follows.

With the tenant's machine list, we can perform routing forwarding. It should be noted here that the rslist startup method described in the second part of this series will omit steps 2 and 3 in the above figure. This method only supports access to one cluster.

2. Login authentication

After finding the machine, we can log in and authenticate. The key information is the user name and password. In the MySQL protocol (the official picture shows the Handshake related messages, in the actual process, the server needs to reply with an OK or Error message after step 2). The interaction process for the client) is as shown in the figure below.

As a proxy component, OBProxy has more steps to be compatible with MySQL behavior. Let's take a Java program to connect to the database as an example to illustrate the whole process. In the Java program, there is only one line of login code:

 conn = (Connection) DriverManager.getConnection(URL, USER_NAME, PASSWORD);

But the principle behind it is more complicated, as shown in the figure below.

After completing the login step (step 8) of the protocol, the login interaction at the MySQL protocol layer ends, but JDBC will send some initialization SQL, which is also part of the login process. Step 9 in the figure represents this process. There are multiple initialization SQLs, the contents are as follows:

 set autocommit=1, sql_mode = concat(@@sql_mode,',STRICT_TRANS_TABLES')
set names utf8
SELECT @@max_allowed_packet,@@system_time_zone,@@time_zone,@@auto_increment_increment,@@tx_isolation AS tx_isolation,@@session.tx_read_only AS tx_read_only
select @@version_comment, @@version limit 1

After the above SQL is executed, the Java program can send the business SQL.

3. Common login issues

Now that you have understood the login principle of OBProxy, you can quickly locate and solve the problem when the login fails. Common problems and reasons are as follows:

If the user name or password is incorrect, it can be determined by directly connecting to the OBServer.
The OCP fails and the cluster machine list cannot be pulled. You can use the curl command to access the url of the OCP to determine.
The OBServer failed to obtain the machine list or initialize the SQL. Log troubleshooting is required.
If it exceeds the maximum number of connections or is not in the whitelist, it will be explained in detail later.

For the above problem, you can solve it by viewing the obproxy_error.log mentioned in the third article of this series.

Step 2: Connection Management

After the login is successful, the network connection between the client <->OBProxy<->OBServer is established. At this time, the OBProxy only establishes a connection with one of the OBServers. With the arrival of SQL requests, if routed to a new OBServer, a connection will be established with the new OBServer. In this process, the mapping relationship, state synchronization and connection function characteristics of the connection are involved, which will be explained one by one.

1. Mapping relationship of connections

Connection mapping mainly talks about the relationship between client connections and server connections. Let's start with a client connection. After the client establishes a connection with OBProxy, OBProxy will establish a connection with the following N OBServers. The whole relationship is shown in the following figure.

As you can see, OBProxy establishes connections with two OBServers as needed. These two connections only belong to this one client connection and will not be reused by other client connections. The key point of connection mapping is to identify each connection with an id and record the mapping relationship between the ids. We can abstract the above figure into a model:
App<-----[proxy_sessid1]---->OBProxy<---[server_sessid1]----->OBServer1

 <---[server_sessid3]----->OBServer3

In this way, we can use proxy_sessid to uniquely identify the connection between App and OBProxy, and use server_sessid to uniquely identify the connection between OBProxy and OBServer. When SQL execution errors or slow execution occur, the mapping relationship will be printed in the log, so as to associate the App with the OBServer to locate full-link problems.

2. State synchronization

A client connection corresponds to multiple server connections. To ensure the correctness of the execution results, the session states of multiple server connections are required to be consistent. So, what problems can the state be out of sync with? As a counter example, suppose the user executes the following SQL command:

 set autocommit=1;
insert into t1 values(1);
insert into t2 values(2);

The execution process is as follows:

set autocommit=0 to send to OBServer1
insert into t1 values(1) send to OBServer1
Perform connection switching, insert into t2 values(2) and send it to OBServer2

For the third SQL, the connection between OBProxy and OBServer2 is not synchronized with the connection state autocommit=1, which may result in the third statement insert into t2 not committing the transaction.
The correct step is for OBProxy to synchronize the value of the autocommit variable before sending the INSERT SQL to OBServer2. OBProxy solves the problem of state synchronization through the version number mechanism, and realizes the state synchronization of database, session variables, last_insert_id, and ps prepare statements to ensure the correctness of functions.

3. Connection Features

Different from a stand-alone database, OBProxy changes the connection mapping relationship to M:N, so some connection functions require additional processing. For example, when a user checks the number of connections through show processlist, what he wants to see is the number of connections between the client and OBProxy, not the number of connections between OBProxy and OBServer. Below we introduce the common connection functions in detail.
Connection stickiness. OBProxy has not implemented state synchronization for all functions, such as transaction state, temporary table state, cursor state, etc. For these functions, OBProxy will only send subsequent requests to the node where the state begins, so that state synchronization is not required, and the disadvantage is that the advantages of the distributed system cannot be fully utilized. Therefore, we gradually support the decentralization of related functions according to their importance.
The show processlist and kill commands are used together. show processlist is used to display the connection between the client and the server. For OBProxy, show processlist only displays the connection between the client and OBProxy, not the connection between OBProxy and OBServer. The kill command is used to kill a client connection. After the client connection is closed, OBProxy will also close the corresponding server connection. For the kill command of OBProxy, you need to obtain the corresponding id first, as shown in the Id column in the following figure (show proxysession and show processlist have similar functions, show proxysession is an OBProxy exclusive command).

Load balancing effects. Because OBProxy processes the show processlist and kill commands, the show processlist and kill commands can only work properly if they are both sent to the same OBProxy. In the public cloud and other environments, there is a load balancer in front of OBProxy, and multiple OBProxy behind the load balancer. At this time, if the show prcesslist and kill commands are executed on two different connections, the load balancer component may send requests to different OBProxy , in this case, we'd better not use the related command.
After introducing the technical principles of OBProxy's connection mapping relationship, state synchronization and connection features, you may feel that if you use common proxies such as HAProxy, many places will be much simpler, such as:

After the user connects to the common agent, the common agent will only establish a connection with one OBServer, and the connection mapping is 1:1 management.
The connection mapping of ordinary agents is 1:1, so there is no need to perform state synchronization, and the connection function does not require special processing, and it can be directly forwarded.

So, why is OBProxy recommended, and what are its advantages?

Ordinary proxies cannot perform high-availability disaster recovery. OBProxy detects the OBServer status (upgrade, downtime, etc.) through internal tables, error codes, and other information, and blocks and cleans OBServer nodes.
Common agents cannot give full play to the performance of the OceanBase database. After the connection is established, common agents cannot send SQL to other OBServer nodes, and sometimes the execution link will be longer. OBProxy can route accurately and realize features such as read-write separation.
Ordinary agents have limited functions and cannot adapt to OceanBase active and standby databases, replicated tables, and LDC architectures.

The above are the steps and principles of using OBProxy. If we want to know more about the connection principle, we need to go back to the TCP protocol. Because the connection of OBProxy is based on the TCP protocol, understanding the connection mechanism of the TCP protocol can more thoroughly grasp the functions of connection management and the location of connection problems.

Extended Knowledge Point 1: Connection Mechanism Based on TCP Protocol

TCP parameters

OBProxy sets TCP's no_delay and keepalive attributes at the code level to ensure low latency and high availability.

The no_delay attribute solves the delay problem by disabling the TCP Nagle algorithm. The Nagle algorithm is enabled by default in the Linux network stack to solve the occurrence of small packets of network packets, but it will cause delay in sending network packets. We have encountered in the production environment that the Nagle algorithm is not disabled in TCP, resulting in a SQL sending time of about 40ms, which does not meet the business requirements.
The keepalive property is used for failure detection. Detecting machine failures in time and closing invalid connections is part of the high availability of the TCP layer.

These two properties can be configured through the configuration items of OBProxy. The recommended configuration is as follows:

 ## 和OBServer的tcp连接设置
sock_option_flag_out = 3;  --  这是个二进制位参数，bit 0 表示否是启用 no_delay，bit 1 表示是否启用 keepalive。3的二进制是 11，表示启用 no_delay 和 keepalive
server_tcp_keepidle = 5;  --  启动keepalive探活前的idle时间，5秒。
server_tcp_keepintvl = 5;  -- 两个keepalive探活包之间的时间间隔，5秒
server_tcp_keepcnt = 2;  --  最多发送多少个keepalive包，2个。最长5+5*2=15秒发现dead_socket。

## 和客户端的tcp连接设置
client_sock_option_flag_out = 3;  --  同上
client_tcp_keepidle = 5;  --  同上
client_tcp_keepintvl = 5;  -- 同上
client_tcp_keepcnt = 2;  --  同上

We have verified the no_delay and keepalive properties in the production environment, you can use them with confidence. For other TCP parameters, we directly use the default values, and there is no need to adjust the parameters.

timeout parameter

When we are troubleshooting the problem, we occasionally encounter a situation where the TCP connection is disconnected, but it is impossible to determine whether the client is disconnected or the server is disconnected. Judging from experience, it is often triggered by the timeout mechanism. Here are the full-link solutions for several timeout mechanisms.

1. JDBC timeout

(1) socketTimeout

socketTimeout is the timeout period of Java socket, which refers to the timeout period of TCP communication between the business program and the back-end database. If the business program sends a MySQL Packet and the time exceeds socketTimeout, the Response message has not been received from the back-end database. At this time, JDBC will throw an exception and the program execution fails.
The socketTimeout unit is milliseconds and can be set by:

 jdbc:mysql://$ip:$port/$database?socketTimeout=60000

(2) connectTimeout

connectTimeout is the timeout time for the business program to use jdbc to establish a TCP connection with the backend database. It is also equivalent to the socketTimeout in the connect phase. If the TCP connection is not established successfully after this time, jdbc will throw an exception.
The unit of connectTimeout is milliseconds and can be set in the following ways:

 jdbc:mysql://$ip:$port/$database?socketTimeout=60000&connecTimeout=5000

(3) queryTimeout

queryTimeout is the local timeout time set by jdbc when the business program executes SQL. When the business calls the interface of jdbc to execute SQL, jdbc will enable this timeout mechanism internally. If the execution does not end after quertTimeout, jdbc will send a kill query to the back-end database on the current connection, and throw a MySQLTimeoutException to the upper-layer business.
The queryTimeout unit is seconds and can be set in the following ways:

 1、通过jdbc的Statement.setQueryTimeout接口来设置
    int queryTimeout = 10;
java.sql.Statement stmt = connection.CreateStatement();
stmt.setQueryTimeout(queryTimeout);

It should be noted here that if you want to use queryTimeout, it is recommended to set a value smaller than socketTimeout, otherwise the network timeout will be triggered first and the connection will be disconnected. But we do not recommend using queryTimeout, you can use the ob_query_timeout property of the OceanBase database.

(4) JDBC timeout practice

Several important timeout parameters of JDBC can be set in the ConnectionProperties of the connection pool, or on the JDBC URL:

parameter	illustrate	Recommended value
socketTimeout	Network read timeout, if not set the default is 0, use OS default timeout	5000ms
connectTimeout	The connection establishment timeout period, if not set, the default is 0, the OS default timeout period is used	500ms

These parameters are summed up based on Ant's internal OLTP business for your reference.

2. OceanBase database timeout

The OceanBase database has its own specific timeout parameters. The difference between these parameters and JDBC is that the behavior of OceanBase after a timeout is triggered is to return an ERROR message instead of disconnecting the connection .

(1) ob_query_timeout

ob_query_timeout is an OBServer SQL-level timeout parameter, which can be set through the hint or session variable, indicating the timeout period for executing a SQL. When the execution time of the business SQL in the database exceeds the value set by ob_query_timeout, OBServer will return an ERROR packet to the client with the error code 4012. The default value of ob_query_timeout is 10000000 (10s), in microseconds. It can be set in the following ways:

 // Java程序示例

// 1、通过sql设置session变量或者global变量，设置完成之后，当前连接生效
stmt.execute("set @@ob_query_timeout = 10000000");
stmt.execute("set @@global.ob_query_timeout = 10000000");

// 2、通过在业务sql中添加hint设置，当前sql生效
String sql = "/*+ QUERY_TIMEOUT(10000000)*/ select count(*) from XXX";
stmt.execute(sql);

(2) ob_trx_timeout

ob_trx_timeout is the transaction timeout time of OBServer. It is a session variable. When a transaction exceeds ob_trx_timeout and has not ended, OBServer will also return a 4012 ERROR message to the client. If the transaction has not been committed when the timeout occurs, OBServer will roll back the transaction; if it occurs in the commit phase, OBServer will not actively roll back it because the transaction status is undetermined. The default value of ob_trx_timeout is 100000000 (100s) in microseconds, which can be set in the following ways:

 // Java程序示例

// 通过sql设置session变量或者global变量，设置完成之后，当前连接生效
stmt.execute("set @@ob_trx_timeout = 100000000");
stmt.execute("set @@global.ob_trx_timeout = 100000000");

(3) Compatible with MySQL timeout

In addition to the above-mentioned timeout parameters specific to the OceanBase database, the OceanBase database is also compatible with the MySQL timeout parameters. You can refer to the MySQL timeout document . The MySQL timeout parameter will cause disconnection, and there are three parameters.

wait_timeout: idle session time, the default is 8 hours, if no SQL is sent after this time, the connection will be disconnected.
net_read_timeout: Timeout for waiting to read data from the connection. A timeout will cause the connection to be closed.
net_write_timeout: The timeout for waiting to write data from the connection. A timeout will cause the connection to be closed.

Because most of the timeout mechanisms will directly close the connection after triggering, and do not print the timeout log, it is difficult to determine the cause of the TCP disconnection without understanding the timeout mechanism. As a proxy layer, OBProxy does not add a timeout mechanism, which will not make the mechanism more complicated. At the same time, OBProxy will sense the settings of the database timeout parameters, so that the overall performance conforms to the timeout mechanism.

Extended Knowledge Point 2: Common Network Tools

There are many times when we need to understand system network behavior to better solve problems. For example, if a Connect Error appears in the OBProxy log, we can further confirm the network failure between the two machines through the ping command. Next, I will introduce some commonly used network tools for you.

ping and telnet commands

The ping and telnet commands are simple and easy to use. The ping command can determine whether the network between two machines is connected, and can understand the delay between the networks, which is very useful for solving many problems. For example, if the delay is tens of milliseconds, it is determined that the route is routed to the computer room in other cities, which is not expected in the oltp business. The telnet command can further determine whether the machine port is listening, so as to find out that the service program is not started, and the program core is dropped.

Packet capture tool

When the problem involves the protocol interaction of multiple modules, more useful information cannot be obtained by only analyzing the log. In this case, you can use the tcpdump tool to obtain the real network situation. For example, if you see that the database executes SQL slowly in the OBProxy log, but the SQL execution is fast in the OBServer, you will suspect a network problem. You can only determine whether it is a network problem based on TCP packets.
In standard deployment, the port of OBProxy is 2881, and the port of OBServer is 2883. If you want to capture the interactive messages between OBProxy and OBServer, you can execute the following on the OBProxy machine:

 sudo tcpdump -i any port 2881 -w xxx.cap

The TCP message will be saved in xxx.cap. Get the file and copy it to the local machine and use wireshark software to analyze it.
In addition to the commands introduced above, commands such as ifconfig, ss, lsof, and curl are all very useful. There are a lot of information on the Internet, so I won't introduce more here.
In conclusion, in distributed systems, network problems are common. To understand the reason behind it, you need to be very familiar with the operating system, database system, and network tools, which increases the difficulty of judgment. The above content introduces these knowledge points in detail to help you better understand the distributed database system.

Extended Knowledge Point 3: Frequently Asked Questions about Connection Management

Although there are many knowledge points about connection management, when a connection problem occurs, the error reported by the client is basically Communications link failure. The key point here is to connect the two ends of the TCP through "key information" to obtain more useful information for further analysis. If the "key information" is SQL and time, then you can go to obproxy_error.log to check. After finding the log record, you can further investigate according to the content introduced in the third article of this series.
Here we give two examples. Example 1 The client reports an error:

 --- Cause: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet successfully received from the server was 70,810 milliseconds ago. The last packet sent successfully to the server was 5,005 milliseconds ago.

Analyzing the error message, you can see that the last time the client received a request from the server was 70810ms ago, and the last time the client sent data to the server was 5005ms (5s) before. This is a problem in the process of executing SQL, which is generally caused by the sockettimeout=5s set by Java itself, which can be confirmed to the application developer.
Example 2 The client reports an error:

 --- Cause: com.alipay.oceanbase.obproxy.druid.pool.GetConnectionTimeoutException: get connection timeout, maxWait=500ms, maxCount=10, currentCount=11,

You can see that this is an error from the Druid connection pool, and the connection timed out. At this time, the SQL has not been executed, but the connection cannot be obtained from the connection pool, and the cause of the problem is more complicated. Common reasons are:

The connection pool itself is configured with too few connections. Refer to the value of maxCount. Generally, it can be solved by increasing it.
If there is slow SQL, the connection is not returned in time. For this problem, it is necessary to check which SQL is executed slowly.

It can be seen that the error report is only an appearance, and it is difficult to determine the root cause at once, and more information needs to be collected for in-depth investigation.

Summarize

Connection management involves many comprehensive knowledge points and involves many modules (including drivers, OBProxy, and OBServer), so it becomes complicated, but this is also the common "pain point" of microservices and distributed databases. In this article, we start with user login, introduce connection management, network technology and common problems, and introduce relevant principles and practices to you. After years of practice, OBProxy has solved many connection management problems. You are also welcome to learn and communicate.

After class interaction

Last interactive answer

Xiao Ming wants to view all the SQL that has passed through OBProxy. Is there any way?
Answer: obproxy_digest.log is an audit log, which records all SQL whose execution time is greater than query_digest_time_threshold. Therefore, you can modify the value of query_digest_time_threshold to 1us, and you can see all the logs that pass through OBProxy in obproxy_digest.log.

Interactive questions in this issue

A connection is established between JDBC and OBProxy. At night, OBProxy is killed and restarted. Will JDBC throw an exception at this time?
You are welcome to post and discuss in the Q&A area, and the next article will reveal the answer.