Author: Shimian

Background

In a distributed system architecture, business traffic flows end to end. Each request passes through many layers of processing: from the ingress gateway to the web server, through the calls between services, and on to the cache or storage, such as a database, that those services access.

The database is a critical part of our system. Whether the goal is stability governance or development and efficiency improvement, database governance is a capability our system needs to have.

Here are some typical database-related governance scenarios:

  • A system exposes a query interface, and the SQL statement behind it joins multiple tables. In some cases this triggers a slow query that takes up to 30s. Eventually the DB connection pool and the Tomcat thread pool fill up, and the whole application becomes unavailable.
  • An application has just started and its Druid database connection pool is still initializing, but a large number of requests have already arrived. Dubbo's thread pool quickly fills up, many threads are stuck initializing database connections, and a large number of business requests report errors.
  • In a full-link grayscale scenario, the new application version changed the content of a database table, so grayscale traffic corrupted the data in the online database, and business colleagues had to correct the online data manually overnight.
  • SQL performance was not well considered in the early stage of the project. As the business grows and the number of users increases, the SQL behind old online interfaces gradually becomes a performance bottleneck. We therefore need effective SQL insight to discover this legacy SQL and optimize it in time.
  • Long-running SQL statements cause a large number of slow calls on online business interfaces. We need to locate the problematic slow SQL quickly and isolate it through governance measures so that the business recovers quickly. Real-time SQL insight at the point where microservices access the data layer helps us locate slow SQL calls quickly.

In fact, for most back-end applications, the main bottleneck of the system is the database, and complex business logic is inseparable from database operations. Database issues therefore deserve the highest priority, and database governance is an essential part of microservice governance.

Common Scenarios Related to Database Governance

The following summarizes some common scenarios and capabilities in database governance when microservices access the database layer.

An overview of database governance in the OpenSergo world

Slow SQL governance

Slow SQL is one of the most serious threats to system stability. Slow SQL may cause abnormal CPU usage, abnormal load, and system resource exhaustion. Severe slow SQL may drag down the entire database and put online business at risk of disruption. Possible reasons for slow SQL in an online production environment include:

  • Hardware reasons such as a slow network, insufficient memory, low I/O throughput, or a full disk.
  • A missing or ineffective index.
  • Too much data in the system.
  • SQL performance was not considered at the beginning of the project.

For common online slow SQL problems, MSE service governance provides scenario-based solutions.

  • SQL Insights

MSE provides second-level SQL call monitoring.

We can observe real-time data at the application and resource API dimensions, refined to the second level. MSE also provides a TopN list of SQL statements, so we can see the statements with high RT at a glance and quickly locate the root cause of an application slowdown.

Through the SQL insight provided by MSE, we can analyze whether an SQL statement is written reasonably and whether the concurrency and RT of SQL execution meet the system's performance expectations. This data lets us evaluate the overall performance of the system and provides an important basis for configuring flow control and degradation rules.
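
As a rough illustration of the TopN idea, the following minimal Python sketch aggregates per-statement RT samples and ranks statements by average RT. The sample data, the `SqlStat` class, and the `top_n_by_avg_rt` function are hypothetical names chosen for this example; they are not MSE APIs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SqlStat:
    """Running statistics for one SQL template (hypothetical structure)."""
    calls: int = 0
    total_rt_ms: float = 0.0
    max_rt_ms: float = 0.0

    @property
    def avg_rt_ms(self) -> float:
        return self.total_rt_ms / self.calls if self.calls else 0.0


def top_n_by_avg_rt(samples, n):
    """Rank SQL templates by average RT; samples are (sql_template, rt_ms) pairs."""
    stats = defaultdict(SqlStat)
    for sql, rt_ms in samples:
        stat = stats[sql]
        stat.calls += 1
        stat.total_rt_ms += rt_ms
        stat.max_rt_ms = max(stat.max_rt_ms, rt_ms)
    ranked = sorted(stats.items(), key=lambda kv: kv[1].avg_rt_ms, reverse=True)
    return ranked[:n]


# Made-up samples: one multi-table join is far slower than the rest.
samples = [
    ("SELECT * FROM orders WHERE tid = ?", 12.0),
    ("SELECT * FROM orders o JOIN items i ON o.id = i.order_id", 30000.0),
    ("SELECT * FROM orders WHERE tid = ?", 8.0),
    ("UPDATE stock SET n = n - 1 WHERE id = ?", 45.0),
]
top = top_n_by_avg_rt(samples, 2)
for sql, stat in top:
    print(f"{stat.avg_rt_ms:>10.1f} ms  {stat.calls} call(s)  {sql}")
```

Ranking by average RT is only one possible choice; a real dashboard would likely also track percentiles and per-second windows.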

  • Flow control and degradation for SQL

For applications with slow SQL, we can configure flow control or thread-count degradation rules based on the SQL statements that MSE identifies automatically. When slow SQL calls occur, these rules limit the number of SQL statements executing at the same time, which prevents too many slow statements from exhausting resources.

MSE's SQL flow control and degradation capability supports four kinds of rules: flow control, isolation, circuit breaking, and hotspot flow control.

1. Flow control: Flow control rules configured for a service interface allow requests within the capacity range to pass and reject the excess. This acts like an airbag and keeps the volume of SQL requests within the system's capacity threshold.

Later, MSE will provide SQL insight aggregated by database and table. Based on this capability, we can keep the traffic of a specified database or table within its estimated capacity.
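
The flow control rule in item 1 can be sketched as a fixed-window QPS counter. This is a minimal Python illustration under stated assumptions: the `QpsLimiter` class and the injected `clock` are hypothetical, and the source does not say which algorithm MSE actually uses.

```python
class QpsLimiter:
    """Fixed-window counter: at most `limit` calls per one-second window."""

    def __init__(self, limit, clock):
        self.limit = limit
        self.clock = clock          # callable returning the time in seconds
        self.window_start = None    # integer second of the current window
        self.count = 0

    def try_acquire(self):
        window = int(self.clock())
        if window != self.window_start:
            # A new one-second window has started: reset the counter.
            self.window_start = window
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False                # over capacity: reject the request


# A fake clock keeps the example deterministic.
fake_now = [0.0]
limiter = QpsLimiter(limit=2, clock=lambda: fake_now[0])
results = [limiter.try_acquire() for _ in range(3)]
fake_now[0] = 1.2                   # move into the next window
next_window = limiter.try_acquire()
print(results, next_window)
```

In production the clock would be something like `time.monotonic`, and a sliding window or token bucket would smooth the bursts that a fixed window allows at window boundaries.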

2. Concurrency isolation: When traffic is approximately stable, the number of concurrent threads = QPS * RT(s). If RT increases, the number of concurrent threads increases, which means that service calls are piling up. With the concurrency isolation capability provided by traffic governance, we can configure a limit on the number of concurrent threads for important service calls. This acts as a "soft safeguard" that prevents slow SQL or unstable services from crowding out the resources of normal services.
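
To make the relationship in item 2 concrete, the sketch below first evaluates concurrency = QPS * RT for a normal and a slow RT, then caps concurrency with a semaphore. The `ConcurrencyGuard` class, the function names, and the numbers are assumptions for illustration only, not an MSE interface.

```python
import threading


def expected_concurrency(qps, rt_seconds):
    """Steady-state number of in-flight calls: QPS * RT(s)."""
    return qps * rt_seconds


class ConcurrencyGuard:
    """Rejects a call immediately when `max_concurrent` calls are in flight."""

    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def run(self, fn, *args):
        if not self._slots.acquire(blocking=False):
            raise RuntimeError("rejected: too many concurrent SQL calls")
        try:
            return fn(*args)
        finally:
            self._slots.release()


normal = expected_concurrency(qps=200, rt_seconds=0.25)  # healthy RT
slow = expected_concurrency(qps=200, rt_seconds=2.0)     # RT degraded by slow SQL

guard = ConcurrencyGuard(max_concurrent=1)


def nested_query():
    # Runs while the outer call still holds the only slot.
    try:
        return guard.run(lambda: "inner rows")
    except RuntimeError:
        return "rejected"


outcome = guard.run(nested_query)      # inner call finds no free slot
after = guard.run(lambda: "rows")      # slot was released, call passes
print(normal, slow, outcome, after)
```

Rejecting instead of queueing is a deliberate choice here: a blocked caller would itself occupy a thread, which is exactly the pile-up the rule is meant to prevent.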

3. Circuit breaking and degradation: During business peaks, some downstream service providers hit performance bottlenecks when accessing large amounts of data, which produces a large amount of slow SQL and can even affect the business. We configure automatic circuit breaking rules for the database access of some non-critical services. When the proportion of slow calls or errors over a period of time reaches a configured threshold, the circuit breaker trips automatically, and service calls for the following period directly return a mock result. This ensures that the caller is not dragged down by accumulated data access requests and keeps the entire business link running normally.
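
The slow-call-ratio rule in item 3 can be sketched as follows. This is a simplified Python illustration, not MSE's implementation: `SlowCallBreaker`, its parameters, and the fake clock are hypothetical, and the sketch omits the sliding statistics window and half-open probing that a production circuit breaker normally has.

```python
class SlowCallBreaker:
    """Opens when the share of slow calls reaches `max_slow_ratio`."""

    def __init__(self, slow_rt_ms, max_slow_ratio, min_calls, open_seconds, clock):
        self.slow_rt_ms = slow_rt_ms
        self.max_slow_ratio = max_slow_ratio
        self.min_calls = min_calls        # do not judge on too few samples
        self.open_seconds = open_seconds  # how long to serve the fallback
        self.clock = clock
        self.calls = 0
        self.slow = 0
        self.open_until = 0.0

    def call(self, query, fallback):
        if self.clock() < self.open_until:
            return fallback()             # breaker is open: return the mock result
        start = self.clock()
        result = query()
        self._record((self.clock() - start) * 1000)
        return result

    def _record(self, rt_ms):
        self.calls += 1
        if rt_ms > self.slow_rt_ms:
            self.slow += 1
        if self.calls >= self.min_calls and self.slow / self.calls >= self.max_slow_ratio:
            self.open_until = self.clock() + self.open_seconds
            self.calls = self.slow = 0    # start fresh after the open period


now = [0.0]                               # fake clock, in seconds


def slow_query():
    now[0] += 1.0                         # pretend the SQL took one second
    return "rows"


breaker = SlowCallBreaker(slow_rt_ms=500, max_slow_ratio=0.5, min_calls=2,
                          open_seconds=10, clock=lambda: now[0])
results = [breaker.call(slow_query, lambda: "mock") for _ in range(3)]
now[0] = 13.0                             # the open period has elapsed
recovered = breaker.call(slow_query, lambda: "mock")
print(results, recovered)
```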

4. Hotspot flow control: The hotspot parameter flow control capability automatically identifies the TopN most frequently accessed parameter values in SQL requests and applies flow control to each of them individually, which avoids overload from a single hotspot. A separate flow control value can also be configured for special hotspots (such as a very popular flash-sale item). The parameter can be any condition with business meaning in the SQL access, such as the value of the tid parameter below.

 SELECT * FROM order WHERE tid = 1
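
A minimal sketch of per-parameter flow control for a query like the one above: each distinct `tid` value gets its own counter, and special hotspots can carry their own limit. The `HotParamLimiter` class and the `tid-*` values are hypothetical, and the sketch assumes the caller resets the counters once per time window.

```python
from collections import Counter


class HotParamLimiter:
    """Counts calls per parameter value within one window and limits each value."""

    def __init__(self, default_limit, special_limits=None):
        self.default_limit = default_limit
        self.special_limits = dict(special_limits or {})  # per-value overrides
        self.counts = Counter()

    def try_acquire(self, param_value):
        limit = self.special_limits.get(param_value, self.default_limit)
        if self.counts[param_value] >= limit:
            return False
        self.counts[param_value] += 1
        return True

    def hottest(self, n):
        """TopN parameter values by admitted calls in the current window."""
        return self.counts.most_common(n)

    def reset_window(self):
        self.counts.clear()


limiter = HotParamLimiter(default_limit=2, special_limits={"tid-hot": 5})
ordinary = [limiter.try_acquire("tid-1") for _ in range(3)]
popular = [limiter.try_acquire("tid-hot") for _ in range(6)]
top = limiter.hottest(1)
print(ordinary, popular, top)
```

Because limits apply per value, a burst on one `tid` is rejected without affecting requests for other values.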

Connection Pool Governance

Connection pool governance is a very important part of database governance. Real-time connection pool metrics let us identify risks in the system in advance. The following are some common connection pool governance scenarios.

1. Establish a connection in advance

During an application release or elastic scale-out, a newly started instance may have passed its readiness check before its connections have been established, so a large amount of business traffic enters the newly started pod. Many requests then block while acquiring connections from the connection pool, the service's thread pool fills up, and a large number of business requests fail. If the application can establish connections in advance, the number of connections is guaranteed to be at least minIdle before traffic arrives. Combined with small-traffic warm-up, this solves the problem described above.
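
The pre-warming idea can be sketched as a pool that eagerly opens `min_idle` connections and reports readiness only afterwards. `WarmPool`, `fake_connect`, and the readiness hook are hypothetical names for illustration; a real pool such as Druid has its own configuration for this.

```python
class WarmPool:
    """Toy connection pool that can be filled to `min_idle` before serving traffic."""

    def __init__(self, connect, min_idle):
        self.connect = connect      # factory that opens one connection
        self.min_idle = min_idle
        self.idle = []

    def prewarm(self):
        # Open connections up front instead of on the first business request.
        while len(self.idle) < self.min_idle:
            self.idle.append(self.connect())

    def ready(self):
        # Intended to back the readiness check: pass only once the pool is warm.
        return len(self.idle) >= self.min_idle

    def acquire(self):
        return self.idle.pop() if self.idle else self.connect()


created = []


def fake_connect():
    conn = object()                 # stands in for a real database connection
    created.append(conn)
    return conn


pool = WarmPool(fake_connect, min_idle=3)
ready_before = pool.ready()
pool.prewarm()
ready_after = pool.ready()
conn = pool.acquire()               # served from the warm pool, no new connect
print(ready_before, ready_after, len(created))
```

Tying the readiness check to `ready()` keeps traffic away from the instance until the expensive connection setup has finished.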

2. "Bad" connection culling

Sometimes the connection pool contains problematic connections, for example because the underlying network is jittering or because business execution is slow or deadlocked. If we can detect abnormal connections from the perspective of the connection pool and remove and recycle them in time, we keep the pool stable as a whole, and individual problematic business operations or network jitter cannot bring it down.
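
One way to picture bad-connection eviction is a periodic sweep that removes connections failing a health check or held busy for too long. The `PooledConn` fields, the `sweep` function, and the 30-second threshold below are assumptions for this sketch; the source does not describe MSE's detection criteria.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PooledConn:
    name: str
    healthy: bool = True                 # result of a validation query or ping
    busy_since: Optional[float] = None   # None means the connection is idle


def sweep(conns, now, max_busy_seconds):
    """Split connections into those to keep and those to evict."""
    keep, evicted = [], []
    for conn in conns:
        stuck = conn.busy_since is not None and now - conn.busy_since > max_busy_seconds
        if not conn.healthy or stuck:
            evicted.append(conn)         # caller closes it and opens a replacement
        else:
            keep.append(conn)
    return keep, evicted


conns = [
    PooledConn("a"),                     # idle and healthy
    PooledConn("b", healthy=False),      # failed validation, e.g. network jitter
    PooledConn("c", busy_since=0.0),     # held for 100 s, likely stuck
    PooledConn("d", busy_since=95.0),    # held for 5 s, still normal
]
keep, evicted = sweep(conns, now=100.0, max_busy_seconds=30.0)
print([c.name for c in keep], [c.name for c in evicted])
```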

3. Access control

In principle, not every database table should be freely accessible. Sometimes we want an important table to be read-only, with writes forbidden, for some less important services. When the database jitters or the thread pool is full, we may want to cut down on time-consuming read SQL. Tables that hold sensitive data may allow read and write access from only one particular application. With dynamic access control, we can issue access control rules in real time and forbid reads or writes at the level of individual methods, the SQL an application issues, database instances, and tables.
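
A rule check of this kind might look like the sketch below. It is a deliberately naive Python illustration: the `AccessRule` shape, the service and table names, and the regex-based table extraction are all assumptions, and a real implementation would parse SQL properly rather than pattern-match it.

```python
import re
from dataclasses import dataclass

WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE"}


@dataclass(frozen=True)
class AccessRule:
    app: str            # application name, or "*" for every application
    table: str          # lower-case table name
    deny: frozenset     # subset of {"read", "write"}


def classify(sql):
    """Treat statements starting with a write verb as writes, the rest as reads."""
    verb = sql.lstrip().split(None, 1)[0].upper()
    return "write" if verb in WRITE_VERBS else "read"


def tables_in(sql):
    """Naive table extraction: the word after FROM, JOIN, INTO, or UPDATE."""
    found = re.findall(r"\b(?:FROM|JOIN|INTO|UPDATE)\s+`?(\w+)`?", sql, flags=re.IGNORECASE)
    return {name.lower() for name in found}


def is_allowed(app, sql, rules):
    operation = classify(sql)
    tables = tables_in(sql)
    for rule in rules:
        if rule.app in ("*", app) and rule.table in tables and operation in rule.deny:
            return False
    return True


rules = [
    # A less important service may read `accounts` but not write it.
    AccessRule(app="report-service", table="accounts", deny=frozenset({"write"})),
    # A sensitive table is closed to everyone in this example.
    AccessRule(app="*", table="salaries", deny=frozenset({"read", "write"})),
]
print(is_allowed("report-service", "SELECT * FROM accounts WHERE id = 1", rules))
print(is_allowed("report-service", "UPDATE accounts SET balance = 0", rules))
```

Because the rules are plain data, they can be pushed to running instances and swapped at runtime, which is what makes the control dynamic.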

Database Grayscale

In a microservice architecture, the dependencies between services are intricate, and a feature release sometimes requires multiple services to be upgraded and launched at the same time. We want to verify the new versions of these services together with a small amount of traffic. This is the full-link grayscale scenario that is unique to microservice architectures: by isolating environments from the gateway through to the entire backend, multiple services of different versions can be verified in grayscale. MSE uses the shadow table approach, so users can achieve full-link grayscale at the database level without modifying any business code.
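
The shadow table approach can be pictured as rewriting table names for requests that carry a grayscale tag. The sketch below is an assumption about how such routing might work, not a description of MSE internals: the `route_sql` function, the `_shadow` suffix, and the regex rewrite are hypothetical.

```python
import re


def route_sql(sql, is_gray, shadow_tables, suffix="_shadow"):
    """Point grayscale traffic at shadow tables; leave baseline traffic untouched."""
    if not is_gray:
        return sql

    def swap(match):
        keyword, table = match.group(1), match.group(2)
        if table.lower() in shadow_tables:
            return f"{keyword} {table}{suffix}"
        return match.group(0)           # table has no shadow copy: keep as is

    return re.sub(r"\b(FROM|JOIN|INTO|UPDATE)\s+(\w+)", swap, sql, flags=re.IGNORECASE)


shadow_tables = {"orders"}
baseline = route_sql("SELECT * FROM orders WHERE tid = 1", False, shadow_tables)
gray_read = route_sql("SELECT * FROM orders WHERE tid = 1", True, shadow_tables)
gray_write = route_sql("INSERT INTO orders (tid) VALUES (1)", True, shadow_tables)
untouched = route_sql("SELECT * FROM users WHERE id = 7", True, shadow_tables)
print(baseline, gray_read, gray_write, untouched, sep="\n")
```

Since the rewrite happens below the business code, grayscale writes land in the shadow table and cannot corrupt the online table.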

Summary

The above is a preview of the database governance capabilities that MSE will launch soon. From the application's perspective, we have sorted out and abstracted our practical experience in stability governance, performance optimization, and efficiency improvement when accessing and using databases. For a back-end application, the database is undoubtedly the top priority. We hope that our database governance capabilities help everyone use database services better.

OpenSergo: A Standard for Service Governance

Q: What is OpenSergo?

A: OpenSergo is a set of open, general-purpose service governance standards for distributed service architectures that covers the full-link heterogeneous ecosystem. It is derived from industry service governance scenarios and practices. Its biggest feature is that it defines service governance rules with a unified set of configuration, DSL, and protocol, targets multi-language heterogeneous architectures, and covers the full-link ecosystem. Whether a microservice is written in Java, Go, Node.js, or another language, whether it is a standard microservice or accessed through a mesh, from gateway to microservice, from database to cache, and from service registration and discovery to configuration, developers can use the same set of OpenSergo CRD standard configuration to govern and control every layer in a unified way. They do not need to care about the differences between frameworks and languages, which reduces the complexity of heterogeneous, full-link service governance.

OpenSergo will also launch standards related to database governance in September, which will further abstract and standardize database governance capabilities. The OpenSergo community is currently cooperating with other communities to discuss and define a unified service governance standard, and is working with bilibili, ByteDance, and other companies to build the standards jointly. Interested developers, communities, and companies are welcome to join in building the OpenSergo service governance standards. You are also welcome to join the OpenSergo community exchange group (DingTalk group) for discussion: 34826335
