This article is based on a talk from "Soundnet Developer Entrepreneurship Lecture Hall Vol.02", given by Liu Yongzhi, an expert consultant at Thoughtworks. You can follow the link to watch the video replay and download the lecturer's slides.
From the end of last year to the present, with repeated outbreaks of the epidemic, the One-Code-Pass systems in many cities have failed, which shows that these systems still have technical deficiencies. This talk therefore introduces how to use the PAST problem-solving framework to study and solve these problems from the perspective of architecture and design.
01 PAST Problem Solving Framework
The first letter in PAST, P, stands for Problem. When you encounter a problem, don't rush into planning a solution; first research and analyze to confirm what the problem actually is. Eric Evans makes the same point in "Domain-Driven Design": understanding the target domain and incorporating what has been learned into the software is the first task of Domain-Driven Design (DDD), which underlines the importance of the problem.
The second letter, A, stands for Analysis, meaning the analysis of contradictions. A problem may have many causes. In this step we first identify the primary contradiction and the secondary contradictions, and then find corresponding solutions for these causes or contradictions; that is the S, Solution. At this point you can first list the options, and then weigh them in the Tradeoff stage, the T. In software and architecture design, most of the time you are making trade-offs rather than clear-cut decisions, so the options must be weighed during design before a plan is finalized. Depending on the plan, it may also be necessary to track progress and review the results in the final stage.
02 Problem
The figure below shows the One-Code-Pass system of a city; assume the city is Xihong City. Figure 1 (left) is the One-Code-Pass page under normal circumstances. It includes the name, ID number, and a QR code, which can be green, red, or yellow. The lower part of Figure 1 (left) also shows vaccination information and nucleic acid test results from the last 15 days. As the figure shows, the main entry page of One-Code-Pass integrates many functions. Figure 1 (right) is the failure page of the One-Code-Pass system on Monday morning: the page is blank. There were other problems as well, such as the nucleic acid test results not displaying any information.
■Figure 1
03 Analysis
3.1 The main contradiction
In the analysis stage, we investigate these problems to find their causes and contradictions. Here the main contradiction is between the large number of citizens who, on Monday morning, need to open the nucleic acid certificate page of the One-Code-Pass system from many different places, and the system's inability to handle that level of concurrency. Because the page would not open, users refreshed repeatedly and the number of requests soared. These requests may pass straight through to the backend and even the service layer, and then on to the distributed cache or database, so traffic on the backend servers suddenly increases and the required network bandwidth rises with it.
3.2 Architecture Analysis
Next, we analyze the architecture and design further. At the data level, there may not have been a good caching mechanism: many query requests went directly to the server or even the database, causing cache breakdown, with a large amount of traffic hitting the database directly. In terms of change frequency and flexibility, the problem is that the personal code page aggregates too much content, and there is no container-based cluster or circuit breaker mechanism. From the perspective of cross-functional requirements (CFR), the developers did not consider the server's peak limits when designing the service and did not carry out adequate performance and stress testing during design and system testing, so the system eventually exceeded its load. In terms of network bottlenecks, the theoretical transfer rate of a 1000 Mbps network card is 125 MB/s, and that of a 100 Mbps card is 12.5 MB/s. One possible cause of the failure that day is that network bandwidth was insufficient, creating a bottleneck.
■Figure 2
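The bandwidth figures quoted in 3.2 follow from dividing the link speed in bits by 8. The short sketch below reproduces that arithmetic and derives a rough ceiling on responses per second. The 50 KB response size is a hypothetical value chosen only for illustration, and decimal units (1 MB = 1000 KB) are assumed.

```python
def nic_throughput_mb_per_s(link_mbps: float) -> float:
    """Convert a link speed in megabits per second to megabytes per second."""
    return link_mbps / 8


def max_responses_per_s(link_mbps: float, response_kb: float) -> float:
    """Theoretical ceiling on responses per second if the link is the only bottleneck."""
    return nic_throughput_mb_per_s(link_mbps) * 1000 / response_kb


if __name__ == "__main__":
    # Hypothetical page size; the talk does not state the real one.
    assumed_response_kb = 50
    for mbps in (100, 1000):
        print(mbps, nic_throughput_mb_per_s(mbps), max_responses_per_s(mbps, assumed_response_kb))
```

Real throughput will be lower than this ceiling because of protocol overhead and other traffic sharing the link.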
04 Solution Enumeration
After the analysis, we need to formulate solutions to the problem. While enumerating solutions, do not rush to express a preference; first identify the alternatives, then arrange and combine them.
4.1 Data-Based Layered Architecture
Data can be divided into UI-layer personalized data, cached data, and the full data set in the database (DB). Cache design follows the principle of proximity: the closer the data is to the user, the better the performance is likely to be. Following this principle, the data-based layered architecture is effectively a funnel: a caching strategy based on objects and collections that reduces access to the underlying systems. Each layer of data can be handled in a different way.
For UI-layer personalized data, in the One-Code-Pass case, the button can be disabled once the user clicks to query the nucleic acid result, so that repeated submission is blocked at the feature level. For example, the button can be re-enabled after 15 seconds, which cuts off repeat traffic at the source. Cached data can be stored in the browser, for example common images, static files, and scripts, so that only a limited set of requests proceeds to the later stages. CDN caching can also be used to distribute content across the network. On the way from the UI layer to the application layer or server, requests can be proxied by NGINX or an intermediate server. On the server side, for example, a cache can be set up for data that is queried frequently but modified rarely.
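As a minimal sketch of a server-side cache for data that is queried often but rarely modified, the following in-memory cache keeps each entry for a fixed time-to-live. The class name `TTLCache`, the `loader` callback, and the 15-second TTL are assumptions made for illustration; a production system would more likely use a shared cache such as Redis.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[str], Any]) -> Any:
        """Return the cached value, calling loader (e.g. a DB query) only on a miss."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]
        value = loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=15)

    def query_database(user_id: str) -> str:
        # Stand-in for the real, expensive query.
        return f"nucleic-result-for-{user_id}"

    print(cache.get_or_load("user-1", query_database))  # loads from the "database"
    print(cache.get_or_load("user-1", query_database))  # served from the cache
```

With this in place, a user who refreshes repeatedly within the TTL triggers only one underlying query.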
For the full data set in the DB, the database should focus on storage rather than on performing complex operations. Database products can support rate limiting once traffic reaches a certain level: returning an exception code tells the caller that the database is unavailable, and the server can then trip a circuit breaker as appropriate, so that the database does not keep processing requests it cannot answer and bring down the entire application. In addition, given the failure of the One-Code-Pass system that day, it would be a good choice to split functions such as nucleic acid reports into separate microservices that can be scaled dynamically and elastically. The services can be divided using DDD.
■Figure 3
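The circuit breaking described in 4.1 can be sketched as follows. This is an illustrative simplification, not the mechanism of any particular framework: the class names, the failure threshold, and the reset timeout are all assumed values. After a number of consecutive failures the breaker opens and rejects calls immediately; once the timeout has passed it lets one trial call through.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected without reaching the backend."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self._failure_threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the breaker is closed

    def call(self, func: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout:
                raise CircuitOpenError("backend temporarily unavailable")
            # Timeout elapsed: half-open state, allow a single trial call.
        try:
            result = func()
        except Exception:
            self._failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the breaker.
            if self._opened_at is not None or self._failures >= self._failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the breaker and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

While the breaker is open, the application can return a friendly "temporarily unavailable" message instead of queuing more work on an overloaded database.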
4.2 Frequency and flexibility of business changes
When starting a business, or in more complex systems, you can do some experience design to divide up the business capabilities of the system. Based on the context, you then judge from those business capabilities whether to move from a monolith to microservices. The frequency and flexibility of business changes should also be considered. For example, the nucleic acid test query has been used very frequently recently, so it can be placed on the home page and queried directly after entering the system. In fact, Xihong City put the nucleic acid test function directly on the page a few days after the outbreak, which indirectly shows how much the frequency of business change matters to system design.
■Figure 4
4.3 CFR – Test Design and Performance
For CFR, Figure 5 shows a test quadrant diagram that serves as a useful guide. Quadrant Q1 supports the team from a technical perspective and includes unit testing and component testing, which help the team find problems as early as possible. Quadrant Q2 supports the team from a business perspective and focuses on finding functional and business problems. Quadrant Q3 evaluates the product from a business perspective and mainly consists of exploratory testing. Quadrant Q4 evaluates the product from a technical perspective and includes performance testing, stress testing, and security testing. The four quadrants can be grouped under two guidelines: quality delivery (Q1, Q2, Q3) and operation and maintenance (Q1, Q2, Q3, Q4). With the spread of DevOps, the four quadrants are often combined to formulate an effective testing strategy, so that testing and development are both carried through in the project.
■Figure 5
In terms of performance design, consider high concurrency, such as 100 requests per second. Should all 100 requests go straight to the server and database as 100 separate queries? Clearly not; the solution is to merge these requests. Requests can be merged using a timer, and after the query returns, each result is matched to its request and returned through the corresponding API. This is really a variant of batch querying. If there are few requests, merging is unnecessary, so it should be configured according to the situation.
Another method is rate limiting. Here the token bucket algorithm can be used: the bucket has a fixed capacity, tokens are added at a fixed rate, and once the bucket is full no more are added. The leaky bucket algorithm can also be used: no matter how many concurrent requests arrive, the outflow rate keeps the number of requests reaching the backend program constant, which achieves rate limiting. This method is not well suited to an event like the One-Code-Pass incident. At the middleware level, Tomcat can limit load with maxThreads, and NGINX can limit request rate with limit_req_zone and burst. NGINX's limit_conn_zone and limit_conn directives control the total number of concurrent connections.
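The token bucket described above can be sketched in a few lines. This is an illustrative single-process version with assumed parameter names (`capacity`, `refill_rate`), not the implementation used by any middleware: tokens accumulate at a fixed rate up to the bucket's capacity, and each request must take a token or be rejected.

```python
import time
from typing import Callable


class TokenBucket:
    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic):
        self._capacity = capacity        # maximum number of tokens the bucket can hold
        self._refill_rate = refill_rate  # tokens added per second
        self._clock = clock
        self._tokens = capacity          # start with a full bucket
        self._last = clock()

    def try_acquire(self, tokens: float = 1.0) -> bool:
        """Return True and consume tokens if enough are available, else False."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the elapsed time, never exceeding the capacity.
        self._tokens = min(self._capacity, self._tokens + elapsed * self._refill_rate)
        if self._tokens >= tokens:
            self._tokens -= tokens
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(capacity=5, refill_rate=2)  # bursts of 5, 2 requests/s sustained
    print([bucket.try_acquire() for _ in range(7)])
```

Because a full bucket can be spent at once, the token bucket tolerates short bursts, whereas a leaky bucket smooths output to a constant rate.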
4.4 Network Bottleneck
To prevent network congestion, you can try changing the access method from HTTP to a TCP-based protocol; for example, when accessing a Redis cache, use RESP (the Redis serialization protocol). You can also use a higher-grade network card, or use DNS load balancing so that multiple IP addresses correspond to the same domain name.
05 Tradeoff
Making software is making tradeoffs. Concretely, once a request leaves the front end, the client cache, browser cache, CDN cache, and so on can all come into play. The request then reaches the server side, which includes NGINX or a load balancer, and traffic passes on to the application layer and the service layer. If traffic at this stage is heavy, you can apply multithreading performance optimization, use high-performance RPC, or add a cache. After that, requests enter the microservice framework.
Returning to caching, the data access layer may include Redis and similar components. Data that is accessed frequently but changes rarely can be cached here, and I/O can be reduced through request merging or batch queries. In the storage layer, the database is mainly concerned with the full data set. If database pressure is high, you can consider database and table sharding. Depending on the data, separate databases can even be built for different groups of people or different districts. In terms of infrastructure, the system must be able to scale out quickly. If the frequency and flexibility of business change are taken into account, cloud native is indispensable. The final list of options is for reference only.
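As a rough illustration of database and table sharding, the sketch below routes a record to a database and table by hashing its key. Everything here is assumed for the example: the function name, the `health_db_` and `nucleic_result_` naming scheme, and the shard counts. Routing by district instead would replace the hash with a lookup on the district code.

```python
import hashlib
from typing import Tuple


def shard_for(id_number: str, db_count: int = 4, tables_per_db: int = 8) -> Tuple[str, str]:
    """Map a key to a (database, table) pair using a stable hash."""
    # hashlib gives the same result in every process, unlike Python's built-in hash().
    digest = hashlib.sha256(id_number.encode("utf-8")).digest()
    slot = int.from_bytes(digest[:8], "big") % (db_count * tables_per_db)
    db_index, table_index = divmod(slot, tables_per_db)
    return f"health_db_{db_index}", f"nucleic_result_{table_index}"


if __name__ == "__main__":
    # Hypothetical identifiers, used only to show the routing.
    for sample_id in ("id-0001", "id-0002", "id-0003"):
        print(sample_id, shard_for(sample_id))
```

A fixed modulo scheme like this is simple but makes adding shards later expensive, which is one of the trade-offs to weigh before adopting it.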
Q&A session
1. How do you lead and manage the technical team at a start-up?
To lead a team, first determine its direction, that is, the project vision. Once the vision is set, goals follow, and then, based on the actual situation, you confirm the skills people need in order to achieve those goals. Second, the team needs to build its capabilities, because achieving business goals requires the corresponding capabilities. In addition, as the team grows there must be team norms, so that the company's or project's strategy is turned into processes and the processes are supported by tools. For the organization, I think it should keep learning and experimenting, and solve pain points from the customer's point of view.
2. What are the principles for the CDN design of the One-Code-Pass system?
In general, when configuring the CDN, you need to be clear about what can be cached. For example, data that is accessed frequently but changes rarely can be placed in the CDN cache, depending on the nature of the business data.
3. How are requests merged?
Java has Future, which can be applied to requests. For example, if there are 100 requests in 1 second, then with Future they can be divided into 10 batches and processed 10 times per second through a thread pool. Specifically, requests are added to a queue, and the thread pool periodically triggers the merged call. Future makes it possible to match each result to its request after the query returns from the database. Relatively mature solutions to these problems already exist in Java and Spring.
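The answer above refers to Java's Future; the sketch below shows the same idea in Python with `concurrent.futures.Future` and is only an illustration. The class `RequestMerger` and the `batch_query` callback are hypothetical names. Each caller receives a future immediately, and a periodic `flush()` issues one batch query and completes every pending future with its own result.

```python
import threading
from concurrent.futures import Future
from typing import Callable, Dict, List, Tuple


class RequestMerger:
    def __init__(self, batch_query: Callable[[List[str]], Dict[str, str]]):
        self._batch_query = batch_query  # one call answers many keys
        self._pending: List[Tuple[str, Future]] = []
        self._lock = threading.Lock()

    def submit(self, key: str) -> Future:
        """Queue a request and return a future that is completed on the next flush."""
        future: Future = Future()
        with self._lock:
            self._pending.append((key, future))
        return future

    def flush(self) -> int:
        """Run one batch query for all queued requests; return how many were served."""
        with self._lock:
            batch, self._pending = self._pending, []
        if not batch:
            return 0
        keys = sorted({key for key, _ in batch})  # de-duplicate repeated keys
        try:
            results = self._batch_query(keys)
        except Exception as exc:
            for _, future in batch:
                future.set_exception(exc)
            return len(batch)
        for key, future in batch:
            future.set_result(results.get(key))
        return len(batch)
```

In a running service, a scheduled task would call `flush()` at a fixed interval, for example every 100 ms to match the "10 times per second" figure above, while request handlers wait on `future.result()`.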