
It's been a long time since I last wrote anything, so I'm taking advantage of the post-holiday lull after the Spring Festival to warm up with an article. Over the past few years I occasionally wrote pieces interpreting TiDB's product features, and TiDB 5.0 has been out for quite a while now, so it's time I wrote another. I have said on many occasions how much weight I place on 5.0: this version may well be TiDB's MySQL 5.x. Friends familiar with the MySQL ecosystem will know what I mean; MySQL 5.x, especially the important releases from 5.5 to 5.7, laid the foundation for MySQL's rapid expansion and cultivated a huge number of MySQL users and practitioners. I believe TiDB 5.x, especially after 5.2, is showing the same signs of entering the fast lane and beginning to build ecosystem momentum. For me, TiDB is also an excellent sample: before it, very few open source infrastructure software products in China had been built from zero to one. Most engineers and product managers were "users" of such software, spending their time building business systems on top of it, and TiDB let us take part from the "designer's" perspective for the first time. The thinking behind each feature, and how the value of infrastructure software is presented and experienced, turn out to be very different from the user's view. This article collects some of those reflections; it is compiled from a sharing I gave to the presales and PM training at PingCAP before the Spring Festival. It is not necessarily correct and is for reference only.

What we do, and what it means to users

To talk about the product value of infrastructure software, we first have to clear the first hurdle: learning to empathize. Every TiDB release ships with dozens of features and fixes, but most of the time our release notes simply record, faithfully, "what we did":

Screenshot of Release Note of TiDB 4.0 GA

Please don't get me wrong: this kind of record is necessary, but it is not enough. For example, across TiDB 5.0~5.5 we introduced many new features: clustered indexes, the async commit transaction model, an improved SQL optimizer, CTE support, Lock View and continuous performance diagnostic tools, a better hotspot scheduler, lower latency for fetching TSOs, Placement Rules in SQL... These names make perfect sense to TiDB developers, but note that the more important question is: what do they mean to users (customers)?

There are two ways to answer this question; I will discuss them separately:

  1. Imagine a target scenario, then show value by how well the product serves that scenario.
  2. Show value by solving the most painful problems of existing solutions (including your own older versions).

The first line of thinking usually applies to relatively new features, especially things that have never existed before. Take an easy example: if you invent the car while everyone else is driving horse-drawn carriages, selling the car as a solution to the problem of feeding horses is obviously absurd; it is far more reasonable to sell the convenience of the high-speed commute it makes possible. There are two key points to this approach:

  1. The scenario is, at first, imagined by the product manager (after plenty of interviews and fieldwork, of course), so how do you make sure it is "high-value" and "universal"? For successful infrastructure software this matters enormously; catching such a point in the early stage of a project is half the battle won. It demands a great deal of the product manager, usually strong vision and driving force, which is why the CEOs of many product companies also act as the early product managers: in the early stage the CEO needs both. The extreme case is Jobs's reality distortion field, creating the iPod/iPhone out of nothing and changing the whole world; what courage and vision that takes (I believe Jobs could already picture today's world when he conceived the iPhone). There is no formula for this; it basically comes down to the person.
  2. Whether the value of your product is reflected most directly in this scenario. The best kind of directness goes straight to the heart, to "feelings" people can experience first-hand. For developer products, the anchor I usually choose is "user experience", because a good experience speaks for itself: in commuting comfort and efficiency, the car beats the carriage outright, just as TiDB's elastic scalability beats MySQL sharding solutions outright in experience. There are many ways to work on this point; if you are interested, see my earlier article on user experience.

The first idea is essentially Storytelling, and the benefits of this approach are:

Easy to verify. Once you understand the story, the typical user journey emerges naturally, and you can play the imaginary user yourself to verify it. This is actually how I usually check the work of our own product managers 😁.
Easy for users to accept, for a simple reason: everyone likes listening to stories, everyone likes watching demos, and everyone likes to copy someone else's homework.
The second line of thinking usually applies to improvements of existing features. The key question is: how painful is the problem being solved? No software is perfect; heavy users inevitably run into all kinds of problems, and those problems are usually hard for the feature's own developers to appreciate. What to do about it is simple: bend down, understand, use, and feel. I often chat with the front-line colleagues on our customer delivery team, and the run-up to this sharing was no exception. The conversation went roughly like this:

Me: Regarding our SQL optimizer, what do you think is the biggest headache in your daily work?

Colleague: Execution plans changing unexpectedly.

Me: Aren't hints enough for that? And 3.0 introduced SQL Binding; did these help?

Colleague: For some hopeless cases it is very hard to pin down a specific execution plan with hints (they then showed me an example from a real business scenario, a hundred lines of SQL; you really don't know where to start). And the problem with SQL Binding is: once I've bound an execution plan, if a better plan appears later, do I have to start all over again?

Me: Didn't we introduce SQL Plan Management in 4.0? Isn't its automatic evolution feature meant to solve exactly this problem?

Colleague: Yes, but we dare not turn it on in production. For extremely important OLTP scenarios, we cannot tolerate the risk of jitter caused by automatic changes to execution plans.

Me: What could our product do to make your life easier?

Colleague: 1. For complex SQL, let me choose the target execution plan and simply bind it, instead of constructing it through hints; 2. When SPM finds a better execution plan, just notify me in time and let me make the change myself, rather than changing it automatically.

The two requests in that last answer struck me as very enlightening. Both are entirely pertinent, the development cost is small, and they would save DBAs a great deal of time and mental burden.
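For context, today's hint-plus-binding workflow in TiDB looks roughly like the following (a minimal sketch; the table, index, and filter values are made up for illustration):

```sql
-- Force a particular plan for one query shape by binding a hinted
-- version of the statement to the original statement.
CREATE GLOBAL BINDING FOR
    SELECT * FROM orders WHERE customer_id = 100
USING
    SELECT /*+ USE_INDEX(orders, idx_customer_id) */ *
    FROM orders WHERE customer_id = 100;

-- Inspect the bindings currently in effect, and drop one if needed.
SHOW GLOBAL BINDINGS;
DROP GLOBAL BINDING FOR
    SELECT * FROM orders WHERE customer_id = 100;
```

The feedback above essentially asks to flip this flow around: pick a plan from candidates the optimizer has already found and bind it directly, and be notified when a better plan shows up, instead of hand-crafting hints.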

There are many similar examples, but the key point is: find the heavy users of the product and dig into their most troublesome problems; sometimes there are unexpected gains (for example, sit in on oncall and watch how people actually operate). Solving this kind of problem also tends to bring a very tangible sense of payoff. Many of the observability improvements in recent TiDB versions came from exactly this kind of observation.

However, to demonstrate value with this second approach, you must find the right audience. The problems we solve for application developers (database users) are usually different from those we solve for database operators (DBAs); pitch to the wrong audience and you end up talking past each other.

When a user says "I want this", what are they actually saying?

Product managers and solution engineers for infrastructure software are hard to find in China, and I think there are historical reasons for that. As I mentioned above, for a long time we looked at software from the "user's" perspective, which means the path from problem to solution is usually obvious. For example, if I need a high-performance user-profile system with sub-millisecond read/write latency, a modest data volume, and tolerance for data loss, then I just use Redis and I'm done. But a Redis product manager can hardly design Redis features around such a specialized scenario as user profiles.

Good infrastructure software product managers usually pick common, general-purpose capabilities, covering the largest possible space with the smallest possible set of features (this kind of flexibility is to be encouraged, as in UNIX). That in turn places higher demands on an infrastructure vendor's pre-sales and solution engineers: many "features" the business asks for have to be assembled from several "technical points", or a better solution is reached by steering users toward the right problem. Let me illustrate with a few examples:

First example: users often ask us, does TiDB have multi-tenancy?

My answer is never a simple "yes" or "no"; the real question is what problem the user actually wants to solve. What is the subtext? For multi-tenancy, it almost always falls into one of the following situations:

Subtext 1: "Deploying a separate TiDB cluster for every business is too expensive." Value point: cost savings.
Subtext 2: "I really do have many businesses on TiDB. Machine cost is not the issue, but configuration management is too troublesome, I have to upgrade clusters one by one, and monitoring cannot be reused." Value point: lower operational complexity.
Subtext 3: "Some of my workloads are very important and some are not, and they need different treatment: let the unimportant ones share resources, but isolate the important ones." Value point: resource isolation plus unified management.
Subtext 4: "I have regulatory requirements, such as per-tenant encryption and auditing." Value point: compliance.
Once the situation is clear, let me take one of these, cost saving, as an example and expand on it. The next step is to think about what we have on hand.

For TiDB 5.x, there are roughly the following technical points related to the above feature:

Placement Rules in SQL (a flexible mechanism for deciding where data is placed)
TiDB Operator on K8s
XX (a new product of PingCAP, which has not been released yet, please look forward to it, it is roughly a multi-cluster visual control platform)
TiDB Managed Cloud Service
Behind the appeal for cost saving, the usual reason is a large skew between hot and cold data. We have observed that most large clusters follow the 80/20 rule: 20% of the data carries 80% of the traffic. In financial services in particular, data often can never be deleted, which means users keep paying storage costs for cold data. Deploying everything on uniform hardware is simply not cost-effective, so from the user's point of view this is a very reasonable demand.

The next thing to think about: there is nothing new under the sun, so how do users solve this problem today?

For hot/cold separation I have seen many solutions that put cold data in HBase or some other relatively cheap store (for example, sharded MySQL running on spinning disks) while hot data stays in the OLTP database, with data periodically moved into the cold cluster by hand according to a time index (or partition). The application layer then has to know which data lives where and connect to two data sources, and such an architecture usually copes poorly with sudden read/write hotspots on cold data (especially in consumer-facing businesses, where bursts of "grave-digging" traffic show up from time to time).

Then the next question: what difference can our product make by solving this problem? If users still have to move data by hand, or run two TiDB clusters with different configurations, there is no big difference. But if TiDB supports heterogeneous clusters, can automatically pin hot and cold data onto hardware of the appropriate spec, and can automatically promote cold data back to hot, that is the best possible experience for users: a single database means the lowest cost of business change and maintenance. TiDB 5.4 released a new feature called Placement Rules in SQL, which lets users declare data placement strategies in SQL, so a hot/cold placement policy can be expressed naturally. Going further, the more complex placement that multi-tenancy requires, for example putting different tenants' data on different physical machines while still managing them through a single TiDB cluster, can also be achieved with Placement Rules in SQL.
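As a rough illustration (a minimal sketch; the store labels such as disk=ssd/hdd and the table and partition names are my own assumptions for illustration), hot/cold placement with this feature looks something like:

```sql
-- Label-based policies: which stores hot and cold data may live on.
CREATE PLACEMENT POLICY hot_data  CONSTRAINTS="[+disk=ssd]";
CREATE PLACEMENT POLICY cold_data CONSTRAINTS="[+disk=hdd]";

-- For a time-partitioned table, recent partitions stay on SSD stores
-- while historical partitions are pushed down to cheap HDD stores.
ALTER TABLE trade_log PARTITION p2022 PLACEMENT POLICY=hot_data;
ALTER TABLE trade_log PARTITION p2015 PLACEMENT POLICY=cold_data;
```

Moving a partition from one policy to the other is then just another ALTER statement; the cluster rebalances the data itself instead of the application re-importing it.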

Meta Feature: A Solution Architect's Treasure

Having said that, I want to expand on a concept. Some features are different from the rest: they can serve as building blocks for other features and be combined into new capabilities. I call this kind of feature a Meta Feature. The Placement Rules mentioned above are a typical Meta Feature. For example: Placement Rules + Follower Read combine into something close to a traditional one-writer, many-reader setup (but more flexible and more fine-grained, especially suited to ad-hoc data pulls or temporary queries, keeping data fresh without affecting the online business); Placement Rules + a user-defined permission system = multi-tenancy with physical isolation; Placement Rules + local transactions + cross-datacenter deployment = multi-site topologies (WIP); Placement Rules can even place data carefully enough to let TiDB avoid distributed transactions (simulating sharding) and improve OLTP performance.
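The Placement Rules + Follower Read combination, for instance, might look roughly like this (a sketch only; the policy name, the purpose=reporting store label, and the table are invented for illustration):

```sql
-- Pin follower replicas of a busy table onto stores labeled for
-- reporting, away from the leaders serving the online write path.
CREATE PLACEMENT POLICY reporting_replicas
    FOLLOWER_CONSTRAINTS="[+purpose=reporting]";
ALTER TABLE orders PLACEMENT POLICY=reporting_replicas;

-- A reporting session then reads from follower replicas, getting fresh
-- data without adding load to the leaders.
SET @@tidb_replica_read = 'follower';
SELECT COUNT(*) FROM orders WHERE created_at >= CURDATE();
```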

Meta Features are usually not exposed directly to end users: they are so flexible that there is a real learning cost and barrier to entry (unless the UX is designed very carefully). But this kind of capability matters especially to architects, solution providers, and ecosystem partners, because the more Meta Features a system has, the more "playable" it is and the more differentiated solutions can be built on top of it. Here, though, is a mistake we often make: does flexibility equal product value? I don't think so. Engineers (especially geeks) have a natural fondness for this kind of openness, but I am skeptical that end users buy that story; just look at the end-user market share of Windows versus UNIX. On this point I recently heard an excellent analogy that I want to share: you can't tell someone who loves Americano that a latte is better because you can flexibly control the amount of milk, and with the milk dialed down to zero it becomes an Americano.

Let's look at another scenario: batch processing. Friends familiar with TiDB's history know that the project's original motivation was replacing MySQL sharding. Later, many users realized: my data is already in TiDB anyway, why not compute directly on it? Or they hit the ceiling of single-machine compute when doing complex data transformations in SQL, while business requirements still demanded strong consistency or even ACID transactions for those computations. A typical case is a bank's clearing and settlement business. When I was younger I didn't quite understand this: wouldn't it be fine to run this kind of batch workload straight on Hadoop? Only after learning the details did I realize how naive I was. In banks, many traditional clearing and settlement jobs run directly on the core database, and the logic is far from simple: a job with hundreds of lines of SQL is commonplace, the developer who wrote it may be long gone, and nobody dares to casually rewrite it as an MR job. On top of that, the batch results may need to be backfilled into the database, the whole process has to finish within just a few hours, and missing the window is a production incident. When data volumes were smaller, running on Oracle or DB2 on minicomputers was fine, but in recent years, with the rise of mobile payments and e-commerce, data has been growing bigger and faster, and scale-up was bound to become a bottleneck sooner or later. TiDB hits two very high-value points here:

SQL compatibility (especially with CTE support in 5.0 and the temporary tables introduced in 5.3, compatibility and performance for complex SQL improved greatly), together with financial-grade consistent transactions; a small SQL sketch follows this list.
Horizontally scalable compute (especially the TiFlash MPP mode added in 5.0, which unlocks distributed computation on columnar storage); in theory, as long as you add machines, throughput keeps scaling.
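To make the first point concrete, here is a minimal sketch (the table names and the settlement logic are invented for illustration) of the kind of SQL these features enable:

```sql
-- A CTE (supported since TiDB 5.0) keeps a long settlement query
-- readable instead of burying the logic in nested subqueries.
WITH daily AS (
    SELECT account_id, SUM(amount) AS total
    FROM   transactions
    WHERE  trade_date = '2022-01-30'
    GROUP  BY account_id
)
SELECT d.account_id, a.balance + d.total AS new_balance
FROM   daily d
JOIN   accounts a ON a.id = d.account_id;

-- A temporary table (TiDB 5.3) can stage intermediate results of a
-- batch run without leaving a permanent table behind.
CREATE GLOBAL TEMPORARY TABLE stage_daily_totals (
    account_id BIGINT,
    total      DECIMAL(20, 2)
) ON COMMIT DELETE ROWS;
```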
For the bank's batch business, the thorny problem of re-architecting thus turns into the simple problem of buying more machines. But the solutions designed in TiDB's early days had a couple of pain points:

Bulk data import
Distributed Computing
On the first point, a typical batch pipeline on TiDB looks like: download (daily transaction records arrive as files) -> write the records into TiDB in batches -> compute (in SQL) -> backfill the results into TiDB tables. The records usually arrive as a large number of text files (such as CSV). The easiest way to write them is row by row with SQL INSERT, which is fine for small volumes but not cost-effective at scale; after all, most of these imports are offline. TiDB does provide large transactions (up to 10 GB per transaction), but from the user's point of view there are several problems:

Batch writes are usually offline; the core demand in this scenario is simply speed, and going through the full distributed transaction protocol is unnecessary.
Even with a 10 GB ceiling, it is hard for users to slice their data precisely against it.
Writing a large transaction requires a correspondingly large memory buffer, which is easy to overlook.
A better way is physical import: generate the underlying storage engine's data files in a distributed fashion, ship them to the storage nodes, and ingest the physical files directly, which is exactly what TiDB Lightning does. In a recent real user scenario, Lightning on 3 machines finished transcoding and importing about 30 TB of raw data in roughly 72 hours, an import throughput of about 380 GB/h. So in batch scenarios, when Lightning's physical import mode can be used, it is usually the faster and more stable option.

The other pain point is the compute bottleneck (which sounds odd for a distributed database, I know). In the early days, before TiDB supported MPP, there was only one layer of operator pushdown: results computed by the Coprocessors spread across TiKV could only be gathered on a single TiDB node for final aggregation, and if the intermediate results exceeded that node's memory it would OOM. That is why TiDB used to bring in Spark for more complex distributed computation (especially joins between two large tables): for heavy batch workloads you had to add a fleet of Spark machines and supplement TiDB's compute through TiSpark. Since TiDB 5.0, however, TiFlash's MPP mode can distribute the computation and result aggregation across multiple nodes, so compute is no longer the bottleneck. In many batch scenarios, 5.0 can quite possibly retire a whole fleet of Spark machines, which means a simpler technology stack and higher performance.
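For reference, enabling this for a table looks roughly like the following (a sketch; the table name is made up, and the session variable shown is the 5.x-era switch as I understand it):

```sql
-- Replicate the table's data into TiFlash's columnar store.
ALTER TABLE transactions SET TIFLASH REPLICA 1;

-- Allow the optimizer to choose the MPP engine for eligible queries in
-- this session, so large joins and aggregations are distributed.
SET @@session.tidb_allow_mpp = ON;

-- EXPLAIN should show ExchangeSender/ExchangeReceiver operators when
-- an MPP plan is chosen.
EXPLAIN SELECT account_id, SUM(amount) FROM transactions GROUP BY account_id;
```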

Going further, there was another reason for bringing in Spark: in the backfill stage, because of TiDB's transaction size limit and to improve concurrent write throughput, Spark was used to insert the results into TiDB in a distributed way. In theory this step can also be handled by Lightning: TiFlash MPP can output the result data as CSV, and Lightning supports importing CSV.

So, in theory, the original pipeline could become: Lightning imports the data into TiDB -> TiDB computes with TiFlash MPP and outputs the results as CSV -> Lightning loads the CSV results back into TiDB. This is likely to be faster, lighter on resources, and more stable than the TiSpark + large transaction approach.

Expanding on this a little: the optimization above relies on Lightning's capacity for large-scale writes, and in theory any import scenario under heavy write pressure can benefit from the same idea. Here is real feedback from a TiDB user: after moving their business system onto TiDB, they regularly need to import a huge table. They want to first pin the big empty table onto specific idle hosts via Placement Rules and then load it quickly with Lightning, with no need for throttling, which both limits the impact on the rest of the cluster and makes the import fast. Without this scheduling capability, the customer could only keep the cluster stable by rate-limiting the import, which makes it very slow. This example achieves online batch writes through Placement Rules + Lightning, which also echoes the earlier discussion of Meta Features.
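The placement half of that combination might look roughly like this (again a sketch; the usage=idle store label and the table definition are invented for illustration):

```sql
-- Keep all replicas of the soon-to-be-imported table on stores carrying
-- an "idle" label, away from the latency-sensitive online workload.
CREATE PLACEMENT POLICY import_sandbox CONSTRAINTS="[+usage=idle]";

CREATE TABLE bulk_orders (
    id      BIGINT PRIMARY KEY,
    payload VARCHAR(255)
) PLACEMENT POLICY=import_sandbox;

-- ...then run Lightning against bulk_orders without throttling.
```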

The offline sharing also included an example of sharding ("sub-database and sub-table") versus TiDB, but I won't expand on it here for reasons of space; interested readers can work through it along the same lines.

A more implicit, but bigger and longer-term value: observability and troubleshooting

On to the last part. As you may have noticed, I have been working hard to get this message across recently: for an infrastructure software product, an important share of long-term competitiveness and product value comes from observability and troubleshooting capability. There is no perfect software in this world, and for experienced developers the ability to quickly discover and locate problems is a must; for the commercialization of infrastructure software, the efficiency of service support and the degree of self-service are also the basis for scaling. Here I will talk about some of the new things we have done recently, and the challenges ahead.

TiDB Clinic (tiup diag)

Why build this? Troubleshooting used to be a painful process for us. Besides the problem I mentioned in my earlier article on observability, that the veterans' experience lives only in the veterans' heads, I observed that the bulk of the time actually goes into collecting information, especially when TiDB is deployed in the user's own environment and the user is not familiar with system diagnosis. When they ask for our service support, the dialogue frequently goes like this:

Service support: Please run this command xxx and tell me the result.

Customer: (sends the result two hours later)

Service support: Sorry, now please look at the graph of a certain metric on your monitoring dashboard.

Customer: Here's a screenshot.

Service support: Sorry, the time range is wrong... please adjust the Grafana query and send it again.

Customer: !@#$%&*

Service support (the person on duty has rotated after a few days): Please run this command xxx and tell me the result.

Customer: Didn't I already send that?

This kind of asynchronous, inefficient diagnosis is a huge source of pain, and one of the core reasons oncall cannot scale. The users' pain points are:

'Can't you just ask for all the information at once? I don't know what to give you.'
'The information is too big and too complicated; how am I supposed to send it to you?'
'My dashboard is on the intranet; you can't see it, so all I can do is take screenshots.'
'I can't expose business data, but I can submit diagnostic information.'
But in turn, the pain points of TiDB's service support staff are:

'My initial guess was wrong, and I need other metrics to verify the next one.'
'I can't fully reproduce the metrics and system state at the time of the failure; I want to explore Grafana freely.'
'Different support staff need to share context about the same user.'
Hence the Clinic product, which, with the user's consent, provides:

One-click collection, via tiup, of the various kinds of information relevant to system diagnosis
Automatic diagnosis of common problems with a continuously learning rules engine
A multi-tenant (SaaS-like) platform for storing and replaying diagnostic information
If you are familiar with AskTUG (the TiDB user forum), you may have seen links like this: https://clinic.pingcap.net/xxx (for example, this case: https://asktug.com/t/topic/573261/13).

For the user, it takes only a simple command on the cluster to generate a link like the one above and share the key diagnostic information with PingCAP's professional service support staff, who can then view it on our side.

TiDB Clinic is in fact also a new experiment in the maintainability of infrastructure software: a SaaS-based diagnostic capability that connects fault diagnosis and repair suggestions, powered by a rules engine continuously strengthened in the cloud, to clusters deployed and operated locally. This capability will become a new reason for users to choose TiDB, and a strong ecological moat for it.

Profiling in TiDB Dashboard

I have a personal yardstick for judging whether an infrastructure software product is any good: one with a built-in profiler is basically a product made in good conscience, and one that also polishes the UX of profiling is even more so, Golang's pprof being a prime example. This is not hard to do, but it saves lives at critical moments: usually, by the time something has gone wrong there is no longer any way to capture a profile, and if the system then tells you it already saved a profile from the moment of the failure, that is help delivered exactly when it is needed most.


This feature actually grew out of several oncall cases we handled, all of them problems that metrics could not cover. One large class of failures comes down to hardware bottlenecks, which almost always means CPU or disk. Disk bottlenecks are relatively easy: check whether there is heavy IO (Update/Delete/Insert) or RocksDB's own compaction at work. Finding a CPU bottleneck is much murkier, and a profiler is almost the only way:

What do the call stacks on the CPU's critical path look like?
What do the function calls on these critical paths imply?
The second question is usually the key and points to an optimization direction. For example, if we find that SQL parsing/optimization consumes a particularly large share of CPU, it implies that a mechanism such as a plan cache should be used to make better use of the CPU.

TiDB 5.x currently provides two kinds of profiling: manual profiling and automatic continuous profiling. Their use cases differ: manual profiling is usually for targeted performance optimization, while continuous profiling is mainly for looking back after a system problem has occurred.

Challenges

We are almost at the end, so let's talk about challenges. PingCAP was founded in 2015 and is about to turn 7. In these 7 years it has lived through some very important industry shifts:

Database technology is moving from distributed systems to cloud native. Many people may object that the two are not the same kind of concept, since cloud native is itself implemented with distributed systems, but I think cloud native is a fundamental change in how we think about designing systems. I have covered this in several other articles, so I won't go into detail here.
Open source database software companies have found a model for scalable commercialization: Managed Services on the cloud.
The global infrastructure software field is moving from "usable" to "easy to use".
These points represent shifts in thinking along two directions:

Technically, we need to complete the shift from building on operating systems and hardware to building on cloud services, which is a huge technical challenge. For example, if you run on EBS, is your own data replication still necessary? Can Serverless break the constraint of limited compute resources? Layer on top of this an existing system with a large installed base, and it becomes even more complicated. Of course, cloud-native technology does not mean the public cloud only, but how do you design a path that gradually migrates to a new cloud-native architecture? This will be a huge challenge for the R&D and product teams.
The second shift is the bigger challenge, because the business model itself is changing. At traditional open-source database companies, the mainstream model is a people business built on service support; the more advanced version is an insurance-style business like Oracle's. But neither model answers two questions well:
The difference in value between commercial and open source editions
How to scale, since you cannot scale by adding headcount
The SaaS model answers both questions well, and combining infrastructure software with the SaaS model has an even greater amplifying effect, as I have mentioned before. The real challenge is: how does a company organized around traditional software sales plus service support turn itself into an operations-oriented online service company? A few small examples from the R&D side: 1. Release cadence. A traditional software company does well to ship one or two versions a year, while an online service may upgrade every week; do not underestimate this difference in rhythm, because it determines the shape of the entire R&D and quality assurance system. 2. Providing services on the cloud requires supporting operational systems (billing, auditing, fault diagnosis, and so on) and corresponding SRE teams, systems that may simply not exist in a traditional software R&D organization, and the focus on developer experience becomes especially important.

Of course, there are more challenges than these, and there are no standard answers, but I remain full of confidence in the future. After all, these trends essentially accelerate the conversion of technology into social and commercial value and lower the barriers to it; they are good, pragmatic changes, and naturally good for companies like PingCAP. There is a sea of stars ahead, everything depends on what we make of it, and there is plenty to do. I originally meant to write a short piece and did not expect it to run this long, so let's stop here. Happy New Year, everyone.

