A database upgrade is inherently a risky operation. How does TiDB make the migration safe, smooth, and imperceptible to the business? Drawing on the K8s cluster upgrade of a user at the scale of more than 100 million, this article introduces TiDB's upgrade toolkit, which ranges from simple parameter comparison to full-scenario simulated replay, so that you can choose the plan that matches your actual needs and cost considerations.
With the TiDB upgrade toolkit, you no longer have to agonize over the decision before an upgrade or feel anxious while it is in progress. Read on for a happier upgrade!
The bitterness and joy of upgrading
As a long-time TiDB user, you have probably run into situations where the database needs to be upgraded. On the one hand, TiDB is a fast-iterating product, and the features of a new version may solve real business problems. On the other hand, the current version may have hit a security vulnerability or bug, so a higher-quality, more stable version is needed. In either case, the upgrade is on the agenda.
But as a database veteran, you also know very well that the upgrade itself is a risky operation. For example, the new version may introduce configuration parameters that you need to understand and adapt, and a poor configuration can cause problems. The new version may fix security vulnerabilities and tighten permissions, so some access patterns supported by the old version have to be rewritten. Some SQL execution plans that were previously stabilized by various means may become uncertain again in the new version. It is like going to a restaurant you know very well and suddenly finding a different chef: the new chef may be a Michelin chef, but whether the dish is to your liking varies from person to person. In short, we all want the same thing: an upgrade or migration that is safe, smooth, and even unnoticeable, so that when the business side comes to you one day, it is not to complain about a sudden problem but to praise the system for having become much easier to use recently. This is the ultimate purpose and value of an upgrade.
TiDB's treasure chest
So how do you ensure that your upgrade is safe and smooth? The TiDB technical team provides a complete upgrade toolkit, ranging from simple parameter comparison to full-scenario simulated replay, so that you can choose the best-matching solution for your actual needs and cost considerations and safeguard your upgrade. The toolkit has already been applied successfully to a K8s cluster upgrade for a user at the scale of more than 100 million, and we introduce that case later in this article. First, let's take a look at what these tools are:
- TiDBA helps you quickly identify parameter changes in the new version by comparing the parameters of the versions before and after the upgrade.
- pt-upgrade is a main tool of the Percona consulting team. It has been used in the upgrades of many commercial customers running MySQL, MariaDB, Aurora, and similar databases, and has proven available and reliable through extensive practice. The tool replays the slow query log against both the source cluster (old version) and the target cluster (new version) to test SQL compatibility.
- Plan Change Capturer (PCC) detects execution plan changes between different versions of TiDB, which helps you identify SQL whose performance degrades and spot the risks introduced by execution plan changes before upgrading.
- Workload-sim helps you fully evaluate the effect of an upgrade by replaying a real workload on a test system (database replay).
To put it simply, TiDBA is a parameter comparison tool that points out added or modified parameters; pt-upgrade mainly addresses compatibility and correctness problems; PCC targets execution plans to catch potential performance regressions; and Workload-sim replays a copy of the full business load, which can both uncover potential upgrade problems in advance and evaluate the effect of the upgrade. In this order, the resources they consume go from low to high, and the verification they provide goes from coarse to fine-grained. You can choose and combine them freely according to your needs.
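To make the idea of parameter comparison concrete, here is a minimal Python sketch. It is not TiDBA and does not reflect how TiDBA is implemented. It only assumes that the configuration of each version can be exported as a nested dictionary, and the parameter names in the usage example are made up for illustration.

```python
from typing import Any, Dict


def flatten(config: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested sections into dotted keys: {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, dotted))
        else:
            flat[dotted] = value
    return flat


def diff_params(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Report parameters that were added, removed, or changed between two versions."""
    old_flat, new_flat = flatten(old), flatten(new)
    added = {k: new_flat[k] for k in new_flat.keys() - old_flat.keys()}
    removed = {k: old_flat[k] for k in old_flat.keys() - new_flat.keys()}
    changed = {
        k: (old_flat[k], new_flat[k])
        for k in old_flat.keys() & new_flat.keys()
        if old_flat[k] != new_flat[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical parameter names and values, for illustration only.
old_version = {"example": {"cache-size": 64, "legacy-mode": True}}
new_version = {"example": {"cache-size": 128, "new-feature": "on"}}
report = diff_params(old_version, new_version)
```

In this sketch, `report["changed"]` holds the old and new value of every parameter whose value differs, while `report["added"]` and `report["removed"]` list the parameters that exist in only one of the two versions. These are the entries worth reviewing before an upgrade.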
The user's upgrade journey
Next, let's look at a real-world case. The TiDB user in this case is a high-quality Q&A community on the Chinese Internet and an original content platform where creators gather; it is the largest Chinese-language Q&A community. The customer chose to upgrade because the old cluster was on an early version, and they were worried about running into known, already-fixed problems that could cause operation and maintenance incidents. In addition, the customer wanted to bring all services onto the same version to simplify operation and maintenance management. The cluster to be upgraded is one of the customer's most important commercial clusters and carries its commercial and radio platform business, so the customer attached great importance to the safety of this upgrade. We therefore recommended TiDBA + PCC + Workload-sim, and in the end the customer used TiDBA and Workload-sim to test and verify the upgrade.
Upgrade environment
First let's look at the size of the cluster.
Production environment cluster information
Test environment cluster information
Note: To accurately assess the risk of the upgrade, it is recommended to create a test cluster with specifications similar to those of the production environment.
Upgrade
Then we look at the specific upgrade testing process.
1. Use BR to back up the full data from the production cluster.
2. Use BR to restore the full data to the test cluster.
3. While step 2 is running, use the traffic replication tool workload-sim to capture traffic on one TiDB node in the v4.0.9 production environment (first confirm that business traffic is balanced across all TiDB nodes).
4. After step 2 is complete and the capture in step 3 has finished, use the traffic replay tool to replay the traffic in the test environment and collect information.
5. Upgrade the test cluster from v4.0.9 to the target version v4.0.14 and clear the data.
6. Restore the full data backed up by BR (it is recommended to rebuild the cluster completely if possible, to avoid the impact of empty regions).
7. Use the traffic replay tool to replay the traffic in the v4.0.14 test environment and collect information.
8. Compare the traffic replay information of v4.0.9 and v4.0.14.
9. Use TiDBA to compare the parameter changes between the v4.0.9 production environment and the v4.0.14 test environment.
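The step that compares the replay information of v4.0.9 and v4.0.14 can be illustrated with a small Python sketch. It does not use the output format of workload-sim, which is not described here. It assumes, purely for illustration, that each replay yields a list of latency samples (in milliseconds) per statement, and it flags statements whose tail latency grew by more than a chosen ratio. The statement labels, sample values, and the 1.2 threshold are all hypothetical.

```python
import math
from typing import Dict, List, Tuple


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def find_regressions(
    before: Dict[str, List[float]],
    after: Dict[str, List[float]],
    pct: float = 99,
    threshold: float = 1.2,
) -> Dict[str, Tuple[float, float]]:
    """Return statements whose pct-th percentile latency grew by more than `threshold` times."""
    regressions: Dict[str, Tuple[float, float]] = {}
    # Only statements observed in both replays can be compared.
    for stmt in before.keys() & after.keys():
        old_latency = percentile(before[stmt], pct)
        new_latency = percentile(after[stmt], pct)
        if old_latency > 0 and new_latency / old_latency > threshold:
            regressions[stmt] = (old_latency, new_latency)
    return regressions


# Hypothetical latency samples (ms) keyed by a statement label.
replay_old = {"q1": [10, 12, 11], "q2": [5, 5, 6]}
replay_new = {"q1": [10, 11, 12], "q2": [9, 10, 8]}
regressed = find_regressions(replay_old, replay_new)
```

With these made-up samples, only `q2` is flagged, because its tail latency rises from 6 ms to 10 ms while `q1` stays at 12 ms. An empty result is what a successful minor version upgrade is expected to produce.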
Upgrade comparison
Next, let's take a look at the comparison of traffic playback effects.
before
after
The figures clearly show that the business behaved essentially the same before and after the upgrade. Because this is a minor version upgrade, the result is in line with the test expectations.
Postscript
Three days after finishing the comparison with the upgrade test tools, the customer upgraded the production environment during a business trough, and the actual upgrade produced the same result. The whole process was quite stable, and the business barely noticed it. At this point you may say: don't the results look the same either way? Yes, a minor version upgrade performed with the correct procedure has a high probability of succeeding. The difference is that here the upgrade was verified and compared with the upgrade tools first, and only then carried out. You know in advance that the upgrade is safe and predictable, so you do not have to agonize over the decision beforehand or worry while it is in progress. If you are still troubled by the pain and pleasure of upgrading, you may wish to try our upgrade tools and let them help you upgrade happily!