1. Background
The Manbang Group mobile team began experimenting with React Native in early 2018. After nearly three years of development, React Native now carries most of our core business scenarios: 16+ business modules, 200+ pages, and an average daily PV in the tens of millions. Once the core business had moved to React Native, we were no longer constrained by the APP release cycle and switched to dynamic releases across the board. Dynamic releases are far more frequent than APP releases: at least two per week, and sometimes as many as five.
When we adopted React Native in 2018, we used version 0.51, which was relatively new at the time. Facebook has since introduced many new features in later versions, such as Hooks and the Hermes engine. As long as we stayed on 0.51, these features were unavailable to us, and so were the many community third-party libraries that depend on newer React Native versions. So, after three years on 0.51, we decided to upgrade to the newer version 0.62.
2. Improvements in version 0.62
We had been using version 0.51 all along. Over the following two years or so of iteration, React Native reached version 0.62, and versions from 0.60 onward perform much better than earlier ones. The main improvements are as follows.
2.1 Performance improvement
Compared with version 0.51, the biggest improvement in version 0.62 is that Hermes can be used as the JS execution engine on Android, which greatly improves startup speed, memory usage, and JS execution efficiency.
2.2 Stability improvement
Between version 0.51 and version 0.62, a large number of functional and stability bugs were fixed. In particular, the Native part of the SDK became much more robust. For example, the safety of show() and hide() in ReactHostView on Android was strengthened, and the ViewManager part now handles exceptions directly when it encounters illegal input.
2.3 Community ecology
The React Native ecosystem can be divided into two parts:
1. React's own language features.
Version 0.51 uses React 16.0 and 0.6x uses 16.11+. Many exciting features were added in between, such as the new Context API (16.3.0) and Hooks (16.8.0), which are undoubtedly powerful development tools.
2. Third-party libraries for React Native and React.
Community libraries typically raise their minimum React version every couple of years. A well-known example is the navigation library React-Navigation, which has added many useful features along the way: for instance, the internal routing stack began to support page activation, moving to the background, and similar lifecycle features. These are very practical in our daily development.
2.4 Android side performance
Starting with 0.6x, the Hermes engine is available on Android and brings a large performance improvement. Compared with JSC, the biggest advantage of Hermes is that it can directly run precompiled JS bytecode. Cold start performance therefore improves greatly and memory usage drops to some extent, at the cost of a larger package size.
To quantify the improvement, we ran a performance comparison between JSC and Hermes on Android. Test device: VIVO X21 with 6 GB of RAM.
2.4.1 Cold start time-consuming data
As can be seen from the above figure, the cold start time of Hermes+HBC is reduced by more than 50% compared with JSC+JS, so we decided to use the Hermes+HBC solution.
2.4.2 Packet size data
As can be seen from the above figure, the HBC binary package compresses noticeably worse than the Jsbundle, and its size is almost twice that of the latter. However, this can be mitigated later by splitting the bundle and by converting to HBC on the device.
2.4.3 Code Instruction Processing Speed
When faced with a large number of operations and heavy parsing, JSC's performance degrades severely, while Hermes stays relatively stable. Hermes takes roughly 1/6 of the time JSC does, and this processing speed greatly improves frame rate and animation smoothness.
2.4.4 Memory usage
From the data calculated above, two conclusions can be drawn:
1. The memory performance of ReactNative 0.62 is significantly better than that of ReactNative 0.51. This is due to the loading mechanism of Hermes, which does not load the entire file into memory at one time for parsing.
2. The memory jitter of ReactNative 0.62 is relatively smooth, which is due to the fact that the product executed by Hermes is binary, not JS code, and does not require secondary transcoding.
The overall test run involved 4 ReactInstanceManagers and 5 pages and saved 56 MB of memory, which is a considerable benefit.
2.5 iOS Performance
After upgrading from 0.51 to 0.62, JSC is still the only JS engine on iOS. Besides the plain Jsbundle, the RAM bundle format is supported, and using RAM bundles together with inline requires can greatly improve cold start speed and memory usage. However, because we plan to split out a base bundle in the future, we do not use the RAM format and keep the JSC+Jsbundle solution on iOS. As a result, iOS sees little improvement in memory, cold start, or instruction execution speed. That said, the recently released React Native 0.64 officially supports Hermes on iOS.
From the performance data point of view, the performance of the Android side has been greatly improved. After the upgrade, the latest features of React such as hooks can also be used to improve development efficiency, so we decided to upgrade to version 0.62.
3. Performing a seamless upgrade
3.1 Challenges and Risks
3.1.1 Multi-departmental cooperation and collaboration
As mentioned earlier, React Native carries most of Manbang's core business scenarios, involving 16+ business modules, 200+ pages, and 50+ developers. Manbang Group's business is growing rapidly, and business work is scheduled in units of days. With many business lines, many people, a fast iteration rhythm, and high stability requirements, the upgrade requires coordinating multiple testing teams, development teams, and release teams.
3.1.2 SDK upgrade and high-frequency release in parallel
In order to meet the fast-paced business iteration, we release a minimum of two dynamic versions per week (up to 5 times a week). We require that technological transformations cannot affect business iterations (including APP version iterations and dynamic version iterations), and any business requirements cannot be postponed due to technological transformations. Therefore, we need the 0.51 release work and the 0.62 upgrade work to be carried out simultaneously, and to not interfere with each other.
3.1.3 Reduce upgrade costs
With such fast-paced, high-frequency releases, the SDK upgrade must not place too much burden on the development and testing of business requirements; its impact on both has to be minimized. As a major upgrade spanning three years of versions, it involves a large number of Release Notes. We need to absorb as many of these differences as possible in the bottom layer, so as to reduce the amount of code developers have to modify, the regression effort for testers, and the overall cost.
3.1.4 Ensure online stability
The two core APPs of Manbang Group have an average daily UV of 5 million, and the requirements for APP experience are very strict: an increase of 1/10,000 in the exception rate leads to a higher customer complaint rate. Guaranteeing stability is therefore the top priority of the upgrade plan. However, no matter how thorough the plan, no one can guarantee that there will be no surprises, so we need to detect online anomalies immediately, limit their impact, and fix them promptly.
Upgrading the React Native SDK under these conditions is like changing the tires of a heavy truck while it is traveling at 120 km/h.
3.2 Principles of the upgrade plan
3.2.1 Low risk
It mainly includes two points:
1. Low business risk: the upgrade must not affect the iteration of business requirements.
2. Low stability risk: the upgrade must not affect online stability, and the exception rate must be kept at a very low level.
3.2.1.1 Publishing scheme design
To meet these two conditions, we decided to release in batches, each with a grayscale rollout.
Batching means dividing online users into several batches and starting the next batch only after the previous one is online. Manbang has four APPs: the Yunmanman driver app, the Huochebang driver app, the Yunmanman shipper (cargo owner) app, and the Huochebang shipper app. After analyzing the business characteristics, we put the two driver apps in the first batch and the two shipper apps in the second.
Grayscale release is now very common in the industry, so we do not explain the concept here. The details of our grayscale scheme are described below.
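For readers who want a concrete picture of batching combined with grayscale, here is a minimal sketch. It assumes each device has a stable ID and that the rollout percentage is configured per app; the app keys, the hash function, and `isInGrayscale` are illustrative names and not the actual Manbang release platform.

```typescript
// Hypothetical sketch: deterministic grayscale bucketing by device ID.
type AppId = "ymm-driver" | "hcb-driver" | "ymm-shipper" | "hcb-shipper";

// Rollout percentage (0-100) per app. Batch 1 = driver apps, batch 2 = shipper apps.
// The numbers are made up for illustration.
const rolloutPercent: Record<AppId, number> = {
  "ymm-driver": 10,
  "hcb-driver": 10,
  "ymm-shipper": 0,
  "hcb-shipper": 0,
};

// Simple stable string hash; returns a bucket in [0, 100).
function bucketOf(deviceId: string): number {
  let hash = 5381;
  for (let i = 0; i < deviceId.length; i++) {
    hash = (hash * 33 + deviceId.charCodeAt(i)) % 1000003;
  }
  return hash % 100;
}

// A device is in grayscale when its bucket falls below the app's percentage,
// so raising the percentage only ever adds devices and never removes any.
function isInGrayscale(app: AppId, deviceId: string): boolean {
  return bucketOf(deviceId) < rolloutPercent[app];
}
```

Because the bucket depends only on the device ID, a device that received 0.62 at 10% keeps receiving it when the percentage is raised.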
3.2.1.2 Design of alarm and fallback scheme
To truly achieve low risk, we also need to nip online problems in the bud, which requires an alarm mechanism. React Native already had an alarm mechanism before the upgrade, so we only needed to split 0.62 out as its own statistical dimension and compute it separately. This is necessary because the grayscale population is small in the early stage; if 0.62 shared the original alarm statistics, its problems would rarely trigger the alarm conditions.
For online problems that cannot be fixed quickly, we also need a downgrade plan that can switch online users from 0.62 back to 0.51 within a short time, and then back to 0.62 once the problem is solved.
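The reason for a separate 0.62 statistical dimension can be illustrated with a small sketch. All names, numbers, and the threshold below are made up for illustration; they are not Manbang's monitoring data or API.

```typescript
// Hypothetical sketch: compute the exception rate per SDK version so that a
// small 0.62 grayscale population is not diluted by the large 0.51 population.
interface VersionStats {
  sdkVersion: "0.51" | "0.62";
  sessions: number;
  exceptions: number;
}

// Exception rate over any group of stats (0 when there are no sessions).
function exceptionRate(stats: VersionStats[]): number {
  const sessions = stats.reduce((sum, s) => sum + s.sessions, 0);
  const exceptions = stats.reduce((sum, s) => sum + s.exceptions, 0);
  return sessions === 0 ? 0 : exceptions / sessions;
}

// Returns the SDK versions whose own exception rate exceeds the threshold.
function versionsOverThreshold(stats: VersionStats[], threshold: number): string[] {
  return stats
    .filter((s) => exceptionRate([s]) > threshold)
    .map((s) => s.sdkVersion);
}

// Made-up numbers: 1% of sessions are on 0.62 during early grayscale.
const sampleStats: VersionStats[] = [
  { sdkVersion: "0.51", sessions: 990000, exceptions: 99 },
  { sdkVersion: "0.62", sessions: 10000, exceptions: 50 },
];
const alarmThreshold = 0.001; // 0.1%, an assumed value

// Combined rate is 149 / 1,000,000, which stays under the threshold.
console.log(exceptionRate(sampleStats) > alarmThreshold);
// Per-version rates flag only 0.62 (50 / 10,000 = 0.5%).
console.log(versionsOverThreshold(sampleStats, alarmThreshold));
```

With the combined statistics the 0.62 problem stays invisible, while the per-version calculation surfaces it immediately.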
3.2.2 Low cost
Low cost here means reducing the impact on business development and testing as much as possible. Reducing the amount and difficulty of code changes lowers the development labor cost; narrowing the scope of impact reduces the scope and intensity of regression testing, which lowers the testing labor cost.
3.2.2.1 One set of code
To reduce risk, we release in multiple batches with grayscale and gradual ramp-up, so the entire rollout lasts a long time. During this period, every business module keeps iterating on new requirements. In other words, both the existing business code and the code for new requirements must be compatible with both versions of the SDK. The simplest way to achieve this is to maintain two sets of code, one per SDK version, but that means writing everything twice, which is a very heavy burden for business developers. To avoid it, we proposed a solution in which one set of code adapts to both versions of the SDK.
3.2.2.2 Development environment switching
Because one set of code adapts to both SDK versions, the code naturally lives on a single branch. When developing business requirements, developers need to run it in both SDK environments, so we provide an environment switching script that switches between React Native environments with a single command. For example, suppose the driver apps have already started ramping up online while the shipper apps have not. For code that must run on both the driver and shipper apps, developers can use the script to switch to the appropriate environment during development, as shown in the figure below.
3.2.2.3 Code modification scan
In order to further reduce the adaptation cost of developers, we have developed a special script tool that can scan all the places that need to be modified and give specific modification methods.
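As a rough illustration of how such a scan can work, here is a minimal rule-based sketch. The team's actual tool is a Python script (see 4.3.1); this TypeScript version, its rule format, and the names `ApiRule`, `ApiFinding`, and `scanSource` are assumptions for illustration. The two rules are taken from examples that appear in this article and are not the real rule set.

```typescript
// Hypothetical sketch of a rule-based API check.
interface ApiRule {
  id: string;
  pattern: RegExp; // no "g" flag, so test() stays stateless
  advice: string;
}

interface ApiFinding {
  ruleId: string;
  line: number; // 1-based line number in the scanned source
  advice: string;
}

const apiRules: ApiRule[] = [
  {
    id: "no-direct-async-storage",
    pattern: /\bAsyncStorage\b/,
    advice: "Use MBBridge.app.storage instead",
  },
  {
    id: "no-stack-navigator",
    pattern: /\bStackNavigator\s*\(/,
    advice: "Use createStackNavigatorCompat from the protocol layer",
  },
];

// Checks every line against every rule and reports where to change what.
function scanSource(source: string, rules: ApiRule[]): ApiFinding[] {
  const findings: ApiFinding[] = [];
  source.split("\n").forEach((text, index) => {
    rules.forEach((rule) => {
      if (rule.pattern.test(text)) {
        findings.push({ ruleId: rule.id, line: index + 1, advice: rule.advice });
      }
    });
  });
  return findings;
}
```

Each finding carries both the location and the suggested replacement, which matches the goal of telling developers not only where to modify but also how.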
With the above solution, we keep the upgrade risk under control (stability is managed through multi-batch grayscale releases) and minimize the adaptation cost for developers (through one set of code that adapts to both SDK versions and a script that scans for required modifications).
4. Preliminary preparation
4.1 Sorting out API changes
Before upgrading, you need to sort out the API differences between the two versions of the SDK, and have a comprehensive understanding of all the modifications from 0.51 to 0.62. There are two types of API changes:
- breaking change
- non-breaking change
Our approach was brute force: read the Release Notes of every version from 0.51 to 0.62, list all breaking changes, and draw up a specific adaptation plan for each one. Take AsyncStorage as an example: its usage in version 0.51 is xxx and its usage in version 0.62 is yyy, so code written for 0.51 and code written for 0.62 are not compatible with each other. Our adaptation scheme is to use our own encapsulated Bridge, MBBridge.app.storage, everywhere.
// Not recommended: depending on an AsyncStorage package directly
// npm install --save @react-native-community/asyncstorage
// import AsyncStorage from '@react-native-async-storage/async-storage';

// Recommended: use the Bridge form instead
// Get the VALUE for a KEY
MBBridge.app.storage.getItem({ key: BootPageModalKey.KEY_IS_SHOW_BOOTPAGEMODAL }).then(res => {
  if (this.isGuidanceSwitch(res?.data?.text)) {
    return null
  }
})
// Store a <KEY, VALUE> pair
MBBridge.app.storage.setItem({ key: Constant.StorageKey.Common.ReferName, text: commonStore.referName })
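To show why routing all storage access through a bridge helps, here is a minimal sketch of the idea. `KeyValueBridge`, `createMemoryBridge`, and `readText` are hypothetical names; the `{ data: { text } }` result shape is inferred from the snippet above and is an assumption about the real MBBridge API.

```typescript
// Hypothetical sketch of a storage adapter; names are illustrative only.
interface StorageResult {
  data?: { text?: string };
}

interface KeyValueBridge {
  getItem(args: { key: string }): Promise<StorageResult>;
  setItem(args: { key: string; text: string }): Promise<void>;
}

// In-memory stand-in for the native bridge, useful for unit tests.
function createMemoryBridge(): KeyValueBridge {
  const store = new Map<string, string>();
  return {
    getItem: ({ key }) => {
      const text = store.get(key);
      return Promise.resolve(text === undefined ? {} : { data: { text } });
    },
    setItem: ({ key, text }) => {
      store.set(key, text);
      return Promise.resolve();
    },
  };
}

// Business code depends only on the interface, never on a specific
// AsyncStorage package, so it runs unchanged on both SDK versions.
function readText(bridge: KeyValueBridge, key: string): Promise<string | undefined> {
  return bridge.getItem({ key }).then((res) => res.data?.text);
}
```

Swapping the underlying storage implementation then only touches the bridge, while the calling code stays the same on 0.51 and 0.62.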
4.2 Code adaptation scheme
The company's business iterates three or more times a week, mainly on the React Native technology stack. Keeping two sets of code (0.51 and 0.62) in sync at that pace would be far too costly. We therefore chose to have one set of code adapt to both 0.51 and 0.62: for every incompatible API, we encapsulate an adaptation layer that shields the underlying differences. As shown below:
For example, the navigation library is adapted as follows.
Before modification:
import React, { Component } from "react"
import { StackNavigator } from "native-navigation"
const RootStack = StackNavigator(...)
export default class xxxx extends Component<any, any> {
  render() {
    return (
      <RootStack screenProps={this.props} />
    )
  }
}
After modification (ReactNative-lib-protocal is our protocol layer):
import React, { Component } from "react"
import { createStackNavigatorCompat, createAppContainerCompat } from "@ymm/ReactNative-lib-protocal"
const RootStack = createStackNavigatorCompat(...)
// Create the app container once, outside render(), so it is not rebuilt on every render
const App = createAppContainerCompat(RootStack)
export default class StickerPageRouter extends Component<any, any> {
  render() {
    return (
      <App screenProps={this.props} />
    )
  }
}
The protocol implementation layer code looks like this:
import { NavigationActions } from 'react-navigation';
// Thin wrapper so that business code never calls the navigation library directly
export default class StackActionsCompat {
  static reset(resetAction: any) {
    return NavigationActions.reset(resetAction)
  }
  static push(pushAction: any) {
    return NavigationActions.push(pushAction)
  }
  static pop(popAction: any) {
    return NavigationActions.pop(popAction)
  }
  static popToTop() {
    return NavigationActions.popToTop()
  }
}
In this way, business developers write one set of code that runs on both React Native versions, which saves the cost of maintaining two.
4.3 Script Tools
There are three tools:
1. API inspection tool (supports both local runs and CI/CD);
2. Project environment switching tool;
3. Runtime environment check tool.
4.3.1 API Inspection Tool
The API inspection tool finds APIs that run in the 0.51 environment but are no longer compatible with 0.62. To do this, we abstracted API detection rules from the many changes between the two versions. The tool is written in Python. Developers can run the inspection locally (by running the Python script directly or through an npm command) or enable it during Jenkins packaging. The inspection output looks like this:
4.3.2 Environment switching tool
The environment switching tool lets developers easily switch the protocol implementation layer and the configuration files (package.json, metro.config.js, etc.) between 0.51 and 0.62. It can be implemented in Shell or Python.
With this tool, business developers can work on a single branch without worrying about the API and configuration differences between 0.51 and 0.62.
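The core of such a switch can be sketched as a small planning function. The article says the real tool is written in Shell or Python; the TypeScript below, the `env/<version>/` directory layout, and the names `planSwitch` and `CopyStep` are assumptions made for illustration.

```typescript
// Hypothetical sketch: compute which files an environment switch would copy.
// The env/<version>/ directory layout is an assumption for illustration.
type RnVersion = "0.51" | "0.62";

interface CopyStep {
  from: string;
  to: string;
}

// Files that differ between the two environments (taken from the text above).
const switchedFiles = ["package.json", "metro.config.js"];

function planSwitch(target: RnVersion): CopyStep[] {
  return switchedFiles.map((file) => ({
    from: "env/" + target + "/" + file,
    to: file,
  }));
}

console.log(planSwitch("0.62"));
```

A real script would execute these copies and then reinstall dependencies; in this sketch the protocol implementation layer is assumed to be selected through the dependencies declared in the copied package.json.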
4.3.3 Runtime environment check tool
The runtime environment check tool detects mismatches between the React Native SDK environment and the Bundle product in the test environment, for example a 0.51 native SDK loading a 0.62 Bundle/HBC, or a 0.62 native SDK loading a 0.51 Bundle package. Catching these early avoids unnecessary hassle and communication costs.
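A minimal sketch of such a check follows. It assumes each bundle carries metadata recording the SDK version it was built for and its format; the `BundleInfo` shape and `checkBundle` are hypothetical and do not describe the real tool.

```typescript
// Hypothetical sketch: detect an SDK/bundle mismatch before loading a bundle.
type SdkVersion = "0.51" | "0.62";
type BundleFormat = "jsbundle" | "hbc";

interface BundleInfo {
  builtFor: SdkVersion; // SDK version the bundle was built against
  format: BundleFormat;
}

// Returns null when the pair is compatible, otherwise a readable message.
function checkBundle(sdk: SdkVersion, bundle: BundleInfo): string | null {
  if (bundle.builtFor !== sdk) {
    return "SDK " + sdk + " cannot load a bundle built for " + bundle.builtFor;
  }
  if (sdk === "0.51" && bundle.format === "hbc") {
    return "SDK 0.51 has no Hermes engine and cannot run HBC";
  }
  return null;
}
```

Showing the returned message on the test build gives testers an immediate explanation instead of an obscure crash.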
5. Rollout plan
The following figure is a simplified diagram of our upgrade plan. The whole process is divided into four main lines based on roles: developers, testers, APP version, and dynamic version. The timeline corresponding to each main line has detailed actions at key time points.
For example, business developers (the first line) need to merge the adapted business code into the dynamic-1231 main release branch on 2020-12-18. From then on, 0.51 and 0.62 share one set of code until the entire upgrade is finished.
5.1 Upgrade in batches
As mentioned above, we upgrade in batches: the driver apps go first, and the shipper apps follow in the second batch.
On Android, the React Native environment has been packaged as a plug-in for this upgrade. To control risk as much as possible, the first batch of Android driver apps was launched through dynamic plug-in release: the 0.62 SDK and the HBC products are delivered to devices together through dynamic upgrade. Dynamic release lets us control the grayscale rhythm very flexibly, so we can stretch the grayscale period as long as stability requires. In addition, our dynamic upgrade platform supports real-time online rollback.
For stability reasons, then, we launch on Android through dynamic upgrade. But ensuring stability must not delay business requirements going online. While the 0.62 SDK and HBC products are in grayscale, releases based on the 0.51 SDK and jsbundle products continue as well. In other words, the 0.51 and 0.62 environments have to coexist online for an extended period.
An important point throughout the grayscale process is keeping the online environments functionally in sync: the 0.51 and 0.62 products are released at their own rhythms and must not interfere with each other, yet both must contain all business requirements.
For example, 0.51 ships a version every 2 days, while the 0.62 grayscale period is 10 days. Users must get the latest functions regardless of whether they are on 0.62 or 0.51. Our strategy is as follows:
As shown in the figure above, the 0.51 and 0.62 releases run as two parallel lines. The 0.62 version numbers are designed to be larger than the 0.51 version numbers, which guarantees that a 0.62 product is never overwritten by a 0.51 product, and every release of a 0.51 business package is accompanied by a 0.62 business package. This guarantees two things:
1. The functions used by online users are always up to date;
2. The 0.62 product is always in grayscale at its own rhythm and will not be covered by the 0.51 product.
Once the 0.62 grayscale reaches 100%, 0.51 business packages are no longer released, and the online switch to the new version is complete.
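The guarantee that a 0.62 product is never overwritten by a 0.51 product follows from plain numeric version comparison. The sketch below assumes the update client only installs a package whose version is strictly higher than the installed one; `compareVersions` and `shouldInstall` are hypothetical names, and the version numbers are made up in the 5.91.xxx / 5.91.1xxx shape used in this article.

```typescript
// Hypothetical sketch: numeric, segment-by-segment version comparison.
function compareVersions(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    // Missing segments count as 0, so "5.91" equals "5.91.0".
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff > 0 ? 1 : -1;
  }
  return 0;
}

// A device only replaces its installed package with a strictly newer one.
function shouldInstall(installed: string, candidate: string): boolean {
  return compareVersions(candidate, installed) > 0;
}

// A later 0.51 package does not replace an installed 0.62 package.
console.log(shouldInstall("5.91.1003.1", "5.91.4.20")); // false
// A 0.62 package does replace an installed 0.51 package.
console.log(shouldInstall("5.91.3.20", "5.91.1003.1")); // true
```

Because the third segment of every 0.62 version is at least 1000, any 0.51 version compares lower no matter how many 0.51 releases ship during the grayscale period.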
5.2 CI/CD
The business bundles of 0.51 and 0.62 need to coexist online for a long time, and the environments and products of the two versions are incompatible. Therefore, in addition to the environment checks in the testing phase, we insert a series of verification steps into the CI/CD phase:
1. Environment switching;
2. Integrate the Python scripts that check API compatibility into the build process;
3. Generate version number rules according to product type:
- 0.51 version number rule: 5.91.xxx.yy
- 0.62 version number rule: 5.91.1xxx.yyyy
4. Generate an additional map file for Android's HBC product and upload it to FTP.
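Step 3 can be sketched as follows. The article only gives the two patterns (5.91.xxx.yy and 5.91.1xxx.yyyy); reading xxx as a zero-padded release sequence and yy/yyyy as a zero-padded build number is an interpretation made here for illustration, and `buildVersion` is a hypothetical name.

```typescript
// Hypothetical sketch: generate a version number according to product type.
type TargetSdk = "0.51" | "0.62";

// Left-pads a non-negative integer with zeros to the given width.
function zeroPad(value: number, width: number): string {
  let text = String(value);
  while (text.length < width) text = "0" + text;
  return text;
}

function buildVersion(sdk: TargetSdk, release: number, build: number): string {
  const base = "5.91";
  return sdk === "0.51"
    ? base + "." + zeroPad(release, 3) + "." + zeroPad(build, 2)
    : base + ".1" + zeroPad(release, 3) + "." + zeroPad(build, 4);
}

console.log(buildVersion("0.51", 12, 3)); // 5.91.012.03
console.log(buildVersion("0.62", 12, 3)); // 5.91.1012.0003
```

The leading 1 in the third segment is what keeps every 0.62 version numerically above every 0.51 version.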
5.3 Data preparation
This mainly means adding tracking points that distinguish 0.62 data from 0.51 data. We want the data generated by the online 0.62 version to accurately reflect the real state of the upgrade (adoption ratio, stability). We also configured a separate alarm policy for the 0.62 data.
6. Online verification
After launch, our job is to follow the online data closely, check it against the earlier laboratory data, watch the monitoring data, and adjust the plan promptly.
6.1 Daily report output
During the grayscale period of the 0.62 upgrade package, a report is produced every day, covering each module's DAU, PV, number of users with JS exceptions, JS exception rate, number of users with SDK exceptions, and SDK exception rate. Developers and testers can use these reports to understand what is happening online overall. We also prepared a contingency plan in advance: grayscale stops when the exception rate reaches a certain threshold.
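One row of such a report, together with the stop condition, could be computed as in the sketch below. Defining each exception rate as affected users divided by DAU is an assumption, as are the threshold and the names `ModuleCounters`, `DailyReportRow`, `toReportRow`, and `shouldStopGrayscale`.

```typescript
// Hypothetical sketch of one daily-report row and the stop-grayscale check.
interface ModuleCounters {
  module: string;
  dau: number;               // daily active users
  pv: number;                // page views
  jsExceptionUsers: number;  // users who hit a JS exception
  sdkExceptionUsers: number; // users who hit an SDK exception
}

interface DailyReportRow extends ModuleCounters {
  jsExceptionRate: number;
  sdkExceptionRate: number;
}

// Assumption: rate = affected users / DAU (0 when the module had no users).
function toReportRow(c: ModuleCounters): DailyReportRow {
  const rate = (users: number): number => (c.dau === 0 ? 0 : users / c.dau);
  return {
    ...c,
    jsExceptionRate: rate(c.jsExceptionUsers),
    sdkExceptionRate: rate(c.sdkExceptionUsers),
  };
}

// Grayscale stops as soon as any module exceeds the threshold on either rate.
function shouldStopGrayscale(rows: DailyReportRow[], threshold: number): boolean {
  return rows.some(
    (r) => r.jsExceptionRate > threshold || r.sdkExceptionRate > threshold
  );
}
```

Checking per module rather than on the overall total keeps a problem in a small module from being hidden by healthy large modules.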
6.2 Performance data output
Taking the performance data on the Android side as an example, the final performance data we collected online is as follows, which is basically consistent with the data we measured offline:
1. Package size
Android adopts the Hermes+HBC solution. The packaged output changes from a text-format .jsbundle to a binary .hbc package, and the package size increases by more than 45%. This is an optimization that trades space for time (JIT becomes AOT).
2. Cold start
With the Hermes+HBC solution, instruction execution is much faster. Cold start time drops by about 64%, that is, to roughly 36% of the original, which makes startup nearly three times as fast (1 / 0.36 ≈ 2.8). This is basically in line with our earlier tests and expectations.
3. Hot start
We built an engine reuse mechanism: once created, an engine stays resident in memory, so the second startup is a hot start. Unlike a cold start, a hot start does not involve time-consuming operations such as loading the JS code and running its initialization. The hot start time therefore improves little, which is basically in line with our earlier tests and expectations.
4. Memory usage
The Hermes engine executes HBC directly and skips parsing the JS source, so memory usage drops by more than 30% when running a single page after a cold start.
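The engine reuse mechanism mentioned under hot start can be pictured with a small cache sketch. The real engines live on the native side; keying the cache by bundle name and the names `JsEngine` and `EngineCache` are assumptions made for illustration.

```typescript
// Hypothetical sketch of an engine reuse cache.
interface JsEngine {
  id: number;
  loadedBundle: string;
}

class EngineCache {
  private engines = new Map<string, JsEngine>();
  private created = 0;

  // Returns the resident engine for a bundle, creating it on first use only.
  acquire(bundleName: string): { engine: JsEngine; coldStart: boolean } {
    const existing = this.engines.get(bundleName);
    if (existing !== undefined) {
      // Hot start: the bundle is already loaded and initialized.
      return { engine: existing, coldStart: false };
    }
    // Cold start: this is where bundle loading and initialization would happen.
    this.created += 1;
    const engine: JsEngine = { id: this.created, loadedBundle: bundleName };
    this.engines.set(bundleName, engine);
    return { engine, coldStart: true };
  }
}
```

Since only the first `acquire` for a bundle pays the loading cost, a faster engine mainly benefits cold starts, which is consistent with the small hot start improvement reported above.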
6.3 Subsequent batches
The first batch of upgrade work has basically come to an end, and it produced many best practices along the way: the launch plan, the fault tolerance plan, the test plan, performance analysis, and so on. The second batch, on the shipper side, only needs minor adjustments on this basis, so its online risk is lower and its overall planning much smoother.
The first batch on the driver apps provided a basic verification of stability, and we could confirm that the risk was generally controllable. The second batch, the shipper apps, therefore went online directly with the regular APP release.
iOS has no plug-in mechanism, so both batches were launched with the regular APP release, using the App Store's default 7-day grayscale.