[Header image]
Image from: https://unsplash.com/photos/_2mLg07Yn6s
Author of this article: ZZG

Foreword

As a key part of the app experience, startup speed is a focus for every technical team. A difference of a few hundred milliseconds in startup time affects the user experience and shows up directly in retention. As a social application serving young and middle-aged users, Xinyu must deliver a good startup experience on phones of every performance level. With the user base growing rapidly, startup optimization was therefore put on the agenda as a dedicated performance project.

Startup optimization, as the name suggests, means optimizing the duration from the moment the user taps the icon until the home page is fully visible. To measure startup time more precisely, we split it into two parts: the startup phase, from tapping the icon to the first frame of the home page, and the first-refresh phase, from the first frame of the home page to the home page being fully visible.

After 5 months of optimization practice, Xinyu's average online startup time has been reduced from more than 8 seconds to about 4 seconds, a reduction of more than 50%. Startup optimization, an important part of the performance project, has successfully met its baseline goals.

Optimization practice

This article introduces the work the Xinyu team did on startup optimization, along with some insights gained along the way.

An application has three startup states: cold start, warm start, and hot start. This article focuses on the most time-consuming one, the cold start. First, we need to understand which steps a cold start goes through:

[Figure 1]

At the beginning of a cold start, the system process performs a series of operations and finally creates the application process; the application process then runs tasks such as main-thread startup and page creation. Many steps are involved, but given our goal of shortening the main link from launch to home-page display, we can focus on three stages: Application creation, main-thread tasks, and Activity rendering. The subsequent optimizations target the time-consuming points of these three stages.

To better explain each optimization measure and its benefit, we use the OPPO A5 as the example device.

This is the per-stage time consumption of the app on the OPPO A5 before optimization:

[Figure: startup timeline before optimization]

From tapping the icon to a fully interactive home page took 19 seconds. The OPPO A5 is a low-end phone, which certainly contributes to the long startup, but unreasonable logic and code in the startup path were the main cause of the lengthy process. After a series of optimizations, the per-stage times look like this:

[Figure: startup timeline after optimization]

The total startup time was shortened to 9 seconds, a gain of about 10 seconds. Next we explain, stage by stage, how that 10-second gain was achieved.

Application optimization

The Application stage is usually used to initialize core business libraries. Early in development we did not control the startup tasks in this stage, so a large amount of business-specific code accumulated there; before this optimization project, more than 90 tasks ran in the Application. We simplified the whole task flow based on the following principles:

  • The tasks in the Application should be the global basic tasks
  • Application creation should minimize network request operations
  • No strong business-related tasks are allowed when the Application is created
  • Minimize the work of Json parsing and IO operations when the Application is created

After optimization, the startup tasks in the Application were reduced to around 60, falling into three categories: basic-library initialization, feature configuration, and global configuration. Basic-library tasks initialize and configure libraries such as the network and logging libraries. Processes other than the main process also depend on them, so removing them would hurt global stability; they are also the largest and most time-consuming startup tasks, so reducing their cost remains a focus of future optimization. Feature configuration covers pre-configuration of globally relevant business features, such as preloading business caches and specific business preloads; removing them would damage the business, so here we must balance business requirements against startup cost. Global configuration covers global UI configuration and file-path handling; these tasks are few, cheap, and prerequisites of home-page creation, so we left them alone for now.

The core of task orchestration is handling the upstream and downstream dependencies between tasks, which requires a deep understanding of the business logic. Since every application is different, we will not expand on that here. Instead, here are some details of how Xinyu orchestrated and optimized its tasks:

  • Process-aware task scheduling. Xinyu starts multiple processes at runtime for specific jobs and for module isolation. Many of them, such as the IM process, usually only need a few core initializations such as the crash SDK and the network SDK. Running the full main-process task flow in such processes wastes resources, so we divided the Application's tasks in detail, down to the process level, ensuring non-main processes never execute tasks they do not need.
  • Lazy loading. Here we reworked some basic tasks to separate initialization from startup, moved the startup work out of Application creation, and stripped redundant logic. When creating an object, the creation of its member objects can be deferred; Kotlin keywords such as `by lazy` can be used flexibly to keep objects lightweight.
  • Process convergence. Multiple processes give module isolation and prevent a single process from hitting its memory ceiling. But a high memory footprint at startup can trigger the phone's memory reclamation, which consumes a lot of CPU and is felt by users as slow startup and jank. Weighing the pros and cons, our current strategy is to delay starting non-main processes as long as possible and to reduce the process count through process merging; with these strategies we cut the number of processes at startup to two. Combined with the task-orchestration work, we screened the minimal task set for each process so that none performs unnecessary work, which greatly reduced the memory consumed at startup.
  • Thread convergence. On a multi-core CPU the right number of threads improves efficiency, but a flood of threads overloads the CPU. Multithreaded concurrency is essentially threads taking turns on the CPU; under heavy load, too many threads competing for time slices not only slows startup but also freezes the main thread, hurting the user experience. For this optimization, make sure a single unified thread pool is used globally. Second- and third-party SDKs are also major creators of extra threads, so we worked with the relevant teams to remove unreasonable thread creation. Avoiding network requests during the startup phase is another key way to keep the thread count down.
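The lazy-loading idea above, separating initialization from startup, can be sketched in a few lines. `ReportSdk` and `ReportFacade` are illustrative names, not Xinyu's real classes; the `init` block stands in for expensive work (IO, Json parsing, etc.):

```kotlin
// Illustrative sketch: the heavy SDK is built on first use instead of
// during Application.onCreate(). Names here are hypothetical.
class ReportSdk {
    companion object { var constructions = 0 }
    init { constructions++ }  // stands in for expensive initialization
    fun report(event: String): String = "reported:$event"
}

class ReportFacade {
    // `by lazy` is thread-safe by default: construction is deferred until
    // the first call to report(), so Application creation stays cheap
    private val sdk by lazy { ReportSdk() }
    fun report(event: String): String = sdk.report(event)
}
```

Creating the facade during Application startup costs almost nothing; the SDK itself is only constructed when the first event is actually reported.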

Application optimization is key to the whole startup process: reasonable task orchestration not only shortens Application creation but also greatly helps the subsequent home-page creation. On the OPPO A5, Xinyu's Application creation time has dropped from 5 seconds to about 2.5 seconds, and there is still plenty of room for further optimization.

Startup link

After the Application is created, the application process's main job is to create an Activity. Note that many tasks are posted to the main thread between Application and Activity, and registered ActivityLifecycleCallbacks also fire, which quietly widens the time gap between the two. ActivityLifecycleCallbacks registrations are usually business-related and fairly hidden; in past development we did abuse them to a degree, so they deserve our attention.

As for time-consuming main-thread messages, Profiler and Systrace surfaced many such problems while we were locating startup costs. For various reasons, tasks in the Application post expensive work to the main thread: on the surface, Application creation gets shorter, but the overall startup time grows. For each time-consuming point we should locate and fix the root cause rather than blindly posting it elsewhere, which treats the symptom rather than the disease.

Second, shortening the link from launch to the home page is the focus of our optimization.

[Figure 2]

In the original startup flow, the loading page, as Xinyu's launch page, handled both routing and permission requests.

  1. Normally, the user starts the app, the loading page checks login status, and if the user is not logged in, it enters the login page.
  2. If the user is already logged in, it checks whether the splash page should be shown. If so, it enters the splash page; when the splash ends, it jumps back to the loading page and then enters the home page.

As shown above, even with no splash page, displaying the home page requires starting at least two Activities. The core of shortening the startup link is merging the loading, main, and splash pages into one page. This removes at least one Activity launch, and other tasks can run in parallel while the splash is displayed.

[Figure 3]

The home page here is more like a canvas, and the creation and rendering of the home-page UI is just one of the things drawn on it.

The code for this is relatively simple: set the main page as the launch page and encapsulate the home page and splash page as two fragments that are shown according to business logic. When the user taps the icon and is judged to be logged in, the home-page pre-tasks and UI rendering run, and in parallel we decide whether to load the splash fragment. Note that we did not remove the loading page: when the user is judged not to be logged in, we still enter it for the original permission and login routing. Since in the vast majority of cases the user is already logged in, this approach gives the biggest gain at the lowest modification cost.

In order to realize this process, we need to deal with the homepage instance and homepage task arrangement:

Homepage instance

The home page's original launchMode was singleTask, to guarantee a single global instance. But after making the home page the launch page, keeping singleTask causes a business bug: if the user navigates from a secondary page to the background and then taps the icon to return, the app jumps to the home page instead of the original secondary page. Roughly, when the icon is tapped, the system launches the Activity with the launcher property; because a home-page instance with the singleTask attribute already exists in the stack, the system reuses it and pops every Activity above it, producing the anomaly. The solution is to make the home page's launchMode singleTop. singleTop does not guarantee global uniqueness of the instance, but fortunately Xinyu has a router, and the home page is always opened through a unified url. In the final step of starting the home Activity, we add the FLAG_ACTIVITY_NEW_TASK and FLAG_ACTIVITY_CLEAR_TOP flags to the intent to approximate singleTask. This basically meets our needs, though it cannot rule out the home page being started directly in some circumstances. As a safety net, we register Application.ActivityLifecycleCallbacks to monitor the Activity instances in the stack; when multiple home-page instances appear, we remove the new instance and show a prompt.
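The router's final step can be sketched as follows. `MainActivity` is an illustrative class name, not necessarily Xinyu's real one; combined with `launchMode="singleTop"` in the manifest, the two flags reuse the existing home instance and clear everything above it:

```kotlin
import android.content.Context
import android.content.Intent

// Sketch only: with the home Activity declared singleTop, NEW_TASK +
// CLEAR_TOP approximates singleTask behavior when routing back to home.
fun openHome(context: Context) {
    val intent = Intent(context, MainActivity::class.java).apply {
        // reuse the existing home instance and pop all activities above it
        addFlags(Intent.FLAG_ACTIVITY_NEW_TASK or Intent.FLAG_ACTIVITY_CLEAR_TOP)
    }
    context.startActivity(intent)
}
```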

Task scheduling

The transformed home page is no longer a traditional page Activity but a container that carries a series of tasks. In subsequent work we extracted the home page's data requests and UI rendering, and also moved some heavyweight business tasks out of the Application and onto the home page. To manage the dependencies among these tasks effectively, we need a directed acyclic graph (DAG).

[Figure 4]

The core idea of task scheduling is to break tasks apart and load them off-peak: delay a low-priority, time-consuming task, or push it into a later idle workflow, while keeping the dependencies between tasks intact so nothing breaks. Reasonably dividing task granularity and ordering the tasks determines how fast the graph runs, and it tests the developer's familiarity with the business. There are many open-source DAG schedulers in the industry, such as alpha; to meet our specific business needs, the team also built its own startup framework to orchestrate home-page tasks.
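A minimal sketch of such a DAG scheduler looks like this (this is not the team's actual framework; it is a single-threaded illustration of dependency-ordered execution):

```kotlin
// Each task declares the tasks it depends on; execute() runs every task
// after all of its dependencies, via depth-first topological traversal,
// and throws if the graph contains a cycle.
class TaskGraph {
    private val deps = mutableMapOf<String, List<String>>()
    private val actions = mutableMapOf<String, () -> Unit>()

    fun task(name: String, dependsOn: List<String> = emptyList(), action: () -> Unit) {
        deps[name] = dependsOn
        actions[name] = action
    }

    fun execute(): List<String> {
        val order = mutableListOf<String>()
        val done = mutableSetOf<String>()
        fun run(name: String, path: Set<String>) {
            if (name in done) return
            require(name !in path) { "dependency cycle at $name" }
            deps.getValue(name).forEach { run(it, path + name) }
            actions.getValue(name)()   // all dependencies have finished
            done += name
            order += name
        }
        deps.keys.forEach { run(it, emptySet()) }
        return order
    }
}
```

A real startup framework would additionally dispatch independent tasks onto worker threads and support main-thread-only tasks, but the ordering guarantee is the same.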

For loading home-page tasks, we initially proposed the concept of workflows. We divide startup tasks into three workflow stages: the basic workflow, the core workflow, and the idle workflow. The business logic of the whole startup process is split into relatively independent tasks and assigned to these three workflows by priority and dependency.

  • Basic workflow: mainly executes Application creation and holds basic SDKs such as the network library and monitoring. Tasks here should be as few as possible, and all of them are prerequisites of the later workflows.
  • Core workflow: holds core business work. Besides the initialization of some core businesses, it includes home-page UI rendering, business data requests, and the splash display. These tasks are orchestrated and managed as a directed acyclic graph and start executing when the home page is created. Since this stage is already on the home page, to show users the first frame as soon as possible, we move business data fetching and home-page rendering as early as we can.
  • Idle workflow: holds low-priority, time-consuming tasks with no deadline. There are several ways to detect idle time; Xinyu keeps it simple and runs these tasks in an IdleHandler 10 seconds after the core workflow ends. For more accurate idle detection, you can combine posting probe messages to the main thread, measuring message intervals on the main thread, and monitoring the application's memory watermark.
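The simplified idle-workflow trigger described above can be sketched like this (a sketch of the approach, not Xinyu's actual code):

```kotlin
import android.os.Handler
import android.os.Looper

// Ten seconds after the core workflow finishes, enqueue the low-priority
// tasks in an IdleHandler so they only run when the main thread's message
// queue is otherwise empty.
fun scheduleIdleWorkflow(tasks: List<Runnable>) {
    Handler(Looper.getMainLooper()).postDelayed({
        // runs on the main thread, so myQueue() is the main message queue
        Looper.myQueue().addIdleHandler {
            tasks.forEach { it.run() }
            false  // returning false removes the handler after one run
        }
    }, 10_000L)
}
```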

The advantage of managing startup tasks with the startup framework is that core business can be loaded early and tasks can be fine-grained. For example, to display the home page faster, we separated its data request from its UI rendering and moved the data request forward into the Application. The effect on low-end devices is remarkable: Call object creation and Json parsing of data requests are expensive there, and by issuing the interface request while the Application and Activity are being created, the home-page loading time on low-end devices dropped from the original 3 seconds to under 1 second.

In follow-up work, task scheduling remains the focus of our optimization. In particular, locating each task's time-consuming points, sorting out the whole task flow, and pushing startup time to the extreme is one of our long-term goals for startup optimization.

By this point, the code of the whole startup flow had been heavily reworked, yet offline measurements showed the total startup time shortened by only about 3 seconds, and the first-refresh time had even regressed somewhat. This is because the startup process is a whole: launch and home page cannot be separated, and the earlier task orchestration for startup inevitably affected home-page creation. In addition, our work was not fine-grained enough in places, especially in the handling of locks, which is introduced later.

Homepage optimization

Having sorted out the startup link, we turned our attention to the home page. The home page is the core page of the entire app, with complex business logic and a deep UI hierarchy.

Lazy loading

After the previous transformation, our homepage is roughly as shown in the figure:
[Figure 5]
After the home page opens, the app loads all five TabFragments on the home page, which is extremely time-consuming.

We measured the creation time of each fragment on the OPPO A5; the approximate data is as follows:

[Table: per-fragment creation time]

If the creation and loading of the other four fragments (such as the feed tab) were delayed, startup could theoretically be reduced by about 2 seconds.

Considering that only the first fragment is visible when the home page is displayed, we implemented lazy loading for the home page. The home page uses the common ViewPager2 + TabLayout architecture, and ViewPager2 supports lazy loading natively. To prevent already-created fragments from being recycled when pages are switched, we enlarge the cache pool of the RecyclerView inside ViewPager2:

 ((RecyclerView)mViewPager.getChildAt(0)).setItemViewCacheSize(mFragments.size());

Although this scheme greatly speeds up home-page rendering, it essentially defers the creation and rendering of the other pages until the user switches to them, and the white screen shown at that moment is also unacceptable.

To this end, we have remodeled the home page and each fragment.

Page plugin

View creation is a major cost in home-page rendering. We usually load xml files with LayoutInflater, which involves xml parsing followed by creating instances via reflection, and is generally slow. We build relatively simple xml in code, but for complex layout files, building in code is time-consuming to write and hard to maintain.

To make the home page "lighter", we componentized the view from a business perspective. The core idea is to let users see the most basic interface and use the most core functions first. For a video-playback app, what users want to see first is the playback surface, and what they want to use is playback; they do not care about the other top icons. Following this idea, the playback component and function should be created and displayed first, while other business modules can be loaded later in the form of ViewStubs.

So what is Xinyu's core page? It is the Fate list on the home page, and the core function is operating that list. Once we understood this, the complex logic of the home page became clear, as did users' core needs.

Here is a brief introduction to Plugin, a UI componentization solution developed within the team. It is essentially an upgraded ViewStub, but with Fragment-like capabilities, and it fits MVVM naturally. As for Plugin's concrete implementation, after more polishing we may introduce it in a future article, so we will not expand on it here. Through Plugin, we cut the complex view into independent business-function components at the business level and load them by priority, ensuring users can see the home page sooner and use its most core functions.

[Figure 6]

The Fate plugin is displayed first when the home page is created, while the remaining plugins wait until the Fate plugin is fully displayed and its data has returned before rendering and loading, which greatly reduces the home page's load.

Json parsing optimization

Json parsing is another point that needs optimization. Before optimization, QA measurements showed that on low-end phones the Json parsing time of the main interface was as high as 3 seconds, which is unacceptable.

Json parsing is slow essentially because, during parsing, objects are created and assigned from the Json data through reflection; the more complex the object, the longer it takes. For the home page's main interface, parsing the returned object on low-end devices took longer than UI rendering, a point we had to overcome.

Our current solution is to refactor the home page's data objects in Kotlin and annotate them with @JsonClass(generateAdapter = true), which generates the corresponding parsing adapters for the annotated objects at compile time, thereby shortening parsing time.
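A hypothetical home-feed model illustrates the annotation (the field names here are invented for the example, not Xinyu's real schema):

```kotlin
import com.squareup.moshi.Json
import com.squareup.moshi.JsonClass

// With generateAdapter = true, Moshi's codegen produces a
// FeedItemJsonAdapter class at compile time, so runtime parsing of this
// type avoids reflection entirely.
@JsonClass(generateAdapter = true)
data class FeedItem(
    @Json(name = "user_id") val userId: Long,
    val nickname: String,
    val online: Boolean = false
)
```

This requires the moshi-kotlin-codegen annotation processor in the build; without it, the annotation alone has no effect.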

XML parsing optimization

Test data shows that xml inflate takes between 200 and 500 ms on less powerful phones. Custom controls and deeper UI layers can add to this parsing time.

To reduce xml parsing time, we optimized the xml of every UI module on the home page, minimizing nesting depth and avoiding unnecessary custom controls. In the Xinyu app, a degree of abuse of business-specific custom controls was also an important reason xml loading was slow.

In addition, we considered other ways to cut parsing time, such as moving xml parsing to a child thread and executing it ahead of time in the Application. This scheme is simple and effective, and has produced measurable gains.

A concrete example: parsing the xml of an item on the home page's Fate tab takes about 200ms. We parse it ahead of time on a sub-thread and, once parsing succeeds, store the view in a cache; when the home page creates the item, it fetches the view from the cache for rendering. This cut the item creation time to under 50ms.
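The preload-and-fetch pattern can be sketched generically. In the real app the value would be an inflated View keyed by layout id; here the expensive factory is arbitrary so the idea stands on its own:

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.Executors
import java.util.concurrent.Future

// Sketch: build expensive values on a worker thread ahead of time, then
// hand them over when the consumer asks. On a cache miss the consumer
// falls back to building synchronously, mirroring the main-thread
// re-inflate fallback described below.
class AsyncPreloader<K : Any, V : Any>(private val factory: (K) -> V) {
    private val pool = Executors.newSingleThreadExecutor()
    private val cache = ConcurrentHashMap<K, Future<V>>()

    // kick off creation early, e.g. during Application startup
    fun preload(key: K) {
        cache.computeIfAbsent(key) { k -> pool.submit(Callable { factory(k) }) }
    }

    // Future.get() returns immediately if preloading already finished;
    // otherwise it waits, and a plain miss builds synchronously
    fun obtain(key: K): V = cache.remove(key)?.get() ?: factory(key)
}
```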

The asynchronous parsing scheme seems to work, but if you expect to preprocess all xml this way, you are bound to be disappointed, because it has many limitations.

The first thing to watch is the view-related lock, which turns the supposedly asynchronous xml parsing back into synchronous parsing and makes it slower. Locks are explained and handled in detail in the next chapter. In the optimization example above, we partially bypassed the lock restriction by cloning the LayoutInflater instance:

 LayoutInflater inflater = LayoutInflater.from(context).cloneInContext(context);
 View view = inflater.inflate(R.layout.item_view, null, false);

In fact, view-related locks are not limited to LayoutInflater; Resources and AssetManager also lock internally, so the scheme above cannot make parsing fully asynchronous.

The second point is that sub-threads have low priority. Especially under heavy load, parsing xml on a sub-thread stretches out the whole parse, making it easy for the view to be needed before parsing has finished; this triggers the fallback path that re-parses the xml on the main thread, wasting resources and ultimately lengthening rendering. Therefore, asynchronous xml parsing should be reserved for a very small amount of core xml, and that xml should not be too deeply nested.

Xml parsing optimization remains a focus of our exploration. We are currently trying Compose as the main direction for eliminating xml parsing cost; it is still in the experimental stage, and we believe results meeting expectations will come soon.

The home-page optimization work brought great benefits. Offline measurements on the OPPO A5 show that the first-frame display time dropped by about 3 seconds compared with before optimization, and the entire home page now renders within 1 second. With home-page data displayed faster, the user experience improves greatly.

Locks

The trouble locks caused during startup optimization was memorable enough that they deserve a section of their own.

During startup optimization, when we hit a time-consuming task we usually move it to a sub-thread. In theory, with sufficient resources, that task's cost disappears from the critical path. In practice this often fails: the move may be ineffective or even make things worse. The reason is locks. Here are a few representative ones.

Retrofit

We all know that Retrofit generates the request's Call instance through a dynamic proxy, but we often overlook the lock inside it.

[Figure 7]

If many interfaces issue requests simultaneously on the home page, they compete for this lock, which quietly turns parallel requests into serial ones, an effect especially obvious on low-end phones.

So in the actual startup flow, we can see that an api request whose own cost is only 300ms may spend another 200ms waiting for Retrofit's lock.

[Figure 8]

In addition, analyzing the common Retrofit usage, we can see that this cost is incurred when create is executed. Unfortunately, the way it is written often leads us to believe it merely creates an object rather than performing an expensive operation, so it easily ends up on the main thread:

 GitHubService service = retrofit.create(GitHubService.class);

By cleaning up the home-page code and reorganizing its threading, we moved this part onto sub-threads. As for the lock contention itself, investigation showed it was basically amplified by the time spent parsing the data returned by the interface, which comes back to Json parsing optimization; see the solution above.

Reflection

We know that reflection is expensive, and Kotlin reflection especially so: because of Kotlin's various syntactic sugars, reflective operations must read class information from Metadata, so Kotlin's reflection is much less efficient than Java's.

At the same time, because of kotlin_builtins, much of Kotlin's built-in information (basic types such as Int, String, Enum, Annotation, and Collection, plus information used by coroutines, multiplatform, and so on) is stored in the apk as files, and reflection implicitly triggers class loading and IO operations on these files.

 static {
     Iterator<ModuleVisibilityHelper> iterator = ServiceLoader.load(ModuleVisibilityHelper.class, ModuleVisibilityHelper.class.getClassLoader()).iterator();
     MODULE_VISIBILITY_HELPER = iterator.hasNext() ? iterator.next() : ModuleVisibilityHelper.EMPTY.INSTANCE;
 }

Class loading is a lock plus IO, which is why ANRs often show up online.

As for the IO itself, limited by the load on the system's overall file system, its cost is uncontrollable, and doing IO while holding a lock invisibly amplifies that cost.

Therefore, reflection, and Kotlin reflection in particular, can be roughly understood as a potential IO operation performed under a lock (especially during app startup).

This causes all kinds of strange problems. We hit one such example during optimization: we hoped local caching would let users see the UI earlier, but with all these locks in play, it not only slowed every api request but also, like a butterfly effect, shifted the preloading cost back onto the UI thread, causing jank. After a round of investigation, we finally traced the cost to Kotlin loading its builtins for the first time during startup. We can sidestep this by manually triggering the first reflection at an appropriate moment.
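The warm-up trick can be sketched as follows (a sketch of the idea, not Xinyu's exact code; which class is touched does not matter, since any first Kotlin reflective call pays the builtins cost):

```kotlin
import kotlin.concurrent.thread

// Pay the one-time kotlin_builtins class-loading + IO cost on a
// low-priority background thread, before any user-visible code needs
// reflection.
fun warmUpKotlinReflection() {
    thread(name = "kreflect-warmup", priority = Thread.MIN_PRIORITY) {
        runCatching {
            Class.forName("kotlin.Metadata")  // touch Kotlin metadata machinery
            String::class.members             // needs kotlin-reflect on the classpath
        }
    }
}
```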

View lock

As mentioned earlier, view creation is expensive and hard to optimize head-on, so we naturally wonder whether part of the UI can be inflated on an IO thread, as in the asynchronous xml parsing scheme above. But as also mentioned, the inflate path of a view is locked.

[Figure 9]

This lock travels with the LayoutInflater instance, in other words, generally with the Context. In one case we encountered, view loading was placed on a child thread; because of the lock, the loading of other views was prolonged, and under high CPU load the IO thread's priority was low. Together, these factors actually made the startup process worse.

Moshi

Moshi's deep traversal of a Class to generate a JsonAdapter involves heavy reflection, whose cost is uncontrollable. More sensibly, though, while Moshi's internal cache also uses locks, it uses a ThreadLocal so that the time-consuming work happens outside the lock; similar scenarios can borrow this pattern.

图10
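A simplified sketch of that caching pattern (the class and names are ours): do the expensive, reflective work without holding any lock, then publish the result atomically. Moshi's real cache additionally uses a ThreadLocal to handle re-entrant lookups for cyclic types; this version shows only the "slow work outside the lock" idea.

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Cache where the costly create() step never runs under a lock, so a slow
// build (e.g. reflective adapter generation) cannot block other readers.
class AdapterCache<K : Any, V : Any>(private val create: (K) -> V) {
    private val cache = ConcurrentHashMap<K, V>()

    fun get(key: K): V {
        cache[key]?.let { return it }
        val fresh = create(key)  // expensive step, done lock-free
        // If another thread won the race, reuse its value.
        return cache.putIfAbsent(key, fresh) ?: fresh
    }
}
```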

Best Practices

Through the analysis of the above series of problems, we can summarize the following best practices:

  1. Do not do any Moshi parsing on the main thread;
  2. When parsing Kotlin classes with Moshi, use the @JsonClass annotation so the adapter is generated at compile time;
  3. Do not perform any Retrofit-related operations on the main thread;
  4. When inflating XML asynchronously, watch out for multi-thread lock contention.

In reality, problems vary endlessly; rigid formulas do not work, and best practices are no silver bullet. If you meet a time-consuming task and, instead of finding the cause, crudely push it onto a child thread, the CPU time it consumes will, through synchronization locks or other mechanisms, still come back to hurt UI-thread efficiency in another form.

Anti-deterioration

Startup optimization is a long-term project whose relevance runs through the entire life of a product. Reducing the startup time after a period of focused work does not mean everything is fine from then on: without anti-deterioration measures, the startup time will creep back up after a few iterations, and once optimization reaches the deep-water zone, changes in many places have hard-to-untangle effects on startup speed. Startup optimization is therefore not only a tough battle but also a long tug of war.

Therefore, we need online and offline monitoring data to guide our optimization efforts.

Online data

The core nodes are as follows:

监控链路

In the online data, we take the Application's attachBaseContext method as the start of startup, the home page's onWindowFocusChanged as the home-page-visible node, and onViewAttachedToWindow as the node where home-page data is on screen.
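A minimal sketch of these nodes (the object and method names are ours): `mark()` would be called from `Application.attachBaseContext`, the home page's `onWindowFocusChanged`, and `onViewAttachedToWindow` respectively, and the reported metrics are just differences between marks.

```kotlin
// Records named startup timestamps; only the first mark for a name counts,
// so a late duplicate callback cannot corrupt the measurement.
object StartupTracker {
    private val marks = LinkedHashMap<String, Long>()

    fun mark(name: String, now: Long = System.currentTimeMillis()) {
        marks.putIfAbsent(name, now)
    }

    // Null when either node was never reached (e.g. the user left early).
    fun durationMs(from: String, to: String): Long? {
        val a = marks[from] ?: return null
        val b = marks[to] ?: return null
        return b - a
    }
}
```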

The monitoring nodes Xinyu currently uses are intended mainly for horizontal comparison. For more accurate measurements, the start of startup could be the process creation time, and first-frame data could be collected when dispatchDraw is called. However, for ease of use and to stay unaffected by business code, Xinyu keeps the current scheme, which is mainly used to compare optimization and deterioration across historical versions.

It is worth mentioning that online data collection must account for noise. On some models the process is killed in the background and restarted, but the power-saving strategy force-stops the process during the restart, producing abnormal timings that pollute the data.

The solution Xinyu adopted is to compare the wall-clock startup duration with the CPU time the startup thread actually ran; if the difference exceeds a threshold, the record is discarded.

```kotlin
// Wall-clock time since attach minus this thread's CPU time: if the process
// was suspended mid-startup, the gap becomes large and the record is dropped.
val intervalTime =
    abs(System.currentTimeMillis() - attachStartTime - SystemClock.currentThreadTimeMillis())
```

In practice, 20 seconds has proven to be a good threshold.

Offline data

Statistics from instrumented tracking points reflect the startup status to some extent, but they differ from what users actually experience. For offline data, we therefore recommend launching the application on phones of various performance levels while recording the screen, then measuring the startup time from the recording.

Measurement

Our anti-deterioration work is still exploratory. At present, after each release the testers produce a performance evaluation report for that version, and we analyze it together with the online startup data to judge whether the startup time has deteriorated. If it has, we analyze the whole startup process, find the outliers, and fix them.

This approach is relatively inefficient: in many cases the deterioration is small and hard to see in the data, and by the time it does show up, many abnormal points may have accumulated.

To this end, we run a dedicated Code Review for changes to the Application and the home page. We discourage adding code to the Application, and any addition is evaluated for necessity. We also added a time-consumption alarm to the startup framework: if a startup task exceeds its threshold, the developer is alerted during the development stage that the code may be abnormal. Ultimately, all optimization work depends on the developers themselves, so the most important thing is for every team member to stay performance-aware; we are also planning norms to guide development work in this area.
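A hypothetical sketch of such an alarm (the names and the default 50 ms budget are ours): wrap each startup task, measure it, and report when it exceeds its budget, so regressions surface during development rather than after release.

```kotlin
// Times a startup task and reports a warning when it runs over budget.
// report() would typically log, toast, or assert in debug builds.
inline fun <T> timedStartupTask(
    name: String,
    budgetMs: Long = 50,
    report: (String) -> Unit,
    block: () -> T,
): T {
    val start = System.nanoTime()
    try {
        return block()
    } finally {
        val tookMs = (System.nanoTime() - start) / 1_000_000
        if (tookMs > budgetMs) {
            report("Startup task '$name' took $tookMs ms (budget $budgetMs ms)")
        }
    }
}
```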

Summary

With this round of startup optimization, Xinyu's startup speed and first-screen rendering time have reached the baseline. But as noted above, startup optimization is a special project that requires long-term attention, and our work on Xinyu's startup duration will not stop here. Along the way we ran into many problems and distilled a number of best practices, and the biggest takeaway is this: no time cost exists without a reason, and where there is one, there is a problem. If you meet a time-consuming task and, instead of solving it, just push it onto a child thread and forget about it, the problem will be waiting for you at the next intersection. We occasionally considered "black technology" tricks to speed up startup, but the results were usually unsatisfying. On reflection, the simplest solution is often the best, and blindly chasing sophisticated techniques tends to end up as a cannon aimed at a mosquito.

In follow-up work we will keep iterating and polishing the startup, more finely than before: going from shallow to deep, tailoring technical solutions to specific business scenarios to gain speed, then generalizing the solutions that work and applying them to other businesses. Looking back afterwards, we may understand the entire startup process anew, from the outside in and then from the inside out.


This article is published by the NetEase Cloud Music technical team. Any form of reprinting without authorization is prohibited. We recruit for technical positions all year round; if you are ready for a change and happen to like Cloud Music, join us at grp.music-fe(at)corp.netease.com!
