Reducing the size of an application's installation package improves both the user experience and the download conversion rate. Drawing on practical experience from the Meituan platform, this article shares the ideas behind so size optimization, the gains it brings, and the precautions to take in engineering practice.

1. Background

The size of the application installation package affects the user's download time, installation time, and disk usage, so reducing it benefits both the user experience and the download conversion rate. An Android installation package is actually a zip file, made up mainly of compressed files of several types: dex, assets, resources, and so. The package size optimization solutions commonly used in the industry fall roughly into the following categories:

  • Optimization for dex, such as ProGuard, removal of dex DebugItem, bytecode optimization, etc.;
  • Optimization for resources, such as AndResGuard, webp optimization, etc.;
  • Optimization for assets, such as compression, dynamic delivery, etc.;
  • Optimization for so: the same methods as for assets, plus removal of debugging symbols, etc.

With the wide adoption of technologies such as dynamization and on-device intelligence, so files still account for a large share of the installation package even after the above methods are applied. We began to consider whether this part could be reduced further.

After a period of research, analysis, and verification, we arrived at a solution that further reduces the size of the so files in the installation package by 30% to 60%. The solution consists of a series of purely technical optimizations that are minimally intrusive to business code, and it can be deployed quickly through simple configuration. It is already in production use in the Meituan App. So that readers understand not only what to do but also why, this article starts from the so file format and analyzes, based on that format, what content can be optimized.

2. so file format analysis

A so is a dynamic library, and is essentially an ELF (Executable and Linkable Format) file. Its internal structure can be viewed from two perspectives: the linking view and the execution view. The linking view treats the so as a combination of sections; it reflects how the so is assembled and is the compile-and-link perspective. The execution view treats the so as a combination of segments; it tells the dynamic linker how to load and execute the so and is the runtime perspective. Because so optimization is mostly concerned with compiling and linking, and because a segment usually contains multiple sections (that is, the linking view decomposes the so at a finer granularity), we discuss only the linking view here.
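
To make the "a so is an ELF file" point concrete, here is a minimal C sketch that checks a buffer against the ELF identification bytes: the 4-byte magic 0x7F 'E' 'L' 'F', followed by a class byte where 1 means 32-bit and 2 means 64-bit. The helper names is_elf and elf_class_bits are hypothetical and chosen only for this illustration; they are not part of any tool mentioned in this article.

```c
#include <stddef.h>

/* Returns 1 if buf starts with the ELF magic bytes 0x7F 'E' 'L' 'F'. */
int is_elf(const unsigned char *buf, size_t len) {
    return len >= 4 && buf[0] == 0x7F && buf[1] == 'E' &&
           buf[2] == 'L' && buf[3] == 'F';
}

/* Returns 32 or 64 according to the class byte at offset 4 (EI_CLASS),
 * or 0 if the buffer is not a recognizable ELF header. */
int elf_class_bits(const unsigned char *buf, size_t len) {
    if (!is_elf(buf, len) || len < 5) return 0;
    if (buf[4] == 1) return 32; /* ELFCLASS32 */
    if (buf[4] == 2) return 64; /* ELFCLASS64 */
    return 0;
}
```

Feeding the first few bytes of a so file to these helpers should report an ELF file of the matching word size, while the first bytes of the apk itself (a zip file) would be rejected.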

The readelf -S command lists all the sections of a so file. Following the ELF file format specification, here is a brief introduction to the sections involved in this article:

  • .text : Stores the compiled machine instructions. Most functions in C/C++ code end up here after compilation. It contains only machine instructions, with no strings or other data.
  • .data : Stores readable and writable variables whose initial value is nonzero.
  • .bss : Stores readable and writable variables that are uninitialized or whose initial value is zero. This section only records the memory size required at runtime and takes up no space in the so file.
  • .rodata : Stores read-only constants.
  • .dynsym : The dynamic symbol table, which describes the symbols the so provides to the outside (exported symbols) and the external symbols it depends on (imported symbols).
  • .dynstr : A string pool in which strings are separated by '\0'; it is used by .dynsym and other parts.
  • .gnu.hash and .hash : Two kinds of hash table, used for fast lookup of exported symbols or of all symbols in .dynsym .
  • .gnu.version , .gnu.version_d , and .gnu.version_r : These three sections specify the version of each symbol in the dynamic symbol table. .gnu.version is an array with as many elements as the dynamic symbol table has symbols, so its elements correspond one-to-one with those symbols. Each element is of type Elfxx_Half and is an index identifying the symbol's version. .gnu.version_d describes the versions of all symbols defined by this so and is indexed by .gnu.version . .gnu.version_r describes the versions of all symbols this so depends on and is also indexed by .gnu.version . Because different symbols may share the same version, this index structure reduces the size of the so file.
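
The '\0'-separated layout of .dynstr can be sketched in a few lines of C. The pool contents below are made up for illustration, and dynstr_name is a hypothetical helper: it resolves a byte offset (the role a symbol entry's name field plays) to a C string inside the pool.

```c
#include <stddef.h>

/* A tiny stand-in for .dynstr: offset 0 is the empty string and every
 * name ends with '\0'. The contents are invented for this example. */
static const char kDynstr[] = "\0JNI_OnLoad\0read\0write";

/* Resolves a name offset to a C string inside the pool, or returns NULL
 * if the offset lies outside the pool. */
const char *dynstr_name(const char *pool, size_t pool_size, size_t offset) {
    if (offset >= pool_size) return NULL;
    return pool + offset;
}
```

In this pool, offset 1 resolves to JNI_OnLoad, offset 12 to read, and offset 17 to write. Each name occupies its length plus one terminating byte, which is why removing a symbol name from the pool frees "string length + 1" bytes.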

Before optimizing, we need to have a clear understanding of these sections and the relationship between them. The following figure more intuitively shows the relationship between the various sections in so (only the sections involved in this article are drawn here):

Figure 1: Schematic diagram of the so file structure

Combined with the figure above, we can understand the structure of the so file from another angle. Imagine that we put all the function implementations in .text ; the instructions in .text read data from .rodata and read or modify data in .data and .bss . It might seem that this is all a so needs. But how do these functions get executed? Loading the functions and data into memory is not enough; the functions only do any work once they are actually executed.

We know that to execute a function, we just need to jump to its address. But how does an outside caller (a module outside the so) know the address of the function it wants to call? This is a question of function identity: the external caller supplies the ID of the function to be called, and the dynamic linker (Linker) looks up the address of the target function by that ID and passes it back to the caller. So the so file also needs a structure that stores the "ID-address" mapping, and that structure is the set of exported symbols in the dynamic symbol table.

In the actual implementation of the dynamic symbol table, the ID is a string, so the exported symbols of the dynamic symbol table form a "string-address" mapping table. Once the caller has the address of the target function, it can jump to that address to execute the function. Conversely, the current so may need to call functions in other sos (such as read and write in libc.so). The imported symbols of the dynamic symbol table record these functions, and before the functions in the so are executed, the dynamic linker fills the addresses of the target functions into the corresponding locations for the so to use. The dynamic symbol table is therefore the "bridge" between the current so and the outside world: exported symbols are offered for external use, and imported symbols declare the external symbols the so needs. (Note: symbols in .dynsym can also represent other kinds of entities such as variables; these behave much like functions, so they are not covered separately here.)
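
The "string-address" mapping can be pictured with a small C sketch. This is only a conceptual model under simplifying assumptions: the table, the two functions, and lookup_export are all hypothetical, and a real dynamic linker locates symbols through .gnu.hash or .hash rather than a linear scan.

```c
#include <stddef.h>
#include <string.h>

typedef int (*func_ptr)(int);

static int add_one(int x) { return x + 1; }
static int twice(int x) { return 2 * x; }

/* One row of the conceptual "string-address" table. */
struct export_entry {
    const char *name;
    func_ptr address;
};

static const struct export_entry kExports[] = {
    {"add_one", add_one},
    {"twice", twice},
};

/* Returns the address registered under `name`, or NULL if the name is
 * not in the table (the symbol is not exported). */
func_ptr lookup_export(const char *name) {
    for (size_t i = 0; i < sizeof kExports / sizeof kExports[0]; i++) {
        if (strcmp(kExports[i].name, name) == 0) return kExports[i].address;
    }
    return NULL;
}
```

A caller that knows only the name "twice" can obtain the address and then call through it, while a name that is absent from the table cannot be resolved at all. That is the sense in which removing an exported symbol hides a function from the outside without touching its implementation.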

With the so file structure in mind, we now analyze what content in a so can be optimized.

3. Analysis of optimizable content in so

Before discussing what can be optimized in a so, let's look at the strip optimization (removal of debugging information and the symbol table) that the Android build tool (Android Gradle Plugin, hereinafter AGP) already applies to so size. When AGP builds a so, it first generates a so with debugging information and a symbol table (task name externalNativeBuildRelease), and then strips that so to produce the final so that is packaged into the apk or aar (task name stripReleaseDebugSymbols).

Strip optimization deletes the debugging information and the symbol table from the input so. The symbol table meant here is different from the "dynamic symbol table" above: its section name is usually .symtab, and it usually contains every symbol in the dynamic symbol table plus many additional ones. Debugging information, as the name suggests, is the information used to debug the so, mainly the sections whose names start with .debug_ . These sections establish the mapping between each instruction of the so and the source files (that is, the source file name, line number, and other information can be found for each instruction in the so). It is called strip optimization because it actually invokes the strip command provided by the NDK (with the parameter --strip-unneeded).

Note: Why does AGP first build the so with debugging information and a symbol table, rather than building the final so directly (adding the -s parameter produces a so without debugging information or a symbol table)? The reason is that the so with debugging information and a symbol table is needed to restore crash call stacks. A stripped so runs normally, but when it crashes, only the instruction location within the so is guaranteed for each stack frame of the crash call stack; the symbol may not be available. When troubleshooting a crash, however, we want to know where in the source code the so crashed. With the unstripped so, each stack frame of the crash call stack can be mapped back to its source file name, line number, function name, and so on, which makes troubleshooting much easier. So although the so with debugging information and a symbol table is not packaged into the final apk, it is very important for troubleshooting.

With strip optimization enabled, AGP can greatly reduce the size of a so, in some cases by a factor of ten or more. For one test so, the final size is 14 KB, while the corresponding so with debugging information and a symbol table is 136 KB. Note, however, that if AGP cannot find the corresponding strip command, it packages the so with debugging information and the symbol table directly into the apk or aar, and the build does not fail. For example, when the strip command for the armeabi architecture is missing, the following message is shown:

 Unable to strip library 'XXX.so' due to missing strip tool for ABI 'ARMEABI'. Packaging it as is.

Beyond the optimization that the Android build tool applies to so size by default, what else can we do? First, let us state the principles of our optimization:

  • For content that must be retained, look for ways to reduce the space it occupies;
  • For content that does not need to be retained, delete it outright.

Based on these principles, a so can be optimized further in the following three ways:

  • Streamlining the dynamic symbol table : As mentioned above, the dynamic symbol table is the "bridge" between the so and the outside world, and its exported symbols are effectively the interface the so exposes. Which interfaces must be exposed? On Android, most sos exist to implement Java native methods. For such a so, the application only needs to be able to obtain, at runtime, the function address for each Java native method. There are two ways to achieve this: use RegisterNatives to register the Java native methods dynamically, or define Java_*** style functions according to the JNI specification and export their symbols. The RegisterNatives approach detects method signature mismatches early and reduces the number of exported symbols, and it is the approach Google recommends. In the best case, then, only JNI_OnLoad (which uses RegisterNatives to register the Java native methods) and JNI_OnUnload (which can do some cleanup) need to be exported. If you do not want to rewrite project code, you can export the Java_*** style symbols instead. The remaining sos are usually ones that other sos in the application depend on dynamically. For such a so, determine which of its symbols are used by the sos that depend on it, and keep only those. Note also the distinction between a symbol table entry and its implementation body. The symbol table entry is the corresponding Elfxx_Sym entry in the dynamic symbol table (see the figure above), while the implementation body is the entity itself in .text , .data , .bss , .rodata , or elsewhere. Removing the symbol table entry does not necessarily mean removing the implementation body.
Based on the schematic diagram of the so file structure above, the size saved by deleting one symbol table entry can be estimated as: symbol name string length + 1 + Elfxx_Sym + Elfxx_Half + Elfxx_Word .
  • Removing useless code : Real projects contain code that is never used in the Release build (legacy code, test code, and so on). Such code is called DeadCode. Following the analysis above, only code that is directly or indirectly referenced by the exported symbols of the dynamic symbol table needs to be retained; everything else is DeadCode and can be deleted (Note: code referenced from special sections such as .init_array must also be retained). The potential gain from removing DeadCode is relatively large.
  • Optimizing instruction length : The instruction sequence that implements a given piece of functionality is not fixed, and the compiler may be able to achieve the same functionality with fewer instructions. Since instructions make up the bulk of a so, the potential gain here is also relatively large.
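
The per-symbol estimate given above (name length + 1 + Elfxx_Sym + Elfxx_Half + Elfxx_Word) can be worked through in C. The sketch assumes the standard ELF entry sizes: a symbol entry is 16 bytes in 32-bit ELF and 24 bytes in 64-bit ELF, while Half is 2 bytes and Word is 4 bytes in both. It also assumes the Elfxx_Word term stands for one hash table slot. The function name estimated_saving is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Standard ELF entry sizes in bytes. */
enum {
    ELF32_SYM_SIZE = 16,
    ELF64_SYM_SIZE = 24,
    ELF_HALF_SIZE = 2,
    ELF_WORD_SIZE = 4
};

/* Estimated bytes saved by removing one dynamic symbol table entry. */
size_t estimated_saving(const char *symbol_name, int is_64bit) {
    size_t sym_size = is_64bit ? ELF64_SYM_SIZE : ELF32_SYM_SIZE;
    return strlen(symbol_name) + 1 /* name plus '\0' in .dynstr */
           + sym_size              /* entry in .dynsym */
           + ELF_HALF_SIZE         /* entry in .gnu.version */
           + ELF_WORD_SIZE;        /* assumed: one hash table slot */
}
```

For a 7-character name such as usedFun, the estimate is 7 + 1 + 16 + 2 + 4 = 30 bytes in a 32-bit so and 7 + 1 + 24 + 2 + 4 = 38 bytes in a 64-bit so. The saving per symbol is small, so the gain comes from removing many symbols, especially ones with long mangled C++ names.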

The content of a so that can be optimized is shown in the figure below (parts that can be deleted are marked with a red background, and the part that can be optimized is .text ). funC, value2, value3, and value6 are used by parts that are retained, so their implementation bodies must be kept and only their symbol table entries can be deleted. For funD, value1, value4, and value5, both the symbol table entry and the implementation body can be deleted (Note: because the implementation body of value4 is in .bss , and .bss takes up no space in the so file, deleting the implementation body of value4 does not reduce the size of the so).

Figure 2: Optimizable parts of the so

Having determined what can be optimized in a so, we also need to consider when to optimize: should we modify the so file directly, or control how it is generated? Given the risk and difficulty of modifying a so file directly, controlling the generation process is clearly safer. To that end, let's briefly review how a so is generated:

Figure 3: The so file generation process

As shown in the figure above, the generation process of so can be divided into four stages:

  • Preprocessing : Expand the include header file to the actual file content and perform macro definition replacement.
  • Compile : Compile the preprocessed file into assembly code.
  • Assembly : Assemble assembly code into an object file, which contains machine instructions (in most cases machine instructions, see the LTO section below) and data and other necessary information.
  • Link : Link all input object files and static libraries (.a files) into so files.

It can be seen that the output of the preprocessing and assembly stages is essentially fixed for a given input, leaving little room for optimization. Our scheme therefore targets the compilation and linking stages.

4. Introduction to the optimization scheme

We investigated the schemes available for controlling the final so size, verified their effects, and arrived at a generally applicable scheme.

4.1 Reducing the dynamic symbol table

Use visibility and attribute to control symbol visibility

You can control global symbol visibility by passing -fvisibility=VALUE to the compiler. VALUE usually takes one of two values, default or hidden:

  • default : All symbols are placed in the dynamic symbol table unless visibility is specified individually for a variable or function. This is also the behavior when -fvisibility is not used.
  • hidden : No symbols are placed in the dynamic symbol table unless visibility is specified individually for a variable or function.

How the CMake project is configured:

 set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fvisibility=hidden")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fvisibility=hidden")

How the ndk-build project is configured:

 LOCAL_CFLAGS += -fvisibility=hidden

On the other hand, for a single variable or function, its symbol visibility can be specified by attribute, for example:

 __attribute__((visibility("hidden")))
int hiddenInt=3;

Its commonly used values are likewise default and hidden, with the same meanings as for -fvisibility, so they are not repeated here.

Visibility specified through the attribute takes precedence over visibility specified through -fvisibility. In effect, -fvisibility is the global switch for symbol visibility, and the attribute is the per-symbol switch. Combining the two lets you control the visibility of every symbol in the source code.

Note that these two methods only control whether a variable or function appears in the dynamic symbol table (that is, whether its dynamic symbol table entry is deleted); they do not delete its implementation body.
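
As a hedged illustration of combining the global switch with per-symbol attributes, the following C sketch assumes the file is compiled into a shared library with -fvisibility=hidden. The function names are hypothetical. One function is explicitly re-exported with the default attribute, and the other is explicitly hidden (which is redundant under the assumed flag but documents the intent).

```c
/* Assumed build: part of a shared library compiled with -fvisibility=hidden. */

/* Hidden helper: other source files of the same library can still call it,
 * but it gets no entry in the dynamic symbol table. Its implementation
 * body stays in the so because exported_add uses it. */
__attribute__((visibility("hidden")))
int hidden_scale(int x) {
    return x * 10;
}

/* Explicitly exported: the attribute overrides the global hidden setting,
 * so this symbol appears in the dynamic symbol table. */
__attribute__((visibility("default")))
int exported_add(int a, int b) {
    return hidden_scale(a) + b;
}
```

Under those assumptions, readelf --dyn-syms on the resulting so would be expected to list exported_add but not hidden_scale, even though both function bodies are present in .text .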

Use the static keyword to control symbol visibility

In C/C++, the static keyword has different meanings in different contexts. When static means "this function or variable is visible only in this file", the function or variable does not appear in the dynamic symbol table. Again, only its dynamic symbol table entry is removed, not its implementation body. The static keyword amounts to a stronger hidden: a function or variable declared static is visible only within the current file at compile time, whereas one declared hidden is merely absent from the dynamic symbol table and is still visible to other files at compile time. Declaring functions and variables that are used in only one file as static is a good habit, but static is not recommended as a tool for controlling symbol visibility, because it cannot control the visibility of a function or variable that is used across multiple files.
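
The difference between a file-local static function and an ordinary one can be sketched as follows. The function names are hypothetical and the example is only meant to show where static applies and where it does not.

```c
/* File-local: invisible to other source files and absent from the
 * dynamic symbol table. Its body is kept because brighten uses it. */
static int clamp_to_byte(int v) {
    if (v < 0) return 0;
    if (v > 255) return 255;
    return v;
}

/* Not static: other files of the project can call it. Whether it is
 * exported must therefore be decided by another mechanism, such as
 * -fvisibility, an attribute, or a version script. */
int brighten(int pixel, int delta) {
    return clamp_to_byte(pixel + delta);
}
```

If brighten were needed by other source files but should not be exported from the so, marking it static is not an option, which is exactly the limitation described above.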

Use exclude libs to remove symbols from static libraries

The visibility method, the attribute method, and the static keyword all control the visibility of symbols in the project's own source code; they cannot control whether symbols from a static library the project depends on are exported by the final so. Exclude libs addresses this: it is a parameter passed to the linker that keeps the symbols of the depended-on static libraries out of the dynamic symbol table. As before, only the symbol table entries are removed, and the implementation bodies still exist in the generated so file.

How the CMake project is configured:

 set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -Wl,--exclude-libs,ALL") # do not export symbols from any static library
set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -Wl,--exclude-libs,libabc.a") # do not export symbols from libabc.a

How the ndk-build project is configured:

 LOCAL_LDFLAGS += -Wl,--exclude-libs,ALL # do not export symbols from any static library
LOCAL_LDFLAGS += -Wl,--exclude-libs,libabc.a # do not export symbols from libabc.a

Use version script to control symbol visibility

The version script is a parameter passed to the linker that specifies which symbols the dynamic library exports and what their versions are. It affects the contents of .gnu.version and .gnu.version_d described in the "so file format analysis" section above. Here we use only its ability to specify the full set of exported symbols (that is, with an empty string as the symbol version name). To enable the version script, write a text file that lists the symbols the dynamic library exports. For example (only the usedFun function is exported):

 {
    global:usedFun;
    local:*;
};

Then pass the path of this file to the linker (assuming the file is named version_script.txt ).

How the CMake project is configured:

 set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/version_script.txt") # version_script.txt is in the same directory as the current CMakeLists.txt

How the ndk-build project is configured:

 LOCAL_LDFLAGS += -Wl,--version-script=${LOCAL_PATH}/version_script.txt # version_script.txt is in the same directory as the current Android.mk

A version script, then, explicitly lists the symbols to be preserved. The same effect could be achieved by controlling each symbol's export through visibility and attribute, but the version script has some additional benefits:

  1. A version script can control whether symbols from static libraries compiled into the so are exported; neither the visibility method nor the attribute method can do this.
  2. The visibility and attribute combination requires marking every exported symbol in the source code, which is tedious for projects with many exported symbols. A version script gathers all exported symbols in one place, where they are easy to review and modify, which suits projects with many exported symbols.
  3. Version scripts support wildcards: * matches zero or more characters, and ? matches a single character. For example, my*; matches all symbols starting with my. Wildcards make version scripts more convenient to write.
  4. One more special point: a version script can remove certain symbols such as __bss_start (a symbol the linker adds by default).

In summary, the version script method is better than the combination of visibility and attribute. Moreover, with a version script there is no need to use exclude libs to control whether symbols from depended-on static libraries are exported.
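
To get a feel for how the wildcards in a version script select symbols, the C sketch below uses the POSIX fnmatch function as a stand-in for the linker's pattern matching. This is an approximation for illustration only: the linker has its own matcher, and we do not claim it behaves identically in every corner case. The helper name is_exported and the pattern list are hypothetical.

```c
#include <fnmatch.h>

/* Returns 1 if `symbol` matches at least one of the glob patterns that
 * would appear in the global: part of a version script, else 0. */
int is_exported(const char *symbol, const char *const *patterns, int count) {
    for (int i = 0; i < count; i++) {
        /* fnmatch returns 0 when the string matches the pattern. */
        if (fnmatch(patterns[i], symbol, 0) == 0) return 1;
    }
    return 0;
}
```

With the patterns usedFun, my*, and get?, the names usedFun, myHelper, and getX would be treated as exported, while notMine and getXY would fall through to local:* and stay out of the dynamic symbol table.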

4.2 Remove useless code

Enable LTO

LTO stands for Link Time Optimization. LTO can detect DeadCode and remove it when linking object files, reducing the size of the build output. An example of DeadCode: if an if condition is always false, the code block executed when the condition is true can be removed. Furthermore, functions called only from the removed block may become DeadCode in turn and can be removed as well. Optimization is possible at link time because much information cannot be determined at compile time, when only local information is available, so some optimizations cannot be performed. At link time most of that information is known, which amounts to having a global view, so those optimizations become possible. Both GCC and Clang support LTO. Object files compiled in LTO mode no longer store target machine instructions but a machine-independent intermediate representation (GCC uses GIMPLE bytecode, and Clang uses LLVM IR bitcode).
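
The DeadCode pattern just described can be sketched in C. For brevity everything is shown in one file, but the comments mark which pieces are assumed to live in separate source files, since that separation is what prevents an ordinary per-file compile from seeing that the condition is constant. All names are hypothetical, and whether a particular toolchain actually removes the code depends on its version and flags.

```c
/* Assumed to live in config.c: a switch that is always 0 in Release. */
const int kSelfTestEnabled = 0;

/* Assumed to live in selftest.c: only called from the branch below, so
 * once that branch is removed this function becomes DeadCode as well. */
int expensive_self_test(int x) {
    int sum = 0;
    for (int i = 0; i < x; i++) sum += i * i;
    return sum;
}

/* Assumed to live in worker.c: when compiling this file alone, the
 * compiler only sees a declaration of kSelfTestEnabled and must keep
 * the branch. With a whole-program view the condition is known to be
 * false, so the branch can be dropped. */
int process(int x) {
    if (kSelfTestEnabled) {
        return expensive_self_test(x);
    }
    return x + 1;
}
```

In this sketch, if process is the only exported symbol, a link-time view allows both the branch and expensive_self_test to be discarded, leaving only the x + 1 path.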

How the CMake project is configured:

 set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -flto")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -flto")
set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -O3 -flto")

How the ndk-build project is configured:

 LOCAL_CFLAGS += -flto
LOCAL_LDFLAGS += -O3 -flto

A few things to keep in mind when using LTO:

  1. With Clang, LTO must be enabled in both the compile flags and the link flags; otherwise the link fails because the file format is not recognized (this problem existed before NDK r22). With GCC, enabling LTO in the compile flags is enough.
  2. If the project depends on static libraries, recompile them with LTO; DeadCode in the static libraries can then be removed when the dynamic library is built, reducing the size of the final so.
  3. In our tests with Clang, the linker needs a nonzero optimization level for LTO to actually take effect. In actual tests (NDK r16b), O1 gave a poor result, while O2 and O3 gave similar results.
  4. Because more analysis and computation are needed, link time increases significantly once LTO is enabled.

Enable GC sections

This is a parameter passed to the linker. GC stands for garbage collection, here meaning the collection of unused sections. Note that the sections in question are not those of the final so but those of the object files that are input to the linker.

A brief word on object files: an object file (extension .o) is also an ELF file and is therefore also made up of sections, but it contains only the content of its corresponding source file. Functions are placed in .text-style sections, readable and writable variables in .data-style sections, and so on. The linker merges sections of the same type from all input object files to assemble the final so file.

The GC sections parameter tells the linker to keep only the sections referenced directly or indirectly by dynamic symbols (and by .init_array and the like) and to remove all other sections, which reduces the size of the final so. There is one more issue to consider when enabling GC sections: by default the compiler puts all functions into the same section and all data with the same characteristics into the same section. If a section contains both parts that should be deleted and parts that must be kept, the entire section is kept. We therefore need to make the sections of the object file finer-grained, using two additional compile flags, -fdata-sections and -ffunction-sections , which tell the compiler to place each variable and each function in its own section. In fact, the Android build already passes -fdata-sections and -ffunction-sections automatically when compiling object files; they are listed here to highlight what they do.
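
The effect of finer-grained sections can be pictured with a small C file. The names are hypothetical, and the section names in the comments are what GCC and Clang typically generate with these flags, stated here as an assumption rather than a guarantee. The sketch further assumes that used_entry is the only exported symbol.

```c
/* With -fdata-sections, each variable gets its own input section,
 * typically named after it (for example .data.lookup_table). */
int lookup_table[4] = {1, 2, 4, 8};    /* referenced by used_entry */
int legacy_defaults[4] = {9, 9, 9, 9}; /* referenced by nothing */

/* With -ffunction-sections, each function also gets its own input
 * section (for example .text.used_entry and .text.unused_helper). */
int used_entry(int i) {
    return lookup_table[i & 3];
}

int unused_helper(int i) {
    return i * 3;
}
```

Under those assumptions, --gc-sections can keep the sections for used_entry and lookup_table and discard the ones for unused_helper and legacy_defaults. Without the two compile flags, both functions would share one .text section and both arrays one .data section, so nothing could be discarded.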

How the CMake project is configured:

 set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fdata-sections -ffunction-sections")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fdata-sections -ffunction-sections")
set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -Wl,--gc-sections")

How the ndk-build project is configured:

 LOCAL_CFLAGS += -fdata-sections -ffunction-sections
LOCAL_LDFLAGS += -Wl,--gc-sections

4.3 Optimizing Instruction Length

Use Oz/Os optimization level

The compiler chooses the optimization level according to the -Ox parameter. O0 disables optimization (mainly for easier debugging and faster compilation), and from O1 to O3 the optimization becomes progressively more aggressive. Both Clang and GCC provide the Os level, which is close to O2 but optimizes for the size of the output. Clang also provides the Oz level, which reduces output size further than Os.

In summary, if the compiler is Clang, Oz can be enabled; if the compiler is GCC, only Os is available (Note: the NDK changed its default compiler from GCC to Clang in r13 and officially removed GCC in r18. "GCC does not support Oz" here means that GCC 4.9, the last GCC version used by Android, does not support the Oz parameter). Compared with O3, Oz/Os optimizes for size and may cost some performance. If the project currently uses O3, decide whether to switch to Os/Oz based on actual test results and performance requirements. If the project does not use O3, Os/Oz can be adopted directly.

How the CMake project is configured (if using GCC, Oz should be changed to Os):

 set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Oz")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Oz")

How the ndk-build project is configured (if using GCC, Oz should be changed to Os):

 LOCAL_CFLAGS += -Oz

4.4 Other measures

Disable C++'s exception mechanism

If the project does not use the C++ exception mechanism (for example, try...catch), the size of the so can be reduced by disabling C++ exceptions.

How the CMake project is configured:

 set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fno-exceptions")

ndk-build disables C++ exceptions by default, so no extra configuration is needed (if an existing project has explicitly enabled C++ exceptions, that suggests they are actually needed, so disable them only after careful confirmation).

Disable RTTI mechanism of C++

If the RTTI mechanism of C++ (such as typeid and dynamic_cast, etc.) is not used in the project, the size of so can be reduced by disabling RTTI of C++.

How the CMake project is configured:

 set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fno-rtti")

ndk-build disables C++ RTTI by default, so no extra configuration is needed (if an existing project has explicitly enabled C++ RTTI, that suggests it is actually needed, so disable it only after careful confirmation).

Merge so

The measures above all optimize a single so. After optimizing individual sos, you can also consider merging sos to reduce size further. Specifically, when some sos in the installation package are dynamically depended on by only one other so, they can be merged into it. For example, if liba.so and libb.so are dynamically depended on only by libx.so, the three can be merged into a new libx.so. Merging sos has the following benefits:

  1. Some dynamic symbol table entries can be deleted, reducing total so size: all exported symbols in the dynamic symbol tables of liba.so and libb.so, and the symbols that libx.so imports from liba.so and libb.so.
  2. Some PLT and GOT entries can be deleted, reducing total so size: those in libx.so that relate to liba.so and libb.so.
  3. The optimization workload is reduced. Without merging, optimizing the size of liba.so and libb.so requires first determining which of their symbols libx.so depends on. After merging this is unnecessary, because the linker analyzes the references automatically and keeps the content of every symbol in use.
  4. Because the linker has more context about what were the exported symbols of liba.so and libb.so, LTO can also achieve better results.

The so can be merged at the compilation level without modifying the project source code.

Extract multi-so common dependencies

The above "merge so" is to reduce the total number of so, and here is to increase the total number of so. When multiple sos depend on the same library in a static way, you can consider extracting the library into a single so, and the original sos are changed to dynamically depend on the so. For example, both liba.so and libb.so statically depend on libx.a, which can be optimized to dynamically depend on libx.so for both liba.so and libb.so. Extracting the common dependency library of multiple SOs can merge the same code in different SOs, thereby reducing the total SO size.

The typical example is the libc++ library: if multiple so files all depend on libc++ statically, they can be changed to depend dynamically on libc++_shared.so.

4.5 General solution after integration

Combining the analysis above gives a general optimization scheme that ordinary projects can use. For CMake projects, the configuration is as follows (if GCC is used, change Oz to Os):

set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Oz -flto -fdata-sections -ffunction-sections")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Oz -flto -fdata-sections -ffunction-sections")
set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -O3 -flto  -Wl,--gc-sections -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/version_script.txt") # version_script.txt is in the same directory as the current CMakeLists.txt

For ndk-build projects, the configuration is as follows (if GCC is used, change Oz to Os):

LOCAL_CFLAGS += -Oz -flto -fdata-sections -ffunction-sections
LOCAL_LDFLAGS += -O3 -flto -Wl,--gc-sections -Wl,--version-script=${LOCAL_PATH}/version_script.txt # version_script.txt is in the same directory as the current Android.mk

A fairly general version_script.txt is shown below. Add any other exported symbols that must be kept, according to the actual situation:

{
    global:JNI_OnLoad;JNI_OnUnload;Java_*;
    local:*;
};

Note: The version script specifies every symbol that needs to be exported, so the visibility, attribute, static keyword, and exclude libs approaches are no longer needed to control exported symbols. Whether to disable the C++ exception and RTTI mechanisms, merge so files, or extract common dependency libraries depends on the specific project; these measures are not universal.

So far, we have summed up a set of feasible so volume optimization solutions. But in engineering practice, there are still some problems to be solved.

5. Engineering Practice

Support for multiple build tools

Meituan has many businesses that use so files, and they use different build tools. In addition to the common CMake and ndk-build mentioned above, some projects use Make, Automake, Ninja, GYP, GN, and other build tools. Each build tool applies the so optimization scheme in a different way, and the configuration is especially complex for large projects.

For these reasons, configuring the so optimization scheme separately for each business would cost a lot of manpower, and the configuration might not take effect. To reduce the configuration cost, speed up the rollout of the scheme, and ensure that the configuration is valid and correct, we support so optimization uniformly on the build platform (for projects using any build tool). Businesses only need a simple configuration to enable so size optimization.

Considerations for Configuring Export Symbols

There are two points to note:

  1. If some symbols of a so are used by other sos through dlsym, these symbols should also be kept in the exported symbols of the so (otherwise it will cause a runtime exception).
  2. When writing version_script.txt, pay attention to name mangling in languages such as C++: you cannot simply fill in the function name. Name mangling adds a function's namespace (if any), class name (if any), parameter types, and so on to the final symbol, and it is also the basis for overloading in C++. There are two ways to add a C++ function to the exported symbols. The first is to look up the exported symbol table of the unoptimized so, find the mangled symbol of the target function, and fill it into version_script.txt. For example, suppose there is a class MyClass:

class MyClass {
  void start(int arg);
  void stop();
};

To determine the real symbol of the start function, execute the following command on the unoptimized libexample.so. Because the mangled symbol contains the function name, grep can be used to narrow the search:

Figure 4: Finding the real symbol of the start function

You can see that the real symbol of the start function is _ZN7MyClass5startEi. To export this function, add _ZN7MyClass5startEi at the corresponding position in version_script.txt.
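As a cross-check on a mangled name, its readable form can also be recovered programmatically. The following is a minimal sketch, assuming a GCC or Clang toolchain whose C++ runtime provides abi::__cxa_demangle in <cxxabi.h>. The helper name demangle is hypothetical and is not part of the tooling described in this article:

```cpp
#include <cassert>
#include <cstdlib>    // std::free
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <string>

// Returns the readable form of a mangled C++ symbol, or the input
// unchanged when it is not a mangled name.
std::string demangle(const std::string& symbol) {
    // Itanium-ABI mangled names start with "_Z"; plain C symbols such as
    // JNI_OnLoad are returned as-is.
    if (symbol.compare(0, 2, "_Z") != 0) {
        return symbol;
    }
    int status = 0;
    char* readable =
        abi::__cxa_demangle(symbol.c_str(), nullptr, nullptr, &status);
    if (status != 0) {
        return symbol;  // not a valid mangled name
    }
    assert(readable != nullptr);
    std::string result(readable);
    std::free(readable);  // the buffer was allocated with malloc
    return result;
}
```

Under these assumptions, demangle("_ZN7MyClass5startEi") yields MyClass::start(int), which confirms that the symbol belongs to the start function of MyClass.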

The second way is to use extern syntax in version_script.txt as follows:

{
    global:
        extern "C++" {
            MyClass::start*;
            "MyClass::stop()";
        };
    local:*;
};

The configuration above exports the start and stop functions of MyClass. When linking, the linker demangles each symbol (restores the mangled symbol to its readable form) and then tries to match it against the entries in extern "C++". If any entry matches, the symbol is kept. The matching rules are: an entry in double quotes cannot use wildcards and must match the entire string exactly (for example, the stop entry fails to match if there is an extra space between the parentheses); an entry without double quotes can use wildcards (for example, the start entry).
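The matching rules can be sketched in a few lines of code. This is only an illustration of the rules as described here, not the linker's actual implementation: it assumes a POSIX system and uses fnmatch as a stand-in for the linker's wildcard matching. Entry, is_exported, and example_entries are hypothetical names:

```cpp
#include <cassert>
#include <fnmatch.h>  // POSIX glob-style matching
#include <string>
#include <vector>

// One entry inside extern "C++" { ... } in the version script.
struct Entry {
    std::string pattern;
    bool quoted;  // true when the entry is written in double quotes
};

// Returns true when the demangled symbol matches any entry.
bool is_exported(const std::string& demangled,
                 const std::vector<Entry>& entries) {
    for (const Entry& entry : entries) {
        assert(!entry.pattern.empty());
        if (entry.quoted) {
            // Quoted entries: no wildcards, the whole string must be equal.
            if (demangled == entry.pattern) return true;
        } else if (fnmatch(entry.pattern.c_str(), demangled.c_str(), 0) == 0) {
            // Unquoted entries: wildcards such as '*' are allowed.
            return true;
        }
    }
    return false;
}

// The two entries from the version script above.
std::vector<Entry> example_entries() {
    return {{"MyClass::start*", false}, {"MyClass::stop()", true}};
}
```

With these entries, MyClass::start(int) matches the wildcard entry and MyClass::stop() matches the quoted entry, while MyClass::stop( ) with an extra space matches neither.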

View the exported symbols of the optimized so

After a business optimizes its so, it should check which exported symbols remain in the final so file to verify that the result is as expected. On both Mac and Linux, the following command lists the exported symbols kept in the so:

nm -D --defined-only xxx.so

For example:

Figure 5: Using the nm command to view the exported symbols of an so file

As shown, libexample.so has two exported symbols: JNI_OnLoad and Java_com_example_MainActivity_stringFromJNI.

Parse the crash stack

The optimization scheme in this article removes unnecessary exported dynamic symbols. Does that make it impossible to parse the crash stack when a crash occurs? No: the parsing result of the crash stack is not affected at all.

As mentioned in the section "So Optimized Content Analysis", parsing online crashes with an so that carries debug information and the symbol table is the standard way to analyze so crashes (this is also how Google parses so crashes). The optimization scheme in this article does not modify the debug information or the symbol table, so the so with debug information and symbol table can still fully restore the crash stack, including the source file, line number, function name, and other information for each stack frame. After compiling the release version of the so, the business uploads the corresponding so with debug information and symbol table to the crash platform.

6. Benefits of the solution

Optimizing so files directly reduces both the size of the installation package and the local storage space occupied after installation. The size of the benefit depends on how much redundant code and how many exported symbols the original so contains. The following compares the installation package size of several so files before and after optimization:

so          Size before optimization   Size after optimization   Optimization percentage
A library   4.49MB                     3.28MB                    27.02%
B library   995.82KB                   728.38KB                  26.86%
C library   312.05KB                   153.81KB                  50.71%
D library   505.57KB                   321.75KB                  36.36%
E library   309.89KB                   157.08KB                  49.31%
F library   88.59KB                    62.93KB                   28.97%

The following is a comparison of the local storage space occupied before and after the above so optimization:

so          Size before optimization   Size after optimization   Optimization percentage
A library   10.67MB                    7.04MB                    34.05%
B library   2.35MB                     1.61MB                    31.46%
C library   898.14KB                   386.31KB                  56.99%
D library   1.30MB                     771.47KB                  41.88%
E library   890.13KB                   398.30KB                  55.25%
F library   230.30KB                   146.06KB                  36.58%
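The "Optimization percentage" column appears to be the size reduction relative to the original size. This reading is an assumption: it reproduces the rows given in KB, while rows given in MB can differ slightly, presumably because the listed sizes are rounded. A minimal sketch of the calculation, where reduction_percent is a hypothetical helper name:

```cpp
#include <cassert>

// Size reduction as a percentage of the original size.
// Both sizes must be in the same unit (for example, KB).
double reduction_percent(double before, double after) {
    assert(before > 0.0);
    return (before - after) / before * 100.0;
}
```

For the B library in the first table, reduction_percent(995.82, 728.38) gives approximately 26.86.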

7. Summary and follow-up plans

Optimizing the so size can not only reduce the size of the installation package, but also gain the following benefits:

  • Removing many unnecessary exported symbols improves the security of the so.
  • Because sections that occupy memory at runtime, such as .data, .bss, and .text, become smaller, the application's runtime memory usage is also reduced.
  • If the optimization reduces the number of symbols the so depends on externally, the so also loads faster.

We have made the following plans for the follow-up work:

  • Improve compilation speed. Because LTO, gc sections, and similar options increase compilation time, we plan to investigate solutions such as ThinLTO to speed up compilation.
  • Show in detail why each function or data item is kept.
  • Further improve the platform's ability to optimize so.

8. References

  1. https://www.cs.cmu.edu/afs/cs/academic/class/15213-f00/docs/elf.pdf
  2. https://llvm.org/docs/LinkTimeOptimization.html
  3. https://gcc.gnu.org/onlinedocs/gccint/LTO-Overview.html
  4. https://sourceware.org/binutils/docs/ld/VERSION.html
  5. https://clang.llvm.org/docs
  6. https://gcc.gnu.org/onlinedocs/gcc

9. The author of this article

Hong Kai and Chang Qiang, from Meituan Platform/App Technology Department.


This article is produced by the Meituan technical team, and the copyright belongs to Meituan. You are welcome to reprint or use the content of this article for non-commercial purposes such as sharing and communication; please indicate "The content is reproduced from the Meituan technical team". This article may not be reproduced or used commercially without permission. For any commercial activities, please send an email to tech@meituan.com to apply for authorization.

