
Author: Liu Tianyu (Qianfeng)

This is the fourth article in a series. The previous three were "Fire at Engineering Corruption | ProGuard Governance", "Fire at Engineering Corruption | Manifest Governance" and "Fire at Engineering Corruption: Java Code Governance". This one focuses on a specific sub-area: Android resources. Fire at engineering corruption!

To be precise, this article is about Android resources. Java resources fall under Java code governance, and their solution is covered in "Fire at Engineering Corruption: Java Code Governance".

By how they are defined and used, Android resources fall into two categories: Resource and Asset. The former offers controlled, structured access. Each resource has a unique id and supports many configuration qualifiers for different languages, devices, features and so on. The latter offers raw and relatively unrestricted directory and file access. Resource is the best choice for most scenarios, so this article focuses on it. It covers tool development and governance practice for four kinds of corruption: conflicting resources, useless resources, missing class references and hardcoded text.

1 Basic knowledge

This chapter briefly introduces some basic knowledge. The goal is to give a clear picture of how Android resources fit together and to lay the groundwork for the governance practice in Chapter 2. Along the way, I also try to explain some interesting technical points from a less common angle.

1.1 Resource classification

The official documentation already groups Resource-type resources by usage scenario and describes each group. Based on what resource compilation produces, this section classifies them by the corresponding inner class of R:

All 24 resource types above can be referenced in java code as R.<type>.<name>. Some can also be referenced in the manifest and in other resources as @<type>/<name>. Two dimensions of this classification need some explanation: "is it a separate file" and "is it in resources.arsc".

  • Is it a separate file. If a resource corresponds to a complete, independent file, it is a file-based resource, and the final apk contains a matching file under its res directory. Otherwise it is a value-based resource: the apk has no separate file for it, and its value (if any) is stored in resources.arsc. The color type is special. A single color is value-based, but a color state list (ColorStateList) is file-based. Note also that "separate file" is judged from the point of view of resource compilation. When defining resources, Android supports inline (embedded) xml resources, which let you write several file-type resources in one xml file. That case is not discussed here;
  • Is it in resources.arsc. For most resources, the field in the R$<type> class has a value of the form 0x7fxxxxxx, and resources.arsc has a matching record. For file-based resources the recorded value is the file's relative path; for value-based resources it is the value itself. The styleable type is special. It exists only in the R$styleable class, its field values are plain integers or integer arrays rather than 0x7fxxxxxx ids, and it has no record in resources.arsc.

The following styleable definition illustrates these points:

<!-- Resource defined in res/values/attrs.xml -->
<resources>
    <declare-styleable name="DeclareStyleable1" >
        <attr name="attr_enum" format="enum">
            <enum name="attrEnum1" value="1"/>
            <enum name="attrEnum2" value="2"/>
        </attr>
        <attr name="attr_integer" format="integer"/>
        <attr name="android:padding" format="dimension"/>
    </declare-styleable>
</resources>

During apk compilation, the following R.java code is generated:

// Code generated in R.java
public static final class id {
    public static final int attrEnum1=0x7f060000;
    public static final int attrEnum2=0x7f060001;
}
public static final class attr {
    public static final int attr_enum=0x7f020000;
    public static final int attr_integer=0x7f020001;
}
 
public static final class styleable {
    public static final int[] DeclareStyleable1 = {0x010100d5, 0x7f020000, 0x7f020001};
    public static final int DeclareStyleable1_android_padding=0;
    public static final int DeclareStyleable1_attr_enum=1;
    public static final int DeclareStyleable1_attr_integer=2;
}

Finally, the following records are generated in resources.arsc:

# Records generated in resources.arsc
type | id           | name         | value
id     0x7f060000     attrEnum1      None
id     0x7f060001     attrEnum2      None
attr   0x7f020000     attr_enum      1,2
attr   0x7f020001     attr_integer   0
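The ids in the table above follow the usual 0xPPTTEEEE layout of Android resource ids. PP is the package id (0x7f for the app, 0x01 for the framework), TT is the type id and EEEE is the entry index. The small decoder below is a sketch with illustrative class and method names. Type ids are assigned per build, so "0x02 = attr" only holds for this example.

```java
/** Illustrative decoder for the 0xPPTTEEEE resource id layout. */
public class ResIdDecoder {
    static int packageId(int resId) { return (resId >>> 24) & 0xff; }
    static int typeId(int resId)    { return (resId >>> 16) & 0xff; }
    static int entryId(int resId)   { return resId & 0xffff; }

    /** App resources use package id 0x7f; framework (android:...) resources use 0x01. */
    static boolean isAppResource(int resId) { return packageId(resId) == 0x7f; }

    public static void main(String[] args) {
        int[] ids = {0x7f060000, 0x7f060001, 0x7f020000, 0x7f020001, 0x010100d5};
        for (int id : ids) {
            System.out.printf("0x%08x -> package=0x%02x type=0x%02x entry=0x%04x app=%b%n",
                    id, packageId(id), typeId(id), entryId(id), isAppResource(id));
        }
        // R.styleable.DeclareStyleable1_* values (0, 1, 2) are array indices, not ids:
        System.out.println(isAppResource(2)); // false
    }
}
```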

A single styleable definition ends up producing a whole series of outputs, which shows how complex Android's resource processing logic is. This example contains several interesting technical points:

  • An attr named android:xxx generates no content in R.java or resources.arsc. So when the semantics fit, reusing a system-provided attr saves a little package size;
  • If several styleables or styles define an attr with the same name, only one attr resource is actually generated, which amounts to more reuse;
  • If an id resource such as attrEnum1 or attrEnum2 is also defined under the same name by another resource type (such as a layout), only one id resource is actually generated, which likewise improves reuse.

That covers resource classification. If you are not yet familiar with resource compilation, R.java, resources.arsc and so on, don't worry: the following sections may answer your questions.

1.2 Resource references

Once defined, a resource has to be referenced from somewhere. By how certain the reference is, references are either direct or indirect (dynamic). By where the reference is made, they come from java code, the manifest or other resources:

Figure: Resource reference methods

Indirect (dynamic) references let code decide at runtime, based on context, which resource to use, which makes them very flexible. Compared with direct references, however, they need an extra step to look up the resource id by name. Performance is therefore slightly worse, and they should be used with caution.

1.3 Resource compilation

Next, let's look at how resources are compiled:

Figure: Resource compilation process

First, resources are merged and only one resource per name is kept. The manifest is merged at the same time. These two then serve as the core inputs for resource compilation by AAPT(2) (see kancloud.cn/alex_wsc/androids/473798 for details). Here we focus on what resource compilation produces and how those outputs relate to other processing steps:

  • AndroidManifest.xml. References to resources are replaced with the corresponding resource ids. The file is then compiled to binary format and eventually packaged into the apk.
  • resources.arsc. The resource symbol (index) table. It records every resource id and its values under each configuration and is eventually packaged into the apk.
  • The compiled resource files. Every standalone resource file that needs compiling (such as a layout) is compiled to binary format. Together with the files that need no compiling, they are eventually packaged into the apk.
  • R.java. It records the mapping from resource type/name to id value so that java code can reference resources directly. Each module (subproject, flat aar, external aar) gets its own package.R.java file, which javac compiles together with all other java source files.
  • The keep rule file for resources. It mainly covers three things: the java classes of view nodes in layouts, the java methods named in onClick attributes, and the java classes of the four component types declared in the manifest. These keep rules, together with other custom keep rules, feed into the later proguard step.

As the whole process shows, resource compilation is tightly coupled with several other core build steps. Understanding it is therefore very valuable for mastering the apk build as a whole.

1.4 Resource shrinking

Google's official Android Gradle Plugin provides a resource shrinking feature. Its core principle is to compute the direct references between resources, using references from the manifest and java code as roots. Every resource that is not reached counts as useless. This looks very effective, but java code can also reference resources indirectly (dynamically). To cover those references, the plugin takes a conservative approach. It collects all string constants in the java code, and any resource whose name starts with one of these constants is also treated as referenced. There is further logic for several other special reference patterns. This approach has the following problems:

  • If the name passed to Resources.getIdentifier is built entirely from variables, the related resources are removed by mistake;
  • If the java constant pool contains almost every single character, such as a-z and 1-9, then every resource counts as referenced and nothing is removed (this is the case for Youku).
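Below is a simplified model of the string-constant rule described above. It shows why a constant pool full of single letters defeats the rule. This is not the Android Gradle Plugin's actual code; the class name and sample resource names are hypothetical.

```java
import java.util.*;

/** Simplified model (not AGP code): keep any resource whose name starts with a string constant. */
public class ShrinkHeuristic {
    static Set<String> keptByConstants(Collection<String> resourceNames, Collection<String> constants) {
        Set<String> kept = new TreeSet<>();
        for (String name : resourceNames) {
            for (String c : constants) {
                if (!c.isEmpty() && name.startsWith(c)) {
                    kept.add(name);   // may be referenced via getIdentifier, so keep it
                    break;
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> names = List.of("icon_home", "icon_user", "banner_1", "splash_bg");
        // A typical dynamic reference: getIdentifier("icon_" + tab, "drawable", packageName)
        System.out.println(keptByConstants(names, List.of("icon_"))); // [icon_home, icon_user]

        // If the constant pool contains every single letter a-z, every name matches a prefix.
        List<String> letters = new ArrayList<>();
        for (char ch = 'a'; ch <= 'z'; ch++) letters.add(String.valueOf(ch));
        System.out.println(keptByConstants(names, letters).size()); // 4
    }
}
```

A single constant such as "icon_" keeps only its own family of resources. Once single letters are present, the rule degenerates into keeping everything.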

Technically, then, resource shrinking can never be exact: false removals and missed removals are always possible. Google provides a whitelist mechanism (tools:keep) to fix false removals. It also provides a strict mode (tools:shrinkMode="strict"), which turns off the logic that keeps resources alive because of possible indirect (dynamic) references.

For apps without a heavy historical burden, turning this feature on early helps keep package size under control. Large, complex apps with a lot of history (Youku is one) are bound to contain many indirect (dynamic) references. Without strict mode the feature has almost no effect; with strict mode, auditing the existing resources and maintaining the whitelist is very costly. Youku therefore chose to build a standalone useless-resource detection feature. Combined with package size governance, it drives direct deletion of resources at the source, which both shortens resource processing time and reduces package size. Newly added useless resources are cleaned up on a non-real-time (deferrable) basis through the package size checkpoint.

1.5 A few interesting questions

Finally, here are a few interesting technical points that are easy to overlook.

A neglected member - id resources

As unique identifiers, id resources are the thread that ties the Android resource system together. The most common use is to give a view node in a layout an id, so that java code can easily get the view instance and work with it. Another example is the styleable above, where each enum value of an enum-format attr generates a corresponding id resource.

At compile time, a key property of id resources is that they can be reused globally, as the styleable example showed. At runtime, ids only need to be locally unique, for example within one layout or within one enum attr. Some readers may already see where this leads. Can we combine the two properties and keep only a minimal set of ids that still guarantees local uniqueness at runtime, with every definition and reference drawn from that set? Yes. The size of this minimal set equals the largest number of ids needed in any single local scope. For example:

<!-- styleable resource, defined in res/values/attrs.xml -->
<resources>
    <declare-styleable name="DeclareStyleable1" >
        <attr name="attr_enum" format="enum">
            <enum name="attrEnum1" value="1"/>
            <enum name="attrEnum2" value="2"/>
        </attr>
        <attr name="attr_integer" format="integer"/>
        <attr name="android:padding" format="dimension"/>
    </declare-styleable>
</resources>

<!-- layout resource, defined in res/layout/main.xml -->
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:layout_width="match_parent"
    android:layout_height="match_parent">

    <TextView
        android:id="@+id/main_textview"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:textColor="@color/purple_200"
        android:text="Hello World!"/>
</LinearLayout>

The code above generates 3 id resources in total: attrEnum1, attrEnum2 and main_textview. The styleable needs 2 ids and the layout needs 1, so the minimal id set only needs 2. If resource compilation could rewrite "@+id/main_textview" as "@+id/attrEnum1", one id resource would be saved. An app as complex as Youku has more than 13,000 id resources, while the largest number of ids needed in any single local scope is probably below 100. The space an id resource takes in the apk can be roughly equated with the length of its name. At a conservative average of 20 bytes, 13,000 ids would save about 250 KB of package size. Because the gain is small, this has not been implemented; it is left to readers as an interesting idea.
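Here is a sketch of the minimal id pool computation with the example's numbers. The class, its methods and the way scopes are given as sets of id names are all hypothetical. The sketch also deliberately ignores ids that java code references across scopes (for example, one R.id used in several layouts); a real implementation would have to rewrite those references consistently.

```java
import java.util.*;

/** Hypothetical sketch: collapse locally-unique ids into a minimal shared pool. */
public class IdPoolEstimator {
    /** Minimum pool size = the largest number of distinct ids needed by any single scope. */
    static int minPoolSize(List<Set<String>> scopes) {
        int max = 0;
        for (Set<String> scope : scopes) max = Math.max(max, scope.size());
        return max;
    }

    /** Map each scope's ids onto pool slots 0..k-1; slots are reused across scopes. */
    static List<Map<String, Integer>> assignSlots(List<Set<String>> scopes) {
        List<Map<String, Integer>> result = new ArrayList<>();
        for (Set<String> scope : scopes) {
            Map<String, Integer> mapping = new LinkedHashMap<>();
            for (String id : scope) mapping.put(id, mapping.size());
            result.add(mapping);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Set<String>> scopes = List.of(
                new LinkedHashSet<>(List.of("attrEnum1", "attrEnum2")), // enum values of attr_enum
                new LinkedHashSet<>(List.of("main_textview")));          // res/layout/main.xml
        Set<String> distinct = new HashSet<>();
        scopes.forEach(distinct::addAll);
        System.out.println(distinct.size() + " ids -> pool of " + minPoolSize(scopes)); // 3 ids -> pool of 2
        System.out.println(assignSlots(scopes)); // [{attrEnum1=0, attrEnum2=1}, {main_textview=0}]

        // The rough estimate from the text: ~13,000 ids at ~20 bytes per name
        long bytes = 13_000L * 20;
        System.out.println(bytes / 1024 + " KB"); // 253 KB
    }
}
```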

Bridge between resources and java code - the R class

By now readers should have a good idea of what the R class is. Let's consider a few scenarios.

First scenario: each module (subproject, flat aar, external aar) generates its own package.R.java, yet the content of each is a subset of the <app_package>.R class. So can we remove the R classes of all modules and use only the app's R class to reduce package size? Yes. In fact, Youku deletes every module R class during the apk build and rewrites references to them in java code into references to the app R class. This cuts apk size on the order of MB, and the more modules there are, the bigger the gain.
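The heart of that rewrite is a class-name mapping. Every reference to <module_package>/R or <module_package>/R$<type> is redirected to the app's R class with the same inner type. The sketch below covers only this mapping rule, and its names are hypothetical. It assumes that the app R is a superset of every module R, as described above. It also assumes that javac did not inline module R fields, which is why library R fields are generated as non-final. In a real build the rule would sit inside a bytecode rewriter, for example as the class-name mapping of an ASM Remapper.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: redirect module R class references to the app R class. */
public class RClassRemapper {
    // Internal class names such as "com/example/libraryaar1/R$string" or "com/example/libraryaar1/R"
    private static final Pattern R_CLASS = Pattern.compile("^(?:.+/)?R(\\$[a-z]+)?$");

    static String remap(String internalName, String appPackage) {
        Matcher m = R_CLASS.matcher(internalName);
        if (!m.matches()) return internalName;              // not an R class: leave untouched
        String inner = m.group(1) == null ? "" : m.group(1); // e.g. "$string", or "" for R itself
        return appPackage.replace('.', '/') + "/R" + inner;
    }

    public static void main(String[] args) {
        String app = "com.example.myapplication";
        System.out.println(remap("com/example/libraryaar1/R$string", app));    // com/example/myapplication/R$string
        System.out.println(remap("com/example/libraryaar1/R$styleable", app)); // com/example/myapplication/R$styleable
        System.out.println(remap("com/example/libraryaar1/Router", app));      // com/example/libraryaar1/Router
    }
}
```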

Second scenario: the R class is very simple. It only records the mapping from resource type/name to resource id value, and the references from the manifest and from other resources have already been converted to id values during compilation. So if every R.<type>.<name> reference in java code were replaced with its id value, could the R class be deleted? Yes. With the first optimization already in place, the extra gain is fairly limited, so no R&D effort has gone into it yet. But it can be done!

The Bai Xiaosheng of resources - resources.arsc

As the resource symbol (index) table, resources.arsc records the type, name and id value of every resource, plus its values under each configuration. It covers all Resource-type resources from the runtime point of view, which excludes styleable, since styleable exists only at compile time. At runtime, both java code and resources take a resource id to resources.arsc to fetch the resource value. It is therefore no exaggeration to call it the "Bai Xiaosheng" (the all-knowing chronicler of wuxia fiction) of resources. This lookup is very efficient, like getting the value for a given key from a hashmap.
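The toy model below restates the hashmap analogy in code. It is not the real resources.arsc format or the framework's lookup code; all names and id values are hypothetical. Resolving a value by id is a single map get. Resolving an id from type + name, as the next paragraph describes for Resources.getIdentifier, is modeled here as a search in the other direction.

```java
import java.util.*;

/** Toy model of the text's analogy; not the real arsc format or framework code. */
public class ToyResourceTable {
    private final Map<Integer, String> valueById = new HashMap<>();
    private final Map<Integer, String> keyById = new HashMap<>(); // id -> "type/name"

    void put(int id, String type, String name, String value) {
        valueById.put(id, value);
        keyById.put(id, type + "/" + name);
    }

    /** Forward lookup: id -> value, a single hash-map get. */
    String getValue(int id) { return valueById.get(id); }

    /** Reverse lookup: type + name -> id by scanning entries; 0 means "not found". */
    int getIdentifier(String name, String type) {
        String key = type + "/" + name;
        for (Map.Entry<Integer, String> e : keyById.entrySet()) {
            if (e.getValue().equals(key)) return e.getKey();
        }
        return 0;
    }

    public static void main(String[] args) {
        ToyResourceTable table = new ToyResourceTable();
        table.put(0x7f0b0001, "string", "app_name", "My App");
        table.put(0x7f0b0002, "string", "str_retry", "Retry");
        int id = table.getIdentifier("str_retry", "string");
        System.out.printf("0x%08x -> %s%n", id, table.getValue(id)); // 0x7f0b0002 -> Retry
    }
}
```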

When a resource id is obtained indirectly (dynamically) through Resources.getIdentifier, the type + name is used for a reverse lookup in resources.arsc. Once found, the id is used to fetch the resource value. This is like finding the key for a given value in a hashmap. Is there a more efficient way to reference resources flexibly at runtime? A natural idea is to read R.<type>.<name> through java reflection. Which then performs better, reflection or Resources.getIdentifier? The answer may not be a simple either/or. The cost may depend on the number of resources and on whether a resource type is being queried for the first time. This is left for readers to think about and verify.
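Below is one possible shape of the reflection alternative. It reads all fields of an R inner class once, on the first query for that type, and answers later queries from a cache. This makes the "first query per type" cost mentioned above explicit. The nested R class is a hypothetical stand-in for a generated one. On a real device this only works if R8/ProGuard keeps the R fields. The sketch does not claim to beat getIdentifier; that is the open question in the text.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;

/** Sketch: look up R.<type>.<name> by reflection, scanning each R inner class once. */
public class ReflectionLookup {
    // Hypothetical stand-in for a generated R class
    static final class R {
        static final class string {
            public static final int app_name = 0x7f0b0001;
            public static final int str_retry = 0x7f0b0002;
        }
    }

    private static final Map<Class<?>, Map<String, Integer>> CACHE = new HashMap<>();

    static int lookup(Class<?> rTypeClass, String name) {
        Map<String, Integer> fields = CACHE.computeIfAbsent(rTypeClass, ReflectionLookup::scan);
        return fields.getOrDefault(name, 0); // 0 means "not found", like getIdentifier
    }

    /** First query for a type: reflect over all static int fields once. */
    private static Map<String, Integer> scan(Class<?> c) {
        Map<String, Integer> m = new HashMap<>();
        for (Field f : c.getDeclaredFields()) {
            if (f.getType() == int.class && Modifier.isStatic(f.getModifiers())) {
                try {
                    m.put(f.getName(), f.getInt(null));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return m;
    }

    public static void main(String[] args) {
        System.out.printf("0x%08x%n", lookup(R.string.class, "str_retry")); // 0x7f0b0002
        System.out.println(lookup(R.string.class, "missing"));             // 0
    }
}
```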

2 Governance practice

As engineering modules and features grow, resource corruption gradually accumulates. Conflicts between same-named resources become more frequent, so resource values are no longer guaranteed to be the same across apk builds. Reference relationships become tangled: when code is deleted, its resources are often forgotten, or nobody dares to delete them, so useless resources keep piling up. A layout may reference a custom view whose java implementation class has been deleted, and the app then throws a java exception at runtime when that layout is inflated. Hardcoded text in resources brings online privacy compliance risks, or disputes over country/region, religious or cultural issues. These are real problems Youku has met in its long fight against resource "corruption". We built effective detection capabilities with dedicated tools. On that basis we set up a daily R&D checkpoint mechanism that ensures zero new problems while gradually working through the existing backlog.

When locating and troubleshooting problems, a basic need is to find out quickly which module a resource comes from. With so many second- and third-party modules and an increasingly modularized app project, getting this information has become more and more costly. So the first feature we built lists the resources each module contains. It shows quickly which module a target resource lives in (app project, subproject, flat aar or external dependency):

com.youku.android:aln:1.9.49
|-- string/m_mode
|-- layout/pager_last
|-- dimen/h_n_bar_pop_star
|-- asset/config/custom_config.json

com.youku.android:YHP:1.23.511.1
|-- layout/channel_list_footer
|-- layout/f_cover_s_feed_item
|-- drawable-night-xhdpi-v8/ic_ho
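One way to model this listing is a two-way index from module to resources and back. In the hypothetical sketch below, the (module, resource) pairs are assumed to be collected elsewhere, for example by walking each module's res/ and assets/ directories.

```java
import java.util.*;

/** Hypothetical sketch of the "resources per module" listing and its reverse lookup. */
public class ModuleResourceIndex {
    private final Map<String, Set<String>> byModule = new TreeMap<>();
    private final Map<String, Set<String>> byResource = new HashMap<>();

    void add(String module, String resource) {
        byModule.computeIfAbsent(module, k -> new LinkedHashSet<>()).add(resource);
        byResource.computeIfAbsent(resource, k -> new TreeSet<>()).add(module);
    }

    /** Reverse question: which module(s) does this resource come from? */
    Set<String> modulesOf(String resource) {
        return byResource.getOrDefault(resource, Collections.emptySet());
    }

    /** Render the per-module tree in the format shown above. */
    String render() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Set<String>> e : byModule.entrySet()) {
            sb.append(e.getKey()).append('\n');
            for (String res : e.getValue()) sb.append("|-- ").append(res).append('\n');
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ModuleResourceIndex index = new ModuleResourceIndex();
        index.add("com.youku.android:aln:1.9.49", "string/m_mode");
        index.add("com.youku.android:aln:1.9.49", "layout/pager_last");
        index.add("com.youku.android:YHP:1.23.511.1", "layout/channel_list_footer");
        System.out.print(index.render());
        System.out.println(index.modulesOf("layout/pager_last")); // [com.youku.android:aln:1.9.49]
    }
}
```

The same reverse index is also a natural starting point for spotting same-named resources that several modules provide.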

Next, the governance practice for each kind of resource "corruption" is explained in turn.

2.1 Conflicting resources

Conflicting resources are same-named resources from different modules whose values differ under the same configuration. Resource compilation keeps only one resource per name, and which one is kept can be considered "random" (it actually depends on module declaration order). As a result, the resource value can change from one apk build to the next. Conflicting resources bring uncertainty at runtime. The effects range from unexpected changes in text, sizes and UI colors to exceptions in severe cases.

In earlier Youku releases, conflicting resources caused a number of online crashes. To solve this stubborn problem, we first built a conflicting resource detection tool. Example results:

[conflict] drawable/al_down_arrow
|-- xhdpi-v4
|   |-- md5:cc2ef446bf586b03fd08332a5a75b304 (com.ali.user.sdk:au:4.10.6.18)
|   |-- md5:5f9c59ec3fba027c5783120effa12789 (com.ta.android:lo4android:4.10.6.18)

[conflict] string/str_retry
|-- en
|   |-- not calculated (com.ali.android.phone:bee-build:10.2.3.358)
|-- default
|   |-- 重试 (com.ali.android.phone:photo-build:10.2.3.57)
|   |-- 点击重试 (com.ali.android.phone:bee-build:10.2.3.358)

In these results, a conflict is possible when two or more modules provide a value for the same-named resource under the same configuration. In that case characteristic values are computed; otherwise the entry is shown as not calculated. Characteristic values are computed per resource type as follows:

Two ignore lists are also provided, one at resource-name granularity and one at module granularity, to temporarily exclude conflicts between certain second- and third-party modules. In addition, an option can stop the build when the check fails, which turns the check into a checkpoint.
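Taken together, the rules above can be sketched as follows. Definitions are grouped per configuration. A group with a single definition is skipped ("not calculated"), and a group whose definitions disagree is a conflict. Ignore lists and a fail-the-build flag are applied on top. The class is hypothetical, and the per-type characteristic values are assumptions: the MD5 of the bytes for file-based resources and the value text for value-based ones. The English values stand in for the example's strings.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.*;

/** Hypothetical sketch of conflicting resource detection with ignore lists and a checkpoint. */
public class ConflictDetector {
    static final class Definition {
        final String module, config, value;
        final byte[] file; // non-null for file-based resources
        Definition(String module, String config, String value, byte[] file) {
            this.module = module; this.config = config; this.value = value; this.file = file;
        }
    }

    /** Assumed characteristic value: MD5 for files, the text itself for values. */
    static String fingerprint(Definition d) {
        if (d.file == null) return d.value;
        try {
            StringBuilder sb = new StringBuilder("md5:");
            for (byte b : MessageDigest.getInstance("MD5").digest(d.file)) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** resources: "type/name" -> definitions across modules. Returns conflicting names, sorted. */
    static List<String> detect(Map<String, List<Definition>> resources, Set<String> ignoredNames,
                               Set<String> ignoredModules, boolean failOnConflict) {
        List<String> conflicts = new ArrayList<>();
        for (Map.Entry<String, List<Definition>> res : new TreeMap<>(resources).entrySet()) {
            if (ignoredNames.contains(res.getKey())) continue;
            Map<String, List<Definition>> byConfig = new HashMap<>();
            for (Definition d : res.getValue()) {
                if (!ignoredModules.contains(d.module)) {
                    byConfig.computeIfAbsent(d.config, k -> new ArrayList<>()).add(d);
                }
            }
            for (List<Definition> group : byConfig.values()) {
                if (group.size() < 2) continue; // a single definition: "not calculated"
                Set<String> prints = new HashSet<>();
                for (Definition d : group) prints.add(fingerprint(d));
                if (prints.size() > 1) { conflicts.add(res.getKey()); break; }
            }
        }
        if (failOnConflict && !conflicts.isEmpty()) {
            throw new IllegalStateException("Conflicting resources: " + conflicts); // the checkpoint
        }
        return conflicts;
    }

    public static void main(String[] args) {
        Map<String, List<Definition>> res = new HashMap<>();
        res.put("string/str_retry", List.of(
                new Definition("com.ali.android.phone:bee-build:10.2.3.358", "en", "Retry", null),
                new Definition("com.ali.android.phone:photo-build:10.2.3.57", "default", "Retry", null),
                new Definition("com.ali.android.phone:bee-build:10.2.3.358", "default", "Tap to retry", null)));
        res.put("drawable/icon", List.of(
                new Definition("module-a", "xhdpi-v4", null, "same".getBytes(StandardCharsets.UTF_8)),
                new Definition("module-b", "xhdpi-v4", null, "same".getBytes(StandardCharsets.UTF_8))));
        System.out.println(detect(res, Set.of(), Set.of(), false)); // [string/str_retry]
    }
}
```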

Youku built the first version of the conflicting resource detection tool in 2020, when there were more than 600 existing conflicting resources. Two rounds of dedicated cleanup with QA colleagues brought the number below 100. Since the checkpoint went live in early 2021, it has dropped to just over 40, mostly conflicts between second- and third-party modules:

Figure: Conflicting resource governance

Since it went live, the conflicting resource checkpoint has intercepted 13 cases. This has effectively kept conflicting resources from causing unexpected online behavior or even serious app crashes.

2.2 Useless resources

Section 1.2, "Resource references", covered the basics of how resources reference each other. In summary, resources can be referenced directly from three places:

  • java code. Via R.resourceType.resourceName, such as R.string.app_name, or directly via the resource id value, such as 0x7fxxxxxx;
  • Manifest file AndroidManifest.xml;
  • other resources.

Starting from java code and the manifest as reference roots, the resource reference graph is fully expanded. Resources that are never reached are useless. Resources referenced indirectly (dynamically) through Resources.getIdentifier are not part of this calculation, so each detection result must be checked for such references. We started from the useless-resource analysis logic in Google's official Android Gradle Plugin. We then broadly improved compatibility across project structures, Android Gradle Plugin versions and toolchain versions, covered references between more resource types, and added module attribution. The result is this useless resource detection feature.

Figure: Useless resource analysis

Example output of useless resource detection:

project:app:1.0
|-- array/planets_array
|-- color/white
|-- drawable/fake_drawable
|-- layout/layout_miss_view
|-- raw/app_resource_raw_chinese_text
|-- string/string_resource_chinese_name
|-- xml/app_resource_xml_chinese_text

project:library-aar-1:1.0
|-- layout/layout_contain_merge
|-- string/library_aar_1_name

The direct reference relationships between resources can also be included in the output:

Resource Reference Graph:
array:planets_array:2130771968 is reachable: false. The references =>

attr:attr_enum:2130837504 is reachable: true. The references =>
referenced by code : [com/example/libraryaar1/CustomImageView (project:library-aar-1:1.0)]
referenced by resource : [layout:layout_use_declare_styleable1:2131099652]

attr:attr_integer:2130837505 is reachable: true. The references =>
referenced by resource : [style:CustomTextStyle:2131361792]
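The reachability analysis behind this graph can be sketched as a breadth-first traversal. It starts from resources referenced by java code and the manifest, follows resource-to-resource references, and reports everything never reached. The class and the small graph are hypothetical. Like the tool itself, the sketch cannot see getIdentifier-style dynamic references, so its output still needs manual confirmation.

```java
import java.util.*;

/** Hypothetical sketch: useless resources = all resources minus those reachable from the roots. */
public class UnusedResourceFinder {
    static Set<String> unused(Set<String> all, Set<String> roots, Map<String, List<String>> refs) {
        Set<String> reachable = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>(roots);
        while (!queue.isEmpty()) {
            String res = queue.poll();
            if (!reachable.add(res)) continue;               // already visited
            queue.addAll(refs.getOrDefault(res, List.of())); // resources this one references
        }
        Set<String> result = new TreeSet<>(all);
        result.removeAll(reachable);
        return result;
    }

    public static void main(String[] args) {
        Set<String> all = Set.of("array/planets_array", "attr/attr_enum", "attr/attr_integer",
                "color/white", "layout/activity_main", "style/CustomTextStyle", "string/app_name");
        // Roots: referenced from java code (R.layout.activity_main, R.attr.attr_enum) or the manifest
        Set<String> roots = Set.of("layout/activity_main", "attr/attr_enum", "string/app_name");
        Map<String, List<String>> refs = Map.of(
                "layout/activity_main", List.of("style/CustomTextStyle"),
                "style/CustomTextStyle", List.of("attr/attr_integer"));
        System.out.println(unused(all, roots, refs)); // [array/planets_array, color/white]
    }
}
```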

Because indirect (dynamic) references can cause false positives, useless resources are not enforced by a checkpoint. Instead, they appear as a slimming item in the package size analysis results. When the feature launched in June 2020 there were 17,000 useless resources in total. The number has since dropped to 9,000, a remarkable result for backlog cleanup.

Figure: Useless resource governance

2.3 Missing class references

A layout can declare a custom view node. Suppose the class of this custom view is not in the apk's dex files. Because of how resource compilation works, the apk build still succeeds, but at runtime an exception is thrown as soon as the layout is inflated. We call this a missing class reference in resources.

Missing class reference detection lists each problematic resource, the module it belongs to, and the missing classes it references. Example results:

* [ignored] layout-xxxhdpi/layout_include_layout (project:library-aar-1:1.0)
|-- com.example.libraryaar1.NonExistCustomView

* layout/layout_miss_view (project:app:1.0, project:library-aar-1:1.0)
|-- com.example.myapplication.NonExistView2
|-- com.example.myapplication.NonExistView

A resource-name ignore list is also provided to temporarily exclude problematic resources in certain second- and third-party modules. In addition, an option can stop the build when the check fails, which turns the check into a checkpoint. The checkpoint has only just gone live and has not intercepted anything yet. There are 30 existing problematic resources, which have been assigned to the responsible R&D teams.

In fact, when AAPT processes each custom view node in a layout, it generates a keep rule for it. If the class is missing, that keep rule becomes a useless keep rule. Here is the example again:

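<?xml version="1.0" encoding="utf-8"?>
<!-- The layout references a class that does not exist. This does not fail the apk build,
     but a keep rule is still generated for it. Once this layout is inflated at runtime,
     a Java class-not-found exception is thrown. -->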
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent">

    <com.example.myapplication.NonExistView
        android:id="@+id/main_textview"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:text="Hello World!"/>

</LinearLayout>

<!-- Generated keep rule: -keep class com.example.myapplication.NonExistView { <init>(...); } -->

The useless keep rule checkpoint already fully covers missing class references in resources. The two checks look at the problem from different angles, however, so missing class reference detection is still offered as a separate capability.
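The core of the missing-class check can be sketched with the JDK's DOM parser. The sketch collects every fully qualified view tag (and the class attribute of a <view> tag) from a layout, then reports names not found in the set of classes present in the dex files. It is hypothetical throughout. The class set is assumed to be extracted from the dex files elsewhere, and other forms such as <fragment android:name="..."> are not handled.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical sketch: find custom view classes referenced by a layout but absent from the dex. */
public class MissingClassChecker {
    static List<String> missingClasses(String layoutXml, Set<String> classesInDex) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(layoutXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse layout", e);
        }
        NodeList elements = doc.getElementsByTagName("*");
        List<String> missing = new ArrayList<>();
        for (int i = 0; i < elements.getLength(); i++) {
            Element element = (Element) elements.item(i);
            String name = element.getTagName();
            if ("view".equals(name)) name = element.getAttribute("class"); // <view class="..."/>
            // Tags without a dot (TextView, LinearLayout, ...) are framework views: skip them.
            if (name.contains(".") && !classesInDex.contains(name) && !missing.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String layout = "<LinearLayout xmlns:android=\"http://schemas.android.com/apk/res/android\">"
                + "<com.example.myapplication.NonExistView android:id=\"@+id/main_textview\"/>"
                + "<TextView android:text=\"Hello World!\"/>"
                + "<view class=\"com.example.myapplication.RealView\"/>"
                + "</LinearLayout>";
        Set<String> classesInDex = Set.of("com.example.myapplication.RealView");
        System.out.println(missingClasses(layout, classesInDex)); // [com.example.myapplication.NonExistView]
    }
}
```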

2.4 Hardcoded text

Hardcoded text means string literals written directly into resources. Privacy compliance testing agencies treat certain sensitive texts found in an apk, such as "invoice title" or "ID card", as key points to suspect and verify for privacy compliance issues. Some of these texts are hardcoded in resources; other possible sources are java code and .so files. Hardcoded text has the following drawbacks:

  • Redundancy. When several resources use the same text, the text is stored several times;
  • Inflexibility. When something goes wrong online (for example during operational campaigns), the text is hard to change dynamically;
  • Low security. Sensitive information stored as plain hardcoded text is easy to extract and misuse.

For these problems we built a detection capability. It matches hardcoded text in resources against customizable regular expressions and aggregates the results first by module, then by resource. String literals in the following resource types are supported:

Taking detection of all Chinese characters as an example:

project:app:1.0
|-- array/planets_array
|   |-- [text] string-array包含的中文item
|-- raw/app_resource_raw_chinese_text
|   |-- [text]     <files-path name="我是raw类型xml资源文件中,包含的中文文本" path="game-bundles/" />
|-- string/string_resource_chinese_name
|   |-- [text] 我是中文string资源
|-- xml/app_resource_xml_chinese_text
|   |-- [text]     <files-path name="我是xml资源中的中文文本" path="game-bundles/" />
|-- layout/activity_main
|   |-- [text]         android:text="你好,世界!" />

project:library-aar-1:1.0
|-- asset/library_aar_1_asset_chinese_text
|   |-- [text] 我是包含中文文本的asset资源文件.
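The check can be sketched as a line-by-line regular-expression scan whose hits are grouped by module and then by resource, as in the report above. The pattern \p{IsHan} matches any Han (Chinese) character. The class name and input map are hypothetical, and the Chinese text is written as Unicode escapes.

```java
import java.util.*;
import java.util.regex.Pattern;

/** Hypothetical sketch: scan resource text with a configurable regex, grouped by module and resource. */
public class HardcodedTextScanner {
    static Map<String, Map<String, List<String>>> scan(
            Map<String, Map<String, String>> textByModuleAndResource, Pattern pattern) {
        Map<String, Map<String, List<String>>> report = new TreeMap<>();
        textByModuleAndResource.forEach((module, resources) -> resources.forEach((res, text) -> {
            for (String line : text.split("\n")) {
                if (pattern.matcher(line).find()) {
                    report.computeIfAbsent(module, k -> new TreeMap<>())
                          .computeIfAbsent(res, k -> new ArrayList<>())
                          .add(line.trim());
                }
            }
        }));
        return report;
    }

    public static void main(String[] args) {
        Pattern han = Pattern.compile("\\p{IsHan}"); // any Chinese character
        Map<String, Map<String, String>> input = Map.of("project:app:1.0", Map.of(
                "layout/activity_main", "<TextView\n    android:text=\"\u4f60\u597d\" />",
                "string/app_name", "Demo App"));
        System.out.println(scan(input, han).get("project:app:1.0").keySet()); // [layout/activity_main]
    }
}
```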

For Youku, detecting sensitive texts related to privacy compliance is still an ongoing area of exploration. Because there are no clear rules yet, it has not been put into actual use. In day-to-day development, however, it already helps a great deal whenever specific hardcoded text needs to be found.

2.5 Panorama of Governance

We have now built a fairly comprehensive and effective set of anti-corruption capabilities and governance for Android resources. Finally, here is the overall picture:

Figure: Panorama of resource governance

3 What else can be done

Android resources are not as varied and complex as java code, and the governance items above cover most resource corruption scenarios. Yet resources are easy to overlook in daily development. A string, a color or dimension value, an attribute value, a layout file: each seems "trivial", and defining one twice or forgetting to clean one up does not seem to matter much. This is exactly what makes resource corruption so insidious. Each resource is so "tiny" that the moment a developer's professional discipline slips a little, it slips through the net.

Cleaning up in batches is commendable, but it is even more commendable to keep a craftsman's attitude and produce less "corrupted" code in daily development. "A dike a thousand zhang long collapses through the holes of crickets and ants; a house a hundred chi tall burns from smoke escaping a crack in the chimney" (Han Feizi, Yu Lao). Let us all take this to heart.
