Author of this article: Shao Mai

Preface

The author previously shared part of an implementation of static checks for Android privacy compliance on the official account of the Cloud Music big front-end team:
Android privacy compliance static check

The previous article detected privacy method calls by decompiling the APP and scanning the result. That approach leaves two problems:

  • It cannot tell whether a privacy method is called from inside a so file.
  • When a full scan finds a privacy method call somewhere, we still don't know the actual entry point of that call.

Call in the so file

Some privacy methods are called from native code, which uses JNI reflection to execute Java-layer code. Scanning Java-layer files cannot find these calls, so so files need special treatment.

Let's sort out the requirements. An APP business party generally only needs to know whether certain privacy methods may be called from a so file, and which so file it is. The rest of the investigation can be left to the developers of that so.

With the requirements clear, how do we know whether a method is called in a so file? In Java, if a method is called by reflection, the class name and method name must be stored as string constants in the constant pool of the class file. Is there a similar storage mechanism in a so file?

The answer is yes. Strings in a Linux C program may be stored in the following two sections:

  • .text, the code segment, holds the program's executable code. Its size is fixed before the program runs, and it is usually read-only, although some architectures allow it to be writable so that a program can modify its own code. The code segment may also contain some read-only constants, such as string constants.
  • .rodata, the constant area (ro means read-only), holds constant data such as C string literals, including string literals introduced through #define macros.

We can extract the strings contained in a so file with the strings command:

strings xx.so

We check the strings of every so file in the apk. If any of them matches a configured privacy method name, that so file is marked as containing a suspicious call. The inspection process is as follows:
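The check described above can be sketched in Python. This is only a minimal illustration, not the author's tool: `PRIVACY_METHODS`, `extract_strings` and `scan_apk` are hypothetical names, and the regular expression merely approximates what the `strings` command does (runs of at least 4 printable ASCII characters).

```python
import re
import zipfile

# Hypothetical configuration: privacy-sensitive method names to look for.
PRIVACY_METHODS = ["getDeviceId", "getMacAddress"]

# Runs of 4 or more printable ASCII bytes, roughly what `strings` reports.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def extract_strings(data: bytes) -> list:
    """Return the printable ASCII runs found in a binary blob."""
    return [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(data)]


def scan_apk(apk, method_names=PRIVACY_METHODS) -> dict:
    """Map each suspicious .so entry in the APK to the method names it mentions.

    `apk` may be a file path or a binary file-like object.
    """
    suspicious = {}
    with zipfile.ZipFile(apk) as zf:
        for entry in zf.namelist():
            if not entry.endswith(".so"):
                continue
            strings = extract_strings(zf.read(entry))
            hits = sorted(
                name for name in set(method_names)
                if any(name in s for s in strings)
            )
            if hits:
                suspicious[entry] = hits
    return suspicious
```

A match only means the name appears as a string inside the so file, so each hit is a lead for the so developers to confirm rather than proof of a call.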

Refer to the demo diagram below for the check output result:

Method call chain analysis

We often don't know where a particular Android API is called. Usually we can only handle this at runtime, for example by hooking the method and replacing its implementation. But runtime checks cannot cover every scenario, so it is also necessary to statically analyze the method call chains of the apk. At the very least, we can see which class a sensitive method call originates from, which lets us trace the source and assign ownership.

Building on the technical scheme shared in the previous article, the author went on to analyze method call chains. As mentioned there, decompiling the apk produces smali files, and these files contain method call information. From this information we can reconstruct the method call relationships of the entire app.

Method collection

At the beginning of the smali file, information about the current class is marked:

.class public final Lokhttp3/OkHttp;
.super Ljava/lang/Object;

This gives us the modifiers and the complete type descriptor of the current class.

The .method directive in smali declares a method of the current class:

.method constructor <init>(Lokhttp3/Call$Factory;Lokhttp3/HttpUrl;Ljava/util/List;Ljava/util/List;Ljava/util/concurrent/Executor;Z)V

.method private validateServiceInterface(Ljava/lang/Class;)V

.method public baseUrl()Lokhttp3/HttpUrl;

Taking Retrofit as an example, the declarations above from Retrofit.smali describe:

  • A constructor whose parameters are Call.Factory, HttpUrl, List, List, Executor and boolean
  • A private method validateServiceInterface that takes a Class parameter and returns void
  • A public method baseUrl that takes no parameters and returns HttpUrl
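As an illustration of this collection step, here is a minimal Python sketch that pulls the class descriptor and method declarations out of smali text. `MethodDecl` and `collect_methods` are hypothetical names, and the sketch assumes one directive per line, as in the excerpts above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MethodDecl:
    owner: str        # class descriptor, e.g. "Lokhttp3/OkHttp;"
    name: str         # method name, e.g. "baseUrl" or "<init>"
    descriptor: str   # "(parameters)return" part of the declaration
    modifiers: tuple  # e.g. ("public",) or ("constructor",)


def collect_methods(smali_text: str) -> list:
    """Collect every .method declaration together with its owning class."""
    owner = None
    methods = []
    for raw in smali_text.splitlines():
        tokens = raw.split()
        if len(tokens) < 2:
            continue
        if tokens[0] == ".class":
            # The class descriptor is the last token, after the modifiers.
            owner = tokens[-1]
        elif tokens[0] == ".method" and owner is not None:
            # Everything between ".method" and the last token is a modifier.
            *modifiers, signature = tokens[1:]
            name, _, rest = signature.partition("(")
            methods.append(MethodDecl(owner, name, "(" + rest, tuple(modifiers)))
    return methods
```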

With this information we can collect every method in an APP. Each method needs a unique identity, which we build from the following fields:

  • The class in which the method is defined, as the complete package name + class name
  • The required parts of the method signature:

    • Method name
    • Parameter types

Method descriptors in smali follow the JVM descriptor format. We need to parse each descriptor so that its fields can be stored and displayed in the output.
In a descriptor, each type is represented by a symbol. The primitive types map as follows:

|Symbol|Type|
|---|---|
|V|void|
|Z|boolean|
|B|byte|
|S|short|
|C|char|
|I|int|
|J|long|
|F|float|
|D|double|

An object type is written as its complete package name and class name, starting with L, using / as the separator, and ending with a semicolon. For example, String is:

Ljava/lang/String;
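To make the descriptor rules concrete, here is a small Python sketch that turns a method descriptor into readable parameter and return types. The function names are hypothetical. It also handles the `[` array prefix, which is part of the standard descriptor syntax but is not covered in this article.

```python
PRIMITIVES = {
    "V": "void", "Z": "boolean", "B": "byte", "S": "short", "C": "char",
    "I": "int", "J": "long", "F": "float", "D": "double",
}


def parse_type(desc: str, pos: int = 0):
    """Parse one type starting at pos; return (readable name, next position)."""
    dims = 0
    while desc[pos] == "[":  # each "[" adds one array dimension
        dims += 1
        pos += 1
    symbol = desc[pos]
    if symbol == "L":
        end = desc.index(";", pos)  # object types run up to the semicolon
        name = desc[pos + 1:end].replace("/", ".")
        pos = end + 1
    elif symbol in PRIMITIVES:
        name = PRIMITIVES[symbol]
        pos += 1
    else:
        raise ValueError(f"unknown type symbol {symbol!r} in {desc!r}")
    return name + "[]" * dims, pos


def parse_method_descriptor(desc: str):
    """Split "(params)return" into (list of parameter types, return type)."""
    if not desc.startswith("("):
        raise ValueError(f"not a method descriptor: {desc!r}")
    params = []
    pos = 1
    while desc[pos] != ")":
        type_name, pos = parse_type(desc, pos)
        params.append(type_name)
    return_type, _ = parse_type(desc, pos + 1)
    return params, return_type
```

For example, `parse_method_descriptor("()Lokhttp3/HttpUrl;")` yields an empty parameter list and the return type `okhttp3.HttpUrl`.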

Method relationship establishment

After collecting all the methods, we need to know, for each method, which methods it calls and which methods call it, so that we can build the call chain.
In smali, the invoke- instructions inside a method body show which other methods it calls.

The invoke- instructions include:

  • invoke-direct calls a method directly, without virtual dispatch (private methods and constructors)
  • invoke-static calls a static method
  • invoke-virtual calls a virtual method
  • invoke-super calls the parent class's implementation of a virtual method
  • invoke-interface calls an interface method

invoke-interface only resolves the concrete receiver at runtime, and invoke-virtual may likewise dispatch to an override. Still, every invoke- instruction names the declared class and method being called, which is what we record:

invoke-virtual {v2, p2, v1}, Ljava/util/HashMap;->put(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;

The second half of an invoke- instruction describes the class and method being called, separated by ->. By parsing this part of the instruction, we get the complete information of the called method.
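A minimal Python sketch of this parsing step follows. `INVOKE_RE` and `parse_invoke` are hypothetical names. The pattern covers the five instructions listed above and their `/range` variants. Other forms, such as `invoke-polymorphic` and `invoke-custom`, are not handled because their operands look different.

```python
import re

INVOKE_RE = re.compile(
    r"^\s*invoke-(?P<kind>direct|static|virtual|super|interface)(?:/range)?\s+"
    r"\{(?P<registers>[^}]*)\},\s*"          # register list, e.g. {v2, p2, v1}
    r"(?P<owner>\S+?)->"                     # class descriptor before "->"
    r"(?P<name>[^(\s]+)"                     # method name
    r"(?P<descriptor>\(.*\)\S+)\s*$"         # "(parameters)return"
)


def parse_invoke(line: str):
    """Return (kind, owner, name, descriptor), or None for other lines."""
    match = INVOKE_RE.match(line)
    if match is None:
        return None
    return (
        match.group("kind"),
        match.group("owner"),
        match.group("name"),
        match.group("descriptor"),
    )
```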

We collect these calling relationships from all the smali files decompiled from the app, storing every method along the way. In addition to its own information, each stored method holds a list of the methods that call it:

  • calleds: the list of methods that call this method

When a method call is scanned, we add the caller to the calleds list of the called method and, if the two-way structure described further down is used, add the called method to the caller's callers list. The resulting method relationship is shown in the figure below:

The result is a directed graph shaped like a polytree. In this graph, the privacy methods whose call chains we want to check can be regarded as the leaf nodes.

We can also add a callers list to each method, holding the methods that it calls. This gives a structure in which the nodes are linked in both directions:
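The two-way structure can be sketched in Python as follows. The field names `calleds` and `callers` follow the article, while `MethodNode`, `build_graph` and the `(caller, callee)` pair input are assumptions made for illustration.

```python
class MethodNode:
    def __init__(self, key: str):
        self.key = key     # unique identity, e.g. "Lfoo/Bar;->baz()V"
        self.calleds = []  # methods that call this method
        self.callers = []  # methods that this method calls


def build_graph(calls) -> dict:
    """Build the two-way graph from (caller_key, callee_key) pairs."""
    nodes = {}

    def node(key):
        if key not in nodes:
            nodes[key] = MethodNode(key)
        return nodes[key]

    for caller_key, callee_key in calls:
        caller, callee = node(caller_key), node(callee_key)
        # Record each edge once, in both directions.
        if caller not in callee.calleds:
            callee.calleds.append(caller)
        if callee not in caller.callers:
            caller.callers.append(callee)
    return nodes
```

Keeping both lists costs some memory but lets the same graph be walked upward from a privacy method or downward from an entry point.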

With the two-way structure, we can start from a given method and analyze the chains that lead to it, or start from the top level and analyze every possible call chain from certain entry points.
For example, when we suspect non-compliant calls on certain pages, we can locate the classes of those Activities and search for privacy methods from the top down.
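The top-down search can be sketched like this in Python. It is only an illustration under simplifying assumptions: the graph is given as a plain dict that maps each method to the methods it calls (the callers list above), and `find_privacy_chains` and the method keys in the usage example are hypothetical.

```python
def find_privacy_chains(calls: dict, entry: str, privacy_methods: set) -> list:
    """Return every chain from `entry` down to a privacy method."""
    chains = []

    def walk(method, path):
        path = path + [method]  # copy, so sibling branches stay independent
        if method in privacy_methods:
            chains.append(path)
            return
        for callee in calls.get(method, []):
            if callee not in path:  # skip rings
                walk(callee, path)

    walk(entry, [])
    return chains


calls = {
    "LMainActivity;->onCreate": ["LUtil;->init", "LUtil;->log"],
    "LUtil;->init": ["LTelephonyManager;->getDeviceId", "LMainActivity;->onCreate"],
    "LUtil;->log": [],
}
chains = find_privacy_chains(
    calls, "LMainActivity;->onCreate", {"LTelephonyManager;->getDeviceId"}
)
```

Because the walk is recursive, a very deep call graph may need an explicit stack instead.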

Call chain traversal

Once the method call relationships are established, we need to traverse all the call chains and output them to the user. This part is relatively simple: a depth-first traversal finds every possible path:

There is one special case. During recursion, A may be called by B while B is in turn called by A, which means the graph contains a ring. So we need to detect rings.

When a node appears twice in the current call chain, we treat it as a ring. Recursion along this chain can stop right there, and doing so does not affect the compliance analysis of the chain.

This logic can be expressed in Kotlin-style pseudo code:

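class Method(val name: String) {
    // Methods that call this method (the "calleds" list described above)
    val calleds = mutableListOf<Method>()
}

// Returns every call chain that ends at the given method, outermost caller first
fun traversal(method: Method): List<List<Method>> {
    val paths = mutableListOf<List<Method>>()
    dfs(method, emptyList(), paths)
    return paths
}

fun dfs(method: Method, path: List<Method>, paths: MutableList<List<Method>>) {
    // Prepend, so the chain reads from the outermost caller down to the target
    val newPath = listOf(method) + path
    if (method.calleds.isEmpty()) {
        // Nobody calls this method: the chain is complete
        paths.add(newPath)
        return
    }
    for (called in method.calleds) {
        if (called in newPath) {
            // Ring detected: record the chain and stop recursing along it
            paths.add(newPath)
        } else {
            dfs(called, newPath, paths)
        }
    }
}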

The final effect of the call chain analysis is as follows:

Summary

That wraps up this sharing on statically checking Android privacy compliance calls, but there is much more that can be done for privacy compliance.
Static inspection only helps us locate and examine possible problems. There are still many runtime monitoring solutions to explore, and the two approaches work better when they complement each other.

This article was published by the big front-end team of NetEase Cloud Music. Any form of reprinting of the article is prohibited without authorization. We recruit front-end, iOS, and Android engineers all year round. If you are ready to change jobs and you happen to like Cloud Music, join us at grp.music-fe(at)corp.netease.com!
