Author of this article: Shao Mai
Domestic regulation of application security and privacy is becoming increasingly strict, and app stores are inspecting app releases more rigorously. This year Cloud Music also launched some overseas social products on Google Play, which applies its own policies when reviewing apps. Whenever a problem is flagged, we have to check the relevant code logic based on the information the reviewer provides.
This is a relatively inefficient process. Developers rarely call sensitive APIs in their own code. Most sensitive API calls sit inside third-party SDKs, or an API that is not especially sensitive ends up being called many times. For example:
- One application was published on Google Play. The base SDK inside Cloud Music bundles some domestic third-party SDKs that use hot fixes or dynamic delivery of .so files, and the application was rejected by Google Play.
- During a third-party inspection, one application was found to be requesting the geographic location every 30 seconds.
To keep such problems from delaying app launches, and to make the checks more accurate and efficient, the author developed a static checking tool for sensitive method calls in Android APKs.
Take keyword checks as an example. For sensitive API calls, such as those related to oaid and androidId, we only need to detect certain keywords from those APIs and find where the methods are called directly across the whole app.
For the scenarios above, the tool works in two directions:
- APK scan. Find every place in the APK that directly calls an API containing the keywords above.
- Runtime check. For scenarios such as frequent calls at runtime, we still need help checking when specific APIs are called while the app is running.
Tool solution
Runtime
Runtime detection needs to know when a method is called. If every detected call comes with a call stack, it is much easier to fix the offending scenario. Here we use a Gradle plugin that, during the transform phase, inserts a line of code that prints the call stack into each method we want to detect.
Javassist can be used to insert the call-stack print into each matching method:
// method is a javassist.CtMethod; the log tag below means "privacy method call stack"
method.insertBefore(
"android.util.Log.e(\"隐私方法调用栈\", android.util.Log.getStackTraceString(new Throwable()));"
)
APK scan
The idea behind APK scanning is simple. Our goal is to check all of the code, but at this point we only have an APK file. The most direct approach is to convert the package back into Java code, scan that code line by line, and check whether it contains sensitive API calls.
How would we normally inspect the code inside an APK? The easiest way is to decompile it: convert the dex files inside to Java and read the result. We can script that same process.
- Unzip the APK file and take out the dex files inside.
- Use dex2jar to convert the dex files into jar files.
- Convert the jar files to Java files.
- Scan the Java files line by line.
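As a rough illustration of the first step only, the sketch below pulls the dex files out of an APK with the standard `java.util.zip` API. The class and method names (`DexExtractor`, `isDexEntry`, `extractDexFiles`) are made up for this example and are not the tool's actual code. It assumes the dex files sit at the root of the archive as `classes.dex`, `classes2.dex`, and so on.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of the original tool.
final class DexExtractor {

    /** True for root-level entries named classes.dex, classes2.dex, ... */
    static boolean isDexEntry(String entryName) {
        // String.matches tests the whole name, so entries in subdirectories never match.
        return entryName.matches("classes\\d*\\.dex");
    }

    /** Copies every dex entry of the APK into outDir and returns the written paths. */
    static List<Path> extractDexFiles(Path apk, Path outDir) throws IOException {
        List<Path> written = new ArrayList<>();
        Files.createDirectories(outDir);
        // An APK is an ordinary zip archive, so ZipFile can read it directly.
        try (ZipFile zip = new ZipFile(apk.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory() || !isDexEntry(entry.getName())) {
                    continue;
                }
                Path target = outDir.resolve(entry.getName());
                try (InputStream in = zip.getInputStream(entry)) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
                written.add(target);
            }
        }
        return written;
    }
}
```

The extracted dex files would then be handed to dex2jar, which is a separate command-line step and is not shown here.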
So how do we convert jar files into Java files? When we open a jar or aar in Android Studio we see Java source, so we can borrow Android Studio's approach.
In IDEA, we can locate the jar that this decompilation feature depends on, or clone the IDEA source code and build a jar from the relevant module ourselves.
The workflow of the scanning tool is shown in the figure:
Multi-APP configuration
Cloud Music currently has many apps under its umbrella, and different apps may have different scanning types and different keyword rules.
The configuration is as follows:
Each app currently has up to two configurations: gp.json and privacy.json, which correspond to Google Play scanning and privacy compliance scanning respectively.
Each configuration contains:
- keys: the keywords to scan for.
- filterPackages: the package names to filter out. If we only care whether third-party SDKs contain non-compliant code, we can filter out our own package names to avoid producing too many results.
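To make the roles of the two fields concrete, here is a minimal Java sketch of how a loaded configuration might be applied to a single call site. The article does not show the JSON schema or the matching rules, so everything here is an assumption: the `ScanConfig` class, plain substring matching for `keys`, and package-prefix matching for `filterPackages`.

```java
import java.util.List;

// Hypothetical in-memory form of gp.json / privacy.json; the real schema is not shown in the article.
final class ScanConfig {
    final List<String> keys;            // keywords to look for in the called method
    final List<String> filterPackages;  // caller packages to leave out of the report

    ScanConfig(List<String> keys, List<String> filterPackages) {
        this.keys = keys;
        this.filterPackages = filterPackages;
    }

    /** True when the calling class is in (or below) one of the filtered packages. */
    boolean isFiltered(String callerClass) {
        for (String pkg : filterPackages) {
            // Assumption: prefix match on whole package segments.
            if (callerClass.equals(pkg) || callerClass.startsWith(pkg + ".")) {
                return true;
            }
        }
        return false;
    }

    /** True when the called signature contains at least one keyword. */
    boolean matchesKey(String calledSignature) {
        for (String key : keys) {
            if (calledSignature.contains(key)) {
                return true;
            }
        }
        return false;
    }

    /** A call site is reported only if it matches a keyword and its caller is not filtered. */
    boolean shouldReport(String callerClass, String calledSignature) {
        return !isFiltered(callerClass) && matchesKey(calledSignature);
    }
}
```

Matching on whole package segments means that filtering `com.example.app` would not accidentally hide a third-party package such as `com.example.application`.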
The scan outputs a JSON file and an HTML file. The JSON file can be compared with the previous scan result so that only newly added findings are reported. The HTML file displays the results and helps the person investigating to troubleshoot the related issues.
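The incremental comparison can be thought of as a set difference between two scans. The sketch below assumes each finding has already been reduced to a stable string key (for example caller plus called method). The article does not describe the JSON layout, so `ScanDiff` and the key format are purely illustrative.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: findings are represented as plain string keys.
final class ScanDiff {

    /** Returns the findings present in the current scan but absent from the previous one, in scan order. */
    static Set<String> newFindings(List<String> previous, List<String> current) {
        Set<String> added = new LinkedHashSet<>(current); // keeps the order of the current scan
        added.removeAll(previous);
        return added;
    }
}
```

If the key included source line numbers, an unrelated code change that shifts lines would make old findings look new, so a key built from class and method names is probably the safer choice.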
For example, dynamically delivered technologies such as hot fixes involve keywords such as getField and ClassLoader. We can find the places that call these APIs directly. From the figure below, we can see that many of the calls are in third-party SDKs.
Optimization
After the first version of the compliance scan was developed, some problems remained in practice:
- Runtime: checking methods in our own code is easy, but the approach does not work for system APIs. The Android Framework is not packaged into the app, so there is no bytecode to insert into.
- APK scanning: converting jar files to Java is very time-consuming. The overall scan takes 3 to 5 minutes.
- Converting jar files to Java is really a decompilation process. Because of syntax differences between Java and Kotlin, some classes fail to decompile. If there are too many such failures, the scan misses calls.
- Scanning Java files line by line also picks up keywords in other places, such as fields and imports. Those results are redundant.
Targeted optimizations were made for these problems.
To detect system API calls at runtime, two solutions come to mind:
- While the transform processes each class and jar file, check whether the methods in each class call the system API. This depends on support from the bytecode manipulation library.
- Use a dedicated phone and a framework such as Xposed to hook the system API.
The second option is relatively costly to implement and is not suitable. Fortunately, Javassist, which we already use, supports the first approach.
In Javassist, CtMethod inherits from CtBehavior, which provides an instrument method. This method visits the expressions in the method body and allows them to be replaced. The expressions it visits include MethodCall.
In this way, once we have found all such calls through this method, we can insert the call-stack print into each method that directly calls the system API.
The runtime check becomes:
After this optimization, notice that the compile-time method scan works by reading class files directly. We can take a similar approach for APKs: read the class files in the jar produced by dex2jar in the same way.
But on reflection, Android compiles class files further into dex files, and the Android virtual machine executes the dex files directly. A dex file is essentially just a binary format, and the instructions encoded in it are what ultimately gets executed.
The idea is now clear: if we disassemble the dex file directly into smali files, traversing the smali files may work better.
Introduction to smali syntax
A smali file corresponds to a Java class. More precisely, it corresponds to one .class file, that is, one class definition inside a dex file.
Inner classes are named in the format ClassName$InnerClassA and ClassName$InnerClassB.
The primitive types in smali correspond to Java's primitive types, as shown in the following table:
Type keywords | Java basic types |
---|---|
V | void |
Z | boolean |
B | byte |
S | short |
C | char |
I | int |
J | long |
F | float |
D | double |
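The table can be turned into a small lookup. The Java sketch below converts one smali type descriptor into Java notation. Besides the primitives above, it also handles object types written as `Lpackage/Name;` (which appear in the smali example later in this article) and the `[` array prefix, which the article does not cover but which is part of the standard dex descriptor syntax. `SmaliTypes` is an illustrative name, not part of the tool.

```java
// Hypothetical helper for reading smali type descriptors.
final class SmaliTypes {

    /** Converts one descriptor such as "I", "[J" or "Ljava/lang/String;" to Java notation. */
    static String toJavaType(String descriptor) {
        // Each leading '[' adds one array dimension.
        int dims = 0;
        while (dims < descriptor.length() && descriptor.charAt(dims) == '[') {
            dims++;
        }
        String element = descriptor.substring(dims);
        String name;
        switch (element) {
            case "V": name = "void"; break;
            case "Z": name = "boolean"; break;
            case "B": name = "byte"; break;
            case "S": name = "short"; break;
            case "C": name = "char"; break;
            case "I": name = "int"; break;
            case "J": name = "long"; break;
            case "F": name = "float"; break;
            case "D": name = "double"; break;
            default:
                // Object types look like "Ljava/lang/String;".
                if (element.length() > 2 && element.startsWith("L") && element.endsWith(";")) {
                    name = element.substring(1, element.length() - 1).replace('/', '.');
                } else {
                    throw new IllegalArgumentException("Not a type descriptor: " + descriptor);
                }
        }
        StringBuilder java = new StringBuilder(name);
        for (int i = 0; i < dims; i++) {
            java.append("[]");
        }
        return java.toString();
    }
}
```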
Some common smali directives are as follows:
Directive | Meaning |
---|---|
.class | Package name and class name |
.super | Parent class |
.source | Source file name |
.implements | Implemented interface |
.field | Field |
.method | Start of method |
.end method | End of method |
.line | Source line number |
.param | Method parameter |
.annotation | Start of annotation |
.end annotation | End of annotation |
Method calls use one of the following instructions:
Instruction | Meaning |
---|---|
invoke-virtual | Call a virtual method |
invoke-static | Call a static method |
invoke-direct | Call a method that cannot be overridden, such as a private method or a constructor |
invoke-super | Call the parent class's method |
invoke-interface | Call an interface method |
Let's look at the format of a smali file:
.class public abstract Lcom/horcrux/svg/RenderableView;
.super Lcom/horcrux/svg/VirtualView;
.source "RenderableView.java"
# static fields
.field private static final CAP_BUTT:I = 0x0
.field static final CAP_ROUND:I = 0x1
# instance fields
.field public fillOpacity:F
.field public fillRule:Landroid/graphics/Path$FillType;
.method static constructor <clinit>()V
.registers 1
.line 97
invoke-static {v0}, Ljava/util/regex/Pattern;->compile(Ljava/lang/String;)Ljava/util/regex/Pattern;
return-void
.end method
.method resetProperties()V
.registers 4
.line 635
invoke-virtual {p0}, Ljava/lang/Object;->getClass()Ljava/lang/Class;
invoke-virtual {v1, v2}, Ljava/lang/Class;->getField(Ljava/lang/String;)Ljava/lang/reflect/Field;
return-void
.end method
The first three lines of a smali file give the class name, parent class, and source file name.
The class name of this file is com.horcrux.svg.RenderableView, the parent class is com.horcrux.svg.VirtualView, and the source file name is RenderableView.java.
The fields and methods inside are delimited by their own markers.
Inside a .method block, we can see that:
- Lines beginning with .line mark the source line number.
- Lines beginning with invoke- mark method calls.
The above example includes two methods:
- The static initializer (<clinit>). Line 97 calls a static method.
- The resetProperties method. On line 635, the two virtual methods getClass() and getField() are called.
The corresponding java code is:
// line 635
Field field = this.getClass().getField((String)this.mLastMergedList.get(i));
Here we can basically define how to scan a smali file:
- Read the smali file line by line. The first three lines give the basic information of the class.
- A .method line, up to the matching .end method line, marks the method currently being read.
- A .line line, up to the next .line line, marks the source line number currently being read within the method.
- A line beginning with invoke- marks a method call. If the method signature at the end of that line matches one of our keywords, it is recorded as a scan result.
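These rules can be sketched as a small state machine. The Java code below is an illustration under simplifying assumptions rather than the tool's real implementation. It takes the lines of one smali file that are already in memory, treats keywords as plain substrings of the called signature, and takes the called signature to be the last space-separated token of an `invoke-` line. `SmaliScanner` and `Hit` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical scanner for the lines of a single smali file.
final class SmaliScanner {

    /** One finding: where the call happens and what is called. */
    static final class Hit {
        final String className;
        final String method;
        final int line;       // -1 when no .line directive preceded the call
        final String callee;

        Hit(String className, String method, int line, String callee) {
            this.className = className;
            this.method = method;
            this.line = line;
            this.callee = callee;
        }

        @Override
        public String toString() {
            return className + "#" + method + ":" + line + " -> " + callee;
        }
    }

    static List<Hit> scan(List<String> smaliLines, List<String> keys) {
        List<Hit> hits = new ArrayList<>();
        String className = null;
        String method = null; // non-null only between .method and .end method
        int line = -1;
        for (String raw : smaliLines) {
            String s = raw.trim();
            if (s.startsWith(".class ")) {
                // ".class public abstract Lcom/foo/Bar;" -> "com.foo.Bar"
                String desc = s.substring(s.lastIndexOf(' ') + 1);
                className = desc.substring(1, desc.length() - 1).replace('/', '.');
            } else if (s.startsWith(".method ")) {
                method = s.substring(s.lastIndexOf(' ') + 1); // e.g. "resetProperties()V"
                line = -1;
            } else if (s.equals(".end method")) {
                method = null;
            } else if (method != null && s.startsWith(".line ")) {
                line = Integer.parseInt(s.substring(".line ".length()).trim());
            } else if (method != null && s.startsWith("invoke-")) {
                // The called signature is the last token of the instruction.
                String callee = s.substring(s.lastIndexOf(' ') + 1);
                for (String key : keys) {
                    if (callee.contains(key)) {
                        hits.add(new Hit(className, method, line, callee));
                        break; // one hit per call site is enough
                    }
                }
            }
        }
        return hits;
    }
}
```

Because only `invoke-` lines inside a method are inspected, fields and other declarations that merely contain a keyword are never reported, which avoids the redundant matches of the line-by-line Java scan.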
In practice, we can use the open-source baksmali.jar to convert dex files to smali, then scan the smali files directly with the rules above. This avoids the defects mentioned earlier and greatly shortens the scan. The whole scan basically completes in about half a minute, because the lengthy decompilation of the jar is skipped.
The tool is ultimately delivered as a jar file that runs from the command line. It is very useful when troubleshooting suspicious API calls for privacy compliance.
Summary
With this tool, we can obtain a relatively complete list of suspicious calls to help with compliance work when checking APK privacy compliance issues.
The advantages of this tool are:
- The APK is the final artifact, so the scanned content is relatively complete.
- The scan does not run at compile time, so it does not slow down development.
But the tool also has some shortcomings:
- It cannot pinpoint which module or aar a privacy-related call belongs to, which makes it hard to integrate into CI/CD for attribution.
- It is harder to obtain a complete call chain.
Therefore, we will continue to work on compile-time compliance checks and combine the two approaches to improve this work.
This article was published by the big front-end team of NetEase Cloud Music. Reprinting in any form is prohibited without authorization. We recruit front-end, iOS, and Android engineers all year round. If you are ready to change jobs and happen to like Cloud Music, join us at grp.music-fe(at)corp.netease.com!