I. Overview
Solving ANR has always been an important skill that Android developers need to master, generally starting from three aspects.
Development stage: Check the time-consuming and stall conditions of each method through the tool, and find one modification and one modification.
Online stage: This stage mainly relies on monitoring tools to discover and report ANRs, such as matrix.
Analysis stage: If an online user has an ANR and you get a log, this involves the content to be shared in this article-ANR log analysis techniques.
2. ANR generation mechanism
Answers to a popular interview on the Internet
ANR-the application is not responding, Activity is 5 seconds, BroadCastReceiver is 10 seconds, and Service is 20 seconds.
This sentence is very general. If you want to analyze and locate ANR in depth, you need to know more knowledge points. Generally speaking, ANR is divided into 4 categories according to the generation mechanism:
2.1 Input event timeout (5s)
InputEvent Timeout
a.InputDispatcher发送key事件给 对应的进程的 Focused Window ,对应的window不存在、处于暂停态、或通道(input channel)占满、通道未注册、通道异常、或5s内没有处理完一个事件,就会发生ANR
b.InputDispatcher发送MotionEvent事件有个例外之处:当对应Touched Window的 input waitQueue中有超过0.5s的事件,inputDispatcher会暂停该事件,并等待5s,如果仍旧没有收到window的‘finish’事件,则触发ANR
c.下一个事件到达,发现有一个超时事件才会触发ANR
2.2 Broadcast type timeout (15s in the foreground, 60s in the background)
BroadcastReceiver Timeout
a.静态注册的广播和有序广播会ANR,动态注册的非有序广播并不会ANR
b.广播发送时,会判断该进程是否存在,不存在则创建,创建进程的耗时也算在超时时间里
c.只有当进程存在前台显示的Activity才会弹出ANR对话框,否则会直接杀掉当前进程
d.当onReceive执行超过阈值(前台15s,后台60s),将产生ANR
e.如何发送前台广播:Intent.addFlags(Intent.FLAG_RECEIVER_FOREGROUND)
2.3 Service timeout (20s in the foreground, 200s in the background)
Service Timeout
a.Service的以下方法都会触发ANR:onCreate(),onStartCommand(), onStart(), onBind(), onRebind(), onTaskRemoved(), onUnbind(),
onDestroy().
b.前台Service超时时间为20s,后台Service超时时间为200s
c.如何区分前台、后台执行————当前APP处于用户态,此时执行的Service则为前台执行。
d.用户态:有前台activity、有前台广播在执行、有foreground service执行
2.4 ContentProvider type
a.ContentProvider创建发布超时并不会ANR
b.使用ContentProviderclient来访问ContentProverder可以自主选择触发ANR,超时时间自己定
client.setDetectNotResponding(PROVIDER_ANR_TIMEOUT);
ps : Will activity life cycle timeout be ANR? ——After testing, it does not.
override fun onCreate(savedInstanceState: Bundle?) {
Thread.sleep(60000)
super.onCreate(savedInstanceState)
setContentView(R.layout.activity_main)
}
3. Reasons for ANR
Many developers believe that it is time-consuming operations that lead to ANR, all of which are problems at the app application layer. In fact, most of the ANR in the online environment is caused by system reasons.
3.1 Application layer causes ANR (time-consuming operation)
a. 函数阻塞:如死循环、主线程IO、处理大数据
b. 锁出错:主线程等待子线程的锁
c. 内存紧张:系统分配给一个应用的内存是有上限的,长期处于内存紧张,会导致频繁内存交换,进而导致应用的一些操作超时
3.2 The system causes ANR
a. CPU被抢占:一般来说,前台在玩游戏,可能会导致你的后台广播被抢占CPU
b. 系统服务无法及时响应:比如获取系统联系人等,系统的服务都是Binder机制,服务能力也是有限的,有可能系统服务长时间不响应导致ANR
c. 其他应用占用的大量内存
Four, analysis log
When an ANR occurs, the system will generate an anr log file (under the /data/anr directory of the mobile phone, the file name may vary from manufacturer to manufacturer, most of which are called trace files in the industry), which contain the following important information.
4.1 CPU load
Load: 2.62 / 2.55 / 2.25
CPU usage from 0ms to 1987ms later (2020-03-10 08:31:55.169 to 2020-03-10 08:32:17.156):
41% 2080/system_server: 28% user + 12% kernel / faults: 76445 minor 180 major
26% 9378/com.xiaomi.store: 20% user + 6.8% kernel / faults: 68408 minor 68 major
........省略N行.....
66% TOTAL: 20% user + 15% kernel + 28% iowait + 0.7% irq + 0.7% softirq
As shown above:
- First line: the average number of active processes that are using and waiting to use the CPU in 1, 5, and 15 minutes
- The second line: indicates that the load information is captured 0~1987ms after the occurrence of ANR. At the same time, it also pointed out the time point of ANR: 2020-03-10 08:31:55.169
- The middle part: the details of the CPU occupied by each process
- The last line: the total CPU information occupied by each process.
Glossary:
a. user:用户态,kernel:内核态
b. faults:内存缺页,minor——轻微的,major——重度,需要从磁盘拿数据
c. iowait:IO使用(等待)占比
d. irq:硬中断,softirq:软中断
note:
- The high proportion of iowait means that it is very likely that ANR is caused by the time-consuming io. For details, check whether there are more process faults major.
- The load of a single-process CPU is not limited to 100%, but there are several hundred per cent if there are several cores, for example, the upper limit of 4 cores is 400%.
4.2 Memory information
Total number of allocations 476778 进程创建到现在一共创建了多少对象
Total bytes allocated 52MB 进程创建到现在一共申请了多少内存
Total bytes freed 52MB 进程创建到现在一共释放了多少内存
Free memory 777KB 不扩展堆的情况下可用的内存
Free memory until GC 777KB GC前的可用内存
Free memory until OOME 383MB OOM之前的可用内存
Total memory 当前总内存(已用+可用)
Max memory 384MB 进程最多能申请的内存
It can be concluded from the meaning: Free memory until OOME the value of 160c70c8cba7d0 is very small, it is already in a memory tight state. The application may be consuming too much memory.
In addition, in addition to the memory information in the trace file, there is also memory information in the ordinary eventlog log (not necessarily printed)
04-02 22:00:08.195 1531 1544 I am_meminfo: [350937088,41086976,492830720,427937792,291887104]
The above four values refer to:
- Cached
- Free,
- Zram,
- Kernel,Native
The memory of Cached+Free represents the current available memory of the entire mobile phone. If the value is small, it means that it is in a memory tight state. Generally, the threshold for low memory is: 4G memory mobile phone, the following threshold: 350MB, the above threshold is: 450MB
ps: If onTrimMemory is printed in the log before and after the ANR time point, it can also be used as a reference judgment for memory shortage
4.3 Stack Message
Stack information is the most important information, showing the current state of all threads of the process where ANR occurs.
suspend all histogram: Sum: 2.834s 99% C.I. 5.738us-7145.919us Avg: 607.155us Max: 41543us
DALVIK THREADS (248):
"main" prio=5 tid=1 Native
| group="main" sCount=1 dsCount=0 flags=1 obj=0x74b17080 self=0x7bb7a14c00
| sysTid=2080 nice=-2 cgrp=default sched=0/0 handle=0x7c3e82b548
| state=S schedstat=( 757205342094 583547320723 2145008 ) utm=52002 stm=23718 core=5 HZ=100
| stack=0x7fdc995000-0x7fdc997000 stackSize=8MB
| held mutexes=
kernel: __switch_to+0xb0/0xbc
kernel: SyS_epoll_wait+0x288/0x364
kernel: SyS_epoll_pwait+0xb0/0x124
kernel: cpu_switch_to+0x38c/0x2258
native: #00 pc 000000000007cd8c /system/lib64/libc.so (__epoll_pwait+8)
native: #01 pc 0000000000014d48 /system/lib64/libutils.so (android::Looper::pollInner(int)+148)
native: #02 pc 0000000000014c18 /system/lib64/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+60)
native: #03 pc 0000000000127474 /system/lib64/libandroid_runtime.so (android::android_os_MessageQueue_nativePollOnce(_JNIEnv*, _jobject*, long, int)+44)
at android.os.MessageQueue.nativePollOnce(Native method)
at android.os.MessageQueue.next(MessageQueue.java:330)
at android.os.Looper.loop(Looper.java:169)
at com.android.server.SystemServer.run(SystemServer.java:508)
at com.android.server.SystemServer.main(SystemServer.java:340)
at java.lang.reflect.Method.invoke(Native method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:536)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:856)
........省略N行.....
"OkHttp ConnectionPool" daemon prio=5 tid=251 TimedWaiting
| group="main" sCount=1 dsCount=0 flags=1 obj=0x13daea90 self=0x7bad32b400
| sysTid=29998 nice=0 cgrp=default sched=0/0 handle=0x7b7d2614f0
| state=S schedstat=( 951407 137448 11 ) utm=0 stm=0 core=3 HZ=100
| stack=0x7b7d15e000-0x7b7d160000 stackSize=1041KB
| held mutexes=
at java.lang.Object.wait(Native method)
- waiting on <0x05e5732e> (a com.android.okhttp.ConnectionPool)
at com.android.okhttp.ConnectionPool$1.run(ConnectionPool.java:103)
- locked <0x05e5732e> (a com.android.okhttp.ConnectionPool)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
at java.lang.Thread.run(Thread.java:764)
As shown in the above log, this article screenshots two thread information, one is the main thread, and its status is native.
The other is OkHttp ConnectionPool, and its status is TimeWaiting. As we all know, textbooks say that there are five types of thread states: new, ready, executing, blocking, and dead . There are 6 kinds of thread states in Java, all of which are defined in: java.lang.Thread.State
question is here, what is the native state of the above main thread and where did it come from? In fact, the status in the trace file is the status defined in the CPP code. The following is a corresponding table.
It can be seen that the native state of the main function is executing the JNI function. Stack information is the first important information we analyze ANR, generally speaking:
The main thread is in the state of BLOCK, WAITING, and TIMEWAITING, which is basically a function block leading to ANR;
If the main thread is normal, you should check the CPU load and memory environment.
Five, typical case analysis
5.1 The main thread is not stuck, and the stack is in a normal state
"main" prio=5 tid=1 Native
| group="main" sCount=1 dsCount=0 flags=1 obj=0x74b38080 self=0x7ad9014c00
| sysTid=23081 nice=0 cgrp=default sched=0/0 handle=0x7b5fdc5548
| state=S schedstat=( 284838633 166738594 505 ) utm=21 stm=7 core=1 HZ=100
| stack=0x7fc95da000-0x7fc95dc000 stackSize=8MB
| held mutexes=
kernel: __switch_to+0xb0/0xbc
kernel: SyS_epoll_wait+0x288/0x364
kernel: SyS_epoll_pwait+0xb0/0x124
kernel: cpu_switch_to+0x38c/0x2258
native: #00 pc 000000000007cd8c /system/lib64/libc.so (__epoll_pwait+8)
native: #01 pc 0000000000014d48 /system/lib64/libutils.so (android::Looper::pollInner(int)+148)
native: #02 pc 0000000000014c18 /system/lib64/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+60)
native: #03 pc 00000000001275f4 /system/lib64/libandroid_runtime.so (android::android_os_MessageQueue_nativePollOnce(_JNIEnv*, _jobject*, long, int)+44)
at android.os.MessageQueue.nativePollOnce(Native method)
at android.os.MessageQueue.next(MessageQueue.java:330)
at android.os.Looper.loop(Looper.java:169)
at android.app.ActivityThread.main(ActivityThread.java:7073)
at java.lang.reflect.Method.invoke(Native method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:536)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:876)
The above main thread stack is a normal idle stack, indicating that the main thread is waiting for new messages.
If the main thread in the ANR log is in such a state, there may be two reasons:
The ANR is caused by other factors such as CPU preemption or memory tightness
When this ANR log is captured, the main thread has returned to normal
When encountering this kind of idle stack, you can analyze the CPU and memory conditions according to the method in Section 3. Secondly, you can pay attention to whether the time when the log is captured and the time when the ANR occurs is too long. If the time is too long, the stack has no analytical meaning.
5.2 The main thread performs time-consuming operations
"main" prio=5 tid=1 Runnable
| group="main" sCount=0 dsCount=0 flags=0 obj=0x72deb848 self=0x7748c10800
| sysTid=8968 nice=-10 cgrp=default sched=0/0 handle=0x77cfa75ed0
| state=R schedstat=( 24783612979 48520902 756 ) utm=2473 stm=5 core=5 HZ=100
| stack=0x7fce68b000-0x7fce68d000 stackSize=8192KB
| held mutexes= "mutator lock"(shared held)
at com.example.test.MainActivity$onCreate$2.onClick(MainActivity.kt:20)——关键行!!!
at android.view.View.performClick(View.java:7187)
at android.view.View.performClickInternal(View.java:7164)
at android.view.View.access$3500(View.java:813)
at android.view.View$PerformClick.run(View.java:27640)
at android.os.Handler.handleCallback(Handler.java:883)
at android.os.Handler.dispatchMessage(Handler.java:100)
at android.os.Looper.loop(Looper.java:230)
at android.app.ActivityThread.main(ActivityThread.java:7725)
at java.lang.reflect.Method.invoke(Native method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:526)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1034)
The above log shows that the main thread is in the execution state. Looking at the stack information, it can be seen that it is not in the idle state. ANR occurs because a time-consuming operation is performed in a click monitoring function.
5.3 The main thread is blocked by a lock
"main" prio=5 tid=1 Blocked
| group="main" sCount=1 dsCount=0 flags=1 obj=0x72deb848 self=0x7748c10800
| sysTid=22838 nice=-10 cgrp=default sched=0/0 handle=0x77cfa75ed0
| state=S schedstat=( 390366023 28399376 279 ) utm=34 stm=5 core=1 HZ=100
| stack=0x7fce68b000-0x7fce68d000 stackSize=8192KB
| held mutexes=
at com.example.test.MainActivity$onCreate$1.onClick(MainActivity.kt:15)
- waiting to lock <0x01aed1da> (a java.lang.Object) held by thread 3 ——————关键行!!!
at android.view.View.performClick(View.java:7187)
at android.view.View.performClickInternal(View.java:7164)
at android.view.View.access$3500(View.java:813)
at android.view.View$PerformClick.run(View.java:27640)
at android.os.Handler.handleCallback(Handler.java:883)
at android.os.Handler.dispatchMessage(Handler.java:100)
at android.os.Looper.loop(Looper.java:230)
at android.app.ActivityThread.main(ActivityThread.java:7725)
at java.lang.reflect.Method.invoke(Native method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:526)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1034)
........省略N行.....
"WQW TEST" prio=5 tid=3 TimeWating
| group="main" sCount=1 dsCount=0 flags=1 obj=0x12c44230 self=0x772f0ec000
| sysTid=22938 nice=0 cgrp=default sched=0/0 handle=0x77391fbd50
| state=S schedstat=( 274896 0 1 ) utm=0 stm=0 core=1 HZ=100
| stack=0x77390f9000-0x77390fb000 stackSize=1039KB
| held mutexes=
at java.lang.Thread.sleep(Native method)
- sleeping on <0x043831a6> (a java.lang.Object)
at java.lang.Thread.sleep(Thread.java:440)
- locked <0x043831a6> (a java.lang.Object)
at java.lang.Thread.sleep(Thread.java:356)
at com.example.test.MainActivity$onCreate$2$thread$1.run(MainActivity.kt:22)
- locked <0x01aed1da> (a java.lang.Object)————————————————————关键行!!!
at java.lang.Thread.run(Thread.java:919)
This is a typical example of the main thread being blocked by a lock;
waiting to lock <0x01aed1da> (a java.lang.Object) held by thread 3
The waiting lock is <0x01aed1da>, and the holder of this lock is thread 3. Further search "tid=3" to find thread 3, and found that it is TimeWating.
Then the reason for the ANR is found: thread 3 holds a lock and does not release itself for a long time, and the main thread waits for this lock to time out. In the online environment, the common scenario of ANR due to lock is SharePreference writing.
5.4 CPU is preempted
CPU usage from 0ms to 10625ms later (2020-03-09 14:38:31.633 to 2020-03-09 14:38:42.257):
543% 2045/com.alibaba.android.rimet: 54% user + 89% kernel / faults: 4608 minor 1 major ————关键行!!!
99% 674/android.hardware.camera.provider@2.4-service: 81% user + 18% kernel / faults: 403 minor
24% 32589/com.wang.test: 22% user + 1.4% kernel / faults: 7432 minor 1 major
........省略N行.....
As in the above log, the second line is the nailing process, which occupies up to 543% of the CPU, which preempts most of the CPU resources, which leads to the occurrence of ANR.
5.5 Memory Tight Causes ANR
If there is a log, the CPU and stack are normal (not posted), and ANR still occurs, consider the memory shortage.
From the first line of CPU information, you can find that the time of ANR is 2020-10-31 22:38:58.468—CPU usage from 0ms to 21752ms later (2020-10-31 22:38:58.468 to 2020-10-31 22 :39:20.220)
Then go to the system log to search for am_meminfo, this is not found. Searching onTrimMemory again, I found a lot of records;
10-31 22:37:19.749 20733 20733 E Runtime : onTrimMemory level:80,pid:com.xxx.xxx:Launcher0
10-31 22:37:33.458 20733 20733 E Runtime : onTrimMemory level:80,pid:com.xxx.xxx:Launcher0
10-31 22:38:00.153 20733 20733 E Runtime : onTrimMemory level:80,pid:com.xxx.xxx:Launcher0
10-31 22:38:58.731 20733 20733 E Runtime : onTrimMemory level:80,pid:com.xxx.xxx:Launcher0
10-31 22:39:02.816 20733 20733 E Runtime : onTrimMemory level:80,pid:com.xxx.xxx:Launcher0
It can be seen that before and after the time of the ANR, the memory is in a tight state, the level is 80, check the Android API document;
/**
* Level for {@link #onTrimMemory(int)}: the process is nearing the end
* of the background LRU list, and if more memory isn't found soon it will
* be killed.
*/
static final int TRIM_MEMORY_COMPLETE = 80;
It can be seen that the level of 80 is very serious. The application is about to be killed. The application that is killed is the desktop from the name, and even the desktop is about to be killed. How can ordinary applications be better?
Generally speaking, the occurrence of memory tension will cause multiple applications to have ANRs, so if you find that multiple applications have ANRs in the log, you can preliminarily determine that this ANR has nothing to do with your application.
5.6 ANR caused by system service timeout
The system service timeout usually contains the BinderProxy.transactNative keyword, please see the following log:
"main" prio=5 tid=1 Native
| group="main" sCount=1 dsCount=0 flags=1 obj=0x727851e8 self=0x78d7060e00
| sysTid=4894 nice=0 cgrp=default sched=0/0 handle=0x795cc1e9a8
| state=S schedstat=( 8292806752 1621087524 7167 ) utm=707 stm=122 core=5 HZ=100
| stack=0x7febb64000-0x7febb66000 stackSize=8MB
| held mutexes=
kernel: __switch_to+0x90/0xc4
kernel: binder_thread_read+0xbd8/0x144c
kernel: binder_ioctl_write_read.constprop.58+0x20c/0x348
kernel: binder_ioctl+0x5d4/0x88c
kernel: do_vfs_ioctl+0xb8/0xb1c
kernel: SyS_ioctl+0x84/0x98
kernel: cpu_switch_to+0x34c/0x22c0
native: #00 pc 000000000007a2ac /system/lib64/libc.so (__ioctl+4)
native: #01 pc 00000000000276ec /system/lib64/libc.so (ioctl+132)
native: #02 pc 00000000000557d4 /system/lib64/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+252)
native: #03 pc 0000000000056494 /system/lib64/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+60)
native: #04 pc 00000000000562d0 /system/lib64/libbinder.so (android::IPCThreadState::transact(int, unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+216)
native: #05 pc 000000000004ce1c /system/lib64/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+72)
native: #06 pc 00000000001281c8 /system/lib64/libandroid_runtime.so (???)
native: #07 pc 0000000000947ed4 /system/framework/arm64/boot-framework.oat (Java_android_os_BinderProxy_transactNative__ILandroid_os_Parcel_2Landroid_os_Parcel_2I+196)
at android.os.BinderProxy.transactNative(Native method) ————————————————关键行!!!
at android.os.BinderProxy.transact(Binder.java:804)
at android.net.IConnectivityManager$Stub$Proxy.getActiveNetworkInfo(IConnectivityManager.java:1204)—关键行!
at android.net.ConnectivityManager.getActiveNetworkInfo(ConnectivityManager.java:800)
at com.xiaomi.NetworkUtils.getNetworkInfo(NetworkUtils.java:2)
at com.xiaomi.frameworkbase.utils.NetworkUtils.getNetWorkType(NetworkUtils.java:1)
at com.xiaomi.frameworkbase.utils.NetworkUtils.isWifiConnected(NetworkUtils.java:1)
It can be seen from the stack that ANR: getActiveNetworkInfo occurred when obtaining network information.
As mentioned in the previous article: the system services are all Binder mechanisms (16 threads), and the service capabilities are also limited. It is possible that the system services will not respond for a long time and cause ANR. If other applications occupy all Binder threads, then the current application can only wait.
Further search: blockUntilThreadAvailable keywords:
at android.os.Binder.blockUntilThreadAvailable(Native method)
If you find the stack of a thread that contains this word, you can further look at its stack to determine what system service is called. This type of ANR is also a problem of the system environment. If this problem occurs frequently on a certain type of machine, the application layer can consider evasive strategies.
Six, conclusion
The techniques summarized in this article come from a lot of ANR log analysis experience in the author's work. If there are any mistakes or omissions, please leave a message and point out that communication promotes progress!
Author: vivo Internet Client Team—Wang Qinwei
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。