Measuring method execution time with systrace

Author of this article: Fangx3

Android uses a single-threaded UI model: the user's key events, touch events, and UI drawing are all processed on the UI thread. A single thread means serial execution, so if one operation takes a long time, every operation after it has to wait, and the first thing the user notices is jank. Therefore, one of the easiest ways to troubleshoot jank is to find the methods that take a long time.

How do we measure how long a method takes?

During development, the easiest way to measure how long a method takes is to record a timestamp at its start and another at its end. The difference between the two timestamps is the method's execution time.

 fun take() {
  val start = System.currentTimeMillis()
  // ...
  service.take()
  // ...
  val end = System.currentTimeMillis()
  val cost = end - start // elapsed time in milliseconds
}

This approach measures the time spent in our own application code, but it cannot measure the time spent in Android's system methods.
In fact, the Android system already has trace points on some key paths. Rather than recording timestamps as we did above, it uses the Trace class, which also lets the application layer insert its own custom trace points. The systrace tool provided by Android captures and processes the trace-point data produced through the Trace class and finally generates an HTML file, in which the time spent along a complete call path can be inspected visually in Chrome.

systrace is indeed a powerful tuning tool during development, but two obvious limitations prevent it from being used in production:

The device must be connected to a PC, and tracing is enabled by running commands there.
Developers must manually add Trace.beginSection and Trace.endSection calls, which means predicting during development where the time will be spent; in production, however, it is impossible to predict where the slow spots will be.

If these two problems can be solved, systrace can be used to troubleshoot issues in production.

Running systrace without a PC

The following is a simplified diagram of how systrace works:

As the figure shows, the data captured by systrace falls into two categories:

Function call information that occurs in the Java layer and the Native layer
Kernel-mode event information

The Java-layer and Native-layer function call information is what we collect by calling methods of the Trace class (and is the data we care about here). It is recorded in trace_marker.
The kernel-mode event information comes from Linux's ftrace facility. Individual event nodes are activated, and while the kernel runs, events for the enabled nodes are recorded in the ftrace buffer.
Finally, systrace retrieves these two kinds of data and merges them into an HTML file.

The figure also shows that systrace sets the enabled Tags through atrace. If we can find the Tag for each type of information we want to capture, set it directly on the device, and then read the data out of trace_marker ourselves, we can remove the dependence on the PC.

Setting the Tag on the device

 public static void beginSection(@NonNull String sectionName) {
  if (isTagEnabled(TRACE_TAG_APP)) {
    if (sectionName.length() > MAX_SECTION_NAME_LEN) {
      throw new IllegalArgumentException("sectionName is too long");
    }
    nativeTraceBegin(TRACE_TAG_APP, sectionName);
  }
}

The above is the beginSection method of the system Trace class. It first checks whether the corresponding Tag is enabled, and only if it is does it call the native nativeTraceBegin method to write the data.
The implementation of isTagEnabled is as follows:

 public static boolean isTagEnabled(long traceTag) {
  long tags = sEnabledTags;
  if (tags == TRACE_TAG_NOT_READY) {
    tags = cacheEnabledTags();
  }
  return (tags & traceTag) != 0;
}

private static long cacheEnabledTags() {
  long tags = nativeGetEnabledTags();
  sEnabledTags = tags;
  return tags;
}

At this point, one might wonder whether tracing can be turned on simply by modifying the value of sEnabledTags.
In practice, modifying sEnabledTags alone does not turn tracing on, which suggests that the native layer performs a similar check. The corresponding native code is in /system/core/libcutils/trace-dev.c (Android O source):

 static inline void atrace_begin(uint64_t tag, const char* name)
{
    if (CC_UNLIKELY(atrace_is_tag_enabled(tag))) {
        void atrace_begin_body(const char*);
        atrace_begin_body(name);
    }
}

The logic here mirrors the Java code: it first checks whether the Tag is enabled and writes the data only if it is. Next, look at the implementation of atrace_is_tag_enabled:

 static inline uint64_t atrace_is_tag_enabled(uint64_t tag)
{
    return atrace_get_enabled_tags() & tag;
}
static inline uint64_t atrace_get_enabled_tags()
{
    atrace_init();
    return atrace_enabled_tags;
}

So the native check reads the atrace_enabled_tags variable, and the sEnabledTags field of the Trace class is itself populated from the native side through nativeGetEnabledTags. Therefore, to enable tracing we must modify this native-layer value.
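To make the two checks concrete, here is a small, self-contained C++ model of the gating. It is not Android code: `java_cached_tags` stands in for `Trace.sEnabledTags`, `native_enabled_tags` for `atrace_enabled_tags`, and the functions are hypothetical simplifications of `isTagEnabled` and `atrace_begin`. The tag values mirror `TRACE_TAG_APP` (bit 12) and `TRACE_TAG_NOT_READY` (bit 63) from the platform sources; check them against the version you target.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for the two gates described above.
constexpr uint64_t kTagApp = 1ULL << 12;       // mirrors TRACE_TAG_APP
constexpr uint64_t kTagNotReady = 1ULL << 63;  // mirrors TRACE_TAG_NOT_READY

uint64_t native_enabled_tags = 0;          // models atrace_enabled_tags
uint64_t java_cached_tags = kTagNotReady;  // models Trace.sEnabledTags
std::vector<std::string> marker_writes;    // models data written to trace_marker

// Native gate: models atrace_begin(), which re-checks the native mask.
void native_begin(uint64_t tag, const std::string& name) {
    if (native_enabled_tags & tag) {
        marker_writes.push_back("B|" + name);
    }
}

// Java gate: models Trace.isTagEnabled() with its lazily filled cache.
bool java_is_tag_enabled(uint64_t tag) {
    uint64_t tags = java_cached_tags;
    if (tags == kTagNotReady) {
        tags = native_enabled_tags;  // models cacheEnabledTags()
        java_cached_tags = tags;
    }
    return (tags & tag) != 0;
}

// Models Trace.beginSection(): both gates must be open for data to be written.
void begin_section(const std::string& name) {
    if (java_is_tag_enabled(kTagApp)) {
        native_begin(kTagApp, name);
    }
}
```

Patching only the Java cache lets the call reach native code, where it is dropped; patching only the native value does nothing once the Java side has already cached a mask without the tag. That is why both values are updated.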

Following Facebook's Profilo, we obtain a handle to libcutils.so with dlopen, look up the address of atrace_enabled_tags through its symbol, and then set atrace_enabled_tags to turn tracing on.

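 #include <dlfcn.h>
#include <atomic>
#include <cstdint>
#include <string>

// sdk: the device API level, e.g. passed down from Build.VERSION.SDK_INT
std::string lib_name("libcutils.so");
std::string enabled_tags_sym("atrace_enabled_tags");

// Below API 18 the mask is the C++ static android::Tracer::sEnabledTags in libutils.so
if (sdk < 18) {
  lib_name = "libutils.so";
  enabled_tags_sym = "_ZN7android6Tracer12sEnabledTagsE";
}
void* handle = nullptr;
if (sdk < 21) {
  handle = dlopen(lib_name.c_str(), RTLD_LOCAL);
} else {
  handle = dlopen(nullptr, RTLD_GLOBAL);
}

// Stays nullptr if the library or the symbol cannot be found
std::atomic<uint64_t>* atrace_enabled_tags = nullptr;
if (handle != nullptr) {
  atrace_enabled_tags = reinterpret_cast<std::atomic<uint64_t> *>(dlsym(handle, enabled_tags_sym.c_str()));
}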

atrace_enabled_tags has different symbol names in different Android versions, so the code has to branch on the version.
To look up the exact symbol name, you can use the objdump tool; on a Mac, you can use gobjdump from GNU binutils.
On API level 18 and above, a symbol such as atrace_enabled_tags is a plain, unmangled name and can be found directly with gobjdump:

Sometimes, however, a symbol name is mangled (like the C++ symbol used below API 18), which makes it hard to confirm at a glance that it is the one we need. Demangling it with the c++filt tool produces a readable name that is easier to verify.
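The same demangling can also be done programmatically, for example to log which symbol was resolved. A minimal sketch, assuming a GCC or Clang toolchain that provides `abi::__cxa_demangle` from `<cxxabi.h>` (the Itanium C++ ABI scheme used on Android); the helper name `demangle` is hypothetical:

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <memory>
#include <string>

// Frees memory returned by __cxa_demangle, which allocates with malloc.
struct FreeDeleter {
    void operator()(char* p) const { std::free(p); }
};

// Demangles an Itanium C++ ABI symbol such as "_ZN7android6Tracer12sEnabledTagsE".
// Plain C symbols (e.g. "atrace_enabled_tags") and invalid input are returned unchanged.
std::string demangle(const std::string& mangled) {
    if (mangled.rfind("_Z", 0) != 0) {
        return mangled;  // not a mangled C++ name
    }
    int status = 0;
    std::unique_ptr<char, FreeDeleter> result(
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status));
    if (status != 0 || !result) {
        return mangled;
    }
    return std::string(result.get());
}
```

For the pre-18 symbol, `demangle("_ZN7android6Tracer12sEnabledTagsE")` yields `android::Tracer::sEnabledTags`, the same result c++filt prints.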

At this point we have the pointer for the atrace_enabled_tags symbol. We set it to the desired Tag value and, at the same time, update the sEnabledTags field of the Trace class through reflection, which turns tracing on. For the Tags to set, look at the system Trace class, which defines all the Tag values; ORing the ones we need together gives the final 64-bit (long) value to set.
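The native half of this step can be sketched as follows. The bit positions are the ones defined in `android.os.Trace` and `cutils/trace.h` (e.g. `TRACE_TAG_GRAPHICS = 1 << 1`, `TRACE_TAG_APP = 1 << 12`), but treat them as values to verify for your platform version. `combine_tags` and `enable_tags` are hypothetical helpers, and the pointer is assumed to come from the dlsym lookup above; the reflection update of `sEnabledTags` on the Java side is not shown.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Tag bits as defined in android.os.Trace / cutils/trace.h (verify per version).
constexpr uint64_t kTagGraphics = 1ULL << 1;
constexpr uint64_t kTagInput = 1ULL << 2;
constexpr uint64_t kTagView = 1ULL << 3;
constexpr uint64_t kTagActivityManager = 1ULL << 6;
constexpr uint64_t kTagApp = 1ULL << 12;
constexpr uint64_t kTagDalvik = 1ULL << 14;

// ORs the requested tags into a single 64-bit mask.
uint64_t combine_tags(std::initializer_list<uint64_t> tags) {
    uint64_t mask = 0;
    for (uint64_t tag : tags) {
        mask |= tag;
    }
    return mask;
}

// Writes the mask through the pointer obtained from dlsym().
// Returns false if the symbol was not found.
bool enable_tags(std::atomic<uint64_t>* enabled_tags, uint64_t mask) {
    if (enabled_tags == nullptr) {
        return false;
    }
    enabled_tags->store(mask);
    return true;
}
```

For example, `combine_tags({kTagGraphics, kTagView, kTagApp})` evaluates to `0x100A`.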

Data retrieval

With the steps above, we can enable tracing without running the systrace script on the PC. However, as the schematic shows, the data is ultimately written to trace_marker, which belongs to the kernel, and the application layer cannot read it back directly. While looking for the Tag that controls tracing, we can also see the following definition in the native code:

 int  atrace_marker_fd     = -1;

Reading the code shows that this field is the file descriptor of trace_marker.
When we call Trace.beginSection, the call eventually reaches the native atrace_begin_body method:

 void atrace_begin_body(const char* name)
{
    char buf[ATRACE_MESSAGE_LENGTH];

    int len = snprintf(buf, sizeof(buf), "B|%d|%s", getpid(), name);
    if (len >= (int) sizeof(buf)) {
        ALOGW("Truncated name in %s: %s\n", __FUNCTION__, name);
        len = sizeof(buf) - 1;
    }
    write(atrace_marker_fd, buf, len);
}

As we can see, the data is ultimately written by calling write.

We can obtain the pointer to the trace_marker file descriptor in the same way we obtained atrace_enabled_tags, which gives us the descriptor itself. Then, by hooking the system write function, we can compare the file descriptor passed to write against it to determine whether the call writes to trace_marker; if it does, we save the content to a file of our own, which achieves data retrieval.
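The filtering logic inside such a hook can be sketched as below. How the hook is installed (for example with a PLT hook library) is outside the sketch; `g_marker_fd` (assumed to point at `atrace_marker_fd`, found via dlsym), `g_original_write` (assumed to be provided by the hook framework), and the in-memory `g_captured` sink are all hypothetical names standing in for those pieces.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <sys/types.h>
#include <vector>

// Signature of write(2).
using WriteFn = ssize_t (*)(int, const void*, size_t);

// Hypothetical globals filled in before the hook is installed.
int* g_marker_fd = nullptr;            // address of atrace_marker_fd
WriteFn g_original_write = nullptr;    // original write, from the hook framework
std::vector<std::string> g_captured;   // stands in for "our own file"
std::mutex g_captured_mutex;           // write() is called from many threads

// Replacement for write(): copies anything aimed at trace_marker,
// then forwards the call unchanged so tracing keeps working.
ssize_t hooked_write(int fd, const void* buf, size_t count) {
    if (g_marker_fd != nullptr && fd == *g_marker_fd && buf != nullptr) {
        std::lock_guard<std::mutex> lock(g_captured_mutex);
        g_captured.emplace_back(static_cast<const char*>(buf), count);
    }
    if (g_original_write == nullptr) {
        return -1;  // hook not fully initialized
    }
    return g_original_write(fd, buf, count);
}
```

A production version would append to a file or a preallocated buffer rather than a vector, and record the thread and timestamp needed for the format described next.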

As atrace_begin_body shows, however, what is written to trace_marker is just a string such as "B|pid|name". Such a string on its own cannot be recognized and parsed by the systrace tool, so we also need to complete the data into the following format:

 <thread name>-<thread id>  [000] ...1 <timestamp in seconds>: tracing_mark_write: <B|E>|<process id>|<TAG>
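A minimal sketch of that completion step, assuming the hook recorded the thread name, thread id, and a nanosecond timestamp for each captured payload (for example with gettid() and clock_gettime()). Which clock lines up with the kernel's ftrace events on a given device is a detail to verify; `to_ftrace_line` is a hypothetical helper.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Builds one line in the text format shown above from a raw trace_marker
// payload such as "B|1234|inflate" or "E".
std::string to_ftrace_line(const std::string& thread_name, int tid,
                           uint64_t timestamp_ns, const std::string& payload) {
    // Seconds with microsecond precision, e.g. "12345.678901".
    char ts[32];
    std::snprintf(ts, sizeof(ts), "%" PRIu64 ".%06" PRIu64,
                  timestamp_ns / 1000000000ULL,
                  (timestamp_ns % 1000000000ULL) / 1000ULL);
    return thread_name + "-" + std::to_string(tid) + "  [000] ...1 " + ts +
           ": tracing_mark_write: " + payload;
}
```

For instance, `to_ftrace_line("main", 1234, 12345678901234ULL, "B|1234|inflate")` produces `main-1234  [000] ...1 12345.678901: tracing_mark_write: B|1234|inflate`.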

After saving the data to a file, you can convert it into an HTML file with the --from-file parameter of the systrace tool and open it in Chrome.
The latest SDK Platform Tools no longer include systrace; instead, the exported file can be opened directly in Perfetto without any conversion.
The captured data after enabling tracing on the device looks as follows: both the trace points built into the Android system and the ones we added ourselves are captured.

Predicting where the time goes?

Another pain point of systrace is the need to insert Trace.beginSection and Trace.endSection manually, which means predicting during development where time will be spent. In most cases, though, you do not know where the slow spots are, and in production it is impossible to tell. Since we cannot predict, the alternative is to trace everything, but doing that by hand in a large project is an enormous amount of work and not practical. We therefore add the Trace calls to every method automatically through instrumentation.

If only the bare method name is recorded during instrumentation, the generated file is hard to read, which hampers analysis. Using the fully qualified method name instead runs into the length limit that Trace.beginSection places on sectionName. So, following the approach of Tencent's Matrix, we generate a methodId for each method and insert only the methodId, which avoids the length limit of beginSection.

The final effect is as shown above.
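The methodId mapping described above can be sketched as a simple registry. This is the bookkeeping a build-time instrumentation plugin would do, written in C++ here only to keep one language across the examples; the class and its methods are hypothetical, not Matrix's API. The short id string stays well under `Trace.beginSection`'s limit (127 characters in the platform source), and the id-to-name table is used afterwards to restore readable names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical registry assigning compact ids to fully qualified method names.
class MethodIdRegistry {
public:
    // Returns the existing id for `method`, or assigns the next free one.
    int idFor(const std::string& method) {
        auto it = ids_.find(method);
        if (it != ids_.end()) {
            return it->second;
        }
        int id = static_cast<int>(names_.size());
        ids_.emplace(method, id);
        names_.push_back(method);
        return id;
    }

    // Looks up the full name when post-processing the captured trace.
    const std::string& nameFor(int id) const {
        return names_.at(static_cast<std::size_t>(id));
    }

    // The short string that would be passed as sectionName.
    static std::string sectionName(int id) { return std::to_string(id); }

private:
    std::unordered_map<std::string, int> ids_;
    std::vector<std::string> names_;
};
```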

After a few operations, however, you will find entries marked "Did Not Finish" in the generated file:

Analyzing the surrounding data shows that an exception was thrown at these points, and the exceptional path left the trace slices unclosed. Therefore, the instrumentation must also insert Trace.endSection in the catch blocks that handle exceptions, so that every slice is closed.
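The shape of an instrumented method after this fix can be modeled as follows. `begin_section` and `end_section` are hypothetical stand-ins that only record events, so the pairing can be checked; the real instrumentation emits the equivalent `Trace.beginSection`/`Trace.endSection` calls in bytecode.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for Trace.beginSection / Trace.endSection.
std::vector<std::string> g_events;
void begin_section(const std::string& name) { g_events.push_back("B|" + name); }
void end_section() { g_events.push_back("E"); }

// An instrumented method: endSection on the normal path and also in a
// catch block, before the exception is rethrown.
int instrumented(int x) {
    begin_section("42");  // "42" plays the role of a generated methodId
    try {
        if (x < 0) {
            throw std::invalid_argument("negative input");
        }
        int result = x * 2;
        end_section();
        return result;
    } catch (...) {
        end_section();  // close the slice so it is not reported as "Did Not Finish"
        throw;          // keep the original exception flow
    }
}
```

Both calls, the normal one and the throwing one, leave a matched B/E pair in the event log.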

Summary

With the Trace facility provided by the system, we can generate complete call-path information, which helps both development and troubleshooting. Enabling systrace on the device removes the dependence on a PC, making it easy to capture data in a variety of environments and helping us uncover occasional or hidden jank caused by slow methods.


This article is published by the NetEase Cloud Music technical team. Reprinting in any form without authorization is prohibited.
