Author: Hao Lianfu, a senior computer technology expert and currently the chief front-end architect at Agora. He has previously served as Principal Engineer/Engineering Director (UTStarcom), Sr. Architect (Intel), and T4 Architect (YY). He has designed and developed major projects including a dedicated operating system for telecom core networks, a high-performance TCP/IP protocol stack, and the refactoring of the Agora SDK architecture.
Introduction
This article opens the Dev for Dev (Developer for Developer) interactive innovation practice activity jointly initiated by Agora and the RTC developer community. It is also a true record of an open source enthusiast's front-line work. The situations described here are quite representative, so they have been organized and shared for readers.
There are usually three ways to implement application hooks in iOS:
1. Method Swizzling: uses the Runtime feature of OC (Objective-C) to dynamically change the mapping between a SEL (method selector) and its IMP (method implementation), which changes the OC method call flow. It applies only to dynamic OC methods;
2. Fishhook: a tool provided by Facebook (now renamed Meta) for dynamically modifying the linking of Mach-O files. Relying on how Mach-O files are loaded, it modifies the pointers in the lazy and non-lazy symbol pointer tables to hook C functions. It applies to static C functions;
3. Cydia Substrate: formerly known as Mobile Substrate, a powerful framework whose main function is to hook OC methods, C functions, and function addresses. It is also available on the Android platform.
Fishhook is a third-party framework open sourced by Meta that can dynamically rebind symbols in Mach-O binaries running on iOS/macOS, on both simulators and devices. This makes it possible to replace C functions at runtime, which is commonly used for debugging and tracing applications. The framework contains only two core files, fishhook.c and fishhook.h, so it is very lightweight and is used in many enterprise applications. However, this open source project, known for its simplicity, had a hard-to-notice problem buried in it...
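As a rough mental model of symbol rebinding (not Fishhook's actual implementation), calls to an external function go through a table of pointers, and a hook swaps one entry while saving the original. The sketch below is plain C. The names `symbol_slot`, `rebind_slot`, `real_double`, and `hooked_double` are hypothetical and exist only for illustration; a real Mach-O symbol pointer table stores bare pointers whose names are resolved through the symbol and string tables.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*int_fn)(int);

/* One slot of a simplified, hypothetical symbol pointer table. */
struct symbol_slot {
    const char *name; /* symbol name */
    int_fn target;    /* function currently bound to the symbol */
};

/* Holds the original function once the hook is installed. */
int_fn original_double = NULL;

/* The function the symbol is bound to before hooking. */
int real_double(int x) { return 2 * x; }

/* The hook: calls the saved original, then adjusts the result. */
int hooked_double(int x) { return original_double(x) + 1; }

/* Rebind `name` to `replacement`; store the old target in *replaced
 * when replaced is not NULL. Returns 0 on success, -1 if not found. */
int rebind_slot(struct symbol_slot *table, size_t count, const char *name,
                int_fn replacement, int_fn *replaced) {
    assert(table != NULL && name != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(table[i].name, name) == 0) {
            if (replaced != NULL) {
                *replaced = table[i].target;
            }
            table[i].target = replacement;
            return 0;
        }
    }
    return -1;
}
```

With a one-entry table that binds "double_it" to `real_double`, calling `rebind_slot(table, 1, "double_it", hooked_double, &original_double)` makes later calls through `table[0].target` reach the hook, which can still call the original through the saved pointer.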
With the release of the iOS 15 Beta, many developers found their apps crashing widely. Such crashes are usually attributed to system compatibility issues, but as we went deeper into troubleshooting, we discovered that the problem was not that simple. At first, after developers reported the problem to Fishhook, various groups and individuals contributed several fix PRs, but none of them solved the problem fundamentally. After careful analysis of the source code of XNU, the operating system kernel of iOS and macOS, we finally located the root cause.
Tracing the source of the Fishhook Crash problem
To locate a problem, we usually first try to reproduce it from the existing error logs. Through debugging and tracing, we found that on iOS 15 or macOS 12 the Fishhook code crashes 100% of the time when rebinding symbols, and it is this crash that makes applications integrating Fishhook unusable. Given the large impact, some applications using Fishhook urgently removed the component after discovering the problem in order to limit the damage.
Root cause of the Fishhook crash
Fishhook works by modifying the data segments that hold the dynamically bound symbol pointers. The default permission of these data segments is generally read-only, so the "write" permission has to be added before they can be modified, and this is exactly where the problem lies: during the investigation we found a bug in the Fishhook code that adds the "write" permission. The relevant code is as follows:
There are three serious errors in this code. To make them easier to follow, the relevant code is marked with red, green, and blue boxes. The errors are explained below:
1. First, whether the "write" permission needs to be added cannot be decided from the segname __DATA_CONST alone, because starting from iOS 14.5 or even earlier there is also a segment called __AUTH_CONST that has to be hooked, so handling only __DATA_CONST is not enough;
2. Second, when obtaining the current vm prot, the code passes the wrong address: it should not be rebindings, because the address we want to write to is indirect_symbol_bindings;
3. Finally, the COW mechanism of the XNU kernel differs from that of the Linux kernel. For a read-only (RO) vm segment mapping, VM_PROT_COPY must be specified explicitly to add the "write" permission, but the mprotect system call of XNU BSD cannot do this at all, so the mprotect call here is equivalent to doing nothing!
The key code logic of XNU MACH is as follows:
These three errors in the Fishhook code compound one another and eventually cause a "write" protection fault when the data pointed to by indirect_symbol_bindings is modified. The resulting crash brings down the entire application.
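The first error, checking only one segment name, can be illustrated with a small helper that tests a segment name against every const segment mentioned above. This is a minimal sketch rather than Fishhook's code: `segment_is_const_data` and `SEGNAME_LEN` are hypothetical names, and the list contains only the two segments named in this article.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Mach-O stores a segment name in a fixed 16-byte field, so the
 * comparison is bounded instead of relying on a terminating NUL. */
#define SEGNAME_LEN 16

/* Hypothetical helper: true when the segment is one of the const data
 * segments named in the article. */
bool segment_is_const_data(const char *segname) {
    static const char *const names[] = { "__DATA_CONST", "__AUTH_CONST" };
    assert(segname != NULL);
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        if (strncmp(segname, names[i], SEGNAME_LEN) == 0) {
            return true;
        }
    }
    return false;
}
```

A check written this way is easy to extend if further protected segments appear, although deciding by name at all remains an assumption that a later OS release can invalidate.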
Best way to fix Fishhook crashes
Now that the bug has been located, the fix follows directly:
- Change the wrong address rebindings to indirect_symbol_bindings;
- Replace the mprotect system call with the vm_protect system call, and add the VM_PROT_COPY option;
- Change the logic so that the "write" action is performed only when the vm_protect system call succeeds.
Therefore, the core code of the bug fix is as follows:
Two points should be noted here. First, the VM_PROT_COPY option must be specified when adding the "write" permission to the data segment that holds the dynamically bound symbols; otherwise the write operation will fail. Second, the code must write to these data segments only when the vm_protect system call returns successfully; otherwise it must do nothing.
After rigorous testing and repeated verification, we fixed this bug and submitted a PR to the official Fishhook repository on June 12, 2021 ( https://github.com/facebook/fishhook/pull/87 ). After comparing several fixes, the Fishhook maintenance team chose ours and merged it into the master branch, and the issue was finally resolved.
System upgrade (warming) makes the "frozen" bug visible
Readers may wonder why this problem did not appear in versions before iOS 15 or macOS 12.
In fact, operating systems before iOS 15 and macOS 12 have a defect of their own: the protection of these data segments is not rigorous, and the "write" permission is not removed from data segments that should be "read-only". The evidence from our investigation is as follows:
In the above evidence fragment, the protection value of 3 means the permission is "readable and writable", so the "write" operation performed by the hook in the Fishhook code causes no problems on older versions of iOS/macOS. In iOS 15/macOS 12, however, the protection of these data segments is stricter and the permissions have been adjusted: the "read-write" permission that had been given to data segments that should be "read-only" is corrected to "read-only", which changes the protection value shown in the evidence above. The relevant evidence is as follows:
The protection value of 1 in the above snippet stands for "read-only", as it should. But it was this "correction" that conflicted with the original "improper" configuration and finally exposed the Fishhook bug on iOS 15/macOS 12, resulting in a serious crash. From the code's point of view, this bug has clearly always existed in Fishhook; it simply was not triggered on earlier iOS and macOS versions, so the hazard stayed hidden until the conditions changed.
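The protection values 3 and 1 are bit masks. A minimal sketch of how to read them is shown below; the bit values match Mach's VM_PROT_READ (0x01), VM_PROT_WRITE (0x02), and VM_PROT_EXECUTE (0x04), while the names `PROT_BIT_*`, `protection_allows_write`, and `protection_to_string` are hypothetical and defined locally so the example builds on any platform.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copies of the Mach protection bits (same numeric values as
 * VM_PROT_READ, VM_PROT_WRITE, VM_PROT_EXECUTE). */
enum {
    PROT_BIT_READ = 0x01,
    PROT_BIT_WRITE = 0x02,
    PROT_BIT_EXECUTE = 0x04
};

/* True when the protection value includes the write bit. */
bool protection_allows_write(int protection) {
    return (protection & PROT_BIT_WRITE) != 0;
}

/* Render the low three bits as an "rwx"-style string. */
const char *protection_to_string(int protection) {
    static const char *const table[8] = {
        "---", "r--", "-w-", "rw-", "--x", "r-x", "-wx", "rwx"
    };
    assert(protection >= 0);
    return table[protection & 0x07];
}
```

Read this way, a value of 3 is read plus write, which is why the hook's write succeeded on older systems, and a value of 1 is read only, which is why the same write faults on iOS 15/macOS 12.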
Summary
In application development we often introduce third-party modules, especially widely used low-level open source components, so as not to reinvent the wheel and to launch and iterate quickly. However, as IT infrastructure changes, the system environment keeps adding new features and discarding old implementations. Along the way, our applications will inevitably become unavailable at times because of dependency problems. As developers of business applications, we must keep improving our ability to trace problems back to upstream components, stay true to the developer spirit, take from open source, and give back to open source.
Dev for Dev column introduction
Dev for Dev (Developer for Developer) is a developer interactive innovation practice activity jointly initiated by Agora and the RTC developer community. Through various forms of technology sharing, communication and collision, and project co-construction from the perspective of engineers, the power of developers is gathered, the most valuable technical content and projects are mined and delivered, and the creativity of technology is fully released.