Research on Fairplay DRM and Obfuscation Realization

The two key aspects of Fairplay DRM (Digital Rights Management) are authorization and encryption. For a long time, however, there has been very little research on App DRM, and Fairplay DRM has therefore remained an obstacle to security research on iOS apps. By analyzing flaws in the design and implementation of its obfuscation system, we overcame the obstacles to debugging and tracing, and designed a variety of static and dynamic countermeasures. At the same time, through extensive reverse engineering, we filled a gap in security researchers' understanding of how Fairplay works on macOS.

What is DRM?

DRM stands for Digital Rights Management, that is, digital copyright protection. To protect the music, videos, books, and apps distributed through the App Store from piracy, Apple developed the Fairplay DRM technology and filed many related patents.

For a long time there has been little research on App DRM, whose key elements are authorization and encryption. Cracking Fairplay DRM encryption is commonly known as "smashing the shell" (dumping a decrypted copy of the binary), and it is a necessary prerequisite for iOS app security research. Since Apple introduced the App DRM mechanism in 2013, classic "shell smashing" tools such as Clutch, Bagbak, and Flexdecrypt have appeared. These tools usually require a jailbroken device, which limits their use.

The M1 Mac released in 2020 brought the Fairplay DRM mechanism to macOS. Since permissions on Mac devices are not as strict as on iOS, we can explore more of the principles behind Fairplay DRM on macOS. The ultimate goal is to make the decryption process independent of Apple platforms. Next, let's look at how DRM is implemented on Apple platforms.

Implementation of DRM on Apple: Fairplay DRM

Flags in LC_ENCRYPTION_INFO

An encrypted Mach-O contains the LC_ENCRYPTION_INFO load command, where cryptoff is the start offset of the encrypted part in the file, cryptsize is the size of the encrypted part, and cryptid indicates the encryption method. For apps protected by Fairplay DRM, the encrypted size is a multiple of 4096, and the encryption method is 1.
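To make the three fields concrete, here is a minimal sketch that scans the load commands of a thin 64-bit Mach-O held in memory and extracts them. The command values and the 32-byte header size follow the public mach-o/loader.h definitions; the function name find_encryption_info, the struct crypt_info, and the assumption of a little-endian host are ours and only serve the illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MH_MAGIC_64           0xfeedfacfu
#define LC_ENCRYPTION_INFO    0x21u
#define LC_ENCRYPTION_INFO_64 0x2cu

/* Hypothetical result type; the field names mirror the load command. */
struct crypt_info {
    uint32_t cryptoff;
    uint32_t cryptsize;
    uint32_t cryptid;
};

/* Read a native-endian uint32_t from a possibly unaligned position. */
static uint32_t rd32(const uint8_t *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Returns 1 and fills *out if an encryption load command is found,
 * 0 if there is none, -1 if the buffer is not a well-formed thin
 * 64-bit Mach-O. */
int find_encryption_info(const uint8_t *buf, size_t len,
                         struct crypt_info *out) {
    const size_t header_size = 32;   /* sizeof(struct mach_header_64) */
    if (len < header_size || rd32(buf) != MH_MAGIC_64)
        return -1;
    uint32_t ncmds = rd32(buf + 16); /* ncmds field of the header */
    size_t off = header_size;
    for (uint32_t i = 0; i < ncmds; i++) {
        if (len - off < 8)
            return -1;
        uint32_t cmd = rd32(buf + off);
        uint32_t cmdsize = rd32(buf + off + 4);
        if (cmdsize < 8 || cmdsize > len - off)
            return -1;               /* malformed command size */
        if (cmd == LC_ENCRYPTION_INFO || cmd == LC_ENCRYPTION_INFO_64) {
            if (cmdsize < 20)
                return -1;
            out->cryptoff  = rd32(buf + off + 8);
            out->cryptsize = rd32(buf + off + 12);
            out->cryptid   = rd32(buf + off + 16);
            return 1;
        }
        off += cmdsize;
    }
    return 0;
}
```

For a Fairplay-protected binary, we would expect the returned cryptsize to be a multiple of 4096 and cryptid to be 1. Fat (universal) binaries would need their slices located first, which this sketch does not attempt.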

The components responsible for decrypting Mach-O mainly include FairplayIOKit in kernel mode and fairplayd in user mode.

Fairplay's Open

The XNU kernel of macOS exports the symbol text_crypter_create_hook. The IOTextEncryptionFamily driver registers this hook and acts as a bridge, forwarding the call to the FairplayIOKit kernel driver.

The function ultimately responsible for processing is:

com_apple_driver_FairPlayIOKit::xhU6d1(
  char const* executable_path,
  long long cpu_type,
  long long cpu_subtype,
  rp6S0jzg** out_handle
)

After that, FairplayIOKit in the kernel begins to initialize and sends MIG calls to the user-mode fairplayd through the "unfreed" special port obtained via host_get_special_port. fairplayd then processes the sinf and supp files in the SC_Info directory and returns the processed data to FairplayIOKit in the kernel.

Note: The specific workflow of fairplayd in user mode is beyond the scope of this article.

The structure of MIG call is as follows:

struct FPRequest{
    mach_msg_header_t header;
    mach_msg_body_t body;
    mach_msg_ool_descriptor_t ool;
    NDR_record_t ndr;
    uint32_t size;
    uint64_t cpu_type;
    uint64_t cpu_subtype;
};

struct FPResponse{
    mach_msg_header_t header;
    mach_msg_body_t body;
    mach_msg_ool_descriptor_t ool1; // mapping of the SUPF file
    mach_msg_ool_descriptor_t ool2; // unknown; its size is proportional to the size of the encrypted content
    uint64_t unk1;
    uint8_t unk2[136];
    uint8_t unk3[84];
    uint32_t size1;
    uint32_t size2;
    uint64_t unk5;
};

After completing all the calls, the returned structure rp6S0jzg* is actually a handle of type uint32_t, and then you can use this handle to complete the decryption operation.

Fairplay's Decrypt Page

The Fairplay Open operation described above ultimately returns a pager_crypt_info structure, whose page_decrypt hook is taken over by the IOTextEncryptionFamily driver and finally forwarded to FairplayIOKit.

Finally, the function responsible for decryption in FairplayIOKit is defined as follows:

com_apple_driver_FairPlayIOKit::bvqhJ(
  rp6S0jzg *handle,
  unsigned long long offset,
  unsigned char const* src,
  unsigned char * dst
)

At this point, Fairplay's decryption logic has been invoked. Note that in Fairplay DRM, a page is 4096 bytes.
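The page-by-page nature of the decryption can be sketched as follows. This is a user-space model under our own assumptions: the callback type page_decrypt_fn, the driver function decrypt_range, and the convention that offsets are relative to the start of the encrypted range are hypothetical, and toy_page_decrypt is a stand-in XOR routine, not Fairplay's actual cipher.

```c
#include <stddef.h>
#include <stdint.h>

#define FP_PAGE_SIZE 4096u

/* Hypothetical callback modelling a per-page decrypt hook: it receives an
 * opaque context, the page offset, and source/destination buffers. */
typedef int (*page_decrypt_fn)(void *ctx, uint64_t offset,
                               const uint8_t *src, uint8_t *dst);

/* Decrypt `size` bytes one page at a time. `size` must be a multiple of
 * 4096, matching the cryptsize constraint of Fairplay-protected apps. */
int decrypt_range(page_decrypt_fn decrypt, void *ctx,
                  const uint8_t *src, uint8_t *dst, uint64_t size) {
    if (size % FP_PAGE_SIZE != 0)
        return -1;
    for (uint64_t off = 0; off < size; off += FP_PAGE_SIZE) {
        int rc = decrypt(ctx, off, src + off, dst + off);
        if (rc != 0)
            return rc;               /* propagate the hook's error code */
    }
    return 0;
}

/* Toy stand-in for the real cipher (NOT Fairplay's algorithm): XOR every
 * byte with a key byte mixed with the page index. */
int toy_page_decrypt(void *ctx, uint64_t offset,
                     const uint8_t *src, uint8_t *dst) {
    uint8_t key = (uint8_t)(*(const uint8_t *)ctx ^ (offset / FP_PAGE_SIZE));
    for (size_t i = 0; i < FP_PAGE_SIZE; i++)
        dst[i] = (uint8_t)(src[i] ^ key);
    return 0;
}
```

The point of the model is only the control structure: one handle (here ctx), one call per 4096-byte page, and an offset that lets the callee derive per-page state.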

So, what do the SINF and SUPF files processed by the user-mode fairplayd look like?

SINF and SUPF files

Structure

The user-mode fairplayd reads two important files shipped with the IPA: SINF and SUPF, which are stored in the SC_Info directory of the app.

The SUPF file is distributed together with the IPA, and the IPA and SUPF files are identical for every user. The SUPF file stores the key used to encrypt the Mach-O, but the key itself is encrypted by another mechanism. The SINF file acts as the per-user DRM license: it records the purchaser's user name and identifier, as well as the information needed to decrypt the SUPF. For this reason, the sandbox policy prevents an app from reading its own SINF file, so that the file cannot be used as a unique ID to track the user.

SINF

The SINF file has an LTV (length-type-value) plus KV (key-value) structure, and its fields are as follows:

sinf.frma: game
sinf.schm: itun
sinf.schi.user: 0xdeadbeef
sinf.schi.key : 0x00000005
sinf.schi.iviv: 0x12345678901234567890123456789012
sinf.schi.righ.veID: 0x000007d3
sinf.schi.righ.plat: 0x00000000
sinf.schi.righ.aver: 0x01010100
sinf.schi.righ.tran: 0xdc64f80c
sinf.schi.righ.sing: 0x00000000
sinf.schi.righ.song: 0x59a73c58
sinf.schi.righ.tool: P550
sinf.schi.righ.medi: 0x00000080
sinf.schi.righ.mode: 0x00002000
sinf.schi.righ.hi32: 0x00000004
sinf.schi.name: User Name
sinf.schi.priv: (432 Bytes Private Key)
sinf.sign: (128 Bytes Private)
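A dotted name such as sinf.frma suggests nested length-type-value records. The sketch below walks one nesting level, assuming that each record starts with a 4-byte big-endian length (header included) followed by a 4-byte type tag, as in MP4-style atoms. That layout, the helper name find_atom, and the tiny sample blob used to exercise it are assumptions for illustration, not a dump of a real SINF file.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a big-endian uint32_t. */
static uint32_t be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Find the first direct child record of `type` inside buf[0..len).
 * Assumed layout per record: 4-byte big-endian length (header included),
 * 4-byte type tag, then the value. On success returns a pointer to the
 * value and stores its size in *value_len; otherwise returns NULL. */
const uint8_t *find_atom(const uint8_t *buf, size_t len,
                         const char type[4], size_t *value_len) {
    size_t off = 0;
    while (len - off >= 8) {
        uint32_t atom_len = be32(buf + off);
        if (atom_len < 8 || atom_len > len - off)
            return NULL;             /* malformed length */
        if (memcmp(buf + off + 4, type, 4) == 0) {
            *value_len = atom_len - 8;
            return buf + off + 8;
        }
        off += atom_len;             /* skip to the next sibling */
    }
    return NULL;
}
```

Resolving a path like sinf.schi.righ would then be a matter of calling find_atom once per path component, each time on the value returned by the previous call.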

SUPF

The SUPF file is mainly divided into three parts. We named them Key Segments, Fairplay Certificate, and RSA Signature. Key Segments can contain multiple sub-segments to store the decryption information of multiple architectures.

KeyPair Segments:
    Segment 0x0: arm64, Keys: 0x36c/4k, sha1sum = e369546960d805dd1188d42e3350430c7e3a0025

Fairplay Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            33:33:af:08:07:08:af:00:01:af:00:00:10
        Signature Algorithm: sha1WithRSAEncryption
        Issuer: C=US, O=Apple Inc., OU=Apple Certification Authority, CN=Apple FairPlay Certification Authority
        Validity
            Not Before: Jul  8 00:48:29 2008 GMT
            Not After : Jul  7 00:48:29 2013 GMT
        Subject: C=US, O=Apple Inc., OU=Apple FairPlay, CN=AP.3333AF080708AF0001AF000010
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                RSA Public-Key: (1024 bit)
                Modulus:
                    00:b0:01:16:4b:62:b2:37:8d:60:12:4f:02:15:15:
                    a0:32:1b:e8:ed:44:ed:e9:17:5b:ec:9e:5d:11:24:
                    5a:66:2f:dc:a3:25:aa:52:70:e1:09:22:09:4b:65:
                    0f:67:f5:82:dc:af:78:9b:4c:45:f3:b4:f4:77:aa:
                    fc:a3:b2:84:c3:8b:09:c6:2e:55:f5:14:85:07:ac:
                    ae:0d:ff:ff:ca:41:3b:44:cb:52:b6:28:60:55:23:
                    35:8d:26:71:c6:12:a5:e0:72:58:09:3c:4a:9e:b6:
                    63:df:2a:91:94:27:eb:65:0a:b2:36:45:11:c1:91:
                    43:58:12:d9:e5:18:a1:ad:db
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment, Data Encipherment, Key Agreement
            X509v3 Basic Constraints: critical
                CA:FALSE
            X509v3 Subject Key Identifier: 
                7B:07:34:81:A5:75:D0:F6:11:BB:D2:36:3F:79:93:4B:A1:70:EB:CF
            X509v3 Authority Key Identifier: 
                keyid:FA:0D:D4:11:91:1B:E6:B2:4E:1E:06:49:94:11:DD:63:62:07:59:64

    Signature Algorithm: sha1WithRSAEncryption
         06:11:4e:87:ed:b1:08:70:c2:0d:e4:d2:94:bb:7f:ee:50:18:
         c0:2a:21:34:0e:99:1f:bf:60:a2:58:d0:0c:28:3d:03:5b:ab:
         4e:72:69:ba:41:52:45:b2:29:27:4a:c8:ba:7f:b5:9b:63:78:
         b1:68:41:40:59:3f:05:8a:57:74:c5:63:30:cc:f3:20:41:c0:
         3c:65:d4:0d:22:47:f3:97:76:e6:d6:3c:eb:e7:20:78:10:59:
         fd:96:09:82:c3:41:f0:5f:d0:3e:91:44:6d:77:3f:a5:d9:da:
         f0:f7:53:ad:94:61:28:1c:4c:40:3b:17:2b:dd:e3:00:df:77:
         71:22

RSA Signature: 6aeb00124d62f75f5761f7c26ec866a061f0776be7e84bfad4b6a1941dbddfdb3bd1afdcc5ef305877fa5bee41caa37b1a9d4ce763cf7d2cb89efa60660a49dd5ddff0f46eee7cd916d382f727d912e82b6e0a62e8110c195e298481aa8c8162faac066ef017c6c2c508700d7adb57e0c988af437621e698946da1b09adf89e9

Next, let's talk about the obfuscation principle and implementation of Fairplay DRM.

Obfuscation principle and some implementations

LLVM Pass

LLVM is an excellent compiler framework, which can be roughly divided into a front-end, a middle-end, and a back-end:

The picture is an excerpt from CMU's CS 15-745 course: https://www.cs.cmu.edu/~15745 .

The front-end converts the high-level language into LLVM IR. The middle-end processes LLVM IR through a series of analysis and optimization tasks, which we call passes, and outputs LLVM IR again. The back-end converts LLVM IR into machine code. Of the three, the middle-end offers the richest possibilities: basic optimizations such as dead code elimination and constant folding are performed here, as is compiler instrumentation such as Address Sanitizer and PC Sanitizer. The widely discussed obfuscation frameworks OLLVM and Hikari, and even Apple's own obfuscation mechanism, are built on it as well.

This obfuscation method can be basically divided into control flow obfuscation and data flow obfuscation. Other obfuscation methods, such as VMP, are not within the scope of this article.

makeOpaque

In a compiler, to prevent certain expressions from being optimized away, we can apply an equivalence-preserving transformation to them. For now we call this operation makeOpaque (for example, B3, the JIT component of Safari's JavaScriptCore, provides such a mechanism). The C++ pseudocode is as follows:

Expression* makeOpaque(Expression *in);
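The pseudocode above operates on compiler expressions. As a rough source-level analogue, the sketch below hides a value from the optimizer by routing it through a volatile object. This is our own illustration of the idea (the names make_opaque_u32 and answer are hypothetical) and is neither how B3 nor how Fairplay's obfuscator implements it.

```c
#include <stdint.h>

/* Source-level stand-in for makeOpaque: routing the value through a
 * volatile object forces the compiler to emit a real store and load,
 * so it can no longer treat the result as a known constant. */
uint32_t make_opaque_u32(uint32_t value) {
    volatile uint32_t slot = value;
    return slot;
}

/* Written with plain constants, 6 * 7 + 0 would be folded at compile
 * time. With the opaque wrapper, the multiplication and the addition
 * have to be carried out on the loaded values at run time. */
uint32_t answer(void) {
    return make_opaque_u32(6) * 7u + make_opaque_u32(0);
}
```

The result is unchanged, which is the defining property: makeOpaque must preserve the value while denying the optimizer (and, ideally, the decompiler) the knowledge needed to simplify it.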

Opaque Predicate

In computing, a predicate is an expression that evaluates to True or False. Some results from number theory can be used as opaque predicates, whose value is always True or always False regardless of the input. For example, in the following expression, y always evaluates to True:

uint32_t x = 0;
bool y = ((x * x % 4) == 0 || (x * x % 4) == 1);
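The claim holds for every uint32_t value, not only for x = 0: a square is always congruent to 0 or 1 modulo 4, and because 4 divides 2^32, the wrap-around of x * x does not change the remainder. A small sketch that spot-checks this over a range of inputs (the function names are ours):

```c
#include <stdbool.h>
#include <stdint.h>

/* The opaque predicate from the text: a square is always 0 or 1 mod 4.
 * Wrap-around in uint32_t does not matter because 4 divides 2^32. */
bool square_mod4_predicate(uint32_t x) {
    uint32_t r = (x * x) % 4u;
    return r == 0u || r == 1u;
}

/* Evaluate the predicate on `count` inputs, starting at `start` and
 * advancing by `step` (with wrap-around); returns how many inputs made
 * the predicate false. */
uint32_t count_predicate_failures(uint32_t start, uint32_t step,
                                  uint32_t count) {
    uint32_t failures = 0;
    uint32_t x = start;
    for (uint32_t i = 0; i < count; i++, x += step)
        if (!square_mod4_predicate(x))
            failures++;
    return failures;
}
```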

One application of opaque predicates to obfuscation is the bogus CFG.
For example, suppose the original statements are as follows:

foo1();
foo2();

After the transformation, we have added a fake branch (i.e. a bogus CFG):

foo1();
if ( false )
  junk_code();
else
  foo2();

Without special treatment, however, the dead code elimination of the compiler and the decompiler will remove the fake branch, so we need makeOpaque. Suppose we use the expression from the previous example:

foo1();
uint32_t x = rand();
bool y = ((x * x % 4) == 0 || (x * x % 4) == 1);
if ( !y )
  junk_code();
else
  foo2();

If the compiler and decompiler have no corresponding recognition mechanism, this dead code is kept. Filling the dead code with a large number of junk instructions can cause great trouble for the reverse engineer. In our tests under -O2 optimization, Clang 11 already recognizes this rule, but GCC 5.4 does not.

Reversible transform

Here we introduce the equivalent transformations commonly used in current obfuscation techniques.

XOR

The XOR rule is the most common transformation, so I won’t repeat it here.

x ^ c ^ c = x;

Affine transformation

Let's first look at the affine function, which has the form f(x) = a * x + c.

Now let's take a look at a practical application.

Since integer arithmetic on a computer is implicitly modular, it has some interesting properties. For example, for operations on uint32, the modular inverse is defined as follows:

// For
uint32_t a, r_a;

// if the following holds (uint32_t arithmetic is implicitly modulo 2^32)
(uint32_t)(a * r_a) == 1;

// then a and r_a are modular inverses of each other

For a and r_a that are modular inverses of each other (r_a can be obtained with the extended Euclidean algorithm, and exists only when a is odd), the following property holds:

uint32_t x = rand();
uint32_t y1 = a * x + c;
// then the following holds
x == r_a * y1 + (-r_a * c)

Finally, give an example to illustrate:

// For the modular inverse pair 4872655123 and 3980501275
// (4872655123 truncated to uint32_t is 577687827), take
uint32_t x = 0xdeadbeef;
uint32_t c = 0xbeefbeef;
// then -r_a * c = 0x57f38dcb, and
((x * 4872655123) + 0xbeefbeef) * 3980501275 + 0x57f38dcb == x
/*
This can be verified in lldb as follows:
(lldb) p/x uint32_t x=0xdeadbeef; (uint32_t)(((x * 4872655123) + 0xbeefbeef) * 3980501275 + 0x57f38dcb)
(uint32_t) $8 = 0xdeadbeef
*/
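The same round trip can be reproduced in plain C. The sketch below computes the inverse modulo 2^32 with the extended Euclidean algorithm and then encodes and decodes a value with the affine function. The function names are ours; the constants in the usage note come from the example above, with 4872655123 reduced to its 32-bit value 577687827.

```c
#include <assert.h>
#include <stdint.h>

/* Modular inverse of an odd `a` modulo 2^32, via the extended Euclidean
 * algorithm. Even values have no inverse modulo a power of two. */
uint32_t mod_inverse_u32(uint32_t a) {
    assert((a & 1u) != 0);
    int64_t r0 = (int64_t)1 << 32, r1 = (int64_t)a;
    int64_t t0 = 0, t1 = 1;          /* invariant: t_i * a == r_i mod 2^32 */
    while (r1 != 0) {
        int64_t q = r0 / r1;
        int64_t r2 = r0 - q * r1;
        int64_t t2 = t0 - q * t1;
        r0 = r1; r1 = r2;
        t0 = t1; t1 = t2;
    }
    /* Here r0 == gcd == 1, so t0 * a == 1 mod 2^32. The cast reduces
     * modulo 2^32 and maps a negative t0 to its positive representative. */
    return (uint32_t)t0;
}

/* y = a * x + c (mod 2^32) */
uint32_t affine_encode(uint32_t x, uint32_t a, uint32_t c) {
    return a * x + c;
}

/* x = r_a * y + (-r_a * c) (mod 2^32) */
uint32_t affine_decode(uint32_t y, uint32_t a, uint32_t c) {
    uint32_t r_a = mod_inverse_u32(a);
    return r_a * y + (0u - r_a * c);
}
```

With a = 577687827 and c = 0xbeefbeef, mod_inverse_u32(a) yields 3980501275, the additive term 0u - r_a * c is 0x57f38dcb, and decoding the encoding of 0xdeadbeef gives 0xdeadbeef back.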

MBA expression (Mixed Boolean-Arithmetic Expression)

An MBA expression mixes arithmetic operations (+, -, *, /) with bitwise operations (&, |, ~) in order to obscure the original expression. It takes many forms based on different mathematical principles. Here we mainly introduce the polynomial MBA, which is the form most frequently encountered in obfuscation.

The MBA expression used in Fairplay's obfuscation is:

//OperationSet(+, -, *, &, |, ~)
x - c = (x ^ ~c) + ((2 * x) & ~(2 * c + 1)) + 1;
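The identity follows from x + y = (x ^ y) + 2 * (x & y) with y = ~c, together with x - c = x + ~c + 1 and 2 * ~c = ~(2 * c + 1) modulo 2^32. A short sketch that checks it on pseudo-random inputs (the function names and the simple linear congruential generator used for sampling are our choices):

```c
#include <stdint.h>

/* Right-hand side of the MBA rewriting rule for x - c. */
uint32_t mba_sub(uint32_t x, uint32_t c) {
    return (x ^ ~c) + ((2u * x) & ~(2u * c + 1u)) + 1u;
}

/* Compare mba_sub against plain subtraction on `samples` pseudo-random
 * (x, c) pairs; returns the number of mismatches. */
uint32_t count_mba_mismatches(uint32_t seed, uint32_t samples) {
    uint32_t state = seed, mismatches = 0;
    for (uint32_t i = 0; i < samples; i++) {
        state = state * 1664525u + 1013904223u;   /* LCG step */
        uint32_t x = state;
        state = state * 1664525u + 1013904223u;
        uint32_t c = state;
        if (mba_sub(x, c) != x - c)
            mismatches++;
    }
    return mismatches;
}
```

Sampling is of course not a proof, but it is a cheap sanity check when transcribing a rule out of disassembly.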

The obfuscation operation using MBA mainly relies on the following two steps:

Opaque Constant

Opaque constants build on MBA obfuscation and are used to hide constants in the data flow. They rely on permutation polynomials, which are invertible polynomials in finite fields.
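One textbook-style construction, sketched here under our own assumptions, hides a constant k as P(E + Q(k)), where P and Q are mutually inverse permutation polynomials and E is an MBA expression that is always zero. The sketch uses degree-1 polynomials built from the inverse pair 577687827 and 3980501275 (the 32-bit form of the affine example's coefficients). It illustrates the general idea only; it is not the generation rule used by Fairplay.

```c
#include <stdint.h>

/* Degree-1 permutation polynomial Q and its inverse P modulo 2^32:
 * P(Q(v)) == v for every v. */
static uint32_t perm_q(uint32_t v) { return 577687827u * v + 0xbeefbeefu; }
static uint32_t perm_p(uint32_t v) { return 3980501275u * v + 0x57f38dcbu; }

/* An MBA expression that is zero for every x and y, because
 * x + y == (x ^ y) + 2 * (x & y) holds modulo 2^32. */
static uint32_t mba_zero(uint32_t x, uint32_t y) {
    return (x ^ y) + 2u * (x & y) - (x + y);
}

/* Returns k for any x and y. In obfuscated code, perm_q(k) would be a
 * precomputed literal and x, y arbitrary live program values, so k itself
 * never appears in the binary. It is computed here only for clarity. */
uint32_t opaque_constant(uint32_t k, uint32_t x, uint32_t y) {
    uint32_t encoded = perm_q(k);
    return perm_p(mba_zero(x, y) + encoded);
}
```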

Control flow flattening

This is the most discussed topic in reverse engineering. It amounts to replacing normal control flow transfers with a state machine in order to defeat static control flow analysis, and the industry already has many solutions for it. Since this type of obfuscation is not obviously used in Fairplay DRM, we will not discuss it further.

Indirect Branch

The start addresses of some basic blocks are stored in global variables, and the index used to fetch them is produced by opaque constants, so that neither the disassembler nor a human reader can directly see the target of a jump. The model is as follows:

// record the basic block addresses in a global lookup table (LUT)
LUT[i] = PC;

// perform the jump
jmp/call LUT[makeOpaque(i)]

Specific examples:

In this way, the reverse engineer cannot directly obtain the target function and basic block of the jump. In the same way, by mapping the condition of the judgment statement to the jump table, it is also possible to confuse the conditional jump.
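The model can be imitated in portable C with a table of function pointers. In this sketch the "basic blocks" are small functions, the opaque index is the real index plus an always-zero MBA term fed from a volatile read, and a condition is mapped to an offset into the table. All names are hypothetical and the layout is only meant to mirror the LUT[makeOpaque(i)] pattern.

```c
#include <stdint.h>

static uint32_t block_a(uint32_t v) { return v + 1u; }
static uint32_t block_b(uint32_t v) { return v * 2u; }

/* Global lookup table of code addresses (here: function pointers). */
typedef uint32_t (*block_fn)(uint32_t);
static block_fn const LUT[2] = { block_a, block_b };

/* Stand-in for makeOpaque(i): adds an MBA term that is always zero but
 * depends on a value the compiler cannot see through. */
static uint32_t opaque_index(uint32_t i, uint32_t noise) {
    volatile uint32_t slot = noise;
    uint32_t x = slot;
    return i + ((x ^ i) + 2u * (x & i) - (x + i));
}

/* Unconditional indirect call: the target is only known at run time.
 * The modulo keeps this sketch memory-safe for any input. */
uint32_t call_indirect(uint32_t i, uint32_t arg, uint32_t noise) {
    return LUT[opaque_index(i, noise) % 2u](arg);
}

/* Conditional branch expressed as a table offset: the condition selects
 * between two slots instead of driving a visible conditional jump. */
uint32_t call_conditional(int cond, uint32_t arg, uint32_t noise) {
    uint32_t delta = 1u;   /* distance between the two slots in LUT */
    return call_indirect((cond != 0) * delta, arg, noise);
}
```

A disassembler looking at call_indirect sees a load from LUT followed by an indirect call, with no direct edge to block_a or block_b.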

In obfuscated Fairplay code, IDA Pro can most of the time only identify the first basic block of a function, and cannot determine the function boundaries.

Cross-function obfuscation + calling convention obfuscation

Under normal circumstances, parameter passing in languages such as C follows specific calling conventions, but some obfuscation tools modify the calling conventions of certain internal functions. Take Fairplay DRM as an example:

We can see that the conventional way of passing parameters through registers and the stack has been replaced by passing them through the heap. Once the structure is reconstructed, this style of parameter passing is clearly visible. In addition, some of the passed parameters are XOR-obfuscated and only restored inside the callee, making it difficult to obtain the original data directly, and static analysis tools such as IDA Pro do not support cross-function data flow analysis.

Worse still, some important data that the callee depends on has been hoisted into the caller, which makes it impossible to reason about the callee's execution until the calling relationship has been restored.
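To illustrate what such a modified calling convention looks like at the source level, here is a minimal sketch in which the caller packs XOR-masked arguments into a heap-allocated block and the callee receives only a pointer to it. The struct layout, the mask value, and the function names are all hypothetical; the real Fairplay structures are not reproduced here.

```c
#include <stdint.h>
#include <stdlib.h>

#define ARG_MASK 0xa5a5a5a5u   /* hypothetical XOR key shared by both sides */

/* Hypothetical heap-allocated argument block replacing register/stack
 * parameters. */
struct call_frame {
    uint32_t masked_a;
    uint32_t masked_b;
    uint32_t result;
};

/* Callee: takes a single pointer, unmasks its inputs, stores the result. */
static void add_obfuscated(struct call_frame *frame) {
    uint32_t a = frame->masked_a ^ ARG_MASK;
    uint32_t b = frame->masked_b ^ ARG_MASK;
    frame->result = a + b;
}

/* Caller: builds the block on the heap, masks the arguments, and reads
 * the result back. Returns 0 on success, -1 if allocation fails. */
int add_via_heap(uint32_t a, uint32_t b, uint32_t *out) {
    struct call_frame *frame = malloc(sizeof *frame);
    if (frame == NULL)
        return -1;
    frame->masked_a = a ^ ARG_MASK;
    frame->masked_b = b ^ ARG_MASK;
    add_obfuscated(frame);
    *out = frame->result;
    free(frame);
    return 0;
}
```

Viewed one function at a time, add_obfuscated appears to operate on meaningless values read from memory; the real arguments only become apparent once caller and callee are analyzed together.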

So, the way to crack Fairplay DRM is to find its weaknesses.

Weaknesses of Fairplay's obfuscation

Through the previous work, we have been able to complete the opening and decryption of Fairplay normally. Through a series of static analysis and tracking and debugging, we have discovered some countermeasures for this obfuscated system.

The root cause of these weaknesses is that the obfuscation system is designed at the IR level and does not obfuscate certain machine-specific operations. As a result, some characteristics of the pre-obfuscation code can still be inferred from the generated machine code.

Function boundary recognition

As mentioned earlier, because Fairplay uses indirect jumps for obfuscation, IDA Pro cannot directly determine function boundaries. Through tracing, we found that in the kernel driver on arm64e devices, all basic blocks of the same function use the same PAC context (also called the PAC modifier) when they reach the jump instruction.

With this feature, we can identify function boundaries and group basic blocks by function, although at this point the basic blocks are not yet connected to each other.
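The grouping step can be sketched as a sort over trace records. The record format below (block address plus the modifier observed at its jump) is a hypothetical shape for the tracer's output, and the sketch assumes that different functions use different modifiers, which the observation above suggests but does not guarantee.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical trace record: one per basic block seen by the tracer. */
struct trace_record {
    uint64_t block_addr;     /* start address of the basic block */
    uint64_t pac_modifier;   /* modifier observed at its jump instruction */
};

/* Order by modifier first, then by address within the same modifier. */
static int by_modifier_then_addr(const void *lhs, const void *rhs) {
    const struct trace_record *a = lhs, *b = rhs;
    if (a->pac_modifier != b->pac_modifier)
        return a->pac_modifier < b->pac_modifier ? -1 : 1;
    if (a->block_addr != b->block_addr)
        return a->block_addr < b->block_addr ? -1 : 1;
    return 0;
}

/* Sort the records so that blocks sharing a modifier are adjacent, and
 * return the number of distinct modifiers, i.e. candidate functions. */
size_t group_by_modifier(struct trace_record *records, size_t count) {
    if (count == 0)
        return 0;
    qsort(records, count, sizeof *records, by_modifier_then_addr);
    size_t groups = 1;
    for (size_t i = 1; i < count; i++)
        if (records[i].pac_modifier != records[i - 1].pac_modifier)
            groups++;
    return groups;
}
```

After sorting, each run of equal modifiers is one candidate function, and the lowest and highest block addresses in the run give a first estimate of its extent.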

Indirect jump

For unconditional jumps, we can resolve the target by setting breakpoints and tracing the execution flow.

With tools like Keypatch, we can restore some simple functions to the point where they are easier to understand.

Figure 10

The difficulty, however, lies in restoring conditional indirect jump instructions, as shown in the following figure:

Figure 11

For this jump instruction, we can generate the following expression:

//cmp x0, #0
w10 = qword[x12 + ((EQ * 0xB + w19) << 3)]
// 0xB is the difference between the LUT indices of the two basic blocks

From the form of the CSET instruction, we can already infer that the original jump instruction should be B.NE or B.EQ. Using our debugger plug-in, we can obtain the jump address of one of the branches together with the original jump instruction, and then, from the expression above, quickly infer the address of the other branch.
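Once the LUT base, the base index, and the index difference are known, both targets follow from evaluating the expression for the two possible CSET results. A minimal sketch, assuming the LUT has already been dumped into an array of 64-bit entries (the names resolve_branch and branch_targets are ours):

```c
#include <stddef.h>
#include <stdint.h>

struct branch_targets {
    uint64_t when_clear;   /* target when the CSET result is 0 */
    uint64_t when_set;     /* target when the CSET result is 1 */
};

/* Evaluate target = LUT[cond * delta + base_index] for cond = 0 and 1.
 * Returns 0 on success, -1 if either index falls outside the table. */
int resolve_branch(const uint64_t *lut, size_t lut_len,
                   size_t base_index, size_t delta,
                   struct branch_targets *out) {
    if (base_index >= lut_len || delta >= lut_len - base_index)
        return -1;
    out->when_clear = lut[base_index];
    out->when_set   = lut[base_index + delta];
    return 0;
}
```

In the example expression, delta is 0xB and base_index is the run-time value of w19. The entry observed in the debugger identifies which of the two slots was taken, and the other slot gives the branch that was not executed.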

Figure 12

Through Keypatch, we can get the branch statement structure before obfuscation:

Figure 13

At this point, we have been able to completely restore most of the control flow of Fairplay.

Data flow obfuscation

This part was covered in the previous section. So far we have identified the pattern of the MBA expressions, but we have not been able to find the complete rules by which Fairplay generates opaque constants. There currently appears to be only one MBA rewriting rule, namely:

x - c = (x ^ ~c) + ((2 * x) & ~(2 * c + 1)) + 1;

Some pattern-matching-based tools, such as D810, handle such cases well.

Concluding remarks

At present, we can already obtain the AES keys used to decrypt each segment of the Mach-O. After a lot of debugging and deobfuscation, we have reached a preliminary conclusion about how these keys are generated. Our ultimate goal is to complete the study of Fairplay DRM encryption and decryption without relying on Apple devices.

Figure 14

Finally, we attach the source code, and everyone is welcome to use it for reference and research.

References

Eyrolles, N. (2017). Obfuscation with Mixed Boolean-Arithmetic Expressions: reconstruction, analysis and simplification tools (Doctoral dissertation, Université Paris-Saclay)
https://github.com/obfuscator-llvm/obfuscator
https://github.com/HikariObfuscator/Hikari
https://github.com/keystone-engine/keypatch
https://eshard.com/posts/d810_blog_post_1

About the Author

Wu Liao, Luoluo, and Zhu Mi all come from the Information Security Department of Meituan.


This article is produced by the Meituan technical team, and the copyright belongs to Meituan. You are welcome to reprint or use the content of this article for non-commercial purposes such as sharing and communication; please indicate "the content is reproduced from the Meituan technical team". This article may not be reproduced or used commercially without permission. For any commercial activity, please send an email to tech@meituan.com to apply for authorization.