Original article: How functions in a dynamic link library are addressed
Recently we found that after Go code was compiled into a shared object (so), memory usage increased a lot. A preliminary analysis points to dynamic symbols as the cause, so this is a good excuse to learn how they work~
hello world
#include <stdio.h>

int main() {
    printf("hello world\n");
    return 0;
}
Even in the simplest hello world, printf ultimately comes from the libc.so dynamic link library. In the objdump output we can find this line:
400638: call 400520 <printf@plt>
Here printf@plt means that printf is an external function the program depends on, so its address has to be resolved dynamically.
Why function addressing is needed
With static linking, a function's address can be determined at link time (before execution).
With a dynamic link library, however, the address can only be determined once the program is running and the so file has been loaded. So there is no way to know the address of a dynamic library function in advance, and a mechanism is needed to find it.
Specifically, there are two kinds of addressing:
- the address of a function exported by the so
- the address of a non-exported function that the so calls internally
In short:
- The first is addressed by function name. It is equivalent to the main program calling dlsym(x, "printf"), after which dlsym finds the address of printf in the so file.
- The second is addressed by offset. Although the absolute address is not fixed, the offset between two functions inside the so file is.
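To make the by-name case concrete, here is a minimal sketch of a run-time lookup with dlsym. It assumes a Linux/glibc-style system that provides dlfcn.h (older glibc versions need linking with -ldl). The helper lookup_by_name and the choice of atoi as the function to call are illustrative and not taken from the original program.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Signature of atoi() from libc, used to call through the looked-up address. */
typedef int (*atoi_fn)(const char *);

/*
 * Resolve a function by name at run time (hypothetical helper).
 * dlopen(NULL, ...) returns a handle for the main program; dlsym() on that
 * handle searches the program and the shared objects loaded with it.
 * Returns NULL if the handle cannot be opened or the symbol is not found.
 */
static void *lookup_by_name(const char *name) {
    void *handle = dlopen(NULL, RTLD_LAZY);
    if (handle == NULL) {
        return NULL;
    }
    void *addr = dlsym(handle, name);
    dlclose(handle);
    return addr;
}
```

A caller could write atoi_fn f = (atoi_fn)lookup_by_name("atoi"); and then call f("42"). The point is only that the name string is the key for the lookup.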
cache acceleration
Looking a function up by its name string is relatively inefficient, so how can it be sped up? The principle is simple: add a cache.
Specifically, two sections in the executable file cooperate to act as the cache: .plt, which is executable, and .got.plt, which is writable.
Let's start again from this call instruction:
400638: call 400520 <printf@plt>
The address 400520 lies in the .plt section. Since .plt is executable, we can keep reading the objdump output and see its instructions:
400520: jmp QWORD PTR [rip+0x200afa] # 601020 <printf@GLIBC_2.2.5>
400526: push 0x0
40052b: jmp 400500 <.plt>
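There are two jmp instructions here:
The first jmp takes its target from the value stored at 601020. The address 601020 lies in the .got.plt section, which is writable. On the first execution, the value stored at 601020 is 400526, the instruction right after the jmp, which means taking the slow path and doing a dynamic lookup. The second jmp, to 400500 at the start of .plt, enters that slow path.
Once the address is found, the value at 601020 is overwritten with it, so on later calls the first jmp alone completes the addressing and no string lookup is needed.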
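The caching behaviour described above can be imitated in plain C with a writable function pointer. This is only a sketch of the idea and not how the dynamic linker is implemented: got_slot stands in for the .got.plt entry, add_via_plt for the .plt entry, and resolver_stub for the slow path. All names, including real_add and slow_lookups, are hypothetical.

```c
typedef int (*add_fn)(int, int);

/* The real target, standing in for a function inside a shared object. */
static int real_add(int a, int b) { return a + b; }

/* Counts how many times the slow path runs. */
static int slow_lookups = 0;

static int resolver_stub(int a, int b);

/* Writable slot, standing in for the .got.plt entry.
 * It starts out pointing at the slow path. */
static add_fn got_slot = resolver_stub;

/* Slow path: "find" the address, patch the slot, then finish the call. */
static int resolver_stub(int a, int b) {
    slow_lookups++;
    got_slot = real_add;  /* cache the resolved address in the slot */
    return got_slot(a, b);
}

/* Stands in for the .plt entry: always an indirect jump through the slot. */
static int add_via_plt(int a, int b) {
    return got_slot(a, b);
}
```

The first call to add_via_plt goes through resolver_stub and patches the slot. Every later call jumps straight to real_add, so slow_lookups stays at 1.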
Lookup logic
The slow-path lookup eventually calls _dl_lookup_symbol_x, whose logic is roughly:
- First, in the current executable file, find the function name through the offset 0x0 (the value pushed at 400526), which is printf
- Then, in the so file, find the function address by the name printf
The core of the lookup uses data from two sections (both steps above use them, but each step reads them from a different file):
- .dynsym stores symbols, that is, Elf_Sym structures. Each one holds the function's offset address, the offset of its name, and so on.
- .dynstr stores strings. For example, the string printf itself lives here.
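The relationship between the two sections can be sketched with a pair of small tables. This is a simplified model: demo_sym is a hypothetical stand-in that keeps only two fields of Elf_Sym, the tables are sample data built from addresses that appear in this article, and the linear scan is chosen for clarity (the real loader uses hash tables to avoid comparing every name).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical stand-in for Elf_Sym: only two fields are kept. */
struct demo_sym {
    uint32_t name_off;  /* offset of the name inside the string table */
    uint64_t value;     /* address of the function; 0 if undefined here */
};

/* Stand-in for .dynstr: NUL-terminated names packed back to back. */
static const char demo_dynstr[] = "\0_init\0printf\0add";

/* Stand-in for .dynsym; sample entries for illustration only. */
static const struct demo_sym demo_dynsym[] = {
    { 1,  0x4004e8 },  /* "_init"                                  */
    { 7,  0x0      },  /* "printf", undefined, like U in nm output */
    { 14, 0x400612 },  /* "add"                                    */
};

/* Return the symbol called `name`, or NULL if there is none. */
static const struct demo_sym *demo_lookup(const char *name) {
    size_t n = sizeof demo_dynsym / sizeof demo_dynsym[0];
    for (size_t i = 0; i < n; i++) {
        /* The name is not stored in the symbol, only its offset is. */
        if (strcmp(demo_dynstr + demo_dynsym[i].name_off, name) == 0) {
            return &demo_dynsym[i];
        }
    }
    return NULL;
}
```

Each symbol entry stays small and fixed-size because the variable-length names live in the separate string table.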
Use nm -D to see this data. In the output below, U means undefined, that is, a function that has to be resolved externally:
00000000004004e8 T _init
U printf
Internal call
This case is much simpler: the offset is fixed, so no dynamic lookup is needed and a plain call instruction is enough. The x86 call instruction has several encodings, one of which is based on a relative offset.
Here's an interesting little detail, like this example:
000000000040061e <main>:
40061e: 48 83 ec 08 sub rsp,0x8
400622: bf 01 00 00 00 mov edi,0x1
400627: e8 e6 ff ff ff call 400612 <add>
40062c: 89 c6 mov esi,eax
The target address of the call instruction is 0x400612. Where does it come from?
The opcode e8 means the call is addressed by a relative offset, counted from the address of the next instruction. Reading 0xffffffe6 as a signed 32-bit value (-0x1a) gives: 0x40062c + 0xffffffe6 = 0x400612
The call targets shown by objdump and gdb are computed in the same way. If you don't pay attention, you will think they are absolute addresses.
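The calculation can be reproduced from the raw bytes of the instruction. The sketch below decodes only the e8 form with a 32-bit displacement; the function name call_rel32_target is hypothetical.

```c
#include <stdint.h>

/*
 * Compute the target of an x86-64 `call rel32` instruction (opcode e8).
 * `insn_addr` is the address of the e8 byte and `insn` holds its 5 bytes.
 * Returns 0 if the opcode is not e8.
 */
static uint64_t call_rel32_target(uint64_t insn_addr, const uint8_t insn[5]) {
    if (insn[0] != 0xe8) {
        return 0;
    }
    /* The 4 bytes after the opcode are a little-endian displacement. */
    uint32_t raw = (uint32_t)insn[1]
                 | ((uint32_t)insn[2] << 8)
                 | ((uint32_t)insn[3] << 16)
                 | ((uint32_t)insn[4] << 24);
    /* The displacement is signed, so 0xffffffe6 means -0x1a. */
    int32_t rel = (int32_t)raw;
    /* It is relative to the address of the next instruction. */
    uint64_t next = insn_addr + 5;
    return next + (uint64_t)(int64_t)rel;
}
```

Feeding it the bytes e8 e6 ff ff ff at address 0x400627 from the listing above yields 0x400612, the address of add.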
Summary
- A call to a function in a dynamic link library is resolved dynamically by function name
- Exported functions, as well as the external functions a file depends on, have their metadata recorded in .dynsym
- The function name strings are stored in .dynstr
- The .plt and .got.plt pair is used to cache resolved addresses
- Internal calls use the offset directly; one form of the call instruction computes its target from an offset
If you found this interesting, feel free to follow my public account "Uncle Soy Milk"~