
Background

Cloud native complexity

In the 2000s, server-side software architectures were far simpler than cloud-native ones in composition, complexity and heterogeneity. At that time, most infrastructure components were either developed in-house by the enterprises that used them, or obtained by purchasing commercial products and support.

In the 2010s, the open-source movement and the "de-IOE" movement (replacing IBM servers, Oracle databases and EMC storage) took off, and enterprises became more inclined to choose open-source infrastructure components. However, maintaining an open-source foundation and solving its problems is not as cheap as it seems. Given the source code, can you really see through everything? Enterprises now face at least a few big questions:

From a high level:

  • How much money and talent must an enterprise invest to hire or train someone who truly understands the open-source infrastructure components it depends on?
  • Frequent releases, security vulnerabilities, and rapid change make it hard even for experts to quickly see how the software behaves at runtime.
  • Intricate dependencies and call relationships between components, plus version dependencies and upgrades, make it practically impossible to test in exactly the same environment (even with VM or Docker images).

    • Or are you still counting on backward compatibility, even though it has broken countless programmers' hearts and cost them countless nights?
    • As the ancient Greek philosopher Heraclitus said: no one ever steps into the same river twice.

In the details:

  • For large open-source projects, ordinary enterprises cannot afford to have engineers understand all of the code (note: understand, not merely read). What an enterprise actually cares about or uses may be only a small number of submodules, the ones tied to critical failures.
  • For large open-source projects, even if you believe you understand all of the code, you are unlikely to know the state of the entire system at runtime. Even the project's authors may not.

    • The authors do not work in your enterprise and cannot fully understand the characteristics of your data, not to mention the ubiquitous bugs.
  • The spirit of open-source software lies in openness and freedom ("free" as in freedom, not as in free of charge), and that freedom is not read-only: it is also writable.

    • Most open-source software is not designed by a genius product manager or architect at a large company; it is polished by many users. If you had to understand all of the code before you could modify it, I am afraid nobody could modify the Linux kernel.
  • Static code. I think this is the most important point. When we say we "understand all of the code", we mean the static code. But experienced programmers know that you only truly see through code when it runs. If I can analyze a running program, then I can claim to understand the code.

    • This reminds me of the typical code review: what exactly are you reviewing?

The Difficulties of Cloud-Native On-Site Analysis

Enough suspense. Is there any way to quickly sort out and analyze the runtime behavior of an open-source project?

  1. Add logs.

    1. If the source code already logs what you need to solve the problem, or provides a switch to enable such logs, just turn it on and you are done in time for dinner. But how often are you that lucky?
    2. Modify the open-source code, add logs, and push an emergency release. How much pull do you need with your operations team for that? And are you sure one round of added logs will be enough?
  2. Language-level dynamic instrumentation (code injection)

    1. Collect data or write logs from the injected code, e.g. alibaba/arthas for Java, or instrumentation tools for Go.
    2. This depends on the language; for C/C++ and similar languages it does not help.
    3. It generally has a significant performance impact.
  3. Debuggers

    1. Java debuggers, Go's Delve, gdb and the like all have a barrier to entry. For example, the program must be built with debug information, yet today, when everyone cares about image size, debug information is usually stripped. Moreover, hitting a breakpoint may suspend a thread or even the whole process, which would be a disaster in production.
  4. uprobe/kprobe/eBPF

    1. When none of the above works, this approach is worth a try. Next, let's look at what uprobe/kprobe/eBPF are and why they are valuable.

Reverse Engineering Thinking

Most programs today are written in high-level languages and then compiled either into executable files (.exe / ELF) or into intermediate code that is JIT-compiled at runtime. Either way, machine instructions must eventually be generated for the CPU to execute. For an open-source project, if we can find the mapping between the generated machine instructions and the source code, then:

  1. We can place a hook at a suitable position in those machine instructions (assume, for example, that this position is the entry point of a high-level-language function we care about).
  2. When the program reaches the hook, we can access:

    1. The current thread's function call stack
    2. The arguments and return value of the current function call
    3. The static/global variables of the current process

For open-source projects, knowing the actual runtime state is the key to solving problems during on-site analysis.

Since I don't want the beginning of this article to be so theoretical that it scares readers away, I moved the detailed discussion of reverse engineering thinking to the end.

Practice

In my earlier technical articles, I rarely wrote thousands of words without a single line of code. Maybe it is because I am getting older, but these days I always want to ramble more.

Show me the code.

Practice goals

Let's take Envoy, the sidecar proxy that forms the backbone of the so-called cloud-native service mesh, as an example, and look at its startup process and how it establishes client connections:

  1. Which code listens on the TCP port
  2. Whether the listening socket sets the well-known SO_REUSEADDR option
  3. Whether accepted TCP connections keep the notorious Nagle algorithm, which can increase network latency, or disable it by setting TCP_NODELAY (see https://en.wikipedia.org/wiki/Nagle%27s_algorithm)
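To make goals 2 and 3 concrete, here is a minimal C sketch of how a server typically sets these two options and how getsockopt(2) reads them back. It is my own illustration, not Envoy's code, and the function name demo_socket_options is hypothetical; Linux is assumed. Note the different levels: SO_REUSEADDR lives at SOL_SOCKET, while TCP_NODELAY lives at IPPROTO_TCP.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Minimal sketch (not Envoy's code): create a TCP socket, enable
 * SO_REUSEADDR (level SOL_SOCKET) and TCP_NODELAY (level IPPROTO_TCP),
 * then read both back with getsockopt(). Returns 0 on success. */
int demo_socket_options(int *reuseaddr_out, int *nodelay_out)
{
    assert(reuseaddr_out != NULL && nodelay_out != NULL);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int on = 1;
    socklen_t len1 = sizeof *reuseaddr_out;
    socklen_t len2 = sizeof *nodelay_out;
    int ok =
        /* Listener side: allow re-binding an address still in TIME_WAIT. */
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on) == 0 &&
        /* Connection side: disable Nagle so small writes go out at once. */
        setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on) == 0 &&
        getsockopt(fd, SOL_SOCKET, SO_REUSEADDR, reuseaddr_out, &len1) == 0 &&
        getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, nodelay_out, &len2) == 0;
    close(fd);
    return ok ? 0 : -1;
}
```

On Linux, calling demo_socket_options(&r, &n) should leave both r and n equal to 1. These are exactly the setsockopt() calls the tracing below looks for in Envoy.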

Enough talk; here come the protagonists: eBPF, and bpftrace, the tool we will use this time.

First, my environment:

  • Ubuntu Linux 20.04
  • The system's default bpftrace v0.9.4 (this version has a problem, discussed later)

Hello World

The three goals above are ambitious. Before tackling them, let's start with a small goal: a Hello World.

The main entry point of the Envoy source code is in main_common.cc:

int MainCommon::main(int argc, char** argv, PostServerHook hook) {
    ...
}

Our goal is to print a line when Envoy calls this function during initialization, showing that the interception succeeded.

First, look up the function's symbol (its address metadata) in the envoy executable:

➜  ~ readelf -s --wide ./envoy | egrep 'MainCommon.*main'                                                       
114457: 00000000016313c0   635 FUNC    GLOBAL DEFAULT   14 _ZN5Envoy10MainCommon4mainEiPPcNSt3__18functionIFvRNS_6Server8InstanceEEEE

Note that when C++ code is compiled, a function is not represented internally by its source-code name but by a mangled (normalized) name, which you can convert back manually with the c++filt command. Here the mangled name is _ZN5Envoy10MainCommon4mainEiPPcNSt3__18functionIFvRNS_6Server8InstanceEEEE, so that is the name we give bpftrace to intercept:
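To see what a mangled name encodes, here is a toy C decoder. It is a hypothetical helper that covers only a tiny subset of the Itanium C++ ABI mangling scheme: the leading _ZN...E part, where each name component is written as its length followed by the identifier. The remainder of the Envoy symbol, iPPcNSt3__18function..., encodes the parameter types (int, char**, std::__1::function<...>), which this sketch deliberately ignores; use c++filt for real work.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Toy decoder for the "_ZN <name> <name> ... E" prefix of an Itanium C++
 * ABI mangled name, where each <name> is <decimal length><identifier>.
 * Parameter types, templates, substitutions, operators, etc. are NOT handled;
 * use c++filt for real demangling. Returns 0 on success, -1 otherwise. */
int demangle_nested_name(const char *mangled, char *out, size_t out_len)
{
    assert(mangled != NULL && out != NULL);
    if (strncmp(mangled, "_ZN", 3) != 0 || out_len == 0)
        return -1;

    const char *p = mangled + 3;
    size_t used = 0;
    out[0] = '\0';
    while (*p != 'E') {
        if (!isdigit((unsigned char)*p))
            return -1;                      /* not a plain <length><identifier> */
        char *rest;
        unsigned long n = strtoul(p, &rest, 10);
        if (strlen(rest) < n)
            return -1;                      /* truncated input */
        size_t sep = used > 0 ? 2 : 0;      /* "::" between components */
        if (used + sep + n + 1 > out_len)
            return -1;                      /* output buffer too small */
        memcpy(out + used, "::", sep);
        memcpy(out + used + sep, rest, n);
        used += sep + n;
        out[used] = '\0';
        p = rest + n;
    }
    return used > 0 ? 0 : -1;
}
```

Fed the symbol from the readelf output above, demangle_nested_name() produces "Envoy::MainCommon::main": 5Envoy, 10MainCommon and 4main are the three length-prefixed components.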

bpftrace -e 'uprobe:./envoy:_ZN5Envoy10MainCommon4mainEiPPcNSt3__18functionIFvRNS_6Server8InstanceEEEE { printf("Hello world: Got MainCommon::main\n"); }'

Then run envoy in another terminal:

./envoy -c envoy-demo.yaml

Getting stuck

When I first learned photography, my teacher told me about something called beginner's luck. The tech world tends to work the other way. This time, I intercepted nothing. I tried all kinds of approaches based on what I thought was experience, to no avail, and spent about half a year in this fruitless cycle of groping around...

Breakthrough

After struggling for about half a year, I was ready to give up: I could not even reach a tiny Hello World goal. Then one day I realized that my fundamentals were weak, which was why I could not locate the root cause. So I filled in my knowledge of program linking, the ELF file format, how an ELF file is loaded into process memory, and so on. After a lot of hard work I found the root cause (in one sentence: the old version of bpftrace misinterpreted the function's address in the symbol metadata). I will cover the details in a separate article. The fix itself is simple: upgrade bpftrace. I compiled bpftrace v0.14.1 myself.
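As background only, and not as a diagnosis of the v0.9.4 bug: a uprobe is attached at an offset within the file, while readelf -s prints the symbol's virtual address (st_value). A tool must translate one into the other using the PT_LOAD program header that contains the address; the two coincide only when that segment's p_vaddr equals its p_offset. A minimal C sketch of the translation, with a simplified, hypothetical segment struct:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified view of one PT_LOAD program header (hypothetical struct;
 * the real one is Elf64_Phdr from <elf.h>). */
struct load_segment {
    uint64_t p_vaddr;  /* virtual address where the segment is mapped */
    uint64_t p_offset; /* offset of the segment's bytes in the file */
    uint64_t p_filesz; /* number of bytes that come from the file */
};

/* Translate a symbol value (a virtual address) into the file offset at
 * which a uprobe would be placed. Returns 0 and stores the offset in *off,
 * or -1 if no file-backed segment contains the address. */
int vaddr_to_file_offset(const struct load_segment *segs, size_t nsegs,
                         uint64_t vaddr, uint64_t *off)
{
    assert(off != NULL);
    for (size_t i = 0; i < nsegs; i++) {
        const struct load_segment *s = &segs[i];
        if (vaddr >= s->p_vaddr && vaddr < s->p_vaddr + s->p_filesz) {
            *off = vaddr - s->p_vaddr + s->p_offset;
            return 0;
        }
    }
    return -1;
}
```

With a hypothetical segment {p_vaddr = 0x201000, p_offset = 0x1000, p_filesz = 0x3000}, the address 0x2013c0 maps to file offset 0x13c0; a tool that used the address directly as the offset would probe the wrong bytes.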

Finally, after starting envoy, the output is:

Hello world: Got MainCommon::main
^C

Tracing Envoy's socket calls

I will present this part out of the usual order. Rather than starting with the implementation principles and walking through the script line by line, it is better to skim the code first and then run it so you can see the results.

Let's briefly browse the bpftrace program, trace-envoy-socket.bt:

#!/usr/local/bin/bpftrace

#include <linux/in.h>
#include <linux/in6.h>

BEGIN
{
       @fam2str[AF_UNSPEC] = "AF_UNSPEC";
       @fam2str[AF_UNIX] = "AF_UNIX";
       @fam2str[AF_INET] = "AF_INET";
       @fam2str[AF_INET6] = "AF_INET6";
}


tracepoint:syscalls:sys_enter_setsockopt
/pid == $1/
{
       // socket opts: https://elixir.bootlin.com/linux/v5.16.3/source/include/uapi/linux/tcp.h#L92     

       $fd = args->fd;
       $optname = args->optname;
       $optval = args->optval;
       $optval_int = *$optval;
       $optlen = args->optlen;
       printf("\n########## setsockopt() ##########\n");
       printf("comm:%-16s: setsockopt: fd=%d, optname=%d, optval=%d, optlen=%d. stack: %s\n", comm, $fd, $optname, $optval_int, $optlen, ustack);
}

tracepoint:syscalls:sys_enter_bind
/pid == $1/
{
       // printf("bind");
       $sa = (struct sockaddr *)args->umyaddr;
       $fd = args->fd;
       printf("\n########## bind() ##########\n");

       if ($sa->sa_family == AF_INET || $sa->sa_family == AF_INET6) {

              // printf("comm:%-16s: bind AF_INET(6): %-6d %-16s %-3d \n", comm, pid, comm, $sa->sa_family);
              if ($sa->sa_family == AF_INET) { //IPv4
                     $s = (struct sockaddr_in *)$sa;
                     $port = ($s->sin_port >> 8) |
                         (($s->sin_port << 8) & 0xff00);
                     $bind_ip = ntop(AF_INET, $s->sin_addr.s_addr);                         
                     printf("comm:%-16s: bind AF_INET: ip:%-16s port:%-5d fd=%d \n", comm,
                         $bind_ip,
                         $port, $fd);
              } else { //IPv6
                     $s6 = (struct sockaddr_in6 *)$sa;
                     $port = ($s6->sin6_port >> 8) |
                         (($s6->sin6_port << 8) & 0xff00);
                     $bind_ip = ntop(AF_INET6, $s6->sin6_addr.in6_u.u6_addr8);
                     printf("comm:%-16s: bind AF_INET6:%-16s %-5d \n", comm,
                         $bind_ip,
                         $port);
              }
              printf("stack: %s\n", ustack);

              // @bind[comm, args->uservaddr->sa_family,
              //        @fam2str[args->uservaddr->sa_family]] = count();

       }      
}

//tracepoint:syscalls:sys_enter_accept,
tracepoint:syscalls:sys_enter_accept4
/pid == $1/
{
       @sockaddr[tid] = args->upeer_sockaddr;
}


//tracepoint:syscalls:sys_exit_accept,
tracepoint:syscalls:sys_exit_accept4
/pid == $1/
{
       if( @sockaddr[tid] != 0 ) {
              $sa = (struct sockaddr *)@sockaddr[tid];
              if ($sa->sa_family == AF_INET || $sa->sa_family == AF_INET6) {
                     printf("\n########## exit accept4() ##########\n");

                     printf("accept4: pid:%-6d comm:%-16s family:%-3d ", pid, comm, $sa->sa_family);
                     $error = args->ret;

                     if ($sa->sa_family == AF_INET) { //IPv4
                            $s = (struct sockaddr_in *)@sockaddr[tid];
                            $port = ($s->sin_port >> 8) |
                            (($s->sin_port << 8) & 0xff00);
                            printf("peerIP:%-16s peerPort:%-5d fd:%d\n",
                            ntop(AF_INET, $s->sin_addr.s_addr),
                            $port, $error);
                            printf("stack: %s\n", ustack);
                     } else { //IPv6
                            $s6 = (struct sockaddr_in6 *)@sockaddr[tid];
                            $port = ($s6->sin6_port >> 8) |
                            (($s6->sin6_port << 8) & 0xff00);
                            printf("%-16s %-5d %d\n",
                            ntop(AF_INET6, $s6->sin6_addr.in6_u.u6_addr8),
                            $port, $error);
                            printf("stack: %s\n", ustack);
                     }
              }

              delete(@sockaddr[tid]);
       }
}

END
{
       clear(@sockaddr);
       clear(@fam2str);
}
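The port arithmetic in the script, ($s->sin_port >> 8) | (($s->sin_port << 8) & 0xff00), is a hand-written ntohs(): sin_port is stored in network byte order (big-endian), while x86-64 is little-endian. A C sketch of the same expression (the function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Same expression as in trace-envoy-socket.bt: swap the two bytes of a
 * 16-bit port taken from sin_port (network byte order, big-endian).
 * On a little-endian CPU such as x86-64 this equals what ntohs() returns. */
uint16_t port_from_sin_port(uint16_t sin_port)
{
    return (uint16_t)((sin_port >> 8) | ((sin_port << 8) & 0xff00));
}
```

For port 10000 (0x2710), the bytes on the wire are 0x27 0x10, which a little-endian CPU reads as 0x1027; port_from_sin_port(0x1027) turns that back into 10000.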

Let's run it now. If you don't understand why each step is done this way, don't worry; it is explained later:

  1. Start a shell process first, so that we know in advance the PID Envoy will run under (exec replaces the shell with Envoy inside the same process, so the PID stays the same):
$ bash -c '
echo "pid=$$";
echo "Press Enter to exec envoy ...";
read;
exec ./envoy -c ./envoy-demo.yaml'

Output:

pid=5678
Press Enter to exec envoy ...
  2. Start the bpftrace tracing script. In a new terminal, run:
$ bpftrace trace-envoy-socket.bt 5678
  3. Go back to the shell terminal from step 1 and press Enter. Envoy now actually starts, and its PID is still 5678.
  4. We now see near-real-time trace output in the terminal running the bpftrace script:
$ bpftrace trace-envoy-socket.bt 5678

########## 1.setsockopt() ##########
comm:envoy : setsockopt: fd=22, optname=2, optval=1, optlen=4. stack:
        setsockopt+14
        Envoy::Network::IoSocketHandleImpl::setOption(int, int, void const*, unsigned int)+90
        Envoy::Network::NetworkListenSocket<Envoy::Network::NetworkSocketTrait<...)0> >::setPrebindSocketOptions()+50
...
        Envoy::Server::ListenSocketFactoryImpl::createListenSocketAndApplyOptions()+114
...
        Envoy::Server::ListenerManagerImpl::createListenSocketFactory(...)+133
...
        Envoy::Server::Configuration::MainImpl::initialize(...)+2135
        Envoy::Server::InstanceImpl::initialize(...)+14470
...
        Envoy::MainCommon::MainCommon(int, char const* const*)+398
        Envoy::MainCommon::main(int, char**, std::__1::function<void (Envoy::Server::Instance&)>)+67
        main+44
        __libc_start_main+243


########## 2.bind() ##########
comm:envoy : bind AF_INET: ip:0.0.0.0          port:10000 fd=22
stack:
        bind+11
        Envoy::Network::IoSocketHandleImpl::bind(std::__1::shared_ptr<Envoy::Network::Address::Instance const>)+101
        Envoy::Network::SocketImpl::bind(std::__1::shared_ptr<Envoy::Network::Address::Instance const>)+383
        Envoy::Network::ListenSocketImpl::bind(std::__1::shared_ptr<Envoy::Network::Address::Instance const>)+77
        Envoy::Network::ListenSocketImpl::setupSocket(...)+76
...
        Envoy::Server::ListenSocketFactoryImpl::createListenSocketAndApplyOptions()+114
...
        Envoy::Server::ListenerManagerImpl::createListenSocketFactory(...)+133
        Envoy::Server::ListenerManagerImpl::setNewOrDrainingSocketFactory...
        Envoy::Server::ListenerManagerImpl::addOrUpdateListenerInternal(...)+3172
        Envoy::Server::ListenerManagerImpl::addOrUpdateListener(...)+409
        Envoy::Server::Configuration::MainImpl::initialize(...)+2135
        Envoy::Server::InstanceImpl::initialize(...)+14470
...
        Envoy::MainCommon::MainCommon(int, char const* const*)+398
        Envoy::MainCommon::main(int, char**, std::__1::function<void (Envoy::Server::Instance&)>)+67
        main+44
        __libc_start_main+243

Now simulate a client connection:

$ telnet localhost 10000

Once the connection succeeds, the bpftrace script continues to print output:

########## 3.exit accept4() ##########
accept4: pid:219185 comm:wrk:worker_1     family:2   peerIP:127.0.0.1        peerPort:38686 fd:20
stack:
        accept4+96
        Envoy::Network::IoSocketHandleImpl::accept(sockaddr*, unsigned int*)+82
        Envoy::Network::TcpListenerImpl::onSocketEvent(short)+216
        std::__1::__function::__func<Envoy::Event::DispatcherImpl::createFileEvent(...)+65
        Envoy::Event::FileEventImpl::assignEvents(unsigned int, event_base*)::$_1::__invoke(int, short, void*)+92
        event_process_active_single_queue+1416
        event_base_loop+1953
        Envoy::Server::WorkerImpl::threadRoutine(Envoy::Server::GuardDog&, std::__1::function<void ()> const&)+621
        Envoy::Thread::ThreadImplPosix::ThreadImplPosix(...)+19
        start_thread+217


########## 4.setsockopt() ##########
comm:wrk:worker_1    : setsockopt: fd=20, optname=1, optval=1, optlen=4. stack:
        setsockopt+14
        Envoy::Network::IoSocketHandleImpl::setOption(int, int, void const*, unsigned int)+90
        Envoy::Network::ConnectionImpl::noDelay(bool)+143
        Envoy::Server::ActiveTcpConnection::ActiveTcpConnection(...)+141
        Envoy::Server::ActiveTcpListener::newConnection(...)+650
        Envoy::Server::ActiveTcpSocket::newConnection()+377
        Envoy::Server::ActiveTcpSocket::continueFilterChain(bool)+107
        Envoy::Server::ActiveTcpListener::onAcceptWorker(...)+163
        Envoy::Network::TcpListenerImpl::onSocketEvent(short)+856
        Envoy::Event::FileEventImpl::assignEvents(unsigned int, event_base*)::$_1::__invoke(int, short, void*)+92
        event_process_active_single_queue+1416
        event_base_loop+1953
        Envoy::Server::WorkerImpl::threadRoutine(Envoy::Server::GuardDog&, std::__1::function<void ()> const&)+621
        Envoy::Thread::ThreadImplPosix::ThreadImplPosix(...)+19
        start_thread+217


########## 5.exit accept4() ##########
accept4: pid:219185 comm:wrk:worker_1     family:2   peerIP:127.0.0.1        peerPort:38686 fd:-11
stack:
        accept4+96
        Envoy::Network::IoSocketHandleImpl::accept(sockaddr*, unsigned int*)+82
        Envoy::Network::TcpListenerImpl::onSocketEvent(short)+216
        std::__1::__function::__func<Envoy::Event::DispatcherImpl::createFileEvent(...)+65
        Envoy::Event::FileEventImpl::assignEvents(unsigned int, event_base*)::$_1::__invoke(int, short, void*)+92
        event_process_active_single_queue+1416
        event_base_loop+1953
        Envoy::Server::WorkerImpl::threadRoutine(Envoy::Server::GuardDog&, std::__1::function<void ()> const&)+621
        Envoy::Thread::ThreadImplPosix::ThreadImplPosix(...)+19
        start_thread+217

If you have not used bpftrace before (which I suspect is true for most readers), try to interpret the output above yourself first, and then read my explanation below.

Analysis of the bpftrace script

Back to the bpftrace script trace-envoy-socket.bt above.

The script contains several probes named tracepoint:syscalls:sys_enter_xyz (and, for accept4, sys_exit_xyz); each is a hook. When the process makes the xyz system call, the corresponding hook runs. Inside the hook you can inspect the system call's input arguments (on entry) or its return value (on exit), as well as the current thread's user-space call stack, and you can save state in a BPF map.

In the example above, we intercept four events: setsockopt (entry), bind (entry), and accept4 (entry and exit), and print the relevant input and output parameters along with the user-space stack of the current thread.
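Why split accept4 into an entry and an exit probe? The caller passes in a pointer to a buffer, and the kernel writes the peer address into it only when the call returns, so the script saves the pointer per thread in @sockaddr[tid] on entry and reads the struct on exit. The C sketch below shows that contract from user space on loopback. It is a hypothetical helper, and it uses plain accept(), which has the same address-buffer contract as accept4().

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Minimal loopback sketch: listen on an ephemeral 127.0.0.1 port, connect
 * to it, then accept. The peer address buffer is written by the kernel
 * only when accept() returns. Writes "ip:port" of the peer into out.
 * Returns 0 on success, -1 on failure. */
int loopback_accept_peer(char *out, size_t out_len)
{
    assert(out != NULL && out_len > 0);
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK); /* port 0: kernel picks */
    socklen_t alen = sizeof addr;
    int cfd = -1, afd = -1, rc = -1;

    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        listen(lfd, 1) == 0 &&
        getsockname(lfd, (struct sockaddr *)&addr, &alen) == 0 &&
        (cfd = socket(AF_INET, SOCK_STREAM, 0)) >= 0 &&
        connect(cfd, (struct sockaddr *)&addr, sizeof addr) == 0) {
        struct sockaddr_in peer;          /* filled in by the kernel on return */
        socklen_t plen = sizeof peer;
        afd = accept(lfd, (struct sockaddr *)&peer, &plen);
        if (afd >= 0 && peer.sin_family == AF_INET) {
            char ip[INET_ADDRSTRLEN];
            inet_ntop(AF_INET, &peer.sin_addr, ip, sizeof ip);
            snprintf(out, out_len, "%s:%u", ip, (unsigned)ntohs(peer.sin_port));
            rc = 0;
        }
    }
    if (afd >= 0)
        close(afd);
    if (cfd >= 0)
        close(cfd);
    close(lfd);
    return rc;
}
```

The result starts with "127.0.0.1:" followed by the client's ephemeral port, the same kind of peerIP/peerPort pair the script prints in event 3.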

Each probe also has a filter, /pid == $1/, an extra condition that must hold for the probe to fire. Tracepoints are system-wide, but we only care about the Envoy process we started ourselves, so we filter on its PID. $1 is the first argument passed to the bpftrace trace-envoy-socket.bt 5678 command, i.e. Envoy's PID.
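The PID trick in step 1 of the run works because exec replaces the program running in a process without changing its PID, so the shell's $$ is also Envoy's PID. A small C sketch checks this. It is a hypothetical helper, and it assumes a POSIX system with /bin/sh. A forked child execs a shell that prints its own $$, and the parent compares that value with the child PID returned by fork().

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the PID printed by the exec'd shell equals the child's PID
 * as seen by the parent, 0 if it differs, -1 on error. */
int exec_keeps_pid(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t child = fork();
    if (child < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (child == 0) {
        /* Child: send stdout into the pipe, then replace this process image
         * with a shell that prints its own PID ($$). */
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        execl("/bin/sh", "sh", "-c", "echo $$", (char *)NULL);
        _exit(127); /* only reached if exec failed */
    }

    /* Parent: read what the shell printed and reap the child. */
    close(fds[1]);
    char buf[32] = {0};
    ssize_t n = read(fds[0], buf, sizeof buf - 1);
    close(fds[0]);
    int status;
    waitpid(child, &status, 0);
    if (n <= 0)
        return -1;
    return atol(buf) == (long)child ? 1 : 0;
}
```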

Analysis of the bpftrace output

  1. Envoy's main thread calls setsockopt on the main listening socket

    • comm:envoy: this is the main thread.
    • fd=22: the socket's file descriptor is 22 (every socket has a file descriptor number, which effectively serves as its ID).
    • optname=2, optval=1: option ID 2 (SO_REUSEADDR) is set to 1.
    • The frames from setsockopt+14 to __libc_start_main+243 are the current thread's call stack, which you can map back to the project's source code.
  2. Envoy's main thread calls bind to bind the main listening socket to IP 0.0.0.0, port 10000

    • comm:envoy: this is the main thread.
    • fd=22: the same socket as in the previous step.
    • ip:0.0.0.0 port:10000: the socket's listening address.
    • The rest is the current thread's call stack, which you can map back to the project's source code.
  3. The wrk:worker_1 thread, one of Envoy's worker threads, accepts a connection from a new client

    • comm:wrk:worker_1: one of Envoy's worker threads.
    • peerIP:127.0.0.1 peerPort:38686: the address of the new client.
    • fd:20: the newly accepted socket's file descriptor is 20.
  4. The wrk:worker_1 thread calls setsockopt on the new client socket

    • fd=20: the socket accepted in the previous step.
    • optname=1, optval=1: option ID 1 (TCP_NODELAY) is set to 1, i.e. the Nagle algorithm is disabled. The Envoy::Network::ConnectionImpl::noDelay frame in the stack confirms that this is the TCP-level option.
  5. Ignore this one for now. It is most likely the fabled epoll spurious wakeup, or simply the accept loop calling accept4 again after the queue is empty. Either way, the return value -11 is -EAGAIN (errno 11 on Linux): no further connection was pending on the non-blocking listening socket, so the peer address printed here is most likely stale data left in the buffer.
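The -11 in event 5 can be reproduced outside Envoy. The sketch below (a hypothetical helper, Linux assumed) makes a listening socket non-blocking, as event-loop servers do, and calls accept() with nothing queued. The call fails immediately with EAGAIN, which a sys_exit tracepoint reports as ret = -11. This is a minimal illustration, not Envoy's accept loop.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Creates a non-blocking listening socket on an ephemeral loopback port and
 * calls accept() with an empty accept queue. Returns the errno reported by
 * accept() (EAGAIN, i.e. 11 on Linux), or 0 if the setup failed. */
int accept_errno_on_empty_queue(void)
{
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return 0;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK); /* port 0: kernel picks */

    int err = 0;
    int flags = fcntl(lfd, F_GETFL);
    if (flags >= 0 && fcntl(lfd, F_SETFL, flags | O_NONBLOCK) == 0 &&
        bind(lfd, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        listen(lfd, 16) == 0 &&
        accept(lfd, NULL, NULL) < 0)
        err = errno; /* saved before close() can overwrite errno */
    close(lfd);
    return err;
}
```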

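That should be fairly clear, but the meaning of the option IDs in setsockopt needs a note. The script prints only optname, not the level argument, and an option ID is only meaningful together with its level; here the call stacks tell us which option each call sets.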

setsockopt parameter description:

level             optname  name           meaning
IPPROTO_TCP (6)   1        TCP_NODELAY    0: Nagle algorithm enabled (small TCP segments may be delayed); 1: Nagle algorithm disabled
SOL_SOCKET (1)    2        SO_REUSEADDR   1: address reuse enabled

With this tracing, we achieved our stated goals. We can also see each thread's call stack, so from the probe points we chose to focus on we can analyze Envoy's actual behavior and map it back to the source code. This gets you there faster and more purposefully than reading static source code alone. In particular, the high-level language features, OOP polymorphism and abstraction techniques used heavily in modern large projects can make it hard to infer runtime behavior and design intent just by reading the code; this technique greatly reduces that difficulty.

Outlook

//TODO

Reverse engineering thinking in detail

This subsection goes a bit deeper. It is not required knowledge, just some background, and space does not allow a complete explanation; see the References section for details. Feel free to skip it. If you have read this far, you are brave enough; don't let this paragraph scare you off.

The relationship between the memory of the process and the executable file

Executable file format

Program code is compiled and linked into executable files containing binary machine instructions. Executable files follow a format specification; on Linux, it is the Executable and Linkable Format (ELF). An ELF file contains machine instructions, static data, and metadata:

  • Static data: data hard-coded in the program, such as string constants.
  • Binary machine instructions: the instructions generated from the program's logic. Each function in the code compiles into a block of instructions, and the linker lays these blocks out consecutively in the .text section of the output ELF file. The .symtab section records the address of each function within .text; put simply, it maps function names in the code to addresses in the ELF file, and hence in the memory of the running process. .symtab is what makes our reverse-engineering analysis possible.
  • Metadata: tells the operating system how to load and dynamically link the executable and initialize the process's memory. It can also include information that is not required at runtime but helps locate problems, such as the .symtab section mentioned above.

Figure: Typical ELF executable object file. From [Computer Systems - A Programmer’s Perspective]
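To make .symtab less abstract, here is a minimal C sketch that does roughly what the readelf -s | egrep command in the Hello World section does. It is a hypothetical helper, handles 64-bit ELF only, and performs only light sanity checks. It loads the file, finds the SHT_SYMTAB section, and looks up a symbol's st_value by name. It returns 0 for stripped binaries, which have no .symtab.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the st_value of symbol `name` from the .symtab section of a 64-bit
 * ELF file, or 0 if the file cannot be read, is not ELF64, is stripped
 * (no SHT_SYMTAB section), or lacks the symbol. Not hardened against
 * malformed files. */
uint64_t elf_symbol_value(const char *path, const char *name)
{
    assert(path != NULL && name != NULL);
    FILE *f = fopen(path, "rb");
    if (!f)
        return 0;

    /* Read the whole file into memory. */
    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    rewind(f);
    unsigned char *buf = size > 0 ? malloc((size_t)size) : NULL;
    if (!buf || fread(buf, 1, (size_t)size, f) != (size_t)size) {
        free(buf);
        fclose(f);
        return 0;
    }
    fclose(f);

    uint64_t result = 0;
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)buf;
    if ((size_t)size >= sizeof *eh &&
        memcmp(eh->e_ident, ELFMAG, SELFMAG) == 0 &&
        eh->e_ident[EI_CLASS] == ELFCLASS64 &&
        eh->e_shoff + (uint64_t)eh->e_shnum * sizeof(Elf64_Shdr) <= (uint64_t)size) {
        const Elf64_Shdr *sh = (const Elf64_Shdr *)(buf + eh->e_shoff);
        for (size_t i = 0; i < eh->e_shnum && result == 0; i++) {
            if (sh[i].sh_type != SHT_SYMTAB)
                continue; /* only the full symbol table, i.e. .symtab */
            const Elf64_Sym *syms = (const Elf64_Sym *)(buf + sh[i].sh_offset);
            /* sh_link of a symbol table is the index of its string table. */
            const char *strtab = (const char *)(buf + sh[sh[i].sh_link].sh_offset);
            size_t count = sh[i].sh_size / sizeof(Elf64_Sym);
            for (size_t j = 0; j < count; j++) {
                if (strcmp(strtab + syms[j].st_name, name) == 0) {
                    result = syms[j].st_value;
                    break;
                }
            }
        }
    }
    free(buf);
    return result;
}
```

For example, elf_symbol_value("./envoy", "_ZN5Envoy10MainCommon4mainEiPPcNSt3__18functionIFvRNS_6Server8InstanceEEEE") should return the same value readelf printed earlier, assuming the binary has not been stripped.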

Process memory

A process in the general sense refers to a running instance of an executable file. The memory structure of a process may be roughly divided into:

Figure: Process virtual address space. From [Computer Systems - A Programmer’s Perspective]

Among these regions, the code (.text) region holds the program's machine instructions; it can be thought of as copied or mapped directly from the .text section of the executable file (not completely accurate, but close enough). The code of shared libraries is mapped in a similar way into the memory-mapped region for shared libraries.
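On Linux you can see these mappings directly in /proc/self/maps. The sketch below is a hypothetical, Linux-only helper that finds the mapping containing a given address and reports its permissions. The address of one of the program's own functions should fall in an executable (r-x) mapping of the executable file, while a local variable lives in the non-executable stack.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Find the /proc/self/maps entry whose address range contains addr and
 * copy its permission string (e.g. "r-xp") into perms.
 * Returns 0 on success, -1 if no mapping contains addr. */
int mapping_perms_for(const void *addr, char perms[5])
{
    assert(perms != NULL);
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f)
        return -1;

    uintptr_t a = (uintptr_t)addr;
    char line[4096];
    int rc = -1;
    /* Each line looks like: "start-end perms offset dev inode [path]". */
    while (fgets(line, sizeof line, f)) {
        unsigned long start, end;
        char p[5];
        if (sscanf(line, "%lx-%lx %4s", &start, &end, p) == 3 &&
            a >= start && a < end) {
            memcpy(perms, p, sizeof p);
            rc = 0;
            break;
        }
    }
    fclose(f);
    return rc;
}
```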

Function calls at the machine level

I'm not sure whether we are lucky or unlucky, but programmers today view programming very differently than they did in the 1990s. High-level languages, scripting languages, OOP and so on all tell programmers that they don't need to know the low-level details.

But understanding the underlying details can sometimes lead to broadly applicable innovations, such as kernel namespaces leading to containers, or netfilter to the service mesh.

Back to the function calls that are key to this article. In most cases, a function call in a high-level language is compiled into a function call in machine code, and the stack is handled much as it is in the high-level language.

Take the following code:

//main.c

void funcA() {
    int a;
}

void main() {
    int m;
    funcA();
}

Generate assembly:

gcc -S ./main.c

Assembly result snippet:

funcA:
    endbr64
    pushq    %rbp
    movq    %rsp, %rbp
    nop
    popq    %rbp
    ret
...


main:
    endbr64
    pushq    %rbp
    movq    %rsp, %rbp
    movl    $0, %eax
    call    funcA    # <----- calls funcA
    nop
    popq    %rbp
    ret

In other words, the machine level also has function call instructions (call and ret), and memory also has a stack.

Figure: The stack in memory and the CPU registers that reference it. From [BPF Performance Tools]

So, when a probe fires, we can inspect the current CPU registers. By walking the stack structure they point to, we can recover the current thread's function call chain. A function's arguments and return value are also passed in designated registers (on x86-64 Linux, under the System V ABI, the first six integer arguments go in rdi, rsi, rdx, rcx, r8 and r9, and the return value in rax), so we can read those as well. See the References section for the details.
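From inside a process you can walk the same stack structure with glibc's backtrace(3). This is not how BPF collects ustack (there the kernel walks the traced thread's user stack, which typically relies on frame pointers), but it illustrates the same idea. A minimal sketch, assuming glibc; func_a is a hypothetical stand-in for funcA above:

```c
#include <assert.h>
#include <execinfo.h>
#include <unistd.h>

/* Capture the calling thread's stack with glibc's backtrace() and print it
 * to stderr, innermost frame first. Returns the number of frames captured. */
int print_call_stack(void)
{
    void *frames[64];
    int n = backtrace(frames, 64);
    assert(n > 0); /* at least the frame of this function's caller chain */
    /* Names resolve only for exported symbols; link with -rdynamic to see
     * the names of functions inside the executable itself. */
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    return n;
}

/* Hypothetical callee mirroring funcA in the example above. */
int func_a(void)
{
    return print_call_stack();
}
```

Calling func_a() from main prints a short stack ending in main and __libc_start_main, much like the ustack output bpftrace printed for Envoy.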

Instrumentation points

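eBPF tools offer many ways to place instrumentation points; the most commonly used include kprobes (kernel functions), uprobes (user-space functions, as in the Hello World above), and tracepoints (static kernel hook points, as in trace-envoy-socket.bt).

For an in-depth look at which one to use, see [BPF Performance Tools].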

References

  • [Computer Systems - A Programmer's Perspective - Third edition] - Randal E. Bryant • David R. O'Hallaron - An in-depth book on computer principles from the perspective of programmers and operating systems. Introduces the basic principles of compilation and linking, program loading, process memory structure, function call stack, etc.
  • https://cs61.seas.harvard.edu/site/2018/Asm2/ - Fundamentals such as function call stacks
  • [Learning Linux Binary Analysis] - Ryan "elfmaster" O'Neill - In-depth analysis and exploitation of ELF format
  • The ELF format - how programs look from the inside
  • [BPF Performance Tools] - Brendan Gregg

Appendix: notes on getting stuck

Root cause

The root cause is similar to https://github.com/iovisor/bcc/issues/2648. I may write a detailed article about it later.

Do the release binaries include function metadata (.symtab)?

Do the release ELF binaries of Envoy and Istio Proxy include function metadata (.symtab) by default?

https://github.com/istio/istio/issues/14331

Argh, we ship envoy binary without symbols.

Could you get the version of your istio-proxy by calling /usr/local/bin/envoy --version? It should include commit hash. Since you're using 1.1.7, I believe the version output will be:

version: 73fa9b1f29f91029cc2485a685994a0d1dbcde21/1.11.0-dev/Clean/RELEASE/BoringSSL

Once you have the commit hash, you can download envoy binary with symbols from
https://storage.googleapis.com/istio-build/proxy/envoy-alpha-73fa9b1f29f91029cc2485a685994a0d1dbcde21.tar.gz (change commit hash if you have a different version of istio-proxy).

You can use gdb with that binary, use it instead of /usr/local/bin/envoy and you should see more useful backtrace.

Thanks!

@Multiply sorry, I pointed you at the wrong binary, it should be this one instead: https://storage.googleapis.com/istio-build/proxy/envoy-symblol-73fa9b1f29f91029cc2485a685994a0d1dbcde21.tar.gz (symbol, not alpha).

envoy binary file size - currently 127MB #240: https://github.com/envoyproxy/envoy/issues/240

mattklein123 commented on Nov 23, 2016

The default build includes debug symbols and is statically linked. If you strip symbols that's what takes you down to 8MB or so. If you want to go down further than that you should dynamically link against system libraries. FWIW, we haven't really focused very much on the build/package/install side of things. I'm hoping the community can help out there. Different deployments are going to need different kinds of compiles.

Original article:
https://blog.mygraphql.com/zh/posts/low-tec/trace/trace-istio/trace-istio-part1/


MarkZhu