Author | Ding Fei (alias: Luther)
Senior Engineer at Ant Group
He works on the commercialization of SOFAMesh products, focusing on the design and implementation of system architecture upgrade solutions based on service mesh technology.
This article is 4,394 words long; estimated reading time is 10 minutes.
|Foreword|
As the data plane component of Ant Group's Service Mesh solution, MOSN was designed from the start with third-party extension in mind. MOSN currently supports three extension mechanisms: gRPC, WASM, and Go native plugins.
I ran into many problems while leading the design and implementation of the extension capability based on Go's native plugin mechanism. Since little material on the subject is available, I decided to write this brief summary, and I hope it is helpful.
Note: This article covers only problems and solutions and does not walk through the code. A list of the key source code locations is given at the end.
PART. 1--Technical background of the article
1. Runtime
In programming languages, the concept of a "runtime" is usually associated with languages that need a VM. Running such a program takes two parts: the object code and the "virtual machine". Java is the most typical example: Java classes plus the JRE.
For languages that do not seem to need a "virtual machine", the "runtime" is rarely discussed, and running a program appears to take only one part, the object code. In fact, even C/C++ has a "runtime": the OS and libraries of the platform it runs on.
The same goes for Go. It seems to have little to do with a "virtual machine" or a "runtime", because running a Go program does not require a JRE-like "runtime" to be deployed in advance. In fact, the Go "runtime" is compiled into the binary as part of the object code.
Figure 1-1. Java program, runtime, and OS relationship
Figure 1-2. C/C++ program, runtime and OS relationship
Figure 1-3. Go program, runtime, and OS relationship
2. Go native plugin mechanism
Go appears to sit close to the C/C++ technology stack, so support for extension in the style of dynamic link libraries has long been a strong demand in the community.
As shown in Figure 1-5, Go provides the plugin package in the standard library as the language-level programming interface for plugins. In essence, the src/plugin package uses the cgo mechanism to call the standard Unix interfaces dlopen() and dlsym(). This gives programmers with a C/C++ background the illusion of "I know how to do this".
Figure 1-4. C/C++ program loading dynamic link library
Figure 1-5. The Go program loads the dynamic link library
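To make the interface concrete, here is a minimal sketch of loading a plugin with the standard plugin package. The helper name loadSymbol and the command-line arguments are assumptions made for illustration; only plugin.Open and Lookup come from the standard library.

```go
package main

import (
	"fmt"
	"os"
	"plugin"
)

// loadSymbol opens the shared object at path and returns the exported
// symbol called name. loadSymbol is a hypothetical helper, not MOSN code.
func loadSymbol(path, name string) (plugin.Symbol, error) {
	// Open loads the shared object; this is where the consistency checks
	// described later in the article can fail.
	p, err := plugin.Open(path)
	if err != nil {
		return nil, fmt.Errorf("open %s: %w", path, err)
	}
	// Lookup returns an exported package-level variable or function.
	sym, err := p.Lookup(name)
	if err != nil {
		return nil, fmt.Errorf("lookup %s in %s: %w", name, path, err)
	}
	return sym, nil
}

func main() {
	if len(os.Args) < 3 {
		fmt.Println("usage: loader <plugin.so> <symbol>")
		return
	}
	sym, err := loadSymbol(os.Args[1], os.Args[2])
	if err != nil {
		fmt.Println("load failed:", err)
		return
	}
	// The caller still has to type-assert sym to the expected type.
	fmt.Printf("loaded %s with dynamic type %T\n", os.Args[2], sym)
}
```

The returned plugin.Symbol is an empty interface, so the main program and the plugin also have to agree on the concrete type behind each symbol.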
PART. 2--Typical problem solving
Unfortunately, although a Go plugin is also built as a dynamic link library file, Go imposes a series of complex built-in constraints on developing and using plugins that the C/C++ technology stack does not. Worse, Go does not document these constraints systematically, and some of the design and implementation choices are poor, which makes troubleshooting plugin problems very painful.
This chapter focuses on the most common problems in developing and using Go plugins, mainly in compiling and loading them. Fully understanding these problems requires digging into the Go source code (mainly the compiler, linker, packager, and runtime); the corresponding solutions are given as well.
In short, when a Go main program loads a plugin, the "runtime" performs a series of consistency checks on the two, including but not limited to:
- The Go version is consistent
- The GOPATH is consistent
- The dependencies shared by both sides are consistent
- The code is consistent
- The source path is consistent
- Certain go build flags are consistent
1. Inconsistent Standard Library Versions
When the main program loads the plug-in, an error occurs:
plugin was built with a different version of package runtime/internal/sys
The error text says the library in question is runtime/internal/sys, which is clearly part of Go's built-in standard library. This may be puzzling: I compiled the main program and the plugin in the same local environment, so why would the standard library versions differ?
The answer is that Go's error message is inaccurate. The root cause is that some key compilation flags of the main program and the plugin are inconsistent, which has nothing to do with the "version".
For example, you compile the plugin with the following command:
GO111MODULE=on go build --buildmode=plugin -mod readonly -o ./codec.so ./codec.go
But you debug the main program in GoLand's debug mode. In that case, GoLand assembles the go build command for the main program itself.
Note that the command GoLand assembles contains the key parameter -gcflags all=-N -l, while the command used to compile the plugin does not. If you then try to load the plugin, you get the error about runtime/internal/sys.
Figure 2-1. Loading failure caused by inconsistent compilation flags
The fix for this kind of "standard library version" error is relatively simple: align the compilation flags of the main program and the plugins as far as possible. Some flags do not actually affect plugin loading, and you can find out which ones through practice.
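One way to keep the flags aligned is to generate both build commands from a single list. The sketch below is only an illustration: buildArgs, sharedFlags, ./main and ./main.go are hypothetical names, while the debug flags and ./codec.so come from the example above.

```go
package main

import "fmt"

// sharedFlags holds every flag that must be identical for the main program
// and its plugins. The value here is the pair of debug flags from the
// GoLand example; treat the list as an assumption to adapt.
var sharedFlags = []string{"-gcflags", "all=-N -l"}

// buildArgs returns the argument list for "go build". extra carries flags
// that legitimately differ between targets, such as -buildmode=plugin.
func buildArgs(out, src string, extra ...string) []string {
	args := []string{"build"}
	args = append(args, sharedFlags...)
	args = append(args, extra...)
	return append(args, "-o", out, src)
}

func main() {
	mainArgs := buildArgs("./main", "./main.go")
	pluginArgs := buildArgs("./codec.so", "./codec.go", "-buildmode=plugin")

	// The slices are meant to be passed to exec.Command("go", args...),
	// so no shell quoting is needed; %q just makes each argument visible.
	fmt.Printf("go %q\n", mainArgs)
	fmt.Printf("go %q\n", pluginArgs)
}
```

Because both commands are built from the same slice, adding or removing a debug flag changes the main program and the plugin together.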
2. Inconsistent third-party library versions
If you use vendor to manage Go dependencies, you will almost certainly hit the following error right after solving the problem in the previous section:
plugin was built with a different version of package xxxxxxxx
Here xxxxxxxx stands for a specific third-party library, such as github.com/stretchr/testify. This error has several typical causes. Without prior troubleshooting experience, some of them can burn a lot of developer time.
Case 1. Inconsistent versions
The cause seems obvious from the error: the main program and the plugin depend on different versions of a third-party library, and the error names the library. Compare the go.mod files of the main program and the plugin, find the version of that library in each, and check whether they match. If the main program and the plugin do differ in commit ID or tag, the fix is simple: align them.
In many scenarios, however, you use only part of a third-party library, such as one package or a few interfaces. That code may be identical across versions, but changes to code you never use still change the version of the whole library, and you become an innocent victim of the "inconsistent version".
You may then run into another question: which side is the baseline to align with, the main program or the plugin?
Common sense says that aligning with the main program is the better strategy. The plugin is the newly added "accessory", and the main program and its plugins usually have a one-to-many relationship. But what if the plugin's third-party dependencies cannot be aligned with the main program for some reason? After a long time trying, I have not found a perfect solution.
If the versions cannot be aligned, the only option is to abandon plugins altogether.
Go's blunt strong-consistency constraint on third-party libraries avoids potential runtime problems caused by inconsistent versions. On the other hand, a design that deliberately gives programmers no flexibility is very unfriendly to plugin, customization, and extension development.
Figure 2-2. Loading failure caused by inconsistent versions of a shared third-party dependency
Case 2. The version number is the same, but the code is inconsistent
Things get complicated when you check the go.mod files as in Case 1 and find, to your surprise, that the version of the library named in the error is the same on both sides. You might pull out the most advanced diff tool available and spend a morning comparing commit IDs of the third-party repository, yet everything matches, as if you were stuck with Schrödinger's version.
One possible cause is that someone has modified the code in the vendor directory directly, and the Go plugin mechanism checks the consistency of the code content.
This cause is daunting and hard to find. Nobody knows about it except the person who modified the code and those who have been burned by it before. If the modified vendor code is in the main program, there is almost no reliable way to make the two work together.
Don't change the code in the vendor directory directly!!!
Don't change the code in the vendor directory directly!!!
Don't change the code in the vendor directory directly!!!
Contribute the change back to the open source community, or fork and replace!!!
The good news is that you don't need to fix this, because even if you do, bigger problems are waiting for you.
Figure 2-3. Loading failure caused by in-place modification of a shared third-party dependency's code
Case 3. Inconsistent paths
If you have checked and fixed everything along the lines of Case 1 and Case 2 and still get different version of package, you may start to lose patience with Go's plugin mechanism: the versions really are the same and the code really is untouched, so why does it still report different versions?
The reason is that the plugin mechanism verifies the "path" of the dependency's source code, so vendor cannot be used to manage dependencies.
For example, if your main program's source code is in /path/to/main, the directory of one of its third-party dependencies is /path/to/main/vendor/some/third/party/lib;
similarly, if your plugin's source code is in /path/to/plugin, the directory of the same third-party dependency is /path/to/plugin/vendor/some/third/party/lib.
This "file path" data is packaged into the binary and used for verification. When the main program loads the plugin, Go's "runtime" "smartly" uses the difference in "file path" to conclude that the main program and the plugin do not share the same code, and reports different version of package.
Figure 2-4. Loading failure caused by using vendor mechanism to manage third-party libraries
The same problem can occur when the main program and the plugins are compiled on different machines or by different users: a different user name usually means a different path to the Go code.
The fix is straightforward: delete the vendor directory of both the main program and the plugins, or compile with the -mod=readonly flag.
At this point, if you compile the main program and the plugin on the same machine, the common problems should mostly be solved and the plugin mechanism should work. Also, since vendor is no longer used to manage dependencies, the Case 2 problem has to be solved properly: either submit a PR upstream, or fork and replace.
Figure 2-5. Successfully loaded
3. Inconsistent Go versions
fatal error: runtime: no plugin module data
The error above is another common one, seen when the main program and the plugin are compiled separately on different machines. One possible cause is inconsistent Go versions; the fix is to align them. (What if they cannot be aligned at the machine level? ...)
Figure 2-6. Loading failure due to inconsistent Go versions
PART. 3--Unified solution
Part 2 covered several problems that are neither easy to troubleshoot nor easy to fix, and there are others that were not highlighted. For an extension mechanism officially supported by the language, it is surprising how unfriendly it is to use.
Because "Proprietary Cloud MOSN" relies mainly on Go's plugin mechanism for customization, we needed a systematic solution to all of these problems. After trying, without success, to modify the Go source code directly (a gripe: the source code of the Go plugin mechanism leaves something to be desired), we shifted our focus to the "product layer" and the surrounding infrastructure:
- Unified compilation environment:
  - Provide a standard Docker image for compiling the main program and plugins, to avoid any problems caused by inconsistent Go versions, GOPATH paths, user names, and so on;
  - Pre-populate go/pkg/mod in the image, to minimize re-downloading dependencies on every build now that vendor mode is not used.
- Unified Makefile:
  - Provide one set of Makefiles for compiling the main program and plugins, to avoid any problems caused by differing go build commands.
- Unified plugin development scaffolding:
  - The scaffolding, not the developer, pins the plugin's dependencies to the versions used by the main program, and it handles the other related problems as well.
- Pipelining:
  - Compile and deploy through a pipeline to further reduce errors.
Figure 3-1. Unified Solution
PART. 4--Key source code location
If you want to understand plugin verification at its root, the entry points below will help you get into the source code quickly. I am using Go 1.15.2. Relevant Go source locations:
- compiler: go/src/cmd/compile/*
- linker: go/src/cmd/link/internal/ld/*
- pkg loader: go/src/cmd/go/internal/load/*
- runtime: go/src/runtime/*
1. What does go build do?
Add the -x parameter to the go build command to print the entire process of compiling, linking, and packaging a Go program, for example:
go build -x -buildmode=plugin -o ../calc_plugin.so calc_plugin.go
2. Object code generation
go/src/cmd/compile/internal/gc/obj.go:55: note lines 67 and 72, which are the two entry points;
go/src/cmd/compile/internal/gc/iexport.go:244: note line 280, where path-related data is recorded.
3. Library hash generation algorithm
go/src/cmd/link/internal/ld/lib.go:967: note lines 995-1025, where the package hash is calculated.
4. Library hash check
go/src/runtime/symtab.go:392: key data structures;
go/src/runtime/plugin.go:52: the check that compares the link-time hash with the runtime hash;
go/src/cmd/link/internal/ld/symtab.go:621: where the link-time hash is assigned;
go/src/cmd/link/internal/ld/symtab.go:521: where the runtime hash is assigned.
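The idea behind the check can be shown with a toy model. This is not the runtime's code: the pkgHash type, its field names, and the hash values are illustrative assumptions. It only mirrors the comparison of a link-time hash with a runtime hash for each package, and the error text quoted earlier in this article.

```go
package main

import "fmt"

// pkgHash is a simplified stand-in for the per-package record kept in the
// runtime's module data. Field names are illustrative.
type pkgHash struct {
	pkgPath      string
	linkTimeHash string // recorded when the plugin was linked
	runtimeHash  string // held by the already-running main program
}

// firstMismatch returns the first package whose two hashes differ, or the
// empty string when every package matches.
func firstMismatch(hashes []pkgHash) string {
	for _, h := range hashes {
		if h.linkTimeHash != h.runtimeHash {
			return h.pkgPath
		}
	}
	return ""
}

func main() {
	// Made-up hash values for two packages shared by both sides.
	hashes := []pkgHash{
		{pkgPath: "runtime/internal/sys", linkTimeHash: "aaa", runtimeHash: "aaa"},
		{pkgPath: "github.com/stretchr/testify", linkTimeHash: "bbb", runtimeHash: "ccc"},
	}
	if pkg := firstMismatch(hashes); pkg != "" {
		fmt.Println("plugin was built with a different version of package " + pkg)
	}
}
```

The model also shows why the message is misleading: any input to the hash that differs (flags, code, or path) surfaces as a "different version".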
PART. 5--Summary
As you can see, although Go's native plugin mechanism causes all kinds of headaches, the SOFAStack team has stayed true to its commitment to being "open source, open, and extensible", solved the problems by various means, and made this capability production-ready.
At present, protocol codecs and loggers in Proprietary Cloud MOSN can be fully customized through plugins. Next, we will keep upgrading the MOSN architecture, aiming to provide plugin support for more capabilities, including routing logic, LB logic, and registry/configuration center integration.
Learn more:
MOSN Star ✨: https://github.com/mosn/mosn