This article is based on the Youpaiyun team's practical experience with private modules. It introduces how to use private modules and the details behind the go get tool, including how go get finds the correct source for a private module hosted on GitLab and how authentication is handled. The article is compiled from a talk given by Liu Yunpeng, a senior development engineer at Youpaiyun, in an Open Talk live open class.
About Open Talk: a technology salon series initiated by Youpaiyun. In keeping with Youpaiyun's original goal of "making entrepreneurship easier", it offers developers practical knowledge sharing across technology, operations, product, and entrepreneurship, helping people in companies improve their professional skills and helping companies grow better and faster.
Background
Go introduced modules in version 1.11. Version 1.13 added module checksum verification, which strengthened module security, and since version 1.16 module mode is the default. The Go team recently stated on its blog that it plans to remove support for GOPATH mode in version 1.17. If you have not used Go modules yet, now is the time to try them.
The main difference between module mode and GOPATH mode lies in how private modules are used. Public modules work the same way in both: they are fetched directly with go get. With GOPATH, a private module's code can simply be dropped into the GOPATH directory. Module mode does not allow this because it manages code in its own way, which is briefly introduced below.
How Go gets modules
Go usually fetches modules with the go get tool, which currently supports two methods:
The first is to pull code from a traditional VCS code hosting platform. This is mainly git, but svn, hg, and other tools are also supported.
The second is the GOPROXY protocol, supported by version 1.12, in which Go downloads the module's archive file from a GOPROXY server.
Starting from 1.13, Go also performs a checksum check (go.sum): every module is verified after it is downloaded. Go compares the hash of the downloaded module with the hash recorded in Google's online database to detect tampering. Only modules that pass verification can be installed and used.
How modules are obtained through VCS
Go supports many version control tools, so it first has to determine which tool to use for a given module. There are roughly three ways of deciding: two static matching methods and one dynamic matching method.
Static matching methods
Prefix matching: code hosting platforms such as GitHub, Bitbucket, Apache, and OpenStack are built into the go get tool chain. go get checks the module's prefix, and when it matches one of these hosts, the corresponding version control tool is used. In the example on the left of the figure, the module github.com/example/pkg matches the github.com prefix, so go get knows that git should be used.
Regular expression matching: the module path carries a VCS suffix. The suffix can be that of any of the five supported version control tools (git, svn, hg, bzr, and fossil), and it is matched with a regular expression. Both examples in the figure use the .git suffix. The regular expression captures the VCS sub-group, which shows that the module is managed by git.
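The two static methods can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the go command's actual code: the host table is a small subset, the regular expression is a simplified stand-in for the real one, and the names knownHosts, vcsSuffix, and staticVCS are made up for this example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// knownHosts is an illustrative subset of hosting platforms with a
// hard-coded VCS; the real table lives inside the go command.
var knownHosts = map[string]string{
	"github.com/":    "git",
	"bitbucket.org/": "git",
}

// vcsSuffix is a simplified pattern: a path element ending in a VCS
// suffix marks the repository root, e.g. example.com/team/pkg.git/sub.
var vcsSuffix = regexp.MustCompile(
	`^[A-Za-z0-9.\-]+(/[A-Za-z0-9_.\-]+)+?\.(?P<vcs>bzr|fossil|git|hg|svn)(/[A-Za-z0-9_.\-]+)*$`)

// staticVCS tries prefix matching first, then suffix matching.
// ok is false when neither applies and dynamic discovery is needed.
func staticVCS(modulePath string) (vcs string, ok bool) {
	for prefix, tool := range knownHosts {
		if strings.HasPrefix(modulePath, prefix) {
			return tool, true
		}
	}
	if m := vcsSuffix.FindStringSubmatch(modulePath); m != nil {
		return m[vcsSuffix.SubexpIndex("vcs")], true
	}
	return "", false
}

func main() {
	for _, p := range []string{
		"github.com/example/pkg",       // prefix match
		"example.com/team/pkg.git/sub", // suffix match
		"example.com/team/pkg",         // neither: needs dynamic discovery
	} {
		vcs, ok := staticVCS(p)
		fmt.Printf("%-30s vcs=%q static=%v\n", p, vcs, ok)
	}
}
```

A path that matches neither rule is the case handled by the dynamic method described next.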
Dynamic matching method
When neither the prefix nor the regular expression matches, go get falls back to dynamic detection. It sends an HTTP request whose URL is the module path with a protocol scheme in front and the query parameter go-get=1 appended, and it expects the server to return information about the module that tells it how to proceed. Go sends HTTPS requests by default. If the server only offers plain HTTP, this can be handled with the GOINSECURE environment variable: it takes a comma-separated list of module path patterns, and Go is allowed to fall back to HTTP for modules that match.
The response body go get expects is an HTML document. The part that matters to Go is the meta tag with the attribute name="go-import", whose content attribute tells Go how to get the module.
The content attribute has three parts. The first, root-path, is the name of the module. The second, vcs, is the version control tool to use, such as git or svn. The third, repo-url, is the repository that holds the module's source code, written as a protocol scheme followed by the repository address.
The figure above uses one of Go's golang.org/x sub-repositories as an example, with curl simulating the go get request. The golang.org/x/net server returns an HTML document, and the useful part is the meta tag circled in red. The first part of content is the Go module name, golang.org/x/net. The second part is git, meaning that git must be used to fetch the source code. The third part is the address where the module is hosted, go.googlesource.com/net. Note that meta tags may only appear in the head: go get parses the document from the beginning and stops when it reaches the closing head tag or the opening body tag.
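The parsing rule described above can be sketched with the standard encoding/xml package in non-strict mode. This is a minimal illustration, not the go command's own parser. The samplePage constant is a hand-written reconstruction of such a response, and goImport, attr, and parseGoImports are names chosen for this example.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// samplePage is a hypothetical response; the second meta tag sits in
// the body and must be ignored.
const samplePage = `<!DOCTYPE html>
<html><head>
<meta name="go-import" content="golang.org/x/net git https://go.googlesource.com/net">
</head><body>
<meta name="go-import" content="ignored.example/x git https://ignored.example/x">
</body></html>`

// goImport holds the three parts of the content attribute.
type goImport struct {
	RootPath, VCS, RepoURL string
}

// attr returns the value of the named attribute, or "" if absent.
func attr(e xml.StartElement, name string) string {
	for _, a := range e.Attr {
		if strings.EqualFold(a.Name.Local, name) {
			return a.Value
		}
	}
	return ""
}

// parseGoImports collects go-import meta tags and stops at </head> or <body>.
func parseGoImports(page string) []goImport {
	d := xml.NewDecoder(strings.NewReader(page))
	d.Strict = false // tolerate HTML that is not well-formed XML
	var found []goImport
	for {
		tok, err := d.RawToken()
		if err != nil {
			return found // end of input or syntax error: keep what we have
		}
		switch e := tok.(type) {
		case xml.EndElement:
			if strings.EqualFold(e.Name.Local, "head") {
				return found
			}
		case xml.StartElement:
			if strings.EqualFold(e.Name.Local, "body") {
				return found
			}
			if !strings.EqualFold(e.Name.Local, "meta") || attr(e, "name") != "go-import" {
				continue
			}
			if f := strings.Fields(attr(e, "content")); len(f) == 3 {
				found = append(found, goImport{f[0], f[1], f[2]})
			}
		}
	}
}

func main() {
	for _, m := range parseGoImports(samplePage) {
		fmt.Printf("root=%s vcs=%s repo=%s\n", m.RootPath, m.VCS, m.RepoURL)
	}
}
```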
How go get uses git
Git supports both HTTP and SSH. By default Go only uses HTTP(S) when calling git, and git's interactive prompts are disabled during the call. For example, cloning a private repository over HTTP requires a user name and password, but when Go calls git they cannot be entered interactively, so fetching the module fails. The prompt is controlled by the environment variable GIT_TERMINAL_PROMPT. If you set it to 1 manually, prompting is enabled and you can type the user name and password.
So how can the user name and password be passed to git transparently? When git uses the HTTP protocol, they can be supplied through the .netrc file in the HOME directory. An entry can take two forms:
- The first form names a server and defines the user name and password for that server;
- The second form names no server and defines one user name and password for all servers.
As shown in the figure above, the first entry configures gitlab.com with the user name root and the password admin. When git clones a private repository from gitlab.com, root and admin are passed to it automatically, so no password has to be typed. The second entry uses default to set a user name and password for all servers: the user name is guest and the password is 123456. Whenever any server other than gitlab.com requires authentication, guest and 123456 are passed to the requesting program as the user name and password.
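To make the lookup order concrete, here is a small sketch of how a program could resolve credentials from a file like the one described above. The file content mirrors the two entries from the text. The parser is deliberately simplified (whitespace-separated tokens only) and is not the one used by git or curl; sampleNetrc, netrcEntry, parseNetrc, and lookup are names invented for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// sampleNetrc mirrors the two entries described in the text.
const sampleNetrc = `
machine gitlab.com login root password admin
default login guest password 123456
`

type netrcEntry struct {
	machine   string // empty for the default entry
	isDefault bool
	login     string
	password  string
}

// parseNetrc reads machine/default entries from whitespace-separated tokens.
func parseNetrc(content string) []netrcEntry {
	var entries []netrcEntry
	f := strings.Fields(content)
	for i := 0; i < len(f); i++ {
		switch f[i] {
		case "default":
			entries = append(entries, netrcEntry{isDefault: true})
		case "machine", "login", "password":
			if i+1 >= len(f) {
				return entries // keyword without a value
			}
			key, val := f[i], f[i+1]
			i++
			switch {
			case key == "machine":
				entries = append(entries, netrcEntry{machine: val})
			case len(entries) == 0:
				// login or password before any entry: ignore
			case key == "login":
				entries[len(entries)-1].login = val
			default:
				entries[len(entries)-1].password = val
			}
		}
	}
	return entries
}

// lookup returns the first entry that names the host, or the default entry.
func lookup(content, host string) (login, password string, ok bool) {
	for _, e := range parseNetrc(content) {
		if e.isDefault || e.machine == host {
			return e.login, e.password, true
		}
	}
	return "", "", false
}

func main() {
	for _, host := range []string{"gitlab.com", "example.org"} {
		login, password, _ := lookup(sampleNetrc, host)
		fmt.Printf("%s -> %s / %s\n", host, login, password)
	}
}
```

Because the first matching entry wins, the default entry only makes sense as the last one in the file.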
Go can also call git over SSH, but not by default. SSH is only used when it is explicitly specified during dynamic discovery. If the module information can be resolved by static matching (prefix or regular expression), only HTTPS can be used.
The module in the figure above is example.com/pkg, and its repository address is gitlab.example.com/example/pkg. The content of the meta tag contains the complete module information. The first part is the module name, defined as before. It is followed by git, meaning the code is fetched with git. The last part is the repository address, which explicitly specifies the SSH protocol together with the git user name and the port of the server's SSH service.
Git's SSH authentication is based on a key pair. If you do not have one, you can generate it with ssh-keygen from the SSH tool suite. The figure above shows the commonly used -t parameter, which selects the key type. RSA keys are probably the most common, but I prefer ED25519. Its obvious advantage is a very short key: the public and private keys are only 32 bytes each, while the security is comparable to an RSA key of about 3000 bits. Because it is secure and the key is short, ED25519 is a common choice.
Once the key pair is generated, its files are written to the .ssh folder under HOME: one for the private key and one for the public key. The file ending in ".pub" is the public key, which has to be configured on the code hosting platform, such as GitLab or GitHub. On the right is a screenshot from GitLab. The key in the picture is in ED25519 format, and you can see that it really is very short.
Getting modules through GOPROXY
Go supports fetching modules through the GOPROXY protocol. The protocol is based on HTTP: it only uses GET requests and standard HTTP status codes. Public GOPROXY servers require no user name or password by default. If you build a private one, it can support HTTP basic authentication in the same way as before, with the user name and password configured in the .netrc file. In addition, GOPROXY has two advantages:
- First, fetching modules through GOPROXY is faster than cloning directly from VCS. The reason is explained in detail later.
- Second, it solves the problem of modules that cannot be reached, for example when the golang.org domain is inaccessible. Such modules can be downloaded through a proxy server run by a third party.
GOPROXY is configured through the GOPROXY environment variable, whose value is the proxy server URL. Several URLs can be listed, separated by commas or pipe characters. The difference between the two separators is explained below.
In place of a URL, the fixed strings off and direct can be used. off forbids downloading modules from any source: with GOPROXY set to off, only local modules can be used, and nothing can be downloaded from GitLab, GitHub, or anywhere else. direct means pulling directly from VCS and is generally used as a fallback.
Two examples are shown in the figure:
- The first uses Linux environment variable syntax and is set with export. proxy.golang.org, Google's official GOPROXY server, comes first, and the fallback direct is specified after the comma. A 404 or 410 status code from the GOPROXY server means the module cannot be found. When fallbacks are separated by commas, go get only tries the next one if the server returns 404 or 410. Here the fallback is to download the code from the version control platform.
- The second uses a different syntax, go env -w, which is built into Go and supported since Go 1.13. It works across platforms, so Go-related environment variables can be configured the same way on Windows, Linux, and macOS. The example sets the proxy commonly used in China, goproxy.cn, and uses the pipe character to specify the fallback. With the pipe character, go get tries the fallback no matter what error the proxy returns, for example a 500 response from the GOPROXY server or a non-HTTP error such as a network failure.
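The difference between the two separators can be captured in a few lines. The sketch below is only an illustration of the rule, not the go command's implementation. proxyEntry, parseGOPROXY, and tryNext are hypothetical names, and a status of 0 is used here as a stand-in for a non-HTTP failure such as a network error.

```go
package main

import (
	"fmt"
	"strings"
)

// proxyEntry is one element of a GOPROXY list.
type proxyEntry struct {
	Target        string // proxy URL, "direct", or "off"
	FallbackOnAny bool   // true when the entry is followed by '|'
}

// parseGOPROXY splits a GOPROXY value and records which separator
// follows each entry.
func parseGOPROXY(value string) []proxyEntry {
	var entries []proxyEntry
	for value != "" {
		target, sep := value, byte(0)
		if i := strings.IndexAny(value, ",|"); i >= 0 {
			target, sep, value = value[:i], value[i], value[i+1:]
		} else {
			value = ""
		}
		if target = strings.TrimSpace(target); target != "" {
			entries = append(entries, proxyEntry{Target: target, FallbackOnAny: sep == '|'})
		}
	}
	return entries
}

// tryNext reports whether a failed request to e should fall through to
// the next entry. httpStatus is 0 for non-HTTP failures.
func tryNext(e proxyEntry, httpStatus int) bool {
	if e.FallbackOnAny {
		return true // pipe: any error triggers the fallback
	}
	return httpStatus == 404 || httpStatus == 410 // comma: only "not found"
}

func main() {
	for _, v := range []string{
		"https://proxy.golang.org,direct",
		"https://goproxy.cn|direct",
	} {
		first := parseGOPROXY(v)[0]
		fmt.Printf("%-35s on 404: %v, on 500: %v, on network error: %v\n",
			v, tryNext(first, 404), tryNext(first, 500), tryNext(first, 0))
	}
}
```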
The GOPROXY implementation is very simple: the official definition has only five interfaces.
The three variables in the URLs have the following meanings:
- base is the URL of the GOPROXY server;
- module is the name of the module to fetch;
- version is the version of the module.
Case encoding
Module paths and versions are case-sensitive, but some systems, such as case-insensitive file systems, do not distinguish upper and lower case, so a module or version containing capital letters can cause confusion. To avoid this, upper-case letters are encoded: each one is replaced by an exclamation mark followed by the corresponding lower-case letter.
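The encoding rule is small enough to show directly. The function below is a minimal sketch that handles ASCII upper-case letters only and performs no validation of the path. The name escapePath is chosen for this example, and the module path in main is just sample input.

```go
package main

import (
	"fmt"
	"strings"
)

// escapePath replaces every ASCII upper-case letter with '!' followed
// by the corresponding lower-case letter.
func escapePath(s string) string {
	var b strings.Builder
	for _, r := range s {
		if 'A' <= r && r <= 'Z' {
			b.WriteByte('!')
			b.WriteRune(r + 'a' - 'A')
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(escapePath("github.com/Azure/azure-sdk-for-go"))
	// prints github.com/!azure/azure-sdk-for-go
}
```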
- The first interface returns the list of all versions;
- The second returns information about a specified version;
- The third returns the go.mod file of a specified version of a module;
- The fourth returns the latest version of the module. It is optional: GOPROXY still works if it is not implemented;
- The last downloads the zip file of a specified version of the module.
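For reference, the five request URLs can be assembled as below. This is a small sketch that assumes module and version have already been case-encoded. proxyEndpoints is a name made up for this example, and the version v0.3.6 is only sample input.

```go
package main

import "fmt"

// proxyEndpoints builds the five GOPROXY request URLs for one module
// version. module and version are assumed to be case-encoded already.
func proxyEndpoints(base, module, version string) map[string]string {
	return map[string]string{
		"list":   fmt.Sprintf("%s/%s/@v/list", base, module),
		"info":   fmt.Sprintf("%s/%s/@v/%s.info", base, module, version),
		"mod":    fmt.Sprintf("%s/%s/@v/%s.mod", base, module, version),
		"latest": fmt.Sprintf("%s/%s/@latest", base, module),
		"zip":    fmt.Sprintf("%s/%s/@v/%s.zip", base, module, version),
	}
}

func main() {
	urls := proxyEndpoints("https://proxy.golang.org", "golang.org/x/text", "v0.3.6")
	for _, name := range []string{"list", "info", "mod", "latest", "zip"} {
		fmt.Printf("%-6s %s\n", name, urls[name])
	}
}
```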
The figure above is an example of the list interface. proxy.golang.org is the address of the proxy server, golang.org/x/text is the name of the module, @v is a fixed string, and list is the interface being called. The interface returns all versions of the text package. Once Go has all the versions, it can infer the latest one from semantic versioning.
As shown in the figure above, the info and latest interfaces return content in the same format. Version is the version string, and Time is a string in RFC 3339 format that gives the commit time of the version. Time is optional.
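A response in this format maps naturally onto a small Go struct, since encoding/json decodes RFC 3339 strings into time.Time. The sketch below assumes the two field names described above. The sampleInfo values are invented for illustration and are not taken from a real proxy response; versionInfo and parseInfo are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// sampleInfo is a made-up response body in the info/latest format.
const sampleInfo = `{"Version":"v0.3.6","Time":"2021-03-30T05:05:17Z"}`

// versionInfo mirrors the JSON object returned by info and latest.
type versionInfo struct {
	Version string
	Time    time.Time // optional; stays at the zero value when absent
}

// parseInfo decodes one info/latest response body.
func parseInfo(body string) (versionInfo, error) {
	var info versionInfo
	err := json.Unmarshal([]byte(body), &info)
	return info, err
}

func main() {
	info, err := parseInfo(sampleInfo)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(info.Version, info.Time.Format(time.RFC3339))
}
```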
Finally, there are the mod and zip interfaces. The mod interface returns the go.mod file of the specified version. In the example above, the go.mod file of the latest version is fetched, and it shows that the text package depends only on the tools module. The zip interface returns a zip file that packs all source files of the specified version, and this is the interface go get ultimately uses to download the module.
As mentioned earlier, fetching source code through GOPROXY is faster than through VCS. A zip download contains only the files of the requested version and no history, whereas cloning a repository with a VCS such as git fetches the full history. The file obtained from the GOPROXY zip interface is therefore smaller and downloads faster. Note that GOPROXY limits both the size of a module zip file and the total uncompressed size of its files to 500 MiB, and limits the go.mod and LICENSE files to 16 MiB.
Go 1.13 added the module checksum verification mechanism. By default, after downloading, every module is checked to see whether its hash matches the record in the online database (default: sum.golang.org; in China: sum.golang.google.cn).
Verification is controlled by the environment variables GOSUMDB and GONOSUMDB. GOSUMDB specifies the address of the online database to use. Because the default, sum.golang.org, is not accessible in China, the configuration in the figure above uses the mirror that Google operates for China. GOSUMDB can also be set to off, which disables verification entirely so that downloaded modules are not checked against any hash. I do not recommend this. Instead, use GONOSUMDB to list the modules that should not be verified, for example private modules, which cannot pass verification because the public database has no record of them. GONOSUMDB works by prefix matching: with gitlab.com configured as in the figure, Go skips the checksum check for every package whose path starts with gitlab.com.
The commonly used variables are summarized below:
- GONOPROXY works by prefix matching. The figure above specifies gitlab.com, so all code on gitlab.com is pulled directly from the source server through traditional VCS instead of from the GOPROXY server;
- GONOSUMDB lets modules with a matching prefix skip checksum verification;
- GOPRIVATE combines the previous two: configuring GOPRIVATE is equivalent to configuring both GONOPROXY and GONOSUMDB;
- GOVCS was only added in Go 1.16. Its main function is to specify which VCS may be used for which modules.
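The prefix matching used by GONOPROXY, GONOSUMDB, and GOPRIVATE works on whole path elements rather than on raw string prefixes. The sketch below approximates that behaviour with path.Match from the standard library. It is an illustration only, not the go command's code; matchPrefixPatterns is a name chosen for this example, and the host names other than gitlab.com are hypothetical.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchPrefixPatterns reports whether any comma-separated glob pattern
// matches a leading run of path elements of target.
func matchPrefixPatterns(patterns, target string) bool {
	for _, glob := range strings.Split(patterns, ",") {
		if glob == "" {
			continue
		}
		// Cut target down to as many path elements as the pattern has.
		n := strings.Count(glob, "/")
		prefix := target
		for i := 0; i < len(target); i++ {
			if target[i] == '/' {
				if n == 0 {
					prefix = target[:i]
					break
				}
				n--
			}
		}
		if n > 0 {
			continue // target has fewer elements than the pattern
		}
		if ok, _ := path.Match(glob, prefix); ok {
			return true
		}
	}
	return false
}

func main() {
	private := "gitlab.com,*.corp.example.com" // second pattern is hypothetical
	for _, m := range []string{
		"gitlab.com/lyp256/pkg",
		"git.corp.example.com/team/tool",
		"gitlab.company.example/pkg",
		"golang.org/x/text",
	} {
		fmt.Printf("%-32s private=%v\n", m, matchPrefixPatterns(private, m))
	}
}
```

Note that gitlab.company.example/pkg is not matched by the pattern gitlab.com, because the comparison is made per path element.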
Youpaiyun business practice
Here is how to use private modules. A privately hosted GitLab service is common in companies, and GitLab itself can respond to go get's HTTP requests. When a package is fetched with go get, the client sends an HTTP request to the GitLab server, which responds with a document containing the meta tag. The tag tells the client to fetch the module's source code with git over HTTP; GitLab uses HTTPS by default. After receiving the response, the client can pull the module's source code with git. Once the module is downloaded, it also goes through checksum verification. You can add gitlab.com to the GOPRIVATE variable to tell Go that all modules under gitlab.com are private, so the checksum check is skipped.
Youpaiyun's internal setup is somewhat different. All HTTP services used inside Youpaiyun require a second authentication step through Google. Every request sent to the internal GitLab server is checked for a Google authorization header, and if the header is missing, the request is intercepted and a 403 error is returned. As a result, plain HTTP requests never reach the GitLab server. The HTTP requests sent by go get are intercepted too, so go cannot obtain the module information. The source code could be cloned from the server directly over SSH, but because go get never receives the module information, the request fails. The request shown by the gray line in the figure below therefore cannot actually be sent.
So how can this be solved? The approach is to use an additional HTTP service to handle go get's HTTP requests. This service has no verification step, so the request goes through and go get receives the meta information it needs. The meta tag must specify the SSH protocol: GitLab's HTTP service sits behind the second authentication step and rejects unauthenticated requests, so only SSH can be used. Authentication is handled by the SSH key pair and happens transparently. The service that guides go get does not deal with authorization at all. Since the modules are private, authorization and authentication are left entirely to GitLab.
Guiding go get requests
How can an additional service guide go get? This requires changing how module packages are named, based on GitLab's naming rules.
Domain name warehouse
A complete module path consists of several parts. The first is the domain name gitlab.com, lyp256 is the owner, and pkg is the project name of the module. For a single GitLab instance, the last two segments are the important ones, namely the module's owner and project name. The domain name is fixed and can be ignored.
Based on this rule, I implemented a small service that handles go get's HTTP requests. The code is as follows:
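The following is a minimal sketch of what such a service could look like, not the author's actual implementation. It assumes that the first two path segments are the owner and the project, and that the repository is reachable over SSH at the same owner/project path. The host names go.example.com and gitlab.example.com, the SSH user git, the port 22, and the function names are placeholders chosen for this example.

```go
package main

import (
	"fmt"
	"html"
	"net/http"
	"net/http/httptest"
	"strings"
)

const (
	moduleHost = "go.example.com"                  // hypothetical module domain
	sshBase    = "ssh://git@gitlab.example.com:22" // hypothetical GitLab SSH endpoint
)

// goImportMeta maps a request path such as /owner/project/sub to the
// go-import meta tag for the repository owner/project.
func goImportMeta(urlPath string) (string, bool) {
	parts := strings.Split(strings.Trim(urlPath, "/"), "/")
	if len(parts) < 2 || parts[0] == "" || parts[1] == "" {
		return "", false
	}
	root := parts[0] + "/" + parts[1]
	content := fmt.Sprintf("%s/%s git %s/%s.git", moduleHost, root, sshBase, root)
	return fmt.Sprintf(`<meta name="go-import" content="%s">`, html.EscapeString(content)), true
}

// handleGoGet answers go get's discovery request. It performs no
// authorization: access control is left to GitLab's SSH service.
func handleGoGet(w http.ResponseWriter, r *http.Request) {
	meta, ok := goImportMeta(r.URL.Path)
	if !ok || r.URL.Query().Get("go-get") != "1" {
		http.NotFound(w, r)
		return
	}
	w.Header().Set("Content-Type", "text/html; charset=utf-8")
	fmt.Fprintf(w, "<html><head>%s</head><body></body></html>\n", meta)
}

func main() {
	// Exercise the handler in-process instead of opening a port.
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/lyp256/pkg/sub?go-get=1", nil)
	handleGoGet(rec, req)
	fmt.Println(rec.Code)
	fmt.Print(rec.Body.String())
}
```

In a real deployment the handler would be registered with http.HandleFunc("/", handleGoGet) and served over HTTPS on the module domain, since go get uses HTTPS for discovery by default.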
GitLab CI starts from an empty container. The example in the figure uses the golang alpine image, which contains nothing but Go, so the required dependencies have to be installed and the SSH authentication material injected. The script is defined as follows:
Step 1: Use mkdir -p to create a directory under cache. This directory is the cache on our CI machine, a space on the physical disk that persists data. It is used to cache go mod and reduce module downloads.
Step 2: Install the basic environment and tools. The example in the figure installs git, g++, and openssh. g++ is a dependency needed at compile time (for example, when cgo is used), and openssh is the SSH tool chain that git needs.
Step 3: Set up the SSH keys. This takes two actions: trusting the GitLab server's host key and importing the private key used for authentication. The server's host key is fetched with ssh-keyscan and saved to the known_hosts file. The private key, which must have access to the git projects, is stored in the environment variable $DEPLOY_SSH_KEY through the GitLab CI configuration. The script only needs to write the content of that variable to the corresponding SSH private key file and set the correct permissions.
Finally, configure the GOPRIVATE variable so that all modules under go.holdcloud.com are treated as private modules, which bypass the proxy and the checksum check.
At this point the preparation is basically complete. The go test step that follows is ordinary CI test logic and can be written to suit the actual situation.
Summary
- Go plans to remove support for GOPATH in version 1.17, so it is recommended to migrate to Go modules as soon as possible;
- Go's checksum check detects changes to the code, improving security and reliability, so it is recommended not to turn it off;
- It is recommended to keep the vendor directory to guard against dependencies being deleted.