Xu Shiwei: Import Process and Go+ Module Management丨Go+ Open Class • Second Issue

To stay in close contact with Go+ developers and industry practitioners, and to advance the iterative development of Go+ together, the Go+ development team has planned a series of live courses called "Go+ Open Class". The series will share in-depth technical content on a weekly schedule. Go+ developers and enthusiasts are welcome to follow along!

Session 1: Design and implementation of Go+ v1.x
Session 2: Import process and Go+ module management (Lecturer: Xu Shiwei, CEO of Qiniu Cloud and inventor of the Go+ language)

This article is the write-up of the second session's live broadcast.

The Go+ content we publish falls into two categories:

The first category is aimed at the wider body of Go+ users. The official Go+ GitHub repository introduces the language's features and how to use them, and we continue to organize basic documentation and materials for developers to use and learn from.

The second category, which includes the "Go+ v1.x Design" series, is aimed at people who want to understand the principles behind the language, at those who hope to take part in Go+ development and become contributors who bring value to the community, and at users who want a deeper experience of the language.

My view is that we can only use something well once we deeply understand the principles behind it. The "Go+ v1.x Design" series is therefore told mostly from an "Inside Go+" perspective. In the first session I shared the macro architecture of Go+. In this session we look at how Go+ implements specific features, working through concrete feature development.

The topic of this session is "Import process and Go+ module management". I chose it for the second session because the import process and module management are fairly macro-level processes. Understanding them matters more than understanding the implementation of any single piece of Go+ syntax, and it does more to help everyone understand Go+ as a whole.

This session is divided into two parts:

Go+'s import process
Go+ module management

1. The import process of Go+

The import syntax of Go+ is basically the same as Go's. Go's import syntax has a fair number of forms, and the figure that accompanied this talk spelled them out in detail.

The figure covers importing a package from Go's standard library, importing a user-defined package, and importing a package from the Go+ standard library, such as import "gop/ast". The part marked in red, import local package, is the form that imports a package through a relative path. Because this feature is rarely used in real projects, Go+ has not implemented it yet.

Another form gives the imported package an alias. There are two special aliases, "_" and ".". Of the two, "_" is the more commonly used, while "." is officially discouraged.
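As a quick illustration of the forms described above, the following Go file uses a plain import, an alias, a blank import, and a dot import (Go+ accepts the same syntax). The helper name demo is made up for this sketch.

```go
package main

import (
	"fmt"         // plain import of a standard library package
	_ "image/png" // blank import: loaded only for its side effects (init)
	. "math"      // dot import: exported names are usable without a qualifier
	str "strings" // import with an alias: the package is referred to as str
)

// demo uses every imported package except the blank one,
// which cannot be referenced by name.
func demo() string {
	return str.ToUpper("go+") + fmt.Sprint(Sqrt(4))
}

func main() {
	fmt.Println(demo())
}
```

The dot import is what makes the bare Sqrt call compile; it is also why the form is discouraged, since the reader can no longer tell where Sqrt comes from.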

With the syntax in hand, the next things to understand would be token & scanner and ast & parser. Since the other parts are fairly standard, today we focus on the AST (abstract syntax tree) for importing a package.

This abstract syntax tree is fairly deep, but its content is quite basic. The top level is the Package, below the Package is the list of files, and below each file is the list of global declarations. Global declarations fall into two categories: function declarations (FuncDecl) and general declarations (GenDecl).

General declarations may sound abstract. They cover four kinds of declaration: Import, Type, Var, and Const. The difference from a function declaration is that a function declaration can only declare one function at a time, whereas a general declaration can import several packages at once or define several types at once. Defining several types in one declaration is uncommon but allowed; declaring several variables or constants together is more common.

Under a general declaration sits a list of specifications (Specs). A specification is also an abstraction. The one we care about today is ImportSpec, which represents the import of one package. An ImportSpec holds the alias of the package (Name) and the path of the package (Path).

That is the part of the abstract syntax tree that relates to importing a package.
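The tree described above can be walked with a few lines of code. The sketch below uses Go's own go/parser and go/ast packages, on the assumption that the Go+ AST follows the same shape for imports (File, Decls, GenDecl, Specs, ImportSpec). The function name listImports and the sample source are invented for illustration.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

const src = `package demo

import (
	"fmt"
	_ "image/png"
	str "strings"
)
`

// listImports walks File -> Decls -> GenDecl -> Specs -> ImportSpec
// and returns one "name path" entry per imported package.
func listImports(source string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "demo.go", source, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, decl := range f.Decls {
		gen, ok := decl.(*ast.GenDecl) // FuncDecl values are skipped
		if !ok || gen.Tok != token.IMPORT {
			continue
		}
		for _, spec := range gen.Specs {
			imp := spec.(*ast.ImportSpec)
			name := "<none>" // Name is nil when no alias is given
			if imp.Name != nil {
				name = imp.Name.Name
			}
			out = append(out, name+" "+imp.Path.Value)
		}
	}
	return out, nil
}

func main() {
	imports, err := listImports(src)
	if err != nil {
		panic(err)
	}
	for _, s := range imports {
		fmt.Println(s)
	}
}
```

Note that Path.Value keeps the quotes of the string literal, and that one GenDecl carries all three ImportSpec values, which is the "several imports in one general declaration" case mentioned earlier.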

Once we have the abstract syntax tree, the next step is to compile it. As introduced in the first session, the main job of compilation is to turn the Go+ abstract syntax tree into calls to the gox DOM Writer component. In gox, the DOM Writer functions related to importing a package are as follows:

First, Package has an Import function; calling it returns a PkgRef instance. One of the most important fields of the PkgRef type is Types *types.Package, which holds information about all the symbols in the package.

The PkgRef type has two other important methods. One is Ref, which references a symbol: pass in the symbol name and you get back a reference to that symbol. The other is MarkForceUsed, which forces a package to be imported.

Importing a package in gox is smart. If a package is imported but never used, no reference to it appears in the generated code. This is similar to what many Go IDEs do: if an unused package is imported, the import is removed automatically.

To understand how the gox DOM Writer is used, let's look at a concrete example:

import "fmt"

func main() {
	fmt.Println("Hello world")
}

Suppose we want to write a Hello world program. We first import the fmt package and then print "Hello world" through fmt.Println. The code is very simple.

For this code, the compiler produces the following sequence of calls to gox:

The first step is NewPackage. Here we assume we are creating a main package. We obtain the Package instance for the main package and assign it to the pkg variable.

The second step is to call pkg.Import, which lazily loads the fmt package and assigns it to the fmt variable.

Next, we define the main function with NewFunc. In the last session we created a closure with NewClosure. A closure is a special kind of function that has no name. NewClosure takes only three parameters: the input parameters, the results, and a boolean that says whether the function is variadic. NewFunc takes two more parameters than NewClosure, placed first: here they are nil and "main". The first is the receiver, similar to the this pointer in other languages; the main function has no receiver, so it is nil. The second is simply the name of the function.

After NewFunc, we call BodyStart to begin the function body. In this example the body is a single function call. As introduced in the last session, code is written in gox in reverse Polish notation: operands first, then the instruction. The instruction for a function call is Call. So for this call we first push the function, fmt.Ref("Println"), then the argument "Hello world", and then issue the Call instruction. Because the call has one argument, it is Call(1). Finally, we call End to finish the function body.

This code shows that the overall logic of the gox DOM Writer is quite intuitive. Once you understand reverse Polish notation, the whole flow is easy to follow.
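To make the reverse Polish idea concrete, here is a minimal stack-based sketch. It is not the gox API: codeBuilder and its methods Val, Call, and EndStmt are hypothetical names that only imitate the "operands first, instruction afterwards" style, and the builder emits Go source as plain strings instead of an AST.

```go
package main

import (
	"fmt"
	"strings"
)

// codeBuilder is a hypothetical stand-in for a DOM Writer: operands are
// pushed first, and an instruction such as Call consumes them afterwards.
type codeBuilder struct {
	stack []string // pending expressions, rendered as Go source
	stmts []string // finished statements
}

// Val pushes an operand (a function reference or an argument).
func (b *codeBuilder) Val(expr string) *codeBuilder {
	b.stack = append(b.stack, expr)
	return b
}

// Call pops n arguments plus the function below them and pushes the call.
func (b *codeBuilder) Call(n int) *codeBuilder {
	base := len(b.stack) - n - 1
	fn, args := b.stack[base], b.stack[base+1:]
	call := fn + "(" + strings.Join(args, ", ") + ")"
	b.stack = append(b.stack[:base], call)
	return b
}

// EndStmt turns the expression on top of the stack into a statement.
func (b *codeBuilder) EndStmt() *codeBuilder {
	top := len(b.stack) - 1
	b.stmts = append(b.stmts, b.stack[top])
	b.stack = b.stack[:top]
	return b
}

func main() {
	b := &codeBuilder{}
	// Function first, then its argument, then the Call instruction.
	b.Val("fmt.Println").Val(`"Hello world"`).Call(1).EndStmt()
	fmt.Println(b.stmts[0])
}
```

Because Call only needs the argument count, nested calls fall out naturally: pushing f, g, and 1 and then issuing Call(1) twice builds f(g(1)).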

As mentioned earlier, the gox DOM Writer involves three functions in the import process:

(*Package).Import(pkgPath string) *PkgRef
(*PkgRef).Ref(name string) Ref
(*PkgRef).MarkForceUsed()

We will introduce them one by one in detail.

The most important point about (*Package).Import, as mentioned earlier, is that the import is lazy. If the package is never referenced, the import has no effect at all.

(*PkgRef).Ref first checks whether the package has been loaded. If not, it actually loads the package; if so, it looks up the symbol directly by name. Because loading is deferred, one call to (*PkgRef).Ref may cause several packages to be loaded together, which is normal. In fact, for performance reasons we encourage loading several packages, or even all packages, together.

(*PkgRef).MarkForceUsed forces a package to be loaded. It corresponds to the import _ "pkgPath" syntax in Go. In that case pkgPath is not referenced after the import, yet we still want the package loaded, and calling MarkForceUsed achieves that.
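The behaviour of these three functions can be modelled in a few lines. The sketch below is a simplified, hypothetical model and not the gox implementation: the types pkg and pkgRef, the loads counter, and usedImports are invented here, and "loading" is reduced to flipping a flag.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// pkg is a hypothetical model of the package being generated.
type pkg struct {
	imports map[string]*pkgRef
	loads   int // how many real loads happened
}

// pkgRef is a hypothetical model of a lazily loaded import.
type pkgRef struct {
	owner  *pkg
	path   string
	loaded bool // set on first use: the package is loaded on demand
	used   bool // set when a symbol is referenced or use is forced
}

// Import only records the path; nothing is loaded yet.
func (p *pkg) Import(pkgPath string) *pkgRef {
	if ref, ok := p.imports[pkgPath]; ok {
		return ref
	}
	ref := &pkgRef{owner: p, path: pkgPath}
	p.imports[pkgPath] = ref
	return ref
}

func (r *pkgRef) load() {
	if !r.loaded {
		r.loaded = true
		r.owner.loads++
	}
}

// Ref loads the package on first use and returns a qualified symbol name.
func (r *pkgRef) Ref(name string) string {
	r.load()
	r.used = true
	return path.Base(r.path) + "." + name
}

// MarkForceUsed keeps the import even if no symbol is referenced.
func (r *pkgRef) MarkForceUsed() {
	r.load()
	r.used = true
}

// usedImports lists the import paths that would appear in generated code.
func (p *pkg) usedImports() []string {
	var out []string
	for pkgPath, ref := range p.imports {
		if ref.used {
			out = append(out, pkgPath)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	p := &pkg{imports: map[string]*pkgRef{}}
	fmtRef := p.Import("fmt")
	p.Import("os")                        // imported but never referenced
	p.Import("image/png").MarkForceUsed() // like: import _ "image/png"
	fmt.Println(fmtRef.Ref("Println"))    // first Ref triggers the load
	fmt.Println(p.usedImports(), p.loads)
}
```

In this model the os import never costs a load and never reaches the output, which is the point of making Import lazy.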

In Go, import _ "pkgPath" mostly appears in plug-in mechanisms. The most typical plug-in mechanism in the Go standard library is the image package. There are many image formats, and it is hard to predict which ones a program will need, so image decoding in Go is built on a plug-in mechanism, which keeps the image system open.

So far we have covered the import syntax, the corresponding abstract syntax tree, the sequence of calls the compiler makes to gox, and the gox functions involved. Overall, import is fairly simple in terms of usage and overall structure. What happens inside it, however, is quite complicated.

Next, we look in detail at what happens while gox loads an imported package, and why we encourage loading several packages at the same time.

The code that loads packages during a gox import is not written by gox itself. Instead, gox calls an extension library written by the Go team, golang.org/x/tools/go/packages. This package has a Load function that can load several packages at the same time.

func Load(cfg *Config, patterns ...string) ([]*Package, error)

The patterns argument of Load is the list of pkgPath values to load. It is called patterns because it supports the "..." wildcard for packages (consistent with the go tools: go install, go build, and others also accept package wildcards). For example, "go/..." means all packages in the Go standard library whose path begins with "go/", including "go/ast", "go/token", "go/parser", and so on.

The reason for loading several packages at once is that different packages depend on much the same base packages, so loading them separately repeats a lot of work, and the current packages.Load function has no caching mechanism, so it becomes very slow. Take the fmt package as an example. fmt depends on 9 base packages, so loading it means loading 10 packages in total. If another package such as os is loaded separately, many of the base packages it shares with fmt are loaded again. If os and fmt are loaded together, these packages do not need to be loaded repeatedly, which greatly improves loading speed.

The result of Load is a list of Package values, each of which has two important fields:

Imports map[string]*Package: this field can be used to build the dependency tree between packages;
Types *types.Package: the core information of the package, from which a gox.PkgRef instance can be built.

However, I personally think the design of packages.Load has some serious problems, mainly the following:

Overhead of repeated loading

Although loading several packages in a single packages.Load call avoids repeated loading to some extent, as mentioned above nothing is shared between separate packages.Load calls, which causes a lot of repeated loading overhead.

Here is an example. Suppose we merge Load(nil, "go/token"); Load(nil, "go/ast"); Load(nil, "go/parser") into Load(nil, "go/token", "go/ast", "go/parser"). The merged call takes only about one-third of the time of the three separate calls, because every extra call adds its own overhead.

Moreover, a packages.Load call can take on the order of seconds, which is very slow. This is a serious problem that has to be solved.

Multiple packages.Load calls produce multiple instances of the same package

Each packages.Load call produces independent *types.Package instances, so multiple calls result in multiple instances of the same package. As a consequence, we cannot simply use T1 == T2 to decide whether two types are the same, which is counter-intuitive. And because packages depend on each other, this eventually produces very strange results.

For example, if Load(nil, "go/token"); Load(nil, "go/ast"); Load(nil, "go/parser") are called separately, then the type of the first parameter of parser.ParseDir, *token.FileSet, is not the same instance as the token.FileSet type built by the separate Load(nil, "go/token") call. Although the names are exactly the same, type matching between them fails.

So how do we solve these two problems? Go+ provides a solution for each.

First, to solve slow loading, Go+ adds a cache for packages.Load. Whenever a package is found not to be cached, packages.Load is called to load it. After loading it is cached, so it does not need to be loaded again next time.

To solve the problem of multiple instances of the same package, Go+ performs a dedup (deduplication) pass over the result of packages.Load. Specifically, we scan and rebuild the result of a subsequent packages.Load call so that only one instance of each type exists (see gox/dedup.go for the code).

This is a patch-style fix. A more thorough fix would be to change packages.Load itself so that dependent packages are shared across multiple loads. I think that approach is sounder, but following the principle of modifying third-party packages as little as possible, Go+ currently solves the problem with a post-processing step such as dedup.

Let's focus on the mechanism of packages.Load caching.

First, a brief look at the caching flow itself, which is very basic. Before calling packages.Load, we check whether the package to be loaded is already cached. If it is, we return the cached result directly. If it is not, we call packages.Load, run dedup to resolve duplicate package instances, and then save the result to the cache.

The details of this flow are in the func (*LoadPkgCached) load function in gox/import.go.
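The flow just described (check the cache, load on a miss, dedup, store) can be sketched as follows. This is a toy model, not the code in gox/import.go: pkgInfo, cachedLoader, and slowLoad are hypothetical, and slowLoad pretends that every package depends on a single shared package, "unsafe".

```go
package main

import "fmt"

// pkgInfo is a hypothetical stand-in for a loaded package.
type pkgInfo struct {
	path string
	deps []*pkgInfo
}

// cachedLoader sketches the flow: look in the cache, otherwise load,
// dedup the result against known instances, then store it.
type cachedLoader struct {
	cache map[string]*pkgInfo
	loads int // number of slow loads performed
}

// slowLoad imitates an uncached load: it always returns fresh instances,
// including a fresh copy of the shared dependency "unsafe".
func (l *cachedLoader) slowLoad(path string) *pkgInfo {
	l.loads++
	return &pkgInfo{path: path, deps: []*pkgInfo{{path: "unsafe"}}}
}

// dedup replaces every package already known to the cache with the cached
// instance, so each path maps to exactly one *pkgInfo.
func (l *cachedLoader) dedup(p *pkgInfo) *pkgInfo {
	if known, ok := l.cache[p.path]; ok {
		return known
	}
	l.cache[p.path] = p
	for i, d := range p.deps {
		p.deps[i] = l.dedup(d)
	}
	return p
}

// Load returns the cached package or loads, dedups, and caches it.
func (l *cachedLoader) Load(path string) *pkgInfo {
	if p, ok := l.cache[path]; ok {
		return p // cache hit: no slow load
	}
	return l.dedup(l.slowLoad(path))
}

func main() {
	l := &cachedLoader{cache: map[string]*pkgInfo{}}
	a := l.Load("fmt")
	b := l.Load("os")
	l.Load("fmt") // served from the cache
	fmt.Println(l.loads, a.deps[0] == b.deps[0])
}
```

After dedup, fmt and os point at the same instance of their shared dependency, so a pointer comparison between types coming from different loads becomes meaningful again.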

Of course, this is not enough. When the program exits, we also need to persist the cache of all dependent packages. To persist it, we first serialize the cache to JSON and then compress it as zip. The compression step matters, because the uncompressed cache would be considerably larger. The compressed cache is saved as the file $ModRoot/.gop/gop.cache.

If you know the Go toolchain, you will know that Go itself has a cache similar to the one for packages.Load, but that cache is global, whereas the Go+ cache is per module. Every compiled module has a hidden directory, .gop, under its root directory, and the cache file is stored there.

The specific cache persistence code can be found in: gox/persist.go.
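As a rough sketch of the "JSON first, then zip" idea, the code below round-trips a small cache through encoding/json and archive/zip in memory. It makes no claim about the actual layout used by gox/persist.go: the cacheEntry fields and the entry name inside the archive are assumptions made for this example.

```go
package main

import (
	"archive/zip"
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// cacheEntry is a hypothetical record; the real cache stores richer data.
type cacheEntry struct {
	PkgPath     string   `json:"pkgPath"`
	Fingerprint string   `json:"fingerprint"`
	Deps        []string `json:"deps"`
}

const entryName = "gop.cache.json" // assumed name of the entry inside the zip

// saveCache serializes the entries to JSON and wraps them in a zip archive.
func saveCache(entries map[string]cacheEntry) ([]byte, error) {
	data, err := json.Marshal(entries)
	if err != nil {
		return nil, err
	}
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	w, err := zw.Create(entryName) // Create stores the entry deflated
	if err != nil {
		return nil, err
	}
	if _, err := w.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// loadCache reverses saveCache.
func loadCache(b []byte) (map[string]cacheEntry, error) {
	zr, err := zip.NewReader(bytes.NewReader(b), int64(len(b)))
	if err != nil {
		return nil, err
	}
	f, err := zr.Open(entryName)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	data, err := io.ReadAll(f)
	if err != nil {
		return nil, err
	}
	entries := map[string]cacheEntry{}
	err = json.Unmarshal(data, &entries)
	return entries, err
}

func main() {
	in := map[string]cacheEntry{"fmt": {PkgPath: "fmt", Deps: []string{"os", "io"}}}
	blob, err := saveCache(in)
	if err != nil {
		panic(err)
	}
	out, err := loadCache(blob)
	if err != nil {
		panic(err)
	}
	fmt.Println(out["fmt"].Deps)
}
```

A real implementation would write the bytes to the cache file on exit and read them back on startup; JSON describing many packages repeats the same keys and paths, which is why it compresses well.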

Of course, as everyone knows, caches bring their own problems. An issue every cache has to deal with is keeping the cache up to date. We look at it case by case.

First, if the dependent package belongs to the Go standard library, we can assume the cache never changes, because the standard library is installed locally and few people modify it.

If the dependent package is not part of the Go standard library, we need to compute a fingerprint for it. If the fingerprint changes, the dependent package is considered to have changed. How is the fingerprint computed? There are two cases:

If the dependent package belongs to this module (code under $ModRoot), we enumerate its files and compute the fingerprint from the last modification time of each file. For the algorithm, see the calcFingerp function in gox/import.go;
If the dependent package does not belong to this module (not implemented yet), we need to read the go.mod file and check the version (tag) of the dependent package. If the version is the same across two packages.Load calls, the package is considered unchanged. One special case is replace: if a package is replaced with local code, it is treated like a dependency inside this module.

I hope some of you will try to implement this missing piece. For now, if you find that the dependency information used by a gop build is out of date, the workaround is to delete the gop cache file manually (rm $ModRoot/.gop/gop.cache).

2. Go+ module management

There is a very basic but central question in Go+ module management: what is a module?

First, a module is not the same as a package. A module usually contains one or more packages. My own simple definition of a module is:

A module is the unit of code release
A module is the unit of code version management

These two definitions are essentially the same thing. The reason is straightforward: version management only exists once there is a release, and versions are managed per release unit.

We also split Go+ module management into two parts:

How to import a Go+ package
How to manage Go+ modules

How to import a Go+ package

Consider what happens if, during a gox import, the pkgPath passed to packages.Load is not a Go package but a Go+ package.

Obviously packages.Load does not recognize it. Our solution is fairly simple: implement a Go+ version of packages.Load.

Because this deals with Go+ code, the code does not live in gox (gox stays focused on generating the Go AST) but in the func (*PkgsLoader) Load function in gop/cl/packages.go.

The basic logic of this function is as follows:

First, call packages.Load to import the dependent packages. If it fails, the error message says which packages failed to load;
Compile each failed Go+ package into a Go package. How this is done was covered in the last session. The result is a gop_autogen.go file generated in the directory of the Go+ package;
Call packages.Load again to import the dependent packages. Now that the Go file has been written, the directory is a valid Go package, and packages.Load succeeds.

This flow is very similar to page-fault handling in CPU memory management. We first try to load; a load failure is like a page-fault interrupt; after the interrupt, the missing page is brought in (here, the Go+ package is converted to a Go package); and then execution resumes (here, the package is loaded again). From the point of view of gox, it does not recognize Go+ packages at all, yet it can load them. The process is natural and rather elegant.
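The try, translate, retry loop can be sketched with a toy model. Nothing below is the code from gop/cl/packages.go: workspace, loadGo, and compileGop are hypothetical stand-ins for packages.Load and the Go+ compiler, and example.com/foo/bar is a made-up package path.

```go
package main

import (
	"errors"
	"fmt"
)

// workspace is a hypothetical model: a package can be loaded as Go only
// if it already has Go source. Go+ packages start without any.
type workspace struct {
	hasGo    map[string]bool
	compiled []string // Go+ packages translated to Go, in order
}

var errNotGo = errors.New("no Go files in package")

// loadGo imitates packages.Load: it reports which packages failed.
func (w *workspace) loadGo(paths ...string) (failed []string) {
	for _, p := range paths {
		if !w.hasGo[p] {
			failed = append(failed, p)
		}
	}
	return failed
}

// compileGop imitates generating gop_autogen.go next to the Go+ sources.
func (w *workspace) compileGop(path string) {
	w.hasGo[path] = true
	w.compiled = append(w.compiled, path)
}

// Load tries to load, translates whatever failed, then tries again.
func (w *workspace) Load(paths ...string) error {
	failed := w.loadGo(paths...)
	if len(failed) == 0 {
		return nil
	}
	for _, p := range failed { // the "page fault handler"
		w.compileGop(p)
	}
	if still := w.loadGo(paths...); len(still) > 0 {
		return fmt.Errorf("%w: %v", errNotGo, still)
	}
	return nil
}

func main() {
	w := &workspace{hasGo: map[string]bool{"fmt": true}}
	// The second path stands for a Go+ package without Go files.
	err := w.Load("fmt", "example.com/foo/bar")
	fmt.Println(err, w.compiled)
}
```

Only the packages that actually failed are compiled, so a module that consists purely of Go packages pays nothing extra for this mechanism.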

One last question remains: what if the dependent Go+ package has not been downloaded yet?

In Go, packages were originally downloaded with go get. Today the most common way is go mod tidy, which downloads the modules that contain all dependent packages.

Our plan here is to implement a similar command, gop mod tidy, that downloads Go+ packages automatically. It has not been implemented yet, so you are welcome to try. Its logic is very similar to importing a Go+ package as described above.

How to manage Go+ modules

Now let's talk about the module management mechanism of Go+. There are two possible options:

Manage modules based on Go Module (go.mod)
Implement our own Go+ Module (gop.mod) management

Because Go+ can generate Go files in its own directory and so present itself as a Go package, it can be managed and used through Go's toolchain and Go's module management mechanism. This is a somewhat lazy mechanism, but it is easy to implement.

Implementing our own Go+ Module (gop.mod) management differs from this in several ways. Let's compare the two in detail.

The approach we currently use is module management based on Go Module. Its biggest advantage is that it is simple to implement: nothing extra has to be done. The disadvantage is that compiling even the simplest Go+ program requires a reference to the Go+ standard library, because of a special built-in library called builtin. This dependency means that a reference to the Go+ standard library has to be added to the go.mod file of every Go+ module, which is inconvenient for Go+ users and prone to all sorts of strange problems.

We are considering how to solve this problem thoroughly. The current idea is to implement Go+'s own module management and generate go.mod automatically from the gop.mod file. When to update go.mod is simple: we regenerate it every time gop.mod is updated.

For Go+ modules, then, the go.mod file does not need to be committed to the repository, because it is generated. The main thing the generation step adds is the reference to the Go+ standard library, done through a replace directive. With replace, the module always refers to the local Go+ standard library, which keeps the gop tools and the Go+ standard library automatically aligned.

Earlier we said that strange problems are likely. The main cause is that, under the Go Module mechanism, the gop tools may have been updated to the latest version while go.mod still references a very old version of the Go+ standard library. The mismatch sometimes leads to strange problems.

Generating the Go Module file from the Go+ Module file lets Go+ work seamlessly with the Go toolchain and reuse it, while solving the version mismatch between the gop tools and the Go+ standard library.

3. Exercises

First, an update on the exercises from the first session. The range expression, the one we most hoped to see completed, is done, and the latest Go+ release already includes it. We also introduced this feature in the Go+ Open Class on Knowledge Planet, where you are welcome to learn more.

Basic exercises

1) Solve the update problem of the gop cache
Issue:
http://github.com/goplus/gop/issues/891

$ModRoot/.gop/gop.cache currently only detects updates inside this module. If a dependency outside the module changes, the change is not detected correctly. The workaround is to delete the gop.cache file manually. Everyone is welcome to solve this problem.

2) The import "pkgPath" problem
Issue:
http://github.com/goplus/gop/issues/881

Strictly speaking, implementing import "pkgPath" properly requires loading the dependent package first in order to obtain its package name, pkgName. At present the dependent package is not loaded; we simply take pkgName = path.Base(pkgPath). This is right in most cases, but it is not rigorous.

The problem is fairly simple, and developers who run into it can easily work around it by specifying pkgName manually, so its priority is low and it suits a basic exercise.
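A few sample paths show where the path.Base shortcut holds and where it does not. The function name guessPkgName is made up for this sketch, and github.com/example/lib/v2 is a hypothetical module path; gopkg.in/yaml.v2 is a real package whose declared name is yaml.

```go
package main

import (
	"fmt"
	"path"
)

// guessPkgName mirrors the shortcut described above: take the last path
// element as the package name without loading the package.
func guessPkgName(pkgPath string) string {
	return path.Base(pkgPath)
}

func main() {
	for _, p := range []string{
		"fmt",                       // guess matches the package name
		"go/ast",                    // guess matches the package name
		"gopkg.in/yaml.v2",          // declared name is yaml, guess is yaml.v2
		"github.com/example/lib/v2", // major-version suffix, guess is v2
	} {
		fmt.Printf("%-28s -> %s\n", p, guessPkgName(p))
	}
}
```

In the last two cases the guess is not even a usable package name, which is why the rigorous answer has to come from the package clause of the loaded package.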

3) import local package
Issue:
http://github.com/goplus/gop/issues/814

This case is rarely used in practice, but we would like to implement it for compatibility. It was also discussed as an exercise in the previous session, so I won't expand on it here.

Advanced exercises

1) Implement gop mod tidy
Issue:
http://github.com/goplus/gop/issues/889

2) Implement Go+ Module (gop.mod)
Issue:
http://github.com/goplus/gop/issues/861

3) Modify packages.Load itself so that dependent packages can be shared between multiple loads
Issue:
http://github.com/goplus/gop/issues/810

We will also help you with problems you run into while working on the exercises. You can reach us as follows:

Go+ user group (WeChat group): ask questions directly in the group and @ me; I will answer directly in the group;
Go+ Open Class (Knowledge Planet): the slides and text of this talk are published on Knowledge Planet, where you are welcome to ask questions and join the discussion.
