Many languages have solutions for embedding resources. In Golang, the open source solutions for resource embedding are especially numerous. The Internet is full of usage guides for Golang resource embedding, but few of them analyze the underlying principle, compare the performance of the native implementation with the open source ones, or discuss which scenarios each is suited to.

So this article takes up that topic, in the hope of prompting better discussion from others.

Preface

Whatever the language, there are always reasons to embed static resources into the compiled output, and Golang is no exception. Before the official resource embedding feature was proposed in December 2019, many projects in the Golang ecosystem already provided this capability. Official support finally arrived with the release of Golang 1.16.

Today, more and more articles, and even open source projects that implemented resource embedding themselves, recommend using the official go:embed directive instead. In an ecosystem that cares about performance as much as Go's does, we should perhaps take a more objective look at the similarities and differences between the native language feature and third-party implementations, as well as the measurable performance gap between them.

In follow-up articles, I will introduce some similar projects that have long been well known or widely used on GitHub, such as packr (3.3k stars), statik (3.4k stars), go.rice (2.3k stars), go-bindata (1.5k stars), vfsgen (1k stars), esc (0.6k stars), fileb0x (0.6k stars)...

In this article, we start with the official native go:embed directive and use it as the reference baseline, covering its principle, basic usage, and performance.

Let's talk about the principle first.

Go Embed Principle

Reading the source code of the latest Golang 1.17 and ignoring the parts that deal with command-line arguments, it is easy to see that the main Embed-related code lives in the following four files:

embed/embed.go

embed.go mainly provides the runtime declarations and function definitions for the embed feature (the FS interface implementation), along with the package description shown by go doc.

The FS implementation is what lets us access embedded files through the standard file system interfaces, for example when you want to use standard FS functions to open, list, and read files.

// lookup returns the named file, or nil if it is not present.
func (f FS) lookup(name string) *file {
    ...
}

// readDir returns the list of files corresponding to the directory dir.
func (f FS) readDir(dir string) []file {
    ...
}


// Open opens the named file for reading and returns it as an fs.File.
func (f FS) Open(name string) (fs.File, error) {
    ...
}

// ReadDir reads and returns the entire named directory.
func (f FS) ReadDir(name string) ([]fs.DirEntry, error) {
    ...
}

// ReadFile reads and returns the content of the named file.
func (f FS) ReadFile(name string) ([]byte, error) {
    ...
}

Reading the code, it is easy to see that files are read-only in go embed. If you want, you can still implement a readable and writable file system yourself, which we will come back to in a later article.

func (f *file) Mode() fs.FileMode {
    if f.IsDir() {
        return fs.ModeDir | 0555
    }
    return 0444
}
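The read-side methods listed above are what the standard io/fs helpers call into. The following is a minimal sketch of that usage, not code from the Go source tree. Because a real //go:embed directive needs files on disk at compile time, it uses testing/fstest.MapFS as a stand-in for embed.FS (both satisfy fs.FS); the helper names listFiles and demoFS and the file names are hypothetical.

```go
package main

import (
	"fmt"
	"io/fs"
	"testing/fstest"
)

// listFiles returns the names of the entries directly inside dir.
// It accepts any fs.FS, so an embed.FS value could be passed in unchanged.
func listFiles(fsys fs.FS, dir string) ([]string, error) {
	entries, err := fs.ReadDir(fsys, dir)
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(entries))
	for _, e := range entries {
		names = append(names, e.Name())
	}
	return names, nil
}

// demoFS builds an in-memory stand-in for an embedded assets directory.
// The file names and contents are hypothetical.
func demoFS() fs.FS {
	return fstest.MapFS{
		"assets/example.txt": {Data: []byte("hello")},
		"assets/logo.svg":    {Data: []byte("<svg/>")},
	}
}

func main() {
	fsys := demoFS()

	names, err := listFiles(fsys, "assets")
	if err != nil {
		panic(err)
	}
	fmt.Println(names)

	// fs.ReadFile works on any fs.FS, including embed.FS.
	data, err := fs.ReadFile(fsys, "assets/example.txt")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data))
}
```

Writing helpers against fs.FS rather than embed.FS keeps them testable without any embedded files, which is the same reason the stand-in is used here.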

Besides operating on files directly through the FS functions, we can also mount the embedded FS into Go's HTTP server, or into the file-serving handler of any Go web framework you like, to get an Nginx-like static resource server.

go/build/read.go

If embed.go provides what we see when writing code with go:embed, which is relatively abstract, then go/build/read.go does the more concrete work of parsing and validation before the compilation stage.

This file mainly parses the go:embed directives written in the program, checks that their content is valid, and applies specific logic to the content (variables, files) that needs to be embedded. There are two key functions:

func readGoInfo(f io.Reader, info *fileInfo) error {
  ...
}

func parseGoEmbed(args string, pos token.Position) ([]fileEmbed, error) {
  ...
}

The function readGoInfo reads our *.go source files, finds the go:embed directives in them, and passes each directive's arguments, together with its position in the file, to parseGoEmbed, which parses the file paths in the directive into a list of specific files or patterns.

If a resource path names a specific file, that file is added to the list of files to be processed. If it is a directory or a pattern such as go:embed image/* template/* , other functions further along the call chain expand it as a glob and add the matching files to the list of pending files.

All of this ends up in the fileInfo structure associated with each source file, where go/build/build.go and the rest of the build toolchain pick it up.

// fileInfo records information learned about a file included in a build.
type fileInfo struct {
    name     string // full name including dir
    header   []byte
    fset     *token.FileSet
    parsed   *ast.File
    parseErr error
    imports  []fileImport
    embeds   []fileEmbed
    embedErr error
}

type fileImport struct {
    path string
    pos  token.Pos
    doc  *ast.CommentGroup
}

type fileEmbed struct {
    pattern string
    pos     token.Position
}
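To make the glob expansion described above more concrete, here is a small sketch that expands patterns such as image/* and template/* with the standard fs.Glob function. This is not the toolchain's own code, and the real go:embed rules include extra checks that are not modeled here. The in-memory tree from testing/fstest and the names expandPatterns and projectFS are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"io/fs"
	"testing/fstest"
)

// expandPatterns resolves each pattern against fsys and collects the matches,
// roughly mirroring how a directive such as "go:embed image/* template/*"
// ends up naming a concrete list of files.
func expandPatterns(fsys fs.FS, patterns ...string) ([]string, error) {
	var files []string
	for _, p := range patterns {
		matches, err := fs.Glob(fsys, p)
		if err != nil {
			return nil, err
		}
		files = append(files, matches...)
	}
	return files, nil
}

// projectFS is a hypothetical project tree used only for illustration.
func projectFS() fs.FS {
	return fstest.MapFS{
		"main.go":             {Data: []byte("package main")},
		"image/a.png":         {Data: []byte("a")},
		"image/b.png":         {Data: []byte("b")},
		"template/index.html": {Data: []byte("<html></html>")},
	}
}

func main() {
	files, err := expandPatterns(projectFS(), "image/*", "template/*")
	if err != nil {
		panic(err)
	}
	for _, f := range files {
		fmt.Println(f)
	}
}
```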

cmd/compile/internal/noder/noder.go

Compared with the first two files, noder.go does the heaviest work. It is responsible for the final parsing and for associating the content, saving the result in the form of IR for the later compiler stages to process. In addition, it also handles cgo-related directives (which can be regarded as another form of embedding).

Like read.go, it also performs some validation, such as checking whether the embedded resources the user declared are actually used, or whether the user uses the embed object and its functions but forgot to declare a go:embed directive. If such problems are found, it stops in time, so that no time is wasted entering the compilation phase.

The relatively core functions are:

func parseGoEmbed(args string) ([]string, error) {
 ...
}

func varEmbed(makeXPos func(syntax.Pos) src.XPos, name *ir.Name, decl *syntax.VarDecl, pragma *pragmas, haveEmbed bool) {
 ...
}

func checkEmbed(decl *syntax.VarDecl, haveEmbed, withinFunc bool) error {
 ...
}

In the functions above, the go:embed directives we declared in a file are associated, in the form of IR, with the static resources in the actual program directory. Put simply, the variable declared under a go:embed directive is bound to the files that the directive names.

cmd/compile/internal/gc/main.go

After the processing above, the file eventually reaches the compiler, where func Main(archInit func(*ssagen.ArchInfo)) {} writes the static resources directly to disk (attached to the object file):

    // Write object data to disk.
    base.Timer.Start("be", "dumpobj")
    dumpdata()
    base.Ctxt.NumberSyms()
    dumpobj()
    if base.Flag.AsmHdr != "" {
        dumpasmhdr()
    }

In the process of file writing, we can see that for embedded static resources, the writing process is very simple (the implementation part is in src/cmd/compile/internal/gc/obj.go ):

func dumpembeds() {
    for _, v := range typecheck.Target.Embeds {
        staticdata.WriteEmbed(v)
    }
}

At this point, the principle and process of Golang resource embedding should be clear, along with what the official feature can do and what it lacks compared to other open source implementations. I will expand on these points one by one in subsequent articles.

Basic use

We need to cover the basic use of embed first. On the one hand, this helps readers who have not used the embed feature yet. On the other hand, it establishes a standard reference so that the later performance comparisons can be evaluated objectively.

To keep testing convenient and intuitive, in this article and the following ones we will implement a static resource server that provides a Web service and can be performance tested, with the static resources coming from "embedded resources".

Step 1: Prepare test resources

For resource embedding we naturally need some suitable resources. Because no file-type-specific processing is involved, the only thing that matters here is file size. I picked two publicly available files from the web to embed.

If you want to try it yourself, you can use the link above to get the same test resources. After downloading the files, place them in the assets folder in the same directory as the program.

Step 2: Write the basic program

First initialize an empty project:

mkdir basic && cd basic
go mod init solution-embed

To be fair, we first use the test code in the official Go repository as the base template.

// Copyright 2021 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

package embed_test

import (
    "embed"
    "log"
    "net/http"
)

//go:embed internal/embedtest/testdata/*.txt
var content embed.FS

func Example() {
    mux := http.NewServeMux()
    mux.Handle("/", http.FileServer(http.FS(content)))
    err := http.ListenAndServe(":8080", mux)
    if err != nil {
        log.Fatal(err)
    }
}

After a simple adjustment, we get a program that embeds the resources in the assets directory:

package main

import (
    "embed"
    "log"
    "net/http"
)

//go:embed assets
var assets embed.FS

func main() {
    mux := http.NewServeMux()
    mux.Handle("/", http.FileServer(http.FS(assets)))
    err := http.ListenAndServe(":8080", mux)
    if err != nil {
        log.Fatal(err)
    }
}

Then run the program directly, or compile it first and run the binary, and you can visit localhost:8080 to access the static resource files in the directory, for example: http://localhost:8080/assets/example.txt .
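The URL keeps the assets/ prefix because embedded files are stored under their path relative to the package directory. If you would rather serve the files from the URL root, one option is fs.Sub. The sketch below shows the idea under some assumptions: it uses testing/fstest.MapFS as a stand-in for the embedded assets so it runs without any files on disk, and the helper names demoAssets, newRootHandler and get are hypothetical.

```go
package main

import (
	"fmt"
	"io/fs"
	"net/http"
	"net/http/httptest"
	"testing/fstest"
)

// demoAssets is a stand-in for `//go:embed assets`; the content is hypothetical.
func demoAssets() fs.FS {
	return fstest.MapFS{
		"assets/example.txt": {Data: []byte("Hello World")},
	}
}

// newRootHandler serves the contents of the "assets" directory of fsys
// at the URL root instead of under /assets/.
func newRootHandler(fsys fs.FS) (http.Handler, error) {
	sub, err := fs.Sub(fsys, "assets")
	if err != nil {
		return nil, err
	}
	return http.FileServer(http.FS(sub)), nil
}

// get performs an in-process GET request and returns the status code and body.
func get(h http.Handler, path string) (int, string) {
	w := httptest.NewRecorder()
	h.ServeHTTP(w, httptest.NewRequest(http.MethodGet, path, nil))
	return w.Code, w.Body.String()
}

func main() {
	h, err := newRootHandler(demoAssets())
	if err != nil {
		panic(err)
	}
	code, body := get(h, "/example.txt")
	fmt.Println(code, body)
}
```

The benchmarks later in this article keep the /assets/ prefix, so this variant is only a convenience and does not change what is being measured.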

This part of the code, you can get it in https://github.com/soulteary/awesome-golang-embed/tree/main/go-embed-official/basic .

Test preparation

Before talking about performance, we first need to rework the program so that it can be tested and can report clear performance metrics.

Step 1: Improve testability

The code above lives entirely in the main function because it is simple enough. To make it testable, we need a few small adjustments, such as separating route registration from starting the server.

package main

import (
    "embed"
    "log"
    "net/http"
)

//go:embed assets
var assets embed.FS

// registerRoute builds the routes separately from main so tests can call it.
func registerRoute() *http.ServeMux {
    mux := http.NewServeMux()
    mux.Handle("/", http.FileServer(http.FS(assets)))
    return mux
}

func main() {
    mux := registerRoute()
    err := http.ListenAndServe(":8080", mux)
    if err != nil {
        log.Fatal(err)
    }
}

To simplify the test code, we use the open source assertion library testify here. Install it first:

go get -u github.com/stretchr/testify/assert

Then write the test code:

package main

import (
    "net/http"
    "net/http/httptest"
    "testing"

    "github.com/stretchr/testify/assert"
)

func TestStaticRoute(t *testing.T) {
    router := registerRoute()

    w := httptest.NewRecorder()
    req, _ := http.NewRequest("GET", "/assets/example.txt", nil)
    router.ServeHTTP(w, req)

    assert.Equal(t, http.StatusOK, w.Code)
    assert.Equal(t, "@soulteary: Hello World", w.Body.String())
}

Once the code is written, we run go test . If all goes well, we will see results similar to the following:

# go test

PASS
ok      solution-embed    0.219s

In addition to verifying that the function is normal, some additional operations can be added here to conduct a rough performance test, such as testing the time required to obtain resources through HTTP 100,000 times:

func TestRepeatRequest(t *testing.T) {
    router := registerRoute()

    passed := true
    for i := 0; i < 100000; i++ {
        w := httptest.NewRecorder()
        req, _ := http.NewRequest("GET", "/assets/example.txt", nil)
        router.ServeHTTP(w, req)

        if w.Code != 200 {
            passed = false
        }
    }

    assert.Equal(t, true, passed)
}
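As an alternative to a hand-written loop, Go's testing package has a benchmark mode that picks the iteration count itself and reports allocations per operation. The following is a minimal sketch of how the same request could be expressed that way. It is not part of the article's repository: the handler is built on an in-memory stand-in (testing/fstest.MapFS) rather than the real embedded assets, and the names newHandler, benchStaticRoute and runBench are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"testing"
	"testing/fstest"
)

// newHandler returns a file server over an in-memory stand-in for the
// embedded assets (hypothetical content).
func newHandler() http.Handler {
	fsys := fstest.MapFS{
		"assets/example.txt": {Data: []byte("@soulteary: Hello World")},
	}
	return http.FileServer(http.FS(fsys))
}

// benchStaticRoute has the shape of a regular benchmark function. In a
// _test.go file it would be named BenchmarkStaticRoute and run with
// `go test -bench=. -benchmem`.
func benchStaticRoute(b *testing.B) {
	h := newHandler()
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		w := httptest.NewRecorder()
		req := httptest.NewRequest(http.MethodGet, "/assets/example.txt", nil)
		h.ServeHTTP(w, req)
		if w.Code != http.StatusOK {
			b.Fatalf("unexpected status %d", w.Code)
		}
	}
}

// runBench executes the benchmark outside of `go test`.
func runBench() testing.BenchmarkResult {
	return testing.Benchmark(benchStaticRoute)
}

func main() {
	res := runBench()
	fmt.Println(res.String(), res.MemString())
}
```

Note that the -benchmem flag only adds memory statistics to benchmark output; it does not change what ordinary Test functions report.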

You can get this part of the code from https://github.com/soulteary/awesome-golang-embed/tree/main/go-embed-official/testable .

Step 2: Add Performance Probes

In the past, for black-box programs, we could only obtain performance data by monitoring and comparing before and after. When we are able to customize the program, we can use a profiler directly to collect performance metrics while the program runs.

With pprof , we can quickly add several performance-related endpoints to the Web service above. Most articles will tell you that simply importing pprof is enough, but here it is not. Reading the code ( https://cs.opensource.google/go/go/+/refs/tags/go1.17.6:src/net/http/pprof/pprof.go ) shows that pprof's automatic registration of its profiling endpoints only works for the default HTTP server, not for a custom multiplexer ( mux ):

func init() {
    http.HandleFunc("/debug/pprof/", Index)
    http.HandleFunc("/debug/pprof/cmdline", Cmdline)
    http.HandleFunc("/debug/pprof/profile", Profile)
    http.HandleFunc("/debug/pprof/symbol", Symbol)
    http.HandleFunc("/debug/pprof/trace", Trace)
}

So for pprof to take effect, we need to register these profiling endpoints manually. Adjusting the code above gives a program similar to the following.

package main

import (
    "embed"
    "log"
    "net/http"
    "net/http/pprof"
    "runtime"
)

//go:embed assets
var assets embed.FS

func registerRoute() *http.ServeMux {
    mux := http.NewServeMux()
    mux.Handle("/", http.FileServer(http.FS(assets)))
    return mux
}

// enableProf registers the pprof endpoints on mux by hand, because
// net/http/pprof only registers them on the default mux automatically.
func enableProf(mux *http.ServeMux) {
    runtime.GOMAXPROCS(2)
    // A rate of 1 records every mutex contention and every blocking event.
    runtime.SetMutexProfileFraction(1)
    runtime.SetBlockProfileRate(1)

    mux.HandleFunc("/debug/pprof/", pprof.Index)
    mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
    mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
    mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
    mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
}

func main() {
    mux := registerRoute()
    enableProf(mux)

    err := http.ListenAndServe(":8080", mux)
    if err != nil {
        log.Fatal(err)
    }
}
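The registration behavior described above can be checked in-process without starting a server. The sketch below is only an illustration, and the helper names statusFor and check are hypothetical. It sends the same request to the default mux, to a fresh custom mux, and to the custom mux again after manual registration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof" // importing the package runs its init(), which registers on the default mux
)

// statusFor sends a GET request for path to h and returns the status code.
func statusFor(h http.Handler, path string) int {
	w := httptest.NewRecorder()
	h.ServeHTTP(w, httptest.NewRequest(http.MethodGet, path, nil))
	return w.Code
}

// check returns the status codes for the default mux, for a custom mux before
// manual registration, and for the same custom mux after registration.
func check() (def, before, after int) {
	const path = "/debug/pprof/cmdline"
	custom := http.NewServeMux()

	def = statusFor(http.DefaultServeMux, path)
	before = statusFor(custom, path)

	custom.HandleFunc(path, pprof.Cmdline)
	after = statusFor(custom, path)
	return def, before, after
}

func main() {
	def, before, after := check()
	fmt.Println("default mux:", def)
	fmt.Println("custom mux before registration:", before)
	fmt.Println("custom mux after registration:", after)
}
```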

After running or compiling the program again, visit http://localhost:8080/debug/pprof/ and you will see an interface similar to the following.

(Figure: Go pprof web interface)

This part of the relevant code can be seen in https://github.com/soulteary/awesome-golang-embed/tree/main/go-embed-official/profiler .

Performance testing (to establish benchmarks)

I use two methods for performance testing here: the first samples profiling data while the test cases run, and the second measures the throughput of the built program under an HTTP stress test.

I have uploaded the relevant code to https://github.com/soulteary/awesome-golang-embed/tree/main/go-embed-official/benchmark , you can get it for your own experiments.

Test case-based performance sampling

We simply adjust the default test program to make a large number of repeated requests (1000 small file reads and 100 large file reads) for the two resources we prepared in the previous section.

func TestSmallFileRepeatRequest(t *testing.T) {
    router := registerRoute()

    passed := true
    for i := 0; i < 1000; i++ {
        w := httptest.NewRecorder()
        req, _ := http.NewRequest("GET", "/assets/vue.min.js", nil)
        router.ServeHTTP(w, req)

        if w.Code != 200 {
            passed = false
        }
    }

    assert.Equal(t, true, passed)
}

func TestLargeFileRepeatRequest(t *testing.T) {
    router := registerRoute()

    passed := true
    for i := 0; i < 100; i++ {
        w := httptest.NewRecorder()
        req, _ := http.NewRequest("GET", "/assets/chip.jpg", nil)
        router.ServeHTTP(w, req)

        if w.Code != 200 {
            passed = false
        }
    }

    assert.Equal(t, true, passed)
}

Next, write a script to help us obtain the resource consumption of different volume files respectively.

#!/bin/bash

go test -run=TestSmallFileRepeatRequest -benchmem -memprofile mem-small.out -cpuprofile cpu-small.out -v
go test -run=TestLargeFileRepeatRequest -benchmem -memprofile mem-large.out -cpuprofile cpu-large.out -v

After execution, you can see output similar to the following:

=== RUN   TestSmallFileRepeatRequest
--- PASS: TestSmallFileRepeatRequest (0.04s)
PASS
ok      solution-embed    0.813s
=== RUN   TestLargeFileRepeatRequest
--- PASS: TestLargeFileRepeatRequest (1.14s)
PASS
ok      solution-embed    1.331s
=== RUN   TestStaticRoute
--- PASS: TestStaticRoute (0.00s)
=== RUN   TestSmallFileRepeatRequest
--- PASS: TestSmallFileRepeatRequest (0.04s)
=== RUN   TestLargeFileRepeatRequest
--- PASS: TestLargeFileRepeatRequest (1.12s)
PASS
ok      solution-embed    1.509s

Performance of embedding large files

Use go tool pprof -http=:8090 cpu-large.out to visualize the call and resource consumption of the program execution process. After executing the command, open http://localhost:8090/ui/ in the browser, and you can see a call graph similar to the following:

(Figure: resource usage when embedding a large file)

In the call graph above, we can see that the caller at the last hop before runtime.memmove (30.22%) is embed.(*openFile).Read (5.04%). Fetching our nearly 20 MB resource from the embedded data accounts for only a little over 5% of the total time. The rest of the work goes into copying data, automatically growing Go's buffers, and garbage collection.

(Figure: calls involved in reading the embedded resource and their relative time cost)

Similarly, using go tool pprof -http=:8090 mem-large.out , let's check the memory usage:

(Figure: memory consumption when reading the embedded resource)

It can be seen that over 100 calls, more than 6300 MB of memory was allocated in total, about 360 times the size of our original resource. On average, each request costs 3.6 times the size of the original file.
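One plausible contributor to this multiple, offered here as an assumption rather than something the profile proves, is the test harness itself: httptest.ResponseRecorder stores the response body in a bytes.Buffer, and a buffer that is filled in small chunks has to be reallocated and copied each time it grows. The sketch below measures only that effect in isolation. The payload size, the chunk size and the helper name allocatedWhileBuffering are assumptions chosen for illustration, and the sketch does not reproduce the measurement above.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
)

// allocatedWhileBuffering writes about size bytes into a fresh bytes.Buffer in
// chunk-sized pieces and reports how many heap bytes were allocated meanwhile.
func allocatedWhileBuffering(size, chunk int) uint64 {
	piece := make([]byte, chunk)

	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)

	var buf bytes.Buffer
	for written := 0; written < size; written += chunk {
		buf.Write(piece) // the buffer reallocates and copies whenever it is full
	}

	runtime.ReadMemStats(&after)
	runtime.KeepAlive(&buf)
	// TotalAlloc is cumulative, so garbage collection does not reduce it.
	return after.TotalAlloc - before.TotalAlloc
}

func main() {
	const size = 17 << 20  // roughly 17 MiB payload, an assumed size
	const chunk = 32 << 10 // 32 KiB per write, an assumed chunk size

	total := allocatedWhileBuffering(size, chunk)
	fmt.Printf("allocated %.1fx the payload size\n", float64(total)/float64(size))
}
```

If this effect accounts for a large share of the allocations, part of the cost measured by the test cases belongs to the recorder rather than to embed, which is one reason the article also measures throughput against the built binary.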

Resource usage for embedding small files

Having looked at the large file, let's turn to the resource usage of the small file. Because the call graph shown by go tool pprof -http=:8090 cpu-small.out contains no embed-related functions at all (their cost is negligible), we skip the CPU view and go straight to memory usage.

(Figure: memory consumption when reading the embedded resource, small file)

Before the data is finally output to the user, io.copyBuffer uses about 1.7 times the size of our resource, which is probably related to garbage collection. By the time the data is output to the user, usage drops to about 1.4 times, which is noticeably lower than for the large file.

Throughput testing with Wrk

We first run go build main.go to build the program, then run ./main to start the service, and then test the throughput for the small file:

# wrk -t16 -c 100 -d 30s http://localhost:8080/assets/vue.min.js

Running 30s test @ http://localhost:8080/assets/vue.min.js
  16 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.29ms    2.64ms  49.65ms   71.59%
    Req/Sec     1.44k   164.08     1.83k    75.85%
  688578 requests in 30.02s, 60.47GB read
Requests/sec:  22938.19
Transfer/sec:      2.01GB

Without any code optimization, Go serving a small embedded resource handles roughly 23,000 requests per second. Now let's look at the throughput for the large file:

# wrk -t16 -c 100 -d 30s http://localhost:8080/assets/chip.jpg 

Running 30s test @ http://localhost:8080/assets/chip.jpg
  16 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   332.75ms  136.54ms   1.32s    80.92%
    Req/Sec    18.75      9.42    60.00     56.33%
  8690 requests in 30.10s, 144.51GB read
Requests/sec:    288.71
Transfer/sec:      4.80GB

Because the file is much larger, the number of requests per second drops, but the data throughput per second more than doubles (from 2.01 GB/s to 4.80 GB/s). Since both tests ran for 30 seconds, the total amount of data transferred grows by the same factor of about 2.4 compared with the small-file test, from 60 GB to 144 GB.

Finally

That covers everything I wanted to discuss in this article. In the following articles, I will explain the similarities and differences between the various open source implementations and the official implementation, and reveal the differences in performance.

-- EOF



This article is licensed under Attribution 4.0 International (CC BY 4.0). You are welcome to reprint or adapt it, provided you credit the source.

Author of this article: Su Yang

Created: January 15, 2022
Word count: 7122 words
Reading time: 15 minutes
Link to this article: https://soulteary.com/2022/01/15/explain-the-golang-resource-embedding-solution-part-1.html

