Introduction: This article is selected from the Tencent Cloud Developer Community column "Technical Intelligence · Original Collection of Tencent Technicians". The column is a window for sharing and exchange that the Tencent Cloud developer community created for Tencent engineers and the wider developer audience. It invites Tencent engineers to share their original technical experience, so that they and other developers can inspire one another and grow together. Author: Altria Pendragon, Tencent Cloud Developer Community.

About Rust

Rust is a strongly typed, compiled, memory-safe programming language. It began as a personal project of Graydon Hoare, then an employee of Mozilla. Mozilla started sponsoring the project in 2009, and between 2010 and 2011 Rust achieved bootstrapping: the Rust compiler was rewritten in Rust and compiled itself.
Mozilla used Rust to build Servo, a next-generation browser layout engine, and Servo's CSS engine began to be integrated into Firefox in 2017.
Rust was designed from the start as a memory-safe language intended to replace C and C++ in large low-level projects such as operating systems and browsers. Because of its ties to Mozilla, the front-end community also took notice of the language and began applying it in other areas, and its ecosystem has gradually flourished.

Memory Safety: Rust's Killer Feature

Mainstream programming languages generally fall into two categories: those with automatic garbage collection (GC), such as Golang, Java and JavaScript, and those such as C and C++, in which the programmer manages memory manually.
Most languages share a similar memory model.
When code executes, the values bound to variables are pushed onto the stack in order, and when execution leaves a scope, the values of that scope's variables are popped off again. The stack is a first-in, last-out structure, which matches how scopes nest: the outermost scope is entered first and exited last. However, a value cannot be inserted into the middle of the stack, so the stack can only hold values whose size is fixed once they are declared, such as int, char or fixed-length arrays. Values whose size can change, such as a variable-length array (vector) or a variable-length string (String), cannot be stored on the stack directly.
When a program needs a block of memory whose size is not known in advance, it requests it from the operating system. The operating system sets aside the space and returns its memory address, a pointer, to the program. The program stores the data on the heap and keeps the pointer on the stack, which works because the size of a pointer is fixed: 32 bits in a 32-bit program and 64 bits in a 64-bit program.
Data on the stack needs no memory management. As the code executes, it is easy to tell whether a variable is still useful: once its scope ends, its value can no longer be read, so the variable is certainly dead. Pushing and popping as scopes begin and end is enough to manage stack memory, and the programmer does not need to think about it.
Data on the heap is a different matter. The program only holds a pointer; the actual memory block is not on the stack and is not destroyed along with it. Nor can the program simply free the block when the pointer variable on the stack is destroyed, because several variables may hold pointers to the same block, and freeing it at that moment would lead to unexpected behavior.
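The split between fixed-size stack values and heap data reached through a fixed-size pointer can be observed directly in Rust. The following is a minimal sketch, not part of the original article; the helper `collect_squares` is a hypothetical name introduced only for illustration.

```rust
use std::mem::size_of;

// Hypothetical helper: builds a list whose length is only known at run time,
// so its elements have to live on the heap.
fn collect_squares(n: u32) -> Vec<u32> {
    (0..n).map(|i| i * i).collect()
}

fn main() {
    // Fixed-size values: the compiler knows their size, so they can sit on the stack.
    let count: i32 = 42;
    let fixed: [u8; 4] = [1, 2, 3, 4];

    // Heap data: the variables only hold fixed-size handles to it.
    let squares = collect_squares(4);
    let boxed: Box<[u8; 1024]> = Box::new([0u8; 1024]);

    println!("i32: {} bytes, value {}", size_of::<i32>(), count);
    println!("[u8; 4]: {} bytes, value {:?}", size_of::<[u8; 4]>(), fixed);
    // A Box is a plain pointer, so it is one machine word wide
    // no matter how large the data behind it is.
    println!("pointer to 1024 heap bytes: {} bytes", size_of::<Box<[u8; 1024]>>());
    println!("machine word: {} bytes", size_of::<usize>());
    println!("heap data: {:?}, first boxed byte {}", squares, boxed[0]);
}
```

On a 64-bit target the pointer and the machine word should both print as 8 bytes, in line with the fixed pointer size described above.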
Based on this, some languages ship with a sophisticated GC algorithm. Reference counting, for example, tracks how many variables hold a pointer to a given memory block; once all of those pointers have been destroyed, the block can be cleaned up. Other languages require memory to be managed manually: every block allocated on the heap must be released by hand.
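To make the counting idea concrete, here is a small sketch using `Rc`, the opt-in reference-counted pointer in Rust's standard library. It is used here only because it exposes the counter; it is not how a garbage-collected language implements its GC, and the function name `observe_counts` is an assumption made for this example.

```rust
use std::rc::Rc;

// Records the reference count at three points in time.
fn observe_counts() -> Vec<usize> {
    let mut seen = Vec::new();

    // One heap allocation holding the text, with a counter stored next to it.
    let first = Rc::new(String::from("hello world"));
    seen.push(Rc::strong_count(&first));

    {
        // Rc::clone copies the pointer and increments the counter;
        // the text itself is not copied.
        let second = Rc::clone(&first);
        seen.push(Rc::strong_count(&second));
    } // `second` is destroyed here, so the counter is decremented

    seen.push(Rc::strong_count(&first));
    seen
} // the last pointer is destroyed here, the counter reaches zero, the block is freed

fn main() {
    println!("{:?}", observe_counts());
}
```

The memory block is released only when the last pointer to it disappears, which is exactly the bookkeeping a reference-counting collector has to perform at run time.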
Each approach has its advantages and disadvantages. The former requires a runtime that contains the GC algorithm, which increases program size. The latter is memory-unsafe, or rather, because responsibility for memory management falls on the programmer, the programmer's skill greatly affects the safety of the code. Forgetting to free memory makes the program's memory usage grow and grow. Freeing the wrong thing deletes data that should not be deleted. Pointer misuse, such as data overflowing into a neighboring block, modifies data that should not be modified, and so on.

Rust, on the other hand, takes a completely new approach to memory management. It can be summarized as follows: the programmer and the compiler agree on a set of rules, and the programmer must write code that follows them. When the code follows these rules, whether a memory block is still in use becomes so clear that it can be determined at compile time, without running the program. The compiler can then insert the memory-reclaiming code at the right places. In other words, by restricting how references may be used, Rust sidesteps the situations in which it is hard to tell whether a block is still in use. The remaining situations are simple enough that no expert programmer is needed to judge them; the compiler alone can do it.

One big benefit of this is:
No GC algorithm and no runtime are needed. Memory is still reclaimed "manually" in essence, but the compiler inserts the reclaiming code, so the programmer does not have to write it. As long as the code compiles (and does not resort to unsafe blocks), it is memory safe.
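The compiler-inserted cleanup can be made visible with the standard `Drop` trait, whose `drop` method runs at the point where a value's owner goes out of scope. This is a sketch under the assumption that logging into a shared list is an acceptable stand-in for freeing memory; the type `Tracked` and the function `run` are hypothetical names.

```rust
use std::cell::RefCell;

// Hypothetical type for illustration: it records the moment it is cleaned up.
struct Tracked<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Tracked<'_> {
    // Called automatically where the owning variable goes out of scope.
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

fn run() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        let _outer = Tracked { name: "outer", log: &log };
        {
            let _inner = Tracked { name: "inner", log: &log };
            log.borrow_mut().push("end of inner scope".to_string());
        } // `_inner` is dropped here
        log.borrow_mut().push("end of outer scope".to_string());
    } // `_outer` is dropped here
    log.into_inner()
}

fn main() {
    for line in run() {
        println!("{}", line);
    }
}
```

No cleanup call appears anywhere in `run`, yet each value is released at a point that is fully determined by the scopes in the source code.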

(1) Implementation principle

Rust's memory safety mechanism can fairly be called original. It is a simple, easy-to-understand mechanism called the ownership system, built on two core concepts: ownership and borrowing.

(2) Ownership

Every value, including a pointer, is bound to a variable, and we say that the variable owns the value. In the following code, the variable str owns "hello".
let str = "hello";
When the scope containing str ends, the value of str is cleaned up and str is no longer valid. This matches almost every mainstream language and is easy to understand. Note, however, that Rust distinguishes between fixed-length string literals and growable strings. The value above is a string literal of type &str: a fixed-size reference to text whose length cannot change. Because its size is fixed, it can be stored on the stack and copied freely, so the following code runs correctly, as it would in almost any other mainstream language:

let str = "hello world";
let str2 = str;
println!("{}", str);
println!("{}", str2);

But if we switch to a growable string stored on the heap and look at the same code again:

fn main() {
  let str = String::from("hello world");
  let str2 = str; // ownership moves from str to str2

  println!("{}", str); // error[E0382]: borrow of moved value: `str`
  println!("{}", str2);
}

At this point, we are surprised to find that the code reports an error. Why?
The reason is that in the first piece of code, str is a fixed-size value stored entirely on the stack (for a string literal, a reference to text built into the program). Writing str2 = str creates a second variable with its own copy of that value; what happens here is a plain "memory copy". Each variable owns its own copy, so both remain valid.
In the second piece of code, str is essentially just an address pointing to a memory block on the heap, and str2 = str assigns that address to str2. In other languages this is perfectly fine, but str and str2 would then point to the same memory, so modifying str would also change str2.
In Rust, however, a value can be bound to only one variable at a time; that is, exactly one variable owns it, just as a thing can belong to only one person at a time. After str2 = str, the address previously held by str no longer belongs to str but to str2. This is called [ownership transfer] (a move). str is now invalid, and using an invalid variable naturally produces an error.
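The difference between a copy and a move can be checked by comparing addresses. The sketch below is illustrative only; the helper names `copies_share_text` and `clone_shares_buffer` are assumptions introduced for this example.

```rust
// A `&str` is a small reference (pointer plus length). Copying it duplicates
// the reference, so both copies still point at the same text.
fn copies_share_text() -> bool {
    let a: &str = "hello world";
    let b = a; // copy: `a` stays valid
    a.as_ptr() == b.as_ptr()
}

// A `String` owns a heap buffer. An explicit `clone` allocates a second
// buffer, giving two independent owners.
fn clone_shares_buffer() -> bool {
    let s1 = String::from("hello world");
    let s2 = s1.clone(); // deep copy: `s1` stays valid
    s1.as_ptr() == s2.as_ptr()
}

fn main() {
    let s1 = String::from("hello world");
    let s2 = s1; // move: `s1` can no longer be used
    // println!("{}", s1); // would not compile: borrow of moved value `s1`
    println!("{}", s2);

    println!("copied &str points at the same text: {}", copies_share_text());
    println!("cloned String shares its buffer: {}", clone_shares_buffer());
}
```

When two usable copies of a heap string are really needed, `clone` is the explicit way to ask for them; plain assignment always moves.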
Any of the following situations leads to a transfer of ownership:
The assignment operation mentioned above:

let str = String::from("hello world"); let str2 = str; // str loses ownership!

Passing a value into another scope, such as a function:

let str = String::from("hello world"); some_func(str); // str is no longer valid here.
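Staying strictly within ownership, a function that receives a value can hand it back through its return value, or the caller can pass a clone. A minimal sketch, with `shout` and `consume` as hypothetical function names:

```rust
// Takes ownership of `text`, appends to it, and returns ownership to the caller.
fn shout(mut text: String) -> String {
    text.push_str("!!!");
    text
}

// Takes ownership and does not return it: the String is freed when
// `text` goes out of scope at the end of this function.
fn consume(text: String) -> usize {
    text.len()
}

fn main() {
    let greeting = String::from("hello world");
    let greeting = shout(greeting); // moved in, then moved back out
    println!("{}", greeting);

    let length = consume(greeting.clone()); // the clone is moved, the original stays
    println!("{} has {} bytes", greeting, length);

    let length = consume(greeting); // now the original is moved
    // println!("{}", greeting); // would not compile: `greeting` was moved
    println!("{}", length);
}
```

Both workarounds are valid Rust, but returning every argument or cloning it quickly becomes tedious.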

In this way, the address of a given memory block can be held by only one variable at a time. If that variable goes out of scope and can no longer be read, the address can never be accessed again, so the memory block can be released. This check is very simple and can be carried out entirely by the compiler during static checking, which is how Rust achieves memory safety so easily.
However, writing code this way is quite unnatural: it does solve the memory safety problem, but it is awkward to use. For example, suppose I pass str into a function to do some processing and afterwards still want to read it, as in the following code:

fn main() {
  let mut str1 = String::from("hello world"); // mut only marks the variable as mutable rather than immutable

  add_str(str1, "!!!"); // ownership of the String moves into add_str

  println!("{}", str1); // error[E0382]: borrow of moved value: `str1`
}

fn add_str(mut str_1: String, str_2: &str) {
  str_1.push_str(str_2);
}

We want to append three exclamation marks to str1 and print it. This code fails to compile: when str1 is passed to add_str, ownership moves to the parameter str_1, so str1 no longer owns the value and cannot be used afterwards. This situation is very common, and a pure ownership mechanism makes it needlessly complicated, so Rust provides a further mechanism to solve it: [references and borrowing].

(3) Borrowing

Although a value can be owned by only one variable, a variable can lend its value out, just as people can lend their belongings to others. A slight modification of the code above:

 fn main() {
  let mut str1 = String::from("hello world");

  add_str(&mut str1, "!!!");

  println!("{}", str1);
}

fn add_str(str_1: &mut String, str_2: &str) {
  str_1.push_str(str_2);
}

What is passed to add_str is no longer str1 itself but &mut str1, a mutable reference. This amounts to borrowing the data from str1 for a while; ownership stays with str1, and the condition for reclaiming the memory block is unchanged: [the scope containing str1 finishes executing, and the memory address held by str1 is popped off the stack and destroyed].
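Borrowing comes in two forms: shared references (`&T`) for reading and mutable references (`&mut T`) for writing. Many shared borrows may coexist, but a mutable borrow must be the only active one. The sketch below illustrates this; `word_count` is a hypothetical helper added for the example.

```rust
// Shared borrow: read-only access, the caller keeps ownership.
fn word_count(text: &str) -> usize {
    text.split_whitespace().count()
}

// Mutable borrow: may modify the value, the caller still keeps ownership.
fn add_str(target: &mut String, suffix: &str) {
    target.push_str(suffix);
}

fn main() {
    let mut s = String::from("hello world");

    // Any number of shared borrows may exist at the same time.
    let r1 = &s;
    let r2 = &s;
    println!("{} {} ({} words)", r1, r2, word_count(&s));

    // `r1` and `r2` are not used after this point, so the mutable borrow is accepted.
    add_str(&mut s, "!!!");
    // println!("{}", r1); // uncommenting this line would be rejected (error E0502)

    println!("{}", s);
}
```

Because the borrows end before `s` itself goes out of scope, the owner alone still decides when the memory is released.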
In essence, these two mechanisms make the reference count of a memory block extremely simple. As long as the variable that owns the address is still on the stack, the count is 1; otherwise it is 0. Those are the only two cases. A memory block never has more than one owner, which strips reference-counting GC down to something so simple that no complex runtime is needed: the static checking phase can find every point where memory must be reclaimed and insert the cleanup there.

Other features of Rust

As a very young programming language, Rust has many features common to newer languages, and feature-wise it resembles a mix of Golang, TypeScript and modern C++. For example:
No inheritance, only composition, similar to Golang. The subtyping that comes with inheritance can make type checking mathematically undecidable: it is possible to construct code containing subtypes on which type inference and type checking cannot be completed. In engineering terms, the compiler falls into infinite recursion during type inference and never terminates. Deep inheritance hierarchies also make code hard to maintain, and more and more new languages are abandoning inheritance.
A capable package manager, cargo, makes dependencies easy to manage. Dependencies are isolated per project rather than shared globally, which is similar to Node.js, and Golang is also moving toward per-project dependency isolation. A project's dependencies are declared in Cargo.toml, and Cargo.lock pins them to specific versions (almost the same as npm).
A large number of advanced language features: pattern matching; no null, with Option instead (anywhere a value may be absent, an Option enum can be returned and handled with pattern matching, with Result playing the same role for errors, so null is never exposed to developers); native asynchronous programming support; and more.
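Several of these features fit into one short sketch: composition instead of inheritance, shared behavior through a trait, `Option` in place of null, and `Result` for errors. All type and function names below (`Engine`, `Car`, `Describe`, `parse_horsepower`) are invented for illustration.

```rust
// Composition: a `Car` contains an `Engine` instead of inheriting from it.
struct Engine {
    horsepower: u32,
}

struct Car {
    name: String,
    engine: Option<Engine>, // the engine may be absent; there is no null
}

// Shared behavior comes from traits rather than base classes.
trait Describe {
    fn describe(&self) -> String;
}

impl Describe for Car {
    fn describe(&self) -> String {
        // Pattern matching forces both the Some and None cases to be handled.
        match &self.engine {
            Some(engine) => format!("{} ({} hp)", self.name, engine.horsepower),
            None => format!("{} (no engine)", self.name),
        }
    }
}

// Fallible operations return a Result instead of null or an exception.
fn parse_horsepower(input: &str) -> Result<u32, std::num::ParseIntError> {
    input.trim().parse::<u32>()
}

fn main() {
    let kart = Car { name: "Kart".to_string(), engine: None };
    println!("{}", kart.describe());

    match parse_horsepower(" 150 ") {
        Ok(hp) => {
            let car = Car { name: "Coupe".to_string(), engine: Some(Engine { horsepower: hp }) };
            println!("{}", car.describe());
        }
        Err(err) => println!("invalid horsepower: {}", err),
    }
}
```

Leaving out either arm of a `match` on an `Option` is a compile error, which is how the "no null" guarantee is enforced in practice.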

Impact on the front end?

Memory safety plus the features above make Rust a compelling replacement for C++. Rust is currently used in the front-end field in two ways: to build higher-performance front-end tools, and as the language for compiling WASM modules that run in the browser.

(1) High-performance tools

In the past, the only way to build a high-performance tool in the front-end field was node-gyp: write the code in C++ and compile it with gyp into an addon that Node.js can call. Well-known libraries such as node-sass, which sass-loader has traditionally depended on, are implemented this way. More often, though, front-end tools did not care much about performance and were written directly in JS, such as Babel, ESLint and webpack. A large part of the reason is that C++ is hard to get started with: the features accumulated across its many standard versions alone take a long time to learn, and after that it takes considerable development experience to manage memory well and avoid leaks. Rust is different. It is young enough not to carry decades of standards, it has a package manager as modern as npm, and, more importantly, its compiler guards against memory errors. As a result, although Rust's history is short and C++ can also be used to write Node.js extensions, many high-performance front-end tools are already written in Rust. For example:
swc is written in Rust and exposes a Node.js API. It does a job similar to Babel, compiling newer JS syntax for older environments, and thanks to Rust its performance is said to reach 40 times that of Babel.
Rome is also based on Rust, and its author is Sebastian McKenzie, the author of Babel.
Rome covers compilation, linting, formatting, bundling, testing and more. It aims to be a comprehensive tool for working with JavaScript source code.
RSLint is a JS linter written in Rust that aims to replace ESLint.
As the front end grows more and more complex, we will gradually demand a toolchain with better performance. Perhaps in a few years we will see stable releases of swc and Rome running in production projects.

(2) WASM

Beyond tooling, now that WASM exists, the front end is also looking for a language that supports WASM well, and at present that is most likely Rust. For WASM, a language with a runtime is a poor fit: once packaged into WASM, the module contains not only our own business code but also the runtime, including GC and other logic, which greatly increases bundle size and hurts the user experience. Once languages with runtimes are ruled out, the front end's options are few. Between C++ and Rust, Rust's advantages make the front-end community more inclined to choose Rust. Rust also provides good support here: the official compiler can compile Rust code to WASM, and together with the out-of-the-box tool wasm-pack, the front end can build WASM modules very quickly. Here is a simple demonstration. The following code is an excerpt that I took from the swc project mentioned above:

#![deny(warnings)]
#![allow(clippy::unused_unit)]

// Import other crates, the standard library and external libraries
use std::sync::Arc;

use anyhow::{Context, Error};
use once_cell::sync::Lazy;
use swc::{
    config::{ErrorFormat, JsMinifyOptions, Options, ParseOptions, SourceMapsConfig},
    try_with_handler, Compiler,
};
use swc_common::{comments::Comments, FileName, FilePathMapping, SourceMap};
use swc_ecmascript::ast::{EsVersion, Program};

// Import the WASM-related library
use wasm_bindgen::prelude::*;

// Use the wasm_bindgen macro. It means that after the function below is compiled
// to WASM, its exported name is transformSync and its TS type is transformSync
#[wasm_bindgen(
    js_name = "transformSync",
    typescript_type = "transformSync",
    skip_typescript
)]
#[allow(unused_variables)]
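// Define a function. Because it is pub, it can be called from outside.
// Its purpose is to transpile newer JS syntax into an older JS version.
// We ignore the internal logic entirely here.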
pub fn transform_sync(
    s: &str,
    opts: JsValue,
    experimental_plugin_bytes_resolver: JsValue,
) -> Result<JsValue, JsValue> {
    console_error_panic_hook::set_once();

    let c = compiler();

    #[cfg(feature = "plugin")]
    {
        if experimental_plugin_bytes_resolver.is_object() {
            use js_sys::{Array, Object, Uint8Array};
            use wasm_bindgen::JsCast;

            // TODO: This is probably very inefficient, including each transform
            // deserializes plugin bytes.
            let plugin_bytes_resolver_object: Object = experimental_plugin_bytes_resolver
                .try_into()
                .expect("Resolver should be a js object");

            swc_plugin_runner::cache::init_plugin_module_cache_once();

            let entries = Object::entries(&plugin_bytes_resolver_object);
            for entry in entries.iter() {
                let entry: Array = entry
                    .try_into()
                    .expect("Resolver object missing either key or value");
                let name: String = entry
                    .get(0)
                    .as_string()
                    .expect("Resolver key should be a string");
                let buffer = entry.get(1);

                //https://github.com/rustwasm/wasm-bindgen/issues/2017#issue-573013044
                //We may use https://github.com/cloudflare/serde-wasm-bindgen instead later
                let data = if JsCast::is_instance_of::<Uint8Array>(&buffer) {
                    JsValue::from(Array::from(&buffer))
                } else {
                    buffer
                };

                let bytes: Vec<u8> = data
                    .into_serde()
                    .expect("Could not read byte from plugin resolver");

                // In here we 'inject' externally loaded bytes into the cache, so
                // remaining plugin_runner execution path works as much as
                // similar between embedded runtime.
                swc_plugin_runner::cache::PLUGIN_MODULE_CACHE.store_once(&name, bytes);
            }
        }
    }

    let opts: Options = opts
        .into_serde()
        .context("failed to parse options")
        .map_err(|e| convert_err(e, ErrorFormat::Normal))?;

    let error_format = opts.experimental.error_format.unwrap_or_default();

    try_with_handler(
        c.cm.clone(),
        swc::HandlerOpts {
            ..Default::default()
        },
        |handler| {
            c.run(|| {
                let fm = c.cm.new_source_file(
                    if opts.filename.is_empty() {
                        FileName::Anon
                    } else {
                        FileName::Real(opts.filename.clone().into())
                    },
                    s.into(),
                );
                let out = c
                    .process_js_file(fm, handler, &opts)
                    .context("failed to process input file")?;

                JsValue::from_serde(&out).context("failed to serialize json")
            })
        },
    )
    .map_err(|e| convert_err(e, error_format))
}

What is special about this code is the attribute placed above the function. wasm_bindgen is an attribute macro: once this piece of code is added, the compiler generates the agreed glue logic for us:

 #[wasm_bindgen(
    js_name = "transformSync",
    typescript_type = "transformSync",
    skip_typescript
)]

With this attribute in place, the function that follows is exported from the compiled WASM binary so that JS can call it.
We use wasm-pack to compile and package the code:

 wasm-pack build --scope swc -t nodejs --features plugin

This produces the following files:

 ├── binding_core_wasm.d.ts
├── binding_core_wasm.js
├── binding_core_wasm_bg.js
├── binding_core_wasm_bg.wasm
├── binding_core_wasm_bg.wasm.d.ts
└── package.json

Then you can call it in JS:

 // index.js
let settings = {
    jsc: {
        target: "es2016",
        parser: {
            syntax: "ecmascript",
            jsx: true,
            dynamicImport: false,
            numericSeparator: false,
            privateMethod: false,
            functionBind: false,
            exportDefaultFrom: false,
            exportNamespaceFrom: false,
            decorators: false,
            decoratorsBeforeExport: false,
            topLevelAwait: false,
            importMeta: false,
        },
    },
};

let code = `
let a = 1;
let b = {
    c: {
        d: 1
    }

};
console.log(b?.c?.d);

let MyComponent = () => {
    return (<div a={10}>
        <p>Hello World!</p>
    </div>);
}
`;


const wasm = require('./pkg/binding_core_wasm');
console.log(wasm.transformSync(code, settings))

As this shows, once a Rust library exists, turning it into WASM is very simple. Readers can also try building WASM from Golang and C++, and will find that the whole process is much simpler with Rust than with either of them.

Are there any problems?

Although I have said many good things about Rust above, I was somewhat discouraged while learning it. One big reason is that Rust goes its own way far too often.
To give a very simple example: most programming languages declare variables and constants in different ways. JavaScript distinguishes let and const, and Go distinguishes var and const. Alternatively, declarations are variables by default and constants need an extra keyword, as in Java, where adding final in front of a declaration makes it a constant. Rust is unusual: a declaration is immutable by default, and mutability must be requested explicitly. let a = 1 gives an immutable binding, while let mut a = 1 gives a mutable variable.
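A short sketch of the declaration forms involved, including `const` and shadowing, which Rust offers in addition to `let` and `let mut`. The names `MAX_RETRIES` and `total_attempts` are made up for the example.

```rust
// A true constant: needs a type annotation and is evaluated at compile time.
const MAX_RETRIES: u32 = 3;

fn total_attempts() -> u32 {
    let base = 1; // immutable binding
    // base += 1; // would not compile: cannot assign twice to immutable variable

    let mut attempts = base; // mutable binding, opted into with `mut`
    attempts += MAX_RETRIES;

    // Shadowing: a new immutable binding that reuses the name.
    let attempts = attempts * 2;
    attempts
}

fn main() {
    println!("{}", total_attempts());
}
```

The default pushes code toward immutability, and every place where a value can change is marked with `mut` in the source.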
As this example shows, Rust is unusual in many ways. Most of these are simply different design philosophies, neither better nor worse, but they do place some mental burden on developers coming from other languages.
From my own learning experience, Rust is by no means easy to learn, and not necessarily easier than C++. Some people in the community even say that to learn Rust well you should learn C++ first, otherwise you will not fully appreciate Rust's elegance. Anyone who wants to learn Rust should be mentally prepared.

