The applications, opportunities, and challenges of RISC-V in blockchain smart contracts and cloud native

This article is reproduced from: Cloak (ID: wearecloakman)

Blockchain, like the Internet around 2000, is making its way into all of our lives.

As one of the cores of blockchain technology, the design of smart contracts and virtual machines plays an increasingly important role in driving blockchain innovation. Seen from this angle, virtual machine design is also flourishing, with many different approaches emerging.

Based on our understanding of and reflection on the smart contract layer and the virtual machine, Secret Ape Technology (Cryptape) created CKB-VM, a virtual machine based on the RISC-V hardware instruction set. In this talk, we explain why we chose RISC-V to build a virtual machine, and show the unprecedented flexibility that RISC-V brings to our blockchain deployment and innovation.

To date, CKB-VM is the only virtual machine on the market that can deploy cryptographic algorithms directly in smart contracts. No other blockchain virtual machine offers comparable capabilities.

At the same time, we believe that CKB-VM is applicable beyond the blockchain field. As the chip landscape gradually fragments, CKB-VM can give cloud application developers a stable instruction set. With low-level optimization, the same code can run on more architectures, realizing the true vision of write once, run anywhere.

Outline of this talk:

1. Discuss the demand for smart contracts in the blockchain field

2. Introduce the core design of CKB-VM and how introducing the RISC-V instruction set solves the practical problems CKB-VM encountered

3. Explore the equally broad applications of CKB-VM in the cloud-native field outside the blockchain

4. Review the challenges encountered in implementing CKB-VM and how we dealt with them, and present future work plans

The dilemma of existing blockchain virtual machines and how to use RISC-V to solve them

Simulating another computer on a general-purpose computer platform has a long history. We usually call this translation software between the source ISA and the target ISA a virtual machine.

Since the launch of Ethereum, the addition of smart contracts to blockchain systems has marked an important stage in the transformation of the blockchain from a single public ledger into a platform for financial service applications.

To support business smart contracts, we expect the underlying virtual machine to be secure enough. This means not only that the virtual machine itself is secure, but also that its bytecode is easy to audit and easy to analyze statically. Beyond security, performance is also an important goal, which requires that the translation from the source ISA to the target ISA be done with as little code as possible.

In addition to security and performance, we also hope that the virtual machine has a complete ecosystem, with a variety of high-level languages, a variety of IDEs, and a large number of developer tools to help application developers write robust code.

Let us look at the two virtual machines that are most popular in the world today: EVM and WebAssembly.

EVM has many problems. For example, its dynamic jump mechanism prevents EVM code from being statically analyzed, which leads to frequent security vulnerabilities on the EVM, and its 256-bit integers lead to extremely poor virtual machine performance. That is not the worst of it. The worst problem is that, due to the nature of the blockchain, we cannot make any fundamental upgrade to the EVM.

With the EVM weighed down by these problems, some people turned to WebAssembly, but it is hard to say that WebAssembly is a good choice.

WebAssembly, as the name suggests, was invented for the Web. It was designed to run in a browser, but the reality today is that the performance demands on a blockchain virtual machine exceed those of a browser client.

At the same time, the bytecode of WebAssembly is an AST (Abstract Syntax Tree) bytecode rather than a sequence of instructions in the traditional sense. The former is a tree structure, and the latter is a one-dimensional instruction stream. Because of this difference, loading WebAssembly spends a lot of time parsing the binary data into the AST. Since most blockchain applications are not computationally expensive and, unlike a browser, do not run for a long time, the loading speed of the code is as important as its execution speed.

If we adopt the RISC-V instruction set in a blockchain virtual machine, we not only solve the problems above effectively but also gain many additional benefits:

RISC-V is a simple hardware instruction set. It has been well designed and extensively tested, so it does not carry as many design errors as the EVM;
RISC-V works at a lower level (compared to EVM and WebAssembly). One of the current trends in the world is "simple hardware, complex software". We can see this in many fields such as routers and smart homes, and blockchain should be no different;
RISC-V programs are packaged as ELF files, which load faster;
RISC-V programs can easily be compiled by JIT and AOT compilers, so their performance ceiling is higher;
RISC-V has a complete toolchain that makes it easy to analyze and debug programs, which matters greatly to the wider developer community;
Finally, being the first to use RISC-V in a blockchain virtual machine will greatly promote the deployment of blockchain nodes on RISC-V hardware in the future.

The core design logic of CKB-VM

In a resource-constrained environment, managing and using resources effectively is the key to building a high-performance platform. All users on the blockchain share limited resources within a limited period of time, which makes it all the more important for blockchain applications to run efficiently within the resources allocated to them.

There are two key points here: one is running efficiently, and the other is limiting resource usage. The core design logic of CKB-VM follows from these:

Efficient operation

One key reason CKB-VM adopted the RISC-V instruction set is that each instruction in the RISC-V IMC instruction set is semantically equivalent to a combination of a few x64 instructions, which means that a RISC-V virtual machine can be built on the x64 platform with very little additional overhead.

For example, the AND instruction and the BGE instruction in RISC-V can be implemented in an extremely streamlined manner on the x64 platform. The ASM interpreter of CKB-VM implements the AND instruction with the help of a few macros, which keeps the whole process extremely clear.
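To give a feel for how little these two instructions do, here is a minimal Python sketch of their semantics on RV64. It is not CKB-VM's assembly code. The function names and the register-list representation are assumptions made for illustration only.

```python
MASK64 = (1 << 64) - 1  # RV64 registers are 64 bits wide


def to_signed(value: int) -> int:
    """Interpret a 64-bit register value as a signed integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def exec_and(regs: list, rd: int, rs1: int, rs2: int) -> None:
    """AND rd, rs1, rs2: bitwise AND of two registers."""
    if rd != 0:  # register x0 is hard-wired to zero and ignores writes
        regs[rd] = (regs[rs1] & regs[rs2]) & MASK64


def exec_bge(regs: list, pc: int, rs1: int, rs2: int, offset: int) -> int:
    """BGE rs1, rs2, offset: signed compare, returns the next pc."""
    if to_signed(regs[rs1]) >= to_signed(regs[rs2]):
        return (pc + offset) & MASK64  # branch taken
    return (pc + 4) & MASK64           # fall through to the next instruction
```

Each function is a single bitwise operation or a single comparison followed by a jump, which is why a mapping onto a handful of native instructions is plausible.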

Resource limit

CKB-VM allows developers to supply a custom function that receives a RISC-V instruction and returns the cycle cost of that instruction. The built-in checker of CKB-VM counts the cycles consumed by the executed RISC-V instructions and stops the program once a preset threshold is reached.

This ensures that every application running in CKB-VM stops within a bounded time, and it avoids the worst case: a malicious application halting the blockchain. CKB-VM uses a byte array internally to simulate RISC-V memory, and we limit the total amount of memory an application may use while it is running.

In many cases we face trade-offs. Larger memory means greater flexibility, but every coin has two sides: a large memory also makes initializing the virtual machine take longer. CKB-VM strikes a delicate balance between flexibility and initialization speed, and that balance is a memory limit of four megabytes. We have run many tests showing that most cryptographic algorithms and sufficiently complex business logic can be implemented within the limit of four megabytes of memory.
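The cycle accounting described in this section can be sketched roughly as follows. This is a simplified Python illustration, not CKB-VM's actual API: the opcode strings, the cost table, and the names `default_cost`, `run`, and `CyclesExceeded` are all hypothetical.

```python
class CyclesExceeded(Exception):
    """Raised when a program uses more cycles than its budget allows."""


def default_cost(opcode: str) -> int:
    """Hypothetical cost table: cycles charged for one instruction."""
    return {"mul": 5, "div": 32}.get(opcode, 1)


def run(program, max_cycles, cost=default_cost):
    """Walk through the opcodes, charging cycles before each one runs."""
    cycles = 0
    for opcode in program:
        cycles += cost(opcode)
        if cycles > max_cycles:
            raise CyclesExceeded(f"budget of {max_cycles} cycles exhausted")
        # ... the instruction itself would execute here ...
    return cycles
```

Because the cost function is a parameter, a host could charge more for expensive instructions without touching the execution loop. A loop that never terminates on its own is still stopped once the budget runs out.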

Delayed memory initialization

Some garbled strings have long circulated in the Chinese coding community, such as "烫烫烫" (tang tang tang) and "屯屯屯" (tun tun tun). They often appear in the debug mode of VS. When you see these symbols, it means that your program has accessed uninitialized memory. This stems from a low-level design decision: the memory that an x64 program requests from the operating system is uninitialized.

But for RISC-V the situation is a bit different, because the memory is required to be initialized to zero. There is a huge performance gap between malloc and calloc, so using calloc to request four megabytes of memory is very wasteful. CKB-VM's decision is to delay the initialization of memory: a memory page is zero-initialized only when it is first used, and unused memory pages are left uninitialized. This effectively improves the efficiency of programs that use very little memory.
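A minimal Python sketch of the idea follows. It assumes a 4 KiB page size, which the text does not state, and the class and method names are hypothetical. Python cannot hand out truly uninitialized memory, so the sketch models an untouched page as "not yet allocated" and zero-fills it on first access.

```python
PAGE_SIZE = 4096                       # assumed page size
MEMORY_SIZE = 4 * 1024 * 1024          # the four-megabyte limit from the text
PAGE_COUNT = MEMORY_SIZE // PAGE_SIZE  # 1024 pages under these assumptions


class LazyMemory:
    def __init__(self):
        # No page is allocated or zeroed up front.
        self.pages = [None] * PAGE_COUNT

    def _page(self, addr):
        if not 0 <= addr < MEMORY_SIZE:
            raise IndexError("address out of bounds")
        index = addr // PAGE_SIZE
        if self.pages[index] is None:
            # First touch: the page is created zero-filled.
            self.pages[index] = bytearray(PAGE_SIZE)
        return self.pages[index]

    def load8(self, addr):
        return self._page(addr)[addr % PAGE_SIZE]

    def store8(self, addr, value):
        self._page(addr)[addr % PAGE_SIZE] = value & 0xFF

    def touched_pages(self):
        """Number of pages that have been initialized so far."""
        return sum(page is not None for page in self.pages)
```

A program that touches only a few pages pays the zeroing cost for those pages alone, not for the full four megabytes.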

W^X memory protection strategy

CKB-VM is also designed to run untrusted application code in the blockchain. These codes may come from careless developers or deliberate attackers. A common but effective attack method is that the attacker constructs specific input data so that the program writes CPU instructions into the memory space used to store the data, and then executes these instructions.

CKB-VM has built-in W^X (write xor execute) protection, a memory protection strategy under which each page in an application's address space can be writable or executable, but not both. This mechanism allows more flexibility in writing applications, without having to worry too much that a programming error will cause the attacker's code to be executed unexpectedly.
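The W^X rule can be illustrated with a small Python sketch. The class, the flag values, and the 4 KiB page size are assumptions for illustration and do not reflect CKB-VM's internal data structures.

```python
PAGE_SIZE = 4096  # assumed page size
FLAG_WRITABLE = "W"
FLAG_EXECUTABLE = "X"


class MemoryFault(Exception):
    """Raised when an access violates the page's permission."""


class WxorXMemory:
    def __init__(self, page_count):
        self.data = bytearray(page_count * PAGE_SIZE)
        # Each page holds exactly one flag, so it can never be both W and X.
        self.flags = [FLAG_WRITABLE] * page_count

    def mark_executable(self, page):
        """Freeze a page: from now on it can be executed but not written."""
        self.flags[page] = FLAG_EXECUTABLE

    def store8(self, addr, value):
        if self.flags[addr // PAGE_SIZE] != FLAG_WRITABLE:
            raise MemoryFault("write to an executable page")
        self.data[addr] = value & 0xFF

    def fetch8(self, addr):
        """Read a byte as part of an instruction fetch."""
        if self.flags[addr // PAGE_SIZE] != FLAG_EXECUTABLE:
            raise MemoryFault("instruction fetch from a writable page")
        return self.data[addr]
```

Under this rule, bytes that an attacker manages to write into a data page cannot be fetched as instructions, and code pages cannot be overwritten after loading.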

CKB-VM optimizes much of its underlying logic, making it a computing platform that balances security and performance.

Wide application of CKB-VM in cloud-native fields beyond blockchain

RISC-V + CKB-VM + Cloud Native

RISC-V is a brand new technology that is developing rapidly in the hardware field, but I think its potential goes far beyond that: it may play an even more important role in the cloud-native field in the future.

At present, the cloud field is basically the x86 and AMD market. I think RISC-V can enter it in a clever way. Rather than competing directly, it can use CKB-VM to run RISC-V programs on the x86 platform, and move to real RISC-V hardware once the market is large enough. Cloud vendors, meanwhile, do not need to bear the risk of switching architectures, so the resistance will be much smaller than for adopting RISC-V hardware directly.

Let's take a look at the advantages of RISC-V + CKB-VM + Cloud Native compared to traditional Cloud Native.

At present, a common Cloud Native practice is to first start a Docker container to isolate the environment and resources, and then either run a binary program directly inside the container or execute the user's script code indirectly through NodeJS, Python, or a JVM.

I can't help asking: why make things so complicated?

Especially as the RISC-V backends of high-level languages such as Rust and Golang gradually mature, we can treat RISC-V as a kind of "cross-platform bytecode". Through CKB-VM, it can truly realize write once, run anywhere.

More fine-grained permission control than Docker

Compared with the basic resource isolation provided by Docker, RISC-V + CKB-VM can provide the more fine-grained permission control that many cloud computing platforms need. A RISC-V program communicates with the operating system through system calls, and CKB-VM can proxy every system call issued by the application. The cloud computing platform can then decide whether to honor each system call according to the user's permissions, the current resource usage, and so on. For example:

Control the maximum number of file handles used by the application
Control the number of TCP connections established by the application
Control the IO usage of the application

Such fine-grained control is difficult to implement with Docker, or at least not very intuitive. With RISC-V + CKB-VM, because every resource request and release goes through a system call, the cloud computing platform can control application resources and permissions at almost unlimited granularity.
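The kind of per-application quota listed above could be sketched as follows. This is a hypothetical Python model: the `SyscallProxy` class, the call names "open" and "connect", and the limits are illustrative assumptions and do not describe an existing CKB-VM interface.

```python
class QuotaExceeded(Exception):
    """Raised when a guest asks for more of a resource than it is allowed."""


class SyscallProxy:
    """Hypothetical layer between a guest program and the host OS."""

    def __init__(self, max_open_files, max_tcp_connections):
        self.limits = {"open": max_open_files, "connect": max_tcp_connections}
        self.in_use = {"open": 0, "connect": 0}

    def acquire(self, syscall):
        """Called when the guest issues a resource-acquiring system call."""
        if self.in_use[syscall] >= self.limits[syscall]:
            raise QuotaExceeded(
                f"{syscall}: limit of {self.limits[syscall]} reached"
            )
        self.in_use[syscall] += 1
        # ... the call would be forwarded to the host here ...

    def release(self, syscall):
        """Called when the guest closes the file or connection."""
        if self.in_use[syscall] > 0:
            self.in_use[syscall] -= 1
```

Since every request and release passes through one place, adding another policy, such as an IO budget, amounts to adding another counter and check in the same layer.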

Proxy resource request

When we use the open function in our code, what are we doing?

Did we really "open" a real file on the hard drive?

No. We merely make a request to the operating system. Sometimes the operating system reads the file from the hard disk, and sometimes it returns the file to us from the cache, depending on what the operating system decides. In this sense, it is the operating system that manages hard disk resources.

Cloud native programming should offer an experience similar to native programming, but the industry is far from doing enough in this regard. Many cloud vendors offer cloud storage, but in most cases you still need to install the vendor's SDK first and then access the storage through the relevant API. This creates a sense of separation between cloud applications and native applications.

As mentioned above, CKB-VM proxies all the system calls issued by the application. When the RISC-V + CKB-VM solution is adopted, the cloud code can be exactly the same as the native code:

For example, when the developer writes open("/foo/bar"), the program opens the /foo/bar file in the local file system if the code is running on the local machine. If the code is running on the cloud computing platform, it opens the /foo/bar file in the relevant cloud storage bucket under your current account.

Most importantly, this is automatic: developers do not need to make any changes to the code, or even recompile!
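As a rough sketch of how such routing might look on the host side, consider the following Python function. Everything in it is hypothetical: the `resolve_open` name, the environment labels, and the `bucket://` scheme are made up for illustration and are not part of CKB-VM.

```python
def resolve_open(path, environment, account=None):
    """Decide where a guest's open(path) request should be served from.

    The guest program always passes the same path. Only the host-side
    decision differs between local and cloud execution.
    """
    if environment == "local":
        return ("local_fs", path)
    if environment == "cloud":
        if account is None:
            raise ValueError("cloud execution needs an account")
        # Made-up addressing scheme for the account's storage bucket.
        return ("cloud_storage", f"bucket://{account}{path}")
    raise ValueError(f"unknown environment: {environment}")
```

The guest binary is identical in both cases, which is why no code change or recompilation would be needed.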

Millisecond cold start, extremely low resource consumption, extremely fast running speed

These features make CKB-VM very well suited to running Lambda functions, and make it better than traditional Docker-based Lambda functions. They may also save cloud vendors a lot of machine costs.

Multi-language support

Right now, common languages such as C/C++, Golang, and Rust can all generate relatively high-quality RISC-V code that runs directly in CKB-VM. Furthermore, languages such as JavaScript, Lua, and Ruby can also be supported in CKB-VM by compiling their interpreters to RISC-V.

The past and future of CKB-VM

Secret Ape Technology (Cryptape) encountered many challenging problems during the development of CKB-VM, and accumulated a lot of development experience in the process of solving them.

How to efficiently simulate the instruction set?

CKB-VM originally implemented its interpreter in Rust, but the quality of the code compiled from the Rust interpreter was mediocre, and its performance was far inferior to hand-written assembly code. CKB-VM has since made two other attempts. The first is an AOT compiler, which compiles the RISC-V program into an x64 program before execution. The second is an interpreter written by hand in assembly (ASM). Compared with the AOT compiler, this allows us to achieve finer control at the register allocation level.

For example, the context information of the execution environment, as well as the source operand, destination operand, and immediate data of an instruction, are pinned to fixed registers during the entire execution phase. All of this works very well, but I ran into a small problem when trying to add the B extension to CKB-VM. For example, the bfp instruction in the B extension is too complicated to implement in hand-written ASM code, both in terms of its logic and in terms of register allocation.

For this reason, in the ASM interpreter loop, after an instruction is decoded, its execution path is chosen according to the type of instruction. In CKB-VM these are called the fast path and the slow path. On the fast path, the instruction is executed inside the assembly code; on the slow path, execution of the instruction is handed over to the Rust interpreter.

These complex instructions occur very rarely in programs, and application code can deliberately avoid them. The fast path + slow path design therefore has almost no effect on the main performance indicators, while avoiding heavy computation inside the interpreter loop implemented in assembly.
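The dispatch between the two paths can be sketched in Python as below. The instruction names, the choice of a 64-bit rotate as the "complex" instruction, and the `stats` counter are assumptions for illustration. In CKB-VM the fast path is assembly and the slow path is the Rust interpreter, neither of which is reproduced here.

```python
MASK64 = (1 << 64) - 1

# Simple, frequent instructions handled directly in the hot loop.
FAST_OPS = {
    "add": lambda a, b: (a + b) & MASK64,
    "and": lambda a, b: a & b,
    "xor": lambda a, b: a ^ b,
}


def slow_path(op, a, b):
    """Fallback for rare, complex instructions (here: rotate left)."""
    if op == "rol":
        shift = b & 63
        return ((a << shift) | (a >> (64 - shift))) & MASK64
    raise ValueError(f"unsupported instruction: {op}")


def execute(op, a, b, stats):
    """Run one decoded instruction, recording which path handled it."""
    handler = FAST_OPS.get(op)
    if handler is not None:
        stats["fast"] += 1
        return handler(a, b)
    stats["slow"] += 1
    return slow_path(op, a, b)
```

The hot loop stays small because anything it does not recognize is delegated. If rare instructions really are rare, the extra cost of delegation barely shows up in overall performance.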

How to test the virtual machine?

First of all, make sure that the virtual machine implementation passes the official test suite, which provides a minimum guarantee of correctness.

Secondly, fuzz testing gives better code coverage: a program generates a large amount of random code, meaningful or not, which is executed both by CKB-VM and by other existing RISC-V simulators such as riscvOVPsim and Spike, and the final results are compared.
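The comparison approach can be illustrated with a toy differential fuzzer in Python. The two tiny "virtual machines" below are stand-ins invented for this sketch: one plays the role of a trusted simulator and the other the implementation under test. Real fuzzing would generate RISC-V binaries and run actual simulators.

```python
import random

MASK64 = (1 << 64) - 1
OPS = ("add", "sub", "and", "or", "xor")


def reference_vm(program, x):
    """Straightforward implementation, standing in for a trusted simulator."""
    for op, imm in program:
        if op == "add":
            x = (x + imm) & MASK64
        elif op == "sub":
            x = (x - imm) & MASK64
        elif op == "and":
            x &= imm
        elif op == "or":
            x |= imm
        elif op == "xor":
            x ^= imm
    return x


# Independent, table-driven implementation, standing in for the VM under test.
TABLE = {
    "add": lambda x, i: (x + i) & MASK64,
    "sub": lambda x, i: (x + ((~i + 1) & MASK64)) & MASK64,  # two's complement
    "and": lambda x, i: x & i,
    "or": lambda x, i: x | i,
    "xor": lambda x, i: x ^ i,
}


def vm_under_test(program, x):
    for op, imm in program:
        x = TABLE[op](x, imm)
    return x


def fuzz(rounds, seed=0):
    """Run random programs on both VMs and collect any disagreements."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(rounds):
        length = rng.randint(1, 20)
        program = [(rng.choice(OPS), rng.getrandbits(64)) for _ in range(length)]
        start = rng.getrandbits(64)
        if reference_vm(program, start) != vm_under_test(program, start):
            mismatches.append((program, start))
    return mismatches
```

Any entry returned by `fuzz` is a reproducible test case (program plus starting value) on which the two implementations disagree, and that is the artifact one wants from this kind of testing.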

CKB-VM's future development plan

We plan to add support for the V extension (vector instructions), which will allow us to further optimize the cryptographic algorithms used in the blockchain with a new approach.

Vectorizing algorithms is an interesting challenge. Most algorithms are parallel to some degree. In other words, we can almost always find an algorithm that solves the problem and can be expressed, in whole or in part, in vectorized form. For CKB-VM, vector instructions can be implemented underneath with SIMD or multithreading.
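To make the idea of vectorization concrete, here is a small Python sketch that expresses the same add-xor-rotate mixing step, a pattern common in hash functions and ciphers, first lane by lane and then as whole-vector operations. The function names are hypothetical and the lists merely model vector registers. This is not the RISC-V V extension itself.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit value left by n bits (1 <= n <= 31)."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def scalar_mix(a, b, n):
    """Scalar version: one lane at a time, three operations per lane."""
    out = []
    for x, y in zip(a, b):
        t = (x + y) & MASK32
        t ^= y
        out.append(rotl32(t, n))
    return out


# Each helper below models ONE vector instruction acting on all lanes.
def vadd32(a, b):
    return [(x + y) & MASK32 for x, y in zip(a, b)]


def vxor32(a, b):
    return [x ^ y for x, y in zip(a, b)]


def vrotl32(a, n):
    return [rotl32(x, n) for x in a]


def vector_mix(a, b, n):
    """Vector version: three whole-vector operations, whatever the lane count."""
    return vrotl32(vxor32(vadd32(a, b), b), n)
```

Both versions compute the same result. The difference is that the vector form issues three operations in total, and the layer underneath is free to execute each one with SIMD instructions or across threads.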