
Introduction

As a smart contract language, Solidity both differs from and resembles other classic languages.

On the one hand, the attributes that serve the blockchain make it different from other languages. For example, the deployment and invocation of contracts must be confirmed by the blockchain network; execution costs need to be strictly controlled to prevent malicious code from consuming node resources.

On the other hand, as a programming language, Solidity's implementation does not stray far from classic languages. For example, Solidity has stack-like and heap-like storage areas, and its bytecode is processed by a stack-based virtual machine.

The first few articles in this series introduced how to develop a Solidity program. So that readers understand not only how to write one but also why it behaves as it does, this article introduces the internal operating principles of Solidity, focusing on the life cycle of a Solidity program and the working mechanism of the EVM.

The life cycle of Solidity

As in other languages, the life cycle of Solidity code consists of four stages: compilation, deployment, execution, and destruction. The following figure shows the complete life cycle of a Solidity program:

Compiling a Solidity file produces bytecode, which is similar to JVM bytecode. During deployment, the bytecode and the constructor arguments are wrapped in a transaction, which is packaged into a block. After the network reaches consensus, the contract is created on every blockchain node and the contract address is returned to the user.

When the user calls a function on the contract, the call request also goes through the transaction, block, and consensus steps, and is finally executed by the EVM on each node.

Below is a sample program; we will explore its life cycle using Remix.

pragma solidity ^0.4.25;

contract Demo{
    uint private _state;
    constructor(uint state){
        _state = state;
    }
    function set(uint state) public {
        _state = state;
    }
}

Compile

After the source code is compiled, you can get its binary through the ByteCode button:

608060405234801561001057600080fd5b506040516020806100ed83398101806040528101908080519060200190929190505050806000819055505060a4806100496000396000f300608060405260043610603f576000357c0100000000000000000000000000000000000000000000000000000000900463ffffffff16806360fe47b1146044575b600080fd5b348015604f57600080fd5b50606c60048036038101908080359060200190929190505050606e565b005b80600081905550505600a165627a7a723058204ed906444cc4c9aabd183c52b2d486dfc5dea9801260c337185dad20e11f811b0029

You can also get the corresponding opcodes (OpCode):

PUSH1 0x80 PUSH1 0x40 MSTORE CALLVALUE DUP1 ISZERO PUSH2 0x10 JUMPI PUSH1 0x0 DUP1 REVERT JUMPDEST POP PUSH1 0x40 MLOAD PUSH1 0x20 DUP1 PUSH2 0xED DUP4 CODECOPY DUP2 ADD DUP1 PUSH1 0x40 MSTORE DUP2 ADD SWAP1 DUP1 DUP1 MLOAD SWAP1 PUSH1 0x20 ADD SWAP1 SWAP3 SWAP2 SWAP1 POP POP POP DUP1 PUSH1 0x0 DUP2 SWAP1 SSTORE POP POP PUSH1 0xA4 DUP1 PUSH2 0x49 PUSH1 0x0 CODECOPY PUSH1 0x0 RETURN STOP PUSH1 0x80 PUSH1 0x40 MSTORE PUSH1 0x4 CALLDATASIZE LT PUSH1 0x3F JUMPI PUSH1 0x0 CALLDATALOAD PUSH29 0x100000000000000000000000000000000000000000000000000000000 SWAP1 DIV PUSH4 0xFFFFFFFF AND DUP1 PUSH4 0x60FE47B1 EQ PUSH1 0x44 JUMPI JUMPDEST PUSH1 0x0 DUP1 REVERT JUMPDEST CALLVALUE DUP1 ISZERO PUSH1 0x4F JUMPI PUSH1 0x0 DUP1 REVERT JUMPDEST POP PUSH1 0x6C PUSH1 0x4 DUP1 CALLDATASIZE SUB DUP2 ADD SWAP1 DUP1 DUP1 CALLDATALOAD SWAP1 PUSH1 0x20 ADD SWAP1 SWAP3 SWAP2 SWAP1 POP POP POP PUSH1 0x6E JUMP JUMPDEST STOP JUMPDEST DUP1 PUSH1 0x0 DUP2 SWAP1 SSTORE POP POP JUMP STOP LOG1 PUSH6 0x627A7A723058 KECCAK256 0x4e 0xd9 MOD DIFFICULTY 0x4c 0xc4 0xc9 0xaa 0xbd XOR EXTCODECOPY MSTORE 0xb2 0xd4 DUP7 0xdf 0xc5 0xde 0xa9 DUP1 SLT PUSH1 0xC3 CALLDATACOPY XOR 0x5d 0xad KECCAK256 0xe1 0x1f DUP2 SHL STOP 0x29 

The following instruction sequence corresponds to the body of the set function; how it runs is explained later.

JUMPDEST DUP1 PUSH1 0x0 DUP2 SWAP1 SSTORE POP POP JUMP STOP

Deploy

After compiling, you can deploy the code in Remix, passing in the constructor argument 0x123:
image.png

After the deployment is successful, you can get a transaction receipt:
image.png
Click input to see the specific transaction input data:

image.png
In the data above, the part marked in yellow is exactly the contract binary shown earlier, and the part marked in purple corresponds to the constructor argument 0x123.
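
The argument part of the input data can be reproduced with a small sketch. The contract name ConstructorArgs and the function encoded are hypothetical and exist only for this illustration; the idea is that the constructor argument is appended to the contract binary as a 32-byte ABI-encoded word.

```solidity
pragma solidity ^0.4.25;

// Hypothetical helper, not part of the Demo contract.
contract ConstructorArgs {
    // Returns the 32-byte ABI encoding of the constructor argument 0x123,
    // i.e. the value left-padded with zeros to one data word.
    function encoded() public pure returns (bytes) {
        return abi.encode(uint256(0x123));
    }
}
```

Comparing the returned bytes with the tail of the transaction input is one way to confirm which part of the data carries the argument.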

All of this shows that contract deployment uses a transaction as its carrier. Combined with what we know about blockchain transactions, we can reconstruct the entire deployment process:

  • The client builds a transaction whose input data is the deployment request (contract binary plus constructor arguments)
  • The transaction is RLP-encoded and then signed with the sender's private key
  • The signed transaction is pushed to a node on the blockchain
  • After the node verifies the transaction, it places it in the transaction pool
  • When it is a node's turn to produce a block, it packages the transaction into a block and broadcasts the block to the other nodes
  • The other nodes verify the block and reach consensus. Different blockchains may use different consensus algorithms. FISCO BCOS uses PBFT, which requires a three-phase commit (pre-prepare, prepare, commit)
  • Each node executes the transaction. As a result, the smart contract Demo is created, and the storage for the state field _state is allocated and initialized to 0x123

Execute

Depending on whether a function carries the view (or pure) modifier, function invocations fall into two categories: calls and transactions. Because the compiler has already determined that a call cannot change contract state, a node can answer such an invocation directly, without confirmation from other blockchain nodes. A transaction may change state, so it must be confirmed across the network.
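
To make the distinction concrete, here is a minimal sketch that extends the sample contract with a read-only getter. The contract name DemoWithGetter and the function get are illustrative additions and do not appear in the original sample.

```solidity
pragma solidity ^0.4.25;

// Illustrative variant of the Demo contract with one extra function.
contract DemoWithGetter {
    uint private _state;

    constructor(uint state) public {
        _state = state;
    }

    // No view/pure modifier: this may change state,
    // so invoking it goes through a transaction and network consensus.
    function set(uint state) public {
        _state = state;
    }

    // view: this only reads state,
    // so a single node can answer it locally as a call.
    function get() public view returns (uint) {
        return _state;
    }
}
```

With this variant, reading the value back after a set does not require a new transaction.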

The following assumes that the user calls set(0xa) and walks through the specific process.

First of all, the function set has no view or pure modifier, which means it may change the state of the contract. The call information is therefore placed in a transaction, which goes through encoding, signing, pushing, caching in the transaction pool, block production, and network consensus, and is finally handed to the EVM of each node for execution.
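
The call information carried by such a transaction can be sketched as follows. The contract CallDataSketch and its two functions are hypothetical helpers written for this illustration. The first function derives the 4-byte function selector, which should equal 0x60fe47b1, the constant that follows PUSH4 in the opcode listing of the Compile section; the second builds the full input data for a call with the argument 0xa.

```solidity
pragma solidity ^0.4.25;

// Hypothetical helper contract, used only to inspect the call data for set.
contract CallDataSketch {
    // First 4 bytes of keccak256("set(uint256)"); uint is an alias of uint256.
    function selector() public pure returns (bytes4) {
        return bytes4(keccak256("set(uint256)"));
    }

    // Selector followed by the argument padded to 32 bytes (36 bytes in total).
    function callData() public pure returns (bytes) {
        return abi.encodeWithSignature("set(uint256)", uint256(0xa));
    }
}
```

The dispatcher at the start of the deployed code compares the first four bytes of the input against this selector before jumping to the body of set.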

In the EVM, the parameter 0xa is stored in the contract field _state by the SSTORE instruction. The instruction first takes the storage slot of _state and the new value 0xa from the stack, and then performs the actual store.

The following figure shows the running process:

image.png

This is only a rough description of how set(0xa) works. The next section introduces the working mechanism and data storage mechanism of the EVM in more detail.

Destroy

Since a contract cannot be tampered with once it is on the chain, its life can continue until the underlying blockchain is shut down completely. To destroy a contract manually, you can use the selfdestruct instruction. Destroying a contract also requires a confirmed transaction, so we won't go into details here.
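
As a minimal sketch of manual destruction, the contract below exposes a destroy function guarded by an owner check. The contract name Destructible, the _owner field, and the access rule are assumptions made for this example; the article itself does not prescribe who may destroy a contract.

```solidity
pragma solidity ^0.4.25;

// Hypothetical example contract; names and the owner rule are assumptions.
contract Destructible {
    address private _owner;

    constructor() public {
        // Remember who deployed the contract.
        _owner = msg.sender;
    }

    // Has no view/pure modifier, so it is invoked through a transaction.
    function destroy() public {
        require(msg.sender == _owner, "only owner");
        // Removes the contract and sends any remaining balance to _owner.
        selfdestruct(_owner);
    }
}
```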

EVM principle

The previous section introduced the life cycle of a Solidity program. After a transaction is confirmed, the bytecode is finally executed by the EVM. So far the EVM has only been introduced briefly; this section describes its working mechanism in detail.

Principle of operation

The EVM is a stack-based virtual machine, and its core feature is that all operands are stored on the stack. Below we look at how it operates through a few simple Solidity statements:

uint a = 1;
uint b = 2;
uint c = a + b;

After this code is compiled, the resulting opcodes are as follows:

PUSH1 0x1
PUSH1 0x2
ADD

To make the concept easier to follow, the listing is simplified to the three instructions above. The actual opcodes may be more complicated and interleaved with instructions such as SWAP and DUP.

The code above uses two instructions, PUSH1 and ADD. Their meanings are as follows:

  • PUSH1: Push a one-byte value onto the top of the stack.
  • ADD: Pop the top two elements of the stack, add them, and push the result back onto the stack.

The execution process is explained here step by step. In the figure below, sp is the stack pointer (the top of the stack), and pc is the program counter. After PUSH1 0x1 is executed, both pc and sp move down:
image.png

Similarly, after PUSH1 0x2 is executed, the state of pc and sp is as follows:
image.png

Finally, when ADD is executed, the two operands at the top of the stack are popped as the input of the instruction, and their sum is pushed onto the stack:
image.png
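
The same addition can be written directly against the EVM with inline assembly. This is a small sketch; the contract name StackAdd and the function sum are made up for the example. The functional notation add(1, 2) stands for pushing both operands and then executing ADD, although the compiler may order or optimize the pushes differently from the simplified listing above.

```solidity
pragma solidity ^0.4.25;

// Hypothetical contract; the name StackAdd is chosen for illustration.
contract StackAdd {
    function sum() public pure returns (uint c) {
        assembly {
            // Push both operands, then ADD pops them and leaves
            // the result on the stack, where it is bound to c.
            c := add(1, 2)
        }
    }
}
```

Calling sum() should return 3.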

Storage exploration

During development we often run into the confusing memory modifier, and when reading open source code we also see assembly operations that work directly on memory. Developers who do not understand the storage mechanism will be puzzled by these cases, so this section explores the storage principles of the EVM.

In the previous article "Basic Features of Solidity for Smart Contract Writing", we introduced that a piece of Solidity code usually involves local variables and contract state variables.

The storage methods of these variables are different. The following code shows the relationship between variables and storage methods.


contract Demo{
    // state storage
    uint private _state;

    function set(uint state) public {
        // stack storage
        uint i = 0;
        // memory storage
        string memory str = "aaa";
    }
}

Stack

The stack is used to store the operands of bytecode instructions. In Solidity, if the local variable is of integer type, fixed-length byte array, etc., it will be pushed into and out of the stack as the instruction runs.

For example, in the following simple statement, the variable value 1 will be read out and pushed onto the top of the stack through the PUSH operation:

uint i = 1;

For such variables, you cannot forcefully change their storage method. If you put the memory modifier before them, the compiler will report an error.

Memory

Memory is similar to the heap in Java: it is used to store "objects". In Solidity, a local variable that is a variable-length byte array, a string, a struct, and so on is usually marked with the memory modifier to indicate that it is stored in memory.

In this section, we will use strings as an example to analyze how memory stores these objects.

1. Object storage structure

The following will use the assembly statement to analyze the storage of complex objects.

The assembly statement lets Solidity code invoke EVM instructions directly. Here we use the mload instruction: mload(p) reads 32 bytes of data starting at address p. Developers can treat object variables as pointers and pass them directly to mload.

In the following code, after the mload call, the data variable holds the first 32 bytes of the string str in the memory.

string memory str = "aaa";
bytes32 data;
assembly{
    data := mload(str)
}  

With mload in hand, we can analyze how string variables are stored. The following code reveals how the string data is laid out:

function strStorage() public view returns(bytes32, bytes32){
    string memory str = "你好";
    bytes32 data;
    bytes32 data2;
    assembly{
        data := mload(str)
        data2 := mload(add(str, 0x20))
    }   
    return (data, data2);
}

The data variable holds bytes 0 to 31 of str, and data2 holds bytes 32 to 63 of str. The results of running the strStorage function are as follows:

0: bytes32: 0x0000000000000000000000000000000000000000000000000000000000000006
1: bytes32: 0xe4bda0e5a5bd0000000000000000000000000000000000000000000000000000

As you can see, the value of the first data word is 6, which is exactly the number of bytes in the UTF-8 encoding of the string "你好". The second data word is the UTF-8 encoding of "你好" itself.

After mastering the storage format of strings, we can use assembly to modify, copy, and splice strings. Readers can search Solidity's string library to learn how to implement string concat.
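
As a first step in that direction, the sketch below reads the length word of any string with mload and compares it with the high-level API. The contract StringLength and both function names are hypothetical and only serve this illustration.

```solidity
pragma solidity ^0.4.25;

// Hypothetical helper contract for inspecting the length word of a string.
contract StringLength {
    // Reads the 32-byte length word stored at the start of the string in memory.
    function byteLength(string memory s) public pure returns (uint len) {
        assembly {
            len := mload(s)
        }
    }

    // The same value obtained through the high-level API, for comparison.
    function byteLengthHighLevel(string memory s) public pure returns (uint) {
        return bytes(s).length;
    }
}
```

For the string "你好" both functions should return 6, matching the first data word shown above.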

2. Memory allocation method

Since memory is used to store objects, memory allocation methods must be involved.

The memory allocation method is very simple, that is, sequential allocation. Below we will allocate two objects and view their addresses:

function memAlloc() public view returns(bytes32, bytes32){
    string memory str = "aaa";
    string memory str2 = "bbb";
    bytes32 p1;
    bytes32 p2;
    assembly{
        p1 := str
        p2 := str2
    }   
    return (p1, p2);
}

After running this function, the return result will contain two data words:

0: bytes32: 0x0000000000000000000000000000000000000000000000000000000000000080
1: bytes32: 0x00000000000000000000000000000000000000000000000000000000000000c0

This shows that the starting address of the first string str is 0x80 and the starting address of the second string str2 is 0xc0. The 64 bytes in between are exactly the space occupied by str itself (a 32-byte length word plus a 32-byte data word). The memory layout at this point is as follows, where each cell represents 32 bytes (one data word; the EVM uses 32-byte data words rather than 4-byte ones):
image.png

  • 0x40~0x60: The free memory pointer, which holds the next available address. In this example it is 0x100, meaning that a new object will be allocated starting at 0x100. You can use mload(0x40) to get the address where the next object will be allocated.
  • 0x80~0xc0: The start of the object allocation area. The string aaa is allocated here.
  • 0xc0~0x100: The string bbb is allocated here.
  • 0x100~...: Because allocation is sequential, new objects will be allocated here.
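
The movement of the free memory pointer can be observed with a short sketch. The contract FreePointer and the function watch are hypothetical. Under the layout described above, one would expect the two returned values to be 0x80 and 0xc0, but the exact numbers depend on the compiler version and on anything else the function allocates, so treat them as an expectation rather than a guarantee.

```solidity
pragma solidity ^0.4.25;

// Hypothetical contract that samples the free memory pointer
// before and after one string allocation.
contract FreePointer {
    function watch() public pure returns (uint beforeAlloc, uint afterAlloc) {
        assembly {
            // Next available address before anything is allocated.
            beforeAlloc := mload(0x40)
        }
        string memory str = "aaa";
        assembly {
            // Next available address after the string was allocated.
            afterAlloc := mload(0x40)
        }
        // Use str so the allocation is clearly part of the function.
        require(bytes(str).length == 3);
    }
}
```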

State storage

As the name implies, state storage is used to store the state fields of the contract.

Conceptually, storage consists of many 32-byte storage slots. In the set function of the Demo contract shown earlier, 0x0 is the storage slot of the state variable _state. All fixed-length variables are placed in this group of storage slots in declaration order.
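
A small sketch can make the slot numbering visible. The contract SlotLayout, its two fields, and the readSlot function are invented for this example. Each uint fills a whole 32-byte slot, so the first field should sit in slot 0 and the second in slot 1.

```solidity
pragma solidity ^0.4.25;

// Hypothetical contract for inspecting raw storage slots.
contract SlotLayout {
    uint private _a = 0x11; // expected in slot 0
    uint private _b = 0x22; // expected in slot 1

    // Reads a raw storage slot by its index with the SLOAD instruction.
    function readSlot(uint index) public view returns (uint value) {
        assembly {
            value := sload(index)
        }
    }
}
```

Calling readSlot(0) and readSlot(1) should return 0x11 and 0x22 respectively.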

For mappings and dynamic arrays, storage is more complicated. The variable itself occupies one slot, and the data it contains occupies other slots according to fixed rules. For example, in a mapping, the storage slot of a data item is determined by the key k and the mapping's own slot p: it is the keccak256 hash of k concatenated with p.
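
The mapping rule can be checked with the following sketch, which recomputes the slot by hand and reads it with sload. The contract MappingSlot, the field _values, and both functions are hypothetical. The sketch assumes a uint key, for which the key and the slot number are each padded to 32 bytes before hashing.

```solidity
pragma solidity ^0.4.25;

// Hypothetical contract for locating a mapping entry in raw storage.
contract MappingSlot {
    // Declared first, so the mapping itself occupies slot 0 (p = 0).
    mapping(uint => uint) private _values;

    function put(uint key, uint value) public {
        _values[key] = value;
    }

    // Recomputes the entry's slot as keccak256(key . p) and reads it directly.
    function readRaw(uint key) public view returns (uint value) {
        bytes32 slot = keccak256(abi.encode(key, uint(0)));
        assembly {
            value := sload(slot)
        }
    }
}
```

After put(1, 42), readRaw(1) should return 42, the same value the mapping itself would return.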

In terms of implementation, different chains may take different approaches. The classic one is the Merkle Patricia Trie (MPT) used by Ethereum. Because of the performance and scalability problems of the MPT, FISCO BCOS abandoned this structure and adopted distributed storage, keeping state data in RocksDB or MySQL, which improves both the performance and the scalability of storage.

Concluding remarks

This article introduced the operating principles of Solidity, which can be summarized as follows.
First, Solidity source code is compiled into bytecode. On deployment, the bytecode is carried by a transaction and confirmed across the network, and the contract is created on each node. A contract function invocation of the transaction type is likewise confirmed by the network and finally executed by the EVM.
The EVM is a stack-based virtual machine that reads the bytecode of the contract and executes it.
During execution, it interacts with the stack, memory, and contract storage. The stack stores ordinary local variables, which serve as instruction operands. Memory stores objects in a length + body format and allocates them sequentially. State storage stores state variables.
Understanding how Solidity works and the principles behind it is a necessary step toward mastering Solidity programming.


FISCO_BCOS

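FISCO BCOS is an open source underlying technology platform for consortium blockchains, built by the open source working group of the Financial Blockchain Shenzhen Consortium (FISCO). Its members include FISCO member organizations such as Beyondsoft, Huawei, Shenzhen Securities Communication, Digital China, Forms Syntron, Tencent, WeBank, Yibi Technology, and Yuexiu FinTech.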