
1. Introduction

Transactions are an indispensable feature of traditional relational databases: MySQL, Oracle, and PostgreSQL all support them. In NoSQL databases, the concept of a transaction is relatively weak, and the implementation is usually less complex than in a relational database.

However, to protect data integrity and consistency, most key-value (kv) stores implement the basic properties of transactions. This includes the two forerunners among kv databases, LevelDB and RocksDB, as well as open source kv stores written in Go, such as Bolt and Badger.

rosedb's transaction support has just reached a preliminary version. The code is still fairly simple, but it matches the design I had in mind, and it may gradually become more complex as it evolves.

Note that before implementing transactions in rosedb, my understanding of them was limited to basic concepts such as ACID, so this implementation was very much a matter of feeling my way forward. There are probably some rough edges. If you spot any problems, please point them out, and I will keep learning and improving.

2. Basic concepts

Any discussion of transactions naturally starts with the ACID properties, so let's review them:

  • Atomicity: All operations in a transaction either complete together or fail together; a transaction never stops halfway. If an error occurs during execution, the database can be rolled back to its state before the transaction started.
  • Consistency: The integrity of the database is preserved before and after the transaction, which means the data always satisfies the expected constraints.
  • Isolation: Describes how much concurrently executing transactions affect one another. There are four common isolation levels, each allowing a different degree of interference:

    • Read uncommitted: Changes made by a transaction that has not yet committed are visible to other transactions (dirty reads are possible).
    • Read committed: Changes made by a transaction become visible to other transactions only after it commits (no dirty reads, but non-repeatable reads are possible).
    • Repeatable read: The data read during a transaction stays consistent with the data as it was when the transaction began (no dirty reads and no non-repeatable reads, but phantom reads are possible).
    • Serializable: Reads and writes are mutually exclusive, so transactions do not run concurrently. A transaction must wait until the previous one has committed before it can execute (no dirty reads, no non-repeatable reads, no phantom reads).
  • Durability: Once a transaction is committed, its changes are permanent and survive even a database crash.

ACID covers quite a few concepts, but none of them is hard to understand. Implementing transactions really means ensuring that these properties hold whenever data is read or written. Of the four, atomicity, isolation, and durability (AID) are the ones the database itself must guarantee.

Consistency can be understood as the ultimate goal of a transaction. The database relies on AID to provide consistency, but the application must do its part as well: if the data we write is logically wrong, then even a perfect transaction implementation cannot guarantee consistency.

3. Implementation

Before explaining the implementation of transactions, let's take a look at the basic usage of transactions in rosedb:

// Open a database instance
db, err := rosedb.Open(rosedb.DefaultConfig())
if err != nil {
   panic(err)
}

// Operate on data inside a transaction
err = db.Txn(func(tx *rosedb.Txn) (err error) {
   err = tx.Set([]byte("k1"), []byte("val-1"))
   if err != nil {
      return
   }
   err = tx.LPush([]byte("my_list"), []byte("val-1"), []byte("val-2"))
   if err != nil {
      return
   }
   return
})

if err != nil {
   panic(fmt.Sprintf("commit tx err: %+v", err))
}

This first opens a database instance and then calls the Txn method. Txn takes a function as its argument; all operations of the transaction are performed inside that function and take effect when the transaction commits.

Used this way, the transaction is committed automatically. You can also start and commit a transaction manually, rolling it back yourself when an error occurs:

// Open a database instance
db, err := rosedb.Open(rosedb.DefaultConfig())
if err != nil {
   panic(err)
}

// Start a transaction
tx := db.NewTransaction()
err = tx.Set([]byte("k1"), []byte("val-1"))
if err != nil {
   // Roll back when an error occurs
   tx.Rollback()
   return
}

// Commit the transaction
if err = tx.Commit(); err != nil {
   panic(fmt.Sprintf("commit tx err: %+v", err))
}

The first usage is recommended, because it removes the need to commit and roll back the transaction manually.
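To illustrate why the function-based form needs no explicit commit or rollback, here is a minimal sketch of the pattern. It is not rosedb's actual code: the `store`, `txn`, and `update` names are hypothetical, and the transaction only buffers string writes in a map.

```go
package main

import (
	"errors"
	"fmt"
)

// store is a hypothetical in-memory key-value store used only for this sketch.
type store struct {
	data map[string]string
}

// txn buffers writes until commit; it is not rosedb's real Txn type.
type txn struct {
	db      *store
	pending map[string]string
}

// set records a write in the transaction's private buffer.
func (t *txn) set(key, val string) error {
	if key == "" {
		return errors.New("empty key")
	}
	t.pending[key] = val
	return nil
}

// commit applies all buffered writes to the store.
func (t *txn) commit() {
	for k, v := range t.pending {
		t.db.data[k] = v
	}
	t.pending = nil
}

// rollback discards all buffered writes.
func (t *txn) rollback() {
	t.pending = nil
}

// update runs fn inside a transaction: it commits if fn returns nil
// and rolls back otherwise, so the caller never does either by hand.
func (s *store) update(fn func(tx *txn) error) error {
	tx := &txn{db: s, pending: make(map[string]string)}
	if err := fn(tx); err != nil {
		tx.rollback()
		return err
	}
	tx.commit()
	return nil
}

func main() {
	db := &store{data: make(map[string]string)}

	err := db.update(func(tx *txn) error {
		if err := tx.set("k1", "val-1"); err != nil {
			return err
		}
		return tx.set("", "val-2") // fails, so the write to k1 is discarded too
	})
	fmt.Println("err:", err, "keys stored:", len(db.data))
}
```

Because the wrapper owns both the commit and the rollback path, a forgotten rollback cannot leave a transaction open.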

The Txn method represents a read-write transaction. There is also a TxnView method, which represents a read-only transaction. It is used in exactly the same way, except that write operations inside TxnView are ignored.

db.TxnView(func(tx *rosedb.Txn) error {
   val, err := tx.Get([]byte("k1"))
   if err != nil {
      return err
   }
   // process val

   hVal := tx.HGet([]byte("k1"), []byte("f1"))
   // process hVal

   return nil
})

With the basic ACID concepts and the basic usage of rosedb transactions covered, let's look at how transactions are implemented in rosedb. As you read, consider how you would guarantee the AID properties yourself.

3.1 Atomicity

As mentioned earlier, atomicity means that a transaction executes as a whole: either everything succeeds or everything fails, with no intermediate state.

Achieving atomicity is actually not difficult, because rosedb's write path already helps. Recall the two steps of writing data in rosedb: first the data is written to disk for reliability, and then the index in memory is updated.
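The two-step write path can be sketched as follows. This is a simplified illustration under stated assumptions, not rosedb's real code: the `db` type, the `put` and `newMemDB` helpers, and the record layout are all hypothetical, and a `bytes.Buffer` stands in for the data file.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// db is a hypothetical store: an append-only log plus an in-memory index.
type db struct {
	log    io.Writer        // stands in for the data file on disk
	offset int64            // next write position in the log
	index  map[string]int64 // key -> offset of its latest record
}

// newMemDB builds a db backed by an in-memory buffer instead of a real file.
func newMemDB() (*db, *bytes.Buffer) {
	file := &bytes.Buffer{}
	return &db{log: file, index: make(map[string]int64)}, file
}

// put appends the record to the log first and updates the index only
// after the write has succeeded.
func (d *db) put(key, val []byte) error {
	// Assumed record layout: 4-byte key length, 4-byte value length, key, value.
	rec := make([]byte, 8, 8+len(key)+len(val))
	binary.BigEndian.PutUint32(rec[0:4], uint32(len(key)))
	binary.BigEndian.PutUint32(rec[4:8], uint32(len(val)))
	rec = append(rec, key...)
	rec = append(rec, val...)

	n, err := d.log.Write(rec)
	if err != nil {
		return err // index untouched, so the failed write is never visible
	}
	d.index[string(key)] = d.offset
	d.offset += int64(n)
	return nil
}

func main() {
	d, file := newMemDB()
	for _, kv := range [][2]string{{"k1", "val-1"}, {"k2", "val-2"}} {
		if err := d.put([]byte(kv[0]), []byte(kv[1])); err != nil {
			panic(err)
		}
	}
	fmt.Println("offsets:", d.index["k1"], d.index["k2"], "log bytes:", file.Len())
}
```

Updating the index only after a successful write means readers never see a key whose data did not reach the log.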

To make a transaction atomic, the data to be written can be held temporarily in memory and then written to the disk file in one batch when the transaction commits.

This raises a problem: what if an error occurs during the batch write, or the system crashes? Some data may already have been written successfully while the rest has not. By the definition of atomicity, the transaction was not committed and is invalid, so how do we know that the data already written should be discarded?

For now, rosedb solves this problem in the simplest way available.

The approach is as follows. Each time a transaction starts, it is assigned a globally unique transaction id, and every record written by the transaction carries this id into the data file. Once all of the data has been written to disk, the transaction id is stored separately (also in a file). When the database starts, all transaction ids in this file are loaded first and kept in a set, called the set of committed transaction ids.

With this in place, even if the batch write fails partway through, the corresponding transaction id is never stored. When the database starts and reads the data back to build the index (recall rosedb's startup process), it can see that the transaction id attached to those records is not in the set of committed transaction ids, so the records are treated as invalid.
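The startup check can be sketched in a few lines. The `entry` type and `buildIndex` function are hypothetical names chosen for illustration, and a slice stands in for the records read back from the data files.

```go
package main

import "fmt"

// entry is a hypothetical log record tagged with the transaction that wrote it.
type entry struct {
	txID uint64
	key  string
	val  string
}

// buildIndex replays the data log at startup and keeps only entries whose
// transaction id appears in the committed set.
func buildIndex(entries []entry, committed map[uint64]struct{}) map[string]string {
	index := make(map[string]string)
	for _, e := range entries {
		if _, ok := committed[e.txID]; !ok {
			continue // written by a transaction that never finished committing
		}
		index[e.key] = e.val
	}
	return index
}

func main() {
	// Transaction 1 wrote both of its entries and then recorded its id.
	// Transaction 2 crashed after writing one entry, so its id was never recorded.
	entries := []entry{
		{txID: 1, key: "k1", val: "val-1"},
		{txID: 1, key: "k2", val: "val-2"},
		{txID: 2, key: "k1", val: "partial"},
	}
	committed := map[uint64]struct{}{1: {}}

	index := buildIndex(entries, committed)
	fmt.Println(index["k1"], index["k2"], len(index))
}
```

The partial write from transaction 2 stays in the file but never reaches the index, which is what makes the failed transaction invisible after a restart.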

Most LSM-based kv stores use a similar idea to make transactions atomic. RocksDB, for example, collects all the writes of a transaction in a WriteBatch and writes them in one go when the transaction commits.

3.2 Isolation

rosedb currently supports two types of transactions: read-write and read-only. Only one read-write transaction can be open at a time, while multiple read-only transactions can be open simultaneously.

In this mode, reads take a read lock and writes take a write lock, so reads and writes are mutually exclusive and cannot proceed at the same time. This corresponds to serializable among the four isolation levels. The advantage is that it is simple and easy to implement; the disadvantage is poor concurrency.
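A global read-write lock of this kind might look like the sketch below, which uses Go's `sync.RWMutex`. The `store`, `update`, and `view` names are hypothetical and are not rosedb's API.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical database guarded by one global read-write lock.
type store struct {
	mu   sync.RWMutex
	data map[string]string
}

// update runs a read-write transaction; the write lock excludes every
// other transaction, read-only or not.
func (s *store) update(fn func(data map[string]string)) {
	s.mu.Lock()
	defer s.mu.Unlock()
	fn(s.data)
}

// view runs a read-only transaction; many can hold the read lock together,
// but none can run while a read-write transaction is active.
func (s *store) view(fn func(get func(key string) (string, bool))) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	fn(func(key string) (string, bool) {
		v, ok := s.data[key]
		return v, ok
	})
}

func main() {
	s := &store{data: make(map[string]string)}

	// Four concurrent writers are serialized by the write lock.
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s.update(func(data map[string]string) {
				data[fmt.Sprintf("k%d", i)] = "v"
			})
		}(i)
	}
	wg.Wait()

	s.view(func(get func(string) (string, bool)) {
		_, ok := get("k3")
		fmt.Println("k3 present:", ok)
	})
}
```

Passing only a `get` function to the read-only callback keeps writes out of reach while the read lock is held.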

Note that the current implementation will probably be adjusted in the future. My idea is to use snapshot isolation to support read committed or repeatable read, so that reads can be served from historical versions without blocking writes. It is just considerably more complex to implement.
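As a rough idea of what reading historical versions could mean, the sketch below keeps every committed version of a key and lets a reader pick the newest one at or before its snapshot. This is only an illustration of the general multi-version technique, not a design rosedb has adopted; all names are hypothetical, and locking is omitted for brevity.

```go
package main

import "fmt"

// version is one historical value of a key, stamped with the commit that wrote it.
type version struct {
	commitTS uint64
	val      string
}

// mvccStore is a hypothetical multi-version store; the versions of each key
// are appended in increasing commitTS order.
type mvccStore struct {
	lastTS   uint64
	versions map[string][]version
}

// commit installs a batch of writes under a new, larger timestamp.
func (s *mvccStore) commit(writes map[string]string) uint64 {
	s.lastTS++
	for k, v := range writes {
		s.versions[k] = append(s.versions[k], version{commitTS: s.lastTS, val: v})
	}
	return s.lastTS
}

// readAt returns the newest value of key committed at or before snapshotTS.
func (s *mvccStore) readAt(key string, snapshotTS uint64) (string, bool) {
	vs := s.versions[key]
	for i := len(vs) - 1; i >= 0; i-- {
		if vs[i].commitTS <= snapshotTS {
			return vs[i].val, true
		}
	}
	return "", false
}

func main() {
	s := &mvccStore{versions: make(map[string][]version)}
	s.commit(map[string]string{"k1": "old"})
	snapshot := s.lastTS // a reader takes its snapshot here
	s.commit(map[string]string{"k1": "new"})

	v, _ := s.readAt("k1", snapshot)
	fmt.Println("reader sees:", v)
}
```

The reader keeps seeing the value from its snapshot even after a later commit, so the writer never has to wait for it.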

3.3 Durability

Durability requires that data has been written to a non-volatile storage medium, most commonly a hard disk or SSD, so that the data remains safe even if the system fails.

In rosedb, with the default flush strategy, written data goes into the operating system's page cache and is not yet actually on disk. If the system fails before the operating system has flushed the cached pages to disk, that data is lost. Durability is therefore not fully guaranteed in this mode, but performance is better, because a Sync to disk is an extremely slow operation.

If the configuration item Sync is set to true when rosedb starts, a Sync is forced on every write. This ensures that data is not lost, but write performance drops.
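The difference between the two strategies comes down to whether the file is synced after each write. The sketch below shows the idea with the standard library's `os.File.Sync`; the `options` type and the `appendRecord` and `writeDemo` helpers are hypothetical and only mirror the idea of a Sync switch.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// options mirrors the idea of a Sync switch; the type itself is hypothetical.
type options struct {
	Sync bool
}

// appendRecord writes data to the file and, when opts.Sync is set, calls
// File.Sync so the bytes are flushed to stable storage before returning.
func appendRecord(f *os.File, data []byte, opts options) error {
	if _, err := f.Write(data); err != nil {
		return err
	}
	if opts.Sync {
		return f.Sync()
	}
	return nil // data may still sit in the OS page cache
}

// writeDemo appends one record to a fresh temporary file and returns the file size.
func writeDemo(forceSync bool) (int64, error) {
	dir, err := os.MkdirTemp("", "sync-demo")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	f, err := os.OpenFile(filepath.Join(dir, "data.log"), os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	if err := appendRecord(f, []byte("k1=val-1\n"), options{Sync: forceSync}); err != nil {
		return 0, err
	}
	info, err := f.Stat()
	if err != nil {
		return 0, err
	}
	return info.Size(), nil
}

func main() {
	for _, s := range []bool{false, true} {
		size, err := writeDemo(s)
		fmt.Println("Sync:", s, "size:", size, "err:", err)
	}
}
```

Both modes produce the same file contents; they differ only in whether the data is guaranteed to have left the page cache when the write returns.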

The right choice depends on your own usage scenario. If the system is stable, performance requirements are high, and a small amount of data loss is tolerable, use the default strategy, where Sync is false. Otherwise, force a flush to disk on every write.

4. Shortcomings

From this brief analysis, we can see that rosedb has basically achieved the AID properties of transactions. Overall the implementation is quite simple, easy to learn and use, and easy to understand as a basis for further extension. There are, of course, some shortcomings that need to be addressed.

The first is the isolation level mentioned above. The current approach is too simple: it uses one large global lock to achieve serializable isolation. In the future, it may be possible to lock only the keys being operated on, which would reduce the lock granularity.
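One possible shape for key-level locking is sketched below. It is only an illustration of the idea, not a planned rosedb design: the `keyLocks` type and its helpers are hypothetical, and the sketch never frees unused locks. A transaction that touches several keys would also need to acquire its locks in a consistent order to avoid deadlock.

```go
package main

import (
	"fmt"
	"sync"
)

// keyLocks hands out one mutex per key; the type is hypothetical.
type keyLocks struct {
	mu    sync.Mutex // protects the locks map itself
	locks map[string]*sync.Mutex
}

func newKeyLocks() *keyLocks {
	return &keyLocks{locks: make(map[string]*sync.Mutex)}
}

// lockFor returns the mutex for key, creating it on first use.
func (k *keyLocks) lockFor(key string) *sync.Mutex {
	k.mu.Lock()
	defer k.mu.Unlock()
	m, ok := k.locks[key]
	if !ok {
		m = &sync.Mutex{}
		k.locks[key] = m
	}
	return m
}

// withKey runs fn while holding only the lock for key, so operations on
// other keys are not blocked.
func (k *keyLocks) withKey(key string, fn func()) {
	m := k.lockFor(key)
	m.Lock()
	defer m.Unlock()
	fn()
}

// incrementConcurrently bumps a counter from n goroutines, each holding
// the lock for key while it does so.
func incrementConcurrently(kl *keyLocks, key string, n int) int {
	count := 0
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			kl.withKey(key, func() { count++ })
		}()
	}
	wg.Wait()
	return count
}

func main() {
	kl := newKeyLocks()
	fmt.Println(incrementConcurrently(kl, "k1", 100), incrementConcurrently(kl, "my_list", 100))
}
```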

Another problem is that rosedb supports a variety of data structures, and for structures such as List and ZSet it is difficult to support every command inside a transaction. Currently List supports only LPush and RPush, and ZSet supports only the ZAdd, ZScore, and ZRem commands.

The main reason is that when an existing key is both read and written within a transaction, commands such as range lookups are hard to support. I have not yet found a good solution.

Finally, here is the project address: https://github.com/roseduan/rosedb . Everyone is welcome to take a look.

P.S. rosedb also welcomes anyone interested in storage and kv stores to join in, and you can add me on WeChat for in-depth discussion.


roseduan