This article is based on the transcript of a talk given by Bai Yuqing, head of Zhihu's online infrastructure, at a PingCAP Infra Meetup. It describes the relationship between Zhihu and TiDB and introduces Zetta, an open-source product built on the TiDB ecosystem that avoids HBase's performance problems while reducing the extra latency a distributed architecture introduces after TiDB is deployed.

Background overview

BigTable data model

Before introducing Zetta, let's take a look at BigTable. BigTable is a sparse, multidimensional, sorted map: a data model Google developed to solve the data-indexing problem at massive scale. Google's crawler produces an enormous amount of data, and BigTable not only provides low-latency storage that meets that business scenario, but also provides wide-column capabilities, that is, structured data, which is very friendly to developers. Today BigTable is used in data-analysis scenarios such as Google Earth, Google Analytics, and Personalized Search.
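As a rough illustration of the model (not Google's actual implementation), a BigTable-style table can be thought of as a sorted map keyed by (row key, column family:qualifier, timestamp). The snippet below is a toy sketch with made-up data:

```python
# Toy sketch of the BigTable data model: a sparse, multidimensional,
# sorted map from (row key, "family:qualifier", timestamp) -> value.
# Illustration only, not a real storage engine.

table = {}  # {(row_key, column, timestamp): value}

def put(row, column, value, ts):
    table[(row, column, ts)] = value

def scan(row_prefix):
    """Return cells whose row key starts with row_prefix, in sorted key order."""
    return [(k, v) for k, v in sorted(table.items()) if k[0].startswith(row_prefix)]

# Sparse: each row stores only the columns it actually has.
put("com.example/page1", "anchor:link_a", "Example A", ts=1)
put("com.example/page1", "contents:html", "<html>...</html>", ts=2)
put("com.example/page2", "contents:html", "<html>...</html>", ts=1)

for key, value in scan("com.example/"):
    print(key, "->", value)
```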

Zhihu's challenges

Around 2016 and 2017, as its business grew, Zhihu ran into many of the same problems Google had faced. With data volumes growing continuously, many workloads were effectively NoSQL workloads rather than strictly relational ones.

For example, Zhihu now runs roughly 30,000 to 40,000 Redis instances. Once services are split into microservices, calls between them become very frequent, and online services demand low latency, high throughput, and high concurrency.

Take the homepage read service: when the homepage is rendered, it must filter out content the user has already read, so each homepage request touches a very large data set in the user dimension crossed with the content dimension. There is also the AI user-profile service: a user profile is very sparse data, essentially a very sparse table of which content each user is interested in. Scenarios like these kept putting pressure on our infrastructure.

Adopting HBase

At the time, Zhihu settled on HBase. HBase is an excellent open-source implementation of BigTable with a very mature ecosystem. But it also has some gaps, such as no support for cross-row transactions and incomplete secondary indexes.

Although these gaps can be filled with third-party components such as Phoenix, that creates new problems: there are many system components and they are complicated to maintain. Zhihu also ran into several issues while using HBase:

  • First, the cost of use is high. To run HBase well, you need a very professional team of engineers to tune it; these engineers need not only the relevant background knowledge but also a deep understanding of HBase itself.
  • Second, the cost of moving services onto HBase is high. Most services were developed against MySQL or other NoSQL stores, so migrating them to HBase takes a considerable amount of work.
  • Third, HBase is very hard to tune. For example, tuning HBase's cache and the underlying HDFS parameters requires very fine-grained adjustment for each business scenario, which is difficult.

To sum up, HBase was not the optimal solution for Zhihu's business scenarios at the time. Even with the team putting a lot of effort into tuning and optimization, HBase's response time still fluctuated drastically. Zhihu wanted response times that were not only as low as possible but also stable, and HBase could not deliver that.

Self-built RBase

Given this situation, around 2017 Zhihu built RBase out of mature components such as Kubernetes, Kafka, Redis, and MySQL. The upper layer of RBase exposes an HBase-style interface, while the underlying storage is MySQL. MySQL is deployed on Kubernetes, Kafka connects the layers in the middle, and a cache-through approach is used to keep latency as low as possible.

But RBase has problems of its own. Even at a medium data scale, resharding MySQL for every cluster expansion is very troublesome, and because the data in the database is unordered, data analysis cannot be done smoothly. Even so, Zhihu built the homepage read-filtering service and the anti-cheating device-fingerprint service on top of it and kept iterating.

By 2019, Zhihu's data volume had grown further, and MySQL sharding had become the most stressed part of the system, so RBase was upgraded again and TiDB was introduced to replace MySQL.

Overall, RBase keeps the same architecture, but the MySQL sharding problem is completely solved. The system maintains good performance and can carry more services. However, this brings a new problem: a distributed database inevitably increases system latency.

To solve these problems, Zetta was born.

The Birth of Zetta

Database scenarios: Transactional / Serving / Analytical

Databases serve three kinds of scenarios, two of which are familiar to everyone: transactional and analytical. Transactional scenarios involve complex business logic and a strong relational model, such as financial transactions. Analytical scenarios include ad-hoc queries, ETL, and reporting. But there is a third one, the serving scenario: online serving workloads with no strong relational model.

The user-profile service is a typical serving scenario, involving sparse wide columns and real-time computation. Zetta was born against the background of growing demand for serving workloads.

Zetta architecture analysis

The ultimate goal of technology is to deliver value, and cost is what drives technological progress.

Transactional data has a very high value density, big data has a relatively low value density, and serving sits somewhere in between. Zetta is a cost trade-off: a product that reduces both query cost and usage cost for data at this value density.

Zhihu hopes Zetta will become a partner in the TiDB ecosystem, because that ecosystem is both open and mature, and specifically the partner that covers the serving scenario within it.

Before building it, we made some trade-offs. As shown in the figure below, the black parts are what we have already done, the orange parts are what we are doing, and the blue parts are what we plan to do.

Zetta lets you choose the consistency level, supporting both strongly consistent and weakly consistent reads, depending on the business scenario. Zetta also supports non-transactional access: a service can give up transactional guarantees in exchange for more extreme performance. In addition, Zetta will support cached reads in the future, which will bring further performance improvements.

In terms of access model, Zetta supports a wide-column mode: a particularly wide table whose columns can be added dynamically. You can also choose between a tall-table mode and a wide-table mode; the two differ in physical layout, but both can be configured in Zetta. In addition, Zetta uses clustered indexes to improve performance, as well as hash-based key distribution to avoid hotspots.
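To make the difference concrete, here is a hedged sketch, with invented data and field names rather than Zetta's real schema, of how the same "user has read content" facts could be laid out in a tall table versus a wide table:

```python
# Two logical layouts for the same sparse "user has read content" data.
# Names and values are made up purely for illustration.

# Tall-table mode: one narrow row per (user, content) pair.
tall_rows = [
    {"user_id": "u1", "content_id": "c100", "read_at": 1600000000},
    {"user_id": "u1", "content_id": "c101", "read_at": 1600000100},
    {"user_id": "u2", "content_id": "c100", "read_at": 1600000200},
]

# Wide-table mode: one row per user, with one dynamically added
# column per piece of content (sparse wide columns).
wide_rows = {
    "u1": {"read:c100": 1600000000, "read:c101": 1600000100},
    "u2": {"read:c100": 1600000200},
}

# A point lookup "has u1 read c100?" touches one cell in either layout,
# but the physical key design (and hotspot behavior) differs.
print(any(r["user_id"] == "u1" and r["content_id"] == "c100" for r in tall_rows))
print("read:c100" in wide_rows["u1"])
```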

🌟 Other capabilities

Zetta also provides secondary indexes. It does not keep multiple versions of a value, because multiple versions often do not matter to application developers, so the Zetta team de-emphasized the multi-version concept in practice. Zetta also supports multiple protocols: besides its own HBase Thrift Server, it supports access via the MySQL protocol and the native HBase API.
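As a hedged illustration of what multi-protocol access could look like from an application's point of view, the snippet below reads the same logical row once through an HBase Thrift client (happybase) and once through a MySQL client (pymysql). The host names, table name, and columns are placeholders, not Zetta's actual defaults:

```python
import happybase   # HBase Thrift client
import pymysql     # MySQL protocol client

# HBase Thrift path (placeholder endpoint and table).
hb = happybase.Connection(host="zetta-thrift.example.internal", port=9090)
profile = hb.table("user_profile").row(b"u1")  # {b"cf:qualifier": b"value", ...}
print(profile)

# MySQL protocol path (placeholder endpoint and schema).
db = pymysql.connect(host="zetta-mysql.example.internal", port=3306,
                     user="app", password="secret", database="profile_db")
with db.cursor() as cur:
    cur.execute("SELECT * FROM user_profile WHERE user_id = %s", ("u1",))
    print(cur.fetchone())
```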

In addition, Zetta has been integrated with Flink, so it can be used as a big-data connector. Next, the Zhihu team will build data TTL. In big-data scenarios, TTL is a very practical requirement: because the data volume is so large, useless data needs to be cleaned up regularly.

In addition, Zetta also supports full-text search and access via the Redis protocol.

🌟 Architecture overview

Below is a diagram of Zetta's architecture. The core is the TableStore server, whose underlying storage is TiKV; the Zhihu team redesigned the data structures on top of it, including the scheme for mapping tables onto KV pairs.
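The talk does not spell out Zetta's encoding, but a hedged sketch of the general idea of mapping a wide-column table onto an ordered KV store such as TiKV might look like this; the prefixes and layout are invented for illustration, and the real encoding is more involved (table IDs, typed encodings, and so on):

```python
# Illustrative only: encode a (table, row key, column) cell into a single
# ordered KV key so that all cells of a row sit next to each other.

def encode_key(table: str, row: str, column: str) -> bytes:
    # Separate components with a low byte so lexicographic order on the
    # encoded bytes roughly matches (table, row, column) order.
    return b"\x00".join([b"t" + table.encode(), row.encode(), column.encode()])

kv = {}  # stand-in for TiKV

def put_cell(table, row, column, value):
    kv[encode_key(table, row, column)] = value.encode()

def get_row(table, row):
    prefix = b"\x00".join([b"t" + table.encode(), row.encode()]) + b"\x00"
    return {k: v for k, v in sorted(kv.items()) if k.startswith(prefix)}

put_cell("user_profile", "u1", "interest:tech", "0.9")
put_cell("user_profile", "u1", "interest:sports", "0.1")
print(get_row("user_profile", "u1"))
```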

Now focus on the access layer, which is itself stateless. Zetta and the access layer above it communicate over gRPC, but exposing the gRPC interface directly to users is not ideal, and the raw data format is not friendly enough for them. For ease of use, the access layer maps data into Zetta through the MySQL or HBase interfaces and supports their respective data models.

To achieve low latency, Zetta implements a caching layer. Writes go through the Cache Server, which is equivalent to writing to Zetta and then propagating the update down to the KV layer. Reads and writes happen in the proxy layer and the cache layer, and are cached and routed per request. Zetta provides a complete solution and lets developers decide which access path to use.
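The exact semantics of Zetta's Cache Server are not detailed in the talk, but the general write-through idea, updating both the cache and the underlying store on every write and serving reads from the cache when possible, can be sketched as follows; the class and backing store are hypothetical:

```python
# Generic write-through cache sketch (not Zetta's actual Cache Server code).

class WriteThroughCache:
    def __init__(self, store):
        self.store = store   # authoritative KV store (e.g. the TableStore/TiKV path)
        self.cache = {}      # in-memory cache in front of it

    def put(self, key, value):
        # Write-through: update the backing store first, then the cache,
        # so the cache never holds data that was not durably written.
        self.store[key] = value
        self.cache[key] = value

    def get(self, key):
        if key in self.cache:          # cache hit: low-latency path
            return self.cache[key]
        value = self.store.get(key)    # cache miss: fall back to the store
        if value is not None:
            self.cache[key] = value
        return value

backing_store = {}
c = WriteThroughCache(backing_store)
c.put("u1:read:c100", b"1")
print(c.get("u1:read:c100"))
```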

Applications of Zetta

Putting Zetta into use has brought great benefits to both the users and the providers of the service.

Users saw a big performance improvement: service latency dropped, response times stabilized, and the goal of reducing both service cost and hardware cost was achieved.

For the service providers, there is no longer any need to manage other components: they only have to maintain the Zetta and TiKV clusters, which greatly reduces maintenance cost, and the required resource cost also drops significantly. In addition, because the TiKV community is very active, developers can report problems as soon as they hit them, and the community fixes them and keeps improving TiKV. In this way Zetta and the TiDB ecosystem have formed a virtuous cycle, continuously iterating on the infrastructure and benefiting each other.

The specific situation can be shown through some charts and data.

Production environment applications

🌟 Read Service & Push Service

In production, after Zhihu's read service switched to Zetta, latency dropped from 100 ms to 90 ms and storage usage was also greatly reduced. The push service likewise saw significant reductions in response time and storage usage after switching to Zetta.

🌟 Search highlighting data

Zhihu's search-highlighting service originally used a distributed database called Ignite. After replacing it with Zetta, latency was greatly reduced. Beyond the performance improvement, operations also became easier, since there is no longer a need for dedicated operations staff to manage the Ignite database.

🌟 Image metadata service

Another example is Zetta in the image metadata service. The chart shows that switching from HBase to Zetta brought a significant reduction in both latency and storage. In fact, the application code did not change at all: only the Thrift server address was switched from the HBase Thrift server to the Zetta Thrift server.
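For example, with a Python HBase Thrift client such as happybase, a migration like this amounts to nothing more than pointing the connection at a different host; the host names and table below are placeholders, not Zhihu's actual endpoints:

```python
import happybase

# Before: the image metadata service talked to the HBase Thrift server.
# conn = happybase.Connection(host="hbase-thrift.example.internal", port=9090)

# After: the only change is the Thrift server address, now pointing at Zetta.
conn = happybase.Connection(host="zetta-thrift.example.internal", port=9090)

meta = conn.table("image_meta").row(b"img:abc123")  # same calls as before
print(meta)
```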

You may ask: why did service latency and storage requirements drop so much after switching to Zetta?

  • The first reason is that our ability to tune HBase parameters is relatively limited; at Zhihu, the underlying HBase storage is not compressed.
  • The second reason is that HBase keeps many versions of the data. If compression is enabled on a live cluster, the resource consumption is frightening and its impact is hard to control, so at Zhihu compression is rarely turned on; only at off-peak hours do we dare try it.

Of course, the question we get asked most is how long compression takes on Zhihu's HBase. There is no fixed answer; it can usually be finished between one o'clock in the morning and seven o'clock the next morning.

🌟 Creator Center

The Creator Center service also uses Zetta. The Creator Center displays data to content creators, and its core data originally all lived in HBase. Through the migration, the data moved out of the original HBase NoSQL tables, and the service switched from the HBase client to the MySQL client.

Now the query code can be written very clearly: each query is just a SQL statement, whereas in HBase the same table access was very complicated. This brings two benefits: performance improves and the code becomes clearer. After the switch, service latency dropped and latency jitter disappeared.
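A hedged sketch of what such a query could look like over a MySQL-protocol endpoint; the endpoint, table, and columns are invented for the example:

```python
import pymysql

conn = pymysql.connect(host="zetta-mysql.example.internal", port=3306,
                       user="creator_center", password="secret",
                       database="creator")

# One clear SQL statement replaces what used to be a complicated
# multi-step HBase scan/filter in the old client code.
with conn.cursor() as cur:
    cur.execute(
        "SELECT content_id, read_count, upvote_count "
        "FROM creator_stats WHERE creator_id = %s "
        "ORDER BY read_count DESC LIMIT 10",
        ("u1",),
    )
    for row in cur.fetchall():
        print(row)
```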

Services planned for production access

Next, Zhihu plans to roll Zetta out to other Zhihu services. Services are tiered into three levels, S, A, and B, from high to low. For the S-tier services, roughly 4 HBase clusters may be involved, accessed via ThriftServer or the native client, with a data volume of about 120 TB. Storage usage is expected to drop considerably once these services are switched to Zetta.

Zetta's future

In the future, Zhihu will continue to improve Zetta.

  • First, improve Zetta itself, both its performance and its functionality.
  • Second, expand its adoption: promote Zetta to more scenarios inside Zhihu and distill some best practices.
  • Third, expand the scenarios: the current focus is on online serving, and over time we will look for suitable big-data use cases and turn them into best practices as well.
  • Finally, Zetta hopes to integrate with TiDB and become a partner in the TiDB ecosystem.

Zetta's project is on GitHub at https://github.com/zhihu/zetta; the latest code currently lives in Zhihu's internal repository. If you are interested in the project and want to discuss it, you can find contact information inside the project, and if you want to adopt Zetta for one of your scenarios, we are happy to help.

