We graduated! Hacking Camp 2021 wraps up as six major ecosystem projects enter a new stage

On November 7th, the Hacking Camp 2021 ecosystem project defense was co-hosted by the TiDB Community and Jingwei China and sponsored by Chuxin Capital, Mingshi Capital, Jiyuan Capital, and JuiceFS. At the defense, each team presented its results to date and its plans for future work.

Some Hacking Camp projects are standout projects from past TiDB Hackathons, and others are new ideas from ecosystem partners. This edition of Hacking Camp took the ecosystem as its theme and helped partners incubate their projects. All six participating projects have essentially met their goals. After graduation, they will keep refining related features and iterating toward more stable releases, and the mentors will continue to guide the teams as they polish their projects.

The projects that took part in the Hacking Camp defense are:
JuiceFS, a distributed POSIX file system that uses TiKV as its metadata engine

Serverlessdb for HTAP, which provides a serverless database service based on TiDB

TiDB for PostgreSQL, which improves PostgreSQL compatibility on TiDB

TiBigData, a one-stop TiDB solution for the big data field

HugeGraph, with TiKV as its backend storage

Doris Connector, which uses TiDB as its upstream data source

The judges evaluated each project on its completion, application value, contribution to the TiDB ecosystem, and the quality of its defense. In the end, ServerlessDB for HTAP received unanimously high scores from the jury and won both the "Outstanding Graduate" and "Best Application" awards.

Special thanks to the following judges:
Xu Zhihao, Executive Manager at Mingshi Capital; Liu Yang, CTO & Co-founder of Flomesh; Wang Cong, TiDB Team Tech Leader; Zhang Jian, R&D Director at PingCAP; Li Jianjun, TiKV Maintainer

Let's take a look at what each project achieved~

JuiceFS

JuiceFS is a cloud-native distributed POSIX file system. With TiKV as its metadata engine, JuiceFS can support tens of billions of files and exabytes of data while keeping latency low and performance stable at large scale. In metadata operation benchmarks, the average latency of the TiKV engine is about 2 to 4 times that of Redis and slightly better than that of a local MySQL.

The main features have been developed and released in v0.16 and pass the pjdfstest suite. Existing users already run it in test and production environments. Going forward, JuiceFS will treat TiKV as its preferred metadata engine for large-scale production deployments and will actively adopt new TiKV features while preserving compatibility.

ServerlessDB for HTAP

The ultimate goal of the project is to turn the cloud database service into a black box, so that application developers only need to focus on translating business logic into SQL, without worrying about data volume, business load, or whether a given SQL statement is AP or TP.

Development work

Business load module:

The business load module evaluates whether the current service resources match the current business load and builds a business load model that drives scale-out and scale-in decisions.
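The article does not spell out the load model, so as a purely illustrative assumption, here is a minimal Python sketch that borrows the Kubernetes Horizontal Pod Autoscaler formula, desired = ceil(current × currentUtilization / targetUtilization), clamped to a node range. The function name, the 60% target, and the bounds are hypothetical, not ServerlessDB's actual model.

```python
def desired_compute_nodes(current_nodes: int, avg_cpu_pct: int,
                          target_cpu_pct: int = 60,
                          min_nodes: int = 1, max_nodes: int = 16) -> int:
    """HPA-style estimate: ceil(current * current_util / target_util), clamped."""
    if current_nodes < 1 or target_cpu_pct <= 0:
        raise ValueError("need at least one node and a positive target")
    # Integer ceiling division (-(-a // b)) avoids floating-point rounding surprises.
    desired = -(-current_nodes * avg_cpu_pct // target_cpu_pct)
    return max(min_nodes, min(max_nodes, desired))
```

For example, 4 nodes at 90% average CPU against a 60% target yields 6 nodes.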

Serverless module:

The Serverless module monitors the CPU usage of all compute nodes and the underlying storage capacity in real time, and triggers scale-out or scale-in of compute and storage resources.
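A minimal sketch of the kind of real-time check described above, assuming simple fixed thresholds on average CPU usage and on storage usage. The names (`check`, `Thresholds`, `Action`) and the threshold values are hypothetical; the gap between the high and low CPU thresholds is one common way to avoid flapping between scale-out and scale-in.

```python
from __future__ import annotations
from dataclasses import dataclass
from enum import Enum

class Action(Enum):
    SCALE_OUT_COMPUTE = "scale_out_compute"
    SCALE_IN_COMPUTE = "scale_in_compute"
    SCALE_OUT_STORAGE = "scale_out_storage"

@dataclass(frozen=True)
class Thresholds:
    cpu_high: float = 0.80   # scale out compute above this average CPU usage
    cpu_low: float = 0.30    # scale in compute below this average CPU usage
    disk_high: float = 0.75  # add storage once used / capacity exceeds this

def check(cpu_usages: list[float], disk_used: int, disk_capacity: int,
          t: Thresholds | None = None) -> list[Action]:
    """Map one monitoring sample to the scaling actions it calls for."""
    if not cpu_usages or disk_capacity <= 0:
        raise ValueError("need CPU samples and a positive disk capacity")
    t = t or Thresholds()
    actions = []
    avg_cpu = sum(cpu_usages) / len(cpu_usages)
    if avg_cpu > t.cpu_high:
        actions.append(Action.SCALE_OUT_COMPUTE)
    elif avg_cpu < t.cpu_low and len(cpu_usages) > 1:
        # never scale in below one compute node
        actions.append(Action.SCALE_IN_COMPUTE)
    if disk_used / disk_capacity > t.disk_high:
        actions.append(Action.SCALE_OUT_STORAGE)
    return actions
```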

Database middleware:

The middleware decouples user connections from backend database nodes, so that even when users rely on connection pools, the middleware can balance traffic across all nodes, including newly added ones, after a scale-out.
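Why statement-level routing matters: a connection pool keeps long-lived connections, so balancing only at connect time would never send traffic to nodes added later. The hypothetical `QueryRouter` below sketches the idea by picking a backend per statement in round-robin order over a registry that can change at runtime; it is an illustration, not the project's middleware code.

```python
import itertools
import threading

class QueryRouter:
    """Routes each statement (not each connection) to a backend node."""

    def __init__(self, nodes):
        self._lock = threading.Lock()
        self._nodes = list(nodes)
        self._counter = itertools.count()

    def register(self, node):
        # Called after scale-out; the next statements can land on the new node.
        with self._lock:
            if node not in self._nodes:
                self._nodes.append(node)

    def deregister(self, node):
        # Called before scale-in; raises ValueError if the node is unknown.
        with self._lock:
            self._nodes.remove(node)

    def route(self):
        with self._lock:
            if not self._nodes:
                raise RuntimeError("no backend nodes registered")
            return self._nodes[next(self._counter) % len(self._nodes)]
```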

Rule system:

The rule system lets you pin resource allocation for a specific time window. With rules in place, resources can be allocated in advance, before traffic increases.
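One way to model such rules, assuming each rule pins a minimum number of compute nodes for a daily time window and acts as a floor on whatever the autoscaler decides. `Rule`, `is_active`, and `pinned_nodes` are hypothetical names for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime, time

@dataclass(frozen=True)
class Rule:
    start: time             # window start (inclusive), local time of day
    end: time               # window end (exclusive); may wrap past midnight
    min_compute_nodes: int  # resources pinned while the window is active

def is_active(rule: Rule, t: time) -> bool:
    if rule.start <= rule.end:
        return rule.start <= t < rule.end
    return t >= rule.start or t < rule.end   # wrapping window, e.g. 22:00-02:00

def pinned_nodes(rules, now: datetime, autoscaled: int) -> int:
    """Apply active rules as a floor on the autoscaler's decision."""
    floor = max((r.min_compute_nodes for r in rules if is_active(r, now.time())),
                default=0)
    return max(autoscaled, floor)
```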

Serverless service orchestration module:

The service orchestration module handles creating and releasing TiDB clusters and dynamically scaling TiDB components out and in. It also implements Kubernetes local disk management, which solves the problem that cloud disks are unavailable in private deployments.

An admission webhook was developed for scaling in TiDB components: it deletes the node's record from the middleware registry in advance, so the scale-in is transparent to users.
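The ordering is what makes scale-in transparent: stop routing to the node first, and only then allow the pod to be deleted. The sketch below models that sequence in plain Python and, as an extra assumption, also waits for in-flight statements to finish. `Registry`, `in_flight`, and `admit_scale_in` are hypothetical stand-ins; a real admission webhook would receive and answer Kubernetes AdmissionReview requests instead.

```python
import time

class Registry:
    """Hypothetical stand-in for the middleware's backend registry."""

    def __init__(self, nodes):
        self.nodes = set(nodes)                  # nodes eligible for new statements
        self.in_flight = {n: 0 for n in nodes}   # running statements per node

def admit_scale_in(registry, node, timeout_s=30.0, poll_s=0.5,
                   clock=time.monotonic, sleep=time.sleep):
    """Return True if `node` may be deleted, False to deny the request."""
    registry.nodes.discard(node)                 # 1. stop routing new work
    deadline = clock() + timeout_s
    while registry.in_flight.get(node, 0) > 0:   # 2. wait for running work
        if clock() >= deadline:
            registry.nodes.add(node)             # deny and restore routing
            return False
        sleep(poll_s)
    return True                                  # 3. safe to delete the pod
```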

Follow-up research and development plan:

Add Hint and rule modules to distinguish TP from AP queries more accurately, which is expected to cut the middleware's CPU usage by more than half

Provide richer load balancing algorithms, such as ones based on the runtime cost of SQL

Add business traffic control to the middleware. If the business load grows faster than the serverless layer can scale, backend services become unstable; flow control lets the system absorb sudden traffic surges.
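The article does not say which flow-control algorithm will be used. A token bucket is one common choice, sketched below as an assumption: tokens refill at a steady rate up to a burst capacity, and a statement is admitted only if a token is available. The injectable `clock` parameter is there only to make the sketch testable.

```python
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity      # start full
        self.clock = clock
        self.last = clock()

    def try_acquire(self, n: float = 1.0) -> bool:
        """Admit a request costing n tokens, or reject it without blocking."""
        now = self.clock()
        # Refill for the elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```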

This project also won the Hacking Camp Outstanding Graduate and Best Application awards~ The judges were clearly impressed by the project's vision and engineering strength. Everyone is welcome to check it out and try it~

Project address: https://github.com/tidb-incubator/Serverlessdb-for-HTAP

TiDB for PostgreSQL

The project was initiated by Digital China to make TiDB compatible with PostgreSQL while retaining TiDB's high availability, flexibility, and scalability. It lets users connect existing PostgreSQL clients to TiDB and use PostgreSQL-specific syntax.

Work completed so far:

Modified the DELETE syntax

Added the PostgreSQL-specific RETURNING keyword

Completed the Sysbench_tpcc test over the PostgreSQL protocol and compared the results with native TiDB of the same version

Completed the BenchmarkSQL benchmark over the PostgreSQL protocol and compared the results with native TiDB of the same version

Comparison of Benchmark test results:

Future plans include support for system catalog table structures and graphical clients, plus an abstract protocol layer that can switch between protocols at any time. Everyone is welcome to join in~

Project address: https://github.com/DigitalChinaOpenSource/TiDB-for-PostgreSQL

TiBigData

TiBigData provides TiDB connectors for various OLAP computing engines; connectors for Flink, Presto, and MapReduce are already implemented. During Hacking Camp, we mainly worked on Flink-related features.

We implemented a Snapshot source and a TiCDC streaming source in Flink. By combining the two, we achieved unified stream and batch processing for TiDB.

The second area is data interoperability. Using TiKV's cross-data-center deployment together with the Flink connector's follower read support, we achieved true interoperability for offline data.

Finally, computation pushdown. Our connectors support TiKV's pushdown operators across the board, which can greatly improve the efficiency of data scanning and computation.
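The stream-batch integration described above rests on one idea: read a consistent snapshot at some timestamp, then apply only the changes committed after it. A minimal sketch, assuming changes arrive as records with a commit timestamp (as TiCDC change events carry); `Change` and `stream_batch_view` are hypothetical names, and this is not TiBigData's actual code.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass(frozen=True)
class Change:
    commit_ts: int
    op: str                      # "upsert" or "delete"
    key: int
    value: dict | None = None

def stream_batch_view(snapshot: dict, snapshot_ts: int, changes) -> dict:
    """Combine a snapshot read at snapshot_ts with a stream of later changes."""
    state = dict(snapshot)
    # Sorted here for simplicity; a real stream is consumed in commit order.
    for ch in sorted(changes, key=lambda c: c.commit_ts):
        if ch.commit_ts <= snapshot_ts:
            continue  # already reflected in the snapshot
        if ch.op == "upsert":
            state[ch.key] = ch.value
        elif ch.op == "delete":
            state.pop(ch.key, None)
        else:
            raise ValueError(f"unknown op {ch.op!r}")
    return state
```

Skipping changes at or below the snapshot timestamp is what keeps the batch part and the streaming part from double-counting rows.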

TiBigData core function enhancements:

Enhanced the general capabilities of the TiDB Java client. We implemented a TiDB encoder that is decoupled from TiSpark, can be adapted to other OLAP engines, and can serve as a general-purpose tool for anyone who needs it; it has already been adopted by community partners.

Implemented data type conversion tools between Flink/Presto data types and TiDB data types.

Implemented a distributed TiKV client whose API is better suited to distributed computing frameworks.
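To make a client friendly to distributed frameworks, a table scan can be cut into region-aligned pieces so that each parallel task reads from one TiKV region. The sketch assumes TiDB's row key layout (`t` + table ID + `_r` + handle, with integers in memcomparable big-endian form with the sign bit flipped) and hard-codes the region start keys that a real client would fetch from PD; `encode_int`, `record_key`, and `split_by_regions` are hypothetical names.

```python
import struct

SIGN_MASK = 1 << 63

def encode_int(v: int) -> bytes:
    """Memcomparable signed int64: flip the sign bit, write big-endian."""
    return struct.pack(">Q", (v & 0xFFFF_FFFF_FFFF_FFFF) ^ SIGN_MASK)

def record_key(table_id: int, handle: int) -> bytes:
    """Row key layout: b"t" + table_id + b"_r" + handle."""
    return b"t" + encode_int(table_id) + b"_r" + encode_int(handle)

def split_by_regions(start: bytes, end: bytes, region_starts: list[bytes]):
    """Cut [start, end) at every region start key strictly inside it.

    Each resulting (lo, hi) range lies within a single region, so it can be
    handed to one parallel task.
    """
    cuts = sorted(k for k in region_starts if start < k < end)
    bounds = [start, *cuts, end]
    return list(zip(bounds, bounds[1:]))
```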

Next, we will continue working on Change Log Write, TiDB x Presto/Trino, a Flink State Backend on TiKV, and more. Anyone interested is welcome to join the community and build with us~

Project address: https://github.com/tidb-incubator/TiBigData

HugeGraph on TiKV

HugeGraph on TiKV suits scenarios that need a large-scale graph database with high read and write performance, and it is especially suitable for teams that already operate and maintain TiKV storage.

Implemented features:

Support for a single graph instance

Support for creating, reading, updating, and deleting schemas

Support for importing data via Loader, and for creating, reading, updating, and deleting vertices and edges

Support for traversal algorithms such as kout and kneighbor, Gremlin queries, and index queries (index queries not yet complete)
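For readers unfamiliar with these traversals: kout returns the vertices whose shortest distance from a start vertex is exactly k, while kneighbor returns every vertex within k steps. A minimal breadth-first sketch over an in-memory adjacency map, following outgoing edges only; HugeGraph's real APIs expose more options (such as edge direction and result limits), which are omitted here, and the function names are illustrative.

```python
def _bfs_layers(adj: dict, source, k: int) -> list[set]:
    """layers[i] holds the vertices at shortest distance i + 1 from source."""
    if k < 1:
        raise ValueError("k must be at least 1")
    visited = {source}
    frontier = {source}
    layers = []
    for _ in range(k):
        nxt = set()
        for v in frontier:
            for w in adj.get(v, ()):
                if w not in visited:      # first visit = shortest distance
                    visited.add(w)
                    nxt.add(w)
        layers.append(nxt)
        frontier = nxt
    return layers

def kout(adj: dict, source, k: int) -> set:
    """Vertices at shortest distance exactly k."""
    return _bfs_layers(adj, source, k)[-1]

def kneighbor(adj: dict, source, k: int) -> set:
    """All vertices within k steps, excluding the source itself."""
    return set().union(*_bfs_layers(adj, source, k))
```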

Show results:

We imported the Xinyu City COVID-19 dataset and viewed the resulting graph in the HugeGraph-Hubble interface:

Performance test results:

Import speed (write)

Query by id (random read)

Follow-up plan:

Feature completion

Support advanced features such as multiple graph instances, truncating/clearing graph data, monitoring metrics, and TTL

Performance optimization

Write performance: commit mode, batch size tuning, etc.

Query performance: data encoding optimization, parsing optimization, etc.
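Batch size is a trade-off: larger batches amortize network round trips, but they also enlarge each transaction. The hypothetical `WriteBatcher` below flushes when either a count or a byte threshold is reached; the `commit` callback stands in for whatever actually writes a batch to TiKV, and the default thresholds are arbitrary.

```python
from typing import Callable

class WriteBatcher:
    """Buffers key-value mutations and commits them in batches."""

    def __init__(self, commit: Callable[[list[tuple[bytes, bytes]]], None],
                 max_count: int = 512, max_bytes: int = 1 << 20):
        self.commit = commit
        self.max_count = max_count
        self.max_bytes = max_bytes
        self.buf: list[tuple[bytes, bytes]] = []
        self.size = 0

    def put(self, key: bytes, value: bytes) -> None:
        self.buf.append((key, value))
        self.size += len(key) + len(value)
        if len(self.buf) >= self.max_count or self.size >= self.max_bytes:
            self.flush()

    def flush(self) -> None:
        """Commit whatever is buffered; call once more at the end of an import."""
        if self.buf:
            self.commit(self.buf)
            self.buf = []   # start a fresh list so committed batches stay intact
            self.size = 0
```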

Project address: https://github.com/tidb-incubator/hugegraph-on-tikv

Doris Connector

Using TiDB as the data source, the project provides Doris with a native connector that opens up the data flow for TP-to-AP scenarios. It supports DML/DDL synchronization and filtering data by specified conditions. The project is currently about 70% complete.

Design ideas

Stream Load: An independent service is built for TiDB that periodically reads and parses TiDB binlog files, assembles the data rows into CSV files, and imports them into Doris via Stream Load.

Routine Load: TiDB's Drainer synchronizes the binlog to Kafka, and Doris consumes it through a newly added TiDB Binlog data format.

TiDB native protocol synchronization: Implement the TiDB replica synchronization protocol in Doris so that Doris acts as a node in the TiDB cluster.
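For the Stream Load approach, the key step is turning parsed binlog rows into a delimited file. A minimal sketch, assuming rows arrive as Python dicts and that NULL is written as `\N` (our understanding of Doris's CSV convention); `rows_to_csv` is a hypothetical name, the upload request would also need to declare the column separator it uses, and this sketch covers inserts only, not updates or deletes.

```python
import csv
import io

def rows_to_csv(rows, columns) -> bytes:
    """Serialize parsed binlog rows (dicts) into CSV bytes for a Stream Load request."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        # Missing or None values become \N, assumed to mean NULL on the Doris side.
        writer.writerow(["\\N" if row.get(c) is None else row[c] for c in columns])
    return buf.getvalue().encode("utf-8")
```

Values that contain the separator itself get quoted by the `csv` module, so whether the target Doris version accepts quoted fields is worth checking before relying on this.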

Follow-up plan:

The project will keep iterating based on real user scenarios to make the data processing pipeline smoother, and it will later be merged into the Doris main branch.

Project address: https://github.com/apache/incubator-doris

This edition of Hacking Camp closed with six excellent project defenses, but building the ecosystem is a long-term effort. We will keep supporting these outstanding ecosystem projects so that they stay vibrant. If you are interested in these projects, watch for our follow-up posts, in which the founding teams will explain, from an application perspective, what each project brings to the TiDB ecosystem. More dedicated Meetups are also in the works, so stay tuned!

From the Ti planet to the vast universe, we use hacking to connect an ever wider ecosystem. TiDB Hackathon 2021 is about to begin, so come explore the mysteries of database technology with us!

PingCAP
