On November 7, the Hacking Camp 2021 ecosystem project defense was held. It was co-hosted by the TiDB Community and Jingwei China and sponsored by Chuxin Capital, Mingshi Capital, Jiyuan Capital, and JuiceFS. The teams presented the phased results of their projects and their outlook for future work.
Some Hacking Camp projects are star projects from TiDB Hackathon, and others are new ideas from ecosystem partners. This edition of Hacking Camp takes the ecosystem as its theme and helps partners incubate their projects. The six participating projects have largely met their goals. After graduation, the teams will keep improving the related features and release more stable versions, and the mentors will continue to guide the projects and help polish them.
The projects that took part in the Hacking Camp defense are:
JuiceFS, a distributed POSIX file system with TiKV as the metadata engine
ServerlessDB for HTAP, which provides a serverless database service based on TiDB
TiDB for PostgreSQL, which improves PostgreSQL compatibility on TiDB
TiBigData, TiDB's one-stop solution for big data
HugeGraph with TiKV as backend storage
Doris Connector, which uses TiDB as the upstream data source
The judges scored each project on its completion, application value, contribution to the TiDB ecosystem, and the quality of its defense. In the end, ServerlessDB for HTAP received unanimously high scores from the judges and won both the "Outstanding Graduate" and "Best Application" awards.
Special thanks to the following reviewers:
Mingshi Capital Executive Manager Xu Zhihao, Flomesh CTO & Co-founder Liu Yang, TiDB Team Tech Leader Wang Cong, PingCAP R&D Director Zhang Jian, TiKV Maintainer Li Jianjun
Let's take a look at each project's graduation results~
JuiceFS:
JuiceFS is a cloud-native distributed POSIX file system. With TiKV as its metadata engine, JuiceFS can scale to tens of billions of files and exabytes of stored data while keeping latency and stability steady at large scale. In the metadata operation performance test, the average time taken by the TiKV engine is about 2 to 4 times that of Redis and slightly better than a local MySQL.
The main features have been developed and released in v0.16, and they pass the pjdfstest test suite. Existing users already run it in testing and production environments. Going forward, JuiceFS will treat TiKV as the first-choice metadata engine for large-scale production environments and will actively adopt new TiKV features while preserving compatibility.
ServerlessDB for HTAP
The ultimate goal of the project is to turn the cloud database service into a black box, so that application developers only need to focus on how the business translates into SQL. Users no longer have to worry about data volume, business load, whether a SQL statement is AP or TP, and similar concerns.
Development content
Business load module:
The business load module evaluates whether current service resources match the current business load and builds a business load model that drives scale-out and scale-in decisions.
Serverless module:
The Serverless module checks the CPU usage of all compute nodes and the underlying storage capacity in real time and triggers scaling of compute and storage resources.
Database middleware:
The middleware decouples user connections from the back-end database service nodes, so that even if users use connection pools, the middleware can balance traffic across all new nodes after a scale-out.
Rule system:
The rule system can pin resource allocation within a specific time range. By setting rules, resources can be allocated ahead of an expected traffic increase.
Serverless service orchestration module:
The service orchestration module creates and releases TiDB clusters and dynamically scales TiDB components out and in. It also manages k8s local disks, which solves the problem that cloud disks are unavailable in private deployments.
An admission webhook was developed so that, when a TiDB component is scaled in, its record in the middleware registry is deleted in advance, which makes the scale-in transparent to users.
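To make the interaction between the Serverless module and the rule system concrete, here is a minimal Python sketch of a scaling decision. It is not the project's code: the `Rule` type, the `desired_nodes` function, the one-node step size, and the 0.8/0.3 CPU thresholds are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import time
from typing import List


@dataclass
class Rule:
    """Hypothetical rule: keep at least `min_nodes` compute nodes in a daily window."""
    start: time
    end: time
    min_nodes: int


def desired_nodes(cpu_usages: List[float], now: time, rules: List[Rule],
                  high: float = 0.8, low: float = 0.3) -> int:
    """Return a target compute-node count (thresholds are illustrative only)."""
    if not cpu_usages:
        raise ValueError("at least one compute node is required")
    current = len(cpu_usages)
    avg = sum(cpu_usages) / current
    target = current
    if avg > high:
        target = current + 1   # scale out when average CPU is high
    elif avg < low and current > 1:
        target = current - 1   # scale in when nodes are mostly idle
    # A matching rule sets a floor, so capacity is ready before traffic arrives.
    for rule in rules:
        if rule.start <= now < rule.end:
            target = max(target, rule.min_nodes)
    return target
```

For example, two idle nodes would normally shrink to one, but a rule covering 09:00 to 11:00 with `min_nodes=4` keeps four nodes allocated during that window regardless of the measured CPU usage.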
Follow-up research and development plan:
Add Hint and rule modules to distinguish TP from AP queries more accurately; this is estimated to cut middleware CPU usage by more than half.
Provide richer load balancing algorithms, such as balancing based on SQL runtime cost.
Add business traffic control to the middleware. If the business load grows faster than serverless scaling can keep up with, back-end services become unstable. Flow control lets the system handle surges in business traffic gracefully.
This project also won the Hacking Camp Outstanding Graduate and Best Application awards~ The judges were evidently impressed by the project's vision and engineering strength. Everyone is welcome to try it out~
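The source does not say which flow control algorithm the middleware will use. As one common option, the following Python sketch shows a token bucket: requests are admitted only while tokens remain, and tokens refill at a fixed rate. The `TokenBucket` class and its parameters are hypothetical, and the clock is passed in explicitly to keep the example deterministic.

```python
class TokenBucket:
    """Illustrative token-bucket limiter (not the project's implementation)."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start with a full bucket
        self.last = 0.0           # timestamp of the previous call, in seconds

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` may proceed."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With `rate=2.0` and `capacity=2.0`, a burst of two requests is admitted immediately, further requests are rejected until tokens refill, and sustained throughput is capped at two requests per second. That gives scaling time to catch up with a surge.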
Project address: https://github.com/tidb-incubator/Serverlessdb-for-HTAP
TiDB for PostgreSQL
The project was initiated by Digital China to make TiDB compatible with PostgreSQL while retaining TiDB's high availability, flexibility, and scalability. It lets users connect existing PostgreSQL clients to TiDB and use PostgreSQL-specific syntax.
Currently completed development:
Modified the DELETE syntax
Added the PostgreSQL-specific RETURNING keyword
Completed the Sysbench_tpcc test over the PgSQL protocol and compared it with native TiDB of the same version
Completed the BenchMarkSQL benchmark over the PgSQL protocol and compared it with native TiDB of the same version
Comparison of Benchmark test results:
In the future, the project plans to support the system catalog table structure, graphical clients, and an abstract protocol layer that allows switching between protocols at any time. Everyone is welcome to join in~
Project address: https://github.com/DigitalChinaOpenSource/TiDB-for-PostgreSQL
TiBigData
TiBigData provides TiDB connectors for various OLAP computing engines; connectors for Flink, Presto, and MapReduce have been implemented. In Hacking Camp, we mainly worked on Flink-related features.
First, unified stream and batch processing. We implemented a snapshot source and a TiCDC streaming source in Flink; combining the two gives TiDB unified stream-batch processing.
Second, data interoperability. We use TiKV's cross-data-center deployment together with the follower read feature of the Flink connector to achieve true interoperability with offline data.
Finally, computation pushdown. All of our connectors support TiKV pushdown operators, which can greatly improve the efficiency of data scanning and computation.
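The idea of combining a snapshot source with a change stream can be sketched as follows. This is a conceptual Python illustration, not TiBigData's Flink API: the `ChangeEvent` shape and the `merge_snapshot_and_changes` function are hypothetical. The point is that a consistent result needs only the snapshot taken at some timestamp plus the changes committed after it.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical change event: (commit_ts, key, value); value None means a delete.
ChangeEvent = Tuple[int, str, Optional[str]]


def merge_snapshot_and_changes(snapshot: Dict[str, str], snapshot_ts: int,
                               changes: Iterable[ChangeEvent]) -> Dict[str, str]:
    """Apply changes committed after `snapshot_ts` on top of the snapshot."""
    table = dict(snapshot)  # copy so the caller's snapshot is not mutated
    for commit_ts, key, value in sorted(changes, key=lambda e: e[0]):
        if commit_ts <= snapshot_ts:
            continue  # already reflected in the snapshot
        if value is None:
            table.pop(key, None)  # delete event
        else:
            table[key] = value    # insert or update event
    return table
```

Skipping events at or before the snapshot timestamp is what lets the batch phase and the streaming phase hand over without applying a change twice or losing one.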
TiBigData core function enhancements:
The general capabilities of the TiDB Java client have been enhanced. We implemented a TiDB encoder whose code is decoupled from TiSpark, so it can be adapted to other OLAP engines, used as a general-purpose tool by anyone who needs it, and referenced by community partners.
Implemented data type conversion tools between Flink/Presto data types and TiDB data types.
Implemented a distributed TiKV client whose API is better suited to distributed computing frameworks.
In the future, we will continue to develop Change Log Write, TiDB x Presto/Trino, Flink State Backend in TiKV, and more. If you are interested, join the community and build with us~
Project address: https://github.com/tidb-incubator/TiBigData
HugeGraph on TiKV
HugeGraph on TiKV suits scenarios that need a large-scale graph database with high read and write performance, and it is a particularly good fit for teams that already operate and maintain TiKV storage.
Realized functions:
Support for a single graph instance
Support for creating, reading, updating, and deleting Schema
Support for importing data with Loader, and for creating, reading, updating, and deleting vertices and edges
Support for kout, kneighbor, and other traversal algorithms, Gremlin queries, and index queries (incomplete)
Demo:
Import the data set [Xinyu City COVID-19 Data Set] and view the resulting graph through the HugeGraph-Hubble interface:
Performance test results:
Import speed (write)
Query by id (random read)
Follow-up plan:
Feature completion
Support advanced features such as multiple graph instances, truncate/clear graph data, monitoring interface metrics, TTL, etc.
Performance optimization
Write performance optimization: commit mode, batch size tuning, etc.
Query performance optimization: data encoding optimization, analysis optimization, etc.
Project address: https://github.com/tidb-incubator/hugegraph-on-tikv
Doris Connector:
The connector uses TiDB as the data source and gives Doris a native connector that opens up the data flow for TP-AP scenarios. It supports DML/DDL synchronization and filtering data by specified conditions. The project is currently about 70% complete.
Design ideas
Stream Load: A standalone service is designed on the TiDB side. It periodically reads and parses TiDB binlog files, assembles the data rows into CSV files, and imports them into Doris through Stream Load.
Routine Load: TiDB's Drainer synchronizes the binlog to Kafka, and Doris implements data synchronization by adding support for the TiDB Binlog data format.
TiDB native protocol synchronization: Implement the TiDB replica synchronization protocol in Doris, so that Doris acts as a node of the TiDB cluster.
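For the Stream Load approach, the step that turns parsed rows into a CSV payload can be sketched in Python as follows. This is an assumption-laden illustration rather than the connector's code: `rows_to_csv` is a hypothetical helper, the input is assumed to be rows already decoded from the binlog, and the `\N` NULL marker is an assumption that should be checked against the Doris Stream Load documentation. Sending the payload to Doris is deliberately left out.

```python
import csv
import io
from typing import Iterable, Sequence


def rows_to_csv(rows: Iterable[Sequence[object]], null_marker: str = "\\N") -> str:
    """Serialize already-parsed rows into one CSV payload for a batch import."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        # Replace Python None with the (assumed) NULL marker before writing.
        writer.writerow([null_marker if v is None else v for v in row])
    return buf.getvalue()
```

Using the standard `csv` module rather than joining strings by hand means that values containing commas or quotes are escaped correctly, which matters when arbitrary user data flows from TiDB into the payload.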
Follow-up planning:
The project will continue to iterate based on real user scenarios to make the data processing pipeline smoother. It will later be merged into the Doris main branch.
Project address: https://github.com/apache/incubator-doris
This edition of Hacking Camp concluded with six wonderful project defenses, but maintaining the ecosystem is a long-term effort. We will continue to provide follow-up support for these excellent ecosystem projects to keep them thriving. Anyone interested in the projects should also watch for our follow-up posts, in which the founding teams will explain, from the application level, the value each project brings to the TiDB ecosystem. More special Meetups are being planned, so stay tuned!
From the Ti planet to the firmament of the universe, we use Hacking to connect a wider ecosystem. TiDB Hackathon 2021 is also about to start; come and explore the mysteries of database technology with us!