How we built a new powerful JSON data type for ClickHouse

  • Update, January 2025: ClickHouse's new JSON implementation was benchmarked against other data stores, with a link to the results.
  • Update, March 2025: A follow-up demonstrates how to accelerate JSON queries to sub-100ms analytical performance regardless of data size.
  • JSON is the lingua franca for handling semi-structured and unstructured data.
  • ClickHouse recognized the importance of seamless JSON support but faced challenges.
  • ClickHouse is a fast analytical database with true column-oriented storage.
  • To get high performance on JSON data, the team implemented true column-oriented storage for JSON as well.
  • Challenges included handling different data types for the same JSON paths and avoiding an avalanche of column files.
  • The new JSON data type was introduced to address these challenges. It supports:

    • Dynamically changing data, including different data types for the same path.
    • High-performance, dense, true column-oriented storage.
    • Scalability, through limits on the number of subcolumns.
    • Tuning, through hints for JSON parsing.
  • The Variant data type is a building block that allows values of different data types to be stored efficiently in the same column.
  • The Dynamic type extends the Variant type: it can store values of any data type and can limit the number of types stored.
  • The new JSON type uses these building blocks and has optional parameters and hints.
  • It stores JSON objects with any structure and reads values using JSON paths as subcolumns.
  • It solves challenges such as preventing an explosion of column files.
  • It supports reading leaf values and subcolumns using special syntax.
  • A special compact format was implemented for serializing discriminators.
  • The new JSON type is experimental and replaces the deprecated Object('json') data type.
  • The JSON roadmap includes enhancements like using JSON key paths in primary keys.
  • The building blocks also pave the way for supporting other semi-structured types.
  • Users can contact ClickHouse Cloud support for private preview access to the new JSON data type.
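
The storage ideas summarized above (one subcolumn per JSON path, mixed types on the same path, and a cap on the number of subcolumns) can be illustrated with a small toy model. This is only a sketch under stated assumptions, not ClickHouse's actual implementation: the names `VariantColumn`, `JSONColumn`, `max_paths`, and `shared` are hypothetical, and using Python type names as discriminators is a simplification chosen for readability.

```python
from collections import defaultdict


def flatten(obj, prefix=""):
    """Yield (dotted_path, leaf_value) pairs for a nested dict."""
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


class VariantColumn:
    """Toy Variant-style column: one dense list per type, plus one
    discriminator per row naming the list that holds the row's value."""

    def __init__(self):
        self.discriminators = []        # per row: type name, or None for no value
        self.typed = defaultdict(list)  # type name -> densely packed values

    def append(self, value):
        if value is None:
            self.discriminators.append(None)
            return
        type_name = type(value).__name__
        self.discriminators.append(type_name)
        self.typed[type_name].append(value)

    def rows(self):
        """Rebuild row order from discriminators and per-type cursors."""
        cursors = defaultdict(int)
        out = []
        for disc in self.discriminators:
            if disc is None:
                out.append(None)
            else:
                out.append(self.typed[disc][cursors[disc]])
                cursors[disc] += 1
        return out


class JSONColumn:
    """Toy JSON column: up to max_paths paths get their own subcolumn;
    any further paths go into a shared per-row structure (assumption)."""

    def __init__(self, max_paths):
        self.max_paths = max_paths
        self.subcolumns = {}  # path -> VariantColumn
        self.shared = []      # per row: {overflow path: value}
        self.num_rows = 0

    def insert(self, obj):
        overflow, seen = {}, set()
        for path, value in flatten(obj):
            if path not in self.subcolumns and len(self.subcolumns) < self.max_paths:
                col = VariantColumn()
                for _ in range(self.num_rows):
                    col.append(None)  # backfill rows inserted before this path appeared
                self.subcolumns[path] = col
            if path in self.subcolumns:
                self.subcolumns[path].append(value)
                seen.add(path)
            else:
                overflow[path] = value
        for path, col in self.subcolumns.items():
            if path not in seen:
                col.append(None)  # this row has no value for that path
        self.shared.append(overflow)
        self.num_rows += 1

    def read(self, path):
        """Read one path across all rows, as if it were a subcolumn."""
        if path in self.subcolumns:
            return self.subcolumns[path].rows()
        return [row.get(path) for row in self.shared]


if __name__ == "__main__":
    col = JSONColumn(max_paths=2)
    col.insert({"a": {"b": 42}, "c": "x"})
    col.insert({"a": {"b": "forty-two"}, "d": 1.5})  # "d" exceeds the cap
    print(col.read("a.b"))  # [42, 'forty-two', None]
    print(col.read("d"))    # [None, 1.5, None]
```

In this sketch, reading `a.b` touches only that path's subcolumn, and the path `d` never creates a new subcolumn once the cap is reached, which is the property that keeps the number of column files bounded.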