很高兴为您带来 Databend 2024 年 10 月的最新更新、新功能和改进!我们希望这些增强功能对您有所帮助,并期待您的反馈。

Databend Cloud:多集群的计算集群

多集群的计算集群会根据工作负载需求自动调整计算资源,添加或移除集群。它通过按需扩展或缩减,确保高并发和性能的同时优化成本。

您可以在创建计算集群时启用多集群功能,并设置计算集群可扩展的最大集群数。有关更多信息,请参阅“多集群计算集群”。

重置 FUSE 引擎选项

现在,您可以使用 ALTER TABLE OPTIONS 命令重置一个表的 FUSE 引擎选项,将其恢复为默认值。

ALTER TABLE [ <database_name>. ]<table_name> UNSET OPTIONS (<options>)

举例:

ALTER TABLE fuse_table UNSET OPTIONS (block_per_segment, data_retention_period_in_hours);

SHOW CREATE TABLE fuse_table;

-[ RECORD 1 ]-----------------------------------
       Table: fuse_table
Create Table: CREATE TABLE fuse_table (
  a INT NULL
) ENGINE=FUSE COMPRESSION='lz4' STORAGE_FORMAT='native'

COPY INTO:新增导出选项

COPY INTO 命令引入了新的导出选项:

  • OVERWRITE:当设置为 true 时,将覆盖目标路径中同名的现有文件。注意:OVERWRITE = true 需要 USE_RAW_PATH = trueINCLUDE_QUERY_ID = false
  • INCLUDE_QUERY_ID:当设置为 true 时,导出的文件名中将包含一个唯一的 UUID。
  • USE_RAW_PATH:当设置为 true 时,将使用用户提供的精确路径(包括完整文件名)进行数据导出。如果设置为 false,用户必须提供目录路径。

处理无效日期和时间值

Databend 会自动将无效的日期或时间戳值转换为其最小的有效等价值:日期转换为 1000-01-01,时间戳转换为 1000-01-01 00:00:00,以确保在处理超出范围或格式不正确的日期和时间戳时的一致性。

-- 尝试将最大日期加一天,超出有效范围。
-- 结果: 返回 DateMIN (1000-01-01) 而不是错误。
SELECT ADD_DAYS(TO_DATE('9999-12-31'), 1);

┌────────────────────────────────────┐
│ add_days(to_date('9999-12-31'), 1) │
├────────────────────────────────────┤
│ 1000-01-01                         │
└────────────────────────────────────┘

-- 尝试从最小日期减去一分钟,这将无效。
-- 结果: 返回 DateMIN (1000-01-01 00:00:00),确保结果的稳定性。
SELECT SUBTRACT_MINUTES(TO_DATE('1000-01-01'), 1);

┌────────────────────────────────────────────┐
│ subtract_minutes(to_date('1000-01-01'), 1) │
├────────────────────────────────────────────┤
│ 1000-01-01 00:00:00                        │
└────────────────────────────────────────────┘

新日期函数:DATE_DIFF

DATE_DIFF 函数根据指定的时间单位计算两个日期或时间戳之间的差异。如果 <end_date><start_date> 之后,结果为正;如果在之前,结果为负。

SELECT DATE_DIFF(HOUR, YESTERDAY(), TODAY());

┌───────────────────────────────────────┐
│ DATE_DIFF(HOUR, yesterday(), today()) │
├───────────────────────────────────────┤
│                                    24 │
└───────────────────────────────────────┘

新聚合函数:MODE

MODE 函数返回在一组值中出现频率最高的值。

SELECT MONTH(sale_date) AS month, MODE(product_id) AS most_sold_product
FROM sales
GROUP BY month
ORDER BY month;

┌─────────────────────────────────────┐
│      month      │ most_sold_product │
├─────────────────┼───────────────────┤
│               1 │               101 │
│               2 │               102 │
└─────────────────────────────────────┘

新增JSON 函数

我们引入了一系列 JSON 函数,以便您更轻松地处理JSON 数据:

  • JSON_ARRAY_DISTINCT:移除 JSON 数组中的重复元素,返回仅包含唯一元素的数组。
SELECT JSON_ARRAY_DISTINCT('["apple", "banana", "apple", "orange", "banana"]'::VARIANT);

-[ RECORD 1 ]-----------------------------------
json_array_distinct('["apple", "banana", "apple", "orange", "banana"]'::VARIANT): ["apple","banana","orange"]
  • JSON_ARRAY_EXCEPT:返回一个新的 JSON 数组,包含第一个 JSON 数组中不在第二个 JSON 数组中的元素。
SELECT JSON_ARRAY_EXCEPT(
    '["apple", "banana", "orange"]'::JSON,  
    '["banana", "grapes"]'::JSON         
);

-[ RECORD 1 ]-----------------------------------
json_array_except('["apple", "banana", "orange"]'::VARIANT, '["banana", "grapes"]'::VARIANT): ["apple","orange"]

-- Return an empty array because all elements in the first array are present in the second array.
SELECT json_array_except('["apple", "banana", "orange"]'::VARIANT, '["apple", "banana", "orange"]'::VARIANT)

-[ RECORD 1 ]-----------------------------------
json_array_except('["apple", "banana", "orange"]'::VARIANT, '["apple", "banana", "orange"]'::VARIANT): []
  • JSON_ARRAY_FILTER:根据指定的 Lambda 表达式过滤 JSON 数组中的元素,仅返回满足条件的元素。
SELECT JSON_ARRAY_FILTER(
    ['apple', 'banana', 'avocado', 'grape']::JSON,
    d -> d::String LIKE 'a%'
);

-[ RECORD 1 ]-----------------------------------
json_array_filter(['apple', 'banana', 'avocado', 'grape']::VARIANT, d -> d::STRING LIKE 'a%'): ["apple","avocado"]
  • JSON_ARRAY_INSERT:在指定索引处向 JSON 数组中插入一个值,并返回更新后的 JSON 数组。
-- The new element is inserted at position 0 (the beginning of the array), shifting all original elements to the right
SELECT JSON_ARRAY_INSERT('["task1", "task2", "task3"]'::VARIANT, 0, '"new_task"'::VARIANT);

-[ RECORD 1 ]-----------------------------------
json_array_insert('["task1", "task2", "task3"]'::VARIANT, 0, '"new_task"'::VARIANT): ["new_task","task1","task2","task3"]

-- The new element is inserted at position 1, between task1 and task2
SELECT JSON_ARRAY_INSERT('["task1", "task2", "task3"]'::VARIANT, 1, '"new_task"'::VARIANT);

-[ RECORD 1 ]-----------------------------------
json_array_insert('["task1", "task2", "task3"]'::VARIANT, 1, '"new_task"'::VARIANT): ["task1","new_task","task2","task3"]

-- If the index exceeds the length of the array, the new element is appended at the end of the array
SELECT JSON_ARRAY_INSERT('["task1", "task2", "task3"]'::VARIANT, 6, '"new_task"'::VARIANT);

-[ RECORD 1 ]-----------------------------------
json_array_insert('["task1", "task2", "task3"]'::VARIANT, 6, '"new_task"'::VARIANT): ["task1","task2","task3","new_task"]

-- The new element is inserted just before the last element (task3)
SELECT JSON_ARRAY_INSERT('["task1", "task2", "task3"]'::VARIANT, -1, '"new_task"'::VARIANT);

-[ RECORD 1 ]-----------------------------------
json_array_insert('["task1", "task2", "task3"]'::VARIANT, - 1, '"new_task"'::VARIANT): ["task1","task2","new_task","task3"]

-- Since the negative index exceeds the array’s length, the new element is inserted at the beginning
SELECT JSON_ARRAY_INSERT('["task1", "task2", "task3"]'::VARIANT, -6, '"new_task"'::VARIANT);

-[ RECORD 1 ]-----------------------------------
json_array_insert('["task1", "task2", "task3"]'::VARIANT, - 6, '"new_task"'::VARIANT): ["new_task","task1","task2","task3"]
  • JSON_ARRAY_INTERSECTION:返回两个 JSON 数组之间的共同元素。
-- Find the intersection of two JSON arrays
SELECT json_array_intersection('["Electronics", "Books", "Toys"]'::JSON, '["Books", "Fashion", "Electronics"]'::JSON);

-[ RECORD 1 ]-----------------------------------
json_array_intersection('["Electronics", "Books", "Toys"]'::VARIANT, '["Books", "Fashion", "Electronics"]'::VARIANT): ["Electronics","Books"]

-- Find the intersection of the result from the first query with a third JSON array using an iterative approach
SELECT json_array_intersection(
    json_array_intersection('["Electronics", "Books", "Toys"]'::JSON, '["Books", "Fashion", "Electronics"]'::JSON),
    '["Electronics", "Books", "Clothing"]'::JSON
);

-[ RECORD 1 ]-----------------------------------
json_array_intersection(json_array_intersection('["Electronics", "Books", "Toys"]'::VARIANT, '["Books", "Fashion", "Electronics"]'::VARIANT), '["Electronics", "Books", "Clothing"]'::VARIANT): ["Electronics","Books"]
  • JSON_ARRAY_OVERLAP:检查两个 JSON 数组之间是否存在重叠,如果有共同元素则返回 true,否则返回 false
SELECT json_array_overlap(
    '["apple", "banana", "cherry"]'::JSON,  
    '["banana", "kiwi", "mango"]'::JSON
);

-[ RECORD 1 ]-----------------------------------
json_array_overlap('["apple", "banana", "cherry"]'::VARIANT, '["banana", "kiwi", "mango"]'::VARIANT): true


SELECT json_array_overlap(
    '["grape", "orange"]'::JSON,  
    '["apple", "kiwi"]'::JSON     
);

-[ RECORD 1 ]-----------------------------------
json_array_overlap('["grape", "orange"]'::VARIANT, '["apple", "kiwi"]'::VARIANT): false
  • JSON_ARRAY_REDUCE:通过应用指定的 Lambda 表达式,将 JSON 数组简化为单个值。
SELECT JSON_ARRAY_REDUCE(
    [2, 3, 4]::JSON, 
    (acc, d) -> acc::Int * d::Int
);

-[ RECORD 1 ]-----------------------------------
json_array_reduce([2, 3, 4]::VARIANT, (acc, d) -> acc::Int32 * d::Int32): 24
  • JSON_ARRAY_TRANSFORM(别名:JSON_ARRAY_APPLYJSON_ARRAY_MAP):使用指定的转换 Lambda 表达式转换 JSON 数组的每个元素。
SELECT JSON_ARRAY_TRANSFORM(
    [1, 2, 3, 4]::JSON,
    data -> (data::Int * 10)
);

-[ RECORD 1 ]-----------------------------------
json_array_transform([1, 2, 3, 4]::VARIANT, data -> data::Int32 * 10): [10,20,30,40]
  • JSON_OBJECT_PICK:创建一个新的 JSON 对象,仅包含输入 JSON 对象中指定的键。如果指定的键在输入对象中不存在,则在结果中省略该键。
-- Pick a single key:

SELECT json_object_pick('{"a":1,"b":2,"c":3}'::VARIANT, 'a');
-- Result: {"a":1}

-- Pick multiple keys:

SELECT json_object_pick('{"a":1,"b":2,"d":4}'::VARIANT, 'a', 'b');
-- Result: {"a":1,"b":2}


-- Pick with non-existent key (non-existent keys are ignored):

SELECT json_object_pick('{"a":1,"b":2,"d":4}'::VARIANT, 'a', 'c');
-- Result: {"a":1}
  • JSON_OBJECT_DELETE:从 JSON 对象中删除指定的键,并返回修改后的对象。如果指定的键在对象中不存在,则被忽略。
-- Delete a single key:

SELECT json_object_delete('{"a":1,"b":2,"c":3}'::VARIANT, 'a');
-- Result: {"b":2,"c":3}


-- Delete multiple keys:

SELECT json_object_delete('{"a":1,"b":2,"d":4}'::VARIANT, 'a', 'c');
-- Result: {"b":2,"d":4}


-- Delete a non-existent key (key is ignored):

SELECT json_object_delete('{"a":1,"b":2}'::VARIANT, 'x');
-- Result: {"a":1,"b":2}

关于 Databend

Databend 是一款开源、弹性、低成本,基于对象存储也可以做实时分析的新式数仓。期待您的关注,一起探索云原生数仓解决方案,打造新一代开源 Data Cloud。

👨‍💻‍ Databend Cloud:databend.cn

📖 Databend 文档:docs.databend.cn/

💻 Wechat:Databend

✨ GitHub:github.com/datafuselab…


databend
20 声望10 粉丝

Databend 旨在成为一个 开源、弹性、可靠 的无服务器数仓,查询快如闪电,与 弹性、简单、低成本 的云服务有机结合。数据云的构建,从未如此简单!