executing longer than distributed_ddl_task_timeout (=180) seconds. There are 1 unfinished host?

Posted
2024-07-22, Beijing

Updated
2024-07-22

Running a DDL statement in ClickHouse fails with: executing longer than distributed_ddl_task_timeout (=180) seconds. There are 1 unfinished hosts (0 of them are currently active), they are going to execute the query in background. (TIMEOUT_EXCEEDED)

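I set up ClickHouse and ZooKeeper on a server with Docker, and creating a table from a local client connection fails with a timeout. I think it is most likely not a network problem: because of port constraints, I changed the exposed ClickHouse port 9000 in docker compose to 8999, so I suspect that some configuration change is wrong, but I have not found which one.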

Full ClickHouse error message:
DB::Exception: Watching task /clickhouse/task_queue/ddl/query-0000000006 is executing longer than distributed_ddl_task_timeout (=180) seconds. There are 1 unfinished hosts (0 of them are currently active), they are going to execute the query in background. (TIMEOUT_EXCEEDED)

DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000c6cf19b in /usr/bin/clickhouse
DB::Exception::Exception<String&, long&, unsigned long&, unsigned long&>(int, FormatStringHelperImpl<std::type_identity<String&>::type, std::type_identity<long&>::type, std::type_identity<unsigned long&>::type, std::type_identity<unsigned long&>::type>, String&, long&, unsigned long&, unsigned long&) @ 0x0000000011717209 in /usr/bin/clickhouse
DB::DDLQueryStatusSource::generate() @ 0x0000000011715cd5 in /usr/bin/clickhouse
DB::ISource::tryGenerate() @ 0x00000000126d3335 in /usr/bin/clickhouse
DB::ISource::work() @ 0x00000000126d2d83 in /usr/bin/clickhouse
DB::ExecutionThreadContext::executeTask() @ 0x00000000126ebd3a in /usr/bin/clickhouse
DB::PipelineExecutor::executeStepImpl(unsigned long, std::atomic<bool>*) @ 0x00000000126e2750 in /usr/bin/clickhouse
DB::PipelineExecutor::execute(unsigned long, bool) @ 0x00000000126e1960 in /usr/bin/clickhouse
void std::__function::__policy_invoker<void ()>::__call_impl<std::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<true>::ThreadFromGlobalPoolImpl<DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0>(DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0&&)::'lambda'(), void ()>>(std::__function::__policy_storage const*) @ 0x00000000126ef6bd in /usr/bin/clickhouse
void* std::__thread_proxy[abi:v15000]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void ThreadPoolImpl<std::thread>::scheduleImpl<void>(std::function<void ()>, Priority, std::optional<unsigned long
, bool)::'lambda0'()>>(void*) @ 0x000000000c7b94f8 in /usr/bin/clickhouse
? @ 0x00007f95089b5609 in ?
? @ 0x00007f95088da353 in ?

Contents of the ZooKeeper node /clickhouse/task_queue/ddl/query-0000000006:
version: 5
query: CREATE TABLE default.events_local UUID \'23da2be5-24cd-4ff0-a185-b0c361864946\' ON CLUSTER posthog (event_date Date, event_type Int32, article_id Int32, title String) ENGINE = ReplicatedMergeTree(\'/clickhouse/repl-3s3r/posthog/tables/0/default/events_local\', \'chi-repl-3s3r-posthog-0-1\') PARTITION BY toYYYYMM(event_date) ORDER BY (event_type, article_id)
hosts: ['xxxx:8999']
initiator: fe1f4b48cb63:8999
settings: load_balancing = 'random', distributed_ddl_task_timeout = 5, stream_poll_timeout_ms = 50, max_memory_usage = 10000000000, allow_nondeterministic_mutations = true
tracing: 00000000-0000-0000-0000-000000000000

As you can see, the port mapping is correct for both hosts and initiator. The IP in hosts is the IP address of the server host.
The initiator value is the ID of the ClickHouse container.

docker compose configuration:

clickhouse:
    extends:
        file: docker-compose.base.yml
        service: clickhouse
    ports:
        - '8123:8123'
        - '8999:8999' # port changed here
        - '9440:9440'
        - '9009:9009'
    volumes:
        - ./posthog/idl:/idl
        - ./docker/clickhouse/docker-entrypoint-initdb.d:/docker-entrypoint-initdb.d
        - ./docker/clickhouse/config.xml:/etc/clickhouse-server/config.xml
        - ./docker/clickhouse/users-dev.xml:/etc/clickhouse-server/users.xml
    extra_hosts:
        - 'host.docker.internal:host-gateway'
    depends_on:
        - kafka
        - zookeeper

Ordinary SELECT queries work fine; only table creation times out.

clickhouse docker zookeeper

1.3k views

AI Bot (BETA)

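Based on the information you provided, the error means that a distributed DDL task (such as creating a table) ran longer in the ClickHouse cluster than the configured distributed_ddl_task_timeout. The default limit is 180 seconds (3 minutes), which matches the value in your error message. The ZooKeeper task entry you posted, however, shows distributed_ddl_task_timeout = 5 (seconds), which is quite short and may not be enough for many DDL operations.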
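Because the error text and the task entry report different timeouts (180 and 5), it may help to read both mechanically rather than by eye. The following is a minimal Python sketch, not a ClickHouse tool: the function names `parse_ddl_timeout_error` and `parse_task_settings` are hypothetical, and the parsing assumes the exact message and `settings:` line formats quoted in the question.

```python
import re

# Pattern for the TIMEOUT_EXCEEDED text quoted in the question.
# Assumption: the wording matches the message shown in this thread.
ERROR_RE = re.compile(
    r"distributed_ddl_task_timeout \(=(?P<timeout>\d+)\) seconds\. "
    r"There are (?P<unfinished>\d+) unfinished hosts "
    r"\((?P<active>\d+) of them are currently active\)"
)


def parse_ddl_timeout_error(message: str) -> dict:
    """Extract the timeout and the host counters from the error text."""
    match = ERROR_RE.search(message)
    if match is None:
        raise ValueError("not a distributed DDL timeout message")
    return {name: int(value) for name, value in match.groupdict().items()}


def parse_task_settings(settings_line: str) -> dict:
    """Parse a 'settings: a = 1, b = 2' line into a dict of strings.

    Limitation: values that themselves contain commas are not handled.
    """
    _, _, body = settings_line.partition(":")
    settings = {}
    for item in body.split(","):
        name, sep, value = item.partition("=")
        if sep:
            settings[name.strip()] = value.strip().strip("'")
    return settings


# Values copied from the question.
ERROR_TEXT = (
    "Watching task /clickhouse/task_queue/ddl/query-0000000006 is executing "
    "longer than distributed_ddl_task_timeout (=180) seconds. There are 1 "
    "unfinished hosts (0 of them are currently active), they are going to "
    "execute the query in background. (TIMEOUT_EXCEEDED)"
)
SETTINGS_LINE = (
    "settings: load_balancing = 'random', distributed_ddl_task_timeout = 5, "
    "stream_poll_timeout_ms = 50, max_memory_usage = 10000000000, "
    "allow_nondeterministic_mutations = true"
)

if __name__ == "__main__":
    print(parse_ddl_timeout_error(ERROR_TEXT))
    print(parse_task_settings(SETTINGS_LINE)["distributed_ddl_task_timeout"])
```

If the two timeouts differ, as they do here, the pasted task entry and the failing query may not have been issued with the same settings, so it is worth confirming which value was actually in effect. The counters are also informative: one unfinished host with zero active hosts suggests that no server was working on the task when the wait expired, which a longer timeout alone would probably not change.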

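Here are several possible solutions:

1. Increase distributed_ddl_task_timeout. Given the 5 seconds shown in your task entry, you may need a more reasonable value, such as 180 seconds or more, depending on how long you expect DDL operations to take. Note that this is a query-level setting, so it is normally set in a settings profile (users.xml in your setup) or per query rather than in the server section of config.xml.
2. Check the cluster state. Make sure every node in the ClickHouse cluster is healthy and that the nodes can reach each other over the network. A node that is unreachable or slow to respond can cause a DDL operation to time out.
3. Check the DDL query. See whether the DDL statement is very complex or moves a large amount of data. Large or complex DDL operations can take longer to finish.
4. Check resource usage. Make sure the ClickHouse server has enough CPU, memory, and disk I/O for the DDL operation. A resource shortage can slow execution past the timeout.
5. Read the logs. The ClickHouse log files may contain more detail about the failed DDL operation and hints as to why it timed out.
6. Check the ZooKeeper state. DDL operations are queued and coordinated through ZooKeeper, so the ZooKeeper ensemble must also be healthy and running normally.
7. Check the ON CLUSTER clause. If the DDL query uses ON CLUSTER, make sure the named cluster is configured correctly and that all of its nodes can communicate with each other.
8. Consider that DDL can finish asynchronously. Even after a timeout error, the query may still complete successfully in the background. You can check later whether the table was created, or use the system tables to track the status of the DDL operation.
9. Restart the ClickHouse service. A restart sometimes clears a temporary problem or an inconsistent state.
10. Check the Docker configuration. Make sure the Docker containers and the Docker Compose configuration are sound, especially the network and port mapping settings. You have already changed the port mapping, so also confirm that other network settings (such as the Docker bridge and firewall rules) do not interfere with communication inside the ClickHouse cluster.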
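The checks on cluster configuration and on background DDL status can be turned into concrete queries. Below is a minimal Python sketch that only builds the SQL text to paste into clickhouse-client; it does not connect to a server. The helper names `quote` and `ddl_diagnostic_queries` are hypothetical, the cluster, entry, and table names are taken from the question, and it assumes your ClickHouse version provides the `system.distributed_ddl_queue` table.

```python
def quote(value: str) -> str:
    """Quote a string literal for ClickHouse SQL (escapes backslash and quote)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def ddl_diagnostic_queries(cluster: str, entry: str, table: str) -> dict:
    """Build read-only queries for inspecting a stuck ON CLUSTER DDL task.

    `table` is inserted as-is, so pass only a trusted identifier.
    """
    return {
        # How the server sees each host of the cluster; is_local shows
        # whether this server treats the address as itself.
        "cluster_hosts": (
            "SELECT host_name, host_address, port, is_local "
            "FROM system.clusters WHERE cluster = " + quote(cluster)
        ),
        # Per-host status of the queued DDL task.
        "ddl_queue": (
            "SELECT * FROM system.distributed_ddl_queue "
            "WHERE entry = " + quote(entry)
        ),
        # Whether the table was created in the background after all.
        "table_exists": "EXISTS TABLE " + table,
    }


if __name__ == "__main__":
    queries = ddl_diagnostic_queries(
        "posthog", "query-0000000006", "default.events_local"
    )
    for name, sql in queries.items():
        print(f"-- {name}\n{sql};")
```

Since the error reports zero active hosts, the `is_local` column may be the most useful place to start: if no row for the posthog cluster is marked local on the server that should run the task, the host and port in the cluster definition may not match how the container identifies itself. That is a hypothesis to verify, not a confirmed diagnosis.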
