June, the height of summer, marks the first anniversary of the eKuiper project's donation to the LF Edge Foundation. At the beginning of June, the project passed its first annual review at the foundation and set the goal of advancing to Stage 2 within the next year. We sincerely thank all community contributors, partners, and users, and look forward to more partners joining us in building the community in the coming year.

Development has also made good progress. At the beginning of the month we released the minor version 1.5.1, which mainly fixes problems reported by users. For version 1.6.0, we have finished upgrading the offline cache and retransmission mechanism, making it better suited to weak-network scenarios in edge deployments, where the edge-cloud connection is easily lost. We have also added several pieces of SQL syntax support, including IN/NOT IN expressions and ORDER BY on expressions and aliases, which make it easier to write more complex filtering and sorting logic. Finally, for the visual drag-and-drop capability, part of the back-end API has now been verified.

Offline caching and retransmission

In the era of big data, cloud-edge collaboration is the mainstream computing model. Some of the results computed at the edge need to be sent to the cloud for further integration. However, the network connection between edge and cloud is often unstable and fails from time to time. As an edge stream processing engine, eKuiper often runs rules that write their results to external systems, especially remote ones. In this case, weak network environments must be handled: during failures such as a network disconnection, data must be cached and then resent after reconnection.

Previously, eKuiper supported sink caching only to a limited extent. It provided a global configuration option to turn caching on, plus a system-level or rule-level setting for the interval at which the in-memory cache was serialized. However, the cache lived only in memory, with the DB holding a mirror of it, and there was no well-defined retransmission policy. In June, we reworked the cache mechanism: the cache is now kept both in memory and on disk, which increases its capacity, and eKuiper continuously checks whether the failure has recovered and resends cached data automatically, without the rule having to be restarted.

Process

Caching only happens in the sink, because that is the only place where data is sent outside eKuiper. Each sink can configure its own caching mechanism, but the caching process is the same for every sink. If caching is enabled, all sink events go through two phases: first, everything is saved to the cache; then the cached item is deleted after an ack is received.

  • Error detection: After a failed send, the sink should identify recoverable failures (such as network errors) by returning a specific error type. This produces a failed ack, so the cached data is preserved. For successful sends or unrecoverable errors, a successful ack is sent and the cached data is cleared.
  • Cache mechanism: Cached data is kept in memory first. Once the memory threshold is exceeded, subsequent data is saved to disk. Once the disk cache exceeds the disk storage threshold, the cache starts rotating: the oldest data in memory is discarded and the oldest data on disk is loaded in its place.
  • Retransmission strategy: If an ack is still pending, wait for a successful ack before sending the next cached item. Otherwise, when new data arrives, the first item in the cache is sent to probe the network condition. If the ack is successful, all cached data (memory + disk) is sent in order, one item after another. A sending interval can be defined for this chained sending to prevent message storms.
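
The three steps above can be sketched in a few lines of Python. This is only an illustrative toy model, not eKuiper's implementation: the class `SinkCache`, the function `send_with_cache`, and the `RecoverableError` type are hypothetical names, the "disk" tier is just a second in-memory queue, and a raised exception stands in for a failed ack.

```python
import time
from collections import deque


class RecoverableError(Exception):
    """Hypothetical error type a sink raises for failures worth retrying (e.g. network loss)."""


class SinkCache:
    """Toy two-tier cache: a bounded memory queue backed by a bounded 'disk' queue.

    Assumes memory_threshold >= 1.
    """

    def __init__(self, memory_threshold, max_disk_cache):
        self.memory_threshold = memory_threshold
        self.max_disk_cache = max_disk_cache
        self.memory = deque()  # oldest items, ready for immediate resend
        self.disk = deque()    # stand-in for disk pages; a real cache would persist these

    def add(self, item):
        if len(self.memory) < self.memory_threshold and not self.disk:
            self.memory.append(item)
            return
        self.disk.append(item)
        if len(self.disk) > self.max_disk_cache:
            # Rotation: discard the oldest in-memory item, load the oldest disk item.
            self.memory.popleft()
            self.memory.append(self.disk.popleft())

    def peek(self):
        return self.memory[0] if self.memory else None

    def ack_success(self):
        # Delete the acknowledged item, then refill memory from disk to keep order.
        self.memory.popleft()
        if self.disk:
            self.memory.append(self.disk.popleft())

    def __len__(self):
        return len(self.memory) + len(self.disk)


def send_with_cache(cache, item, send, resend_interval=0.0):
    """Cache the new item, then send cached items in order until a recoverable failure."""
    cache.add(item)
    while len(cache) > 0:
        try:
            send(cache.peek())
        except RecoverableError:
            return  # failed ack: keep the cache, probe again when new data arrives
        except Exception:
            pass    # unrecoverable error: acked anyway so the item is dropped
        cache.ack_success()
        if resend_interval > 0 and len(cache) > 0:
            time.sleep(resend_interval)  # pace chained sends to avoid a message storm
```

In this sketch the first cached item doubles as the network probe: as long as `send` raises `RecoverableError`, nothing is removed from the cache, and the first successful send starts the chained resend of everything that follows.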

Configuration

The sink cache can be configured at two levels. The global configuration in etc/kuiper.yaml defines the default behavior for all rules. Each rule can also define settings at the sink level, which override the defaults.

  • enableCache: Whether to enable the sink cache. Cache storage follows the metadata store configuration defined in etc/kuiper.yaml.
  • memoryCacheThreshold: The number of messages to cache in memory. For performance reasons, the oldest cached messages are kept in memory so that they can be resent immediately once the failure recovers. Data held here is lost on failures such as a power outage.
  • maxDiskCache: The maximum number of messages to cache on disk. The disk cache is FIFO. If the disk cache is full, the oldest page of messages is loaded into the memory cache, replacing the old memory cache.
  • bufferPageSize: Buffer pages are the unit of batched reads and writes to disk, which avoids frequent IO. If eKuiper crashes because of a hardware or software error while a page is not yet full, the last page, which has not been written to disk, is lost.
  • resendInterval: The interval between resent messages after failure recovery, to prevent message storms.
  • cleanCacheAtStop: Whether to clean all caches when the rule is stopped, to prevent a large number of expired messages from being resent when the rule is restarted. If it is not set to true, the in-memory cache is stored to disk when the rule is stopped. Otherwise, both the memory and disk caches are cleaned up.
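
The way the two levels combine can be sketched as a simple merge. The option names below come from the list above; everything else is an assumption made for illustration: the numeric values are placeholders rather than eKuiper's documented defaults, and `effective_sink_config` and the `server` property are hypothetical.

```python
# Stand-in for the global defaults in etc/kuiper.yaml.
# The values are illustrative placeholders, not eKuiper's real defaults.
GLOBAL_SINK_DEFAULTS = {
    "enableCache": False,
    "memoryCacheThreshold": 1024,
    "maxDiskCache": 1024000,
    "bufferPageSize": 256,
    "resendInterval": 0,
    "cleanCacheAtStop": False,
}


def effective_sink_config(global_defaults, sink_props):
    """Return the cache settings for one sink: sink-level values override global defaults."""
    merged = dict(global_defaults)
    # Only cache-related keys are taken from the sink; other sink properties are ignored here.
    merged.update({k: v for k, v in sink_props.items() if k in global_defaults})
    return merged


# Hypothetical sink properties from a rule: caching is switched on for this sink only.
mqtt_sink = {"server": "tcp://broker.example:1883", "enableCache": True, "resendInterval": 100}
cfg = effective_sink_config(GLOBAL_SINK_DEFAULTS, mqtt_sink)
```

Here `cfg` enables the cache and uses the sink's own `resendInterval`, while the thresholds and page size fall back to the global values.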

The code for this feature has been merged into the 1.6.0 branch ( https://github.com/lf-edge/ekuiper/tree/1.6.0 ). Anyone interested can build it and try it out.

List filtering

In a rule engine, we often need to check whether a value is in a list in order to trigger the corresponding action. In standard SQL, this kind of filtering is usually written with IN/NOT IN expressions. This month, we implemented support for the IN operator. Two forms are supported:

  1. As in standard SQL, the right-hand side is a parenthesized list of one or more expressions.

     expression [NOT] IN (expression2,...n)
  2. Complex types and schemaless data are common in eKuiper use cases, so an expression can also be used directly as the right-hand operand, provided that it evaluates to an array.

     expression [NOT] IN arrayExpression
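
The meaning of the two forms can be sketched in Python. This models only the membership test, not eKuiper's SQL engine: `sql_in`, the sample message, and its field names are hypothetical, and NULL handling is left out.

```python
def sql_in(value, candidates, negate=False):
    """Evaluate `value [NOT] IN candidates` for an already-evaluated list of candidates."""
    if not isinstance(candidates, (list, tuple)):
        raise TypeError("right-hand side of IN must evaluate to an array")
    result = value in candidates
    return not result if negate else result


# Hypothetical message flowing through a rule.
msg = {"deviceId": "d3", "temperature": 25, "allowed": ["d1", "d3"]}

# Form 1, a literal list of expressions:  temperature IN (20, 25, 30)
form1 = sql_in(msg["temperature"], [20, 25, 30])

# Form 2, an array-typed expression:      deviceId NOT IN allowed
form2 = sql_in(msg["deviceId"], msg["allowed"], negate=True)
```

The second form is what makes schemaless data convenient: the list of candidates travels with the message itself instead of being hard-coded in the rule.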

Version 1.5.1

Version 1.5.1, released at the beginning of the month, mainly contains bug fixes and small feature updates. The main feature updates are:

  • Expanded support for data templates. The new version adds data template support to more sinks, including the memory sink, the EdgeX sink, and the TDengine sink.
  • Support window_start() and window_end() as arguments to other functions.

Bugs resolved include:

  • The Neuron connection fails after a rule is restarted
  • When a plugin update causes a rule syntax error, the status of the running rule becomes abnormal
  • When a shared source is used, restarting rules can randomly cause connection failures
  • Cross-origin access problems after authentication is enabled for the REST API

Copyright statement: This article is original by EMQ, please indicate the source when reprinting.

Original link: https://www.emqx.com/zh/blog/ekuiper-newsletter-202206

