A Detailed Explanation of Task Scheduling in BI Systems

Task scheduling is a general computing concept: the computer automatically runs a task at a set time or frequency. It is a standard operating system facility. Scheduled tasks in Windows and crontab in Linux are commonly used system-level schedulers and appear in all kinds of timed-execution scenarios. In traditional business intelligence (BI), the system scheduler often serves as the scheduler for ETL jobs, which run at a frequency of T+1 (data produced on day T is processed the following day) or higher.

As BI technology has developed, the scheduling capabilities of BI tools have advanced rapidly. Driven by market demand for big data and real-time data, stronger processing capacity, multi-threaded jobs, and near-real-time scheduling are becoming increasingly common.

The scheduling functions of BI tools currently fall into two main categories:

Task scheduling of data extraction
Task scheduling of notifications and messages

1. Task scheduling of data extraction

One application scenario of BI tools is breaking down data silos: heterogeneous data scattered across different systems is integrated and extracted into a data warehouse, where it forms analysis models that supply data for visual analysis. For historical analysis of large data volumes, the basic extraction method is scheduled extraction. From data source to data warehouse, either an incremental update or a full update can be configured.

A full update generally runs TRUNCATE TABLE on the target first and then performs the INSERT. An incremental update uses a primary key column or a timestamp and updates only the rows that meet the condition. Either way, the update runs as a scheduled task at a configured frequency.
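The two update methods can be sketched as follows. This is a minimal illustration using Python's built-in `sqlite3` module, not the mechanism of any particular BI tool. The `orders` table, its columns, and the helper names are all hypothetical. SQLite has no TRUNCATE TABLE statement, so an unqualified DELETE stands in for it here.

```python
import sqlite3


def make_db(rows=()):
    """Create an in-memory database with a hypothetical `orders` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def full_update(src, dst):
    """Full update: empty the target table, then re-insert every source row."""
    rows = src.execute("SELECT id, amount, updated_at FROM orders").fetchall()
    with dst:  # one transaction: the delete and the inserts commit together
        dst.execute("DELETE FROM orders")  # stands in for TRUNCATE TABLE
        dst.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return len(rows)


def incremental_update(src, dst, last_sync):
    """Incremental update: copy only rows changed after the `last_sync` timestamp."""
    rows = src.execute(
        "SELECT id, amount, updated_at FROM orders WHERE updated_at > ?",
        (last_sync,),
    ).fetchall()
    with dst:
        # Keyed on the primary key: a changed row replaces its old version.
        dst.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    # The newest timestamp seen becomes the starting point for the next run.
    return max((row[2] for row in rows), default=last_sync)
```

The incremental version returns the newest timestamp it copied, so the scheduler can pass it back in on the next run. It assumes timestamps are stored as ISO 8601 strings, which sort correctly as text.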

The task frequency is closely tied to these two update methods. For data with low real-time requirements, a full update can run at T+1. For data that must be refreshed more often, incremental updates can be scheduled every few hours, minutes, or even seconds.

The specific frequency should be chosen by weighing the data volume, the server configuration, and the load placed on the data source system.
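The two kinds of frequency discussed above, a fixed daily run for T+1 and a short repeating interval for incremental updates, come down to computing the next run time. A minimal sketch using only the Python standard library follows. The function names are hypothetical, and a production scheduler would also need to handle time zones and missed runs.

```python
from datetime import datetime, time, timedelta


def next_daily_run(now, run_at):
    """Next occurrence of a fixed daily time, as used by a T+1 style schedule."""
    candidate = datetime.combine(now.date(), run_at)
    if candidate <= now:
        # Today's slot has already passed, so run tomorrow.
        candidate += timedelta(days=1)
    return candidate


def next_interval_run(now, anchor, every):
    """Next tick strictly after `now` on a fixed interval counted from `anchor`."""
    if now < anchor:
        return anchor
    elapsed_ticks = (now - anchor) // every  # timedelta // timedelta gives an int
    return anchor + (elapsed_ticks + 1) * every
```

For example, with `now` at 14:07:30, a daily 02:00 schedule next fires at 02:00 the following day, while a 15-minute interval anchored at midnight next fires at 14:15.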

Take Wyn as an example. For data access it supports both a direct connection model and an extraction model. An extraction model can be given a run schedule: the auto-reload schedule refreshes the data in the model at the set times.

Users can define different execution plans for different business needs and reload the cached data in a table at different frequencies. If a plan fails, an email notification is sent to the address entered in the plan; if it succeeds, no message is sent.

Once an automatic scheduled run plan has been created and configured, the data is refreshed periodically without manual intervention.
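The notification rule described above (an email on failure, silence on success) can be sketched as a thin wrapper around the reload step. This is an illustration only, not Wyn's implementation. `run_plan`, `reload_model`, and `notify` are hypothetical names, and the notifier is passed in so that the example does not depend on a mail server.

```python
def run_plan(name, reload_model, notify, recipient):
    """Run one reload; notify the recipient only if it fails.

    reload_model: zero-argument callable that refreshes the cached data.
    notify: callable(recipient, message) that delivers the failure notice.
    Returns True on success and False on failure.
    """
    try:
        reload_model()
    except Exception as exc:
        # Failure path: tell the configured address what went wrong.
        notify(recipient, f"Run plan '{name}' failed: {exc}")
        return False
    # Success path: deliberately silent, matching the behavior described above.
    return True
```

Returning a boolean lets a surrounding scheduler record the outcome or decide whether to retry, without changing the notification behavior.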

BI tools also take reusability into account in their task plan settings by providing the ability to create run plans from templates. Run plans can be executed manually or automatically. Reliable scheduling and execution of run plans is an important technical guarantee for successful data extraction.
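One way to picture template-based run plans is an immutable settings object that is copied with a few fields overridden. The sketch below uses Python dataclasses. The field names, the cron-style schedule string, and the example address are assumptions made for illustration and do not reflect the configuration format of any specific BI product.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RunPlan:
    model: str          # name of the extraction model to reload
    cron: str           # schedule expression, e.g. "0 2 * * *" = daily at 02:00
    notify_email: str   # address that receives failure notifications
    enabled: bool = True


# A hypothetical shared template: nightly reload, failures go to one mailbox.
NIGHTLY_TEMPLATE = RunPlan(
    model="", cron="0 2 * * *", notify_email="bi-admin@example.com"
)


def plan_from_template(template, **overrides):
    """Copy the template, changing only the fields given as overrides."""
    return replace(template, **overrides)
```

For instance, `plan_from_template(NIGHTLY_TEMPLATE, model="sales")` yields a plan for the `sales` model that inherits the nightly schedule and the notification address, and the template itself is left unchanged.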

2. Task scheduling of notifications and messages

Task scheduling in BI is not limited to extracting data into a data warehouse for centralized storage. Analysis results such as reports and dashboards also need scheduled delivery. Not every consumer of report data logs in to the system to view it, for example when traveling on business or when they simply forget. Scheduled push delivers the analysis results to those users proactively. In enterprise practice, email is one of the most commonly used push channels.

Take Wyn as an example again: the task plan template includes an email push function. Recipients can be individual mailboxes or mail groups. When a report run plan executes, the report can be sent to the target mailbox as an email attachment or a link, or embedded as the email body, which is more convenient for readers.

To display the report as the email body, set the export format to "HTML" or "Image" when configuring the run schedule. Then, with Email Notification selected as the sending method, choose "Display Report in Email Body" under "Sending Type".
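The difference between sending a report as the email body and sending it as an attachment can be illustrated with Python's standard `email` package. This is a generic sketch of the two message layouts, not the way Wyn builds its emails. `build_report_email` and its parameters are hypothetical, and only the HTML export format is covered.

```python
from email.message import EmailMessage


def build_report_email(sender, recipients, subject, report_html, as_body=True):
    """Build a message carrying an HTML report as the body or as an attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    if as_body:
        # Plain-text fallback plus an HTML alternative that mail clients render inline.
        msg.set_content("This report is best viewed in an HTML-capable mail client.")
        msg.add_alternative(report_html, subtype="html")
    else:
        # Short text body with the report attached as a separate file.
        msg.set_content("The scheduled report is attached.")
        msg.add_attachment(report_html, subtype="html", filename="report.html")
    return msg
```

The returned message could then be handed to `smtplib.SMTP.send_message`; server details are left out because they depend on the deployment.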

To sum up, task scheduling is a general requirement of BI tools, is widely used, and provides effective support for data extraction. As software technology develops and market demand changes, more and more analysis scenarios will call for real-time data. Scenarios that analyze large volumes of historical data will nevertheless always exist. It is therefore foreseeable that the extraction model based on task scheduling will coexist with the direct connection model based on streaming, real-time push, and direct access to data sources, and that together they will provide important technical support for enterprise data analysis.

GrapeCity technical team