On April 22, 2022, Apache DolphinScheduler officially announced the 3.0.0-alpha release. This version brings the biggest changes since the project was first released, and its many new functions and features bring new experience and value to users.
The keywords for 3.0.0-alpha can be summed up as "faster, more modern, stronger, easier to maintain".
- Faster and more modern: The UI has been refactored. The new UI not only responds to user operations dozens of times faster, but also makes developer builds hundreds of times faster, and the page layout and icon style are more modern;
- Stronger: It brings many exciting new features, such as data quality checks, custom time zones, AWS support, and multiple new task plugins and alert plugins;
- Easier to maintain: Splitting the back-end services is more in line with the trend toward containerization and microservices, and it also clarifies the responsibilities of each service, making maintenance easier.
New functions and new features
1. Brand new UI, the front-end code is more robust and faster
The biggest change in 3.0.0-alpha is the introduction of a new UI: switching languages no longer requires a page reload, and a dark theme has been added. The new UI is built on a Vue3, TSX, and Vite technology stack. Compared with the old UI, it is not only more modern but also more user-friendly. The front end is also more robust: problems in the code surface at compile time, and interface parameters can be verified.
In addition, the new architecture and technology stack not only make Apache DolphinScheduler respond dozens of times faster to user operations, but also let developers compile and start the UI locally hundreds of times faster, which greatly shortens the time required to debug and package code.
New UI experience (screenshots):
- Comparison of local startup time
- Home page
- Workflow instance page
- Shell task page
- MySQL data source page
2. Support AWS
As the user base of Apache DolphinScheduler grows, it is attracting many overseas users. While evaluating it for overseas business scenarios, however, users found two issues that made Apache DolphinScheduler less convenient to use: time zones, and the lack of support for overseas cloud vendors, especially AWS. For this reason, we decided to support some of the more important AWS components, which is one of the most significant changes in this release.
At present, Apache DolphinScheduler's AWS support covers two task types, Amazon EMR and Amazon Redshift, and the resource center now supports Amazon S3 storage.
- For Amazon EMR, we created a new task type that provides Run Job Flow functionality, allowing users to submit multi-step jobs to Amazon EMR and specify the amount of resources to use. Details can be found here: https://dolphinscheduler.apache.org/zh-cn/docs/latest/user_doc/guide/task/emr.html
Amazon EMR task definition
- For Amazon Redshift, we extended the SQL task type to support Amazon Redshift data sources, so users can select a Redshift data source in a SQL task to run Amazon Redshift tasks.
- For Amazon S3, we extended the resource center of Apache DolphinScheduler to support not only local resources and HDFS resource storage, but also Amazon S3 as resource center storage. Details can be found at: https://dolphinscheduler.apache.org/zh-cn/docs/latest/user_doc/guide/resource.html
resource.storage.type
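To make the Amazon EMR "Run Job Flow" idea above more concrete, here is a minimal sketch of a multi-step job definition. The field names follow the AWS EMR RunJobFlow request; the cluster name, bucket, instance types, and script paths are hypothetical, and this sketch only assumes (it does not confirm) that the DolphinScheduler EMR task accepts a JSON definition of this shape.

```python
import json

# Hypothetical multi-step job flow; field names follow the AWS EMR
# RunJobFlow request, all values are illustrative placeholders.
job_flow = {
    "Name": "example-job-flow",
    "ReleaseLabel": "emr-6.5.0",
    "Applications": [{"Name": "Spark"}],
    # "Instances" is where the amount of resources is specified.
    "Instances": {
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": False,
    },
    # Several steps submitted as one job flow.
    "Steps": [
        {
            "Name": "step-1",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "s3://example-bucket/job1.py"],
            },
        },
        {
            "Name": "step-2",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "s3://example-bucket/job2.py"],
            },
        },
    ],
    "JobFlowRole": "EMR_EC2_DefaultRole",
    "ServiceRole": "EMR_DefaultRole",
}

# Serialize to JSON text, e.g. for pasting into a task definition form.
job_flow_json = json.dumps(job_flow, indent=2)
print(job_flow_json)
```

Check the EMR task documentation linked above for the exact input the task expects before relying on this layout.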
In the future, we will support more AWS tasks according to the actual needs of users, so stay tuned.
3. Service split
The brand new UI is the biggest front-end change in 3.0.0-alpha; the biggest back-end change is the splitting of services. With containers and microservices growing in popularity, the Apache DolphinScheduler developers made a major decision to split the back-end services. By function, the distribution is divided into the following parts:
- master-server: master service
- worker-server: worker service
- api-server: API service
- alert-server: alert service
- standalone-server: standalone service, used to quickly try out DolphinScheduler
- ui: UI resource
- bin: Quick start script, mainly the script to start each service
- tools: tools related scripts, mainly including database creation and update scripts
All services can be started or stopped by executing the following commands.
bin/dolphinscheduler-daemon.sh <start|stop> <server-name>
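As a small illustration of the command above, the following sketch builds the start or stop command for each split service without executing anything. The script path and service names come from this article; the helper names `daemon_command` and `cluster_commands` are hypothetical, and leaving standalone-server out of the list is an assumption (it is described above as a quick-trial service).

```python
from typing import List

# Service names as listed in this article.
SERVICES = ["master-server", "worker-server", "api-server", "alert-server"]


def daemon_command(action: str, server_name: str) -> List[str]:
    """Build the argv for bin/dolphinscheduler-daemon.sh (does not run it)."""
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    return ["bin/dolphinscheduler-daemon.sh", action, server_name]


def cluster_commands(action: str) -> List[List[str]]:
    """Build one command per service, in the order listed in SERVICES."""
    return [daemon_command(action, name) for name in SERVICES]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before being run by hand.
    for argv in cluster_commands("start"):
        print(" ".join(argv))
```

Each returned list could be passed to `subprocess.run` from the installation directory; the sketch only prints the commands so that nothing is started by accident.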
4. Data quality check
This version launches the long-awaited data quality check feature. It addresses data quality problems such as verifying that the number of records synchronized from a source is accurate, and raising an alert when the weekly or monthly average fluctuation of a single table or multiple tables exceeds a threshold. Previous versions of Apache DolphinScheduler solved the problem of running tasks in a specific order and at a specific time, but offered no general way to measure the quality of the data once tasks had run, so users had to bear additional development costs.
Now, 3.0.0-alpha supports data quality natively, including data quality checks before a workflow runs. In the data quality module, users define their own verification rules, giving strict control over data quality during task execution and monitoring of the run results.
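The two kinds of checks mentioned above (record count accuracy and fluctuation against a threshold) can be sketched roughly as follows. This is an illustrative sketch only, not DolphinScheduler's rule implementation; the function names and the "fraction of the historical mean" definition of fluctuation are assumptions.

```python
from typing import Sequence


def row_counts_match(source_rows: int, target_rows: int) -> bool:
    """Accuracy check: the target must hold as many records as the source."""
    return source_rows == target_rows


def fluctuation_exceeds(current: float, history: Sequence[float], threshold: float) -> bool:
    """Return True when `current` deviates from the mean of `history`
    by more than `threshold`, expressed as a fraction (0.2 means 20%)."""
    if not history:
        raise ValueError("history must not be empty")
    baseline = sum(history) / len(history)
    if baseline == 0:
        # Any non-zero value is an unbounded relative change from zero.
        return current != 0
    return abs(current - baseline) / abs(baseline) > threshold


if __name__ == "__main__":
    # Two records went missing during synchronization.
    print(row_counts_match(1000, 998))
    # Today's count is 50% above the recent average, limit is 20%.
    print(fluctuation_exceeds(150, [100, 100, 100], 0.2))
```

In a scheduler, a failed check like these would typically raise an alert or block downstream tasks; which action is taken depends on how the rule is configured.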
5. Task group
The task group is mainly used to control the concurrency of task instances and to define priority within the group. When creating a new task definition, the user can configure the task group for the current task, and configure the priority of the task within that group. Once a task is assigned to a task group, it runs only if all upstream tasks have succeeded and the number of tasks currently running in the group is smaller than the size of the resource pool. When the number of running tasks is greater than or equal to the resource pool size, the task enters the waiting state until the next check. When multiple tasks in the task group enter the run queue at the same time, the task with higher priority runs first.
See the link for details: https://dolphinscheduler.apache.org/zh-cn/docs/3.0.0/user_doc/guide/resource.html
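The admission rule described in this section can be sketched as a small in-memory model. This is not DolphinScheduler's implementation: the `TaskGroup` class and its methods are hypothetical, and it assumes that a larger number means a higher priority and that tasks are only submitted after their upstream tasks have succeeded.

```python
import heapq
from itertools import count
from typing import List


class TaskGroup:
    """Toy model: at most `capacity` tasks run at once, highest priority first."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity      # size of the resource pool
        self.running = set()          # names of tasks currently running
        self._waiting = []            # heap of (-priority, sequence, name)
        self._seq = count()           # tie-breaker: earlier submissions first

    def submit(self, name: str, priority: int) -> None:
        """Queue a task whose upstream tasks have already succeeded."""
        heapq.heappush(self._waiting, (-priority, next(self._seq), name))

    def check(self) -> List[str]:
        """Start waiting tasks while running < capacity; return those started."""
        started = []
        while self._waiting and len(self.running) < self.capacity:
            _, _, name = heapq.heappop(self._waiting)
            self.running.add(name)
            started.append(name)
        return started

    def finish(self, name: str) -> None:
        """Release the slot held by a finished task."""
        self.running.discard(name)


if __name__ == "__main__":
    group = TaskGroup(capacity=2)
    group.submit("a", priority=1)
    group.submit("b", priority=5)
    group.submit("c", priority=3)
    print(group.check())   # the two highest-priority tasks start
    group.finish("b")
    print(group.check())   # the waiting task takes the freed slot
```

Tasks that do not fit stay in the waiting queue and are reconsidered on the next `check()` call, which mirrors the "waiting state for the next check" behaviour described above.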
6. Custom time zone
In versions prior to 3.0.0-alpha, Apache DolphinScheduler defaulted to the UTC+8 time zone. As the user base expanded, overseas users and users doing business across time zones were often troubled by this. Now that 3.0.0-alpha supports time zone switching, the problem is easily solved, meeting the needs of overseas users and overseas business partners. For example, if an enterprise whose business spans UTC+8 and UTC-5 wants to share one DolphinScheduler cluster, it can create multiple users, each using their own local time zone. The times displayed for DolphinScheduler objects are then switched to the local time of that time zone, which better matches the habits of local developers.
See the link for details: https://dolphinscheduler.apache.org/zh-cn/docs/3.0.0/user_doc/guide/howto/general-setting.html
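The effect of per-user time zones can be illustrated with the Python standard library: one stored instant is rendered differently for a UTC+8 user and a UTC-5 user. This only sketches the display idea; the `display_time` helper is hypothetical and fixed UTC offsets are used for simplicity (real time zones may also observe daylight saving time).

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets standing in for the two user time zones in the example above.
UTC_PLUS_8 = timezone(timedelta(hours=8))
UTC_MINUS_5 = timezone(timedelta(hours=-5))


def display_time(instant: datetime, user_tz: timezone) -> str:
    """Render a timezone-aware instant in the given user's time zone."""
    return instant.astimezone(user_tz).strftime("%Y-%m-%d %H:%M:%S")


# One schedule time stored as a single instant in UTC.
scheduled = datetime(2022, 4, 22, 12, 0, 0, tzinfo=timezone.utc)

print(display_time(scheduled, UTC_PLUS_8))    # 2022-04-22 20:00:00
print(display_time(scheduled, UTC_MINUS_5))   # 2022-04-22 07:00:00
```

Both users look at the same underlying instant; only the rendering differs.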
7. Task Definition List
In versions before 3.0.0-alpha, users who wanted to operate on a task had to find the corresponding workflow first and then locate the task within it before editing. As the number of workflows grows, or when a single workflow contains many tasks, finding the right task becomes very painful, which is not in line with the easy-to-use concept pursued by Apache DolphinScheduler. Therefore, we have added a task definition page in 3.0.0-alpha, so that users can quickly locate a task by its name, operate on it, and easily make batch changes to tasks.
See the link for details: https://dolphinscheduler.apache.org/zh-cn/docs/latest/user_doc/guide/project/task-definition.html
8. New alarm type
3.0.0-alpha also adds two new alert types: Telegram and Webex Teams.
9. New features of the Python API
In 3.0.0-alpha, the biggest change in the Python API is that the corresponding PythonGatewayServer has been integrated into the API-Server service, which makes the way services are exposed externally more consistent and mitigates the larger binary package size caused by service splitting. The Python API also adds CLI and configuration modules, allowing users to customize the configuration file and modify the configuration more conveniently.
10. Other new features
In addition to the above functions, the 3.0.0-alpha version also includes many detailed enhancements: the task plugin and data source plugin modules have been refactored to make extension easier, support for Spark SQL has been restored, E2E tests are now compatible with the new UI, and so on.
Major optimizations
[#8584] Task backend plugin optimization, new plugins only need to modify the modules that come with the plugin
[#8874] Validate end and start time when submitting/creating cron under workflow
[#9016] Dependent can select global project when adding dependencies
[#9221] AlertSender optimization and close optimization, such as MasterServer
[#9228] Implement using slot to scan database
[#9230] python gateway server is integrated into apiserver to reduce binary package size
[#9372] [python] Migrate pythonGatewayServer to api server
[#9443] [python] add missing config and connect remote server documentation
[#8719] [Master/Worker] Change task ack to run callback
[#9293] [Master] Add task event thread pool
Major bug fixes
[#7236] Fix failure to create a tenant with S3a MinIO
[#7416] Fix "text file busy" error
[#7896] Fix duplicate authorized projects being generated during project authorization
[#8089] Fix failure to start server due to inability to connect to PostgreSQL
[#8183] Fix the message that the data source plugin "Spark" cannot be found
[#8202] Fix built-in parameters being in the wrong position in commands generated by MapReduce
[#8751] Fix the queue being invalid in ProcessDefinition after changing the parameter user
[#8756] Fix processes using dependent components not being migratable between test and production environments
[#8760] Fix an issue with resource file deletion conditions
[#8791] Fix editing the form of a copied node affecting the original node data
[#8951] Fix worker resource exhaustion causing downtime
[#9243] Fix some types of alerts not showing the item name
Release Note
https://github.com/apache/dolphinscheduler/releases/tag/3.0.0-alpha
Thanks to contributors
Sorted alphabetically
Aaron Lin, Amy0104, Assert, BaoLiang, Benedict Jin, BenjaminWenqiYu, Brennan Fox, Devosend, DingPengfei, DuChaoJiaYou, EdwardYang, Eric Gao, Frank Chen, GaoTianDuo, HanayoZz, Hua Jiang, Ivan0626, Jeff Zhan, Jiajie Zhong, JieguangZhou, Jiezhi. G, JinYong Li, J·Y, Kerwin, Kevin.Shin, KingsleyY, Kirs, KyoYang, LinKai, LiuBodong, Manhua, Martin Huang, Maxwell, Molin Wang, OS, QuakeWang, ReonYu, SbloodyS, Shiwen Cheng, ShuiMuNianHuaLP, ShuoTiann, Sunny Lei, Tom, Tq, Wenjun Ruan, X&Z, XiaochenNan, Yanbin Lin, Yao WANG, Zonglei Dong, aCodingAddict, aaronlinv, caishunfeng, calvin, calvinit, cheney, chouc, gaojun2048, guoshupei, hjli, huangxiaohai, janeHe13, jegger, jon -qj, kezhenxu94, labbomb, lgcareer, lhjzmn, lidongdai, lifeng, lilyzhou, lvshaokang, lyq, mans2singh, mask, mazhong, mgduoduo, myangle1120, nobolity, ououtt, ouyangyewei, pinkhello, qianli2022, ronyang1985, seagle, shuai hou, simsicon, songjianet, sparklezzz, springmonster, uh001, wangbowen, wangqiang, wangxj3, wangyang, wangyizhi , wind, worry, xiangzihao, xiaodi wang, xiaoguaiguai, xuhhui, yangyunxi, yc322, yihong, yimaixinchen, zchong, zekai-li, zhang, zhangxinruu, zhanqian, zhuangchong, zhuxt2015, zixi0825, zwZjut,
Tian Chou, Xiao Zhang, Time, Wang Qiang, Bai Sui, Hong Shu, Zhang Junjie, Luo Mingtao
Participate and contribute
With the rapid rise of open source in China, the Apache DolphinScheduler community is growing vigorously. To make scheduling more usable and easier to use, we sincerely welcome everyone who loves open source to join the community, contribute to the rise of open source in China, and help local open source go global.
There are many ways to participate in the DolphinScheduler community, including:
Contributing your first PR (documentation or code). We hope the first PR is a simple one: its purpose is to get familiar with the submission process and community collaboration, and to experience the friendliness of the community.
The community has put together the following list of issues for newbies: https://github.com/apache/dolphinscheduler/issues/5689
List of non-novice issues: https://github.com/apache/dolphinscheduler/issues?q=is%3Aopen+is%3Aissue+label%3A%22volunteer+wanted%22
How to contribute link: https://dolphinscheduler.apache.org/en-us/docs/development/contribute.html
Come on, the DolphinScheduler open source community needs your participation. Contribute to the rise of China's open source: even if each contribution is just a small tile, the combined power is huge.
By participating in open source you can learn from various experts up close and quickly improve your skills. If you want to contribute, we have a contributor seed incubation group: add the community assistant on WeChat (Leonard-ds), and we will teach you hands-on. Whatever your level, every question gets an answer; the key is a willingness to contribute.
When adding WeChat Assistant, please indicate that you want to contribute.
Come on, the open source community is looking forward to your participation.