Author: Xueren

Introduction

A K8s Job is a Kubernetes resource for running short-lived Pods, which makes it equivalent to a one-off task. Once the task finishes, the Pods terminate and no longer occupy resources, which saves cost and improves resource utilization.

Alibaba's task scheduling service SchedulerX, combined with cloud native technology, has launched visual K8s tasks. For script users, the details of container services are hidden: people who are not familiar with containers (such as operations and maintenance staff) can use K8s Jobs without building images and still benefit from the cost reduction and efficiency gains of container services. For container users, SchedulerX is fully compatible with native K8s Jobs and also supports historical execution records, log services, task reruns, alarm monitoring, visual task orchestration, and other capabilities that enterprise-level applications rely on. The architecture diagram is as follows:

[Figure: SchedulerX K8s task architecture diagram]

Feature 1: Rapidly develop K8s visual scripting tasks

Kubernetes Jobs are commonly used for offline data processing and operations work (such as synchronizing MySQL data to the big data platform at 2 a.m. every day, or refreshing the Redis cache every hour), which is generally implemented with scripts. The following simple scenario compares the two solutions.

Kubernetes-native solutions

The smallest scheduling unit in K8s is the Pod. To run a script task, you need to package the script into an image in advance and then configure the script command in the YAML file. The following example queries a database with a Python script:

  • Write a Python script, demo.py
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-

import MySQLdb

# Open the database connection
db = MySQLdb.connect("localhost", "testuser", "test123", "TESTDB", charset='utf8')

# Get a cursor with the cursor() method
cursor = db.cursor()

# SQL query; the %s placeholder is filled in by the driver
sql = "SELECT * FROM EMPLOYEE WHERE INCOME > %s"
try:
    # Execute the SQL statement
    cursor.execute(sql, (1000,))
    # Fetch all matching rows
    results = cursor.fetchall()
    for row in results:
        fname = row[0]
        lname = row[1]
        age = row[2]
        sex = row[3]
        income = row[4]
        # Print the result
        print("fname=%s,lname=%s,age=%s,sex=%s,income=%s" %
              (fname, lname, age, sex, income))
except MySQLdb.Error:
    print("Error: unable to fetch data")

# Close the database connection
db.close()
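  • Write a Dockerfile
FROM python:3

WORKDIR /usr/src/app

# requirements.txt lists the script's dependencies (for example mysqlclient, which provides MySQLdb)
COPY requirements.txt ./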
RUN pip install --no-cache-dir -r requirements.txt

COPY demo.py /root/demo.py

CMD [ "python", "/root/demo.py" ]
  • Build the Docker image and push it to the image registry
docker build -t registry.cn-beijing.aliyuncs.com/demo/python:1.0.0 .
docker push registry.cn-beijing.aliyuncs.com/demo/python:1.0.0
  • Write the YAML file for the K8s Job. Set image to the image built in step 3, and set command to the command that runs the script
apiVersion: batch/v1
kind: Job
metadata:
  name: demo-python
spec:
  template:
    spec:
      containers:
      - name: demo-python
        image: registry.cn-beijing.aliyuncs.com/demo/python:1.0.0
        command: ["python",  "/root/demo.py"]
      restartPolicy: Never
  backoffLimit: 4

As you can see, running a script in a container service takes many steps. Whenever the script changes, you have to rebuild the image and republish the K8s Job, which is cumbersome.
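
To make that cost concrete, the sketch below generates the same Job manifest from Python. It is only an illustration: `build_job_manifest` and `bump_patch` are hypothetical helper names, not part of Kubernetes or SchedulerX. Because the script is baked into the image, every script change implies a new image tag and a newly published manifest.

```python
import json


def build_job_manifest(name, image, command, backoff_limit=4):
    """Return a batch/v1 Job manifest as a plain dict (hypothetical helper)."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {"name": name, "image": image, "command": list(command)}
                    ],
                    "restartPolicy": "Never",
                }
            },
            "backoffLimit": backoff_limit,
        },
    }


def bump_patch(tag):
    """Increase the last numeric component of a tag such as '1.0.0'."""
    parts = tag.split(".")
    parts[-1] = str(int(parts[-1]) + 1)
    return ".".join(parts)


repo = "registry.cn-beijing.aliyuncs.com/demo/python"
command = ["python", "/root/demo.py"]

# Manifest for the image built in the steps above.
old = build_job_manifest("demo-python", f"{repo}:1.0.0", command)
# After editing demo.py, a rebuilt image needs a new tag and a new manifest.
new = build_job_manifest("demo-python", f"{repo}:{bump_patch('1.0.0')}", command)

# kubectl accepts JSON as well as YAML, so this output could be saved and applied.
print(json.dumps(new, indent=2))
```

The only field that differs between `old` and `new` is the image tag, yet reaching it still requires the build, push, and apply steps listed above.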

Alibaba Cloud Solutions

Alibaba's task scheduling service SchedulerX combines cloud native technology to offer a visual script task solution. The task scheduling system manages the scripts, so you can write them directly online without building images. The scripts then run as Pods in the user's K8s cluster, which is very convenient, as shown below:

  1. Create a new K8s task in SchedulerX task management and select Python-Script as the resource type (four script types are currently supported: shell, python, php, and nodejs)

  2. Click "Run once"; you can see the Pod start in the Kubernetes cluster, with the Pod name schedulerx-python-{JobId}

  3. You can also see historical execution records in the SchedulerX console

  4. You can see the Pod's run log in the SchedulerX console

The table below makes the differences between the two solutions easier to see:

[Table: comparison of the two solutions]

Feature 2: Fully compatible with native K8s Job

SchedulerX not only lets you develop K8s script tasks quickly and hides the details of container services, which is good news for people unfamiliar with containers; it can also host native K8s Jobs.

Native built-in Job solution

  • Job

Take the officially provided Job as an example:

  1. Write a YAML file, pi.yaml, with a deliberate error: bpi(-1) is illegal
apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl:5.34
        command: ["perl",  "-Mbignum=bpi", "-wle", "print bpi(-1)"]
      restartPolicy: Never
  backoffLimit: 4
  2. Run the Job in the K8s cluster and view the Pod's status and logs

A native K8s Job cannot be re-run. To run it again after modifying it, you have to delete the Job first and then apply it again, which is cumbersome.
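
The delete-then-apply cycle can be sketched as a small Python wrapper around `kubectl`. The helper names `rerun_commands` and `rerun` are hypothetical, and the sketch assumes `kubectl` is installed and pointed at the right cluster; by default it only prints the commands instead of running them.

```python
import subprocess


def rerun_commands(job_name, manifest_path):
    """Return the kubectl invocations needed to re-run a native Job."""
    return [
        # Remove the finished Job; --ignore-not-found keeps this step idempotent.
        ["kubectl", "delete", "job", job_name, "--ignore-not-found"],
        # Create the Job again from the (possibly edited) manifest.
        ["kubectl", "apply", "-f", manifest_path],
    ]


def rerun(job_name, manifest_path, dry_run=True):
    """Print the commands, and execute them only when dry_run is False."""
    for argv in rerun_commands(job_name, manifest_path):
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


rerun("pi", "pi.yaml")  # dry run: only prints the two commands
```

Calling `rerun("pi", "pi.yaml", dry_run=False)` would execute both commands in order and stop at the first failure.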

  • CronJob

Take the official CronJob as an example:

  1. Write hello.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "* * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: perl:5.34
            command: ["perl",  "-Mbignum=bpi", "-wle", "print bpi(100)"]
          restartPolicy: OnFailure
  2. Run the CronJob in the K8s cluster and view the pod history and logs

By default, a native CronJob keeps only the 3 most recent successful runs (the default value of successfulJobsHistoryLimit). Older records cannot be viewed, which makes troubleshooting especially hard when a business problem has to be traced back.
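
On the native side, the retention window can be widened through the CronJob fields `successfulJobsHistoryLimit` and `failedJobsHistoryLimit`. The sketch below sets them on a manifest held as a Python dict; the helper name `with_history_limits` and the values 10 and 5 are illustrative assumptions, not recommendations from this article.

```python
import copy


def with_history_limits(cronjob, successful=10, failed=5):
    """Return a copy of a CronJob manifest with explicit history limits."""
    updated = copy.deepcopy(cronjob)
    updated["spec"]["successfulJobsHistoryLimit"] = successful
    updated["spec"]["failedJobsHistoryLimit"] = failed
    return updated


# The hello CronJob from above, expressed as a dict.
hello = {
    "apiVersion": "batch/v1",
    "kind": "CronJob",
    "metadata": {"name": "hello"},
    "spec": {
        "schedule": "* * * * *",
        "jobTemplate": {
            "spec": {
                "template": {
                    "spec": {
                        "containers": [
                            {
                                "name": "hello",
                                "image": "perl:5.34",
                                "command": ["perl", "-Mbignum=bpi", "-wle", "print bpi(100)"],
                            }
                        ],
                        "restartPolicy": "OnFailure",
                    }
                }
            }
        },
    },
}

longer_history = with_history_limits(hello)
print(longer_history["spec"]["successfulJobsHistoryLimit"])  # 10
```

Raising the limits keeps more finished Jobs and Pods in the cluster, so this only moves the cutoff; it does not provide a long-term execution history.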

Alibaba Cloud Solutions

Alibaba's task scheduling service SchedulerX can host native K8s tasks, which makes migration easy. Tasks hosted on SchedulerX gain task scheduling features such as task rerun, execution history, log service, and alarm monitoring.

  1. Create a new K8s task, select K8s as the task type and Job-YAML as the resource type, and have the Job print bpi(-1)

  2. Generate the cron expression with the expression tool, for example to run at the 8th minute of every hour

  3. If the scheduled time has not arrived yet, you can click "Run once" to test manually

  4. In the K8s cluster, you can see that the Job and Pod start successfully

  5. You can also see historical execution records in the SchedulerX console

  6. You can see the task's run log in the SchedulerX console

  7. Modify the task's YAML online to print bpi(100)

  8. There is no need to delete the Job; rerun the task from the console

  9. The task reruns successfully, and the new log is visible
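
The schedule in step 2, "the 8th minute of every hour", can be checked with a few lines of Python. In the five-field cron syntax used by the native CronJob earlier, this would be written `8 * * * *`; the exact expression format SchedulerX expects is not shown in this article, so treat that as an assumption. The helper `next_run` is hypothetical and only illustrates which instants such a schedule selects.

```python
from datetime import datetime, timedelta


def next_run(now, minute=8):
    """Return the first time strictly after `now` whose minute equals `minute`."""
    candidate = now.replace(minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # This hour's slot has already passed, so move to the next hour.
        candidate += timedelta(hours=1)
    return candidate


print(next_run(datetime(2023, 1, 1, 9, 5)))    # 2023-01-01 09:08:00
print(next_run(datetime(2023, 1, 1, 9, 8)))    # 2023-01-01 10:08:00
print(next_run(datetime(2023, 1, 1, 23, 30)))  # 2023-01-02 00:08:00
```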

The table below compares the two solutions:

[Table: comparison of the two solutions]

Feature 3: Enhance native jobs and support visual task orchestration

In data processing scenarios, there are often dependencies between tasks. For example, task A can start only after task B has completed.

Kubernetes-native solutions

The current mainstream solution in K8s is to orchestrate workflows with Argo, for example by defining a DAG as follows:

# The following workflow executes a diamond workflow
# 
#   A
#  / \
# B   C
#  \ /
#   D
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: dag-diamond
spec:
  entrypoint: diamond
  templates:
  - name: diamond
    dag:
      tasks:
      - name: A
        template: echo
        arguments:
          parameters: [{name: message, value: A}]
      - name: B
        depends: "A"
        template: echo
        arguments:
          parameters: [{name: message, value: B}]
      - name: C
        depends: "A"
        template: echo
        arguments:
          parameters: [{name: message, value: C}]
      - name: D
        depends: "B && C"
        template: echo
        arguments:
          parameters: [{name: message, value: D}]

  - name: echo
    inputs:
      parameters:
      - name: message
    container:
      image: alpine:3.7
      command: [echo, "{{inputs.parameters.message}}"]

As you can see, building even such a simple DAG requires a lot of YAML. When the dependencies are complex, the YAML becomes very hard to maintain.
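
The execution order that the diamond's `depends` fields imply can be sketched with Python's standard `graphlib` module. This is not how Argo is implemented; it only illustrates the dependency resolution that any DAG scheduler has to perform.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each task maps to the set of tasks it depends on (same diamond as above).
depends_on = {
    "A": set(),
    "B": {"A"},
    "C": {"A"},
    "D": {"B", "C"},
}

sorter = TopologicalSorter(depends_on)
sorter.prepare()

waves = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # tasks whose dependencies are all done
    waves.append(ready)
    sorter.done(*ready)                 # mark them finished to unlock successors

print(waves)  # [['A'], ['B', 'C'], ['D']]
```

Each inner list is a group of tasks that can run in parallel: B and C both become ready as soon as A finishes, and D waits for both.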

Alibaba Cloud Solutions

Alibaba's task scheduling service SchedulerX supports task orchestration through visual workflows:

  1. Create a workflow. You can import existing tasks or create new ones on the canvas, and build the workflow by dragging and dropping

  2. Click "Run once" to watch the workflow's running status in real time, which makes it easy to see at which step a task is stuck

  3. If a task fails, view its log in the console

  4. Fix the task, then rerun the failed node in place on the workflow instance graph

  5. The failed task is re-executed with the latest content

  6. Once the upstream task succeeds, the downstream tasks continue to execute
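
The rerun behavior in steps 4 to 6 can be modeled with a toy simulation. It assumes the diamond-shaped workflow from the Argo example (the article does not show the actual SchedulerX workflow), and `run_workflow` is a hypothetical function that says nothing about how SchedulerX is implemented.

```python
def run_workflow(depends_on, order, task_ok, status=None):
    """Run tasks in `order`; a task runs only if all its upstream tasks succeeded.

    Tasks already marked "success" in `status` are skipped, which models
    re-running only the failed part of a workflow instance.
    """
    status = dict(status or {})
    for task in order:
        if status.get(task) == "success":
            continue
        if all(status.get(up) == "success" for up in depends_on[task]):
            status[task] = "success" if task_ok(task) else "failed"
        else:
            status[task] = "waiting"
    return status


depends_on = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
order = ["A", "B", "C", "D"]

# First run: task B is broken, so D has to wait.
first = run_workflow(depends_on, order, task_ok=lambda t: t != "B")
print(first)   # {'A': 'success', 'B': 'failed', 'C': 'success', 'D': 'waiting'}

# After fixing B, rerun from the previous state: only B and D execute.
second = run_workflow(depends_on, order, task_ok=lambda t: True, status=first)
print(second)  # {'A': 'success', 'B': 'success', 'C': 'success', 'D': 'success'}
```

Keeping the status of the previous run is what makes the in-place rerun cheap: A and C are not executed again.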

Summary

Scheduling your K8s tasks with SchedulerX reduces the learning curve and speeds up development. It also lets failed tasks raise alarms, makes problems easier to troubleshoot, and brings visual K8s tasks into the cloud-native observability system.

