Preface

Environment and components

  • Ubuntu 20.04
  • Python 3.8 (Anaconda3-2020.11-Linux-x86_64)
  • PostgreSQL 12.6
  • apache-airflow 2.0.2
  • celery 4.4.7
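
Once the installation steps below are done, the Python package versions listed here can be compared with what is actually installed. The following is a minimal sketch, assuming Python 3.8+ (for importlib.metadata); the helper name installed_version is made up for illustration and is not part of Airflow.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_version(package):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Versions listed in this article
expected = {"apache-airflow": "2.0.2", "celery": "4.4.7"}

for name, wanted in expected.items():
    found = installed_version(name)
    status = "ok" if found == wanted else "differs"
    print(f"{name}: expected {wanted}, found {found} ({status})")
```

Run it with the same interpreter that the airflow account uses, otherwise the packages may not be visible.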

Cluster planning

[Image: cluster planning diagram]

Installation steps

  • Create an account (node0/node1/node2)

    sudo useradd airflow -m -s /bin/bash
    sudo passwd airflow
  • Switch to the new account (node0/node1/node2)

    su airflow
  • Configure Anaconda environment variables (node0/node1/node2)

    # /home/airflow/.bashrc
    export PATH=/home/airflow/anaconda3/bin:$PATH
  • Upgrade pip (node0/node1/node2)

    pip install --upgrade pip -i https://mirrors.aliyun.com/pypi/simple/
  • Configure pip to use a mirror in mainland China (node0/node1/node2)

    pip3 config set global.index-url https://mirrors.aliyun.com/pypi/simple/
  • Install airflow (node0/node1/node2). For the available extras, see: https://airflow.apache.org/docs/apache-airflow/2.0.2/extra-packages-ref.html

    # Full bundle with all extras (master)
    pip3 install "apache-airflow[all]~=2.0.2"
    # OR install only selected extras
    pip3 install "apache-airflow[async,postgres,mongo,redis,rabbitmq,celery,dask]~=2.0.2"
  • Add airflow to the PATH environment variable (node0/node1/node2)

    # Append the following to the end of /home/airflow/.bashrc:
    export PATH=/home/airflow/.local/bin:$PATH
  • Check the airflow version; the first run also creates airflow's home directory (node0/node1/node2)

    # The home directory defaults to ~/airflow
    airflow version
  • Set Ubuntu system time zone (node0/node1/node2)

    timedatectl set-timezone Asia/Shanghai
  • Modify the time zone in airflow (/home/airflow/airflow/airflow.cfg) (node0/node1/node2)

    [core]
    # Change to system or Asia/Shanghai
    default_timezone = system
  • At this point, the installation is complete
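
The time zone setting edited above can be read back to confirm that it took effect on each node. This is a minimal sketch using only the standard library; the helper name read_default_timezone is hypothetical, and the demo runs against a throwaway file rather than the real /home/airflow/airflow/airflow.cfg.

```python
import configparser
import tempfile
from pathlib import Path


def read_default_timezone(cfg_path):
    """Return [core] default_timezone from an airflow.cfg-style file, or None."""
    # interpolation=None keeps literal '%' characters in other options intact
    parser = configparser.ConfigParser(interpolation=None)
    # read() silently skips files that do not exist
    parser.read(cfg_path, encoding="utf-8")
    return parser.get("core", "default_timezone", fallback=None)


# Demo against a temporary file whose content mirrors the step above
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "airflow.cfg"
    sample.write_text("[core]\ndefault_timezone = system\n", encoding="utf-8")
    demo_timezone = read_default_timezone(sample)

print(demo_timezone)
```

On a real node, pass the path of the actual airflow.cfg instead of the temporary file.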

PostgreSQL configuration

  • Create database

    CREATE DATABASE airflow_db;
  • Create user

    CREATE USER airflow_user WITH PASSWORD 'airflow_pass';
    GRANT ALL PRIVILEGES ON DATABASE airflow_db TO airflow_user;
  • Modify PostgreSQL connection (/home/airflow/airflow/airflow.cfg) (node0/node1/node2)

    [core]
    sql_alchemy_conn = postgresql+psycopg2://airflow_user:airflow_pass@192.168.x.y/airflow_db
  • Initialize the database table (node0)

    airflow db init
  • Check whether the database is initialized successfully
    [Image: psql_airflow]
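
The user, password and database in sql_alchemy_conn have to match what was created with CREATE USER and CREATE DATABASE, otherwise airflow db init cannot connect. As a rough sanity check, the connection string can be split into its parts before it goes into airflow.cfg. The sketch below uses only the standard library; describe_conn is a hypothetical helper, and the host stays the article's placeholder 192.168.x.y.

```python
from urllib.parse import urlsplit


def describe_conn(conn):
    """Split an SQLAlchemy-style URL into the parts Airflow will use."""
    parts = urlsplit(conn)
    return {
        "driver": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "database": parts.path.lstrip("/"),
    }


# Example values taken from the CREATE DATABASE / CREATE USER statements above;
# the host is left as the article's placeholder.
conn = "postgresql+psycopg2://airflow_user:airflow_pass@192.168.x.y/airflow_db"
info = describe_conn(conn)
print(info)
```

If the printed user or database differs from the names used in the SQL statements, fix the connection string before initializing the tables.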

Web UI login

  • Create an administrator user (node0)

    # Role table: ab_role
    # User table: ab_user
    # Create a user with the Admin role
    airflow users create \
     --lastname user \
     --firstname admin \
     --username admin \
     --email walkerqt@foxmail.com \
     --role Admin \
     --password admin123
    # Create a user with the Viewer role
    airflow users create \
     --lastname user \
     --firstname view \
     --username view \
     --email walkerqt@163.com \
     --role Viewer \
     --password view123
  • Start webserver (node0)

    airflow webserver -p 8080
  • Log in from the browser with the account created above
    [Image: web UI screenshot]
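
Besides logging in through the browser, the webserver can be probed from a script. The Airflow webserver exposes a /health endpoint that reports the status of the metadata database and the scheduler as JSON. The sketch below is an assumption-laden example: fetch_health and unhealthy_components are hypothetical helper names, and the demo uses a hand-written payload so that it runs without a live server.

```python
import json
from urllib.request import urlopen


def fetch_health(base_url, timeout=5):
    """GET <base_url>/health and return the decoded JSON body."""
    with urlopen(f"{base_url.rstrip('/')}/health", timeout=timeout) as resp:
        return json.load(resp)


def unhealthy_components(payload):
    """Return names of components whose status is not 'healthy'."""
    return sorted(
        name for name, info in payload.items()
        if info.get("status") != "healthy"
    )


# Offline demo with a hand-written payload shaped like the /health response
sample = {
    "metadatabase": {"status": "healthy"},
    "scheduler": {"status": "unhealthy", "latest_scheduler_heartbeat": None},
}
print(unhealthy_components(sample))
```

Against the setup in this article, the call would look like unhealthy_components(fetch_health("http://localhost:8080")) on node0, assuming the webserver listens on port 8080 as started above. The scheduler is expected to show up as unhealthy until airflow scheduler is running.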

Configure CeleryExecutor

Test case

  • Create a test script (/home/airflow/airflow/dags/send_msg.py) on node0/node1/node2 that sends the local IP to enterprise WeChat (WeCom).

    # encoding: utf-8
    # author: qbit
    # date: 2021-05-13
    # summary: send/dispatch tasks to the worker nodes
    
    import os
    import time
    import json
    import psutil
    import requests
    from datetime import timedelta
    from airflow.utils.dates import days_ago
    from airflow.models import DAG
    # airflow.operators.python_operator is a deprecated alias in Airflow 2
    from airflow.operators.python import PythonOperator
    
    
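    def GetLocalIPByPrefix(prefix):
        r"""
        With multiple network interfaces, look up the local IP by prefix.
        Tested on Windows and Linux with Python 3.6.x and psutil 5.4.x.
        Works for both IPv4 and IPv6 addresses.
        Note: if several IPs share the prefix, only one of them is returned.
        """
        localIP = ''
        dic = psutil.net_if_addrs()
        for adapter in dic:
            snicList = dic[adapter]
            for snic in snicList:
                # Skip non-IP address families (e.g. MAC addresses)
                if not snic.family.name.startswith('AF_INET'):
                    continue
                ip = snic.address
                if ip.startswith(prefix):
                    localIP = ip
    
        return localIP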
    
    
    def send_msg(msg='default msg', **context):
        r""" Send a message to enterprise WeChat """
        print(context)
        run_id = context['run_id']
        nowTime = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime())
        message = '%s\n%s\n%s_%d\n%s' % (
            run_id, nowTime, GetLocalIPByPrefix('192.168.'), os.getpid(), msg)
        print(message)
    
        '''
        Sending code (it involves account credentials, so this part is hidden)
        '''
    
    default_args = {
        'owner': 'qbit',
        # depends_on_past: whether a task depends on its own previous run.
        # If True, a task runs only after the same task succeeded in the previous DAG run.
        'depends_on_past': False
    }
    
    with DAG(dag_id='send_msg',
             default_args=default_args,
             start_date=days_ago(1),
             schedule_interval=timedelta(seconds=60),
             # catchup: whether to backfill the runs between start_date and now
             catchup=False,
             tags=['qbit']
    ) as dag:
        # Inside the "with" block every operator is attached to the DAG automatically.
        # provide_context is no longer needed in Airflow 2: the context is always
        # passed to callables that accept **kwargs.
        first = PythonOperator(
            task_id='send_msg_1',
            python_callable=send_msg,
            op_kwargs={'msg': '111'},
        )
    
        second = PythonOperator(
            task_id='send_msg_2',
            python_callable=send_msg,
            op_kwargs={'msg': '222'},
        )
    
        third = PythonOperator(
            task_id='send_msg_3',
            python_callable=send_msg,
            op_kwargs={'msg': '333'},
        )
    
        # send_msg_2 runs only after both send_msg_3 and send_msg_1 have finished
        [third, first] >> second
  • View dag information (node0)

    # List all active DAGs
    $ airflow dags list
    
    # List all tasks in the 'send_msg' DAG
    $ airflow tasks list send_msg
    [2021-05-13 16:00:47,123] {dagbag.py:451} INFO - Filling up the DagBag from /home/airflow/airflow/dags
    send_msg_1
    send_msg_2
    send_msg_3
    
    # Print the task hierarchy of the 'send_msg' DAG
    $ airflow tasks list send_msg --tree
  • Test a single task (node0)

    airflow tasks test send_msg send_msg_1 20210513
  • Test a single dag (node0)

    airflow dags test send_msg 20210513
  • Cluster test

    # node0
    airflow webserver -p 8080
    airflow scheduler
    airflow celery flower  # default port 5555
    # node1/node2
    airflow celery worker
    # Start with an explicit hostname
    airflow celery worker --celery-hostname node1

    [Image: cluster test screenshot]
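
The test script identifies the node that ran a task by matching local addresses against the string prefix '192.168.'. A possible variant, not the author's method, is to test subnet membership with the standard ipaddress module, which avoids accidental matches on similar-looking strings. The sketch below works on a plain list of address strings (what psutil.net_if_addrs() would supply via snic.address), so it runs without psutil; the name pick_ip_in_network is hypothetical, and unlike the original helper it returns the first match rather than the last.

```python
import ipaddress


def pick_ip_in_network(addresses, cidr):
    """Return the first address that falls inside cidr, or '' if none does."""
    network = ipaddress.ip_network(cidr)
    for raw in addresses:
        try:
            # IPv6 link-local addresses may carry a zone suffix such as '%eth0'
            ip = ipaddress.ip_address(raw.split("%")[0])
        except ValueError:
            continue  # not an IP address (e.g. a MAC address)
        # Membership is always False when the IP versions differ
        if ip in network:
            return str(ip)
    return ""


sample = ["127.0.0.1", "10.0.0.5", "192.168.1.20", "fe80::1%eth0"]
print(pick_ip_in_network(sample, "192.168.0.0/16"))
```

With the sample list this prints 192.168.1.20. In send_msg, the result could replace the GetLocalIPByPrefix('192.168.') call if the worker nodes sit in one known subnet.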

References

This article is from qbit snap
