Overview

After we collect container logs into a message server, how should we process them? Deploying a dedicated log processing workload incurs extra cost, and it is difficult to estimate how many standby replicas that workload needs when log volume rises or falls sharply. This article presents a serverless approach to log processing, which reduces the cost of the task pipeline while increasing its elasticity.

Our general design is to use a Kafka server as the log receiver, and then treat logs arriving at the Kafka server as events that drive a serverless workload to process them. The general steps are as follows:

  1. Set up a Kafka server as the log receiver for the Kubernetes cluster
  2. Deploy OpenFunction to provide serverless capabilities for the log processing workload
  3. Write a log processing function that catches specific logs and generates alert messages
  4. Configure Notification Manager to send alerts to Slack

In this scenario, we will take advantage of the serverless capabilities brought by OpenFunction.

OpenFunction is a FaaS (serverless) project open-sourced by the KubeSphere community. It aims to let users focus on their business logic without having to care about the underlying runtime environment and infrastructure. The project currently has the following key capabilities:

  • Supports building OCI images through Dockerfile or buildpacks
  • Supports using Knative Serving or OpenFuncAsync (KEDA + Dapr) as the runtime for serverless workloads
  • Built-in event-driven framework

Use Kafka as a log receiver

First, we enable the logging component for the KubeSphere platform (you can refer to Enable Pluggable Components for more information). Then we use the strimzi-kafka-operator to build a minimal Kafka server.

  1. Install the strimzi-kafka-operator in the default namespace:

    helm repo add strimzi https://strimzi.io/charts/
    helm install kafka-operator -n default strimzi/strimzi-kafka-operator
  2. Run the following command to create a Kafka cluster and a Kafka topic in the default namespace. The Kafka and Zookeeper clusters created by this command use ephemeral storage (emptyDir), which is for demonstration only.

    Note that we create a topic named "logs" here; it will be used later.
    cat <<EOF | kubectl apply -f -
    apiVersion: kafka.strimzi.io/v1beta2
    kind: Kafka
    metadata:
      name: kafka-logs-receiver
      namespace: default
    spec:
      kafka:
        version: 2.8.0
        replicas: 1
        listeners:
          - name: plain
            port: 9092
            type: internal
            tls: false
          - name: tls
            port: 9093
            type: internal
            tls: true
        config:
          offsets.topic.replication.factor: 1
          transaction.state.log.replication.factor: 1
          transaction.state.log.min.isr: 1
          log.message.format.version: '2.8'
          inter.broker.protocol.version: "2.8"
        storage:
          type: ephemeral
      zookeeper:
        replicas: 1
        storage:
          type: ephemeral
      entityOperator:
        topicOperator: {}
        userOperator: {}
    ---
    apiVersion: kafka.strimzi.io/v1beta2
    kind: KafkaTopic
    metadata:
      name: logs
      namespace: default
      labels:
        strimzi.io/cluster: kafka-logs-receiver
    spec:
      partitions: 10
      replicas: 1
      config:
        retention.ms: 7200000
        segment.bytes: 1073741824
    EOF
  3. Run the following command to view the Pod status and wait until Kafka and Zookeeper are up and running.

    $ kubectl get po
    NAME                                                   READY   STATUS        RESTARTS   AGE
    kafka-logs-receiver-entity-operator-568957ff84-nmtlw   3/3     Running       0          8m42s
    kafka-logs-receiver-kafka-0                            1/1     Running       0          9m13s
    kafka-logs-receiver-zookeeper-0                        1/1     Running       0          9m46s
    strimzi-cluster-operator-687fdd6f77-cwmgm              1/1     Running       0          11m

Run the following command to view the metadata of the Kafka cluster:

# Start a utility pod
$ kubectl run utils --image=arunvelsriram/utils -i --tty --rm
# View the metadata of the Kafka cluster
$ kafkacat -L -b kafka-logs-receiver-kafka-brokers:9092

We will add this Kafka server as a log receiver.

  1. Log in to the KubeSphere web console as admin, click Platform in the upper-left corner, and then select Cluster Management.

    If you have enabled the multi-cluster feature, you can select a cluster first.
  2. On the Cluster Management page, select Log Collection under Cluster Settings.
  3. Click Add Log Receiver and select Kafka. Enter the Kafka broker address and port, and then click OK to continue.

  4. Run the following command to verify whether the Kafka cluster can receive logs from Fluent Bit:

    # Start a utility pod
    $ kubectl run utils --image=arunvelsriram/utils -i --tty --rm
    # Check the logs in the logs topic
    $ kafkacat -C -b kafka-logs-receiver-kafka-0.kafka-logs-receiver-kafka-brokers.default.svc:9092 -t logs
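
    If the topic has already accumulated a large backlog, you can limit the output to the most recent records, for example (same utils pod; standard kafkacat offset flags):

    # Read only the last 5 records from the logs topic, then exit
    $ kafkacat -C -b kafka-logs-receiver-kafka-0.kafka-logs-receiver-kafka-brokers.default.svc:9092 -t logs -o -5 -e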

Deploy OpenFunction

According to the design in the overview, we need to deploy OpenFunction first. The OpenFunction project depends on many third-party projects, such as Knative, Tekton, Shipwright, Dapr, and KEDA, so manual installation is cumbersome. It is recommended to follow the Prerequisites document to deploy the components OpenFunction depends on with one click.

Here, --with-shipwright deploys Shipwright as the build driver of functions,
--with-openFuncAsync deploys the OpenFuncAsync runtime as the serving driver of functions,
and when your network access to GitHub and Google is restricted, you can add the --poor-network flag to download the related components.
sh hack/deploy.sh --with-shipwright --with-openFuncAsync --poor-network
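
After the script finishes, a quick way to confirm that the dependency components are up is to list their pods across namespaces (this is just a convenience filter over the projects the script installs):

kubectl get pods -A | grep -E 'dapr|keda|knative|tekton|shipwright'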

Deploy OpenFunction:

Here we choose to install the latest stable version; you can also use the development version by referring to the Install document.

To use Shipwright normally, we provide a default build strategy; the first command below applies that strategy, and the second deploys OpenFunction itself:

kubectl apply -f https://raw.githubusercontent.com/OpenFunction/OpenFunction/main/config/strategy/openfunction.yaml
kubectl apply -f https://github.com/OpenFunction/OpenFunction/releases/download/v0.3.0/bundle.yaml
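
Before moving on, you can verify that OpenFunction itself is running (the openfunction namespace and the CRD name below follow the defaults created by the bundle; adjust them if yours differ):

kubectl get pods -n openfunction
kubectl get crd functions.core.openfunction.io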

Write log processing function

We take Create and Deploy WordPress as an example and build a WordPress application as the log producer. The application's workload runs in the namespace "demo-project", and the Pod name is "wordpress-v1-f54f697c5-hdn2z".

When a request returns 404, the log content we receive is as follows:

{"@timestamp":1629856477.226758,"log":"*.*.*.* - - [25/Aug/2021:01:54:36 +0000] \"GET /notfound HTTP/1.1\" 404 49923 \"-\" \"curl/7.58.0\"\n","time":"2021-08-25T01:54:37.226757612Z","kubernetes":{"pod_name":"wordpress-v1-f54f697c5-hdn2z","namespace_name":"demo-project","container_name":"container-nrdsp1","docker_id":"bb7b48e2883be0c05b22c04b1d1573729dd06223ae0b1676e33a4fac655958a5","container_image":"wordpress:4.8-apache"}}

Our requirement is this: when a request results in a 404, send an alert to a receiver (you can configure a Slack notification) and record the namespace, Pod name, request path, and request method. Based on this requirement, we write a simple processing function:

You can learn how to use openfunction-context, a library provided by OpenFunction for writing functions, in the OpenFunction Context Spec.
You can find more function examples in OpenFunction Samples.
package logshandler

import (
    "encoding/json"
    "fmt"
    "log"
    "regexp"
    "time"

    ofctx "github.com/OpenFunction/functions-framework-go/openfunction-context"
    alert "github.com/prometheus/alertmanager/template"
)

const (
    HTTPCodeNotFound = "404"
    Namespace        = "demo-project"
    PodName          = "wordpress-v1-[A-Za-z0-9]{9}-[A-Za-z0-9]{5}"
    AlertName        = "404 Request"
    Severity         = "warning"
)

// LogsHandler: the ctx parameter provides the context handle of the user function in the cluster, e.g. ctx.SendTo sends data to a specified destination
// LogsHandler: the in parameter passes the data (if any) from the input source to the function as bytes
func LogsHandler(ctx *ofctx.OpenFunctionContext, in []byte) int {
    content := string(in)
    // Here we set up three regular expressions to match the HTTP status code, the resource namespace, and the resource Pod name respectively
    matchHTTPCode, _ := regexp.MatchString(fmt.Sprintf(" %s ", HTTPCodeNotFound), content)
    matchNamespace, _ := regexp.MatchString(fmt.Sprintf("namespace_name\":\"%s", Namespace), content)
    matchPodName := regexp.MustCompile(fmt.Sprintf(`(%s)`, PodName)).FindStringSubmatch(content)

    if matchHTTPCode && matchNamespace && matchPodName != nil {
        log.Printf("Match log - Content: %s", content)

        // If all three regular expressions hit, we extract some information from the log content to fill in the alert message,
        // namely the request method (HTTP method), the request path (HTTP path), and the Pod name of the 404 request
        match := regexp.MustCompile(`([A-Z]+) (/\S*) HTTP`).FindStringSubmatch(content)
        if match == nil {
            return 500
        }
        path := match[len(match)-1]
        method := match[len(match)-2]
        podName := matchPodName[len(matchPodName)-1]

        // After collecting the key information, we use the alertmanager Data struct to assemble the alert message
        notify := &alert.Data{
            Receiver:          "notification_manager",
            Status:            "firing",
            Alerts:            alert.Alerts{},
            GroupLabels:       alert.KV{"alertname": AlertName, "namespace": Namespace},
            CommonLabels:      alert.KV{"alertname": AlertName, "namespace": Namespace, "severity": Severity},
            CommonAnnotations: alert.KV{},
            ExternalURL:       "",
        }
        alt := alert.Alert{
            Status: "firing",
            Labels: alert.KV{
                "alertname": AlertName,
                "namespace": Namespace,
                "severity":  Severity,
                "pod":       podName,
                "path":      path,
                "method":    method,
            },
            Annotations:  alert.KV{},
            StartsAt:     time.Now(),
            EndsAt:       time.Time{},
            GeneratorURL: "",
            Fingerprint:  "",
        }
        notify.Alerts = append(notify.Alerts, alt)
        notifyBytes, _ := json.Marshal(notify)

        // Use ctx.SendTo to send the content to the output named "notification-manager" (you can find its definition in the function configuration logs-handler-function.yaml below)
        if err := ctx.SendTo(notifyBytes, "notification-manager"); err != nil {
            panic(err)
        }
        log.Printf("Send log to notification manager.")
    }
    return 200
}
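
To sanity-check the matching and extraction logic locally, you can exercise the same regular expressions against the sample record shown earlier. The test file below is a hypothetical sketch (it is not part of the sample repository) that lives in the same logshandler package:

package logshandler

import (
    "fmt"
    "regexp"
    "testing"
)

// A sample record in the same shape as the Fluent Bit output shown above.
const sampleLog = `{"log":"*.*.*.* - - [25/Aug/2021:01:54:36 +0000] \"GET /notfound HTTP/1.1\" 404 49923 \"-\" \"curl/7.58.0\"\n","kubernetes":{"pod_name":"wordpress-v1-f54f697c5-hdn2z","namespace_name":"demo-project"}}`

func TestLogMatching(t *testing.T) {
    // The same three matching rules used by LogsHandler
    matchHTTPCode, _ := regexp.MatchString(fmt.Sprintf(" %s ", HTTPCodeNotFound), sampleLog)
    matchNamespace, _ := regexp.MatchString(fmt.Sprintf("namespace_name\":\"%s", Namespace), sampleLog)
    matchPodName := regexp.MustCompile(fmt.Sprintf(`(%s)`, PodName)).FindStringSubmatch(sampleLog)
    if !matchHTTPCode || !matchNamespace || matchPodName == nil {
        t.Fatal("expected all three matchers to hit the sample log")
    }

    // The same method/path extraction used by LogsHandler
    match := regexp.MustCompile(`([A-Z]+) (/\S*) HTTP`).FindStringSubmatch(sampleLog)
    if match == nil || match[1] != "GET" || match[2] != "/notfound" {
        t.Fatalf("unexpected method/path extraction: %v", match)
    }
}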

We upload this function to a code repository and record the repository address and the directory path of the code within the repository; we will use these two values when creating the function in the following steps.

You can find this example in OpenFunction Samples.

Create function

Next we will use OpenFunction to build the above function. First, create a secret push-secret for accessing the image registry (after building the OCI image from the code, OpenFunction pushes the image to the user's image registry, from which the workload is later started):

REGISTRY_SERVER=https://index.docker.io/v1/
REGISTRY_USER=<your username>
REGISTRY_PASSWORD=<your password>
kubectl create secret docker-registry push-secret \
    --docker-server=$REGISTRY_SERVER \
    --docker-username=$REGISTRY_USER \
    --docker-password=$REGISTRY_PASSWORD

Apply the function definition logs-handler-function.yaml :

The function definition involves two key components:

Dapr shields the application from complex middleware, making it very easy for logs-handler to handle events in Kafka.

KEDA drives the startup of the logs-handler function by monitoring event traffic in the message server, and dynamically scales the number of logs-handler instances according to the message consumption lag in Kafka.

apiVersion: core.openfunction.io/v1alpha1
kind: Function
metadata:
  name: logs-handler
spec:
  version: "v1.0.0"
  # The path to which the built image is pushed
  image: openfunctiondev/logs-async-handler:v1
  imageCredentials:
    name: push-secret
  build:
    builder: openfunctiondev/go115-builder:v0.2.0
    env:
      FUNC_NAME: "LogsHandler"
    # The path of the source code is defined here
    # url is the code repository address mentioned above
    # sourceSubPath is the directory path of the code within the repository
    srcRepo:
      url: "https://github.com/OpenFunction/samples.git"
      sourceSubPath: "functions/OpenFuncAsync/logs-handler-function/"
  serving:
    # OpenFuncAsync is an event-driven asynchronous function runtime implemented by OpenFunction through KEDA + Dapr
    runtime: "OpenFuncAsync"
    openFuncAsync:
      # The function's input (kafka-receiver) and output (notification-manager) are defined here, corresponding to the definitions under components below
      dapr:
        inputs:
          - name: kafka-receiver
            type: bindings
        outputs:
          - name: notification-manager
            type: bindings
            params:
              operation: "post"
              type: "bindings"
        annotations:
          dapr.io/log-level: "debug"
        # The concrete definitions of the above input and output (i.e. Dapr components)
        components:
          - name: kafka-receiver
            type: bindings.kafka
            version: v1
            metadata:
              - name: brokers
                value: "kafka-logs-receiver-kafka-brokers:9092"
              - name: authRequired
                value: "false"
              - name: publishTopic
                value: "logs"
              - name: topics
                value: "logs"
              - name: consumerGroup
                value: "logs-handler"
          # The address of KubeSphere's notification-manager
          - name: notification-manager
            type: bindings.http
            version: v1
            metadata:
              - name: url
                value: http://notification-manager-svc.kubesphere-monitoring-system.svc.cluster.local:19093/api/v2/alerts
      keda:
        scaledObject:
          pollingInterval: 15
          minReplicaCount: 0
          maxReplicaCount: 10
          cooldownPeriod: 30
          # The trigger of the function is defined here, i.e. the "logs" topic of the Kafka server
          # A message lag threshold of 10 is also defined: when the backlog exceeds 10 messages, the number of logs-handler instances scales up automatically
          triggers:
            - type: kafka
              metadata:
                topic: logs
                bootstrapServers: kafka-logs-receiver-kafka-brokers.default.svc.cluster.local:9092
                consumerGroup: logs-handler
                lagThreshold: "10"
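
After applying the manifest above, you can follow the function through its build and serving phases. A minimal check might look like this (builders and servings are intermediate custom resources created by OpenFunction; the exact status columns depend on the OpenFunction version):

kubectl apply -f logs-handler-function.yaml
kubectl get functions.core.openfunction.io logs-handler
kubectl get builders.core.openfunction.io,servings.core.openfunction.io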

Result presentation

Let's deactivate the Kafka log receiver first: on the Log Collection page, click the Kafka log receiver to enter its details page, then click More and select Change Status, and set the receiver to Deactivated.

After the receiver has been deactivated for a while, we can observe that the number of logs-handler function instances has scaled down to 0.

Then we reactivate the Kafka log receiver, and logs-handler starts up again:

~# kubectl get po --watch
NAME                                                     READY   STATUS        RESTARTS   AGE
kafka-logs-receiver-entity-operator-568957ff84-tdrrx     3/3     Running       0          7m27s
kafka-logs-receiver-kafka-0                              1/1     Running       0          7m48s
kafka-logs-receiver-zookeeper-0                          1/1     Running       0          8m12s
logs-handler-serving-kpngc-v100-zcj4q-5f46996f8c-b9d6f   2/2     Terminating   0          34s
strimzi-cluster-operator-687fdd6f77-kc8cv                1/1     Running       0          10m
logs-handler-serving-kpngc-v100-zcj4q-5f46996f8c-b9d6f   2/2     Terminating   0          36s
logs-handler-serving-kpngc-v100-zcj4q-5f46996f8c-b9d6f   0/2     Terminating   0          37s
logs-handler-serving-kpngc-v100-zcj4q-5f46996f8c-b9d6f   0/2     Terminating   0          38s
logs-handler-serving-kpngc-v100-zcj4q-5f46996f8c-b9d6f   0/2     Terminating   0          38s
logs-handler-serving-kpngc-v100-zcj4q-5f46996f8c-9kj2c   0/2     Pending       0          0s
logs-handler-serving-kpngc-v100-zcj4q-5f46996f8c-9kj2c   0/2     Pending       0          0s
logs-handler-serving-kpngc-v100-zcj4q-5f46996f8c-9kj2c   0/2     ContainerCreating   0          0s
logs-handler-serving-kpngc-v100-zcj4q-5f46996f8c-9kj2c   0/2     ContainerCreating   0          2s
logs-handler-serving-kpngc-v100-zcj4q-5f46996f8c-9kj2c   1/2     Running             0          4s
logs-handler-serving-kpngc-v100-zcj4q-5f46996f8c-9kj2c   2/2     Running             0          11s

Then we send a request to a nonexistent path of the WordPress application:

curl http://<wp-svc-address>/notfound

You can see that the alert message has been received in Slack (by contrast, when we visit the WordPress site normally, Slack does not receive an alert):
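
Each such 404 request generates one alert. To watch KEDA scale logs-handler beyond a single replica, you can send a burst of requests so that the consumer lag exceeds the lagThreshold of 10 configured above (a simple loop; <wp-svc-address> is the same placeholder as before):

for i in $(seq 1 50); do
    curl -s -o /dev/null http://<wp-svc-address>/notfound
done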

Explore further

  • Synchronous function solution
> To use Knative Serving normally, we need to set the load balancer address of its gateway. (You can use the local address as a workaround.)
>
> Replace "1.2.3.4" below with the address in your actual environment.
>
> ```shell
> kubectl patch svc -n kourier-system kourier \
> -p '{"spec": {"type": "LoadBalancer", "externalIPs": ["1.2.3.4"]}}'
> 
> kubectl patch configmap/config-domain -n knative-serving \
> --type merge --patch '{"data":{"1.2.3.4.sslip.io":""}}'
> ```
>

In addition to functions driven directly by the Kafka server (asynchronous functions), OpenFunction also supports using its own event framework to connect to the Kafka server and then drive Knative functions (synchronous functions) in sink mode. You can refer to OpenFunction Samples.

In this solution, the processing speed of the synchronous function is lower than that of the asynchronous function. We could also use KEDA to trigger the concurrency mechanism of Knative Serving, but overall it lacks the convenience of asynchronous functions. (In subsequent iterations we will optimize the OpenFunction event framework to address this shortcoming of synchronous functions.)

It can be seen that different types of serverless functions suit different task scenarios. For example, an ordered control flow needs to be handled by synchronous functions rather than asynchronous functions.

Summary

Serverless delivers what we have been hoping for: the ability to quickly take apart and rebuild business scenarios.

As this case shows, OpenFunction not only makes the log processing and alerting pipeline more elastic in a serverless way, but also uses its functions framework to turn the usually complex configuration steps of connecting to Kafka into clear, semantic code. At the same time, OpenFunction keeps evolving: in an upcoming version it will use its own serverless capabilities to drive its own components.
