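Guance DataKit now supports installation with Docker. This article describes how to install DataKit in Docker.
Configure and start the DataKit container
Log in to the Guance platform, click "Integrations" - "DataKit" - "Docker", copy the startup command shown in step 2, and adjust the startup parameters to your environment.
Copy the startup command: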
sudo docker run \
--hostname "$(hostname)" \
--workdir /usr/local/datakit \
-v "/etc/conf/dir/conf.d":"/usr/local/datakit/conf.d/host-inputs-conf" \
-v "/":"/rootfs" \
-v /var/run/docker.sock:/var/run/docker.sock \
-e ENV_DATAWAY="https://openway.guance.com?token=tkn_XXXX" \
-e ENV_DEFAULT_ENABLED_INPUTS='cpu,disk,diskio,mem,swap,system,net,host_processes,hostobject,container,dk' \
-e ENV_GLOBAL_HOST_TAGS="tag1=a1,tag2=a2" \
-e ENV_HTTP_LISTEN="0.0.0.0:9529" \
-e HOST_PROC="/rootfs/proc" \
-e HOST_SYS="/rootfs/sys" \
-e HOST_ETC="/rootfs/etc" \
-e HOST_VAR="/rootfs/var" \
-e HOST_RUN="/rootfs/run" \
-e HOST_DEV="/rootfs/dev" \
-e HOST_ROOT="/rootfs" \
--cpus 2 \
--memory 1g \
--privileged \
--publish 9529:9529 \
--name datakit-docker \
-d \
pubrepo.guance.com/datakit/datakit:1.66.2
After the container starts, check whether it is running:
docker ps
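If datakit-docker appears in the output with a status that starts with Up, the container started successfully.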
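The check above can also be scripted. The following is a minimal sketch, assuming the container name datakit-docker from the startup command; the helper names status_is_up and check_datakit are hypothetical and not part of DataKit or Docker.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when a `docker ps` status string
# (for example "Up 5 minutes") indicates a running container.
status_is_up() {
  case "$1" in
    Up*) return 0 ;;
    *)   return 1 ;;
  esac
}

# Hypothetical helper: looks up the status of a container by name.
# Note: the name filter matches substrings, so keep container names distinct.
check_datakit() {
  name="${1:-datakit-docker}"
  status="$(docker ps --filter "name=${name}" --format '{{.Status}}')"
  if status_is_up "$status"; then
    echo "${name} is running: ${status}"
  else
    echo "${name} is not running" >&2
    return 1
  fi
}
```

Calling check_datakit without arguments checks the datakit-docker container; a non-zero exit code means it is not listed as running, in which case docker logs datakit-docker is a reasonable next step.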
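Startup parameters:
--hostname: Uses the host's hostname as the hostname DataKit runs under. To run several DataKit instances on the same host, append a suffix, for example --hostname "$(hostname)-dk1".
--workdir: Sets the container's working directory.
-v: Mounts host files into the container:
- DataKit has many configuration files. You can prepare them on the host and mount the whole directory into the container in one step with -v (inside the container it appears as the conf.d/host-inputs-conf directory).
- The host's root directory is mounted into DataKit so that it can read host information (for example, the files under /proc), which the collectors enabled by default rely on.
- The docker.sock file is mounted into the DataKit container so that the container collector can collect data. The location of this file can differ between hosts, so set it according to your environment.
-e: Sets DataKit's runtime environment variables. They work the same way as in a DaemonSet deployment.
ENV_DATAWAY: Paste your token into the value of this environment variable, after "token=".
--publish: Lets external senders deliver Trace and other data to the DataKit container. Here DataKit's HTTP port is mapped to port 9529 on the host; keep this port in mind when configuring the address that trace data is sent to.
--name: Sets the Docker container name. If omitted, a random name is generated.
--cpus and --memory: Limit this DataKit instance to 2 CPU cores and 1 GiB of memory.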
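To make the token and hostname parameters concrete, here is a minimal sketch of two shell helpers that compose the values passed to -e ENV_DATAWAY and --hostname. The function names dataway_url and dk_hostname are hypothetical, and the token tkn_abc used in the usage note is a made-up placeholder.

```shell
#!/bin/sh
# Hypothetical helper: builds the ENV_DATAWAY value from a workspace token.
# Rejects an empty token and the tkn_XXXX placeholder from the sample command.
dataway_url() {
  token="$1"
  case "$token" in
    ""|tkn_XXXX)
      echo "replace the placeholder with a real token" >&2
      return 1 ;;
  esac
  printf 'https://openway.guance.com?token=%s\n' "$token"
}

# Hypothetical helper: derives a per-instance hostname such as "myhost-dk1".
# Usage: dk_hostname <base-hostname> <suffix>
dk_hostname() {
  printf '%s-%s\n' "$1" "$2"
}
```

These could be used in the startup command as -e ENV_DATAWAY="$(dataway_url tkn_abc)" and --hostname "$(dk_hostname "$(hostname)" dk1)". When running a second instance on the same host, the --name value and the host side of --publish also need to differ, since Docker does not allow two containers to share a name or a published host port.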
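Suppose the following collectors are configured in the mounted host directory (/etc/conf/dir/conf.d in the command above):
- APM: collectors such as DDTrace and OpenTelemetry
- Prometheus exporter: if some application containers in the current Docker environment expose their own metrics (typically at a URL like http://ip:9100/metrics), you can publish that port and write a prom.conf to collect the metrics
- Log collection: if some Docker containers write logs to a directory on the host, you can write a separate log collection configuration for those files. The host directories must first be mounted into the DataKit container with -v. In addition, the container collector, which is enabled by default, automatically collects the stdout logs of all containers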
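As an illustration of the Prometheus exporter case, the sketch below writes a minimal prom.conf into the mounted host directory. It is only a sketch: the helper name write_prom_conf is hypothetical, and the urls and source keys are assumptions about the prom collector's configuration that should be checked against the prom.conf sample shipped with your DataKit version.

```shell
#!/bin/sh
# Hypothetical helper: writes a minimal prom.conf for one exporter URL.
# Usage: write_prom_conf <host-conf-dir> <metrics-url>
write_prom_conf() {
  conf_dir="$1"
  metrics_url="$2"
  mkdir -p "${conf_dir}/prom" || return 1
  # Keys below are assumed; compare with the official prom.conf sample.
  cat > "${conf_dir}/prom/prom.conf" <<EOF
[[inputs.prom]]
  ## URL of the exporter to scrape (replace with the real address)
  urls = ["${metrics_url}"]
  ## Name used to identify this data source (assumed value)
  source = "prom"
EOF
}
```

For example, write_prom_conf /etc/conf/dir/conf.d http://ip:9100/metrics creates /etc/conf/dir/conf.d/prom/prom.conf on the host; DataKit then needs a restart (docker restart datakit-docker) to pick it up.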
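Log in to the Guance platform, click "Infrastructure" - "Containers", check whether the container named datakit-docker has been reported, and click it to view the container details.
Scenario demo
This section shows how to use DataKit running in Docker to collect user access data from an application.
Enable the RUM collector
In the mounted directory /etc/conf/dir/conf.d, create a rum directory, then create a file named rum.conf in it with the following content: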
# {"version": "1.66.2", "desc": "do NOT edit this line"}
[[inputs.rum]]
## profile Agent endpoints register by version respectively.
## Endpoints can be skipped (not listened on) by removing them from the list.
## Default value set as below. DO NOT MODIFY THESE ENDPOINTS if not necessary.
endpoints = ["/v1/write/rum"]
## used to upload rum session replay.
session_replay_endpoints = ["/v1/write/rum/replay"]
## specify which metrics should be captured.
measurements = ["view", "resource", "action", "long_task", "error", "telemetry"]
## Android command-line-tools HOME
android_cmdline_home = "/usr/local/datakit/data/rum/tools/cmdline-tools"
## proguard HOME
proguard_home = "/usr/local/datakit/data/rum/tools/proguard"
## android-ndk HOME
ndk_home = "/usr/local/datakit/data/rum/tools/android-ndk"
## atos or atosl bin path
## on macOS, DataKit uses the built-in atos tool by default
## on Linux there are several tools that can partially replace macOS atos,
## such as https://github.com/everettjf/atosl-rs
atos_bin_path = "/usr/local/datakit/data/rum/tools/atosl"
# Provide a list to resolve CDN of your static resource.
# Below is the Datakit default built-in CDN list, you can uncomment that and change it to your cdn list,
# it's a JSON array like: [{"domain": "CDN domain", "name": "CDN human readable name", "website": "CDN official website"},...],
# domain field value can contain '*' as wildcard, for example: "kunlun*.com",
# it will match "kunluna.com", "kunlunab.com" and "kunlunabc.com" but not "kunlunab.c.com".
# cdn_map = '''
# [
# {"domain":"15cdn.com","name":"some-CDN-name","website":"https://www.15cdn.com"},
# {"domain":"tzcdn.cn","name":"some-CDN-name","website":"https://www.15cdn.com"}
# ]
# '''
## Threads config controls how many goroutines an agent could start to handle HTTP requests.
## buffer is the size of the worker channel's job buffer.
## threads is the total number of goroutines at run time.
# [inputs.rum.threads]
# buffer = 100
# threads = 8
## Storage config sets up a local storage space on the hard drive to cache trace data.
## path is the local file path used to cache data.
## capacity is total space size(MB) used to store data.
# [inputs.rum.storage]
# path = "./rum_storage"
# capacity = 5120
## session_replay config is used to control Session Replay uploading behavior.
## cache_path set the disk directory where temporarily cache session replay data.
## cache_capacity_mb specify the max storage space (in MiB) that session replay cache can use.
## clear_cache_on_start set whether we should clear all previous session replay cache on restarting Datakit.
## upload_workers set the count of session replay uploading workers.
## send_timeout specify the http timeout when uploading session replay data to dataway.
## send_retry_count set the max retry count when sending every session replay request.
## filter_rules sets the filtering rules; session replay data that matches them will be dropped,
## all rules are OR-ed together, that is to say, data matching any one of them will be dropped.
# [inputs.rum.session_replay]
# cache_path = "/usr/local/datakit/cache/session_replay"
# cache_capacity_mb = 20480
# clear_cache_on_start = false
# upload_workers = 16
# send_timeout = "75s"
# send_retry_count = 3
# filter_rules = [
# "{ service = 'xxx' or version IN [ 'v1', 'v2'] }",
# "{ app_id = 'yyy' and env = 'production' }"
# ]
Then restart DataKit:
docker restart datakit-docker
docker ps
Enter the container and run datakit monitor to confirm that the mounted configuration has been loaded:
docker exec -it datakit-docker /bin/bash
datakit monitor
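The mount can also be checked without an interactive shell. The sketch below assumes the -v mapping from the startup command (/etc/conf/dir/conf.d on the host to /usr/local/datakit/conf.d/host-inputs-conf in the container) and the container name datakit-docker; the helper names container_path and check_mounted are hypothetical.

```shell
#!/bin/sh
# Mount mapping taken from the -v option in the startup command.
HOST_CONF_DIR="/etc/conf/dir/conf.d"
CONTAINER_CONF_DIR="/usr/local/datakit/conf.d/host-inputs-conf"

# Hypothetical helper: translates a host path under HOST_CONF_DIR
# into the path where the same file appears inside the container.
container_path() {
  case "$1" in
    "${HOST_CONF_DIR}"/*)
      printf '%s/%s\n' "$CONTAINER_CONF_DIR" "${1#"${HOST_CONF_DIR}"/}" ;;
    *)
      echo "not under ${HOST_CONF_DIR}: $1" >&2
      return 1 ;;
  esac
}

# Hypothetical helper: succeeds if the host config file is visible
# as a regular file inside the datakit-docker container.
check_mounted() {
  docker exec datakit-docker test -f "$(container_path "$1")"
}
```

For example, check_mounted /etc/conf/dir/conf.d/rum/rum.conf returns 0 when rum.conf is visible inside the container. This only confirms the mount; whether the collector actually loaded the file is still best confirmed with datakit monitor.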
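Connect the application
Log in to the Guance console, go to "Real User Monitoring", and click "Create Application" in the upper-left corner to start creating a new application.
Select Web as the application type and choose the NPM integration method for a locally deployed environment.
Fill in the configuration parameters as needed and click Create. The application then appears in the application list.
Next, copy the SDK code into your front-end project.
Start the application and visit it; the resulting data is reported to the Guance platform.
Results in Guance
Log in to the Guance console, click "Real User Monitoring" - "Application List", then click the application you created.
Click Explorer to query the collected user access data.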