Preface
mtail is a log extraction tool developed by Google that is much lighter than ELK/EFK or Grafana Loki. Since my requirement was only to collect a few metrics from production logs, I chose the simpler combination of mtail with Prometheus and Grafana to implement custom log monitoring.
Update history
August 04, 2021 - first draft
Read the original text: https://wsgzao.github.io/post/mtail/
Common log monitoring solutions
For open source business log monitoring, I mainly recommend the following three options. It is worth noting that ELK is gradually being replaced by EFK.
1. ELK: "ELK" is the acronym for three open source projects: Elasticsearch, Logstash, and Kibana.
Elasticsearch is a search and analysis engine.
Logstash is a server-side data processing pipeline that can collect data from multiple sources at the same time, transform the data, and then send the data to a "repository" such as Elasticsearch.
Kibana allows users to use graphs and charts to visualize data in Elasticsearch.
2. Loki: the latest open source project from the Grafana Labs team, a horizontally scalable, highly available, multi-tenant log aggregation system.
3. mtail: a log extraction tool developed by Google that extracts metrics from application logs and exports them to a time series database or time series calculator. Its purpose is to read application logs in real time, analyze them with scripts you write yourself, and generate time series metrics.
The tool that suits your needs is the best one. Both EFK and Loki are full-featured log collection systems, each with its own strengths. I have recorded some experience with them in earlier blog posts for reference:
Scribe installation and use: https://wsgzao.github.io/post/scribe/
Using ELK (Elasticsearch + Logstash + Kibana) to build a centralized log analysis platform: https://wsgzao.github.io/post/elk/
The difference between the open source log management solutions ELK and EFK: https://wsgzao.github.io/post/efk/
Grafana Loki open source log aggregation system as a replacement for ELK or EFK: https://wsgzao.github.io/post/loki/
Introduction to mtail
mtail - extract whitebox monitoring data from application logs for collection into a timeseries database
mtail is a tool for extracting metrics from application logs to be exported into a timeseries database or timeseries calculator for alerting and dashboarding.
It fills a monitoring niche by being the glue between applications that do not export their own internal state (other than via logs) and existing monitoring systems, such that system operators do not need to patch those applications to instrument them or write custom extraction code for every such application.
The extraction is controlled by mtail programs which define patterns and actions:
# simple line counter
counter lines_total
/$/ {
lines_total++
}
Metrics are exported for scraping by a collector as JSON or Prometheus format over HTTP, or can be periodically sent to a collectd, StatsD, or Graphite collector socket.
In other words, mtail is a tool that extracts metrics from application logs and exports them to a time series database or time series calculator for alerting and dashboards. Simply put, it reads application logs in real time, analyzes them with scripts you write yourself, and generates time series metrics.
https://github.com/google/mtail
mtail installation
Download: https://github.com/google/mtail/releases
# check latest version from github
wget https://github.com/google/mtail/releases/download/v3.0.0-rc47/mtail_3.0.0-rc47_Linux_x86_64.tar.gz
tar xf mtail_3.0.0-rc47_Linux_x86_64.tar.gz
# optionally, copy mtail to /usr/local/bin
# cp mtail /usr/local/bin
# check the mtail version
./mtail --version
mtail version 3.0.0-rc47 git revision 5e0099f843e4e4f2b7189c21019de18eb49181bf go version go1.16.5 go arch amd64 go os linux
# start mtail in the background
nohup mtail -port 3903 -logtostderr -progs test.mtail -logs test.log &
# the default port is 3903
nohup ./mtail -progs test.mtail -logs test.log &
# check whether mtail started successfully
ps -ef | grep mtail
For a detailed explanation of all parameters, run mtail -h in the console. Here are a few commonly used parameters:
Parameter | Description |
---|---|
-address | Host or IP address on which to bind the HTTP listener |
-alsologtostderr | Log to standard error as well as to files |
-emit\_metric\_timestamp | Emit the recorded timestamp of a metric. If disabled (the default), no explicit timestamp is sent to the collector |
-expired\_metrics\_gc\_interval | Interval between expired-metric garbage collection runs (default 1h0m0s) |
-ignore\_filename\_regex\_pattern | Log file names to ignore; regular expressions are supported |
-log\_dir | Directory for mtail's own log files, similar to logtostderr; if logtostderr is also set, log\_dir takes no effect |
-logs | List of log files to monitor. Separate multiple files with commas, repeat the -logs flag, or specify a directory; the wildcard \* is supported, and a directory pattern must be quoted in single quotes, e.g. -logs a.log,b.log or -logs a.log -logs b.log or -logs '/export/logs/*.log' |
-logtostderr | Log directly to standard error; compilation problems are also printed there |
-override\_timezone | Set the time zone; if given, the specified time zone is used for timestamp conversion instead of UTC |
-port | HTTP port to listen on (default 3903) |
-progs | Path to the directory containing the mtail programs |
-trace\_sample\_period | Sampling period for traces sent to the collector; set to 100 to sample one trace out of every 100 |
-v | Log level for V logs; may be overridden by the vmodule flag (default 0) |
-version | Print the mtail version |
After the program starts, it listens on port 3903 by default. The web console is available at http://ip:3903, and metrics are exposed at http://ip:3903/metrics.
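For a quick check that the exporter is up (assuming mtail is running on the local host with the default port), you can curl the metrics endpoint and look for the counter defined in the example program:

```
# fetch the Prometheus-format metrics and look for the lines_total counter
curl -s http://localhost:3903/metrics | grep lines_total
```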
Detailed explanation of mtail parameters
./mtail -h
mtail version 3.0.0-rc47 git revision 5e0099f843e4e4f2b7189c21019de18eb49181bf go version go1.16.5 go arch amd64 go os linux
Usage:
-address string
Host or IP address on which to bind HTTP listener
-alsologtostderr
log to standard error as well as files
-block_profile_rate int
Nanoseconds of block time before goroutine blocking events reported. 0 turns off. See https://golang.org/pkg/runtime/#SetBlockProfileRate
-collectd_prefix string
Prefix to use for collectd metrics.
-collectd_socketpath string
Path to collectd unixsock to write metrics to.
-compile_only
Compile programs only, do not load the virtual machine.
-disable_fsnotify
DEPRECATED: this flag is no longer in use. (default true)
-dump_ast
Dump AST of programs after parse (to INFO log).
-dump_ast_types
Dump AST of programs with type annotation after typecheck (to INFO log).
-dump_bytecode
Dump bytecode of programs (to INFO log).
-emit_metric_timestamp
Emit the recorded timestamp of a metric. If disabled (the default) no explicit timestamp is sent to a collector.
-emit_prog_label
Emit the 'prog' label in variable exports. (default true)
-expired_metrics_gc_interval duration
interval between expired metric garbage collection runs (default 1h0m0s)
-graphite_host_port string
Host:port to graphite carbon server to write metrics to.
-graphite_prefix string
Prefix to use for graphite metrics.
-ignore_filename_regex_pattern string
-jaeger_endpoint string
If set, collector endpoint URL of jaeger thrift service
-log_backtrace_at value
when logging hits line file:N, emit a stack trace
-log_dir string
If non-empty, write log files in this directory
-logs value
List of log files to monitor, separated by commas. This flag may be specified multiple times.
-logtostderr
log to standard error instead of files
-max_recursion_depth int
The maximum length a mtail statement can be, as measured by parsed tokens. Excessively long mtail expressions are likely to cause compilation and runtime performance problems. (default 100)
-max_regexp_length int
The maximum length a mtail regexp expression can have. Excessively long patterns are likely to cause compilation and runtime performance problems. (default 1024)
-metric_push_interval duration
interval between metric pushes to passive collectors (default 1m0s)
-metric_push_interval_seconds int
DEPRECATED: use --metric_push_interval instead
-metric_push_write_deadline duration
Time to wait for a push to succeed before exiting with an error. (default 10s)
-mtailDebug int
Set parser debug level.
-mutex_profile_fraction int
Fraction of mutex contention events reported. 0 turns off. See http://golang.org/pkg/runtime/#SetMutexProfileFraction
-one_shot
Compile the programs, then read the contents of the provided logs from start until EOF, print the values of the metrics store and exit. This is a debugging flag only, not for production use.
-override_timezone string
If set, use the provided timezone in timestamp conversion, instead of UTC.
-poll_interval duration
Set the interval to poll all log files for data; must be positive, or zero to disable polling. With polling mode, only the files found at mtail startup will be polled. (default 250ms)
-port string
HTTP port to listen on. (default "3903")
-progs string
Name of the directory containing mtail programs
-stale_log_gc_interval duration
interval between stale log garbage collection runs (default 1h0m0s)
-statsd_hostport string
Host:port to statsd server to write metrics to.
-statsd_prefix string
Prefix to use for statsd metrics.
-stderrthreshold value
logs at or above this threshold go to stderr
-syslog_use_current_year
Patch yearless timestamps with the present year. (default true)
-trace_sample_period int
Sample period for traces. If non-zero, every nth trace will be sampled.
-unix_socket string
UNIX Socket to listen on
-v value
log level for V logs
-version
Print mtail version information.
-vm_logs_runtime_errors
Enables logging of runtime errors to the standard log. Set to false to only have the errors printed to the HTTP console. (default true)
-vmodule value
comma-separated list of pattern=N settings for file-filtered logging
Parameter | Description |
---|---|
-address | Host or IP address on which to bind the HTTP listener |
-alsologtostderr | Log to standard error as well as to files |
-block\_profile\_rate | Nanoseconds of block time before goroutine blocking events are reported; 0 turns it off |
-collectd\_prefix | Prefix to use for metrics sent to collectd |
-collectd\_socketpath | Path to the collectd unixsock to which metrics are written |
-compile\_only | Only compile the mtail programs without running them; useful for testing scripts |
-disable\_fsnotify | Whether to disable dynamic file discovery. When true, newly appearing files are not monitored, only the files present at startup. (Deprecated; no longer in use) |
-dump\_ast | Dump the AST of programs after parsing (to the INFO log) |
-dump\_ast\_types | Dump the AST of programs with type annotations after type checking (to the INFO log) |
-dump\_bytecode | Dump the bytecode of programs (to the INFO log) |
-emit\_metric\_timestamp | Emit the recorded timestamp of a metric. If disabled (the default), no explicit timestamp is sent to the collector |
-emit\_prog\_label | Emit the 'prog' label in exported variables (default true) |
-expired\_metrics\_gc\_interval | Interval between expired-metric garbage collection runs (default 1h0m0s) |
-graphite\_host\_port | Address of the Graphite carbon server (Host:port) to which metrics are written |
-graphite\_prefix | Prefix to use for metrics sent to Graphite |
-ignore\_filename\_regex\_pattern | Log file names to ignore; regular expressions are supported. Typical use: when -logs specifies a directory, use this pattern to skip some of its files |
-jaeger\_endpoint | If set, traces are exported to the Jaeger trace collector at the given collector endpoint URL |
-log\_backtrace\_at | When logging hits line file:N, emit a stack trace |
-log\_dir | Directory for mtail's own log files, similar to logtostderr; if logtostderr is also set, log\_dir takes no effect |
-logs | List of log files to monitor. Separate multiple files with commas, repeat the -logs flag, or specify a directory; the wildcard * is supported, and a directory pattern must be quoted in single quotes |
-logtostderr | Log directly to standard error; compilation problems are also printed there |
-metric\_push\_interval\_seconds | Interval between metric pushes, in seconds (default 60); deprecated, use -metric\_push\_interval instead |
-metric\_push\_write\_deadline | Time to wait for a push to succeed before exiting with an error (default 10s) |
-mtailDebug | Set the parser debug level |
-mutex\_profile\_fraction | Fraction of mutex contention events reported; 0 turns it off |
-one\_shot | Compile and run the mtail programs, read the specified logs from the beginning (not a real-time tail) until EOF, print all collected metrics, and exit. Used to verify that an mtail program produces the expected output; not for production use |
-override\_timezone | Set the time zone; if given, the specified time zone is used for timestamp conversion instead of UTC |
-poll\_interval | Interval at which to poll all log files for data; must be positive, or zero to disable polling. In polling mode, only the files present at mtail startup are polled (default 250ms) |
-port | HTTP port to listen on (default 3903) |
-progs | Path to the directory containing the mtail programs |
-stale\_log\_gc\_interval | Interval between stale-log garbage collection runs (default 1h0m0s) |
-statsd\_hostport | Address of the StatsD server (Host:port) to which metrics are written |
-statsd\_prefix | Prefix to use for metrics sent to StatsD |
-stderrthreshold | Log messages at or above this severity threshold are written to stderr in addition to the log files. Severity values: INFO=0, WARNING=1, ERROR=2, FATAL=3 (default 2) |
-syslog\_use\_current\_year | Patch yearless timestamps with the current year (default true) |
-trace\_sample\_period | Sampling period for traces sent to the collector; set to 100 to sample one trace out of every 100 |
-v | Log level for V logs; may be overridden by the vmodule flag (default 0) |
-version | Print the mtail version |
-vmodule | Set the log level per file or module, e.g. -vmodule=mapreduce=2,file=1,gfs*=3 |
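As an illustrative combination of the flags above (the paths and timezone here are made up for the example), a whole directory of logs can be watched while rotated archives are ignored:

```
# hypothetical paths: watch all .log files under /var/log/app, skip gzipped rotations
./mtail -progs /etc/mtail \
        -logs '/var/log/app/*.log' \
        -ignore_filename_regex_pattern '\.gz$' \
        -override_timezone Asia/Shanghai \
        -port 3903 \
        -logtostderr
```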
mtail script syntax
Read the programming guide if you want to learn how to write mtail programs.
https://github.com/google/mtail/blob/main/docs/Programming-Guide.md
mtail script standard format
The standard format is:
COND {
ACTION
}
where COND is a conditional expression. It can be a regular expression or a boolean conditional statement, as in the following examples:
/foo/ {
ACTION1
}
variable > 0 {
ACTION2
}
/foo/ && variable > 0 {
ACTION3
}
The operators available in COND expressions are as follows:
- Relational operators:
< , <= , > , >= , == , != , =~ , !~ , || , && , !
- Arithmetic operators:
| , & , ^ , + , - , * , / , << , >> , **

The operators available for metric (exported) variables are as follows:
= , += , ++ , --
mtail's purpose is to extract information from logs and pass it to a monitoring system, so metric variables must be exported and named. A variable is declared with a metric type such as counter or gauge, and the declaration must appear before the COND script.
For example, to export a counter-type metric lines_total that counts the number of log lines, the script is as follows:
# simple line counter
counter lines_total
/$/ {
lines_total++
}
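Metrics can also be declared with dimensions using the by keyword, so that one counter is split per label value. A minimal sketch, assuming an access log that contains the HTTP method (the log format and metric name are made up for the example):

```
# count requests per HTTP method; the log line format here is assumed
counter http_requests_total by method

/(?P<method>GET|POST|PUT|DELETE) / {
  http_requests_total[$method]++
}
```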
Types supported by mtail
The counter, gauge, and histogram types in mtail correspond to the Prometheus metric types of the same names.
A counter is a monotonically increasing metric: it only goes up, never down. Counters are suitable for values such as the number of service requests, completed tasks, or failed tasks.
A gauge is a metric that can change arbitrarily, going up or down. For example, you can take the value matched by a capture group and assign it directly to the metric variable, or assign it after some calculation, as in the sketch below.
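A minimal sketch of a gauge driven by a captured value (the "queue length: N" log format is assumed for illustration):

```
# track the most recently reported queue length from the log
gauge queue_length

/queue length: (?P<length>\d+)/ {
  queue_length = $length
}
```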
A histogram groups observations into buckets for statistical analysis. Quoting the description of histograms from the Prometheus documentation:
In most cases people tend to use the average of a quantitative metric, such as average CPU usage or the average response time of a page. The problem with this approach is obvious. Take the average response time of API calls as an example: if most API requests complete within 100 ms while a few individual requests take 5 s, the response time of some web pages falls far from the average. This phenomenon is known as the long-tail problem.
To distinguish overall slowness from long-tail slowness, the simplest way is to group requests by latency range, for example counting the number of requests with latency 0-10 ms and the number with latency 10-20 ms. This makes it easy to analyze why a system is slow. Both the Histogram and Summary types exist to solve this problem: with Histogram and Summary metrics we can quickly understand the distribution of monitored samples.
A histogram samples data over a period of time (usually request durations or response sizes), counts the samples in configurable buckets, and lets you filter samples by interval or aggregate their total; the result is usually displayed as a histogram chart.
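A rough sketch of a histogram in an mtail program, following the bucket declaration described in the mtail programming guide (the log format, metric name, and bucket boundaries are assumptions for illustration):

```
# observe request durations into latency buckets; a "request took 0.123s" log format is assumed
histogram request_seconds buckets 0.01, 0.05, 0.1, 0.5, 1, 5

/request took (?P<seconds>\d+\.\d+)s/ {
  request_seconds = $seconds
}
```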
mtail detailed explanation: https://blog.csdn.net/bluuusea/article/details/105508897
Configure Prometheus data source
Add the mtail target to the Prometheus configuration and restart Prometheus, then add a new panel to the Grafana dashboard and select the Prometheus data source that has already been set up.
vim prometheus-config.yml
# global configuration
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  # scrape the mtail exporter
  - job_name: 'mtail'
    static_configs:
      - targets: ['<internal-ip>:3903']
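Once Prometheus is scraping the mtail target, the exported metrics can be queried from Grafana panels with ordinary PromQL. As an illustration only (the Prometheus host is a placeholder; lines_total is the counter from the earlier example script), the query can be tried against the Prometheus HTTP API:

```
# per-second rate of matched log lines over the last 5 minutes
curl -s 'http://<prometheus-host>:9090/api/v1/query?query=rate(lines_total[5m])'
```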
Reference article
prometheus + grafana + mtail + node_exporter for machine load and business monitoring