Hello everyone, I am Xiaofu~

Recently I found an interesting tool that I can't wait to share with you.

How do you usually check logs on Linux?

I usually reach for the classic system commands like tail, head, cat, sed, more, and less, or data-filtering tools like awk; together they make queries quite efficient. But in the process of using them I get a headache: there are too many command parameters and rules, and remembering them all hurts my brain.

Isn't there a more general way to query logs, say with SQL? After all, that's an expression programmers are much more familiar with.

The tool shared today, q, does exactly that: it lets you query and aggregate text content by writing SQL. Let's take a look at the magic of this thing.

Set up an environment

q is a command-line tool that lets us run SQL statements directly against any file or result set; for example, we can execute a SQL query against the output of the ps -ef process-listing command.

The gist is: the text is the database table. Um~, of course, that sentence is my own understanding, hahaha.

It treats ordinary files or result sets as database tables and supports almost all SQL constructs, such as WHERE, GROUP BY, JOINs, etc. It supports automatic detection of column names and types and cross-file join queries (both of these are introduced in detail below), as well as multiple encodings.

Installation is fairly simple. In a Linux CentOS environment it takes just the following three steps; in a Windows environment you only need to install an exe.

wget https://github.com/harelba/q/releases/download/1.7.1/q-text-as-data-1.7.1-1.noarch.rpm  # download the release

sudo rpm -ivh q-text-as-data-1.7.1-1.noarch.rpm  # install

q --version  # check the installed version

Official documentation: https://harelba.github.io/q

Syntax

q supports all SQLite SQL syntax. The standard command-line format is q + options + "SQL":

q <options> "<SQL>"

Say I want to query the content of myfile.log; just run q "SELECT * FROM myfile.log":

q "SELECT * FROM myfile.log"

q works fine without any extra options, but using them makes the displayed results nicer, so here is a brief overview. Its options fall into two categories.

  • input options: control how the file or result set being queried is read in. For example, the -H option indicates that the input data contains a header line.

q -H "SELECT * FROM myfile.log"

In this case the column names are detected automatically and can be used in the query. If this option is not provided, columns are automatically named cX, starting from c1, and so on.

q  "select c1,c2 from ..."
  • output options: control how the query's result set is output. For example, -O makes the query result display column names.
[root@iZ2zebfzaequ90bdlz820sZ software]# ps -ef | q -H "select count(UID) from - where UID='root'"
104
[root@iZ2zebfzaequ90bdlz820sZ software]# ps -ef | q -H -O "select count(UID) from - where UID='root'"
count(UID)
104

There are many more options that I won't list one by one; interested readers can look at the official site. Next, let's focus on demonstrating how to use SQL to handle various log-query scenarios.

Plenty of handy tricks

Let's look at several common log-query scenarios below and how to write the SQL for each.

1. Keyword query

Keyword search is probably the most frequent operation in daily development, but personally I think q has no advantage here, because it has to target a specific column in the query.

[root@iZ2zebfzaequ90bdlz820sZ software]# q "select * from douyin.log where c9 like '%待解析%'"
2021-06-11 14:46:49.323 INFO 22790 --- [nio-8888-exec-2] c.x.douyin.controller.ParserController : 待解析URL :url=https%3A%2F%2Fv.douyin.com%2Fe9g9uJ6%2F                                             
2021-06-11 14:57:31.938 INFO 22790 --- [nio-8888-exec-5] c.x.douyin.controller.ParserController : 待解析URL :url=https%3A%2F%2Fv.douyin.com%2Fe9pdhGP%2F                                             
2021-06-11 15:23:48.004 INFO 22790 --- [nio-8888-exec-2] c.x.douyin.controller.ParserController : 待解析URL :url=https%3A%2F%2Fv.douyin.com%2Fe9pQjBR%2F                                             
2021-06-11 2

The grep command, by contrast, is a full-text search:

[root@iZ2zebfzaequ90bdlz820sZ software]# cat douyin.log | grep '待解析URL'
2021-06-11 14:46:49.323  INFO 22790 --- [nio-8888-exec-2] c.x.douyin.controller.ParserController   : 待解析URL :url=https%3A%2F%2Fv.douyin.com%2Fe9g9uJ6%2F
2021-06-11 14:57:31.938  INFO 22790 --- [nio-8888-exec-5] c.x.douyin.controller.ParserController   : 待解析URL :url=https%3A%2F%2Fv.douyin.com%2Fe9pdhGP%2F
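
The two approaches compose nicely, though: since q reads standard input as a table named -, one option (a small sketch) is to let grep do the full-text narrowing and q do the aggregation:

# count the matching lines per day; c1 is the date column of this log format
cat douyin.log | grep '待解析URL' | q "SELECT c1, COUNT(*) FROM - GROUP BY c1"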

2. Fuzzy query

like does fuzzy matching. If the text content has named columns, search with the column name directly; if not, use the column numbers c1, c2, ..., cN.

[root@iZ2zebfzaequ90bdlz820sZ software]# cat test.log 
abc
2
3
4
5
23
24
25
[root@iZ2zebfzaequ90bdlz820sZ software]# q -H -t "select * from test.log where abc like '%2%'"
Warning: column count is one - did you provide the correct delimiter?
2
23
24
25
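
The warning here simply means that, with the tab delimiter set by -t, each line parsed into a single column, which is expected for this file. For other separators there is the -d option; for example, a sketch against /etc/passwd, which is colon-delimited and has no header:

# c1 is the user name and c7 the login shell in /etc/passwd
q -d : "SELECT c1, c7 FROM /etc/passwd LIMIT 3"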

3. Intersection and union

The UNION and UNION ALL operators are supported for merging the contents of multiple files, with or without deduplication.

test.log and test1.log are created as follows, with overlapping content. Use union to merge and deduplicate:

q -H -t "select * from test.log union select * from test1.log"

[root@iZ2zebfzaequ90bdlz820sZ software]# cat test.log 
abc
2
3
4
5
[root@iZ2zebfzaequ90bdlz820sZ software]# cat test1.log 
abc
3
4
5
6
[root@iZ2zebfzaequ90bdlz820sZ software]# q -H -t "select * from test.log union select * from test1.log"
Warning: column count is one - did you provide the correct delimiter?
Warning: column count is one - did you provide the correct delimiter?
2
3
4
5
6
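
For comparison, union all keeps the duplicates; a quick sketch against the same two files:

# rows 3, 4 and 5 appear twice because both files contain them
q -H -t "select * from test.log union all select * from test1.log"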

4. Content deduplication

For example, count the total number of uuid values in the ./clicks.csv file under a certain path after deduplication:

q -H -t "SELECT COUNT(DISTINCT(uuid)) FROM ./clicks.csv"

5. Automatic detection of column type

Note: q works out whether each column is a number or a string, and decides whether to filter by numeric comparison or by string comparison accordingly. The -t option (tab-delimited input) is used here.

q -H -t "SELECT request_id,score FROM ./clicks.csv WHERE score > 0.7 ORDER BY score DESC LIMIT 5"

6. Field arithmetic

You can perform arithmetic on fields. Here we read the query result of a system command and calculate the total size per user and group under /tmp:

sudo find /tmp -ls | q "SELECT c5,c6,sum(c7)/1024.0/1024 AS total FROM - GROUP BY c5,c6 ORDER BY total desc"

[root@iZ2zebfzaequ90bdlz820sZ software]# sudo find /tmp -ls | q "SELECT c5,c6,sum(c7)/1024.0/1024 AS total FROM - GROUP BY c5,c6 ORDER BY total desc"
www www 8.86311340332
root root 0.207922935486
mysql mysql 4.76837158203e-06
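
Since the SQL engine is SQLite, its built-in functions are available too; for example, a sketch of the same query with the totals rounded:

# round the per-user/group totals (MB) to two decimal places
sudo find /tmp -ls | q "SELECT c5,c6,ROUND(SUM(c7)/1024.0/1024, 2) AS total FROM - GROUP BY c5,c6 ORDER BY total DESC"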

7. Data statistics

Count the top 3 user IDs with the largest number of processes in the system, sorted in descending order. This needs to be combined with a system command: first query all the processes, then filter with SQL. Here q plays the filtering role in the pipeline that grep usually does.

ps -ef | q -H "SELECT UID,COUNT(*) cnt FROM - GROUP BY UID ORDER BY cnt DESC LIMIT 3"

[root@iZ2zebfzaequ90bdlz820sZ software]# ps -ef | q -H "SELECT UID,COUNT(*) cnt FROM - GROUP BY UID ORDER BY cnt DESC LIMIT 3"
root 104
www 16
rabbitmq 4
[root@iZ2zebfzaequ90bdlz820sZ software]# ps -ef | q -H -O "SELECT UID,COUNT(*) cnt FROM - GROUP BY UID ORDER BY cnt DESC LIMIT 3"
UID cnt
root 110
www 16
rabbitmq 4

Comparing the two runs, we can see what the -O option changes: whether the query result displays a header row.
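
HAVING works as well; a small sketch that keeps only users running more than 10 processes:

ps -ef | q -H -O "SELECT UID,COUNT(*) cnt FROM - GROUP BY UID HAVING cnt > 10 ORDER BY cnt DESC"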

8. Multi-file joint query

Normally our log files are rotated daily into many sub-files of fixed size. Without a unified log collection server, searching for a keyword with no idea of the error's time window is tantamount to looking for a needle in a haystack.

It saves a lot of trouble if you can merge the contents of all the files and then query them. q supports joint queries across files, just like database tables:

q -H "select * from douyin.log a join douyin-2021-06-18.0.log b on (a.c2=b.c3) where b.c1='root'"

To sum up

After reading this, someone may ask: isn't awk enough? Why write so much SQL with q when you can just use awk directly? Um~, the original intention of introducing this tool is not to replace existing tools, but to provide a more convenient way to check logs.

I use awk too, and it really is powerful, but it comes with a learning cost: if you want to master its dazzling array of commands and matching rules, you still have to put in the work. For novice programmers with a little database experience, writing SQL is no big deal, and getting started with q is much easier.

