Linux text processing: subtracting the timestamps of consecutive lines

The raw log looks like this:

[t=123]xyzzda, x=abc
[t=126]sdjljs, x=abc
[t=140]sdsws, x=abc
[t=239]dsjdjs, x=wvu
[t=248]sdsdess, x=wvu

All x values are listed in a separate file, x.log:

abc
wvu
xxx

For lines that share the same x value, I want each line's t value minus the t value of the line before it. That is:

abc:
sdjljs t=3
sdsws  t=14

wvu:
sdsdess t=9

Is there a suitable way to do this with awk or a Python script?

4 answers
# cat text.log
[t=123]xyzzda, x=abc
[t=126]sdjljs, x=abc
[t=140]sdsws, x=abc
[t=239]dsjdjs, x=wvu
[t=248]sdsdess, x=wvu

# cat x.log
abc
wvu
xxx


awk -F'[]=, ]' 'NR==FNR{a[$0];next}a[$6]{b[$6]=b[$6]"\n"$3" t="$2-a[$6]}{a[$6]=$2}END{for(i in b)printf "%s:%s\n\n",i,b[i]}' x.log text.log

# Output:
abc:
sdjljs t=3
sdsws t=14

wvu:
sdsdess t=9
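
For readers who prefer Python, the per-key bookkeeping used by the awk one-liner (remember the last t seen for each x, subtract it from the next one) can be sketched as follows. This is only a sketch, not the answerer's code: the function name `diffs_by_x`, the regular expression, and the decision to treat x.log as a whitelist are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumed line layout, taken from the sample log: [t=<int>]<name>, x=<key>
LINE_RE = re.compile(r'\[t=(\d+)\](.+), x=(.+)')

def diffs_by_x(log_lines, wanted):
    """Return {x: [(name, t minus the previous t for the same x), ...]}."""
    prev = {}                  # last t seen for each x
    out = defaultdict(list)
    for line in log_lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue           # skip blank or malformed lines
        t, name, x = int(m.group(1)), m.group(2), m.group(3)
        if x not in wanted:
            continue           # assumption: x.log lists the x values to keep
        if x in prev:
            out[x].append((name, t - prev[x]))
        prev[x] = t
    return dict(out)

# In-memory stand-ins for text.log and x.log
sample_log = [
    "[t=123]xyzzda, x=abc",
    "[t=126]sdjljs, x=abc",
    "[t=140]sdsws, x=abc",
    "[t=239]dsjdjs, x=wvu",
    "[t=248]sdsdess, x=wvu",
]
sample_x = {"abc", "wvu", "xxx"}

for x, rows in diffs_by_x(sample_log, sample_x).items():
    print(x + ":")
    for name, dt in rows:
        print(name, "t=" + str(dt))
    print()
```

Because the previous t is stored per x, this also works when lines with different x values are interleaved. To run it on the real files, pass an open file object for text.log as `log_lines` and a set built from the lines of x.log as `wanted`.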

Roughly like this:

# data
log = """
[t=123]xyzzda, x=abc
[t=126]sdjljs, x=abc
[t=140]sdsws, x=abc
[t=239]dsjdjs, x=wvu
[t=248]sdsdess, x=wvu
"""
# code
import re
from collections import defaultdict

dic = defaultdict(list)
golden_x, golden_t = None, None

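for line in log.split('\n'):
    line = line.strip()
    if not line:
        continue
    # raw string so that \[ and \d reach the regex engine unchanged
    m = re.match(r'\[t=(\d+)\](.+), x=(.+)', line)
    if m is None:
        continue  # skip lines that do not match the expected layout
    t, c, x = m.groups()
    # compare only with the immediately preceding line, so lines that
    # share an x value are assumed to be contiguous in the log
    if x == golden_x:
        dic[x].append((c, int(t) - golden_t))
    golden_x, golden_t = x, int(t)

for key, ct in dic.items():
    print(key+':')
    for c, t in ct:
        print(c, 't='+str(t))
    print()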
# results
abc:
sdjljs t=3
sdsws t=14

wvu:
sdsdess t=9


import pandas as pd
import re

log = """
[t=123]xyzzda, x=abc
[t=126]sdjljs, x=abc
[t=140]sdsws, x=abc
[t=239]dsjdjs, x=wvu
[t=248]sdsdess, x=wvu
"""

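log = log.strip("\n")
# raw string keeps the regex escapes intact
data = re.findall(r'\[t=(\d+)\](.+), x=(.+)', log)

# columns: a = t, b = name, c = x
df = pd.DataFrame(data, columns=["a", "b", "c"])
# d = previous t within the same x group (NaN for the first row of each group)
shift_values = df["a"].groupby(df["c"]).shift(1)
df["d"] = shift_values
df = df.dropna()

# e = t minus the previous t
df["e"] = df["a"].astype(int) - df["d"].astype(int)

print(df)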

Result

     a        b    c    d   e
1  126   sdjljs  abc  123   3
2  140    sdsws  abc  126  14
4  248  sdsdess  wvu  239   9
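
As a possible variation on the shift-and-subtract steps above, pandas can compute the per-group difference directly with `groupby(...).diff()` and the rows can then be printed in the layout the question asks for. This is a sketch under assumptions: the column names `t`, `name`, `x` and `dt` are chosen here for readability and do not appear in the answer above.

```python
import re
import pandas as pd

log = """
[t=123]xyzzda, x=abc
[t=126]sdjljs, x=abc
[t=140]sdsws, x=abc
[t=239]dsjdjs, x=wvu
[t=248]sdsdess, x=wvu
"""

rows = re.findall(r'\[t=(\d+)\](.+), x=(.+)', log)
df = pd.DataFrame(rows, columns=["t", "name", "x"])
df["t"] = df["t"].astype(int)

# diff() within each x group; the first row of every group becomes NaN
dt = df.groupby("x")["t"].diff()
df = df.assign(dt=dt).dropna(subset=["dt"]).astype({"dt": int})

# print one block per x, keeping the order in which x values first appear
for x, group in df.groupby("x", sort=False):
    print(x + ":")
    for name, delta in zip(group["name"], group["dt"]):
        print(name, "t=" + str(delta))
    print()
```

Converting `t` to integers up front avoids the repeated `astype(int)` calls, and `diff()` replaces the explicit helper column holding the previous value.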
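
import re

# `log` is the multi-line log string defined in the answers above
s = re.compile(r'\[t=(\d+)\](.+), x=(.+)').findall(log)
# sort by x, then by t as an integer (a string sort would put "99" after "123")
s.sort(key=lambda i: (i[2], int(i[0])))
res = [(s[i+1][2], s[i+1][1], int(s[i+1][0])-int(s[i][0]))
       for i in range(len(s)-1) if s[i+1][2] == s[i][2]]
out = '\n'.join(['{}:{} t={}'.format(*i) for i in res])
print(out)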

The result is as follows

abc:sdjljs t=3
abc:sdsws t=14
wvu:sdsdess t=9
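
The flat `x:name t=dt` lines above differ slightly from the grouped layout requested in the question. One way to get that layout is `itertools.groupby`, sketched below on the assumption that `res` is the list of `(x, name, dt)` tuples built by the code above and is already ordered by x. The helper name `format_grouped` is hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Stand-in for the (x, name, dt) tuples produced by the answer above
res = [("abc", "sdjljs", 3), ("abc", "sdsws", 14), ("wvu", "sdsdess", 9)]

def format_grouped(rows):
    """Render (x, name, dt) tuples as 'x:' blocks; rows must be sorted by x."""
    blocks = []
    for x, items in groupby(rows, key=itemgetter(0)):
        lines = ["{} t={}".format(name, dt) for _, name, dt in items]
        blocks.append(x + ":\n" + "\n".join(lines))
    return "\n\n".join(blocks)

print(format_grouped(res))
```

`groupby` only merges adjacent rows with equal keys, which is why the input has to be sorted by x first. The sort in the answer above already guarantees that.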