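I am searching several log files to check whether they contain a given string, and aiofiles turns out to be very slow. Am I using it incorrectly? Any pointers would be appreciated.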
import asyncio
import time

import aiofiles

files = [
    r'C:\log\20210523.log',
    r'C:\log\20210522.log',
    r'C:\log\20210521.log',
    r'C:\log\20210524.log',
    r'C:\log\20210525.log',
    r'C:\log\20210520.log',
    r'C:\log\20210519.log',
]

async def match_content_in_file(filename: str, content: str, encoding: str = "gbk") -> bool:
    async with aiofiles.open(filename, mode="r", encoding=encoding) as f:
        # Whole-file variant:
        # text = await f.read()
        # return content in text
        # Line-by-line variant:
        async for line in f:
            if content in line:
                return True
    # No match: falls through and implicitly returns None

def match_content_in_file2(filename: str, content: str, encoding: str = "gbk") -> bool:
    with open(filename, mode="r", encoding=encoding) as f:
        # Whole-file variant:
        # text = f.read()
        # return content in text
        # Line-by-line variant:
        for line in f:
            if content in line:
                return True
    # No match: falls through and implicitly returns None

async def main3():
    start = time.time()
    tasks = [match_content_in_file(f, '808395') for f in files]
    results = await asyncio.gather(*tasks)
    print(results)
    end = time.time()
    print(end - start)

def main2():
    start = time.time()
    results = []
    for f in files:
        results.append(match_content_in_file2(f, '808395'))
    print(results)
    end = time.time()
    print(end - start)

if __name__ == '__main__':
    asyncio.run(main3())  # very slow
    main2()  # very fast
Measured results, in seconds (each file is about 7.5 MB)
- Reading the files line by line, the async version takes enormously longer.
[True, True, True, None, None, True, True]
Async: 40.80606389045715
-------------------------------------
[True, True, True, None, None, True, True]
Sync: 0.48870062828063965
- Reading each file in one go, async and sync are close, but sync is still slightly faster.
[True, True, True, False, False, True, True]
Async: 0.6835882663726807
-------------------------------------
[True, True, True, False, False, True, True]
Sync: 0.6745946407318115
Environment
Python 3.9.2, Windows 10
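One possible contributor to the gap in the line-by-line case: as far as I know, aiofiles delegates file operations to a thread pool, so `async for line in f` pays one thread hand-off per line instead of one per file. The sketch below does not use aiofiles itself. It only emulates that pattern with the standard library by sending every `readline()` through `loop.run_in_executor`, and compares it with a plain blocking loop. The function names and the sample file contents are made up for illustration.

```python
import asyncio
import os
import tempfile
import time
from typing import Tuple


def count_lines_sync(path: str) -> int:
    """Read every line in one ordinary blocking loop."""
    with open(path, mode="r", encoding="utf-8") as f:
        return sum(1 for _ in f)


async def count_lines_per_line_hop(path: str) -> int:
    """Emulate per-line delegation: each readline() is a separate executor call."""
    loop = asyncio.get_running_loop()
    f = await loop.run_in_executor(
        None, lambda: open(path, mode="r", encoding="utf-8")
    )
    count = 0
    try:
        while True:
            # One round trip to the default thread pool for every single line.
            line = await loop.run_in_executor(None, f.readline)
            if not line:
                break
            count += 1
    finally:
        await loop.run_in_executor(None, f.close)
    return count


def make_sample_file(lines: int) -> str:
    """Write a throwaway log file (hypothetical content) and return its path."""
    fd, path = tempfile.mkstemp(suffix=".log")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        for i in range(lines):
            f.write(f"line {i} some log text\n")
    return path


def compare(lines: int = 5000) -> Tuple[int, int, float, float]:
    """Return (sync line count, per-line-hop line count, sync secs, hop secs)."""
    path = make_sample_file(lines)
    try:
        t0 = time.perf_counter()
        n_sync = count_lines_sync(path)
        t1 = time.perf_counter()
        n_hop = asyncio.run(count_lines_per_line_hop(path))
        t2 = time.perf_counter()
    finally:
        os.remove(path)
    return n_sync, n_hop, t1 - t0, t2 - t1


if __name__ == "__main__":
    n_sync, n_hop, sync_s, hop_s = compare()
    print(f"blocking loop:         {n_sync} lines in {sync_s:.3f}s")
    print(f"per-line executor hop: {n_hop} lines in {hop_s:.3f}s")
```

The timings depend on the machine, so no expected numbers are given here. If the per-line version is much slower on your system too, that would be consistent with the whole-file `read()` variant being nearly as fast as the sync one, since it needs only a handful of hand-offs per file.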
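A hard disk is fastest when it reads one file at a time. Reading several files at once makes it switch back and forth between multiple disk blocks, which ends up slower.
Reading files is not like network communication. After a network request is sent you have to wait for the response, and that waiting time is where coroutines can raise the level of concurrency.
A hard disk does not work that way.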
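If the search still has to live inside an asyncio program, one option that fits the point above is to keep the ordinary blocking reader and hand each whole file to a worker thread, one file at a time, so reads stay sequential and the event loop is not blocked. This is only a minimal sketch, assuming Python 3.9+ for `asyncio.to_thread`. `match_content_sync`, `search_files` and `demo` are hypothetical names, and unlike the code in the question the reader returns `False` rather than `None` when nothing matches.

```python
import asyncio
import os
import tempfile
from typing import List


def match_content_sync(filename: str, content: str, encoding: str = "gbk") -> bool:
    """Plain blocking search, stopping at the first matching line."""
    with open(filename, mode="r", encoding=encoding) as f:
        for line in f:
            if content in line:
                return True
    return False


async def search_files(
    filenames: List[str], content: str, encoding: str = "gbk"
) -> List[bool]:
    """Search files one after another, each in a worker thread."""
    results = []
    for name in filenames:
        # One thread hand-off per file (not per line); awaiting inside the
        # loop means only one file is being read at any moment.
        found = await asyncio.to_thread(match_content_sync, name, content, encoding)
        results.append(found)
    return results


def demo() -> List[bool]:
    """Run the search over two small throwaway files (made-up contents)."""
    with tempfile.TemporaryDirectory() as directory:
        paths = []
        for i, text in enumerate(["abc 808395 xyz\n", "nothing here\n"]):
            path = os.path.join(directory, f"sample{i}.log")
            with open(path, mode="w", encoding="gbk") as f:
                f.write("header\n" + text)
            paths.append(path)
        return asyncio.run(search_files(paths, "808395"))


if __name__ == "__main__":
    print(demo())
```

This is not expected to beat the plain synchronous loop. It only lets other coroutines keep running while a file is being scanned.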