单个 CSV 文件中的多个分隔符

新手上路,请多包涵

我有一个 CSV,它有三个不同的分隔符,即“|”、“,”和“;”不同列之间。

我如何使用 Python 解析此 CSV?

我的数据如下:

 2017-01-24|05:19:30+0000|TRANSACTIONDelim_secondUSER_LOGINDelim_firstCONSUMERIDDelim_secondc4115f53-3798-4c9e-9bfd-506c842aff96Delim_firstTRANSACTIONDATEDelim_second17-01-24 05:19:30Delim_firstCHANNELIDDelim_secondDelim_firstSHOWIDDelim_secondDelim_firstEPISODEIDDelim_secondDelim_firstBUSINESSUNITDelim_secondnullDelim_firstAIRINGDATEDelim_second|**
2017-01-24|05:19:30+0000|TRANSACTIONDelim_secondUSER_LOGOUTDelim_firstCONSUMERIDDelim_second1583e83882b8e7Delim_firstTRANSACTIONDATEDelim_second17-01-24 05:19:26Delim_firstCHANNELIDDelim_secondDelim_firstSHOWIDDelim_secondDelim_firstEPISODEIDDelim_secondDelim_firstBUSINESSUNITDelim_secondbu002Delim_firstAIRINGDATEDelim_second24-Jan-2017|**
2017-01-24|05:21:59+0000|TRANSACTIONDelim_secondVIEW_PRIVACY_POLICYDelim_firstCONSUMERIDDelim_secondnullDelim_firstTRANSACTIONDATEDelim_second17-01-24 05:21:59Delim_firstCHANNELIDDelim_secondDelim_firstSHOWIDDelim_secondDelim_firstEPISODEIDDelim_secondDelim_firstBUSINESSUNITDelim_secondnullDelim_firstAIRINGDATEDelim_second|**
2017-01-24|05:59:25+0000|TRANSACTIONDelim_secondUSER_LOGOUTDelim_firstCONSUMERIDDelim_second1586a2aa4bc18fDelim_firstTRANSACTIONDATEDelim_second17-01-24 05:59:21Delim_firstCHANNELIDDelim_secondDelim_firstSHOWIDDelim_secondDelim_firstEPISODEIDDelim_secondDelim_firstBUSINESSUNITDelim_secondbu002Delim_firstAIRINGDATEDelim_second24-Jan-2017|**
2017-01-24|05:59:36+0000|TRANSACTIONDelim_secondUSER_LOGOUTDelim_firstCONSUMERIDDelim_second1583e83882b8e7Delim_firstTRANSACTIONDATEDelim_second17-01-24 05:59:31Delim_firstCHANNELIDDelim_secondDelim_firstSHOWIDDelim_secondDelim_firstEPISODEIDDelim_secondDelim_firstBUSINESSUNITDelim_secondbu002Delim_firstAIRINGDATEDelim_second24-Jan-2017|**
2017-01-24|06:04:25+0000|TRANSACTIONDelim_secondUSER_LOGOUTDelim_firstCONSUMERIDDelim_secondc4115f53-3798-4c9e-9bfd-506c842aff96Delim_firstTRANSACTIONDATEDelim_second17-01-24 06:04:24Delim_firstCHANNELIDDelim_secondDelim_firstSHOWIDDelim_secondDelim_firstEPISODEIDDelim_secondDelim_firstBUSINESSUNITDelim_secondbu002Delim_firstAIRINGDATEDelim_second|**
2017-01-24|06:05:07+0000|TRANSACTIONDelim_secondUSER_LOGINDelim_firstCONSUMERIDDelim_secondc4115f53-3798-4c9e-9bfd-506c842aff96Delim_firstTRANSACTIONDATEDelim_second17-01-24 06:05:07Delim_firstCHANNELIDDelim_secondDelim_firstSHOWIDDelim_secondDelim_firstEPISODEIDDelim_secondDelim_firstBUSINESSUNITDelim_secondnullDelim_firstAIRINGDATEDelim_second|**
2017-01-24|06:05:07+0000|TRANSACTIONDelim_secondUSER_LOGINDelim_firstCONSUMERIDDelim_secondc4115f53-3798-4c9e-9bfd-506c842aff96Delim_firstTRANSACTIONDATEDelim_second17-01-24 06:05:07Delim_firstCHANNELIDDelim_secondDelim_firstSHOWIDDelim_secondDelim_firstEPISODEIDDelim_secondDelim_firstBUSINESSUNITDelim_secondbu002Delim_firstAIRINGDATEDelim_second|**

原文由 Sarang Manjrekar 发布,翻译遵循 CC BY-SA 4.0 许可协议

阅读 788
2 个回答

坚持使用标准库, re.split() 可以在这些字符中的任何一个处分割一行:

 import re

with open(file_name) as fobj:
    for line in fobj:
        line_data = re.split('Delim_first|Delim_second|[|]', line)
        print(line_data)

这将在分隔符处拆分 |Delim_firstDelim_second

或者与熊猫:

 import pandas as pd
df = pd.read_csv('multi_delim.csv', sep='Delim_first|Delim_second|[|]',
                  engine='python', header=None)

结果:

在此处输入图像描述

原文由 Mike Müller 发布,翻译遵循 CC BY-SA 3.0 许可协议

一种简单的方法来实现你想要的是使用 熊猫 包,这里有一个小例子:

 import pandas as pd
import StringIO

data = StringIO.StringIO("""a;b|c;
2016-09-05 10:47:00|1,foo;
2016-09-06 10:47:00;2;foo2;
2016-09-07 10:47:00;3;foo3;""")

df = pd.read_csv(data, sep='[;,|]', engine='python')
for c in ['a', 'b', 'c']:
    print('-' * 80)
    print(df[c])

原文由 BPL 发布,翻译遵循 CC BY-SA 3.0 许可协议

撰写回答
你尚未登录,登录后可以
  • 和开发者交流问题的细节
  • 关注并接收问题和回答的更新提醒
  • 参与内容的编辑和改进,让解决方法与时俱进
推荐问题
logo
Stack Overflow 翻译
子站问答
访问
宣传栏