How to extract the first 8 characters from a string in pandas


I have a column in a DataFrame, and I am trying to extract the first 8 digits from each string. How can I do that?

    Input
 Shipment ID
20180504-S-20000
20180514-S-20537
20180514-S-20541
20180514-S-20644
20180514-S-20644
20180516-S-20009
20180516-S-20009
20180516-S-20009
20180516-S-20009

Expected output

Order_Date
20180504
20180514
20180514
20180514
20180514
20180516
20180516
20180516
20180516

I tried the code below, but it did not work.

 data['Order_Date'] = data['Shipment ID'][:8]

Originally posted by Rahul rajan; translation licensed under CC BY-SA 4.0

2 answers

You are close. You need to index through the str accessor, which applies the slice to each value of the Series:

 data['Order_Date'] = data['Shipment ID'].str[:8]

For better performance, if the column contains no NaN values, a list comprehension can be used instead:

 data['Order_Date'] = [x[:8] for x in data['Shipment ID']]


 print (data)
        Shipment ID Order_Date
0  20180504-S-20000   20180504
1  20180514-S-20537   20180514
2  20180514-S-20541   20180514
3  20180514-S-20644   20180514
4  20180514-S-20644   20180514
5  20180516-S-20009   20180516
6  20180516-S-20009   20180516
7  20180516-S-20009   20180516
8  20180516-S-20009   20180516
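
The NaN caveat can be made concrete. The following is a minimal sketch, assuming a small made-up frame with one missing value (the sample data and the variable names `sliced` and `fast` are illustrative, not from the question). It suggests that `.str[:8]` passes the missing value through, while the list comprehension raises a TypeError because a float NaN cannot be sliced.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the second Shipment ID is missing (NaN).
data = pd.DataFrame({'Shipment ID': ['20180504-S-20000', np.nan, '20180516-S-20009']})

# .str slicing skips missing values and leaves them missing in the result.
sliced = data['Shipment ID'].str[:8]
print(sliced.tolist())

# The list comprehension slices every element directly, so the missing value fails.
try:
    fast = [x[:8] for x in data['Shipment ID']]
except TypeError as exc:
    fast = None
    print('list comprehension failed:', exc)
```

If missing values are possible, the `.str` version is the safer choice; the list comprehension is only worth considering on columns known to hold strings throughout.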

If you omit str, the code slices the column by position instead, returning its first N rows:

 print (data['Shipment ID'][:2])
0    20180504-S-20000
1    20180514-S-20537
Name: Shipment ID, dtype: object
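
To see the two kinds of slicing side by side, here is a short sketch on a three-row frame built from the question's values (the variable names `rows` and `chars` are illustrative). It is meant only to show the difference in shape: one slice picks rows, the other picks characters inside each string.

```python
import pandas as pd

data = pd.DataFrame({'Shipment ID': ['20180504-S-20000',
                                     '20180514-S-20537',
                                     '20180514-S-20541']})

# Without .str, the slice selects rows: the first 2 rows, values unchanged.
rows = data['Shipment ID'][:2]

# With .str, the slice is applied inside each string: all rows, 8 characters each.
chars = data['Shipment ID'].str[:8]

print(len(rows), rows.tolist())
print(len(chars), chars.tolist())
```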

Originally posted by jezrael; translation licensed under CC BY-SA 4.0

You can also use str.extract

Example:

import pandas as pd

# Sample data from the question
df = pd.DataFrame({'Shipment ID': ['20180504-S-20000', '20180514-S-20537', '20180514-S-20541', '20180514-S-20644', '20180514-S-20644', '20180516-S-20009', '20180516-S-20009', '20180516-S-20009', '20180516-S-20009']})
# Capture the first run of 8 consecutive digits in each string
df["Order_Date"] = df["Shipment ID"].str.extract(r"(\d{8})")
print(df)

Output:

        Shipment ID Order_Date
0  20180504-S-20000   20180504
1  20180514-S-20537   20180514
2  20180514-S-20541   20180514
3  20180514-S-20644   20180514
4  20180514-S-20644   20180514
5  20180516-S-20009   20180516
6  20180516-S-20009   20180516
7  20180516-S-20009   20180516
8  20180516-S-20009   20180516
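
One point worth noting about the pattern: `(\d{8})` matches the first run of 8 digits anywhere in the string, not only at the start. The sketch below assumes some made-up IDs that do not follow the question's format (the values and the names `anywhere` and `at_start` are hypothetical) and compares the unanchored pattern with one anchored by `^`. Passing `expand=False` returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Hypothetical IDs: the second does not start with the date, the third has no digits.
ids = pd.Series(['20180504-S-20000', 'S-20180514-20537', 'bad-id'], name='Shipment ID')

# Unanchored: finds the first run of 8 digits anywhere in the string.
anywhere = ids.str.extract(r"(\d{8})", expand=False)

# Anchored with ^: matches only when the string starts with 8 digits.
at_start = ids.str.extract(r"^(\d{8})", expand=False)

print(anywhere.tolist())
print(at_start.tolist())
```

Rows that do not match come back as missing values, so the anchored form can double as a rough format check when the date is expected to lead the ID.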

Originally posted by Rakesh; translation licensed under CC BY-SA 4.0
