I'm new to pandas and I can't work out how to organize my time series. Here it is:

 date & time of connection
19/06/2017 12:39
19/06/2017 12:40
19/06/2017 13:11
20/06/2017 12:02
20/06/2017 12:04
21/06/2017 09:32
21/06/2017 18:23
21/06/2017 18:51
21/06/2017 19:08
21/06/2017 19:50
22/06/2017 13:22
22/06/2017 13:41
22/06/2017 18:01
23/06/2017 16:18
23/06/2017 17:00
23/06/2017 19:25
23/06/2017 20:58
23/06/2017 21:03
23/06/2017 21:05

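This is a sample from the original dataset of about 130k rows. I tried: df.groupby('date & time of connection')['date & time of connection'].apply(list)

I don't think that is enough.

I think I should:

Create a dictionary indexed from dd/mm/yyyy to dd/mm/yyyy
Convert the 'date & time of connection' column from datetime to date
Group by the date of 'date & time of connection' and count
Put the counts into the dictionary?

What do you think of my logic? Do you know any shortcuts? Thanks a lot.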

Original question by Erwan Pesle; translation licensed under CC BY-SA 4.0

python python-3.x pandas time-series

2 Answers

Community wiki

Posted on
2022-11-15

✓ Accepted

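You can use dt.floor to truncate each timestamp to its date, then count with value_counts, or with groupby and size:

import pandas as pd

# dayfirst=True because the sample is dd/mm/yyyy; floor('d') drops the time part.
df = (pd.to_datetime(df['date & time of connection'], dayfirst=True)
       .dt.floor('d')
       .value_counts()
       .rename_axis('date')
       .reset_index(name='count'))
print(df)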
        date  count
0 2017-06-23      6
1 2017-06-21      5
2 2017-06-19      3
3 2017-06-22      3
4 2017-06-20      2

Or:

# Same idea with groupby: group the parsed timestamps by their floored date.
s = pd.to_datetime(df['date & time of connection'], dayfirst=True)
df = s.groupby(s.dt.floor('d')).size().reset_index(name='count')
print(df)
  date & time of connection  count
0                2017-06-19      3
1                2017-06-20      2
2                2017-06-21      5
3                2017-06-22      3
4                2017-06-23      6
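
The question also asks for a dictionary covering every day from a start date to an end date. A minimal sketch of one way to get there, assuming days with no connections should count as 0. The sample values, the names raw, stamps, per_day, all_days and counts, and the 'dd/mm/yyyy' key format are illustrative assumptions, not part of the original answer:

```python
import pandas as pd

# Hypothetical sample in the question's day-first layout; 21/06/2017 is deliberately absent.
raw = pd.Series(
    ["19/06/2017 12:39", "19/06/2017 12:40", "20/06/2017 12:02", "22/06/2017 13:22"],
    name="date & time of connection",
)

# An explicit format avoids any day/month guessing.
stamps = pd.to_datetime(raw, format="%d/%m/%Y %H:%M")

# Count rows per calendar day.
per_day = stamps.dt.floor("D").value_counts()

# Reindex over every day in the observed range so empty days show up as 0.
all_days = pd.date_range(per_day.index.min(), per_day.index.max(), freq="D")
per_day = per_day.reindex(all_days, fill_value=0)

# Plain dict keyed by 'dd/mm/yyyy' strings, as sketched in the question.
counts = {day.strftime("%d/%m/%Y"): int(n) for day, n in per_day.items()}
print(counts)
```

If a dict is not strictly needed, keeping per_day as a Series indexed by date is usually more convenient for plotting or further resampling.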

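Timings:

import numpy as np
import pandas as pd

np.random.seed(1542)

# Build N timestamps 37 minutes apart, then drop a random subset of rows.
N = 220000
a = np.unique(np.random.randint(N, size=int(N/2)))
df = pd.DataFrame(pd.date_range('2000-01-01', freq='37min', periods=N)).drop(a)
df.columns = ['date & time of connection']
# Store the values as day-first strings to mimic the question's data.
df['date & time of connection'] = df['date & time of connection'].dt.strftime('%d/%m/%Y %H:%M:%S')
print(df.head())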

In [193]: %%timeit
     ...: df['date & time of connection']=pd.to_datetime(df['date & time of connection'])
     ...: df1 = df.groupby(by=df['date & time of connection'].dt.date).count()
     ...:
539 ms ± 45.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [194]: %%timeit
     ...: df1 = (pd.to_datetime(df['date & time of connection'])
     ...:        .dt.floor('d')
     ...:        .value_counts()
     ...:        .rename_axis('date')
     ...:        .reset_index(name='count'))
     ...:
12.4 ms ± 350 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [195]: %%timeit
     ...: s = pd.to_datetime(df['date & time of connection'])
     ...: df2 = s.groupby(s.dt.floor('d')).size().reset_index(name='count')
     ...:
17.7 ms ± 140 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
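
One plausible contributor to the gap in these timings is the key type: .dt.date produces Python datetime.date objects in an object-dtype Series, while .dt.floor keeps the values as native datetime64. A small sketch to check this on your own data (the two sample strings and the variable names are illustrative assumptions):

```python
import datetime as dt
import pandas as pd

stamps = pd.to_datetime(
    pd.Series(["19/06/2017 12:39", "20/06/2017 12:02"]),
    format="%d/%m/%Y %H:%M",
)

as_objects = stamps.dt.date      # Series of Python datetime.date objects
as_floor = stamps.dt.floor("D")  # still datetime64, time part set to 00:00

# Inspect the dtypes; the exact datetime64 unit depends on the pandas version.
print(as_objects.dtype)
print(as_floor.dtype)
```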

Original answer by jezrael; translation licensed under CC BY-SA 3.0

Community wiki

Posted on
2022-11-15

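Make sure your column has a datetime dtype.

# dayfirst=True because the sample values are dd/mm/yyyy.
df['date & time of connection']=pd.to_datetime(df['date & time of connection'], dayfirst=True)

Then you can group the data by date and count: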

 df.groupby(by=df['date & time of connection'].dt.date).count()
Out[10]:
                           date & time of connection
date & time of connection
2017-06-19                                         3
2017-06-20                                         2
2017-06-21                                         5
2017-06-22                                         3
2017-06-23                                         6
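
The sample values are in day/month/year order, which matters once the day is 12 or lower. A short sketch of the difference, using a hypothetical value that does not appear in the question's sample:

```python
import pandas as pd

# Hypothetical value: 1 February 2017 in the question's dd/mm/yyyy layout.
ambiguous = "01/02/2017 10:00"

default = pd.to_datetime(ambiguous)                    # read month-first by default
day_first = pd.to_datetime(ambiguous, dayfirst=True)   # read day-first
explicit = pd.to_datetime(ambiguous, format="%d/%m/%Y %H:%M")  # no guessing at all

print(default, day_first, explicit)
```

Passing an explicit format is the stricter option, since a value that does not match raises an error instead of being reinterpreted.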

Original answer by Allen Qin; translation licensed under CC BY-SA 3.0
