Newbie here, please bear with me.

I have a pandas DataFrame that looks like this:

ID     country   month   revenue  profit   ebit
234    USA       201409   10        5       3
344    USA       201409    9        7       2
532    UK        201410    20       10      5
129    Canada    201411    15       10      5

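I would like to group by country and month, count the IDs for each country and month, and sum revenue, profit, and ebit. For the data above, the output would be: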

  country   month    revenue   profit  ebit   count
   USA     201409     19        12      5      2
   UK      201410     20        10      5      1
   Canada  201411     15        10      5      1

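I have tried different variations of pandas' groupby, sum, and count functions, but I cannot figure out how to apply a groupby sum and count together to get the result shown. Please share any ideas you might have. Thanks!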

Originally posted by N91; translation licensed under CC BY-SA 4.0

python python-3.x python-2.7 pandas pandas-groupby


2 answers


Community wiki

Posted 2022-11-15

✓ Accepted

This can be done with pivot_table as follows:

>>> import pandas as pd
>>> df1 = pd.pivot_table(df, index=['country', 'month'], values=['revenue', 'profit', 'ebit'], aggfunc='sum')
>>> df1
                ebit  profit  revenue
country month
Canada  201411     5      10       15
UK      201410     5      10       20
USA     201409     5      12       19

>>> # aggfunc=len counts the rows in each group; the resulting ID column is renamed to count
>>> df2 = pd.pivot_table(df, index=['country', 'month'], values='ID', aggfunc=len).rename(columns={'ID': 'count'})
>>> df2
                count
country month
Canada  201411      1
UK      201410      1
USA     201409      2

>>> pd.concat([df1, df2], axis=1)
                ebit  profit  revenue  count
country month
Canada  201411     5      10       15      1
UK      201410     5      10       20      1
USA     201409     5      12       19      2

Update

This can be done in a single call by passing pivot_table a dictionary in the aggfunc argument that maps each column to the function to apply to it:

pd.pivot_table(
    df,
    index=['country', 'month'],
    # sum the numeric columns; len counts the rows (IDs) in each group
    aggfunc={'revenue': 'sum', 'profit': 'sum', 'ebit': 'sum', 'ID': len}
).rename(columns={'ID': 'count'})

                count  ebit  profit  revenue
country month
Canada  201411      1     5      10       15
UK      201410      1     5      10       20
USA     201409      2     5      12       19
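As a possible alternative that is not part of the original answer, the same result can be sketched with groupby and named aggregation, which lets the output columns be named and ordered as in the question. This assumes pandas 0.25 or later; the DataFrame below is rebuilt from the question's sample data.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [234, 344, 532, 129],
    "country": ["USA", "USA", "UK", "Canada"],
    "month": [201409, 201409, 201410, 201411],
    "revenue": [10, 9, 20, 15],
    "profit": [5, 7, 10, 10],
    "ebit": [3, 2, 5, 5],
})

# Named aggregation: each keyword is an output column name,
# each value is an (input column, aggregation) pair.
result = (
    df.groupby(["country", "month"], as_index=False)
      .agg(
          revenue=("revenue", "sum"),
          profit=("profit", "sum"),
          ebit=("ebit", "sum"),
          count=("ID", "count"),  # counts non-null IDs per group
      )
)
print(result)
```

Note that "count" skips missing IDs, whereas len counts every row in the group; with the sample data the two agree.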

Originally posted by Mabel Villalba; translation licensed under CC BY-SA 4.0

Community wiki

Posted 2022-11-15

You can do the groupby, then map the count for each country into a new column.

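# select several columns with a list (double brackets); the tuple-style form is no longer supported
g = df.groupby(['country', 'month'])[['revenue', 'profit', 'ebit']].sum().reset_index()
g['count'] = g['country'].map(df['country'].value_counts())
g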

Out[3]:

    country  month   revenue  profit  ebit  count
0   Canada   201411  15       10      5     1
1   UK       201410  20       10      5     1
2   USA      201409  19       12      5     2

Edit

To get the counts per country and month, you can do another groupby and then join the two DataFrames together.

g = df.groupby(['country', 'month'])[['revenue', 'profit', 'ebit']].sum()
# size() counts the rows in each (country, month) group
j = df.groupby(['country', 'month']).size().to_frame('count')
pd.merge(g, j, left_index=True, right_index=True).reset_index()

Out[6]:

    country  month   revenue  profit  ebit  count
0   Canada   201411  15       10      5     1
1   UK       201410  20       10      5     1
2   UK       201411  10       5       2     1
3   USA      201409  19       12      5     2

I added another UK record with a different date. Note that the merged DataFrame now has two UK entries, each with the appropriate count.
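To make the difference between the two counts concrete, here is a minimal self-contained sketch under some assumptions: the extra UK row uses the values shown in the output above, but its ID (777) is a made-up placeholder, since the answer does not give one. It also uses DataFrame.join in place of pd.merge, which should be equivalent here because both sides share the same index.

```python
import pandas as pd

# Question's data plus one extra UK row for a different month.
# The ID 777 is a hypothetical placeholder, not taken from the answer.
df = pd.DataFrame({
    "ID": [234, 344, 532, 129, 777],
    "country": ["USA", "USA", "UK", "Canada", "UK"],
    "month": [201409, 201409, 201410, 201411, 201411],
    "revenue": [10, 9, 20, 15, 10],
    "profit": [5, 7, 10, 10, 5],
    "ebit": [3, 2, 5, 5, 2],
})

keys = ["country", "month"]
sums = df.groupby(keys)[["revenue", "profit", "ebit"]].sum()

# Per-country counts: every UK row receives the UK total,
# so this is not a count per (country, month).
per_country = sums.reset_index()["country"].map(df["country"].value_counts())

# Per-(country, month) counts: size() counts the rows in each group.
per_group = df.groupby(keys).size().rename("count")

# Both objects are indexed by (country, month), so join aligns them.
merged = sums.join(per_group).reset_index()
print(per_country.tolist())
print(merged)
```

With this data the per-country mapping assigns 2 to both UK rows, while the grouped size assigns 1 to each, which is why the second groupby is needed once a country spans more than one month.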

Originally posted by Ben; translation licensed under CC BY-SA 3.0
