
I start with input data like this:

import pandas

df1 = pandas.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
})

When printed, it looks like this:

    City     Name
0   Seattle    Alice
1   Seattle      Bob
2  Portland  Mallory
3   Seattle  Mallory
4   Seattle      Bob
5  Portland  Mallory

Grouping is simple enough:

 g1 = df1.groupby( [ "Name", "City"] ).count()

Printing yields a GroupBy object:

                   City  Name
Name    City
Alice   Seattle      1     1
Bob     Seattle      2     2
Mallory Portland     2     2
        Seattle      1     1

But what I ultimately want is another DataFrame object that contains all the rows in the GroupBy object. In other words, I want to get the following result:

                   City  Name
Name    City
Alice   Seattle      1     1
Bob     Seattle      2     2
Mallory Portland     2     2
Mallory Seattle      1     1

I can't quite see how to accomplish this from the pandas documentation. Any hints would be welcome.

Originally posted by saveenr; translation licensed under CC BY-SA 4.0

python pandas dataframe pandas-groupby multi-index


2 Answers


Community wiki

Posted on 2022-09-21

✓ Accepted

g1 here is a DataFrame. It has a hierarchical index, though:

 In [19]: type(g1)
Out[19]: pandas.core.frame.DataFrame

In [20]: g1.index
Out[20]:
MultiIndex([('Alice', 'Seattle'), ('Bob', 'Seattle'), ('Mallory', 'Portland'),
       ('Mallory', 'Seattle')], dtype=object)

Perhaps you want something like this?

 In [21]: g1.add_suffix('_Count').reset_index()
Out[21]:
      Name      City  City_Count  Name_Count
0    Alice   Seattle           1           1
1      Bob   Seattle           2           2
2  Mallory  Portland           2           2
3  Mallory   Seattle           1           1

Or something like:

 In [36]: pandas.DataFrame({'count' : df1.groupby( [ "Name", "City"] ).size()}).reset_index()
Out[36]:
      Name      City  count
0    Alice   Seattle      1
1      Bob   Seattle      2
2  Mallory  Portland      2
3  Mallory   Seattle      1
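As a quick check of the point that the grouped result already holds every row, here is a minimal sketch using the question's data. It assumes a reasonably recent pandas and uses size() rather than count(); the variable names sizes, index_tuples and flat are just illustrative. The blank cell under Mallory in the printed table is only how a MultiIndex is displayed, and the underlying index still stores the full key for each row.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
})

# size() counts the rows in each (Name, City) group and returns a Series
# indexed by a two-level MultiIndex.
sizes = df1.groupby(["Name", "City"]).size()

# Each entry carries its complete key, including the "Mallory" that the
# printed table leaves blank on the last row.
index_tuples = sizes.index.tolist()
print(index_tuples)

# reset_index() turns both index levels back into ordinary columns.
flat = sizes.reset_index(name="count")
print(flat)
```

After reset_index the repeated Name value is written out on every row, which is the layout asked for in the question.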

Originally posted by Wes McKinney; translation licensed under CC BY-SA 3.0

Community wiki

Posted on 2022-09-21

I want to slightly change the answer given by Wes, because version 0.16.2 requires as_index=False. If you don't set it, you get an empty dataframe.

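Source:

Aggregation functions will not return the groups that you are aggregating over if they are named columns when as_index=True, the default. The grouped columns will be the indices of the returned object.

Passing as_index=False will return the groups that you are aggregating over, if they are named columns.

Aggregating functions are ones that reduce the dimension of the returned objects, for example: mean, sum, size, count, std, var, sem, describe, first, last, nth, min, max. This is what happens when you call, for example, DataFrame.sum() and get back a Series.

nth can act as a reducer or a filter, see here.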

 import pandas as pd

df1 = pd.DataFrame({"Name":["Alice", "Bob", "Mallory", "Mallory", "Bob" , "Mallory"],
                    "City":["Seattle","Seattle","Portland","Seattle","Seattle","Portland"]})
print(df1)
#
#       City     Name
#0   Seattle    Alice
#1   Seattle      Bob
#2  Portland  Mallory
#3   Seattle  Mallory
#4   Seattle      Bob
#5  Portland  Mallory
#
g1 = df1.groupby(["Name", "City"], as_index=False).count()
print(g1)
#
#                  City  Name
#Name    City
#Alice   Seattle      1     1
#Bob     Seattle      2     2
#Mallory Portland     2     2
#        Seattle      1     1
#
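The as_index behaviour quoted above may be easier to see with a value column to aggregate. The following is only a sketch under assumptions: the Score column and its values are hypothetical and not part of the question's data, and it targets a recent pandas rather than 0.16.2.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Bob"],
    "City": ["Seattle", "Seattle", "Seattle"],
    "Score": [1, 2, 3],  # hypothetical value column, added for illustration
})

# Default (as_index=True): the group keys become the index of the result,
# so only the aggregated column remains as a column.
indexed = df.groupby(["Name", "City"]).sum()
print(list(indexed.index.names), list(indexed.columns))

# as_index=False: the group keys are returned as ordinary columns
# next to the aggregated values.
flat = df.groupby(["Name", "City"], as_index=False).sum()
print(list(flat.columns))
```

With the default setting, calling reset_index() on the result gives the same flat layout, so the two approaches are interchangeable here.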

Edit:

In version 0.17.1 and later you can select a subset of columns before calling count, or call size and pass the name parameter to reset_index:

print(df1.groupby(["Name", "City"], as_index=False).count())
#IndexError: list index out of range

print(df1.groupby(["Name", "City"]).count())
#Empty DataFrame
#Columns: []
#Index: [(Alice, Seattle), (Bob, Seattle), (Mallory, Portland), (Mallory, Seattle)]

print(df1.groupby(["Name", "City"])[['Name','City']].count())
#                  Name  City
#Name    City
#Alice   Seattle      1     1
#Bob     Seattle      2     2
#Mallory Portland     2     2
#        Seattle      1     1

print(df1.groupby(["Name", "City"]).size().reset_index(name='count'))
#      Name      City  count
#0    Alice   Seattle      1
#1      Bob   Seattle      2
#2  Mallory  Portland      2
#3  Mallory   Seattle      1

The difference between count and size is that size counts NaN values while count does not.
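A small sketch of that difference, assuming a recent pandas. The Score column and its missing value are hypothetical, added only to have a NaN to count.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Alice", "Bob"],
    "Score": [1.0, float("nan"), 2.0],  # hypothetical data with one missing value
})

grouped = df.groupby("Name")["Score"]

# size() reports the number of rows in each group, NaN included.
sizes = grouped.size()

# count() reports only the non-NaN values in each group.
counts = grouped.count()

print(sizes.to_dict())
print(counts.to_dict())
```

Alice has two rows but only one non-missing Score, so the two results differ for her group and agree for Bob.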

Originally posted by jezrael; translation licensed under CC BY-SA 4.0
