Converting Pandas GroupBy output from a Series to a DataFrame


I'm starting with input data like this:

import pandas

df1 = pandas.DataFrame( {
    "Name" : ["Alice", "Bob", "Mallory", "Mallory", "Bob" , "Mallory"] ,
    "City" : ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"] } )

When printed, it looks like this:

    City     Name
0   Seattle    Alice
1   Seattle      Bob
2  Portland  Mallory
3   Seattle  Mallory
4   Seattle      Bob
5  Portland  Mallory

Grouping is simple enough:

 g1 = df1.groupby( [ "Name", "City"] ).count()

Printing it yields a GroupBy object:

                   City  Name
Name    City
Alice   Seattle      1     1
Bob     Seattle      2     2
Mallory Portland     2     2
        Seattle      1     1

But what I ultimately want is another DataFrame object that contains all the rows in the GroupBy object. In other words, I want to get the following result:

                   City  Name
Name    City
Alice   Seattle      1     1
Bob     Seattle      2     2
Mallory Portland     2     2
Mallory Seattle      1     1

I can't quite work out how to accomplish this from the pandas documentation. Any hints would be welcome.

Originally posted by saveenr; translation licensed under CC BY-SA 4.0

2 Answers

g1 here is a DataFrame. It has a hierarchical index, though:

 In [19]: type(g1)
Out[19]: pandas.core.frame.DataFrame

In [20]: g1.index
Out[20]:
MultiIndex([('Alice', 'Seattle'), ('Bob', 'Seattle'), ('Mallory', 'Portland'),
       ('Mallory', 'Seattle')], dtype=object)
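
As a rough illustration of what that hierarchical index holds, here is a minimal sketch. It assumes a recent pandas release, where size() is a convenient way to get one count per group, and the column name "count" is only an illustrative choice. The point is that every row carries its full (Name, City) key, even though the printout leaves a repeated Name blank.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
})

# Build a one-column frame indexed by (Name, City).
# "count" is an illustrative column name, not something pandas requires.
grouped = df1.groupby(["Name", "City"]).size().to_frame(name="count")

# The index has two named levels.
print(grouped.index.names)

# Each row keeps its full key; the blank in the printed table is display only.
names = list(grouped.index.get_level_values("Name"))
print(names)

# A single row can be looked up by its full (Name, City) key.
mallory_seattle = grouped.loc[("Mallory", "Seattle"), "count"]
print(mallory_seattle)
```

Calling reset_index() on such a frame moves both index levels back into ordinary columns.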

Perhaps you want something like this?

 In [21]: g1.add_suffix('_Count').reset_index()
Out[21]:
      Name      City  City_Count  Name_Count
0    Alice   Seattle           1           1
1      Bob   Seattle           2           2
2  Mallory  Portland           2           2
3  Mallory   Seattle           1           1

Or something like:

 In [36]: pandas.DataFrame({'count' : df1.groupby( [ "Name", "City"] ).size()}).reset_index()
Out[36]:
      Name      City  count
0    Alice   Seattle      1
1      Bob   Seattle      2
2  Mallory  Portland      2
3  Mallory   Seattle      1

Originally posted by Wes McKinney; translation licensed under CC BY-SA 3.0

I want to slightly amend the answer given by Wes, because version 0.16.2 requires as_index=False. If you don't set it, you get an empty DataFrame.

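Source:

Aggregation functions will not return the groups that you are aggregating over if they are named columns, when as_index=True, the default. The grouped columns will be the indices of the returned object.

Passing as_index=False will return the groups that you are aggregating over, if they are named columns.

Aggregating functions are ones that reduce the dimension of the returned objects, for example: mean, sum, size, count, std, var, sem, describe, first, last, nth, min, max. This is what happens when you do, for example, DataFrame.sum() and get back a Series.

nth can act as a reducer or a filter (see the link in the original answer).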

import pandas as pd

df1 = pd.DataFrame({"Name":["Alice", "Bob", "Mallory", "Mallory", "Bob" , "Mallory"],
                    "City":["Seattle","Seattle","Portland","Seattle","Seattle","Portland"]})
print(df1)
#
#       City     Name
#0   Seattle    Alice
#1   Seattle      Bob
#2  Portland  Mallory
#3   Seattle  Mallory
#4   Seattle      Bob
#5  Portland  Mallory
#
g1 = df1.groupby(["Name", "City"], as_index=False).count()
print(g1)
#
#                  City  Name
#Name    City
#Alice   Seattle      1     1
#Bob     Seattle      2     2
#Mallory Portland     2     2
#        Seattle      1     1
#
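
To make the quoted as_index behaviour concrete, here is a minimal sketch, assuming a recent pandas release rather than 0.16.2. The "Score" column is made up purely so there is something to aggregate; it is not part of the question's data.

```python
import pandas as pd

# "Score" is a hypothetical value column added only for illustration.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
    "Score": [1, 2, 3, 4, 5, 6],
})

# Default (as_index=True): the group keys become the index of the result.
indexed = df.groupby(["Name", "City"]).sum()
print(list(indexed.columns))

# as_index=False: the group keys come back as ordinary columns.
flat = df.groupby(["Name", "City"], as_index=False).sum()
print(list(flat.columns))
print(flat)
```

With as_index=False the result is already a flat DataFrame, so no reset_index() call is needed afterwards.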

Edit:

In version 0.17.1 and later you can select a subset of columns before calling count, or use reset_index with the name parameter after size:

print(df1.groupby(["Name", "City"], as_index=False).count())
#IndexError: list index out of range

print(df1.groupby(["Name", "City"]).count())
#Empty DataFrame
#Columns: []
#Index: [(Alice, Seattle), (Bob, Seattle), (Mallory, Portland), (Mallory, Seattle)]

print(df1.groupby(["Name", "City"])[['Name','City']].count())
#                  Name  City
#Name    City
#Alice   Seattle      1     1
#Bob     Seattle      2     2
#Mallory Portland     2     2
#        Seattle      1     1

print(df1.groupby(["Name", "City"]).size().reset_index(name='count'))
#      Name      City  count
#0    Alice   Seattle      1
#1      Bob   Seattle      2
#2  Mallory  Portland      2
#3  Mallory   Seattle      1

The difference between count and size is that size counts NaN values while count does not.
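
A small sketch of that difference, assuming a recent pandas release. The "Score" column and its missing value are invented for illustration only.

```python
import pandas as pd

# "Score" is a hypothetical column; one of Bob's values is missing (NaN).
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Bob"],
    "Score": [1.0, float("nan"), 3.0],
})

by_name = df.groupby("Name")

# size() counts rows per group, whether or not Score is NaN.
sizes = by_name.size()

# count() counts only the non-NaN Score values per group.
counts = by_name["Score"].count()

print(sizes["Bob"], counts["Bob"])
```

Here Bob has two rows but only one non-missing Score, so the two methods disagree for his group and agree for Alice's.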

Originally posted by jezrael; translation licensed under CC BY-SA 4.0
