Pandas - Dropping columns


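I know that dropping columns from a DataFrame should be simple:

df.drop(df.columns[1], axis=1) to drop by position

df.dropna(axis=1, how='any') to drop based on whether the column contains NaN

But neither of these works on my DataFrame, and I am not sure whether that is due to a formatting problem, a data type problem, or my misusing or misunderstanding these commands.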

Here is my DataFrame:

 fish_frame after append new_column:                         0       1       2      3                          4  \
2                 GBE COD     NaN     NaN    600                        NaN
3                 GBW COD     NaN  11,189    NaN                        NaN
4                 GOM COD     NaN       0    NaN  Package Deal - $40,753.69
5                 POLLOCK     NaN     NaN  1,103                        NaN
6                   WHAKE     NaN     NaN     12                        NaN
7             GBE HADDOCK     NaN  10,730    NaN                        NaN
8             GBW HADDOCK     NaN  64,147    NaN                        NaN
9             GOM HADDOCK     NaN       0    NaN                        NaN
10                REDFISH     NaN     NaN      0                        NaN
11         WITCH FLOUNDER     NaN     370    NaN                        NaN
12                 PLAICE     NaN     NaN    622                        NaN
13     GB WINTER FLOUNDER  54,315     NaN    NaN                        NaN
14    GOM WINTER FLOUNDER     653     NaN    NaN                        NaN
15  SNEMA WINTER FLOUNDER  14,601     NaN    NaN                        NaN
16          GB YELLOWTAIL     NaN   1,663    NaN                        NaN
17       SNEMA YELLOWTAIL     NaN   1,370    NaN                        NaN
18       CCGOM YELLOWTAIL   1,812     NaN    NaN                        NaN

       6        package_deal_column Package_Price new_column
2    NaN  Package Deal - $40,753.69          None        600
3    NaN  Package Deal - $40,753.69          None    11,1890
4   None  Package Deal - $40,753.69          None          0
5    NaN  Package Deal - $40,753.69          None      1,103
6    NaN  Package Deal - $40,753.69          None         12
7    NaN  Package Deal - $40,753.69          None    10,7300
8    NaN  Package Deal - $40,753.69          None    64,1470
9    NaN  Package Deal - $40,753.69          None          0
10   NaN  Package Deal - $40,753.69          None          0
11   NaN  Package Deal - $40,753.69          None       3700
12   NaN  Package Deal - $40,753.69          None        622
13  None  Package Deal - $40,753.69          None   54,31500
14  None  Package Deal - $40,753.69          None      65300
15  None  Package Deal - $40,753.69          None   14,60100
16   NaN  Package Deal - $40,753.69          None     1,6630
17   NaN  Package Deal - $40,753.69          None     1,3700
18  None  Package Deal - $40,753.69          None    1,81200

Then I run the following lines of code:

 fish_frame.drop(fish_frame.columns[1], axis=1)
fish_frame.drop(fish_frame.columns[2], axis=1)
fish_frame.drop(fish_frame.columns[3], axis=1)
fish_frame.drop(fish_frame.columns[4:5], axis=1)
#del fish_frame[4:5]    #doesn't work, "TypeError: slice(4, 5, None) is an invalid key"
del fish_frame['Package_Price']
fish_frame.dropna(axis=1, how='any')

Then I print the DataFrame again, and the result is:

 NEW fish_frame:                         0       1       2      3                          4  \
2                 GBE COD     NaN     NaN    600                        NaN
3                 GBW COD     NaN  11,189    NaN                        NaN
4                 GOM COD     NaN       0    NaN  Package Deal - $40,753.69
5                 POLLOCK     NaN     NaN  1,103                        NaN
6                   WHAKE     NaN     NaN     12                        NaN
7             GBE HADDOCK     NaN  10,730    NaN                        NaN
8             GBW HADDOCK     NaN  64,147    NaN                        NaN
9             GOM HADDOCK     NaN       0    NaN                        NaN
10                REDFISH     NaN     NaN      0                        NaN
11         WITCH FLOUNDER     NaN     370    NaN                        NaN
12                 PLAICE     NaN     NaN    622                        NaN
13     GB WINTER FLOUNDER  54,315     NaN    NaN                        NaN
14    GOM WINTER FLOUNDER     653     NaN    NaN                        NaN
15  SNEMA WINTER FLOUNDER  14,601     NaN    NaN                        NaN
16          GB YELLOWTAIL     NaN   1,663    NaN                        NaN
17       SNEMA YELLOWTAIL     NaN   1,370    NaN                        NaN
18       CCGOM YELLOWTAIL   1,812     NaN    NaN                        NaN

       6        package_deal_column new_column
2    NaN  Package Deal - $40,753.69        600
3    NaN  Package Deal - $40,753.69    11,1890
4   None  Package Deal - $40,753.69          0
5    NaN  Package Deal - $40,753.69      1,103
6    NaN  Package Deal - $40,753.69         12
7    NaN  Package Deal - $40,753.69    10,7300
8    NaN  Package Deal - $40,753.69    64,1470
9    NaN  Package Deal - $40,753.69          0
10   NaN  Package Deal - $40,753.69          0
11   NaN  Package Deal - $40,753.69       3700
12   NaN  Package Deal - $40,753.69        622
13  None  Package Deal - $40,753.69   54,31500
14  None  Package Deal - $40,753.69      65300
15  None  Package Deal - $40,753.69   14,60100
16   NaN  Package Deal - $40,753.69     1,6630
17   NaN  Package Deal - $40,753.69     1,3700
18  None  Package Deal - $40,753.69    1,81200

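Neither the NaN drop nor the positional drops worked. Only the del with a specific column name worked, but I can't do that for every iteration of this script.

I am confused, and I hope this is not a really silly mistake on my part.

Also, I don't fully understand this information myself, but printing fish_frame.info() produces: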

 <class 'pandas.core.frame.DataFrame'>
RangeIndex: 17 entries, 2 to 18
Data columns (total 8 columns):
0                      17 non-null object
1                      4 non-null object
2                      8 non-null object
3                      5 non-null object
4                      1 non-null object
6                      0 non-null object
package_deal_column    17 non-null object
new_column             17 non-null object
dtypes: object(8)
memory usage: 586.0+ bytes

Any help in solving this would be greatly appreciated.

Originally posted by theprowler; translation licensed under CC BY-SA 4.0

2 Answers

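Assuming no error was raised (I don't see one in your output), you simply forgot the inplace parameter. By default, drop returns a new DataFrame and leaves the original unchanged: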

 df.drop(df.columns[1], axis=1, inplace=True)
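
To make the point concrete, here is a minimal sketch on a made-up miniature of the asker's frame (the values and the names `unchanged_shape`, `trimmed`, and `no_nan` are illustrative assumptions, not part of the original post). It shows that `drop` and `dropna` return new objects, so the result has to be assigned (or `inplace=True` passed), and that several positional columns can be dropped in a single call:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the asker's frame; values are illustrative only.
fish_frame = pd.DataFrame({
    0: ["GBE COD", "GBW COD", "POLLOCK"],
    1: [np.nan, np.nan, np.nan],
    2: [np.nan, "11,189", np.nan],
    "new_column": ["600", "11,189", "1,103"],
})

# drop() returns a new DataFrame; the result is discarded here,
# so fish_frame itself still has all four columns.
fish_frame.drop(fish_frame.columns[1], axis=1)
unchanged_shape = fish_frame.shape

# Assigning the result keeps the change. Selecting several positions at once
# avoids positions shifting between successive single-column drops.
trimmed = fish_frame.drop(fish_frame.columns[[1, 2]], axis=1)

# dropna() behaves the same way: assign the result.
# how="any" drops every column that contains at least one NaN.
no_nan = fish_frame.dropna(axis=1, how="any")

print(unchanged_shape)
print(list(trimmed.columns))
print(list(no_nan.columns))
```

Whether to reassign or to use `inplace=True` is largely a matter of taste; reassignment makes it easier to chain further operations.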

Originally posted by A.Kot; translation licensed under CC BY-SA 3.0

Here are some options:

Setup:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(3,5), columns=list('abcde'))

In [57]: cols_to_drop = ['b', 'd']

In [63]: df
Out[63]:
          a         b         c         d         e
0  0.758670  0.734007  0.027711  0.614674  0.955711
1  0.833110  0.242010  0.922831  0.165401  0.546079
2  0.414916  0.949050  0.608527  0.018036  0.230343

Option 1:

 df = df[df.columns.drop(cols_to_drop)]

Option 2:

 df = df[df.columns.difference(cols_to_drop)]

Option 3:

 df = df.loc[:, ~df.columns.isin(cols_to_drop)]

All of them return:

           a         c         e
0  0.758670  0.027711  0.955711
1  0.833110  0.922831  0.546079
2  0.414916  0.608527  0.230343
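
One caveat worth checking: the three options give identical output above because the columns are already in alphabetical order. As a rough sketch, assuming a frame whose columns are deliberately out of order (the column order and the names `opt1`, `opt2`, and `opt3` are illustrative), `Index.difference` sorts its result by default, while the other two options preserve the original column order:

```python
import numpy as np
import pandas as pd

# Illustrative frame with columns deliberately out of alphabetical order.
df = pd.DataFrame(np.random.rand(3, 5), columns=list("ecadb"))
cols_to_drop = ["b", "d"]

# Option 1: Index.drop keeps the remaining columns in their original order.
opt1 = df[df.columns.drop(cols_to_drop)]

# Option 2: Index.difference is a set operation and sorts the result by default.
opt2 = df[df.columns.difference(cols_to_drop)]

# Option 3: a boolean mask keeps the original order and also tolerates
# names in cols_to_drop that are not present in the frame.
opt3 = df.loc[:, ~df.columns.isin(cols_to_drop)]

print(list(opt1.columns))
print(list(opt2.columns))
print(list(opt3.columns))
```

If column order matters, option 1 or option 3 is the safer choice.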

Originally posted by MaxU - stop russian terror; translation licensed under CC BY-SA 3.0
