I have a Python pandas DataFrame with several columns, and one of the columns contains 0 values. I want to replace the 0 values with the median or mean of that column. data is my DataFrame and artist_hotness is the column:

mean_artist_hotness = data['artist_hotness'].dropna().mean()
if len(data.artist_hotness[data.artist_hotness.isnull()]) > 0:
    data.artist_hotness.loc[(data.artist_hotness.isnull()), 'artist_hotness'] = mean_artist_hotness

I tried this, but it did not work.

Originally posted by jeangelj; translation licensed under CC BY-SA 4.0.

Python/Pandas DataFrame: replacing 0 with the median

2 answers

Posted on 2023-01-04

✓ Accepted

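I think you can use mask, and pass skipna=True to mean instead of calling dropna (skipna=True is already the default, so NaN values are ignored either way). You also need to change the condition: use data.artist_hotness == 0 to replace 0 values, or data.artist_hotness.isnull() to replace NaN values. Note that the mean computed here still counts the zeros themselves: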

import pandas as pd
import numpy as np

data = pd.DataFrame({'artist_hotness': [0,1,5,np.nan]})
print (data)
   artist_hotness
0             0.0
1             1.0
2             5.0
3             NaN

mean_artist_hotness = data['artist_hotness'].mean(skipna=True)
print (mean_artist_hotness)
2.0

data['artist_hotness']=data.artist_hotness.mask(data.artist_hotness == 0,mean_artist_hotness)
print (data)
   artist_hotness
0             2.0
1             1.0
2             5.0
3             NaN
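The question also asks about the median, and the mean above is pulled down by the very zeros being replaced. The following is a minimal sketch, not part of the original answer, that computes the statistic from the non-zero values only. The names nonzero, median_nonzero and mean_nonzero are introduced here purely for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'artist_hotness': [0, 1, 5, np.nan]})
col = data['artist_hotness']

# Keep only the non-zero entries. NaN survives this filter, but
# median() and mean() skip it because skipna defaults to True.
nonzero = col[col != 0]
median_nonzero = nonzero.median()   # median of [1, 5]
mean_nonzero = nonzero.mean()       # mean of [1, 5]

# Replace the zeros with the median of the non-zero values.
data['artist_hotness'] = col.mask(col == 0, median_nonzero)
print(data)
```

Whether zeros should count toward the statistic depends on whether 0 is a real measurement or a placeholder for a missing value. If it is a placeholder, excluding it as above is usually the safer choice.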

Or use loc on the DataFrame itself, passing the column name as the second indexer (do not call loc on the column):

data.loc[data.artist_hotness == 0, 'artist_hotness'] = mean_artist_hotness
print (data)
   artist_hotness
0             2.0
1             1.0
2             5.0
3             NaN

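By contrast, calling loc on the column (a Series) while also passing the column name fails, which is the mistake in the question's code:

data.artist_hotness.loc[data.artist_hotness == 0, 'artist_hotness'] = mean_artist_hotness
print (data)

IndexingError: (0     True
1    False
2    False
3    False
Name: artist_hotness, dtype: bool, 'artist_hotness')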
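If both the zeros and the NaN values should be replaced in one step, the two conditions mentioned earlier can be combined with |. This is a small sketch under that assumption. The name needs_fill is hypothetical, and the mean used here still includes the zeros, as in the answer.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'artist_hotness': [0, 1, 5, np.nan]})

# Mean over the non-NaN values (zeros included), as in the answer.
mean_artist_hotness = data['artist_hotness'].mean()

# True where the value is 0 or missing.
needs_fill = (data['artist_hotness'] == 0) | data['artist_hotness'].isnull()

# Assign through DataFrame.loc with (row mask, column label).
data.loc[needs_fill, 'artist_hotness'] = mean_artist_hotness
print(data)
```

Assigning through data.loc with a row mask and a column label writes into the DataFrame directly, which avoids the chained indexing used in the question.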

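Another solution is DataFrame.replace with the column specified. The output below is for a DataFrame that also has a second column aa, as defined in the next example, which shows that the other columns are left untouched: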

data=data.replace({'artist_hotness': {0: mean_artist_hotness}})
print (data)
    aa  artist_hotness
0  0.0             2.0
1  1.0             1.0
2  5.0             5.0
3  NaN             NaN

Or, if you need to replace all 0 values in all columns:

import pandas as pd
import numpy as np

data = pd.DataFrame({'artist_hotness': [0,1,5,np.nan], 'aa': [0,1,5,np.nan]})
print (data)
    aa  artist_hotness
0  0.0             0.0
1  1.0             1.0
2  5.0             5.0
3  NaN             NaN

mean_artist_hotness = data['artist_hotness'].mean(skipna=True)
print (mean_artist_hotness)
2.0

data=data.replace(0,mean_artist_hotness)
print (data)
    aa  artist_hotness
0  2.0             2.0
1  1.0             1.0
2  5.0             5.0
3  NaN             NaN
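The example above fills every column with the mean of artist_hotness. If each column should instead receive its own mean, one possible sketch uses DataFrame.where with a Series of column means aligned along the columns. The values in aa are changed here (an assumption for illustration) so that the two means differ.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'artist_hotness': [0, 1, 5, np.nan],
                     'aa': [0, 3, 9, np.nan]})

# One mean per column (NaN skipped, zeros included).
column_means = data.mean()

# Keep values where data != 0; elsewhere take that column's mean.
# NaN != 0 evaluates to True, so NaN entries are left as NaN.
filled = data.where(data != 0, column_means, axis='columns')
print(filled)
```

where keeps the values for which the condition holds, so it is the inverse of the mask call used earlier.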

If you need to replace NaN in all columns, use DataFrame.fillna:

data=data.fillna(mean_artist_hotness)
print (data)
    aa  artist_hotness
0  0.0             0.0
1  1.0             1.0
2  5.0             5.0
3  2.0             2.0

But if you need to replace NaN only in some columns, use Series.fillna:

data['artist_hotness'] = data.artist_hotness.fillna(mean_artist_hotness)
print (data)
    aa  artist_hotness
0  0.0             0.0
1  1.0             1.0
2  5.0             5.0
3  NaN             2.0
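DataFrame.fillna also accepts a Series (or dict) keyed by column name, so each column's NaN can be filled with that column's own mean in a single call. A brief sketch, with the values in aa changed for illustration so the two means differ:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'artist_hotness': [0, 1, 5, np.nan],
                     'aa': [0, 3, 9, np.nan]})

# data.mean() returns one value per column; fillna matches them by column name.
filled = data.fillna(data.mean())
print(filled)
```

The zeros are untouched here, because fillna only affects missing values.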

Originally posted by jezrael; translation licensed under CC BY-SA 3.0.

Community wiki

Posted on 2023-01-04

Use the pandas replace method:

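import pandas as pd

df = pd.DataFrame({'a': [1,2,3,4,0,0,0,0], 'b': [2,3,4,6,0,5,3,8]})

df
   a  b
0  1  2
1  2  3
2  3  4
3  4  6
4  0  0
5  0  5
6  0  3
7  0  8

# The mean of column a, zeros included, is 1.25, so the column is upcast to float.
df['a']=df['a'].replace(0,df['a'].mean())

df
      a  b
0  1.00  2
1  2.00  3
2  3.00  4
3  4.00  6
4  1.25  0
5  1.25  5
6  1.25  3
7  1.25  8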

Originally posted by shivsn; translation licensed under CC BY-SA 3.0.
