
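I'm trying to use LinearRegression from sklearn, but I get "could not convert string to float". All columns of the DataFrame are floats, and the output y is also a float. I've looked at other posts, and the suggestion there was to convert to float, which I have already done.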

<class 'pandas.core.frame.DataFrame'>
Int64Index: 789 entries, 158 to 684
Data columns (total 8 columns):
f1     789 non-null float64
f2     789 non-null float64
f3     789 non-null float64
f4     789 non-null float64
f5     789 non-null float64
f6     789 non-null float64
OFF    789 non-null uint8
ON     789 non-null uint8
dtypes: float64(6), uint8(2)
memory usage: 44.7 KB

type(y_train)
pandas.core.series.Series
type(y_train[0])
float

from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,Y,random_state=0)
X_train.head()
from sklearn.linear_model import LinearRegression
linreg = LinearRegression().fit(X_train, y_train)

The error I get is:

ValueError                                Traceback (most recent call last)
<ipython-input-282-c019320f8214> in <module>()
      6 X_train.head()
      7 from sklearn.linear_model import LinearRegression
----> 8 linreg = LinearRegression().fit(X_train, y_train)
    510         n_jobs_ = self.n_jobs
    511         X, y = check_X_y(X, y, accept_sparse=['csr', 'csc', 'coo'],
--> 512                          y_numeric=True, multi_output=True)
    513
    514         if sample_weight is not None and np.atleast_1d(sample_weight).ndim > 1:

    527         _assert_all_finite(y)
    528     if y_numeric and y.dtype.kind == 'O':
--> 529         y = y.astype(np.float64)
    530
    531     check_consistent_length(X, y)

ValueError: could not convert string to float: '--'
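The traceback shows that sklearn's check_X_y saw y with object dtype (`y.dtype.kind == 'O'`) and then failed on `y.astype(np.float64)`. A minimal reproduction on hypothetical data shows how a single '--' entry produces exactly this, even though inspecting one element such as `y_train[0]` still reports `float`:

```python
import numpy as np
import pandas as pd

# Hypothetical target: mostly floats, but one missing value recorded as the string '--'
y = pd.Series([1.5, 2.0, '--', 3.25])

print(y.dtype)     # object: one string forces the whole Series to object dtype
print(type(y[0]))  # <class 'float'>: individual elements can still be Python floats

# Mirrors what check_X_y does when y_numeric=True and the dtype kind is 'O'
try:
    y.astype(np.float64)
except ValueError as exc:
    print(exc)     # could not convert string to float: '--'
```

So checking `Y.dtype` (rather than the type of one element) is the quicker way to spot this.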

Please help.

Original question by Tinniam V. Ganesh; translation licensed under CC BY-SA 4.0.

python pandas scikit-learn valueerror


2 answers


Community wiki

Posted 2023-01-10

✓ Accepted

A quick fix is to use pd.to_numeric to convert any strings your data may contain into numeric values. Values that cannot be converted are coerced to NaN.

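import pandas as pd
from sklearn.linear_model import LinearRegression

# Coerce every column of X; strings such as '--' become NaN
X = X.apply(pd.to_numeric, errors='coerce')
# Y is a Series, so pd.to_numeric can be applied to it directly
Y = pd.to_numeric(Y, errors='coerce')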
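Before coercing, it may be worth checking which raw values were actually non-numeric, since a placeholder like '--' usually means "missing" rather than zero. A small sketch on a hypothetical target Series (the name `Y_raw` and its values are made up for illustration):

```python
import pandas as pd

# Hypothetical raw target containing a '--' placeholder
Y_raw = pd.Series([10.0, 20.0, '--', 40.0], index=[158, 159, 160, 161])

# Coerce to numbers; anything unparseable becomes NaN
Y_num = pd.to_numeric(Y_raw, errors='coerce')

# Entries that were present before but are NaN after coercion held non-numeric text
bad = Y_raw[Y_num.isna() & Y_raw.notna()]
print(bad)          # index 160 holds '--'
print(Y_num.dtype)  # float64
```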

Additionally, you can fill these NaNs with a default value:

X.fillna(0, inplace=True)
Y.fillna(0, inplace=True)

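Replace the fill value with whatever makes sense for your problem. I don't recommend dropping these rows, because you may end up dropping different rows from X and Y, which leaves your data and labels mismatched.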
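As one example of a problem-specific fill value, the sketch below uses each column's mean instead of 0. The data is hypothetical, and whether imputing the target at all is sensible depends on your problem; filling (rather than dropping) is what keeps X and Y row-aligned:

```python
import pandas as pd

# Hypothetical numeric data after pd.to_numeric(..., errors='coerce')
X = pd.DataFrame({'f1': [0.1, None, 0.3], 'f2': [1.0, 2.0, None]})
Y = pd.Series([10.0, None, 30.0])

# Fill each feature column with its own mean instead of a constant 0
X = X.fillna(X.mean())
# Fill the target the same way (one possible choice, not the only one)
Y = Y.fillna(Y.mean())

# No rows were removed, so features and labels still line up
assert X.index.equals(Y.index)
print(X)
print(Y)
```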

Finally, split the data and fit your regressor:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=0)
reg = LinearRegression().fit(X_train, y_train)
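Putting the steps together, here is an end-to-end sketch on synthetic data. The column names `f1`/`f2`, the linear relationship, and the positions of the '--' placeholders are all assumptions made for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 50
# Hypothetical raw data: numeric features, but a target read in with '--' placeholders
X = pd.DataFrame({'f1': rng.normal(size=n), 'f2': rng.normal(size=n)})
Y = pd.Series(3.0 * X['f1'] - 2.0 * X['f2'] + 1.0, dtype=object)
Y.iloc[[5, 17]] = '--'

# Clean: coerce non-numeric entries to NaN, then fill them
X = X.apply(pd.to_numeric, errors='coerce').fillna(0)
Y = pd.to_numeric(Y, errors='coerce').fillna(0)

# With a float64 target, fit no longer raises the ValueError
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=0)
model = LinearRegression().fit(X_train, y_train)
print(model.coef_, model.score(X_test, y_test))
```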

Original answer by cs95; translation licensed under CC BY-SA 3.0.

