I have the following code snippet from a program named Flights.py:
...
#Load the Dataset
df = dataset
df.isnull().any()
df = df.fillna(lambda x: x.median())
# Define X and Y
X = df.iloc[:, 2:124].values
y = df.iloc[:, 136].values
X_tolist = X.tolist()
# Splitting the dataset into the Training set and Test set
from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
The second-to-last line throws the following error:
Traceback (most recent call last):
File "<ipython-input-14-d4add2ccf5ab>", line 3, in <module>
X_train = sc.fit_transform(X_train)
File "/Users/<username>/anaconda/lib/python3.6/site-packages/sklearn/base.py", line 494, in fit_transform
return self.fit(X, **fit_params).transform(X)
File "/Users/<username>/anaconda/lib/python3.6/site-packages/sklearn/preprocessing/data.py", line 560, in fit
return self.partial_fit(X, y)
File "/Users/<username>/anaconda/lib/python3.6/site-packages/sklearn/preprocessing/data.py", line 583, in partial_fit
estimator=self, dtype=FLOAT_DTYPES)
File "/Users/<username>/anaconda/lib/python3.6/site-packages/sklearn/utils/validation.py", line 382, in check_array
array = np.array(array, dtype=dtype, order=order, copy=copy)
TypeError: float() argument must be a string or a number, not 'function'
My data, df, has shape (22587, 138).
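I have been looking at the following question for inspiration:
TypeError: float() argument must be a string or a number, not 'method' in Geocoder
I tried the following adjustment: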
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train.as_matrix)
X_test = sc.transform(X_test.as_matrix)
This resulted in the following error:
AttributeError: 'numpy.ndarray' object has no attribute 'as_matrix'
I am currently at a loss as to how to scan the dataframe and find or convert the offending entries.
Originally posted by HMLDude; translated under the CC BY-SA 4.0 license.
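As this answer explains, fillna is not designed to work with a callback. If you pass one, it is taken as the literal fill value, which means your NaNs are replaced with the lambda itself rather than with its result. That is why StandardScaler later fails with "float() argument must be a string or a number, not 'function'": the array it receives contains function objects. If you are trying to fill by median, the solution is to compute the per-column medians (df.median() returns a Series indexed by column name) and pass that to fillna.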
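A minimal sketch of the median fill described above, assuming every column is numeric. The small frame and its column names are made up for illustration and stand in for the asker's df.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the asker's `df`.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, 4.0, 8.0],
})

# df.median() returns a Series of per-column medians (NaNs are skipped by default).
medians = df.median()

# fillna aligns the Series on column labels, so each column is filled
# with its own median instead of a single shared value.
filled = df.fillna(medians)

print(filled)
```

If the real frame also contains non-numeric columns, df.median(numeric_only=True) may be needed, in which case those columns are left unfilled.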