ValueError: Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 1 is required by MinMaxScaler

Question

I'm a beginner in ML. I'm helping a friend who majors in math build a stock predictor with TensorFlow, based on a .csv file he provided.

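I have a few problems. The first is his .csv file. It contains only dates and closing values, and they aren't separated, so I had to split the dates and values manually. I've managed to do that, and now I'm having trouble with MinMaxScaler(). I was told I can pretty much ignore the dates and work only with the closing values: normalize them and make predictions based on them.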

I keep getting this error:

 ValueError: Found array with 0 sample(s) (shape=(0, 1)) while a
minimum of 1 is required by MinMaxScaler()

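To be honest, I have never used SKLearn or TensorFlow before, and this is my first time working on a project like this. All the guides I've seen on the topic use pandas, but in my case the .csv file is a mess and I don't believe I can use pandas.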

I'm following this DataCamp tutorial:

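Unfortunately, because of my lack of experience, some things don't really work for me, and I'd appreciate a clearer idea of how I should proceed in my case.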

My (messy) code is attached below:

import pandas as pd
import numpy as np
import tensorflow as tf
import sklearn
from sklearn.model_selection import KFold
from sklearn.preprocessing import scale
from sklearn.preprocessing import MinMaxScaler
import matplotlib
import matplotlib.pyplot as plt
from dateutil.parser import parse
from datetime import datetime, timedelta
from collections import deque

stock_data = []
stock_date = []
stock_value = []
f = open("s&p500closing.csv","r")
data = f.read()
rows = data.split("\n")
rows_noheader = rows[1:len(rows)]

#Separating values from the messy `.csv`, putting each value into its own list and also a combined list of both
for row in rows_noheader:
    [date, value] = row[1:len(row)-1].split('\t')
    stock_date.append(date)
    stock_value.append((value))
    stock_data.append((date, value))

#Numpy array of all closing values converted to floats and normalized against the maximum
stock_value = np.array(stock_value, dtype=np.float32)
normvalue = [i/max(stock_value) for i in stock_value]

#Number of closing values and days. Since there is one closing value for each, they both match and there are 4528 of them (each)
nclose_and_days = 0
for i in range(len(stock_data)):
    nclose_and_days+=1

train_data = stock_value[:2264]
test_data = stock_value[2264:]

scaler = MinMaxScaler()

train_data = train_data.reshape(-1,1)
test_data = test_data.reshape(-1,1)

# Train the Scaler with training data and smooth data
smoothing_window_size = 1100
for di in range(0,4400,smoothing_window_size):
    #error occurs here
    scaler.fit(train_data[di:di+smoothing_window_size,:])
    train_data[di:di+smoothing_window_size,:] = scaler.transform(train_data[di:di+smoothing_window_size,:])

# You normalize the last bit of remaining data
scaler.fit(train_data[di+smoothing_window_size:,:])
train_data[di+smoothing_window_size:,:] = scaler.transform(train_data[di+smoothing_window_size:,:])

# Reshape both train and test data
train_data = train_data.reshape(-1)

# Normalize test data
test_data = scaler.transform(test_data).reshape(-1)

# Now perform exponential moving average smoothing
# So the data will have a smoother curve than the original ragged data
EMA = 0.0
gamma = 0.1
for ti in range(1100):
    EMA = gamma*train_data[ti] + (1-gamma)*EMA
    train_data[ti] = EMA

# Used for visualization and test purposes
all_mid_data = np.concatenate([train_data,test_data],axis=0)

window_size = 100
N = train_data.size
std_avg_predictions = []
std_avg_x = []
mse_errors = []

for pred_idx in range(window_size,N):
    std_avg_predictions.append(np.mean(train_data[pred_idx-window_size:pred_idx]))
    mse_errors.append((std_avg_predictions[-1]-train_data[pred_idx])**2)
    std_avg_x.append(date)

print('MSE error for standard averaging: %.5f'%(0.5*np.mean(mse_errors)))

Originally posted by Daniel Vaindiner; the translation is licensed under CC BY-SA 4.0

python tensorflow scikit-learn

1 Answer

Community wiki

1

Posted on 2023-01-04

I know this post is old, but since I stumbled upon it, others will too. After running into the same problem and googling quite a bit, I found this thread: https://github.com/llSourcell/Make_Money_with_Tensorflow_2.0/issues/7

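So it looks like this error is thrown if you download a dataset that is too small. Download the .csv file that goes back to 1962 and it will be big enough ;).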
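
To see why a short dataset triggers this, here is a minimal sketch that reuses the numbers from the question (2264 training rows, windows of 1100, loop bound 4400). The array contents are arbitrary stand-ins; only the shapes matter. Once a window starts past the end of the array, NumPy returns an empty slice and `MinMaxScaler.fit` rejects it.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Stand-in for the question's training data: 2264 rows, one column.
# The values are arbitrary; only the row count matters here.
train_data = np.arange(2264, dtype=np.float32).reshape(-1, 1)

smoothing_window_size = 1100
window_shapes = []    # shape of every slice the loop produces
error_message = None  # filled in when fit() rejects a slice

for di in range(0, 4400, smoothing_window_size):
    window = train_data[di:di + smoothing_window_size, :]
    window_shapes.append(window.shape)
    try:
        MinMaxScaler().fit(window)
    except ValueError as exc:
        # Slicing past the end of the array gives 0 rows, not an IndexError.
        error_message = str(exc)
        break

print(window_shapes)
print(error_message)
```

The loop bound assumes at least 4400 rows, so with fewer rows the later windows come back empty. A larger download makes the same hard-coded bounds valid again.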

Now I just need to find the right parameters for my dataset, since I'm adapting this to another kind of prediction. Hope it helps.
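
One possible way to avoid hand-tuning those parameters is to derive the loop bound from the length of the data, so that no window can start past the end. The sketch below is an assumption about how that might look, not the tutorial's code; the helper name `normalize_in_windows` and the sample prices are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def normalize_in_windows(values, window_size):
    """Min-max scale a 1-D series window by window (hypothetical helper).

    Returns the scaled copy and the scaler fitted on the last window.
    """
    data = np.asarray(values, dtype=np.float64).reshape(-1, 1).copy()
    if data.shape[0] == 0:
        raise ValueError("need at least one sample to normalize")
    scaler = MinMaxScaler()
    # Stepping up to len(data) means the final window may be short,
    # but it can never be empty.
    for start in range(0, data.shape[0], window_size):
        window = data[start:start + window_size, :]
        data[start:start + window_size, :] = scaler.fit_transform(window)
    return data.reshape(-1), scaler


# Arbitrary stand-in for 2264 closing prices.
prices = np.linspace(1000.0, 3000.0, 2264)
scaled, last_scaler = normalize_in_windows(prices, window_size=1100)
print(scaled.shape, scaled.min(), scaled.max())
```

Test data could then be scaled with `last_scaler.transform(test_data.reshape(-1, 1))`, mirroring how the question's code reuses the last fitted scaler. Whether per-window scaling suits a given dataset is a separate modelling choice.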

Originally posted by Vincenzo; the translation is licensed under CC BY-SA 4.0

This content was translated from Stack Overflow.
