ValueError: Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 1 is required by MinMaxScaler


I'm a beginner in ML. I'm helping a friend of mine who majors in mathematics build a stock predictor with TensorFlow, based on a .csv file he provided.

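I have a few problems. The first one is his .csv file. It only contains dates and closing values, and they weren't separated, so I had to split the dates from the values manually. I've managed to do that, but now I'm having trouble with MinMaxScaler(). I was told I can pretty much ignore the dates, use only the closing values, normalize them, and make predictions based on them.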
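For reference, min-max scaling maps each value x to (x - min) / (max - min), so the smallest closing value becomes 0 and the largest becomes 1; this is what MinMaxScaler does per column with its default feature_range=(0, 1). A minimal NumPy sketch of the formula, assuming the closing values are already parsed into floats (the function name `min_max_scale` and the sample numbers are hypothetical):

```python
import numpy as np


def min_max_scale(values):
    """Scale a 1-D sequence to [0, 1] using (x - min) / (max - min)."""
    values = np.asarray(values, dtype=np.float64)
    if values.size == 0:
        # Like MinMaxScaler, refuse to work on zero samples.
        raise ValueError("cannot min-max scale an empty array")
    lo, hi = values.min(), values.max()
    span = hi - lo
    if span == 0:
        # Constant input: avoid dividing by zero and map everything to 0.
        return np.zeros_like(values)
    return (values - lo) / span


closing = [100.0, 101.5, 99.0, 102.0]  # hypothetical closing values
print(min_max_scale(closing))
```

By contrast, `normvalue` in the code below divides by the maximum only, which for positive prices maps the values into (0, 1] without stretching the minimum to 0.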

I keep getting this error:

ValueError: Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 1 is required by MinMaxScaler()
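The message means `fit` was handed a 2-D array with one column and no rows. NumPy does not raise when a slice starts past the end of an array; it silently returns an empty array, so the failure only surfaces inside scikit-learn. A minimal reproduction, using a small made-up array in place of `train_data`:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Five rows, one column: the same (n, 1) layout as train_data after reshape(-1, 1).
data = np.arange(5, dtype=np.float32).reshape(-1, 1)

# Slicing past the end does not raise in NumPy; it returns zero rows.
window = data[10:20, :]
print(window.shape)  # (0, 1)

error_message = None
try:
    MinMaxScaler().fit(window)
except ValueError as exc:
    # The same "Found array with 0 sample(s) (shape=(0, 1)) ..." message.
    error_message = str(exc)
print(error_message)
```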

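To be honest, I've never used SKLearn or TensorFlow before, and this is my first project of this kind. All the guides I've found on this topic use pandas, but in my case the .csv file is such a mess that I don't think I can use pandas.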
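Without pandas, the split can also be done with the standard library alone. A minimal sketch, assuming (as `row[1:len(row)-1].split('\t')` in the code implies) a header line followed by lines that are each wrapped in one character at both ends, taken here to be a double quote, with a tab between the date and the closing value; the function name `parse_closing_rows` and the sample lines are hypothetical:

```python
def parse_closing_rows(text):
    """Split raw file text into parallel lists of dates and closing values.

    Assumed line format: "<date>\t<value>" wrapped in double quotes,
    preceded by a single header line.
    """
    dates, values = [], []
    for line in text.splitlines()[1:]:  # skip the header line
        line = line.strip()
        if not line:
            continue  # tolerate blank or trailing lines
        date, value = line.strip('"').split("\t")
        dates.append(date)
        values.append(float(value))  # convert once, up front
    return dates, values


sample = 'header\n"2000-01-03\t100.0"\n"2000-01-04\t101.5"\n'  # made-up data
dates, closes = parse_closing_rows(sample)
print(dates, closes)
```

Skipping empty lines matters because `split("\n")` on a file that ends with a newline yields a final empty string, which cannot be unpacked into a date and a value.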

I'm following this DataCamp tutorial.

Unfortunately, because of my inexperience, some things in it didn't really work for me, and I'd like a clearer idea of how to handle my case.

My (messy) code is below:

import pandas as pd
import numpy as np
import tensorflow as tf
import sklearn
from sklearn.model_selection import KFold
from sklearn.preprocessing import scale
from sklearn.preprocessing import MinMaxScaler
import matplotlib
import matplotlib.pyplot as plt
from dateutil.parser import parse
from datetime import datetime, timedelta
from collections import deque

stock_data = []
stock_date = []
stock_value = []
with open("s&p500closing.csv", "r") as f:
    data = f.read()
rows = data.split("\n")
rows_noheader = rows[1:]

# Separate the dates from the values in the messy `.csv`, putting each into its own list, plus a combined list of (date, value) pairs
for row in rows_noheader:
    if not row:
        continue  # skip blank lines, e.g. the one after a trailing newline
    [date, value] = row[1:len(row)-1].split('\t')
    stock_date.append(date)
    stock_value.append(value)
    stock_data.append((date, value))

#Numpy array of all closing values converted to floats and normalized against the maximum
stock_value = np.array(stock_value, dtype=np.float32)
normvalue = stock_value / stock_value.max()

# Number of closing values and days. There is one closing value per day, so the two counts match (4528 each)
nclose_and_days = len(stock_data)

train_data = stock_value[:2264]
test_data = stock_value[2264:]

scaler = MinMaxScaler()

train_data = train_data.reshape(-1,1)
test_data = test_data.reshape(-1,1)

# Train the Scaler with training data and smooth data
smoothing_window_size = 1100
for di in range(0,4400,smoothing_window_size):
    #error occurs here
    scaler.fit(train_data[di:di+smoothing_window_size,:])
    train_data[di:di+smoothing_window_size,:] = scaler.transform(train_data[di:di+smoothing_window_size,:])

# You normalize the last bit of remaining data
scaler.fit(train_data[di+smoothing_window_size:,:])
train_data[di+smoothing_window_size:,:] = scaler.transform(train_data[di+smoothing_window_size:,:])

# Flatten the training data back to 1-D (test_data is flattened further down)
train_data = train_data.reshape(-1)

# Normalize test data
test_data = scaler.transform(test_data).reshape(-1)

# Now perform exponential moving average smoothing
# So the data will have a smoother curve than the original ragged data
EMA = 0.0
gamma = 0.1
for ti in range(1100):
    EMA = gamma*train_data[ti] + (1-gamma)*EMA
    train_data[ti] = EMA

# Used for visualization and test purposes
all_mid_data = np.concatenate([train_data,test_data],axis=0)

window_size = 100
N = train_data.size
std_avg_predictions = []
std_avg_x = []
mse_errors = []

for pred_idx in range(window_size,N):
    std_avg_predictions.append(np.mean(train_data[pred_idx-window_size:pred_idx]))
    mse_errors.append((std_avg_predictions[-1]-train_data[pred_idx])**2)
    std_avg_x.append(stock_date[pred_idx])

print('MSE error for standard averaging: %.5f'%(0.5*np.mean(mse_errors)))
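The smoothing loop in the code computes an exponential moving average, EMA_t = gamma * x_t + (1 - gamma) * EMA_{t-1}, starting from EMA = 0. Factored into a small function (the name `ema_smooth` is hypothetical), the same recurrence can be checked on a tiny input:

```python
import numpy as np


def ema_smooth(values, gamma=0.1):
    """Exponential moving average with the recurrence used in the question.

    ema = gamma * x + (1 - gamma) * ema, starting from ema = 0.0.
    Unlike the question's loop, the result goes into a new array
    instead of overwriting the input in place.
    """
    values = np.asarray(values, dtype=np.float64)
    out = np.empty_like(values)
    ema = 0.0
    for i, x in enumerate(values):
        ema = gamma * x + (1 - gamma) * ema
        out[i] = ema
    return out


print(ema_smooth([1.0, 1.0, 1.0], gamma=0.5))
```

With gamma=0.5 and three 1.0 inputs it returns 0.5, 0.75 and 0.875. Note that the loop in the question runs over `range(1100)`, so it smooths only the first 1100 of the 2264 training points; if the whole training set is meant to be smoothed, the bound would be `len(train_data)`.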

Originally posted by Daniel Vaindiner; translation licensed under CC BY-SA 4.0.

1 answer

I know this post is old, but since I stumbled on it, others will too. After running into the same problem and googling quite a bit, I found this issue: https://github.com/llSourcell/Make_Money_with_Tensorflow_2.0/issues/7

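So it looks like this error is thrown when the dataset you download is too small: the window bounds in the normalization loop are hard-coded, so with a short file one of the slices comes out empty, which is exactly the (0, 1) array in the error. Download the 1962 .csv file and it will be big enough ;).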
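This explanation can be checked against the numbers in the question: `train_data` holds 2264 rows, but the normalization loop steps through `range(0, 4400, 1100)`, bounds that only fit a longer series. A pure-Python sketch that replays the same slices on a stand-in list of the same length:

```python
train_data = [0.0] * 2264  # stand-in with the question's training length
smoothing_window_size = 1100

window_sizes = []
for di in range(0, 4400, smoothing_window_size):  # bounds copied from the question
    window = train_data[di:di + smoothing_window_size]
    window_sizes.append(len(window))
    print(f"train_data[{di}:{di + smoothing_window_size}] has {len(window)} rows")
```

The slices hold 1100, 1100, 64 and 0 rows. The fourth one (starting at 3300) is empty, and passing that empty (0, 1) array to `scaler.fit` raises the error. The "last bit" step after the loop would hit the same problem, because `train_data[4400:]` is empty as well.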

Now I just need to find the right parameters for my dataset, since I'm adapting it to a different kind of prediction. Hope it helps.
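Instead of re-tuning the bounds by hand for each dataset, the windowed normalization can take its bounds from the data itself. A minimal sketch, assuming `train_data` is the (n, 1) float array from the question's code; the function name `normalize_in_windows` and the random stand-in data are hypothetical:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def normalize_in_windows(train_data, window_size):
    """Min-max scale an (n, 1) array window by window, in place.

    The loop bound is len(train_data) instead of a hard-coded 4400, so the
    last, shorter window is handled by the same loop and no slice is empty.
    """
    scaler = MinMaxScaler()
    for di in range(0, len(train_data), window_size):
        window = train_data[di:di + window_size, :]
        train_data[di:di + window_size, :] = scaler.fit_transform(window)
    return scaler  # fitted on the last window only, as in the question's code


# Hypothetical stand-in for the question's 2264 training values.
rng = np.random.default_rng(0)
train_data = rng.uniform(1000, 2000, size=(2264, 1)).astype(np.float32)
last_scaler = normalize_in_windows(train_data, window_size=1100)
print(train_data.min(), train_data.max())
```

This replaces both the fixed-bound loop and the separate "last bit" step. The returned scaler has only seen the final 64-row window, and the question's code then applies that scaler to `test_data`; whether that is the right choice is a modelling decision rather than something this sketch settles.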

Originally posted by Vincenzo; translation licensed under CC BY-SA 4.0.
