I am constructing an LSTM predictor with Keras. My input is historical price data. I segment the data into window_size blocks in order to predict prediction_length blocks ahead. My data is a list of 4246 floating point numbers. I separate my data into 4055 arrays, each of length 168, in order to predict 24 units ahead.

This gives me an x_train set with dimension (4055, 168). I then scale my data and try to fit it, but I run into a dimension error.

import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation, TimeDistributed

df = pd.DataFrame(data)
print(f"Len of df: {len(df)}")
min_max_scaler = MinMaxScaler()
H = 24

window_size = 7*H
num_pred_blocks = len(df)-window_size-H+1

x_train = []
y_train = []
for i in range(num_pred_blocks):
    x_train_block = df['C'][i:(i + window_size)]
    x_train.append(x_train_block)
    y_train_block = df['C'][(i + window_size):(i + window_size + H)]
    y_train.append(y_train_block)

LEN = int(len(x_train)*window_size)
x_train = min_max_scaler.fit_transform(x_train)
batch_size = 1
    
def build_model():
    model = Sequential()
    model.add(LSTM(input_shape=(window_size,batch_size),
                   return_sequences=True,
                   units=num_pred_blocks))
    model.add(TimeDistributed(Dense(H)))
    model.add(Activation("linear"))
    model.compile(loss="mse", optimizer="rmsprop")
    return model
    
num_epochs = epochs
model= build_model()
model.fit(x_train, y_train, batch_size = batch_size, epochs = 50)

The error returned is:

ValueError: Error when checking model target: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 1 array(s), but instead got the following list of 4055 arrays: [array([[0.00630006],

Am I not segmenting correctly? Loading correctly? Should the number of units be different from the number of prediction blocks? I appreciate any help. Thanks.

Edit

The suggestion to convert them to NumPy arrays is correct, but MinMaxScaler() already returns a NumPy array. I reshaped the arrays into the proper dimensions, but now my computer is running into a CUDA memory error. I consider the problem solved. Thank you.

df = pd.DataFrame(data)
min_max_scaler = MinMaxScaler()
H = prediction_length

window_size = 7*H
num_pred_blocks = len(df)-window_size-H+1

x_train = []
y_train = []
for i in range(num_pred_blocks):
    x_train_block = df['C'][i:(i + window_size)].values
    x_train.append(x_train_block)
    y_train_block = df['C'][(i + window_size):(i + window_size + H)].values
    y_train.append(y_train_block)

x_train = min_max_scaler.fit_transform(x_train)
y_train = min_max_scaler.fit_transform(y_train)
x_train = np.reshape(x_train, (len(x_train), 1, window_size))
y_train = np.reshape(y_train, (len(y_train), 1, H))
batch_size = 1

def build_model():
    model = Sequential()
    model.add(LSTM(batch_input_shape=(batch_size, 1, window_size),
                   return_sequences=True,
                   units=100))
    model.add(TimeDistributed(Dense(H)))
    model.add(Activation("linear"))
    model.compile(loss="mse", optimizer="rmsprop")
    return model

num_epochs = epochs
model = build_model()
model.fit(x_train, y_train, batch_size = batch_size, epochs = 50)

2 Answers

I don't think you are passing the batch size to the model correctly.

input_shape describes the data dimensions only, not the batch size, so you should use input_shape=(window_size, 1) rather than input_shape=(window_size, batch_size).

If you want to fix the batch size, you have to add another dimension, like this: LSTM(n_neurons, batch_input_shape=(n_batch, X.shape[1], X.shape[2])) (cited from the Keras documentation).

In your case:

def build_model():
    model = Sequential()
    model.add(LSTM(batch_input_shape=(batch_size, 1, window_size),
                   return_sequences=True,
                   units=num_pred_blocks))
    model.add(TimeDistributed(Dense(H)))
    model.add(Activation("linear"))
    model.compile(loss="mse", optimizer="rmsprop")
    return model
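
For completeness, here is a minimal sketch of the input_shape=(window_size, 1) route mentioned above, with the data shaped (samples, window_size, 1). It drops return_sequences and TimeDistributed so a single Dense(H) head predicts all 24 steps at once; that simplification and the choice of 100 units are my assumptions, not part of the original code. x_train and y_train are the lists/arrays built in the question.

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation

window_size, H = 168, 24

# Each window becomes 168 timesteps with one feature per step.
x = np.reshape(np.array(x_train), (len(x_train), window_size, 1))
y = np.array(y_train)                                     # (samples, 24)

model = Sequential()
model.add(LSTM(units=100, input_shape=(window_size, 1)))  # no batch size here
model.add(Dense(H))
model.add(Activation("linear"))
model.compile(loss="mse", optimizer="rmsprop")
model.fit(x, y, batch_size=1, epochs=50)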

You also need to use np.reshape() to change the dimensions of your data; it should be (batch_dim, data_dim_1, data_dim_2). I use NumPy, so numpy.reshape() will work.

First, your data should be row-wise, so each row has a shape of (1, 168); once you add the batch dimension it becomes (batch_n, 1, 168).
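
As a concrete sketch of that reshape, using the question's 4055 windows of length 168 (x_arr is just an illustrative name):

import numpy as np

x_arr = np.array(x_train)                        # (4055, 168): one row per window
x_arr = np.reshape(x_arr, (len(x_arr), 1, 168))  # (batch_n, 1, window_size) = (4055, 1, 168)
print(x_arr.shape)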

Hope this helps.

That's probably because x_train and y_train were not converted to NumPy arrays. Take a closer look at this issue on GitHub.

model = build_model()
x_train, y_train = np.array(x_train), np.array(y_train)
model.fit(x_train, y_train, batch_size = batch_size, epochs = 50)
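
If the error persists after the conversion, a quick shape check is a cheap sanity test; this small sketch reuses the names above and only prints what fit() will see versus what the compiled model expects:

import numpy as np

x_train, y_train = np.array(x_train), np.array(y_train)
print(x_train.shape, y_train.shape)            # arrays actually passed to fit()
print(model.input_shape, model.output_shape)   # what the compiled model expects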
