I'm trying to train a model to predict the close column 2 steps ahead. I prepared the dataset:

datetime             close   close_forward  close_shift_1  close_shift_2  close_shift_3  close_shift_4  close_shift_5
2024-10-01 10:05:00  4009.0  4020.0         4002.5         3994.5         3993.0         3991.0         4007.5
2024-10-01 10:06:00  4020.5  4018.0         4009.0         4002.5         3994.5         3993.0         3991.0
2024-10-01 10:07:00  4020.0  4018.5         4020.5         4009.0         4002.5         3994.5         3993.0
2024-10-01 10:08:00  4018.0  4010.5         4020.0         4020.5         4009.0         4002.5         3994.5
2024-10-01 10:09:00  4018.5  4017.0         4018.0         4020.0         4020.5         4009.0         4002.5
2024-10-01 10:10:00  4010.5  4010.0         4018.5         4018.0         4020.0         4020.5         4009.0

Here close_forward is the value of close 2 steps ahead, and each close_shift_N column is the value of close N steps back.
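For reference, a table like the one above could be built with pandas `shift`. This is only a sketch: the helper name `add_shift_columns` is hypothetical, and the close values outside 10:05 to 10:10 are read back from the shift and forward columns of the table, assuming evenly spaced 1-minute bars.

```python
import pandas as pd

def add_shift_columns(df, horizon=2, n_lags=5):
    """Add close_forward (target) and close_shift_k (lag) columns.

    Hypothetical helper: shift(-horizon) looks ahead, shift(k) looks back.
    """
    out = df.copy()
    out["close_forward"] = out["close"].shift(-horizon)
    for k in range(1, n_lags + 1):
        out[f"close_shift_{k}"] = out["close"].shift(k)
    # Rows at the edges have no complete history or no future value.
    return out.dropna()

# Close values for 10:00 .. 10:12, reconstructed from the table above.
closes = [4007.5, 3991.0, 3993.0, 3994.5, 4002.5, 4009.0, 4020.5,
          4020.0, 4018.0, 4018.5, 4010.5, 4017.0, 4010.0]
index = pd.date_range("2024-10-01 10:00", periods=len(closes),
                      freq="min", name="datetime")
raw = pd.DataFrame({"close": closes}, index=index)

dataset = add_shift_columns(raw)
print(dataset)
```

With `horizon=2` and `n_lags=5`, the first five rows and the last two rows are dropped because they lack lag or forward values, which leaves the six rows from 10:05 to 10:10.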

I trained a linear regression (Ridge):

from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

column_names = ['close', 'close_shift_1', 'close_shift_2', 'close_shift_3', 'close_shift_4', 'close_shift_5']
X_train = df[column_names].values
y_train = df['close_forward'].values

# Standardize features and target with separate scalers
scaler_X = StandardScaler()
scaler_y = StandardScaler()

X_train_scaled = scaler_X.fit_transform(X_train)
y_train_scaled = scaler_y.fit_transform(y_train.reshape(-1, 1))

model = Ridge(alpha=0.0001)
# model.fit(X_train, y_train)
model.fit(X_train_scaled, y_train_scaled)

# X_test is created elsewhere (code not shown); reuse the scaler fitted on the training data
X_test_scaled = scaler_X.transform(X_test)

# y_pred = model.predict(X_test)
y_pred_scaled = model.predict(X_test_scaled)

# Map the predictions back to the original price scale
y_pred = scaler_y.inverse_transform(y_pred_scaled).flatten()

Then I plotted the result:

enter image description here

Zooming in on one of the sections:

enter image description here

As you can see, the predictions follow the current values of close instead of the test target (close_forward), which is what is required. What could be causing this, and how can I fix it?

I need the model to predict 2 steps ahead. I tried training with other horizons and with other models (gradient boosting and LSTM), but the effect is the same.
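One way to check whether this effect comes from the data rather than from the code is to compare the model with a naive "persistence" forecast that simply predicts close(t+2) = close(t). The sketch below runs on a synthetic random walk, not on the real prices, and uses plain least squares from NumPy in place of Ridge (with alpha=0.0001 the two should be nearly identical). The series, the 80/20 chronological split, and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic random walk standing in for the close series (assumption).
close = 4000.0 + np.cumsum(rng.normal(0.0, 3.0, size=20000))

horizon, n_lags = 2, 5
# Rows with n_lags values behind them and `horizon` values ahead.
t = np.arange(n_lags, len(close) - horizon)
# Columns: close(t), close(t-1), ..., close(t-n_lags)
X = np.column_stack([close[t - k] for k in range(n_lags + 1)])
y = close[t + horizon]

# Chronological split: the test rows come strictly after the training rows.
split = int(0.8 * len(t))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Ordinary least squares with an intercept column.
A_train = np.column_stack([np.ones(len(X_train)), X_train])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
y_pred = np.column_stack([np.ones(len(X_test)), X_test]) @ coef

# Persistence baseline: the forecast is just the current close.
naive = X_test[:, 0]

mae_model = np.mean(np.abs(y_test - y_pred))
mae_naive = np.mean(np.abs(y_test - naive))
gap_to_close = np.mean(np.abs(y_pred - naive))

print(f"MAE model vs target:      {mae_model:.3f}")
print(f"MAE persistence vs target: {mae_naive:.3f}")
print(f"Mean |prediction - close|: {gap_to_close:.3f}")
```

If the same comparison on the real data shows the model's error is about equal to the persistence error, while the predictions sit very close to the current close, then the lag features probably carry little information beyond the last value, and the fitted model is behaving like the persistence forecast.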

  • The part where you create X_test is missing from your code. Please also add the code you use to create those plots. Judging by the second graph, it looks as if you are using the same data for training and testing, and the mismatch between the two curves is a plotting error: your test data is shifted along the time axis. Commented Jan 15 at 13:01
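A quick way to test for a shift along the time axis is to slide the predictions against the target and see which offset gives the smallest error. This is a minimal sketch: `best_alignment_shift` is a hypothetical helper, and the demo arrays are synthetic rather than taken from the question.

```python
import numpy as np

def best_alignment_shift(y_true, y_pred, max_shift=5):
    """Find the row offset s that minimises mean |y_pred[i] - y_true[i - s]|.

    s > 0 means the predictions look like a delayed copy of the target.
    Returns (best_s, {s: mean absolute error}).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = len(y_true)
    errors = {}
    for s in range(-max_shift, max_shift + 1):
        if s >= 0:
            a, b = y_pred[s:], y_true[:n - s]
        else:
            a, b = y_pred[:s], y_true[-s:]
        errors[s] = float(np.mean(np.abs(a - b)))
    return min(errors, key=errors.get), errors

# Synthetic demo: the "prediction" is the target delayed by 2 rows.
rng = np.random.default_rng(1)
target = 4000.0 + np.cumsum(rng.normal(0.0, 3.0, size=500))
delayed = np.roll(target, 2)  # delayed[i] == target[i - 2] for i >= 2

shift, errors = best_alignment_shift(target, delayed)
print("best shift:", shift)
```

A best shift of 0 means the curves are already aligned. A best shift equal to the forecast horizon means the predictions are a delayed copy of the target, which points either to a misaligned plot index or to a model that mostly repeats the last known close.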
