I'm trying to train a model to predict the close column 2 steps ahead. I prepared the following dataset:
| datetime | close | close_forward | close_shift_1 | close_shift_2 | close_shift_3 | close_shift_4 | close_shift_5 |
|---|---|---|---|---|---|---|---|
| 2024-10-01 10:05:00 | 4009.0 | 4020.0 | 4002.5 | 3994.5 | 3993.0 | 3991.0 | 4007.5 |
| 2024-10-01 10:06:00 | 4020.5 | 4018.0 | 4009.0 | 4002.5 | 3994.5 | 3993.0 | 3991.0 |
| 2024-10-01 10:07:00 | 4020.0 | 4018.5 | 4020.5 | 4009.0 | 4002.5 | 3994.5 | 3993.0 |
| 2024-10-01 10:08:00 | 4018.0 | 4010.5 | 4020.0 | 4020.5 | 4009.0 | 4002.5 | 3994.5 |
| 2024-10-01 10:09:00 | 4018.5 | 4017.0 | 4018.0 | 4020.0 | 4020.5 | 4009.0 | 4002.5 |
| 2024-10-01 10:10:00 | 4010.5 | 4010.0 | 4018.5 | 4018.0 | 4020.0 | 4020.5 | 4009.0 |
Here close_forward is the value of close 2 steps ahead, and each close_shift_N column is the value of close N steps back.
I trained a linear regression (Ridge) on it:
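For reference, the columns above can be built with `shift`. This is a minimal sketch assuming pandas; `add_lag_features` is a hypothetical helper name, and the six close values are the ones from the table, so only the middle rows survive `dropna()`.

```python
import pandas as pd

def add_lag_features(df: pd.DataFrame, horizon: int = 2, n_lags: int = 5) -> pd.DataFrame:
    """Add close_forward (target) and close_shift_N (lag feature) columns."""
    out = df.copy()
    # shift(-horizon) pulls the value from `horizon` rows ahead onto the current row
    out["close_forward"] = out["close"].shift(-horizon)
    for k in range(1, n_lags + 1):
        # shift(k) pulls the value from k rows back onto the current row
        out[f"close_shift_{k}"] = out["close"].shift(k)
    # rows at the edges have no past or future value and are dropped
    return out.dropna()

# close values from the table above (10:05 to 10:10)
sample = pd.DataFrame(
    {"close": [4009.0, 4020.5, 4020.0, 4018.0, 4018.5, 4010.5]},
    index=pd.date_range("2024-10-01 10:05", periods=6, freq="min"),
)
features = add_lag_features(sample, horizon=2, n_lags=2)
print(features)
```

With only two lags, the rows for 10:07 and 10:08 remain, and their close_forward, close_shift_1 and close_shift_2 values match the table.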
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

column_names = ['close', 'close_shift_1', 'close_shift_2', 'close_shift_3', 'close_shift_4', 'close_shift_5']
X_train = df[column_names].values
y_train = df['close_forward'].values
scaler_X = StandardScaler()
scaler_y = StandardScaler()
X_train_scaled = scaler_X.fit_transform(X_train)
y_train_scaled = scaler_y.fit_transform(y_train.reshape(-1, 1))
model = Ridge(alpha=0.0001)
# model.fit(X_train, y_train)
model.fit(X_train_scaled, y_train_scaled)
X_test_scaled = scaler_X.transform(X_test)
# y_pred = model.predict(X_test)
y_pred_scaled = model.predict(X_test_scaled)
y_pred = scaler_y.inverse_transform(y_pred_scaled).flatten()
Then I plotted the predictions against the test values.
Zooming in on one section of the plot:
As the plot shows, the predictions follow the current values of close rather than the test targets (close_forward), which is what I need. What could be causing this, and how can I fix it?
The model has to predict 2 steps ahead. I also tried other horizons and other models (gradient boosting and an LSTM), but the result is the same.
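The impression from the plot can also be put into numbers by comparing the predictions with a naive baseline that simply repeats the current close. The sketch below is only an illustration: `lag_check` is a hypothetical helper, and the arrays are a synthetic random walk with a pretend model that nearly copies the current close, not my real data or predictions.

```python
import numpy as np

def lag_check(y_true, y_pred, close_now):
    """Mean absolute errors: predictions vs target, predictions vs current close, naive vs target."""
    y_true, y_pred, close_now = map(np.asarray, (y_true, y_pred, close_now))
    return {
        "pred_vs_target": float(np.mean(np.abs(y_pred - y_true))),
        "pred_vs_close": float(np.mean(np.abs(y_pred - close_now))),
        "naive_vs_target": float(np.mean(np.abs(close_now - y_true))),
    }

# Synthetic random walk standing in for the close series (assumption, not real data)
rng = np.random.default_rng(0)
close = 4000 + np.cumsum(rng.normal(0, 5, size=500))
close_now = close[:-2]   # value known at prediction time
y_true = close[2:]       # value 2 steps ahead (the target)

# Pretend model output that almost copies the current close
y_pred = close_now + rng.normal(0, 0.5, size=close_now.size)

report = lag_check(y_true, y_pred, close_now)
print(report)
```

If `pred_vs_close` is much smaller than `pred_vs_target`, and `pred_vs_target` is about the same as `naive_vs_target`, the model is effectively repeating the last known close. On real data, `y_true`, `y_pred` and the test-period close column would replace the synthetic arrays.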