I have the following data frame:
| col0 |
|---|
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| ... |
| 1000 |
I'd like to roll col0 into a data frame with a window size of 5, so the outcome would look like this:
| col0 | col1 | col2 | ... | col995 |
|---|---|---|---|---|
| 1 | 2 | 3 | ... | 996 |
| 2 | 3 | 4 | ... | 997 |
| 3 | 4 | 5 | ... | 998 |
| 4 | 5 | 6 | ... | 999 |
| 5 | 6 | 7 | ... | 1000 |
I've tried using loops and `iloc`, which produces correct results, but as the original data frame gets longer it takes too long to finish: 10,000 rows take almost 2 minutes, 20,000 rows almost 10 minutes, and so on. Is there a faster, more efficient way to do this in Python?
`df[f'col{i}'] = df['col0'][i:i+5]` in a loop? Alternatively, you can create a dict and then convert it to a dataframe. Note that creating or manipulating dataframes with thousands of columns is generally not very efficient (Pandas is not made for fast manipulation of wide dataframes). A loop of 1000 iterations is relatively fast in Python.
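
To make the dict suggestion concrete, here is a minimal sketch of it, followed by a loop-free variant. The function names `roll_with_dict` and `roll_with_view` are hypothetical, and the second variant swaps in NumPy's `sliding_window_view` (available from NumPy 1.20), which the comment above does not mention. Both slice the underlying NumPy array rather than the Series, so index alignment does not interfere with the result.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def roll_with_dict(df, column="col0", window=5):
    """Build all columns in a dict first, then create the frame once."""
    values = df[column].to_numpy()
    n_cols = len(values) - window + 1
    # Each output column is one window of `window` consecutive values.
    data = {f"col{i}": values[i:i + window] for i in range(n_cols)}
    return pd.DataFrame(data)


def roll_with_view(df, column="col0", window=5):
    """Same result without a Python-level loop over the windows."""
    values = df[column].to_numpy()
    # Shape (n - window + 1, window): one row per window.
    windows = sliding_window_view(values, window)
    # Transpose so that each window becomes a column, as in the question.
    wide = windows.T
    columns = [f"col{i}" for i in range(wide.shape[1])]
    return pd.DataFrame(wide, columns=columns)


# Sample data matching the question: col0 holds 1..1000.
df = pd.DataFrame({"col0": np.arange(1, 1001)})
result_dict = roll_with_dict(df)
result_view = roll_with_view(df)
```

Both calls should return a frame with 5 rows and columns `col0` through `col995`. Which one is faster on much longer inputs would need to be measured, and a frame with thousands of columns will still be slow to manipulate afterwards; if the wide layout is not strictly required, working with the `windows` array directly may be preferable.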