How to generate rolling subsequences into a dataframe in Python

Question

I have the following data frame:

col0
1
2
3
4
5
...
1000

I'd like to roll col0 into a data frame with a window size of 5, so that the outcome looks like this:

col0	col1	col2	...	col995
1	2	3	...	996
2	3	4	...	997
3	4	5	...	998
4	5	6	...	999
5	6	7	...	1000

I've tried using loops and iloc, which produce correct results, but as the original data frame gets longer it takes too long to finish: 10,000 rows take almost 2 minutes, 20,000 rows almost 10 minutes, and so on. Is there a faster, more efficient way to do this in Python?
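The question doesn't include the loop itself, so the following is only an assumed reconstruction of an iloc-based approach (the function name rolling_frame_loop is hypothetical). It is mainly useful as a slow but easy-to-verify reference. Each iteration slices with iloc and inserts one new column, and repeatedly inserting columns into a DataFrame is costly.

```python
import pandas as pd


def rolling_frame_loop(df, col="col0", window=5):
    """Hypothetical loop-based baseline: one iloc slice per output column."""
    n_cols = len(df) - window + 1
    out = pd.DataFrame(index=range(window))
    for i in range(n_cols):
        # .to_numpy() drops the original index so the values land in rows 0..window-1.
        # Inserting columns one at a time is slow; pandas may also emit a
        # PerformanceWarning saying the frame is highly fragmented.
        out[f"col{i}"] = df[col].iloc[i:i + window].to_numpy()
    return out


df = pd.DataFrame({"col0": range(1, 1001)})
out = rolling_frame_loop(df)
print(out.shape)  # (5, 996)
```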

Unsure what "roll a column into a dataframe" means. Can you provide an input along with the desired output and code? – sud, Commented Feb 17, 2024 at 13:29
Well, what about something like df[f'col{i}'] = df[f'col0'][i:i+5] in a loop? Alternatively, you can create a dict and then convert it to a dataframe. Note that creating or manipulating dataframes with thousands of columns is generally not very efficient (Pandas is not made for fast manipulation of wide dataframes). A loop of 1000 iterations is relatively fast in Python. – Jérôme Richard, Commented Feb 17, 2024 at 14:13
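Jérôme Richard's dict suggestion could look roughly like the sketch below (an illustration, not code from the comment). It slices a plain NumPy array rather than the Series: assigning an index-labelled Series slice to a column makes pandas align on the index, so the values would not all end up in rows 0 to 4. Building every column first and constructing the frame once also avoids the repeated column inserts.

```python
import pandas as pd

window = 5
df = pd.DataFrame({"col0": range(1, 1001)})
values = df["col0"].to_numpy()

# Build all columns up front, then construct the DataFrame in a single call
data = {f"col{i}": values[i:i + window] for i in range(len(values) - window + 1)}
out = pd.DataFrame(data)
print(out.shape)  # (5, 996)
```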
mozway just posted the perfect answer below. My choice of words wasn't the best: I meant a 'sliding window', as in mozway's solution. – ntintel, Commented Feb 17, 2024 at 14:18

mozway · Accepted Answer · 2024-02-17 14:16:42Z

17

Use numpy.lib.stride_tricks.sliding_window_view and transpose (T):

import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view as swv

out = pd.DataFrame(swv(df['col0'], 5).T).add_prefix('col')

Output:

   col0  col1  col2  col3  col4  col5  col6  col7  col8  col9  ...  col986  \
0     1     2     3     4     5     6     7     8     9    10  ...     987   
1     2     3     4     5     6     7     8     9    10    11  ...     988   
2     3     4     5     6     7     8     9    10    11    12  ...     989   
3     4     5     6     7     8     9    10    11    12    13  ...     990   
4     5     6     7     8     9    10    11    12    13    14  ...     991   

   col987  col988  col989  col990  col991  col992  col993  col994  col995  
0     988     989     990     991     992     993     994     995     996  
1     989     990     991     992     993     994     995     996     997  
2     990     991     992     993     994     995     996     997     998  
3     991     992     993     994     995     996     997     998     999  
4     992     993     994     995     996     997     998     999    1000  

[5 rows x 996 columns]
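To see why the transpose is needed, here is a small worked case on a plain array. sliding_window_view returns one window per row, so an input of length N with window w yields shape (N - w + 1, w). For N = 1000 and w = 5 that gives the 996 columns above. Transposing turns each window into a column. Both the result and its transpose are read-only views into the input rather than copies.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(1, 8)            # [1 2 3 4 5 6 7], length 7
w = sliding_window_view(a, 3)  # one row per window: shape (7 - 3 + 1, 3) = (5, 3)
print(w)
# [[1 2 3]
#  [2 3 4]
#  [3 4 5]
#  [4 5 6]
#  [5 6 7]]

# Transposing makes each window a column, the layout asked for in the question
print(w.T.shape)  # (3, 5)

# No data is copied: the windows share memory with `a` and are read-only
print(np.shares_memory(a, w.T))  # True
```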

Reproducible input:

N = 1000
df = pd.DataFrame({'col0': range(1, N+1)})

Timing for 100k rows:

25 ms ± 402 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
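A rough way to reproduce a timing like this with the standard timeit module is sketched below. This is an assumed setup, not mozway's exact benchmark: it calls .to_numpy() on the column instead of passing the Series directly, and absolute numbers depend on the machine and library versions.

```python
import timeit

import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view as swv

N = 100_000
df = pd.DataFrame({"col0": range(1, N + 1)})


def build():
    # Same construction as the answer: windows as rows, transposed into columns
    return pd.DataFrame(swv(df["col0"].to_numpy(), 5).T).add_prefix("col")


runs = 10
seconds = timeit.timeit(build, number=runs) / runs  # mean time per call
print(f"{seconds * 1000:.1f} ms per call")

out = build()
print(out.shape)  # (5, 99996)
```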

edited Feb 17, 2024 at 14:16

answered Feb 17, 2024 at 14:10

mozway


2 Comments

mozway Over a year ago

@ntintel yes, I added a timing for 100k rows, about 25ms on my computer ;)

mozway Over a year ago

Interestingly, I just tested a direct way to produce the output (window = 5; pd.DataFrame(swv(df['col0'], len(df)-window+1)).add_prefix('col')), and it was not faster than using the transpose.
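The two formulations in the comment above produce identical frames, so choosing between them is only a matter of speed. In both, element [i, j] equals x[i + j]. The check below is an illustration, not mozway's test code.

```python
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view as swv

window = 5
df = pd.DataFrame({"col0": range(1, 1001)})
x = df["col0"].to_numpy()

# Transpose route: windows of length `window`, one per row, then transposed
a = pd.DataFrame(swv(x, window).T).add_prefix("col")
# Direct route: windows of length len(x) - window + 1, giving `window` rows directly
b = pd.DataFrame(swv(x, len(x) - window + 1)).add_prefix("col")

pd.testing.assert_frame_equal(a, b)  # raises AssertionError if they differ
print(a.shape)  # (5, 996)
```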

ouroboros1 · Accepted Answer · 2024-02-17 14:12:25Z

5

You can use sliding_window_view for this:

import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

df = pd.DataFrame({'Col0': range(1,1001)})

data = sliding_window_view(df['Col0'], 5).T

df_new = pd.DataFrame(data, 
                      columns=[f'Col{i}' for i in range(data.shape[1])])

print(df_new)

   Col0  Col1  Col2  Col3  Col4  ...  Col991  Col992  Col993  Col994  Col995
0     1     2     3     4     5  ...     992     993     994     995     996
1     2     3     4     5     6  ...     993     994     995     996     997
2     3     4     5     6     7  ...     994     995     996     997     998
3     4     5     6     7     8  ...     995     996     997     998     999
4     5     6     7     8     9  ...     996     997     998     999    1000

[5 rows x 996 columns]
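Both answers hard-code the window size and column prefix. A small reusable wrapper might look like this sketch, where the name rolling_columns and its copy parameter are hypothetical. sliding_window_view returns a read-only view, and whether pandas copies it when building the frame depends on the pandas version. Passing copy=True makes the result safely writable at the cost of extra memory.

```python
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def rolling_columns(s: pd.Series, window: int, prefix: str = "col",
                    copy: bool = False) -> pd.DataFrame:
    """Hypothetical helper: column j of the result holds s[j : j + window]."""
    values = s.to_numpy()
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(s)")
    # Shape (len(s) - window + 1, window), transposed so each window is a column
    windows = sliding_window_view(values, window).T
    if copy:
        # The view is read-only; copy it if you intend to modify the frame
        windows = windows.copy()
    return pd.DataFrame(windows).add_prefix(prefix)


df = pd.DataFrame({"Col0": range(1, 1001)})
df_new = rolling_columns(df["Col0"], 5, prefix="Col")
print(df_new.shape)  # (5, 996)
```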

answered Feb 17, 2024 at 14:12

ouroboros1


1 Comment

ntintel Over a year ago

Thank you! This solution is awesome. Same as mozway's above.
