12

I have the following data frame:

col0
1
2
3
4
5
...
1000

I'd like roll col0 into a data frame with a window size of 5, so the outcome would be like this:

col0 col1 col2 ... col995
1 2 3 ... 996
2 3 4 ... 997
3 4 5 ... 998
4 5 6 ... 999
5 6 7 ... 1000

I've tried using loops and "iloc" which would produce correct results, but as the original data frame gets much longer, it would take too long to finish. To complete 10,000, it'd take almost 2 minutes, 20,000 almost 10 minutes, and so on... Is there any way to do it faster, more efficiently in Python?

3
  • 1
    Unsure what "roll a column into a dataframe means". Can you provide an input along with the desired output and code? Commented Feb 17, 2024 at 13:29
  • 1
    well what about something like df[f'col{i}'] = df[f'col0'][i:i+5] in a loop? Alternatively, you can create a dict and then convert it to a dataframe. Note that creating/manipulating dataframes with about thousands of columns is generally not very efficient (Pandas is not made for fast manipulation of wide dataframes). A loop of 1000 iterations is relatively fast in Python. Commented Feb 17, 2024 at 14:13
  • mozway just had the perfect answer below. My choice of words wasn't the best. It was supposed to be 'sliding window' as per mozway's solution. Commented Feb 17, 2024 at 14:18

2 Answers 2

17

Use numpy.lib.stride_tricks.sliding_window_view and transpose (T):

from numpy.lib.stride_tricks import sliding_window_view as swv

out = pd.DataFrame(swv(df['col0'], 5).T).add_prefix('col')

Output:

   col0  col1  col2  col3  col4  col5  col6  col7  col8  col9  ...  col986  \
0     1     2     3     4     5     6     7     8     9    10  ...     987   
1     2     3     4     5     6     7     8     9    10    11  ...     988   
2     3     4     5     6     7     8     9    10    11    12  ...     989   
3     4     5     6     7     8     9    10    11    12    13  ...     990   
4     5     6     7     8     9    10    11    12    13    14  ...     991   

   col987  col988  col989  col990  col991  col992  col993  col994  col995  
0     988     989     990     991     992     993     994     995     996  
1     989     990     991     992     993     994     995     996     997  
2     990     991     992     993     994     995     996     997     998  
3     991     992     993     994     995     996     997     998     999  
4     992     993     994     995     996     997     998     999    1000  

[5 rows x 996 columns]

Reproducible input:

N = 1000
df = pd.DataFrame({'col0': range(1, N+1)})

Timing for 100k rows:

25 ms ± 402 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Sign up to request clarification or add additional context in comments.

2 Comments

@ntintel yes, I added a timing for 100k rows, about 25ms on my computer ;)
Interestingly, I just tested a direct way to produce the output (window=5 ; pd.DataFrame(swv(df['col0'], len(df)-window+1)).add_prefix('col')) and this was not faster than using the transpose.
5

You can use sliding_window_view for this:

import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

df = pd.DataFrame({'Col0': range(1,1001)})

data = sliding_window_view(df['Col0'], 5).T

df_new = pd.DataFrame(data, 
                      columns=[f'Col{i}' for i in range(data.shape[1])])

print(df_new)

   Col0  Col1  Col2  Col3  Col4  ...  Col991  Col992  Col993  Col994  Col995
0     1     2     3     4     5  ...     992     993     994     995     996
1     2     3     4     5     6  ...     993     994     995     996     997
2     3     4     5     6     7  ...     994     995     996     997     998
3     4     5     6     7     8  ...     995     996     997     998     999
4     5     6     7     8     9  ...     996     997     998     999    1000

[5 rows x 996 columns]

1 Comment

Thank you! This solution is awesome. Same as mozway's above.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.