
I have a dataframe like this.

A,B
1,2
3,4
5,6
7,8
9,10
11,12
13,14

I would like to split the dataframe above into overlapping pieces of three rows each. The first piece should contain index 0 to index 2, the second index 1 to index 3, and so on.

A,B
1,2
3,4
5,6

A,B
3,4
5,6
7,8

A,B
5,6
7,8
9,10

and so on.

I have been using a for loop with iloc and appending each resulting dataframe to a list.
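For reference, a loop of that kind might look like the sketch below. The function name `split_with_loop` and the `window` parameter are illustrative assumptions, not the original code.

```python
import pandas as pd


def split_with_loop(df, window=3):
    """Collect every run of `window` consecutive rows as its own DataFrame."""
    pieces = []
    # One iteration per window start position; this is the slow part on large frames.
    for start in range(len(df) - window + 1):
        pieces.append(df.iloc[start:start + window])
    return pieces


df = pd.DataFrame({"A": [1, 3, 5, 7, 9, 11, 13], "B": [2, 4, 6, 8, 10, 12, 14]})
pieces = split_with_loop(df)
```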

Is there a vectorized way to split the dataframe like this in pandas? The dataframe is huge, and looping over the rows is quite slow.

1 Answer

Assuming the dataframe has a standard RangeIndex, we can borrow a vectorized rolling-window approach from here, drop down to the NumPy level, and use the windowed index to select rows from the underlying array:

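import numpy as np

def rolling_window(a, window):
    # Strided view of `a` with one row per window position; no data is copied.
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

# Each row of rolling_window(...) holds three consecutive row positions;
# indexing the array with it gathers the matching rows for every window.
df.to_numpy()[rolling_window(df.index.values, 3)]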

which yields

array([[[ 1,  2],
        [ 3,  4],
        [ 5,  6]],

       [[ 3,  4],
        [ 5,  6],
        [ 7,  8]],

       [[ 5,  6],
        [ 7,  8],
        [ 9, 10]],

       [[ 7,  8],
        [ 9, 10],
        [11, 12]],

       [[ 9, 10],
        [11, 12],
        [13, 14]]])

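If you need these back as dataframes, pass each window to the constructor with map. Note that map is lazy in Python 3, so wrap it in list to materialize the results, and that the column names are not carried over unless you pass them explicitly.

list(map(pd.DataFrame, df.to_numpy()[rolling_window(df.index.values, 3)]))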
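As a possible alternative to the hand-written strides, NumPy 1.20 and later ship `sliding_window_view`, which builds the same kind of windowed view. The sketch below is a swapped-in variant, not the method from the linked approach: it assumes NumPy >= 1.20 and uses positional row numbers from `np.arange`, so it should not depend on the index being a RangeIndex. The variable names are illustrative.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

df = pd.DataFrame({"A": [1, 3, 5, 7, 9, 11, 13], "B": [2, 4, 6, 8, 10, 12, 14]})
window = 3

# Positional row numbers 0..len(df)-1, so the index labels do not matter.
positions = sliding_window_view(np.arange(len(df)), window)

# Integer-array indexing gathers the rows of every window in one step.
windows = df.to_numpy()[positions]

# Rebuild dataframes, passing the original column names explicitly.
frames = [pd.DataFrame(w, columns=df.columns) for w in windows]
```

Here `windows` is a 3-D array with one window per entry along the first axis. Building `frames` still loops in Python, so if the array form is enough for the later processing, it is likely cheaper to stay with `windows`.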