16

In pandas, how do I split a Series or DataFrame into two, with the odd rows in one and the even rows in the other? Right now I am using:

rng = range(0, n, 2)
odd_rows = df.iloc[rng]

This is pretty slow.

2 Answers

36

Use slice:

In [11]: s = pd.Series([1,2,3,4])

In [12]: s.iloc[::2]  # even
Out[12]:
0    1
2    3
dtype: int64

In [13]: s.iloc[1::2]  # odd
Out[13]:
1    2
3    4
dtype: int64
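
The same slices work on a DataFrame, because iloc selects by position along the rows. A minimal sketch, assuming a small made-up frame (the column names "a" and "b" are illustrative only, not from the question):

```python
import pandas as pd

# Hypothetical example frame; the data and column names are assumptions.
df = pd.DataFrame({"a": [10, 20, 30, 40, 50], "b": list("vwxyz")})

even_rows = df.iloc[::2]   # positions 0, 2, 4
odd_rows = df.iloc[1::2]   # positions 1, 3

print(even_rows)
print(odd_rows)
```

Since the slice is positional, this should behave the same way for any index, numeric or not.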

1 Comment

Late, but here is the explanation: slice syntax is start:stop:step. Leaving start and stop empty uses the defaults (the beginning and the end), and the step is 2. So for evens you start at 0, go to the end, and increment by 2. For odds you start at 1, go to the end, and increment by 2.

6

Here are some comparisons:

In [100]: df = DataFrame(randn(100000,10))

The simple method works regardless of the index (it does not have to be numeric), but I think building the range makes it slow:

In [96]: %timeit df.iloc[range(0,len(df),2)]
10 loops, best of 3: 21.2 ms per loop

The following require a range-based Int64Index (which is easy to get: just call reset_index()):

In [107]: %timeit df.iloc[(df.index % 2).astype(bool)]
100 loops, best of 3: 5.67 ms per loop

In [108]: %timeit df.loc[(df.index % 2).astype(bool)]
100 loops, best of 3: 5.48 ms per loop
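
The mask idea can also be made independent of the index by computing the parity from row positions instead of labels. A rough sketch, assuming a hypothetical frame with a string index; np.arange(len(df)) stands in for the range-based index here, which is a substitution and not the code that was timed above:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a non-numeric index.
df = pd.DataFrame({"x": [1, 2, 3, 4, 5]}, index=list("abcde"))

# Parity of each row position; True marks positions 1, 3, ...
is_odd = (np.arange(len(df)) % 2).astype(bool)

odd_rows = df.iloc[is_odd]
even_rows = df.iloc[~is_odd]

# Label-based variant: reset the index first so labels equal positions.
flat = df.reset_index(drop=True)
odd_by_label = flat.loc[flat.index % 2 == 1]
```

Note that reset_index(drop=True) discards the original labels; omit drop=True to keep them as a column.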

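take expects integer positions, not labels or a boolean mask. Note that df.index % 2 contains only 0 and 1, so the call below repeatedly takes the first two rows; to select every other row, pass positions such as np.arange(0, len(df), 2):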

In [98]: %timeit df.take(df.index % 2)
100 loops, best of 3: 3.06 ms per loop

Same as above, but without the conversion of negative indices (the convert keyword is only available in older pandas versions):

In [99]: %timeit df.take(df.index % 2,convert=False)
100 loops, best of 3: 2.44 ms per loop
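
Because take works on integer positions, one way to pick alternate rows with it is to pass the positions explicitly. A minimal sketch under that assumption, using a hypothetical small frame; it is not the expression that was timed above:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a non-numeric index.
df = pd.DataFrame({"x": [10, 20, 30, 40, 50]}, index=list("abcde"))

even_pos = np.arange(0, len(df), 2)  # positions 0, 2, 4
odd_pos = np.arange(1, len(df), 2)   # positions 1, 3

even_rows = df.take(even_pos)
odd_rows = df.take(odd_pos)
```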

The winner is @AndyHayden's solution; it only works when all columns share a single dtype:

In [118]: %timeit DataFrame(df.values[::2],index=df.index[::2])
10000 loops, best of 3: 63.5 us per loop
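
To illustrate the single-dtype caveat, here is a small sketch with made-up data. Passing columns=df.columns is an addition here to preserve the column labels; the timed expression above omits it. With mixed dtypes, .values falls back to an object array, so the rebuilt frame loses the original column dtypes:

```python
import numpy as np
import pandas as pd

# Single dtype: every column is float64, so .values is a float64 array.
df = pd.DataFrame(np.arange(12.0).reshape(6, 2), columns=["a", "b"])
even = pd.DataFrame(df.values[::2], index=df.index[::2], columns=df.columns)

# Mixed dtypes (hypothetical data): .values becomes an object array,
# so column "n" no longer has an integer dtype after rebuilding.
mixed = pd.DataFrame({"n": [1, 2, 3, 4], "s": list("abcd")})
rebuilt = pd.DataFrame(
    mixed.values[::2], index=mixed.index[::2], columns=mixed.columns
)

print(even.dtypes)
print(rebuilt.dtypes)
```

For mixed-dtype frames, the positional slice df.iloc[::2] keeps each column's dtype.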
