I would like to split a dataframe into 4 dataframes named q1, q2, q3 and q4 where q1 should contain all rows where a specific column (e.g. age) is among the lowest 25% of the (age) distribution, q2 from 25% to 50%, q3 from 50% to 75% and q4 from 75% to 100%. Or in other words: I would like to create 4 equally sized groups of persons based on their age.

How can I do this in a pythonic way (currently I am using loops but that's possibly not a great solution)?

1 Answer

Not very pretty but does the trick (if anybody is interested):

import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 100], [2, 10], [3, 1], [4, 50], [5, 43], [6, 61], [7, 99], [7, 11]]), columns=['idx', 'age'])

print(df)

# Quantile boundaries (0%, 25%, 50%, 75%, 100%) for every column
q = df.quantile([0.00, 0.25, 0.50, 0.75, 1.00])

col = 'age'

# Half-open intervals [lower, upper); only the last one includes its upper bound (the maximum)
q1 = df[(df[col] >= q[col][0.00]) & (df[col] < q[col][0.25])]
q2 = df[(df[col] >= q[col][0.25]) & (df[col] < q[col][0.50])]
q3 = df[(df[col] >= q[col][0.50]) & (df[col] < q[col][0.75])]
q4 = df[(df[col] >= q[col][0.75]) & (df[col] <= q[col][1.00])]
print('----')
print(q1)
print('----')
print(q2)
print('----')
print(q3)
print('----')
print(q4)

yields:

   idx  age
0    1  100
1    2   10
2    3    1
3    4   50
4    5   43
5    6   61
6    7   99
7    7   11
----
   idx  age
1    2   10
2    3    1
----
   idx  age
4    5   43
7    7   11
----
   idx  age
3    4   50
5    6   61
----
   idx  age
0    1  100
6    7   99

1 Comment

is there something more pretty/efficient?
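
One possibly shorter route is `pd.qcut`, which assigns each row to a quantile bin in a single call. The following is only a sketch that reuses the sample data from the answer; the variable name `quartile` is my own. Note that `pd.qcut` uses right-closed bins, so a value sitting exactly on a boundary can land in a different group than with the masks above.

```python
import pandas as pd

# Same sample data as in the answer above
df = pd.DataFrame(
    {"idx": [1, 2, 3, 4, 5, 6, 7, 7],
     "age": [100, 10, 1, 50, 43, 61, 99, 11]}
)

# labels=False returns the integer bin code (0..3) for every row
quartile = pd.qcut(df["age"], q=4, labels=False)

# One dataframe per quartile, lowest ages first
q1, q2, q3, q4 = (df[quartile == i] for i in range(4))

for part in (q1, q2, q3, q4):
    print("----")
    print(part)
```

If `age` contains many repeated values, the bin edges may not be unique and `pd.qcut` raises an error. In that case, binning the ranks instead, e.g. `pd.qcut(df["age"].rank(method="first"), 4, labels=False)`, should still give four equally sized groups, at the cost of splitting ties arbitrarily by row order.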
