How to select rows in a pandas DataFrame using between() on another column

Question

I have a dataframe organised like so:

    x   y   e
A 0 0.0 1.0 0.01
  1 0.1 0.9 0.03
  2 0.2 1.3 0.02
...
B 0 0.0 0.5 0.02
  1 0.1 0.6 0.02
  2 0.2 0.9 0.04
...

etc.

I would like to select the rows of A, B, etc. whose x values fall between certain bounds.

This, for example, works:

p,q=0,1
indices=df.loc[("A"),"x"].between(p,q)
df.loc[("A"),"y"][indices]

Out:
[1.0,0.9]

However, this takes two lines of code and uses chained indexing. What seems to me the obvious way to do it in one line doesn't work:

p,q=0,1
df.loc[("A",df[("A"),"x"].between(p,q)),"y"]

Out:
[1.0,0.9]

How can I avoid chain indexing here?

(Also, if anyone wants to explain how to make the "x" column into the indices and thereby avoid the '0,1,2' indices, feel free!)

[Edited to clarify desired output]

For the second question: df = df.reset_index(level=1, drop=True).set_index('x', append=True)
– Quang Hoang, Commented Jun 30, 2022 at 13:25
df.query('@p <= x and x <= @q').loc['A','y'] would work for the first one.
– Quang Hoang, Commented Jun 30, 2022 at 13:26
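
The two comments above can be tried out with a small sketch like the following. The frame is a hypothetical reconstruction of the layout shown in the question, and the bounds p and q are assumed values picked only for illustration.

```python
import pandas as pd

# Hypothetical frame mirroring the question's layout (values copied from the post).
index = pd.MultiIndex.from_product([["A", "B"], range(3)])
df = pd.DataFrame(
    {
        "x": [0.0, 0.1, 0.2, 0.0, 0.1, 0.2],
        "y": [1.0, 0.9, 1.3, 0.5, 0.6, 0.9],
        "e": [0.01, 0.03, 0.02, 0.02, 0.02, 0.04],
    },
    index=index,
)

p, q = 0.05, 0.25  # assumed bounds, chosen only for illustration

# First question: filter on x with query(), then pick group "A" and column "y".
in_range = df.query("@p <= x and x <= @q").loc["A", "y"]

# Second question: drop the 0, 1, 2 level and use x as the inner index level.
by_x = df.reset_index(level=1, drop=True).set_index("x", append=True)
y_for_a = by_x.loc["A", "y"]  # Series indexed by x instead of 0, 1, 2
```

Note that set_index('x', append=True) moves x out of the columns, so later filters on x would have to go through the index rather than df['x'].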

Corralien · Accepted Answer · 2022-06-30 13:48:50Z

2

You can merge your 2 lines of code by using a lambda function.

>>> df.loc['A'].loc[lambda A: A['x'].between(p, q), 'y']

1    0.9
2    1.3
Name: y, dtype: float64

The output of your code:

indices=df.loc[("A"),"x"].between(p,q)
output=df.loc[("A"),"y"][indices]
print(output)

# Output
1    0.9
2    1.3
Name: y, dtype: float64

2 Comments

ARGratrex Over a year ago

Thank you, that's perfect! The lambda function pops up in so many answers, I really must learn it... Thanks again.

Corralien Over a year ago

Using a lambda function (or a named function) has one main advantage: the callable is evaluated against the object that .loc is called on, after any earlier selection has been applied. So A contains df.loc['A'] and not df.
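
As a rough illustration of that comment, the sketch below passes a named function to .loc and compares it with the lambda version. The frame is a hypothetical reconstruction of the question's data, and x_in_range, p and q are names and values assumed for the example.

```python
import pandas as pd

# Hypothetical frame mirroring the question's layout (values copied from the post).
index = pd.MultiIndex.from_product([["A", "B"], range(3)])
df = pd.DataFrame(
    {
        "x": [0.0, 0.1, 0.2, 0.0, 0.1, 0.2],
        "y": [1.0, 0.9, 1.3, 0.5, 0.6, 0.9],
        "e": [0.01, 0.03, 0.02, 0.02, 0.02, 0.04],
    },
    index=index,
)

p, q = 0.05, 0.25  # assumed bounds, chosen only for illustration

received_shapes = []

def x_in_range(frame):
    # `frame` is the object .loc was called on: here df.loc["A"], not df.
    received_shapes.append(frame.shape)
    return frame["x"].between(p, q)

# Named function and lambda are interchangeable as the row indexer.
via_named = df.loc["A"].loc[x_in_range, "y"]
via_lambda = df.loc["A"].loc[lambda A: A["x"].between(p, q), "y"]
```

The recorded shape is that of the three-row slice for group A, which is why the boolean mask lines up with the rows being selected.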

BENY · Answer · 2022-06-30 13:45:21Z

1

You can combine both conditions into one boolean mask:

cond = df['x'].between(0.05,0.15) & (df.index.get_level_values(level=0)=='A')
df[cond]
Out[284]: 
       x    y     e
A B                
A 1  0.1  0.9  0.03

Cameron Riddell · Answer · 2022-06-30 13:46:01Z

The trick here is to combine everything into a single boolean indexer. To replace .loc['A', …], you can use df.index.get_level_values(0) == 'A' and then combine it with your other condition via &.

import numpy as np
import pandas as pd
df = pd.DataFrame(
    data=np.linspace([0]*3, [1]*3 , 10, axis=0),
    index=pd.MultiIndex.from_product([['A', 'B'], range(5)]),
    columns=[*'xye'], 
)

out = df.loc[(df.index.get_level_values(0) == 'A') & (df['x'].between(.1, .4))]
print(out)
            x         y         e
A 1  0.111111  0.111111  0.111111
  2  0.222222  0.222222  0.222222
  3  0.333333  0.333333  0.333333

This is what my input data looked like:

print(df)
            x         y         e
A 0  0.000000  0.000000  0.000000
  1  0.111111  0.111111  0.111111
  2  0.222222  0.222222  0.222222
  3  0.333333  0.333333  0.333333
  4  0.444444  0.444444  0.444444
B 0  0.555556  0.555556  0.555556
  1  0.666667  0.666667  0.666667
  2  0.777778  0.777778  0.777778
  3  0.888889  0.888889  0.888889
  4  1.000000  1.000000  1.000000
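
To get back only the y values, as the question asks, the same mask can be passed to .loc together with a column label. The sketch below wraps this in select_between, a hypothetical helper whose name and parameters are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Same construction as the example frame in this answer.
df = pd.DataFrame(
    data=np.linspace([0] * 3, [1] * 3, 10, axis=0),
    index=pd.MultiIndex.from_product([["A", "B"], range(5)]),
    columns=[*"xye"],
)

def select_between(frame, group, low, high, value_col="y"):
    """Hypothetical helper: value_col for one group where low <= x <= high."""
    in_group = frame.index.get_level_values(0) == group
    mask = in_group & frame["x"].between(low, high)
    # One .loc call with a row mask and a column label, so no chained indexing.
    # droplevel(0) removes the now-constant group level from the result.
    return frame.loc[mask, value_col].droplevel(0)

y_in_range = select_between(df, "A", 0.1, 0.4)
```

Because the row mask and the column label go into one .loc call, this avoids the chained indexing from the question while keeping the filter readable.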
