
I have a dataframe organised like so:

    x   y   e
A 0 0.0 1.0 0.01
  1 0.1 0.9 0.03
  2 0.2 1.3 0.02
...
B 0 0.0 0.5 0.02
  1 0.1 0.6 0.02
  2 0.2 0.9 0.04
...

etc.

I would like to select the rows of A/B/etc. whose x value falls between certain bounds.

This, for example, works:

p,q=0,1
indices=df.loc[("A"),"x"].between(p,q)
df.loc[("A"),"y"][indices]

Out:
[1.0,0.9]

However, this takes two lines of code and uses chained indexing. What seems to me the obvious way of doing it in one line doesn't work:

p,q=0,1
df.loc[("A",df[("A"),"x"].between(p,q)),"y"]

Out:
[1.0,0.9]

How can I avoid chained indexing here?

(Also, if anyone wants to explain how to make the "x" column into the indices and thereby avoid the '0,1,2' indices, feel free!)

[Edited to clarify desired output]

3 Comments

  • For the second question: df = df.reset_index(level=1, drop=True).set_index('x', append=True) Commented Jun 30, 2022 at 13:25
  • df.query('@p <= x and x <= @q').loc['A', 'y'] would work for the first one. Commented Jun 30, 2022 at 13:26
  • What is your expected output for your sample? Commented Jun 30, 2022 at 13:47
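As a rough sketch of the reset_index/set_index suggestion above: the frame below is a hypothetical reconstruction of the question's sample (the rows hidden behind "..." are not filled in), and the bounds p and q are picked purely for illustration. Once "x" is the inner index level, a label slice on that level selects the range in one .loc call.

```python
import pandas as pd

# Hypothetical reconstruction of the question's sample frame (assumption).
df = pd.DataFrame(
    {
        "x": [0.0, 0.1, 0.2, 0.0, 0.1, 0.2],
        "y": [1.0, 0.9, 1.3, 0.5, 0.6, 0.9],
        "e": [0.01, 0.03, 0.02, 0.02, 0.02, 0.04],
    },
    index=pd.MultiIndex.from_product([["A", "B"], range(3)]),
)

# Drop the inner 0, 1, 2 level and use the values of "x" in its place.
by_x = df.reset_index(level=1, drop=True).set_index("x", append=True)

# Illustrative bounds (assumption); both are labels present in the index.
p, q = 0.1, 0.2

# Label slices are inclusive at both ends; the index must be sorted.
selected = by_x.loc[("A", slice(p, q)), "y"]
print(selected)
```

Note that slicing a MultiIndex this way relies on the index being sorted, which holds for this sample but may need a sort_index() call on real data.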

3 Answers

You can merge your two lines of code by passing a lambda function to .loc:

>>> df.loc['A'].loc[lambda A: A['x'].between(p, q), 'y']

1    0.9
2    1.3
Name: y, dtype: float64

For comparison, the output of your original code:

indices=df.loc[("A"),"x"].between(p,q)
output=df.loc[("A"),"y"][indices]
print(output)

# Output
1    0.9
2    1.3
Name: y, dtype: float64

2 Comments

Thank you, that's perfect! The lambda function pops up in so many answers, I really must learn it... Thanks again.
Using a lambda function (or a named function) has one main advantage: the callable receives the object that .loc is being called on, evaluated at that point in the chain. So A is df.loc['A'], not df.
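To make the point about callables concrete, here is a minimal sketch. The frame is a hypothetical reconstruction of the question's sample, the bounds are chosen for illustration, and x_in_range is an illustrative name, not anything from pandas.

```python
import pandas as pd

# Hypothetical reconstruction of the question's sample frame (assumption).
df = pd.DataFrame(
    {
        "x": [0.0, 0.1, 0.2, 0.0, 0.1, 0.2],
        "y": [1.0, 0.9, 1.3, 0.5, 0.6, 0.9],
        "e": [0.01, 0.03, 0.02, 0.02, 0.02, 0.04],
    },
    index=pd.MultiIndex.from_product([["A", "B"], range(3)]),
)

# Illustrative bounds (assumption).
p, q = 0.05, 0.2


def x_in_range(frame):
    # `frame` is the object .loc is called on: here df.loc["A"], not df.
    return frame["x"].between(p, q)


# A named function and a lambda behave the same way inside .loc.
via_named = df.loc["A"].loc[x_in_range, "y"]
via_lambda = df.loc["A"].loc[lambda A: A["x"].between(p, q), "y"]
print(via_named)
```

Because the callable is handed the already-filtered sub-frame, the mask it returns is automatically aligned with the rows being indexed.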

You can build a single boolean mask and index with it:

cond = df['x'].between(0.05,0.15) & (df.index.get_level_values(level=0)=='A')
df[cond]
Out[284]: 
       x    y     e
A B                
A 1  0.1  0.9  0.03
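If only the "y" column is wanted, as in the question, the same kind of mask can go into a single .loc call together with the column label. This is a sketch on a hypothetical reconstruction of the question's frame, with bounds chosen for illustration.

```python
import pandas as pd

# Hypothetical reconstruction of the question's sample frame (assumption).
df = pd.DataFrame(
    {
        "x": [0.0, 0.1, 0.2, 0.0, 0.1, 0.2],
        "y": [1.0, 0.9, 1.3, 0.5, 0.6, 0.9],
        "e": [0.01, 0.03, 0.02, 0.02, 0.02, 0.04],
    },
    index=pd.MultiIndex.from_product([["A", "B"], range(3)]),
)

# Illustrative bounds (assumption).
p, q = 0.05, 0.2

# Row condition: x within [p, q] AND outer index level equal to "A".
cond = df["x"].between(p, q) & (df.index.get_level_values(0) == "A")

# Rows and column are chosen in one .loc call, so nothing is chained.
y_values = df.loc[cond, "y"]
print(y_values)
```

The result keeps the full MultiIndex, so the rows remain labelled ("A", 1) and ("A", 2) rather than just 1 and 2.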


The trick here is to combine everything into a single boolean indexer. To replace .loc['A', …], use df.index.get_level_values(0) == 'A' and combine it with your other condition via &:

import numpy as np
import pandas as pd
df = pd.DataFrame(
    data=np.linspace([0]*3, [1]*3 , 10, axis=0),
    index=pd.MultiIndex.from_product([['A', 'B'], range(5)]),
    columns=[*'xye'], 
)

out = df.loc[(df.index.get_level_values(0) == 'A') & (df['x'].between(.1, .4))]
print(out)
            x         y         e
A 1  0.111111  0.111111  0.111111
  2  0.222222  0.222222  0.222222
  3  0.333333  0.333333  0.333333

This is what my input data looked like:

print(df)
            x         y         e
A 0  0.000000  0.000000  0.000000
  1  0.111111  0.111111  0.111111
  2  0.222222  0.222222  0.222222
  3  0.333333  0.333333  0.333333
  4  0.444444  0.444444  0.444444
B 0  0.555556  0.555556  0.555556
  1  0.666667  0.666667  0.666667
  2  0.777778  0.777778  0.777778
  3  0.888889  0.888889  0.888889
  4  1.000000  1.000000  1.000000
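One practical reason to prefer a single indexer, sketched here on the same synthetic frame: the mask can also sit on the left-hand side of an assignment, which is where chained indexing tends to cause trouble. The choice of overwriting column "e" with 0.0 is purely illustrative.

```python
import numpy as np
import pandas as pd

# Same synthetic frame as above: 10 evenly spaced rows from 0 to 1.
df = pd.DataFrame(
    data=np.linspace([0] * 3, [1] * 3, 10, axis=0),
    index=pd.MultiIndex.from_product([["A", "B"], range(5)]),
    columns=[*"xye"],
)

mask = (df.index.get_level_values(0) == "A") & df["x"].between(0.1, 0.4)

# Reading: rows and column are selected in one .loc call.
y_selected = df.loc[mask, "y"]

# Writing: the same indexer assigns directly into df (illustrative value).
df.loc[mask, "e"] = 0.0
print(df.loc["A"])
```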
