I'm trying to use query on a DataFrame whose columns are a MultiIndex with multiple levels.

!pip install pandas-datareader --quiet

Next ...

from datetime import datetime

from pandas_datareader import DataReader

df = DataReader(["SPY", "XOM"], "yahoo", datetime(2012, 7, 1), datetime(2018, 7, 21))
df.keys()

Returns ...

MultiIndex(levels=[['High', 'Low', 'Open', 'Close', 'Volume', 'Adj Close'], ['SPY', 'XOM']],
           labels=[[0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5], [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]],
           names=['Attributes', 'Symbols'])

And

df['High']['SPY'].head()

Returns ...

Date
2012-07-02    136.649994
2012-07-03    137.509995
2012-07-05    137.800003
2012-07-06    135.770004
2012-07-09    135.570007
Name: SPY, dtype: float64

How can I use query with multiple column levels? I was thinking of something like this:

df.query('High.SPY > 137')

1 Answer

As far as I understand, this is only partially supported; see this issue on GitHub.

That post suggests using this syntax:

df.query('@df.High.SPY > 137')

If you don't have a specific need to use query, it's doable with loc:

df.loc[:, ('High', 'SPY')][df.loc[:, ('High', 'SPY')] > 137]

Or alternatively:

df[df.loc[:, ('High', 'SPY')] > 137].loc[:, ('High', 'SPY')]

Both return ...

Date
2012-07-03    137.509995
2012-07-05    137.800003
2012-07-18    137.639999
2012-07-19    138.179993
2012-07-20    137.160004
2012-07-27    139.070007
2012-07-30    139.339996
2012-07-31    138.869995
...
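
For anyone who wants to try the boolean-mask approach without downloading data, here is a minimal sketch on a small hand-made frame. The prices and the names mask, high_spy and rows are made up for illustration; only the column layout (Attributes / Symbols) mirrors the question.

```python
import pandas as pd

# Hypothetical stand-in for the DataReader result: two column levels,
# named like the ones in the question. The values are invented.
columns = pd.MultiIndex.from_product(
    [["High", "Low"], ["SPY", "XOM"]], names=["Attributes", "Symbols"]
)
dates = pd.date_range("2012-07-02", periods=4, freq="D", name="Date")
df = pd.DataFrame(
    [
        [136.65, 85.0, 135.0, 84.0],
        [137.51, 86.0, 136.0, 85.0],
        [137.80, 87.0, 136.5, 86.0],
        [135.77, 84.5, 134.0, 83.0],
    ],
    index=dates,
    columns=columns,
)

# A tuple key selects one column across both levels.
mask = df[("High", "SPY")] > 137

# Only the High/SPY values that pass the filter (a Series).
high_spy = df.loc[mask, ("High", "SPY")]

# All columns for the rows that pass the filter (a DataFrame).
rows = df[mask]

print(high_spy)
```

Building the mask once and reusing it avoids repeating the column lookup, which is the main difference from the one-liners above.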
4 Comments

Edited - now I think it's OK
Also, df.loc[:, ('High', 'SPY')] can be simplified to df[('High', 'SPY')]
where is the > 137?
I meant simplifying df.loc[:, ('High', 'SPY')]; the > 137 is still necessary :)
