With a simple (single-level) column index one can access a column in a pandas DataFrame using .query() as follows:

import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.random.rand(10, 2), index=range(10), columns=['A', 'B'])
df1.query('A > 0.5')

I am struggling to achieve the equivalent in a DataFrame with a column MultiIndex:

df2 = pd.DataFrame(np.random.rand(10,2),index=range(10),columns=[['A','B'],['C','D']])
df2.query('(A,C) > 0.5') # fails
df2.query('"(A,C)" > 0.5') # fails
df2.query('("A","C") > 0.5') # fails

Is this doable? Thanks...

(As to the motivation: query() seems to allow very concise selection on a DataFrame with a row MultiIndex and a single-level column index, for example:

df3 = pd.DataFrame(np.random.rand(6,2),index=[[0]*3+[1]*3,range(2,8)],columns=['A','B'])
df3.index.names=['one','two']
df3.query('one==0 & two<4 & A>0.5')

I would like to do something similar with a DataFrame that has a MultiIndex on both axes...)

  • MultiIndexing can be more trouble than it's worth. It can be really convenient when you need it, but you don't usually need it. If you want to use querying, I'm inclined to suggest you restructure your DataFrame. Commented Oct 21, 2014 at 15:41
  • I imagine this is a commonly encountered issue; I'm surprised this question was not more discoverable. #backlog Commented Dec 21, 2020 at 7:24

1 Answer

There is an open issue on GitHub for this. In the meantime, one suggested workaround is to reference the column through the DataFrame variable itself, using the @ notation:

df2.query("@df2.A.C > 0.5")

This workaround is not perfect: it relies on attribute access, so header names/levels that contain spaces (or are otherwise not valid Python identifiers) must be renamed first.
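If attribute access is a problem, here are two alternatives, given as a minimal sketch rather than a definitive recipe. The first skips query() and uses plain boolean indexing with the full column tuple. The second flattens the column MultiIndex on a copy so that query() sees ordinary single-level names. The underscore separator and the variable names (flat, by_mask, by_query) are assumptions chosen for illustration, and the seeded generator is only there to make the example reproducible.

```python
import numpy as np
import pandas as pd

# Same shape as df2 in the question; a seeded generator keeps runs reproducible.
rng = np.random.default_rng(0)
df2 = pd.DataFrame(rng.random((10, 2)), index=range(10),
                   columns=[['A', 'B'], ['C', 'D']])

# Option 1: boolean indexing, addressing the column by its full tuple key.
by_mask = df2[df2[('A', 'C')] > 0.5]

# Option 2: flatten the column levels on a copy, then use query() as usual.
# The '_' separator is an arbitrary choice for this sketch.
flat = df2.copy()
flat.columns = ['_'.join(col) for col in flat.columns]
by_query = flat.query('A_C > 0.5')
```

Both options select the same rows. Option 1 leaves the column MultiIndex intact, while option 2 trades it for flat names that query() can parse directly.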
