I have two DataFrames:

df
Out[162]: 
          colA  colB
L0 L1 L2            
A1 B1 C1     1     2
      C2     3     4
   B2 C1     5     6
      C2     7     8
A2 B3 C1     9    10
      C2    11    12
   B4 C1    13    14
      C2    15    16

df1
Out[166]: 
               rate
from to            
CHF  CHF   1.000000
     MXN  19.673256
     ZAR   0.000000
     XAU   0.000775
     THB  32.961405

When I did

df.query('L0=="A1" & L2=="C1"')
Out[167]: 
          colA  colB
L0 L1 L2            
A1 B1 C1     1     2
   B2 C1     5     6

This gives me the expected output.

Then I wanted to apply the same kind of query to df1:

df1.query('ilevel_0=="CHF" & ilevel_1=="MXN"') 

and

df1.query('from=="CHF" & to=="MXN"') 

Both failed.

What is happening here?


Data input:

#df
{'colA': {('A1', 'B1', 'C1'): 1,
  ('A1', 'B1', 'C2'): 3,
  ('A1', 'B2', 'C1'): 5,
  ('A1', 'B2', 'C2'): 7,
  ('A2', 'B3', 'C1'): 9,
  ('A2', 'B3', 'C2'): 11,
  ('A2', 'B4', 'C1'): 13,
  ('A2', 'B4', 'C2'): 15},
 'colB': {('A1', 'B1', 'C1'): 2,
  ('A1', 'B1', 'C2'): 4,
  ('A1', 'B2', 'C1'): 6,
  ('A1', 'B2', 'C2'): 8,
  ('A2', 'B3', 'C1'): 10,
  ('A2', 'B3', 'C2'): 12,
  ('A2', 'B4', 'C1'): 14,
  ('A2', 'B4', 'C2'): 16}}


#df1
{'rate': {('CHF', 'CHF'): 1.0,
  ('CHF', 'MXN'): 19.673256,
  ('CHF', 'THB'): 32.961405,
  ('CHF', 'XAU'): 0.000775,
  ('CHF', 'ZAR'): 0.0}}
  • from is a reserved keyword in Python, so using it in query doesn't work. Commented Jan 8, 2018 at 22:38
  • df1.query('ilevel_0=="CHF" & ilevel_1=="MXN"') this works on my computer. Commented Jan 8, 2018 at 22:40
  • @cᴏʟᴅsᴘᴇᴇᴅ Yeah, I noticed that, but why is ilevel_0 still not working? Commented Jan 8, 2018 at 22:40
  • @Tai The error on my side is pandas.core.computation.ops.UndefinedVariableError: name 'ilevel_0' is not defined Commented Jan 8, 2018 at 22:40
  • ilevel_0 is the name given to the axis when it has no name. If the axis already has a name, then you need to use that name in query. On the other hand, if you do df1.rename_axis([None, None]) and then call query, you get your answer. Commented Jan 8, 2018 at 22:42

1 Answer

Consider -

df1

               rate
from to            
CHF  CHF   1.000000
     MXN  19.673256
     THB  32.961405
     XAU   0.000775
     ZAR   0.000000

First, df1.query('ilevel_0=="CHF" & ilevel_1=="MXN"') does not work because your index levels already have names. ilevel_* is the name pandas assigns only when an index level does not have a name of its own. Since your levels are named from and to, ilevel_0 is never defined, and the command raises an UndefinedVariableError.
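
The naming rule can be checked directly. This is a minimal sketch, assuming df1 is rebuilt from the data in the question; can_query is a hypothetical helper introduced only for this illustration, not part of pandas.

```python
import pandas as pd

# Rebuild df1 from the question's data, with the index levels named "from" and "to".
index = pd.MultiIndex.from_tuples(
    [("CHF", "CHF"), ("CHF", "MXN"), ("CHF", "THB"), ("CHF", "XAU"), ("CHF", "ZAR")],
    names=["from", "to"],
)
df1 = pd.DataFrame({"rate": [1.0, 19.673256, 32.961405, 0.000775, 0.0]}, index=index)


def can_query(frame, expr):
    """Hypothetical helper: True if frame.query(expr) runs, False on an undefined name."""
    try:
        frame.query(expr)
        return True
    except NameError:  # UndefinedVariableError is a subclass of NameError
        return False


expr = "ilevel_0 == 'CHF' and ilevel_1 == 'MXN'"

# Named levels: ilevel_0 / ilevel_1 are not defined.
named_ok = can_query(df1, expr)

# Unnamed levels: pandas falls back to the ilevel_* names.
unnamed_ok = can_query(df1.rename_axis([None, None]), expr)

print(named_ok, unnamed_ok)  # False True
```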

Next, df1.query('from=="CHF" & to=="MXN"') does not work because from is a keyword in Python, so when pandas evaluates the expression, from == ... is invalid syntax. One workaround would be to rename that level -

df1.rename_axis(['frm', 'to']).query("frm == 'CHF' and to == 'MXN'")


              rate
frm to            
CHF MXN  19.673256
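
If you do not want to hard-code the new name, the rename can be driven by Python's own keyword list. This is only a sketch: keyword_safe_names and the trailing-underscore convention are assumptions made for illustration, and the frame below uses a subset of the question's data.

```python
import keyword

import pandas as pd


def keyword_safe_names(names, suffix="_"):
    """Hypothetical helper: append a suffix to any level name that is a Python keyword."""
    return [
        name + suffix if isinstance(name, str) and keyword.iskeyword(name) else name
        for name in names
    ]


index = pd.MultiIndex.from_tuples(
    [("CHF", "CHF"), ("CHF", "MXN"), ("CHF", "THB")], names=["from", "to"]
)
df1 = pd.DataFrame({"rate": [1.0, 19.673256, 32.961405]}, index=index)

# "from" is a keyword and gets renamed; "to" is not and is left alone.
safe = keyword_safe_names(df1.index.names)

result = df1.rename_axis(safe).query("from_ == 'CHF' and to == 'MXN'")
print(result)
```

Only the level names change; the data and the index labels stay as they were.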

Another would be getting rid of the axis names -

df1.rename_axis([None, None]).query("ilevel_0 == 'CHF' and ilevel_1 == 'MXN'") 

              rate
CHF MXN  19.673256

Keep in mind that query suffers from a host of limitations, mostly revolving around restrictions with variable names.
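
If those limitations get in the way, the same row can be selected without query at all. The sketch below is an alternative technique rather than a fix to query itself, and again assumes df1 is rebuilt from the question's data. The level names are passed as ordinary strings, so from is never parsed as Python code.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("CHF", "CHF"), ("CHF", "MXN"), ("CHF", "THB"), ("CHF", "XAU"), ("CHF", "ZAR")],
    names=["from", "to"],
)
df1 = pd.DataFrame({"rate": [1.0, 19.673256, 32.961405, 0.000775, 0.0]}, index=index)

# Option 1: a boolean mask built from the index levels.
mask = (df1.index.get_level_values("from") == "CHF") & (
    df1.index.get_level_values("to") == "MXN"
)
by_mask = df1[mask]

# Option 2: label-based lookup; a list of tuples keeps the result a DataFrame.
by_loc = df1.loc[[("CHF", "MXN")]]

print(by_mask)
print(by_loc)
```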

4 Comments

The MultiIndex-related API contains too many tricks. I hope they can fix it in the future :-)
@Wen Yep, would hope that they could make the API a little cleaner, or get rid of it.
Personally, I would rather reset_index than work with a MultiIndex; it's a pain...
Brilliant answer!
