A pandas.DataFrame has its MultiIndex created as shown:

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three three two four three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

df = df.set_index(['A','B'])

This creates a `DataFrame`:

           C   D
A   B           
foo one    0   0
bar one    1   2
foo two    2   4
bar three  3   6
foo three  4   8
bar two    5  10
foo four   6  12
    three  7  14

Problem: Why does df['foo'] raise KeyError: 'foo'? The same happens with df['foo', 'one'] and df['foo']['one'].

Furthermore, the MultiIndex did not group all the foos together under A. Is it necessary to group them together, like this:

          A         B
one 1 -0.732470 -0.313871
    2 -0.031109 -2.068794
    3  1.520652  0.471764
two 1 -0.101713 -1.204458
    2  0.958008 -0.455419
    3 -0.191702 -0.915983

1 Answer

df['foo'] tries to select a column named foo and raises a KeyError because no such column exists; foo is a value in the index, not a column label. I guess you meant to do df.loc['foo'] and df.loc['foo', 'one'].
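A minimal sketch of the difference, rebuilding the frame from the question. The variable names `column_lookup_failed`, `foo_rows` and `foo_one` are only illustrative. Note that on this unsorted index the tuple lookup may emit the PerformanceWarning mentioned in the comment.

```python
import numpy as np
import pandas as pd

# Same frame as in the question, with A and B moved into a MultiIndex.
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three three two four three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})
df = df.set_index(['A', 'B'])

# df[...] looks up columns; the only columns left are 'C' and 'D'.
try:
    df['foo']
    column_lookup_failed = False
except KeyError:
    column_lookup_failed = True

# .loc looks up index labels (rows) instead.
foo_rows = df.loc['foo']          # every row whose level A is 'foo'
foo_one = df.loc[('foo', 'one')]  # level A is 'foo' and level B is 'one'
```

Writing df.loc['foo', 'one'] is equivalent to the tuple form here; the explicit tuple just makes it clearer that both labels refer to the row index.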

1 Comment

Thanks! df.loc['foo', 'one'] also emits a warning: PerformanceWarning: indexing past lexsort depth may impact performance. Should we be concerned?
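On the warning and the grouping question: as far as I know, pandas does not require the index to be sorted, but the warning indicates the lookup cannot use the faster sorted path. A rough sketch, assuming that sorting the index with sort_index() is acceptable for your data; it groups the foos together and should avoid the warning. The names `sorted_df` and `perf_warnings` are only illustrative.

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three three two four three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})
df = df.set_index(['A', 'B'])

# Sort lexicographically on both index levels; all 'bar' rows come first,
# then all 'foo' rows.
sorted_df = df.sort_index()

# Record any warnings raised by the lookup instead of printing them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    foo_one = sorted_df.loc[('foo', 'one')]

perf_warnings = [w for w in caught
                 if issubclass(w.category, pd.errors.PerformanceWarning)]
print(sorted_df)
print("PerformanceWarning count:", len(perf_warnings))
```

For a frame this small the warning is harmless either way; sorting mostly matters when the index is large and you do many lookups.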
