MultiIndex DataFrame - Getting only the possible values of a lower level index given an upper level index value

Question

When I slice into a MultiIndex DataFrame by a level 0 index value, I want to know the possible level 1+ index values that fall under that initial value. If my wording doesn't make sense, here's an example:

>>> import numpy as np
>>> import pandas as pd
>>> arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
... ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'],
... ['a','b','a','b','b','b','b','b']]
>>> tuples = list(zip(*arrays))
>>> index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second','third'])
>>> s = pd.Series(np.random.randn(8), index=index)
>>> s
first  second  third
bar    one     a       -0.598684
       two     b        0.351421
baz    one     a       -0.618285
       two     b       -1.175418
foo    one     b       -0.093806
       two     b        1.092197
qux    one     b       -1.515515
       two     b        0.741408
dtype: float64

s's index looks like:

>>> s.index
MultiIndex(levels=[[u'bar', u'baz', u'foo', u'qux'], [u'one', u'two'], [u'a', u'b']],
           labels=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1], [0, 1, 0, 1, 1, 1, 1, 1]],
           names=[u'first', u'second', u'third'])

When I take just the section of s whose first index value is foo and look up its index, I get:

>>> s_foo = s.loc['foo']
>>> s_foo
second  third
one     b       -0.093806
two     b        1.092197
dtype: float64

>>> s_foo.index
MultiIndex(levels=[[u'one', u'two'], [u'a', u'b']],
           labels=[[0, 1], [1, 1]],
           names=[u'second', u'third'])

I want the index of s_foo to act as if the higher level of s does not exist, yet we can see in s_foo.index's levels attribute that a is still considered a potential value of index third, despite the fact that s_foo only has b as a possible value.

Essentially, what I want to find are all the possible third values of s_foo, i.e. b and only b. Right now I do set(s_foo.reset_index()['third']), but I was hoping for a more elegant solution.

J.vdS · Accepted Answer · 2018-04-25 19:46:18Z

You can create s_foo and explicitly drop the unused levels:

s_foo = s.loc['foo']
s_foo.index = s_foo.index.remove_unused_levels()
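
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the Series from the question and checks what remove_unused_levels() leaves behind. The names pruned and third_values are just illustrative, and I keep the pruned index in its own variable rather than assigning it back, which is a small deviation from the two lines above. It also shows get_level_values(), which is an alternative I am adding, not part of the original answer: it reads the values actually present and never consults .levels.

```python
import numpy as np
import pandas as pd

# Rebuild the Series from the question.
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'],
          ['a', 'b', 'a', 'b', 'b', 'b', 'b', 'b']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second', 'third'])
s = pd.Series(np.random.randn(8), index=index)

s_foo = s.loc['foo']

# remove_unused_levels() returns a new MultiIndex whose levels contain
# only the labels that actually occur in the index.
pruned = s_foo.index.remove_unused_levels()
second_levels = list(pruned.levels[0])
third_levels = list(pruned.levels[1])

# Alternative (my addition): read the values that are present directly,
# without looking at .levels at all.
third_values = list(s_foo.index.get_level_values('third').unique())

print(second_levels, third_levels, third_values)
```

If you only need the set of values once, the get_level_values() route avoids building a new index; pruning is the better fit when other code inspects .levels later.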

Comment from Mike S:

This is exactly what I wanted. Thanks!

Darren Brien · Answer · 2018-04-25 19:34:17Z

Resetting the index seems like the right way to go, since it sounds like you don't want third to be an index level (the result you're getting is just how indexes work).

s.reset_index(level=2).groupby(level=[0])['third'].unique()

Or, if you want counts:

s.reset_index(level=2).groupby(level=[0])['third'].value_counts()
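
To make the shape of those two results concrete, here is a small runnable sketch on the Series from the question. It uses level names instead of positions, which should be equivalent here, and the Series name 'value' plus the variables per_first and counts are assumptions of mine for readability.

```python
import numpy as np
import pandas as pd

# Rebuild the Series from the question; the name 'value' is illustrative.
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'],
          ['a', 'b', 'a', 'b', 'b', 'b', 'b', 'b']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second', 'third'])
s = pd.Series(np.random.randn(8), index=index, name='value')

# Move 'third' out of the index into a regular column, then group by 'first'.
grouped = s.reset_index(level='third').groupby(level='first')['third']

# One array of distinct 'third' values per 'first' label.
per_first = grouped.unique()
foo_values = list(per_first['foo'])
bar_values = list(per_first['bar'])

# Occurrence counts, indexed by (first, third).
counts = grouped.value_counts()
foo_b_count = int(counts.loc[('foo', 'b')])

print(foo_values, bar_values, foo_b_count)
```

This gives the answer for every first value at once, so it is handy when you need the mapping for more than just foo.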
