Selecting by subset of multiindex level

Question

Toy example

Suppose I have the dataframe df shown below

           C
L0 L1 L2    
0  w  P   11
      Q    9
      R   21
      S    4
   x  P    3
      Q    0
      R   23
      S   20
   y  P   19
      Q    0
      R    7
      S   13
   z  P   17
      Q    0
      R    5
      S    1
1  w  P    8
      Q    2
      R   12
      S    0
   x  P   22
      Q   14
      R    2
      S   18
   y  P    6
      Q    0
      R   16
      S   15
   z  P   10
      Q    0
      R    8
      S    0

Note that the rows of df are indexed by 3-level multi-index.

I can find the minimum of the C column for each value of the L2 level, as follows:

In [58]: df.groupby(level='L2').min()
Out[58]: 
    C
L2   
P   3
Q   0
R   2
S   0

Similarly, the following expression shows the values of L2 for which this minimum is greater than 0:

In [59]: df.groupby(level='L2').min() > 0
Out[59]: 
        C
L2       
P    True
Q   False
R    True
S   False

Question: How can I select the rows of the original dataframe df corresponding to the values of L2 for which the minimum of C is greater than 0?

In this simple example the problem boils down to selecting the rows of df whose L2 value is either 'P' or 'R'. Therefore, it would not be difficult to brute-force the problem by concatenating the rows for L2='P' with those for L2='R'.

In the application I have in mind, however, such a solution becomes unwieldy, since here the L2 level has ~2000 values, and for about half of them, the minimum is greater than 0.

Therefore, I'm looking for a more scalable way to solve this selection problem.

DATA

L0  L1  L2  C
0   w   P   11
0   w   Q   9
0   w   R   21
0   w   S   4
0   x   P   3
0   x   Q   0
0   x   R   23
0   x   S   20
0   y   P   19
0   y   Q   0
0   y   R   7
0   y   S   13
0   z   P   17
0   z   Q   0
0   z   R   5
0   z   S   1
1   w   P   8
1   w   Q   2
1   w   R   12
1   w   S   0
1   x   P   22
1   x   Q   14
1   x   R   2
1   x   S   18
1   y   P   6
1   y   Q   0
1   y   R   16
1   y   S   15
1   z   P   10
1   z   Q   0
1   z   R   8
1   z   S   0

Zero · Accepted Answer · 2017-07-05 16:48:09Z

2

Here's one way

Store the L2 values needed

In [413]: m = df.groupby(level='L2').min() > 0

In [414]: m
Out[414]:
        C
L2
P    True
Q   False
R    True
S   False

Using, get_level_values(index_level_name).isin(to_be_filtered) filter only needed values.

In [415]: df[df.index.get_level_values('L2').isin(m.loc[m.C, :].index)]
Out[415]:
           C
L0 L1 L2
0  w  P   11
      R   21
   x  P    3
      R   23
   y  P   19
      R    7
   z  P   17
      R    5
1  w  P    8
      R   12
   x  P   22
      R    2
   y  P    6
      R   16
   z  P   10
      R    8

Here's another way. Using, transform and subset selection

In [430]: df[(df.groupby(level='L2').transform('min') > 0).C]
Out[430]:
           C
L0 L1 L2
0  w  P   11
      R   21
   x  P    3
      R   23
   y  P   19
      R    7
   z  P   17
      R    5
1  w  P    8
      R   12
   x  P   22
      R    2
   y  P    6
      R   16
   z  P   10
      R    8

Details

In [416]: m.loc[m.C, :].index
Out[416]: Index([u'P', u'R'], dtype='object', name=u'L2')

edited Jul 5, 2017 at 16:48

answered Jul 5, 2017 at 16:36

Zero

77.4k22 gold badges153 silver badges153 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

kjo Over a year ago

Thanks! BTW, m.loc[m.C, :] (instead of m[m]) obviates the dropna.

Zero Over a year ago

And, realized you may not need m using transform method.

Collectives™ on Stack Overflow

Selecting by subset of multiindex level

1 Answer 1

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related