I have a multiindex dataframe that looks like this:

df = {'C': {('S', 0): 'A',
  ('S', 2): 'A',
  ('T', 0): 'A',
  ('T', 1): 'A',
  ('T', 3): 'A',
  ('U', 1): 'A',
  ('U', 2): 'A',
  ('U', 0): 'A',
  ('V', 0): 'A',
  ('W', 2): 'A',
  ('W', 0): 'A',
  ('X', 0): 'A',
  ('Y', 3): 'A',
  ('Z', 0): 'A',
  ('Z', 1): 'A'},
 'D': {('S', 0): '15',
  ('S', 2): '22',
  ('T', 0): '20',
  ('T', 1): '20',
  ('T', 3): '20',
  ('U', 1): '18',
  ('U', 2): '14',
  ('U', 0): '14',
  ('V', 0): '14',
  ('W', 2): '22',
  ('W', 0): '25',
  ('X', 0): '15',
  ('Y', 3): '17',
  ('Z', 0): '04',
  ('Z', 1): '16'},
 'E': {('S', 0): 1.0,
  ('S', 2): 1.0,
  ('T', 0): 2.0,
  ('T', 1): 2.0,
  ('T', 0): 2.0,
  ('U', 1): 2.0,
  ('U', 2): 2.0,
  ('U', 0): 2.0,
  ('V', 0): 1.0,
  ('W', 2): 1.0,
  ('W', 0): 1.0,
  ('X', 0): 1.0,
  ('Y', 3): 2.0,
  ('Z', 0): 3.0,
  ('Z', 1): 3.0}}
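
Note that the literal above is a plain nested dict, not yet a DataFrame. A minimal sketch, assuming it is meant to be passed to `pd.DataFrame`: the tuple keys become a two-level MultiIndex, and a key missing from one column is filled with NaN. The 'E' mapping above lists ('T', 0) twice and has no ('T', 3) entry, which would explain a NaN in that row. The names `data` and `frame` and the three-row subset are illustrative only.

```python
import pandas as pd

# Illustrative three-row subset of the dict above (not the full data).
data = {
    "C": {("S", 0): "A", ("S", 2): "A", ("T", 3): "A"},
    "E": {("S", 0): 1.0, ("S", 2): 1.0},  # no ("T", 3) entry on purpose
}

# Tuple keys are turned into a two-level MultiIndex on the rows.
frame = pd.DataFrame(data)

print(type(frame.index).__name__)
print(frame)
```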

I want to keep all rows of a level 0 group if at least one of its level 1 values is >= 2.

The output should look like this:

outp = {'C': {('S', 0): 'A',
  ('S', 2): 'A',
  ('T', 0): 'A',
  ('T', 1): 'A',
  ('T', 3): 'A',
  ('U', 1): 'A',
  ('U', 2): 'A',
  ('U', 0): 'A',
  ('W', 2): 'A',
  ('W', 0): 'A',
  ('Y', 3): 'A'},
 'D': {('S', 0): '15',
  ('S', 2): '22',
  ('T', 0): '20',
  ('T', 1): '20',
  ('T', 3): '20',
  ('U', 1): '18',
  ('U', 2): '14',
  ('U', 0): '14',
  ('W', 2): '22',
  ('W', 0): '25',
  ('Y', 3): '17'},
 'E': {('S', 0): 1.0,
  ('S', 2): 1.0,
  ('T', 0): 2.0,
  ('T', 1): 2.0,
  ('T', 0): 2.0,
  ('U', 1): 2.0,
  ('U', 2): 2.0,
  ('U', 0): 2.0,
  ('W', 2): 1.0,
  ('W', 0): 1.0,
  ('Y', 3): 2.0}}

What I did was collect the level 0 values whose level 1 is >= 2. Because that step also dropped the rows with level 1 values 0 and 1, which should stay, I had to build a second dataframe from the collected values and merge it back using 'inner'. I got the desired output, but I am sure I took the long and probably stupid way.

How could I do this in a better way?

Thanks.

3 Answers

Let's try groupby on level=0 with filter, keeping a level 0 group only when any of its index level 1 values (via get_level_values) is greater than or equal to 2:

outp = (
    df.groupby(level=0)
        .filter(lambda s: (s.index.get_level_values(1) >= 2).any())
)

outp:

     C   D    E
S 0  A  15  1.0
  2  A  22  1.0
T 0  A  20  2.0
  1  A  20  2.0
  3  A  20  NaN
U 1  A  18  2.0
  2  A  14  2.0
  0  A  14  2.0
W 2  A  22  1.0
  0  A  25  1.0
Y 3  A  17  2.0
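
The filter above can be tried end to end with a self-contained sketch. The frame below is a small illustrative subset of the question's data (only column E), and `sample`, `has_large_level_1`, and `threshold` are hypothetical names introduced here.

```python
import pandas as pd

# Illustrative subset of the question's index; only column E is kept.
index = pd.MultiIndex.from_tuples(
    [("S", 0), ("S", 2), ("V", 0), ("W", 2), ("W", 0), ("X", 0), ("Y", 3)]
)
sample = pd.DataFrame({"E": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0]}, index=index)


def has_large_level_1(group, threshold=2):
    """Return True when any level 1 index value in the group is >= threshold."""
    return bool((group.index.get_level_values(1) >= threshold).any())


# filter keeps or drops whole level 0 groups, so rows with level 1 of 0 or 1
# survive whenever a sibling row in the same group meets the threshold.
kept = sample.groupby(level=0).filter(has_large_level_1)
kept_groups = kept.index.get_level_values(0).unique().tolist()
print(kept_groups)
```

Groups V and X are dropped here because their only level 1 value is 0.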

Get the level 0 labels of the rows where level 1 >= 2, and use them to index the main df:

df.loc[df.query("ilevel_1 >= 2").index.get_level_values(0)]

     C   D    E
S 0  A  15  1.0
  2  A  22  1.0
T 0  A  20  2.0
  1  A  20  2.0
  3  A  20  NaN
U 1  A  18  2.0
  2  A  14  2.0
  0  A  14  2.0
W 2  A  22  1.0
  0  A  25  1.0
Y 3  A  17  2.0
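
The same idea can also be sketched without `query`, using a boolean mask built from the index levels. This is an alternative that does not come from the answer above; `sample`, `wanted`, and `result` are illustrative names, and the data is a small subset of the question's frame.

```python
import pandas as pd

# Illustrative subset of the question's index; only column E is kept.
index = pd.MultiIndex.from_tuples(
    [("S", 0), ("S", 2), ("V", 0), ("W", 2), ("W", 0), ("X", 0), ("Y", 3)]
)
sample = pd.DataFrame({"E": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0]}, index=index)

level_0 = sample.index.get_level_values(0)
level_1 = sample.index.get_level_values(1)

# Level 0 labels that own at least one level 1 value >= 2 (deduplicated).
wanted = level_0[level_1 >= 2].unique()

# Boolean mask over every row, so rows with level 1 of 0 or 1 are kept too.
result = sample[level_0.isin(wanted)]
print(result)
```

Because the selection is a mask rather than a label lookup, the original row order is preserved and no row is repeated even if a group has several level 1 values at or above the threshold.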

Here is a different way:

(df.loc[df.reset_index(level=1)
        .groupby(level=0)['level_1']
        .transform(lambda x: x.ge(2).any()).to_numpy()])

or

df.loc[df.index.to_frame().groupby(0)[1].transform(lambda x: x.ge(2).any())]
