Select rows by column value and include previous row by another column value

Question

Here's an example DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame([
    [0, "file_0", 5],
    [0, "file_1", 0],
    [1, "file_2", 0],
    [1, "file_3", 8],
    [2, "file_4", 0],
    [2, "file_5", 5],
    [2, "file_6", 100],
    [2, "file_7", 0],
    [2, "file_8", 50]
], columns=["case", "filename", "num"])

I want to select the rows where num == 0, plus the row immediately before each of them when that row has the same case value, regardless of the previous row's num.

The expected result is:

case  filename  num
0     file_0    5
0     file_1    0
1     file_2    0
2     file_4    0
2     file_6    100
2     file_7    0

I know I can select the previous rows with:

df[(df['num']==0).shift(-1).fillna(False)]

However, this doesn't take the case value into account. One solution that came to mind is to group by case first and then filter the data, but I have no idea how to code it.

Within one case, if two consecutive num values are zero, should the first be selected twice?
– Reinderien, Commented Dec 31, 2022 at 14:08
Will num always alternate between 0 and 100 in that pattern within a case?
– Reinderien, Commented Dec 31, 2022 at 14:10
@Reinderien Sorry for the simple example. Actually, num can be zero or any positive number.
– zxdawn, Commented Dec 31, 2022 at 14:13
Sure; but specifically, within a case there's no guarantee that every other element is a zero, right? You should update your example to demonstrate this. You also need to show code for the approach you've tried so far.
– Reinderien, Commented Dec 31, 2022 at 14:15

zxdawn · Accepted Answer · 2023-01-01 05:58:10Z

1

I figured out the answer myself:

# Boolean mask: True where `num` is 0 and the previous row has the same `case`
mask = (df.case.eq(df.case.shift())) & (df['num']==0)

# Concatenate the previous rows with the num == 0 rows, then sort by case and filename
df_res = pd.concat([df[mask.shift(-1).fillna(False)], df[df['num']==0]]).sort_values(['case', 'filename'])
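A possible variant of the same idea, as a sketch rather than a definitive implementation: build both conditions with `shift(-1)` and combine them with `|`, so no `concat` or `sort_values` is needed. The names `is_zero`, `next_is_zero`, and `next_same_case` are hypothetical and not part of the original answer; the data is the example from the question.

```python
import pandas as pd

df = pd.DataFrame([
    [0, "file_0", 5],
    [0, "file_1", 0],
    [1, "file_2", 0],
    [1, "file_3", 8],
    [2, "file_4", 0],
    [2, "file_5", 5],
    [2, "file_6", 100],
    [2, "file_7", 0],
    [2, "file_8", 50]
], columns=["case", "filename", "num"])

# Rows that are themselves num == 0
is_zero = df["num"].eq(0)

# shift(-1) lines each row up with the row after it; the last row sees NaN,
# and NaN.eq(...) is False, so no fillna is required
next_is_zero = df["num"].shift(-1).eq(0)
next_same_case = df["case"].shift(-1).eq(df["case"])

# Keep zero rows, plus rows directly before a zero row of the same case
df_res = df[is_zero | (next_is_zero & next_same_case)]
print(df_res)
```

Because the result comes from a single boolean mask, the original row order is preserved and a row that is both a zero row and the predecessor of a zero row is kept only once. With the `concat` approach such a row would appear twice, which is the situation raised in the comments under the question.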

answered Jan 1, 2023 at 5:58

zxdawn


1 Comment

Surjit Samra Over a year ago

So it needed masking/filtering, shifting, concatenating and sorting :). Well done; you had more data, so you could analyse the problem better yourself.

Surjit Samra · Accepted Answer · 2022-12-31 15:09:07Z

0

How about merging df with itself?

df = pd.DataFrame([
    [0, "file_0", 0],
    [0, "file_1", 0],
    [1, "file_2", 0],
    [2, "file_3", 0],
    [2, "file_4", 100],
    [2, "file_5", 0],
    [2, "file_6", 50],
    [2, "file_7", 0]
], columns=["case", "filename", "num"])
df = df.merge(df, left_on='filename', right_on='filename', how='inner')
df[(df['case_x'] == df['case_y']) & df['num_x'] == 0]
Out[219]: 
   case_x filename  num_x  case_y  num_y
0       0   file_0      0       0      0
1       0   file_1      0       0      0
2       1   file_2      0       1      0
3       2   file_3      0       2      0
4       2   file_4    100       2    100
5       2   file_5      0       2      0
6       2   file_6     50       2     50
7       2   file_7      0       2      0

Then you can rename the columns back:

df[['case_x', 'filename',  'num_x']].rename({'case_x':'case','num_x':'num'},axis=1)
Out[223]: 
   case filename  num
0     0   file_0    0
1     0   file_1    0
2     1   file_2    0
3     2   file_3    0
4     2   file_4  100
5     2   file_5    0
6     2   file_6   50
7     2   file_7    0

answered Dec 31, 2022 at 15:09

Surjit Samra


3 Comments

Quang Hoang Over a year ago

I think you forgot to shift before merge.

Surjit Samra Over a year ago

Is shift even needed? :)

Quang Hoang Over a year ago

Yes, so that we can then check the previous rows. Oh, wait, you merged on filename... So no, this doesn't really do anything: it just duplicates the data and then drops the extra columns it just created.

Quang Hoang · Accepted Answer · 2022-12-31 15:12:16Z

0

Do you mean:

df.join(df.groupby('case').shift(-1)
                .loc[df['num']==0]
                .dropna(how='all').add_suffix('_next'), 
        how='inner')

Output:

   case filename  num filename_next  num_next
0     0   file_0    0        file_1       0.0
3     2   file_3    0        file_4     100.0
5     2   file_5    0        file_6      50.0

edited Dec 31, 2022 at 15:12

answered Dec 31, 2022 at 14:53

Quang Hoang


2 Comments

zxdawn Over a year ago

Very close. Concatenating the non-NaN rows with the num == 0 rows sounds like the result I posted above.

Reinderien Over a year ago

I doubt this is what's intended - my read of the problem is that filename_next is really just additional rows with no dedicated column
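If the neighbouring rows are wanted as extra rows rather than as `_next` columns, one way to keep the per-case grouping is to shift a boolean flag inside each group and use it as a row mask. This is only a sketch under that reading of the question; `is_zero` and `before_zero` are hypothetical names, and the data is the example from the question.

```python
import pandas as pd

df = pd.DataFrame([
    [0, "file_0", 5],
    [0, "file_1", 0],
    [1, "file_2", 0],
    [1, "file_3", 8],
    [2, "file_4", 0],
    [2, "file_5", 5],
    [2, "file_6", 100],
    [2, "file_7", 0],
    [2, "file_8", 50]
], columns=["case", "filename", "num"])

is_zero = df["num"].eq(0)

# Within each case, flag the rows whose next row has num == 0.
# Casting to int first keeps the shifted values numeric, so fill_value=0
# marks the last row of every case as "not before a zero".
before_zero = (
    is_zero.astype(int)
    .groupby(df["case"])
    .shift(-1, fill_value=0)
    .astype(bool)
)

# Zero rows plus their same-case predecessors, as plain rows
rows = df[is_zero | before_zero]
print(rows)
```

Shifting inside `groupby` means a row can never be paired with a row from another case, so no separate comparison of `case` values is needed.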
