1

Here's an example DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame([
    [0, "file_0", 5],
    [0, "file_1", 0],
    [1, "file_2", 0],
    [1, "file_3", 8],
    [2, "file_4", 0],
    [2, "file_5", 5],
    [2, "file_6", 100],
    [2, "file_7", 0],
    [2, "file_8", 50]
], columns=["case", "filename", "num"])

I want to select the rows where num == 0, together with the row immediately before each of them when that row has the same case value, regardless of the previous row's num value.

The expected result is:

case    filename    num
0       file_0      5
0       file_1      0
1       file_2      0
2       file_4      0
2       file_6      100
2       file_7      0

I know that I can select the previous rows with

df[(df['num'] == 0).shift(-1, fill_value=False)]

However, this doesn't take the case value into account. One solution that came to mind is to group by case first and then filter the data, but I have no idea how to code it.

5 Comments
  • Within one case, if two consecutive num are zero, should the first be selected twice? Commented Dec 31, 2022 at 14:08
  • @Reinderien No, just once. Commented Dec 31, 2022 at 14:10
  • Will num always alternate between 0 and 100 in that pattern within a case? Commented Dec 31, 2022 at 14:10
  • @Reinderien Sorry for the overly simple example. Actually, num can be zero or any positive number. Commented Dec 31, 2022 at 14:13
  • Sure; but specifically - within a case there's no guarantee that every other element is a zero, right? You should update your example to demonstrate this. You also need to show code for the approach you've tried so far. Commented Dec 31, 2022 at 14:15

3 Answers 3

1

I figured out the answer myself:

# Boolean mask: True where `num` is 0 and the previous row has the same `case`
mask = df['case'].eq(df['case'].shift()) & df['num'].eq(0)

# Concatenate the preceding rows with the num == 0 rows, drop rows that were picked twice, and restore the original order
df_res = pd.concat([df[mask.shift(-1, fill_value=False)], df[df['num'] == 0]]).drop_duplicates().sort_index()
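
The same selection can also be written as one combined boolean mask, which keeps the original row order and selects each row at most once even when two consecutive rows in a case both have num == 0. This is only a sketch: the helper name `select_zero_and_previous` and the sample frame are made up for illustration, and it assumes "previous row" means the row directly above in the frame.

```python
import pandas as pd


def select_zero_and_previous(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: num == 0 rows plus the row just above each one, if it shares the case."""
    is_zero = df["num"].eq(0)
    # True where the *next* row has num == 0 (the last row has no successor)
    next_is_zero = is_zero.shift(-1, fill_value=False)
    # True where the next row belongs to the same case as the current row
    next_same_case = df["case"].eq(df["case"].shift(-1))
    return df[is_zero | (next_is_zero & next_same_case)]


# Made-up sample: case 1 starts with two consecutive zeros
sample = pd.DataFrame(
    [
        [0, "file_0", 5],
        [0, "file_1", 0],
        [1, "file_2", 0],
        [1, "file_3", 0],
        [1, "file_4", 9],
        [2, "file_5", 3],
    ],
    columns=["case", "filename", "num"],
)

result = select_zero_and_previous(sample)
print(result)
```

Because the two conditions are OR-ed into a single mask, a row that is both a zero row and the predecessor of another zero row (file_2 in the sample) appears only once.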

1 Comment

So it needed masking/filtering, shifting, concatenating and sorting :). Well done; you had more data to analyse the problem yourself.
0

How about merging df with itself?

df = pd.DataFrame([
    [0, "file_0", 0],
    [0, "file_1", 0],
    [1, "file_2", 0],
    [2, "file_3", 0],
    [2, "file_4", 100],
    [2, "file_5", 0],
    [2, "file_6", 50],
    [2, "file_7", 0]
], columns=["case", "filename", "num"])
df = df.merge(df, left_on='filename', right_on='filename', how='inner')
df[(df['case_x'] == df['case_y']) & df['num_x'] == 0]
Out[219]: 
   case_x filename  num_x  case_y  num_y
0       0   file_0      0       0      0
1       0   file_1      0       0      0
2       1   file_2      0       1      0
3       2   file_3      0       2      0
4       2   file_4    100       2    100
5       2   file_5      0       2      0
6       2   file_6     50       2     50
7       2   file_7      0       2      0

Then you can rename the columns back:

df[['case_x', 'filename',  'num_x']].rename({'case_x':'case','num_x':'num'},axis=1)
Out[223]: 
   case filename  num
0     0   file_0    0
1     0   file_1    0
2     1   file_2    0
3     2   file_3    0
4     2   file_4  100
5     2   file_5    0
6     2   file_6   50
7     2   file_7    0

3 Comments

I think you forgot to shift before merge.
Is shift even needed? :)
Yes, so that we can check the previous rows. Oh, wait, you merged on filename... So no, the merge doesn't really do anything; it just duplicates the data and then drops the extra parts it created.
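
One possible reading of the "shift before merge" suggestion, as a rough sketch rather than the answerer's method: give every row a position, shift the positions of the num == 0 rows back by one, and merge on the pair (case, position) so that a predecessor from a different case cannot match. The `pos` column and the variable names are assumptions introduced for illustration; the data is the frame from the question.

```python
import pandas as pd

df = pd.DataFrame(
    [
        [0, "file_0", 5],
        [0, "file_1", 0],
        [1, "file_2", 0],
        [1, "file_3", 8],
        [2, "file_4", 0],
        [2, "file_5", 5],
        [2, "file_6", 100],
        [2, "file_7", 0],
        [2, "file_8", 50],
    ],
    columns=["case", "filename", "num"],
)

# Hypothetical helper column: each row's position in the frame
with_pos = df.assign(pos=range(len(df)))

# (case, pos) keys of the num == 0 rows
zero_keys = with_pos.loc[with_pos["num"].eq(0), ["case", "pos"]]
# The same keys moved back one position: the candidate "previous" rows
prev_keys = zero_keys.assign(pos=zero_keys["pos"] - 1)
keys = pd.concat([zero_keys, prev_keys]).drop_duplicates()

# The inner merge keeps only rows whose (case, pos) pair is among the keys,
# so a previous row that belongs to a different case never matches
result = with_pos.merge(keys, on=["case", "pos"], how="inner").drop(columns="pos")
print(result)
```

Merging on both `case` and `pos` is what does the filtering here; merging on `filename` alone pairs every row with itself.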
0

Do you mean:

df.join(df.groupby('case').shift(-1)
                .loc[df['num']==0]
                .dropna(how='all').add_suffix('_next'), 
        how='inner')

Output:

   case filename  num filename_next  num_next
0     0   file_0    0        file_1       0.0
3     2   file_3    0        file_4     100.0
5     2   file_5    0        file_6      50.0

2 Comments

Very close. Concatenating the non-NaN rows with the num == 0 rows sounds like the result I posted above.
I doubt this is what's intended - my read of the problem is that filename_next is really just additional rows with no dedicated column
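
Following these comments, the grouped shift could be used to build a row mask instead of extra `_next` columns. The snippet below is a hedged sketch of that idea using the frame from the question; the variable names are illustrative, and the trailing `astype(bool)` is only a precaution in case the shifted result is not already boolean.

```python
import pandas as pd

df = pd.DataFrame(
    [
        [0, "file_0", 5],
        [0, "file_1", 0],
        [1, "file_2", 0],
        [1, "file_3", 8],
        [2, "file_4", 0],
        [2, "file_5", 5],
        [2, "file_6", 100],
        [2, "file_7", 0],
        [2, "file_8", 50],
    ],
    columns=["case", "filename", "num"],
)

is_zero = df["num"].eq(0)
# Within each case, look one row ahead: True where the following row has num == 0.
# The last row of a case has no successor, so it is filled with False.
next_is_zero = is_zero.groupby(df["case"]).shift(-1, fill_value=False).astype(bool)

# Keep the zero rows and the rows directly before them within the same case
result = df[is_zero | next_is_zero]
print(result)
```

Since the shift happens inside each group, the first row of a case is never treated as the predecessor of a row in another case, and the selected rows stay as rows with no dedicated column.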
