3

I have a pandas DataFrame with three columns: ID, t, and ind1.

import pandas as pd
dat = {'ID': [1,1,1,1,2,2,2,3,3,3,3,4,4,4,5,5,6,6,6],
        't': [0,1,2,3,0,1,2,0,1,2,3,0,1,2,0,1,0,1,2],
        'ind1' : [1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0]
        }

df = pd.DataFrame(dat, columns = ['ID', 't', 'ind1'])

print (df)

What I need to do is create a new column (res) such that

  • for every ID with ind1 == 0, res is zero;
  • for every ID with ind1 == 1, res = 1 on the row where t == max(t) (grouped by ID), and zero otherwise.

Here's the anticipated output:

[image: anticipated output table]

1
  • This is confusing ~ does "all == 1" mean that every row in a group must equal 1? Commented Aug 21, 2020 at 13:28

3 Answers

4

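Use groupby with transform('idxmax') to get the index of the max-t row in each ID, then mask it with where on transform('all') so that only IDs whose ind1 values are all 1 keep it: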

df['res']=df.groupby('ID').t.transform('idxmax').where(df.groupby('ID').ind1.transform('all')).eq(df.index).astype(int)
df
Out[160]: 
    ID  t  ind1  res
0    1  0     1    0
1    1  1     1    0
2    1  2     1    0
3    1  3     1    1
4    2  0     0    0
5    2  1     0    0
6    2  2     0    0
7    3  0     0    0
8    3  1     0    0
9    3  2     0    0
10   3  3     0    0
11   4  0     1    0
12   4  1     1    0
13   4  2     1    1
14   5  0     1    0
15   5  1     1    1
16   6  0     0    0
17   6  1     0    0
18   6  2     0    0
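The one-liner above can be easier to follow when split into named steps. This is a minimal sketch on a smaller, made-up frame; the names last_row and all_ones are illustrative and do not come from the answer.

```python
import pandas as pd

# Small illustrative frame (not the question's data).
df = pd.DataFrame({
    'ID':   [1, 1, 1, 2, 2, 3, 3],
    't':    [0, 1, 2, 0, 1, 0, 1],
    'ind1': [1, 1, 1, 0, 0, 1, 1],
})

# Index label of the row holding the largest t within each ID,
# broadcast back to every row of that ID.
last_row = df.groupby('ID')['t'].transform('idxmax')

# True on every row of an ID whose ind1 values are all non-zero.
all_ones = df.groupby('ID')['ind1'].transform('all').astype(bool)

# Keep last_row only for qualifying IDs (NaN elsewhere), then flag the row
# whose own index label matches it.
df['res'] = last_row.where(all_ones).eq(df.index.to_series()).astype(int)

print(df)
```

Because idxmax returns a single label per group, this approach flags exactly one row per ID even if the maximum t is tied.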

2

This works on the assumption that the ID column is sorted:

import numpy as np

cond1 = df.ind1.eq(0)
cond2 = df.ind1.eq(1) & (df.t.eq(df.groupby("ID").t.transform("max")))

df["res"] = np.select([cond1, cond2], [0, 1], 0)

df


    ID  t  ind1  res
0    1  0     1    0
1    1  1     1    0
2    1  2     1    0
3    1  3     1    1
4    2  0     0    0
5    2  1     0    0
6    2  2     0    0
7    3  0     0    0
8    3  1     0    0
9    3  2     0    0
10   3  3     0    0
11   4  0     1    0
12   4  1     1    0
13   4  2     1    1
14   5  0     1    0
15   5  1     1    1
16   6  0     0    0
17   6  1     0    0
18   6  2     0    0

5 Comments

Thanks! Your solution was the fastest!
Have you tested whether it takes the "all ind1 == 1" condition into account?
@BEN_YO, if you could point out the faulty row or rows, that would be helpful to me.
@sammywemmy I think different people understand the OP's question differently, so it can be considered a poorly posed question ~ no worries, your answer should be right ~
@BEN_YO, yes, I did test it; my data has over 14M rows. Thanks for your solution; I did upvote it.
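The two readings discussed in these comments give identical results on the question's data, because ind1 is constant within each ID there. A minimal sketch with a made-up mixed group shows where they would diverge; the frame and the names row_wise and group_wise are hypothetical.

```python
import pandas as pd

# Hypothetical data: ID 1 has mixed ind1 values, which the question's sample never does.
mixed = pd.DataFrame({
    'ID':   [1, 1, 1, 2, 2],
    't':    [0, 1, 2, 0, 1],
    'ind1': [0, 1, 1, 1, 1],
})

# True on the row(s) where t equals the per-ID maximum.
is_last = mixed['t'].eq(mixed.groupby('ID')['t'].transform('max'))

# Row-wise reading: only the last row itself must have ind1 == 1.
row_wise = (mixed['ind1'].eq(1) & is_last).astype(int)

# Group-wise reading: every row of the ID must have ind1 == 1.
group_all = mixed['ind1'].eq(1).groupby(mixed['ID']).transform('all')
group_wise = (group_all & is_last).astype(int)

print(pd.DataFrame({'row_wise': row_wise, 'group_wise': group_wise}))
```

Under the row-wise reading the last row of ID 1 is flagged; under the group-wise reading it is not, so which one is correct depends on what the question intended.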
1

Use groupby.apply:

df['res'] = (df.groupby('ID').apply(lambda x: x['ind1'].eq(1)&x['t'].eq(x['t'].max()))
               .astype(int).reset_index(drop=True))

print(df)
    ID  t  ind1  res
0    1  0     1    0
1    1  1     1    0
2    1  2     1    0
3    1  3     1    1
4    2  0     0    0
5    2  1     0    0
6    2  2     0    0
7    3  0     0    0
8    3  1     0    0
9    3  2     0    0
10   3  3     0    0
11   4  0     1    0
12   4  1     1    0
13   4  2     1    1
14   5  0     1    0
15   5  1     1    1
16   6  0     0    0
17   6  1     0    0
18   6  2     0    0
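One caveat, offered as a note rather than a correction: reset_index(drop=True) renumbers the grouped result from 0, so the assignment lines up with the original rows only when the frame is already ordered by ID and carries a default integer index. A minimal sketch of an index-aligned variant that avoids that dependency, shown on made-up unsorted data:

```python
import pandas as pd

# Made-up frame with interleaved IDs to exercise the alignment.
df = pd.DataFrame({
    'ID':   [2, 1, 2, 1, 1],
    't':    [0, 0, 1, 1, 2],
    'ind1': [0, 1, 0, 1, 1],
})

# transform returns a result that keeps the original index,
# so no re-indexing is needed before assigning.
is_last = df['t'].eq(df.groupby('ID')['t'].transform('max'))
df['res'] = (df['ind1'].eq(1) & is_last).astype(int)

print(df)
```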
