Pandas assign second min value to column

Question

I am stuck with pandas. I have a DataFrame df1 that contains every transaction (the first column is the index; rows are sorted by time):

        email    date
43487   aaa     2017-10-11 08:28:39
42910   bbb     2017-09-24 07:49:52
45561   bbb     2017-12-03 11:03:56
47212   bbb     2018-01-02 12:25:52
89734   ccc     2018-02-02 12:25:52
89734   ccc     2018-03-02 12:20:52

I also have df2, which contains the unique emails and each one's minimum date (df1 with drop_duplicates; since df1 was sorted by time, I got the minimum date by default):

        email    date
43487   aaa     2017-10-11 08:28:39
42910   bbb     2017-09-24 07:49:52
89734   ccc     2018-02-02 12:25:52

How can I create a column date2 in df2 containing the second-smallest date for the respective email in df1?

I tried a for loop:

for idx, email in df2['email'].items():
    dates = df1.loc[df1['email'] == email, 'date']  # all dates for this email
    df2.at[idx, 'date2'] = dates.iloc[1] if len(dates) > 1 else None

But it is very slow (55k rows, 32 GB RAM: no result after 5 minutes).

Desired output is:

        email   date                  date2
43487   aaa     2017-10-11 08:28:39   None
42910   bbb     2017-09-24 07:49:52   2017-12-03 11:03:56
89734   ccc     2018-02-02 12:25:52   2018-03-02 12:20:52

Because there is only one aaa transaction, there is no second one for it.
– Simon Osipov, Commented Dec 27, 2018 at 9:02
So, basically, I want a table with 3 columns: email, date of the first transaction, date of the second transaction.
– Simon Osipov, Commented Dec 27, 2018 at 9:03

yatu · Accepted Answer · 2018-12-27 09:40:12Z

2

You can use sort_values with a list of columns to sort the dates within each email.

Then group by email and use nth(1) to select the second row of each group (nth is zero-based):

date2 = df1.sort_values(['email','date']).groupby('email').nth(1)

             date
email                    
bbb   2017-12-03 11:03:56
ccc   2018-03-02 12:20:52
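
For reference, here is a minimal, self-contained sketch of this step using the sample data from the question. The original index values are left out as a simplifying assumption, since they play no role here. Note that the shape of the nth result depends on the pandas version: older releases index it by email (as in the output above), while from pandas 2.0 on nth keeps the original rows and index, so the sketch only relies on the date column.

```python
import pandas as pd

# Sample transactions from the question (index values omitted for brevity).
df1 = pd.DataFrame({
    "email": ["aaa", "bbb", "bbb", "bbb", "ccc", "ccc"],
    "date": pd.to_datetime([
        "2017-10-11 08:28:39",
        "2017-09-24 07:49:52",
        "2017-12-03 11:03:56",
        "2018-01-02 12:25:52",
        "2018-02-02 12:25:52",
        "2018-03-02 12:20:52",
    ]),
})

# Sort by email, then by date, and take the second row (position 1) per email.
# Emails with a single transaction (here "aaa") produce no row at all.
date2 = df1.sort_values(["email", "date"]).groupby("email").nth(1)

second_dates = date2["date"].tolist()
print(second_dates)
```

Emails with only one transaction simply do not appear in date2, which is why a left merge is needed afterwards.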

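Finally, left-merge df2 with date2 on email. Emails with only one transaction get NaT, and the overlapping date columns are suffixed _x (first date) and _y (second date):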

df2.merge(date2, on = 'email', how = 'left')

    email        date_x              date_y
0   aaa 2017-10-11 08:28:39                 NaT
1   bbb 2017-09-24 07:49:52 2017-12-03 11:03:56
2   ccc 2018-02-02 12:25:52 2018-03-02 12:20:52
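
If you want the columns named date and date2, as in the desired output, rather than date_x and date_y, one possible variation is to rename before merging. The sketch below is an alternative to nth, not the method above: it numbers the rows within each email with cumcount, which behaves the same way across pandas versions. The names ordered, rank, first, second and result are illustrative only, and the original index values are omitted as an assumption.

```python
import pandas as pd

# Sample transactions from the question (index values omitted for brevity).
df1 = pd.DataFrame({
    "email": ["aaa", "bbb", "bbb", "bbb", "ccc", "ccc"],
    "date": pd.to_datetime([
        "2017-10-11 08:28:39",
        "2017-09-24 07:49:52",
        "2017-12-03 11:03:56",
        "2018-01-02 12:25:52",
        "2018-02-02 12:25:52",
        "2018-03-02 12:20:52",
    ]),
})

# Sort so that, within each email, rows run from earliest to latest.
ordered = df1.sort_values(["email", "date"])

# Position of each row within its email group: 0 = earliest, 1 = second, ...
rank = ordered.groupby("email").cumcount()

# Earliest date per email, and second-earliest date renamed to date2.
first = ordered.loc[rank == 0, ["email", "date"]]
second = ordered.loc[rank == 1, ["email", "date"]].rename(columns={"date": "date2"})

# Left merge keeps every email; missing second dates become NaT.
result = first.merge(second, on="email", how="left")
print(result)
```

Because everything here is vectorised, it avoids the per-email filtering that made the original loop slow.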

edited Dec 27, 2018 at 9:40

answered Dec 27, 2018 at 9:03
