I am stuck with pandas.
I have df1, which contains every transaction (the first column is the index; rows are sorted by time):
      email                date
43487   aaa 2017-10-11 08:28:39
42910   bbb 2017-09-24 07:49:52
45561   bbb 2017-12-03 11:03:56
47212   bbb 2018-01-02 12:25:52
89734   ccc 2018-02-02 12:25:52
89734   ccc 2018-03-02 12:20:52
I also have df2, which contains the unique emails and their minimum dates (df1 with drop_duplicates; since df1 was sorted by time, I got the minimum date by default):
      email                date
43487   aaa 2017-10-11 08:28:39
42910   bbb 2017-09-24 07:49:52
89734   ccc 2018-02-02 12:25:52
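For anyone who wants to reproduce this, here is a minimal sketch of how the two frames could be built. The values are copied from the tables above; calling drop_duplicates with subset="email" is an assumption based on the description, not necessarily the exact call that was used.

```python
import pandas as pd

# Sample data copied from the tables above (note the repeated index 89734).
df1 = pd.DataFrame(
    {
        "email": ["aaa", "bbb", "bbb", "bbb", "ccc", "ccc"],
        "date": pd.to_datetime(
            [
                "2017-10-11 08:28:39",
                "2017-09-24 07:49:52",
                "2017-12-03 11:03:56",
                "2018-01-02 12:25:52",
                "2018-02-02 12:25:52",
                "2018-03-02 12:20:52",
            ]
        ),
    },
    index=[43487, 42910, 45561, 47212, 89734, 89734],
)

# Keep the first row per email. This is the earliest date only because
# each email's rows already appear in time order (assumption).
df2 = df1.drop_duplicates(subset="email", keep="first")
```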
How can I create a column date2 in df2 containing the second-earliest date for the respective email in df1?
I tried a for loop:
for email in df2['email']:
    df2.at[email, 'date2'] = (
        df1.loc[df1['email'] == email]['date'].iloc[1]
        if len(df1.loc[df1['email'] == email]['date']) > 1 else None
    )
But it is very slow (55k rows, 32 GB RAM, and no result after 5 minutes).
The desired output is:
      email                date               date2
43487   aaa 2017-10-11 08:28:39                None
42910   bbb 2017-09-24 07:49:52 2017-12-03 11:03:56
89734   ccc 2018-02-02 12:25:52 2018-03-02 12:20:52
The date2 column is None for aaa because it has only one transaction; there was no second one. The columns are: email, date of the 1st transaction, date of the 2nd transaction.
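One possible vectorized direction, offered as a sketch rather than a definitive answer: number the rows within each email with groupby(...).cumcount(), keep the rows numbered 1 (the second transaction), and map them onto df2 by email. The names rank and second are made up for illustration, and the code assumes each email's rows in df1 are already in time order. Missing second dates come out as NaT (pandas' missing datetime) rather than None.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame(
    {
        "email": ["aaa", "bbb", "bbb", "bbb", "ccc", "ccc"],
        "date": pd.to_datetime(
            [
                "2017-10-11 08:28:39",
                "2017-09-24 07:49:52",
                "2017-12-03 11:03:56",
                "2018-01-02 12:25:52",
                "2018-02-02 12:25:52",
                "2018-03-02 12:20:52",
            ]
        ),
    },
    index=[43487, 42910, 45561, 47212, 89734, 89734],
)

# .copy() so that adding a column does not touch a view of df1.
df2 = df1.drop_duplicates(subset="email").copy()

# Position of each row within its email group: 0 = first, 1 = second, ...
rank = df1.groupby("email").cumcount()

# Rows that are the second transaction of their email, as an
# email -> date Series. to_numpy() gives a positional mask, which
# sidesteps label alignment on the duplicated index.
second = df1.loc[(rank == 1).to_numpy()].set_index("email")["date"]

# Look up each email's second date; emails without one get NaT.
df2["date2"] = df2["email"].map(second)
```

Since this avoids filtering df1 once per email, it should scale much better than the loop on 55k rows, though that has not been timed here.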