How to compute a variable (through an easy function) which is a row_counter of sorts.
For instance, suppose i have a data which is at Customer_ID level and has all transactions for each customer_id. I want a field created, which sorts transactions from earliest to latest data and assigns a counter variable
I am doing it slightly convoluted manner like below
cust_df['row_count'] = 1
cust_df = cust_df.sort_values(['cust_id', 'transaction_date'])
cust_trans_num = cust_df .groupby(['cust_id', 'transaction_number'])['row_count'].count().reset_index()
cust_trans_num = cust_trans_num .groupby(by=['cust_id', 'transaction_number']).sum().groupby(
level=[0]).cumsum().reset_index().rename(columns={'row_count': 'cust_trans_number'})
cust_df = pd.merge(cust_df , cust_trans_num , how='left', on=['cust_id', 'transaction_number'])
Input data is like below:
cust_id transaction_number transaction_date
1 ABC1 1/1/2018
1 DEF3 12/31/2017
1 XYZ2 4/1/2017
Output data should be like below:
cust_id transaction_number transaction_date cust_trans_number
1 XYZ2 4/1/2017 1
1 DEF3 12/31/2017 2
1 ABC1 1/1/2018 3
Let me know if there is a one-line solution to this?