Consider the pandas DataFrame below:
import pandas as pd
from pandas import Timestamp

df = pd.DataFrame({
    'day': [Timestamp('2017-03-27'),
            Timestamp('2017-03-27'),
            Timestamp('2017-04-01'),
            Timestamp('2017-04-03'),
            Timestamp('2017-04-06'),
            Timestamp('2017-04-07'),
            Timestamp('2017-04-11'),
            Timestamp('2017-05-01'),
            Timestamp('2017-05-01')],
    'act_id': ['916298883',
               '916806776',
               '923496071',
               '926539428',
               '930641527',
               '931935227',
               '937765185',
               '966163233',
               '966417205']
})
As you can see, there are 9 unique ids spread across 7 days.
I am looking for a way to add two new columns.
- The first column:
An incrementing number for each new day. For example, 1 for '2017-03-27' (the same number for the same day), 2 for '2017-04-01', 3 for '2017-04-03', etc.
- The second column:
An incrementing number for each new act_id within a day. For example, 1 for '916298883', 2 for '916806776' (which belongs to the same day, '2017-03-27'), 1 for '923496071', 1 for '926539428', etc.
The final table should look like this
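For reference, here is a minimal sketch of one way such a table could be produced. It assumes the days should be numbered in sorted order and that each act_id appears only once per day, so numbering the rows within a day is enough. The column name act_no is hypothetical; day_no is the name used in my attempt.

```python
import pandas as pd

df = pd.DataFrame({
    'day': pd.to_datetime(['2017-03-27', '2017-03-27', '2017-04-01',
                           '2017-04-03', '2017-04-06', '2017-04-07',
                           '2017-04-11', '2017-05-01', '2017-05-01']),
    'act_id': ['916298883', '916806776', '923496071', '926539428',
               '930641527', '931935227', '937765185', '966163233',
               '966417205'],
})

# Number each distinct day (groups are sorted by day), starting at 1.
df['day_no'] = df.groupby('day').ngroup() + 1

# Number the rows within each day, starting at 1.
# 'act_no' is a hypothetical column name; this counts rows, so it
# assumes an act_id is not repeated within the same day.
df['act_no'] = df.groupby('day').cumcount() + 1

print(df)
```

With this data, day_no comes out as 1, 1, 2, 3, 4, 5, 6, 7, 7 and act_no as 1, 2, 1, 1, 1, 1, 1, 1, 2.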
I have already tried to build the first column with apply and a helper function, but it doesn't work as it should.
# Create a helper function that gives an index number to a new column
counter = 1
def giveFlag(x):
    global counter
    index = counter
    counter += 1
    return index
And then:
# Create day flagger column
df['day_no'] = df['day'].apply(lambda x: giveFlag(x))
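Note that giveFlag increments the counter every time it is called and never looks at x, so it cannot give two rows with the same day the same number. As a possible stateless alternative, pd.factorize assigns one integer code per distinct value in order of first appearance. The sketch below uses a small, deliberately unsorted sample (not the data above) to show that behaviour; the variable names are only illustrative.

```python
import pandas as pd

# Small illustrative sample, deliberately not sorted by day.
days = pd.Series(pd.to_datetime(['2017-04-01', '2017-03-27',
                                 '2017-04-01', '2017-04-03']))

# factorize returns (codes, uniques); codes start at 0, so add 1.
codes, uniques = pd.factorize(days)
day_no = (codes + 1).tolist()

print(day_no)  # [1, 2, 1, 3]
```

Unlike a sorted groupby, this numbers days in the order they first occur, which gives the same result whenever the frame is already sorted by day.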