Python Compare a Dataframe with a list of dates and assign a string based on results

Question

I have a dataframe with a datetime index. I also have three lists of dates, each describing a condition. I want to compare each date in the dataframe with the three lists and assign a string to the row.

df = 
  index                   data
2019-02-04 14:52:00    73.923746
2019-02-05 10:48:00    73.335315
2019-02-05 11:28:00    72.021457
2019-02-06 10:49:00    72.367468
2019-02-07 10:16:00    73.434296
2019-02-14 10:54:00    73.094386
2019-02-27 12:08:00    70.930997
2019-02-28 12:41:00    70.444107
2019-02-28 13:21:00    70.426729
2019-03-29 11:29:00    70.758032
2019-04-29 11:29:00    70.758032
2019-12-14 14:30:00    73.515568
2019-12-23 10:54:00    72.812583

bad_dates = [dates_bwn_twodates('2019-03-22','2019-04-09'),'bad_day']
good_dates= [dates_bwn_twodates('2019-4-10','2019-4-29'),'good_day']

explist = [bad_dates,good_dates]

I want to compare each index value in df with the above two lists and produce a new column indicating the condition of the day. My present code:

df['test'] =  'normal_day'
for i in explist:
    for j in df.index:
        if bool(set(i[0])&set(j.strftime('%Y-%m-%d'))) == True:
            df['test'].loc[j] = i[1]

My present output is

  index                   data       test 
2019-02-04 14:52:00    73.923746     normal_day 
2019-02-05 10:48:00    73.335315     normal_day 
2019-02-05 11:28:00    72.021457     normal_day 
2019-02-06 10:49:00    72.367468     normal_day 
2019-02-07 10:16:00    73.434296     normal_day 
2019-02-14 10:54:00    73.094386     normal_day 
2019-02-27 12:08:00    70.930997     normal_day 
2019-02-28 12:41:00    70.444107     normal_day 
2019-02-28 13:21:00    70.426729     normal_day 
2019-03-29 11:29:00    70.758032     normal_day 
2019-04-29 11:29:00    70.758032     normal_day 
2019-12-14 14:30:00    73.515568     normal_day 
2019-12-23 10:54:00    72.812583     normal_day

My code is not working properly.

What does "my code is not working properly" mean, exactly? Why would you use loops for this? Why the if ... == True:? Have you not read the pandas docs?
– AMC, Commented Jan 23, 2020 at 21:31
Does this answer your question? Pandas conditional creation of a series/dataframe column
– AMC, Commented Jan 23, 2020 at 21:32
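
A likely reason the loop in the question leaves every row as normal_day is that set() applied to a string produces a set of single characters, so it never overlaps with a set of whole dates. The sketch below illustrates this with plain strings; the `dates` list is a hypothetical stand-in, since the output of dates_bwn_twodates is not shown in the question.

```python
# Hypothetical stand-in for the output of dates_bwn_twodates (not shown in the question).
dates = ["2019-03-28", "2019-03-29", "2019-03-30"]
day = "2019-03-29"

# set() on a string splits it into individual characters.
chars = set(day)

# Intersecting whole-date strings with single characters finds nothing.
overlap = set(dates) & chars

# A plain membership test is what the comparison was meant to do.
is_member = day in set(dates)

print(sorted(chars))
print(overlap)
print(is_member)
```

The same reasoning applies if the helper returns date objects rather than strings, because single characters cannot equal a date either.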

Kenan · Accepted Answer · 2020-01-23 20:55:34Z

Create the masks

bad = df['index'].between('2019-03-22', '2019-04-09')
good = df['index'].between('2019-04-10', '2019-04-29')

Then assign them

df['test'] =  'normal_day'
df.loc[bad, 'test'] = 'bad_day'
df.loc[good, 'test'] = 'good_day'
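
If the dates are the dataframe's index, as in the question, rather than a column named 'index', the same masks can be built from df.index with ordinary comparisons, because a DatetimeIndex has no between method. The following is a minimal sketch under that assumption, using a few rows from the question. It also normalizes the index to midnight first, on the assumption that the end date is meant to cover the whole day; without that step, '2019-04-29' is read as midnight and the 11:29 row would fall outside the range.

```python
import pandas as pd

# A few rows from the question, with the timestamps as the index.
idx = pd.to_datetime([
    "2019-02-28 13:21:00",
    "2019-03-29 11:29:00",
    "2019-04-29 11:29:00",
    "2019-12-14 14:30:00",
])
df = pd.DataFrame({"data": [70.426729, 70.758032, 70.758032, 73.515568]}, index=idx)

# Drop the time of day so that each end date includes the entire day.
days = df.index.normalize()

# Boolean masks built from comparisons on the index.
bad = (days >= pd.Timestamp("2019-03-22")) & (days <= pd.Timestamp("2019-04-09"))
good = (days >= pd.Timestamp("2019-04-10")) & (days <= pd.Timestamp("2019-04-29"))

df["test"] = "normal_day"
df.loc[bad, "test"] = "bad_day"
df.loc[good, "test"] = "good_day"

print(df)
```

Calling df.index.to_series().between(...) would be another way to reuse between on an index, with the same caveat about the time of day on the end date.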

5 Comments

Mainland Over a year ago

Your solution is so simple and elegant. I got an error, though: AttributeError: 'DatetimeIndex' object has no attribute 'between'

Mainland Over a year ago

I found this approach: mask = (df['date'] > start_date) & (df['date'] <= end_date). Thanks.

Kenan Over a year ago

You could also convert to str to use between, df['index'].astype(str).between(...), or use between_time

Mainland Over a year ago

I am trying to use between_time. It looks better than masking, but I am getting an error for df.between_time(pd.to_datetime('2019-04-30'),pd.to_datetime('2019-05-09')): ValueError: Cannot convert arg [Timestamp('2019-04-30 00:00:00')] to a time

Kenan Over a year ago

If your df['index'].dtype is datetime64, between should work fine: df['index'].between('2019-02-05', '2019-04-28')
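
On the ValueError mentioned in the comments above: between_time filters rows by time of day, not by date, which is why it cannot convert a full Timestamp. For a date range on a sorted DatetimeIndex, partial-string slicing with .loc is one alternative. The sketch below assumes a sorted datetime index and reuses a few rows from the question.

```python
import pandas as pd

idx = pd.to_datetime([
    "2019-03-29 11:29:00",
    "2019-04-29 11:29:00",
    "2019-12-14 14:30:00",
])
df = pd.DataFrame({"data": [70.758032, 70.758032, 73.515568]}, index=idx)

# between_time selects by clock time, whatever the date is.
late_morning = df.between_time("11:00", "12:00")

# Partial-string slicing selects by date; the end date covers the whole day.
april = df.loc["2019-04-10":"2019-04-29"]

# The same slice can be used to assign the label directly.
df["test"] = "normal_day"
df.loc["2019-04-10":"2019-04-29", "test"] = "good_day"

print(late_morning)
print(april)
print(df)
```

Slicing this way relies on the index being sorted; for an unsorted index, boolean masks are the safer choice.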
