
I have a DataFrame with a DatetimeIndex and several lists of dates that describe the condition of each day. I want to compare each date in the DataFrame against these lists and assign the matching label string to each row.

df = 
  index                   data
2019-02-04 14:52:00    73.923746
2019-02-05 10:48:00    73.335315
2019-02-05 11:28:00    72.021457
2019-02-06 10:49:00    72.367468
2019-02-07 10:16:00    73.434296
2019-02-14 10:54:00    73.094386
2019-02-27 12:08:00    70.930997
2019-02-28 12:41:00    70.444107
2019-02-28 13:21:00    70.426729
2019-03-29 11:29:00    70.758032
2019-04-29 11:29:00    70.758032
2019-12-14 14:30:00    73.515568
2019-12-23 10:54:00    72.812583

# dates_bwn_twodates: my helper (not shown) that lists the dates between two dates
bad_dates = [dates_bwn_twodates('2019-03-22', '2019-04-09'), 'bad_day']
good_dates = [dates_bwn_twodates('2019-04-10', '2019-04-29'), 'good_day']

explist = [bad_dates,good_dates]
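The helper dates_bwn_twodates is not shown in the question. To make the setup reproducible, here is a hypothetical stand-in, assuming it returns every calendar date in the range, inclusive of both ends, as 'YYYY-MM-DD' strings. It is built on pd.date_range; the asker's real helper may differ.

```python
import pandas as pd


def dates_bwn_twodates(start, end):
    """Hypothetical stand-in for the asker's helper: every calendar date
    from start to end (both inclusive) as 'YYYY-MM-DD' strings."""
    return pd.date_range(start, end, freq='D').strftime('%Y-%m-%d').tolist()


bad_dates = [dates_bwn_twodates('2019-03-22', '2019-04-09'), 'bad_day']
good_dates = [dates_bwn_twodates('2019-04-10', '2019-04-29'), 'good_day']
explist = [bad_dates, good_dates]
```

With this definition each list holds full date strings such as '2019-03-29', which matters for how they can be compared against the index.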

I want to compare each index value in df with the two lists above and produce a new column indicating the condition of each day. My present code:

df['test'] =  'normal_day'
for i in explist:
    for j in df.index:
        if bool(set(i[0])&set(j.strftime('%Y-%m-%d'))) == True:
            df['test'].loc[j] = i[1]

My present output is:

  index                   data       test 
2019-02-04 14:52:00    73.923746     normal_day 
2019-02-05 10:48:00    73.335315     normal_day 
2019-02-05 11:28:00    72.021457     normal_day 
2019-02-06 10:49:00    72.367468     normal_day 
2019-02-07 10:16:00    73.434296     normal_day 
2019-02-14 10:54:00    73.094386     normal_day 
2019-02-27 12:08:00    70.930997     normal_day 
2019-02-28 12:41:00    70.444107     normal_day 
2019-02-28 13:21:00    70.426729     normal_day 
2019-03-29 11:29:00    70.758032     normal_day 
2019-04-29 11:29:00    70.758032     normal_day 
2019-12-14 14:30:00    73.515568     normal_day 
2019-12-23 10:54:00    72.812583     normal_day 

My code is not working properly: every row is labelled normal_day, even 2019-03-29, which falls inside the bad_day range.
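A likely reason every row stays normal_day: set(j.strftime('%Y-%m-%d')) builds a set of the individual characters of the date string, not a one-element set holding the date, so intersecting it with a list of full date strings is always empty. Below is a minimal sketch of the same loop using a plain membership test and df.loc[row, col] assignment. It assumes dates_bwn_twodates returns 'YYYY-MM-DD' strings; the version defined here on top of pd.date_range is a hypothetical stand-in.

```python
import pandas as pd

# A few rows from the question's data, indexed by timestamp
df = pd.DataFrame(
    {'data': [73.923746, 70.758032, 70.758032]},
    index=pd.to_datetime(['2019-02-04 14:52:00', '2019-03-29 11:29:00', '2019-04-29 11:29:00']),
)


def dates_bwn_twodates(start, end):
    # Hypothetical stand-in: inclusive list of 'YYYY-MM-DD' strings
    return pd.date_range(start, end, freq='D').strftime('%Y-%m-%d').tolist()


explist = [
    [dates_bwn_twodates('2019-03-22', '2019-04-09'), 'bad_day'],
    [dates_bwn_twodates('2019-04-10', '2019-04-29'), 'good_day'],
]

# What the original test compares against: single characters, not a date
chars = set('2019-03-29')  # {'2', '0', '1', '9', '-', '3'}
overlap = set(explist[0][0]) & chars  # empty: no full date string is one character

df['test'] = 'normal_day'
for dates, label in explist:
    date_set = set(dates)  # set of full date strings for fast lookups
    for ts in df.index:
        if ts.strftime('%Y-%m-%d') in date_set:
            # df.loc[row, col] writes into df itself (no chained assignment)
            df.loc[ts, 'test'] = label
```

This fixes the comparison, but it still loops row by row. The vectorized masks in the answer below scale better.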

  • What does "my code is not working properly" mean, exactly? Why would you use loops for this? Why the if ... == True:? Have you not read the pandas docs? Commented Jan 23, 2020 at 21:31
  • Does this answer your question? Pandas conditional creation of a series/dataframe column Commented Jan 23, 2020 at 21:32

1 Answer


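Create boolean masks for the two date ranges:

# assumes the timestamps are in a column named 'index', e.g. after df = df.reset_index()
# .dt.normalize() drops the time of day, so the end date '2019-04-29' also matches 2019-04-29 11:29
bad = df['index'].dt.normalize().between('2019-03-22', '2019-04-09')
good = df['index'].dt.normalize().between('2019-04-10', '2019-04-29')

Then assign the labels, starting from the default:

df['test'] = 'normal_day'
df.loc[bad, 'test'] = 'bad_day'
df.loc[good, 'test'] = 'good_day'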
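If the timestamps stay in the DatetimeIndex, as in the question, rather than in a column named 'index', the same idea works on df.index directly. The sketch below makes three assumptions. The ranges are inclusive calendar-date ranges. The (start, end, label) tuples in ranges are a hypothetical restructuring of the question's explist. And np.select is used in place of one df.loc assignment per label.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'data': [73.923746, 70.758032, 70.758032, 73.515568]},
    index=pd.to_datetime([
        '2019-02-04 14:52:00', '2019-03-29 11:29:00',
        '2019-04-29 11:29:00', '2019-12-14 14:30:00',
    ]),
)

# Hypothetical (start, end, label) form of the question's explist
ranges = [
    ('2019-03-22', '2019-04-09', 'bad_day'),
    ('2019-04-10', '2019-04-29', 'good_day'),
]

# normalize() moves every timestamp to midnight, so an end date such as
# '2019-04-29' also matches rows later on that day
day = df.index.normalize()
conditions = [
    (day >= pd.Timestamp(start)) & (day <= pd.Timestamp(end))
    for start, end, _ in ranges
]
labels = [label for _, _, label in ranges]

# np.select takes the label of the first matching condition, else the default
df['test'] = np.select(conditions, labels, default='normal_day')
```

Adding another condition only means adding another tuple to ranges. If ranges overlap, the first one listed wins.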

5 Comments

Your solution is so simple and elegant. However, I got an error: AttributeError: 'DatetimeIndex' object has no attribute 'between'
I found this approach instead: mask = (df['date'] > start_date) & (df['date'] <= end_date). Thanks.
You could also convert to str to use between, df['index'].astype(str).between(...), or use between_time.
I am trying to use between_time; it looks better than masking. But df.between_time(pd.to_datetime('2019-04-30'), pd.to_datetime('2019-05-09')) raises ValueError: Cannot convert arg [Timestamp('2019-04-30 00:00:00')] to a time
If your df['index'].dtype is datetime64, between should work fine: df['index'].between('2019-02-05', '2019-04-28')
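On the between_time error: DataFrame.between_time selects rows by time of day (for example, between 10:00 and 11:00 on any date), so it rejects full dates such as Timestamp('2019-04-30'). To select a range of calendar dates on a sorted DatetimeIndex, the usual tools are partial-string slicing with .loc or a boolean mask. Here is a small sketch on a few of the question's rows:

```python
import pandas as pd

df = pd.DataFrame(
    {'data': [73.923746, 73.335315, 72.021457, 72.367468, 73.434296]},
    index=pd.to_datetime([
        '2019-02-04 14:52:00', '2019-02-05 10:48:00', '2019-02-05 11:28:00',
        '2019-02-06 10:49:00', '2019-02-07 10:16:00',
    ]),
)

# between_time filters on the time of day only, on every date
morning = df.between_time('10:00', '11:00')

# Partial-string slicing selects whole calendar days: the end date
# '2019-02-06' includes every timestamp on that day (index must be sorted)
feb_5_to_6 = df.loc['2019-02-05':'2019-02-06']

# Equivalent boolean mask, comparing at day resolution
day = df.index.normalize()
mask = (day >= pd.Timestamp('2019-02-05')) & (day <= pd.Timestamp('2019-02-06'))
```

The slice is the shortest form for a single range. The mask form is what combines with df.loc[mask, 'test'] = ... to label rows.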
