I have a pandas DataFrame df and a list of dates, holidays:

df.head()

         date  hour  count  Relative Humidity  Temperature  Precipitation     dow
0  2019-07-01     0    672              57.64         71.8            0.0  Monday
1  2019-07-01     1    359              61.61         70.8            0.0  Monday
2  2019-07-01     2    197              61.63         69.8            0.0  Monday
3  2019-07-01     3    115              63.32         69.0            0.0  Monday
4  2019-07-01     4    168              67.91         67.9            0.0  Monday

df.dtypes

date                  object
hour                   int64
count                  int64
Relative Humidity    float64
Temperature          float64
Precipitation        float64
dow                   object
dtype: object

holidays

[datetime.date(2019, 9, 2), datetime.date(2019, 7, 4)]

My goal is to create a new column indicating whether each date is a workday, but the following if/else expression raises an error:

df['is_workday'] = df.apply(lambda row: False if (row['dow'] in ('Saturday', 'Sunday') | pd.to_datetime(row['date'],  format='%Y-%m-%d') in holidays) else True)

KeyError: 'dow'

What could be causing this issue?

1 Answer

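By default, df.apply(...) uses axis=0, which passes each column to the function as a Series indexed by the row labels. row['dow'] then looks for a row labelled 'dow', which does not exist, hence the KeyError. To apply your lambda to each row, specify: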

df.apply(..., axis=1)
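For reference, here is a minimal sketch of the row-wise version with axis=1. The sample rows and the helper name is_workday are made up for illustration; only the column names and the holidays list come from the question. It uses plain or between the two conditions and compares dates as datetime.date objects, which is an assumption about how the match against holidays should be done:

```python
import datetime

import pandas as pd

# Hypothetical sample data; only the column names mirror the question
df = pd.DataFrame({
    "date": ["2019-07-01", "2019-07-04", "2019-07-06"],
    "dow": ["Monday", "Thursday", "Saturday"],
})
holidays = [datetime.date(2019, 9, 2), datetime.date(2019, 7, 4)]


def is_workday(row):
    # Plain `or` is fine here because each operand is a single bool
    is_weekend = row["dow"] in ("Saturday", "Sunday")
    # .date() converts the Timestamp to datetime.date, the type stored in holidays
    is_holiday = pd.to_datetime(row["date"], format="%Y-%m-%d").date() in holidays
    return not (is_weekend or is_holiday)


# axis=1 passes each row (rather than each column) to the function
df["is_workday"] = df.apply(is_workday, axis=1)
print(df)
```

This fixes the error, but it still runs Python code once per row.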

Aside from that, a row-wise apply calls the Python lambda once per row, which is slow on large DataFrames. A faster approach is to vectorize the logic so that no lambda is needed:

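# True where the row falls on a weekend
cond_wkend = df['dow'].isin({'Saturday', 'Sunday'})
# True where the row's date is a holiday (both sides converted to datetime64)
cond_holdy = pd.to_datetime(df['date']).isin(pd.to_datetime(holidays))

df['is_workday'] = ~(cond_wkend | cond_holdy)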
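If you want to try the vectorized version in isolation, the following self-contained sketch may help. The sample rows are invented for illustration, and converting holidays with pd.to_datetime is my own precaution so that both sides of isin share the datetime64 type; it is not something the question requires:

```python
import datetime

import pandas as pd

# Hypothetical sample data; only the column names mirror the question
df = pd.DataFrame({
    "date": ["2019-07-01", "2019-07-04", "2019-07-06", "2019-09-02"],
    "dow": ["Monday", "Thursday", "Saturday", "Monday"],
})
holidays = [datetime.date(2019, 9, 2), datetime.date(2019, 7, 4)]

# Parse the string column once for the whole Series
dates = pd.to_datetime(df["date"], format="%Y-%m-%d")
# Convert the datetime.date objects to a DatetimeIndex for a like-for-like match
holiday_index = pd.to_datetime(holidays)

cond_wkend = df["dow"].isin(["Saturday", "Sunday"])
cond_holdy = dates.isin(holiday_index)

# `|` is the element-wise OR for boolean Series, `~` the element-wise NOT
df["is_workday"] = ~(cond_wkend | cond_holdy)
print(df)
```

Each step operates on whole columns, so pandas does the looping internally rather than calling a Python function per row.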

5 Comments

Thanks. I tried including axis=0 but still get the same error. What could be done to make this statement more efficient?
Sorry, my bad. I corrected it: I meant axis=1.
axis = 1 throws a slightly different error: TypeError: unsupported operand type(s) for |: 'tuple' and 'Timestamp' - however using or works. @Pierre What could be done about efficiency here?
Right, use or instead of |. The former is the regular boolean or; the latter is a bitwise (element-wise) operator, used e.g. on Series. It also binds more tightly than in, which is why Python tried to evaluate ('Saturday', 'Sunday') | Timestamp. I was only commenting on your initial error.
See my updated answer re. efficiency; for large DataFrames, vectorized operations are usually much faster. And BTW, in that case | is the correct operator (it is the element-wise or for Series).
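To make the or versus | distinction from the comments concrete, here is a small sketch. The variable names are made up for illustration. It shows that | is evaluated before in (reproducing the TypeError reported above), and that on boolean Series the roles are reversed: | works element-wise while or raises.

```python
import pandas as pd

weekend = ("Saturday", "Sunday")
ts = pd.Timestamp("2019-07-04")

# `|` binds more tightly than `in`, so Python evaluates `weekend | ts` first
try:
    _ = "Monday" in weekend | ts in []
    precedence_error = None
except TypeError as exc:
    precedence_error = str(exc)
print(precedence_error)

# With scalars, plain `or` combines the two membership tests as intended
scalar_result = ("Monday" in weekend) or (ts.date() in [])

# With boolean Series, `|` is the element-wise OR ...
a = pd.Series([True, False, False])
b = pd.Series([False, False, True])
elementwise = (a | b).tolist()

# ... while `or` asks for the truth value of a whole Series and raises
try:
    _ = a or b
    or_on_series_fails = False
except ValueError:
    or_on_series_fails = True
```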
