creating new variable and applying conditional value based on a date range with pandas dataframe

Question

New to Python and coding in general here so this should be pretty basic for most of you.

I basically created this dataframe with a Datetime index.

Here's the dataframe

df = pd.date_range(start='2018-01-01', end='2019-12-31', freq='D')

I would now like to add a new variable to my df called "vacation" with a value of 1 if the date is between 2018-06-24 and 2018-08-24 and value of 0 if it's not between those dates. How can I go about doing this? I've created a variable with a range of vacation but I'm not sure how to put these two together along with creating a new column for "vacation" in my dataframe.

vacation = pd.date_range(start = '2018-06-24', end='2018-08-24')

Thanks in advance.

Gautham Pughazhendhi · Accepted Answer · 2019-11-03 10:06:15Z


First, pd.date_range(start='2018-01-01', end='2019-12-31', freq='D') does not create a DataFrame; it creates a DatetimeIndex. You can turn it into a DataFrame by using it either as the index or as a separate column.

# Having it as an index
import numpy as np
import pandas as pd

datetime_index = pd.date_range(start='2018-01-01', end='2019-12-31', freq='D')
df = pd.DataFrame({}, index=datetime_index)
# Using numpy.where() to create the Vacation column (both end dates are included)
df['Vacation'] = np.where((df.index >= '2018-06-24') & (df.index <= '2018-08-24'), 1, 0)

Or

# Having it as a column

datetime_index = pd.date_range(start='2018-01-01', end='2019-12-31', freq='D')
df = pd.DataFrame({'Date': datetime_index})
# Using numpy.where() to create the Vacation column
df['Vacation'] = np.where((df['Date'] >= '2018-06-24') & (df['Date'] <= '2018-08-24'), 1, 0)
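
As a quick sanity check (a sketch, not part of the original answer), the two variants above should flag the same days. Both comparisons are inclusive, so the range 2018-06-24 to 2018-08-24 should cover 62 days. The names by_index and by_column are made up for illustration.

```python
import numpy as np
import pandas as pd

dates = pd.date_range(start='2018-01-01', end='2019-12-31', freq='D')

# Variant 1: dates as the index
by_index = pd.DataFrame(index=dates)
by_index['Vacation'] = np.where(
    (by_index.index >= '2018-06-24') & (by_index.index <= '2018-08-24'), 1, 0)

# Variant 2: dates as a regular column
by_column = pd.DataFrame({'Date': dates})
by_column['Vacation'] = np.where(
    (by_column['Date'] >= '2018-06-24') & (by_column['Date'] <= '2018-08-24'), 1, 0)

# Both variants mark the same number of vacation days
print(by_index['Vacation'].sum(), by_column['Vacation'].sum())
```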



2 Comments

fafz Over a year ago

Thank you. I like your solution. Now I just need to get it to work in one additional way. Suppose I have variables vacation2018 and vacation2019, a set of dates in each year. I've created these variables:

vacation2018 = pd.date_range(start = '2018-06-27', end = '2018-09-01')
vacation2019 = pd.date_range(start = '2019-06-27', end='2019-08-31')

What can I do to make my vacation variable 1 if the datetime index falls in either of those date ranges? I've tried:

df['vacation'] = np.where((df.index=vacation2018))

but that gives me a syntax error.

Gautham Pughazhendhi Over a year ago

You need not create separate date indexes such as 'vacation2018' and 'vacation2019'. Instead, you can just modify the condition as follows:

vacation_2018 = (df.index >= '2018-06-27') & (df.index <= '2018-09-01')
vacation_2019 = (df.index >= '2019-06-27') & (df.index <= '2019-08-31')
df['Vacation'] = np.where(vacation_2018 | vacation_2019, 1, 0)

If you find the answer satisfying, please upvote it.
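
If you would rather reuse the pd.date_range variables from the question, one option (a sketch, not what the reply above proposes) is to test membership with DatetimeIndex.isin. This assumes the index holds plain daily timestamps at midnight, as it does here; timestamps with a time-of-day component would not match.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(index=pd.date_range(start='2018-01-01', end='2019-12-31', freq='D'))

vacation2018 = pd.date_range(start='2018-06-27', end='2018-09-01')
vacation2019 = pd.date_range(start='2019-06-27', end='2019-08-31')

# Membership test against the combined set of vacation dates
all_vacation = vacation2018.union(vacation2019)
df['vacation'] = df.index.isin(all_vacation).astype(int)

# Cross-check against the comparison-based masks
mask_2018 = (df.index >= '2018-06-27') & (df.index <= '2018-09-01')
mask_2019 = (df.index >= '2019-06-27') & (df.index <= '2019-08-31')
assert (df['vacation'].to_numpy() == np.where(mask_2018 | mask_2019, 1, 0)).all()
```

The comparison-based masks remain the simpler choice when each vacation is one contiguous range; isin is mainly handy when the dates are an arbitrary, non-contiguous set.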

jezrael · Accepted Answer · 2019-11-03 10:09:51Z

Solution for a new DataFrame:

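import numpy as np
import pandas as pd

i = pd.date_range(start='2018-01-01', end='2018-08-26', freq='D')
# Strict comparisons: the edge dates 2018-06-24 and 2018-08-24 are excluded
m = (i > '2018-06-24') & (i < '2018-08-24')
df = pd.DataFrame({'vacation': m.astype(int)}, index=i)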

Or:

df = pd.DataFrame({'vacation':np.where(m, 1, 0)}, index=i)

print (df)
            vacation
2018-01-01         0
2018-01-02         0
2018-01-03         0
2018-01-04         0
2018-01-05         0
             ...
2018-08-22         1
2018-08-23         1
2018-08-24         0
2018-08-25         0
2018-08-26         0

[238 rows x 1 columns]

Solution for adding a new column to an existing DataFrame:

Create a mask by comparing the DatetimeIndex against both dates, chaining the two conditions with & (bitwise AND). Then convert the mask to integers (True to 1 and False to 0), or use numpy.where:

i = pd.date_range(start='2018-01-01', end='2018-08-26', freq='D')
df = pd.DataFrame({'a': 1}, index=i)

m = (df.index > '2018-06-24') & (df.index < '2018-08-24') 

df['vacation'] = m.astype(int)
#alternative
#df['vacation'] = np.where(m, 1, 0)
print (df)
            a  vacation
2018-01-01  1         0
2018-01-02  1         0
2018-01-03  1         0
2018-01-04  1         0
2018-01-05  1         0
       ..       ...
2018-08-22  1         1
2018-08-23  1         1
2018-08-24  1         0
2018-08-25  1         0
2018-08-26  1         0

[238 rows x 2 columns]

Another solution uses the DatetimeIndex with DataFrame.loc. The difference is that label slicing includes the edge values 2018-06-24 and 2018-08-24, so they are set to 1:

df['vacation'] = 0
# Select only the 'vacation' column; without it, every column in the slice is set to 1
df.loc['2018-06-24':'2018-08-24', 'vacation'] = 1
print (df)
            a  vacation
2018-01-01  1         0
2018-01-02  1         0
2018-01-03  1         0
2018-01-04  1         0
2018-01-05  1         0
       ..       ...
2018-08-22  1         1
2018-08-23  1         1
2018-08-24  1         1
2018-08-25  1         0
2018-08-26  1         0

[238 rows x 2 columns]
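
To see the edge-value difference concretely, here is a small sketch under the same setup as above, comparing strict comparisons with label slicing on one index. The column names strict and inclusive are hypothetical, chosen only for this illustration.

```python
import pandas as pd

i = pd.date_range(start='2018-01-01', end='2018-08-26', freq='D')
df = pd.DataFrame(index=i)

# Strict comparisons exclude both edge dates
df['strict'] = ((df.index > '2018-06-24') & (df.index < '2018-08-24')).astype(int)

# Label slicing with .loc includes both edge dates
df['inclusive'] = 0
df.loc['2018-06-24':'2018-08-24', 'inclusive'] = 1

# Show only the two edge dates, then the number of flagged days per column
edges = pd.to_datetime(['2018-06-24', '2018-08-24'])
print(df.loc[edges])
print(df['strict'].sum(), df['inclusive'].sum())
```

The two columns should differ only on the two edge dates, giving 60 flagged days for strict and 62 for inclusive. Since the question asks for dates "between" 2018-06-24 and 2018-08-24, the inclusive form (or >= and <= in the mask) is probably what was intended.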
