New to Python and coding in general here so this should be pretty basic for most of you.

I basically created this dataframe with a Datetime index.

Here's the dataframe

df = pd.date_range(start='2018-01-01', end='2019-12-31', freq='D')

I would now like to add a new variable to my df called "vacation" with a value of 1 if the date is between 2018-06-24 and 2018-08-24 and value of 0 if it's not between those dates. How can I go about doing this? I've created a variable with a range of vacation but I'm not sure how to put these two together along with creating a new column for "vacation" in my dataframe.

vacation = pd.date_range(start = '2018-06-24', end='2018-08-24')

Thanks in advance.

2 Answers

First, pd.date_range(start='2018-01-01', end='2019-12-31', freq='D') does not create a DataFrame; it creates a DatetimeIndex. You can then turn it into a DataFrame by using it either as the index or as a separate column.
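
To make the distinction concrete, here is a minimal sketch that checks the type returned by pd.date_range and builds both kinds of DataFrame. The names dates, df_indexed and df_column are illustrative only and do not come from the question.

```python
import pandas as pd

# pd.date_range returns a DatetimeIndex, not a DataFrame
dates = pd.date_range(start='2018-01-01', end='2019-12-31', freq='D')
print(type(dates).__name__)

# Option 1: an empty DataFrame that uses the dates as its index
df_indexed = pd.DataFrame(index=dates)

# Option 2: a DataFrame that stores the dates in an ordinary column
df_column = pd.DataFrame({'Date': dates})

print(df_indexed.shape, df_column.shape)
```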

# Having it as an index

import numpy as np
import pandas as pd

datetime_index = pd.date_range(start='2018-01-01', end='2019-12-31', freq='D')
df = pd.DataFrame({}, index=datetime_index)
# Using numpy.where() to create the Vacation column
df['Vacation'] = np.where((df.index >= '2018-06-24') & (df.index <= '2018-08-24'), 1, 0)

Or

# Having it as a column

datetime_index = pd.date_range(start='2018-01-01', end='2019-12-31', freq='D')
df = pd.DataFrame({'Date': datetime_index})
# Using numpy.where() to create the Vacation column
df['Vacation'] = np.where((df['Date'] >= '2018-06-24') & (df['Date'] <= '2018-08-24'), 1, 0)

Note: Displaying only the first five rows of the dataframe df.
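
When the dates live in a column, Series.between may be a slightly shorter way to express the same inclusive range check. This is only a sketch of an alternative, not part of the answer above; it assumes the same 'Date' column and relies on between including both endpoints by default.

```python
import pandas as pd

dates = pd.date_range(start='2018-01-01', end='2019-12-31', freq='D')
df = pd.DataFrame({'Date': dates})

# between() is inclusive on both ends by default, so it matches
# (Date >= '2018-06-24') & (Date <= '2018-08-24')
in_vacation = df['Date'].between('2018-06-24', '2018-08-24')

# Convert the boolean mask to 1/0
df['Vacation'] = in_vacation.astype(int)

print(df['Vacation'].sum())
```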

2 Comments

Thank you. I like your solution. Now I'd just like to get it to work in one additional way. Suppose I have variables vacation2018 and vacation2019, a set of dates in both years. I've created the variables:

vacation2018 = pd.date_range(start = '2018-06-27', end = '2018-09-01')
vacation2019 = pd.date_range(start = '2019-06-27', end='2019-08-31')

What can I do to make my vacation variable 1 if the datetime index falls in either of those date ranges? I've tried:

df['vacation'] = np.where((df.index=vacation2018))

but that's giving me a syntax error.

You need not create separate date indexes such as 'vacation2018' and 'vacation2019'; instead, you can just modify the condition as follows:

vacation_2018 = (df.index >= '2018-06-27') & (df.index <= '2018-09-01')
vacation_2019 = (df.index >= '2019-06-27') & (df.index <= '2019-08-31')
df['Vacation'] = np.where(vacation_2018 | vacation_2019, 1, 0)

If you find the answer satisfying, please upvote it.
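
If the vacation2018 and vacation2019 ranges from the comment already exist, another option is to test membership with Index.isin instead of rewriting the comparisons. The sketch below is one possible approach rather than the answerer's method; all_vacation is a name introduced here for illustration.

```python
import pandas as pd

dates = pd.date_range(start='2018-01-01', end='2019-12-31', freq='D')
df = pd.DataFrame(index=dates)

# The two ranges from the comment
vacation2018 = pd.date_range(start='2018-06-27', end='2018-09-01')
vacation2019 = pd.date_range(start='2019-06-27', end='2019-08-31')

# Combine both ranges into one DatetimeIndex (illustrative name)
all_vacation = vacation2018.union(vacation2019)

# isin() returns a boolean array: True where the index date is a vacation day
df['vacation'] = df.index.isin(all_vacation).astype(int)

print(df['vacation'].sum())
```

Because date_range includes both endpoints, this marks the same days as the >= and <= conditions above.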

Solution for a new DataFrame:

i = pd.date_range(start='2018-01-01', end='2018-08-26', freq='D')

m = (i > '2018-06-24') & (i < '2018-08-24') 
df = pd.DataFrame({'vacation': m.astype(int)}, index=i)

Or:

df = pd.DataFrame({'vacation':np.where(m, 1, 0)}, index=i)

print (df)
            vacation
2018-01-01         0
2018-01-02         0
2018-01-03         0
2018-01-04         0
2018-01-05         0
             ...
2018-08-22         1
2018-08-23         1
2018-08-24         0
2018-08-25         0
2018-08-26         0

[238 rows x 1 columns]

Solution for adding a new column to an existing DataFrame:

Create a boolean mask by comparing the DatetimeIndex against both dates, chaining the comparisons with & (bitwise AND), then convert the mask to integers (True to 1 and False to 0) or use numpy.where:

i = pd.date_range(start='2018-01-01', end='2018-08-26', freq='D')
df = pd.DataFrame({'a': 1}, index=i)

m = (df.index > '2018-06-24') & (df.index < '2018-08-24') 

df['vacation'] = m.astype(int)
#alternative
#df['vacation'] = np.where(m, 1, 0)
print (df)
            a  vacation
2018-01-01  1         0
2018-01-02  1         0
2018-01-03  1         0
2018-01-04  1         0
2018-01-05  1         0
       ..       ...
2018-08-22  1         1
2018-08-23  1         1
2018-08-24  1         0
2018-08-25  1         0
2018-08-26  1         0

[238 rows x 2 columns]

Another solution uses the DatetimeIndex with DataFrame.loc. The difference is that the edge values 2018-06-24 and 2018-08-24 are included (set to 1):

df['vacation'] = 0
# Name the column so that only 'vacation' is overwritten, not every column
df.loc['2018-06-24':'2018-08-24', 'vacation'] = 1
print (df)
            a  vacation
2018-01-01  1         0
2018-01-02  1         0
2018-01-03  1         0
2018-01-04  1         0
2018-01-05  1         0
       ..       ...
2018-08-22  1         1
2018-08-23  1         1
2018-08-24  1         1
2018-08-25  1         0
2018-08-26  1         0

[238 rows x 2 columns]
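
The effect of the boundary handling can be checked by counting the flagged days. The following sketch compares strict comparisons, non-strict comparisons, and .loc label slicing on the same index; the column names vacation_strict, vacation_inclusive and vacation_loc are made up for this comparison.

```python
import pandas as pd

i = pd.date_range(start='2018-01-01', end='2018-08-26', freq='D')
df = pd.DataFrame({'a': 1}, index=i)

# Strict comparisons leave out both edge dates
strict = (df.index > '2018-06-24') & (df.index < '2018-08-24')
df['vacation_strict'] = strict.astype(int)

# Non-strict comparisons include both edge dates
inclusive = (df.index >= '2018-06-24') & (df.index <= '2018-08-24')
df['vacation_inclusive'] = inclusive.astype(int)

# Label slicing with .loc is inclusive on both ends; the column is
# selected explicitly so no other column is touched
df['vacation_loc'] = 0
df.loc['2018-06-24':'2018-08-24', 'vacation_loc'] = 1

print(df[['vacation_strict', 'vacation_inclusive', 'vacation_loc']].sum())
```

The strict version should flag two fewer days than the other two, which is worth keeping in mind when "between" is meant to include the first and last day.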
