
I have created a dataframe with a range of dates from 2018-08-01 to today and am trying to assign a month_start_date to each date (e.g. 2018/08/04 would become 2018/08/01).

I have been able to get the day of the date into month_start_date, but I'm really just trying to replace the day in the date column with 1 for all dates.

import pandas as pd
from datetime import datetime

datelist = pd.date_range(start='2018-08-01', end=datetime.today())

df_columns = ['date']
df = pd.DataFrame(datelist, columns = df_columns)

df['month_start_date'] = df['date'].dt.day
print(df)
          date  month_start_date
0   2018-08-01                 1
1   2018-08-02                 2
2   2018-08-03                 3
3   2018-08-04                 4
4   2018-08-05                 5

2 Answers

df['month_start_date'] = pd.to_datetime(df['date']).apply(lambda x: x.replace(day=1))
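For larger frames, a vectorized version may be preferable to apply with a lambda. The following is only a sketch, assuming the dates sit at midnight as they do in the question; the column name month_start_alt and the short date range are made up for illustration. Option A goes through a monthly period and back to a timestamp, option B subtracts (day - 1) days from each date.

```python
import pandas as pd

# Small illustrative frame; the question uses a longer range ending today.
df = pd.DataFrame({"date": pd.date_range("2018-08-01", "2018-09-02")})

# Option A: convert to a monthly period, then back to the period's start timestamp.
df["month_start_date"] = df["date"].dt.to_period("M").dt.to_timestamp()

# Option B: subtract (day - 1) days, so day 1 stays put and day 4 moves back 3 days.
# "month_start_alt" is a hypothetical column name used only for comparison.
df["month_start_alt"] = df["date"] - pd.to_timedelta(df["date"].dt.day - 1, unit="D")

print(df.head())
```

Note that option B would keep any time-of-day component, while option A drops it, so the two only agree when the dates are at midnight.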


You can do this more generally with the pandas.tseries.offsets module. In this example you can calculate the month start dates using MonthBegin:

import datetime
import pandas as pd

datelist = pd.date_range(start='2018-08-01',end=datetime.datetime.today())
month_start_list = (datelist + datetime.timedelta(1)) + pd.tseries.offsets.MonthBegin(n=-1)
df = pd.DataFrame({"date": datelist, "month_start": month_start_list})
print(df)
          date month_start
0   2018-08-01  2018-08-01
1   2018-08-02  2018-08-01
2   2018-08-03  2018-08-01
3   2018-08-04  2018-08-01
4   2018-08-05  2018-08-01
..         ...         ...
892 2021-01-09  2021-01-01
893 2021-01-10  2021-01-01
894 2021-01-11  2021-01-01
895 2021-01-12  2021-01-01
896 2021-01-13  2021-01-01

[897 rows x 2 columns]

4 Comments

This is interesting and can be helpful for sure, but there are a couple of things maybe you can help me address: (1) I noticed that this is almost producing the desired result, but it is actually adding a month (see that the dataframe starts at 2018-09-01, when it should read 2018-08-01). (2) I was looking to add a new column with this value so that each individual date is assigned a month_start_date.
Here are a few things that might help: 1. MonthBegin takes an input, n, which is the number of months to roll backwards or forwards. By default it takes a value of 1, but to roll back to the start of the current month you can set n=-1. You will also notice there is an edge case at the start of the month: pandas moves every date, so a date that is already the first of the month would roll back to the prior month. To avoid this you can add pd.Timedelta(days=1) first. 2. The addition doesn't modify the original input, so you can just assign the result to a separate column. I'll edit my code above with the changes.
Actually, to keep your types consistent, instead of pd.Timedelta you should use datetime.timedelta(1)
That was super helpful, thank you so much for the clarification!
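The behaviour discussed in these comments can be checked on single timestamps. This is a minimal sketch rather than part of the original answer; the variable names are illustrative, and the rollback call at the end is an alternative for scalar timestamps that the thread does not mention.

```python
import pandas as pd

first = pd.Timestamp("2018-08-01")  # already a month start
mid = pd.Timestamp("2018-08-04")    # mid-month

fwd = pd.offsets.MonthBegin()       # n defaults to 1
back = pd.offsets.MonthBegin(n=-1)

mid_fwd = mid + fwd        # default n=1 rolls forward to the next month start
mid_back = mid + back      # n=-1 rolls back to the start of the current month
first_back = first + back  # edge case: a month start moves back a full month
first_fixed = first + pd.Timedelta(days=1) + back  # shift by one day first

# Alternative for a single timestamp: rollback leaves a month start unchanged.
first_rolled = pd.offsets.MonthBegin().rollback(first)

print(mid_fwd, mid_back, first_back, first_fixed, first_rolled)
```

Here first_back lands on 2018-07-01, which is the edge case described above, while first_fixed and first_rolled both stay on 2018-08-01.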
