My goal is to take one record that has a start date column and an end date column, then create one record for each day between the two dates, inclusive.

import pandas as pd
df = pd.DataFrame({'START_DATE': ['8/16/2021'], 'END_DATE': ['8/28/2021'], 'DAYS_BETWEEN': [13], 'NAME': ['LOCATION1'], 'TOTAL_AMT': [1000]})

The transformed df would have an extra field, DATE_VALUE, and 13 records, one for each day from the start date through the end date. Apart from DATE_VALUE, all other fields can stay the same in every record.
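As a quick sanity check on the expected row count, here is a minimal sketch that counts the days from the start date through the end date with both endpoints included. The variable names are only illustrative, and the explicit month/day/year format is an assumption based on the sample values.

```python
import pandas as pd

# Parse the sample dates; month/day/year order is assumed from the example.
start = pd.to_datetime("8/16/2021", format="%m/%d/%Y")
end = pd.to_datetime("8/28/2021", format="%m/%d/%Y")

# Both endpoints count, so add 1 to the difference in days.
days_between = (end - start).days + 1

# pd.date_range includes both endpoints by default.
dates = pd.date_range(start=start, end=end, freq="D")

print(days_between, len(dates))
```

Both numbers should agree with the DAYS_BETWEEN value in the sample record.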

2 Answers

If your df can have multiple rows and you need new records for each day between the two dates in each row, build a date range for each row with pd.date_range, then expand each range into multiple rows (one day per row) with .explode():

df['DATE_VALUE'] = df.apply(lambda x: pd.date_range(start=x['START_DATE'], end=x['END_DATE']), axis=1)

df = df.explode('DATE_VALUE').reset_index(drop=True)

Result:

print(df)

   START_DATE   END_DATE  DAYS_BETWEEN       NAME  TOTAL_AMT DATE_VALUE
0   8/16/2021  8/28/2021            13  LOCATION1       1000 2021-08-16
1   8/16/2021  8/28/2021            13  LOCATION1       1000 2021-08-17
2   8/16/2021  8/28/2021            13  LOCATION1       1000 2021-08-18
3   8/16/2021  8/28/2021            13  LOCATION1       1000 2021-08-19
4   8/16/2021  8/28/2021            13  LOCATION1       1000 2021-08-20
5   8/16/2021  8/28/2021            13  LOCATION1       1000 2021-08-21
6   8/16/2021  8/28/2021            13  LOCATION1       1000 2021-08-22
7   8/16/2021  8/28/2021            13  LOCATION1       1000 2021-08-23
8   8/16/2021  8/28/2021            13  LOCATION1       1000 2021-08-24
9   8/16/2021  8/28/2021            13  LOCATION1       1000 2021-08-25
10  8/16/2021  8/28/2021            13  LOCATION1       1000 2021-08-26
11  8/16/2021  8/28/2021            13  LOCATION1       1000 2021-08-27
12  8/16/2021  8/28/2021            13  LOCATION1       1000 2021-08-28
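To illustrate the multi-row case, here is a small self-contained sketch of the same approach on two rows with different date spans. The second row (LOCATION2, its dates and amount) is made up purely for illustration.

```python
import pandas as pd

# Hypothetical two-row input; the LOCATION2 row is invented for this example.
df = pd.DataFrame({
    "START_DATE": ["8/16/2021", "9/1/2021"],
    "END_DATE": ["8/28/2021", "9/3/2021"],
    "NAME": ["LOCATION1", "LOCATION2"],
    "TOTAL_AMT": [1000, 500],
})

# One date range per row, stored as a single cell value.
df["DATE_VALUE"] = df.apply(
    lambda x: pd.date_range(start=x["START_DATE"], end=x["END_DATE"]), axis=1
)

# One output row per day; the other columns are copied as-is.
df = df.explode("DATE_VALUE").reset_index(drop=True)

# Number of expanded rows per location.
counts = df.groupby("NAME").size()
print(counts)
```

Each input row expands independently, so rows with longer spans simply produce more output rows.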

8 Comments

I'm getting an error: AttributeError: 'DataFrame' object has no attribute 'explode'. I tried upgrading pandas, then downgrading, but I get the same error every time.
@Tom What version did you have before the upgrade? And why downgrade again?
I didn't check before I upgraded, but the newer version is 1.1.5, and that's what I'm currently using.
@Tom So with version 1.1.5 you should be able to use .explode, right?
Yes I'm going to update it. I'll let you know later if it works. Thanks for the help!
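For context, DataFrame.explode was added in pandas 0.25.0, so this AttributeError usually suggests the running interpreter is importing an older pandas than expected; printing pd.__version__ in the same session is one way to check. If upgrading is not an option, the following is a rough sketch of an alternative that avoids .explode by repeating rows with Index.repeat. The names n_days, offset and out are illustrative, and the month/day/year format is assumed from the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "START_DATE": ["8/16/2021"],
    "END_DATE": ["8/28/2021"],
    "DAYS_BETWEEN": [13],
    "NAME": ["LOCATION1"],
    "TOTAL_AMT": [1000],
})

# Parse the text dates; month/day/year order is assumed from the example.
start = pd.to_datetime(df["START_DATE"], format="%m/%d/%Y")
end = pd.to_datetime(df["END_DATE"], format="%m/%d/%Y")

# Days per row, counting both endpoints.
n_days = ((end - start).dt.days + 1).to_numpy()

# Repeat each row once per day in its span.
out = df.loc[df.index.repeat(n_days)]

# Position of each repeated row within its original row: 0, 1, 2, ...
offset = out.groupby(level=0).cumcount().to_numpy()
out = out.reset_index(drop=True)

# Start date of the original row plus the day offset.
out["DATE_VALUE"] = pd.DatetimeIndex(start.repeat(n_days)) + pd.to_timedelta(offset, unit="D")

print(out)
```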
Assuming df has only one row like in your example, try:

# create a new dataframe with the same row repeated 13 times
df2 = pd.concat([df]*df["DAYS_BETWEEN"].iat[0], ignore_index=True)

# create the new column using pd.date_range
df2["DATE_VALUE"] = pd.date_range(df["START_DATE"].iat[0], df["END_DATE"].iat[0], periods=df["DAYS_BETWEEN"].iat[0])

>>> df2
   START_DATE   END_DATE  DAYS_BETWEEN       NAME  TOTAL_AMT DATE_VALUE
0   8/16/2021  8/28/2021            13  LOCATION1       1000 2021-08-16
1   8/16/2021  8/28/2021            13  LOCATION1       1000 2021-08-17
2   8/16/2021  8/28/2021            13  LOCATION1       1000 2021-08-18
3   8/16/2021  8/28/2021            13  LOCATION1       1000 2021-08-19
4   8/16/2021  8/28/2021            13  LOCATION1       1000 2021-08-20
5   8/16/2021  8/28/2021            13  LOCATION1       1000 2021-08-21
6   8/16/2021  8/28/2021            13  LOCATION1       1000 2021-08-22
7   8/16/2021  8/28/2021            13  LOCATION1       1000 2021-08-23
8   8/16/2021  8/28/2021            13  LOCATION1       1000 2021-08-24
9   8/16/2021  8/28/2021            13  LOCATION1       1000 2021-08-25
10  8/16/2021  8/28/2021            13  LOCATION1       1000 2021-08-26
11  8/16/2021  8/28/2021            13  LOCATION1       1000 2021-08-27
12  8/16/2021  8/28/2021            13  LOCATION1       1000 2021-08-28

2 Comments

I like this solution, but yes, the df has more than one record. Right now I'm seeing 4 records that I want to turn into 13 records, each with a different date_value and the sum of Total_Amt across all records (1000):

START_DATE  END_DATE   DAYS_BETWEEN  NAME       TOTAL_AMT
8/16/2021   8/28/2021  13            LOCATION1  200
8/16/2021   8/28/2021  13            LOCATION1  200
8/16/2021   8/28/2021  13            LOCATION1  200
8/16/2021   8/28/2021  13            LOCATION1  400

I'm getting "ValueError: Length of values does not match length of index" on the create new column line.
If you update your question, I could see how to do it over multiple records.
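The ValueError in the comment above most likely comes from a length mismatch: repeating a four-row frame 13 times gives 52 rows, while pd.date_range with periods=13 yields only 13 values. One possible way to handle that case, assuming the four rows should first be collapsed into a single record with TOTAL_AMT summed (this is only a reading of the comment, not something the asker confirmed), is sketched below. The names keys, grouped and result are illustrative.

```python
import pandas as pd

# The four rows described in the comment.
df = pd.DataFrame({
    "START_DATE": ["8/16/2021"] * 4,
    "END_DATE": ["8/28/2021"] * 4,
    "DAYS_BETWEEN": [13] * 4,
    "NAME": ["LOCATION1"] * 4,
    "TOTAL_AMT": [200, 200, 200, 400],
})

# Assumption: rows sharing the same span and name are merged, amounts summed.
keys = ["START_DATE", "END_DATE", "DAYS_BETWEEN", "NAME"]
grouped = df.groupby(keys, as_index=False)["TOTAL_AMT"].sum()

# One date range per merged row, then one output row per day.
grouped["DATE_VALUE"] = grouped.apply(
    lambda x: pd.date_range(start=x["START_DATE"], end=x["END_DATE"]), axis=1
)
result = grouped.explode("DATE_VALUE").reset_index(drop=True)

print(result)
```

If each day should instead carry a share of the total rather than the full 1000, TOTAL_AMT would need to be divided by DAYS_BETWEEN after the expansion; the comment does not say which is intended.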
