Add column values to a dataframe based on date range of another dataframe

Question

I have 2 dataframes:

df1 with sales data:

key | date       | sales
1   | 2020-10-16 | 100
1   | 2020-10-17 | 150
1   | 2020-10-19 | 180

2   | 2019-11-01 | 26
2   | 2019-11-02 | 27
2   | 2019-11-05 | 28

df2 with advertisement campaign data:

key | sale_start | sale_end   | stock
1   | 2020-10-16 | 2020-10-18 | 1000
1   | 2020-10-17 | 2020-10-20 | 1500
1   | 2020-10-20 | 2020-10-31 | 1800

2   | 2019-11-01 | 2019-11-03 | 260
2   | 2019-11-03 | 2019-11-05 | 270
2   | 2019-11-05 | 2019-11-15 | 280
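To try the answers below, the two example tables can be rebuilt as DataFrames. This is a minimal sketch: the names df1 and df2 follow the question, and parsing the date strings with pd.to_datetime (so the columns become datetime64, as the asker confirms in the comments) is an assumption about how the real data is loaded.

```python
import pandas as pd

# Sales per key and day (the df1 table from the question)
df1 = pd.DataFrame({
    "key": [1, 1, 1, 2, 2, 2],
    "date": pd.to_datetime(["2020-10-16", "2020-10-17", "2020-10-19",
                            "2019-11-01", "2019-11-02", "2019-11-05"]),
    "sales": [100, 150, 180, 26, 27, 28],
})

# Campaigns with inclusive start and end dates (the df2 table from the question)
df2 = pd.DataFrame({
    "key": [1, 1, 1, 2, 2, 2],
    "sale_start": pd.to_datetime(["2020-10-16", "2020-10-17", "2020-10-20",
                                  "2019-11-01", "2019-11-03", "2019-11-05"]),
    "sale_end": pd.to_datetime(["2020-10-18", "2020-10-20", "2020-10-31",
                                "2019-11-03", "2019-11-05", "2019-11-15"]),
    "stock": [1000, 1500, 1800, 260, 270, 280],
})

print(df1.dtypes)
print(df2.dtypes)
```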

I need to bring the "stock" values from "df2" into "df1" so that every sales day in "df1" has its stock number as a column.
Then I need the percentage of products sold relative to the stock for that day.

Some campaigns overlap, so the "stock" needs to be summed for the overlapping days.

End result should be:

key | date       | sales | stock              | sales_stock_%
1   | 2020-10-16 | 100   | 1000               | 10
1   | 2020-10-17 | 150   | 2500 (1000 + 1500) | 6
1   | 2020-10-19 | 180   | 1500               | 12

2   | 2019-11-01 | 26    | 260                | 10
2   | 2019-11-02 | 27    | 260                | 10.38461538461538
2   | 2019-11-05 | 28    | 550 (270 + 280)    | 5.090909090909091

The last column is easy, but how can I add the stock to df1?
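As the question says, the last column is straightforward once stock is in place: divide sales by stock and scale to a percentage. A minimal sketch, assuming a frame that already holds sales and the summed stock (the values are copied from the expected result table; the frame name df is illustrative).

```python
import pandas as pd

# Sales together with the already-summed stock, as in the expected result
df = pd.DataFrame({
    "key": [1, 1, 1, 2, 2, 2],
    "sales": [100, 150, 180, 26, 27, 28],
    "stock": [1000, 2500, 1500, 260, 260, 550],
})

# Share of the stock sold on that day, in percent
df["sales_stock_%"] = df["sales"] / df["stock"] * 100
print(df)
```

If a sales day falls outside every campaign, its stock would be 0 and this division yields inf, which you may prefer to replace with NaN.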

Are sale_start and sale_end datetimes? Also, are the dates inclusive, so does the first row of df1 cover all three days? How do the dates overlap?
– Paul Brennan, Commented Nov 19, 2020 at 16:40
sale_start and sale_end are both dtype: datetime64[ns]. The first row of df1 covers only the single day in column "date" (here "2020-10-16"); the same holds for all other rows in df1. The overlap happens in df2: e.g. row 1 says the campaign days are "2020-10-16", "2020-10-17", "2020-10-18", while row 2 says its campaign runs on "2020-10-17", "2020-10-18", "2020-10-19", "2020-10-20". So both campaigns overlap on "2020-10-17" and "2020-10-18".
– Vega, Commented Nov 19, 2020 at 16:45
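The clarification that both endpoints are inclusive can be made concrete by expanding each campaign into one row per day and summing per key and day; overlap days such as 2020-10-17 and 2020-10-18 then show the combined stock. This is an alternative sketch, not taken from either answer, using only key 1's campaigns from the question as sample data; the names rows, daily and daily_stock are illustrative.

```python
import pandas as pd

# Key 1's campaigns from the question (inclusive start and end dates)
df2 = pd.DataFrame({
    "key": [1, 1, 1],
    "sale_start": pd.to_datetime(["2020-10-16", "2020-10-17", "2020-10-20"]),
    "sale_end": pd.to_datetime(["2020-10-18", "2020-10-20", "2020-10-31"]),
    "stock": [1000, 1500, 1800],
})

# One row per campaign per day; pd.date_range includes both endpoints by default
rows = [
    (key, day, stock)
    for key, start, end, stock in df2[["key", "sale_start", "sale_end", "stock"]].itertuples(index=False)
    for day in pd.date_range(start, end, freq="D")
]
daily = pd.DataFrame(rows, columns=["key", "date", "stock"])

# Campaigns running on the same day are summed, so overlap days get the combined stock
daily_stock = daily.groupby(["key", "date"])["stock"].sum()
print(daily_stock.head(6))
```

Joining this onto the sales table with `df1.merge(daily_stock.reset_index(), on=["key", "date"], how="left")` would give the stock per sales day. The trade-off is one row per campaign day, which is cheap for short campaigns but grows with campaign length.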

Bricam · Accepted Answer · 2020-11-19 19:17:04Z

1

Assuming you still haven't found a solution, here is a corrected version of what @Paul Brennan posted:

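for index, row in df1.iterrows():
    # Sum the stock of same-key campaigns whose inclusive range contains this sales date
    mask = (df2["key"] == row["key"]) & (df2["sale_start"] <= row["date"]) & (df2["sale_end"] >= row["date"])
    df1.at[index, "stock"] = df2.loc[mask, "stock"].sum()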
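A loop-free alternative to the iterrows approach (not part of this answer) is to treat the problem as a range join: merge on key, keep only the pairs where the sales date falls inside the campaign's inclusive range, and sum the stock of the surviving pairs per sales row. This is a sketch assuming the column names and sample data from the question; the names pairs and active are illustrative. The merge materialises every sales/campaign pair within a key, so it can become memory-hungry when a key has many rows on both sides.

```python
import pandas as pd

df1 = pd.DataFrame({
    "key": [1, 1, 1, 2, 2, 2],
    "date": pd.to_datetime(["2020-10-16", "2020-10-17", "2020-10-19",
                            "2019-11-01", "2019-11-02", "2019-11-05"]),
    "sales": [100, 150, 180, 26, 27, 28],
})
df2 = pd.DataFrame({
    "key": [1, 1, 1, 2, 2, 2],
    "sale_start": pd.to_datetime(["2020-10-16", "2020-10-17", "2020-10-20",
                                  "2019-11-01", "2019-11-03", "2019-11-05"]),
    "sale_end": pd.to_datetime(["2020-10-18", "2020-10-20", "2020-10-31",
                                "2019-11-03", "2019-11-05", "2019-11-15"]),
    "stock": [1000, 1500, 1800, 260, 270, 280],
})

# Pair every sales row with every campaign of the same key; reset_index keeps df1's row id as "index"
pairs = df1.reset_index().merge(df2, on="key", how="left")

# Keep only campaigns whose inclusive range contains the sales date
active = pairs[(pairs["sale_start"] <= pairs["date"]) & (pairs["date"] <= pairs["sale_end"])]

# Sum overlapping campaigns per sales row; rows without any active campaign get 0
df1["stock"] = active.groupby("index")["stock"].sum().reindex(df1.index, fill_value=0)
df1["sales_stock_%"] = df1["sales"] / df1["stock"] * 100
print(df1)
```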


Paul Brennan · Accepted Answer · 2020-11-19 17:08:32Z

0

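for row in df1.itertuples():
    # Sum the stock of same-key campaigns whose inclusive range contains this sale date
    mask = (df2.key == row.key) & (df2.sale_start <= row.date) & (df2.sale_end >= row.date)
    df1.at[row.Index, 'stock'] = df2.loc[mask, 'stock'].sum()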

Sorry for the mess; this is not very Pythonic. Here is the plan: for each sale, get the stock for that sale. The stock is the sum of the stock of all campaigns that are active on the sale date.
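The same plan (for each sale, sum the stock of every campaign active on that date) can also be written without a Python loop using NumPy broadcasting: compare every sales date against every campaign range at once, then multiply the resulting boolean matrix by the stock vector. A sketch assuming the question's column names and sample data; it also requires the campaign's key to match the sales row's key, and it builds an n_sales × n_campaigns matrix, so it suits moderate table sizes.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "key": [1, 1, 1, 2, 2, 2],
    "date": pd.to_datetime(["2020-10-16", "2020-10-17", "2020-10-19",
                            "2019-11-01", "2019-11-02", "2019-11-05"]),
    "sales": [100, 150, 180, 26, 27, 28],
})
df2 = pd.DataFrame({
    "key": [1, 1, 1, 2, 2, 2],
    "sale_start": pd.to_datetime(["2020-10-16", "2020-10-17", "2020-10-20",
                                  "2019-11-01", "2019-11-03", "2019-11-05"]),
    "sale_end": pd.to_datetime(["2020-10-18", "2020-10-20", "2020-10-31",
                                "2019-11-03", "2019-11-05", "2019-11-15"]),
    "stock": [1000, 1500, 1800, 260, 270, 280],
})

# Sales as a column vector (n_sales, 1), campaigns as a row vector (1, n_campaigns)
date = df1["date"].to_numpy()[:, None]
key = df1["key"].to_numpy()[:, None]
start = df2["sale_start"].to_numpy()[None, :]
end = df2["sale_end"].to_numpy()[None, :]
campaign_key = df2["key"].to_numpy()[None, :]

# active[i, j] is True when campaign j (same key) runs on sales day i, both ends inclusive
active = (key == campaign_key) & (start <= date) & (date <= end)

# Each sales day's stock is the sum of the stock of its active campaigns
df1["stock"] = active @ df2["stock"].to_numpy()
print(df1)
```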


2 Comments

Vega Over a year ago

Thanks for your answer, but I get "'Series' object has no attribute 'stock'"?

Vega Over a year ago

Then I get 'Series' object has no attribute 'Index'? Both df1 and df2 have the default index. And there is a ] missing before .sum(.
