I have 2 dataframes:

df1 with sales data:

key | date       | sales
1   | 2020-10-16 | 100
1   | 2020-10-17 | 150
1   | 2020-10-19 | 180

2   | 2019-11-01 | 26
2   | 2019-11-02 | 27
2   | 2019-11-05 | 28

df2 with advertisement campaign data:

key | sale_start | sale_end   | stock
1   | 2020-10-16 | 2020-10-18 | 1000
1   | 2020-10-17 | 2020-10-20 | 1500
1   | 2020-10-20 | 2020-10-31 | 1800

2   | 2019-11-01 | 2019-11-03 | 260
2   | 2019-11-03 | 2019-11-05 | 270
2   | 2019-11-05 | 2019-11-15 | 280
  • I need to get the "stock" number from df2 into df1, so that every sales day in df1 has the stock number as a column.
  • Then I need the percentage of sold products relative to stock for that day.

There are some overlapping campaigns, so the "stock" needs to be summed for those overlapping days.

End result should be:

key | date       | sales | stock              | sales_stock_%
1   | 2020-10-16 | 100   | 1000               | 10
1   | 2020-10-17 | 150   | 2500 (1000 + 1500) | 6
1   | 2020-10-19 | 180   | 1500               | 12

2   | 2019-11-01 | 26    | 260                | 10
2   | 2019-11-02 | 27    | 260                | 10.38461538461538
2   | 2019-11-05 | 28    | 550 (270 + 280)    | 5.090909090909091

The last column is easy, but how can I add the stock to df1?

  • Are sale_start and sale_end datetimes? Also are the dates inclusive, so does the first row of df1 cover all three days? How do the dates overlap? Commented Nov 19, 2020 at 16:40
  • sale_start and sale_end are both dtype datetime64[ns]. Each row of df1 covers only the single day in its "date" column (e.g. "2020-10-16" for the first row). The overlap happens in df2: row 1 says the campaign runs on "2020-10-16", "2020-10-17", "2020-10-18", while row 2 says the campaign runs on "2020-10-17", "2020-10-18", "2020-10-19", "2020-10-20". So both campaigns overlap on "2020-10-17" and "2020-10-18". Commented Nov 19, 2020 at 16:45

2 Answers

Assuming you still haven't found the answer, here is a corrected version of what @Paul Brennan posted:

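for index, row in df1.iterrows():
    # Campaigns with the same key whose (inclusive) date range covers this sales day
    active = (df2["key"] == row["key"]) & (df2["sale_start"] <= row["date"]) & (df2["sale_end"] >= row["date"])
    df1.at[index, "stock"] = df2.loc[active, "stock"].sum()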
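For reference, here is a self-contained sketch of the loop above, run against the sample data from the question. It assumes the date columns are already datetime64[ns] (as stated in the comments) and that campaigns should only count for rows with the same key; the `active` name is just for illustration.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "key": [1, 1, 1, 2, 2, 2],
    "date": pd.to_datetime(["2020-10-16", "2020-10-17", "2020-10-19",
                            "2019-11-01", "2019-11-02", "2019-11-05"]),
    "sales": [100, 150, 180, 26, 27, 28],
})
df2 = pd.DataFrame({
    "key": [1, 1, 1, 2, 2, 2],
    "sale_start": pd.to_datetime(["2020-10-16", "2020-10-17", "2020-10-20",
                                  "2019-11-01", "2019-11-03", "2019-11-05"]),
    "sale_end": pd.to_datetime(["2020-10-18", "2020-10-20", "2020-10-31",
                                "2019-11-03", "2019-11-05", "2019-11-15"]),
    "stock": [1000, 1500, 1800, 260, 270, 280],
})

# Create the column up front so every row has a defined value
df1["stock"] = 0

for index, row in df1.iterrows():
    # Assumption: a campaign counts only if its key matches and its
    # inclusive [sale_start, sale_end] range contains the sales date
    active = (
        (df2["key"] == row["key"])
        & (df2["sale_start"] <= row["date"])
        & (df2["sale_end"] >= row["date"])
    )
    df1.at[index, "stock"] = df2.loc[active, "stock"].sum()

# Percentage of sold products relative to the stock available that day
df1["sales_stock_%"] = df1["sales"] / df1["stock"] * 100
print(df1)
```

The loop is easy to read but runs one filter over df2 per sales row, so it may become slow on large frames.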

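for index, row in df1.iterrows():
    # Sum the stock of all df2 campaigns with the same key that are running on this sale date
    running = (df2['key'] == row['key']) & (df2.sale_start <= row['date']) & (df2.sale_end >= row['date'])
    df1.at[index, 'stock'] = df2[running].stock.sum()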

Sorry for the mess, this is not very pythonic. Here is the plan: for each sale, look up the stock for that sale. The stock is the sum of the stock of all campaigns that are running on the sale date.
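The same plan can also be sketched without an explicit loop, as a possible alternative rather than a definitive solution: join every sale to every campaign with the same key, keep the pairs where the date falls inside the campaign, and sum the stock per sale. This assumes the sample data from the question, inclusive date ranges, and unique (key, date) pairs in df1; the names `merged`, `in_range`, `stock_per_day` and `result` are illustrative.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "key": [1, 1, 1, 2, 2, 2],
    "date": pd.to_datetime(["2020-10-16", "2020-10-17", "2020-10-19",
                            "2019-11-01", "2019-11-02", "2019-11-05"]),
    "sales": [100, 150, 180, 26, 27, 28],
})
df2 = pd.DataFrame({
    "key": [1, 1, 1, 2, 2, 2],
    "sale_start": pd.to_datetime(["2020-10-16", "2020-10-17", "2020-10-20",
                                  "2019-11-01", "2019-11-03", "2019-11-05"]),
    "sale_end": pd.to_datetime(["2020-10-18", "2020-10-20", "2020-10-31",
                                "2019-11-03", "2019-11-05", "2019-11-15"]),
    "stock": [1000, 1500, 1800, 260, 270, 280],
})

# Pair every sales row with every campaign that shares its key
merged = df1.merge(df2, on="key")

# Keep only pairs where the sales date lies within the campaign (inclusive)
in_range = merged["date"].between(merged["sale_start"], merged["sale_end"])

# Sum the stock of all campaigns running on each (key, date)
stock_per_day = (
    merged[in_range]
    .groupby(["key", "date"], as_index=False)["stock"]
    .sum()
)

# Attach the summed stock back to the sales rows; a left join keeps
# sales days with no running campaign (their stock would be NaN)
result = df1.merge(stock_per_day, on=["key", "date"], how="left")
result["sales_stock_%"] = result["sales"] / result["stock"] * 100
print(result)
```

The intermediate merge holds one row per sale and same-key campaign pair, so this trades memory for speed compared with the row-by-row loop.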

2 Comments

Thanks for your answer, but I get: 'Series' object has no attribute 'stock'.
Then I get: 'Series' object has no attribute 'Index'. Both df1 and df2 have the default index. And there is a ] missing before .sum(.
