3

I've following two data frame df_sales and df_supply.

I want to merge the sale to supply in such a manner that my df_sales table have DATE_SUPPLY and QNT_SUPPLY from df_supply on below conditions

*Condition: DATE_SUPPLY should be recent DATE_SUPPLY of corresponding "ITEM" for corresponding "STORE", i.e, DATE_SALE <- max(df_supply[df_supply$DATE_SUPPLY <= df_sales$DATE_SALE & df_supply$STORE == df_sales$STORE & df_supply$ITEM == df_sales$ITEM,]$DATE_SUPPLY)*

It can be possible using row apply function or simply by writing loop. But I have huge dataset so don't want looping.

df_sales <- data.frame("STORE"=c(1001,1001,1001,1001,1001,1002,1002,1002,1002,1002),"ITEM"=c(13048, 13057, 13082, 13048, 13057, 13145, 13166, 13229, 13057, 13048),"DATE_SALE"=as.Date(c("1/1/2014","1/1/2014","1/2/2014","1/2/2014","1/2/2014","1/2/2014","1/3/2014","1/3/2014","1/3/2014","1/4/2014"),"%m/%d/%Y"),"QNT_SALE"=c(1,1,1,1,1,1,1,1,1,1))

df_sales

   STORE  ITEM  DATE_SALE QNT_SALE
1   1001 13048 2014-01-01        1
2   1001 13057 2014-01-01        1
3   1001 13082 2014-01-02        1
4   1001 13048 2014-01-02        1
5   1001 13057 2014-01-02        1
6   1002 13145 2014-01-02        1
7   1002 13166 2014-01-03        1
8   1002 13229 2014-01-03        1
9   1002 13057 2014-01-03        1
10  1002 13048 2014-01-04        1

df_supply <- data.frame("STORE"=c(1001,1002,1001,1001,1002,1002,1002,1002,1002,1002),"ITEM"=c(13048,13229,13057,13082,13145,13166,13229,13057,13048,13048),"DATE_SUPPLY"=as.Date(c("1/31/2013","1/31/2013","1/31/2013","1/1/2014","1/2/2014","1/2/2014","1/2/2014","1/2/2014","1/3/2014","2/1/2014"),"%m/%d/%Y"),"QNT_SUPPLY"=c(2,1,2,1,1,1,2,3,1,2))
df_supply
   STORE  ITEM DATE_SUPPLY CUM_QNT_SUPPLY
1   1001 13048 2013-01-31          2
2   1002 13229 2013-01-31          1
3   1001 13057 2013-01-31          2
4   1001 13082 2014-01-01          1
5   1002 13145 2014-01-02          1
6   1002 13166 2014-01-02          1
7   1002 13229 2014-01-02          2
8   1002 13057 2014-01-02          3
9   1002 13048 2014-01-03          1
10  1002 13048 2014-02-01          2



Output Required:
Sales Vs Supply
   STORE  ITEM  DATE_SALE QNT_SALE  DATE_SUPPLY QNT_SUPPLY
1   1001 13048 2014-01-01        1  2013-01-31          2
2   1001 13057 2014-01-01        1  2013-01-31          2
3   1001 13082 2014-01-02        1  2014-01-01          1
4   1001 13048 2014-01-02        1  2013-01-31          2
5   1001 13057 2014-01-02        1  2013-01-31          2
6   1002 13145 2014-01-03        1  2014-01-02          1
7   1002 13166 2014-01-03        1  2014-01-02          1
8   1002 13229 2014-01-03        1  2014-01-02          2
9   1002 13057 2014-01-03        1  2014-01-02          3
10  1002 13048 2014-01-04        1  2014-01-03          1

2 Answers 2

5

Using rolling joins from data.table:

require(data.table)
setkey(setDT(df_supply), STORE, ITEM, DATE_SUPPLY)
idx = df_supply[df_sales, roll=Inf, which=TRUE]
cbind(df_sales, df_supply[idx, 3:4])
#    STORE  ITEM  DATE_SALE QNT_SALE DATE_SUPPLY QNT_SUPPLY
# 1   1001 13048 2014-01-01        1  2013-01-31          2
# 2   1001 13057 2014-01-01        1  2013-01-31          2
# 3   1001 13082 2014-01-02        1  2014-01-01          1
# 4   1001 13048 2014-01-02        1  2013-01-31          2
# 5   1001 13057 2014-01-02        1  2013-01-31          2
# 6   1002 13145 2014-01-02        1  2014-01-02          1
# 7   1002 13166 2014-01-03        1  2014-01-02          1
# 8   1002 13229 2014-01-03        1  2014-01-02          2
# 9   1002 13057 2014-01-03        1  2014-01-02          3
# 10  1002 13048 2014-01-04        1  2014-01-03          1

cbind returns an entirely new object. If instead you'd like to add the new columns by reference to df_sales use := instead. There are numerous examples of using it here on SO and also explained under the new HTML vignettes.

Sign up to request clarification or add additional context in comments.

1 Comment

idx = df_supply[df_sales, roll=Inf, which=TRUE], this line is throwing error, as actual dataset has ITEM as factor. Tried to convert STORE also a factor as actual ITEM id has factor value so ITEM can't be treated as numeric. I'm facing below error: Error in bmerge(i <- shallow(i), x, leftcols, rightcols, io <- haskey(i), : x.'STORE_ID' is a factor column being joined to i.'Row' which is type 'integer'. Factor columns must join to factor or character columns.
3

Here is something you can try, using merge and relevant ordering (order):

# order the data.frames
df_sales <- df_sales[order(-df_sales$STORE, -df_sales$ITEM, df_sales$DATE_SALE, decreasing=T), ]
df_supply <- df_supply[order(-df_supply$STORE, -df_supply$ITEM, df_supply$DATE_SUPPLY, decreasing=T), ]

# merge the data.frames
res <- merge(df_sales, df_supply, by=c("STORE","ITEM"), all=T)

# keep only records with DATE_SUPPLY anterior to DATE_SALE
res <- res[with(res, DATE_SUPPLY <= DATE_SALE), ]

# remove duplicates (based on STORE, ITEM and DATE_SALE)
res <- res[!duplicated(res[, 1:3]), ]

res
   # STORE  ITEM  DATE_SALE QNT_SALE DATE_SUPPLY QNT_SUPPLY
# 1   1001 13048 2014-01-02        1  2013-01-31          2
# 2   1001 13048 2014-01-01        1  2013-01-31          2
# 3   1001 13057 2014-01-02        1  2013-01-31          2
# 4   1001 13057 2014-01-01        1  2013-01-31          2
# 5   1001 13082 2014-01-02        1  2014-01-01          1
# 7   1002 13048 2014-01-04        1  2014-01-03          1
# 8   1002 13057 2014-01-03        1  2014-01-02          3
# 9   1002 13145 2014-01-02        1  2014-01-02          1
# 10  1002 13166 2014-01-03        1  2014-01-02          1
# 11  1002 13229 2014-01-03        1  2014-01-02          2

11 Comments

I am not sure how this will perform with a different dataset where there are non-matching dates, but for this example, it works
that's why the examples must be representative ;-)
@akrun this is really a rolling join question, especially while been tagged with data.table but my attempt to solve it was unsuccessful for some reason. Did you try it?
@DavidArenburg No, I didn't try it.
I wrote a solution but can't make it populate all the fields for some reason.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.