Consider the following two data frames:

df1 <- data.frame(a = rep(1, 10), b = rep(1, 10), c = rep(1, 10))
df2 <- data.frame(company = c("a", "b", "c"), weight = c(5, 10, 20))

df1

   a b c
1  1 1 1
2  1 1 1
3  1 1 1
4  1 1 1
5  1 1 1
6  1 1 1
7  1 1 1
8  1 1 1
9  1 1 1
10 1 1 1

df2

  company weight
1       a      5
2       b     10
3       c     20

I'm looking for a solution that looks up each column name of df1 in the company column of df2 and multiplies every row of that column by the corresponding value from the weight column.

So what I want to achieve is:

df.weighted

   a.weighted b.weighted c.weighted
1           5         10         20
2           5         10         20
3           5         10         20
4           5         10         20
5           5         10         20
6           5         10         20
7           5         10         20
8           5         10         20
9           5         10         20
10          5         10         20

Does anyone have an idea?

Thank you!

1 Answer

We could expand the weights to the same length as the data (one value per cell of df1) and then multiply:

out <- setNames(df2$weight, df2$company)[col(df1)] * df1
names(out) <- paste0(names(out), ".weighted")

Another option is to multiply by a list of weights selected by column name:

df1 * split(df2$weight, df2$company)[names(df1)]

Or with match

df2$weight[match(names(df1), df2$company)][col(df1)] * df1

Or using sweep

sweep(df1[df2$company], 2, df2$weight, FUN = `*`)
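
As a minimal sketch of the match approach end to end, the example below reorders the rows of df2 (an assumption made here only to show that the lookup is by name, not by position) and also applies the ".weighted" suffix from the question. The intermediate name w is purely illustrative.

```r
df1 <- data.frame(a = rep(1, 10), b = rep(1, 10), c = rep(1, 10))
# Same weights as in the question, but deliberately listed in a different order
df2 <- data.frame(company = c("c", "a", "b"), weight = c(20, 5, 10))

# Look up each df1 column's weight by name; a column with no match would get NA
w <- df2$weight[match(names(df1), df2$company)]

# Expand the weights to one value per cell (column-major) before multiplying
df.weighted <- df1 * w[col(df1)]
names(df.weighted) <- paste0(names(df1), ".weighted")

head(df.weighted, 3)
```

By contrast, the first snippet indexes the named vector with col(df1), which is an integer index, so it picks weights by position and relies on df2 being in the same order as the columns of df1.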

5 Comments

Thanks! I didn't know that data frames can be multiplied with vectors like this. I will use the match function, because this way not all columns in df1 have to match the entries in df2. Thanks again!
Just be careful about the different results of multiplying with lists vs vectors - df1 * c(1,2,3) vs df1 * list(1,2,3)
As thelatemail mentioned, here we are making the lengths the same by replicating with col; otherwise, recycling could change the output.
I believe another option could be: t(t(df1[df2$company]) * df2$weight). This keeps the companies in the order given in df2, and it also works if only some of the columns of df1 are selected.
I think the match version is safest as it both takes into account the ordering and can deal with non-matches as well by returning NA
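
The vector-versus-list difference raised in the comments can be seen in a small sketch using df1 from the question. The names by_vector and by_list are illustrative only.

```r
df1 <- data.frame(a = rep(1, 10), b = rep(1, 10), c = rep(1, 10))

# A plain vector is recycled cell by cell, running down each column in turn
by_vector <- df1 * c(1, 2, 3)

# A list is matched to columns: one list element per column
by_list <- df1 * list(1, 2, 3)

by_vector$a[1:4]  # 1 2 3 1: the weights cycle down the rows
by_list$b[1:4]    # 2 2 2 2: the whole column uses the second element
```

This is why the answers above either expand the weights with col(df1) or pass them as a list: a bare length-3 vector would be recycled down the rows instead of across the columns.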
