1

I have two dataframes:

Dataframe #1

A   B   C   D   E 
2   1   0   5   7       

and

Dataframe #2

C   E   F   G 
3   1   0   9

I would like to combine the two dataframes such that the 1st dataframe serves as the reference and the columns in common are added together

The solution should look like:

A   B   C   D   E 
2   1   3   5   8

Notice the elements of the shared columns (columns C and E) were added and the extra columns in Dataframe #2 (columns F and G) were ignored because they were not found in Dataframe #1. Each Dataframe has a single row.

I have tried accomplishing this with the native functions in R as well as the merge package but I've had no luck. In my application I need to combine thousands of columns many of which are shared between the two dataframes so I'm working on developing a robust solution.

Thanks!

2

2 Answers 2

4

This should work:

overlap = intersect(names(df1), names(df2))
df1[overlap] = df1[overlap] + df2[overlap]

It assumes the data frames have the same number of rows.

Sign up to request clarification or add additional context in comments.

Comments

1

It may be easier to convert from wide to long format first, do the operations and then convert back. For example, using dplyr and tidyr:

library(dplyr)
library(tidyr)
df1 <- data.frame(A = 2, B = 1, C = 0, D = 5, E = 7)
df2 <- data.frame(C = 3, E = 1, F = 0, G = 9)

df1 %>%
  gather(variable, value) %>%
  left_join(gather(df2, variable, value), by = "variable") %>%
  rowwise() %>%
  mutate(sum = sum(value.x, value.y, na.rm = TRUE)) %>%
  select(variable, sum) %>%
  spread(variable, sum)

What this does: puts variables from df1 into 1 column and their values in another; does the same to df2; combines df1 and df2 using df1 as reference; sums the values for each variable; selects the variable column and the new values (sums); converts back to wide format.

Result:

# A tibble: 1 × 5
      A     B     C     D     E
* <dbl> <dbl> <dbl> <dbl> <dbl>
1     2     1     3     5     8

In general when you have the same measurement for multiple variables in many columns, it's worth considering whether the long format is a better representation of the data.

1 Comment

So you voted to close the question as a duplicate then posted an answer?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.