0

I have multiple dataframes with identical column names and dimension. :

 df1
    device_id    price     tax
1       a         200       5
2       b         100       2
3       c         50        1 

 df2
    device_id    price     tax
1       b         200       7
2       a         100       3
3       c         50        1


 df3
    device_id    price     tax
1       c         50       5
2       b         300      1
3       a         50       2  

What I want to do is to create another dataframe df where I will add the price and tax values from the above three dataframes with matching device_ids.

So, df would be like

df
    device_id    price     tax
1       a         350       10
2       b         600       10
3       c         150        7 

How can I do it? Also, it would be great if the solution can be generalized to larger number of dataframes instead of just 3.

0

2 Answers 2

1

First, get all your data frames into a list (called dflist here, defined below). Then it's easy to do with aggregate() after row-binding the list elements.

aggregate(. ~ device_id, do.call(rbind, dflist), sum)
#   device_id price tax
# 1         a   350  10
# 2         b   600  10
# 3         c   150   7

Or you could use the data.table package.

library(data.table)
rbindlist(dflist)[, lapply(.SD, sum), by = device_id]
#    device_id price tax
# 1:         a   350  10
# 2:         b   600  10
# 3:         c   150   7

Or dplyr.

library(dplyr)
bind_rows(dflist) %>% 
    group_by(device_id) %>%
    summarize_each(funs(sum))
# Source: local data frame [3 x 3]
#
#   device_id price   tax
#      <fctr> <int> <int>
# 1         a   350    10
# 2         b   600    10
# 3         c   150     7

Data:

dflist <- structure(list(df1 = structure(list(device_id = structure(1:3, .Label = c("a", 
"b", "c"), class = "factor"), price = c(200L, 100L, 50L), tax = c(5L, 
2L, 1L)), .Names = c("device_id", "price", "tax"), class = "data.frame", row.names = c("1", 
"2", "3")), df2 = structure(list(device_id = structure(c(2L, 
1L, 3L), .Label = c("a", "b", "c"), class = "factor"), price = c(200L, 
100L, 50L), tax = c(7L, 3L, 1L)), .Names = c("device_id", "price", 
"tax"), class = "data.frame", row.names = c("1", "2", "3")), 
    df3 = structure(list(device_id = structure(c(3L, 2L, 1L), .Label = c("a", 
    "b", "c"), class = "factor"), price = c(50L, 300L, 50L), 
        tax = c(5L, 1L, 2L)), .Names = c("device_id", "price", 
    "tax"), class = "data.frame", row.names = c("1", "2", "3"
    ))), .Names = c("df1", "df2", "df3"))
Sign up to request clarification or add additional context in comments.

Comments

1

We can use by from base R after rbinding after we place all the data.frame objects in a list (mget(paste0("df", 1:3)))

 dfN <- do.call(rbind, mget(paste0("df", 1:3)))
 do.call(rbind, by(dfN[-1], dfN[1], FUN = colSums))

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.