I have two data frames. I need to add a column from the smaller one to a column of the larger one and store the result in the larger data frame, but the larger data frame has many more 'branch' values than the smaller one. I tried using match, but for the non-matching branches the sum is NA.

Sample code:

> df1 <- data.frame(branch = letters[seq(1,5)],
+                   rev = seq(10,50,10),
+                   stringsAsFactors = FALSE)
> df1
  branch rev
1      a  10
2      b  20
3      c  30
4      d  40
5      e  50
> 
> df2 <- data.frame(branch = c('b','d'),
+                   Amt = c(10,10),
+                   stringsAsFactors = FALSE)
> df2
  branch Amt
1      b  10
2      d  10
> 
> df1$rev + df2[match(df1$branch, df2$branch), 2, drop = TRUE]
[1] NA 30 NA 50 NA
> 

Expected Output

> df1
  branch rev
1      a  10
2      b  30
3      c  30
4      d  50
5      e  50
> 

I tried using left join as below:

> left_join(df1, df2, by = 'branch')
  branch rev Amt
1      a  10  NA
2      b  20  10
3      c  30  NA
4      d  40  10
5      e  50  NA
> df1 <- left_join(df1, df2, by = 'branch')
> df1[is.na(df1)] <- 0
> df1
  branch rev Amt
1      a  10   0
2      b  20  10
3      c  30   0
4      d  40  10
5      e  50   0
> df1$rev <- df1$rev + df1$Amt
> df1
  branch rev Amt
1      a  10   0
2      b  30  10
3      c  30   0
4      d  50  10
5      e  50   0
> df1$Amt <- NULL
> df1
  branch rev
1      a  10
2      b  30
3      c  30
4      d  50
5      e  50
> 

Could someone let me know if there's a simpler solution for this?

5 Answers

2

An option using data.table:

library(data.table)
setDT(df1)[, rev :=
    setDT(df2)[.SD, on=.(branch), rev + nafill(Amt, fill=0)]
]

output:

   branch rev
1:      a  10
2:      b  30
3:      c  30
4:      d  50
5:      e  50

2

How about this, no libraries required:

    df1 <- df1[order(df1$branch),] #sort based on branch
    df2 <- df2[order(df2$branch),] #sort also so next step works
    df1$branch[df1$branch %in% df2$branch] #just to check we are on correct path

    #do the task
    df1$rev[df1$branch %in% df2$branch] <- df1$rev[df1$branch %in% df2$branch]  + df2$Amt[df2$branch %in% df1$branch] 

Warning: this relies on each "branch" value appearing only once in both data frames. If df2 has repeated "branch" values (e.g. two "b" rows), you would need to sum those before adding them to df1.
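
The repeated-branch case from the warning can be handled by summing Amt per branch first and then looking the totals up with match. This is a minimal sketch, not part of the original answer; df2_dup is a hypothetical variant of df2 with a second "b" row, and agg and add are illustrative names.

```r
# Sample data from the question
df1 <- data.frame(branch = letters[1:5],
                  rev = seq(10, 50, 10),
                  stringsAsFactors = FALSE)

# Hypothetical df2 with "b" appearing twice (assumption, not from the question)
df2_dup <- data.frame(branch = c("b", "d", "b"),
                      Amt = c(10, 10, 5),
                      stringsAsFactors = FALSE)

# Collapse to one total per branch
agg <- aggregate(Amt ~ branch, data = df2_dup, FUN = sum)

# Look up the total for every df1 branch; branches missing from agg give NA
add <- agg$Amt[match(df1$branch, agg$branch)]
add[is.na(add)] <- 0

df1$rev <- df1$rev + add
df1
```

With this data, "b" receives 15 (10 + 5) and "d" receives 10, while the other branches are unchanged. No sorting of either data frame is needed because match aligns the rows.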


1

One way would be to store the output of match in a variable, replace NA with 0, and then add the values:

vals <- df2$Amt[match(df1$branch,df2$branch)]
df1$rev + replace(vals, is.na(vals), 0)
#[1] 10 30 30 50 50
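
To store the result back in df1, as the question asks, one possible variation is to update only the rows that have a match, which avoids the NA replacement altogether. A sketch using the question's sample data; idx and hit are illustrative names, and it assumes branch values are unique in df2.

```r
# Sample data from the question
df1 <- data.frame(branch = letters[1:5],
                  rev = seq(10, 50, 10),
                  stringsAsFactors = FALSE)
df2 <- data.frame(branch = c("b", "d"),
                  Amt = c(10, 10),
                  stringsAsFactors = FALSE)

# Position of each df1 branch in df2, NA where there is no match
idx <- match(df1$branch, df2$branch)
hit <- !is.na(idx)

# Add Amt only to the matched rows; unmatched rows keep their rev
df1$rev[hit] <- df1$rev[hit] + df2$Amt[idx[hit]]
df1
```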

Something similar in dplyr, using left_join instead of match:

library(dplyr)

df1 %>%
  left_join(df2, by = 'branch') %>%
  mutate(Amt = replace(Amt, is.na(Amt), 0), 
         rev  = rev + Amt) %>%
  select(names(df1))


1

Using dplyr, you can combine both data frames with bind_rows (renaming Amt to rev so the column names match), group by "branch", and calculate the sum:

library(dplyr)
df1 %>% bind_rows(., rename(df2, rev = Amt)) %>%
  group_by(branch) %>%
  summarise(rev = sum(rev))

# A tibble: 5 x 2
  branch   rev
  <chr>  <dbl>
1 a         10
2 b         30
3 c         30
4 d         50
5 e         50


0

Use aggregate to get the sum of rev within each branch group.

library(magrittr)
colnames(df2)[2] <- "rev"
df1 <- rbind(df1, df2) %>% aggregate(rev ~ branch, ., FUN = sum)
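
The code above renames df2's column in place and loads magrittr only for the pipe. If df2 should stay untouched, a base-R sketch of the same stack-and-aggregate idea might look like this; combined and result are illustrative names, and setNames is used here to rename a temporary copy.

```r
# Sample data from the question
df1 <- data.frame(branch = letters[1:5],
                  rev = seq(10, 50, 10),
                  stringsAsFactors = FALSE)
df2 <- data.frame(branch = c("b", "d"),
                  Amt = c(10, 10),
                  stringsAsFactors = FALSE)

# Stack df1 on a renamed copy of df2; df2 itself keeps its Amt column
combined <- rbind(df1, setNames(df2, c("branch", "rev")))

# Sum rev within each branch; the result is ordered by branch
result <- aggregate(rev ~ branch, data = combined, FUN = sum)
result
```

Because the rows are summed per branch, this version also copes with repeated branch values in either data frame.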

