Merge unequal dataframes and replace missing rows with 0

Question

I have two data.frames, one with only characters and the other one with characters and values.

df1 = data.frame(x=c('a', 'b', 'c', 'd', 'e'))
df2 = data.frame(x=c('a', 'b', 'c'),y = c(0,1,0))
merge(df1, df2)
  x y
1 a 0
2 b 1
3 c 0

I want to merge df1 and df2. The characters a, b and c merge correctly and get their values 0, 1, 0, but d and e are dropped. I want d and e in the merged table as well, with y = 0. In other words, for every row of df1 that has no matching row in df2, y should be 0, like this:

  x y
1 a 0
2 b 1
3 c 0
4 d 0
5 e 0

Characters are usually called values too, so your y column would be called numeric. – s_baldur, commented Sep 4, 2018 at 15:22

Chase · Accepted Answer · 2019-01-04 03:57:51Z

119

Take a look at the help page for merge. The all parameter lets you specify different types of merges. Here we want to set all = TRUE. This will make merge return NA for the values that don't match, which we can update to 0 with is.na():

zz <- merge(df1, df2, all = TRUE)
zz[is.na(zz)] <- 0

> zz
  x y
1 a 0
2 b 1
3 c 0
4 d 0
5 e 0
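To see why the second line is needed, it helps to look at the intermediate result. A minimal sketch, recreating the question's df1 and df2 with character keys (stringsAsFactors = FALSE) so the block runs on its own: all = TRUE performs a full outer join, so d and e, which exist only in df1, come back with y set to NA, and is.na(zz) is a logical matrix of the same shape as zz that flags exactly those cells.

```r
# Recreate the question's data with character keys
df1 <- data.frame(x = c("a", "b", "c", "d", "e"), stringsAsFactors = FALSE)
df2 <- data.frame(x = c("a", "b", "c"), y = c(0, 1, 0), stringsAsFactors = FALSE)

# Full outer join: keys found in only one table get NA in the other's columns
zz <- merge(df1, df2, all = TRUE)
print(zz)        # rows d and e have y = NA at this point

# is.na() on a data.frame returns a logical matrix with the same dimensions
na_mask <- is.na(zz)
print(na_mask)

# Indexing with the logical matrix overwrites only the flagged cells
zz[na_mask] <- 0
print(zz)
```

Note that this replaces NA in every column, including pre-existing NAs, which is what the follow-up below addresses.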

Updated many years later to address a follow-up question

You need to identify the variable names in the second data table that you aren't merging on - I use setdiff() for this. Check out the following:

df1 = data.frame(x=c('a', 'b', 'c', 'd', 'e', NA))
df2 = data.frame(x=c('a', 'b', 'c'),y1 = c(0,1,0), y2 = c(0,1,0))

#merge as before
df3 <- merge(df1, df2, all = TRUE)
#columns in df2 not in df1
unique_df2_names <- setdiff(names(df2), names(df1))
#replace NA only in those columns; df3[...] without a comma stays a
#data.frame even when unique_df2_names has a single element
df3[unique_df2_names][is.na(df3[unique_df2_names])] <- 0

Created on 2019-01-03 by the reprex package (v0.2.1)
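If this comes up often, the same steps can be wrapped in a small function. A sketch under stated assumptions: merge_fill_y() is a hypothetical helper name, not something from base R or from this answer; it does a full join and fills only the non-key columns that exist in y alone, so NA keys or other NAs coming from x are left untouched.

```r
# Hypothetical helper: full-join x and y, then replace NA with `fill`
# only in the columns that y contributes
merge_fill_y <- function(x, y, by = intersect(names(x), names(y)), fill = 0) {
  out <- merge(x, y, by = by, all = TRUE)
  # Columns present only in y (names shared with x would get .x/.y suffixes)
  y_only <- setdiff(names(y), names(x))
  if (length(y_only) > 0) {
    # out[y_only] (no comma) stays a data.frame even for a single column
    out[y_only][is.na(out[y_only])] <- fill
  }
  out
}

df1 <- data.frame(x = c("a", "b", "c", "d", "e", NA), stringsAsFactors = FALSE)
df2 <- data.frame(x = c("a", "b", "c"), y1 = c(0, 1, 0), y2 = c(0, 1, 0),
                  stringsAsFactors = FALSE)

df3 <- merge_fill_y(df1, df2)
print(df3)   # the NA key in x is kept; y1 and y2 have no NA left
```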

edited Jan 4, 2019 at 3:57

answered May 11, 2011 at 14:21

Chase



2 Comments

jbest Over a year ago

Hi Chase, can I use all = TRUE for df1 only? Sometimes this command includes data that are not available in df1 but are available in df2.

Chase Over a year ago

@jbest - there are arguments all.x and all.y, where x is the first data.frame and y is the second, precisely for this situation. See ?merge for details.
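A minimal sketch of the difference, assuming a hypothetical variant of df2 that also contains a key f which df1 lacks: with all.x = TRUE only the keys of df1 are kept, so f is dropped, while all = TRUE keeps it.

```r
df1 <- data.frame(x = c("a", "b", "c", "d", "e"), stringsAsFactors = FALSE)
# Hypothetical df2 with an extra key "f" that df1 does not have
df2 <- data.frame(x = c("a", "b", "c", "f"), y = c(0, 1, 0, 5),
                  stringsAsFactors = FALSE)

# Left join: every row of df1, with matching columns from df2
left <- merge(df1, df2, by = "x", all.x = TRUE)
left$y[is.na(left$y)] <- 0
print(left)   # keys a to e only; "f" is not included

# Full join for comparison: "f" appears with its y value of 5
full <- merge(df1, df2, by = "x", all = TRUE)
print(full)
```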

Nick Sabbe · Accepted Answer · 2011-05-11 14:52:59Z

7

Or, as an alternative to @Chase's code, being a recent plyr fan with a background in databases:

require(plyr)
zz<-join(df1, df2, type="left")
zz[is.na(zz)] <- 0

answered May 11, 2011 at 14:52

Nick Sabbe



Wojciech Sobala · Accepted Answer · 2011-05-11 20:11:33Z

4

Another alternative with data.table.

EXAMPLE DATA

library(data.table)
dt1 <- data.table(df1)
dt2 <- data.table(df2)
setkey(dt1,x)
setkey(dt2,x)

CODE

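# by = .EACHI evaluates j once per row of dt1 and keeps the join column x
# in the result; current data.table versions no longer do this implicitly
dt2[dt1, list(y = ifelse(is.na(y), 0, y)), by = .EACHI]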

answered May 11, 2011 at 20:11

Wojciech Sobala


1 Comment

lmo Over a year ago

In version 1.10.4, you don't need to setkey and can use df2[df1, on="x"][is.na(y), y := 0] immediately after creating the data.tables to produce the desired result.

sbha · Accepted Answer · 2019-03-22 23:29:46Z

Assuming df1 contains all the values of x of interest, you could use dplyr::left_join() to merge and then either base::replace() or tidyr::replace_na() to replace the NAs with 0:

library(tidyverse)

# dplyr only:
df_new <- 
  left_join(df1, df2, by = 'x') %>% 
  mutate(y = replace(y, is.na(y), 0))

# dplyr and tidyr:
df_new <- 
  left_join(df1, df2, by = 'x') %>% 
  mutate(y = replace_na(y, 0))

# In the original sample data, column `x` is a factor (data.frame() defaulted to
# stringsAsFactors = TRUE before R 4.0.0), and joining factors with different
# levels gives a warning. This can be prevented by converting to character first:
df_new <- 
  left_join(df1 %>% mutate(x = as.character(x)), 
            df2 %>% mutate(x = as.character(x)), 
            by = 'x') %>% 
  mutate(y = replace(y, is.na(y), 0))
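The base::replace() step does not depend on dplyr. A minimal base R sketch, assuming the question's df1 and df2: merge(..., all.x = TRUE) stands in for left_join(), with the difference that merge() sorts the result by the key by default, whereas left_join() keeps the row order of df1.

```r
df1 <- data.frame(x = c("a", "b", "c", "d", "e"), stringsAsFactors = FALSE)
df2 <- data.frame(x = c("a", "b", "c"), y = c(0, 1, 0), stringsAsFactors = FALSE)

# Left join in base R: keep every row of df1
df_new <- merge(df1, df2, by = "x", all.x = TRUE)
# replace(vector, positions, value) returns a modified copy of the vector
df_new$y <- replace(df_new$y, is.na(df_new$y), 0)
print(df_new)
```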

Ian E. Gorman · Accepted Answer · 2014-03-27 04:36:56Z

I used the answer given by Chase (answered May 11 '11 at 14:21), but I added a bit of code to apply that solution to my particular problem.

I had a frame of rates (user, download) and a frame of totals (user, download) to be merged by user, and I wanted to include every rate, even if there was no corresponding total. However, it was also possible that no totals were missing at all, in which case the selection of rows for replacing NA with zero would fail.

The first line of code does the merge. The next two lines rename the download.x and download.y columns in the merged frame. The if statement replaces NA with zero, but only if there are rows with NA.

# merge rates and totals, replacing absent totals by zero
graphdata <- merge(rates, totals, by=c("user"), all.x=TRUE)
colnames(graphdata)[colnames(graphdata)=="download.x"] = "download.rate"
colnames(graphdata)[colnames(graphdata)=="download.y"] = "download.total"
if(any(is.na(graphdata$download.total))) {
    graphdata[is.na(graphdata$download.total),]$download.total <- 0
}
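A possible simplification, sketched with small hypothetical rates and totals tables (the user IDs and numbers are made up): merge()'s suffixes argument can name the clashing download columns directly, and indexing the column vector with a logical vector is simply a no-op when nothing is TRUE, so the if guard is not needed. As far as I can tell, the guard in the code above is required only because graphdata[rows, ]$download.total <- 0 tries to assign one value into a zero-row data frame when no row is selected.

```r
# Hypothetical example data: every user has a rate, not every user has a total
rates  <- data.frame(user = c("u1", "u2", "u3"), download = c(1.5, 2.0, 0.5),
                     stringsAsFactors = FALSE)
totals <- data.frame(user = c("u1", "u3"), download = c(10, 4),
                     stringsAsFactors = FALSE)

# suffixes renames the clashing non-key columns to download.rate / download.total
graphdata <- merge(rates, totals, by = "user", all.x = TRUE,
                   suffixes = c(".rate", ".total"))

# Logical indexing on the column vector also works when no value is NA
graphdata$download.total[is.na(graphdata$download.total)] <- 0
print(graphdata)
```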

Captain Tyler · Accepted Answer · 2022-04-26 17:24:35Z

1

Here is a data.table answer. It can be applied to any selection of columns by changing the definition of cols_added_df2.

library(data.table)

df1 = data.frame(x=c('a', 'b', 'c', 'd', 'e'))
df2 = data.frame(x=c('a', 'b', 'c'),y = c(0,1,0))
setDT(df1)
setDT(df2)
df3 <- merge(df1, df2, by = "x", all.x = TRUE)

# columns contributed by df2; change this to target other columns
cols_added_df2 <- setdiff(names(df2), names(df1)) 
df3[, 
  (cols_added_df2) := lapply(.SD, function(col){
    fifelse(is.na(col), 0, col)
  }),
  .SDcols = cols_added_df2
]
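For comparison, the same column-wise idea can be written in base R without data.table. A minimal sketch, assuming plain data.frames and a hypothetical extra column z in df1 that has a genuine NA: lapply() runs only over the selected columns and the list is assigned back with [<-, so z keeps its NA.

```r
# Hypothetical data: z in df1 has a real NA that must survive the fill
df1 <- data.frame(x = letters[1:5], z = c(1, NA, 3, 4, 5), stringsAsFactors = FALSE)
df2 <- data.frame(x = c("a", "b", "c"), y1 = c(0, 1, 0), y2 = c(2, 3, 4),
                  stringsAsFactors = FALSE)

df3 <- merge(df1, df2, by = "x", all.x = TRUE)

# Only the columns contributed by df2 ("y1", "y2") are touched
cols_added_df2 <- setdiff(names(df2), names(df1))
df3[cols_added_df2] <- lapply(df3[cols_added_df2], function(col) {
  replace(col, is.na(col), 0)
})
print(df3)
```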

answered Apr 26, 2022 at 17:24

Captain Tyler


2 Comments

swihart Mar 5 at 20:15

fifelse(is.na(col), 1, col) should be ifelse(is.na(col), 0, col) to replace missing with 0, correct?

Captain Tyler Apr 15 at 13:59

You can, but I tried to stick to the data.table package. See ?fifelse.

moodymudskipper · Accepted Answer · 2022-04-28 20:52:38Z

0

With {powerjoin} we can do:

df1 = data.frame(x=c('a', 'b', 'c', 'd', 'e'))
df2 = data.frame(x=c('a', 'b', 'c'),y = c(0,1,0))
powerjoin::power_full_join(df1, df2, fill = 0)
#> Joining, by = "x"
#>   x y
#> 1 a 0
#> 2 b 1
#> 3 c 0
#> 4 d 0
#> 5 e 0

Created on 2022-04-28 by the reprex package (v2.0.1)
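One distinction worth keeping in mind: the blanket zz[is.na(zz)] <- 0 from the first answer also overwrites NAs that were already present in df2, not only those created by unmatched keys. My understanding is that powerjoin's fill targets only the latter, but check its documentation before relying on that. The sketch below is my own base R illustration of that behavior, not how powerjoin is implemented; it assumes a hypothetical df2 with a genuine NA for key b and a hypothetical marker column named .in_df2, and it only fills df2's column y.

```r
df1 <- data.frame(x = c("a", "b", "c", "d", "e"), stringsAsFactors = FALSE)
# Hypothetical df2 in which y already has a genuine NA for key "b"
df2 <- data.frame(x = c("a", "b", "c"), y = c(0, NA, 0), stringsAsFactors = FALSE)

# Tag df2's rows so unmatched rows can be told apart after the join
df2$.in_df2 <- TRUE
zz <- merge(df1, df2, by = "x", all = TRUE)

# Fill y only where the row had no match in df2; b's own NA is kept
unmatched <- is.na(zz$.in_df2)
zz$y[unmatched] <- 0
zz$.in_df2 <- NULL   # drop the helper column
print(zz)
```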

answered Apr 28, 2022 at 20:52

moodymudskipper

