2

I would like to replace values of one dataframe with NA of another dataframe that have the same identifier. That is, for all values of df1 that have the same id, assign the "NA" values of df2 at the corresponding id and indices.

I have df1 and df2:

df1 =data.frame(id = c(1,1,2,2,6,6),a = c(2,4,1,7,5,3), b = c(5,3,0,3,2,5),c = c(9,3,10,33,2,5))
df2 =data.frame(id = c(1,2,6),a = c("NA",0,"NA"), b= c("NA", 9, 9),c=c(0,"NA","NA"))

what i would like is df3:

df3 = data.frame(id = c(1,1,2,2,6,6),a = c("NA","NA",1,7,"NA","NA"), b = c("NA","NA",0,3,2,5),c = c(9,3,"NA","NA","NA","NA"))

I have tried the lookup function and the library "data.table", but i could get the correct df3. Could anyone please help me with this?

3 Answers 3

3

We can do a join on 'id' and then replace the NA values by multiplying the .

library(data.table)
nm1 <- names(df1)[-1]
setDT(df1)[df2,  (nm1) := Map(function(x, y) x*(NA^is.na(y)), .SD, 
                  mget(paste0('i.', nm1))), on = .(id), .SDcols = nm1]
df1
#   id  a  b  c
#1:  1 NA NA  9
#2:  1 NA NA  3
#3:  2  1  0 NA
#4:  2  7  3 NA
#5:  6 NA  2 NA
#6:  6 NA  5 NA

data

df2 =data.frame(id = c(1,2,6),a = c(NA,0,NA), b= c(NA, 9, 9),c=c(0,NA,NA))

NOTE: In the OP's post NA were "NA"

Sign up to request clarification or add additional context in comments.

Comments

1

Since your NA values are actually text "NA" you will have to turn all your variables into text (with as.character). You can join both dataframes by id column. Since both dataframes have columns a,b, and c R will rename then a.x, b.x and c.x (df1) and a.y, b.y and c.y (df2). After that you can create new columns a,b, and c. These than have "NA" whenever a.y == "NA" and a.x otherwise (and so on). If your NA values were real NA you need to test differently is.na(value) (see example below in the code).

library(dplyr)

df1 %>%  
  mutate_all(as.character) %>% # allvariables as text
  left_join(df2 %>% 
              mutate_all(as.character) ## all variables as text
            , by = "id") %>% ## join tables by 'id'; a.x from df1 and a.y from df2 and so on
  mutate(a = case_when(a.y == "NA" ~ "NA", TRUE ~ a.x), ## if a.y == "NA" take this,else  a.x 
         b = case_when(b.y == "NA" ~ "NA", TRUE  ~ b.x),
         c = case_when(c.y == "NA" ~ "NA", TRUE ~ c.x)) %>%
  select(id, a, b, c) ## keep only these initial columns

  id  a  b  c
1  1 NA NA  9
2  1 NA NA  3
3  2  1  0 NA
4  2  7  3 NA
5  6 NA  2 NA
6  6 NA  5 NA

##if your dataframe head real NA this is how you can test:
missing_value <- NA

is.na(missing_value) ## TRUE
missing_value == NA  ## Does not work with R

Comments

0

First you need to transform your "NA"'s with real NA's:

df2[df2=="NA"]<-NA

Then, can perform the replacements with:

df1[is.na(df2[match(df1[,1],df2[,1]),])] <- NA

OUTPUT:

df1
  id  a  b  c
1  1 NA NA  9
2  1 NA NA  3
3  2  1  0 NA
4  2  7  3 NA
5  6 NA  2 NA
6  6 NA  5 NA

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.