
My data looks something like the tables below. I want to replace the "Old ID" values in the first table with the matching "ID" values from a second table. The first table is this:

      Old ID |   Usage 
       211         25          
       211         17          
       211         18         
       202         11          
       202         12          
       194         17          
       202         16          
       194         22          
       194         84          
       198         26         

The second table holds the matching values:

      Old ID |     ID 
       211         abf          
       202         rdg          
       194         ufe         
       198                   

Each value in the Old ID column of the first table should be replaced with the corresponding ID value from the second table. If the value in the ID column is missing or "NULL", the replaced value in the first table should be "N/A". The first table should then look like this:

      Old ID |   Usage 
       abf         25          
       abf         17          
       abf         18         
       rdg         11          
       rdg         12          
       ufe         17          
       rdg         16          
       ufe         22          
       ufe         84          
       N/A         26

I have around 2 million such entries. Thanks a lot for your help.

3 Answers

This can be solved with an update on join:

library(data.table)
setDT(DT1)[setDT(DT2), on = "Old_ID", Old_ID := ID][]
    Old_ID Usage
 1:    abf    25
 2:    abf    17
 3:    abf    18
 4:    rdg    11
 5:    rdg    12
 6:    ufe    17
 7:    rdg    16
 8:    ufe    22
 9:    ufe    84
10:     NA    26

Data

DT1 <- structure(list(Old_ID = c("211", "211", "211", "202", "202", 
"194", "202", "194", "194", "198"), Usage = c("25", "17", "18", 
"11", "12", "17", "16", "22", "84", "26")), .Names = c("Old_ID", 
"Usage"), row.names = c(NA, -10L), class = c("data.table", "data.frame"))

DT2 <- structure(list(Old_ID = c("211", "202", "194", "198"), ID = c("abf", 
"rdg", "ufe", NA)), .Names = c("Old_ID", "ID"), row.names = c(NA, 
-4L), class = c("data.table", "data.frame"))
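The question asks for "N/A" where the lookup value is missing or "NULL", while the update on join above leaves NA in that row. A minimal sketch of one way to add that step, assuming data.table is installed and that the IDs are stored as character; the tables are rebuilt here from the question, and `i.ID` is used only to make explicit that the value comes from the lookup table:

```r
library(data.table)

# Usage and lookup tables rebuilt from the question (IDs kept as character)
DT1 <- data.table(Old_ID = c("211", "211", "211", "202", "202",
                             "194", "202", "194", "194", "198"),
                  Usage  = c(25, 17, 18, 11, 12, 17, 16, 22, 84, 26))
DT2 <- data.table(Old_ID = c("211", "202", "194", "198"),
                  ID     = c("abf", "rdg", "ufe", NA))

# Update on join: overwrite Old_ID in DT1 with the ID column of DT2
DT1[DT2, on = "Old_ID", Old_ID := i.ID]

# Rows whose lookup value was missing (NA) or the literal string "NULL"
# are relabelled as "N/A"
DT1[is.na(Old_ID) | Old_ID == "NULL", Old_ID := "N/A"]
```

Note that an Old_ID with no row at all in DT2 is left unchanged by the join, so it would keep its original value rather than become "N/A".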

Something like this?

# Example data: df2 maps old.id to ID and may contain repeated keys
df1 <- data.frame(old.id = c(211, 211, 211, 202, 194, 202, 198, 194), usage = 20:27, stringsAsFactors = FALSE)
df2 <- data.frame(old.id = c(211, 211, 212, 213, 202, 198), ID = c("a", "a", "b", "c", "d", "e"), stringsAsFactors = FALSE)

# For each old.id, take the first matching ID from df2, or NA if there is none
df1$old.id <- sapply(df1$old.id, function(nm) {
  out <- df2[df2$old.id == nm, ]$ID
  if (length(out) > 0) out[1] else NA
})

df1
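With around 2 million rows, a per-element lookup may be slow. A vectorised sketch of the same idea in base R using `match()`, which returns the position of the first match for every element in one call. The data frames are rebuilt from the question, and the `new_id` variable is a name introduced here for illustration:

```r
# Usage and lookup tables rebuilt from the question
df1 <- data.frame(Old_ID = c(211, 211, 211, 202, 202, 194, 202, 194, 194, 198),
                  Usage  = c(25, 17, 18, 11, 12, 17, 16, 22, 84, 26))
df2 <- data.frame(Old_ID = c(211, 202, 194, 198),
                  ID     = c("abf", "rdg", "ufe", NA),
                  stringsAsFactors = FALSE)

# For each row of df1, find the position of its first match in df2
# and pick the ID at that position (NA when there is no match)
new_id <- df2$ID[match(df1$Old_ID, df2$Old_ID)]

# Missing lookups and the literal string "NULL" become "N/A"
new_id[is.na(new_id) | new_id == "NULL"] <- "N/A"
df1$Old_ID <- new_id
```

Unlike an update on join, this version also turns an Old_ID that is absent from the lookup table into "N/A", which may or may not be what is wanted.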


First merge the two tables, then remove the duplicates as below:

  S=merge(df1,df2,by="Old_ID")
  S[!duplicated(S),c(3,2)]
      ID Usage
 1   ufe    17
 4   ufe    22
 7   ufe    84
 10 <NA>    26
 11  rdg    11
 14  rdg    12
 17  rdg    16
 20  abf    25
 23  abf    17
 26  abf    18
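As the output shows, `merge()` sorts the result by the key, so the original row order is lost. A hedged sketch of a variant that keeps the order of the first table, assuming the second table is a plain lookup with one row per Old_ID as in the edited question; the helper column `row` and the name `result` are introduced here for illustration:

```r
# Usage and lookup tables rebuilt from the question
df1 <- data.frame(Old_ID = c(211, 211, 211, 202, 202, 194, 202, 194, 194, 198),
                  Usage  = c(25, 17, 18, 11, 12, 17, 16, 22, 84, 26))
df2 <- data.frame(Old_ID = c(211, 202, 194, 198),
                  ID     = c("abf", "rdg", "ufe", NA),
                  stringsAsFactors = FALSE)

# Remember the original row order before merging
df1$row <- seq_len(nrow(df1))

# Left join: all.x = TRUE keeps every row of df1 even without a match
S <- merge(df1, df2, by = "Old_ID", all.x = TRUE)

# Restore the original order and label missing IDs as "N/A"
S <- S[order(S$row), ]
S$ID[is.na(S$ID)] <- "N/A"

result <- data.frame(Old_ID = S$ID, Usage = S$Usage, stringsAsFactors = FALSE)
```

With a one-row-per-key lookup table there are no duplicated rows to remove, so the `duplicated()` step is not needed in this variant.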

2 Comments

Hey Onyambu, I made a mistake while posting the question. The second table does not have the same number of entries as the first one. The second one is just meant to match Old ID with ID. I have edited the question.
The solutions might be different, but the code still remains the same.
