I have a dataset with 300 columns and 1000 rows, and a corresponding code book, both in data.table format. For simplicity, I show only 3 columns of each here.

library(data.table)

dt <- data.table(id = 1:10,
                 a  = sample(c(1,2,3),10, replace = TRUE),
                 b  = sample(c(1,2)  ,10, replace = TRUE),
                 c  = sample(c(1:5)  ,10, replace = TRUE))

    id a b c
 1:  1 2 1 2
 2:  2 2 1 1
 3:  3 3 1 1
 4:  4 3 1 1
 5:  5 1 2 5
 6:  6 2 1 3
 7:  7 1 2 3
 8:  8 1 1 2
 9:  9 2 1 5
10: 10 3 2 4

cb <- data.table(var = c(rep("a", 3), rep("b", 2), rep("c", 5)),
                 val = c(1,2,3,1,2,1,2,3,4,5),
                 des = c("red", "blue", "yellow", "yes","no","K", "Na","Ag","Au","Si"))

    var val    des
 1:   a   1    red
 2:   a   2   blue
 3:   a   3 yellow
 4:   b   1    yes
 5:   b   2     no
 6:   c   1      K
 7:   c   2     Na
 8:   c   3     Ag
 9:   c   4     Au
10:   c   5     Si

In cb, var names a column of dt, val is a value that occurs in that column, and des is the description for that value. I want to replace each value in dt with the matching des from cb. The result should look like this:

    id      a   b  c
 1:  1    red yes Na
 2:  2 yellow  no Ag
 3:  3   blue yes Ag
 4:  4    red yes Au
 5:  5   blue yes Ag
 6:  6   blue  no Au
 7:  7 yellow yes Si
 8:  8   blue  no Ag
 9:  9    red  no  K
10: 10 yellow  no Ag

How do I perform an operation like this efficiently, and in a way that doesn't make my computer sound like it has a built-in piston?

The reason is that I have pre-written code to analyze the data, and it needs the actual values in order to run. This may also prove useful in general, because I am often given data and a code book, although there usually aren't this many variables.

  • I feel like your first row in the final output should be 'a' = 'blue', correct? Commented May 10, 2017 at 22:16
  • Please use set.seed when using sample, so the results are repeatable. Commented May 10, 2017 at 22:19

3 Answers

You could try

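# Melt dt to long format, look up each (var, val) pair in cb, then reshape back to wide
long <- melt(dt, id.vars = "id", variable.name = "var", value.name = "val")
dcast(cb[long, on = c("var", "val")], id ~ var, value.var = "des")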
#     id      a   b  c
#  1:  1    red yes  K
#  2:  2 yellow  no Si
#  3:  3    red yes Si
#  4:  4    red  no Au
#  5:  5    red  no Ag
#  6:  6   blue yes  K
#  7:  7   blue  no Si
#  8:  8 yellow yes Na
#  9:  9   blue yes Ag
# 10: 10 yellow yes Si
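
The melt, join and dcast steps are easier to follow when run one at a time. The following is a minimal sketch on a small hand-written table; the toy values and the intermediate names `long`, `labelled` and `wide` are assumptions made for illustration, not part of the question's data. The code book is joined onto the long table so that every row of dt is kept, and a value missing from the code book would simply get NA.

```r
library(data.table)

# Toy data, assumed for illustration (not the question's random sample)
dt <- data.table(id = 1:3, a = c(1, 3, 2), b = c(2, 1, 1))
cb <- data.table(var = c("a", "a", "a", "b", "b"),
                 val = c(1, 2, 3, 1, 2),
                 des = c("red", "blue", "yellow", "yes", "no"))

# Step 1: long format, one row per (id, variable) pair
long <- melt(dt, id.vars = "id", variable.name = "var", value.name = "val",
             variable.factor = FALSE)

# Step 2: look up the description for each (var, val) pair
labelled <- cb[long, on = c("var", "val")]

# Step 3: back to wide format, one column per original variable
wide <- dcast(labelled, id ~ var, value.var = "des")
print(wide)
```

Inspecting `long` and `labelled` before the final dcast is a cheap way to spot codes that have no entry in the code book.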
3 Comments

This is the more robust solution because it works for any number of columns.
By any chance do you know how long this would take with 1M obs?
@akash87 Why not try it out? 300 vars with 100,000 obs took ~20 secs on my rather old PC:

cols <- paste0("V", 1:300)
cb <- setDT(expand.grid(var = cols, val = 1:3))[, des := sample(LETTERS, .N, TRUE)]
dt <- as.data.table(replicate(length(cols), sample(1:3, 100000, TRUE)))[, id := 1:.N]
system.time(dcast(melt(dt, id = "id", var = "var", val = "val")[cb, on = c("var", "val"), allow.cartesian = TRUE][!is.na(id)], id ~ var, value.var = "des"))

That is, if I did not make a mistake.
Another option would be to do multiple merge + updates:

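# One row per description, one column per variable holding the matching value
cb_dc <- data.table::dcast(cb, des~var, value.var = "val")
cols <- c("a","b","c")
# For each column, join dt to cb_dc on that column and take the matching des
dt[, (cols) := lapply(cols, function(x) cb_dc[dt, des, on = x]) ]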

 #  id      a   b  c
 #1:  1    red yes Si
 #2:  2   blue yes Na
 #3:  3   blue  no Au
 #4:  4 yellow yes  K
 #5:  5    red  no Na
 #6:  6 yellow yes Na
 #7:  7 yellow  no  K
 #8:  8   blue  no Na
 #9:  9   blue yes Si
#10: 10    red  no Na

Data:

set.seed(1)
dt <- data.table(id = 1:10,
                 a  = sample(c(1,2,3),10, replace = TRUE),
                 b  = sample(c(1,2)  ,10, replace = TRUE),
                 c  = sample(c(1:5)  ,10, replace = TRUE))
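
A variant of the same per-column update that skips reshaping the code book is to use base R's match() against each variable's slice of cb. This is only a sketch under assumptions: the toy data and the helper name `recode_col` are made up for illustration, and the columns to recode are taken to be every column of dt that also appears in cb$var.

```r
library(data.table)

# Toy data, assumed for illustration
dt <- data.table(id = 1:3, a = c(1, 3, 2), b = c(2, 1, 1))
cb <- data.table(var = c("a", "a", "a", "b", "b"),
                 val = c(1, 2, 3, 1, 2),
                 des = c("red", "blue", "yellow", "yes", "no"))

# Columns to recode: those present in both dt and the code book
cols <- intersect(names(dt), unique(cb$var))

# Hypothetical helper: for one column, find each value's position in that
# variable's slice of cb and return the description found there
# (values with no entry in cb become NA)
recode_col <- function(x) {
  lookup <- cb[var == x]
  lookup$des[match(dt[[x]], lookup$val)]
}

# Replace all selected columns by reference
dt[, (cols) := lapply(cols, recode_col)]
print(dt)
```

Deriving `cols` from the code book avoids typing 300 column names by hand, and leaves columns such as id untouched.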

This dplyr answer essentially joins dt with a subset of the code book once for each of the three columns.

library(dplyr)

dt %>% 
  left_join(cb %>% filter(var == "a"), by=c("a" = "val")) %>% 
  left_join(cb %>% filter(var == "b"), by=c("b" = "val")) %>% 
  left_join(cb %>% filter(var == "c"), by=c("c" = "val")) %>%
  select(id, des.x, des.y, des) %>%
  rename(a = des.x, b = des.y, c = des)
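
Writing one join per column does not scale to 300 variables. One possible generalization, which swaps in tidyr's pivot_longer() and pivot_wider() (tidyr is not used in the answer above), is to reshape to long format, join once, and reshape back. The sketch below uses small hand-written tibbles as an assumption for illustration; the name `recoded` is likewise made up.

```r
library(dplyr)
library(tidyr)

# Toy data, assumed for illustration
dt <- tibble(id = 1:3, a = c(1, 3, 2), b = c(2, 1, 1))
cb <- tibble(var = c("a", "a", "a", "b", "b"),
             val = c(1, 2, 3, 1, 2),
             des = c("red", "blue", "yellow", "yes", "no"))

recoded <- dt %>%
  # One row per (id, variable) pair
  pivot_longer(-id, names_to = "var", values_to = "val") %>%
  # A single join looks up every variable's description at once
  left_join(cb, by = c("var", "val")) %>%
  select(-val) %>%
  # Back to one column per original variable
  pivot_wider(names_from = var, values_from = des)

print(recoded)
```

The join keys are the pair (var, val), so no per-column filter() or rename() is needed.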
