I have two data frames:
- Lookup table `lookup` with columns `varName` (variable name), `key`, and `value`
- Data frame `df` with columns named exactly as the values in `varName` and values corresponding to `key` (values in `df` are keys to `lookup`). This data frame is much bigger than the lookup data frame (e.g. 1e6 rows).
I would like to recode the data in `df` by appending a new column for every variable, in which each key in `df` is replaced by the corresponding value from the `lookup` data frame. It is important to note that the keys are of type double.
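To make the concern about double keys concrete, here is a minimal base-R sketch. The values are chosen purely for illustration and are not part of the data above. It shows that a recomputed double may fail an exact comparison, while a key copied unchanged from the lookup table matches itself.

```r
# Two doubles that are "the same number" on paper need not be identical in memory.
x <- 0.1 + 0.2
y <- 0.3

print(x == y)                    # exact comparison
print(isTRUE(all.equal(x, y)))   # comparison with a numeric tolerance

# match() only pairs values that are identical, so a recomputed key can be missed.
print(match(x, y))

# A key copied unchanged from the table of keys is found at its own position.
keys <- c(0.1, 0.25, 0.3)
print(match(keys[3], keys))
```

In my case the keys in `df` are copied from `lookup` rather than recomputed, so this would only become a problem if they were rounded or transformed somewhere along the way.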
Sample data:
# Generate sample data
lookup <- data.frame(
  varName = rep(LETTERS[1:3], each = 3),
  key = runif(9),
  value = runif(9)
)
df <- expand.grid(
  A = lookup[lookup$varName == 'A', 'key'],
  B = lookup[lookup$varName == 'B', 'key'],
  C = lookup[lookup$varName == 'C', 'key']
)
My current solution uses temporary renaming of variables and join from plyr:
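library(plyr)
for (varName in unique(lookup$varName)) {
  # Temporarily rename `key` to the variable name so it can serve as the join column
  tmpLookup <- rename(lookup, replace = c(key = varName))
  # Left-join the looked-up values and store them in a new <varName>_value column
  df[paste0(varName, '_value')] <- join(df[varName], tmpLookup[c(varName, 'value')],
                                        by = varName)['value']
}
df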
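For comparison, here is a minimal sketch of the same recoding in base R using `match()`, restricted to the lookup rows of each variable. The helper name `recode_with_match`, the `_value_match` column suffix, and the fixed seed are made up for illustration. I am not claiming this is safer or faster than `join`. It only gives a second result to compare against, and any key without an identical counterpart in the lookup shows up as `NA`.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible

lookup <- data.frame(
  varName = rep(LETTERS[1:3], each = 3),
  key = runif(9),
  value = runif(9),
  stringsAsFactors = FALSE
)
df <- expand.grid(
  A = lookup[lookup$varName == "A", "key"],
  B = lookup[lookup$varName == "B", "key"],
  C = lookup[lookup$varName == "C", "key"]
)

# Hypothetical helper: look up the values for one variable by matching
# the keys in df against the keys of that variable only.
recode_with_match <- function(df, lookup, v) {
  sub <- lookup[lookup$varName == v, ]
  sub$value[match(df[[v]], sub$key)]
}

for (v in unique(lookup$varName)) {
  df[[paste0(v, "_value_match")]] <- recode_with_match(df, lookup, v)
}

# Count keys that found no identical counterpart in the lookup table
print(colSums(is.na(df)))
print(head(df))
```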
Questions:
- Is this safe? I cannot find any information on whether joining on `double` columns will always work correctly using `join`.
- Is there a better way to accomplish the same thing that is safer and faster?