I have a data frame of patient data, df, that I am trying to de-identify in R:
df <- structure(list(name = structure(c(2L, 5L, 1L, 6L, 4L, 3L), .Label = c("Andrew",
"Jim", "Kurt", "Lester", "Mickey", "Taylor"), class = "factor"),
heart_rate = c(78L, 82L, 67L, 105L, 85L, 94L), age = c(35L,
23L, 43L, 52L, 33L, 45L), partner = structure(c(5L, 2L, 6L,
1L, 3L, 4L), .Label = c("Andrew", "Jim ", "Kurt ", "Lester ",
"Mickey ", "Taylor "), class = "factor")), class = "data.frame", row.names = c(NA,
-6L))
I want to replace the names in both the name and partner columns with the matching values from the id column of this lookup table, called key:
key <- structure(list(name = structure(c(2L, 5L, 1L, 6L, 4L, 3L), .Label = c("Andrew",
"Jim", "Kurt", "Lester", "Mickey", "Taylor"), class = "factor"),
id = structure(c(2L, 5L, 1L, 6L, 4L, 3L), .Label = c("A3",
"J9", "K5", "L4", "M4", "T7"), class = "factor")), class = "data.frame", row.names = c(NA,
-6L))
I can de-identify the name column with this code:
df[["name"]] <- key[ match(df[['name']], key[['name']] ) , 'id']
However, when I try to de-identify the partner column the same way:
df[["partner"]] <- key[ match(df[['partner']], key[['name']] ) , 'id']
Most of the partner column comes back as NA, and my data frame looks like this:
structure(list(name = structure(c(2L, 5L, 1L, 6L, 4L, 3L), .Label = c("A3",
"J9", "K5", "L4", "M4", "T7"), class = "factor"), heart_rate = c(78L,
82L, 67L, 105L, 85L, 94L), age = c(35L, 23L, 43L, 52L, 33L, 45L
), partner = structure(c(NA, NA, NA, 1L, NA, NA), .Label = c("A3",
"J9", "K5", "L4", "M4", "T7"), class = "factor")), row.names = c(NA,
-6L), class = "data.frame")
Does anyone have any suggestions? Bonus points for a method that can be applied across all the relevant columns of a dataset in one line. Explanations of the code are greatly appreciated.
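One thing that stands out in the dput() output above: every level of partner except "Andrew" carries a trailing space ("Jim ", "Kurt ", and so on), which would explain why match() only finds the one row. The following is a minimal sketch, assuming that stray whitespace is the only mismatch. The data are rebuilt as plain character columns for brevity, and name_cols is a name introduced here for illustration, not something from the original data.

```r
# Example data rebuilt from the dput() output in the question,
# using character columns instead of factors to keep the sketch short.
df <- data.frame(
  name       = c("Jim", "Mickey", "Andrew", "Taylor", "Lester", "Kurt"),
  heart_rate = c(78L, 82L, 67L, 105L, 85L, 94L),
  age        = c(35L, 23L, 43L, 52L, 33L, 45L),
  # Note the trailing spaces on every value except "Andrew".
  partner    = c("Mickey ", "Jim ", "Taylor ", "Andrew", "Kurt ", "Lester "),
  stringsAsFactors = FALSE
)
key <- data.frame(
  name = c("Jim", "Mickey", "Andrew", "Taylor", "Lester", "Kurt"),
  id   = c("J9", "M4", "A3", "T7", "L4", "K5"),
  stringsAsFactors = FALSE
)

# Hypothetical helper vector: the columns that hold names to be recoded.
name_cols <- c("name", "partner")

# For each listed column: coerce to character, strip leading/trailing
# whitespace, then look up the id for each name in key.
# Names with no match in key become NA.
df[name_cols] <- lapply(df[name_cols], function(x) {
  x <- trimws(as.character(x))
  key$id[match(x, as.character(key$name))]
})

print(df)
```

The as.character() calls are there so the same lapply() should also work when the columns are factors, as in the original data. Any name that still fails to match after trimming ends up as NA, so checking anyNA(df[name_cols]) afterwards is a cheap way to spot leftover mismatches.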