Afternoon clever people.
I have a decent-sized data set (>800k rows) and as an example I have pulled out a tiny sample of 10 columns by 2 rows. At the outset only the "Topics" column is populated, with a comma-separated string of topic codes; all other columns are set to FALSE.
This will recreate the data as it sits currently...
Topics <- c("E11,E31,E313,ECAT" , "E1,E20")
E1 <- c(FALSE, FALSE)
E11 <- c(FALSE, FALSE)
E20 <- c(FALSE, FALSE)
E30 <- c(FALSE, FALSE)
E31 <- c(FALSE, FALSE)
E100 <- c(FALSE, FALSE)
E300 <- c(FALSE, FALSE)
E313 <- c(FALSE, FALSE)
ECAT <- c(FALSE, FALSE)
df <- data.frame(Topics,E1,E11,E20,E30,E31,E100,E300,E313,ECAT)
Which will give something like...
Topics E1 E11 E20 E30 E31 E100 E300 E313 ECAT
E11,E31,E313,ECAT FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
E1,E20 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
I want to set the relevant row,column to TRUE where there is a match for each of the items in the topic vector. So it should look something like...
Topics E1 E11 E20 E30 E31 E100 E300 E313 ECAT
E11,E31,E313,ECAT FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE TRUE
E1,E20 TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
So far I have failed UTTERLY to work this one out, but I suspect it is something like:
- split the topic string into a vector using strsplit
- for each item in the vector, try to match it to names(df)
- when matched, set that row,column to TRUE
BUT I have tried all sorts and cannot fathom the logic. Can anyone break this down for me please?
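For what it's worth, the three steps listed above could be sketched roughly as follows. This is only a minimal illustration on the two-row sample, not a tuned solution for the full data set. The helper names flag_cols and topic_list are made up for the example, and it assumes the codes are separated by plain commas with no surrounding spaces.

```r
# Recreate the two-row sample from the question.
# flag_cols and topic_list are hypothetical names used only in this sketch.
Topics <- c("E11,E31,E313,ECAT", "E1,E20")
flag_cols <- c("E1", "E11", "E20", "E30", "E31", "E100", "E300", "E313", "ECAT")
df <- data.frame(Topics, stringsAsFactors = FALSE)
for (col in flag_cols) {
  df[[col]] <- FALSE
}

# Step 1: split each Topics string into a character vector of codes.
# as.character() guards against Topics having been stored as a factor.
topic_list <- strsplit(as.character(df$Topics), ",", fixed = TRUE)

# Steps 2 and 3: for each flag column, check row by row whether the column
# name appears among that row's codes, and store the TRUE/FALSE result.
for (col in flag_cols) {
  df[[col]] <- vapply(topic_list, function(codes) col %in% codes, logical(1))
}

print(df)
```

The %in% test compares whole codes, so "E1" is not flagged just because "E11" is present, which a substring search such as grepl("E1", Topics) would do. The outer loop runs once per column rather than once per row, which should help on 800k rows, though I have not timed it on data of that size.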