I have a large data frame, ‘df’. The data are fuzzy: they contain misspellings and follow no consistent pattern. See this example:
structure(list(ABC = structure(c(1L, 3L, 4L, 6L, 8L, 9L, 5L,
11L, 2L, 7L, 10L), .Label = c("2-8-2010 14:42:00 (number not ok)",
"2-8-2010 18:42:00 (nuber is not oke)", "2-8-2010 18:42:00 (number is not ok)",
"2-9-2010 14:47:00 (? Not ok )", "23:59 missing &^%", "26-9-2010 23.24",
"26-9-2010 23.24 not (working)", "26-9-2010 23.28 note: shutdown number!)",
"26-9-2010 23.29 (missing brackets", "Im oke and working\n",
"number"), class = "factor")), .Names = "ABC", row.names = c(NA,
-11L), class = "data.frame")
Q) How do I recode a string variable based on a match with a target string?
Specifically, how do I recode the variable ‘ABC’ so that, when a string matches the words “not working” or “number is not ok”, a new variable XYZ is labeled ‘present’ (and otherwise ‘absent’, or ‘missing’ when ABC is empty)? I’m aiming for this:
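A minimal sketch of the literal reading of that rule, assuming the example data above and base R's `grepl()`: the two phrases are joined into one regular expression, and any row whose `ABC` contains either phrase gets `XYZ = "present"`. On this data it flags only the "number is not ok" row, which shows why exact phrase matching misses entries such as "number not ok" or "not (working)".

```r
# Example data: the ABC values from the dput() output above, in row order.
df <- data.frame(
  ABC = c("2-8-2010 14:42:00 (number not ok)",
          "2-8-2010 18:42:00 (number is not ok)",
          "2-9-2010 14:47:00 (? Not ok )",
          "26-9-2010 23.24",
          "26-9-2010 23.28 note: shutdown number!)",
          "26-9-2010 23.29 (missing brackets",
          "23:59 missing &^%",
          "number",
          "2-8-2010 18:42:00 (nuber is not oke)",
          "26-9-2010 23.24 not (working)",
          "Im oke and working\n"),
  stringsAsFactors = FALSE
)

# Literal reading of the rule: either phrase, matched exactly and case-sensitively.
pattern <- "not working|number is not ok"
df$XYZ <- factor(ifelse(grepl(pattern, df$ABC), "present", "absent"),
                 levels = c("absent", "present"))

# Only row 2 ("... (number is not ok)") is flagged as "present".
print(df)
```

`grepl()` also accepts the factor column from the original `df`, since it coerces its input with `as.character()`.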
structure(list(ABC = structure(c(2L, 4L, 5L, 7L, 9L, 10L, 6L,
1L, 12L, 3L, 8L, 11L), .Label = c("", "2-8-2010 14:42:00 (number not ok)",
"2-8-2010 18:42:00 (nuber is not oke)", "2-8-2010 18:42:00 (number is not ok)",
"2-9-2010 14:47:00 (? Not ok )", "23:59 missing &^%", "26-9-2010 23.24",
"26-9-2010 23.24 not (working)", "26-9-2010 23.28 note: shutdown number!)",
"26-9-2010 23.29 (missing brackets", "Im oke and working\tabsent\n",
"number"), class = "factor"), XYZ = structure(list(XYZ = structure(c(3L,
3L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 3L, 1L), .Label = c("absent",
"missing", "present"), class = "factor")), .Names = "XYZ", class = "data.frame", row.names = c(NA,
-12L))), .Names = c("ABC", "XYZ"), row.names = c(NA, -12L), class = "data.frame")
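The desired output can be reproduced with a looser pattern. This is a sketch under an assumed rule inferred from the target above, not stated in the question: a row is "present" when "number" is later followed by "not ok", or when "not" is later followed by "working"; it is "missing" when `ABC` is empty; otherwise it is "absent". The sketch stores `XYZ` as a plain factor column rather than the nested data frame shown in the `dput()` output, and leaves out the stray "\tabsent" in the last value.

```r
# ABC values in the row order of the desired output (row 8 is empty).
df <- data.frame(
  ABC = c("2-8-2010 14:42:00 (number not ok)",
          "2-8-2010 18:42:00 (number is not ok)",
          "2-9-2010 14:47:00 (? Not ok )",
          "26-9-2010 23.24",
          "26-9-2010 23.28 note: shutdown number!)",
          "26-9-2010 23.29 (missing brackets",
          "23:59 missing &^%",
          "",
          "number",
          "2-8-2010 18:42:00 (nuber is not oke)",
          "26-9-2010 23.24 not (working)",
          "Im oke and working\n"),
  stringsAsFactors = FALSE
)

# Assumed rule: "number ... not ok" OR "not ... working", case-sensitive.
pattern <- "number.*not ok|not.*working"

# Strip surrounding whitespace (including the trailing "\n") before testing.
txt <- trimws(df$ABC)

df$XYZ <- factor(
  ifelse(is.na(txt) | txt == "", "missing",
         ifelse(grepl(pattern, txt), "present", "absent")),
  levels = c("absent", "missing", "present")
)

print(df)
```

Misspellings such as "nuber is not oke" stay "absent" here, as in the target. If they should count as matches, base R's `agrepl()` does approximate string matching, but that would change the result for that row.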
I know there are similar examples on Stack Overflow, but I could not get them to work. I hope someone can point me in the right direction.
Thank you