I have the following column in my data frame that contains charges:
library(dplyr)
library(stringr)
df <- data.frame(charge = c("trespass-1st degree",
                            "trespass - 1st degree",
                            "rape or attempted rape - 1st degree",
                            "rape or attempt rape 1st degree",
                            "Assault 1st",
                            "Assault 1st"))
charge
1 trespass-1st degree
2 trespass - 1st degree
3 rape or attempted rape - 1st degree
4 rape or attempt rape 1st degree
5 Assault 1st
6 Assault 1st
I want to standardize charges that contain data entry errors, e.g. "trespass-1st degree" vs "trespass - 1st degree", and "rape or attempted rape - 1st degree" vs "rape or attempt rape 1st degree".
I tried the following:
df %>%
  mutate(charge =
           case_when(str_detect(charge, "^trespass-1st") ~ "Trespass 1st",
                     str_detect(charge, "^rape or attempted rape") ~ "Rape 1st"))
which gives me the following output
charge
1 Trespass 1st
2 <NA>
3 Rape 1st
4 <NA>
5 <NA>
6 <NA>
How do I make sure that whenever both "trespass" and "1st" are present in the charge column, the value gets tagged as "Trespass 1st", and whenever both "rape" and "1st" are present, it gets tagged as "Rape 1st", while all other charges stay as they are?
The desired data frame is:
charge
1 Trespass 1st
2 Trespass 1st
3 Rape 1st
4 Rape 1st
5 Assault 1st
6 Assault 1st
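One possible approach, offered as a sketch rather than a definitive answer: the attempt above returns NA because the anchored pattern "^trespass-1st" does not match "trespass - 1st degree" (the spaces around the hyphen differ), and case_when() yields NA for any row where no condition matches. Testing for each keyword with its own str_detect() call and combining the calls with & makes the match independent of the punctuation between the keywords, and a final TRUE ~ charge branch keeps unmatched values unchanged. The name result and the case-insensitive matching are assumptions added for illustration.

```r
library(dplyr)
library(stringr)

df <- data.frame(charge = c("trespass-1st degree",
                            "trespass - 1st degree",
                            "rape or attempted rape - 1st degree",
                            "rape or attempt rape 1st degree",
                            "Assault 1st",
                            "Assault 1st"),
                 stringsAsFactors = FALSE)  # keep charge as character on R < 4.0

# Check for each keyword separately so that spacing and hyphens
# between the keywords do not affect the match.
result <- df %>%
  mutate(charge = case_when(
    str_detect(charge, regex("trespass", ignore_case = TRUE)) &
      str_detect(charge, fixed("1st")) ~ "Trespass 1st",
    str_detect(charge, regex("rape", ignore_case = TRUE)) &
      str_detect(charge, fixed("1st")) ~ "Rape 1st",
    TRUE ~ charge  # fallback: leave every other charge as it is
  ))

print(result)
```

Note that a bare "rape" pattern would also match words such as "grape"; if that matters for the real data, a word-boundary pattern like "\\brape\\b" could be used instead.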