
I have the following column in my data frame that contains charges:

library(dplyr)
library(stringr)

  df<-data.frame(charge=c("trespass-1st degree",
      "trespass - 1st degree","rape or attempted rape - 1st degree",
      "rape or attempt rape 1st degree","Assault 1st","Assault 1st"))

                               charge
1                 trespass-1st degree
2               trespass - 1st degree
3 rape or attempted rape - 1st degree
4     rape or attempt rape 1st degree
5                         Assault 1st
6                         Assault 1st

I want to standardize charges that contain data entry errors, e.g. "trespass-1st degree" vs "trespass - 1st degree" and "rape or attempted rape - 1st degree" vs "rape or attempt rape 1st degree".

I tried the following

df%>%
  mutate(charge=
           case_when(str_detect(charge, "^trespass-1st") ~ "Trespass 1st",
                     str_detect(charge,"^rape or attempted rape")~"Rape 1st"))

which gives me the following output

        charge
1 Trespass 1st
2         <NA>
3     Rape 1st
4         <NA>
5         <NA>
6         <NA>

How do I make sure that if two strings such as "trespass" and "1st" are both present in the charge column, the row gets tagged as "Trespass 1st", and if "rape" and "1st" are both present, it gets tagged as "Rape 1st"?

To get the following df

        charge
1 Trespass 1st
2 Trespass 1st        
3     Rape 1st
4     Rape 1st
5  Assault 1st
6  Assault 1st

1 Answer


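The issue is that some elements differ in spacing (trespass-1st vs trespass - 1st) or in a suffix (attempt vs attempted). Making the spaces and the suffix optional in the patterns handles both cases, and a TRUE ~ charge fallback keeps unmatched rows as they are instead of turning them into NA.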

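library(dplyr)
library(stringr)  # provides str_detect()
df %>%
  mutate(charge = case_when(
    # optional whitespace on either side of the hyphen
    str_detect(charge, "^trespass\\s*-\\s*1st") ~ "Trespass 1st",
    # optional "ed" suffix: matches "attempt" and "attempted"
    str_detect(charge, "^rape or attempt(ed)? rape") ~ "Rape 1st",
    # leave every other charge unchanged
    TRUE ~ charge))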
#        charge
#1 Trespass 1st
#2 Trespass 1st
#3     Rape 1st
#4     Rape 1st
#5  Assault 1st
#6  Assault 1st

data

df <- structure(list(charge = c("trespass-1st degree", "trespass - 1st degree", 
"rape or attempted rape - 1st degree", "rape or attempt rape 1st degree", 
"Assault 1st", "Assault 1st")), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))
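
If the goal is literally "tag the row when both keywords are present", regardless of spacing, hyphens, or what sits between them, one option is to test each keyword separately and combine the results with a logical AND. The following is a minimal sketch rather than a definitive solution; `has_all` is a hypothetical helper name introduced here, not a function from dplyr or stringr, and the matching is case-insensitive by assumption.

```r
library(dplyr)
library(stringr)

df <- data.frame(charge = c("trespass-1st degree", "trespass - 1st degree",
                            "rape or attempted rape - 1st degree",
                            "rape or attempt rape 1st degree",
                            "Assault 1st", "Assault 1st"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: TRUE for each element of x that contains
# every keyword (case-insensitive), in any order and position.
has_all <- function(x, ...) {
  keywords <- c(...)
  hits <- lapply(keywords, function(k) {
    str_detect(x, regex(k, ignore_case = TRUE))
  })
  Reduce(`&`, hits)
}

out <- df %>%
  mutate(charge = case_when(
    has_all(charge, "trespass", "1st") ~ "Trespass 1st",
    has_all(charge, "rape", "1st")     ~ "Rape 1st",
    TRUE                               ~ charge  # keep everything else as is
  ))
```

Each keyword is treated as a regular expression, so a short keyword such as "rape" would also match inside longer words; wrapping it in word boundaries ("\\brape\\b") is one way to tighten that if your data needs it.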