I applied case_when to a text dataset of several thousand rows to detect strings matching multiple conditions and recode them, but got the wrong result because case_when stops at the first condition that is met and does not evaluate the remaining ones for that row. I have seen a solution in How to detect more than one regex in a case_when statement, but it does not handle several groups of multiple conditions, as my data requires.

Any alternative to case_when would be appreciated.

This is the dummy data:

statement <- structure(list(stmt = c("diabetes is common", "police not my friend",
  "transport is cheap", "english is my language", "education is my right")), 
  class = "data.frame", row.names = c(NA, -5L))

I tried to adapt the 1st solution in How to detect more than one regex in a case_when statement but could not really figure it out.

I want to detect strings in texts in column stmt and recode the column into these five domains: APC, PDP, APGA, APP and SDP. Below are strings to be detected:

APC <- c("addiction|mental|Diabetes|health|healthy|Oranga|unwell|AOD| well| surgery|dental|recovery|oranga|Mirimiri|asthma|anger|checks|alcohol|pregnant|clinical|clinic")

PDP <- c("whanau direct|whānau direct|money|transport|home|repairs|social|budget|job|housing|house|financial|finance|Ohanga|furniture|accommodation|welfare|living|work|babies arrival|AT hop card|Entitlements|ohunga|bills|electricity|water|employment")

APGA <- c("Kaupapa|Te reo|language|Tikanga|Iwi|relationship|Tikinga|Reunite")

APP <- c("Studying|training|NCEA|ECE|Counseling|counsel|Knowledge|School|Education|matauranga|parenting|skills")

rangatiratanga <- c("self-management|Rangitiratanga|custody|police|court|CYFS|advocacy|Oranga Tamariki|rangatiratanga|section 101|EPOA|Familly issues")
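
One thing to watch when building alternations like these: an empty branch, such as one produced by a doubled `||` or a trailing `|`, matches the empty string, so the whole pattern matches every input and that domain would win for every row. The following is a minimal sketch with made-up patterns and inputs, assuming base R's default regex engine:

```r
# Hypothetical inputs that contain none of the listed words.
x <- c("transport is cheap", "police not my friend")

# A well-formed alternation: no match for either input.
clean <- grepl("mental|Diabetes", x)

# An empty branch (doubled or trailing pipe) matches the empty string,
# so every input is reported as a match.
doubled  <- grepl("mental||Diabetes", x)
trailing <- grepl("mental|Diabetes|", x)

print(clean)     # FALSE FALSE
print(doubled)   # TRUE TRUE
print(trailing)  # TRUE TRUE
```

If patterns are assembled with paste(..., collapse = "|"), dropping empty strings first avoids the same problem.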

2 Answers

You may use case_when with grepl and a regex alternation:

library(dplyr)  # provides case_when()

statement$col <- case_when(
    grepl("(addiction|mental|Diabetes|health|healthy|Oranga|unwell|AOD| well| surgery|dental|recovery|oranga|Mirimiri|asthma|anger|checks|alcohol|pregnant|clinical|clinic)", statement$stmt) ~ "APC",
    grepl("(whanau direct|whānau direct|money|transport|home|repairs|social|budget|job|housing|house|financial|finance|Ohanga|furniture|accommodation|welfare|living|work|babies arrival|AT hop card|Entitlements|ohunga|bills|electricity|water|employment)", statement$stmt) ~ "PDP",
    grepl("(Kaupapa|Te reo|language|Tikanga|Iwi|relationship|Tikinga|Reunite)", statement$stmt) ~ "APGA",
    grepl("(Studying|training|NCEA|ECE|Counseling|counsel|Knowledge|School|Education|matauranga|parenting|skills)", statement$stmt) ~ "APP",
    grepl("(self-management|Rangitiratanga|custody|police|court|CYFS|advocacy|Oranga Tamariki|rangatiratanga|section 101|EPOA|Familly issues)", statement$stmt) ~ "rangatiratanga",
    TRUE ~ NA_character_
)
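
Because case_when returns only the first matching branch, a statement that belongs to several domains gets only one label. If every matching domain is wanted, one possible alternative is to test each pattern separately and combine the hits. This is only a sketch in base R: the shortened patterns, the example statements, and the names `patterns`, `hits` and `all_domains` are illustrative assumptions, not part of the original data.

```r
# Named vector of (shortened, illustrative) patterns; names are the domain labels.
patterns <- c(
  APC  = "diabetes|health|clinic",
  PDP  = "transport|money|housing",
  APGA = "language|te reo|iwi",
  APP  = "education|school|training",
  rangatiratanga = "police|court|custody"
)

# Made-up statements; the third one belongs to two domains.
stmt <- c("diabetes is common", "police not my friend",
          "transport to school is cheap", "nothing relevant here")

# One logical column per domain: TRUE where the pattern matches the statement.
hits <- sapply(patterns, grepl, x = stmt, ignore.case = TRUE)

# Collapse the matching domain names for each row; NA when nothing matches.
all_domains <- apply(hits, 1, function(row) {
  if (any(row)) paste(names(patterns)[row], collapse = ";") else NA_character_
})

print(all_domains)
# "APC" "rangatiratanga" "PDP;APP" NA
```

Note that sapply returns a matrix here only because there is more than one statement; a single-row input would need extra handling.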

6 Comments

Hi @Tim Biegeleisen, can you check your code? It throws an error: Error: must be a character vector, not a logical vector.
@William I have tested my general syntax locally and it is working. Maybe you copied my code incorrectly, or maybe you changed it from what you see above?
@William @Tim You probably need to change it to TRUE ~ NA_character_.
@RonakShah Thanks...I would have thought that a character vector can store an NA value.
@TimBiegeleisen Actually, the class of NA is logical, but it would still have worked with base::ifelse because that automatically converts a logical NA to a character NA. However, dplyr::if_else and dplyr::case_when don't do that. You need to explicitly specify the type of NA.
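
The type distinction discussed in these comments can be checked directly in base R. This is a small sketch with a made-up vector `x`. Newer dplyr releases may be more lenient about a bare NA in case_when, but that is not verified here.

```r
# A bare NA is logical; NA_character_ is the character-typed missing value.
na_type     <- typeof(NA)             # "logical"
na_chr_type <- typeof(NA_character_)  # "character"

# base::ifelse coerces the logical NA to character when mixing it with strings.
x <- c("a", "b")
res <- ifelse(x == "a", "yes", NA)

print(na_type)
print(na_chr_type)
print(typeof(res))  # "character"
print(res)          # "yes" NA
```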

Thanks to @Tim Biegeleisen, but detecting strings with plain case_when() and grepl() can give wrong results (missed matches) when the case of the pattern and the text differ. grepl() accepts an ignore.case argument; setting it to TRUE makes the string matching case-insensitive, as in the code below:

statement$col <- case_when(
    grepl("(addiction|mental|Diabetes|health|healthy|Oranga|unwell|AOD| well| surgery|dental|recovery|oranga|Mirimiri|asthma|anger|checks|alcohol|pregnant|clinical|clinic)", statement$stmt, ignore.case = TRUE) ~ "APC",
    grepl("(whanau direct|whānau direct|money|transport|home|repairs|social|budget|job|housing|house|financial|finance|Ohanga|furniture|accommodation|welfare|living|work|babies arrival|AT hop card|Entitlements|ohunga|bills|electricity|water|employment)", statement$stmt, ignore.case = TRUE) ~ "PDP",
    grepl("(Kaupapa|Te reo|language|Tikanga|Iwi|relationship|Tikinga|Reunite)", statement$stmt, ignore.case = TRUE) ~ "APGA",
    grepl("(Studying|training|NCEA|ECE|Counseling|counsel|Knowledge|School|Education|matauranga|parenting|skills)", statement$stmt, ignore.case = TRUE) ~ "APP",
    grepl("(self-management|Rangitiratanga|custody|police|court|CYFS|advocacy|Oranga Tamariki|rangatiratanga|section 101|EPOA|Familly issues)", statement$stmt, ignore.case = TRUE) ~ "rangatiratanga",
    TRUE ~ NA_character_
)
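
The effect of ignore.case can be seen on two of the dummy statements. This is a minimal base R illustration using a shortened pattern:

```r
stmt <- c("diabetes is common", "education is my right")

# Case-sensitive by default: "Diabetes" does not match "diabetes".
strict <- grepl("Diabetes|Education", stmt)

# Case-insensitive matching finds both.
relaxed <- grepl("Diabetes|Education", stmt, ignore.case = TRUE)

print(strict)   # FALSE FALSE
print(relaxed)  # TRUE TRUE
```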
