I have a data frame df that looks like this, and I would like to build a new variable Main that holds the matching values whenever Subject contains Math or ELA. The sample data and my code are:

df<- structure(list(Subject = c("Math", "Math,ELA", "Math,ELA, PE", 
"PE, Math", "ART,ELA", "PE,ART")), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

df<-df %>%
  mutate(Main=case_when (grepl("Math|ELA", Subject)~ paste0(str_extract_all(df$Subject, "Math|ELA"))))

However, my outcome (originally posted as a screenshot) is not what I want. What did I do wrong? I feel that my code complicates a simple step. Is there a better solution?

2 Answers

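str_extract_all returns a list with one character vector of matches per element of Subject. We need to loop over the list and collapse each element into a single string (with toString, paste, or str_c):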

library(dplyr)
library(stringr)
library(purrr)
df %>%
  mutate(Main = case_when(grepl("Math|ELA", Subject)~ 
        map_chr(str_extract_all(Subject, "Math|ELA"), toString)))

-output

# A tibble: 6 x 2
#  Subject      Main     
#  <chr>        <chr>    
#1 Math         Math     
#2 Math,ELA     Math, ELA
#3 Math,ELA, PE Math, ELA
#4 PE, Math     Math     
#5 ART,ELA      ELA      
#6 PE,ART       <NA> 

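Another option is separate_rows from tidyr, which splits each Subject into one row per value so the values can be intersected with the targets. Note that rows without a match get an empty string here rather than NA: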

library(tidyr)
df %>% 
  mutate(rn = row_number()) %>% 
  separate_rows(Subject) %>% 
  group_by(rn) %>%
  summarise(Main = toString(intersect(Subject, c("Math", "ELA"))), 
       .groups = 'drop') %>% 
  select(Main) %>%
  bind_cols(df, .)

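NOTE: paste0 applied directly to a list does not collapse the matches. It coerces each list element to a single string, so an element with two matches becomes the literal text c("Math", "ELA"). We need to loop over the list and collapse each element instead.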
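To make the difference concrete, here is a minimal base R sketch. It assumes the sample Subject values from the question and uses regmatches with gregexpr, which returns a list of the same shape as str_extract_all (one character vector per input string). The variable names subject, matches, pasted and collapsed are only illustrative.

```r
# Sample input taken from the question
subject <- c("Math", "Math,ELA", "Math,ELA, PE", "PE, Math", "ART,ELA", "PE,ART")

# One character vector of matches per input string, returned as a list
matches <- regmatches(subject, gregexpr("Math|ELA", subject))

# paste0() on the list coerces each element to one string, so an element
# with two matches is deparsed rather than collapsed
pasted <- paste0(matches)
pasted[2]
# [1] "c(\"Math\", \"ELA\")"

# Looping over the list and collapsing each element gives one string per row
collapsed <- vapply(matches, toString, character(1))

# Elements with no match collapse to "", so set those to NA explicitly
collapsed[lengths(matches) == 0] <- NA_character_
```

vapply is used here instead of map_chr only to keep the sketch free of package dependencies; the idea is the same.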


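Or another option is base R: remove every word that is not Math or ELA with gsub, then trim the leftover commas and spaces with trimws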

trimws(gsub("(Math|ELA)(*SKIP)(*FAIL)|\\w+", "", df$Subject, perl = TRUE), whitespace = ",\\s*")
#[1] "Math"     "Math,ELA" "Math,ELA" "Math"     "ELA"      ""     
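The one-liner above can be split into steps to see what each part does. In the pattern, (Math|ELA)(*SKIP)(*FAIL) matches a wanted word and then deliberately discards that match, so only the other words reach the \\w+ alternative and get removed. This is a sketch on the question's sample values; the names subject, stripped and cleaned are illustrative, and the whitespace argument of trimws needs R 3.6.0 or later.

```r
# Sample input taken from the question
subject <- c("Math", "Math,ELA", "Math,ELA, PE", "PE, Math", "ART,ELA", "PE,ART")

# Step 1: delete every word except Math and ELA; separators are left behind
stripped <- gsub("(Math|ELA)(*SKIP)(*FAIL)|\\w+", "", subject, perl = TRUE)
stripped
# [1] "Math"       "Math,ELA"   "Math,ELA, " ", Math"     ",ELA"       ","

# Step 2: trim leading and trailing commas (with any spaces after them)
cleaned <- trimws(stripped, whitespace = ",\\s*")

# Step 3 (optional): turn the empty result for non-matching rows into NA
cleaned[cleaned == ""] <- NA_character_
```

Unlike the toString-based versions, this keeps the original separator, so the second row stays "Math,ELA" rather than "Math, ELA".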

4 Comments

Thanks. If we don't use str_extract_all, any better way?
@Stataq you could use separate_rows and then extract the strings as well
@Stataq updated with another option
Could you also show me how to use separate_rows to do this? Many thanks.

Here is a base R option using regmatches, which sets rows without a match to NA

transform(
  df,
  Main = sapply(
    regmatches(Subject, gregexpr("Math|ELA", Subject)),
    function(x) replace(toString(x), !length(x), NA)
  )
)

which gives

       Subject      Main
1         Math      Math
2     Math,ELA Math, ELA
3 Math,ELA, PE Math, ELA
4     PE, Math      Math
5      ART,ELA       ELA
6       PE,ART      <NA>
