How to build a new variable by extracting a string from another variable

Question

I have a df that looks like this, and I would like to build a new variable Main containing whichever of Math or ELA appear in Subject. The sample data and my code are:

df<- structure(list(Subject = c("Math", "Math,ELA", "Math,ELA, PE", 
"PE, Math", "ART,ELA", "PE,ART")), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

df <- df %>%
  mutate(Main = case_when(grepl("Math|ELA", Subject) ~ paste0(str_extract_all(df$Subject, "Math|ELA"))))

However, my outcome is not what I want. What did I do wrong? I feel that my code overcomplicates a simple step. Is there a better solution?

akrun · Accepted Answer · 2021-04-19 19:05:56Z

1

str_extract_all returns a list, with one character vector per input string. We need to loop over the list and collapse each element with paste/str_c (here, toString).

library(dplyr)
library(stringr)
library(purrr)
df %>%
  mutate(Main = case_when(grepl("Math|ELA", Subject)~ 
        map_chr(str_extract_all(Subject, "Math|ELA"), toString)))

-output

# A tibble: 6 x 2
#  Subject      Main     
#  <chr>        <chr>    
#1 Math         Math     
#2 Math,ELA     Math, ELA
#3 Math,ELA, PE Math, ELA
#4 PE, Math     Math     
#5 ART,ELA      ELA      
#6 PE,ART       <NA>

Another option is separate_rows from tidyr:

library(tidyr)
df %>% 
  mutate(rn = row_number()) %>% 
  separate_rows(Subject) %>% 
  group_by(rn) %>%
  summarise(Main = toString(intersect(Subject, c("Math", "ELA"))), 
       .groups = 'drop') %>% 
  select(Main) %>%
  bind_cols(df, .)
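To see what the pipeline above does at each stage, here is a minimal sketch that stops after separate_rows and then finishes the summary. The names long and res are only illustrative. Note that toString on an empty intersection gives "", so this sketch adds na_if (from dplyr) to turn that into NA; that extra step is an assumption about the desired output, not part of the original answer.

```r
library(dplyr)
library(tidyr)

df <- tibble(Subject = c("Math", "Math,ELA", "Math,ELA, PE",
                         "PE, Math", "ART,ELA", "PE,ART"))

# Step 1: one row per subject; rn remembers which original row it came from.
# The default separator splits on any run of non-alphanumeric characters,
# so both "," and ", " work.
long <- df %>%
  mutate(rn = row_number()) %>%
  separate_rows(Subject)

# Step 2: per original row, keep only Math/ELA and collapse to one string.
# Rows with no match produce "", which na_if() converts to NA.
res <- long %>%
  group_by(rn) %>%
  summarise(Main = toString(intersect(Subject, c("Math", "ELA"))),
            .groups = "drop") %>%
  mutate(Main = na_if(Main, "")) %>%
  select(Main) %>%
  bind_cols(df, .)

print(res)
```

Printing long first is a quick way to check that the subjects were split the way you expect before summarising.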

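NOTE: paste by itself does not join the strings inside each list element; it deparses each element into a single string. Because str_extract_all returns a list, we need to loop over the list and collapse each element.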
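The difference can be checked on a small vector. This is a minimal sketch; subject, matches, pasted and collapsed are illustrative names, and vapply is used here in place of map_chr only to avoid the purrr dependency.

```r
library(stringr)

subject <- c("Math", "Math,ELA", "PE,ART")

# str_extract_all() returns a list: one character vector per input string
matches <- str_extract_all(subject, "Math|ELA")
str(matches)

# paste0() on the list deparses each element, so an empty match
# becomes the literal string "character(0)"
pasted <- paste0(matches)
print(pasted)

# Looping over the list and collapsing each element gives one string per row
collapsed <- vapply(matches, toString, character(1))
print(collapsed)
# [1] "Math"      "Math, ELA" ""
```

An element with no matches collapses to "", which is why the answer wraps the call in case_when: rows that fail the grepl test get NA instead.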

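Or another option is to use gsub with a PCRE pattern that deletes every word except Math and ELA, followed by trimws to strip the leftover commas: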

trimws(gsub("(Math|ELA)(*SKIP)(*FAIL)|\\w+", "", df$Subject, perl = TRUE), whitespace = ",\\s*")
#[1] "Math"     "Math,ELA" "Math,ELA" "Math"     "ELA"      ""
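The one-liner is easier to follow when split into its two steps. In this sketch (variable names are illustrative), the first alternative matches Math or ELA and then discards that match with (*SKIP)(*FAIL), so only the remaining words are left for \\w+ to match and delete. The whitespace argument of trimws requires R 3.6.0 or later.

```r
subject <- c("Math", "Math,ELA", "Math,ELA, PE", "PE, Math", "ART,ELA", "PE,ART")

# Step 1: delete every word that is not Math or ELA; separators stay behind
stripped <- gsub("(Math|ELA)(*SKIP)(*FAIL)|\\w+", "", subject, perl = TRUE)
print(stripped)

# Step 2: trim the leftover commas and spaces from both ends
main <- trimws(stripped, whitespace = ",\\s*")
print(main)
```

Unlike the case_when version, rows without a match end up as "" rather than NA here.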

edited Apr 19, 2021 at 19:05

answered Apr 19, 2021 at 18:45

akrun


4 Comments

Stataq Over a year ago

Thanks. If we don't use str_extract_all, any better way?

akrun Over a year ago

@Stataq you could use separate_rows and then extract the strings as well

akrun Over a year ago

@Stataq updated with another option

Stataq Over a year ago

Could you also show me how to use separate_rows to do this? Many thanks.

ThomasIsCoding · Accepted Answer · 2021-04-19 20:39:55Z

1

Here is a base R option using regmatches

transform(
  df,
  Main = sapply(
    regmatches(Subject, gregexpr("Math|ELA", Subject)),
    function(x) replace(toString(x), !length(x), NA)
  )
)

which gives

       Subject      Main
1         Math      Math
2     Math,ELA Math, ELA
3 Math,ELA, PE Math, ELA
4     PE, Math      Math
5      ART,ELA       ELA
6       PE,ART      <NA>
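The replace(toString(x), !length(x), NA) idiom can be hard to read at first. Here is a sketch of the same logic with an explicit helper; collapse_or_na is a hypothetical name, and vapply is used instead of sapply only so that the result is guaranteed to be a character vector.

```r
subject <- c("Math", "Math,ELA", "Math,ELA, PE", "PE, Math", "ART,ELA", "PE,ART")

# One character vector of matches per input string
matches <- regmatches(subject, gregexpr("Math|ELA", subject))

# Hypothetical helper: NA when nothing matched, otherwise "a, b, ..."
collapse_or_na <- function(x) {
  if (length(x) == 0) NA_character_ else toString(x)
}

main <- vapply(matches, collapse_or_na, character(1))
print(main)
# [1] "Math"      "Math, ELA" "Math, ELA" "Math"      "ELA"       NA
```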

answered Apr 19, 2021 at 20:39

ThomasIsCoding
