Given a data frame of topics and keywords like so:

topic keyword
cheese cheddar
meat beef
meat chicken
cheese swiss
bread focaccia
bread sourdough
cheese gouda

My aim is to build a set of dynamic regexes, one per topic, but I don't know how to generate the variable names from the topics. I can do this for one topic at a time like so:

fn_get_topic_regex <- function(targettopic,df)
{
  filter_df <- df |>
    filter(topic == targettopic)
  regex <- paste(filter_df$keyword, collapse =  "|")
}

and do things like:

cheese_regex <- fn_get_topic_regex("cheese",df)

But what I'd like to be able to do is build all these regexes automatically without having to define each one.

The intended output would be something like:

cheese_regex: "cheddar|swiss|gouda"
bread_regex: "focaccia|sourdough"
meat_regex: "beef|chicken"

Where the start of the variable name is the distinct topic.

What's the best way to do that without defining each regex individually by hand?

  • The assign() function may help you. Commented Nov 26, 2022 at 20:12
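
The assign() suggestion can be sketched as follows. This is a minimal illustration, assuming the df from the question (rebuilt here so the snippet runs on its own); the loop variable tp is a hypothetical name, not part of the original post.

```r
# Sample data from the question
df <- data.frame(
  topic   = c("cheese", "meat", "meat", "cheese", "bread", "bread", "cheese"),
  keyword = c("cheddar", "beef", "chicken", "swiss", "focaccia", "sourdough", "gouda")
)

# For each distinct topic, collapse its keywords into one pattern and
# create a variable called "<topic>_regex" in the current environment
for (tp in unique(df$topic)) {
  pattern <- paste(df$keyword[df$topic == tp], collapse = "|")
  assign(paste0(tp, "_regex"), pattern)
}

cheese_regex
```

Variables created this way are harder to loop over later than a named list or vector, so this is probably best kept for cases where separate variables are really needed.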

3 Answers

You can use dplyr's group_by() and summarise():

library(dplyr)

df %>%
  group_by(topic) %>%
  summarise(regex = paste(keyword, collapse = "|"))

# A tibble: 3 × 2
  topic  regex              
  <chr>  <chr>              
1 bread  focaccia|sourdough 
2 cheese cheddar|swiss|gouda
3 meat   beef|chicken 

Or you can apply your function to every unique value in df$topic:

library(purrr)  # map_chr()
map_chr(unique(df$topic) %>% setNames(paste0(., "_regex")),
        fn_get_topic_regex, df = df)

         cheese_regex            meat_regex           bread_regex 
"cheddar|swiss|gouda"        "beef|chicken"  "focaccia|sourdough"

Your function already returns the pattern, because an assignment on the last line returns its value invisibly, but it is clearer to add return(regex) at the end or not to assign the last line to a variable at all. I would even put everything in a single pipe chain:

fn_get_topic_regex <- function(targettopic,df)
{
  df |>
    filter(topic == targettopic) |>
    pull(keyword) |>
    paste(collapse =  "|")
}

Here is a base R solution with your intended output in a named list.

df <- structure(list(topic = c("cheese", "meat", "meat", "cheese", "bread", "bread", "cheese"), 
                     keyword = c("cheddar", "beef", "chicken", "swiss", "focaccia", "sourdough", "gouda")), 
                class = "data.frame", row.names = c(NA, -7L))

#split into a list per topic
topics <- split(df, df$topic)

#collapse the keyword column
topics <- lapply(topics, function(t) {
   paste(t$keyword, collapse =  "|")
})

#append "_regex" to each name
names(topics) <- paste0(names(topics), "_regex")

topics

$bread_regex
[1] "focaccia|sourdough"

$cheese_regex
[1] "cheddar|swiss|gouda"

$meat_regex
[1] "beef|chicken"
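
If separate variables are wanted rather than list elements, a named list like this can be copied into an environment with base R's list2env(). The following is a sketch under that assumption; regex_env is a hypothetical holding environment introduced for illustration, and passing globalenv() instead would create top-level variables.

```r
# Sample data from the question
df <- data.frame(
  topic   = c("cheese", "meat", "meat", "cheese", "bread", "bread", "cheese"),
  keyword = c("cheddar", "beef", "chicken", "swiss", "focaccia", "sourdough", "gouda")
)

# Named list of patterns, one per topic
topics <- lapply(split(df$keyword, df$topic), paste, collapse = "|")
names(topics) <- paste0(names(topics), "_regex")

# Copy every list element into a variable of the same name.
# regex_env is a hypothetical environment used to avoid cluttering the workspace
regex_env <- new.env()
list2env(topics, envir = regex_env)

regex_env$cheese_regex
```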


We could do something like this:

  1. After grouping, we can use summarise() together with paste(collapse = "|") to build one regex per topic.
  2. Then, when a regex is needed, we can refer to it by index, as in the example below:

library(dplyr)
library(stringr) # str_detect()

# one row per topic, with its keywords collapsed into a single pattern
my_regex <- df %>% 
  group_by(topic) %>% 
  summarise(regex = paste(keyword, collapse = "|"))

my_regex

# A tibble: 3 × 2
  topic  regex              
  <chr>  <chr>              
1 bread  focaccia|sourdough 
2 cheese cheddar|swiss|gouda
3 meat   beef|chicken       

# my_regex$regex[1] is the bread pattern, because the rows are sorted by topic
df %>% 
  mutate(new_col = ifelse(str_detect(keyword, my_regex$regex[1]), "it is bread", "it is not bread"))

   topic   keyword         new_col
1 cheese   cheddar it is not bread
2   meat      beef it is not bread
3   meat   chicken it is not bread
4 cheese     swiss it is not bread
5  bread  focaccia     it is bread
6  bread sourdough     it is bread
7 cheese     gouda it is not bread
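
Indexing by position depends on the row order of the summary, so looking a pattern up by topic name may be more robust. A possible sketch, assuming the df from the question (rebuilt here); patterns and result are hypothetical names introduced for illustration.

```r
library(dplyr)
library(stringr)

# Sample data from the question
df <- data.frame(
  topic   = c("cheese", "meat", "meat", "cheese", "bread", "bread", "cheese"),
  keyword = c("cheddar", "beef", "chicken", "swiss", "focaccia", "sourdough", "gouda")
)

my_regex <- df %>%
  group_by(topic) %>%
  summarise(regex = paste(keyword, collapse = "|"))

# Named character vector: names are topics, values are patterns
patterns <- setNames(my_regex$regex, my_regex$topic)

# Look the pattern up by name instead of by row position
result <- df %>%
  mutate(new_col = ifelse(str_detect(keyword, patterns[["bread"]]),
                          "it is bread", "it is not bread"))

result
```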
