Dynamic variables from dataframe value in R with value names?

Question

Given a dataframe of types and values like so:

topic	keyword
cheese	cheddar
meat	beef
meat	chicken
cheese	swiss
bread	focaccia
bread	sourdough
cheese	gouda

My aim is to make a set of dynamic regexes based on the type, but I don't know how to make the variable names from the types. I can do this individually like so:

fn_get_topic_regex <- function(targettopic,df)
{
  filter_df <- df |>
    filter(topic == targettopic)
  regex <- paste(filter_df$keyword, collapse =  "|")
}

and do things like:

cheese_regex <- fn_get_topic_regex("cheese",df)

But what I'd like to be able to do is build all these regexes automatically without having to define each one.

The intended output would be something like:

cheese_regex: "cheddar|swiss|gouda"
bread_regex: "focaccia|sourdough"
meat_regex: "beef|chicken"

Where the start of the variable name is the distinct topic.

What's the best way to do that without defining each regex individually by hand?

Function assign() may help you

– asd-tm, Commented Nov 26, 2022 at 20:12
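
As a rough sketch of what the assign() suggestion in this comment might look like, the loop below creates one variable per distinct topic in the current environment. The names tp and pattern are illustrative and not from the original post, and the data frame is rebuilt from the question's example.

```r
# Example data from the question
df <- data.frame(
  topic   = c("cheese", "meat", "meat", "cheese", "bread", "bread", "cheese"),
  keyword = c("cheddar", "beef", "chicken", "swiss", "focaccia", "sourdough", "gouda"),
  stringsAsFactors = FALSE
)

# For each distinct topic, collapse its keywords into one alternation pattern
# and bind it to a variable named "<topic>_regex" (tp and pattern are hypothetical names)
for (tp in unique(df$topic)) {
  pattern <- paste(df$keyword[df$topic == tp], collapse = "|")
  assign(paste0(tp, "_regex"), pattern)
}

# The variables now exist under their generated names
cheese_regex
```

Variables created this way are spread across the workspace, so a named list or vector, as in the answers, is often easier to loop over later.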

Ricardo Semião · Accepted Answer · 2022-11-26 20:21:20Z

You can use dplyr's group_by() and summarise():

df %>%
  group_by(topic) %>%
  summarise(regex = paste(keyword, collapse = "|"))

# A tibble: 3 × 2
  topic  regex              
  <chr>  <chr>              
1 bread  focaccia|sourdough 
2 cheese cheddar|swiss|gouda
3 meat   beef|chicken

Or you can apply your function to every unique value in df$topic with purrr's map_chr():

map_chr(unique(df$topic) %>% setNames(paste0(., "_regex")),
        fn_get_topic_regex, df = df)

         cheese_regex            meat_regex           bread_regex 
"cheddar|swiss|gouda"        "beef|chicken"  "focaccia|sourdough"

Just remember to add return(regex) to the end of your function, or not to assign the last line to a variable at all. I would even put everything in a single pipe chain:

fn_get_topic_regex <- function(targettopic,df)
{
  df |>
    filter(topic == targettopic) |>
    pull(keyword) |>
    paste(collapse =  "|")
}

Dave2e · Accepted Answer · 2022-11-26 20:24:03Z

Here is a base R solution with your intended output in a named list.

df <- structure(list(topic = c("cheese", "meat", "meat", "cheese", "bread", "bread", "cheese"), 
                     keyword = c("cheddar", "beef", "chicken", "swiss", "focaccia", "sourdough", "gouda")), 
                class = "data.frame", row.names = c(NA, -7L))

#split into a list per topic
topics <- split(df, df$topic)

#collapse the keyword column
topics <- lapply(topics, function(t) {
   paste(t$keyword, collapse =  "|")
})

#rename
names(topics)<- paste0(names(topics), "_regex")

topics

$bread_regex
[1] "focaccia|sourdough"

$cheese_regex
[1] "cheddar|swiss|gouda"

$meat_regex
[1] "beef|chicken"
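
If separate variables are really wanted rather than list elements, one possible follow-up is list2env(), which binds each element of a named list to a variable of the same name. This is only a sketch: the name regexes is illustrative, and the list is rebuilt here from the question's data so the example runs on its own.

```r
# Example data from the question
df <- data.frame(
  topic   = c("cheese", "meat", "meat", "cheese", "bread", "bread", "cheese"),
  keyword = c("cheddar", "beef", "chicken", "swiss", "focaccia", "sourdough", "gouda"),
  stringsAsFactors = FALSE
)

# Build the named list: one collapsed pattern per topic
regexes <- lapply(split(df$keyword, df$topic), paste, collapse = "|")
names(regexes) <- paste0(names(regexes), "_regex")

# Bind every list element to a variable in the current environment
# (the global environment when run at top level)
invisible(list2env(regexes, envir = environment()))

# The generated variables can now be used directly
grepl(bread_regex, "sourdough loaf")
```

Keeping the list and indexing it, for example regexes[["bread_regex"]], avoids creating variables whose names are only known at run time.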

TarJae · Accepted Answer · 2022-11-26 20:24:12Z

We could do something like this:

After grouping, we could use summarise() together with paste() and its collapse argument to build the regexes.
Then, when a regex is needed, we could refer to it by indexing, as in the example below:

library(dplyr)
library(stringr) #str_detect
my_regex <- df %>% 
  group_by(topic) %>% 
  summarise(regex = paste(keyword, collapse = "|"))

df %>% 
  mutate(new_col = ifelse(str_detect(keyword, my_regex$regex[1]), "it is bread", "it is not bread"))

# A tibble: 3 × 2
  topic  regex              
  <chr>  <chr>              
1 bread  focaccia|sourdough 
2 cheese cheddar|swiss|gouda
3 meat   beef|chicken       
> df %>% 
+   mutate(new_col = ifelse(str_detect(keyword, my_regex$regex[1]), "it is bread", "it is not bread"))
   topic   keyword         new_col
1 cheese   cheddar it is not bread
2   meat      beef it is not bread
3   meat   chicken it is not bread
4 cheese     swiss it is not bread
5  bread  focaccia     it is bread
6  bread sourdough     it is bread
7 cheese     gouda it is not bread
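
Indexing with my_regex$regex[1] relies on bread being the first row of the summary. A possible alternative is to look the pattern up by topic name. The sketch below uses base R (split(), vapply(), grepl()) instead of dplyr and stringr so that it runs on its own; the name regex_by_topic is illustrative.

```r
# Example data from the question
df <- data.frame(
  topic   = c("cheese", "meat", "meat", "cheese", "bread", "bread", "cheese"),
  keyword = c("cheddar", "beef", "chicken", "swiss", "focaccia", "sourdough", "gouda"),
  stringsAsFactors = FALSE
)

# Named character vector: names are topics, values are collapsed patterns
regex_by_topic <- vapply(split(df$keyword, df$topic), paste,
                         character(1), collapse = "|")

# Look the pattern up by name rather than by row position
df$new_col <- ifelse(grepl(regex_by_topic[["bread"]], df$keyword),
                     "it is bread", "it is not bread")

df
```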
