I have a dataframe with an id column and an eats column, and a separate food list. I want to process the dataframe so that a column is added for each food in the food list, populated with 1 if the food is present in eats and 0 otherwise.

library(dplyr)    # tibble(), mutate(), if_else(), %>%
library(stringr)  # str_detect()

txt <- tibble(id = c(1, 2, 3),
              eats = c("apple, oats, banana, milk, sugar",
                       "oats, banana, sugar",
                       "chocolate, milk, sugar"))

food_list <- c("apple", "oats", "chocolate")

# Add one 0/1 column per food, named after the food
for (i in food_list) {
  print(i)
  txt <- txt %>%
    mutate(!!i := if_else(str_detect(eats, i), 1, 0))
}

I can do this with a for loop, but I am struggling to do it without one. I would be very grateful if someone could point me to how this can be done with the purrr library's map functions instead of a for loop.

Thanks!

5 Answers

5

We could use map_dfc as follows:

library(purrr)
library(dplyr)
library(stringr)
txt <- map_dfc(food_list, ~ txt %>%
      transmute(!! .x := +(stringr::str_detect(eats, .x)))) %>% 
    bind_cols(txt, .)

-output

txt
# A tibble: 3 x 5
     id eats                             apple  oats chocolate
  <dbl> <chr>                            <int> <int>     <int>
1     1 apple, oats, banana, milk, sugar     1     1         0
2     2 oats, banana, sugar                  0     1         0
3     3 chocolate, milk, sugar               0     0         1

In base R, this can be done in a one-liner:

txt[food_list] <- +(sapply(food_list, grepl, x = txt$eats))
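
To see why the one-liner works, it can be unpacked step by step. This is only a sketch in base R: a plain data.frame stands in for the tibble from the question, and the intermediate names hits and flags are introduced here for illustration.

```r
# Base R only; a data.frame stands in for the tibble from the question.
txt <- data.frame(
  id = c(1, 2, 3),
  eats = c("apple, oats, banana, milk, sugar",
           "oats, banana, sugar",
           "chocolate, milk, sugar")
)
food_list <- c("apple", "oats", "chocolate")

# sapply() calls grepl(pattern, x = txt$eats) once per food and binds the
# resulting logical vectors into a matrix with one column per food,
# named after the entries of food_list.
hits <- sapply(food_list, grepl, x = txt$eats)

# Unary plus coerces the logical matrix to integer 0/1.
flags <- +hits

# Assigning the matrix to several column names fills one column per food.
txt[food_list] <- flags
txt
```

The unary plus is just a compact alternative to calling as.integer() on each column.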

1 Comment

Ohh I just noticed that my answer is nearly similar to yours, but believe me I didn't copy the idea. +1 already :)
4

You can use cbind and str_detect with map_dfc:

library(dplyr)
library(purrr)
library(stringr)

cbind(txt, map_dfc(food_list, ~ +str_detect(txt$eats, .x)) %>% set_names(food_list))

  id                             eats apple oats chocolate
1  1 apple, oats, banana, milk, sugar     1    1         0
2  2              oats, banana, sugar     0    1         0
3  3           chocolate, milk, sugar     0    0         1

3

Here is an alternative solution:

library(dplyr)
library(tidyr)

txt %>%
  separate_rows(eats, sep = ", ") %>%
  rowwise() %>%
  mutate(ext = match(eats, food_list)) %>%
  drop_na() %>%
  pivot_wider(names_from = eats, values_from = ext, values_fn = length, values_fill = 0) %>%
  right_join(txt, by = "id") %>%
  relocate(id, eats)

# A tibble: 3 x 5
     id eats                             apple  oats chocolate
  <dbl> <chr>                            <int> <int>     <int>
1     1 apple, oats, banana, milk, sugar     1     1         0
2     2 oats, banana, sugar                  0     1         0
3     3 chocolate, milk, sugar               0     0         1

2 Comments

Quite an adventure, going through pretty much the whole tidyverse to get the answer!
Yes it's more tidyr, dplyr and a little bit of base R :)
3

You may use base R's Reduce like this

Reduce(function(a, b) {
  a[[b]] <- +(grepl(b, a[["eats"]]))
  a
}, init = txt, food_list)

# A tibble: 3 x 5
     id eats                             apple  oats chocolate
  <dbl> <chr>                            <int> <int>     <int>
1     1 apple, oats, banana, milk, sugar     1     1         0
2     2 oats, banana, sugar                  0     1         0
3     3 chocolate, milk, sugar               0     0         1

You may also use purrr::reduce similarly, using (i) the walrus operator (:=) and (ii) the bang-bang operator (!!) instead of subsetting:

library(tidyverse)
txt <- tibble(id = c(1, 2, 3),
              eats = c("apple, oats, banana, milk, sugar",
                       "oats, banana, sugar",
                       "chocolate, milk, sugar"))

food_list <- c("apple", "oats", "chocolate")

reduce(food_list, .init = txt, ~ .x %>% 
         mutate(!!.y := +str_detect(eats, .y))
         )
#> # A tibble: 3 x 5
#>      id eats                             apple  oats chocolate
#>   <dbl> <chr>                            <int> <int>     <int>
#> 1     1 apple, oats, banana, milk, sugar     1     1         0
#> 2     2 oats, banana, sugar                  0     1         0
#> 3     3 chocolate, milk, sugar               0     0         1

Created on 2021-07-29 by the reprex package (v2.0.0)

1

Add word boundaries (\\b) to the values in food_list so that words are matched completely.

For example, see the difference in outputs in the following case -

library(stringr)
x <- c('apple', 'pineapple')

str_detect(x, 'apple')
#[1] TRUE TRUE

str_detect(x, '\\bapple\\b')
#[1]  TRUE FALSE

The same goes for grepl in base R -

food_list <- c("apple", "oats", "chocolate")
food_pat <- sprintf('\\b%s\\b', food_list)
txt[food_list] <- lapply(food_pat, function(x) as.integer(grepl(x, txt$eats)))
txt

# A tibble: 3 x 5
#     id eats                             apple  oats chocolate
#  <dbl> <chr>                            <int> <int>     <int>
#1     1 apple, oats, banana, milk, sugar     1     1         0
#2     2 oats, banana, sugar                  0     1         0
#3     3 chocolate, milk, sugar               0     0         1
