
I would like to create a new column based on the results of str_detect across multiple columns using across.

For example, in the test data below, I'd like to search for "No job" across columns that start with "job", then return 1 if that string is detected in any of the columns, and 0 if it is not.

test_data <-  data.frame("job1" = c('Sales','Baker','Blacksmith','Brewer'), 
                         "job2" = c('Mailman','Jockey','Jobhunter',"No job"),
                         "id" = c("id_1", "id_2", "id_3", "id_4"))

# Output I'd like:

#         job1      job2   id no_job
#1      Sales   Mailman id_1      0
#2      Baker    Jockey id_2      0
#3 Blacksmith Jobhunter id_3      0
#4     Brewer    No job id_4      1

I know I could unite the columns that start with "job", and then just use str_detect on that new column like this:

test_data2 <- test_data %>%
    unite(col = "all_jobs", starts_with("job"), sep = ", ", remove = FALSE) %>%
    mutate(no_job = if_else(str_detect(all_jobs, "No job"), 1, 0))

... but I was wondering if there was a way to use across to do the same thing. I'd tried variations on the following but haven't gotten it to work.

test_data2 <- test_data %>%
    mutate(no_job = if_else(across(starts_with("job"), str_detect(., "No job")), 1, 0))

3 Answers


One option could be:

test_data %>%
 rowwise() %>%
 mutate(no_job = +any(str_detect(c_across(-id), "No job")))

  job1       job2      id    no_job
  <fct>      <fct>     <fct>  <int>
1 Sales      Mailman   id_1       0
2 Baker      Jockey    id_2       0
3 Blacksmith Jobhunter id_3       0
4 Brewer     No job    id_4       1

5 Comments

In case "No job" appears in multiple columns, you could use mutate(no_job = as.numeric(any(str_detect(c_across(-id), "No job"))))
What's the deal with the + there in that mutate()? Could you please explain? I have never encountered that before, and I am very curious!!
@Dunois it works much like as.integer(), i.e. the unary + just converts a logical vector to an integer one (TRUE becomes 1, FALSE becomes 0).
@Dunois, is there a way to apply this to specific columns? I have a larger dataset and I would just like to apply this for specific columns.
@flxflks I'm not the OP, but in general, you can use selection verbs like starts_with() inside c_across() to specify a subset of columns. Or even more generally, you could just use across() within mutate().
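
To make the last two comments concrete, here is a minimal sketch that restricts the row-wise check to the columns starting with "job" by passing starts_with() to c_across(), and shows what the unary + does on its own. It assumes dplyr 1.0.0 or later and rebuilds the question's test_data with stringsAsFactors = FALSE so the columns are plain character vectors; the name result is only illustrative.

```r
library(dplyr)
library(stringr)

# Same data as in the question, with character (not factor) columns
test_data <- data.frame(
  job1 = c("Sales", "Baker", "Blacksmith", "Brewer"),
  job2 = c("Mailman", "Jockey", "Jobhunter", "No job"),
  id   = c("id_1", "id_2", "id_3", "id_4"),
  stringsAsFactors = FALSE
)

# Unary plus turns a logical vector into an integer vector
plus_demo <- +c(TRUE, FALSE)

# Row-wise check limited to the columns whose names start with "job"
result <- test_data %>%
  rowwise() %>%
  mutate(no_job = +any(str_detect(c_across(starts_with("job")), "No job"))) %>%
  ungroup()
```

Selecting with starts_with("job") instead of -id means other non-job columns in a larger dataset are not searched by accident.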

I encountered a similar problem and this is a possible solution using case_when:

test_data %>% mutate(no_job = case_when(if_any(starts_with("job"), ~ str_detect(.x, "No job")) ~ 1, TRUE ~ 0))



Another solution could be pmap from purrr family:

library(tidyverse)
test_data %>% 
  mutate(no_job = pmap_int(across(starts_with("job")), ~ as.integer(any(str_detect(c(...), "No job")))))

output:

        job1      job2   id no_job
1      Sales   Mailman id_1      0
2      Baker    Jockey id_2      0
3 Blacksmith Jobhunter id_3      0
4     Brewer    No job id_4      1
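
For comparison, the same flag can probably also be built with across() directly, which is what the question originally asked about. The sketch below is one possible way, not the only one: across() returns a data frame of logical columns, and rowSums() counts the matches per row, so no rowwise() step is needed. It assumes dplyr 1.0.0 or later; the name result2 is only illustrative.

```r
library(dplyr)
library(stringr)

# Same data as in the question, with character (not factor) columns
test_data <- data.frame(
  job1 = c("Sales", "Baker", "Blacksmith", "Brewer"),
  job2 = c("Mailman", "Jockey", "Jobhunter", "No job"),
  id   = c("id_1", "id_2", "id_3", "id_4"),
  stringsAsFactors = FALSE
)

# across() yields one logical column per "job" column;
# rowSums() > 0 is TRUE when at least one of them matched in that row
result2 <- test_data %>%
  mutate(no_job = as.integer(
    rowSums(across(starts_with("job"), ~ str_detect(.x, "No job"))) > 0
  ))
```

Because this version is vectorised over whole columns, it may be noticeably faster than row-wise approaches on large data frames.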
