I have a data frame called diary00 with multiple columns whose names start with "act1". These columns contain numeric values.

I want to categorise these numeric values into 3 groups. Say, I want to classify 1,2,3,4 as leisure, 5,6,7 as work and 8,9,0 as home.

Is there a way to select these columns with either starts_with("act1") or the regular expression "^act1", and then convert the numeric values to character labels in all of them at once?

I tried using the mutate() and recode() functions.

mutate(act1_001 = recode (act1_001, 1,2,3,4 = "leisure")

but an error returns:

Error: unexpected '=' in:
"  mutate(act1_001 = recode(act1_001, 1,2,3,4 = "leisure")

4 Answers

To apply the same transformation to multiple columns, use across():

library(tidyverse)

diary00 <- tibble(
  act1_a = sample(0:9, 10),
  act1_b = sample(0:9, 10)
)

diary00 |> 
  mutate(across(
    starts_with("act1"),
    \(x) case_when(
      x %in% 1:4 ~ "leisure",
      x %in% 5:7 ~ "work",
      x %in% c(8, 9, 0) ~ "home"
    )
  ))
#> # A tibble: 10 × 2
#>    act1_a  act1_b 
#>    <chr>   <chr>  
#>  1 leisure work   
#>  2 home    leisure
#>  3 work    leisure
#>  4 leisure home   
#>  5 leisure work   
#>  6 work    leisure
#>  7 home    work   
#>  8 leisure leisure
#>  9 work    home   
#> 10 home    home

Created on 2023-12-08 with reprex v2.0.2
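
As a small variation on the approach above, the sketch below selects the columns with the regular expression "^act1" through matches() and uses the .names argument of across() to write the labels into new columns instead of overwriting the originals. The toy data frame diary_demo, its column names, and the helper recode_activity() are assumptions made for illustration; they do not come from the question.

```r
library(dplyr)

# Hypothetical toy data (names and values are assumptions for illustration)
diary_demo <- tibble(
  act1_001 = c(1, 5, 8, 0),
  act1_002 = c(4, 7, 9, 2),
  other    = c(1, 2, 3, 4)
)

# Hypothetical helper: maps activity codes to the three labels
recode_activity <- function(x) {
  case_when(
    x %in% 1:4 ~ "leisure",
    x %in% 5:7 ~ "work",
    x %in% c(8, 9, 0) ~ "home"
  )
}

# matches() takes a regular expression; .names controls the output column names,
# so the numeric columns are kept and "<column>_cat" columns are added
result <- diary_demo |>
  mutate(across(matches("^act1"), recode_activity, .names = "{.col}_cat"))

result
```

Keeping the numeric columns alongside the labelled ones makes it easy to spot-check the mapping before dropping the originals.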

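Update based on input from @Onyambu: since all values are integers, integer breaks are enough. Note that this shorter version puts 0 in "leisure" rather than "home"; the version further down handles 0 as well.

library(dplyr)

diary00 %>%
  mutate(across(starts_with("act1"), ~cut(
    .,
    breaks = c(-1, 4, 7, 9),  # right-closed intervals: 0-4, 5-7, 8-9
    labels = c("leisure", "work", "home"),
    include.lowest = TRUE
  )))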

Here is a version using cut:

set.seed(123) 
diary00 <- data.frame(
  act1a = sample(0:9, 10, replace = TRUE),
  act1b = sample(0:9, 10, replace = TRUE),
  act1c = sample(0:9, 10, replace = TRUE),
  other_column = sample(0:9, 10, replace = TRUE)
)

library(dplyr)

diary00 %>%
  mutate(across(starts_with("act1"), ~cut(
    .,
    breaks = c(-Inf, 0.5, 4.5, 7.5, Inf), 
    labels = c("home", "leisure", "work", "home"),
    include.lowest = TRUE
  )))


    act1a   act1b   act1c other_column
1  leisure    home    home            5
2     work    work    home            4
3     home    work    home            4
4  leisure    home    work            5
5     work    home leisure            1
6  leisure    home    work            4
7     home leisure    home            6
8     home    work    home            5
9     work    home leisure            4
10 leisure    home    work            7

2 Comments

Why would you use decimals? Consider using cut(1:10, c(1, 4, 7, 10), c("leisure", "work", "home"), include.lowest = TRUE)
@Onyambu. You are correct; in this case, it's better to use cut(1:10, c(1, 4, 7, 10), c("leisure", "work", "home"), include.lowest = TRUE) since all values are integers. I must admit that I regularly encounter problems using cut to set the correct boundaries, especially with decimal numbers like BMI. To solidify the boundaries in my mind, I often resort to using decimals.
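
The boundary question raised in these comments can be checked directly by running cut() on every possible code and inspecting the result. The following is a minimal sketch using integer breaks, assuming the codes are exactly the integers 0 to 9; the variable names x, groups, and check are illustrative only. Repeating the label "home" relies on factor() merging duplicated labels, which current R versions do.

```r
# All possible activity codes
x <- 0:9

# With the default right = TRUE the intervals are right-closed:
# (-1,0] -> home, (0,4] -> leisure, (4,7] -> work, (7,9] -> home
groups <- cut(
  x,
  breaks = c(-1, 0, 4, 7, 9),
  labels = c("home", "leisure", "work", "home")
)

# Side-by-side view of each code and the group it was assigned to
check <- data.frame(code = x, group = groups)
check
```

Printing a table like this for the full range of values is a quick way to confirm which side of each break is closed before applying the same call across many columns.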
You can also use dplyr::case_match() as a rough equivalent of the case_when() approach used by @dufei:

diary00 |>
  mutate(across(
    starts_with("act1"),
    \(x) case_match(
      x,
      1:4 ~ "leisure",
      5:7 ~ "work",
      c(0, 8, 9) ~ "home"
    )
  ))
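
One detail worth knowing with case_match() is what happens to values that match none of the listed codes: by default they become NA. The sketch below, which assumes a dplyr version that provides case_match() (1.1.0 or later), uses the .default argument to label such values explicitly. The vector codes and the label "unknown" are hypothetical and chosen only for illustration.

```r
library(dplyr)

# Hypothetical input containing an out-of-range code (12) and a missing value
codes <- c(0, 3, 6, 9, 12, NA)

labels <- case_match(
  codes,
  1:4 ~ "leisure",
  5:7 ~ "work",
  c(0, 8, 9) ~ "home",
  .default = "unknown"  # applied to anything not matched above, including NA
)

labels
```

Leaving .default out would return NA for the last two elements, which may be preferable when unmatched codes should be treated as missing.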

Using cut with data.table.

> library(data.table)
> cols <- grep('^act1_[0-9]$', names(dat), value=TRUE)
> setDT(dat)[, (paste0(cols, '_cat')) := lapply(.SD, cut, breaks=c(-1, 0, 4, 7, 9), 
+                                           labels=c('home', 'leisure', 'work', 'home')), 
+            .SDcols=cols]
> dat
    X1 X2 act1_1 act1_2 act1_1_cat act1_2_cat
 1:  X  X      0      2       home    leisure
 2:  X  X      4      8    leisure       home
 3:  X  X      0      8       home       home
 4:  X  X      8      3       home    leisure
 5:  X  X      9      4       home    leisure
 6:  X  X      3      4    leisure    leisure
 7:  X  X      1      3    leisure    leisure
 8:  X  X      9      1       home    leisure
 9:  X  X      0      7       home       work
10:  X  X      7      2       work    leisure
11:  X  X      6      9       work       home
12:  X  X      3      0    leisure       home
13:  X  X      8      9       home       home
14:  X  X      4      7    leisure       work
15:  X  X      3      5    leisure       work
16:  X  X      9      9       home       home
17:  X  X      1      7    leisure       work

Data:

> dput(dat)
structure(list(X1 = c("X", "X", "X", "X", "X", "X", "X", "X", 
"X", "X", "X", "X", "X", "X", "X", "X", "X"), X2 = c("X", "X", 
"X", "X", "X", "X", "X", "X", "X", "X", "X", "X", "X", "X", "X", 
"X", "X"), act1_1 = c(0L, 4L, 0L, 8L, 9L, 3L, 1L, 9L, 0L, 7L, 
6L, 3L, 8L, 4L, 3L, 9L, 1L), act1_2 = c(2L, 8L, 8L, 3L, 4L, 4L, 
3L, 1L, 7L, 2L, 9L, 0L, 9L, 7L, 5L, 9L, 7L)), class = "data.frame", row.names = c(NA, 
-17L))
