Converting multiple columns from numeric to character in R

Question

I have a data frame called diary00 with multiple columns whose names start with "act1". These columns contain numeric values.

I want to categorise these numeric values into 3 groups. Say, I want to classify 1,2,3,4 as leisure, 5,6,7 as work and 8,9,0 as home.

Is there a way to select these columns with either starts_with("act1") or the regular expression "^act1", and then convert the numeric values to character all at once?

I tried using the mutate() and recode() functions.

mutate(act1_001 = recode (act1_001, 1,2,3,4 = "leisure")

but it returns an error:

Error: unexpected '=' in:
"  mutate(act1_001 = recode(act1_001, 1,2,3,4 = "leisure")

Look into the functions cut or case_when: stackoverflow.com/questions/39123458/…
– Onyambu, commented Dec 8, 2023 at 19:16
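
A note on the error above: recode() takes one `old = new` pair per value, and numeric values used as argument names have to be wrapped in backticks, so `1,2,3,4 = "leisure"` is not valid R syntax. A minimal sketch of a call that does parse, using a made-up vector in place of the real act1_001 column:

```r
library(dplyr)

# Toy vector standing in for diary00$act1_001 (values are invented)
act1_001 <- c(1, 4, 5, 7, 8, 0)

# One backticked name per numeric value; each maps to its category
recoded <- recode(
  act1_001,
  `1` = "leisure", `2` = "leisure", `3` = "leisure", `4` = "leisure",
  `5` = "work", `6` = "work", `7` = "work",
  `8` = "home", `9` = "home", `0` = "home"
)
recoded
```

Spelling out every value gets verbose quickly, which is one reason the answers reach for case_when(), case_match(), or cut() instead.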

dufei · Accepted Answer · 2023-12-08 19:30:08Z

To apply the same transformation to multiple columns, use across():

library(tidyverse)

diary00 <- tibble(
  act1_a = sample(0:9, 10),
  act1_b = sample(0:9, 10)
)

diary00 |> 
  mutate(across(
    starts_with("act1"),
    \(x) case_when(
      x %in% 1:4 ~ "leisure",
      x %in% 5:7 ~ "work",
      x %in% c(8, 9, 0) ~ "home"
    )
  ))
#> # A tibble: 10 × 2
#>    act1_a  act1_b 
#>    <chr>   <chr>  
#>  1 leisure work   
#>  2 home    leisure
#>  3 work    leisure
#>  4 leisure home   
#>  5 leisure work   
#>  6 work    leisure
#>  7 home    work   
#>  8 leisure leisure
#>  9 work    home   
#> 10 home    home

Created on 2023-12-08 with reprex v2.0.2
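
The question also asks about the regular expression "^act1". As a sketch, dplyr's matches() selection helper accepts a regex and can stand in for starts_with(). The helper name recode_activity and the small tibble below are made up for illustration; values outside 0 to 9 would come back as NA because no case_when() condition matches them.

```r
library(dplyr)

# Invented example data: two act1 columns plus one unrelated column
diary00 <- tibble(
  act1_a = c(1L, 5L, 0L),
  act1_b = c(8L, 3L, 7L),
  other  = c(1L, 2L, 3L)
)

# Hypothetical helper wrapping the case_when() mapping from the answer
recode_activity <- function(x) {
  case_when(
    x %in% 1:4 ~ "leisure",
    x %in% 5:7 ~ "work",
    x %in% c(0, 8, 9) ~ "home"
  )
}

# matches() selects columns by regular expression
result <- diary00 |>
  mutate(across(matches("^act1"), recode_activity))
result
```

Columns that do not match the pattern, such as `other` here, are left untouched.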

TarJae · Accepted Answer · 2023-12-08 20:08:06Z

Following @Onyambu's suggestion, cut() also works with integer breaks:

library(dplyr)

diary00 %>%
  mutate(across(starts_with("act1"), ~cut(
    .,
    # Right-closed bins: (-1,0] home, (0,4] leisure, (4,7] work, (7,9] home
    breaks = c(-1, 0, 4, 7, 9),
    labels = c("home", "leisure", "work", "home")
  )))

Here is a version using cut with decimal breaks:

set.seed(123) 
diary00 <- data.frame(
  act1a = sample(0:9, 10, replace = TRUE),
  act1b = sample(0:9, 10, replace = TRUE),
  act1c = sample(0:9, 10, replace = TRUE),
  other_column = sample(0:9, 10, replace = TRUE)
)

library(dplyr)

diary00 %>%
  mutate(across(starts_with("act1"), ~cut(
    .,
    breaks = c(-Inf, 0.5, 4.5, 7.5, Inf), 
    labels = c("home", "leisure", "work", "home"),
    include.lowest = TRUE
  )))

    act1a   act1b   act1c other_column
1  leisure    home    home            5
2     work    work    home            4
3     home    work    home            4
4  leisure    home    work            5
5     work    home leisure            1
6  leisure    home    work            4
7     home leisure    home            6
8     home    work    home            5
9     work    home leisure            4
10 leisure    home    work            7
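
Note that cut() returns a factor rather than a character vector. If character columns are needed, as the question title suggests, wrapping the result in as.character() is one option. A small base R sketch on the values 0 to 9, using integer breaks with the default right-closed intervals (variable names are illustrative; repeating the "home" label merges the two outer bins into one level, which recent versions of R allow):

```r
x <- 0:9

# Bins: (-1,0] home, (0,4] leisure, (4,7] work, (7,9] home
cats <- cut(
  x,
  breaks = c(-1, 0, 4, 7, 9),
  labels = c("home", "leisure", "work", "home")
)
class(cats)  # a factor, not character

# Convert the factor labels to plain character strings
chr <- as.character(cats)
chr
```

Inside across(), the same conversion could be applied as `~ as.character(cut(., ...))`.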

edited Dec 8, 2023 at 20:08

answered Dec 8, 2023 at 19:37

2 Comments

Onyambu Over a year ago

Why would you use decimals? Consider using cut(1:10, c(1, 4, 7, 10), c("leisure", "work", "home"), include.lowest = T)

TarJae Over a year ago

@Onyambu. You are correct; in this case, it's better to use cut(1:10, c(1, 4, 7, 10), c("leisure", "work", "home"), include.lowest = TRUE) since all values are integers. I must admit that I regularly encounter problems using cut to set the correct boundaries, especially with decimal numbers like BMI. To solidify the boundaries in my mind, I often resort to using decimals.
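
One caveat with the breaks suggested in these comments: they start at 1, so a value of 0, which the question assigns to home, falls outside every interval and becomes NA. A quick check on 0:9 (a sketch; variable names are illustrative):

```r
x <- 0:9

# Bins with include.lowest = TRUE: [1,4] leisure, (4,7] work, (7,10] home
out <- cut(
  x,
  breaks = c(1, 4, 7, 10),
  labels = c("leisure", "work", "home"),
  include.lowest = TRUE
)

# 0 lies below the first break, so it is not assigned to any category
zero_is_na <- is.na(out[x == 0])
zero_is_na
as.character(out)
```

Adding a lower bin for 0 with a repeated "home" label, as the answers do, avoids the missing value.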

SAL · Accepted Answer · 2023-12-08 23:35:05Z

You can also use dplyr::case_match(), which is roughly equivalent to the case_when() used by @dufei:

# case_match() requires dplyr >= 1.1.0
diary00 |>
  mutate(across(
    starts_with("act1"),
    \(x) case_match(
      x,
      1:4 ~ "leisure",
      5:7 ~ "work",
      c(0, 8, 9) ~ "home"
    )
  ))

jay.sf · Accepted Answer · 2023-12-09 08:54:46Z

Using cut with data.table.

> library(data.table)
> cols <- grep('^act1_[0-9]$', names(dat), value=TRUE)
> setDT(dat)[, (paste0(cols, '_cat')) := lapply(.SD, cut, breaks=c(-1, 0, 4, 7, 9), 
+                                           labels=c('home', 'leisure', 'work', 'home')), 
+            .SDcols=cols]
> dat
    X1 X2 act1_1 act1_2 act1_1_cat act1_2_cat
 1:  X  X      0      2       home    leisure
 2:  X  X      4      8    leisure       home
 3:  X  X      0      8       home       home
 4:  X  X      8      3       home    leisure
 5:  X  X      9      4       home    leisure
 6:  X  X      3      4    leisure    leisure
 7:  X  X      1      3    leisure    leisure
 8:  X  X      9      1       home    leisure
 9:  X  X      0      7       home       work
10:  X  X      7      2       work    leisure
11:  X  X      6      9       work       home
12:  X  X      3      0    leisure       home
13:  X  X      8      9       home       home
14:  X  X      4      7    leisure       work
15:  X  X      3      5    leisure       work
16:  X  X      9      9       home       home
17:  X  X      1      7    leisure       work

Data:

> dput(dat)
structure(list(X1 = c("X", "X", "X", "X", "X", "X", "X", "X", 
"X", "X", "X", "X", "X", "X", "X", "X", "X"), X2 = c("X", "X", 
"X", "X", "X", "X", "X", "X", "X", "X", "X", "X", "X", "X", "X", 
"X", "X"), act1_1 = c(0L, 4L, 0L, 8L, 9L, 3L, 1L, 9L, 0L, 7L, 
6L, 3L, 8L, 4L, 3L, 9L, 1L), act1_2 = c(2L, 8L, 8L, 3L, 4L, 4L, 
3L, 1L, 7L, 2L, 9L, 0L, 9L, 7L, 5L, 9L, 7L)), class = "data.frame", row.names = c(NA, 
-17L))
