
I have a data frame whose columns contain categorical responses. I'd like to label-encode the observations in all the columns in one go.

Gender <- c("Male", "Female", "Female", "Male","Male")
School <- c("Primary", "Secondary", "Tertiary", "Primary","Secondary")
Town <- c("HA", "CA", "DD", "HA", "CA")

DF <- data.frame(Gender, School, Town)

So far, I can only do this one column at a time, e.g.:

DF$gender_num <- as.numeric(factor(DF$Gender))
DF$sch_num <- as.numeric(factor(DF$School))
DF$town_num <- as.numeric(factor(DF$Town))

However, I'd like R code that loops over (or otherwise handles) all the columns and label-encodes each one, because my actual data frame has 33 columns that need this.

How do I go about this?

  • This is a duplicate: stackoverflow.com/q/27627941 but I closed the other question in favor of this one since this has received better answer(s). Commented Apr 28 at 0:59
  • Thanks, I searched for this on SO, but I see it has a different title than mine. Commented Apr 28 at 7:23

7 Answers


You can use data.matrix().

> data.matrix(DF) |> as.data.frame()
  Gender School Town
1      2      1    3
2      1      2    1
3      1      3    2
4      2      1    3
5      2      2    1

If you're happy with "matrix" format, you can omit as.data.frame().
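To see why "Male" ends up as 2 and "Female" as 1 in the output above, it may help to look at the factor levels directly. This is only a sketch on the question's data; the name `codes` is made up for illustration, and the level order assumes the usual sorted order of factor levels.

```r
DF <- data.frame(
  Gender = c("Male", "Female", "Female", "Male", "Male"),
  School = c("Primary", "Secondary", "Tertiary", "Primary", "Secondary"),
  Town   = c("HA", "CA", "DD", "HA", "CA")
)

# Integer codes for every column, returned as a matrix
codes <- data.matrix(DF)

# Factor levels are sorted, so "Female" comes before "Male"
gender_levels <- levels(factor(DF$Gender))
print(gender_levels)

# The codes agree with the column-by-column approach from the question
same_as_question <- all(codes[, "Gender"] == as.numeric(factor(DF$Gender)))
print(same_as_question)
```

If you need the labels back later, keep the levels of each column around, since the matrix itself only stores the integers.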


mutate_if() is another option.

library(dplyr)

# Convert character columns to factors, then factors to their integer codes
DF <- DF |> 
  mutate_if(is.character, as.factor) |>
  mutate_if(is.factor, as.numeric)

Comment: mutate_if() is superseded by across().
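For reference, the same idea written with across() might look roughly like this. It is a sketch that assumes dplyr 1.0.0 or later is installed; `DF_num` is just an illustrative name.

```r
library(dplyr)

DF <- data.frame(
  Gender = c("Male", "Female", "Female", "Male", "Male"),
  School = c("Primary", "Secondary", "Tertiary", "Primary", "Secondary"),
  Town   = c("HA", "CA", "DD", "HA", "CA")
)

# Encode character and factor columns in place; other column types are left alone
DF_num <- DF |>
  mutate(across(where(is.character) | where(is.factor),
                ~ as.numeric(factor(.x))))

print(DF_num)
```

Using where() restricts the encoding to categorical columns, which matters once the data frame also holds numeric columns.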

With dplyr using across

library(dplyr)

DF %>% 
  mutate(across(everything(), ~ as.numeric(factor(.x)), .names="{.col}_num"))

output

  Gender    School Town Gender_num School_num Town_num
1   Male   Primary   HA          2          1        3
2 Female Secondary   CA          1          2        1
3 Female  Tertiary   DD          1          3        2
4   Male   Primary   HA          2          1        3
5   Male Secondary   CA          2          2        1

Tidyverse solutions work on the data frame directly, so there is no need for an intermediate conversion to a matrix or a list.


If the labels should follow factor order (sorted levels), the data.matrix approach in the answer by @jay.sf is hard to beat.

If the labels should instead follow the order in which entries first occur in each column, you can try Map + match:

> list2DF(Map(match, DF, DF))
  Gender School Town
1      1      1    1
2      2      2    2
3      2      3    3
4      1      1    1
5      1      2    2
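One thing worth knowing about match(x, x): it returns the position of each value's first occurrence, which happens to be 1, 2, 3 for the example data but need not be consecutive in general. If consecutive codes are wanted, matching against unique(x) is one option. A small sketch with a made-up vector `x`:

```r
x <- c("b", "b", "a", "c", "a")

# Position of the first occurrence of each value: codes can skip numbers
first_pos <- match(x, x)
print(first_pos)   # 1 1 3 4 3

# Distinct values numbered 1, 2, 3, ... in order of first appearance
dense <- match(x, unique(x))
print(dense)       # 1 1 2 3 2
```

On the data frame from the question both variants give the same result, because every column's distinct values first appear in rows 1 to 3.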


Base R:

cbind(
  DF,
  lapply(setNames(DF, paste0(names(DF), "_num")), function(z) as.numeric(factor(z)))
)
#   Gender    School Town Gender_num School_num Town_num
# 1   Male   Primary   HA          2          1        3
# 2 Female Secondary   CA          1          2        1
# 3 Female  Tertiary   DD          1          3        2
# 4   Male   Primary   HA          2          1        3
# 5   Male Secondary   CA          2          2        1
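With 33 columns there may well be some numeric ones mixed in, so it could be handy to wrap the idea above in a small function that only touches character and factor columns. The helper name `label_encode()`, its `suffix` argument, and the `Age` column are all made up for this sketch.

```r
# Hypothetical helper: append "<name>_num" columns for character/factor columns only
label_encode <- function(df, suffix = "_num") {
  is_cat <- vapply(df, function(col) is.character(col) || is.factor(col), logical(1))
  if (!any(is_cat)) return(df)  # nothing to encode
  encoded <- lapply(df[is_cat], function(col) as.numeric(factor(col)))
  names(encoded) <- paste0(names(encoded), suffix)
  cbind(df, encoded)
}

DF <- data.frame(
  Gender = c("Male", "Female", "Female", "Male", "Male"),
  Age    = c(10, 15, 21, 9, 14)  # made-up numeric column, left untouched
)

out <- label_encode(DF)
print(out)
```

The numeric `Age` column passes through unchanged and gets no `_num` counterpart.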

DF[paste0(names(DF), "_num")] <- lapply(DF, \(x) match(x, sort(unique(x))))

#   Gender    School Town Gender_num School_num Town_num
# 1   Male   Primary   HA          2          1        3
# 2 Female Secondary   CA          1          2        1
# 3 Female  Tertiary   DD          1          3        2
# 4   Male   Primary   HA          2          1        3
# 5   Male Secondary   CA          2          2        1
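A side benefit of matching against sort(unique(x)) is that the sorted vector doubles as a lookup table, so the codes can be turned back into labels. A rough sketch for a single column; `lookup`, `codes` and `decoded` are illustrative names only.

```r
Town <- c("HA", "CA", "DD", "HA", "CA")

# Sorted distinct values act as the code-to-label lookup table
lookup <- sort(unique(Town))

# Encode: position of each value in the lookup table
codes <- match(Town, lookup)
print(codes)

# Decode: index the lookup table by the codes
decoded <- lookup[codes]
print(identical(decoded, Town))
```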


model.matrix() creates a dummy (indicator) matrix for each column, and which.max(), applied row by row, finds which indicator column holds the 1. The result is then coerced to a data.frame.
A one-liner should do.

Gender <- c("Male", "Female", "Female", "Male","Male")
School <- c("Primary", "Secondary", "Tertiary", "Primary","Secondary")
Town <- c("HA", "CA", "DD", "HA", "CA")
DF <- data.frame(Gender, School, Town)

lapply(DF, \(x) model.matrix(~0 + x) |> apply(1L, which.max)) |> as.data.frame()
#>   Gender School Town
#> 1      2      1    3
#> 2      1      2    1
#> 3      1      3    2
#> 4      2      1    3
#> 5      2      2    1

Created on 2025-04-27 with reprex v2.1.1
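To make the intermediate step visible, here is roughly what happens for a single column. This is only an illustration of the one-liner above on the Town values; `mm` and `codes` are names chosen for the sketch.

```r
x <- c("HA", "CA", "DD", "HA", "CA")

# One indicator column per distinct value; "~ 0 + x" drops the intercept
mm <- model.matrix(~ 0 + x)
print(colnames(mm))

# For each row, the index of the column that holds the 1
codes <- apply(mm, 1L, which.max)
print(unname(codes))
```

Each row of `mm` has exactly one 1, so which.max() picks out the level's position, which is the same code factor() would assign.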
