0

I'm using R and have a dataset of ~3000 psychological test records. Most of the data is stored as string variables:

> table(rona_full$FQ169_6)

             At no time        Some of the time Less than half the time More than half the time 
                    418                      73                      48                      36 
       Most of the time         All of the time 
                     20                       9 

I want to recode the string data into numeric variables, like this:

> table(rona_full$FQ169_6)

  0   1   2   3   4   5 
443  63  39  30  21   9 

Currently, this is my approach:

rona_full$FQ169_6 <-ifelse(rona_full$MQ169_6 == "At no time", 0, #this recodes MDI from string
                    ifelse(rona_full$MQ169_6 == "Some of the time", 1,
                    ifelse(rona_full$MQ169_6 == "Less than half the time", 2,
                    ifelse(rona_full$MQ169_6 == "More than half the time", 3,
                    ifelse(rona_full$MQ169_6 == "Most of the time", 4, 5)))))

I imagine there is a more efficient way to perform this exact same recoding on 25+ columns rather than doing each one this way.

    It's best to provide a reproducible example. We don't know whether your variables are factors (and if so, what the levels are) or plain character vectors. In the latter case your solution is not too bad. Commented Mar 3, 2021 at 9:15

4 Answers

1

Arthur's answer is a very good solution using the {data.table} package. In case you are more familiar with the {tidyverse} way of life, here is another option.

reprex first

First, let's create some random data that looks a bit like yours. Judging by your current table output, I assume your data is read as a factor. You can check with is.factor().

library(dplyr)  # provides %>%, mutate() and across()

categs <- c("At no time",
            "Some of the time",
            "Less than half the time",
            "More than half the time",
            "Most of the time", 
            "All of the time")

n <- 10
df <- data.frame(
  y    = rnorm(n),
  mq_1 = sample(categs, size = n, replace = TRUE), 
  mq_2 = sample(categs, size = n, replace = TRUE),
  mq_3 = sample(categs, size = n, replace = TRUE)
) %>% 
  # set the levels explicitly; factor() would otherwise sort them alphabetically
  mutate(across(starts_with("mq"), ~ factor(.x, levels = categs)))

Solution

If the naming of your variable follows a certain logic, you can use starts_with() to select the variables to transform, otherwise, create a vector with the variable names.

vars_to_transf <- c("mq_1", "mq_2", "mq_3")

df_transf <- df %>% mutate(
  across(starts_with("mq_"), .fns = ~ as.numeric(.x) - 1) #note: factor codes start at 1, so subtract 1 to get 0-5 (relies on the level order)
)
head(df_transf)
table(df_transf$mq_1)

More generic option

You can also pass a custom function to mutate(across(...)):

f_trans_factor <- function(x) {
  dplyr::case_when(
    x == "At no time"               ~ 0,
    x == "Some of the time"         ~ 1,
    x == "Less than half the time"  ~ 2,
    x == "More than half the time"  ~ 3,
    x == "Most of the time"         ~ 4,
    x == "All of the time"          ~ 5,
    TRUE                            ~ NA_real_
  )
}
df %>% 
  mutate(
    across(all_of(vars_to_transf), .fns = f_trans_factor) #note: function without parentheses; all_of() selects columns from a character vector
  )
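
As a possible variation on the custom function above, the mapping can also be kept in a named lookup vector, so the labels and codes live in one place. This is only a sketch: `lookup`, `df_demo` and `df_recoded` are hypothetical names, and the column names are made up for illustration.

```r
library(dplyr)

# Lookup table: names are the response labels, values are the numeric codes
lookup <- c(
  "At no time"              = 0,
  "Some of the time"        = 1,
  "Less than half the time" = 2,
  "More than half the time" = 3,
  "Most of the time"        = 4,
  "All of the time"         = 5
)

# Hypothetical example data; the last value of mq_2 is deliberately invalid
df_demo <- data.frame(
  mq_1 = c("At no time", "All of the time", "Some of the time"),
  mq_2 = c("Most of the time", "At no time", "Not a valid label"),
  stringsAsFactors = TRUE
)

# Index the lookup by label; as.character() avoids indexing by factor code,
# and labels missing from the lookup become NA
df_recoded <- df_demo %>%
  mutate(across(starts_with("mq_"), ~ unname(lookup[as.character(.x)])))
```

Converting to character first matters when the columns are factors, because indexing a vector with a factor uses its integer codes rather than its labels.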

1

{data.table} is really efficient. You can write a custom function and apply it to as many columns as you like. I find this method flexible and scalable.

install.packages("data.table")
library(data.table)

# dummy data
d <- data.table(id = 1:11, x = LETTERS[1:11], y = LETTERS[5:15]) # all columns need the same length

# make a function suited to your needs
f <- function(x){
      as.numeric(
          fcase(
           x == "A", "1",
           x == "B", "2",
           x == "E", "5",
           x == "F", "6"
           ))}


# columns to apply the function to
l_cols <- c("x", "y")

# to test first, write the results to new columns instead
#d[, c(paste0("new_",l_cols)) := lapply(.SD, f), .SDcols = l_cols]

# overwrite the columns by reference: no turning back this time
d[, c(l_cols) := lapply(.SD, f), .SDcols = l_cols]

EDIT: Maybe it would be clearer what's going on if I use your data:

install.packages("data.table")
library(data.table)

# transform your data.frame to a data.table
d <- data.table::as.data.table(rona_full)

# make a function suited to your needs
f <- function(x){
      as.numeric(
          data.table::fcase(
           x == "At no time", "0",
           x == "Some of the time", "1",
           x == "Less than half the time", "2",
           x == "More than half the time", "3",
           x == "Most of the time", "4",
           x == "All of the time", "5"
           ))}

# columns to apply the function to (assuming MQ169_7 also exists)
l_cols <- c("MQ169_6", "MQ169_7")
   
# to change by reference your variables
d[, c(l_cols) := lapply(.SD, f), .SDcols = l_cols]

# confirm that it worked
head(d$MQ169_6)
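
To avoid typing 25+ column names, the columns could also be selected by a name pattern. The sketch below assumes all target columns start with "MQ169_" and uses match() against the ordered labels instead of fcase(); `d_demo` and `categs` are hypothetical stand-ins, not the real data.

```r
library(data.table)

# Response labels in their intended order (position 1 maps to code 0)
categs <- c("At no time", "Some of the time", "Less than half the time",
            "More than half the time", "Most of the time", "All of the time")

# Hypothetical stand-in for rona_full
d_demo <- data.table(
  id      = 1:3,
  MQ169_6 = c("At no time", "Most of the time", "All of the time"),
  MQ169_7 = c("Some of the time", "At no time", "More than half the time")
)

# Pick up every column whose name starts with "MQ169_"
l_cols <- grep("^MQ169_", names(d_demo), value = TRUE)

# match() gives each label's position in categs (1 to 6), or NA if it is absent;
# subtracting 1 yields the 0 to 5 coding
d_demo[, (l_cols) := lapply(.SD, function(x) match(x, categs) - 1L), .SDcols = l_cols]
```

Any label that is not in `categs` ends up as NA, which makes typos in the raw data easy to spot afterwards.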

3 Comments

Ok, so as evidenced by my provided code--I'm not much of a coder but I'm trying. When I try to replicate this (exactly as you've provided) I get the error: Error in fcase(x == "A", "1", x == "B", "2", x == "E", "5", x == "F", : could not find function "fcase" I'm sure this is an easy fix but I'm missing it.
fcase() is a {data.table} function, so make sure the package is installed and loaded. Alternatively, if you are sure that {data.table} is installed, call the function with its namespace prefix: data.table::fcase(...). To be quite open with you, the tidyverse options given by Marcelo could be more intuitive for a beginner (or anyone else). Good luck!
Got it! Updated my version of R/Studio and {data.table} package fixed that issue.
0

I take it that you want to transform your dataset to factors, and you use the hand-made approach :D

You can get the integer codes behind the factor levels by accessing the @.Data slot of a factor variable.

c("Foo", "Foo", "Bar") -> strings
as.factor(strings) -> factors
factors@.Data
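
For reference, a small sketch of how the codes relate to the levels, using as.integer() and levels() rather than the slot. By default the levels are sorted alphabetically, so the codes may not follow the order you have in mind unless you set the levels yourself. `ordered_factors` and `codes` are illustrative names only.

```r
strings <- c("Foo", "Foo", "Bar")
factors <- factor(strings)

levels(factors)      # the labels, sorted alphabetically by default
as.integer(factors)  # the integer codes, which index into levels()

# Setting the level order explicitly changes which code each label gets
ordered_factors <- factor(strings, levels = c("Foo", "Bar"))
codes <- as.integer(ordered_factors)
```

With the default ordering "Bar" is level 1 and "Foo" is level 2; with the explicit levels it is the other way round.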

1 Comment

This is much easier than my hand-made approach (and I got it to work, which means it's not difficult); however as far as I can tell I would still need to apply this to each column rather than scaling up to hundreds?
0

This code will take unique values in a column and assign them a unique number serially:

rona_full$FQ169_6 = as.character(as.integer(factor(rona_full$FQ169_6, levels = unique(rona_full$FQ169_6))))
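
Note that unique() orders the values by first appearance, so the same label could receive a different number in different columns. A sketch of one way around that, and of applying the recoding to many columns at once, is to fix the levels up front and loop with lapply(). The data frame `demo` and the "FQ169_" prefix are assumptions made for illustration.

```r
# Response labels in their intended order (position 1 maps to code 0)
categs <- c("At no time", "Some of the time", "Less than half the time",
            "More than half the time", "Most of the time", "All of the time")

# Hypothetical stand-in for rona_full
demo <- data.frame(
  id      = 1:3,
  FQ169_6 = c("Some of the time", "At no time", "All of the time"),
  FQ169_7 = c("At no time", "At no time", "Most of the time"),
  stringsAsFactors = FALSE
)

# Select every column whose name starts with "FQ169_"
cols <- grep("^FQ169_", names(demo), value = TRUE)

# Same fixed levels for every column, shifted so the codes run from 0 to 5
demo[cols] <- lapply(demo[cols], function(x) {
  as.integer(factor(x, levels = categs)) - 1L
})
```

Because the levels are fixed, "At no time" is 0 in every column regardless of which value happens to appear first.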

1 Comment

Useful but still requires me to apply to each column as far as I can see in my practice.
