4

assuming I have a data set that looks something like this:

df <- data.frame("age" = c(55, 21, 11), 
                 "sex" = c("m", "m", "f"),
                 "a2" = c(0,1,0),
                 "a3" = c(0,0,0),
                 "a4" = c(0,1,0),
                 "b2" = c(0,0,1),
                 "b3" = c(0,0,0),
                 "b4" = c(1,0,1),
                 "c2" = c(0,1,0)
                 )
age sex a2 a3 a4 b2 b3 b4 c2
 55   m  0  0  0  0  0  1  0
 21   m  1  0  1  0  0  0  1
 11   f  0  0  0  1  0  1  0

How would I be able to transform it into long format and change the values of columns a1-c1 so that every column has the value of its name? e. g. in a2 0=1 and 1=2?

I tried it with pivot_longer like this:

df %>% pivot_longer(
   cols = stars_with("a"),
   names_to = "A"
)

My desired output looks like this:

age sex a2 a3 a4 b2 b3 b4 c2
 55   m  1  1  1  1  1  4  1
 21   m  2  1  4  1  3  1  2
 11   f  1  1  1  2  1  4  1

Thanks!

2
  • I'm not clear what your desired output is. a3=0 becomes 1 or ?? And 1 becomes 2 or ??3. You ask for longer a* to c* but are only using starts with a? Can you post a reproducable example i.e. the code to make df (see dput) and what you are hoping to see? Commented Mar 20, 2021 at 23:17
  • @CALUMPolwart Hi! I edited my post. Basically I want the number in the column name as the value for 1. I dont use a-c because in the data set they are not in that order. Commented Mar 20, 2021 at 23:37

5 Answers 5

4

This is a good question because you cannot access column names within mutate_at, which is how you would normally mutate across a list of variables. Here is a solution using map_dfc from purrr:

library(purrr)
library(dplyr)
library(tidyr)
library(readr)

We define a function (although not necessary but helpful to avoid foreign syntax if you are not accustomed to purrr.

my_fun <- function(x) transmute(df, !!x := ifelse(!!sym(x) == 0, 1, parse_number(x)))

transmute to avoid duplicate columns

!!x := is a way to assign a column name based on a string from a variable. In our case, a column name.

!!sym() converts our variable name from a string to a symbol. Without this, for column 'a2' this would read "a2" == 0 thus replacing with the column number. Refer to this answer for an in-depth explanation.

parse_number will find the number within the column name

ifelse states if the column value is equal to 0, then the value equals 1, otherwise the value is equal to the number in the column name

var_names <- c("a2","a3","a4","b2","b3","b4","c2")

map_dfc(var_names, my_fun)
Sign up to request clarification or add additional context in comments.

2 Comments

Can you please add a brief blurb about what sym() is doing here?
@iamericfletcher I have edited the explanation in.
4

pivot_longer is not appropriate strategy here. Follow mutate with across.

  • across will pass desired columns into mutate statement
  • cur_column() will pass current colnames into syntax
library(dplyr)
library (stringr) 
df %>% mutate(across(-c("age", "sex"), ~ ifelse(. == 0, 1, as.numeric(str_extract(cur_column(), "\\d$")))))

  age sex a2 a3 a4 b2 b3 b4 c2
1  55   m  1  1  1  1  1  4  1
2  21   m  2  1  4  1  1  1  2
3  11   f  1  1  1  2  1  4  1

Comments

2

In base R you could do:

# extract numbers from column names
vals <- as.numeric(gsub("[^0-9]+", "", colnames(df)[-c(1:2)]))

# replace 0 with 1 and 1 with extracted number in each column
df[, -c(1:2)] <- sapply(seq_along(vals), function(x) ifelse(df[, x+2]==0, 1, vals[x]))

1 Comment

similar df[, -c(1:2)] <- Map('^', vals, df[, -c(1:2)])
2

Base R option using Map :

inds <- -(1:2)
cols <- as.integer(sub('[a-z](\\d+)', '\\1', names(df)[inds]))
df[inds] <- Map(function(x, y) ifelse(x == 1, y, 1), df[inds], cols)
df

#  age sex a2 a3 a4 b2 b3 b4 c2
#1  55   m  1  1  1  1  1  4  1
#2  21   m  2  1  4  1  1  1  2
#3  11   f  1  1  1  2  1  4  1

Comments

0

One possibility (though may not be the most efficient)

library(tidyverse)
as_tibble(df) %>% transmute(
  a2 = case_when(a2 == 1 ~ 2, a2 == 0 ~ 1),
  a3 = case_when(a3 == 1 ~ 3, a3 == 0 ~ 1),
  a4 = case_when(a4 == 1 ~ 4, a4 == 0 ~ 1),
  b2 = case_when(b2 == 1 ~ 2, b2 == 0 ~ 1),
  b3 = case_when(b3 == 1 ~ 3, b3 == 0 ~ 1),
  b4 = case_when(b4 == 1 ~ 4, b4 == 0 ~ 1),
  c2 = case_when(c2 == 1 ~ 2, c2 == 0 ~ 1)
)

If you wish to keep the other columns use mutate() instead of transmute().

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.