I'm trying to use part of each column's values as that column's name: the characters that come before the colon should become the column name.

df = cbind.data.frame(
    id = c(1, 2 ,3, 4, 5),
    characteristics_ch1 = c("gender: Female", "gender: Male", "gender: Female", "gender: Male", "gender: Female"),
    characteristics_ch1.1 = c("Thing One: a", "Thing One: a", "Thing One: a", "Thing One: b", "Thing One: b"),
    characteristics_ch1.2 = c("age: 60", "age: 45", "age: 63", "age: 56", "age: 65"))

For columns 2-4, I'd like to remove "gender: ", "Thing One: ", and "age: " from the values and use them as the names of their respective columns.

The resulting data frame would be:

Result = cbind.data.frame(
        id = c(1, 2 ,3, 4, 5),
        gender = c("Female", "Male", "Female", "Male", "Female"),
        `Thing One` = c("a", "a", "a", "b", "b"),
        age = c("60", "45", "63", "56", "65")
)

To do this I'm running the following function:

re_col = function(i){
        new_name = str_split_fixed(i, ": ", 2)[1]
        return(assign(new_name, str_split_fixed(i, ": ", 2)[,2]))
}

I apply it with the following functions:

plyr::colwise(re_col)(df)

#and

purrr::map(df, re_col)

Without success.

There could also be a much better approach. I initially tried to write a function that could be used with dplyr in data cleaning as a %>% step but was unsuccessful.

2 Answers

We can gather the data frame to long format, separate the value column on ": ", and then spread the data frame back to wide format.

library(tidyverse)

df2 <- df %>%
  gather(Column, Value, -id) %>%
  separate(Value, into = c("New_Column", "Value"), sep = ": ") %>%
  select(-Column) %>%
  spread(New_Column, Value, convert = TRUE)
df2
#   id age gender Thing One
# 1  1  60 Female         a
# 2  2  45   Male         a
# 3  3  63 Female         a
# 4  4  56   Male         b
# 5  5  65 Female         b
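
In newer versions of tidyr, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The following is a minimal sketch of the same reshape with those functions, not part of the original answer. It assumes tidyr 1.0.0 or later, rebuilds the example data so that it runs on its own, and converts age explicitly because pivot_wider() has no convert argument.

```r
library(dplyr)
library(tidyr)

# Example data from the question, rebuilt so the sketch is self-contained
df <- data.frame(
  id = c(1, 2, 3, 4, 5),
  characteristics_ch1 = c("gender: Female", "gender: Male", "gender: Female",
                          "gender: Male", "gender: Female"),
  characteristics_ch1.1 = c("Thing One: a", "Thing One: a", "Thing One: a",
                            "Thing One: b", "Thing One: b"),
  characteristics_ch1.2 = c("age: 60", "age: 45", "age: 63", "age: 56", "age: 65"),
  stringsAsFactors = FALSE
)

df2 <- df %>%
  # Long format: one row per id / original column
  pivot_longer(-id, names_to = "Column", values_to = "Value") %>%
  # Split "name: value" into the new column name and the value
  separate(Value, into = c("New_Column", "Value"), sep = ": ") %>%
  select(-Column) %>%
  # Back to wide format, one column per extracted name
  pivot_wider(names_from = New_Column, values_from = Value) %>%
  # All values are character after the split; convert age by hand
  mutate(age = as.integer(age))

df2
```

Unlike spread(), which sorts the new columns alphabetically, pivot_wider() should keep them in order of first appearance (gender, Thing One, age).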

1 Comment

Awesome! This is an amazingly simple solution, wish I had thought of it!

A workaround that uses stringi to split the values of the specified columns on a regex pattern: the part before the match becomes the column name, and the part after it becomes the value.

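library(stringi)

# Split the values of the selected columns on rgx_pattern; the first piece
# becomes the column name and the second piece becomes the new value.
rename.df_cols <- function(df, rgx_pattern = NULL, col_idx = NULL, ...){
    # Clamp the column range if it runs past the last column
    if(max(col_idx) > ncol(df)){
        col_idx <- min(col_idx):ncol(df)
    }
    for(i in col_idx){
        parts <- stri_split_regex(df[[i]], rgx_pattern, simplify = TRUE)
        # Assumes every row of the column shares the same prefix
        colnames(df)[i] <- unique(parts[, 1])
        df[[i]] <- parts[, 2]
    }
    return(df)
}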

> df
  id characteristics_ch1 characteristics_ch1.1 characteristics_ch1.2
1  1      gender: Female          Thing One: a               age: 60
2  2        gender: Male          Thing One: a               age: 45
3  3      gender: Female          Thing One: a               age: 63
4  4        gender: Male          Thing One: b               age: 56
5  5      gender: Female          Thing One: b               age: 65
> rename.df_cols(df = df, col_idx = 2:4, rgx_pattern = "(\\s+)?\\:(\\s+)?")
  id gender Thing One age
1  1 Female         a  60
2  2   Male         a  45
3  3 Female         a  63
4  4   Male         b  56
5  5 Female         b  65

Is that what you're looking for?

EDIT with pipe:

> df %>% rename.df_cols(rgx_pattern = "(\\s+)?\\:(\\s+)?", col_idx = 2:5)
  id gender Thing One age
1  1 Female         a  60
2  2   Male         a  45
3  3 Female         a  63
4  4   Male         b  56
5  5 Female         b  65
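
If stringi is not available, the same idea can be sketched in base R with sub(). This is an illustrative alternative rather than the answer's own method: the function name rename_from_prefix and its arguments are hypothetical, and it assumes every value in a column carries the same "name: " prefix.

```r
# Example data from the question, rebuilt so the sketch is self-contained
df <- data.frame(
  id = c(1, 2, 3, 4, 5),
  characteristics_ch1 = c("gender: Female", "gender: Male", "gender: Female",
                          "gender: Male", "gender: Female"),
  characteristics_ch1.1 = c("Thing One: a", "Thing One: a", "Thing One: a",
                            "Thing One: b", "Thing One: b"),
  characteristics_ch1.2 = c("age: 60", "age: 45", "age: 63", "age: 56", "age: 65"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rename each column in `cols` after the text before
# the first separator and keep only the text after it.
rename_from_prefix <- function(df, cols, sep = ":\\s*") {
  for (i in cols) {
    values <- as.character(df[[i]])
    # Everything before the first separator is the candidate column name
    prefix <- unique(sub(paste0(sep, ".*$"), "", values))
    # Stop early if the rows disagree on the name (assumption stated above)
    stopifnot(length(prefix) == 1)
    names(df)[i] <- prefix
    # Drop the prefix and separator, keeping only the value
    df[[i]] <- sub(paste0("^.*?", sep), "", values, perl = TRUE)
  }
  df
}

result <- rename_from_prefix(df, cols = 2:4)
result
```

Because the data frame is the first argument, the helper can also be used as a pipe step, for example df %>% rename_from_prefix(cols = 2:4).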
