
What I'd like to do is apply a function to multiple columns in a dataframe, recording the output as a new column. To make this clearer, I'd like to take a dataframe of the form:

first_name  last_name  age
Alice       Smith       45
Bob         Richards    20

to:

first_name  last_name  age  first_name_lower  last_name_lower
Alice       Smith       45  alice             smith
Bob         Richards    20  bob               richards

I can do this column-wise with something like:

library(stringr)  # provides str_to_lower()
# str_to_lower() is already vectorised, so each column is converted in one call
df$first_name_lower <- str_to_lower(df$first_name)
df$last_name_lower  <- str_to_lower(df$last_name)

but of course for multiple columns this isn't a particularly elegant solution.

Thanks!

  • Do you want a solution where all character columns get converted to lower case? Or do you just need a more elegant way of doing this? Commented Jan 3, 2018 at 22:26
  • Ideally, all character columns get converted to lower case and recorded as new columns, so that the original data is preserved. Commented Jan 3, 2018 at 22:32
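For reference, a minimal base R sketch of that requirement (no packages needed), assuming the name columns are character rather than factor: `vapply()` finds the character columns, `tolower()` (base R's counterpart to `str_to_lower()`) converts them, and assigning to suffixed names adds new columns while leaving the originals untouched.

```r
# Example data from the question; stringsAsFactors = FALSE keeps the
# name columns as character rather than factor
df <- data.frame(
  first_name = c("Alice", "Bob"),
  last_name  = c("Smith", "Richards"),
  age        = c(45, 20),
  stringsAsFactors = FALSE
)

# Names of all character columns (here: first_name, last_name)
char_cols <- names(df)[vapply(df, is.character, logical(1))]

# Lowercase each character column into a new "<name>_lower" column;
# the original columns are not modified
df[paste0(char_cols, "_lower")] <- lapply(df[char_cols], tolower)

df
```

Because the columns are picked by type, the same two lines work unchanged however many character columns the data frame has.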

2 Answers


This could work: transmute_if takes a predicate, performs an action on all columns satisfying the predicate, and throws away all the rest. In this case we use is.character as the predicate. Since we want to keep the original data, we combine both datasets with cbind.
To change the names of the new columns, we use select_all to paste "_lower" to the end of the column names.

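library(tidyverse)

# Data lines start at the left margin: with sep = "," leading spaces
# would otherwise end up inside the values
dta <- read.table(header = TRUE, sep = ",", stringsAsFactors = FALSE,
                  text = "first_name,last_name,age
Alice,Smith,45
Bob,Richards,20")

# Keep the original columns and bind on lowercased copies of every
# character column, renamed with a "_lower" suffix
cbind(dta,
      dta %>%
        transmute_if(is.character, tolower) %>%
        select_all(~ paste0(., "_lower")))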

Hope it helps!
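In dplyr 1.0 and later, `transmute_if()` and `select_all()` are superseded by `across()`. A sketch of the same idea, assuming dplyr >= 1.0 and stringr are installed: `where(is.character)` selects the columns and the `.names` glue spec writes the results to new `_lower` columns, so no `cbind()` is needed.

```r
library(dplyr)
library(stringr)

dta <- data.frame(
  first_name = c("Alice", "Bob"),
  last_name  = c("Smith", "Richards"),
  age        = c(45L, 20L),
  stringsAsFactors = FALSE
)

# across() applies str_to_lower() to every character column; .names sends
# the results to new "<col>_lower" columns, so the originals are kept
result <- dta %>%
  mutate(across(where(is.character), str_to_lower, .names = "{.col}_lower"))

result
```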


2 Comments

You are missing a comma after stringsAsFactors = FALSE. Also, with this there would be multiple columns with the same name in the data.
@suchait you are right, I added a fix for that problem by renaming the new columns.

A tidyverse solution:

library(tidyverse)
# Copy the columns first, then lowercase only the copies
mydf %>% 
    mutate(first_name_lower = first_name,
           last_name_lower  = last_name) %>% 
    mutate_at(vars(first_name_lower, last_name_lower), ~ str_to_lower(.))

If you don't need to preserve the original variables:

# Overwrite the original columns in place
mydf %>% 
        mutate_at(vars(first_name, last_name), ~ str_to_lower(.))

3 Comments

I like your approach, however, using column indices makes me nervous, wouldn't you consider mutate_at(vars(first_name_lower, last_name_lower), ~str_to_lower(.)) an improvement?
Ok, but you have to use ""... In any case, I consider it an elegant solution only if you don't have to change more than 3 or 4 variables ;-)
You can use bare column names using vars() as it supports unquoting. That makes vars(colnames(df)) quite handy.
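For more than a handful of variables, one option is to keep the column names in a character vector, since `mutate_at()` also accepts a plain character vector for `.vars`. In this sketch the vector name `cols_to_lower` is hypothetical; passing a named list of functions makes `mutate_at()` add new columns (named `<column>_<function name>` when two or more columns are selected) instead of overwriting.

```r
library(dplyr)
library(stringr)

mydf <- data.frame(
  first_name = c("Alice", "Bob"),
  last_name  = c("Smith", "Richards"),
  age        = c(45, 20),
  stringsAsFactors = FALSE
)

# Hypothetical vector of columns to convert; it could also be built
# programmatically, e.g. names(mydf)[sapply(mydf, is.character)]
cols_to_lower <- c("first_name", "last_name")

# A single unnamed function overwrites the listed columns in place
overwritten <- mydf %>% mutate_at(cols_to_lower, str_to_lower)

# A named list of functions adds first_name_lower / last_name_lower instead
with_copies <- mydf %>% mutate_at(cols_to_lower, list(lower = str_to_lower))
```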
