Generating new columns in an R dataframe based on applying a function across multiple columns

Question

What I'd like to do is apply a function to multiple columns in a dataframe, recording the output as a new column. To make this clearer, I'd like to take a dataframe of the form:

first_name  last_name   age
   Alice       Smith     45
    Bob       Richards   20

to:

first_name  last_name   age  first_name_lower  last_name_lower
   Alice       Smith     45      alice            smith
    Bob       Richards   20       bob            richards

I can do this column-wise with something like:

df$first_name_lower <- apply(df[,c('first_name')], 1, function(x) str_to_lower(x))
df$last_name_lower <- apply(df[,c('last_name')], 1, function(x) str_to_lower(x))

but of course for multiple columns this isn't a particularly elegant solution.

Thanks!

Do you want a solution where all character columns get converted to lower case, or do you just need a more elegant way of doing it?
– sm925, Commented Jan 3, 2018 at 22:26
Ideally, all character columns get converted to lower case, but the results are recorded as new columns so that we preserve the original data.
– anthr, Commented Jan 3, 2018 at 22:32

Bertil Baron · Accepted Answer · 2018-01-03 23:11:56Z

3

This could work: transmute_if takes a predicate, performs an action on all columns satisfying the predicate, and throws away all the rest. In this case we use is.character as the predicate. Since we want to keep the original data, we combine both datasets with cbind.
To change the names of the new columns, we use select_all to paste "_lower" onto the end of the column names.

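library(tidyverse)

# Example data; the data rows are left unindented so that no leading
# whitespace ends up in first_name
dta <- read.table(header = TRUE, sep = ",", stringsAsFactors = FALSE,
                  text = "first_name,last_name,age
Alice,Smith,45
Bob,Richards,20")

# Lower-case every character column, suffix the new names with "_lower",
# and bind the result onto the original data.
# funs() is deprecated in recent dplyr versions, so a formula is used instead.
cbind(dta,
      dta %>%
        transmute_if(is.character, tolower) %>%
        select_all(~ paste0(., "_lower")))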
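
For comparison, here is a minimal base R sketch of the same idea (pick columns by predicate, transform them, suffix the names, bind them back). It is not part of the original answer; the names is_chr, lowered and result are only illustrative, and the data frame is rebuilt by hand so the snippet runs on its own.

```r
# Same example data as above, built directly
dta <- data.frame(first_name = c("Alice", "Bob"),
                  last_name = c("Smith", "Richards"),
                  age = c(45, 20),
                  stringsAsFactors = FALSE)

# Predicate step: which columns are character?
is_chr <- vapply(dta, is.character, logical(1))

# Transform step: lower-case only those columns
lowered <- lapply(dta[is_chr], tolower)

# Rename step: append "_lower" to each new column name
names(lowered) <- paste0(names(lowered), "_lower")

# Combine step: keep the original columns and add the new ones
result <- cbind(dta, as.data.frame(lowered, stringsAsFactors = FALSE))
result
```

This avoids any package dependency, at the cost of being a little more verbose than the pipe version.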

Hope it helps!

edited Jan 3, 2018 at 23:11

answered Jan 3, 2018 at 22:50

Bertil Baron


2 Comments

sm925 Over a year ago

You are missing a comma after stringsAsFactors = FALSE. Also, with this there would be multiple columns with the same name in the data.

Bertil Baron Over a year ago

@suchait you are right. I added a fix for that problem by renaming the new columns.

Scipione Sarlo · Accepted Answer · 2018-01-03 23:06:30Z

2

A tidyverse solution:

library(tidyverse)

# Copy the columns under new names, then lower-case only the copies
mydf %>%
    mutate(first_name_lower = first_name,
           last_name_lower = last_name) %>%
    mutate_at(vars(first_name_lower, last_name_lower), ~ str_to_lower(.))

If you don't want to preserve the original variables:

# Overwrite the original columns in place
mydf %>%
    mutate_at(vars(first_name, last_name), ~ str_to_lower(.))
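
If more than a handful of columns are involved, a possible variation is to let dplyr select and name the columns itself. The sketch below is not from the original answer and assumes dplyr 1.0.0 or later, where across() and where() are available; the example mydf is rebuilt by hand, and tolower stands in for str_to_lower so that only dplyr needs to be loaded.

```r
library(dplyr)

# Example data matching the question (assumed shape of mydf)
mydf <- data.frame(first_name = c("Alice", "Bob"),
                   last_name = c("Smith", "Richards"),
                   age = c(45, 20),
                   stringsAsFactors = FALSE)

# For every character column, add a lower-cased copy named <column>_lower;
# the original columns are left untouched
result <- mydf %>%
  mutate(across(where(is.character), tolower, .names = "{.col}_lower"))
result
```

The .names template is what keeps the originals: without it, across() would overwrite the selected columns instead of adding new ones.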

edited Jan 3, 2018 at 23:06

answered Jan 3, 2018 at 22:37

Scipione Sarlo


3 Comments

Kevin Arseneau Over a year ago

I like your approach; however, using column indices makes me nervous. Wouldn't you consider mutate_at(vars(first_name_lower, last_name_lower), ~str_to_lower(.)) an improvement?

Scipione Sarlo Over a year ago

Ok, but you have to use "". In any case, I consider it an elegant solution only if you don't have to change more than 3 or 4 variables ;-)

Kevin Arseneau Over a year ago

You can use bare column names using vars() as it supports unquoting. That makes vars(colnames(df)) quite handy.
