66

(Somewhat related question: Enter new column names as string in dplyr's rename function)

In the middle of a dplyr chain (%>%), I would like to replace multiple column names with functions of their old names (using tolower, gsub, etc.).

library(tidyr); library(dplyr)
data(iris)
# This is what I want to do, but I'd like to use dplyr syntax
names(iris) <- tolower( gsub("\\.", "_", names(iris) ) )
glimpse(iris, 60)
# Observations: 150
# Variables:
#   $ sepal_length (dbl) 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6,...
#   $ sepal_width  (dbl) 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4,...
#   $ petal_length (dbl) 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4,...
#   $ petal_width  (dbl) 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3,...
#   $ species      (fctr) setosa, setosa, setosa, setosa, s...

# the rest of the chain:
iris %>% gather(measurement, value, -species) %>%
  group_by(species,measurement) %>%
  summarise(avg_value = mean(value)) 

I see ?rename takes the argument replace as a named character vector, with new names as values, and old names as names.

So I tried:

iris %>% rename(replace=c(names(iris)=tolower( gsub("\\.", "_", names(iris) ) )  ))

but this (a) returns Error: unexpected '=' in iris %>% ... and (b) requires referencing by name the data frame from the previous operation in the chain, which in my real use case I couldn't do.

iris %>% 
  rename(replace=c(    )) %>% # ideally the fix would go here
  gather(measurement, value, -species) %>%
  group_by(species,measurement) %>%
  summarise(avg_value = mean(value)) # I realize I could mutate down here 
                                     #  instead, once the column names turn into values, 
                                     #  but that's not the point
# ---- Desired output looks like: -------
# Source: local data frame [12 x 3]
# Groups: species
# 
#       species  measurement avg_value
# 1      setosa sepal_length     5.006
# 2      setosa  sepal_width     3.428
# 3      setosa petal_length     1.462
# 4      setosa  petal_width     0.246
# 5  versicolor sepal_length     5.936
# 6  versicolor  sepal_width     2.770
# ... etc ....  
2
  • 10
    The elegant approach is: iris %>% `names<-`(.,tolower( gsub("\\.", "_", names(.) ) )) (I'm only joking.) Commented May 21, 2015 at 19:51
  • Some functions used in the answers below have been deprecated. rename_with is the latest dplyr verb to programmatically rename variables with a function. See answer below. Commented Mar 17, 2021 at 8:45

8 Answers

58

This is a very late answer, written in May 2017.

As of dplyr 0.5.0.9004 (soon to be 0.6.0), many new ways of renaming columns that are compatible with the magrittr pipe operator %>% have been added to the package.

Those functions are:

  • rename_all
  • rename_if
  • rename_at

There are many different ways of using those functions, but the one relevant to your problem, using the stringr package, is the following:

df <- df %>%
  rename_all(
      funs(
        stringr::str_to_lower(.) %>%
        stringr::str_replace_all(., '\\.', '_')
      )
  )
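On newer dplyr versions, where funs() is deprecated, the same rename can be sketched with a formula lambda instead. This is only a minimal sketch, assuming dplyr 0.8 or later and using iris as a stand-in for df; the variable name renamed is just for illustration.

```r
library(dplyr)
library(stringr)

# Assumption: dplyr >= 0.8, where a ~ lambda can be passed in place of funs()
renamed <- iris %>%
  rename_all(~ str_replace_all(str_to_lower(.), "\\.", "_"))

names(renamed)
```

The lambda receives the whole vector of column names as `.`, so any vectorised string function can be used inside it.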

And so, carry on with the plumbing :) (no pun intended).

2 Comments

Good to know, thanks. Also worth noting, you can do df %<>% foo() as shorthand for df <- df %>% foo()
Due to the new dplyr update where they changed how funs() works (really wish they hadn't), you need to substitute list for funs and place a tilde ~ before the function e.g. list(~str_replace(., to_replace, replacement))
39

I think you're looking at the documentation for plyr::rename, not dplyr::rename. You would do something like this with dplyr::rename:

iris %>% rename_(.dots=setNames(names(.), tolower(gsub("\\.", "_", names(.)))))
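Since rename_() has been deprecated in later dplyr releases, here is a hedged sketch of the same "named vector of old and new names" idea with plain rename(). It assumes a recent dplyr/tidyselect where all_of() accepts a named character vector (names are the new names, values the old ones); lookup and renamed are illustrative names, and the lookup is built from iris directly rather than from `.`.

```r
library(dplyr)

# Named character vector: names = new column names, values = old column names
lookup <- setNames(names(iris), tolower(gsub("\\.", "_", names(iris))))

# Assumption: all_of() with a named vector renames while selecting
renamed <- iris %>% rename(all_of(lookup))

names(renamed)
```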

6 Comments

You can put . in place of iris in its latter appearances.
This is very useful, why you had to use rename_ instead of rename?
Habit, since I mostly use dplyr programmatically
@Konrad Actually, I don't have the doc in front of me, but I think the nonsafe version doesn't have the .dots argument
FYI: rename_ is slowly being deprecated. I haven't found an obvious replacement, though @Frank's use of setNames seems the most direct (if not provided by dplyr).
31

Here's a way around the somewhat awkward rename syntax:

myris <- iris %>% setNames(tolower(gsub("\\.","_",names(.))))

9 Comments

Another dependency for a workaround? This is getting more esoteric.
You can replace setnames with setNames and drop the call to data.table.
@MatthewPlourde Do you know of a reason to prefer the longer rename over the simpler route? Your answer looks like rename_(.dots=this_answer), right? The help page for rename does not advertise modification by reference as setnames from data.table does.
@Anton A fair point, but that's the nature of workarounds. (Thanks to Matthew's comment, the dependency is gone again.) I feel like the dplyr syntax should be extended to support the OP's expectations (based on plyr), like rename(replace_all=...). It seems deficient if constructing a named list and knowing to pass it to the weird argument .dots is required here.
@Frank I wound up using your answer (+1) because it is a simpler way to do what I wanted -- and taught me about setNames-- but @MatthewPlourde more literally answered the question as written (i.e. using rename). Thanks for your time!
29

As of 2020, rename_if, rename_at and rename_all are marked superseded. The up-to-date way to tackle this the dplyr way would be rename_with():

iris %>% rename_with(tolower)

or a more complex version:

iris %>% 
  rename_with(stringr::str_replace, 
              pattern = "Length", replacement = "len", 
              matches("Length"))

(edit 2021-09-08)
As mentioned in a comment by @a_leemo, this notation is not mentioned in the manual verbatim. Rather, one would deduce the following from the manual:

iris %>% 
  rename_with(~ stringr::str_replace(.x, 
                                     pattern = "Length", 
                                     replacement = "len"), 
              matches("Length")) 

Both do the same thing, yet, I find the first solution a bit more readable. In the first example pattern = ... and replacement = ... are forwarded to the function as part of the ... dots implementation. For more details see ?rename_with and ?dots.
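Applied to the renaming asked for in the question (lower case, dots to underscores), a minimal sketch could look like the following. It assumes dplyr 1.0.0 or later; clean_name is a hypothetical helper defined only for this example, and it is passed by name, without parentheses or arguments.

```r
library(dplyr)

# Hypothetical helper defined outside the pipeline; it takes the vector of names
clean_name <- function(x) tolower(gsub(".", "_", x, fixed = TRUE))

# Variant 1: inline lambda, .x is the character vector of column names
by_lambda <- iris %>% rename_with(~ tolower(gsub("\\.", "_", .x)))

# Variant 2: pass the helper itself, not a call to it
by_helper <- iris %>% rename_with(clean_name)

names(by_helper)
```

Both variants slot into the middle of a chain without referring to the data frame by name.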

9 Comments

Thank you! I was struggling to figure out how to code this using rename_with and this did the trick.
How would one do this for a custom function, @loki? If I write the function inside the rename_with statement, the names are handed over automagically; if I define the function elsewhere, it fails with "argument is not an atomic vector".
Just found out: simply do not give any argument to the function, but pass it as a function: mydataframe %>% rename_with(myawesomefunction)
This solved a problem I was having, thanks! But why are the arguments inside the str_replace() function pulled outside of it? I couldn't figure this syntax out from the help documentation.
@LarissaCury you probably want to use mutate or rename with !!. Have a look at the examples with ?rlang::`topic-inject`.
9

For this particular [but fairly common] case, the function has already been written in the janitor package:

library(janitor)

iris %>% clean_names()

##   sepal_length sepal_width petal_length petal_width species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
## .          ...         ...          ...         ...     ...

so all together,

iris %>% 
    clean_names() %>%
    gather(measurement, value, -species) %>%
    group_by(species,measurement) %>%
    summarise(avg_value = mean(value))

## Source: local data frame [12 x 3]
## Groups: species [?]
## 
##       species  measurement avg_value
##        <fctr>        <chr>     <dbl>
## 1      setosa petal_length     1.462
## 2      setosa  petal_width     0.246
## 3      setosa sepal_length     5.006
## 4      setosa  sepal_width     3.428
## 5  versicolor petal_length     4.260
## 6  versicolor  petal_width     1.326
## 7  versicolor sepal_length     5.936
## 8  versicolor  sepal_width     2.770
## 9   virginica petal_length     5.552
## 10  virginica  petal_width     2.026
## 11  virginica sepal_length     6.588
## 12  virginica  sepal_width     2.974

9

My eloquent attempt using base, stringr and dplyr:

EDIT: library(tidyverse) now includes all three libraries.

library(tidyverse)
library(magrittr) # tidyverse does not attach magrittr, which is needed for the %<>% pipe
# library(dplyr)
# library(stringr)
# library(magrittr)

names(iris) %<>% # %<>% pipes the names through the chain and assigns the result back
    tolower() %>%
    str_replace_all("\\.", "_") # escape the period so only literal dots are replaced
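Note that stringr treats the pattern as a regular expression, where a bare `.` matches any character, so a literal period has to be escaped or wrapped in fixed(). A small sketch of the difference (the variable names are just for illustration):

```r
library(stringr)

# A bare "." is a regex wildcard: every character gets replaced
any_char <- str_replace_all("Sepal.Length", ".", "_")

# Escaping the period matches only the literal dot
literal_dot <- str_replace_all("Sepal.Length", "\\.", "_")

# fixed() turns off regex interpretation altogether
fixed_dot <- str_replace_all("Sepal.Length", fixed("."), "_")

c(any_char, literal_dot, fixed_dot)
```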

I do this for building functions with piping.

my_read_fun <- function(x) {
    # read the file first, then clean up the column names in place
    df <- read.csv(x)
    names(df) %<>%
        tolower() %>%
        str_replace_all("_", ".")
    # return only the columns of interest
    df %>%
        select(a, b, c, g)
}

2 Comments

str_replace_all is not in either of those packages. Fyi, no need to include "edit" notations in the text of your answer; just make it the best answer possible. Folks can see the edit history if they want by clicking a link below the answer.
The period in the first str_replace_all function should be escaped \\. - otherwise everything is replaced with an underscore
2

Both select() and select_all() can be used to rename columns.

If you wanted to rename only specific columns you can use select:

iris %>% 
  select(sepal_length = Sepal.Length, sepal_width = Sepal.Width, everything()) %>% 
  head(2)

  sepal_length sepal_width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa

rename does the same thing, just without having to include everything():

iris %>% 
  rename(sepal_length = Sepal.Length, sepal_width = Sepal.Width) %>% 
  head(2)

  sepal_length sepal_width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa

select_all() works on all columns and can take a function as an argument:

iris %>% 
  select_all(tolower)

iris %>% 
  select_all(~gsub("\\.", "_", .)) 

or combining the two:

iris %>% 
  select_all(~gsub("\\.", "_", tolower(.))) %>% 
  head(2)

  sepal_length sepal_width petal_length petal_width species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa

1 Comment

this worked better and is much more straightforward than anything in the rename family... it's strange that it's easier to use a select_all with ~gsub than rename_at or rename_if with some kind of predicate of variable declaration... it seems like that's what rename_* is for
2

In case you don't want to write the regular expressions yourself, you could use

  • the snakecase-pkg which is very flexible,
  • janitor::make_clean_names() which has some nice defaults or
  • janitor::clean_names() which does the same as make_clean_names(), but works directly on data frames.

Invoking them inside of a pipeline should be straightforward.

library(magrittr)
library(snakecase)

iris %>% setNames(to_snake_case(names(.)))
iris %>% tibble::as_tibble(.name_repair = to_snake_case)
iris %>% purrr::set_names(to_snake_case)
iris %>% dplyr::rename_all(to_snake_case)
iris %>% janitor::clean_names()
