rename column in dataframe using variable name R

Question

I have a number of data frames. Each with the same format. Like this:

           A           B          C
1  -0.02299388  0.71404158  0.8492423
2  -1.43027866 -1.96420767 -1.2886368
3  -1.01827712 -0.94141194 -2.0234436

I would like to change the name of the third column--C--so that it includes part if the name of the variable name associated with the data frame.

For the variable df_elephant the data frame should look like this:

     A           B          C.elephant
1  -0.02299388  0.71404158  0.8492423
2  -1.43027866 -1.96420767 -1.2886368
3  -1.01827712 -0.94141194 -2.0234436

I have a function which will change the column name:

rename_columns <- function(x) {

  colnames(x)[colnames(x)=='C'] <-
    paste( 'C',
           strsplit (deparse (substitute(x)), '_')[[1]][2], sep='.' ) 
  return(x)
}

This works with my data frames. However, I would like to provide a list of data frames so that I do not have to call the function multiple times by hand. If I use lapply like so:

lapply( list (df_elephant, df_horse), rename_columns )

The function renames the data frames with an NA rather than portion of the variable name.

[[1]]
         A            B       C.NA
1  -0.02299388  0.71404158  0.8492423
2  -1.43027866 -1.96420767 -1.2886368
3  -1.01827712 -0.94141194 -2.02344361

[[2]]
         A            B       C.NA
1   0.45387054  0.02279488  1.6746280
2  -1.47271378  0.68660595 -0.2505752
3   1.26475917 -1.51739927 -1.3050531

Is there some way that I kind provide a list of data frames to my function and produce the desired result?

Deena · Accepted Answer · 2016-07-11 13:07:34Z

You are trying to process the data frame column names instead of the actual lists' name. And this is why it's not working.

# Generating random data
n = 3
item1 = data.frame(A = runif(n), B = runif(n), C = runif(n))
item2 = data.frame(A = runif(n), B = runif(n), C = runif(n))
myList = list(df_elephant = item1,  df_horse = item2)


# 1- Why your code doesnt work: ---------------
names(myList) # This will return the actual names that you want to use : [1] "df_elephant" "df_horse"   
lapply(myList, names) # This will return the dataframes' column names. And thats why you are getting the "NA"


# 2- How to make it work: ---------------
lapply(seq_along(myList), # This will return an array of indicies  

       function(i){
         dfName = names(myList)[i] # Get the list name
         dfName.animal = unlist(strsplit(dfName, "_"))[2] # Split on underscore and take the second element

         df = myList[[i]] # Copy the actual Data frame 
         colnames(df)[colnames(df) == "C"] = paste("C", dfName.animal, sep = ".") # Change column names

         return(df) # Return the new df 
       })


# [[1]]
# A          B C.elephant
# 1 0.8289368 0.06589051  0.2929881
# 2 0.2362753 0.55689663  0.4854670
# 3 0.7264990 0.68069346  0.2940342
# 
# [[2]]
# A         B   C.horse
# 1 0.08032856 0.4137106 0.6378605
# 2 0.35671556 0.8112511 0.4321704
# 3 0.07306260 0.6850093 0.2510791

Roman · Accepted Answer · 2016-07-11 12:53:33Z

You can also try. Somehow similar to Akrun's answer using also Map in the end:

# Your data
d <- read.table("clipboard")
# create a list with names A and B
d_list <- list(A=d, B=d)

# function
foo <- function(x, y){
  gr <- which(colnames(x) == "C") # get index of colnames C 
  tmp <- colnames(x) #new colnames vector
  tmp[gr] <- paste(tmp[gr], y, sep=".") # replace the old with the new colnames.
  setNames(x, tmp) # set the new names
}
# Result
Map(foo, d_list, names(d_list))
$A
            A          B        C.A
1 -0.02299388  0.7140416  0.8492423
2 -1.43027866 -1.9642077 -1.2886368
3 -1.01827712 -0.9414119 -2.0234436

$B
            A          B        C.B
1 -0.02299388  0.7140416  0.8492423
2 -1.43027866 -1.9642077 -1.2886368
3 -1.01827712 -0.9414119 -2.0234436

akrun · Accepted Answer · 2016-07-11 12:55:42Z

1

We can try with Map. Get the datasets in a list (here we used mget to return the values of the strings in a list), using Map, we change the names of the third column with that of the corresponding vector of names.

 Map(function(x, y) {names(x)[3] <- paste(names(x)[3], sub(".*_", "", y), sep="."); x},  
     mget(c("df_elephant", "df_horse")), c("df_elephant", "df_horse"))
#$df_elephant
#            A          B  C.elephant
#1 -0.02299388  0.7140416   0.8492423
#2 -1.43027866 -1.9642077  -1.2886368
#3 -1.01827712 -0.9414119  -2.0234436

#$df_horse
#           A           B   C.horse
#1  0.4538705  0.02279488  1.6746280
#2 -1.4727138  0.68660595 -0.2505752
#3  1.2647592 -1.51739927 -1.3050531

edited Jul 11, 2016 at 12:55

answered Jul 11, 2016 at 12:41

akrun

891k38 gold badges590 silver badges700 bronze badges

Collectives™ on Stack Overflow

rename column in dataframe using variable name R

3 Answers 3

Comments

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related