Rename variables based on values in another dataframe

Question

I have two dataframes, DF1 and DF2:

DF1 <- data.frame(V1 = factor(c("A", "B", "C", "D")),
                  V2 = factor(c("E", "F", "G", "H")),
                  Va3 = factor(c("I", "J", "K", "L")),
                  column = factor(c("M", "N", "O", "P")))

DF2 <- data.frame(N1 = factor(c("x", "V1", "V2", "y", "z", "Va3", "a", "column")),
                  N2 = factor(c("A", "var1", "random", "R", "Q", "nameofcolumn", "S", "varname4")))

I want to rename the variables in DF1 (V1 through column) using DF2: for each column name, find the row where DF2$N1 equals that name and take the value of DF2$N2 in the same row. For example, V2 should become random and column should become varname4.

Normally, I would just use colnames(DF1) <- DF2$N2 if DF2$N2 contained exactly the new names, in column order; but here DF2 has additional rows (x, y, z, a) that do not correspond to any column of DF1. How can I rename the variables properly?

NelsonGon · Accepted Answer · 2019-03-17 04:56:22Z

2

We can just use match:

# look up each column name in DF2$N1 and take the N2 value from the same row
names(DF1) <- as.character(DF2$N2[match(names(DF1), DF2$N1)])
DF1
  var1 random nameofcolumn varname4
1    A      E            I        M
2    B      F            J        N
3    C      G            K        O
4    D      H            L        P
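To see why this one-liner drops the names of columns that are missing from the lookup, here is a small sketch. The extra column other is an assumption added for illustration. match() returns each column's row position in DF2$N1, or NA when there is none, and indexing N2 with NA produces an NA name.

```r
# Data from the question, plus a hypothetical extra column "other"
# that has no entry in DF2$N1
DF1 <- data.frame(V1 = factor(c("A", "B", "C", "D")),
                  V2 = factor(c("E", "F", "G", "H")),
                  Va3 = factor(c("I", "J", "K", "L")),
                  column = factor(c("M", "N", "O", "P")),
                  other = 1:4)
DF2 <- data.frame(N1 = factor(c("x", "V1", "V2", "y", "z", "Va3", "a", "column")),
                  N2 = factor(c("A", "var1", "random", "R", "Q", "nameofcolumn", "S", "varname4")))

# Row position of each column name in DF2$N1 (NA when not found);
# match() converts the factor N1 to character before comparing
idx <- match(names(DF1), DF2$N1)
print(idx)

# Indexing N2 with NA gives NA, so "other" would lose its name
new_names <- as.character(DF2$N2[idx])
print(new_names)
```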

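Update (DF1 now has an extra column, somethingelse, that is not in DF2$N1; only the columns that do appear in DF2$N1 are renamed):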

# only rename the columns that actually appear in DF2$N1
hit <- names(DF1) %in% DF2$N1
names(DF1)[hit] <- as.character(DF2$N2[match(names(DF1)[hit], DF2$N1)])
DF1
  var1 random nameofcolumn varname4 somethingelse
1    A      E            I        M             M
2    B      F            J        N             N
3    C      G            K        O             O
4    D      H            L        P             P
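The same idea can be wrapped in a small reusable function built on a named character vector used as a lookup table. This is a sketch: rename_by_lookup is a hypothetical name (not part of base R or any package), and the contents of somethingelse (1:4) are assumed for illustration.

```r
# Hypothetical helper: rename the columns of df using old -> new pairs,
# leaving columns without a match unchanged
rename_by_lookup <- function(df, old, new) {
  # Named character vector: names are old column names, values are new names
  lookup <- setNames(as.character(new), as.character(old))
  cur <- names(df)
  hit <- cur %in% names(lookup)
  # Indexing a named vector by character does the lookup
  cur[hit] <- unname(lookup[cur[hit]])
  names(df) <- cur
  df
}

DF1 <- data.frame(V1 = factor(c("A", "B", "C", "D")),
                  V2 = factor(c("E", "F", "G", "H")),
                  Va3 = factor(c("I", "J", "K", "L")),
                  column = factor(c("M", "N", "O", "P")),
                  somethingelse = 1:4)
DF2 <- data.frame(N1 = factor(c("x", "V1", "V2", "y", "z", "Va3", "a", "column")),
                  N2 = factor(c("A", "var1", "random", "R", "Q", "nameofcolumn", "S", "varname4")))

DF1 <- rename_by_lookup(DF1, DF2$N1, DF2$N2)
print(names(DF1))
```

If N1 contains duplicates, the named-vector lookup uses the first matching entry, which mirrors what match() does.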

edited Mar 17, 2019 at 4:56

NelsonGon


answered Mar 17, 2019 at 1:59

BENY


2 Comments

KaC Over a year ago

This is a great, elegant solution. It works perfectly in this example, but it seems to have an unfortunate side effect: in a larger version of DF1, it deletes the names of all columns that do not have an equivalent in DF2.

BENY Over a year ago

@KaC Check the update: I added one extra column, somethingelse, to DF1; it is not in DF2.

Uwe · Accepted Answer · 2019-03-18 00:19:09Z

2

With version 1.12.0 (on CRAN 13 Jan 2019), data.table's setnames() function gained a new parameter, skip_absent, which skips any names in old that are not present in the data. setnames() works on both data.frame and data.table objects.

data.table::setnames(DF1, as.character(DF2$N1), as.character(DF2$N2), skip_absent = TRUE)
DF1

  var1 random nameofcolumn varname4
1    A      E            I        M
2    B      F            J        N
3    C      G            K        O
4    D      H            L        P

Or, with an additional column not included in DF2:

DF1 <- data.frame(V1 = factor(c("A", "B", "C", "D")),
                  V2 = factor(c("E", "F", "G", "H")),
                  Va3 = factor(c("I", "J", "K", "L")),
                  column = factor(c("M", "N", "O", "P")),
                  other = 1:4)
data.table::setnames(DF1, as.character(DF2$N1), as.character(DF2$N2), skip_absent = TRUE)
DF1

  var1 random nameofcolumn varname4 other
1    A      E            I        M     1
2    B      F            J        N     2
3    C      G            K        O     3
4    D      H            L        P     4
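For comparison, the effect of skip_absent = TRUE can be sketched in base R by dropping the (old, new) pairs whose old name is not a column and then renaming by position. This only illustrates the idea, assuming the names in N1 are unique. It is not how data.table implements it, and it uses ordinary assignment rather than setnames()' modification by reference.

```r
DF1 <- data.frame(V1 = factor(c("A", "B", "C", "D")),
                  V2 = factor(c("E", "F", "G", "H")),
                  Va3 = factor(c("I", "J", "K", "L")),
                  column = factor(c("M", "N", "O", "P")),
                  other = 1:4)
DF2 <- data.frame(N1 = factor(c("x", "V1", "V2", "y", "z", "Va3", "a", "column")),
                  N2 = factor(c("A", "var1", "random", "R", "Q", "nameofcolumn", "S", "varname4")))

old <- as.character(DF2$N1)
new <- as.character(DF2$N2)

# Keep only the pairs whose old name is an existing column (skip the absent ones)
present <- old %in% names(DF1)
# Position of each remaining old name among DF1's columns
pos <- match(old[present], names(DF1))
names(DF1)[pos] <- new[present]
print(names(DF1))
```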

answered Mar 18, 2019 at 0:19

Uwe


1 Comment

Mus Over a year ago

I authored this additional parameter, and this is precisely why it was implemented: to stop setnames from failing whenever a name wasn't present. I ran it on enormous datasets, and it would occasionally fail after several minutes because a column name didn't exist in an automatically generated data frame over which I had no control or prior knowledge. I am glad to see skip_absent being needed and used elsewhere in the community.

M-- · Accepted Answer · 2019-03-17 01:47:56Z

1

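You need to use a regex. Depending on your actual needs, the pattern used to pick out the values may change. Right now, I am extracting the "cells" that start with varname (that is what ^ means). Note that * does not mean "whatever comes next": it repeats the preceding character zero or more times, so "^varname*" matches varnam followed by any number of e's ("anything" would be .*). Because grepl() only needs the pattern to match at the start, nothing after varname has to be specified. This also assumes that the variable names appear in DF2 in the same order as the columns of DF1.

Note: this is based on the first version of the question, in which the new column names had the form varname#.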

colnames(DF1) <- subset(DF2$N2, grepl("^varname*", DF2$N2))

str(DF1)
# 'data.frame': 4 obs. of  4 variables:
# $ varnames1: Factor w/ 4 levels "A","B","C","D": 1 2 3 4
# $ varname2 : Factor w/ 4 levels "E","F","G","H": 1 2 3 4
# $ varname3 : Factor w/ 4 levels "I","J","K","L": 1 2 3 4
# $ varname4 : Factor w/ 4 levels "M","N","O","P": 1 2 3 4

(I am aware of the redundancy in my pattern; I included both ^ and * just to give the OP some more insight.)
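A quick check of how ^ and * behave with grepl(). The test strings are made up for illustration: because * applies only to the preceding e, "^varname*" also accepts "varnam1", while a plain "^varname" prefix does not.

```r
x <- c("varname1", "varnam1", "varnameee2", "myvarname3", "random")

# * repeats the preceding "e" zero or more times, so "varnam1" matches too
star <- grepl("^varname*", x)
# A plain anchored prefix
prefix <- grepl("^varname", x)
# .* means "anything", which adds nothing to an anchored prefix match
dotstar <- grepl("^varname.*", x)

print(star)
print(prefix)
print(dotstar)
```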

Update to answer the edited question: matching values in N1 to find the column names in N2.

You can subset DF2 to the rows whose N1 value is one of colnames(DF1):

subset(DF2, (N1 %in% colnames(DF1)))
#       N1           N2
# 2     V1         var1
# 3     V2       random
# 6    Va3 nameofcolumn
# 8 column     varname4

You can then assign the corresponding N2 values as the column names of DF1 (you could also use the $ operator):

colnames(DF1) <- DF2$N2[as.numeric(rownames(subset(DF2, (N1 %in% colnames(DF1)))))]

If the order differs between the two data frames, see this thread: Sort one vector based on another
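When the rows of DF2 come in a different order from the columns of DF1, the subset-based result follows DF2's order, while match() follows DF1's. A sketch, assuming a deliberately reordered copy of DF2 (DF2_shuffled is a made-up name):

```r
DF1 <- data.frame(V1 = factor(c("A", "B", "C", "D")),
                  V2 = factor(c("E", "F", "G", "H")),
                  Va3 = factor(c("I", "J", "K", "L")),
                  column = factor(c("M", "N", "O", "P")))
DF2 <- data.frame(N1 = factor(c("x", "V1", "V2", "y", "z", "Va3", "a", "column")),
                  N2 = factor(c("A", "var1", "random", "R", "Q", "nameofcolumn", "S", "varname4")))

# Same lookup rows, reordered
DF2_shuffled <- DF2[c(8, 6, 3, 2, 1, 4, 5, 7), ]

# subset() keeps DF2_shuffled's row order, so these names come out reversed
by_subset <- as.character(subset(DF2_shuffled, N1 %in% colnames(DF1))$N2)
# match() returns positions in the order of colnames(DF1)
by_match <- as.character(DF2_shuffled$N2[match(colnames(DF1), DF2_shuffled$N1)])
print(by_subset)
print(by_match)

colnames(DF1) <- by_match
```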

edited Mar 17, 2019 at 1:47

answered Mar 17, 2019 at 1:16

M--


2 Comments

KaC Over a year ago

Thank you. Now, I'm aware this is a different question, but what if there is no clear pattern? Is there a way to match variable names to row values simply based on location in DF2?

KaC Over a year ago

This is perfect. Thank you. Much appreciated.
