I have two dataframes, DF1 and DF2:

DF1 <- data.frame(V1 = factor(c("A", "B", "C", "D")),
                 V2 = factor(c("E", "F", "G", "H")),
                 Va3 = factor(c("I", "J", "K", "L")),
                 column = factor(c("M", "N", "O", "P")))

DF2 <- data.frame(N1 = factor(c("x", "V1", "V2", "y", "z", "Va3", "a", "column")),
                  N2 = factor(c("A", "var1", "random", "R", "Q", "nameofcolumn", "S", "varname4")))

I want to change the name of variables in DF1 (V1:column) based on the value of the corresponding cell in DF2$N2, so that, e.g. V2 becomes random and column becomes varname4.

Normally, I would just use colnames(DF1) <- DF2$N2 if the variable names in DF1 matched the cell values in DF2; but here I have those additional values. How can I rename the variables properly?

3 Answers

We can just use match():

names(DF1) <- as.character(DF2$N2[match(names(DF1), DF2$N1)])
DF1
  var1 random nameofcolumn varname4
1    A      E            I        M
2    B      F            J        N
3    C      G            K        O
4    D      H            L        P
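
To make the indexing step explicit, here is a minimal sketch of what match() returns for these names. The vectors n1 and n2 are just the values of DF2$N1 and DF2$N2 copied from the question, and "other" is a hypothetical column name used only to show what happens when a name has no entry in DF2$N1.

```r
# Values of DF2$N1 and DF2$N2 from the question, as plain character vectors
n1 <- c("x", "V1", "V2", "y", "z", "Va3", "a", "column")
n2 <- c("A", "var1", "random", "R", "Q", "nameofcolumn", "S", "varname4")

# Position of each DF1 column name within n1
idx <- match(c("V1", "V2", "Va3", "column"), n1)
idx      # 2 3 6 8
n2[idx]  # "var1" "random" "nameofcolumn" "varname4"

# A name with no entry in n1 ("other" is hypothetical) yields NA
idx_extra <- match(c("V1", "other"), n1)
n2[idx_extra]  # "var1" NA
```

That NA is why a direct assignment loses the name of any column that has no match in DF2$N1.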

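Update: to rename only the columns that have a match in DF2$N1 and leave the rest unchanged (here DF1 has an extra column, somethingelse, that is not in DF2):

idx <- names(DF1) %in% DF2$N1
names(DF1)[idx] <- as.character(DF2$N2[match(names(DF1)[idx], DF2$N1)])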
DF1
  var1 random nameofcolumn varname4 somethingelse
1    A      E            I        M             M
2    B      F            J        N             N
3    C      G            K        O             O
4    D      H            L        P             P
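
The same idea can also be written with a named lookup vector, which some readers may find easier to follow. This is only a sketch and not part of the original answer: lookup, old and hit are names chosen here for illustration, the data is shortened to two rows, and the extra column other is hypothetical.

```r
DF1 <- data.frame(V1 = c("A", "B"), V2 = c("E", "F"),
                  Va3 = c("I", "J"), column = c("M", "N"),
                  other = 1:2)  # 'other' (hypothetical) has no entry in DF2
DF2 <- data.frame(N1 = c("x", "V1", "V2", "y", "z", "Va3", "a", "column"),
                  N2 = c("A", "var1", "random", "R", "Q", "nameofcolumn", "S", "varname4"))

# Named character vector mapping old name -> new name
lookup <- setNames(as.character(DF2$N2), as.character(DF2$N1))

old <- names(DF1)
hit <- old %in% names(lookup)                 # columns that have a new name
names(DF1)[hit] <- unname(lookup[old[hit]])   # rename only those columns
names(DF1)
```

Columns without an entry in the lookup keep their original names because they are never assigned to.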

2 Comments

This is a great, elegant solution. It works perfectly in this example, but it seems to have an unfortunate side effect: in a larger version of DF1, it deletes the names of all columns that do not have an equivalent in DF2$N1.
@KaC check the update. I added one additional column to DF1, somethingelse, which is not in DF2.

With version 1.12.0 (on CRAN 13 Jan 2019), data.table's setnames() function has gained a new parameter, skip_absent, to skip names in old that aren't present. setnames() works with both data.frame and data.table objects.

data.table::setnames(DF1, as.character(DF2$N1), as.character(DF2$N2), skip_absent = TRUE)
DF1
  var1 random nameofcolumn varname4
1    A      E            I        M
2    B      F            J        N
3    C      G            K        O
4    D      H            L        P

Or, with an additional column not included in DF2:

DF1 <- data.frame(V1 = factor(c("A", "B", "C", "D")),
                  V2 = factor(c("E", "F", "G", "H")),
                  Va3 = factor(c("I", "J", "K", "L")),
                  column = factor(c("M", "N", "O", "P")),
                  other = 1:4)
data.table::setnames(DF1, as.character(DF2$N1), as.character(DF2$N2), skip_absent = TRUE)
DF1
  var1 random nameofcolumn varname4 other
1    A      E            I        M     1
2    B      F            J        N     2
3    C      G            K        O     3
4    D      H            L        P     4
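
For contrast, the following sketch compares the default behaviour with skip_absent = TRUE. It assumes data.table 1.12.0 or later is installed and does nothing otherwise; df_strict, df_skip, old and new are illustrative names, not from the answer above.

```r
if (requireNamespace("data.table", quietly = TRUE) &&
    utils::packageVersion("data.table") >= "1.12.0") {
  old <- c("x", "V1")     # "x" is not a column of the data frames below
  new <- c("A", "var1")

  # Default behaviour: an old name that is not a column is an error
  df_strict <- data.frame(V1 = 1:2, other = 3:4)
  res <- try(data.table::setnames(df_strict, old, new), silent = TRUE)
  strict_failed <- inherits(res, "try-error")

  # With skip_absent = TRUE the missing name is ignored
  df_skip <- data.frame(V1 = 1:2, other = 3:4)
  data.table::setnames(df_skip, old, new, skip_absent = TRUE)
  names(df_skip)
}
```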

1 Comment

I authored this additional parameter and it is precisely the reason why it was implemented - to prevent setnames from ceasing whenever a value wasn't present. I would run this on enormous datasets and it would occasionally cease to function after a number of minutes because a value (column name) didn't exist in an automatically generated data frame (of which I had no control / prior knowledge). I am glad to see skip_absent being required and implemented elsewhere in the community.

You need to use a regular expression. Depending on your actual needs, the pattern you extract values with may change. Right now, I am extracting the "cells" that start with varname: ^ anchors the match to the start of the string, and * allows zero or more repetitions of the preceding character. This assumes that the variable names are already in the right order.

Note: based on first version of the question which had varname# as the column names.

colnames(DF1) <-  subset(DF2$N2, grepl("^varname*", DF2$N2))

str(DF1)
# 'data.frame': 4 obs. of  4 variables:
# $ varnames1: Factor w/ 4 levels "A","B","C","D": 1 2 3 4
# $ varname2 : Factor w/ 4 levels "E","F","G","H": 1 2 3 4
# $ varname3 : Factor w/ 4 levels "I","J","K","L": 1 2 3 4
# $ varname4 : Factor w/ 4 levels "M","N","O","P": 1 2 3 4        

I am aware of the redundancy in my pattern. Just included both * and ^ to give OP some more insight.
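
As a small sketch of that redundancy, the two patterns can be compared directly. n2 holds the values of DF2$N2 from the current question, and "varnam_x" is a made-up string used only to show where the patterns differ.

```r
n2 <- c("A", "var1", "random", "R", "Q", "nameofcolumn", "S", "varname4")

# "^varname*" means: starts with "varnam", followed by zero or more "e"
with_star <- grepl("^varname*", n2)
# "^varname" means: starts with "varname"
without_star <- grepl("^varname", n2)

with_star     # only the last element ("varname4") matches
identical(with_star, without_star)  # same result on these values

# The patterns differ on a string that starts with "varnam" but lacks the "e"
grepl("^varname*", "varnam_x")  # TRUE
grepl("^varname", "varnam_x")   # FALSE
```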

Update to answer the edited question: matching values in N1 to find column names in N2:

You can subset based on values in N1 and colnames(DF1):

subset(DF2, (N1 %in% colnames(DF1)))
#       N1           N2
# 2     V1         var1
# 3     V2       random
# 6    Va3 nameofcolumn
# 8 column     varname4

You can assign them as column names of DF1 like below (you can try $ operator as well):

colnames(DF1) <- DF2$N2[as.numeric(rownames(subset(DF2, (N1 %in% colnames(DF1)))))]

If the order differs between the two data frames, look at this thread: Sort one vector based on another
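
To illustrate why the order matters, here is a minimal sketch with a hypothetical two-column DF1 whose columns appear in a different order than the rows of a shortened DF2. Subsetting with %in% returns names in DF2's row order, whereas indexing with match() returns them in DF1's column order. by_in and by_match are illustrative names.

```r
DF1 <- data.frame(column = 1:2, V1 = 3:4)   # column order differs from DF2
DF2 <- data.frame(N1 = c("x", "V1", "column"),
                  N2 = c("A", "var1", "varname4"),
                  stringsAsFactors = FALSE)

by_in <- DF2$N2[DF2$N1 %in% names(DF1)]         # follows the row order of DF2
by_match <- DF2$N2[match(names(DF1), DF2$N1)]   # follows the column order of DF1

by_in     # "var1" "varname4"  (would mislabel both columns)
by_match  # "varname4" "var1"  (lines up with names(DF1))
```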

2 Comments

Thank you. Now, I'm aware this is a different question, but what if there is no clear pattern? Is there a way to match variable names to row values simply based on location in DF2?
This is perfect. Thank you. Much appreciated.
