0

Newbie using R, and would like to know how to create a new column in a data frame based on data from another data frame. Suppose I have 2 data frames, linked by letters "a", "b", and "c" in df1 col2 and df2 col1 as follows:

> col1<-c(1, 2, 3, 4)
> col2<-c("a","b","c","c")
> df1<-data.frame(col1, col2)
> df1
  col1 col2
1    1    a
2    2    b
3    3    c
4    4    c
> c1<-c("a","b","c")
> c2<-c("Jim","Sue","Bob")
> c3<-c("abc","def","ghi")
> df2<-data.frame(c1,c2,c3)
> df2
  c1  c2 c3
1  a Jim abc
2  b Sue def
3  c Bob ghi

I want to add a column3 to df1 to add user names based on the values of "a", "b", or "c" in df1. That is, how do I get the following using R?

> df1
  col1 col2 col3
1    1    a Jim
2    2    b Sue
3    3    c Bob
4    4    c Bob

I've tried df1["col3"]<-df2[df1$col2==df2$c1]$c2 but it's not working.

Note: I only want to add one column from df2 to df as shown above (e.g. not all columns in df2).

1
  • 2
    look at merge or match. I like left_join from dplyr. For completeness, it would just be merge(df1, df2, by.x="col2", by.y="c1") Commented Nov 1, 2015 at 19:55

2 Answers 2

1

You need merge().

# this merges the two data frames and rebinds the result to df1 
df1 <- merge(df1, df2, by.x = "col2", by.y = "c1")
# This renames the "c1" column to "col3"
names(df1) <- c("col2", "col1", "col3")
# This rearranges the columns in the right order
df1 <- df1[,c(2,1,3)] 

The result is:

 col1 col2 col3
1    1    a  Jim
2    2    b  Sue
3    3    c  Bob
4    4    c  Bob

Type ?merge to see how it works.

Sign up to request clarification or add additional context in comments.

4 Comments

Thanks @kliron, this apparently merges all columns in df2... how to merge only the specified column c2 in df2?
What do you mean? This is exactly the output you wanted in your question. If you have more columns that you don't need in your df2 (that you don't show in your question) just subset your df2 keeping only the columns you want and merge that data frame to df1 instead.
You can also subset the resulting df1 or even drop individual columns by setting them to NULL like this: df1$unwantedColumn <- NULL
I figured it out... df1 <- merge(df1, df2[c("c1","c2")], by.x = "col2", by.y = "c1")
1
> merge(df1, df2, by.x = "col2", by.y = "c1")
  col2 col1  c2
1    a    1 Jim
2    b    2 Sue
3    c    3 Bob
4    c    4 Bob

1 Comment

Only c2 is added n this answer (which matches yours except the column name of c2 is called col3 in your question. You can change that with names(result)[3] <- "col3".

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.