
I have been trying to do this but not getting anywhere. Any help will be very much appreciated.

df1 <- data.frame(chrom = "chr1", start=c(10,20,30), end = c(100,200,300), stringsAsFactors=FALSE)
df2 <- data.frame(chrom = c("chr1", "chr2", "chr3"),start=c(15,500,150), end = c(75,1000,300), stringsAsFactors=FALSE)

I want to get all rows of df2 where df1$chrom == df2$chrom. Or better yet: I want to generate the output in a new vector and display the rows of df1 followed by df2 or vice versa where df1$chrom == df2$chrom.

I have tried this using a for loop as follows:

for(i in 1:nrow(df2)){
    x[i] <- df2[which(df1$chrom == df2$chrom[i])]
}

Not working!

  • What is it you're trying to accomplish in doing this comparison between data frames? There may just be an easier solution to your work flow than the approach you're taking--i.e., if you only want a vector out of a data frame, are you going to require many such vectors? A new data frame? What is the end-goal? That context is important to the questions you ask. Commented Apr 9, 2012 at 20:17

1 Answer

Is this what you want?

df2[df2$chrom == df1$chrom, ]
#   chrom start end
# 1  chr1    15  75
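One caveat on the comparison above: `==` compares the two columns position by position (recycling the shorter one), so it gives the intended rows here only because both frames happen to have three rows and df1$chrom is all "chr1". A minimal sketch of the difference, where df3 is a hypothetical frame added purely for illustration:

```r
df1 <- data.frame(chrom = "chr1", start = c(10, 20, 30), end = c(100, 200, 300),
                  stringsAsFactors = FALSE)
df2 <- data.frame(chrom = c("chr1", "chr2", "chr3"), start = c(15, 500, 150),
                  end = c(75, 1000, 300), stringsAsFactors = FALSE)

# %in% asks, for each row of df2, whether its chrom occurs anywhere in df1$chrom.
matched <- df2[df2$chrom %in% df1$chrom, ]
print(matched)

# Hypothetical frame (not in the question) with a different length and order.
df3 <- data.frame(chrom = c("chr2", "chr1"), stringsAsFactors = FALSE)

# == lines the vectors up by position and recycles df3$chrom, with a warning,
# so no position happens to match even though chr1 and chr2 are both present.
by_position <- suppressWarnings(df2$chrom == df3$chrom)

# %in% ignores position and length, so chr1 and chr2 are both found.
by_membership <- df2$chrom %in% df3$chrom
```

For row selection based on "does this value appear in the other frame", `%in%` is usually the safer choice.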

Per your comment, you might also want to try the following.

merge(df1, df2, by = 'chrom')

This will do a database "join" on the two frames ("tables"). The result is this.

  chrom start.x end.x start.y end.y
1  chr1      10   100      15    75
2  chr1      20   200      15    75
3  chr1      30   300      15    75

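It isn't always an efficient approach to take in R, but it is convenient. You can control the ".x" and ".y" suffixes with the suffixes parameter (see the help page: ?merge). If you want all the rows from df2 included, even those with no match in df1, you could add the "all = TRUE" parameter setting to merge; this keeps unmatched rows from both frames, while "all.y = TRUE" keeps only the unmatched rows of df2.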
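As a rough sketch of those two options, using the frames from the question (the suffix strings ".df1" and ".df2" are arbitrary choices for illustration):

```r
df1 <- data.frame(chrom = "chr1", start = c(10, 20, 30), end = c(100, 200, 300),
                  stringsAsFactors = FALSE)
df2 <- data.frame(chrom = c("chr1", "chr2", "chr3"), start = c(15, 500, 150),
                  end = c(75, 1000, 300), stringsAsFactors = FALSE)

# Replace the default ".x"/".y" suffixes with more descriptive ones.
joined <- merge(df1, df2, by = "chrom", suffixes = c(".df1", ".df2"))
print(names(joined))

# Keep every row of df2, even when its chrom has no partner in df1.
# The df1 columns (start.x, end.x) are filled with NA for those rows.
with_unmatched <- merge(df1, df2, by = "chrom", all.y = TRUE)
print(with_unmatched)
```

Since every chrom in df1 also occurs in df2, all = TRUE would give the same rows as all.y = TRUE for this particular data.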

As I alluded to before, it is best to consider the overall approach. The merge isn't necessarily an efficient way to process your data, because the resulting frame now carries a lot of redundancy. In database terms, we can instead think of df2 as a "lookup" table: the "chr1" in df1 acts as a foreign key that references information in df2, which is associated with df1 but distinct from it. Rather than repeating the information of df2 on every row, as the merge above does, we can simply look it up when required. The merge is convenient when you really do want everything side by side.
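One way to do that kind of lookup without merging is match(), which returns the row position in df2 for each key. This is only a sketch; lookup_chrom is a hypothetical helper name, not something from base R:

```r
df1 <- data.frame(chrom = "chr1", start = c(10, 20, 30), end = c(100, 200, 300),
                  stringsAsFactors = FALSE)
df2 <- data.frame(chrom = c("chr1", "chr2", "chr3"), start = c(15, 500, 150),
                  end = c(75, 1000, 300), stringsAsFactors = FALSE)

# For each row of df1, the position of its chrom in the lookup table df2
# (NA if the key is absent).
idx <- match(df1$chrom, df2$chrom)

# Hypothetical helper: fetch the df2 row(s) for a key only when needed.
lookup_chrom <- function(chrom) {
  df2[match(chrom, df2$chrom), ]
}

print(lookup_chrom("chr1"))
```

Note that match() returns only the first matching position, so this assumes each chrom appears once in df2, as it does here.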


7 Comments

Yes that is exactly the format that I want. It will be nice to have the matching rows of both data frames side by side in a new data frame with 6 columns. Actually my ultimate goal is much more complicated where the comparison will be done satisfying many conditions between the two data frames. The above condition is just one of them.
The statement you sent works very well. Thanks. I am having a hard time wrapping my head around it...but it works! Thank you so much
It's hard to get at first (particularly if you're used to another language that uses loops), but once you get it, it's pretty straightforward. If you have multiple conditions, remember that %in% and the logical operators & and | are great tools for indexing, which is the method Bryan showed (rather than an explicit loop).
Would df2[df2$chrom %in% df1$chrom, ] be more robust in this circumstance?
Thanks a bunch Tyler. It's kinda complicated to explain what I am trying to do... and it's harder because I am very new to R. I will work on it a bit more and hopefully my next post will make more sense. Anyway, thanks for taking the time.
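The indexing advice in the comments can be sketched as follows. The extra condition (the df2 interval must lie inside the df1 interval) is an assumption made up for illustration, since the real conditions were never stated:

```r
df1 <- data.frame(chrom = "chr1", start = c(10, 20, 30), end = c(100, 200, 300),
                  stringsAsFactors = FALSE)
df2 <- data.frame(chrom = c("chr1", "chr2", "chr3"), start = c(15, 500, 150),
                  end = c(75, 1000, 300), stringsAsFactors = FALSE)

# Pair up rows that share a chrom, then filter the pairs with a logical index.
pairs <- merge(df1, df2, by = "chrom")

# Assumed condition: the df2 interval (start.y, end.y) falls inside
# the df1 interval (start.x, end.x). Conditions are combined with &.
inside <- pairs$start.y >= pairs$start.x & pairs$end.y <= pairs$end.x
result <- pairs[inside, ]
print(result)
```

Further conditions can be chained onto the same logical vector with & or |.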
