I have two data sets that are supposed to be the same size but aren't. I need to trim the values from A that are not in B and vice versa in order to eliminate noise from a graph that's going into a report. (Don't worry, this data isn't being permanently deleted!)
I have read the following:
- Selecting columns in R data frame based on those *not* in a vector
- http://www.ats.ucla.edu/stat/r/faq/subset_R.htm
- How to combine multiple conditions to subset a data-frame using "OR"?
But I'm still not able to get this to work right. Here's my code:
bg2011missingFromBeg <- setdiff(x=eg2011$ID, y=bg2011$ID)
#attempt 1
eg2011cleaned <- subset(eg2011, ID != bg2011missingFromBeg)
#attempt 2
eg2011cleaned <- eg2011[!eg2011$ID %in% bg2011missingFromBeg]
The first try just eliminates the first value in the resulting setdiff vector. The second try yields an unwieldy error:
Error in `[.data.frame`(eg2012, !eg2012$ID %in% bg2012missingFromBeg)
: undefined columns selected
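For what it's worth, here is a minimal sketch of what is probably going wrong, using made-up toy data (only the names eg2011, bg2011 and ID come from the code above; the values and the score column are hypothetical). In attempt 1, != compares element by element and recycles the shorter vector instead of testing membership. In attempt 2, there is no comma inside the brackets, so R treats the logical vector as a column selector, which would explain the "undefined columns selected" error.

```r
# Toy stand-ins for the real data (values are invented for illustration)
eg2011 <- data.frame(ID = c(1, 2, 3, 4, 5), score = c(10, 20, 30, 40, 50))
bg2011 <- data.frame(ID = c(1, 3, 5, 6),    score = c(11, 31, 51, 61))

# IDs present in eg2011 but absent from bg2011
bg2011missingFromBeg <- setdiff(x = eg2011$ID, y = bg2011$ID)

# Membership test with %in%, plus a trailing comma so rows (not columns) are selected
eg2011cleaned <- eg2011[!eg2011$ID %in% bg2011missingFromBeg, ]

# Same idea without the intermediate setdiff, applied in both directions
eg2011cleaned2 <- eg2011[eg2011$ID %in% bg2011$ID, ]
bg2011cleaned  <- bg2011[bg2011$ID %in% eg2011$ID, ]
```

With subset() the equivalent would be subset(eg2011, !(ID %in% bg2011missingFromBeg)), since subset() always filters rows.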
[...] merge is appropriate here. I do not want the datasets to be combined.

merge is exactly appropriate. An inner join would give you only rows that are in both A and B. You can then subset the columns of the result if the merge added any extraneous ones.

A <- merge(A,B). And getting the correct columns is no harder than something like merge(A,B)[,colnames(A)], assuming none are duplicated. But if you really are only matching on one column, then adibender's solution is probably simpler for your purposes.

[...] .x or .y appended to their names so you can tell which of the original data frames they came from. It would require 1-2 extra lines of code, but using merge would work just fine.
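The merge approach discussed in the comments can be sketched roughly as follows. This is only an illustration under some assumptions: the data frames A and B and their columns a_val and b_val are invented, ID is taken to be the only shared column, and IDs are unique within each data frame (with repeated IDs, merge would return one row per matching pair).

```r
# Hypothetical example data; ID is the only column the two frames share
A <- data.frame(ID = c(1, 2, 3, 4), a_val = c("p", "q", "r", "s"))
B <- data.frame(ID = c(2, 4, 5),    b_val = c(TRUE, FALSE, TRUE))

# Inner join on ID: only rows whose ID occurs in both A and B are kept
both <- merge(A, B, by = "ID")

# Recover each original layout by selecting that frame's own columns
A_trimmed <- both[, colnames(A)]
B_trimmed <- both[, colnames(B)]
```

If A and B shared other column names besides ID, merge would rename those columns with .x and .y suffixes, so selecting by colnames(A) would need an extra renaming step.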