Subsetting rows in a dataframe based on rows in another dataframe in R

Question

Just a snapshot of the data that we are working with

What I want to do is identify the blocks (BLOCKID) in which class 5 makes up more than 90% of the area and then remove all of those blocks from the dataset. I started by subsetting the data with subset(NLCD2008, Class == 5 & Percent > .90), which gave me a data frame whose BLOCKID column lists the blocks that should be removed, as seen below:

    > dput(ids)
structure(list(BLOCKID = c(100, 131, 179, 200, 222, 236, 238, 
241, 244, 254, 257, 258, 265, 266, 27, 278, 57, 63, 69, 75, 81
), Class = c("5", "5", "5", "5", "5", "5", "5", "5", "5", "5", 
"5", "5", "5", "5", "5", "5", "5", "5", "5", "5", "5"), CA = c(22983987.0806, 
24692082.1724, 23533460.3724, 23401233.5635, 24116398.1926, 23766711.1699, 
24795140.5362, 24876914.4067, 24898552.2795, 24985030.0734, 25012822.6465, 
24993341.0278, 25041230.4987, 25049166.7966, 22372955.0846, 24737206.1697, 
24104160.9584, 24922870.2331, 24943920.0281, 24162534.823, 23096329.0313
), TLA = c(25018769.0617, 25057087.1604, 25149935.9177, 25176830.9298, 
25207224.138, 24802986.7321, 24852905.0566, 24883383.5601, 24898641.1381, 
24985030.0734, 25012822.6465, 25049866.3254, 25090169.5911, 25072609.4832, 
24830593.7725, 25144460.7117, 24935516.21, 24930068.7064, 24947519.2647, 
24961803.5077, 24974601.3436), MSI = c(1.69665962298056, 1.31048429936865, 
1.33110171648693, 1.36242160001161, 1.27666751812728, 1.22789953816493, 
1.26867391259833, 1.25128851571841, 1.18533526393745, 1.18792224187668, 
1.18520978795299, 1.39406482047182, 1.24884906769663, 1.24939571303602, 
1.31731564029142, 1.59900472213938, 1.38890295951441, 1.20315890311899, 
1.18325402703837, 1.27998393051198, 1.47485350719615), Percent = c(0.918669780432366, 
0.985433063880751, 0.935726454707888, 0.929474945784445, 0.956725661682217, 
0.958219726785611, 0.997675743730222, 0.99974002115169, 0.999996431186766, 
1, 1, 0.997743489052367, 0.998049471438513, 0.999065008107126, 
0.901023764859709, 0.983803409161585, 0.96665979382185, 0.999711253370988, 
0.999855727675293, 0.967980331050461, 0.92479270093409)), row.names = c(NA, 
-21L), class = c("tbl_df", "tbl", "data.frame"))

What I would like to do from here is take the 21 unique block ids in this subset and remove them from the original data. The subset identified blocks 27, 57, 63, ... as unsuitable, and I would like to take that list and remove those blocks from the original data.

Devin Mendez · Accepted Answer · 2020-04-06 16:34:58Z

You can try this, which drops the rows where Class is 5 and Percent exceeds 0.90:

NLCD2008[ !(with(NLCD2008, Class==5 & Percent > .90)), ]

The same row filter using subset():

# drop the rows where class 5 makes up more than 90% of the block
subset(NLCD2008, !(Class == 5 & Percent > .90))

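To remove every row belonging to those blocks, not only their class 5 rows, collect the block ids first:

# get the rows (and so the ids) of blocks where class 5 exceeds 90%
ids <- subset(NLCD2008, Class == 5 & Percent > 0.9)
# drop every row of the original data whose BLOCKID is in that set
NLCD2008[!(NLCD2008$BLOCKID %in% unique(ids$BLOCKID)), ]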
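To see how the row-level filter and the id-based removal differ, here is a minimal sketch on made-up data. The `toy` data frame and its values are hypothetical stand-ins for NLCD2008, and it assumes the real data holds one row per block and class, as the dput output suggests:

```r
# Hypothetical stand-in for NLCD2008: one row per block and class.
toy <- data.frame(
  BLOCKID = c(1, 1, 2, 2, 3, 3),
  Class   = c("5", "7", "5", "7", "5", "7"),
  Percent = c(0.95, 0.05, 0.40, 0.60, 0.99, 0.01)
)

# Row-level filter: drops only the class 5 rows above 90%.
row_filtered <- subset(toy, !(Class == 5 & Percent > 0.90))

# Block-level filter: collect the offending block ids, then drop every
# row that belongs to one of them.
bad_ids <- unique(toy$BLOCKID[toy$Class == 5 & toy$Percent > 0.90])
block_filtered <- toy[!(toy$BLOCKID %in% bad_ids), ]

unique(row_filtered$BLOCKID)    # blocks 1 and 3 still appear via their class 7 rows
unique(block_filtered$BLOCKID)  # only block 2 remains
```

If the goal is to discard whole blocks, the `%in%` version is the one that does it; the row-level filter leaves the other classes of those blocks in place.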

edited Apr 6, 2020 at 16:34

Devin Mendez

answered Apr 5, 2020 at 17:02

Sathish
