
I want to remove rows from a dataframe when the value in the second column exactly matches a given character string:

Input:

   >data

      habitat       species
         wet species1_ind1
         wet species1_ind1
         dry species2_ind1
         dry species2_ind1
         dry species3_ind1
         dry species3_ind1
         ...

Desired output (with the rows containing species2_ind1 removed):

    >new_data

      habitat       species
         wet species1_ind1
         wet species1_ind1
         dry species3_ind1
         dry species3_ind1
         ...

Ideally I'd like to supply a list of character strings to remove from the dataframe.

1 Answer


You can do this with %in%:

data[!(data$species %in% c("species2_ind1")), ]
  habitat       species
1     wet species1_ind1
2     wet species1_ind1
5     dry species3_ind1
6     dry species3_ind1

Details: This selects the rows where species is not in the list. A data frame has both rows and columns. When you write data[x, y], x gives the rows and y gives the columns, so data[x, ] means that you have specified the rows with x but take all columns. The expression above takes all columns and specifies the rows as !(data$species %in% c("species2_ind1")).
data$species %in% c("species2_ind1") gives TRUE for those rows where the value of data$species is in the list. Those are the rows we want to exclude, so we use ! to negate the logical expression and get the rows where data$species is not in the list. To remove several strings at once, put them all inside c().
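
To make the steps concrete, here is a minimal sketch that rebuilds only the six rows shown in the question and removes two strings at once. The names `to_remove` and `is_match` are my own, chosen for illustration, and the second string in `to_remove` is just an example of supplying more than one value.

```r
# Rebuild the six rows shown in the question (the "..." rows are unknown, so they are left out)
data <- data.frame(
  habitat = c("wet", "wet", "dry", "dry", "dry", "dry"),
  species = c("species1_ind1", "species1_ind1", "species2_ind1",
              "species2_ind1", "species3_ind1", "species3_ind1"),
  stringsAsFactors = FALSE
)

# Illustrative vector of exact strings to drop (the name is an assumption, not from the question)
to_remove <- c("species2_ind1", "species3_ind1")

# TRUE for every row whose species exactly equals one of the strings in to_remove
is_match <- data$species %in% to_remove
print(is_match)

# Negate the logical vector to keep the other rows; the empty slot after the comma keeps all columns
new_data <- data[!is_match, ]
print(new_data)
```

Because %in% compares whole values, a string such as "species2" would not match "species2_ind1"; only exact matches are dropped.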


4 Comments

What is the purpose of the exclamation point and comma after c("species2_ind1"))?
Will add to answer.
When you delete rows 3:4, can the output then go from 1:4 instead of 1:2 and 5:6, skipping the deleted rows?
The row numbers will be as above, but if you want them numbered consecutively after deleting rows 3 and 4, you can use row.names(data) <- 1:nrow(data)
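
As a small sketch of that renumbering, assuming the filtered result is stored in a variable called new_data as in the question's desired output, assigning NULL to the row names is another way to get back to 1, 2, 3, ...:

```r
# Rebuild the six rows shown in the question
data <- data.frame(
  habitat = c("wet", "wet", "dry", "dry", "dry", "dry"),
  species = c("species1_ind1", "species1_ind1", "species2_ind1",
              "species2_ind1", "species3_ind1", "species3_ind1"),
  stringsAsFactors = FALSE
)

# Filtering keeps the original row names, so rows 3 and 4 leave a gap
new_data <- data[!(data$species %in% c("species2_ind1")), ]
before <- rownames(new_data)   # "1" "2" "5" "6"

# Setting the row names to NULL resets them to the default consecutive numbering
rownames(new_data) <- NULL
after <- rownames(new_data)    # "1" "2" "3" "4"
```

This does the same job as assigning 1:nrow(new_data) and does not require knowing the number of rows.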
