
I am attempting to subset a data frame by removing rows containing certain character patterns, which are stored in a vector. My issue is that only the last pattern of the vector is removed from my data frame. How can I make my loop work iteratively, so that all patterns stored in the vector are removed from my data frame?

Mock input:

df<-data.frame(organism=c("human_longname","cat_longname","bird_longname","virus_longname","bat_longname","pangolian_longname"),size=c(6,4,2,1,3,5))
df
            organism size
1     human_longname    6
2       cat_longname    4
3      bird_longname    2
4     virus_longname    1
5       bat_longname    3
6 pangolian_longname    5

Code used and output:

vectors<-c("bat","virus","pangolian")
for(i in vectors){df_1<-df[!grepl(i,df$organism),]}
df_1
        organism size
1 human_longname    6
2   cat_longname    4
3  bird_longname    2
4 virus_longname    1
5   bat_longname    3

Expected output:

df_1
        organism size
1 human_longname    6
2   cat_longname    4
3  bird_longname    2
  • Try this: df[!df$organism %in% c("bat","virus","pangolian"),] Commented Aug 3, 2020 at 15:36
  • ... or: subset(df, !organism %in% vectors) Commented Aug 3, 2020 at 15:40
  • Thanks a lot guys. I prefer these elegant solutions. However, I am still curious to learn how this could have been achieved with the for loop :) Commented Aug 3, 2020 at 15:51
  • Sorry guys. I forgot to mention that I HAVE to subset by pattern, as my strings in the data frame are very long. I will change the mock data to better show this. But thanks a lot for your answers. Commented Aug 3, 2020 at 16:16
  • @RikkiFranklinFrederiksen I have updated the solution to work with the new data. Please check if you like. Commented Aug 3, 2020 at 16:45

1 Answer


For the original mock data, where the organism names matched the patterns exactly, you can try this:

df[!df$organism %in% c("bat","virus","pangolian"),]

  organism size
1    human    6
2      cat    4
3     bird    2
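Note that %in% only tests for exact, whole-string equality, so this works for the original mock data but not for the updated data in the question, where every name carries a _longname suffix. A minimal check, reusing the question's updated mock data:

```r
# Updated mock data from the question
df <- data.frame(
  organism = c("human_longname", "cat_longname", "bird_longname",
               "virus_longname", "bat_longname", "pangolian_longname"),
  size = c(6, 4, 2, 1, 3, 5)
)
vectors <- c("bat", "virus", "pangolian")

# %in% compares whole strings, so "bat" never equals "bat_longname"
exact <- df[!df$organism %in% vectors, ]
nrow(exact)  # 6: no row is removed
```

This is why a pattern-matching function such as grepl() is needed once the stored names are longer than the patterns.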

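Update: Based on the new data, here is an approach using grepl(). Because grepl() is vectorized, collapsing all the patterns into a single regular expression separated by | lets one call check every row, so no loop is needed: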

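#Patterns to remove
vectors<-c("bat","virus","pangolian")
#Combine them into one regular expression: "bat|virus|pangolian"
vectors2 <- paste0(vectors,collapse = '|')
#grepl() is vectorized, so one call checks every row without a loop
df[!grepl(pattern = vectors2,df$organism),]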

        organism size
1 human_longname    6
2   cat_longname    4
3  bird_longname    2
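One caveat with pasting the patterns together into a single regular expression: a pattern containing regex metacharacters (for example . or +) would be read as regex syntax rather than literal text. If the patterns are meant literally, one possible alternative is to test each pattern separately with grepl(..., fixed = TRUE) and combine the per-pattern results with Reduce(). This is a sketch assuming plain substring matching is what's wanted:

```r
# Updated mock data from the question
df <- data.frame(
  organism = c("human_longname", "cat_longname", "bird_longname",
               "virus_longname", "bat_longname", "pangolian_longname"),
  size = c(6, 4, 2, 1, 3, 5)
)
vectors <- c("bat", "virus", "pangolian")

# One logical vector per pattern; fixed = TRUE matches each pattern literally
matches <- lapply(vectors, grepl, x = df$organism, fixed = TRUE)

# Element-wise OR across patterns: TRUE where any pattern occurs in the name
any_match <- Reduce(`|`, matches)

# Keep only the rows where no pattern matched
df_1 <- df[!any_match, ]
df_1
```

For this data the result is the same three rows (human, cat and bird); the difference only matters when a pattern contains characters such as ., +, ( or |.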

Also, just out of curiosity, here is a (probably not optimal) loop that does the same task by collecting the indices of the rows to keep and then building a new data frame:

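#Create an empty index of rows to keep
index <- c()
#Loop over every row of df
for(i in seq_len(nrow(df)))
{
  #Keep the row only if none of the patterns match
  if(!grepl(vectors2, df$organism[i]))
  {
    index <- c(index, i)
  }
}
#Build the new data frame once, after the loop
ndf <- df[index,]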

ndf

        organism size
1 human_longname    6
2   cat_longname    4
3  bird_longname    2
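For comparison, the loop from the question only needs one change. As written, df_1 is recomputed from the full df on every pass, so each iteration throws away the previous removals and only the last pattern ("pangolian") ends up removed. A minimal sketch of the corrected loop, where each iteration filters the result of the previous one:

```r
# Updated mock data from the question
df <- data.frame(
  organism = c("human_longname", "cat_longname", "bird_longname",
               "virus_longname", "bat_longname", "pangolian_longname"),
  size = c(6, 4, 2, 1, 3, 5)
)
vectors <- c("bat", "virus", "pangolian")

# Start from the full data frame and narrow it down one pattern at a time
df_1 <- df
for (i in vectors) {
  # Filter the current df_1 (not the original df) so earlier removals persist;
  # drop = FALSE keeps a data frame even if only one column were present
  df_1 <- df_1[!grepl(i, df_1$organism), , drop = FALSE]
}
df_1
```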

5 Comments

It works! Thank you so much Duck. It seems like such an easy task, but I have been struggling with it for hours trying with for loops and apply() functions to no avail.
@RikkiFranklinFrederiksen Great!! Many R functions, like grepl(), are vectorized, so you can avoid loops :)
@RikkiFranklinFrederiksen Just out of curiosity, I have added a way to do the task with a loop. It may not be as efficient as the other solutions, but you can check it.
Thanks for showing me, Duck. Your first solution is definitely easier for me to grasp :).
@RikkiFranklinFrederiksen Totally agree with you :)
