I have a dataframe containing the mark and the name of many products as follows:

    mark      name
    Caudalie  Caudalie Eau démaquillante 200ml
    Mustela   Mustela Bébé lait hydra corps 300ml
    Lierac    Lierac Phytolastil gel prévention 

In many rows, the mark appears in the product name. I want to detect whether the mark appears in the product name and, if so, remove it.

Edit: I used this sample of code to detect if the mark exists in the product name:

   df1$CheckMark <- Vectorize(grepl)(df1$mark, df1$name)

My dataframe looks like this now:

    mark      name                                CheckMark
    Caudalie  Caudalie Eau démaquillante 200ml    TRUE
    Mustela   Mustela Bébé lait hydra corps 300ml TRUE
    Lierac    Lierac Phytolastil gel prévention   TRUE

I want to remove the mark from the product name.

UPDATE: After many attempts, I split my big dataframe into a list by mark:

    list.mark.name=split( df1 , df1$mark )

And I found this combination of sapply and gsub:

    listt<-sapply(1:length(list.mark.name), function(i)
    {
      dtfr<-list.mark.name[[i]]
      if(dtfr$CheckMark==TRUE)
      {listt[[i]]<-as.data.frame(sapply(dtfr,gsub,pattern=dtfr$mark,replacement=""))}
      else
      {listt[[i]]<-dtfr}
    })

I thought everything was okay, but I noticed these warnings:

     Warning messages:
     1: In if (dtfr$CheckMark == TRUE) { ... :
      the condition has length > 1 and only the first element will be used

What is the problem?

Any help would be appreciated.

  • Can you elaborate on what you've tried already, i.e. post some of the code? Commented Jan 18, 2016 at 11:58
  • In the updated example, there is no mark? Commented Jan 18, 2016 at 12:05
  • Actually, yes. I used mark as an example. Commented Jan 18, 2016 at 12:07
  • This is probably some type of a dupe of this or this Commented Jan 18, 2016 at 12:14

1 Answer

If we need to subset the rows by removing those whose "name" element starts with 'mark', we can use grepl:

    df1[!grepl('^mark', df1$name),]

The ^ signifies the start of the string.

NOTE: The subtract part in the title is not clear.

Update

Based on the updated dataset, if we want to keep only the rows whose 'name' does not contain any of the 'mark' elements, we can paste the 'mark' elements together with "|" as the separator, use grepl to get a logical index, and then subset with [:

    df1[!grepl(paste(df1$mark, collapse="|"), df1$name),]
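To make the difference between "any mark" and "this row's own mark" concrete, here is a minimal sketch in base R. The sample rows are hypothetical (the second name mentions the first row's mark, not its own), and `any_mark` and `own_mark` are illustrative names that do not come from the question.

```r
# Hypothetical sample: row 2's name mentions row 1's mark, not its own.
df1 <- data.frame(
  mark = c("Caudalie", "Mustela", "Lierac"),
  name = c("Caudalie Eau 200ml",
           "Coffret Caudalie 300ml",
           "Phytolastil gel"),
  stringsAsFactors = FALSE
)

# Single alternation pattern: TRUE if ANY mark occurs in the name.
any_mark <- grepl(paste(df1$mark, collapse = "|"), df1$name)

# Row-wise matching: TRUE only if the row's OWN mark occurs in its name.
# fixed = TRUE treats each mark as a literal string, not a regex.
own_mark <- unname(mapply(grepl, df1$mark, df1$name,
                          MoreArgs = list(fixed = TRUE)))
```

The two indices disagree on the second row, so which one to use depends on whether a mark from a different row should count as a match.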

Or, if the idea is to subset rows by comparing each 'name' with the 'mark' on the same row, stri_detect_fixed from stringi is an option.

    library(stringi)
    df1[!stri_detect_fixed(df1$name, df1$mark),]
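On the warning in the question's update: `if` expects a single TRUE or FALSE, but after `split()` each `dtfr` can hold several rows, so `dtfr$CheckMark` has one value per row and only the first is used (R 4.2.0 and later raise an error here instead of a warning). A minimal sketch of an element-wise alternative follows, assuming a hypothetical two-row group for one mark. It uses `ifelse()` and `sub()` instead of the `if`/`gsub` combination from the question.

```r
# Hypothetical group as produced by split(df1, df1$mark): two rows, one mark.
dtfr <- data.frame(
  mark = c("Lierac", "Lierac"),
  name = c("Lierac Phytolastil gel", "Phytolastil solution"),
  stringsAsFactors = FALSE
)
dtfr$CheckMark <- unname(mapply(grepl, dtfr$mark, dtfr$name,
                                MoreArgs = list(fixed = TRUE)))

# dtfr$CheckMark has length 2, so `if (dtfr$CheckMark == TRUE)` is ambiguous.
# ifelse() evaluates the test element by element instead.
dtfr$name <- ifelse(dtfr$CheckMark,
                    trimws(sub(dtfr$mark[1], "", dtfr$name, fixed = TRUE)),
                    dtfr$name)
```

Only the first row has its mark stripped; the second is returned unchanged because its `CheckMark` is FALSE.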

11 Comments

@user5779182 Check if the update helps.
df1[!grepl(paste(df1$mark, collapse="|"), df1$name),] will also remove a row where there is a mark-name from a different row - not sure if this is desired
@docendodiscimus As the OP didn't show the expected result, I added both options. The stringi option works on each row.
@akrun The grepl part is okay: I used df1$CheckMark <- Vectorize(grepl)(df1$mark, df1$name). Now I want to remove the mark from the product name. Any idea?
@akrun Replacing sapply with mapply resolved the problem: df1=as.data.frame(mapply(gsub,df1$mark,"",df1$name)). Thank you for your time and effort.
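The mapply approach from the last comment can be sketched as follows, with a few changes that are assumptions and not part of the original comment: the result is assigned back to `df1$name` so the `mark` column is kept, `sub()` with `fixed = TRUE` removes only the first literal occurrence, and `trimws()` drops the leftover leading space. The sample data is hypothetical.

```r
# Hypothetical sample; the third name does not contain its mark.
df1 <- data.frame(
  mark = c("Caudalie", "Mustela", "Lierac"),
  name = c("Caudalie Eau 200ml",
           "Mustela lait hydra corps 300ml",
           "Phytolastil gel"),
  stringsAsFactors = FALSE
)

# mapply() pairs each mark with the name on the same row:
# sub(pattern = mark, replacement = "", x = name, fixed = TRUE)
stripped <- mapply(sub, df1$mark, "", df1$name,
                   MoreArgs = list(fixed = TRUE))

# Drop the names mapply() adds and trim the space left behind.
df1$name <- trimws(unname(stripped))
```

Rows whose name does not contain the mark pass through unchanged, so no separate CheckMark test is needed before the replacement.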