I have a data frame containing the mark (brand) and the name of many products, as follows:
mark      name
Caudalie  Caudalie Eau démaquillante 200ml
Mustela   Mustela Bébé lait hydra corps 300ml
Lierac    Lierac Phytolastil gel prévention
In many rows, the mark appears in the product name. I want to detect whether the mark appears in the product name and, if so, remove it.
Edit: I used this snippet to detect whether the mark appears in the product name:
df1$CheckMark <- Vectorize(grepl)(df1$mark, df1$name)
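For reference, the check above pairs each mark with the name on the same row. A minimal sketch of the same idea on the three sample rows, assuming the marks should be matched as literal strings (hence `fixed = TRUE`, which is not in the original call); `sample_df` is a hypothetical stand-in for `df1`:

```r
# Hypothetical stand-in for df1, built from the three sample rows
sample_df <- data.frame(
  mark = c("Caudalie", "Mustela", "Lierac"),
  name = c("Caudalie Eau démaquillante 200ml",
           "Mustela Bébé lait hydra corps 300ml",
           "Lierac Phytolastil gel prévention"),
  stringsAsFactors = FALSE
)

# Pair each mark with the name on the same row. fixed = TRUE treats the
# mark as a literal string rather than a regular expression (assumption).
sample_df$CheckMark <- unname(mapply(grepl, sample_df$mark, sample_df$name,
                                     MoreArgs = list(fixed = TRUE)))
```

Literal matching may matter if a mark contains characters such as `.` or `+`, which a regular expression would interpret specially.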
My dataframe looks like this now:
mark      name                                 CheckMark
Caudalie  Caudalie Eau démaquillante 200ml     TRUE
Mustela   Mustela Bébé lait hydra corps 300ml  TRUE
Lierac    Lierac Phytolastil gel prévention    TRUE
I want to remove the mark from the product name.
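To make the goal concrete, here is a minimal sketch of a row-wise removal on the three sample rows. It is one possible approach, not necessarily the best one: it assumes only the first literal occurrence of the mark should be dropped and that leftover whitespace can be trimmed. `sample_df` is a hypothetical stand-in for `df1`.

```r
# Hypothetical stand-in for df1, built from the three sample rows
sample_df <- data.frame(
  mark = c("Caudalie", "Mustela", "Lierac"),
  name = c("Caudalie Eau démaquillante 200ml",
           "Mustela Bébé lait hydra corps 300ml",
           "Lierac Phytolastil gel prévention"),
  stringsAsFactors = FALSE
)

# For each row, remove the first literal occurrence of that row's mark
# from that row's name, then trim the whitespace left behind.
# A name that does not contain its mark is returned unchanged.
sample_df$name <- trimws(unname(mapply(
  function(m, n) sub(m, "", n, fixed = TRUE),
  sample_df$mark, sample_df$name
)))

print(sample_df$name)
```

Because `sub` leaves a string untouched when the pattern is absent, this sketch does not need a separate check column.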
UPDATE: After many attempts, I split my big data frame into a list by mark:
list.mark.name <- split(df1, df1$mark)
Then I found this awesome combination of sapply and gsub:
listt <- sapply(1:length(list.mark.name), function(i) {
  dtfr <- list.mark.name[[i]]
  if (dtfr$CheckMark == TRUE) {
    listt[[i]] <- as.data.frame(sapply(dtfr, gsub, pattern = dtfr$mark, replacement = ""))
  } else {
    listt[[i]] <- dtfr
  }
})
I thought everything was okay, but I noticed these warnings:
Warning messages:
1: In if (dtfr$CheckMark == TRUE) { ... :
the condition has length > 1 and only the first element will be used
What is the problem?
Any help would be appreciated.
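The situation behind the warning can be reproduced in isolation. A minimal sketch, assuming a group that contains more than one row, so that its `CheckMark` column holds several values; `check` and `names_in_group` are hypothetical stand-ins for `dtfr$CheckMark` and `dtfr$name`, with made-up values. `if` expects a single TRUE or FALSE, while `ifelse` works element by element. (In R 4.2.0 and later, a condition of length greater than one is an error rather than a warning.)

```r
# Hypothetical stand-ins for one group with three rows (made-up values)
check <- c(TRUE, FALSE, TRUE)
names_in_group <- c("Mustela lait", "Autre lait", "Mustela gel")

# The condition has one value per row, not a single TRUE/FALSE
print(length(check == TRUE))

# Option 1: collapse to a single value explicitly
if (any(check)) {
  message("at least one name in this group contains the mark")
}

# Option 2: decide row by row with the vectorized ifelse()
cleaned <- ifelse(check,
                  trimws(sub("Mustela", "", names_in_group, fixed = TRUE)),
                  names_in_group)
print(cleaned)
```

Whether `any`, `all`, or a row-by-row `ifelse` is appropriate depends on what should happen to a group in which only some names contain the mark.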