0

Assume I have a character vector (for example, a column of a dataframe):

mydata <- c("10 stack", " 10 stack and x", "10 stack / dd", " 10 stackxx")
R> mydata
[1] "10 stack"        " 10 stack and x" "10 stack / dd"   " 10 stackxx"

What I want to do is replace any string beginning with 10 stack[anything] with another word, without removing the rest of the string. I also want to replace the slash with "and" or a comma. The desired output is:

[1] " new"
[2] " new and x" 
[3] " new  and dd"   
[4] " new"

My code is:

mydata[mydata == "10 stack"] <- "new"  # this replaces one exact value, but I need a faster, more general operation
mydata[mydata == "/"] <- "and"         # intended to replace the slash with "and"

I found another method that can solve part of the problem:

mydata <- as.data.frame(sapply(mydata, gsub, pattern = "/", replacement = ","))
4 Comments
  • Do you want to replace 10 stackxx with new? Commented Jun 9, 2016 at 14:27
  • Yes, any word beginning with 10. I want to replace it with "new", but if there is another word in the dataframe I want to keep it. Commented Jun 9, 2016 at 14:31
  • Do you want to replace 10 with new, or 10 stack[anything] with new? Commented Jun 9, 2016 at 14:33
  • 10 stack[anything] with new. Commented Jun 9, 2016 at 14:35

2 Answers

3

Try

library(stringi) 
stri_replace_all_regex(mydata, c("10 stack", "\\/"), c("new", "and"), vectorize_all=FALSE)

Which gives:

#[1] "new"        " new and x" "new and dd" " newxx"  

As mentioned by @rock321987 in the comments, if you want to replace 10 stack[anything], you could use the pattern \\b10 stack[^\\s]* instead:

stri_replace_all_regex(mydata, c("\\b10 stack[^\\s]*", "\\/"), c("new", "and"), 
                       vectorize_all=FALSE)

Which gives:

#[1] "new"        " new and x" "new and dd" " new"  
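
For readers without stringi installed, the same two sequential replacements can be sketched in base R. This is only an illustrative equivalent, not part of the original answer; it assumes the sample vector from the question and uses perl = TRUE so that \\b and \\S are handled by the PCRE engine (\\S is equivalent to [^\\s]).

```r
# Base R sketch of the two sequential replacements (no extra packages needed).
mydata <- c("10 stack", " 10 stack and x", "10 stack / dd", " 10 stackxx")

# Step 1: replace "10 stack" plus any trailing non-space characters with "new".
step1 <- gsub("\\b10 stack\\S*", "new", mydata, perl = TRUE)

# Step 2: replace the forward slash with "and" (fixed = TRUE: literal match, no regex).
result <- gsub("/", "and", step1, fixed = TRUE)

print(result)
```

Applying the patterns one after another mirrors what vectorize_all=FALSE does in the stringi call: every pattern is applied to every element in turn.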

7 Comments

This is a fast answer, but what if I want to replace 10 stack[anything]? In other words, any string that begins with 10.
@Maged stri_replace_all_regex(mydata, c("\\b10 stack[^\\s]*", "\\/"), c("new", "and"), vectorize_all=FALSE)
This doesn't make much sense to me: "10 stack[anything]" is very different from "any string that begins with 10". Do you mean "10" and the word following it? All words following "10" (in which case you would also replace "and x")? All words following "10" that are not "\" or "and" or ","...
Sorry if this was not clear from the beginning. In the data I have, the word stack could be stackx, stackk, or anything like that. So I want to replace 10 stack[anything], but keep the rest of the dataframe.
@Maged: The metacharacter \\b is an anchor like the caret and the dollar sign. It matches at a position that is called a "word boundary". This match is zero-length. regular-expressions.info/wordboundaries.html
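
To illustrate the word-boundary comment above, here is a small sketch comparing the pattern with and without \\b. The value "210 stack" is a hypothetical input that does not appear in the question; it is included only to show a case where the boundary matters.

```r
# Hypothetical inputs: "210 stack" is made up to show the effect of \\b.
x <- c("10 stack", "210 stack", " 10 stackxx")

# Without \\b the pattern can match in the middle of "210 stack".
no_boundary <- sub("10 stack\\S*", "new", x, perl = TRUE)

# With \\b the "1" must sit at a word boundary, so "210 stack" is left unchanged.
with_boundary <- sub("\\b10 stack\\S*", "new", x, perl = TRUE)

print(no_boundary)
print(with_boundary)
```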
2

You need to use the sub() function, which matches a pattern and substitutes the first match in each string with a replacement.

sub("10 stack", " new", mydata)

4 Comments

It solves the problem partially. I need to replace 10 stack[anything], not just 10 stack. In other words, any string that begins with "10".
Then try sub("^10", "new", mydata)
It should work, but for some reason the data isn't replaced.
After your explanation in the other thread, I think it may look like this: sub("^10 stack[a-zA-Z]*", " new", mydata)
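
One possible reason the anchored pattern appears not to work, judging only from the sample data in the question: some elements start with a space, and ^ matches only at the very start of the string. The sketch below assumes the question's vector and shows one way to tolerate leading whitespace; note that this variant also consumes that whitespace.

```r
mydata <- c("10 stack", " 10 stack and x", "10 stack / dd", " 10 stackxx")

# "^" anchors at the first character, so elements with a leading space do not match.
anchored <- sub("^10 stack[a-zA-Z]*", "new", mydata)

# Allowing optional leading whitespace lets every element match;
# the leading whitespace is part of the match and is replaced too.
tolerant <- sub("^\\s*10 stack[a-zA-Z]*", "new", mydata, perl = TRUE)

print(anchored)
print(tolerant)
```

If the leading space should be preserved, the \\b version from the other answer is the simpler choice.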
