replacing values in selected columns of a dataframe using RegExp

Question

Assume I have the following data:

mydata <- c("10 stack", " 10 stack and x", "10 stack / dd", " 10 stackxx")
R> mydata
[1] "10 stack"        " 10 stack and x" "10 stack / dd"   " 10 stackxx"

What I want to do is replace any word beginning with 10 stack[anything] with another word, without removing the rest of the string. I also want to replace the slash (/) with "and" or a comma. The desired output is:

[1] " new"
[2] " new and x" 
[3] " new  and dd"   
[4] " new"

My code is:

mydata[mydata == "10 stack"] <- "new"  # I can replace one exact value this way, but I need a faster operation.
mydata[mydata == "/"] <- "and"         # for replacing the slash with "and"

I found another method that can solve the problem:

mydata <- as.data.frame(sapply(mydata, gsub, pattern = "/", replacement = ","))

Yes, any word beginning with 10. I want to replace it with "new", but if there is another word in the string I want to keep it.
– john, Commented Jun 9, 2016 at 14:31
Do you want to replace 10 with new, or 10 stack[anything] with new?
– rock321987, Commented Jun 9, 2016 at 14:33

Steven Beaupré · Accepted Answer · 2016-06-09 14:44:40Z

3

Try

library(stringi) 
stri_replace_all_regex(mydata, c("10 stack", "\\/"), c("new", "and"), vectorize_all=FALSE)

Which gives:

#[1] "new"        " new and x" "new and dd" " newxx"

As mentioned by @rock321987 in the comments, if you want to replace 10 stack[anything], you could use the pattern \\b10 stack[^\\s]* instead:

stri_replace_all_regex(mydata, c("\\b10 stack[^\\s]*", "\\/"), c("new", "and"), 
                       vectorize_all=FALSE)

Which gives:

#[1] "new"        " new and x" "new and dd" " new"
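
For readers who prefer not to depend on stringi, the same two-step replacement can be sketched in base R. This is only an illustration under the assumption that the data looks like the sample vector from the question; the names step1 and result are made up here and are not part of the original answer.

```r
# Sample vector from the question
mydata <- c("10 stack", " 10 stack and x", "10 stack / dd", " 10 stackxx")

# Step 1: replace "10 stack" plus any trailing non-space characters with "new".
# \\b is a word boundary and \\S* matches zero or more non-whitespace characters.
step1 <- gsub("\\b10 stack\\S*", "new", mydata, perl = TRUE)

# Step 2: replace the literal slash with "and" (fixed = TRUE disables regex parsing)
result <- gsub("/", "and", step1, fixed = TRUE)

print(result)
```

Chaining two gsub() calls mirrors what vectorize_all=FALSE does in stri_replace_all_regex: each pattern is applied in turn to the whole vector.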

edited Jun 9, 2016 at 14:44

answered Jun 9, 2016 at 14:28


7 Comments

john Over a year ago

This is a fast answer, but what if I want to replace 10 stack[anything]? In other words, any string that begins with 10.

rock321987 Over a year ago

@Maged stri_replace_all_regex(mydata, c("\\b10 stack[^\\s]*", "\\/"), c("new", "and"), vectorize_all=FALSE)

Steven Beaupré Over a year ago

This doesn't make much sense to me: "10 stack[anything]" is very different from "any string that begins with 10". Do you mean "10" and the word following it? All words following "10" (in which case you would also replace "and x")? All words following "10" that are not "/", "and" or ","?

john Over a year ago

Sorry if it was not clear from the beginning. In the data I have, the word stack could be stackx, stackk, or anything like that. So I want to replace 10 stack[anything], but keep the rest of the string.

Steven Beaupré Over a year ago

@Maged: The metacharacter \\b is an anchor like the caret and the dollar sign. It matches at a position that is called a "word boundary". This match is zero-length. regular-expressions.info/wordboundaries.html
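
To make the difference between the two anchors concrete, here is a small sketch in base R. The vector x and the variable names are hypothetical and chosen only for illustration; the point is that ^ requires the match at the very start of the string, while \\b only requires the start of a word.

```r
# Hypothetical test strings: no leading space, a leading space, and "10" inside a longer number
x <- c("10 stack", " 10 stack", "210 stack")

# "^10" matches only when the string itself starts with "10"
starts_with_10 <- grepl("^10", x)

# "\\b10" matches "10" at the start of any word, wherever that word sits in the string
word_starts_with_10 <- grepl("\\b10", x, perl = TRUE)

print(data.frame(x, starts_with_10, word_starts_with_10))
```

If the real data contains leading spaces, as two of the sample strings in the question do, a pattern anchored with ^ would skip those entries, whereas \\b still finds them.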


zielinskipp · Accepted Answer · 2016-06-09 14:29:08Z

2

You need to use the sub() function, which matches a pattern and substitutes the first match in each string with the replacement.

sub("10 stack", " new", mydata)

answered Jun 9, 2016 at 14:29


4 Comments

john Over a year ago

It solves the problem partially. I need to replace 10 stack[anything], not just 10 stack. In other words, any string that begins with "10".

zielinskipp Over a year ago

then try sub("^10", "new", mydata)

john Over a year ago

It should work, but for some reason the data is not replaced.

zielinskipp Over a year ago

After your explanation in the other thread, I think it may look like this: sub("^10 stack[a-zA-Z]*", " new", mydata)
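
Since the title asks about selected columns of a data frame, here is a minimal sketch of how the replacement might be restricted to chosen columns. The data frame df, its column names, the cols vector and the helper clean() are all hypothetical. The sketch uses the word-boundary pattern from the other answer rather than the ^ anchor, so that entries with leading spaces are handled as well.

```r
# Hypothetical data frame; the column names are invented for illustration
df <- data.frame(
  id    = 1:4,
  item  = c("10 stack", " 10 stack and x", "10 stack / dd", " 10 stackxx"),
  other = c("10 stack", "keep me", "a / b", "10 stackzz"),
  stringsAsFactors = FALSE
)

# Only the columns listed here are modified
cols <- c("item")

# Helper (hypothetical): apply both replacements to a character vector
clean <- function(x) {
  x <- gsub("\\b10 stack\\S*", "new", x, perl = TRUE)
  gsub("/", "and", x, fixed = TRUE)
}

# lapply() returns one cleaned vector per selected column,
# which is assigned back into the same columns
df[cols] <- lapply(df[cols], clean)

print(df)
```

Assigning through df[cols] leaves the other columns and their types untouched, which as.data.frame(sapply(...)) over the whole data frame would not guarantee.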
