Extract strings matching the same pattern from a text using R

Question

I'm working with text in R and have a question.

From this source text

#Pray4Manchester# I hope that #ArianaGrande# will be better soon.

I want to extract Pray4Manchester and ArianaGrande using the pattern #.+#, but when I run

str_extract_all(text,pattern="#.+#")

I get

#Pray4Manchester# I hope that #ArianaGrande#

How can I solve this? Thanks.

It doesn't work because the character # also matches the pattern .+, and this (I guess) causes str_extract to look greedily for the widest match. You will need a pattern that does not itself include #, such as the one suggested by akrun.
– M. Prokhorov, Commented Jul 10, 2017 at 13:42
You need to use the non-greedy modifier: str_extract_all(text, pattern="#.+?#")
– emilliman5, Commented Jul 10, 2017 at 13:51
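
To make the two comments above concrete, here is a minimal sketch comparing the greedy pattern from the question, the non-greedy variant, and a negated character class. It assumes the stringr package is installed; the variable names (greedy, lazy, negated, tags) are only illustrative.

```r
library(stringr)

text <- "#Pray4Manchester# I hope that #ArianaGrande# will be better soon."

# Greedy: .+ runs to the last # in the string, so there is a single wide match
greedy <- str_extract_all(text, "#.+#")[[1]]

# Non-greedy: .+? stops at the first # that can close the match
lazy <- str_extract_all(text, "#.+?#")[[1]]

# Negated class: [^#]+ can never cross a #, so greediness does not matter
negated <- str_extract_all(text, "#[^#]+#")[[1]]

# Drop the surrounding # characters from the matches
tags <- str_remove_all(lazy, "#")
```

Both the non-greedy and the negated-class patterns still return the # delimiters, which is why the last step strips them.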

Community · Accepted Answer · 2020-06-20 09:12:55Z

2

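We can use a lookbehind and a lookahead so that only the word characters between two # characters are matched, without the hashes themselves:

library(stringr)
text <- "#Pray4Manchester# I hope that #ArianaGrande# will be better soon."
str_extract_all(text, "(?<=#)\\w*(?=#)")[[1]]
#[1] "Pray4Manchester" "ArianaGrande"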
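
If stringr is not available, the same lookaround pattern should also work in base R, provided perl = TRUE is set so that lookbehind and lookahead are supported. This is a sketch of an equivalent, not part of the original answer; positions and hashtags_base are names made up for illustration.

```r
text <- "#Pray4Manchester# I hope that #ArianaGrande# will be better soon."

# gregexpr() locates every match; perl = TRUE enables lookaround assertions
positions <- gregexpr("(?<=#)\\w*(?=#)", text, perl = TRUE)

# regmatches() extracts the matched substrings; [[1]] selects the single input string
hashtags_base <- regmatches(text, positions)[[1]]
```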

answered Jul 10, 2017 at 13:33

akrun

msanford · Accepted Answer · 2017-07-10 14:14:59Z

1

You could use a regex that matches text between two hashes that does not contain a whitespace character.

Something like this: ([#]{1}[^\s]+[#]{1})
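
In R, the backslash in this pattern has to be doubled inside a string literal. A minimal sketch, assuming stringr and the sample text from the question; because the pattern keeps the # delimiters in each match, a second step strips them. The names with_hashes and without_hashes are illustrative only.

```r
library(stringr)

text <- "#Pray4Manchester# I hope that #ArianaGrande# will be better soon."

# Same pattern as above, with \s written as \\s for the R string literal
with_hashes <- str_extract_all(text, "([#]{1}[^\\s]+[#]{1})")[[1]]

# Remove the leading and trailing # from each match
without_hashes <- str_remove_all(with_hashes, "^#|#$")
```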

answered Jul 10, 2017 at 13:44

BoogieMan2718
