
I'm working with text in R and have a question about regular expressions.

From this source text

#Pray4Manchester# I hope that #ArianaGrande# will be better soon.

I want to extract Pray4Manchester and ArianaGrande using the pattern #.+#, but when I run

str_extract_all(text,pattern="#.+#")

I get

#Pray4Manchester# I hope that #ArianaGrande#

How can I fix this? Thanks.

  • It doesn't work because . also matches the character #, and + is greedy by default, so str_extract_all returns the widest possible match. You need a pattern whose middle part cannot match #, such as the one suggested by akrun. Commented Jul 10, 2017 at 13:42
  • You need to use the non-greedy quantifier: str_extract_all(text, pattern = "#.+?#") Commented Jul 10, 2017 at 13:51
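
To make the two comments concrete, here is a minimal sketch comparing the greedy pattern, the non-greedy one, and a negated character class. It assumes the stringr package is installed; the variable names are only illustrative, and the final str_remove_all step is an addition to strip the hashes, which both fixes leave in place.

```r
library(stringr)

text <- "#Pray4Manchester# I hope that #ArianaGrande# will be better soon."

# Greedy: .+ runs to the last # in the string, so there is a single wide match.
greedy <- str_extract_all(text, "#.+#")[[1]]

# Lazy: .+? stops at the first # that can close the match.
lazy <- str_extract_all(text, "#.+?#")[[1]]

# Negated class: [^#]+ cannot cross a #, so greediness no longer matters.
negated <- str_extract_all(text, "#[^#]+#")[[1]]

# Both fixes keep the surrounding hashes; remove them if only the tag is wanted.
tags <- str_remove_all(lazy, "#")
print(tags)
```

The lazy and negated-class patterns return the same two matches here. The negated class is often the easier one to reason about, because it states directly that a tag may not contain #.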

2 Answers

We can use lookarounds, which require the surrounding # characters without including them in the match:

str_extract_all(text, "(?<=#)\\w*(?=#)")[[1]]
#[1] "Pray4Manchester" "ArianaGrande"   

data

text <- "#Pray4Manchester# I hope that #ArianaGrande# will be better soon."
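
As an alternative to lookarounds, a capture group can do the same job. This is only a sketch and not part of the original answer; it assumes stringr is installed and that a tag never contains a # itself.

```r
library(stringr)

text <- "#Pray4Manchester# I hope that #ArianaGrande# will be better soon."

# str_match_all() returns one matrix per input string:
# column 1 holds the full match, column 2 the first capture group.
m <- str_match_all(text, "#([^#]+)#")[[1]]
tags <- m[, 2]
print(tags)
```

Because the hashes are consumed by each match here, the text between the second and third # cannot be picked up as a tag, even if the inner pattern were loosened to allow spaces.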

You could use a regex that matches text between two hashes, where that text contains no whitespace characters.

Something like this: ([#]{1}[^\s]+[#]{1})
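
A sketch of how that pattern might be written in R, assuming stringr is installed. The backslash has to be doubled inside an R string, and the matches keep their hashes. The packed string is a hypothetical input added to show a limitation: when tags are not separated by whitespace, excluding # from the class as well gives tighter matches.

```r
library(stringr)

text <- "#Pray4Manchester# I hope that #ArianaGrande# will be better soon."

# The backslash is doubled because it sits inside an R string literal.
with_hashes <- str_extract_all(text, "#[^\\s]+#")[[1]]

# Hypothetical input with no whitespace between the tags.
packed <- "#a#,#b#"

# [^\\s]+ may cross a #, so the whole string comes back as one match.
loose <- str_extract_all(packed, "#[^\\s]+#")[[1]]

# Excluding # as well keeps each tag separate.
strict <- str_extract_all(packed, "#[^\\s#]+#")[[1]]
```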
