
I'm struggling to find the best solution to extract multiple URLs from a (very long) string.

Here's an example text:

miserie <- "some text /Home/123/home-name/Specs some other text http://www.example.com/Specs some other text /Home/456/home-name/Specs"

Edit: Updated example:

miserie <- "/Home/homes?query=123 qdf /Home/123/home-name/Specs , homeurl : http://www.example.com/ },{ id :1, y : 02 , p :false, url : /Home/456/home-name/Specs"

This is the outcome I want:

[1] "/Home/123/home-name/Specs"
[2] "/Home/456/home-name/Specs"

In essence, I need a solid solution that extracts all paths that start with "/Home" and end with "/Specs".

I've tried the following pattern:

pat <- ".*(/Home/.*/Specs).*"

And the following functions:

str_match_all(miserie,pat)
gsub(x=miserie, pattern=pat, replace="\\1")

The first returned this result:

[[1]]
     [,1]
[1,] "some text /Home/123/home-name/Specs some other text http://www.example.com/Specs some other text /Home/456/home-name/Specs"
     [,2]
[1,] "/Home/456/home-name/Specs"

And the second only returned the last URL:

[1] "/Home/456/home-name/Specs"
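Both results follow from the greedy `.*` at each end of `pat`. The base R sketch below reproduces this with the first example string (the names `x`, `whole` and `greedy` are only for illustration). The leftmost match starts at the beginning of the string and the trailing `.*` runs to its end, so there is only ever one match. Inside it, the leading `.*` gives back only as much as it must, which leaves the capture group holding the last path.

```r
# First example string from the question
x <- "some text /Home/123/home-name/Specs some other text http://www.example.com/Specs some other text /Home/456/home-name/Specs"
pat <- ".*(/Home/.*/Specs).*"

# The leftmost match starts at the first character and the trailing .*
# runs to the end, so gregexpr() finds exactly one match: the whole string.
whole <- regmatches(x, gregexpr(pat, x))[[1]]
length(whole)        # 1
identical(whole, x)  # TRUE

# Within that single match the greedy leading .* consumes everything up to
# the LAST "/Home/", so the capture group holds only the last path.
greedy <- sub(pat, "\\1", x)
print(greedy)
```

This is why `str_match_all()` returns the whole string in column 1 and a single path in column 2.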

Any suggestions?

  • Do you only want paths starting with /Home and ending in /Specs? Or, might you also want to capture other types of paths? Commented Feb 9, 2020 at 13:59
  • Only starting with /Home and ending in /Specs Commented Feb 9, 2020 at 14:15
  • It is now very unclear what you are trying to achieve here. I suggest editing your question and showing clear input along with the output you expect. You have not done this (and, by the way, your recent edit to the question invalidated the answers already given below). Commented Feb 9, 2020 at 15:13
  • Hello Tim -- very sorry for the confusion. In essence, I need a solid solution that extracts all paths from a lengthy string that start with "/Home" and end with "/Specs". I've made this update in the original post as well. Your help so far has been truly appreciated. Commented Feb 9, 2020 at 15:21

2 Answers


We can try using gregexpr and regmatches with the following regex pattern:

(?<!\\S)/Home(/[^/\\s]+)*/Specs

Sample script:

miserie <- "some text /Home/123/home-name/Specs some other text http://www.example.com/Specs some other text /Home/456/home-name/Specs"
regmatches(miserie, gregexpr("(?<!\\S)/Home(/[^/\\s]+)*/Specs", miserie, perl=TRUE))

[[1]]
[1] "/Home/123/home-name/Specs" "/Home/456/home-name/Specs"

Here is an explanation of the regex pattern being used:

(?<!\\S)       assert that what precedes is either whitespace or
               the start of the string
/Home          match /Home
(/[^/\\s]+)*   match zero or more intermediate path components, each a
               slash followed by characters other than / or whitespace
/Specs         match the final /Specs component
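As a check against the updated example from the question's edit, the same pattern can be run through `gregexpr()`/`regmatches()` in a minimal base R sketch (the names `y` and `paths` are only for illustration). Because `[^/\\s]+` cannot cross whitespace, the leading `/Home/homes?query=123` fragment cannot be stretched into a match.

```r
# Updated example string from the question's edit
y <- "/Home/homes?query=123 qdf /Home/123/home-name/Specs , homeurl : http://www.example.com/ },{ id :1, y : 02 , p :false, url : /Home/456/home-name/Specs"

# The (?<!\\S) lookbehind requires PCRE, hence perl = TRUE.
# Each (/[^/\\s]+) component stops at a slash or whitespace, so a match
# stays inside one whitespace-delimited token.
paths <- regmatches(y, gregexpr("(?<!\\S)/Home(/[^/\\s]+)*/Specs", y, perl = TRUE))[[1]]
print(paths)
# [1] "/Home/123/home-name/Specs" "/Home/456/home-name/Specs"
```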

3 Comments

Thanks -- while it works on quite a few strings, it's not giving me the results for my very large string. Take for example: "example.com/Specs/tests/?test=true / sfdqdf /some /heres-some-other-text text /Home/123/home-name/Specs some other text some other text /Home/456/home-name/Specs"
@BroQ I have updated my answer based on what my latest understanding of your question is.
Thanks Tim -- that works! For the full text, I found that the combination of your answer with Ronak's returned the exact result I needed: str_match_all(urls_text,"(?<!\\S)/Home(/[^/\\s]+)*/Specs")[[1]][,1] -- not sure why that gives a different result, though.

You could use:

stringr::str_match_all(miserie,".*?(/Home/.*?/Specs).*?")[[1]][,2]
#[1] "/Home/123/home-name/Specs" "/Home/456/home-name/Specs"

Adding ? after .* makes the quantifier lazy, so each .*? matches as few characters as possible instead of as many as possible.
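Laziness alone does not stop a match from spanning spaces, which matters for the updated example in the question. The base R sketch below uses `gregexpr()` with the same lazy core `/Home/.*?/Specs` rather than `stringr::str_match_all()`. It shows the first match starting at the `/Home/homes?query=123` prefix and running on to the next `/Specs`. Replacing `.*?` with `\\S*?` is one possible tweak, added here for illustration and not part of the original answer. It keeps each match inside a single whitespace-delimited token. The names `y`, `lazy` and `tight` are hypothetical.

```r
# Updated example string from the question's edit
y <- "/Home/homes?query=123 qdf /Home/123/home-name/Specs , homeurl : http://www.example.com/ },{ id :1, y : 02 , p :false, url : /Home/456/home-name/Specs"

# Lazy .*? still matches spaces, so the first match begins at
# "/Home/homes?query=123" and stretches to the next "/Specs".
lazy <- regmatches(y, gregexpr("/Home/.*?/Specs", y, perl = TRUE))[[1]]
print(lazy)

# Restricting the middle to non-whitespace (\\S) keeps each match
# within a single token.
tight <- regmatches(y, gregexpr("/Home/\\S*?/Specs", y, perl = TRUE))[[1]]
print(tight)
```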

2 Comments

Thanks. It worked on this string; however, it doesn't seem to work with a string like "/Home/homes?query=123 qdf /Home/123/home-name/Specs , homeurl : example.com },{ id :1, y : 02 , p :false, url : /Home/456/home-name/Specs"
@BroQ The formatting is not clear in the comments. Please update your post with relevant example.
