Extract multiple paths from string

Question

I'm struggling to find the best way to extract multiple URL paths from a (very long) string.

Here's an example text:

miserie <- "some text /Home/123/home-name/Specs some other text http://www.example.com/Specs some other text /Home/456/home-name/Specs"

Edit: Updated example:

miserie <- "/Home/homes?query=123 qdf /Home/123/home-name/Specs , homeurl : http://www.example.com/ },{ id :1, y : 02 , p :false, url : /Home/456/home-name/Specs"

This is the outcome I want:

[1] "/Home/123/home-name/Specs"
[2] "/Home/456/home-name/Specs"

In essence, I need a solid solution that extracts all paths that start with "/Home" and end with "/Specs".

I've tried the following pattern:

pat <- ".*(/Home/.*/Specs).*"

And the following functions:

str_match_all(miserie,pat)
gsub(x=miserie, pattern=pat, replace="\\1")

The first returned this result:

[[1]]
     [,1]                                                                                                                     
[1,] "some text /Home/123/home-name/Specs some other text http://www.example.com/Specs some other text /Home/456/home-name/Specs"
     [,2]                       
[1,] "/Home/456/home-name/Specs"

And the second only returned the last URL:

[1] "/Home/456/home-name/Specs"

Any suggestions?

Do you only want paths starting with /Home and ending in /Specs? Or might you also want to capture other types of paths?
– Tim Biegeleisen, Commented Feb 9, 2020 at 13:59
It is now very unclear what you are trying to achieve here. I suggest editing your question to show clear input along with the output you expect. You have not done this (and, by the way, your recent edit to the question invalidated the answers already given below).
– Tim Biegeleisen, Commented Feb 9, 2020 at 15:13
Hello Tim, very sorry for the confusion. In essence, I need a solid solution that extracts all paths that start with "/Home" and end with "/Specs" from a lengthy string. I've made this update in the original post as well. Your help so far has been truly appreciated.
– BroQ, Commented Feb 9, 2020 at 15:21

Tim Biegeleisen · Accepted Answer · 2020-02-09 15:32:27Z

3

We can try using gregexpr and regmatches with the following regex pattern:

(?<!\\S)/Home(/[^/\\s]+)*/Specs

Sample script:

miserie <- "some text /Home/123/home-name/Specs some other text http://www.example.com/Specs some other text /Home/456/home-name/Specs"
regmatches(miserie, gregexpr("(?<!\\S)/Home(/[^/\\s]+)*/Specs", miserie, perl=TRUE))

[[1]]
[1] "/Home/123/home-name/Specs" "/Home/456/home-name/Specs"

Here is an explanation of the regex pattern being used:

(?<!\\S)       assert that what precedes is either whitespace or
               the start of the string
/Home          match /Home
(/[^/\\s]+)*   match zero or more path components: a slash followed
               by one or more non-slash, non-whitespace characters
/Specs         match the closing /Specs
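
As a quick check, the same pattern can be run against the updated example string from the question. This is only a sketch in base R: the variable names pat and paths are introduced here for illustration, and the string is copied from the question's edit.

```r
# Updated example string from the question
miserie <- "/Home/homes?query=123 qdf /Home/123/home-name/Specs , homeurl : http://www.example.com/ },{ id :1, y : 02 , p :false, url : /Home/456/home-name/Specs"

# Same pattern as above; perl = TRUE is required for the lookbehind (?<!\\S)
pat <- "(?<!\\S)/Home(/[^/\\s]+)*/Specs"

# gregexpr() locates every match, regmatches() extracts the matched text,
# and [[1]] unwraps the result for the single input string
paths <- regmatches(miserie, gregexpr(pat, miserie, perl = TRUE))[[1]]
print(paths)
```

The leading "/Home/homes?query=123" is skipped because it is never followed by "/Specs" before the next whitespace, so only the two wanted paths should be returned.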

edited Feb 9, 2020 at 15:32

answered Feb 9, 2020 at 13:50


3 Comments

BroQ Over a year ago

Thanks -- while it works on quite a few strings, it's not giving me the results for my very large string. Take for example: "example.com/Specs/tests/?test=true / sfdqdf /some /heres-some-other-text text /Home/123/home-name/Specs some other text some other text /Home/456/home-name/Specs"

Tim Biegeleisen Over a year ago

@BroQ I have updated my answer based on what my latest understanding of your question is.

BroQ Over a year ago

Thanks Tim -- that works! For the full text, I found that the combination of your answer with Ronak's returned the exact result I needed: str_match_all(urls_text,"(?<!\\S)/Home(/[^/\\s]+)*/Specs")[[1]][,1] -- not sure why that gives a different result though..

Ronak Shah · Accepted Answer · 2020-02-09 13:50:38Z

2

You could use:

stringr::str_match_all(miserie,".*?(/Home/.*?/Specs).*?")[[1]][,2]
#[1] "/Home/123/home-name/Specs" "/Home/456/home-name/Specs"

Adding ? after .* makes the quantifier lazy, so it matches as few characters as possible instead of as many as possible.
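
To illustrate the difference, here is a minimal sketch in base R that compares a greedy and a lazy version of the pattern. The helper extract_all is a hypothetical name introduced for this example, and the surrounding .*? parts are dropped because regmatches returns only the matched text. The last call also shows a limitation of the lazy pattern, which appears to be what the updated example in the question runs into: .*? may cross whitespace, so a match can start at an unrelated /Home/ earlier in the string.

```r
# Original example string from the question
miserie <- "some text /Home/123/home-name/Specs some other text http://www.example.com/Specs some other text /Home/456/home-name/Specs"

# Hypothetical helper: return every match of `pat` in the single string `x`
extract_all <- function(pat, x) {
  regmatches(x, gregexpr(pat, x, perl = TRUE))[[1]]
}

# Greedy: .* runs as far as it can, up to the last "/Specs" in the string
greedy <- extract_all("/Home/.*/Specs", miserie)

# Lazy: .*? stops at the first "/Specs" it can reach
lazy <- extract_all("/Home/.*?/Specs", miserie)

print(length(greedy))  # one long match spanning both paths
print(lazy)            # the two separate paths

# Limitation: on the updated example, the lazy match starts at
# "/Home/homes?query=123" and runs across whitespace to the first "/Specs"
updated <- "/Home/homes?query=123 qdf /Home/123/home-name/Specs , homeurl : http://www.example.com/ },{ id :1, y : 02 , p :false, url : /Home/456/home-name/Specs"
lazy_updated <- extract_all("/Home/.*?/Specs", updated)
print(lazy_updated)
```

Restricting what may appear between /Home and /Specs (for example, excluding whitespace) is one way to avoid the over-long first match.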

answered Feb 9, 2020 at 13:50


2 Comments

BroQ Over a year ago

Thanks. Worked on this piece of string, however it doesn't seem to be working with a string like "/Home/homes?query=123 qdf /Home/123/home-name/Specs , homeurl : example.com },{ id :1, y : 02 , p :false, url : /Home/456/home-name/Specs"

Ronak Shah Over a year ago

@BroQ The formatting is not clear in the comments. Please update your post with relevant example.
