4

I would like to extract paths in the form of:

$/Server/First Level Folder/Second_Level_Folder/My File.extension

The challenge here is that the paths are embedded in a "free form" email like so:

Hello,

 You can download the file here:
  • $/Server/First Level Folder/Second_Level_Folder/My File.extension <- Click me!

Given a string, I would like to extract all paths from it using RegEx. Is this even possible?

Thanks!

5
  • Does it always say " <- Click me!" at the end or is the end of the line sometimes different? Otherwise I think it would be impossible to distinguish the path from other text on the same line. Commented May 15, 2013 at 10:47
  • Something like this should maybe help you ... Commented May 15, 2013 at 10:51
  • @Tharwen No. That is just a sample to imply that paths may be inline with other texts. Commented May 15, 2013 at 12:32
  • @Kent VBA. I think the tool/language is not really necessary, right? Reg. expressions are tool/language agnostic, I think. Commented May 15, 2013 at 12:44
  • Yes, it matters. There are basic regex, extended regex, PCRE ... and vim has its own (powerful) regex engine ... Commented May 15, 2013 at 13:31

2 Answers 2

12

Yes, this is possible. The pattern (\$/.*?\.\S*) should do the job.

\$/ matches the start of the path

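.*? lazily matches as few characters as possible, up to the point where the next part of the regex can match

\.\S* matches the dot and then any run of non-whitespace characters, so the match stops at the first space, tab, or line break after the extension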

The ( ) around the whole pattern form a capture group, so everything that is matched is captured.
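The breakdown above can be tried out with a short script. This is a minimal sketch that assumes Python's re module rather than the VBA engine mentioned in the comments; the second path in the sample text is a hypothetical addition, included only to show that every occurrence is returned.

```python
import re

# The pattern from the answer, unchanged.
PATH_RE = re.compile(r"(\$/.*?\.\S*)")

# Sample email from the question. The "See also" line is a hypothetical
# addition so that the text contains more than one path.
email = (
    "Hello,\n"
    "\n"
    " You can download the file here:\n"
    "  • $/Server/First Level Folder/Second_Level_Folder/My File.extension <- Click me!\n"
    " See also $/Server/Docs/readme.txt for details\n"
)

# With a single capture group, findall returns that group's text per match.
paths = PATH_RE.findall(email)
for path in paths:
    print(path)
```

Because \S* runs up to the next whitespace character, punctuation written directly after a path (for example a trailing comma) would end up inside the match, so real emails may need a little post-processing.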

EDIT:

For further use:

Just the path (capture group 1, including the trailing /)

(\$/.*?/)[^/]*?\.\S*

Just the filename (capture group 1)

\$/.*?/([^/]*?\.\S*)
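The two patterns above differ only in which part is parenthesized, so both parts could be captured in one pass by keeping both pairs of parentheses. The following is a minimal sketch of that idea, assuming Python's re module (not the asker's VBA); the combined pattern and the variable names are illustrative and not part of the original answer.

```python
import re

# Both groups kept: group 1 is the directory part, group 2 the filename.
SPLIT_RE = re.compile(r"(\$/.*?/)([^/]*?\.\S*)")

# The line from the question's sample email.
line = "  • $/Server/First Level Folder/Second_Level_Folder/My File.extension <- Click me!"

match = SPLIT_RE.search(line)
if match is None:
    raise SystemExit("no path found")

directory, filename = match.groups()
print(directory)  # directory part, ending with a slash
print(filename)   # file name with its extension

# The question's follow-up wanted the directory without the trailing slash.
directory_no_slash = directory.rstrip("/")
```

The lazy .*?/ keeps growing one folder at a time until the rest of the pattern, which may not cross another /, can match a name containing a dot. That is why group 1 ends at the last slash before the filename.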


6 Comments

Thanks! It works! Just a follow-up, what if I also want to match only up to the "directory" level like so: $/Server/First Level Folder/Second_Level_Folder
Assuming there is a file after that, it is possible with (\$/.*?/)(?:[^/]*?\.\S*). Again, (\$/.*?/) matches the path, while (?:[^/]*?\.\S*) matches the filename; the ?: makes that group non-capturing. So if you want the filename and not the path, use (?:\$/.*?/)([^/]*?\.\S*).
(\$/.*?/)(?:[^/]*?\.\S*) and (?:\$/.*?/)([^/]*?\.\S*) did not work. I tried validating it against the sample string using regexpal.com
Which programming language or tool are you using? It's possible it doesn't support non-capturing groups. If that's the case, (\$/.*?/)[^/]*?\.\S* should work. (I edited the answer so it no longer uses non-capturing groups.)
I'm using VBA but testing using regexpal.com.
4

If the filename contains escaped forward slashes (\/) or has no period, AND the spaces in the file path are escaped with a backslash ('\ '), you can still do it with the following pattern (I've escaped the forward slashes and backslashes):

(\/.*?\/)((?:[^\/]|\\\/)+?)(?:(?<!\\)\s|$)


This creates two capture groups: one for the path and one for the file basename. If your test strings contain filenames with unescaped spaces (as shown in the question), then you would have to use the period in the filename as an anchor, as per B8vrede's answer.
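To see the two groups in action, here is a minimal sketch that assumes Python's re module, which supports the (?<!...) lookbehind this pattern relies on; engines without lookbehind support would reject it. The input line is a hypothetical variant of the question's sample in which every space inside the path is escaped with a backslash.

```python
import re

# The pattern from this answer. Inside a Python raw string the forward
# slashes need no escaping, so \/ is written as a plain /.
ESCAPED_RE = re.compile(r"(/.*?/)((?:[^/]|\\/)+?)(?:(?<!\\)\s|$)")

# Hypothetical input: the question's path with backslash-escaped spaces
# and no file extension.
line = r"$/Server/First\ Level\ Folder/Second_Level_Folder/My\ File <- Click me!"

match = ESCAPED_RE.search(line)
if match is None:
    raise SystemExit("no path found")

directory, basename = match.groups()
print(directory)  # starts at the first "/", so the leading "$" is not captured
print(basename)   # ends before the first whitespace not preceded by a backslash
```

The match ends at the first whitespace character that is not preceded by a backslash, which is why this approach only works when the spaces inside the path are escaped.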
