I know regular expressions can be used, but I have not been able to find the right one. Also, are there any built-in functions that do this?

  • Could you explain a little more clearly what you want? Exactly what do you want to match and extract? Can you give some example input strings that you want to match & not match, and what output you're after? Commented Feb 28, 2012 at 6:30
  • Not without some extra restrictions, they can't. Modern *nix paths can contain any character except \0. Meaning, you can have a path that looks like a sentence if you want. The only thing you can really go by, for absolute pathnames, is a leading slash. Commented Feb 28, 2012 at 6:30
  • "bmake: stopped in /bb/cc/xx/yy/zz/aa". From this I just want to extract /bb/cc/xx/yy/zz/aa. Commented Feb 28, 2012 at 6:35
  • In this case I'd suggest matching bmake: stopped in instead, and then whatever follows it. Like /stopped in (.*)/. Commented Feb 28, 2012 at 6:52
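The suggestion in the last comment can be sketched as follows. This is only an illustration: the sub name extract_stopped_dir is hypothetical, and it assumes the message always contains the literal text "stopped in " followed by the directory and nothing else.

```perl
use strict;
use warnings;

# Hypothetical helper: return whatever follows "stopped in " on the line,
# or undef when the line does not contain that text.
sub extract_stopped_dir {
    my ($line) = @_;
    return $line =~ /stopped in (.*)/ ? $1 : undef;
}

# Sample line taken from the comment above.
my $dir = extract_stopped_dir('bmake: stopped in /bb/cc/xx/yy/zz/aa');
print defined $dir ? "$dir\n" : "no match\n";
```

Anchoring on the fixed message text avoids having to decide what a path may contain, at the cost of only working for this one message format.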

3 Answers

4

I'd suggest using the core File::Spec module instead. If you just need to check whether a given path is absolute, use file_name_is_absolute(); if you need to convert a relative path to an absolute one, use rel2abs(); and so on. It's easier and far more readable.
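A minimal sketch of those two calls, assuming a Unix-like system and made-up sample paths. Note that File::Spec works on a path you already have; it does not pull a path out of surrounding text.

```perl
use strict;
use warnings;
use File::Spec;

# Sample paths chosen for illustration only.
for my $path ('/bb/cc/xx', 'cc/xx') {
    if (File::Spec->file_name_is_absolute($path)) {
        print "$path is absolute\n";
    }
    else {
        # The optional second argument is the base directory; without it,
        # rel2abs() resolves against the current working directory.
        my $abs = File::Spec->rel2abs($path, '/bb');
        print "$path is relative; resolved against /bb: $abs\n";
    }
}
```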


2

Given:

bmake: stopped in /bb/cc/xx/yy/zz/aa

This regex will pull the pathname:

m%\s(/.*)%

It looks for a white space character followed by a slash followed by anything. If you don't have white space in your path names, then you can use the more restrictive:

m%\s(/\S*)%

If you're sure you'll always have one or more path components to the names, you can add more restrictions:

m%\s(/\S+/\S*)%

And so it goes on. The more you know about what can be valid in the path, the better your chances of matching only the file name. But note that a file name on Unix can contain any character except / (because it is the delimiter between sections of the path name) and \0, the NUL byte. Everything else - newlines, tabs, controls, etc - is fair game and could be part of a file name. Mercifully, most of them usually aren't present in file names.

Note that relative pathnames are even harder than absolute path names.
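To compare the three patterns above side by side, here is a small sketch. The helper name first_path and the second sample line (with trailing text after the path) are assumptions added for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first capture of $re in $line, or undef.
sub first_path {
    my ($line, $re) = @_;
    return $line =~ $re ? $1 : undef;
}

# The three patterns from this answer, least to most restrictive.
my @patterns = (
    [ 'slash, then anything'           => qr%\s(/.*)%       ],
    [ 'no white space in the path'     => qr%\s(/\S*)%      ],
    [ 'at least one directory + slash' => qr%\s(/\S+/\S*)%  ],
);

for my $line ('bmake: stopped in /bb/cc/xx/yy/zz/aa',
              'bmake: stopped in /bb/cc (made-up trailing text)') {
    print "$line\n";
    for my $p (@patterns) {
        my ($label, $re) = @$p;
        my $found = first_path($line, $re);
        printf "  %-32s %s\n", $label, defined $found ? $found : '(no match)';
    }
}
```

On the first line all three patterns capture the same path; on the second, the first pattern also swallows the trailing text, which is why the more restrictive forms are worth using when you know your paths contain no white space.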


1

Use the pattern /(.+/)*.* (written with a delimiter other than /, since the pattern itself contains unescaped slashes). It reads as: a slash at the start, then zero or more directories, and optionally a file or directory name at the end. In practice this matches anything that starts with a slash, which is acceptable because a Unix path can contain any character except \0.

2 Comments

Unmatched [ in regex; marked by <-- HERE in m/[ <-- HERE / at test.pl line 26. This is what I get. Is the syntax wrong?
@karthikA Your delimiter is /, so you can't use unescaped slashes in your regex. Use another delimiter, e.g. m#/(.+/)*.*#.
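The delimiter point can be illustrated with a short sketch. The variable names and the leading ^ anchor are additions for this example, not part of the answer's pattern; the anchor just makes the "starts with a slash" requirement explicit.

```perl
use strict;
use warnings;

# With # as the delimiter, slashes inside the pattern need no escaping.
my $with_hash = qr#^/(.+/)*.*#;

# With the default / delimiter, every literal slash must be escaped.
my $with_slash = qr/^\/(.+\/)*.*/;

# Sample inputs chosen for illustration: one absolute, one relative.
for my $path ('/bb/cc/xx/yy/zz/aa', 'bb/cc') {
    my $a = $path =~ $with_hash  ? 'match' : 'no match';
    my $b = $path =~ $with_slash ? 'match' : 'no match';
    print "$path: hash delimiter = $a, slash delimiter = $b\n";
}
```

Both forms compile to the same pattern; the alternative delimiter is simply easier to read when the pattern is full of slashes.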
