I currently have this as my command to cut out the URL from a line that starts with href in HTML:

sed -ne 's/.*href="\([^"]*\).*/\1/p'

Since the href value can be quoted with either ' or ", and my command only handles " right now, how can I make the command account for both ' and "?

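In sed's default (basic) regex syntax, ( and | are literal characters, so ("|') does not act as an alternation. A bracket expression does the job instead: s/.*href=["']\([^"']*\).*/\1/p matches a value quoted with either " or ' (the shell quoting around the script has to be adjusted so that ' can appear in it). Commented Jan 9, 2019 at 23:44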

1 Answer

/^(<)(.*?)(href=)("|')(.*?)(>)$/gm

Demo

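Alternation ("or") is written |, so ("|') matches either quote character.

Parentheses () group parts of the pattern, so you can build it up step by step.

This pattern is certainly not the best, but the online tool linked above might help you refine it.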
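Since the question is about sed, here is a minimal sketch of the same idea in sed's basic regex syntax, where a bracket expression ["'] stands in for the alternation. The function name extract_href and the example.com URLs are made up for illustration, and it assumes the value does not itself contain a quote character of either kind.

```shell
# Hypothetical helper: print the href value found on each input line,
# whether the attribute is delimited by " or '.
# Like the original command, the greedy leading .* means that the LAST
# href on a line is the one reported.
extract_href() {
  # The sed script is double-quoted so that ' can appear literally;
  # each " inside it is escaped as \" for the shell.
  # ["'] matches either opening quote; \([^"']*\) captures everything
  # up to the next quote of either kind.
  sed -n "s/.*href=[\"']\([^\"']*\).*/\1/p"
}

# Usage: one double-quoted and one single-quoted attribute.
printf '%s\n' \
  '<a href="https://example.com/a">A</a>' \
  "<a href='https://example.com/b'>B</a>" | extract_href
```

Lines without an href print nothing, because -n suppresses default output and p only fires when the substitution succeeds.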

5 Comments

In a one-liner delimited by ', one can write \x27 for ', as in /...("|\x27).../. There is no need for parentheses around every element (they capture); they are needed only around the part you are after, .*?, and around the alternation, ("|\x27), to delimit it. The latter can also be done with a non-capturing group, (?:"|\x27). The ?: makes the parentheses group what is between them without capturing (remembering) it.
(the comment above is meant to be constructive/helpful, etc -- not criticism!)
Yes, that looks good, except for the parentheses. They "capture", so that what is matched inside is then in $1 (the first pair), $2 (the second), etc. So capture only what is needed later. Judging by the question that's only the URL, so I'd put parentheses only around the .*? before > (which matches the URL). But we must also have them around the alternation, ("|'), to separate it from the rest. Since this isn't needed later, use (?:) (with ?: they don't "capture" but only group). So I'd say /^<.*?href=(?:\x22|\x27)(.*?)(?:\x22|\x27)>$/gm and after this the URL is available in $1.
[continued...] I've added the pattern for the closing quote as well. So what is captured (.*?) is really only the URL. Without that the (.*?) itself would capture the closing quote as well.
Right -- one thing is inefficiency, but also (for me) the trouble with unneeded captures is that later it's harder to keep track of which $N I need! If I actually need only one thing, and capture only that one, then I know it's $1 :)
