
I need to extract a URL that is wrapped in <strong> tags. It's a simple regular expression, but I don't know how to do that in a shell script. Here is an example:

line="<strong>http://www.example.com/index.php</strong>"
url=$(echo $line | sed -n '/strong>(http:\/\/.+)<\/strong/p')

I need "http://www.example.com/index.php" in the $url variable.

I'm using BusyBox.


4 Answers


This might work:

url=$(echo "$line" | sed -r 's/<strong>([^<]+)<\/strong>/\1/')
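One caveat with this form, sketched below under the assumption of a `sed` that accepts `-r` (GNU and BusyBox `sed` do): without `-n`, `sed` prints every input line whether or not the substitution matched, so a line with no `<strong>` tags lands in `$url` unchanged. The sample input and the name `url_checked` are illustrative only.

```shell
line="no tags here"

# Without -n, sed prints every line, substituted or not,
# so the raw input ends up in $url when there is no match.
url=$(printf '%s\n' "$line" | sed -r 's/<strong>([^<]+)<\/strong>/\1/')
printf '%s\n' "$url"

# With -n plus the p flag, sed prints only lines where s/// succeeded,
# so url_checked stays empty when there is nothing to extract.
url_checked=$(printf '%s\n' "$line" | sed -n -r 's/<strong>([^<]+)<\/strong>/\1/p')
printf '[%s]\n' "$url_checked"
```

Adding `-n` and `p` makes "no match" easy to detect with `[ -z "$url_checked" ]`.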

url=$(echo "$line" | sed -n 's!<strong>\(http://[^<]*\)</strong>!\1!p')
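This BRE version uses `!` as the delimiter, so the slashes in `http://` need no escaping. It replaces only the tagged part, though; if the tag sits inside a longer line, the surrounding text survives. A minimal sketch, assuming a hypothetical input line with text on both sides, showing that adding `.*` at each end makes the whole line collapse to the captured URL:

```shell
line='Visit <strong>http://www.example.com/index.php</strong> today'

# Without .* on each side, only the tagged span is replaced.
partial=$(printf '%s\n' "$line" | sed -n 's!<strong>\(http://[^<]*\)</strong>!\1!p')
printf '%s\n' "$partial"

# With .* on each side, the entire line is replaced by the capture group.
url=$(printf '%s\n' "$line" | sed -n 's!.*<strong>\(http://[^<]*\)</strong>.*!\1!p')
printf '%s\n' "$url"
```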


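You don't have to escape forward slashes if you give `sed` a different delimiter: `/` only needs a backslash when it is also the delimiter, as in `s/.../.../` (regex metacharacters and backslashes still need escaping). You should also avoid a greedy `.+`, which grabs more than you want when the HTML source contains several strong tags. Non-greedy `+?` is a Perl/PCRE extension that POSIX `sed`, including BusyBox `sed`, does not support, so use a negated character class that cannot cross a `<` instead (ERE syntax, e.g. with `sed -r`):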

strong>(http://[^<]+)</strong
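To see the over-matching problem concretely, here is a small sketch using POSIX `sed`, which has no non-greedy operator. The two-tag input line and the variable names `greedy` and `first` are made up for illustration. A greedy `.*` inside the capture runs to the last `</strong>`, while `[^<]*` stops at the first one:

```shell
line='<strong>http://a.example/</strong> and <strong>http://b.example/</strong>'

# Greedy .* inside the group runs on to the LAST </strong> on the line.
greedy=$(printf '%s\n' "$line" | sed -n 's|<strong>\(http://.*\)</strong>.*|\1|p')
printf '%s\n' "$greedy"

# [^<]* cannot cross a "<", so the group ends at the FIRST </strong>.
first=$(printf '%s\n' "$line" | sed -n 's|<strong>\(http://[^<]*\)</strong>.*|\1|p')
printf '%s\n' "$first"
```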


Update: since BusyBox uses ash, the bash-specific solution below likely won't work. Something only a little longer, but still POSIX-compliant, will work:

url=${line#<strong>}  # $line minus the initial "<strong>"
url=${url%</strong>}  # Remove the trailing "</strong>"
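Note that `${var#pattern}` and `${var%pattern}` silently return the value unchanged when the pattern does not match, so input without the tags passes straight through. A minimal POSIX sketch that checks for both tags first with `case`; the function name `strip_strong` is hypothetical:

```shell
# Hypothetical helper: print the text between a leading <strong> and a
# trailing </strong>, or return 1 if either tag is missing.
strip_strong() {
    s=$1
    case $s in
        '<strong>'*'</strong>') ;;   # both tags present: carry on
        *) return 1 ;;               # otherwise signal failure
    esac
    s=${s#'<strong>'}    # strip the leading tag
    s=${s%'</strong>'}   # strip the trailing tag
    printf '%s\n' "$s"
}

line="<strong>http://www.example.com/index.php</strong>"
url=$(strip_strong "$line") || url=""
printf '%s\n' "$url"
```

The tags are quoted in the `case` pattern because an unquoted `<` there would be parsed as a redirection operator.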

If you are using bash (or another shell with similar features), you can combine extended pattern matching with parameter substitution. (I don't know what features busybox supports.)

# Turn on extended pattern support
shopt -s extglob

# ?(\/) matches an optional forward slash; like /? in a regex
# Expand $line, but remove all occurrences of <strong> or </strong>
# from the expansion
url=${line//<?(\/)strong>}
