3

I need to extract the base URL (scheme and host) from a given string. The only solution I have is:

_url="http://download.enlightenment.org/rel/apps/econnman/econnman-1.1.tar.gz"
_l=${_url%.*/*}        #  http://download.enlightenment
_l=${#_l}              #  29
_url=${_url:0:${_l}+4} #  http://download.enlightenment.org

But this fails for any TLD that is not exactly three characters long, and it feels like the wrong approach anyway.

Thanks a lot.

1
  • bash only? Can you use awk or sed? Commented Nov 6, 2016 at 21:17

3 Answers

6

You can use grep:

 $ echo "$_url" | grep -Eo '^http[s]?://[^/]+'
 http://download.enlightenment.org

You can use expr with a regex:

$ expr "$_url" : '\(http://[^/]*\)'
http://download.enlightenment.org

Or, use awk:

echo "$_url" | awk -F/ 'BEGIN{OFS=FS} {print $1 OFS OFS $3}'
http://download.enlightenment.org

You can use cut:

echo "$_url" | cut -d/ -f1-3
http://download.enlightenment.org

cut is probably the easiest to get the rest of the url as well:

$ echo "$_url" | cut -d/ -f4-
rel/apps/econnman/econnman-1.1.tar.gz
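Because cut processes its input line by line, the same command handles a whole list of URLs in one call. A minimal sketch, assuming a hypothetical two-line input (the second URL is made up for illustration):

```bash
#!/usr/bin/env bash
# Hypothetical input: one URL per line (the second one is an invented example).
urls=$(printf '%s\n' \
  'http://download.enlightenment.org/rel/apps/econnman/econnman-1.1.tar.gz' \
  'https://example.com/a/b')

# Fields 1-3, split on '/', are "scheme:", "" and "host"; cut rejoins them with '/'.
servers=$(printf '%s\n' "$urls" | cut -d/ -f1-3)
printf '%s\n' "$servers"
```

This prints one scheme://host line per input line, with a single child process for the whole list.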

Or, completely internal to Bash:

 $ [[ $_url =~ ^([^:]+://[^/]+)/?(.*)$ ]] && server="${BASH_REMATCH[1]}"
 $ echo "$server"
 http://download.enlightenment.org

and "${BASH_REMATCH[2]}" has the rest of the url.
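Another option that stays inside the shell is plain parameter expansion, which is closer to what the question attempted but does not depend on the TLD length. This is only a sketch; the variable names scheme, rest, host and server are illustrative, and it assumes the input actually contains "://":

```bash
#!/usr/bin/env bash
_url="http://download.enlightenment.org/rel/apps/econnman/econnman-1.1.tar.gz"

scheme=${_url%%://*}   # drop the longest suffix starting at "://"
rest=${_url#*://}      # drop the shortest prefix ending in "://"
host=${rest%%/*}       # drop everything from the first "/" onward

server="$scheme://$host"
echo "$server"
```

If the URL has no path at all, the last expansion leaves the host unchanged, so the result is still the scheme plus host.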

3

Note: dawg's answer contains solutions that are better suited to input from a file or stdin with multiple inputs.
They incur startup cost because they involve child processes, but for collections of inputs that is well worth it, because external utilities are much more efficient at processing larger input sets.

To extract a substring from a value already contained in a shell variable, use Bash's regex-matching operator, =~, which supports extended regular expressions:

_url='http://download.enlightenment.org/rel/apps/econnman/econnman-1.1.tar.gz'
[[ $_url =~ ^https?://[^/]+ ]] && _url="${BASH_REMATCH[0]}"
echo "$_url"   # -> 'http://download.enlightenment.org'
  • ^https?://[^/]+ matches any string that starts with (^) a literal http:// or https://, followed by the longest nonempty (+) run of characters that doesn't include / ([^/]+).

  • Built-in array BASH_REMATCH contains the results of the most recent application of the =~ operator, with the first element (with index 0) containing whatever the regex matched as a whole.
    (Subsequent elements would contain what parenthesized sub-expressions (a.k.a capture groups) matched, but in this case we're not using any).
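To illustrate the capture-group elements mentioned above, here is a sketch that splits the URL into three parts. The regex is stored in a variable (re) to avoid quoting surprises inside [[ ]]; the names re, scheme, host and path are illustrative and not part of the original answer:

```bash
#!/usr/bin/env bash
_url='http://download.enlightenment.org/rel/apps/econnman/econnman-1.1.tar.gz'

# Group 1: scheme, group 2: host, group 3: optional path including the leading "/".
re='^(https?)://([^/]+)(/.*)?$'

if [[ $_url =~ $re ]]; then
  scheme=${BASH_REMATCH[1]}
  host=${BASH_REMATCH[2]}
  path=${BASH_REMATCH[3]}
fi

echo "$scheme://$host"
echo "$path"
```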

2 Comments

Works perfectly for my needs. I only changed it a little, from ^http://[^/]+ to ^https*://[^/]+, so that it handles both http and https.
@wfx: Glad to hear it; ^https?://[^/]+ is a little more robust, though - I've updated the answer.
0

I don't know if this works in Bash, but it works with the PCRE regex engine.

(?<=:\/\/)(.*)(?=\/)

Finds all text between :// and the last / (.* is greedy). Works for https://google.com/ but not for google.com/ or https://google.com. Depends on what you need.
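Bash's =~ operator uses POSIX extended regular expressions, which have no lookbehind or lookahead, so the pattern above will not work there as written. A capture group can give a similar result. The following is a rough sketch; the function name extract_host is hypothetical:

```bash
#!/usr/bin/env bash
# Print the host part of a URL; return non-zero if there is no "://".
extract_host() {
  local re='://([^/]+)'
  [[ $1 =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

extract_host 'https://google.com/'
extract_host 'https://google.com'
```

Unlike the lookahead version, this does not require a trailing /, but it still needs the :// to be present.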
