4

I want to extract the URL from a string with a shell/bash script. If the string contains more than one URL, only the first one should be returned.

I have provided some examples of input and output strings below. I'm guessing I would need some regex, but I'm not too familiar with how to do this in bash/shell.

Input: Take a look at this site: http://www.google.com/ and you'll find your answer
Output: http://www.google.com/


Input: http://www.google.com
Output: http://www.google.com


Input: Check out http://www.bing.com and http://www.google.com
Output: http://www.bing.com


Input: Grettings, visit <http://www.mywebsite.com> today!
Output: http://www.mywebsite.com
  • Is there anything you have done to try to solve this problem? We will be more willing to answer your question if you tell us what you have tried so far. (Helpful links for asking better questions: How to Ask, FAQ) Commented May 11, 2013 at 23:38
  • Now that I've thought about it, I do agree. But I tried searching for it on Google and found no precise answers. I'm not too familiar with either bash or regex, so it's not the greatest combo. But I should have researched more beforehand. Commented May 11, 2013 at 23:55

2 Answers

9

Try this:

grep -Eo 'http://[^ >]+' yourFile|head -1 

For example:

kent$  echo "Check out http://www.bing.com and http://www.google.com"|grep -Eo 'http://[^ >]+'|head -1 
http://www.bing.com
kent$  echo "Grettings, visit <http://www.mywebsite.com> today"|grep -Eo 'http://[^ >]+'|head -1 
http://www.mywebsite.com

5 Comments

Thanks, this seems to work like a charm! I'm going to try to talk myself through this one, because I want to know why it works. So you run grep, which by default searches each line and returns the whole line where it found a match. Since the -o flag is enabled, only the matched part is returned instead of the full line. You use -E to make grep use extended regex and behave like egrep. Any particular reason you didn't just use egrep? 'http:// means that the match has to start with http://. But I don't fully understand the [^ >]+ part. | head -1 makes it so only the first match is returned.
I read some more, and [^ ] signifies that it will match any character that is not within the brackets. Since > and a space are inside the [], the pattern matching will "stop" when it encounters either a space or the > character in the string. The + after this ensures that it's repeated until it hits either a space or >. Have I understood this correctly?
-E is not required if you replace + with \+ (with GNU grep). [^ >]+ means any character that is not a space or > (one or more times). If there is a <tab> immediately after the URL, you may want to add a tab to the bracket expression, or, if your grep supports -P, use -P 'http://[^\s>]+'. Also, you could change the pattern to https?://... because there could be an https:// URL.
Thanks! That means I understood that part correctly (but I was terrible at explaining it to myself, haha. Your explanation was very good). I'll probably not encounter tabs since the text is printed from irssi, but thanks for the note. Thanks for the https tip. I was about to do something like http|https, but of course yours is much simpler. Thank you very much for your solution and explanation. I'll make sure I understand whatever I make use of before I use it!
Glad to help, @user1015149. At SO, not every OP is like you. Many of them don't want to learn how to fish; they just want the fish, and preferably well cooked. So the next time they get hungry, they come and ask the same thing, no matter whether the question is easy or difficult. You did it the right way. I must +1 your comment: "I'll make sure I understand whatever I make use of before I use it!"
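
Pulling together the suggestions from the comments above, here is a minimal sketch that also accepts https:// and stops at any whitespace (including tabs) or at >. The function name first_url is made up for illustration, and the sketch assumes a grep that supports -o (GNU and BSD grep both do):

```shell
# Hypothetical helper (name is an assumption, not from the answer):
# print the first http:// or https:// URL found on standard input.
first_url() {
  # -E: extended regex, so ? and + work without backslashes
  # -o: print only the matched part, one match per output line
  # [^[:space:]>]+ : one or more characters that are neither whitespace nor >
  grep -Eo 'https?://[^[:space:]>]+' | head -n 1
}

printf '%s\n' 'Check out http://www.bing.com and https://www.google.com' | first_url
printf '%s\n' 'Grettings, visit <https://www.mywebsite.com> today!' | first_url
```

The POSIX class [[:space:]] is used here instead of \s so that -P is not needed.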
0

Use the grep command, for example:

cat yourinput.txt | grep "your_regex_here"

2 Comments

The question is about the "your_regex_here" part. Also, the cat is not necessary.
"cat is not necessary", i.e., you can pass the filename as the last argument to grep.
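
As a rough sketch of what these two comments describe, the filename can be passed straight to grep, with the regex from the other answer filling in for "your_regex_here". The temporary file and the variable names input_file and first are assumptions made only for this demonstration:

```shell
# Hypothetical input file, created only so the example is self-contained.
input_file=$(mktemp)
printf '%s\n' 'Take a look at this site: http://www.google.com/ and you will find your answer' > "$input_file"

# grep reads the file named as its last argument, so no cat is needed.
first=$(grep -Eo 'http://[^ >]+' "$input_file" | head -n 1)
printf '%s\n' "$first"

# Clean up the demonstration file.
rm -f "$input_file"
```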
