
I have searched a bit already but cannot find an elegant way of doing this. I'd like to process a list like the one below and end up with a plain text output file containing only the domain names, with no http:// prefix and nothing after the next /.

So a list like this:

http://7wind.ru/file/Behind+the+dune/
http://aldersgatencsc.org/open.php?utm_source=5r2ke0ow6k&utm_medium=qqod2h9a88&utm_campaign=2d1hl1v8c5&utm_term=mz34ligqc4&utm_content=bgi71kl5oy
http://amunow.org/test.php?utm_source=5r2ke0ow6k&utm_medium=qqod2h9a88&utm_campaign=2d1hl1v8c5&utm_term=dhxg1r4l76&utm_content=tr71txtklp

I want to end up with a plain text output file like this:

7wind.ru
aldersgatencsc.org
amunow.org

6 Answers

3

Given:

$ echo "$txt"
http://7wind.ru/file/Behind+the+dune/
http://aldersgatencsc.org/open.php?utm_source=5r2ke0ow6k&utm_medium=qqod2h9a88&utm_campaign=2d1hl1v8c5&utm_term=mz34ligqc4&utm_content=bgi71kl5oy
http://amunow.org/test.php?utm_source=5r2ke0ow6k&utm_medium=qqod2h9a88&utm_campaign=2d1hl1v8c5&utm_term=dhxg1r4l76&utm_content=tr71txtklp

You can use cut:

$ echo "$txt" | cut -d'/' -f3
7wind.ru
aldersgatencsc.org
amunow.org

Or, if your content is in a file:

$ cut -d'/' -f3 file
7wind.ru
aldersgatencsc.org
amunow.org

Then redirect that to the file you want:

$ cut -d'/' -f3 file >new_file
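
For reference, the whole flow can be sketched end to end. The file names urls.txt and domains.txt are placeholders that are not from the question, the query strings are shortened, and the sort -u step is an optional addition that drops duplicate hosts.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)

# Sample input based on the question (urls.txt is a hypothetical name,
# and the query strings are shortened).
cat > "$workdir/urls.txt" <<'EOF'
http://7wind.ru/file/Behind+the+dune/
http://aldersgatencsc.org/open.php?utm_source=5r2ke0ow6k
http://amunow.org/test.php?utm_source=5r2ke0ow6k
http://amunow.org/other.php
EOF

# Splitting on "/" gives: field 1 = "http:", field 2 = "", field 3 = host.
# sort -u is an optional extra that removes repeated hosts.
cut -d'/' -f3 "$workdir/urls.txt" | sort -u > "$workdir/domains.txt"

cat "$workdir/domains.txt"
```

This assumes every line starts with a scheme such as http://. A line without one (for example example.com/path) has a different field layout, so field 3 would not be the host.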

1 Comment

Thanks, this works great and it's short and to the point. Much obliged.

1

awk -F/ '{ print $3 }' inputfile > newfile

This prints the third field of each line, using / as the field delimiter.
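
To see why the host is field 3, it can help to print every field with its index. This is only an illustrative sketch; show_fields is a made-up helper name.

```shell
# Hypothetical helper: print each "/"-separated field of one URL with its index.
show_fields() {
  printf '%s\n' "$1" |
    awk -F/ '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'
}

show_fields 'http://7wind.ru/file/Behind+the+dune/'
# 1=[http:]
# 2=[]
# 3=[7wind.ru]
# 4=[file]
# 5=[Behind+the+dune]
# 6=[]
```

Field 2 is empty because nothing sits between the two slashes of ://, which pushes the host into field 3.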

1
$ sed -E 's#^[^/]*//([^/]*).*#\1#' Input_file
7wind.ru
aldersgatencsc.org
amunow.org

0

Try the following awk commands.

Solution 1:

awk '{sub(/.*\/\//,"");sub(/\/.*/,"");print}' Input_file

Solution 2:

awk '{match($0,/\/.[^/]*/);print substr($0,RSTART+2,RLENGTH-2)}'   Input_file

0

This works by stripping the protocol and :// first, then anything after and including the next slash.

sed "s|.*://||; s|/.*||" url-list.txt

Add -i to edit the file in place (GNU sed; BSD/macOS sed needs -i '' instead).
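
The two substitutions can be watched one at a time on a single URL. This is a small sketch; the variable names url, step1 and host are chosen here for illustration, and the query string is shortened.

```shell
url='http://amunow.org/test.php?utm_source=5r2ke0ow6k'

# Step 1 only: drop everything up to and including "://".
step1=$(printf '%s\n' "$url" | sed 's|.*://||')

# Steps 1 and 2: also drop the first remaining "/" and everything after it.
host=$(printf '%s\n' "$url" | sed 's|.*://||; s|/.*||')

printf '%s\n%s\n' "$step1" "$host"
# amunow.org/test.php?utm_source=5r2ke0ow6k
# amunow.org
```

Using | as the delimiter avoids having to escape every / inside the patterns.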

0

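Try this regexp:

((http|https):\/\/)?([a-zA-Z0-9.-]+)(\/)?

Take the first match and read its third group. The character class includes digits and hyphens so that hosts such as 7wind.ru match. Be careful: the pattern does not validate anything, so it will also match invalid URLs.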
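
One way to read the third group from the command line is sed -E with a backreference. This is a sketch under assumptions: the character class is widened here to include digits and hyphens so that a host like 7wind.ru matches, a trailing .* is added so that sed discards the rest of the line, and host_from_url is a made-up helper name.

```shell
# Hypothetical helper: print capture group 3 (the host) for one URL.
# Group 1 = optional scheme with "://", group 2 = scheme name,
# group 3 = host, group 4 = optional "/" after the host.
host_from_url() {
  printf '%s\n' "$1" |
    sed -E 's#^((http|https)://)?([a-zA-Z0-9.-]+)(/)?.*#\3#'
}

host_from_url 'http://7wind.ru/file/Behind+the+dune/'
# 7wind.ru
host_from_url 'https://amunow.org'
# amunow.org
```

As with the pattern itself, this does no validation: any leading run of letters, digits, dots and hyphens is reported as the host.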
