I have a text file that contains multiple URLs with full paths. I want to extract only the base URL (the host part) from each one using a regex.

The text file contains URLs like this:

www.facbook.com/abc/xyz
www.google.com.pk/q=12hsjak
www.yahoo.co.uk/avga/ajak
defence.pk/zya/bahawalpur
Pic2fly.uk/abc

I want output like this:

www.facbook.com
www.google.com.pk
www.yahoo.co.uk
defence.pk
Pic2fly.uk

Please help

I have tried this:

print(re.search(r'(https?://)?(www\.)?([^/]*)', url).group(3))
  • Did you try something? What doesn't work? Commented Apr 16, 2017 at 9:59
  • Yes, I have added it to the question. Commented Apr 16, 2017 at 10:13

2 Answers

You don't really need re for this; try os.path.split or urlparse (urllib.parse in Python 3).
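
A minimal sketch of the urlparse route, assuming Python 3 (where the module lives at urllib.parse). The helper name base_url is hypothetical. One caveat: the parser only recognises a host when the input has a scheme or a leading '//', so the sketch prepends '//' to bare entries like the ones in the question.

```python
from urllib.parse import urlsplit


def base_url(url):
    """Return the host part of a URL, with or without a scheme."""
    # Without a scheme or a leading '//', urlsplit treats the whole
    # string as a path, so bare inputs get '//' prepended first.
    if '://' not in url:
        url = '//' + url
    return urlsplit(url).netloc


urls = [
    'www.facbook.com/abc/xyz',
    'www.google.com.pk/q=12hsjak',
    'www.yahoo.co.uk/avga/ajak',
    'defence.pk/zya/bahawalpur',
    'Pic2fly.uk/abc',
]
for url in urls:
    print(base_url(url))
```

Using netloc rather than hostname keeps the original letter case, which matters for an entry such as Pic2fly.uk.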

1 Comment

Don't use os.path.split. That's the wrong tool for the job, even if it works on some/most/all operating systems.

I would put all the URLs into a list, then take everything before the first '/' in each one, like this:

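urls = ['www.facbook.com/abc/xyz', 'www.google.com.pk/q=12hsjak', 'www.yahoo.co.uk/avga/ajak', 'defence.pk/zya/bahawalpur', 'Pic2fly.uk/abc']
for url in urls:
    # partition returns the whole string when there is no '/',
    # whereas find would return -1 and the slice would drop the last character
    print(url.partition('/')[0])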

The result is what you want:

www.facbook.com
www.google.com.pk
www.yahoo.co.uk
defence.pk
Pic2fly.uk
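
Since the question asked for a regex, the same idea can also be sketched with re. This is only one possible pattern, not a full URL parser: it assumes one URL per line, an optional http:// or https:// prefix, and that the host is everything up to the first '/'. The names BASE_RE and base_urls are hypothetical.

```python
import re

# Optional scheme, then capture everything up to the first '/' or whitespace.
BASE_RE = re.compile(r'^(?:https?://)?([^/\s]+)')


def base_urls(lines):
    """Return the host part of each non-empty line."""
    result = []
    for line in lines:
        match = BASE_RE.match(line.strip())
        if match:  # blank lines do not match and are skipped
            result.append(match.group(1))
    return result


sample = ['www.yahoo.co.uk/avga/ajak\n', '\n', 'defence.pk/zya/bahawalpur\n']
for host in base_urls(sample):
    print(host)
```

Because an open file yields its lines when iterated, the function can be given a file object directly instead of a list.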
