I have a string containing URLs:
string = 'https://www.link1.net/abc/cik?xai=En8MmT__aF_nQm-F48&sig=Cg0A7_5AE&urlfix=1&;ccurl=https://aax-us.link-two.com/x/c/Qoj_sZnkA%2526adurl%253Dhttp%253A%252F%252Fwww.link-three.mu%252F'
I want to extract all of them to have a result like this:
['https://www.link1.net/abc/cik?xai=En8MmT__aF_nQm-F48&sig=Cg0A7_5AE&urlfix=1&;ccurl=','https://aax-us.link-two.com/x/c/Qoj_sZnkA%2526adurl%253D','http%253A%252F%252Fwww.link-three.mu%252F']
I am trying:
import re
urls = [x for x in re.split('(http[s]?)', string) if x]
print(urls)
And the result is:
['https', '://www.link1.net/abc/cik?xai=En8MmT__aF_nQm-F48&sig=Cg0A7_5AE&urlfix=1&;ccurl=', 'https', '://aax-us.link-two.com/x/c/Qoj_sZnkA%2526adurl%253D', 'http', '%253A%252F%252Fwww.link-three.mu%252F']
How can I keep each complete URL together, given that it can start with 'http' or 'https'?
Any ideas please?
Use a lookahead: (?=http). Also, there's no need to put s in a set [s], as it's interpreted literally by default (it has no special meaning on its own). There's also no need to check for s, since http is all you really need to look for: if http is there, it already satisfies your requirement, whether or not an s follows it.

Is http%253a a valid URL?

https://aax-us.link-two.com/x/c/Qoj_sZnkA%2526adurl%253Dhttp%253A%252F%252Fwww.link-three.mu%252F
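The lookahead idea from the comment can be sketched as below. This is only an illustration under two assumptions: the input is the exact string from the question, and the interpreter is Python 3.7 or later, where re.split accepts a pattern that matches the empty string. The name urls_findall is made up for this example; it shows a re.findall variant that avoids empty-match splitting.

```python
import re

string = (
    "https://www.link1.net/abc/cik?xai=En8MmT__aF_nQm-F48&sig=Cg0A7_5AE&urlfix=1&;ccurl="
    "https://aax-us.link-two.com/x/c/Qoj_sZnkA%2526adurl%253D"
    "http%253A%252F%252Fwww.link-three.mu%252F"
)

# Zero-width lookahead: split *before* each "http" without consuming it,
# so the scheme stays attached to the rest of the URL.
# Assumption: Python 3.7+, where re.split splits on empty matches.
urls = [x for x in re.split(r"(?=http)", string) if x]

# Alternative that does not rely on empty-match splitting:
# lazily match from each "http" up to the next "http" or the end of the string.
urls_findall = re.findall(r"http.*?(?=http|$)", string)

print(urls)
```

Both variants cut at every occurrence of http, so they would also split a URL that happens to contain http inside its path or query; whether that is acceptable depends on the data.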