Python Regular Expression to add links to URLs

Question

I'm trying to write a regular expression that correctly captures URLs, including ones wrapped in parentheses, as in (http://example.com). The problem is discussed on Coding Horror at https://blog.codinghorror.com/the-problem-with-urls/

I'm currently using the following to create HTML A tags in Python for links that start with http or www.

import re

def add_links(text):
    r1 = r"(\b(http|https)://([-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]))"
    r2 = r"((^|\b)www\.([-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]))"
    return re.sub(r2,r'<a rel="nofollow" target="_blank" href="http://\1">\1</a>',re.sub(r1,r'<a rel="nofollow" target="_blank" href="\1">\1</a>',text))

This works well except when someone wraps the URL in parentheses. Does anyone have a better way?
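To make the failing case concrete, here is a small reproduction using the first pattern from above. The sample sentence is made up for illustration; the point is only that the closing parenthesis ends up inside the captured URL.

```python
import re

# First pattern from the question, unchanged.
r1 = r"(\b(http|https)://([-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]))"

# Hypothetical sample text with a URL wrapped in parentheses.
match = re.search(r1, "see (http://example.com) for details")
captured = match.group(1)

# The trailing ')' is part of the match, because ')' is allowed
# both inside the URL and as its final character.
print(captured)
```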

nosklo · Accepted Answer · 2009-01-24 19:03:05Z


The problem is that URLs can themselves contain parentheses, as in (http://en.wikipedia.org/wiki/Tropical_Storm_Alberto_(2006)). You can't handle that with a regexp alone, since it has no state to track nesting. You need a parser, and even then the best you can do is try to guess the correct closing parenthesis. That is error-prone (the URL could open a parenthesis and never close it), so I guess you're out of luck anyway.

See also http://en.wikipedia.org/wiki/, or (http://en.wikipedia.org/wiki/)) and other similar valid URLs.
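As a rough sketch of the "guess the closing parenthesis" idea, one option is to match permissively and then drop trailing `)` characters that have no matching `(` inside the URL. This is only a heuristic and is not a real parser: it will guess wrong for URLs that legitimately end in an unbalanced parenthesis. The names `URL_RE`, `trim_unbalanced` and `linkify` are made up for this example, and the pattern is a simplified version of the one in the question (http/https only).

```python
import html
import re

# Permissive candidate matcher, parentheses included (assumption: http/https only).
URL_RE = re.compile(
    r"\bhttps?://[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]"
)

def trim_unbalanced(url):
    """Drop trailing ')' characters that have no matching '(' in url."""
    while url.endswith(")") and url.count(")") > url.count("("):
        url = url[:-1]
    return url

def linkify(text):
    """Wrap each URL in an A tag, leaving trimmed ')' characters outside it."""
    def repl(match):
        raw = match.group(0)
        url = trim_unbalanced(raw)
        tail = raw[len(url):]      # the ')' characters that were trimmed off
        safe = html.escape(url)    # escape '&' for use in HTML
        return '<a rel="nofollow" target="_blank" href="%s">%s</a>%s' % (
            safe, safe, tail)
    return URL_RE.sub(repl, text)

print(linkify("(http://example.com)"))
print(linkify("(http://en.wikipedia.org/wiki/Tropical_Storm_Alberto_(2006))"))
```

With this, the wrapping parentheses stay outside the link in both calls, while the `(2006)` in the Wikipedia URL stays inside it. A URL that really does end in an extra `)` would be cut short, which is the guessing problem described above.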

edited Jan 24, 2009 at 19:03

answered Jan 24, 2009 at 18:57
