What is a safe an efficient way to filter links from user inputted data, and create an anchor link that is then used in the html. Like how when writing a question, and you copy-paste a link, it automatically becomes an anchor link.
1 Answer
Use Gruber's regular expression to find the URIs.
import re
text = "foo http://www.stackoverflow.com bar"
uri_re = re.compile(r"""(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|"""
r"""www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?"""
r""":[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))"""
r"""*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|"""
r"""[^\s`!()\[\]{};:'".,<>?«»“”‘’]))""")
print uri_re.sub(r"""<a href="\g<0>">\g<0></a>""", text)
Result:
foo <a href="http://www.stackoverflow.com">http://www.stackoverflow.com</a> bar
Now the Gruber regex will actually match partial URIs such as www.stackoverflow.com (this is missing the http:// scheme), which won't work when you just stick it into an anchor tag. You can write a function that checks for that and adds it where necessary, then use that to do the replacement:
text = "foo www.stackoverflow.com bar"
def link(match):
uri = match.group()
if ":" not in uri[:7]:
uri = "http://" + uri
return r"""<a href="{0}">{0}</a>""".format(uri)
print uri_re.sub(link, text)
4 Comments
Wiz
Should you escape/sanitize text before you run this script on it?
kindall
I'd sanitize first, fix up the links after.
Wiz
What do you mean by fix up the links?
kindall
By fix up the links I mean the stuff I posted in my answer: adding HTML markup to turn the naked URIs into links.