0

What is a safe an efficient way to filter links from user inputted data, and create an anchor link that is then used in the html. Like how when writing a question, and you copy-paste a link, it automatically becomes an anchor link.

1 Answer 1

1

Use Gruber's regular expression to find the URIs.

import re

text = "foo http://www.stackoverflow.com bar"

uri_re = re.compile(r"""(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|"""
                    r"""www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?"""
                    r""":[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))"""
                    r"""*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|"""
                    r"""[^\s`!()\[\]{};:'".,<>?«»“”‘’]))""")

print uri_re.sub(r"""<a href="\g<0>">\g<0></a>""", text)

Result:

foo <a href="http://www.stackoverflow.com">http://www.stackoverflow.com</a> bar

Now the Gruber regex will actually match partial URIs such as www.stackoverflow.com (this is missing the http:// scheme), which won't work when you just stick it into an anchor tag. You can write a function that checks for that and adds it where necessary, then use that to do the replacement:

text = "foo www.stackoverflow.com bar"

def link(match):
    uri = match.group()
    if ":" not in uri[:7]:
        uri = "http://" + uri
    return r"""<a href="{0}">{0}</a>""".format(uri)

print uri_re.sub(link, text)
Sign up to request clarification or add additional context in comments.

4 Comments

Should you escape/sanitize text before you run this script on it?
I'd sanitize first, fix up the links after.
What do you mean by fix up the links?
By fix up the links I mean the stuff I posted in my answer: adding HTML markup to turn the naked URIs into links.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.