Python - Filtering links from user-inputted data

Question

What is a safe and efficient way to find links in user-inputted data and turn them into anchor links that are then used in the HTML? For example, when you write a question here and paste a link, it automatically becomes an anchor link.

kindall · Accepted Answer · 2012-09-09 00:39:57Z


Use Gruber's regular expression to find the URIs.

import re

text = "foo http://www.stackoverflow.com bar"

uri_re = re.compile(r"""(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|"""
                    r"""www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?"""
                    r""":[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))"""
                    r"""*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|"""
                    r"""[^\s`!()\[\]{};:'".,<>?«»“”‘’]))""")

# \g<0> is the whole match, used as both the href and the link text
print(uri_re.sub(r"""<a href="\g<0>">\g<0></a>""", text))

Result:

foo <a href="http://www.stackoverflow.com">http://www.stackoverflow.com</a> bar

Note that the Gruber regex also matches partial URIs such as www.stackoverflow.com (no http:// scheme). These won't work if you just stick them into an anchor tag, because the browser treats an href without a scheme as a relative path. You can write a function that checks for a missing scheme and adds it where necessary, then use that function to do the replacement:

text = "foo www.stackoverflow.com bar"

def link(match):
    # Build the anchor tag for one match, adding a scheme if none is present
    uri = match.group()
    if ":" not in uri[:7]:
        uri = "http://" + uri
    return r"""<a href="{0}">{0}</a>""".format(uri)

print(uri_re.sub(link, text))

edited Sep 9, 2012 at 0:39

answered Sep 9, 2012 at 0:25


4 Comments

Wiz Over a year ago

Should you escape/sanitize text before you run this script on it?

kindall Over a year ago

I'd sanitize first, fix up the links after.

Wiz Over a year ago

What do you mean by fix up the links?

kindall Over a year ago

By fix up the links I mean the stuff I posted in my answer: adding HTML markup to turn the naked URIs into links.
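
To make the "sanitize first, fix up the links after" order concrete, here is a minimal sketch. It assumes Python 3 and uses the standard library's html.escape as the sanitizing step, which is my assumption; the thread does not say which sanitizer to use. The linkify name is also hypothetical. The regex and the link function are the ones from the answer.

```python
import html
import re

# Gruber's pattern, copied from the answer above.
uri_re = re.compile(r"""(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|"""
                    r"""www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?"""
                    r""":[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))"""
                    r"""*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|"""
                    r"""[^\s`!()\[\]{};:'".,<>?«»“”‘’]))""")


def link(match):
    # Same helper as in the answer: add a scheme when none is present.
    uri = match.group()
    if ":" not in uri[:7]:
        uri = "http://" + uri
    return '<a href="{0}">{0}</a>'.format(uri)


def linkify(text):
    # Hypothetical wrapper: escape the user input first, then add the anchors.
    # html.escape also escapes quotes by default, so a quote inside a matched
    # URI cannot close the href attribute early.
    escaped = html.escape(text)
    return uri_re.sub(link, escaped)


print(linkify("visit www.stackoverflow.com <b>now</b>"))
```

Escaping first means any markup the user typed is already neutralized before the anchors are added, so the only real tags in the output are the ones this code generates. Doing it the other way around would escape the generated anchor tags as well.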
