I have an HTML string:

I was surfing http://www.google.com, where I found my tweet, 
check it out <a href="http://tinyurl.com/blah">http://tinyurl.com/blah</a>
<span>http://www.google.com</span>

and I want to convert it to this:

I was surfing <a href="http://www.google.com">http://www.google.com</a>, where I found my tweet, 
check it out <a href="http://tinyurl.com/blah">http://tinyurl.com/blah</a>
<span><a href="http://www.google.com">http://www.google.com</a></span>

I tried this demo.

My Python code is:

import re

# First alternative: an existing <a>...</a> element.
# Second alternative: a bare URL, captured in group 1.
p = re.compile(r'<a\b[^>]*>.*?</a>|((ftp|http|https):\/\/(\w+:{0,1}\w*@)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%@!\-\/]))?)', re.MULTILINE)
test_str = "I was surfing http://www.google.com, where I found my tweet, check it out <a href=\"http://tinyurl.com/blah\">http://tinyurl.com/blah</a>"

for item in p.finditer(test_str):
    # group(0) is the whole match, so existing anchors are printed too.
    print(item.group(0))

Output:

>>> http://www.google.com,
>>> <a href="http://tinyurl.com/blah">http://tinyurl.com/blah</a>
  • So what are you missing? You found the URL; now just check that it's not already inside an <a> and replace it, right? Commented Oct 27, 2015 at 13:34
  • @mikus I updated my question. When I use it in my Python code, it returns the anchor tag as well. Commented Oct 27, 2015 at 13:39
  • So the desired output is just >>> http://www.google.com, ? Commented Oct 27, 2015 at 14:01
  • @JimK Yes, so that I can wrap it in an anchor tag. Thanks! Commented Oct 28, 2015 at 5:12

3 Answers

I hope this can help you.

Code:

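import re

# Match a URL that is not directly preceded by <, " or >,
# i.e. one that is not already inside a tag or an attribute value.
p = re.compile(r'''[^<">]((ftp|http|https):\/\/(\w+:{0,1}\w*@)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%@!\-\/]))?)[^< ,"'>]''', re.MULTILINE)
test_str = "I was surfing http://www.google.com, where I found my tweet, check it out <a href=\"http://tinyurl.com/blah\">http://tinyurl.com/blah</a>"

# Start from the original text so that every replacement accumulates.
end_result = test_str
for item in p.finditer(test_str):
    result = item.group(0)
    # The match includes the leading space, so remove it.
    result = result.replace(' ', '')
    print(result)
    end_result = end_result.replace(result, '<a href="' + result + '">' + result + '</a>')

print(end_result)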

Output:

http://www.google.com
I was surfing <a href="http://www.google.com">http://www.google.com</a>, where I found my tweet, check it out <a href="http://tinyurl.com/blah">http://tinyurl.com/blah</a>

1 Comment

It works, but if the URL is inside a span or another tag, it is ignored as well. I only want to ignore anchor tags, so please help me with this scenario. Thanks!
Ok, I think I finally found what you're looking for. The basic idea is to try to match <a href and a URL. If there is an <a href then don't do anything, but if there is not then add the link. Here is the code:

import re
test_str = """I was surfing http://www.google.com, where I found my tweet, 
check it out <a href="http://tinyurl.com/blah">http://tinyurl.com/blah</a>
<span>http://www.google.com</span>
"""
def repl_func(matchObj):
    href_tag, url = matchObj.groups()
    if href_tag:
        # Since it has an href tag, this isn't what we want to change,
        # so return the whole match.
        return matchObj.group(0)
    else:
        return '<a href="%s">%s</a>' % (url, url)

pattern = re.compile(
    r'((?:<a href[^>]+>)|(?:<a href="))?'
    r'((?:https?):(?:(?://)|(?:\\\\))+'
    r"(?:[\w\d:#@%/;$()~_?\+\-=\\\.&](?:#!)?)*)",
    flags=re.IGNORECASE)
result = re.sub(pattern, repl_func, test_str)
print(result)

Output:

I was surfing <a href="http://www.google.com">http://www.google.com</a>, where I found my tweet,
check it out <a href="http://tinyurl.com/blah">http://tinyurl.com/blah</a>
<span><a href="http://www.google.com">http://www.google.com</a></span>

The main idea is from https://stackoverflow.com/a/3580700/5100564. I also borrowed from https://stackoverflow.com/a/6718696/5100564.

You could make the regex more complex, but as mikus suggested, it seems easier to do the following:

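# p and test_str are the pattern and string from the question.
for item in re.finditer(p, test_str):
    result = item.group(0)
    # Skip matches that are existing anchor tags.
    if "<a " not in result.lower():
        print(result)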
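
To go from printing the bare URLs to actually replacing them, the same check can be moved into a re.sub callback. The following is only a minimal sketch under some assumptions: the names linkify, LINK_OR_URL and TRAILING_PUNCT are made up for illustration, and the URL part is a simplified pattern rather than the one from the question, which uses a greedy \S+ that would swallow the trailing comma and the closing </span>.

```python
import re

# An existing <a>...</a> element is tried first and matched whole;
# otherwise a bare URL is captured in group 1.
# Assumption: the URL pattern is deliberately simple (scheme, then
# anything up to whitespace, <, > or a double quote).
LINK_OR_URL = re.compile(
    r'<a\b[^>]*>.*?</a>'
    r'|((?:ftp|https?)://[^\s<>"]+)',
    re.IGNORECASE | re.DOTALL,
)

# Assumption: these characters at the end of a URL belong to the sentence.
TRAILING_PUNCT = '.,;:!?'


def linkify(html):
    """Wrap bare URLs in <a> tags and leave existing anchors untouched."""
    def repl(match):
        url = match.group(1)
        if url is None:
            # The anchor alternative matched, so return it unchanged.
            return match.group(0)
        stripped = url.rstrip(TRAILING_PUNCT)
        tail = url[len(stripped):]
        return '<a href="%s">%s</a>%s' % (stripped, stripped, tail)

    return LINK_OR_URL.sub(repl, html)


if __name__ == '__main__':
    sample = (
        'I was surfing http://www.google.com, where I found my tweet, \n'
        'check it out <a href="http://tinyurl.com/blah">http://tinyurl.com/blah</a>\n'
        '<span>http://www.google.com</span>\n'
    )
    print(linkify(sample))
```

Because whole anchors are consumed by the first alternative, URLs inside them are never seen by the second one, while a URL inside a span is still wrapped. One limitation of this sketch is that a URL in an attribute of some other tag (for example an img src) would also be wrapped; an HTML parser is probably the safer choice for that kind of input.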
