
I'm using the following method to parse URLs:

Regex.Replace(text, @"((www\.|(http|https|ftp)\://)[.a-z0-9-]+\.[a-z0-9\/_:@=.+?,##%&~-]*[^.|\'|\# |!|\(|?|,| |>|<|;|\)])",
                            "<a href=\"$1\" target=\"_blank\">$1</a>", RegexOptions.IgnoreCase).Replace("href=\"www.", "href=\"http://www.");

It works great, but:

  1. asdhttp://google.com will be parsed, so how can I disallow characters before "http://" / "www"?

  2. When a URL is inside a tag, I don't want it to be parsed:

[url]http://google.com[/url]

How can I do that?

  • How about URLs inside IMG and LINK tags, are they allowed to match? Does "a tag" in your description mean an <a> tag? Commented Oct 14, 2010 at 10:57

3 Answers

1

Use ^ before http and www, which means the string must start with www, http, https, or ftp:

^(www\.|(http|https|ftp)\://)

3 Comments

But then something like "google: http ://google.com" won't work
@Alex: Do you have a specific set of strings that need to be allowed or not? Because if you try to allow "google", then you will have to allow "asdhttp" as well, or you have to hardcode google, like http|ftp|https|google.
I just have to parse URLs in a text. Just like any forum works. "Hello, this is my website: http: //as.com" - URL should be parsed here. "Hihttp://as.com" - should not be parsed. So using ^ and $ is not a solution.
1

Add ^ at the beginning and $ at the end, so that nothing can come before http or after the URL:

Regex.Replace(text, @"^((www\.|(http|https|ftp)\://)[.a-z0-9-]+\.[a-z0-9\/_:@=.+?,##%&~-]*[^.|\'|\# |!|\(|?|,| |>|<|;|\)])$",
                            "<a href=\"$1\" target=\"_blank\">$1</a>", RegexOptions.IgnoreCase).Replace("href=\"www.", "href=\"http://www.");
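To see what anchoring with ^ and $ does in practice, here is a minimal sketch that tests the anchored pattern against a few inputs. The class name `AnchoredUrlCheck` and the helper `IsWholeStringUrl` are hypothetical names chosen for illustration. Without `RegexOptions.Multiline`, the anchors tie the match to the whole input, so a URL embedded in a sentence is not found.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchoredUrlCheck
{
    // The pattern from the answer above: the question's regex wrapped in ^ ... $.
    private const string AnchoredPattern =
        @"^((www\.|(http|https|ftp)\://)[.a-z0-9-]+\.[a-z0-9\/_:@=.+?,##%&~-]*[^.|\'|\# |!|\(|?|,| |>|<|;|\)])$";

    // Hypothetical helper: true only when the entire input is a URL.
    public static bool IsWholeStringUrl(string text)
    {
        return Regex.IsMatch(text, AnchoredPattern, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        Console.WriteLine(IsWholeStringUrl("http://as.com"));                            // True
        Console.WriteLine(IsWholeStringUrl("Hihttp://as.com"));                          // False
        Console.WriteLine(IsWholeStringUrl("Hello, this is my website: http://as.com")); // False
    }
}
```

So the anchored form suits validating a single field that should hold only a URL, but it will not find URLs inside longer forum text.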


0

Since it seems the URL is part of a block of text, use \b for a word boundary:

Regex.Replace(text, @"\b((www\.| ... "
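As a minimal sketch of the \b suggestion, the snippet below prepends \b to the question's full pattern and runs it on the two sample sentences from the comments. The class name `UrlLinker` and the method `Linkify` are hypothetical names for illustration. The target attribute is written as `_blank` here.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlLinker
{
    // The question's pattern with \b prepended, so a match cannot start
    // in the middle of a run of word characters (as in "Hihttp://...").
    private const string Pattern =
        @"\b((www\.|(http|https|ftp)\://)[.a-z0-9-]+\.[a-z0-9\/_:@=.+?,##%&~-]*[^.|\'|\# |!|\(|?|,| |>|<|;|\)])";

    public static string Linkify(string text)
    {
        return Regex.Replace(text, Pattern,
                "<a href=\"$1\" target=\"_blank\">$1</a>", RegexOptions.IgnoreCase)
            .Replace("href=\"www.", "href=\"http://www.");
    }

    public static void Main()
    {
        // The URL is wrapped in an <a> element.
        Console.WriteLine(Linkify("Hello, this is my website: http://as.com"));
        // Left unchanged: there is no word boundary between "i" and "h".
        Console.WriteLine(Linkify("Hihttp://as.com"));
    }
}
```

One caveat: \b only inspects the characters on either side of the match start. In an input such as asdhttp://www.google.com, there is a boundary between "/" and "w", so the www.google.com part would still be linked.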

Your second question is a bit more tricky - have you considered using the same regex for both tasks?

3 Comments

Looks like that's what I need. But how can I exclude the word?
@Alex - I gave it some thought, and it isn't so simple. You could put (?<!\[url\]) before the regex (a negative lookbehind), but that wouldn't work for [url]http://www.example.com[/url]: the match at http:// is blocked, but www.example.com is still captured. As I've said, you may need to write a small parser for that, so you can parse these tokens first and let the regex handle the rest.
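One way to approximate "parse these tokens first" without a full BBCode parser is a single pass whose regex matches either a whole [url]...[/url] block or a bare URL, and whose callback leaves the block untouched. This is only a sketch under assumptions, not the parser the comment has in mind: the class name `BbCodeAwareLinker`, the group names `bb` and `url`, and the restriction to the plain [url]...[/url] form are all choices made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BbCodeAwareLinker
{
    // The question's URL pattern with \b prepended, captured as "url".
    private const string UrlPattern =
        @"\b(?<url>(www\.|(http|https|ftp)\://)[.a-z0-9-]+\.[a-z0-9\/_:@=.+?,##%&~-]*[^.|\'|\# |!|\(|?|,| |>|<|;|\)])";

    // The first alternative consumes an entire [url]...[/url] block, so the
    // URL inside it is never offered to the second alternative.
    private static readonly Regex Tokenizer = new Regex(
        @"(?<bb>\[url\].*?\[/url\])|" + UrlPattern,
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static string Linkify(string text)
    {
        return Tokenizer.Replace(text, m =>
        {
            // BBCode block: return it exactly as it was.
            if (m.Groups["bb"].Success) return m.Value;

            string url = m.Groups["url"].Value;
            string href = url.StartsWith("www.", StringComparison.OrdinalIgnoreCase)
                ? "http://" + url
                : url;
            return "<a href=\"" + href + "\" target=\"_blank\">" + url + "</a>";
        });
    }

    public static void Main()
    {
        Console.WriteLine(Linkify("[url]http://google.com[/url] and http://as.com"));
    }
}
```

In the sample input, the tagged URL is left as is and only the bare http://as.com is wrapped in an <a> element. Other tag forms, such as [url=...] or [img], would need their own alternatives or a real parser.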
Ok, thanks. I'll try to find something about BB code parsers online.
