
Possible Duplicates:
PHP validation/regex for URL
PHP regex for validating a URL

I am using

(((?:http|https):\/\/[a-zA-Z0-9\/\?=_#&%~-]+(\.[a-zA-Z0-9\/\?=_#&%~-]+)+)|(www(\.[a-zA-Z0-9\/\?=_#&%~-]+){2,}))

to validate URLs in my script.

But my friend told me there is a problem with this URL:

http://www.example.com/example(200)aaaa.rar

How can I add "(" and ")" to my regex?

Are there other characters I should add to my regex?

  • What regex engine are you using? Commented Nov 17, 2010 at 18:16
  • According to your regex, www.foo.#%~ is a valid URL. Whatever language you're using probably already has a URL validator that works better. Commented Nov 17, 2010 at 18:18
  • This question has a LOT of possible duplicates Commented Nov 17, 2010 at 18:21
  • @Paul : It doesn't work regexr.com?2simr Commented Nov 17, 2010 at 18:28
  • What doesn't work? I didn't make any suggestions; I just pointed out a possible duplicate. Commented Nov 17, 2010 at 18:31
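The false positive mentioned in the comments can be reproduced with preg_match(). This is a minimal sketch, assuming the asker's pattern is used unchanged with / as the delimiter (its slashes are already escaped); the ^...$ anchored variant and the variable names are additions for illustration. It also suggests why the parenthesized URL can look like it passes: without anchors, preg_match() succeeds on a prefix match.

```php
<?php
// The asker's pattern, unchanged, wrapped in / delimiters.
$core = '(((?:http|https):\/\/[a-zA-Z0-9\/\?=_#&%~-]+(\.[a-zA-Z0-9\/\?=_#&%~-]+)+)|(www(\.[a-zA-Z0-9\/\?=_#&%~-]+){2,}))';
$pattern  = '/' . $core . '/';    // unanchored: any substring may match
$anchored = '/^' . $core . '$/';  // illustrative variant: whole string must match

$url = 'http://www.example.com/example(200)aaaa.rar';

// Unanchored: matches only the prefix up to the "(".
var_dump(preg_match($pattern, $url, $m)); // int(1)
echo $m[0], "\n";                         // http://www.example.com/example

// Anchored: no character class contains "(", so the whole URL cannot match.
var_dump(preg_match($anchored, $url));    // int(0)

// The false positive from the comments: "www" plus two dot-segments.
var_dump(preg_match($anchored, 'www.foo.#%~')); // int(1)
```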

3 Answers


PHP already has a built-in way to validate URLs, filter_var() with FILTER_VALIDATE_URL, which will work better than your regex (which, as I commented above, allows false positives):

$url = "http://www.example.com/example(200)aaaa.rar";
var_dump(filter_var($url, FILTER_VALIDATE_URL));
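filter_var() returns the URL string on success and false on failure, and it accepts any scheme (ftp://, for example). If you only want web links, a minimal sketch might add a scheme check with parse_url(); the helper name is_http_url and the http/https restriction are assumptions for illustration, not part of the answer above.

```php
<?php
// Hypothetical helper: validate with filter_var(), then restrict the scheme.
function is_http_url(string $url): bool
{
    // filter_var() returns the URL string on success, false on failure.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    // parse_url() with PHP_URL_SCHEME returns the scheme, or null if absent.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return $scheme === 'http' || $scheme === 'https';
}

var_dump(is_http_url('http://www.example.com/example(200)aaaa.rar')); // bool(true)
var_dump(is_http_url('www.foo.#%~'));                    // bool(false): no scheme
var_dump(is_http_url('ftp://www.example.com/file.rar')); // bool(false): not http(s)
```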

May I recommend this site: http://regexlib.com/ (click Browse at the top and select the Uri button).

To answer your question, though: (((?:http|https):\/\/[a-zA-Z0-9\/\?=_#&%~\(\)-]+(\.[a-zA-Z0-9\/\?=_#&%~\(\)-]+)+)|(www(\.[a-zA-Z0-9\/\?=_#&%~\(\)-]+){2,}))

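Note the \( and \) added inside the character class. Outside a character class, ( and ) are grouping characters and must be escaped with a backslash to match literally; inside [...] they are already literal, so the backslash is optional but harmless. Keep the - as the last character in the class so it is not read as a range.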
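A quick check of the pattern with ( and ) added to every character class, and of the point about escaping inside [...], might look like this. It is a sketch that assumes the pattern is anchored with ^ and $ so the whole string must match (the pattern in the answer is unanchored); the variable names are illustrative.

```php
<?php
// Pattern with ( and ) in each character class; the hyphen stays last
// so it is a literal "-" rather than a range operator.
$core = '(((?:http|https):\/\/[a-zA-Z0-9\/\?=_#&%~\(\)-]+(\.[a-zA-Z0-9\/\?=_#&%~\(\)-]+)+)|(www(\.[a-zA-Z0-9\/\?=_#&%~\(\)-]+){2,}))';
$anchored = '/^' . $core . '$/';

$url = 'http://www.example.com/example(200)aaaa.rar';
var_dump(preg_match($anchored, $url)); // int(1)

// Inside [...] parentheses are already literal, so escaping is optional:
var_dump(preg_match('/^[()]+$/', '()()'));   // int(1)
var_dump(preg_match('/^[\(\)]+$/', '()()')); // int(1)
```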

4 Comments

It doesn't work regexr.com?2simo
That doesn't take into account all those (unfortunately) now valid internationalized domains with non-ASCII characters, though.
I'm not 100% familiar with this site's formatting, so some characters went missing from the pattern, which is why I also pointed you to a source that will have the answers unscathed. @GCATNM: very true, but I don't think (though I may be wrong) they are looking to be that all-inclusive.
SyntaxError: unterminated parenthetical

I believe the specification, RFC 2068, will answer your question, though you will need to unpack your BNF boots for the journey.

In summary, pretty much any character can be used after the domain name, except for the few reserved ones, which must be escaped:

The BNF [in the RFC] includes national characters not allowed in valid URLs as specified by RFC 1738, since HTTP servers are not restricted in the set of unreserved characters allowed to represent the rel_path part of addresses, and HTTP proxies may receive requests for URIs not defined by RFC 1738
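In PHP, one way to escape a path segment is rawurlencode(), which percent-encodes everything except ASCII letters, digits and - _ . ~. Parentheses are in fact allowed unencoded in paths under RFC 3986, but the encoded form is equally valid. A minimal sketch, assuming only the file-name segment needs encoding (the $base/$segment split is illustrative):

```php
<?php
// Assumption: the scheme, host and directory part are already safe.
$base    = 'http://www.example.com/';
$segment = 'example(200)aaaa.rar';

// rawurlencode() leaves A-Z a-z 0-9 - _ . ~ alone and encodes the rest.
$encoded = $base . rawurlencode($segment);
echo $encoded, "\n"; // http://www.example.com/example%28200%29aaaa.rar

// The asker's original pattern, anchored, accepts the encoded URL,
// because "%" and digits are already in its character classes.
$core = '(((?:http|https):\/\/[a-zA-Z0-9\/\?=_#&%~-]+(\.[a-zA-Z0-9\/\?=_#&%~-]+)+)|(www(\.[a-zA-Z0-9\/\?=_#&%~-]+){2,}))';
var_dump(preg_match('/^' . $core . '$/', $encoded)); // int(1)
```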
