How to write a regex in jQuery to find a valid URL format with a valid domain name

Question

I am trying to parse HTML to find URLs in posts. Most of the time it works, but in one case it does not. I need to find every link present in the post. The link format varies as follows:

google.com
google.com/q=love
google.com/in-love/1212/a
www.google.com/in-love/1212/a
www.google.com/q=love
www.google.com
http://www.google.com/in-love/1212/a
http://google.com
http://www.google.com
http://google.com/q=love
https://www.google.com/in-love/1212/a
https://google.com
https://www.google.com
https://google.com/q=love

But in some cases my regex also matches these:

tanmoy.kundu
i.e

I am using this regex to parse the HTML post:

/\(?(?:(http|https|ftp):\/\/)?(?:((?:[^\W\s]|\.|-|[:]{1})+)@{1})?((?:www.)?(?:[^\W\s]|\.|-)+[\.][^\W\s]{2,4}|localhost(?=\/)|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})(?::(\d*))?([\/]?[^\s\?]*[\/]{1})*(?:\/?([^\s\n\?\[\]\{\}\#]*(?:(?=\.)){1}|[^\s\n\?\[\]\{\}\.\#]*)?([\.]{1}[^\s\?\#]*)?)?(?:\?{1}([^\s\n\#\[\]]*))?([\#][^\s\n]*)?\)?/g

I need the parsing to check for a valid domain extension, such as .com, .uk, etc.

Well, anything can come after a dot; there is no pattern for that. The extension can be anything these days. If you want to limit the extensions, you need to check them manually rather than with a pattern.
– Harry Bomrah, Commented Dec 9, 2015 at 9:56
I hate to be negative here, but this is never going to be a complete solution. First off, take a look at why you should not try to use regex alone to parse HTML. Then take a look at just how complex a URL can be in RFC 3986 (tools.ietf.org/html/rfc3986). There are always going to be corner cases you miss with a regex.
– N. Leavy, Commented Dec 9, 2015 at 10:06
This is a good related article: while it is talking about matching emails, it has a good discussion about domain names.
– James Thorpe, Commented Dec 9, 2015 at 10:39

user5554671 · Accepted Answer · 2015-12-10 07:05:46Z

This regex works for my case:

/(((?:ht|f)tp[s]?:[\/]{2})?(?:\w+(?::\w+)?@)?(?:(?:(?:\d{1,3}\.){3}\d{1,3})|(?:(?:\w|\d|\.|\$|_|@|\+|\-)*(?:\w|\d|\$|_|@|\+|\-)\.(?:aero|asia|biz|cat|com|coop|edu|gov|info|int|jobs|mil|mobi|museum|name|net|org|pro|tel|travel|ac|ad|ae|af|ag|ai|al|am|an|ao|aq|ar|as|at|au|aw|ax|az|ba|bb|bd|be|bf|bg|bh|bi|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|cr|cu|cv|cx|cy|cz|de|dj|dk|dm|do|dz|ec|ee|eg|er|es|et|eu|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|in|io|iq|ir|is|it|je|jm|jo|jp|ke|kg|kh|ki|km|kn|kp|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|me|mg|mh|mk|ml|mm|mn|mo|mp|mq|mr|ms|mt|mu|mv|mw|mx|my|mz|na|nc|ne|nf|ng|ni|nl|no|np|nr|nu|nz|om|pa|pe|pf|pg|ph|pk|pl|pm|pn|pr|ps|pt|pw|py|qa|re|ro|rs|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|sk|sl|sm|sn|so|sr|st|su|sv|sy|sz|tc|td|tf|tg|th|tj|tk|tl|tm|tn|to|tp|tr|tt|tv|tw|tz|ua|ug|uk|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|ye|yt|za|zm|zw))(?!\w))(?::\d{1,5})?(?:\/+(?:\w|\d|\.|\=|\$|_|@|\+|\-|~)*(?:\w|\d|\$|_|@|\+|\-|~))*\/*(?:\?(?:\w|\d|\.|\$|_|@|\+|\-|&|=)*)?)/g

Thanks :-)

Doc Roms · Accepted Answer · 2015-12-09 11:04:00Z

A regex exists to cover the largest possible number of cases with a single rule.

Now, when it comes to validating URLs, it is very difficult to check every URL with one regex: the new gTLDs are longer (all gTLDs and the "old" extensions are listed at https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains), many websites have a subdomain, and so on.
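
To illustrate the length problem: the question's pattern limits the final label to 2 to 4 characters (`{2,4}`), which rejects longer extensions such as .museum or .travel. A small sketch, using simplified stand-in patterns for the host part only (`SHORT_TLD`, `ANY_TLD`, and `hostMatches` are hypothetical names, not taken from the question's regex):

```javascript
// Simplified stand-ins for the host part only; NOT the asker's full regex.
const SHORT_TLD = /^(?:[a-z0-9-]+\.)+[a-z]{2,4}$/i; // caps the extension at 4 letters
const ANY_TLD = /^(?:[a-z0-9-]+\.)+[a-z]{2,}$/i;    // no upper bound on the extension

// Hypothetical helper: true when the whole host string matches the pattern.
function hostMatches(pattern, host) {
  return pattern.test(host);
}

console.log(hostMatches(SHORT_TLD, "google.com"));     // short extension
console.log(hostMatches(SHORT_TLD, "example.museum")); // long extension, capped pattern
console.log(hostMatches(ANY_TLD, "example.museum"));   // long extension, uncapped pattern
```

Removing the upper bound accepts long extensions, but it also accepts made-up ones, which is why the length check alone cannot decide whether a domain is valid.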

For me, the best regex pattern should test the extension (to know whether the URL can really exist or not). I know this website, https://mathiasbynens.be/demo/url-regex, which collects many regex patterns for checking specific URLs.

In your case,

i.e
tanmoy.kundu

If the regex checked whether the extension is valid ('e' and 'kundu' are not valid extensions), your regex would work :p
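
A minimal sketch of that idea, done in two steps: match URL-like candidates with a loose pattern, then keep only those whose last host label is on an allowlist. The names `KNOWN_TLDS`, `CANDIDATE`, and `findUrls` are hypothetical, and the allowlist is deliberately tiny; a real one would need the full list of extensions you want to accept.

```javascript
// Hypothetical, deliberately short allowlist of extensions (assumption for illustration).
const KNOWN_TLDS = new Set(["com", "org", "net", "uk", "in"]);

// Loose candidate pattern: optional scheme, dotted host (captured), optional path/query/fragment.
// Assumption: URLs in the text are separated by whitespace.
const CANDIDATE = /(?:(?:https?|ftp):\/\/)?((?:[a-z0-9-]+\.)+[a-z]{2,})(?:[\/?#][^\s]*)?/gi;

function findUrls(text) {
  const found = [];
  for (const match of text.matchAll(CANDIDATE)) {
    const host = match[1].toLowerCase();
    // The extension is whatever follows the last dot of the host.
    const tld = host.slice(host.lastIndexOf(".") + 1);
    if (KNOWN_TLDS.has(tld)) {
      found.push(match[0]);
    }
  }
  return found;
}

const sample = "see google.com/q=love and https://www.google.com/in-love/1212/a but not tanmoy.kundu or i.e";
console.log(findUrls(sample));
```

Keeping the extension check outside the pattern means the list can be updated without touching the regex, at the cost of a second pass over the matches.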

And don't forget you can test your regex at http://www.regexpal.com ^_^ It's easy.
