Put URLs from string into array using regex (problem with trailing period)

Question

I am trying to write a function that pulls all URLs from a string and removes a potential trailing period from the end of each one.

function getUrls($string) {
    $regex = '/https?\:\/\/[^\" ]+/i';
    preg_match_all($regex, $string, $matches);
    return ($matches[0]);
}

But the result keeps the trailing period, as in "http://test.com." For example, if I have

$string = "Hi I am sharing http://test.com.";
$urls = getUrls($string);

It returns the URL with the period at the end.

cambraca · Accepted Answer · 2010-11-23 04:42:45Z

1

This one seems to work (taken from here)

$regex="/(https?:\/\/+[\w\-]+\.[\w\-]+)/i";
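
As a rough illustration of how this pattern behaves, the sketch below runs it against a URL that has a path. It also shows one possible alternative: keep the asker's original pattern and strip trailing punctuation afterwards. The helper name getUrlsTrimmed and the punctuation list are assumptions for illustration, not something from this thread, and the arrow function assumes PHP 7.4 or later.

```php
<?php
// Hypothetical helper (name not from the thread): match with the asker's
// original pattern, then strip trailing sentence punctuation from each hit.
function getUrlsTrimmed(string $string): array
{
    preg_match_all('/https?:\/\/[^" ]+/i', $string, $matches);
    // rtrim's second argument is a list of characters, not a pattern.
    // The list below is an assumption; adjust it to your input.
    return array_map(
        fn(string $url): string => rtrim($url, '.,;:!?'),
        $matches[0]
    );
}

// The pattern from this answer only covers the scheme plus two
// dot-separated labels, so anything after them is dropped.
$answerPattern = '/(https?:\/\/+[\w\-]+\.[\w\-]+)/i';
preg_match_all($answerPattern, 'See http://www.example.com/page.', $m);
print_r($m[0]); // the match stops after "http://www.example"

print_r(getUrlsTrimmed('Hi I am sharing http://test.com.'));
```

Trimming after the match keeps the pattern simple, but it will also remove punctuation that legitimately ends a URL, so treat it as a starting point only.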

3 Comments

David Ryder Over a year ago

Thanks! That does work. Any idea on how to make it match with or without http:// ?

cambraca Over a year ago

Making it match URLs without http is risky. Consider the following string: "this is a comment.net is great." It would incorrectly match comment.net as a URL. I recommend reading the pages linked by martineno; there is some discussion of this problem on Alan Storm's page.

David Ryder Over a year ago

That actually wasn't working correctly, so I finally found a solution: yellow5.us/journal/server_side_text_linkification
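
To illustrate the risk cambraca describes in the comments above, here is a minimal sketch that makes the scheme optional in the answer's pattern. The variable name $loosePattern and this particular variant are assumptions for illustration.

```php
<?php
// Hypothetical scheme-optional variant of the answer's pattern.
$loosePattern = '/(?:https?:\/\/)?[\w\-]+\.[\w\-]+/i';

preg_match_all($loosePattern, 'this is a comment.net is great.', $matches);
print_r($matches[0]); // contains "comment.net", which is not a URL here
```

Any word followed by a period and another word is matched, which is why the discussion recommends anchoring on a scheme or a known prefix such as www.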

David Ryder · Accepted Answer · 2010-11-23 15:04:34Z

In case anyone comes across this, here is what I put together:

$aProtocols = array('http:\/\/', 'https:\/\/', 'ftp:\/\/', 'news:\/\/', 'nntp:\/\/', 'telnet:\/\/', 'irc:\/\/', 'mms:\/\/', 'ed2k:\/\/', 'xmpp:', 'mailto:');
$aSubdomains = array('www'=>'http://', 'ftp'=>'ftp://', 'irc'=>'irc://', 'jabber'=>'xmpp:');
$sRELinks = '/(?:(' . implode('|', $aProtocols) . ')[^\^\[\]{}|\\"\'<>`\s]*[^!@\^()\[\]{}|\\:;"\',.?<>`\s])|(?:(?:(?:(?:[^@:<>(){}`\'"\/\[\]\s]+:)?[^@:<>(){}`\'"\/\[\]\s]+@)?(' . implode('|', array_keys($aSubdomains)) . ')\.(?:[^`~!@#$%^&*()_=+\[{\]}\\|;:\'",<.>\/?\s]+\.)+[a-z]{2,6}(?:[\/#?](?:[^\^\[\]{}|\\"\'<>`\s]*[^!@\^()\[\]{}|\\:;"\',.?<>`\s])?)?)|(?:(?:[^@:<>(){}`\'"\/\[\]\s]+@)?((?:(?:(?:(?:[0-1]?[0-9]?[0-9])|(?:2[0-4][0-9])|(?:25[0-5]))(?:\.(?:(?:[0-1]?[0-9]?[0-9])|(?:2[0-4][0-9])|(?:25[0-5]))){3})|(?:[A-Fa-f0-9:]{16,39}))|(?:(?:[^`~!@#$%^&*()_=+\[{\]}\\|;:\'",<.>\/?\s]+\.)+[a-z]{2,6}))\/(?:[^\^\[\]{}|\\"\'<>`\s]*[^!@\^()\[\]{}|\\:;"\',.?<>`\s](?:[#?](?:[^\^\[\]{}|\\"\'<>`\s]*[^!@\^()\[\]{}|\\:;"\',.?<>`\s])?)?)?)|(?:[^@:<>(){}`\'"\/\[\]\s]+:[^@:<>(){}`\'"\/\[\]\s]+@((?:(?:(?:(?:[0-1]?[0-9]?[0-9])|(?:2[0-4][0-9])|(?:25[0-5]))(?:\.(?:(?:[0-1]?[0-9]?[0-9])|(?:2[0-4][0-9])|(?:25[0-5]))){3})|(?:[A-Fa-f0-9:]{16,39}))|(?:(?:[^`~!@#$%^&*()_=+\[{\]}\\|;:\'",<.>\/?\s]+\.)+[a-z]{2,6}))(?:\/(?:(?:[^\^\[\]{}|\\"\'<>`\s]*[^!@\^()\[\]{}|\\:;"\',.?<>`\s])?)?)?(?:[#?](?:[^\^\[\]{}|\\"\'<>`\s]*[^!@\^()\[\]{}|\\:;"\',.?<>`\s])?)?))|([^@:<>(){}`\'"\/\[\]\s]+@(?:(?:(?:[^`~!@#$%^&*()_=+\[{\]}\\|;:\'",<.>\/?\s]+\.)+[a-z]{2,6})|(?:(?:(?:(?:(?:[0-1]?[0-9]?[0-9])|(?:2[0-4][0-9])|(?:25[0-5]))(?:\.(?:(?:[0-1]?[0-9]?[0-9])|(?:2[0-4][0-9])|(?:25[0-5]))){3})|(?:[A-Fa-f0-9:]{16,39}))))(?:[^\^*\[\]{}|\\"<>\/`\s]+[^!@\^()\[\]{}|\\:;"\',.?<>`\s])?)/i';

function getUrls($string) {
    global $sRELinks;
    preg_match_all($sRELinks, $string, $matches);
    return ($matches[0]);
}

From http://yellow5.us/journal/server_side_text_linkification/

martineno · Accepted Answer · 2010-11-23 04:37:30Z

0

Depending on how strict you want to be, consider the "Liberal, Accurate Regex Pattern for Matching URLs" discussed on Daring Fireball. The pattern in full is:

\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))

If you are interested in how it works, Alan Storm has a great explanation.
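
A minimal sketch of how the pattern above might be used from PHP follows. The function name getUrlsLiberal, the "~" delimiter, and the i flag are assumptions for illustration; the pattern itself is copied unchanged from this answer.

```php
<?php
// Pattern copied from the answer. "~" is used as the delimiter so the
// slashes inside the pattern need no escaping.
$liberalPattern = '~\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))~i';

// Hypothetical wrapper mirroring the asker's getUrls() signature.
function getUrlsLiberal(string $string, string $pattern): array
{
    preg_match_all($pattern, $string, $matches);
    return $matches[0]; // full matches only, capture groups ignored
}

print_r(getUrlsLiberal('Hi I am sharing http://test.com.', $liberalPattern));
```

The last character of a match must be a slash or a character that is neither punctuation nor whitespace, so the sentence-ending period is left out of the result.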

1 Comment

tchrist Over a year ago

@David: He’s updated his pattern here. He also points out that not everybody supports [[:punct:]]. Me, I’d be more likely to use [\pP\pS] instead. He also includes a version that works only for http and https.
