I am trying to parse URLs containing & with preg_replace.
$content = preg_replace('#https?://[a-z0-9._/\?=&-]+#i', '<a href="$0" target="_blank">$0</a>', $content);
But I use it for user comments, so I'm also using the htmlspecialchars() function to prevent XSS:
function formatContributionContent($content)
{
    $content = nl2br(htmlspecialchars($content));
    // Regexp for mails
    $content = preg_replace('#[a-z0-9._-]+@[a-z0-9._&-]{2,}\.[a-z]{2,4}#', '<a href="mailto:$0">$0</a>', $content);
    // Regexp for urls
    $content = preg_replace('#https?://[a-z0-9._/\?=&-]+#i', '<a href="$0" target="_blank">$0</a>', $content);
    var_dump($content);
}
formatContributionContent('https://openclassrooms.com/index.php?page=3&skin=blue');
But htmlspecialchars() transforms & into "&amp;", so my regexp produces a wrong result: the match stops at the semicolon, which is not in the character class. Indeed, with the following URL:
https://openclassrooms.com/index.php?page=3&skin=blue
I obtain:
<a href="https://openclassrooms.com/index.php?page=3&amp" target="_blank">https://openclassrooms.com/index.php?page=3&amp</a>;skin=blue
htmlspecialchars() function to output the result. But you would probably have to apply it to the separate parts of the text rather than to the whole string at once, since escaping everything first turns the URL into its display notation (& becomes &amp;), which your pattern can no longer match in full. So your approach won't work as written. You'd have to split out the URL first and handle the tokens separately.
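The split-then-escape idea could be sketched roughly as follows. This is only an illustration under a few assumptions: the function name formatContributionContentSketch is hypothetical (it is not from the original post), the URL pattern is the one from the question, and the mail regexp is left out to keep the example short. The raw text is split on URLs before any escaping, and each token is then passed through htmlspecialchars() on its own.

```php
<?php
// Hypothetical helper, not from the original post: split first, then escape each token.
function formatContributionContentSketch(string $content): string
{
    // Split the raw (unescaped) text on URLs. The capture group plus
    // PREG_SPLIT_DELIM_CAPTURE keeps the matched URLs in the result array.
    $tokens = preg_split(
        '#(https?://[a-z0-9._/?=&-]+)#i',
        $content,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    $out = '';
    foreach ($tokens as $i => $token) {
        // Escape every token separately, whether it is plain text or a URL.
        $escaped = htmlspecialchars($token);
        // With a single capture group, odd indexes hold the matched URLs.
        $out .= ($i % 2 === 1)
            ? '<a href="' . $escaped . '" target="_blank">' . $escaped . '</a>'
            : $escaped;
    }

    // Convert newlines last; the URL pattern cannot match a newline,
    // so no <br /> ends up inside an href.
    return nl2br($out);
}

echo formatContributionContentSketch('https://openclassrooms.com/index.php?page=3&skin=blue');
```

With the URL from the question, the whole address, including &amp;skin=blue, should end up inside both the href attribute and the link text, because the pattern is matched against the raw & before it is escaped.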