RegEx to remove http://www. if it exists in PHP and JS

Question

Could someone please help me with a regular expression (I need it in PHP and in JS) that removes http:// or https:// and www. from the beginning of a URL string, and also removes the trailing / if it's there?

For Example

http://www.google.com/ would be google.com
https://yahoo.com?page=1 would be yahoo.com?page=1
fancysite.com/articles/2012/ would be fancysite.com/articles/2012

Here's the code I'm using for the JS side:

row.page_href.replace(/^(https?|ftp):\/\//, '')

And here's the code I'm using for the PHP side:

$urlString = rtrim($urlString, '/');
$urlString = preg_replace('~^(?:https?://)?(?:www[.])?~i', '', $urlString);

As you can see, the JS regex currently removes only the scheme (http://, https:// or ftp://), and the PHP version needs two steps to do everything.

Why don't you add the www to the JS regex? Or why don't you use the same one in both cases? I don't think PHP requires you to trim a possible / from the end of the string... that's just how you chose to do it.
– Felix Kling, Commented Dec 28, 2012 at 16:40
The right regular expression will work in both JS and PHP.
– Jason McCreary, Commented Dec 28, 2012 at 16:41
It's a requirement for my project... Why are you questioning why I need something? And no, this isn't for anchor text at all.
– RachelD, Commented Dec 28, 2012 at 16:42
But... what's the problem with ^(?:https?://)?(?:www[.])?? Looks fine to me, just use it in JS and PHP.
– Felix Kling, Commented Dec 28, 2012 at 16:44
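
As a rough sketch of what the comments suggest, the PHP pattern from the question can be carried over to JS, with the trailing slash folded into the same expression as a second alternative. The function name stripUrl is made up for illustration, and the sketch assumes only http and https schemes need removing:

```javascript
// Hypothetical helper; the prefix part mirrors the PHP pattern from the question.
function stripUrl(url) {
  // First alternative: optional scheme plus optional "www." at the start.
  // Second alternative: a single slash at the very end of the string.
  // The g flag lets both alternatives be replaced in one call.
  return url.replace(/^(?:https?:\/\/)?(?:www\.)?|\/$/gi, '');
}

console.log(stripUrl('http://www.google.com/'));        // google.com
console.log(stripUrl('https://yahoo.com?page=1'));      // yahoo.com?page=1
console.log(stripUrl('fancysite.com/articles/2012/'));  // fancysite.com/articles/2012
```

The same pattern should also work in PHP's preg_replace with ~ as the delimiter, since it uses no JS-specific syntax.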

Brad Christie · Accepted Answer · 2012-12-28 16:43:29Z

4

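function cleanUrl($url)
{
  // parse_url() only detects the host when a scheme is present, so add one if missing
  $full = preg_match('~^[a-z][a-z0-9+.-]*://~i', $url) ? $url : 'http://' . $url;
  if (($d = parse_url($full)) !== false && isset($d['host'])) // parseable url
  {
    return sprintf('%s%s%s',
      preg_replace('~^www\.~i', '', $d['host']), // ltrim() takes a character list, not a prefix
      isset($d['path']) ? rtrim($d['path'], '/') : '', // path is absent for e.g. yahoo.com?page=1
      !empty($d['query']) ? '?' . $d['query'] : '');
  }
  return $url;
}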

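I would take advantage of parse_url, which splits the URL into its parts so that each one can be cleaned separately. Note that it returns false only for seriously malformed URLs, so treat it as a sanity check and not as full validation.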
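
For the JS side, a comparable parse-then-rebuild sketch could use the built-in URL class. This is only an illustration under a few assumptions: the function name is borrowed from the PHP answer, http:// is prepended as a placeholder when no scheme is present, and it is acceptable that URL lowercases the host and that any #fragment is dropped.

```javascript
// Sketch of the parse-and-rebuild idea using the standard URL class.
function cleanUrl(url) {
  // URL needs a scheme to find the host, so add a placeholder one if missing.
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(url);
  let parsed;
  try {
    parsed = new URL(hasScheme ? url : 'http://' + url);
  } catch (e) {
    return url; // not parseable: hand the input back unchanged
  }
  const host = parsed.host.replace(/^www\./i, '');   // strip only a literal "www." prefix
  const path = parsed.pathname.replace(/\/+$/, '');  // strip trailing slash(es)
  return host + path + parsed.search;                // search already includes the "?"
}

console.log(cleanUrl('http://www.google.com/'));        // google.com
console.log(cleanUrl('https://yahoo.com?page=1'));      // yahoo.com?page=1
console.log(cleanUrl('fancysite.com/articles/2012/'));  // fancysite.com/articles/2012
```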


3 Comments

David Harris Over a year ago

Uh-duh, why didn't I think of that? I always forget about that function for some reason. Use this, OP.

RachelD Over a year ago

I was going with the regex because I assumed it was faster than parsing and trimming the URL. Am I mistaken in my assumption?

Brad Christie Over a year ago

@RachelD: Regex carries more overhead than PHP's plain string parser, so I consider it more machinery than this task needs.

David Harris · Accepted Answer · 2012-12-28 16:42:04Z

0

#(https?(://))?(www.?)?(.*)#i

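Worked just fine for me. Note that the dot in www.? is unescaped and optional, so it matches any character after www (or none at all); write www\. if only a literal www. prefix should be removed. You could also change the last (.*) to match the RFC standards of a URL.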

Outputs:

david@david-desktop ~ $ php -a
Interactive shell

php > $str = preg_replace('#(https?(://))?(www.?)?(.*)#i', '$4', 'https://www.google.ca');
php > echo $str . PHP_EOL;
google.ca
php > $str = preg_replace('#(https?(://))?(www.?)?(.*)#i', '$4', 'https://google.ca');
php > echo $str . PHP_EOL;
google.ca
php > $str = preg_replace('#(https?(://))?(www.?)?(.*)#i', '$4', 'http://google.ca');
php > echo $str . PHP_EOL;
google.ca
php >
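
To see how the pattern above behaves on hosts that merely start with www, here is a small JS comparison. It is a sketch with made-up names (stripLoose, stripStrict); the loose pattern is the one from this answer with ^ and $ anchors added, and the strict variant escapes the dot.

```javascript
// Pattern from the answer (anchored): "www.?" means "www" plus any one optional character.
const loose = /^(https?(:\/\/))?(www.?)?(.*)$/i;
// Variant with the dot escaped, so only a literal "www." prefix is removed.
const strict = /^(?:https?:\/\/)?(?:www\.)?(.*)$/i;

function stripLoose(url) { return url.replace(loose, '$4'); }
function stripStrict(url) { return url.replace(strict, '$1'); }

console.log(stripLoose('https://www.google.ca'));   // google.ca
console.log(stripStrict('https://www.google.ca'));  // google.ca
console.log(stripLoose('www2.example.com'));        // .example.com
console.log(stripStrict('www2.example.com'));       // www2.example.com
```

Neither variant touches a trailing slash, so that part of the question still needs its own step.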


1 Comment

RachelD Over a year ago

Thank you. I had something very similar to this, but it wasn't doing what I wanted, so I thought I was on the wrong trail. I will play with this more.
