PHP Parse URL - Domain returned as path when protocol prefix not present

Question

I am trying to parse URLs in PHP, where the input could be any of the following:

Code:

$info = parse_url('http://www.domainname.com/');
print_r($info);

$info = parse_url('www.domain.com');
print_r($info);

$info = parse_url('/test/');
print_r($info);

$info = parse_url('test.php');
print_r($info);

Returns:

Array
(
    [scheme] => http
    [host] => www.domainname.com
    [path] => /
)
Array
(
    [path] => www.domain.com
)
Array
(
    [path] => /test/
)
Array
(
    [path] => test.php
)

The problem, as you can see, is the second example, where the domain is returned as a path.

Taha Paksu · Accepted Answer · 2012-04-28 00:29:02Z

12

This gives the right results, but a file path needs to start with a slash:

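parse('http://www.domainname.com/');
parse('www.domain.com');
parse('/test/');
parse("/file.php");

function parse($url){
    // No "://" and no leading "/": assume a bare host name and prepend "http://"
    if (strpos($url, "://") === false && substr($url, 0, 1) != "/") {
        $url = "http://" . $url;
    }
    $info = parse_url($url);
    // parse_url() returns false for seriously malformed URLs
    if ($info) {
        print_r($info);
    }
}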

and the result is:

Array
(
    [scheme] => http
    [host] => www.domainname.com
    [path] => /
)
Array
(
    [scheme] => http
    [host] => www.domain.com
)
Array
(
    [path] => /test/
)
Array
(
    [path] => /file.php
)

edited Apr 28, 2012 at 0:29

answered Apr 28, 2012 at 0:18

Taha Paksu


5 Comments

Matt Over a year ago

Just a quick one: how can I differentiate between a file name and a domain name, so I know when to prepend the leading slash?

Taha Paksu Over a year ago

Check whether it starts with "www", but that may not be reliable. Checking its extension would be better, if you know all the possible file extensions. Counting the dots won't be reliable either.
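A minimal sketch of the extension check suggested in the comment above, assuming you can enumerate the file extensions you care about. The function name `looksLikeFile()` and the default extension list are hypothetical, and anything not on the list (".com", ".net" and so on) is treated as a host. Like any extension check it is a heuristic: "www.domain.com/index.php" ends in ".php" and would be classed as a file.

```php
<?php
// Hypothetical helper: guess whether a scheme-less link names a file rather than a host.
// The extension list is an assumption; maintain one that matches the pages you scan.
function looksLikeFile(string $link, array $fileExtensions = ['php', 'html', 'htm', 'asp', 'aspx']): bool
{
    // Look only at the path part, so "page.php?id=3" is judged by ".php"
    $path = parse_url($link, PHP_URL_PATH);
    if (!is_string($path)) {
        return false; // no path, or parse_url() rejected the string
    }
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return in_array($extension, $fileExtensions, true);
}

var_dump(looksLikeFile('test.php'));       // bool(true)
var_dump(looksLikeFile('www.domain.com')); // bool(false)
var_dump(looksLikeFile('page.php?id=3'));  // bool(true)
```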

Matt Over a year ago

Well, my code is scanning a page for links, so there's no guarantee a link will have "www", another subdomain, or neither. It's a mammoth task if I need to check for all TLDs!

Taha Paksu Over a year ago

If you are fetching URLs from anchors in a web page, there are three possibilities. First, remote URLs, which start with a scheme such as "http://" or "https://". Second, root-relative URLs, which always start with "/". Third, URLs relative to the current path, which start directly with the path or file name. You won't run into "www.yourdomain.com"-style URLs in anchors, and if you did, the browser would resolve them as relative paths anyway.

Taha Paksu Over a year ago

Two more possibilities: first, in-page anchors, which start with "#"; second, "javascript:" hrefs.
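The href categories from these comments can be sketched as a small classifier. This is an illustration, not code from the thread: the function name `classifyHref()` and its return labels are hypothetical, and it also separates out protocol-relative "//host/path" links, which the comments do not mention. The scheme test is a simple prefix match, so something like "localhost:8080" would be reported as absolute.

```php
<?php
// Hypothetical helper that sorts an href value into the cases described above,
// plus protocol-relative "//host/path" links.
function classifyHref(string $href): string
{
    $href = trim($href);
    if (substr($href, 0, 1) === '#') {
        return 'fragment';          // in-page anchor, e.g. "#top"
    }
    if (stripos($href, 'javascript:') === 0) {
        return 'javascript';        // "javascript:" action (checked before the generic scheme test)
    }
    if (substr($href, 0, 2) === '//') {
        return 'protocol-relative'; // reuses the current page's scheme
    }
    if (substr($href, 0, 1) === '/') {
        return 'root-relative';     // relative to the site root, e.g. "/test/"
    }
    if (preg_match('/^[a-z][a-z0-9+.\-]*:/i', $href)) {
        return 'absolute';          // has a scheme, e.g. "http:", "https:", "mailto:"
    }
    return 'relative';              // relative to the current path, e.g. "test.php"
}

foreach (['http://example.com/', '/test/', 'test.php', '#top', 'javascript:void(0)'] as $href) {
    echo $href, ' => ', classifyHref($href), "\n";
}
```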

Courtney Miles · Accepted Answer · 2017-11-13 06:15:54Z

0

To handle a URL in a way that preserves the fact that it was a scheme-less URL, whilst also allowing the domain to be identified, use the following code:

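// Prepend "//" unless the URL starts with a scheme (e.g. "http:") or with "/".
// The outer group makes "^" anchor both alternatives; the "i" flag accepts upper-case schemes.
if (!preg_match('/^(([a-z][a-z0-9\-\.\+]*:)|(\/))/i', $url)) {
    $url = '//' . $url;
}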

So this will prepend "//" to the URL only if the URL does not start with a valid scheme and does not begin with "/".
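A minimal sketch that wraps this check in a helper and runs it on inputs from the question. The name `parseUrlLoose()` is hypothetical. Note that the pattern groups both alternatives under "^", so a "/" later in the string (as in "www.domain.com/test/") does not count as a leading slash. Like the accepted answer, it cannot tell a bare file name from a bare domain: "test.php" also comes back as a host.

```php
<?php
// Hypothetical helper: prepend "//" to scheme-less, slash-less input, then call parse_url().
function parseUrlLoose(string $url)
{
    // "^" applies to both alternatives thanks to the outer group
    if (!preg_match('/^(([a-z][a-z0-9\-\.\+]*:)|(\/))/i', $url)) {
        $url = '//' . $url;
    }
    return parse_url($url); // array of components, or false if parse_url() rejects it
}

print_r(parseUrlLoose('www.domain.com'));       // [host] => www.domain.com
print_r(parseUrlLoose('www.domain.com/test/')); // [host] => www.domain.com, [path] => /test/
print_r(parseUrlLoose('test.php'));             // [host] => test.php (file name read as a host)
```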

Some quick background on this:

The parser assumes that (valid) characters before ":" are the scheme, whilst characters following "//" are the domain. To indicate that the URL has both a scheme and a domain, the two markers must be used consecutively, "://". For example:

[scheme]:[path//path]
//[domain][/path]
[scheme]://[domain][/path]
[/path]
[path]
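To see these forms in practice, here is a short script that runs parse_url() on one made-up example of each; the URLs are illustrative, with "mailto:user@example.com" standing in for the [scheme]:[path] form.

```php
<?php
// One illustrative input per form listed above; print_r() shows which components parse_url() fills in.
$examples = [
    'mailto:user@example.com', // [scheme]:[path]
    '//example.com/a/b',       // //[domain][/path]
    'http://example.com/a/b',  // [scheme]://[domain][/path]
    '/a/b',                    // [/path]
    'a/b',                     // [path]
];
foreach ($examples as $url) {
    echo $url, "\n";
    print_r(parse_url($url));
}
```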

This is how PHP's parse_url() parses URLs, but I couldn't say whether it conforms to the standard.

The rule for a valid scheme name is: alpha *( alpha | digit | "+" | "-" | "." )
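That grammar translates directly into an anchored regular expression. A minimal sketch with a hypothetical helper name, `isValidScheme()`; it matches case-insensitively because scheme names are case-insensitive.

```php
<?php
// Hypothetical helper: true if $scheme is alpha *( alpha | digit | "+" | "-" | "." ).
function isValidScheme(string $scheme): bool
{
    // \A and \z anchor the whole string; unlike "$", "\z" does not accept a trailing newline
    return preg_match('/\A[a-z][a-z0-9+\-.]*\z/i', $scheme) === 1;
}

var_dump(isValidScheme('svn+ssh')); // bool(true)
var_dump(isValidScheme('1http'));   // bool(false)
```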

answered Nov 13, 2017 at 6:15

Courtney Miles


2 Comments

Shardj Over a year ago

preg_match(): Unknown modifier ')'

Courtney Miles Over a year ago

@Shardj I'm afraid I can't replicate the error you have reported. Perhaps double-check that you have copied the expression correctly. I suspect you have (/) in the expression instead of (\/).
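That error is what the "/" delimiter produces when a slash inside the pattern is left unescaped: PHP treats the slash as the end of the pattern and reads the remaining characters, starting with ")", as modifiers. One way to sidestep the escaping entirely (a swapped-in alternative, not the answer's own code) is a different delimiter such as "#". The helper name `hasSchemeOrLeadingSlash()` is hypothetical; the outer group ensures "^" applies to both alternatives, since in "^a|b" the anchor binds only to "a".

```php
<?php
// Hypothetical helper using "#" as the delimiter, so "/" inside the pattern needs no escaping.
function hasSchemeOrLeadingSlash(string $url): bool
{
    // Outer group: "^" anchors both the scheme alternative and the "/" alternative
    return preg_match('#^(([a-z][a-z0-9\-\.\+]*:)|(/))#i', $url) === 1;
}

var_dump(hasSchemeOrLeadingSlash('http://www.domainname.com/')); // bool(true)
var_dump(hasSchemeOrLeadingSlash('www.domain.com/test/'));       // bool(false)
```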
