
How can I parse the domain from a URL in PHP? It seems that I need a database of country-code domains.

Examples:

http://mail.google.com/hfjdhfjd/jhfjd.html -> google.com
http://www.google.bg/jhdjhf/djfhj.html -> google.bg
http://www.google.co.uk/djhdjhf.php -> google.co.uk
http://www.tsk.tr/jhjgc.aspx -> tsk.tr
http://subsub.sub.nic.tr/ -> nic.tr
http://subsub.sub.google.com.tr -> google.com.tr
http://subsub.sub.itoy.info.tr -> itoy.info.tr

Can it be done with a whois request?

Edit: There are a few domain names directly under .tr (www.nic.tr, www.tsk.tr); the others are, as you know, of the form www.something.com.tr or www.something.org.tr.

Also, there is no www.something.com.bg or www.something.org.bg. They are www.something.bg, like the Germans' .de.

But there are www.something.a.bg, www.something.b.bg, and thus a.bg, b.bg, c.bg and so on. (a.bg is like co.uk.)

There must be a list of these suffixes somewhere on the net.

Check how the URL http://www.agrotehnika97.a.bg/ is coloured in Internet Explorer. Also check:

www.google.co.uk
www.google.com.tr
www.nic.tr
www.tsk.tr
2 Comments

  • Note that co.uk, com.tr and info.tr are themselves completely valid domain/host names, and none of them is a top-level domain. As such, google in google.co.uk is just a subdomain of co.uk. Given that you can combine nearly anything freely, you probably won't be able to build a complete table for that. Commented Feb 24, 2010 at 17:19
  • @poke, I saw the list on a web site; Firefox was/is using that site's list, but I don't remember which one it was. Commented Feb 24, 2010 at 17:27

4 Answers


The domain is stored in $_SERVER['HTTP_HOST'].

EDIT: I believe this returns the whole host name. To get just the base (registrable) domain, you could do this:

// List all second-level names that act as part of the TLD here (e.g. 'co.cc' or 'co.uk').
// Use the last part ('cc' and 'uk' in the above examples) as the array key, and the first part as a sub-array element for that key.
$allowed_subdomains = array(
    'cc'    => array(
        'co'
    ),
    'uk'    => array(
        'co'
    )
);

$domain = $_SERVER['HTTP_HOST'];
$parts = explode('.', $domain);
$top_level = array_pop($parts);

// Take care of allowed subdomains
if (isset($allowed_subdomains[$top_level]))
{
    if (in_array(end($parts), $allowed_subdomains[$top_level]))
        $top_level = array_pop($parts).'.'.$top_level;
}

$top_level = array_pop($parts).'.'.$top_level;
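To try the same logic outside a web request (where $_SERVER['HTTP_HOST'] isn't available), it can be wrapped in a small function. A sketch, assuming the answer's approach; the function name base_domain and the sample hosts are mine:

```php
<?php

// Hypothetical wrapper around the answer's logic, taking the host as a parameter.
function base_domain($host, array $allowed_subdomains)
{
    $parts = explode('.', $host);
    $top_level = array_pop($parts);

    // Treat listed second-level names (e.g. 'co' under 'uk') as part of the TLD
    if (isset($allowed_subdomains[$top_level])
        && in_array(end($parts), $allowed_subdomains[$top_level])) {
        $top_level = array_pop($parts) . '.' . $top_level;
    }

    // Prepend the registered name itself
    return array_pop($parts) . '.' . $top_level;
}

$allowed_subdomains = ['uk' => ['co'], 'cc' => ['co']];

echo base_domain('www.google.co.uk', $allowed_subdomains), "\n"; // google.co.uk
echo base_domain('mail.google.com', $allowed_subdomains), "\n";  // google.com
```

Note that this still only handles the suffixes you list by hand; anything not in $allowed_subdomains (com.tr, a.bg, ...) will be split incorrectly.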

5 Comments

This isn't quite what ilhan is after.
It still doesn't work even after the edit ;-). It does not deal with the google.co.uk case, as this would return 'co.uk'.
Well, how should a computer know whether a domain is meant to be a subdomain or not, without manually adding all the exceptions like co.uk, co.cc and so on? I'll try to edit mine, though.
Better now? I added an option to manually specify these exceptions.
I guess you can use the list at publicsuffix.org/list and convert it to this format (I'd suggest using a script) ;)

You can use parse_url() to split it up and get what you want. Here's an example...

    $url = 'http://www.google.com/search?hl=en&source=hp&q=google&btnG=Google+Search&meta=lr%3D&aq=&oq=dasd';
    print_r(parse_url($url));

Will echo...

Array
(
    [scheme] => http
    [host] => www.google.com
    [path] => /search
    [query] => hl=en&source=hp&q=google&btnG=Google+Search&meta=lr%3D&aq=&oq=dasd
)

2 Comments

I made the same mistake in the beginning. He only wants google.com, though.
I see. Fair enough -- he can preg_match() to get the rest. Assuming $url_split is the parsed URL -- this can be done with... preg_match('/www\.?([\w\-\.]+)([a-z\.]+)/i', $url_split['host'], $matches) -- he can then use $matches[1].$matches[2] to fetch the host without the first domain. Problem with this though, is you can never predict how far the subdomain goes -- it could be sub1.sub2.domain.co.uk -- this would fetch sub2.domain.co.uk, not domain.co.uk

I reckon you'll need a list of all suffixes used after a domain name. http://publicsuffix.org/list/ provides an up-to-date (or so they claim) list of all suffixes currently in use; the raw list is linked from that page.

The idea would be for you to parse that list into a structure, with the levels split by the dot, starting from the last level:

so for instance, for the domains com.la, com.tr, com.lc

you'd end up with:

[la] => [com]
[tr] => [com]
[lc] => [com]

etc...

Then you'd get the host from the URL (using parse_url()), explode it by dots, and start matching the parts against your structure, starting with the last one:

so for google.com.tr you'd start by matching tr, then com; then you won't find a match once you get to google, which is what you want...
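The matching walk described above could be sketched like this, using a tiny hand-written rule set instead of the full publicsuffix.org list (the rule subset and the function name registrable_domain are mine; real list handling also needs the wildcard and exception rules, which this ignores):

```php
<?php

// Hypothetical nested structure parsed from a few Public Suffix List entries:
// com, com.tr, org.tr, co.uk, a.bg, b.bg
$rules = [
    'com' => [],
    'tr'  => ['com' => [], 'org' => []],
    'uk'  => ['co' => []],
    'bg'  => ['a' => [], 'b' => []],
];

function registrable_domain($host, array $rules)
{
    // Walk the labels from the end: ['tr', 'com', 'google', 'sub', ...]
    $parts = array_reverse(explode('.', $host));
    $result = [];
    $node = $rules;

    foreach ($parts as $label) {
        $result[] = $label;
        if (!isset($node[$label])) {
            // First label with no match is the registered name;
            // everything matched before it was the public suffix.
            return implode('.', array_reverse($result));
        }
        $node = $node[$label];
    }

    return null; // the host itself is a public suffix
}

echo registrable_domain('subsub.sub.google.com.tr', $rules), "\n"; // google.com.tr
echo registrable_domain('mail.google.com', $rules), "\n";          // google.com
```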

Comments


Regex and parse_url() aren't the solution for you.

You need a package that uses the Public Suffix List; only that way can you correctly extract domains with second- and third-level public suffixes (co.uk, a.bg, b.bg, etc.). I recommend using TLD Extract.

Here's an example:

$extract = new LayerShifter\TLDExtract\Extract();

$result = $extract->parse('http://subsub.sub.google.com.tr');
$result->getRegistrableDomain(); // will return (string) 'google.com.tr'

Comments
