How to Extract Top-Level Domain Name (TLD) from URL

No, there is no intrinsic way of knowing that, for example, zap.co.it is a subdomain (because Italy's registrar DOES sell domains such as co.it) while zap.co.uk isn't (because the UK's registrar DOESN'T sell co.uk itself, only names like zap.co.uk).

You'll just have to use an auxiliary table (or an online source) that tells you which TLDs behave peculiarly, like the UK's and Australia's. There's no way of divining that just by staring at the string without such extra semantic knowledge. Of course, the rules can change over time, but a good online source will (one hopes!) change accordingly; the Public Suffix List (publicsuffix.org) is the one most commonly used for this.
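A minimal sketch of the auxiliary-table approach, written in Python. The SUFFIXES set is a tiny, hand-picked, hypothetical excerpt standing in for a real table such as the Public Suffix List, and the function names are illustrative only. The real list also has wildcard and exception rules that this sketch ignores.

```python
from typing import Optional

# Hypothetical, hand-picked excerpt of a suffix table. A real table such as the
# Public Suffix List has thousands of entries plus wildcard and exception rules,
# none of which this sketch handles.
SUFFIXES = {"com", "it", "uk", "co.uk", "au", "com.au"}

def public_suffix(host: str) -> Optional[str]:
    """Return the longest suffix of host found in SUFFIXES, or None."""
    labels = host.lower().rstrip(".").split(".")
    # Try the longest candidate first: "zap.co.uk", then "co.uk", then "uk".
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in SUFFIXES:
            return candidate
    return None

def registrable_domain(host: str) -> Optional[str]:
    """Return the public suffix plus the label just before it, e.g. zap.co.uk."""
    suffix = public_suffix(host)
    if suffix is None:
        return None
    labels = host.lower().rstrip(".").split(".")
    n = suffix.count(".") + 1  # number of labels in the suffix
    if len(labels) <= n:
        return None  # the host is itself a public suffix
    return ".".join(labels[-(n + 1):])

print(registrable_domain("zap.co.uk"))  # zap.co.uk
print(registrable_domain("zap.co.it"))  # co.it  (so zap is a subdomain)
```

Trying the longest candidate first is what lets co.uk win over uk; with co.it absent from the table, zap.co.it falls back to the plain it suffix, matching the behaviour described above.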

Extract top level domain from URL

You can try something like this:

((?<![^\/]\/)\b\w+\.\b\w{2,3}(?:\.\b\w{2})?)(?:$|\/)

Breaking Down the Pattern:

  • (?<![^\/]\/) ensures that the match is not preceded by a single slash (since index.php in /index.php looks like a domain), but allows it to be preceded by a double slash (as in https://).
  • \b\w+\. matches the main domain: the word boundary on the left ensures a whole word is matched, and a dot is required on the right. (Without the \b, the pattern would still match ndex.php, everything but the i in /index.php, because the lookbehind only checks the two characters directly before the match.)
  • \b\w{2,3} matches the top-level domain (the com in .com). Note that this limits it to two or three characters, so TLDs such as .info are not matched.
  • (?:\.\b\w{2})? optionally matches a two-letter country-code TLD, if present (the .uk in .co.uk).
  • (?:$|\/) requires that the entire match be followed by either the end of the string ($) or a forward slash (\/).
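To try the capture-group pattern outside a regex tester, here is a small Python sketch using the standard re module; find_domain is a hypothetical helper name, and the pattern is copied unchanged.

```python
import re
from typing import Optional

# The capture-group pattern from the answer above; group 1 holds the domain.
DOMAIN_RE = re.compile(r"((?<![^\/]\/)\b\w+\.\b\w{2,3}(?:\.\b\w{2})?)(?:$|\/)")

def find_domain(url: str) -> Optional[str]:
    """Return the first domain-like match in url, or None if nothing matches."""
    match = DOMAIN_RE.search(url)
    return match.group(1) if match else None

print(find_domain("https://www.example.com/index.php"))  # example.com
print(find_domain("http://example.co.uk/path"))          # example.co.uk
print(find_domain("www.example.info"))                   # None (.info has 4 letters)
```

Note that www is not part of the result: the \b\w+\. part only succeeds where the rest of the pattern can also match.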

An alternative that uses a lookahead instead of a capture group:

(?<![^\/]\/)\b\w+\.\b\w{2,3}(?:\.\b\w{2})?(?=$|\/)

Essentially, you remove the capturing group and replace the non-capturing group at the end, (?:$|\/), with a positive lookahead, (?=$|\/). The trailing slash is then only checked, not consumed, so the whole match is the domain itself.
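The difference between the two versions shows up in the whole match (group 0). A quick Python comparison, with both patterns copied unchanged and an arbitrary example URL:

```python
import re

CAPTURE_RE = re.compile(r"((?<![^\/]\/)\b\w+\.\b\w{2,3}(?:\.\b\w{2})?)(?:$|\/)")
LOOKAHEAD_RE = re.compile(r"(?<![^\/]\/)\b\w+\.\b\w{2,3}(?:\.\b\w{2})?(?=$|\/)")

url = "http://example.co.uk/path"
print(CAPTURE_RE.search(url).group(0))    # example.co.uk/  (trailing slash consumed)
print(LOOKAHEAD_RE.search(url).group(0))  # example.co.uk   (slash only checked)
```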

Get .tld from URL via PHP

Use the parse_url() function to get the host part of the URL, then explode it on "." and take the last element of the resulting array.

Example below:

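$url = 'http://www.example.com/site';
// end() takes its argument by reference, so store the array in a variable first
$parts = explode('.', parse_url($url, PHP_URL_HOST)); // ['www', 'example', 'com']
echo end($parts); // echoes "com"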

Before that, it is a good idea to check that $url is actually a valid URL, for example with filter_var($url, FILTER_VALIDATE_URL).
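A rough Python counterpart of the parse_url() plus explode() approach, assuming the standard library's urllib.parse in place of the PHP functions; last_label is a hypothetical name, and its hostname check is only a weak stand-in for filter_var() validation. Like the PHP version, it returns just the last label.

```python
from typing import Optional
from urllib.parse import urlsplit

def last_label(url: str) -> Optional[str]:
    """Return the last dot-separated label of the URL's host, or None."""
    # .hostname drops any port or credentials and lower-cases the host;
    # it is None when the string has no "//host" part (e.g. no scheme).
    host = urlsplit(url).hostname
    if not host:
        return None  # only a weak stand-in for filter_var() validation
    return host.rstrip(".").rsplit(".", 1)[-1]

print(last_label("http://www.example.com/site"))  # com
print(last_label("https://badgers.co.uk:8080/"))  # uk, not co.uk
print(last_label("www.example.com"))              # None: no scheme, so no host
```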

EDIT:

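$url = 'http://' . $_SERVER['SERVER_NAME'];
// end() needs a variable, not a function result, because it takes an array by reference
$parts = explode('.', parse_url($url, PHP_URL_HOST));
echo end($parts);
// echoes "com" for a server name like www.example.com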

Extract Top Level Domain from Domain name

Updated to incorporate Traxo's point about the . wildcard (an unescaped . matches any character in a regex). I think my answer is a little fuller, so I'll leave it up, but we've both essentially come to the same solution.

//set up test variables
$aTLDList = ['ag', 'asia', 'asia_sunrise', 'com', 'com.ag', 'org.hn'];
$sDomain = "badgers.co.uk"; // for example

//build the match: escape each "." so it matches a literal dot, not any character
$reMatch = '/^.*?\.(' . str_replace('.', '\.', implode('|', $aTLDList)) . ')$/';

//$aMatches[1] holds the matched TLD; here the result is "" because neither uk nor co.uk is in the list
$sMatchedTLD = preg_match($reMatch, $sDomain, $aMatches) ? $aMatches[1] : "";

Resorting to regular expressions may be overkill, but it makes for a concise example. This will give you either the matched TLD or an empty string in the $sMatchedTLD variable.

The trick is to make the leading .* match lazy (.*?); otherwise badgers.com.ag would match ag rather than com.ag.
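To see why the lazy quantifier matters, here is the same list-based match in Python, run with both the lazy and the greedy form. re.escape plays the role of the str_replace escaping, and matched_tld is a hypothetical helper name.

```python
import re

# TLD list copied from the PHP example; re.escape turns "com.ag" into a literal
# match instead of letting "." match any character.
TLD_LIST = ["ag", "asia", "asia_sunrise", "com", "com.ag", "org.hn"]
ALTERNATION = "|".join(re.escape(tld) for tld in TLD_LIST)

LAZY_RE = re.compile(r"^.*?\.(" + ALTERNATION + r")$")   # .*? as in the answer
GREEDY_RE = re.compile(r"^.*\.(" + ALTERNATION + r")$")  # greedy, for comparison

def matched_tld(pattern: re.Pattern, domain: str) -> str:
    """Return the matched TLD from group 1, or "" when nothing matches."""
    match = pattern.match(domain)
    return match.group(1) if match else ""

print(matched_tld(LAZY_RE, "badgers.com.ag"))       # com.ag
print(matched_tld(GREEDY_RE, "badgers.com.ag"))     # ag
print(repr(matched_tld(LAZY_RE, "badgers.co.uk")))  # ''
```

The greedy .* first swallows the whole string and backs off only as far as the last dot, so ag wins; the lazy .*? stops at the first dot that lets the rest of the pattern succeed.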

Regex to extract the top level domain from a URL

You've mixed up the terms a bit. A TLD (top-level domain) is the last segment of a domain name, the part that follows the final dot (e.g. .com, .net).

What you're searching for is the second-level domain (SLD).

I've adapted Daveo's answer to your question so that the match is returned in the first capture group:

(?:[-a-zA-Z0-9@:%_\+~.#=]{2,256}\.)?([-a-zA-Z0-9@:%_\+~#=]*)\.[a-z]{2,6}\b(?:[-a-zA-Z0-9@:%_\+.~#?&\/\/=]*)

Here is a demo: https://regex101.com/r/x2luiO/1

Explanation:

  • (?:[-a-zA-Z0-9@:%_\+~.#=]{2,256}\.)? - This first part matches everything before your SLD (the subdomains).
  • ([-a-zA-Z0-9@:%_\+~#=]*) - This is your capturing group, where the domain is returned.
  • \.[a-z]{2,6} - This matches the TLD (wrap it in parentheses if you also want to capture it).
  • \b(?:[-a-zA-Z0-9@:%_\+.~#?&\/\/=]*) - This is the rest of the regex, which matches the port and/or the rest of the URL (/example/page/).

It's also worth pointing out that this regex does not handle domains ending in an SLD plus ccTLD (country-code TLD) combination, such as .co.uk or .co.it. Both are common endings for commercial and general websites, but the regex still matches them and returns co as the SLD (for www.google.co.uk you get co rather than google).
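The pattern can also be tried in Python with re.search, reading group 1; sld is a hypothetical helper name, and the pattern is only split across two string literals for readability. The second call illustrates the .co.uk limitation just described.

```python
import re
from typing import Optional

# The SLD pattern from the answer above, split over two lines; group 1 is the SLD.
SLD_RE = re.compile(
    r"(?:[-a-zA-Z0-9@:%_\+~.#=]{2,256}\.)?([-a-zA-Z0-9@:%_\+~#=]*)"
    r"\.[a-z]{2,6}\b(?:[-a-zA-Z0-9@:%_\+.~#?&\/\/=]*)"
)

def sld(url: str) -> Optional[str]:
    """Return group 1 of the first match, or None."""
    match = SLD_RE.search(url)
    return match.group(1) if match else None

print(sld("https://www.google.com/example/page/"))  # google
print(sld("www.google.co.uk"))                       # co, the limitation above
```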


