Get Domain Name from Full Url

How to get domain name from URL

I once had to write such a regex for a company I worked for. The solution was this:

  • Get a list of every ccTLD and gTLD available. Your first stop should be IANA. The list from Mozilla looks great at first sight but lacks ac.uk, for example, so it is not really usable for this.
  • Join the list as in the example below. A warning: ordering is important! If org.uk appeared after uk, then example.org.uk would match org instead of example.

Example regex:

([^.]+)\.(com|net|org|info|coop|int|co\.uk|org\.uk|ac\.uk|uk|__and so on__)$

This worked really well and also matched weird, unofficial top-levels like de.com and friends.

The upside:

  • Very fast if regex is optimally ordered

The downside of this solution is of course:

  • Handwritten regex which has to be updated manually if ccTLDs change or get added. Tedious job!
  • Very large regex so not very readable.
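The steps above can be sketched in Python, chosen here only for illustration since the answer does not name a language. The suffix list is a tiny, hypothetical sample rather than the full IANA data, and the names build_domain_regex and registrable_domain are made up for this example. Sorting the suffixes longest-first takes care of the ordering warning automatically.

```python
import re

# Hypothetical, deliberately tiny suffix list; a real one would be far longer.
SUFFIXES = ["com", "net", "org", "uk", "co.uk", "org.uk", "ac.uk", "de.com"]


def build_domain_regex(suffixes):
    """Build a regex that captures the label directly before a known suffix."""
    # Longest suffixes first, so "org.uk" is tried before "uk".
    ordered = sorted(suffixes, key=len, reverse=True)
    alternation = "|".join(re.escape(s) for s in ordered)
    return re.compile(r"(?:^|\.)([^.]+)\.(" + alternation + r")$", re.IGNORECASE)


DOMAIN_RE = build_domain_regex(SUFFIXES)


def registrable_domain(hostname):
    """Return 'label.suffix' for a hostname, or None if no suffix matches."""
    match = DOMAIN_RE.search(hostname)
    if match is None:
        return None
    return match.group(1) + "." + match.group(2)
```

Generating the pattern from a list avoids hand-editing the regex, although the list itself still has to be kept up to date.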

Get The Current Domain Name With Javascript (Not the path, etc.)

How about:

window.location.hostname

The location object has a number of properties referring to different parts of the URL, such as protocol, host, hostname, port, pathname, search, and hash.

Get domain name from full URL

Check the code below; it should do the job.

<?php

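function get_domain($url)
{
    $pieces = parse_url($url);
    // Without a scheme, parse_url() puts a bare host name into 'path'.
    $domain = isset($pieces['host']) ? $pieces['host'] : $pieces['path'];
    // Match a single label followed by a suffix of 2 to 6 letters or dots.
    if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
        return $regs['domain'];
    }
    return false;
}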

print get_domain("http://mail.somedomain.co.uk"); // outputs 'somedomain.co.uk'

?>

Get domain name from given url

If you want to parse a URL, use java.net.URI. java.net.URL has a bunch of problems -- its equals method does a DNS lookup which means code using it can be vulnerable to denial of service attacks when used with untrusted inputs.

"Mr. Gosling -- why did you make url equals suck?" explains one such problem. Just get in the habit of using java.net.URI instead.

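public static String getDomainName(String url) throws URISyntaxException {
    URI uri = new URI(url);
    String domain = uri.getHost();
    // getHost() returns null when the URI has no host, e.g. for a relative reference.
    if (domain == null) {
        return null;
    }
    return domain.startsWith("www.") ? domain.substring(4) : domain;
}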

should do what you want.


Though it seems to work fine, is there any better approach, or are there some edge cases that could fail?

Your code as written fails for the valid URLs:

  • httpfoo/bar -- relative URL with a path component that starts with http.
  • HTTP://example.com/ -- protocol is case-insensitive.
  • //example.com/ -- protocol relative URL with a host
  • www/foo -- a relative URL with a path component that starts with www
  • wwwexample.com -- domain name that does not start with www. (with the dot) but does start with www

Hierarchical URLs have a complex grammar. If you try to roll your own parser without carefully reading RFC 3986, you will probably get it wrong. Just use the one that's built into the core libraries.
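As a rough illustration of leaning on a built-in parser, here is how the inputs listed above fare with Python's urllib.parse. Python stands in for java.net.URI purely as an assumption for this sketch, and host_of is a hypothetical helper name.

```python
from urllib.parse import urlsplit

# The inputs from the list above.
samples = [
    "httpfoo/bar",
    "HTTP://example.com/",
    "//example.com/",
    "www/foo",
    "wwwexample.com",
]


def host_of(url):
    """Return the host component, or None when the reference has no authority."""
    return urlsplit(url).hostname


for url in samples:
    print(repr(url), "->", host_of(url))
```

Only the two inputs that carry an authority part (the ones containing //) yield a host; the others are treated as relative paths, which is the distinction a hand-rolled prefix check tends to miss.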

If you really need to deal with messy inputs that java.net.URI rejects, see RFC 3986 Appendix B:

Appendix B. Parsing a URI Reference with a Regular Expression


As the "first-match-wins" algorithm is identical to the "greedy"
disambiguation method used by POSIX regular expressions, it is
natural and commonplace to use a regular expression for parsing the
potential five components of a URI reference.

The following line is the regular expression for breaking-down a
well-formed URI reference into its components.

  ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
   12            3  4          5       6  7        8 9

The numbers in the second line above are only to assist readability;
they indicate the reference points for each subexpression (i.e., each
paired parenthesis).
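A minimal Python sketch of applying that expression, assuming the group numbering described in the quoted text (2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment). The name split_uri is hypothetical.

```python
import re

# The regular expression from RFC 3986, Appendix B, unchanged.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(reference):
    """Split a URI reference into the five components named by the RFC."""
    # Every part of the pattern is optional, so the match always succeeds.
    m = URI_RE.match(reference)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }
```

Components that are absent come back as None, which keeps "no query" distinguishable from "empty query".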

get domain name only in not full path

tricky way :)

$url_p = parse_url($url);

if (!isset($url_p['scheme'])) {
    $url = 'http://' . $url;
    $url_p = parse_url($url);
}
echo $url_p['host'];
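The same trick can be sketched in Python. This version is an assumption-laden variant rather than a translation: it checks for a missing authority (netloc) instead of a missing scheme, because urlsplit may read the part before a port colon as a scheme. The function name and the default_scheme parameter are hypothetical.

```python
from urllib.parse import urlsplit


def host_with_default_scheme(url, default_scheme="http"):
    """Return the host of url, prepending a scheme when none was given."""
    parts = urlsplit(url)
    if not parts.netloc:
        # No authority found: assume the input was a bare host and retry.
        parts = urlsplit(default_scheme + "://" + url)
    return parts.hostname
```

Inputs that legitimately have no authority, such as mailto: addresses, would be misread by this shortcut, so it only suits inputs known to be web addresses.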

how to get only domain name from given URL in php

What you are looking for is parse_url().

<?php

$url = parse_url("http://abc.news18.co.in");
//echo $url['host'];
echo preg_replace("/^([a-zA-Z0-9].*\.)?([a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z.]{2,})$/", '$2', $url['host']);

?>

Extract domain name only from url, getting rid of the path (Python)

You can do it using regex like this:

import re

text = 'http://supremecosts.com/contact-us/'

m = re.search(r'(https?://[^:/\n]+)', text)
if m:
    print(m.group(1))
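Note that the regex keeps the scheme in its capture. If only the host is wanted, a standard-library alternative might look like the sketch below; the variable names origin and host are chosen just for this example.

```python
from urllib.parse import urlsplit

text = 'http://supremecosts.com/contact-us/'
parts = urlsplit(text)

# Scheme plus authority: for this input, the same string the regex captures.
origin = parts.scheme + '://' + parts.netloc
# Host only, without scheme, port, or path.
host = parts.hostname

print(origin)
print(host)
```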


How to get domain name from URL with PHP?

Just use the built-in PHP function parse_url.

You can strip the subdomain from a hostname like this:

$url = 'http://subdomain.whatever.example.com/test.html';

$data = parse_url($url);

$host = $data['host'];

$hostname = explode(".", $host);
$domain = $hostname[count($hostname)-2] . "." . $hostname[count($hostname)-1];

print $domain;

Will output

example.com

If you have a URL with a port, parse_url will deal with it easily, for example:

$url = 'http://example.com:88/testing';

$data = parse_url($url);

print_r($data);

Will output

Array
(
    [scheme] => http
    [host] => example.com
    [port] => 88
    [path] => /testing
)

And below you check whether the host is a valid IPv4 address:

$url = 'http://188.123.44.12/test.php';

$data = parse_url($url);

print_r($data);

$hostIsIpAddress = ip2long($data['host']) !== false;

var_dump($hostIsIpAddress);

This outputs bool(true) when the host is an IP address and bool(false) otherwise.
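For comparison, a comparable check can be sketched in Python with the standard ipaddress module. Unlike ip2long, ipaddress.ip_address also accepts IPv6 literals, so this is a slightly broader test; host_is_ip_address is a hypothetical name.

```python
import ipaddress
from urllib.parse import urlsplit


def host_is_ip_address(url):
    """Return True when the host part of url is an IPv4 or IPv6 literal."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        # Not parseable as an IP address, so treat it as a name.
        return False
    return True
```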


