Extract hostname from string
There is no need to parse the string yourself; just pass your URL as an argument to the URL constructor:
const url = 'http://www.youtube.com/watch?v=ClkQA2Lb_iE';
const { hostname } = new URL(url);
console.assert(hostname === 'www.youtube.com');
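The URL constructor throws a TypeError when the string is not an absolute URL (and no base URL is given), so for untrusted input a small wrapper that returns null may be convenient. This is a minimal sketch; the helper name parseHostname is hypothetical. It also shows that hostname excludes the port while host includes it.

```javascript
// Hypothetical helper: returns the hostname, or null if the input is not a valid absolute URL.
function parseHostname(input) {
  try {
    return new URL(input).hostname;
  } catch (err) {
    // new URL() throws a TypeError for input it cannot parse
    if (err instanceof TypeError) return null;
    throw err;
  }
}

const u = new URL('http://www.youtube.com:8080/watch?v=ClkQA2Lb_iE');
console.log(u.host);     // "www.youtube.com:8080" (includes the port)
console.log(u.hostname); // "www.youtube.com"
console.log(parseHostname('www.youtube.com/watch')); // null (no scheme, so not an absolute URL)
```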
Extract host name/domain name from URL string
You can use the java.net.URI class to extract the hostname from the string.
Here is a method that extracts the hostname from a string and strips a leading "www.":
import java.net.URI;
import java.net.URISyntaxException;

public String getHostName(String url) throws URISyntaxException {
    URI uri = new URI(url);
    String hostname = uri.getHost();
    // getHost() returns null when the URI has no host; otherwise strip a leading "www."
    if (hostname != null) {
        return hostname.startsWith("www.") ? hostname.substring(4) : hostname;
    }
    return null;
}
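This returns the hostname with any leading "www." removed, so both http://hostname.com/... and http://www.hostname.com/... give hostname.com.
If the given URL has no host component (for example a string without a scheme, such as hostname.com/..., which URI parses as a relative path), getHost() returns null and so does the method. A string that is not a syntactically valid URI makes the constructor throw URISyntaxException.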
Extract hostname from a given domain string and remove extension and www
You can use the built-in URL.hostname property and strip a leading www.:
console.log(new URL('https://example.com/').hostname.replace(/^www\./, ''))
console.log(new URL('https://www.example.com/').hostname.replace(/^www\./, ''))
URL is supported in most browsers (see https://caniuse.com/?search=URL).
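The heading also asks to remove the extension, which the snippet above does not do. A naive sketch, assuming the "extension" is exactly one label (the hypothetical helper bareName simply drops the last label, so multi-part suffixes such as co.uk are not handled; that would need the Public Suffix List, which the URL API does not provide). The anchored /^www\./ removes "www." only at the start of the hostname.

```javascript
// Hypothetical helper: hostname without a leading "www." and without its last label.
// Naive assumption: the extension is one label, so "example.co.uk" gives "example.co".
function bareName(input) {
  // The anchored regex strips "www." only at the very start of the hostname.
  const host = new URL(input).hostname.replace(/^www\./, '');
  const labels = host.split('.');
  return labels.length > 1 ? labels.slice(0, -1).join('.') : host;
}

console.log(bareName('https://www.example.com/'));  // "example"
console.log(bareName('https://example.com/'));      // "example"
console.log(bareName('https://awww.example.com/')); // "awww.example"
```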
Get domain name from given url
If you want to parse a URL, use java.net.URI. java.net.URL has a bunch of problems: its equals method does a DNS lookup, which means code using it can be vulnerable to denial-of-service attacks when used with untrusted inputs.
"Mr. Gosling -- why did you make url equals suck?" explains one such problem. Just get in the habit of using java.net.URI instead.
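import java.net.URI;
import java.net.URISyntaxException;

public static String getDomainName(String url) throws URISyntaxException {
    URI uri = new URI(url);
    String domain = uri.getHost();
    // getHost() returns null for URIs without a host (relative URIs, mailto:, ...)
    if (domain == null) return null;
    return domain.startsWith("www.") ? domain.substring(4) : domain;
}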
should do what you want.
Though it seems to work fine, is there any better approach, or are there some edge cases that could fail?
Your code as written fails for these valid URLs:
- httpfoo/bar: a relative URL with a path component that starts with "http".
- HTTP://example.com/: the protocol is case-insensitive.
- //example.com/: a protocol-relative URL with a host.
- www/foo: a relative URL with a path component that starts with "www".
- wwwexample.com: a domain name that starts with "www" but not with "www.".
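The WHATWG URL parser built into browsers and Node.js handles these cases once relative references are resolved against a base URL. A sketch; the base http://base.example/ is an arbitrary assumption standing in for the current page's URL, and hostOf is a hypothetical helper.

```javascript
// Assumed base URL for resolving relative references (stands in for the current page).
const BASE = 'http://base.example/';

// Resolve the reference against BASE and report the resulting hostname.
function hostOf(ref) {
  return new URL(ref, BASE).hostname;
}

console.log(hostOf('httpfoo/bar'));            // "base.example" (relative path, host comes from the base)
console.log(hostOf('HTTP://example.com/'));    // "example.com" (scheme is case-insensitive)
console.log(hostOf('//example.com/'));         // "example.com" (scheme taken from the base)
console.log(hostOf('www/foo'));                // "base.example" (relative path)
console.log(hostOf('http://wwwexample.com/')); // "wwwexample.com"
```

Stripping a prefix is then a separate step on the parsed hostname, for example .replace(/^www\./, ''), which leaves wwwexample.com untouched.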
Hierarchical URLs have a complex grammar. If you try to roll your own parser without carefully reading RFC 3986, you will probably get it wrong. Just use the one that's built into the core libraries.
If you really need to deal with messy inputs that java.net.URI rejects, see RFC 3986 Appendix B:
Appendix B. Parsing a URI Reference with a Regular Expression
As the "first-match-wins" algorithm is identical to the "greedy"
disambiguation method used by POSIX regular expressions, it is
natural and commonplace to use a regular expression for parsing the
potential five components of a URI reference.

The following line is the regular expression for breaking-down a
well-formed URI reference into its components.

      ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
       12            3  4          5       6  7        8 9

The numbers in the second line above are only to assist readability;
they indicate the reference points for each subexpression (i.e., each
paired parenthesis).
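Here is a JavaScript sketch applying the Appendix B expression (the helper name splitUriReference is hypothetical). Per the RFC, group 2 is the scheme, 4 the authority, 5 the path, 7 the query and 9 the fragment. Note that the authority may still contain userinfo and a port, so it is not always just the host.

```javascript
// RFC 3986 Appendix B regular expression; '/' is escaped only for JS regex-literal syntax.
const URI_REF = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Split a URI reference into its five components; absent ones come back as undefined.
function splitUriReference(ref) {
  const m = URI_REF.exec(ref); // always matches, since every part is optional
  return {
    scheme: m[2],
    authority: m[4], // userinfo@host:port, not just the host
    path: m[5],
    query: m[7],
    fragment: m[9],
  };
}

console.log(splitUriReference('http://www.ics.uci.edu/pub/ietf/uri/#Related'));
// scheme "http", authority "www.ics.uci.edu", path "/pub/ietf/uri/", query undefined, fragment "Related"
```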
Java regex to extract host name and domain name from a URL
You may use this regex with optional matches and capture groups:
^(?:\w+://)?((?:[^./?#]+\.)?([^/?#]+))
RegEx Details:
- ^ : start of the string
- (?:\w+://)? : optionally match a scheme name followed by ://
- ( : start capture group #1
- (?:[^./?#]+\.)? : optionally match the first label of the domain name, using a non-capture group
- ([^/?#]+) : match 1+ of any character that is not /, ? or # in capture group #2
- ) : end capture group #1
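A JavaScript sketch of the same pattern (the helper name hostParts is hypothetical). Because the optional first-label group is greedy, a two-label host such as example.com leaves only com in group #2, so the pattern effectively assumes hosts with a leading label like www. A port, if present, also ends up in both groups, since : is not excluded.

```javascript
// The regex from the answer as a JavaScript literal ('/' escaped for literal syntax).
const HOST_RE = /^(?:\w+:\/\/)?((?:[^.\/?#]+\.)?([^\/?#]+))/;

// Hypothetical helper: group #1 is the full host, group #2 the host minus its first label.
function hostParts(url) {
  const m = HOST_RE.exec(url);
  return m ? { host: m[1], domain: m[2] } : null;
}

console.log(hostParts('https://www.example.com/path?q=1')); // host "www.example.com", domain "example.com"
console.log(hostParts('http://example.com/'));              // host "example.com", domain "com"
```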
How to Extract Domain name from string with Regex in C#?
Based on Lidqy's answer, I wrote this function, which I think handles most cases. For input it does not match, it returns String.Empty; you could throw an exception there instead.
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class DomainUtils
{
    public static string ExtractDomainName(string url)
    {
        // Optional http/https/ftp scheme, optional "www.", then everything up to the first '/'
        var regex = new Regex(@"^((https?|ftp)://)?(www\.)?(?<domain>[^/]+)(/|$)");
        Match match = regex.Match(url);
        if (!match.Success)
        {
            return String.Empty;
        }

        string domain = match.Groups["domain"].Value;
        // Drop leading labels until at most two dots (three labels) remain,
        // e.g. "a.b.example.co.uk" becomes "example.co.uk"
        while (domain.Count(c => c == '.') > 2)
        {
            domain = domain.Substring(domain.IndexOf('.') + 1);
        }
        return domain;
    }
}
Extract domain name from string using PHP
Use the regular expression below to extract domain names from a string:
$input = 'AmericanSwan.com Indian Websites';
preg_match_all('#[-a-zA-Z0-9@:%_\+.~\#?&//=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%_\+.~\#?&//=]*)?#si', $input, $result);
echo $result[0][0]; // $result[0] holds every domain-like match in the string; you can also loop through it
Output:
AmericanSwan.com
How to extract hostname and port from URL string?
Without reinventing the wheel, you can simply use java.net.URL (this example is in Scala):
val url = new java.net.URL("http://www.domain.com:8080/one/two")
val hostname = url.getHost // www.domain.com
val port = url.getPort // 8080
One caveat: getPort returns -1 if no port is specified, so you have to handle that case explicitly, for example by falling back to getDefaultPort:
val port = if (url.getPort == -1) url.getDefaultPort else url.getPort
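In JavaScript, the URL API has no getDefaultPort, and URL.port is an empty string both when no port is given and when it equals the scheme's default. A sketch that mirrors the fallback above, assuming a hand-written table of the default ports of the WHATWG "special" schemes (the helper name hostAndPort is hypothetical; unknown schemes fall back to -1).

```javascript
// Default ports of the WHATWG "special" schemes; URL.port is "" when absent or equal to these.
const DEFAULT_PORTS = { 'ftp:': 21, 'http:': 80, 'https:': 443, 'ws:': 80, 'wss:': 443 };

// Hypothetical helper: hostname plus an explicit or default port (-1 if unknown).
function hostAndPort(input) {
  const url = new URL(input);
  const port = url.port !== '' ? Number(url.port) : (DEFAULT_PORTS[url.protocol] ?? -1);
  return { hostname: url.hostname, port };
}

console.log(hostAndPort('http://www.domain.com:8080/one/two')); // { hostname: 'www.domain.com', port: 8080 }
console.log(hostAndPort('https://www.domain.com/'));            // { hostname: 'www.domain.com', port: 443 }
```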
How to extract hostname from URL using jQuery
You can use the String.prototype.split() method with / as the delimiter to isolate the hostname, then take its last two dot-separated labels with String.prototype.match():
var m = url.split('/')[0].match(/[^.]+\.[^.]+$/);
var domain = m ? m[0] : null;
Note: if the URL begins with a scheme, you need to remove it first:
var pat = '^https?://';
url = url.replace(new RegExp(pat, 'i'), '');
Another way is to match the domain directly:
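// The optional prefix must end at a dot: a bare greedy [^/]* here would backtrack only
// one character, so "www.example.com" would yield "e.com" instead of "example.com".
var pat = '^(?:https?://)?(?:[^/:]*:[^/@]*@)?(?:[^/]*\\.)?([^./]+\\.[^./]+)';
var m = url.match(new RegExp(pat, 'i'));
var domain = m ? m[1] : null;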
In this case, you also need to deal with a possible user:password part before the hostname; that is the purpose of the subpattern (?:[^/:]*:[^/@]*@)?
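As an alternative to hand-written patterns, the built-in URL constructor (no jQuery needed) already removes the scheme, any user:password@ part and the port. A sketch under two stated assumptions: input without a scheme is treated as an http:// URL, and the domain is always the last two labels, which is wrong for suffixes such as co.uk. The helper name lastTwoLabels is hypothetical.

```javascript
// Hypothetical helper: guess the domain by keeping the last two labels of the hostname.
// Assumptions: scheme-less input is treated as http://, and the domain is two labels.
function lastTwoLabels(input) {
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(input);
  // URL() strips the scheme, any user:password@ part, and the port.
  const { hostname } = new URL(hasScheme ? input : 'http://' + input);
  return hostname.split('.').slice(-2).join('.');
}

console.log(lastTwoLabels('http://user:pass@www.example.com:8080/x')); // "example.com"
console.log(lastTwoLabels('www.example.com/path'));                    // "example.com"
console.log(lastTwoLabels('https://news.bbc.co.uk/'));                 // "co.uk" (the two-label limitation)
```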