Get Protocol + Host Name from Url

Get protocol + host name from URL

You should be able to do it with urlparse (docs: python2, python3):

from urllib.parse import urlparse
# from urlparse import urlparse # Python 2
parsed_uri = urlparse('http://stackoverflow.com/questions/1234567/blah-blah-blah-blah' )
result = '{uri.scheme}://{uri.netloc}/'.format(uri=parsed_uri)
print(result)

# gives
'http://stackoverflow.com/'

Get protocol, hostname, and path from URL

Your regex won't capture https://www.google.com.

Use capturing group and apply your regex with regex.exec(). Then access the returned array to set your variable:

str="https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=picture%20of%20a%20potato";regex = new RegExp('(https?://.*?\)/');match = regex.exec(str)[1];console.log(match);

Get protocol, domain, and port from URL

first get the current address

var url = window.location.href

Then just parse that string

var arr = url.split("/");

your url is:

var result = arr[0] + "//" + arr[2]

Hope this helps

Java | API to get protocol://domain.port from URL

Create a new URL object using your String value and call getHost() or any other method on it, like so:

URL url = new URL("https://test.domain.com/a/b/c.html?test=hello");
String protocol = url.getProtocol();
String host = url.getHost();
int port = url.getPort();

// if the port is not explicitly specified in the input, it will be -1.
if (port == -1) {
return String.format("%s://%s", protocol, host);
} else {
return String.format("%s://%s:%d", protocol, host, port);
}

Get protocol and domain (WITHOUT subdomain) from a URL

I am using tldextract When I doing the domain parse.

In your case you only need combine the domain + suffix

import tldextract
tldextract.extract('mail.google.com')
Out[756]: ExtractResult(subdomain='mail', domain='google', suffix='com')
tldextract.extract('classes.usc.edu/xxx/yy/zz')
Out[757]: ExtractResult(subdomain='classes', domain='usc', suffix='edu')
tldextract.extract('google.co.uk')
Out[758]: ExtractResult(subdomain='', domain='google', suffix='co.uk')

.NET - Get protocol, host, and port

The following (C#) code should do the trick

Uri uri = new Uri("http://www.mywebsite.com:80/pages/page1.aspx");
string requested = uri.Scheme + Uri.SchemeDelimiter + uri.Host + ":" + uri.Port;

Regex - get URL protocol, host, path, but not filename - PCRE

https?:\/\/(?:[^\/ ]*\/)*

Demo here.

Explanation

http      //Should start with http
s? // s is optional
:\/\/ // should follow up with ://
(?: //START Non capturing group
[^\/ ]* //Any character but a / or a space
\/ //Ends with /
) //END Non capturing group
* //Repeat non-capturing group

How to get host name with port from a http or https request

You can use HttpServletRequest.getScheme() to retrieve either "http" or "https".

Using it along with HttpServletRequest.getServerName() should be enough to rebuild the portion of the URL you need.

You don't need to explicitly put the port in the URL if you're using the standard ones (80 for http and 443 for https).

Edit: If your servlet container is behind a reverse proxy or load balancer that terminates the SSL, it's a bit trickier because the requests are forwarded to the servlet container as plain http. You have a few options:

  1. Use HttpServletRequest.getHeader("x-forwarded-proto") instead; this only works if your load balancer sets the header correctly (Apache should afaik).

  2. Configure a RemoteIpValve in JBoss/Tomcat that will make getScheme() work as expected. Again, this will only work if the load balancer sets the correct headers.

  3. If the above don't work, you could configure two different connectors in Tomcat/JBoss, one for http and one for https, as described in this article.

Get domain name from given url

If you want to parse a URL, use java.net.URI. java.net.URL has a bunch of problems -- its equals method does a DNS lookup which means code using it can be vulnerable to denial of service attacks when used with untrusted inputs.

"Mr. Gosling -- why did you make url equals suck?" explains one such problem. Just get in the habit of using java.net.URI instead.

public static String getDomainName(String url) throws URISyntaxException {
URI uri = new URI(url);
String domain = uri.getHost();
return domain.startsWith("www.") ? domain.substring(4) : domain;
}

should do what you want.


Though It seems to work fine, is there any better approach or are there some edge cases, that could fail.

Your code as written fails for the valid URLs:

  • httpfoo/bar -- relative URL with a path component that starts with http.
  • HTTP://example.com/ -- protocol is case-insensitive.
  • //example.com/ -- protocol relative URL with a host
  • www/foo -- a relative URL with a path component that starts with www
  • wwwexample.com -- domain name that does not starts with www. but starts with www.

Hierarchical URLs have a complex grammar. If you try to roll your own parser without carefully reading RFC 3986, you will probably get it wrong. Just use the one that's built into the core libraries.

If you really need to deal with messy inputs that java.net.URI rejects, see RFC 3986 Appendix B:

Appendix B. Parsing a URI Reference with a Regular Expression


As the "first-match-wins" algorithm is identical to the "greedy"
disambiguation method used by POSIX regular expressions, it is
natural and commonplace to use a regular expression for parsing the
potential five components of a URI reference.

The following line is the regular expression for breaking-down a
well-formed URI reference into its components.

  ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
12 3 4 5 6 7 8 9

The numbers in the second line above are only to assist readability;
they indicate the reference points for each subexpression (i.e., each
paired parenthesis).



Related Topics



Leave a reply



Submit