Get protocol + host name from URL
You should be able to do it with urlparse (docs: python2, python3):
from urllib.parse import urlparse
# from urlparse import urlparse # Python 2
parsed_uri = urlparse('http://stackoverflow.com/questions/1234567/blah-blah-blah-blah')
result = '{uri.scheme}://{uri.netloc}/'.format(uri=parsed_uri)
print(result)
# prints:
# http://stackoverflow.com/
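Note that netloc is the whole network-location part. It keeps any user:password@ credentials and any explicit :port. urlparse results also expose hostname and port attributes. The minimal sketch below compares the three on a made-up URL:

```python
from urllib.parse import urlparse

# Hypothetical URL with credentials and an explicit port.
parsed = urlparse("http://user:secret@Example.com:8080/questions/1234567")

print(parsed.netloc)    # user:secret@Example.com:8080 (raw, case preserved)
print(parsed.hostname)  # example.com (lowercased, no credentials or port)
print(parsed.port)      # 8080 (an int, or None when no port is given)

# Scheme and host only, dropping credentials and port:
print(f"{parsed.scheme}://{parsed.hostname}/")  # http://example.com/
```

Whether you want netloc or hostname depends on whether the port (and any credentials) should survive in the result.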
Get protocol, hostname, and path from URL
Your regex won't capture https://www.google.com.
Use a capturing group and apply your regex with regex.exec(), then read the captured group from the returned array to set your variable:
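const str = "https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=picture%20of%20a%20potato";
// Capture "http(s)://" plus everything up to (not including) the first "/" after it.
const regex = /(https?:\/\/.*?)\//;
const match = regex.exec(str)[1];
console.log(match); // https://www.google.com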
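The same pieces can also be pulled out with a URL parser instead of a regex. The parser also separates the path from the query string and fragment. Here is a minimal Python sketch using urllib.parse.urlsplit on the URL above:

```python
from urllib.parse import urlsplit

url = ("https://www.google.com/webhp?sourceid=chrome-instant&ion=1"
       "&espv=2&ie=UTF-8#q=picture%20of%20a%20potato")
parts = urlsplit(url)

protocol = parts.scheme    # "https"
hostname = parts.hostname  # "www.google.com"
path = parts.path          # "/webhp" (query and fragment are split off)

print(f"{protocol}://{hostname}")  # https://www.google.com
print(path)                        # /webhp
```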
Get protocol, domain, and port from URL
First, get the current address:
var url = window.location.href;
Then split that string on "/":
var arr = url.split("/");
The protocol and host (with the port, if any) are then:
var result = arr[0] + "//" + arr[2];
Hope this helps
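window.location.href is normalized by the browser, so the split trick works there. A URL string from some other source may not be normalized, though. A rough Python sketch shows the difference (both helper names are made up): a URL with a query string but no path fools the split version, while a real parser stops the host at "?".

```python
from urllib.parse import urlsplit

def origin_by_split(url: str) -> str:
    # Same idea as the JavaScript above: "scheme:" + "//" + third "/"-separated piece.
    arr = url.split("/")
    return arr[0] + "//" + arr[2]

def origin_by_parser(url: str) -> str:
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"

url = "https://example.com:8443/a/b?x=1"
print(origin_by_split(url))   # https://example.com:8443
print(origin_by_parser(url))  # https://example.com:8443

# No path before the query string:
odd = "https://example.com?x=1/y"
print(origin_by_split(odd))   # https://example.com?x=1
print(origin_by_parser(odd))  # https://example.com
```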
Java | API to get protocol://domain:port from URL
Create a new URL object using your String value and call getHost() or any other method on it, like so:
import java.net.MalformedURLException;
import java.net.URL;

// e.g. protocolHostPort("https://test.domain.com/a/b/c.html?test=hello")
static String protocolHostPort(String spec) throws MalformedURLException {
    URL url = new URL(spec);
    String protocol = url.getProtocol();
    String host = url.getHost();
    int port = url.getPort();
    // if the port is not explicitly specified in the input, it will be -1.
    if (port == -1) {
        return String.format("%s://%s", protocol, host);
    } else {
        return String.format("%s://%s:%d", protocol, host, port);
    }
}
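For comparison, here is the same logic as a Python sketch (not a Java API). urllib.parse.urlsplit reports a missing port as None, where Java's getPort() returns -1:

```python
from urllib.parse import urlsplit

def protocol_host_port(spec: str) -> str:
    url = urlsplit(spec)
    protocol, host, port = url.scheme, url.hostname, url.port
    # A missing port is None here (Java's getPort() uses -1 instead).
    if port is None:
        return f"{protocol}://{host}"
    return f"{protocol}://{host}:{port}"

print(protocol_host_port("https://test.domain.com/a/b/c.html?test=hello"))
# https://test.domain.com
```

Two Python-specific differences: reading url.port raises ValueError if the port in the string is not a valid number, and hostname is lowercased.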
Get protocol and domain (WITHOUT subdomain) from a URL
I use tldextract when doing domain parsing.
In your case you only need to combine the domain + suffix:
import tldextract
tldextract.extract('mail.google.com')
Out[756]: ExtractResult(subdomain='mail', domain='google', suffix='com')
tldextract.extract('classes.usc.edu/xxx/yy/zz')
Out[757]: ExtractResult(subdomain='classes', domain='usc', suffix='edu')
tldextract.extract('google.co.uk')
Out[758]: ExtractResult(subdomain='', domain='google', suffix='co.uk')
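Why not just keep the last two labels of the host? The naive rule is sketched below; the helper name is made up for illustration. It breaks on exactly the google.co.uk case above. That is why a library backed by the Public Suffix List, as tldextract is, is the safer choice.

```python
def naive_registered_domain(host: str) -> str:
    # Naive rule: keep only the last two dot-separated labels.
    return ".".join(host.split(".")[-2:])

print(naive_registered_domain("mail.google.com"))  # google.com (correct)
print(naive_registered_domain("classes.usc.edu"))  # usc.edu (correct)
print(naive_registered_domain("google.co.uk"))     # co.uk (wrong: should be google.co.uk)
```

With tldextract, the combined result is simply the domain field, a ".", and the suffix field of the ExtractResult shown above.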
.NET - Get protocol, host, and port
The following (C#) code should do the trick
Uri uri = new Uri("http://www.mywebsite.com:80/pages/page1.aspx");
string requested = uri.Scheme + Uri.SchemeDelimiter + uri.Host + ":" + uri.Port;
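In .NET, Uri.Port returns the scheme's default port when the URL does not specify one. So the C# line above always includes a port. Python's urlsplit returns None instead, so a rough equivalent has to fill in the default itself. The DEFAULT_PORTS table below is an assumption that covers only http and https.

```python
from urllib.parse import urlsplit

# Assumption: only http and https are handled; other schemes raise KeyError.
DEFAULT_PORTS = {"http": 80, "https": 443}

def scheme_host_port(url: str) -> str:
    parts = urlsplit(url)
    # Fall back to the scheme's default port when none is written in the URL.
    port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
    return f"{parts.scheme}://{parts.hostname}:{port}"

print(scheme_host_port("http://www.mywebsite.com:80/pages/page1.aspx"))
# http://www.mywebsite.com:80
print(scheme_host_port("https://www.mywebsite.com/pages/page1.aspx"))
# https://www.mywebsite.com:443
```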
Regex - get URL protocol, host, path, but not filename - PCRE
https?:\/\/(?:[^\/ ]*\/)*
Explanation
http //Should start with http
s? // s is optional
:\/\/ // should follow up with ://
(?: //START Non capturing group
[^\/ ]* //Any character but a / or a space
\/ //Ends with /
) //END Non capturing group
* //Repeat non-capturing group
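A quick way to check the pattern is a minimal Python sketch. The slashes are left unescaped because Python's re does not need them escaped, and the helper name is made up:

```python
import re
from typing import Optional

# Same pattern as above: scheme, then any number of "segment/" pieces.
PATTERN = re.compile(r"https?://(?:[^/ ]*/)*")

def url_without_filename(text: str) -> Optional[str]:
    m = PATTERN.search(text)
    return m.group(0) if m else None

print(url_without_filename("https://www.example.com/path/to/file.html"))
# https://www.example.com/path/to/
print(url_without_filename("https://www.example.com"))
# https://
```

The second call shows a limitation. Only segments followed by a "/" are kept, so a bare host with no trailing slash loses everything but the scheme.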
How to get host name with port from a http or https request
You can use HttpServletRequest.getScheme() to retrieve either "http" or "https". Using it along with HttpServletRequest.getServerName() should be enough to rebuild the portion of the URL you need.
You don't need to explicitly put the port in the URL if you're using the standard ones (80 for http and 443 for https).
Edit: If your servlet container is behind a reverse proxy or load balancer that terminates the SSL, it's a bit trickier because the requests are forwarded to the servlet container as plain http. You have a few options:
- Use HttpServletRequest.getHeader("x-forwarded-proto") instead; this only works if your load balancer sets the header correctly (Apache should, as far as I know).
- Configure a RemoteIpValve in JBoss/Tomcat that will make getScheme() work as expected. Again, this will only work if the load balancer sets the correct headers.
- If neither of the above works, you could configure two different connectors in Tomcat/JBoss, one for http and one for https, as described in this article.
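The sketch below shows two of the ideas above in Python: honoring X-Forwarded-Proto only when a known proxy sets it, and dropping the port when it is the standard one. A client can send that header itself, which is why it should only be trusted behind your own proxy. The function names and the trust_proxy flag are hypothetical. In a servlet, the inputs would come from getScheme(), getServerName(), getServerPort() and getHeader().

```python
from typing import Mapping

# Standard ports that can be left out of the URL.
STANDARD_PORTS = {"http": 80, "https": 443}

def effective_scheme(scheme: str, headers: Mapping[str, str], trust_proxy: bool) -> str:
    """Return the client-facing scheme, using X-Forwarded-Proto only behind a trusted proxy."""
    if trust_proxy:
        # Header names are case-insensitive, so compare them lowercased.
        lowered = {name.lower(): value for name, value in headers.items()}
        # Assumption: if several proxies appended values, the first is the client's.
        forwarded = lowered.get("x-forwarded-proto", "").split(",")[0].strip().lower()
        if forwarded in STANDARD_PORTS:
            return forwarded
    return scheme

def origin(scheme: str, host: str, port: int) -> str:
    """Build scheme://host[:port], leaving out the port when it is the standard one."""
    if STANDARD_PORTS.get(scheme) == port:
        return f"{scheme}://{host}"
    return f"{scheme}://{host}:{port}"

scheme = effective_scheme("http", {"X-Forwarded-Proto": "https"}, trust_proxy=True)
print(origin(scheme, "example.com", 443))  # https://example.com
```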
Get domain name from given url
If you want to parse a URL, use java.net.URI. java.net.URL has a bunch of problems -- its equals method does a DNS lookup, which means code using it can be vulnerable to denial-of-service attacks when used with untrusted inputs.
"Mr. Gosling -- why did you make url equals suck?" explains one such problem. Just get in the habit of using java.net.URI instead.
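import java.net.URI;
import java.net.URISyntaxException;
public static String getDomainName(String url) throws URISyntaxException {
    URI uri = new URI(url);
    String domain = uri.getHost();
    // getHost() returns null when the URI has no host (e.g. a relative URI).
    if (domain == null) return null;
    return domain.startsWith("www.") ? domain.substring(4) : domain;
}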
should do what you want.
Though it seems to work fine, is there a better approach, or are there some edge cases that could fail?
Your code as written fails for the valid URLs:
- httpfoo/bar -- relative URL with a path component that starts with http.
- HTTP://example.com/ -- protocol is case-insensitive.
- //example.com/ -- protocol-relative URL with a host.
- www/foo -- a relative URL with a path component that starts with www.
- wwwexample.com -- domain name that does not start with www. but starts with www.
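For comparison, here is a minimal Python sketch of the same getDomainName idea. It is built on a real parser (urllib.parse.urlsplit) instead of string matching, and the checks below run it against the cases listed above:

```python
from typing import Optional
from urllib.parse import urlsplit

def get_domain_name(url: str) -> Optional[str]:
    # hostname is None when there is no host (e.g. "www/foo"),
    # and is lowercased otherwise (so "HTTP://Example.com/" works).
    host = urlsplit(url).hostname
    if host is None:
        return None
    # Strip only the exact "www." prefix, so "wwwexample.com" is left alone.
    return host[4:] if host.startswith("www.") else host

for u in ["httpfoo/bar", "HTTP://example.com/", "//example.com/",
          "www/foo", "http://wwwexample.com", "http://www.example.com/x"]:
    print(u, "->", get_domain_name(u))
```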
Hierarchical URLs have a complex grammar. If you try to roll your own parser without carefully reading RFC 3986, you will probably get it wrong. Just use the one that's built into the core libraries.
If you really need to deal with messy inputs that java.net.URI rejects, see RFC 3986 Appendix B:
Appendix B. Parsing a URI Reference with a Regular Expression
   As the "first-match-wins" algorithm is identical to the "greedy"
   disambiguation method used by POSIX regular expressions, it is
   natural and commonplace to use a regular expression for parsing the
   potential five components of a URI reference.

   The following line is the regular expression for breaking-down a
   well-formed URI reference into its components.

      ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
       12            3  4          5       6  7        8 9

   The numbers in the second line above are only to assist readability;
   they indicate the reference points for each subexpression (i.e., each
   paired parenthesis).
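The same appendix maps the numbered groups to components: scheme = $2, authority = $4, path = $5, query = $7, fragment = $9. Below is a minimal Python sketch that applies the regex to the RFC's own example URI. Unlike java.net.URI, it never rejects input, and it does no validation or percent-decoding.

```python
import re

# The RFC 3986 Appendix B regular expression, verbatim.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

def split_uri(uri: str) -> dict:
    m = URI_RE.match(uri)  # always matches: every component is optional
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }

print(split_uri("http://www.ics.uci.edu/pub/ietf/uri/#Related"))
# {'scheme': 'http', 'authority': 'www.ics.uci.edu', 'path': '/pub/ietf/uri/', 'query': None, 'fragment': 'Related'}
```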