Uri Starting With Two Slashes ... How Do They Behave

URI starting with two slashes ... how do they behave?

The resource you're looking for is RFC 3986.

See Section 4.2 and Section 5.4. Quoting from the latter:

Reference Resolution Examples

Within a representation with a well defined base URI of:

    http://a/b/c/d;p?q

a relative reference is transformed to its target URI as follows:

    "g:h"           =  "g:h"
    "g"             =  "http://a/b/c/g"
    "./g"           =  "http://a/b/c/g"
    "g/"            =  "http://a/b/c/g/"
    "/g"            =  "http://a/g"
    "//g"           =  "http://g"
    "?y"            =  "http://a/b/c/d;p?y"
    "g?y"           =  "http://a/b/c/g?y"
    "#s"            =  "http://a/b/c/d;p?q#s"
    "g#s"           =  "http://a/b/c/g#s"
    "g?y#s"         =  "http://a/b/c/g?y#s"
    ";x"            =  "http://a/b/c/;x"
    "g;x"           =  "http://a/b/c/g;x"
    "g;x?y#s"       =  "http://a/b/c/g;x?y#s"
    ""              =  "http://a/b/c/d;p?q"
    "."             =  "http://a/b/c/"
    "./"            =  "http://a/b/c/"
    ".."            =  "http://a/b/"
    "../"           =  "http://a/b/"
    "../g"          =  "http://a/b/g"
    "../.."         =  "http://a/"
    "../../"        =  "http://a/"
    "../../g"       =  "http://a/g"

This means that when the base URI is http://a/b/c/d;p?q and you use //g, the relative reference is transformed to http://g.
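A quick way to check a few of these rows is Python's standard `urllib.parse.urljoin`. This is only a sketch: the choice of Python and the `EXPECTED` table (a handful of rows copied from the list above) are assumptions made for illustration, not part of the RFC text.

```python
from urllib.parse import urljoin

BASE = "http://a/b/c/d;p?q"

# A few rows from the RFC 3986 section 5.4 list quoted above.
EXPECTED = {
    "g": "http://a/b/c/g",
    "/g": "http://a/g",
    "//g": "http://g",
    "../g": "http://a/b/g",
    "?y": "http://a/b/c/d;p?y",
}

for ref, target in EXPECTED.items():
    resolved = urljoin(BASE, ref)
    print(f"{ref!r:8} -> {resolved}")
    assert resolved == target
```

The `//g` row is the interesting one: the reference keeps the base scheme but replaces the authority, so the host changes from `a` to `g`.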

Two forward slashes in a url/src/href attribute

The "two forward slashes" are a common shorthand for "request the referenced resource using whatever protocol is being used to load the current page".

Best known as "protocol relative URLs", they are particularly useful when elements — such as the JS file in your example — could be served and/or requested from either a http or a https context. By using protocol relative URLs, you can avoid implementing

    if (window.location.protocol === 'http:') {
        myResourceUrl = 'http://example.com/my-resource.js';
    } else {
        myResourceUrl = 'https://example.com/my-resource.js';
    }

type of logic all over your codebase (assuming, of course, that the server at example.com is able to serve content through both http and https).

A prominent real-world example is the Magento 1.X E-Commerce engine: for performance reasons, the category and product pages use plain http by default, whereas the checkout is https enabled.

If some resources (e.g. promotional banners in the site's header) are referenced via non protocol relative URLs (i.e. http://example.com/banner.jpg), customers reaching the https enabled checkout are greeted with a rather unfriendly

"there are insecure elements on this page"

prompt - which, one can safely assume, isn't exactly great for business.

If the aforementioned resource is referenced via //example.com/banner.jpg though, the browser takes care of loading it via the proper protocol both on the plain http product/category pages and in the https-enabled checkout flow.

tl;dr: With even the slightest possibility of a mixed http/https environment, just use the double slash/protocol relative URLs to reference resources — assuming that the host serving them supports both http and https.
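To make the behaviour concrete, here is a small sketch that resolves the same protocol-relative reference against an http page and an https page. Python's `urllib.parse.urljoin` stands in for the browser's resolution logic, and both page URLs are hypothetical.

```python
from urllib.parse import urljoin

RESOURCE = "//example.com/banner.jpg"  # protocol-relative reference

# Hypothetical pages: a plain-http product page and an https checkout page.
PAGES = ("http://example.com/product/42", "https://example.com/checkout")

for page in PAGES:
    # The scheme is inherited from the page; host and path come from RESOURCE.
    print(page, "->", urljoin(page, RESOURCE))
```

The reference itself never changes; only the scheme of the page it is embedded in decides whether the banner is fetched over http or https.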

url with multiple forward slashes, does it break anything?

RFC 2396 (the generic URI syntax, since obsoleted by RFC 3986) defines the path separator to be a single slash.

However, unless you're using some kind of URL rewriting (in which case the rewriting rules may be affected by the number of slashes), the URI maps to a path on disk. In most modern operating systems (Linux/Unix, Windows), multiple path separators in a row have no special meaning, so /path/to/foo and /path//to////foo would eventually map to the same file.

An additional thing that might be affected is caching. Since both your browser and the server cache individual pages (according to their caching settings), requesting the same file multiple times via slightly different URIs might affect the caching (depending on server and client implementation).
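Both effects can be sketched in a few lines of Python. The `cache` dictionary is a deliberately naive stand-in for a cache keyed on the raw URL path. Real browser and server caches are more involved, so treat this as an illustration of the idea only.

```python
import posixpath

a = "/path/to/foo"
b = "/path//to////foo"

# POSIX-style normalisation collapses repeated separators,
# so both paths name the same file on disk.
same_file = posixpath.normpath(a) == posixpath.normpath(b)

# A cache keyed on the raw string sees two distinct entries.
cache = {}
for url_path in (a, b):
    cache.setdefault(url_path, f"response for {url_path}")

print(same_file, len(cache))
```

The file system sees one file, while the naive cache holds two entries for it.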

Resolve URI with multiple slashes in relative part

I'll start by confirming that all the URIs you provided are valid, and by providing the outcome of the URI resolutions you mentioned (and the outcome of a couple of my own):

    $ perl -MURI -e'
       for my $rel (qw( /g //g ///g ////g h//g g////h h///g:f )) {
          my $uri = URI->new($rel)->abs("http://a/b/c/d;p?q");
          printf "%-20s + %-7s = %-20s host: %-4s path: %s\n",
             "http://a/b/c/d;p?q", $rel, $uri, $uri->host, $uri->path;
       }

       for my $base (qw( http://host/a/b/c/d http://host/a/b/c//d )) {
          my $uri = URI->new("../../e")->abs($base);
          printf "%-20s + %-7s = %-20s host: %-4s path: %s\n",
             $base, "../../e", $uri, $uri->host, $uri->path;
       }
    '
    http://a/b/c/d;p?q   + /g      = http://a/g           host: a    path: /g
    http://a/b/c/d;p?q   + //g     = http://g             host: g    path:
    http://a/b/c/d;p?q   + ///g    = http:///g            host:      path: /g
    http://a/b/c/d;p?q   + ////g   = http:////g           host:      path: //g
    http://a/b/c/d;p?q   + h//g    = http://a/b/c/h//g    host: a    path: /b/c/h//g
    http://a/b/c/d;p?q   + g////h  = http://a/b/c/g////h  host: a    path: /b/c/g////h
    http://a/b/c/d;p?q   + h///g:f = http://a/b/c/h///g:f host: a    path: /b/c/h///g:f
    http://host/a/b/c/d  + ../../e = http://host/a/e      host: host path: /a/e
    http://host/a/b/c//d + ../../e = http://host/a/b/e    host: host path: /a/b/e

Next, we'll look at the syntax of relative URIs, since that's what your question circles around.

    relative-ref  = relative-part [ "?" query ] [ "#" fragment ]

    relative-part = "//" authority path-abempty
                  / path-absolute
                  / path-noscheme
                  / path-empty

    path-abempty  = *( "/" segment )
    path-absolute = "/" [ segment-nz *( "/" segment ) ]
    path-noscheme = segment-nz-nc *( "/" segment )
    path-rootless = segment-nz *( "/" segment )

    segment       = *pchar         ; 0 or more <pchar>
    segment-nz    = 1*pchar        ; 1 or more <pchar>   nz = non-zero

The key things from these rules for answering your question:

  • An absolute path (path-absolute) can't start with //. The first segment, if provided, must be non-zero in length. If the relative URI starts with //, what follows must be an authority.
  • // can otherwise occur in a path because segments can have zero-length.
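The two points above amount to a simple decision on the leading characters of the reference. The helper below is a hypothetical sketch (the name `classify_relative_part` and its return labels are made up here); it only looks at the overall shape and does not validate individual characters against the grammar.

```python
def classify_relative_part(ref: str) -> str:
    """Name the relative-part alternative that a reference matches."""
    # Strip fragment and query; only the relative-part matters here.
    part = ref.split("#", 1)[0].split("?", 1)[0]
    if part.startswith("//"):
        # "//" authority path-abempty: what follows "//" is an authority.
        return "network-path"
    if part.startswith("/"):
        return "path-absolute"
    if part == "":
        return "path-empty"
    if ":" in part.split("/", 1)[0]:
        # segment-nz-nc forbids a colon in the first segment.
        return "not a relative-ref"
    return "path-noscheme"


for ref in ("/g", "//g", "///g", "h//g", "h///g:f", "?y", "g:h"):
    print(f"{ref!r:10} {classify_relative_part(ref)}")
```

Note that `///g` is classified as a network path: the authority is simply empty and the path is `/g`.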

Now, let's look at each of the resolutions you provided in turn.

/g is an absolute path (path-absolute), and thus a valid relative URI (relative-ref), and thus a valid URI (URI-reference).

  • Parsing the URIs (say, using the regular expression in Appendix B) gives us the following:

    Base.scheme:    "http"       R.scheme:    undef
    Base.authority: "a"          R.authority: undef
    Base.path:      "/b/c/d;p"   R.path:      "/g"
    Base.query:     "q"          R.query:     undef
    Base.fragment:  undef        R.fragment:  undef
  • Following the algorithm in §5.2.2, we get:

    T.path:         "/g"      ; remove_dot_segments(R.path)
    T.query:        undef     ; R.query
    T.authority:    "a"       ; Base.authority
    T.scheme:       "http"    ; Base.scheme
    T.fragment:     undef     ; R.fragment
  • Following the algorithm in §5.3, we get:

    http://a/g

//g is different. //g isn't an absolute path (path-absolute) because an absolute path can't start with an empty segment ("/" [ segment-nz *( "/" segment ) ]).

Instead, it follows this pattern:

    "//" authority path-abempty

  • Parsing the URIs (say, using the regular expression in Appendix B) gives us the following:

    Base.scheme:    "http"       R.scheme:    undef
    Base.authority: "a"          R.authority: "g"
    Base.path:      "/b/c/d;p"   R.path:      ""
    Base.query:     "q"          R.query:     undef
    Base.fragment:  undef        R.fragment:  undef
  • Following the algorithm in §5.2.2, we get the following:

    T.authority:    "g"           ; R.authority
    T.path:         ""            ; remove_dot_segments(R.path)
    T.query:        undef         ; R.query
    T.scheme:       "http"        ; Base.scheme
    T.fragment:     undef         ; R.fragment
  • Following the algorithm in §5.3, we get the following:

    http://g

Note: This contacts server g!


///g is similar to //g, except the authority is blank! This is surprisingly valid.

  • Parsing the URIs (say, using the regular expression in Appendix B) gives us the following:

    Base.scheme:    "http"       R.scheme:    undef
    Base.authority: "a"          R.authority: ""
    Base.path:      "/b/c/d;p"   R.path:      "/g"
    Base.query:     "q"          R.query:     undef
    Base.fragment:  undef        R.fragment:  undef
  • Following the algorithm in §5.2.2, we get the following:

    T.authority:    ""        ; R.authority
    T.path:         "/g"      ; remove_dot_segments(R.path)
    T.query:        undef     ; R.query
    T.scheme:       "http"    ; Base.scheme
    T.fragment:     undef     ; R.fragment
  • Following the algorithm in §5.3, we get the following:

    http:///g

Note: While valid, this URI is useless because the server name (T.authority) is blank!


////g is the same as ///g except the R.path is //g, so we get

    http:////g

Note: While valid, this URI is useless because the server name (T.authority) is blank!


The final three (h//g, g////h, h///g:f) are all relative paths (path-noscheme).

  • Parsing the URIs (say, using the regular expression in Appendix B) gives us the following:

    Base.scheme:    "http"       R.scheme:    undef
    Base.authority: "a"          R.authority: undef
    Base.path:      "/b/c/d;p"   R.path:      "h//g"
    Base.query:     "q"          R.query:     undef
    Base.fragment:  undef        R.fragment:  undef
  • Following the algorithm in §5.2.2, we get the following:

    T.path:         "/b/c/h//g"    ; remove_dot_segments(merge(Base.path, R.path))
    T.query:        undef          ; R.query
    T.authority:    "a"            ; Base.authority
    T.scheme:       "http"         ; Base.scheme
    T.fragment:     undef          ; R.fragment
  • Following the algorithm in §5.3, we get the following:

    http://a/b/c/h//g         # For h//g
    http://a/b/c/g////h       # For g////h
    http://a/b/c/h///g:f      # For h///g:f
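The walk-through above (parse with the Appendix B regular expression, transform per §5.2.2, recompose per §5.3) can be sketched in Python as follows. This is a minimal illustration written for this answer, not a library implementation: the function names are hypothetical, it does no validation, and it follows the strict variant of the algorithm. An undefined component is represented as `None`, matching `undef` in the tables.

```python
import re

# Regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def parse(uri):
    """Split a URI reference into (scheme, authority, path, query, fragment)."""
    m = URI_RE.match(uri)
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


def remove_dot_segments(path):
    """Section 5.2.4: drop "." and ".." segments; empty segments are kept."""
    out = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./") or path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../") or path == "/..":
            path = "/" + path[4:]
            if out:
                out.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment, with its leading "/" if any, to the output.
            end = path.find("/", 1)
            if end == -1:
                end = len(path)
            out.append(path[:end])
            path = path[end:]
    return "".join(out)


def merge(base_authority, base_path, ref_path):
    """Section 5.2.3: replace the last segment of the base path."""
    if base_authority is not None and base_path == "":
        return "/" + ref_path
    return base_path[: base_path.rfind("/") + 1] + ref_path


def resolve(base, ref):
    """Section 5.2.2 (strict) followed by the recomposition of section 5.3."""
    b_scheme, b_auth, b_path, b_query, _ = parse(base)
    r_scheme, r_auth, r_path, r_query, r_frag = parse(ref)
    if r_scheme is not None:
        scheme, auth, path, query = r_scheme, r_auth, remove_dot_segments(r_path), r_query
    elif r_auth is not None:  # an empty string counts as a defined authority
        scheme, auth, path, query = b_scheme, r_auth, remove_dot_segments(r_path), r_query
    elif r_path == "":
        scheme, auth, path = b_scheme, b_auth, b_path
        query = r_query if r_query is not None else b_query
    else:
        if r_path.startswith("/"):
            path = remove_dot_segments(r_path)
        else:
            path = remove_dot_segments(merge(b_auth, b_path, r_path))
        scheme, auth, query = b_scheme, b_auth, r_query
    out = ""
    if scheme is not None:
        out += scheme + ":"
    if auth is not None:
        out += "//" + auth
    out += path
    if query is not None:
        out += "?" + query
    if r_frag is not None:
        out += "#" + r_frag
    return out


BASE = "http://a/b/c/d;p?q"
for ref in ("/g", "//g", "///g", "////g", "h//g", "g////h", "h///g:f"):
    print(f"{ref:8} -> {resolve(BASE, ref)}")
```

The distinction between an undefined authority (`None`) and an empty one (`""`) is what separates `/g` from `///g` in this sketch.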

I don't think the examples are suitable for answering what I think you really want to know, though.

Take a look at the following two URIs. They aren't equivalent.

http://host/a/b/c/d     # Path has 4 segments: "a", "b", "c", "d"

and

http://host/a/b/c//d    # Path has 5 segments: "a", "b", "c", "", "d"

Most servers will treat them the same (which is fine, since servers are free to interpret paths in any way they wish), but it makes a difference when applying relative paths. For example, if each of these were the base URI for ../../e, you'd get

http://host/a/b/c/d + ../../e = http://host/a/e

and

http://host/a/b/c//d + ../../e = http://host/a/b/e
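The difference is easier to see when the paths are treated as lists of segments. The helper below is a deliberately simplified, hypothetical sketch: it only handles reference paths made of ".." and ordinary segments, which is all this example needs, and it is not a full implementation of the RFC algorithm.

```python
def resolve_dotdot(base_path: str, ref_path: str) -> str:
    """Resolve a relative path made only of ".." and ordinary segments."""
    # Merge step: everything after the last "/" of the base path is dropped.
    segments = base_path.split("/")[1:-1]
    for seg in ref_path.split("/"):
        if seg == "..":
            if segments:
                segments.pop()  # an empty segment is removed like any other
        else:
            segments.append(seg)
    return "/" + "/".join(segments)


for base_path in ("/a/b/c/d", "/a/b/c//d"):
    print(base_path.split("/")[1:], "->", resolve_dotdot(base_path, "../../e"))
```

In the second base path the first ".." is spent removing the empty segment, so only "c" is removed by the second one and "b" survives.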

Urls with two leading slashes and without schema

This was discussed multiple times here. Yes it is safe and yes all browsers support this.

Links start with two slashes

It's a protocol-relative URL (typically HTTP or HTTPS). So if I'm on http://example.org and I link (or include an image, script, etc.) to //example.com/1.png, it goes to http://example.com/1.png. If I'm on https://example.org, it goes to https://example.com/1.png.

This lets you easily avoid mixed content security errors.

Is a slash (/) equivalent to an encoded slash (%2F) in the path portion of an HTTP URL

From the data you gathered, I would tend to say that an encoded "/" in a URI is meant to be seen as "/" again at the application/CGI level.

That is to say, if you're using Apache with mod_rewrite, for instance, it will not match a pattern expecting slashes against a URI with encoded slashes in it.
However, once the appropriate module/CGI/... is called to handle the request, it's up to that handler to do the decoding and, for instance, retrieve a parameter including slashes as the first component of the URI.

If your application is then using this data to retrieve a file (whose filename contains a slash), that's probably a bad thing.

To sum up, I find it perfectly normal to see a difference in behaviour between "/" and "%2F", as they are interpreted at different levels.
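The "different levels" point can be illustrated with a short Python sketch. The request path is hypothetical; the only thing that varies is whether percent-decoding happens before or after the path is split into segments.

```python
from urllib.parse import unquote

raw_path = "/files/a%2Fb/report.txt"  # hypothetical request path

# Routing level: split on literal slashes first, then decode each segment.
split_then_decode = [unquote(seg) for seg in raw_path.split("/")[1:]]

# Decoding too early: the encoded slash turns into a path separator.
decode_then_split = unquote(raw_path).split("/")[1:]

print(split_then_decode)
print(decode_then_split)
```

In the first case "a/b" arrives as a single segment containing a slash; in the second it has become two separate segments.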

How to replace double slash with single slash for an url

To avoid replacing the first // in http://, use the following regex:

String to = from.replaceAll("(?<!http:)//", "/");

PS: if you want to handle https use (?<!(http:|https:))// instead.
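An alternative to the lookbehind, sketched here in Python rather than Java, is to split the URL first and collapse repeated slashes in the path component only. This is a different technique from the regex above, and the function name `collapse_slashes` is made up for the example. It leaves the scheme separator and any slashes in the query string untouched.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_slashes(url: str) -> str:
    """Collapse runs of "/" in the path component only."""
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit(parts._replace(path=path))


print(collapse_slashes("http://example.com//a///b?x=//y"))
```

Keep in mind that, strictly speaking, this changes the URI: as far as RFC 3986 is concerned, empty path segments are significant, so this is only appropriate when the server is known to ignore them.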

Do browsers ignore slashes in URLs?

Path separators are defined to be a single slash according to this. (Search for Path Component)

Note that browsers don't usually modify the URL. Browsers could append a / at the end of a URL, but in your case, the URL with extra slashes is simply sent along in the request, so it is the server ignoring the slashes instead.

Also, have a look at:

  • Is a URL with // in the path-section valid?
  • URL with multiple forward slashes, does it break anything?
  • What does the double slash mean in URLs?

Even if this behavior is convenient for you, it is generally not recommended. In addition, caching may also be affected (source):

Since both your browser and the server cache individual pages (according to their caching settings), requesting the same file multiple times via slightly different URIs might affect the caching (depending on server and client implementation).


