What Are Fragment URLs and Why Use Them

What are fragment URLs and why to use them?

A fragment is an internal page reference, sometimes called a named anchor. It usually appears at the end of a URL and begins with a hash (#) character followed by an identifier. It refers to a section within a web page.

In HTML documents, the browser looks for an element whose id attribute matches the fragment or, failing that, an anchor tag with a matching name attribute.

Fragments have a few notable properties. The most important is probably that they are not sent in HTTP request messages; the rest of this page covers the others.

JavaScript can manipulate the fragment of the current page, which can be used to add history entries for a page without forcing a complete reload.

Is the URL fragment identifier sent to the server?

Fragment identifiers are not sent to the server. The hash fragment is used by the browser to link to elements within the same page.
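
As a rough illustration, the sketch below uses the standard URL class (available in browsers and Node.js) to show which part of a URL would end up in the request. The helper name requestTarget and the sample URL are made up for this example.

```javascript
// Hypothetical helper: returns the part of a URL that an HTTP client
// would put on the request line (path plus optional query).
function requestTarget(href) {
  const url = new URL(href);
  // url.hash holds the fragment (including '#'); it is deliberately left out.
  return url.pathname + url.search;
}

const href = "https://www.example.com/docs/page?lang=en#install";
console.log(new URL(href).hash);  // "#install" (stays in the client)
console.log(requestTarget(href)); // "/docs/page?lang=en" (goes to the server)
```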

How secure is it to use fragment identifiers to hold private data in URLs?

Tyler Close and others who did the security architecture for Waterken did the relevant research on this. They use unguessable strings in URI fragments as web-keys:

This leakage of a permission bearing URL via the Referer header is only a problem in practice if the target host of a hyperlink is different from the source host, and so potentially malicious. RFC 2616 foresaw the danger of such leakage of information and so provided security guidance in section 15.1.3:

"Because the source of a link might be private information or might reveal an otherwise private information source, … Clients SHOULD NOT include a Referer header field in a (non-secure) HTTP request if the referring page was transferred with a secure protocol."


Unfortunately, clients have implemented this guidance to the letter, meaning the Referer header is sent if both the referring page and the destination page use HTTPS, but are served by different hosts.

This enthusiastic use of the Referer header would present a significant barrier to implementation of the web-key concept were it not for one unrelated, but rather fortunate, requirement placed on use of the Referer header. Section 14.36 of RFC 2616, which governs use of the Referer header, states that: "The URI MUST NOT include a fragment." Testing of deployed web browsers has shown this requirement is commonly implemented.

Putting the unguessable permission key in the fragment segment produces an https URL that looks like: <https://www.example.com/app/#mhbqcmmva5ja3>.

Fetching a representation


Placing the key in the URL fragment component prevents leakage via the Referer header but also complicates the dereference operation, since the fragment is also not sent in the Request-URI of an HTTP request. This complication is overcome using the two cornerstones of Web 2.0: JavaScript and XMLHttpRequest.


So, yes, you can use fragment identifiers to hold secrets, though those secrets could be stolen and exfiltrated if your application is susceptible to XSS, and there is no equivalent of http-only cookies for fragment identifiers.

I believe Waterken mitigates this by removing the secret from the fragment before it runs any application code in the same way many sensitive daemons zero-out their argv.
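
A minimal sketch of that idea follows. It is not Waterken's actual code: the helper name extractWebKey is hypothetical, and the browser wiring in the trailing comment is an untested assumption about how it might be used.

```javascript
// Hypothetical helper: split an unguessable key out of the fragment and
// return the URL without it.
function extractWebKey(href) {
  const url = new URL(href);
  const key = url.hash.slice(1); // drop the leading '#'
  url.hash = "";                 // the remaining URL no longer carries the secret
  return { key, cleanHref: url.href };
}

const result = extractWebKey("https://www.example.com/app/#mhbqcmmva5ja3");
console.log(result.key);       // "mhbqcmmva5ja3"
console.log(result.cleanHref); // "https://www.example.com/app/"

// In a browser this might be wired up before any application code runs:
//   const { key, cleanHref } = extractWebKey(location.href);
//   history.replaceState(null, "", cleanHref);
```

The key would then be sent to the server by script in a later request, since the browser never includes it on its own.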

Fragment URL in OpenID - what does it mean?

I wasn't aware of an addition to the standard in OpenID 2.0 that says the following:

IDENTIFIER RECYCLING

OpenID identifiers can be recycled over time, and OpenID 2.0 specifies
that OpenID Providers append URL fragments to the end of an OpenID URL
as a generation identifier. The entire OpenID URL with the fragment,
if present, should be used to identify the user. For instance, the
following two OpenIDs are unique and represent different users:
http://openid.example.com/username#aa
http://openid.example.com/username#bb

So in the end it makes sense, because the identity resource itself doesn't change: when I request the OpenID URL, the fragment gets stripped, so I always request the same resource with the same XRDS document behind it.
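
The distinction between "identifier" and "URL actually fetched" can be sketched like this. The helper name fetchUrlFor is hypothetical and only meant to illustrate the point.

```javascript
// Hypothetical helper: the URL a client would actually fetch for an
// identifier, i.e. the identifier with its fragment removed.
function fetchUrlFor(identifier) {
  const url = new URL(identifier);
  url.hash = "";
  return url.href;
}

const first = "http://openid.example.com/username#aa";
const second = "http://openid.example.com/username#bb";

console.log(first === second);                           // false: different users
console.log(fetchUrlFor(first) === fetchUrlFor(second)); // true: same resource fetched
```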

What is hash fragment referring to in the following text?

Yes, you're correct. They refer to the URL part after the # character.

In example.com#state=dJfw&access_token=lkTyd234AsdF the hash fragment (fragment component) would be:

state=dJfw&access_token=lkTyd234AsdF

This is used by the OAuth2 implicit grant to deliver the response data. In the specification, the terminology used is either fragment component or fragment.

In contrast, the authorization code grant would deliver the information in the query part of the URL, example.com?state=asdTwe3SD&code=kjh56Sdgv.
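
Reading the two kinds of response on the client could look roughly like the sketch below. The helper name parseFragmentParams is hypothetical, and an https:// scheme has been added to the example URLs because the URL class requires one.

```javascript
// Hypothetical helper: read key/value pairs out of a URL fragment.
function parseFragmentParams(href) {
  const fragment = new URL(href).hash.slice(1); // drop the leading '#'
  return Object.fromEntries(new URLSearchParams(fragment));
}

// Implicit grant: the data arrives in the fragment.
const implicit = parseFragmentParams(
  "https://example.com/#state=dJfw&access_token=lkTyd234AsdF"
);
console.log(implicit.state);        // "dJfw"
console.log(implicit.access_token); // "lkTyd234AsdF"

// Authorization code grant: the data arrives in the query.
const codeGrant = new URL("https://example.com/?state=asdTwe3SD&code=kjh56Sdgv");
console.log(codeGrant.searchParams.get("code")); // "kjh56Sdgv"
```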

RESTful API with # fragment expansion in URI Template (RFC 6570)

Postman, curl and browser requests appear to strip out everything after the # before sending to the server. Is that expected behaviour and described in an RFC somewhere?

Yes, it is described in RFC 3986:

the fragment identifier is not used in the scheme-specific processing of a URI; instead, the fragment identifier is separated from the rest of the URI prior to a dereference, and thus the identifying information within the fragment itself is dereferenced solely by the user agent, regardless of the URI scheme.

RFC 7230 describes the same idea in its definition of the target URI, quoted further down.



Does that mean these RESTful URLs are invalid?

Sort of.

https://api.acme.com/orders#customer=1234,state=open

That's a perfectly valid URL (see the production rules in RFC 3986 appendix A).

Let's review the specification for fragments:

The fragment identifier component of a URI allows indirect identification of a secondary resource by reference to a primary resource and additional identifying information. The identified secondary resource may be some portion or subset of the primary resource, some view on representations of the primary resource, or some other resource defined or described by those representations.

The HTTP definition is a bit clearer:

The optional fragment component allows for indirect identification of a secondary resource

So for practice, let's look at the identifier for the fragment specification itself

https://www.rfc-editor.org/rfc/rfc3986#section-3.5

What we have here are identifiers for two resources: the "primary resource" is the web page itself, which is identified by the sequence of characters prior to the fragment delimiter -- https://www.rfc-editor.org/rfc/rfc3986 ; then we have the secondary resource within that primary resource, section-3.5 tells us what to look for in the primary resource. So the browser knows to (a) download the primary resource, and then (b) upon discovering that the primary resource has a text/html representation, knows how to dig through the html tags to find the tag that matches the fragment. The browser can then jump directly to the correct location in the document.

In your example, https://api.acme.com/orders is the primary resource, and customer=1234,state=open is the fragment used to identify some resource within the representation of the primary resource.

BUT

RFC 7230 defines the request line of HTTP

method SP request-target SP HTTP-version CRLF

Where request-target is in turn defined as

request-target = origin-form
               / absolute-form
               / authority-form
               / asterisk-form

The two that are of interest to us are origin-form and absolute-form. Both are defined by RFC 7230

origin-form   = absolute-path [ "?" query ]
absolute-form = absolute-URI

where absolute-URI is defined in RFC 3986

Some protocol elements allow only the absolute form of a URI without a fragment identifier.

absolute-URI  = scheme ":" hier-part [ "?" query ]

This is all consistent with the restriction on target-uri

The target URI excludes the reference's fragment component, if any, since fragment identifiers are reserved for client-side processing



What is the proper use case for the {#var} fragment expansion in URL Templates?

The use case is constructing a link to a secondary resource, in particular as a collaboration between two pieces of code: one piece knows the values for the template variables but not where they are supposed to go (and in particular, which variables belong in the fragment), and the other knows where in the URI each variable is supposed to appear but not what the values are.

In other words, it's exactly the same sort of thing we are doing when we create a URI template to allow the production of the identifier of a primary resource, but also allowing variable expansion in the fragment element as well.

http://example.org/people/bob
http://example.org/people#bob
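
To make this concrete, here is a deliberately small sketch of the two expression types involved. It is not a full RFC 6570 implementation: the function name expandTemplate and the template strings are assumptions for illustration, and only single-variable {var} and {#var} expressions are handled.

```javascript
// Minimal sketch of two RFC 6570 expression types, single variable only:
//   {var}  simple expansion   (only unreserved characters stay literal)
//   {#var} fragment expansion ('#' prefix, reserved characters stay literal)
// Known gaps: no variable lists, no modifiers (:3, *), other operators are
// left untouched, and an existing "%XX" in a value is re-encoded for {#var}.
function expandTemplate(template, vars) {
  return template.replace(/\{(#?)([A-Za-z0-9_]+)\}/g, (match, op, name) => {
    const value = vars[name];
    if (value === undefined || value === null) return ""; // undefined vars vanish
    const text = String(value);
    if (op === "#") {
      // encodeURI keeps reserved characters; restore '[' and ']', which it encodes
      return "#" + encodeURI(text).replace(/%5B/g, "[").replace(/%5D/g, "]");
    }
    // encodeURIComponent leaves ! ' ( ) * alone, so encode those as well
    return encodeURIComponent(text).replace(
      /[!'()*]/g,
      (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase()
    );
  });
}

// The caller only knows the values; the template decides where they go.
const vars = { name: "bob" };
console.log(expandTemplate("http://example.org/people/{name}", vars));
// http://example.org/people/bob
console.log(expandTemplate("http://example.org/people{#name}", vars));
// http://example.org/people#bob
```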

How do I use url fragments when other things get appended to them?

You should only have to worry about ?: just take the part of your hash between the # and the first instance of ?.

var hash = location.hash; // e.g. "#section?utm_source=x"
if (hash.length > 1) {
  // remove the leading `#`, then the first `?` and anything after it
  hash = hash.replace(/^#/, '').replace(/\?.*$/, '');
  // do whatever with your hash records
}

To be on the safe side, I suggest you also send any mismatches to your server and store the URL for analysis and future tweaks.

Can I use # fragment portion of URL in the Nginx Location matcher?

No, we cannot set up an Nginx location matcher based on the fragment (i.e. the part starting with #), as it never gets sent to the server. Thanks to @RichardSmith for the clarification (see the comments section):

The part of the URL from # onwards (called the fragment) is never sent
to the server. It is used by the browser or a client-side JS
application - @RichardSmith

So the pattern I was trying to match in the Location matcher will never work as it never reaches the server.

URL fragment (#) allowed characters

tl;dr

The fragment identifier component can contain:

  • 0 - 9
  • a - z
  • A - Z
  • ? / : @ - . _ ~ ! $ & ' ( ) * + , ; =
  • percent-encoded characters (a % followed by two hexadecimal digits)

How can I find this out?

The URI standard is STD 66, which currently maps to RFC 3986.

In this document, you’ll find everything you need to know.

The fragment identifier component is defined in section 3.5:

fragment = *( pchar / "/" / "?" )

This means that the fragment can contain nothing or (any combination of)

  • characters defined in pchar
  • the /
  • the ?

Definition of pchar

Refer to Appendix A to see how pchar is defined:

pchar = unreserved / pct-encoded / sub-delims / ":" / "@"

So this adds

  • characters defined in unreserved
  • characters defined in pct-encoded
  • characters defined in sub-delims
  • the :
  • the @

Definition of unreserved

Now check how unreserved is defined:

unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"

This adds

  • characters defined in ALPHA
  • characters defined in DIGIT
  • the -
  • the .
  • the _
  • the ~

Definition of ALPHA and DIGIT

Check how ALPHA and DIGIT are defined. They are not listed in the appendix, because they are from the core ABNF rules, as is explained in section 1.3:

ALPHA (letters), […] DIGIT (decimal digits) […]

So this adds

  • a-z, A-Z
  • 0-9

Definition of pct-encoded

Check how pct-encoded is defined:

pct-encoded = "%" HEXDIG HEXDIG

This allows for any percent-encoded character.

Definition of sub-delims

Check how sub-delims is defined:

sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

This adds

  • the !
  • the $
  • the &
  • the '
  • the (
  • the )
  • the *
  • the +
  • the ,
  • the ;
  • the =
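
Putting the pieces together, the grammar above can be transcribed into a regular expression. This is a sketch for quick checks, with hypothetical names (FRAGMENT_RE, isValidFragment); it expects the fragment without the leading #.

```javascript
// Transcription of: fragment = *( pchar / "/" / "?" )
//   unreserved: A-Z a-z 0-9 - . _ ~
//   sub-delims: ! $ & ' ( ) * + , ; =
//   plus ":" "@" "/" "?" and percent-encoded triplets
const FRAGMENT_RE = /^(?:[A-Za-z0-9\-._~!$&'()*+,;=:@\/?]|%[0-9A-Fa-f]{2})*$/;

// Hypothetical helper: true if the string is a syntactically valid fragment.
function isValidFragment(fragment) {
  return FRAGMENT_RE.test(fragment);
}

console.log(isValidFragment("section-3.5"));              // true
console.log(isValidFragment("customer=1234,state=open")); // true
console.log(isValidFragment("a b"));                      // false: raw space
console.log(isValidFragment("a#b"));                      // false: '#' must be percent-encoded
```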

