What are fragment URLs and why use them?
A fragment is an internal page reference, sometimes called a named anchor. It usually appears at the end of a URL and begins with a hash (#) character followed by an identifier. It refers to a section within a web page.
In HTML documents, the browser looks for an element whose id attribute matches the fragment (or, in older markup, an anchor tag with a matching name attribute).
Fragments have a few notable properties, the most important being that they are not sent in HTTP request messages; the rest of this page collects more information about them.
JavaScript can manipulate the fragment of the current page, which can be used to add history entries for a page without forcing a complete reload.
Is the URL fragment identifier sent to the server?
Fragment identifiers are not sent to the server. The hash fragment is used by the browser to link to elements within the same page.
How secure is it to use fragment identifiers to hold private data in URLs?
Tyler Close and others who did the security architecture for Waterken did the relevant research on this. They use unguessable strings in URI fragments as web-keys:
This leakage of a permission bearing URL via the Referer header is only a problem in practice if the target host of a hyperlink is different from the source host, and so potentially malicious. RFC 2616 foresaw the danger of such leakage of information and so provided security guidance in section 15.1.3:
"Because the source of a link might be private information or might reveal an otherwise private information source, … Clients SHOULD NOT include a Referer header field in a (non-secure) HTTP request if the referring page was transferred with a secure protocol."
Unfortunately, clients have implemented this guidance to the letter, meaning the Referer header is sent if both the referring page and the destination page use HTTPS, but are served by different hosts.
This enthusiastic use of the Referer header would present a significant barrier to implementation of the web-key concept were it not for one unrelated, but rather fortunate, requirement placed on use of the Referer header. Section 14.36 of RFC 2616, which governs use of the Referer header, states that: "The URI MUST NOT include a fragment." Testing of deployed web browsers has shown this requirement is commonly implemented.
Putting the unguessable permission key in the fragment segment produces an https URL that looks like: <https://www.example.com/app/#mhbqcmmva5ja3>.
Fetching a representation
Placing the key in the URL fragment component prevents leakage via the Referer header but also complicates the dereference operation, since the fragment is also not sent in the Request-URI of an HTTP request. This complication is overcome using the two cornerstones of Web 2.0: JavaScript and XMLHttpRequest.
So, yes, you can use fragment identifiers to hold secrets, though those secrets could be stolen and exfiltrated if your application is susceptible to XSS, and there is no equivalent of http-only cookies for fragment identifiers.
I believe Waterken mitigates this by removing the secret from the fragment before it runs any application code, in the same way many sensitive daemons zero out their argv.
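The mitigation described above can be sketched roughly as follows. This is not Waterken's actual code; takeWebKey is a hypothetical helper, and the URL is the example one from the quoted text. It reads the key out of the fragment and produces a URL with the fragment removed.

```javascript
// Hypothetical helper: split a web-key URL into the secret key and a
// fragment-free URL that can safely stay in the address bar.
function takeWebKey(href) {
  const url = new URL(href);
  const key = url.hash.slice(1); // drop the leading '#'
  url.hash = '';                 // clears the fragment, including the '#'
  return { key, cleanedUrl: url.href };
}

const webKey = takeWebKey('https://www.example.com/app/#mhbqcmmva5ja3');
// webKey.key        -> 'mhbqcmmva5ja3'
// webKey.cleanedUrl -> 'https://www.example.com/app/'
```

In a browser, one would presumably call this with location.href and then pass cleanedUrl to history.replaceState(null, '', cleanedUrl) before any application code runs, so the secret is no longer readable from location.hash.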
Fragment URL in OpenID - what does it mean?
I wasn't aware of an addition to the standard in OpenID 2.0 that says the following:
IDENTIFIER RECYCLING
OpenID identifiers can be recycled over time, and OpenID 2.0 specifies
that OpenID Providers append URL fragments to the end of an OpenID URL
as a generation identifier. The entire OpenID URL with the fragment,
if present, should be used to identify the user. For instance, the
following two OpenIDs are unique and represent different users:
http://openid.example.com/username#aa
http://openid.example.com/username#bb
So in the end it really makes sense, because the identity resource itself doesn't change: when I request the OpenID URL, the fragment gets stripped, and I always request the same resource with the same XRDS document behind it.
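A small sketch of that distinction, using the two example identifiers from the quote. The function name describeOpenId and the field names are made up for illustration; they are not part of any OpenID library.

```javascript
// Hypothetical helper: the full identifier (with fragment) identifies the
// user, while the URL actually fetched has the fragment stripped.
function describeOpenId(identifier) {
  const url = new URL(identifier);
  const generation = url.hash.slice(1); // e.g. 'aa'
  url.hash = '';                        // what goes over the wire
  return { identity: identifier, discoveryUrl: url.href, generation };
}

const first = describeOpenId('http://openid.example.com/username#aa');
const second = describeOpenId('http://openid.example.com/username#bb');
// Both share the discovery URL 'http://openid.example.com/username',
// but first.identity !== second.identity, so they are different users.
```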
What is hash fragment referring to in the following text?
Yes, you're correct. They refer to the URL part after the # character.
In example.com#state=dJfw&access_token=lkTyd234AsdF the hash fragment (fragment component) would be:
state=dJfw&access_token=lkTyd234AsdF
This is used by the OAuth2 implicit grant to deliver the response data. In the specification, the terminology used is either fragment component or fragment.
In contrast, the authorization code grant would deliver the information in the query part of the URL, example.com?state=asdTwe3SD&code=kjh56Sdgv.
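As an illustration, client-side code could read such a response roughly like this. parseFragmentParams is a hypothetical name, and an https:// scheme is added to the example URL so that the standard URL parser accepts it.

```javascript
// Hypothetical helper: parse a form-encoded fragment into a plain object.
function parseFragmentParams(href) {
  const fragment = new URL(href).hash.slice(1); // drop the leading '#'
  return Object.fromEntries(new URLSearchParams(fragment));
}

const params = parseFragmentParams(
  'https://example.com/#state=dJfw&access_token=lkTyd234AsdF'
);
// params.state        -> 'dJfw'
// params.access_token -> 'lkTyd234AsdF'
```

In a browser the same parsing would be applied to location.href (or directly to location.hash).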
RESTful API with # fragment expansion in URI Template (RFC 6570)
Postman, curl and browser requests appear to strip out everything after the # before sending to the server. Is that expected behaviour and described in an RFC somewhere?
Yes, it is described in RFC 3986:
the fragment identifier is not used in the scheme-specific processing of a URI; instead, the fragment identifier is separated from the rest of the URI prior to a dereference, and thus the identifying information within the fragment itself is dereferenced solely by the user agent, regardless of the URI scheme.
The same idea, as described by RFC 7230
Does that mean these RESTful URLs are invalid?
Sort of.
https://api.acme.com/orders#customer=1234,state=open
That's a perfectly valid URL (see the production rules in RFC 3986 appendix A).
Let's review the specification for fragments:
The fragment identifier component of a URI allows indirect identification of a secondary resource by reference to a primary resource and additional identifying information. The identified secondary resource may be some portion or subset of the primary resource, some view on representations of the primary resource, or some other resource defined or described by those representations.
The HTTP definition is a bit clearer:
The optional fragment component allows for indirect identification of a secondary resource
So for practice, let's look at the identifier for the fragment specification itself
https://www.rfc-editor.org/rfc/rfc3986#section-3.5
What we have here are identifiers for two resources: the "primary resource" is the web page itself, which is identified by the sequence of characters prior to the fragment delimiter, https://www.rfc-editor.org/rfc/rfc3986; then we have the secondary resource within that primary resource, and section-3.5 tells us what to look for in the primary resource. So the browser knows to (a) download the primary resource, and then (b) upon discovering that the primary resource has a text/html representation, knows how to dig through the html tags to find the tag that matches the fragment. The browser can then jump directly to the correct location in the document.
In your example, https://api.acme.com/orders is the primary resource, and customer=1234,state=open is the fragment used to identify some resource within the representation of the primary resource.
BUT
RFC 7230 defines the request line of HTTP
method SP request-target SP HTTP-version CRLF
Where request-target is in turn defined as
request-target = origin-form
/ absolute-form
/ authority-form
/ asterisk-form
The two that are of interest to us are origin-form and absolute-form. Both are defined by RFC 7230
origin-form = absolute-path [ "?" query ]
absolute-form = absolute-URI
where absolute-URI is defined in RFC 3986
Some protocol elements allow only the absolute form of a URI without a fragment identifier.
absolute-URI = scheme ":" hier-part [ "?" query ]
This is all consistent with the restriction on target-uri
The target URI excludes the reference's fragment component, if any, since fragment identifiers are reserved for client-side processing
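To make the consequence concrete, here is a rough sketch of how a client might build an origin-form request line from the example URL. requestLine is a hypothetical helper, not part of any HTTP library, and it only covers origin-form.

```javascript
// Hypothetical helper: build an HTTP/1.1 request line in origin-form.
// origin-form = absolute-path [ "?" query ], so the fragment has no place in it.
function requestLine(method, href) {
  const url = new URL(href);
  const target = url.pathname + url.search; // url.hash is deliberately left out
  return `${method} ${target} HTTP/1.1`;
}

requestLine('GET', 'https://api.acme.com/orders#customer=1234,state=open');
// -> 'GET /orders HTTP/1.1'
requestLine('GET', 'https://api.acme.com/orders?customer=1234&state=open');
// -> 'GET /orders?customer=1234&state=open HTTP/1.1'
```

The second call shows the usual alternative: values the server needs to see go in the query, which is part of the request-target.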
What is the proper use case for the {#var} fragment expansion in URL Templates?
Constructing a link to a secondary resource, in particular as a collaboration between two pieces of code: one piece of code knows the values for the template variables but not where they are supposed to go (and in particular, which variables belong in the fragment), and another piece of code knows where in the URI each of the different variables is supposed to appear, but not what the values are.
In other words, it's exactly the same sort of thing we are doing when we create a URI template to allow the production of the identifier of a primary resource, but also allowing variable expansion in the fragment element as well.
http://example.org/people/bob
http://example.org/people#bob
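A minimal sketch of how the two URIs above could be produced from templates. This is not a full RFC 6570 implementation: expand is a hypothetical function that only handles single-variable {var} and {#var} expressions with string values.

```javascript
// Simple expansion: percent-encode everything outside the unreserved set.
function encodeUnreserved(value) {
  return encodeURIComponent(value).replace(/[!'()*]/g,
    (c) => '%' + c.charCodeAt(0).toString(16).toUpperCase());
}

// Fragment expansion: reserved characters and existing %XX triplets pass through.
function encodeReserved(value) {
  return value
    .split(/(%[0-9A-Fa-f]{2})/) // odd indexes hold existing %XX triplets
    .map((part, i) => (i % 2 ? part : encodeURI(part)
      .replace(/%5B/g, '[').replace(/%5D/g, ']')))
    .join('');
}

// Hypothetical, deliberately limited expander for {var} and {#var}.
function expand(template, vars) {
  return template.replace(/\{(#?)([A-Za-z0-9_]+)\}/g, (match, op, name) => {
    const value = vars[name];
    if (value === undefined || value === null) return ''; // undefined: expands to nothing
    return op === '#'
      ? '#' + encodeReserved(String(value))
      : encodeUnreserved(String(value));
  });
}

expand('http://example.org/people/{name}', { name: 'bob' });
// -> 'http://example.org/people/bob'
expand('http://example.org/people{#name}', { name: 'bob' });
// -> 'http://example.org/people#bob'
```

The caller supplying { name: 'bob' } does not need to know whether the template puts the value in the path or in the fragment; that decision lives in the template.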
How do I use url fragments when other things get appended to them?
You should only have to worry about ?, so just parse your hash between # and the first instance of ?:
var hash = location.hash;
if (hash.length > 1) {
  // drop the leading `#`, then keep only the part before the first `?`
  hash = hash.slice(1).split('?')[0];
  // do whatever with your hash records
}
To be on the safe side, I suggest you also send any mismatches to your server and store the URL for analysis and future tweaks.
Can I use # fragment portion of URL in the Nginx Location matcher?
No, we cannot set up an Nginx location matcher based on the fragment (i.e. the part starting with #), as it never gets sent to the server. Thanks to @RichardSmith for the clarification (see the comments section):
The part of the URL from # onwards (called the fragment) is never sent to the server. It is used by the browser or a client-side JS application - @RichardSmith
So the pattern I was trying to match in the location matcher will never work, as it never reaches the server.
URL fragment (#) allowed characters
tl;dr
The fragment identifier component can contain:
- 0-9
- a-z
- A-Z
- ? / : @ - . _ ~ ! $ & ' ( ) * + , ; =
- percent-encoded characters (a % followed by two hexadecimal digits)
How can I find this out?
The URI standard is STD 66, which currently maps to RFC 3986.
In this document, you’ll find everything you need to know.
The fragment identifier component is defined in section 3.5:
fragment = *( pchar / "/" / "?" )
This means that the fragment can contain nothing or (any combination of)
- characters defined in pchar
- the /
- the ?
Definition of pchar
Refer to the appendix A. to see how pchar is defined:
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
So this adds
- characters defined in unreserved
- characters defined in pct-encoded
- characters defined in sub-delims
- the :
- the @
Definition of unreserved
Now check how unreserved is defined:
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
This adds
- characters defined in ALPHA
- characters defined in DIGIT
- the -
- the .
- the _
- the ~
Definition of ALPHA and DIGIT
Check how ALPHA and DIGIT are defined. They are not listed in the appendix, because they are from the core ABNF rules, as is explained in section 1.3:
ALPHA (letters), […] DIGIT (decimal digits) […]
So this adds
- a-z
- A-Z
- 0-9
Definition of pct-encoded
Check how pct-encoded is defined:
pct-encoded = "%" HEXDIG HEXDIG
This allows for any percent-encoded character.
Definition of sub-delims
Check how sub-delims is defined:
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
This adds
- the !
- the $
- the &
- the '
- the (
- the )
- the *
- the +
- the ,
- the ;
- the =
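Putting the rules together, the grammar can be approximated with a single regular expression. This is a sketch: isValidFragment is a hypothetical name, and it checks the fragment text without the leading #.

```javascript
// fragment = *( pchar / "/" / "?" )
// pchar    = unreserved / pct-encoded / sub-delims / ":" / "@"
const FRAGMENT_PATTERN =
  /^(?:[A-Za-z0-9\-._~!$&'()*+,;=:@\/?]|%[0-9A-Fa-f]{2})*$/;

// Hypothetical helper: true if the string is a syntactically valid fragment.
function isValidFragment(fragment) {
  return FRAGMENT_PATTERN.test(fragment);
}

isValidFragment('section-3.5');              // true
isValidFragment('customer=1234,state=open'); // true
isValidFragment('');                         // true (the fragment may be empty)
isValidFragment('a b');                      // false (space must be percent-encoded)
isValidFragment('a#b');                      // false (# is not allowed inside a fragment)
isValidFragment('100%');                     // false (% must be followed by two hex digits)
```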