Cache Busting Via Params

The param ?v=1.123 is a query string, and the browser therefore treats the full URL as distinct from, say, the one ending in ?v=1.0. That causes it to load the resource from the server, not from cache, which is exactly what you want.

And the browser will assume that the source will stay the same the next time you request ?v=1.123, and will cache it under that URL. So it will remain cached, however your server is set up, until you move to ?v=1.124 and so on.
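A minimal sketch of producing such a versioned URL (the helper name and example URLs are illustrative, not any standard API):

```python
from urllib.parse import urlencode, urlparse, parse_qsl, urlunparse

def bust_cache(url, version):
    """Append (or replace) a ?v= parameter so the URL changes per release."""
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query))
    query["v"] = str(version)          # new version => new URL => cache miss
    return urlunparse(parts._replace(query=urlencode(query)))

print(bust_cache("https://example.com/styles.css", "1.123"))
# https://example.com/styles.css?v=1.123
```

Bumping the version from 1.0 to 1.123 yields a URL the browser has never seen, so it fetches and re-caches.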

Cache busting images that are fetched using query params

In such cases the query parameters must actually change between requests. If you simply add a constant &id=1123, that will change nothing.

Try adding &t=nnnn, where nnnn is, say, the current time in seconds.

Or, better still, if you generate the link on the same server, use for nnnn the modification time of the image in seconds, or as a timestamp:

?img.png&t=2019-07-04.17.03.22
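If the link is generated server-side, the mtime-based token can be produced like this (a sketch; the /get endpoint and parameter names follow the examples above and are hypothetical):

```python
import os
import tempfile
from urllib.parse import quote

def busted_image_url(path, base="https://example.com/get"):
    """Use the file's modification time as the cache-busting token, so the
    URL only changes when the image file itself actually changes."""
    mtime = int(os.path.getmtime(path))
    return f"{base}?{quote(os.path.basename(path))}&t={mtime}"

# Create a throwaway file just to demonstrate; in practice this is your image.
with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as f:
    f.write(b"fake image bytes")

print(busted_image_url(f.name))
```

As long as the image is untouched, the URL stays stable and cacheable; saving a new version changes the mtime and therefore the URL.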

UPDATE

You're saying that somehow

 http://www.server.com/get?img.png&id=1

and

 http://www.server.com/get?img.png&id=2

are being treated by the browser as the same image. I find this difficult to accept, because it would be (among other things) a security flaw - it means that, say, get?report.pdf&user=whoknows&password=whatever might end up downloading get?report.pdf&user=realuser&password=realpassword without needing to supply real login information the second time.

Totally not saying it's your fault (as a developer I have often found myself in your exact situation), but someone here seems to have overdone it somewhere. The problems are how to pinpoint where, and what you can do, if anything, with the tools and access you were given.

Why the server might be doing this is the easiest to explain: the server, or some caching system in front of it, strips extra parameters. So you can send id=x273y3 as much as you like; that information never reaches the server and can't make it do anything. It would be interesting to know what the use case for this was.

In some limited cases you might get it done through an ugly hack - if you request 12345/../img.png instead of img.png, and path parsing is done in just the right way, then the cache layer might not cache the request and yet the server will still reply with the newer image. But it's a brittle hack, because lots of legitimate changes in the server architecture might end up breaking it completely, resulting in no image being sent at all.

If you're battling with server side caching then you'd better try and add the appropriate no-cache pragmas to the request itself. The reason many use the extra parameter hack is because, due to long-standing abuse by clients, several cache servers can be, and often are, configured to ignore those headers.
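What "adding the appropriate no-cache pragmas to the request" amounts to, sketched with Python's standard urllib (the URL is a placeholder):

```python
from urllib.request import Request

# Ask every cache along the way to revalidate with the origin server
# rather than serve a stored copy.
req = Request("https://example.com/get?img.png", headers={
    "Cache-Control": "no-cache",   # understood by HTTP/1.1 caches
    "Pragma": "no-cache",          # legacy directive for HTTP/1.0 caches
})
print(req.get_header("Cache-control"))
```

As the text notes, intermediaries are free to ignore these headers, and many are configured to; this only works when the cache layer honors the directives.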

Especially if someone went as far as parameter stripping, they should have endeavoured to properly support cache directives instead.

(On the other hand, if you've got a server that ignores both legitimate headers and request hacks, you've a pretty solid case that whatever happens is on their heads).

Otherwise, what might be happening is that the client believes it can cache the resource because it was sent with specific resource headers (ETag, etc.), and cache revalidation doesn't complete properly because of client/server misunderstandings, which also happens quite often. You should record a full set of conversations and post them here, to help narrow down the problem:

  • headers of the first request to an image
  • headers of the reply
  • headers of the request to an image that meanwhile changed
  • headers of the reply to that

It could also be something very simple, for example the server actually replies with a fixed 302 that strips extra parameters. Then it is the new URL that gets cached:

GET /get?img.png&...
302 Location: http://static-images.server.com/images/img.png

This could be due to a too-thorough RewriteRule in an Apache server's rewrite engine, for example, where "\?(.*.(png|jpg|gif))" is taken from the source request and rewritten to "NewLocation/$1". In such a case another brittle workaround would be to request /get?img.png?t=12345.png, with two ?'s, to trick the rewrite engine into capturing img.png?t=12345.png instead of just img.png, thus including the cache-busting token.
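The double-? trick can be checked against the pattern quoted above: with the normal URL the greedy capture stops at img.png, while the extra ?…png request extends it to include the cache-busting token.

```python
import re

# The (hypothetical) rewrite pattern from the text.
pattern = re.compile(r"\?(.*\.(png|jpg|gif))")

# Normal request: the capture is only the bare filename.
m1 = pattern.search("/get?img.png&t=12345")
# Hacked request with two '?'s: the greedy .* now runs to the final .png,
# so the capture carries the cache-busting token along.
m2 = pattern.search("/get?img.png?t=12345.png")

print(m1.group(1))  # img.png
print(m2.group(1))  # img.png?t=12345.png
```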

The proper, if lengthier, remediation however is to have the rewrite people and the cache people talk to each other and collaborate instead of working at odds.

Cache invalidation using the query-string, bad practice?

It's true that query string cache invalidation is not exactly best practice. There are cases where it doesn't work: some browsers (supposedly), and your CDN might be set up to ignore the query string (serving the same file). But this doesn't mean it's not effective for development workflows or as a quick fix that scratches the itch.

Some folks feel strongly that query strings are not good enough. For a professional site (especially with continuous integration) you should use filenames based on last updated date or a hash of file contents.
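A sketch of content-hash naming (the helper and filenames are illustrative; in a real project a build tool normally generates these names and rewrites the references automatically):

```python
import hashlib

def hashed_name(filename, content: bytes, length=8):
    """Derive a content-addressed filename, e.g. app.css -> app.<hash>.css.
    The name changes if and only if the content changes."""
    digest = hashlib.md5(content).hexdigest()[:length]
    stem, dot, ext = filename.rpartition(".")
    return f"{stem}.{digest}.{ext}" if dot else f"{filename}.{digest}"

print(hashed_name("app.css", b"body { color: red }"))
```

Two identical files always get the same name (cache stays warm across deploys), while any edit produces a new name and thus a guaranteed cache miss.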

Links on the topic...

  • https://css-tricks.com/strategies-for-cache-busting-css
  • https://www.stevesouders.com/blog/2008/08/23/revving-filenames-dont-use-querystring
  • File Caching: Query string vs Last-Modified?

What browsers don't support Cache Busting?

To clarify the question: what browsers support cache busting via query strings?

Cache busting isn't something browsers "support"; it's a technique that uses the standard behavior of browser caching.

Data is cached in the browser per URL. Each unique URL is supposed to represent a unique piece of data, which can be individually cached. By appending a meaningless value in the query string, you change the URL, making it unique, causing the browser to download it because it doesn't have it cached yet. That's all there is to it.

For this not to work a browser would have to have non-standard cache behavior and somehow consider two different URLs equal, and use a cached version of a different URL for a URL it has in fact not yet downloaded. I know of no browser which does this (doesn't mean it doesn't exist, but this would be severely broken).
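That per-URL behavior is the whole mechanism; it can be modeled with nothing more than a map keyed by the complete URL, query string included:

```python
# A browser cache is, conceptually, a map keyed by the full URL.
cache = {}

def fetch(url, from_network):
    """Serve from cache when the exact URL was seen before, else go out."""
    if url not in cache:
        cache[url] = from_network(url)
    return cache[url]

network_calls = []

def from_network(url):
    network_calls.append(url)            # record each real download
    return f"<contents of {url}>"

fetch("/app.js?v=1", from_network)       # miss: downloaded
fetch("/app.js?v=1", from_network)       # hit: no network call
fetch("/app.js?v=2", from_network)       # new URL => miss, downloaded again
print(len(network_calls))                # 2
```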

File Caching: Query string vs Last-Modified?

TL;DR

Why do so many websites use the "query string" method, instead of just letting the last-modified header do its work?

Changing the query string changes the url, ensuring content is "fresh".

Should I unset the Last-modified header and just work with query strings?

No. Though that's almost the right answer.


There are three basic caching strategies used on the web:

  • No caching, or caching disabled
  • Using validation/conditional requests
  • Caching forever

To illustrate all three, consider the following scenario:

A user accesses a website for the first time, loads ten pages and leaves. Each page loads the same css file. For each of the above caching strategies, how many requests would be made?

No caching: 10 requests

In this scenario, it should be clear that there isn't anything else influencing the result: 10 requests for the css file would result in it being sent to the client (browser) 10 times.

Advantages

  • Content always fresh
  • No effort/management required

Disadvantages

  • Least efficient, content always transferred

Validation requests: 10 requests

If Last-Modified or ETag are used, there will also be 10 requests. However, 9 of them will transfer only headers, and no body. Clients use conditional requests to avoid re-downloading something they already have. Take for example the css file for this site.

The very first time the file is requested, the following happens:

$ curl -i http://cdn.sstatic.net/stackoverflow/all.css
HTTP/1.1 200 OK
Server: cloudflare-nginx
Date: Mon, 12 May 2014 07:38:31 GMT
Content-Type: text/css
Connection: keep-alive
Set-Cookie: __cfduid=d3fa9eddf76d614f83603a42f3e552f961399880311549; expires=Mon, 23-Dec-2019 23:50:00 GMT; path=/; domain=.sstatic.net; HttpOnly
Cache-Control: public, max-age=604800
Last-Modified: Wed, 30 Apr 2014 22:09:37 GMT
ETag: "8026e7dfc064cf1:0"
Vary: Accept-Encoding
CF-Cache-Status: HIT
Expires: Mon, 19 May 2014 07:38:31 GMT
CF-RAY: 1294f50b2d6b08de-CDG
.avatar-change:hover{backgro.....Some KB of content

A subsequent request for the same url would look like this:

$ curl -i -H "If-Modified-Since:Wed, 30 Apr 2014 22:09:37 GMT" http://cdn.sstatic.net/stackoverflow/all.css
HTTP/1.1 304 Not Modified
Server: cloudflare-nginx
Date: Mon, 12 May 2014 07:40:11 GMT
Content-Type: text/css
Connection: keep-alive
Set-Cookie: __cfduid=d0cc5afd385060dd8ba26265f0ebf40f81399880411024; expires=Mon, 23-Dec-2019 23:50:00 GMT; path=/; domain=.sstatic.net; HttpOnly
Cache-Control: public, max-age=604800
Last-Modified: Wed, 30 Apr 2014 22:09:37 GMT
ETag: "8026e7dfc064cf1:0"
Vary: Accept-Encoding
CF-Cache-Status: HIT
Expires: Mon, 19 May 2014 07:40:11 GMT
CF-RAY: 1294f778e75d04a3-CDG

Note there is no body, and the response is a 304 Not Modified. This is telling the client that the content it already has (in local cache) for that url is still fresh.
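The server-side decision behind that 304 is a simple validator match, sketched below (real servers also parse and compare the dates rather than matching strings exactly, so treat this as a simplification):

```python
def respond(request_headers, last_modified, etag, body):
    """Return (status, body) the way a validating server would."""
    if (request_headers.get("If-None-Match") == etag or
            request_headers.get("If-Modified-Since") == last_modified):
        return 304, b""        # client copy still fresh: send headers only
    return 200, body           # changed (or first request): send everything

status, payload = respond(
    {"If-Modified-Since": "Wed, 30 Apr 2014 22:09:37 GMT"},
    "Wed, 30 Apr 2014 22:09:37 GMT",
    '"8026e7dfc064cf1:0"',
    b"...some KB of css...",
)
print(status)  # 304
```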

That's not to say this is the optimal scenario. Tools such as the network tab of Chrome developer tools allow you to see exactly how long a request takes, and what that time is spent doing:

[Screenshot: Chrome network tab request timing breakdown]

Because the response has no body, the response time will be much shorter since there's less data to transfer. But there is still a response, and there is still all of the overhead of connecting to the remote server.

Advantages

  • Content always fresh
  • Only one "Full" request sent
  • Nine requests are much slimmer, containing only headers
  • More efficient

Disadvantages

  • Still issues the maximum number of requests
  • Still incurs DNS lookups
  • Still needs to establish a connection to the remote server
  • Doesn't work offline
  • May require server configuration

Caching forever: 1 request

If there are no ETags, no Last-Modified header, and only an Expires header set far in the future, only the very first access to a url will result in any communication with the remote server. This is a well-known best practice for better frontend performance. In this case, for subsequent requests a client will read the content from its own cache and not communicate with the remote server at all.

This has clear performance advantages, which are especially significant on mobile devices where latency can be significant (to put it mildly).
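A sketch of the response headers a "cache forever" setup sends (one year is a common choice; the exact lifetime and header set are up to your server configuration):

```python
import time
from email.utils import formatdate

ONE_YEAR = 365 * 24 * 3600  # seconds

headers = {
    # A long max-age plus a far-future Expires, and deliberately
    # no validators (no ETag / Last-Modified) inviting revalidation.
    "Cache-Control": f"public, max-age={ONE_YEAR}",
    "Expires": formatdate(time.time() + ONE_YEAR, usegmt=True),
}
print(headers["Cache-Control"])
```

With these headers the URL itself must change (revved filename or query token) to push an update to existing visitors.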

Advantages

  • Most efficient, content only transferred once

Disadvantages

  • The url must change to prevent existing visitors loading stale cached versions
  • Most effort to setup/manage

Don't use query strings for cache busting

It is to circumvent a client's cache that sites use a query argument. When the content changes (or a new version of the site is published) the query argument is modified, and therefore a new version of that file will be requested, as the url has changed. This is less work/more convenient than renaming the file every time it changes; it is not, however, without its problems.

Using query strings prevents proxy caching. In the quote below the author demonstrates that a request through browser <-> proxy cache server <-> website does not use the proxy cache:

Loading mylogo.gif?v=1.2 twice (clearing the cache in between) results
in these headers:

>> GET http://stevesouders.com/mylogo.gif?v=1.2 HTTP/1.1
<< HTTP/1.0 200 OK
<< Date: Sat, 23 Aug 2008 00:19:34 GMT
<< Expires: Tue, 21 Aug 2018 00:19:34 GMT
<< X-Cache: MISS from someserver.com
<< X-Cache-Lookup: MISS from someserver.com

>> GET http://stevesouders.com/mylogo.gif?v=1.2 HTTP/1.1
<< HTTP/1.0 200 OK
<< Date: Sat, 23 Aug 2008 00:19:47 GMT
<< Expires: Tue, 21 Aug 2018 00:19:47 GMT
<< X-Cache: MISS from someserver.com
<< X-Cache-Lookup: MISS from someserver.com

Here it’s clear the second response was not served by the proxy: the
caching response headers say MISS, the Date and Expires values change,
and tailing the stevesouders.com access log shows two hits.

This shouldn't be taken lightly - when accessing a website physically located on the other side of the world response times can be very slow. Getting an answer from a proxy server located along the route can mean the difference between a website being usable or not - in the case of cached-forever resources it means the first load of a url is slow, in the case of using validation requests it means the whole site will be sluggish.

Instead version-control assets

The "best" solution is to version control files such that whenever the content changes so does the url. Normally that would be automated as part of the build process.

However, a reasonable compromise is to implement a rewrite rule such as:

# ------------------------------------------------------------------------------
# | Filename-based cache busting |
# ------------------------------------------------------------------------------

# If you're not using a build process to manage your filename version revving,
# you might want to consider enabling the following directives to route all
# requests such as `/css/style.12345.css` to `/css/style.css`.

# To understand why this is important and a better idea than `*.css?v231`, read:
# http://stevesouders.com/blog/2008/08/23/revving-filenames-dont-use-querystring

<IfModule mod_rewrite.c>
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.+)\.(\d+)\.(js|css|png|jpe?g|gif)$ $1.$3 [L]
</IfModule>

In this way a request for foo.123.css is processed by the server as foo.css - this has all the advantages of using a query parameter for cache busting, but without the problem of disabling proxy caching.
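The same mapping can be expressed as a small function for clarity (a re-implementation of the rule above to show the logic, not how Apache itself runs it):

```python
import re

# Mirror of the Apache rule: strip the embedded version segment so a
# request for /css/style.12345.css is served from /css/style.css on disk.
REVVED = re.compile(r"^(.+)\.(\d+)\.(js|css|png|jpe?g|gif)$")

def resolve(path):
    m = REVVED.match(path)
    return f"{m.group(1)}.{m.group(3)}" if m else path

print(resolve("/css/style.12345.css"))  # /css/style.css
```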

What's the ultimate and definitive way to clear the browser cache?

This doesn't clear caches but will solve your updating problem.

In the HTML, add an (unused) query string to the html link to linked files and alter it each time you make an update to the file. e.g. for css:

<link rel="stylesheet" href="styles.css?a">

Then, each time you make changes to the file pointed to, change the 'a' to 'b' or anything else (don't change the linked file's name; the query string itself will be ignored).

This forces the browser to 'change' the linked file each time the href changes and so the altered file gets reloaded.

The method will work for script and other linked files. The query string could be something meaningful such as version numbers - ?v1, but anything will do.

Edit, as noted by @GerardoFurtado, a further discussion of this idea is available here Cache busting via params


