Caching Http Responses When They Are Dynamically Created by PHP

PHP is not really made for serving large files, or a large number of auxiliary files.

Instead, look at X-Accel-Redirect for nginx, X-Sendfile for Lighttpd, or mod_xsendfile for Apache.

The initial request is handled by PHP, but once the script has determined which file to send, it sets a response header telling the web server to deliver that file itself. The PHP process is then freed to serve other requests while the server streams the file.

You can then use the web server to configure the caching for you.
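A minimal sketch of that hand-off in PHP follows. The directory /var/www/private/downloads, the nginx internal location /protected/ and the function name offload_headers() are hypothetical, and the web server must be configured separately to honour these headers. The function only builds the header list, so it can be checked without a web server.

```php
<?php
// Sketch: hand the actual file transfer over to the web server.
// Hypothetical setup: files are stored in /var/www/private/downloads,
// nginx exposes that directory as an internal location named /protected/,
// and Apache (mod_xsendfile) or Lighttpd are configured to accept X-Sendfile.

function offload_headers(string $server, string $requested): array
{
    $baseDir = '/var/www/private/downloads'; // hypothetical storage directory
    $name = basename($requested);            // drop any "../" path components

    $headers = [
        'Content-Type: application/octet-stream',
        'Content-Disposition: attachment; filename="' . $name . '"',
    ];
    if ($server === 'nginx') {
        // nginx maps this URI onto the internal location and sends the file
        $headers[] = 'X-Accel-Redirect: /protected/' . rawurlencode($name);
    } else {
        // mod_xsendfile and Lighttpd expect an absolute filesystem path
        $headers[] = 'X-Sendfile: ' . $baseDir . '/' . $name;
    }
    return $headers;
}

// In a web request: send the headers and exit without echoing the file body.
if (PHP_SAPI !== 'cli') {
    foreach (offload_headers('nginx', $_GET['file'] ?? '') as $header) {
        header($header);
    }
    exit;
}
```

PHP only spends time on deciding which file to send; the transfer itself, and any caching rules you configure for that location, are handled by the web server.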

Statically generated content

If your content is generated from PHP and particularly expensive to create, you could write the output to a local file and apply the above method again.

If you can't write to a local file or don't want to, you can use HTTP response headers to control caching:

Expires: <absolute date in the future>
Cache-Control: public, max-age=<relative time in seconds since request>

This lets clients (and, because of public, shared caches such as proxies) reuse the page without contacting the server until it expires, or until the user forces a page reload (e.g. by pressing F5).
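As a sketch, the two headers could be produced like this. freshness_headers() is a hypothetical helper name, and passing the current time in as $now is just a choice that makes the output easy to check. When both headers are present, HTTP/1.1 caches use max-age and ignore Expires.

```php
<?php
// Sketch: build the two caching headers for content that may be reused
// for $maxAge seconds. $now is a parameter so the output is predictable.

function freshness_headers(int $maxAge, int $now): array
{
    return [
        // Expires needs an absolute date in HTTP format (always GMT)
        'Expires: ' . gmdate('D, d M Y H:i:s', $now + $maxAge) . ' GMT',
        // max-age is relative to the time of the response
        'Cache-Control: public, max-age=' . $maxAge,
    ];
}

if (PHP_SAPI !== 'cli') {
    foreach (freshness_headers(3600, time()) as $header) { // cache for one hour
        header($header);
    }
}
```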

Dynamically generated content

For dynamic content you want the browser to check with your script on every request, but to receive the page contents only if something has changed. You can accomplish this by setting two other response headers:

ETag: <hash of the contents>
Last-Modified: <absolute date of last contents change>

When the browser requests the page again, it adds the following request headers, respectively:

If-None-Match: <the ETag value you sent last time>
If-Modified-Since: <the Last-Modified date you sent last time>

An ETag mostly saves network traffic rather than server work: in many cases you have to generate the full contents before you can calculate their hash.
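To illustrate that point, here is a sketch that builds both validator headers from the finished output. validator_headers() is a hypothetical name, and where $lastChange comes from (a file modification time, a database column) depends on your application. Because the hash is computed from $body, the page has to be fully rendered before the ETag can be sent.

```php
<?php
// Sketch: the two validator headers for a generated page.
// $body is the finished output; $lastChange is a Unix timestamp of the
// last content change (its source is up to your application).

function validator_headers(string $body, int $lastChange): array
{
    return [
        // Quoted hash of the exact bytes that will be sent
        'ETag: "' . sha1($body) . '"',
        // HTTP dates are always expressed in GMT
        'Last-Modified: ' . gmdate('D, d M Y H:i:s', $lastChange) . ' GMT',
    ];
}
```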

Last-Modified is the easiest to apply if you keep local cache files, because files have a modification time. A simple condition makes it work:

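// the header is missing on a first visit, so check before parsing it
$since = isset($_SERVER['HTTP_IF_MODIFIED_SINCE']) ? strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']) : false;

if (!file_exists('cache.txt') || $since === false ||
filemtime('cache.txt') > $since) {
// (re)build cache file if needed and send back contents as usual (+ cache headers, including Last-Modified)
} else {
http_response_code(304); // Not Modified: the client's copy is current, send no body
exit;
}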

If you can't use file caches, you can still use an ETag to determine whether the contents have changed in the meantime.

Caching Dynamically Generated Pages

If your website has hundreds of pages and many visitors every day, you might want to implement some sort of caching mechanism to speed up page loading. Each request involves database queries, server-side processing and the server response, and all of these add to the overall page loading time. The most common solution is to make copies of dynamic pages, called cache files, and store them in a separate directory; these can then be served as static pages instead of regenerating the dynamic pages again and again.
Understanding Dynamic pages & Cache Files

Cache files are static copies of the output of dynamic pages. Each one is generated once and stored in a separate folder until it expires; when a user requests the content, that static file is served instead of the dynamically generated page. This avoids regenerating the HTML and querying the database over and over again with server-side code. For example, running several database queries and processing the PHP code that builds the HTML output can take a noticeable amount of time on every request, whereas a cached file consists of plain HTML (you can open it in any text editor or browser) and needs almost no processing to serve.

Dynamic page: The figure below shows how a dynamic page is generated. As the name says, it is completely dynamic: it talks to the database and generates the HTML output according to the variables the user provides with the request. For example, a user might want to list all the books by a particular author; the page does that by sending queries to the database and generating fresh HTML. Each request takes some processing time and server memory, which is not a big deal if the website receives very few visitors. However, when hundreds of visitors request and regenerate dynamic pages over and over again, the load on the server increases considerably, resulting in delayed responses and HTTP errors in the client's browser.

(Figure: dynamic page example)

Cached file: The figure below illustrates how cached files are served instead of dynamic pages. As explained above, cached files are nothing but static web pages containing plain HTML; their content changes only when the cache file is regenerated (for example after it expires) or edited by hand. Because cached files need neither a database connection nor page-generation time, they are an effective way to reduce server load and keep page loading times consistently low.

(Figure: cached file example)
PHP Caching

There are other ways to cache dynamic pages in PHP, but a common method combines PHP output buffering with the filesystem functions; together they make a simple and effective caching system.

PHP output buffer: Output buffering collects the script's output in memory and sends it in one go rather than piece by piece, and it lets you access the whole HTML page as a single string. The method is very simple; take a look at the code below:

<?php
ob_start(); // start the output buffer

/* the content */
echo '<p>Page content</p>';

$html = ob_get_contents(); // get the contents of the output buffer
ob_end_flush(); // send the output and turn off output buffering
?>

When you call ob_start() at the top of the script, it turns output buffering on, which means anything output after this point is stored in the buffer instead of being sent to the browser. The content in the buffer can be retrieved using ob_get_contents(). Call ob_end_flush() at the end of the script to send the output to the browser and turn buffering off.
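A related pattern, shown here as a sketch with a hypothetical render_to_string() helper, uses ob_get_clean(), which returns the buffer and turns buffering off in one call. That is handy when the output should be kept as a string, for instance to write it to a cache file, rather than sent straight to the browser.

```php
<?php
// Sketch: capture everything a piece of page code echoes into a string.

function render_to_string(callable $page): string
{
    ob_start();                // start buffering: echo no longer reaches the browser
    try {
        $page();               // run the page code; its output lands in the buffer
        return ob_get_clean(); // fetch the buffer and stop buffering
    } catch (Throwable $e) {
        ob_end_clean();        // discard partial output if the page code fails
        throw $e;
    }
}

$html = render_to_string(function () {
    echo '<p>Hello</p>';
});
```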

PHP filesystem: You may already be familiar with PHP's filesystem functions; they are part of the PHP core and allow us to read and write files. Have a look at the following code:

$fp = fopen('/path/to/file.txt', 'w');  //open file for writing
fwrite($fp, 'I want to write this'); //write
fclose($fp); //Close file pointer

As you can see, the first line, fopen(), opens the file for writing: mode 'w' places the file pointer at the beginning of the file (truncating any existing content), and if the file does not exist, it attempts to create it. The second line, fwrite(), writes the string to the opened file, and finally fclose() closes the file that was opened on the first line.
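When several visitors hit an expired page at once, one request may read the cache file while another is still writing it. A common remedy, sketched below with a hypothetical write_cache_file() helper, is to write to a temporary file in the same directory and then rename() it over the old one; on POSIX filesystems that rename is atomic when both paths are on the same filesystem. file_put_contents() is used here as a shorthand for the fopen()/fwrite()/fclose() sequence above.

```php
<?php
// Sketch: write a cache file so that readers never see a half-written copy.

function write_cache_file(string $path, string $contents): bool
{
    $dir = dirname($path);
    if (!is_dir($dir) && !mkdir($dir, 0755, true)) {
        return false;
    }
    $tmp = tempnam($dir, 'cache_'); // unique temp file next to the target
    if ($tmp === false) {
        return false;
    }
    if (file_put_contents($tmp, $contents) === false) { // fopen + fwrite + fclose in one call
        unlink($tmp);
        return false;
    }
    chmod($tmp, 0644);              // tempnam() creates the file with mode 0600
    return rename($tmp, $path);     // replace the old cache file in one step
}
```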
Implementing PHP caching

Now that you are familiar with PHP output buffering and the filesystem functions, we can combine the two to create our PHP caching system. The flowchart below gives the basic idea of our cache system.

(Figure: PHP cache system flowchart)

The cycle starts when a user requests the content. We check whether an unexpired cached copy exists for the requested page: if it doesn't, we generate the page, create a cache copy and then output the result; if it does, we simply read the file and send it to the user's browser.

Take a look at the full PHP cache code below. You can copy and paste it into your PHP projects; it follows the flowchart above. You can adjust the settings in the code, such as the cache expiry time, the cache file extension and the ignored pages.

<?php
//settings
$cache_ext = '.html'; //file extension
$cache_time = 3600; //cache file expires after this many seconds (1 hour = 3600 sec)
$cache_folder = 'cache/'; //folder to store cache files
$ignore_pages = array('', ''); //full URLs that should never be cached (fill in your own)

$dynamic_url = 'http://'.$_SERVER['HTTP_HOST'].$_SERVER['REQUEST_URI']; // requested dynamic page (full URL); REQUEST_URI already includes the query string
$cache_file = $cache_folder.md5($dynamic_url).$cache_ext; // construct a cache file name
$ignore = in_array($dynamic_url, $ignore_pages); //check if URL is in ignore list

if (!$ignore && file_exists($cache_file) && time() - $cache_time < filemtime($cache_file)) { //check Cache exist and it's not expired.
ob_start('ob_gzhandler'); //Turn on output buffering, "ob_gzhandler" for the compressed page with gzip.
readfile($cache_file); //read Cache file
echo '<!-- cached page - '.date('l jS \of F Y h:i:s A', filemtime($cache_file)).', Page : '.$dynamic_url.' -->';
ob_end_flush(); //Flush and turn off output buffering
exit(); //no need to proceed further, exit the flow.
}
//Turn on output buffering with gzip compression.
ob_start('ob_gzhandler');
######## Your Website Content Starts Below #########
?>
<!DOCTYPE html>
<html>
<head>
<title>Page to Cache</title>
</head>
<body>
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer ut tellus libero.
</body>
</html>
<?php
######## Your Website Content Ends here #########

if (!is_dir($cache_folder)) { //create the cache folder if needed
mkdir($cache_folder, 0755, true);
}
if (!$ignore && ($fp = fopen($cache_file, 'w')) !== false) { //open file for writing
fwrite($fp, ob_get_contents()); //write contents of the output buffer to the cache file
fclose($fp); //close file pointer
}
ob_end_flush(); //Flush and turn off output buffering

?>

You must place your PHP content between the two marked comment lines. In fact, I'd suggest moving the cache code into separate header and footer files, so that it can generate and serve cache files for all your different dynamic pages. If you read the comment lines in the code carefully, you should find it pretty much self-explanatory.
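Following that suggestion, the two halves of the cache code could live in functions in a shared include file. The sketch below uses hypothetical names (cache_start(), cache_end(), cache.php) and leaves out the gzip handler, the ignore list and the HTML comment from the full script to stay short; the usage comment at the top shows how a page would call it.

```php
<?php
// Sketch: the cache code above split into two reusable functions.
// Usage in a page (file and function names are hypothetical):
//   require 'cache.php';
//   $file = 'cache/' . md5($_SERVER['REQUEST_URI']) . '.html';
//   if (cache_start($file, 3600)) { exit; }
//   ... page content ...
//   cache_end($file);

function cache_start(string $cacheFile, int $cacheTime): bool
{
    // Serve the cached copy if it exists and has not expired yet
    if (is_file($cacheFile) && time() - $cacheTime < filemtime($cacheFile)) {
        readfile($cacheFile);
        return true; // caller should stop here
    }
    ob_start(); // otherwise capture the page output from here on
    return false;
}

function cache_end(string $cacheFile): void
{
    $dir = dirname($cacheFile);
    if (!is_dir($dir)) {
        mkdir($dir, 0755, true); // create the cache folder if needed
    }
    // Save the captured page, then send it to the client
    file_put_contents($cacheFile, ob_get_contents(), LOCK_EX);
    ob_end_flush();
}
```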

How to cache dynamically created images?

In your code, the server decides whether it thinks the browser should still have the image cached. Obviously, the server is hardly the authority on that: there are any number of reasons why the browser may not have the image cached, e.g. if you open the developer tools (which typically disables caching).

The browser has a mechanism for informing the server about its cache status: the HTTP request headers If-None-Match (for ETags) and If-Modified-Since (for time-based validation). If either of these headers is present in the request, the browser still has a cached copy of the resource and would happily accept a 304 response instead of downloading the resource again.

If you set an ETag header in your response, the browser will send If-None-Match with its next request (this essentially replaces your cookie mechanism, and is more reliable); if you set a Last-Modified header, the browser will revalidate using the If-Modified-Since header. An expiration date alone (Expires or max-age) only tells the browser how long it may reuse its copy without asking. These request headers are what you should base your 304 reply on.

Example using ETags:

// $originalSource and $offset come from your own code:
// the image data and the cache lifetime in seconds
$hash = sha1($originalSource);

// send the caching headers on every response, including 304s
header("Expires: " . gmdate("D, d M Y H:i:s", time() + $offset) . " GMT");
header("Cache-Control: max-age=$offset");
header("ETag: \"$hash\"");

if (
isset($_SERVER['HTTP_IF_NONE_MATCH']) &&
trim($_SERVER['HTTP_IF_NONE_MATCH'], '"') === $hash
) {
http_response_code(304); // Not Modified: no body
exit;
}

header("Content-Type: image/jpeg");

// output image

Handling caching of dynamic data

I'm not a PHP person, but I can tell you about caching.

Dynamic sites that generate content per user are the trickiest to cache effectively, but it can be done. It requires that you look at how data flows through your application in order to determine how, where and what to cache. Here are some guidelines:

  • Data that does not change per user or per page - cache in the application memory and grab it instead of going to the DB.
  • Data that changes per user but not per page - cache in the user session.
  • Data that changes per page but not per user - cache in app memory using the page name as the key.
  • Data that changes per user per page - cache in session with the page name as the key.
  • Data that is unique per page request - do not cache.
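One way to express these guidelines in code, as a sketch under stated assumptions: $appCache stands in for application memory (in a real deployment that would be a shared store such as APCu or Memcached, since a plain PHP array lives only for one request), $session stands in for $_SESSION, and the scope names and the cached() helper are hypothetical.

```php
<?php
// Sketch of the caching guidelines above (names are hypothetical).
// $appCache stands in for application memory shared by all users,
// $session stands in for $_SESSION (one user's data).

function cached(array &$appCache, array &$session, string $scope,
                string $name, string $page, callable $compute)
{
    switch ($scope) {
        case 'global':    // same for every user and every page
            $store = &$appCache;
            $key = $name;
            break;
        case 'user':      // differs per user, same on every page
            $store = &$session;
            $key = $name;
            break;
        case 'page':      // differs per page, same for every user
            $store = &$appCache;
            $key = $page . '|' . $name;
            break;
        case 'user_page': // differs per user and per page
            $store = &$session;
            $key = $page . '|' . $name;
            break;
        default:          // unique per request: never cache
            return $compute();
    }
    if (!array_key_exists($key, $store)) {
        $store[$key] = $compute(); // compute once, then reuse
    }
    return $store[$key];
}
```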

Not just data from the database is a candidate for caching. If you have a block of complex logic that manipulates data, consider caching the output of that logic.

Dynamic CSS is not being cached by browsers - response returns 200 OK instead of 304 Not Modified

The Expires: header tells the browser or cache server how long it can keep reusing the same resource without reloading it from the origin server.

If you want the browser to make a conditional request (e.g. using If-Modified-Since: or If-None-Match:), you need to send a Last-Modified: and/or ETag: header, and you need to write code to test for these headers and produce the appropriate response (304 or 200).

See RFC 7232 (which superseded the corresponding parts of RFC 2616) for full details.
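A sketch of that test in PHP follows. conditional_status() is a hypothetical helper; following RFC 7232, If-None-Match is evaluated first and If-Modified-Since only when no If-None-Match header is present. It uses the weak comparison RFC 7232 prescribes for If-None-Match and assumes ETag values without commas, since it splits the header on commas. The usage part, which only runs under a web server, assumes the stylesheet changes only when the script file itself changes; with CSS built from a database you would use the data's own modification time instead.

```php
<?php
// Sketch: answer a conditional request with 304 or 200 (RFC 7232 order).

function conditional_status(array $server, string $etag, int $lastModified): int
{
    if (isset($server['HTTP_IF_NONE_MATCH'])) {
        $header = trim($server['HTTP_IF_NONE_MATCH']);
        if ($header === '*') {
            return 304; // any current version the client holds is fine
        }
        // Weak comparison: ignore a leading W/ on each listed tag
        foreach (explode(',', $header) as $candidate) {
            $candidate = preg_replace('#^W/#', '', trim($candidate));
            if ($candidate === $etag) {
                return 304;
            }
        }
        return 200; // If-Modified-Since is ignored when If-None-Match is present
    }
    if (isset($server['HTTP_IF_MODIFIED_SINCE'])) {
        $since = strtotime($server['HTTP_IF_MODIFIED_SINCE']);
        if ($since !== false && $lastModified <= $since) {
            return 304;
        }
    }
    return 200;
}

// Usage in a hypothetical style.php (skipped on the command line)
if (PHP_SAPI !== 'cli') {
    $css = "body { color: #333; }\n";    // your dynamically built CSS
    $etag = '"' . md5($css) . '"';
    $lastModified = filemtime(__FILE__); // assumption: CSS changes only with this file
    header('Content-Type: text/css');
    header('Cache-Control: no-cache');   // store, but revalidate before each reuse
    header('ETag: ' . $etag);
    header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $lastModified) . ' GMT');
    if (conditional_status($_SERVER, $etag, $lastModified) === 304) {
        http_response_code(304);
        exit;
    }
    echo $css;
}
```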

How do you create a dynamic page in WordPress and not encounter issues with caching?

I was finally able to debug this problem using wget on the server like so:

$ wget https://localhost/blog/ --no-check-certificate --server-response

Once I disabled the two caching plugins I was using in WordPress, this command allowed me to bypass Cloudflare and see the headers being set by Apache.

--2020-05-19 13:21:08--  https://localhost/blog/
Resolving localhost (localhost)... 127.0.0.1
Connecting to localhost (localhost)|127.0.0.1|:443... connected.
WARNING: cannot verify localhost's certificate, issued by ‘ST=California,L=San Francisco,OU=CloudFlare Origin SSL Certificate Authority,O=CloudFlare\\, Inc.,C=US’:
Unable to locally verify the issuer's authority.
WARNING: no certificate subject alternative name matches
requested host name ‘localhost’.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Date: Tue, 19 May 2020 12:21:08 GMT
Server: Apache
Link: <https://localhost/blog/wp-json/>; rel="https://api.w.org/"
Cache-Control: private, must-revalidate
Expires: Tue, 19 May 2020 12:31:08 GMT
Vary: Accept-Encoding,User-Agent
Content-Type: text/html; charset=UTF-8
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Length: unspecified [text/html]
Saving to: ‘index.html’

I noticed that the Cache-Control header was different from the one in my Apache configuration:

Cache-Control: max-age=0, private, no-store, no-cache, must-revalidate

This was because the Cache-Control header was being set in the root domain Apache config but not for the blog (it is being hosted with a reverse proxy).

The solution was to copy all the Expires and Cache-Control header config into my blog Apache configuration file and then voila:

$ wget https://localhost/blog/ --no-check-certificate --server-response
--2020-05-19 16:41:19--  https://localhost/blog/
Resolving localhost (localhost)... 127.0.0.1
Connecting to localhost (localhost)|127.0.0.1|:443... connected.
WARNING: cannot verify localhost's certificate, issued by ‘ST=California,L=San Francisco,OU=CloudFlare Origin SSL Certificate Authority,O=CloudFlare\\, Inc.,C=US’:
Unable to locally verify the issuer's authority.
WARNING: no certificate subject alternative name matches
requested host name ‘localhost’.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Date: Tue, 19 May 2020 15:41:20 GMT
Server: Apache
Vary: Accept-Encoding,Cookie,User-Agent
Link: <https://localhost/blog/wp-json/>; rel="https://api.w.org/"
Cache-Control: private, no-store, no-cache, must-revalidate
Expires: Tue, 19 May 2020 15:41:20 GMT
Content-Type: text/html; charset=UTF-8
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Length: unspecified [text/html]
Saving to: ‘index.html’

For completeness, please see my new Apache config for my blog:

# avoids sending hackers too much info about the server
ServerTokens Prod

<VirtualHost *:8080>
ServerName www.example.com
ServerAdmin dagmar@example.com

ErrorLog /var/log/apache2/blog/error.log
CustomLog /var/log/apache2/blog/access.log common

DocumentRoot /var/www/blog
<Directory /var/www/blog>
AllowOverride All
Options -Indexes
</Directory>

# Enable Compression
<IfModule mod_deflate.c>
SetOutputFilter DEFLATE
SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png)$ no-gzip dont-vary
SetEnvIfNoCase Request_URI \.(?:exe|t?gz|zip|bz2|sit|rar)$ no-gzip dont-vary
Header append Vary User-Agent
</IfModule>

# Enable expires headers
<IfModule mod_expires.c>
ExpiresActive On
ExpiresByType image/jpg "access plus 1 year"
ExpiresByType image/jpeg "access plus 1 year"
ExpiresByType image/gif "access plus 1 year"
ExpiresByType image/png "access plus 1 year"
ExpiresByType text/css "access plus 1 month"
ExpiresByType application/pdf "access plus 1 month"
ExpiresByType text/x-javascript "access plus 1 month"
ExpiresByType text/javascript "access plus 1 month"
ExpiresByType application/javascript "access plus 1 month"
ExpiresByType application/x-javascript "access plus 1 month"
ExpiresByType image/x-icon "access plus 1 year"
ExpiresByType text/xml "access plus 0 seconds"
ExpiresByType text/html "access plus 0 seconds"
ExpiresByType text/plain "access plus 0 seconds"
ExpiresByType application/xml "access plus 0 seconds"
ExpiresByType application/json "access plus 0 seconds"
ExpiresByType application/rss+xml "access plus 1 hour"
ExpiresByType application/atom+xml "access plus 1 hour"
ExpiresByType text/x-component "access plus 1 hour"
ExpiresDefault "access plus 0 seconds"
</IfModule>

# Enable caching headers
<IfModule mod_headers.c>
# Calculate etag on modified time and file size (could be turned off too ?)
FileETag MTime Size
# NEVER CACHE - notice the extra directives
<FilesMatch "\.(html|htm|php)$">
Header set Cache-Control "private, no-store, no-cache, must-revalidate"
</FilesMatch>
</IfModule>
</VirtualHost>

Prevent caching of dynamic (php) content

You're saying that you want fresh content without a page refresh. For that you will need something else, such as jQuery and AJAX; disabling caching won't make pages update themselves without a refresh.

Is this what you're looking for?

 <FilesMatch "\.(pl|php|cgi|spl|scgi|fcgi)$">
FileETag None
<ifModule mod_headers.c>
Header unset ETag
Header set Cache-Control "max-age=0, no-cache, no-store, must-revalidate"
Header set Pragma "no-cache"
Header set Expires "Wed, 11 Jan 1984 05:00:00 GMT"
</ifModule>
</FilesMatch>

Source: http://www.askapache.com/htaccess/using-http-headers-with-htaccess.html
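If .htaccess or mod_headers is not available, roughly the same headers can be sent from the PHP script itself. The sketch below uses a hypothetical no_cache_headers() helper so the header list can be checked outside a web server; header_remove('ETag') is only a precaution, since PHP does not add an ETag to its output by itself.

```php
<?php
// Sketch: the "never cache" headers from the config above, sent by PHP.

function no_cache_headers(): array
{
    return [
        'Cache-Control: max-age=0, no-cache, no-store, must-revalidate',
        'Pragma: no-cache',                       // for HTTP/1.0 caches
        'Expires: Wed, 11 Jan 1984 05:00:00 GMT', // any date in the past
    ];
}

if (PHP_SAPI !== 'cli' && !headers_sent()) {
    header_remove('ETag');
    foreach (no_cache_headers() as $header) {
        header($header);
    }
}
```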


