
Caching strategies and headers


Tags: #cache #caching #headers #stale #revalidate #memcache #redis #varnish #nginx #fastly #cdn #if-none-match #conditional #http


Page Level Cache

Application Cache

Redis or Memcache?

The answer is that it depends on your use case.

Memcache:

Redis:

HTTP Cache

HTTP caching is controlled by the headers and status codes the server sends back with its response

e.g. Cache-Control: private,max-age=30
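To make the shape of such a header concrete, here is a minimal sketch that splits a Cache-Control value into its directives. The function name `parse_cache_control` is an assumption made for illustration, and the parser is deliberately naive: it does not handle quoted arguments that themselves contain commas.

```python
from typing import Optional


def parse_cache_control(header_value: str) -> dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument-or-None}.

    Hypothetical helper: splits on commas only, so quoted arguments
    containing commas are not supported.
    """
    directives: dict[str, Optional[str]] = {}
    for part in header_value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        # Directive names are case-insensitive; arguments may be quoted.
        directives[name.strip().lower()] = value.strip().strip('"') if sep else None
    return directives


print(parse_cache_control("private,max-age=30"))
# {'private': None, 'max-age': '30'}
```

Directives without an argument (such as private) map to None, while max-age keeps its value as a string so the caller decides how to interpret it.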

Configuration

References: W3C Specification and MDN docs

no-store, no-cache, must-revalidate

These headers require a little extra clarification.

Unlike max-age, the no-store, no-cache and must-revalidate directives are about restricting what a cache may do with a resource. However, they differ in subtle ways.

no-store is pretty self-explanatory, and in fact, it does even a little more than the name suggests. When present, an HTTP/1.1-compliant cache must not attempt to store anything, and must also take action to delete any copy it might have, whether in memory or stored on disk.

The no-cache directive, on the other hand, is arguably much less self-explanatory. It means a cache must never use a local copy without first validating it with the origin server. This rules out any cache hit that avoids the origin, even for fresh resources, although a successful revalidation (304 Not Modified) still lets the cache reuse the stored body.

To put it another way, the no-cache directive says that caches must revalidate their representations with the origin server. But then comes another directive, awkwardly named… must-revalidate.

If this starts to get confusing for you, rest assured, you are not alone. If what one wants is not to cache, one has to use no-store instead of no-cache. And if what one wants is to always revalidate, one has to use no-cache instead of must-revalidate.

Confusing, indeed.

As for the must-revalidate directive, it is used to forbid a cache from serving a stale resource. If a resource is fresh, must-revalidate allows a cache to serve it without forcing any revalidation, unlike no-store and no-cache. That’s why this directive should always be used with a max-age directive, to indicate a desire to cache a resource for some time and, once it has become stale, to enforce a revalidation.

When it comes to these last three directives, we find the choice of words to describe each of them particularly confusing: no-store and no-cache are expressed negatively whereas must-revalidate is expressed positively. Their differences would probably be more obvious if they were to be expressed in the same fashion.

Therefore, it is helpful to think about each of them expressed in terms of what is not allowed:

no-store: never store anything
no-cache: never cache hit
must-revalidate: never serve stale
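The three "not allowed" rules above can be sketched as a small decision table. This is a simplified illustration, not a full implementation of HTTP caching: `cache_rules` and its result keys are hypothetical names, and "stale_not_forbidden" only means the directives do not prohibit serving stale, not that a cache will actually do so.

```python
def cache_rules(cache_control: str) -> dict[str, bool]:
    """Map a Cache-Control value onto the three 'never' rules (simplified)."""
    names = {
        part.split("=")[0].strip().lower()
        for part in cache_control.split(",")
        if part.strip()
    }
    no_store = "no-store" in names
    no_cache = "no-cache" in names
    must_revalidate = "must-revalidate" in names
    return {
        # no-store: never store anything
        "may_store": not no_store,
        # no-cache: never answer from cache without asking the origin first
        "may_serve_without_revalidating": not (no_store or no_cache),
        # must-revalidate: never serve stale
        "stale_not_forbidden": not (no_store or no_cache or must_revalidate),
    }


print(cache_rules("max-age=30, must-revalidate"))
# {'may_store': True, 'may_serve_without_revalidating': True, 'stale_not_forbidden': False}
print(cache_rules("no-cache"))
# {'may_store': True, 'may_serve_without_revalidating': False, 'stale_not_forbidden': False}
```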

stale-while-revalidate

It’s important to note that the ability of a cache to ‘serve stale’ while revalidating (i.e. checking whether there is a fresher version of the cached content) relies on the origin providing either an ETag or a Last-Modified header. If neither header is sent, the cache cannot update the object’s age (i.e. reset it back to zero once the content is refreshed) or its ‘grace’ period (i.e. how long it may serve the object stale). The cache/proxy then has to make a full request for the content from the origin server.
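As a rough illustration of how the object's age relates to max-age and the 'grace' period, the sketch below classifies a cached object into one of three states. The function name and the state labels are assumptions chosen for this example, and all values are in seconds.

```python
def freshness_state(age: int, max_age: int, stale_while_revalidate: int = 0) -> str:
    """Classify a cached object by its age (all values in seconds)."""
    if age < max_age:
        # Within the freshness lifetime: serve from cache, no origin contact.
        return "fresh"
    if age < max_age + stale_while_revalidate:
        # Within the grace period: serve stale and revalidate in the background.
        return "stale-while-revalidate"
    # Past the grace period: the cache must go to the origin before responding.
    return "expired"


print(freshness_state(age=10, max_age=30, stale_while_revalidate=60))   # fresh
print(freshness_state(age=45, max_age=30, stale_while_revalidate=60))   # stale-while-revalidate
print(freshness_state(age=120, max_age=30, stale_while_revalidate=60))  # expired
```

A successful revalidation would reset the age to zero, which moves the object back into the "fresh" state.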

Each of these response headers causes the browser/cache to issue a conditional request using a different request header: ETag is paired with If-None-Match, and Last-Modified is paired with If-Modified-Since. For example:

Reference: https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching#Cache_validation

The If-None-Match HTTP request header makes the request conditional. For GET and HEAD methods, the server will send back the requested resource, with a 200 status, only if it doesn’t have an ETag matching the given ones. For other methods, the request will be processed only if the eventually existing resource’s ETag doesn’t match any of the values listed.

# a HEAD request is enough to see whether the status is 304 or 200
curl -v --head --header 'If-None-Match: "e54f84f5ccb54dcf20dc2802ce8b8fae6f477f8e"' https://example.com

# works with GET requests too
curl -svo /dev/null --header 'If-None-Match: "e54f84f5ccb54dcf20dc2802ce8b8fae6f477f8e"' https://example.com

The If-Modified-Since request HTTP header makes the request conditional: the server will send back the requested resource, with a 200 status, only if it has been last modified after the given date. If the resource has not been modified since, the response will be a 304 without any body.
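To show how an origin might evaluate both conditional headers for a GET or HEAD request, here is a minimal sketch. `conditional_status` is a hypothetical helper, header names are assumed to arrive with exactly the capitalisation shown, and ETags are compared weakly (ignoring any W/ prefix). If-None-Match is checked first because it takes precedence over If-Modified-Since when both are present.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def _strip_weak(tag: str) -> str:
    """Drop the weak-validator prefix so W/"x" and "x" compare equal."""
    return tag[2:] if tag.startswith("W/") else tag


def conditional_status(request_headers: dict, etag: str, last_modified: datetime) -> int:
    """Return 304 if the client's copy is still valid, otherwise 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # Naive split: assumes ETag values themselves contain no commas.
        candidates = [_strip_weak(t.strip()) for t in if_none_match.split(",")]
        return 304 if "*" in candidates or _strip_weak(etag) in candidates else 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return 200  # an unparseable date is ignored
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution, so drop microseconds.
        if last_modified.replace(microsecond=0) <= since:
            return 304
    return 200


modified = datetime(2015, 10, 21, 7, 28, tzinfo=timezone.utc)
print(conditional_status({"If-None-Match": '"abc"'}, '"abc"', modified))  # 304
print(conditional_status({"If-None-Match": '"old"'}, '"abc"', modified))  # 200
print(conditional_status(
    {"If-Modified-Since": "Wed, 21 Oct 2015 07:28:00 GMT"}, '"abc"', modified))  # 304
```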

It should also be noted that the HTTP/1.1 specification (RFC 2616, section 13.3.4) provides ‘rules’ for when to use ETag vs Last-Modified. In summary…

the preferred behavior for an HTTP/1.1 origin server is to send both a strong entity tag and a Last-Modified value.

Below is a sequence diagram (paste it into https://sequencediagram.org/ to see it properly) which demonstrates the stale-while-revalidate flow using ETags as the revalidation mechanism…

title Fastly stale-while-revalidate (ETag example)

Client->CDN: GET /foo
CDN-->CDN: CACHE MISS
CDN->Origin: GET /foo
Origin-->Origin: generate response\n(and hash response for ETag comparison)
Origin->CDN: 200 OK\n**Content-Length**:<N>\n**ETag**: XYZ\n**Cache-Control**: no-store, must-revalidate\n**Surrogate-Control**: max-age=1day, stale-while-revalidate=1day, stale-if-error=1year
note over CDN: cache response
CDN->Client: 200 OK

note over Client #yellow: max-age TTL expires

Client->CDN:GET /foo
CDN-->CDN: CACHE MISS

group asynchronous request flow
  CDN->Origin: GET /foo\n**If-None-Match**: XYZ
  Origin-->Origin: generate response\n(and hash response for ETag comparison)
  Origin->CDN: 304 Not Modified\n**Content-Length**:0
  note over CDN: cached object not updated
end

CDN->Client: 200 OK (stale content)
note over CDN #pink:even though we'll serve stale to client\nwhile asynchronously trying to update\nour expired cache object, this still\nmeans we'll end up hitting the origin\nagain until we get fresh content.\n\nthis is helped by Fastly's ability to do\n"request collapsing" so for multiple\nclient requests only one will reach the\norigin while the remaining will receive\na stale response.

Client->CDN:GET /foo
CDN-->CDN: CACHE MISS
CDN->Origin: GET /foo\n**If-None-Match**: XYZ
Origin-->Origin: generate response\n(and hash response for ETag comparison)
Origin->CDN: 200 OK\n**Content-Length**:<N>\n**ETag**: ABC\n**Cache-Control**: no-store, must-revalidate\n**Surrogate-Control**: max-age=1day, stale-while-revalidate=1day, stale-if-error=1year
note over CDN: cached object updated with fresh max-age\nand stale-while-revalidate, stale-if-error TTLs

surrogate/cache-control and proxies

The behaviour of specific Cache-Control values when used alongside Surrogate-Control headers can become confusing, so we’ve documented them below…