How CDN caching works

Sumanta pakira
3 min read · Aug 2, 2021

Let's first understand how caching works behind the scenes. Suppose you open your browser at 9:00 AM.

Note: We should use either the Cache-Control or the Expires header; we do not need both at the same time. All modern CDN providers treat Cache-Control as the primary header, so in our case we should use Cache-Control. When a response already carries Cache-Control with a max-age directive, an Expires header makes no difference.
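The precedence between the two headers can be sketched in a few lines of Python. This is only an illustration of the rule, not how any particular CDN implements it; the function name freshness_lifetime and the lowercase-keyed headers dict are assumptions made for the example.

```python
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers):
    """Return the freshness lifetime in seconds, or None if no header sets one.

    `headers` is a hypothetical dict of response headers with lowercase keys.
    """
    cache_control = headers.get("cache-control", "")
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        value = value.strip('"')
        if name.lower() == "max-age" and value.isdigit():
            # max-age wins: Expires is not consulted at all.
            return int(value)

    expires, date = headers.get("expires"), headers.get("date")
    if expires and date:
        # Fallback: the lifetime is Expires minus Date.
        delta = parsedate_to_datetime(expires) - parsedate_to_datetime(date)
        return max(0, int(delta.total_seconds()))
    return None
```

With both headers present, the max-age value is returned and the Expires date never enters the calculation, which is why sending both adds nothing.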

Similarly, there is little benefit in sending both ETag and Last-Modified; one validator is enough. In our AEM case we should use Last-Modified, because ETag has a compression issue in Apache: when Apache compresses a response with gzip, it appends a suffix to the ETag, producing something like “2dc51-5beaacb536de4-gzip”. The browser sends that suffixed value back in If-None-Match, it no longer matches the ETag Apache computes for the file, and we get a 200 HTTP status instead of 304 even though there is no new content. Node.js applications do not have this problem.
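The mismatch can be illustrated with a simplified comparison in Python. This is a sketch under stated assumptions: conditional_status is a hypothetical helper that does a plain string comparison, whereas a real server also handles weak validators and lists of ETags.

```python
def conditional_status(if_none_match, stored_etag):
    """Return 304 when the client's validator matches the stored ETag, else 200."""
    if if_none_match is not None and if_none_match == stored_etag:
        return 304
    return 200


# ETag the server computes for the file (illustrative value).
stored_etag = '"2dc51-5beaacb536de4"'
# What the browser received after compression and echoes back in If-None-Match.
echoed_etag = '"2dc51-5beaacb536de4-gzip"'

status_with_suffix = conditional_status(echoed_etag, stored_etag)
status_without_suffix = conditional_status(stored_etag, stored_etag)
```

Here status_with_suffix is 200 and status_without_suffix is 304: the content is identical in both cases, and only the "-gzip" suffix prevents the match.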

To overcome the above problem, we could add the following directives to the vhost file:

RequestHeader edit "If-None-Match" "^\"(.*)-gzip\"$" "\"$1\""

Header edit "ETag" "^\"(.*[^g][^z][^i][^p])\"$" "\"$1-gzip\""
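To see what these two directives do to a header value, the same patterns can be tried out in Python. This is only a sketch: the rule tuples and the edit_header helper are names invented for the example, and Apache's $1 backreference is written as \1 in Python.

```python
import re

# Python equivalents of the two Apache patterns (Apache's $1 becomes \1).
STRIP_GZIP = (r'^"(.*)-gzip"$', r'"\1"')
ADD_GZIP = (r'^"(.*[^g][^z][^i][^p])"$', r'"\1-gzip"')


def edit_header(value, rule):
    """Apply one regex rule to a header value, like a single `edit` action."""
    pattern, replacement = rule
    return re.sub(pattern, replacement, value, count=1)


# Incoming If-None-Match: the suffix is stripped before Apache compares it.
request_value = edit_header('"2dc51-5beaacb536de4-gzip"', STRIP_GZIP)
# Outgoing ETag: the suffix is appended if it is not already there.
response_value = edit_header('"2dc51-5beaacb536de4"', ADD_GZIP)
# An ETag that already ends in -gzip is left unchanged.
already_suffixed = edit_header('"2dc51-5beaacb536de4-gzip"', ADD_GZIP)
```

One quirk worth noting: the second pattern excludes each of the last four characters individually, so an ETag whose fourth-from-last character happens to be "g" (for example) would not get the suffix either.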

The above solution works, but it does not solve the redundancy issue: both ETag and Last-Modified are still sent. So I suggest not using the above code and using the following instead:

Header unset ETag

Header unset Expires

FileETag None
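With ETag removed, revalidation relies on Last-Modified and the If-Modified-Since request header. A minimal Python sketch of that check, assuming a hypothetical revalidate helper and HTTP-date strings as input, might look like this:

```python
from email.utils import parsedate_to_datetime


def revalidate(if_modified_since, last_modified):
    """Return 304 if the resource has not changed since the client's copy, else 200."""
    if if_modified_since is None:
        # No validator sent: the full response is returned.
        return 200
    client_time = parsedate_to_datetime(if_modified_since)
    resource_time = parsedate_to_datetime(last_modified)
    return 304 if resource_time <= client_time else 200
```

Because the comparison is on timestamps rather than on an opaque tag, gzip compression has nothing to interfere with.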

When you press F5 in Chrome, it always sends requests to the server, with the Cache-Control: max-age=0 header. The server usually responds with a 304 (Not Modified) status code, but we can also get a 200 when the ETag or Last-Modified value has changed.

When you press Ctrl+F5 or Shift+F5, the same requests are sent, but with the Cache-Control: no-cache header, which asks for an uncached version, usually returned with a 200 (OK) status code. Even so, we may still get a cache hit from the CDN instead of the updated content, because the max-age of the cached object has not yet expired.

The Age header is a standard HTTP header that AWS CloudFront adds to responses served from its cache; it tells how many seconds the content has been sitting in the cache. Once the Age value exceeds max-age, CloudFront goes back to the origin, and when the origin responds with either 200 or 304, Age resets to 0.
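The relationship between Age and max-age can be sketched as a small decision function in Python. This is an illustration only: serve_from_cache and its action labels are hypothetical, and it assumes the origin answers the revalidation with either 200 or 304.

```python
def serve_from_cache(age_seconds, max_age_seconds):
    """Decide what a cache does with a stored object of the given age.

    Returns an (action, age_in_response) tuple; the action labels are illustrative.
    """
    if age_seconds < max_age_seconds:
        # Still fresh: served from cache, and Age keeps counting up.
        return ("hit", age_seconds)
    # Stale: the cache goes back to the origin (assumed to answer 200 or 304),
    # after which the object counts as fresh again and Age restarts at 0.
    return ("revalidate", 0)
```

For example, with max-age=300 an object aged 120 seconds is a hit with Age: 120, while one aged 301 seconds triggers a trip to the origin and comes back with Age: 0.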

What happens if neither a “Cache-Control” nor an “Expires” header is present?

The HTML page will stay cached in the browser and keep getting 304 until the cache is cleared manually with a hard refresh. In this case, the Age header value keeps increasing, and it returns to 0 once the object is removed from CloudFront's cache.

In our current setup we always get updated CSS/JS/images because HTML caching is not enabled. As a result, the CDN goes to the origin server (Apache) every time, and the HTML it returns points to the updated CSS/JS/images, because those assets are referenced from tags within the HTML content.

Please continue to the next part https://sumantapakira.medium.com/cache-query-parameter-in-aws-cloudfront-part-1-7ab49132b682
