Caching Query Parameters in AWS CloudFront: Part 1

CloudFront offers three options for caching based on query parameters:

  • None (Improves Caching)
  • Forward All, Cache based on all
  • Forward All, Cache based on whitelist

Let's look at the advantages and disadvantages of each option.

None (Improves Caching): With this option, CloudFront does not forward any query parameters to the origin server. As an example, take a servlet that reads a query parameter from the URL and returns its value as JSON: https://your-server/bin/servlets/cachetest.json?testparam=mytest-1

Here the output is null, because CloudFront removes testparam before the request reaches the origin.

Explanation:

Advantage: This is useful for front-end React applications where all code runs in the browser and no parameters need to be sent to the backend server.

Disadvantage: This is not suitable for AEM applications that need to process requests based on query parameters. For example, a search request will not return any results because the underlying servlet expects a query parameter value.

Forward All, Cache based on all: CloudFront forwards all query parameters to the origin server. This means we have to allow the query parameters at the origin, i.e. in the Dispatcher ignoreUrlParams setting, so that Apache can set the Cache-Control header on the requested URL. If we do not allow them in the Dispatcher, we will always get an HTTP 200 and a cache miss from CloudFront, because no Cache-Control header is added to the response. The Cache-Control header is added only after the request passes the Dispatcher's ignoreUrlParams check; if a parameter is not whitelisted there, the Dispatcher always sends the request to AEM. We should also enable TTL in the Dispatcher, otherwise we will keep getting the old cached response from the Dispatcher cache even after the Cache-Control value has expired.
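One way to observe the cache miss described above is to inspect the X-Cache response header that CloudFront adds to its responses. The sketch below is only an illustration: the helper names cloudfront_cache_status and fetch_status are hypothetical, and the URL in the usage comment is the placeholder from this article.

```python
import urllib.request


def cloudfront_cache_status(headers) -> str:
    """Classify CloudFront's X-Cache header as 'hit', 'miss', or 'unknown'."""
    x_cache = (headers.get("X-Cache") or "").lower()
    # "RefreshHit from cloudfront" also contains "hit from cloudfront".
    if "hit from cloudfront" in x_cache:
        return "hit"
    if "miss from cloudfront" in x_cache:
        return "miss"
    return "unknown"


def fetch_status(url: str):
    """Request a URL and return (HTTP status, cache status, Cache-Control value)."""
    with urllib.request.urlopen(url) as resp:
        return (
            resp.status,
            cloudfront_cache_status(resp.headers),
            resp.headers.get("Cache-Control"),
        )


# Hypothetical usage against the placeholder URL from this article:
# fetch_status("https://your-server/bin/servlets/cachetest.json?testparam=1&time=true")
```

If repeated calls keep reporting a miss and the Cache-Control value is missing, that matches the situation where the Dispatcher has not allowed the query parameters.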

This option is useful if the origin server returns different versions of the response for all query string parameters. For example, take this URL: https://your-server/bin/servlets/cachetest.json?testparam=1&time=true. The servlet accepts two query parameters: it returns 10 times the value of testparam, and if time is true it also returns the current timestamp.
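To make the example concrete, the servlet's behavior can be modeled roughly as follows. This is a Python stand-in for illustration only, not the actual AEM servlet: the function name, the JSON field names, and the assumption that testparam is an integer are all hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def cachetest_response(testparam: Optional[str], time: Optional[str] = None) -> dict:
    """Model of the example servlet: 10x testparam, plus a timestamp if time=true."""
    # A missing testparam (e.g. stripped by CloudFront) yields a null value.
    value = int(testparam) * 10 if testparam is not None else None
    body = {"value": value}  # "value" is an assumed field name
    if time == "true":
        body["timestamp"] = datetime.now(timezone.utc).isoformat()
    return body


print(cachetest_response("3", "false"))  # {'value': 30}
```

Because the timestamp changes on every call, a response with time=true differs from one with time=false, which is why both parameters matter when CloudFront caches on all of them.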

Forward All, Cache based on whitelist: If the origin server returns different versions of the response based on one or more query string parameters, we specify the parameters that CloudFront should use as the basis for caching. In our previous example URL, https://your-server/bin/servlets/cachetest.json?testparam=1&time=true, there are two request parameters, testparam and time, and we expect the response to differ based on their values. For example, if testparam=3 the output should be “Hello word: 30”, and if time=false the timestamp is not shown. In other words, the output changes based on the query parameters.

Now suppose we only want to consider the ‘testparam’ query parameter and do not care whether ‘time’ is true or false. In that case we whitelist ‘testparam’ in the CloudFront distribution. CloudFront then bases its cache on ‘testparam’ alone: subsequent requests with the same ‘testparam’ value are served from the same cached object, regardless of the value of ‘time’.
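The difference between the three options can be sketched with a small model of what is forwarded to the origin and what goes into the cache key. This is a simplified illustration, not CloudFront's actual implementation; the mode labels "none", "all", and "whitelist" and both function names are assumptions made for this sketch.

```python
from urllib.parse import parse_qsl, urlencode


def forwarded_query(mode: str, query: str) -> str:
    """Query string the origin receives under each option."""
    # "none" strips the query string; "all" and "whitelist" forward everything.
    return "" if mode == "none" else query


def cache_key(mode: str, path: str, query: str, whitelist=frozenset()) -> str:
    """Simplified cache key for each option."""
    if mode == "none":
        return path
    if mode == "all":
        return f"{path}?{query}" if query else path
    if mode == "whitelist":
        # Only whitelisted parameters take part in the cache key.
        kept = [(k, v) for k, v in parse_qsl(query, keep_blank_values=True)
                if k in whitelist]
        return f"{path}?{urlencode(kept)}" if kept else path
    raise ValueError(f"unknown mode: {mode}")


PATH = "/bin/servlets/cachetest.json"
a = cache_key("whitelist", PATH, "testparam=1&time=true", {"testparam"})
b = cache_key("whitelist", PATH, "testparam=1&time=false", {"testparam"})
print(a == b)  # True: 'time' does not affect the cache key
```

With the "all" mode the same two requests produce different keys, so each combination of parameter values is cached separately.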

Recommendation:

  • The first option, None (Improves Caching), should not be used because we have many request parameters that need to be sent to AEM.
  • We have to allow the parameters in the Dispatcher ignoreUrlParams setting, otherwise Apache cannot set the Cache-Control header. An alternative is to set Cache-Control in an AEM Sling filter; the Dispatcher then creates a header file (en.html.h) and stores the Cache-Control information in it. In that case, however, we have to maintain a separate list of allowed parameters, which is not good for maintainability.
  • Query parameter values should always be lowercase because, for example, ?color=Red and ?color=red cause CloudFront to cache two different objects. Our code should always support lowercase, and to make sure that values entered by end users are always lowercase, we need to write a Lambda@Edge function that always passes lowercase values to the origin server.
  • My suggestion is to use “Forward all, cache based on all” because we do not have to maintain a separate list of allowed query parameters and, most importantly, if we do not know the query parameters and have 301 redirects, we will get an infinite loop. More details here: