Content Delivery Network (CDN) services rely on Cache-Control headers to decide which content they may cache and for how long, which is what makes content optimization and accelerated distribution possible. The caching used by CDNs follows the same principles your web browser applies to make sites load faster on your local device. Just as your browser keeps data cached on your hard drive for quick access, CDNs keep cached data on their nodes so that visitors to your site experience faster load times. But how does this technology work, and how do CDNs know when to serve content from the cache and when to refresh it from the originating source?
The majority of website content consists of static data which is not expected to change much over time. Generally, this static data includes images, text, video, stylesheets and JavaScript. When you visit a web page, the data your browser receives is a mix of static files and dynamically generated content based on any personalization the site offers its users. For example, when you go to your bank’s website, the landing page is usually filled with static content which provides details about the bank’s solutions and services. Once you log in to the site, however, you can access personal information such as your account balances. CDNs, and any other caches for that matter, keep the static content close by so the site loads faster. In the banking example, the account balances are dynamically generated content which originates from the bank’s own servers, while the caching service provides the images, styling and other static content, which improves the site load time and provides a richer user experience.
The caching services that CDNs provide offer a wide range of business benefits to website operators. Firstly, by caching the site’s static content, CDNs help save on bandwidth, as the user’s browser downloads the bulk of the data from the CDN and not from the originating server. Secondly, the proximity of the CDN to the end user enables the site to load faster. Thirdly, the traffic capacity of the CDN ensures reliable content delivery, as it relieves the originating web server when many users visit the site simultaneously. Making all of this possible are the HTTP cache-control headers introduced with HTTP version 1.1.
Cache-Control Headers Explained
Cache control is a specific HTTP header setting which defines the caching policies in both the client requests and the corresponding server responses. These policies determine how a particular web asset is cached, where it is cached, and how long it should remain in the cache before expiring. Setting the cache control header is accomplished by designating specific directives which are listed below.
Max-Age – This directive specifies, in seconds, how long a cached copy of a web asset remains fresh. For example, cache-control: max-age=3600 denotes that the resource expires one hour after the response was generated, at which point the cache should check with the originating server for a newer version before reusing it.
No-Cache – The no-cache directive states that a cache is permitted to store a response, but must submit a validation request to the originating server before reusing the stored copy.
No-Store – This directive explicitly states that no cache, whether the browser or an intermediary, is permitted to store the response, so the content must be obtained from the originating server each time.
Public – Setting the cache-control directive to “public” permits any cache, including shared caches such as CDNs and proxy servers, to store the web asset.
Private – The private cache-control directive signifies that the cached web asset is user-specific, which means a user’s browser can cache the response but a CDN or other shared cache should not do so.
Must-Revalidate – This directive informs the cache service that it must first revalidate a web asset with the originating server after it becomes stale and not deliver it until the execution of an end-to-end revalidation has occurred.
Proxy-Revalidate – The proxy-revalidate directive offers the same functionality as the must-revalidate directive but is utilized exclusively by shared caches such as proxy servers.
No-Transform – This cache-control directive instructs an intermediary, such as a cache server or proxy, that it must not modify the original asset in any way. Specifically, the content-type, content-range, and content-encoding headers must remain in their original state.
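To make the directive list above more concrete, here is a minimal sketch of how a Cache-Control header value could be split into its directives. The function name parse_cache_control is hypothetical, and the parser is deliberately simplified: it assumes no directive carries a quoted value that itself contains a comma.

```python
def parse_cache_control(header_value: str) -> dict:
    """Parse a Cache-Control header value into a {directive: value} dict.

    Directives without a value (for example no-store) map to None.
    Simplification: quoted values containing commas are not handled.
    """
    directives = {}
    for part in header_value.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty list elements such as "a,,b"
        name, sep, value = part.partition("=")
        name = name.strip().lower()  # directive names are case-insensitive
        if not sep:
            directives[name] = None
            continue
        value = value.strip().strip('"')
        # Directives such as max-age carry a non-negative number of seconds.
        if value.isascii() and value.isdigit():
            directives[name] = int(value)
        else:
            directives[name] = value
    return directives


parsed = parse_cache_control("public, max-age=3600, must-revalidate")
print(parsed["max-age"])            # 3600
print("must-revalidate" in parsed)  # True
```

A cache would typically run a step like this first and then decide, directive by directive, whether it may store the response and for how long.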
In addition to these cache-control directives, HTTP also offers the “Expires” header, where you can define a specific date and time for the cached resource to expire. Furthermore, the “ETag” header communicates the version of the content by tagging it with a unique identifier, which the web server uses to determine whether a cached copy is still the latest version, and the “Vary” header lists the request headers that must match before a cached response can be reused for a later request.
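The ETag mechanism is easiest to see in a revalidation exchange. The sketch below is an illustration only: make_etag and respond are hypothetical helpers, hashing the body is just one possible way to produce an identifier, and the header values are assumptions chosen for the example. It shows an origin answering with 304 Not Modified when the client's If-None-Match value still matches the current content.

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    """Derive an ETag from the response body (one possible scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str] = None):
    """Return (status, headers, body) for a GET, honouring If-None-Match."""
    etag = make_etag(body)
    headers = {
        "ETag": etag,
        "Cache-Control": "no-cache",  # store, but revalidate before reuse
        "Vary": "Accept-Encoding",    # cached copy depends on this request header
    }
    if if_none_match is not None:
        candidates = []
        for tag in if_none_match.split(","):
            tag = tag.strip()
            if tag.startswith("W/"):
                tag = tag[2:]  # If-None-Match compares weakly
            candidates.append(tag)
        if "*" in candidates or etag in candidates:
            # The cached copy is still current: send no body.
            return 304, headers, b""
    return 200, headers, body


status, headers, _ = respond(b"<h1>Welcome</h1>")
print(status)                                                 # 200
print(respond(b"<h1>Welcome</h1>", headers["ETag"])[0])       # 304
print(respond(b"<h1>New content</h1>", headers["ETag"])[0])   # 200
```

The 304 response carries no body, which is where the bandwidth saving comes from when the content has not changed.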
Cache-Control Headers and CDNs
As stated, giving a user’s browser the ability to cache content introduces efficiency by reducing the amount of bandwidth used by the web server, and it improves the user experience with faster load times. Without cache-control, however, caching would be extremely fragile. Every web asset across every site would be limited to following the same caching rules. Confidential and public information would be treated in the same way, and web assets which need to be updated frequently would be cached in the same manner and for the same amount of time as assets which rarely change.
By leveraging cache-control, developers can granularly control how each web resource is cached. This protocol flexibility ensures cache-control delivers the benefits users and developers expect of a caching solution. When it comes to CDN caching, using these cache-control headers gives website operators the ability to set specific controls for intermediary services, giving them the flexibility they need to craft a content delivery solution which leverages the advantages CDNs have to offer.
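As a rough illustration of how these directives steer a browser and a CDN differently, the sketch below models the storage decision of a private cache versus a shared one. The function cache_decision and its return labels are hypothetical, and the logic covers only no-store, private and no-cache; a real cache applies many more rules.

```python
def cache_decision(cache_control: str, shared: bool) -> str:
    """Return 'skip', 'store-and-revalidate', or 'store' for a response.

    shared=True models a CDN or proxy; shared=False models a browser cache.
    """
    directives = {
        d.strip().lower().split("=")[0].strip()
        for d in cache_control.split(",")
        if d.strip()
    }
    if "no-store" in directives:
        return "skip"  # nobody may keep a copy
    if shared and "private" in directives:
        return "skip"  # user-specific: only the browser may keep it
    if "no-cache" in directives:
        return "store-and-revalidate"  # keep it, but check before reuse
    return "store"


print(cache_decision("private, max-age=600", shared=True))   # skip
print(cache_decision("private, max-age=600", shared=False))  # store
print(cache_decision("public, max-age=86400", shared=True))  # store
```

The same response header therefore produces different behaviour depending on where the cache sits, which is what lets an operator keep account pages out of the CDN while still caching them in the user's own browser.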
However, for website operators to obtain the best possible benefit from a CDN, a careful balance is needed between the content delivered to the user from the CDN cache and the content provided directly by the originating server. Setting the appropriate cache-control headers is therefore essential to getting maximum performance from a CDN. An efficient policy maximizes the use of the CDN caching service, resulting in a higher cache hit ratio. As such, website operators must use discretion, taking the requirements of the application and the user into account when setting the relevant cache-control headers, as they have a direct impact on performance and experience.
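The cache hit ratio mentioned above is commonly computed as the share of requests answered from the cache. A minimal sketch, with a hypothetical function name and made-up request counts:

```python
def cache_hit_ratio(hits: int, misses: int) -> float:
    """Fraction of requests served from the cache rather than the origin."""
    total = hits + misses
    if total == 0:
        return 0.0  # no traffic yet, so there is nothing to measure
    return hits / total


# Illustrative numbers only: 9,000 requests served by the CDN, 1,000 by the origin.
print(f"{cache_hit_ratio(9000, 1000):.0%}")
```

A higher ratio means fewer requests reach the originating server.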
CDNs generally allow website operators to set their caching options on their application or on the CDN service itself. However, as discussed previously, the settings must take the granular nature of content delivery into account. Setting the cache-control headers on the application gives the administrator far more control than using a site-wide setting on a CDN service. As such, the configuration of both the application and the CDN must ensure the correct cache control headers are set for the relevant content types for both the website resources and the user.
In essence, getting the best performance from your CDN involves setting the max-age cache-control directive to the highest practical value for assets which remain unchanged for an extended period. For medium-term assets, adding the must-revalidate or proxy-revalidate directive ensures that the CDN checks stale content with the originating server before delivering it. This setting may have a small impact on site loading times, but it ensures the user gets the latest content with the best possible performance.
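One way an application might apply this advice is to choose the header per asset type. The sketch below is an assumption-laden example rather than a recommended policy: the function cache_control_for, the extension lists, the one-year and five-minute lifetimes, and the treatment of personalized responses are all illustrative choices.

```python
import posixpath

ONE_YEAR = 31536000   # seconds; assumed ceiling for long-lived assets
FIVE_MINUTES = 300    # seconds; assumed lifetime for medium-term assets

# Assumed groupings; adjust to the content your site actually serves.
LONG_LIVED = {".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".svg", ".mp4"}
MEDIUM_TERM = {".html", ".htm", ".json", ".xml"}


def cache_control_for(path: str, personalized: bool = False) -> str:
    """Pick a Cache-Control value for a URL path (without query string)."""
    if personalized:
        # User-specific data such as account balances stays out of all caches.
        return "private, no-store"
    ext = posixpath.splitext(path)[1].lower()
    if ext in LONG_LIVED:
        return f"public, max-age={ONE_YEAR}"
    if ext in MEDIUM_TERM:
        return f"public, max-age={FIVE_MINUTES}, must-revalidate"
    return "no-cache"  # unknown type: may be stored, must be revalidated


for p in ("/static/app.css", "/index.html", "/api/export"):
    print(p, "->", cache_control_for(p))
print("/account ->", cache_control_for("/account", personalized=True))
```

Long max-age values work best when the file name changes whenever the content changes, since a cache will otherwise keep serving the old copy until it expires.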