Caching is an integral component of most systems. Without it, many systems would not have sufficient capacity to keep up with demand, and even if they did, customers would need to wait longer for a response. Caching is seen as a Good Thing™, and rightfully so.
That said, caching is not always a simple task. Some readers may remember this quote from Phil Karlton: "There are two hard problems in computer science: cache invalidation and naming things." "Naming things" is a light-hearted joke we can all understand, but cache invalidation is no laughing matter. Cache invalidation is an action that must be performed when some data in a cache is no longer safe to use. Something has changed, and that cached data is now wholly inaccurate or, at best, out of date. Understanding and accounting for these somethings is key.
I'd like to present design patterns for caching large systems in a layered and resilient manner, with no invalidation at all. We'll achieve this by designing components that are individually reliant only on immutable state, and then layering them together. There are significant challenges in coaxing certain components into immutability, so the patterns presented will revolve around accounting for sources of state, such as wall time, databases, user settings, external systems, and software versions.
These ideas aren't strictly new. I've had peers describe these concepts as "how you do systems engineering as if you could use C++’s const keyword on a subsystem level" and as "functional programming applied to software architecture."
Immutability is one of the most powerful behaviors a component can have. Immutable properties can be applied to any input -> output operation, whether it's a database query, function, service call, or something else entirely -- so long as we select the right composite key to describe the operation. In the context of this article, I define a component as immutable if it always returns the same result given the same input. The inputs completely drive the component's response, and there is no state on the server side that can change the response. This behavior is powerful because it is easy to test and cache any component that repeatably produces the same output for a given input. There are no "details" to work around or compensate for -- it just works!
If state changes in a server are allowed to cause different results for the client, even with the same input, caching (and testing) becomes challenging. How do you know if your cache entry is still valid? Would the back end provide the same value if you queried it again? Ambiguity of this nature must be avoided if we want the benefits of invalidation-free caching and repeatable testing.
Let's consider a contrived function, days_old(user), which calculates how many days old a user is. This operation is impacted by two forms of state:
- User Settings: The user's birthday
- Wall Time: The current date
Despite the simplicity of this operation, it is perilously difficult to cache. If the current date changes or the user's birthday is updated, any cache is no longer valid. To account for Wall Time, one would need to either set an expiration (TTL) on the cache entry -- which isn't too bad -- or clear the cache every day at midnight. In the real world, both are easier said than done, since time zones make the current date an ambiguous concept.
The user's birthday poses similar challenges. While it's not a frequent operation, we want to build systems that respect a user's settings immediately when changed. I would be angry if I used a system that had lag in respecting a change I made to my settings or profile. We must invalidate any relevant cache entry immediately when a user setting is changed, lest we anger our users. This means we would need to build a component to actively invalidate cache entries whenever user settings are updated. I consider any active cache invalidation a red flag. The only good cache invalidation strategy is no strategy.
To avoid any active invalidation, we can refactor a stateful operation into two phases: the "State Gathering" portion and the "Immutable" portion. Using our prior example, the days_old(user) method becomes the state-gathering entry point, and we'll add a second layer, days_old(date_born, date_now). The first function will fetch the current date, then query the User Settings database to resolve user into the specific values needed to complete the operation -- in this case, date_born. The days_old(date_born, date_now) operation is now perfectly immutable and can be cached without any concerns about the validity of its cache. There is no unaccounted-for state that can change the result of the operation.
In this situation, you will notice that the User Database is not protected by a cache. That is intentional: the behavior we want is no active cache invalidation, with the user's settings respected immediately. Databases that hold User Settings are usually very small by modern standards and can be sharded trivially. Write traffic to them is sparse and random, since most users aren't changing their settings frequently. This allows you to achieve scale via replication and sharding, which in my experience is sufficient given the performance of a modern RDBMS. That said, I have cached User Settings in the past when dealing with enterprise traffic, where non-human role accounts make millions of requests per minute.
In the prior example, we demonstrated two important concepts. The first was that we can (and should) factor stateful components into stateful and immutable layers by accounting for each source of state inherent in the operation.
The second was that we can account for both Wall Time and the state from Small Databases by resolving those ambiguous data points into explicit values early in the request flow.
Let's look at our current set of tools for dealing with various types of state, bearing in mind that I am classifying "User Settings" into the general class of "Small Databases":
Solved:
- Wall Time: Resolve into values early in request flow
- Small Databases: Resolve into values early in request flow, replicate/shard to scale
Yet to solve:
- Big Databases
- Software Versions
- External Systems
The types of state used in the prior example are ubiquitous, but relatively easy to accommodate. Real systems are considerably more complex. To facilitate our discussion, let's consider a typical system with multiple user-facing applications, a shared business logic service, and a large database. For the purposes of this conversation about immutability and caching, the DB Access Service shown in the diagram isn't relevant, but I believe so strongly in the importance of access services and not exposing database implementation details that I refuse to make a diagram showing a database without one. As the "State Gathering" and "Immutable Zone" regions of the diagram suggest, we would like both the business logic service and the database to be immutable and cached, with the application service filling the role of resolving state to accomplish that.
The first challenge we need to discuss is the large, constantly changing database at the heart of this real-world system (and many others). It's not feasible to put these canonical databases outside the cache like we did for the small User Settings database, so we need a strategy that allows for caching that maintains the immutable behavior we want -- that a query to the database always returns the same result. Let's discuss two patterns for achieving this: The Timestamped Data pattern and the Snapshot pattern.
The Timestamped Data Pattern
In the timestamped data pattern, we structure our database such that all data is annotated with a timestamp column and writes are append-only. In this way, we can factor access to the database into two phases:
- Stateful call to get the most recent timestamp for the entity of interest:
select max(timestamp) from table where entity = 'B'
- Immutable call to get entity's data using said timestamp:
select data from table where entity = 'B' and timestamp <= 5
The first operation is stateful and needs to be uncached, or at most cached with a very short TTL window. For many data models, this isn't as problematic as it seems, as the read throughput available for queries on fully-indexed columns in modern databases is tremendous. At Bloomberg, we have been able to keep up with timestamp read traffic with a small amount of replication and a well-performing RDBMS.
The second operation is now perfectly cacheable, as the results of timestamp-limited queries are guaranteed to never change: writes are append-only, and any new data is given a new timestamp. Does this really work in practice? Consider an email system using the data model above, where each email is a row with a timestamp. Our user is expecting an email and hitting F5 (reload) in their browser over and over. On every reload, we only need to run the first query, which returns the same timestamp, and therefore we hit the cache; the impact on the email system is just one fully-indexed read query. When the new email arrives, a new row is added and the latest timestamp changes. The data query then misses the cache, generates a new response to return to the UI, and stores that response in the cache. The user may have hit the reload button dozens of times, but the actual heavy back-end process ran only twice, once for each unique timestamp. All the other reloads were cache hits.
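The email example above can be sketched end-to-end with an in-memory SQLite table standing in for the append-only database (table and column names are illustrative, and lru_cache stands in for a real cache tier):

```python
import sqlite3
from functools import lru_cache

# Toy append-only mail table
db = sqlite3.connect(":memory:")
db.execute("create table mail (entity text, timestamp integer, data text)")
db.executemany("insert into mail values (?, ?, ?)",
               [("B", 1, "welcome"), ("B", 5, "newsletter")])

def latest_timestamp(entity):
    # Stateful phase: a single fully-indexed read; uncached (or short TTL only)
    return db.execute("select max(timestamp) from mail where entity = ?",
                      (entity,)).fetchone()[0]

@lru_cache(maxsize=4096)
def mailbox_as_of(entity, ts):
    # Immutable phase: append-only writes guarantee this result never
    # changes, so the cache never needs invalidation
    rows = db.execute(
        "select data from mail where entity = ? and timestamp <= ? "
        "order by timestamp", (entity, ts)).fetchall()
    return tuple(r[0] for r in rows)

ts = latest_timestamp("B")       # cheap stateful query on every reload
inbox = mailbox_as_of("B", ts)   # misses the cache only when ts changes
```

Every reload re-runs only latest_timestamp; mailbox_as_of is a cache hit until a new row (and thus a new timestamp) appears.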
Timestamped Data & Object Stores
While the timestamped data pattern works with traditional relational tables as described above, I find this pattern is best applied to object stores holding immutable documents.
In this model, the timestamp table and the data are separated. The timestamps are stored in a smaller, fully-indexed database with pointers to specific immutable documents in the object store. By separating these sources of data, it's easy to scale the index or timestamp database using the same techniques (sharding and replication) that we apply to "small databases" like the User Settings database in our prior example. The object store can use a relatively low-cost solution such as S3, as all read access is perfectly cacheable.
Layered Caching
Fitting this pattern into our example system unlocks some powerful behaviors. Imagine our system analyzes stocks and the client wants to see the analysis for IBM.
The client loads the application, and the UI sends a call to the application service, which is stateful. The application service is then responsible for gathering the state needed to make subsequent calls immutable. One thing it does is make a query to the timestamps database to get the pointer to the most recent immutable document for IBM. The application service then calls the business logic service, providing the immutable URL instead of the ambiguous "IBM". The business logic service then calls the cheap, slow, and easily scalable object store to fetch the required data. This document read is cacheable, without any concerns about invalidation since a new URL will be generated if IBM's data is updated. In addition, the business logic service itself can be cached, as it only depends on its own internal logic and the contents of the immutable URL. We now have layered caching, without invalidation, of both our database and business logic service.
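This flow can be sketched with dictionaries standing in for the timestamp index and object store (the names, URLs, and analyze step are all hypothetical):

```python
from functools import lru_cache

# Stand-ins for the timestamp index and the object store it points into
timestamp_index = {"IBM": "s3://analysis/IBM/0005"}
object_store = {"s3://analysis/IBM/0005": "ibm fundamentals as of ts 5"}

def handle_request(ticker):
    # State gathering: resolve the ambiguous "IBM" into an immutable URL
    url = timestamp_index[ticker]
    return business_logic(url)

@lru_cache(maxsize=1024)
def business_logic(url):
    # Immutable zone: the result depends only on the URL's frozen contents
    # (and this code's version, which belongs in the real cache key)
    document = object_store[url]   # perfectly cacheable document read
    return f"analysis({document})"
```

When IBM's data is updated, the index gains a new URL; old cache entries simply stop being requested, with no invalidation anywhere.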
The Snapshot Pattern
The timestamped data pattern is powerful, but there are many situations where it isn't appropriate. The snapshot pattern is an alternative technique that can be used to convert access to a stateful database or external system into an immutable operation. The pattern has value because it can be easily retrofitted onto existing components that are not able to be refactored, redesigned, or replaced. It is particularly useful as a pattern that can be applied to external systems over which you have no control. But, depending on how it is applied, it can be challenging to achieve high cache hit rates.
The core components of this pattern are a new append-only object store and a "snapshot service" to manage it. Instead of accessing the stateful database or external system directly, our application will instead call the snapshot service and request a new snapshot be created for some entity or subset of data within the database. The snapshot service will fetch the stateful data from the canonical source, write it to the append-only object store, and return a URL that is guaranteed to be immutable. Like in the timestamped data pattern, use of that URL allows caching across multiple layers of the system. The semantics of when snapshots are generated need to be tailored to the specifics of the system. Snapshots can be generated on demand, in a periodic fashion, or at the beginning of bulk jobs.
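A minimal sketch of such a snapshot service, under assumed interfaces (a source with a fetch() method, a dict as the append-only object store) and using content-addressed URLs as one way to guarantee write-once semantics:

```python
import hashlib
import json

class SnapshotService:
    def __init__(self, source, store):
        self.source = source   # stateful canonical system (hypothetical API)
        self.store = store     # append-only object store (a dict here)

    def snapshot(self, entity):
        data = self.source.fetch(entity)       # read the mutable source once
        body = json.dumps(data, sort_keys=True)
        digest = hashlib.sha256(body.encode()).hexdigest()[:12]
        url = f"snap/{entity}/{digest}"        # content-addressed, write-once
        self.store[url] = body                 # never overwritten or mutated
        return url                             # immutable handle for all layers
```

Callers throughout the stack then pass the URL around instead of querying the source; as a bonus of content addressing, re-snapshotting unchanged data yields the same URL and so dedupes naturally.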
The success of this pattern is directly proportional to how often a single snapshot can be reused. For example, if a system needs to make a new snapshot for every operation, then it's unlikely that the use of snapshots is worth it. This pattern is often augmented with an index of all snapshots generated previously, to facilitate use cases where snapshots are reused. The contents of this index will vary widely based on the specifics of the application.
I've used this pattern to cache access to economic and market data in high-volume applications. We take a snapshot of the state of the world's financial markets every minute, and then reuse that snapshot for all traffic within the application. If the snapshot process fails for any reason, we use the old snapshot until we can generate a new one. Even in the ideal situation where each snapshot only lives for one minute, we still observe cache hit rates above 99.9% because we are servicing millions of requests per minute. This specific implementation of the snapshot pattern has a snapshot index with time as the primary key, as our access pattern is time-based.
Pitfall - Write-Heavy Databases
The effectiveness of both patterns described above is proportional to the number of reads received between writes of a new timestamp or snapshot. For databases where writes to individual entities are very frequent -- on the order of seconds -- these patterns may not provide a high enough cache hit rate to justify their cost and overhead. In these cases, if the read rate is high enough, you might still be able to apply the snapshot pattern, but I can imagine situations where snapshots wouldn't be appropriate. A short-window TTL (~1 minute) may also be an option, but that opens a can of worms for the layers above. If you do take that path, be very deliberate about how you cache upwards: you may need to set similarly short TTLs in the upper layers, or pass a UNIX timestamp rounded to the minute as a parameter to higher-level calls to periodically force cache misses.
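The rounded-timestamp trick is a one-liner; passing its result as an explicit parameter keeps the upper-layer calls themselves immutable while forcing a cache miss at most once per minute:

```python
import time

def minute_bucket(now=None):
    # Round a UNIX timestamp down to the minute. Passing the bucket as a
    # parameter to higher-level calls changes the cache key once a minute,
    # bounding staleness without any active invalidation.
    now = time.time() if now is None else now
    return int(now // 60) * 60
```

A caller would then invoke something like analyze(entity, as_of=minute_bucket()) so that the staleness window is visible in the cache key rather than hidden in a TTL.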
What did we just do?
In the prior example, we learned two patterns that can help us cache large mostly-read databases and some external systems. We also learned that you could cache in a layered manner by making low-level components immutable (the DB) and stacking upwards (the service that calls the DB). It's important to recognize that this powerful layering only works from the bottom-up. If you try to apply caches from the top-down, you end up with all the stateful behaviors we are trying to avoid.
Let's look at our current set of tools for dealing with various types of state.
Solved:
- Wall Time: Resolve into values early in request flow
- Small Databases: Resolve into values early in request flow, replicate/shard to scale
- Big Mostly-Read Databases: Apply timestamped data or snapshot pattern, resolve URL early in the request flow
Yet to solve:
- Software Versions
Pitfalls:
- External Systems: Apply snapshot pattern if possible, resolve URL early in the request flow
- Write-Heavy Databases: Apply snapshot pattern if possible, resolve URL early in the request flow. Consider short-window TTL caching as a last resort.
The prior examples assumed our software was static and did not account for the deployment of new software versions. Luckily, this is a relatively simple problem to solve.
Cache Key Generation
Throughout this article, we've implied that our cache keys are simply the name of an operation plus its parameters:
key = (function, params)
To be thorough, we need to account for the software version as well, and we always hash the raw key:
key = hash(function, params, software_version)
When dealing with layered caching, we must include the versions of all software further down the chain as well:
key = hash(function, params, software_version, dependent_version_1, ..., dependent_version_N)
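A sketch of this key scheme in Python (serializing with sorted keys so that logically identical parameter sets hash identically):

```python
import hashlib
import json

def cache_key(function, params, *versions):
    # Hash the complete identity of the computation: the operation, its
    # parameters, and the version of every piece of software in the chain.
    raw = json.dumps([function, params, list(versions)], sort_keys=True)
    return hashlib.sha256(raw.encode()).hexdigest()
```

Bumping any version in the chain yields a brand-new key, so entries for old and new software versions coexist in the cache instead of requiring invalidation.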
How do you actually do this?
At Bloomberg, our cache keys are generated by a transparent reverse-proxy that is aware of the active versions of all software in the stack and also has a map of which services depend on each other. It is then trivial for the reverse-proxy to calculate the appropriate key and interact with the cache. Gradual deployments and A/B testing are supported under this model. You just need to either:
- use a stable mechanism for determining which active version of a currently-being-rolled-out service a given user request will hit, OR
- make the A/B versioning decisions up front the first time a client request hits the proxy, and then pass along the version selections in a header so downstream proxies can respect them.
Resilient Caching
One of the strengths of this approach is that the deployment of a new service high in the stack, as shown in the diagram above, does not clear all caches in the system. The lower-level caches would still be functioning, reducing the impact of the one cold cache on the total system.
Another benefit of this approach is that old cache entries for the prior service version will remain in the cache for some time, assuming the cache is not undersized. If a rollback is performed, it's likely that some old cache entries will not yet have been evicted by the memcached or Redis LRU, and you can avoid the classic cold-cache-on-rollback problem. This behavior is only possible because we use key generation to store cache entries for both service versions side by side instead of using active invalidation to clear the cache.
What did we just learn?
While we were able to avoid talking about it directly until the end of this article, key generation is the right place to deal with any cache complexity: it happens right up front, and it results in a cache full of data that is correctly labelled and usable. No external knowledge is needed to determine whether a cache entry is good, because the key contains everything you need.
Our table is now complete!
Solved:
- Wall Time: Resolve into values early in request flow
- Small Databases: Resolve into values early in request flow, replicate/shard to scale
- Big Mostly-Read Databases: Apply timestamped data or snapshot pattern, resolve URL early in request flow
- Software Versions: Include versions of all dependent software in cache key
Pitfalls:
- External Systems: Apply snapshot pattern if possible, resolve URL early in request flow
- Write-Heavy Databases: Apply snapshot pattern if possible, resolve URL early in request flow. Consider short-window TTL caching as a last resort.
The Model System
This simplified diagram represents a potential model for many distributed systems. We want to have as much functionality as possible in the "Immutable Zone," not just the minimal service stack that is shown here. This contrasts with the "State Gathering" portion of the stack, which should ideally be just the service that acts as the client entry point. There will be many more databases than just the user database and the one big central database, but that's fine as long as there is a strategy for fitting them into the larger picture. Building systems in this manner provides many powerful behaviors:
- High Performance via the layered caching of multiple components
- Reliable Performance via the minimization of cold caches due to both layered caching and the storage of cache items side-by-side to facilitate rollbacks
- Simpler Testing paradigms due to the repeatability of software
Building a system that conforms to these ideas is purely an engineering exercise. Engineering is hard. Please keep these high-level ideas in mind the next time you are brainstorming a new system design:
- Immutability reduces total complexity
- Any cache invalidation should be a non-starter
- Interface design drives caching characteristics (among other things)
- Mutable interfaces can be converted into immutable ones internally
- Factor systems into state gathering and immutable layers
- Make low-level components immutable and chain upwards
- Key generation is the right place to account for state
This article is based on Peter's talk "Caching Entire Systems without Invalidation" from SREcon22 Europe/Middle East/Africa.