Hello! This is Jayon.
Today, I'd like to explain my criteria for setting up a cache. Since this post is based on my personal experience in the field, please treat it as one reference point. haha
What is Cache?
A cache stores the results of requests ahead of time so they can be served quickly later. In other words, it's a technique where results are stored in advance, and when a request comes in later, the cache is consulted instead of the DB or an API. The background for the emergence of caching is Pareto's principle.
Pareto's principle states that 80% of results come from just 20% of causes.
Applied to caching, this means you don't need to cache every result. By caching only the 20% of data that is requested most frequently, you can improve overall efficiency.
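The idea above is the classic cache-aside pattern: check the cache first, and only go to the source on a miss. Here's a minimal sketch in Python, where `fetch_from_db` is a hypothetical stand-in for a real (slow) DB or API call:

```python
cache = {}

def fetch_from_db(key):
    # Placeholder for a real database or API call (hypothetical).
    return f"value-for-{key}"

def get(key):
    if key in cache:            # cache hit: skip the DB entirely
        return cache[key]
    value = fetch_from_db(key)  # cache miss: go to the source
    cache[key] = value          # store it for future requests
    return value
```

The first `get("some-key")` pays the DB cost; every later call for the same key is served from memory.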
Which data should be cached?
According to Pareto's Law, not just any data should be cached; only the necessary data should be. So, what kind of data should be cached?
Data that needs to be read frequently but rarely written
Theoretically, it's often said that you should cache "data that is read frequently but rarely written." In practice, though, "frequently read" and "rarely written" are quite ambiguous criteria.
So, I investigate candidate data in the following steps:
- Check the top 5 most frequently called RDB queries through an APM like DataDog.
- Among them, find the SELECT queries and identify which tables they read from.
- Check how often UPDATE queries against those tables are called.
Through this process, we check whether there are many SELECT queries but few UPDATE queries. The table I examined in the field received about 1.74 million SELECT queries per day, but at most 500 UPDATE queries. That's clearly a good fit for caching, wouldn't you agree? haha
Data sensitive to updates
Data that is sensitive to updates is data for which the window of inconsistency between the RDB and the cache must be kept short. For example, payment-related information is very sensitive to updates, so even if it meets the caching conditions above, applying a cache needs careful consideration.
The payment-related table I needed to cache had both of the characteristics above. Therefore, I did not apply caching to all logic that uses the table. Instead, I decided to cache it only in relatively safe logic where payments do not actually occur.
Local Caching vs. Global Caching
Now, we have somewhat determined the data to be cached and the scope of caching. Then, we need to consider "where" to store the cached data. Generally, it can be stored in local memory or on a separate server like Redis.
Local Caching
Local caching is a method of storing cached data in the memory of the application server. Guava cache or Caffeine cache are commonly used.
Advantages
- Since the cache is retrieved from the memory within the same server while executing the application logic, it is fast.
- It's easy to implement.
Disadvantages
- If there are multiple instances, each instance holds its own copy of the cache, so the copies can diverge and different instances may return different values for the same key.
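The divergence problem can be shown with a tiny sketch. Here, two hypothetical app servers each keep their own local cache over a shared "DB" (all names are illustrative):

```python
# Two app servers, each with its own in-process cache.
server_a_cache = {}
server_b_cache = {}
db = {"price": 1}

def read(cache, key):
    # Cache-aside read: fill the local cache from the DB on a miss.
    if key not in cache:
        cache[key] = db[key]
    return cache[key]

# Both servers warm their caches with price = 1.
read(server_a_cache, "price")
read(server_b_cache, "price")

# Server A updates the DB and refreshes only its OWN cache.
db["price"] = 2
server_a_cache["price"] = 2

# Server B still serves the stale value 1: behind a load balancer,
# the same request now returns different answers depending on the server.
```

This is exactly the situation described in the "How to update the cache?" section below.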
Global Caching
Global caching is a method of storing cached data on a separate server, such as Redis.
Advantages
- Since instances share the cache, even if one instance modifies the cache, all instances can obtain the same cache value.
- When a new instance starts, it can simply refer to the existing cache repository, eliminating the need to repopulate the cache.
Disadvantages
- It requires network traffic, making it slower than local caching.
- A separate cache server is required, leading to infrastructure management costs.
Which one did I choose?
Currently, the company's application server uses a structure with multiple instances, but I chose local caching.
There are three main reasons.
- The data to be cached amounts to slightly fewer than 40,000 rows in the RDB, and even if all of it is loaded into memory, it takes less than 4MB.
- The retrieval performance of payment-related data needed to be improved.
- Although Redis is already in place, storing additional cache data in it would still add infrastructure load and cost.
How to update the cache?
If there are multiple application servers and local caching is applied, the cached values stored in each application server may differ. For example, the cached data stored on server A might be "1," but the cached data on server B might be "2" after a change on server B. In this situation, if a user sends a request to the load balancer, they will receive different values from servers A and B.
Therefore, the cache needs to be removed from each instance automatically, so that each instance re-reads fresh data from the RDB. A TTL is mainly used for this purpose.
What should the TTL be set to?
TTL stands for Time To Live, which is a setting that deletes the cache after a certain amount of time. For example, if the TTL is set to 5 seconds, the cached data will be automatically deleted after 5 seconds. After that, if a cache miss occurs, the data will be retrieved from the RDB and stored.
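The TTL mechanics can be sketched in a few lines. This is an illustrative implementation, not any particular library's API; the clock function is injected so expiry is easy to demonstrate without sleeping:

```python
import time

class TTLCache:
    """Entries expire ttl seconds after being written; expired entries
    behave like a cache miss, forcing a re-read from the source."""

    def __init__(self, ttl, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None               # never cached: a miss
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]      # TTL elapsed: evict, force re-read
            return None
        return value

    def put(self, key, value):
        self._store[key] = (value, self._clock() + self._ttl)
```

With `ttl=5`, a value written at t=0 is served until just before t=5 and then treated as a miss, matching the 5-second example above.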
So, what should the TTL be set to?
Read/write occurs on a single cache server
If read/write occurs on a single global caching server like Redis, or on a single application server with local caching applied, the TTL can be set long, on the order of hours or more. After all, every write updates the existing cache entry, so any server reading from that cache always sees the latest data.
In this case, instead of setting the TTL, the cache server can be configured to automatically clear the cache gradually using the LRU algorithm when it becomes full.
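For reference, LRU eviction itself is simple: when the cache is full, drop the entry that was used least recently. A minimal sketch (illustrative only; Redis and libraries like Caffeine implement this for you):

```python
from collections import OrderedDict

class LRUCache:
    """When capacity is exceeded, evict the least recently used entry."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._store = OrderedDict()  # insertion order = recency order

    def get(self, key):
        if key not in self._store:
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return self._store[key]

    def put(self, key, value):
        if key in self._store:
            self._store.move_to_end(key)
        self._store[key] = value
        if len(self._store) > self._capacity:
            self._store.popitem(last=False)  # drop least recently used
```

For example, with capacity 2, inserting a third key evicts whichever of the first two was touched least recently.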
Read/write occurs on multiple cache servers
If read/write occurs on multiple global caching servers or on multiple application servers with local caching applied, it is better to set the TTL to seconds or minutes. This is because there is a possibility of reading outdated data from a cache server that has not yet reflected the modified data.
In this case, the TTL is determined in various contexts. The more important the update is and the higher the probability of value changes, the shorter the TTL should be. Conversely, if the update is less important and the probability of value changes is low, the TTL can be set slightly longer.
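Put differently, the TTL bounds how long a stale read can persist: a cache that hasn't seen a write converges within at most one TTL. A self-contained sketch with a fake clock (all names hypothetical):

```python
# Two local caches over a shared "DB"; entries expire after TTL seconds.
now = [0.0]           # fake clock, advanced manually for determinism
db = {"limit": 100}
TTL = 5

def read(cache, key):
    entry = cache.get(key)
    if entry is None or now[0] >= entry[1]:
        cache[key] = (db[key], now[0] + TTL)  # miss or expired: re-read DB
    return cache[key][0]

a, b = {}, {}
read(a, "limit")                 # both servers cache 100 at t=0
read(b, "limit")

now[0] = 1.0
db["limit"] = 200                # write at t=1; server A refreshes itself
a["limit"] = (200, now[0] + TTL)

now[0] = 3.0
stale = read(b, "limit")         # B still serves 100: entry not yet expired

now[0] = 5.0
fresh = read(b, "limit")         # B's entry expired at t=5: re-reads 200
```

So with a 5-second TTL, server B serves the outdated value for at most 5 seconds after the write, which is the trade-off the paragraph above is weighing.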
How did I set the TTL?
The data I'm caching is payment-related. Even though caching is not applied to the critical logic where payments actually occur, updates still matter because of the nature of payments. Since the probability of updates is low, however, I set the TTL to a conservative 5 seconds as a safety measure.
Conclusion
In summary, the caching method I chose is as follows:
- Payment-related data
- Queries are very frequent, but modifications are rare.
- Caching is applied only to logic where queries occur, but actual payments do not occur.
- Local caching is applied, and the TTL is set to 5 seconds.
The next step is to conduct performance tests specifically for the applied caching method. I'm still figuring out the details of how to conduct the performance test, so I'll write about it in a future post!