What is a Cache?
A cache is a temporary storage layer that holds frequently accessed data closer to where it is needed. This reduces the time required to retrieve data from slower sources such as databases, disks, or remote servers, improving overall system performance.
Why Do We Need a Cache?
Caching improves system performance by serving repeated requests from a fast-access storage layer instead of going back to the original data source each time. The main benefits are:
-
Reduced Latency:
Frequently accessed data is served from a high-speed storage layer (cache) instead of slower storage systems such as databases, disks, or external services, resulting in lower response times.
-
Lower Backend Load:
Caching reduces the number of requests hitting databases, APIs, and downstream services, decreasing CPU, memory, and I/O utilization on backend systems.
-
Higher Throughput:
Since fewer resources are spent retrieving the same data repeatedly, the system can process a larger number of requests per second (RPS).
-
Improved Scalability:
By offloading read traffic from the primary data source, caching enables the application to handle increased user traffic without requiring proportional scaling of backend infrastructure.
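The four benefits above all come from the same effect: a repeated request is answered from memory instead of reaching the backend. The following is a minimal sketch of that effect, not a production design. CachedLookup and CacheBenefitDemo are hypothetical names, the lambda stands in for a database or remote call, and the class is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical wrapper that puts a cache in front of a slow lookup function.
class CachedLookup<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> backend;
    private int backendCalls = 0;

    CachedLookup(Function<K, V> backend) {
        this.backend = backend;
    }

    V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            // Cache miss: call the slow source once and remember the result.
            backendCalls++;
            value = backend.apply(key);
            cache.put(key, value);
        }
        return value; // Later calls for the same key are cache hits.
    }

    int backendCalls() {
        return backendCalls;
    }
}

class CacheBenefitDemo {
    public static void main(String[] args) {
        // The lambda is a stand-in for a database query or remote API call.
        CachedLookup<Integer, String> users = new CachedLookup<>(id -> "user-" + id);
        for (int i = 0; i < 1000; i++) {
            users.get(i % 10); // 1000 requests, but only 10 distinct keys
        }
        System.out.println("Backend calls: " + users.backendCalls());
    }
}
```

With 1000 requests spread over 10 distinct keys, the backend is called only once per key; every other request is served from the map.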
How Many Types of Cache Are There?
There are many ways to classify caches, but in software engineering and system design, the most common types are:
-
CPU Cache:
CPU cache is a small, extremely fast memory located inside or very close to the CPU. It stores recently accessed data and instructions so the processor does not have to fetch them repeatedly from slower memory (RAM). There are three primary levels of CPU cache:
-
L1 Level CPU Cache:
- Fastest cache level
- Smallest in size
- Located inside each CPU core
- Stores the most frequently accessed instructions and data
Example in Java:
    public class L1Tutorial {
        public static void main(String[] args) {
            int total = 0;
            for (int i = 0; i < 1000; i++) {
                total += i;
            }
            System.out.println(total);
        }
    }

In the above program, the variables i and total are accessed repeatedly. Due to temporal locality, the CPU can keep the associated data in registers and cache lines, reducing the need for repeated RAM accesses.

Note: The CPU does not store Java variables directly in L1 cache. The JVM and CPU manage memory automatically using registers, cache lines, and memory hierarchies. Performance improvements come from temporal locality (reusing recently accessed data) and spatial locality (accessing nearby memory locations).
-
L2 Level CPU Cache:
- Larger than L1 cache
- Slightly slower than L1 but still very fast
- Typically dedicated to each CPU core
Example in Java:
    public class L2Tutorial {
        public static void main(String[] args) {
            int size = 50_000_000;
            int[] arr = new int[size];
            for (int i = 0; i < size; i++) {
                arr[i] = i;
            }

            long total = 0;
            // Sequential access → good cache utilization
            for (int i = 0; i < size; i++) {
                total += arr[i];
            }
            System.out.println(total);
        }
    }

L2 cache benefits from sequential memory access because the CPU loads memory in cache lines (typically 64 bytes). When adjacent elements are accessed, there is a high probability that they are already available in the cache, reducing accesses to L3 cache or RAM.
Note: Java does not directly control CPU caches (L1, L2, or L3). The CPU automatically manages cache contents using cache lines and memory access patterns. Sequential access improves performance due to spatial locality, while repeated access improves performance due to temporal locality, resulting in fewer cache misses.
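To make spatial locality more concrete, the sketch below sums the same two-dimensional array twice: once row by row (adjacent elements) and once column by column (each read jumps to a different row array). LocalityDemo and the 2048 x 2048 grid size are assumptions chosen for illustration. On most hardware the row-by-row pass tends to be faster, but timings vary between machines and are affected by JIT warm-up, so treat the printed numbers as indicative only.

```java
import java.util.Arrays;

class LocalityDemo {
    // Walks each row left to right, so consecutive reads touch adjacent memory.
    static long sumRowMajor(int[][] grid) {
        long total = 0;
        for (int r = 0; r < grid.length; r++) {
            for (int c = 0; c < grid[r].length; c++) {
                total += grid[r][c];
            }
        }
        return total;
    }

    // Walks column by column, so consecutive reads land in a different row array each time.
    // Assumes a rectangular grid (every row has the same length).
    static long sumColumnMajor(int[][] grid) {
        long total = 0;
        int cols = grid.length == 0 ? 0 : grid[0].length;
        for (int c = 0; c < cols; c++) {
            for (int r = 0; r < grid.length; r++) {
                total += grid[r][c];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 2048; // assumed size: 2048 x 2048 ints, about 16 MB
        int[][] grid = new int[n][n];
        for (int[] row : grid) {
            Arrays.fill(row, 1);
        }

        long t0 = System.nanoTime();
        long rowSum = sumRowMajor(grid);
        long t1 = System.nanoTime();
        long colSum = sumColumnMajor(grid);
        long t2 = System.nanoTime();

        System.out.println("Row-major sum " + rowSum + " took " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("Column-major sum " + colSum + " took " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

Both passes compute the same sum; only the order of memory accesses differs, which is why any gap in timing can be attributed to cache behavior rather than to the amount of work.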
-
L3 Level CPU Cache:
- Largest CPU cache level
- Slower than L1 and L2
- Typically shared among multiple CPU cores
Example in Java:
    import java.util.concurrent.*;

    public class L3CacheExample {
        private static final int SIZE = 50_000_000;
        private static final int THREADS = 4;

        public static void main(String[] args) throws Exception {
            int[] data = new int[SIZE];
            for (int i = 0; i < SIZE; i++) {
                data[i] = 1;
            }

            ExecutorService executor = Executors.newFixedThreadPool(THREADS);
            long start = System.currentTimeMillis();
            int chunk = SIZE / THREADS;

            for (int t = 0; t < THREADS; t++) {
                int startIdx = t * chunk;
                int endIdx = (t == THREADS - 1) ? SIZE : startIdx + chunk;
                executor.submit(() -> {
                    long sum = 0;
                    for (int i = startIdx; i < endIdx; i++) {
                        sum += data[i];
                    }
                    System.out.println(Thread.currentThread().getName() + " sum = " + sum);
                });
            }

            executor.shutdown();
            executor.awaitTermination(1, TimeUnit.MINUTES);
            long end = System.currentTimeMillis();
            System.out.println("Total time: " + (end - start) + " ms");
        }
    }

A large array is stored in RAM, and multiple threads read different portions of it. As elements are accessed, the CPU loads nearby data into cache lines (typically 64 bytes). Data may be served from L1, L2, the shared L3 cache, or RAM, depending on availability. Since the threads only read the data and do not modify it, cache-coherency overhead is minimal.
Note: L3 cache is a large shared CPU cache that helps improve performance by reducing expensive RAM accesses. It is particularly beneficial in multi-threaded applications where multiple CPU cores access related data.
-
Browser Cache:
- Stores frequently used website files such as images, CSS, JavaScript, fonts, and logos on the user's device.
- When the user revisits the same website, the browser first checks its local cache before downloading the files again.
- If the files are already available in the cache and haven't changed, the browser loads them directly from the device.
- This significantly reduces network requests and download time.
- As a result, web pages load faster and consume less internet bandwidth.
- It also reduces the load on the web server because fewer files need to be transferred.
Note: Browser cache stores frequently used website resources on the user's device so they can be loaded locally on subsequent visits, resulting in faster page loads, reduced bandwidth usage, and lower server load.
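The "haven't changed" check described above is commonly done with an HTTP conditional request: the server labels a resource with an ETag, and on a revisit the client sends that tag back in an If-None-Match header. The sketch below imitates this exchange with the JDK's built-in HttpServer and HttpClient (Java 11+). BrowserCacheDemo, the /style.css path, and the "v1" tag are hypothetical, and the Java client only plays the role of a browser. A real browser would also skip the request entirely while the max-age lifetime has not expired.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class BrowserCacheDemo {
    // Hypothetical resource version and content, used only for this illustration.
    static final String ETAG = "\"v1\"";
    static final byte[] BODY = "body { color: black; }".getBytes(StandardCharsets.UTF_8);

    static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0); // port 0 = any free port
        server.createContext("/style.css", exchange -> {
            String clientTag = exchange.getRequestHeaders().getFirst("If-None-Match");
            exchange.getResponseHeaders().set("ETag", ETAG);
            exchange.getResponseHeaders().set("Cache-Control", "max-age=3600");
            if (ETAG.equals(clientTag)) {
                exchange.sendResponseHeaders(304, -1); // unchanged: headers only, no body
            } else {
                exchange.sendResponseHeaders(200, BODY.length);
                exchange.getResponseBody().write(BODY);
            }
            exchange.close();
        });
        server.start();
        return server;
    }

    // Returns the status codes of a first visit and of a revalidating revisit.
    static int[] fetchTwice() {
        HttpServer server = null;
        try {
            server = startServer();
            URI uri = URI.create("http://localhost:" + server.getAddress().getPort() + "/style.css");
            HttpClient client = HttpClient.newHttpClient();

            // First visit: nothing cached yet, so the full file is downloaded.
            HttpResponse<String> first = client.send(
                    HttpRequest.newBuilder(uri).build(), HttpResponse.BodyHandlers.ofString());
            String tag = first.headers().firstValue("ETag").orElseThrow();

            // Revisit: send the stored tag back; the server answers without a body if it still matches.
            HttpResponse<String> second = client.send(
                    HttpRequest.newBuilder(uri).header("If-None-Match", tag).build(),
                    HttpResponse.BodyHandlers.ofString());
            return new int[] {first.statusCode(), second.statusCode()};
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException("demo request failed", e);
        } finally {
            if (server != null) {
                server.stop(0);
            }
        }
    }

    public static void main(String[] args) {
        int[] statuses = fetchTwice();
        System.out.println("First visit: " + statuses[0] + ", revisit: " + statuses[1]);
    }
}
```

The first response carries the full body with status 200; the revisit is answered with 304 Not Modified and no body, which is where the bandwidth and server-load savings come from.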
-
Application Cache:
Application cache refers to data stored directly in the memory of a running Java application. The cache resides inside the application's process (the JVM) and reduces API calls or expensive computations by serving frequently accessed data from memory.
Types of Java Application Cache:
- Manual Cache in Java
  - HashMap Cache
    Pros:
    - Very simple and fast (O(1) average lookups)
    - No external libraries needed
    - Good for single-threaded use cases
    Cons:
    - Not thread-safe
    - No eviction (memory keeps growing)
    - No expiration support
    - Risk of memory leaks in long-running apps
  - ConcurrentHashMap Cache
    Pros:
    - Thread-safe (good for concurrent apps)
    - High performance under multi-threading
    - Does not lock the whole map for each operation
    Cons:
    - Still no eviction policy
    - No TTL (time-based expiry)
    - Can grow indefinitely
  - LRU Cache
    Pros:
    - Automatically removes the least recently used items
    - Keeps memory use bounded
    - Simple implementation
    Cons:
    - Not highly scalable under heavy concurrency
    - No TTL support by default
    - Needs customization for production use
- Spring Boot Cache (default Simple Cache Manager)
  - Stores data in JVM memory.
  - Thread-safe because it uses ConcurrentHashMap.
  - No cache expiration (TTL).
  - No automatic eviction policy.
  - Cache data is lost when the application restarts.
  - Suitable for development and small applications.
  - Best suited to a single application instance.
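Several of the caches listed above have no time-based expiry. One way to add it on top of ConcurrentHashMap is to store an expiry timestamp next to each value and discard stale entries when they are read. The sketch below is a minimal illustration under that assumption. TtlCache is a hypothetical class, it still has no size limit, and expired entries that are never read again stay in memory.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical minimal cache with a fixed time-to-live per entry; not production-ready.
class TtlCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long expiresAtMillis;

        Entry(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentMap<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final long ttlMillis;

    TtlCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    void put(K key, V value) {
        put(key, value, System.currentTimeMillis());
    }

    V get(K key) {
        return get(key, System.currentTimeMillis());
    }

    // The clock is passed in explicitly so expiry can be demonstrated without sleeping.
    void put(K key, V value, long nowMillis) {
        map.put(key, new Entry<>(value, nowMillis + ttlMillis));
    }

    V get(K key, long nowMillis) {
        Entry<V> entry = map.get(key);
        if (entry == null) {
            return null;
        }
        if (nowMillis >= entry.expiresAtMillis) {
            map.remove(key, entry); // lazy eviction: removes only this exact stale entry
            return null;
        }
        return entry.value;
    }

    int size() {
        return map.size();
    }
}

class TtlCacheDemo {
    public static void main(String[] args) {
        TtlCache<String, String> cache = new TtlCache<>(1000); // entries live for 1 second
        cache.put("user1", "John", 0L);
        System.out.println(cache.get("user1", 500L));  // still fresh
        System.out.println(cache.get("user1", 1500L)); // expired, so null
    }
}
```

Expiring on read keeps the sketch short; a real cache would normally also run periodic cleanup or enforce a maximum size.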
Basic in-memory cache using HashMap:

    import java.util.HashMap;
    import java.util.Map;

    Map<String, String> cache = new HashMap<>();
    cache.put("user1", "John");
    String value = cache.get("user1");

Thread-safe version of HashMap:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    Map<String, String> cache = new ConcurrentHashMap<>();
    cache.put("user1", "John");
    String value = cache.get("user1");

LRU cache that evicts the least recently accessed entry when full.
For Example:

    import java.util.LinkedHashMap;
    import java.util.Map;

    class LRUCache extends LinkedHashMap<Integer, Integer> {
        private final int capacity;

        public LRUCache(int capacity) {
            super(capacity, 0.75f, true); // accessOrder = true
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<Integer, Integer> eldest) {
            return size() > capacity;
        }
    }

When developers start learning caching in Spring Boot, it's common to assume that Spring Boot provides multiple cache implementations out of the box. However, that's not entirely accurate. If you enable caching without adding any external cache library, Spring Boot automatically configures a Simple Cache Manager (ConcurrentMapCacheManager). Internally, the Simple Cache Manager uses ConcurrentHashMap to store cached data in the application's JVM memory.
The characteristics of the Simple Cache Manager are the ones listed under Spring Boot Cache above.
For Example:
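    // @EnableCaching is usually declared once, on a configuration class.
    @EnableCaching
    @Service
    public class UserService {
        // Assumes a Spring Data repository, whose findById returns Optional<User>.
        private final UserRepository userRepository;

        public UserService(UserRepository userRepository) {
            this.userRepository = userRepository;
        }

        // The first call for an id runs the query; later calls are served from the "users" cache.
        @Cacheable("users")
        public User getUser(Long id) {
            return userRepository.findById(id).orElseThrow();
        }
    }
-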
Database Cache:
Stores frequently executed query results, pages, or records to improve database performance.
Types of Hibernate Cache:
- First Level Cache
- Default cache in Hibernate
- Works at Session level
- Cannot be disabled
For Example:
    Session session = sessionFactory.openSession();
    User u1 = session.get(User.class, 1L); // loads the entity from the database
    User u2 = session.get(User.class, 1L); // same Session: served from the first-level cache

Note: Hibernate First-Level Cache stores entities inside a Session so repeated access to the same entity does not trigger multiple database queries.
- Second Level Cache
- Optional cache
- Shared across multiple sessions
- Requires configuration + cache provider
- Query Cache
- Caches query results instead of entities
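The difference in scope between the first-level cache (one Session) and the second-level cache (shared across Sessions) can be illustrated without Hibernate. The sketch below is a plain-Java imitation of the lookup order only. FakeDatabase, SketchSession, and HibernateCacheSketch are hypothetical names, none of them are Hibernate APIs, and Hibernate's real implementation is considerably more involved.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for the real database; counts how many queries reach it.
class FakeDatabase {
    int queries = 0;

    String loadUser(long id) {
        queries++;
        return "User#" + id;
    }
}

// Imitates a session: a private first-level map in front of a shared second-level map.
class SketchSession {
    private final Map<Long, String> firstLevel = new HashMap<>(); // lives only as long as this session
    private final Map<Long, String> secondLevel;                  // shared by all sessions
    private final FakeDatabase db;

    SketchSession(FakeDatabase db, Map<Long, String> secondLevel) {
        this.db = db;
        this.secondLevel = secondLevel;
    }

    String get(long id) {
        String user = firstLevel.get(id);
        if (user == null) {
            // Not in this session yet: try the shared cache, then the database.
            user = secondLevel.computeIfAbsent(id, db::loadUser);
            firstLevel.put(id, user);
        }
        return user;
    }
}

class HibernateCacheSketch {
    public static void main(String[] args) {
        FakeDatabase db = new FakeDatabase();
        Map<Long, String> shared = new ConcurrentHashMap<>();

        SketchSession s1 = new SketchSession(db, shared);
        s1.get(1L); // database query
        s1.get(1L); // answered by s1's first-level map

        SketchSession s2 = new SketchSession(db, shared);
        s2.get(1L); // new session, empty first level: answered by the shared map

        System.out.println("Database queries: " + db.queries);
    }
}
```

Across three reads in two sessions the database is queried once. Without the shared map, the second session would have had to query it again.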
-
CDN Cache:
CDN (Content Delivery Network) cache stores content at edge locations closer to users.
Commonly Cached Content
- Images
- Videos
- HTML pages
- CSS/JS files
Not Suitable For
- Highly dynamic content
- Real-time transactions
- Database queries
-
Distributed Cache:
Distributed cache stores data across multiple servers (e.g., Redis, Memcached). It provides scalability, high availability, and shared access across systems.
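One simple way to picture "data across multiple servers" is key-based sharding: each key is hashed, and the hash decides which server holds it. The sketch below is a simplified illustration under that assumption. ShardedCache is a hypothetical class, and in-process maps stand in for remote cache servers. Real systems use more robust schemes (Redis Cluster, for example, assigns keys to hash slots), because plain modulo hashing remaps most keys whenever the number of nodes changes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical client-side sharding; each map stands in for one remote cache server.
class ShardedCache {
    private final List<Map<String, String>> nodes = new ArrayList<>();

    ShardedCache(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashMap<>());
        }
    }

    // Maps a key to a node index; floorMod keeps the result non-negative.
    int nodeFor(String key) {
        return Math.floorMod(key.hashCode(), nodes.size());
    }

    void put(String key, String value) {
        nodes.get(nodeFor(key)).put(key, value);
    }

    String get(String key) {
        return nodes.get(nodeFor(key)).get(key);
    }
}

class ShardedCacheDemo {
    public static void main(String[] args) {
        ShardedCache cache = new ShardedCache(3);
        cache.put("user1", "John");
        cache.put("user2", "Jane");
        System.out.println("user1 is stored on node " + cache.nodeFor("user1"));
        System.out.println("user1 = " + cache.get("user1"));
    }
}
```

Because every client computes the same node for the same key, any application instance can find an entry that another instance stored, which is the shared-access property described above.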
Key benefits of Redis compared with the other cache types discussed here:
- Very fast (in-memory, typically sub-millisecond latency)
- Works as both cache and database
- Supports multiple data structures (strings, lists, sets, hashes, sorted sets, streams)
- Distributed cache (works across multiple servers)
- Highly scalable (supports clustering and replication)
- Data persistence option (can save data to disk if needed)
- Built-in TTL (auto expiry of cached data)
- Atomic operations (safe in concurrent systems)
- Supports pub/sub messaging (real-time communication)
- Useful for real-time applications (chat, gaming, notifications, leaderboards)
- Reduces load on database significantly
- More flexible than simple caches like Memcached
- Better for dynamic data compared to CDN caching
- Can handle high traffic workloads efficiently
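Example of a Hibernate query whose results can be stored in the query cache (see Query Cache under Database Cache):

    Query<User> query = session.createQuery(
        "FROM User WHERE status = 'ACTIVE'", User.class
    );
    query.setCacheable(true); // takes effect only when the query cache is enabled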