Hello.
I’m analyzing a fully-featured L2 cache with the following properties:
- Non-blocking
- Write-allocate
- Write-back
- For simplification: Only full-cacheline stores are allowed (every store allocates an entire cache line).
- No other complexities: No load/store buffers, no intentional reordering, no aliasing (due to physical addressing), no coherency.
Cache Miss Strategies

On a load miss, there are two basic strategies:

- Allocate-on-miss (simpler but rather inefficient):
  - Allocate the cache line immediately.
  - When data returns from the lower-level cache:
    - Update the line only if it still exists in the cache.
    - Satisfy the pending request(s).
- Allocate-on-update (more complex):
  - Do not allocate the line on the miss.
  - When data returns from the lower-level cache:
    - Allocate the line in the cache.
    - Satisfy the pending request(s).
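To make the contrast concrete, here is a toy model of the two fill paths (my own sketch under the simplifications above, not taken from any specific design): the cache is a dict keyed by line address, and pending misses sit in an MSHR-like dict of waiter lists.

```python
class ToyCache:
    def __init__(self, allocate_on_miss):
        self.lines = {}                  # addr -> data (resident lines)
        self.mshr = {}                   # addr -> list of waiting loads
        self.allocate_on_miss = allocate_on_miss

    def load(self, addr):
        if addr in self.lines:
            return ("hit", self.lines[addr])
        # Miss: record the waiter; allocate-on-miss reserves the line now.
        self.mshr.setdefault(addr, []).append("load")
        if self.allocate_on_miss:
            self.lines[addr] = None      # reserved, data not yet valid
        return ("miss", None)

    def fill(self, addr, data):
        """Data returns from the lower-level cache."""
        waiters = self.mshr.pop(addr, [])
        if self.allocate_on_miss:
            # Update only if the reserved line still exists -- it may have
            # been evicted while the fill was in flight.
            if addr in self.lines:
                self.lines[addr] = data
        else:
            # Allocate-on-update: the line is allocated only now.
            self.lines[addr] = data
        return waiters                   # requests satisfied with `data`
```

With `allocate_on_miss=False`, `load(0x1000)` leaves the cache untouched and the later `fill(0x1000, 10)` performs the allocation; with `allocate_on_miss=True`, the line is reserved up front and the fill is silently dropped if the reservation was evicted in the meantime.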
Question: what is the baseline implementation for allocate-on-update?
Worst-Case Scenario Example

Consider this sequence:
store 10 → mem[0x1000] // Allocates line
... // Line [0x1000] evicted
load a ← mem[0x1000] // Miss (no allocation); request sent to the lower-level cache
store 20 → mem[0x1000] // Reallocates line (for simplification stores always fit full line)
... // Line [0x1000] evicted again
// Stale fill arrives from the lower level:
// [0x1000]=10 (the old value)
// so we must not allocate it in the cache
...
load b ← mem[0x1000] // Must avoid using stale data
The challenge is to avoid applying stale fills (e.g., the outdated [0x1000]=10 arriving after the store of 20→[0x1000]) even though the line was evicted in the meantime. So we need to track this information somehow: record it when the line is modified, and look it up fast enough on every fill. For an L2 cache, a fully associative "outdated stores buffer" or any other fully associative structure seems impractical.
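One way to illustrate the required behavior (my own sketch, not a claim about any canonical scheme) is to tag each outstanding miss with a per-address generation counter that every full-line store bumps, and to drop any fill whose tag no longer matches. Keeping such a counter for every address is exactly the tracking cost the question is about; the replay below only shows what correct filtering must do on the sequence above. The names (`store_full_line`, `gen`, `inflight`) are all hypothetical.

```python
from collections import defaultdict

cache = {}                    # addr -> data (resident lines)
gen = defaultdict(int)        # addr -> generation of the newest store
inflight = []                 # (addr, tag) for outstanding load misses

def store_full_line(addr, data):
    gen[addr] += 1            # any older in-flight fill is now stale
    cache[addr] = data        # write-allocate: a full-line store allocates

def evict(addr):
    cache.pop(addr, None)     # write-back to the lower level omitted

def load(addr):
    if addr in cache:
        return cache[addr]
    inflight.append((addr, gen[addr]))   # capture generation at request time
    return None                          # allocate-on-update: no line yet

def fill(addr, tag, data):
    if tag != gen[addr]:
        return "dropped stale fill"      # a newer store overtook this fill
    cache[addr] = data                   # allocate the line only now
    return "allocated"

# Replay the sequence from the question:
store_full_line(0x1000, 10)   # store 10 -> mem[0x1000], allocates line
evict(0x1000)                 # line evicted (10 written back)
load(0x1000)                  # miss, request sent to the lower level
addr, tag = inflight.pop()
store_full_line(0x1000, 20)   # store 20 reallocates, bumps generation
evict(0x1000)                 # evicted again (20 written back)
result = fill(addr, tag, 10)  # stale data [0x1000]=10 arrives
```

Here the stale fill is rejected because its captured tag predates the store of 20, so the later `load b` misses and refetches the correct value from the lower level. The open question remains how real designs store and look up this kind of information without a fully associative buffer.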
What is the standard solution for this? (I couldn’t find a canonical implementation in the literature.)