
I am trying to understand in a general sense how L1/L2/L3 caches are updated and how the updates are propagated in a multi-core x86/x86-64 CPU.

Assume a 4-core CPU with 2 pairs of L1/L2 caches, where each pair of cores shares a common L1/L2 pair and there's an interconnect between the two L1/L2 pairs. Cache lines are usually 64 bytes wide. So we have:

Core-0/Core-1 on (L1/L2)-0
Core-2/Core-3 on (L1/L2)-1
(L1/L2)-0 is connected to (L1/L2)-1

Let's say there is a thread T0 running on Core-0 that writes to a 64-bit integer variable called X, and another thread T1 on Core-3 that continually reads X's value - please ignore the logical race conditions for a moment.

Question: Assuming X has been cached in Core-0's L1, when T0 writes a new value to X, is the following sequence of events correct?

1. X's value is pushed from register to L1-0
2. X's value is pushed from L1-0 to L2-0
3. X's value is pushed from L2-0 to L2-1
4. X's value is pushed from L2-1 to L1-1
5. X's value is pushed from L1-0 to RAM

Note: Steps 2 and 4 may happen concurrently.

I tried reading through the MESI and MOESI protocol docs I could find on the net, but it's really difficult to get a grasp on how the various decisions are made about where to keep stuff in L1, L2, or L3.

  • "And the cache lines are 64-bit wide" - real-world x86 CPUs use 64-byte cache lines. That would make a more sensible example. But anyway, what you're looking for is cache allocation policy. It depends on the design. With pairs of cores sharing an L1 and L2, that's close to AMD Bulldozer's integer cores, except they have separate L1d per integer core. (realworldtech.com/bulldozer/3). AMD likes to use victim caches for outer levels of cache, so on load misses, only L1d, or L1d+L2, are filled, not L3 until the line's evicted from L2. Commented Apr 13 at 2:27
  • 5
    AMD Bulldozer is also unique among modern x86 designs in using a write-through L1d cache (with a small write-combining buffer), so a store from a register ends up in L1d and L2. On most CPUs, L1d is write-back with write-allocate so a dirty line can be hot there and invalid everywhere else. And yes, successive write-back to outer levels of cache can happen incrementally. Related: Intel CPU Cache Policy / Which cache mapping technique is used in intel core i7 processor? Commented Apr 13 at 2:29
  • @petercordes Thanks for those comments. I had a typo wrt cache line size; I've fixed that up. Would you be able to compose your two comments into an answer, as I believe they're informative to the question? Commented Apr 18 at 23:49
