Must all the cores of a multi-core CPU share one L3 cache? Is it possible for a CPU to have several L3 caches? For example, suppose a CPU has 24 cores, and each group of three cores shares an L3 cache, so there are 8 L3 caches.

AMD's Zen family does this: each "core complex" (CCX) of 4 or 8 cores shares an L3, with no whole-chip shared cache beyond that. AMD's Infinity Fabric connects the CCXs to each other and to the memory controllers and I/O, and many-core CPUs are built out of multiple modules of CCXs + memory controllers + I/O.

This is a lot like traditional multi-socket systems, where each socket held a chip with one shared L3 for all its cores, plus links to the other sockets with snoop filters to keep snoop bandwidth down to manageable levels (and to keep latency low within one socket / CCX). There are NUMA-style inter-core latency differences between pairs of cores in the same CCX vs. in different CCXs.
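To make the core / CCX / socket hierarchy concrete, here is a toy Python model (not any vendor's API) that classifies a pair of cores by the cheapest structure they share. `Topology`, `cores_per_ccx` and `ccxs_per_socket` are hypothetical names, and the model assumes cores are numbered contiguously CCX by CCX, which real systems don't necessarily do (SMT siblings, firmware numbering), so on real hardware you'd query the OS instead of doing arithmetic on core IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    """Hypothetical topology: socket > CCX (one shared L3) > core.

    Assumes cores are numbered contiguously, CCX by CCX and socket by socket.
    """
    cores_per_ccx: int
    ccxs_per_socket: int

    def ccx_of(self, core: int) -> int:
        # Global index of the CCX (i.e. of the L3 cache) this core belongs to.
        return core // self.cores_per_ccx

    def socket_of(self, core: int) -> int:
        return self.ccx_of(core) // self.ccxs_per_socket

    def path(self, a: int, b: int) -> str:
        """Which shared structure two cores communicate through (cheapest first)."""
        if a == b:
            return "same core"
        if self.ccx_of(a) == self.ccx_of(b):
            return "shared L3 (same CCX)"
        if self.socket_of(a) == self.socket_of(b):
            return "on-package interconnect (different CCX, same socket)"
        return "inter-socket link (different socket)"


# The question's example: 24 cores per socket, 3 cores per L3 -> 8 L3 caches.
topo = Topology(cores_per_ccx=3, ccxs_per_socket=8)
print(len({topo.ccx_of(c) for c in range(24)}))  # 8
print(topo.path(0, 2))   # shared L3 (same CCX)
print(topo.path(0, 3))   # on-package interconnect (different CCX, same socket)
print(topo.path(0, 24))  # inter-socket link (different socket)
```

Only the ordering of the tiers is meaningful here; actual latencies depend on the specific chip and platform.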

The low-end models have only one CCX, which is up to 4 cores in Zen 1 and Zen 2, or up to 8 cores in Zen 3 and Zen 4. The amount of L3 cache per CCX can vary by model within one generation.
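On Linux you can check how a particular machine's cores are grouped into L3 domains (e.g. one per CCX on Zen) via sysfs: each `/sys/devices/system/cpu/cpuN/cache/indexM/` directory has a `level` file and a `shared_cpu_list` file listing the logical CPUs that share that cache. The Python sketch below collects the distinct sharing sets for one cache level. The helper names `parse_cpu_list` and `cache_domains` are hypothetical, and it assumes the standard Linux sysfs layout; on other OSes the glob simply matches nothing.

```python
import glob
import os


def parse_cpu_list(text: str) -> list[int]:
    """Parse a Linux CPU list such as "0-3,8-11" into [0, 1, 2, 3, 8, 9, 10, 11]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def cache_domains(level: int = 3, root: str = "/sys/devices/system/cpu") -> list[tuple[int, ...]]:
    """Return one sorted tuple of logical CPU ids per distinct cache at `level`.

    Every CPU reports the same shared_cpu_list for a cache it shares,
    so collecting the lists into a set leaves one entry per cache instance.
    """
    domains = set()
    pattern = os.path.join(root, "cpu[0-9]*", "cache", "index[0-9]*")
    for index_dir in glob.glob(pattern):
        try:
            with open(os.path.join(index_dir, "level")) as f:
                if int(f.read()) != level:
                    continue
            with open(os.path.join(index_dir, "shared_cpu_list")) as f:
                domains.add(tuple(sorted(parse_cpu_list(f.read()))))
        except (OSError, ValueError):
            continue  # offline CPU or missing/odd file: skip it
    return sorted(domains)


if __name__ == "__main__":
    for i, cpus in enumerate(cache_domains(level=3)):
        print(f"L3 domain {i}: logical CPUs {list(cpus)}")
```

The lists include SMT siblings, since they are logical CPUs. Passing `level=2` instead groups CPUs by shared L2, which on an Alder Lake system should show the E-core clusters. Tools like `lscpu --caches` or hwloc's `lstopo` present similar information.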

For more details see:


Intel has also done this, in a much worse way, with Core 2 Quad: it basically stuck two Core 2 Duo dies in one package, and the interconnect between them was the FSB (front-side bus), which was about as slow as going to DRAM. (The last-level cache in those days was L2, so there were two separate L2 caches.) See the "Final Words (Dunnington)" section of Chips and Cheese's historical look back at Dunnington for a description of how things worked in Core 2 Quads, which lacked Dunnington's uncore / shared L3: the other die literally just snooped the shared FSB and responded instead of DRAM if it had a copy of the line.
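The snooping arrangement can be illustrated with a deliberately simplified Python toy. It is not a model of Intel's actual coherence protocol; `Die`, `SharedBus` and the write-invalidate policy are assumptions for illustration. Every miss goes out on the shared bus, the other die checks its own cache and supplies the line if it has it, and otherwise DRAM answers. Either way the request crosses the same slow bus.

```python
class Die:
    """One die with a private last-level cache (toy: unbounded, no eviction)."""

    def __init__(self, name: str):
        self.name = name
        self.cache: dict[int, int] = {}  # line address -> data


class SharedBus:
    """Toy front-side bus shared by all dies and the memory controller."""

    def __init__(self, dram: dict[int, int]):
        self.dram = dram
        self.dies: list[Die] = []

    def attach(self, die: Die) -> Die:
        self.dies.append(die)
        return die

    def read(self, requester: Die, addr: int) -> tuple[int, str]:
        if addr in requester.cache:
            return requester.cache[addr], "hit in own cache (no bus traffic)"
        # Miss: the request goes out on the bus, and every other die snoops it.
        for die in self.dies:
            if die is not requester and addr in die.cache:
                data = die.cache[addr]  # the snooping die answers instead of DRAM
                requester.cache[addr] = data
                return data, f"supplied by {die.name} over the bus"
        data = self.dram[addr]
        requester.cache[addr] = data
        return data, "supplied by DRAM over the bus"

    def write(self, requester: Die, addr: int, data: int) -> None:
        # Write-invalidate: other dies drop their (now stale) copies.
        for die in self.dies:
            if die is not requester:
                die.cache.pop(addr, None)
        requester.cache[addr] = data  # dirty line stays cached; DRAM is now stale


bus = SharedBus(dram={0x40: 1})
die0, die1 = bus.attach(Die("die0")), bus.attach(Die("die1"))
print(bus.read(die0, 0x40))  # (1, 'supplied by DRAM over the bus')
bus.write(die0, 0x40, 2)
print(bus.read(die1, 0x40))  # (2, 'supplied by die0 over the bus')
print(bus.read(die1, 0x40))  # (2, 'hit in own cache (no bus traffic)')
```

In this toy, a cross-die hit and a DRAM access differ only in who answers; both cost a trip over the bus, whereas a shared L3 would satisfy the request on-die.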

Some modern chips have groups of 2 to 4 cores sharing a medium-sized L2, with multiple such groups on the same processor all backed by a large shared L3. For example, Intel's E-cores in Alder Lake do this.

AMD's Bulldozer family used even tighter coupling: a pair of weak integer cores shared a front-end, the L1i cache, and the SIMD/FP unit (AMD called this CMT, as an alternative to SMT), but each core had its own write-through L1d cache, backed by a shared L2 (see https://www.realworldtech.com/bulldozer/2/). There was still a single L3 shared across the whole chip, though. Bulldozer was not very high-performance overall, for a lot of reasons.

ARM's Cortex-A510 can be clustered in a similar way, sharing an FPU, L2 cache, and L2 TLB (Chips and Cheese discusses the tradeoffs for that in-order efficiency core). But again, there's normally a shared L3 as a backstop outside this.

Apple's A14 has 8 MiB of L2 cache shared between its two Firestorm big cores. But there's also a slower shared last-level cache (effectively an L3) for them plus the Icestorm E-cores, the GPU, etc.

Comments

Thanks! It seems to be a trend for the cores within a single CPU to form a NUMA architecture. Can such CPUs be combined in a traditional multi-socket NUMA system to form a more hierarchical NUMA architecture? Similarly, can big.LITTLE CPUs (e.g. Apple M1, Intel Core i5) form a multilevel NUMA architecture?
What I mean is that the cores inside one CPU make up a NUMA architecture, and then multiple such CPUs make up a traditional NUMA architecture. Is that possible?
@拉克克: Yes, there are multi-socket EPYC systems with multiple Zen chips communicating between packages with the same infinity fabric.
Re: Intel, they could in theory sell multi-socket Xeons with a mix of E and P cores, but in practice they don't. One of the big reasons may be that they don't want to disable AVX-512 (which they'd have to do on the P cores too, because there's no software ecosystem ready to run on heterogeneous ISA extensions), and that a lot of the server market is cloud or datacenter systems where one machine runs all the same workload, not a mix of heavy vs. lightweight tasks. They might sell all-E-core Xeons, though, for use in different systems from all-P-core ones. Not sure.
@拉克克: I'm not aware of Apple aiming at the multi-socket server market, or the server market at all currently. Their Xserve line ended in 2011. Multi-socket-capable AArch64 CPUs are rare, although Ampere exists. (And BTW, AMD has a marketing piece on their web site explaining that single-socket systems are fully viable these days for a lot of tasks, since they make such big EPYC CPUs. amd.com/en/solutions/data-center/insights/… . But multi-socket EPYC is supported, at least up to 2 sockets.)