I'm planning to write a bit about data organization for multi-core scenarios. I started writing a first post but quickly realized that there are a few basics I need to cover first.

This is a whirlwind primer on CPU caches. I'm assuming you know the basic concept, but you might not be familiar with some of the details. (If you are, feel free to skip this section.)

In modern CPUs, (almost) all memory accesses go through the cache hierarchy; there are some exceptions for memory-mapped IO and write-combined memory that bypass at least parts of this process, but both of these are corner cases (in the sense that the vast majority of user-mode code will never see either), so I'll ignore them in this post.

The CPU core's load/store (and instruction fetch) units normally can't even access memory directly – it's physically impossible; the necessary wires don't exist! Instead, they talk to their L1 caches, which are supposed to handle it. And about 20 years ago, the L1 caches would indeed talk to memory directly. At this point, there are generally more cache levels involved; this means the L1 cache doesn't talk to memory directly anymore, it talks to an L2 cache – which in turn talks to memory. You get the idea.

Caches are organized into "lines", corresponding to aligned blocks of either 32 (older ARMs, 90s/early 2000s x86s/PowerPCs), 64 (newer ARMs and x86s) or 128 (newer Power ISA machines) bytes of memory. Each cache line knows what physical memory address range it corresponds to, and in this article I'm not going to differentiate between the physical cache line and the memory it represents – this is sloppy, but conventional usage, so better get used to it. In particular, I'm going to say "cache line" to mean a suitably aligned group of bytes in memory, no matter whether these bytes are currently cached (i.e. present in any of the cache levels) or not.

When the CPU core sees a memory load instruction, it passes the address to the L1 data cache (or "L1D$", playing on "cache" being pronounced the same way as "cash"). The L1D$ checks whether it contains the corresponding cache line. If not, the whole cache line is brought in from memory (or the next-deeper cache level, if present) – yes, the whole cache line; the assumption being that memory accesses are localized, so if we're looking at some byte in memory we're likely to access its neighbors soon. Once the cache line is present in the L1D$, the load instruction can go ahead and perform its memory read.

And as long as we're dealing with read-only access, it's all really simple, since all cache levels obey what I'll call the basic invariant: the contents of all cache lines present in any of the cache levels are identical to the values in memory at the corresponding addresses, at all times.
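Before we get to writes, here's a minimal C sketch, assuming the 64-byte line size of newer x86s and ARMs, that shows what "suitably aligned group of bytes" means in practice: you get a line's base address by masking off the low bits, and neighboring addresses share a base, which is why fetching one byte drags its neighbors into the cache along with it.

```c
/* Minimal sketch: which cache line does an address belong to?
   LINE_SIZE is an assumption (64 bytes here); real hardware uses
   32, 64 or 128 depending on the CPU, as noted above. */
#include <stdint.h>
#include <stdio.h>

#define LINE_SIZE 64u

static uintptr_t cache_line_base(uintptr_t addr)
{
    /* Clear the low bits: the line-aligned base address. */
    return addr & ~(uintptr_t)(LINE_SIZE - 1u);
}

int main(void)
{
    /* 0x1008 and 0x1030 live on the same 64-byte line (base 0x1000);
       0x1048 is on the next one (base 0x1040). */
    printf("0x1008 -> line base 0x%lx\n", (unsigned long)cache_line_base(0x1008));
    printf("0x1030 -> line base 0x%lx\n", (unsigned long)cache_line_base(0x1030));
    printf("0x1048 -> line base 0x%lx\n", (unsigned long)cache_line_base(0x1048));
    return 0;
}
```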
Things get a bit more complicated once we allow stores, i.e. memory writes. There are two basic approaches here: write-through and write-back.

Write-through is the easier one: we just pass stores through to the next-level cache (or memory). If we have the corresponding line cached, we update our copy (or maybe even just discard it), but that's it. This preserves the same invariant as before: if a cache line is present in the cache, its contents match memory, always.

A write-back cache doesn't pass writes on immediately. Instead, such modifications are applied locally to the cached data, and the corresponding cache lines are flagged "dirty". Dirty cache lines can trigger a write-back, at which point their contents are written back to memory or the next cache level. After a write-back, dirty cache lines are "clean" again. When a dirty cache line is evicted (usually to make space for something else in the cache), it always needs to perform a write-back first. The invariant for write-back caches is slightly different.

Write-back invariant: after writing back all dirty cache lines, the contents of all cache lines present in any of the cache levels are identical to the values in memory at the corresponding addresses.

In other words, in write-back caches we lose the "at all times" qualifier and replace it with a weaker condition: either the cache contents match memory (this is true for all clean cache lines), or they contain values that eventually need to get written back to memory (for dirty cache lines).
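To make the dirty/clean life cycle concrete, here's a deliberately tiny write-back cache model in C. It's a sketch with made-up sizes and a single direct-mapped level, not a real cache design: stores only touch the cached copy and set a dirty flag, evicting a dirty line writes it back first, and flushing every dirty line restores the write-back invariant.

```c
/* Toy write-back cache: one direct-mapped level over a small "memory"
   array. Everything here (sizes, names) is made up for illustration. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define LINE_SIZE 64
#define NUM_LINES 4
#define MEM_SIZE  (64 * LINE_SIZE)

static uint8_t memory[MEM_SIZE];

typedef struct {
    bool     valid, dirty;
    uint32_t base;                 /* line-aligned address this line holds */
    uint8_t  data[LINE_SIZE];
} Line;

static Line cache[NUM_LINES];

/* Find the line for 'addr', filling it from memory on a miss.
   Evicting a dirty line writes it back first; afterwards it's clean. */
static Line *lookup(uint32_t addr)
{
    uint32_t base = addr & ~(uint32_t)(LINE_SIZE - 1);
    Line *line = &cache[(base / LINE_SIZE) % NUM_LINES];
    if (!line->valid || line->base != base) {
        if (line->valid && line->dirty)
            memcpy(&memory[line->base], line->data, LINE_SIZE);  /* write-back */
        memcpy(line->data, &memory[base], LINE_SIZE);            /* whole-line fill */
        line->valid = true;
        line->dirty = false;
        line->base  = base;
    }
    return line;
}

static uint8_t load8(uint32_t addr)
{
    Line *line = lookup(addr);
    return line->data[addr - line->base];
}

static void store8(uint32_t addr, uint8_t value)
{
    Line *line = lookup(addr);
    line->data[addr - line->base] = value;  /* modify the cached copy only... */
    line->dirty = true;                     /* ...and flag the line dirty */
}

/* Write back all dirty lines. Afterwards the write-back invariant holds:
   every cached line matches memory again. */
static void flush_all(void)
{
    for (int i = 0; i < NUM_LINES; i++)
        if (cache[i].valid && cache[i].dirty) {
            memcpy(&memory[cache[i].base], cache[i].data, LINE_SIZE);
            cache[i].dirty = false;
        }
}

int main(void)
{
    store8(100, 42);
    printf("cached value: %u, memory still holds: %u\n", load8(100), memory[100]);
    flush_all();
    printf("after flushing dirty lines, memory holds: %u\n", memory[100]);
    return 0;
}
```

A real cache adds associativity, a replacement policy and more levels on top of this, but the dirty-flag bookkeeping is the part the write-back invariant cares about.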