... especially when dealing with data arriving from external sources such as peripherals (e.g., UART, SPI).
This is not directly true. For a UART/SPI you typically read from a data register on the controller. That memory region should NOT be marked as cacheable, so in this case there is nothing to worry about. The second case is that you have a DMA engine that automatically transfers the contents of these data registers to RAM. Again, you can handle this by allocating the DMA buffers so that they are not cacheable.
In a resource-constrained environment, you might wish to perform calculations on the DMA buffer itself. In that case, you can have the DMA controller signal when a buffer transfer is complete and then invalidate the buffer, so that the CPU will re-fetch this memory instead of using the stale cached version.
The term cache clean is also referred to as cache flush. It makes memory available to a DMA peripheral; i.e., it moves data from the cache to external memory. It would be used when you are creating a transmit buffer that is cached by the CPU. I think at one point it would only flush, leaving the value in the cache. There are use cases for both, but generally you might as well invalidate as well, so the cache lines can be reused for other addresses: once you have committed a buffer for transfer, you should be done with it and not change it in flight. The clean/invalidate should be done just before the DMA transmit is activated.
These operations need to be performed because DMA is normally not cache aware.
I have experimented with both cache clean and invalidate functions, and I have not observed any performance degradation or data loss when using only the invalidate function.
It will be difficult to measure a performance increase with 'clean+invalidate'. You would need other accesses that cause cache hits after being allocated into the invalidated entries. If it costs the same number of cycles, then 'clean+invalidate' is often better, as you are telling the CPU/cache controller that you are done with this data set.
However, it is often the case that you operate with a buffer hierarchy, where data is transformed. E.g., the DMA fills an array of ADC channels 0-7, and you then build an SPI-compatible message that stores only some of the ADC channels. In this case, it is better just to leave the DMA buffer uncached, since you are only doing single reads from it.
The most complex use case is frame memory, where you perform blitting directly on a buffer that is DMA'd to the display. Here the access pattern is read/modify/write, and caching can speed things up. This is a far more common use of DMA with caching than with peripherals like SPI, UART, I2C, Ethernet, etc., although many people will prefer to avoid it as well for a variety of reasons; typically, the display is double buffered (update and active screens) to avoid tearing, etc.
Why might we choose to use this function over separate cache clean and invalidate operations?
It is a single instruction on some CPUs; for example, DCCISW and DCCIMVAC on ARM. Combining clean and invalidate is simple for the hardware, as both operations update bits in the same cache structure. The time-consuming part is flushing the data from cache to memory. Note that the instruction may return BEFORE the flush is complete, and other instructions are needed to wait for completion; for instance, a DSB may be required to ensure the data flushing/clean has finished.