
Suppose that I have an embedded project (with an ARM Cortex-M, if it makes a difference) where parts of the code are time-critical and need to run as fast and as deterministically as possible.

Would it be possible to sacrifice part of the L1 cache and reserve it for the critical code/data? I could then load the critical code/data and always run/access them at L1 cache speeds.

  • Some ARM microcontrollers have tightly coupled memory (TCM), which is essentially what you are asking for. Commented Sep 21, 2017 at 11:38
  • So the speed of the TCM is the same as that of L1 cache? Commented Sep 21, 2017 at 11:47
  • Almost all have SRAM that is usually faster than the flash; at its best the flash is often half the speed of the SRAM, and at its worst several times slower. Simply moving that code to SRAM will likely give you a boost; then, if you want to cache it, just turn the cache on. Deterministic timing is not necessarily going to happen: you have to control alignment and some other things. If you are not careful/aware, adding or removing an instruction can change the overall performance of the critical section. Commented Sep 21, 2017 at 12:46
  • More than just the one clock for that instruction. But with the Cortex-M the fetch is only one or two instructions at a time, unlike its big brothers, which fetch more like 8 or 16 instructions at a time (where alignment penalties are far worse). With the cache on, though, you can also pick up additional cache penalties; it depends on how the cache works... Commented Sep 21, 2017 at 12:47
  • Start with RAM, and inspect your chip documentation to see if it has TCM or other solutions. The STM32s, for example, have a cache in front of the flash that you can't turn off; for small benchmarks like tight loops it gives the illusion of performance, but for real-world programs that bounce around it may not, and may expose the actual flash performance. Commented Sep 21, 2017 at 12:48

2 Answers


OK, I think the answer is: technically speaking, no. Memory allocated as cache is used by the cache controller for exactly what it is meant to do, which is caching.

So, hopefully, the chip vendor has provided ways to run code from the fastest memory available. If the chip has TCM, then loading your critical code there should be fine and should run as fast as it would when cached in L1. If the chip provides flash and RAM, then loading the critical code into RAM should also be much faster. In the latter case, the cache controller, if one exists, may be configured to cache that same RAM anyway.


1 Comment

Even though we don't use TCM, we use the same trick of loading the program into our (very) fast on-board static RAM, which is much faster than the ROM.

Yes, it is possible:

TB3186 "How to Achieve Deterministic Code Performance Using a Cortex™-M Cache Controller"

http://ww1.microchip.com/downloads/en/DeviceDoc/How-to-Achieve-Deterministic-Code-Performance-using-CortexM-Cache-Controller-DS90003186A.pdf

... With CMCC, a part of the cache can be used as TCM for deterministic code performance by loading the critical code in a WAY and locking it. When a particular WAY is locked, the CMCC does not use the locked WAY for routine cache transactions. The locked cache WAY with the loaded critical code acts as an always-getting cache hit condition.
