
Suppose that I have an embedded project (with an ARM Cortex-M, if it makes a difference) where parts of the code are time-critical and need to run as fast and as deterministically as possible.

Would it be possible to sacrifice part of the L1 cache and reserve it for the critical code/data? I could then load the critical code/data and always run/access them at L1 cache speeds.

  • Some ARM microcontrollers have tightly coupled memory (TCM), which is essentially what you are asking for. Commented Sep 21, 2017 at 11:38
  • So the speed of the TCM is the same as that of L1 cache? Commented Sep 21, 2017 at 11:47
  • Almost all have SRAM that is usually faster than the flash; at its best the flash is often half the speed of the SRAM, and at its worst several times slower. Simply moving that code to SRAM will likely give you a boost; then, if you want to cache it, just turn the cache on. Deterministic timing is not necessarily going to happen: you have to control alignment and some other things. If you are not careful/aware, adding or removing an instruction can change the overall performance of the critical section. Commented Sep 21, 2017 at 12:46
  • More than just the one clock for that instruction. But with the Cortex-M the fetch is only one or two instructions at a time, unlike its big brothers, which fetch more like 8 or 16 instructions at a time (where alignment penalties are far worse). With the cache on, though, you can also pick up additional cache penalties; it depends on how the cache works... Commented Sep 21, 2017 at 12:47
  • Start with RAM, and inspect your chip documentation to see if it has TCM or other solutions. The STM32s, for example, have a cache in front of the flash that you can't turn off; for small benchmarks like tight loops it gives the illusion of performance, but for real-world programs that bounce around it may not, and may expose the actual flash performance. Commented Sep 21, 2017 at 12:48

2 Answers


OK, I think the answer is: technically speaking, no. Memory allocated as cache is used by the cache controller for exactly what it is meant to do, which is caching.

So, hopefully, the chip vendor has provided ways to run code from the fastest memory available. If the chip has TCM, then loading your critical code there should be fine and should run as fast as it would when cached in L1. If the chip provides flash and RAM, then loading the critical code into RAM should also be much faster. In the latter case, the cache controller, if one exists, may be configured to cache that same RAM anyway.


1 Comment

Even though we don't use TCM, we use the same trick of loading the program into our (very) fast on-board static RAM, which is much faster than the ROM.

Yes, it is possible:

TB3186 "How to Achieve Deterministic Code Performance Using a Cortex™-M Cache Controller"

http://ww1.microchip.com/downloads/en/DeviceDoc/How-to-Achieve-Deterministic-Code-Performance-using-CortexM-Cache-Controller-DS90003186A.pdf

... With CMCC, a part of the cache can be used as TCM for deterministic code performance by loading the critical code in a WAY and locking it. When a particular WAY is locked, the CMCC does not use the locked WAY for routine cache transactions. The locked cache WAY with the loaded critical code acts as an always-getting cache hit condition.
