Getting Cache Details in Arm Processors - Linux

Disable CPU caches (L1/L2) on ARMv8-A Linux

In general, this is unworkable, for several reasons.

Firstly, clearing the SCTLR.C bit only makes all data accesses non-cacheable. and prevents allocating into any caches. Any data in the caches is still there in the caches, especially dirty lines from anything recently-written; consider what happens when your function returns and the caller tries to restore a stack frame which doesn't even exist in the memory it's now accessing.

Secondly, there are very few uniprocessor ARMv8 systems; assuming you're running SMP Linux, and suddenly disable the caches on just whichever CPU the module loader happened to be scheduled on, then even disregarding the first point things are going to go downhill very fast. Linux expects all CPUs to be coherent with each other, and will typically become very broken very rapidly if that assumption is violated. Note that it's not even worth venturing into SMP cross-calling for this; suffice to say the only safe way to even attempt to run Linux with caches disabled is to make sure they are never enabled to begin with, except...

Thirdly, there is no guarantee Linux will even run with caches disabled. On current hardware, all of the locking and atomic operations in the kernel (not to mention userspace) rely on the exclusive access instructions. Whilst the CPU cluster(s) will implement the architecturally-required local and global exclusive monitors for cacheable memory (usually as part of the cache machinery itself), it is dependent on the system whether a global exclusive monitor for non-cacheable accesses is implemented, as such a thing must be external to the CPU (usually in the interconnect or memory controller). Many systems don't implement such a global monitor, in which case exclusive accesses to external memory may fault, do nothing, or other various implementation-defined behaviours which will result in Linux crashing or deadlocking. It is effectively impossible to run Linux with the cache off on such a system - the amount of hacking just to get a UP arm64 kernel to work (SMP would be literally impossible) would be impractical alone; good luck with userspace.

As it happens, though, the worst problem is none of that, it's this:

...in order to measure performance of optimized code, independent of cache access.

If the code is intended to run in deployment with caches disabled, then logically it can't be intended to run under Linux, therefore the effort spent in hacking up Linux would be better spent on benchmarking in a more realistic execution environment so that results are actually representative. On the other hand, if it is intended to run with caches enabled (under Linux or any other OS), then benchmarking with caches disabled will give meaningless results and be a waste of time. "Optimising" for e.g. an instruction-fetch-bandwidth bottleneck which won't exist in practice is not going to lead you in the right direction.

How to put data in L2 cache with A72 Core?

How to do the same thing but in L2 Cache area ?

Your hardware doesn't support that.

In general, the ARMv8 architecture doesn't make any guarantees about the contents of caches and does not provide any means to explicitly manipulate or query them - it only makes guarantees and provides tools for dealing with coherency.

Specifically, from section D4.4.1 "General behavior of the caches" of the spec:

[...] the architecture cannot guarantee whether:

• A memory location present in the cache remains in the cache.
• A memory location not present in the cache is brought into the cache.

Instead, the following principles apply to the behavior of caches:

• The architecture has a concept of an entry locked down in the cache.
How lockdown is achieved is IMPLEMENTATION DEFINED, and lockdown might
not be supported by:

— A particular implementation.
— Some memory attributes.

• An unlocked entry in a cache might not remain in that cache. The
architecture does not guarantee that an unlocked cache entry remains in
the cache or remains incoherent with the rest of memory. Software must
not assume that an unlocked item that remains in the cache remains dirty.

• A locked entry in a cache is guaranteed to remain in that cache. The
architecture does not guarantee that a locked cache entry remains
incoherent with the rest of memory, that is, it might not remain dirty.

[...]

• Any memory location is not guaranteed to remain incoherent with the rest of memory.

So basically you want cache lockdown. Consulting the manual of your CPU though:

• The Cortex-A72 processor does not support TLB or cache lockdown.

So you can't put something in cache on purpose. Now, you might be able to tell whether something has been cached by trying to observe side effects. The two common side effects of caches are latency and coherency. So you could try and time access times or modify the contents of DRAM and check whether you see that change in your cached mapping... but that's still a terrible idea.

For one, both of these are destructive operations, meaning they will change the property you're measuring, by measuring it. And for another, just because you observe them once does not mean you can rely on that happening.

Bottom line: you cannot guarantee that something is held in any particular cache by the time you use it.



Related Topics



Leave a reply



Submit