
I am a first-year Ph.D. student (research assistant) trying to increase the transfer rate between cache and DRAM. To do so, I plan to integrate a good compression technique (or some other technique) to reduce the data size (not the primary task right now). However, I cannot find, and am not aware of, any tool that can provide me with cache data at cache-line / block granularity. I want to analyze the cache contents to understand the ratio of zeros to ones.

I have only used Intel Pin to collect some traces, but from the way it works, it seems to me that if I modify the C++ code to include my own cache simulator, it will not give proper results.

So what I am trying to do is collect cache-block data from the cache, perhaps during idle time or while some application is running. If anyone has worked in the same context, or is aware of tools or methods that can give me this result, I would appreciate the help. I am using Linux.

Thanks in advance!

  • This is the wrong forum for that type of question; it is too broad for this site. You would need to break it down into a more specific question, such as: "my Pin tool code looks like this (shows code in text form), but it errors (shows error); how do I fix it?" Commented Nov 4, 2024 at 19:23

1 Answer


Real x86 hardware doesn't let you query which cache lines are hot, or read cache contents and metadata directly. The only way to read the cache is with a load that hits in cache, from the virtual address it's caching. AFAIK, that's not a feature present in other ISAs either. (I assume x86 since you mentioned Intel Pin.)

You probably want to run experiments in a simulator like GEM5, so you can add instrumentation / data collection in the simulator, outside the guest machine being simulated. GEM5 probably has configs that are similar to the eviction / replacement / allocation policies and HW prefetcher behaviour of real modern CPUs.


Re: your compression idea, I guess you'd want the metadata to be able to signal incompressible data, so that in the worst case (high-entropy data such as random, already-compressed, or encrypted bytes), one 64-byte cache line can still cache the corresponding 64 bytes of memory. But then yeah, for compressible lines, maybe have two tags per line to allow 2:1 compression? It's not rare for programs to work with small 32-bit or 64-bit integers, so the information content is quite low, and simple, fast compression schemes could work well in some cases. (Like RLE, checking whether every 32-bit chunk is representable as i16 or u16, varint with parallel encode/decode like pdep/pext hardware, or a common prefix in the case of pointers... I'm sure you have your own ideas, I just thought it was interesting after hearing the idea.)
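As a concrete sketch of one of the cheap schemes above (my own illustration, not a known hardware design): a line compresses 2:1 if every 32-bit word survives a round trip through a sign-extended 16-bit value, so sixteen words fit in 32 bytes:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical compressibility test: true if every 32-bit word of a
// 64-byte line is representable as a sign-extended int16_t, in which
// case the line's payload fits in half the space (16 x 16 bits).
bool line_compresses_2to1(const std::uint8_t line[64]) {
    for (int i = 0; i < 64; i += 4) {
        std::int32_t word;
        std::memcpy(&word, line + i, sizeof(word));   // strict-aliasing-safe load
        if (word != static_cast<std::int16_t>(word))  // 16-bit round trip?
            return false;
    }
    return true;
}
```

In real hardware you'd fold this check into the fill path and set a per-line flag; the point of the sketch is just how cheap the test itself is (a compare per word, fully parallelizable).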

I guess you'd mostly want compression in L2 and L3 caches, since L1d needs to be read and even updated with byte granularity (or at least word RMW), and needs to be low latency and multi-ported.


2 Comments

Thank you very much for your input, Mr. Peter. I had tried GEM5 for some time but it is a very convoluted simulator. I will look into it again as you have suggested.
@SadmanSakibAkash: I haven't used it; there might be other simulators worth trying, like maybe QEMU patches or plugins. But some kind of cache simulator is probably your only hope; I'm pretty sure even with privileged instructions there's no way to get what you want from real hardware.
