
I would like to profile the cache behavior of a kernel module with SystemTap (number of cache references, number of cache misses, etc.). An example script online shows how SystemTap can read perf events and counters, including cache-related ones: https://sourceware.org/systemtap/examples/profiling/perf.stp

This sample script works by default for a process:

probe perf.hw.cache_references.process("/usr/bin/find").counter("find_insns") {} 

I replaced the process keyword with module and the path to the executable with the name of my kernel module:

probe perf.hw.cache_references.module(MODULE_NAME).counter("find_insns") {} 

I'm pretty sure that my module has the debug info, but running the script I get:

semantic error: while resolving probe point: identifier 'perf' at perf.stp:14:7
        source: probe perf.hw.instructions.module(MODULE_NAME).counter("find_insns") {}

Any ideas what might be wrong?

Edit:

Okay, I realized that perf counters can be bound only to processes, not to modules (explained here: https://sourceware.org/systemtap/man/stapprobes.3stap.html). I therefore changed it back to:

probe perf.hw.cache_references.process(PATH_TO_BINARY).counter("find_insns") {} 

Now, as the sample script suggests, I have:

probe module(MODULE_NAME).function(FUNC_NAME) {
#save counter values on entrance
...
}

But now running it, I get:

semantic error: perf counter 'find_insns' not defined
semantic error: while resolving probe point: identifier 'module' at perf.stp:26:7
        source: probe module(MODULE_NAME).function(FUNC_NAME)

Edit2:

So here is my complete script:

#! /usr/bin/env stap

# Usage: stap perf.stp <path-to-binary> <module-name> <function-name>

global cycles_per_insn
global branch_per_insn
global cacheref_per_insn
global insns
global cycles
global branches
global cacherefs
global insn
global cachemisses
global miss_per_insn

probe perf.hw.instructions.process(@1).counter("find_insns") {} 
probe perf.hw.cpu_cycles.process(@1).counter("find_cycles") {} 
probe perf.hw.branch_instructions.process(@1).counter("find_branches") {} 
probe perf.hw.cache_references.process(@1).counter("find_cache_refs") {} 
probe perf.hw.cache_misses.process(@1).counter("find_cache_misses") {}


probe module(@2).function(@3)
{
 insn["find_insns"] = @perf("find_insns")
 insns <<< (insn["find_insns"])
 insn["find_cycles"] = @perf("find_cycles")
 cycles <<< insn["find_cycles"]
 insn["find_branches"] = @perf("find_branches")
 branches <<< insn["find_branches"]
 insn["find_cache_refs"] = @perf("find_cache_refs")
 cacherefs <<< insn["find_cache_refs"]
 insn["find_cache_misses"] = @perf("find_cache_misses")
 cachemisses <<< insn["find_cache_misses"]
}


probe module(@2).function(@3).return 
{
    # Instructions retired during this call; reused as the divisor below.
    divisor = (@perf("find_insns") - insn["find_insns"])
    if (divisor == 0) next    # avoid a division-by-zero runtime error

    dividend = (@perf("find_cycles") - insn["find_cycles"])
    q = dividend / divisor
    if (q > 0)
        cycles_per_insn <<< q

    dividend = (@perf("find_branches") - insn["find_branches"])
    q = dividend / divisor
    if (q > 0)
        branch_per_insn <<< q

    # Use the cache-reference counter here, not find_cycles.
    dividend = (@perf("find_cache_refs") - insn["find_cache_refs"])
    q = dividend / divisor
    if (q > 0)
        cacheref_per_insn <<< q

    dividend = (@perf("find_cache_misses") - insn["find_cache_misses"])
    q = dividend / divisor
    if (q > 0)
        miss_per_insn <<< q
}

probe end
{
 if (@count(cycles_per_insn)) {
   printf ("Cycles per Insn\n\n")
   print (@hist_log(cycles_per_insn))
 }
 if (@count(branch_per_insn)) {
   printf ("\nBranches per Insn\n\n")
   print (@hist_log(branch_per_insn))
 }
 if (@count(cacheref_per_insn)) {
   printf ("Cache Refs per Insn\n\n")
   print (@hist_log(cacheref_per_insn))
 }
 if (@count(miss_per_insn)) {
   printf ("Cache Misses per Insn\n\n")
   print (@hist_log(miss_per_insn))
 }
}

1 Answer

SystemTap can't read hardware perf counter values from kernel probes, because Linux doesn't provide a suitable (e.g., atomic) internal API for safely reading those values from all contexts. The perf...process probes work only because that context is not atomic: the SystemTap probe handler can block safely.
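For contrast, here is a minimal sketch of the supported arrangement, in which both the counter and the probe that reads it belong to the same user-space process. The binary `/usr/bin/find` comes from the sample line quoted in the question; the function name `visit` and the global names are assumptions made for illustration, so substitute a binary and function for which you have debuginfo.

```systemtap
#! /usr/bin/env stap

# Counters are bound to a user-space process, as in the sample line
# from the question. The empty handlers only declare the counters.
probe perf.hw.cache_references.process("/usr/bin/find").counter("find_cache_refs") {}
probe perf.hw.cache_misses.process("/usr/bin/find").counter("find_cache_misses") {}

global refs_at_entry, misses_at_entry   # per-thread snapshots taken at entry
global refs, misses                     # per-call deltas (statistics)

# "visit" is an assumed function name; replace it with one from your binary.
probe process("/usr/bin/find").function("visit")
{
  refs_at_entry[tid()] = @perf("find_cache_refs")
  misses_at_entry[tid()] = @perf("find_cache_misses")
}

probe process("/usr/bin/find").function("visit").return
{
  # Only record a delta if the matching entry probe was seen.
  if (tid() in refs_at_entry) {
    refs <<< @perf("find_cache_refs") - refs_at_entry[tid()]
    misses <<< @perf("find_cache_misses") - misses_at_entry[tid()]
    delete refs_at_entry[tid()]
    delete misses_at_entry[tid()]
  }
}

probe end
{
  if (@count(refs)) {
    printf("calls: %d\n", @count(refs))
    printf("avg cache references per call: %d\n", @avg(refs))
    printf("avg cache misses per call: %d\n", @avg(misses))
  }
}
```

The sketch records per-call deltas rather than per-instruction ratios because SystemTap arithmetic is integer-only, so a ratio such as misses per instruction would usually truncate to zero. Swapping the `process(...).function(...)` probes for `module(...).function(...)` brings back the semantic error described above.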

I cannot answer your detailed question about the two (?) scripts you last experimented with, because they're not complete.

2 Comments

I added my complete script. Thanks for your help!
OK, with the fuller script I see the same thing. Indeed, SystemTap cannot resolve @perf() constructs to read perf counter values within module probes, for the reasons stated above.
