2

My goal is to profile the execution time of each function in a kernel module. Using the sample scripts I saw online, I came up with the following script to fulfill my need. But occasionally I get negative values for calculated latencies. Although, they happen rarely but I guess that indicates something is wrong with my script. Can anyone help me with that please?

probe module(@1).function(@2).call { 
     begin = gettimeofday_ns()
}

probe module(@1).function(@2).return {
  if (begin>0)
     stats <<< gettimeofday_ns() - begin
}

probe end {
    if (begin == 0) {
        printf("No samples observed so far.\n");

    } else {
        printf("Distribution of %s latencies (in nanoseconds) for %d samples\n", @2, @count(stats))
        printf("max/avg/min: %d/%d/%d\n", @max(stats), @avg(stats), @min(stats))
        print(@hist_log(stats))
    }
}


global begin, stats

1 Answer 1

1

The gettimeofday_*() functions can only approximate wallclock time. It is possible that across CPUs, or across a time adjustment moment, the values won't move monotonically the way you expect. get_cycles() is more monotonic on a given CPU, and a few other clock-related functions are available.

Also, your begin variable is a simple scalar. What if the same function is being called from multiple threads/cpus, or if recursion occurs? It'll get overwritten. This should be enough (and work correctly, from a nesting/concurrency point of view):

// no probe FOO.call
probe module(@1).function(@2).return {
  stats <<< gettimeofday_ns() - @entry(gettimeofday_ns())
}
Sign up to request clarification or add additional context in comments.

2 Comments

Thanks, that worked! I was suspecting that the problem is with "begin" variable being scalar, but since this is my experience with SystemTap, had no clue what to do. Anyways, just out of curiosity: If I wanted to keep the prob FOO.call and use a begin variable as before (basically not using @entry), how could I solve the scalar variable problem?
You could make it an array, and index it with tid(). (Other complications could arise, such as the function being recursive, in which case you need to index also with a nesting level; or the kernel might miss a .call or .return probe for whatever reason, in which case the script must tolerate the bad state.)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.