
I am profiling some code that uses the cppgraphqlgen library, which relies extensively on C++20 coroutines in its internals.

I profiled an application and found that some called methods have a higher hit count than the parents that call them.

[Image: coroutine call graph]

I searched for "clone .actor" in connection with profiling and found nothing useful.

For the classic synchronous code elsewhere this is easy to check: a child's cost is always <= its parent's cost. In the coroutine code that does not hold.

What is clone .actor in this context, and why do the "children" cost more than their parents in this case? Is there any way to tell what this operation is actually doing?

For context, this is how I gathered my profiling data:

  1. Get a profiling dump by running
     LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libprofiler.so.0 CPUPROFILE=./prof.out ./my-program
  2. Run that through Google Perftools (gperftools) to make a callgrind-style file
     /usr/bin/google-pprof --callgrind "$(realpath ./my-program)"  ./prof.out > ./callgrind.out
  3. Open that dump using kcachegrind

1 Answer

  1. Please (strongly) consider switching to the much better and more capable Go implementation of pprof (github.com/google/pprof). Sadly, distros continue to ship our old Perl implementation, but the upcoming 2.17 release already has that Perl pprof removed. Hopefully that will encourage distros to switch.

  2. The "clone" suffix is an artifact of optimization and demangling. Compilers sometimes create optimized copies of certain functions (e.g. by constant-propagating some arguments). pprof is supposed to remove this detail from the function name. (Sometimes you do want to see those details and more, such as template arguments; see the --symbolize option for that.)

  3. If you see mixed-up parent/child relations, check your stack trace capturing method. Skipping the "first parent" stack frame is a known issue with frame-pointer-based stack trace capturing. See here: https://github.com/gperftools/gperftools/wiki/gperftools'-stacktrace-capturing-methods-and-their-issues#frame-pointers
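To illustrate point 3, here is a minimal sketch of how a skipped "first parent" frame can make a child look more expensive than its caller. It is a toy model, not gperftools code: `inclusive_counts` and `drop_first_parent` are hypothetical names, and the samples are made up. Each sample is one captured stack, and a function's inclusive cost is the number of samples it appears in.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// One sample = one captured call stack, outermost caller first.
using Stack = std::vector<std::string>;

// Inclusive cost: the number of samples in which a function appears
// anywhere on the stack (counted once per sample, even if it recurses).
std::map<std::string, int> inclusive_counts(const std::vector<Stack>& samples) {
    std::map<std::string, int> counts;
    for (const Stack& stack : samples) {
        const std::set<std::string> unique(stack.begin(), stack.end());
        for (const std::string& fn : unique) {
            ++counts[fn];
        }
    }
    return counts;
}

// Assumption for illustration: models an unwinder that loses the
// immediate caller of the leaf frame.
Stack drop_first_parent(Stack stack) {
    if (stack.size() >= 2) {
        stack.erase(stack.end() - 2);  // remove the second-to-last frame
    }
    return stack;
}
```

If four samples of `main -> parent -> child` are taken and the unwinder drops the caller in three of them, `child` is still counted four times while `parent` is counted only once, which is the "children cost more than parents" pattern from the question.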

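For point 2, the kind of clean-up meant by "remove this detail from the function name" can be sketched as below. This is not pprof's actual implementation; `strip_clone_suffix` is a hypothetical helper, and it assumes the GNU-style demangled form in which a clone is shown as a trailing ` [clone .something]` annotation.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: drops trailing " [clone .xyz]" annotations from a
// demangled name. Names without such a suffix are returned unchanged.
std::string strip_clone_suffix(std::string_view name) {
    constexpr std::string_view marker = " [clone .";
    while (!name.empty() && name.back() == ']') {
        const auto pos = name.rfind(marker);
        if (pos == std::string_view::npos) {
            break;  // ends in ']' for another reason, e.g. operator[]
        }
        // The annotation must run to the end of the name.
        if (name.find(']', pos) != name.size() - 1) {
            break;
        }
        name = name.substr(0, pos);
    }
    return std::string(name);
}
```

With this, `strip_clone_suffix("resolve() [clone .actor]")` yields `resolve()`, so the clone and the original function are grouped under one name in a report.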

2 Comments

I ran this and got different results. They are still not clear-cut, but the profiling time now shows up much closer to where I would expect it, so I'll call that a good answer. That said, the documentation for the newer pprof is focused far more on profiling Go code; I would love to see more docs on how the Go version is meant to be used for profiling C/C++. I ended up still using the libprofiler from my distro's version.
So let's not conflate things. libprofiler is still there, and it still produces the actual CPU profile file. To analyze and visualize those profiles you need the new pprof tool from github.com/google/pprof (yes, it is written in Go, and the previous tool was written in Perl). And yes, Go includes a version of that tool, but that is unrelated to the matter here.
