C++ Profiling - Called method from coroutine function has a higher hit count than its caller

Question

I am profiling some code that uses the cppgraphqlgen library, which relies heavily on C++20 coroutines in its internals.

I have profiled an application and found that some called methods have a higher hit count than the functions that call them.

I have searched for clone .actor with reference to profiling and found nothing useful.

For the classic synchronous code elsewhere in the program this is easy to check: a child's cost is always <= its parent's cost. That does not hold for the coroutine code.

What is clone .actor in this context, and why do the "children" cost more than their parents in this case? Is there any way to tell what this operation is actually doing?

For context, this is how I gathered my profiling data:

Get a profiling dump by running

LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libprofiler.so.0 CPUPROFILE=./prof.out ./my-program

Run that through Google Perftools (gperftools) to make a callgrind style file

/usr/bin/google-pprof --callgrind "$(realpath ./my-program)"  ./prof.out > ./callgrind.out

Open that dump using kcachegrind

Aliaksei Kandratsenka · Accepted Answer · 2025-02-04 13:37:13Z


Please (strongly) consider switching to the much better and more capable Go implementation of pprof (github.com/google/pprof). Sadly, distros continue to ship our old Perl implementation, but the upcoming 2.17 release has already had that Perl pprof removed, so hopefully that will encourage distros some more.
The .clone thing is an artifact of optimization and demangling. Compilers sometimes create optimized copies of certain functions (e.g. by constant-propagating some things). pprof is supposed to strip this detail from the function name. (Sometimes you do want to see those details and more, such as template arguments; see the --symbolize option for that.)
If or when you see mixed-up parent/child relations, consider checking your stack trace capturing method. Skipping the "first parent" stack frame is a known issue with frame-pointer-based stack trace capturing. See here: https://github.com/gperftools/gperftools/wiki/gperftools'-stacktrace-capturing-methods-and-their-issues#frame-pointers


2 Comments

Andrew Lipscomb Feb 23 at 23:04

I ran this and got different results. They are still not clear-cut, but the profiling time now shows up much closer to where I would expect it to be, so I'll call that a good answer. That said, the documentation for the newer pprof is far more focused on profiling Go code; I would love to see more docs on how the Go version is meant to be used for profiling C/C++. I ended up still using the libprofiler from my distro's version.

Aliaksei Kandratsenka Feb 25 at 3:43

So let's not conflate things. libprofiler is still there and it still produces the actual CPU profile file. To analyze and visualize those profiles you need the new pprof tool from github.com/google/pprof (yes, it is written in Go, and the previous tool was written in Perl). And yes, Go includes a version of that tool, but that is unrelated to the matter here.
