I am profiling some code that uses the cppgraphqlgen library, which uses C++20 coroutines extensively in its internals.
I have profiled an application and found that some called functions have a higher hit count than the functions that call them.
I have searched for "clone .actor" in connection with profiling and found nothing useful.
For the classic synchronous code elsewhere in the profile this is easy to check: a child's cost is always <= its parent's cost. That does not hold for the coroutine code.
What is clone .actor in this context, and why do the "children" cost more than their parents in this case? Is there any way to tell what this operation is actually doing?
For context, this is how I gathered my profiling data:
- Get a profiling dump by running:
    LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libprofiler.so.0 CPUPROFILE=./prof.out ./my-program
- Run that through Google Perftools (gperftools) to make a callgrind-style file:
    /usr/bin/google-pprof --callgrind "$(realpath ./my-program)" ./prof.out > ./callgrind.out
- Open the resulting file using:
    kcachegrind
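To make this easier to reproduce, the first two steps can be wrapped in a small shell function. This is only a rough sketch: the function name profile_to_callgrind is made up for illustration, and the library and google-pprof paths are simply the ones from the steps above, so they may differ on another system.

```shell
# Hypothetical helper: profile one binary and convert the result for kcachegrind.
# The paths below are the ones used in the steps above; adjust them as needed.
profile_to_callgrind() {
    # Refuse anything that is not an executable regular file.
    if [ ! -f "$1" ] || [ ! -x "$1" ]; then
        echo "not an executable file: $1" >&2
        return 1
    fi
    prog=$(realpath "$1") || return 1

    # Step 1: run the program with the gperftools CPU profiler preloaded.
    LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libprofiler.so.0 \
        CPUPROFILE=./prof.out "$prog" || return 1

    # Step 2: convert the raw profile into a callgrind-style file.
    /usr/bin/google-pprof --callgrind "$prog" ./prof.out > ./callgrind.out
}
```

It would be used like this, with kcachegrind given the generated file directly:

    profile_to_callgrind ./my-program && kcachegrind ./callgrind.out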
