Check out the runtime/pprof package.
To print "stack traces of all current goroutines" use:
pprof.Lookup("goroutine").WriteTo(os.Stdout, 1)
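To print "stack traces that led to blocking on synchronization primitives" (this profile stays empty unless block profiling has been enabled with runtime.SetBlockProfileRate) use: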
pprof.Lookup("block").WriteTo(os.Stdout, 1)
You can combine these with the functions in the runtime package such as runtime.NumGoroutine to get some basic reporting.
The following example deliberately creates many goroutines that block on a shared mutex and waits for them to complete. Every 5 seconds it prints the block profile, along with the number of goroutines still in existence:
package main

import (
	"fmt"
	"math/rand"
	"os"
	"runtime"
	"runtime/pprof"
	"sync"
	"time"
)

var (
	wg sync.WaitGroup
	m  sync.Mutex
)

// randWait holds the shared mutex for a random 1-499 ms, so the other
// goroutines block on m.Lock and show up in the block profile.
func randWait() {
	defer wg.Done()
	m.Lock()
	defer m.Unlock()
	time.Sleep(time.Duration(rand.Intn(499)+1) * time.Millisecond)
}

// blockStats prints the block profile and the goroutine count every 5 seconds.
func blockStats() {
	for {
		pprof.Lookup("block").WriteTo(os.Stdout, 1)
		fmt.Println("# Goroutines:", runtime.NumGoroutine())
		time.Sleep(5 * time.Second)
	}
}

func main() {
	rand.Seed(time.Now().Unix())   // not needed (and deprecated) since Go 1.20
	runtime.SetBlockProfileRate(1) // record every blocking event
	fmt.Println("Running...")
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go randWait()
	}
	go blockStats()
	wg.Wait()
	fmt.Println("Finished.")
}
I'm not sure if that's what you're after, but you may be able to modify it to suit your needs.
runtime.GOMAXPROCS(runtime.NumCPU()), needn't create much additional context-switch overhead. We might be able to help more with additional information about the workload: are your goroutines mostly spinning the CPU, waiting on a DB, blocked on channel ops, or something else?