I have a pretty basic benchmark comparing the performance of a mutex vs. an atomic:
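```go
package counter

import (
	"sync"
	"sync/atomic"
	"testing"
)

const (
	numCalls = 1000
)

var (
	// wg is shared by both sub-benchmarks; each one waits for all of its
	// goroutines before returning.
	wg sync.WaitGroup
)

func BenchmarkCounter(b *testing.B) {
	// Declared once per call of BenchmarkCounter and captured by the goroutines.
	var counterLock sync.Mutex
	var counter int
	var atomicCounter atomic.Int64

	b.Run("mutex", func(b *testing.B) {
		wg.Add(b.N)
		// Each of the b.N iterations starts one goroutine doing numCalls increments.
		for i := 0; i < b.N; i++ {
			go func(wg *sync.WaitGroup) {
				for i := 0; i < numCalls; i++ {
					counterLock.Lock()
					counter++
					counterLock.Unlock()
				}
				wg.Done()
			}(&wg)
		}
		wg.Wait()
	})

	b.Run("atomic", func(b *testing.B) {
		wg.Add(b.N)
		for i := 0; i < b.N; i++ {
			go func(wg *sync.WaitGroup) {
				for i := 0; i < numCalls; i++ {
					atomicCounter.Add(1)
				}
				wg.Done()
			}(&wg)
		}
		wg.Wait()
	})
}
```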
Typical output of `go test -bench . -benchmem` looks as follows:
```
BenchmarkCounter/mutex-8      7680    188508 ns/op    618 B/op    3 allocs/op
BenchmarkCounter/atomic-8    38649     31006 ns/op     40 B/op    2 allocs/op
```
Running escape analysis with `go test -gcflags '-m'` shows that one allocation in each benchmark iteration (op) comes from starting the goroutine:
```
./counter_test.go:57:17: func literal escapes to heap
./counter_test.go:60:7: func literal escapes to heap
./counter_test.go:72:18: func literal escapes to heap
./counter_test.go:75:7: func literal escapes to heap
```
(lines 57 and 72 are the `b.Run()` calls, and lines 60 and 75 are the `go func()` calls, so exactly one goroutine is started in each of the b.N iterations)
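To measure the per-goroutine cost in isolation, `testing.AllocsPerRun` can count the heap allocations made by launching a single goroutine and waiting for it. This is a simplified sketch of one b.N iteration (one increment instead of `numCalls`, and a local `WaitGroup` instead of the shared `wg`); `goroutineAllocs` is a hypothetical helper, and the exact count it reports may vary between Go versions.

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// counter lives at package level, so the goroutine does not capture it.
var counter int

// goroutineAllocs reports the average number of heap allocations made by
// starting one goroutine (passing &wg as an argument) and waiting for it.
func goroutineAllocs() float64 {
	var mu sync.Mutex // captured by the goroutine, allocated once, outside the measurement
	return testing.AllocsPerRun(1000, func() {
		var wg sync.WaitGroup // &wg is handed to a goroutine, so it escapes each run
		wg.Add(1)
		go func(wg *sync.WaitGroup) {
			mu.Lock()
			counter++
			mu.Unlock()
			wg.Done()
		}(&wg)
		wg.Wait()
	})
}

func main() {
	fmt.Printf("allocs per goroutine launch: %.0f\n", goroutineAllocs())
}
```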
The same analysis shows that the variables declared at the beginning of the benchmark function are also moved to the heap:
```
./counter_test.go:21:6: moved to heap: counterLock
./counter_test.go:22:6: moved to heap: counter
./counter_test.go:23:6: moved to heap: atomicCounter
```
I'm fine with that. What really bothers me is that I expect allocs/op to measure memory allocations per iteration (b.N iterations in total). So, for example, one allocation of `counterLock` divided by b.N iterations (7680 in the benchmark output above) should contribute 1/7680 ≈ 0 allocs/op after rounding the result to an integer. The same should apply to `counter` and `atomicCounter`.
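The expectation above can be checked outside of `go test` with `testing.Benchmark`, which returns a `testing.BenchmarkResult`; its `AllocsPerOp()` method divides `MemAllocs` by `N` using integer division (truncation rather than rounding, which gives 0 here either way). A minimal sketch with two hypothetical benchmark functions: `allocOnce` allocates once before the `b.N` loop, like the function-scope variables in the question, while `allocEachIter` allocates in every iteration.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is package-level, so storing into it forces a heap allocation.
var sink []byte

// allocOnce allocates once per call of the benchmark function, before the loop.
func allocOnce(b *testing.B) {
	sink = make([]byte, 64)
	for i := 0; i < b.N; i++ {
		sink[i%len(sink)]++ // no allocation inside the loop
	}
}

// allocEachIter allocates once in every one of the b.N iterations.
func allocEachIter(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = make([]byte, 64)
	}
}

func main() {
	benchmarks := []struct {
		name string
		fn   func(*testing.B)
	}{
		{"allocOnce", allocOnce},
		{"allocEachIter", allocEachIter},
	}
	for _, bm := range benchmarks {
		r := testing.Benchmark(bm.fn)
		// AllocsPerOp() is MemAllocs / N (integer division).
		fmt.Printf("%s: N=%d MemAllocs=%d allocs/op=%d\n",
			bm.name, r.N, r.MemAllocs, r.AllocsPerOp())
	}
}
```

On a typical run, `allocOnce` should report 0 allocs/op and `allocEachIter` 1 allocs/op, because the single up-front allocation is amortized over a very large `N`.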
However, this is not the case: I get 3 allocations instead of just 1 for the "mutex" benchmark (1 goroutine + `counterLock` + `counter`) and 2 for "atomic" (1 goroutine + `atomicCounter`). It therefore seems that the benchmarking logic treats the function-scope variables (`counterLock`, `counter`, `atomicCounter`) as being allocated anew in each of the b.N iterations, not just once at the beginning of `BenchmarkCounter()`. Is this logic correct? Am I missing something here?
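One way to see how often code outside the `b.N` loop actually runs is to record `b.N` on every call of a benchmark function. This sketch uses a hypothetical `recordN` function with `testing.Benchmark`; the exact sequence of `b.N` values depends on timing, but the function is typically called only a handful of times, the first call has `b.N == 1`, and the returned result describes only the final call.

```go
package main

import (
	"fmt"
	"testing"
)

// observedN collects b.N from every call of recordN.
var observedN []int

// recordN runs its first statement once per call of the benchmark
// function, not once per iteration of the b.N loop.
func recordN(b *testing.B) {
	observedN = append(observedN, b.N)
	x := 0
	for i := 0; i < b.N; i++ {
		x += i
	}
	_ = x
}

func main() {
	r := testing.Benchmark(recordN)
	fmt.Println("calls:", len(observedN))
	fmt.Println("b.N per call:", observedN)
	fmt.Println("N in result:", r.N)
}
```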
EDIT: Running the benchmark with `-memprofile` and investigating the profile with `pprof` shows that only the `go func()` calls cause allocations. There is nothing about the function-scope variables, and nothing suggests where these 3 allocs/op and 2 allocs/op may come from.
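Memory profiles are sampled according to `runtime.MemProfileRate`, which by default records roughly one allocation per 512 KiB allocated, so a few small one-off allocations can easily be missing from the profile. A minimal sketch, assuming you want every allocation recorded: set `runtime.MemProfileRate = 1` at the start of the program (with `go test`, the `-memprofilerate=1` flag does the same) and write the `allocs` profile via `runtime/pprof`. `allocate` and `allocsProfile` are hypothetical helpers for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
)

// sink keeps the allocations below reachable.
var sink []*int64

// allocate performs n small heap allocations.
func allocate(n int) {
	for i := 0; i < n; i++ {
		sink = append(sink, new(int64))
	}
}

// allocsProfile runs work and returns the "allocs" profile in pprof's
// (gzipped protobuf) format.
func allocsProfile(work func()) (*bytes.Buffer, error) {
	work()
	runtime.GC() // run a GC so recent allocations are reflected in the profile
	var buf bytes.Buffer
	if err := pprof.Lookup("allocs").WriteTo(&buf, 0); err != nil {
		return nil, err
	}
	return &buf, nil
}

func main() {
	// Record every allocation instead of sampling; set it as early as possible.
	runtime.MemProfileRate = 1

	buf, err := allocsProfile(func() { allocate(3) })
	if err != nil {
		panic(err)
	}
	fmt.Println("profile size in bytes:", buf.Len())
}
```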