
So, I have a piece of concurrent code that is meant to run on each CPU/core.

There are two large vectors with input/output values:

var (
    input = make([]float64, rowCount)
    output = make([]float64, rowCount)
)

These are filled, and I want to compute the distance (error) between each input-output pair. Since the pairs are independent, a possible concurrent version is the following:

var d float64 // Error to be computed
// Setup a worker "for each CPU"
ch := make(chan float64)
nw := runtime.NumCPU()
for w := 0; w < nw; w++ {
    go func(id int) {
        var wd float64
        // e.g. nw = 4
        // worker0, i = 0, 4, 8, 12...
        // worker1, i = 1, 5, 9, 13...
        // worker2, i = 2, 6, 10, 14...
        // worker3, i = 3, 7, 11, 15...
        for i := id; i < rowCount; i += nw {
            res := compute(input[i])
            wd += distance(res, output[i])
        }
        ch <- wd
    }(w)
}
// Compute total distance
for w := 0; w < nw; w++ {
    d += <-ch
}

The idea is to have a single worker for each CPU/core, and each worker processes a subset of the rows.

The problem I'm having is that this code is no faster than the serial code.
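To check this claim in isolation, the two versions can be timed side by side. The following is only a sketch: compute here is a hypothetical stand-in (the real function is not shown), and the data size and fill values are arbitrary assumptions.

```go
package main

import (
	"fmt"
	"math"
	"runtime"
	"time"
)

// compute is a hypothetical stand-in for the real function; it only mimics
// the use of math.Pow and math.Sqrt mentioned in the question.
func compute(x float64) float64 { return math.Sqrt(math.Pow(x, 1.5) + 1) }

// distance is the squared difference described in the question.
func distance(a, b float64) float64 { return (a - b) * (a - b) }

// serial sums the distances in a single goroutine.
func serial(input, output []float64) float64 {
	var d float64
	for i := range input {
		d += distance(compute(input[i]), output[i])
	}
	return d
}

// concurrent uses nw workers, each striding over the rows as in the question.
func concurrent(input, output []float64, nw int) float64 {
	ch := make(chan float64)
	for w := 0; w < nw; w++ {
		go func(id int) {
			var wd float64
			for i := id; i < len(input); i += nw {
				wd += distance(compute(input[i]), output[i])
			}
			ch <- wd
		}(w)
	}
	var d float64
	for w := 0; w < nw; w++ {
		d += <-ch
	}
	return d
}

func main() {
	const rowCount = 1 << 22 // arbitrary size chosen for the experiment
	input := make([]float64, rowCount)
	output := make([]float64, rowCount)
	for i := range input {
		input[i] = float64(i % 1000)
		output[i] = float64(i % 7)
	}

	start := time.Now()
	s := serial(input, output)
	serialTime := time.Since(start)

	start = time.Now()
	c := concurrent(input, output, runtime.NumCPU())
	concurrentTime := time.Since(start)

	fmt.Printf("serial:     %v (d=%g)\n", serialTime, s)
	fmt.Printf("concurrent: %v (d=%g)\n", concurrentTime, c)
}
```

The two totals can differ in the last digits because floating-point additions happen in a different order. If the concurrent version is clearly faster here but the whole program is not, the bottleneck is probably somewhere else.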

Now, I'm using Go 1.7, so runtime.GOMAXPROCS should already be set to runtime.NumCPU(), but even setting it explicitly does not improve performance.
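A quick way to confirm that assumption is to print both values at startup. This is a minimal sketch; schedulerInfo is a hypothetical helper name, while runtime.NumCPU and runtime.GOMAXPROCS are the standard library calls.

```go
package main

import (
	"fmt"
	"runtime"
)

// schedulerInfo reports how many logical CPUs the process sees and how many
// OS threads may execute Go code at the same time.
func schedulerInfo() (cpus, procs int) {
	cpus = runtime.NumCPU()
	// An argument below 1 leaves the setting unchanged and returns the current value.
	procs = runtime.GOMAXPROCS(0)
	return cpus, procs
}

func main() {
	cpus, procs := schedulerInfo()
	fmt.Printf("NumCPU=%d GOMAXPROCS=%d\n", cpus, procs)
}
```

If GOMAXPROCS comes out lower than NumCPU, the GOMAXPROCS environment variable may be overriding the default.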

  • distance is just (a-b)*(a-b);
  • compute is a bit more complex, but should be reentrant and use global data only for reading (and uses math.Pow and math.Sqrt functions);
  • no other goroutine is running.

So, besides accessing the global data (input/output) for reading, there are no locks/mutexes that I am aware of (not using math/rand, for example).

I also compiled with -race and nothing emerged.

My host has 4 virtual cores, but when I run this code htop shows CPU usage of about 102%. I expected something around 380%, as happened in the past with other Go code that used all the cores.

I would like to investigate, but I don't know how the runtime allocates threads and schedules goroutines.

How can I debug this kind of issue? Can pprof help me in this case? What about the runtime package?

Thanks in advance

  • Actually, you have one hidden mutex behind the ch channel. Commented Feb 21, 2017 at 20:42
  • Yes, thanks! But the mutex is used only nw times, which is usually a low number when compared to the data to be processed, and the channel is used very late in the whole computation process. I don't know if that is the issue, but even if it was, my question remains: how do I know that it is because of that mutex that my code is not using more CPUs? Commented Feb 21, 2017 at 20:58
  • You are right. I just tried a sample similar to yours and it uses all available cores. I think a full code example would be more useful. Commented Feb 21, 2017 at 21:02
  • Speaking of locks, you can use the go test -mutexprofile flag to profile lock contention. Commented Feb 21, 2017 at 21:05
  • pprof also has a blocking profile, which shows time spent blocking on synchronization primitives in general. If you're creating too much garbage, it's possible you're limited by the garbage collector. You can see the GC activity using GODEBUG=gctrace=1, but I suggest you start by reading up on profiling Go programs. Commented Feb 21, 2017 at 22:50

1 Answer


Sorry, but in the end I got the measurement wrong. @JimB was right, and I had a minor leak, but not enough to justify a slowdown of this magnitude.

My expectations were too high: the function I was making concurrent was called only at the beginning of the program, so the overall performance improvement was minor.

After applying the pattern to other sections of the program, I got the expected results. My mistake was in evaluating which section was the most important.
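This is what Amdahl's law predicts: if only a fraction p of the total run time is spread over n workers, the overall speedup is 1/((1-p) + p/n). The sketch below just evaluates that formula; the fractions are hypothetical figures, since the actual share of run time was not measured here.

```go
package main

import "fmt"

// amdahlSpeedup returns the overall speedup predicted by Amdahl's law when a
// fraction p (between 0 and 1) of the run time is spread over n workers.
func amdahlSpeedup(p float64, n int) float64 {
	return 1 / ((1 - p) + p/float64(n))
}

func main() {
	// Hypothetical fractions; the real program's share was not reported.
	for _, p := range []float64{0.1, 0.5, 0.9} {
		fmt.Printf("parallel fraction %.0f%%, 4 workers: %.2fx overall\n",
			p*100, amdahlSpeedup(p, 4))
	}
}
```

With a small parallel fraction, the result stays close to 1x no matter how many cores are busy during that part, which matches a section that only runs at startup.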

Anyway, I learned a lot of interesting things meanwhile, so thanks a lot to all the people trying to help!
