Golang Syncing Goroutines

Question

I am working on an SSH client application for configuring network devices concurrently, and I am running into issues implementing the concurrency. My program takes in a slice of hosts and a slice of config commands to send to each host. I am using a sync.WaitGroup to wait for all the hosts to finish being configured. This works fine for small batches of hosts, but soon the functions within my configuration goroutines start randomly failing. If I rerun the program on the failed hosts, some will succeed and again some will fail. I have to repeat this process until only the hosts with actual errors remain. It always fails either on the authentication saying authentication failed: auth methods tried [none password]... or the values from sysDescr don't get added to some Devices fields. It's as if when there are many hosts and goroutines running, they start returning early or something. I'm really not sure what is going on.

Here is a sample of my code:

package main

import (
    "fmt"
    "net"
    "os"
    "sync"
    "time"

    "golang.org/x/crypto/ssh"
)

func main() {
    // When there are many hosts, many calls to Dial and
    // sysDescr fail. If I rerun the program on the unsuccessful
    // hosts, nothing fails and the expected output is produced.
    var hosts []string
    cfg := &ssh.ClientConfig{
        User:            "user",
        Auth:            []ssh.AuthMethod{ssh.Password("pass")},
        HostKeyCallback: ssh.InsecureIgnoreHostKey(),
        Timeout:         10 * time.Second,
    }
    results := make(chan *result, len(hosts))

    var wg sync.WaitGroup
    wg.Add(len(hosts))
    for _, host := range hosts {
        go connect(host, cfg, results, &wg)
    }
    wg.Wait()
    close(results)

    for res := range results {
        if res.err != nil {
            fmt.Fprintln(os.Stderr, res.Err)
            continue
        }
        d := res.device
        fmt.Println(d.addr, d.hostname, d.vendor, d.os, d.model, d.version)
        d.Client.Close()
    }
}

// Device represents a network device.
type Device struct {
    *ssh.Client
    addr     string
    hostname string
    vendor   string
    os       string
    model    string
    version  string
}

// Dial establishes an ssh client connection to a remote host.
func Dial(host, port string, cfg *ssh.ClientConfig) (*Device, error) {
    // get host info in background, may take a second
    info := make(chan map[string]string)
    go func(c *Client) {
        info <- sysDescr(host)
        close(info)
    }(c)

    // establish ssh client connection to host
    client, err := ssh.Dial("tcp", net.JoinHostPort(host, addr), cfg)
    if err != nil {
        return nil, err
    }

    m := <-info
    d := &Device{
        Client: client,
        addr: m["addr"],
        hostname: m["hostname"],
        vendor: m["vendor"],
        os: m["os"],
        model: m["model"],
        version: m["version"],
    }
    return d, nil
}

// sysDescr attempts to gather information about a remote host.
func sysDescr(host string) map[string]string {
    // get host info
}

// result is the result of connecting to a device.
type result struct {
    device *Device
    err    error
}

// connect establishes an ssh client connection to a host.
func connect(host string, cfg *ssh.ClientConfig, results chan<- *result, wg *sync.WaitGroup) {
    defer wg.Done()
    device, err := Dial(host, "22", cfg)
    results <- &result{device, err}
}

Am I doing something wrong? Can I limit the number of goroutines being spawned instead of spawning a goroutine for each host?

Where does your code fail? And what does sysDescr function do? And cfg is accessed by multiple goroutines, I wonder adding a mutex to protect it helps. Just an idea. — NeoWang
– NeoWang, Commented Jul 29, 2018 at 16:01
@NeoWang it fails on ssh.Dial within my Dial function with the error referencing an authentication error as the reason for failure. So, I think you’re right about the cfg access. Is there another way than using mutexes? Maybe copying the cfg for each call to connect? — mwalto7
– mwalto7, Commented Jul 29, 2018 at 16:05
@NeoWang also, sysDescr uses SNMP to poll a network device’s sysDescr.0 OID to gather information about the device. — mwalto7
– mwalto7, Commented Jul 29, 2018 at 16:08
Reusing the config for multiple concurrent connections is almost certainly the issue. — Adrian
– Adrian, Commented Jul 30, 2018 at 13:24

dm03514 · Accepted Answer · 2018-07-29 00:20:24Z

1

Yes! To answer your second question, there are many patterns in go to limit concurrency. Two of the largest are:

I personally prefer the worker pool variant because IMO it better helps to keep the business logic separate from the scheduling, but both are completely valid and are present out in the wild in many many projects.

answered Jul 29, 2018 at 0:20

dm03514

56.2k18 gold badges117 silver badges147 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

mwalto7 Over a year ago

Thanks for the examples! I will look into them further.

mwalto7 Over a year ago

For the worker pool method, is there an easy way to find the optimal amount of workers and jobs to create? Also, in the worker pool example you gave, can you explain why they initialize the jobs and results channels with capacity 100 if they only create 3 workers and 5 jobs? Wouldn’t both of the channels only need capacity 5, or capacity len(jobs)?

Collectives™ on Stack Overflow

Golang Syncing Goroutines

1 Answer 1

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related