I need someome to help or at least any tip. I'm trying to read from large files (100mb - 11gb) line by line and then store some data into Map.
var m map[string]string
// expansive func
func stress(s string, mutex sync.Mutex) {
// some very cost operation .... that's why I want to use goroutines
mutex.Lock()
m[s] = s // store result
mutex.Unlock()
}
func main() {
file, err := os.Open("somefile.txt")
if err != nil {
fmt.Println(err)
return
}
defer func() {
if err = file.Close(); err != nil {
fmt.Println(err)
return
}
}()
scanner := bufio.NewScanner(file)
for scanner.Scan() {
go stress(scanner.Text(), mutex)
}
}
Without gouroutines it works fine but slow. As you can see, file is large so within loop there will be a lot of gouroutines. And that fact provides two problems:
- Sometimes mutex doesn't work properly. And programm crashes. (How many goroutines mutex suppose?)
- Everytime some data just lost (But programm doesn't crash)
I suppose I should use WaitGroup, but I cannot understand how it should be. Also I guess there should be some limit for goroutines, maybe some counter. It would be great to run it in 5-20 goroutines.
UPD. Yes, As @user229044 mentioned, I have to pass mutex by pointer. But the problem with limiting goroutines within loop still active.
UPD2. This is how I workaround this problem. I don't exactly understand which way program handle these goroutines and how memory and process time go. Also almost all commentors point on Map structure, but the main problem was to handle runtime of goroutines. How many goroutines spawn if it would be 10billions iterations of Scan() loop, and how goroutines store in RAM?
func stress(s string, mutex *sync.Mutex) {
// a lot of coslty ops
// ...
// ...
mutex.Lock()
m[where] = result // store result
mutex.Unlock()
wg.Done()
}
// main
for scanner.Scan() {
wg.Add(1)
go func(data string) {
stress(data, &mutex)
}(scanner.Text())
}
wg.Wait()