Synchronize write to file from heavy operations in different threads - multithreading

I need to elaborate a file (potentially a big file) one block at a time and write the result to a new file.
To put it simply, I have the basic function to elaborate a block:
func elaborateBlock(block []byte) []byte { ... }
Every block needs to be elaborated and then written to the output file sequentially (preserving original order).
The one-thread implementation is trivial:
for {
buffer := make([]byte, BlockSize)
_, err := inputFile.Read(buffer)
if err == io.EOF {
break
}
processedData := elaborateBlock(buffer)
outputFile.Write(processedData)
}
But the elaboration can be heavy and every block can be processed separately, so a multi-threaded implementation is the natural evolution.
The solution I came up with is to create an array of channels, compute every block in a different thread and sync the final write by looping the channel array:
Utility function:
func blockThread(channel chan []byte, block []byte) {
channel <- elaborateBlock(block)
}
In the main program:
chans := []chan []byte{}
for {
buffer := make([]byte, BlockSize)
_, err := inputFile.Read(buffer)
if err == io.EOF {
break
}
channel := make(chan []byte)
chans = append(chans, channel)
go blockThread(channel, buffer)
}
for i := range chans {
data := <- chans[i]
outputFile.Write(data)
}
This approach works but can be problematic with large files, because it requires loading the whole file into memory before it starts writing the output.
Is there a better solution, ideally with better overall performance?

If blocks do need to be written out in order
If you want to work on multiple blocks concurrently, obviously you need to hold multiple blocks in memory at the same time.
You can decide how many blocks you want to process concurrently, and it is enough to hold that many in memory at the same time. For example, you might process 5 blocks concurrently. This limits memory usage while still potentially using your CPU resources to the max. It is best to pick a number based on your available CPU cores (if processing a block does not already use multiple cores). This can be queried using runtime.GOMAXPROCS(0).
You should have a single goroutine that reads the input file sequentially and produces the blocks wrapped in Jobs (which also contain the block index).
You should have multiple worker goroutines, preferably as many as you have cores (but experiment with smaller and larger values too). Each worker goroutine just receives jobs, calls elaborateBlock() on the data, and delivers the result on the results channel.
There should be a single, designated consumer which receives completed jobs and writes them in order to the output file. Since goroutines run concurrently and we have no control over the order in which blocks are completed, the consumer should keep track of the index of the next block to be written to the output. Blocks arriving out of order should only be stored; writing proceeds only once the next block in sequence has arrived.
This is an (incomplete) example of how to do all this:
const BlockSize = 1 << 20 // 1 MB
func elaborateBlock(in []byte) []byte { return in }
type Job struct {
Index int
Block []byte
}
func producer(jobsCh chan<- *Job) {
// Init input file:
var inputFile *os.File
for index := 0; ; index++ {
job := &Job{
Index: index,
Block: make([]byte, BlockSize),
}
n, err := inputFile.Read(job.Block)
if n > 0 {
job.Block = job.Block[:n] // the last block may be shorter than BlockSize
jobsCh <- job
}
if err != nil {
break
}
}
}
func worker(jobsCh <-chan *Job, resultCh chan<- *Job) {
for job := range jobsCh {
job.Block = elaborateBlock(job.Block)
resultCh <- job
}
}
func consumer(resultCh <-chan *Job) {
// Init output file:
var outputFile *os.File
nextIdx := 0
jobMap := map[int]*Job{}
for job := range resultCh {
jobMap[job.Index] = job
// Write out all blocks we have in contiguous index range:
for {
j := jobMap[nextIdx]
if j == nil {
break
}
if _, err := outputFile.Write(j.Block); err != nil {
// handle error, maybe terminate?
}
delete(jobMap, nextIdx) // This job is written out
nextIdx++
}
}
}
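func main() {
jobsCh := make(chan *Job)
resultCh := make(chan *Job)
workers := sync.WaitGroup{}
for i := 0; i < 5; i++ {
workers.Add(1)
go func() {
defer workers.Done()
worker(jobsCh, resultCh)
}()
}
wg := sync.WaitGroup{}
wg.Add(1)
go func() {
defer wg.Done()
consumer(resultCh)
}()
// Start producing jobs:
producer(jobsCh)
// No more jobs:
close(jobsCh)
// Wait for the workers to finish the remaining jobs, then tell the
// consumer that no more results will arrive:
workers.Wait()
close(resultCh)
// Wait for consumer to complete:
wg.Wait()
}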
One thing to note here: this alone won't guarantee limiting the used memory. Imagine a case where the first block would require an enormous time to calculate, while subsequent blocks do not. What would happen? The first block would occupy a worker, and the other workers would "quickly" complete the subsequent blocks. The consumer would store all in memory, waiting for the first block to complete (as that has to be written out first). This could increase memory usage.
How could we avoid this?
By introducing a job pool. New jobs could not be created arbitrarily, but would be taken from a pool. If the pool is empty, the producer has to wait. So when the producer needs a new Job, it takes one from the pool. When the consumer has written out a Job, it puts it back into the pool. Simple as that. This would also reduce pressure on the garbage collector, as jobs (and large []byte buffers) are not created and thrown away but re-used.
For a simple Job pool implementation you could use a buffered channel. For details, see How to implement Memory Pooling in Golang.
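A minimal sketch of such a pool, assuming the Job struct from the example above. The names JobPool, NewJobPool, Get and Put are hypothetical and only illustrate the buffered-channel idea:

```go
package main

import "fmt"

// Job mirrors the struct used in the example above.
type Job struct {
	Index int
	Block []byte
}

// JobPool is a hypothetical fixed-size pool backed by a buffered channel.
type JobPool chan *Job

// NewJobPool pre-allocates n jobs; n caps the number of blocks in flight.
func NewJobPool(n, blockSize int) JobPool {
	p := make(JobPool, n)
	for i := 0; i < n; i++ {
		p <- &Job{Block: make([]byte, blockSize)}
	}
	return p
}

// Get blocks until a job is free, then hands it out with a full-length buffer.
func (p JobPool) Get() *Job {
	j := <-p
	j.Block = j.Block[:cap(j.Block)] // undo any shortening from a partial read
	return j
}

// Put returns a job to the pool once the consumer has written it out.
func (p JobPool) Put(j *Job) { p <- j }

func main() {
	pool := NewJobPool(2, 4)
	a, b := pool.Get(), pool.Get()
	fmt.Println("free jobs:", len(pool)) // both jobs are checked out
	pool.Put(a)
	pool.Put(b)
	fmt.Println("free jobs:", len(pool)) // both jobs are back
}
```

The producer would call Get() instead of allocating a Job, and the consumer would call Put() after each successful write. If elaborateBlock returns a different slice than it was given, the input buffer would need its own field in Job so that it is the buffer that gets re-used.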
If blocks can be written in any order
Another option could be to allocate the output file in advance. If the size of the output blocks is also deterministic, you can do so (e.g. outsize := (insize / blocksize) * outblockSize).
To what end?
If you have the output file pre-allocated, the consumer does not need to wait for input blocks in order. Once an input block is calculated, you can calculate the position where it will go in the output, seek to that position and just write it. For this you may use File.Seek().
This solution still requires sending the block index from the producer to the consumer, but the consumer won't need to store blocks arriving out of order, so it can be simpler.
Note that this solution naturally does not pose a memory threat, as completed jobs are never accumulated / cached, they are written out in the order of completion.
See related questions for more details and techniques:
Is this an idiomatic worker thread pool in Go?
How to collect values from N goroutines executed in a specific order?

Here is an example that should work and is as close as possible to your original code.
The idea is to turn your array into a channel of channels of bytes.
First, fire up a consumer that reads from this channel of channels, gets each channel of bytes, reads from it and writes the result.
Back on the main thread, you create a channel of bytes and send it on the channel of channels (so the consumer, which reads them sequentially, gets the results in order), and then fire up the goroutine that does the work and writes to that channel (the producers).
What happens now is a "race" between the producers and the consumer: as soon as a produced block has been read by the consumer and written out, the resources associated with it can be released. This could be an improvement over your original design.
Here is the code and the playground link:
package main
import (
"bytes"
"fmt"
"io"
"sync"
)
func elaborateBlock(b []byte) []byte {
return []byte("werkwerkwerk")
}
func blockThread(channel chan []byte, block []byte, wg *sync.WaitGroup) {
channel <- elaborateBlock(block)
wg.Done()
}
func main() {
chans := make(chan chan []byte)
BlockSize := 3
inputBytes := bytes.NewBuffer([]byte("transmutemetowerkwerkwerk"))
producewg := sync.WaitGroup{}
consumewg := sync.WaitGroup{}
consumewg.Add(1)
go func() {
chancount := 0
for ch := range chans {
data := <-ch
fmt.Printf("got %d block, result:%s\n", chancount, data)
chancount++
}
fmt.Printf("done receiving\n")
consumewg.Done()
}()
for {
buffer := make([]byte, BlockSize)
_, err := inputBytes.Read(buffer)
if err == io.EOF {
go func() {
//wait for all the producers to finish
producewg.Wait()
//then close the main channel to notify the consumer
close(chans)
}()
break
}
channel := make(chan []byte)
chans <- channel //give the channel that we return the result to the receiver
producewg.Add(1)
go blockThread(channel, buffer, &producewg)
}
consumewg.Wait()
fmt.Printf("main exiting")
}
playground link
As a minor point, I don't feel right about the "read the whole file into memory" statement, because you are just reading one block at a time from the Reader. Maybe "holding the result of the whole computation in memory" is more appropriate?

Related

golang multithreaded web crawler runs into deadlock

I just started to learn multithreaded programming using golang, and I'm trying to write a multithreaded web crawler using BFS traversal, however I cannot get the code working. The error I get is fatal error: all goroutines are asleep - deadlock!
I will paste the code below, but let me explain conceptually how it works:
I have one master thread (the main function itself) and N worker threads. I intentionally chose to use BFS approach with a fixed amount of worker threads, because it seems using a DFS approach I will have to spawn a new thread for each single new URL to crawl, which might become a huge burden for context switch.
I am using two channels:
urlsToCrawl: master thread sends URLs to crawl to worker threads.
urlsDiscovered: worker threads send discovered URLs back to master.
Here is the code implementation, I removed some non relevant details (e.g. how to parse html page etc..)
The trick I'm trying to do here is: I am using the channel as a queue to do BFS, and when the queue's size is 0, it is impossible to know whether it is because "A. there is really no more URLs to crawl" OR because "B. some worker thread(s) are still working so there might be more URLs to crawl soon". Therefore I introduced this count variable, basically whenever a new url is sent to workers to be crawled, count is incremented, therefore when count == 0 and channel is empty, it would mean "A. there is really no more URLs to crawl"; otherwise when count > 0 and channel is empty, it would mean "B. some worker thread(s) are still working so there might be more URLs to crawl soon".
However as I mentioned, this doesn't seem to work and I run into deadlock. Would anyone please shed some light? Thanks!
package main
import (
"fmt"
)
var (
count = 0 // This tracks how many worker threads are actively working right now
)
func crawlUrl(urlsToCrawl chan string, urlsDiscovered chan []string) {
for url := range urlsToCrawl {
urls := getUrls(url) // This returns an array of string, if no URL found, it returns an empty array
urlsDiscovered <- urls
}
}
func main() {
urlsToCrawl := make(chan string)
urlsDiscovered := make(chan []string)
i := 0
for i < 8 {
go crawlUrl(urlsToCrawl, urlsDiscovered)
i++
}
visited := map[string]bool{"some_seed_url": true}
count++
urlsToCrawl <- "some_seed_url"
for urls := range urlsDiscovered {
count-- // One message is received by master, meaning one worker thread has finished an job item, therefore decrementing count
for _, url := range urls {
_, ok := visited[url]
if ok {
continue // This URL has been crawled before
}
visited[url] = true
count ++ // One more work item will be sent to worker, therefore first increment count
urlsToCrawl <- url
}
if count == 0 {
close(urlsDiscovered)
close(urlsToCrawl)
break
} // else some worker must be working so let's wait to see if there is new msg coming through the channel
}
}
Your channels are unbuffered
urlsToCrawl := make(chan string)
urlsDiscovered := make(chan []string)
So a goroutine which reads from or writes to a channel will block until a goroutine on the other side is doing the opposite.
So you start 8 crawlUrl goroutines which all block while reading from urlsToCrawl, meaning that main can send 8 URLs before blocking. The crawlUrl goroutines are blocked until main reads from urlsDiscovered. So if you have more than 8 URLs going around, all goroutines are waiting on each other (deadlock).
The solution to this is to use buffered channels with a capacity you are very unlikely to exceed:
urlsToCrawl := make(chan string, 1000)
urlsDiscovered := make(chan []string, 100)
If you expect you might still exceed the capacity of the channel in extreme cases, you can perform non-blocking operations, which allow you, for example, to discard discovered URLs if the channel is full instead of blocking.
select {
case urlsDiscovered <- urls:
// on success (url written)
default:
// channel is full, can't write without blocking
}
Try to integrate sync.WaitGroup.
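As an alternative to picking a large buffer size, here is a sketch in which the master never blocks on a send while results are pending: it keeps the BFS queue in a local slice and uses a select that can always receive. This is not taken from the answers above. The link graph links, the stand-in getUrls, and the names crawl, worker and inFlight are assumptions made for the illustration (inFlight plays the role of the question's count):

```go
package main

import "fmt"

// links is a hypothetical in-memory link graph standing in for real fetching.
var links = map[string][]string{
	"a": {"b", "c"},
	"b": {"c", "d"},
	"c": {"a"},
	"d": {},
}

// getUrls is a stand-in for the question's getUrls.
func getUrls(url string) []string { return links[url] }

func worker(urlsToCrawl <-chan string, urlsDiscovered chan<- []string) {
	for url := range urlsToCrawl {
		urlsDiscovered <- getUrls(url)
	}
}

// crawl does a BFS from seed and returns the set of visited URLs.
func crawl(seed string, workers int) map[string]bool {
	urlsToCrawl := make(chan string)
	urlsDiscovered := make(chan []string)
	for i := 0; i < workers; i++ {
		go worker(urlsToCrawl, urlsDiscovered)
	}

	visited := map[string]bool{seed: true}
	queue := []string{seed} // URLs not yet handed to a worker
	inFlight := 0           // URLs handed out whose results are still pending

	for len(queue) > 0 || inFlight > 0 {
		// A nil channel blocks forever in a select, which disables the
		// send case while the queue is empty.
		var sendCh chan<- string
		var next string
		if len(queue) > 0 {
			sendCh = urlsToCrawl
			next = queue[0]
		}
		select {
		case sendCh <- next:
			queue = queue[1:]
			inFlight++
		case urls := <-urlsDiscovered:
			inFlight--
			for _, u := range urls {
				if !visited[u] {
					visited[u] = true
					queue = append(queue, u)
				}
			}
		}
	}
	close(urlsToCrawl) // lets the workers exit
	return visited
}

func main() {
	fmt.Println(len(crawl("a", 3))) // prints 4
}
```

Because the receive case is always enabled, a worker that is blocked sending its result is always unblocked eventually, regardless of how many URLs are waiting in the queue.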

Is os.File.Write() thread safe in golang?

Is it thread safe when two goroutines write to a file concurrently with os.File.Write()?
According to this question, Is os.File's Write() threadsafe?, it isn't thread safe. However, the output file ./test.txt produced by the following code showed no errors.
And according to this question Safe to have multiple processes writing to the same file at the same time? [CentOs 6, ext4], the POSIX "raw" IO syscalls are thread safe. os.File.Write() uses the POSIX IO syscalls, so can we say it is thread safe?
package main
import (
"fmt"
"os"
"sync"
)
func main() {
filePath := "./test.txt"
var wg sync.WaitGroup
wg.Add(2)
worker := func(name string) {
// file, _ := os.Create(filePath)
file, _ := os.OpenFile(filePath, os.O_WRONLY|os.O_APPEND|os.O_CREATE, 0666)
defer file.Close()
defer wg.Done()
for i := 0; i < 100000; i++ {
if _, err := file.Write([]byte(name + ": is os.File.Write() thread safe?\n")); err != nil {
fmt.Println(err)
}
}
}
go worker("worker1")
go worker("worker2")
wg.Wait()
}
Documentation does not explicitly say it is thread safe.
Looking at Go 1.16.5 version source code though:
// Write implements io.Writer.
func (fd *FD) Write(buf []byte) (int, error) {
if err := fd.writeLock(); err != nil {
return 0, err
}
defer fd.writeUnlock()
...
It uses internal synchronization. Unless you're coding a Mars lander, I'd say it's fine to assume writes are thread safe.
In general, you should not expect that Write calls to an io.Writer will be written out atomically, i.e. all at once. Synchronization at a higher level is recommended if you don't want interleaved outputs.
Even if you can assume that for *os.File each call to Write will be written out atomically because of either internal locking or because it's a single system call, there is no guarantee that whatever is using the file will do so. For example:
fmt.Fprintf(f, "[%s] %s\n", date, message)
The fmt library does not guarantee that this will make a single call to the io.Writer. It may, for example, flush [, then the date, then ] then the message, and then \n separately, which could result in two log messages being interleaved.
Practically speaking, writes to *os.File will probably be atomic, but it is difficult to arrange for this to be useful without incurring significant allocations, and making this assumption might compromise the portability of your application to different operating systems or architectures, or even different environments. It is possible, for example, that your binary compiled to WASM will not have the same behavior, or that your binary when writing to an NFS-backed file will behave differently.

Combining multiple maps that are stored on channel (Same key's values get summed.) in Go

My objective is to create a program that counts every unique word's occurrences in a text file in a parallelized fashion; all occurrences have to be presented in a single map.
What I do here is split the text file into a string and then into an array. That array is then divided into two slices of equal length, which are fed concurrently to the mapper function.
func WordCount(text string) (map[string]int) {
wg := new(sync.WaitGroup)
s := strings.Fields(text)
freq := make(map[string]int,len(s))
channel := make(chan map[string]int,2)
wg.Add(1)
go mappers(s[0:(len(s)/2)], freq, channel,wg)
wg.Add(1)
go mappers(s[(len(s)/2):], freq, channel,wg)
wg.Wait()
actualMap := <-channel
return actualMap
}
func mappers(slice []string, occurrences map[string]int, ch chan map[string]int, wg *sync.WaitGroup) {
var l = sync.Mutex{}
for _, word := range slice {
l.Lock()
occurrences[word]++
l.Unlock()
}
ch <- occurrences
wg.Done()
}
The bottom line is that I get a huge multiline error that starts with
fatal error: concurrent map writes
when I run the code, which I thought I had guarded against through mutual exclusion:
l.Lock()
occurrences[word]++
l.Unlock()
What am I doing wrong here? Furthermore, how can I combine all the maps in a channel? By combine I mean that the values for the same key get summed in the new map.
The main problem is that you use a separate lock in each goroutine. That does nothing to serialize access to the map. The same lock has to be used in each goroutine.
And since you use the same map in each goroutine, you don't have to merge them, and you don't need a channel to deliver the result.
Even if you use the same mutex in each goroutine, since you use a single map, this probably won't help performance: the goroutines will have to compete with each other for the map's lock.
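For reference, the shared-lock variant described so far could be sketched like this. The name countShared is made up, and one mutex is shared by both goroutines through the closure:

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// countShared counts words into ONE shared map, guarded by ONE mutex
// that both goroutines use.
func countShared(text string) map[string]int {
	words := strings.Fields(text)
	freq := make(map[string]int)
	var mu sync.Mutex
	var wg sync.WaitGroup

	count := func(part []string) {
		defer wg.Done()
		for _, w := range part {
			mu.Lock() // same mutex in every goroutine
			freq[w]++
			mu.Unlock()
		}
	}

	mid := len(words) / 2
	wg.Add(2)
	go count(words[:mid])
	go count(words[mid:])
	wg.Wait()
	return freq
}

func main() {
	fmt.Println(countShared("aa ab cd cd de ef a x cd aa"))
}
```

This is correct, but every increment contends for the same lock, which is the performance concern raised above.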
You should create a separate map in each goroutine, use that to count locally, and then deliver the result map on the channel. This might give you a performance boost.
But then you don't need a lock, since each goroutine will have its own map which it can read/write without a mutex.
But then you do have to deliver the result on the channel, and then merge it.
And since goroutines deliver results on the channel, the waitgroup becomes unnecessary.
func WordCount(text string) map[string]int {
s := strings.Fields(text)
channel := make(chan map[string]int, 2)
go mappers(s[0:(len(s)/2)], channel)
go mappers(s[(len(s)/2):], channel)
total := map[string]int{}
for i := 0; i < 2; i++ {
m := <-channel
for k, v := range m {
total[k] += v
}
}
return total
}
func mappers(slice []string, ch chan map[string]int) {
occurrences := map[string]int{}
for _, word := range slice {
occurrences[word]++
}
ch <- occurrences
}
Example testing it:
fmt.Println(WordCount("aa ab cd cd de ef a x cd aa"))
Output (try it on the Go Playground):
map[a:1 aa:2 ab:1 cd:3 de:1 ef:1 x:1]
Also note that in theory this looks "good", but in practice you may still not achieve any performance boost, as the goroutines do too "little" work, and launching them and merging the results requires effort which may outweigh the benefits.

waitgroup on subset of go routines

I have a situation wherein the main goroutine creates "x" goroutines, but it is only interested in "y" (y < x) of them finishing.
I was hoping to use WaitGroup, but WaitGroup only allows me to wait on all goroutines. I cannot, for example, do this:
1. wg.Add(y)
2. Create "x" goroutines. These routines will call wg.Done() when finished.
3. wg.Wait()
This panics when the (y+1)th goroutine calls wg.Done(), because the wg counter goes negative.
I can certainly use channels to solve this, but I am interested in whether WaitGroup can solve it.
As noted in Adrian's answer, sync.WaitGroup is a simple counter whose Wait method will block until the counter value reaches zero. It is intended to allow you to block (or join) on a number of goroutines before allowing a main flow of execution to proceed.
The interface of WaitGroup is not sufficiently expressive for your usecase, nor is it designed to be. In particular, you cannot use it naïvely by simply calling wg.Add(y) (where y < x). The call to wg.Done by the (y+1)th goroutine will cause a panic, as it is an error for a wait group to have a negative internal value. Furthermore, we cannot be "smart" by observing the internal counter value of the WaitGroup; this would break an abstraction and, in any event, its internal state is not exported.
Implement your own!
You can implement the relevant logic yourself using some channels per the code below (playground link). Observe from the console that 10 goroutines are started, but after two have completed, we fall through to continue execution in the main function.
package main
import (
"fmt"
"time"
)
// Set goroutine counts here
const (
// The number of goroutines to spawn
x = 10
// The number of goroutines to wait for completion
// (y <= x) must hold.
y = 2
)
func doSomeWork() {
// do something meaningful
time.Sleep(time.Second)
}
func main() {
// Accumulator channel, used by each goroutine to signal completion.
// It is buffered to ensure the [y+1, ..., x) goroutines do not block
// when sending to the channel, which would cause a leak. It will be
// garbage collected when all goroutines end and the channel falls
// out of scope. We receive y values, so only need capacity to receive
// (x-y) remaining values.
accChan := make(chan struct{}, x-y)
// Spawn "x" goroutines
for i := 0; i < x; i += 1 {
// Wrap our work function with the local signalling logic
go func(id int, doneChan chan<- struct{}) {
fmt.Printf("starting goroutine #%d\n", id)
doSomeWork()
fmt.Printf("goroutine #%d completed\n", id)
// Communicate completion of goroutine
doneChan <- struct{}{}
}(i, accChan)
}
for doneCount := 0; doneCount < y; doneCount += 1 {
<-accChan
}
// Continue working
fmt.Println("Carrying on without waiting for more goroutines")
}
Avoid leaking resources
As this does not wait for the [y+1, ..., x) goroutines to complete, you should take special care in the doSomeWork function to remove or minimize the risk that the work can block indefinitely, which would also cause a leak. Remove, where possible, the feasibility of indefinite blocking on I/O (including channel operations) or falling into infinite loops.
You could use a context to signal to the additional goroutines when their results are no longer required to have them break out of execution.
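The cancellation idea might look roughly like the sketch below. The function firstY and the simulated work (a timer whose duration depends on the goroutine id) are assumptions made for illustration; real work would check ctx.Done() at its own blocking points:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// firstY starts x workers and waits until y of them have finished. It then
// cancels the rest and waits for them to acknowledge, so no goroutine
// outlives the call. It requires y <= x.
func firstY(x, y int) (finished, cancelled int) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	done := make(chan bool, x) // true = finished, false = cancelled
	for i := 0; i < x; i++ {
		go func(id int) {
			// Hypothetical work: worker id needs (id+1)*20ms.
			select {
			case <-time.After(time.Duration(id+1) * 20 * time.Millisecond):
				done <- true
			case <-ctx.Done():
				done <- false
			}
		}(i)
	}

	// Nothing is cancelled yet, so every value received here is a completion.
	for finished < y {
		<-done
		finished++
	}

	cancel() // the remaining results are no longer required

	// Collect the other x-y workers; a few may still finish before they
	// notice the cancellation.
	for i := y; i < x; i++ {
		if <-done {
			finished++
		} else {
			cancelled++
		}
	}
	return finished, cancelled
}

func main() {
	finished, cancelled := firstY(10, 2)
	fmt.Printf("finished=%d cancelled=%d\n", finished, cancelled)
}
```

Waiting for the cancelled goroutines is optional. If the caller wants to carry on immediately after the first y results, the second loop can be dropped, because the channel is buffered with capacity x and no sender can block.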
WaitGroup doesn't actually wait on goroutines, it waits until its internal counter reaches zero. If you only Add() the number of goroutines you care about, and you only call Done() in those goroutines you care about, then Wait() will only block until those goroutines you care about have finished. You are in complete control of the logic and flow, there are no restrictions on what WaitGroup "allows".
Are these y specific go-routines that you are trying to track, or any y out of the x? What are the criteria?
Update:
1. If you have control over the criteria for picking the matching y goroutines:
You can do wp.wg.Add(1) and wp.wg.Done() from inside the goroutine based on your condition by passing it as a pointer argument into the goroutine, if your condition can't be checked outside the goroutine.
Something like below sample code. Will be able to be more specific if you provide more details of what you are trying to do.
func sampleGoroutine(z int, b string, wg *sync.WaitGroup) {
// condition1 is a placeholder for your own criterion
defer func() {
if condition1 {
wg.Done()
}
}()
if condition1 {
wg.Add(1)
// do stuff
}
}
func main() {
wg := sync.WaitGroup{}
for i := 0; i < x; i++ {
go sampleGoroutine(1, "one", &wg)
}
wg.Wait()
}
2. If you have no control over which ones, and just want the first y:
Based on your comment that you have no control over (or desire to pick) specific goroutines, only the ones that finish first: if you want to do it in a generic way, you can use the custom wait group implementation below, which fits your use case. (It's not copy-safe, though. It also doesn't have or need a wg.Add(int) method.)
type CountedWait struct {
wait chan struct{}
limit int
}
func NewCountedWait(limit int) *CountedWait {
return &CountedWait{
wait: make(chan struct{}, limit),
limit: limit,
}
}
func (cwg *CountedWait) Done() {
cwg.wait <- struct{}{}
}
func (cwg *CountedWait) Wait() {
count := 0
for count < cwg.limit {
<-cwg.wait
count += 1
}
}
Which can be used as follows:
func sampleGoroutine(z int, b string, wg *CountedWait) {
success := false
defer func() {
if success == true {
fmt.Printf("goroutine %d finished successfully\n", z)
wg.Done()
}
}()
fmt.Printf("goroutine %d started\n", z)
time.Sleep(time.Second)
if rand.Intn(10)%2 == 0 {
success = true
}
}
func main() {
x := 10
y := 3
wg := NewCountedWait(y)
for i := 0; i < x; i += 1 {
// Wrap our work function with the local signalling logic
go sampleGoroutine(i, "something", wg)
}
wg.Wait()
fmt.Printf("%d out of %d goroutines finished successfully.\n", y, x)
}
3. You can also combine context with 2 to ensure that the remaining goroutines don't leak
You may not be able to run this on play.golang, as it has some long sleeps.
Below is a sample output:
(note that, there may be more than y=3 goroutines marking Done, but you are only waiting till 3 finish)
goroutine 9 started
goroutine 0 started
goroutine 1 started
goroutine 2 started
goroutine 3 started
goroutine 4 started
goroutine 5 started
goroutine 5 marking done
goroutine 6 started
goroutine 7 started
goroutine 7 marking done
goroutine 8 started
goroutine 3 marking done
continuing after 3 out of 10 goroutines finished successfully.
goroutine 9 will be killed, bcz cancel
goroutine 8 will be killed, bcz cancel
goroutine 6 will be killed, bcz cancel
goroutine 1 will be killed, bcz cancel
goroutine 0 will be killed, bcz cancel
goroutine 4 will be killed, bcz cancel
goroutine 2 will be killed, bcz cancel
Play links
https://play.golang.org/p/l5i6X3GClBq
https://play.golang.org/p/Bcns0l9OdFg
https://play.golang.org/p/rkGSLyclgje

How would you define a pool of goroutines to be executed at once?

TL;DR: Please just go to the last part and tell me how you would solve this problem.
I've begun using Go this morning coming from Python. I want to call a closed-source executable from Go several times, with a bit of concurrency, with different command line arguments. My resulting code is working just well but I'd like to get your input in order to improve it. Since I'm at an early learning stage, I'll also explain my workflow.
For the sake of simplicity, assume here that this "external closed-source program" is zenity, a Linux command line tool that can display graphical message boxes from the command line.
Calling an executable file from Go
So, in Go, I would go like this:
package main
import "os/exec"
func main() {
cmd := exec.Command("zenity", "--info", "--text='Hello World'")
cmd.Run()
}
This should be working just right. Note that .Run() is a functional equivalent to .Start() followed by .Wait(). This is great, but if I wanted to execute this program just once, the whole programming stuff would not be worth it. So let's just do that multiple times.
Calling an executable multiple times
Now that I had this working, I'd like to call my program multiple times, with custom command line arguments (here just i for the sake of simplicity).
package main
import (
"os/exec"
"strconv"
)
func main() {
NumEl := 8 // Number of times the external program is called
for i:=0; i<NumEl; i++ {
cmd := exec.Command("zenity", "--info", "--text='Hello from iteration n." + strconv.Itoa(i) + "'")
cmd.Run()
}
}
Ok, we did it! But I still can't see the advantage of Go over Python … This piece of code is actually executed in a serial fashion. I have a multiple-core CPU and I'd like to take advantage of it. So let's add some concurrency with goroutines.
Goroutines, or a way to make my program parallel
a) First attempt: just add "go"s everywhere
Let's rewrite our code to make things easier to call and reuse and add the famous go keyword:
package main
import (
"os/exec"
"strconv"
)
func main() {
NumEl := 8
for i:=0; i<NumEl; i++ {
go callProg(i) // <--- There!
}
}
func callProg(i int) {
cmd := exec.Command("zenity", "--info", "--text='Hello from iteration n." + strconv.Itoa(i) + "'")
cmd.Run()
}
Nothing! What is the problem? All the goroutines are executed at once. I don't really know why zenity is not executed but AFAIK, the Go program exited before the zenity external program could even be initialized. This was confirmed by the use of time.Sleep: waiting for a couple of seconds was enough to let the 8 instances of zenity launch themselves. I don't know if this can be considered a bug though.
To make it worse, the real program I'd actually like to call takes a while to execute itself. If I execute 8 instances of this program in parallel on my 4-core CPU, it's gonna waste some time doing a lot of context switching … I don't know how plain Go goroutines behave, but exec.Command will launch zenity 8 times in 8 different threads. To make it even worse, I want to execute this program more than 100,000 times. Doing all of that at once in goroutines won't be efficient at all. Still, I'd like to leverage my 4-core CPU!
b) Second attempt: use pools of goroutines
The online resources tend to recommend the use of sync.WaitGroup for this kind of work. The problem with that approach is that you are basically working with batches of goroutines: if I create a WaitGroup of 4 members, the Go program will wait for all the 4 external programs to finish before calling a new batch of 4 programs. This is not efficient: CPU is wasted, once again.
Some other resources recommended the use of a buffered channel to do the work:
package main
import (
"os/exec"
"strconv"
)
func main() {
NumEl := 8 // Number of times the external program is called
NumCore := 4 // Number of available cores
c := make(chan bool, NumCore - 1)
for i:=0; i<NumEl; i++ {
go callProg(i, c)
c <- true // At the NumCoreth iteration, c is blocking
}
}
func callProg(i int, c chan bool) {
defer func () {<- c}()
cmd := exec.Command("zenity", "--info", "--text='Hello from iteration n." + strconv.Itoa(i) + "'")
cmd.Run()
}
This seems ugly. Channels were not intended for this purpose: I'm exploiting a side-effect. I love the concept of defer but I hate having to declare a function (even a lambda) to pop a value out of the dummy channel that I created. Oh, and of course, using a dummy channel is, by itself, ugly.
c) Third attempt: die when all the children are dead
Now we are nearly finished. I just have to take into account yet another side effect: the Go program closes before all the zenity pop-ups are closed. This is because when the loop is finished (at the 8th iteration), nothing prevents the program from finishing. This time, sync.WaitGroup will be useful.
package main
import (
"os/exec"
"strconv"
"sync"
)
func main() {
NumEl := 8 // Number of times the external program is called
NumCore := 4 // Number of available cores
c := make(chan bool, NumCore - 1)
wg := new(sync.WaitGroup)
wg.Add(NumEl) // Set the number of goroutines to (0 + NumEl)
for i:=0; i<NumEl; i++ {
go callProg(i, c, wg)
c <- true // At the NumCoreth iteration, c is blocking
}
wg.Wait() // Wait for all the children to die
close(c)
}
func callProg(i int, c chan bool, wg *sync.WaitGroup) {
defer func () {
<- c
wg.Done() // Decrease the number of alive goroutines
}()
cmd := exec.Command("zenity", "--info", "--text='Hello from iteration n." + strconv.Itoa(i) + "'")
cmd.Run()
}
Done.
My questions
Do you know any other proper way to limit the number of goroutines executed at once?
I don't mean threads; how Go manages goroutines internally is not relevant. I really mean limiting the number of goroutines launched at once: exec.Command creates a new thread each time it is called, so I should control the number of times it is called.
Does that code look fine to you?
Do you know how to avoid the use of a dummy channel in that case?
I can't convince myself that such dummy channels are the way to go.
I would spawn 4 worker goroutines that read the tasks from a common channel. Goroutines that are faster than others (because they are scheduled differently or happen to get simple tasks) will receive more task from this channel than others. In addition to that, I would use a sync.WaitGroup to wait for all workers to finish. The remaining part is just the creation of the tasks. You can see an example implementation of that approach here:
package main
import (
"os/exec"
"strconv"
"sync"
)
func main() {
tasks := make(chan *exec.Cmd, 64)
// spawn four worker goroutines
var wg sync.WaitGroup
for i := 0; i < 4; i++ {
wg.Add(1)
go func() {
for cmd := range tasks {
cmd.Run()
}
wg.Done()
}()
}
// generate some tasks
for i := 0; i < 10; i++ {
tasks <- exec.Command("zenity", "--info", "--text='Hello from iteration n."+strconv.Itoa(i)+"'")
}
close(tasks)
// wait for the workers to finish
wg.Wait()
}
There are probably other possible approaches, but I think this is a very clean solution that is easy to understand.
A simple approach to throttling (execute f() N times but maximum maxConcurrency concurrently), just a scheme:
package main
import (
"sync"
)
const maxConcurrency = 4 // for example
var throttle = make(chan int, maxConcurrency)
func main() {
const N = 100 // for example
var wg sync.WaitGroup
for i := 0; i < N; i++ {
throttle <- 1 // whatever number
wg.Add(1)
go f(i, &wg, throttle)
}
wg.Wait()
}
func f(i int, wg *sync.WaitGroup, throttle chan int) {
defer wg.Done()
// whatever processing
println(i)
<-throttle
}
Playground
I wouldn't probably call the throttle channel "dummy". IMHO it's an elegant way (it's not my invention of course), how to limit concurrency.
BTW: Please note that you're ignoring the returned error from cmd.Run().
🧩 Modules
Golang Concurrency Manager
📃 Template
package main
import (
"fmt"
"github.com/zenthangplus/goccm"
"math/rand"
"runtime"
)
func main() {
semaphore := goccm.New(runtime.NumCPU())
for {
semaphore.Wait()
go func() {
fmt.Println(rand.Int())
semaphore.Done()
}()
}
semaphore.WaitAllDone()
}
🎰 Optimal routine quantity
If the operation is CPU bounded: runtime.NumCPU()
Otherwise test with: time go run *.go
🔨 Configure
export GOPATH="$(pwd)/gopath"
go mod init *.go
go mod tidy
🧹 CleanUp
find "${GOPATH}" -exec chmod +w {} \;
rm --recursive --force "${GOPATH}"
try this:
https://github.com/korovkin/limiter
limiter := NewConcurrencyLimiter(10)
limiter.Execute(func() {
zenity(...)
})
limiter.Wait()
You could use the Worker Pool pattern described here in this post.
This is how an implementation would look:
package main
import (
"os/exec"
"strconv"
"sync"
)
func main() {
NumEl := 8
pool := 4
intChan := make(chan int)
var wg sync.WaitGroup
for i := 0; i < pool; i++ {
wg.Add(1)
go callProg(intChan, &wg) // <--- launch the worker routines
}
for i := 0; i < NumEl; i++ {
intChan <- i // <--- push data which will be received by workers
}
close(intChan) // <--- will safely close the channel & terminate worker routines
wg.Wait() // <--- wait for the workers, otherwise main may exit while commands are still running
}
func callProg(intChan chan int, wg *sync.WaitGroup) {
defer wg.Done()
for i := range intChan {
cmd := exec.Command("zenity", "--info", "--text='Hello from iteration n."+strconv.Itoa(i)+"'")
cmd.Run()
}
}

Resources