I'm attempting to write a simple worker pool with goroutines.
Is the code I wrote idiomatic? If not, then what should change?
I want to be able to set the maximum number of worker threads to 5 and block until a worker becomes available if all 5 are busy. How would I extend this to only have a pool of 5 workers max? Do I spawn the static 5 goroutines, and give each the work_channel?
code:
package main
import (
"fmt"
"math/rand"
"sync"
"time"
)
func worker(id string, work string, o chan string, wg *sync.WaitGroup) {
defer wg.Done()
sleepMs := rand.Intn(1000)
fmt.Printf("worker '%s' received: '%s', sleep %dms\n", id, work, sleepMs)
time.Sleep(time.Duration(sleepMs) * time.Millisecond)
o <- work + fmt.Sprintf("-%dms", sleepMs)
}
func main() {
var work_channel = make(chan string)
var results_channel = make(chan string)
// create goroutine per item in work_channel
go func() {
var c = 0
var wg sync.WaitGroup
for work := range work_channel {
wg.Add(1)
go worker(fmt.Sprintf("%d", c), work, results_channel, &wg)
c++
}
wg.Wait()
fmt.Println("closing results channel")
close(results_channel)
}()
// add work to the work_channel
go func() {
for c := 'a'; c < 'z'; c++ {
work_channel <- fmt.Sprintf("%c", c)
}
close(work_channel)
fmt.Println("sent work to work_channel")
}()
for x := range results_channel {
fmt.Printf("result: %s\n", x)
}
}
Your solution is not a worker goroutine pool in any sense: your code does not limit concurrent goroutines, and it does not "reuse" goroutines (it always starts a new one when a new job is received).
Producer-consumer pattern
As posted at Bruteforce MD5 Password cracker, you can make use of the producer-consumer pattern. You could have a designated producer goroutine that would generate the jobs (things to do / calculate), and send them on a jobs channel. You could have a fixed pool of consumer goroutines (e.g. 5 of them) which would loop over the channel on which jobs are delivered, and each would execute / complete the received jobs.
The producer goroutine could simply close the jobs channel when all jobs were generated and sent, properly signalling consumers that no more jobs will be coming. The for ... range construct on a channel handles the "close" event and terminates properly. Note that all jobs sent before closing the channel will still be delivered.
This would result in a clean design, would result in fixed (but arbitrary) number of goroutines, and it would always utilize 100% CPU (if # of goroutines is greater than # of CPU cores). It also has the advantage that it can be "throttled" with the proper selection of the channel capacity (buffered channel) and the number of consumer goroutines.
Note that this model to have a designated producer goroutine is not mandatory. You could have multiple goroutines to produce jobs too, but then you must synchronize them too to only close the jobs channel when all producer goroutines are done producing jobs - else attempting to send another job on the jobs channel when it has already been closed results in a runtime panic. Usually producing jobs are cheap and can be produced at a much quicker rate than they can be executed, so this model to produce them in 1 goroutine while many are consuming / executing them is good in practice.
Handling results:
If jobs have results, you may choose to have a designated result channel on which results could be delivered ("sent back"), or you may choose to handle the results in the consumer when the job is completed / finished. This latter may even be implemented by having a "callback" function that handles the results. The important thing is whether results can be processed independently or they need to be merged (e.g. map-reduce framework) or aggregated.
If you go with a results channel, you also need a goroutine that receives values from it, preventing consumers to get blocked (would occur if buffer of results would get filled).
With results channel
Instead of sending simple string values as jobs and results, I would create a wrapper type which can hold any additional info and so it is much more flexible:
type Job struct {
Id int
Work string
Result string
}
Note that the Job struct also wraps the result, so when we send back the result, it also contains the original Job as the context - often very useful. Also note that it is profitable to just send pointers (*Job) on the channels instead of Job values so no need to make "countless" copies of Jobs, and also the size of the Job struct value becomes irrelevant.
Here is how this producer-consumer could look like:
I would use 2 sync.WaitGroup values, their role will follow:
var wg, wg2 sync.WaitGroup
The producer is responsible to generate jobs to be executed:
func produce(jobs chan<- *Job) {
// Generate jobs:
id := 0
for c := 'a'; c <= 'z'; c++ {
id++
jobs <- &Job{Id: id, Work: fmt.Sprintf("%c", c)}
}
close(jobs)
}
When done (no more jobs), the jobs channel is closed which signals consumers that no more jobs will arrive.
Note that produce() sees the jobs channel as send only, because that's what the producer needs to do only with that: send jobs on it (besides closing it, but that is also permitted on a send only channel). An accidental receive in the producer would be a compile time error (detected early, at compile time).
The consumer's responsibility is to receive jobs as long as jobs can be received, and execute them:
func consume(id int, jobs <-chan *Job, results chan<- *Job) {
defer wg.Done()
for job := range jobs {
sleepMs := rand.Intn(1000)
fmt.Printf("worker #%d received: '%s', sleep %dms\n", id, job.Work, sleepMs)
time.Sleep(time.Duration(sleepMs) * time.Millisecond)
job.Result = job.Work + fmt.Sprintf("-%dms", sleepMs)
results <- job
}
}
Note that consume() sees the jobs channel as receive only; consumer only needs to receive from it. Similarly the results channel is send only for the consumer.
Also note that the results channel cannot be closed here as there are multiple consumer goroutines, and only the first attempting to close it would succeed and further ones would result in runtime panic! results channel can (must) be closed after all consumer goroutines ended, because then we can be sure no further values (results) will be sent on the results channel.
We have results which need to be analyzed:
func analyze(results <-chan *Job) {
defer wg2.Done()
for job := range results {
fmt.Printf("result: %s\n", job.Result)
}
}
As you can see, this also receives results as long as they may come (until results channel is closed). The results channel for the analyzer is receive only.
Please note the use of channel types: whenever it is sufficient, use only a unidirectional channel type to detect and prevent errors early, at compile time. Only use bidirectional channel type if you do need both directions.
And this is how all these are glued together:
func main() {
jobs := make(chan *Job, 100) // Buffered channel
results := make(chan *Job, 100) // Buffered channel
// Start consumers:
for i := 0; i < 5; i++ { // 5 consumers
wg.Add(1)
go consume(i, jobs, results)
}
// Start producing
go produce(jobs)
// Start analyzing:
wg2.Add(1)
go analyze(results)
wg.Wait() // Wait all consumers to finish processing jobs
// All jobs are processed, no more values will be sent on results:
close(results)
wg2.Wait() // Wait analyzer to analyze all results
}
Example output:
Here is an example output:
As you can see, results are coming and getting analyzed before all the jobs would be enqueued:
worker #4 received: 'e', sleep 81ms
worker #0 received: 'a', sleep 887ms
worker #1 received: 'b', sleep 847ms
worker #2 received: 'c', sleep 59ms
worker #3 received: 'd', sleep 81ms
worker #2 received: 'f', sleep 318ms
result: c-59ms
worker #4 received: 'g', sleep 425ms
result: e-81ms
worker #3 received: 'h', sleep 540ms
result: d-81ms
worker #2 received: 'i', sleep 456ms
result: f-318ms
worker #4 received: 'j', sleep 300ms
result: g-425ms
worker #3 received: 'k', sleep 694ms
result: h-540ms
worker #4 received: 'l', sleep 511ms
result: j-300ms
worker #2 received: 'm', sleep 162ms
result: i-456ms
worker #1 received: 'n', sleep 89ms
result: b-847ms
worker #0 received: 'o', sleep 728ms
result: a-887ms
worker #1 received: 'p', sleep 274ms
result: n-89ms
worker #2 received: 'q', sleep 211ms
result: m-162ms
worker #2 received: 'r', sleep 445ms
result: q-211ms
worker #1 received: 's', sleep 237ms
result: p-274ms
worker #3 received: 't', sleep 106ms
result: k-694ms
worker #4 received: 'u', sleep 495ms
result: l-511ms
worker #3 received: 'v', sleep 466ms
result: t-106ms
worker #1 received: 'w', sleep 528ms
result: s-237ms
worker #0 received: 'x', sleep 258ms
result: o-728ms
worker #2 received: 'y', sleep 47ms
result: r-445ms
worker #2 received: 'z', sleep 947ms
result: y-47ms
result: u-495ms
result: x-258ms
result: v-466ms
result: w-528ms
result: z-947ms
Try the complete application on the Go Playground.
Without a results channel
Code simplifies significantly if we don't use a results channel but the consumer goroutines handle the result right away (print it in our case). In this case we don't need 2 sync.WaitGroup values (the 2nd was only needed to wait for the analyzer to complete).
Without a results channel the complete solution is like this:
var wg sync.WaitGroup
type Job struct {
Id int
Work string
}
func produce(jobs chan<- *Job) {
// Generate jobs:
id := 0
for c := 'a'; c <= 'z'; c++ {
id++
jobs <- &Job{Id: id, Work: fmt.Sprintf("%c", c)}
}
close(jobs)
}
func consume(id int, jobs <-chan *Job) {
defer wg.Done()
for job := range jobs {
sleepMs := rand.Intn(1000)
fmt.Printf("worker #%d received: '%s', sleep %dms\n", id, job.Work, sleepMs)
time.Sleep(time.Duration(sleepMs) * time.Millisecond)
fmt.Printf("result: %s\n", job.Work+fmt.Sprintf("-%dms", sleepMs))
}
}
func main() {
jobs := make(chan *Job, 100) // Buffered channel
// Start consumers:
for i := 0; i < 5; i++ { // 5 consumers
wg.Add(1)
go consume(i, jobs)
}
// Start producing
go produce(jobs)
wg.Wait() // Wait all consumers to finish processing jobs
}
Output is "like" that of with results channel (but of course execution/completion order is random).
Try this variant on the Go Playground.
You can implement a counting semaphore to limit goroutine concurrency.
var tokens = make(chan struct{}, 20)
func worker(id string, work string, o chan string, wg *sync.WaitGroup) {
defer wg.Done()
tokens <- struct{}{} // acquire a token before performing work
sleepMs := rand.Intn(1000)
fmt.Printf("worker '%s' received: '%s', sleep %dms\n", id, work, sleepMs)
time.Sleep(time.Duration(sleepMs) * time.Millisecond)
<-tokens // release the token
o <- work + fmt.Sprintf("-%dms", sleepMs)
}
This is the general design used to limit the number of workers. You can of course change location of releasing/acquiring of tokens to fit your code.
Related
I have a golang grpc server which has streaming endpoint. Earlier I was doing all the work sequentially and sending on the stream but then I realize I can make the work concurrent and then send on stream. From grpc-go docs: I understood that I can make the work concurrent, but you can't make sending on the stream concurrent so I got below code which does the job.
Below is the code I have in my streaming endpoint which sends data back to client in a streaming way. This does all the work concurrently.
// get "allCids" from lot of files and load in memory.
allCids := .....
var data = allCids.([]int64)
out := make(chan *custPbV1.CustomerResponse, len(data))
wg := &sync.WaitGroup{}
wg.Add(len(data))
go func() {
wg.Wait()
close(out)
}()
for _, cid := range data {
go func (id int64) {
defer wg.Done()
pd := repo.GetCustomerData(strconv.FormatInt(cid, 10))
if !pd.IsCorrect {
return
}
resources := us.helperCom.GenerateResourceString(pd)
val, err := us.GenerateInfo(clientId, resources, cfg)
if err != nil {
return
}
out <- val
}(cid)
}
for val := range out {
if err := stream.Send(val); err != nil {
log.Printf("send error %v", err)
}
}
Now problem I have is size of data slice can be approx a million so I don't want to spawn million go routine doing the job. How do I handle that scenario here? If instead of len(data) I use 100 then will that work for me or I need to slice data as well in 100 sub arrays? I am just confuse on what is the best way to deal with this problem?
I recently started with golang so pardon me if there are any mistakes in my above code while making it concurrent.
Please check this pseudo code
func main() {
works := make(chan int, 100)
errChan := make(chan error, 100)
out := make(chan *custPbV1.CustomerResponse, 100)
// spawn fixed workers
var workerWg sync.WaitGroup
for i := 0; i < 100; i++ {
workerWg.Add(1)
go worker(&workerWg, works, errChan, out)
}
// give input
go func() {
for _, cid := range data {
// this will be blocked if all the workers are busy and no space is left in the channel.
works <- cid
}
close(works)
}()
var analyzeResults sync.WaitGroup
analyzeResults.Add(2)
// process errors
go func() {
for err := range errChan {
log.Printf("error %v", err)
}
analyzeResults.Done()
}()
// process outout
go func() {
for val := range out {
if err := stream.Send(val); err != nil {
log.Printf("send error %v", err)
}
}
analyzeResults.Done()
}()
workerWg.Wait()
close(out)
close(errChan)
analyzeResults.Wait()
}
func worker(job *sync.WaitGroup, works chan int, errChan chan error, out chan *custPbV1.CustomerResponse) {
defer job.Done()
// Idle worker takes the work from this channel.
for cid := range works {
pd := repo.GetCustomerData(strconv.FormatInt(cid, 10))
if !pd.IsCorrect {
errChan <- errors.New(fmt.Sprintf("pd %d is incorrect", pd))
// we can not return here as the total number of workers will be reduced. If all the workers does this then there is a chance that no workers are there to do the job
continue
}
resources := us.helperCom.GenerateResourceString(pd)
val, err := us.GenerateInfo(clientId, resources, cfg)
if err != nil {
errChan <- errors.New(fmt.Sprintf("got error", err))
continue
}
out <- val
}
}
Explanation:
This is a worker pool implementation where we spawn a fixed number of goroutines(100 workers here) to do the same job(GetCustomerData() & GenerateInfo() here) but with different input data(cid here). 100 workers here does not mean that it is parallel but concurrent(depends on the GOMAXPROCS). If one worker is waiting for io result(basically some blocking operation)then that particular goroutine will be context switched and other worker goroutine gets a chance to execute. But increasing goroutuines (workers) may not give much performance but can leads to contention on the channel as more workers are waiting for the input job on that channel.
The benefit over splitting the 1 million data to subslice is that. Lets say we have 1000 jobs and 100 workers. each worker will get assigned to the jobs 1-10, 11-20 etc... What if the first 10 jobs is taking more time than others. In that case the first worker is overloaded and the other workers will finish the tasks and will be idle even though there are pending tasks. So to avoid this situation, this is the best solution as the idle worker will take the next job. So that no worker is more overloaded compared to the other workers
There are plenty of examples of how to use WaitGroup to wait for all of a group of goroutines to finish, but what if you want to wait for any one of them to finish without using a semaphore system where some process must be waiting? For example, a producer/consumer scenario where multiple producer threads add multiple entries to a data structure while a consumer is removing them one at a time. In this scenario:
We can't just use the standard producer/consumer semaphore system, because production:consumption is not 1:1, and also because the data structure acts as a cache, so the producers can be "free-running" instead of blocking until a consumer is "ready" to consume their product.
The data structure may be emptied by the consumer, in which case, the consumer wants to wait until any one of the producers finishes (meaning that there might be new things in the data structure)
Question: Is there a standard way to do that?
I've only been able to devise two methods of doing this. Both by using channels as semaphores:
var unitary_channel chan int = make(chan int, 1)
func my_goroutine() {
// Produce, produce, produce!!!
unitary_channel<-0 // Try to push a value to the channel
<-unitary_channel // Remove it, in case nobody was waiting
}
func main() {
go my_goroutine()
go my_goroutine()
go my_goroutine()
for len(stuff_to_consume) { /* Consume, consume, consume */ }
// Ran out of stuff to consume
<-unitary_channel
unitary_channel<-0 // To unblock the goroutine which was exiting
// Consume more
}
Now, this simplistic example has some glaring (but solvable issues), like the fact that main() can't exit if there wasn't at least one go_routine() still running.
The second method, instead of requiring producers to remove the value they just pushed to the channel, uses select to allow the producers to exit when the channel would block them.
var empty_channel chan int = make(chan int)
func my_goroutine() {
// Produce, produce, produce!!!
select {
case empty_channel <- 0: // Push if you can
default: // Or don't if you can't
}
}
func main() {
go my_goroutine()
go my_goroutine()
go my_goroutine()
for len(stuff_to_consume) { /* Consume, consume, consume */ }
// Ran out of stuff to consume
<-unitary_channel
// Consume more
}
Of course, this one will also block main() forever if all of the goroutines have already terminated. So, if the answer to the first question is "No, there's no standard solution to this other than the ones you've come up with", is there a compelling reason why one of these should be used instead of the other?
you could use a channel with a buffer like this
// create a channel with a buffer of 1
var Items = make(chan int, 1)
var MyArray []int
func main() {
go addItems()
go addItems()
go addItems()
go sendToChannel()
for true {
fmt.Println(<- Items)
}
}
// push a number to the array
func addItems() {
for x := 0; x < 10; x++ {
MyArray = append(MyArray, x)
}
}
// push to Items and pop the array
func sendToChannel() {
for true {
for len(MyArray) > 0 {
Items <- MyArray[0]
MyArray = MyArray[1:]
}
time.Sleep(10 * time.Second)
}
}
the for loop in main will loop for ever and print anything that gets added to the channel and the sendToChannel function will block when the array is empty,
this way a producer will never be blocked and a consumer can consume when there are one or more items available
I'm newbie in programming and I need help. Trying to write gitlab scraper on golang.
Something goes wrong when i'm trying to get information about projects in multithreading mode.
Here is the code:
func (g *Gitlab) getAPIResponce(url string, structure interface{}) error {
responce, responce_error := http.Get(url)
if responce_error != nil {
return responce_error
}
ret, _ := ioutil.ReadAll(responce.Body)
if string(ret) != "[]" {
err := json.Unmarshal(ret, structure)
return err
}
return errors.New(error_emptypage)
}
...
func (g *Gitlab) GetProjects() {
projects_chan := make(chan Project, g.LatestProjectID)
var waitGroup sync.WaitGroup
queue := make(chan struct{}, 50)
for i := g.LatestProjectID; i > 0; i-- {
url := g.BaseURL + projects_url + "/" + strconv.Itoa(i) + g.Token
waitGroup.Add(1)
go func(url string, channel chan Project) {
queue <- struct{}{}
defer waitGroup.Done()
var oneProject Project
err := g.getAPIResponce(url, &oneProject)
if err != nil {
fmt.Println(err.Error())
}
fmt.Printf(".")
channel <- oneProject
<-queue
}(url, projects_chan)
}
go func() {
waitGroup.Wait()
close(projects_chan)
}()
for project := range projects_chan {
if project.ID != 0 {
g.Projects = append(g.Projects, project)
}
}
}
And here is the output:
$ ./gitlab-auditor
latest project = 1532
Gathering projects...
.......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................Get https://gitlab.example.com/api/v4/projects/563&private_token=SeCrEt_ToKeN: unexpected EOF
Get https://gitlab.example.com/api/v4/projects/558&private_token=SeCrEt_ToKeN: unexpected EOF
..Get https://gitlab.example.com/api/v4/projects/531&private_token=SeCrEt_ToKeN: unexpected EOF
Get https://gitlab.example.com/api/v4/projects/571&private_token=SeCrEt_ToKeN: unexpected EOF
.Get https://gitlab.example.com/api/v4/projects/570&private_token=SeCrEt_ToKeN: unexpected EOF
..Get https://gitlab.example.com/api/v4/projects/467&private_token=SeCrEt_ToKeN: unexpected EOF
Get https://gitlab.example.com/api/v4/projects/573&private_token=SeCrEt_ToKeN: unexpected EOF
................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Every time it's different projects, but it's id is around 550.
When I'm trying to curl links from output, i'm getting normal JSON. When I'm trying to run this code with queue := make(chan struct{}, 1) (in single thread) - everything is fine.
What can it be?
i would say this not a very clear way to achieve concurrency.
what seems to be happening here is
you create a buffered channel that has a size of 50.
then you fire up 1532 goroutines
the first 50 of them enqueue themselves and start processing. by the time they <-queue and free up somespace a random one from the next manages to get on the queue.
as people say in the comments most certainly you hit some limits around the time it the blast has made it around id 550. Then gitlab's API is angry at you and rate limits.
then another goroutine is fired that will close the channel to notify the main goroutine
the main goroutine reads messages.
the talk go concurrency patterns
as well as this blog post concurrency in go might help.
personally i rarely use buffered channels. for your problem i would go like:
define a number of workers
have the main goroutine fire up the workers with a func listening on a channel of ints , doing the api call, writing to a channel of projects
have the main goroutine send to a channel of ints the project number to be fetched and read from the channel of projects.
maybe ratelimit by firing a ticker and have main read from it before it sends the next request?
main closes the number channel to notify the others to die.
I'm in the process of learning how to do concurrency, and I've written this as its own app so that I can port it into a different project once it's working.
The project I'm adding it to will basically send in a RowInfo to a global QueueChannel, and then my workers should pick up this work and process it. If I queue two rows with the same ID, and one of them is currently processing by a worker, I'll remove the duplicate row from the queue (as you can see where I do my "continue" in the dispatcher).
This queueing/worker code will be running on a web server blocking on ListenAndServe, so I want it to always remain running and the workers to always remain actively looking for jobs. I don't want to have to close the channels (unless perhaps I ctrl+C'd the app or something). I suspect the error I'm getting has something to do with not closing channels because that's what a lot of the other threads mentioning this error seem to indicate, but I'm not sure how it relates to the code I have exactly.
Terminal error output:
[~/go/src/github.com/zzz/asynch]> go run main.go
fatal error: all goroutines are asleep - deadlock!
goroutine 1 [chan send]:
main.main()
/home/zzz/go/src/github.com/zzz/asynch/main.go:29 +0x14b
goroutine 5 [select]:
main.diszzzcher(0xc82001a120, 0xc82001a180, 0xc82001a1e0)
/home/zzz/go/src/github.com/zzz/asynch/main.go:42 +0x21a
created by main.main
/home/zzz/go/src/github.com/zzz/asynch/main.go:19 +0xb1
goroutine 6 [chan receive]:
main.worker(0xc82001a180, 0xc82001a1e0)
/home/zzz/go/src/github.com/zzz/asynch/main.go:55 +0x54
created by main.main
/home/zzz/go/src/github.com/zzz/asynch/main.go:24 +0xf7
goroutine 7 [chan receive]:
main.worker(0xc82001a180, 0xc82001a1e0)
/home/zzz/go/src/github.com/zzz/asynch/main.go:55 +0x54
created by main.main
/home/zzz/go/src/github.com/zzz/asynch/main.go:24 +0xf7
goroutine 8 [chan receive]:
main.worker(0xc82001a180, 0xc82001a1e0)
/home/zzz/go/src/github.com/zzz/asynch/main.go:55 +0x54
created by main.main
/home/zzz/go/src/github.com/zzz/asynch/main.go:24 +0xf7
goroutine 9 [chan receive]:
main.worker(0xc82001a180, 0xc82001a1e0)
/home/zzz/go/src/github.com/zzz/asynch/main.go:55 +0x54
created by main.main
/home/zzz/go/src/github.com/zzz/asynch/main.go:24 +0xf7
exit status 2
Code:
package main
import (
"log"
"time"
)
type RowInfo struct {
id int64
}
var QueueChan chan RowInfo
func main() {
QueueChan := make(chan RowInfo)
workerChan := make(chan RowInfo)
exitChan := make(chan int64)
go dispatcher(QueueChan, workerChan, exitChan)
// Start WorkerCount number of workers
workerCount := 4
for i := 0; i < workerCount; i++ {
go worker(workerChan, exitChan)
}
// Send test data
for i := 0; i < 12; i++ {
QueueChan <- RowInfo{id: int64(i)}
}
// Prevent app close
for {
time.Sleep(1 * time.Second)
}
}
func dispatcher(queueChan, workerChan chan RowInfo, exitChan chan int64) {
state := make(map[int64]bool)
for {
select {
case job := <-QueueChan:
if state[job.id] == true {
continue
}
workerChan <- job
case result := <-exitChan:
state[result] = false
}
}
}
func worker(workerChan chan RowInfo, exitChan chan int64) {
for job := range workerChan {
log.Printf("Doing work on job rowInfo ID: %d", job.id)
// Finish job
exitChan <- job.id
}
}
Thank you.
The error tells you: all goroutines are asleep, the program has deadlocked.
Now why are all your goroutines asleep? Let's check them one by one:
the worker goroutines: Wait indefinitely for new work on workerChan, will not exit until workerChan is closed, is asleep whenever it waits for new work
the dispatcher goroutine: Loops forever, selecting over two channels. Will never exit, is asleep while waiting in the select
the main goroutine: Loops forever on a time.Sleep, will never exit and be asleep most of the time
Typically, in a situation like this, you'd introduce a chan struct{} (call it closing or something like that) and inclue it in your selects. If you want to close the program, just close(closing). The select will choose the <-closing option, you return the goroutines.
You should also add a sync.WaitGroup to be notified when all your goroutines have exited.
I have a Go program that runs continuously and relies entirely on goroutines + 1 manager thread. The main thread simply calls goroutines and otherwise sleeps.
There is a memory leak. The program uses more and more memory until it drains all 16GB RAM + 32GB SWAP and then each goroutine panics. It is actually OS memory that causes the panic, usually the panic is fork/exec ./anotherapp: cannot allocate memory when I try to execute anotherapp.
When this happens all of the worker threads will panic and be recovered and restarted. So each goroutine will panic, be recovered and restarted... at which point the memory usage will not decrease, it remains at 48GB even though there is now virtually nothing allocated. This means all goroutines will always panic as there is never enough memory, until the entire executable is killed and restarted completely.
The entire thing is about 50,000 lines, but the actual problematic area is as follows:
type queue struct {
identifier string
type bool
}
func main() {
// Set number of gorountines that can be run
var xthreads int32 = 10
var usedthreads int32
runtime.GOMAXPROCS(14)
ready := make(chan *queue, 5)
// Start the manager goroutine, which prepared identifiers in the background ready for processing, always with 5 waiting to go
go manager(ready)
// Start creating goroutines to process as they are ready
for obj := range ready { // loops through "ready" channel and waits when there is nothing
// This section uses atomic instead of a blocking channel in an earlier attempt to stop the memory leak, but it didn't work
for atomic.LoadInt32(&usedthreads) >= xthreads {
time.Sleep(time.Second)
}
debug.FreeOSMemory() // Try to clean up the memory, also did not stop the leak
atomic.AddInt32(&usedthreads, 1) // Mark goroutine as started
// Unleak obj, probably unnecessary, but just to be safe
copy := new(queue)
copy.identifier = unleak.String(obj.identifier) // unleak is a 3rd party package that makes a copy of the string
copy.type = obj.type
go runit(copy, &usedthreads) // Start the processing thread
}
fmt.Println(`END`) // This should never happen as the channels are never closed
}
func manager(ready chan *queue) {
// This thread communicates with another server and fills the "ready" channel
}
// This is the goroutine
func runit(obj *queue, threadcount *int32) {
defer func() {
if r := recover(); r != nil {
// Panicked
erstring := fmt.Sprint(r)
reportFatal(obj.identifier, erstring)
} else {
// Completed successfully
reportDone(obj.identifier)
}
atomic.AddInt32(threadcount, -1) // Mark goroutine as finished
}()
do(obj) // This function does the actual processing
}
As far as I can see, when the do function (last line) ends, either by having finished or having panicked, the runit function then ends, which ends the goroutine entirely, which means all of the memory from that goroutine should now be free. This is now what happens. What happens is that this app just uses more and more and more memory until it becomes unable to function, all the runit goroutines panic, and yet the memory does not decrease.
Profiling does not reveal anything suspicious. The leak appears to be outside of the profiler's scope.
Please consider inverting the pattern, see here or below....
package main
import (
"log"
"math/rand"
"sync"
"time"
)
// I do work
func worker(id int, work chan int) {
for i := range work {
// Work simulation
log.Printf("Worker %d, sleeping for %d seconds\n", id, i)
time.Sleep(time.Duration(rand.Intn(i)) * time.Second)
}
}
// Return some fake work
func getWork() int {
return rand.Intn(2) + 1
}
func main() {
wg := new(sync.WaitGroup)
work := make(chan int)
// run 10 workers
for i := 0; i < 10; i++ {
wg.Add(1)
go func(i int) {
worker(i, work)
wg.Done()
}(i)
}
// main "thread"
for i := 0; i < 100; i++ {
work <- getWork()
}
// signal there is no more work to be done
close(work)
// Wait for the workers to exit
wg.Wait()
}