Golang: can WaitGroup leak with go-routines - multithreading

I am planning to implement a go-routine and have a sync.WaitGroup to synchronize end of a created go-routine. I create a thread essentially using go <function>. So it is something like:
main() {
var wg sync.WaitGroup
for <some condition> {
go myThread(wg)
wg.Add(1)
}
wg.wait()
}
myThread(wg sync.WaitGroup) {
defer wg.Done()
}
I have earlier worked with pthread_create which does fail to create a thread under some circumstances. With that context, is it possibly for the above go myThread(wg) to fail to start, and/or run wg.Done() if the rest of the routine behaves correctly? If so, what would be reported and how would the error be caught? My concern is a possible leak in wg due to wg.Add(1) following the thread creation. (Of course it may be possible to use wg.Add(1) within the function, but that leads to other races between the increment and the main program waiting).
I have read through numerous documentation of go-routines and there is no mention of a failure in scheduling or thread creation anywhere. What would be the case if I create billions of threads and exhaust bookkeeping space? Would the go-routine still work and threads still be created?

I don't know of any possible way for this to fail, and if it is possible, it would result in a panic (and therefor application crash). I have never seen it happen, and I'm aware of examples of applications running millions of goroutines. The only limiting factor is available memory to allocate the goroutine stack.
go foo() is not like pthread_create. Goroutines are lightweight green threads handled by the Go runtime, and scheduled to run on OS threads. Starting a goroutine does not start a new OS thread.

The problem with your code is not in starting a goroutine (which cannot "fail" per se) or that like but in the use of sync.WaitGroup. Your code has two major bugs:
You must do wg.Add(1) before launching the goroutine as otherwise the Done() could be executed before the Add(1).
You must not copy a sync.WaitGroup. Your code makes a copy while calling myThread().
Both issues are explained in the official documentation to sync.WaitGroup and the given example in https://golang.org/pkg/sync/#WaitGroup

Related

Best way to wake 0-N sleeping goroutines at once

I'm writing a program where I start N (N is a command-line argument) worker threads, and at any time 0 to N-1 of them can be waiting on another to update a variable. What's the best way for the threads to wait for this event, and the best way for one of the threads to notify all the others at once of the event occurring? This event will be sent multiple times by each thread.
sync.Cond isn't appropriate because the threads don't need to lock a resource upon waking from sleep. sync.WaitGroup won't work because I don't know how many times to call wg.Done().
Solution #1: I could use a sync.Mutex and have the thread that will eventually notify the others acquire the lock and then unlock it to notify the others, but it seems really inefficient for the others to all fight over a lock when they all just need to pop out of sleep, read a variable to see if that particular worker is now the master, and then either go back to sleep or start working.
Solution #2: Create a wrapper for sync.WaitGroup that allows keeping track of the number of waiting threads so that I can call wg.Add(-numWaitingThreads) to wake them. This sounds like a headache to figure out how to code it without all sorts of race conditions.
Solution #3: Until someone comes up with a better idea, I'll be using a list of N channels and have the notifier non-blocking-send to all of the channels except its own. Is this really the best way?
More details: I give each worker some unique credits and have a central variable for "which credit is the next to be written to the output file". When a worker finishes its work for whichever credit ID it was working on, it needs to do the following:
for centralNextCreditID != creditID {
wait_for_centralNextCreditID_to_change()
}
saveWorkToFile()
centralNextCreditID++
wake_other_threads_waiting_for_centralNextCreditID_to_change()
To me it does seem like this is an appropriate use case for sync.Cond. You can use a *RWMutex.RLocker() for Cond.L so all goroutines can acquire the read lock simultaneously once the Cond.Broadcast() is sent.
Additionally, it may be worth making sure you hold a write lock when changing this "who's master" variable to avoid race conditions, which would make sync.Cond an even better fit.
sync.WaitGroup won't work because I don't know how many times to call wg.Done().
wg can be used in this case. Make a wg with count 1 and pass this to the N goroutines. Make them wg.Wait(), except the one that updates the variable.
The goroutine updating the variable calls wg.Done() after successful update thus resulting in N goroutines to come out of wait and start executing further.
The title says that you want to wake 0-N sleeping goroutines, but the body of the question indicates that you only need to wake the goroutine for the next id (if there is a goroutine waiting).
Here's how to implement the problem described in the body of the question:
// waiter sequences work according to an incrementing id.
type waiter struct {
mu sync.Mutex
id int
waiting map[int]chan struct{}
}
func NewWaiter(firstID int) *waiter {
return &waiter{id: firstID, waiting: make(map[int]chan struct{})}
}
// wait waits for id's turn in the sequence.
func (w *waiter) wait(id int) {
w.mu.Lock()
if w.id == id {
// This id is next. Nothing to do.
w.mu.Unlock()
return
}
// Wait for our turn.
c := make(chan struct{})
w.waiting[id] = c
w.mu.Unlock()
<-c
}
// done signals that the work for the previous id is done.
func (w *waiter) done() {
w.mu.Lock()
w.id++
c, ok := w.waiting[w.id]
if ok {
delete(w.waiting, w.id)
}
w.mu.Unlock()
if ok {
// close cause c to receive a zero value
close(c)
}
}
Here's how to use it:
for _, creditID := range creditIDs {
doWorkFor(creditID)
waiter.wait(creditID)
saveWorkToFile()
waiter.done()
}
WaitGroup is the best option. The reason is that is keeps its signalled state and you are safe from deadlock if the main thread signals too early.
If you use Cond there is a risk that the main thread calls cond.Broadcast BEFORE the worker thread calls cond.Wait(). Since Cond doesn't remember that it was signalled, the worker thread will wait for the event to happen.
Here is an example: https://go.dev/play/p/YLfvEGO2A18
The main thread broadcasts too early, the worker threads run into a deadlock.
Same case with con.WaitGroup: https://go.dev/play/p/R6_-ULo2eJ2
The main thread releases the wait group too early, but there is no deadlock.

Will Go's scheduler yield control from one goroutine to another for CPU-intensive work?

The accepted answer at golang methods that will yield goroutines explains that Go's scheduler will yield control from one goroutine to another when a syscall is encountered. I understand that this means if you have multiple goroutines running, and one begins to wait for something like an HTTP response, the scheduler can use this as a hint to yield control from that goroutine to another.
But what about situations where there are no syscalls involved? What if, for example, you had as many goroutines running as logical CPU cores/threads available, and each were in the middle of a CPU-intensive calculation that involved no syscalls. In theory, this would saturate the CPU's ability to do work. Would the Go scheduler still be able to detect an opportunity to yield control from one of these goroutines to another, that perhaps wouldn't take as long to run, and then return control back to one of these goroutines performing the long CPU-intensive calculation?
There are few if any promises here.
The Go 1.14 release notes says this in the Runtime section:
Goroutines are now asynchronously preemptible. As a result, loops without function calls no longer potentially deadlock the scheduler or significantly delay garbage collection. This is supported on all platforms except windows/arm, darwin/arm, js/wasm, and plan9/*.
A consequence of the implementation of preemption is that on Unix systems, including Linux and macOS systems, programs built with Go 1.14 will receive more signals than programs built with earlier releases. This means that programs that use packages like syscall or golang.org/x/sys/unix will see more slow system calls fail with EINTR errors. ...
I quoted part of the third paragraph here because this gives us a big clue as to how this asynchronous preemption works: the runtime system has the OS deliver some OS signal (SIGALRM, SIGVTALRM, etc.) on some sort of schedule (real or virtual time). This allows the Go runtime to implement the same kind of schedulers that real OSes implement with real (hardware) or virtual (virtualized hardware) timers. As with OS schedulers, it's up to the runtime to decide what to do with the clock ticks: perhaps just run the GC code, for instance.
We also see a list of platforms that don't do it. So we probably should not assume it will happen at all.
Fortunately, the runtime source is actually available: we can go look to see what does happen, should any given platform implement it. This shows that in runtime/signal_unix.go:
// We use SIGURG because it meets all of these criteria, is extremely
// unlikely to be used by an application for its "real" meaning (both
// because out-of-band data is basically unused and because SIGURG
// doesn't report which socket has the condition, making it pretty
// useless), and even if it is, the application has to be ready for
// spurious SIGURG. SIGIO wouldn't be a bad choice either, but is more
// likely to be used for real.
const sigPreempt = _SIGURG
and:
// doSigPreempt handles a preemption signal on gp.
func doSigPreempt(gp *g, ctxt *sigctxt) {
// Check if this G wants to be preempted and is safe to
// preempt.
if wantAsyncPreempt(gp) && isAsyncSafePoint(gp, ctxt.sigpc(), ctxt.sigsp(), ctxt.siglr()) {
// Inject a call to asyncPreempt.
ctxt.pushCall(funcPC(asyncPreempt))
}
// Acknowledge the preemption.
atomic.Xadd(&gp.m.preemptGen, 1)
atomic.Store(&gp.m.signalPending, 0)
}
The actual asyncPreempt function is in assembly, but it just does some assembly-only trickery to save user registers, and then calls asyncPreempt2 which is in runtime/preempt.go:
//go:nosplit
func asyncPreempt2() {
gp := getg()
gp.asyncSafePoint = true
if gp.preemptStop {
mcall(preemptPark)
} else {
mcall(gopreempt_m)
}
gp.asyncSafePoint = false
}
Compare this to runtime/proc.go's Gosched function (documented as the way to voluntarily yield):
//go:nosplit
// Gosched yields the processor, allowing other goroutines to run. It does not
// suspend the current goroutine, so execution resumes automatically.
func Gosched() {
checkTimeouts()
mcall(gosched_m)
}
We see the main differences include some "async safe point" stuff and that we arrange for an M-stack-call to gopreempt_m instead of gosched_m. So, apart from the safety check stuff and a different trace call (not shown here) the involuntary preemption is almost exactly the same as voluntary preemption.
To find this, we had to dig rather deep into the (Go 1.14, in this case) implementation. One might not want to depend too much on this.
A little bit more on this to complete #torek's answer.
Goroutines are interruptible when there is a syscall, but also when a routine is waiting on a lock, a chan or sleeping.
As #torek's said, since 1.14 routines can also be preempted even when they do none of the above. The scheduler can mark any routine as preemptible after it ran for more than 10ms.
More reading there: https://medium.com/a-journey-with-go/go-goroutine-and-preemption-d6bc2aa2f4b7

Goroutines are cooperatively scheduled. Does that mean that goroutines that don't yield execution will cause goroutines to run one by one?

From: http://blog.nindalf.com/how-goroutines-work/
As the goroutines are scheduled cooperatively, a goroutine that loops continuously can starve other goroutines on the same thread.
Goroutines are cheap and do not cause the thread on which they are multiplexed to block if they are blocked on
network input
sleeping
channel operations or
blocking on primitives in the sync package.
So given the above, say that you have some code like this that does nothing but loop a random number of times and print the sum:
func sum(x int) {
sum := 0
for i := 0; i < x; i++ {
sum += i
}
fmt.Println(sum)
}
if you use goroutines like
go sum(100)
go sum(200)
go sum(300)
go sum(400)
will the goroutines run one by one if you only have one thread?
A compilation and tidying of all of creker's comments.
Preemptive means that kernel (runtime) allows threads to run for a specific amount of time and then yields execution to other threads without them doing or knowing anything. In OS kernels that's usually implemented using hardware interrupts. Process can't block entire OS. In cooperative multitasking thread have to explicitly yield execution to others. If it doesn't it could block whole process or even whole machine. That's how Go does it. It has some very specific points where goroutine can yield execution. But if goroutine just executes for {} then it will lock entire process.
However, the quote doesn't mention recent changes in the runtime. fmt.Println(sum) could cause other goroutines to be scheduled as newer runtimes will call scheduler on function calls.
If you don't have any function calls, just some math, then yes, goroutine will lock the thread until it exits or hits something that could yield execution to others. That's why for {} doesn't work in Go. Even worse, it will still lead to process hanging even if GOMAXPROCS > 1 because of how GC works, but in any case you shouldn't depend on that. It's good to understand that stuff but don't count on it. There is even a proposal to insert scheduler calls in loops like yours
The main thing that Go's runtime does is it gives its best to allow everyone to execute and don't starve anyone. How it does that is not specified in the language specification and might change in the future. If the proposal about loops will be implemented then even without function calls switching could occur. At the moment the only thing you should remember is that in some circumstances function calls could cause goroutine to yield execution.
To explain the switching in Akavall's answer, when fmt.Printf is called, the first thing it does is checks whether it needs to grow the stack and calls the scheduler. It MIGHT switch to another goroutine. Whether it will switch depends on the state of other goroutines and exact implementation of the scheduler. Like any scheduler, it probably checks whether there're starving goroutines that should be executed instead. With many iterations function call has greater chance to make a switch because others are starving longer. With few iterations goroutine finishes before starvation happens.
For what its worth it. I can produce a simple example where it is clear that the goroutines are not ran one by one:
package main
import (
"fmt"
"runtime"
)
func sum_up(name string, count_to int, print_every int, done chan bool) {
my_sum := 0
for i := 0; i < count_to; i++ {
if i % print_every == 0 {
fmt.Printf("%s working on: %d\n", name, i)
}
my_sum += 1
}
fmt.Printf("%s: %d\n", name, my_sum)
done <- true
}
func main() {
runtime.GOMAXPROCS(1)
done := make(chan bool)
const COUNT_TO = 10000000
const PRINT_EVERY = 1000000
go sum_up("Amy", COUNT_TO, PRINT_EVERY, done)
go sum_up("Brian", COUNT_TO, PRINT_EVERY, done)
<- done
<- done
}
Result:
....
Amy working on: 7000000
Brian working on: 8000000
Amy working on: 8000000
Amy working on: 9000000
Brian working on: 9000000
Brian: 10000000
Amy: 10000000
Also if I add a function that just does a forever loop, that will block the entire process.
func dumb() {
for {
}
}
This blocks at some random point:
go dumb()
go sum_up("Amy", COUNT_TO, PRINT_EVERY, done)
go sum_up("Brian", COUNT_TO, PRINT_EVERY, done)
Well, let's say runtime.GOMAXPROCS is 1. The goroutines run concurrently one at a time. Go's scheduler just gives the upper hand to one of the spawned goroutines for a certain time, then to another, etc until all are finished.
So, you never know which goroutine is running at a given time, that's why you need to synchronize your variables. From your example, it's unlikely that sum(100) will run fully, then sum(200) will run fully, etc
The most probable is that one goroutine will do some iterations, then another will do some, then another again etc.
So, the overall is that they are not sequential, even if there is only one goroutine active at a time (GOMAXPROCS=1).
So, what's the advantage of using goroutines ? Plenty. It means that you can just do an operation in a goroutine because it is not crucial and continue the main program. Imagine an HTTP webserver. Treating each request in a goroutine is convenient because you do not have to care about queueing them and run them sequentially: you let Go's scheduler do the job.
Plus, sometimes goroutines are inactive, because you called time.Sleep, or they are waiting for an event, like receiving something for a channel. Go can see this and just executes other goroutines while some are in those idle states.
I know there are a handful of advantages I didn't present, but I don't know concurrency that much to tell you about them.
EDIT:
Related to your example code, if you add each iteration at the end of a channel, run that on one processor and print the content of the channel, you'll see that there is no context switching between goroutines: Each one runs sequentially after another one is done.
However, it is not a general rule and is not specified in the language. So, you should not rely on these results for drawing general conclusions.
#Akavall Try adding sleep after creating dumb goroutine, goruntime never executes sum_up goroutines.
From that it looks like go runtime spawns next go routines immediately, it might execute sum_up goroutine until go runtime schedules dumb() goroutine to run. Once dumb() is scheduled to run then go runtime won't schedule sum_up goroutines to run, as dumb runs for{}

Why doesn't Go's LockOSThread lock this OS thread?

The documentation for runtime.LockOsThread states:
LockOSThread wires the calling goroutine to its current operating system thread. Until the calling goroutine exits or calls UnlockOSThread, it will always execute in that thread, and no other goroutine can.
But consider this program:
package main
import (
"fmt"
"runtime"
"time"
)
func main() {
runtime.GOMAXPROCS(1)
runtime.LockOSThread()
go fmt.Println("This shouldn't run")
time.Sleep(1 * time.Second)
}
The main goroutine is wired to the one available OS thread set by GOMAXPROCS, so I would expect that the goroutine created on line 3 of main will not run. But instead the program prints This shouldn't run, pauses for 1 second, and quits. Why does this happen?
From the runtime package documentation:
The GOMAXPROCS variable limits the number of operating system threads that can execute user-level Go code simultaneously. There is no limit to the number of threads that can be blocked in system calls on behalf of Go code; those do not count against the GOMAXPROCS limit.
The sleeping thread doesn't count towards the GOMAXPROCS value of 1, so Go is free to have another thread run the fmt.Println goroutine.
Here's a Windows sample that will probably help you understand what's going on. It prints thread IDs on which goroutines are running. Had to use syscall so it works only on Windows. But you can easily port it to other systems.
package main
import (
"fmt"
"runtime"
"golang.org/x/sys/windows"
)
func main() {
runtime.GOMAXPROCS(1)
runtime.LockOSThread()
ch := make(chan bool, 0)
go func(){
fmt.Println("2", windows.GetCurrentThreadId())
<- ch
}()
fmt.Println("1", windows.GetCurrentThreadId())
<- ch
}
I don't use sleep to prevent runtime from spawning another thread for sleeping goroutine. Channel will block and just remove goroutine from running queue. If you execute the code you will see that thread IDs are different. Main goroutine locked one of the threads so the runtime has to spawn another one.
As you already know GOMAXPROCS does not prevent the runtime from spawning more threads. GOMAXPROCS is more about the number of threads that can execute goroutines in parallel. But more threads can be created for goroutines that are waiting for syscall to complete, for example.
If you remove runtime.LockOSThread() you will see that thread IDs are equal. That's because channel read blocks the goroutine and allows the runtime to yield execution to another goroutine without spawning new thread. That's how multiple goroutines can execute concurrently even when GOMAXPROCS is 1.
GOMAXPROCS(1) which causes you to have one ACTIVE M (OS thread) to be present to server the go routines (G).
In your program there are two Go routines, one is main, and the other is fmt.Println. Since main routine is in sleep, M is free to run any go routine, which in this case fmt.Println can run.
That looks like correct behavior to me. From what I understand the LockOSThread() function only ties all future go calls to the single OS thread, it does not sleep or halt the thread ever.
Edit for clarity: the only thing that LockOSThread() does is turn off multithreadding so that all future GO calls happen on a single thread. This is primarily for use with things like graphics API's that require a single thread to work correctly.
Edited again.
If you compare this:
func main() {
runtime.GOMAXPROCS(6)
//insert long running routine here.
go fmt.Println("This may run almost straight away if tho long routine uses a different thread")
}
To this:
func main() {
runtime.GOMAXPROCS(6)
runtime.LockOSThread()
//insert long running routine here.
go fmt.Println("This will only run after the task above has completed")
}
If we insert a long running routine in the location indicated above then the first block may run almost straight away if the long routine runs on a new thread, but in the second example it will always have to wait for the routine to complete.

How to close thread winapi

what is the rigth way to close Thread in Winapi, threads don't use common resources.
I am creating threads with CreateThread , but I don't know how to close it correctly in ,because someone suggest to use TerminateThread , others ExitThread , but what is the correct way to close it .
Also where should I call closing function in WM_CLOSE or WM_DESTROY ?
Thx in advance .
The "nicest" way to close a thread in Windows is by "telling" the thread to shutdown via some thread-safe signaling mechanism, then simply letting it reach its demise its own, potentially waiting for it to do so via one of the WaitForXXXX functions if completion detection is needed (which is frequently the case). Something like:
Main thread:
// some global event all threads can reach
ghStopEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
// create the child thread
hThread = CreateThread(NULL, 0, ThreadProc, NULL, 0, NULL);
//
// ... continue other work.
//
// tell thread to stop
SetEvent(ghStopEvent);
// now wait for thread to signal termination
WaitForSingleObject(hThread, INFINITE);
// important. close handles when no longer needed
CloseHandle(hThread);
CloseHandle(ghStopEvent);
Child thread:
DWORD WINAPI ThreadProc(LPVOID pv)
{
// do threaded work
while (WaitForSingleObject(ghStopEvent, 1) == WAIT_TIMEOUT)
{
// do thread busy work
}
return 0;
}
Obviously things can get a lot more complicated once you start putting it in practice. If by "common" resources you mean something like the ghStopEvent in the prior example, it becomes considerably more difficult. Terminating a child thread via TerminateThread is strongly discouraged because there is no logical cleanup performed at all. The warnings specified in the `TerminateThread documentation are self-explanatory, and should be heeded. With great power comes....
Finally, even the called thread invoking ExitThread is not required explicitly by you, and though you can do so, I strongly advise against it in C++ programs. It is called for you once the thread procedure logically returns from the ThreadProc. I prefer the model above simply because it is dead-easy to implement and supports full RAII of C++ object cleanup, which neither ExitThread nor TerminateThread provide. For example, the ExitThread documentation :
...in C++ code, the thread is exited before any destructors can be called
or any other automatic cleanup can be performed. Therefore, in C++
code, you should return from your thread function.
Anyway, start simple. Get a handle on things with super-simple examples, then work your way up from there. There are a ton of multi-threaded examples on the web, Learn from the good ones and challenge yourself to identify the bad ones.
Best of luck.
So you need to figure out what sort of behaviour you need to have.
Following is a simple description of the methods taken from documentation:
"TerminateThread is a dangerous function that should only be used in the most extreme cases. You should call TerminateThread only if you know exactly what the target thread is doing, and you control all of the code that the target thread could possibly be running at the time of the termination. For example, TerminateThread can result in the following problems:
If the target thread owns a critical section, the critical section will not be released.
If the target thread is allocating memory from the heap, the heap lock will not be released.
If the target thread is executing certain kernel32 calls when it is terminated, the kernel32 state for the thread's process could be inconsistent.
If the target thread is manipulating the global state of a shared DLL, the state of the DLL could be destroyed, affecting other users of the DLL."
So if you need your thread to terminate at any cost, call this method.
About ExitThread, this is more graceful. By calling ExitThread, you're telling to windows you're done with that calling thread, so the rest of the code isn't going to get called. It's a bit like calling exit(0).
"ExitThread is the preferred method of exiting a thread. When this function is called (either explicitly or by returning from a thread procedure), the current thread's stack is deallocated, all pending I/O initiated by the thread is canceled, and the thread terminates. If the thread is the last thread in the process when this function is called, the thread's process is also terminated."

Resources