Ticker's behavior with time.Sleep() - multithreading

Code:
func main() {
    fmt.Println(time.Now())
    ticker := time.NewTicker(100 * time.Millisecond)
    done := make(chan bool)
    go func() {
        time.Sleep(900 * time.Millisecond)
        for {
            select {
            case <-done:
                return
            case t := <-ticker.C:
                fmt.Println("Tick at", t)
            }
        }
    }()
    time.Sleep(1600 * time.Millisecond)
    ticker.Stop()
    done <- true
    fmt.Println("Ticker stopped")
}
Output:
2021-12-15 17:00:44.2506052 +0800 +08 m=+0.002777301
Tick at 2021-12-15 17:00:44.3916764 +0800 +08 m=+0.143848501
Tick at 2021-12-15 17:00:45.2913066 +0800 +08 m=+1.043478701
Tick at 2021-12-15 17:00:45.4007827 +0800 +08 m=+1.152954801
Tick at 2021-12-15 17:00:45.4930864 +0800 +08 m=+1.245258501
Tick at 2021-12-15 17:00:45.6021253 +0800 +08 m=+1.354297401
Tick at 2021-12-15 17:00:45.6980372 +0800 +08 m=+1.450209301
Tick at 2021-12-15 17:00:45.7929148 +0800 +08 m=+1.545086901
Tick at 2021-12-15 17:00:45.901921 +0800 +08 m=+1.654093101
Ticker stopped
Questions:
How do I interpret the result? More specifically:
Why does the sleep in the goroutine pause the ticker, while the sleep in the main routine does not?
Is ticker.C unbuffered, and is that why there aren't 16 ticks?
Why does the first tick have m=+0.143848501?

The sleep in the goroutine doesn't pause the ticker; it delays the moment when a value is printed for the first time.
ticker.C has a buffer of 1. According to the comments in the runtime source:
// Give the channel a 1-element time buffer.
// If the client falls behind while reading, we drop ticks
// on the floor until the client catches up.
So at most one value is buffered there.
The first tick is written into the channel roughly when the ticker duration elapses for the first time, at ~100 ms. Subsequent ticks are dropped because the one-element buffer in ticker.C is already full, and they keep being dropped until the time.Sleep ends and the channel is read again, which is why there is a jump of ~900 ms.
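To see the dropped ticks for yourself, here is a minimal sketch (the durations are arbitrary): it deliberately ignores ticker.C for five periods, then shows that only a single tick was held in the buffer, with the next one arriving roughly a period later.
package main

import (
    "fmt"
    "time"
)

func main() {
    ticker := time.NewTicker(100 * time.Millisecond)
    defer ticker.Stop()

    // Fall behind: don't read ticker.C for ~5 tick periods.
    time.Sleep(500 * time.Millisecond)

    // Exactly one tick was kept in the 1-element buffer;
    // the intermediate ones were dropped on the floor.
    fmt.Println("buffered tick:", <-ticker.C)

    // The next tick arrives about one period after the drain,
    // not immediately.
    fmt.Println("next tick:", <-ticker.C)
}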


How to reduce the time deviation when using threads?

Here is my attempt at a simple piece of code that gets the current time and would hypothetically trigger a function when the time is right.
{-# LANGUAGE BlockArguments, NumericUnderscores #-}
module Main where

import Control.Concurrent
import Control.Monad (forever, forM, void)
import Data.Time.Clock

main :: IO ()
main = forever do
    forkIO writer
    threadDelay 1_000_000

writer :: IO ()
writer = print =<< getCurrentTime
And I get this:
2021-12-13 09:22:08.7632491 UTC
2021-12-13 09:22:09.7687358 UTC
2021-12-13 09:22:10.7756821 UTC
2021-12-13 09:22:11.7772306 UTC
2021-12-13 09:22:12.7954329 UTC
2021-12-13 09:22:13.8096189 UTC
2021-12-13 09:22:14.8218579 UTC
2021-12-13 09:22:15.826626 UTC
2021-12-13 09:22:16.8291541 UTC
2021-12-13 09:22:17.8358406 UTC
2021-12-13 09:22:18.8468617 UTC
2021-12-13 09:22:19.8490381 UTC
2021-12-13 09:22:20.859682 UTC
2021-12-13 09:22:21.868705 UTC
2021-12-13 09:22:22.88392 UTC
2021-12-13 09:22:23.8893969 UTC
2021-12-13 09:22:24.8940725 UTC
2021-12-13 09:22:25.9026013 UTC
2021-12-13 09:22:26.9181843 UTC
2021-12-13 09:22:27.920115 UTC
2021-12-13 09:22:28.9214061 UTC
2021-12-13 09:22:29.9236218 UTC
2021-12-13 09:22:30.9320501 UTC
2021-12-13 09:22:31.9359116 UTC
2021-12-13 09:22:32.9381218 UTC
2021-12-13 09:22:33.9541171 UTC
2021-12-13 09:22:34.9639691 UTC
2021-12-13 09:22:35.9767943 UTC
2021-12-13 09:22:36.9909998 UTC
2021-12-13 09:22:38.0016628 UTC
2021-12-13 09:22:39.0029746 UTC
2021-12-13 09:22:40.01921 UTC
2021-12-13 09:22:41.0337936 UTC
2021-12-13 09:22:42.0369494 UTC
2021-12-13 09:22:43.0403321 UTC
2021-12-13 09:22:44.0426835 UTC
2021-12-13 09:22:45.0468416 UTC
2021-12-13 09:22:46.0503551 UTC
2021-12-13 09:22:47.0557148 UTC
2021-12-13 09:22:48.066979 UTC
2021-12-13 09:22:49.0723431 UTC
As you might have noticed, the differences are not exact, and drift in the time difference can be critical in my case. Are there any ways to improve this?
I tried the option where a different thread handles the print call, but it makes little difference in the long run.
Thank you!
Now, here's an answer to your original question. The secret is that instead of always waiting for a second between events, you should keep track of a trigger time, always increment it by a second, and wait whatever amount of time is needed to get to the next trigger time. It's actually similar to my other answer in many respects:
{-# LANGUAGE NumericUnderscores #-}
module Main where

import Control.Concurrent
import Control.Monad
import Data.Time

main :: IO ()
main = everySecond =<< getCurrentTime

everySecond :: UTCTime -> IO ()
everySecond tick = do
    forkIO writer
    -- next tick in one second
    let nexttick = addUTCTime (secondsToNominalDiffTime 1) tick
    now <- getCurrentTime
    let wait = nominalDiffTimeToSeconds (diffUTCTime nexttick now)
    threadDelay $ ceiling (wait * 1_000_000)
    everySecond nexttick

writer :: IO ()
writer = print =<< getCurrentTime
Sample output:
2021-12-13 21:16:53.316687476 UTC
2021-12-13 21:16:54.318070692 UTC
2021-12-13 21:16:55.31821399 UTC
2021-12-13 21:16:56.318432887 UTC
2021-12-13 21:16:57.318432582 UTC
2021-12-13 21:16:58.318648861 UTC
2021-12-13 21:16:59.317988137 UTC
2021-12-13 21:17:00.318367675 UTC
2021-12-13 21:17:01.318565036 UTC
2021-12-13 21:17:02.317856019 UTC
2021-12-13 21:17:03.318285608 UTC
2021-12-13 21:17:04.318508451 UTC
2021-12-13 21:17:05.318487069 UTC
2021-12-13 21:17:06.318435325 UTC
2021-12-13 21:17:07.318504691 UTC
2021-12-13 21:17:08.318591666 UTC
2021-12-13 21:17:09.317797443 UTC
2021-12-13 21:17:10.317732578 UTC
2021-12-13 21:17:11.318100396 UTC
2021-12-13 21:17:12.318535002 UTC
2021-12-13 21:17:13.318008916 UTC
2021-12-13 21:17:14.317803441 UTC
2021-12-13 21:17:15.318220664 UTC
2021-12-13 21:17:16.318558786 UTC
2021-12-13 21:17:17.31793816 UTC
2021-12-13 21:17:18.322564881 UTC
2021-12-13 21:17:19.318923334 UTC
2021-12-13 21:17:20.318293808 UTC
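For what it's worth, the same trick translates directly to other languages. Here is a rough sketch of it in Go (the names are mine, and it's an illustration rather than tested production code): each deadline is derived from the previous deadline instead of from "now", so timing errors don't accumulate.
package main

import (
    "fmt"
    "time"
)

// everySecond advances the trigger time by exactly one second each
// round and sleeps only for whatever gap remains until that deadline.
func everySecond(tick time.Time) {
    for {
        go writer()
        tick = tick.Add(time.Second) // next tick in one second
        time.Sleep(time.Until(tick)) // wait the remaining amount of time
    }
}

func writer() {
    fmt.Println(time.Now().UTC())
}

func main() {
    everySecond(time.Now())
}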
Not quite an answer to your question, but if you want to write a program to trigger events at specific times, a more robust design is:
1. Sort the list of (time, event) pairs by time.
2. Sleep for the difference between the first event time in the list and the current time.
3. When you wake up, get/update the current time, then execute and remove from the front of the list all events whose time has "expired" (i.e., event time on or before the current time).
4. If the list is still non-empty, jump to step 2.
This avoids the need to poll every second (which maybe isn't a big deal, but still...) and avoids the possibility that events will be missed because you woke up later than expected.
An example program follows. (This program relies on threadDelay treating negative numbers the same as zero, in case the events take a long time to run, and the actual time overruns the first unexpired event.)
{-# LANGUAGE NumericUnderscores #-}
import Data.List
import Data.Time
import Control.Concurrent

data Event = Event
    { eventTime :: UTCTime
    , eventAction :: IO ()
    }

runEvents :: [Event] -> IO ()
runEvents = go . sortOn eventTime
  where
    go [] = return () -- no more events
    go events@(Event firstTime _ : _) = do
        now <- getCurrentTime
        let wait = nominalDiffTimeToSeconds (diffUTCTime firstTime now)
        threadDelay $ ceiling (wait * 1_000_000)
        now' <- getCurrentTime
        let (a, b) = span (expiredAsOf now') events
        mapM_ eventAction a -- run the expired events
        go b -- wait for the rest
    expiredAsOf t e = eventTime e <= t

main = do
    -- some example events
    now <- getCurrentTime
    let afterSeconds = flip addUTCTime now . secondsToNominalDiffTime
        evts = [ Event (afterSeconds 3) (putStrLn "3 seconds")
               , Event (afterSeconds 6) (putStrLn "6 seconds action # 1")
               , Event (afterSeconds 6) (putStrLn "6 seconds action # 2")
               , Event (afterSeconds 7) (putStrLn "Done after 7 seconds")
               ]
    runEvents evts
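For comparison, here is a rough Go transliteration of the same design (the Event type and names here are mine, not from any library): sort once, sleep until the first deadline, then drain every event that has expired. Like the Haskell version, it relies on the sleep treating negative durations as zero.
package main

import (
    "fmt"
    "sort"
    "time"
)

type Event struct {
    Time   time.Time
    Action func()
}

// runEvents sleeps until the next deadline instead of polling, then
// runs and removes every event whose time has expired.
func runEvents(events []Event) {
    sort.Slice(events, func(i, j int) bool {
        return events[i].Time.Before(events[j].Time)
    })
    for len(events) > 0 {
        // time.Sleep returns immediately for negative durations, so an
        // overrun simply falls through to the expiry check below.
        time.Sleep(time.Until(events[0].Time))
        now := time.Now()
        for len(events) > 0 && !events[0].Time.After(now) {
            events[0].Action() // run the expired event
            events = events[1:]
        }
    }
}

func main() {
    // some example events
    now := time.Now()
    after := func(s int) time.Time { return now.Add(time.Duration(s) * time.Second) }
    runEvents([]Event{
        {after(3), func() { fmt.Println("3 seconds") }},
        {after(6), func() { fmt.Println("6 seconds action # 1") }},
        {after(6), func() { fmt.Println("6 seconds action # 2") }},
        {after(7), func() { fmt.Println("Done after 7 seconds") }},
    })
}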

Unexpected QT5 QTimer duration on ARM

I am working on a Qt console application that runs on an ARM CPU, and I encountered very strange QTimer behavior: instead of the planned 100 ms, the timer expired after 1946 ms. I changed the duration, but the observed behavior stays about the same to within a few milliseconds (e.g. 1958 ms instead of 40 ms).
When the same code is executed on x86/AMD64 (I stubbed the call to a specific HW API function; executing this function outside the QTimer slot takes less than 3 ms), the timer duration is as expected, within +/- 100 ms.
Note: the embedded Qt version is 5.4.1; the PC Qt version is 5.9.5.
I tried different durations, including 0. The timer expires after about the same duration regardless.
I monitored the CPU usage (less than 30%) and load average (less than 0.15).
I also wrote a small Qt console application which starts some timers of different durations and logs the elapsed times. The results are correct (the elapsed times drift, as "expected" ;), so I think the build chain and the embedded Qt installation are good.
I added a QElapsedTimer to my initial code and logged the elapsed time in the slot of the 40 ms QTimer.
I obtained the trace on PC:
mDebugMessage = ("elapsed time = 42 ms - INPUT_DOOR_LOCKED_SENSOR=false - INPUT_DOOR_UNLOCKED_SENSOR=true - time = 46", "elapsed time = 81 ms - INPUT_DOOR_LOCKED_SENSOR=false - INPUT_DOOR_UNLOCKED_SENSOR=true - time = 81", "elapsed time = 122 ms - INPUT_DOOR_LOCKED_SENSOR=false - INPUT_DOOR_UNLOCKED_SENSOR=true - time = 122", "elapsed time = 162 ms - INPUT_DOOR_LOCKED_SENSOR=false - INPUT_DOOR_UNLOCKED_SENSOR=false - time = 163", "elapsed time = 201 ms - INPUT_DOOR_LOCKED_SENSOR=false - INPUT_DOOR_UNLOCKED_SENSOR=false - time = 201", "elapsed time = 242 ms - INPUT_DOOR_LOCKED_SENSOR=false - INPUT_DOOR_UNLOCKED_SENSOR=true - time = 242", "elapsed time = 281 ms - INPUT_DOOR_LOCKED_SENSOR=false - INPUT_DOOR_UNLOCKED_SENSOR=false - time = 281", ...
On ARM, the trace is different, instead of expected +/- 40 ms, the duration is about 2 seconds:
mDebugMessage = ("elapsed time = 1958 ms - INPUT_DOOR_LOCKED_SENSOR=false - INPUT_DOOR_UNLOCKED_SENSOR=false - time = 1961", "elapsed time = 3916 ms - INPUT_DOOR_LOCKED_SENSOR=false - INPUT_DOOR_UNLOCKED_SENSOR=false - time = 3919", "elapsed time = 5873 ms - INPUT_DOOR_LOCKED_SENSOR=false - INPUT_DOOR_UNLOCKED_SENSOR=false - time = 5876", "elapsed time = 7830 ms - INPUT_DOOR_LOCKED_SENSOR=false - INPUT_DOOR_UNLOCKED_SENSOR=false - time = 7833", "elapsed time = 9787 ms - INPUT_DOOR_LOCKED_SENSOR=false - INPUT_DOOR_UNLOCKED_SENSOR=false - time = 9790", "elapsed time = 11744 ms - INPUT_DOOR_LOCKED_SENSOR=false - INPUT_DOOR_UNLOCKED_SENSOR=false - time = 11747", "elapsed time = 13700 ms - INPUT_DOOR_LOCKED_SENSOR=false - INPUT_DOOR_UNLOCKED_SENSOR=false - time = 13705", ...
I need your help to understand why my QTimer does not expire as expected, or any clue for investigating on the target what may prevent my timer from expiring.
Thank you for your ideas.
Best regards,
EDIT: as requested, the code
const int CDoorManagement::I_DOOR_LOCKING_DURATION_MS = 40;
const int CDoorManagement::I_DOOR_LOCKING_ALARM_DURATION_MS = 12000;

CDoorManagement::CDoorManagement(CInputOutputManagerPtr ioPtr)
    : QObject(nullptr)
    , mIOManagerPtr(ioPtr)
    , mOperationElapsedTimer()
    , mDoorLockingTimer()
    , mDebugMessages()
{
    connect(&mDoorLockingTimer, SIGNAL(timeout()), this, SLOT(slotDoorLocking()), Qt::UniqueConnection);
}

void CDoorManagement::slotDoorLocking()
{
    const auto elapsedTime = mOperationElapsedTimer.elapsed();
    if (elapsedTime > I_DOOR_LOCKING_ALARM_DURATION_MS)
    {
        mDoorLockingTimer.stop();
        mIOManagerPtr->setActuator(OUTPUT_DOOR_LOCKING_ACTUATOR, false);
        mDebugMessages << QString("elapsed time = %1 ms - INPUT_DOOR_LOCKED_SENSOR=%2 - INPUT_DOOR_UNLOCKED_SENSOR=%3 - time = %4")
                          .arg(elapsedTime)
                          .arg(mIOManagerPtr->getTorInputState(INPUT_DOOR_LOCKED_SENSOR) ? "true" : "false")
                          .arg(mIOManagerPtr->getTorInputState(INPUT_DOOR_UNLOCKED_SENSOR) ? "true" : "false")
                          .arg(mOperationElapsedTimer.elapsed());
        qDebug() << "door locking - mDebugMessage =" << mDebugMessages;
        abort(QSTR_LOCKING_ABORTED);
    }
    if (mIOManagerPtr->getTorInputState(INPUT_DOOR_LOCKED_SENSOR))
    {
        mDoorLockingTimer.stop();
        mIOManagerPtr->setActuator(OUTPUT_DOOR_LOCKING_ACTUATOR, false);
        syslog(LOG_INFO, "%s::%s() - locked: elapsedTime = %lld, max time=%d",
               LOG_PREFIX, __FUNCTION__, elapsedTime, I_DOOR_LOCKING_ALARM_DURATION_MS);
        mDebugMessages << QString("elapsed time = %1 ms - INPUT_DOOR_LOCKED_SENSOR=%2 - INPUT_DOOR_UNLOCKED_SENSOR=%3 - time = %4")
                          .arg(elapsedTime)
                          .arg(mIOManagerPtr->getTorInputState(INPUT_DOOR_LOCKED_SENSOR) ? "true" : "false")
                          .arg(mIOManagerPtr->getTorInputState(INPUT_DOOR_UNLOCKED_SENSOR) ? "true" : "false")
                          .arg(mOperationElapsedTimer.elapsed());
        qDebug() << "door locking - mDebugMessage =" << mDebugMessages;
        emit signalDoorLocked();
    }
    else
    {
        mDebugMessages << QString("elapsed time = %1 ms - INPUT_DOOR_LOCKED_SENSOR=%2 - INPUT_DOOR_UNLOCKED_SENSOR=%3 - time = %4")
                          .arg(elapsedTime)
                          .arg(mIOManagerPtr->getTorInputState(INPUT_DOOR_LOCKED_SENSOR) ? "true" : "false")
                          .arg(mIOManagerPtr->getTorInputState(INPUT_DOOR_UNLOCKED_SENSOR) ? "true" : "false")
                          .arg(mOperationElapsedTimer.elapsed());
    }
}

void CDoorManagement::startLocking()
{
    mDebugMessages.clear();
    qDebug() << "start of mDoorLockingTimer using " << I_DOOR_LOCKING_DURATION_MS << " ms delay";
    mOperationElapsedTimer.start();
    mDoorLockingTimer.start(I_DOOR_LOCKING_DURATION_MS);
    if (!mIOManagerPtr->setActuator(OUTPUT_DOOR_LOCKING_ACTUATOR, true))
    {
        mIOManagerPtr->setActuator(OUTPUT_DOOR_LOCKING_ACTUATOR, false);
        syslog(LOG_WARNING, "%s::%s() - failed to activate OUTPUT_DOOR_LOCKING_ACTUATOR", LOG_PREFIX, __FUNCTION__);
        abort(QSTR_LOCKING_ACTIVATION_FAILURE);
    }
}
I found the root cause of the observed behavior:
- In the example slot, I read a digital input, and this read takes 3 ms.
- In another slot, I read two RTD inputs, and these reads take up to 2000 ms.
- The digital and the RTD reads use the same library, which holds a mutex around every HW access, whether digital or RTD :(
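To make that failure mode concrete, here is a small Go model of the situation (the names and durations are mine, not from the Qt code): a fast read gets serialized behind an in-flight slow read because both paths take the same hardware mutex.
package main

import (
    "fmt"
    "sync"
    "time"
)

// One mutex guards all HW access, as in the vendor library.
var hwMutex sync.Mutex

func readRTD() {
    hwMutex.Lock()
    defer hwMutex.Unlock()
    time.Sleep(2 * time.Second) // slow RTD conversion holds the lock
}

func readDigital() {
    start := time.Now()
    hwMutex.Lock()
    defer hwMutex.Unlock()
    time.Sleep(3 * time.Millisecond) // the digital read itself is fast
    fmt.Println("digital read finished after", time.Since(start))
}

func main() {
    go readRTD()
    time.Sleep(10 * time.Millisecond) // let the RTD read grab the lock first
    readDigital()                     // stalls ~2 s waiting for the mutex
}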

waitgroup on subset of goroutines

I have a situation where the main goroutine creates "x" goroutines, but it is interested in only "y" (y < x) of them finishing.
I was hoping to use WaitGroup, but WaitGroup only allows me to wait on all goroutines. I cannot, for example, do this:
1. wg.Add(y)
2. Create "x" goroutines. These routines will call wg.Done() when finished.
3. wg.Wait()
This panics when the (y+1)th goroutine calls wg.Done(), because the wg counter goes negative.
I could certainly use channels to solve this, but I am interested in whether WaitGroup can.
As noted in Adrian's answer, sync.WaitGroup is a simple counter whose Wait method will block until the counter value reaches zero. It is intended to allow you to block (or join) on a number of goroutines before allowing a main flow of execution to proceed.
The interface of WaitGroup is not sufficiently expressive for your usecase, nor is it designed to be. In particular, you cannot use it naïvely by simply calling wg.Add(y) (where y < x). The call to wg.Done by the (y+1)th goroutine will cause a panic, as it is an error for a wait group to have a negative internal value. Furthermore, we cannot be "smart" by observing the internal counter value of the WaitGroup; this would break an abstraction and, in any event, its internal state is not exported.
Implement your own!
You can implement the relevant logic yourself using some channels, per the code below (playground link). Observe from the console that 10 goroutines are started, but after two have completed, we fall through and continue execution in the main function.
package main

import (
    "fmt"
    "time"
)

// Set goroutine counts here
const (
    // The number of goroutines to spawn
    x = 10
    // The number of goroutines to wait for completion
    // (y <= x) must hold.
    y = 2
)

func doSomeWork() {
    // do something meaningful
    time.Sleep(time.Second)
}

func main() {
    // Accumulator channel, used by each goroutine to signal completion.
    // It is buffered to ensure the [y+1, ..., x) goroutines do not block
    // when sending to the channel, which would cause a leak. It will be
    // garbage collected when all goroutines end and the channel falls
    // out of scope. We receive y values, so only need capacity to receive
    // (x-y) remaining values.
    accChan := make(chan struct{}, x-y)

    // Spawn "x" goroutines
    for i := 0; i < x; i += 1 {
        // Wrap our work function with the local signalling logic
        go func(id int, doneChan chan<- struct{}) {
            fmt.Printf("starting goroutine #%d\n", id)
            doSomeWork()
            fmt.Printf("goroutine #%d completed\n", id)
            // Communicate completion of goroutine
            doneChan <- struct{}{}
        }(i, accChan)
    }

    for doneCount := 0; doneCount < y; doneCount += 1 {
        <-accChan
    }

    // Continue working
    fmt.Println("Carrying on without waiting for more goroutines")
}
Avoid leaking resources
As this does not wait for the [y+1, ..., x) goroutines to complete, you should take special care in the doSomeWork function to remove or minimize the risk that the work can block indefinitely, which would also cause a leak. Remove, where possible, the feasibility of indefinite blocking on I/O (including channel operations) or falling into infinite loops.
You could use a context to signal to the additional goroutines that their results are no longer required, so that they break out of execution early; a rough sketch follows.
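For example, here is a minimal sketch of that combination (the cancellation point inside doSomeWork is illustrative; real work would need its own context checks):
package main

import (
    "context"
    "fmt"
    "time"
)

const (
    x = 10 // goroutines to spawn
    y = 2  // completions to wait for
)

// doSomeWork gives up early once the context is cancelled.
func doSomeWork(ctx context.Context) {
    select {
    case <-time.After(time.Second): // stand-in for real work
    case <-ctx.Done(): // result no longer required
    }
}

func main() {
    ctx, cancel := context.WithCancel(context.Background())

    accChan := make(chan struct{}, x-y)
    for i := 0; i < x; i++ {
        go func() {
            doSomeWork(ctx)
            accChan <- struct{}{}
        }()
    }

    // Wait for the first y completions only.
    for doneCount := 0; doneCount < y; doneCount++ {
        <-accChan
    }
    cancel() // tell the remaining goroutines to bail out
    fmt.Println("Carrying on without waiting for more goroutines")
}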
WaitGroup doesn't actually wait on goroutines, it waits until its internal counter reaches zero. If you only Add() the number of goroutines you care about, and you only call Done() in those goroutines you care about, then Wait() will only block until those goroutines you care about have finished. You are in complete control of the logic and flow, there are no restrictions on what WaitGroup "allows".
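For instance, here is a minimal sketch of that approach, using a made-up criterion where the first y goroutines happen to be the ones you care about:
package main

import (
    "fmt"
    "sync"
    "time"
)

func main() {
    const x, y = 10, 3

    var wg sync.WaitGroup
    wg.Add(y) // count only the goroutines we care about

    for i := 0; i < x; i++ {
        tracked := i < y // hypothetical criterion for the "interesting" goroutines
        go func(id int, tracked bool) {
            time.Sleep(time.Second) // stand-in for real work
            if tracked {
                fmt.Printf("tracked goroutine %d done\n", id)
                wg.Done() // only tracked goroutines decrement the counter
            }
        }(i, tracked)
    }

    wg.Wait() // unblocks once the y tracked goroutines have finished
    fmt.Println("carrying on")
}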
Are these y specific go-routines that you are trying to track, or any y out of the x? What are the criteria?
Update:
1. If you have control over some criterion to pick the matching y goroutines:
You can call wg.Add(1) and wg.Done() from inside the goroutine based on your condition, passing the WaitGroup as a pointer argument into the goroutine, if your condition can't be checked outside the goroutine.
Something like the sample code below. I will be able to be more specific if you provide more details of what you are trying to do.
func sampleGoroutine(z int, b string, wg *sync.WaitGroup) {
    defer func() {
        if condition1 {
            wg.Done()
        }
    }()
    if condition1 {
        wg.Add(1)
        // do stuff
    }
}

func main() {
    wg := sync.WaitGroup{}
    for i := 0; i < x; i++ {
        go sampleGoroutine(1, "one", &wg)
    }
    wg.Wait()
}
2. If you have no control over which ones, and just want the first y:
Based on your comment that you have no control over (or desire to pick) specific goroutines, and just want whichever ones finish first: if you want to do this in a generic way, you can use the custom wait-group implementation below, which fits your use case. (It's not copy-safe, though. It also doesn't have/need a wg.Add(int) method.)
type CountedWait struct {
    wait  chan struct{}
    limit int
}

func NewCountedWait(limit int) *CountedWait {
    return &CountedWait{
        wait:  make(chan struct{}, limit),
        limit: limit,
    }
}

func (cwg *CountedWait) Done() {
    cwg.wait <- struct{}{}
}

func (cwg *CountedWait) Wait() {
    count := 0
    for count < cwg.limit {
        <-cwg.wait
        count += 1
    }
}
Which can be used as follows:
func sampleGoroutine(z int, b string, wg *CountedWait) {
    success := false
    defer func() {
        if success == true {
            fmt.Printf("goroutine %d finished successfully\n", z)
            wg.Done()
        }
    }()
    fmt.Printf("goroutine %d started\n", z)
    time.Sleep(time.Second)
    if rand.Intn(10)%2 == 0 {
        success = true
    }
}

func main() {
    x := 10
    y := 3
    wg := NewCountedWait(y)
    for i := 0; i < x; i += 1 {
        // Wrap our work function with the local signalling logic
        go sampleGoroutine(i, "something", wg)
    }
    wg.Wait()
    fmt.Printf("%d out of %d goroutines finished successfully.\n", y, x)
}
3. You can also combine a context with approach 2 to ensure that the remaining goroutines don't leak.
You may not be able to run this on the Go Playground, as it has some long sleeps.
Below is a sample output:
(note that there may be more than y=3 goroutines marking Done, but you only wait until 3 have finished)
goroutine 9 started
goroutine 0 started
goroutine 1 started
goroutine 2 started
goroutine 3 started
goroutine 4 started
goroutine 5 started
goroutine 5 marking done
goroutine 6 started
goroutine 7 started
goroutine 7 marking done
goroutine 8 started
goroutine 3 marking done
continuing after 3 out of 10 goroutines finished successfully.
goroutine 9 will be killed, bcz cancel
goroutine 8 will be killed, bcz cancel
goroutine 6 will be killed, bcz cancel
goroutine 1 will be killed, bcz cancel
goroutine 0 will be killed, bcz cancel
goroutine 4 will be killed, bcz cancel
goroutine 2 will be killed, bcz cancel
Play links
https://play.golang.org/p/l5i6X3GClBq
https://play.golang.org/p/Bcns0l9OdFg
https://play.golang.org/p/rkGSLyclgje

Golang scheduler mystery: Linux vs Mac OS X

I've run into some mysterious behavior with the Go scheduler, and I'm very curious about what's going on. The gist is that runtime.Gosched() doesn't work as expected in Linux unless it is preceded by a log.Printf() call, but it works as expected in both cases on OS X. Here's a minimal setup that reproduces the behavior:
The main goroutine sleeps for 1000 periods of 1ms, and after each sleep pushes a dummy message onto another goroutine via a channel. The second goroutine listens for new messages, and every time it gets one it does 10ms of work. So without any runtime.Gosched() calls, the program will take 10 seconds to run.
When I add periodic runtime.Gosched() calls in the second goroutine, as expected the program runtime shrinks down to 1 second on my Mac. However, when I try running the same program on Ubuntu, it still takes 10 seconds. I made sure to set runtime.GOMAXPROCS(1) in both cases.
Here's where it gets really strange: if I just add a logging statement before the runtime.Gosched() calls, then suddenly the program runs in the expected 1 second on Ubuntu as well.
package main

import (
    "log"
    "runtime"
    "time"
)

func doWork(c chan int) {
    for {
        <-c
        // This outer loop will take ~10ms.
        for j := 0; j < 100; j++ {
            // The following block of CPU work takes ~100 microseconds
            for i := 0; i < 300000; i++ {
                _ = i * 17
            }
            // Somehow this print statement saves the day in Ubuntu
            log.Printf("donkey")
            runtime.Gosched()
        }
    }
}

func main() {
    runtime.GOMAXPROCS(1)
    c := make(chan int, 1000)
    go doWork(c)
    start := time.Now().UnixNano()
    for i := 0; i < 1000; i++ {
        time.Sleep(1 * time.Millisecond)
        // Queue up 10ms of work in the other goroutine, which will backlog
        // this goroutine without runtime.Gosched() calls.
        c <- 0
    }
    // Whole program should take about 1 second to run if the Gosched() calls
    // work, otherwise 10 seconds.
    log.Printf("Finished in %f seconds.", float64(time.Now().UnixNano()-start)/1e9)
}
Additional details: I'm running go1.10 darwin/amd64, and compiling the linux binary with
env GOOS=linux GOARCH=amd64 go build ...
I've tried a few simple variants:
Just making a log.Printf() call, without the Gosched()
Making two calls to Gosched()
Keeping the Gosched() call but replacing the log.Printf() call with a dummy function call
All of these are ~10x slower than calling log.Printf() and then Gosched().
Any insights would be appreciated! This example is of course very artificial, but the issue came up while writing a websocket broadcast server which led to significantly degraded performance.
EDIT: I got rid of the extraneous bits in my example to make things more transparent. I've discovered that without the print statement, the runtime.Gosched() calls still run; they just seem to be delayed by a fixed 5 ms, leading to a total runtime of almost exactly 5 seconds in the example below, when the program should finish almost instantaneously (and does on my Mac, or on Ubuntu with the print statement).
package main

import (
    "log"
    "runtime"
    "time"
)

func doWork() {
    for {
        // This print call makes the code run 20x faster
        log.Printf("donkey")
        // Without this line, the program never terminates (as expected). With this line
        // and the print call above it, the program takes <300ms as expected, dominated by
        // the sleep calls in the main goroutine. But without the print statement, it
        // takes almost exactly 5 seconds.
        runtime.Gosched()
    }
}

func main() {
    runtime.GOMAXPROCS(1)
    go doWork()
    start := time.Now().UnixNano()
    for i := 0; i < 1000; i++ {
        time.Sleep(10 * time.Microsecond)
        runtime.Gosched()
    }
    log.Printf("Finished in %f seconds.", float64(time.Now().UnixNano()-start)/1e9)
}
When I add periodic runtime.Gosched() calls in the second goroutine,
as expected the program runtime shrinks down to 1 second on my Mac.
However, when I try running the same program on Ubuntu, it still takes
10 seconds.
On Ubuntu, I'm unable to reproduce your issue: one second, not ten seconds.
Output:
$ uname -srvm
Linux 4.13.0-37-generic #42-Ubuntu SMP Wed Mar 7 14:13:23 UTC 2018 x86_64
$ go version
go version devel +f1deee0e8c Mon Apr 2 20:18:14 2018 +0000 linux/amd64
$ go build rampatowl.go && time ./rampatowl
2018/04/02 16:52:04 Finished in 1.122870 seconds.
real 0m1.128s
user 0m1.116s
sys 0m0.012s
$
rampatowl.go:
package main

import (
    "log"
    "runtime"
    "time"
)

func doWork(c chan int) {
    for {
        <-c
        // This outer loop will take ~10ms.
        for j := 0; j < 100; j++ {
            // The following block of CPU work takes ~100 microseconds
            for i := 0; i < 300000; i++ {
                _ = i * 17
            }
            // Somehow this print statement saves the day in Ubuntu
            //log.Printf("donkey")
            runtime.Gosched()
        }
    }
}

func main() {
    runtime.GOMAXPROCS(1)
    c := make(chan int, 1000)
    go doWork(c)
    start := time.Now().UnixNano()
    for i := 0; i < 1000; i++ {
        time.Sleep(1 * time.Millisecond)
        // Queue up 10ms of work in the other goroutine, which will backlog
        // this goroutine without runtime.Gosched() calls.
        c <- 0
    }
    // Whole program should take about 1 second to run if the Gosched() calls
    // work, otherwise 10 seconds.
    log.Printf("Finished in %f seconds.", float64(time.Now().UnixNano()-start)/1e9)
}

Why does the following code generate a deadlock?

Golang newbie here. Can somebody explain why the following code generates a deadlock?
I am aware of the pattern of signalling completion by sending true over a done chan bool channel, but I don't want to use it.
package main

import (
    "fmt"
    "sync"
    "time"
)

var wg2 sync.WaitGroup

func producer2(c chan<- int) {
    for i := 0; i < 5; i++ {
        time.Sleep(time.Second * 10)
        fmt.Printf("Producer Writing to chan %d\n", i)
        c <- i
    }
}

func consumer2(c <-chan int) {
    defer wg2.Done()
    fmt.Printf("Consumer Got value %d\n", <-c)
}

func main() {
    c := make(chan int)
    wg2.Add(5)
    fmt.Println("Starting .... 1")
    go producer2(c)
    go consumer2(c)
    fmt.Println("Starting .... 2")
    wg2.Wait()
}
Following is my understanding, and I know that it is wrong:
1. The channel will be blocked the moment 0 is written to it within the loop of the producer function.
2. So I expect the channel to be emptied by the consumer afterwards.
3. As the channel is emptied in step 2, the producer function can put in another value, then get blocked again, and step 2 repeats.
Your original deadlock is caused by wg2.Add(5): you were waiting for 5 goroutines to finish, but only one did; you called wg2.Done() once. Change this to wg2.Add(1), and your program will run without error.
However, I suspect that you intended to consume all the values in the channel, not just one as you do. If you change the consumer function to:
func consumer2(c <-chan int) {
    defer wg2.Done()
    for i := range c {
        fmt.Printf("Consumer Got value %d\n", i)
    }
}
You will get another deadlock, because the channel is never closed in the producer function, so the consumer waits for more values that never arrive. Adding close(c) to the producer function will fix it.
Why does it error?
Running your code gets the following error:
➜ gochannel go run dl.go
Starting .... 1
Starting .... 2
Producer Writing to chan 0
Consumer Got value 0
Producer Writing to chan 1
fatal error: all goroutines are asleep - deadlock!
Here is why:
There are three goroutines in your code: main, producer2, and consumer2. When it runs:
1. producer2 sends the number 0 to the channel.
2. consumer2 receives 0 from the channel, and exits.
3. producer2 sends 1 to the channel, but no one is consuming, since consumer2 has already exited.
4. producer2 is waiting.
5. main executes wg2.Wait(), but not all of the wait group's counts are done, so main is waiting.
Two goroutines are waiting, doing nothing, and nothing will happen no matter how long you wait. It is a deadlock! Golang detects it and panics.
There are two concepts you are confusing here:
how WaitGroup works
how to receive all values from a channel
I'll explain them briefly here; there are already many articles out there on the internet.
How WaitGroup works
WaitGroup is a way to wait for a group of goroutines to finish. When running goroutines in the background, it's important to know when all of them have quit, so that further actions can be taken.
In your case, we run two goroutines, so at the beginning we should call wg2.Add(2), and each goroutine should call wg2.Done() to signal that it is done.
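For instance, a minimal sketch of that accounting:
package main

import (
    "fmt"
    "sync"
)

func main() {
    var wg2 sync.WaitGroup
    wg2.Add(2) // two goroutines in the group
    go func() { defer wg2.Done(); fmt.Println("first goroutine done") }()
    go func() { defer wg2.Done(); fmt.Println("second goroutine done") }()
    wg2.Wait() // unblocks only after both goroutines call Done
}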
Receive data from a channel
When receiving data from a channel, if you know exactly how many values will be sent, use a for loop like this:
for i := 0; i < N; i++ {
    data = <-c
    process(data)
}
Otherwise use it this way:
for data := range c {
    process(data)
}
Also, don't forget to close the channel when there is no more data to send.
How to fix it?
With the above explanation, the code can be fixed as:
package main

import (
    "fmt"
    "sync"
    "time"
)

var wg2 sync.WaitGroup

func producer2(c chan<- int) {
    defer wg2.Done()
    for i := 0; i < 5; i++ {
        time.Sleep(time.Second * 1)
        fmt.Printf("Producer Writing to chan %d\n", i)
        c <- i
    }
    close(c)
}

func consumer2(c <-chan int) {
    defer wg2.Done()
    for i := range c {
        fmt.Printf("Consumer Got value %d\n", i)
    }
}

func main() {
    c := make(chan int)
    wg2.Add(2)
    fmt.Println("Starting .... 1")
    go producer2(c)
    go consumer2(c)
    fmt.Println("Starting .... 2")
    wg2.Wait()
}
Here is another possible way to fix it.
The expected output
Fixed code gives the following output:
➜ gochannel go run dl.go
Starting .... 1
Starting .... 2
Producer Writing to chan 0
Consumer Got value 0
Producer Writing to chan 1
Consumer Got value 1
Producer Writing to chan 2
Consumer Got value 2
Producer Writing to chan 3
Consumer Got value 3
Producer Writing to chan 4
Consumer Got value 4
