Cancelling golang os.Stdin read using context - linux

I want to cancel a pending os.Stdin read using a context, which is not directly possible. The usual way to cancel a read is to close the file handle, but I do not want to close os.Stdin.
Possible solutions could be:
Can it be determined if Stdin.Read will block?
Can the thread be terminated like in pthreads?
Should os.Stdin be forwarded to another file handle that can be closed?
Here is what I have so far. The ugly part is that scannerThread is left running after context cancellation:
// Keystrokes emits keystroke events
// on g0.Context() shutdown, scannerThread is left running until the next newline
// the lines channel is never closed
func Keystrokes(lines chan<- string, g0 parl.Go) {
var err error
defer g0.Done(err)
defer parl.Recover(parl.Annotation(), &err, parl.NoOnError)
// bufio.Scanner.Scan cannot be interrupted, so let that thread terminate whenever it can
var scanLines parl.NBChan[string]
go scannerThread(scanLines.Send, g0.Context())
// consume scannerThread output
scannerCh := scanLines.Ch()
done := g0.Context().Done()
for {
select {
case line := <-scannerCh:
lines <- line
case <-done:
return // canceled by context exit
}
}
}
// scannerThread reads from os.Stdin and therefore cannot be cancelled.
// send is a non-blocking send function.
// ctx indicates shutdown effective on next os.Stdin newline.
func scannerThread(send func(string), ctx context.Context) {
var err error
defer parl.Recover(parl.Annotation(), &err, parl.Infallible)
scanner := bufio.NewScanner(os.Stdin)
for scanner.Scan() { // scanner is typically stuck here
if ctx.Err() != nil {
return // terminated via context
}
send(scanner.Text())
}
// scanner had error
err = scanner.Err()
}

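The answer is to read os.Stdin in non-blocking mode, which works on Linux and macOS.
As far as I remember, os.Stdin is protected against being closed.
On a terminal, a non-blocking os.Stdin.Read() returns bytes once a complete line is available; otherwise it returns n == 0 right away (the underlying read(2) reports EAGAIN instead of blocking).
Here is how to set non-blocking mode (unix is the golang.org/x/sys/unix package):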
err = unix.SetNonblock(int(os.Stdin.Fd()), true)
The state of non-blocking can also be read:
flags, err := unix.FcntlInt(os.Stdin.Fd(), unix.F_GETFL, 0)
wasSet = flags&unix.O_NONBLOCK != 0
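To connect this back to the original question, here is a minimal sketch of a context-cancellable read on a non-blocking descriptor. It is an illustration under assumptions, not the answerer's code: it swaps in the standard library syscall package for golang.org/x/sys/unix, polls with an arbitrary 10ms interval, and the names readWithContext and pipeDemo are hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os"
	"syscall"
	"time"
)

// readWithContext polls a non-blocking file descriptor until data arrives,
// the input ends, or ctx is done. The 10ms poll interval is arbitrary.
func readWithContext(ctx context.Context, fd int, buf []byte) (int, error) {
	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()
	for {
		n, err := syscall.Read(fd, buf)
		if err == nil {
			return n, nil // n == 0 here means end of input
		}
		if !errors.Is(err, syscall.EAGAIN) && !errors.Is(err, syscall.EINTR) {
			return 0, err // a real read error
		}
		// No data yet: wait for the next tick or for cancellation.
		select {
		case <-ctx.Done():
			return 0, ctx.Err()
		case <-ticker.C:
		}
	}
}

// pipeDemo exercises readWithContext on a pipe, so it runs without a terminal.
func pipeDemo() (timedOut bool, got string) {
	p := make([]int, 2)
	if err := syscall.Pipe(p); err != nil {
		return false, ""
	}
	defer syscall.Close(p[0])
	defer syscall.Close(p[1])
	if err := syscall.SetNonblock(p[0], true); err != nil {
		return false, ""
	}
	buf := make([]byte, 16)

	// Nothing has been written yet, so the read is cancelled by the timeout.
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	_, err := readWithContext(ctx, p[0], buf)
	cancel()
	timedOut = errors.Is(err, context.DeadlineExceeded)

	// Once data is available, the read returns it.
	syscall.Write(p[1], []byte("hi\n"))
	n, _ := readWithContext(context.Background(), p[0], buf)
	return timedOut, string(buf[:n])
}

func main() {
	fmt.Println(pipeDemo())

	// The same helper applied to stdin, with a two-second budget.
	fd := int(os.Stdin.Fd())
	if err := syscall.SetNonblock(fd, true); err != nil {
		fmt.Println("set non-blocking:", err)
		return
	}
	defer syscall.SetNonblock(fd, false) // assumes stdin was blocking before
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	buf := make([]byte, 4096)
	n, err := readWithContext(ctx, fd, buf)
	fmt.Printf("stdin: %q, err=%v\n", buf[:n], err)
}
```

O_NONBLOCK is a property of the open file description, so it is shared with any other process using the same terminal; that is why the sketch restores blocking mode before exiting.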

Related

Go exec.CommandContext is not being terminated after context timeout

In golang, I can usually use context.WithTimeout() in combination with exec.CommandContext() to get a command to automatically be killed (with SIGKILL) after the timeout.
But I'm running into a strange issue that if I wrap the command with sh -c AND buffer the command's outputs by setting cmd.Stdout = &bytes.Buffer{}, the timeout no longer works, and the command runs forever.
Why does this happen?
Here is a minimal reproducible example:
package main
import (
"bytes"
"context"
"os/exec"
"time"
)
func main() {
ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
defer cancel()
cmdArgs := []string{"sh", "-c", "sleep infinity"}
bufferOutputs := true
// Uncommenting *either* of the next two lines will make the issue go away:
// cmdArgs = []string{"sleep", "infinity"}
// bufferOutputs = false
cmd := exec.CommandContext(ctx, cmdArgs[0], cmdArgs[1:]...)
if bufferOutputs {
cmd.Stdout = &bytes.Buffer{}
}
_ = cmd.Run()
}
I've tagged this question with Linux because I've only verified that this happens on Ubuntu 20.04 and I'm not sure whether it would reproduce on other platforms.
My issue was that the child sleep process was not being killed when the context timed out. The sh parent process was being killed, but the child sleep was being left around.
This would normally still allow the cmd.Wait() call to succeed, but cmd.Wait() waits both for the process to exit and for its output to be copied. Because we've assigned cmd.Stdout, Wait has to wait until every copy of the write end of the stdout pipe is closed, and the orphaned sleep process still holds one, so that never happens while sleep is running.
In order to kill child processes, we can instead start the process as its own process group leader by setting the Setpgid bit, which will then allow us to kill the process using its negative PID to kill the process as well as any subprocesses.
Here is a drop-in replacement for exec.CommandContext I came up with that does exactly this:
type Cmd struct {
ctx context.Context
*exec.Cmd
}
// NewCommand is like exec.CommandContext but ensures that subprocesses
// are killed when the context times out, not just the top level process.
func NewCommand(ctx context.Context, command string, args ...string) *Cmd {
return &Cmd{ctx, exec.Command(command, args...)}
}
func (c *Cmd) Start() error {
// Force-enable setpgid bit so that we can kill child processes when the
// context times out or is canceled.
if c.Cmd.SysProcAttr == nil {
c.Cmd.SysProcAttr = &syscall.SysProcAttr{}
}
c.Cmd.SysProcAttr.Setpgid = true
err := c.Cmd.Start()
if err != nil {
return err
}
go func() {
<-c.ctx.Done()
p := c.Cmd.Process
if p == nil {
return
}
// Kill by negative PID to kill the process group, which includes
// the top-level process we spawned as well as any subprocesses
// it spawned.
_ = syscall.Kill(-p.Pid, syscall.SIGKILL)
}()
return nil
}
func (c *Cmd) Run() error {
if err := c.Start(); err != nil {
return err
}
return c.Wait()
}
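As a possible alternative to the wrapper type above, the same process-group idea can be expressed with the Cancel and WaitDelay fields that exec.Cmd gained in Go 1.20. This is a sketch under that version assumption and for Unix-like systems only; runGroup and timedRun are hypothetical names, and "sleep 30" stands in for "sleep infinity" so it also runs where the latter is unsupported.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// runGroup runs a command in its own process group and kills the whole
// group when ctx is done. Requires Go 1.20+ for Cancel and WaitDelay.
func runGroup(ctx context.Context, name string, args ...string) (string, error) {
	var out bytes.Buffer
	cmd := exec.CommandContext(ctx, name, args...)
	cmd.Stdout = &out
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	// Cancel is only invoked after a successful Start, so Process is set.
	cmd.Cancel = func() error {
		// A negative PID addresses the process group led by the child.
		return syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL)
	}
	// Safety net: stop waiting for output pipes one second after cancellation.
	cmd.WaitDelay = time.Second
	err := cmd.Run()
	return out.String(), err
}

// timedRun reports whether the command returned quickly and with an error.
func timedRun() (quick bool, failed bool) {
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()
	start := time.Now()
	_, err := runGroup(ctx, "sh", "-c", "sleep 30")
	return time.Since(start) < 5*time.Second, err != nil
}

func main() {
	quick, failed := timedRun()
	fmt.Println("returned quickly:", quick, "with error:", failed)
}
```

With this shape there is no extra goroutine left waiting on the context after the command has already exited on its own.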

How to make work concurrent while sending data on a stream in golang?

I have a Go gRPC server with a streaming endpoint. Earlier I did all the work sequentially and sent each result on the stream, but then I realized I could do the work concurrently and send afterwards. From the grpc-go docs I understood that the work itself can be concurrent, but sending on the stream cannot, so I ended up with the code below, which does the job.
Below is the code I have in my streaming endpoint which sends data back to client in a streaming way. This does all the work concurrently.
// get "allCids" from lot of files and load in memory.
allCids := .....
var data = allCids.([]int64)
out := make(chan *custPbV1.CustomerResponse, len(data))
wg := &sync.WaitGroup{}
wg.Add(len(data))
go func() {
wg.Wait()
close(out)
}()
for _, cid := range data {
go func (id int64) {
defer wg.Done()
pd := repo.GetCustomerData(strconv.FormatInt(id, 10)) // use the id parameter, not the captured loop variable
if !pd.IsCorrect {
return
}
resources := us.helperCom.GenerateResourceString(pd)
val, err := us.GenerateInfo(clientId, resources, cfg)
if err != nil {
return
}
out <- val
}(cid)
}
for val := range out {
if err := stream.Send(val); err != nil {
log.Printf("send error %v", err)
}
}
The problem is that the data slice can hold approximately a million entries, and I don't want to spawn a million goroutines. How do I handle that scenario? If I use 100 instead of len(data), will that work, or do I also need to split data into sub-slices of 100? I am confused about the best way to deal with this problem.
I only recently started with Go, so pardon any mistakes in the code above.
Please check this pseudo code
func main() {
works := make(chan int, 100)
errChan := make(chan error, 100)
out := make(chan *custPbV1.CustomerResponse, 100)
// spawn fixed workers
var workerWg sync.WaitGroup
for i := 0; i < 100; i++ {
workerWg.Add(1)
go worker(&workerWg, works, errChan, out)
}
// give input
go func() {
for _, cid := range data {
// this will be blocked if all the workers are busy and no space is left in the channel.
works <- cid
}
close(works)
}()
var analyzeResults sync.WaitGroup
analyzeResults.Add(2)
// process errors
go func() {
for err := range errChan {
log.Printf("error %v", err)
}
analyzeResults.Done()
}()
// process output
go func() {
for val := range out {
if err := stream.Send(val); err != nil {
log.Printf("send error %v", err)
}
}
analyzeResults.Done()
}()
workerWg.Wait()
close(out)
close(errChan)
analyzeResults.Wait()
}
func worker(job *sync.WaitGroup, works chan int, errChan chan error, out chan *custPbV1.CustomerResponse) {
defer job.Done()
// Idle worker takes the work from this channel.
for cid := range works {
pd := repo.GetCustomerData(strconv.FormatInt(cid, 10))
if !pd.IsCorrect {
errChan <- fmt.Errorf("customer %d is incorrect", cid)
// do not return here: that would shrink the pool, and if every worker returned, nobody would be left to drain works
continue
}
resources := us.helperCom.GenerateResourceString(pd)
val, err := us.GenerateInfo(clientId, resources, cfg)
if err != nil {
errChan <- fmt.Errorf("got error: %w", err)
continue
}
out <- val
}
}
Explanation:
This is a worker pool: a fixed number of goroutines (100 workers here) does the same job (GetCustomerData() and GenerateInfo()) on different input data (the cids). Having 100 workers means the work is concurrent, not necessarily parallel; how many goroutines actually run at the same instant depends on GOMAXPROCS. If one worker is waiting on I/O or some other blocking operation, that goroutine is descheduled and another worker gets a chance to run. Adding more workers beyond some point may not improve performance and can lead to contention on the channel, since more workers are waiting on it for input.
The benefit over splitting the one million entries into sub-slices is load balancing. Say we have 1000 jobs and 100 workers, and each worker is statically assigned jobs 1-10, 11-20, and so on. If the first 10 jobs take longer than the others, the first worker is overloaded while the other workers finish and sit idle even though tasks are still pending. With a shared channel, an idle worker simply takes the next job, so no worker ends up more loaded than the rest.
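For readers who want something they can run, here is a self-contained sketch of the same worker-pool shape. It is an illustration, not the answer's code: process is a hypothetical stand-in for GetCustomerData plus GenerateInfo, the send callback stands in for stream.Send, and results and errors share one channel so that only a single goroutine ever sends.

```go
package main

import (
	"fmt"
	"sync"
)

// process is a hypothetical stand-in for the per-customer work.
func process(id int64) (int64, error) {
	if id%10 == 0 {
		return 0, fmt.Errorf("id %d is incorrect", id)
	}
	return id * 2, nil
}

// runPool fans ids out to a fixed number of workers and calls send from
// the calling goroutine only, so send never runs concurrently.
func runPool(ids []int64, workers int, send func(int64)) (failed int) {
	type result struct {
		val int64
		err error
	}
	jobs := make(chan int64)
	results := make(chan result)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for id := range jobs { // an idle worker takes the next job
				v, err := process(id)
				results <- result{v, err}
			}
		}()
	}
	// Feed the jobs, then signal that no more are coming.
	go func() {
		for _, id := range ids {
			jobs <- id
		}
		close(jobs)
	}()
	// Close results once every worker has returned.
	go func() {
		wg.Wait()
		close(results)
	}()

	for r := range results {
		if r.err != nil {
			failed++
			continue
		}
		send(r.val)
	}
	return failed
}

// poolDemo runs ids 1..100 through 8 workers and sums what was sent.
func poolDemo() (sum int64, failed int) {
	ids := make([]int64, 100)
	for i := range ids {
		ids[i] = int64(i + 1)
	}
	failed = runPool(ids, 8, func(v int64) { sum += v })
	return sum, failed
}

func main() {
	sum, failed := poolDemo()
	fmt.Println("sum:", sum, "failed:", failed)
}
```

Because the channels are unbuffered, at most one job per worker is in flight at any time, regardless of how large the input slice is.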

Stop channels when ws not able to connect

I have the following code, which works. The issue is that when socket.Connect() fails to connect I want to stop the process. I’ve tried the following code, but it’s not working, i.e. if the socket fails to connect the program still runs.
What I want to happen is that if the connect fails, the process stops and the channe…what am I missing here?
func run (appName string) (err error) {
done = make(chan bool)
defer close(done)
serviceURL, e := GetContext().getServiceURL(appName)
if e != nil {
err = errors.New("process failed: " + e.Error()) // use e here; err is still nil at this point
LogDebug("Exiting %v func[err =%v]", methodName, err)
return err
}
url := "wss://" + serviceURL + route
socket := gowebsocket.New(url)
addPass(&socket, user, pass)
socket.OnConnectError = OnConnectErrorHandler
socket.OnConnected = OnConnectedHandler
socket.OnTextMessage = socketTextMessageHandler
socket.OnDisconnected = OnDisconnectedHandler
LogDebug("In %v func connecting to URL %v", methodName, url)
socket.Connect()
jsonBytes, e := json.Marshal(payload)
if e != nil {
err = errors.New("build process failed" + e.Error())
LogDebug("Exiting %v func[err =%v]", methodName, err)
return err
}
jsonStr := string(jsonBytes)
LogDebug("In %v Connecting to payload JSON is %v", methodName, jsonStr)
socket.SendText(jsonStr)
<-done
LogDebug("Exiting %v func[err =%v]", methodName, err)
return err
}
func OnConnectErrorHandler(err error, socket gowebsocket.Socket) {
methodName := "OnConnectErrorHandler"
LogDebug("Starting %v parameters [err = %v , socket = %v]", methodName, err, socket)
LogInfo("Disconnected from server ")
done <- true
}
The process should open one WebSocket connection for a job that runs about 60-90 seconds (like executing npm install), receive the job's logs over the socket until it finishes, and of course handle problems that can occur, such as a network issue or an error while running the job.
So, @Slabgorb is correct - if you look here (https://github.com/sacOO7/GoWebsocket/blob/master/gowebsocket.go#L87) you will see that the OnConnectErrorHandler is called synchronously during the execution of your call to Connect(). The Connect() function doesn't kick off a separate goroutine to handle the websocket until after the connection is fully established and the OnConnected callback has completed. So when you try to write to the unbuffered channel done, you are blocking the same goroutine that called into the run() function to begin with, and you deadlock yourself, because no goroutine will ever be able to read from the channel to unblock you.
So you could go with his solution and turn it into a buffered channel, and that will work, but my suggestion would be not to write to a channel for this sort of one-time flag behavior, but use close signaling instead. Define a channel for each condition you want to terminate run(), and in the appropriate websocket handler function, close the channel when that condition happens. At the bottom of run(), you can select on all the channels, and exit when the first one closes. It would look something like this:
package main
import "errors"
func run(appName string) (err error) {
// first, define one channel per socket-closing-reason (DO NOT defer close these channels.)
connectErrorChan := make(chan struct{})
successDoneChan := make(chan struct{})
surpriseDisconnectChan := make(chan struct{})
// next, wrap calls to your handlers in a closure `https://gobyexample.com/closures`
// that captures a reference to the channel you care about
OnConnectErrorHandler := func(err error, socket gowebsocket.Socket) {
MyOnConnectErrorHandler(connectErrorChan, err, socket)
}
OnDisconnectedHandler := func(err error, socket gowebsocket.Socket) {
MyOnDisconnectedHandler(surpriseDisconnectChan, err, socket)
}
// ... declare any other handlers that might close the connection here
// Do your setup logic here
// serviceURL, e := GetContext().getServiceURL(appName)
// . . .
// socket := gowebsocket.New(url)
socket.OnConnectError = OnConnectErrorHandler
socket.OnConnected = OnConnectedHandler
socket.OnTextMessage = socketTextMessageHandler
socket.OnDisconnected = OnDisconnectedHandler
// Prepare and send your message here...
// LogDebug("In %v func connecting to URL %v", methodName, url)
// . . .
// socket.SendText(jsonStr)
// now wait for one of your signalling channels to close.
select { // this will block until one of the handlers signals an exit
case <-connectErrorChan:
err = errors.New("never connected :( ")
case <-successDoneChan:
socket.Close()
LogDebug("mission accomplished! :) ")
case <-surpriseDisconnectChan:
err = errors.New("somebody cut the wires! :O ")
}
if err != nil {
LogDebug("%v", err)
}
return err
}
// *Your* connect error handler will take an extra channel as a parameter
func MyOnConnectErrorHandler(done chan struct{}, err error, socket gowebsocket.Socket) {
methodName := "OnConnectErrorHandler"
LogDebug("Starting %v parameters [err = %v , socket = %v]", methodName, err, socket)
LogInfo("Disconnected from server ")
close(done) // signal we are done.
}
This has a few advantages:
1) You don't need to guess which callbacks happen in-process and which happen in background goroutines (and you don't have to make all your channels buffered 'just in case')
2) Selecting on the multiple channels lets you find out why you are exiting and maybe handle cleanup or logging differently.
Note 1: If you choose to use close signaling, you have to use different channels for each source in order to avoid race conditions that might cause a channel to get closed twice from different goroutines (e.g. a timeout happens just as you get back a response, and both handlers fire; the second handler to close the same channel causes a panic.) This is also why you don't want to defer close all the channel at the top of the function.
Note 2: Not directly relevant to your question, but -- you don't need to close every channel - once all the handles to it go out of scope, the channel will get garbage collected whether or not it has been closed.
Ok, what is happening is that the send on the channel blocks, because nothing is receiving from it yet. Try initializing the done channel with a buffer (I used 1) like this:
done = make(chan bool, 1)
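A small sketch of why the buffer helps when the error callback runs synchronously. The connect function here is a hypothetical stand-in for the library's Connect(), not its real API, and the early return on failure is an addition for illustration.

```go
package main

import "fmt"

// connect is a hypothetical stand-in for a client whose error callback is
// invoked synchronously, on the caller's goroutine, when connecting fails.
func connect(fail bool, onError func()) {
	if fail {
		onError()
	}
}

// run reports what happened after the connection attempt.
func run(fail bool) string {
	// Capacity 1: the send inside the callback completes even though
	// nobody is receiving yet. With make(chan bool) it would block forever.
	done := make(chan bool, 1)
	connect(fail, func() { done <- true })

	// Check for a connect error before doing any further work.
	select {
	case <-done:
		return "connect failed"
	default:
		return "connected"
	}
}

func main() {
	fmt.Println(run(true))
	fmt.Println(run(false))
}
```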

Troubles with gitlab scraping via golang

I'm new to programming and I need help. I'm trying to write a GitLab scraper in Go.
Something goes wrong when I try to fetch project information concurrently.
Here is the code:
func (g *Gitlab) getAPIResponce(url string, structure interface{}) error {
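responce, responce_error := http.Get(url)
if responce_error != nil {
return responce_error
}
defer responce.Body.Close() // release the connection so it can be reused
ret, read_error := ioutil.ReadAll(responce.Body)
if read_error != nil {
return read_error
}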
if string(ret) != "[]" {
err := json.Unmarshal(ret, structure)
return err
}
return errors.New(error_emptypage)
}
...
func (g *Gitlab) GetProjects() {
projects_chan := make(chan Project, g.LatestProjectID)
var waitGroup sync.WaitGroup
queue := make(chan struct{}, 50)
for i := g.LatestProjectID; i > 0; i-- {
url := g.BaseURL + projects_url + "/" + strconv.Itoa(i) + g.Token
waitGroup.Add(1)
go func(url string, channel chan Project) {
queue <- struct{}{}
defer waitGroup.Done()
var oneProject Project
err := g.getAPIResponce(url, &oneProject)
if err != nil {
fmt.Println(err.Error())
}
fmt.Printf(".")
channel <- oneProject
<-queue
}(url, projects_chan)
}
go func() {
waitGroup.Wait()
close(projects_chan)
}()
for project := range projects_chan {
if project.ID != 0 {
g.Projects = append(g.Projects, project)
}
}
}
And here is the output:
$ ./gitlab-auditor
latest project = 1532
Gathering projects...
.......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................Get https://gitlab.example.com/api/v4/projects/563&private_token=SeCrEt_ToKeN: unexpected EOF
Get https://gitlab.example.com/api/v4/projects/558&private_token=SeCrEt_ToKeN: unexpected EOF
..Get https://gitlab.example.com/api/v4/projects/531&private_token=SeCrEt_ToKeN: unexpected EOF
Get https://gitlab.example.com/api/v4/projects/571&private_token=SeCrEt_ToKeN: unexpected EOF
.Get https://gitlab.example.com/api/v4/projects/570&private_token=SeCrEt_ToKeN: unexpected EOF
..Get https://gitlab.example.com/api/v4/projects/467&private_token=SeCrEt_ToKeN: unexpected EOF
Get https://gitlab.example.com/api/v4/projects/573&private_token=SeCrEt_ToKeN: unexpected EOF
................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
The failing projects differ on every run, but their IDs are always around 550.
When I curl the links from the output, I get normal JSON. When I run this code with queue := make(chan struct{}, 1) (effectively single-threaded), everything is fine.
What can it be?
I would say this is not a very clear way to achieve concurrency.
What seems to be happening here is:
1) You create a buffered channel with a capacity of 50.
2) Then you fire up 1532 goroutines.
3) The first 50 of them enqueue themselves and start processing. Each time one of them does <-queue and frees up a slot, a random one of the waiting goroutines gets onto the queue.
4) As people say in the comments, you most likely hit some limit by the time the burst reaches the IDs around 550. GitLab's API then rate-limits you.
5) Another goroutine is started that closes the channel to notify the main goroutine.
6) The main goroutine reads the messages.
The talk "Go Concurrency Patterns" as well as the blog post "Concurrency in Go" might help.
Personally I rarely use buffered channels. For your problem I would go like this:
1) Define a number of workers.
2) Have the main goroutine start the workers, each running a func that listens on a channel of ints, does the API call, and writes to a channel of projects.
3) Have the main goroutine send the project numbers to be fetched on the channel of ints and read from the channel of projects.
4) Maybe rate-limit by starting a ticker and having main read from it before it sends the next request.
5) Finally, main closes the number channel to tell the workers to exit.
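The worker steps listed above can be sketched roughly as follows. This is an illustration under assumptions: the fetch parameter is a hypothetical stand-in for the GitLab API call, Project is reduced to a single field, and a separate feeder goroutine does the sending so that the caller can keep reading results without a select.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Project is reduced to one field for this sketch.
type Project struct{ ID int }

// fetchAll fetches ids latest..1 using a fixed number of workers and at
// most one new request per interval. fetch stands in for the API call.
func fetchAll(latest, workers int, interval time.Duration,
	fetch func(id int) (Project, error)) []Project {

	ids := make(chan int)
	projects := make(chan Project)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for id := range ids {
				p, err := fetch(id)
				if err != nil {
					fmt.Println("fetch", id, "failed:", err)
					continue
				}
				projects <- p
			}
		}()
	}

	// Feeder: hand out one id per tick, then close ids so workers exit.
	go func() {
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for id := latest; id > 0; id-- {
			<-ticker.C
			ids <- id
		}
		close(ids)
	}()

	// Close projects once all workers are done.
	go func() {
		wg.Wait()
		close(projects)
	}()

	var out []Project
	for p := range projects {
		out = append(out, p)
	}
	return out
}

// fetchDemo uses a fake fetch that fails for every seventh id.
func fetchDemo() int {
	fake := func(id int) (Project, error) {
		if id%7 == 0 {
			return Project{}, fmt.Errorf("simulated failure")
		}
		return Project{ID: id}, nil
	}
	return len(fetchAll(50, 5, time.Millisecond, fake))
}

func main() {
	fmt.Println("fetched projects:", fetchDemo())
}
```

The interval caps the request rate and the worker count caps how many requests are in flight; both values would have to be tuned against the limits of the actual server.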

Concurrent Read/Close in Go, in a crossplatform way

I recently realized that I don't know how to properly Read and Close in Go concurrently. In my particular case, I need to do that with a serial port, but the problem is more generic.
If we do that without any extra effort to synchronize things, it leads to a race condition. Simple example:
package main
import (
"fmt"
"os"
"time"
)
func main() {
f, err := os.Open("/dev/ttyUSB0")
if err != nil {
panic(err)
}
// Start a goroutine which keeps reading from a serial port
go reader(f)
time.Sleep(1000 * time.Millisecond)
fmt.Println("closing")
f.Close()
time.Sleep(1000 * time.Millisecond)
}
func reader(f *os.File) {
b := make([]byte, 100)
for {
f.Read(b)
}
}
If we save the above as main.go, and run go run --race main.go, the output will look as follows:
closing
==================
WARNING: DATA RACE
Write at 0x00c4200143c0 by main goroutine:
os.(*file).close()
/usr/local/go/src/os/file_unix.go:143 +0x124
os.(*File).Close()
/usr/local/go/src/os/file_unix.go:132 +0x55
main.main()
/home/dimon/mydata/projects/go/src/dmitryfrank.com/testfiles/main.go:20 +0x13f
Previous read at 0x00c4200143c0 by goroutine 6:
os.(*File).read()
/usr/local/go/src/os/file_unix.go:228 +0x50
os.(*File).Read()
/usr/local/go/src/os/file.go:101 +0x6f
main.reader()
/home/dimon/mydata/projects/go/src/dmitryfrank.com/testfiles/main.go:27 +0x8b
Goroutine 6 (running) created at:
main.main()
/home/dimon/mydata/projects/go/src/dmitryfrank.com/testfiles/main.go:16 +0x81
==================
Found 1 data race(s)
exit status 66
OK, but how do we handle that properly? We can't just lock a mutex before calling f.Read(), because the mutex would end up locked basically all the time. To make it work, we'd need some cooperation between reading and locking, like condition variables provide: the mutex is unlocked before the goroutine is put to wait, and locked again when the goroutine wakes up.
I would implement something like this manually, but then I need some way to select things while reading. Like this: (pseudocode)
select {
case b := <-f.NextByte():
// process the byte somehow
default:
}
I examined docs of the packages os and sync, and so far I don't see any way to do that.
I believe you need two signals:
main -> reader, to tell it to stop reading
reader -> main, to tell that the reader has terminated
Of course you can pick whichever Go signaling primitive you prefer (channel, WaitGroup, context, etc.).
The example below uses a WaitGroup and a context. The reason is that you can spin up multiple readers and only need to cancel the context to tell all the reader goroutines to stop. I start multiple goroutines just to show that you can coordinate several of them this way. Note that each reader checks the context only between calls to f.Read, so it stops once its pending read returns.
package main
import (
"context"
"fmt"
"os"
"sync"
"time"
)
func main() {
ctx, cancelFn := context.WithCancel(context.Background())
f, err := os.Open("/dev/ttyUSB0")
if err != nil {
panic(err)
}
var wg sync.WaitGroup
for i := 0; i < 3; i++ {
wg.Add(1)
// Start a goroutine which keeps reading from a serial port
go func(i int) {
defer wg.Done()
reader(ctx, f)
fmt.Printf("reader %d closed\n", i)
}(i)
}
time.Sleep(1000 * time.Millisecond)
fmt.Println("closing")
cancelFn() // signal all reader to stop
wg.Wait() // wait until all reader finished
f.Close()
fmt.Println("file closed")
time.Sleep(1000 * time.Millisecond)
}
func reader(ctx context.Context, f *os.File) {
b := make([]byte, 100)
for {
select {
case <-ctx.Done():
return
default:
f.Read(b)
}
}
}
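For files that support deadlines, the os documentation states that Close cancels pending I/O, which then returns an ErrClosed error. The sketch below demonstrates this on an os.Pipe, where a blocked Read is released by Close from another goroutine. Whether a serial device such as /dev/ttyUSB0 behaves the same way depends on the platform and on whether the runtime can poll that file, so treat that part as an untested assumption. The names reader and closeDemo are illustrative.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"time"
)

// reader reads until Read fails and reports the error that stopped it.
func reader(f *os.File, done chan<- error) {
	buf := make([]byte, 100)
	for {
		if _, err := f.Read(buf); err != nil {
			done <- err
			return
		}
	}
}

// closeDemo reports whether closing the read end of a pipe released a
// reader blocked in Read with an error matching os.ErrClosed.
func closeDemo() bool {
	r, w, err := os.Pipe()
	if err != nil {
		return false
	}
	defer w.Close()

	done := make(chan error, 1)
	go reader(r, done)

	time.Sleep(50 * time.Millisecond) // give the reader time to block in Read
	r.Close()                         // cancels the pending Read

	select {
	case err := <-done:
		return errors.Is(err, os.ErrClosed)
	case <-time.After(2 * time.Second):
		return false // the reader was not released
	}
}

func main() {
	fmt.Println("reader released by Close:", closeDemo())
}
```

Here the done channel plays the role of the reader -> main signal, and Close itself is the main -> reader signal.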

Resources