Go exec.CommandContext is not being terminated after context timeout - linux

In Go, I can usually use context.WithTimeout() in combination with exec.CommandContext() to have a command automatically killed (with SIGKILL) after the timeout.
But I'm running into a strange issue: if I wrap the command with sh -c AND buffer the command's output by setting cmd.Stdout = &bytes.Buffer{}, the timeout no longer works and the command runs forever.
Why does this happen?
Here is a minimal reproducible example:
package main

import (
    "bytes"
    "context"
    "os/exec"
    "time"
)

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
    defer cancel()

    cmdArgs := []string{"sh", "-c", "sleep infinity"}
    bufferOutputs := true

    // Uncommenting *either* of the next two lines will make the issue go away:
    // cmdArgs = []string{"sleep", "infinity"}
    // bufferOutputs = false

    cmd := exec.CommandContext(ctx, cmdArgs[0], cmdArgs[1:]...)
    if bufferOutputs {
        cmd.Stdout = &bytes.Buffer{}
    }
    _ = cmd.Run()
}
I've tagged this question with Linux because I've only verified that this happens on Ubuntu 20.04 and I'm not sure whether it would reproduce on other platforms.

My issue was that the child sleep process was not being killed when the context timed out. The sh parent process was killed, but the child sleep was left behind.
That alone would normally still let cmd.Wait() return, but cmd.Wait() waits both for the process to exit and for its output to finish being copied. Because we've assigned cmd.Stdout, Wait has to wait for the read end of the stdout pipe to close, and it never closes because the write end is still held open by the running sleep process.
To kill child processes as well, we can instead start the command as its own process group leader by setting the Setpgid bit. We can then signal the negative PID, which kills the whole process group: the top-level process plus any subprocesses.
Here is a drop-in replacement for exec.CommandContext I came up with that does exactly this:
import (
    "context"
    "os/exec"
    "syscall"
)

type Cmd struct {
    ctx context.Context
    *exec.Cmd
}

// NewCommand is like exec.CommandContext but ensures that subprocesses
// are killed when the context times out, not just the top-level process.
func NewCommand(ctx context.Context, command string, args ...string) *Cmd {
    return &Cmd{ctx, exec.Command(command, args...)}
}

func (c *Cmd) Start() error {
    // Force-enable the setpgid bit so that we can kill child processes
    // when the context times out or is canceled.
    if c.Cmd.SysProcAttr == nil {
        c.Cmd.SysProcAttr = &syscall.SysProcAttr{}
    }
    c.Cmd.SysProcAttr.Setpgid = true
    err := c.Cmd.Start()
    if err != nil {
        return err
    }
    go func() {
        <-c.ctx.Done()
        p := c.Cmd.Process
        if p == nil {
            return
        }
        // Kill by negative PID to kill the process group, which includes
        // the top-level process we spawned as well as any subprocesses
        // it spawned.
        _ = syscall.Kill(-p.Pid, syscall.SIGKILL)
    }()
    return nil
}

func (c *Cmd) Run() error {
    if err := c.Start(); err != nil {
        return err
    }
    return c.Wait()
}

Related

In Golang, how to terminate an os.exec.Cmd process with a SIGTERM instead of a SIGKILL?

Currently, I am terminating a process using the Golang os.exec.Cmd.Process.Kill() method (on an Ubuntu box).
This seems to terminate the process immediately instead of gracefully. Some of the processes that I am launching also write to files, and it causes the files to become truncated.
I want to terminate the process gracefully with a SIGTERM instead of a SIGKILL using Golang.
Here is a simple example of a process that is started and then terminated using cmd.Process.Kill(). I would like a Go alternative to the Kill() method that uses SIGTERM instead of SIGKILL, thanks!
import "os/exec"

cmd := exec.Command("nc", "example.com", "80")
if err := cmd.Start(); err != nil {
    log.Print(err)
}
go func() {
    cmd.Wait()
}()
// Kill the process - this seems to kill the process ungracefully
cmd.Process.Kill()
You can use the Signal() API; the supported signals are defined in the syscall package.
So basically you might want to use
cmd.Process.Signal(syscall.SIGTERM)
Also please note, as per the documentation:
The only signal values guaranteed to be present in the os package on
all systems are os.Interrupt (send the process an interrupt) and
os.Kill (force the process to exit). On Windows, sending os.Interrupt
to a process with os.Process.Signal is not implemented; it will return
an error instead of sending a signal.
Alternatively, you may use:
cmd.Process.Signal(os.Interrupt)
Tested example:
package main

import (
    "fmt"
    "log"
    "net"
    "os"
    "os/exec"
    "sync"
    "time"
)

func main() {
    cmd := exec.Command("nc", "-l", "8080")
    cmd.Stderr = os.Stderr
    cmd.Stdout = os.Stdout
    cmd.Stdin = os.Stdin
    err := cmd.Start()
    if err != nil {
        log.Fatal(err)
    }
    var wg sync.WaitGroup
    wg.Add(1)
    go func() {
        err := cmd.Wait()
        if err != nil {
            fmt.Println("cmd.Wait:", err)
        }
        fmt.Println("done")
        wg.Done()
    }()
    fmt.Println("TCP Dial")
    fmt.Println("Pid =", cmd.Process.Pid)
    time.Sleep(200 * time.Millisecond)
    // or comment this and use: nc 127.0.0.1 8080
    w1, err := net.DialTimeout("tcp", "127.0.0.1:8080", 1*time.Second)
    if err != nil {
        log.Fatal("tcp DialTimeout:", err)
    }
    defer w1.Close()
    fmt.Fprintln(w1, "Hi")
    time.Sleep(1 * time.Second)
    // cmd.Process.Kill()
    cmd.Process.Signal(os.Interrupt)
    wg.Wait()
}
Output:
TCP Dial
Pid = 21257
Hi
cmd.Wait: signal: interrupt
done

How to ptrace a multi-process or multi-thread application from Go

I am attempting to write a program that would print all of the syscalls a program makes. I am having trouble extending this code to work with multi-processing scripts. I started off with code from https://github.com/lizrice/strace-from-scratch, and now I would like to trace offspring processes as well.
I tried adding the options PTRACE_O_TRACEVFORK | PTRACE_O_TRACEFORK | PTRACE_O_TRACECLONE, but this causes the process to hang for some reason. I do not know why. If I do not specify these options, then the process runs to completion, but of course children are not traced.
package main

import (
    "fmt"
    "log"
    "os"
    "os/exec"
    "runtime"

    seccomp "github.com/seccomp/libseccomp-golang"
    "golang.org/x/sys/unix"
)

func init() {
    runtime.LockOSThread()
}

func main() {
    var err error
    if len(os.Args) < 2 {
        log.Fatalf("usage: ./trace-files program [arg]...")
    }
    cmd := exec.Command(os.Args[1], os.Args[2:]...)
    cmd.Stderr = os.Stderr
    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.SysProcAttr = &unix.SysProcAttr{
        Ptrace: true,
    }
    if err = cmd.Start(); err != nil {
        log.Fatalf("error starting command: %s\n", err)
    }
    if err = cmd.Wait(); err != nil {
        // We expect "trace/breakpoint trap" here.
        fmt.Printf("Wait returned: %s\n", err)
    }
    pid := cmd.Process.Pid
    exit := true
    var regs unix.PtraceRegs
    var status unix.WaitStatus
    // TODO: Setting these options causes the multiprocessing Python script to hang.
    ptraceOptions := unix.PTRACE_O_TRACEVFORK | unix.PTRACE_O_TRACEFORK | unix.PTRACE_O_TRACECLONE
    if err = unix.PtraceSetOptions(pid, ptraceOptions); err != nil {
        log.Fatalf("error setting ptrace options: %s", err)
    }
    fmt.Println("pid\tsyscall")
    for {
        if exit {
            err = unix.PtraceGetRegs(pid, &regs)
            if err != nil {
                break
            }
            name, err := seccomp.ScmpSyscall(regs.Orig_rax).GetName()
            if err != nil {
                fmt.Printf("error getting syscall name for orig_rax %d\n", regs.Orig_rax)
            }
            fmt.Printf("%d\t%s\n", pid, name)
        }
        if err = unix.PtraceSyscall(pid, 0); err != nil {
            log.Fatalf("error calling ptrace syscall: %s\n", err)
        }
        // TODO: is it OK to overwrite pid here?
        pid, err = unix.Wait4(pid, &status, 0, nil)
        if err != nil {
            log.Fatalf("error calling wait")
        }
        exit = !exit
    }
}
For testing purposes, I have written a Python script that uses multiprocessing and prints the process IDs that it spawned.
import multiprocessing
import os

def fun(x):
    return os.getpid()

if __name__ == "__main__":
    print("PYTHON: starting multiprocessing pool")
    with multiprocessing.Pool() as pool:
        processes = pool.map(fun, range(1000000))
    print("PYTHON: ended multiprocessing pool")
    processes = map(str, set(processes))
    print("PYTHON: process IDs: ", ", ".join(processes))
When I run the Go code above on a single-process program, like ls, things seem to work fine.
go run . ls
But when I run the Go code on the Python script, then the output hangs (but only if I supply the ptrace options I mentioned above).
go run . python script.py
My end goal for this program is to get a list of all of the files a program uses. I will inspect /proc/PID/maps for each syscall for that part, but first I would like to know how to trace multi-process programs. I tried looking through the documentation and code for strace, but that confused me further...

os.Process.Wait() after os.FindProcess(pid) works on windows not on linux

I have an issue when trying to recover a process in Go. My Go app launches a bunch of processes, and when it crashes those processes are left out in the open; when I rerun my app I want to recover them. On Windows everything works as expected: I can Wait() on the process, Kill() it, etc. But on Linux it just goes through my Wait() without any error.
Here is the code
func (proc *process) Recover() {
    pr, err := os.FindProcess(proc.Cmd.Process.Pid)
    if err != nil {
        return
    }
    log.Info("Recovering " + proc.Name + proc.Service.Version)
    Processes.Lock()
    Processes.Map[proc.Name] = proc
    Processes.Unlock()
    proc.Cmd.Process = pr
    if proc.Service.Reload > 0 {
        proc.End = make(chan bool)
        go proc.KillRoutine()
    }
    proc.Cmd.Wait()
    if proc.Status != "killed" {
        proc.Status = "finished"
    }
    proc.Time = time.Now()
    channelProcess <- proc
    // confirmation that process was killed
    if proc.End != nil {
        proc.End <- true
    }
}
process is my own struct for handling processes; the important part is Cmd, which comes from the os/exec package. I have also tried calling pr.Wait() directly, with the same issue.
You're not handling the error returned by Wait. Try:
ps, err := proc.Cmd.Wait()
if err != nil {
    /* handle it */
}
Also the documentation says:
Wait waits for the Process to exit, and then returns a ProcessState
describing its status and an error, if any. Wait releases any
resources associated with the Process. On most operating systems, the
Process must be a child of the current process or an error will be
returned.
In your case, since you're "recovering", your process is not the parent of the process you found using os.FindProcess.
So why does it work on Windows? I suspect it's because on Windows Wait boils down to WaitForSingleObject, which doesn't have that requirement.

Prevent Ctrl+C from interrupting exec.Command in Golang

I've noticed that processes started with exec.Command get interrupted even when the interrupt call has been intercepted via signal.Notify. I've done the following example to show the issue:
package main

import (
    "log"
    "os"
    "os/exec"
    "os/signal"
    "syscall"
)

func sleep() {
    log.Println("Sleep start")
    cmd := exec.Command("sleep", "60")
    cmd.Run()
    log.Println("Sleep stop")
}

func main() {
    var doneChannel = make(chan bool)
    go sleep()
    c := make(chan os.Signal, 1)
    signal.Notify(c, os.Interrupt)
    signal.Notify(c, syscall.SIGTERM)
    go func() {
        <-c
        log.Println("Received Ctrl + C")
    }()
    <-doneChannel
}
If Ctrl+C is pressed while this program is running, it's going to print:
2015/10/16 10:05:50 Sleep start
^C2015/10/16 10:05:52 Received Ctrl + C
2015/10/16 10:05:52 Sleep stop
showing that the sleep command gets interrupted. Ctrl+C is successfully caught, though, and the main program doesn't quit; it's just the sleep command that gets affected.
Any idea how to prevent this from happening?
The shell signals the entire foreground process group when you press Ctrl+C. If you signal the parent process directly, the child process won't receive the signal.
To prevent the shell from signaling the children, you need to start the command in its own process group by setting the Setpgid and Pgid fields in syscall.SysProcAttr before starting the process:
cmd := exec.Command("sleep", "60")
cmd.SysProcAttr = &syscall.SysProcAttr{
    Setpgid: true,
}
You can also ignore the syscall.SIGINT signal; the ignored disposition is inherited, so it won't interrupt the child started with exec.Command.
func main() {
    var doneChannel = make(chan bool)
    signal.Ignore(syscall.SIGINT)
    go func() {
        log.Println("Sleep start")
        cmd := exec.Command("sleep", "10")
        cmd.Run()
        log.Println("Sleep stop")
        doneChannel <- true
    }()
    <-doneChannel
}

Start a process in Go and detach from it

I need to start a new process in Go with the following requirements:
The started process should keep running even after the Go (parent) process terminates
I need to be able to set the Unix user/group that's running it
I need to be able to set the environment variables inherited
I need control over std in/out/err
Here is an attempt:
stdout, _ := os.Create("stdout.log")
stderr, _ := os.Create("stderr.log")

var attr = os.ProcAttr{
    Dir: "/bin",
    Env: os.Environ(),
    Files: []*os.File{
        os.Stdin,
        stdout,
        stderr,
    },
}
process, err := os.StartProcess("/bin/sleep", []string{"sleep", "1"}, &attr)
This works fine but has the following shortcomings from the requirements:
No way to set Unix user/group
The started process ends when the Go process (parent) stops
This needs to run on Linux only if that simplifies things.
You can use process.Release to detach the child process from the parent and let it survive after the parent dies.
Look at the Credential field of syscall.SysProcAttr (set via os.ProcAttr's Sys field): with it you can set the process's user and group IDs.
Here is a working version of your example (I did not check whether the process IDs were actually the ones set):
package main

import (
    "fmt"
    "os"
    "syscall"
)

const (
    UID = 501
    GID = 100
)

func main() {
    // The Credential fields are used to set the UID, GID and additional
    // GIDs of the process. You need to run the program as root to do this.
    var cred = &syscall.Credential{Uid: UID, Gid: GID, Groups: []uint32{}}
    // The Noctty flag is used to detach the process from the parent tty.
    var sysproc = &syscall.SysProcAttr{Credential: cred, Noctty: true}
    var attr = os.ProcAttr{
        Dir: ".",
        Env: os.Environ(),
        Files: []*os.File{
            os.Stdin,
            nil,
            nil,
        },
        Sys: sysproc,
    }
    process, err := os.StartProcess("/bin/sleep", []string{"/bin/sleep", "100"}, &attr)
    if err == nil {
        // It is not clear from the docs, but Release actually detaches the process.
        err = process.Release()
        if err != nil {
            fmt.Println(err.Error())
        }
    } else {
        fmt.Println(err.Error())
    }
}
What I have found that seems to work cross-platform is to re-run the program with a special flag. In your main program, check for this flag. If present on startup, you're in the "fork". If not present, re-run the command with the flag.
func rerunDetached() error {
    cwd, err := os.Getwd()
    if err != nil {
        return err
    }
    args := append(os.Args, "--detached")
    cmd := exec.Command(args[0], args[1:]...)
    cmd.Dir = cwd
    err = cmd.Start()
    if err != nil {
        return err
    }
    cmd.Process.Release()
    return nil
}
This simply re-runs your process with the exact same parameters, with --detached appended to the arguments. When your program starts, check for the --detached flag to decide whether you need to call rerunDetached. It is sort of a poor man's fork() that works across operating systems.