On linux I need to start a number of socat instances from within a Python 3 program.
It appears that the os.exec functions all assume that the program specified in the exec is to replace the currently executing python.
It appears that there are ways to start things as subprocesses but presumably the subprocesses would die when the invoking python program ends.
How do I start several tasks that will persist after my Python program finishes its work, without having my Python process replaced?
I think this one should work:
import os

bashcmd = 'echo "1"'
os.popen('/bin/bash -c "' + bashcmd + '"')
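If you prefer to stay in Python, here is a minimal sketch using the standard subprocess module; the socat address pairs below are placeholders for whatever you actually need, and start_new_session=True puts each socat into its own session so it keeps running after the Python program exits:

import subprocess

# Placeholder socat argument lists; substitute your real address pairs.
listeners = [
    ["socat", "TCP-LISTEN:9001,fork,reuseaddr", "TCP:localhost:22"],
    ["socat", "TCP-LISTEN:9002,fork,reuseaddr", "TCP:localhost:80"],
]

for args in listeners:
    # start_new_session=True detaches each child into its own session,
    # so it is not tied to the Python process or its controlling terminal.
    subprocess.Popen(args,
                     stdin=subprocess.DEVNULL,
                     stdout=subprocess.DEVNULL,
                     stderr=subprocess.DEVNULL,
                     start_new_session=True)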
I have multiple processes (web scrapers) running in the background, one scraper for each website. The processes are Python scripts that were spawned/forked a few weeks ago. I would like to control them (they listen on sockets to enable IPC) from one central place, kind of like a dispatcher/manager Python script, while the scrapers remain individual, unrelated processes.
I thought about using the PID to reference each process, but that would require storing the PID whenever I (re)launch one of the scrapers because there is no semantic relation between a number and my use case. I just want to supply some text-tag along with the process when I launch it, so that I can reference it later on.
pgrep -f searches all processes by their name and calling pattern (including arguments).
E.g. if you spawned a process as python myscraper --scrapernametag=uniqueid01 then you can run:
TAG=uniqueid01; pgrep -f "scrapernametag=$TAG"
to discover the PID of a process later down the line.
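From the central dispatcher you can do the same lookup from Python; here is a small sketch using the standard subprocess module, with the tag format matching the example above:

import subprocess

def pid_for_tag(tag):
    # pgrep -f matches against the full command line, so the tag is enough.
    try:
        out = subprocess.check_output(["pgrep", "-f", "scrapernametag=" + tag])
    except subprocess.CalledProcessError:
        return None               # pgrep exits non-zero when nothing matches
    return int(out.split()[0])    # take the first matching PID

print(pid_for_tag("uniqueid01"))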
My Jenkins server (version 2.167) is running a shell build job that executes a script written with Python 3.7.0.
Sometimes users need to cancel the build manually (by clicking on the red button with the white cross in the Jenkins GUI), and the Python script needs to handle the interruption in order to perform cleanup tasks before exiting. Sometimes the interruption is handled correctly, but other times it seems that the parent process gets terminated before the Python script can run the cleanup procedure.
At the beginning of the Python script, I defined the following:
import signal
import sys

def cleanup_after_int(signum, frame):
    # some cleanup code here
    sys.exit(0)

signal.signal(signal.SIGINT, cleanup_after_int)
signal.signal(signal.SIGTERM, cleanup_after_int)

# the rest of the script here
Is the code I'm using sufficient, or should I consider something more?
The Jenkins doc for aborting a build is https://wiki.jenkins.io/display/JENKINS/Aborting+a+build
Found a pretty good document showing how this works: https://gist.github.com/datagrok/dfe9604cb907523f4a2f
You describe a race:
it seems that the parent process [sometimes] gets terminated before the Python script can run the cleanup procedure.
It would be helpful to know how you know that, in terms of the symptoms you observe.
In any event, the python code you posted looks fine. It should work as expected if SIGTERM is delivered to your python process. Perhaps jenkins is just terminating the parent bash. Or perhaps both bash & python are in the same process group and jenkins signals the process group. Pay attention to PGRP in ps -j output.
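A quick way to see how the processes are grouped is to print the IDs from inside your script and compare them with the parent shell's; this is just a diagnostic sketch:

import os

# If the PGID matches the parent bash's, a signal sent to that process
# group (as Jenkins may do) reaches both processes at once.
print("pid:", os.getpid(), "ppid:", os.getppid(),
      "pgid:", os.getpgrp(), "sid:", os.getsid(0))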
Perhaps your cleanup code is complex and requires resources that are not always available. For example, perhaps stdout is a pipe to the parent, and cleanup code logs to that open file descriptor, though sometimes a dead parent has closed it.
You might consider having the cleanup code first "daemonize", using this chapter 3 call: http://man7.org/linux/man-pages/man3/daemon.3.html. Then your cleanup would at least be less racy, leading to more reproducible results when you test it and when you use it in production.
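In Python terms, a rough equivalent of that daemon(3) step inside the handler could look like the following sketch (fork, let the original process exit, and continue the cleanup in a fresh session); treat it as an illustration rather than a drop-in:

import os
import signal
import sys

def cleanup_after_int(signum, frame):
    # Detach before cleaning up, so tearing down the parent's process
    # group cannot cut the cleanup short.
    if os.fork() != 0:
        os._exit(0)      # the original signal-handling process exits right away
    os.setsid()          # the survivor becomes its own session leader
    # ... cleanup code here, now detached from the parent ...
    sys.exit(0)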
You could choose to have the parent bash script orchestrate the cleanup:
trap "python cleanup.py" SIGINT SIGTERM
python doit.py
You could choose to not worry about cleaning upon exit at all. Instead, log whatever you've dirtied, and (synchronously) clean that just before starting, then begin your regularly scheduled script that does the real work. Suppose you create three temp files, and wish to tidy them up. Append each of their names to /tmp/temp_files.txt just before creating each one. Be sure to flush buffers and persist the write with fsync() or close().
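A minimal sketch of that manifest idea, assuming the /tmp/temp_files.txt path mentioned above:

import os

MANIFEST = "/tmp/temp_files.txt"

def clean_previous_run():
    # Remove whatever the previous run logged, then truncate the manifest.
    if os.path.exists(MANIFEST):
        with open(MANIFEST) as f:
            for line in f:
                path = line.strip()
                if path and os.path.exists(path):
                    os.remove(path)
        os.truncate(MANIFEST, 0)

def register_temp_file(path):
    # Log the name before creating the file; fsync so the record
    # survives even if the script is killed a moment later.
    with open(MANIFEST, "a") as f:
        f.write(path + "\n")
        f.flush()
        os.fsync(f.fileno())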
Rather than logging, you could choose to clean at startup without a log. For example:
$ rm -f /tmp/{1,2,3}.txt
might suffice. If only the first two were created last time, and the third does not exist, no big deal. Use wildcards where appropriate.
I'm still pretty confused about the role of the Linux shell in running programs, despite using Linux a lot.
I understand there are two types of shells, interactive shells and non-interactive shells. A terminal session interacts with an interactive shell, and scripts run in a non-interactive shell. But is there really any other difference than the ability to read input and print output? If I invoke a script from a shell, does it run in this interactive shell or in a new non-interactive shell inside the shell?
Also, when I execute a binary, either by invoking it through an interactive shell or through a graphical interface, does it always run in the shell, or could a process run without a shell at all? It's said that all processes communicate with the kernel through the shell, but I'm confused because in Docker you can define the entrypoint to be either a binary or "sh -c binary".
Every Linux system has a notion of a "first" process (usually called init) that is started directly by the kernel. Every other program on your computer is started by another process that first forks itself, then calls exec (actually, one of about six functions in the same family) to replace itself with a different program.
The shell is just one possible interface, one that parses text into requests to run other programs. The shell command line mv foo bar is parsed as a request to fork the shell and call exec in the new copy with the three words mv, foo, and bar as arguments.
Consider the following snippet of Python:
import subprocess
subprocess.call(["mv", "foo", "bar"])
which basically does the same thing: the Python program forks itself and calls exec with the three given strings as arguments. There is no shell involvement.
The shell is just a convenient UI that lets you run other processes the way you want to. It can also run scripts to do the same. That's all it does. It's not responsible for doing anything for the processes once it runs them.
You could entirely replace it with python, which lets you do the same things, but that's annoying because you have to type chepner's subprocess.call(["mv", "foo", "bar"]) just to run the mv program. If you wanted to pipe one program to another, you'd need 5-10 such lines. Not much fun to write interactively.
You could entirely replace it with KDE/Gnome/whatever and double click programs to run them, but that's not very flexible since you can't include arguments and such, and you can't automate it.
I understand there are two types of shells, interactive shells and non-interactive shells. A terminal session interacts with an interactive shell, and scripts run in a non-interactive shell. But is there really any other difference than the ability to read input and print output?
They are just two different modes that you can run sh in. You want comfy keyboard shortcuts, aliases and options to help type things manually (interactively), but they're pointless or annoying when running a pre-written script.
If I invoke a script from a shell, does it run in this interactive shell or in a new non-interactive shell inside the shell?
It runs in a new, independent process. You can run it in the same interactive shell instance with source yourscript, which is basically the same as typing the script contents on the keyboard.
Also, when I execute a binary, either by invoking it through an interactive shell or through a graphical interface, does it always run in the shell, or could a process run without a shell at all?
The process always runs entirely independently of the shell, but may share the same terminal.
It's said that all processes communicate with the kernel through the shell,
Processes never talk to the kernel through the shell. They talk through syscalls.
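For example, this snippet writes to the terminal by issuing a write(2) syscall directly (through the os module); no shell is involved at any point:

import os
os.write(1, b"straight to the kernel via write(2)\n")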
but I'm confused because in docker, you can define the entrypoint to be either a binary or "sh -c binary".
For a simple binary, the two are identical.
If you want to e.g. set up pipes or redirections because the process doesn't do it on its own, you can use sh -c to have a shell do it instead.
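The same distinction shows up in the Python examples above; mybinary here is just a placeholder name:

import subprocess

# Runs the binary directly: no shell, so no redirection support.
subprocess.call(["mybinary"])

# Equivalent of sh -c 'mybinary > out.log 2>&1': a shell sets up the
# redirection before exec'ing the binary.
subprocess.call("mybinary > out.log 2>&1", shell=True)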
I am running a Perl script that does a logical check to see if certain conditions have been met. For example: if it's been over a certain length of time, I want to run a system() command on a Linux server that runs another script that updates that data. The script that updates the file takes 10-15 seconds with the current number of files it has to go through, but can take up to 30 seconds during peak times of the month.
I want the Perl script to run, and if it has to run the system() command, I don't want it to wait for the system() call to finish before finishing the rest of the script. What is the best way to go about this?
Thank you
system runs its command in a shell, so you can use all of your shell features, including job control. Just stick & at the end of your command, thus:
system "sleep 30 &";
Use fork to create a child process, and then in the child, call your other script using exec instead of system. exec replaces the child process with your other script and never returns, so the child runs the updater independently and is not waited on. Meanwhile, your parent script can finish what it needs to do and exit as well.
Check this out. It may help you.
There's another good example of how to use fork on this page.
Not intended to be a pun due to publication date, but beware of zombies!
It is a bit tricky, see perlipc for details.
However, as far as I understood your problem, you don't need to maintain any relation between the updater and the caller processes. In this case, it is easier to just "fire and forget":
use strict;
use warnings qw(all);
use POSIX;

# fork child
unless (fork) {
    # create a new session
    POSIX::setsid();

    # fork grandchild
    unless (fork) {
        # close standard descriptors
        open STDIN,  '<', '/dev/null';
        open STDOUT, '>', '/dev/null';
        open STDERR, '>', '/dev/null';

        # run another process
        exec qw(sleep 10);
    }

    # terminate child
    exit;
}
In this example, sleep 10 doesn't belong to the parent's process group (or session) anymore, so even killing the parent process won't affect it.
There's a good tutorial about running external programs from Perl (including running background processes) at http://aaroncrane.co.uk/talks/pipes_and_processes/paper.html
OK, I need to write code that calls a script and, if the operation in the script hangs, terminates the process.
The preferred language is Python, but I'm also looking through C and bash script documentation too.
Seems like an easy problem, but I can't decide on the best solution.
From research so far:
Python: has some weird threading model where the virtual machine uses one thread at a time; won't work?
C: the preferred solution so far seems to be SIGALRM + fork + execl, but SIGALRM is not heap safe, so it could trash everything?
Bash: the timeout program? Not standard on all distros?
Since I'm a newbie to Linux, I'm probably unaware of 500 different gotchas with those functions, so can anyone tell me what's the safest and cleanest way?
Avoid SIGALRM because there is not much safe stuff to do inside the signal handler.
Considering the system calls that you should use, in C, after doing the fork-exec to start the subprocess, you can periodically call waitpid(2) with the WNOHANG option to inspect whether the subprocess is still running. If waitpid returns 0 (process is still running) and the desired timeout has passed, you can kill(2) the subprocess.
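If you end up doing it in Python after all, a minimal sketch of the same idea (start the child, poll it, kill it once the time limit passes) could look like this; the script name and the 30-second limit are placeholders:

import subprocess
import time

proc = subprocess.Popen(["./long_time_script.sh"])   # placeholder script
deadline = time.time() + 30                          # placeholder time limit

while proc.poll() is None:        # child still running?
    if time.time() > deadline:
        proc.kill()               # time limit reached
        proc.wait()               # reap the child to avoid a zombie
        break
    time.sleep(1)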
In bash you can do something similar to this:
start the script/program in background with &
get the process id of the background process
sleep for some time
and then kill the process (if it has already finished you cannot kill it), or first check whether the process is still alive and only then kill it.
Example:
sh long_time_script.sh &
pid=$!
sleep 30s
kill $pid
You can even try to use trap 'script_stopped $pid' SIGCHLD; see the bash man page for more info.
UPDATE: I found another command, timeout. It does exactly what you need: it runs a command with a time limit. Example:
timeout 10s sleep 15s
will kill the sleep after 10 seconds.
There is a collection of Python code that has features to do exactly this, and without too much difficulty if you know the APIs.
The Pycopia collection has the scheduler module for timing out functions, and the proctools module for spawning subprocesses and sending signals to it. The kill method can be used in this case.