examine output of shell command using subprocess in Python - python-3.x

I'm running a shell command in a Jupyter Notebook using subprocess or os.system() . The actual output is a dump of thousands of lines of code which takes at least a minute to stdout in terminal. In my notebook, I just want to know if the output is more than a couple of lines because if it was an error, the output would only be 1 or 2 lines. What's the best way to check if I'm receiving 20+ lines and then stop the process and move on to the next?

You could read the output line by line using subprocess.Popen and count the lines (redirecting stderr into stdout so both streams are counted; merging may not be needed, depending on the process).
If the number of lines exceeds 20, kill the process and break out of the loop.
If the loop ends without exceeding 20 lines, print or handle an error.
code:
import subprocess

# cmd is the command to run (not shown in the question)
p = subprocess.Popen(cmd,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT)
for lineno, line in enumerate(iter(p.stdout.readline, b'')):
    if lineno == 20:
        # more than 20 lines were read: this is not a short error message
        print("process okay")
        p.kill()
        break
else:
    # too short, break wasn't reached
    print("process failed return code: {}".format(p.wait()))
Note that checking whether p.poll() is not None can also help to detect that the process has ended prematurely.
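The same idea can be wrapped in a reusable function. This is only a sketch: the name produces_at_least and the min_lines parameter are made up for illustration, and it assumes the command is given as an argument list.

```python
import subprocess
import sys

def produces_at_least(cmd, min_lines=20):
    """Return True as soon as cmd has written min_lines lines, else False."""
    # Popen used as a context manager closes the pipe and waits on exit
    with subprocess.Popen(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT) as p:
        count = 0
        for _ in p.stdout:
            count += 1
            if count >= min_lines:
                p.kill()      # enough output seen; stop the process early
                return True
    return False              # output ended before min_lines was reached

if __name__ == "__main__":
    # Stand-in command that prints 100 lines
    demo = [sys.executable, "-c", "for i in range(100): print(i)"]
    print(produces_at_least(demo))
```

In a notebook you would call it once per command and move on to the next one whenever it returns.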

Related

does this create a pipe and, if so, is it closed automatically: subprocess.run(stdout=subprocess.PIPE

Given the following code
import subprocess
result = subprocess.run(
    'scontrol --details show node',
    shell=True, stdout=subprocess.PIPE, timeout=30)
if result.returncode != 0:
    print("Error: scontrol returned a non-zero exit code.")
    exit(result.returncode)
Does this create a pipe, and, if so, does that pipe get closed automatically? I assume the subprocess will terminate automatically with that code, even if it hangs? One more question: To write good code, should I use with anywhere? Thanks.
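As far as I know, stdout=subprocess.PIPE does make subprocess.run() create a pipe, and run() reads it to the end and closes it before returning, so no with block is needed around the call. When the timeout expires, run() kills the child and raises subprocess.TimeoutExpired. With shell=True the process that gets killed is the shell, so a command it started may outlive it. A minimal sketch of handling both outcomes follows; the helper name run_captured is hypothetical, and a Python one-liner stands in for scontrol.

```python
import subprocess
import sys

def run_captured(cmd, timeout):
    """Run cmd, returning (returncode, stdout_bytes), or None on timeout."""
    try:
        result = subprocess.run(cmd, stdout=subprocess.PIPE, timeout=timeout)
    except subprocess.TimeoutExpired:
        # run() has already killed the child before re-raising
        return None
    return result.returncode, result.stdout

if __name__ == "__main__":
    # Stand-in for the scontrol command from the question
    print(run_captured([sys.executable, "-c", "print('node info')"], timeout=30))
```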

Python script does not print output as supposed

I have a very simple (test) code which I'm running either from a Linux shell, or in interactive mode, and I have two different behaviours I cannot figure out the reason of.
I have a file generated by a Popen call, previously, where each line is a file path. This is the code used to generate the file:
with open('find.txt','w') as f:
    find = subprocess.Popen(["find",".","-name","myfile.out"],stdout=f)
(Incidentally, I was trying to build a PIPE originally, namely inputting the output of this command to a grep command, and since I wasn't successful in any way, I decided to break the problem down and just read the file paths from a file, and process them one by one. So maybe there is a common issue that is blocking me somewhere in this procedure).
Since in this second step I wasn't even able to open and process the files by opening the addresses contained in each line of the find.txt file, I just tried to print the file lines out, because for sure they're available in there:
with open('find.txt','r') as g:
    for l in g.readlines():
        print(l)
Now, the interesting part:
if I paste the lines above into a python shell, everything works fine and I get my outputs as expected
if, on the other hand, I try to run python test.py, where test.py is the name of the file containing the lines above, no output appears in the shell's stdout.
I've tried sys.stdout.flush() to no avail. I've also inserted some dummy print() statements along the way: everything gets printed but what's after the g.readlines() statement.
Here's the full script I'm trying to make work (a pre-precursor of what I'm actually after, tbh).
#!/usr/bin/env python3
import subprocess
import sys
with open('find.txt','w') as f:
    find = subprocess.Popen(["find",".","-name","myfile.out"],stdout=f)
print('hello')
with open('find.txt','r') as g:
    print('hello?')
    for l in g.readlines():
        print('help me!')
        print(l)
        sys.stdout.flush()
output being:
{ancis:>106> python test.py
hello
hello?
{ancis:>106>
EDIT
I've quickly tried the very same lines (but without the call to find, which isn't available) on my python installation in Windows: it works as expected)
Based on that, I've tried to run the simpler code below:
print('hello')
with open('find.txt','r') as g:
    print('hello?')
    for l in g.readlines():
        print('help me!')
        print(l)
        sys.stdout.flush()
as a script, in Linux - This also works w/o problems.
This should mean that somehow I'm messing things up with the call to Popen... But what?
This is a race condition.
Your call to
find = subprocess.Popen(["find",".","-name","myfile.out"],stdout=f)
is opening another process and running your find command which takes a bit of time to fully execute.
Python then continues on and reaches the reading of the file portion before the command is fully executed and the file is generated.
Want to test it out?
Add a time.sleep(1) just before the opening of the file.
Full test script:
#!/usr/bin/env python3
import subprocess
import time
with open('find.txt','w') as f:
    find = subprocess.Popen(["find",".","-name","myfile.out"],stdout=f)
time.sleep(1)
with open('find.txt','r') as g:
    for l in g:
        print(l)
To block until the process is complete you can use find.communicate().
With this you can also optionally set a timeout if that's something that you want.
#!/usr/bin/env python3
import subprocess
with open('find.txt','w') as f:
    find = subprocess.Popen(["find",".","-name","myfile.out"],stdout=f)
    find.communicate()
with open('find.txt','r') as g:
    for l in g:
        print(l)
Source:
https://docs.python.org/3/library/subprocess.html#subprocess.Popen.communicate
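An alternative that avoids the race is subprocess.run(), which only returns after the child has exited. The sketch below assumes that approach; the helper name write_then_read is made up for illustration, and a Python one-liner stands in for find so the example runs anywhere.

```python
import subprocess
import sys

def write_then_read(cmd, path):
    """Run cmd with stdout redirected to path, then return the file's lines."""
    with open(path, 'w') as f:
        # run() blocks until the child is done, so the file is complete
        subprocess.run(cmd, stdout=f, check=True)
    with open(path) as g:
        return [line.rstrip('\n') for line in g]

if __name__ == "__main__":
    # Stand-in for ["find", ".", "-name", "myfile.out"]
    fake_find = [sys.executable, "-c", "print('./a/myfile.out')"]
    for found in write_then_read(fake_find, 'find.txt'):
        print(found)
```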

rotate (reopen) file in python3, when signal is received

I have simple script, that logs to a logfile. This is the core of my script:
with open('/var/log/mail/mail.log', 'a', buffering=1) as f:
    for line in sys.stdin:
        f.write(line)
The file being written /var/log/mail/mail.log has to be rotated regularly by logrotate. At the moment, when logrotate rotates the file, my script does not realize it, and continues writing to the old (now renamed) file.
logrotate has the possibility to execute command after the file has been rotated. Normally, when rsyslog is logging, the command would be:
postrotate
    invoke-rc.d rsyslog rotate > /dev/null
endscript
But in my case, I need to send some signal to my script, and handle the signal in my script.
Also, I don't know in advance the PID my script is running as.
How can I best implement this ?
One solution is to check whether the inode of the open log file still matches the inode of the file at that path, and to reopen the file if it does not. This only works on Unix.
import os, sys, stat
logfile = '/var/log/mail/mail.log'
while True:
    with open(logfile, 'a', buffering=1) as f:
        # Remember which file (inode and device) was opened
        f_ino = os.stat(logfile)[stat.ST_INO]
        f_dev = os.stat(logfile)[stat.ST_DEV]
        for line in sys.stdin:
            f.write(line)
            try:
                if (os.stat(logfile)[stat.ST_INO] != f_ino or
                        os.stat(logfile)[stat.ST_DEV] != f_dev):
                    break  # the path now names a different file: reopen it
            except OSError: # Was IOError with python < 3.4
                pass
        else:
            break  # stdin reached EOF: stop instead of reopening forever
Closing the file is not required as it’s handled by the with context manager.
The try..except OSError block catches any error raised by os.stat. While the file is being replaced, os.stat can raise an OSError (for example a FileNotFoundError). In that case the exception is ignored and the inode check is retried after the next line is written. Without this try..except block, the program could terminate during rotation.
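The inode comparison can also be pulled out into a small helper. This is a sketch only: the name was_rotated is hypothetical, and it uses os.fstat on the open file instead of a second os.stat on the path, which is a small variation on the code above.

```python
import os
import tempfile

def was_rotated(f, path):
    """Return True if path no longer refers to the file open as f."""
    opened = os.fstat(f.fileno())
    try:
        current = os.stat(path)
    except OSError:
        return False  # path briefly missing during rotation; check again later
    return (opened.st_ino, opened.st_dev) != (current.st_ino, current.st_dev)

if __name__ == "__main__":
    # Simulate a rotation: rename the log, then create a new file in its place
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "mail.log")
        with open(path, "a") as f:
            print(was_rotated(f, path))   # same file
            os.rename(path, path + ".1")
            open(path, "a").close()
            print(was_rotated(f, path))   # path now names a new file
```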

Subprocess.Popen vs .call: What is the correct way to call a C-executable from shell script using python where all 6 jobs can run in parallel

Using subprocess.Popen produces incomplete results, whereas subprocess.call gives the correct output.
This is related to a regression script which has 6 jobs; each job performs the same task but on different input files. I'm running everything in parallel using subprocess.Popen.
The task is performed by a shell script which calls a bunch of compiled C executables whose job is to generate some text reports and then convert the text report info into jpg images.
Sample of the shell script (runit is the file name) calling the compiled C executables:
#!/bin/csh -f
#file name : runit
#C - Executable 1
clean_spgs
#C - Executable 2
scrub_spgs_all file1
scrub_spgs_all file2
#C - Executable 3
scrub_pick file1 1000
scrub_pick file2 1000
When using subprocess.Popen, both scrub_spgs_all and scrub_pick try to run in parallel, causing the script to generate incomplete results, i.e. the output text files don't contain complete information and some of the output text reports are missing.
subprocess.Popen call is
resrun_proc = subprocess.Popen("./"+runrescompare, shell=True, cwd=rescompare_dir, stdout=subprocess.PIPE, stderr=subprocess.POPT, universal_newlines=True)
where runrescompare is a shell script and has
#!/bin/csh
#some other text
./runit
Using subprocess.call, on the other hand, generates all the output text files and jpg images correctly, but I can't run all 6 jobs in parallel.
resrun_proc = subprocess.call("./"+runrescompare, shell=True, cwd=rescompare_dir, stdout=subprocess.PIPE, stderr=subprocess.POPT, universal_newlines=True)
What is the correct way to call a C executable from a shell script using Python subprocess calls so that all 6 jobs can run in parallel (using Python 3.5.1)?
Thanks.
You tried to simulate multiprocessing with subprocess.Popen(), which does not work the way you want: once the pipe fills up, the child's output blocks unless you consume it, for instance with communicate() (which is itself blocking) or by reading the output yourself. With 6 concurrent handles read in a loop, you are bound to get deadlocks.
The best way is to run the subprocess.call lines in separate threads.
There are several ways to do it. Small simple example with locking:
import threading,time
lock=threading.Lock()
def func1(a,b,c):
    lock.acquire()
    print(a,b,c)
    lock.release()
    time.sleep(10)
tl=[]
t = threading.Thread(target=func1,args=[1,2,3])
t.start()
tl.append(t)
t=threading.Thread(target=func1,args=[4,5,6])
t.start()
tl.append(t)
# wait for all threads to complete (if you want to wait, else
# you can skip this loop)
for t in tl:
    t.join()
I took the time to create an example more suitable to your needs:
Two threads each execute a command and capture its output, then print it while holding a lock to avoid mixing the output up. I have used the check_output method for this. I'm using Windows, and I list the C and D drives in parallel.
import threading,time,subprocess
lock=threading.Lock()
def func1(runrescompare,rescompare_dir):
    resrun_proc = subprocess.check_output(runrescompare, shell=True, cwd=rescompare_dir, stderr=subprocess.PIPE, universal_newlines=True)
    lock.acquire()
    print(resrun_proc)
    lock.release()
tl=[]
t=threading.Thread(target=func1,args=["ls","C:/"])
t.start()
tl.append(t)
t=threading.Thread(target=func1,args=["ls","D:/"])
t.start()
tl.append(t)
# wait for all threads to complete (if you want to wait, else
# you can skip this loop)
for t in tl:
    t.join()
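The thread bookkeeping can also be left to concurrent.futures. The sketch below is an alternative to the hand-written threads above, not what the answer used: run_job and run_all are made-up names, subprocess.run replaces check_output, and Python one-liners stand in for the six shell scripts.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_job(cmd, cwd=None):
    """Run one command to completion; return (returncode, combined output)."""
    done = subprocess.run(cmd, cwd=cwd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, universal_newlines=True)
    return done.returncode, done.stdout

def run_all(cmds, max_workers=6):
    """Run the commands in parallel threads; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_job, cmds))

if __name__ == "__main__":
    # Six stand-in jobs; in the question each would be "./" + runrescompare
    jobs = [[sys.executable, "-c", "print('job %d done')" % i] for i in range(6)]
    for returncode, output in run_all(jobs):
        print(returncode, output.strip())
```

Each thread blocks on its own subprocess.run call, so every job's output is fully consumed while the jobs themselves run side by side.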

python subprocess.readline() blocking when calling another python script

I've been playing with using the subprocess module to run python scripts as sub-processes and have come across a problem with reading output line by line.
The documentation I have read indicates that you should be able to use subprocess and call readline() on stdout, and this does indeed work if the script I am calling is a bash script. However, when I run a python script, readline() blocks until the whole script has completed.
I have written a couple of test scripts that reproduce the problem. In the test scripts I attempt to run a python script (tst1.py) as a sub-process from within a python script (tst.py) and then read the output of tst1.py line by line.
tst.py starts tst1.py and tries to read the output line by line:
#!/usr/bin/env python
import sys, subprocess, multiprocessing, time
cmdStr = 'python ./tst1.py'
print(cmdStr)
cmdList = cmdStr.split()
subProc = subprocess.Popen(cmdList, stdout=subprocess.PIPE, stderr=subprocess.PIPE, stdin=subprocess.PIPE)
while(1):
    # this call blocks until tst1.py has completed, then reads all the output
    # it then reads empty lines (seemingly for ever)
    ln = subProc.stdout.readline()
    if ln:
        print(ln)
tst1.py simply loops printing out a message:
#!/usr/bin/env python
import time
if __name__ == "__main__":
    x = 0
    while(x<20):
        print("%d: sleeping ..." % x)
        # flushing stdout here fixes the problem
        #sys.stdout.flush()
        time.sleep(1)
        x += 1
If tst1.py is written as a shell script tst1.sh:
#!/bin/bash
x=0
while [ $x -lt 20 ]
do
    echo $x: sleeping ...
    sleep 1
    let x++
done
readline() works as expected.
After some playing about I discovered the situation can be resolved by flushing stdout in tst1.py, but I do not understand why this is required. I was wondering if anyone had an explanation for this behaviour ?
I am running redhat 4 linux:
Linux lb-cbga-05 2.6.9-89.ELsmp #1 SMP Mon Apr 20 10:33:05 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
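Because if the output is buffered somewhere, the parent process won't see it until the child process exits; at that point the output is flushed and all file descriptors are closed. Python block-buffers stdout when it is connected to a pipe rather than a terminal, which is why the explicit flush is needed in tst1.py. As for why it works with bash without explicitly flushing the output: each echo writes its output as soon as it runs. In bash echo is a builtin, and in shells where it is an external command the shell forks a process that executes echo and exits, so the output is flushed either way.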
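If the child script cannot be edited, starting it with python -u turns off the stdout buffering from the outside. The sketch below assumes that approach: read_lines_live is a made-up helper name, and CHILD is a shortened stand-in for tst1.py.

```python
import subprocess
import sys

# Shortened stand-in for tst1.py: three lines, no explicit flush
CHILD = ("import time\n"
         "for x in range(3):\n"
         "    print('%d: sleeping ...' % x)\n"
         "    time.sleep(0.1)\n")

def read_lines_live(cmd):
    """Yield each line of cmd's stdout as the child writes it."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE,
                          universal_newlines=True) as proc:
        for line in proc.stdout:   # the loop ends at EOF, when the child exits
            yield line.rstrip('\n')

if __name__ == "__main__":
    # -u makes the child's stdout unbuffered, so lines arrive one by one
    for line in read_lines_live([sys.executable, "-u", "-c", CHILD]):
        print(line)
```

Iterating over proc.stdout also stops at end of file, which avoids the endless run of empty reads seen in tst.py.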
