It's my first time asking a question on here so bear with me.
I'm trying to make a Python 3 program that runs an executable for x amount of time and creates a log of all its output in a text file. For some reason the code I have so far only works with some executables. I'm new to Python, and especially to subprocess, so any help is appreciated.
import time
import subprocess
def CreateLog(executable, timeout=5):
    time_start = time.time()
    process = subprocess.Popen(executable, stdout=subprocess.PIPE,
                               stderr=subprocess.DEVNULL, text=True)
    f = open("log.txt", "w")
    while process.poll() is None:
        output = process.stdout.readline()
        if output:
            f.write(output)
        if time.time() > time_start + timeout:
            process.kill()
            break
I was recently experimenting with crypto mining and came across nanominer. I tried using this Python code on nanominer and the log file was empty. I am aware that nanominer already logs its own output, but the point is: why does the Python code fail?
You are interacting through .poll() (R U dead yet?) and .readline().
It's not clear you want to do that.
There seem to be two cases for your long-lived child:
it runs "too long" silently
it runs forever, regularly producing output text at e.g. one-second intervals
The 2nd case is the easy one.
Just use for line in process.stdout:, consume the line,
peek at the clock, and maybe send a .kill() just as you're already doing.
No need for .poll(), as the child exiting will produce EOF on that pipe.
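A minimal sketch of that second case, assuming the same five-second default as the question (run_and_log and log.txt are illustrative names, not from the original post):

import subprocess
import time

def run_and_log(executable, timeout=5, logname="log.txt"):
    process = subprocess.Popen(executable, stdout=subprocess.PIPE,
                               stderr=subprocess.DEVNULL, text=True)
    deadline = time.time() + timeout
    with open(logname, "w") as f:
        # This loop ends on its own: when the child exits, the pipe hits EOF.
        for line in process.stdout:
            f.write(line)
            if time.time() > deadline:
                process.kill()
                break
    process.wait()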
For the 1st case, you will want to set an alarm.
See https://docs.python.org/3/library/signal.html#example
signal.signal(signal.SIGALRM, handler)
signal.alarm(5)
After "too long", five seconds, your handler will run.
It can do anything you desire.
You'll want it to have access to the process handle,
which will let you send a .kill().
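A sketch of how that can fit together (POSIX-only, since SIGALRM is not available on Windows; the function and handler names are made up for illustration):

import signal
import subprocess

def run_with_alarm(executable, timeout=5):
    process = subprocess.Popen(executable, stdout=subprocess.DEVNULL,
                               stderr=subprocess.DEVNULL)

    def handler(signum, frame):
        # The handler closes over `process`, so it can send the kill.
        process.kill()

    signal.signal(signal.SIGALRM, handler)
    signal.alarm(timeout)    # timeout must be a whole number of seconds
    process.wait()           # returns early if the child exits on its own
    signal.alarm(0)          # cancel the alarm if it has not fired yet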
Related
I am grading Python assignments in an entry-level class. I run each student's program with the same input as my sample program and compare the outputs:
proc = subprocess.Popen([sys.executable, arg_path],
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE,
                        universal_newlines=True,
                        bufsize=0)
for the_arg in arg_list:
    proc.stdin.write(f"{the_arg}\n")
sample_output = ""
for line in proc.stdout:
    sample_output += line
The problem is that if they really mess up and their program waits for more input than I am sending, the program will hang forever when I try to read from proc.stdout. I cannot find how, using subprocess.Popen, to set a time limit (like 10 seconds) after which I can kill the process. Since I have a wrapper that runs each student's program, it's a real problem if one of them hangs on me.
TIA
I have tried proc.communicate(timeout=10) and that just kills the process no matter what, even if everything is correct.
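For reference, the usual timeout pattern (a sketch only, not the asker's full wrapper; arg_path and arg_list are reused from the snippet above) is to hand all of the input to communicate() in one call and catch the timeout:

import subprocess
import sys

input_text = "".join(f"{the_arg}\n" for the_arg in arg_list)
proc = subprocess.Popen([sys.executable, arg_path],
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE,
                        universal_newlines=True)
try:
    sample_output, errors = proc.communicate(input=input_text, timeout=10)
except subprocess.TimeoutExpired:
    proc.kill()                                  # the student's program hung
    sample_output, errors = proc.communicate()   # collect whatever it printed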
I have a script that retrieves temperature data using requests. Since I have to make a lot of requests (around 13,000), I decided to explore multi-threading, which I am new to.
The program works by grabbing longitude/latitude data from a CSV file and then making a request to retrieve the temperature data.
The problem I am facing is that the script does not fully finish after the last temperature value is retrieved.
Here is the code. I have shortened it so it is easy to see what I am doing:
num_threads = 16
q = Queue(maxsize=0)
def get_temp(q):
    while not q.empty():
        work = q.get()
        if work is None:
            break
        ## rest of my code here
        q.task_done()
At main:
def main():
    for o in range(num_threads):
        logging.debug('Starting Thread %s', o)
        worker = threading.Thread(target=get_temp, args=(q,))
        worker.setDaemon(True)
        worker.start()
    logging.info("Main Thread Waiting")
    q.join()
    logging.info("Job complete!")
I do not see any errors on the console, and the temperature data is being successfully written to another file. I have tried running a test CSV file with only a few longitude/latitude entries, and the script seems to finish executing fine.
So is there a way of shedding light on what might be happening in the background? I am using Python 3.7.3 with PyCharm 2019.1 on Linux Mint 19.1.
q.join() blocks until task_done() has been called for every item that was put on the queue; only then does execution continue to the next line.
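A minimal sketch of that relationship (standalone, not the asker's code; the worker marks every item done even if handling it raises):

import queue
import threading

q = queue.Queue()

def worker():
    while True:
        item = q.get()
        try:
            pass  # handle the item here
        finally:
            q.task_done()   # q.join() only returns once this has run for every item

threading.Thread(target=worker, daemon=True).start()
for item in range(10):
    q.put(item)
q.join()   # blocks here until all 10 items have been marked done
print("Job complete!")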
I want to read a file line by line and output each line at a fixed interval.
The purpose of the script is to replay some GPS log files whilst updating the time/date fields as the software I'm testing rejects messages if they are too far out from the system time.
I'm attempting to use apscheduler for this as I wanted the output rate to be as close to 10Hz as reasonably possible and this didn't seem achievable with simple sleep commands.
I'm new to Python so I can get a little stuck on the scope of variables with tasks like this. The closest I've come to making this work is by just reading lines from the file object in my scheduled function.
import sys, os
from apscheduler.schedulers.blocking import BlockingScheduler
def emitRMC():
    line = route.readline()
    if line == b'':
        route.seek(0)
        line = route.readline()
    print(line)

if __name__ == '__main__':
    route = open("testfile.txt", "rb")
    scheduler = BlockingScheduler()
    scheduler.add_executor('processpool')
    scheduler.add_job(emitRMC, 'interval', seconds=0.1)
    scheduler.start()
However, this doesn't really feel like the correct way of proceeding, and I'm also seeing each input line repeated 10 times in the output, which I can't explain.
The repetition seemed very consistent and I thought it might be due to max_workers, although I've since set that to 1 without any impact.
I also changed the interval, as 10 Hz and the 10x repetition felt like it could be more than a coincidence.
Usually when I get stuck like this it means I'm heading off in the wrong direction and I need pointing to a smarter approach so all advice will be welcome.
I found a simple solution here: Executing periodic actions in Python, with this code from Micheal Anderson, which works in a single thread.
import datetime, threading, time
def foo():
    next_call = time.time()
    while True:
        print(datetime.datetime.now())
        next_call = next_call + 1
        time.sleep(next_call - time.time())

timerThread = threading.Thread(target=foo)
timerThread.start()
I've been through dozens of the "Python subprocess hangs" articles here and think I've addressed all of the issues presented in the various articles in the code below.
My code intermittently hangs at the Popen command. I am running 4 threads using multiprocessing.dummy.apply_async, each of those threads starts a subprocess and then reads the output line by line and prints a modified version of it to stdout.
def my_subproc():
    exec_command = ['stdbuf', '-i0', '-o0', '-e0',
                    sys.executable, '-u',
                    os.path.dirname(os.path.realpath(__file__)) + '/myscript.py']
    proc = subprocess.Popen(exec_command, env=env, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, bufsize=1)
    print "DEBUG1", device
    for line in iter(proc.stdout.readline, b''):
        with print_lock:
            for l in textwrap.wrap(line.rstrip(), LINE_WRAP_DEFAULT):
                print l  # prints the modified line (rest of the body trimmed in the post)
The code above is run from apply_async:
pool = multiprocessing.dummy.Pool(4)
for i in range(0, 4):
    pool.apply_async(my_subproc)
Intermittently the subprocess will hang at subprocess.Popen, the statement "DEBUG1" is not printed. Sometimes all threads will work, sometimes as few as 1 of the 4 will work.
I'm not aware that this exhibits any of the known deadlock situations for Popen. Am I wrong?
There is an insidious bug in subprocess.Popen() caused by io buffering of stdout (and possibly stderr). There is a limit of around 65536 characters in the child process's io buffer. If the child process writes enough output, it will "hang" waiting for the buffer to be flushed - a deadlock situation. The authors of subprocess.py seem to believe this is a problem caused by the child, even though a subprocess.flush would be welcome. Anders Pearson,
https://thraxil.org/users/anders/posts/2008/03/13/Subprocess-Hanging-PIPE-is-your-enemy/, has a simple solution, but you have to pay attention. As he says, "tempfile.TemporaryFile() is your friend." In my case I am running an application in a loop to batch process a bunch of files; the code for the solution is:
with tempfile.TemporaryFile() as fout:
    sp.run(['gmat', '-m', '-ns', '-x', '-r', str(gmat_args)],
           timeout=cpto, check=True, stdout=fout, stderr=fout)
The fix above still deadlocks after processing about 20 files. An improvement but not good enough, since I need to process hundreds of files in a batch. I came up with the below "crowbar" approach.
proc = sp.Popen(['gmat', '-m', '-ns', '-x', '-r', str(gmat_args)],
                stdout=sp.PIPE, stderr=sp.STDOUT)
""" Run GMAT for each file in batch.
Arguments:
-m: Start GMAT with a minimized interface.
-ns: Start GMAT without the splash screen showing.
-x: Exit GMAT after running the specified script.
-r: Automatically run the specified script after loading.
Note: The OS pipe buffer that Popen() writes into is limited, usually to about 65536 bytes.
If this is exceeded, the child process hangs with a write pending until the buffer is read.
https://thraxil.org/users/anders/posts/2008/03/13/Subprocess-Hanging-PIPE-is-your-enemy/
"""
try:
    (outs, errors) = proc.communicate(timeout=cpto)
    """Timeout in cpto seconds if the process does not complete."""
except sp.TimeoutExpired as e:
    logging.error('GMAT timed out in child process. Time allowed was %s secs, continuing', str(cpto))
    logging.info("Process %s being terminated.", str(proc.pid))
    proc.kill()
    """ The child process is not killed by the system. """
    (outs, errors) = proc.communicate()
    """ And the stdout buffer must be flushed. """
The basic idea is to kill the process and flush the buffer on each timeout. I moved the TimeoutExpired exception handler into the batch processing loop so that after killing the process, it continues with the next file. This is harmless if the timeout value is long enough to allow gmat to complete (albeit more slowly). I find that the code will process from 3 to 20 files before it times out.
This really seems like a bug in subprocess.
This appears to be a bad interaction with multiprocessing.dummy. When I use multiprocessing (not the .dummy threading interface) I'm unable to reproduce the error.
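A self-contained sketch of that workaround (the trivial child command here is only a stand-in, not the original myscript.py):

import multiprocessing
import subprocess
import sys

def my_subproc(i):
    # Each pool worker is a separate process, so Popen is not being
    # called concurrently from several threads of one interpreter.
    proc = subprocess.Popen([sys.executable, '-c', 'print("hello")'],
                            stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    return i, out

if __name__ == '__main__':
    with multiprocessing.Pool(4) as pool:
        results = [pool.apply_async(my_subproc, (i,)) for i in range(4)]
        for r in results:
            print(r.get())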
I'm working on code where I have a long running shell command whose output is sent to disk. This command will generate hundreds of GBs per file. I have successfully written code that calls this command asynchronously and successfully yields control (awaits) for it to complete.
I also have code that can asynchronously read that file as it is being written to so that I can process the data contained therein. The problem I'm running into is that I can't find a way to stop the file reader once the shell command completes.
I guess I'm looking for some sort of interrupt I can pass into my reader function once the shell command ends, which I can use to tell it to close the file and wrap up the event loop.
Here is my reader function. Right now, it runs forever, waiting for new data to be written to the file.
import asyncio
PERIOD = 0.5
async def readline(f):
    while True:
        data = f.readline()
        if data:
            return data
        await asyncio.sleep(PERIOD)

async def read_zmap_file():
    with open('/data/largefile.json', mode='r+t', encoding='utf-8') as f:
        i = 0
        while True:
            line = await readline(f)
            print('{:>10}: {!s}'.format(str(i), line.strip()))
            i += 1
loop = asyncio.get_event_loop()
loop.run_until_complete(read_zmap_file())
loop.close()
If my approach is off, please let me know. I'm relatively new to asynchronous programming. Any help would be appreciated.
So, I'd do something like
reader = loop.create_task(read_zmap_file())
Then, in the code that manages the shell process, once the shell command exits you can do
reader.cancel()
You can then do
loop.run_until_complete(reader)
to let the cancellation finish; expect it to raise asyncio.CancelledError unless the task suppresses it.
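Put together, that could look something like this (a sketch only; read_zmap_file is the coroutine from the question, and run_shell_command is a made-up stand-in for whatever launches and awaits the long-running command):

import asyncio

async def run_shell_command():
    # Placeholder for the real long-running shell command.
    await asyncio.sleep(3)

loop = asyncio.get_event_loop()
reader = loop.create_task(read_zmap_file())

# Run the shell command to completion, then stop the reader.
loop.run_until_complete(run_shell_command())
reader.cancel()
try:
    loop.run_until_complete(reader)   # let the cancellation propagate
except asyncio.CancelledError:
    pass
loop.close()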
Alternatively, you could simply set a flag somewhere and use that flag in your while statement. You don't need to use asyncio primitives when something simpler works.
That said, I'd look into ways that your reader can avoid the periodic sleep. If your reader will be able to keep up with the shell command, I'd recommend a pipe, because pipes can be used with select (and thus added to an event loop). Then, in your reader, you can write to a file if you need a permanent log. I realize the discussion of avoiding the periodic sleep is beyond the scope of this question, and I don't want to go into more detail than I have, but you did ask for hints on how best to approach async programming.
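As a rough illustration of the pipe idea (a sketch under assumptions, not the asker's setup: the cat command and the permanent.log path are made up), asyncio's subprocess support hands you the command's stdout as a stream the event loop can wait on, so no polling sleep is needed:

import asyncio

async def run_and_read():
    # The child's stdout is a pipe; the event loop waits on it directly.
    proc = await asyncio.create_subprocess_shell(
        'cat /data/largefile.json',
        stdout=asyncio.subprocess.PIPE)
    with open('permanent.log', 'w', encoding='utf-8') as log:
        while True:
            line = await proc.stdout.readline()
            if not line:                 # EOF: the command has exited
                break
            text = line.decode('utf-8').rstrip()
            print(text)
            log.write(text + '\n')
    await proc.wait()

asyncio.get_event_loop().run_until_complete(run_and_read())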