I'm using asyncio to run a subprocess that prints some output which I parse line by line and do various things based on the output I see. I want to put a timeout on this process, but it should not be a global timeout for the life of the entire process. Instead, whenever I see certain specific output from the process, I want to actually reset the timeout so that it starts over. How can I implement this?
For a global timeout I have this working and it's easy, I simply call asyncio.wait_for(_foo(), timeout). But I can't get this to work with resetting the timeout. Here's what I have so far:
# here invocation is my own data structure with some bookkeeping information in it
# (such as the start time from which I want to base my timeout decisions on).
# and process is the value returned by await asyncio.create_subprocess_exec(...)
# _run_one_invocation() is my own function which is just a stdout readline loop
# and some dispatching.
# Make a Task out of the co-routine so that we decide when it gets cancelled, not Python.
run_task = asyncio.Task(_run_one_invocation(invocation, process))
while True:
try:
# Use asyncio.shield so it doesn't get cancelled if the timeout expires
await asyncio.shield(asyncio.wait_for(run_task, 1))
# If the await returns without raising an exception, we got to EOF and we're done.
break
except asyncio.TimeoutError:
# If it's been too long since last reset, this is a "real" timeout.
duration = time.time() - invocation.start_time
if duration > timeout:
run_task.cancel()
raise
When I run this, the if statement that calls run_task.cancel() is never entered, yet when I go back to the top of the loop and call asyncio.wait_for() again, it immediately raises asyncio.CancelledError.
What am I doing wrong?
The problem is that shield() wraps the wait_for() call rather than the task it is meant to protect: nothing cancels the shielded wait_for() from the outside, but wait_for() itself cancels run_task when its 1-second timeout expires, so the next iteration finds an already-cancelled task. You can fix the issue and simplify the code by avoiding wait_for() (and therefore shield()) entirely and just using wait(return_when=FIRST_COMPLETED) to implement the kind of timeout you need:
run_task = asyncio.create_task(_run_one_invocation(invocation, process))
while True:
await asyncio.wait([run_task], timeout=1)
if run_task.done():
break
if time.time() - invocation.start_time > timeout:
run_task.cancel()
        raise asyncio.TimeoutError()
The downside of this approach is that it introduces 1s wakeups, prohibiting the program (and consequently the computer) from ever going to sleep, even if the task is dormant for hours. Probably not a big deal on a server, but such practices contribute to battery drain on laptops, and it's a good idea to avoid them. Also, the 1s sleep introduces an up to 1s latency to react to a change in timeout.
To resolve that, you can create an event that is fired by the code changing the timeout, and react to that event in addition to the timeout and the task completing:
timeout_changed = asyncio.Event()
# pass timeout_changed where needed, and have the code that changes
# the timeout also call timeout_changed.set()
run_task = asyncio.create_task(_run_one_invocation(invocation, process))
while True:
remaining = timeout - (time.time() - invocation.start_time)
timeout_changed_task = asyncio.ensure_future(timeout_changed.wait())
await asyncio.wait([run_task, timeout_changed_task],
return_when=asyncio.FIRST_COMPLETED, timeout=remaining)
timeout_changed_task.cancel()
# either: 1) the task has completed, 2) the previous timeout has
# expired, or 3) the timeout has changed
if run_task.done():
break # 1
if time.time() - invocation.start_time > timeout:
# 2 or 2+3
run_task.cancel()
        raise asyncio.TimeoutError()
# 3 - continue waiting with the new timeout
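For completeness, the "code that changes the timeout" referred to in the comments could look something like this sketch. It assumes (as in the question) that a reset simply restarts the clock kept in invocation.start_time; the helper name reset_timeout is made up:
def reset_timeout(invocation, timeout_changed):
    # restart the clock; the loop above recomputes `remaining` from start_time
    invocation.start_time = time.time()
    # wake the waiting loop so it picks up the new deadline immediately...
    timeout_changed.set()
    # ...then re-arm the event for the next reset (set() has already released
    # any pending waiter, so clearing right away is safe)
    timeout_changed.clear()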
I have a Celery (4.4.7) task which makes a blocking call and may block for a long time. I don't want to keep the worker busy for that long, so I set up a soft_time_limit for the task. My hope was to fail the task (if it can't complete quickly) and retry it later.
My issue is that the SoftTimeLimitExceeded exception is never raised (I suppose because the call blocks at the OS level). As a result, the task is killed by the hard time_limit and I don't get a chance to retry it.
from celery import shared_task
from celery.exceptions import SoftTimeLimitExceeded

@shared_task(
acks_late=True,
ignore_results=True,
soft_time_limit=5,
time_limit=15,
default_retry_delay=1,
retry_kwargs={"max_retries": 10},
retry_backoff=True,
retry_backoff_max=1200, # 20 min
retry_jitter=True,
autoretry_for=(SoftTimeLimitExceeded,),
)
def my_task():
blocking_call_taking_long_time()
What I tried:
The hard time limit is impossible to intercept.
I expected acks_late to push my timed-out task back to the queue, but that doesn't happen.
I tried adding reject_on_worker_lost; neither value changes anything for me.
The SoftTimeLimitExceeded exception is definitely never raised: neither autoretry_for nor a regular try ... except catches it.
For now I have ended up setting an explicit timeout on the blocking operation itself, which requires threading a timeout parameter through the entire call chain.
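For reference, that workaround looks roughly like the sketch below. Whether blocking_call_taking_long_time can actually accept a timeout, and which exception it raises when the timeout expires, are assumptions here, not part of the real API:
from celery import shared_task

@shared_task(
    acks_late=True,
    ignore_results=True,
    time_limit=15,
    default_retry_delay=1,
    retry_kwargs={"max_retries": 10},
    autoretry_for=(TimeoutError,),  # assumed: the explicit timeout raises TimeoutError
)
def my_task():
    # the timeout value has to be passed down every level of the call chain
    blocking_call_taking_long_time(timeout=5)  # assumed keyword argument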
Is there some other path I'm missing?
I have some lists of multiprocessing.Queues to communicate between two processes. I want to send a None as the last value on each of the Queues to indicate to the second process the end of the data stream, but this does not always seem to work (I get the None on some of the Queues but not on all of them) unless I add at least one print() after one of the put() calls.
Clarification: It works sometimes without the print, but not always. Also, when I put the print instructions, this works so far 100% of the time.
I have also tried setting block=True for the put() method, but this does not seem to make any difference.
I found this workaround while trying to debug the problem, trying to figure out whether the issue is on the put() side or the get() side, but as soon as I add a print() on the put() side, the code always works.
EDIT:
A simplified but complete version that partly reproduces the problem: I have identified two potentially problematic parts, marked in the code as CODEBLOCK1 and CODEBLOCK2. If I comment out either one of these, the code works as expected.
minimal_example.py:
import multiprocessing, processes
def MainProcess():
multiprocessing.set_start_method("spawn")
metricsQueue = multiprocessing.Queue() # Virtually infinite size
# Define and start the parallel processes
process1 = multiprocessing.Process(target=processes.Process1,
args=(metricsQueue,))
process2 = multiprocessing.Process(target=processes.Process2,
args=(metricsQueue,))
process1.start()
process2.start()
process1.join()
process2.join()
# Script entry point
if __name__ == '__main__':
MainProcess()
processes.py:
import random, queue
def Process1(metricsQueue):
print("Start of process 1")
# Cancel join for the queues, so that upon killing this process, the main process does not block on join if there
# are still elements on the queues -> We don't mind losing data if the process is killed.
# Start of CODEBLOCK1
metricsQueue.cancel_join_thread()
# End of CODEBLOCK1
longData = random.sample(range(10205, 26512), 992)
# Start of CODEBLOCK2
# Put a big number of data in the queue
for data in longData:
try:
metricsQueue.put(data, block=False)
except queue.Full:
print("Error")
# End of CODEBLOCK2
# Once finished, push a None through all queues to mark the end of the process
try:
metricsQueue.put(None, block=False)
print("put None in metricsQueue")
except queue.Full:
print("Error")
print("End of process 1")
def Process2(metricsQueue):
print("Start of process 2")
newMetricsPoint = 0
recoveredMetrics = []
while (newMetricsPoint is not None):
# Metrics point
try:
newMetricsPoint = metricsQueue.get(block=False)
except queue.Empty:
pass
else:
if (newMetricsPoint is not None):
recoveredMetrics.append(newMetricsPoint)
print(f"got {len(recoveredMetrics)} points so far")
else:
print("get None from metricsQueue")
print("End of process 2")
This code gives output something like the following, and the second process never ends, because it is stuck in the while loop:
Start of process 1
Start of process 2
put None in metricsQueue 0
End of process 1
If I comment out either CODEBLOCK1 or CODEBLOCK2, the code works as expected:
Start of process 1
Start of process 2
put None in metricsQueue 0
End of process 1
get None from metricsQueue 0
End of process 2
We don't mind losing data if the process is killed.
This assumption is not correct. The closing signal None is part of the data; losing it prevents the sibling process from shutting down.
If the processes rely on a shutdown signal, do not call cancel_join_thread() on the queues used to send that signal.
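In terms of the example above, Process1 would then become something like this sketch (CODEBLOCK1 removed, and the puts left blocking since the queue is effectively unbounded):
def Process1(metricsQueue):
    print("Start of process 1")
    # no cancel_join_thread() here: the queue's feeder thread is allowed to
    # flush everything on exit, so the None sentinel below cannot be lost
    longData = random.sample(range(10205, 26512), 992)
    for data in longData:
        metricsQueue.put(data)
    metricsQueue.put(None)  # end-of-stream marker for process 2
    print("End of process 1")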
Never mind, I found the problem.
It turns out I misinterpreted what queue.cancel_join_thread() does.
It makes process 1 finish as soon as it is done sending all its data, even if some of that data is still sitting in the queue waiting to be consumed by the second process. That causes all the unconsumed data to be flushed and therefore lost, never arriving at my second process.
Working with Python 3.6, what I’m looking to accomplish is to create a function that continuously scrapes dynamic/changing data from a webpage, while the rest of the script executes, and is able to reference the data returned from the continuous function.
I know this is likely a threading task; however, I'm not very knowledgeable about threading yet. Pseudo-code for what I have in mind looks something like this:
def continuous_scraper():
# Pull data from webpage
scraped_table = pd.read_html(url)
return scraped_table
# start the continuous scraper function here, to run either indefinitely, or preferably stop after a predefined amount of time
scraped_table = thread(continuous_scraper)
# the rest of the script is run here, making use of the updating “scraped_table”
while True:
    print(scraped_table["Col_1"].iloc[0])
Here is a fairly simple example using some stock market page that seems to update every couple of seconds.
import threading, time
import pandas as pd
# A lock is used to ensure only one thread reads or writes the variable at any one time
scraped_table_lock = threading.Lock()
# Initially set to None so we know when its value has changed
scraped_table = None
# This bad-boy will be called only once in a separate thread
def continuous_scraper():
# Tell Python this is a global variable, so it rebinds scraped_table
# instead of creating a local variable that is also named scraped_table
global scraped_table
url = r"https://tradingeconomics.com/australia/stock-market"
while True:
# Pull data from webpage
result = pd.read_html(url, match="Dow Jones")[0]
# Acquire the lock to ensure thread-safety, then assign the new result
# This is done after read_html returns so it doesn't hold the lock for so long
with scraped_table_lock:
scraped_table = result
# You don't wanna flog the server, so wait 2 seconds after each
# response before sending another request
time.sleep(2)
# Make the thread daemonic, so the thread doesn't continue to run once the
# main script and any other non-daemonic threads have ended
scraper_thread = threading.Thread(target=continuous_scraper, daemon=True)
# start the continuous scraper function here, to run either indefinitely, or
# preferably stop after a predefined amount of time
scraper_thread.start()
# the rest of the script is run here, making use of the updating “scraped_table”
for _ in range(100):
print("Time:", time.time())
# Acquire the lock to ensure thread-safety
with scraped_table_lock:
# Check if it has been changed from the default value of None
if scraped_table is not None:
print(" ", scraped_table)
else:
print("scraped_table is None")
# You probably don't wanna flog your stdout, either, dawg!
time.sleep(0.5)
Be sure to read about multithreaded programming and thread safety. It's easy to make mistakes. If there is a bug, it often only manifests in rare and seemingly random occasions, making it difficult to debug.
I recommend looking into the multiprocessing library and its Pool class.
The docs have multiple examples of how to use it.
The question itself is too general for a single simple answer.
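For context, the kind of Pool usage the docs describe looks like this minimal sketch (scrape_once and the URL are placeholders, not taken from the question):
from multiprocessing import Pool

def scrape_once(url):
    # placeholder for fetching and parsing one page
    return f"scraped {url}"

if __name__ == '__main__':
    with Pool(processes=1) as pool:
        # run the scraper in the background while the main script keeps going
        pending = pool.apply_async(scrape_once, ("https://example.com",))
        # ... rest of the script ...
        print(pending.get(timeout=30))  # collect the result when it is needed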
I have a simple watchdog in python 3 that reboots my server if something goes wrong:
import time, os
from multiprocessing import Pool
def watchdog(x):
time.sleep(x)
os.system('reboot')
return
def main():
while True:
p = Pool(processes=1)
p.apply_async(watchdog, (60, )) # start watchdog with 60s interval
        # here some code that has a little chance to block permanently...
# reboot is ok because of many other files running independently
# that will get problems too if this one blocks too long and
# this will reset all together and autostart everything back
        # blocking happens 1-2 times a month, mostly within an http-request
p.terminate()
p.join()
return
if __name__ == '__main__':
main()
p = Pool(processes=1) is declared every time the while loop starts.
Now here is the question: is there a smarter way?
If I call p.terminate() to prevent the reboot, the Pool becomes closed for any further work. Or is there actually nothing wrong with creating a new Pool every time, because garbage collection cleans up the old one?
Use a process. Processes support all of the features you are using, so you don't need to make a pool with size one. While processes do have a warning about using the terminate() method (since it can corrupt pipes, sockets, and locking primitives), you are not using any of those items and don't need to care. (In any event, Pool.terminate() probably has the same issues with pipes etc. even though it lacks a similar warning.)
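A sketch of what that looks like for the code in the question (same names, same 60 s interval):
import time, os
from multiprocessing import Process

def watchdog(x):
    time.sleep(x)
    os.system('reboot')

def main():
    while True:
        # start the watchdog with a 60 s fuse
        p = Process(target=watchdog, args=(60,))
        p.start()
        # ... the code that occasionally blocks goes here ...
        p.terminate()  # defuse the reboot once the work finished in time
        p.join()

if __name__ == '__main__':
    main()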
I have a piece of code that has to execute every 100 ms and update the GUI. To update the GUI, I press a button, which starts a thread that in turn calls a target function. The target function sends a message back to the GUI thread using pubsub as follows.
wx.CallAfter(pub.sendMessage, "READ EVENT", arg1=data, arg2=status_read) # This command line is in my target function
pub.subscribe(self.ReadEvent, "READ EVENT") # This is in my GUI file - which calls the following function
def ReadEvent(self, arg1, arg2):
if arg2 == 0:
self.MessageBox('The program did not properly read data from MCU \n Contact the Program Developer')
return
else:
self.data = arg1
self.firmware_version_text_control.Clear()
#fwversion = '0x' + ''.join('{:02X}'.format(j) for j in reversed(fwversion))
self.firmware_version_text_control.AppendText(str(SortAndDecode(self.data, 'FwVersion')))
# Pump Model
self.pump_model_text_control.Clear()
self.pump_model_text_control.AppendText(str(SortAndDecode(self.data, 'ModelName')))
# Pump Serial Number
self.pump_serial_number_text_control.Clear()
self.pump_serial_number_text_control.AppendText(str(SortAndDecode(self.data, 'SerialNum'))[:10]) # Personal Hack to not to display the AA , AB and A0
# Pressure GAIN
self.gain_text_control.Clear()
self.gain_text_control.AppendText(str(SortAndDecode(self.data, 'PresGain')))
# Pressure OFFSET Offset
self.offset_text_control.Clear()
self.offset_text_control.AppendText(str(SortAndDecode(self.data, 'PresOffset')))
#Wagner Message:
#self.status_text.SetLabel(str(SortAndDecode(self.data, 'WelcomeMsg')))
# PUMP RUNNING OR STOPPED
if PumpState(SortAndDecode(self.data, 'PumpState')) == 1:
self.led6.SetBackgroundColour('GREEN')
elif PumpState(SortAndDecode(self.data, 'PumpState')) == 0:
self.led6.SetBackgroundColour('RED')
else:
self.status_text.SetLabel(PumpState(SortAndDecode(self.data, 'PumpState')))
# PUMP RPM
self.pump_rpm_text_control.Clear()
if not self.new_model_value.GetValue():
self.pump_rpm_text_control.AppendText("000")
else:
self.pump_rpm_text_control.AppendText(str(self.sheet_num.cell_value(self.sel+1,10)*(SortAndDecode(self.data, 'FrqQ5'))/65536))
# PUMP PRESSURE
self.pressure_text_control.Clear()
self.pressure_text_control.AppendText(str(SortAndDecode(self.data, 'PresPsi')))
# ON TIME -- HOURS AND MINUTES --- EDITING IF YOU WANT
self.on_time_text_control.Clear()
self.on_time_text_control.AppendText(str(SortAndDecode(self.data, 'OnTime')))
# JOB ON TIME - HOURS AND MINUTES - EDITING IF YOU WANT
self.job_on_time_text_control.Clear()
self.job_on_time_text_control.AppendText(str(SortAndDecode(self.data, 'JobOnTime')))
# LAST ERROR ----- RECHECK THIS AGAIN
self.last_error_text_control.Clear()
self.last_error_text_control.AppendText(str(SortAndDecode(self.data, 'LastErr')))
# LAST ERROR COUNT --- RECHECK THIS AGAIN
self.error_count_text_control.Clear()
self.error_count_text_control.AppendText("CHECK THIS")
As you can see, my ReadEvent handler is very big, and the GUI takes quite a while to do all of these updates. My problem is that while the GUI is updating the values of the TextCtrls it goes unresponsive: I cannot do anything else, I can't press a button or enter data. My question is whether there is a better way to do this so that my GUI won't go unresponsive. I don't know how I could move this into a different thread, since all the widgets live in the main GUI, and that would also mean creating threads every 100 ms, which is horrible. Any suggestions would be greatly helpful.
Some suggestions:
How long does SortAndDecode take? What about the str() of the result? Those may be good candidates for keeping that processing in the worker thread instead of the UI thread, and passing the values to the UI thread pre-sorted-and-decoded.
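For example, a sketch of that idea, reusing the names from the question (the decoded dict is new):
# in the worker thread: do the expensive decoding there, not in the UI thread
decoded = {
    "FwVersion": str(SortAndDecode(data, 'FwVersion')),
    "ModelName": str(SortAndDecode(data, 'ModelName')),
    "SerialNum": str(SortAndDecode(data, 'SerialNum'))[:10],
    "PresGain": str(SortAndDecode(data, 'PresGain')),
    # ... and so on for the remaining fields ...
}
wx.CallAfter(pub.sendMessage, "READ EVENT", arg1=decoded, arg2=status_read)
ReadEvent then only has to drop ready-made strings into the widgets.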
You can save a little time in each iteration by calling ChangeValue instead of Clear and AppendText. Why do two function calls for each text widget instead of just one? Function calls are relatively expensive in Python compared to other Python code.
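For example, taking one pair of calls from the handler above:
# instead of:
#     self.gain_text_control.Clear()
#     self.gain_text_control.AppendText(str(SortAndDecode(self.data, 'PresGain')))
# a single call does the same job:
self.gain_text_control.ChangeValue(str(SortAndDecode(self.data, 'PresGain')))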
If it's possible that the same value will be sent that was sent on the last iteration then adding checks for the new value matching the old value and skipping the update of the widget could potentially save lots of time. Updating widget values is very expensive compared to leaving them alone.
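Sketched for one widget:
new_gain = str(SortAndDecode(self.data, 'PresGain'))
# only touch the widget when the text actually changed
if new_gain != self.gain_text_control.GetValue():
    self.gain_text_control.ChangeValue(new_gain)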
Unless there is a hard requirement for 100ms updates you may want to try 150 or 200. Fewer updates per second may be fast enough for most people, especially since it's mostly textual. How much text can you read in 100ms?
If you are still having troubles with having more updates than the UI thread can keep up with, then you may want to use a different approach than pubsub and wx.CallAfter. For example you could have the worker thread receive and process the data and then add an object to a Queue.Queue and then call wx.WakeUpIdle(). In the UI thread you can have an EVT_IDLE event handler that checks the queue and pulls the first item out of the queue, if there are any, and then updates the widgets with that data. This will give the benefit of not flooding the pending events list with events from too many wx.CallAfter calls, and you can also do things like remove items from your data queue if there are too many items in it.
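A rough sketch of that approach; the names data_queue, publish and OnIdle are made up, and everything is assumed to live on the GUI frame:
import queue
import wx

# in the frame's __init__:
#     self.data_queue = queue.Queue()
#     self.Bind(wx.EVT_IDLE, self.OnIdle)

# called from the worker thread: hand over already-processed data, then poke the idle loop
def publish(self, decoded):
    self.data_queue.put(decoded)
    wx.WakeUpIdle()

# GUI thread: pull the newest item (dropping stale ones) and update the widgets
def OnIdle(self, event):
    decoded = None
    try:
        while True:
            decoded = self.data_queue.get_nowait()
    except queue.Empty:
        pass
    if decoded is None:
        return
    self.firmware_version_text_control.ChangeValue(decoded["FwVersion"])
    # ... update the remaining widgets from the pre-decoded values ...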