How do I check something on an interval asynchronously in python3? - python-3.x

Let's say I have a file that I know is going to be updated sometime in the next minute. I want a Python function to fire (roughly) as soon as that happens.
I set up something rudimentary to await that input:
import os, time
modification_time = os.path.getmtime(file)
def wait_for_update(file):
modified = False
while not modified:
if modification_time == os.path.getmtime(file):
time.sleep(1)
else:
modified = True
file_contents = file.read()
Theoretically, that should work fine.
However, I'm coming from a Javascript background. In Javascript, a single-threaded language, the equivalent of that function will stop the entire programme until that while loop finishes - and precious seconds of time that could be spent processing other input is wasted.
In Javascript, there's a handy window.setTimeout() function that queues up a function to be fired at some point in the future when the main thread is clear of processes. It's not truly asynchronous, it just offsets that operation a little bit into the future, leaving other stuff to happen in the meantime.
So that same process written in recursive JS might look something like this:
function waitForUpdate(file) {
var modified = false;
modified = compareFileModificationDates();
if(modified) {
modified = true;
// do other things
} else {
setTimeout(waitForUpdate(file), 1000);
}
}
Do I need to do something similar in Python? Or is time.sleep fine?

Related

How can I force asyncio task to run?

I'm noticing that when I spawn an asyncio task using create_task, it's first completing the rest of the logic rather than starting that task. I'm forced to add an await asyncio.sleep(0) to get the task started, which seems a bit hacky and unclean to me.
Here is some example code:
async def make_rpc_calls(...some args...)
val_1, val_2 = await asyncio.gather(rpc_call_1(...), rpc_call_2(...))
return process(val_1, val_2)
def some_very_cpu_intensive_function(...some args...):
// Does a lot of computation, can take 20 seconds to run
task_1 = asyncio.get_running_loop().create_task(make_rpc_calls(...))
intensive_result = some_very_cpu_intensive_function(...)
await task_1
process(intensive_result, task_1.result())
Anytime I run the above, it runs the some_very_cpu_intensive_function function before the kicking off the expensive RPCs. The only way I've gotten this to work is to do:
async def make_rpc_calls(...some args...)
val_1, val_2 = await asyncio.gather(rpc_call_1(...), rpc_call_2(...))
return process(val_1, val_2)
def some_very_cpu_intensive_function(...some args...):
// Does a lot of computation, can take 20 seconds to run
task_1 = asyncio.get_running_loop().create_task(make_rpc_calls(...))
await asyncio.sleep(0)
intensive_result = some_very_cpu_intensive_function(...)
await task_1
process(intensive_result, task_1.result())
This feels like a hack to me - I'm forcing the event loop to context switch, and doesn't feel like I'm using the asyncio framework correctly. Is there another way I should be approaching this?
sleep() always suspends the current task, allowing other tasks to run.
Setting the delay to 0 provides an optimized path to allow other tasks to run. This can be used by long-running functions to avoid blocking the event loop for the full duration of the function call.
Source: https://docs.python.org/3/library/asyncio-task.html

how to run gather and a coroutine task concurrently

I have approximately the following code
import asyncio
.
.
.
async def query_loop()
while connected:
result = await asyncio.gather(get_value1, get_value2, get_value3)
if True in result:
connected = False
async def main():
await query_loop()
asyncio.run(main())
The get_value - functions query a device, receive values, and publish them to a server. If no problems occur they return False, else True.
Now I need to implement, that the get_value2-function checks if it received the value 7. In this case I need the program to wait for 3 min before sending a special command to the device. But in the mean time, and also afterwards the query_loop should continue.
Has anybody an idea how to do that ?
thanks in advance!
If I understand you correctly, you want to modify get_value2 so that it reacts to a value received from device by spawning additional work in the background, i.e. do something without the loop in query_loop having to wait for that new work to finish.
You can use asyncio.create_task() to spawn a background task. In fact, you can always combine create_task() and await to runs things in the background; asyncio.gather is just a utility function that does it for you. In this case query_loop remains unchanged, and get_value2 gets modified like this:
async def get_value2():
...
value = await receive_value_from_device()
if value == 7:
# schedule send_command() to run, but don't wait for it
asyncio.create_task(special_command())
...
return False
async def special_command():
await asyncio.sleep(180)
await send_command_to_device(...)
Note that if get_value1 and others are async functions, the correct invocation of gather must call them, so it should be await asyncio.gather(get_value1(), get_value2(), get_value3()) (note the extra parentheses).

Python 3.5 asyncio execute coroutine on event loop from synchronous code in different thread

I am hoping someone can help me here.
I have an object that has the ability to have attributes that return coroutine objects. This works beautifully, however I have a situation where I need to get the results of the coroutine object from synchronous code in a separate thread, while the event loop is currently running. The code I came up with is:
def get_sync(self, key: str, default: typing.Any=None) -> typing.Any:
"""
Get an attribute synchronously and safely.
Note:
This does nothing special if an attribute is synchronous. It only
really has a use for asynchronous attributes. It processes
asynchronous attributes synchronously, blocking everything until
the attribute is processed. This helps when running SQL code that
cannot run asynchronously in coroutines.
Args:
key (str): The Config object's attribute name, as a string.
default (Any): The value to use if the Config object does not have
the given attribute. Defaults to None.
Returns:
Any: The vale of the Config object's attribute, or the default
value if the Config object does not have the given attribute.
"""
ret = self.get(key, default)
if asyncio.iscoroutine(ret):
if loop.is_running():
loop2 = asyncio.new_event_loop()
try:
ret = loop2.run_until_complete(ret)
finally:
loop2.close()
else:
ret = loop.run_until_complete(ret)
return ret
What I am looking for is a safe way to synchronously get the results of a coroutine object in a multithreaded environment. self.get() can return a coroutine object, for attributes I have set to provide them. The issues I have found are: If the event loop is running or not. After searching for a few hours on stack overflow and a few other sites, my (broken) solution is above. If the loop is running, I make a new event loop and run my coroutine in the new event loop. This works, except that the code hangs forever on the ret = loop2.run_until_complete(ret) line.
Right now, I have the following scenarios with results:
results of self.get() is not a coroutine
Returns results. [Good]
results of self.get() is a coroutine & event loop is not running (basically in same thread as the event loop)
Returns results. [Good]
results of self.get() is a coroutine & event loop is running (basically in a different thread than the event loop)
Hangs forever waiting for results. [Bad]
Does anyone know how I can go about fixing the bad result so I can get the value I need? Thanks.
I hope I made some sense here.
I do have a good, and valid reason to be using threads; specifically I am using SQLAlchemy which is not async and I punt the SQLAlchemy code to a ThreadPoolExecutor to handle it safely. However, I need to be able to query these asynchronous attributes from within these threads for the SQLAlchemy code to get certain configuration values safely. And no, I won't switch away from SQLAlchemy to another system just in order to accomplish what I need, so please do not offer alternatives to it. The project is too far along to switch something so fundamental to it.
I tried using asyncio.run_coroutine_threadsafe() and loop.call_soon_threadsafe() and both failed. So far, this has gotten the farthest on making it work, I feel like I am just missing something obvious.
When I get a chance, I will write some code that provides an example of the problem.
Ok, I implemented an example case, and it worked the way I would expect. So it is likely my problem is elsewhere in the code. Leaving this open and will change the question to fit my real problem if I need.
Does anyone have any possible ideas as to why a concurrent.futures.Future from asyncio.run_coroutine_threadsafe() would hang forever rather than return a result?
My example code that does not duplicate my error, unfortunately, is below:
import asyncio
import typing
loop = asyncio.get_event_loop()
class ConfigSimpleAttr:
__slots__ = ('value', '_is_async')
def __init__(
self,
value: typing.Any,
is_async: bool=False
):
self.value = value
self._is_async = is_async
async def _get_async(self):
return self.value
def __get__(self, inst, cls):
if self._is_async and loop.is_running():
return self._get_async()
else:
return self.value
class BaseConfig:
__slots__ = ()
attr1 = ConfigSimpleAttr(10, True)
attr2 = ConfigSimpleAttr(20, True)
def get(self, key: str, default: typing.Any=None) -> typing.Any:
return getattr(self, key, default)
def get_sync(self, key: str, default: typing.Any=None) -> typing.Any:
ret = self.get(key, default)
if asyncio.iscoroutine(ret):
if loop.is_running():
fut = asyncio.run_coroutine_threadsafe(ret, loop)
print(fut, fut.running())
ret = fut.result()
else:
ret = loop.run_until_complete(ret)
return ret
config = BaseConfig()
def example_func():
return config.get_sync('attr1')
async def main():
a1 = await loop.run_in_executor(None, example_func)
a2 = await config.attr2
val = a1 + a2
print('{a1} + {a2} = {val}'.format(a1=a1, a2=a2, val=val))
return val
loop.run_until_complete(main())
This is the stripped down version of exactly what my code is doing, and the example works, even if my actual application doesn't. I am stuck as far as where to look for answers. Suggestions are welcome as to where to try to track down my "stuck forever" problem, even if my code above doesn't actually duplicate the problem.
It is very unlikely that you need to run several event loops at the same time, so this part looks quite wrong:
if loop.is_running():
loop2 = asyncio.new_event_loop()
try:
ret = loop2.run_until_complete(ret)
finally:
loop2.close()
else:
ret = loop.run_until_complete(ret)
Even testing whether the loop is running or not doesn't seem to be the right approach. It's probably better to give explicitly the (only) running loop to get_sync and schedule the coroutine using run_coroutine_threadsafe:
def get_sync(self, key, loop):
ret = self.get(key, default)
if not asyncio.iscoroutine(ret):
return ret
future = asyncio.run_coroutine_threadsafe(ret, loop)
return future.result()
EDIT: Hanging problems can be related to tasks being scheduled in the wrong loop (e.g. forgetting about the optional loop argument when calling a coroutine). This kind of problem should be easier to debug with the PR 303 (now merged): a RuntimeError is raised instead when the loop and the future don't match. So you might want to run your tests with the latest version of asyncio.
Ok, I got my code working, by taking a different approach to it. The problem was tied with using something that had file IO, which I was converting into a coroutine using loop.run_in_executor() on the file IO components. Then, I was trying to use this in a sync function being called from another thread, processed using another loop.run_in_executor() on that function. This is a very important routine in my code (called probably a million times or more during the execution of my short-running code), and I made a decision that my logic was just getting too complicated. So... I uncomplicated it. Now, if I want to use the file IO components asynchronously, I explicitly use my "get_async()" method, otherwise, I use my attribute through normal attribute access.
By removing the complexity of my logic, it made the code cleaner, easier to understand, and even more importantly, it actually works. While I am not 100% certain that I know the root cause of the issue (I believe it has something to do with a thread processing an attribute, which then in turn starts another thread that tries to read the attribute before it is processed, which caused something like a race condition and halting my code, but I could never duplicate the error outside of my application unfortunately to completely prove it out), I was able to get past it and continue with my development efforts.

Bug in Python? threading.Thread.start() does not always return

I have a tiny Python script which (in my eyes) makes threading.Thread.start() behave unexpectedly since it does not return immediately.
Inside a thread I want to call a method from a boost::python based object which will not return immediately.
To do so I wrap the object/method like this:
import threading
import time
import my_boostpython_lib
my_cpp_object = my_boostpython_lib.my_cpp_class()
def some_fn():
# has to be here - otherwise .start() does not return
# time.sleep(1)
my_cpp_object.non_terminating_fn() # blocks
print("%x: 1" % threading.get_ident())
threading.Thread(target=some_fn).start()
print("%x: 2" % threading.get_ident()) # will not always be called!!
And everything works fine as long as I run some code before my_cpp_object.non_terminating_fn(). If I don't, .start() will block the same way as calling .run() directly would.
Printing just a line before calling the boost::python function is not enough, but e.g. printing two lines or calling time.sleep() makes start() return immediately as expected.
Can you explain this behavior? How would I avoid this (apart from calling sleep() before calling a boost::python function)?
This behavior is (as in most cases when you believe in a bug in an interpreter/compiler) not a bug in Python but a race condition covering the behavior you have to expect because of the Python GIL (also discussed here).
As soon as the non-Python function my_cpp_object.non_terminating_fn() has been started the GIL doesn't get released until it returns and keeps the interpreter from executing any other command.
So time.sleep(1) doesn't help here anyway because the code following my_cpp_object.non_terminating_fn() would not be executed until the GIL gets released.
In case of boost::python and of course in case you can modify the C/C++ part you can release the GIL manually as described here.
A small example (from the link above) could look like this (in the boost::python wrapper code)
class scoped_gil_release {
public:
inline scoped_gil_release() {
m_thread_state = PyEval_SaveThread();
}
inline ~scoped_gil_release() {
PyEval_RestoreThread(m_thread_state);
m_thread_state = NULL;
}
private:
PyThreadState * m_thread_state;
};
int non_terminating_fn_wrapper() {
scoped_gil_release scoped;
return non_terminating_fn();
}

RxScala: How to keep the thread doing Observable.interval alive?

I am trying to write a simple RxScala program:
import rx.lang.scala.Observable
import scala.concurrent.duration.DurationInt
import scala.language.{implicitConversions, postfixOps}
object Main {
def main(args: Array[String]): Unit = {
val o = Observable.interval(1 second)
o.subscribe(println(_))
}
}
When I run this program, I do not see anything printed out. I suspect that this is because that thread producing the numbers in Observable.interval dies. I noticed a call to waitFor(o) in the RxScalaDemo, but I can't figure out where that is imported from.
How do I keep this program running for ever printing the number sequence?
Here is one way to block the main thread from exiting:
val o = Observable.interval(1 second)
val latch = new CountDownLatch(1)
o.subscribe(i => {
print(i)
if (i >= 5) latch.countDown()
})
latch.await()
This is a fairly common pattern, use CountDownLatch.await to block the main thread and then countDown the latch when you are done with what you are doing, thus releasing the main thread
You're not seeing anything because your main method exits immediately after you subscribe to the Observable. At that point, your program is done.
A common trick for test programs like this is to read a byte from stdin once you've subscribed.

Resources