Why does Scala hang evaluating a by-name parameter in a Future?

The below (contrived) code attempts to print a by-name String parameter within a future, and return when the printing is complete.
import scala.concurrent._
import concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
class PrintValueAndWait {
  def printIt(param: => String): Unit = {
    val printingComplete = future {
      println(param); // why does this hang?
    }
    Await.result(printingComplete, Duration.Inf)
  }
}
object Go {
  val str = "Rabbits"
  new PrintValueAndWait().printIt(str)
}
object RunMe extends App {
  Go
}
However, when running RunMe, it simply hangs while trying to evaluate param. Changing printIt to take in its parameter by-value makes the application return as expected. Alternatively, changing printIt to simply print the value and return synchronously (in the same thread) seems to work fine also.
What's happening exactly here? Is this somehow related to the Go object not having been fully constructed yet, and so the str field not being visible yet to the thread attempting to print it? Is hanging the expected behaviour here?
I've tested with Scala 2.10.3 on both Mac OS Mavericks and Windows 7, on Java 1.7.

Your code is deadlocking on the initialization of the Go object. This is a known issue; see e.g. SI-7646 and this SO question.
Objects in Scala are lazily initialized, and a lock is held during initialization to prevent two threads from racing to initialize the object. However, if two threads simultaneously try to initialize an object and one depends on the other to complete, there will be a circular dependency and a deadlock.
In this particular case, the initialization of the Go object can only complete once new PrintValueAndWait().printIt(str) has completed. However, when param is a by name argument, essentially a code block gets passed in which is evaluated when it is used. In this case the str argument in new PrintValueAndWait().printIt(str) is shorthand for Go.str, so when the thread the future runs on tries to evaluate param it is essentially calling Go.str. But since Go hasn't completed initialization yet, it will try to initialize the Go object too. The other thread initializing Go has a lock on its initialization, so the future thread blocks. So the first thread is waiting on the future to complete before it finishes initializing, and the future thread is waiting for the first thread to finish initializing: deadlock.
In the by value case, the string value of str is passed in directly, so the future thread doesn't try to initialize Go and there is no deadlock.
Similarly, if you leave param as by name, but change Go as follows:
object Go {
  val str = "Rabbits"
  {
    val s = str
    new PrintValueAndWait().printIt(s)
  }
}
it won't deadlock, since the already evaluated local string value s is passed in, instead of Go.str, so the future thread won't try and initialize Go.
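The same circular wait can be reproduced outside Scala. The following is a minimal Python sketch of the idea, not Scala's actual mechanism: a re-entrant lock stands in for the object initialization lock, and the names run_initializer and read_str, as well as the 0.5 second timeout, are made up for illustration. A timeout replaces Duration.Inf so that the sketch reports the deadlock instead of hanging.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def run_initializer(by_name: bool, timeout: float = 0.5) -> str:
    """Simulate initializing Go while a worker thread prints the parameter."""
    init_lock = threading.RLock()  # assumption: stand-in for the initialization lock
    state = {}

    def read_str():
        # Like reading Go.str: any thread must pass the initialization lock first.
        with init_lock:
            return state["str"]

    with ThreadPoolExecutor(max_workers=1) as pool:
        with init_lock:  # the calling thread is "initializing Go"
            state["str"] = "Rabbits"
            if by_name:
                # By-name: the read is deferred and happens on the worker thread.
                thunk = read_str
            else:
                # By-value: the read happens now, on the initializing thread
                # (the lock is re-entrant for its owner).
                value = read_str()

                def thunk():
                    return value

            future = pool.submit(thunk)
            try:
                return future.result(timeout=timeout)
            except FutureTimeout:
                # The worker is blocked on the lock that this thread still holds.
                return "deadlock (timed out)"


by_value_result = run_initializer(by_name=False)
by_name_result = run_initializer(by_name=True)
```

In the by-name case the worker only gets the lock once the initializing thread gives up waiting and leaves the locked region, which is exactly the step that never happens with an infinite wait.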

Related

Python 3.5 asyncio execute coroutine on event loop from synchronous code in different thread

I am hoping someone can help me here.
I have an object that has the ability to have attributes that return coroutine objects. This works beautifully, however I have a situation where I need to get the results of the coroutine object from synchronous code in a separate thread, while the event loop is currently running. The code I came up with is:
def get_sync(self, key: str, default: typing.Any=None) -> typing.Any:
    """
    Get an attribute synchronously and safely.
    Note:
        This does nothing special if an attribute is synchronous. It only
        really has a use for asynchronous attributes. It processes
        asynchronous attributes synchronously, blocking everything until
        the attribute is processed. This helps when running SQL code that
        cannot run asynchronously in coroutines.
    Args:
        key (str): The Config object's attribute name, as a string.
        default (Any): The value to use if the Config object does not have
            the given attribute. Defaults to None.
    Returns:
        Any: The value of the Config object's attribute, or the default
            value if the Config object does not have the given attribute.
    """
    ret = self.get(key, default)
    if asyncio.iscoroutine(ret):
        if loop.is_running():
            loop2 = asyncio.new_event_loop()
            try:
                ret = loop2.run_until_complete(ret)
            finally:
                loop2.close()
        else:
            ret = loop.run_until_complete(ret)
    return ret
What I am looking for is a safe way to synchronously get the results of a coroutine object in a multithreaded environment. self.get() can return a coroutine object, for attributes I have set to provide them. The issue I have found is that the behaviour depends on whether or not the event loop is running. After searching for a few hours on Stack Overflow and a few other sites, my (broken) solution is above. If the loop is running, I make a new event loop and run my coroutine in the new event loop. This works, except that the code hangs forever on the ret = loop2.run_until_complete(ret) line.
Right now, I have the following scenarios with results:
Result of self.get() is not a coroutine: returns the result. [Good]
Result of self.get() is a coroutine and the event loop is not running (basically in the same thread as the event loop): returns the result. [Good]
Result of self.get() is a coroutine and the event loop is running (basically in a different thread than the event loop): hangs forever waiting for the result. [Bad]
Does anyone know how I can go about fixing the bad result so I can get the value I need? Thanks.
I hope I made some sense here.
I do have a good, and valid reason to be using threads; specifically I am using SQLAlchemy which is not async and I punt the SQLAlchemy code to a ThreadPoolExecutor to handle it safely. However, I need to be able to query these asynchronous attributes from within these threads for the SQLAlchemy code to get certain configuration values safely. And no, I won't switch away from SQLAlchemy to another system just in order to accomplish what I need, so please do not offer alternatives to it. The project is too far along to switch something so fundamental to it.
I tried using asyncio.run_coroutine_threadsafe() and loop.call_soon_threadsafe() and both failed. So far, this has gotten the farthest on making it work, I feel like I am just missing something obvious.
When I get a chance, I will write some code that provides an example of the problem.
Ok, I implemented an example case, and it worked the way I would expect. So it is likely my problem is elsewhere in the code. Leaving this open and will change the question to fit my real problem if I need.
Does anyone have any possible ideas as to why a concurrent.futures.Future from asyncio.run_coroutine_threadsafe() would hang forever rather than return a result?
My example code that does not duplicate my error, unfortunately, is below:
import asyncio
import typing
loop = asyncio.get_event_loop()
class ConfigSimpleAttr:
    __slots__ = ('value', '_is_async')
    def __init__(
        self,
        value: typing.Any,
        is_async: bool=False
    ):
        self.value = value
        self._is_async = is_async
    async def _get_async(self):
        return self.value
    def __get__(self, inst, cls):
        if self._is_async and loop.is_running():
            return self._get_async()
        else:
            return self.value
class BaseConfig:
    __slots__ = ()
    attr1 = ConfigSimpleAttr(10, True)
    attr2 = ConfigSimpleAttr(20, True)
    def get(self, key: str, default: typing.Any=None) -> typing.Any:
        return getattr(self, key, default)
    def get_sync(self, key: str, default: typing.Any=None) -> typing.Any:
        ret = self.get(key, default)
        if asyncio.iscoroutine(ret):
            if loop.is_running():
                fut = asyncio.run_coroutine_threadsafe(ret, loop)
                print(fut, fut.running())
                ret = fut.result()
            else:
                ret = loop.run_until_complete(ret)
        return ret
config = BaseConfig()
def example_func():
    return config.get_sync('attr1')
async def main():
    a1 = await loop.run_in_executor(None, example_func)
    a2 = await config.attr2
    val = a1 + a2
    print('{a1} + {a2} = {val}'.format(a1=a1, a2=a2, val=val))
    return val
loop.run_until_complete(main())
This is the stripped down version of exactly what my code is doing, and the example works, even if my actual application doesn't. I am stuck as far as where to look for answers. Suggestions are welcome as to where to try to track down my "stuck forever" problem, even if my code above doesn't actually duplicate the problem.
It is very unlikely that you need to run several event loops at the same time, so this part looks quite wrong:
if loop.is_running():
    loop2 = asyncio.new_event_loop()
    try:
        ret = loop2.run_until_complete(ret)
    finally:
        loop2.close()
else:
    ret = loop.run_until_complete(ret)
Even testing whether the loop is running or not doesn't seem to be the right approach. It's probably better to give explicitly the (only) running loop to get_sync and schedule the coroutine using run_coroutine_threadsafe:
def get_sync(self, key, loop, default=None):
    ret = self.get(key, default)
    if not asyncio.iscoroutine(ret):
        return ret
    # Schedule the coroutine on the running loop and block this thread on it.
    future = asyncio.run_coroutine_threadsafe(ret, loop)
    return future.result()
EDIT: Hanging problems can be related to tasks being scheduled in the wrong loop (e.g. forgetting about the optional loop argument when calling a coroutine). This kind of problem should be easier to debug with the PR 303 (now merged): a RuntimeError is raised instead when the loop and the future don't match. So you might want to run your tests with the latest version of asyncio.
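As a rough, self-contained illustration of passing the running loop explicitly to code in a worker thread, here is a minimal sketch. It assumes Python 3.7 or later (asyncio.run and asyncio.get_running_loop did not exist in 3.5), and fetch_value and blocking_worker are hypothetical names, not part of the question's code.

```python
import asyncio


async def fetch_value():
    # Hypothetical coroutine standing in for an asynchronous attribute.
    await asyncio.sleep(0)
    return 10


def blocking_worker(loop):
    # Runs in an executor thread, so blocking here does not stall the loop.
    future = asyncio.run_coroutine_threadsafe(fetch_value(), loop)
    # A timeout turns a coroutine that never completes into an exception
    # instead of a silent hang.
    return future.result(timeout=5)


async def main():
    loop = asyncio.get_running_loop()
    # Hand the one running loop to the worker explicitly.
    return await loop.run_in_executor(None, blocking_worker, loop)


result = asyncio.run(main())
```

Note that calling future.result() from the event loop's own thread would block the loop, so the scheduled coroutine could never run; the blocking call has to stay in the worker thread.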
Ok, I got my code working, by taking a different approach to it. The problem was tied with using something that had file IO, which I was converting into a coroutine using loop.run_in_executor() on the file IO components. Then, I was trying to use this in a sync function being called from another thread, processed using another loop.run_in_executor() on that function. This is a very important routine in my code (called probably a million times or more during the execution of my short-running code), and I made a decision that my logic was just getting too complicated. So... I uncomplicated it. Now, if I want to use the file IO components asynchronously, I explicitly use my "get_async()" method, otherwise, I use my attribute through normal attribute access.
By removing the complexity of my logic, it made the code cleaner, easier to understand, and even more importantly, it actually works. While I am not 100% certain that I know the root cause of the issue (I believe it has something to do with a thread processing an attribute, which then in turn starts another thread that tries to read the attribute before it is processed, which caused something like a race condition and halting my code, but I could never duplicate the error outside of my application unfortunately to completely prove it out), I was able to get past it and continue with my development efforts.

Bug in Python? threading.Thread.start() does not always return

I have a tiny Python script which (in my eyes) makes threading.Thread.start() behave unexpectedly since it does not return immediately.
Inside a thread I want to call a method from a boost::python based object which will not return immediately.
To do so I wrap the object/method like this:
import threading
import time
import my_boostpython_lib
my_cpp_object = my_boostpython_lib.my_cpp_class()
def some_fn():
    # has to be here - otherwise .start() does not return
    # time.sleep(1)
    my_cpp_object.non_terminating_fn() # blocks
print("%x: 1" % threading.get_ident())
threading.Thread(target=some_fn).start()
print("%x: 2" % threading.get_ident()) # will not always be called!!
And everything works fine as long as I run some code before my_cpp_object.non_terminating_fn(). If I don't, .start() will block the same way as calling .run() directly would.
Printing just a line before calling the boost::python function is not enough, but e.g. printing two lines or calling time.sleep() makes start() return immediately as expected.
Can you explain this behavior? How would I avoid this (apart from calling sleep() before calling a boost::python function)?
This behavior is (as in most cases when you suspect a bug in an interpreter or compiler) not a bug in Python but a race condition that masks the behavior you should expect because of the Python GIL (also discussed here).
As soon as the non-Python function my_cpp_object.non_terminating_fn() has been started, the GIL is not released until it returns, which keeps the interpreter from executing any other command.
So time.sleep(1) doesn't help here anyway because the code following my_cpp_object.non_terminating_fn() would not be executed until the GIL gets released.
In case of boost::python and of course in case you can modify the C/C++ part you can release the GIL manually as described here.
A small example (from the link above) could look like this (in the boost::python wrapper code)
class scoped_gil_release {
public:
    inline scoped_gil_release() {
        m_thread_state = PyEval_SaveThread();
    }
    inline ~scoped_gil_release() {
        PyEval_RestoreThread(m_thread_state);
        m_thread_state = NULL;
    }
private:
    PyThreadState * m_thread_state;
};
int non_terminating_fn_wrapper() {
    scoped_gil_release scoped;
    return non_terminating_fn();
}
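From the Python side, the effect of a blocking call that does release the GIL can be sketched as follows. This is only an illustration: time.sleep (which releases the GIL while waiting) stands in for a wrapped C++ function, and blocking_call and events are made-up names.

```python
import threading
import time

events = []


def blocking_call():
    # Stand-in for a native function wrapped so that it releases the GIL.
    time.sleep(0.2)
    events.append("worker done")


worker = threading.Thread(target=blocking_call)
worker.start()
# start() returns while the worker is still blocked, because the main
# thread can take the GIL during the wait.
events.append("start() returned")
worker.join()
```

With a native function that holds the GIL for its whole duration, the main thread would instead stall as soon as the worker entered that function.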

RxScala: How to keep the thread doing Observable.interval alive?

I am trying to write a simple RxScala program:
import rx.lang.scala.Observable
import scala.concurrent.duration.DurationInt
import scala.language.{implicitConversions, postfixOps}
object Main {
  def main(args: Array[String]): Unit = {
    val o = Observable.interval(1 second)
    o.subscribe(println(_))
  }
}
When I run this program, I do not see anything printed out. I suspect that this is because that thread producing the numbers in Observable.interval dies. I noticed a call to waitFor(o) in the RxScalaDemo, but I can't figure out where that is imported from.
How do I keep this program running for ever printing the number sequence?
Here is one way to block the main thread from exiting:
import java.util.concurrent.CountDownLatch
val o = Observable.interval(1 second)
val latch = new CountDownLatch(1)
o.subscribe(i => {
  print(i)
  if (i >= 5) latch.countDown()
})
latch.await()
This is a fairly common pattern: use CountDownLatch.await to block the main thread, then call countDown on the latch when you are done with what you are doing, thus releasing the main thread.
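The latch pattern is not specific to the JVM. As a rough analogue in Python, assuming a daemon thread plays the role of the interval producer and threading.Event plays the role of the one-shot latch (run_ticks and its parameters are hypothetical names):

```python
import threading
import time


def run_ticks(limit, period=0.01):
    done = threading.Event()  # plays the role of CountDownLatch(1)
    seen = []

    def ticker():
        i = 0
        while not done.is_set():
            seen.append(i)
            if i >= limit:
                done.set()  # like latch.countDown()
            i += 1
            time.sleep(period)

    # A daemon thread does not keep the process alive on its own.
    threading.Thread(target=ticker, daemon=True).start()
    done.wait()  # like latch.await(): block the main thread until released
    return list(seen)


ticks = run_ticks(5)
```

Without the done.wait() call the function would return right away, before the producer thread had a chance to emit anything.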
You're not seeing anything because your main method exits immediately after you subscribe to the Observable. At that point, your program is done.
A common trick for test programs like this is to read a byte from stdin once you've subscribed.

Val at object level and thread safety in Scala

I stumbled upon the following code in an existing codebase I am looking at. There are other similar calls which "set" values on "myService", etc. I want to confirm that the following piece isn't thread-safe: since myService is not "local", two threads entering createUser at the same time and calling "myService.newUser" at the same time will corrupt the subsequent persona.firstName, persona.lastName, etc. Is this understanding correct?
object WFService {
  lazy private val myService = engine.getMyService
  def createUser(persona: Persona): String = {
    val user = myService.newUser(persona.id.toString)
    persona.firstName.map(n => user.setFirstName(n))
    persona.lastName.map(n => user.setLastName(n))
Lazy vals in Scala are thread safe [1]. You don't need to worry about multiple calls from different threads resulting in the RHS being executed twice.
Since you have an object, you only have one instance of WFService too.
[1] http://code-o-matic.blogspot.co.uk/2009/05/double-checked-locking-idiom-sweet-in.html
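To make the double-checked locking idea from the linked post concrete, here is a minimal Python sketch of a lazily initialized value. It is an analogy for what a lazy val guarantees, not how the Scala compiler implements it, and Lazy, make_service and calls are made-up names.

```python
import threading


class Lazy:
    """Compute a value at most once, even under concurrent first access."""

    _UNSET = object()

    def __init__(self, compute):
        self._compute = compute
        self._lock = threading.Lock()
        self._value = Lazy._UNSET

    def get(self):
        if self._value is Lazy._UNSET:  # first check, without the lock
            with self._lock:
                if self._value is Lazy._UNSET:  # second check, under the lock
                    self._value = self._compute()
        return self._value


calls = []


def make_service():
    calls.append(1)  # record every time the initializer actually runs
    return "service"


lazy_service = Lazy(make_service)
threads = [threading.Thread(target=lazy_service.get) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

This only covers initialization. Whether concurrent calls on the object returned by get() are safe depends on that object itself.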
It is a val member, so I would assume that there can be only one assignment to it. A second one should result in an error.
As mentioned before, lazy vals in Scala are thread-safe. Please refer to:
Lazy Vals initialization

Object deletes reference to self

Does the Python interpreter gracefully handle cases where an object instance deletes the last reference to itself?
Consider the following (admittedly useless) module:
all_instances = []
class A(object):
    def __init__(self):
        global all_instances
        all_instances.append(self)
    def delete_me(self):
        global all_instances
        self.context = "I'm still here"
        all_instances.remove(self)
        print self.context
and now the usage:
import the_module
a = the_module.A()
the_deletion_func = a.delete_me
del a
the_deletion_func()
This would still print I'm still here, but is there a race condition with Python's garbage collector, which is about to collect the object instance? Does the reference to the object's function save the day? Does the interpreter keep references to the object whose code it is currently executing until it finishes?
No, there isn't any such race condition. You are clearing the reference, so the ref count drops to 1 and the object will be cleaned up once you delete the method reference.
The the_deletion_func reference points to a method, which points to the instance (as well as the class), so there is still a reference there.
Currently executing methods have a local variable self, which is a reference to the instance as well, but mostly it's the method wrapper that provides that reference.
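The reference held by the bound method can be observed directly with a weak reference. The following sketch is written for Python 3 on CPython (the question's code uses Python 2 print syntax), and the variable names are chosen only for illustration.

```python
import gc
import weakref

all_instances = []


class A:
    def __init__(self):
        all_instances.append(self)

    def delete_me(self):
        all_instances.remove(self)
        return "I'm still here"


a = A()
probe = weakref.ref(a)  # observes the instance without keeping it alive
bound = a.delete_me     # the bound method holds a reference via __self__
del a

message = bound()       # removes the list reference; the method still holds one
alive_while_bound = probe() is not None

del bound               # drop the last reference
gc.collect()            # not needed for plain refcounting; added for safety
alive_after = probe() is not None
```

Here alive_while_bound stays true until the bound method itself is deleted, which matches the explanation above.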
