Idiomatic timeouts for long processes in multi-threaded Scala

So, I see a few questions on Stack Overflow asking, in one way or another, how to "kill" a future, a la the deprecated Thread.stop(). I see answers explaining why it's impossible, but none offering an alternative mechanism for solving similar problems.
For example: Practical use of futures? Ie, how to kill them?
I realize a future can't be "killed."
I know how I could do this the Java way: break up the task into smaller sleeps, and have some "volatile boolean isStillRunning" in a thread class which is periodically checked. If I've cancelled the thread by updating this value, the thread exits. This involves "shared state" (the isStillRunning var), and if I were to do the same thing in Scala it wouldn't seem very "functional."
What's the correct way to solve this sort of problem in idiomatic functional Scala? Is there a reasonably concise way to do it? Should I revert to "normal" threading and volatile flags? Should I use @volatile in the same way as the Java keyword?

Yes, it looks the same as in Java.
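For reference, here is a minimal sketch of the flag-based approach from the question, written in Scala. The names (VolatileFlagSketch, StoppableWorker, runThenCancel) are made up for illustration, and the 10 ms sleep merely stands in for one small slice of the real task.

```scala
object VolatileFlagSketch {
  // Hypothetical worker: does its work in small slices and checks the flag between them.
  final class StoppableWorker extends Runnable {
    // @volatile gives the same cross-thread visibility as Java's volatile keyword.
    @volatile private var running = true
    @volatile var stepsDone = 0

    def cancel(): Unit = { running = false }

    def run(): Unit =
      while (running) {
        Thread.sleep(10) // stands in for one small slice of the long task
        stepsDone += 1   // only this thread writes stepsDone
      }
  }

  // Starts the worker, cancels it after `millis`, and reports whether the thread exited.
  def runThenCancel(millis: Long): Boolean = {
    val worker = new StoppableWorker
    val thread = new Thread(worker)
    thread.start()
    Thread.sleep(millis)
    worker.cancel()
    thread.join(2000)
    !thread.isAlive
  }
}
```

Calling VolatileFlagSketch.runThenCancel(50) should return true once the worker has noticed the flag. The thread is never killed; it simply stops at the next check.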
For a test rig, where a test can hang or run too long, I use a promise for failing the test (for any reason). For instance, a timeout monitor can "cancel" the test runner (interrupt the thread and compareAndSet a flag) and then complete the promise with failure. Or test prep can fail a badly configured test early. Or, the test runs and produces a result. At a higher level, the test rig just sees the future and its value.
What is different from Java are your options for composing futures.
val all = Future.traverse(tests)(test => {
  val toKeep = promise[Result] // promise to keep, or fail by monitor
  val f = for (w <- feed(prepare(test, toKeep))) yield {
    monitored(w, toKeep) {
      listener.start(w)
      w.runTest()
    }
  }
  f completing consume _
  // strip context
  val g = f map (r => r.copy(context = null))
  (toKeep completeWith g).future
})
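The snippet above depends on helpers from my test rig (feed, prepare, monitored, consume) that aren't shown. As a self-contained illustration of the same idea, here is a rough sketch in which a timeout monitor fails a promise if the work has not completed it first. The names withTimeout, PromiseTimeoutSketch and the scheduler are my own for this example, not part of any library.

```scala
import java.util.concurrent.{Executors, ThreadFactory, TimeUnit, TimeoutException}
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._
import scala.util.Try

object PromiseTimeoutSketch {
  // One daemon thread acts as the timeout monitor (an assumption of this sketch).
  private val scheduler = Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
    def newThread(r: Runnable): Thread = {
      val t = new Thread(r, "timeout-monitor")
      t.setDaemon(true)
      t
    }
  })

  // Returns a future that completes with the work's result, or fails with a
  // TimeoutException if the monitor fires first. The underlying work is NOT
  // stopped; its late result is simply ignored.
  def withTimeout[A](work: Future[A], limit: FiniteDuration)(implicit ec: ExecutionContext): Future[A] = {
    val toKeep = Promise[A]()
    val monitor = scheduler.schedule(new Runnable {
      def run(): Unit = { toKeep.tryFailure(new TimeoutException(s"no result within $limit")); () }
    }, limit.toMillis, TimeUnit.MILLISECONDS)
    work.onComplete { result =>
      monitor.cancel(false)      // the result arrived, so the monitor is no longer needed
      toKeep.tryComplete(result) // has no effect if the monitor already failed the promise
    }
    toKeep.future
  }

  // Small demonstration: a slow task times out, a fast one goes through.
  def demo(): (Boolean, Boolean) = {
    import ExecutionContext.Implicits.global
    val slow = withTimeout(Future { Thread.sleep(2000); "late" }, 100.millis)
    val fast = withTimeout(Future { "done" }, 2.seconds)
    val slowTimedOut =
      Try(Await.result(slow, 5.seconds)).failed.toOption.exists(_.isInstanceOf[TimeoutException])
    val fastOk = Await.result(fast, 5.seconds) == "done"
    (slowTimedOut, fastOk)
  }
}
```

The caller only ever sees the future and its value, as in the test rig. If the work itself must stop, it still needs a flag or an interrupt.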

I think I've found a better solution to my own problem. Instead of using a volatile variable to let an operation know when to die, I can send a higher-priority exit message to the actor. It looks something like this:
val a = new Actor() {
  def act(): Unit = {
    loop { react {
      case "Exit" => exit(); return;
      case MyMessage => {
        // this check makes "Exit" a high priority message, if "Exit"
        // is on the queue, it will be handled before proceeding to
        // handle the real message.
        receiveWithin(0) {
          case "Exit" => exit(); return
          case TIMEOUT => // don't do anything.
        }
        sender ! "hi!" // reply to sender
      }
    }}
  }
}
a.start()
val f = a !! MyMessage
while (!f.isSet && a.getState != Actor.State.Terminated) {
  // will probably return almost immediately unless the Actor was terminated
  // after I checked.
  Futures.awaitAll(100, f)
}
if (a.getState != Actor.State.Terminated) {
  f() // the future should evaluate to "hi!"
}
a ! "Exit" // stops the actor from processing anymore messages.
           // regardless of if any are still on the queue.
a.getState // terminated
There's probably a cleaner way to write this, but that's approximately what I did in my application.
The receiveWithin(0) is an immediate no-op unless there is an "Exit" message on the queue. The queued "Exit" message replaces the volatile boolean I would have put in a threaded Java application.

Related

Scala Iterator for multithreading

I am using a Scala Iterator for a waiting loop in a synchronized block:
anObject.synchronized {
  if (Try(anObject.foo()).isFailure) {
    Iterator.continually {
      anObject.wait()
      Try(anObject.foo())
    }.dropWhile(_.isFailure).next()
  }
  anObject.notifyAll()
}
Is it acceptable to use an Iterator with concurrency and multithreading? If not, why not, and what should I use instead?
Some details, in case they matter: anObject is a mutable queue, and there are multiple producers and consumers of the queue, so the block above is the code of one such producer or consumer. anObject.foo is a simplified stand-in for the function that either enqueues (for a producer) or dequeues (for a consumer) data to or from the queue.
Iterator is mutable internally, so you have to take that into consideration if you use it in a multi-threaded environment. If you can guarantee that you won't end up in a situation where, e.g.,
1. two threads check hasNext()
2. one of them calls next(), which happens to return the last element
3. the other calls next() and gets an exception (NoSuchElementException or similar)
then you should be OK. In your example the Iterator doesn't even leave the scope, so the errors shouldn't come from the Iterator.
However, in your code I see an issue with having anObject.wait() and anObject.notifyAll() next to each other: once you call .wait, you never reach the .notifyAll that would unblock it. You can check in the REPL that this hangs:
@ val anObject = new Object { def foo() = throw new Exception }
anObject: {def foo(): Nothing} = ammonite.$sess.cmd21$$anon$1@126ae0ca
@ anObject.synchronized {
    if (Try(anObject.foo()).isFailure) {
      Iterator.continually {
        anObject.wait()
        Try(anObject.foo())
      }.dropWhile(_.isFailure).next()
    }
    anObject.notifyAll()
  }
// waits indefinitely
I would suggest changing the design to NOT rely on wait and notifyAll. However, from your code it is hard to say what you want to achieve so I cannot tell if this is more like Promise-Future case, monix.Observable, monix.Task or something else.
If your use case is a queue with producers and consumers, then it sounds like a use case for reactive streams, e.g. FS2 + Monix, but it could be FS2 + IO or something from Akka Streams:
val queue: Queue[Task, Item] // depending on use case queue might need to be bounded
// in one part of the application
queue.enqueue1(item) // Task[Unit]
// in other part of the application
queue
  .dequeue
  .evalMap { item =>
    // ...
    result: Task[Result]
  }
  .compile
  .drain
This approach would require some change in how you think about designing an application: you would no longer work with threads directly, but rather design a flow of data and declare what is sequential and what can be done in parallel, with threads becoming just an implementation detail.
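If pulling in a streaming library is too much for your case, a smaller step away from wait/notifyAll is a blocking queue from java.util.concurrent. Note that this is a different technique from the FS2 approach above. It is only a sketch: the queue capacity, the thread counts and the name BlockingQueueSketch are assumptions made for illustration.

```scala
import java.util.concurrent.{ArrayBlockingQueue, CountDownLatch, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

object BlockingQueueSketch {
  // Starts `producers` producer threads and two consumer threads sharing one
  // bounded queue, and returns how many items were consumed.
  def demo(producers: Int, itemsEach: Int): Int = {
    val queue = new ArrayBlockingQueue[Int](16) // bounded: put() blocks while the queue is full
    val total = producers * itemsEach
    val consumed = new AtomicInteger(0)
    val done = new CountDownLatch(total)

    val producerThreads = (1 to producers).map { _ =>
      new Thread(new Runnable {
        def run(): Unit = (1 to itemsEach).foreach(i => queue.put(i))
      })
    }
    val consumerThreads = (1 to 2).map { _ =>
      val t = new Thread(new Runnable {
        def run(): Unit =
          try {
            while (true) {
              queue.take() // blocks while the queue is empty; no explicit wait/notify
              consumed.incrementAndGet()
              done.countDown()
            }
          } catch { case _: InterruptedException => () } // interrupted: shut down
      })
      t.setDaemon(true)
      t
    }

    (producerThreads ++ consumerThreads).foreach(_.start())
    done.await(10, TimeUnit.SECONDS)      // wait until every item has been consumed
    consumerThreads.foreach(_.interrupt())
    consumed.get()
  }
}
```

The queue does the waiting and waking internally, so neither producers nor consumers need a synchronized block.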

Repeat asynchronous requests to the server

What I want to do is repeat GET requests to a server asynchronously over and over again so that I can synchronize the local data with the remote one. I'd like to do this by using Futures without involving akka because I just want to understand the basic idea of how to do that at the lower level. No async and await preferably either because they are kind of the high level functions for Futures and Promises, thus I'd like to use Futures and Promises themselves.
So these are my functions:
def sendHttpRequestToServer(): String = { ... }
def send: Unit = {
  val f = future { sendHttpRequestToServer() }
  f onComplete {
    case Success(x) =>
      processResult(x) // do something with result "x"
      send // delay if needed and send the request again
    case Failure(e) =>
      logException(e)
      send // send the request again
  }
}
That's what I think it might look like. How could I change it? Is there any mistake in the algorithm? I'd welcome your thoughts.
UPDATE:
As I already know, futures are not designed for recurring tasks, only for one time ones. Therefore, they can't be used here. What do I use then?
Your code has some issues with exception handling. If an exception is thrown in processResult or logException, send will no longer be called, breaking your loop. That exception will also not be logged. A better way is:
f.map(processResult).onFailure { case e => logException(e) }
f.onComplete(_ => send)
This way the send still happens despite exceptions in processResult or logException, exceptions from processResult are logged, and the next send can begin immediately while the results are still being processed. If you want to wait until processing is complete, you could do:
val f2 = f.map(processResult).recover { case e => logException(e) }
f2.onComplete(_ => send)
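Putting the second variant together, here is a rough, self-contained sketch. The bounded number of rounds, the counters and the fake request function are assumptions added so that the example terminates and can be checked. A real polling loop would call the actual sendHttpRequestToServer and would probably add a delay between rounds.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._
import scala.util.control.NonFatal

object PollingLoopSketch {
  // Runs `request` `rounds` times, one after another, and completes the returned
  // future with (successes, failures) once the last round has been handled.
  def pollRepeatedly(rounds: Int)(request: () => String)(implicit ec: ExecutionContext): Future[(Int, Int)] = {
    val ok = new AtomicInteger(0)
    val failed = new AtomicInteger(0)
    val finished = Promise[(Int, Int)]()

    def send(remaining: Int): Unit =
      if (remaining <= 0) { finished.success((ok.get, failed.get)); () }
      else
        Future(request())
          .map(_ => ok.incrementAndGet())                           // stands in for processResult
          .recover { case NonFatal(_) => failed.incrementAndGet() } // stands in for logException
          .onComplete(_ => send(remaining - 1))                     // always schedule the next round

    send(rounds)
    finished.future
  }

  def demo(): (Int, Int) = {
    import ExecutionContext.Implicits.global
    val calls = new AtomicInteger(0)
    // Hypothetical stand-in for sendHttpRequestToServer: every third call fails.
    val fakeRequest = () => {
      val n = calls.incrementAndGet()
      if (n % 3 == 0) throw new RuntimeException(s"request $n failed")
      s"response $n"
    }
    Await.result(pollRepeatedly(6)(fakeRequest), 10.seconds)
  }
}
```

Each round is a fresh future, so nothing is reused: the "recursion" happens inside a callback and does not grow the stack.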

Scala: wake up sleeping thread

In Scala, how can I tell a thread: sleep t seconds, or until you receive a message? I.e., sleep at most t seconds, but wake up early if t has not yet elapsed and you receive a certain message.
The answer depends greatly on what the message is. If you're using Actors (either the old variety or the Akka variety) then you can simply state a timeout value on receive. (React isn't really running until it gets a message, so you can't place a timeout on it.)
// Old style
receiveWithin(1000) {
  case msg: Message => // whatever
  case TIMEOUT => // Handle timeout
}
// Akka style
context.setReceiveTimeout(1 second)
def receive = {
  case msg: Message => // whatever
  case ReceiveTimeout => // handle timeout
}
Otherwise, what exactly do you mean by "message"?
One easy way to send a message is to use the Java concurrent classes made for exactly this kind of thing. For example, you can use a java.util.concurrent.SynchronousQueue to hold the message, and the receiver can call the poll method which takes a timeout:
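// Common variable
import java.util.concurrent.{SynchronousQueue, TimeUnit}
val q = new SynchronousQueue[String]
// Waiting thread: waits up to 1000 ms and gets null on timeout
val msg = q.poll(1000, TimeUnit.MILLISECONDS)
// Sending thread will also block (up to 1000 ms) until receiver is ready to take it
q.offer("salmon", 1000, TimeUnit.MILLISECONDS)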
An ArrayBlockingQueue is also useful in these situations (if you want the senders to be able to pack messages in a buffer).
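A minimal sketch of the buffered variant, assuming the message is a plain String and using made-up names (TimedWaitSketch, sleepOrReceive). The waiting thread sleeps for at most the timeout and returns early if something is offered to the queue:

```scala
import java.util.concurrent.{ArrayBlockingQueue, TimeUnit}

object TimedWaitSketch {
  // Sleeps for at most `timeoutMillis`, but returns early with Some(message) if one arrives.
  def sleepOrReceive(inbox: ArrayBlockingQueue[String], timeoutMillis: Long): Option[String] =
    Option(inbox.poll(timeoutMillis, TimeUnit.MILLISECONDS)) // poll returns null on timeout

  def demo(): (Option[String], Option[String]) = {
    val inbox = new ArrayBlockingQueue[String](8)
    val timedOut = sleepOrReceive(inbox, 50) // nobody sends anything: times out
    val sender = new Thread(new Runnable {
      def run(): Unit = { Thread.sleep(50); inbox.offer("wake up"); () }
    })
    sender.start()
    val woken = sleepOrReceive(inbox, 5000)  // returns as soon as the message arrives
    sender.join()
    (timedOut, woken)
  }
}
```

Unlike with SynchronousQueue, the sender here does not block waiting for the receiver as long as the buffer has room.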
Alternatively, you can use condition variables.
val monitor = new AnyRef
var messageReceived: Boolean = false
// The waiting thread...
def waitUntilMessageReceived(timeout: Int): Boolean = {
  monitor synchronized {
    // "wait" may wake up spuriously for no apparent reason, so loop
    // against a deadline: re-check the flag after every wake-up and
    // wait only for the time that remains.
    val deadline = System.nanoTime() + timeout * 1000000000L
    var remaining = deadline - System.nanoTime()
    while (!messageReceived && remaining > 0) {
      monitor.wait(math.max(1L, remaining / 1000000L))
      remaining = deadline - System.nanoTime()
    }
    messageReceived
  }
}
// The thread, which sends the message...
def sendMessage: Unit = monitor synchronized {
  messageReceived = true
  monitor.notifyAll
}
Check out Await. If you have some Awaitable objects then that's what you need.
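For instance, if the "message" can be modelled as a Promise that the sender completes, a sketch could look like this. The names AwaitSketch and sleepOrReceive are made up, and whether a Promise fits depends on what your message actually is.

```scala
import java.util.concurrent.TimeoutException
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._

object AwaitSketch {
  // Blocks for at most `atMost`; returns Some(value) if the promise is completed in time.
  def sleepOrReceive[A](message: Promise[A], atMost: FiniteDuration): Option[A] =
    try Some(Await.result(message.future, atMost))
    catch { case _: TimeoutException => None }

  def demo(): (Option[String], Option[String]) = {
    val never = Promise[String]()
    val timedOut = sleepOrReceive(never, 50.millis) // nobody completes it: times out
    val message = Promise[String]()
    val sender = new Thread(new Runnable {
      def run(): Unit = { Thread.sleep(50); message.trySuccess("wake up"); () }
    })
    sender.start()
    val woken = sleepOrReceive(message, 5.seconds)  // returns as soon as the promise is completed
    sender.join()
    (timedOut, woken)
  }
}
```

A Promise can only be completed once, so this suits a single wake-up signal rather than a stream of messages.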
Instead of making it sleep for a given time, make it only wake up on a Timeout() msg and then you can send this message prematurely if you want it to "wake up".

thread synchronization: making sure function gets called in order

I'm writing a program in which I need to make sure a particular function is not being executed in more than one thread at a time.
Here I've written some simplified pseudocode that does exactly what is done in my real program.
mutex _enqueue_mutex;
mutex _action_mutex;
queue _queue;
bool _executing_queue;
// called in multiple threads, possibly simultaneously
do_action() {
    _enqueue_mutex.lock()
    object o;
    _queue.enqueue(o);
    _enqueue_mutex.unlock();
    execute_queue();
}
execute_queue() {
    if (!_executing_queue) {
        _executing_queue = true;
        _enqueue_mutex.lock();
        bool is_empty = _queue.isEmpty();
        _enqueue_mutex.lock();
        while (!is_empty) {
            _action_mutex.lock();
            _enqueue_mutex.lock();
            object o = _queue.dequeue();
            is_empty = _queue.isEmpty();
            _enqueue_mutex.unlock();
            // callback is called when "o" is done being used by "do_stuff_to_object_with_callback" also, this function doesn't block, it is executed on its own thread (hence the need for the callback to know when it's done)
            do_stuff_to_object_with_callback(o, &some_callback);
        }
        _executing_queue = false;
    }
}
some_callback() {
    _action_mutex.unlock();
}
Essentially, the idea is that _action_mutex is locked in the while loop (I should say that lock is assumed to block until the mutex can be locked again), and is expected to be unlocked when the completion callback is called (some_callback in the above code).
This does not seem to be working, though. What happens is that if do_action is called more than once at the same time, the program locks up. I think it might be related to the while loop executing more than once simultaneously, but I just can't see how that could be the case. Is there something wrong with my approach? Is there a better approach?
Thanks
A queue that is not specifically designed to be multithreaded (multi-producer multi-consumer) will need to serialize both enqueue and dequeue operations using the same mutex.
(If your queue implementation has a different assumption, please state it in your question.)
The check for _queue.isEmpty() will also need to be protected, if the dequeue operation is prone to the Time of check to time of use problem.
That is, the line
object o = _queue.dequeue();
needs to be surrounded by _enqueue_mutex.lock(); and _enqueue_mutex.unlock(); as well.
You probably only need a single mutex for the queue. Also once you've dequeued the object, you can probably process it outside of the lock. This will prevent calls to do_action() from hanging too long.
mutex moo;
queue qoo;
bool keepRunning = true;
do_action():
{
    moo.lock();
    qoo.enqueue(something);
    moo.unlock(); // really need try-finally to make sure,
                  // but don't know which language we are using
}
process_queue():
{
    while(keepRunning)
    {
        object o = null;
        moo.lock();
        if(!qoo.isEmpty())
            o = qoo.dequeue();
        moo.unlock(); // again, try-finally needed
        if(o != null)
            haveFunWith(o); // processed outside the lock
        sleep(50);
    }
}
Then call process_queue() on its own thread.
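Since the question doesn't name a language, here is one possible rendering of the single-mutex idea in Scala, purely as an illustration. SerialExecutor, doAction and the demo counters are names I made up, and the 50 ms idle sleep mirrors the pseudocode above rather than being a recommendation.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.mutable

object SingleLockQueueSketch {
  final class SerialExecutor {
    private val lock = new AnyRef // the single mutex guarding the queue
    private val pending = mutable.Queue.empty[() => Unit]
    @volatile private var keepRunning = true

    // May be called from many threads at once; only the enqueue is done under the lock.
    def doAction(action: () => Unit): Unit = lock.synchronized { pending.enqueue(action) }

    def shutdown(): Unit = { keepRunning = false }

    // The one thread that ever runs actions, so they can never overlap.
    private val worker = new Thread(new Runnable {
      def run(): Unit =
        while (keepRunning) {
          val next = lock.synchronized {
            if (pending.nonEmpty) Some(pending.dequeue()) else None
          }
          next match {
            case Some(action) => action()          // run outside the lock
            case None         => Thread.sleep(50)  // nothing queued; check again later
          }
        }
    })
    worker.setDaemon(true)
    worker.start()
  }

  // Submits actions from several threads; returns (actions run, overlapping executions seen).
  def demo(threads: Int, perThread: Int): (Int, Int) = {
    val executor = new SerialExecutor
    val done = new CountDownLatch(threads * perThread)
    val active = new AtomicInteger(0)
    val overlaps = new AtomicInteger(0)
    var count = 0 // only written by the worker thread
    val action = () => {
      if (active.incrementAndGet() > 1) overlaps.incrementAndGet()
      count += 1
      active.decrementAndGet()
      done.countDown()
    }
    val callers = (1 to threads).map { _ =>
      new Thread(new Runnable {
        def run(): Unit = (1 to perThread).foreach(_ => executor.doAction(action))
      })
    }
    callers.foreach(_.start())
    done.await(10, TimeUnit.SECONDS)
    executor.shutdown()
    (count, overlaps.get)
  }
}
```

Because a single worker thread drains the queue, there is no need for a second mutex or a completion callback to keep actions from running concurrently.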

F#: purpose of SwitchToThreadPool just before async return

In the MS docs for Async.SwitchToNewThread one of the examples given is:
let asyncMethod f =
    async {
        do! Async.SwitchToNewThread()
        let result = f()
        do! Async.SwitchToThreadPool()
        return result
    }
What is the purpose of switching to the thread pool immediately before a return statement? I understand why you might want to switch from a dedicated thread to the thread pool when the async block has more work to do but that is not the case here.
This is not part of the main question, but I'm also curious to know why SwitchToNewThread and SwitchToThreadPool return an Async. Is there ever a use case where you would not want to immediately "do!" these tasks? Thank you
The example could be clearer, because it doesn't demonstrate any real scenario.
However, there is a good reason for switching to another thread before return. The reason is that the workflow that calls your function (e.g. asyncMethod) will continue running in the context/thread that you switch to before returning. For example, if you write:
Async.Start (async {
    // Starts running on some thread (depends on how it is started - 'Async.Start' uses
    // thread pool and 'Async.StartImmediate' uses the current thread)
    do! asyncMethod (fun () ->
        Thread.Sleep(1000) ) // Blocks a newly created thread for 1 sec
    // Continues running on the thread pool thread
    Thread.Sleep(1000) }) // Blocks thread pool thread
I think the pattern used in the example isn't quite right - asynchronous workflows should always return back to the SynchronizationContext on which they were started (e.g. if a workflow is started on GUI thread, it can switch to a new thread, but should then return back to the GUI thread). If I was writing asyncMethod function, I'd use:
let asyncMethod f = async {
    let original = System.Threading.SynchronizationContext.Current
    do! Async.SwitchToNewThread()
    let result = f()
    do! Async.SwitchToContext(original)
    return result }
To answer your second question: the reason why the SwitchTo operations return Async<unit> and need to be called using do! is that there is no way to switch to a different thread directly. The only points where you get the rest of the workflow as a function (that you can execute on a new thread) are where you use do! or let!. The Async<T> type is essentially just an object that is given a function (the rest of the workflow) and can execute it anywhere it wants, but there is no other way to "break" the workflow.
