Akka: How to ensure that message has been received? - multithreading

I have an actor Dispenser. What it does is it
dispenses some objects by request
listens to arriving new ones
Code follows
class Dispenser extends Actor {
override def receive: Receive = {
case Get =>
context.sender ! getObj()
case x: SomeType =>
addObj(x)
}
}
In real processing it doesn't matter whether 1 ms or even few seconds passed since new object was sent until the dispenser starts to dispense it, so there's no code tracking it.
But now I'm writing test for the dispenser and I want to be sure that firstly it receives new object and only then it receives a Get request.
Here's the test code I came up with:
val dispenser = system.actorOf(Props.create(classOf[Dispenser]))
dispenser ! obj
Thread.sleep(100)
val task = dispenser ? Get()
val result = Await.result(task, timeout)
check(result)
It satisfies one important requirement - it doesn't change original code. But it is
At least 100ms seconds slow even on very high performance boxes
Unstable and fails sometimes because 100 ms or any other constant doesn't provide any guaranties.
And the question is how to make a test that satisfies requirement and doesn't have cons above (neither any other obvious cons)

You can take out the Thread.sleep(..) and your test will be fine. Akka guarantees the ordering you need.
With the code
dispenser ! obj
val task = dispenser ? Get()
dispenser will process obj before Get deterministically because
The same thread puts obj then Get in the actor's mailbox, so they're in the correct order in the actor's mailbox
Actors process messages sequentially and one-at-a-time, so the two messages will be received by the actor and processed in the order they're queued in the mailbox.
(..if there's nothing else going on that's not in your sample code - routers, async processing in getObj or addObj, stashing, ..)

Akka FSM module is really handy for testing underlying state and behavior of the actor and does not require to change its implementation specifically for tests.
By using TestFSMRef one can get actors current state and and data by:
val testActor = TestFSMRef(<actors constructor or Props>)
testActor.stateName shouldBe <state name>
testActor.stateData shouldBe <state data>
http://doc.akka.io/docs/akka/2.4.1/scala/fsm.html

Related

Understanding cats effect `Cancelable `

I am trying to understand how cats effect Cancelable works. I have the following minimal app, based on the documentation
import java.util.concurrent.{Executors, ScheduledExecutorService}
import cats.effect._
import cats.implicits._
import scala.concurrent.duration._
object Main extends IOApp {
def delayedTick(d: FiniteDuration)
(implicit sc: ScheduledExecutorService): IO[Unit] = {
IO.cancelable { cb =>
val r = new Runnable {
def run() =
cb(Right(()))
}
val f = sc.schedule(r, d.length, d.unit)
// Returning the cancellation token needed to cancel
// the scheduling and release resources early
val mayInterruptIfRunning = false
IO(f.cancel(mayInterruptIfRunning)).void
}
}
override def run(args: List[String]): IO[ExitCode] = {
val scheduledExecutorService =
Executors.newSingleThreadScheduledExecutor()
for {
x <- delayedTick(1.second)(scheduledExecutorService)
_ <- IO(println(s"$x"))
} yield ExitCode.Success
}
}
When I run this:
❯ sbt run
[info] Loading global plugins from /Users/ethan/.sbt/1.0/plugins
[info] Loading settings for project stackoverflow-build from plugins.sbt ...
[info] Loading project definition from /Users/ethan/IdeaProjects/stackoverflow/project
[info] Loading settings for project stackoverflow from build.sbt ...
[info] Set current project to cats-effect-tutorial (in build file:/Users/ethan/IdeaProjects/stackoverflow/)
[info] Compiling 1 Scala source to /Users/ethan/IdeaProjects/stackoverflow/target/scala-2.12/classes ...
[info] running (fork) Main
[info] ()
The program just hangs at this point. I have many questions:
Why does the program hang instead of terminating after 1 second?
Why do we set mayInterruptIfRunning = false? Isn't the whole point of cancellation to interrupt a running task?
Is this the recommended way to define the ScheduledExecutorService? I did not see examples in the docs.
This program waits 1 second, and then returns () (then unexpectedly hangs). What if I wanted to return something else? For example, let's say I wanted to return a string, the result of some long-running computation. How would I extract that value from IO.cancelable? The difficulty, it seems, is that IO.cancelable returns the cancelation operation, not the return value of the process to be cancelled.
Pardon the long post but this is my build.sbt:
name := "cats-effect-tutorial"
version := "1.0"
fork := true
scalaVersion := "2.12.8"
libraryDependencies += "org.typelevel" %% "cats-effect" % "1.3.0" withSources() withJavadoc()
scalacOptions ++= Seq(
"-feature",
"-deprecation",
"-unchecked",
"-language:postfixOps",
"-language:higherKinds",
"-Ypartial-unification")
you need shutdown the ScheduledExecutorService, Try this
Resource.make(IO(Executors.newSingleThreadScheduledExecutor))(se => IO(se.shutdown())).use {
se =>
for {
x <- delayedTick(5.second)(se)
_ <- IO(println(s"$x"))
} yield ExitCode.Success
}
I was able to find an answer to these questions although there are still some things that I don't understand.
Why does the program hang instead of terminating after 1 second?
For some reason, Executors.newSingleThreadScheduledExecutor() causes things to hang. To fix the problem, I had to use Executors.newSingleThreadScheduledExecutor(new Thread(_)). It appears that the only difference is that the first version is equivalent to Executors.newSingleThreadScheduledExecutor(Executors.defaultThreadFactory()), although nothing in the docs makes it clear why this is the case.
Why do we set mayInterruptIfRunning = false? Isn't the whole point of cancellation to interrupt a running task?
I have to admit that I do not understand this entirely. Again, the docs were not especially clarifying on this point. Switching the flag to true does not seem to change the behavior at all, at least in the case of Ctrl-c interrupts.
Is this the recommended way to define the ScheduledExecutorService? I did not see examples in the docs.
Clearly not. The way that I came up with was loosely inspired by this snippet from the cats effect source code.
This program waits 1 second, and then returns () (then unexpectedly hangs). What if I wanted to return something else? For example, let's say I wanted to return a string, the result of some long-running computation. How would I extract that value from IO.cancelable? The difficulty, it seems, is that IO.cancelable returns the cancelation operation, not the return value of the process to be cancelled.
The IO.cancellable { ... } block returns IO[A] and the callback cb function has type Either[Throwable, A] => Unit. Logically this suggests that whatever is fed into the cb function is what the IO.cancellable expression will returned (wrapped in IO). So to return the string "hello" instead of (), we rewrite delayedTick:
def delayedTick(d: FiniteDuration)
(implicit sc: ScheduledExecutorService): IO[String] = { // Note IO[String] instead of IO[Unit]
implicit val processRunner: JVMProcessRunner[IO] = new JVMProcessRunner
IO.cancelable[String] { cb => // Note IO.cancelable[String] instead of IO[Unit]
val r = new Runnable {
def run() =
cb(Right("hello")) // Note "hello" instead of ()
}
val f: ScheduledFuture[_] = sc.schedule(r, d.length, d.unit)
IO(f.cancel(true))
}
}
You need explicitly terminate the executor at the end, as it is not managed by Scala or Cats runtime, it wouldn't exit by itself, that's why your App hands up instead of exit immediately.
mayInterruptIfRunning = false gracefully terminates a thread if it is running. You can set it as true to forcely kill it, but it is not recommanded.
You have many way to create a ScheduledExecutorService, it depends on need. For this case it doesn't matter, but the question 1.
You can return anything from the Cancelable IO by call cb(Right("put your stuff here")), the only thing blocks you to retrieve the return A is when your cancellation works. You wouldn't get anything if you stop it before it gets to the point. Try to return IO(f.cancel(mayInterruptIfRunning)).delayBy(FiniteDuration(2, TimeUnit.SECONDS)).void, you will get what you expected. Because 2 seconds > 1 second, your code gets enough time to run before it has been cancelled.

stop all async Task when they fails over threshold?

I'm using Monix Task for async control.
scenario
tasks are executed in parallel
if failure occurs over X times
stop all tasks that are not yet in complete status (as quick as better)
my solution
I come up the ideas that race between 1. result and 2. error counter, and cancel the loser.
Via Task.race if the error-counter get to threshold first, then the tasks would be canceled by Task.race.
experiment
on Ammonite REPL
{
import $ivy.`io.monix::monix:3.1.0`
import monix.eval.Task
import monix.execution.atomic.Atomic
import scala.concurrent.duration._
import monix.execution.Scheduler
//import monix.execution.Scheduler.Implicits.global
implicit val s = Scheduler.fixedPool("race", 2) // pool size
val taskSize = 100
val errCounter = Atomic(0)
val threshold = 3
val tasks = (1 to taskSize).map(_ => Task.sleep(100.millis).map(_ => errCounter.increment()))
val guard = Task(f"stop because too many error: ${errCounter.get()}")
.restartUntil(_ => errCounter.get() >= threshold)
val race = Task
.race(guard, Task.gather(tasks))
.runToFuture
.onComplete { case x => println(x); println(f"completed task: ${errCounter.get()}") }
}
issue
The outcome is depends on thread pool size !?
For pool size 1
the outcome is almost always a task success i.e. no stop.
Success(Right(.........))
completed task: 100 // all task success !
For pool size 2
it is very un-deterministic between success and failure and the cancelling is not accurate.
for example:
Success(Left(stop because too many error: 1))
completed task: 98
the canceling is as late as 98 tasks has completed.
the error count is weird small to threshold.
The default global scheduler get this same outcome behavior.
For pool size 200
it is more deterministic and the stopping is earlier thus more accurate in sense that less task was completed.
Success(Left(stop because too many error: 2))
completed task: 8
the larger of the pool size the better.
If I change Task.gather to Task.sequence execution, all issues disappeared!
What is the cause for this dependency on pool size ?
How to improve it or is there better alternative for stopping tasks once too many error occurs ?
What you're seeing is likely an effect of the monix scheduler and how it aims for fairness. It's a fairly complex topic but the documentation and scaladocs are excellent (see: https://monix.io/docs/3x/execution/scheduler.html#execution-model)
When you have only one thread (or few) it takes a while until the "guard" Task gets another turn to check. With Task.gather you start 100 tasks at once, so the scheduler is very busy and the "guard" cannot check again until the other tasks are already done.
If you have one thread per task the scheduler cannot guarantee fairness and therefore the "guard" unfairly checks much more frequently and can finish sooner.
If you use Task.sequence those 100 tasks are executed sequentially, which is why the "guard" task gets much more opportunities to finish as soon as needed. If you want to keep your code the way it is, you could use Task.gatherN(parallelism = 4) which will limit the parallelism and therefore allow your "guard" to check more often (a middleground between Task.sequence and Task.gather).
It seems a bit like Go code to me (using Task.race like Go's select) and you're also using side-effects unconstrained which further complicates understanding what's going on. I've tried to rewrite your program in a way that's more idiomatic and for complicated concurrency I usually reach for streams like Observable:
import cats.effect.concurrent.Ref
import monix.eval.Task
import monix.execution.Scheduler
import monix.reactive.Observable
import scala.concurrent.duration._
object ErrorThresholdDemo extends App {
//import monix.execution.Scheduler.Implicits.global
implicit val s: Scheduler = Scheduler.fixedPool("race", 2) // pool size
val taskSize = 100
val threshold = 30
val program = for {
errCounter <- Ref[Task].of(0)
tasks = (1 to taskSize).map(n => Task.sleep(100.millis).flatMap(_ => errCounter.update(_ + (n % 2))))
tasksFinishedCount <- Observable
.fromIterable(tasks)
.mapParallelUnordered(parallelism = 4) { task =>
task
}
.takeUntilEval(errCounter.get.restartUntil(_ >= threshold))
.map(_ => 1)
.sumL
errorCount <- errCounter.get
_ <- Task(println(f"completed tasks: $tasksFinishedCount, errors: $errorCount"))
} yield ()
program.runSyncUnsafe()
}
As you can see I no longer use global mutable side-effects but instead Ref which interally also uses Atomic but provides a functional api which we can use with Task.
For demonstration purposes I also changed the threshold to 30 and only every other task will "error". So the expected output is always around completed tasks: 60, errors: 30 no matter the thread-pool size.
I'm still using polling with errCounter.get.restartUntil(_ >= threshold) which might burn a bit too much CPU for my taste but it's close to your original idea and works well.
Usually I don't create a list of tasks up front but instead throw the inputs into the Observable and create the tasks inside of .mapParallelUnordered. This code keeps your list which is why there is no real mapping involved (it already contains tasks).
You can choose your desired parallelism much like with Task.gatherN which is pretty nice imo.
Let me know if anything is still unclear :)

Spark stream data from IBM MQ

I want to stream data from IBM MQ. I have tried out this code I found on Github.
I am able to stream data from the Queue but each time it streams, it takes all the data from it. I just want to take the current data that is pushed into the queue. I looked up on many sites but didn't find the correct solution.
In Kafka we had something like KafkaStreamUtils for streaming the near-real-time data. Is there anything similar to that in IBM MQ so that it streams only the latest data?
The sample in the link you provided shows that it calls the following method to recieve from the the IBM MQ:
CustomMQReciever(String host , int port, String qm, String channel, String qn)
If you review CustomMQReciever here you can see that it is only Browsing the messages from the queue. This means the message will still be on the queue and the next time you connect you will receive the same messages:
MQQueueBrowser browser = (MQQueueBrowser) qSession.createBrowser(queue);
If you wanted to remove the messages from the queue you would need to call a method that does consume them from the queue instead of browsing them from the queue. Below is an example of changes to CustomMQReciever.java that should accomplish what you want:
Under the initConnection() change the above code to the following to cause it to remove the messages from the queue:
MQMessageConsumer consumer = (MQMessageConsumer) qSession.createConsumer(queue);
Get rid of:
enumeration= browser.getEnumeration();
Under receive() change the following:
while (!isStopped() && enumeration.hasMoreElements() )
{
receivedMessage= (JMSMessage) enumeration.nextElement();
String userInput = convertStreamToString(receivedMessage);
//System.out.println("Received data :'" + userInput + "'");
store(userInput);
}
To something like this:
while (!isStopped() && (receivedMessage = consumer.receiveNoWait()) != null))
{
String userInput = convertStreamToString(receivedMessage);
//System.out.println("Received data :'" + userInput + "'");
store(userInput);
}

Torch - Multithreading to load tensors into a queue for training purposes

I would like to use the library threads (or perhaps parallel) for loading/preprocessing data into a queue but I am not entirely sure how it works. In summary;
Load data (tensors), pre-process tensors (this takes time, hence why I am here) and put them in a queue. I would like to have as many threads as possible doing this so that the model is not waiting or not waiting for long.
For the tensor at the top of the queue, extract it and forward it through the model and remove it from the queue.
I don't really understand the example in https://github.com/torch/threads enough. A hint or example as to where I would load data into the queue and train would be great.
EDIT 14/03/2016
In this example "https://github.com/torch/threads/blob/master/test/test-low-level.lua" using a low level thread, does anyone know how I can extract data from these threads into the main thread?
Look at this multi-threaded data provider:
https://github.com/soumith/dcgan.torch/blob/master/data/data.lua
It runs this file in the thread:
https://github.com/soumith/dcgan.torch/blob/master/data/data.lua#L18
by calling it here:
https://github.com/soumith/dcgan.torch/blob/master/data/data.lua#L30-L43
And afterwards, if you want to queue a job into the thread, you provide two functions:
https://github.com/soumith/dcgan.torch/blob/master/data/data.lua#L84
The first one runs inside the thread, and the second one runs in the main thread after the first one completes.
Hopefully that makes it a bit more clear.
If Soumith's examples in the previous answer are not very easy to use, I suggest you build your own pipeline from scratch. I provide here an example of two synchronized threads : one for writing data and one for reading data:
local t = require 'threads'
t.Threads.serialization('threads.sharedserialize')
local tds = require 'tds'
local dict = tds.Hash() -- only local variables work here, and only tables or tds.Hash()
dict[1] = torch.zeros(4)
local m1 = t.Mutex()
local m2 = t.Mutex()
local m1id = m1:id()
local m2id = m2:id()
m1:lock()
local pool = t.Threads(
1,
function(threadIdx)
end
)
pool:addjob(
function()
local t = require 'threads'
local m1 = t.Mutex(m1id)
local m2 = t.Mutex(m2id)
while true do
m2:lock()
dict[1] = torch.randn(4)
m1:unlock()
print ('W ===> ')
print(dict[1])
collectgarbage()
collectgarbage()
end
return __threadid
end,
function(id)
end
)
-- Code executing on master:
local a = 1
while true do
m1:lock()
a = dict[1]
m2:unlock()
print('R --> ')
print(a)
end

In Scala/Playframework, how does "Future" and "Future.map" work under the hood?

The demo codes are from here
object ProxyController extends Controller {
def proxy = Action {
val responseFuture: Future[Response] = WS.url("http://example.com").get()
val resultFuture: Future[Result] = responseFuture.map { resp =>
// Create a Result that uses the http status, body, and content-type
// from the example.com Response
Status(resp.status)(resp.body).as(resp.ahcResponse.getContentType)
}
Async(resultFuture)
}
}
As I understand, the workflow looks like this:
One of the listener threads (threads to process the HTTP request), T1, executes the proxy action, running through the code from top to bottom. When it runs at the WS.url("http://example.com").get(), T1 just delegate the Web Request to another thread (worker thread) W1 and go the the next line. Then T1 skip the contents of the function passed to the map method, since that depends on a non-blocking I/O call that has not yet completed. Once T1 returns the AsyncResult, it moves on to process other requests.
Later on, the worker thread W1 finished the Web Request to "http://example.com", then it sends signal to a listener thread T2, which may or may not be the same as T1. Then T2 starts to execute the responseFuture.map line, and then delegate the task to another worker thread W2. Once the task is delegated to W2, T2 moves on to process other requests.
Later on, the worker thread W2 finished creating result (the responseFuture.map line), then it sends signal to a listener thread T3. And T3 get the result from W2 and send this result to the client in a magic way (It looks magic because I've no idea how T3 knows the original client.. Is it done by closures?)
Is this the real workflow under the hood? If so, will it be too complex and ineffective for the thread communication? If not, what happened under the hood for the codes above? Does anyone have ideas about this?

Resources