Akka BalancingPool using a PinnedDispatcher - multithreading

I have a few database IO operations I would like to run concurrently. In my case it would be best to use a BalancingPool Router.
The docs say if blocking operations are to occur in the workers then one should use a thread-pool-executor rather than the default fork-join-dispatcher.
I did not want to configure it in the Akka conf file so I think this would be the way to do it in code:
val router = context.actorOf(BalancingPool(5).withDispatcher("my-pinned-dispatcher").props(Props[History]), "HistoryBalancingRouter")
But the docs say for a PinnedDispatcher:
This dispatcher dedicates a unique thread for each actor using it;
i.e. each actor will have its own thread pool with only one thread in
the pool.
Mailboxes: Any, creates one per Actor
Whereas for the BalancingPool it says workers share a single mailbox:
The BalancingPool automatically uses a special BalancingDispatcher for
its routees - disregarding any dispatcher that is set on the
routee Props object. This is needed in order to implement the
balancing semantics via sharing the same mailbox by all the routees.
While it is not possible to change the dispatcher used by the routees,
it is possible to fine tune the used executor. By default the
fork-join-dispatcher is used and can be configured as explained in
Dispatchers. In situations where the routees are expected to perform
blocking operations it may be useful to replace it with a
thread-pool-executor hinting the number of allocated threads
explicitly.
So what is actually happening with mailboxes and threads here?
Is my example a sane way to implement a BalancingPool using a thread-pool-executor?

The BalancingPool always works with a BalancingDispatcher; it will simply ignore your PinnedDispatcher setting. The reason is that the BalancingDispatcher lets the routees share a common mailbox (for performance), which other dispatchers do not support.
Please note that several of the dispatchers allow you to configure the executor they use internally. For BalancingPool the default is the fork-join-executor (the docs incorrectly say "fork-join-dispatcher"), but you can change it to a thread-pool-executor. An example dispatcher configuration is here: http://doc.akka.io/docs/akka/2.3.13/scala/dispatchers.html#Setting_the_dispatcher_for_an_Actor - you just need to change the executor type.
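Here is a rough sketch (assuming Akka 2.3.x) of doing that entirely in code by building the deployment config programmatically; the system name and router name are placeholders, and the pool-dispatcher section is the mechanism the routing docs describe for tuning the BalancingPool's executor, here supplied via ConfigFactory instead of application.conf:

import akka.actor.{ActorSystem, Props}
import akka.routing.BalancingPool
import com.typesafe.config.ConfigFactory

// Deployment config assembled in code instead of application.conf.
// The key under akka.actor.deployment must match the router's path
// (here a top-level actor named "historyRouter").
val tuning = ConfigFactory.parseString("""
  akka.actor.deployment {
    /historyRouter {
      pool-dispatcher {
        executor = "thread-pool-executor"
        # allocate exactly 5 threads for this pool
        thread-pool-executor {
          core-pool-size-min = 5
          core-pool-size-max = 5
        }
      }
    }
  }
""")

val system = ActorSystem("io-system", tuning.withFallback(ConfigFactory.load()))

// The routees still run on the BalancingDispatcher; only its executor
// is now a thread-pool-executor instead of the default fork-join-executor.
val router = system.actorOf(BalancingPool(5).props(Props[History]), "historyRouter")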

Related

What is the intended usage of Qt threads in conjunction with dependency injection?

Let's have a worker thread which is accessed from a wide variety of objects. This worker object has some public slots, so anyone who connects their signals to the worker's slots can use emit to trigger the worker thread's useful tasks.
This worker thread needs to be almost global, in the sense that several different classes use it, some of them are deep in the hierarchy (child of a child of a child of the main application).
I guess there are two major ways of doing this:
All the methods of the child classes pass their messages up the hierarchy via their return values, and the main (e.g. the GUI) object handles all the emitting.
All those classes which require the services of the worker thread have a pointer to the Worker object (which is a member of the main class), and they all connect() to it in their constructors. Every such class then does the emitting by itself. Basically, dependency injection.
Option 2 seems much cleaner and more flexible to me; I'm only worried that it will create a huge number of connections. For example, if I have an array of objects which need the thread, I will have a separate connection for each element of the array.
Is there an "official" way of doing this, as the creators of Qt intended it?
There is no magic silver bullet for this. You'll need to consider many factors, such as:
Why do those objects emit the data in the first place? Is it because they need to do something, that is, the emission is a "command"? Then maybe they could call some sort of service to do the job without even worrying about whether it happens on another thread or not. Or is it because they inform about an event? In that case they probably should just emit signals but not connect them. It's up to the consuming code to decide what to do with those events.
How many objects are we talking about? Some performance tests are needed. Maybe it's not even an issue.
If there is an array of objects, what purpose does it serve? Perhaps instead of using a plain array some sort of “container” class is needed? Then the container could handle the emission and connection and objects could just do something like container()->handle(data). Then you'd only have one connection per container.

Modifying Cassandra's threadpool queues

I've been meddling with Cassandra's (v 2.2.4) thread-pool executors (namely the SEPExecutor.java module), trying to change the queues used for storing pending reads (those with no immediately available workers to serve them). By default, Cassandra uses a ConcurrentLinkedQueue (a non-blocking queue variant). I'm currently trying to override this with a MultiQueue setup in order to schedule requests in non-FIFO order.
Let's assume for simplicity that my MultiQueue implementation is an extension of AbstractQueue that simply overrides the offer and poll functions and randomly (de)queues requests to any of the enclosed ConcurrentLinkedQueues. For polling, if one queue returns null, we keep going through all the queues until we find a non-null element (otherwise we return null). There's no locking mechanism in place, since my intention is to rely on the properties of the enclosed ConcurrentLinkedQueues (which are non-blocking).
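For reference, a minimal Scala sketch of the structure described above (the names are mine, not Cassandra's); the scan in poll() is the non-atomic part, which is where a concurrent consumer can pass a sub-queue just before a producer offers into it:

import java.util.concurrent.ConcurrentLinkedQueue
import java.util.{AbstractQueue, Iterator => JIterator}
import scala.util.Random

class MultiQueue[T](parallelism: Int) extends AbstractQueue[T] {
  private val queues = Array.fill(parallelism)(new ConcurrentLinkedQueue[T]())

  // enqueue into a randomly chosen sub-queue
  override def offer(e: T): Boolean =
    queues(Random.nextInt(parallelism)).offer(e)

  // scan the sub-queues until one yields an element; each poll() is
  // non-blocking, but the scan as a whole is not atomic
  override def poll(): T = {
    var i = 0
    while (i < parallelism) {
      val e = queues(i).poll()
      if (e != null) return e
      i += 1
    }
    null.asInstanceOf[T]
  }

  override def peek(): T = throw new UnsupportedOperationException
  override def iterator(): JIterator[T] = throw new UnsupportedOperationException
  override def size(): Int = queues.map(_.size()).sum
}

Visibility as such is fine (ConcurrentLinkedQueue establishes the necessary happens-before on offer/poll); the likely issue is that the composite is not linearizable, so a worker that has been told work is available (via SEPExecutor's task-permit accounting) can still poll null, which would produce exactly the kind of NullPointerException seen in SEPWorker.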
The main problem is that it seems I'm running into some sort of race condition, where some of the assigned workers can't poll an item that supposedly exists in the queue. In other words, the MultiQueue structure appears to be non-linearizable. More specifically, I'm encountering a NullPointerException on this line: SEPWorker.java [line 105]
Any clue as to what could be causing this, or how should I go about maintaining the properties of a single ConcurrentLinkedQueue in a MultiQueue setup?

Replacing bad performing workers in pool

I have a set of actors that are somewhat stateless and perform similar tasks.
Each of these workers is unreliable and potentially low performing. In my design, I can easily spawn more actors to replace lazy ones.
Currently each actor assesses its own performance. Is there a way to make the supervisor/actor pool do this assessment, to help decide which workers are slow enough to be replaced? Or is my current strategy "the" right strategy?
I'm new to Akka myself, so I'm only trying to help, but my approach would be something along the following lines:
Write your own routing logic, something along the lines of https://github.com/akka/akka/blob/v2.3.5/akka-actor/src/main/scala/akka/routing/SmallestMailbox.scala (see the sketch after this list). Keep in mind that a new instance is created for every pool, so each instance can store information about how many messages have been processed by each actor so far. Once you find an actor underperforming, mark it as 'removable' (once it is no longer processing any new messages) in a separate data structure and stop sending it further messages.
Write your own router pool: override createRouterActor https://github.com/akka/akka/blob/v2.3.5/akka-actor/src/main/scala/akka/routing/RouterConfig.scala:236 to provide your own CustomRouterPoolActor
Write your CustomRouterPoolActor along the following lines: https://github.com/akka/akka/blob/8485cd2ebb46d2fba851c41c03e34436e498c005/akka-actor/src/main/scala/akka/routing/Resizer.scala (see ResizablePoolActor). This actor will have access to your strategy instance. From this strategy instance, remove the routees already marked for removal. Look at ResizablePoolCell to see how to remove actors.
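A rough sketch of step 1 (the class name and bookkeeping are mine, and this only counts messages sent, not completed; a real version would add the 'removable' bookkeeping described above):

import java.util.concurrent.atomic.AtomicLongArray
import scala.collection.immutable
import akka.routing.{NoRoutee, Routee, RoutingLogic}

// Must be thread-safe: select() may be called concurrently.
class CountingRoutingLogic(maxRoutees: Int) extends RoutingLogic {
  private val sent = new AtomicLongArray(maxRoutees)

  override def select(message: Any, routees: immutable.IndexedSeq[Routee]): Routee =
    if (routees.isEmpty) NoRoutee
    else {
      val n = math.min(routees.size, maxRoutees)
      // pick the position that has received the fewest messages so far
      val idx = (0 until n).minBy(i => sent.get(i))
      sent.incrementAndGet(idx)
      routees(idx)
    }
}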
The question is: why do some of your workers perform badly? Is there any difference between them? I assume not. If not, then maybe some payloads simply require more work than others - what's the point of terminating the workers then?
We once had a similar problem and used SmallestMailboxRoutingLogic, which basically tries to distribute the workload based on mailbox sizes.
Anyway, I would rather try to answer the question of why some of the workers are unstable and perform poorly, because that looks like the biggest problem, and one you are just trying to paper over elsewhere.
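For reference, the built-in smallest-mailbox variant mentioned above is a one-liner to use (Worker, the pool size and the system name are placeholders):

import akka.actor.{ActorSystem, Props}
import akka.routing.SmallestMailboxPool

val system = ActorSystem("jobs")
// routes each message to the routee with the fewest queued messages
val workers = system.actorOf(SmallestMailboxPool(5).props(Props[Worker]), "workerPool")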

Lucene NIOFSDirectory and SimpleFSDirectory with multiple threads

My basic question is: what's the proper way to create/use instances of NIOFSDirectory and SimpleFSDirectory when multiple threads need to run queries (reads) on the same index? More to the point: should an instance of the XXXFSDirectory be created for each thread that needs to run a query and retrieve some results (and then be closed in the same thread immediately after), or should I make a "global" (singleton?) instance which is passed to all threads, so they all use it at the same time (and it's no longer up to each thread to close it when it's done with a query)?
Here's more details:
I've read the docs on both NIOFSDirectory and SimpleFSDirectory and what I got is:
they both support multithreading:
NIOFSDirectory : "An FSDirectory implementation that uses java.nio's FileChannel's positional read, which allows multiple threads to read from the same file without synchronizing."
SimpleFSDirectory : "A straightforward implementation of FSDirectory using java.io.RandomAccessFile. However, this class has poor concurrent performance (multiple threads will bottleneck) as it synchronizes when multiple threads read from the same file. It's usually better to use NIOFSDirectory or MMapDirectory instead."
NIOFSDirectory is better suited (basically, faster) than SimpleFSDirectory in a multi-threaded context (see above)
NIOFSDirectory does not work well on Windows; there, SimpleFSDirectory is recommended. On *nix systems, however, NIOFSDirectory works fine and, due to its better multi-threaded performance, is recommended over SimpleFSDirectory.
"NOTE: NIOFSDirectory is not recommended on Windows because of a bug in how FileChannel.read is implemented in Sun's JRE. Inside of the implementation the position is apparently synchronized."
The reason I'm asking is that I've seen actual projects where the target OS is Linux and NIOFSDirectory is used to read from the index, but an instance of it is created for each request (from each thread), and once the query is done and the results returned, the thread closes that instance (only to create a new one on the next request, and so on). So I was wondering whether this is really a better approach than simply having a single NIOFSDirectory instance shared by all threads, opened when the application starts and closed much later, when a certain (multi-threaded) job is finished...
More to the point, for a web application, isn't it better to have something like a context listener which creates an instance of NIOFSDirectory, places it into the application context, lets all servlets share and use it, and then closes it when the app shuts down?
The official Lucene FAQ suggests the following:
Share a single IndexSearcher across queries and across threads in your
application.
An IndexSearcher requires a single IndexReader, and the latter can be produced with DirectoryReader.open(Directory), which only requires a single instance of Directory.
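A minimal sketch of that sharing pattern, in Scala for brevity (assuming a Lucene 5.x-style path-based constructor; the index path and object names are placeholders, and the same shape applies in Java):

import java.nio.file.Paths
import org.apache.lucene.index.DirectoryReader
import org.apache.lucene.search.IndexSearcher
import org.apache.lucene.store.NIOFSDirectory

// Opened once at application startup (e.g. from a context listener)
// and shared by all request threads.
object SearchHolder {
  private val directory = new NIOFSDirectory(Paths.get("/path/to/index"))
  private val reader = DirectoryReader.open(directory)
  val searcher = new IndexSearcher(reader)

  // Closed once, when the application shuts down.
  def shutdown(): Unit = {
    reader.close()
    directory.close()
  }
}

If the index is updated while the application is running, DirectoryReader.openIfChanged can be used to refresh the reader without reopening the Directory.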

Scala : Akka - multiple event buses for actorsystem or having prioritized events?

I have a single ActorSystem, which has several subscribers to its eventStream. The application may produce thousands of messages per second, and some of the messages are more important than the rest, so they should be handled first.
I found that every ActorSystem has a single eventStream attached, so it seems I would need to register the same actor class with two (or more) ActorSystems in order to receive the important messages on a dedicated eventStream.
Is this the preferred approach, or are there some tricks for this task? Maybe classifiers can also somehow tweak message priorities?
The EventStream is not a data structure that holds events; it just routes events to subscribers. Hence you should use a priority mailbox for the listener actors; see the documentation for how to use priority mailboxes: http://doc.akka.io/docs/akka/2.0.3/scala/dispatchers.html#Mailboxes
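A sketch of such a mailbox (ImportantEvent is a hypothetical marker for whatever high-priority messages the application publishes on the stream):

import akka.actor.ActorSystem
import akka.dispatch.{PriorityGenerator, UnboundedPriorityMailbox}
import com.typesafe.config.Config

// hypothetical marker for high-priority events
final case class ImportantEvent(payload: Any)

// Lower numbers are dequeued first, so ImportantEvent jumps ahead
// of everything else waiting in the subscriber's mailbox.
class ImportanceMailbox(settings: ActorSystem.Settings, config: Config)
  extends UnboundedPriorityMailbox(
    PriorityGenerator {
      case _: ImportantEvent => 0
      case _                 => 1
    })

Assuming a more recent Akka (2.2+), the mailbox still has to be registered under a name in the configuration (mailbox-type pointing at the class above) and attached to the subscriber, e.g. with Props[...].withMailbox(...); the details are in the linked documentation.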
