Modifying Cassandra's threadpool queues

I've been meddling with Cassandra's (v2.2.4) threadpool executors (namely the SEPExecutor.java module), trying to change the queues used for storing pending reads (those with no immediately available worker to serve them). By default, Cassandra uses a ConcurrentLinkedQueue (a non-blocking queue variant). I'm currently trying to replace this with a MultiQueue setup in order to schedule requests in non-FIFO order.
Let's assume for simplicity that my MultiQueue implementation is an extension of AbstractQueue that simply overrides the offer and poll functions and randomly (de)queues requests to or from any of the enclosed ConcurrentLinkedQueues. For polling, if one queue returns null, we keep going through all the queues until we find a non-null element (otherwise we return null). There's no locking mechanism in place, since my intention is to rely on the properties of the enclosed ConcurrentLinkedQueues (which are non-blocking).
The main problem is that it seems I'm running into some sort of race condition, where some of the assigned workers can't poll an item that supposedly exists in the queue. In other words, the MultiQueue structure appears to be non-linearizable. More specifically, I'm encountering a NullPointerException on this line: SEPWorker.java [line 105]
Any clue as to what could be causing this, or how should I go about maintaining the properties of a single ConcurrentLinkedQueue in a MultiQueue setup?
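For readers following along, here is a minimal sketch of the structure described above. It is a reconstruction under assumptions, not the asker's actual code: the class name MultiQueueSketch and the choice of a random starting index are hypothetical. The comments mark the spot where a scanning poll can report "empty" even though an element was present the whole time, which is one way such a structure stops behaving like a single ConcurrentLinkedQueue.

```java
import java.util.AbstractQueue;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical reconstruction of the MultiQueue described in the question.
class MultiQueueSketch<E> extends AbstractQueue<E> {
    private final List<Queue<E>> queues = new ArrayList<>();

    MultiQueueSketch(int queueCount) {
        for (int i = 0; i < queueCount; i++) {
            queues.add(new ConcurrentLinkedQueue<>());
        }
    }

    @Override
    public boolean offer(E e) {
        // Enqueue into a randomly chosen inner queue.
        int i = ThreadLocalRandom.current().nextInt(queues.size());
        return queues.get(i).offer(e);
    }

    @Override
    public E poll() {
        // Start at a random inner queue and scan all of them once.
        int start = ThreadLocalRandom.current().nextInt(queues.size());
        for (int k = 0; k < queues.size(); k++) {
            E e = queues.get((start + k) % queues.size()).poll();
            if (e != null) {
                return e;
            }
        }
        // Under concurrency this null is NOT proof of emptiness: an offer can
        // land in a queue this scan has already passed while another thread
        // drains the queues it has not reached yet. Each inner poll is atomic,
        // but the scan as a whole is not.
        return null;
    }

    @Override
    public E peek() {
        for (Queue<E> q : queues) {
            E e = q.peek();
            if (e != null) {
                return e;
            }
        }
        return null;
    }

    @Override
    public int size() {
        // Sum of per-queue sizes; only a rough estimate while writers are active.
        int total = 0;
        for (Queue<E> q : queues) {
            total += q.size();
        }
        return total;
    }

    @Override
    public Iterator<E> iterator() {
        // Snapshot iterator; removing through it does not affect the queues.
        List<E> snapshot = new ArrayList<>();
        for (Queue<E> q : queues) {
            snapshot.addAll(q);
        }
        return snapshot.iterator();
    }
}
```

Any caller that decides separately that work is pending and then treats a null from poll as impossible would be exposed to exactly this window.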

Related

What is the intended usage of Qt threads in conjunction with dependency injection?

Let's have a worker thread which is accessed from a wide variety of objects. This worker object has some public slots, so anyone who connects its signals to the worker's slots can use emit to trigger the worker thread's useful tasks.
This worker thread needs to be almost global, in the sense that several different classes use it, some of them are deep in the hierarchy (child of a child of a child of the main application).
I guess there are two major ways of doing this:
1. All the methods of the child classes pass their messages up the hierarchy via their return values, and let the main (e.g. the GUI) object handle all the emitting.
2. All the classes that require the services of the worker thread hold a pointer to the Worker object (which is a member of the main class), and they all connect() to it in their constructors. Every such class then does the emitting by itself. Basically, dependency injection.
Option 2 seems much cleaner and more flexible to me; I'm only worried that it will create a huge number of connections. For example, if I have an array of objects that need the thread, I will have a separate connection for each element of the array.
Is there an "official" way of doing this, as the creators of Qt intended it?

There is no magic silver bullet for this. You'll need to consider many factors, such as:
Why do those objects emit the data in the first place? Is it because they need something done, that is, the emission is a “command”? Then maybe they could call some sort of service to do the job without even worrying about whether it's going to happen in another thread or not. Or is it because they are reporting an event? In that case they should probably just emit signals but not connect them. It's up to the using code to decide what to do with events.
How many objects are we talking about? Some performance tests are needed. Maybe it's not even an issue.
If there is an array of objects, what purpose does it serve? Perhaps instead of using a plain array some sort of “container” class is needed? Then the container could handle the emission and connection and objects could just do something like container()->handle(data). Then you'd only have one connection per container.

How does Cassandra handle the blocking execute statement in the DataStax Java driver

Blocking execute method from com.datastax.driver.core.Session
public ResultSet execute(Statement statement);
Comment on this method:
This method blocks until at least some result has been received from
the database. However, for SELECT queries, it does not guarantee that
the result has been received in full. But it does guarantee that some
response has been received from the database, and in particular
guarantee that if the request is invalid, an exception will be thrown
by this method.
Non-blocking execute method from com.datastax.driver.core.Session
public ResultSetFuture executeAsync(Statement statement);
This method does not block. It returns as soon as the query has been
passed to the underlying network stack. In particular, returning from
this method does not guarantee that the query is valid or has even
been submitted to a live node. Any exception pertaining to the failure
of the query will be thrown when accessing the {@link
ResultSetFuture}.
I have two questions about them, and it would be great if you could help me understand them.
Let's say I have 1 million records and I want all of them to arrive in the database (without any loss).
Question 1: Suppose I have n threads, each with the same number of records to send to the database, and all of them keep sending insert queries to Cassandra using the blocking execute call. If I increase n, will that also reduce the time needed to insert all the records into Cassandra?
Will this cause performance problems for Cassandra? Does Cassandra have to make sure that, for every inserted record, all the nodes in the cluster know about the new record immediately, in order to keep the data consistent? (I assume a Cassandra node won't even consider using the local machine time to control the record insertion time.)
Question 2: With non-blocking execute, how can I be sure that all of the insertions succeeded? The only way I know is to wait on the ResultSetFuture to check the outcome of the insert query. Is there a better way? Is non-blocking execute more likely to fail than blocking execute?
Thank you very much for your help.

Suppose I have n threads, each with the same number of records to send to the database, and all of them keep sending insert queries to Cassandra using the blocking execute call. If I increase n, will that also reduce the time needed to insert all the records into Cassandra?
To some extent. Let's divorce the client implementation details a bit and look at things from the perspective of "number of concurrent requests", as you don't need a thread for each ongoing request if you use executeAsync. In my testing I have found that while there is a lot of value in having a high number of concurrent requests, there is a threshold beyond which returns diminish or performance starts to degrade. My general rule of thumb is (number of nodes * native_transport_max_threads (default: 128) * 2), which for a 3-node cluster with the default setting works out to 3 * 128 * 2 = 768, but you may find more optimal results with more or less.
The idea here is that there is not much value in enqueuing more requests than Cassandra will handle at a time. By capping the number of in-flight requests, you limit unnecessary congestion on the connections between your driver client and Cassandra.
Question 2: With non-blocking execute, how can I be sure that all of the insertions succeeded? The only way I know is to wait on the ResultSetFuture to check the outcome of the insert query. Is there a better way? Is non-blocking execute more likely to fail than blocking execute?
Waiting on the ResultSetFuture via get is one route, but if you are developing a fully async application, you want to avoid blocking as much as possible. Using Guava, your two best weapons are Futures.addCallback and Futures.transform.
Futures.addCallback allows you to register a FutureCallback that gets executed when the driver has received the response. onSuccess gets executed in the success case, onFailure otherwise.
Futures.transform allows you to effectively map the returned ResultSetFuture into something else. For example if you only want the value of 1 column you could use it to transform ListenableFuture<ResultSet> to a ListenableFuture<String> without having to block in your code on the ResultSetFuture and then getting the String value.
In the context of writing a dataloader program, you could do something like the following:
1. To keep things simple, use a Semaphore or some other construct with a fixed number of permits (that will be your maximum number of in-flight requests). Whenever you go to submit a query using executeAsync, acquire a permit. You should really only need 1 thread (though you may want to introduce a pool sized to the number of CPU cores) that acquires the permits from the Semaphore and executes queries. It will just block on acquire until a permit is available.
2. Use Futures.addCallback for the future returned from executeAsync. The callback should call Semaphore.release() in both the onSuccess and onFailure cases. Releasing a permit allows your thread in step 1 to continue and submit the next request.
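The two steps above can be sketched with only the JDK, so that it runs without the driver on the classpath. This is an illustration under assumptions, not driver code: the class name ThrottledLoader is hypothetical, and a CompletableFuture returned by a caller-supplied function stands in for the ResultSetFuture that executeAsync would return. With the real driver, the completion callback would be registered through Futures.addCallback instead of whenComplete.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical loader that caps the number of in-flight asynchronous requests.
class ThrottledLoader {
    final AtomicInteger succeeded = new AtomicInteger();
    final AtomicInteger failed = new AtomicInteger();
    private final int maxInFlight;
    private final Semaphore permits;

    ThrottledLoader(int maxInFlight) {
        this.maxInFlight = maxInFlight;
        this.permits = new Semaphore(maxInFlight);
    }

    // executeAsync is a stand-in for session.executeAsync(statement).
    <T> void load(List<T> records, Function<T, CompletableFuture<Void>> executeAsync) {
        for (T record : records) {
            // Step 1: block here until fewer than maxInFlight requests are pending.
            permits.acquireUninterruptibly();
            CompletableFuture<Void> future;
            try {
                future = executeAsync.apply(record);
            } catch (RuntimeException e) {
                // Submission itself failed, so no callback will ever run.
                failed.incrementAndGet();
                permits.release();
                continue;
            }
            // Step 2: release the permit on success and on failure alike.
            future.whenComplete((ignored, error) -> {
                if (error == null) {
                    succeeded.incrementAndGet();
                } else {
                    failed.incrementAndGet();
                }
                permits.release();
            });
        }
        // Wait for the stragglers: all permits are back only when nothing is in flight.
        permits.acquireUninterruptibly(maxInFlight);
        permits.release(maxInFlight);
    }
}
```

After load returns, the failed counter tells you whether every insert went through; a real loader would more likely collect the failed records for a retry than just count them.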
To further improve throughput, you might want to consider using BatchStatement and submitting requests in batches. This is a good option if you keep your batches small (50-250 is a good number) and if your inserts in a batch all share the same partition key.

Besides the above answer,
It looks like execute() calls executeAsync(statement).getUninterruptibly(). So whether you manage your own pool of n threads using execute(), blocking until each execution completes with at most n running threads, or use executeAsync() on all records, Cassandra-side performance should be roughly the same, depending on execution time/count and timeouts.
The executions all run on connections borrowed from a pool. Each execution has a streamId on the client side, and you are notified via the future when the response for that streamId comes back. Throughput is limited by the total requests per connection on the client side and by the read threads on each node that was picked to execute your request. Anything above that is buffered in a queue (not blocked), limited by the connection's maxQueueSize and maxRequestsPerConnection; anything beyond that should fail. The beauty of this is that executeAsync() does not run on a new thread per request/execution.
So there has to be a limit on how many requests can run via execute() or executeAsync(); with execute() you avoid going beyond these limits.
Performance-wise, you will start seeing a penalty beyond what each node can handle, so execute() with a well-sized pool makes sense to me. Even better, use a reactive architecture to avoid creating many threads that do nothing but wait; a large number of threads causes wasted context switching on the client side. For a smaller number of requests, executeAsync() will be better because it avoids thread pools.
DefaultResultSetFuture future = new DefaultResultSetFuture(..., makeRequestMessage(statement, null));
new RequestHandler(this, future, statement).sendRequest();

Akka BalancingPool using a PinnedDispatcher

I have a few database IO operations I would like to run concurrently. In my case it would be best to use a BalancingPool Router.
The docs say if blocking operations are to occur in the workers then one should use a thread-pool-executor rather than the default fork-join-dispatcher.
I did not want to configure it in the Akka conf file so I think this would be the way to do it in code:
val router = context.actorOf(BalancingPool(5).withDispatcher("my-pinned-dispatcher").props(Props[History]), "HistoryBalancingRouter")
But the docs say for a PinnedDispatcher:
This dispatcher dedicates a unique thread for each actor using it;
i.e. each actor will have its own thread pool with only one thread in
the pool.
Mailboxes: Any, creates one per Actor
Whereas for the BalancingPool it says workers share a single mailbox:
The BalancingPool automatically uses a special BalancingDispatcher for
its routees - disregarding any dispatcher that is set on the
routee Props object. This is needed in order to implement the
balancing semantics via sharing the same mailbox by all the routees.
While it is not possible to change the dispatcher used by the routees,
it is possible to fine tune the used executor. By default the
fork-join-dispatcher is used and can be configured as explained in
Dispatchers. In situations where the routees are expected to perform
blocking operations it may be useful to replace it with a
thread-pool-executor hinting the number of allocated threads
explicitly
So what is actually happening with mailboxes and threads here?
Is my example a sane way to implement a BalancingPool using a thread-pool-executor?

The BalancingPool always works with a BalancingDispatcher, it will simply refuse your PinnedDispatcher setting. The reason is exactly that the BalancingDispatcher allows actors to share a common mailbox (for performance) which other Dispatchers do not support.
Please note that several of the Dispatchers allow you to configure the Executor that they use internally. By default, for BalancingPool it is the fork-join-executor (the docs incorrectly say "fork-join-dispatcher"), but you can change it to a thread-pool-executor. An example dispatcher configuration is here: http://doc.akka.io/docs/akka/2.3.13/scala/dispatchers.html#Setting_the_dispatcher_for_an_Actor (you just need to change the executor type).

LMAX Disruptor: Must EventHandler clone object received from EventHandler#onEvent

I have an application with many producers and consumers.
From my understanding, the RingBuffer creates its objects when it is initialized; you then copy data into an object when you publish to the ring, and read it back out in the EventHandler.
My application's LogHandler buffers received events in a List so it can send them onward in batch mode once the list reaches a certain size. So EventHandler#onEvent puts the received object in the list; once the list reaches that size, it sends it over RMI to a server and clears it.
My question is: do I need to clone the object before I put it in the list, since, as I understand it, slots can be reused once consumed?
Do I need to synchronize access to the list in my EventHandler#onEvent?

Yes - your understanding is correct. You copy your values in and out of the ringbuffer slots.
I would suggest that yes, you clone each value as you extract it from the ring buffer and add it to your event handler's list; otherwise the slot can be reused.
You should not need to synchronise access to the list as long as it is a private member variable of your Event Handler and you only have one event handler instance per thread. If you have multiple event handlers adding to the same (eg static) List instance then you would need synchronisation.
Clarification:
Be sure to read the background in OzgurH's comments below. If you stick to using the endOfBatch flag on disruptor and use that to decide the size of your batch, you do not have to copy objects out of the list. If you are using your own accumulation strategy (such as size - as per the question), then you should clone objects out as the slot could be reused before you have had the chance to send.
Also worth noting that if you are needing to synchronize on the list instance, then you have missed a big opportunity with disruptor and will destroy your performance anyway.
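As a rough illustration of the size-based accumulation case, here is a sketch that does not depend on the Disruptor library. LogEvent, its copy() method, BatchingLogHandler and the sender callback are all hypothetical names; only the onEvent(event, sequence, endOfBatch) parameter shape mirrors the Disruptor's EventHandler. The point it shows is that the handler stores a copy, so later reuse of the slot cannot change what is waiting in the batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical mutable event, standing in for a preallocated ring buffer slot.
class LogEvent {
    String message;

    LogEvent copy() {
        LogEvent c = new LogEvent();
        c.message = message;
        return c;
    }
}

// Hypothetical handler that accumulates by size rather than by endOfBatch.
class BatchingLogHandler {
    private final int batchSize;
    private final Consumer<List<LogEvent>> sender; // e.g. the RMI call in the question
    // Private to this handler and touched by a single thread, so no locking.
    private final List<LogEvent> buffer = new ArrayList<>();

    BatchingLogHandler(int batchSize, Consumer<List<LogEvent>> sender) {
        this.batchSize = batchSize;
        this.sender = sender;
    }

    // Same parameter shape as the Disruptor's EventHandler#onEvent.
    void onEvent(LogEvent event, long sequence, boolean endOfBatch) {
        // Copy out of the slot: the producer may overwrite it once this returns.
        buffer.add(event.copy());
        if (buffer.size() >= batchSize) {
            sender.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }
}
```

If the batch were instead flushed whenever endOfBatch is true, the slots would still be owned by the handler at send time and the copy could be dropped, as the clarification above describes.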

It is possible to use slots in the Disruptor's RingBuffer (including ones containing a List) without cloning/copying values. This may be a preferable solution for you depending on whether you are worried about garbage creation, and whether you actually need to be concerned about concurrent updates to the objects being placed in the RingBuffer. If all the objects being placed in the slot's list are immutable, or if they are only being updated/read by a single thread at a time (a precondition which the Disruptor is often used to enforce), there will be nothing gained from cloning them as they are already immune to data races.
On the subject of batching, note that the Disruptor framework itself provides a mechanism for taking items from the RingBuffer in batches in your EventHandler threads. This approach is fully thread-safe and lock-free, and could yield better performance by making your memory access patterns more predictable to the CPU.

Application design for parallel collection processing

I'm experimenting with the System.Collections.Concurrent namespace but I have a problem implementing my design.
My input queue (ConcurrentQueue) is getting populated fine from a Thread which is doing some I/O at startup to read and parse.
Next I kick off a Parallel.ForEach() on the input queue. I'm doing some I/O bound work on each item.
A log item is created for each item processed in the ForEach() and is dropped into a result queue.
What I would like to do is kick off the logging as soon as I start reading the input, because I may not be able to fit all of the log items in memory. What is the best way to wait for items to land in the result queue? Are there design patterns or examples that I should be looking at?

I think the pattern you're looking for is the producer/consumer pattern. More specifically, you can have a producer/consumer implementation built around TPL and BlockingCollection.
The main concepts you want to read about are:
Task,
BlockingCollection,
TaskFactory.ContinueWhenAll (will allow you to perform some action when a set of tasks/threads has finished running).
Bounding and Blocking in BlockingCollection. This allows you to set a maximum size for your output collection (for memory reasons), and producer thread(s) will wait for consumers to pick up elements once the maximum size you specify is reached.
BlockingCollection.CompleteAdding and BlockingCollection.IsCompleted, which can be used to synchronize producers and consumers (a producer can say when it's finished; a consumer can check for that and keep running until the producer(s) are finished).
A more complete sample is in the second article I linked.
In your case I think you want the consumer to just pick up things from the result queue and dispose of them as soon as possible (write them to a logging store, or similar).
So your final collection, where you dump log items should be a BlockingCollection, not a ConcurrentQueue.
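The same bounded producer/consumer shape can be sketched in Java rather than C#, as an analogy only. Here ArrayBlockingQueue plays the role of a bounded BlockingCollection, and a sentinel object stands in for CompleteAdding; the class name BoundedLogPipeline and the "log-N" items are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical pipeline: a producer creates log items, a consumer writes them out.
class BoundedLogPipeline {
    // Unique sentinel, compared by identity, that signals "no more items".
    private static final String DONE = new String("DONE");

    static List<String> run(int items, int capacity) {
        // Bounded queue: put() blocks once `capacity` items are waiting,
        // so pending log items can never pile up without limit in memory.
        BlockingQueue<String> results = new ArrayBlockingQueue<>(capacity);
        List<String> written = new ArrayList<>(); // stands in for the log store

        Thread consumer = new Thread(() -> {
            try {
                // take() blocks until an item is available.
                for (String s = results.take(); s != DONE; s = results.take()) {
                    written.add(s);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            for (int i = 0; i < items; i++) {
                results.put("log-" + i); // producer side (the Parallel.ForEach body)
            }
            results.put(DONE);           // rough analogue of CompleteAdding
            consumer.join();             // join makes `written` safe to read here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return written;
    }
}
```

The consumer starts draining as soon as the first item lands, which is the behavior the question asks for; with several producers, the sentinel would be sent only after all of them have finished.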
