Concurrent processing via scala singleton object - multithreading

I'm trying to build a simple orchestration engine in a functional test like the following:
object Engine {
def orchestrate(apiSequence : Seq[Any]) {
val execUnitList = getExecutionUnits(apiSequence) // build a specific list
schedule(execUnitList) // call multiple APIs
}
In the methods called underneath (getExecutionUnits, and schedule), the pattern I've applied is one where I incrementally build a list (hence, not a val but a var), iterate over the list and call sepcific APIs and run some custom validation on each one.
I'm aware that an object in scala is sort of equivalent to a singleton (so there's only one instance of Engine, in my case). I'm wondering if this is an appropriate pattern if I'm expecting 100's of invocations of the orchestrate method concurrently. I'm not managing any other internal variables within the Engine object and I'm simply acting on the provided arguments in the method. Assuming that the schedule method can take up to 10 seconds, I'm worried about the behavior when it comes to concurrent access. If client1, client2 and client3 call this method at the same time, will 2 of the clients get queued up and be blocked my the current client being processed?
Is there a safer idiomatic way to handle the use-case? Do you recommend using actors to wrap up the "orchestrate" method to handle concurrent requests?
Edit: To clarify, it is absolutely essential the the 2 methods (getExecutionUnits and schedule) and called in sequence. Moreover, the schedule method in turn calls multiple APIs (anywhere between 1 to 10) and it is important that they too get executed in sequence. As of right now I have a simply for loop that tackles 1 Api at a time, waits for the response, then moves onto the next one if appropriate.

I'm not managing any other internal variables within the Engine object and I'm simply acting on the provided arguments in the method.
If you are using any vars in Engine at all, this won't work. However, from your description it seems like you don't: you have a local var in getExecutionUnits method and (possibly) a local var in schedule which is initialized with the return value of getExecutionUnits. This case should be fine.
If client1, client2 and client3 call this method at the same time, will 2 of the clients get queued up and be blocked my the current client being processed?
No, if you don't add any synchronization (and if Engine itself has no state, you shouldn't).
Do you recommend using actors to wrap up the "orchestrate" method to handle concurrent requests?
If you wrap it in one actor, then the clients will be blocked waiting while the engine is handling one request.

Related

How to handle simultaneous updates on a variable in Ver.tx?

I am new to Vert.x. I have one scenario in which I need to make a count for all incoming request into a verticle ‒ which is serving as a REST API.
If I just increment the counter for all request, then for simultaneous requests, the value won't be correct ‒ as it will be updating by all requests at same time. It will be same as multiple threads updating a variable simultaneously.
How to handle such scenario in Vert.x?
One solution would be to implement a verticle (and a handler) to do the counting/aggregation. Every time you receive a request, you would publish a message to that address (nothing really) and when the verticle receives it, do the math ‒ just add one. If you need the count value, you would need another handler for that. One thing to keep in mind is that you would need to instantiate only one of these ‒ if you have a cluster the problem complicates a little bit more.
But, why would you do any of that since Vert.x provides something out-of-the-box called Asynchronous counters. This locks though, but that would be one of the easiest ways to accomplish that task in a cluster.

ListQueuesSegmented vs ListQueues

I am using Azure Storage Queue Client to list all the queues that have been created. There are these two methods client.ListQueuesSegmented and client.ListQueues that are in the SDK. Both allow you to query using a prefix. ListQueuesSegmented uses a token which help you to query the next segment. I am trying to understand in what scenarios you would use one over the other.
ListQueuesSegmented returns the results to you in chunks... to iterate over the list of all queues, you make successive calls to ListQueuesSegmented and pass in the QueueContinuationToken from the prior QueueResultSegment return value (or null if this is the first call to ListQueuesSegmented).
ListQueues will return all the queues to you with one call... but that can be very expensive if you have many queues. Prefer the segmented method unless you know you'll only return a small number of queues.
You should also consider using the async version of these methods, to avoid blocking the calling thread while you wait for results to return.
Best of luck!

How should I model 'knowing' when an aggregate is finished being updated to trigger an event?

Aggregate B has calculations that need to be eventually consistent with aggregate A. Aggregate A can be mutated using eight methods and each method results in B needing to be updated. It seems an eventually consistent task, but the actual update time frame should be within seconds.
I don't want to rely on the application layer to 'remember' to trigger the update. (Jimmy Bogard says this as well.) What's the best way to model this?
Using a domain service with double dispatch is a pain:
The service will have to be a parameter on every method on A
Multiple mutation methods will usually be called in a row and I don't want to trigger an update in B each time a method is called.
Constructor injection is also a pain:
There are situations where A is not mutated, so being forced to instantiate and inject a domain service to watch for mutation that certainly won't happen feels wrong.
Again, multiple mutation methods will usually be called in a row and I don't want to trigger an update in B each time a method is called.
Domain events sound good but I'm not sure what that looks like. Each mutation method raises a domain event?
Again, multiple mutation methods will usually be called in a row and I don't want to trigger an update in B each time a method is called.
How do I model 'knowing' when A is finished being updated and knowing whether it has been updated so I can trigger B's update without relying on the application layer to call methods in a particular order each time?
Or is this really a repository-level or application-level concern, even though it seems to be a domain requirement?
Your number 3. is commonly used and a very straight-forward technique:
Raise a domain event AChangedType1, ..., AChangedTypeN on model A updates
Let a saga/process manager listen on AChangedTypeX and issue a corresponding UpdateBTypeX command.
It's loosely coupled (neither A nor B now about each other) and scales well (easy parallelization), and the relation between them is explicitly modeled in the long running process.
If you don't want to trigger an update to B on every change on A, then you can delay the update by some time before you send out the UpdateBTypeX command (as it is commonly done in network protocols, see, e.g., TCP's delayed acks.

React Flux dispatcher vs Node.js EventEmitter - scalable?

When you use Node's EventEmitter, you subscribe to a single event. Your callback is only executed when that specific event is fired up:
eventBus.on('some-event', function(data){
// data is specific to 'some-event'
});
In Flux, you register your store with the dispatcher, then your store gets called when every single event is dispatched. It is the job of the store to filter through every event it gets, and determine if the event is important to the store:
eventBus.register(function(data){
switch(data.type){
case 'some-event':
// now data is specific to 'some-event'
break;
}
});
In this video, the presenter says:
"Stores subscribe to actions. Actually, all stores receive all actions, and that's what keeps it scalable."
Question
Why and how is sending every action to every store [presumably] more scalable than only sending actions to specific stores?
The scalability referred to here is more about scaling the codebase than scaling in terms of how fast the software is. Data in flux systems is easy to trace because every store is registered to every action, and the actions define every app-wide event that can happen in the system. Each store can determine how it needs to update itself in response to each action, without the programmer needing to decide which stores to wire up to which actions, and in most cases, you can change or read the code for a store without needing to worrying about how it affects any other store.
At some point the programmer will need to register the store. The store is very specific to the data it'll receive from the event. How exactly is looking up the data inside the store better than registering for a specific event, and having the store always expect the data it needs/cares about?
The actions in the system represent the things that can happen in a system, along with the relevant data for that event. For example:
A user logged in; comes with user profile
A user added a comment; comes with comment data, item ID it was added to
A user updated a post; comes with the post data
So, you can think about actions as the database of things the stores can know about. Any time an action is dispatched, it's sent to each store. So, at any given time, you only need to think about your data mutations a single store + action at a time.
For instance, when a post is updated, you might have a PostStore that watches for the POST_UPDATED action, and when it sees it, it will update its internal state to store off the new post. This is completely separate from any other store which may also care about the POST_UPDATED event—any other programmer from any other team working on the app can make that decision separately, with the knowledge that they are able to hook into any action in the database of actions that may take place.
Another reason this is useful and scalable in terms of the codebase is inversion of control; each store decides what actions it cares about and how to respond to each action; all the data logic is centralized in that store. This is in contrast to a pattern like MVC, where a controller is explicitly set up to call mutation methods on models, and one or more other controllers may also be calling mutation methods on the same models at the same time (or different times); the data update logic is spread through the system, and understanding the data flow requires understanding each place the model might update.
Finally, another thing to keep in mind is that registering vs. not registering is sort of a matter of semantics; it's trivial to abstract away the fact that the store receives all actions. For example, in Fluxxor, the stores have a method called bindActions that binds specific actions to specific callbacks:
this.bindActions(
"FIRST_ACTION_TYPE", this.handleFirstActionType,
"OTHER_ACTION_TYPE", this.handleOtherActionType
);
Even though the store receives all actions, under the hood it looks up the action type in an internal map and calls the appropriate callback on the store.
Ive been asking myself the same question, and cant see technically how registering adds much, beyond simplification. I will pose my understanding of the system so that hopefully if i am wrong, i can be corrected.
TLDR; EventEmitter and Dispatcher serve similar purposes (pub/sub) but focus their efforts on different features. Specifically, the 'waitFor' functionality (which allows one event handler to ensure that a different one has already been called) is not available with EventEmitter. Dispatcher has focussed its efforts on the 'waitFor' feature.
The final result of the system is to communicate to the stores that an action has happened. Whether the store 'subscribes to all events, then filters' or 'subscribes a specific event' (filtering at the dispatcher). Should not affect the final result. Data is transferred in your application. (handler always only switches on event type and processes, eg. it doesn't want to operate on ALL events)
As you said "At some point the programmer will need to register the store.". It is just a question of fidelity of subscription. I don't think that a change in fidelity has any affect on 'inversion of control' for instance.
The added (killer) feature in facebook's Dispatcher is it's ability to 'waitFor' a different store, to handle the event first. The question is, does this feature require that each store has only one event handler?
Let's look at the process. When you dispatch an action on the Dispatcher, it (omitting some details):
iterates all registered subscribers (to the dispatcher)
calls the registered callback (one per stores)
the callback can call 'waitfor()', and pass a 'dispatchId'. This internally references the callback of registered by a different store. This is executed synchronously, causing the other store to receive the action and be updated first. This requires that the 'waitFor()' is called before your code which handles the action.
The callback called by 'waitFor' switches on action type to execute the correct code.
the callback can now run its code, knowing that its dependancies (other stores) have already been updated.
the callback switches on the action 'type' to execute the correct code.
This seems a very simple way to allow event dependancies.
Basically all callbacks are eventually called, but in a specific order. And then switch to only execute specific code. So, it is as if we only triggered a handler for the 'add-item' event on the each store, in the correct order.
If subscriptions where at a callback level (not 'store' level), would this still be possible? It would mean:
Each store would register multiple callbacks to specific events, keeping reference to their 'dispatchTokens' (same as currently)
Each callback would have its own 'dispatchToken'
The user would still 'waitFor' a specific callback, but be a specific handler for a specific store
The dispatcher would then only need to dispatch to callbacks of a specific action, in the same order
Possibly, the smart people at facebook have figured out that this would actually be less performant to add the complexity of individual callbacks, or possibly it is not a priority.

Creating Dependencies Within An NSOperation

I have a fairly involved download process I want to perform in a background thread. There are some natural dependencies between steps in this process. For example, I need to complete the downloads of both Table A and Table B before setting the relationships between them (I'm using Core Data).
I thought first of putting each dependent step in its own NSOperation, then creating a dependency between the two operations (i.e. download the two tables in one operation, then set the relationship between them in the next, dependent operation). However, each NSOperation requires it's own NSManagedContext, so this is no good. I don't want to save the background context until both tables have been downloaded and their relationships set.
I've therefore concluded this should all occur inside one NSOperation, and that I should use notifications or some other mechanism to call the dependent method when all the conditions for running it have been met.
I'm an iOS beginner, however, so before I venture down this path, I wouldn't mind advice on whether I've reached the right conclusion.
Given your validation requirements, I think it will be easiest inside of one operation, although this could turn into a bit of a hairball as far as code structure goes.
You'll essentially want to make two wire fetches to get the entire dataset you require, then combine the data and parse it at one time into Core Data.
If you're going to use the asynchronous API's this essentially means structuring a class that waits for both operations to complete and then launches another NSOperation or block which does the parse and relationship construction.
Imagine this order of events:
User performs some action (button tap, etc.)
Selector for that action fires two network requests
When both requests have finished (they both notify a common delegate) launch the parse operation
Might look something like this in code:
- (IBAction)someAction:(id)sender {
//fire both network requests
request1.delegate = aDelegate;
request2.delegate = aDelegate;
}
//later, inside the implementation of aDelegate
- (void)requestDidComplete... {
if (request1Finished && request2Finished) {
NSOperation *parse = //init with fetched data
//launch on queue etc.
}
}
There's two major pitfalls that this solution is prone to:
It keeps the entire data set around in memory until both requests are finished
You will have to constantly switch on the specific request that's calling your delegate (for error handling, success, etc.)
Basically, you're implementing operation dependencies on your own, although there might not be a good way around that because of the structure of NSURLConnection.

Resources