I want to use Akka in my application for multiprocessing.
The same block of code is executed by each actor, and the results are aggregated by the listener.
So my question is: will there be any synchronization problems in this case? If not, how do Akka actors handle this internally?
By default there should not be any synchronization problems, provided you strictly follow the actor approach. This means that actors should communicate only through messages that contain immutable objects, and that you should never expose an actor's internal state directly to the outside world. Internal state should be read or modified only in reaction to received messages.
Akka guarantees that an actor processes its messages one at a time, even though many actors share the threads of a dispatcher (an ExecutionContext). This means that each actor's private state is never accessed concurrently by its own message handling. Akka actors are designed so that accessing this internal state from the "outer world" is basically impossible (or at least very hard). After creating a new actor, you only get an intermediary reference to it (an ActorRef instance), not a reference to the actual Actor instance in memory. This is intentional on the part of the Akka developers: they made it hard to get the actual reference and access its properties directly, which would break the actor approach.
If, on the other hand, you pass a shared mutable object to an actor, you are back to all the hassle of locks and synchronization that you have when using plain threads.
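The following sketch makes the guarantee concrete in plain Java. It is not Akka's API. It is a hand-rolled "actor": a mailbox drained by a single thread. The names `PartialResult`, `Aggregator` and `Demo` are hypothetical. Each worker runs the same block of code and reports back only through an immutable message. Because only the mailbox thread ever touches the aggregator's mutable total, no locks are needed.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

// Immutable message: final fields, no setters, safe to hand between threads.
record PartialResult(int workerId, long value) {}

// Hand-rolled "actor" (illustration only, not Akka): one mailbox, one thread draining it.
final class Aggregator {
    private final BlockingQueue<PartialResult> mailbox = new LinkedBlockingQueue<>();
    private final CompletableFuture<Long> done = new CompletableFuture<>();
    private final int expected;
    private long total = 0;    // mutable state confined to the mailbox thread
    private int received = 0;

    private Aggregator(int expected) { this.expected = expected; }

    static Aggregator start(int expected) {
        Aggregator a = new Aggregator(expected);
        Thread t = new Thread(a::run, "aggregator");
        t.setDaemon(true);
        t.start();
        return a;
    }

    // The only way in from outside: enqueue a message (like ActorRef.tell).
    void tell(PartialResult msg) { mailbox.add(msg); }

    CompletableFuture<Long> result() { return done; }

    private void run() {
        try {
            while (received < expected) {
                PartialResult msg = mailbox.take(); // strictly one message at a time
                total += msg.value();
                received++;
            }
            done.complete(total);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            done.completeExceptionally(e);
        }
    }
}

final class Demo {
    // Every worker runs the same code on its own slice and shares nothing mutable.
    static long sumOfSquares(int n, int workers) {
        Aggregator agg = Aggregator.start(workers);
        for (int w = 0; w < workers; w++) {
            final int id = w;
            new Thread(() -> {
                long partial = 0;
                for (int i = id; i < n; i += workers) partial += (long) i * i;
                agg.tell(new PartialResult(id, partial));
            }).start();
        }
        return agg.result().join();
    }
}
```

In Akka itself the mailbox and the scheduling are handled for you. The point of the sketch is only this: state confined to one message-processing loop, and fed by immutable messages, needs no explicit synchronization.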
I was wondering what the reasoning is behind making messages immutable in Spring Integration.
Is it only because of thread safety in multi-threaded environments?
Performance? Don't you get a performance penalization when you have to create a new message each time you want to add something to an existing message?
Avoiding a range of bugs when passing by reference?
Just guessing here.
The simplest way to explain this comes from the original Java Immutable Objects idea:
Immutable objects are particularly useful in concurrent applications. Since they cannot change state, they cannot be corrupted by thread interference or observed in an inconsistent state.
Since we are talking about messaging here, we should always keep in mind the loose coupling principle: the producer (caller) and consumer (executor) know nothing about each other and communicate only via messages (events, commands, packages etc.). At the same time, the same message may have several consumers performing completely unrelated business logic. By keeping the message state immutable, one process cannot affect another. That may also be seen as part of the security between processes when we handle a message in isolation.
Spring Integration is really pure Java, so the usual concurrency and security restrictions apply here as well. You would be surprised if you distributed a message to different independent processes and saw modifications made by one process show up in another.
There is some information in the Reference Manual:
Therefore, when a Message instance is sent to multiple consumers (e.g. through a Publish Subscribe Channel), if one of those consumers needs to send a reply with a different payload type, it will need to create a new Message. As a result, the other consumers are not affected by those changes.
As you can see, this applies to the Message object itself and its MessageHeaders. The payload is entirely your responsibility. In the past I really did have problems adding and removing elements of an ArrayList payload in multi-threaded business logic.
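The following sketch illustrates both points in plain Java. It is not Spring's `Message` API: `SimpleMessage` and `PayloadDemo` are hypothetical names. "Changing" a header produces a new message and leaves the original untouched. A mutable `ArrayList` payload, however, is still shared by reference between the copies. Wrapping the payload with `List.copyOf` makes accidental writes fail fast.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified immutable message (not org.springframework.messaging.Message).
record SimpleMessage<T>(T payload, Map<String, Object> headers) {
    SimpleMessage {
        headers = Map.copyOf(headers); // headers can never change after construction
    }

    // "Modifying" a message means building a new one; the original stays as it was.
    SimpleMessage<T> withHeader(String key, Object value) {
        Map<String, Object> copy = new HashMap<>(headers);
        copy.put(key, value);
        return new SimpleMessage<>(payload, copy);
    }
}

final class PayloadDemo {
    // The message is immutable, but an ArrayList payload is still shared by reference.
    static boolean sharedPayloadLeaks() {
        List<String> items = new ArrayList<>(List.of("a"));
        Map<String, Object> headers = Map.of("id", 1);
        SimpleMessage<List<String>> msg = new SimpleMessage<>(items, headers);
        SimpleMessage<List<String>> forConsumerB = msg.withHeader("replyType", "B");
        msg.payload().add("added-by-consumer-A");                     // consumer A mutates...
        return forConsumerB.payload().contains("added-by-consumer-A"); // ...and B sees it
    }

    // An unmodifiable payload turns that silent bug into an immediate exception.
    static boolean immutablePayloadRejectsWrites() {
        List<String> items = new ArrayList<>(List.of("a"));
        Map<String, Object> headers = Map.of();
        SimpleMessage<List<String>> msg = new SimpleMessage<>(List.copyOf(items), headers);
        try {
            msg.payload().add("x");
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }
}
```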
Anyway, the Framework suggests a compromise: MutableMessage, MutableMessageHeaders and MutableMessageBuilder. You can also globally replace the MessageBuilderFactory that the Framework uses internally with MutableMessageBuilderFactory. To do so, just register such a bean under the bean name IntegrationUtils.INTEGRATION_MESSAGE_BUILDER_FACTORY_BEAN_NAME:
@Bean(name = IntegrationUtils.INTEGRATION_MESSAGE_BUILDER_FACTORY_BEAN_NAME)
public static MessageBuilderFactory mutableMessageBuilderFactory() {
    return new MutableMessageBuilderFactory();
}
All messages in your integration flows will then be mutable, while still supplying the same id and timestamp headers.
In the DDD literature, the returning domain event pattern is described as a way to manage domain events. Conceptually, the aggregate root keeps a list of domain events, populated when you perform operations on it.
When the operation on the aggregate root is done and the DB transaction has been completed at the application service layer, the application service iterates over the domain events, calling an Event Dispatcher to handle those messages.
My question concerns how we should handle transactions at this point. Should the Event Dispatcher be responsible for managing a new transaction for each event it processes? Or should the application service manage the transaction inside the domain event iteration where it calls the Event Dispatcher? When the dispatcher uses an infrastructure mechanism like RabbitMQ, the question is irrelevant, but when the domain events are handled in-process, it matters.
A sub-question related to my question: what is your opinion about using ORM hooks (i.e.: IPostInsertEventListener, IPostDeleteEventListener, IPostUpdateEventListener of NHibernate) to kick off the domain event iteration on the aggregate root instead of doing it manually in the application service? Does it add too much coupling? Or is it better because it does not require the same code to be written for each use case (the domain event loop over the aggregate, and potentially the new transaction creation if it is not inside the dispatcher)?
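The following is a minimal Java sketch of the flow just described. It uses hypothetical names (`Order`, `OrderPlaced`, `FakeTransactions`, `PlaceOrderService`) and a toy transaction wrapper in place of a real ORM session. The aggregate only records events. The application service commits first and dispatches afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

// All names are hypothetical; a real app would use its ORM's unit of work.
interface DomainEvent {}
record OrderPlaced(String orderId) implements DomainEvent {}

// Aggregate root that records events as a side effect of its operations.
final class Order {
    private final String id;
    private final List<DomainEvent> pendingEvents = new ArrayList<>();
    private boolean placed;

    Order(String id) { this.id = id; }

    void place() {
        if (placed) throw new IllegalStateException("already placed");
        placed = true;
        pendingEvents.add(new OrderPlaced(id));
    }

    // Hand the events over and clear them so each is dispatched only once.
    List<DomainEvent> pullEvents() {
        List<DomainEvent> events = List.copyOf(pendingEvents);
        pendingEvents.clear();
        return events;
    }
}

// Toy transaction boundary that logs begin/commit so the ordering is visible.
final class FakeTransactions {
    final List<String> log = new ArrayList<>();

    <T> T inTransaction(String name, Supplier<T> work) {
        log.add("begin " + name);
        T result = work.get();
        log.add("commit " + name);
        return result;
    }
}

final class PlaceOrderService {
    private final FakeTransactions tx;
    private final Consumer<DomainEvent> dispatcher;

    PlaceOrderService(FakeTransactions tx, Consumer<DomainEvent> dispatcher) {
        this.tx = tx;
        this.dispatcher = dispatcher;
    }

    void placeOrder(String orderId) {
        // 1. Modify exactly one aggregate inside one transaction.
        List<DomainEvent> events = tx.inTransaction("place " + orderId, () -> {
            Order order = new Order(orderId);
            order.place();
            return order.pullEvents();
        });
        // 2. Only after the commit, hand the collected events to the dispatcher.
        events.forEach(dispatcher);
    }
}
```

The sketch deliberately leaves open whether each handler then opens its own transaction, which is exactly what this question asks.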
My question concerns how we should handle transactions at this point. Should the Event Dispatcher be responsible for managing a new transaction for each event it processes? Or should the application service manage the transaction inside the domain event iteration where it calls the Event Dispatcher?
What you are asking here is really a specialized version of this question: should we ever update more than one aggregate in a single transaction?
You can find a lot of assertions that the answer is "no". For instance, Vaughn Vernon (2014):
A properly designed aggregate is one that can be modified in any way required by the business with its invariants completely consistent within a single transaction. And a properly designed bounded context modifies only one aggregate instance per transaction in all cases.
Greg Young tends to go further, pointing out that adhering to this rule allows you to partition your data by aggregate id. In other words, the aggregate boundaries are an explicit expression of how your data can be organized.
So your best bet is to try to arrange your more complicated orchestrations such that each aggregate is updated in its own transaction.
My question is about how we handle the transaction for the event that is dispatched after the initial aggregate has been altered and the initial transaction has completed. The domain event must be handled, and handling it could require altering another aggregate.
Right, so if we're going to alter another aggregate, then there should (per the advice above) be a new transaction for the change to the aggregate. In other words, it's not the routing of the domain event that determines if we need another transaction -- the choice of event handler determines whether or not we need another transaction.
Just because event handling happens in-process doesn't mean the originating application service has to orchestrate all transactions happening as a consequence of the events.
If we take in-process event handling via the Observable pattern for instance, each Observer will be responsible for creating its own transaction if it needs one.
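The following sketch shows the Observable approach in plain Java, with hypothetical names (`EventBus`, `TransactionRunner`, `CreateLoyaltyAccount`, `SendWelcomeMail`). The bus only routes the event. Each observer decides whether it needs a transaction. If it does, it opens a new one around the single aggregate it changes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical names; TransactionRunner stands in for your ORM's unit of work.
record CustomerRegistered(String customerId) {}

interface TransactionRunner {
    void inNewTransaction(String name, Runnable work);
}

// In-process Observable: the publisher routes events and owns no transactions.
final class EventBus {
    interface Observer { void on(CustomerRegistered event); }

    private final List<Observer> observers = new CopyOnWriteArrayList<>();

    void subscribe(Observer o) { observers.add(o); }

    void publish(CustomerRegistered e) { observers.forEach(o -> o.on(e)); }
}

// This handler changes another aggregate, so it opens its own transaction.
final class CreateLoyaltyAccount implements EventBus.Observer {
    private final TransactionRunner tx;

    CreateLoyaltyAccount(TransactionRunner tx) { this.tx = tx; }

    @Override
    public void on(CustomerRegistered e) {
        tx.inNewTransaction("loyalty " + e.customerId(), () -> {
            // load and modify the (hypothetical) LoyaltyAccount aggregate here
        });
    }
}

// This handler only records a notification, so it needs no transaction at all.
final class SendWelcomeMail implements EventBus.Observer {
    final List<String> sent = new ArrayList<>();

    @Override
    public void on(CustomerRegistered e) { sent.add("welcome " + e.customerId()); }
}
```

The publisher holds no transaction. A failure inside one observer's transaction therefore cannot roll back the original aggregate, which was already committed. That is the eventual-consistency trade-off of updating one aggregate per transaction.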
What is your opinion about using ORM hooks (i.e.: IPostInsertEventListener, IPostDeleteEventListener, IPostUpdateEventListener of NHibernate) to kick in the Domain Events iteration on the aggregate root instead of manually doing it in the application service?
Wouldn't this have to happen during the original DB transaction, effectively turning everything into immediate consistency (if events are handled in process)?
I'm wondering what the "official" name is for the design pattern where a single thread actually handles some resource (database, file, communication interface, network connection, log, ...) and other threads that wish to do something with that resource have to pass a message to this thread and, optionally, wait for a notification about completion.
I've found some articles that refer to this method as "Central Controller", but googling doesn't give much about that particular phrase.
On the other hand, this is not exactly a "message pump" or "event queue", because it's not related to the GUI or to the operating system passing messages to the application.
It's also not "work queue" or "thread pool", as this single thread is dedicated only to this single activity (managing single resource), not meant to be used to do just about anything that is thrown at it.
For example, assume that there's a special communication interface managed by one thread (say Modbus, although this really doesn't matter). This interface is completely hidden inside an object with its thread and a message queue. This object has member functions that allow "reading" or "writing" data over that communication interface, and these functions can be used by multiple threads without any special synchronization. That's because internally these functions convert their arguments into a message/request and pass it via the queue to the handler thread, which simply serves these requests one at a time.
This design pattern may be used instead of explicit synchronization with a mutex protecting the shared resource, which would have to be locked/unlocked by each thread that wishes to interact with that resource.
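The following is a minimal Java sketch of the design described in the question. One thread owns the resource. Every other thread submits a request to that thread's queue and gets back a `CompletableFuture`, which it can wait on or ignore. `RegisterBank` and `ResourceOwner` are hypothetical. `RegisterBank` stands in for the Modbus link. Nothing here is a library API beyond `java.util.concurrent`.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// Hypothetical resource: NOT thread-safe, and it never needs to be.
final class RegisterBank {
    private final int[] registers = new int[16];
    int read(int address) { return registers[address]; }
    void write(int address, int value) { registers[address] = value; }
}

final class ResourceOwner implements AutoCloseable {
    private final BlockingQueue<Runnable> requests = new LinkedBlockingQueue<>();
    private final RegisterBank resource = new RegisterBank(); // touched only by `worker`
    private final Thread worker = new Thread(this::serve, "resource-owner");
    private volatile boolean running = true;

    ResourceOwner() { worker.start(); }

    // Callable from any thread: wrap the operation in a request and return a future.
    <T> CompletableFuture<T> submit(Function<RegisterBank, T> op) {
        CompletableFuture<T> reply = new CompletableFuture<>();
        requests.add(() -> {
            try {
                reply.complete(op.apply(resource));
            } catch (RuntimeException e) {
                reply.completeExceptionally(e);
            }
        });
        return reply;
    }

    CompletableFuture<Integer> read(int address) { return submit(r -> r.read(address)); }

    CompletableFuture<Void> write(int address, int value) {
        return submit(r -> { r.write(address, value); return null; });
    }

    // Serve requests strictly one at a time, in arrival order.
    private void serve() {
        try {
            while (running || !requests.isEmpty()) {
                requests.take().run();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Requests submitted after close() may never be served.
    @Override
    public void close() {
        running = false;
        requests.add(() -> {}); // wake the worker if it is blocked in take()
    }
}
```

Callers never lock anything. Mutual exclusion comes from the single serving thread, and the returned futures provide the optional completion notification.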
The best pattern that fits here may be the Broker pattern:
The Broker architectural pattern can be used to structure distributed software systems with decoupled components that interact by remote service invocations. A broker component is responsible for coordinating communication, such as forwarding requests, as well as for transmitting results and exceptions.
I would simply call it asynchronous IO, or more catchy: non-blocking IO.
After all, does it really matter what that "single thread side" is doing in detail? Does it make a difference whether you deal "async" with a database or with some other remote server?
The key attribute is: your code is not waiting for answers, but expecting information to flow in "later".
Spray routing is based on the Akka actor system. In all the sample code I remember, routing is done "fast" and the actual work is handed off to other actors, unless it needs to be done synchronously to produce a response.
I need to validate a POST input, which may take some time (hundreds of milliseconds). Will the HTTP server be busy during this time with regard to other incoming requests (such as normal GETs)?
In other words, what's the Spray routing multithreading model, really?
I can spawn the validation to another actor, but in such a case the REST API response will no longer be able to report if there is an error with the incoming contents. What's the optimum way to handle this?
1) The listener parameter of Http.Bind.apply() can be an actor pool. In this case you will have several identical actors to run your route with several HTTP requests simultaneously.
2) Usually you should not make blocking calls or run heavy tasks inside an actor, including a Spray routing actor. In general it is better to use another pool of actors for such tasks. You have three options. You can use the ask pattern from your Spray route. You can create a temporary per-request actor (don't forget to call setReceiveTimeout and handle timeouts in it) that sends a message to another actor, waits for the answer, and stops after answering the HTTP request. Or you can create a simple Future. In the last two cases, give Spray's request context (ctx) to that actor when you create it, or close over it in the Future. Then let the actor or Future do the work on its own thread, completing the request context with the proper HTTP status and entity when all the work is done. However, avoid passing the Spray request context to any actor as a message or part of a message, because it carries a heavy context that includes non-serializable parts.
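Spray's own API is not shown here. This is a plain-Java sketch of the same idea, assuming your framework lets you complete a request asynchronously, as Spray does with `ctx`. The slow validation runs on a separate `validationPool` and the routing thread returns immediately. The client still receives a 400 with the validation error, which addresses the concern about losing error reporting. `HttpReply`, `ValidatingEndpoint`, the pool size and the 2-second timeout are all hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the response you would complete the request context with.
record HttpReply(int status, String body) {}

final class ValidatingEndpoint implements AutoCloseable {
    // Separate pool for slow work, so the routing thread(s) stay free for other requests.
    private final ExecutorService validationPool = Executors.newFixedThreadPool(4);

    // Called on the routing thread: returns at once with a future reply.
    CompletableFuture<HttpReply> handlePost(String body) {
        return CompletableFuture
                .supplyAsync(() -> validate(body), validationPool) // off the routing thread
                .thenApply(error -> error == null
                        ? new HttpReply(201, "created")
                        : new HttpReply(400, error))                // the error still reaches the client
                .exceptionally(t -> new HttpReply(500, "validation failed unexpectedly"))
                // Like setReceiveTimeout on a per-request actor: never leave the client hanging.
                .completeOnTimeout(new HttpReply(503, "validation timed out"), 2, TimeUnit.SECONDS);
    }

    // Placeholder for the slow validation; returns an error message, or null if valid.
    private static String validate(String body) {
        try {
            Thread.sleep(200); // simulate hundreds of milliseconds of work
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return body.isBlank() ? "body must not be empty" : null;
    }

    @Override
    public void close() { validationPool.shutdown(); }
}
```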
I have a design question for a multi-threaded windows service that processes messages from multiple clients.
The rules are:
Each message is to process something for an entity (with a unique id), and messages can differ, i.e. DoA, DoB, DoC etc. The entity id is in the payload of the message.
The processing may take some time (up to a few seconds).
Messages must be processed in the order they arrive for each entity (with same id).
Messages can however be processed for another entity concurrently (i.e as long as they are not the same entity id)
The number of concurrent processing slots is configurable (generally 8).
Messages can not be lost. If there is an error in processing a message then that message and all other messages for the same entity must be stored for future processing manually.
The messages arrive in a transactional MSMQ queue.
How would you design the service? I have a working solution but would like to know how others would tackle this.
The first thing to do is step back and think about how critical performance is for this application. Do you really need to process messages concurrently? Is it mission critical, or do you just think you need it? Have you run a profiler on your service to find the real bottlenecks of the process and optimized those?
The reason I ask is that you mention you want 8 concurrent processes. However, if you make this app single-threaded, it will greatly reduce the complexity and the development and testing time... And since you only want 8, it almost seems not worth it...
Secondly, since only messages for different entities may be processed concurrently, how often will you really get concurrent requests from your clients to process the same entity? Is it worth adding so many layers of complexity for a use case that might not come up very often?
I would KISS. I'd use MSMQ via WCF and keep my WCF service as a singleton. Now you have the power and ordered reliability of MSMQ, and you are meeting your actual requirements. Then I'd test it under high load with realistic data, and run a profiler to find bottlenecks if I found it was too slow. Only then would I go to all the extra trouble of building a much more complex app to manage concurrency for only specific use cases...
One design to consider is a central 'gatekeeper' or 'service bus' service that receives all the messages from the clients and then passes them down to the actual worker service(s). When it gets a request, it checks whether one of its workers is already processing a message for the same entity. If so, it sends the new message to that same worker. This way, messages for a given entity are never processed concurrently, and you get seamless scalability... However, I would only do this if I absolutely had to and it was proven necessary through profiling and testing, not because 'we think we need it' (see the YAGNI principle :))
My approach would be the following:
Create a threadpool with your configurable number of threads.
Keep map of entity ids and associate each id with a queue of messages.
When you receive a message place it in the queue of the corresponding entity id.
Each thread will only look at the entity id dedicated to it (e.g. make a class that is initialized like Service(EntityID id)).
Let the thread only process messages from the queue of its dedicated entity id.
Once all the messages are processed for the given entity id remove the id from the map and exit the loop of the thread.
If there is room in the threadpool, then add a new thread to deal with the next available entity id.
You'll have to manage the messages that can't be processed at the time, including the situations where the message processing fails. Create a backlog of messages, etc.
If you have access to a concurrent map (a lock-free/wait-free map), then you can have multiple readers and writers on the map without locking or waiting. If you can't get a concurrent map, then all the contention will be on the map: whenever you add messages to a queue in the map or add new entity ids, you have to lock it. The best thing to do is to wrap the map in a structure that offers methods for reading and writing with appropriate locking.
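The following is a Java sketch of the map-of-queues approach described above, assuming a hypothetical `EntityMessage` record and a handler passed in as a `Consumer`. A fixed pool bounds concurrency (the configurable 8 in the question). At most one drain task exists per entity at a time, so per-entity order is preserved. A failing message is parked together with everything queued behind it for that entity. The sketch uses a single lock around two plain maps, which is the simpler of the two options discussed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical message type: the entity id comes from the payload.
record EntityMessage(String entityId, String action) {}

final class EntityOrderedProcessor implements AutoCloseable {
    private final ExecutorService pool;
    private final Consumer<EntityMessage> handler;
    private final Object lock = new Object();
    // A key is present while a drain task for that entity is queued or running.
    private final Map<String, ArrayDeque<EntityMessage>> queues = new HashMap<>();
    // Entities whose processing failed: their messages wait here for manual handling.
    private final Map<String, List<EntityMessage>> parked = new HashMap<>();

    EntityOrderedProcessor(int maxConcurrency, Consumer<EntityMessage> handler) {
        this.pool = Executors.newFixedThreadPool(maxConcurrency);
        this.handler = handler;
    }

    void submit(EntityMessage m) {
        synchronized (lock) {
            List<EntityMessage> stuck = parked.get(m.entityId());
            if (stuck != null) { stuck.add(m); return; }  // entity blocked: keep order, don't process
            ArrayDeque<EntityMessage> q = queues.get(m.entityId());
            if (q != null) { q.add(m); return; }          // the active drain task will pick it up
            q = new ArrayDeque<>();
            q.add(m);
            queues.put(m.entityId(), q);
        }
        pool.execute(() -> drain(m.entityId()));          // first message for an idle entity
    }

    private void drain(String entityId) {
        while (true) {
            EntityMessage next;
            synchronized (lock) {
                next = queues.get(entityId).peek();
                if (next == null) { queues.remove(entityId); return; } // done: free the thread
            }
            try {
                handler.accept(next);                     // slow work runs outside the lock
            } catch (RuntimeException e) {
                synchronized (lock) {                     // park the failed message and all later ones
                    parked.put(entityId, new ArrayList<>(queues.remove(entityId)));
                }
                return;
            }
            synchronized (lock) { queues.get(entityId).poll(); }
        }
    }

    List<EntityMessage> parkedFor(String entityId) {
        synchronized (lock) { return List.copyOf(parked.getOrDefault(entityId, List.of())); }
    }

    @Override
    public void close() {
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) pool.shutdownNow();
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }
}
```

A busy entity keeps its pool thread until its queue is empty, which matches the steps above. A real service would also acknowledge the MSMQ transaction only after the handler succeeds, which this sketch does not model.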
I don't think you will see any significant performance impact from locking, but if you do start seeing one I would suggest that you create your own lock-free hash map: http://www.azulsystems.com/events/javaone_2007/2007_LockFreeHash.pdf
Implementing this system will not be a rudimentary task, so take my comments as a general guideline... it's up to the engineer to implement the ideas that apply.
While my requirements were different from yours, I did have to deal with the concurrent processing from a message queue. My solution was to have a service which would look at each incoming message and hand it off to an agent process to consume. The service has a setting which controls how many agents it can have running.
I would look at having n threads, each reading from its own thread-safe queue. I would then hash the EntityId to decide which queue to put an incoming message on.
Sometimes some threads will have nothing to do, but is this a problem if you have a few more threads than CPUs?
(Also, you may wish to group entities by type into the queues so as to reduce the number of locking conflicts in your database.)
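The following sketches the hashing approach, assuming string entity ids. `n` single-thread executors act as the queues ("lanes"). `Math.floorMod(entityId.hashCode(), n)` picks the lane, so all messages for one entity land on the same thread in arrival order. `PartitionedDispatcher` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class PartitionedDispatcher implements AutoCloseable {
    // Each lane is one thread with its own FIFO queue.
    private final List<ExecutorService> lanes = new ArrayList<>();

    PartitionedDispatcher(int n) {
        for (int i = 0; i < n; i++) lanes.add(Executors.newSingleThreadExecutor());
    }

    // Same entity id -> same lane -> same thread, so per-entity order is preserved.
    int laneFor(String entityId) {
        return Math.floorMod(entityId.hashCode(), lanes.size());
    }

    void dispatch(String entityId, Runnable work) {
        lanes.get(laneFor(entityId)).execute(work);
    }

    @Override
    public void close() {
        for (ExecutorService lane : lanes) lane.shutdown();
        try {
            for (ExecutorService lane : lanes) lane.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The trade-off is head-of-line blocking. Two unrelated entities that hash to the same lane wait for each other, while another lane may sit idle.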