Immutable message in Spring Integration - spring-integration

I was wondering what is the reasoning behind making messages immutable in Spring Integration.
Is it only because of thread-safety in multi threaded evnironments?
Performance? Don't you get a performance penalization when you have to create a new message each time you want to add something to an existing message?
Avoiding a range of bugs when passing by reference?
Just guessing here.

The simplest way to explain this comes from the original Java Immutable Objects idea:
Immutable objects are particularly useful in concurrent applications. Since they cannot change state, they cannot be corrupted by thread interference or observed in an inconsistent state.
Since we talk here about Messaging we should always keep in mind the Loose coupling principle where the producer (caller) and consumer (executor) know nothing about each other and they communicate only via messages (events, commands, packages etc.). At the same time the same message may have several consumers to perform absolutely not related business logics. So, supporting immutable state for the active object we don't impact one process from another. That's might be also as a part of the security between processes when we execute a message in isolation.
The Spring Integration is really pure Java, so any concurrency and security restrictions just simply applied here as well and you would be surprised distributing a message to different independent processes and see modifications from one process in the other.
There is some information in the Reference Manual:
Therefore, when a Message instance is sent to multiple consumers (e.g. through a Publish Subscribe Channel), if one of those consumers needs to send a reply with a different payload type, it will need to create a new Message. As a result, the other consumers are not affected by those changes.
As you see it is applied for Message object per se and its MessageHeaders. The payload is fully your responsibility and I really had in past some problems adding and removing elements to the ArrayList payload in multi-threaded business logic.
Anyway the Framework suggest a compromise: MutableMessage, MutableMessageHeaders and MutableMessageBuilder. You also can globally override the MessageBuilder used in the Framework internally to the MutableMessageBuilderFactory. For this purpose you just need to register such a bean with the bean name IntegrationUtils.INTEGRATION_MESSAGE_BUILDER_FACTORY_BEAN_NAME:
#Bean(name = IntegrationUtils.INTEGRATION_MESSAGE_BUILDER_FACTORY_BEAN_NAME)
public static MessageBuilderFactory mutableMessageBuilderFactory() {
return new MutableMessageBuilderFactory();
}
And all messages in your integration flows will be mutable and supply the same id and timestamp headers.

Related

Is there an inbuilt or canonical way in Rust to ‘publish’ state?

Is there any canonical way in Rust to publish a frequently updating ‘state’ such that any number of consumers can read it without providing access to the object itself?
My use case is that I have a stream of information coming in via web socket and wish to have aggregated metrics available to consume by other threads. One could do this externally with something like Kafka, and I could probably roll my own internal solution but wondering if there is any other method?
An alternative which I’ve used in Go is to have consumers register themselves with the producer and each receive a channel, with the producer simply publishing to each channel separately. There will generally be a low number of consumers so this may well work, but wondering if there’s anything better.
It sounds like you want a "broadcast channel".
If you're using async, the popular tokio crate provides an implementation in their sync::broadcast module:
A multi-producer, multi-consumer broadcast queue. Each sent value is seen by all consumers.
A Sender is used to broadcast values to all connected Receiver values. Sender handles are clone-able, allowing concurrent send and receive actions. [...]
[...]
New Receiver handles are created by calling Sender::subscribe. The returned Receiver will receive values sent after the call to subscribe.
If that doesn't quite suit your fancy, there are other crates that provide similar types that can be found by searching for "broadcast" on crates.io.

What is the name of multithreading design pattern that uses asynchronous requests instead of synchronization with mutexes?

I'm wondering what is an "official" name for the design pattern where you have a single thread that actually handles some resource (database, file, communication interface, network connection, log, ...) and other threads that wish to do something with that resource have to pass a message to this thread and - optionally - wait for a notification about completion?
I've found some articles that refer to this method as "Central Controller", but googling doesn't give much about that particular phrase.
One the other hand this is not exactly a "message pump" or "event queue", because it's not related to GUI or the operating system passing some messages to the application.
It's also not "work queue" or "thread pool", as this single thread is dedicated only to this single activity (managing single resource), not meant to be used to do just about anything that is thrown at it.
For example assume that there's a special communication interface managed by one thread (for example let that be Modbus, but this really doesn't matter). This interface is completely hidden inside an object with it's thread and a message queue. This object has member functions that allow to "read" or "write" data using that communication interface, and these functions can be used by multiple threads without any special synchronization. That's because internally the code of these function converts the arguments to a message/request and passes that via the queue to the handler thread, which just serves these requests one at a time.
This design pattern may be used instead of explicit synchronization with a mutex protecting the shared resource, which would have to be locked/unlocked by each thread that wishes to interact with that resource.
The best pattern that fits here may be the Broker pattern:
The Broker architectural pattern can be used to structure distributed
software systems with decoupled components that interact by remote
service invocations. A broker component Is responsible for
coordinating communication, such as forwarding requests. as well as
for transmitting results and exceptions.
I would simply call it asynchronous IO, or more catchy: non-blocking IO.
As: does it really matter what that "single thread side" is doing in detail? Does it make a difference if you deal "async" with a data base; or some other remote server?
They key attribute is: your code is not waiting for answers; but expecting information to flow in "later".

Service Fabric actors that receive events from other actors

I'm trying to model a news post that contains information about the user that posted it. I believe the best way is to send user summary information along with the message to create a news post, but I'm a little confused how to update that summary information if the underlying user information changes. Right now I have the following NewsPostActor and UserActor
public interface INewsPostActor : IActor
{
Task SetInfoAndCommitAsync(NewsPostSummary summary, UserSummary postedBy);
Task AddCommentAsync(string content, UserSummary, postedBy);
}
public interface IUserActor : IActor, IActorEventPublisher<IUserActorEvents>
{
Task UpdateAsync(UserSummary summary);
}
public interface IUserActorEvents : IActorEvents
{
void UserInfoChanged();
}
Where I'm getting stuck is how to have the INewsPostActor implementation subscribe to events published by IUserActor. I've seen the SubscribeAsync method in the sample code at https://github.com/Azure/servicefabric-samples/blob/master/samples/Actors/VS2015/VoiceMailBoxAdvanced/VoicemailBoxAdvanced.Client/Program.cs#L45 but is it appropriate to use this inside the NewsPostActor implementation? Will that keep an actor alive for any reason?
Additionally, I have the ability to add comments to news posts, so should the NewsPostActor also keep a subscription to each IUserActor for each unique user who comments?
Events may not be what you want to be using for this. From the documentation on events (https://azure.microsoft.com/en-gb/documentation/articles/service-fabric-reliable-actors-events/)
Actor events provide a way to send best effort notifications from the
Actor to the clients. Actor events are designed for Actor-Client
communication and should NOT be used for Actor-to-Actor communication.
Worth considering notifying the relevant actors directly or have an actor/service that will manage this communication.
Service Fabric Actors do not yet support a Publish/Subscribe architecture. (see Azure Feedback topic for current status.)
As already answered by charisk, Actor-Events are also not the way to go because they do not have any delivery guarantees.
This means, the UserActor has to initiate a request when a name changes. I can think of multiple options:
From within IUserAccount.ChangeNameAsync() you can send requests directly to all NewsPostActors (assuming the UserAccount holds a list of his posts). However, this would introduce additional latency since the client has to wait until all posts have been updated.
You can send the requests asynchronously. An easy way to do this would be to set a "NameChanged"-property on your Actor state to true within ChangeNameAsync() and have a Timer that regularly checks this property. If it is true, it sends requests to all NewsPostActors and sets the property to false afterwards. This would be an improvement to the previous version, however it still implies a very strong connection between UserAccounts and NewsPosts.
A more scalable solution would be to introduce the "Message Router"-pattern. You can read more about this pattern in Vaughn Vernon's excellent book "Reactive Messaging Patterns with the Actor Model". This way you can basically setup your own Pub/Sub model by sending a "NameChanged"-Message to your Router. NewsPostActors can - depending on your scalability needs - subscribe to that message either directly or through some indirection (maybe a NewsPostCoordinator). And also depending on your scalability needs, the router can forward the messages either directly or asynchronously (by storing it in a queue first).

synchronisation on a block of code using akka

I want to use akka in my application for multi processing.
So, the same block of code gets executed by each actor and the results will be aggregated by the listener.
so, my question is will there be any synchronisation problem in this case. If not how is it being handled by akka actors internally.
By default there should not be any synchronization problems - if you strictly respect the actor approach. This means that actors should only communicate using messages that contain immutable objects - and that you should never expose internal state of an actor to the outer world directly. Make the internal state mutable/readable solely by reacting to received messages.
Each actor is executed in its own ExecutionContext. This means that each actor has its own private state. Akka actors are designed in a way that accessing this internal state from the "outer world" is basically not possible (or made very hard) since after creating a new Actor, you only have an intermediary reference to an Actor (an ActorRef instance), not an reference to the actual Actor instance in the memory. It is intention of the Akka developers to do it that way: it is made hard for developers to get the the actual reference and access its properties directly - which would break the Actor approach.
If - on the other hand - you pass an shared mutable object to an actor, you will have all the hassle with locks and synchronization as you have for instance with using Threads.

How to design a service that processes messages arriving in a queue

I have a design question for a multi-threaded windows service that processes messages from multiple clients.
The rules are
Each message is to process something for an entity (with a unique id) and can be different i.e DoA, DoB, DoC etc. Entity id is in the payload of the message.
The processing may take some time (up to few seconds).
Messages must be processed in the order they arrive for each entity (with same id).
Messages can however be processed for another entity concurrently (i.e as long as they are not the same entity id)
The no of concurrent processing is configurable (generally 8)
Messages can not be lost. If there is an error in processing a message then that message and all other messages for the same entity must be stored for future processing manually.
The messages arrive in a transactional MSMQ queue.
How would you design the service. I have a working solution but would like to know how others would tackle this.
First thing you do is step back, and think about how critical is performance for this application. Do you really need to proccess messages concurrently? Is it mission critical? Or do you just think that you need it? Have you run a profiler on your service to find the real bottlenecks of the procces and optimized those?
The reason I ask, is be cause you mention you want 8 concurrent procceses - however, if you make this app single threaded, it will greatly reduce the complexity & developement & testing time... And since you only want 8, it almost seems not worth it...
Secondly, since you can only proccess concurrent messages on the same entity - how often will you really get concurrent requests from your client to procces the same entity? Is it worth adding so many layers of complexity for a use case that might not come up very often?
I would KISS. I'd use MSMQ via WCF, and keep my WCF service as a singleton. Now you have the power, ordered reliability of MSMQ and you are now meeting your actual requirements. Then I'd test it at high load with realistic data, and run a profiler to find bottlenecks if i found it was too slow. Only then would I go through all the extra trouble of building a much more complex app to manage concurrency for only specific use cases...
One design to consider is creating a central 'gate keeper' or 'service bus' service who receives all the messages from the clients, and then passes these messages down to the actual worker service(s). When he gets a request, he then finds if another one of his clients are already proccessing a message for the same entity - if so, he sends it to that same service he sent the other message to. This way you can proccess the same messages for a given entity concurrently and nothing more... And you have ease of seamless scalability... However, I would only do this if I absolutely had to and it was proved out via profiling and testing, and not because 'we think we needed it' (see YAGNI principal :))
My approach would be the following:
Create a threadpool with your configurable number of threads.
Keep map of entity ids and associate each id with a queue of messages.
When you receive a message place it in the queue of the corresponding entity id.
Each thread will only look at the entity id dedicated to it (e.g. make a class that is initialized as such Service(EntityID id)).
Let the thread only process messages from the queue of its dedicated entity id.
Once all the messages are processed for the given entity id remove the id from the map and exit the loop of the thread.
If there is room in the threadpool, then add a new thread to deal with the next available entity id.
You'll have to manage the messages that can't be processed at the time, including the situations where the message processing fails. Create a backlog of messages, etc.
If you have access to a concurrent map (a lock-free/wait-free map), then you can have multiple readers and writers to the map without the need of locking or waiting. If you can't get a concurrent map, then all the contingency will be on the map: whenever you add messages to a queue in the map or you add new entity id's you have to lock it. The best thing to do is wrap the map in a structure that offers methods for reading and writing with appropriate locking.
I don't think you will see any significant performance impact from locking, but if you do start seeing one I would suggest that you create your own lock-free hash map: http://www.azulsystems.com/events/javaone_2007/2007_LockFreeHash.pdf
Implementing this system will not be a rudimentary task, so take my comments as a general guideline... it's up to the engineer to implement the ideas that apply.
While my requirements were different from yours, I did have to deal with the concurrent processing from a message queue. My solution was to have a service which would look at each incoming message and hand it off to an agent process to consume. The service has a setting which controls how many agents it can have running.
I would look at having n thread each that read from a single thread-safe queue. I would then hash the EntityId to decide witch queue on put an incomming message on.
Sometimes, some threads will have nothing to do, but is this a problem if you have a few more threads then CPUs?
(Also you may wish to group entites by type into the queues so as to reduce the number of locking conflits in your database.)

Resources