I have a system which may generate certain events in the lifecycle of a transaction. On every event I need to update a row in a DB and also send out a UI event over a websocket.
One option I have is to implement the event processing (DB and UI) in actors thus avoiding any locking issues - also I can afford minor delays so handling this sequentially will greatly simplify matters.
What are alternative ways of handling this in Scala as I feel maybe Actors might be overkill in this case?
There are those blogs stating that actors should be used for "concurrency with state" - though I would like to see a more appropriate mechanism in order to eliminate this option.
Ultimately the main unique benefit for using actors is that they are great for encapsulating mutable variables so that you avoid race conditions.
To do what you describe, you could just use classic threads. In your (possibly simplified) description, I don't see the potential for deadlocks. If you want something a bit more composable, e.g. a sequence of asynchronous tasks, you can use Scala's Futures.
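As a rough sketch of the Futures route, the snippet below chains "update the row, then notify the UI" for each event, one event after another. The `TxEvent` type and the two side-effect helpers are placeholders invented for illustration (the question does not show its own types), and an in-memory log stands in for the real DB and websocket.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object SequentialEvents {
  // Hypothetical event type; not taken from the original question.
  final case class TxEvent(txId: Int, status: String)

  def processAll(events: Seq[TxEvent]): List[String] = {
    // Stand-in for the DB and the websocket: records what would have happened.
    val log = ArrayBuffer.empty[String]
    def updateRow(e: TxEvent): Unit = log += s"db:${e.txId}:${e.status}"
    def pushToUi(e: TxEvent): Unit = log += s"ui:${e.txId}:${e.status}"

    // Each step starts only after the previous one has completed,
    // so the log is never touched by two tasks at once.
    val chain = events.foldLeft(Future.unit) { (previous, e) =>
      previous.flatMap(_ => Future(updateRow(e))).map(_ => pushToUi(e))
    }
    Await.result(chain, 5.seconds)
    log.toList
  }
}
```

Because every step is chained onto the previous one, no locking is needed here; the price is that events are handled strictly one at a time, which the question says is acceptable.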
Not at all sure if this is applicable for Scala, but there's a great lib for Groovy and Java with several concurrency models. I've myself used the Dataflow Concurrency with great success and can recommend it as a light-weight yet manageable model.
Dataflow Concurrency offers an alternative concurrency model, which is
inherently safe and robust. It puts an emphasis on the data and their
flow through your processes instead of the actual processes that
manipulate the data. Dataflow algorithms relieve developers from
dealing with live-locks, race-conditions and make dead-locks
deterministic and thus 100% reproducible. If you don’t get dead-locks
in tests you won’t get them in production.
There are other models available in the linked GPars library as well.
I wouldn't suggest making the threading yourself, unless you have no other choices.
Addendum
After posting I got interested in the topic and made a few searches. Seems like Akka has direct support for Dataflow model also. Or at least has had in some version.
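For a feel of the dataflow idea without any extra library, a Scala `Promise` can play the role of a single-assignment dataflow variable. This is only a sketch using the standard library, not the GPars or Akka dataflow API; the names `x`, `y` and `z` are made up for the example.

```scala
import scala.concurrent.{Await, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object DataflowSketch {
  def compute(): Int = {
    // Two single-assignment variables; each can be bound exactly once.
    val x, y = Promise[Int]()

    // z is defined in terms of x and y before either of them has a value.
    val z = for (a <- x.future; b <- y.future) yield a + b

    // Binding order does not matter; z becomes available once both are bound.
    y.success(2)
    x.success(40)
    Await.result(z, 5.seconds)
  }
}
```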
Actors avoid locking issues because they use queues to interact. You can use threads with (blocking) queues and get the same level of safety. The only advantage of actors over threads is that an actor does not spend memory on a call stack, so we can have many more actors than threads in the same amount of memory. The downside of the actor model is that a complex algorithm which could be implemented in a single thread may require several actors, so the actor implementation can end up obscuring the logic.
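A minimal sketch of the "thread plus blocking queue" alternative follows. The message types and the `sumOnWorker` helper are invented for the example; the point is only that the state is owned by one worker thread and everything else talks to it through the queue.

```scala
import java.util.concurrent.LinkedBlockingQueue

object QueueWorker {
  // Messages understood by the worker; Stop is a "poison pill" that ends the loop.
  sealed trait Msg
  final case class Add(n: Int) extends Msg
  case object Stop extends Msg

  def sumOnWorker(producers: Int, perProducer: Int): Int = {
    val mailbox = new LinkedBlockingQueue[Msg]()
    var total = 0 // touched only by the worker thread until that thread is joined

    val worker = new Thread(() => {
      var running = true
      while (running) {
        mailbox.take() match {
          case Add(n) => total += n
          case Stop   => running = false
        }
      }
    })
    worker.start()

    // Several producers send concurrently; none of them touches `total`.
    val senders = (1 to producers).map { _ =>
      new Thread(() => (1 to perProducer).foreach(_ => mailbox.put(Add(1))))
    }
    senders.foreach(_.start())
    senders.foreach(_.join())

    mailbox.put(Stop)
    worker.join() // after the join, reading `total` from this thread is safe
    total
  }
}
```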
Related
I was just introduced to concurrent programming, and I learned that locks are one of the simplest synchronization primitives and that people almost always get them wrong.
My question is: why aren't people using something like promises (dataflow variables), CSP, actors, etc. all the time instead? Wouldn't that save us from bugs and deadlocks? Can't all locks be replaced by those?
Other problems are livelocks, race conditions and lack of parallelism. All concurrency models suffer from a few of these to different degrees.
Locks are just one tool in the toolbox. They have their sweet spot, and so do the other models.
The more you think about concurrency models, the more you recognize how similar they really are. For example, actors plus message passing are like one lock per actor with the additional rule that you can only take one lock at a time. Clearly, that guarantees freedom from lock-ordering deadlocks and from per-actor data races, but it does not provide global correctness.
I'm not strong in multi-threaded programming. I've worked with Akka a fair amount, but I still don't understand what makes actors and Akka so neat, convenient, safe and so forth. I know that they receive messages, and that an actor can receive only one message at a time. But so what? What makes them thread-safe?
First of all, actors are just a library built on system threads that involves using shared mutable state and they need somehow to deal with it.
So the question is, how do actors work at a very deep level? I'd also appreciate any link about it.
Björn's answer hits the important point: The actor model encapsulates state and any logic that operates on that state in an actor. The only way to change state from the outside is to send the actor a message.
Because only the actor can modify the state, and because it processes messages serially, there's no possibility of concurrent modification. No race conditions.
Ryan Tanner (disclosure: Ryan works at my company) has a great blog post about what makes actors special: http://blog.goconspire.com/post/64274254800/akka-at-conspire-part-2-why-we-like-actors.
You seem to be mixing up the Actor Model with one concrete implementation of it in Akka.
The code inside a single actor is only run on one thread at any given time processing one single message at any given time. If your actors don't share mutable objects between each other and only communicate via immutable messages then the code is free of the kind of races where you inadvertently change the same object/variable from multiple threads concurrently.
How the implementation runs your actors on top of multiple threads should be irrelevant. But you are of course free to look at the Akka source code.
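For the curious, here is one way the "one message at a time" guarantee can be built on top of a shared thread pool. This is a toy sketch and not Akka's actual implementation: the `Counter` class, the `Add` message and the `scheduled` flag are all invented for illustration. The mailbox is a lock-free queue, and an atomic flag ensures that at most one pool thread drains it at any moment.

```scala
import java.util.concurrent.{ConcurrentLinkedQueue, CountDownLatch, Executor, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean

object TinyActor {
  final case class Add(n: Int) // hypothetical message type

  final class Counter(pool: Executor, done: CountDownLatch) {
    private val mailbox = new ConcurrentLinkedQueue[Add]()
    private val scheduled = new AtomicBoolean(false)
    private var count = 0 // plain var: only ever touched inside drain()

    // Sending never blocks: enqueue, then make sure a drain is scheduled.
    def !(msg: Add): Unit = { mailbox.add(msg); trySchedule() }

    private def trySchedule(): Unit =
      if (!mailbox.isEmpty && scheduled.compareAndSet(false, true))
        pool.execute(() => drain())

    // Runs on whichever pool thread is free, but never on two at once.
    private def drain(): Unit =
      try {
        var msg = mailbox.poll()
        while (msg != null) { count += msg.n; done.countDown(); msg = mailbox.poll() }
      } finally {
        scheduled.set(false)
        trySchedule() // pick up messages that arrived while we were finishing
      }

    def current: Int = count
  }

  def run(senders: Int, perSender: Int): Int = {
    val pool = Executors.newFixedThreadPool(4)
    val done = new CountDownLatch(senders * perSender)
    val counter = new Counter(pool, done)
    try {
      (1 to senders).foreach { _ =>
        pool.execute(() => (1 to perSender).foreach(_ => counter ! Add(1)))
      }
      done.await(5, TimeUnit.SECONDS)
      counter.current
    } finally pool.shutdown()
  }
}
```

The state (`count`) needs no lock because the flag hands it from one drain to the next; the senders only ever touch the queue.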
I have a general question about the CQRS paradigm.
I understand that a CommandBus and EventBus will decouple the domain model from our Query-side datastore, the merits of eventual consistency, and being able to denormalize the storage on the Query side to optimize reads, etc. That all sounds great.
But I wonder as I begin to expand the number of the components on the Query side responsible for updating the Query datastore, if they wouldn't start to contend with one another to perform their updates?
In other words, if we tried to use a pub/sub model for the EventBus, and there were a lot of different subscribers for a particular event type, couldn't they start to contend with one another over updating various bits of denormalized data? Wouldn't this put us in the same boat as we were before CQRS?
As I've heard it explained, it sounds like CQRS is supposed to do away with this contention altogether, but is this just an ideal, and in reality we're only really minimizing it? I feel like I could be missing something here, but can't put my finger on it.
It all depends on how you have designed the infrastructure. Strictly speaking, CQRS in itself doesn't say anything about how the Query models are updated. Using Events is just one of the options you have. CQRS doesn't say anything about dealing with contention either. It's just an architectural pattern that leaves you with more options and choices to deal with things like concurrency. In "regular" architectures, such as the layered architecture, you often don't have these options at all.
If you have scaled your command processing component out on multiple machines, you can assume that they can produce more events than a single event handling component can handle. That doesn't have to be a bad thing. It may just mean that the Query models will be updated with a slightly bigger delay during peak times. If it is a problem for you, then you should consider scaling out the query models too.
The Event Handler components themselves will not be contending with each other. They can safely process events in parallel. However, if you design the system to make them all update the same data store, your data store could be the bottleneck. Setting up a cluster or dividing the query model over different data sources altogether could be a solution to your problem.
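To illustrate the point about handlers not contending, the sketch below runs two handlers over the same events in parallel, each building its own read model. The `OrderPlaced` event and both projections are hypothetical, and plain in-memory values stand in for the query-side stores.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ReadModels {
  final case class OrderPlaced(customer: String, amount: Int) // hypothetical event

  // Handler 1: denormalized view of spend per customer.
  def totalsByCustomer(events: Seq[OrderPlaced]): Map[String, Int] =
    events.groupBy(_.customer).map { case (c, es) => c -> es.map(_.amount).sum }

  // Handler 2: a separate view that only counts orders.
  def orderCount(events: Seq[OrderPlaced]): Int = events.size

  def project(events: Seq[OrderPlaced]): (Map[String, Int], Int) = {
    // Both handlers see the same events but write to different models,
    // so they can run in parallel without coordinating with each other.
    val totals = Future(totalsByCustomer(events))
    val count = Future(orderCount(events))
    Await.result(totals.zip(count), 5.seconds)
  }
}
```

If both handlers wrote to the same table instead, the contention would move into that data store, which is the bottleneck the answer warns about.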
Be careful not to prematurely optimize, though. Don't scale out until you have the figures to prove that it will help in your specific case. CQRS based architectures allow you to make a lot of choices. All you need to do is make the right choice at the right time.
So far, in the applications I am involved with, I haven't come across situations where the Query model was a bottleneck. Some of these applications produce more than 100 million events per day.
When should the Actor Model be used?
It certainly doesn't guarantee a deadlock-free environment.
Actor A can wait for a message from B while B waits for A.
Also, if an actor has to make sure its message was processed before moving on to its next task, it will have to send a message and wait for a "your message was processed" message instead of the straightforward blocking.
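The "send a message and wait for a confirmation" round trip described here can be sketched with a `Promise` carried inside the message. Everything named below (`Work`, `sendAndWait`, the single-threaded executor standing in for the receiving actor) is an assumption made for the example.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._

object AckDemo {
  // Hypothetical message: the payload plus a promise the receiver completes when done.
  final case class Work(payload: String, ack: Promise[String])

  def sendAndWait(payload: String): String = {
    val receiver = Executors.newSingleThreadExecutor() // stands in for the receiving actor
    try {
      val msg = Work(payload, Promise[String]())
      // The "actor" handles the message and then acknowledges it.
      receiver.execute(() => { msg.ack.success(s"processed ${msg.payload}"); () })
      // The sender waits for the acknowledgement, with a timeout instead of waiting forever.
      Await.result(msg.ack.future, 5.seconds)
    } finally receiver.shutdown()
  }
}
```

The timeout is what keeps a missing reply from turning into the A-waits-for-B deadlock mentioned above; inside a real actor you would normally react to the acknowledgement as another message rather than block.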
What's the power of the model?
Given some concurrency problem, what would you look for to decide whether to use actors or not?
First I would look to define the problem... is the primary motivation a speedup of a nested for loop or recursion? If so a simple task based approach or parallel loop approach will likely work well for you (rather than actors).
However if you have a more complex system that involves dependencies and coordinating shared state, then an actor approach can help. Specifically through use of actors and message passing semantics you can often avoid using explicit locks to protect shared state by actually making copies of that state (messages) and reacting to them.
You can do this quite easily with the classic synchronization problems like the dining philosophers and the sleeping barber problem. But you can also use the 'actor' to help with more modern patterns, i.e. your facade could be an actor, and your model, view and controller could also be actors that communicate with each other.
Another thing that I've observed is that actor semantics are learnable by most developers and 'safer' than their locked counterparts. This is because they raise the abstraction level and allow you to focus on coordinating access to that data rather than protecting all accesses to the data with locks. As an example, imagine that you have a simple class with a data member. If you choose to place a lock in that class to protect access to that data member, then any methods on that class will need to ensure that they are accessing that data member under the lock. This becomes particularly problematic when others (or you) modify the class at a later date, because they have to remember to use that lock.
On the other hand if that class becomes an actor and the data member becomes a buffer or port you communicate with via messages, you don't have to remember to take the lock because the semantics are built into the buffer and you will very explicitly know whether you are going to block on that based on the type of the buffer.
-Rick
The usage of Actor is "natural" in at least two cases:
When you can decompose your problem in a set of independent tasks.
When you can decompose your problem in a set of tasks linked by a clear workflow (i.e. dataflow programming).
For instance, if you process complex data using a series of filters, it is easy to use a pipeline of actors where each actor receives data from an upstream actor and sends data to a downstream actor.
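A pipeline of this shape can be sketched with plain threads and queues, one thread per filter. The `stage` helper and the two sample filters are made up for the example; `None` is used as an end-of-stream marker.

```scala
import java.util.concurrent.LinkedBlockingQueue

object Pipeline {
  type Q = LinkedBlockingQueue[Option[String]] // None marks the end of the stream

  // One stage: take from upstream, apply the filter, pass the result downstream.
  private def stage(in: Q, out: Q)(f: String => String): Thread = {
    val t = new Thread(() => {
      var next = in.take()
      while (next.isDefined) { out.put(next.map(f)); next = in.take() }
      out.put(None) // forward the end marker so the next stage also stops
    })
    t.start()
    t
  }

  def run(inputs: Seq[String]): List[String] = {
    val q1, q2, q3 = new LinkedBlockingQueue[Option[String]]()
    val stages = List(stage(q1, q2)(_.trim), stage(q2, q3)(_.toUpperCase))
    inputs.foreach(s => q1.put(Some(s)))
    q1.put(None)
    stages.foreach(_.join())
    // Drain the final queue up to the end marker.
    Iterator.continually(q3.take()).takeWhile(_.isDefined).collect { case Some(s) => s }.toList
  }
}
```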
Of course, this data flow need not be linear, and if a step in your pipeline is slow, you can use a pool of actors doing the same job instead. Another way of solving load-balancing problems would be a demand-driven approach organized as a kind of virtual Kanban system.
Of course, you will need synchronization between actors in almost all interesting cases, but contrary to the classic multi-thread approach, this synchronization is really "concrete". You can imagine guys in a factory and imagine the possible problems (workers run out of work to do, upstream operations are too fast and intermediate products need a huge storage place, etc.). By analogy, you can then find a solution more easily.
I am not an actor expert, but here are my 2 cents on when to use the actor model:
The actor model is not suited to every concurrent application. For instance, if you are building a multi-threaded application that runs under high concurrency, the actor model is not by itself designed to solve that concurrency problem.
Where actors really come into play is when you are building an event-driven application. For instance, suppose you are tracking in real time what users click in your application. Because actors are stateful, you can use them to do real-time processing segregated by user, device, or whatever your business requires. So, for example, if the actors for some users record that they clicked on shirts, you can send those users a coupon notification.
Some other applications where actors come in handy are finance (pricing, fraud detection) and multiplayer gaming.
Actors are asynchronous and concurrent, but the model in general does not guarantee message order or any time limit on when a message will be acted upon. Hence atomic transactions cannot be split across actors.
If the application/task involves no mutable state then Actors are overkill as Actor frameworks go to great lengths to avoid race conditions.
it's my first message here and I'm glad to join this community.
It looks like everything is now moving towards multi-threaded development. The big players say it won't take long to reach hundreds of cores.
I've recently read about actor-based development and how wonderful message passing is for handling concurrent programming. In addition, I also read that messages can be implemented as a form of method call. In this case, a given object is also an actor.
In other words, we no longer call methods arbitrarily. The calls are posted to a queue for later processing. The queue then ensures that an object's state (var) isn't modified by two callers at the same time, because messages are all serialized.
I understand that this model is quite straightforward to implement (at least an experimental one), and perhaps that's why it is so difficult to find any technical detail.
My question concerns queues. This is a typical case of multiple producers and one consumer, and I suspect they require some sort of synchronization. Is that true? Is there another solution? I heard they can be implemented as lock-free structures.
I'm not really sure about that. Any comment will be greatly appreciated.
Have a nice day pals
Multiple producers and a single consumer is a great scenario for using Actors, and doesn't require any synchronization in your own code. In Scala, you generally don't use any shared mutable state when working with Actors. You just send over a copy of whatever data needs processing.
You can read more about Actors in Scala in "Programming Scala", available online for free.
If I understood correctly, actors receive their messages in a mailbox which behaves like a concurrent queue, so you do not have to care about it. If you want to play with the mailbox directly, you can have a look at this nice article from the great "The busy Java developer's guide to Scala" series.