I am currently working with an inherited piece of code that uses a very interesting design pattern.
The code is split into a number of objects. I am not sure the term "object" is applicable since this is C code, but it is the best analogy. Each object has object-specific data, a thread, and a message queue. All objects primarily communicate by placing pre-defined messages onto each other's queues. The main idea seems to be that each object's data is only accessed by one thread. After doing some research I discovered that a few industrial automation applications are written this way (namely the ProfiNET stack and some EIP implementations).
Do you know if this pattern has a name or if it is described somewhere in the literature? The "Pattern-Oriented Software Architecture" book by Schmidt, Stal, et al. does not mention it.
Thank you very much.
This sounds somewhat related to the Actor model.
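A minimal sketch of the structure described in the question (private data, one thread, one message queue per object), written in Python rather than C for brevity. The class name `CounterObject` and the `("add", n)` message format are made up for illustration; they are not taken from the inherited code or from any particular actor library.

```python
import queue
import threading


class CounterObject:
    """One 'object': private data, its own thread, and a message queue."""

    _STOP = object()  # sentinel message that ends the worker thread

    def __init__(self):
        self._inbox = queue.Queue()  # the only way to reach this object
        self._count = 0              # touched by the worker thread only
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def send(self, message):
        """May be called from any thread; it only enqueues the message."""
        self._inbox.put(message)

    def stop(self):
        """Ask the worker to finish and wait until it has done so."""
        self._inbox.put(self._STOP)
        self._thread.join()

    @property
    def count(self):
        """Only meaningful after stop(), once the worker thread has exited."""
        return self._count

    def _run(self):
        # Messages are handled strictly one at a time, so no lock is needed
        # around self._count.
        while True:
            message = self._inbox.get()
            if message is self._STOP:
                break
            kind, amount = message
            if kind == "add":
                self._count += amount


def demo():
    counter = CounterObject()
    for _ in range(100):
        counter.send(("add", 1))
    counter.stop()
    return counter.count
```

Other objects never touch `_count` directly; they can only call `send`, which is what keeps the data confined to one thread.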
It might just be me, but is any pattern other than producer-consumer combined with mutual exclusion used in what you described?
Check out Communicating Sequential Processes (CSP)
CSP allows the description of systems in terms of component processes
that operate independently, and interact with each other solely
through message-passing communication
It is actually one of the core design concepts that the Go language is based upon for communicating between goroutines (concurrency).
I'm learning about microservices.
On one hand, the literature recommends using asynchronous event-publishing for microservices that need to collaborate on sagas or take action on events published by other services.
On the other hand, the same literature recommends not using a shared library to define common events because that couples the microservices through that event library.
Am I taking crazy pills? Aren't those microservices coupled by those events anyway if they rely on them? If so, what is the advantage of coding the exact same events with the same definition in two (or even more) different places? Isn't that a total violation of the DRY principle?
I'm starting to smell a code smell that starts with the initials BS. Will someone help me drink the rest of this koolaid? Or did I just see the emperor with his clothes off for a second?
If so, what is the advantage of coding the exact same events with the same definition in two (or even more) different places?
There could be a number of advantages -- the microservices might be implemented using different languages. Or using the same language, but with different in-memory representations of the data to suit their specific needs. Or even the "same" in-memory representations, but different versions, because they are on different deployment schedules.
There's nothing inherently wrong with sharing the labor of preparing a messaging library among the implementations of your services. But that should be an opt-in, rather than being a requirement. In particular, a team always has the option of replacing the library if the shared implementation is getting in the way.
Two services that agree that the messages are going to use UTF-8 encoded JSON documents should not be required to use the same parser -- the choice of parser is an implementation detail. The coupling is to the schema (the agreement about the semantics of the bytes in the message), not to the implementation.
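To make the "coupled to the schema, not to the implementation" point concrete, here is a small sketch. The event name `OrderPlaced`, its fields, and the two function names are invented for the example. The publisher uses a dataclass; the subscriber uses a plain dict and its own parsing code. The only thing they share is the agreement about the JSON document.

```python
import json
from dataclasses import dataclass, asdict


# --- Service A (publisher): its own in-memory representation ---
@dataclass
class OrderPlaced:
    order_id: str
    customer_id: str
    total_cents: int


def publish(event: OrderPlaced) -> bytes:
    """Serialize to the agreed wire format: a UTF-8 encoded JSON document."""
    return json.dumps({"type": "OrderPlaced", **asdict(event)}).encode("utf-8")


# --- Service B (subscriber): separate code, no shared event library ---
def handle(raw: bytes) -> str:
    """Read only the fields this service cares about and ignore the rest."""
    doc = json.loads(raw.decode("utf-8"))
    if doc.get("type") != "OrderPlaced":
        return "ignored"
    return f"reserve stock for order {doc['order_id']}"
```

Service B never imports `OrderPlaced`. If service A later adds a field, B keeps working without a redeployment, as long as the fields B reads keep their meaning.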
If you treat events as plain data objects, you don't need a library to deal with them, other than generic messaging and serialization/deserialization code.
The whole point of microservices is to have independent development cycles, so as soon as you introduce the common library, you are starting to make a "distributed monolith". Any change in this library will cause a redeployment of all microservices.
Without an event-specific library, the only dependency you introduce is knowledge of a particular event structure from another microservice. Well, this is a necessary evil.
A common multi-threaded implementation is to have some class where Method_A() is running in a thread and sits blocked waiting for some signal/event member variable (e.g. WaitForSingleObject).
Interacting classes running in different threads will then call Method_B(), which does some work, sets the signal/event variable, perhaps does some more work, then returns.
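For readers who want to see the interaction in code before thinking about how to draw it, here is a rough Python equivalent. `threading.Event` stands in for the Win32 event that `WaitForSingleObject` would wait on, and the `Worker` class and its `log` list are invented for the illustration.

```python
import threading


class Worker:
    def __init__(self):
        self._signal = threading.Event()  # stands in for the signal/event member
        self.log = []                     # records what happened, for the demo

    def method_a(self):
        """Runs in its own thread and blocks until the event is set."""
        self.log.append("A: waiting")
        self._signal.wait()
        self.log.append("A: woke up")

    def method_b(self):
        """Called from a different thread, on the same instance."""
        self.log.append("B: work before signal")
        self._signal.set()
        self.log.append("B: work after signal")


def demo():
    worker = Worker()
    thread_a = threading.Thread(target=worker.method_a)
    thread_a.start()
    worker.method_b()  # runs on the calling thread
    thread_a.join()
    return worker.log
```

Both methods run on one instance but on two threads, which is exactly the situation the diagram has to express. The only ordering that is guaranteed is that "A: woke up" comes after "B: work before signal"; the other entries can interleave differently from run to run.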
How do I represent this interaction on a Sequence Diagram?
Should I have two lifelines, one for each thread, even though they are operating on the same instance of the class? My modelling tool (Enterprise Architect 12) doesn't allow the same class to appear twice on a Sequence Diagram, so seems to discourage this.
Edit: Geert has noted that the Sequence Diagram should use instances, not classes, which is a fair comment. However the problem is the same: multiple lifelines would imply multiple instances, but in the question Method_A() and Method_B() are operating on the same instance, just from different threads. How can that be represented?
The approach I have decided to take is to add two lifelines for the same instance, then label one lifeline with the <<thread>> stereotype and add the thread it runs in to the name:
I realise this is probably not standard UML, but it seems to get across all the information I want to express in a clear manner, which is the most important thing, right?
Martin Fowler does mention a few times in his book that sometimes a non-normative diagram is actually clearer. So that's my excuse. :)
(Edit: You can solve it by just using asynchronous messages, as @sim points out. That will just do. The answer below is showing what is going on under the hood. So if you don't care about the details, go with that answer.)
You are asking more of a design question than a UML question. Namely, how do concurrent instances talk to each other? You said first
Method_A() is running in a thread and sits blocked waiting
which simply means that it can not accept anything since it is blocked. Now, guessing from the context of your question, I assume that you still want to communicate with that instance since
in different thread will then call Method_B()
So, in order to be able to accept a message, the instance must be in an appropriate state. There are a couple of ways to achieve that. A simple one, if the OS in question supports it, is to return to the scheduler and tell it that the thread is waiting for some message.
Now when method_b is called, you know inside Object1 that you are in some kind of idle state inside method_a and can take the appropriate (return) action.
Another way would be to poll the scheduler for incoming messages and handle them.
You need to keep in mind that sending a message usually does not deal directly with the instance but tells the system scheduler to interact with the appropriate instance (at least in most OSs).
I just remember from the Modula2 compiler I once wrote that it has a concept of coroutines which allows a concurrent thread to run within the compiled code. But basically that is just mapped to two independent threads running under the hood of a semi-single one and you'd depict that with two life-lines when going into detail.
N.B.: Rather than method it should be operation (since that is what is invoked by a message, while the method is what is implemented inside the operation). And as per common convention they should start with a lower-case character.
And also: do NOT use classes in a SD. Unfortunately EA still allows that (why? Ask them!). Somewhere hidden in their docs there is a sentence that you must use instances. Else the model will break. A SD is always (!) a sample sequence of instances talking to each other. Classes do not talk, they are just blueprints for the instances.
You should never use classes in sequence diagrams, but instead use instances/lifelines that have your class as classifier.
If you hold the Control key down when dragging a class to a sequence diagram, you can choose to drop it as an instance instead of as a class.
This way you can add as many as you want for the same class.
The notation you are looking for is an asynchronous message. You could theoretically express this using a single lifeline. But this wouldn't be readable. So a possibility would be having two instances of a threadclass in your class and show the interaction between the instances. But never show classes in a sequence diagram.
But why are you using a sequence diagram at all? For such internal behaviour an activity diagram is most likely more appropriate. There you can use send and receive message elements to express such behaviour per thread. Or if it is to be shown in one diagram, you can use a fork.
When should the Actor Model be used?
It certainly doesn't guarantee deadlock-free environment.
Actor A can wait for a message from B while B waits for A.
Also, if an actor has to make sure its message was processed before moving on to its next task, it will have to send a message and wait for a "your message was processed" message instead of the straightforward blocking.
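The "your message was processed" round trip mentioned above might look like the following sketch. The function names and the message layout (a payload plus a reply queue) are assumptions made for the example, not a feature of any specific actor framework.

```python
import queue
import threading


def worker(inbox):
    """Processes requests and sends an explicit acknowledgement back."""
    while True:
        request = inbox.get()
        if request is None:  # None is used here as the shutdown message
            break
        payload, reply_to = request
        result = payload.upper()  # placeholder for the real processing
        reply_to.put(("processed", result))


def send_and_wait(inbox, payload, timeout=1.0):
    """Send a request, then wait for the acknowledgement message."""
    reply_to = queue.Queue(maxsize=1)
    inbox.put((payload, reply_to))
    # With a timeout, a missing acknowledgement raises queue.Empty
    # instead of blocking the sender forever.
    return reply_to.get(timeout=timeout)


def demo():
    inbox = queue.Queue()
    thread = threading.Thread(target=worker, args=(inbox,))
    thread.start()
    ack = send_and_wait(inbox, "hello")
    inbox.put(None)
    thread.join()
    return ack
```

Waiting with a timeout does not remove the possibility of A and B waiting on each other, but it turns a silent hang into an error the sender can handle.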
What's the power of the model?
Given some concurrency problem, what would you look for to decide whether to use actors or not?
First I would look to define the problem... is the primary motivation a speedup of a nested for loop or recursion? If so a simple task based approach or parallel loop approach will likely work well for you (rather than actors).
However if you have a more complex system that involves dependencies and coordinating shared state, then an actor approach can help. Specifically through use of actors and message passing semantics you can often avoid using explicit locks to protect shared state by actually making copies of that state (messages) and reacting to them.
You can do this quite easily with the classic synchronization problems like the dining philosophers and the sleeping barber problem. But you can also use the 'actor' to help with more modern patterns, e.g. your facade could be an actor, and your model, view and controller could also be actors that communicate with each other.
Another thing that I've observed is that actor semantics are learnable by most developers and 'safer' than their locked counterparts. This is because they raise the abstraction level and allow you to focus on coordinating access to that data rather than protecting all accesses to the data with locks. As an example, imagine that you have a simple class with a data member. If you choose to place a lock in that class to protect access to that data member then any methods on that class will need to ensure that they are accessing that data member under the lock. This becomes particularly problematic when others (or you) modify the class at a later date, they have to remember to use that lock.
On the other hand if that class becomes an actor and the data member becomes a buffer or port you communicate with via messages, you don't have to remember to take the lock because the semantics are built into the buffer and you will very explicitly know whether you are going to block on that based on the type of the buffer.
-Rick
The usage of Actor is "natural" in at least two cases:
When you can decompose your problem in a set of independent tasks.
When you can decompose your problem in a set of tasks linked by a clear workflow (ie. dataflow programming).
For instance, if you process complex data using a series of filters, it is easy to use a pipeline of actors where each actor receives data from an upstream actor and sends data to a downstream actor.
Of course, this data flow need not be linear, and if a step in your pipeline is slow, you can instead use a pool of actors doing the same job. Another way of solving the load-balancing problem would be to use a demand-driven approach organized as a kind of virtual Kanban system.
Of course, you will need synchronization between actors in almost all interesting cases, but contrary to the classic multi-threaded approach, this synchronization is really "concrete". You can imagine guys in a factory and imagine the possible problems (workers run out of jobs to do, upstream operations are too fast and intermediate products need a huge storage space, etc.). By analogy, you can then find a solution more easily.
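A small sketch of such a pipeline, using plain threads and queues as stand-ins for actors and mailboxes. The helper `stage`, the `STOP` sentinel, and the two example filters are all invented for the illustration; the second stage is treated as the slow one and gets a pool of three workers.

```python
import queue
import threading

STOP = object()  # sentinel telling a stage that no more input will arrive


def stage(func, inbox, outbox, workers=1):
    """Start `workers` threads that apply func to each item and forward it."""
    def run():
        while True:
            item = inbox.get()
            if item is STOP:
                inbox.put(STOP)  # leave the sentinel for sibling workers
                break
            outbox.put(func(item))

    threads = [threading.Thread(target=run) for _ in range(workers)]
    for t in threads:
        t.start()
    return threads


def run_pipeline(items):
    q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
    parse = stage(int, q1, q2)                           # fast step: one worker
    square = stage(lambda n: n * n, q2, q3, workers=3)   # slow step: a pool
    for item in items:
        q1.put(item)
    q1.put(STOP)
    for t in parse:
        t.join()
    q2.put(STOP)  # upstream is finished, so the pool can be told to stop
    for t in square:
        t.join()
    results = []
    while not q3.empty():  # safe here: every worker thread has exited
        results.append(q3.get())
    return sorted(results)  # a pool does not preserve input order
```

Note that once a step is served by a pool, the output order is no longer guaranteed, which is why the sketch sorts the results at the end.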
I am not an actor expert, but here are my 2 cents on when to use the actor model:
The actor model is not suited to every concurrent application. For instance, if you are building a multi-threaded application that runs under high concurrency, the actor model is not designed to solve that concurrency problem for you.
Where actors really come into play is when you are creating an event-driven application. For instance, suppose you are tracking what users click on in your application in real time. Because actors are stateful, you can use them to run real-time activities segregated by user, device, or whatever your business requires. So, for example, if the actors for some users record that they clicked on shirts, you can send those users a coupon notification.
Other applications where actors come in handy are finance (pricing, fraud detection) and multiplayer gaming.
Actors are asynchronous and concurrent but do not guarantee message order or any time limit on when a message will be acted upon. Hence atomic transactions cannot be split across actors.
If the application/task involves no mutable state then Actors are overkill as Actor frameworks go to great lengths to avoid race conditions.
it's my first message here and I'm glad to join this community.
It looks like everything is now moving towards multi-threaded development. The big fish say that it won't take long to reach hundreds of cores.
I've recently read about actor based development and how wonderful message passing is to handle concurrent programming. In addition, I also read that they can be implemented as a means of method call. In this case, a given object is also an actor.
In other words, we no longer call methods arbitrarily. They are posted to a queue for later processing. The queue then ensures that an object's state (its variables) isn't modified by several callers at the same time, because the messages are all serialized.
I understand that this model is quite straightforward to implement (at least an experimental one), and perhaps that's why it is so difficult to find any technical detail.
My question concerns queues. This is a typical case of multiple producers and one consumer, and I suspect they require some sort of synchronization. Is that true? Is there another solution? I heard they can be implemented as lock-free structures.
I'm not really sure about that. Any comment will be greatly appreciated.
Have a nice day pals
Multiple producers and a single consumer is a great scenario for using Actors, and doesn't require any explicit synchronization in your own code. In Scala, you generally don't use any mutable state when working with Actors. You just send over a copy of whatever data needs processing.
You can read more about Actors in Scala in "Programming Scala", available online for free.
If I understood correctly, agents receive their messages in a MailBox, which behaves like a concurrent queue, so you do not have to care about it. If you want to play with the mailbox directly, you can have a look at this nice article from the great "The busy Java developer's guide to Scala" series.
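To illustrate the multiple-producer, single-consumer case from the question, here is a sketch in Python rather than Scala. The function `run` and its parameters are made up for the example. Python's `queue.Queue` plays the role of the mailbox; it is synchronized internally with a lock, so synchronization does exist, it is just hidden inside the queue instead of appearing in the calling code. A lock-free queue would be a different implementation behind the same idea.

```python
import queue
import threading


def run(producers=4, per_producer=1000):
    mailbox = queue.Queue()  # synchronizes internally; callers never lock

    def produce(pid):
        for i in range(per_producer):
            mailbox.put((pid, i))

    threads = [threading.Thread(target=produce, args=(pid,))
               for pid in range(producers)]
    for t in threads:
        t.start()

    # Single consumer: the only code that ever touches `seen`,
    # so `seen` itself needs no protection.
    seen = {}
    for _ in range(producers * per_producer):
        pid, i = mailbox.get()
        seen.setdefault(pid, []).append(i)

    for t in threads:
        t.join()
    return seen
```

Because the queue is first-in, first-out, messages from any one producer arrive in the order that producer sent them, although messages from different producers can interleave arbitrarily.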
I'm fond of using UML diagrams to describe my software. In the majority of cases the diagrams are for my own use and I use them for more involved pieces of code, interactions etc. where I'll benefit from being able to look back over them in the future.
One thing I've found myself doing a few different ways is diagramming threads. Threads by their nature tend to pop up in the more involved pieces of code and keeping track of them is often a primary purpose of my design documents.
In the past I've used a symbol in a sequence diagram to show the creation of a new thread but looking back at some diagrams doing that it's sometimes ambiguous between an object's lifetime - which sequence diagrams are for - and a thread's lifetime. Is there a better approach for incorporating threads into UML?
I managed to produce a diagram that makes sense to me at the time of drawing it. The basic premise is that I've overlaid grey boxes representing class instances with blue boxes representing thread lifetimes. The main thing it lets me keep track of is knowing which thread I will be executing on when I call certain methods.
No doubt there's better and more intuitive ways to do thread and class modeling. The measure of success for me is whether my own diagram still gives me the same level of understanding 6 months down the track.
Activity, Sequence, and State Diagrams are all correct ways of showing thread behavior.
1st: (To vs's comments) There are two sets of diagrams or modeling elements in UML: static structure, as you put it, and behavioral. Any book will help you understand the split, typically in the contents/TOC. Additionally, it can be seen on page 11 of Martin Fowler's UML Distilled, a near de facto standard for beginning UML in my opinion.
2nd: (To sipwiz's question and comment) Activity diagrams are not commonly understood to model business process, they can be used for that however, and most examples or simple tutorial would approach it from a business standpoint.
Discussion on your options to model threads:
Activity diagrams - Allow for forking and specifying concurrency by using a BAR and usage lines. Note the example at the bottom is not a business process example. Most people can read these (business, management, and developers), though sometimes they can lack detail or get messy.
Sequence Interaction diagrams - In the same post, example, you will see sequence diagrams allow you to specify parallel behavior within a sequence by boxing parallelizable behavior with a label "par", this is useful to show the reader what methods can or should be called in parallel, ie, by different threads. This is the method I would use for detailed developer like discussions around building an object.
State diagram - The state chart just like the activity allows for concurrency by using a BAR and usage lines.
NOTE: These will not model a specific thread and its exact life cycle, as that is part of the instance/run-time level of modeling. If this is what you want, clarify your question and I will respond. I would just model it using one of the above, as no one other than an MDA/UML expert will call you out, and you are not generating a running system.
Also: Please note that further details can be found in most UML books.
Also leveraged: http://www.jguru.com/faq/view.jsp?EID=56322
Traditionally threading has been depicted diagrammatically using Petri Nets. Rob Martin has an article on multithreading in UML which you may find useful.
Update- just remembered you can represent threads with forks in activity diagrams- I've managed to find something that explains this.
It is very hard to find any free tutorials for Petri Nets, however I know Petri Nets are good for modeling concurrency, so I Google'd "producer-consumer Petri Nets" (my favourite threading thing) and found this.
I've also found some slides that show Petri Nets modeling a Semaphore.
UML activity diagrams have fork and join elements to show parallel flow of logic.
I don't know of a way, but using a sequence diagram does not seem entirely inappropriate, considering that a thread is in many languages implemented as a Thread (or similar) class.
The most UML-compatible way would probably be to add an annotation of some sort indicating that the 'object' represents a thread.
The UML is defined by the UML Superstructure, you can find it here http://www.omg.org/spec/UML.
If you read the specification you find that a UML class can be active. An Active Class is a class with the meta-attribute isActive set to true. It is also depicted differently.
An object instance of an active class automatically executes a "classifier behavior". As for any behavior, you can define it by means of an activity in which you wait for asynchronous signals (AcceptEventActions) and invoke methods (CallOperationAction) or other behaviors (CallBehaviorActions). That is how active objects are modeled in UML. You just have to read the UML specification.
Activity diagrams will model the internal workings of your software with forks and joins to represent threads. To find out exactly how to model this properly, please see Conrad Bock's excellent series of articles. Here is the article that covers forks and joins, but you should follow the links back to the first article in the series to learn how to properly model using "Colored Petri Nets". It's not how you think (and it's pretty easy)!
There is a new, in-process standard at the OMG for a language called Alf that provides a more convenient surface notation for activity diagrams and is intended for representing code. From the spec:
A primary goal of an action language is to act as the surface notation for specifying executable
behaviors within a wider model that is primarily represented using the usual graphical notations of
UML. For example, this might include methods on the operations of classes or transition effect
behaviors on state machines.
For a programmer, you probably can't get more intuitive than Alf. And it will convert perfectly into UML activity diagrams.
UML's strongest point is depicting the static structure. If you use short-lived threads, I also don't see any easy way of diagramming them. Maybe you can find a solution by turning things around a bit: why do you use/need threads? What functionality do they provide? If they interact with each other and follow some (message passing) API, drawing them as components might make sense.