Differences between DFD (Data Flow Diagram) and activity diagram

Differences between DFD (Data Flow Diagram) and activity diagram - uml

I need to know this differences in order to undestand how to use them right.
Which are the differences of DFD and Activity diagram?

Actually, it's reasonably logical. You only have to look at the names.
In data flow diagrams, the lines between "boxes" represent data that flows between components of a system. Because these only show the flow of data, they do not give an indication of sequencing.
In activity diagrams, those lines are simply transitions between activities and do not represent data flow at all. They more represent the sequencing of activities and decisions. You can tell from these what order things happen in.
That's a simplistic explanation but should be a good starting point. Further information can be garnered from Wikipedia for DFDs and activity diagrams.

Explicit bias: I'm a DFDs proponent.
#John is correct that Activity diagrams can be used to represent object flow. #pax equally correct they seldom are.
Two big DFD advantages for me:
Link to object model. Data stores on a DFD provide a really nice way to link the data produced / consumed to the object model. Very useful for consistency and ensuring your thinking is joined up.
They de-emphasise control flow. Far too often designs over-constrain sequencing. Activity diagrams do support concurrency - but it requires the user to (a) remember and (b) use it. So the default is over-serialisation. DFDs don't. They lay bare the real sequence dependencies without any extra effort on the part of the user. Consequently they also make it easier to see causal relationships. If processes a and b both require data input D then it's obvious on the diagram. And hence parallel activity is obvious.
Don't get me wrong - I'm not against Activity diags. Where control flow is the primary consideration I'll use an AD over a DFD. But empirically I'd say I find DFDs a more useful tool in ~70-80% of cases.
Of course, YMMV.

Just a humble opinion from someone who has had to explain processes, both computer and manual, to upper management and CIOs. I have found simple is better and pound-for-pound, DFDs get the message across when I am actually "asked" about details. That being said, the better approach is to always practice the story line and answer in simple answers.
One final comment regarding the age of tools and products. Remember that in most cases these are running the business and work pretty darn good. The adage "you break it (or replace it) and you own it" can make you a hero or make you into a clown.
We have a CIO who wants to replace all mainframe application for the simple reason that they are old technology. One must weigh the consequences and understand if the replacement can handle workload. Have you ever wondered why JPMC, Credit Swiss, Walmart, and Bank of America to name a few still run mainframes?
My apologies for taking it in that direction. Just make sure, whatever analysis tool is used, that all aspects of the replacement are documented including workload, I/O, concurrent users, adoption curve, and scalability.

Data Flow represents flow within one module or one independent code. However Sequence Diagram represents sequence of activity in between different modules.
Yes at some points they may pass the same messages.
I basically use Sequence diagram in interface documents which will be shared with other modules/elements, however DFD will be used in Low level design documents which will be used to develop the code within one module or network element.

If we look closer to a dataflow diagram, we can notice that when a node collects data from all its edges, it starts to process them. And to process, it needs an activity token, which represents access to a processor. Usually the process of obtaining that token is omitted, but it has to exist. Typically, the whole node as a token is put in a double-ended queue, where on the other end of that queue free activities (processors or threads) are stored. A thread pool is a perfect example of such a queue, and the nodes ready to work are represented as tasks. As soon as a node meets activity, they both are taken from the queue and actual processing starts. When processing finishes, activity is returned to the queue. This way we can think of activity as a special kind of token.
So both dataflow and activity diagrams are just simplified variants of a general active-data-flow diagram, with either activities or data tokens omitted. But generally, both kinds of tokens can be represented in a diagram simultaneously.
Programmers used to think about threads as activities, but if we look at them closer, we notice that when a thread is ready to execute, it gets in a line to processors, and real execution starts only when a free processor switch to that thread. This is absolute analogy as tasks are executed on a thread pool. So from simplified point of view, a thread is an activity, and from more rigorous point of view a thread is a data token and the only real activity is a physical processor. This shows that activity tokens are not different from data tokens. And indeed, we can omit the route of node chasing an activity and consider a dataflow node as an activity itself, which starts to work immediately when all its edges (inputs) contain data.

Related

Can one Activity Diagram Depict only One process

I am Working on a project for Orders data visualization Dashboard.
This is the use case Diagram:
Currently I'm working on the activity diagram and my question is: Can the Manager and the Shipper login into the system and initiate their own activities in the same diagram?

An activity diagram "specifies behavior by sequencing subordinate units". It can perfectly have several initial nodes, e.g. one for Admin and one for Manager or Shipper. And when the activity is invoked, all those initial nodes would be activated at the same time, starting each a concurrent flow, each being performed at its own pace.
But this makes only sense if the Manager's actions and the Shipper's actions are really related, concurrent and somehow synchronized. E.g. every time an Admin login is performed, a Manager login would be expected.
If the Manager flow as well as the Shipper flow are both independent and in reality unsequenced, you should use separate activity diagrams. In this case, trying to squeeze them into a single diagram could even be misleading.
Additional remarks, unrelated to your question:
Typically you'd have a separate activity diagram for each use case. It's not an obligation, but it's a common practice to describe what happens for a specific use-case. In your case, it would mean that only the Ship Order and the Update product stock would be in a situation where actions could be interdependent.
When activity diagrams are used to design systems, it shows in principle what happens inside the system. The actors are outside the system. So the partitions (columns) would in principle not show what an actor does, but what the system does in relation with an actor. Of course, if you use activity diagrams for process modeling, it would be a different story.

Yes, they can. With each InitialNode a flow begins and the tokens run down the actions independently. Probably at a certain time you will have some synchronization between both (this is not mandatory but otherwise it would probably rather pointless to have two independent flows in the same diagram). In that case you either have a MergeNode (bar) to wait for both tokens to arrive or an action has two incoming edges waiting for both tokens in order to commence.

How to represent a complex use case where every step of the main flow can have multiple scenarios (alternative or error path)?

Little background
I'm new to writing use cases and representing their scenarios.
I'm dealing with a complex system. In the first step of analyzing the system, I created a use case diagram where each use case represents a distinct goal or value for the system. I have tried my best to keep the use cases independent. All these use cases require the initialization and activation of the system, so I decided to take out this common part and link it to the main use cases using include relationship.
I understand that include and extend relationships need to be used only when necessary.
Now I'm lookin into defining scenarios for each use case and then developing user stories and requirements based on scenarios.
Main issue
The use cases are very complex and the easiest way to analyze it seems to be mapping it into a sequence of steps/activities where each activity contains several scenarios and each scenario is represented using a sequence diagram.
I understand that an activity cannot be a use case which is related to the main use case using include relationship; but having sequence diagrams for activities seem wrong too.
What is the best way to represent a use case where each step of the main flow is complex and can have several interactions between actors and systems as well as having error scenarios which can result in termination of the sequence at that step or possibility of the user cancelling/aborting the sequence?
I have attached a simplified version of the activity diagram for "Initialize" use case.
As I mentioned, each activity can have many scenarios. For example
"Perform Self check" has many steps and each step might result in a failure that can terminate the sequence and alert the user (via a HMI). The user then can either terminate the initialization or retry.
"Validate system configuration" include steps for obtaining the reference config versions and comparing that to the system config, then download the new config files if necessary and then update the system configs. Each step might have a failure resulting in some sort of message to user and termination of the sequence. In some cases user should be able to skip the failed steps and proceed without doing that activity.
Same goes for every other activity in the diagram; many steps with exception or alternative paths.
Can I map these on one sequence diagram for the "Initialize" Use case?
My attempt to put all these on one sequence diagram failed.
I tried putting all these interactions on an activity diagram with swimlanes but things got so complex that stakeholders have a hard time understanding what is going on.
Maybe I'm trying to put too much details at the system level. Should I leave all these interim steps and interaction for the lower level of design? Should I create a hierarchy of use cases and roll down the complexity? I'm confused. :(
What is the best way to deal with such level of complexity? Could you provide some good examples.

The only way to represent a complex use case, where every step of the main flow can have multiple scenarios, is fortunately very simple:
The complexity of the scenarios does not change anything to the simplicity of the actor's goals. And if the goals are not sufficiently simple, you'd probably looking at too much details. Or the things are not as clear as they should.
The scenarios are often represented with a set of sequence diagrams. But if it gets really complex you'd better show the flow with an activity diagram.
By the way, you do not need to create an artificial extending or included use-case for the sake of modelling common steps. You may just create a separate activity diagram for the common part. Then, in each of your use-case activity diagram, you'd insert a call action of the common activity. This also avoids to misleadingly include the common part in the description of one UC and forget it for the others.
Last but not least, you also want to develop user-stories based on the use-case scenario. This is a mixed approach that requires some more thoughts:
user-stories are generally used without use-cases. Complex erquirements are described as an epic. The epic would then successfully be refine it into user-stories, that fit in an iteration;
it is possible to structure such user-stories according to stakeholder goals and tasks. THis approach is called user-story mapping. This is closer to the use-case, but there is no term to describe the higher-level goals.
use-case driven development is generally used without user-stories: the scenarios and activity directly lead to development without intermeriate user-stories.
Fortunately, the Use-Case 2.0 approach allows to combine both ways. Read the linked whitebook: it's short, it's free, it's written by the inventor of use-cases together with leading authors of use-case methodology; it offers a reegineered appraoch that allows agile developments, using use-case for the big picture and using use-case slices to break it down dynamically into units that can be developped in one iteration.

A complex use case can remain a single use case, but it may need multiple diagrams to specify its flows.
Your activity diagram (although not 100% UML compliant) gives a good overview of the flow of the use case. Keep this as the main diagram. I would decompose the complex steps in separate diagrams. To indicate that a step is decomposed in a separate diagram, you can display a rake symbol, as follows:
See UML 2.5.1 specification, section 16.3.4.1 for more information.

UML: activity diagram control flow

I have found in internet the following activity diagram:
I can't understand why from Recieve order action there are two control flows (to Ship order and Bill customer). Are they parallel? Then why there is NOT fork. How to understand this diagram? Please, explain.

This is simply a bad example.
Judging from the context there should be a fork to indicate that both Ship order and Bill customer should happen in parallel.
Then there should be a Join before Send confirmation to indicate that both flows should have finished before the Send confirmation is executed.

It's perfectly valid UML 2. The first action requires no tokens, so it can start immediately. When the first action finishes, it offers tokens to two other actions and they can start. The last action cannot start until all required tokens are offered. When the last action is done, the containing activity is also done.
Forks just copy tokens. Joins just merge tokens. Thus, forks and joins are often unnecessary.
Please see Conrad Bock's excellent series of articles on activity diagrams in the Journal of Object Technology.

I did not check if this is a recent amendment to the UML specification but the current version 2.5 (chapter 15.2.3.2) states (emphasis set on my own):
As an ActivityNode may be the source for multiple ActivityEdges, the same token can be offered to multiple targets. However, the same token can only be accepted at one target at a time (unless it is copied, whereupon it is not the same token, see ForkNodes in sub clause 15.3 and ExecutableNodes in sub clause 15.5). If a token is offered to multiple ActivityNodes at the same time, it shall be accepted by at most one of them, but exactly which one is not completely determined by the Activity flow semantics. This means that an Activity model in which non-determinacy occurs may be subject to timing issues and race conditions. It is the responsibility of the modeler to avoid such conditions in the construction of the Activity model, if they are not desired.
From that point of view, I would argue that #Jim L.'s answer might be deprecated. I assume that, at least in the current version of UML, the diagram in question does not reflect the modeler's intention. A fork seems to be not only the clean way but the only right way to go, now.

Some questions regarding Context and DataFlow Diagrams

I have to develop a CRUD application, that will be coded in php.
I have 3 main actors (Users, Administrators and Doctors -- this is for an hypothetical
hospital), each one with different Use Cases already defined.
Although I feel the Use Cases are more than enough to successfuly model the Class Diagram, I am being specifically asked to also include DataFlow Diagrams into the project's documentation.
I've been reading about DataFlow diagrams, and it seems you usually have first of all a Level 0 DataFlow Diagram, to which they call Context Diagram.
Being that this is basically a 3-tier application with 3 different Actors, how should I model the Context Diagram?
Being that a Context Diagram is supposed to just tell us what comes in and what comes out of our System, I can't imagine anything more interesting/descriptive than the following diagram:
Is this supposed to be something like this, or am I totally missing the point? This php page will connect to an Oracle database, but I guess that if the idea is to consider the System as a whole in the Context Diagram, I should "hide" that fact in the above diagram.
Where should I go on from here? I know I should "zoom" the System process to something more detailed. Maybe the next step would be to depict each one of the User Cases in a DataFlow diagram? Do I include repositories of data, already? For example, one for Users, other for Doctors and yet another for Administrators?
Thanks

Are you sure there's nothing else the system interacts with? e.g. diagnostics input, etc.?
If not then your context diag is basically ok - although I'd probably show each entity once and use double headed arrows. I'd agree with your reasoning for the db - it's part of the system, not external to it - so don't show it on the CD.
As for next steps, again you're on the right lines. Try modelling the flow for each Use Case as a DFD. DFDs are very useful for illustrating processing-intensive apps. Difficult to know if that's a good match to your problem or not.
You'll find DFDs are also useful for driving out and validating your Class Diagram. In fact, that's one of their strengths: datastores on the DFD should correlate with the contents of your class diagram (not necessarily one datastore to one class though). So do include datastores as you work through the processes. You'll find it drives out more than just the Actors.
hth.

Some remarks:
You DFD does not tell me much, except that Users, Administrators and Doctors use it, but it gives me no clue what they get from the system (except "Output Data"). IOW the context diagram does not give me the slightest idea, what the system does.
Admittedly, if the system is large, then it can be difficult to describe the dataflows in few words, but just about anything is better than "data".
The fact that the system is a 3tier architecture is irrelevant for the DFD. This is an implementation detail. DFDs are an analysis tool. You describe what you want the system to do, not how this is achieved.
I find it particularly useful to focus on the outgoing flows. While Users, Administrators and Doctors provide input to the system, this is most likely nothing they want to do. It is something they have to do in order to get the desired output.

When should one use the Actor model?

When should the Actor Model be used?
It certainly doesn't guarantee deadlock-free environment.
Actor A can wait for a message from B while B waits for A.
Also, if an actor has to make sure its message was processed before moving on to its next task, it will have to send a message and wait for a "your message was processed" message instead of the straightforward blocking.
What's the power of the model?

Given some concurrency problem, what would you look for to decide whether to use actors or not?
First I would look to define the problem... is the primary motivation a speedup of a nested for loop or recursion? If so a simple task based approach or parallel loop approach will likely work well for you (rather than actors).
However if you have a more complex system that involves dependencies and coordinating shared state, then an actor approach can help. Specifically through use of actors and message passing semantics you can often avoid using explicit locks to protect shared state by actually making copies of that state (messages) and reacting to them.
You can do this quite easily with the classic synchronization problems like dining philosophers and the sleeping barbers problem. But you can also use the 'actor' to help with more modern patterns, i.e. your facade could be an actor, your model view and controller could also be actors that communicate with each other.
Another thing that I've observed is that actor semantics are learnable by most developers and 'safer' than their locked counterparts. This is because they raise the abstraction level and allow you to focus on coordinating access to that data rather than protecting all accesses to the data with locks. As an example, imagine that you have a simple class with a data member. If you choose to place a lock in that class to protect access to that data member then any methods on that class will need to ensure that they are accessing that data member under the lock. This becomes particularly problematic when others (or you) modify the class at a later date, they have to remember to use that lock.
On the other hand if that class becomes an actor and the data member becomes a buffer or port you communicate with via messages, you don't have to remember to take the lock because the semantics are built into the buffer and you will very explicitly know whether you are going to block on that based on the type of the buffer.
-Rick

The usage of Actor is "natural" in at least two cases:
When you can decompose your problem in a set of independent tasks.
When you can decompose your problem in a set of tasks linked by a clear workflow (ie. dataflow programming).
For instance, if you process complex data using a series of filters, it is easy to use a pipeline of actors where each actor receives data from an upstream actor and sets data to a downstream actor.
Of course, this data-flow must not be linear and if a step is slow in your pipeline, instead you can use a pool of actors doing the same job. Another way of solving the load balancing problems would be to use instead a demand-driven approach organized with a kind of virtual Kanban system.
Of course, you will need synchronization between actors in almost all interesting cases, but contrary to the classic multi-thread approach, this synchronization is really "concrete". You can imagine guys in a factory, imagine possible problems (workers run out of the job to do, upstream operations is too fast and intermediate products need a huge storage place, etc.) By analogy, you can then find a solution more easily.

I am not an actor expert but here is my 2 cents when to use actor model:
Actor model is not suited for every concurrent application, for instance if you are creating an application which is multi threaded and works in high concurrency actor model is not made to solve the concurrency issue.
Where actors really comes into play is when you are creating an event driven application. For instance you have an application and you are tracking what are users clicking in your application realtime. You can use actors to do activities realtime segregated by user, device or anything of your business requirement as actors are stateful. So, for example if some users lies in actors which clicked on shirts you can send them notification of some coupon.
Also some applications where actors comes handy are : Finance (Pricing, fraud detection), multiplayer gaming.

Actors are asynchronous and concurrent but does not guarantee message order or time limit as to when the message may be acted upon. Hence atomic transactions cannot be split into Actors.
If the application/task involves no mutable state then Actors are overkill as Actor frameworks go to great lengths to avoid race conditions.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string