Difference between business process and events in hybris - sap-commerce-cloud

Please help me understand the difference between business process and events in hybris. What is the advantage of using Business Process over events?

The Hybris Process Engine is used for defining business processes. It is similar to a workflow (think of a workflow diagram): it has a sequence/flow to be followed and uses different kinds of nodes:
Action: carries out process logic and chooses which transition to follow next
Wait: waits for a subprocess or an external process result
Notify: informs a user or user group about the state of a process
Split: splits the process into parallel paths
End: ends the process and stores the final state in the process item
Hybris also has a Workflow System. It is separate from the Process Engine and uses different classes, but it is conceptually similar. A business process involves no human intervention, whereas a workflow can.
The Event System, on the other hand, is simply for publishing and receiving events. It is essentially the Observer design pattern.
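To make the Observer analogy concrete, here is a minimal sketch of a custom event and listener. It assumes the standard servicelayer event classes; the event name and payload are made up for illustration, and in practice each class would live in its own file and the listener would be registered as a Spring bean.

```java
import de.hybris.platform.servicelayer.event.events.AbstractEvent;
import de.hybris.platform.servicelayer.event.impl.AbstractEventListener;

// Hypothetical event, published by your own code via EventService.publishEvent(...)
class OrderPlacedEvent extends AbstractEvent {
    private final String orderCode;

    OrderPlacedEvent(final String orderCode) {
        this.orderCode = orderCode;
    }

    String getOrderCode() {
        return orderCode;
    }
}

// Observer: gets notified whenever the event is published, nothing more.
class OrderPlacedEventListener extends AbstractEventListener<OrderPlacedEvent> {
    @Override
    protected void onEvent(final OrderPlacedEvent event) {
        // react to the event, e.g. start a business process for the order
    }
}
```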
OFFICIAL REFERENCES:
The SAP Commerce processengine: https://help.sap.com/viewer/d0224eca81e249cb821f2cdf45a82ace/1905/en-US/8c30e9ae86691014a36ed5fd11e24a1e.html
workflow Extension: https://help.sap.com/viewer/d0224eca81e249cb821f2cdf45a82ace/1905/en-US/8c878e7286691014b3aaf108edc38cca.html
Event System: https://help.sap.com/viewer/d0224eca81e249cb821f2cdf45a82ace/1905/en-US/8bbbc04e866910149e93ca9faad254eb.html

Related

Service Fabric Reliable Actors and I/O Operations

The Service Fabric Reliable Actors Introduction documentation provided by Microsoft states that Actors should not "block callers with unpredictable delays by issuing I/O operations."
I'm a bit unsure on how to interpret this.
Does this imply that I/O is ok so long as the latency of the request is predictable?
or
Does this imply that the best practice is that Actors should not make any I/O operations outside of Service Fabric? Like for example: to some REST API or to write to some sort of DB, data lake or event hub.
Technically, it is a bit of both.
Because actors are single-threaded, only one operation can happen in the actor at a time.
The SF Actor uses the Ask approach, where every call expects an answer: callers make calls and wait for the response. If the actor receives too many calls from clients and also depends heavily on external components, it will take too long to process each call, and the other client calls will queue up and probably fail at some point because they wait too long and time out.
This wouldn't be as big an issue for actors using the Tell approach, like Akka, because the caller does not wait for an answer; it just sends the message to the mailbox and receives a message back with the answer (when applicable). But the latency between request and response will still be a problem, because too many messages are pending for a single actor to process. On the other hand, it can increase complexity when one command fails and two or three follow-up events have already been triggered before you know the answer to the first (not the scope here, but you can relate this to the example below).
Regarding the second point, the main idea of an actor is to be self-contained; if it depends too much on external dependencies, you should probably rethink the design and evaluate whether an actor is actually the best fit for the problem.
Self-contained actors are scalable: they don't depend on an external state manager to manage their own state, they don't depend on other actors to accomplish their tasks, and they can scale independently of each other.
Example:
Actor1 (of ActorTypeA) depends on Actor2 (of ActorTypeB) to execute an operation.
To make it more human friendly let's say:
ActorTypeA is an Ecommerce Checkout Cart
ActorTypeB is a Stock Management
Actor1 is the Cart for user 1
Actor2 is the Stock for product A
Whenever a client (user) interacts with his checkout cart, adding or removing products, he sends add and remove commands to Actor1 to manage his own cart. In this scenario the dependency is one to one: when another user navigates to the website, another actor is created for him to manage his own cart. In both cases they have their own actors.
Let's now say that whenever a product is placed in a cart, it is reserved in stock to avoid double-selling the same product.
In this case, both actors will try to reserve products in Actor2, and because of the single-threaded nature of actors, only the first one will succeed; the second will wait for the first to complete and will fail if the product is no longer in stock. Also, the second user won't be able to add or remove any products in his cart, because his first operation is still waiting to complete. Now scale these numbers to the thousands and watch how quickly the problem grows and scalability breaks down.
This is just a small and simple example. So the second point is not just about external dependencies; it also applies to internal ones. Every operation outside the actor reduces its scalability.
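A rough, framework-agnostic Java sketch of the cart/stock example above. This is not the Reliable Actors API; the single-thread executors merely mimic the turn-based concurrency described here, and all names are illustrative.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Each "actor" processes one call at a time, mimicking turn-based concurrency.
class StockActor {
    private final ExecutorService turn = Executors.newSingleThreadExecutor();
    private int available = 1;

    CompletableFuture<Boolean> reserve(String productId) {
        return CompletableFuture.supplyAsync(() -> {
            if (available > 0) { available--; return true; }
            return false;
        }, turn);
    }
}

class CartActor {
    private final ExecutorService turn = Executors.newSingleThreadExecutor();
    private final StockActor stock; // external dependency shared by ALL carts

    CartActor(StockActor stock) { this.stock = stock; }

    CompletableFuture<Boolean> addProduct(String productId) {
        return CompletableFuture.supplyAsync(() -> {
            // The cart's single turn blocks here until the stock actor answers.
            // Every other call to this cart (add/remove) queues up behind it.
            return stock.reserve(productId).join();
        }, turn);
    }
}
```

With one shared StockActor and thousands of CartActor instances, every cart's throughput is capped by the stock actor's single turn, which is exactly the scalability failure described above.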
That said, you should avoid external (outside the actor) dependencies as much as possible. It is not a crime to have them when needed, but you reduce scalability whenever an external dependency limits the actor's ability to scale independently.
This other SO question I've answered might also be interesting to you.

DDD - Using a Process Manager or a Domain Service

I am new to DDD and I am implementing it on part of my application because some of the application's requirements lead me to CQRS with Event Sourcing (I need a history of the events that occurred in the system, plus the ability to view the state of the system in the past).
One question I have after reading Vaughn Vernon's book and his Effective Aggregate Design series is: what is the difference between a Process Manager (long-running process) and a Domain Service? Especially when you have navigation properties towards an Aggregate inside another Aggregate.
I'll explain what I have understood :
- Domain Services are made to hold logic that does not belong in any Aggregate. According to Vaughn, they can also be used to pass entity references to the aggregate that contains them. They may also be used to manage transactions, as those cannot be handled inside a Domain Object.
- Process Managers are made to orchestrate modifications to the system that span different aggregates. Some people are saying that a good Process Manager is actually an Aggregate Root. From my understanding it does not manage transactions, since events are launched after changes are committed. It uses eventual consistency: eventually all the changes will have occurred.
Now, to put everything in context. The core of the application I am building is to handle a tree of Nodes that contains their own logic. We need to be able to add Nodes to the Tree and of course to create those Nodes.
We need to be able to know what happened to those Nodes, i.e. we need to be able to retrieve the events linked to a node.
Also, a modification made to one of the leaves (depending on the kind of modification) will have to be replicated to the other Nodes that are parents of this node.
What are my aggregates :
- Nodes, which are what my tree contains. In my opinion this is an aggregate for several reasons. A Node is not invariant, therefore not a Value Object. Nodes have their own domain logic that assigns values to their properties and Value Objects, and we need to be able to access them by Id.
- A representation of a non-binary Tree made of Nodes. Right now I designed this as my Aggregate Root, and it is effectively a Process Manager. The Tree contains a logical representation of the tree; it holds the root of the tree. This root is actually an object (I am not sure it can be called a Value Object because it contains references towards other aggregates, the child Nodes, but it certainly feels like one). The Node object in the Tree contains basic information such as the Node name and a reference to the actual Aggregate (this almost sounds like two Bounded Contexts?).
Using that approach this is what is happening :
- After executing the command to create a Node, a Node is created and committed. The NodeCreated event is launched and caught by the corresponding handler, which retrieves the Tree (process manager) associated to this node and adds the node in the correct place (using the parent id property of the Node).
- After executing the command to modify a Node, the node is modified and committed. The NodeModified event is launched and caught by the handler. The handler then retrieves the Tree (my process manager), finds all the parent Nodes of the modified Node, and asks those Nodes to modify their own properties based on what was modified on the child Node. This all makes perfect sense and looks almost beautiful to me, showing the power of events and the separation of domain logic.
But my principal issue here is with the transaction. What happens if an error occurs while updating the Tree and the node that has to be modified or added? The event for the Node is already saved in the Event Store, because it was committed. So would I have to create a new event to revert the modifications? I know that commands have to be valid when entering the system, so it would not be a validation issue, and the chances of something like this happening are about 1 in a million. Does that mean we should not take that possibility into account?
The transaction issue is why I feel like I should use a Service, either an Application Service (here a command handler) or a Domain Service, to orchestrate the modifications and do them in a single transaction. If something fails during this transaction, nothing is created or modified, but that breaks the DDD rule saying that I should not modify several Aggregates in the same transaction. This somehow looks like a less elegant solution.
I really feel like I am missing something here but I am not quite sure what it is.
Some people are saying that a good Process Manager is actually an Aggregate Root
From my point of view this is not correct. A Process Manager or a Saga coordinates a long-running business process that spans multiple Aggregate instances. It eventually brings the system into a valid final state. It does not emit events; it responds to events and creates Commands that arrive at the Aggregates (possibly through a Command handler, depending on your exact architecture). The architects who say that have failed to correctly identify their Aggregate boundaries.
A Process manager/Saga could be stateful - but just to remember the progress that it has made; it can have a Process ID; it can even be Event-sourced.
Process Managers are made to orchestrate modifications to the system that span different aggregates.
Yes, this is correct.
After executing the command to modify a Node, the node is modified and committed.
When you design your Aggregates you must take into consideration only the protection of invariants, of the business rules that exist on the write/command side of the architecture; this is the side that produces the state transitions and emits the events in the case of event-driven architectures.
The single business rule, if any, that I have identified in your specific case is that when a node is created (which seems like a CRUD operation!) the NodeCreated event is emitted; similarly for NodeModified. So these operations exist on the write/command side.
The NodeModified event is launched and caught by the handler. The handler then retrieves the Tree (my process manager), finds all the parent Nodes of the modified Node, and asks those Nodes to modify their own properties based on what was modified on the child Node
Are there any business rules on the write side regarding the updating of the parent nodes? I don't see any. Of course, something is updated after a Node is created, but it is not an Aggregate; it is a Read model. The handler that is called is in fact a Read model: it projects the NodeXXX events onto a Tree of Nodes.
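A minimal sketch of that idea, where the handler is just a projection building an in-memory tree from the events. The event and class names are illustrative, not taken from the question.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical events, as emitted by the write side.
record NodeCreated(String nodeId, String parentId, String name) {}
record NodeModified(String nodeId, String change) {}

// Read model: projects NodeXXX events onto a tree of nodes. No invariants,
// no commands, no transactions spanning aggregates - it only reshapes data.
class TreeProjection {
    static final class TreeNode {
        final String id;
        final String parentId;
        String name;
        TreeNode(String id, String parentId, String name) {
            this.id = id; this.parentId = parentId; this.name = name;
        }
    }

    private final Map<String, TreeNode> nodes = new HashMap<>();

    void on(NodeCreated e) {
        nodes.put(e.nodeId(), new TreeNode(e.nodeId(), e.parentId(), e.name()));
    }

    void on(NodeModified e) {
        TreeNode node = nodes.get(e.nodeId());
        if (node == null) return;
        // apply the change to the node and, if needed, walk up parentId links
        // to update derived data on ancestors - purely a read-side concern.
    }
}
```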
I really feel like I am missing something here but I am not quite sure what it is.
You may have over complicated your domain model.
Domain Services are typically service providers that give the domain model access to (cached) state or capabilities it wouldn't normally have. For instance, we might use a domain service to give the model access to a cached tax table, so that it can compute the tax on an order; or we might use a domain service to give the model access to a notifyCustomer capability that is delegated to the email infrastructure.
Process Managers are usually used for orchestration - they are basically state machines that look at what has happened (events) and suggest additional commands to run. See Rinat Abdullin's description.
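A sketch of that shape, assuming a hypothetical command bus and made-up message names; the point is only that a process manager consumes events and emits commands, never touching the read model.

```java
// Hypothetical messages; the only dependency is a way to dispatch commands.
record PaymentReceived(String orderId) {}
record OrderShipped(String orderId) {}
record ShipOrder(String orderId) {}

interface CommandBus { void send(Object command); }

// A process manager is a small state machine: events in, commands out.
class OrderFulfillmentProcess {
    enum State { AWAITING_PAYMENT, AWAITING_SHIPMENT, DONE }

    private State state = State.AWAITING_PAYMENT;
    private final CommandBus commands;

    OrderFulfillmentProcess(CommandBus commands) { this.commands = commands; }

    void on(PaymentReceived event) {
        if (state == State.AWAITING_PAYMENT) {
            state = State.AWAITING_SHIPMENT;
            commands.send(new ShipOrder(event.orderId()));
        }
    }

    void on(OrderShipped event) {
        if (state == State.AWAITING_SHIPMENT) {
            state = State.DONE;
        }
    }
}
```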
What happens if an error occurs while updating the Tree and the node that has to be modified or added? The event for the Node is already saved in the Event Store, because it was committed. So would I have to create a new event to revert the modifications?
Maybe - compensating events are a common pattern.
The main point is this: there is no magic in orchestrating a change across multiple transactions. Think about how you would arrange a UI that displays to a human operator what's going on, what should happen next, and what the failure modes would be.
the chances of something like this happening are about 1 in a million. Does that mean we should not take that possibility into account?
Depends on the risk to the business. But as Greg Young points out in his talk Stop Over Engineering, if you can just escalate that 1 in a million problem to the human beings to sort out, you may have done enough.

EventSourcing gateways (synchronize with external systems)

Are there best practices for the implementation of event sourcing gateways? By gateway I mean infrastructure or a service that generates a set of events based on the status returned by some external service.
Even if an application is based on event sourcing, some external, uncontrollable entities can still be present. For example, you want to synchronize a user list from Azure AD, so you query the service, which returns the user list. Then you get the user list from a projection, compute the difference with the external state, and produce events to fill this difference.
Or your application is an online shop and you need to import current USD/EUR/bitcoin rates to show prices. A gateway can poll some currency provider and produce an event. In the simple case this is very easy, but if the projection state is a more complex structure, a trivial import is not obvious.
Is there perhaps a common approach for this case?
Building integration adapters that use poll-emit is normal, and I personally prefer this way of doing integrations in general.
However, this has little to do with event sourcing. What you actually need to solve your integration problem is to simulate the external system emitting events on its own, so that you can build a reactive system that consumes these events.
When these events come to your system from the adapter, you can do whatever you want with them. Essentially, event sourcing assumes that you store your own objects' state in event streams, but when an event comes from some external system, it is not your state. You can derive your system state from external events, but those stored events will be your own events.
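A rough sketch of such a poll-emit adapter for the Azure AD example above; DirectoryClient, the event types, and the publisher interface are all hypothetical.

```java
import java.util.Set;

// Hypothetical external client and event publishing abstractions.
interface DirectoryClient { Set<String> listUserIds(); }
interface EventPublisher { void publish(Object event); }

record ExternalUserAppeared(String userId) {}
record ExternalUserDisappeared(String userId) {}

// The adapter keeps a snapshot of what it last saw and emits *our* events
// describing the difference - the external system's state stays external.
class UserDirectoryAdapter {
    private final DirectoryClient directory;
    private final EventPublisher events;
    private Set<String> lastSeen = Set.of();

    UserDirectoryAdapter(DirectoryClient directory, EventPublisher events) {
        this.directory = directory;
        this.events = events;
    }

    void poll() {
        Set<String> current = directory.listUserIds();
        for (String id : current) {
            if (!lastSeen.contains(id)) events.publish(new ExternalUserAppeared(id));
        }
        for (String id : lastSeen) {
            if (!current.contains(id)) events.publish(new ExternalUserDisappeared(id));
        }
        lastSeen = current;
    }
}
```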

Why can't sagas query the read side?

In a CQRS Domain Driven Design system, the FAQ says that a saga should not query the read side (http://cqrs.nu). However, a saga listens to events in order to execute commands, and because it executes commands, it is essentially a "client", so why can't a saga query the read models?
Sagas should not query the read side (projections) for information they need to fulfill their task. The reason is that you cannot be sure the read side is up to date. In an eventually consistent system, you do not know when the projection will be updated, so you cannot rely on its state.
That does not mean that sagas should not hold state. Sagas do in many cases need to keep track of state, but then the saga should be responsible for creating that state. As I see it, this can be done in two ways.
It can build up its state by reading the events from the event store. When it receives an event that it should trigger on, it reads all the events it needs from the store and builds up its state in a similar manner to an aggregate. This can be made performant in Event Store by creating new streams.
The other way is to continuously listen to events from the event store and build up state, storing it in some data storage the way projections do. Just be careful with this approach: you cannot replay sagas the same way you replay projections. If you need to change the way you store state and want to rebuild it, make sure that you do not re-execute the commands that you have already executed.
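A sketch of the first option, where the saga hydrates its state from its own event stream before reacting. The store interface, stream naming, and event types are placeholders.

```java
import java.util.List;

// Hypothetical event store access limited to reading a single stream.
interface EventStream { List<Object> readAll(String streamId); }

record ItemReserved(String orderId, String sku) {}
record PaymentConfirmed(String orderId) {}

class OrderSaga {
    private boolean paymentConfirmed;
    private int reservedItems;

    // Rebuild state from the saga's own stream instead of querying projections,
    // so decisions are never based on a possibly stale read model.
    void hydrate(EventStream store, String orderId) {
        for (Object event : store.readAll("order-saga-" + orderId)) {
            apply(event);
        }
    }

    void apply(Object event) {
        if (event instanceof PaymentConfirmed) paymentConfirmed = true;
        if (event instanceof ItemReserved) reservedItems++;
    }

    boolean readyToShip() {
        return paymentConfirmed && reservedItems > 0;
    }
}
```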
Sagas use the command model to update the state of the system. The command model contains business rules and is able to ensure that changes are valid within a given domain. To do that, the command model has all the information available that it needs.
The read model, on the other hand, has an entirely different purpose: It structures data so that it is suitable to provide information, e.g. to display on a web page.
Since the saga has all the information it needs through the command model, it doesn't need the read model. Worse, using the read model from a saga would introduce additional coupling and increase the overall complexity of the system considerably.
This does not mean that you absolutely cannot use the read model. But if you do, be sure you understand the consequences. For me, that bar is quite high, and I have always found a different solution yet.
It's primarily about separation of concerns. Process managers (sagas) are state machines responsible for coordinating activities. If the process manager wants to effect a change, it dispatches commands (asynchronously).
Also: what is the read model? It's a projection of a bunch of events that already happened. So if the processor cared about those events... shouldn't it have been subscribing to them all along? So there's a modeling smell here.
Possible issues:
The process manager should have been listening to earlier messages in the stream, so that it would be in the right state when this message arrived.
The current event should be richer (so that the data the process manager "needs" is already present).
... variation - the command handler should instead be listening for a different event, and THAT one should be richer.
The query that you want should really be a command to an aggregate that already knows the answer
and failing all else
Send a command to a service, which runs the query and dispatches events in response. This sounds weird, but it's already common practice to have a process manager dispatch a message to a scheduling service, to be "woken up" when some fixed amount of time passes.
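A sketch of that last fallback, with made-up message and service names; the shape mirrors the scheduling/wake-up example: the process manager sends a command, and the service that owns the data answers with an event rather than being queried through a read model.

```java
// Hypothetical messages exchanged between the process manager and the service.
record CheckInventoryLevel(String sku) {}
record InventoryLevelChecked(String sku, int quantity) {}

interface MessageBus {
    void send(Object command);   // used by the process manager
    void publish(Object event);  // used by the service to answer
}

// Stands in for the service that owns the data and runs the query.
class InventoryQueryService {
    private final MessageBus bus;

    InventoryQueryService(MessageBus bus) { this.bus = bus; }

    void handle(CheckInventoryLevel command) {
        int quantity = lookUpQuantity(command.sku()); // the service's own data
        bus.publish(new InventoryLevelChecked(command.sku(), quantity));
    }

    private int lookUpQuantity(String sku) { return 0; /* placeholder */ }
}
```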

Use cases of the Workflow Engine

I'd like to know about specific problems you - the SO reader - have solved using Workflow Engines and what libraries/frameworks you used if you didn't roll your own. I'd also like to know when a Workflow Engine wasn't the best choice and if/how you chose something simpler, like a TaskList/WorkList/Task-Management type application using state machines.
Questions:
What problems have you used workflow engines to solve?
What libraries/frameworks did you use?
When did a simpler State Machine/Task Management like system suffice?
Bonus: How did/do you make the distinction between Task Management and Workflow Engine?
I'm looking for first-hand experiences.
Some of the resources I've checked out:
Ruote and State Machine != Workflow Engine
StonePath and Docs
Creating and Managing Worklist Task Plans with Oracle
Design and Implementation of a Workflow Engine - Thesis
What to use Windows Workflow Foundation For
JBoss jBPM Docs
I'm biased as well, as I am the main author of StonePath.
I have developed workflow applications for the U.S. State Department, the Geneva Centre for Humanitarian Demining, several Fortune 500 clients, and most recently the Washington, DC Public School System. Every time I have seen a 'workflow engine' that tried to be the one master reference for business processes, I have seen an organization fighting itself to work around the tool. This may be due to the fact that these solutions have always been vendor/product driven, and they end up with a tactical team of 'consultants' constantly feeding the app... but because of this, I tend to react negatively when I hear the benefits of process-based tools that promise to 'centralize the workflow definitions in one place and make them repeatable'.
I very much like Ruote - I have been following that project for some time, and should I need that kind of solution, it will be the next tool I'll try. StonePath has a very different purpose than Ruote:
Ruote is useful to Ruby in general,
StonePath is aimed at Rails, the web framework written in Ruby.
Ruote is about long-lived business processes and their associated definitions (Note - active development on ruote ceased).
StonePath is about managing State-based workflow and tasking.
Frankly, I think the distinction from the outside looking in might be subtle - many times the same kinds of business processes can be represented either way - the state-and-task-based model tends to map to my mental model though.
Let me describe the highlights of a state-based workflow.
States
Imagine a workflow revolving around the processing of something like a mortgage loan or a passport renewal. As the document moves 'around the office', it travels from state to state.
If you are responsible for the document, and your boss asked you for a status update you'd say things like
"It is in data entry"...
"We are checking the applicant's credentials now"...
"we are awaiting quality review"...
"We are done"... and so on.
These are the states in a state-based workflow. We move from state to state via transitions - like "approve", "apply", "kickback", "deny", and so on. These tend to be action verbs. Things like this are modeled all the time in software as a state machine.
Tasks
The next part of a state/task-based workflow is the creation of tasks.
A Task is a unit of work, typically with a due date and handling instructions, that connects a work item (the loan application or passport renewal, for instance) to a user's "inbox".
Tasks can happen in parallel with each other or sequentially.
Tasks can be created automatically when we enter a state.
Tasks can be created manually as people realize work needs to get done.
Tasks can be required to be completed before we can move on to a new state.
This kind of behavior is optional, and part of the workflow definition.
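A compressed sketch of those two pieces in Java: states, transitions as action verbs, and tasks spawned automatically on state entry. The names echo the passport/mortgage example above and are not StonePath's actual API.

```java
import java.util.ArrayList;
import java.util.List;

// States of the document's lifecycle.
enum State { DATA_ENTRY, CREDENTIAL_CHECK, QUALITY_REVIEW, DONE, DENIED }

// A task connects the work item to someone's inbox.
record Task(String instructions, State owningState) {}

class WorkItem {
    private State state = State.DATA_ENTRY;
    private final List<Task> tasks = new ArrayList<>();

    State state() { return state; }
    List<Task> tasks() { return tasks; }

    // Transitions are the action verbs: apply, approve, kickback, deny.
    void apply()    { moveTo(State.CREDENTIAL_CHECK, "Verify applicant credentials"); }
    void approve()  { moveTo(State.QUALITY_REVIEW,   "Perform quality review"); }
    void kickback() { moveTo(State.DATA_ENTRY,       "Correct the data entry"); }
    void deny()     { state = State.DENIED; }
    void finish()   { state = State.DONE; }

    // Entering a state automatically creates a task for that state.
    private void moveTo(State next, String instructions) {
        state = next;
        tasks.add(new Task(instructions, next));
    }
}
```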
The rabbit hole can go a lot deeper than this, and I wrote an article about it for Issue #4 of PragPub, the Pragmatic Programmer's Magazine. Check out the repo link above for an updated PDF of that article.
In working with StonePath the last few months, I have found that the state-based model maps really well to restful web architectures - in particular, the tasks and state transitions map nicely as nested resources. Expect to see future writing from me on this subject.
I'm biased, I'm one of the authors of ruote.
variant 1) state machine attached to a resource (document, order, invoice, book, piece of furniture).
variant 2) state machine attached to a virtual resource named a task
variant 3) workflow engine interpreting workflow definitions
Now, your question is tagged "BPM", which can be expanded into "Business Process Management". How does that kind of management occur in each of the variants?
In variant 1, the business process (or workflow) is scattered in the application. The state machine attached to the resource enforces some of the aspects of the workflow, but only those related to the resource. There may be other resources with their own state machine following the same business process.
In variant 2, the workflow can be concentrated around the task resource and represented by the state machine around that resource.
In variant 3, the workflow is enacted by interpreting a resource called a workflow definition (or business process definition).
What happens when the business process changes ? Is it worth having a workflow engine where business processes are manageable resources ?
Most state machine libraries have one set of states + transitions. Workflow engines are, most of them, workflow definition interpreters, and they allow multiple different workflows to run together.
What will be the cost of changing the workflow ?
The variants are not mutually exclusive. I have seen many examples where a workflow engine changes the state of multiple resources some of them guarded by state machines.
I also use variants 3 + 2 a lot for human tasks: the workflow engine, at certain points when running a process instance, hands a task (workitem) to a human participant (a resource task is created and placed in the state 'ready').
You can go a long way with variant 2 alone (the task manager variant).
We could also mention variant 0), where there is no state machine, no workflow engine, and the business process(es) are scattered and/or hardcoded in the application.
You can ask many questions, but if you don't take the time to read the answers and don't take the time to try out and experiment, you won't go very far, and will never acquire any flair for when to use this or that tool.
On a previous project I was working on, I added some workflow-type rules to a set of government forms in the healthcare industry.
Forms needed to be filled out by the end user, and depending on some answers, other forms were scheduled to be filled out at a later date. There were also external events that would cancel scheduled forms or schedule new ones.
Sample flow:
Patient Admitted -> Schedule Initial Assessment Form -> Schedule Quarterly Review Form -> Patient Died -> Cancel Review -> Schedule Discharge Assessment Form
Many other rules were based on things such as patient age, where they were being admitted, etc.
This was an ASP.NET app; the rules were basically a table in the database. I added scripting, so a script would run on form completion to determine what to do next. This was a horrid design, and would have been a perfect fit for a proper workflow engine.
I'm one of the authors of the open source Temporal Workflow Engine, which we initially developed at Uber as Cadence. The difference between Temporal and the majority of existing workflow engines is that it is developer focused and is extremely flexible and scalable (to tens of thousands of updates per second and up to billions of open workflows). Workflows are written as object-oriented programs, and the engine ensures that the state of the workflow objects, including thread stacks and local variables, is fully preserved in case of host failures.
What problems have you used workflow engines to solve?
Temporal is used for practically any backend application that lives beyond a single request reply. Examples of usage are:
Distributed CRON jobs
Managing ML/Data pipelines
Reacting to business events. For example trip events at Uber. The workflow can accumulate state based on events received and execute activities when necessary.
Service deployment to Mesos/Kubernetes
CI Pipeline implementation
Ensuring that multiple service calls complete when a request is received. Including SAGA pattern implementation
Managing human worker tasks (similar to Amazon MTurk)
Media processing
Customer Support Ticket Routing
Order processing
Testing service similar to ChaosMonkey
and many others
The other set of use cases is based on porting existing workflow engines to run on Temporal. Practically any existing engine's workflow specification language can be ported to run on Temporal. This way a single backend service can power multiple domain-specific workflow systems.
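For a feel of the programming model, here is a minimal sketch using the Temporal Java SDK; the order-processing workflow, activity names, and timeout are hypothetical examples, not something from the answer above.

```java
import io.temporal.activity.ActivityInterface;
import io.temporal.activity.ActivityOptions;
import io.temporal.workflow.Workflow;
import io.temporal.workflow.WorkflowInterface;
import io.temporal.workflow.WorkflowMethod;
import java.time.Duration;

// Hypothetical activities; each call is executed and retried by the engine.
@ActivityInterface
interface OrderActivities {
    void chargePayment(String orderId);
    void shipOrder(String orderId);
    void refundPayment(String orderId); // compensation step (SAGA)
}

@WorkflowInterface
interface OrderWorkflow {
    @WorkflowMethod
    void processOrder(String orderId);
}

class OrderWorkflowImpl implements OrderWorkflow {
    private final OrderActivities activities =
        Workflow.newActivityStub(
            OrderActivities.class,
            ActivityOptions.newBuilder()
                .setStartToCloseTimeout(Duration.ofMinutes(5))
                .build());

    @Override
    public void processOrder(String orderId) {
        activities.chargePayment(orderId);
        try {
            activities.shipOrder(orderId);
        } catch (RuntimeException e) {
            // Compensate if shipping fails; the workflow's state (including
            // this code position) survives worker and host failures.
            activities.refundPayment(orderId);
            throw e;
        }
    }
}
```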
What libraries/frameworks did you use?
Temporal is a self-contained service written in Go, with Go, Java, PHP, and TypeScript client-side SDKs (.NET and Python are coming in 2022). The only external dependency is storage: Cassandra, MySQL, and PostgreSQL are supported. Elasticsearch can be used for advanced indexing.
Temporal also supports asynchronous cross-region (using AWS terminology) replication.
When did a simpler State Machine/Task Management like system suffice?
The open source Temporal service can be self-hosted, or the temporal.io cloud offering can be used. So the overhead of building any custom state machine/task management is always higher than using Temporal. When self-hosting, the service and the storage for it need to be set up. If you already have an SQL database, the service deployment is trivial through a Docker image. The Docker image is also used to run a local Temporal service for development on a personal computer or laptop.
I am one of the authors of Imixs-Workflow. Imixs-Workflow is an open source workflow engine based on BPMN 2.0 and fully integrated into the Java EE technology stack.
I have been developing workflow engines myself for more than 10 years. I will try to answer your question briefly:
> What problems have you used workflow engines to solve?
My personal goal when I started to think about workflow engines was to avoid hard-coding the business logic within my application. Many things in a business application can be reused, so it makes sense to keep them configurable. For example:
sending out a notification
viewing open tasks
assigning a task to a person
describing the current task
From this function list you can see I am talking about human-centric workflows. In short: A human-centric workflow engine answers the questions: Who is responsible for a task and who needs to be informed next? And these are the typical questions in business requirements.
>What libraries/frameworks did you use?
5 years ago we started reimplementing the Imixs-Workflow engine, focusing on BPMN 2.0. BPMN is the common standard for process modeling. The surprising thing for me was that we were suddenly able to describe even highly complex business processes that could be visualized and executed. I recommend that everyone use BPMN for modeling business processes.
> When did a simpler State Machine/Task Management like system suffice?
A simple state machine is sufficient if you just want to track the status of a business object. This is the case when you begin to introduce a 'status' attribute into your object model. But if you need business processes with responsibilities, logging, and flow control, then a state machine is no longer sufficient.
> Bonus: How did/do you make the distinction between Task Management and Workflow Engine?
This is exactly the point where many of the workflow engines mentioned here differ. For a human-centric workflow you typically need task management to distribute tasks between human actors. For process automation, this point is not as relevant; it is sufficient if the engine performs certain tasks. Task management and workflow engines cannot really be compared, because task management is always a function of a workflow engine.
Check out the rails_workflow gem - I think this is close to what you are searching for.
I have experience using the Activiti BPMN 2.0 engine for handling high-performance, high-throughput data transfer processes in an infrastructure of network nodes. The basic task was to allow configuration and monitoring of such transfer processes and to control each network node (i.e., request node1 to send a data file to node2 via a specific transport layer).
There could be thousands of processes running at a time, and overall tens or low hundreds of thousands of processes per day.
There were a bunch of different process definitions, but it was not necessarily required that an operator of the system could create custom workflows. So the primary requirement for the BPM engine itself was to be robust and scalable and to allow monitoring of each process flow.
In the end it basically worked, but what we learned from that project was that a BPMN platform, or rather the Activiti engine specifically, was not the best bet for such a high-throughput system.
The main challenges were task execution prioritization, DB locking, and execution retries, to name a few concerning the BPM itself. So we had to develop custom handling for these, for example:
Handling of retries in the BPM for cases when a node had no free worker for a given task, or when the node was not running at all.
Execution of parallel transfer tasks in a single process and synchronization of the results (success/failure).
I don't know if other BPMN engines would be more suitable for such a scenario, since BPMN is mostly intended for long-running business tasks involving user interaction, where performance is not the same kind of issue as it was in our case.
I rolled my own workflow engine to support phased processing of documents - cataloging, sending for image processing (we work with redaction software), if needed sending to validation, then release, and finally shipping back to the client. In our case we have a truckload of documents to process, so sometimes we need to run each service separately to control delivery and resource usage. Simple in concept, but high performance and distributed processing were needed, and we couldn't find any off-the-shelf product that fit the bill for us.