I started to implement a system with a state machine. But I came to a point where I doubt that a state machine is the correct approach.
For example: I have four states:
(idle, powerup, powerdown, work)
and two other states:
(production, test)
powerup and powerdown behave differently in the production and test states ...
If I add more states, the number of state combinations explodes ...
How is this solved with a state machine?
That's a bit difficult to answer since the actual use case is very vague, but here are some possible techniques:
Create separate states for Production+Powerup, Test+Powerup, Production+Powerdown, Test+Powerdown. Depending on the complexity and number of combinations of states this can explode pretty quickly.
Pros: straightforward.
Cons: cluttered, does not scale well; if some of the state logic is shared across these states, there will be some copy-pasting involved (hence, not very maintainable)
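As a rough illustration of what option 1 looks like (the names are made up for this sketch, not taken from the question):

enum MachineState {
    PRODUCTION_IDLE, PRODUCTION_POWERUP, PRODUCTION_POWERDOWN, PRODUCTION_WORK,
    TEST_IDLE, TEST_POWERUP, TEST_POWERDOWN, TEST_WORK
    // every additional mode multiplies the number of states again
}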
Use hierarchical state machines (HFSM): if you can define some sort of hierarchical relationship between the various state groups, the implementation of a specific state becomes a state machine of its own.
So in your example, you would have a Production/Test state machine, and each of those states would implement its own Idle/Powerup/Powerdown/Work state machine internally.
If you think this over, this is actually a neater implementation of option 1.
Pros: more readable than option 1
Cons: assuming substates should share some common logic, there will still be copy-pasting involved
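A minimal sketch of what option 2 could look like in code (all names and transitions here are invented for illustration, they're not from the question):

// Illustrative hierarchical FSM: the outer machine picks a mode, each mode runs its own inner machine.
enum InnerState { IDLE, POWERUP, WORK, POWERDOWN }

interface InnerMachine {
    // Runs the mode-specific behavior for the current inner state and returns the next one.
    InnerState step(InnerState current);
}

class ProductionMachine implements InnerMachine {
    public InnerState step(InnerState current) {
        switch (current) {
            case POWERUP:   /* production-specific power-up */   return InnerState.WORK;
            case POWERDOWN: /* production-specific power-down */ return InnerState.IDLE;
            default:        return current;   // placeholder transitions
        }
    }
}

class TestMachine implements InnerMachine {
    public InnerState step(InnerState current) {
        switch (current) {
            case POWERUP:   /* test-specific power-up */   return InnerState.WORK;
            case POWERDOWN: /* test-specific power-down */ return InnerState.IDLE;
            default:        return current;   // placeholder transitions
        }
    }
}

class OuterMachine {
    // The outer level only decides which inner machine is active.
    private InnerMachine active = new ProductionMachine();   // or new TestMachine()
    private InnerState inner = InnerState.IDLE;

    void update() { inner = active.step(inner); }
}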
Have 2 parallel state machines, one handling the Production/Test states and one handling the Idle/Powerup/Powerdown/Work states, with some sort of blackboard or shared memory to communicate between the machines.
In your example, your agent or system will be a container of the above-mentioned state machines and process them in turn. The Production/Test machine writes some status to a shared memory that the other machine reads from, branching its state logic accordingly.
Pros: Can share code between different states
Cons: Can share code between different states... OK, seriously, it's super important to emphasize that sharing code is NOT always a good idea (but that's a whole other philosophical discussion). Just make sure you properly evaluate the amount of shared code vs. the amount of unique code or paths, so you don't end up with a huge class that essentially contains 2 completely separate code paths.
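A rough sketch of option 3, assuming a simple shared blackboard object (again, all names are illustrative):

// Two independent machines communicating through a shared blackboard.
enum Mode { PRODUCTION, TEST }
enum Phase { IDLE, POWERUP, WORK, POWERDOWN }

class Blackboard {
    volatile Mode mode = Mode.PRODUCTION;   // written by the mode machine, read by the phase machine
}

class ModeMachine {
    void update(Blackboard bb) {
        // ... transitions between PRODUCTION and TEST, result written to bb.mode
    }
}

class PhaseMachine {
    private Phase phase = Phase.IDLE;

    void update(Blackboard bb) {
        switch (phase) {
            case POWERUP:
                if (bb.mode == Mode.TEST) { /* test power-up */ } else { /* production power-up */ }
                phase = Phase.WORK;
                break;
            default:
                // ... remaining phases omitted
                break;
        }
    }
}

class Agent {
    // The agent owns both machines and processes them in turn.
    private final Blackboard bb = new Blackboard();
    private final ModeMachine modeMachine = new ModeMachine();
    private final PhaseMachine phaseMachine = new PhaseMachine();

    void tick() {
        modeMachine.update(bb);
        phaseMachine.update(bb);
    }
}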
I know it's a given, but consider whether an FSM is the proper method of representing state and execution in your application. Again, there's not enough context to dive deep into this, and this point on its own is a philosophical debate - but keep your mind open to other solutions as well.
The feeling that your state machine "explodes" is very typical in traditional "flat" FSMs (in fact it is generally known as the "state-transition explosion" phenomenon). The remedy is to use a hierarchical state machine (HSM) instead, which specifically counteracts the "explosion" of a traditional FSM. Basically, the HSM allows you to group states with similar behavior together (inside a higher-level super-state) and thus reuse the common behavior among the related states. This is a very powerful concept that leads to much more elegant and consistent designs. To learn more about hierarchical state machines, you can read the article "Introduction to Hierarchical State Machines".
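One common way to approximate that grouping in plain code is to let substates inherit shared behavior from a super-state class (a hedged sketch only, not taken from the linked article):

// Illustrative only: behavior defined once in a super-state is reused by all of its substates.
abstract class OperationalState {
    // Handling common to every substate lives in the super-state.
    void onEmergencyStop() { /* shut everything down */ }

    abstract void onTick();
}

class PowerupState extends OperationalState {
    void onTick() { /* power-up specific work */ }
    // onEmergencyStop() is inherited; no need to repeat it in every substate.
}

class WorkState extends OperationalState {
    void onTick() { /* normal work */ }
}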
A state machine can share signals with another one, so the state machine that indicates production or test can send a signal to the other one.
In fact, if you have only two states in that state machine, you can use a variable for that purpose. You will then have one state machine that does a different job depending on the value of the variable.
I would classify 'production' and 'test' as modes, not states. It will still be somewhat messy, but the distinction is important, in my opinion.
switch (state)
{
    case powerup:
        switch (mode)
        {
            case test:
                test_powerup_stuff();
                break;
            case production:
                production_powerup_stuff();
                break;
            default:
                break;
        }
        break;
    case powerdown:
        switch (mode)
        {
            case test:
                test_powerdown_stuff();
                break;
            case production:
                production_powerdown_stuff();
                break;
            default:
                break;
        }
        break;
    case idle:
        do_idle_stuff();
        break;
    case work:
        do_work_stuff();
        break;
    default:
        state = powerdown;
        break;
}
Could it be that you need two instances of the same state machine model? One for production and one for test?
An alternative is that production and test could be orthogonal regions of a single machine.
I have a state machine diagram for an application that has a UI and a thread that runs in parallel. I model this using a state with 2 parallel regions. Now, when that thread emits some message, for example an error, I want the state of the UI to change, for example to an error screen state.
I was wondering what the correct way to do this is, as I was not able to find any example of this situation online.
The options I thought of were either creating a transition from every state to this error state, which in a big diagram would mean a lot of transitions, or just creating a transition from the thread to the error state (as shown in the image), which could possibly be unclear.
Example of second option
You don't need a transition from every state. Just use a transition from the State state to the Error window state. It will then fire no matter which states are currently active.
Such a transition will also leave the Some process state. Since you didn't define initial states in the orthogonal regions, the right region of the state machine will then be in an undefined state. Some people allow this to happen, but to me it is not good practice. So I suggest adding initial states to both regions. Then it is clear which substate will become active when the State state is entered for any reason. The fork would also not be necessary then.
The way you have done it in your example would also be OK. However, since the Some process state is left by this transition, this region is now in an undefined state. Again, the solution is to have an initial state here. I would only use such a transition between regions when the region contains more than one state and the On error event should not trigger a transition in all of them.
If the Some process state is only there to allow for concurrent execution, there is an easier way to achieve this: the State state can have a do behavior that runs while the substates are active. Orthogonal regions should be used wisely, since they add a lot of complexity.
Assume that I have two aggregates, Vehicles and Drivers, and I have a rule that a vehicle cannot be assigned to a driver if the driver is on vacation.
So, my implementation is:
class Vehicle {
    public void assignDriver(Driver driver) {
        if (driver.isInVacation()) {
            // unchecked exception, so the method signature stays clean
            throw new IllegalStateException("Driver is on vacation");
        }
        // ....
    }
}
Is it ok to pass an aggregate to another one as a parameter? Am I doing anything wrong here?
I'd say your design is perfectly valid and reflects the Ubiquitous Language very well. There are several examples in the Implementing Domain-Driven Design book where an AR is passed as an argument to another AR.
e.g.
Forum#moderatePost: Post is not only provided to Forum, but modified by it.
Group#addUser: User provided, but translated to GroupMember.
If you really want to decouple, you could also do something like vehicle.assignDriver(driver.id(), driver.isInVacation()) or introduce some kind of intermediary VO that holds only the state from Driver necessary to make an assignment decision.
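For instance, such an intermediary VO might look roughly like this (DriverAvailability is a made-up name used only for this sketch):

// Hypothetical value object holding only the Driver state needed for the assignment decision.
final class DriverAvailability {
    private final String driverId;
    private final boolean onVacation;

    DriverAvailability(String driverId, boolean onVacation) {
        this.driverId = driverId;
        this.onVacation = onVacation;
    }

    String driverId()    { return driverId; }
    boolean onVacation() { return onVacation; }
}

class Vehicle {
    private String assignedDriverId;

    // Vehicle now depends on the small VO instead of the whole Driver aggregate.
    public void assignDriver(DriverAvailability driver) {
        if (driver.onVacation()) {
            throw new IllegalStateException("Driver is on vacation");
        }
        this.assignedDriverId = driver.driverId();
    }
}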
However, note that any decision made using external data is considered stale. For instance, what happens if the driver goes on vacation right after being assigned to a vehicle?
In such cases you may want to use exception reports (e.g. list all vehicles with an unavailable driver), flag vehicles for driver re-assignment, etc. Eventual consistency could be achieved either through batch processing or messaging (event processing).
You could also seek to make the rule strongly consistent by inverting the relationship, where Driver keeps a set of vehicleIds it drives. Then you could use a DB unique constraint to ensure the same vehicle doesn't have more than 1 driver assigned. You could also violate the rule of modifying only 1 AR per transaction and model the 2-way relationship to protect both invariants in the model.
However, I'd advise you to think of the real-world scenario here. I doubt you can prevent a driver from going away. The system must reflect the real world, which is probably the book of record for that scenario, meaning the best you can do with strong consistency is probably to unassign a driver from all their vehicles while they're away. In that case, is it really important that vehicles get unassigned immediately in the same TX, or would a delay be acceptable?
In general, an aggregate should keep its own boundaries (to avoid data-load issues and transaction-scoping issues, check this page for example), and therefore only reference another aggregate by identity, e.g. assignDriver(id guid).
That means you would have to query the driver prior to invoking assignDriver, in order to perform validation check:
class MyAppService {
    public void execute() {
        // Get driver...
        if (driver.isInVacation()) {
            throw new IllegalStateException("Driver is on vacation");
        }
        // Get vehicle...
        vehicle.assignDriver(driver.id);
    }
}
Suppose you're in a micro-services architecture:
you have a 'Driver Management' service and an 'Assignation Service', and you're not sharing code between them apart from technical libraries.
You'll naturally have 2 classes for 'Driver':
An aggregate in 'Driver Management', which will hold the operations to manage the state of a driver.
And a value object in the 'Assignation Service', which will only contain the information relevant for assignment.
This separation is harder to see/achieve when you're in a monolithic codebase.
I also agree with #plalx: there's more to enforcing the rule than just a check on creation, for which you could implement one of the solutions he suggested.
I encourage you to think in events: what happens when
a driver has scheduled vacation
when he's back from vacation
if he changes his vacation dates
Did you explore creating an Aggregate for Assignation?
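To make that concrete, here is a rough sketch of such events and an Assignation aggregate reacting to them (all names and fields are invented for illustration):

import java.time.LocalDate;

// Hypothetical domain events around driver vacations.
record DriverVacationScheduled(String driverId, LocalDate from, LocalDate to) {}
record DriverReturnedFromVacation(String driverId) {}

// Hypothetical Assignation aggregate reacting to those events.
class Assignation {
    private final String vehicleId;
    private String driverId;

    Assignation(String vehicleId, String driverId) {
        this.vehicleId = vehicleId;
        this.driverId = driverId;
    }

    void on(DriverVacationScheduled event) {
        if (event.driverId().equals(driverId)) {
            driverId = null;   // flag the vehicle for re-assignment while the driver is away
        }
    }

    void on(DriverReturnedFromVacation event) {
        // e.g. make the driver eligible for assignment again
    }
}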
I would like to expose a little scenario which is still at the paper stage, and which, regarding DDD principles, seems a bit tedious to accomplish.
Let's say I've got an application for hosting accounts management. Basically, the application is composed of several bounded contexts such as Web accounts management, Ftp accounts management, Mail accounts management... each of them represented by their own AR (they can live standalone).
Now, let's imagine I want to provide a UI with an HTML form that has one fieldset for each bounded context, for instance to update limits and/or features. How exactly should I proceed to update all the ARs without breaking the single-transaction-per-request principle? Can I create a kind of "outer" AR, let's say a ClientHostingProperties AR, which would hold references to the other ARs and update them as part of a single transaction, using its own repository? Or should I better create an AR that emits messages for listeners provided by the bounded contexts to react on, in which case I should probably think about ES?
Thanks.
How exactly should I proceed to update all the ARs without breaking the single-transaction-per-request principle?
You are probably looking for a process manager.
Basic sketch: persisting the details from the submitted form is a transaction unto itself (you are offered an opportunity to accrue business value; step 1 is to capture that opportunity).
That gives you a way to keep track of whether or not this task is "done": you compare the changes in the task to the state of the system, and fire off commands (to run in isolated transactions) to make changes.
Processes, in my mind, end up looking a lot like state machines. These commands are done, these commands are not done, these commands have failed: now what? Eventually you reach a state where there are no additional changes to be made, and this instance of the process is "done".
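A very rough sketch of that bookkeeping, using the hosting scenario from the question (all names are invented):

import java.util.EnumMap;
import java.util.Map;

// Hypothetical process manager tracking which of the per-context commands are done.
class UpdateHostingProcess {
    enum Step { UPDATE_WEB_ACCOUNT, UPDATE_FTP_ACCOUNT, UPDATE_MAIL_ACCOUNT }
    enum Status { PENDING, DONE, FAILED }

    private final Map<Step, Status> steps = new EnumMap<>(Step.class);

    UpdateHostingProcess() {
        for (Step s : Step.values()) steps.put(s, Status.PENDING);
    }

    // Each command runs in its own isolated transaction and reports back here.
    void markDone(Step step)   { steps.put(step, Status.DONE); }
    void markFailed(Step step) { steps.put(step, Status.FAILED); }

    boolean isDone() {
        return steps.values().stream().allMatch(s -> s == Status.DONE);
    }
}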
Short answer: You don't.
An aggregate is a transactional boundary, which means that if you would update multiple aggregates in one "action", you'd have to use multiple transactions. The reason for an aggregate to be equivalent to one transaction is that this allows you to guarantee consistency.
This means that you have two options:
You can make your aggregate larger. Then you can actually guarantee consistency, but your ability to handle concurrent requests gets worse. So this is usually what you want to avoid.
You can live with the fact that it's two transactions, which means you are eventually consistent. If so, you usually use something such as a process manager or a flow to handle updating multiple aggregates. In its simplest form, a flow is nothing but a simple if this event happens, run that command rule. In its more complex form, it has its own state.
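In code, the simplest form of such a flow could be little more than this (all names are invented, loosely based on the hosting example above):

// Simplest form of a flow: stateless, one event in, one command out.
interface CommandBus { void dispatch(Object command); }

record WebAccountLimitsUpdated(String clientId, int newLimit) {}
record UpdateFtpAccountLimits(String clientId, int newLimit) {}

class SyncFtpLimitsFlow {
    private final CommandBus commandBus;

    SyncFtpLimitsFlow(CommandBus commandBus) { this.commandBus = commandBus; }

    // "If this event happens, run that command."
    void when(WebAccountLimitsUpdated event) {
        commandBus.dispatch(new UpdateFtpAccountLimits(event.clientId(), event.newLimit()));
    }
}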
Hope this helps 😊
Greetings SO denizens!
I'm trying to architect an overhaul of an existing NodeJS application that has outgrown its original design. The solutions I'm working towards are well beyond my experience.
The system has ~50 unique async tasks defined as various finite state machines which it knows how to perform. Each task has a required set of parameters to begin execution which may be supplied by interactive prompts, a database or from the results of a previously completed async task.
I have a UI where the user may define a directed graph ("the flow"), specifying which tasks they want to run and the order they want to execute them in with additional properties associated with both the vertices and edges such as extra conditionals to evaluate before calling a child task(s). This information is stored in a third normal form PostgreSQL database as a "parent + child + property value" configuration which seems to work fairly well.
Because of the sheer number of permutations, conditionals and the absurd number of possible points of failure, I'm leaning towards expressing "the flow" as a state machine. I have just enough knowledge of graph theory and state machines to implement them, but practically zero background.
What I think I'm trying to accomplish, at flow run time after user input for the root services has been received, is to somehow compile the database representation of the graph + properties into a state machine of some variety.
To further complicate the matter, in the near future I would like to be able to "pause" a flow, save its state to memory, load it on another worker some time in the future and resume execution.
I think I'm close to a viable solution but if one of you kind souls would take mercy on a blind fool and point me in the right direction I'd be forever in your debt.
I solved a similar problem a few years ago in my bachelor's and diploma theses. I designed the Cascade, an executable structure which forms a growing acyclic oriented graph. You can read about it in my paper "Self-generating Programs – Cascade of the Blocks".
The basic idea is that each block has inputs and outputs. Initially, some blocks are inserted into the cascade and their inputs are connected to outputs of other blocks to form an acyclic graph. When a block is executed, it reads its inputs (the cascade will pass values from connected outputs) and then sets its outputs. It can also insert additional blocks into the cascade and connect their inputs to outputs of already present blocks. This should be equivalent to your task starting another task and passing some parameters to it. An alternative to setting an output to a value is forwarding a value from another output (in your case, waiting for the result of some other task, which makes it possible to launch helper sub-tasks).
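A very rough sketch of what a block's interface might look like (illustrative only; this is not the actual code from the paper):

import java.util.Map;

// A block reads its inputs, sets its outputs, and may insert further blocks into the cascade.
interface Block {
    void execute(Map<String, Object> inputs, Cascade cascade);
}

interface Cascade {
    // Set one of the executing block's outputs to a value.
    void setOutput(Block block, String outputName, Object value);

    // Insert a new block and connect one of its inputs to an existing block's output.
    void insert(Block newBlock, String inputName, Block sourceBlock, String outputName);

    // Forward a value from another block's output instead of setting it directly.
    void forwardOutput(Block block, String outputName, Block sourceBlock, String sourceOutputName);
}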
I am using shared variables in Perl with use threads::shared.
Those variables are modified only from a single thread; all other threads only 'read' them.
Is it required in the 'reading' threads to lock
{
    lock $shared_var;
    if ($shared_var > 0) .... ;
}
?
Isn't it safe to do a simple verification without locking (in the 'reading' thread!), like
if ($shared_var > 0) ....
?
Locking is not required to maintain internal integrity when setting or fetching a scalar.
Whether it's needed or not in your particular case depends on the needs of the reader, the other readers and the writers. It rarely makes sense not to lock, but you haven't provided enough details for us to determine what your needs are.
For example, it might not be acceptable to use an old value after the writer has updated the shared variable. For starters, this can lead to a situation where one thread is still using the old value while another thread is using the new value, a situation that can be undesirable if those two threads interact.
It depends on whether it's meaningful to test the condition at just some point in time or another. The problem, however, is that in the vast majority of cases that Boolean test stands for other things, which might have already changed by the time you're done reading it, so the condition only represents a previous state.
Think about it. If it's an insignificant test, then it means little--and you have to question why you are making it. If it's a significant test, then it is telltale of a coherent state that may or may not exist anymore--you won't know for sure, unless you lock it.
A lot of times, say in real-time reporting, you don't really care which snapshot the database hands you, you just want a relatively current one. But, as part of its transaction logic, it keeps a complete picture of how things are prior to a commit. I don't think you're likely to find this in code, where the current state is the current state--and even a state of being in a provisional state is a definite state.
I guess one of the times this can be different is a cyclical access of a queue. If one consumer doesn't get the head record this time around, then one of them will the next time around. You can probably save some processing time, asynchronously accessing the queue counter. But here's a case where it means little in context of just one iteration.
In the case above, you would just want to put some properly locked instructions afterward that expect that the queue might actually be empty, even if your test suggested it had data. So, if it is just a preliminary test, you have to have logic that treats the test as being as unreliable as it actually is.