Concurrent access to msg/payload on split flows in node-red? - node.js

I'm working on a node-red flow and came upon some (hopefully not real) concurrency problem.
I have a node outputting a msg.payload that, on one connection, is written to a database. The database insert node is a dead end, so another connection from the first node goes to a function node that overwrites msg.payload again, as needed for the HTTP reply.
I'm wondering about the ordering of execution in this case, or rather the protection against the database accessing the modified msg.payload when it runs after the function node.
Obviously this seems to work - but I would like to know if this is just chance, or is the msg object cloned before each function, or on multiple outputs?

There is no concurrency issue as Node-RED is totally single threaded (as are all NodeJS apps) so only one leg of a branching flow can actually execute at any given time.
Flow execution branch order follows the order in which the nodes were added to the flow, so assuming the nodes were added in the order A, B, C, D, E:
A ----> B ---> D ---> E
|
--> C
The message will be delivered from A to B to D to E, then to C (assuming that none of B, D, E block for I/O).
Also, messages are cloned when there are multiple nodes hooked up to a single output; you can easily test this with the following flow:
[{"id":"9fd37544.36664","type":"inject","z":"8b231c78.b8edc8","name":"","topic":"","payload":"foo","payloadType":"str","repeat":"","crontab":"","once":false,"x":269.5,"y":284.25,"wires":[["48eda9a0.b455e8","e1f3c665.9af04"]]},{"id":"48eda9a0.b455e8","type":"function","z":"8b231c78.b8edc8","name":"","func":"msg.payload = \"bar\";\nreturn msg;","outputs":1,"noerr":0,"x":454.5,"y":283.75,"wires":[["5f27ffc7.a54ce"]]},{"id":"5f27ffc7.a54ce","type":"debug","z":"8b231c78.b8edc8","name":"","active":true,"console":"false","complete":"false","x":635.5,"y":284.5,"wires":[]},{"id":"e1f3c665.9af04","type":"debug","z":"8b231c78.b8edc8","name":"","active":true,"console":"false","complete":"false","x":475.5,"y":362.5,"wires":[]}]
This has a single input that flows to 2 debug outputs, the first branch includes a function node which modifies the payload before it is output.
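The cloning behavior can be modeled outside Node-RED. Below is a simplified sketch in plain JavaScript (the actual runtime uses a full deep-clone utility; the JSON round-trip here is a stand-in that only works for plain payloads):

```javascript
// Simplified model of Node-RED fan-out: when one output has two wires,
// the runtime deep-clones msg for each extra wire.
const msg = { payload: "foo" };

// The second wire gets a clone (JSON round-trip as a stand-in deep clone).
const cloneForSecondWire = JSON.parse(JSON.stringify(msg));

// The function node on one branch modifies its copy...
cloneForSecondWire.payload = "bar";

// ...but the message on the other branch is untouched.
console.log(msg.payload);                // "foo"
console.log(cloneForSecondWire.payload); // "bar"
```

This is why the database node and the HTTP-reply branch never see each other's modifications.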

Related

How do I create a chain of lazy streams where 2 streams can fetch data from 1?

I'm creating a system where users can chain together nodes, and data flows from one node to the next. At the end is a node which constantly pulls from the node before it and does something with that data (e.g. for audio data, plays it). If you're familiar with FL Studio's Patcher, that's essentially what I'm trying to do.
The trivial implementation, which is to just have an interface like this:
interface Node {
    byte[] getData();
}
where each node stores a reference to the one before it, would work fine, except that I want to be able to have 2 nodes both requesting data from a single node.
The issue here is that the source node is pulled twice every "step" (the intended behavior would be for the same value to be sent to both nodes). If you had for example an audio signal in the source node, the intended behavior would be for 2 different effects to be done to it, then the signals combined, while what would actually happen is that the signal is chopped up and each effect gets a different part of the signal depending on which happens to be called first.
What is a good way to solve this?
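One common answer is to memoize the source's output per step, so that both consumers pull the same chunk. A minimal sketch in JavaScript (the names step and getData are illustrative, not from any framework):

```javascript
// Sketch: cache the source's value per "step" so both downstream nodes
// see the same chunk instead of each pull advancing the source.
class SourceNode {
  constructor(producer) {
    this.producer = producer; // generates the next chunk on demand
    this.cached = null;
    this.stale = true;
  }
  step() {            // the sink advances the whole graph once per tick
    this.stale = true;
  }
  getData() {         // first pull in a step computes; later pulls reuse
    if (this.stale) {
      this.cached = this.producer();
      this.stale = false;
    }
    return this.cached;
  }
}

let n = 0;
const src = new SourceNode(() => n++);
src.step();
console.log(src.getData() === src.getData()); // true: both effects see the same chunk
src.step();
console.log(src.getData()); // 1: the next step yields the next chunk
```

With this, both effect nodes pull the same signal per step, and the combined output matches the intended behavior.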

Running multiple copies of Nodejs corrupting my object data

I have a service which holds a JavaScript object, let's say obj. Initially, when the server starts, the object is empty.
I receive pub sub messages. Each message has two attributes, the type of the message and the data. Based on the type of message, I modify the object.
For example, if I receive a message msg-0 of type start with some data abc-0, I add the data to my main object and obj becomes {abc-0}.
Similarly, if I receive another message msg-1 of type start with some data abc-1, I add the data to my main object and obj becomes {abc-0, abc-1}.
So imagine I receive 11 such messages; my object obj should then look like {abc-0, abc-1, abc-2, abc-3, abc-4, abc-5, abc-6, abc-7, abc-8, abc-9, abc-10}.
If I were to run just a single copy of this program, then, everything works fine.
But when I run 3 copies of this program, what I end up with is different. It creates 3 different objects, and each object holds {abc-0, abc-1, abc-2, abc-3}, {abc-4, abc-5, abc-6} and {abc-7, abc-8, abc-9, abc-10}.
I'm running 3 pods on kubernetes and this problem became evident. And, when I run just one pod, the problem goes away.
What could I be doing wrong? Is it some sort of a common error?
The issue here is that separate NodeJS instances are separate processes: they don't share memory, so an object declared in one is completely independent of the others. To get this working correctly, the processes need to communicate with each other. I'm not entirely familiar with Kubernetes, but I'm assuming a pod is the equivalent of a process on a Linux system. Please correct me if I'm wrong, but if that is the case, then it is the source of the issue.
I would approach this issue by requiring timestamps on each chunk of incoming data, then keep a complete copy of the object in each process. When a new chunk comes in, place it in the correct position based on its timestamp and alert all other processes of an update and optionally share the new chunk or the complete object.
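The timestamp idea can be sketched as follows: each replica merges incoming chunks by timestamp, so replicas that receive the chunks in different orders still converge to the same object (the chunk shape and the mergeChunk name are illustrative; the processes still need a channel, such as pub/sub, to share the chunks):

```javascript
// Sketch: merge chunks by timestamp so every replica converges to the
// same object regardless of which chunks it received first.
function mergeChunk(state, chunk) {
  const next = state.concat([chunk]);
  next.sort((a, b) => a.ts - b.ts); // deterministic position per timestamp
  return next;
}

// Two replicas receive the same chunks in different orders...
let r1 = [];
[{ ts: 0, data: "abc-0" }, { ts: 1, data: "abc-1" }].forEach(c => (r1 = mergeChunk(r1, c)));
let r2 = [];
[{ ts: 1, data: "abc-1" }, { ts: 0, data: "abc-0" }].forEach(c => (r2 = mergeChunk(r2, c)));

// ...and end up with the same object.
console.log(r1.map(c => c.data).join(",")); // "abc-0,abc-1"
console.log(r2.map(c => c.data).join(",")); // "abc-0,abc-1"
```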
If the different processes (pods) are running synchronously (one after each other) then you'll need to inform the next process of the object at that point in time, so it can continue building the object as time goes on.
TL;DR: This issue comes from NodeJS processes not sharing memory. To fix it, if the processes run concurrently, keep a running copy of the object in all instances and alert the others whenever new information is received. If not, inform the next process of the object's state at the time of process instantiation.
Hope I understood the question correctly, let me know if not.

How to handle in Event Driven Microservices if the messaging queue is down?

Assume there are two services A and B, in a microservice environment.
In between A and B sits a messaging queue M that is a broker.
A<---->'M'<----->B
The problem is what if the broker M is down?
Possible Solution i can think of:
Ping from service A at regular intervals to check on messaging queue M as long as it is down. In the meantime, service A stores the data in a local DB and dumps it into the queue once the broker M is up.
Considering the above problem, if someone can suggest whether threads or reactive programming is best suited for this scenario and ways it could be handled via code, I would be grateful.
The problem is what if the broker M is down?
If the broker is down, then A and B can't use it to communicate.
What A and B should do in that scenario is going to depend very much on the details of your particular application/use-case.
Is there useful work they can do in that scenario?
If not, then they might as well just stop trying to handle any work/transactions for the time being, and instead just sit and wait for M to come back up. Having them do periodic pings/queries of M (to see if it's back yet) while in this state is a good idea.
If they can do something useful in this scenario, then you can have them continue to work in some sort of "offline mode", caching their results locally in anticipation of M's re-appearance at some point in the future. Of course this can become problematic, especially if M doesn't come back up for a long time -- e.g.
what if the set of cached local results becomes unreasonably large, such that A/B runs out of space to store it?
Or what if A and B cache local results that will both apply to the same data structure(s) within M, such that when M comes back online, some of A's results will overwrite B's (or vice-versa, depending on the order in which they reconnect)? (This is analogous to the sort of thing that source-code-control servers have to deal with after several developers have been working offline, both making changes to the same lines in the same file, and then they both come back online and want to commit their changes to that file. It can get a bit complex and there's not always an obvious "correct" way to resolve conflicts)
Finally what if it was something A or B "said" that caused M to crash in the first place? In that case, re-uploading the same requests to M after it comes back online might only cause it to crash again, and so on in an infinite loop, making the service perpetually unusable. (In this case, of course, the proper fix would be to debug M)
Another approach might be to try to avoid the problem by having multiple redundant brokers (e.g. M1, M2, M3, ...) such that as long as at least one of them is still available, productive work can continue. Or perhaps allow A and B to communicate with each other directly rather than through an intermediary.
As for whether this sort of thing is best handled by threads or reactive programming, that's a matter of personal preference. Personally I prefer reactive programming, because the multiple-threads style usually means blocking RPC operations, and a thread that is blocked inside a blocking operation is frozen and helpless until the remote party responds (e.g. if M takes 2 minutes to respond to an RPC request, then A's RPC call to M cannot return for 2 minutes, and the calling thread can do nothing at all in the meantime). In a reactive approach, A's thread could be doing other things during that 2-minute period if it wanted to (such as pinging M to make sure it's okay, or contacting a backup broker).
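The store-and-forward idea from the question can be sketched with a hypothetical broker client (isUp and send are assumed methods, not from any real library):

```javascript
// Sketch: while the broker is down, publish() buffers locally;
// once a check succeeds, the buffer is flushed before new sends.
class BufferedPublisher {
  constructor(broker) {
    this.broker = broker;   // assumed to expose isUp() and send(msg)
    this.buffer = [];
  }
  publish(msg) {
    if (this.broker.isUp()) {
      this.flush();
      this.broker.send(msg);
    } else {
      this.buffer.push(msg); // offline mode: cache for later
    }
  }
  flush() {
    while (this.buffer.length > 0) {
      this.broker.send(this.buffer.shift());
    }
  }
}

// Fake broker for illustration: down for the first message, then up.
const sent = [];
let up = false;
const publisher = new BufferedPublisher({ isUp: () => up, send: m => sent.push(m) });
publisher.publish("tx-1");   // buffered (broker down)
up = true;
publisher.publish("tx-2");   // flushes "tx-1", then sends "tx-2"
console.log(sent);           // ["tx-1", "tx-2"]
```

Note this sketch ignores the hard cases discussed above (unbounded buffers, conflicting writes from A and B, a poison message that crashed M in the first place).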

Design choice for a microservice event-driven architecture

Let's suppose we have the following:
DDD aggregates A and B, A can reference B.
A microservice managing A that exposes the following commands:
create A
delete A
link A to B
unlink A from B
A microservice managing B that exposes the following commands:
create B
delete B
A successful creation, deletion, link or unlink always results in the emission of a corresponding event by the microservice that performed the action.
What is the best way to design an event-driven architecture for these two microservices so that:
A and B will always eventually be consistent with each other. By consistency, I mean A should not reference B if B doesn't exist.
The events from both microservices can easily be projected in a separate read model on which queries spanning both A and B can be made
Specifically, the following examples could lead to transient inconsistent states, but consistency must in all cases eventually be restored:
Example 1
Initial consistent state: A exists, B doesn't, A is not linked to B
Command: link A to B
Example 2
Initial consistent state: A exists, B exists, A is linked to B
Command: delete B
Example 3
Initial consistent state: A exists, B exists, A is not linked to B
Two simultaneous commands: link A to B and delete B
I have two solutions in mind.
Solution 1
Microservice A only allows linking A to B if it has previously received a "B created" event and no "B deleted" event.
Microservice B only allows deleting B if it has not previously received a "A linked to B" event, or if that event was followed by a "A unlinked from B" event.
Microservice A listens to "B deleted" events and, upon receiving such an event, unlinks A from B (for the race condition in which B is deleted before it has received the A linked to B event).
Solution 2:
Microservice A always allows linking A to B.
Microservice B listens for "A linked to B" events and, upon receiving such an event, verifies that B exists. If it doesn't, it emits a "link to B refused" event.
Microservice A listens for "B deleted" and "link to B refused" events and, upon receiving such an event, unlinks A from B.
EDIT: Solution 3, proposed by Guillaume:
Microservice A only allows linking A to B if it has not previously received a "B deleted" event.
Microservice B always allows deleting B.
Microservice A listens to "B deleted" events and, upon receiving such an event, unlinks A from B.
The advantage I see for solution 2 is that the microservices don't need to keep track of past events emitted by the other service. In solution 1, each microservice basically has to maintain a read model of the other one.
A potential disadvantage for solution 2 could maybe be the added complexity of projecting these events in the read model, especially if more microservices and aggregates following the same pattern are added to the system.
Are there other (dis)advantages to one or the other solution, or even an anti-pattern I'm not aware of that should be avoided at all costs?
Is there a better solution than the two I propose?
Any advice would be appreciated.
Microservice A only allows linking A to B if it has previously received a "B created" event and no "B deleted" event.
There's a potential problem here; consider a race between two messages, link A to B and B Created. If the B Created message happens to arrive first, then everything links up as expected. If B Created happens to arrive second, then the link doesn't happen. In short, you have a business behavior that depends on your message plumbing.
Udi Dahan, 2010
A microsecond difference in timing shouldn’t make a difference to core business behaviors.
A potential disadvantage for solution 2 could maybe be the added complexity of projecting these events in the read model, especially if more microservices and aggregates following the same pattern are added to the system.
I don't like that complexity at all; it sounds like a lot of work for not very much business value.
Exception Reports might be a viable alternative; Greg Young talked about this in 2016. In short, having a monitor that detects inconsistent states, plus remediation of those states, may be enough.
Adding automated remediation comes later. Rinat Abdullin described this progression really well.
The automated version ends up looking something like solution 2; but with separation of the responsibilities -- the remediation logic lives outside of microservice A and B.
Your solutions seem OK but there are some things that need to be clarified:
In DDD, aggregates are consistency boundaries. An Aggregate is always in a consistent state, no matter what command it receives and whether that command succeeds or not. But this does not mean that the whole system is always in a permitted state from the business point of view. There are moments when the system as a whole is in a non-permitted state. This is OK as long as it will eventually transition into a permitted state. This is where Sagas/Process managers come in: their role is exactly to bring the system back into a valid state. They could be deployed as separate microservices.
One other type of component/pattern that I used in my CQRS projects are Eventually-consistent command validators. They validate a command (and reject it if it is not valid) before it reaches the Aggregate using a private read-model. These components minimize the situations when the system enters an invalid state and they complement the Sagas. They should be deployed inside the microservice that contains the Aggregate, as a layer on top of the domain layer (aggregate).
Now, back to Earth. Your solutions are a combination of Aggregates, Sagas and Eventually-consistent command validations.
Solution 1
Microservice A only allows linking A to B if it has previously received a "B created" event and no "B deleted" event.
Microservice A listens to "B deleted" events and, upon receiving such an event, unlinks A from B.
In this architecture, Microservice A contains Aggregate A and a command validator, and Microservice B contains Aggregate B and a Saga. It is important to understand here that the validator would not prevent the system from entering an invalid state; it would only reduce the probability.
Solution 2:
Microservice A always allows linking A to B.
Microservice B listens for "A linked to B" events and, upon receiving such an event, verifies that B exists. If it doesn't, it emits a "link to B refused" event.
Microservice A listens for "B deleted" and "link to B refused" events and, upon receiving such an event, unlinks A from B.
In this architecture, Microservice A contains Aggregate A and a Saga, and Microservice B contains Aggregate B and also a Saga. This solution could be simplified if the Saga on B verified the existence of B and sent an "unlink A from B" command to A instead of yielding an event.
In any case, in order to apply the SRP, you could extract the Sagas to their own microservices. In this case you would have a microservice per Aggregate and per Saga.
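The eventually-consistent command validator from solution 1 can be sketched like this: Microservice A keeps a private read model of B's lifecycle events and uses it to reject "link A to B" commands (event shapes and names are illustrative):

```javascript
// Sketch: private read model inside Microservice A, fed by B's events.
const knownBs = new Set();

function onEvent(event) {
  if (event.type === "BCreated") knownBs.add(event.bId);
  if (event.type === "BDeleted") knownBs.delete(event.bId);
}

// Reject the command if B is unknown or already deleted. This only
// reduces the probability of an invalid state: the relevant event
// may simply not have arrived yet.
function validateLinkCommand(cmd) {
  return knownBs.has(cmd.bId);
}

onEvent({ type: "BCreated", bId: "b1" });
console.log(validateLinkCommand({ bId: "b1" })); // true
onEvent({ type: "BDeleted", bId: "b1" });
console.log(validateLinkCommand({ bId: "b1" })); // false
```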
I will start with the same premise as #ConstantinGalbenu but follow with a different proposition ;)
Eventual consistency means that the whole system will eventually converge to a consistent state.
If you add to that "no matter the order in which messages are received", you've got a very strong statement by which your system will naturally tend to an ultimate coherent state without the help of an external process manager/saga.
If you make a maximum number of operations commutative from the receiver's perspective, e.g. it doesn't matter if link A to B arrives before or after create A (they both lead to the same resulting state), you're pretty much there. That's basically the first bullet point of Solution 2 generalized to a maximum of events, but not the second bullet point.
Microservice B listens for "A linked to B" events and, upon receiving such an event, verifies that B exists. If it doesn't, it emits a "link to B refused" event.
You don't need to do this in the nominal case. You'd do it only when you know that A didn't receive a "B deleted" message, but then it shouldn't be part of your normal business process; that's delivery-failure management at the messaging-platform level. I wouldn't have the microservice that owns the original data systematically double-check everything, because things get way too complex. It looks as if you're trying to put some immediate consistency back into an eventually consistent setup.
That solution might not always be feasible, but at least from the point of view of a passive read model that doesn't emit events in response to other events, I can't think of a case where you couldn't manage to handle all events in a commutative way.
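The commutativity idea can be illustrated with a toy reducer on A's side: "link A to B" and "B deleted" converge to the same state in either arrival order (the state shape and event names are illustrative):

```javascript
// Sketch: commutative event handlers. Applying the two events in either
// order converges to the same state, with no saga needed.
function apply(state, event) {
  const next = { ...state };
  if (event === "BDeleted") {
    next.bDeleted = true;
    next.linked = false;          // deleting B always unlinks
  }
  if (event === "LinkAToB") {
    next.linked = !next.bDeleted; // a link to a deleted B is a no-op
  }
  return next;
}

const init = { linked: false, bDeleted: false };
const order1 = ["LinkAToB", "BDeleted"].reduce(apply, init);
const order2 = ["BDeleted", "LinkAToB"].reduce(apply, init);
console.log(order1.linked, order2.linked); // false false — same final state
```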

What's the difference between data flow graph and dependence graph in TBB

I have read about data flow graphs and dependence graphs in the Intel TBB Tutorial, and feel a bit confused about these two concepts.
Can I say that the key difference between data flow graph and dependence graph is whether there are explicitly shared resources or not?
But it seems that we can implement a dependence graph using function_node with pseudo messages, or implement a data flow graph using continue_node with shared global variables.
The difference between a function_node accepting a continue_msg input and a continue_node is the behavior when receiving a message. This is a consequence of the concept of "dependence graph."
The idea of dependence graphs is that the only information being passed by the graph is the completion of a task. If you have four tasks (A,B,C,D) all operating on the same shared data, and tasks A and B must be complete before either C or D can be started, you define four continue_nodes, and attach the output of node A to C and D, and the same for B. You may also create a broadcast_node<continue_msg> and attach A and B as successors to it. (The data being used in the computation must be accessible by some other means.)
To start the graph you do a try_put of a continue_msg to the broadcast_node. The broadcast_node sends a continue_msg to each successor (A & B).
continue_nodes A and B each have one predecessor (the broadcast_node). On receiving a number of continue_msgs equal to their predecessor count (one), they are queued to execute, using and updating the data representing the state of the computation.
When continue_node A completes, it sends a continue_msg to each successor, C & D. Those nodes each have two predecessors, so they do not execute on receiving this message. They only remember they have received one message.
When continue_node B completes, it also sends a continue_msg to C and D. This will be the second continue_msg each node receives, so tasks will be queued to execute their function_bodies.
continue_nodes use the graph only to express this order. No data is transferred from node to node (beyond the signal that a predecessor is complete.)
If the nodes in the graph were function_nodes accepting continue_msgs rather than continue_nodes, the reaction to the broadcast_node getting a continue_msg would be
1. The broadcast_node would forward a continue_msg to A and B, and they would each execute their function_bodies.
2. Node A would complete, and pass continue_msgs to C and D.
3. On receiving the continue_msg, tasks would be queued to execute the function_bodies of C and D.
4. Node B would complete execution, and forward a continue_msg to C and D.
5. C and D, on receiving this second continue_msg, would queue a task to execute their function_bodies a second time.
Notice 3. above. The function_node reacts each time it receives a continue_msg. The continue_node knows how many predecessors it has, and only reacts when it receives a number of continue_msgs equal to its number of predecessors.
The dependence graph is convenient if there is a lot of state being used in the computation, and if the sequence of tasks is well understood. The idea of "shared state" does not necessarily dictate the use of a dependence graph, but a dependence graph cannot pass anything but completion state of the work involved, and so must use shared state to communicate other data.
(Note that the completion order I am describing above is only one possible ordering. Node B could complete before node A, but the sequence of actions would be similar.)
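The predecessor-counting behavior described above can be modeled in a few lines (this is a JavaScript model of the semantics, not TBB code):

```javascript
// Model of the key difference: a continue_node fires only after receiving
// as many messages as it has predecessors; a function_node would fire on
// every message.
class ContinueNode {
  constructor(predecessors, body) {
    this.need = predecessors;
    this.got = 0;
    this.body = body;
  }
  receive() {
    if (++this.got === this.need) {
      this.got = 0;   // reset for the next wave of messages
      this.body();
    }
  }
}

let fired = 0;
const c = new ContinueNode(2, () => fired++); // node C, predecessors A and B
c.receive(); // A completes: C only counts, does not run
c.receive(); // B completes: count reached, C's body runs once
console.log(fired); // 1
```

A function_node in the same position would have run its body twice, once per message, which is exactly the double-execution described in steps 3 and 5 above.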
