Transactions / request-response-pattern in flow based/reactive programming - multithreading

So I have been reading about flow based programming (FBP) in the last few days and I have also been reading J. Paul Morrison's book about it. However I feel I still can't really wrap my head around it. The general concept is that you see programming as some sort of assembly line where you have components that take some packet as input and produce some packets as output. You can connect these components and packets travel through the network. While I totally see how this can work for ETL type applications or batch processing, I have no good idea how you could handle things like synchronous request/response patterns or database transactions with it.
For example, let's say I have a web server implemented as FBP. This web server has a GET /user/{id} endpoint which should return JSON with some information about a user. It also has a POST /user/{id} endpoint where you can update the user by sending some JSON back to the server. So here is how I would imagine this flow looking:
I tried to have many reusable components instead of putting the whole logic of handling a request into a single component. So there is an HTTP server component which sends requests to a dispatcher component, which then dispatches the requests into subsequent flows. In each flow the request is parsed by a generic "Request parser" component which outputs the various parts of the request into the rest of the flow.
The upper part is quite straightforward: I read the entity of the user with the given ID from the DB, serialize the object to JSON and then send it back. However, at this point we don't really have a reference to the HTTP request anymore, so how would I know where to send this response?
On the lower part we have some additional complexity because I would like to write to the database in a transactional way. So first a transaction is started (in parallel the request body is parsed into some object), then the user object is retrieved from the database and merged with the inputs from the request. At the end it is written back to the database and the transaction is committed. Finally some "OK" status is responded to the caller. Here I have the additional problem that when committing the transaction I really don't know which transaction to commit. And of course when sending the response I don't know which request to send it to.
So both problems seem to have something in common: a kind of "context" that spans many components. In one example it is an HTTP request/response context, in the other a transactional context. In regular programming, these contexts are usually handled at the thread level. Since a request runs in a single thread, the transaction and request contexts are bound to a thread-local, so they can be accessed everywhere as long as everything is running in the same thread.
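To make the thread-local idea concrete, here is a minimal sketch using Node's AsyncLocalStorage as the closest analogue of a thread-local (the RequestContext shape and function names are purely illustrative):

    // AsyncLocalStorage plays the role of a thread-local: whatever is set
    // at the edge of the request is visible everywhere downstream without
    // being passed explicitly.
    import { AsyncLocalStorage } from "node:async_hooks";

    interface RequestContext {
      requestId: string; // a transaction handle, user, etc. could live here too
    }

    const contextStore = new AsyncLocalStorage<RequestContext>();

    function handleRequest(requestId: string) {
      // Everything called (directly or via await) inside run() sees the context.
      contextStore.run({ requestId }, () => loadUser());
    }

    function loadUser() {
      // Deep in the stack, no parameter threading needed:
      const ctx = contextStore.getStore();
      console.log(`loading user for request ${ctx?.requestId}`);
    }

    handleRequest("req-42");

This only works while everything stays on one logical call chain, and it is exactly what disappears once each component runs on its own thread.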
In flow based programming, every component runs independently and ideally on separate threads. This is actually a key thing because it allows for parallelization and effective use of multiple processors. However when that thread-local context is no longer there, how can you handle these problems in flow based programming? This would get even more complicated with proper error handling (which I left out in my example).
I figure that when you do reactive style programming where most of the processing is asynchronous and multithreaded as well you will have the same issues, so I wonder if there are patterns to handle this. Do you have real life experience with either reactive style programming or flow based programming and have some hints on how I could solve this problem?

I wrote a quick answer on Twitter - thought I would post it here as well... Apologies for double-posting!
I like substreams for this/these problem(s), where the first Information Packet in the substream provides the "context" you were talking about. This may help: https://github.com/jpaulm/javafbp-websockets... HTH!
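To illustrate the idea (a hedged sketch of the substream concept in TypeScript, not JavaFBP's actual API): a substream is bracketed by open/close packets, the first data packet inside the brackets carries the reply context, and the component at the end of the loop uses it to answer the right client.

    // Hypothetical packet model: brackets delimit a substream.
    type Packet =
      | { kind: "open" }
      | { kind: "close" }
      | { kind: "data"; payload: unknown };

    interface ReplyContext { send(body: string): void; }

    // A responder component: the first IP in each substream is remembered
    // as the context; later IPs are results sent back through it.
    function makeResponder() {
      let ctx: ReplyContext | undefined;
      return (p: Packet) => {
        if (p.kind === "open" || p.kind === "close") ctx = undefined;
        else if (!ctx) ctx = p.payload as ReplyContext; // first IP = context
        else ctx.send(JSON.stringify(p.payload));       // later IPs = results
      };
    }

    // Usage: the context travels first, then the data, inside brackets.
    const respond = makeResponder();
    respond({ kind: "open" });
    respond({ kind: "data", payload: { send: (b: string) => console.log("->", b) } });
    respond({ kind: "data", payload: { id: 42, name: "Alice" } });
    respond({ kind: "close" });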
PS This loop-style network topology is also the basis of Facebook's new "Flux" technology - see Jing Chen's presentation, in which she compares this approach with MVC: https://www.youtube.com/watch?v=nYkdrAPrdcw

Hopefully this may nudge you in the right direction. I had a similar issue where I needed to perform a synchronous operation in an asynchronous microservice architecture.
How I solved it was using the Observer pattern. I have 3 components: an HTTP server, a callback server and a timer wheel. The HTTP server, similar to yours, receives the incoming request; the callback server receives the overall result after asynchronous processing; and the timer wheel queues the original HTTP context and reconciles the response to the HTTP request.
When an incoming request is received, the HTTP server creates a correlation ID, appends it to the request metadata, appends the callback server URL to the request metadata, and finally adds the request together with the original HTTP context into the timer wheel. Then the HTTP server passes the request to the dispatcher, as in your case, and sends messages to the relevant components for asynchronous processing.
Depending on the outcome of execution, the current processing component will retrieve the callback URL from the metadata and send the response to the callback server. In your case it's the JSON serialization or the database write that would do this. The callback server then extracts the appended correlation ID, gets the corresponding HTTP context and writes the response.
NB: each timer object in the timer wheel has a configurable timeout, so if the asynchronous processing takes too long, the request times out and a configurable message is returned to the HTTP client of the corresponding HTTP context.
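As a rough sketch of that flow (assuming plain Node; the names are illustrative, and a Map with setTimeout stands in for a real timer wheel):

    import { randomUUID } from "node:crypto";
    import type { ServerResponse } from "node:http";

    interface Pending { res: ServerResponse; timer: NodeJS.Timeout; }

    // "Timer wheel": parked http contexts keyed by correlation id.
    const inflight = new Map<string, Pending>();

    // On the way in: tag the request and park the http context.
    function accept(res: ServerResponse, timeoutMs = 5000): string {
      const correlationId = randomUUID();
      const timer = setTimeout(() => {
        inflight.delete(correlationId);
        res.statusCode = 504;
        res.end(JSON.stringify({ error: "processing timed out" }));
      }, timeoutMs);
      inflight.set(correlationId, { res, timer });
      return correlationId; // travels with the message through the flow
    }

    // "Callback server": reconcile an async result with its request.
    function complete(correlationId: string, body: string) {
      const pending = inflight.get(correlationId);
      if (!pending) return; // already timed out
      clearTimeout(pending.timer);
      inflight.delete(correlationId);
      pending.res.end(body);
    }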

Related

NodeJS Polling per User Structure best practice

My project is a full-stack application where a web client subscribes to an unready object. When the subscription is triggered, the backend runs an observation loop on that unready object until it becomes ready. When that happens, it sends a message to the frontend through Socket.IO (suggestions are welcome, I'm not quite sure it's the best method). My question is how to construct the observation loop.
My frontend basically subscribes to the backend, and gets a 200 back and connects to the server per WebSocket (Socket.IO) if it got subscribed correctly, or an error 4XX code if something went wrong. On the backend, when the user subscribes, it should start a "thread" for that user (I know Node.js doesn't support threads, it's just for the mental image) that polls information from an API every 10 or so seconds.
I do that because the API I poll from does not support webhooks, so I need to observe the API response until it's in the state that I want (this part I've already sorted out).
What I'm asking is: is there a third-party library that is actually meant for these kinds of tasks? Should I use worker threads or simple setTimeouts abstracted by classes? The response will be sent over Socket.IO, and that part I already have working as well; it's just the polling method itself I'm not quite sure how to build.
I'm also open to using another fitting programming language that makes solving this case easier. I'm not in a hurry.
A polling network request (which it sounds like this is) is non-blocking and asynchronous so it doesn't really take much of your nodejs CPU unless you're doing some heavy-weight computation of the result.
So, a single nodejs thread can make a lot of network requests (for your polling and for sending data over socket.io connection) without adding WorkerThreads or clustering. This is something that nodejs is very, very good at.
I'm not aware of any third party library specifically for this as you have to custom code looking at the results of the network request anyway and that's most of the coding. There are a bunch of libraries for making http requests of other servers from nodejs listed here. My favorite in that list is got(), but you can look at the choices and decide what you like.
As for making the repeated requests, I would probably just use either repeated setTimeout() calls or a setInterval() call.
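For instance, here is a minimal sketch of the repeated-setTimeout() approach (fetchStatus and isReady are assumed stand-ins for your actual API call and readiness check):

    import { Server } from "socket.io";

    const io = new Server(3000);

    // Stand-ins for the real API call (e.g. with got()) and readiness check:
    async function fetchStatus(objectId: string): Promise<{ state: string }> {
      return { state: "pending" };
    }
    const isReady = (s: { state: string }) => s.state === "ready";

    async function pollUntilReady(objectId: string, socketId: string, intervalMs = 10_000) {
      try {
        const status = await fetchStatus(objectId);
        if (isReady(status)) {
          io.to(socketId).emit("ready", { objectId, status });
          return; // done - stop polling
        }
      } catch (err) {
        console.error(`poll failed for ${objectId}:`, err);
      }
      // Re-arm only after the previous request finished, so requests never
      // overlap - something a fixed setInterval() cannot guarantee.
      setTimeout(() => void pollUntilReady(objectId, socketId, intervalMs), intervalMs);
    }

One advantage of chained setTimeout() over setInterval() is visible in the last line: a slow poll simply delays the next one instead of stacking requests up.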
You don't say whether you have to make separate requests for every single client that is subscribed to something or whether you can somehow combine all clients watching the same resource so that you use the same polling interval for all of them. If you can do the latter, that would certainly be more efficient.
If, as you scale, you run into scaling issues, you can then move the polling code to one or more child processes or WorkerThreads and just communicate back to the main thread via messaging when you have found a new state that needs to be sent to the client. But I would not anticipate needing that extra step until you reach larger scale.

As with most scaling questions, code up the more basic option first (it should scale well by itself), then measure and benchmark, see where any bottlenecks are, and modify the architecture based on data, not speculation. Far too often, the architecture is over-designed and over-implemented based on where people think the bottlenecks might be rather than where they actually turn out to be. Not only does this make development take longer and the implementation more complicated than required, but it can target development at the wrong part of the problem. Profile, measure, then decide.

Node.js REST API wrapper for async messaging

Given an event driven micro service architecture with asynchronous messaging, what solutions are there to implementing a 'synchronous' REST API wrapper such that requests to the REST interface wait for a response event to be published before sending a response to the client?
Example: POST /api/articles
Internally this would send a CreateArticleEvent in the services layer, eventually expecting an ArticleCreatedEvent in response containing the ID of the persisted article.
Only then would the REST interface respond to the end client with this ID.
Dealing with multiple simultaneous requests - is keeping an in-memory map of inflight requests in the REST api layer keyed by some correlating identifier conceptually a workable approach?
How can we deal with timing out requests after a certain period?
Generally you don't need to maintain a map of in-flight requests, because this is basically done for you by node.js's http library.
Just use express as it's intended, and this is probably something you never really have to worry about, as long as you avoid any global state.
If you have a weirder pattern in mind to build and are not sure how to solve it, it might help to share a simple example. Chances are that it's not hard to rebuild while avoiding global state.
With express, have you tried middleware? You can chain a series of callback functions with a certain timeout after the article is created.
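A minimal sketch of that shape (assuming express, and using a plain EventEmitter as a stand-in for the real message bus; all names are illustrative): the handler simply stays pending until the matching ArticleCreatedEvent arrives or a timer fires, and express itself keeps the response open in the meantime.

    import express from "express";
    import { EventEmitter } from "node:events";
    import { randomUUID } from "node:crypto";

    const bus = new EventEmitter(); // stand-in for the real messaging client

    const app = express();
    app.use(express.json());

    // One resolver per in-flight request - the response object itself is the
    // state; this map only routes the event back to the right handler.
    const resolvers = new Map<string, (articleId: string) => void>();

    app.post("/api/articles", async (req, res) => {
      const correlationId = randomUUID();
      const articleId = new Promise<string>((resolve, reject) => {
        resolvers.set(correlationId, resolve);
        setTimeout(() => {
          if (resolvers.delete(correlationId)) {
            reject(new Error("timed out waiting for ArticleCreatedEvent"));
          }
        }, 5000);
      });

      bus.emit("CreateArticleEvent", { correlationId, ...req.body });

      try {
        res.status(201).json({ id: await articleId });
      } catch {
        res.status(504).json({ error: "article creation timed out" });
      }
    });

    // The consumer side resolves the matching handler when the event arrives:
    bus.on("ArticleCreatedEvent", (evt: { correlationId: string; id: string }) => {
      resolvers.get(evt.correlationId)?.(evt.id);
      resolvers.delete(evt.correlationId);
    });

    app.listen(3000);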
I assume you are in the context of Event Sourcing and microservices? If so, I recommend that you don't publish a CreateArticleEvent to the event store, but instead create the article directly in the database and then publish the ArticleCreatedEvent to the event store.
Why, you ask? Generally this pattern is used to orchestrate different microservices. In the example shown in the link above, it was used to orchestrate how the Customer service should react when an Order is created. Note the past tense: the Order service created the order, and the Customer service reacts to it.
In your case it is easier (and probably better) to just insert the article into the database (by calling the ArticleService directly) and respond with the article ID. Then publish the ArticleCreatedEvent to your event store to trigger other microservices that may want to listen to it (for example, to trigger a notification to the editor for review).
Event Sourcing is a good pattern, but we don't need to apply it to everything.
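In code, the suggested shape looks roughly like this (a sketch assuming express; articleService and eventStore are illustrative stand-ins for your persistence layer and event store client):

    import express from "express";

    // Assumed stand-ins for the real persistence layer and event store:
    const articleService = {
      async create(data: unknown): Promise<{ id: string }> {
        return { id: "article-123" }; // real code would INSERT and return the id
      },
    };
    const eventStore = {
      async publish(type: string, evt: unknown): Promise<void> {
        console.log("published", type, evt);
      },
    };

    const app = express();
    app.use(express.json());

    app.post("/api/articles", async (req, res) => {
      // 1. Create the article directly - no round-trip through the event store.
      const article = await articleService.create(req.body);
      // 2. Respond immediately with the persisted id.
      res.status(201).json({ id: article.id });
      // 3. Publish the past-tense fact so other services can react
      //    (e.g. notify an editor for review).
      await eventStore.publish("ArticleCreatedEvent", { id: article.id });
    });

    app.listen(3000);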

Carrying request context through the stack and across thrift service boundaries with Node.js

I'm trying to figure out an appropriate method to carry a request-id (x-request-id from a restify request header) through my stack; across thrift inter-service calls, and with rabbitmq queue messages. The goal is that anywhere, in any service, I can correlate an error or event back to an initiating http request. Is there a known practice for doing this with Node? I'd like to avoid passing a context around through virtually every function call.
I've looked into the way New Relic handles instrumentation, and there's this blog: https://opbeat.com/blog/posts/how-we-instrument-nodejs/; but these types of instrumentation require hooking into tons of node core library calls, and don't really help with carrying the context across thrift calls.
How can I take a restify header id such as "x-request-id" from a request, and have access to it deeper in my stack (even in async callbacks) without modifying every function to pass the values through?
I'm also looking for a clean way to pass it through all thrift calls (getting it across service boundaries).
This is with TypeScript and Node.js 5.x
Thanks!
Is there a known practice for doing this with Node?
Within Node.js you pass the request object around wherever you need request-context data.
With every other system you need to carry it in whatever request format that system uses. E.g. for Event Store we store it in the event metadata.
For thrift I recommend just adding it as a property in every query that is echoed back in every response.
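A sketch of that explicit-context approach (the thrift client and channel here are illustrative stand-ins, with the channel shaped like amqplib's publish):

    interface Ctx { requestId: string; }

    // Assumed stand-ins for a thrift client and a RabbitMQ channel:
    const thriftClient = {
      async getUser(req: { requestId: string; userId: number }) {
        return { requestId: req.requestId, name: "Alice" }; // id echoed back
      },
    };
    const channel = {
      publish(exchange: string, key: string, body: Buffer,
              opts: { headers: Record<string, string> }) {
        console.log("published with headers", opts.headers);
      },
    };

    // restify edge: capture the header once, then hand ctx to everything below.
    function onRequest(req: { headers: Record<string, string> }) {
      const ctx: Ctx = { requestId: req.headers["x-request-id"] };
      return doWork(ctx);
    }

    async function doWork(ctx: Ctx) {
      // thrift: the id is a plain field on the request struct, echoed back in
      // the response, so both sides can log the same correlation id.
      await thriftClient.getUser({ requestId: ctx.requestId, userId: 7 });
      // rabbitmq: carry it in message headers rather than in the payload.
      channel.publish("events", "user.fetched", Buffer.from("{}"), {
        headers: { "x-request-id": ctx.requestId },
      });
    }

    onRequest({ headers: { "x-request-id": "req-42" } });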

Camel InOut with very long response times

We have the following scenario that we would like to solve using Apache Camel:
An asynchronous request arrives at an AMQP endpoint configured in Camel. This message contains a header property with a reply-to that should be used for the response. Camel must pass this message to another service using JMS and then route the response back to the reply-to queue from the AMQP request. This seems like a textbook example for using the InOut functionality in Camel, but we have one problem: the reply from the JMS service could take a long time, in some cases several days.
As I understand it, if we are using InOut it would mean that we would tie up a thread waiting on the long-running service. If we are unlucky, we could get several long-running calls simultaneously, and in the worst case all threads end up busy waiting for replies, clogging the system.
What strategy should I use for solving the problem described above? At the moment, I have created two separate routes: one that listens to the AMQP endpoint and forwards the message to the JMS endpoint, and another that listens to the reply-to queue of the JMS system and is responsible for sending the reply back to the AMQP reply-to. The problem I have right now is how to store the AMQP reply-to between these two routes, and I am not sure this is a good solution overall.
Any tips or ideas on how to solve this problem would be greatly appreciated.
If you have to wait more than a minute for a reply, it's probably a good idea to treat the reply as asynchronous and create separate request and response routes.
Since you mention several days, you might even want the correlation to survive an application restart (or even a backup/restore). In such cases, you need to store the correlation information in a persistent store such as a database, or in a JMS queue using message properties - with selectors to retrieve the correlation information back.
I've used both queues and databases for long-running request/reply correlation with success.
It's always good practice to be able to fail over or restart the server or the application at any time, knowing that any ongoing processing will pick up where it left off without errors.
There is a cost in complexity and performance, but robustness is often preferred over performance.
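To make the correlation-store idea concrete, here is a language-agnostic sketch (written in TypeScript for consistency with the other examples on this page, not as Camel code; a Map stands in for a durable database table, and jms/amqp are illustrative senders):

    // Durable correlation store: in real code this is a DB table or a queue,
    // so the mapping survives restarts; a Map is used here only as a stand-in.
    const store = new Map<string, { replyTo: string; createdAt: Date }>();

    const jms = { async send(queue: string, msg: unknown) { console.log("JMS ->", queue, msg); } };
    const amqp = { async send(dest: string, body: unknown) { console.log("AMQP ->", dest, body); } };

    // Request route: persist the reply-to first, then forward to the slow service.
    async function onAmqpRequest(msg: { correlationId: string; replyTo: string; body: unknown }) {
      store.set(msg.correlationId, { replyTo: msg.replyTo, createdAt: new Date() });
      await jms.send("slow-service-queue", { correlationId: msg.correlationId, body: msg.body });
    }

    // Reply route: possibly days later (and possibly after a restart), look the
    // reply-to back up from the durable store and answer the original caller.
    async function onJmsReply(reply: { correlationId: string; body: unknown }) {
      const rec = store.get(reply.correlationId);
      if (!rec) return; // unknown, expired, or already handled
      await amqp.send(rec.replyTo, reply.body);
      store.delete(reply.correlationId);
    }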

Potential pitfalls in node/express to have callbacks complete AFTER a response is returned to the client?

I want to write a callback that takes a bit of time to complete an external IO operation, but I do not want it to interfere with sending data back to the client. I don't care about waiting for callback completion for purposes of the reply back to the client, but if the callback results in an error, I would like to log it. About 80% of executions will result in this callback executing after the response has been sent back to the client and the connection is closed.
My approach works well and I have not seen any problems, but I would like to know whether there are any pitfalls in this approach that I may be unaware of. I would think that node's evented IO would handle this without issue, but I want to make sure before I commit this architecture to production. Any issues that should make me reconsider this approach?
As long as you're not trying to reference that response object after the response is sent, this will not cause any problems. There's nothing special about a request handler that cares one bit about callbacks in its code being invoked after the response is generated.
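For reference, the pattern in question looks roughly like this (a sketch assuming express; slowExternalIo is an illustrative stand-in for the real IO call):

    import express from "express";

    const app = express();

    app.get("/thing/:id", (req, res) => {
      res.json({ ok: true }); // the client is done at this point

      // Fire-and-forget: nothing below touches `res` again, only logging.
      slowExternalIo(req.params.id).catch((err) => {
        console.error(`post-response IO failed for ${req.params.id}:`, err);
      });
    });

    async function slowExternalIo(id: string): Promise<void> {
      // placeholder for the real external IO operation
    }

    app.listen(3000);

The .catch() is the important part: an unhandled rejection after the response is sent would otherwise be the one way this pattern could bite.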
