Camel inout with very long response times - multithreading

We have the following scenario that we would like to solve using Apache Camel:
An asynchronous request arrives at an AMQP endpoint configured in Camel. This message contains a reply-to header property that should be used for the response. Camel must pass this message to another service using JMS and then route the response back to the reply-to queue from the AMQP request. This seems like a textbook example for using the InOut functionality in Camel, but we have one problem: the reply from the JMS service could take a long time, in some cases several days.
As I understand it, using InOut would mean locking a thread for the duration of the call to the long-running service. If we are unlucky, we could get several long-running calls simultaneously, and in the worst case all threads would be busy waiting for replies, thus clogging the system.
What strategy should I use to solve the problem described above? At the moment, I have created two separate routes: one listens to the AMQP endpoint and forwards the message to the JMS endpoint. The other listens to the reply-to queue of the JMS system and is responsible for sending the reply back to the AMQP reply-to. The problem I have right now is how to store the AMQP reply-to between these two routes, and I am not sure this is a good solution overall for this problem.
Any tips or ideas on how to solve this problem would be greatly appreciated.

If you have to wait more than a minute for a reply, it's probably a good idea to treat the reply as asynchronous and create separate request and response routes.
Since you mention several days, you might even want to survive an application restart (or even a backup and restore) and still correlate the response. In such cases, you need to store the correlation information in a persistent store, such as a database, or a JMS queue with message properties and selectors to retrieve the correlation information again.
I've used both queues and databases for long time request/reply correlation information with success.
It's always a good practice to be able to fail over/restart the server or the application at any time knowing that any ongoing processing will take up where it left off without errors.
There is a cost in complexity and performance, but robustness is often preferred to performance.
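As a rough illustration of the database variant, here is a minimal, language-neutral sketch in Python rather than Camel code. The class name `CorrelationStore`, the table layout, and the example reply-to address are all assumptions made for this example. The request route would call `remember` before forwarding to JMS, and the response route would call `resolve` when the reply eventually arrives.

```python
import sqlite3


class CorrelationStore:
    """Hypothetical persistent map from correlation id to AMQP reply-to."""

    def __init__(self, path=":memory:"):
        # Use a real file path to survive restarts; ":memory:" is for demonstration only.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS pending ("
            "correlation_id TEXT PRIMARY KEY, reply_to TEXT NOT NULL)"
        )
        self.db.commit()

    def remember(self, correlation_id, reply_to):
        # Called on the request route, before the message is sent to the slow service.
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT OR REPLACE INTO pending VALUES (?, ?)",
                (correlation_id, reply_to),
            )

    def resolve(self, correlation_id):
        # Called on the response route; returns the reply-to (or None) and removes the entry.
        with self.db:
            row = self.db.execute(
                "SELECT reply_to FROM pending WHERE correlation_id = ?",
                (correlation_id,),
            ).fetchone()
            if row is None:
                return None
            self.db.execute(
                "DELETE FROM pending WHERE correlation_id = ?", (correlation_id,)
            )
            return row[0]


store = CorrelationStore()
store.remember("req-1", "replies.client-a")
print(store.resolve("req-1"))
```

No thread is held while the reply is outstanding; the only state kept between the two routes is one row per pending request.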

Related

NodeJS Polling per User Structure best practice

My project is a full stack application where a web client subscribes to an unready object. When the subscription is triggered, the backend will run an observation loop to that unready object until it becomes ready. When that happens it sends a message to the frontend through socketIO (suggestions are welcome, I'm not quite sure if it's the best method). My question is how do I construct the observation loop.
My frontend basically subscribes to the backend. If the subscription succeeded, it gets a 200 response and connects to the server via WebSocket (socketIO); if something went wrong, it gets a 4XX error code. On the backend, when the user subscribes, it should start a "thread" for that user (I know Nodejs doesn't support threads, it's just for the mental image) that polls information from an API every 10 or so seconds.
I do that because the API that I poll does not support webhooks, so I need to observe the API response until it reaches the state that I want (this part I have already sorted out).
What I'm asking is: is there a third-party library that is actually meant for this kind of task? Should I use worker threads or simple setTimeouts abstracted by classes? The response will be sent over SocketIO, and I already have that part working as well; it's just the polling mechanism that I'm not quite sure how to build.
I'm also open to use another fitting programming language that makes solving this case easier. I'm not in a hurry.
A polling network request (which it sounds like this is) is non-blocking and asynchronous so it doesn't really take much of your nodejs CPU unless you're doing some heavy-weight computation of the result.
So, a single nodejs thread can make a lot of network requests (for your polling and for sending data over socket.io connection) without adding WorkerThreads or clustering. This is something that nodejs is very, very good at.
I'm not aware of any third party library specifically for this as you have to custom code looking at the results of the network request anyway and that's most of the coding. There are a bunch of libraries for making http requests of other servers from nodejs listed here. My favorite in that list is got(), but you can look at the choices and decide what you like.
As for making the repeated requests, I would probably just use either repeated setTimeout() calls or a setInterval() call.
You don't say whether you have to make separate requests for every single client that is subscribed to something or whether you can somehow combine all clients watching the same resource so that you use the same polling interval for all of them. If you can do the latter, that would certainly be more efficient.
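To make the "combine all clients watching the same resource" idea concrete, here is a small sketch, written in Python for brevity rather than Node.js. `SharedPoller`, `fetch_state`, and `is_ready` are hypothetical names, not part of any library. A timer (a `setInterval()` in Node.js) would call `poll_once` every 10 seconds or so, and each callback would emit over the client's socket.

```python
from collections import defaultdict


class SharedPoller:
    """Hypothetical registry: one upstream request per resource, however many clients watch it."""

    def __init__(self, fetch_state, is_ready):
        self.fetch_state = fetch_state  # assumed: resource_id -> current state
        self.is_ready = is_ready        # assumed: state -> bool
        self.subscribers = defaultdict(list)

    def subscribe(self, resource_id, callback):
        # Several clients may subscribe to the same resource.
        self.subscribers[resource_id].append(callback)

    def poll_once(self):
        # One polling tick over all resources that still have subscribers.
        for resource_id in list(self.subscribers):
            state = self.fetch_state(resource_id)
            if self.is_ready(state):
                # Notify every subscriber, then stop polling this resource.
                for callback in self.subscribers.pop(resource_id):
                    callback(resource_id, state)
```

Once a resource becomes ready it is dropped from the registry, so finished subscriptions cost no further requests.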
If, as you scale, you run into scaling issues, you can then move the polling code to one or more child processes or WorkerThreads and then just communicate back to the main thread via messaging when you have found a new state that needs to be sent to the client. But, I would not anticipate you would need to code that extra step until you reach larger scale. As with most scaling things, you would need to code up the more basic option (which should scale well by itself) and then measure and benchmark and see where any bottlenecks are and modify the architecture based on data, not speculation. Far too often, the architecture is over-designed and over-implemented based on where people think the bottlenecks might be rather than where they actually turn out to be. Not only does this make the development take longer and end up with more complicated implementation than required, but it can target development at the wrong part of the problem. Profile, measure, then decide.

Google Pub-Sub two way communication architecture

I'm trying to understand how to do two-way communication with google pub-sub with the following architecture
EDIT: I meant to say subscribers instead of consumers
I'm trying to support the following workflow:
UI sends a request to an API service to start an async process
API service publishes the request to a topic to kick off the process
The consumer picks up the message and runs the async process service
Once the async process service is done, it publishes to a process-complete topic
Here is where I want the UI to pick up the process complete message and I'm trying to figure out the best approach.
So two questions:
Is the multiple topic the preferred approach when wanting to do two-way communication back to the client? Or is there a way to do this with a single topic with multiple subscriptions?
How should the consumer of the Process-Complete get the response back to the UI? Should the UI be the consumer of the subscription? Or should I send it back to the api service and publish a websocket message? Both these approaches seem to have tradeoffs.
Multiple topics are going to be preferred in this situation, one for messages going to the asynchronous processors and then one for the responses that go back. Otherwise, your asynchronous processors are going to needlessly receive the response messages and have to ack them immediately, which is unnecessary extra delivery of messages.
With regard to getting the response back to the UI, the UI should not be the consumer of the subscription. In order to do that, you'd need every running instance of the UI to have its own subscription, because otherwise messages would be load balanced across the instances and you couldn't guarantee that the particular client that sent the request would actually receive the response. The same would be true if you have multiple API servers that need to receive particular responses based on the requests that were transmitted through them. Cloud Pub/Sub isn't really designed for topics and subscriptions to be ephemeral in this way; it is best when these are created once and all of the data is transmitted across them.
Additionally, having the UI act as a subscriber means that you'd have to have the credentials in the UI to subscribe, which could be a security issue.
You might also consider not using a topic for the asynchronous response. Instead, you could encode as part of the message the address or socket of the client or API server that expects the response. Then, the asynchronous processor could receive a message, process it, send a response to the address specified in the message, and then ack the message it received. This would ensure responses are routed to where they need to go and minimize the delivery of messages that subscribers just ack that they don't need to process, e.g., messages that were intended for a different API server.
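A minimal sketch of that last idea, in Python and without any real Pub/Sub client calls: the `reply_to` field name and the `process`, `send`, and `ack` callables are assumptions standing in for whatever the actual system provides.

```python
import json


def make_request(payload, reply_to):
    # The publisher embeds the address that expects the response (hypothetical field name).
    return json.dumps({"reply_to": reply_to, "payload": payload})


def handle_message(raw, process, send, ack):
    # Asynchronous processor: do the work, respond directly to the requester, then ack.
    message = json.loads(raw)
    result = process(message["payload"])
    send(message["reply_to"], result)
    ack()  # ack only after the response has been handed off
```

Acking last means a processor that fails before responding leaves the message unacknowledged, so it can be delivered again.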

Transactions / request-response-pattern in flow based/reactive programming

So I have been reading about flow based programming (FBP) in the last few days and I have also been reading J. Paul Morrison's book about it. However I feel I still can't really wrap my head around it. The general concept is that you see programming as some sort of assembly line where you have components that take some packet as input and produce some packets as output. You can connect these components and packets travel through the network. While I totally see how this can work for ETL type applications or batch processing, I have no good idea how you could handle things like synchronous request/response patterns or database transactions with it.
For example, let's say I have a web server implemented as FBP. This web server has a GET /user/{id} which should return a JSON with some information about a user. It also has a POST /user/{id} where you can update the user by sending some JSON back to the server. So here is how I would imagine this flow to look:
I tried to have many re-usable components instead of putting the whole logic of handling a request into a single component. So there is a HTTP server component which sends out requests to a dispatcher component which then dispatches the requests into subsequent flows. In each flow the request is parsed by a generic "Request parser" component which outputs various parts of the request into the rest of the flow.
The upper part is quite straightforward: I read the entity of the user with the given ID from the DB, serialize the object to JSON and then send it back. However, at this point we don't really have a reference to the HTTP request anymore, so how would I know where to send this response to?
On the lower part we have some additional complexity because I would like to write to the database in a transactional way. So first a transaction is started (in parallel the request body is parsed into some object), then the user object is retrieved from the database and merged with the inputs from the request. At the end it is written back to the database and the transaction is committed. Finally some "OK" status is responded to the caller. Here I have the additional problem that when committing the transaction I really don't know which transaction to commit. And of course when sending the response I don't know which request to send it to.
So both problems seem to have something in common - a kind of "Context" that spans over many components. On one example it is a HTTP request/response context in the other a transactional context. In regular programming, these contexts are usually handled at the thread level. Since a request runs in a single thread, the transaction and request contexts are bound to a thread-local so they can be accessed everywhere as long as everything is running in the same thread.
In flow based programming, every component runs independently and ideally on separate threads. This is actually a key thing because it allows for parallelization and effective use of multiple processors. However when that thread-local context is no longer there, how can you handle these problems in flow based programming? This would get even more complicated with proper error handling (which I left out in my example).
I figure that when you do reactive style programming where most of the processing is asynchronous and multithreaded as well you will have the same issues, so I wonder if there are patterns to handle this. Do you have real life experience with either reactive style programming or flow based programming and have some hints on how I could solve this problem?
I wrote a quick answer on Twitter - thought I would post it here as well... Apologies for double-posting!
I like substreams for this/these problem(s), where the first Information Packet in the substream provides the "context" you were talking about. This may help: https://github.com/jpaulm/javafbp-websockets... HTH!
PS This loop-style network topology is also the basis of Facebook's new "Flux" technology - see Jing Chen's presentation, in which she compares this approach with MVC: https://www.youtube.com/watch?v=nYkdrAPrdcw
Hopefully this may nudge you in the right direction. I had a similar issue where I needed to perform a synchronous operation in an asynchronous microservice architecture.
I solved it using the Observer pattern. I have 3 components: an HTTP server, a callback server and a timer wheel. The HTTP server, similar to yours, receives the incoming request; the callback server receives the overall result after asynchronous processing; and the timer wheel queues the original HTTP context and reconciles the response with the HTTP request.
When an incoming request is received, the HTTP server creates a correlation ID, appends it to the request metadata, appends the callback server URL to the request metadata and finally adds the request and the original HTTP context together into the timer wheel. Then the HTTP server passes the request to the dispatcher, as in your case, and sends messages to the relevant components for asynchronous processing.
Depending on the outcome of its execution, the current processing component retrieves the callback URL from the metadata and sends the response to the callback server. In your case, the JSON serialization or the database write would do this. The callback server then extracts the correlation ID that was appended, gets the corresponding HTTP context and writes the response.
NB: each timer object in the timer wheel has a configurable timeout, so if the asynchronous processing is delayed, it times out and returns a configurable message to the HTTP client of the corresponding HTTP context.
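The bookkeeping described above can be sketched roughly as follows. This is not the answerer's implementation: `PendingRequests` is a hypothetical name, a plain dictionary scan stands in for a real timer wheel, and `respond` represents writing to the stored HTTP context.

```python
import uuid


class PendingRequests:
    """Hypothetical registry keeping HTTP contexts until a callback or timeout arrives."""

    def __init__(self, timeout_seconds, clock):
        self.timeout = timeout_seconds
        self.clock = clock   # injectable time source, e.g. time.monotonic
        self.pending = {}    # correlation_id -> (deadline, respond)

    def register(self, respond):
        # HTTP server side: create a correlation id and park the context.
        correlation_id = str(uuid.uuid4())
        self.pending[correlation_id] = (self.clock() + self.timeout, respond)
        return correlation_id

    def complete(self, correlation_id, result):
        # Callback server side: look up the context and write the response.
        entry = self.pending.pop(correlation_id, None)
        if entry is None:
            return False  # unknown id, or the request already timed out
        entry[1](result)
        return True

    def expire(self, timeout_message="timed out"):
        # Run periodically: answer every request whose deadline has passed.
        now = self.clock()
        expired = [cid for cid, (deadline, _) in self.pending.items() if deadline <= now]
        for cid in expired:
            _, respond = self.pending.pop(cid)
            respond(timeout_message)
        return len(expired)
```

A late callback for an expired request returns `False` and is simply dropped, so each HTTP client receives exactly one response.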

Suggestion for message broker

I need some help choosing a message broker (RabbitMQ, Redis, etc.) or other suitable tools for this situation.
I am upgrading my game server, which is written in Node.js. It consists of several processes, e.g. GameRoom, Lobby, Chat, etc. When a user makes a request, the message is routed to the relevant process to handle it. I currently do this routing in my own code, and the processes communicate with each other via node-ipc. However, this is not very efficient and does not scale. Also, some processes have a very high workload (Lobby, as many requests are related to it), so we create several Lobby processes and route messages randomly among them. I think a message broker can help in this case, and I could even scale out by putting different processes on different physical servers. I would like to know which message broker is suitable for this. Can a sender send a message to a queue for which multiple consumers compete, so that only one consumer consumes it and replies to the sender? Thanks.
I'm not going to be able to talk about Kafka from experience, but any message-queue solution, such as RabbitMQ or ActiveMQ, will do what you need.
I assume you're planning a flow like so:
REST_API -> queue -> Workers ----> data persistence <--------+
                        |                                    |
                        +------> NotificationManager ----> user
The NotificationManager could be a service that lets the user know via Websockets or any other async communication method.
Some solutions will be better put together and take more weight off your shoulders. Solutions that are not just message queues but also task queues will have ways of getting responses from workers.
Machinery, a project that's been getting my attention lately, does all of those, whilst using MongoDB and RabbitMQ itself.
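The competing-consumers-with-reply pattern asked about in the question can be illustrated in-process. In this Python sketch, `queue.Queue` and threads are only a stand-in for a real broker and separate Lobby processes, and the `lobby-N` names and the upper-casing "work" are invented for the example.

```python
import queue
import threading


def worker(name, requests):
    # Each worker competes for messages on the same request queue.
    while True:
        item = requests.get()
        if item is None:  # sentinel: shut down
            break
        payload, reply_queue = item
        # Reply to the queue named by the sender (the "reply-to" of this sketch).
        reply_queue.put((name, payload.upper()))


def demo():
    requests = queue.Queue()
    workers = [
        threading.Thread(target=worker, args=(f"lobby-{i}", requests))
        for i in range(3)
    ]
    for w in workers:
        w.start()

    reply_queue = queue.Queue()               # private reply queue for this sender
    requests.put(("join room 7", reply_queue))
    reply = reply_queue.get(timeout=5)        # exactly one worker answers

    for _ in workers:
        requests.put(None)
    for w in workers:
        w.join()
    return reply, reply_queue.empty()
```

Only one worker takes each message off the shared queue, and because the sender's reply queue travels with the request, the answer finds its way back without the workers knowing anything about the sender in advance.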

Broadcasting Messages at High Frequency. Using HTTP POST or something else?

We're looking at speccing out a system which broadcasts small amounts of frequently changing data (using JSON or XML or something) to multiple recipients at a reasonably high frequency (our updates will be 1000s per second).
We were initially thinking of using HTTP POST to broadcast the data to each endpoint, maybe once every few seconds (the clients will vary as they're other people's webapps), but we're now wondering if there's a better way to hold up to the load/frequency we're hoping. I imagine we'd need to version/timestamp the messages in some way at the very least.
We're using RabbitMQ for preparing all the things ready for sending and to choose what needs to go where (from a Django app, if that matters), but we can't get all of the endpoints to use a MQ.
The HTTP POST thing just doesn't seem quite right. What else should we be looking into? Is this where things like node or socket.io or some of the new real-time frameworks fit in? We're happy to find the right expertise to help with this; we just need steering in the correct direction.
Thanks!
You don't want to do thousands of POSTs per second to multiple clients. You're going to introduce the HTTP overhead on your end pushing it out, and for all you know, you might end up flooding the server on the other end with POSTs that just swamp it.
Option 1: For clients that can't or won't read a queue, POSTs could work, but to avoid killing the server and all the HTTP overhead, could you bundle updates? Once every minute or two, take all the aggregate data and then post it to the client? This way, you don't have 60+ POST requests going to one client every minute or two for time and eternity. It'll help save on bandwidth as well, since you only send the header info once with more data, instead of sending all the header information with each piece of data.
Option 2: Have you thought about using a good ol' socket connection? Either you open a socket to the client, or vice versa, and push the data over that. That avoids the overhead of HTTP and lets the client read at the rate data arrives. If the client no longer wants to receive data, they can just close the connection. It's on the arcane side, but it'd avoid completely killing the target server.
If you can get clients to read a MQ, set up a group just for them and make your life easier so you only have to deal with those that can't or won't read the queue instead of trying for a one size fits all solution.
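The bundling in Option 1 could look roughly like the Python sketch below. `UpdateBatcher` and the `post` callable are hypothetical. Keeping only the latest value per key, and tagging it with a version counter, is an assumption added here to go with the question's remark about versioning messages. A timer would call `flush` at the chosen interval.

```python
class UpdateBatcher:
    """Hypothetical batcher: collect updates per endpoint, send one POST per flush."""

    def __init__(self, post):
        self.post = post   # assumed: post(endpoint, list_of_updates)
        self.latest = {}   # endpoint -> {key: (version, value)}
        self.version = 0

    def add(self, endpoint, key, value):
        # A newer update for the same key replaces the older one (assumption).
        self.version += 1
        self.latest.setdefault(endpoint, {})[key] = (self.version, value)

    def flush(self):
        # Swap out the buffer, then send one request per endpoint.
        batches, self.latest = self.latest, {}
        for endpoint, updates in batches.items():
            body = [
                {"key": key, "version": version, "value": value}
                for key, (version, value) in sorted(updates.items())
            ]
            self.post(endpoint, body)
        return len(batches)
```

With thousands of updates per second, each endpoint still receives only one request per interval, and the request size is bounded by the number of distinct keys rather than the update rate.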
