RabbitMQ one-consumer/one-publisher pattern - node.js

I'm a bit confused/stuck. I've been reading around for RabbitMQ best practices, and a lot of articles come up stating that you should have two connections -- one dedicated to your subscriber and one to your publisher.
So what I've ended up doing is essentially starting my server like so:
amqp.connect(RABBIT_URL, (err, conn) => {
  conn.createChannel((amqpErr, subscribingChannel) => {
    // .....
  });
});

amqp.connect(RABBIT_URL, (err, conn) => {
  conn.createChannel((amqpErr, publishingChannel) => {
    // .......
  });
});
...but I'm 99% certain this isn't the correct way to do this. This is where my first question comes in: how do I maintain this rule within one service?
Also, it doesn't really work out, because after a task is done (i.e. a page has finished scraping), I need to fire an event that triggers parsing in another service. I was pretty much doing a
ch.publish(...)
right before I'd do the ack. This isn't pure, as the channel I had solely for consuming is now publishing events to trigger a parse in the other service.
This type of 'event/action' order carries on through 2 other services
(1. web/app --> 2. scrape --> 3. parse --> 4. analytics )
My plan is just to trigger an event after the completion at each service. Is this the correct way to do this?
I guess there are two questions.
Thank you.. thank you.. thank you so much to whoever can help me here. Lost an entire weekend just dabbling around. :(

I've been reading around for RabbitMQ best practices, and a lot of articles come up stating that you should have two connections -- one dedicated to your subscriber and one to your publisher.
Multiple connections and channels as a best practice don't necessarily translate well to Node. Since the event loop is single-threaded, you don't gain any real benefit from opening multiple channels, unless you are using child processes or some other form of threading which allows you to utilize both channels at once.
Also, it doesn't really work out, because after a task is done (i.e. a page has finished scraping), I need to fire an event that triggers parsing in another service. I was pretty much doing a ch.publish(...)
right before I'd do the ack. This isn't pure, as the channel I had solely for consuming is now publishing events to trigger a parse in the other service.
This is not really an issue if you're using a single channel, which, as I mentioned before, is not really a problem in Node.
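For illustration, here's a minimal sketch of consuming and publishing on the same channel with amqplib; the queue names and the scrape() helper are assumptions for the sake of the example, not part of your setup:

const amqp = require('amqplib/callback_api');

amqp.connect(RABBIT_URL, (err, conn) => {
  if (err) throw err;
  conn.createChannel((chErr, ch) => {
    if (chErr) throw chErr;
    ch.assertQueue('scrape.jobs', { durable: true });
    ch.assertQueue('parse.jobs', { durable: true });
    ch.consume('scrape.jobs', (msg) => {
      const result = scrape(msg.content); // your scraping logic (hypothetical helper)
      // Publish the follow-up event on the same channel...
      ch.sendToQueue('parse.jobs', Buffer.from(JSON.stringify(result)), { persistent: true });
      // ...then ack the message that triggered it.
      ch.ack(msg);
    });
  });
});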
My plan is just to trigger an event after the completion at each service. Is this the correct way to do this?
I see no problem with this method, and have used it quite extensively in my own Node microservices. Rabbit is a very robust and flexible system, and everyone's architecture will differ depending on their own application/service needs.

Related

Azure Durable Functions as Message Queue

I have a serverless function that receives orders, about ~30 per day. This function depends on a third-party API to perform some additional lookups and checks. However, this external endpoint isn't 100% reliable, and I need to be able to store order requests if the other API isn't available for a couple of hours (or more..).
My initial thought was to split the function into two, the first part would receive orders, do some initial checks such as validating the order, then post the request into a message queue or pub/sub system. On the other side, there's a consumer that reads orders and tries to perform the API requests, if the API isn't available the orders get posted back into the queue.
However, someone suggested to me to simply use an Azure Durable Function for the requests and store the current backlog in the function state, using the Aggregator Pattern (especially since the API will be working fine 99.99..% of the time). This would make the architecture a lot simpler.
What are the advantages/disadvantages of using one over the other, am I missing any important considerations?
I would appreciate any insight or other suggestions you have. Let me know if additional information is needed.
You could solve this problem with the Durable Task Framework, Azure Storage Queues, or Service Bus Queues, but at your transaction volume, I think that's overcomplicating the solution.
If you're dealing with ~30 orders per day, consider one of the simpler solutions:
Use Polly, a well-supported resilience and fault-tolerance framework.
Write request information to your database. Have an Azure Function Timer Trigger read it occasionally and finish processing orders that aren't marked as complete (sketched below).
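A rough sketch of that second option as a Node.js timer-triggered Azure Function; orderStore and callThirdPartyApi are hypothetical stand-ins for your own persistence and API-client code:

// Runs on the schedule defined in function.json (e.g. every 5 minutes).
module.exports = async function (context, myTimer) {
  const pending = await orderStore.findByStatus('pending'); // hypothetical data access
  for (const order of pending) {
    try {
      await callThirdPartyApi(order);          // the unreliable external lookup
      await orderStore.markComplete(order.id); // only marked done on success
    } catch (err) {
      // Leave the order as 'pending'; the next timer run will retry it.
      context.log(`Order ${order.id} still pending: ${err.message}`);
    }
  }
};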
Durable Task Framework is great when you get into serious volume. But there's a non-trivial learning curve for the framework.

Redis Streams: How to manage perpetual subscription and BLOCK behaviour?

I am using Redis in an Express application. My app is both a publisher and consumer of streams, using a single redis connection (redis.createClient).
I have a question on the best way to manage a perpetual subscription (with xreadgroup).
Currently I am doing this:
const readStream = () => xreadgroup('GROUP', appId, consumerId, 'BLOCK', 1, 'COUNT', 1, 'STREAMS', key, '>')
  .then(handleData)
  .then(() => setImmediate(readStream));
Where xreadgroup is just a promisified version of node-redis' xreadgroup.
My question - What is the appropriate usage of BLOCK? If I block indefinitely or for a long period, then my client cannot publish any messages (with xadd) until it is unblocked, or the block times out. Since I must use some sort of loop/recursion in order to keep reading events, BLOCK appears to be fairly unnecessary; can I just leave it off and is this the expected usage?
Likewise, is using setImmediate appropriate, or would process.nextTick or an async loop be preferred?
There is very little documentation in node-redis and the few examples simply read the messages once after blocking and do not produce/consume on the same client.
Not an expert on the subject, but I'd like to share some thoughts that might help.
I'm not sure if node-redis can "stack" multiple commands, meaning - will it be able to fire new commands while waiting for the XREADGROUP to complete?
From your description, it seems like that's what's happening. In that case, I suggest you create a dedicated connection to call XREADGROUP - this way you can publish and listen without blocking one another.
You don't need to use BLOCK, but if your goal is to listen to all events and wait for those not yet published, using it might be wise and will give you better performance while making fewer calls to Redis.
setImmediate is probably good, especially when using BLOCK. If you don't use BLOCK, it might be good to add a little bit of timeout between calls, since without BLOCK the calls will return almost immediately. You can check this for more details.
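To sketch the dedicated-connection idea with node-redis (the BLOCK/COUNT values here are only illustrative, and handleData is assumed to XACK what it processes):

const redis = require('redis');
const { promisify } = require('util');

const publisher = redis.createClient(REDIS_URL); // used for XADD, never blocked by reads
const reader = publisher.duplicate();            // dedicated connection for XREADGROUP
const xreadgroup = promisify(reader.xreadgroup).bind(reader);

const readStream = () =>
  xreadgroup('GROUP', appId, consumerId, 'BLOCK', 5000, 'COUNT', 10, 'STREAMS', key, '>')
    .then(handleData) // remember to XACK in here
    .then(() => setImmediate(readStream));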
Friendly reminder: don't forget to ACK your messages or use NOACK (might be ok depending on your use case):
consumer groups require explicit acknowledgment of the messages successfully processed by the consumer, via the XACK command.
The NOACK subcommand can be used to avoid adding the message to the PEL in cases where reliability is not a requirement and the occasional message loss is acceptable.
Source: https://redis.io/commands/xreadgroup

Axon creating aggregate inside saga

I'm not sure how to properly ask this question, but here it is:
I'm starting the saga on a specific event, then I'm dispatching a command which is supposed to create some aggregate and then send another event which will be handled by the saga to proceed with the logic.
However, each time I restart the application I get an error saying that the event for the aggregate at sequence x was already inserted, which, I suppose, is because the saga has not yet finished, and on restart it starts again by trying to create a new aggregate.
My question is: is there any way in Axon to track the progress of the saga? Should I set some flags when I receive the event and wrap the aggregate creation in ifs?
Maybe there is another way which I'm not seeing; I just don't want the saga to be replayed from the start.
Thanks
The solution you've posted definitely would work.
Let me explain the scenario you've hit here though, for other people's reference too.
In an Axon Framework 4.x application, any Event Handling Component, thus also your Saga instances, are backed by a TrackingEventProcessor.
The Tracking Event Processor "keeps track of" which point in the Event Stream it is handling events. It stores this information through a TrackingToken, for which the TokenStore is the delegating piece of work.
If you haven't specified a TokenStore however, you will have in-memory TrackingTokens for every Tracking Event Processor.
This means that on a restart, your Tracking Event Processor thinks "oh, I haven't done any event handling yet, let me start from the beginning of time".
Due to this, your Saga instances will start anew every time, trying to recreate the given Aggregate instance.
Hence, specifying the TokenStore as you did resolves the problem you had.
Note that in a Spring Boot environment, with for example the Spring Data starter present, Axon will automatically create the JpaTokenStore for you.
I've solved my issue by simply adding token store configuration; it does exactly what I require - tracking processed events.
Basic Spring config:
@Bean
fun tokenStore(client: MongoClient): TokenStore = MongoTokenStore.builder()
    .mongoTemplate(DefaultMongoTemplate.builder().mongoDatabase(client).build())
    .serializer(JacksonSerializer.builder().build())
    .build()

NodeJS Performance - Multiple routes vs Single routes

I have created a single endpoint in Node.js.
Following is the end-point:
app.post('/processMyRequests', function(req, res) {
  switch (req.body.functionality) {
    case "functionalityName1":
      jsFileName1.functionA(req, res);
      break;
    case "functionalityName2":
      jsFileName2.functionB(req, res);
      break;
    default:
      res.send("Sorry for that");
      break;
  }
});
In each of these functions, calls to APIs are made, then the data is processed, and finally a response is sent back.
My questions:
Since Node.js by default handles requests asynchronously, can we have a single route for all the responses?
Will concurrency be an issue, i.e. when parallel hits come into the single route, will Node.js stall or slow down?
If the answer to question (2) is YES, how does it change with separate routes? If the same number of requests comes into a specific route, won't it be the same issue?
Would be happy if someone could share real-time use cases. Thanks
You technically can have a single route for all the responses, but it's considered better practice to create endpoints which are compact, clear in their intended purpose, and not too complex. In your example, there are many possible branches of code that the route could take. Each requires unique logic, which adds to the complexity of your endpoint and takes away from the clarity of the code. Imagine that when an error occurs, you now have to debug potentially multiple different files and different branches of your endpoint, when you could have created a separate endpoint for each unique "branch".
As your application grows in both size, and complexity, you are going to want an easy way to manage your API. Putting lots of stuff into one endpoint is going to be a nightmare for you to maintain, and debug.
It may be useful for you to look at some tutorials/docs about how to design and implement an API; here is a good article from Scotch.io.
Example for question one:
// GET multiple records
app.get('/functionality1', function(req, res) {
  // Unique logic for functionality
});

// GET a record by an 'id' field
app.get('/functionality1/:id', function(req, res) {
  // Unique logic for functionality
});

// POST a new record
app.post('/functionality1', function(req, res) {
  // Unique logic for functionality
});

// PUT (update) a record
app.put('/functionality1', function(req, res) {
  // Unique logic for functionality
});

// DELETE a record
app.delete('/functionality1', function(req, res) {
  // Unique logic for functionality
});

app.get('/functionality2', function(req, res) {
  // Unique logic for functionality
});

...
This gives you a much clearer idea of what is happening for each endpoint, versus having to digest a lot of technically unrelated logic in a single API endpoint. Summing up: it's better to have endpoints which are clear and concise in their purpose and scope.
It really depends on how the logic is implemented. Node.js is single-threaded, which means it can only process one "stream" of code at a time (no true parallelism). However, Node gets around this through its event loop. Whether you run into problems depends on whether you wrote asynchronous (non-blocking) code or synchronous (blocking) code. In Node it's almost always better, and recommended, to write non-blocking code. This helps prevent blocking the event loop, meaning your Node app can do other things while, for example, waiting for a file to finish being read, an API call to finish, or a promise to resolve. Writing blocking code will result in your application bottlenecking/"hanging", which your end-users perceive as higher latency.
Having multiple routes, or a single route isn't going to resolve this problem. It's more about how you are utilizing (or not utilizing) the event loop. It's extremely important to use asynchronous code as much as possible.
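As a quick illustration of the difference (the file name is made up; the synchronous variant stalls every other request while it reads):

const express = require('express');
const fs = require('fs');
const app = express();

// Blocking: the event loop stalls for ALL requests while the file is read.
app.get('/report-sync', function(req, res) {
  const data = fs.readFileSync('./big-report.json');
  res.json(JSON.parse(data));
});

// Non-blocking: the event loop stays free to serve other requests meanwhile.
app.get('/report', function(req, res) {
  fs.readFile('./big-report.json', 'utf8', function(err, data) {
    if (err) return res.status(500).send(err.message);
    res.json(JSON.parse(data));
  });
});

app.listen(3000);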
One thing that you can do if you absolutely must use synchronous code (this is actually a good approach to leverage regardless of code synchronicity) is to implement a microservice architecture, where a separate service processes your blocking (or resource-intensive) code off of your API Node service. This frees up your API service to handle requests as rapidly as possible and leaves the heavy lifting to other services.
Another possibility is to leverage clustering. This gives you the ability to run Node as if it were multi-threaded, by spawning "worker" processes, which are identical to your master process, with the difference that they are able to process work individually. This type of approach is extremely useful if you expect that you will have a very busy API service.
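A minimal sketch with Node's built-in cluster module (the port and routes are placeholders):

const cluster = require('cluster');
const os = require('os');
const express = require('express');

if (cluster.isMaster) {
  // Fork one worker per CPU core; each worker runs this same file.
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
  cluster.on('exit', function(worker) {
    console.log('Worker ' + worker.process.pid + ' died, restarting');
    cluster.fork();
  });
} else {
  const app = express();
  // ...mount your routes here...
  app.listen(3000); // workers share the port; connections are distributed
}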
Some extremely helpful resources:
Node.js Express Best Practices
A GREAT video explaining the event-loop
Parallelism vs. Concurrency in Node.js
Node.js Clustering
API Design

Implementing general purpose long polling

I've been trying to implement a simple long polling service for use in my own projects, and maybe release it as a SaaS if I succeed. These are the two approaches I've tried so far, both using Node.js (polling PostgreSQL on the back end).
1. Periodically check all the clients in the same interval
Every new connection is pushed onto a queue of connections, which is walked through at an interval.
var queue = [];

function acceptConnection(req, res) {
  res.setTimeout(5000);
  queue.push({ req: req, res: res });
}

function checkAll() {
  queue.forEach(function(client) {
    // respond if there is something new for the client
  });
}

// this could be replaced with a timeout after all the clients are served
setInterval(checkAll, 500);
2. Check each client at a separate interval
Every client gets its own ticker which checks for new data.
function acceptConnection(req, res) {
  // something which periodically checks data for the client
  // and responds if there is anything new
  new Ticker(req, res);
}
While this keeps the minimum latency for each client lower, it also introduces overhead by setting a lot of timeouts.
Conclusion
Both of these approaches solve the problem quite easily, but I don't feel that this will scale up easily to something like 10 million open connections, especially since I'm polling the database on every check for every client.
I thought about doing this without the database and just immediately broadcast new messages to all open connections, but that will fail if a client's connection dies for a few seconds while the broadcast is happening, because it is not persistent. Which means I basically need to be able to look up messages in history when the client polls for the first time.
I guess one step up here would be to have a data source where I can subscribe to new data coming in (CouchDB change notifications?), but maybe I'm missing something in the big picture here?
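To make that concrete, something like PostgreSQL's LISTEN/NOTIFY (via node-postgres) is the kind of subscription I have in mind; the channel name and the broadcast helper here are made up:

const { Client } = require('pg');

const subscriber = new Client({ connectionString: process.env.DATABASE_URL });
subscriber.connect()
  .then(() => subscriber.query('LISTEN new_messages'));

subscriber.on('notification', (msg) => {
  // msg.payload carries whatever the NOTIFY sent; fan it out to open responses
  broadcastToWaitingClients(msg.payload); // hypothetical helper
});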
What is the usual approach for doing highly scalable long polling? I'm not specifically bound to Node.js, I'd actually prefer any other suggestion with a reasoning why.
Not sure if this answers your question, but I like the approach of PushPin (+ explanation of concepts).
I love the idea (using a reverse proxy and communicating with return codes + delayed REST return requests), but I do have reservations about the implementation. I might be underestimating the problem, but it seems to me that the technologies used are a bit of an overkill. Not sure yet whether I will use it; I'd prefer a more lightweight solution, but I find the concept phenomenal.
Would love to hear what you used eventually.
Since you mentioned scalability, I have to get a little bit theoretical, as the only practical measure is load testing. Therefore, all I can offer is advice.
Generally speaking, once-per anything is bad for scalability. Especially once-per-connection or once-per-request since that makes part of your app proportional to the amount of traffic. Node.js removed the thread-per-connection dependency with its single-threaded asynchronous I/O model. Of course, you can't completely eliminate having something per-connection, like a request and response object and a socket.
I suggest avoiding anything that opens a database connection for every HTTP connection. This is what connection pools are for.
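For example, with node-postgres a single shared pool serves every check (the pool size and query are illustrative):

const { Pool } = require('pg');

// One pool for the whole process; 'max' caps the number of open connections.
const pool = new Pool({ connectionString: process.env.DATABASE_URL, max: 20 });

function checkForUpdates(client) {
  // Each check borrows a pooled connection instead of opening its own.
  // Table and column names are made up for illustration.
  return pool.query('SELECT * FROM messages WHERE id > $1', [client.lastSeenId]);
}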
As for choosing between your two options above, I would personally go for the second choice because it keeps each connection isolated. The first option uses a loop over connections, which means actual execution time per connection. It's probably not a big deal given that I/O is asynchronous, but given a choice between an iteration-per-connection and the mere existence of an object-per-connection, I would prefer to just have an object. Then I have less to worry about when suddenly there are 10,000 connections.
The C10K problem seems like a good reference for this, though this is really personal judgement to be honest.
http://www.kegel.com/c10k.html
http://en.wikipedia.org/wiki/C10k_problem
