How to handle multiple mongo ChangeStreams in a multi instance service - node.js

I am trying to solve a problem where I am listening on a Mongo change stream, but in our production environment we run multiple instances of the service, so each instance receives the 'change' event and processes it.
I want each change to be processed ONLY ONCE, but here it is processed multiple times (once per instance of the service).
I need a way to ensure that each change event is processed only once.
I thought of using Redis, but then there is a chance of collision when two instances try to record the same change event at the same time.
Please refer to the diagram below for a rough idea of my architecture.
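One way to rule out the collision: Redis's SET with the NX and EX options is a single atomic command, so when two instances race on the same event key, only one of them ever receives 'OK'. A minimal sketch, assuming an ioredis-style client; `claimEvent`, the `cs:` key prefix, and `handleChange` are my own names, not part of any library:

```javascript
// Claim a change event exactly once via an atomic SET key value EX ttl NX.
// Returns true only for the single instance that wins the race.
async function claimEvent(redis, eventId, ttlSeconds = 3600) {
  const reply = await redis.set(`cs:${eventId}`, '1', 'EX', ttlSeconds, 'NX');
  return reply === 'OK'; // losers get null back
}

// Usage inside the change stream handler (sketch):
// changeStream.on('change', async (change) => {
//   const id = change._id._data; // the resume token is unique per event
//   if (await claimEvent(redisClient, id)) {
//     await handleChange(change); // only one instance reaches this
//   }
// });
```

The TTL keeps the dedupe keys from accumulating forever; pick it longer than the maximum time instances could disagree about an event.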

Related

How to run multiple instances without duplicate jobs in Node.js

I have a problem when I scale my (NestJS) project to multiple instances. The project has a crawler service that runs every 10 minutes. When 2 instances are running, the crawler runs on both, so the data is duplicated. Does anyone know how to handle this?
It looks like it could be handled with a queue, but I don't have a solution yet.
Jobs aren't the right construct in this case.
Instead, use a job queue: https://docs.nestjs.com/techniques/queues
You won't even need to set up a separate worker server to handle the jobs. Instead, add Redis (or similar) to your setup, configure a queue to use it, then set up 1) a producer module that adds jobs to the queue whenever they need to run, and 2) a consumer module that pulls jobs off the queue and processes them. Add logic to the producer module to ensure that duplicate jobs aren't created, since that logic will be running on both of your machines.
Alternatively, it may be easier to separate job production/processing into a separate server.
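One way to implement that producer-side duplicate check: derive a deterministic jobId from the schedule window, since Bull silently ignores an add() whose jobId already exists in the queue. A sketch under those assumptions; `windowJobId`, the `crawl` queue name, and `runCrawler` are made up for illustration:

```javascript
// Both instances compute the same id within a given 10-minute window,
// so the second add() with that jobId is a no-op and the crawl runs once.
function windowJobId(date = new Date()) {
  const windowIndex = Math.floor(date.getTime() / (10 * 60 * 1000));
  return `crawl-${windowIndex}`;
}

// Producer (runs on every instance):
// const Queue = require('bull');
// const crawlQueue = new Queue('crawl', 'redis://127.0.0.1:6379');
// await crawlQueue.add({}, { jobId: windowJobId() }); // duplicate id => ignored
//
// Consumer (any instance may register it, but only one picks each job up):
// crawlQueue.process(async () => runCrawler());
```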

Mongo watch change stream suddenly stopped working

I'm using Mongo's watch() to subscribe to change stream events. I've noticed that the change stream events stop automatically, without any specific error, and the stream becomes idle; I then have to restart the server to listen to the change stream again.
I'm not able to find the specific reason for this strange behavior.
We are using a Node.js server, with mongoose for the DB connection and watch().
If any of you have faced the same issue, please guide me. We have a cluster with 1 primary node and 2 secondary nodes, hosted on MongoDB Atlas.
The collection.watch(...) method has to be called on the collection after every server restart. A common mistake is to call it once upon creation of the collection; however, the database does not maintain a reference to the result of this call as it does for calls such as collection.createIndexes(...).
Change streams only notify on data changes that have persisted to a majority of data-bearing members in the replica set. This ensures that notifications are triggered only by majority-committed changes that are durable in failure scenarios.
Change stream events stop working when a node fails in a replica set
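Building on the restart point above, the re-watching can be automated: remember the last event's _id (which doubles as a resume token) and re-open the stream with resumeAfter whenever it errors out. A rough sketch against the official driver's collection.watch(); `watchForever` and `onChange` are my own names:

```javascript
// Keep a change stream alive across errors by re-watching from the last token.
function watchForever(collection, onChange) {
  let resumeToken = null;
  const open = () => {
    const opts = resumeToken ? { resumeAfter: resumeToken } : {};
    const stream = collection.watch([], opts);
    stream.on('change', (change) => {
      resumeToken = change._id; // remember where we are in the oplog
      onChange(change);
    });
    // Election, network blip, etc.: re-open from the stored token.
    stream.on('error', () => open());
  };
  open();
}
```

This is deliberately minimal: production code would also add a backoff between reconnects and handle a resume token that has aged out of the oplog.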

Mongo DB change stream with load balancing in a production environment

Mongo change stream with load balancing
Can anyone help with how we can achieve a Mongo change stream with a load-balancing server?
We are working on a microservice architecture and facing an issue in production: with load balancing, the same code is deployed on 4 servers, and when I perform an operation on a single server, the change stream trigger fires on all 4 servers.
What I want is for it to fire only on the server where the operation was performed.
Thanks in advance
Change stream is a database-level concept. Data is inserted/updated/deleted from the database, this produces change events. Any number of subscribers can subscribe to the change events and do whatever they want to do with the changes. Each subscriber is notified of every change event.
A change stream is not meant to inform the application that originated a change of that change. This is redundant - the application already knows what it did.
Consider rephrasing your question to explain what you are trying to accomplish better.

Axon creating aggregate inside saga

I'm not sure how to properly ask this question, but here it is:
I'm starting the saga on a specific event, then dispatching a command which is supposed to create an aggregate and then publish another event, which will be handled by the saga to proceed with the logic.
However, each time I restart the application I get an error saying that an event for the aggregate at sequence x was already inserted, which I suppose is because the saga has not yet finished, and on restart it starts again by trying to create a new aggregate.
The question is: is there any way in Axon to track the progress of the saga? Should I set some flags when I receive an event and wrap the aggregate creation in ifs?
Maybe there is another way I'm not seeing; I just don't want the saga to be replayed from the start.
Thanks
The solution you've posted would definitely work.
Let me explain the scenario you've hit here though, for other people's reference too.
In an Axon Framework 4.x application, any Event Handling Component, and thus also your Saga instances, is backed by a TrackingEventProcessor.
The Tracking Event Processor "keeps track of" the point in the Event Stream at which it is handling events. It stores this information in a TrackingToken, for which the TokenStore is the delegated piece of work.
If you haven't specified a TokenStore, however, you will have in-memory TrackingTokens for every Tracking Event Processor.
This means that on a restart, your Tracking Event Processor thinks "oh, I haven't done any event handling yet, let me start from the beginning of time".
Due to this, your Saga instances will start anew every time, trying to recreate the given Aggregate instance.
Hence, specifying the TokenStore as you did resolves the problem you had.
Note that in a Spring Boot environment with, for example, the Spring Data starter present, Axon will automatically create the JpaTokenStore for you.
I've solved my issue by simply adding the token store configuration; it does exactly what I require - tracking processed events.
Basic spring config:
@Bean
fun tokenStore(client: MongoClient): TokenStore = MongoTokenStore.builder()
.mongoTemplate(DefaultMongoTemplate.builder().mongoDatabase(client).build())
.serializer(JacksonSerializer.builder().build())
.build()

Node/Express: running specific CPU-intensive tasks in the background

I have a site that makes the standard data-bound calls, but I also have a few CPU-intensive tasks which are run a few times per day, mainly by the admin.
These tasks involve grabbing data from the DB, running a few time-consuming algorithms, then re-uploading the data. What would be the best method for making these calls and having them run without blocking the event loop?
I definitely want to keep the calculations on the server, so web workers wouldn't work here. Would a child process be enough? Or should I have a separate thread running in the background handling all /api/admin calls?
The basic answer to this scenario in Node.js land is to use the core cluster module - https://nodejs.org/docs/latest/api/cluster.html
It gives you a simple API to:
easily launch worker Node.js instances on the same machine (each instance will have its own event loop)
keep a live communication channel for short messages between instances
This way, any work done in a child instance will not block your master event loop.
