What are ways of processing long-running queries with Node? - node.js

I need to pull a large amount of data from another server and then save it to my database (MongoDB). What is the best way to handle long-running queries like this? Another consideration is that the app will have a good number of users, who will initiate similar requests to pull data from the remote server and save it to our server.
I have found various options like queue-worker, cluster, etc. I am new to Node.js, so I am a bit confused. Which one would be best, or is there another solution I should consider?
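One common pattern behind the "queue-worker" option is to accept each user's request, enqueue a job, and let a limited number of jobs pull data from the remote server and save it at the same time, so that many users cannot overwhelm either server. Below is a minimal in-process sketch. `fetchPage` and `saveBatch` are hypothetical stand-ins for the remote API call and the MongoDB write. A production setup would usually keep the queue in Redis or similar, so that jobs survive restarts and can be shared by several worker processes.

```javascript
// Minimal in-process job queue with a concurrency limit.
class JobQueue {
  constructor(concurrency) {
    this.concurrency = concurrency; // max jobs running at once
    this.running = 0;
    this.pending = [];
  }

  // Enqueue a job (an async function); resolves with the job's result.
  add(job) {
    return new Promise((resolve, reject) => {
      this.pending.push({ job, resolve, reject });
      this.next();
    });
  }

  // Start waiting jobs until the concurrency limit is reached.
  next() {
    while (this.running < this.concurrency && this.pending.length > 0) {
      const { job, resolve, reject } = this.pending.shift();
      this.running++;
      Promise.resolve()
        .then(job)
        .then(resolve, reject)
        .finally(() => {
          this.running--;
          this.next(); // a slot is free: start the next waiting job, if any
        });
    }
  }
}

// HYPOTHETICAL remote fetch: returns one page of records, [] when done.
async function fetchPage(userId, page) {
  await new Promise((r) => setTimeout(r, 10)); // simulate network latency
  return page < 3 ? [{ userId, page }] : [];
}

// HYPOTHETICAL database write (e.g. insertMany in MongoDB).
const saved = [];
async function saveBatch(records) {
  saved.push(...records);
}

// One "pull and save" job per user request: pages through the remote data.
async function importForUser(userId) {
  let total = 0;
  for (let page = 0; ; page++) {
    const records = await fetchPage(userId, page);
    if (records.length === 0) break;
    await saveBatch(records);
    total += records.length;
  }
  return total;
}

const queue = new JobQueue(2); // at most 2 imports run concurrently
```

`queue.add(() => importForUser(userId))` returns a promise for the number of saved records, so a route handler can respond right away (for example with 202 Accepted) and let the import finish in the background.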

Related

Fetch data from external API and populate database every minute

I would like to fetch data from an external API with a request limit and populate my database. My concern is more about the architecture, language and tools to use. I would like to have the big picture in terms of performance and good practice.
I made a cron job with Node.js and Express that runs every minute and populates my database, and it works. On the same server I also created some routes to be called by clients.
What would be better than using cron in Node.js? I know I could also use cron on Linux to call a script, whether it's Python or Node.js. But what is good practice, especially if I want several cron jobs instead of a single one?
Should I move my cron jobs to a separate instance so they don't block client requests? If my server is busy retrieving data from the external API while someone calls a route on the same server, will that increase latency?
Are there tools to monitor my tasks instead of relying on logs?
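Whichever scheduler triggers it, a periodic import should not start a new run while the previous one is still going (for example when the external API is slow). A minimal sketch of that guard using plain `setInterval`; `pollExternalApi` is a hypothetical placeholder for the fetch-and-store task. Because such a task mostly waits on network I/O, it does not block the event loop for the routes in the same process; CPU-heavy processing of the results is what would add latency, and that is the stronger reason to move the job to a separate process or worker.

```javascript
// Wraps an async task so that overlapping invocations are skipped:
// if the previous run is still in progress, the new tick does nothing.
function nonOverlapping(task) {
  let running = false;
  return async function run() {
    if (running) return false; // previous run not finished: skip this tick
    running = true;
    try {
      await task();
      return true;
    } catch (err) {
      // Log and keep the schedule alive; one failed run shouldn't stop future runs.
      console.error("scheduled task failed:", err);
      return false;
    } finally {
      running = false;
    }
  };
}

// HYPOTHETICAL task standing in for "fetch from the external API and store".
let runs = 0;
async function pollExternalApi() {
  runs++;
  await new Promise((r) => setTimeout(r, 50)); // simulate a slow API call
}

const tick = nonOverlapping(pollExternalApi);

// In the real app, run it every minute:
// const timer = setInterval(tick, 60_000);
```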
As far as I know, Node.js handles large numbers of requests better than some other servers, but if you are able to change the runtime, you can give Bun (https://bun.sh/) a chance.
You can also try multithreading in Node.js; it may be cheaper and easier:
https://www.digitalocean.com/community/tutorials/how-to-use-multithreading-in-node-js
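Worker threads help when the work is CPU-bound (parsing, transforming or aggregating large payloads); for I/O such as HTTP calls and database writes, the event loop already handles concurrency without extra threads. A minimal sketch using the built-in `worker_threads` module, with the worker code passed inline via `eval: true` so the example fits in one file; `sumSquares` is just a stand-in for real CPU-heavy processing.

```javascript
const { Worker } = require("worker_threads");

// Code executed inside the worker thread. It reads its input from
// workerData and posts the result back to the main thread.
const workerSource = `
  const { parentPort, workerData } = require("worker_threads");
  // Stand-in for CPU-heavy processing of fetched data.
  function sumSquares(n) {
    let total = 0;
    for (let i = 1; i <= n; i++) total += i * i;
    return total;
  }
  parentPort.postMessage(sumSquares(workerData.n));
`;

// Runs the computation off the main thread and resolves with its result,
// so the event loop stays free to serve requests in the meantime.
function runInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { n } });
    worker.once("message", resolve);
    worker.once("error", reject);
    worker.once("exit", (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}
```

Starting a worker per task is the simplest form; for a steady stream of jobs, a small fixed pool of long-lived workers avoids paying the startup cost each time.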

NodeJS sharding architecture with many MongoDB databases: approaches

We have an architecture problem in our project. It requires sharding, because some of its services need almost unlimited scalability.
Currently we use Node.js + MongoDB (Mongoose) and MySQL (TypeORM). Data is split across databases through a simple 'DB Locator', so a Node process needs connections to a lot of databases (up to 1000).
For example, a request is handled like this:
1. An HTTP request arrives from the client with a Shop ID.
2. Get the DB IP address/credentials from the 'DB Locator' service by Shop ID.
3. Create a connection to the specific database holding that shop's data.
4. Perform the DB queries.
We tried to implement it in two ways:
1. Create a connection for each request and close it on response.
   Problems:
   - we can't use the connection after the response (this is the main problem, because sometimes we need some asynchronous actions);
   - it is slower.
2. Keep all connections open.
   Problems:
   - we hit the simultaneous-connection limit or other limits;
   - memory leaks.
Which way is better? How can we avoid the problems described? Is there a better solution?
Solution #1 worked perfectly for us in PHP, because PHP runs a single process per request and drops its connections when the process ends. Express, by contrast, is pure JS code running in V8 in a long-lived process, not one process per request.
It would be great to close unused connections automatically, but we can't find an option to do that.
The short answer: stop using MongoDB with Mongoose 😏
Longer answer:
MongoDB is a document-oriented DBMS. Its main use case is storing loosely structured data that you don't need to query heavily. Lazy indexing, dynamic typing and many other things keep you from using it as an RDBMS, but it is great as storage for logs or any serialized data.
The worst part here is Mongoose. This library makes you feel like your trash bin is a wonderful world of relations, virtual fields and many other things that should not be in a document-oriented DBMS. There is also a lot of legacy code from previous versions that causes trouble with connection management.
You already use TypeORM, which can work instead of Mongoose, with some restrictions, of course.
With TypeORM, connection management for MongoDB works exactly the same way as for MySQL.
More details: https://github.com/typeorm/typeorm/blob/master/docs/mongodb.md#defining-entities-and-columns
In this case you can use your TypeORM Repository as a transparent client that opens connections and closes them or keeps them alive on demand.
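Whichever client is used, the "close unused connections automatically" behavior from the question can be built as a small cache keyed by shop ID. Callers acquire a connection before use and release it when done, including after background work that continues past the HTTP response, and a connection is closed only once nobody has held it for `idleMs`. A minimal sketch; `fakeConnect` is a hypothetical factory (a real one would look up credentials in the DB Locator and open a driver connection), and the only thing assumed about a connection is an async `close()` method.

```javascript
// Caches one connection per shop and closes it after it has been idle
// (no holders) for idleMs.
class ConnectionCache {
  constructor(connect, idleMs) {
    this.connect = connect; // async (shopId) => connection with close()
    this.idleMs = idleMs;
    this.entries = new Map(); // shopId -> { promise, holders, timer }
  }

  async acquire(shopId) {
    let entry = this.entries.get(shopId);
    if (!entry) {
      // Store the promise so concurrent acquires share one connect() call.
      entry = { promise: this.connect(shopId), holders: 0, timer: null };
      this.entries.set(shopId, entry);
    }
    entry.holders++;
    clearTimeout(entry.timer); // in use again: cancel any pending close
    entry.timer = null;
    try {
      return await entry.promise;
    } catch (err) {
      // Connecting failed: forget the entry so the next call retries.
      entry.holders--;
      if (this.entries.get(shopId) === entry) this.entries.delete(shopId);
      throw err;
    }
  }

  release(shopId) {
    const entry = this.entries.get(shopId);
    if (!entry) return;
    entry.holders--;
    if (entry.holders === 0) {
      // Nobody is using it: schedule the close after the idle period.
      entry.timer = setTimeout(() => this.closeEntry(shopId, entry), this.idleMs);
    }
  }

  async closeEntry(shopId, entry) {
    if (this.entries.get(shopId) !== entry || entry.holders > 0) return;
    this.entries.delete(shopId);
    const conn = await entry.promise;
    await conn.close();
  }
}

// HYPOTHETICAL connection factory, for illustration only.
let opened = 0;
let closed = 0;
async function fakeConnect(shopId) {
  opened++;
  return { shopId, close: async () => { closed++; } };
}

const cache = new ConnectionCache(fakeConnect, 100); // 100 ms idle timeout
```

In a route handler this would look like `const db = await cache.acquire(shopId)`, followed by the queries in a `try` block and `cache.release(shopId)` in `finally` (or after the background work finishes). The total number of open connections is then bounded by the shops that were active within the idle window, rather than by all 1000 databases.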

Request for advice on converting a project to Node.js

Our company processes machine data. We receive machine reports over a socket connection and write the data to MySQL tables. Then other scripts and threads written in Python and Ruby pull these records out, process the data, save the results back to the database, and notify our clients about their machines' health.
I want to convert these processes to Node.js, but I have to do it incrementally. Instead of writing back and forth to the database so much, I would like to hand the data off from our input process to a Node server that processes everything at once.
I suppose I could modify our input scripts to hand off the raw bytes we get from the machines, or I could send them as JSON. I had thought about writing the new processes as a Node HTTP server, but that seems to add some overhead. I have thought websockets may be a better solution, but I am looking for ideas. Thanks.
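A raw TCP socket through Node's built-in `net` module avoids per-request HTTP overhead; the main thing to add is message framing, because TCP delivers a byte stream, not discrete messages. A minimal sketch using newline-delimited JSON (one JSON object per line), where `handleReport` is a hypothetical placeholder for the processing currently done by the Python and Ruby scripts. Any language used by the input scripts can write "JSON plus newline" to a socket, which suits an incremental migration.

```javascript
const net = require("net");

// Splits an incoming byte stream into lines and parses each line as JSON.
// TCP may deliver partial lines, so the unfinished tail stays buffered.
function onJsonLines(socket, handle) {
  let buffered = "";
  socket.setEncoding("utf8"); // decodes multi-byte characters split across chunks
  socket.on("error", (err) => console.error("socket error:", err.message));
  socket.on("data", (chunk) => {
    buffered += chunk;
    let newline;
    while ((newline = buffered.indexOf("\n")) !== -1) {
      const line = buffered.slice(0, newline);
      buffered = buffered.slice(newline + 1);
      if (line.trim() === "") continue;
      try {
        handle(JSON.parse(line));
      } catch (err) {
        console.error("skipping bad report:", err.message);
      }
    }
  });
}

// HYPOTHETICAL processing step for one machine report.
const received = [];
function handleReport(report) {
  received.push(report);
}

// Server side: the new Node process receiving reports.
const server = net.createServer((socket) => onJsonLines(socket, handleReport));

// Client side: what a modified input script would do.
function sendReports(port, reports, host = "127.0.0.1") {
  return new Promise((resolve, reject) => {
    const client = net.createConnection({ host, port }, () => {
      for (const r of reports) client.write(JSON.stringify(r) + "\n");
      client.end(); // half-close: tells the server we're done sending
    });
    client.on("error", reject);
    client.on("close", resolve);
  });
}
```

In production the server would listen on a fixed port (for example `server.listen(9000)`) and the input process would keep one connection open instead of reconnecting per report.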

Pass data between multiple NodeJS servers

I am still pretty new to NodeJS and want to know if I am looking at this in the wrong way.
Background:
I am making an app that runs once a week, generates a report, and then emails it to a list of recipients. I initially chose Node because I already have a front end built with Angular and wanted to reuse code to simplify maintenance. My main idea was to have 4+ individual Node apps running in parallel on our server.
The first app would use node-cron in order to run every Sunday. This would check the database for all scheduled tasks and retrieve the stored parameters for the reports it is running.
The next app is a simple queue that would store the scheduled tasks and pass them to the worker tasks.
The actual PDF generation would be somewhat CPU-intensive, so this would be a cluster of n apps that retrieve and run individual reports from the queue.
When a PDF is done, they would pass it to a final email app that sends the file out.
My main concern is communication between the apps. At the moment I am setting up the 3 lower levels (i.e. all but the scheduler) on separate ports with Express and opening HTTP requests to them when needed. Is there a better way to handle this? Would the basic 'net' module work better than the 'http' module? Is Express even necessary for something like this, or would I be better off running everything as a basic http/net server? So far the only real use I've made of Express is to listen on a specific path for PUT requests and to parse the incoming JSON. I'm asking here because the logs show that every so often the HTTP request is reset; this doesn't appear to affect the data received by the child process, but I would still like to avoid errors in my code.
I think this kind of decoupling could benefit from some sort of stateful priority queue with features like retry on failure, clustering, etc.
I've used Kue.js in the past with great success; it's Redis-backed and has nice documentation and an interface: http://automattic.github.io/kue/
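The retry-on-failure feature of such a queue is easy to picture in isolation. A minimal in-memory sketch of retrying a job with exponential backoff; a Redis-backed queue additionally persists jobs and their attempt counts, so retries also survive a crash, which this sketch does not do. `sendReportEmail` is a hypothetical flaky task used only for illustration.

```javascript
// Runs an async job, retrying on failure with exponential backoff:
// waits baseDelayMs, then 2x, 4x, ... between attempts.
async function withRetry(job, { attempts = 3, baseDelayMs = 100 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await job(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < attempts) {
        const delay = baseDelayMs * 2 ** (attempt - 1);
        await new Promise((r) => setTimeout(r, delay));
      }
    }
  }
  throw lastError; // all attempts failed: the caller can mark the job as failed
}

// HYPOTHETICAL flaky task: fails twice (e.g. a connection reset), then succeeds.
let calls = 0;
async function sendReportEmail(attempt) {
  calls++;
  if (attempt < 3) throw new Error("ECONNRESET");
  return "sent";
}
```

Applied to the pipeline in the question, each stage (PDF generation, email) would wrap its unit of work this way, so an occasional reset request is retried instead of being silently lost.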

Node.js, Redis, and MongoDB

Introduction
My current project mixes common RESTful API concepts with modern real-time websockets/long polling. I'm using MongoDB to store persistent data such as users, products, and aggregated social content. The social content is basically links to Tumblr posts, Twitter tweets, and Facebook posts, which are compiled into what I call a "shout".
Implementation
What I'm trying to accomplish is rating "shouts" based on how many likes or follows each post has out of the combined total across all the social media used. I want the data to change on the frontend as the backend updates. The backend calls the social media APIs based on an expiration date on the data: the server checks for new data whenever a request is made for it. A request is made for the data every time a client connects, or every time someone posts a new shout through my app. If there is no activity for a given period, the shout is archived and updated every so often with scheduled jobs. I use socket.io to send real-time updates.
What I'm Using Redis For
The reason I need Redis is to message all my servers when one of them starts requesting data from the social media sources so I don't run into the issue where all of my servers are essentially doing the same thing when the task only needs to be done once. I also need to message my other services once a change is made. For these implementations I'm currently using Redis pub/sub. Since I'm currently using Redis, I also store session tokens in redis, and use it as a cache.
What I'm Using Mongo For
I use MongoDB to persist data, and I've setup indexing to tune performance specifically for my application.
The Problem
My problem is that I feel my stack is too big; using both Redis and MongoDB may be overkill. Should I cut out Redis, use an MQ system, and store my sessions and cache in MongoDB, indexed for fast lookups? If so, what MQ system would be suitable for my application?
Should I cut out MongoDB and use only Redis? Would this be cost-effective for relatively large amounts of data? I would be storing hundreds of thousands (maybe more) of shouts (essentially just URIs), thousands of users, and hundreds of thousands of products.
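Within a single Node process, the "fetch only once" requirement can be met without any messaging: concurrent requests for the same shout share one in-flight refresh promise (a "single-flight" pattern), and the result is reused until it expires. Coordinating several servers still needs something shared, such as a Redis lock or a job queue, which this sketch does not cover. `fetchSocialCounts` and the 60-second TTL are hypothetical placeholders for the calls to Tumblr, Twitter and Facebook and for the expiration rule.

```javascript
// Per-process cache with expiry plus single-flight: while a refresh for a
// key is running, other callers await the same promise instead of starting
// their own request to the social media APIs.
class RefreshingCache {
  constructor(fetcher, ttlMs) {
    this.fetcher = fetcher; // async (key) => value
    this.ttlMs = ttlMs;
    this.values = new Map(); // key -> { value, expiresAt }
    this.inFlight = new Map(); // key -> Promise of a refresh in progress
  }

  async get(key) {
    const cached = this.values.get(key);
    if (cached && cached.expiresAt > Date.now()) return cached.value;

    let pending = this.inFlight.get(key);
    if (!pending) {
      pending = this.fetcher(key)
        .then((value) => {
          this.values.set(key, { value, expiresAt: Date.now() + this.ttlMs });
          return value;
        })
        .finally(() => this.inFlight.delete(key)); // allow the next refresh
      this.inFlight.set(key, pending);
    }
    return pending;
  }
}

// HYPOTHETICAL fetcher standing in for the calls to the social media APIs.
let apiCalls = 0;
async function fetchSocialCounts(shoutId) {
  apiCalls++;
  await new Promise((r) => setTimeout(r, 20)); // simulate API latency
  return { shoutId, likes: 42 };
}

const shouts = new RefreshingCache(fetchSocialCounts, 60_000);
```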
