Fetch data from an external API and populate a database every minute - node.js

I would like to fetch data from an external API that limits the number of requests, and use it to populate my database. My concern is mostly about the architecture, language and tools to use. I would like the big picture in terms of performance and good practice.
I wrote a cron job with Node.js and Express that runs every minute and populates my database, and it works. On the same server I also created some routes for clients to call.
What would be better than running cron inside Node.js? I know I can also set up a cron job under Linux that calls a script, whether it's Python or Node.js. But what is good practice here, especially if I want several cron jobs instead of a single one?
Should I move my cron job to a separate instance so that it doesn't block client requests? If my server is busy retrieving data from the external API while someone calls a route on the same server, will that increase the latency?
Are there tools to monitor my tasks instead of relying on logs?

As far as I know, Node.js handles a large number of concurrent requests better than some other servers, but if you are able to change the runtime you could give https://bun.sh/ a try.
You can also try multithreading in Node.js (worker threads), which can be an inexpensive and easy way to keep heavy work off the main thread:
https://www.digitalocean.com/community/tutorials/how-to-use-multithreading-in-node-js
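One risk with a job that fires every minute is that a slow external API leaves one run still in flight when the next one starts, which wastes the limited request budget. A minimal sketch of an overlap guard follows. The names `makeGuardedJob`, `fetchFromApi` and `saveToDb` are hypothetical stand-ins, not part of the question's code or of any library.

```javascript
// Wraps an async task so that a new run is skipped while the previous one
// is still in flight. Useful when the schedule interval can be shorter than
// a slow API response.
function makeGuardedJob(task) {
  let running = false;
  return async function run() {
    if (running) return 'skipped';
    running = true;
    try {
      await task();
      return 'done';
    } catch (err) {
      // Log and swallow so a failed run never crashes the web server.
      console.error('job failed:', err.message);
      return 'failed';
    } finally {
      running = false;
    }
  };
}

// Hypothetical stand-ins for the real API client and database layer.
async function fetchFromApi() {
  return [{ id: 1, value: 42 }];
}
const db = [];
async function saveToDb(rows) {
  db.push(...rows);
}

const job = makeGuardedJob(async () => {
  const rows = await fetchFromApi();
  await saveToDb(rows);
});

// In a real process the job would run on a timer, for example:
// setInterval(job, 60 * 1000);
```

Because fetching and saving are I/O-bound, a guard like this is often enough to keep the job in the same process as the routes. Moving it to a separate process mainly pays off when the job does CPU-heavy work.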

Related

Schedule function call in React (Node as backend) to perform exactly at day and time which is specified in a database

I want to show the user, exactly to the second, when they can access a given page; otherwise the page is blocked. Let's say that I receive a specific date and time from the server.
I guess I could use the setTimeout function, but I'm sure it's a bad idea.
I can use a scheduler like node-cron in the backend, but I'd need to send a message to the frontend somehow after the given time has passed.
Are WebSockets an option? Or is there an easier way?
"I want to show the user exactly to the second when he can have access to a given page"
For such accuracy, WebSocket communication is indeed the way to go. The protocol is widely used on the web for push notifications in email and social services such as Gmail and Facebook.
Regarding the backend, I would suggest a more scalable approach. You could use Bull to create a scheduling service. Bull uses Redis as its store and can run with multiple processors (Node processes), ensuring that each task is processed by only one of them. In short, it abstracts away the complexities that arise in distributed systems.
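Whichever side does the waiting (the server before it pushes a WebSocket message, or the browser running a countdown), the remaining time has to be computed from the timestamp stored in the database. The sketch below shows one way to do that with plain timers. The helper names `msUntil` and `scheduleAt` are made up for illustration, and the server is assumed to still enforce the access check on its own.

```javascript
// Largest delay setTimeout handles correctly (a signed 32-bit integer of ms,
// roughly 24.8 days). Larger values fire almost immediately.
const MAX_TIMEOUT_MS = 2 ** 31 - 1;

// Milliseconds remaining until `unlockAt`, never negative.
function msUntil(unlockAt, now = Date.now()) {
  return Math.max(0, new Date(unlockAt).getTime() - now);
}

// Calls `onUnlock` once `unlockAt` has passed, re-arming the timer in chunks
// when the wait exceeds the setTimeout limit. Returns a cancel function.
function scheduleAt(unlockAt, onUnlock) {
  let timer = null;
  const arm = () => {
    const remaining = msUntil(unlockAt);
    if (remaining === 0) {
      onUnlock();
      return;
    }
    timer = setTimeout(arm, Math.min(remaining, MAX_TIMEOUT_MS));
  };
  arm();
  return () => clearTimeout(timer);
}
```

An in-process timer like this is lost when the process restarts, which is one reason a Redis-backed scheduler is suggested above for the backend side.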

Alternative to GraphQL long polling on an Express server for a large request?

Objective
I need to show a big table of data in my React web app frontend.
My backend is an Express server with a GraphQL layer and a few "normal" endpoints.
My server gets data from various sources, including an external API, which is the data source for my current task.
My server has a database that I can use freely. I cannot directly access the external API from my front end.
The data all comes from the external API I mentioned. In fact, it comes from multiple similar calls to the same endpoint with many different IDs. Each of those individual calls takes a while to return but doesn't risk timing out.
Current Solution
My naive implementation: I do one GraphQL query in which the resolver does all the API calls to the external service in parallel. It waits on them all to complete using Promise.all(). It then returns a big array containing all the data I need to my server. My server then returns that data to me.
Problem With Current Solution
Unfortunately, this sometimes leaves my frontend hanging for too long and it times out (takes longer than 2 minutes).
Proposed Solution
Is there a better way than manually implementing long polling in GraphQL?
This is my main plan for a solution at the moment:
Frontend sends a request to my server
Server returns a 200 and starts hitting the external API, and sets a flag in the database
Server stores the result of each API call in the database as it completes
Meanwhile, the frontend shows a loading screen and keeps making the same GraphQL query for an entity like MyBigTableData which will tell me how many of the external API calls have returned
When they've all returned, the next time I ask for MyBigTableData, the server will send back all the data.
Question
Is there a better alternative to GraphQL long polling on an Express server for this large request that I have to do?
An alternative that comes to mind is to not use GraphQL and instead use a standard HTTP endpoint, but I'm not sure that really makes much difference.
I also see that HTTP/2 has multiplexing which could be relevant. My server currently runs HTTP/1.1 and upgrading is something of an unknown to me.
I see here that Keep-Alive, which sounds like it could be relevant, is unusable in Safari which is bad as many of my users use Safari to access the frontend.
I can't use WebSockets because of technical constraints. I don't want to set a ridiculously long timeout on my client either (and I'm not sure if it's possible).
I discovered that Apollo Client has polling built in for GraphQL queries: https://www.apollographql.com/docs/react/data/queries/#polling
In the end, I made a REST polling system.
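The answer does not show its polling system, so the following is only a sketch of the start-then-poll flow described in the question. It assumes an in-memory `Map` in place of the database, and `startJob`, `getJobStatus` and `fetchOne` are hypothetical names.

```javascript
const crypto = require('node:crypto');

// In-memory job store; the question's version would keep this in the database.
const jobs = new Map();

// Starts all external calls in the background and returns a job id at once.
// `fetchOne` is a hypothetical function wrapping one external API call.
function startJob(ids, fetchOne) {
  const jobId = crypto.randomUUID();
  const job = { total: ids.length, completed: 0, results: [], errors: [] };
  jobs.set(jobId, job);
  for (const id of ids) {
    fetchOne(id)
      .then((data) => job.results.push({ id, data }))
      .catch((err) => job.errors.push({ id, message: err.message }))
      .finally(() => { job.completed += 1; });
  }
  return jobId;
}

// What a polling endpoint would return: progress while calls are pending,
// and the full result set once every call has settled.
function getJobStatus(jobId) {
  const job = jobs.get(jobId);
  if (!job) return { status: 'not_found' };
  if (job.completed < job.total) {
    return { status: 'pending', completed: job.completed, total: job.total };
  }
  return { status: 'done', results: job.results, errors: job.errors };
}
```

In an Express app, one route would call `startJob` and reply immediately with the job id (202 Accepted is the conventional status for this), and a second route would return `getJobStatus` for the client to poll every few seconds.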

Is there a way to run a node task in a child process?

I have a node server, which needs to:
Serve the web pages
Keep querying an external REST API, save the data to a database, and send data to clients when the REST API reports certain updates.
Task 1 is just a normal node task. But I don't know how to implement task 2. This task won't expose any interface to the outside. It's more like a background task.
Can anybody make a suggestion? Thanks.
To make a second node.js app that runs at the same time as your first one, you can just create another node.js app and then run it from your first one using child_process.spawn(). It can regularly query the external REST API and update the database as needed.
It's not clear what you're trying to do in the part about "Send data to clients for certain updates from REST API".
If you're using socket.io to send data to connected browsers, then the browsers have to be connected to your web server which I presume is your first node.js process. To have the second node.js process cause data to be sent through the socket.io connections in the first node.js process, you need some interprocess way to communicate. You can use stdout and stdin via child_process.spawn(), you can use some feature in your database or any of several other IPC methods.
Because querying a REST API and updating a database are both asynchronous operations, they don't take much of the CPU of a node.js process. As such, you don't really have to do these in another node.js process. You could just have a setInterval() in your main node.js process, query the API every once in a while, update the database when results are received and then you can directly access the socket.io connections to send data to clients without having to use a separate process and some sort of IPC mechanism.
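If a separate process is used after all, the stdout channel mentioned above can carry updates back to the web server as one JSON object per line. The sketch below inlines the worker with `node -e` so that it is self-contained. In practice the worker would be its own script file, and `runWorker`, `workerSource` and the sample updates are all invented for illustration.

```javascript
const { spawn } = require('node:child_process');
const { once } = require('node:events');
const readline = require('node:readline');

// Stand-in for a separate worker script. A real worker would query the REST
// API and update the database, then report what changed.
const workerSource = `
  const updates = [{ id: 1, status: 'new' }, { id: 2, status: 'changed' }];
  for (const u of updates) console.log(JSON.stringify(u));
`;

// Spawns the worker and calls onUpdate for every JSON line it prints.
// Resolves with the worker's exit code once its output is fully read.
async function runWorker(onUpdate) {
  const child = spawn(process.execPath, ['-e', workerSource], {
    stdio: ['ignore', 'pipe', 'inherit'],
  });
  const lines = readline.createInterface({ input: child.stdout });
  lines.on('line', (line) => onUpdate(JSON.parse(line)));
  const [[code]] = await Promise.all([once(child, 'close'), once(lines, 'close')]);
  return code;
}
```

The `onUpdate` callback is where the main process would forward the update to its connected socket.io clients.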
Task 1:
Express is a good way to accomplish this task.
You can explore:
http://expressjs.com/
Task 2:
Once you have Express set up, you can write your logic within the Express framework.
The task can then be kept running with the node module forever. It's a simple tool that runs your background scripts continuously (whether they are written in node.js or not).
Have a look:
https://github.com/foreverjs/forever

Pass data between multiple NodeJS servers

I am still pretty new to NodeJS and want to know if I am looking at this in the wrong way.
Background:
I am making an app that runs once a week, generates a report, and then emails that out to a list of recipients. My initial reason for using Node was because I have an existing front end already built using angular and I wanted to be able to reuse code in order to simplify maintenance. My main idea was to have 4+ individual node apps running in parallel on our server.
The first app would use node-cron in order to run every Sunday. This would check the database for all scheduled tasks and retrieve the stored parameters for the reports it is running.
The next app is a simple queue that would store the scheduled tasks and pass them to the worker tasks.
The actual pdf generation would be somewhat CPU intensive, so this would be a cluster of n apps that would retrieve and run individual reports from the queue.
When done making the pdf, they would pass to a final email app that would send the file out.
My main concerns are communication between apps. At the moment I am setting up the 3 lower levels (ie. all but the scheduler) on separate ports with express, and opening http requests to them when needed. Is there a better way to handle this? Would the basic 'net' work better than the 'http' package? Is Express even necessary for something like this, or would I be better off running everything as a basic http/net server? So far the only real use I've made of Express is to specifically listen to a path for put requests and to parse the incoming json. I was led to asking here because in tracking logs so far I see every so often the http request is reset, which doesn't appear to affect the data received on the child process, but I still like to avoid errors in my coding.
I think this kind of decoupling could benefit from some sort of stateful priority queue with features like retry on failure, clustering, and so on.
I've used Kue.js in the past with great success. It's Redis-backed and has nice documentation and a nice interface: http://automattic.github.io/kue/
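To make "retry on failure" concrete, here is a minimal in-memory sketch of the idea. It is not Kue's API: the `RetryQueue` class and its methods are invented for illustration, and a Redis-backed library would add persistence and multi-process workers on top.

```javascript
// Minimal in-memory job queue with bounded retries.
class RetryQueue {
  constructor(handler, { maxAttempts = 3 } = {}) {
    this.handler = handler;
    this.maxAttempts = maxAttempts;
    this.pending = [];
    this.failed = [];
    this.completed = [];
  }

  add(data) {
    this.pending.push({ data, attempts: 0 });
  }

  // Processes jobs one at a time until the queue is empty.
  async drain() {
    while (this.pending.length > 0) {
      const job = this.pending.shift();
      job.attempts += 1;
      try {
        await this.handler(job.data);
        this.completed.push(job);
      } catch (err) {
        if (job.attempts < this.maxAttempts) {
          this.pending.push(job); // retry later, behind the other jobs
        } else {
          this.failed.push({ ...job, error: err.message });
        }
      }
    }
  }
}
```

With a queue in the middle, the scheduler only has to add report jobs, and the PDF and email workers pull from it, so the apps no longer need to open HTTP requests to each other.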

Best way to persist SOAP requests and responses into an Oracle database?

I want to track service calls in a database for security reasons, and I need to generate reports from them. I don't know what the best way is: SOAP handlers, database loggers or something else. But I know that performance is very important for me.
Any ideas?
P.S. Java EE
Depending on the amount of traffic your web services get, it might not be a good idea to do a database write per request. Instead, you could add some kind of listener that is triggered each time a web service request comes in and is passed the request itself. The listener buffers several report entries built from those requests and then writes them all to the database at certain intervals (every 10 minutes, each night at 3 a.m., etc.).
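The buffering idea can be sketched as follows. The example is in JavaScript for brevity although the question is about Java EE, since the pattern is language-independent. `BatchWriter` and `writeBatch` are hypothetical names, and `writeBatch` is assumed to perform one bulk insert.

```javascript
// Buffers records in memory and writes them in batches, either when the
// buffer reaches `maxSize` or when flush() is called by a timer.
// `writeBatch` is a hypothetical function performing one bulk insert.
class BatchWriter {
  constructor(writeBatch, maxSize = 100) {
    this.writeBatch = writeBatch;
    this.maxSize = maxSize;
    this.buffer = [];
  }

  async add(record) {
    this.buffer.push(record);
    if (this.buffer.length >= this.maxSize) await this.flush();
  }

  async flush() {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = []; // swap first so records added during the write are kept
    await this.writeBatch(batch);
  }
}
```

The trade-off is that records still in the buffer are lost if the process crashes before the next flush, which matters when the log is kept for security reasons. A shorter flush interval or a durable queue would narrow that window.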
