PhantomJS and Node.js very slow

I have created a REST service with Node.js where, to build the response, it visits a certain page and scrapes some data using the PhantomJS bindings for Node.js.
The whole process is very slow (I had to move to another server because some connections were automatically timed out after 30 seconds).
Another problem (as I understand it) is that the server is single-threaded, so it takes even longer to respond if it is already processing another request.
My questions are:
Is there a way to speed up the whole process?
Is there a way to make the nodejs run multithreaded?
Most importantly, would a Java implementation of the same service (with Selenium) be faster or allow multithreading? Thanks

Related

Node.js / NestJS - How to measure Node process performance and prevent memory leaks

I am working on a NestJS app that makes heavy use of task scheduling via the @nestjs/schedule package, which integrates with the node-cron npm lib.
At the moment the app has been in development for over 6 months and has over 30 cron tasks running in the background simultaneously. Although most of them have distinct intervals, some crons share the same interval (e.g. run every 30 seconds).
All cron tasks more or less follow the same behavior:
send a request to external APIs to get data;
query MongoDB to run some checks and update records accordingly;
some crons emit events to the client when a certain condition is met by other cron tasks.
My question is:
How can I measure the performance of the Node process while running all these background tasks on my local development PC, and what effect might they have on requests that come from the client?
Another point: is it possible to detect a memory leak before it happens?
Basically, I have concerns about the app's performance and I want to try to prevent problems before they happen.
Thanks.

How does Node treat two or more users simultaneously?

I have several applications working with Node on the back end and React on the front end, and it works great: I make axios GET and POST requests from React to Express and get data back and forth, and in production I use pm2 to keep everything up and running.
My question is: when two users access the same application at the same time, how does Node treat this, as two separate instances or just one?
I am considering using socket.io to be able to notify the front end of changes happening on Node, and I wonder if those notifications will be emitted from the back end regardless of what another user might be doing.
Thanks.
As you have probably heard, Node.js is described as a "single-threaded" runtime. This is only partially true. Even though your JavaScript runs on a single thread, Node offloads certain tasks (file system access, DNS lookups, some crypto work) to a libuv thread pool, which by default can process up to 4 such tasks at the same time (configurable via the UV_THREADPOOL_SIZE environment variable).
If you want to know more about this you might want to look into the Node event loop, which describes the steps Node goes through on each "tick".
So as you see, Node can often process not one but several actions per loop cycle. But there is more: to solve the performance issues that might occur in big applications, you can run Node in cluster mode. This forks multiple Node processes that can share server ports, and therefore handle high demand more efficiently.
One note on your socket.io question: under high demand, tasks get queued until they are handled in the Node event loop, so sometimes you need to wait. Fortunately we are in a race among big tech companies to create the fastest JS runtime, so this thing is pretty fast.

API getting slow due to iteration using Node.js

I am using Node.js to create a REST API.
In my scenario I have two APIs:
API 1: has to get 10,000 records and iterate over them to modify some of the data.
API 2: a simple GET method.
When I open Postman and hit the first API and the second API in parallel, the second API responds slowly because Node.js is single-threaded.
My expectation:
Even though the 1st API takes time, it should not delay the 2nd API for long.
From the Node.js docs I found the clustering concept:
https://nodejs.org/dist/latest-v6.x/docs/api/cluster.html
So I implemented clustering, and it created 4 server instances.
Now when I hit API 1 in one tab and API 2 in a second tab, it works fine.
But when I open API 1 in 4 tabs and API 2 in a 5th tab, the slowness comes back.
What would be the best solution to solve this issue?
Because of the single-threaded nature of node.js, the only way to make sure your server is always responsive to quick requests such as you describe for API 2 is to make sure that you never have any long running operations in your server.
When you do encounter some operation in your code that takes awhile to run and would affect the responsiveness of your server, your options are as follows:
Move the long running work to a new process. Start up a new process and run the lengthy operation there. This allows your server process to stay active and responsive to other requests, even while the long running process is still crunching on its data.
Start up enough clusters. Using the clustering you've investigated, start up more clusters than you expect to have simultaneous calls to your long-running process. This allows there to always be at least one clustered process available to be responsive. Sometimes you cannot predict how many this will be, or it will be more than you can practically create.
Redesign your long running process to execute its work in chunks, returning control to the system between chunks so that node.js can interleave other work with the long running work. A common example is processing a large array in chunks; the concept of not blocking the event loop for too long applies in the browser and in node.js alike.
Speed up the long running task. Find a way to speed up the long running job so it doesn't take so long (using caching, not returning so many results at once, faster way to do it, etc...).
Create N worker processes (probably one less worker process than the number of CPUs you have) and create a work queue for the long running tasks. Then, when a long running request comes in, you insert it in the work queue. Then, each worker process is free to work on items in the queue. When more than N long tasks are being requested, the first ones will get worked on immediately, the later ones will wait in the queue until there is a worker process available to work on them. But, most importantly, your main node.js process will stay free and responsive for regular requests.
This last option is the most foolproof because it will be effective to any number of long running requests, though all of the schemes can help you.
Node.js is not multi-threaded, so all of these requests are handled in the event loop of a single thread.
Each Node.js process runs in a single thread and by default has a heap limit of roughly 512MB on 32-bit systems and 1GB on 64-bit systems (adjustable with the --max-old-space-size flag).
However, you can split a single process into multiple processes or workers. This can be achieved through the cluster module. The cluster module allows you to create child processes (workers), which can share server ports with the main Node process.
You can invoke the cluster API directly in your app, or you can use one of the many abstractions over it:
https://nodejs.org/api/cluster.html

Linux command line queueing system

I am running a webservice that converts ODT documents to PDF using OpenOffice on an Ubuntu server.
Sadly, OpenOffice chokes occasionally when more than one request is made simultaneously (converting a document takes around 500-1000 ms). This is a real problem since my webservice is multithreaded and jobs are mostly issued in batches.
What I am looking for is a way to hand off the conversion task from my webservice to an intermediate process that queues all requests and streams them one by one to OpenOffice.
However, sometimes I want to be able to issue a high-priority conversion that gets processed immediately (after the current one, if busy) and have the webservice wait (block) for it. This seems a tricky addition that makes most simple scheduling techniques unsuitable.
What you're after is some kind of message/work queue system.
One of the simplest work queueing systems I've used, that also supports prioritisation, is beanstalkd.
You would have a single process running on your server that runs your conversion when it receives a work request from beanstalkd, and your web application would push work requests onto beanstalkd with the relevant information.
The guys at DigitalOcean have written up a very nice intro to it here:
https://www.digitalocean.com/community/tutorials/how-to-install-and-use-beanstalkd-work-queue-on-a-vps
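To make the scheduling idea concrete without standing up beanstalkd itself, here is an in-process JavaScript sketch of the same behaviour: jobs run strictly one at a time, and a lower priority number jumps the queue (beanstalkd's own convention). The async `worker` function stands in for the OpenOffice conversion step; everything here is illustrative, not a beanstalkd client:

```javascript
class PriorityQueue {
  constructor(worker) {
    this.worker = worker; // async function that performs one conversion
    this.jobs = [];
    this.busy = false;
  }

  // Returns a promise that resolves when *this* job has been processed,
  // so a caller can block on a high-priority conversion.
  push(job, priority = 100) {
    return new Promise((resolve, reject) => {
      this.jobs.push({ job, priority, resolve, reject });
      this.jobs.sort((a, b) => a.priority - b.priority); // lower number first
      this.drain();
    });
  }

  // Process queued jobs one at a time until the queue is empty.
  async drain() {
    if (this.busy) return;
    this.busy = true;
    while (this.jobs.length > 0) {
      const next = this.jobs.shift();
      try {
        next.resolve(await this.worker(next.job));
      } catch (err) {
        next.reject(err);
      }
    }
    this.busy = false;
  }
}

// Usage: two batch jobs are queued, then an urgent one with priority 0,
// which runs before the second batch job.
const processed = [];
const queue = new PriorityQueue(async (name) => { processed.push(name); });
Promise.all([
  queue.push('batch-1.odt'),
  queue.push('batch-2.odt'),
  queue.push('urgent.odt', 0),
]).then(() => console.log(processed)); // ['batch-1.odt', 'urgent.odt', 'batch-2.odt']
```

A real beanstalkd setup adds what this sketch lacks: the queue survives restarts and lives outside the webservice process, so a multithreaded service can push to it safely.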

Less resources for steady client connections, why?

I have heard that node.js is well suited for applications where a persistent connection from the browser to the server is needed, using the "long-polling" technique, which allows updates to be sent to the user in real time without needing a lot of server resources. A more traditional server model would need a thread for every single user.
My question is: what is done instead, and how are the requests served differently?
Why doesn't it take so many resources?
Node.js is event-driven. The node script is started and then loops continuously, waiting for events to be fired, until it is stopped. Once it is running, the overhead associated with loading has already been paid.
Compare this to a more traditional setup such as PHP under CGI, where a request causes the server to load and run the script and its dependencies. The script then does its task (often serving a web page) and then shuts down. When another page is requested, the whole process starts again.
