I'm relatively new to Node and JavaScript. I'm running a program that makes heavy network API calls and processes the results. What I'm experiencing is that my Node code makes other programs running on my Mac (Outlook, Chrome, etc.) unresponsive, to the point that I can't even force-quit them and have to hard-reboot my machine.
Any idea why that's happening? I thought Node.js was somewhat sandboxed and shouldn't affect other programs. Is Node using up all the available sockets?
It seems I've found the reason for Node.js itself using a lot of CPU and memory: I have some code that processes thousands of rows of user locations and calculates the distances between them. That turns out to be very expensive and takes a big toll on Node. I've moved that code into process.nextTick() and it no longer hangs other programs.
What I still don't understand is why Node hangs, and why it hangs up other programs on my Mac as well.
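For reference, one common way to keep a heavy loop like that from starving the event loop is to process the rows in chunks and yield between chunks, e.g. with setImmediate(). A minimal sketch, where rows and calculateDistance() are hypothetical placeholders for the real data and calculation:

function processRows(rows, chunkSize = 500) {
  return new Promise((resolve) => {
    const results = [];
    let i = 0;
    (function nextChunk() {
      const end = Math.min(i + chunkSize, rows.length);
      for (; i < end; i++) {
        // calculateDistance() stands in for the real distance calculation
        results.push(calculateDistance(rows[i]));
      }
      if (i < rows.length) {
        setImmediate(nextChunk); // yield back to the event loop between chunks
      } else {
        resolve(results);
      }
    })();
  });
}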
Context: I'm writing a monitoring/management app for a VPS, which is running Linux.
Reasons: I need to be able to quickly identify overloaded threads, high RAM usage, and badly behaving tasks.
Problems and current state: Right now my code works well. I'm using the systeminformation npm module to gather system information such as CPU usage, memory usage, disk status, and the task list; I put it all into an object and send it to every connected client on a socket.io server. The problem is that this approach literally brings the host machine to its knees (both server and client are running locally, because I'm still working on them): CPU usage jumps from 6% to 80% in an instant, which is ridiculous. I want the data to update at least once a second, and ideally 60 times a second. The point is, I need a different way of retrieving the usage data for the CPU (ideally per thread), memory, disks, and the task list. I know this question isn't very specific, but I believe more people than just me run into this, namely Node.js bringing the machine to its knees (the irony). Looking forward to any help!
I've tried different approaches before, but they either lowered the usage only slightly or actually increased it because more modules had to be loaded. That leads me to the conclusion that I just need a better module (or approach) to handle this.
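For reference, the kind of setup described above looks roughly like this (a simplified sketch, not the exact code; io is assumed to be an existing socket.io server instance). One shared poll per second is broadcast to all clients, since si.processes() is typically the expensive call:

const si = require('systeminformation');

async function snapshot() {
  // One shared poll per interval, broadcast to every connected client
  const [cpu, mem, disks, procs] = await Promise.all([
    si.currentLoad(),  // overall and per-core CPU load
    si.mem(),
    si.fsSize(),
    si.processes(),    // typically the most expensive call
  ]);
  return { cpu, mem, disks, procs, ts: Date.now() };
}

setInterval(async () => {
  try {
    io.emit('stats', await snapshot()); // io: existing socket.io server (assumed)
  } catch (err) {
    console.error('stats poll failed', err);
  }
}, 1000);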
I've been struggling to run multiple instances of Puppeteer on DigitalOcean for quite some time with little luck. I'm able to run ~5 concurrently using tools like puppeteer-cluster, but for some reason the whole thing just chokes with little helpful messaging. So, I switched to spawning ~5 child processes without any additional library -- just Puppeteer itself. Same issue. Chokes with no helpful errors.
I'm able to run all of these jobs just fine locally, but after I deploy, I hit these walls. So, my hunch is that it's a resource/performance issue, but I can't say for sure.
I'm running a droplet with 1 GB of RAM and 3 vCPUs on DigitalOcean.
Basically, I'm just looking for ways to start troubleshooting something like this. Is there a way I can know for sure that I'm hitting resource walls? I've tried pm2 and the DO dashboard graphs, but I feel like those leave a lot of information out, or else I'm missing something else altogether.
Author of puppeteer-cluster here. You are right, 1 GB of memory is likely not enough for running 5 browser windows (or tabs) in addition to your operating system and maybe even other background tasks.
Here is a list of resources you should check:
Memory: Use a tool like htop to check your memory usage while your application is running.
CPU: Again, you can use htop for that, 3 vCPUs should be more than enough for 5 windows.
Disk space: Use a tool like df to check if there is enough space on the disk. I know of multiple cases in which there was not enough space on the disk (like some old kernels filling the disk), and Chrome needs at least some space to run.
Network throughput: Rarely the problem, but sometimes the network just does not have the bandwidth to support many open browsers. Use a tool like nload to check the network throughput.
To use htop or nload, you start your script in the background (node script.js &) or use a terminal multiplexer (like tmux). Resource problems should then be easy to spot.
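If you also want a quick signal from inside the Node process itself (in addition to htop and nload), here is a minimal sketch that periodically logs memory and load figures using only built-in modules; the interval and formatting are just illustrative:

const os = require('os');

setInterval(() => {
  const rssMb = (process.memoryUsage().rss / 1024 / 1024).toFixed(0);
  const freeMb = (os.freemem() / 1024 / 1024).toFixed(0);
  const load1 = os.loadavg()[0].toFixed(2); // 1-minute load average (Linux)
  console.log(`rss=${rssMb}MB free=${freeMb}MB load1=${load1}`);
}, 5000);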
Most probably you're running out of memory; 5 Puppeteer processes are a lot for a 1 GB VM.
You can run
grep -i 'killed process' /var/log/messages
to confirm that the OOM killer terminated your processes.
We use clustering with our Express apps on multi-CPU boxes. It works well; we get the maximum use out of our AWS Linux servers.
We inherited an app we are fixing up. It's unusual in that it has two processes. It has an Express API portion to take incoming requests, but the process that acts on those requests can run for several minutes, so it was built as a separate background process: Node calling Python and Maya.
Originally the two were tightly coupled, with the Python script called directly by the upload request. That was of course suboptimal, as it left the client waiting for a response for however long the job took, so it was rewritten as a background process that runs in a loop, checking for new uploads and processing them sequentially.
So my question is this: if we have this separate Node process running in the background, and we also use clustering, which starts up a process for each CPU, how is that going to work? Aren't we going to get two Node processes competing for the same CPU? We were getting some weird behaviour and crashes yesterday, without many error messages (god I love Node), so it's a bit concerning. I'm assuming Linux will just swap the processes in and out as they are used, but I wonder whether that will be problematic, and I also wonder about someone's web session getting swapped out for several minutes while the longer-running process runs.
The smart thing to do would be to rewrite this to run on two different servers, but the files that Maya uses and creates live on the server's file system, and we weren't given the budget to rebuild it the way we should. So we're stuck with this architecture for now.
Any thoughts on possible problems and how to avoid them would be appreciated.
From an overall architecture perspective, spawning one Node.js process per core is a great way to go. You have a lot of interdependencies, though: the Node.js processes are calling Maya, which may use multiple threads (keep that in mind).
The part that concerns me is your random crashes and your "process that runs in a loop". If that process is just polling the file system, you probably have a race condition where the Node.js processes compete to work on the same input/output files.
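One common way to avoid that kind of race (offered here only as a sketch, not necessarily how your app should do it) is to claim each upload atomically before processing it, for example by renaming the file into a processing directory. rename() is atomic on the same filesystem, so only one process can win a given file; the directory names below are hypothetical:

const fs = require('fs');
const path = require('path');

// Hypothetical directories -- adjust to the real upload layout.
const UPLOAD_DIR = '/data/uploads';
const CLAIM_DIR = '/data/processing';

function tryClaim(fileName) {
  const src = path.join(UPLOAD_DIR, fileName);
  const dst = path.join(CLAIM_DIR, fileName);
  try {
    fs.renameSync(src, dst); // atomic on the same filesystem: only one process wins
    return dst;              // this process now owns the file
  } catch (err) {
    if (err.code === 'ENOENT') return null; // another process claimed it first
    throw err;
  }
}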
In theory, one Node.js process per core will work great and should help you utilize all of your CPU. Linux swaps processes in and out all the time, so that is not an issue; you could even start multiple Node.js processes per core and still not have a problem.
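For the one-process-per-core setup, a minimal sketch using Node's built-in cluster module (the './app' module that starts the Express API is hypothetical; the long-running Maya/Python worker stays a single separate process outside this pool):

const cluster = require('cluster');
const os = require('os');

if (cluster.isMaster) {
  // Fork one API worker per core
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
  // Restart workers that die, so a crash doesn't silently shrink the pool
  cluster.on('exit', (worker) => {
    console.log(`worker ${worker.process.pid} died, restarting`);
    cluster.fork();
  });
} else {
  require('./app'); // hypothetical module that starts the Express server
}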
One last note: be sure to keep an eye on your memory usage. Several Linux distributions on EC2 do not have a swap file enabled by default, and running out of memory can be another silent app killer, so it's best to add a swap file in case you run into memory pressure.
When I reboot my Mac, starting up Node.js apps is nice and fast. Later, after the Mac has been on for a few hours, it gets REALLY slow.
I've used process.hrtime() to time various things, and it seems that it's all the require calls that take the time, i.e. loading dependencies. Once everything is loaded, my apps run reasonably fast.
The difference is extreme: just after a reboot, an app may take 300 ms to get through its require calls; once the machine has been on for a few hours, the same thing takes something like 30 seconds.
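For reference, that kind of measurement can be done with something like this (a minimal sketch, not my exact code; 'express' is just an example of a heavy dependency):

const start = process.hrtime.bigint();
const express = require('express'); // any heavy dependency
const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
console.log(`require('express') took ${elapsedMs.toFixed(1)} ms`);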
What could be causing this?
How fast require executes is closely tied to disk read performance.
So I suppose it's highly possible that the HDD/SSD (or whichever drive your project lives on) is what's causing this.
It works fine right after you reboot, so maybe some background software (such as a virus scanner or an auto-downloader) is blocking I/O.
You'd best start by checking your CPU/RAM/disk status while the issue is occurring, and see what happens if you move your project to a USB flash drive or an external HDD.
I have an application that was ported from Windows to Linux. The same code compiles under VS C++ and g++, but there is a performance difference between running it on Windows and running it on Linux. The purpose of this application is caching: it's a node between a server and a client, and it caches client requests and server responses in a list, so that when another client makes a request the server has already processed, this node responds instead of forwarding the request to the server.
When this node runs on Windows, the client gets everything it needs in about 7 seconds. But when the same node runs on Linux (Ubuntu 9.04), the client takes 35 seconds to start up. Every test starts from scratch. I'm trying to understand where this timing difference comes from. A weird data point: when the node runs on Linux inside a virtual machine hosted on Windows, the load time is around 7 seconds, just as if it were running on Windows natively. So my impression is that there is a networking problem.
This node uses UDP for sending and receiving network data, with boost::asio as the implementation. I've tried changing all the supported socket flags and the buffer sizes, but nothing helped.
Does anyone know why this is happening, or of any UDP-related network settings that might influence the performance?
Thanks.
If you suspect a network problem, take a network capture (Wireshark is great for this kind of problem) and look at the traffic.
Find out where the time is being spent, either based on the network capture or based on the output of a profiler.
Once you know that you're half way to a solution.
These timing differences can depend on many factors, but the first one that comes to mind is that you're using a modern Windows version. XP already had features to keep recently used applications in memory, but Vista optimized this much further: for each application you load, a special prefetch file is created that mirrors how it looks in memory, so the next time you load that application it starts a lot faster.
I don't know about Linux, but it may well need to load your app completely each time. You can compare the two systems much more fairly by measuring performance once the application is already running: leave it open (if your design allows that) and compare again.
These differences in how the system optimizes memory would also explain your VM scenario.
Basically, if you rule out other running applications and run your application at high priority, the performance should be close to equal, but it also depends on whether you use operating-system-specific code, how you access the file system, how you use UDP, etc.