Why do some Node.js tasks make Ubuntu freeze but not Windows? - node.js

I have been facing an intriguing problem lately.
I am working on a project with a pretty heavy AngularJS front end and about a hundred Jest tests. I have 16 GB of RAM, but the project is so heavy that it sometimes fills the RAM completely, and the computer often cannot handle running the project alongside a yarn test (which takes 3 to 4 GB of RAM) or a Cypress workflow test without serious latency problems.
To avoid long freezes (up to several minutes) and crashes, I increased the swap to 16 GB.
That being said, for various reasons I had to work on the project on Windows 10 and faced none of these problems.
Everything runs smoothly, and the graphical interface doesn't lag even with screen sharing, even though the RAM is also completely filled and the CPU is at 100%.
I am even able to run 20 yarn test processes at the same time without much lag, which seems completely impossible on Linux even with the increased swap.
I've seen that Windows uses RAM compression by default and Linux does not, but I only saw up to 549 MB of compressed RAM during my comparisons.
At first I thought it could be a problem with GNOME, which is known to be heavy and sometimes buggy, but I also tested with KDE and got the same results.
I also heard that Windows allocates special resources to the graphical environment, whereas Linux may treat it like any other process, but that alone cannot explain all the problems, because the whole computer freezes on Linux and not on Windows.
So I'm starting to wonder if there is something about memory or process management that Windows does significantly better than Linux.
My config:
Computer model: Dell XPS 15 7590
Processor: Intel Core i7-9750H, 2.6 GHz (4.5 GHz max turbo), 6 cores / 12 threads
RAM: 16 GB
Graphics card: GTX 1650M
Screen: 4K 16:9
SSD: 512 GB NVMe

I was facing the same issue on Ubuntu 22.04 with 16 GB RAM and an Intel i5-12400 processor.
My solution was to limit the number of max workers in the Jest config:
"maxWorkers": 4

Related

How to utilize the High Performance cores on Apple Silicon

I have developed a macOS app which relies heavily on multithreading (a call center simulator). It runs fine on my iMac 2019 and fills up all cores nicely. In my test scenario it simulates approximately 1.4 million telephone calls in total over 100 iterations, each iteration as a dispatch item on a concurrent dispatch queue.
Now I have bought a new Mac mini with M1 Apple Silicon and I was eager to see how the performance develops on that test machine. Well, it’s not bad but not as good as I expected:
System                                                       Duration
iMac 2019, Intel 6-core i5, 3.0 GHz, macOS Catalina 10.15.7  19.95 s
Mac mini, M1 8-core, macOS Big Sur 11.2, Rosetta 2           26.85 s
Mac mini, M1 8-core, macOS Big Sur 11.2, native ARM          17.07 s
Investigating a little further, I noticed that at the start of the simulation all 8 cores of the M1 Mac are filled up properly, but after a few seconds only the 4 efficiency cores are still being used.
I have read the Apple docs "Optimize for Apple Silicon with performance and efficiency cores" and double-checked that the dispatch queue for the iterations is set up properly:
let simQueue = DispatchQueue.global(qos: .userInitiated)
But no success. After a few seconds of running, the high-performance cores are obviously not utilized any more. I even tried to set up the queue with qos set to .userInteractive, but that didn't help either. I also flagged the dispatch items with the proper QoS, but that didn't change anything. It looks to me like other apps (e.g. Xcode) do utilize the high-performance cores even for longer periods.
Does anybody know how to force an M1 Mac to utilize the high-performance cores?
"M1 8 core" is really "M1 4 performance + 4 power saving cores". I expect it to have be a bit more performance than an Intel 6 core, but not much. Exactly has you see, 15% faster than six Intel cores or about as fast as 7 Intel cores would be. The current M1 chips are low end processors. "A bit better than Intel six cores" is quite good.
Your code must be running on the performance cores, otherwise there would be no chance at all to come close to the Intel performance. In that graph, nothing tells you which cores are used.
What most likely happens is that all cores start running, each trying to do one eighth of the work; after about 8 seconds the performance cores have their share done, and the work remaining on the power-saving cores moves over to the performance cores. You are just misinterpreting the image as only the low-performance cores doing the work.
I would guess that Apple has put a preference on using efficiency cores over performance cores for many reasons: battery life being one, and most likely thermal reasons as well. This is the big question mark with an SoC that was originally designed for smartphones and tablets; macOS is a much heavier OS than iOS or iPadOS. Apple most likely felt that performance cores were only needed in cases where maximum throughput was required. No doubt, some people, including myself with an M1 Mac mini, would like a way to adjust this balance between efficiency and performance cores. Personally, I would prefer all cores be capable of switching between efficiency and performance, as in Intel's Speed Shift technology. This may come along with the M1's advancements in Mac Pro and other Pro models.

What could be the reason that a program running on two identical computers shows a big difference in IPC (instructions per cycle)?

OS: Linux 4.4.198-1.el7.elrepo.x86_64
CPU: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz × 4
MEM: 376 GB
I have a program (it does some LSTM model inference based on TensorFlow 1.14) that runs on two machines with the same hardware; one gets bad performance while the other does much better (about a 10x difference).
Using Intel's pqos tool, I diagnosed the two processes and got very different IPC numbers (one is 0.07 while the other is 2.5). Both processes are pinned to specific CPU cores, and neither machine is under heavy load. This problem appeared two weeks ago; before that, the bad machine worked fine, and the shell history shows no configuration changes.
I checked a lot of environment information, including the kernel, filesystem, process scheduler, I/O scheduler, and the MD5 sums of the program and libraries; they are all the same. The bad machine's iLO shows no errors, and the program mainly burns CPU.
I used sysbench to test the two machines (CPU and memory), which showed about a 25% performance difference; the bad machine is slower at prime calculation. Could it be a hardware problem?
I don't know the root cause of the difference in IPC (which here equates to performance). How can I dig into this?

Saving GPU memory by bypassing GUI

I have a MacBook Pro with a 2 GB NVIDIA GPU. I am trying to use all of my GPU memory for computations (Python code). How much would I save if I bypassed the GUI and only accessed my machine through the command line? Would that free up a meaningful amount of GPU memory?
The difference probably won't be huge.
A GPU that is only hosting a console display will only typically have ~5-25 megabytes of memory reserved out of the total memory size. On the other hand, a GPU that is hosting a GUI display (using the NVIDIA GPU driver) might typically have ~50 megabytes or more reserved for display use (this may vary somewhat based on the size of the display attached).
So you can probably get a good estimate of the savings by running nvidia-smi and looking at the difference between total and available memory for your GPU with the GUI running. If that is, for example, 62 MB, then you can probably recover around 40-50 MB by shutting off the GUI, for example on Linux by switching to runlevel 3.
I just ran this experiment on a Linux laptop with a Quadro 3000M that happens to have 2 GB of memory. With the X display up and the NVIDIA GPU driver loaded, the "used" memory was 62 MB out of 2047 MB (as reported by nvidia-smi).
When I switched to runlevel 3 (X is not started), the memory usage dropped to about 4 MB. That means roughly 50 MB of additional memory is available to CUDA.
A side benefit of shutting off the GUI might be the elimination of the display watchdog.

Why does the performance improvement of a CPU-intensive task differ between Windows and Linux when using multiple processes?

Here is my situation:
My company needs to run tests on tons of test samples, but if we start a single process on a Windows PC, the test can last for hours, even days. So we try to split the test set and start one process per slice on a multi-core Linux server.
We expected a linear performance improvement from the server solution, but in fact we only observe a 2-3x improvement when the task is finished by 10-20 processes.
I tried several means to locate the problem:
disable hyper-threading;
use the max-performance power policy;
use taskset to pin each process to a different core.
but no luck, the problem remains.
Why does this happen? which is the root cause, our code, OS or hardware?
here is the info of my pc and server:
PC: OS: Win10; CPU: i5-4570, 2 physical cores; MEM: 16 GB
server: OS: RedHat 6.5; CPU: E5-2630 v3, 2 processors; MEM: 32 GB
Edit:
About CPU: the server has 2 processors, each with 8 physical cores. Check this link for more information.
About my test: it's handwriting-recognition related (that's why it's a CPU-sensitive task).
About I/O: the performance checkpoints do not involve much I/O, if logging doesn't count.
"we expect a linear performance improvement for the server solution, but the truth is we could only observe a 2~3 times improvement when the test task finished by 10~20 processes."
This seems very logical considering there are only 2 cores on the system. Starting 10-20 processes will only add some overhead due to task switching.
Also, I/O could be a bottleneck here too, if multiple processes are reading from disk at the same time.
Ideally, the number of running threads should not exceed 2 x the number of cores.

Performance Problems with Node.js (Mac OSX) - Processes

I hope to find some help here.
We are using Node, MongoDB, supertest, Mocha, and spawn in our test environment.
We've tried to improve our Mocha test environment to run tests in parallel, because our test cases now take almost 5 minutes (600 cases)!
We spawn, for example, 4 processes and run tests in parallel. It is very successful, but only on Linux.
On my Mac the tests still run very slowly; it seems the different processes are not really running in parallel.
test times:
macosx:
- running 9 tests in parallel: 37s
- running 9 tests not in parallel: 41s
linux:
- running 9 tests in parallel: 16s
- running 9 tests not in parallel: 25s
macosx early 2011:
10.9.2
16 GB RAM
Core i7 2.2 GHz
physical processors: 1
cores: 4
threads: 8
linux dell:
ubuntu
8 GB RAM
Core i5-2520M 2.5 GHz
physical processors: 1
cores: 2
threads: 4
my questions are:
are there any tips for improving process performance on macosx?
(other than ulimit and launchctl (maxfiles)?)
why do the tests run much faster on linux?
thanks,
kate
I copied my comment here, as it describes most of the answer.
Given the specs of your machines, I doubt the extra 8 GB of RAM and processing power affect things much, especially given Node's single-process model and that you're only launching 4 processes. I doubt that for the Linux machine 8 GB, 2.5 GHz, and 4 threads are a bottleneck at all. As such, I would actually expect the time the processor spends running your tests to be roughly equivalent for both machines. I'd be more interested in your disk I/O performance, given that you're running Mongo. Your disk I/O has the most potential to slow things down. What are the specs there?
Your specs:
macosx: Toshiba, 5400 RPM, 8 MB cache
linux: Seagate, 7200 RPM, 16 MB cache
Your Linux drive is significantly faster (1.33x) than your Mac drive, and it has a significantly larger cache. For database-backed applications, hard drive performance is crucial. Most of the time spent in your application will be waiting for I/O, particularly with Node's single-process method of doing work. I would suggest this as the culprit for 90% of the performance difference, and chalk the rest up to the fact that Linux probably has less going on in the background, further exacerbating your Mac disk drive's performance issues.
Furthermore, launching multiple Node processes isn't likely to help. Since processor time isn't your bottleneck, launching too many processes will just slow your disk down. Further evidence for this is that the performance of multiple processes on Linux scales proportionally better than on the Mac: one process nearly maxes out your 5400 RPM drive, so you don't see a significant performance increase from running multiple processes, whereas the multiple Linux Node processes use the disk to its full potential. You would likely see diminishing returns on Linux too if you were to launch many more processes, unless of course you upgraded to an SSD.
