Can node.js worker threads run on all CPUs and all cores? - node.js

I understand that node.js can run on multiple cores on at least one CPU. What's not clear to me is whether it can run on more than one CPU. For example, if I have a 4-CPU machine with 10 cores in each CPU, can a single node.js process take advantage of all 40 cores, or just 10? Does it depend on the OS?

Other than performance effects (such as NUMA), multi-socket systems work exactly like single-socket-multi-core systems, whatever that implies for node.js. They're all SMP systems with multiple CPU cores and cache-coherent shared memory.
OSes will run threads across all physical cores in the system, so all that matters is that your workload is threaded at all.
The only thing that would be different is a cluster of machines with shared memory that's not cache-coherent; in that case you wouldn't be running a single instance of an OS across all the cores.
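For Node.js specifically, here is a minimal sketch of what "threaded at all" could look like, assuming the built-in worker_threads module and a CPU-bound toy task (summing integers). The helper names logicalCpuCount, chunkRanges and sumInParallel are made up for this illustration. The code starts one worker per logical CPU that the OS reports and leaves it to the OS scheduler to place those threads on whichever cores and sockets it likes.

```javascript
const os = require('os');
const { Worker } = require('worker_threads');

// Logical CPUs reported by the OS (all sockets included).
// os.availableParallelism() only exists in newer Node.js releases,
// so fall back to os.cpus().length when it is missing.
function logicalCpuCount() {
  return typeof os.availableParallelism === 'function'
    ? os.availableParallelism()
    : os.cpus().length;
}

// Split [0, n) into `parts` contiguous [start, end) ranges of near-equal size.
function chunkRanges(n, parts) {
  const base = Math.floor(n / parts);
  const extra = n % parts;
  const ranges = [];
  let start = 0;
  for (let i = 0; i < parts; i++) {
    const end = start + base + (i < extra ? 1 : 0);
    ranges.push([start, end]);
    start = end;
  }
  return ranges;
}

// Code executed inside each worker thread: sum the integers in its range.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  let sum = 0;
  for (let i = workerData.start; i < workerData.end; i++) sum += i;
  parentPort.postMessage(sum);
`;

// Sum 0..n-1 using one worker per range; resolves with the total.
function sumInParallel(n, workers = logicalCpuCount()) {
  const jobs = chunkRanges(n, workers).map(([start, end]) =>
    new Promise((resolve, reject) => {
      const w = new Worker(workerSource, { eval: true, workerData: { start, end } });
      w.once('message', resolve);
      w.once('error', reject);
      w.once('exit', (code) => {
        if (code !== 0) reject(new Error('worker exited with code ' + code));
      });
    }));
  return Promise.all(jobs).then((parts) => parts.reduce((a, b) => a + b, 0));
}

sumInParallel(1000000).then((total) => {
  console.log('workers:', logicalCpuCount(), 'total:', total);
});
```

Nothing in this sketch pins a worker to a core or a socket. Whether all 40 cores actually get busy depends on the OS scheduler and on the process's CPU affinity, not on Node.js itself.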

Related

Is there code to emulate and increase the number of CPUs in a system?

Such code would somehow make the OS think there are more cores in the system and dispatch threads to them. It should then be able to take a dispatched thread and run it on a CPU in another machine.
Of course, it would have to know the specs of that machine beforehand so that it can expose the right CPU specs, or whatever other information the OS needs.

Is there a maximum number of CPUs that a VirtualBox VM can bear?

I am using VirtualBox 5.1 on a host with 48 CPUs and 250 GB of RAM.
The virtual machine that I am importing (the guest) initially had 2 CPUs and 4 GB of RAM.
Inside this machine I am running a Java process that starts a dynamic number of threads to perform some tasks.
I ran it with the configurations below:
The whole process on my laptop (2 CPUs, 4 GB RAM): ~11 seconds
Same program in the virtual machine on the server (15 CPUs, 32 GB RAM): ~45 seconds
Same program in the virtual machine on the server (20 CPUs, 32 GB RAM): ~100+ seconds
Same program in the virtual machine on the server (10 CPUs, 32 GB RAM): ~5+ seconds
At first I thought there was a problem in how I was managing the threads in Java, but after many tests I figured out that there is a relation between the number of CPUs the virtual machine has and its performance. The maximum useful number was 10; beyond that, the overall performance of the machine degrades (CPU starvation?).
The virtual machine runs Oracle Enterprise Linux 6.7 and the host runs Oracle Enterprise Linux 6.9.
I couldn't find any hard limit on the number of CPUs in the virtual machine documentation.
Is there a setting that needs to be set to enable or take advantage of more than 10 CPUs in a VirtualBox instance?
Some time has passed since I posted this question. Just for the archive, I will share my findings in the hope that they save others some time.
It turns out that the performance issues were due to the way VirtualBox works, especially the relationship between the OS and the hypervisor.
The virtual machine (the guest OS) is, in the end, a single process on the host. When you change the number of CPUs in the virtual machine settings, what changes is the number of threads that this process uses to emulate the additional CPUs (at least in VirtualBox).
Having said that, when I assigned 10+ CPUs to the VM I ended up with:
a single process with 10+ threads
an emulated OS running hundreds of processes
my Java code which was creating another bunch of threads
All of that together saturated the virtual machine process on the host, which I think was due to the way the host OS was handling context switching between processes.
On my server, the practical limit was 7 virtual CPUs; adding more than that slowed down the Java software.
Running the Java software outside of the VM didn't show any performance issues; it worked out of the box with 60+ isolated threads.
We have almost the same setup as yours (VirtualBox running on a 48-core machine across 2 NUMA nodes).
I initially set the number of cores to the maximum supported in VirtualBox (e.g. 32), but quickly realized that, when the VM was under load, one of the two NUMA nodes was always idling while the other stayed at medium load.
Long story short, in our setup the VM process was confined to a single NUMA node, and VirtualBox runs one user process with several threads, which means that we are limited to using 24 cores (and even fewer in practice, considering that this is a 12-core CPU with hyperthreading).
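To check how the host's cores are split across NUMA nodes before choosing a vCPU count, something like the following sketch may help. It assumes a Linux host that exposes /sys/devices/system/node/nodeN/cpulist. The function names parseCpuList and numaTopology are made up for this illustration, and on other systems the result is simply an empty object.

```javascript
const fs = require('fs');
const path = require('path');

// Expand a kernel CPU list such as "0-11,24-35" into an array of CPU numbers.
function parseCpuList(text) {
  const cpus = [];
  for (const part of text.trim().split(',')) {
    if (part === '') continue;
    const [lo, hi = lo] = part.split('-').map(Number);
    for (let cpu = lo; cpu <= hi; cpu++) cpus.push(cpu);
  }
  return cpus;
}

// Map each NUMA node name to the CPUs that belong to it.
// Returns {} when the sysfs directory cannot be read (for example on non-Linux hosts).
function numaTopology(root = '/sys/devices/system/node') {
  const topology = {};
  let entries;
  try {
    entries = fs.readdirSync(root);
  } catch {
    return topology;
  }
  for (const name of entries) {
    if (!/^node\d+$/.test(name)) continue;
    try {
      const list = fs.readFileSync(path.join(root, name, 'cpulist'), 'utf8');
      topology[name] = parseCpuList(list);
    } catch {
      // Ignore nodes whose cpulist file cannot be read.
    }
  }
  return topology;
}

console.log(numaTopology());
```

If one node lists 24 logical CPUs, that would match the 24-core ceiling described above. This only shows the topology; it does not explain or change how VirtualBox places its threads.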

The distribution of processing threads on the processor cores

The operating system independently distributes the processing of threads over the processor cores. Suppose a program has two threads. Initially, neither thread is loaded with work and both are processed by one core. Later they are loaded with work. Will the operating system move the processing of one thread to another processor core?

How does Erlang implement concurrency without the use of OS threads?

If Erlang does its own process creation and scheduling, without utilizing OS threads, how does it make use of multiple CPU cores? My limited understanding is that the OS assigns the CPU cores to OS threads.
Erlang runs on a virtual machine called BEAM.
The BEAM VM runs one scheduler per core (each scheduler is an OS thread), and each scheduler runs Erlang processes.
See this related SO question.

Multithreaded applications on different CPUs

Suppose there is, say, an embedded application that runs on a single-core CPU, and that application is then ported to a multi-core CPU. Would the app run on a single core or on multiple cores?
To be more specific, I am interested in ARM CPUs (but not only) and in toolchain specifics, e.g. the standard C/C++ libraries.
The intention of this question is this: is it the CPU's responsibility to "decide" to execute on multiple cores, or that of the compiler toolchain, the developer, and the standard platform-specific libraries? Again, I am also interested in how other systems out there tend to handle this.
There are plenty of applications and operating systems (for example Linux) that run on different CPUs with the same architecture, so does that mean that they are compiled differently?
Generally speaking single-threaded code will always run on one core. To take advantage of multiple cores you need to have either multiple processes, multiple threads, or both.
There's nothing your compiler can do to help you here. This is an architectural consideration.
If you have multiple threads, for example, most multi-core systems will run them on whatever cores are available if the operating system you're running is properly compiled to support that. Running an OS that's been compiled single-core only will obviously limit your options here.
A single threaded program will run in one thread. It is theoretically possible for the thread to be scheduled to move to a different core, but the scheduler cannot turn a single thread into multiple threads and give you any parallel processing.
EDIT
I misunderstood your question. If there are multiple threads in the application, and that application is binary compatible with the new multicore CPU, the threads will indeed be scheduled to run on different CPUs, if the OS scheduler deems it appropriate.
Well, it all depends on whether the software wants to utilize the other cores (if present). Let's take the example of Linux on ARM's Cortex-A53.
Initially a vendor-provided boot loader runs, the FSBL (First Stage Boot Loader). It then passes control to Arm Trusted Firmware (ATF). ATF then runs U-Boot. All of these run on a single core. U-Boot then loads the Linux kernel and passes control to it. Linux initializes some things and then checks its options, first looking in the bootargs for the smp or nosmp flags. If SMP is enabled, it gets the number of CPUs assigned to it from the DTB, starts the other cores using SMC calls to ATF, and then assigns work to those cores to provide a true multiprocessing environment. Distributing that work is normally called load balancing, and in Linux it is mostly done in fair.c.
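To see from user space what the kernel was told at boot, a rough sketch like the one below could be used. It assumes a Linux system where /proc/cmdline is readable, and it only looks at two kernel parameters, nosmp and maxcpus=N. The function name smpSettings and the object it returns are made up for this illustration.

```javascript
const fs = require('fs');
const os = require('os');

// Pick the SMP-related options out of a kernel command line string.
// Only `nosmp` and `maxcpus=N` are recognized in this sketch.
function smpSettings(cmdline) {
  const settings = { nosmp: false, maxcpus: null };
  for (const arg of cmdline.trim().split(/\s+/)) {
    if (arg === 'nosmp') settings.nosmp = true;
    const match = /^maxcpus=(\d+)$/.exec(arg);
    if (match) settings.maxcpus = Number(match[1]);
  }
  return settings;
}

// Compare the boot arguments with the number of CPUs the OS reports.
let cmdline = '';
try {
  cmdline = fs.readFileSync('/proc/cmdline', 'utf8');
} catch {
  // Not on Linux, or /proc is unavailable: keep the empty command line.
}
console.log('boot settings:', smpSettings(cmdline));
console.log('logical CPUs reported by the OS:', os.cpus().length);
```

If nosmp is set or maxcpus limits the count, the OS should report fewer CPUs than the hardware has, however many threads the application creates.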
