Is there a way to limit the number of CPU cores Bazel uses?

Is there a way to tell Bazel when building how many CPU cores it can use?
TL;DR
I build TensorFlow in a VMware Workstation VM, and since it is a virtual machine I can adjust the number of processors and cores it gets.
In the process of building TensorFlow I found that using only one core works.
When I give the Workstation four cores and build TensorFlow, it eventually halts the system to the point that I have to reboot.
If I wait a few hours (leave it alone overnight) it sometimes returns with the following error:
gcc: internal compiler error: Killed (program cc1plus)
While I can change the number of cores using the virtual machine's configuration options, I would prefer to do it without having to shut down and restart the virtual machine.

Some examples for your .bazelrc:
build --local_ram_resources=HOST_RAM*.5 --local_cpu_resources=HOST_CPUS-1 (leave one core free)
or
build --local_cpu_resources=1 (use a single core)
See https://docs.bazel.build/versions/master/command-line-reference.html#flag--local_cpu_resources
The currently accepted answer is deprecated.
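The same flags can also be passed directly on the command line for a one-off build. A minimal .bazelrc sketch with fixed values (the numbers are illustrative, not recommendations):

```
# .bazelrc -- cap Bazel's local scheduling resources
build --local_ram_resources=2048   # assume at most 2048 MB of RAM
build --local_cpu_resources=2      # schedule on at most 2 cores
```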

From the Bazel User Manual:
--local_resources availableRAM,availableCPU,availableIO
This option, which takes three comma-separated floating point arguments, specifies the amount of local resources that Bazel can take into consideration when scheduling build and test activities. The option expects the amount of available RAM (in MB), the number of CPU cores (with 1.0 representing a single full core), and workstation I/O capability (with 1.0 representing an average workstation). By default, Bazel estimates the amount of RAM and the number of CPU cores directly from the system configuration and assumes 1.0 I/O resource. If this option is used, Bazel will ignore --ram_utilization_factor.

Related

JMeter script does not achieve required TPS/RPS on Linux VM, but achieves it on Mac system running in GUI mode

I have a script where I am using Throughput Shaping Timer to achieve 100 TPS/RPS.
When the script is executed on the Mac system using GUI mode, it is able to achieve ~99 TPS/RPS. But when I execute it on the Linux system, it hardly goes beyond 60 TPS/RPS.
The following log message is received on the Linux OS (same script, so Thread Group settings remain as is):
No free threads available in current Thread Group Device Service
Some of the details are given below:
JMeter version is 5.4.3 on both systems (the same JMeter installation was copied to the Linux VM)
macOS version: 11.6
Linux OS version: Red Hat Enterprise Linux 8.6 (Ootpa)
Heap settings on both systems are given below (even increased to 13g on the Linux VM):
: "${HEAP:="-Xms1g -Xmx1g -XX:MaxMetaspaceSize=256m"}"
Please let me know which settings I should change to achieve TPS/RPS similar to GUI mode on the Mac.
Thread Group Setting shown in the attached image.
First of all, GUI mode is for test development and debugging; when it comes to test execution, you should run tests in command-line non-GUI mode in order to get accurate results.
Make sure to use the same Java version.
Make sure to use the same (or at least similar) hardware.
Make sure to check resource consumption (CPU, RAM, disk, network, swap usage, etc.), as if you're hitting hardware or OS limits you might get false-negative results because JMeter cannot send requests as fast as it should; in that case you might need another Linux box and to run JMeter in distributed mode.
No free threads available in current Thread Group means there are not enough threads to reach/maintain the desired throughput; you can try increasing the number of threads in the Thread Group, or switch to a Concurrency Thread Group and connect it to the Throughput Shaping Timer via the Feedback function.
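For instance, a non-GUI run could look like the sketch below; the heap values and all file paths (test.jmx, results.jtl, report/) are placeholders, not values from the question.

```shell
# Give JMeter a larger heap for the load run (illustrative values):
export HEAP="-Xms4g -Xmx4g -XX:MaxMetaspaceSize=256m"
# Then start the test in non-GUI mode:
#   jmeter -n -t test.jmx -l results.jtl -e -o report/
# -n = non-GUI, -t = test plan, -l = results log, -e -o = HTML dashboard
```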

Yocto runs only one task at a time inside a VM

I have set up my development environment inside a virtual machine running Ubuntu 14.04. My company doesn't allow me to run a Linux-flavoured OS directly, maybe due to security reasons.
One thing I have observed is that in the VM it only runs one task at a time, whereas if I run it on my personal laptop it runs multiple tasks at a time.
Is there any way to configure Poky, in the local.conf file for example (or any other file), to run multiple tasks at the same time? I have given more than 6 GB of RAM to the VM.
As it is running one task at a time, the build is taking a lot of time.
Thanks for your time
The bitbake task executor queries the number of CPUs dynamically, so it seems you might have allocated only one CPU to your VM. You can check the CPUs with the command below inside the VM:
lscpu
You might want to allocate more CPUs. VirtualBox lets you do that:
Stop the virtual machine.
Click Settings -> System -> Processor -> change the number of processors.
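If the VM does see multiple CPUs but you want explicit control, bitbake's parallelism can also be set in conf/local.conf (a sketch; the values are illustrative, and by default both are derived from the detected CPU count):

```
# conf/local.conf -- illustrative values for a 4-CPU VM
BB_NUMBER_THREADS = "4"   # number of bitbake tasks to run in parallel
PARALLEL_MAKE = "-j 4"    # make jobs within a single compile task
```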

Docker CPU/Mem allocation in Mac/Win

As far as I understand, at the moment Docker for Mac requires that I decide upfront how much memory and how many CPU cores to statically allocate to the virtualized Linux it runs on.
So that means that even when Docker is idle, my other programs will run on (N-3) CPU cores and (M-3) GB of memory. Right?
This is very suboptimal!
On Linux it's ideal, because a container is just another process. So it uses and releases system memory as containers start and stop.
Is my mental model correct?
Will one day Docker for Mac or Windows dynamically allocate CPU and Memory resources?
The primary issue here is that, for the moment, Docker can only run Linux containers on Linux. That means on OS X or Windows, Docker is running in a Linux VM, and its ability to allocate resources is limited by the facilities provided by the virtualization software in use.
Of course, Docker can run natively on Windows, as long as you want to run Windows containers, and in that situation it may more closely match the Linux "a container is just a process" model.
It is possible that this will change in the future, but that's how things stand right now.
So that means that even when Docker is idle, my other programs will run on (N-3) CPU cores and (M-3)GB of memory. Right?
I suspect that's true for memory. I believe that if the Docker VM is idle, it isn't actually using much in the way of CPU resources (that is, you are not dedicating CPUs to the VM; rather, you are setting maximum limits on how many resources the VM can consume).

Is there a maximum number of CPUs that a VirtualBox VM can bear?

I am using VirtualBox 5.1 running on a host with 48 CPUs and 250 GB of RAM.
The virtual machine that I am importing (the guest) initially had 2 CPUs and 4 GB of RAM.
Inside this machine I am running a Java process that starts a dynamic number of threads to perform some tasks.
I ran it with the configurations below:
The whole process on my laptop (2 CPUs / 4 GB RAM) ~ 11 seconds
Same program in the virtual machine on the server (15 CPUs and 32 GB of RAM) ~ 45 seconds
Same program in the virtual machine on the server (20 CPUs and 32 GB of RAM) ~ 100+ seconds
Same program in the virtual machine on the server (10 CPUs and 32 GB of RAM) ~ 5+ seconds
At first I thought there was a problem in how I was managing the threads from Java, but after many tests I figured out that there was a relation between the number of CPUs the virtual machine has and its performance; the sweet spot was 10, and beyond that the overall performance of the machine slows down (CPU starvation?).
The virtual machine runs Oracle Enterprise Linux 6.7 and the host runs Oracle Enterprise Linux 6.9.
I couldn't find any hard limit in the VirtualBox documentation regarding the number of CPUs.
Is there a setting that needs to be set to enable/take advantage of more than 10 CPUs in a VirtualBox instance?
Time has passed since I posted this question; just for the archive, I will share my findings hoping they can save time for others.
It turns out that the performance issues were due to the way VirtualBox works, especially the relationship between the OS and the hypervisor.
The virtual machine (the guest OS) is, in the end, a single process for the host, and when you modify the number of CPUs in the virtual machine settings, what that changes is the number of threads that process uses to emulate the additional CPUs (at least in VirtualBox).
Having said that, when I assigned 10+ CPUs to the VM I ended up with:
a single process with 10+ threads
an emulated OS running hundreds of processes
my Java code which was creating another bunch of threads
All of that together saturated the host's virtual machine process, which I think was due to the way the host OS handles process context switching.
On my server the hard limit was 7 virtual CPUs; adding more than that slowed down the Java software.
Running the Java software outside of the VM didn't show any performance issues; it worked out of the box with 60+ isolated threads.
We have almost the same setup as yours (Virtualbox running on a 48-core machine across 2 NUMA nodes).
I initially set the number of cores to the maximum supported in Virtualbox (e.g. 32), but quickly realized that one of the two NUMA nodes was always idling while the other stayed at medium loads, when the VM was under load.
Long story short, a process can only be assigned to a single NUMA node, and Virtualbox runs one user process with several threads... which means that we are limited to using 24 cores (and even less in practice considering that this is a 12-core cpu with hyperthreading).
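A quick way to inspect the host's topology before sizing a VM (assuming a Linux host with util-linux installed):

```shell
nproc                            # total logical CPUs the host exposes
lscpu | grep -i 'numa' || true   # NUMA layout; no match means no NUMA info
```

Keeping a VM's vCPU count within a single NUMA node avoids the cross-node penalty described above.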

Multithreaded applications on different CPUs

If, for example, there is an embedded application that runs on a single-core CPU, and that application is then ported to a multi-core CPU, would the app run on a single core or on multiple cores?
To be more specific, I am interested in ARM CPUs (but not only) and toolchain specifics, e.g. standard C/C++ libraries.
The intention of this question is this: is it the CPU's responsibility to "decide" to execute on multiple cores, or is it up to the compiler toolchain, the developer, and the standard platform-specific libraries? And again, I am also interested in tendencies on other systems out there.
There are plenty of applications and operating systems (for example Linux) that run on different CPUs but the same architecture, so does that mean that they are compiled differently?
Generally speaking single-threaded code will always run on one core. To take advantage of multiple cores you need to have either multiple processes, multiple threads, or both.
There's nothing your compiler can do to help you here. This is an architectural consideration.
If you have multiple threads, for example, most multi-core systems will run them on whatever cores are available if the operating system you're running is properly compiled to support that. Running an OS that's been compiled single-core only will obviously limit your options here.
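The point about processes versus threads can be sketched in plain shell: backgrounded subshells are independent processes, so the OS scheduler is free to place them on different cores, while the script itself remains single-threaded.

```shell
nproc   # how many logical CPUs the OS reports
# Start four independent worker processes (a CPU-bound busy loop each);
# the scheduler may run them on different cores in parallel.
for i in 1 2 3 4; do
  ( s=0; n=0; while [ "$n" -lt 50000 ]; do s=$((s + n)); n=$((n + 1)); done ) &
done
wait    # block until all four background workers exit
echo "all workers done"
```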
A single threaded program will run in one thread. It is theoretically possible for the thread to be scheduled to move to a different core, but the scheduler cannot turn a single thread into multiple threads and give you any parallel processing.
EDIT
I misunderstood your question. If there are multiple threads in the application, and that application is binary compatible with the new multicore CPU, the threads will indeed be scheduled to run on different CPUs, if the OS scheduler deems it appropriate.
Well, it all depends on whether the software wants to utilize the other cores (if present). Let's take the example of Linux on ARM's Cortex-A53.
Initially a vendor-provided bootloader runs: the FSBL (first-stage bootloader). It then passes control to ARM Trusted Firmware (ATF), and ATF then runs U-Boot. All of these run on a single core. Then U-Boot loads the Linux kernel and passes control to it. Linux initializes some things and checks some options, first looking in the bootargs for the smp or nosmp flags. If smp, it gets the number of CPUs assigned to it from the dtb and then, using SMC calls to ATF, starts the other cores and assigns work to them to provide a true multiprocessing environment. This is normally called load balancing, and in Linux it is mostly done in the fair.c file.
