cmake bottlenecks on target linking despite -j option - multithreading

Suppose I have a CMake project which consists of two targets A and B, each of which comes with source files to be compiled.
If I build with the -jN option (where N is the number of compile jobs, which I set to the number of logical cores on my machine), CMake will start building A by compiling all of its source files in parallel. Processor load is then at its maximum, provided there are more files than logical cores.
Then it will link A on a single core, without starting anything else. This is a major bottleneck.
Only afterwards will it begin building B, again starting N parallel compilation jobs for B's sources (as it did for A). However, there is absolutely no dependency of B on A. Or if there is, I can't tell, and I would like CMake to tell me, which I guess constitutes the first half of my question.
Second, supposing that there is no dependency of B on A, can I tell CMake to begin building B as soon as possible, or indeed right at the start?
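For reference, both halves can be probed from the command line; the paths and the choice of generator below are assumptions about the setup rather than anything stated above:
# First half: dump the target dependency graph CMake actually sees (run from the build directory).
cmake --graphviz=deps.dot .
dot -Tpng deps.dot -o deps.png    # requires Graphviz; the picture shows whether B really depends on A
# Second half: generate with Ninja instead of Makefiles. Ninja schedules any job whose
# inputs are ready, so B's compiles can run alongside A's single-threaded link step.
cmake -G Ninja /path/to/source
ninja -j 8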

Related

Very slow RedHawk component builds

We have some components that build 15+ object files before linking them. We find that if we modify a .h file used by many or all of them, builds are VERY slow. Some of our components take over an hour to build. It appears that RedHawk issues a make -j, or a make -j with a large number, so that we have 15+ compiles running simultaneously; this overwhelms even 4 GB of RAM and results in excessive swapping and VERY slow execution (the entire CPU is nearly locked up, and other windows are dead until it completes). If we run a simple make from a shell in the component, it completes in 5 min. Is there a way to change RH to issue a simple make, or a make with an adjustable maximum number of processes?
If you're referring to how the IDE invokes the build, you can check the build console. I'm pretty sure it calls either the top-level build.sh or the build.sh within your implementation's folder. In either case you can modify that file to perform the build however you'd like.
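As a rough sketch of what that edit might look like (the exact contents of build.sh differ between RedHawk versions, so treat this as an assumption):
# In the implementation's build.sh, replace an unbounded parallel build such as
#   make -j
# with a bounded one:
make -j2
# or let make throttle itself by load average instead of a fixed job count:
make -j8 --max-load=3.5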

GNU make - how to simulate multiple simultaneous jobs

I know that to let make run jobs in parallel, I use the command make --jobs=X, where X is usually equal to the number of cores (or twice that, or whatever).
I am debugging a makefile - actually, it consists of many makefiles - to make it work with the --jobs=X option. Here's an example of why it currently doesn't:
T1:
	mkdir D1
	output_makefile.bat > ./D1/makefile

T2:
	cd D1
	make
Executing this with --jobs=X will lead to a race condition because T1 is not specified as a dependency of T2 and eventually T2 will get built ahead of T1; most of the bugs I need to fix are of this variety.
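(For what it's worth, the usual fix for that particular race is sketched here: declare T1 as a prerequisite of T2, and invoke the sub-make via $(MAKE) with && so it actually runs inside D1 and shares the parent's job slots.)
T2: T1
	cd D1 && $(MAKE)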
If X in --jobs=X is greater than the number of cores (logical or physical? I'm not sure which), the number of jobs executed simultaneously will be capped at the number of cores.
My machine has 4 physical/8 logical cores but the build machine that will be running our builds will have as many as 64 cores.
So I'm concerned that just because my makefile (a) builds the final output correctly and (b) runs without errors on my machine with --jobs=4, that doesn't mean it'll run correctly and without errors with --jobs=64 on a 64-core machine.
Is there a tool that will simulate make executing in an environment that has more cores than the physical machine?
What about creating a virtual machine with 64 cores and running it on my 4-core machine; is that even allowed by VMPlayer?
UPDATE 1
I realized that my understanding of make was incorrect: the number of job slots make creates is equal to the --jobs=N argument and not the number of cores or threads my PC has.
However, this by itself doesn't necessarily mean that make will actually execute those jobs in parallel (via task-switching) when I have fewer cores than jobs.
I need to confirm that ALL the jobs are being executed in parallel, rather than merely being queued up and waiting for the actively executing jobs to finish.
So I created a makefile with 16 targets - more than the number of threads or cores I have - where each recipe merely echoes the name of the target a configurable number of times.
make.mk
all: 1 2 3 4 ... 14 15 16
<target X>:
	@loop_output.bat $@
loop_output.bat
@FOR /L %%G IN (1,1,2048) DO @echo (%1-%%G)
The output will be something like
(16-1) <-- Job 16
(6-1400)
(12-334)
(1-1616) <-- Job 1
(4-1661)
(15-113)
(11-632)
(2-1557)
(10-485)
(7-1234)
(5-1530)
The format is (Job #X - Echo #Y). The fact that I see (1-1616) after (16-1) means that make is indeed executing target 16 at the same time as target 1.
The alternative would be that make finishes a first batch of jobs (as many as there are cores/threads) and only then takes another batch of the same size, but that's not what's happening.
See my "UPDATE 1":
No special software or make tricks are required. Regardless of the number of cores you have, make will execute the jobs truly in parallel by spawning multiple processes and letting the OS multitask them just like any other processes.
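If you want to reproduce that check on a Unix-style system without the batch file, a throwaway makefile along these lines shows the overlap directly (an illustrative sketch, not part of the original test; it relies on GNU seq and the shell's date command):
# parallel_test.mk -- run with: make -f parallel_test.mk -j16
TARGETS := $(shell seq 1 16)
all: $(TARGETS)
$(TARGETS):
	@echo "$@ start  $$(date +%T)"
	@sleep 2
	@echo "$@ finish $$(date +%T)"
.PHONY: all $(TARGETS)
# With -j16, all sixteen "start" lines appear within a second or two of each other, before any "finish" line.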
Windows PITFALL #1: The version of GNU Make available on SourceForge is 3.81, which does NOT have the ability to even execute using --jobs. You'll have to download version 4.2 and build it.
Windows PITFALL #2: The make 4.2 source will fail to build because of some header that VS2008 (and older) doesn't have. The fix is easy: replace the invocation of the "symbol not found" function with its macro equivalent; it should be obvious what I'm talking about when you try to build it. (I forget what the missing symbol was.)

How to get an ETA?

I am building several large sets of source files (targets) using SCons. Now I would like to know if there is a metric I can use to show me:
How many targets remain to be built.
How long it will take -- though, to be honest, this is probably a no-go, as it is really hard to tell!
How can I do that in scons?
There is currently no progress indicator built into SCons, and it's also not trivial to provide one. The problem is that SCons doesn't build the complete DAG first and then start the build...which would give you a total number of targets to visit that you could use as a reference (=100%).
Instead, it builds up the DAG as it goes... It looks at each target, and then expands the list of its children (sources and implicit dependencies like headers) to check whether they are up to date. If a child has changed, it gets rebuilt by applying the same "build step" recursively.
In this way, SCons works its way down the DAG from the list of targets given on the command line (with the "." dir being the default), and only the parts that are required for (or, in other words, have a dependency relation to) the requested targets ever get visited.
This makes it possible for SCons to handle things like "header files generated by a program that must be compiled first" in a single pass...but it also means that the total number of targets/children to be visited changes constantly.
So a standard progress indicator would continuously climb towards 80%-90%, only to then fall back to 50%...and I don't think this would give you the information you're really after.
Tip: If your builds are large and you don't want to wait, do incremental builds and only build the library or program you're currently working on ("scons lib1"). This will still take all dependencies into account, but only a fraction of the DAG has to be expanded, so you use less memory and get faster update times...especially if you use the "interactive" mode, as sketched below. In a project with 100000 C files in total, updating a single library of 500 C files takes about 1 s on my machine. For more info on this topic check out http://scons.org/wiki/WhySconsIsNotSlow .
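For example, the interactive mode is used like this (the target name lib1 is just a placeholder):
$ scons --interactive
scons>>> build lib1      # rebuild only lib1 and what it depends on
scons>>> build lib1      # later updates reuse the session's state and are much faster
scons>>> exit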

How to speed up Linux kernel compilation?

I have a Core i5 with 8 GB of RAM.
I have VMware Workstation 10.0.1 installed on my machine.
I have Fedora 20 Desktop Edition installed on VMware as the guest OS.
I am working on the Linux kernel source code, v3.14.1. I am developing an I/O scheduler for the Linux kernel. After any modification to the code, it takes around 1 hour and 30 minutes to compile and install the whole kernel in order to see the changes.
Compilation and Installation commands:
make menuconfig,
make,
make modules,
make modules_install,
make install
So my question is: is it possible to reduce the 1 hour and 30 minutes to only 10 to 15 minutes?
Do not run make menuconfig for every change you make to the sources, because it will trigger a full compilation of everything, no matter how trivial your change is. It is only needed when a configuration option of the kernel changes, and that should seldom happen during your development.
Just do:
make
or if you prefer the parallel compilation:
make -j4
or whatever number of concurrent tasks you fancy.
Then make install, etc., may be needed to deploy the recently built binaries, of course.
Another trick is to configure the kernel down to the minimum needed for your tests. I've found that for many tasks a UML build (User Mode Linux) is the fastest. You may also find make localmodconfig useful instead of make menuconfig to start with.
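For the localmodconfig route, a typical sequence looks like the following (the LSMOD file path is just an example):
lsmod > /tmp/lsmod.txt                      # record the modules currently loaded
make LSMOD=/tmp/lsmod.txt localmodconfig    # shrink .config to roughly those modules
make -j4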
Use make's parallel build with the -j option.
Compile for your target architecture explicitly, rather than relying on the defaults.
That is, instead of running:
make
run:
make ARCH=<your architecture> -jN
where N is the number of cores on your machine (cat /proc/cpuinfo lists them). For example, for an i386 target on a host machine with 4 cores (as reported by cat /proc/cpuinfo):
make ARCH=i386 -j4
Similarly, you can run the other make targets (modules, modules_install, install) with the -jN flag.
Note: make checks which files have been modified and compiles only those, so only the initial build should take a long time; subsequent builds will be faster.
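As a small aside (not from the original answer): if you'd rather not count cores by reading /proc/cpuinfo, coreutils' nproc reports the number directly:
make ARCH=i386 -j"$(nproc)"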
make -j without a number places no limit on the number of jobs, so it will make use of all available CPUs (and may spawn far more jobs than you have cores).
You do not need to run make menuconfig again every time you make a change — it is only needed once to create the kernel .config file. (Or possibly again if you edit Kconfig files to add or modify configuration options, but this certainly shouldn't be happening often.)
So long as your .config is left alone, running make should only recompile files that you changed. There are a few files that must be compiled every time, but the vast majority are not.
ccache should be able to dramatically speed up your compile times. It speeds up recompilation by caching previous compilations and detecting when the same compilation is being done again. Your first compilation with ccache will be slower since it needs to populate the cache, but subsequent builds should be much faster.
If you don't want to fuss with ccache configurations you can just run it like so to compile the kernel:
ccache make
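Another common way to hook ccache into a kernel build (not from this answer, but widely used) is to override CC so every compiler invocation goes through the cache:
make CC="ccache gcc" -j4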
Perhaps in addition to the previous suggestions, while using ccache, you might want to unset CONFIG_GCC_PLUGINS (if it was set) otherwise you may get a lot of cache misses, as seen in this example.
Perhaps in addition to the previous suggestions, using ccache (https://ccache.samba.org/) and keeping the compilation directory on an SSD should drastically decrease the compilation time.
If you have sufficient RAM and you won't be using your machine while the kernel is being built, you can spawn a large number of concurrent jobs. But make sure your RAM really is sufficient, otherwise your system will hang and crash.
Use this command:
sudo make -j 4 && sudo make modules_install -j 4 && sudo make install -j 4
Where 4 is the number of cores I have allotted to this process.
Simple trick: if you aren't using the machine yourself, or have another one available, you can log out completely and switch to a TTY terminal using CTRL + ALT + F*. Everything is much, much faster.

How to reduce compilation cost in GCC and make?

I am trying to build some big libraries, like Boost and OpenCV, from their source code via make and GCC under Ubuntu 8.10 on my laptop. Unfortunately, the compilation of those big libraries seems to be a big burden for my laptop (an Acer Aspire 5000). Its fan gets louder and louder until, all of a sudden, the laptop shuts itself down without the OS gracefully turning off.
So I wonder how to reduce the compilation cost with make and GCC.
I wouldn't mind if the compilation takes a much longer time or more space, as long as it can finish without my laptop shutting itself down.
Is building the debug version of libraries always less costly than building the release version, because there is no optimization?
Generally speaking, is it possible to specify just some part of a library to install instead of the full library? Can the rest be built and added in later if it turns out to be needed?
Is it correct that if I restart my laptop, I can resume compilation from around where it was when the laptop shut itself down? For example, I noticed that this is true for OpenCV: the progress percentage shown during its compilation does not restart from 0%. But I am not sure about Boost, since there is no obvious information to tell from, and the compilation seems to take much longer.
UPDATE:
Thanks, brianegge and Levy Chen! How do I use the wrapper script for GCC and/or g++? Is it like defining an alias for GCC or g++? And how do I call a script to check sensors and wait until the CPU temperature drops before continuing?
I'd suggest creating a wrapper script for gcc and/or g++:
#!/bin/bash
# Pause before each compile to give the CPU a chance to cool down, then hand off to the real compiler.
sleep 10
exec gcc "$@"
Save the above as "gccslow" or something, and then:
export CC="gccslow"
Alternatively, you can call the script gcc and put it at the front of your PATH. If you do that, be sure to use the full path to the real gcc inside the script; otherwise the script will call itself recursively.
A better implementation could call a script to check sensors and wait until the CPU temperature drops before continuing.
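A rough sketch of such a temperature-aware wrapper, assuming lm-sensors is installed and that its output contains a "Package id 0:" line (the label, the threshold, and the script name gcccool are all assumptions that vary by machine):
#!/bin/bash
# gcccool: wait until the CPU has cooled below MAX_TEMP (degrees C), then run the real compiler.
MAX_TEMP=75
while true; do
    temp=$(sensors 2>/dev/null | awk '/^Package id 0:/ { gsub(/[^0-9.]/, "", $4); print int($4); exit }')
    [ -z "$temp" ] && break               # could not read a temperature; do not block the build
    [ "$temp" -lt "$MAX_TEMP" ] && break  # cool enough, carry on
    sleep 30                              # still hot; wait and check again
done
exec gcc "$@"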
For your latter question: a well-written Makefile defines dependencies as a directed acyclic graph (DAG), and make tries to satisfy those dependencies by compiling files in the order the DAG dictates. Thus, once a file is compiled, that dependency is satisfied and it need not be compiled again.
It can, however, be tricky to write good Makefiles, so sometimes the author will resort to a brute-force approach and recompile everything from scratch.
For such well-known libraries, I will assume the Makefile is written properly and that the build will resume from the last operation (with the caveat that it needs to rescan the DAG and recalculate the compilation order, which should be relatively cheap).
Instead of compiling the whole thing, you can compile each target separately. You have to examine the Makefile to identify them.
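A quick and admittedly imperfect way to list candidate targets is to pattern-match the rule names in the Makefile (the output will include internal targets too, so treat it as a starting point; <some_target> is a placeholder):
grep -E '^[A-Za-z0-9_./-]+:' Makefile | cut -d: -f1 | sort -u
make <some_target>    # then build one piece at a time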
Tongue-in-cheek: What about putting the laptop into the fridge while compiling?

Resources