Linux kernel re-compilation too slow

I am compiling the Linux kernel in a VM (VirtualBox) with 2 of the host's 4 GB of RAM and 4 of its 8 CPUs allocated. My initial compilation took around 8-9 hours, and I was already using make -j4. Then I added a simple system call to the kernel, ran make -j4 again, and it has been compiling for the past 3 hours. I thought that after the initial compilation, make would only rebuild the small changes, but it seems to be compiling everything (mostly the drivers). Is there any way I can speed up this compilation process?
For example, is there any way I can disable some of the drivers that I don't really need? If I just want to implement a simple system call, I don't really need all the networking drivers, and maybe removing them would speed things up. In other words, I just want the bare minimum functionality in my kernel to test my system calls.

Compiling the kernel will always take a long time; unfortunately there is no way around that besides having a really good processor with a lot of threads. However, in huge projects like this, ccache will help with compilation times tremendously. It's not perfect, but it is far better than recompiling every object from scratch.
You won't see the difference on the initial compilation, but it will speed up recompilation by reusing the cache it generated instead of recompiling most of what has already been compiled before.
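A minimal sketch of one way to wire ccache into a kernel build (the package name and the -j4 value are examples; adjust them for your distribution and CPU count):
# install ccache (Debian/Ubuntu package name shown)
sudo apt-get install ccache
# route kernel compiles through ccache by overriding CC
make CC="ccache gcc" -j4
# inspect hit/miss statistics after a rebuild
ccache -s
On a rebuild after a small change, most objects should come back as cache hits, and only the files actually affected by the change get recompiled.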

Related

How to speed up Linux kernel compilation?

I have a Core i5 with 8 GB RAM.
I have VMware Workstation 10.0.1 installed on my machine.
I have Fedora 20 Desktop Edition installed on VMware as the guest OS.
I am working on Linux kernel source code v3.14.1. I am developing an I/O scheduler for the Linux kernel. After any modification to the code, it takes around 1 hour and 30 minutes to compile and install the whole kernel to see the changes.
Compilation and Installation commands:
make menuconfig
make
make modules
make modules_install
make install
So my question is it possible to reduce 1 hour and 30 minutes time into only 10 to 15 minutes?
Do not do make menuconfig for every change you make to the sources, because it can trigger a full recompilation of everything, no matter how trivial your change is. It is only needed when the kernel's configuration options change, and that should seldom happen during your development.
Just do:
make
or if you prefer the parallel compilation:
make -j4
or whatever number of concurrent tasks you fancy.
Then the make install, etc. may be needed for deploying the recently built binaries, of course.
Another trick is to configure the kernel to the minimum needed for your tests. I've found that for many tasks a UML build (User Mode Linux) is the fastest. You may also find make localmodconfig useful as a starting point instead of make menuconfig.
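For example, a quick way to get a minimal starting configuration is to base it on the modules currently loaded on the build machine (a sketch; it assumes the resulting kernel will run on hardware similar to that machine):
# record the currently loaded modules
lsmod > /tmp/lsmod.txt
# generate a .config that enables only those modules
make LSMOD=/tmp/lsmod.txt localmodconfig
The resulting configuration builds far fewer drivers than a typical distribution config, which shortens both the initial build and every rebuild after it.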
Use make parallel build with -j option
Compile for the target architecture only. Passing ARCH explicitly makes sure the kernel is configured and built for exactly the architecture you need, which matters especially when cross-compiling.
For example, instead of running:
make
run:
make ARCH=<your architecture> -jN
where N is the number of cores on your machine (cat /proc/cpuinfo lists them). For example, for an i386 target on a host machine with 4 cores:
make ARCH=i386 -j4
Similarly, you can run the other make targets (modules, modules_install, install) with the -jN flag.
Note: make checks which files have been modified and compiles only those, so only the initial build should take a long time; subsequent builds will be faster.
make -j without a number places no limit on concurrent jobs; in practice it uses all available CPUs, but it can also spawn far more jobs than you have cores, so giving an explicit number is usually safer.
You do not need to run make menuconfig again every time you make a change — it is only needed once to create the kernel .config file. (Or possibly again if you edit Kconfig files to add or modify configuration options, but this certainly shouldn't be happening often.)
So long as your .config is left alone, running make should only recompile files that you changed. There are a few files that must be compiled every time, but the vast majority are not.
ccache should be able to dramatically speed up your compile times. It speeds up recompilation by caching previous compilations and detecting when the same compilation is being done again. Your first compilation with ccache will be slower since it needs to populate the cache, but subsequent builds should be much faster.
If you don't want to fuss with ccache configurations you can just run it like so to compile the kernel:
ccache make
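If that invocation doesn't give you cache hits, a more conventional setup is ccache's masquerade mode, where a directory of compiler-named symlinks is placed ahead of the real compilers on your PATH (the /usr/lib/ccache path below is where Debian/Ubuntu install those symlinks; other distributions may use a different location):
export PATH="/usr/lib/ccache:$PATH"
make -j4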
Perhaps in addition to the previous suggestions, while using ccache you might want to unset CONFIG_GCC_PLUGINS (if it was set); otherwise you may get a lot of cache misses, as seen in this example.
Perhaps in addition to the previous suggestions, using ccache (https://ccache.samba.org/) and putting the build directory on an SSD should drastically decrease compilation time.
If you have sufficient RAM and you won't be using your machine while the kernel is being built, you can spawn a large number of concurrent jobs. But make sure you really do have enough RAM; otherwise your system may hang or crash.
Use this command:
sudo make -j 4 && sudo make modules_install -j 4 && sudo make install -j 4
Where 4 is the number of cores I have allotted to this build.
Simple trick: if you aren't using the machine for anything else (or you have another one to work on), log out of the graphical session completely and switch to a TTY using Ctrl + Alt + F*. Everything is much, much faster.

How to speed up compilation time in linux

While compiling under Linux I use the flag -j16, as I have 16 cores. I am just wondering if it makes any sense to use something like -j32. Actually this is a question about scheduling of processor time and whether it is possible to put more pressure on one particular process than on another this way (say I have two parallel compilations, each with -j16; what if one of them were -j32?).
I think it does not make much sense, but I am not sure, as I do not know how the kernel handles such things.
I use a non-recursive build system based on GNU make and I was wondering how well it scales.
I ran benchmarks on a 6-core Intel CPU with hyper-threading. I measured compile times using -j1 through -j20. For each -j option, make ran three times and the shortest time was recorded. Using -j9 gave the shortest compile time, 11% better than -j6.
In other words, hyper-threading does help a little, and an optimal formula for Intel processors with hyper-threading is number_of_cores * 1.5. (A chart of the measured times and its data were linked from the original answer.)
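A quick way to apply that rule of thumb from the shell (a sketch; it assumes hyper-threading doubles the logical CPU count reported by /proc/cpuinfo):
# logical CPUs (hardware threads)
CPUS=$(grep -c '^processor' /proc/cpuinfo)
# physical_cores * 1.5 equals logical_cpus * 3/4 when each core runs two threads
make -j"$(( CPUS * 3 / 4 ))"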
The rule of thumb is to use the number of processors + 1. Hyper-Threading counts, so a quad-core CPU with HT would use -j9.
Setting the value too high is counter-productive. If you do want to speed up compile times, consider ccache, which caches compiled objects that do not change between compilations, and distcc, which distributes the compilation across several machines.
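The two combine cleanly: ccache answers from its cache first and hands misses to distcc for remote compilation. A sketch (the host names are placeholders; list your own build machines):
# machines willing to accept distcc jobs (placeholders)
export DISTCC_HOSTS="localhost buildbox1 buildbox2"
# tell ccache to prefix real compilations with distcc
export CCACHE_PREFIX=distcc
make CC="ccache gcc" -j16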
We have a machine in our shop with the following characteristics:
a 256-core SPARC system running Solaris
~64 GB of RAM
some of that memory used as a RAM drive for /tmp
Back when it was originally set up, before other users discovered its existence, I ran some timing tests to see how far I could push it. The build in question is non-recursive, so all jobs are kicked off from a single make process. I also cloned my repo into /tmp to take advantage of the RAM drive.
I saw improvements up to -j56. Beyond that my results flatlined, much like Maxim's graph, until somewhere above (roughly) -j75, where performance began to degrade. By running multiple parallel builds I could push it beyond the apparent cap of -j56.
The primary make process is single-threaded; after running some tests I realized the ceiling I was hitting had to do with how many child processes the primary thread could service, which was further hampered by anything in the makefiles that either required extra time to parse (e.g., using = where := would do, causing unnecessary delayed evaluation; complex user-defined macros; and so on) or used things like $(shell).
These are the things I've been able to do to speed up builds that have a noticeable impact:
Use := wherever possible
If you assign to a variable once with :=, then later with +=, it'll continue to use immediate evaluation. However, ?= and +=, when a variable hasn't been assigned previously, will always delay evaluation.
Delayed evaluation doesn't seem like a big deal until you have a large enough build. If a variable (like CFLAGS) doesn't change after all the makefiles have been parsed, then you probably don't want to use delayed evaluation on it (and if you do, you probably already know enough about what I'm talking about anyway to ignore my advice).
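A tiny illustration of the difference, using a hypothetical CFLAGS assignment (:= snapshots the value once at parse time, while = re-expands it every time the variable is referenced):
# re-expanded on every use: $(shell ...) runs again for each compile step
CFLAGS = -O2 -I$(shell pwd)/include
# expanded once, when this line is parsed
CFLAGS := -O2 -I$(shell pwd)/include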
If you create macros you execute with the $(call) facility, try to do as much of the evaluation ahead of time as possible
I once got it in my head to create macros of the form:
IFLINUX = $(strip $(if $(filter Linux,$(shell uname)),$(1),$(2)))
IFCLANG = $(strip $(if $(filter-out undefined,$(origin CLANG_BUILD)),$(1),$(2)))
...
# an example of how I might have made the worst use of it
CXXFLAGS = ${whatever flags} $(call IFCLANG,-fsanitize=undefined)
This build produces over 10,000 object files, about 8,000 of which are from C++ code. Had I used CXXFLAGS := (...), it would only need to immediately replace ${CXXFLAGS} in all of the compile steps with the already evaluated text. Instead it must re-evaluate the text of that variable once for each compile step.
An alternative implementation that can at least help mitigate some of the re-evaluation if you have no choice:
ifneq 'undefined' '$(origin CLANG_BUILD)'
IFCLANG = $(strip $(1))
else
IFCLANG = $(strip $(2))
endif
... though that only helps avoid the repeated $(origin) and $(if) calls; you'd still have to follow the advice about using := wherever possible.
Where possible, avoid using custom macros inside recipes
The reasoning should be pretty obvious here after the above: anything that requires a variable or macro to be repeatedly evaluated for every compile/link step will degrade your build speed. Every macro/variable evaluation occurs in the same thread as the one that kicks off new jobs, so any time spent parsing is time make delays kicking off another parallel job.
I put some recipes in custom macros whenever it promotes code re-use and/or improves readability, but I try to keep it to a minimum.

Allocating a data page in linux with NX bit turned off

I would like to generate some machine code in my program and then run it. One way to do it would be to write out a .so file and then load it into the program, but that seems too expensive.
Is there a way in Linux for me to write out the code in my data pages, set my function pointer there, and just call it? I've seen something similar on Windows, where you can allocate a page with the NX protection turned off for that page, but I can't find a similar OS call for Linux.
The mmap(2) (with munmap(2)) and mprotect(2) syscalls are the elementary operations to do that. Recall that syscalls are elementary operations from the point of view of an application. You want PROT_EXEC.
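A minimal sketch of that approach (the emitted byte sequence is x86-64 specific, and error handling is reduced to bare checks):
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    /* x86-64 machine code for: mov eax, 42; ret */
    unsigned char code[] = { 0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3 };

    /* anonymous page, writable first so the code can be copied in */
    void *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return 1;
    memcpy(page, code, sizeof code);

    /* switch the page to read+execute: this lifts the NX restriction */
    if (mprotect(page, 4096, PROT_READ | PROT_EXEC) != 0)
        return 1;

    int (*fn)(void) = (int (*)(void))page;
    printf("generated code returned %d\n", fn());

    munmap(page, 4096);
    return 0;
}
Mapping the page writable first and only then flipping it to executable avoids having it writable and executable at the same time, which some hardened configurations refuse.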
You could just strace any dynamically linked executable to get a clue about how you might call them, since the dynamic linker ld.so is using them.
Generating a shared object might be less expensive than you imagine. Actually, generating C code, running the compiler, then dlopen-ing the resulting shared object makes sense, even when you work interactively. My MELT domain-specific language (to extend GCC) does this. Recall that you can do a very large number of dlopen calls without issues.
If you want to generate machine code in memory, you could use GNU lightning (quick generation of slow machine code), libjit from DotGNU (generates somewhat better machine code), LuaJIT, asmjit (x86 or amd64 specific), or LLVM (slowly generates optimized machine code). By the way, the SBCL Common Lisp implementation compiles dynamically to memory and produces good machine code at runtime (and the JITs inside JVMs do the same).

Stripping down a kernel in linux?

I recently read a post (admittedly it's a few years old) giving advice for a fast number-crunching program:
"Use something like Gentoo Linux with 64 bit processors as you can compile it natively as you install. This will allow you to get the maximum punch out of the machine as you can strip the kernel right down to only what you need."
Can anyone elaborate on what they mean by stripping down the kernel? Also, as this post is about 6 years old, which current version of Linux would be best for this (to aid my Google searches)?
There is some truth in the statement, as well as something somewhat nonsensical.
You do not spend resources on processes you are not running. So as a first step I would try to minimise the number of processes running. For that we quite enjoy Ubuntu Server ISO images at work -- if you install from those, log in, and run ps or pstree, you see a thing of beauty: six or seven processes. Nothing more. That is good.
That the kernel is big (in terms of source size or installation) does not matter per se. Much of that size stems from drivers you may not be using anyway. And the same rule applies again: what you do not run does not compete for resources.
So think about a headless server, stripped down -- rather than your average desktop installation with more than a screenful of processes trying to make the life of a desktop user easier.
You can create a custom linux kernel for any distribution.
Start by going to kernel.org and downloading the latest source. Then choose your configuration interface (these days you have the choice of plain console text with make config, ncurses-style make menuconfig, KDE-style make xconfig, and GNOME-style make gconfig) and run the corresponding make target. After choosing all the options, type make to build the kernel, then make modules to compile all the selected modules. make install will copy the kernel files to your /boot directory, and make modules_install copies the modules. Next, go to /boot and use mkinitrd to create the RAM disk needed to boot properly, if one is needed. Then add the kernel to your GRUB menu.lst by copying the latest entry and adding a similar one pointing to the new kernel version.
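Condensed into commands, that sequence looks roughly like this (the <version> string and the initrd step are placeholders that vary by distribution):
make menuconfig            # or: make config / xconfig / gconfig
make                       # build the kernel image
make modules               # build the selected modules
sudo make modules_install  # install modules under /lib/modules/<version>
sudo make install          # copy the kernel image to /boot
sudo mkinitrd /boot/initrd-<version>.img <version>   # only if your setup needs an initrd
# finally, add an entry for the new kernel to GRUB's menu.lst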
Of course, that's a basic overview and you should probably search for 'linux kernel compile' to find more detailed info. Selecting the necessary kernel modules and options takes a bit of experience - if you choose the wrong options, the kernel might not be bootable and you'll have to start over, which is a pain because selecting the options and compiling the kernel can take 15-30 minutes.
Ultimately, it isn't going to make a large difference to compile a stripped-down custom kernel unless your given task is very, very performance sensitive. It makes sense to remove things you're never going to use from the kernel, though, like say ISDN support.
I'd have to say this question is more suited to SuperUser.com, by the way, as it's not quite about programming.

How to reduce compilation cost in GCC and make?

I am trying to build some big libraries, like Boost and OpenCV, from their source code via make and GCC under Ubuntu 8.10 on my laptop. Unfortunately the compilation of those big libraries seems to be a big burden for my laptop (Acer Aspire 5000). Its fan makes louder and louder noises until, all of a sudden, my laptop shuts itself down without the OS gracefully turning off.
So I wonder how to reduce the compilation cost in case of make and GCC?
I wouldn't mind if the compilation took much longer or used more space, as long as it can finish without my laptop shutting itself down.
Is building the debug version of a library always less costly than building the release version, because there is no optimization?
Generally speaking, is it possible to build and install just part of a library instead of the full library? Can the rest be built and added in later if it turns out to be needed?
Is it correct that if I restart my laptop, I can resume compilation from around where it was when my laptop shut itself down? For example, I noticed that this is true for OpenCV: the progress percentage shown during its compilation does not restart from 0%. But I am not sure about Boost, since there is no obvious information for me to tell, and the compilation seems to take much longer.
UPDATE:
Thanks, brianegge and Levy Chen! How to use the wrapper script for GCC and/or g++? Is it like defining some alias to GCC or g++? How to call a script to check sensors and wait until the CPU temperature drops before continuing?
I'd suggest creating a wrapper script for gcc and/or g++
#!/bin/bash
sleep 10
exec gcc "$@"
Save the above as "gccslow" or something, and then:
export CC="gccslow"
Alternatively, you can call the script gcc and put it at the front of your path. If you do that, be sure to include the full path in the script, otherwise, the script will call itself recursively.
A better implementation could call a script to check sensors and wait until the CPU temperature drops before continuing.
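For instance, a variant of the wrapper that polls lm-sensors and waits until the hottest core drops below a threshold might look like this (the 75-degree limit and the grep pattern for 'Core N:' lines are assumptions; adjust them to match your sensors output):
#!/bin/bash
LIMIT=75
while true; do
    # highest reported core temperature, in whole degrees C
    TEMP=$(sensors | grep -oP 'Core \d+:\s+\+\K[0-9]+' | sort -n | tail -1)
    [ -z "$TEMP" ] && break            # no readings available: don't block the build
    [ "$TEMP" -lt "$LIMIT" ] && break  # cool enough, go ahead
    sleep 10
done
exec gcc "$@"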
For your latter question: a well-written Makefile will define dependencies as a directed acyclic graph (DAG), and it will try to satisfy those dependencies by compiling them in the order dictated by the DAG. Thus, once a file has been compiled, that dependency is satisfied and it need not be compiled again.
It can, however, be tricky to write good Makefiles, so sometimes the author will resort to a brute-force approach and recompile everything from scratch.
For your question, for such well-known libraries I will assume the Makefile is written properly, and the build should resume from the last operation (with the caveat that it needs to rescan the DAG and recalculate the compilation order, which should be relatively cheap).
Instead of compiling the whole thing, you can compile each target separately. You have to examine the Makefile to identify them.
Tongue-in-cheek: What about putting the laptop into the fridge while compiling?
