make -j 8 g++: internal compiler error: Killed (program cc1plus) - linux

I am deploying Apache Mesos on Ubuntu 12.04 following the official documentation, and at the "make -j 8" step I get this error in the console:
g++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-4.9/README.Bugs> for instructions.
make[2]: *** [slave/containerizer/mesos/libmesos_no_3rdparty_la-containerizer.lo] Error 1
make[2]: *** Waiting for unfinished jobs....
mv -f log/.deps/liblog_la-log.Tpo log/.deps/liblog_la-log.Plo
mv -f slave/containerizer/.deps/libmesos_no_3rdparty_la-docker.Tpo slave/containerizer/.deps/libmesos_no_3rdparty_la-docker.Plo
mv -f log/.deps/liblog_la-consensus.Tpo log/.deps/liblog_la-consensus.Plo
mv -f slave/containerizer/.deps/libmesos_no_3rdparty_la-external_containerizer.Tpo slave/containerizer/.deps/libmesos_no_3rdparty_la-external_containerizer.Plo
mv -f log/.deps/liblog_la-coordinator.Tpo log/.deps/liblog_la-coordinator.Plo
mv -f slave/.deps/libmesos_no_3rdparty_la-slave.Tpo slave/.deps/libmesos_no_3rdparty_la-slave.Plo
mv -f master/.deps/libmesos_no_3rdparty_la-master.Tpo master/.deps/libmesos_no_3rdparty_la-master.Plo
make[2]: Leaving directory `/root/Mesos/mesos/build/src'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/root/Mesos/mesos/build/src'
make: *** [all-recursive] Error 1
What should I do?

Try running (just after the failure) dmesg.
Do you see a line like this?
Out of memory: Kill process 23747 (cc1plus) score 15 or sacrifice child
Killed process 23747, UID 2243, (cc1plus) total-vm:214456kB, anon-rss:178936kB, file-rss:5908kB
Most likely that is your problem. Running make -j 8 starts many parallel compile processes, which use more memory. The error above occurs when your system runs out of memory. In that case, rather than the whole system falling over, the kernel's OOM killer scores each process on the system, and the one with the highest score gets killed to free up memory. If the process that is killed is cc1plus, gcc (perhaps incorrectly) interprets this as the process crashing and hence assumes it must be a compiler bug. But it isn't really a compiler bug; the problem is that the OS killed cc1plus rather than cc1plus crashing.
If this is the case, you are running out of memory, so try make -j 4 instead. Fewer parallel jobs mean the compilation will take longer, but it should not exhaust your system memory.
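As a rough sketch of how to confirm this and pick a safer job count (the grep pattern and -j value are just examples, not from the original report):
dmesg | grep -iE 'out of memory|killed process'   # confirm cc1plus was killed by the OOM killer
free -h                                           # see how much RAM and swap is available
make -j 4                                         # fewer jobs, so peak memory use stays lower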

(Might be a memory issue)
For anybody still struggling with this (over 2 years after the question was asked), there's this trick on CryptoCurrencyTalk that seems to make it work.
For convenience I'm pasting it here:
Run these (bs=64M and count=16 give a 1 GiB swap file; adjust them to the amount of swap you want):
sudo dd if=/dev/zero of=/swapfile bs=64M count=16
sudo mkswap /swapfile
sudo swapon /swapfile
That should let you compile your code. But make sure you then revert the swapon after compilation, with these:
sudo swapoff /swapfile
sudo rm /swapfile

Check if your CentOS installation already has swap enabled by typing:
sudo swapon --show
If the output is empty, it means that your system does not have swap space enabled.
Create a swap file
1. Create a file which will be used as swap space. bs is the size of one block and count is the number of blocks; 1024 bytes × 1,048,576 blocks gives 1 GiB of space.
sudo dd if=/dev/zero of=/swapfile bs=1024 count=1048576
2. Ensure that only the root user can read and write the swap file:
sudo chmod 600 /swapfile
3. Set up a Linux swap area on the file:
sudo mkswap /swapfile
4. Activate the swap:
sudo swapon /swapfile
5."sudo swapon --show" or "sudo free -h" you will see the swap space.

This was the clue in my scenario (compiling Mesos on CentOS 7 on an AWS EC2 instance).
I fixed it by resizing the instance to at least 4 GiB of memory and 2 vCPUs.

Related

Rust compilation on AWS fails while succeeding on other machines [duplicate]

I am using openSUSE, specifically the variant from Mono's website when you click VMware.
I get this error. Does anyone know how I might fix it?
make[4]: Entering directory `/home/rupert/Desktop/llvm/tools/clang/tools/driver'
llvm[4]: Linking Debug+Asserts executable clang
collect2: ld terminated with signal 9 [Killed]
make[4]: *** [/home/rupert/Desktop/llvm/Debug+Asserts/bin/clang] Error 1
The full text can be found here
Your virtual machine does not have enough memory to perform the linking phase. Linking is typically the most memory-intensive part of a build, since it's where all the object code comes together and is operated on as a whole.
If you can allocate more RAM to the VM then do that. Alternatively you could increase the amount of swap space. I am not that familiar with VMs, but I imagine the virtual hard drive you set up will have a swap partition. If you can make that bigger or allocate a second swap partition, that would help.
Increasing the RAM, if only for the duration of your build, is the easiest thing to do though.
I also got the same issue and solved it with the following steps (it is a memory issue only):
Check the current swap space by running the free command (it should be around 10 GB).
Check the swap partition:
sudo fdisk -l
/dev/hda8 none swap sw 0 0
Make swap space and enable it.
sudo swapoff -a
sudo /sbin/mkswap /dev/hda8
sudo swapon -a
If your swap partition is not big enough, you may want to create a swap file and use it instead.
Create the swap file:
sudo fallocate -l 10g /mnt/10GB.swap
sudo chmod 600 /mnt/10GB.swap
OR
sudo dd if=/dev/zero of=/mnt/10GB.swap bs=1024 count=10485760
sudo chmod 600 /mnt/10GB.swap
Format the file as swap:
sudo mkswap /mnt/10GB.swap
Enable swap file.
sudo swapon /mnt/10GB.swap
I tried with make -j1 and it works, but it takes a long time to build.
I had the same problem building on a VirtualBox system. FWIW I was building on a laptop with XP and 2GB RAM. I had to bump the virtual RAM up to 1462MB to get a successful build. Also note the recommended disk size of 8GB is not sufficient to build and install both LLVM and Clang under Ubuntu. I'd recommend at least 16GB.
I would suggest using the -l (--max-load) option instead of limiting -j in this case. Possibly helpful answer.
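As a sketch (the load limit of 4 is just an example), GNU make accepts both flags together, so up to 8 jobs may run in parallel but new ones are only started while the load average is below 4:
make -j8 -l 4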

Linux perf tool run issues

I am using the perf tool to benchmark one of my projects. When I run perf on my own machine, everything works fine.
However, I am trying to run perf on automation servers to make it part of my check-in process, and there I get the following error:
WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,
check /proc/sys/kernel/kptr_restrict.
Samples in kernel functions may not be resolved if a suitable vmlinux
file is not found in the buildid cache or in the vmlinux path.
Samples in kernel modules won't be resolved at all.
If some relocation was applied (e.g. kexec) symbols may be misresolved
even with a suitable vmlinux or kallsyms file.
Error:
Permission error - are you root?
Consider tweaking /proc/sys/kernel/perf_event_paranoid:
-1 - Not paranoid at all
0 - Disallow raw tracepoint access for unpriv
1 - Disallow cpu events for unpriv
2 - Disallow kernel profiling for unpriv
fp: Terminated
I tried changing /proc/sys/kernel/perf_event_paranoid to -1 and 0 but still see the same issue.
Anybody seen this before? Why would I need to run the command as root? I am able to run it on my machine without sudo.
By the way, the command is like this:
perf record -m 32 -F 99 -p xxxx -a -g --call-graph fp
You can't use -a (full system profiling) or sample the kernel as a non-root user: http://man7.org/linux/man-pages/man1/perf-record.1.html
Try running it without the -a option and with the event limited to userspace by the :u suffix:
perf record -m 32 -F 99 -p $PID -g --call-graph fp -e cycles:u
Or use a software event on virtualized platforms without PMU passthrough:
perf record -m 32 -F 99 -p $PID -g --call-graph fp -e cpu-clock:u
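If you also need kernel symbols resolved on the automation servers (and have root there), the kptr_restrict warning can usually be addressed with sysctl; treat the exact values below as an assumption about what your environment allows:
sudo sysctl -w kernel.kptr_restrict=0
sudo sysctl -w kernel.perf_event_paranoid=1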

Under Linux, is it possible to gcore a process whose executable has been deleted?

Programming on CentOS 6.6, I deleted an executable (whoops, make clean) while it was running in a screen session.
Now, unrelated, I want to gcore the process to debug something. I have rebuilt the executable, but gcore doesn't accept the replaced file. It knows the original file was deleted and won't let me dump core.
# gcore 15659
core.YGsoec:4: Error in sourced command file:
/home/dev/bin/daemon/destinyd (deleted): No such file or directory.
gcore: failed to create core.15659
# ls -l /proc/15659/exe
lrwxrwxrwx. 1 root root 0 Mar 12 21:33 /proc/15659/exe -> /home/dev/bin/daemon/destinyd (deleted)
# ln -s /proc/15659/exe /home/dev/bin/daemon/destinyd
ln: creating symbolic link `/home/dev/bin/daemon/destinyd': File exists
# rm /proc/15659/exe
rm: remove symbolic link `/proc/15659/exe'? y
rm: cannot remove `/proc/15659/exe': Permission denied
FreeBSD's gcore has an optional argument "executable" which looks promising (as if I could specify a binary to use that is not /proc/15659/exe), but that's of no use to me as Linux's gcore does not have any such argument.
Are there any workarounds? Or will I just have to restart the process (using the recreated executable) and wait for the bug I'm tracking to reproduce itself?
Despite the output of ls -l /proc/15659/exe, the original executable is in fact still available through that path.
So, not only was I able to restore the original file with a simple cp (though this was not enough to restore the link and get gcore to work), but I was also able to attach GDB to the process using this path as the executable:
# gdb -p 15659 /proc/15659/exe
and then run the "generate-core-file" command, followed by "detach".
Then, I became free to examine the core file as needed:
# gdb /proc/15659/exe core.15659
In truth I had forgotten about the ability of GDB to generate core files, plus I was anxious about actually attaching GDB to the process because timing was quite important: generating the core file at precisely the right time to catch that bug.
But nos steered me back onto this path and, my fears apparently unfounded, GDB was able to produce a lovely core.15659 for me.
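If the timing is critical, the same steps can be scripted so the process is only paused for a moment; this is a sketch assuming a GDB recent enough to support -batch and -ex:
gdb -p 15659 /proc/15659/exe -batch -ex generate-core-file -ex detach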

fusermount: umount failed: Invalid argument

When I try to umount the FUSE file system, I get an error:
root#ubuntu:/home/fufs/src# fusermount -u /tmp/kpfss
fusermount: failed to unmount /tmp/kpfss: Invalid argument
root#ubuntu:/home/fufs/src# fusermount -z -u /tmp/kpfss
fusermount: failed to unmount /tmp/kpfss: Invalid argument
How can I umount the file system? Thanks.
I've had this problem before, and you may find it helps to use umount directly:
umount -f /tmp/kpfss # or whatever the mount point is
When I've seen this issue, there was a dropped connection to the remote server, and trying to access the mount point locked the shell up. The shell process couldn't even be killed.
Using umount seemed to help sort that.
While developing FUSE filesystems, I've often experienced this when the filesystem process locks up in an infinite loop or has segfaulted somehow. The only way I know to free it is to run ps -ef | grep name_of_fuse_filesystem_process and kill the corresponding PID.
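A sketch of that recovery procedure (name_of_fuse_filesystem_process is a placeholder for your FUSE binary, and /tmp/kpfss is the mount point from the question):
pid=$(pgrep -f name_of_fuse_filesystem_process)
kill -9 "$pid"
sudo umount -l /tmp/kpfss   # lazy unmount once the hung process is gone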

Simulate a faulty block device with read errors?

I'm looking for an easier way to test my application against faulty block devices that generate i/o read errors when certain blocks are read. Trying to use a physical hard drive with known bad blocks is a pain and I would like to find a software solution if one exists.
I did find the Linux Disk Failure Simulation Driver which allows creating an interface that can be configured to generate errors when certain ranges of blocks are read, but it is for the 2.4 Linux Kernel and hasn't been updated for 2.6.
What would be perfect would be an losetup and loop driver that also allowed you to configure it to return read errors when attempting to read from a given set of blocks.
It's not a loopback device you're looking for, but rather device-mapper.
Use dmsetup to create a device backed by the "error" target. It will show up in /dev/mapper/<name>.
Page 7 of the Device mapper presentation (PDF) has exactly what you're looking for:
dmsetup create bad_disk << EOF
0 8 linear /dev/sdb1 0
8 1 error
9 204791 linear /dev/sdb1 9
EOF
Or leave out the sdb1 lines entirely and map every range to the "error" target to make a pure error disk.
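For example, a minimal sketch of such a pure error device (the 204800-sector size is arbitrary):
dmsetup create error_disk << EOF
0 204800 error
EOF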
See also The Device Mapper appendix from "RHEL 5 Logical Volume Manager Administration".
There's also a flakey target, a combination of linear and error that sometimes succeeds, and a delay target to introduce intentional delays for testing.
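As a sketch (the device and intervals are only examples), a flakey table that passes I/O through to /dev/sdb1 for 5 seconds and then errors for 1 second, repeating:
dmsetup create flaky_disk << EOF
0 204800 flakey /dev/sdb1 0 5 1
EOF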
It seems like Linux's built-in fault injection capabilities would be a good idea to use.
Blog: http://blog.wpkg.org/2007/11/08/using-fault-injection/
Reference: https://www.kernel.org/doc/Documentation/fault-injection/fault-injection.txt
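A rough sketch based on that documentation, assuming debugfs is mounted at /sys/kernel/debug, a kernel built with CONFIG_FAIL_MAKE_REQUEST, and sdb/sdb1 as placeholder device names:
echo 1 | sudo tee /sys/block/sdb/sdb1/make-it-fail
echo 100 | sudo tee /sys/kernel/debug/fail_make_request/probability
echo -1 | sudo tee /sys/kernel/debug/fail_make_request/times   # -1 keeps failing indefinitely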
The easiest way to play with block devices is using nbd.
Download the userland sources from git://github.com/yoe/nbd.git and modify nbd-server.c to fail at reading or writing on whichever areas you want it to fail on, or to fail in a controllably random pattern, or basically anything you want.
I would like to elaborate on Peter Cordes' answer.
In bash, set up an ext4 image on a loopback device, then write a file to it named binary.bin.
imageName=faulty.img
mountDir=$(pwd)/mount
sudo umount $mountDir ## make sure nothing is mounted here
dd if=/dev/zero of=$imageName bs=1M count=10
mkfs.ext4 $imageName
loopdev=$(sudo losetup -P -f --show $imageName); echo $loopdev
mkdir $mountDir
sudo mount $loopdev $mountDir
sudo chown -R $USER:$USER mount
echo "2ed99f0039724cd194858869e9debac4" | xxd -r -p > $mountDir/binary.bin
sudo umount $mountDir
In Python 3 (since bash struggles to deal with binary data), search for the magic binary data in binary.bin:
import binascii
with open("faulty.img", "rb") as fd:
    s = fd.read()
search = binascii.unhexlify("2ed99f0039724cd194858869e9debac4")
beg=0
find = s.find(search, beg); beg = find+1; print(find)
start_sector = find//512; print(start_sector)
Then back in bash, create and mount the faulty block device:
start_sector=## copy value from variable start_sector in python
next_sector=$(($start_sector+1))
size=$(($(wc -c $imageName|cut -d ' ' -f1)/512))
len=$(($size-$next_sector))
echo -e "0\t$start_sector\tlinear\t$loopdev\t0" > fault_config
echo -e "$start_sector\t1\terror" >> fault_config
echo -e "$next_sector\t$len\tlinear\t$loopdev\t$next_sector" >> fault_config
cat fault_config | sudo dmsetup create bad_drive
sudo mount /dev/mapper/bad_drive $mountDir
Finally, we can test the faulty block device by reading the file:
cat $mountDir/binary.bin
which produces the error:
cat: /path/to/your/mount/binary.bin: Input/output error
Clean up when you're done with testing:
sudo umount $mountDir
sudo dmsetup remove bad_drive
sudo losetup -d $loopdev
rm fault_config $imageName
