docker, openmpi and unexpected end of /proc/mounts line - linux

I have build environment to run code in a Docker container. One of the components is OpenMPI, which I think is the source of problem or manifest it.
When I run code using MPI I getting message,
Unexpected end of /proc/mounts line `overlay / overlay rw,relatime,lowerdir=/var/lib/docker/overlay2/l/NHW6L2TB73FPMK4A52XDP6SO2V:/var/lib/docker/overlay2/l/MKAGUDHZZTJF4KNSUM73QGVRUD:/var/lib/docker/overlay2/l/4PFRG6M47TX5TYVHKQQO2KCG7Q:/var/lib/docker/overlay2/l/4UR3OEP3IW5ZTADZ6OKT77ZBEU:/var/lib/docker/overlay2/l/LGBMK7HFUCHRTM2MMITMD6ILMG:/var/lib/docker/overlay2/l/ODJ2DJIGYGWRXEJZ6ECSLG7VDJ:/var/lib/docker/overlay2/l/JYQIR5JVEUVQPHEF452BRDVC23:/var/lib/docker/overlay2/l/AUDTRIBKXDZX62ANXO75LD3DW5:/var/lib/docker/overlay2/l/RFFN2MQPDHS2Z'
Unexpected end of /proc/mounts line `KNEJCAQH6YG5S:/var/lib/docker/overlay2/l/7LZSAIYKPQ56QB6GEIB2KZTDQA:/var/lib/docker/overlay2/l/CP2WSFS5347GXQZMXFTPWU4F3J:/var/lib/docker/overlay2/l/SJHIWRVQO5IENQFYDG6R5VF7EB:/var/lib/docker/overlay2/l/ICNNZZ4KB64VEFSKEQZUF7XI63:/var/lib/docker/overlay2/l/SOHRMEBEIIP4MRKRRUWMFTXMU2:/var/lib/docker/overlay2/l/DL4GM7DYQUV4RQE4Z6H5XWU2AB:/var/lib/docker/overlay2/l/JNEAR5ISUKIBKQKKZ6GEH6T6NP:/var/lib/docker/overlay2/l/LIAK7F7Q4SSOJBKBFY4R66J2C3:/var/lib/docker/overlay2/l/MYL6XNGBKKZO5CR3PG3HIB475X:/var/lib/do'
That message is printed for code line
MPI_Init(&argc,&argv);
To make the problem more complex to understand, a warning message is printed only when the host machine is mac os x, for linux host all is ok.
Except for warning message all works fine. I do not know how OpenMPI and docker well enough how this can be fixed.

This is likely due to your /proc/mount file having a line in it greater than 512 characters, causing the hwloc module of OpenMPI to fail to parse it correctly. Docker has a tendency to put very long lines into /proc/mounts. You can see the bug in openmpi-1.10.7/opal/mca/hwloc/hwloc191/hwloc/src/topology-linux.c:1677:
static void
hwloc_find_linux_cpuset_mntpnt(char **cgroup_mntpnt, char **cpuset_mntpnt, int fsroot_fd)
{
#define PROC_MOUNT_LINE_LEN 512
char line[PROC_MOUNT_LINE_LEN];
FILE *fd;
*cgroup_mntpnt = NULL;
*cpuset_mntpnt = NULL;
/* ideally we should use setmntent, getmntent, hasmntopt and endmntent,
* but they do not support fsroot_fd.
*/
fd = hwloc_fopen("/proc/mounts", "r", fsroot_fd);
if (!fd)
return;
This can be fixed by increasing the value of PROC_MOUNT_LINE_LEN, although that should be considered a temporary workaround.

This issue should be fixed in hwloc since 1.11.3 (released 2 years ago). You can either upgrade to OpenMPI 3.0 which contains a hwloc 1.11.7 >= 1.11.3. Or recompile OpenMPI to use an external hwloc instead of the old embedded one.

Related

Why are security_path_* symbols missing from kallsyms and System.map on kernel version 5.4.156?

I am failing to load an eBPF script that traces path renames by using kprobe:
int kprobe__security_path_rename( struct pt_regs *ctx, const struct path *old_dir, struct dentry *old_dentry, const struct path *new_dir, struct dentry *new_dentry )
{
...
}
It works fine on my Ubuntu machine (kernel 5.13.0), but fails on an AWS node (kernel 5.4.156) with the following error:
sh-4.2$ sudo ./tracker.py
cannot attach kprobe, probe entry may not exist
Traceback (most recent call last):
File "./tracker.py", line 698, in <module>
bpf = BPF(text=program)
File "/usr/lib/python3.7/site-packages/bcc/__init__.py", line 372, in __init__
self._trace_autoload()
File "/usr/lib/python3.7/site-packages/bcc/__init__.py", line 1232, in _trace_autoload
fn_name=fn.name)
File "/usr/lib/python3.7/site-packages/bcc/__init__.py", line 684, in attach_kprobe
(fn_name, event))
Exception: Failed to attach BPF program b'kprobe__security_path_rename' to kprobe b'security_path_rename'
I checked /proc/kallsyms and /boot/System.map-$(uname -r) and indeed the symbols security_path_{mknod,mkdir,unlink,rename} all exist on my machine and are missing on the AWS node.
I also observed that after updating the AWS kernel version to 5.4.176 the symbols appear and my program works. However, these symbols all appear in the source of all (relevant) kernel versions, are not marked static or notrace and are explicitly exported via EXPORT_SYMBOL.
Can't these symbols be kprobed on kernel 5.4.156?
I found the cause. The problem was not directly related to kernel versions, but rather to kernel config.
Apparently, the kernel version 5.4.156 for AWS nodes was configured without CONFIG_SECURITY_PATH, while newer kernel 5.4.176 for the same node was configured with this flag. In the former configuration, the security_path_* symbols mention in the question do not exist since their whole code path is guarded with #ifdefs.
One can test which kernel configuration flags are enabled by inspecting the config file, e.g. use one of the following commands:
grep CONFIG_SECURITY_PATH /boot/config-`uname -r`
grep CONFIG_SECURITY_PATH /boot/config
gunzip < /proc/config.gz | grep CONFIG_SECURITY_PATH

Unable to add a custom system call on x86 ubuntu linux

I am new to this and just learning about the kernel, and I am trying to add a custom call to kernel 4.20.4. This is the steps that I did.
First I create the file (kernel/printmsg.c) that contains the code.
#include <linux/kernel.h>
#include <linux/syscalls.h>
SYSCALL_DEFINE1(printmsg, int, i)
{
printk(KERN_DEBUG, "TESTING %d", i);
return 1;
}
Next, I add this file to the kernel/Makefile
obj-y = fork.o exec_domain.o panic.o \
// A few more lines
obj-y += printmsg.o // I added this line
Finally, I add the system call to the syscall table on arch/x86/entry/syscalls/syscall_64.tbl(I'm building this on a 64-bit Ubuntu) by appending this line:
548 64 printmsg sys_printmsg
Now, I proceed to run make. However, it has this error:
arch/x86/entry/syscall_64.o:(.rodata+0x1120): undefined reference to `sys_printmsg'
Makefile:1034: recipe for target 'vmlinux' failed
make: *** [vmlinux] Error 1
I've been scratching my head for a long time for this but I can't seem to realised what went wrong.
Hope that anyone that managed to find a problem can help out a poor soul. Thanks in advance!
Okay, after hours of trial and error, I have finally found the problem. From linux kernel v4.17 onwards, x86_64 system calls may begin with "__x64_sys".
So, instead of using 548 64 printmsg sys_printmsg, I changed it to 548 64 printmsg __x64_sys_printmsg. Then everything works.
Hoped this helped everyone that might have this problem.

Why do OpenGL-based VTK targets in drake executed via `bazel test` sometimes fail on Linux?

While a binary works with bazel run, when I run a test using bazel test, such as:
$ bazel test //systems/sensors:rgbd_camera_test
I encounter a slew of errors from VTK / OpenGL:
ERROR: In /vtk/Rendering/OpenGL2/vtkXOpenGLRenderWindow.cxx, line 820
vtkXOpenGLRenderWindow (0x55880715b760): failed to create offscreen window
ERROR: In /vtk/Rendering/OpenGL2/vtkOpenGLRenderWindow.cxx, line 816
vtkXOpenGLRenderWindow (0x55880715b760): GLEW could not be initialized.
ERROR: In /vtk/Rendering/OpenGL2/vtkShaderProgram.cxx, line 453
vtkShaderProgram (0x5588071d5aa0): Shader object was not initialized, cannot attach it.
ERROR: In /vtk/Rendering/OpenGL2/vtkOpenGLRenderWindow.cxx, line 1858
vtkXOpenGLRenderWindow (0x55880715b760): Hardware does not support the number of textures defined.
May I ask why this happens?
(Note: This post is a means to migrate from http://drake.mit.edu/faq.html to StackOverflow for user-based questions.)
The best workaround at the moment is to first mark the test as as local in the BUILD.bazel file, either with local = 1, or tags = [.., "local"]. Doing so will make the specific target run without sandboxing, such that it has an environment similar to that of bazel run.
As an example, in systems/sensors/BUILD.bazel:
drake_cc_googletest(
name = "rgbd_camera_test",
# ...
local = 1,
# ...
)
If this does not work, then try running the test in Bazel without sandboxing:
$ bazel test --spawn_strategy=standalone //systems/sensors:rgbd_camera_test
Please note that you can possibly add --spawn_strategy=standalone to your ~/.bazelrc, but be aware that this means your development testing environment may deviate even more from other developer's testing environments.

QEMU simple backend tracing dosen't print anything

I'm doing get simple trace file from QEMU.
I followed instructions docs/tracing.txt
with this command "qemu-system-x86_64 -m 2G -trace events=/tmp/events ../qemu/test.img"
i'd like to get just simple trace file.
i've got trace-pid file, however, it dosen't have anything in it.
Build with the 'simple' trace backend:
./configure --enable-trace-backends=simple
make
Create a file with the events you want to trace:
echo bdrv_aio_readv > /tmp/events
echo bdrv_aio_writev >> /tmp/events
Run the virtual machine to produce a trace file:
qemu -trace events=/tmp/events ... # your normal QEMU invocation
Pretty-print the binary trace file:
./scripts/simpletrace.py trace-events trace-* # Override * with QEMU
i followd this instructions.
please somebody give me some advise for this situation.
THANKS!
I got same problem by following the same document.
https://fossies.org/linux/qemu/docs/tracing.txt
got nothing because
bdrv_aio_readv and bdrv_aio_writev was not enabled by default, at least the version I complied, was not enabled. you need to open trace-events under source directory, looking for some line without disabled, e.g. I using:
echo "load_file" > /tmp/events
Then start qemu,
after a guest started, I run
./scripts/simpletrace.py trace-events trace-Pid
I got
load_file 1474.156 pid=5249 name=kvmvapic.bin path=qemu-2.8.0-rc0/pc-bios/kvmvapic.bin
load_file 22437.571 pid=5249 name=vgabios-stdvga.bin path=qemu-2.8.0-rc0/pc-bios/vgabios-stdvga.bin
load_file 10034.465 pid=5249 name=efi-e1000.rom
you can also add -monitor stdio to qemu command line, after it started, you can the following command in qemu CLI:
(qemu) info trace-events
load_file : state 1
vm_state_notify : state 1
balloon_event : state 0
cpu_out : state 0
cpu_in : state 0
1 means enabled events.
Modify the trace-events file in the source tree
As of v2.9.0 you also have to remove the disable from the lines you want to enable there, e.g.:
-disable exec_tb(void *tb, uintptr_t pc) "tb:%p pc=0x%"PRIxPTR
+exec_tb(void *tb, uintptr_t pc) "tb:%p pc=0x%"PRIxPTR
and recompile.
Here is a minimal fully automated runnable example that boots Linux and produces traces: https://github.com/cirosantilli/linux-kernel-module-cheat
For example, I used the traces to count how many boot instructions Linux has: https://github.com/cirosantilli/linux-kernel-module-cheat/blob/c7bbc6029af7f4fab0a23a380d1607df0b2a3701/count-boot-instructions.md
I have a lightly patched QEMU as a submodule, the key commit is: https://github.com/cirosantilli/qemu/commit/e583d175e4cdfb12b4812a259e45c679743b32ad

Reserving physical memory space as early as possible in Linux boot-time

I am trying to find a way to reserve physical memory for a proprietary memory type hardware as early as possible after system boots up (Linux CentOs with Intel Xeon server platform).
I did the following at setup_arch() in arch/x86/kernel/setup.c and it works, but found out that I am not allowed to patch the kernel. The requirement is no BIOS and kernel mod.
setup_arch()
{
....
// Calls a proprietary function that returns custom proprietary memory module's starting address and size.
memblock_reserve(mem_start_addr, mem_size);
.....
}
I cannot use memmap=xx/xx either at Grub, because the start and size of the device is unknown (it has to be "discovered" by software)
Is there any way to do this?
One idea is to write a custom grub module and set memmap=xx using it.
The following is how to do it.
Note that following method only works above CentOS 7 since CentOS 6.x or below uses grub 0.9x .
In that case, you may have to modify code of grub 0.9x and replace /boot/grub/stage1 or /boot/grub/stage2
$ git clone git://git.savannah.gnu.org/grub.git
$ cd grub
$ git checkout grub-2.02-beta2 # CentOS 7 currently uses grub-2.02-beta
$ vim grub-core/Makefile.core.def # add following row
module = {
name = my_custom_module;
common = lib/my_custom_module.c;
};
$ vim grub-core/lib/my_custom_module.c # create following file
#include <grub/dl.h>
#include <grub/env.h>
GRUB_MOD_LICENSE ("GPLv3+");
GRUB_MOD_INIT(my_custom_module){
// Calls a proprietary function that returns custom proprietary memory module's starting address and size.
const char *mem_size = "123";
grub_env_set("my_memsize",mem_size);
}
GRUB_MOD_FINI(my_custom_module){
}
$ ./autogen.sh
$ ./configure
$ make
Now you can find that grub-core/my_custom_module.mod is created.
so copy it to /boot/grub2/i386-pc/ (or whatever your *.mod file exists)
Edit the grub.conf and add something like
insmod my_custom_module
linux /boot/vmlinuz-3.10.el7.x86_64 root=UUID=1a3b5c7d9 ro memmap=${my_memsize}

Resources