OpenMPI -host and -hostfile options - openmpi

With Open MPI v2, a test program behaves as expected when I run it with -host: the processes are spread across the hosts I specified. When I use -hostfile instead, it doesn't work: both ranks run on the same host.
mahmood@cluster:mpitest$ /share/apps/computer/openmpi-2.0.1/bin/mpirun -host compute-0-0,cluster -np 2 a.out
****************************************************************************
* hwloc 1.11.2 has encountered what looks like an error from the operating system.
*
* Package (P#1 cpuset 0xffff0000) intersects with NUMANode (P#1 cpuset 0xff00ffff) without inclusion!
* Error occurred in topology.c line 1048
*
* The following FAQ entry in the hwloc documentation may help:
* What should I do when hwloc reports "operating system" warnings?
* Otherwise please report this error message to the hwloc user's mailing list,
* along with the output+tarball generated by the hwloc-gather-topology script.
****************************************************************************
Hello world from processor cluster.hpc.org, rank 1 out of 2 processors
Hello world from processor compute-0-0.local, rank 0 out of 2 processors
mahmood@cluster:mpitest$ cat hosts
cluster
compute-0-0
mahmood@cluster:mpitest$ /share/apps/computer/openmpi-2.0.1/bin/mpirun -hostfile hosts -np 2 a.out
****************************************************************************
* hwloc 1.11.2 has encountered what looks like an error from the operating system.
*
* Package (P#1 cpuset 0xffff0000) intersects with NUMANode (P#1 cpuset 0xff00ffff) without inclusion!
* Error occurred in topology.c line 1048
*
* The following FAQ entry in the hwloc documentation may help:
* What should I do when hwloc reports "operating system" warnings?
* Otherwise please report this error message to the hwloc user's mailing list,
* along with the output+tarball generated by the hwloc-gather-topology script.
****************************************************************************
Hello world from processor cluster.hpc.org, rank 0 out of 2 processors
Hello world from processor cluster.hpc.org, rank 1 out of 2 processors
What is the issue then and how can I resolve it?

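Hosts listed in the -host argument provide one slot each, so -host A,B means one slot on host A and one slot on host B. Your output with -hostfile, where both ranks run on cluster, suggests that the first host in the file provided more than one slot.
To force mpiexec to launch N processes per node, use the following option: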
--map-by ppr:N:node
In your case, for one process per node, it should be --map-by ppr:1:node. Alternatively, you can limit the number of slots per host to one by modifying the host file to look like this:
cluster slots=1 max_slots=1
compute-0-0 slots=1 max_slots=1
(though slots=1 should be the default if not provided...)
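The difference between the two runs can be reproduced with a toy model of slot filling. This is a simplified sketch, not Open MPI's actual mapper: map_by_slot and map_ppr_node are hypothetical names, and the slot count of 2 for the plain host file is an assumption chosen only to match the observed output.

```python
from typing import Dict, List


def map_by_slot(slots: Dict[str, int], nprocs: int) -> List[str]:
    """Fill every slot of a host, in host order, before moving to the next host."""
    placement: List[str] = []
    for host, count in slots.items():
        for _ in range(count):
            if len(placement) == nprocs:
                return placement
            placement.append(host)
    if len(placement) < nprocs:
        raise ValueError("not enough slots for the requested process count")
    return placement


def map_ppr_node(hosts: List[str], per_node: int, nprocs: int) -> List[str]:
    """Place `per_node` ranks on each host, in the spirit of --map-by ppr:N:node."""
    placement = [host for host in hosts for _ in range(per_node)]
    if len(placement) < nprocs:
        raise ValueError("not enough hosts for the requested process count")
    return placement[:nprocs]


if __name__ == "__main__":
    # -host compute-0-0,cluster: one slot per listed host.
    print(map_by_slot({"compute-0-0": 1, "cluster": 1}, 2))
    # Host file without slot counts; 2 slots per host is an ASSUMED value.
    print(map_by_slot({"cluster": 2, "compute-0-0": 2}, 2))
    # Host file with slots=1 on each line.
    print(map_by_slot({"cluster": 1, "compute-0-0": 1}, 2))
    # Same host list, one process per node.
    print(map_ppr_node(["cluster", "compute-0-0"], 1, 2))
```

With one slot per host the ranks are spread over both machines, as in the -host run. As soon as the first host offers two or more slots, both ranks stay on it, as in the -hostfile run.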

Related

MPI hello_world to test infiniband

I have a virtual machine with a passthrough InfiniBand NIC, and I am testing InfiniBand functionality with a hello world program. I am new to this area, so I need help understanding the following errors.
I installed Open MPI on Ubuntu using apt-get:
spatel@ib-1:~$ mpirun -V
mpirun (Open MPI) 4.0.3
InfiniBand NIC:
spatel@ib-1:~$ lspci -nn | grep -i mell
00:05.0 Infiniband controller [0207]: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function] [15b3:101c]
My hello world program:
spatel@ib-1:~$ mpirun -np 2 ./mpi_hello_world
--------------------------------------------------------------------------
WARNING: No preset parameters were found for the device that Open MPI
detected:
Local host: ib-1
Device name: mlx5_0
Device vendor ID: 0x02c9
Device vendor part ID: 4124
Default device parameters will be used, which may result in lower
performance. You can edit any of the files specified by the
btl_openib_device_param_files MCA parameter to set values for your
device.
NOTE: You can turn off this warning by setting the MCA parameter
btl_openib_warn_no_device_params_found to 0.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
By default, for Open MPI 4.0 and later, infiniband ports on a device
are not used by default. The intent is to use UCX for these devices.
You can override this policy by setting the btl_openib_allow_ib MCA parameter
to true.
Local host: ib-1
Local adapter: mlx5_0
Local port: 1
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.
Local host: ib-1
Local device: mlx5_0
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Open MPI failed an OFI Libfabric library call (fi_endpoint). This is highly
unusual; your job may behave unpredictably (and/or abort) after this.
Local host: ib-1
Location: mtl_ofi_component.c:629
Error: Unspecified error (256)
--------------------------------------------------------------------------
Hello world from processor ib-1, rank 0 out of 2 processors
Hello world from processor ib-1, rank 1 out of 2 processors
[ib-1:65704] 1 more process has sent help message help-mpi-btl-openib.txt / no device params found
[ib-1:65704] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[ib-1:65704] 1 more process has sent help message help-mpi-btl-openib.txt / ib port not selected
[ib-1:65704] 1 more process has sent help message help-mpi-btl-openib.txt / error in device init
[ib-1:65704] 1 more process has sent help message help-mtl-ofi.txt / OFI call fail
It prints a number of warnings and errors, so I am not sure how to interpret the result. Does the job actually use the IB interface?
UPDATE
As suggested by @Gilles Gouaillardet in a comment, I compiled Open MPI with UCX, and I now see the following output from the hello_world program:
spatel@ib-1:~$ /home/spatel/ompi/bin/mpirun -np 2 ./hello_world_ucx --mca opal_common_ucx_opal_mem_hooks 1
--------------------------------------------------------------------------
PMIx was unable to find a usable compression library
on the system. We will therefore be unable to compress
large data streams. This may result in longer-than-normal
startup times and larger memory footprints. We will
continue, but strongly recommend installing zlib or
a comparable compression library for better user experience.
You can suppress this warning by adding "pcompress_base_silence_warning=1"
to your PMIx MCA default parameter file, or by adding
"PMIX_MCA_pcompress_base_silence_warning=1" to your environment.
--------------------------------------------------------------------------
Hello world from processor ib-1, rank 0 out of 2 processors
Hello world from processor ib-1, rank 1 out of 2 processors
To test my InfiniBand network, I created a second, similar VM, ib-2, with an InfiniBand NIC, to see whether hello_world uses RDMA for communication.
/home/spatel/ompi/bin/mpirun --host ib-1,ib-2 -np 2 ./hello_world_ucx --mca opal_common_ucx_opal_mem_hooks 1
At the same time I ran tcpdump on the ibs5 interface, which is my InfiniBand NIC, but I saw no activity, and I noticed that the MPI messages still use the traditional NIC, eth0. How do I make sure MPI uses only InfiniBand? (I don't have any IP address configured on the IB NIC.)

Trouble initializing SDK node using USB-TTL M210 v2

I am trying to connect an M210 v2 RTK to a desktop computer running Ubuntu 18.04 and ROS Melodic, with parallel installations of OpenCV 3.3.1 and 4.5.3. I use a USB-TTL RS232 adapter for the UART connection and a USB-to-USB cable between the drone and the desktop so that I can run Advanced Sensing.
When I run ls -l /dev/ttyACM* && ls -l /dev/ttyUSB*, the output shows that both the USB and ACM connections are identified:
crw-rw---- 1 root dialout 166, 0 out 4 13:18 /dev/ttyACM0
crw-rw---- 1 root dialout 188, 0 out 4 13:18 /dev/ttyUSB0
I also set the transfer rate of the TTL-USB adapter to 921600 using minicom, and gave the device read and write permission with sudo usermod -a -G dialout $USER && sudo chmod 666 /dev/ttyUSB0
Unfortunately, when I run roslaunch dji_osdk_ros dji_sdk_node.launch, the connection problem shown below appears, and I have not been able to fix it. I have tried turning the drone and RC off and on several times, as described here, but the problem persists.
started roslaunch server http://V3D06:43613/
SUMMARY
========
PARAMETERS
* /dji_sdk/acm_name: /dev/ttyACM0
* /dji_sdk/align_time: False
* /dji_sdk/app_id: 1076017
* /dji_sdk/app_version: 1
* /dji_sdk/baud_rate: 921600
* /dji_sdk/dxc: False
* /dji_sdk/enc_key: 6bd1d26f8dd897e4b...
* /dji_sdk/serial_name: /dev/ttyUSB0
* /dji_sdk/use_broadcast: False
* /rosdistro: melodic
* /rosversion: 1.14.12
NODES
/
dji_sdk (dji_osdk_ros/dji_sdk_node)
auto-starting new master
process[master]: started with pid [2436]
ROS_MASTER_URI=http://localhost:11311
setting /run_id to bde7b4d2-252e-11ec-8a59-1831bfb3e154
process[rosout-1]: started with pid [2458]
started core service [/rosout]
process[dji_sdk-2]: started with pid [2464]
[ INFO] [1633364323.534426789]: Advanced Sensing is Enabled on M210.
Read App ID
User Configuration read successfully.
[1276751.089]STATUS/1 # getDroneVersion, L1702: ret = 0
[1276751.089]STATUS/1 # parseDroneVersionInfo, L1122: Device Serial No. = 1DADG3E00100U4
[1276751.089]STATUS/1 # parseDroneVersionInfo, L1124: Firmware = 3.4.3.44
[1276751.089]STATUS/1 # functionalSetUp, L279: Shake hand with drone successfully by getting drone version.
[1276751.089]STATUS/1 # legacyX5SEnableTask, L56: Legacy X5S Enable task created.
[1276752.089]STATUS/1 # sendHeartbeatToFCTask, L1576: OSDK send heart beat to fc task created.
[1276752.289]STATUS/1 # Control, L40: The control class is going to be deprecated.It will be better to use the FlightController class instead!
[1276752.290]STATUS/1 # FileMgrImpl, L253: register download file callback handler successfully.
[1276753.557]STATUS/1 # PSDKModule, L98: MOP only support M300, so mop client will not be initialized here.
[1276753.557]STATUS/1 # PSDKModule, L98: MOP only support M300, so mop client will not be initialized here.
[1276753.557]STATUS/1 # PSDKModule, L98: MOP only support M300, so mop client will not be initialized here.
[1276753.557]STATUS/1 # initDJIHms, L900: DJI HMS is not supported on this platform!
[1276753.567]STATUS/1 # getDroneVersion, L1702: ret = 0
[1276753.567]STATUS/1 # parseDroneVersionInfo, L1122: Device Serial No. = 1DADG3E00100U4
[1276753.567]STATUS/1 # parseDroneVersionInfo, L1124: Firmware = 3.4.3.44
[1276753.567]STATUS/1 # AdvancedSensing, L145: Advanced Sensing init for the M210 drone
[1276753.567]STATUS/1 # init, L49: Looking for USB device...
[1276753.572]STATUS/1 # init, L65: Found 8 USB devices, identifying DJI device...
[1276753.572]STATUS/1 # init, L83: Found a DJI device...
[1276753.572]STATUS/1 # init, L96: Attempting to open DJI USB device...
[1276753.572]ERRORLOG/1 # init, L101: Failed to open DJI USB device...
[1276753.572]ERRORLOG/1 # init, L102: Error code: -3
[1276753.572]ERRORLOG/1 # init, L105: Please make sure you provide a udev file for your system and reboot the computer
[1276753.573]STATUS/1 # LiveViewImpl, L89: Finding if liveview stream is available now.
[1276754.076]STATUS/1 # init, L254: Start advanced sensing initalization
[1276754.076]STATUS/1 # activate, L1329: version 0x304032C
[1276754.076]STATUS/1 # adv_pthread, L46: adv pthread created !!!!!!!!!!!!!!!!!!!!!!!
[1276754.076]STATUS/1 # adv_pthread, L48: adv pthread running !!!!!!!!!!!!!!!!!!!!!!!
[dji_sdk-2] process has died [pid 2464, exit code -11, cmd /home/vant3d/catkin_ws/devel/lib/dji_osdk_ros/dji_sdk_node __name:=dji_sdk __log:=/home/vant3d/.ros/log/bde7b4d2-252e-11ec-8a59-1831bfb3e154/dji_sdk-2.log].
log file: /home/vant3d/.ros/log/bde7b4d2-252e-11ec-8a59-1831bfb3e154/dji_sdk-2*.log
It appears there is a problem with providing a udev file, but I don't know how to fix it. Does anyone have an idea that could help with this problem?
Thank you!
That's my post. First, turn off Advanced Sensing to check whether the basic FTDI connection works.
Second, which DJI OSDK version are you using? Does the OSDK version match the version in OSDK-ROS? I see M300 mentioned in your log, which usually means OSDK 4+. For the M210, I only use 3.8 and 3.9.
If the basic FTDI connection works and you get all the feedback, there is a higher chance that your ACM configuration is wrong. The DJI RNDIS setup is nasty and may not be configured properly. You need to set a static IP manually; I remember it being 192.168.43.1, or something similar with 42 or 43, so check the exact address before setting it.
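Before comparing OSDK versions, it may be worth confirming that the account running roslaunch can actually open the two device nodes named in the launch parameters. A group change made with usermod only applies to new login sessions, and a chmod on a /dev node does not survive the device being recreated. The sketch below is a plain permission check with a hypothetical helper name, check_device_access. It does not talk to the drone, and it does not cover the separate USB device that the "Failed to open DJI USB device" message appears to refer to.

```python
import os
from typing import Dict, Iterable


def check_device_access(paths: Iterable[str]) -> Dict[str, str]:
    """Report, per path, whether the current user can read and write it."""
    report: Dict[str, str] = {}
    for path in paths:
        if not os.path.exists(path):
            report[path] = "missing"
        elif os.access(path, os.R_OK | os.W_OK):
            report[path] = "read/write ok"
        else:
            report[path] = "no read/write permission"
    return report


if __name__ == "__main__":
    # Paths taken from the serial_name and acm_name launch parameters.
    devices = ["/dev/ttyUSB0", "/dev/ttyACM0"]
    for path, status in check_device_access(devices).items():
        print(f"{path}: {status}")
```

Run it as the same user, in the same shell, that starts roslaunch, since that is the session whose group membership matters.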

Node.js "Illegal instruction" on PowerPC 440EP and PowerPC E300C3

I can't run node.js on a PowerPC 440EP; I only get the error "Illegal instruction".
Hardware info:
cat /proc/cpuinfo
processor : 0
cpu : 440EP Rev. C
clock : 533.333332MHz
revision : 24.212 (pvr 4222 18d4)
bogomips : 1066.66
timebase : 533333332
platform : CPU440EP
model : micran,cpu440
Memory : 128 MB
LD_SHOW_AUXV=1 /bin/true
AT_DCACHEBSIZE: 0x20
AT_ICACHEBSIZE: 0x20
AT_UCACHEBSIZE: 0x0
AT_SYSINFO_EHDR: 0x100000
AT_HWCAP: booke mmu fpu ppc32
AT_PAGESZ: 4096
AT_CLKTCK: 100
AT_PHDR: 0x10000034
AT_PHENT: 32
AT_PHNUM: 8
AT_BASE: 0x48000000
AT_FLAGS: 0x0
AT_ENTRY: 0x1000446c
AT_UID: 0
AT_EUID: 0
AT_GID: 0
AT_EGID: 0
AT_SECURE: 0
AT_RANDOM: 0xbf8c04f2
AT_EXECFN: /bin/true
AT_PLATFORM: ppc440
AT_BASE_PLATFORM:ppc440
Software info:
I'm using powerpc-440-linux-gnu compiler (version 5.2.0) and Linux v3.6.7.
I tried to use different versions of sources:
*node-0.10-ppc* from https://github.com/ibmruntimes/node
*node-4.x-port* from https://github.com/ibmruntimes/node
*node-v4.4.7* from https://nodejs.org/dist/v4.4.7/node-v4.4.7.tar.gz
*node-6.x* from https://github.com/nodejs/node
I'm using the following script for build of node.js:
#!/bin/bash
CROSS_COMPILER=powerpc-440-linux-gnu
HOST=powerpc-linux
ENDIAN=big
BUILD_PATH=/home/user/node
CFLAGS=-Os
JOBS=4
export ARCH=ppc
export CC=${CROSS_COMPILER}-gcc
export CXX=${CROSS_COMPILER}-g++
export CFLAG=${CFLAGS}
export AR=${CROSS_COMPILER}-ar r
export LINK=${CROSS_COMPILER}-g++
export PATH=${PATH}:/home/user/powerpc-440-linux-gnu/bin
./configure --without-snapshot --prefix=${BUILD_PATH} --dest-cpu=ppc --dest-os=linux
make -j ${JOBS}
make install
Which version of node.js should I use?
Is there a working, portable version of node.js for the PowerPC 440EP?
Sad update
I got the following answer from issues page on https://github.com/nodejs:
[Michael Dawson] The particular chip mentioned is based on the older PowerPC cores and does not have all of the Power5+ instructions available.
There are roughly two reasons for an illegal instruction. The first is memory corruption derailing the control flow, so that the CPU tries to execute garbage or data.
The second is that your node.js binary contains an instruction that your CPU does not know, i.e. your cross compiler's output does not match your CPU. Investigate whether you need to pass an additional -mcpu= or -mtune= argument to the compiler (or rather to configure).
Because node.js itself contains a just-in-time compiler, there is also a third option: node.js may be generating instructions that are not suitable for your CPU variant.
I would investigate option two first.
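When checking option two, note that the build script in the question exports CFLAG rather than CFLAGS, and that the unquoted export AR=${CROSS_COMPILER}-ar r sets AR without the trailing r. The sketch below renders a corrected set of exports with an explicit CPU flag. It rests on assumptions: the helper names are hypothetical, "-mcpu=440fp" is only an example value to check against the toolchain's documentation, and it presumes that the node.js build honours CFLAGS and CXXFLAGS from the environment.

```python
import shlex
from typing import Dict, List


def cross_build_env(cross_prefix: str, cpu_flag: str, opt_flag: str = "-Os") -> Dict[str, str]:
    """Return the variables the build script meant to export, with a CPU flag added."""
    flags = f"{opt_flag} {cpu_flag}"
    return {
        "CC": f"{cross_prefix}-gcc",
        "CXX": f"{cross_prefix}-g++",
        "LINK": f"{cross_prefix}-g++",
        "AR": f"{cross_prefix}-ar",
        "CFLAGS": flags,    # the original script exported CFLAG (no S)
        "CXXFLAGS": flags,  # assumed to be needed for the C++ sources as well
    }


def as_exports(env: Dict[str, str]) -> List[str]:
    """Render the environment as shell export lines with safe quoting."""
    return [f"export {name}={shlex.quote(value)}" for name, value in env.items()]


if __name__ == "__main__":
    # "-mcpu=440fp" is an ASSUMED example value, not a confirmed fix.
    env = cross_build_env("powerpc-440-linux-gnu", "-mcpu=440fp")
    for line in as_exports(env):
        print(line)
```

If the illegal instruction persists with a matching -mcpu value, that points towards the JIT (option three) rather than the cross compiler.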

error with freebsd uwsgi

I get an error from uwsgi
when I start it with my config (uwsgi bottle.ini):
!!! no internal routing support, rebuild with pcre support !!!
setgid() to 80
setuid() to 80
your processes number limit is 5547
your memory page size is 4096 bytes
detected max file descriptor number: 58982
lock engine: ipcsem
uwsgi_lock_ipcsem_init()/semget(): No space left on device [core/lock.c line 507]
uwsgi_ipcsem_clear()/semctl(): Invalid argument [core/lock.c line 631]
My bottle.ini:
[uwsgi]
socket = 185.21.214.275:80
chdir = /usr/local/www/myapp/
virtualenv = /usr/local/www/mypython
master = true
wsgi-file = /usr/local/www/myapp/app.py
uid = www
gid = www
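I have reinstalled uwsgi and pcre, but the problem still appears.
The semget() "No space left on device" error means that the system limit on SysV semaphores has been reached. It is explained here: http://uwsgi-docs.readthedocs.org/en/latest/ThingsToKnow.html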
On OpenBSD, NetBSD and FreeBSD < 9, SysV IPC semaphores are used as the locking subsystem. These operating systems tend to limit the number of allocable semaphores to fairly small values. You should raise the default limits if you plan to run more than one uWSGI instance. FreeBSD 9 has POSIX semaphores, so you do not need to bother with that.

Using Linux virtual mouse driver

I am trying to implement a virtual mouse driver following the Essential Linux Device Drivers book. It consists of a user space application, which generates coordinates, and a kernel module.
See: Virtual mouse driver and userspace application code and also a step by step on how to use this driver.
1.) I compiled the code of the user space application and the driver.
2.) Next I checked the dmesg output and got:
input: Unspecified device as /class/input/input32
Virtual Mouse Driver Initialized
3.) The sysfs node was created properly during initialization (found in /sys/devices/platform/vms/coordinates)
4.) I know that the virtual mouse driver (input32 ) is linked to event5 by checking the following:
$ cat /proc/bus/input/devices
I: Bus=0000 Vendor=0000 Product=0000 Version=0000
N: Name=""
P: Phys=
S: Sysfs=/devices/virtual/input/input32
U: Uniq=
H: Handlers=event5
B: EV=5
B: REL=3
5.) Next I attached a GPM server to the event interface: gpm -m /dev/input/event5 -t evdev
6.) Run the user space application to generate random coordinates for virtual mouse and observe generated coordinates using od -x /dev/input/event5.
And nothing happens. Why?
Also, the author mentions here that gdm should be stopped using /etc/init.d/gdm stop, but I get "no such service" when I try to stop gdm.
Here is my complete script for building and running the virtual mouse:
make -C /usr/src/kernel/2.6.35.6-45.fc14.i686/ SUBDIRS=$PWD modules
gcc -o app_userspace app_userspace.c
insmod app.ko
gpm -m /dev/input/event5 -t evdev
./app_userspace
Makefile:
obj-m+=app.o
Kernel version: 2.6.35.6
As I said before, I can receive the result through od, but I received it through your program:
echo 9 19 > /sys/devices/platform/virmouse/vmevent
gives:
time 1368284298.207654 type 2 code 0 value 9
time 1368284298.207657 type 2 code 1 value 19
time 1368284298.207662 type 0 code 0 value 0
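The three records above are evdev events: type 2 is EV_REL with code 0 for REL_X and code 1 for REL_Y, and type 0 is EV_SYN, which closes the report. A minimal decoder for the raw bytes that od shows could look like the sketch below. It assumes the classic struct input_event layout (a timeval followed by type, code and value) in the machine's native word size; decode_events and describe are hypothetical helper names.

```python
import struct
from typing import Iterator, NamedTuple

# Native layout: struct timeval (two longs), u16 type, u16 code, s32 value.
EVENT_FORMAT = "llHHi"
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)

EV_SYN, EV_REL = 0, 2
REL_X, REL_Y = 0, 1

NAMES = {(EV_REL, REL_X): "REL_X", (EV_REL, REL_Y): "REL_Y", (EV_SYN, 0): "SYN_REPORT"}


class InputEvent(NamedTuple):
    seconds: int
    microseconds: int
    type: int
    code: int
    value: int


def decode_events(data: bytes) -> Iterator[InputEvent]:
    """Yield every complete event in `data`, ignoring a trailing partial record."""
    usable = len(data) - len(data) % EVENT_SIZE
    for fields in struct.iter_unpack(EVENT_FORMAT, data[:usable]):
        yield InputEvent(*fields)


def describe(event: InputEvent) -> str:
    """Format an event like the output above, with a symbolic name appended."""
    label = NAMES.get((event.type, event.code), "other")
    return (f"time {event.seconds}.{event.microseconds:06d} "
            f"type {event.type} code {event.code} value {event.value} ({label})")


if __name__ == "__main__":
    # Re-create the three records shown above instead of reading a real device.
    raw = b"".join(struct.pack(EVENT_FORMAT, *fields) for fields in [
        (1368284298, 207654, EV_REL, REL_X, 9),
        (1368284298, 207657, EV_REL, REL_Y, 19),
        (1368284298, 207662, EV_SYN, 0, 0),
    ])
    for event in decode_events(raw):
        print(describe(event))
```

To inspect a live device, the same decode_events function can be fed the bytes read from /dev/input/event5 opened in binary mode, which normally requires root.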
So now the question is: what is wrong with X11? I would like to stress that I tried this code under two different distributions, Ubuntu 11.04 and Fedora 14.
Maybe this will help: in Xorg.0.log I see the following:
[ 21.022] (II) No input driver/identifier specified (ignoring)
[ 272.987] (II) config/udev: Adding input device (/dev/input/event5)
[ 272.987] (II) No input driver/identifier specified (ignoring)
[ 666.521] (II) config/udev: Adding input device (/dev/input/event5)
[ 666.521] (II) No input driver/identifier specified (ignoring)
I spent a huge amount of time resolving this issue, and I would like to help other people who run into this problem. I think some external X11 features interfered with my module. After disabling GDM (runlevel 3), it now works fine. You can find working code here: http://fred-zone.blogspot.ru/2010/01/mouse-linux-kernel-driver.html (working distro: Ubuntu 11.04, gdm disabled)
Try replacing the following lines of code in the input device driver:
set_bit(EV_REL, vms_input_dev->evbit);
set_bit(REL_X, vms_input_dev->relbit);
set_bit(REL_Y, vms_input_dev->relbit);
with
vms_input_dev->name = "Virtual Mouse";
vms_input_dev->phys = "vmd/input0"; // "vmd" is the driver's name
vms_input_dev->id.bustype = BUS_VIRTUAL;
vms_input_dev->id.vendor = 0x0000;
vms_input_dev->id.product = 0x0000;
vms_input_dev->id.version = 0x0000;
vms_input_dev->evbit[0] = BIT_MASK(EV_KEY) | BIT_MASK(EV_REL);
vms_input_dev->keybit[BIT_WORD(BTN_MOUSE)] = BIT_MASK(BTN_LEFT) | BIT_MASK(BTN_RIGHT) | BIT_MASK(BTN_MIDDLE);
vms_input_dev->relbit[0] = BIT_MASK(REL_X) | BIT_MASK(REL_Y);
vms_input_dev->keybit[BIT_WORD(BTN_MOUSE)] |= BIT_MASK(BTN_SIDE) | BIT_MASK(BTN_EXTRA);
vms_input_dev->relbit[0] |= BIT_MASK(REL_WHEEL);
It worked for me on Ubuntu 12.04.

Resources