What command does MPI run over SSH?

I have OpenMPI installed and I'm running a script on multiple nodes with mpiexec. OpenMPI requires that the node I'm running the mpiexec command on have SSH access to the other nodes in the cluster.
What specifically does OpenMPI do over SSH to start the processes on the other nodes? It runs my MPI script, sure, but how does MPI run it such that each node is assigned a rank, for example?
Thank you.

Unless you are running under a (supported) resource manager (such as Slurm, PBS, or another), the plm/rsh component will be used to start the MPI app.
Long story short, Open MPI builds a distributed virtual machine (DVM) to launch the MPI tasks. The first step is to have one daemon per node.
The initial "daemon" is mpirun itself; an orted daemon then has to be remotely spawned on each of the other nodes, and this is where plm/rsh uses SSH. Each orted in turn forks the actual MPI processes on its node and hands each one its rank and wire-up information (for example via environment variables such as OMPI_COMM_WORLD_RANK), so SSH only starts the daemons; it does not run your script directly.
By default, if you are running on fewer than 64 nodes, mpirun will SSH to all the other nodes itself. But if you are running on a larger number of nodes, mpirun will use a tree-spawn algorithm, in which already-launched daemons SSH to the remaining nodes.
Bottom line: if you are using SSH with Open MPI, then unless you are running on a small cluster with default settings, every node should be able to SSH password-less to every other node.
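If you want to see exactly what gets executed over SSH, the rsh launcher can be made verbose, which makes mpirun print the ssh command line (including the remote orted invocation) before launching. This is a diagnostic sketch: the verbosity level is an arbitrary choice, the hostfile and program names are placeholders, and the output format varies between Open MPI releases.
# Show the remote-launch command that plm/rsh constructs:
mpirun --mca plm_base_verbose 10 -n 2 --hostfile my_hostfile ./my_mpi_program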

How do I run Open MPI jobs on two VMs?

I'm trying to run the hello_ompi program on two Ubuntu virtual machines on my computer.
This program can be found here.
The VMs have two processors and one core per processor.
The installed OS is Ubuntu 20.04.3-LTS 64 bit.
The hostfile I'm using is as follows:
192.168.xxx.xxx
192.168.xxx.xxx
I tried:
mpirun -n 2 --hostfile my_hostfile hello_ompi
The output was:
--------------------------------------------------------------------------
mpirun was unable to find the specified executable file, and therefore
did not launch the job. This error was first reported for process
rank 0; it may have occurred for other processes as well.
NOTE: A common cause for this error is misspelling a mpirun command
line parameter option (remember that mpirun interprets the first
unrecognized command line token as the executable).
Node: 192.168.xxx.xxx
Executable: hello_ompi
--------------------------------------------------------------------------
I realized that the executable needs to be at the same path on every node as it is on the host node.
That is, if the path to the executable on the host node is:
/home/youruser/somedir/executable
then on all of the machines in the hostfile the executable must be at exactly that same path.
The command ran perfectly once I corrected this.
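One way to satisfy this without a shared filesystem is to copy the binary to the same location on the other VM and then launch with an absolute path, so every node resolves the same file. A sketch only; the user name, directory, and redacted IP below are placeholders standing in for the ones in the hostfile:
ssh youruser@192.168.xxx.xxx mkdir -p somedir
scp ~/somedir/hello_ompi youruser@192.168.xxx.xxx:somedir/
mpirun -n 2 --hostfile my_hostfile /home/youruser/somedir/hello_ompi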

Prevent developers from unintentionally killing daemon / worker processes on local machine

A lot of newbs will kill all their node.js processes on their local machines with
pkill -f node
Or
killall node
Etc.
I have a library that uses some daemon processes/workers running on the developer's machine, and I will need to restart them if the developer "accidentally" kills (all) Node.js processes.
The problem is that using NPM libs like forever or supervisor will not solve this problem, because as far as I know they are Node.js processes as well.
Can anyone recommend a daemon watcher / relauncher system that will work on MacOS or *nix?
Perhaps supervisord can do what I want to do on both MacOS and *nix? Or perhaps there is another solution to this problem?
I wrote node-windows, node-mac, and node-linux for this purpose. They are essentially wrappers around node processes, but all three libraries share a common API for managing things like restarts/stop/start/etc.
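As for supervisord: it should indeed cover this case on both macOS and Linux, because supervisord itself is a Python daemon, so a blanket pkill -f node or killall node will not take the watcher down with the workers. A minimal program stanza might look like the following; the program name and paths are illustrative, not from the question:
; in supervisord.conf -- relaunch the worker whenever it dies:
[program:my-worker]
command=/usr/local/bin/node /home/dev/my-worker/index.js
autorestart=true
startsecs=2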

Running Matlab code on a cluster

I have a university account for the university's cluster, but I don't know how I can use it to run my Matlab code. Could anyone help? I connect to the cluster by typing the following in the terminal of my laptop:
ssh myusername@192.168.194.222
Then it asks me to type my password. After that, the text below appears:
Welcome to gav 9.1.1 (3.12.60-ql-generic-9.1-74) based on Ubuntu 14.04.5 LTS
Last login: Sun Apr 16 10:45:49 2017 from 192.168.41.213
gav:~ >
How can I run my code after these steps? Could anyone help me?
It looks like you have a Linux shell, so you can run your script (for instance yourScript.m)
> matlab -nojvm -nodisplay -nosplash < yourScript.m
(see also https://uk.mathworks.com/help/matlab/ref/matlablinux.html)
As far as I know, there are two possibilities:
Conventional Matlab is installed on the cluster
The Matlab Distributed Computing Server is installed on the cluster
Conventional Matlab is installed on the cluster
You execute Matlab on the cluster as you would on your local computer. I guess that you work on Windows on your local computer, given that you quote a simple shell prompt in your question ;) All right, all right, bad psychic skillz ;) see edit below.
What you see is the cluster awaiting a program name to execute. This is called the "Shell". Google "Linux shell tutorial" or start with this tutorial to get information about how to operate a Linux system without a graphical desktop.
Try to start matlab by simply typing matlab after the text you've seen. If it works, you see Matlab's welcome message and the Matlab prompt as you would see it in Matlab's command window on your local PC.
Bonus: you can try to execute Matlab on the cluster and still see a graphical interface by replacing your ssh call with ssh -X myusername@192.168.194.222, i.e. adding an additional -X.
Upload your Matlab scripts to the cluster, for example by using WinSCP (tutorial)
Execute your Matlab functions like you would locally by navigating into the correct folder and typing the function name.
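Once the script is uploaded, you can also run it non-interactively straight from your laptop in one line. A sketch using the IP from the question; the -r form assumes yourScript.m is on Matlab's path, e.g. in your home directory on the cluster:
scp yourScript.m myusername@192.168.194.222:~
ssh myusername@192.168.194.222 'matlab -nodisplay -nosplash -r "yourScript; exit"'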
EDIT: As you use Linux, you may use gio mount ssh://myusername@192.168.194.222 to access your home folder on the cluster via your file manager. If that fails, try gvfs-mount ssh://myusername@192.168.194.222 (the old name of the tool). The packages gvfs-backends and gvfs-fuse (I assume that you use Ubuntu; other distributions may have different package names) must be installed for this; use your package manager to install them if you get an error like "command not found".
Distributed Computing Server
This provides a set of Matlab "workers" which are sent tasks from your local computer. You use your local Matlab installation to connect to the Distributed Computing Server. Start with the Matlab help pages for the Distributed Computing Server.

Running a terminal command permanently

I am currently hosting my database for free on OpenShift and have my program running on a Linux box on my local server. I need to pass the data from the program to my OpenShift database. I want to run the Linux box headless.
To do this I run the command:
rhc port-forward -a webapp
My question is: how can I run this command permanently without it timing out (perhaps with some check to see whether the process is still running?) and without a terminal attached (as a background process)?
You could add that command to the startup settings of your Linux computer, i.e. a systemd configuration, or an init one (details could depend upon your particular distribution and system). See systemd(1) and/or sysvinit.
You could also use crontab(5). It is mostly used for periodic tasks, but it also handles run-once-at-boot tasks through an @reboot entry.
Lastly, you might use batch facilities; look into at(1) (and batch).
Perhaps you just want nohup(1) (or screen(1)...).
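For instance, nohup plus a small restart loop covers both requirements at once: no terminal stays attached, and the forward is relaunched automatically if it times out. A sketch only; the log path and sleep interval are arbitrary choices:
nohup sh -c 'while true; do rhc port-forward -a webapp; sleep 5; done' >/tmp/rhc-forward.log 2>&1 &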

How to execute a command after everything else has been executed on boot?

I am using Hortonworks HDP 2.3 (CentOS release 6.7). My requirement is to make a curl request on boot; however, the command should be executed only after various other services (Ambari, HDFS, YARN, etc.) have started. I added the command to /etc/profile, but that does not wait for these services to start. Is there a way I can ensure that this curl request is the very LAST command executed on boot?
Add it to /etc/rc5.d/S99local (assuming you want it to start at multi-user runlevel 5, and that the other services are started via init).
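Because the services come up asynchronously, the safest pattern inside that script is to poll until a dependency answers before firing the request. A sketch only: the URLs and ports are placeholders, not from the question (50070 is the default HDFS NameNode web UI port in Hadoop 2.x), and the final line stands in for whatever curl call you need:
#!/bin/sh
# wait until the NameNode web UI responds, then make the request
until curl -sf http://localhost:50070/ >/dev/null; do sleep 10; done
curl -X POST http://localhost:8080/your/endpoint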
