Accessing Matlab MDCS Cluster over SSH - linux

I just installed Matlab's Distributed Computing Server on a bunch of machines and it works, but only for those physically connected to the cluster's network. For remote access, those machines are two SSH hops away. How is this problem usually solved? I thought of setting up a VPN, but that seems like a last resort to me.
What I want is for everybody in the lab, using their own copies of Matlab with the correct toolbox, to run their code on the cluster more or less effortlessly. I guess I could ask everybody to tar up their files and use a remote installation of Matlab, somehow forwarding the GUI session (VNC or X forwarding), but that seems ugly.
Any help?

It is possible to set up "remote access" to a cluster running MDCS so that clients without direct access can submit jobs there. The documentation for this starts here:
http://www.mathworks.com/help/mdce/configure-parallel-computing-products-for-a-generic-scheduler.html
I'm not quite sure how to configure things so that submission works across two SSH connections - the example integration scripts shipped with MDCS all presume only one. However, it should be possible (one way to arrange it is sketched after the list below) provided that:
The client can put the job and task files somewhere the execution nodes can see them
The client can trigger the appropriate qsub or whatever on the cluster headnode
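One way to satisfy both points without a VPN is to collapse the two hops on the client side, so that the integration scripts only ever issue a single ssh or scp command. A minimal sketch of the client's ~/.ssh/config, assuming OpenSSH 7.3 or later; the host aliases, host names and user below are placeholders:
# ~/.ssh/config on each lab machine (all names here are hypothetical)
Host gateway
    HostName gateway.example.edu
    User labuser
Host headnode
    HostName headnode.cluster.internal
    User labuser
    ProxyJump gateway   # older clients can use: ProxyCommand ssh -W %h:%p gateway
With that in place, ssh headnode and scp jobfiles headnode:/shared/jobs/ behave as if the headnode were one hop away, so the client can both copy job and task files to shared storage and trigger qsub on the headnode with single commands.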
You might also consider simply contacting MathWorks installation support.

Related

How to launch a "rogue" cli server as unprivileged user

Let's state a situation:
I have the possibility to run arbitrary commands on a server as an unprivileged user, through "unconventional means".
I cannot log in to that server over SSH, either as my unprivileged user or as anyone else, so I currently have no CLI that lets me run whatever commands I want in a "normal" way.
I can ping that server and nothing prevents me from connecting to arbitrary ports.
I would still like to have a command line that lets me run arbitrary commands as I wish on that server.
Theoretically nothing prevents me from launching any program as my unprivileged user, including one that opens a port, lets some remote user connect to it, forwards any commands to bash, and returns the result. I just don't know a good program for that.
So, does anyone know of one? I looked at ways to launch an SSH server as an unprivileged user, but some users report that recent versions no longer allow that. Actually, I don't even need SSH specifically; any way to get a working CLI would do the trick. Even a crappy node.js program running an HTTP server would work, as long as I get a CLI (... and it's not excessively crappy: the goal is a clean CLI, not something that lags every two characters).
In case you're wondering why I want to do this, it's nothing illegal ^^. I just have to work with a very crappy Jenkins server whose agents I'm not allowed to access directly. Whoever is responsible for that server doesn't give a sh** about its users' needs, so we resort to hacky solutions just to get some diagnostic data about it (RAM, CPU and disk usage, installed programs, etc.). Having a CLI I can launch occasionally, instead of altering a build configuration and waiting 20 minutes for an answer about what's going on, would really help.
Thanks in advance for any answer.
So do you have shell access to the server at least once? E.g., during the single day of the month when you are physically present at the site of your client or the outsourcing contractor?
And if you have shell access then, can you or your sysadmin install Cockpit?
It listens on port 9090.
You can then use the credentials of your local user and open a terminal window in your browser. See sidebar item "Terminal" on the screenshots of the cockpit homepage.
According to the documentation
Cockpit has no special privileges and doesn’t run as root. It creates a session as the logged in user and has the same permissions as that user.
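For reference, the one-time installation by that sysadmin is small; a minimal sketch for a CentOS/RHEL-style host, using package and service names as documented by the Cockpit project (adjust for your distribution):
sudo yum install cockpit
sudo systemctl enable --now cockpit.socket
# if firewalld is running, open Cockpit's standard port (9090)
sudo firewall-cmd --permanent --add-service=cockpit
sudo firewall-cmd --reload
After that you can browse to https://<server>:9090 and log in with the credentials of the unprivileged account.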

Is it possible to write a shell script that takes input then will ssh into a server and run scripts on my local machine?

I have several scripts on my local machine. These scripts run install and configuration commands to set up my Elasticsearch nodes. I have 15 nodes coming and we definitely do not want to do that by hand.
For now, let's call them Script_A, Script_B, Script_C and Script_D.
Script_A will be the one to initiate the process; it currently contains:
#!/bin/bash
read -p "Enter the hostname of the remote machine: " hostname
echo "Now connecting to $hostname!"
ssh root@$hostname
This works fine, obviously, and I can get into any server I need to. My confusion is running the other scripts remotely. I have read a few other articles/SO questions but I'm just not understanding the methodology.
I will have a directory on my machine as follows:
Elasticsearch_Installation
|
|=> Scripts
|
|=> Script_A, Script_B, etc..
Can I run Script_A, which remotes into the server, then come back to my local machine and run Script_B and so on inside the remote server, without moving the files over?
Please let me know if any of this needs to be clarified, I'm fairly new to the Linux environment in general.. much less running remote installs from scripts over the network.
Yes, you can. Use ssh in non-interactive mode; it will be like launching a command in your local environment.
ssh root@$hostname /remote/path/to/script
Nothing will be changed on your local system; you will be at the same point where you launched the ssh command.
NB: this command will ask you for a password. If you want a truly non-interactive flow, set up passwordless login to the host, as explained here:
How to ssh to localhost without password?
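A minimal sketch of that passwordless setup, assuming key-based authentication is acceptable and the remote user is root as in the question:
# generate a key pair once on the local machine (skip if one already exists)
ssh-keygen -t ed25519
# install the public key on each node; after this, ssh and scp stop prompting for a password
ssh-copy-id root@"$hostname"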
You have a larger problem than just setting up many nodes: you also have to be concerned with ongoing maintenance and administration of all those nodes. This is the space in which configuration management systems such as Puppet, Ansible, and others operate. But these have a learning curve to overcome, and they require some infrastructure of their own. You would probably benefit from one of them in the medium to long term, but if your new nodes are coming next week(ish) then you probably want a solution you can use immediately to get up and running.
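If you do adopt one of those tools later, the entry point can be small; a hedged Ansible sketch, in which the inventory file name, node names and remote user are placeholders:
# hosts.ini lists one node per line, e.g. node01.example.com
ansible all -i hosts.ini -u root -m ping                          # check connectivity to every node
ansible all -i hosts.ini -u root -m script -a "Scripts/Script_B"  # copy a local script to every node and run it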
Certainly you can ssh into the server to run commands there, including non-interactively.
My confusion is running the other scripts remotely.
Of course, if you want to run your own scripts on the remote machine then they have to be present there, first. But this is not a major problem, for if you have ssh then you also have scp, the secure copy program. It can copy files to the remote machine, something like this:
#!/bin/bash
read -p "Enter the hostname of the remote machine: " hostname
echo "Now connecting to $hostname!"
scp Script_[ABCD] root@${hostname}:./
ssh root@${hostname} ./Script_A
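If you would rather not copy the scripts at all, ssh can also feed a local script to a shell on the remote host via standard input; a small sketch using the script names from the question (the /tmp/es-data argument is made up for illustration):
# run a local script on the remote host without copying it there first
ssh root@${hostname} 'bash -s' < Scripts/Script_B
# positional arguments go after --, so $1 becomes /tmp/es-data inside the script
ssh root@${hostname} 'bash -s -- /tmp/es-data' < Scripts/Script_C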
I also manage Elasticsearch clusters with multiple nodes. A hack that works for me is to use the Terminator terminal emulator and split it into multiple windows/panes, one for each ES node. Then you can broadcast the commands you type in one window to all the windows.
This way, you run commands and view their results almost interactively across all nodes in parallel. You could also save this layout of windows in Terminator, and then bring the view back quickly with a shortcut.
PS: this approach only works if you have a small number of nodes, and even then only for small tasks. The only thing that will scale with the number of nodes and the number and variety of tasks you need to perform is probably a config management solution like Puppet or Salt.
Fabric is another interesting project that may be relevant to your use case.

What is the best way to set this working environment for my research group?

We recently got a supercomputer (I will call it the "cluster"; it has 4 GPUs and a 12-core processor with some decent storage and RAM) for our lab's machine learning research. A Linux distro (most likely CentOS or Ubuntu, depending on your suggestions of course) will be installed on the machine. We want to design the remote access in such a way that we have the following user hierarchy:
Admin (1 person, the professor): This will be the only superuser of the cluster.
Privileged User (~3 people, PhD students): These will be the more tech-savvy or long-term researchers of the lab, each with their own user account on the cluster. They should be able to set up their own environment (through Docker or conda), develop their projects remotely, and transfer files in and out of the cluster freely.
Regular User (~3 people, Master's students): We expect this kind of user to interact with the cluster only for its computing capabilities and the data it stores. They should not have their own user account on the cluster. It is OK if they can only use Jupyter notebooks. They should be able to access the read-only data on the cluster, as the data we are working on is too large for them to download locally. However, they should not be able to change anything on the cluster; they should only be able to keep their notebooks and a number of output files there, which they should be able to download to their local system whenever necessary for reporting purposes.
We also want to allocate only a certain portion of our computing capabilities to type-3 users. The others should be able to access all the capabilities when they need to.
For all users, it should be easy to access the cluster from whatever OS they have on their personal computers. For types 1 and 2, I think PyCharm for remote development of .py files and tunneling for Jupyter notebooks is the best option.
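For reference, the tunneling mentioned here is usually a single SSH port forward; a minimal sketch in which the user, host name and ports are placeholders:
# forward local port 8888 to a notebook server listening on port 8888 on the cluster
ssh -N -L 8888:localhost:8888 labuser@cluster.example.edu
# then open http://localhost:8888 in the local browser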
I did a lot of research on this but since I don't have an IT background I cannot be sure if the following approach would work.
Set up JupyterHub for type-3 users. This way these users don't need their own accounts on the cluster. However, I am not sure about the GPU support for this. According to here, we can only limit CPU per user. Also, will they be able to access the data under the Admin's home directory when we set up the hub, or do we have to duplicate the data for that? We only want them to be able to access specific portions of the data (the ones related to whatever project they are working on, since they sign a confidentiality agreement for that project only). Is this possible with JupyterHub?
The rest (type 1 and type 2) will have their own user accounts (sudo or not) on the cluster. For this case, is there a UI so that users can transfer files to and from the cluster more easily (so they don't have to use scp)? Is FileZilla an option, for example?
Finally, it would help if the type-2 users could resolve the issues type-3 users have, so that they don't have to go to the professor each time there is a problem. But AFAIK, you have to be a superuser to administer JupyterHub.
If anyone had to setup this kind of an environment at their own lab and share their experiences I would be grateful.

Jenkins - Managing a pool of resources

I'm trying to set up a Jenkins system where a certain program has to be run on a board on the network, accessed using telnet. We're talking about hundreds of such jobs here, so we will be setting up multiple boards. Each job therefore has to be allocated a board, but the catch is that only one job can use a given board at a time; otherwise the program fails.
The solution I have right now is a master-slave set-up where I connect to the same machine using SSH (so a master and multiple slaves on the same machine). Each of the slave nodes then has a label for the IP address the program has to telnet to. This works, scheduling-wise, but it might cause issues because all nodes connect to the same machine using SSH. Connecting to the boards using SSH is not an option.
Is there any way to get the same functionality as above, but without using SSH to connect to the same machine? Basically I want to be able to say: we have n available machines; when a job comes in, give it one of those machines and pass it a label belonging to that machine (its IP address in this case); now there are n-1 machines left.
Mutual exclusion comes close, but does not allow the above functionality, and jobs waiting for a resource take up one of the executors of a node.
Thanks a lot!
I realize your problem is probably solved already years ago, but in case someone else is looking for the answer and runs into this.
You can use "Lockable resources" plugin and set the ip address as the name of the resource and use label such use test-board-ip.It is simple and easy to use.
Another possibility is to use "External resources dispatcher" plugin. It provides a bit more possibilities, but it has a bug that causes it to hang sometimes. And it seems there is no maintenance any more (last updates from 2013).
Maybe you should have a look at the Locks and Latches plugin. You can lock a resource with it simply by requiring the job to lock the board you want.
https://wiki.jenkins-ci.org/display/JENKINS/Locks+and+Latches+plugin

Automated deployment of files to multiple Macs

We have a set of Mac machines (mostly PPC) that are used for running Java applications for experiments. The applications consist of folders with a bunch of jar files, some documentation, and some shell scripts.
I'd like to be able to push out new versions of our experiments to a directory on one Linux server, and then instruct the Macs to update their versions, or retrieve an entire new experiment if they don't yet have it.
../deployment/
../deployment/experiment1/
../deployment/experiment2/
and so on
I'd like to come up with a way to automate the update process. The Macs are not always on, and they have their IP addresses assigned by DHCP, so the server (which has a domain name) can't contact them directly. I imagine that I would need some sort of daemon running full-time on the Macs, pinging the server every minute or so, to find out whether some "experiments have been updated" announcement has been set.
Can anyone think of an efficient way to manage this? Solutions can involve either existing Mac applications, or shell scripts that I can write.
You might have some success with a simple Subversion setup; if you have the dev tools on your farm of Macs, then they'll already have Subversion installed.
Your script is as simple as running svn up on the deployment directory as often as you want and checking your changes in to the Subversion server from your machine. You can do this without any special setup on the server.
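A minimal sketch of the client side, assuming the deployment directory on each Mac is a Subversion working copy checked out at /Users/Shared/deployment (the path and schedule are placeholders, and launchd is the more Mac-native alternative to cron):
#!/bin/bash
# hypothetical /usr/local/bin/update-experiments.sh: pull down whatever changed on the server
svn up /Users/Shared/deployment
A crontab entry such as */5 * * * * /usr/local/bin/update-experiments.sh then polls every five minutes, along the lines of the "pinging the server every minute or so" idea in the question.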
If you don't care about history and a version control system seems too "heavy", the traditional Unix tool for this is called rsync, and there's lots of information on its website.
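A hedged rsync equivalent, pulled from each Mac; the server name and paths below are placeholders:
# mirror the server's deployment tree, deleting files that were removed upstream
rsync -az --delete deployserver.example.com:/srv/deployment/ /Users/Shared/deployment/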
Perhaps you're looking for a solution that doesn't involve any polling; in that case, maybe you could have a process that runs on each Mac and registers a local network Bonjour service; DNS-SD libraries are probably available for your language of choice, and it's a pretty simple matter to get a list of active machines in this case. I wrote this script in Ruby to find local machines running SSH:
#!/usr/bin/env ruby
require 'rubygems'
require 'dnssd'

# Browse for machines advertising the SSH service over Bonjour/DNS-SD
# and print each one's host name and domain as it is discovered.
handle = DNSSD.browse('_ssh._tcp') do |reply|
  puts "#{reply.name}.#{reply.domain}"
end
sleep 1       # give the asynchronous browse a moment to collect replies
handle.stop
You can use AppleScript remotely if you turn on Remote Events on the client machines. As an example, you can control programs like iTunes remotely.
I'd suggest that you put an update script on your remote machines (AppleScript or otherwise) and then use remote AppleScript to trigger running your update script as needed.
If you update often, then Jim Puls's idea is a great one. If you'd rather have direct control over when the machines start looking for an update, then remote AppleScript is the simplest solution I can think of.
