What is the best way to set up this working environment for my research group?

We recently got a supercomputer for our lab (I will call it the "cluster"; it has 4 GPUs, a 12-core processor, and decent storage and RAM) for machine learning research. A Linux distro (most likely CentOS or Ubuntu, depending on your suggestions of course) will be installed on the machine. We want to design the remote access so that we have the following user hierarchy:
Admin (1 person, the professor): This will be the only superuser of the cluster.
Privileged User (~3 people, PhD students): These will be the more tech-savvy or long-term researchers of the lab, who will have their own user accounts on the cluster. They should be able to set up their own environments (through Docker or conda), develop their projects remotely, and transfer files in and out of the cluster freely.
Regular User (~3 people, Master's students): We expect these users to interact with the cluster only for its computing capabilities and the data it stores. They should not have their own user accounts on the cluster. It is OK if they can only use Jupyter notebooks. They should be able to access the read-only data on the cluster, as the data we are working on will be too large for them to download locally. However, they should not be able to change anything on the cluster; they should only be able to keep their notebooks and a number of output files there, which they should be able to download to their local systems whenever necessary for reporting purposes.
We also want to allocate only a certain portion of our computing capabilities to type-3 users. The others should be able to access the full capabilities when they need to.
For all users, it should be easy to access the cluster from whatever OS they have on their personal computers. For types 1 and 2, I think PyCharm for remote development of .py files and SSH tunneling for Jupyter notebooks is the best option.
I did a lot of research on this, but since I don't have an IT background, I cannot be sure whether the following approach would work.
Set up JupyterHub for type-3 users. This way we don't have to give them user accounts on the cluster. However, I am not sure about the GPU support for this. According to here, we can only limit CPU per user. Also, will they be able to access the data under the Admin's home directory when we set up the hub, or do we have to duplicate the data for that? We only want them to be able to access specific portions of the data (the parts related to whatever project they are working on, since they sign a confidentiality agreement covering only that project). Is this possible with JupyterHub?
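To make this concrete, here is a rough sketch of the kind of jupyterhub_config.py I was imagining, assuming DockerSpawner is used for these sessions; the image name, paths, and limits are made up, and the GPU part is entirely unverified on my side:

    # jupyterhub_config.py -- rough sketch, not a tested configuration
    import docker.types

    c = get_config()  # provided by JupyterHub when it loads this file

    # Spawn each type-3 user's notebook server in its own Docker container.
    c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"
    c.DockerSpawner.image = "our-lab/notebook:latest"   # hypothetical image

    # Cap the resources a single type-3 session can take.
    c.DockerSpawner.cpu_limit = 2        # at most 2 cores per user
    c.DockerSpawner.mem_limit = "8G"     # at most 8 GB of RAM per user

    # Mount only the project-specific data read-only, plus a writable work dir.
    c.DockerSpawner.volumes = {
        "/data/project-x": {"bind": "/home/jovyan/data", "mode": "ro"},
        "/srv/jupyterhub/work/{username}": {"bind": "/home/jovyan/work", "mode": "rw"},
    }

    # Unverified: I have read that extra_host_config is passed through to Docker,
    # so something like this might expose a single GPU to each container.
    c.DockerSpawner.extra_host_config = {
        "device_requests": [docker.types.DeviceRequest(count=1, capabilities=[["gpu"]])],
    }

My thinking is that capping the type-3 containers like this would leave the rest of the cores and GPUs free for type-1 and type-2 users, but I'd appreciate confirmation that this is how it works in practice.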
The rest (type-1 and type-2) will have their own (sudo or not) user accounts on the cluster. In this case, is there a UI workaround so that users can transfer files to and from the cluster more easily (so they don't have to use scp)? Is FileZilla an option, for example?
Finally, it would be good if the type-2 users could resolve the issues type-3 users run into, so that they don't have to refer to the professor each time they have a problem. But as far as I know, you have to be a superuser to manage things in JupyterHub.
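That said, I came across JupyterHub's own notion of admin users, which seems to be separate from OS superusers; maybe something like this (untested on my part, usernames made up) would let the PhD students manage type-3 sessions without sudo:

    # jupyterhub_config.py (continued) -- hub-level admins, not OS superusers
    c.Authenticator.admin_users = {"phd_student_1", "phd_student_2"}  # hypothetical names
    c.JupyterHub.admin_access = True  # lets hub admins access and stop other users' servers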
If anyone has had to set up this kind of environment in their own lab and can share their experiences, I would be grateful.

Related

How to share an AI Platform JupyterLab among multiple users?

It is my understanding that anyone with project editor permissions can access the AI Platform Jupyter notebooks, which is great but not very practical, since this could cause several issues. I would like to use this environment as an "always-on" machine with a GPU enabled and allow different people in my team to access it. Right now everyone is logged in as the default "jupyter" user when logging in with the OPEN JUPYTERLAB button. Is there a way to log in with different credentials?
Any tips would be greatly appreciated!
Today it's not possible to use different credentials with the OPEN JUPYTERLAB button.
However, you have SSH access to the underlying VM. You could have each person SSH into the VM with port forwarding. Then everyone would be directly using JupyterLab without the Proxy intermediary.
It's a little less user-friendly, but it will get you distinct users.
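As a rough example of the port-forwarding approach (a sketch using the third-party sshtunnel package; the IP, username, key path, and the assumption that JupyterLab listens on 127.0.0.1:8080 on the VM are all placeholders to adjust):

    # Sketch: forward the VM's local JupyterLab port to your own machine.
    import os
    from sshtunnel import SSHTunnelForwarder  # pip install sshtunnel

    tunnel = SSHTunnelForwarder(
        "VM_EXTERNAL_IP",                                  # placeholder
        ssh_username="your_linux_user",                    # placeholder
        ssh_pkey=os.path.expanduser("~/.ssh/google_compute_engine"),
        remote_bind_address=("127.0.0.1", 8080),
        local_bind_address=("127.0.0.1", 8080),
    )
    tunnel.start()
    print("Open http://localhost:8080 in your browser")
    # ... work in JupyterLab, then:
    # tunnel.stop()

Each person runs their own tunnel under their own Linux account on the VM, which is what gives you the distinct users.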

Docker containers as virtual desktop clones

I see that a number of people have set up Docker containers with Guacamole or other tools to allow them to remote in to a GUI as if the container were a remote Linux desktop. A friend of mine had a conversation with a professor who told him that they set up Ubuntu desktop access for their students via ubuntu/rdp Docker containers.
It's an attractive concept for efficiently packed cloned desktops, since you don't need 50 copies of the guest OS, but how would you manage such a swarm without a connection broker like a VDI solution or a hypervisor console like a KVM setup? Would you simply use standard Docker (or Swarm) management tools to manage the containers themselves, and then some separate remote client for the actual remote control connections?
I'm currently reading up on Docker, but one thing is unclear to me: if each desktop is the same (say Firefox, LibreOffice, etc.), is there any way to gain efficiency by sharing these resources as well? For instance, could there be a container with those resources that the others all connect to, or could they be shared at a lower level, like the OS? I'm looking for any way to gain efficiency and lower the overall CPU, RAM, etc. for all combined machines on the server; really, anything other than a separate copy of the same thing in each container.
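To make the "standard management tools" part of my question concrete, here is the kind of thing I was imagining with the Docker Python SDK (untested; the image name, ports, and paths are placeholders):

    # Sketch: manage a fleet of identical desktop containers with the Docker SDK.
    # All containers come from one image, so the Firefox/LibreOffice layers exist
    # once on disk and are shared read-only; only per-container state differs.
    import docker  # pip install docker

    client = docker.from_env()
    IMAGE = "mylab/ubuntu-rdp-desktop:latest"  # placeholder image with xrdp installed

    # Start five clones, each exposing its RDP port on a different host port.
    for i in range(5):
        client.containers.run(
            IMAGE,
            name=f"desktop-{i}",
            detach=True,
            ports={"3389/tcp": 33890 + i},   # one host port per clone
            mem_limit="2g",                  # cap each desktop's RAM
            volumes={f"/srv/desktops/home-{i}": {"bind": "/home/student", "mode": "rw"}},
        )

    # Later, routine management with the same tooling:
    for c in client.containers.list(filters={"name": "desktop-"}):
        print(c.name, c.status)
        # c.stop(); c.remove()  # etc.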
I see that there are solutions for shared persistent storage in containers, like Hatchway. Are there other issues caused by the statelessness of containers that this does not address?
Also, I see a few ways people have cobbled together internet connectivity for Docker containers (like an IP per container), but most of the older posts are from people frustrated with the process. Is there now a standard or preferred way to do something like this?
Or, if docker/containers are absolutely the wrong way to go about setting up the most efficient possible Linux remote desktop clones, I'd love to understand exactly what part does not work so I can find the right way.
I see after days of reading that LXD is actually what I'm looking for (Linux machine containers) instead of Docker (process containers).

Use Microsoft Azure as a computing cluster

My lab just got a sponsorship from Microsoft Azure and I'm exploring how to utilize it. I'm new to industrial-level cloud services and pretty confused by the many terminologies and concepts. In short, here is my scenario:
I want to experiment with the same algorithm on multiple datasets, i.e., data parallelism.
The algorithm is implemented in C++ on Linux (Ubuntu 16.04). I did my best to use static linking, but it still depends on some dynamic libraries. However, these dynamic libraries can easily be installed via apt.
Each dataset is structured, meaning the data (images, other files...) are organized in folders.
The ideal system configuration would be a bunch of identical VMs and a shared file system. Then I could submit my jobs with 'qsub' from a script or something. Is there a way to do this on Azure?
I investigated the Batch service, but I am having trouble installing dependencies after creating a compute node. I also had trouble with storage. So far I have only seen examples of using Batch with Blob storage, which is unstructured.
So are there any other services in Azure that can meet my requirements?
I somehow figured it out myself based on this article: https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-linux-classic-hpcpack-cluster/. Here is my solution:
Create an HPC Pack cluster with a Windows head node and a set of Linux compute nodes. There are several useful templates in the Marketplace.
From the head node, we can execute commands on the Linux compute nodes, either inside HPC Cluster Manager or using "clusrun" in PowerShell. We can easily install dependencies via apt-get on the compute nodes.
Create a File share inside one of the storage accounts. This can be mounted by all machines inside the cluster.
One glitch here is that, for some encryption reason, you cannot mount the File share on Linux machines outside Azure. There are two solutions in my head: (1) mount the File share on the Windows head node and share files from there, either by FTP or SSH; (2) create another Linux VM (as a bridge), mount the File share on that VM, and use "scp" to communicate with it from outside. Since I'm not familiar with Windows, I adopted the latter solution.
For the executable, I simply uploaded the binary compiled on my local machine. Most dependencies are statically linked, but there are still a few dynamic objects. I uploaded these dynamic objects to Azure and set LD_LIBRARY_PATH when executing the program on the compute nodes.
Job submission is done on the Windows head node. To make it more flexible, I wrote a Python script that writes XML files. The Job Manager can load these XML files to create jobs. Here are some instructions: https://msdn.microsoft.com/en-us/library/hh560266(v=vs.85).aspx
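The idea of the script is roughly the following (a simplified sketch; the element and attribute names are placeholders, so check them against the job description schema documented at the MSDN link above before using):

    # Simplified sketch: one job, one task per dataset, same binary for each.
    import xml.etree.ElementTree as ET

    def make_job_xml(job_name, datasets, out_path):
        job = ET.Element("Job", Name=job_name)
        tasks = ET.SubElement(job, "Tasks")
        for ds in datasets:
            ET.SubElement(
                tasks,
                "Task",
                Name=f"run-{ds}",
                CommandLine=f"/shared/bin/my_algorithm /shared/data/{ds}",  # placeholder paths
                MinCores="1",
                MaxCores="1",
            )
        ET.ElementTree(job).write(out_path, xml_declaration=True, encoding="utf-8")

    make_job_xml("experiment-01", ["dataset_a", "dataset_b", "dataset_c"], "experiment-01.xml")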
I believe there should be a more elegant solution with the Azure Batch service, but so far my small cluster runs pretty well with HPC Pack. Hope this post can help somebody.
Azure Files could provide you with a shared file solution for your Ubuntu boxes; details are here:
https://azure.microsoft.com/en-us/documentation/articles/storage-how-to-use-files-linux/
Again, depending on your requirements, you can create a pseudo folder structure via Blob storage by using containers and the "/" character in the naming strategy of your blobs.
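For example, with the Python SDK the "folders" are simply prefixes in the blob names (a sketch assuming the current azure-storage-blob package; the connection string, container, and file names are placeholders):

    # Sketch: emulate folders in Blob storage by putting "/" in the blob names.
    from azure.storage.blob import BlobServiceClient  # pip install azure-storage-blob

    service = BlobServiceClient.from_connection_string("<your-connection-string>")
    container = service.get_container_client("datasets")  # placeholder container

    # "Folders" are just prefixes in the blob name.
    with open("img001.png", "rb") as f:
        container.upload_blob(name="dataset_a/images/img001.png", data=f)

    # List everything under one pseudo-folder.
    for blob in container.list_blobs(name_starts_with="dataset_a/images/"):
        print(blob.name)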
To David's point, whilst Batch is generally looked at for these kinds of workloads, it may not fit your solution. VM Scale Sets (https://azure.microsoft.com/en-us/documentation/articles/virtual-machine-scale-sets-overview/) would allow you to scale your compute capacity either by load or by schedule, depending on your workload's behavior.

How to download files using OFF (Owner Free File System) P2P?

I am testing P2P apps. I have downloaded the OFF (Owner Free File System) P2P client from the link below:
http://sourceforge.net/projects/offsystem/files/OFF%20System/
But I am unable to download any files using this client; I am not even getting error messages. I have also referred to the following link.
REF:
http://www.ghacks.net/2009/04/10/p2p-the-owner-free-file-system/
Please suggest some ideas if any of you have used this OFF system.
From what I recall, the program ships with a list of bootstrap nodes. Chances are that, since active development ceased (at least as far as I am aware) many years ago, none of the bootstrap nodes are online.
I doubt that you will be able to get the network to function as there are unlikely to be any other nodes still running.
If you were to set up a cluster of VMs running the software, it should be possible to set a bootstrap node in the config somewhere; once a node has a connection, it will retrieve a list of other nodes that it can connect to.

Accessing Matlab MDCS Cluster over SSH

I just installed Matlab's Distributed Computing Server on a bunch of machines and it works, but only for those physically connected to the cluster's network. For remote access, those machines are two SSH hops away. How is this problem usually solved? I thought of setting up a VPN, but to me this seems like a last resort.
What I want is for everybody in the lab, using their own version of Matlab with the correct toolbox, to just run their code on the cluster somewhat effortlessly. I guess I could ask everybody to just tarball their files and access a remote installation of Matlab, somehow forwarding the GUI session (VNC or X forwarding), but that seems ugly.
Any help?
It is possible to set up "remote access" to a cluster running MDCS so that clients without direct access can submit jobs there. The documentation for this starts here:
http://www.mathworks.com/help/mdce/configure-parallel-computing-products-for-a-generic-scheduler.html
I'm not quite sure how to configure things so that the submission can work across two SSH connections - the example integration scripts shipping with MDCS all presume only one. However, it should be possible provided that:
The client can put the job and task files somewhere the execution nodes can see them
The client can trigger the appropriate qsub or whatever on the cluster headnode
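For the second point, the general mechanism for reaching the headnode across the two hops (outside of the MDCS integration scripts themselves) would look roughly like the following sketch, which uses Python's paramiko to jump through the intermediate login host; host names, the user name, and the qsub command line are placeholders:

    # Sketch of the "two SSH hops" part: run the scheduler command on the
    # headnode by tunnelling through the intermediate login host.
    import paramiko

    GATEWAY = "login.example.edu"        # first hop, reachable from outside
    HEADNODE = "cluster-head.internal"   # second hop, reachable only from the gateway

    gw = paramiko.SSHClient()
    gw.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    gw.connect(GATEWAY, username="me")

    # Open a tunnel through the gateway to the headnode's SSH port.
    channel = gw.get_transport().open_channel(
        "direct-tcpip", (HEADNODE, 22), ("127.0.0.1", 0)
    )

    head = paramiko.SSHClient()
    head.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    head.connect(HEADNODE, username="me", sock=channel)

    # Trigger the scheduler; the job and task files must already be on a
    # filesystem the execution nodes can see (the first condition above).
    stdin, stdout, stderr = head.exec_command("qsub /shared/jobs/my_job.sh")
    print(stdout.read().decode())

    head.close()
    gw.close()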
You might also consider simply contacting MathWorks installation support.
