Creating shared space for PBS Pro - Linux

How do you create shared space across nodes?
I have a designated drive that I would like to use, but I want to maintain the ability to add additional drives later.

Let's assume you are just starting out and do not have any specific performance requirements. Then probably the easiest way to go would be to start an NFS server on the head node and export your dedicated drive as an NFS share to the nodes. Your nodes would be able to mount this share over the network under the same mountpoint.
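For a concrete picture, here is a minimal sketch of that setup, assuming the dedicated drive is mounted at /data on a head node called headnode and the compute nodes sit on 10.0.0.0/24 (all of these names are assumptions):

    # On the head node: install an NFS server and export the dedicated drive
    sudo yum install nfs-utils                      # or: apt-get install nfs-kernel-server
    sudo systemctl enable --now nfs-server
    echo '/data 10.0.0.0/24(rw,sync,no_subtree_check)' | sudo tee -a /etc/exports
    sudo exportfs -ra

    # On each compute node: mount the share under the same mountpoint
    sudo mkdir -p /data
    sudo mount -t nfs headnode:/data /data
    echo 'headnode:/data /data nfs defaults,_netdev 0 0' | sudo tee -a /etc/fstab   # persist across reboots

Adding another drive later is then mostly a matter of mounting it on the head node and adding another line to /etc/exports.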
If your dedicated drives are spread across the cluster, the problem obviously gets trickier. Once you have become comfortable with NFS, have a look at parallel file systems such as GlusterFS.

Related

Can kubernetes provide a pod with an emptyDir volume from the host backed by a specific filesystem different than the host's?

I know this is a bit weird, but I'm building an application that makes small local changes to ephemeral file/folder systems and needs to sync them with a store of record. I am using NFS right now, but it is slow, not super scalable, and expensive. Instead, I'd love to take advantage of btrfs or zfs snapshotting for efficient syncing of snapshots of a small local filesystem, and push the snapshots into cloud storage.
I am running this application in Kubernetes (in GKE), which uses GCP VMs with ext4 formatted root partitions. This means that when I mount an emptyDir volume into my pods, the folder is on an ext4 filesystem I believe.
Is there an easy way to get an ephemeral volume mounted with a different filesystem that supports these fancy snapshotting operations?
No, and GKE does not offer that kind of low-level control anyway, but the rest of this answer presumes you've managed to create a local mount of some kind. The easiest answer is a hostPath mount; however, that requires you to manually account for multiple similar pods on the same host so they don't collide. A newer option is an ephemeral CSI volume combined with a CSI plugin that basically reimplements emptyDir. https://github.com/kubernetes-csi/csi-driver-host-path gets most of the way there, but 1) it would require more work for this use case and 2) it is explicitly not supported for production use. Failing either of those, you can move the whole kubelet data directory onto another mount, though that might not accomplish what you are looking for.
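As a rough illustration of the hostPath route, here is a minimal pod sketch, assuming you have already arranged a btrfs- or zfs-backed directory at /mnt/scratch on the node yourself (the pod name, image, and paths are all made up for illustration):

    kubectl apply -f - <<'EOF'
    apiVersion: v1
    kind: Pod
    metadata:
      name: snapshot-worker
    spec:
      containers:
      - name: app
        image: busybox
        command: ["sleep", "3600"]
        volumeMounts:
        - name: scratch
          mountPath: /scratch
      volumes:
      - name: scratch
        hostPath:
          # per-pod subdirectory so similar pods on the same node don't collide
          path: /mnt/scratch/snapshot-worker
          type: DirectoryOrCreate
    EOF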

Are Docker volumes a better option for write-heavy operations than binding directories directly?

Reading through the Docker documentation I found this passage (located here):
Block-level storage drivers such as devicemapper, btrfs, and zfs perform better for write-heavy workloads (though not as well as Docker volumes).
So does this mean that one should always use Docker volumes when expecting lots of persistent writing?
The container-local filesystem never stores persistent data, so you don't have a choice but to mount something into the container if you want data to live on after the container exits. The "block-level storage drivers" you quote refer to particular install-time options for how images and containers are stored, and aren't related to any particular volume or bind-mount implementation.
As far as performance goes, my general expectation is that the latency of disk I/O will far outweigh any overhead of any particular implementation. Without benchmarking any particular implementation, on a native Linux host, I would expect a named volume, a bind-mount, and writes to the container filesystem to be more or less similar.
From a programming point of view, you will probably get better long-term performance improvement from figuring out how to have fewer disk accesses (for example, by grouping together related database requests into a single transaction) than by trying to optimize the Docker-level storage.
The one prominent exception to this is that bind mounts on macOS are known to be very slow, and you should avoid them if your workload involves substantial disk access. (This includes both reading and writing, and includes some interpreted languages that want to read in every possible source file at startup time.) If you're managing something like database storage where you can't usefully access the files directly anyway, use a named volume. For your application code, COPY it into an image in a Dockerfile and do not overwrite it at run time.
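To make the distinction concrete, here is what the two mount styles look like on the command line (the image, paths and volume names are just examples):

    # Named volume: Docker manages the storage under /var/lib/docker/volumes
    docker volume create pgdata
    docker run -d --name db -v pgdata:/var/lib/postgresql/data postgres:15

    # Bind mount: the files live at an explicit host path you control directly
    docker run -d --name db2 -v /srv/pgdata:/var/lib/postgresql/data postgres:15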
should always use Docker volumes when expecting lots of persistent writing?
It depends.
Yes, you want some kind of storage external to the container for any persistent data, since data written inside the container is lost when that container is removed.
Whether that should be a host bind or a named volume depends on how you need to manage that data. A host volume is a bind mount to the host filesystem. It gives you direct access to that data, but that direct access also comes with uid/gid permission issues and loses the initialization feature of named volumes.
A named volume with all the defaults is just a bind mount to a folder under /var/lib/docker, so performance would be the same as a host volume if the underlying filesystem is the same. That said, a named volume can be configured to mount just about anything you can do with the mount command, as sketched below.
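For example, a named volume can be backed by an NFS export through the local driver's mount options (the server address and export path here are assumptions):

    docker volume create --driver local \
      --opt type=nfs \
      --opt o=addr=192.168.1.10,rw \
      --opt device=:/export/appdata \
      nfs_appdata

    # Containers use it like any other named volume (myapp is a placeholder image name)
    docker run -d -v nfs_appdata:/data myapp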
Since each of these options can have a different underlying filesystem, and the performance difference comes from that underlying filesystem choice, there's no way to answer this in any generic sense. Hence, it depends.

Can I use GlusterFS volume storage directly without mounting?

I have set up a small GlusterFS cluster with 3+1 nodes.
They're all on the same LAN.
There are 3 servers and 1 laptop (connected via Wi-Fi) that is also a GlusterFS node.
The laptop often disconnects from the network. ;)
The use case I want to achieve is this:
I want my laptop to automatically synchronize with the GlusterFS filesystem when it reconnects. (That's easy and done.)
But when the laptop is disconnected from the cluster, I still want to access the filesystem "offline": modify, add, and remove files.
Obviously, the only way I can access the GlusterFS filesystem while it's offline from the cluster is to access the volume storage directly, i.e. the directory I specified when creating the gluster volume. I guess that's the brick.
Is it safe to modify files inside storage?
Will they be replicated to the cluster when the node re-connects?
There are multiple questions in your list:
First: can I access GlusterFS when my system is not connected to it?
If you set up a GlusterFS daemon and brick on your laptop, mount this local volume through Gluster the way you usually would, and also add a replication target, then you can access your brick through Gluster as if it were not on your local system. The data will then be synchronized with the replication target once you reconnect your system to the network.
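A rough sketch of that layout, assuming the laptop and one server each host a brick under /bricks/lapvol (the hostnames, paths and volume name are all assumptions):

    # From the laptop: join the trusted pool and create a 2-way replicated volume
    gluster peer probe server1
    gluster volume create lapvol replica 2 laptop:/bricks/lapvol server1:/bricks/lapvol
    # (gluster may warn that replica 2 volumes are prone to split-brain)
    gluster volume start lapvol

    # Always work through a Gluster mount, never the brick directory itself
    mkdir -p /mnt/lapvol
    mount -t glusterfs localhost:/lapvol /mnt/lapvol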
Second: can I edit files in my brick directly?
Technically you can: you can just navigate to your brick and edit a file. However, since Gluster will not know what you changed, the changes will not be replicated and you will create a split-brain situation. So it is certainly not advisable (don't do that unless you also want to make the same change manually in your replication brick).
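If something has been touched behind Gluster's back, you can at least see what the cluster thinks needs healing (the volume name is an assumption):

    gluster volume heal lapvol info                 # files pending heal
    gluster volume heal lapvol info split-brain     # files in split-brain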
Tomasz, it is definitely not a good idea to directly manipulate the backend volume. Say you add a new file to the backend volume: glusterfs is not aware of this change, and the file appears as a spurious file when the parent directory is accessed via the glusterfs volume. I am not sure glusterfs is ideal for your use case.

Processing speed over mounted path

I have two scenarios.
Scenario 1: Machine A contains 1000 documents organized as folders. This folder on machine A is mounted on machine B. I process the documents within these folders on machine B and store the output in the mounted path on machine B.
Scenario 2: The documents on machine A are copied directly to machine B and processed there.
Scenario 2 is much faster than Scenario 1. I guess it's because there is no data transfer happening over the network between the two machines. Is there a way I can use mounting and still achieve better performance?
Did you try enabling a cache? For NFS, see https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/fscachenfs.html. CIFS should have caching enabled by default (unless you disabled it).
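For NFS that roughly amounts to running cachefilesd and adding the fsc mount option; a minimal sketch, with the export path assumed:

    sudo yum install cachefilesd                    # or: apt-get install cachefilesd
    sudo systemctl enable --now cachefilesd
    sudo mount -t nfs -o fsc machineA:/documents /mnt/documents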
The other option would be to use something like Windows' Offline Files, which copies files and folders between client and server in the background, so you don't need to deal with it. The only thing I've found for Linux is OFS.
But the performance depends on the size of the files and whether you read them randomly or sequentially. For instance, when I am encoding videos, I access the file directly over the network from my NFS server, because the encoding takes as much time as it would take to read and write the file anyway. This way no additional time is "wasted" on the encoding, as the application can encode the stream as it comes in from the network.
So for large files you might want to change the algorithm to a sequential read; small files, on the other hand, which are copied within seconds, could also be synced between server and client using rsync, BitTorrent Sync, Dropbox, or one of the hundreds of other tools. And this is actually quite commonly done.
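For the small-file case, a plain rsync round trip is often all that's needed (the host and paths are assumptions):

    rsync -az machineA:/documents/ /local/documents/        # pull the inputs once
    # ... process the local copy ...
    rsync -az /local/documents/output/ machineA:/results/   # push the results back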

How to implement Shared Storage for Concurrent File Access between 2 nodes (Linux)

I need to design a clustered application which runs separate instances on 2 nodes. These nodes are both Linux VMs running on VMware. Both application instances need to access a database and a set of files.
My intention is that a shared storage disk (external to both nodes) should contain the database and files. The applications would coordinate (via an RPC-like mechanism) to determine which instance is the master and which is the slave. The master would have write access to the shared storage disk and the slave would have read-only access.
I'm having problems determining the file system for the shared storage device, since it would need to support concurrent access across 2 nodes. Going for a proprietary clustered file system (like GFS) is not a viable alternative owing to costs. Is there any way this can be accomplished in Linux (EXT3) via other means?
Desired behavior is as follows:
Instance A writes to file foo on shared disk
Instance B can read whatever A wrote into file foo immediately.
I also tried using SCSI PGR3 but it did not work.
Q: Are both VMs co-located on the same physical host?
If so, why not use VMware shared folders?
Otherwise, if both are co-located on the same LAN, what about good old NFS?
Try using Heartbeat + Pacemaker; it has a couple of built-in options for monitoring the cluster. It should have something for looking after the data too.
You might look at an active/passive setup with DRBD + (Heartbeat | Pacemaker).
DRBD gives you a distributed block device across 2 nodes, on top of which you can deploy an ext3 filesystem.
Heartbeat or Pacemaker gives you a solution to handle which node is active and which is passive, plus some monitoring/repair functions.
If you need read access on the "passive" node too, configure a NAS export on the nodes as well, which the passive node can mount, e.g. via NFS or CIFS.
Handling a database like PostgreSQL or MySQL on network-attached storage might not work, though.
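A very rough DRBD sketch to give an idea of the moving parts (the resource name, hostnames, devices and addresses are all assumptions, and the hostnames must match uname -n on each node):

    cat <<'EOF' | sudo tee /etc/drbd.d/r0.res
    resource r0 {
      device    /dev/drbd0;
      disk      /dev/sdb1;
      meta-disk internal;
      on node1 { address 10.0.0.1:7789; }
      on node2 { address 10.0.0.2:7789; }
    }
    EOF

    # On both nodes:
    sudo drbdadm create-md r0
    sudo drbdadm up r0

    # On the node chosen as active:
    sudo drbdadm primary --force r0
    sudo mkfs.ext3 /dev/drbd0
    sudo mount /dev/drbd0 /shared

Pacemaker (or Heartbeat) then takes over deciding which node holds the primary role and where the filesystem is mounted.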
Are you going to be writing the applications from scratch? If so, you can consider using Zookeeper for the coordination between master and slave. This will put the coordination logic purely into the application code.
GPFS is inherently a clustered filesystem.
You set up your servers to see the same LUN(s), build the GPFS filesystem on the LUN(s), and mount the GPFS filesystem on the machines.
If you are familiar with NFS, it looks like NFS, but it's GPFS, a clustered filesystem by nature.
And if one of your GPFS servers goes down, provided you defined your environment correctly, no one is the wiser and things continue to run.
