Read from an XFS brick, write on a volume? - glusterfs

Filesystem notifications are not available on volumes, which is why we started reading directly from a brick.
Is it okay to read directly from a brick, but write to the volume so that replication happens?
The volume is created from 3 bricks using a replication strategy. Could anyone please point out the drawbacks of reading directly from a brick?

If the file on the brick you read from is not in sync with the other copy or copies in the replica (i.e. a self-heal is pending), you can get stale data. Reading from the mount ensures that you always get up-to-date data.
Though not comparable with inotify, you can use glusterfind to get some level of filesystem change notification.

Related

Can kubernetes provide a pod with an emptyDir volume from the host backed by a specific filesystem different than the host's?

I know this is a bit weird, but I'm building an application that makes small local changes to ephemeral file/folder systems and needs to sync them with a store of record. I am using NFS right now, but it is slow, not super scalable, and expensive. Instead, I'd love to take advantage of btrfs or zfs snapshotting for efficient syncing of snapshots of a small local filesystem, and push the snapshots into cloud storage.
I am running this application in Kubernetes (in GKE), which uses GCP VMs with ext4 formatted root partitions. This means that when I mount an emptyDir volume into my pods, the folder is on an ext4 filesystem I believe.
Is there an easy way to get an ephemeral volume mounted with a different filesystem that supports these fancy snapshotting operations?
No. GKE does not offer that kind of low-level control anyway, but the rest of this answer presumes you've managed to create a local mount of some kind. The easiest answer is a hostPath mount; however, that requires you to manually account for multiple similar pods on the same host so they don't collide. A newer option is an ephemeral CSI volume combined with a CSI plugin that basically reimplements emptyDir. https://github.com/kubernetes-csi/csi-driver-host-path gets most of the way there but 1) would require more work for this use case and 2) is explicitly not supported for production use. Failing either of those, you can move the whole kubelet data directory onto another mount, though that might not accomplish what you are looking for.

Are docker volumes better option for write heavy operations than binding directories directly?

Reading through docker documentation I found this passage (located here):
Block-level storage drivers such as devicemapper, btrfs, and zfs perform better for write-heavy workloads (though not as well as Docker volumes).
So does this mean that one should always use docker volumes when expecting lots of persistent writing?
The container-local filesystem never stores persistent data, so you don't have a choice but to mount something into the container if you want data to live on after the container exits. The "block-level storage drivers" you quote discuss particular install-time options for how images and containers are stored, and aren't related to any particular volume or bind-mount implementation.
As far as performance goes, my general expectation is that the latency of disk I/O will far outweigh any overhead of any particular implementation. Without benchmarking any particular implementation, on a native Linux host, I would expect a named volume, a bind-mount, and writes to the container filesystem to be more or less similar.
From a programming point of view, you will probably get better long-term performance improvement from figuring out how to have fewer disk accesses (for example, by grouping together related database requests into a single transaction) than by trying to optimize the Docker-level storage.
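As a minimal sketch of what "fewer disk accesses" can mean in practice, the example below contrasts committing every row separately with wrapping the whole batch in one transaction. It uses Python's built-in sqlite3 module purely for illustration; the table name `events` and the helper names are hypothetical, and the actual gain depends on your database and storage.

```python
import sqlite3


def make_db(path=":memory:"):
    """Create a database with a hypothetical 'events' table."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.commit()
    return conn


def insert_one_by_one(conn, rows):
    """Commit after every row: one transaction per write."""
    for row in rows:
        conn.execute("INSERT INTO events (id, payload) VALUES (?, ?)", row)
        conn.commit()


def insert_batched(conn, rows):
    """Wrap all rows in a single transaction: one commit for the whole batch."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany("INSERT INTO events (id, payload) VALUES (?, ?)", rows)


def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]


if __name__ == "__main__":
    rows = [(i, "payload") for i in range(1000)]
    conn = make_db()
    insert_batched(conn, rows)
    print(count_rows(conn))
```

On a file-backed database each commit typically has to be flushed to disk, so the batched version issues far fewer synchronous writes for the same data, regardless of whether the file lives on a named volume or a bind mount.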
The one prominent exception to this is that bind mounts on macOS are known to be very slow, and you should avoid them if your workload involves substantial disk access. (This includes both reading and writing, and includes some interpreted languages that want to read in every possible source file at startup time.) If you're managing something like database storage where you can't usefully access the files directly anyway, use a named volume. For your application code, COPY it into an image in a Dockerfile and do not overwrite it at run time.
should always use docker volumes when expecting lots of persistent writing?
It depends.
Yes, you want some kind of storage external to the container for any persistent data, since data written inside the container is lost when that container is removed.
Whether that should be a host bind or a named volume depends on how you need to manage that data. A host volume is a bind mount to the host filesystem. It gives you direct access to that data, but that direct access also comes with uid/gid permission issues and loses the initialization feature of named volumes.
A named volume with all the defaults is just a bind mount to a folder under /var/lib/docker, so performance would be the same as a host volume if the underlying filesystem is the same. That said, a named volume can be configured to mount just about anything you can mount with the mount command.
Since each of these options can have varying underlying filesystem, and the performance difference comes from that underlying filesystem choice, there's no way to answer this in any generic sense. Hence, it depends.

How to perform backup and restore of Janusgraph database which is backed by Apache Cassandra?

I'm having trouble figuring out how to back up a Janusgraph database that uses Apache Cassandra as its persistent storage.
I'm looking for the correct methodology for performing backup and restore tasks. I'm very new to this concept and have no idea how to do this. It would be highly appreciated if someone could explain the correct approach or point me to the right documentation to safely execute the tasks.
Thanks a lot for your time.
Cassandra can be backed up in a few ways. One is a "snapshot", which you take with the "nodetool snapshot" command. Cassandra creates a "snapshots" sub-directory, if it doesn't already exist, under each table being backed up (each table has its own directory where it stores its data), and inside it a directory for this particular snapshot (you can name it with a "nodetool snapshot" parameter or let it default). Cassandra then creates hard links to all of the sstables that exist for that table, looping through each table, keyspace or database depending on your "nodetool snapshot" parameters. It's very fast, as creating hard links takes almost no time.
You have to run this command on each node in the Cassandra cluster to back up all of the data, and each node's data is backed up to that node's local disk. I know DSE, and possibly Apache, are adding functionality to back up to object storage as well (I don't know if this is an OpsCenter-only capability or if it can be done via the snapshot command as well).
Watch the space consumption: snapshots are not cleaned up automatically, so old ones have to be removed by hand (for example with "nodetool clearsnapshot").
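Why a link-based snapshot is nearly instantaneous, and why it can still end up consuming space later, can be seen in a small sketch. This is not Cassandra's implementation, only an illustration of the hard-link mechanism that sstable snapshots rely on; the directory layout, the helper names, and the file name `example-Data.db` are hypothetical.

```python
import os
import tempfile


def snapshot_dir(data_dir, tag):
    """Hard-link every regular file in data_dir into data_dir/snapshots/<tag>."""
    snap_dir = os.path.join(data_dir, "snapshots", tag)
    os.makedirs(snap_dir)
    for name in os.listdir(data_dir):
        src = os.path.join(data_dir, name)
        if os.path.isfile(src):
            # A hard link is a second name for the same inode: no data is copied.
            os.link(src, os.path.join(snap_dir, name))
    return snap_dir


def demo():
    with tempfile.TemporaryDirectory() as table_dir:
        sstable = os.path.join(table_dir, "example-Data.db")  # hypothetical name
        with open(sstable, "wb") as f:
            f.write(b"immutable sstable contents")

        snap = snapshot_dir(table_dir, "backup1")
        linked = os.path.join(snap, "example-Data.db")
        same_inode = os.stat(sstable).st_ino == os.stat(linked).st_ino

        # Simulate the original file being deleted later (e.g. after compaction).
        os.remove(sstable)
        with open(linked, "rb") as f:
            survived = f.read() == b"immutable sstable contents"
        return same_inode, survived


if __name__ == "__main__":
    print(demo())
```

While the original file exists, the snapshot costs only a directory entry. Once the original is removed, the snapshot holds the last reference to the data, which is why old snapshots keep disk space from being reclaimed.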
As with many database systems, you can also purchase/use 3rd party software to perform backups (e.g. Cohesity (formerly Talena), Rubrik, etc.). We use one such product in our environments and it works well (graphical interface, easy-to-use point-in-time recovery, etc.). They also offer easy-to-use "refresh" capabilities (e.g. refresh your PT environment from, say, production backups).
Those are probably the two best options.
Good luck.

What does the GlusterFS server option cluster.readdir-optimize control?

I have been trying to optimise the small file performance of my GlusterFS storage cluster.
A number of forum threads and blog posts seem to suggest setting the cluster.readdir-optimize property on the volume, like:
$ gluster volume set test-share cluster.readdir-optimize on
The default for this option (as of GlusterFS v3.10) seems to be off, which makes me think there must be some trade-off to having this feature enabled. However, I have not been able to find anywhere any documentation explaining exactly what this option does.
I would like to understand the function of this option before I enable it in production.
As noted in the relevant GlusterFS git repository commit message, the readdir-optimize option supports the following:
Bring in option which is supported by posix xlator
to filter out directory's entries from being returned.
DHT would now request non-first subvols to filter out
directory entries.
I don't fully understand how this directly improves small-file performance in GlusterFS. But according to the GlusterFS documentation, the BD xlator wraps the GlusterFS block back-end and enables GlusterFS volumes to be composed of bricks that are themselves logical volumes.

Can I use GlusterFS volume storage directly without mounting?

I have set up a small GlusterFS cluster with 3+1 nodes.
They're all on the same LAN.
There are 3 servers and 1 laptop (via Wifi) that is also a GlusterFS node.
The laptop often disconnects from the network. ;)
Use case I want to achieve is this:
I want my laptop to automatically synchronize with GlusterFS filesystem when it reconnects. (That's easy and done.)
But, when laptop is disconnected from cluster I still want to access filesystem "offline". Modify, add, remove files..
Obviously the only way I can access the GlusterFS filesystem while the laptop is offline from the cluster is by accessing the volume's storage directly, i.e. the directory I configured when creating the gluster volume. I guess that's the brick.
Is it safe to modify files inside storage?
Will they be replicated to the cluster when the node re-connects?
There are multiple questions in your list:
First: Can I access GlusterFS when my system is not connected to it:
If you set up a GlusterFS daemon and a brick on your system, mount the volume through this local daemon the way you usually would, and also add a replication target, you can access your brick through the Gluster mount just as you would a remote one. The data will then be synchronized with the replication target once you reconnect your system to the network.
Second: Can I edit files in my brick directly:
Technically you can: you can just navigate to your brick and edit a file. However, since gluster will not know what you changed, the changes will not be replicated and you will create a split-brain situation. So it is certainly not advisable (don't do it unless you want to make the same change manually in your replication brick as well).
Tomasz, it is definitely not a good idea to directly manipulate the backend volume. Say you add a new file to the backend volume: glusterfs is not aware of this change, and the file appears as a spurious file when the parent directory is accessed via the glusterfs volume. I am not sure glusterfs is ideal for your use case.
