Network File System mount, with local fallback? - linux

I'm trying to set up a directory that is stored on a local (Linux) server and mirrored on several (Windows) computers.
I've looked at options like Unison, but with the type of data being worked on it's very important to have instantaneous sync to the server.
I have also looked at network mounts (e.g. NFS, SFTP, WebDAV), but these clients are capturing real-time data and I can't afford any network connectivity problems.
What would be perfect is a combination of the two: an NFS mount (with "instantaneous sync") for as long as the network is up, and a local fallback storage location in case there is a network problem. Any ideas?

You might consider using one of the following:
Block device emulation:
DRBD
iSCSI + RAID1 (see the sketch after this answer)
Distributed file systems:
There are a number of file systems in this segment, with varying semantics and feature sets. I'm naming a few, though I haven't used any of those: GlusterFS, XtreemFS, Lustre.
I remember reading about a mirroring FUSE file system driver that was able to direct read traffic to the fastest (local) mirror using latency profiling. You could see if you can find it again.
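For the iSCSI + RAID1 idea, here is a minimal sketch of what it could look like on the Linux side, assuming the server exports a LUN over iSCSI and the client has a spare local partition (all device names, IQNs and addresses below are placeholders):

    # attach the LUN exported by the server (open-iscsi tools)
    iscsiadm -m discovery -t sendtargets -p 192.168.1.10
    iscsiadm -m node -T iqn.2013-01.example:store0 -p 192.168.1.10 --login
    # mirror a local partition against the iSCSI disk; marking the iSCSI
    # member --write-mostly steers reads to the local disk
    mdadm --create /dev/md0 --level=1 --raid-devices=2 \
          /dev/sda3 --write-mostly /dev/sdd
    mkfs.ext4 /dev/md0
    mount /dev/md0 /srv/data

If the network drops, the array simply degrades to the local member and keeps working; once the iSCSI disk is reachable again, mdadm /dev/md0 --re-add /dev/sdd lets md resynchronise it in the background.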

Related

Map 1 iSCSI disk to two hosts

I mapped one iSCSI disk so that two hosts can see it. What else do I need to set up? On Linux, both hosts currently see the same disk, but when I copy files onto it from one machine, the other machine doesn't see them until I reboot it.
Is this what you're trying to do? Two machines connected to the same LUN via iSCSI?
host A -- iscsi --\
                   >-- LUN
host B -- iscsi --/
If that's the case, you will need what is called a "clustered file system" to make it work without data corruption. As you've seen, it will appear to work, sort of, with a sync mounted Linux filesystem like ext4, or one with a tight flush period. But eventually the uncoordinated updates from the two hosts are going to corrupt the filesystem and the data it holds.
Wikipedia has a page on clustered filesystems here: https://en.wikipedia.org/wiki/Clustered_file_system.
VMware VMFS is an example of a clustered filesystem. Modern versions of VMware use SCSI COMPARE AND WRITE commands (essentially, atomic test and set) to coordinate access to different areas of the shared LUN. Older versions used SCSI reservations.
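As a rough illustration of the clustered-filesystem route on Linux, OCFS2 is one of the simpler options. A minimal sketch, assuming the shared LUN appears as /dev/sdb on both hosts and the o2cb cluster stack has already been configured on each of them (device name and label are just examples):

    # on one host only: format the shared LUN with a cluster-aware filesystem
    mkfs.ocfs2 -L shared0 /dev/sdb
    # on both hosts: mount it; the distributed lock manager coordinates access
    mount -t ocfs2 /dev/sdb /mnt/shared

GFS2 (with corosync and dlm) is the other common choice. Either way, the point stands: never mount ext4 or any other single-node filesystem on both hosts at the same time.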

What is the point of using a virtual filesystem?

So I'm making this software that encrypts the files on a computer. A friend of mine (we're both students, so don't be too hard on us) suggested I use a Virtual File System. I asked why, and what that even is, and they gave me a half-assed answer that didn't help.
[I don't know if this is important, but I'm on a Linux environment]
So, no worries, I went on Google and searched. But I still don't get it. The explanations, especially the one on Wikipedia, don't make sense to me. What is a VFS? What is the actual need for, or advantage of, using a Virtual File System? As opposed to just... not?
I'm asking because I'm actually curious, and if it is that useful, I'd like to implement it into what I'm making.
Thank you
(also any links or books or something I could read on that would expand on my knowledge would help too)
Very generally speaking, the VFS is a layer of abstraction. Think of the VFS as an abstract base class that has to be used when you want to implement your concrete class of file system, like NTFS, ext3, NFS or others. It offers basic functionality that the concrete file systems can use, but it is also an interface that the concrete classes have to implement against.
No idea if that was what you were looking for. Let me know if it wasn't and I can add more detail.
The VFS is part of the kernel: a unified abstraction layer, used by file systems and user applications alike, that presents multiple local or network file systems in a common, accessible format. It does so regardless of which file system a volume uses, where the volume is (local or network), which bus / controller / storage standard or network protocol sits underneath, and whether the file system is mounted directly on a volume or the file system + volume pair is mounted at a mount point, allowing everything to be accessible in one place.
The VFS includes:
File I/O, file mapping, file metadata and directory traversal APIs, which call whatever underlying file system is mounted on the volume, no matter which file system that is.
An API for file system drivers to be notified of volume arrival, so that they can identify whether their file system is on the volume.
An API for file systems to perform read / write operations on the volume without knowing the underlying bus / controller / storage transfer standards, the network storage (block, file) / transport / network / data link / physical protocols, the physical partition or sector of the volume on the storage medium (only the logical cluster within it), or the operation of the storage medium (other than knowing whether or not external fragmentation matters).
Reparse point functionality such as mount points, directory junctions and symbolic links -- it reparses the file path (unlike a hard link) to produce a file path for the underlying file system to access.
Caching pages of files so they can be fetched from RAM without having to call the file system, and only having to call the file system on a file cache page miss (see comments).
Prefetching parts of a file around a page miss (demand paging) or prefetching associated files or dynamic libraries, e.g. Prefetch on Windows or even SuperFetch.
A file explorer GUI application can then use these APIs to interact with the virtual file system representation of the volumes; the VFS calls the underlying file system, which in turn reads from and writes to its volume. The file explorer can then visually present the virtual file system representations of all the volumes in one common interface.
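One way to see that common format from user space: the same commands (and therefore the same system calls) work unchanged on completely different concrete filesystems, because everything is dispatched through the VFS. A quick demo on Linux, where the NFS server and device names are made up:

    mkdir -p /mnt/ram /mnt/net /mnt/disk
    mount -t tmpfs none /mnt/ram                # RAM-backed filesystem
    mount -t nfs fileserver:/export /mnt/net    # network filesystem
    mount -t ext4 /dev/sdb1 /mnt/disk           # local on-disk filesystem
    for d in /mnt/ram /mnt/net /mnt/disk; do
        echo hello > "$d/test.txt"   # identical open()/write() path via the VFS
        stat -f -c '%T' "$d"         # only this reveals the underlying fs type
    done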

What's the difference between a VFS and an NFS?

This might be a dumb question, but I'm struggling to find resources that clearly explain how a VFS is different from an NFS. Can they both be used for the same purpose?
Bonus question: Can you watch a VFS with inotify like you can an NFS?
"NFS" is a network filesystem that's been around for decades. Wikipedia has you covered on that front.
"VFS" is a more generic term that simply means "virtual filesystem". Within the context of Linux, it refers to the part of the kernel with which your user-space programs actually interact when they interact with "files". The VFS layer then passes requests to a concrete filesystem driver -- such as NFS, for example, or ext4, or others.
A virtual file system (VFS) is an abstraction layer on top of a more concrete file system. The purpose of a VFS is to allow client applications to access different types of concrete file systems in a uniform way, whereas Network File System (NFS) is a distributed file system protocol, originally developed by Sun Microsystems in 1984, that allows a user on a client computer to access files over a computer network much as local storage is accessed.
A VFS can be used to access local and network storage devices transparently, without the client application noticing the difference. It can be used to bridge the differences between Windows, Mac and Unix file systems, so that applications can access files on local file systems of those types without having to know what type of file system they are accessing. NFS, on the other hand, like many other protocols, builds on the Open Network Computing Remote Procedure Call (ONC RPC) system. NFS is an open standard, defined in Requests for Comments (RFCs), allowing anyone to implement the protocol.
"VFS" is the name given to the entire layer in the kernel situated between the system calls and the filesystem drivers; it is not a filesystem in its own right.

Is it possible to simulate Linux on USB devices using VMware?

I have successfully installed Red Hat Linux and run it from the hard drive using VMware. Things work quite smoothly if I put all the node VMs on my physical machine.
For management purposes, I want to use USB devices to store the ISOs and plug one in if more nodes are needed. I would like to run VMware on my physical machine.
Can I just build one virtual machine on one USB device, so I can plug in a node when needed?
I mean, if I run machine A on USB device 1 and machine B on USB device 2, can I build a network using my physical machine as the server?
(1) If so, are there problems I should pay attention to?
(2) If not, is there an alternative solution for my management purpose? (I do not want to put VMs on partitions of my physical machine right now.) Can I use multiple portable hard drives instead?
Actually, I want to set up a master/slave Hadoop 2.x deployment using virtual machines. Are there any good references for this purpose?
I should explain that I'm not too lazy to just try my idea; however, it would be rather expensive to do so without knowing anything about the feasibility of this approach first.
Thanks for your time.
I'm not an expert on VMware, but I know that this is common on almost any virtualization system. You can install a system:
on a physical device (a hard disk, a hard disk partition)
or on a file
The physical-device approach normally gives better performance, since only one driver sits between the OS and the device, while the file approach makes it simpler to add another VM.
Now for your questions:
Can I just build one virtual machine on one USB device? Yes, you can always do it on a file and, depending on the host OS, directly on the physical device.
... can I build a network using my physical machine as server? Yes, VMware will allow the VMs to communicate with each other and/or with the host and/or with the external world, depending on how you configure the network interfaces of your VMs.
If so, are there problems I should pay attention to?
USB devices are pluggable and unpluggable. If you inadvertently unplug one while the OS is active, bad things could happen. That's why I advised you to use files on the hard disk to host your VMs.
Memory sticks (not a concern for USB hard disks) support a limited number of writes and generally perform poorly on writes. Never put a temp filesystem or swap there; use a memory filesystem for that usage instead, as is done for live systems running from a read-only CD or DVD (see the sketch after this list).
Every VM uses memory from the host system. That is often the first limit on the number of simultaneous VMs on a personal system.
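For the flash-wear point above, a minimal sketch of keeping scratch data in RAM inside the guest instead of on the USB stick (the size is just an example value):

    # mount /tmp as a RAM-backed tmpfs inside the guest
    mount -t tmpfs -o size=512m tmpfs /tmp
    # or make it permanent with a line like this in the guest's /etc/fstab:
    #   tmpfs  /tmp  tmpfs  defaults,size=512m  0  0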

Real-Time File Mirroring in Linux to a NAS

Can anyone tell me how I might best mirror selected files and folders to a NAS (Network-Attached Storage) box from a Linux workstation in real time?
These are very large files (> 50 GB) that are being continually modified, so I would like to transfer only those portions of the files that have been changed, added or deleted.
FYI: These files are actually VirtualBox virtual hard disk (VDI) files.
I discovered that my Synology DS211J NAS can run an rsync service. So I enabled that and used lsyncd for the live mirror... the VirtualBox VMs... all works very well.
Rsync only synchronises the parts of files that have changed, and so is very efficient at synchronising large files.
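A sketch of what that setup can look like, assuming the Synology exposes an rsync module named "backup" and the VMs live under /srv/vms (both names are examples, not from the original post):

    # one-shot sync: rsync's delta algorithm transfers only the changed parts
    # of the big VDI files; --inplace updates the copy on the NAS in place
    rsync -av --partial --inplace /srv/vms/ nas::backup/vms/
    # continuous mirror: lsyncd watches the tree with inotify and invokes
    # rsync automatically whenever something changes
    lsyncd -rsync /srv/vms nas::backup/vms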
Of the solutions that #awm mentioned, only DRBD provides block-level, real-time synchronization. The other tools will meet your goal of only propagating deltas, but they operate asynchronously. In fact, rsync will work just as well in this case, since you're not trying to provide bi-directional synchronization.
For DRBD to provide block-level replication, you need to install the DRBD kernel module and userspace tools on both the workstation and the NAS... which means this solution is only appropriate if your NAS is actually a fairly generic Linux box over which you have a great deal of control.
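If it is, a minimal sketch of bringing up a DRBD resource, assuming a spare partition /dev/sdb1 on each side; the hostnames and addresses are placeholders, and the full procedure is in the DRBD user guide:

    # on both nodes: define the resource
    cat > /etc/drbd.d/r0.res <<'EOF'
    resource r0 {
        on workstation {
            device    /dev/drbd0;
            disk      /dev/sdb1;
            address   192.168.1.10:7789;
            meta-disk internal;
        }
        on nas {
            device    /dev/drbd0;
            disk      /dev/sdb1;
            address   192.168.1.20:7789;
            meta-disk internal;
        }
    }
    EOF
    # on both nodes: create the metadata and bring the resource up
    drbdadm create-md r0
    drbdadm up r0
    # on the workstation only: promote it and let the initial sync run
    drbdadm primary --force r0
    mkfs.ext4 /dev/drbd0
    mount /dev/drbd0 /srv/vms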
Beforehand, I just want to suggest that you don't do this. You can easily bottleneck your network and NAS and cause all sorts of problems on your host.
That being said, these claim they can do it:
Unison can be found at: http://www.cis.upenn.edu/~bcpierce/unison/
Peer Software's PeerSync can do it too: http://www.peersoftware.com/products/peersync/peersyncserver/overview.aspx
Maybe - http://www.drbd.org/
