How to tell whether two NFS mounts are on the same remote filesystem? - linux

My Linux-based system displays statistics for NFS-mounted filesystems, something like this:
Remote Path Mounted-on Stats
server1:/some/path/name /path1 100 GB free
server2:/other/path/name /path2 100 GB free
Total: 200 GB free
That works fine. The problem is when the same filesystem on the NFS server has been mounted twice on my client:
Remote Path Mounted-on Stats
server1:/some/path/name /path1 100 GB free
server1:/some/path/name2 /path2 100 GB free
Total: 200 GB free
server1's /some/path/name and /some/path/name2 are actually on the same filesystem, which has 100 GB free, but I erroneously add them up and report 200 GB free.
Is there any way to detect that they're on the same partition?
Approaches that won't work:
"Use statfs()": statfs() returns a struct statfs, which has a "file system ID" field, f_fsid. Unfortunately it's undefined and gets zeroed out over NFS.
"Don't mount the same partion multiple times." This is outside of my control.
"Use a heuristic based on available space." The method has to definitively work. Also, statfs() caches its output so it would be difficult to get this right in the face of large data movement.
If there's no solution I'll have to generate a config file in every potential mount point on the server side, but it would be a lot nicer if there were some clean way to avoid that.
Thanks!

I guess if "stat -c %d /mountpoint" do what you want (I cannot test it right now)?

You probably want to read the remote system's list of shared file systems, using:
showmount -e server
That will give you the real paths that are being shared. When walking the mounts from that remote system, prune each remote path back to its exported root and use that to determine whether the mount points come from the same underlying file system.
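For example (a sketch; the server name is a placeholder and the output format may vary between NFS servers):
$ showmount -e server1                                    # "Export list for server1:" followed by one export per line
$ showmount -e server1 | tail -n +2 | awk '{print $1}'    # just the exported paths, without the client list
Then, for each locally mounted remote path, trim it back to the longest matching export; mounts that fall under the same export are on the same filesystem.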
This doesn't help you in the case that the file systems are separately shared from the same underlying file system.
You could add a heuristic: check the overall file system size and available space, and assume that if both match and the mounts come from the same remote server, they are on the same partition, mapped to the shortest common path of the mount devices.
None of these help if you share from a loopback mounted file system that looks completely different in form from the others.
It doesn't help you when the same server can be addressed by different names or addresses.

Related

Linux: where exactly is a file saved when there are multiple logical volumes?

I've mostly worked in Windows environments and am still very noobish in everything Linux, so it's very likely I'm missing basic Linux concepts. That being said, I have questions about logical volumes and their interactions with files:
I have to use an Ubuntu machine (which I did not set up). On this machine, there is a physical volume /dev/sda2 which is in a volume group vg0.
That volume group vg0 has 4 logical volumes: lv1, mounted on /; lv2, mounted on /boot; lv3, mounted on /var; and lv4, mounted on /tmp.
My questions are as follows:
If I save a file (for example foo.txt) in the /var directory, will it be stored on the lv3 (/var) logical volume?
If the lv3 (/var) logical volume is full and I try to save foo.txt in the /var directory, will it be stored on the lv1 (/) logical volume (after all, /var is in /)?
If the lv1 (/) volume is full and I try to save foo.txt somewhere outside of /var (for example in /home), will it be stored on the lv3 (/var) logical volume?
What could be the point of having all these logical volumes? Would one volume on / not be much simpler?
It's quite obvious from my questions that I don't really get the relations between logical volumes, mount points and files. Is there a good tutorial somewhere where I could educate myself?
Thanks in advance.
Yes; because lv3 is mounted on /var, any files put in /var go there.
No, there are no special cases when the device is full - you just get a "device is full" error. Although /var appears to be a child of /, that has been overridden by mounting lv3 on /var.
No, again because there are no special cases for a full device. It doesn't care; it just tries to put the file where it belongs.
Yes, it is much simpler to have it all in /, but it can cause problems. For example, /boot is often its own volume so that downloading a bunch of stuff into your home folder can't fill it up and prevent your system from working. There are different schools of thought on how much or how little you should split your file system into separate volumes. It is somewhat a matter of opinion, but those opinions are based on various use cases and problems.
I don't have a great answer other than to use the search engine of your choice! Honestly, when you are starting out it doesn't matter much, as long as you have space to put your stuff. If you are a newbie, it might be good to put everything in one volume - as long as you keep an eye on it and don't let it fill up.
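As a concrete way to see which logical volume a given path actually lands on (a sketch; the example paths are just illustrations):
$ df -h /var        # shows the filesystem backing /var, e.g. /dev/mapper/vg0-lv3
$ df -h /home       # /home is not a separate mount in this setup, so this reports the / filesystem (lv1)
$ findmnt -T /var/log    # prints the mount point and source device that back any given path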

Trouble getting the file count or file list of a large mounted drive

I have a shared drive with more than 2 million WMV files, around 2 TB in total. I am accessing the drive from my local Mac by mounting it over the SMB protocol. When I run
$ ls -a | wc -l
to check the total count of files, I get a different result every time: if one run returns X, the next run returns some other value Y. This should not happen, since no one else is accessing this drive. Investigating further, I found that the output of "ls" itself differs between runs. This command should work - I have been using it for a decade. Am I doing something wrong, or does this command fail on a large volume of data or on a network shared drive? I am sure there are no access or network issues while I am doing this. Any hint or workaround will be much appreciated.
I faced a similar issue when I tried to access a shared location with around 200K files. In my case the shared drive used the NTFS file system, and I believe there is a compatibility issue between the SMB protocol and NTFS. I finally mounted the shared drive using NFS instead of SMB and was able to get the correct number of files. This problem never happened on Windows, where I have mounted shares with far more files many times. Hope this helps.
It is most likely because the file list is not immediately available to OS X from the network share. Apple's implementation of SMB is still a bit buggy, unfortunately.
You may try: defaults write com.apple.desktopservices DSDontWriteNetworkStores -bool TRUE
And see if it helps.
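If you want to cross-check the count, counting with find instead of ls avoids including "." and ".." and lets you restrict the count to regular files (a sketch, run from inside the mounted share):
$ find . -maxdepth 1 -type f | wc -l    # regular files directly in this directory
$ find . -type f | wc -l                # include files in subdirectories as well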

Should content-addressable paths be used in ext4 or btrfs for directories?

I tested this by comparing the speed of reading a file from a directory with 500,000 files and a directory with just 100 files.
The result: Both were equally fast.
Test details:
I created a directory with 500,000 files using "for x in {1..500000}; do touch $x; done", ran "time cat test-dir/some-file", and compared this to another directory with just 100 files.
Both executed equally fast, but maybe under heavy load there is a difference - or are ext4 and btrfs clever enough that we no longer need content-addressable paths?
With content-addressable paths I could distribute the 500,000 files into multiple subdirectories like this:
/www/images/persons/a/1/john.png
/www/images/persons/a/2/henrick.png
....
/www/images/persons/b/c/frederick.png
...
The 500,000 files are served via nginx to user agents, so I want to avoid latency - but maybe that is no longer relevant with ext4 or btrfs?
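For reference, here is one way to generate the kind of two-level layout shown above from a hash of the file name (a sketch; the use of md5sum and this exact layout are just one possible convention):
$ name=john.png
$ hash=$(printf '%s' "$name" | md5sum | cut -c1-2)     # first two hex characters of the hash
$ dir=/www/images/persons/${hash:0:1}/${hash:1:1}      # two single-character directory levels
$ mkdir -p "$dir" && mv "$name" "$dir/"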
Discussing this question elsewhere, the answer seems to be that for read operations you don't need to implement content-addressable storage, because today's filesystems don't iterate over a lookup table: the filesystem locates the file directly.
With ext4, the only limit you run into is the number of inodes.
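If inodes are the concern, a quick check (a sketch; /www is assumed to sit on the filesystem in question):
$ df -i /www    # shows inode totals, used and free for the filesystem holding /www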

Linux 2.6.43, ext3, 10K RPM SAS disk: 2 sequential writes (direct I/O) on different files acting like random writes

I am recently stuck on this problem:
"Two sequential writes (direct I/O, 4 KB aligned blocks) to different files act like random writes, which yields poor write performance on a 10K RPM SAS disk."
What confuses me most: I have a batch of servers, all equipped with the same kind of disks (RAID 1 with two 300 GB 10K RPM disks), but they respond differently.
Several servers seem fine with this write pattern; the disks happily accept up to 50+ MB/s
(same kernel version, same filesystem, but a different libc (2.4)).
Others not so much; 100 ops/s seems to be the limit of the underlying disk, which matches the disk's random-write performance
(same kernel version, same filesystem, but a different libc (2.12)).
[NOTE: I checked the "pwrite" code of the different libc versions; it is nothing but a simple syscall.]
I have managed to rule out these possibilities:
1. A software bug in my own program:
verified with a simple daemon (compiled with no dynamic linking) that does sequential direct I/O writes;
2. A disk problem:
I switched between two different Linux installations on one test machine, which performed well with my direct I/O write pattern, and a couple of days after switching back to the old libc version, the bad random-write behaviour returned.
I tried to compare:
/sys/block/sda/queue/*, which may differ between the two setups;
filefrag, which shows nothing unusual: the two files' physical block IDs grow sequentially, interleaved with each other.
There must be some kind of write strategy leading to this problem, but I don't know where to start:
A different kernel setting, maybe related to how ext3 allocates disk blocks?
The RAID cache (write-back) or the disk cache's write strategy?
Or the underlying disk's strategy for mapping logical blocks to real physical blocks?
Any help is really appreciated.
THE ANSWER IS:
It is because of the /sys/block/sda/queue/scheduler setting:
MACHINE A: the displayed scheduler is cfq, but the underlying behaviour is deadline;
MACHINE B: the scheduler is consistently cfq.
=>
Since my server is a DB server, deadline is the best option for me.
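For anyone checking or changing the scheduler themselves, something like this works at runtime (a sketch; sda is assumed):
$ cat /sys/block/sda/queue/scheduler                        # the scheduler shown in brackets is the active one
$ echo deadline | sudo tee /sys/block/sda/queue/scheduler   # switch to deadline (lasts until the next reboot)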

IntelliJ IDEA compilation speedup in Linux

I'm working with IntelliJ IDEA on Linux, and I recently got 16 GB of RAM. Are there any ways to speed up my project compilation using this memory?
First of all, in order to speed up IntelliJ IDEA itself, you may find this discussion very useful.
The easiest way to speed up compilation is to move the compilation output to a RAM disk.
RAM disk setup
Open fstab
$ sudo gedit /etc/fstab
(instead of gedit you can use vi or whatever you like)
Set up RAM disk mount point
I'm using RAM disks in several places in my system, and one of them is /tmp, so I'll just put my compile output there:
tmpfs /tmp tmpfs defaults 0 0
In this case your filesystem size will not be bounded, but that's OK; my /tmp size right now is 73 MB. But if you are afraid that the RAM disk will grow too big, you can limit its size, e.g.:
tmpfs /tmp tmpfs defaults,size=512M 0 0
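If you don't want to reboot, you can activate the new fstab entry right away with something like this (note that mounting tmpfs over /tmp hides whatever is currently stored there until it is unmounted):
$ sudo mount /tmp    # mount picks up the options from the new /etc/fstab entry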
Project setup
In IntelliJ IDEA, open Project Structure (Ctrl+Alt+Shift+S by default), then go to Project - 'Project compiler output' and move it to RAM disk mount point:
/tmp/projectName/out
(I've added a projectName folder so I can find it easily if I need to get there, or if I'm working with several projects at the same time.)
Then go to Modules, and in all your modules go to Paths and select 'Inherit project compile output path'; or, if you want to use a custom compile output path, modify 'Output path' and 'Test output path' the same way you changed the project compiler output above.
That's all, folks!
P.S. A few numbers: time of my current project compilation in different cases (approx):
HDD: 80s
SSD: 30s
SSD+RAM: 20s
P.P.S. If you use an SSD, besides the compilation speedup you will reduce write operations on the disk, so it will also help your SSD live happily ever after ;)
Yes, you can. There are several ways to do this. First, you can fine-tune the JVM for the amount of memory you have; take this https://gist.github.com/zafarella/43bc260c3c0cdc34f109 as an example.
In addition, depending on which Linux distribution you use, there is a way to create a RAM disk and rsync its contents to the HDD. Basically, you place all logs and temp files (including indexes) in RAM, and your IDEA will fly.
Use something like profile-sync-daemon to keep the files synced; it is easy to add IDEA as an app. Alternatively, you can use anything-sync-daemon.
You need to change "idea.system.path" and "idea.log.path".
More details on IDEA settings can be found in their docs. The idea is to move whatever changes often into RAM.
More RAM disk alternatives: https://wiki.debian.org/SSDOptimization#Persistent_RAMDISK
The downside of this solution is that when you run out of space in RAM, the OS will start paging, which will slow everything down.
Hope that helps.
In addition to the RAM disk approach, you might speed up compilation by giving the compiler process more memory (but not too much) and compiling independent modules in parallel. Both options can be found under Settings | Compiler.
