I have a Debian Wheezy computer running a PostgreSQL server and NO NFS filesystems.
After rebooting the computer, the following error has appeared:
ls: cannot access 0000: Stale NFS file handle
516439 drwx------ 2 postgres postgres 8 Nov 12 20:25 .
516480 drwx------ 3 postgres postgres 4096 Nov 17 17:08 ..
? ?????????? ? ? ? ? ? 0000
The "/var/lib/postgresql/9.1/main/pg_notify/0000" file is STALE and I cannot remove it or do anything at all with it. In order to get rid of that file, I tried the following options:
Rebooting the computer in order to unmount the filesystem (as suggested in several forums) did not work.
Removing postgresql (apt-get remove --purge) did not do anything at all either.
Trying to manually remove that file does not work either (Stale NFS file handle).
This directory is part of a JFS partition on an encrypted volume managed by LVM.
The output of fsck is:
fsck.jfs version 1.1.15, 04-Mar-2011
processing started: 11/17/2014 20:22:30
Using default parameter: -p
The current device is: /
ujfs_rw_diskblocks: read 0 of 4096 bytes at offset 32768
ujfs_rw_diskblocks: read 0 of 4096 bytes at offset 61440
Superblock is corrupt and cannot be repaired
since both primary and secondary copies are corrupt.
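(For reference, this is roughly how the underlying device could be identified and the check re-run against it from a rescue system; the LVM device name below is only a placeholder.)
# find the device backing the PostgreSQL data directory
df /var/lib/postgresql/9.1/main/pg_notify
# from a rescue environment, force a full check of the unmounted JFS volume
fsck.jfs -f /dev/mapper/vg-root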
Output for ls -l:
ls -l /var/lib/postgresql/9.1/main/pg_notify/0000
I would like to know...
Why do I have a problem with an NFS handle on a non-NFS partition?
Is there any way in which I can get rid of that file (workarounds are more than welcome as well)?
I am trying to use rsync to back up some data from one computer (Pop!_OS 21.04) to another (Rocky 8.4). But no matter which flags I use with rsync, file permissions and ownership never seem to be preserved.
What I do, is run this command locally on PopOS:
sudo rsync -avz /home/user1/test/ root@192.168.10.11:/root/ttt/
And the result I get is something like this:
[root@rocky_clone0 ~]# ls -ld ttt/
drwxrwxr-x. 2 user23 user23 32 Dec 17 2021 ttt/
[root@rocky_clone0 ~]# ls -l ttt/
total 8
-rw-rw-r--. 1 user23 user23 57 Dec 17 2021 test1
-rw-rw-r--. 1 user23 user23 29 Dec 17 2021 test2
So all the file ownership changes to user23, which is the only regular user on Rocky. I don't understand how this happens: with rsync I am connecting as root on the remote host, but the resulting files end up owned by user23. Why doesn't the -a flag work properly in this case?
I have also tried these flags:
sudo rsync -avz --rsync-path="sudo -u user23 rsync -a" /home/user1/test root@192.168.10.11:/home/user23/rrr
This command couldn't copy to the root directory, so I had to change the remote destination to user23's home folder. But the result is the same.
If someone could explain to me what I am doing wrong, and how to back up files with rsync so that permissions and ownership stay the same as on the local computer, I would very much appreciate it.
Have a look at how the target filesystem is mounted on the Rocky (target) system.
Some mounted filesystems (such as many FUSE mounts) do not support the classical Unix permissions, and simply use the name of the user who mounted the filesystem as owner/group.
Any attempt to chown/chmod/etc (either by you or by rsync) will just silently be ignored, but appear to "succeed" (no errors reported).
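A quick way to check is to inspect the filesystem type and mount options backing the destination directory on the Rocky machine, for example (the path is the one from the question):
# show the source device, filesystem type and mount options for the target path
findmnt -T /root/ttt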
What happens if I create a file using vim in the /dev directory? How will the file be created, given that /dev is not a standard file system? I can see a file being created, but the standard kernel file operation create was not called. Now I am not sure how this file was created by the kernel. Will it use some udev-bound kernel API to create this file?
Note: I can see the file in /dev after creation. Look at the ls output below.
crw-rw-rw- 1 root tty 5, 0 Aug 24 17:32 tty
-rw-r--r-- 1 root root 35 Aug 24 17:37 abc
-rw-r--r-- 1 root root 0 Aug 24 17:37 ght
-rw-r--r-- 1 root root 0 Aug 24 17:51 ioiu
I want to find this out to determine what will happen if some illegal software forcefully writes to the /dev directory. How can I find that out?
If you try this in macOS it won't work, even as root.
If you try it in CentOS 8 it will work if you're root.
On other Linux flavors your mileage may vary.
It is a very interesting directory that highlights one important aspect of the Linux filesystem - everything is a file or a directory.
Example
[root]# date > /dev/date
[root]# cat /dev/date
Tue Aug 24 19:13:04 UTC 2021
All that being said, your concern about nefarious software creating a file in this specific directory seems too specific. If the software has the ability to write to /dev it can write anywhere and hide in plain sight. If you're really concerned about this, install a file integrity monitoring (FIM) package to monitor file CRUD.
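For example, an ad-hoc watch with inotify-tools or a persistent audit rule with auditd could be set up roughly like this (the audit key name is arbitrary):
# ad-hoc: report every file created or deleted under /dev (requires inotify-tools)
inotifywait -m -e create -e delete /dev
# persistent: audit any write or attribute change under /dev (requires auditd)
auditctl -w /dev/ -p wa -k dev_watch
# later, review the matching audit events
ausearch -k dev_watch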
References
dev filesystem
I have a Linux (Raspbian) server:
$ uname -a
Linux hester 4.19.97-v7l+ #1294 SMP Thu Jan 30 13:21:14 GMT 2020 armv7l GNU/Linux
With two directories that have the same user/group/permissions:
$ ls -ld /mnt/storage/gitea/ /mnt/storage/hester/
drwxr-xr-x 2 nobody nogroup 26 Mar 2 10:20 /mnt/storage/gitea/
drwxr-xr-x 3 nobody nogroup 21 Feb 21 11:26 /mnt/storage/hester/
These two directories are exported with the same parameters in the exports file:
$ cat /etc/exports
/mnt/storage/hester 192.168.1.15(rw,sync,no_subtree_check)
/mnt/storage/gitea 192.168.1.15(rw,sync,no_subtree_check)
On another machine (the 192.168.1.15 mentioned in the exports file) I mount both, successfully:
$ mount /mnt/storage/gitea/
$ echo $?
0
$ mount /mnt/storage/hester/
$ echo $?
0
But now weird things happen:
$ ls -l /mnt/storage/
ls: cannot access '/mnt/storage/gitea': Stale file handle
total 0
d????????? ? ? ? ? ? gitea
drwxr-xr-x 3 nobody nogroup 21 Feb 21 11:26 hester
I really can't figure out
what the source of the error is, and above all
where I could look for a difference between the two.
I'm open to suggestions for further investigation or answers to my doubts. Thanks in advance for any useful input!
I finally found the solution, which was to explicitly add an fsid option in exports:
$ cat /etc/exports
/mnt/storage/hester 192.168.1.15(rw,sync,fsid=20,no_subtree_check)
/mnt/storage/gitea 192.168.1.15(rw,sync,fsid=21,no_subtree_check)
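For completeness, after editing /etc/exports the export table has to be reloaded on the server and the client has to remount to obtain fresh file handles; roughly:
# on the server: re-export everything defined in /etc/exports
sudo exportfs -ra
# verify which options (including fsid) each export actually uses
sudo exportfs -v
# on the client: remount the share to pick up the new file handle
sudo umount /mnt/storage/gitea && sudo mount /mnt/storage/gitea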
I'm not entirely sure as to the reason why this works. From the man page I get that "NFS needs to be able to identify each filesystem that it exports. Normally it will use a UUID for the filesystem (if the filesystem has such a thing) or the device number of the device holding the filesystem (if the filesystem is stored on the device)."
Both these mount points are on the same filesystem, so according to the man page they should get the same fsid, but then the same directory ends up being exported for both paths, so I think each export needs to have its own fsid.
One more note: /mnt/storage is an XFS filesystem over a RAID3, so this could also have confused NFS about the UUIDs of the devices.
From a Linux box, I recently mounted a Windows share using CIFS.
The intent was to "locally" use rsync to back up my Windows machine.
The command line used to mount the Windows drive is something like:
mount \\192.168.1.74\share /cifs1 -t cifs -o noserverino,iocharset=utf8,ro
Please also note that the drive attached to the Linux box is formatted NTFS.
When doing a sample backup, rsync was always re-copying the directories (names), but not the files. After looking more closely at the "ls -lh" output at both ends, I noticed that on the Linux side the size of the directory level is always 0:
TTT-Admin@1080-Router:/tmp/mnt/RT-1080/tmp# ls -lh
drwxrwxrwx 1 TTT-Admi root 0 Feb 8 12:14 DeltaCopy
but the size of the directory level on the cifs side was always different from 0:
TTT-Admin@1080-Router:/cifs1/temp/Rsync-Packages# ls -lh
drwxr-xr-x 1 TTT-Admi root 8.0K Feb 8 12:14 DeltaCopy
This difference explains why rsync was always re-copying the directories, but not the files (which was correct, the file sizes and time stamps being the same on both ends).
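For what it's worth, the individual fields can be compared directly with stat instead of relying on the size column (GNU and BusyBox stat may format this differently; the paths are the ones from the question):
# print name, modification time, permissions and size for both directory entries
stat -c '%n %y %a %s' /cifs1/temp/Rsync-Packages/DeltaCopy
stat -c '%n %y %a %s' /tmp/mnt/RT-1080/tmp/DeltaCopy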
EDIT: the rsync command is:
rsync -av /cifs1/Temp/Rsync-Packages/DeltaCopy /mnt/RT-1080/tmp/
Is a directory supposed to "have a size" or not? What should I do to resolve this discrepancy?
I have a VPS slice running CentOS 5.5. I am supposed to have 15 GB of disk space, but according to df it seems to double my disk space usage.
When I run du -skh * in / as root I get:
[root@yardvps1 /]# du -skh *
0 aquota.group
0 aquota.user
5.2M bin
4.0K boot
4.0K dev
4.9M etc
2.5G home
12M lib
14M lib64
4.0K media
4.0K mnt
299M opt
0 proc
692K root
23M sbin
4.0K selinux
4.0K srv
0 sys
48K tmp
2.0G usr
121M var
This is consistent with what I have uploaded to the machine, and adds up to about 5 GB.
But when I run df I get:
[root@yardvps1 /]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/simfs 15728640 11659048 4069592 75% /
none 262144 4 262140 1% /dev
It is showing almost 12 GB already in use.
What is causing this discrepancy, and is there anything I can do about it? I planned the server out based on 15 GB, but now it is basically only letting me have about 7 GB of stuff on it.
Thanks.
The most common cause of this effect is open files that have been deleted.
The kernel will only free the disk blocks of a deleted file if it is not in use at the time of its deletion. Otherwise that is deferred until the file is closed, or the system is rebooted.
A common Unix-world trick to ensure that no temporary files are left around is the following:
A process creates and opens a temporary file
While still holding the open file descriptor, the process unlinks (i.e. deletes) the file
The process reads and writes to the file normally using the file descriptor
The process closes the file descriptor when it's done, and the kernel frees the space
If the process (or the system) terminates unexpectedly, the temporary file is already deleted and no clean-up is necessary.
As a bonus, deleting the file reduces the chances of naming collisions when creating temporary files and it also provides an additional layer of obscurity over the running processes - for anyone but the root user, that is.
This behaviour ensures that processes don't have to deal with files that are suddenly pulled from under their feet, and also that processes don't have to consult each other in order to delete a file. It is unexpected behaviour for those coming from Windows systems, though, since there you are not normally allowed to delete a file that is in use.
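A minimal sketch of this pattern in shell (the temporary file name is just an example):
# open file descriptor 3 for read/write on a brand-new temporary file
exec 3<>/tmp/scratch.$$
# delete the name immediately; the data stays reachable through fd 3
rm /tmp/scratch.$$
echo "still writable" >&3
# only when fd 3 is closed does the kernel free the disk blocks
exec 3>&-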
The lsof command, when run as root, will show all open files and will specifically flag those that have been deleted:
# lsof 2>/dev/null | grep deleted
bootlogd 2024 root 1w REG 9,3 58 917506 /tmp/init.0W2ARi (deleted)
bootlogd 2024 root 2w REG 9,3 58 917506 /tmp/init.0W2ARi (deleted)
Stopping and restarting the guilty processes, or just rebooting the server, should solve this issue.
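If restarting the process is not convenient, the space held by a deleted-but-open file can often be reclaimed by truncating it through /proc (the PID and descriptor number here are taken from the lsof output above):
# truncate the still-open deleted file to zero bytes
: > /proc/2024/fd/1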
Deleted files could also be held open by the kernel if, for example, it's a mounted filesystem image. In this case unmounting the filesystem or rebooting the server should do the trick.
In your case, judging by the size of the "missing" space, I'd look for any references to the file that you used to set up the VPS, e.g. the CentOS DVD image that you deleted after installing.
Another case which I've come across, although it doesn't appear to be your issue, is mounting a partition "on top" of existing files.
If you do so, the files that already exist in the mount point directory are hidden by the mounted partition: they still occupy space on the underlying filesystem (so df counts them), but du can no longer see them.
To fix: stop any processes with open files on the mounted partition, unmount the partition, then find and move or remove any files that now appear in the mount point directory.
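One way to check for this without unmounting is to bind-mount the parent filesystem somewhere else, since a bind mount does not carry submounts along (the paths here are purely illustrative):
# expose the root filesystem without any of its submounts
mkdir /tmp/rootview
mount --bind / /tmp/rootview
# anything that shows up here under the mount point directory is hidden space
du -sh /tmp/rootview/mnt
# clean up afterwards
umount /tmp/rootview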
I had the same trouble with a FreeBSD server. A reboot helped.