Gluster changes file metadata after upgrade when attempting to read a file - glusterfs

We are testing upgrades of clusters with a Number of Bricks: 2 x 2 = 4 configuration from Gluster 5 to Gluster 9 (CentOS 7).
In most cases (around 90%) we see the problem below.
After upgrading the first node we trigger a manual heal and wait until everything is healed.
But if we then try to access files on the upgraded node (ls -l), the number of entries pending heal goes up.
Looking at the details of the affected files on the upgraded and non-upgraded nodes, we see the following:
On the upgraded node, the AFR xattr has the metadata bit set and trusted.glusterfs.mdata is present:
fa05.sg05# getfattr -d -m . -e hex /data1/gluster/20200922-upg-004/many_files/562_4.log; stat /data1/gluster/20200922-upg-004/many_files/562_4.log
getfattr: Removing leading '/' from absolute path names
# file: data1/gluster/20200922-upg-004/many_files/562_4.log
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.shared-client-1=0x000000000000000100000000
trusted.gfid=0x3bf30f76e2c94a67a0878bea1bd8db97
trusted.gfid2path.a9c4d7d45775ec60=0x36343866643935342d653534622d343762302d386631392d6339333636306561303838632f3536325f342e6c6f67
trusted.glusterfs.mdata=0x01000000000000000000000000632c3c3c000000000d6470f900000000632c3c3c000000000d552e6800000000632c3c3
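For reference, as I understand the AFR changelog format, the trusted.afr.* value is three 32-bit big-endian counters (data / metadata / entry), so the value above shows exactly one pending metadata change:
trusted.afr.shared-client-1 = 0x 00000000 00000001 00000000
                                 data=0   metadata=1 entry=0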
The file's metadata looks like this:
File: ‘/data1/gluster/20200922-upg-004/many_files/562_4.log’
Size: 4096 Blocks: 8 IO Block: 4096 regular file
Device: 820h/2080d Inode: 29306144911 Links: 2
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2022-09-22 10:43:08.221686087 +0000
Modify: 2022-09-22 10:43:08.223686248 +0000
Change: 2022-09-22 10:44:18.690378905 +0000
Birth: -
On the non-upgraded node everything is OK (no metadata changes at all).
We see that the Change field differs on the upgraded node, and the change time there corresponds to the time when the ls -l operation was performed:
fa05.sg05# date; ls -l /shared/20200922-upg-004/many_files/ | wc -l; date
Thu Sep 22 10:44:17 UTC 2022
10229
Thu Sep 22 10:44:35 UTC 2022
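For reference, the heal-pending count mentioned above is what heal info reports; a minimal way to watch it (assuming the volume is named shared, as the trusted.afr.shared-client-1 xattr and the /shared mount point suggest) is:
fa05.sg05# gluster volume heal shared info | grep "Number of entries"    # per-brick pending-heal counts
fa05.sg05# gluster volume heal shared info summary                       # condensed per-brick summary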
We are not sure what the reason for such a metadata update can be, or how we can avoid it.
This leads to healing of a huge number of files and high I/O load on the upgraded node.
Details and logs are here: https://github.com/gluster/glusterfs/issues/3829
Thanks in advance!

Related

Linux Named Pipe Mounted on Docker Volume Showing as Regular File

I am trying to use a named pipe to run certain commands from a dockerised guest application to the host.
I am aware of the risks and this is not public facing, so please no comments about not doing this.
I have a named pipe configured on the host using:
sudo mkfifo -m a+rw /path/to/pipe/file
When I check the created pipe permissions with ls -la file, it shows the pipe has been created and intended permissions are set.
prw-rw-rw- 1 root root 0 Feb 2 11:43 file
When I then test the input by catting a command into the pipe from the host, this runs successfully.
Input
echo "echo test" > file
Output
[!] Starting listening on named pipe: file
test
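For context, the "[!] Starting listening on named pipe" line comes from a small listener loop running on the host; a minimal sketch of such a listener (not my exact script, paths are placeholders) is:
#!/bin/sh
# run each line written into the FIFO as a command on the host
PIPE=/path/to/pipe/file
echo "[!] Starting listening on named pipe: $PIPE"
while true; do
    # read blocks until a writer opens the FIFO and sends a line
    if read -r cmd < "$PIPE"; then
        eval "$cmd"
    fi
done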
The problem appears to be within my Docker container. I have created a volume and mounted the named pipe from the host. When I then start an sh session and run ls -l, however, the named pipe appears as a normal file, without the p type flag and the permissions it has on the host.
/hostpipe # ls -la
total 12
drwxr-xr-x 2 root root 4096 Feb 1 16:25 .
drwxr-xr-x 1 root root 4096 Feb 2 11:44 ..
-rw-r--r-- 1 root root 11 Feb 2 11:44 file
Running the same echo "echo test" > file (and similar commands) does not work from within the guest.
The host is a Linux desktop running on bare metal.
Linux desktop 5.15.0-58-generic #64-Ubuntu SMP Thu Jan 5 11:43:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
And the guest is an Alpine image
FROM python:3.8-alpine
and
Linux b16a4357fcf5 5.15.0-58-generic #64-Ubuntu SMP Thu Jan 5 11:43:13 UTC 2023 x86_64 Linux
Any idea what is going wrong here?
The issue was how the container was being set up. I was using a regular named volume, which is meant for persisting data, not for mounting host drives and files. I had to change my volume definition to use type: bind.
Using volumes without the bind type does not allow use of host filesystem functionality (such as named pipes); it only allows data sharing.
Before
volumes:
  - static_data:/vol/static
  - ./web:/web
  - /opt/named_pipes/:/hostpipe
After
volumes:
  - static_data:/vol/static
  - ./web:/web
  - type: bind
    source: /opt/named_pipes/
    target: /hostpipe
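With the bind mount in place, the pipe shows up as a FIFO inside the container again, which can be verified with something like:
/hostpipe # ls -la file        # should now show the leading 'p' type flag (prw-rw-rw-)
/hostpipe # test -p file && echo "file is a named pipe"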

How to change a File's last Access/Modify/Change Date?

I have some files on a scratch drive of an HPC server. The server automatically deletes files which are 2 weeks old.
Using stat filename.txt I can see the following information. Is there a way to somehow open/touch/manipulate files to update the Access date and prevent deletion, without actually changing the file?
File: ‘name’
Size: 2583438768 Blocks: 4819945 IO Block: 524288 regular file
Device: xxh/xxxd Inode: 10354xxxx Links: 1
Access: (/-rw-r--r--) Uid: (/) Gid: (/)
Context: system_u:object_r:tmp_t:s0
Access: 2022-11-22 09:47:33.000000000 -0800
Modify: 2019-12-06 06:50:33.000000000 -0800
Change: 2022-11-22 16:54:55.000000000 -0800
Birth: -
Use the Linux touch command, e.g.:
$ touch filename.txt
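A few hedged variations, in case you want to control which timestamp changes or refresh a whole directory tree (the /scratch/project path is just a placeholder):
$ touch -a filename.txt                                # update only the access time
$ touch -m filename.txt                                # update only the modification time
$ find /scratch/project -type f -exec touch -a {} +    # refresh access times for every file in a tree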

Why does ls say "file exists"?

In case it matters: I stumbled over this problem when backing up a directory using rsync in a Cygwin environment, and rsync suddenly gave the error message:
rsync: readdir("/cygdrive/d/portable/FirefoxPortable/Data/profile/storage/default/moz-extension+++4c6d0e71-68ce-470e-87aa-8db1a3f6524d^userContextId=4294967295/idb"): File exists (17)
Here, /cygdrive/d/portable/FirefoxPortable is the directory to be saved, and until now, this has worked fine. Suspecting that the real problem is not related to rsync, I did a
ls /cygdrive/d/portable/FirefoxPortable/Data/profile/storage/default/moz-extension+++4c6d0e71-68ce-470e-87aa-8db1a3f6524d^userContextId=4294967295/idb
and indeed got the error message
ls: reading directory '/cygdrive/d/portable/FirefoxPortable/Data/profile/storage/default/moz-extension+++4c6d0e71-68ce-470e-87aa-8db1a3f6524d^userContextId=4294967295/idb': File exists
So idb is a directory (which is true), because ls says that it is reading this directory, but why do I get a File exists error for a directory?
ls -ld /cygdrive/d/portable/FirefoxPortable/Data/profile/storage/default/moz-extension+++4c6d0e71-68ce-470e-87aa-8db1a3f6524d^userContextId=4294967295/idb
yields
drwxr-xr-x 1 FISRONA Domain Users 0 May 6 2019 '/cygdrive/d/portable/FirefoxPortable/Data/profile/storage/default/moz-extension+++4c6d0e71-68ce-470e-87aa-8db1a3f6524d^userContextId=4294967295/idb'
and
stat /cygdrive/d/portable/FirefoxPortable/Data/profile/storage/default/moz-extension+++4c6d0e71-68ce-470e-87aa-8db1a3f6524d^userContextId=4294967295/idb
displayed:
File: /cygdrive/d/portable/FirefoxPortable/Data/profile/storage/default/moz-extension+++4c6d0e71-68ce-470e-87aa-8db1a3f6524d^userContextId=4294967295/idb
Size: 0 Blocks: 0 IO Block: 65536 directory
Device: 5a61dfech/1516363756d Inode: 12873190524118251466 Links: 1
Access: (0755/drwxr-xr-x) Uid: (3672028/ FISRONA) Gid: (1049089/Domain Users)
Access: 2019-05-06 11:32:50.000000000 +0200
Modify: 2019-05-06 11:32:50.190000000 +0200
Change: 2019-05-06 11:32:50.190000000 +0200
Birth: 2019-05-06 11:32:50.190000000 +0200
What could be messed up here to cause this behaviour?
BTW, I also checked the parent directory (because sometimes, differences in upper/lower case filenames can result in odd effects under Cygwin, due to the underlying Windows operating system):
ls -l /cygdrive/d/portable/FirefoxPortable/Data/profile/storage/default/moz-extension+++4c6d0e71-68ce-470e-87aa-8db1a3f6524d^userContextId=4294967295/
total 0
drwxr-xr-x 1 FISRONA Domain Users 0 May 6 2019 idb

How to find out the creation time of a filesystem in Linux

I want to know the creation time of a filesystem in Linux.
You can use the tune2fs tool to get the creation time:
sudo tune2fs -l /dev/sda1
In the output, some of the lines are:
Filesystem created: Mon Apr 4 15:07:44 2016
Last mount time: Mon Dec 12 14:48:51 2016
Last write time: Mon Dec 12 14:48:50 2016
This command works on ext2 through ext4 filesystems. It is not supported on XFS. See man tune2fs for details.
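If you are not sure which filesystem type a device or mount uses before reaching for tune2fs, you can check first, for example (the device name is only an example):
df -T /               # filesystem type of the mount backing /
lsblk -f /dev/sda1    # filesystem type, label and UUID of a specific device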

Postgresql 'main/pg_notify/0000': Stale NFS file handle

I have a Debian Wheezy computer running a PostgreSQL server and NO NFS filesystems.
After rebooting the computer, the following error has appeared:
ls: cannot access 0000: Stale NFS file handle
516439 drwx------ 2 postgres postgres 8 Nov 12 20:25 .
516480 drwx------ 3 postgres postgres 4096 Nov 17 17:08 ..
? ?????????? ? ? ? ? ? 0000
The "/var/lib/postgresql/9.1/main/pg_notify/0000" file is STALE and I cannot remove it or do anything at all with it. In order to get rid of that file, I tried the following options:
Rebooting the computer in order to unmount the filesystem (as suggested in several forums) did not work.
Removing PostgreSQL (apt-get --purge) did not do anything at all either.
Trying to manually remove that file does not work either (Stale NFS file handle).
This directory is part of a JFS partition over a ciphered volume managed by LVM.
The output for the fsck:
fsck.jfs version 1.1.15, 04-Mar-2011
processing started: 11/17/2014 20:22:30
Using default parameter: -p
The current device is: /
ujfs_rw_diskblocks: read 0 of 4096 bytes at offset 32768
ujfs_rw_diskblocks: read 0 of 4096 bytes at offset 61440
Superblock is corrupt and cannot be repaired
since both primary and secondary copies are corrupt.
Output for ls -l:
ls -l /var/lib/postgresql/9.1/main/pg_notify/0000
I would like to know...
Why do I have a problem with an NFS handle in a non-NFS partition?
Is there any way I can get rid of that file (workarounds are more than welcome)?
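For completeness, this is how I would confirm which filesystem actually backs that path (and that no NFS mount is involved):
df -T /var/lib/postgresql/9.1/main/pg_notify      # device and filesystem type backing the path
stat -f /var/lib/postgresql/9.1/main/pg_notify    # filesystem type as reported by the kernel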
