How long does a gluster volume heal take? - glusterfs

I removed one brick 3 days ago and added it back right away.
Then I used this command to heal the gluster volume:
gluster volume heal k8s_share full
But the volume is still healing now.
How long should the gluster volume heal take?
My Volume Info
Replicas : 3
Used : 60GB
[root@k8s-worker-3 kubedata]# gluster volume heal k8s_share info
Brick 192.168.XX.X1:/mnt/gluster/k8s_share
/kubedata/monitoring-prometheus-k8s-db-prometheus-k8s-0-pvc-710aef84-9c25-48ad-a116-0a9e2565d471/prometheus-db/01GGTPRP9R4K7X9YGMVN7QEZXG/chunks/000001
/kubedata/monitoring-prometheus-k8s-db-prometheus-k8s-0-pvc-710aef84-9c25-48ad-a116-0a9e2565d471/prometheus-db/01GGVS373PRM6ZXC81R3V773BW/chunks/000001
/kubedata/monitoring-prometheus-k8s-db-prometheus-k8s-0-pvc-710aef84-9c25-48ad-a116-0a9e2565d471/prometheus-db/01GGVBBRKRH5RYA384GHHGWS65/tombstones
...
Status: Connected
Number of entries: 66
Brick 192.168.XX.X2:/mnt/gluster/k8s_share
/kubedata/monitoring-prometheus-k8s-db-prometheus-k8s-0-pvc-710aef84-9c25-48ad-a116-0a9e2565d471/prometheus-db/01GGVS373PRM6ZXC81R3V773BW/tombstones
/kubedata/monitoring-prometheus-k8s-db-prometheus-k8s-0-pvc-710aef84-9c25-48ad-a116-0a9e2565d471/prometheus-db/wal/00003918
/kubedata/monitoring-prometheus-k8s-db-prometheus-k8s-0-pvc-710aef84-9c25-48ad-a116-0a9e2565d471/prometheus-db/01GGVBBRKRH5RYA384GHHGWS65/index
/kubedata/monitoring-prometheus-k8s-db-prometheus-k8s-0-pvc-710aef84-9c25-48ad-a116-0a9e2565d471/prometheus-db/01GGTPRP9R4K7X9YGMVN7QEZXG/tombstones
/kubedata/monitoring-prometheus-k8s-db-prometheus-k8s-0-pvc-710aef84-9c25-48ad-a116-0a9e2565d471/prometheus-db/01GGVS373PRM6ZXC81R3V773BW/chunks
...
Status: Connected
Number of entries: 65
Brick 192.168.XX.X3:/mnt/gluster/k8s_share
Status: Connected
Number of entries: 0
[root@k8s-worker-3 kubedata]# gluster volume status k8s_share
Status of volume: k8s_share
Gluster process                              TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.XX.X1:/mnt/gluster/k8s_share   49152     0          Y       2023
Brick 192.168.XX.X2:/mnt/gluster/k8s_share   49152     0          Y       2124
Brick 192.168.XX.X3:/mnt/gluster/k8s_share   49153     0          Y       16283
Self-heal Daemon on localhost                N/A       N/A        Y       16328
Self-heal Daemon on 192.168.XX.X2            N/A       N/A        Y       26467
Self-heal Daemon on 192.168.XX.X1            N/A       N/A        Y       20903
Task Status of Volume k8s_share
------------------------------------------------------------------------------
There are no active volume tasks
I want all replicas to be in sync.
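A way to check whether the heal is actually making progress, rather than just watching the entry list (a sketch; statistics heal-count and info summary need a reasonably recent Gluster release):
# number of entries still pending per brick; this should shrink over time
gluster volume heal k8s_share statistics heal-count
# per-brick summary of pending, split-brain and currently-healing entries
gluster volume heal k8s_share info summary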

Related

USB2CAN qdisc buffer full and no requeues

Hi, I'm trying to connect my Linux VM to a physical CAN bus.
The USB passthrough and the setup of the CAN interface are working perfectly fine, but I have trouble sending messages from the VM.
First of all, here is my VM version and hardware:
user@usb-can:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.1 LTS
Release: 22.04
Codename: jammy
USB2CAN adapter and Documentation
http://www.inno-maker.com/product/usb-can/
https://github.com/INNO-MAKER/usb2can/blob/master/Document/USB2CAN%20UserManual%20v.1.8.pdf
So first of all, if I send 15 CAN messages from my VM to the CAN interface with cansend can0 123#DEADBEEF, only the first 2-3 messages are registered and shown when I do a candump can0:
user@usb-can:~$ candump can0
can0 123 [4] DE AD BE EF
can0 123 [4] DE AD BE EF
can0 123 [4] DE AD BE EF
However, the remaining 12 are not sent, and when I send additional frames I get:
user@usb-can:~$ cansend can0 123#DEADBEEF
write: No buffer space available
So I found out that I could inspect the buffer, and it showed this:
user@usb-can:~$ tc -s qdisc show dev can0
qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 144 bytes 9 pkt (dropped 3, overlimits 0 requeues 1)
backlog 176b 11p requeues 1
This locks up the whole device and I can't send anything because packets get dropped.
However, this is with nothing attached to the adapter, so I assume that is normal? Maybe somebody with knowledge of USB-to-CAN devices, or with their own device, can verify this?
Since there is no termination resistor, it would make sense that it's not working properly.
BUT when I connect a termination resistor of 120 Ohm and use the jumper to enable the 120 Ohm resistor in the adapter, I should have the 2 required termination resistors and thus be able to send the CAN frames. But I get the same error as before:
user@usb-can:~$ tc -s qdisc show dev can0
qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 80 bytes 5 pkt (dropped 0, overlimits 0 requeues 1)
backlog 176b 11p requeues 1
So in my mind the CAN network looks like this (the left 120 Ohm is the adapter's internal termination enabled by the jumper, the right one is the external resistor):

CAN HIGH ----*--------------*--------------*-----
             |              |              |
         _________      _________      _________
        | 120 Ohm |    | usb2can |    | 120 Ohm |
         ---------      ---------      ---------
             |              |              |
CAN LOW -----*--------------*--------------*-----
Do I need to add another device to the network to make it work or shouldn't it work like that?
I already tried different termination resistors in case one of them was broken, and I also tried attaching an additional device, but no success yet.
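One way to recover once the TX backlog is stuck is to bounce the interface (a sketch; the 500000 bitrate is only an example and has to match your bus). Note that a lone CAN node never gets its frames acknowledged, so with nothing else active on the bus a filling queue is expected:
# drop the stuck frames by taking the interface down
sudo ip link set can0 down
# optionally give the interface a larger software TX queue
sudo ip link set can0 txqueuelen 100
# bring it back up; the bitrate must match the other nodes on the bus
sudo ip link set can0 up type can bitrate 500000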

Gluster volume sizes became different after changing volume type

I have a 4-node GlusterFS cluster (Gluster 3.12) whose volume type was originally "Distributed-Replicated". I changed it to "Replicated" by running the following commands.
gluster volume remove-brick gv0 node{3..4}:/bricks/gv0/brick1 start
gluster volume remove-brick gv0 node{3..4}:/bricks/gv0/brick1 status
gluster volume add-brick gv0 replica 4 node{3..4}:/bricks/gv0/brick1
After that, "gluster volume info" tells me that I now have a 4-way replicated volume.
Volume Name: gv0
Type: Replicate
Volume ID: 23baed0a-9853-462d-a992-019c31ed4ab2
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: node3:/bricks/gv0/brick1
Brick2: node4:/bricks/gv0/brick1
Brick3: node1:/bricks/gv0/brick1
Brick4: node2:/bricks/gv0/brick1
But when I checked the sizes of the bricks on each server, they are all different. The 2 bricks that I removed have less data than the other 2 bricks that were retained.
[root@node1 ~]# du -sh /bricks/gv0/brick1/
2.1M /bricks/gv0/brick1/
[root@node1 ~]#
...
[root@node2 ~]# du -sh /bricks/gv0/brick1/
2.1M /bricks/gv0/brick1/
[root@node2 ~]#
...
[root@node3 ~]# du -sh /bricks/gv0/brick1/
5.8M /bricks/gv0/brick1/
[root@node3 ~]#
...
[root@node4 ~]# du -sh /bricks/gv0/brick1/
5.8M /bricks/gv0/brick1/
[root@node4 ~]#
I discovered that on the re-added bricks there were files that only appear once you access them via "ls". After doing that for all the missing files, the sizes of all bricks are now the same.
Is there a way to achieve that without forcing file access?
And why did I encounter this scenario of differing brick sizes?
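One way to avoid touching every file by hand (a sketch; it assumes the self-heal daemon is running for the volume) is to let Gluster crawl and heal the bricks itself:
# trigger a full crawl so files missing on the re-added bricks are healed without manual access
gluster volume heal gv0 full
# check what is still pending
gluster volume heal gv0 info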

Docker service process zombie holding a Device Mapper managed device

I'm running RHEL (kernel 4.1.12) with Docker (1.12.1), and my docker service dockerd became a zombie, [dockerd] <defunct>, with PID 412:
# ps -a | grep dockerd
1 412 412 412 ? -1 Zsl 0 23:28 [dockerd] <defunct>
and it holds resources, in particular device 251:4 (/dev/dm-4), which is managed by Device Mapper:
# dmsetup ls
docker-251:0-6815748-pool (251:1)
docker-251:0-6815748-e97dd950.......59a691feaf6 (251:4)
# lsof | grep 251,4
dockerd 412 6844 root 1257u BLK 251,4 0t0 2439769 /dev/dm-4
As a result, removing the thin entry docker-251:0-6815748-e97dd950.......59a691feaf6 fails:
# dmsetup remove docker-251:0-6815748-e97dd950.......59a691feaf6
device-mapper: remove ioctl on docker-251:0-6815748-e97dd950.......59a691feaf6 failed: Device or resource busy
Command failed
Any suggestions on how to clean up the leftovers of the docker service (e.g., all the DM entries) besides restarting the whole system?
Is it really possible that a zombie process holds resources?
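Before tearing anything down, it may help to confirm who should be reaping the zombie and what is really keeping the device open (a sketch using the PID and device node from the output above):
# parent process that should reap the defunct dockerd
ps -o ppid= -p 412
# open counts for all device-mapper devices
dmsetup info -c
# processes (or threads) still holding /dev/dm-4
lsof /dev/dm-4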

Map lvm volume to Physical volume

lsblk provides output in this format:
NAME                        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sr0                            11:0  1  1024M  0 rom
sda                             8:0  0   300G  0 disk
  sda1                          8:1  0   500M  0 part /boot
  sda2                          8:2  0 299.5G  0 part
    vg_data1-lv_root (dm-0)   253:0  0    50G  0 lvm  /
    vg_data2-lv_swap (dm-1)   253:1  0   7.7G  0 lvm  [SWAP]
    vg_data3-LogVol04 (dm-2)  253:2  0  46.5G  0 lvm
    vg_data4-LogVol03 (dm-3)  253:3  0  97.7G  0 lvm  /map1
    vg_data5-LogVol02 (dm-4)  253:4  0  97.7G  0 lvm  /map2
sdb                            8:16  0    50G  0 disk
For a mounted volume, say /map1, how do I directly get the physical volume associated with it? Is there any direct command to fetch the information?
There is no direct command to show that information for a mount. You can run
lvdisplay -m
Which will show which physical volumes are currently being used by the logical volume.
Remember, though, that there is no such thing as a direct association between a logical volume and a physical volume. Logical volumes are associated with volume groups. Volume groups have a pool of physical volumes over which they can distribute any logical volume. If you always want to know that a given LV is on a given PV, you have to restrict the VG to only having that one PV, which rather misses the point. You can use pvmove to push extents off a PV (sometimes useful for maintenance), but you can't stop new extents being created on it if logical volumes are extended or created.
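For example, evacuating a PV looks roughly like this (a sketch; /dev/sdb1 and /dev/sdc1 are hypothetical PV names):
# move all allocated extents off /dev/sdb1, letting LVM choose destination PVs in the same VG
pvmove /dev/sdb1
# or move them onto one specific PV
pvmove /dev/sdb1 /dev/sdc1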
As to why there is no such potentially useful command...
LVM is not ZFS. ZFS is a complete storage and filesystem management system, managing both storage (at several levels of abstraction) and the mounting of filesystems. LVM, in contrast, is just one layer of the Linux storage stack. It provides a layer of abstraction on top of physical storage devices and makes no assumptions about how the logical volumes are used.
Leaving the grep/awk/cut/whatever to you, this will show which PVs each LV actually uses:
lvs -o +devices
You'll get a separate line for each PV used by a given LV, so if an LV has extents on three PVs you will see three lines for that LV. The PV device node path is followed by the starting extent (I think) of the data on that PV in parentheses.
I need to emphasize that there is no direct relation between a mountpoint (logical volume) and a physical volume in LVM. This is one of its design goals.
However, you can traverse the associations between the logical volume, the volume group and the physical volumes assigned to that group. This only tells you that the data is stored on one of those physical volumes, but not where exactly.
I couldn't find a command which produces the output directly, but you can put something together using mount, lvdisplay, vgdisplay and awk/sed:
mp=/mnt; vgdisplay -v $(lvdisplay $(mount | awk -vmp="$mp" '$3==mp{print $1}') | awk '/VG Name/{print $3}')
I'm using the shell variable mp to pass the mount point to the command. (You need to execute the command as root or using sudo.)
For my test-scenario it outputs:
...
--- Volume group ---
VG Name vg1
System ID
Format lvm2
Metadata Areas 2
Metadata Sequence No 2
VG Access read/write
VG Status resizable
...
VG Size 992.00 MiB
PE Size 4.00 MiB
Total PE 248
Alloc PE / Size 125 / 500.00 MiB
Free PE / Size 123 / 492.00 MiB
VG UUID VfOdHF-UR1K-91Wk-DP4h-zl3A-4UUk-iB90N7
--- Logical volume ---
LV Path /dev/vg1/testlv
LV Name testlv
VG Name vg1
LV UUID P0rgsf-qPcw-diji-YUxx-HvZV-LOe0-Iq0TQz
...
Block device 252:0
--- Physical volumes ---
PV Name /dev/loop0
PV UUID Qwijfr-pxt3-qcQW-jl8q-Q6Uj-em1f-AVXd1L
PV Status allocatable
Total PE / Free PE 124 / 0
PV Name /dev/loop1
PV UUID sWFfXp-lpHv-eoUI-KZhj-gC06-jfwE-pe0oU2
PV Status allocatable
Total PE / Free PE 124 / 123
If you only want to display the physical volumes you might pipe the results of the above command to sed:
above command | sed -n '/--- Physical volumes ---/,$p'
dev=$(df /map1 | tail -n 1|awk '{print $1}')
echo $dev | grep -q ^/dev/mapper && lvdisplay -m $dev 2>/dev/null | awk '/Physical volume/{print $3}' || echo $dev
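A similar two-step version using findmnt instead of df (a sketch; /map1 is the mount point from the question):
# resolve the mount point to its device node, e.g. /dev/mapper/vg_data4-LogVol03
dev=$(findmnt -n -o SOURCE /map1)
# print the PV(s) the logical volume actually sits on
lvdisplay -m "$dev" | awk '/Physical volume/{print $3}'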

Re-scan LUN on Linux

We have expanded an existing LUN on EMC storage, and now I want to re-scan it on the host side, but I don't know how to figure out the SCSI ID of that specific LUN. I am new to storage. This is what I am doing, but I don't know whether it is the right way or not:
Pseudo name=emcpowerj
CLARiiON ID=APM00112500570 [Oracle_Cluster]
Logical device ID=200601602E002900B6BCA114C9F8E011 [LUN01]
state=alive; policy=CLAROpt; priority=0; queued-IOs=0;
Owner: default=SP A, current=SP A Array failover mode: 1
==============================================================================
--------------- Host --------------- - Stor - -- I/O Path -- -- Stats ---
### HW Path I/O Paths Interf. Mode State Q-IOs Errors
==============================================================================
2 qla2xxx sdaj SP A1 active alive 0 1
2 qla2xxx sdaw SP B1 active alive 0 4
1 qla2xxx sdj SP A0 active alive 0 1
1 qla2xxx sdw SP B0 active alive 0 4
Here I am running a find command on the sdX devices to find out the SCSI IDs so I can do echo 1 > /sys/bus/scsi/devices/X:X:X:X/rescan to re-scan the LUN:
$ find /sys/devices -name "*block*" | grep -e "sdaj" -e "sdaw" -e "sdj" -e "sdw"
/sys/devices/pci0000:00/0000:00:09.0/0000:05:00.1/host2/rport-2:0-1/target2:0:1/**2:0:1:8**/block:sdaw
/sys/devices/pci0000:00/0000:00:09.0/0000:05:00.1/host2/rport-2:0-0/target2:0:0/**2:0:0:8**/block:sdaj
/sys/devices/pci0000:00/0000:00:09.0/0000:05:00.0/host1/rport-1:0-1/target1:0:1/**1:0:1:8**/block:sdw
/sys/devices/pci0000:00/0000:00:09.0/0000:05:00.0/host1/rport-1:0-0/target1:0:0/**1:0:0:8**/block:sdj
Or is there an alternative or better way to re-scan the LUN?
I like to use the "lsscsi" program, which is probably available for your distribution.
% lsscsi
[0:0:0:0] cd/dvd NECVMWar VMware IDE CDR00 1.00 /dev/sr0
[2:0:0:0] disk VMware, VMware Virtual S 1.0 /dev/sda
[2:0:1:0] disk VMware, VMware Virtual S 1.0 /dev/sdb
As for rescanning the bus, that's pretty much it.
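With the H:C:T:L IDs from the find output above, re-reading the size of each path of the resized LUN looks like this (a sketch; run it for every path, then let PowerPath/multipath pick up the new size):
# re-read the capacity on each SCSI path of the LUN
echo 1 > /sys/bus/scsi/devices/2:0:0:8/rescan
echo 1 > /sys/bus/scsi/devices/2:0:1:8/rescan
echo 1 > /sys/bus/scsi/devices/1:0:0:8/rescan
echo 1 > /sys/bus/scsi/devices/1:0:1:8/rescan
The rescan-scsi-bus.sh script from sg3_utils (with its resize option) can automate this.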
