Check if dm multipath paths had previous errors like powerpath - rhel

We recently switched storage vendors and are now bound to DM Multipath for multipath management.
Does DM Multipath have a feature to show whether any of the paths had a previous error?
In PowerPath you could see whether any paths had errors since the last reboot/cleanup: there is a column where the errors are displayed. Like so (the output is from the Windows version, but it does not differ from the RHEL version we are using):
Pseudo name=harddisk1
Unity ID=CK0000000000001 [HOST1]
Logical device ID=123ABC123ABC123 [HOST1]
state=alive; policy=CLAROpt; queued-IOs=0
Owner: default=SP B, current=SP B Array failover mode: 4
==============================================================================
--------------- Host --------------- - Stor - -- I/O Path -- -- Stats ---
### HW Path I/O Paths Interf. Mode State Q-IOs Errors
==============================================================================
1 port1\path0\tgt1\lun28 c1t0d0 SP B2 active alive 0 1
1 port1\path0\tgt0\lun28 c1t1d0 SP A2 active alive 0 1
0 port0\path0\tgt1\lun28 c0t1d0 SP B3 active alive 0 1
0 port0\path0\tgt0\lun28 c0t0d0 SP A3 active alive 0 1
Is there a similar output for multipath, or do you need to go through all of the system logging?
I've been searching the internet for the past week, but it seems like there is none.
Thank you in advance.

multipath -ll gives similar output:
# multipath -ll
eui.006a1354d146f94124a9376b00011010 dm-2 NVME, NoName Vendor
size=11G features='3 queue_if_no_path queue_mode mq' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=50 status=active
|- 0:2:2:69648 nvme0n2 259:3 active ready running
|- 1:5:2:69648 nvme1n2 259:12 active ready running
|- 10:4:2:69648 nvme10n2 259:4 active ready running
|- 11:1:2:69648 nvme11n2 259:13 active ready running
...
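As far as the listing above shows, multipath -ll reports the current state of each path and has no Errors column. A minimal sketch, assuming the usual "H:C:T:L device major:minor dm-state checker-state online-state" layout of a path line, that prints only the paths that are not "active ready running" (the function name failed_paths is made up for illustration):

```shell
# Print "map device state" for every path that is not fully healthy.
# Reads `multipath -ll` output on stdin. This reflects the CURRENT state
# only; it is not a history of past errors.
failed_paths() {
    awk '
        # Map header line, e.g. "mpathf (3600...) dm-2 HITACHI,OPEN-V*12"
        / dm-[0-9]+ / { map = $1; next }
        {
            for (i = 1; i <= NF; i++) {
                # Locate the H:C:T:L field; the tree prefix before it varies.
                if ($i ~ /^[0-9]+:[0-9]+:[0-9]+:[0-9]+$/) {
                    state = $(i + 3) " " $(i + 4) " " $(i + 5)
                    if (state != "active ready running")
                        print map, $(i + 1), state
                }
            }
        }
    '
}
```

Usage would be something like multipath -ll | failed_paths, run from cron or a monitoring check. For errors that happened in the past and have since cleared, the system log is still the place to look.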

Related

Fluent Bit not saving any data on filesystem

I am new to fluent bit and currently doing a POC. I tried multiple things but couldn't make Fluent Bit save any data to filesystem.
[SERVICE]
flush 1
daemon Off
log_level trace
parsers_file parsers.conf
plugins_file plugins.conf
http_server on
http_listen 0.0.0.0
http_port 2020
storage.metrics on
storage.path /var/log/fluent-bit/buffer
storage.max_chunks_up 4
storage.sync full
storage.backlog.mem_limit 1M
[INPUT]
name cpu
tag cpu.local
# Read interval (sec) Default: 1
interval_sec 1
[INPUT]
name exec
tag d-disk
command df -h --type=ext4 | grep -v Filesystem
interval_sec 1
interval_nsec 0
[INPUT]
name mem
tag memory
interval_sec 1
[OUTPUT]
name stdout
match memory
When I go to /var/log/fluent-bit/buffer and run ls -a I see nothing.
My aim is to make Fluent Bit save data on disk.
You have to specify explicitly which buffering mechanism each input should use.
Try adding storage.type filesystem to your INPUT sections.
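As a sketch of that suggestion, this is the mem input from the question with the extra property added. It assumes the [SERVICE] section keeps the storage.path shown in the question:

```
[INPUT]
name mem
tag memory
interval_sec 1
# Buffer this input's chunks on disk under storage.path
storage.type filesystem
```

With flush set to 1 and a fast output such as stdout, chunk files may exist only briefly before they are delivered and removed, so the buffer directory can still look empty at any given moment.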

monitoring linux server sockets or files

I have the famous "SocketException: Too many open files" bug.
I am running an Apache HTTP server, a Tomcat server and a MySQL database on my server.
I checked the limit of open files with ulimit -n, which gave me 1024.
When I check how many files are open with lsof -u tomcat, it gives me 5,
and the same for mysql. I am not sure what the problem is, but I also get a "readlink: permission denied" message.
I want to monitor the socket connections and open files on my server. I thought about running the Linux commands described above in a shell script and mailing the results to myself.
The other option I can think of is using netstat and counting the connections, but it loads very slowly and gives me "getnameinfo failed".
What would be the better command to monitor the bug I have?
EDIT:
SHOW GLOBAL STATUS LIKE '%open%';
Variable_name Value
Com_ha_open 0
Com_show_open_tables 0
Open_files 8
Open_streams 0
Open_table_definitions 87
Open_tables 64
Opened_files 673
Opened_table_definitions 87
Opened_tables 628
Slave_open_temp_tables 0
SHOW GLOBAL VARIABLES LIKE '%open%';
Variable_name Value
have_openssl DISABLED
innodb_open_files 300
open_files_limit 2000
table_open_cache 64
SHOW GLOBAL VARIABLES LIKE '%connect%'
character_set_connection latin1
collation_connection latin1_swedish_ci
connect_timeout 10
init_connect
max_connect_errors 10
max_connections 400
max_user_connections 0
SHOW GLOBAL STATUS LIKE '%connect%';
Variable_name Value
Aborted_connects 1
Connections 35954
Max_used_connections 102
Ssl_client_connects 0
Ssl_connect_renegotiates 0
Ssl_finished_connects 0
Threads_connected 11
You can check the ulimit values with 'ulimit -a' to determine the open-files capacity.
From the OS command prompt, run ulimit -n 8192 to allow more open files dynamically. This only affects the current shell and the processes started from it.
To make this change persist across OS restarts, the following URL can be your guide:
https://glassonionblog.wordpress.com/2013/01/27/increase-ulimit-and-file-descriptors-limit/
Their example uses a capacity of 500000; use 8192 for your system.
Suggestions to consider for your my.cnf [mysqld] section,
thread_cache_size=100 # to support your max_used_connections of 102
max_user_connections=400 # from 0 to match max_connections requested
table_open_cache=800 # from 64 to reduce Opened_tables count
innodb_open_files=800 # from 300 to match table_open_cache requested
Implementing these details should avoid the 'too many open files' message.
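To monitor the problem itself, a minimal sketch that reads /proc directly instead of going through lsof or netstat. It assumes a Linux system with /proc mounted; the function names fd_count, fd_limit and fd_report are made up for illustration. Run it as root, because a process owned by another user cannot be inspected otherwise (which is one common cause of "permission denied" from lsof).

```shell
# Count the open file descriptors (files and sockets) of one process.
fd_count() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Print the soft "Max open files" limit that applies to a running process.
# This can differ from what `ulimit -n` prints in your own shell.
fd_limit() {
    awk '/^Max open files/ { print $4 }' "/proc/$1/limits" 2>/dev/null
}

# Print "count limit pid command" for each process of a user, busiest first.
fd_report() {
    for pid in $(pgrep -u "$1"); do
        printf '%s %s %s %s\n' \
            "$(fd_count "$pid")" "$(fd_limit "$pid")" "$pid" \
            "$(cat "/proc/$pid/comm" 2>/dev/null)"
    done | sort -rn
}
```

Calling fd_report tomcat from cron and mailing the output would show whether the count creeps toward the limit over time.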

I can't unpresent LUN (SAN) devices from a server

I have a 22 TB LUN from SAN storage (HITACHI) on my Linux server (CentOS 6.7).
I configured multipath for this LUN, and now I want to remove it.
The storage team detached the LUN from my server, but when I run "multipath -ll" it still exists.
mpathf (360060e801667af00000167af0000014b) dm-2 HITACHI,OPEN-V*12
size=22T features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=enabled
|- 3:0:0:3 sdf 8:80 failed faulty running
`- 3:0:1:3 sdn 8:208 failed faulty running
This entry stays until I reboot the server, and I can't reboot all of my servers because they are in a production environment.
Does anybody know what I should do?
Thanks
First you need to be sure the mpathf device is not being used:
lsof | grep mpathf
dmsetup info mpathf | grep -i open
In the dmsetup info output, the open count needs to be equal to 0.
If you are using the LUN with LVM or anything else, you need to remove everything from the LUN first.
Now you can delete the underlying disks with echo 1 > /sys/block/<x>/device/delete
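The steps above could be wrapped roughly as follows. This is a sketch, not a tested procedure: it adds multipath -f to flush the map before the paths are deleted, which the answer does not mention, and the names remove_lun and DRY_RUN are made up. By default it only prints the commands.

```shell
# Flush a multipath map, then delete its underlying SCSI path devices.
# Usage: remove_lun MAP DEV [DEV...]   e.g. remove_lun mpathf sdf sdn
# DRY_RUN defaults to 1 (print only); set DRY_RUN=0 to really run as root.
remove_lun() {
    map=$1
    shift
    run() {
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "$*"
        else
            "$@"
        fi
    }
    # Remove the device-mapper map; this fails if the map is still open.
    run multipath -f "$map" || return 1
    # Tell the kernel to drop each path device.
    for dev in "$@"; do
        run sh -c "echo 1 > /sys/block/$dev/device/delete"
    done
}
```

For the map in the question, remove_lun mpathf sdf sdn prints the three commands. Check the open count first, as described above, before running it with DRY_RUN=0.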

linux unzip excluding everything in the folder and underneath

Hi, I have to unzip a file that may contain a directory, and I want to exclude everything within that directory. I tried a lot of options and looked here as well, but I can't seem to find a good solution.
Please note that the depth of the EXCLUDE folder is unknown, but we have to exclude everything in it.
These are the contents of the zip file:
$unzip -l patch2.zip
Archive: patch2.zip
Length Date Time Name
--------- ---------- ----- ----
0 2013-10-29 17:42 EXCLUDE/
0 2013-10-29 17:24 EXCLUDE/inner/
0 2013-10-29 17:24 EXCLUDE/inner/inner1.txt
0 2013-10-29 15:45 EXCLUDE/file.txt
0 2013-10-29 15:44 patch.jar
0 2013-10-29 15:44 system.properties
--------- -------
0 6 files
I tried this command, which only excludes the files directly inside the folder, but not the folder itself and its subfolders:
$unzip -l patch2.zip -x EXCLUDE/*
Archive: patch2.zip
Length Date Time Name
--------- ---------- ----- ----
0 2013-10-29 17:42 EXCLUDE/
0 2013-10-29 17:24 EXCLUDE/inner/
0 2013-10-29 17:24 EXCLUDE/inner/inner1.txt
0 2013-10-29 15:44 patch.jar
0 2013-10-29 15:44 system.properties
--------- -------
0 5 files
Thanks for the help.
You need to quote the exclude pattern so that it is passed to unzip. Otherwise it will be expanded by the shell before being passed to unzip.
Try:
unzip patch2.zip -x "EXCLUDE/*"
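The difference can be seen without unzip at all. The sketch below assumes that an EXCLUDE directory already exists in the working directory (for example from an earlier extraction), which would be consistent with the listing in the question, where only EXCLUDE/file.txt was excluded. The temporary directory is created only for the demonstration.

```shell
# Recreate a working directory that already contains EXCLUDE/.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/EXCLUDE/inner"
touch "$demo_dir/EXCLUDE/file.txt"
cd "$demo_dir" || exit 1

# Unquoted: the shell replaces the pattern with the matching names on disk,
# so unzip would receive those names instead of the pattern.
unquoted=$(printf '%s ' EXCLUDE/*)

# Quoted: the pattern reaches the program unchanged, and unzip can match it
# against every path stored in the archive.
quoted=$(printf '%s ' "EXCLUDE/*")

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

If no EXCLUDE directory exists on disk, most shells pass the unquoted pattern through unchanged, which is why the unquoted form sometimes appears to work.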
@dogbane's answer is right.
But I'll still add another (I hope) interesting option, as you are on Linux:
mc
(aka: Midnight Commander)
Start it, and then: in the right panel, navigate to where you want your files to end up; in the left panel, navigate "inside" the ZIP file, and at that first level select and copy the things you need (i.e. select all, then unselect the EXCLUDE folder, for example).
mc is VERY flexible and nice to use, especially to tar/untar/zip/move/delete/rename files... (on Windows, an equivalent is Total Commander, and I use its "synchronise" option very often to keep backups and originals in sync). It allows you to navigate archives as if they were uncompressed (trying to limit the actual decompression to just the "navigating" part so you don't uncompress them twice).

Re-scan LUN on Linux

We have expanded an existing LUN on EMC storage, and now I want to re-scan it on the host side, but I don't know how to figure out the SCSI ID of that specific LUN. I am new to storage. This is what I am doing, but I don't know whether it is the right way or not:
Pseudo name=emcpowerj
CLARiiON ID=APM00112500570 [Oracle_Cluster]
Logical device ID=200601602E002900B6BCA114C9F8E011 [LUN01]
state=alive; policy=CLAROpt; priority=0; queued-IOs=0;
Owner: default=SP A, current=SP A Array failover mode: 1
==============================================================================
--------------- Host --------------- - Stor - -- I/O Path -- -- Stats ---
### HW Path I/O Paths Interf. Mode State Q-IOs Errors
==============================================================================
2 qla2xxx sdaj SP A1 active alive 0 1
2 qla2xxx sdaw SP B1 active alive 0 4
1 qla2xxx sdj SP A0 active alive 0 1
1 qla2xxx sdw SP B0 active alive 0 4
Here I am running the find command on the sdX devices to find out the SCSI ID, so that I can do echo 1 > /sys/bus/scsi/devices/X:X:X:X/rescan to re-scan the LUN:
$ find /sys/devices -name "*block*" | grep -e "sdaj" -e "sdaw" -e "sdj" -e "sdw"
/sys/devices/pci0000:00/0000:00:09.0/0000:05:00.1/host2/rport-2:0-1/target2:0:1/**2:0:1:8**/block:sdaw
/sys/devices/pci0000:00/0000:00:09.0/0000:05:00.1/host2/rport-2:0-0/target2:0:0/**2:0:0:8**/block:sdaj
/sys/devices/pci0000:00/0000:00:09.0/0000:05:00.0/host1/rport-1:0-1/target1:0:1/**1:0:1:8**/block:sdw
/sys/devices/pci0000:00/0000:00:09.0/0000:05:00.0/host1/rport-1:0-0/target1:0:0/**1:0:0:8**/block:sdj
Or is there an alternative way to re-scan the LUN?
I like to use the "lsscsi" program, which is probably available for your distribution.
% lsscsi
[0:0:0:0] cd/dvd NECVMWar VMware IDE CDR00 1.00 /dev/sr0
[2:0:0:0] disk VMware, VMware Virtual S 1.0 /dev/sda
[2:0:1:0] disk VMware, VMware Virtual S 1.0 /dev/sdb
As for rescanning the bus, that's pretty much it.
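The lookup and the rescan can also be done straight from sysfs, without find. A minimal sketch, assuming the standard /sys/block/<dev>/device link; the function names and the SYSFS_ROOT override are hypothetical and exist only so the sketch can be tried against a fake tree:

```shell
# Print the SCSI address (H:C:T:L) behind a block device such as sdj.
scsi_addr() {
    root=${SYSFS_ROOT:-/sys}
    [ -e "$root/block/$1/device" ] || return 1
    # The "device" link points at the SCSI device directory named H:C:T:L.
    basename "$(readlink -f "$root/block/$1/device")"
}

# Ask the kernel to re-read every given path of a LUN (needs root).
rescan_paths() {
    root=${SYSFS_ROOT:-/sys}
    for dev in "$@"; do
        echo "rescanning $dev ($(scsi_addr "$dev"))"
        echo 1 > "$root/block/$dev/device/rescan"
    done
}
```

For the LUN in the question this would be rescan_paths sdaj sdaw sdj sdw, since every path of the LUN has to be rescanned, not just one of them.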
