I am trying to create a dm-cache device in a virtual machine. I have three disks labelled Cache_disk, Device_to_Cache and meta_data at /dev/sdb, /dev/sdc and /dev/sdd respectively. To create the cache, I run the command:
dmsetup create my_cache --table '0 16775168 cache /dev/sdd1 /dev/sdb1 /dev/sdc1 512 1 writeback default 0'
as instructed on the dm-cache documentation page.
I have enabled dm-cache in the kernel, but I get this error:
device-mapper: reload ioctl failed: Invalid or incomplete multibyte or wide character
command failed
Looking at dmesg, the device-mapper cache metadata fails its sb_check:
root@msali014-VirtualBox:/home/msali014# dmesg
[ 5432.738603] device-mapper: cache-policy-mq: version 1.0.0 loaded
[ 5432.794852] device-mapper: cache metadata: sb_check failed: magic 0: wanted 1623043
[ 5432.794862] device-mapper: block manager: superblock validator check failed for block 0
[ 5432.794867] device-mapper: cache metadata: couldn't read lock superblock
[ 5432.797952] device-mapper: table: 252:0: cache: Error creating metadata object
the /var/log/syslog is similar:
Jun 28 11:17:01 msali014-VirtualBox CRON[2935]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jun 28 11:33:08 msali014-VirtualBox kernel: [ 5432.738603] device-mapper: cache-policy-mq: version 1.0.0 loaded
Jun 28 11:33:08 msali014-VirtualBox kernel: [ 5432.794852] device-mapper: cache metadata: sb_check failed: magic 0: wanted 1623043
Jun 28 11:33:08 msali014-VirtualBox kernel: [ 5432.794862] device-mapper: block manager: superblock validator check failed for block 0
Jun 28 11:33:08 msali014-VirtualBox kernel: [ 5432.794867] device-mapper: cache metadata: couldn't read lock superblock
Jun 28 11:33:08 msali014-VirtualBox kernel: [ 5432.797952] device-mapper: table: 252:0: cache: Error creating metadata object
Jun 28 11:33:08 msali014-VirtualBox kernel: [ 5432.797960] device-mapper: ioctl: error adding target to table
Jun 28 11:33:08 msali014-VirtualBox udevd[619]: inotify_add_watch(6, /dev/dm-0, 10) failed: No such file or directory
Jun 28 11:33:08 msali014-VirtualBox udevd[619]: inotify_add_watch(6, /dev/dm-0, 10) failed: No such file or directory
How can I change the value of sb->magic to make dm-cache successfully load? Any help would be greatly appreciated.
The multibyte or wide character error message is troubling to me, and I don't have any direct advice for working around it.
I assume /dev/sdd1 and /dev/sdb1 are your metadata and data storage block devices? Do they contain any data?
Have you tried zeroing out the metadata volume (dd if=/dev/zero of=/dev/sdd1)? I had issues with this when setting up dm-cache a while back.
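A minimal sketch of checking the metadata device before zeroing it, assuming the cache target formats fresh metadata when it finds the first 4 KiB block all zero (so wiping that one block should be enough). The helper name `first_block_is_zero` is made up for illustration; /dev/sdd1 in the usage comment is the asker's metadata partition.

```shell
# Succeed if the first 4096 bytes of the given file or device are all zero.
# (Hypothetical helper; not part of dmsetup or the kernel tooling.)
first_block_is_zero() {
    # Drop every NUL byte and count what is left; 0 means the block was blank.
    leftover=$(dd if="$1" bs=4096 count=1 2>/dev/null | tr -d '\000' | wc -c)
    [ "$leftover" -eq 0 ]
}

# Usage (destructive, run as root, double-check the device name first):
#   first_block_is_zero /dev/sdd1 || dd if=/dev/zero of=/dev/sdd1 bs=4096 count=1
```

Checking first makes it obvious whether stale data on the metadata partition is a plausible cause before anything is overwritten.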
In a nutshell I do the following (on Ubuntu 13.04 + Linux 3.10 release):
dmsetup create ssd-metadata --table '0 19370 linear /dev/disk/by-id/scsi-SATA_OCZ-AGILITY2_f2d200034-part6 0'
dmsetup create ssd-blocks --table '0 189008982 linear /dev/disk/by-id/scsi-SATA_OCZ-AGILITY2_f2d200034-part6 19370'
dmsetup create home-cached --table '0 1048576000 cache /dev/mapper/ssd-metadata /dev/mapper/ssd-blocks /dev/vg0/spindle 512 1 writeback default 0'
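The cache table line above can be assembled from its parts, which makes the meaning of each field easier to see. This is only a sketch: `cache_table` is a hypothetical helper, and the trailing `512 1 writeback default 0` is copied from the commands above rather than tuned for any particular setup.

```shell
# Print a dm-cache table line (hypothetical helper for illustration).
# Arguments: origin length in 512-byte sectors, metadata dev, cache dev, origin dev.
cache_table() {
    origin_sectors=$1 metadata_dev=$2 cache_dev=$3 origin_dev=$4
    # 512 = cache block size in sectors (256 KiB); "1 writeback" = one feature
    # argument; "default 0" = default policy with zero policy arguments.
    printf '0 %s cache %s %s %s 512 1 writeback default 0\n' \
        "$origin_sectors" "$metadata_dev" "$cache_dev" "$origin_dev"
}

# Usage (run as root; blockdev --getsz reports the size in 512-byte sectors):
#   dmsetup create home-cached \
#       --table "$(cache_table "$(blockdev --getsz /dev/vg0/spindle)" \
#           /dev/mapper/ssd-metadata /dev/mapper/ssd-blocks /dev/vg0/spindle)"
```

Taking the length from `blockdev --getsz` avoids hard-coding a sector count that does not match the origin device.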
On a side note, I ran 3.9.6 and a few earlier 3.9 kernels without issue on Ubuntu 12.10 and 13.04.
If all else fails, my blog has a working solution for my setup with more details; you might want to check it out for a step-by-step tutorial.
I had the exact same problem:
[ 968.960618] device-mapper: cache metadata: sb_check failed: blocknr 985712174465152: wanted 0
Writing zeros to my metadata device fixed the problem:
dd bs=64k if=/dev/zero of=/dev/md1
Thanks @Kyle.
I have been getting this error on my Manjaro Linux machine; here is some more info:
-- Journal begins at Mon 2021-03-08 18:37:49 EET, ends at Tue 2021-03-09 16:21:19 EET. --
Mar 09 11:02:26 manjaro kernel: tpm_crb MSFT0101:00: can't request region for resource [mem 0xcfbb6000-0xcfbb9fff]
Mar 09 11:02:29 manjaro kernel: kfd kfd: STONEY not supported in kfd
Mar 09 11:02:32 manjaro systemd-backlight[1332]: Failed to get backlight or LED device 'backlight:acpi_video0': No such device
Mar 09 11:02:32 manjaro systemd[1]: Failed to start Load/Save Screen Backlight Brightness of backlight:acpi_video0.
Subject: A start job for unit systemd-backlight@backlight:acpi_video0.service has failed
Defined-By: systemd
Support: https://forum.manjaro.org/c/support
A start job for unit systemd-backlight@backlight:acpi_video0.service has finished with a failure.
The job identifier is 1354 and the job result is failed.
Mar 09 11:02:32 manjaro systemd-backlight[1333]: Failed to get backlight or LED device 'backlight:acpi_video1': No such device
Mar 09 11:02:32 manjaro systemd[1]: Failed to start Load/Save Screen Backlight Brightness of backlight:acpi_video1.
Subject: A start job for unit systemd-backlight@backlight:acpi_video1.service has failed
Defined-By: systemd
Support: https://forum.manjaro.org/c/support
A start job for unit systemd-backlight@backlight:acpi_video1.service has finished with a failure.
The job identifier is 1360 and the job result is failed.
I don't know if the kfd error is happening because of the first error.
I would like to know what it actually means, where it is coming from, and how I can fix it.
And maybe a word on the systemd-backlight@backlight:acpi_video1.service error.
The setup I have:
CPU:
AMD A9-9420 RADEON R5, 5 COMPUTE CORES 2C+3G, 2586 MHz
GPU:
ATI Stoney [Radeon R2/R3/R4/R5 Graphics]
4GB RAM, 250GB SSD
OS: Linux manjaro 5.9.16-1-MANJARO #1 SMP PREEMPT Mon Dec 21 22:00:46 UTC 2020 x86_64 GNU/Linux
Configuration:
The NFS server and the k8s cluster (a single-node cluster) run on two machines with the same OS and NFS software, as below:
[root@test-2 ~]# yum info nfs-utils
Failed to set locale, defaulting to C
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: mirrors.tuna.tsinghua.edu.cn
* extras: mirrors.bfsu.edu.cn
* updates: mirrors.huaweicloud.com
Installed Packages
Name : nfs-utils
Arch : x86_64
Epoch : 1
Version : 1.3.0
Release : 0.68.el7
Size : 1.1 M
Repo : installed
From repo : base
Summary : NFS utilities and supporting clients and daemons for the kernel NFS server
URL : http://sourceforge.net/projects/nfs
License : MIT and GPLv2 and GPLv2+ and BSD
Description : The nfs-utils package provides a daemon for the kernel NFS server and
: related tools, which provides a much higher level of performance than the
: traditional Linux NFS server used by most users.
:
: This package also contains the showmount program. Showmount queries the
: mount daemon on a remote host for information about the NFS (Network File
: System) server on the remote host. For example, showmount can display the
: clients which are mounted on that host.
:
: This package also contains the mount.nfs and umount.nfs program.
[root@test-2 ~]# cat /etc/redhat-release
CentOS Linux release 7.7.1908 (Core)
[root@test-2 ~]# uname -a
Linux test-2 3.10.0-1062.el7.x86_64 #1 SMP Wed Aug 7 18:08:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
[root@test-2 ~]# cat /etc/exports
/home/nfs 192.168.0.0/24(rw,sync,no_root_squash,no_subtree_check,insecure)
K8S version: v1.17.9
Problems:
The application (a StatefulSet) running on k8s uses a PV that was dynamically provisioned by the k8s-nfs-provisioner; the PV is actually backed by a directory on the remote NFS server. The application keeps going into "CrashLoopBackOff" because it constantly hits "input/output error" when writing data to the PV after only a few seconds of running.
Meanwhile, I saw a lot of errors in /var/log/messages:
Dec 2 17:11:36 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Dec 2 17:11:36 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Dec 2 17:11:36 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Dec 2 17:11:36 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Dec 2 17:11:36 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Dec 2 17:11:36 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Dec 2 17:12:05 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Dec 2 17:12:05 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Dec 2 17:21:41 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Dec 2 17:21:41 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Dec 2 17:21:41 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Dec 2 17:21:41 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Dec 2 17:21:42 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
I ran tcpdump until "Lock reclaim failed" appeared in the system log, and found many NFS errors such as the following:
NFS4ERR_BADSESSION (10052)
NFS4ERR_STALE_CLIENTID (10022)
NFS4ERR_NO_GRACE (10033)
I'm not sure if they're related to the "lock reclaim failed" or the "input/output" error.
I have encountered this problem on different machines from time to time, and it really annoys me.
Does anyone know the root cause or how to fix it? Big thanks in advance.
Screenshots
application pod log
NFS errors in tcpdump
nfsstat -m output on k8s
nfsstat -c output on k8s; note the high open_noat value.
NFS server configuration (my k8s node is 111.1.30.16)
When I run the yum command:
> yum
There was a problem importing one of the Python modules
required to run yum. The error leading to this problem was:
/usr/lib64/python2.6/lib-dynload/arraymodule.so: cannot read file data: Input/output error
Please install a package which provides this module, or
verify that the module is installed correctly.
It's possible that the above module doesn't match the
current version of Python, which is:
2.6.6 (r266:84292, Jul 23 2015, 15:22:56)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-11)]
The current version of Python is 2.6.6, not any other version.
System logs:
Oct 16 09:56:50 localhost kernel: mptbase: ioc0: LogInfo(0x31080000): Originator={PL}, Code={SATA NCQ Fail All Commands After Error}, SubCode(0x0000) cb_idx mptscsih_io_done
Oct 16 09:56:50 localhost kernel: LSI Debug log info 31080000 for channel 0 id 0
Oct 16 09:56:50 localhost kernel: mptbase: ioc0: LogInfo(0x31080000): Originator={PL}, Code={SATA NCQ Fail All Commands After Error}, SubCode(0x0000) cb_idx mptscsih_io_done
Oct 16 09:56:50 localhost kernel: LSI Debug log info 31080000 for channel 0 id 0
Oct 16 09:56:50 localhost kernel: sd 6:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 16 09:56:50 localhost kernel: sd 6:0:0:0: [sda] Sense Key : Medium Error [current]
Oct 16 09:56:50 localhost kernel: Info fld=0x4d59fc8
Oct 16 09:56:50 localhost kernel: sd 6:0:0:0: [sda] Add. Sense: Unrecovered read error
Oct 16 09:56:50 localhost kernel: sd 6:0:0:0: [sda] CDB: Read(10): 28 00 04 d5 9f c8 00 00 08 00
Oct 16 09:56:50 localhost kernel: end_request: critical medium error, dev sda, sector 81108936
Does anyone know how to fix this? Thank you!
An input/output error indicates that your system cannot read the file. Your log indicates that the hard drive is failing. Reinstall yum through RPM if you must, but ultimately back up your critical data and salvage the storage array.
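To see how widespread the damage is, the failing sectors can be pulled out of the kernel log. This is a rough sketch, assuming the log lines look like the `end_request: critical medium error` line quoted in the question; `bad_sectors` is a hypothetical helper name.

```shell
# Read kernel log lines on stdin and print "DEVICE SECTOR" for every
# "critical medium error" entry. (Hypothetical helper for illustration.)
bad_sectors() {
    sed -n 's/.*critical medium error, dev \([a-z0-9]*\), sector \([0-9]*\).*/\1 \2/p'
}

# Usage: count how often each sector fails.
#   dmesg | bad_sectors | sort | uniq -c
```

In the log above, sector 81108936 is the same location as `Info fld=0x4d59fc8` (the hexadecimal form of that number), so both lines describe one unreadable sector on sda.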
I create a VM instance. I can connect to it as soon as the SSH daemon is started, but this is too early because kernel startup is only at approx. 30%. Is there a gcloud or other API to get the VM state when the kernel has finished startup?
Nov 18 10:58:51 image-name google: No startup script found in metadata.
Nov 18 10:58:53 image-name kernel: [ 27.491829] aufs au_opts_verify:1570:docker[2414]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov 18 10:58:53 image-name kernel: [ 27.703142] aufs au_opts_verify:1570:docker[2414]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov 18 10:58:53 image-name kernel: [ 27.735867] aufs au_opts_verify:1570:docker[2414]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov 18 10:58:53 image-name kernel: [ 27.771732] aufs au_opts_verify:1570:docker[2260]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov 18 10:58:53 image-name kernel: [ 27.797540] device vethfa3ab85 entered promiscuous mode
Nov 18 10:58:53 image-name kernel: [ 27.804420] IPv6: ADDRCONF(NETDEV_UP): vethfa3ab85: link is not ready
Nov 18 10:58:53 image-name kernel: [ 28.028306] IPv6: ADDRCONF(NETDEV_CHANGE): vethfa3ab85: link becomes ready
Nov 18 10:58:53 image-name kernel: [ 28.035505] docker0: port 1(vethfa3ab85) entered forwarding state
Nov 18 10:58:53 image-name kernel: [ 28.041963] docker0: port 1(vethfa3ab85) entered forwarding state
Nov 18 10:58:53 image-name kernel: [ 28.048532] IPv6: ADDRCONF(NETDEV_CHANGE): docker0: link becomes ready
Nov 18 10:58:54 image-name kernel: [ 28.980082] IPv6: eth0: IPv6 duplicate address fe80::42:acff:fe11:1 detected!
->>> about here I can SSH to the server
Nov 18 10:59:08 image-name kernel: [ 43.068094] docker0: port 1(vethfa3ab85) entered forwarding state
Nov 18 10:59:53 image-name kernel: [ 87.944452] aufs au_opts_verify:1570:docker[2864]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov 18 10:59:53 image-name kernel: [ 88.001012] aufs au_opts_verify:1570:docker[2864]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov 18 10:59:53 image-name kernel: [ 88.049510] aufs au_opts_verify:1570:docker[2815]: dirperm1 breaks the protection by the permission bits on the lower branch
->>> I want to know about this point in the startup process
My problem is that I can connect to it using SSH when kernel progress is below 30% and some processes are not yet started. I want to detect somehow whether the server has completed startup. Or is there a script that I can push to the server (through the GCE APIs) to notify me when a server is completely up?
gcloud compute instances describe image-name returns the same output from the moment the instance is started until kernel startup is complete.
(In my case I use the Node.js GCE API, but this should not make any difference.)
At present I am not aware of any native Google API that reports the progress of an instance's startup.
However, here is a quick workaround; check whether it fits your requirement.
You can use either a Google startup script or the native Linux rc.local. The concept is the same, so I will explain it for rc.local, as it is generic and not tied to Google.
rc.local is the last thing to run in the boot sequence. Any command, script or call placed in rc.local (which is itself an sh or bash script) is executed at the end of the boot process.
So the idea is to have a script or call in rc.local, baked into the Google image, that sends you a notification or writes the "boot is complete" state to a central system such as a key-value store or cloud storage.
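As a rough sketch of that idea, the simplest "state" is a marker file written as the last step of rc.local. The helper names and the marker path below are assumptions for illustration; sending a notification or writing to cloud storage would replace the file write.

```shell
# Record that boot has finished by writing a UTC timestamp to a marker file.
# (Hypothetical helper; call it as the last command in /etc/rc.local.)
mark_boot_done() {
    date -u '+%Y-%m-%dT%H:%M:%SZ' > "$1"
}

# Succeed once the marker exists and is non-empty.
boot_is_done() {
    [ -s "$1" ]
}

# Usage, at the end of rc.local (the path is an assumption):
#   mark_boot_done /run/boot-complete
# and later, from an SSH session:
#   boot_is_done /run/boot-complete && echo "startup finished"
```

A path under /run is a reasonable choice on most distributions because it is cleared on every reboot, so a stale marker from the previous boot cannot be mistaken for the current one.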
Similar to Kamran's answer, but here is how I get this done. It depends on using a Google startup script and an image where gcloud is installed by default (though you could rework this to just use curl and API calls).
On instance creation/configuration, I set a custom metadata flag: serverready=False
At the end of my google startup script, I have this:
sudo gcloud compute instances add-metadata $(hostname) \
--metadata serverready=True \
--zone $(curl \
"http://metadata.google.internal/computeMetadata/v1/instance/zone" \
-H "Metadata-Flavor: Google"|cut -d/ -f4)
When I run the instance creation, I can just poll the metadata for the serverready key and have my app wait until it sees serverready=True.
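The two moving parts here can be sketched as small helpers. The metadata server returns the zone as a path of the form projects/PROJECT_NUMBER/zones/ZONE, which is why the script above takes the fourth `/`-separated field. `zone_from_path` and `poll_until_true` are hypothetical names, and the command passed to the poller is whatever you use to read the serverready key.

```shell
# Extract the zone name from the metadata server's zone path, which has the
# form projects/PROJECT_NUMBER/zones/ZONE (same effect as `cut -d/ -f4`).
zone_from_path() {
    printf '%s\n' "${1##*/}"
}

# Run a command up to N times, one second apart, until it prints "True".
# (Hypothetical helper; the command is whatever reads the serverready key.)
poll_until_true() {
    tries=$1; shift
    while [ "$tries" -gt 0 ]; do
        [ "$("$@" 2>/dev/null)" = "True" ] && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Usage (read_serverready stands for your own lookup command):
#   poll_until_true 300 read_serverready && echo "instance is ready"
```

Bounding the number of attempts keeps the caller from waiting forever if the startup script fails before it sets the flag.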
I'm receiving this error
BLKRASET: Inappropriate ioctl for device
when trying to run
sudo blockdev --setra 256 /data
on my Linux server. The server is being used as a MongoDB server and /data is where it stores its data.
I initially tried to run this command when I received this warning when starting my MongoDB shell:
Wed Mar 20 22:40:49.850 [initandlisten]
Wed Mar 20 22:40:49.850 [initandlisten] ** WARNING: Readahead for /data/db is set to 2048KB
Wed Mar 20 22:40:49.850 [initandlisten] ** We suggest setting it to 256KB (512 sectors) or less
Wed Mar 20 22:40:49.850 [initandlisten] ** http://dochub.mongodb.org/core/readahead
The blockdev --setra command is supposed to set the readahead value for that directory and resolve the warning, but I'm running into this error instead.
The blockdev command operates on block devices (disks), not directories. You need to pass it the name of the device in /dev/ where your data directory is stored. If you run df /data, it will tell you which device is currently mounted there. Then you can run blockdev --setra 512 /dev/whatever.
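Put together, the lookup and the unit conversion might look like the sketch below. The helper names are made up for illustration; note that --setra counts 512-byte sectors, so the 256KB suggested by MongoDB corresponds to 512, not 256.

```shell
# Print the device column from `df -P` output read on stdin
# (second line, first field). Hypothetical helper for illustration.
df_device() {
    awk 'NR == 2 { print $1 }'
}

# Convert a readahead size in KB to 512-byte sectors, the unit --setra expects.
kb_to_sectors() {
    echo $(( $1 * 2 ))
}

# Usage (run as root; /data is the asker's mount point):
#   dev=$(df -P /data | df_device)
#   blockdev --setra "$(kb_to_sectors 256)" "$dev"
#   blockdev --getra "$dev"
```

Using `df -P` keeps each filesystem on a single output line, so the device name is reliably the first field of the second line.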