mtd_stresstest does not show any output and runs forever, even with count=1 (Linux)

I am using a Rockchip RK3188-based system on board with 8 GB of NAND flash.
I want to test the reliability of the NAND flash.
At the very least, I want to identify boards that ship from the factory with bad NAND flash.
I am using Ubuntu 14.04.
The NAND flash is partitioned into two parts:
1. mtd0: contains the bootloader, kernel and initrd
2. mtd1: contains the root file system (RFS)
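To double-check which index to pass as dev=, the kernel lists every MTD partition in /proc/mtd, with the columns dev, size, erasesize and name (sizes in hexadecimal). The Python sketch below parses that format; parse_proc_mtd is a hypothetical helper, and any partition names or sizes you test it with are illustrative, not taken from this board.

```python
import re
from typing import NamedTuple


class MtdPartition(NamedTuple):
    index: int       # N in /dev/mtdN, the number mtd_stresstest takes as dev=
    size: int        # partition size in bytes
    erasesize: int   # erase block size in bytes
    name: str


# Data lines of /proc/mtd look like: mtd0: 00400000 00020000 "boot"
_LINE = re.compile(r'^mtd(\d+):\s+([0-9a-fA-F]+)\s+([0-9a-fA-F]+)\s+"(.*)"$')


def parse_proc_mtd(text: str) -> list[MtdPartition]:
    """Parse the contents of /proc/mtd; size and erasesize are printed in hex."""
    parts = []
    for line in text.splitlines():
        m = _LINE.match(line.strip())
        if m:  # the "dev: size erasesize name" header line does not match
            parts.append(MtdPartition(int(m.group(1)),
                                      int(m.group(2), 16),
                                      int(m.group(3), 16),
                                      m.group(4)))
    return parts
```

On the board itself, parse_proc_mtd(open("/proc/mtd").read()) returns one entry per mtdN device, and size // erasesize gives the number of erase blocks the stress test will work on.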
I tried running mtd_stresstest with "modprobe mtd_stresstest dev=1" and it never prints anything. If I let it run too long, my system gets corrupted. The corruption is expected, since the test writes to the same device that / is mounted on.
But the command does not return even if I use count=1.
Please let me know what is going wrong.
I also tried the following:
Flashed a USB stick with an Ubuntu root file system built for ARM and plugged it into the board.
bind mounted /proc to /media//proc
bind mounted /sys to /media/
cd /media/
chroot .
init 1
modprobe mtd_stresstest dev=1 count=1 ----> never says a word
Could you please also suggest any other way to test the reliability of the NAND flash device?

Related

Clone one MicroSD card to another

So, I have Raspbian Lite booted from PINN, the bootloader, on my Raspberry Pi 2B v1.1. It is all written on an 8.0GB microSD card. I just bought an upgrade: a 64.0GB microSD card. I have a lot of things on my original 8GB card, so I don't want to have to manually reinstall every little thing.
My question is this: Is there a way to clone my whole card, with every partition, using the terminal in Raspbian Lite to a new SD Card?
I have tried rpi-clone, but it seems to copy only two partitions.
I have the 64GB plugged in via a USB adapter, no problem there.
Here are my partitions on my 8.0GB card:
Thanks, Bobbay
It's best to duplicate your SD card on a computer where the operating system is not running from that SD card - mainly because the contents of the card could change while you are duplicating it in a live system.
So, I would boot a PC off a live distro, like Knoppix. Once booted, start a Terminal and check the names of the disk drives like this:
ls /dev/sd?
You'll probably just have /dev/sda, but check! Now attach your 8GB SD card, wait a couple of seconds and check what name that got allocated. It's likely to be /dev/sdb
ls /dev/sd?
If it's /dev/sdb save that as SRC (source), like this:
SRC=/dev/sdb
Now attach your 64GB SD card, wait a couple of seconds and check what name that got allocated. It's likely to be /dev/sdc
ls /dev/sd?
If it's /dev/sdc save that as DST (destination), like this:
DST=/dev/sdc
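The "attach the card, wait, compare the ls /dev/sd? output" procedure above is a set difference between two listings. A minimal Python sketch of that check (list_sd_devices and new_device are hypothetical names; like the ls command, it only looks at /dev/sd?):

```python
import glob


def list_sd_devices() -> set[str]:
    """Equivalent of `ls /dev/sd?`: whole-disk sd devices, no partitions."""
    return set(glob.glob("/dev/sd?"))


def new_device(before: set[str], after: set[str]) -> str:
    """Return the single device that appeared between two listings.

    Raises ValueError unless exactly one new device showed up, so that a
    missing or unexpected extra device is reported instead of guessed.
    """
    added = after - before
    if len(added) != 1:
        raise ValueError(f"expected exactly one new device, got {sorted(added)}")
    return added.pop()
```

Usage: take before = list_sd_devices(), attach the card, wait a couple of seconds, then new_device(before, list_sd_devices()) gives the value to use for SRC (and likewise for DST).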
If, and only if, everything works as above, you can now clone SRC to DST with:
sudo dd if="$SRC" of="$DST" bs=65536
The command above will take a fair time to run. Once it is complete, you will have a clone of your original disk as /dev/sdc. However, it will have the same partition sizes as your 8GB card, so you will want to expand the partitions to fill the available space. I don't know which one(s) you want to expand, or by how much, but you will first need to grow the partition itself (for example with parted or fdisk) and then use the resize2fs command to grow the ext2/3/4 filesystem inside it. You can get help on that with:
man resize2fs
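For illustration, this is roughly what the dd command does: read the source in 65536-byte blocks and write each block to the destination until the source is exhausted. The Python sketch below only shows that loop (dd itself remains the simpler tool); clone is a hypothetical name, it assumes that, as with dd, neither card is mounted, and it does not resize anything.

```python
import os

BLOCK_SIZE = 65536  # same block size as bs=65536 in the dd command


def clone(src_path: str, dst_path: str, block_size: int = BLOCK_SIZE) -> int:
    """Copy src to dst block by block, like dd; return the number of bytes copied.

    dst is opened with "r+b": it must already exist (as a block device does)
    and is overwritten in place from offset 0.
    """
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        while True:
            chunk = src.read(block_size)
            if not chunk:  # end of the source device
                break
            dst.write(chunk)
            copied += len(chunk)
        dst.flush()
        os.fsync(dst.fileno())  # push the data to the card before it is removed
    return copied
```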

Custom Kernel, access eMMC memory

I build kernels at my company.
Currently we have a Surface 3 (non-Pro) device here, and it should boot with our own kernel and miniroot.
So far it boots up, but it doesn't detect the eMMC memory.
In the future more eMMC devices should be supported, so I built a lot of MMC drivers directly into the kernel. We are limited to a 90MB miniroot, so every driver is usually built into the kernel.
Here's the current MMC config:
cat kernel/config-x86_64-4.4.11 | grep MMC
# CONFIG_PCI_MMCONFIG is not set
CONFIG_MMC=y
CONFIG_MMC_DEBUG=y
# MMC/SD/SDIO Card Drivers
CONFIG_MMC_BLOCK=m
CONFIG_MMC_BLOCK_MINORS=8
CONFIG_MMC_BLOCK_BOUNCE=y
CONFIG_MMC_TEST=y
# MMC/SD/SDIO Host Controller Drivers
CONFIG_MMC_SDHCI=y
CONFIG_MMC_SDHCI_PCI=y
# CONFIG_MMC_RICOH_MMC is not set
CONFIG_MMC_SDHCI_ACPI=y
CONFIG_MMC_SDHCI_PLTFM=y
CONFIG_MMC_WBSD=y
CONFIG_MMC_TIFM_SD=y
CONFIG_MMC_SDRICOH_CS=y
CONFIG_MMC_CB710=y
CONFIG_MMC_VIA_SDMMC=y
CONFIG_MMC_VUB300=y
CONFIG_MMC_USHC=y
CONFIG_MMC_USDHI6ROL0=y
# CONFIG_MMC_REALTEK_USB is not set
CONFIG_MMC_TOSHIBA_PCI=y
CONFIG_MMC_MTK=y
CONFIG_MMC_SDHCI=y
CONFIG_MMC_SDHCI_PCI=y
CONFIG_MMC_SDHCI_ACPI=y
CONFIG_MMC_SDHCI_PLTFM=y
Still, the mmcblk device does not show up.
Any suggestions on how to make this work? Are there any modules I might be missing?
Cheers
Well, the Surface 3 tablet works a little differently.
After adding the GPIO kernel modules, the eMMC memory was recognized and usable.
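When comparing a config like the one in the question, it helps to separate built-in (=y) options from modules (=m), since in a miniroot a module only helps if it is shipped and loaded. The Python sketch below (summarize_config is a hypothetical helper) does that for lines of grep output and also flags options listed more than once; the listing in the question shows the four CONFIG_MMC_SDHCI* lines twice, which may just mean part of it was pasted twice. It would also report CONFIG_MMC_BLOCK=m there; the MMC block driver is what creates the /dev/mmcblkN devices, so it may be worth confirming that it is loaded or built in.

```python
def summarize_config(lines):
    """Classify kernel .config lines (e.g. `grep MMC` output) by how each option is set."""
    result = {"builtin": set(), "modules": set(), "unset": set(), "duplicates": set()}
    seen = set()
    for raw in lines:
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            name = line[2:-len(" is not set")]
            result["unset"].add(name)
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            if value == "y":
                result["builtin"].add(name)
            elif value == "m":
                result["modules"].add(name)
            # other values (numbers, strings) are only tracked for duplicates
        else:
            continue  # section comments such as "# MMC/SD/SDIO Card Drivers"
        if name in seen:
            result["duplicates"].add(name)
        seen.add(name)
    return result
```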

Fix the boot order/eMMC on a Beagle Bone Black

Problem
The problem I'm having was caused by the following action: while the BBB was connected to my PC with a USB cable, I accidentally formatted the ~92 MB partition that contained the Getting Started files.
Because of this, each time I apply power to the BBB, the USR LEDs do not light up. It only works when I have the Angstrom image on an external microSD card.
What I've tried
I thought this happened because the eMMC was corrupted and, for some reason, no longer bootable. So I booted from the external microSD card (which has the newest image) and ran dd with if set to the current microSD card and of set to the target "microSD card" built into the board (the eMMC).
When I restarted the BBB, it looked like dd had been successful (when I ran it, it reported that everything succeeded). Now there is one partition with the Getting Started files and another with the Linux kernel.
Question
Despite this, it's not possible to boot from the internal microSD card. Does anyone know how this can be solved? Is there something to do with the boot order?
To force a boot from SD, you need to remove power from the board completely, hold down S2 and then re-apply power. Keep holding the button until the four LEDs start turning on. You have to do this at power-on; once you've done it, the board will continue to boot from SD on a reboot or reset, and only removing power will change the behaviour.
You could also move R68 to R93 if you want to make the board boot from SD by default.
Also note the boot sequence in the table on page 6 of the schematics: by default, if MLO can't be found on the eMMC, the board looks for it on the SD card. So deleting MLO normally causes the board to boot from SD if the appropriate files are present.
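The fallback described above can be summarised as a tiny model. This Python sketch is a simplification: it only considers the eMMC and the microSD slot (the table in the schematics also lists other boot sources), and it treats holding S2 at power-on as "boot from SD only". The function and parameter names are hypothetical.

```python
def boot_source(s2_held_at_power_on: bool, mlo_on_emmc: bool, mlo_on_sd: bool):
    """Return where the first-stage loader (MLO) would be taken from, or None.

    s2_held_at_power_on stays latched across reboots and resets until power
    is removed, matching the behaviour described above.
    """
    # Simplification (assumption): with S2 held, only the SD card is tried.
    order = ["sd"] if s2_held_at_power_on else ["emmc", "sd"]
    has_mlo = {"emmc": mlo_on_emmc, "sd": mlo_on_sd}
    for device in order:
        if has_mlo[device]:
            return device
    return None  # no MLO on any medium that was tried
```

For example, boot_source(False, False, True) returns "sd": once MLO is gone from the eMMC, the default order falls through to the SD card.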
According to the BeagleBone Black Cookbook, the board boots from the SD card if one is available. This is also how it works with the Debian 8.3 image for the BBB (note that I am using the version of the image that does NOT flash...).

How to get real file offset in NAND by file name?

Using Linux, I can access the NAND either raw or through the filesystem. So when I need to know where my file is actually located in the NAND, what should I do? I cannot find any utilities that provide this feature. Moreover, I cannot see any way of doing it other than hacking the kernel with tons of printk calls (which doesn't seem like a nice way, I guess).
Can anybody enlighten me on this? (I'm using the YAFFS2 and JFFS2 filesystems.)
You can make a copy of any partition with nanddump. Transfer that partition dump to a PC. The nandsim utility can be used to mount the partitions on a PC.
modprobe nandsim first_id_byte=0x2c second_id_byte=0xda \
third_id_byte=0x90 fourth_id_byte=0x95 parts=2,64,64
flash_erase /dev/mtd3 0 0
ubiformat /dev/mtd3 -f rootfs.ubi
This command emulates a Micron 256MB NAND flash with four partitions. If you capture only a single partition rather than the whole device, don't set parts. You can also run nanddump on each partition and then concatenate the dumps. The flash_erase and ubiformat commands then target mtd3, writing a UBI image (rootfs.ubi) to it. For JFFS2 or YAFFS2, you can try nandwrite or another appropriate flashing utility on the PC.
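The parts= values are partition sizes in erase blocks, and, as far as I know, nandsim turns whatever space is left into one more partition, which is why parts=2,64,64 yields four partitions and the UBI image goes to mtd3. The Python sketch below computes that layout; nandsim_layout is a hypothetical helper, and the 128 KiB erase block size is an assumption based on the usual geometry for this Micron ID (2 KiB pages, 64 pages per block), not something stated in the answer.

```python
ERASEBLOCK = 128 * 1024          # assumed 128 KiB erase blocks
CHIP_SIZE = 256 * 1024 * 1024    # 256 MB device, as stated in the answer


def nandsim_layout(parts, eraseblock=ERASEBLOCK, chip_size=CHIP_SIZE):
    """Return (offset, size) in bytes for each MTD partition nandsim would create.

    parts lists partition sizes in erase blocks; any space left over after
    them becomes one final partition.
    """
    layout, offset = [], 0
    for blocks in parts:
        size = blocks * eraseblock
        layout.append((offset, size))
        offset += size
    if offset < chip_size:
        layout.append((offset, chip_size - offset))
    return layout
```

With parts=2,64,64 this gives mtd0 at 0 (256 KiB), mtd1 at 256 KiB (8 MiB), mtd2 at 8.25 MiB (8 MiB) and mtd3 at 16.25 MiB covering the remaining 239.75 MiB.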
The files may span several NAND pages, and they are almost never contiguous. There is not much advantage to keeping file data together, as there is no disk head that takes physical time to seek. Some flash has marginally better efficiency for sequential reads; other flash gives better performance for reads spread across different erase blocks.
I would turn on debug output at either the MTD layer or in the filesystem. In a live system, the position of a file on the flash may migrate over time even if the file is not written; this is active wear leveling.
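If you do obtain an absolute byte offset within a partition (for example from MTD-layer debug output), splitting it into erase block, page and byte is simple arithmetic. A Python sketch, assuming 2 KiB pages and 128 KiB erase blocks as hypothetical geometry; the real values can be read from /sys/class/mtd/mtdN/writesize and /sys/class/mtd/mtdN/erasesize. locate is a hypothetical name.

```python
PAGE_SIZE = 2048          # assumed page (write) size in bytes
ERASEBLOCK = 128 * 1024   # assumed erase block size in bytes


def locate(offset, page_size=PAGE_SIZE, eraseblock=ERASEBLOCK):
    """Map a byte offset in an MTD partition to (erase block, page in block, byte in page)."""
    block, within_block = divmod(offset, eraseblock)
    page, byte = divmod(within_block, page_size)
    return block, page, byte
```

For example, locate(300000) returns (2, 18, 992). Because of wear leveling, the answer only describes where the data was at the moment the offset was captured.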

How can I simulate a failed disk during testing?

In a Linux VM (VMware Workstation or similar), how can I simulate a failure on a previously working disk?
I have a situation in production where a disk fails (probably a controller, cable or firmware problem). Obviously this is not predictable or reproducible, so I want to test my monitoring to ensure that it alerts correctly.
Ideally I'd like to be able to simulate a situation where writes fail but reads succeed, as well as a complete failure, i.e. the SCSI interface reports errors back to the kernel.
There are several layers at which a disk error can be simulated. If you are testing a single user-space program, probably the simplest approach is to interpose the appropriate calls (e.g. write()) and have them sometimes return an error. The libfiu fault-injection library can do this using its fiu-run tool.
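The same idea can be shown at the Python level: wrap the write function so that it sometimes raises EIO instead of writing. This sketch is only an analogue of what fiu-run does for C programs, not libfiu itself, and it affects only code that calls the wrapper; make_faulty_write is a hypothetical name.

```python
import errno
import os
import random


def make_faulty_write(real_write, probability, rng=None):
    """Wrap a write(fd, data) function so it fails with EIO with the given probability."""
    rng = rng or random.Random()

    def faulty_write(fd, data):
        if rng.random() < probability:  # random() is in [0, 1)
            raise OSError(errno.EIO, os.strerror(errno.EIO))
        return real_write(fd, data)

    return faulty_write
```

Usage: in a test, replacing os.write with make_faulty_write(os.write, 0.1) makes roughly one call in ten fail, which exercises the program's I/O error handling without touching the disk.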
Another approach is to use a kernel driver that can pass through data to/from another device, but inject faults along the way. You can then mount the device and use it from any application as if it was a faulty disk. The fsdisk driver is an example of this.
There is also a fault injection infrastructure that has been merged into the Linux kernel, although you will probably need to reconfigure your kernel to enable it. It is documented in Documentation/fault-injection/fault-injection.txt. This is useful for testing kernel code.
It is also possible to use SystemTap to inject faults at the kernel level. See The SCSI fault injection test and Kernel Fault injection using SystemTap.
To add to mark4o's answer, you can also use Linux's Device Mapper to generate failing devices.
Device Mapper's delay device can be used to send read and write I/O of the same block to different underlying devices (it can also delay that I/O as its name suggests). Device Mapper's error device can be used to generate permanent errors when a particular block is accessed. By combining the two you can create a device where writes always fail but reads always succeed for a given area.
The above is a more complicated example of what is described in the question Simulate a faulty block device with read errors? (see https://stackoverflow.com/a/1871029 for a simple Device Mapper example).
There is also a list of Linux disk fault injection mechanisms on the Special File that causes I/O error Unix & Linux question.
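The table lines for the combination described above can be generated as plain strings. A Python sketch based on the documented dm-delay table syntax (<device> <offset> <delay> [<write_device> <write_offset> <write_delay>]) and the error target; the device paths are hypothetical, lengths are in 512-byte sectors (blockdev --getsz reports a device's size in those units), and for simplicity this version sends every write of the whole device to the error target rather than a single area.

```python
def dm_error_table(sectors):
    """Table for a device-mapper device on which every I/O fails."""
    return f"0 {sectors} error"


def dm_write_fail_table(read_dev, error_dev, sectors, delay_ms=0):
    """dm-delay table: reads go to read_dev, writes go to error_dev.

    Syntax: <start> <length> delay <device> <offset> <delay>
            [<write_device> <write_offset> <write_delay>]
    With error_dev built from dm_error_table(), reads succeed and writes fail.
    """
    return (f"0 {sectors} delay {read_dev} 0 {delay_ms} "
            f"{error_dev} 0 {delay_ms}")
```

For example, the first table could be loaded with dmsetup create err --table "0 N error" and the second as another device pointing at /dev/mapper/err; the error device must be at least as large as the range being mapped.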
A simple way to make a SCSI disk disappear with a 2.6 kernel is:
echo 1 > /sys/bus/scsi/devices/H:B:T:L/delete
(H:B:T:L is host, bus, target, LUN). To simulate the read-only case you'll have to use the fault injection methods that mark4o mentioned, though.
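A small Python sketch that builds the sysfs path for a given H:B:T:L address and, on a live system, looks up the address of a disk such as sdb through the /sys/block/<disk>/device symlink. Assumption: that symlink resolves to a directory named after the H:B:T:L address, as it typically does for SCSI disks. Both function names are hypothetical.

```python
import os
import re

_HBTL = re.compile(r"^\d+:\d+:\d+:\d+$")


def scsi_delete_path(hbtl: str) -> str:
    """Return the sysfs file that removes SCSI device H:B:T:L when "1" is written to it."""
    if not _HBTL.match(hbtl):
        raise ValueError(f"not an H:B:T:L address: {hbtl!r}")
    return f"/sys/bus/scsi/devices/{hbtl}/delete"


def hbtl_of(disk: str) -> str:
    """Look up the H:B:T:L address of a disk name such as "sdb" (needs a live system).

    Assumes /sys/block/<disk>/device resolves to a directory named H:B:T:L.
    """
    return os.path.basename(os.path.realpath(f"/sys/block/{disk}/device"))
```

Usage (as root): writing "1" to scsi_delete_path(hbtl_of("sdb")) is the same as the echo command above.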
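The Linux kernel provides a feature called “fault injection” (the kernel must be built with CONFIG_FAIL_MAKE_REQUEST for the make-it-fail file to exist). To make requests to a partition eligible for failure:
echo 1 > /sys/block/vdd/vdd2/make-it-fail
To set some of the options:
mkdir /debug
mount debugfs /debug -t debugfs
cd /debug/fail_make_request
echo 10 > interval # only every 10th eligible request can fail
echo 100 > probability # 100% probability
echo -1 > times # how many times: -1 means no limit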
https://lxadm.com/Using_fault_injection
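The three knobs interact roughly as follows: times bounds the total number of failures, interval makes only every interval-th eligible call a candidate, and probability is then the percent chance that a candidate fails. The Python model below follows the semantics described in the kernel's fault-injection documentation; it is a simplified illustration, not the kernel code, and FaultAttr is a hypothetical name.

```python
import random


class FaultAttr:
    """Simplified model of the fail_make_request knobs set above."""

    def __init__(self, probability, interval=1, times=-1, rng=None):
        self.probability = probability  # percent chance that a candidate fails
        self.interval = interval        # only every interval-th call is a candidate
        self.times = times              # failures left; -1 means no limit
        self.count = 0
        self.rng = rng or random.Random()

    def should_fail(self):
        if self.times == 0:
            return False                # failure budget used up
        if self.interval > 1:
            self.count += 1
            if self.count % self.interval:
                return False            # not an interval-th call
        if self.probability <= self.rng.randrange(100):
            return False                # randrange(100) is 0..99, so 100 always passes
        if self.times != -1:
            self.times -= 1
        return True
```

With the settings above (interval 10, probability 100, times -1), calls 10, 20, 30 and so on fail, which is why the documentation suggests probability=100 whenever interval is greater than 1.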
You can use the scsi_debug kernel module to simulate a RAM-backed SCSI disk; it can inject many kinds of SCSI errors through its opts and every_nth options.
See http://sg.danny.cz/sg/sdebug26.html
Example of a medium error, reported at sector 4656:
[fge#Gris-Laptop ~]$ sudo modprobe scsi_debug opts=2 every_nth=1
[fge#Gris-Laptop ~]$ sudo dd if=/dev/sdb of=/dev/null
dd: error reading ‘/dev/sdb’: Input/output error
4656+0 records in
4656+0 records out
2383872 bytes (2.4 MB) copied, 0.021299 s, 112 MB/s
[fge#Gris-Laptop ~]$ dmesg|tail
[11201.454332] blk_update_request: critical medium error, dev sdb, sector 4656
[11201.456292] sd 5:0:0:0: [sdb] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[11201.456299] sd 5:0:0:0: [sdb] Sense Key : Medium Error [current]
[11201.456303] sd 5:0:0:0: [sdb] Add. Sense: Unrecovered read error
[11201.456308] sd 5:0:0:0: [sdb] CDB: Read(10) 28 00 00 00 12 30 00 00 08 00
[11201.456312] blk_update_request: critical medium error, dev sdb, sector 4656
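The CDB line in the dmesg output shows which request failed. A short Python sketch that decodes a READ(10) CDB (opcode 0x28, 4-byte big-endian LBA in bytes 2-5, 2-byte big-endian transfer length in bytes 7-8); decode_read10 is a hypothetical name.

```python
def decode_read10(cdb_hex: str):
    """Decode a READ(10) CDB as printed by the kernel, e.g. "28 00 00 00 12 30 00 00 08 00".

    Returns (lba, transfer_length_in_blocks).
    """
    cdb = bytes.fromhex(cdb_hex)  # fromhex accepts spaces between bytes
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")
    length = int.from_bytes(cdb[7:9], "big")
    return lba, length
```

decode_read10("28 00 00 00 12 30 00 00 08 00") returns (4656, 8): an 8-sector read starting at sector 4656, matching the blk_update_request lines. As far as I know, the medium error that scsi_debug injects with opts=2 is tied to sector 0x1234 (4660), which lies inside that request.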
You can alter the opts and every_nth options at runtime via sysfs:
echo 2 | sudo tee /sys/bus/pseudo/drivers/scsi_debug/opts
echo 1 | sudo tee /sys/bus/pseudo/drivers/scsi_debug/opts
One can also use methods that are provided by the disks to do media error testing. SCSI has a WRITE LONG command that can be used to corrupt a block by writing data with invalid ECC. SATA and NVMe also have similar commands.
For the most common case (SATA) you can use hdparm with --make-bad-sector to issue that command; for SCSI you can use sg_write_long, and for NVMe you can use nvme-cli with the write-uncor command.
The big advantage of these commands over other injection methods is that the drive behaves exactly as it would with a real bad sector, with the full latency impact and with recovery by reallocation when the sector is written. This also includes the drive's error counters going up.
The disadvantage is that if you do this too often on the same drive, its error counters will go up, and SMART may flag the disk as bad or you may exhaust its reallocation table. So use it for manual testing, but if you run it in automated tests, don't do it too often.
You can also use a low-level SCSI utility (sg3-utils) to stop the drive. It will still respond to Inquiry, so its state will still be "running" but reads and writes will fail until it is started again. I've tested RAID drive removal and recovery using mdadm this way.
sg_start --stop /dev/sdb
