How to check a ubifs filesystem? - linux

ubifs has no fsck program, so how do you check the filesystem integrity when using ubifs?
My target system is ARM, Linux 3.2.58.

From what I have found so far on UBIFS' web page:
integrity - UBIFS (as well as UBI) checksums everything it writes to
the flash media to guarantee data integrity, UBIFS does not leave data
or meta-data corruptions unnoticed (JFFS2 is doing the same); By
default, UBIFS checks only meta-data CRC when it reads from the media,
but not the data CRC; however, you may force CRC checking for data
using one of the UBIFS mount options - see here.
If you need to check filesystem for corruption
If your UBIFS filesystem is mounted with chk_data_crc option, then simple cat $FILES > /dev/null should be sufficient. If not, you can only detect and recover metadata corruption. The corruption of file body will pass unnoticed.
I used something like find / -type f -print -exec cat {} + > /dev/null
If you need to recover broken files
Again from the overview section:
to make it more clear, imaging you have wiped out the FAT table on
your FAT file system; for FAT FS this would be fatal; but if you
similarly wipe out UBIFS index, you still may re-construct it,
although a special user-space tool would be required to do this (this
utility is not implemented at the moment, though)
While theoretically possible, you're on your own.
Back up the flash contents, arm yourself with UBIFS data structures (maybe sources) and a hex editor, and good luck.
edit: As I figured, Linux's MTD driver already applies ECC (error correction code) to MTD devices.
I believe, the criterion for a data loss if there are more than /sys/class/mtd/mtd*/ecc_strength errors per /sys/class/mtd/mtd*/ecc_step_size flash block. mtd_read() (it's a MTD API one level below UBIFS) will return EUCLEAN in this case. Don't know if there exists a tool that uses it to check for errors.
The "bit flipped" warnings we get do not mean that there was a data loss yet. You can write to /sys/class/mtd/mtd*/bitflip_threshold to control the amount of warnings you get.

You can just read all files, which essentially causes ubifs to check them. Cfr. the advise given on the mailing list. The ubifs implementation will recover if it is possible. There is however no guarantee that this will capture all corruptions.
In theory, ubifs should never be corrupted, but in practice bugs in ubifs or NAND drivers can still cause corruptions.

Related

How to get real file offset in NAND by file name?

Using linux, I can use raw access to NAND or access to files through filesystem. So, when I need to know, where my file is really located in NAND, what should I do? I cannot found any utilities providing this feature. Moreover, I cannot detect any possibility of this, besides hacking kernel with tons of "printk" (it's not nice way, I guess).
Can anybody enlighten me on this? (I'm using YAFFS2 and JFFS2 filesystems)
You can make a copy of any partition with nanddump. Transfer that partition dump to a PC. The nandsim utility can be used to mount the partitions on a PC.
modprobe nandsim first_id_byte=0x2c second_id_byte=0xda \
third_id_byte=0x90 fourth_id_byte=0x95 parts=2,64,64
flash_erase /dev/mtd3 0 0
ubiformat /dev/mtd3 -f rootfs.ubi
This command emulates a Micron 256MB NAND flash with four partitions. If you just capture the single partition and not the whole device, don't set parts. You can also do nanddump on each partition and then concatenate them all. This code targeted mtd3 with a UbiFs partition. For JFFS2 or YAFFS2, you can try nandwrite or some other appropriate flashing utility on the PC.
How to get real file offset in NAND by file name?
The files may span several NAND sectors and they are almost never contiguous. There is not much of an advantage to keep file data together as there is no disk head that takes physical time to seek. Some flash has marginally better efficiency for sequential reads; yet other flash will give better performance for reads from another erase block.
I would turn on debug at either the MTD layer or in the filesystem. In a live system, the position of the file may migrate over time on the flash even if it is not written. This is active wear leveling.

Is rename() without fsync() safe?

Is it safe to call rename(tmppath, path) without calling fsync(tmppath_fd) first?
I want the path to always point to a complete file.
I care mainly about Ext4. Is the rename() promised to be safe in all future Linux kernel versions?
A usage example in Python:
def store_atomically(path, data):
tmppath = path + ".tmp"
output = open(tmppath, "wb")
output.write(data)
output.flush()
os.fsync(output.fileno()) # The needed fsync().
output.close()
os.rename(tmppath, path)
No.
Look at libeatmydata, and this presentation:
Eat My Data: How Everybody Gets File IO Wrong
http://www.oscon.com/oscon2008/public/schedule/detail/3172
by Stewart Smith from MySql.
In case it is offline/no longer available, I keep a copy of it:
The video here
The presentation slides (online version of slides)
From ext4 documentation:
When mounting an ext4 filesystem, the following option are accepted:
(*) == default
auto_da_alloc(*) Many broken applications don't use fsync() when
noauto_da_alloc replacing existing files via patterns such as
fd = open("foo.new")/write(fd,..)/close(fd)/
rename("foo.new", "foo"), or worse yet,
fd = open("foo", O_TRUNC)/write(fd,..)/close(fd).
If auto_da_alloc is enabled, ext4 will detect
the replace-via-rename and replace-via-truncate
patterns and force that any delayed allocation
blocks are allocated such that at the next
journal commit, in the default data=ordered
mode, the data blocks of the new file are forced
to disk before the rename() operation is
committed. This provides roughly the same level
of guarantees as ext3, and avoids the
"zero-length" problem that can happen when a
system crashes before the delayed allocation
blocks are forced to disk.
Judging by the wording "broken applications", it is definitely considered bad practice by the ext4 developers, but in practice it is so widely used approach that it was patched in ext4 itself.
So if your usage fits the pattern, you should be safe.
If not, I suggest you to investigate further instead of inserting fsync here and there just to be safe. That might not be such a good idea since fsync can be a major performance hit on ext3 (read).
On the other hand, flushing before rename is the correct way to do the replacement on non-journaling file systems. Maybe that's why ext4 at first expected this behavior from programs, the auto_da_alloc option was added later as a fix. Also this ext3 patch for the writeback (non-journaling) mode tries to help the careless programs by flushing asynchronously on rename to lower the chance of data loss.
You can read more about the ext4 problem here.
If you only care about ext4 and not ext3 then I'd recommend using fsync on the new file before doing the rename. The fsync performance on ext4 seems to be much better than on ext3 without the very long delays. Or it might be the fact that writeback is the default mode (at least on my Linux system).
If you only care that the file is complete and not which file is named in the directory then you only need to fsync the new file. There's no need to fsync the directory too since it will point to either the new file with its complete data, or the old file.

How to make file sparse?

If I have a big file containing many zeros, how can i efficiently make it a sparse file?
Is the only possibility to read the whole file (including all zeroes, which may patrially be stored sparse) and to rewrite it to a new file using seek to skip the zero areas?
Or is there a possibility to make this in an existing file (e.g. File.setSparse(long start, long end))?
I'm looking for a solution in Java or some Linux commands, Filesystem will be ext3 or similar.
A lot's changed in 8 years.
Fallocate
fallocate -d filename can be used to punch holes in existing files. From the fallocate(1) man page:
-d, --dig-holes
Detect and dig holes. This makes the file sparse in-place,
without using extra disk space. The minimum size of the hole
depends on filesystem I/O block size (usually 4096 bytes).
Also, when using this option, --keep-size is implied. If no
range is specified by --offset and --length, then the entire
file is analyzed for holes.
You can think of this option as doing a "cp --sparse" and then
renaming the destination file to the original, without the
need for extra disk space.
See --punch-hole for a list of supported filesystems.
(That list:)
Supported for XFS (since Linux 2.6.38), ext4 (since Linux
3.0), Btrfs (since Linux 3.7) and tmpfs (since Linux 3.5).
tmpfs being on that list is the one I find most interesting. The filesystem itself is efficient enough to only consume as much RAM as it needs to store its contents, but making the contents sparse can potentially increase that efficiency even further.
GNU cp
Additionally, somewhere along the way GNU cp gained an understanding of sparse files. Quoting the cp(1) man page regarding its default mode, --sparse=auto:
sparse SOURCE files are detected by a crude heuristic and the corresponding DEST file is made sparse as well.
But there's also --sparse=always, which activates the file-copy equivalent of what fallocate -d does in-place:
Specify --sparse=always to create a sparse DEST file whenever the SOURCE file contains a long enough sequence of zero bytes.
I've finally been able to retire my tar cpSf - SOURCE | (cd DESTDIR && tar xpSf -) one-liner, which for 20 years was my graybeard way of copying sparse files with their sparseness preserved.
Some filesystems on Linux / UNIX have the ability to "punch holes" into an existing file. See:
LKML posting about the feature
UNIX file trunctation FAQ (search for F_FREESP)
It's not very portable and not done the same way across the board; as of right now, I believe Java's IO libraries do not provide an interface for this.
If hole punching is available either via fcntl(F_FREESP) or via any other mechanism, it should be significantly faster than a copy/seek loop.
I think you would be better off pre-allocating the whole file and maintaining a table/BitSet of the pages/sections which are occupied.
Making a file sparse would result in those sections being fragmented if they were ever re-used. Perhaps saving a few TB of disk space is not worth the performance hit of a highly fragmented file.
You can use $ truncate -s filename filesize on linux teminal to create sparse file having
only metadata.
NOTE --Filesize is in bytes.
According to this article, it seems there is currently no easy solution, except for using FIEMAP ioctl. However, I don't know how you can make "non sparse" zero blocks into "sparse" ones.

ext2 "image" files vs real ext2 devices

I'm tasked with writing a reader program for windows that is able read an ext2 partition.
For my testing I'm using a drive I formatted to ext2 and a file I created using mkfs (a file that does mount and work well under linux)
For some reason when I read the superblock from the drive (the real one) I get all the right meta-data (i.e. block size, inode count etc..) but doing the exact same thing to the file returns bad results (which make no sense).
is there a difference between the 2?
I open the drive using \.\X:
and I make the file using mkfs.
There shouldn't be any difference between ext2 on a partition and stored within a file (and indeed there isn't; I just checked); however, IIRC, the offset of the primary superblock is 2048 instead of 1024, if ext2 is installed on a bare disk (e.g. /dev/sda instead of /dev/sda1). This is to accomodate the MBR and other junk. (I can't find it in the docs from skimming just now, but this sticks out in my mind as something I ran into.) However, it's somewhat unusual to install to a bare drive, so I doubt this is your problem.
I wrote some ext2 utilities a few years ago, and after starting writing it by hand, I switched to using Ted Ts'o (the ext2 filesystem creator)'s e2fsprogs, which come with headers/libraries/etc. for doing all this in a more flexible and reliable fashion.
You may also want to check at offset 0x438 into the file/partition for the magic number 0xEF53, and consider it not an ext2/3 filesystem if that's not there, before pulling in the entire superblock, just as a sanity check.
Here's some docs that will probably be helpful: http://www.nongnu.org/ext2-doc/ext2.html

Shred: Doesn't work on Journaled FS?

Shred documentation says shred is "not guaranteed to be effective" (See bottom). So if I shred a document on my Ext3 filesystem or on a Raid, what happens? Do I shred part of the file? Does it sometimes shred the whole thing and sometimes not? Can it shred other stuff? Does it only shred the file header?
CAUTION: Note that shred relies on a very important assumption:
that the file system overwrites data in place. This is the
traditional way to do things, but many modern file system designs
do not satisfy this assumption. The following are examples of file
systems on which shred is not effective, or is not guaranteed to be
effective in all file sys‐ tem modes:
log-structured or journaled file systems, such as those supplied with AIX and Solaris (and JFS, ReiserFS, XFS, Ext3, etc.)
file systems that write redundant data and carry on even if some writes fail, such as RAID-based file systems
file systems that make snapshots, such as Network Appliance’s NFS server
file systems that cache in temporary locations, such as NFS version 3 clients
compressed file systems
In the case of ext3 file systems, the above disclaimer applies
(and shred is thus of limited effectiveness) only in data=journal
mode, which journals file data in addition to just metadata. In
both the data=ordered (default) and data=writeback modes, shred
works as usual. Ext3 journaling modes can be changed by adding
the data=something option to the mount options for a
particular file system in the /etc/fstab file, as documented in the
mount man page (man mount).
All shred does is overwrite, flush, check success, and repeat. It does absolutely nothing to find out whether overwriting a file actually results in the blocks which contained the original data being overwritten. This is because without knowing non-standard things about the underlying filesystem, it can't.
So, journaling filesystems won't overwrite the original blocks in place, because that would stop them recovering cleanly from errors where the change is half-written. If data is journaled, then each pass of shred might be written to a new location on disk, in which case nothing is shredded.
RAID filesystems (depending on the RAID mode) might not overwrite all of the copies of the original blocks. If there's redundancy, you might shred one disk but not the other(s), or you might find that different passes have affected different disks such that each disk is partly shredded.
On any filesystem, the disk hardware itself might just so happen to detect an error (or, in the case of flash, apply wear-leveling even without an error) and remap the logical block to a different physical block, such that the original is marked faulty (or unused) but never overwritten.
Compressed filesystems might not overwrite the original blocks, because the data with which shred overwrites is either random or extremely compressible on each pass, and either one might cause the file to radically change its compressed size and hence be relocated. NTFS stores small files in the MFT, and when shred rounds up the filesize to a multiple of one block, its first "overwrite" will typically cause the file to be relocated out to a new location, which will then be pointlessly shredded leaving the little MFT slot untouched.
Shred can't detect any of these conditions (unless you have a special implementation which directly addresses your fs and block driver - I don't know whether any such things actually exist). That's why it's more reliable when used on a whole disk than on a filesystem.
Shred never shreds "other stuff" in the sense of other files. In some of the cases above it shreds previously-unallocated blocks instead of the blocks which contain your data. It also doesn't shred any metadata in the filesystem (which I guess is what you mean by "file header"). The -u option does attempt to overwrite the file name, by renaming to a new name of the same length and then shortening that one character at a time down to 1 char, prior to deleting the file. You can see this in action if you specify -v too.
The other answers have already done a good job of explaining why shred may not be able to do its job properly.
This can be summarised as:
shred only works on partitions, not individual files
As explained in the other answers, if you shred a single file:
there is no guarantee the actual data is really overwritten, because the filesystem may send writes to the same file to different locations on disk
there is no guarantee the fs did not create copies of the data elsewhere
the fs might even decide to "optimize away" your writes, because you are writing the same file repeatedly (syncing is supposed to prevent this, but again: no guarantee)
But even if you know that your filesystem does not do any of the nasty things above, you also have to consider that many applications will automatically create copies of file data:
crash recovery files which word processors, editors (such as vim) etc. will write periodically
thumbnail/preview files in file managers (sometimes even for non-imagefiles)
temporary files that many applications use
So, short of checking every single binary you use to work with your data, it might have been copied right, left & center without you knowing. The only realistic way is to always shred complete partitions (or disks).
The concern is that data might exist on more than one place on the disk. When the data exists in exactly one location, then shred can deterministically "erase" that information. However, file systems that journal or other advanced file systems may write your file's data in multiple locations, temporarily, on the disk. Shred -- after the fact -- has no way of knowing about this and has no way of knowing where the data may have been temporarily written to disk. Thus, it has no way of erasing or overwriting those disk sectors.
Imagine this: You write a file to disk on a journaled file system that journals not just metadata but also the file data. The file data is temporarily written to the journal, and then written to its final location. Now you use shred on the file. The final location where the data was written can be safely overwritten with shred. However, shred would have to have some way of guaranteeing that the sectors in the journal that temporarily contained your file's contents are also overwritten to be able to promise that your file is truly not recoverable. Imagine a file system where the journal is not even in a fixed location or of a fixed length.
If you are using shred, then you're trying to ensure that there is no possible way your data could be reconstructed. The authors of shred are being honest that there are some conditions beyond their control where they cannot make this guarantee.

Resources