I have a program that takes a file, compresses it using /usr/bin/zip or /bin/gzip or /bin/bzip2, and removes the original if and only if the compress operation completes successfully.
However, this program can be killed (via kill -9), or, in principle, can even crash on its own!
Question: Can I assume that the zipped output file that gets created on disk is always valid, without ever having to decompress it and compare it with the original?
In other words, no matter at what point the compress operation gets ungracefully interrupted, does the fact that the compressed output file exists on disk imply it's valid?
In other words, are the compress operation and the file creation on disk together an atomic transaction?
The main concern here is to avoid removing the original file if the compressed file is invalid, without having to undergo the costly decompress-and-compare operation.
Note:
Ignore OS file buffers not flushing to disk due to UPS failure.
Ignore disk/media related failure. This can happen much later anyway, and quite independently of the program's interruption.
A. Yes, if zip, gzip, or bzip2 complete successfully, you can assume that the resulting compressed file is valid with a high probability. Those programs have been around a loooong time, and I would assert that very nearly all data integrity bugs were worked out of them long ago. You also need to consider the reliability of your hardware in its operating environment.
B. (Your "in other words" seem like entirely different questions.) No. An ungracefully interrupted compress operation will generally leave a partial and invalid compressed file behind.
C. No. The file is created and then written to a chunk at a time. Those operations are certainly not atomic.
You just need to verify that the compression utility completed successfully by virtue of it exiting normally and returning zero as the exit code. Then you do not need to examine the compressed file unless you are super paranoid, perhaps because the data has very high value to you.
I should note that verifying the compressed data will take a fraction of the time it takes to compress it, at least for zip and gzip. bzip2 will take about the same amount of time as it took to compress.
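To illustrate the exit-status check, here is a minimal sketch in Go (the question's program could be written in anything): it shells out to gzip, treats a non-zero exit status as failure, optionally runs gzip -t as a paranoid integrity test, and only then removes the original. The -k flag (keep the input file) assumes a reasonably recent GNU gzip, and the file names are made up for the example.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// compressAndRemove is a sketch, not a definitive implementation: compress
// path with gzip, trust the exit status (plus an optional "gzip -t" test),
// and only then remove the original.
func compressAndRemove(path string) error {
	// gzip -k writes path.gz and keeps the original in place.
	if err := exec.Command("/bin/gzip", "-k", path).Run(); err != nil {
		return fmt.Errorf("compression failed, keeping original: %w", err)
	}

	// Optional paranoia: ask gzip to test the archive it just wrote.
	if err := exec.Command("/bin/gzip", "-t", path+".gz").Run(); err != nil {
		return fmt.Errorf("archive failed integrity test: %w", err)
	}

	// Only now is it safe to drop the original.
	return os.Remove(path)
}

func main() {
	if err := compressAndRemove("data.log"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```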
Related
I'm learning LevelDB and RocksDB and am confused by how they keep WAL data integrity without truncating.
What I found:
Log files are always read and seeked at block boundaries (blocks are 8 KiB). I guess that means there's no garbage between two blocks.
The log writer (and the underlying WritableFile) never truncates the file on failed writes; it just continues writing. I guess that means a failed write won't change the file offset, so the next write still lands where it should.
But the POSIX spec says:
This volume of POSIX.1-2017 does not specify the value of the file offset after an error is returned; there are too many cases. For programming errors, such as [EBADF], the concept is meaningless since no file is involved. For errors that are detected immediately, such as [EAGAIN], clearly the pointer should not change. After an interrupt or hardware error, however, an updated value would be very useful and is the behavior of many implementations.
So is this unspecified behavior that should not be relied on, or is it actually ensured by practical systems and safe to rely on?
I'm learning LevelDB and RocksDB and am confused by how they keep WAL data integrity without truncating.
RocksDB first splits the log file into fixed-length "blocks" (32 KiB each). The fixed-length blocks make it easy to verify the checksum of each block while reading.
On top of the fixed-length blocks there are serialized "records". A single "record" can span multiple "blocks", and the atomicity of each WriteBatch in RocksDB is ensured by only applying a "record" whose full content can be read, with the blocks ensuring checksum integrity.
If a write failure happens, then the next time we open the same log file for reading, the last incomplete write is ignored, so the incomplete WriteBatch that hit the I/O error is not committed.
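The same principle can be sketched with a much simpler record layout than RocksDB's real on-disk format: every record carries its own checksum, and recovery simply stops at the first record that is short or fails its check, without ever truncating the file. The [crc32][length][payload] framing below is my own simplification for illustration, not RocksDB's format.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
	"io"
	"os"
)

// readRecords recovers complete records from a log written as
// [crc32 (4B)][length (4B)][payload] frames. Anything after the first
// short or corrupt frame is treated as an incomplete write and ignored.
func readRecords(path string) ([][]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	var records [][]byte
	var hdr [8]byte
	for {
		if _, err := io.ReadFull(f, hdr[:]); err != nil {
			// Clean EOF or a torn header: stop here and keep what we have.
			if errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) {
				return records, nil
			}
			return nil, err
		}
		sum := binary.LittleEndian.Uint32(hdr[0:4])
		length := binary.LittleEndian.Uint32(hdr[4:8])
		payload := make([]byte, length)
		if _, err := io.ReadFull(f, payload); err != nil {
			// Record body was only partially written before the crash.
			return records, nil
		}
		if crc32.ChecksumIEEE(payload) != sum {
			// Torn or garbage write: ignore it and everything after it.
			return records, nil
		}
		records = append(records, payload)
	}
}

func main() {
	recs, err := readRecords("wal.log")
	if err != nil {
		panic(err)
	}
	fmt.Println("recovered", len(recs), "complete records")
}
```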
The log writer (and the underlying WritableFile) never truncates the file on failed writes; it just continues writing. I guess that means a failed write won't change the file offset, so the next write still lands where it should.
I think RocksDB won't reuse the same log file for writing if an I/O error happened (not sure yet).
I tried truncating the log file when an I/O error happened and reusing it the next time the program restarted, but it ended up with many corner-case problems and I don't think it is good practice.
Just to be clear, my questions are language/OS agnostic (independent).
I am working on a program (supporting many OSes, currently written in Golang) that receives many chunks of data (like a stream of data chunks) and sequentially writes them all down starting at a pre-specified position (pos >= 0) in a file. Only one process with one thread accesses the file. I use a regular write function that internally uses the write system call (which one depends on the OS it runs on), not buffered I/O.
Suppose that while my program was writing, the system suddenly crashed (the most severe kind of crash: power failure).
When the system is turned back on, I need to verify how many chunks were completely written to the HDD. (*)
The HDD that my program writes to is just a regular desktop or laptop HDD of today (not some fancy battery-backed one found in high-end servers).
Suppose that bit corruption during transfer to and reading from the HDD is highly unlikely to happen and is negligible.
My questions are:
1. Do I need to checksum all of the written chunks to verify (*)?
2. Or do I just need to check and confirm that the nth chunk is correct, and assume all the chunks before it (0 -> n-1) are correct too?
2.1. If 2 is enough, does that mean the order of sequential writes is guaranteed to be preserved by the filesystem of any OS (random writes can still be reordered, though)?
3. Does my way of doing recovery rely on the same principle as the append-only log files seen in many crash-proof databases?
I suspect your best bet is to study up on Cyclic Redundancy Checks (CRC).
As I understand it a CRC would allow you to verify that what was intended to be written actually was.
I also suggest that worrying about the cause of any errors is not very worthwhile (transmission errors vs. errors for any other reason such as a power failure).
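A minimal sketch of that idea in Go, under a layout of my own choosing in which every fixed-size chunk is immediately followed by its 4-byte CRC32 (neither the layout nor the 4 KiB chunk size comes from the question): recovery scans from the start and counts consecutive chunks that still verify, and the first short or mismatching chunk marks the crash point. Whether chunks beyond that point can be trusted without checking them is exactly the write-ordering question, so the cautious version keeps checksumming everything.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
	"io"
	"os"
)

const chunkSize = 4096 // assumed fixed chunk size, just for this sketch

// countGoodChunks scans from the beginning and counts consecutive chunks
// whose stored CRC32 matches the data; it stops at the first short or
// mismatching chunk.
func countGoodChunks(path string) (int, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	buf := make([]byte, chunkSize+4)
	count := 0
	for {
		if _, err := io.ReadFull(f, buf); err != nil {
			if errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) {
				return count, nil // ran out of complete chunks
			}
			return count, err
		}
		want := binary.LittleEndian.Uint32(buf[chunkSize:])
		if crc32.ChecksumIEEE(buf[:chunkSize]) != want {
			return count, nil // torn or unwritten chunk: stop counting here
		}
		count++
	}
}

func main() {
	n, err := countGoodChunks("chunks.dat")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(n, "chunks verified")
}
```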
Hope this helps.
I'm thinking about ways for my application to detect a partially-written record after a program or OS crash. Since records are only ever appended to a file (never overwritten), is a crash while writing guaranteed to yield a file size that is shorter than it should be? Is this guaranteed even if the file was opened in read-write mode instead of append mode, so long as writes are always at the end of the file? This would greatly simplify crash recovery, since comparing the last record's expected size and position with the actual file size would be enough to detect a partial write.
I understand that random-access writes can be reordered by the filesystem, but I'm having trouble finding information on whether this can happen when appending. I imagine an out-of-order append would require the filesystem to create a "hole" at the tail of the (sparse) file, write blocks beyond the hole, and then fill in the blocks in between, but I'm hoping that such an approach would be so inefficient that nobody would ever implement their filesystem that way.
I suppose another problem might be a filesystem updating the directory entry's file size field before appending the new blocks to the file, and the OS crashing in between. Does this ever happen in practice? (ext4, perhaps?) Is there a quick way to detect it? (And what happens when trying to read the unwritten blocks that should exist according to the file's size?)
Is there anything else, such as write reordering performed by a disk/flash drive, that would get in the way of using file size as a way to detect a partial append? I don't expect to be able to compensate for this sort of drive trickery in my application, but it would be good to know about.
If you want to be SURE that you're never going to lose records, you need a consistent journaling or transactional system for your files.
There is absolutely no guarantee that a write will have been fulfilled unless you either set O_DIRECT [which you probably do not want to do], or you use markers to indicate that "this has been fully committed", which are only written when the file is closed. You can either do that in the main file, or, for example, have a file that records, externally, the "last written record". If you open & close that file, it should be safe as long as the APP is what is crashing - if the OS crashes [or is otherwise abruptly stopped - e.g. power cut, disk unplugged, etc.], all bets are off.
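A sketch of that external-marker idea in Go, under my own assumptions about the file name and the 8-byte offset encoding (nothing here is a standard format). The data file must be flushed before the marker is updated, and, as said above, this only protects against application crashes unless the marker update itself is made crash-safe (e.g. via a temp file and rename).

```go
package main

import (
	"encoding/binary"
	"fmt"
	"os"
)

// commitMarker records "everything up to offset N is fully committed" in a
// small side file. The data file itself must already be fsynced, otherwise
// the marker can point past data that never reached the disk.
func commitMarker(markerPath string, offset uint64) error {
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], offset)

	f, err := os.OpenFile(markerPath, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o644)
	if err != nil {
		return err
	}
	if _, err := f.Write(buf[:]); err != nil {
		f.Close()
		return err
	}
	if err := f.Sync(); err != nil { // flush the marker itself
		f.Close()
		return err
	}
	return f.Close()
}

// lastCommitted reads the marker back on recovery; a missing marker means
// nothing has ever been committed.
func lastCommitted(markerPath string) (uint64, error) {
	buf, err := os.ReadFile(markerPath)
	if os.IsNotExist(err) {
		return 0, nil
	}
	if err != nil || len(buf) < 8 {
		return 0, fmt.Errorf("unreadable marker: %v", err)
	}
	return binary.LittleEndian.Uint64(buf), nil
}

func main() {
	if err := commitMarker("data.committed", 4096); err != nil {
		panic(err)
	}
	off, err := lastCommitted("data.committed")
	if err != nil {
		panic(err)
	}
	fmt.Println("committed up to offset", off)
}
```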
Write reordering and write caching is/can be done at all levels - the C library, the OS, the filesystem module and the hard disk/controller itself are all ABLE to reorder writes.
I have one writer which creates and sometimes updates a file with some status information. The readers are implemented in lua (so I got only io.open) and possibly bash (cat, grep, whatever). I am worried about what would happen if the status information is updated (which means a complete file rewrite) while a reader has an open handle to the file: what can happen? I have also read that if the write/read operation is below 4KB, it is atomic: that would be perfectly fine for me, as the status info can fit well in such dimension. Can I make this assumption?
A read or write is atomic under 4Kbytes only for pipes, not for disk files (for which the atomic granularity may be the file system block size, usually 512 bytes).
In practice you could avoid bothering about such issues (assuming your status file is, e.g., less than 512 bytes). I believe that if the writer opens and writes that file quickly (in particular, if you avoid open(2)-ing the file, keeping the open file handle around for a long time -many seconds-, and only write(2)-ing a small string into it later), you don't need to bother.
If you are paranoid, but do assume that readers (like grep) open the file and read it quickly, you could write to a temporary file and rename(2) it once it has been written (and close(2)-d) in its entirety.
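Here is what that temp-file-plus-rename pattern might look like in Go (the question's writer isn't necessarily in Go, and the file names are mine): readers that open the status file see either the old complete contents or the new complete contents, never a half-written file.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeStatusAtomically writes the new status to a temporary file in the
// same directory, flushes it, and then rename(2)s it over the target, so
// the replacement is atomic from a reader's point of view.
func writeStatusAtomically(path string, status []byte) error {
	dir := filepath.Dir(path)
	tmp, err := os.CreateTemp(dir, ".status-*") // same filesystem as the target
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // harmless if the rename already succeeded

	if _, err := tmp.Write(status); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil { // make sure the bytes are on disk first
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	// rename(2) atomically replaces the old file with the new one.
	return os.Rename(tmp.Name(), path)
}

func main() {
	if err := writeStatusAtomically("status", []byte("state=ok\n")); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```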
As Duck suggested, locking the file in both readers and writers is also a solution.
I may be mistaken, in which case someone will correct me, but I don't think the external readers are going to pay any attention to whether the file is being simultaneously updated. They are going to print (or possibly eof or error out on) whatever is there.
In any case, why not avoid the whole mess and just use file locks. Have the writer flock (or similar) and the readers check the lock. If they get the lock they know they are ok to read.
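The question's readers are in lua and bash, but the advisory-locking idea looks roughly like the sketch below on a Unix-like system, here written in Go with flock(2): the writer takes an exclusive lock while rewriting, and readers are expected to take a shared one. The file name and the helper function are my own, and note that flock is only advisory - it works only if every participant takes the lock.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// withFileLock opens the file, takes an advisory flock(2) lock of the given
// type (LOCK_SH for readers, LOCK_EX for the writer), runs fn, and releases
// the lock.
func withFileLock(path string, lockType int, fn func(*os.File) error) error {
	f, err := os.OpenFile(path, os.O_RDWR|os.O_CREATE, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()

	// Blocks until the lock is available.
	if err := syscall.Flock(int(f.Fd()), lockType); err != nil {
		return err
	}
	defer syscall.Flock(int(f.Fd()), syscall.LOCK_UN)

	return fn(f)
}

func main() {
	// Writer side: hold an exclusive lock while rewriting the status file.
	err := withFileLock("status", syscall.LOCK_EX, func(f *os.File) error {
		if err := f.Truncate(0); err != nil {
			return err
		}
		_, err := f.WriteAt([]byte("state=ok\n"), 0)
		return err
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```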
Shred documentation says shred is "not guaranteed to be effective" (See bottom). So if I shred a document on my Ext3 filesystem or on a Raid, what happens? Do I shred part of the file? Does it sometimes shred the whole thing and sometimes not? Can it shred other stuff? Does it only shred the file header?
CAUTION: Note that shred relies on a very important assumption: that the file system overwrites data in place. This is the traditional way to do things, but many modern file system designs do not satisfy this assumption. The following are examples of file systems on which shred is not effective, or is not guaranteed to be effective in all file system modes:
log-structured or journaled file systems, such as those supplied with AIX and Solaris (and JFS, ReiserFS, XFS, Ext3, etc.)
file systems that write redundant data and carry on even if some writes fail, such as RAID-based file systems
file systems that make snapshots, such as Network Appliance’s NFS server
file systems that cache in temporary locations, such as NFS version 3 clients
compressed file systems
In the case of ext3 file systems, the above disclaimer applies (and shred is thus of limited effectiveness) only in data=journal mode, which journals file data in addition to just metadata. In both the data=ordered (default) and data=writeback modes, shred works as usual. Ext3 journaling modes can be changed by adding the data=something option to the mount options for a particular file system in the /etc/fstab file, as documented in the mount man page (man mount).
All shred does is overwrite, flush, check success, and repeat. It does absolutely nothing to find out whether overwriting a file actually results in the blocks which contained the original data being overwritten. This is because without knowing non-standard things about the underlying filesystem, it can't.
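Conceptually, an overwrite pass boils down to something like the following sketch (this is not shred's actual source, just the shape of the loop): nothing in it can observe whether the filesystem or the drive really placed the new bytes on top of the old ones.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"os"
)

// overwritePasses opens the file in place, writes random data over every
// byte, flushes, and repeats. It has no way of knowing where those bytes
// actually land on the physical medium.
func overwritePasses(path string, passes int) error {
	f, err := os.OpenFile(path, os.O_WRONLY, 0)
	if err != nil {
		return err
	}
	defer f.Close()

	info, err := f.Stat()
	if err != nil {
		return err
	}
	buf := make([]byte, 64*1024)

	for p := 0; p < passes; p++ {
		if _, err := f.Seek(0, 0); err != nil {
			return err
		}
		remaining := info.Size()
		for remaining > 0 {
			n := int64(len(buf))
			if remaining < n {
				n = remaining
			}
			if _, err := rand.Read(buf[:n]); err != nil {
				return err
			}
			if _, err := f.Write(buf[:n]); err != nil {
				return err
			}
			remaining -= n
		}
		if err := f.Sync(); err != nil { // flush and check success
			return err
		}
	}
	return nil
}

func main() {
	if err := overwritePasses("secret.txt", 3); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```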
So, journaling filesystems won't overwrite the original blocks in place, because that would stop them recovering cleanly from errors where the change is half-written. If data is journaled, then each pass of shred might be written to a new location on disk, in which case nothing is shredded.
RAID filesystems (depending on the RAID mode) might not overwrite all of the copies of the original blocks. If there's redundancy, you might shred one disk but not the other(s), or you might find that different passes have affected different disks such that each disk is partly shredded.
On any filesystem, the disk hardware itself might just so happen to detect an error (or, in the case of flash, apply wear-leveling even without an error) and remap the logical block to a different physical block, such that the original is marked faulty (or unused) but never overwritten.
Compressed filesystems might not overwrite the original blocks, because the data with which shred overwrites is either random or extremely compressible on each pass, and either one might cause the file to radically change its compressed size and hence be relocated. NTFS stores small files in the MFT, and when shred rounds up the filesize to a multiple of one block, its first "overwrite" will typically cause the file to be relocated out to a new location, which will then be pointlessly shredded leaving the little MFT slot untouched.
Shred can't detect any of these conditions (unless you have a special implementation which directly addresses your fs and block driver - I don't know whether any such things actually exist). That's why it's more reliable when used on a whole disk than on a filesystem.
Shred never shreds "other stuff" in the sense of other files. In some of the cases above it shreds previously-unallocated blocks instead of the blocks which contain your data. It also doesn't shred any metadata in the filesystem (which I guess is what you mean by "file header"). The -u option does attempt to overwrite the file name, by renaming to a new name of the same length and then shortening that one character at a time down to 1 char, prior to deleting the file. You can see this in action if you specify -v too.
The other answers have already done a good job of explaining why shred may not be able to do its job properly.
This can be summarised as:
shred only works on partitions, not individual files
As explained in the other answers, if you shred a single file:
there is no guarantee the actual data is really overwritten, because the filesystem may send writes to the same file to different locations on disk
there is no guarantee the fs did not create copies of the data elsewhere
the fs might even decide to "optimize away" your writes, because you are writing the same file repeatedly (syncing is supposed to prevent this, but again: no guarantee)
But even if you know that your filesystem does not do any of the nasty things above, you also have to consider that many applications will automatically create copies of file data:
crash recovery files which word processors, editors (such as vim) etc. will write periodically
thumbnail/preview files in file managers (sometimes even for non-imagefiles)
temporary files that many applications use
So, short of checking every single binary you use to work with your data, it might have been copied right, left & center without you knowing. The only realistic way is to always shred complete partitions (or disks).
The concern is that data might exist on more than one place on the disk. When the data exists in exactly one location, then shred can deterministically "erase" that information. However, file systems that journal or other advanced file systems may write your file's data in multiple locations, temporarily, on the disk. Shred -- after the fact -- has no way of knowing about this and has no way of knowing where the data may have been temporarily written to disk. Thus, it has no way of erasing or overwriting those disk sectors.
Imagine this: You write a file to disk on a journaled file system that journals not just metadata but also the file data. The file data is temporarily written to the journal, and then written to its final location. Now you use shred on the file. The final location where the data was written can be safely overwritten with shred. However, shred would have to have some way of guaranteeing that the sectors in the journal that temporarily contained your file's contents are also overwritten to be able to promise that your file is truly not recoverable. Imagine a file system where the journal is not even in a fixed location or of a fixed length.
If you are using shred, then you're trying to ensure that there is no possible way your data could be reconstructed. The authors of shred are being honest that there are some conditions beyond their control where they cannot make this guarantee.