How can I create a corrupt file with specified file size? - linux

By corrupt I mean make a empty file or take an actual file and corrupting it so it becomes unreadable.

dd if=/dev/urandom of=somefile bs=somesize count=1
See the man page for details on the size.

Burn the file onto a cd/dvd then lightly scratch the disk

Files can contain any arbitrary pattern of bytes, so there is no (digital) way to create a file that is "corrupted". You can certainly modify a file in an existing format (for example, an XML file) so that it no longer is valid in whatever format it's supposed to be, but it's still just a file on disk and is perfectly readable.
You generally need to physically alter the storage medium on which the file is stored, in order to make the file actually unreadable.

My guess is that your best way is to mount a FUSE filesystem which is modified to return errors for specific files. As FUSE really is a filesystem driver (albeit in userspace), it can throw back whatever error code you want.

Related

Syncing a file system that has no file on it

Say I want to synchronize data buffers of a file system to disk (in my case the one of an USB stick partition) on a linux box.
While searching for a function to do that I found the following
DESCRIPTION
sync() causes all buffered modifications to file metadata and
data to be written to the underlying file sys‐
tems.
syncfs(int fd) is like sync(), but synchronizes just the file system
containing file referred to by the open file
descriptor fd.
But what if the file system has no file on it that I can open and pass to syncfs? Can I "abuse" the dot file? Does it appear on all file systems?
Is there another function that does what I want? Perhaps by providing a device file with major / minor numbers or some such?
Yes I think you can do that. The root directory of your file system will have at least one inode for your root directory. You can use the .-file to do that. Play also around with ls -i to see the inode numbers.
Is there a possibility to avoid your problem by mounting your file system with sync? Does performance issues hamper? Did you have a look at remounting? This can sync your file system as well in particular cases.
I do not know what your application is, but I suffered problems with synchronization of files to a USB stick with the FAT32-file system. It resulted in weird read and write errors. I can not imagine any other valid reason why you should sync an empty file system.
From man 8 sync description:
"sync writes any data buffered in memory out to disk. This can include (but is not
limited to) modified superblocks, modified inodes, and delayed reads and writes. This
must be implemented by the kernel; The sync program does nothing but exercise the sync(2)
system call."
So, note that it's all about modification (modified inode, superblocks etc). If you don't have any modification, it don't have anything to sync up.

How to estimate a file size from header's sector start address?

Suppose I have a deleted file in my unallocated space on a linux partition and i want to retrieve it.
Suppose I can get the start address of the file by examining the header.
Is there a way by which I can estimate the number of blocks to be analyzed hence (this depends on the size of the image.)
In general, Linux/Unix does not support recovering deleted files - if it is deleted, it should be gone. This is also good for security - one user should not be able to recover data in a file that was deleted by another user by creating huge empty file spanning almost all free space.
Some filesystems even support so called secure delete - that is, they can automatically wipe file blocks on delete (but this is not common).
You can try to write a utility which will open whole partition that your filesystem is mounted on (say, /dev/sda2) as one huge file and will read it and scan for remnants of your original data, but if file was fragmented (which is highly likely), chances are very small that you will be able to recover much of the data in some usable form.
Having said all that, there are some utilities which are trying to be a bit smarter than simple scan and can try to be undelete your files on Linux, like extundelete. It may work for you, but success is never guaranteed. Of course, you must be root to be able to use it.
And finally, if you want to be able to recover anything from that filesystem, you should unmount it right now, and take a backup of it using dd or pipe dd compressed through gzip to save space required.

I/O Performance in Linux

File A in a directory which have 10000 files, and file B in a directory which have 10 files, Would read/write file A slower than file B?
Would it be affected by different journaling file system?
No.
Browsing the directory and opening a file will be slower (whether or not that's noticeable in practice depends on the filesystem). Input/output on the file is exactly the same.
EDIT:
To clarify, the "file" in the directory is not really the file, but a link ("hard link", as opposed to symbolic link), which is merely a kind of name with some metadata, but otherwise unrelated to what you'd consider "the file". That's also the historical reason why deleting a file is done via the unlink syscall, not via a hypothetical deletefile call. unlink removes the link, and if that was the last link (but only then!), the file.
It is perfectly legal for one file to have a hundred links in different directories, and it is perfectly legal to open a file and then move it to a different place or even unlink it (while it remains open!). It does not affect your ability to read/write on the file descriptor in any way, even when a file (to your knowledge) does not even exist any more.
In general, once a file has been opened and you have a handle to it, the performance of accessing that file will be the same no matter how many other files are in the same directory. You may be able to detect a small difference in the time it takes to open the file, as the OS will have to search for the file name in the directory.
Journaling aims to reduce the recover time from file system crashes, IMHO, it will not affect the read/write speed of files. Journaling ext2

Creating a hard link to partial file contents in linux

I have fileA, which has a size of 200 MB. Now I want to create a hardlink to fileA, named fileB, but I only want this file point to the first 100 mb of fileA. So basically I need fileB to point to the same data blocks, but with a different length. It doesn't necessarily have to be a real hardlink it could be a virtual file proxying contents.
I was thinking about duplicating the Inode somehow and changing the length, but I presume this could cause filesystem coherency issues (when datablocks move around etc.). Is there any linux tool or user-level system call that could let me do this?
You cannot do this directly in the manner you are talking about. There are filesystems that sort of do this. For example LessFS can do this. I also believe that the underlying structure of btrfs supports this, so if someone put the hooks in, it would be accessible at user level.
But, there is no way to do this with any of the ext filesystems, and I believe that it happens implicitly in LessFS.
There is a really ugly way of doing it in btrfs. If you make a snapshot, then truncate the snapshot file to 100M, you have effectively achieved your goal.
I believe this would also work with btrfs and a sufficiently recent version of cp you could just copy the file with cp then truncate one copy. The version of cp would have to have the --reflink option, and if you want to be extra sure about this, give the --reflink=always option.
Adding to #Omnifarious's answer:
What you're describing is not a hard link. A hard links is essentially a reference to an inode, by a path name. (A soft link is a reference to a path name, by a path name.) There is no mechanism to say, "I want this inode, but slightly different, only the first k blocks". A copy-on-write filesystem could do this for you under the covers. If you were using such a filesystem then you would simply say
cp fileA fileB && truncate -s 200M fileB
Of course, this also works on a non-copy-on-write filesystem, but it takes up an extra 200 MB instead of just the filesystem overhead.
Now, that said, you could still implement something like this easily on Linux with FUSE. You could implement a filesystem that mirrors some target directory but simply artificially sets a maximum length to files (at say 200 MB).
FUSE
FUSE Hello World
Maybe you can check ChunkFS. I think this is what you need (I didn't try it).

Shred: Doesn't work on Journaled FS?

Shred documentation says shred is "not guaranteed to be effective" (See bottom). So if I shred a document on my Ext3 filesystem or on a Raid, what happens? Do I shred part of the file? Does it sometimes shred the whole thing and sometimes not? Can it shred other stuff? Does it only shred the file header?
CAUTION: Note that shred relies on a very important assumption:
that the file system overwrites data in place. This is the
traditional way to do things, but many modern file system designs
do not satisfy this assumption. The following are examples of file
systems on which shred is not effective, or is not guaranteed to be
effective in all file sys‐ tem modes:
log-structured or journaled file systems, such as those supplied with AIX and Solaris (and JFS, ReiserFS, XFS, Ext3, etc.)
file systems that write redundant data and carry on even if some writes fail, such as RAID-based file systems
file systems that make snapshots, such as Network Appliance’s NFS server
file systems that cache in temporary locations, such as NFS version 3 clients
compressed file systems
In the case of ext3 file systems, the above disclaimer applies
(and shred is thus of limited effectiveness) only in data=journal
mode, which journals file data in addition to just metadata. In
both the data=ordered (default) and data=writeback modes, shred
works as usual. Ext3 journaling modes can be changed by adding
the data=something option to the mount options for a
particular file system in the /etc/fstab file, as documented in the
mount man page (man mount).
All shred does is overwrite, flush, check success, and repeat. It does absolutely nothing to find out whether overwriting a file actually results in the blocks which contained the original data being overwritten. This is because without knowing non-standard things about the underlying filesystem, it can't.
So, journaling filesystems won't overwrite the original blocks in place, because that would stop them recovering cleanly from errors where the change is half-written. If data is journaled, then each pass of shred might be written to a new location on disk, in which case nothing is shredded.
RAID filesystems (depending on the RAID mode) might not overwrite all of the copies of the original blocks. If there's redundancy, you might shred one disk but not the other(s), or you might find that different passes have affected different disks such that each disk is partly shredded.
On any filesystem, the disk hardware itself might just so happen to detect an error (or, in the case of flash, apply wear-leveling even without an error) and remap the logical block to a different physical block, such that the original is marked faulty (or unused) but never overwritten.
Compressed filesystems might not overwrite the original blocks, because the data with which shred overwrites is either random or extremely compressible on each pass, and either one might cause the file to radically change its compressed size and hence be relocated. NTFS stores small files in the MFT, and when shred rounds up the filesize to a multiple of one block, its first "overwrite" will typically cause the file to be relocated out to a new location, which will then be pointlessly shredded leaving the little MFT slot untouched.
Shred can't detect any of these conditions (unless you have a special implementation which directly addresses your fs and block driver - I don't know whether any such things actually exist). That's why it's more reliable when used on a whole disk than on a filesystem.
Shred never shreds "other stuff" in the sense of other files. In some of the cases above it shreds previously-unallocated blocks instead of the blocks which contain your data. It also doesn't shred any metadata in the filesystem (which I guess is what you mean by "file header"). The -u option does attempt to overwrite the file name, by renaming to a new name of the same length and then shortening that one character at a time down to 1 char, prior to deleting the file. You can see this in action if you specify -v too.
The other answers have already done a good job of explaining why shred may not be able to do its job properly.
This can be summarised as:
shred only works on partitions, not individual files
As explained in the other answers, if you shred a single file:
there is no guarantee the actual data is really overwritten, because the filesystem may send writes to the same file to different locations on disk
there is no guarantee the fs did not create copies of the data elsewhere
the fs might even decide to "optimize away" your writes, because you are writing the same file repeatedly (syncing is supposed to prevent this, but again: no guarantee)
But even if you know that your filesystem does not do any of the nasty things above, you also have to consider that many applications will automatically create copies of file data:
crash recovery files which word processors, editors (such as vim) etc. will write periodically
thumbnail/preview files in file managers (sometimes even for non-imagefiles)
temporary files that many applications use
So, short of checking every single binary you use to work with your data, it might have been copied right, left & center without you knowing. The only realistic way is to always shred complete partitions (or disks).
The concern is that data might exist on more than one place on the disk. When the data exists in exactly one location, then shred can deterministically "erase" that information. However, file systems that journal or other advanced file systems may write your file's data in multiple locations, temporarily, on the disk. Shred -- after the fact -- has no way of knowing about this and has no way of knowing where the data may have been temporarily written to disk. Thus, it has no way of erasing or overwriting those disk sectors.
Imagine this: You write a file to disk on a journaled file system that journals not just metadata but also the file data. The file data is temporarily written to the journal, and then written to its final location. Now you use shred on the file. The final location where the data was written can be safely overwritten with shred. However, shred would have to have some way of guaranteeing that the sectors in the journal that temporarily contained your file's contents are also overwritten to be able to promise that your file is truly not recoverable. Imagine a file system where the journal is not even in a fixed location or of a fixed length.
If you are using shred, then you're trying to ensure that there is no possible way your data could be reconstructed. The authors of shred are being honest that there are some conditions beyond their control where they cannot make this guarantee.

Resources