Create a sparse file with alternating data blocks and holes on ext3 and XFS - Linux

I wrote a program that creates a sparse file containing alternating empty blocks and data blocks, for example: block1=empty, block2=data, block3=empty, and so on.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

#define BLOCK_SIZE 4096

void *buf;

int main(int argc, char **argv)
{
    int fd = 0;
    int i;
    int sector_per_block = BLOCK_SIZE / 512;
    int block_count = 4;

    buf = malloc(512);
    memset(buf, 'a', 512);  /* note: 'a' (a char), not "a" (a pointer) */

    if (argc != 2) {
        printf("Wrong usage\n USAGE: program absolute_path_to_write\n");
        _exit(-1);
    }
    fd = open(argv[1], O_RDWR | O_CREAT, 0666);
    if (fd < 0) {
        printf("file open failed\n");
        _exit(0);
    }
    while (block_count > 0) {
        lseek(fd, BLOCK_SIZE, SEEK_CUR);  /* skip one block, leaving a hole */
        block_count--;
        for (i = 0; i < sector_per_block; i++)
            write(fd, buf, 512);          /* fill the next block with data */
        block_count--;
    }
    close(fd);
    return 0;
}
Suppose I create new_sparse_file using the code above.
When I run this program on ext3 with a 4KB block size, ls -lh shows the size of new_sparse_file as 16KB, while du -h shows 8KB, which I think is correct.
On XFS with a 4KB block size, ls -lh shows 16KB but du -h shows 12KB.
Why do the two filesystems behave differently?

This is a bug in XFS:
http://oss.sgi.com/archives/xfs/2011-06/msg00225.html

Sparse files are just a space optimization, and a filesystem may choose not to create a sparse file, to reduce file fragmentation and improve file access speed. So you can't depend on how sparse a file will be on a given FS.
In that sense, the 12KB on XFS is correct too.
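You can see both numbers the question compares directly with stat(2): st_size is the logical length that ls reports, while st_blocks counts allocated 512-byte units, which is what du adds up. A minimal sketch (pass it the file created by the program above):
#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
    struct stat st;

    if (argc != 2 || stat(argv[1], &st) != 0) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    /* st_size is what ls -l shows; st_blocks is what du counts */
    printf("logical size:   %lld bytes\n", (long long)st.st_size);
    printf("allocated size: %lld bytes\n", (long long)st.st_blocks * 512);
    return 0;
}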

Logical block number to address (Linux filesystem)

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* FIBMAP */

int main(void)
{
    int block_count;
    struct stat statBuf;
    int block;
    int i, add = 0;
    char buffer[255];

    int fd = open("file.txt", O_RDONLY);
    fstat(fd, &statBuf);
    block_count = (statBuf.st_size + statBuf.st_blksize - 1) / statBuf.st_blksize;

    for (i = 0; i < block_count; i++) {
        block = i;                        /* in: logical block number */
        if (ioctl(fd, FIBMAP, &block)) {  /* out: physical block number */
            perror("FIBMAP ioctl failed");
        }
        printf("%3d %10d\n", i, block);
        add = block;
    }

    int fd2 = open("/dev/sda1", O_RDONLY);
    lseek(fd2, add, SEEK_SET);  /* seeks to byte offset add, not to block add */
    read(fd2, buffer, 20);
    printf("%s%s\n", "ss ", buffer);
    return 0;
}
Output:
0 5038060
1 5038061
2 5038062
3 5038063
4 5038064
5 5038065
ss
I am using the above code to get the logical block numbers of a file. Let's suppose I want to read the contents of the last block. How would I do that?
Is there a way to get the address of a block from its logical block number?
PS: I am using Linux and the filesystem is ext4.
The above code actually is getting you the physical block number. The FIBMAP ioctl takes as input the logical block number, and returns the physical block number. You can then multiply that by the blocksize (which is 4k for most ext4 file systems, but you can get that by using the BLKBSZGET ioctl) to get the byte offset if you want to read that disk block using lseek and read.
Note that the more modern interface that people tend to use today is the FIEMAP interface. This doesn't require root, and returns the physical byte offset, plus a lot more data. For more information, please see:
https://www.kernel.org/doc/Documentation/filesystems/fiemap.txt
Or you can look at the source code for the filefrag command, which is part of e2fsprogs:
https://git.kernel.org/cgit/fs/ext2/e2fsprogs.git/tree/misc/filefrag.c
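As a rough illustration of that interface, here is a minimal sketch that asks the kernel for up to 32 extents of a file and prints their logical and physical byte offsets (error handling kept short; see the fiemap.txt document above for the full flag semantics):
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>       /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>   /* struct fiemap, FIEMAP_MAX_OFFSET */

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    /* Room for the header plus 32 extent records */
    struct fiemap *fm = calloc(1, sizeof(*fm) + 32 * sizeof(struct fiemap_extent));
    fm->fm_start = 0;
    fm->fm_length = FIEMAP_MAX_OFFSET;  /* map the whole file */
    fm->fm_extent_count = 32;

    if (ioctl(fd, FS_IOC_FIEMAP, fm)) { perror("FIEMAP"); return 1; }

    for (unsigned i = 0; i < fm->fm_mapped_extents; i++)
        printf("extent %u: logical %llu, physical %llu, length %llu bytes\n",
               i,
               (unsigned long long)fm->fm_extents[i].fe_logical,
               (unsigned long long)fm->fm_extents[i].fe_physical,
               (unsigned long long)fm->fm_extents[i].fe_length);

    free(fm);
    close(fd);
    return 0;
}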

linux redirect 100GB stdout to file fails

I have this command that writes over 100GB of data to a file.
zfs send snap1 > file
Something appears to go wrong several hours into the process. E.g., if I run the job twice, the output is slightly different. If I try to process the file with
zfs receive snap2 < file
an error is reported after several hours.
For debugging purposes, I'm guessing that there's some low-probability failure in the shell redirection. Has anyone else seen problems with redirecting massive amounts of data? Any suggestions on how to proceed?
Debugging this is tedious because small examples work, and running the large case takes over 3 hours each time.
Earlier I had tried pipes:
zfs send snap1| zfs receive snap2
However, this always failed with much smaller examples, for which
zfs send snap1 > file; zfs receive snap2 < file
worked. (I posted a question about that, but got no useful responses.) This is another reason that I suspect the shell.
Thanks.
The probability that the failure is in the shell (or OS) is negligible compared to a bug in zfs or a problem in how you are using it.
It takes just a few minutes to test your hypothesis: compile this stupid program:
#include <unistd.h>
#include <string.h>

#define BUF (1<<20)
#define INPUT 56

int main(int argc, char* argv[])
{
    char buf[BUF], rbuf[BUF], *a, *b;
    int len, i;

    memset(buf, INPUT, sizeof(buf));
    if (argc == 1)
    {
        /* Reader: verify that every incoming byte matches the pattern */
        while ((len = read(0, rbuf, sizeof(rbuf))) > 0)
        {
            a = buf; b = rbuf;
            for (i = 0; i < len; ++i)
            {
                if (*a != *b)
                    return 1;
                ++a; ++b;
            }
        }
    }
    else
    {
        /* Writer: emit the constant pattern forever */
        while (write(1, buf, sizeof(buf)) > 0)
            ;
    }
    return 0;
}
then try mkfifo a; ./a.out w > a in one shell and pv < a | ./a.out in another, and see how long it takes to get any bit flip.
It should get into the TiB range relatively fast...

Linux file descriptor table and vmalloc

I see that the Linux kernel uses vmalloc to allocate memory for the fdtable when it's bigger than a certain threshold. I would like to know when exactly this happens and to get a clearer picture of it.
static void *alloc_fdmem(size_t size)
{
    /*
     * Very large allocations can stress page reclaim, so fall back to
     * vmalloc() if the allocation size will be considered "large" by the VM.
     */
    if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
        void *data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN);
        if (data != NULL)
            return data;
    }
    return vmalloc(size);
}
alloc_fdmem() is called from alloc_fdtable(), and the latter is called from expand_fdtable().
I wrote this code to print the threshold size.
#include <stdio.h>

#define PAGE_ALLOC_COSTLY_ORDER 3
#define PAGE_SIZE 4096

int main(){
    printf("\t%d\n", PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER);
}
Output
./printo
32768
So, how many files does it take for the kernel to switch to using vmalloc to allocate fdtable?
So PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER is 32768
This is called like:
data = alloc_fdmem(nr * sizeof(struct file *));
i.e. it's used to store struct file pointers.
If your pointers are 4 bytes, that happens when you have more than 32768/4 = 8192 open files; if your pointers are 8 bytes, it happens above 32768/8 = 4096 open files.
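To make that arithmetic concrete, here is a small user-space sketch; it just mirrors the kernel constants quoted above (assuming 4 KiB pages) rather than querying the running kernel:
#include <stdio.h>

/* Mirrors the kernel constants quoted above; assumes 4 KiB pages */
#define PAGE_ALLOC_COSTLY_ORDER 3
#define PAGE_SIZE 4096

int main(void)
{
    unsigned long threshold = (unsigned long)PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER;

    /* alloc_fdmem() receives nr * sizeof(struct file *) */
    printf("threshold: %lu bytes\n", threshold);
    printf("4-byte pointers: vmalloc above %lu open files\n", threshold / 4);
    printf("8-byte pointers: vmalloc above %lu open files\n", threshold / 8);
    return 0;
}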

Detect block size for quota in Linux

The limit placed on disk quota in Linux is counted in blocks. However, I have found no reliable way to determine the block size: tutorials I found refer to a block size of 512 bytes, and sometimes of 1024 bytes.
I got confused reading a post on LinuxForum.org about what a block size really means, so I tried to pin down that meaning in the context of quota.
I found a "Determine the block size on hard disk filesystem for disk quota" tip on NixCraft, that suggested the command:
dumpe2fs /dev/sdXN | grep -i 'Block size'
or
blockdev --getbsz /dev/sdXN
But on my system those commands returned 4096, while the real quota block size on the same system turned out to be 1024 bytes.
Is there a scriptable way to determine the quota block size on a device, short of creating a file of known size and checking its quota usage?
The filesystem blocksize and the quota blocksize are potentially different. The quota blocksize is given by the BLOCK_SIZE macro defined in <sys/mount.h> (/usr/include/sys/mount.h):
#ifndef _SYS_MOUNT_H
#define _SYS_MOUNT_H 1
#include <features.h>
#include <sys/ioctl.h>
#define BLOCK_SIZE 1024
#define BLOCK_SIZE_BITS 10
...
The filesystem blocksize for a given filesystem is returned by the statvfs call:
#include <stdio.h>
#include <sys/statvfs.h>

int main(int argc, char *argv[])
{
    char *fn;
    struct statvfs vfs;

    if (argc > 1)
        fn = argv[1];
    else
        fn = argv[0];

    if (statvfs(fn, &vfs))
    {
        perror("statvfs");
        return 1;
    }
    printf("(%s) bsize: %lu\n", fn, vfs.f_bsize);
    return 0;
}
The <sys/quota.h> header includes a convenience macro to convert filesystem blocks to disk quota blocks:
/*
* Convert count of filesystem blocks to diskquota blocks, meant
* for filesystems where i_blksize != BLOCK_SIZE
*/
#define fs_to_dq_blocks(num, blksize) (((num) * (blksize)) / BLOCK_SIZE)
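Putting the two together, here is a small sketch that reads the filesystem blocksize with statvfs and converts it to quota blocks; the fs_to_dq_blocks macro is copied inline so the example stands alone:
#include <stdio.h>
#include <sys/mount.h>    /* defines BLOCK_SIZE (1024) */
#include <sys/statvfs.h>

/* Copied from <sys/quota.h> so the example is self-contained */
#define fs_to_dq_blocks(num, blksize) (((num) * (blksize)) / BLOCK_SIZE)

int main(int argc, char *argv[])
{
    struct statvfs vfs;
    const char *fn = (argc > 1) ? argv[1] : ".";

    if (statvfs(fn, &vfs))
    {
        perror("statvfs");
        return 1;
    }
    /* e.g. with a 4096-byte filesystem block: 1 fs block = 4 quota blocks */
    printf("(%s) 1 fs block of %lu bytes = %lu quota blocks of %d bytes\n",
           fn, vfs.f_bsize, fs_to_dq_blocks(1UL, vfs.f_bsize), BLOCK_SIZE);
    return 0;
}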

Tools to reduce risk regarding password security and HDD slack space

Down at the bottom of this essay is a comment about a spooky way to beat passwords: scan the entire HDD of a user, including dead space, swap space, etc., and just try everything that looks like it might be a password.
The question, part 1: are there any tools around (a live CD, for instance) that will scan an unmounted file system and zero everything that can be zeroed? (Note: I'm not trying to find passwords.)
This would include:
Slack space that is not part of any file
Unused parts of the last block used by a file
Swap space
Hibernation files
Dead space inside of some types of binary files (like .DOC)
The tool (aside from the last case) would not modify anything that can be detected via the file system API. I'm not looking for a block-device find/replace, but rather something that just scrubs everything that isn't part of a file.
Part 2: how practical would such a program be? How hard would it be to write? How common is it for file formats to contain uninitialized data?
One (risky and costly) way to do this would be to use a filesystem-aware backup tool (one that only copies the actual data) to back up the whole disk, wipe it clean, and then restore it.
I don't understand your first question (do you want to modify the file system? Why? Isn't this dead space exactly where you want to look?)
Anyway, here's an example of such a tool:
#include <stdio.h>
#include <alloca.h>
#include <string.h>
#include <ctype.h>

/* Number of bytes we read at once, >2*maxlen */
#define BUFSIZE (1024*1024)

/* Replace this with a function that tests the password consisting of
 * the first len bytes of pw */
int testPassword(const char* pw, int len) {
    /*char* buf = alloca(len+1);
    memcpy(buf, pw, len);
    buf[len] = '\0';
    printf("Testing %s\n", buf);*/
    int rightLen = strlen("secret");
    return len == rightLen && memcmp(pw, "secret", len) == 0;
}

int main(int argc, char* argv[]) {
    int minlen = 5; /* We know the password is at least 5 characters long */
    int maxlen = 7; /* ... and at most 7. Modify to find longer ones */
    int avlen = 0;  /* available length - the number of bytes we already
                     * tested and think could belong to a password */
    int i;
    char* curstart;
    char* curp;
    FILE* f;
    size_t bytes_read;
    char* buf = alloca(BUFSIZE+maxlen);

    if (argc != 2) {
        printf("Usage: %s disk-file\n", argv[0]);
        return 1;
    }
    f = fopen(argv[1], "rb");
    if (f == NULL) {
        printf("Couldn't open %s\n", argv[1]);
        return 2;
    }
    for (;;) {
        /* Copy the rest of the buffer to the front */
        memcpy(buf, buf+BUFSIZE, maxlen);
        bytes_read = fread(buf+maxlen, 1, BUFSIZE, f);
        if (bytes_read == 0) {
            /* Read the whole file */
            break;
        }
        for (curstart = buf; curstart < buf+bytes_read;) {
            for (curp = curstart+avlen; curp < curstart+maxlen; curp++) {
                /* Let's assume the password just contains letters and
                 * digits. Use isprint() otherwise. */
                if (!isalnum((unsigned char)*curp)) {
                    curstart = curp + 1;
                    break;
                }
            }
            avlen = curp - curstart;
            if (avlen < minlen) {
                /* Nothing to test here, move along */
                curstart = curp+1;
                avlen = 0;
                continue;
            }
            for (i = minlen; i <= avlen; i++) {
                if (testPassword(curstart, i)) {
                    char* found = alloca(i+1);
                    memcpy(found, curstart, i);
                    found[i] = '\0';
                    printf("Found password: %s\n", found);
                }
            }
            avlen--;
            curstart++;
        }
    }
    fclose(f);
    return 0;
}
Installation:
Start a Linux Live CD
Copy the program to the file hddpass.c in your home directory
Open a terminal and type the following
su || sudo -s # Makes you root so that you can access the HDD
apt-get install -y gcc # Install gcc
This works only on Debian/Ubuntu et al.; check your system documentation for others
gcc -o hddpass hddpass.c # Compile.
./hddpass /dev/YOURDISK # The disk is usually sda, hda on older systems
Look at the output
Test (copy to console, as root):
gcc -o hddpass hddpass.c
</dev/zero head -c 10000000 >testdisk # Create an empty 10MB file
mkfs.ext2 -F testdisk # Create a file system
rm -rf mountpoint; mkdir -p mountpoint
mount -o loop testdisk mountpoint # needs root rights
</dev/urandom head -c 5000000 >mountpoint/f # Write stuff to the disk
echo asddsasecretads >> mountpoint/f # Append the password to the file
# On some file systems, you could even remove the file.
umount mountpoint
./hddpass testdisk # prints secret
Test it yourself on an Ubuntu Live CD:
# Start a console and type:
wget http://phihag.de/2009/so/hddpass-testscript.sh
sh hddpass-testscript.sh
Therefore, it's relatively easy. As I found out myself, ext2 (the file system I used) overwrites deleted files; however, I'm pretty sure some file systems don't. The same goes for the pagefile.
How common is it for file formats to contain uninitialized data?
Less and less common, I would've thought. The classic "offender" is older versions of MS Office applications that (essentially) did a memory dump to disk as their "quicksave" format: no serialisation, no selection of what to dump, and a memory allocator that didn't zero newly allocated memory pages. That led to not only juicy things from previous versions of the document (so the user could use undo), but also juicy snippets from other applications.
How hard would it be to write?
Something that clears out unallocated disk blocks shouldn't be that hard. It'd need to run either off-line or as a kernel module, so as not to interfere with normal file-system operations, but most file systems have an "allocated"/"not allocated" structure that is fairly straightforward to parse. Swap is harder, but as long as you're OK with having it cleared on boot (or shutdown), it's not too tricky. Clearing out the tail block is trickier, definitely not something I'd want to try to do on-line, but it shouldn't be TOO hard to make it work for off-line cleaning.
How practical would such a program be?
Depends on your threat model, really. I'd say that on one end it'd not give you much at all, but on the other it's a definite help in keeping information out of the wrong hands. I can't give a hard and fast answer, though.
Well, if I was going to code it for a boot CD, I'd do something like this:
File is 101 bytes but takes up a 4096-byte cluster.
Copy the file "A" to "B" which has nulls added to the end.
Delete "A" and overwrite it's (now unused) cluster.
Create "A" again and use the contents of "B" without the tail (remember the length).
Delete "B" and overwrite it.
Not very efficient, and it would need a tweak to make sure you don't try to copy the first (and therefore full) clusters of a file. Otherwise, you'll run into slowness, and into failure if there's not enough free space.
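For illustration, here is a rough sketch of the copy-and-pad and truncate steps above; it hard-codes an assumed 4096-byte cluster size (a real tool would query the filesystem), and whether the zero-padded tail actually replaces the old slack on disk depends on the filesystem:
#include <stdio.h>
#include <string.h>
#include <unistd.h>   /* truncate() */

#define CLUSTER 4096  /* assumed cluster size; a real tool would query the FS */

int main(int argc, char *argv[])
{
    FILE *in, *out;
    char block[CLUSTER];
    long real_len = 0;
    size_t n;

    if (argc != 3) {
        fprintf(stderr, "Usage: %s A B\n", argv[0]);
        return 1;
    }
    in = fopen(argv[1], "rb");
    out = fopen(argv[2], "wb");
    if (in == NULL || out == NULL) {
        perror("fopen");
        return 2;
    }
    /* Copy A to B, zero-filling the slack of the final cluster */
    while ((n = fread(block, 1, sizeof(block), in)) > 0) {
        real_len += (long)n;
        memset(block + n, 0, sizeof(block) - n);
        fwrite(block, 1, sizeof(block), out);
    }
    fclose(in);
    fclose(out);
    /* Cut B back to the real length; the tail cluster on disk now holds
     * zeros instead of whatever slack data was there before */
    if (truncate(argv[2], real_len)) {
        perror("truncate");
        return 3;
    }
    /* Deleting and overwriting the old copies is omitted here */
    return 0;
}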
Are there open-source tools that do this efficiently?
