Discrepancy between call to statvfs and df command - linux

When I use the statvfs command on a Linux machine to get the available free space on a mounted file system, the number I get is slightly different than what is reported by df.
For example, one on machine I have with a 500G hard drive, I get the following output from df:
# df --block-size=1 --no-sync
Filesystem 1B-blocks Used Available Use% Mounted on
/dev/md0 492256247808 3422584832 463828406272 1% /
tmpfs 2025721856 0 2025721856 0% /lib/init/rw
varrun 2025721856 114688 2025607168 1% /var/run
varlock 2025721856 4096 2025717760 1% /var/lock
udev 2025721856 147456 2025574400 1% /dev
tmpfs 2025721856 94208 2025627648 1% /dev/shm
A call to statvfs gives me a block size of 4096 and 119344155 free blocks, so that there should be 488,833,658,880 bytes free. Yet, df reports there are 463,828,406,272 bytes free. Why is there a discrepancy here?

Since your discrepancy is close to 5% [1], which is the default percentage allocated for root, there is a possibility that you compare the df result with the ->f_bfree of statvfs and
not with ->f_bavail, which is what df uses.
[1]: (488833658880 - 463828406272)/492256247808 = 0.0508

#include <stdio.h>
#include <sys/statvfs.h>
int main(){
struct statvfs stat;
int a=statvfs("/",&stat);
printf("total free disk space of the partition: %d GB \n",(stat.f_bavail)*8/2097152);
//512 is 2^9 - one half of a kilobyte.
//A kilobyte is 2^10. A megabyte is 2^20. A gigabyte is 2^30. A terabyte
//is 2^40. And so on. The common computer units go up by 10's of powers
//of 2 like that.
//So you need to divide by 2^(30-9) == 2^21 == 2097152 to get gigabytes.
//And multiply by 8 because 1 byte=8bit
return 0;
}
I do in this form because I prefer the outcome in Gb, but you can modify the units changing the exponent.And is true the first answer, as you can see, I use that too

Note that under Linux, df uses stat on the device file, not statvfs; cf the coreutils source.
However, the basic principle above applies. As to why df is so much faster and whether there are any shortcuts available for du....
The filesystem entry for any specific folder contains information about that folder only: how much space is allocated for the folder itself, and how space is allocated for the file system entries for the files and folders in that folder - it does not contain the total space occupied by that folder and all of its subfolders.
To get that information, du has to list all of the folders in the original folder, all of their folders, and so on, totalling as it goes.
So du will return very quickly for a folder without subfolders, and ever more slowly for folders with increasing numbers of subfolders.
Contrast that with df, with a call to stat(3) against a device file, or with a call to statfs(2) or statvfs(3) against any file on a device, etc., all of which return information about the specific device/filesystem immediately.
du can only rival the speed of df in the case of being called against a single file, where both du and df are making a single system call and doing very little math.

Related

Ext4 "unused inodes" "free inodes" diffrence?

When I use the dumpe2fs command to look at the Block Group of the ext4 filesystem, I see "free inodes" and "unused inodes".
I want to know the difference between them ?
Why do they have different values in Group 0 ?
Group 0: (Blocks 0-32767) [ITABLE_ZEROED]
Checksum 0xd1a1, unused inodes 0
Primary superblock at 0, Group descriptors at 1-3
Reserved GDT blocks at 4-350
Block bitmap at 351 (+351), Inode bitmap at 367 (+367)
Inode table at 383-892 (+383)
12 free blocks, 1 free inodes, 1088 directories
Free blocks: 9564, 12379-12380, 12401-12408, 12411
Free inodes: 168
Group 1: (Blocks 32768-65535) [ITABLE_ZEROED]
Checksum 0x0432, unused inodes 0
Backup superblock at 32768, Group descriptors at 32769-32771
Reserved GDT blocks at 32772-33118
Block bitmap at 352 (+4294934880), Inode bitmap at 368 (+4294934896)
Inode table at 893-1402 (+4294935421)
30 free blocks, 0 free inodes, 420 directories
Free blocks: 37379-37384, 37386-37397, 42822-42823, 42856-42859, 42954-42955, 44946-44947, 45014-45015
Free inodes:
The "unused inodes" reported are inodes at the end of the inode table for each group that have never been used in the lifetime of the filesystem, so e2fsck does not need to scan them during repair. This can speed up e2fsck pass-1 scanning significantly.
The "free inodes" are the current unallocated inodes in the group. This number includes the "unused inodes" number, so that they will still be used if there are many (typically very small) inodes allocated in a single group.
From:
https://unix.stackexchange.com/a/715165/536354

Intel NVMe drive Performance degradation with xfs filesystem with sector size other than 4096

I am working with NVMe card on linux(Ubuntu 14.04).
I am finding some performance degradation for Intel NVMe card when formatted with xfs file system with its default sector size(512). or any other sector size less than 4096.
In the experiment I formatted the card with xfs filesystem with default options. I tried running fio with 64k block size on an arm64 platform with 64k page size.
This is the command used
fio --rw=randread --bs=64k --ioengine=libaio --iodepth=8 --direct=1 --group_reporting --name=Write_64k_1 --numjobs=1 --runtime=120 --filename=new --size=20G
I could get only the below values
Run status group 0 (all jobs):
READ: io=20480MB, aggrb=281670KB/s, minb=281670KB/s, maxb=281670KB/s, mint=744454msec, maxt=74454msec
Disk stats (read/write):
nvme0n1: ios=326821/8, merge=0/0, ticks=582640/0, in_queue=582370, util=99.93%
I tried formatting as follows:
mkfs.xfs -f -s size=4096 /dev/nvme0n1
then the values were :
Run status group 0 (all jobs):
READ: io=20480MB, aggrb=781149KB/s, minb=781149KB/s, maxb=781149KB/s, mint=266
847msec, maxt=26847msec
Disk stats (read/write):
nvme0n1: ios=326748/7, merge=0/0, ticks=200270/0, in_queue=200350, util=99.51%
I find no performance degradation when used with
4k page size
Any fio block size lesser than 64k
With ext4 fs with default configs
What could be the issue? Is this any alignment issue? What Am I missing here? Any help appreciated
The issue is your SSD's native sector size is 4K. So your file system's block size should be set to match so that reads and writes are aligned on sector boundaries. Otherwise you'll have blocks that span 2 sectors, and therefore require 2 sector reads to return 1 block (instead of 1 read).
If you have an Intel SSD, the newer ones have a variable sector size you can set using their Intel Solid State Drive DataCenter Tool. But honestly 4096 is still probably the drive's true sector size anyway and you'll get the most consistent performance using it and setting your file system to match.
On ZFS on Linux the setting is ashift=12 for 4K blocks.

How to get rid of "Some devices missing" in BTRFS after reuse of devices?

I have been playing around with BTRFS on a few drives I had lying around. At first I created BTRFS using the entire drive, but eventually I decided I wanted to use GPT partitions on the drives and recreated the filesystem I needed on the partitions that resulted. (This was so I could use a portion of each drive as Linux swap space, FYI.)
When I got this all done, BTRFS worked a treat. But I have annoying messages saying that I have some old filesystems from my previous experimentation that I have actually nuked. I worry this meant that BTRFS was confused about what space on the drives was available, or that some sort of corruption might occur.
The messages look like this:
$ sudo btrfs file show
Label: 'x' uuid: 06fa59c9-f7f6-4b73-81a4-943329516aee
Total devices 3 FS bytes used 159.20GB
devid 3 size 931.00GB used 134.01GB path /dev/sde
*** Some devices missing
Label: 'root' uuid: 5f63d01d-3fde-455c-bc1c-1b9946e9aad0
Total devices 4 FS bytes used 1.13GB
devid 4 size 931.51GB used 1.03GB path /dev/sdd
devid 3 size 931.51GB used 2.00GB path /dev/sdc
devid 2 size 931.51GB used 1.03GB path /dev/sdb
*** Some devices missing
Label: 'root' uuid: e86ff074-d4ac-4508-b287-4099400d0fcf
Total devices 5 FS bytes used 740.93GB
devid 4 size 911.00GB used 293.03GB path /dev/sdd1
devid 5 size 931.51GB used 314.00GB path /dev/sde1
devid 3 size 911.00GB used 293.00GB path /dev/sdc1
devid 2 size 911.00GB used 293.03GB path /dev/sdb1
devid 1 size 911.00GB used 293.00GB path /dev/sda1
As you can see, I have an old filesystem labeled 'x' and an old one labeled 'root', and both of these have "Some devices missing". The real filesystem, the last one shown, is the one that I am now using.
So how do I clean up the old "Some devices missing" filesystems? I'm a little worried, but mostly just OCD and wanting to tidy up this messy output.
Thanks.
To wipe from disks that are NOT part of your wanted BTRFS FS, I found:
How to clean up old superblock ?
...
To actually remove the filesystem use:
wipefs -o 0x10040 /dev/sda
8 bytes [5f 42 48 52 66 53 5f 4d] erased at offset 0x10040 (btrfs)"
from: https://btrfs.wiki.kernel.org/index.php/Problem_FAQ#I_can.27t_mount_my_filesystem.2C_and_I_get_a_kernel_oops.21
I actually figured this out for myself. Maybe it will help someone else.
I poked around in the code to see what was going on. When the btrfs filesystem show command is used to show all filesystems on all devices, it scans every device and partition in /proc/partitions. Each device and each partition is examined to see if there is a BTRFS "magic number" and associated valid root data structure found at 0x10040 offset from the beginning of the device or partition.
I then used hexedit on a disk that was showing up wrong in my own situation and sure enough there was a BTRFS magic number (which is the ASCII string _BHRfS_M) there from my previous experiments.
I simply nailed that magic number by overwriting a couple of the characters of the string with "**", also using hexedit, and the erroneous entries magically disappeared!

why is the output of `du` often so different from `du -b`

why is the output of du often so different from du -b? -b is shorthand for --apparent-size --block-size=1. only using --apparent-size gives me the same result most of the time, but --block-size=1 seems to do the trick. i wonder if the output is then correct even, and which numbers are the ones i want? (i.e. actual filesize, if copied to another storage device)
Apparent size is the number of bytes your applications think are in the file. It's the amount of data that would be transferred over the network (not counting protocol headers) if you decided to send the file over FTP or HTTP. It's also the result of cat theFile | wc -c, and the amount of address space that the file would take up if you loaded the whole thing using mmap.
Disk usage is the amount of space that can't be used for something else because your file is occupying that space.
In most cases, the apparent size is smaller than the disk usage because the disk usage counts the full size of the last (partial) block of the file, and apparent size only counts the data that's in that last block. However, apparent size is larger when you have a sparse file (sparse files are created when you seek somewhere past the end of the file, and then write something there -- the OS doesn't bother to create lots of blocks filled with zeros -- it only creates a block for the part of the file you decided to write to).
Minimal block granularity example
Let's play a bit to see what is going on.
mount tells me I'm on an ext4 partition mounted at /.
I find its block size with:
stat -fc %s .
which gives:
4096
Now let's create some files with sizes 1 4095 4096 4097:
#!/usr/bin/env bash
for size in 1 4095 4096 4097; do
dd if=/dev/zero of=f bs=1 count="${size}" status=none
echo "size ${size}"
echo "real $(du --block-size=1 f)"
echo "apparent $(du --block-size=1 --apparent-size f)"
echo
done
and the results are:
size 1
real 4096 f
apparent 1 f
size 4095
real 4096 f
apparent 4095 f
size 4096
real 4096 f
apparent 4096 f
size 4097
real 8192 f
apparent 4097 f
So we see that anything below or equal to 4096 takes up 4096 bytes in fact.
Then, as soon as we cross 4097, it goes up to 8192 which is 2 * 4096.
It is clear then that the disk always stores data at a block boundary of 4096 bytes.
What happens to sparse files?
I haven't investigated what is the exact representation is, but it is clear that --apparent does take it into consideration.
This can lead to apparent sizes being larger than actual disk usage.
For example:
dd seek=1G if=/dev/zero of=f bs=1 count=1 status=none
du --block-size=1 f
du --block-size=1 --apparent f
gives:
8192 f
1073741825 f
Related: How to test if sparse file is supported
What to do if I want to store a bunch of small files?
Some possibilities are:
use a database instead of filesystem: Database vs File system storage
use a filesystem that supports block suballocation
Bibliography:
https://serverfault.com/questions/565966/which-block-sizes-for-millions-of-small-files
https://askubuntu.com/questions/641900/how-file-system-block-size-works
Tested in Ubuntu 16.04.
Compare (for example) du -bm to du -m.
The -b sets --apparent-size --block-size=1,
but then the m overrides the block-size to be 1M.
Similar for -bh versus -h:
the -bh means --apparent-size --block-size=1 --human-readable, and again the h overrides that block-size.
Files and folders have their real size and the size on disk.
--apparent-size is file or folder real size
size on disk is the amount of bytes the file or folder takes on disk.
Same thing when using just du.
If you encounter that apparent-size is almost always several magnitudes higher than disk usage then it means that you have a lot of (`sparse') files of files with internal fragmentation or indirect blocks.
Because by default du gives disk usage, which is the same or larger than the file size. As said under --apparent-size
print apparent sizes, rather than disk usage; although the apparent size is usually smaller, it may be
larger due to holes in (`sparse') files, internal fragmentation, indirect blocks, and the like

Quickly create a large file on a Linux system

How can I quickly create a large file on a Linux (Red Hat Linux) system?
dd will do the job, but reading from /dev/zero and writing to the drive can take a long time when you need a file several hundreds of GBs in size for testing... If you need to do that repeatedly, the time really adds up.
I don't care about the contents of the file, I just want it to be created quickly. How can this be done?
Using a sparse file won't work for this. I need the file to be allocated disk space.
dd from the other answers is a good solution, but it is slow for this purpose. In Linux (and other POSIX systems), we have fallocate, which uses the desired space without having to actually writing to it, works with most modern disk based file systems, very fast:
For example:
fallocate -l 10G gentoo_root.img
This is a common question -- especially in today's environment of virtual environments. Unfortunately, the answer is not as straight-forward as one might assume.
dd is the obvious first choice, but dd is essentially a copy and that forces you to write every block of data (thus, initializing the file contents)... And that initialization is what takes up so much I/O time. (Want to make it take even longer? Use /dev/random instead of /dev/zero! Then you'll use CPU as well as I/O time!) In the end though, dd is a poor choice (though essentially the default used by the VM "create" GUIs). E.g:
dd if=/dev/zero of=./gentoo_root.img bs=4k iflag=fullblock,count_bytes count=10G
truncate is another choice -- and is likely the fastest... But that is because it creates a "sparse file". Essentially, a sparse file is a section of disk that has a lot of the same data, and the underlying filesystem "cheats" by not really storing all of the data, but just "pretending" that it's all there. Thus, when you use truncate to create a 20 GB drive for your VM, the filesystem doesn't actually allocate 20 GB, but it cheats and says that there are 20 GB of zeros there, even though as little as one track on the disk may actually (really) be in use. E.g.:
truncate -s 10G gentoo_root.img
fallocate is the final -- and best -- choice for use with VM disk allocation, because it essentially "reserves" (or "allocates" all of the space you're seeking, but it doesn't bother to write anything. So, when you use fallocate to create a 20 GB virtual drive space, you really do get a 20 GB file (not a "sparse file", and you won't have bothered to write anything to it -- which means virtually anything could be in there -- kind of like a brand new disk!) E.g.:
fallocate -l 10G gentoo_root.img
Linux & all filesystems
xfs_mkfile 10240m 10Gigfile
Linux & and some filesystems (ext4, xfs, btrfs and ocfs2)
fallocate -l 10G 10Gigfile
OS X, Solaris, SunOS and probably other UNIXes
mkfile 10240m 10Gigfile
HP-UX
prealloc 10Gigfile 10737418240
Explanation
Try mkfile <size> myfile as an alternative of dd. With the -n option the size is noted, but disk blocks aren't allocated until data is written to them. Without the -n option, the space is zero-filled, which means writing to the disk, which means taking time.
mkfile is derived from SunOS and is not available everywhere. Most Linux systems have xfs_mkfile which works exactly the same way, and not just on XFS file systems despite the name. It's included in xfsprogs (for Debian/Ubuntu) or similar named packages.
Most Linux systems also have fallocate, which only works on certain file systems (such as btrfs, ext4, ocfs2, and xfs), but is the fastest, as it allocates all the file space (creates non-holey files) but does not initialize any of it.
truncate -s 10M output.file
will create a 10 M file instantaneously (M stands for 10241024 bytes, MB stands for 10001000 - same with K, KB, G, GB...)
EDIT: as many have pointed out, this will not physically allocate the file on your device. With this you could actually create an arbitrary large file, regardless of the available space on the device, as it creates a "sparse" file.
For e.g. notice no HDD space is consumed with this command:
### BEFORE
$ df -h | grep lvm
/dev/mapper/lvm--raid0-lvm0
7.2T 6.6T 232G 97% /export/lvm-raid0
$ truncate -s 500M 500MB.file
### AFTER
$ df -h | grep lvm
/dev/mapper/lvm--raid0-lvm0
7.2T 6.6T 232G 97% /export/lvm-raid0
So, when doing this, you will be deferring physical allocation until the file is accessed. If you're mapping this file to memory, you may not have the expected performance.
But this is still a useful command to know. For e.g. when benchmarking transfers using files, the specified size of the file will still get moved.
$ rsync -aHAxvP --numeric-ids --delete --info=progress2 \
root#mulder.bub.lan:/export/lvm-raid0/500MB.file \
/export/raid1/
receiving incremental file list
500MB.file
524,288,000 100% 41.40MB/s 0:00:12 (xfr#1, to-chk=0/1)
sent 30 bytes received 524,352,082 bytes 38,840,897.19 bytes/sec
total size is 524,288,000 speedup is 1.00
Where seek is the size of the file you want in bytes - 1.
dd if=/dev/zero of=filename bs=1 count=1 seek=1048575
Examples where seek is the size of the file you want in bytes
#kilobytes
dd if=/dev/zero of=filename bs=1 count=0 seek=200K
#megabytes
dd if=/dev/zero of=filename bs=1 count=0 seek=200M
#gigabytes
dd if=/dev/zero of=filename bs=1 count=0 seek=200G
#terabytes
dd if=/dev/zero of=filename bs=1 count=0 seek=200T
From the dd manpage:
BLOCKS and BYTES may be followed by the following multiplicative suffixes: c=1, w=2, b=512, kB=1000, K=1024, MB=1000*1000, M=1024*1024, GB =1000*1000*1000, G=1024*1024*1024, and so on for T, P, E, Z, Y.
To make a 1 GB file:
dd if=/dev/zero of=filename bs=1G count=1
I don't know a whole lot about Linux, but here's the C Code I wrote to fake huge files on DC Share many years ago.
#include < stdio.h >
#include < stdlib.h >
int main() {
int i;
FILE *fp;
fp=fopen("bigfakefile.txt","w");
for(i=0;i<(1024*1024);i++) {
fseek(fp,(1024*1024),SEEK_CUR);
fprintf(fp,"C");
}
}
You can use "yes" command also. The syntax is fairly simple:
#yes >> myfile
Press "Ctrl + C" to stop this, else it will eat up all your space available.
To clean this file run:
#>myfile
will clean this file.
I don't think you're going to get much faster than dd. The bottleneck is the disk; writing hundreds of GB of data to it is going to take a long time no matter how you do it.
But here's a possibility that might work for your application. If you don't care about the contents of the file, how about creating a "virtual" file whose contents are the dynamic output of a program? Instead of open()ing the file, use popen() to open a pipe to an external program. The external program generates data whenever it's needed. Once the pipe is open, it acts just like a regular file in that the program that opened the pipe can fseek(), rewind(), etc. You'll need to use pclose() instead of close() when you're done with the pipe.
If your application needs the file to be a certain size, it will be up to the external program to keep track of where in the "file" it is and send an eof when the "end" has been reached.
One approach: if you can guarantee unrelated applications won't use the files in a conflicting manner, just create a pool of files of varying sizes in a specific directory, then create links to them when needed.
For example, have a pool of files called:
/home/bigfiles/512M-A
/home/bigfiles/512M-B
/home/bigfiles/1024M-A
/home/bigfiles/1024M-B
Then, if you have an application that needs a 1G file called /home/oracle/logfile, execute a "ln /home/bigfiles/1024M-A /home/oracle/logfile".
If it's on a separate filesystem, you will have to use a symbolic link.
The A/B/etc files can be used to ensure there's no conflicting use between unrelated applications.
The link operation is about as fast as you can get.
The GPL mkfile is just a (ba)sh script wrapper around dd; BSD's mkfile just memsets a buffer with non-zero and writes it repeatedly. I would not expect the former to out-perform dd. The latter might edge out dd if=/dev/zero slightly since it omits the reads, but anything that does significantly better is probably just creating a sparse file.
Absent a system call that actually allocates space for a file without writing data (and Linux and BSD lack this, probably Solaris as well) you might get a small improvement in performance by using ftrunc(2)/truncate(1) to extend the file to the desired size, mmap the file into memory, then write non-zero data to the first bytes of every disk block (use fgetconf to find the disk block size).
This is the fastest I could do (which is not fast) with the following constraints:
The goal of the large file is to fill a disk, so can't be compressible.
Using ext3 filesystem. (fallocate not available)
This is the gist of it...
// include stdlib.h, stdio.h, and stdint.h
int32_t buf[256]; // Block size.
for (int i = 0; i < 256; ++i)
{
buf[i] = rand(); // random to be non-compressible.
}
FILE* file = fopen("/file/on/your/system", "wb");
int blocksToWrite = 1024 * 1024; // 1 GB
for (int i = 0; i < blocksToWrite; ++i)
{
fwrite(buf, sizeof(int32_t), 256, file);
}
In our case this is for an embedded linux system and this works well enough, but would prefer something faster.
FYI the command dd if=/dev/urandom of=outputfile bs=1024 count = XX was so slow as to be unusable.
Shameless plug: OTFFS provides a file system providing arbitrarily large (well, almost. Exabytes is the current limit) files of generated content. It is Linux-only, plain C, and in early alpha.
See https://github.com/s5k6/otffs.
So I wanted to create a large file with repeated ascii strings. "Why?" you may ask. Because I need to use it for some NFS troubleshooting I'm doing. I need the file to be compressible because I'm sharing a tcpdump of a file copy with the vendor of our NAS. I had originally created a 1g file filled with random data from /dev/urandom, but of course since it's random, it means it won't compress at all and I need to send the full 1g of data to the vendor, which is difficult.
So I created a file with all the printable ascii characters, repeated over and over, to a limit of 1g in size. I was worried it would take a long time. It actually went amazingly quickly, IMHO:
cd /dev/shm
date
time yes $(for ((i=32;i<127;i++)) do printf "\\$(printf %03o "$i")"; done) | head -c 1073741824 > ascii1g_file.txt
date
Wed Apr 20 12:30:13 CDT 2022
real 0m0.773s
user 0m0.060s
sys 0m1.195s
Wed Apr 20 12:30:14 CDT 2022
Copying it from an nfs partition to /dev/shm took just as long as with the random file (which one would expect, I know, but I wanted to be sure):
cp ascii1gfile.txt /home/greygnome/
uptime; free -m; sync; echo 1 > /proc/sys/vm/drop_caches; free -m; date; dd if=/home/greygnome/ascii1gfile.txt of=/dev/shm/outfile bs=16384 2>&1; date; rm -f /dev/shm/outfile
But while doing that I ran a simultaneous tcpdump:
tcpdump -i em1 -w /dev/shm/dump.pcap
I was able to compress the pcap file down to 12M in size! Awesomesauce!
Edit: Before you ding me because the OP said, "I don't care about the contents," know that I posted this answer because it's one of the first replies to "how to create a large file linux" in a Google search. And sometimes, disregarding the contents of a file can have unforeseen side effects.
Edit 2: And fallocate seems to be unavailable on a number of filesystems, and creating a 1GB compressible file in 1.2s seems pretty decent to me (aka, "quickly").
You could use https://github.com/flew-software/trash-dump
you can create file that is any size and with random data
heres a command you can run after installing trash-dump (creates a 1GB file)
$ trash-dump --filename="huge" --seed=1232 --noBytes=1000000000
BTW I created it

Resources