Directory Stats command line interface? - linux

WinDirStat / KDirStat / Disk Inventory X have been nothing short of revolutionary in file management. Why is there no text-only command-line equivalent? I need one for SSH administration of my file servers.
We have all the building blocks: du, tree etc.
Is there one? Why not? Can someone please write one? :)
EDIT: du does ALMOST what I want. What I want is something that sorts the entries within each subdirectory by size (rather than listing them by full path) and indents them, so that it's easier to avoid double-counting. du would give me this:
cd a
du . -h
1G b
2G c
1K c/d
1K c/e
2G c/f
It's not immediately obvious that c and c/f are overlapping. What I want is this:
cd a
dir_stats .
1G b
2G c
|
+---- 2G f
|
+---- 1K d
|
+---- 1K e
in which it is clear that the 2G in c comes from the 2G in f. I can also find all the info not related to c more easily (i.e. by just scanning the first column).

I'd recommend using ncdu, which stands for NCurses Disk Usage. Basically it's a collapsible version of du, with a basic command line user interface.
One thing worth noting is that it runs a bit slower than du on large amounts of data, so I'd recommend running it inside screen, or using the command-line options to first scan the directory and then view the results later. Note the -q option: it reduces the refresh rate from 1/10th of a second to 2 seconds, which is recommended for SSH connections.
Viewing total root space usage:
ncdu -xq /
Generate results file and view later:
ncdu -1xqo- / | gzip > export.gz
# ...some time later:
zcat export.gz | ncdu -f-

You can use KDirStat (or the new QDirStat) together with the perl script that comes along with either one to collect data on your server, then copy that file to your desktop machine and view it with KDirStat / QDirStat.
See also
https://github.com/shundhammer/qdirstat/tree/master/scripts
or
https://github.com/shundhammer/kdirstat/blob/master/kdirstat/kdirstat-cache-writer
The script does not seem to be included with the KDE 4 port K4DirStat, but it can still read and write the same cache files.
--
HuHa (Stefan Hundhammer - author of the original KDirStat)

As mentioned here: https://unix.stackexchange.com/questions/45828/print-size-of-directory-content-with-tree-command-in-tree-1-5
tree --du -h -L 2
is very much in the right spirit of my goal. The only problem is that I don't think it supports sorting, so it's not suitable for huge filesystem hierarchies :(
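For what it's worth, newer tree releases (1.8.0 and later, as far as I know) do have a --sort option; if your build supports it, something like this gets closer, but check man tree before relying on it:
tree --du -h -L 2 --sort=size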

Don't bother trying to do disk space management with ASCII-art visualizations. du follows Unix's elegant philosophy in all respects and so gives you sorting etc. for free, by piping into other tools.
Get comfortable with du and you'll have much more power when hunting disk hogs remotely.
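For example, a typical remote disk-hog hunt with plain du might look like this (the path is just an example):
du -xm --max-depth=1 /var | sort -rn | head -20
That lists the largest first-level directories under /var in MiB, biggest first, without crossing filesystem boundaries.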

Related

What is the most reliable command to find the actual size of a file on Linux?

Recently I tried to find out the size of a file using various commands, and they showed huge differences.
ls -ltr showed its size as around 34 GB (bytes rounded off by me), while
du -sh filename showed it to be around 11 GB, while
the stat command showed the same ~34 GB as ls.
Any idea which is the most reliable command to find the actual size of the file?
There was a copy operation performed on the file, and we are unsure whether it completed properly, because after a certain time the source file it was being copied from was removed by one of our jobs.
There is no inaccuracy or reliability issue here, you're just comparing two different numbers: logical size vs physical size.
[Wikipedia's sparse-file illustration shows the file as green segments (allocated data) interleaved with gray segments (holes).]
ls reports the gray+green total, i.e. the logical length of the file. du (without --apparent-size) counts only the green areas, since those are the ones that actually take up space.
You can create a sparse file with dd count=0 bs=1M seek=100 of=myfile.
ls shows 100MiB because that's how long the file is:
$ ls -lh myfile
-rw-r----- 1 me me 100M Jul 15 10:57 myfile
du shows 0, because that's how much data it's allocated:
$ du myfile
0 myfile
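GNU du can also report the logical length directly with --apparent-size:
$ du -h --apparent-size myfile
100M    myfile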
ls -l --block-size=M
will give you a long format listing (needed to actually see the file size) and round file sizes up to the nearest MiB.
If you want MB (10^6 bytes) rather than MiB (2^20 bytes) units, use --block-size=MB instead.
If you don't want the M suffix attached to the file size, you can use something like --block-size=1M. Thanks Stéphane Chazelas for suggesting this.
This is described in the man page for ls; man ls and search for SIZE. It allows for units other than MB/MiB as well, and from the looks of it (I didn't try that) arbitrary block sizes as well (so you could see the file size as number of 412-byte blocks, if you want to).
Note that the --block-size parameter is a GNU extension on top of the Open Group's ls, so this may not work if you don't have a GNU userland (which most Linux installations do). The ls from GNU coreutils 8.5 does support --block-size as described above.
There are several notions of file size, as explained in that other guy's answer and the Wikipedia figure on sparse files.
However, you might want to use both ls(1) & stat(1) commands.
If coding in C, consider using stat(2) & lseek(2) syscalls.
See also the references in this answer.
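For example, GNU stat can report both numbers at once, which makes the logical vs. allocated distinction obvious; a quick sketch, using the sparse myfile from the earlier answer:
$ stat -c 'logical: %s bytes   allocated: %b blocks of %B bytes' myfile
logical: 104857600 bytes   allocated: 0 blocks of 512 bytes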

ImageMagick: how to achieve low memory usage while resizing a large number of image files?

I would like to resize a large number (about 5200) of image files (PPM format, each 5 MB in size) and save them to PNG format using convert.
Short version:
convert blows up 24 GB of memory although I use the syntax that tells convert to process image files consecutively.
Long version:
Since this is more than 25 GB of image data in total, I figured I should not process all files simultaneously. I searched the ImageMagick documentation for how to process image files consecutively and found:
It is faster and less resource intensive to resize each image as it is read:
$ convert '*.jpg[120x120]' thumbnail%03d.png
Also, the tutorial states:
For example instead of...
montage '*.tiff' -geometry 100x100+5+5 -frame 4 index.jpg
which reads all the tiff files in first, then resizes them. You can
instead do...
montage '*.tiff[100x100]' -geometry 100x100+5+5 -frame 4 index.jpg
This will read each image in, and resize them, before proceeding to
the next image. Resulting in far less memory usage, and possibly
prevent disk swapping (thrashing), when memory limits are reached.
Hence, this is what I am doing:
$ convert '*.ppm[1280x1280]' pngs/%05d.png
According to the docs, it should treat each image file one by one: read, resize, write. I am doing this on a machine with 12 real cores and 24 GB of RAM. However, during the first two minutes, the memory usage of the convert process grows to about 96 %. It stays there a while. CPU usage is at maximum. A bit longer and the process dies, just saying:
Killed
At this point, no output files have been produced. I am on Ubuntu 10.04 and convert --version says:
Version: ImageMagick 6.5.7-8 2012-08-17 Q16 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2009 ImageMagick Studio LLC
Features: OpenMP
It looks like convert tries to read all data before starting the conversion. So either there is a bug in convert, an issue with the documentation or I did not read the documentation properly.
What is wrong? How can I achieve low memory usage while resizing this large number of image files?
BTW: a quick solution would be to just loop over the files using the shell and invoke convert for each file independently. But I'd like to understand how to achieve the same with pure ImageMagick.
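(For reference, a minimal version of that shell loop might look like this; it assumes bash and keeps the zero-padded output numbering:)
i=0
for f in *.ppm; do
    convert "$f" -resize 1280x1280 "$(printf 'pngs/%05d.png' "$i")"
    i=$((i + 1))
done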
Thanks!
Without having direct access to your system it's really hard to help you debug this.
But you can do three things to help narrow down the problem:
Add -monitor as the first commandline argument to see more details about what's going on.
(Optionally) add -debug all -log "domain: %d +++ event: %e +++ function: %f +++ line: %l +++ module: %m +++ processID: %p +++ realCPUtime: %r +++ wallclocktime: %t +++ userCPUtime: %u \n\r"
Temporarily, don't use '*.ppm[1280x1280]' as an argument, but use 'a*.ppm[1280x1280]' instead. The purpose is to limit your wildcard expansion (or some other suitable way to achieve the same) to only a few matches, instead of all possible matches.
If you do '2.' you'll need to do '3.' as well, otherwise you'll be overwhelmed by the mass of output. (Also, your system doesn't seem to be able to process the full wildcard anyway without the process getting killed...)
If you do not find a solution, then...
...register a username at the official ImageMagick bug report forum.
...report your problem there to see if they can help you (these guys are rather friendly and responsive if you ask politely).
I got the same issue; it seems to be because ImageMagick creates temporary files in the /tmp directory, which is often mounted as a tmpfs (i.e. backed by RAM).
Just move your tmp somewhere else.
For example:
create a "tmp" directory on a big external drive
mkdir -m777 /media/huge_device/tmp
make sure the permissions are set to 777
chmod 777 /media/huge_device/tmp
as root, bind-mount it over your /tmp
mount -o bind /media/huge_device/tmp /tmp
Note: it should be possible to do the same trick with the TMP environment variable.
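The ImageMagick-specific equivalent is the MAGICK_TMPDIR environment variable (treat the exact name as an assumption and check your version's documentation); a sketch of a one-off run:
mkdir -p /media/huge_device/tmp
MAGICK_TMPDIR=/media/huge_device/tmp convert '*.ppm[1280x1280]' pngs/%05d.png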
I would go with GNU Parallel if you have 12 cores; something like this works very well. As it does only 12 images at a time, while still preserving your output file numbering, it uses only minimal RAM.
scene=0
for f in *.ppm; do
    echo "$f" $scene
    ((scene++))
done | parallel -j 12 --colsep ' ' --eta convert {1}[1280x1280] -scene {2} pngs/%05d.png
Notes
-scene lets you set the scene counter, which comes out in your %05d part.
--eta predicts when your job will be done (ETA = estimated time of arrival).
-j 12 runs 12 jobs in parallel at a time.

Fastest Way To Calculate Directory Sizes

What is the best and fastest way to calculate directory sizes? For example, we will have the following structure:
/users
/a
/b
/c
/...
We need the output to be per user directory:
a = 1224KB
b = 3533KB
c = 3324KB
...
We plan on having tens or maybe even hundreds of thousands of directories under /users. The following shell command works:
du -cms /users/a | grep total | awk '{print $1}'
But we would have to call it N times. The point is that the output (each user's directory size) will be stored in our database. Also, we would love to have it update as frequently as possible, but without hogging all the resources on the server. Is it even possible to have it calculate user directory sizes every minute? How about every 5 minutes?
Now that I am thinking about it some more, would it make sense to use node.js? That way, we could calculate directory sizes and even insert them into the database, all in one transaction. We could do that in PHP or Python as well, but I am not sure it would be as fast.
Thanks.
Why not just:
du -sm /users/*
(The slowest part is still likely to be du traversing the filesystem to calculate the size, though).
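If you want the output in the question's "a = 1224KB" shape, sorted by size, a small pipeline along these lines may do; it assumes the user directories sit directly under /users and have no spaces in their names:
du -sk /users/* | sort -rn | awk '{ sub(".*/", "", $2); print $2 " = " $1 "KB" }'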
What do you need this information for? If it's only for reminding the users that their home directories are too big, you should add quota limits to the filesystem. You can set the quota to 1000 GB if you just want the numbers without really limiting disk usage.
The quota numbers are kept up to date automatically as the disk is used. The only downside is that they tell you how large the files owned by a particular user are, rather than how large the files below his home directory are. But maybe you can live with that.
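A sketch of that quota-based accounting, assuming quota support is already enabled on the filesystem holding /users (the username alice is just an example):
setquota -u alice 0 1073741824 0 0 /users    # huge limits, so nothing is actually restricted (block limits are in KiB)
repquota -s /users                           # per-user usage report, kept up to date by the kernel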
I think what you are looking for is:
du -cm --max-depth=1 /users | awk '{user = substr($2,7,300);
  ans = user ": " $1;
  print ans}'
The magic number 7 strips off the /users/ prefix, and 300 is just an arbitrarily large length (awk is not one of my best languages =D, but I am guessing that part is not going to be written in awk anyway). It's faster since you don't involve grepping for the total, and the loop is contained inside du. I bet it can be done faster, but this should be fast enough.
If you have multiple cores you can run the du command in parallel.
For example (running from the folder you want to examine):
parallel du -sm ::: *
ls -A | xargs -P4 du -sm
(The number after the -P argument sets the number of CPUs you want to use.)
Not that slow, and it will show you folder sizes: du -sh /* > total.size.files.txt
A fast way to analyze storage usage is with the ncdu package:
sudo apt-get install ncdu
Example command:
ncdu /your/directory/

Unix directory inodes - fragmentation, and dumping directory contents

We are having a problem on Linux with directory inodes getting large and slow to navigate over time, as many files are created and removed. For example:
% ls -ld foo
drwxr-xr-x 2 webuser webuser 1562624 Oct 26 18:25 foo
% time find foo -type f | wc -l
518
real 0m1.777s
user 0m0.000s
sys 0m0.010s
% cp -R foo foo.tmp
% ls -ld foo.tmp
drwxr-xr-x 2 webuser webuser 45056 Oct 26 18:25 foo.tmp
% time find foo.tmp -type f | wc -l
518
real 0m0.198s
user 0m0.000s
sys 0m0.010s
The original directory has 518 files, takes 1.5 MB to represent, and takes 1.7 seconds to traverse.
The rebuilt directory has the same number of files, takes 45 KB to represent, and takes 0.2 seconds to traverse.
I'm wondering what would cause this. My guess is fragmentation - this is not supposed to be a problem with Unix file systems in general, but in this case we are using the directory for short-term cache files and are thus constantly creating, renaming and removing a large number of small files.
I'm also wondering if there's a way to dump the literal binary contents of the directory - that is, read the directory as if it were a file - which would perhaps give me insight into why it is so big. Neither read() nor sysread() from Perl will allow me to:
swartz> perl -Mautodie -MPOSIX -e 'sysopen(my $fh, "foo", O_RDONLY); my $len = sysread($fh, $buf, 1024);'
Can't sysread($fh, '', '1024'): Is a directory at -e line 1
System info:
Linux 2.6.18-128.el5PAE #1 SMP Wed Dec 17 12:02:33 EST 2008 i686 i686 i386 GNU/Linux
Thanks!
Jon
For question 1, external fragmentation normally causes an overhead of about 2x or so [1], plus you have internal fragmentation from allocation granularity. Neither of these comes close to explaining your observation.
So, I don't think it is normal steady-state fragmentation.
The most obvious speculation is that 1.5 MB is the high-water mark; at one time the directory really did hold either 1.5 MB of entries, or half that with the expected fragmentation.
Another speculation is that the 50% rule is being defeated by a non-Markovian allocation. Imagine that I name files with "tmp%d", so, tmp1, tmp2, ... tmp1000, tmp1001, ...
The problem here is that rm tmp1 doesn't make room for tmp1001. This is obviously a wild guess.
Q2: There isn't a good way to read the raw directory. AFAIK, you would need to either hack the kernel or use debugfs to change the inode type, read it, then change it back, or use debugfs to read the inode, get the block numbers, then read the blocks. A functional debugging approach is probably more reasonable.
You can address the performance issue by making sure that indexing is enabled. See tune2fs.
[1] Knuth's fifty percent rule: in the steady state, 50% of ops are allocations, 50% are frees, 50% of free blocks merge, then holes are 50% of allocations, and 50% of the space is wasted. (Aka, 100% overhead.) This is considered "normal". Malloc has the same problem.
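A sketch of the tune2fs suggestion above, assuming an ext3 filesystem on a placeholder device /dev/sdXN; existing directories only get re-indexed by the e2fsck pass, which needs the filesystem unmounted:
tune2fs -O dir_index /dev/sdXN    # enable hashed b-tree directory lookups
umount /mount/point
e2fsck -fD /dev/sdXN              # -D optimizes (re-indexes) existing directories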
It happens because of fragmentation due to repeated file creation and deletion. Once a directory's size has grown, it never shrinks again, so it stays big even when it is mostly empty.
I think there are mainly two measures you can take to confront the problem:
Build a subdirectory structure to prevent too many children under a single parent directory. For example, if you are creating files whose paths look like dir/file-%06d, you end up with one million children and the correspondingly huge directory inode. It's better to design a subtree structure that decomposes the filenames by their variable prefixes: e.g., if your file is file-123456.ext, allocate it under something like dir/files/1/2/3/4/123456.ext (see the sketch after these two points). This strategy limits the maximum number of children to 1000 under the final directory leaf. The level of decomposition depends on the size of the variable part of the filename.
As a countermeasure, once you already have huge directory inodes, there's little more you can do other than create a new sibling directory (with a small inode), move all the original (dot)files over to it, delete the original directory, and rename the new directory to the original name. Beware of services running concurrently under the original path.
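A sketch of the first measure's layout, using the file-123456.ext example from above (names and paths are illustrative only):
f=file-123456.ext
base=${f#file-}; base=${base%%.*}                  # -> 123456
ext=${f##*.}                                       # -> ext
dest="dir/files/${base:0:1}/${base:1:1}/${base:2:1}/${base:3:1}"
mkdir -p "$dest" && mv "$f" "$dest/$base.$ext"     # -> dir/files/1/2/3/4/123456.ext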
Some shell-fu involving find and stat --printf='%b' or %s on directories may help you detect other troublesome spots in your filesystem and put them under closer observation.
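For example, something along these lines lists directories whose own size (the directory file itself, not its contents) exceeds 1 MiB, largest first; adjust the path and threshold to taste:
find / -xdev -type d -exec stat --printf='%s\t%n\n' {} + | awk -F'\t' '$1 > 1048576' | sort -rn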
For specific filesystem details, look at this post in ServerFault.com

Quickly create a large file on a Linux system

How can I quickly create a large file on a Linux (Red Hat Linux) system?
dd will do the job, but reading from /dev/zero and writing to the drive can take a long time when you need a file several hundred GB in size for testing... If you need to do that repeatedly, the time really adds up.
I don't care about the contents of the file, I just want it to be created quickly. How can this be done?
Using a sparse file won't work for this. I need the file to be allocated disk space.
dd from the other answers is a good solution, but it is slow for this purpose. In Linux (and other POSIX systems), we have fallocate, which allocates the desired space without actually having to write to it; it works with most modern disk-based file systems and is very fast:
For example:
fallocate -l 10G gentoo_root.img
This is a common question -- especially in today's world of virtual environments. Unfortunately, the answer is not as straightforward as one might assume.
dd is the obvious first choice, but dd is essentially a copy and that forces you to write every block of data (thus, initializing the file contents)... And that initialization is what takes up so much I/O time. (Want to make it take even longer? Use /dev/random instead of /dev/zero! Then you'll use CPU as well as I/O time!) In the end though, dd is a poor choice (though essentially the default used by the VM "create" GUIs). E.g:
dd if=/dev/zero of=./gentoo_root.img bs=4k iflag=fullblock,count_bytes count=10G
truncate is another choice -- and is likely the fastest... But that is because it creates a "sparse file". Essentially, a sparse file is a section of disk that has a lot of the same data, and the underlying filesystem "cheats" by not really storing all of the data, but just "pretending" that it's all there. Thus, when you use truncate to create a 20 GB drive for your VM, the filesystem doesn't actually allocate 20 GB, but it cheats and says that there are 20 GB of zeros there, even though as little as one track on the disk may actually (really) be in use. E.g.:
truncate -s 10G gentoo_root.img
fallocate is the final -- and best -- choice for use with VM disk allocation, because it essentially "reserves" (or "allocates") all of the space you're seeking, but it doesn't bother to write anything. So, when you use fallocate to create a 20 GB virtual drive space, you really do get a 20 GB file (not a "sparse file"), and you won't have bothered to write anything to it -- which means virtually anything could be in there -- kind of like a brand new disk! E.g.:
fallocate -l 10G gentoo_root.img
Linux & all filesystems
xfs_mkfile 10240m 10Gigfile
Linux and some filesystems (ext4, xfs, btrfs and ocfs2)
fallocate -l 10G 10Gigfile
OS X, Solaris, SunOS and probably other UNIXes
mkfile 10240m 10Gigfile
HP-UX
prealloc 10Gigfile 10737418240
Explanation
Try mkfile <size> myfile as an alternative to dd. With the -n option the size is noted, but disk blocks aren't allocated until data is written to them. Without the -n option, the space is zero-filled, which means writing to the disk, which means taking time.
mkfile is derived from SunOS and is not available everywhere. Most Linux systems have xfs_mkfile, which works exactly the same way, and not just on XFS file systems despite the name. It's included in xfsprogs (on Debian/Ubuntu) or similarly named packages.
Most Linux systems also have fallocate, which only works on certain file systems (such as btrfs, ext4, ocfs2, and xfs), but is the fastest, as it allocates all the file space (creates non-holey files) but does not initialize any of it.
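If you are unsure which behaviour you ended up with, comparing the logical and allocated sizes afterwards makes it obvious (a quick sketch):
fallocate -l 1G testfile
ls -lh testfile    # logical size: 1.0G
du -h testfile     # allocated size: about 1.0G; a truncate-created sparse file would show 0 here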
truncate -s 10M output.file
will create a 10 M file instantaneously (M stands for 1024*1024 bytes, MB stands for 1000*1000 bytes; the same goes for K/KB, G/GB, ...)
EDIT: as many have pointed out, this will not physically allocate space for the file on your device. With it you could actually create an arbitrarily large file, regardless of the available space on the device, since it creates a "sparse" file.
For example, notice that no HDD space is consumed by this command:
### BEFORE
$ df -h | grep lvm
/dev/mapper/lvm--raid0-lvm0
7.2T 6.6T 232G 97% /export/lvm-raid0
$ truncate -s 500M 500MB.file
### AFTER
$ df -h | grep lvm
/dev/mapper/lvm--raid0-lvm0
7.2T 6.6T 232G 97% /export/lvm-raid0
So, when doing this, you will be deferring physical allocation until the file is accessed. If you're mapping this file to memory, you may not have the expected performance.
But this is still a useful command to know. For example, when benchmarking transfers using files, the specified size of the file will still get moved.
$ rsync -aHAxvP --numeric-ids --delete --info=progress2 \
root@mulder.bub.lan:/export/lvm-raid0/500MB.file \
/export/raid1/
receiving incremental file list
500MB.file
524,288,000 100% 41.40MB/s 0:00:12 (xfr#1, to-chk=0/1)
sent 30 bytes received 524,352,082 bytes 38,840,897.19 bytes/sec
total size is 524,288,000 speedup is 1.00
In the following, seek is the size of the file you want in bytes, minus 1.
dd if=/dev/zero of=filename bs=1 count=1 seek=1048575
Examples where seek is the size of the file you want in bytes
#kilobytes
dd if=/dev/zero of=filename bs=1 count=0 seek=200K
#megabytes
dd if=/dev/zero of=filename bs=1 count=0 seek=200M
#gigabytes
dd if=/dev/zero of=filename bs=1 count=0 seek=200G
#terabytes
dd if=/dev/zero of=filename bs=1 count=0 seek=200T
From the dd manpage:
BLOCKS and BYTES may be followed by the following multiplicative suffixes: c=1, w=2, b=512, kB=1000, K=1024, MB=1000*1000, M=1024*1024, GB=1000*1000*1000, G=1024*1024*1024, and so on for T, P, E, Z, Y.
To make a 1 GB file:
dd if=/dev/zero of=filename bs=1G count=1
I don't know a whole lot about Linux, but here's the C Code I wrote to fake huge files on DC Share many years ago.
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int i;
    FILE *fp;

    fp = fopen("bigfakefile.txt", "w");
    if (fp == NULL)
        return 1;

    for (i = 0; i < (1024 * 1024); i++) {
        /* skip ahead 1 MiB, then write one byte: the skipped ranges
           become holes, so the file looks huge but stays sparse */
        fseek(fp, (1024 * 1024), SEEK_CUR);
        fprintf(fp, "C");
    }

    fclose(fp);
    return 0;
}
You can use "yes" command also. The syntax is fairly simple:
#yes >> myfile
Press "Ctrl + C" to stop this, else it will eat up all your space available.
To clean this file run:
#>myfile
will clean this file.
I don't think you're going to get much faster than dd. The bottleneck is the disk; writing hundreds of GB of data to it is going to take a long time no matter how you do it.
But here's a possibility that might work for your application. If you don't care about the contents of the file, how about creating a "virtual" file whose contents are the dynamic output of a program? Instead of open()ing the file, use popen() to open a pipe to an external program. The external program generates data whenever it's needed. Once the pipe is open, it acts just like a regular file in that the program that opened the pipe can fseek(), rewind(), etc. You'll need to use pclose() instead of close() when you're done with the pipe.
If your application needs the file to be a certain size, it will be up to the external program to keep track of where in the "file" it is and send an eof when the "end" has been reached.
One approach: if you can guarantee unrelated applications won't use the files in a conflicting manner, just create a pool of files of varying sizes in a specific directory, then create links to them when needed.
For example, have a pool of files called:
/home/bigfiles/512M-A
/home/bigfiles/512M-B
/home/bigfiles/1024M-A
/home/bigfiles/1024M-B
Then, if you have an application that needs a 1G file called /home/oracle/logfile, execute a "ln /home/bigfiles/1024M-A /home/oracle/logfile".
If it's on a separate filesystem, you will have to use a symbolic link.
The A/B/etc files can be used to ensure there's no conflicting use between unrelated applications.
The link operation is about as fast as you can get.
The GPL mkfile is just a (ba)sh script wrapper around dd; BSD's mkfile just memsets a buffer with non-zero and writes it repeatedly. I would not expect the former to out-perform dd. The latter might edge out dd if=/dev/zero slightly since it omits the reads, but anything that does significantly better is probably just creating a sparse file.
Absent a system call that actually allocates space for a file without writing data (and Linux and BSD lack this, probably Solaris as well), you might get a small improvement in performance by using ftruncate(2)/truncate(1) to extend the file to the desired size, mmap the file into memory, then write non-zero data to the first bytes of every disk block (use stat -f to find the filesystem block size).
This is the fastest I could do (which is not fast) with the following constraints:
The goal of the large file is to fill a disk, so it can't be compressible.
Using the ext3 filesystem (fallocate is not available).
This is the gist of it...
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(void) {
    int32_t buf[256];                  // one 1 KiB block
    for (int i = 0; i < 256; ++i)
        buf[i] = rand();               // random so the data is non-compressible

    FILE *file = fopen("/file/on/your/system", "wb");
    if (file == NULL)
        return 1;

    int blocksToWrite = 1024 * 1024;   // 1M blocks of 1 KiB = 1 GiB
    for (int i = 0; i < blocksToWrite; ++i)
        fwrite(buf, sizeof(int32_t), 256, file);

    fclose(file);
    return 0;
}
In our case this is for an embedded Linux system and it works well enough, but we would prefer something faster.
FYI, the command dd if=/dev/urandom of=outputfile bs=1024 count=XX was so slow as to be unusable.
Shameless plug: OTFFS is a file system that provides arbitrarily large files (well, almost; exabytes is the current limit) of generated content. It is Linux-only, plain C, and in early alpha.
See https://github.com/s5k6/otffs.
So I wanted to create a large file with repeated ascii strings. "Why?" you may ask. Because I need to use it for some NFS troubleshooting I'm doing. I need the file to be compressible because I'm sharing a tcpdump of a file copy with the vendor of our NAS. I had originally created a 1g file filled with random data from /dev/urandom, but of course since it's random, it means it won't compress at all and I need to send the full 1g of data to the vendor, which is difficult.
So I created a file with all the printable ascii characters, repeated over and over, to a limit of 1g in size. I was worried it would take a long time. It actually went amazingly quickly, IMHO:
cd /dev/shm
date
time yes $(for ((i=32;i<127;i++)) do printf "\\$(printf %03o "$i")"; done) | head -c 1073741824 > ascii1g_file.txt
date
Wed Apr 20 12:30:13 CDT 2022
real 0m0.773s
user 0m0.060s
sys 0m1.195s
Wed Apr 20 12:30:14 CDT 2022
Copying it from an nfs partition to /dev/shm took just as long as with the random file (which one would expect, I know, but I wanted to be sure):
cp ascii1gfile.txt /home/greygnome/
uptime; free -m; sync; echo 1 > /proc/sys/vm/drop_caches; free -m; date; dd if=/home/greygnome/ascii1gfile.txt of=/dev/shm/outfile bs=16384 2>&1; date; rm -f /dev/shm/outfile
But while doing that I ran a simultaneous tcpdump:
tcpdump -i em1 -w /dev/shm/dump.pcap
I was able to compress the pcap file down to 12M in size! Awesomesauce!
Edit: Before you ding me because the OP said, "I don't care about the contents," know that I posted this answer because it's one of the first replies to "how to create a large file linux" in a Google search. And sometimes, disregarding the contents of a file can have unforeseen side effects.
Edit 2: And fallocate seems to be unavailable on a number of filesystems, and creating a 1GB compressible file in 1.2s seems pretty decent to me (aka, "quickly").
You could use https://github.com/flew-software/trash-dump
It can create a file of any size, filled with random data.
Here's a command you can run after installing trash-dump (it creates a 1 GB file):
$ trash-dump --filename="huge" --seed=1232 --noBytes=1000000000
BTW, I created it.
