How does du calculate the size of a file in kilobytes?

I have a file for which
du -b filename gives 67108864 as the answer (which is supposed to be in bytes),
while
du filename gives 65604 (which is supposed to be in kilobytes).
However, it should return 67108864/1024 = 65536 as the answer.
I looked at the man entry for du, but couldn't find the answer.
What am I missing?
I'm running Ubuntu 12.04 on a 64-bit machine.

-b is not just bytes:
-b, --bytes equivalent to `--apparent-size --block-size=1'
--apparent-size print apparent sizes, rather than disk usage; although
the apparent size is usually smaller, it may be
larger due to holes in (`sparse') files, internal
fragmentation, indirect blocks, and the like
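So du -b reports the apparent size (67108864 bytes), while plain du reports actual disk usage in 1 KiB units (65604 KiB here); the extra 68 KiB is most likely the file's indirect blocks, i.e. filesystem metadata. With GNU du you can compare the two views directly (filename is the file from the question):
du -b filename                  # apparent size in bytes: 67108864
du --apparent-size -k filename  # apparent size in KiB: 65536
du -k filename                  # actual disk usage in KiB: 65604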

Related

How does stat show a file size?

Just curious how stat shows a file size...
for example I have a file with only one letter in it
echo a > test
when I do
stat test
I see Size: 2
but when I do
du -h test
I see 4.0K
So what am I missing?
Thanks
As per the du(1) manpage, du estimates disk usage, it doesn't show apparent file size, unless you pass --apparent-size.
It seems the estimate is based on the assumption of a 4096-byte disk allocation granularity. (Make a 4097-byte file and du will report 8 KiB, etc.)
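A small experiment along those lines, assuming GNU coreutils and a filesystem with 4096-byte allocation units (the exact output may differ elsewhere):
echo a > test              # writes 2 bytes ("a" plus a newline)
stat -c %s test            # apparent size: 2
du -h test                 # disk usage: 4.0K (one 4096-byte block)
head -c 4097 /dev/urandom > test2
du -h test2                # 8.0K (4097 bytes need two blocks)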

How can I find the total size of files in bytes? [duplicate]

I want the total file size in bytes. This is my code.
my $total= print `stat --printf="%s\n" www/ | du -ah www/* > report.txt `;
I get the output in K, but I want it in bytes. How can I do that, and how can I find the total?
My output looks like this:
4.0K www/1.html
3.0K www/2.html
First, you do not give enough information to help.
Then you ask for the total size in your headline, but then show some shell code and ask how to get the size in bytes from the du command.
It seems you didn't understand du but simply copied it from somewhere. du -h is for "human readable" output, which you would have known had you read man du.
And then you try to put the output of shell commands into a Perl variable. You will only get the value 1, by the way (the return value of print).
If you really want to go for perl, try to understand what (as an example)
while (<www/*>) …
will do. Also check perldoc -f -X. Search for -s.
The total size you can get by summing up the individual sizes collected with -s. Ah! And there is also a stat in perl. Check perldoc -f stat.
I hope this is enough to get you going…
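A minimal Perl sketch of that approach, using the www/ directory from the question (the file-test operators are documented in perldoc -f -X):
#!/usr/bin/perl
use strict;
use warnings;

my $total = 0;
for my $file (glob 'www/*') {
    next unless -f $file;      # only regular files
    my $size = -s $file;       # size in bytes
    print "$size\t$file\n";
    $total += $size;
}
print "total\t$total bytes\n";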

forrtl: No space left on device

My simulations stop with forrtl: No space left on device error.
When I use ls --sort=size -alh, it reports a total of 96M with a maximum of 60M.
When I use du -h, it reports a total of 159G with a maximum of 158G (for the same folder).
When I use df -h, it will report:
Filesystem Size Used Avail Use% Mounted on
/dev/nvme0n1p4 930G 883G 0 100% /
Initially, I thought there were a huge number of hidden files taking up space. I tried to remove hidden files as explained here.
However, I got the same result. I was wondering how I can find the space-consuming items to remove/delete them.
You could use ncdu to more easily find the directories that are using most of the space. For example, try this:
ncdu -x /
The -x option stays on the same filesystem (it does not traverse other filesystem mounts).
What could also be happening is that some applications/processes are still running but haven't "freed" their files, because the file descriptors are still held open. In this case, you could use:
lsof | grep deleted
Sometimes, if a file is deleted while a process has it open, the disk space is not actually freed until the process ends.
More about this in this answer: https://unix.stackexchange.com/a/68532/53084
In case you can't use ncdu, you could use the find command. For example, to find files bigger than 4096 bytes in the current directory:
find . -type f -size +4096c
More about other options here: https://superuser.com/a/204571/284722
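If neither ncdu nor find is convenient, a du-based one-liner gives a similar per-directory overview (this assumes GNU du and a sort that supports -h; run it as root to see everything):
du -xh --max-depth=1 / 2>/dev/null | sort -rh | head -20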

sort runs out of memory

I'm using a pipe including sort to merge multiple large textfiles and remove dupes.
I don't have root permissions, but the box isn't configured to restrict non-root privileges any further than a default Debian Jessie install.
The box has 32GB RAM and 16GB are in use.
Regardless of how I call sort (GNU sort 8.13), it fills up all the remaining RAM and crashes with "out of memory".
It really fills up all the memory before crashing; I watched the process in top. I tried to explicitly set the maximum memory usage with the -S parameter, ranging from 80% to 10% and from 8G to 500M.
The whole pipe looks similar to:
cat * | tr -cd '[:print:]' |sort {various params tested here} -T /other/tmp/path/ | uniq > ../output.txt
Always the same behavior.
Does anyone know what could cause such issue?
And of course how to solve it?
I found the issue myself. It's fairly easy.
The tr -cd '[:print:]' removes the line breaks (a newline is not in the [:print:] class), and sort reads line by line.
So sort ends up trying to read all the files as one giant line, and the -S parameter can't do its job.
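A minimal fix, assuming the intent is to strip non-printable characters but keep the line breaks, is to add \n to the set of kept characters:
cat * | tr -cd '[:print:]\n' | sort -T /other/tmp/path/ | uniq > ../output.txt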

How to create a file with a given size in Linux?

For testing purposes I have to generate a file of a certain size (to test an upload limit).
What is a command to create a file of a certain size on Linux?
For small files:
dd if=/dev/zero of=upload_test bs=file_size count=1
Where file_size is the size of your test file in bytes.
For big files:
dd if=/dev/zero of=upload_test bs=1M count=size_in_megabytes
A more modern approach is easier and faster. On Linux, pick one of:
truncate -s 10G foo
fallocate -l 5G bar
It needs to be stated that truncate on a filesystem supporting sparse files will create a sparse file, while fallocate will not. A sparse file is one where the allocation units that make up the file are not actually allocated until used. The file's metadata will still take up some space, but likely nowhere near the apparent size of the file. You should consult resources about sparse files for more information, as there are advantages and disadvantages to this type of file.
A non-sparse file has its blocks (allocation units) allocated ahead of time, which means the space is reserved as far as the filesystem is concerned. Also, neither fallocate nor truncate sets the contents of the file to a specified value the way dd does; instead, the contents of a file allocated with fallocate or truncate may be whatever values happened to exist in the allocated units at creation time, and this behavior may or may not be desired. dd is the slowest, because it actually writes the value or chunk of data through the entire file stream, as specified by its command-line options.
This behavior could potentially differ depending on the filesystem used and on that filesystem's conformance to any standard or specification, so it is advised to do proper research to ensure that the appropriate method is used.
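To see the difference in practice, something like the following works on filesystems that support both sparse files and preallocation (ext4 or XFS, for example; the filenames are arbitrary):
truncate -s 1G sparse_file        # apparent size 1 GiB, no data blocks allocated
fallocate -l 1G allocated_file    # apparent size 1 GiB, blocks actually reserved
ls -lh sparse_file allocated_file # both report 1.0G
du -h sparse_file allocated_file  # ~0 for sparse_file, 1.0G for allocated_file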
Just to follow up Tom's post, you can use dd to create sparse files as well:
dd if=/dev/zero of=the_file bs=1 count=0 seek=12345
This will create a file with a "hole" in it on most Unixes: the data won't actually be written to disk, or take up any space, until something is actually written into it.
Use this command:
dd if=$INPUT-FILE of=$OUTPUT-FILE bs=$BLOCK-SIZE count=$NUM-BLOCKS
To create a big (empty) file, set $INPUT-FILE=/dev/zero.
The total size of the file will be $BLOCK-SIZE * $NUM-BLOCKS.
The new file created will be $OUTPUT-FILE.
On OSX (and Solaris, apparently), the mkfile command is available as well:
mkfile 10g big_file
This makes a 10 GB file named "big_file". Found this approach here.
You can do it programmatically:
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>
/* SIZE_IN_BYTES was left as a placeholder in the original; define it to the size you want. */
#define SIZE_IN_BYTES (1024 * 1024)  /* e.g. 1 MiB */

int main(void) {
    int fd = creat("/tmp/foo.txt", 0644);   /* create (or truncate) the file */
    if (fd < 0)
        return 1;
    ftruncate(fd, SIZE_IN_BYTES);           /* extend it to the desired size */
    close(fd);
    return 0;
}
This approach is especially useful to subsequently mmap the file into memory.
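To try it out (assuming the source above is saved as mkfile_test.c, a name picked here just for illustration):
cc -o mkfile_test mkfile_test.c
./mkfile_test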
Use the following command to check that the file has the correct size:
# du -B1 --apparent-size /tmp/foo.txt
Be careful:
# du /tmp/foo.txt
will probably print 0, because it is allocated as a sparse file if your filesystem supports them.
see also: man 2 open and man 2 truncate
Some of these answers have you using /dev/zero as the source of your data. If you're testing network upload speeds, this may not be the best idea: if your application does any compression, a file full of zeros compresses really well. Using this command to generate the file
dd if=/dev/zero of=upload_test bs=10000 count=1
I could compress upload_test down to about 200 bytes. So you could put yourself in a situation where you think you're uploading a 10 KB file, but it would actually be much less.
What I suggest is using /dev/urandom instead of /dev/zero. I couldn't compress the output of /dev/urandom very much at all.
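For example, something along these lines (the exact compressed size will vary, but random data should not shrink much):
dd if=/dev/urandom of=upload_test bs=10000 count=1
gzip -c upload_test > upload_test.gz
ls -l upload_test upload_test.gz   # both should be close to 10000 bytes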
you could do:
[dsm#localhost:~]$ perl -e 'print "\0" x 100' > filename.ext
Where you replace 100 with the number of bytes you want written.
dd if=/dev/zero of=my_file.txt count=12345
Use fallocate if you don't want to wait for disk.
Example:
fallocate -l 100G BigFile
Usage:
fallocate [options] <filename>
Preallocate space to, or deallocate space from a file.
Options:
-c, --collapse-range remove a range from the file
-d, --dig-holes detect zeroes and replace with holes
-i, --insert-range insert a hole at range, shifting existing data
-l, --length <num> length for range operations, in bytes
-n, --keep-size maintain the apparent size of the file
-o, --offset <num> offset for range operations, in bytes
-p, --punch-hole replace a range with a hole (implies -n)
-z, --zero-range zero and ensure allocation of a range
-x, --posix use posix_fallocate(3) instead of fallocate(2)
-v, --verbose verbose mode
-h, --help display this help
-V, --version display version
This will generate a 4 MB text file with random characters in the current directory, named "4mb.txt".
You can change the parameters to generate different sizes and names.
base64 /dev/urandom | head -c 4000000 > 4mb.txt
There are lots of answers, but none explained nicely what else can be done. Looking into the man page for dd, it is possible to specify the size of a file more precisely.
This is going to create /tmp/zero_big_data_file.bin filled with zeros, with a size of 20 MiB:
dd if=/dev/zero of=/tmp/zero_big_data_file.bin bs=1M count=20
This is going to create /tmp/zero_1000bytes_data_file.bin filled with zeros, with a size of 1000 bytes:
dd if=/dev/zero of=/tmp/zero_1000bytes_data_file.bin bs=1kB count=1
or
dd if=/dev/zero of=/tmp/zero_1000bytes_data_file.bin bs=1000 count=1
In all the examples, bs is the block size and count is the number of blocks.
BLOCKS and BYTES may be followed by the following multiplicative suffixes: c =1, w =2, b =512, kB =1000, K =1024, MB =1000*1000, M =1024*1024, xM =M, GB =1000*1000*1000, G =1024*1024*1024, and so on for T, P, E, Z, Y.
As shell command:
< /dev/zero head -c 1048576 > output
Run the command below to quickly create large files of a certain size on Linux:
for i in {1..10};do fallocate -l 2G filename$i;done
Explanation: the above command will create 10 files of 2 GB each in just a few seconds.
