Why does a directory take up more space than its contents? - freebsd

I use FreeNAS (based on FreeBSD) with the ZFS filesystem.
One directory (only one) takes up more space on disk than its contents.
I've checked the folder size: it is 720 GB, but if I check usage on the filesystem, I see that it takes up 1292 GB.
What could cause this?
[screenshot: how I check space]
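The usual way to compare a directory's logical size with its on-disk usage is du; on ZFS, snapshots, compression, and metadata can make the two differ. A minimal sketch, assuming the directory path is passed as an argument (the zfs dataset name in the comment is an assumption):

```shell
#!/bin/sh
# Compare logical size vs. allocated space (path defaults to the current dir):
DIR=${1:-.}
du -sh "$DIR"                                       # space actually allocated on disk
# Apparent (logical) size: GNU du spells it --apparent-size, FreeBSD du uses -A:
du -sh --apparent-size "$DIR" 2>/dev/null || du -Ash "$DIR"
# On ZFS, snapshots can pin old data; if the directory is its own dataset,
# this shows the breakdown (dataset name is an assumption):
# zfs list -o space tank/mydir
```

If the USEDSNAP column of `zfs list -o space` is large, deleted or rewritten data is being held by snapshots, which would explain a dataset consuming far more space than the directory's visible contents.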

Related

If the size of the file exceeds the maximum size of the file system, what happens?

For example, on a FAT32 partition the maximum file size is 4 GB, but I was able to create a 5 GB file with vim. When I saved the file and opened it again, the console output was broken like a staircase. I have three questions:
1. If the size of a file exceeds the maximum file size of the file system, what happens?
2. In my case, why did it break?
3. In Unix, the stat() system call can only handle sizes up to 2 GB (2^31 - 1). Does this have anything to do with the file system? Is there a relationship between the limits of the data in stat() and the limits of each feature in the file system?
If the size of the file exceeds the maximum size of the file system, what happens?
By definition, that can never happen. What really happens is that some system call (probably write(2) ...) is failing, and the code doing it should handle that case.
Notice that FAT32 filesystems restrict the maximal size of a file to 4 GB minus one byte. Use a better file system on your USB key if you want more (or split(1) large files into smaller chunks before copying them to your FAT32-formatted USB key).
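split(1) makes the chunking straightforward. A self-contained sketch with a small sample file (names and sizes are examples; for a real 5 GB file you would use something like `split -b 2000m`):

```shell
# Create a sample file, cut it into pieces, then reassemble and verify.
dd if=/dev/urandom of=bigfile.bin bs=1M count=8 2>/dev/null
split -b 3M bigfile.bin bigfile.part_     # pieces small enough for the target fs
cat bigfile.part_* > rebuilt.bin          # reassemble on the destination machine
cmp bigfile.bin rebuilt.bin && echo "reassembled correctly"
rm bigfile.bin rebuilt.bin bigfile.part_*
```

The default suffixes (`_aa`, `_ab`, ...) sort lexicographically, so a plain glob feeds the pieces to cat in the right order.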
If using <stdio.h>, notice that fflush(3), fprintf(3), fclose(3) (and most other standard I/O functions) can fail, e.g. because the write(2) they eventually perform fails.
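One way to watch a write(2) fail without filling a real disk is Linux's /dev/full device, which rejects every write with ENOSPC, the same error a full FAT32 key would produce:

```shell
# /dev/full (Linux) fails every write with "No space left on device";
# the `|| echo` keeps the demo's exit status clean:
dd if=/dev/zero of=/dev/full bs=512 count=1 || echo "write failed as expected"
```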
the console output was broken like a staircase
probably because your pseudoterminal was left in a broken state. See stty(1), reset(1), termios(3), and read "The TTY Demystified".
In Unix, the stat() system call can only handle sizes up to 2 GB (2^31 - 1)
You are misunderstanding stat(2). Read its documentation again.
Read Advanced Linux Programming then syscalls(2).
I was able to create a 5GB file with vim
To understand the behavior of vim, first read its documentation, then study its source code (it is free software, and you can, and perhaps should, study its code).
You could also use strace(1) to see which system calls are made by a command or process.

Compare a running process in memory with an executable on disk

I have a big project which will load an executable (let's call it greeting) into memory, but for some reason (e.g. there are many files called greeting under different directories), I need to know whether the process in memory is exactly the one I want to use.
I know how to compare two files: diff, cmp, cksum, and so on. But is there any way to compare a process in memory with an executable on the hard disk?
According to this answer, you can get the contents of the in-memory version of the binary from the proc file system. I think you can cksum the original and the in-memory version.
According to the man page of /proc, under Linux 2.2 and later, the
file is a symbolic link containing the actual pathname of the executed
command. Apparently, the binary is loaded into memory, and
/proc/[pid]/exe points to the content of the binary in memory.

Hard links, Linux, and memory

When you copy files in Linux (using the context-menu copy command), does Linux create hard links to the files?
Also, what happens if you delete the original file and then the hard link? Does the file still persist on disk with its pointer removed?
I have trouble understanding a few things about how this uses space.
To free disk space, you need to delete both files, right?
Does a hard link point to the storage location of the original file? I keep seeing the term inode; I'm not quite sure what an inode really is.
The inode is all of a file's metadata: everything except its name and its content.
A directory contains a set of names and numbers: "This directory contains file foo, which is file number 3 on this drive; bar, which is file number 4; quux, 17; viz, 123; and lastly ohmygod, 77321341." Inode number 3 contains "This file was created on January 1, 1970, last modified on January 1, 1990, and last read on January 2, 1990. It is 722 bytes large, and those bytes are in the 4k block number 768123 on the drive," and a few more things.
The stat() system call shows how many blocks are allocated, and almost everything else stored in the inode.
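For example, stat(1) prints most of what the inode stores (the file name here is just an example):

```shell
touch example.txt
stat example.txt                                   # inode number, size, blocks, link count, timestamps
stat -c 'inode=%i links=%h blocks=%b' example.txt  # GNU stat format string
rm example.txt
```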
Copying does not create hard links, that would be broken behavior. A hard link is just an additional first-class name to the same file; modify the file via one name (and not by saving under a temp name and then moving it, as some editors do), and you will see the change in the file when accessed under the other name, too. Not what I’d expect from a copy.
Note that there is nothing special about the first name a file had. All hard links are simply pointing at the same file.
Once the last directory entry pointing to a file is removed, there may still be open file handles pointing to it (from programs that opened the file earlier). As long as one of those exists, the file is still there and can be used. It just can no longer be opened by processes that do not already have it open, since it no longer has a name.
When no directory entry points to a file any more and no program holds an open handle to it, it can never be reached again, so the operating system frees its space on the disk.
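The whole lifecycle can be seen in a few commands (file names are examples; `stat -c` is GNU stat):

```shell
echo hello > a
ln a b            # b is a second name for the same inode
stat -c %h a      # link count is now 2
echo world >> a
cat b             # the append made via "a" is visible via "b"
rm a              # removes one name; the inode and data survive
cat b             # still prints both lines
rm b              # last name gone: the kernel frees the blocks
```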

Can inode and crtime be used as a unique file identifier?

I have a file indexing database on Linux. Currently I use file path as an identifier.
But if a file is moved/renamed, its path is changed and I cannot match my DB record to the new file and have to delete/recreate the record. Even worse, if a directory is moved/renamed, then I have to delete/recreate records for all files and nested directories.
I would like to use the inode number as a unique file identifier, but inode numbers can be reused if a file is deleted and another file is created.
So, I wonder whether I can use a pair of {inode,crtime} as a unique file identifier.
I hope to use i_crtime on ext4 and creation_time on NTFS.
In my limited testing (with ext4) inode and crtime do, indeed, remain unchanged when renaming or moving files or directories within the same file system.
So, the question is whether there are cases when inode or crtime of a file may change.
For example, can fsck, defragmentation, or partition resizing change the inode or crtime of a file?
Interestingly, http://msdn.microsoft.com/en-us/library/aa363788%28VS.85%29.aspx says:
"In the NTFS file system, a file keeps the same file ID until it is deleted."
but also:
"In some cases, the file ID for a file can change over time."
So, what are those cases they mentioned?
Note that I studied similar questions:
How to determine the uniqueness of a file in linux?
Executing 'mv A B': Will the 'inode' be changed?
Best approach to detecting a move or rename to a file in Linux?
but they do not answer my question.
The pair {device_nr, inode_nr} is a unique identifier for an inode within a system
moving a file to a different directory does not change its inode_nr
the linux inotify interface enables you to monitor changes to inodes (either files or directories)
Extra notes:
moving files across filesystems is handled differently (it is in fact copy + delete)
networked filesystems (or a mounted NTFS) cannot always guarantee the stability of inode numbers
Microsoft is not a Unix vendor; its documentation does not cover Unix or its filesystems and should be ignored here (except for NTFS's internals)
Extra text: the old Unix adage "everything is a file" should in fact be "everything is an inode". The inode carries all the meta-information about a file (or directory, or special file) except its name. The filename is in fact only a directory entry that happens to link to the particular inode. Moving a file means creating a new link to the same inode and deleting the old directory entry that linked to it.
The inode metadata can be obtained by the stat(), fstat(), and lstat() system calls.
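These properties are easy to check with stat(1) (GNU coreutils; file names are examples):

```shell
touch file1
stat -c 'dev=%d inode=%i' file1
mv file1 file2                      # rename within the same filesystem
stat -c 'dev=%d inode=%i' file2     # same {device, inode} pair as before
stat -c 'birth=%w' file2            # crtime, or "-" if the filesystem doesn't record it
rm file2
```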
The allocation and management of i-nodes in Unix is dependent upon the filesystem. So, for each filesystem, the answer may vary.
For the Ext3 filesystem (the most popular), i-nodes are reused, and thus cannot be used as a unique file identifier, nor does reuse occur according to any predictable pattern.
In Ext3, i-nodes are tracked in a bit vector, each bit representing a single i-node number. When an i-node is freed, its bit is set to zero. When a new i-node is needed, the bit vector is searched for the first zero bit, and that i-node number (which may previously have been allocated to another file) is reused.
This may lead to the naive conclusion that the lowest-numbered available i-node will be the one reused. However, the Ext3 file system is complex and highly optimised, so no assumptions should be made about when and how i-node numbers will be reused, even though they clearly will be.
From the source code for ialloc.c, where i-nodes are allocated:
There are two policies for allocating an inode. If the new inode is a
directory, then a forward search is made for a block group with both
free space and a low directory-to-inode ratio; if that fails, then of
the groups with above-average free space, that group with the fewest
directories already is chosen. For other inodes, search forward from
the parent directory's block group to find a free inode.
The source code that manages this for Ext3 is called ialloc and the definitive version is here: https://github.com/torvalds/linux/blob/master/fs/ext3/ialloc.c
I guess the DB application would need to consider the case where a file is restored from backup, which would preserve the file's crtime but not its inode number.

How to estimate a file's size from its header's sector start address?

Suppose I have a deleted file in the unallocated space of a Linux partition and I want to retrieve it.
Suppose I can get the start address of the file by examining the header.
Is there a way to estimate the number of blocks that need to be analyzed from there (this depends on the size of the image)?
In general, Linux/Unix does not support recovering deleted files: if a file is deleted, it is meant to be gone. This is also good for security: one user should not be able to recover data from a file deleted by another user by creating a huge empty file spanning almost all free space.
Some filesystems even support so-called secure delete: they can automatically wipe file blocks on delete (but this is not common).
You can try to write a utility which opens the whole partition that your filesystem is mounted on (say, /dev/sda2) as one huge file, reads it, and scans for remnants of your original data; but if the file was fragmented (which is highly likely), the chances are very small that you will be able to recover much of the data in a usable form.
Having said all that, there are some utilities which try to be a bit smarter than a simple scan and can attempt to undelete your files on Linux, such as extundelete. It may work for you, but success is never guaranteed. Of course, you must be root to use it.
And finally, if you want to be able to recover anything from that filesystem, you should unmount it right now and take a backup of it using dd, or pipe dd's output through gzip to save the space required.
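A sketch of such a backup (the device name is an example; run as root, and write the image to a different filesystem):

```shell
# Image the partition and compress on the fly; nothing is written to the
# damaged filesystem itself:
dd if=/dev/sda2 bs=1M | gzip > sda2.img.gz
# Later, restore (or work on a copy) with:
# gunzip -c sda2.img.gz | dd of=/dev/sda2 bs=1M
```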
