I am trying to understand the difference between inode bitmap and inode table (from ext2 file system documentation), but I am not getting it. Can any one explain?
The bitmap just occupies one block and is a sequence of 0s and 1s where 0 means that the corresponding inode in the _inode_table_ is free and 1 specified that it's used.
The inode table is where the actual information about the inode is written and it occupies more than one block on the filesystem.
The bitmap technique is useful to quickly find free locations on the inode table (or data blocks) when modifying the filesystem.
On the hard drive, those sections would look like this:
inode bitmap:
11100011010010101...
inode table:
struct inode | struct inode | struct inode | struct inode | ...
Related
When I use the dumpe2fs command to look at the Block Group of the ext4 filesystem, I see "free inodes" and "unused inodes".
I want to know the difference between them ?
Why do they have different values in Group 0 ?
Group 0: (Blocks 0-32767) [ITABLE_ZEROED]
Checksum 0xd1a1, unused inodes 0
Primary superblock at 0, Group descriptors at 1-3
Reserved GDT blocks at 4-350
Block bitmap at 351 (+351), Inode bitmap at 367 (+367)
Inode table at 383-892 (+383)
12 free blocks, 1 free inodes, 1088 directories
Free blocks: 9564, 12379-12380, 12401-12408, 12411
Free inodes: 168
Group 1: (Blocks 32768-65535) [ITABLE_ZEROED]
Checksum 0x0432, unused inodes 0
Backup superblock at 32768, Group descriptors at 32769-32771
Reserved GDT blocks at 32772-33118
Block bitmap at 352 (+4294934880), Inode bitmap at 368 (+4294934896)
Inode table at 893-1402 (+4294935421)
30 free blocks, 0 free inodes, 420 directories
Free blocks: 37379-37384, 37386-37397, 42822-42823, 42856-42859, 42954-42955, 44946-44947, 45014-45015
Free inodes:
The "unused inodes" reported are inodes at the end of the inode table for each group that have never been used in the lifetime of the filesystem, so e2fsck does not need to scan them during repair. This can speed up e2fsck pass-1 scanning significantly.
The "free inodes" are the current unallocated inodes in the group. This number includes the "unused inodes" number, so that they will still be used if there are many (typically very small) inodes allocated in a single group.
From:
https://unix.stackexchange.com/a/715165/536354
In the fs code I see mark_inode_dirty() function which is passed with the parameter I_DIRTY and I_DIRTY_SYNC
What is the difference between both. I guess both will mark the inode as dirty and commit the changes to the
disk.
See here: http://ehc.ac/p/mrvopensource/linux-ppc-2.6/ci/1c0eeaf5698597146ed9b873e2f9e0961edcf0f9/tree/include/linux/fs.h?barediff=2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b
I_DIRTY is a superset of I_DIRTY_SYNC:
#define I_DIRTY (I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES)
Which are documented as:
I_DIRTY_SYNC Inode itself is dirty.
I_DIRTY_DATASYNC Data-related inode changes pending
I_DIRTY_PAGES Inode has dirty pages. Inode itself may be clean.
in source/arch/x86/kernel/msr.c, the msr_open callback for the character device uses the following construct to extract the minor number of the character device file used:
static int msr_open(struct inode *inode, struct file *file)
{
unsigned int cpu = iminor(file_inode(file));
[...]
}
My question is:
Why not directly call iminor with the first argument of the function, like:
unsigned int cpu = iminor(inode);
The construct is used in other callbacks (e.g. read and write) as well,, where the inode is not passed as an argument, so I guess this is due to copy/paste, or is there a deeper meaning to it?
An inode is a data structure on a traditional Unix-style file system such as UFS or ext3. An inode stores basic information about a regular file, directory, or other file system object.
- http://www.cyberciti.biz/tips/understanding-unixlinux-filesystem-inodes.html
Same deal.
I have gone through the code of inode in linux kernel code but I am unable to figure where are the data pointers in inode. I know that there are 15 pointers [0-14] out of which 12 are direct, 1 single indirect, 1 double indirect and 1 triple indirect.
Can some one please locate these data members. Also please specify how you located these as I have searched on google many time with different key words but all in vain.
It is up to a specific file system to access its data, so there's no "data pointers" in general (some file systems may be virtual, that means generating their data on the fly or retrieving it from network).
If you're interested in ext4, you can look up the ext4-specific inode structure (struct ext4_inode) in fs/ext4/ext4.h, where data of an inode is indeed referenced by indices of 12 direct blocks, 1 of single indirection, 1 of double indirection and 1 of triple indirection.
This means that blocks [0..11] of an inode's data have numbers e4inode->i_block[0/1/.../11], whereas e4inode->i_block[12] is a number of a block which is filled with data block numbers itself (so it holds indices of inode's data blocks in range [12, 12 + fs->block_size / sizeof(__le32)]. The same trick is applied to i_block[13], only it holds double-indirected indices (blocks filled with indices of blocks that hold list of blocks holding the actual data) starting from index 12 + fs->block_size / sizeof(__le32), and i_block[14] holds triple indirected indices.
As explained here:
http://computer-forensics.sans.org/blog/2010/12/20/digital-forensics-understanding-ext4-part-1-extents
Ext4 uses extents instead of block pointers to track the file content.
If you are interested in ext3/ext2 datastructure where content pointer are used:
http://www.slashroot.in/how-does-file-deletion-work-linux
has many good diagrams to elaborate it. And here:
http://mcgrewsecurity.com/training/extx.pdf
at page 16 has examples of the details of "block pointers" (which are basically block numbers, or offset values relative to the start of the disk image, 1 block usually 512 bytes).
If you want to walk the filesystem phyiscally, say for a ext3 formatted hard drive, see this:
http://wiki.sleuthkit.org/index.php?title=FS_Analysis
but you can always use just "dd" command to do everything, just need to know where to start reading and stop reading, and input for the dd command is usually a replica of the harddisk image itself, for many reasons.
It seems to me that tmpfs is not re-using inode numbers, but instead creates a new inode number via a +1 sequence everytime it needs a free inode.
Do you know how this is implemented / can you pin-point me to some source code where i could check the algorithm that is used in tmpfs ?
I need to understand this in order to bypass a limitation in a caching system that uses the inode number as its cache key (hence leading to rare, but occuring collisions when inodes are re-used too often). tmpfs could save my day if I can prove that it keeps creating unique inode numbers.
Thank you for your help,
Jerome Wagner
I won't directly answer your question, so I apologize in advance for that.
The tmpfs idea is good, but I wouldn't have my program depend on a more or less obscure implementation detail for generating keys. Why don't you try another method, such as combining the inode number with some other information? Maybe modification date: it's impossible two files get the same inode number AND modification date at the time of key-generation, unless system date changes.
Cheers!
The bulk of the tmpfs code is in mm/shmem.c. New inodes are created by
static struct inode *shmem_get_inode(struct super_block *sb, const struct inode *dir,
int mode, dev_t dev, unsigned long flags)
but it delegates almost everything to the generic filesystem code.
In particular, the field i_ino is filled in in fs/inode.c:
/**
* new_inode - obtain an inode
* #sb: superblock
*
* Allocates a new inode for given superblock. The default gfp_mask
* for allocations related to inode->i_mapping is GFP_HIGHUSER_MOVABLE.
* If HIGHMEM pages are unsuitable or it is known that pages allocated
* for the page cache are not reclaimable or migratable,
* mapping_set_gfp_mask() must be called with suitable flags on the
* newly created inode's mapping
*
*/
struct inode *new_inode(struct super_block *sb)
{
/*
* On a 32bit, non LFS stat() call, glibc will generate an EOVERFLOW
* error if st_ino won't fit in target struct field. Use 32bit counter
* here to attempt to avoid that.
*/
static unsigned int last_ino;
struct inode *inode;
spin_lock_prefetch(&inode_lock);
inode = alloc_inode(sb);
if (inode) {
spin_lock(&inode_lock);
__inode_add_to_lists(sb, NULL, inode);
inode->i_ino = ++last_ino;
inode->i_state = 0;
spin_unlock(&inode_lock);
}
return inode;
}
And it does indeed just use an incrementing counter (last_ino).
Most other filesystems use information from the on-disk files to later override the i_ino field.
Note that it's perfectly possible for this to wrap all the way around. The kernel also has a "generation" field that gets filled in various ways. mm/shmem.c uses the current time.