Linux: Difference between inode and file_inode(file)? - linux

in source/arch/x86/kernel/msr.c, the msr_open callback for the character device uses the following construct to extract the minor number of the character device file used:
static int msr_open(struct inode *inode, struct file *file)
{
unsigned int cpu = iminor(file_inode(file));
[...]
}
My question is:
Why not directly call iminor with the first argument of the function, like:
unsigned int cpu = iminor(inode);
The construct is used in other callbacks (e.g. read and write) as well,, where the inode is not passed as an argument, so I guess this is due to copy/paste, or is there a deeper meaning to it?

An inode is a data structure on a traditional Unix-style file system such as UFS or ext3. An inode stores basic information about a regular file, directory, or other file system object.
- http://www.cyberciti.biz/tips/understanding-unixlinux-filesystem-inodes.html
Same deal.

Related

Retrieve the amount of bytes stored in the "write" end of a Linux anonymous PIPE

My objective is to be able to determine how many bytes have been transferred into the write end of a pipe. Perhaps, one would need to access the f_pos member of the struct file structure from linux/fs.h associated with this pipe.
struct file snipfrom fs.h
Is it possible to access this value from a userspace program? Again, I'd just like to be able to determine (perhaps based on the f_pos value) how many bytes are stored in the kernel buffer backing the pipe.
I have a feeling this isn't possible and one has to keep reading until read(int fd, void *buf, size_t count) returns less bytes than count.. then at this point, all bytes have been "emptied out" I assume..
Amount of bytes available for read from the pipe can be requested by
ioctl(fd, FIONREAD, &nbytes);
Here fd is a file descriptor, and variable nbytes, where result will be stored, is int variable.
Taken from: man 7 pipe.
Amount of bytes available for write is a different story.

About IOCTL system call

The prototype of IOCTL system call in linux is
int ioctl(struct inode *, struct file *, unsigned int, unsigned long);
All other file operations like read(),write(),llseek(),mmap() etc.. have only struct file * as argument. But, why IOCTL call needs struct inode * to be passed.
Is there any specific use of it?
Which kernel version you are taking about, now ioctl doesn't has the inode pointer as its parameter. Previously there used to be, but I think from 2.6.36 kernel onwards it has been removed.
The prototype of ioctl is, at least according to the manpage, int ioctl(int d, int request, ...);. The ... bit is important - variadic arguments, meaning the remaining arguments depend on the first ones, much like printf. Any use for a struct inode * would stem from the specific ioctl request you're making.

How device driver write/read works

Custom read and write operations are defined as
ssize_t (*read) (struct file *,char __user *, size_t, loff_t *);
ssize_t (*write) (struct file *,const char __user *, size_t, loff_t *);
What happens when a read or write is made to a device?
I couldnt find simple explanation of this in LDD book.
For example what happens when I have a device and I made a write like
echo "Hello" > /dev/newdevice
And I am writing a simple character device. Also
cat /dev/newdevice
I know it depends on my custom read/write and what I need is simple read from memory and write to memory
#user567879, Since device node is treated as a special character or block or a network file, each file has an file structure "filp" which in turn holds the pointer to the file operations table where each system call is mapped to appropriate functions in device driver.
for ex: .open = my_open
.write = my_write
.read = my_read etc.
What happens when you issue echo "Hello" > /dev/newdevice is
1) Device node i.e. "/dev/newdevice" is opened using open system call which in turn
calls your mapped open function i.e. "**my_open**"
2) If open is successful, write system call issued with appropriate file descriptor
(fd), which in turn calls "**my_write**" function present in device driver and thus
according to the functionality it writes/transmits user data to the actual
hardware.
3) Same rule applies for "cat /dev/newdevice" i.e. open the device node --> read
system call --> mapped read function in your device driver i.e. "**my_read**" -->
reads the data from actual hardware and sends the data read from the hardware to
user space (application which issued read system call)
I hope I have answered your question :-)

Are file descriptors for linux sockets always in increasing order

I have a socket server in C/linux. Each time I create a new socket it is assigned a file descriptor. I want to use these FD's as uniqueID's for each client. If they are guaranteed to always be assigned in increasing order (which is the case for the Ubuntu that I am running) then I could just use them as array indices.
So the question: Are the file descriptors that are assigned from linux sockets guaranteed to always be in increasing order?
Let's look at how this works internally (I'm using kernel 4.1.20). The way file descriptors are allocated in Linux is with __alloc_fd. When you do a open syscall, do_sys_open is called. This routine gets a free file descriptor from get_unused_fd_flags:
long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode)
{
...
fd = get_unused_fd_flags(flags);
if (fd >= 0) {
struct file *f = do_filp_open(dfd, tmp, &op);
get_unused_d_flags calls __alloc_fd setting minimum and maximum fd:
int get_unused_fd_flags(unsigned flags)
{
return __alloc_fd(current->files, 0, rlimit(RLIMIT_NOFILE), flags);
}
__alloc_fd gets the file descriptor table for the process, and gets the fd as next_fd, which is actually set from the previous time it ran:
int __alloc_fd(struct files_struct *files,
unsigned start, unsigned end, unsigned flags)
{
...
fd = files->next_fd;
...
if (start <= files->next_fd)
files->next_fd = fd + 1;
So you can see how file descriptors indeed grow monotonically... up to certain point. When the fd reaches the maximum, __alloc_fd will try to find the smallest unused file descriptor:
if (fd < fdt->max_fds)
fd = find_next_zero_bit(fdt->open_fds, fdt->max_fds, fd);
At this point the file descriptors will not be growing monotonically anymore, but instead will jump trying to find free file descriptors. After this, if the table gets full, it will be expanded:
error = expand_files(files, fd);
At which point they will grow again monotonically.
Hope this helps
FD's are guaranteed to be unique, for the lifetime of the socket. So yes, in theory, you could probably use the FD as an index into an array of clients. However, I'd caution against this for at least a couple of reasons:
As has already been said, there is no guarantee that FDs will be allocated monotonically. accept() would be within its rights to return a highly-numbered FD, which would then make your array inefficient. So short answer to your question: no, they are not guaranteed to be monotonic.
Your server is likely to end up with lots of other open FDs - stdin, stdout and stderr to name but three - so again, your array is wasting space.
I'd recommend some other way of mapping from FDs to clients. Indeed, unless you're going to be dealing with thousands of clients, searching through a list of clients should be fine - it's not really an operation that you should need to do a huge amount.
Do not depend on the monotonicity of file descriptors. Always refer to the remote system via a address:port pair.

how are inode numbers generated in linux tmpfs?

It seems to me that tmpfs is not re-using inode numbers, but instead creates a new inode number via a +1 sequence everytime it needs a free inode.
Do you know how this is implemented / can you pin-point me to some source code where i could check the algorithm that is used in tmpfs ?
I need to understand this in order to bypass a limitation in a caching system that uses the inode number as its cache key (hence leading to rare, but occuring collisions when inodes are re-used too often). tmpfs could save my day if I can prove that it keeps creating unique inode numbers.
Thank you for your help,
Jerome Wagner
I won't directly answer your question, so I apologize in advance for that.
The tmpfs idea is good, but I wouldn't have my program depend on a more or less obscure implementation detail for generating keys. Why don't you try another method, such as combining the inode number with some other information? Maybe modification date: it's impossible two files get the same inode number AND modification date at the time of key-generation, unless system date changes.
Cheers!
The bulk of the tmpfs code is in mm/shmem.c. New inodes are created by
static struct inode *shmem_get_inode(struct super_block *sb, const struct inode *dir,
int mode, dev_t dev, unsigned long flags)
but it delegates almost everything to the generic filesystem code.
In particular, the field i_ino is filled in in fs/inode.c:
/**
* new_inode - obtain an inode
* #sb: superblock
*
* Allocates a new inode for given superblock. The default gfp_mask
* for allocations related to inode->i_mapping is GFP_HIGHUSER_MOVABLE.
* If HIGHMEM pages are unsuitable or it is known that pages allocated
* for the page cache are not reclaimable or migratable,
* mapping_set_gfp_mask() must be called with suitable flags on the
* newly created inode's mapping
*
*/
struct inode *new_inode(struct super_block *sb)
{
/*
* On a 32bit, non LFS stat() call, glibc will generate an EOVERFLOW
* error if st_ino won't fit in target struct field. Use 32bit counter
* here to attempt to avoid that.
*/
static unsigned int last_ino;
struct inode *inode;
spin_lock_prefetch(&inode_lock);
inode = alloc_inode(sb);
if (inode) {
spin_lock(&inode_lock);
__inode_add_to_lists(sb, NULL, inode);
inode->i_ino = ++last_ino;
inode->i_state = 0;
spin_unlock(&inode_lock);
}
return inode;
}
And it does indeed just use an incrementing counter (last_ino).
Most other filesystems use information from the on-disk files to later override the i_ino field.
Note that it's perfectly possible for this to wrap all the way around. The kernel also has a "generation" field that gets filled in various ways. mm/shmem.c uses the current time.

Resources