How device driver write/read works - linux

Custom read and write operations are defined as
ssize_t (*read) (struct file *,char __user *, size_t, loff_t *);
ssize_t (*write) (struct file *,const char __user *, size_t, loff_t *);
What happens when a read or write is made to a device?
I couldn't find a simple explanation of this in the LDD book.
For example, what happens when I have a device and I do a write like
echo "Hello" > /dev/newdevice
I am writing a simple character device. Likewise, what happens on a read like
cat /dev/newdevice
I know it depends on my custom read/write; all I need is a simple read from memory and write to memory.

@user567879: A device node is treated as a special file (character or block). Each open file has a file structure ("filp"), which holds the pointer to the file operations table, where each system call is mapped to the appropriate function in the device driver.
For example: .open = my_open
.write = my_write
.read = my_read etc.
What happens when you issue echo "Hello" > /dev/newdevice is:
1) The device node, i.e. "/dev/newdevice", is opened with the open system call, which in turn
calls your mapped open function, i.e. "**my_open**".
2) If the open succeeds, the write system call is issued with the resulting file descriptor
(fd), which in turn calls the "**my_write**" function in the device driver, which
writes/transmits the user data to the actual hardware, depending on what the
driver implements.
3) The same applies to "cat /dev/newdevice": open the device node --> read
system call --> mapped read function in your device driver, i.e. "**my_read**" -->
which reads the data from the actual hardware and returns it to
user space (the application that issued the read system call).
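To make the dispatch concrete, here is a minimal user-space sketch of the same idea. It is not kernel code: struct toy_fops, dev_buf and the simplified signatures are made up for illustration (a real driver receives a struct file * and a __user pointer, and moves data with copy_from_user()/copy_to_user()). The mapping from operation to function and the "read from memory, write to memory" behavior are the parts that carry over.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */

/* Hypothetical user-space stand-in for the kernel's struct file_operations. */
struct toy_fops {
    int     (*open)(void);
    ssize_t (*write)(const char *buf, size_t len, long *pos);
    ssize_t (*read)(char *buf, size_t len, long *pos);
};

#define DEV_BUF_SIZE 64
static char   dev_buf[DEV_BUF_SIZE];   /* the "device" is plain memory */
static size_t dev_len;                 /* number of valid bytes in dev_buf */

static int my_open(void)
{
    return 0;                          /* nothing to set up in this sketch */
}

/* Replace the stored data with the caller's bytes (truncated to the buffer). */
static ssize_t my_write(const char *buf, size_t len, long *pos)
{
    assert(buf != NULL && pos != NULL);
    if (len > DEV_BUF_SIZE)
        len = DEV_BUF_SIZE;
    memcpy(dev_buf, buf, len);
    dev_len = len;
    *pos += (long)len;
    return (ssize_t)len;               /* bytes accepted */
}

/* Copy out stored data starting at *pos; returning 0 means end of file. */
static ssize_t my_read(char *buf, size_t len, long *pos)
{
    assert(buf != NULL && pos != NULL);
    size_t off = (size_t)*pos;
    if (off >= dev_len)
        return 0;
    if (len > dev_len - off)
        len = dev_len - off;
    memcpy(buf, dev_buf + off, len);
    *pos += (long)len;
    return (ssize_t)len;
}

/* The table that maps each operation to the driver's function. */
const struct toy_fops my_fops = {
    .open  = my_open,
    .write = my_write,
    .read  = my_read,
};
```

Calling my_fops.write("Hello\n", 6, &pos) stores the bytes, a following my_fops.read() with a fresh position returns them, and the read after that returns 0, which is the end-of-file signal that makes cat stop.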
I hope I have answered your question :-)

Related

Where is the mount API implemented in the Linux source code?

I am a newbie to the Linux kernel. I cloned the Linux source from its repo on GitHub, but I cannot find the file sys/mount.h or the mount function.
Do you know where this file is located in the source code? Where can I find its implementation?
If you don't know where a system call is implemented in the kernel, there's a general sequence of steps you can use to find it. You will need to download the kernel source to your machine.
Begin by finding the number of parameters the syscall requires. eg. mount(2) requires five parameters.
Since mount(2) requires 5 parameters, search for SYSCALL_DEFINE5(mount in the kernel source:
grep -nr 'SYSCALL_DEFINE5(mount'
This will take a while to run, but it will eventually find:
./fs/compat.c:92:COMPAT_SYSCALL_DEFINE5(mount, const char __user *, dev_name,
./fs/namespace.c:3026:SYSCALL_DEFINE5(mount, char __user *, dev_name, char __user *, dir_name,
So, the syscall you're looking for is located in ./fs/namespace.c on line 3026. (I'm using Linux 4.19.99, so the line number will probably be different on your kernel.)

Getting ENOTTY on ioctl for a Linux Kernel Module

I have the following chardev defined:
.h
#define MAJOR_NUM 245
#define MINOR_NUM 0
#define IOCTL_MY_DEV1 _IOW(MAJOR_NUM, 0, unsigned long)
#define IOCTL_MY_DEV2 _IOW(MAJOR_NUM, 1, unsigned long)
#define IOCTL_MY_DEV3 _IOW(MAJOR_NUM, 2, unsigned long)
module .c
static long device_ioctl(
    struct file* file,
    unsigned int ioctl_num,
    unsigned long ioctl_param)
{
    ...
}
static int device_open(struct inode* inode, struct file* file)
{
    ...
}
static int device_release(struct inode* inode, struct file* file)
{
    ...
}
struct file_operations Fops = {
    .open = device_open,
    .unlocked_ioctl = device_ioctl,
    .release = device_release
};
static int __init my_dev_init(void)
{
    register_chrdev(MAJOR_NUM, "MY_DEV", &Fops);
    ...
}
module_init(my_dev_init);
My user code
ioctl(fd, IOCTL_MY_DEV1, 1);
Always fails with same error: ENOTTY
Inappropriate ioctl for device
I've seen similar questions, e.g.:
Linux kernel module - IOCTL usage returns ENOTTY
Linux Kernel Module/IOCTL: inappropriate ioctl for device
But their solutions didn't work for me
ENOTTY is returned by the kernel when no ioctl handler accepts the command: either the driver has not registered an ioctl function for that file, or the handler itself returns -ENOTTY (or -ENOIOCTLCMD) for a command number it does not recognize.
Registering the handler in the .unlocked_ioctl field of struct file_operations is correct on current kernels. The older .ioctl field, which was called with the big kernel lock held, was removed in Linux 2.6.36, so there is no "locked" variant to switch to.
With the code shown, the likely things to check are: that register_chrdev() actually succeeded (its return value is ignored), that the device node behind fd was created with major 245 so that it really refers to this driver, that device_ioctl() does not fall through to a default case that returns -ENOTTY, and that the user program is not a 32-bit binary running on a 64-bit kernel, which would need a .compat_ioctl handler as well.
NOTE
I observe that you have used macro _IOW, using the major number as the unique identifier. This is probably not what you want. First parameter for _IOW tries to ensure that ioctl calls get unique identifiers. There's no general way to acquire such identifiers, as this is an interface contract you create between application code and kernel code. So using the major number is bad practice, for two reasons:
Several devices (in linux, at least) can share the same major number (minor allocation in linux kernel allows this) making it possible for a clash between devices' ioctls.
In case you change the major number (you configure a kernel where that number is already allocated) you have to recompile all your user level software to cope with the new device ioctl ids (all of them change if you do this)
_IOW is a macro built a long time ago that tried to solve this problem by letting you select a different character for each driver (one that does not depend on other kernel parameters, for the reasons given above), so that one device's ioctl calls do not clash with another device driver's. The probability of such a clash is low, but when it happens you can end up with an incorrect machine state (you have issued a valid, working ioctl call to the wrong device).
Ancient Unix (and early Linux) kernels used different chars to build these calls, so, for example, the tty driver used 'T' as the parameter for the _IO* macros, SCSI disks used 'S', etc.
I suggest you select a random number (not appearing elsewhere in the Linux kernel listings), use it in all your devices (you will probably write fewer drivers than there are in the kernel), and select a different ioctl id for each ioctl call. Maintaining a local ioctl file with the ioctls registered this way is far better than trying to guess a value that always works.
Also, a look at the definition of the _IO* macros should be very illustrative :)
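As a sketch of the suggestion above, the request codes could be built from a fixed magic byte rather than from the major number. MYDEV_IOC_MAGIC (0xE7), MYDEV_IOC_MAXNR and mydev_decode() are hypothetical names chosen for this example, and the magic value has not been checked against the kernel's list of numbers already in use. The helper mirrors a check drivers commonly make at the top of their ioctl handler: anything that is not theirs is rejected with -ENOTTY.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <linux/ioctl.h>   /* _IOW, _IOC_TYPE, _IOC_NR, _IOC_SIZE */

/* Hypothetical magic byte, independent of the device's major number. */
#define MYDEV_IOC_MAGIC  0xE7

#define IOCTL_MY_DEV1 _IOW(MYDEV_IOC_MAGIC, 0, unsigned long)
#define IOCTL_MY_DEV2 _IOW(MYDEV_IOC_MAGIC, 1, unsigned long)
#define IOCTL_MY_DEV3 _IOW(MYDEV_IOC_MAGIC, 2, unsigned long)
#define MYDEV_IOC_MAXNR  2   /* highest sequence number defined above */

/*
 * Validate a request code and extract its sequence number.
 * Returns 0 and stores the number in *nr, or -ENOTTY if the code
 * was not built with this driver's magic byte or is out of range.
 */
long mydev_decode(unsigned int cmd, unsigned int *nr)
{
    assert(nr != NULL);
    if (_IOC_TYPE(cmd) != MYDEV_IOC_MAGIC)
        return -ENOTTY;
    if (_IOC_NR(cmd) > MYDEV_IOC_MAXNR)
        return -ENOTTY;
    *nr = _IOC_NR(cmd);
    return 0;
}
```

Because the magic byte is part of the header shared by the driver and the application, changing the major number later does not change any request code.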

Do I need to close a file before calling syncfs()

On my embedded system, I want to make sure that the data is safely written when I close a file - if the system reports that the data was saved, the user should be able to remove power immediately.
I know that the proper way to do this is fsync(), fclose(), and fsync() on the directory (cfr. this blog entry). However, it's a bit tricky to get a file descriptor for the directory in my case (I'd have to go through /proc/self/fd to find back the filename and derive the directory from there). It would be much simpler for me to just do syncfs() on the entire filesystem - I know that this is the only file that is open on the filesystem anyway.
Now my question is:
Is it sufficient to do syncfs()?
Do I need to fclose() the FILE * first (for the directory entry to be up-to-date)? Or is fflush() sufficient?
If it needs to be closed, is it useful to dup() the file descriptor before closing so I can use it directly for syncfs()?
First of all, don't mix up standard library <stdio.h> calls (like fprintf(3) or fopen(3)) with system calls (like open(2), close(2) or sync(2)): the former are library routines that keep temporary data in in-process buffers the system knows nothing about, while the latter are operating system interfaces that make the system responsible for the data from then on. You can tell them apart easily: the former operate on FILE * streams, the latter on int file descriptors.
So if you use a system call to ensure your data is properly synced to disk, it is absolutely necessary to first fflush(3) your process's buffered data before you make the sync(2) or fsync(2) call.
No sync(2) is guaranteed to happen at fclose(3) or even at close(2) time, or in the atexit() callbacks your process runs before exit().
The operating system's buffers are written back lazily for performance reasons, and close(2) is not an event that triggers a write-back. Just think that many processes can be reading and writing the same file at the same time; having each close(2) trigger a filesystem flush would be painful. The operating system writes data back at regular intervals, on umount(2), on system shutdown, and on explicit calls to sync(2) and fsync(2).
If you need to keep the FILE *fd stream open, just do a fflush(fd) on it to ensure that the operating system has received all the data you fwrite(3)d or fprintf(3)ed.
So finally, if you are using <stdio.h> functions, first do a fflush() on every FILE * stream you have written to, or call fflush(NULL) to tell stdio to flush all streams in one call. Then do the sync(2) or fsync(2) call to ensure all your data is physically on disk. No need to close anything.
FILE *fd;
...
fflush(fd);
fsync(fileno(fd));
/* here you know that up to the last write(2) or fwrite(3)...
* data is synced to disk */
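The same two steps with error checking could look like the sketch below. flush_and_sync() is a hypothetical helper name; it only covers the file's own data and metadata, not the directory entry that the question also mentions.

```c
#define _POSIX_C_SOURCE 200809L   /* for fileno() and fsync() */
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Push stdio's user-space buffer to the kernel, then ask the kernel to
 * write the file to the storage device.
 * Returns 0 on success, -1 on failure (errno is set by the failing call).
 */
int flush_and_sync(FILE *fp)
{
    assert(fp != NULL);

    if (fflush(fp) == EOF)        /* stdio buffer -> kernel */
        return -1;

    int fd = fileno(fp);          /* descriptor behind the stream */
    if (fd < 0)
        return -1;

    if (fsync(fd) < 0)            /* kernel buffers -> device */
        return -1;

    return 0;
}
```

Checking both return values matters on an embedded system: a failed fflush() or fsync() is the only notice the program gets that the data may not have reached the device.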
By the way, your approach of going through /dev/fd/<number> to get back the descriptor (that you had previously) is faulty for two reasons:
Once you close your descriptor, /dev/fd/<number> no longer refers to the descriptor you want. Normally it doesn't even exist. Just try this:
#include <string.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <errno.h>
int main(void)
{
    int fd;
    char fn[] = "/dev/fd/1";
    close(1); /* close standard output */
    fd = open(fn, O_RDONLY); /* try to reopen from /dev/fd */
    if (fd < 0) {
        fprintf(stderr,
            "%s: %s(errno=%d)\n",
            fn,
            strerror(errno),
            errno);
        exit(EXIT_FAILURE);
    }
    exit(EXIT_SUCCESS);
} /* main */
You cannot get the directory an open file belongs to from the file descriptor alone. For a file with multiple hard links, there can be thousands of directory entries pointing to it. There's nothing in the inode (or in the open file structure) that lets you recover the path used to open that file. A common way to use temporary files is to create them and immediately unlink(2) them, so nobody can open them again. As long as you keep the file open you have access to it, but no path points to it anymore.
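The create-then-unlink pattern mentioned above can be sketched as follows. open_anonymous_tmp() and the /tmp/anon-XXXXXX template are hypothetical names for this example; it assumes /tmp is writable.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Create a temporary file and remove its name right away.
 * Returns an open read/write descriptor, or -1 on error.
 * The file's storage is released when the descriptor is closed.
 */
int open_anonymous_tmp(void)
{
    char path[] = "/tmp/anon-XXXXXX";  /* mkstemp() fills in the X's */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    if (unlink(path) < 0) {            /* drop the only name */
        close(fd);
        return -1;
    }
    return fd;                         /* still usable, but has no path */
}
```

After the unlink() the descriptor behaves like any other open file, which is exactly why a descriptor by itself cannot tell you which directory to fsync().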
Enable the "sync" option for your filesystem (in /etc/fstab); the default is "async" (disabled). When this option is enabled, all changes to that filesystem are immediately flushed to disk. This makes your entire filesystem slow, but depending on your embedded system's requirements, it can be a good option to consider.

Linux: Difference between inode and file_inode(file)?

In source/arch/x86/kernel/msr.c, the msr_open callback for the character device uses the following construct to extract the minor number of the character device file used:
static int msr_open(struct inode *inode, struct file *file)
{
    unsigned int cpu = iminor(file_inode(file));
    [...]
}
My question is:
Why not directly call iminor with the first argument of the function, like:
unsigned int cpu = iminor(inode);
The construct is used in other callbacks (e.g. read and write) as well, where the inode is not passed as an argument, so I guess this is due to copy/paste. Or is there a deeper meaning to it?
An inode is a data structure on a traditional Unix-style file system such as UFS or ext3. An inode stores basic information about a regular file, directory, or other file system object.
- http://www.cyberciti.biz/tips/understanding-unixlinux-filesystem-inodes.html
Same deal.

Are file descriptors for linux sockets always in increasing order

I have a socket server in C/linux. Each time I create a new socket it is assigned a file descriptor. I want to use these FD's as uniqueID's for each client. If they are guaranteed to always be assigned in increasing order (which is the case for the Ubuntu that I am running) then I could just use them as array indices.
So the question: Are the file descriptors that are assigned from linux sockets guaranteed to always be in increasing order?
Let's look at how this works internally (I'm using kernel 4.1.20). File descriptors in Linux are allocated by __alloc_fd. When you make an open syscall, do_sys_open is called. This routine gets a free file descriptor from get_unused_fd_flags:
long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode)
{
    ...
    fd = get_unused_fd_flags(flags);
    if (fd >= 0) {
        struct file *f = do_filp_open(dfd, tmp, &op);
get_unused_fd_flags calls __alloc_fd, setting the minimum and maximum fd:
int get_unused_fd_flags(unsigned flags)
{
    return __alloc_fd(current->files, 0, rlimit(RLIMIT_NOFILE), flags);
}
__alloc_fd gets the file descriptor table for the process, and takes the fd from next_fd, which was set the previous time it ran:
int __alloc_fd(struct files_struct *files,
               unsigned start, unsigned end, unsigned flags)
{
    ...
    fd = files->next_fd;
    ...
    if (start <= files->next_fd)
        files->next_fd = fd + 1;
So you can see how file descriptors do grow monotonically, but only while nothing is closed. next_fd is just a hint for where the search starts; from there, __alloc_fd takes the lowest unused descriptor:
    if (fd < fdt->max_fds)
        fd = find_next_zero_bit(fdt->open_fds, fdt->max_fds, fd);
When a descriptor is closed, next_fd is moved back down to it if it is lower, so the next allocation reuses the freed number instead of continuing upward. At that point the file descriptors are no longer growing monotonically. If the descriptor found does not fit in the current table, the table is expanded:
    error = expand_files(files, fd);
After that, descriptors keep growing until another one is closed.
Hope this helps
FD's are guaranteed to be unique, for the lifetime of the socket. So yes, in theory, you could probably use the FD as an index into an array of clients. However, I'd caution against this for at least a couple of reasons:
As has already been said, there is no guarantee that FDs will be allocated monotonically. accept() would be within its rights to return a highly-numbered FD, which would then make your array inefficient. So short answer to your question: no, they are not guaranteed to be monotonic.
Your server is likely to end up with lots of other open FDs - stdin, stdout and stderr to name but three - so again, your array is wasting space.
I'd recommend some other way of mapping from FDs to clients. Indeed, unless you're going to be dealing with thousands of clients, searching through a list of clients should be fine - it's not really an operation that you should need to do a huge amount.
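One possible shape for such a mapping is sketched below: a small table searched linearly by fd, where each client also gets its own identifier that is never reused. struct client, struct client_table, MAX_CLIENTS and the table_* functions are all hypothetical names for this example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CLIENTS 64            /* hypothetical limit for this sketch */

struct client {
    int fd;                       /* socket descriptor, -1 when the slot is free */
    unsigned long id;             /* application-level id, never reused */
};

struct client_table {
    struct client slots[MAX_CLIENTS];
    unsigned long next_id;
};

void table_init(struct client_table *t)
{
    for (size_t i = 0; i < MAX_CLIENTS; i++) {
        t->slots[i].fd = -1;
        t->slots[i].id = 0;
    }
    t->next_id = 1;
}

/* Linear search by descriptor; returns NULL if fd is not registered. */
struct client *table_find(struct client_table *t, int fd)
{
    if (fd < 0)
        return NULL;
    for (size_t i = 0; i < MAX_CLIENTS; i++)
        if (t->slots[i].fd == fd)
            return &t->slots[i];
    return NULL;
}

/* Register a new connection; returns its slot, or NULL if the table is full. */
struct client *table_add(struct client_table *t, int fd)
{
    assert(fd >= 0);
    for (size_t i = 0; i < MAX_CLIENTS; i++) {
        if (t->slots[i].fd == -1) {
            t->slots[i].fd = fd;
            t->slots[i].id = t->next_id++;
            return &t->slots[i];
        }
    }
    return NULL;
}

/* Forget a connection, typically right before close(fd). */
void table_remove(struct client_table *t, int fd)
{
    struct client *c = table_find(t, fd);
    if (c != NULL)
        c->fd = -1;
}
```

The table size depends on the number of clients rather than on the largest descriptor value, and a reused fd number still maps to a fresh id.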
Do not depend on the monotonicity of file descriptors. Always refer to the remote system via an address:port pair.
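A quick way to see why is the small experiment below. POSIX requires open(2) to return the lowest-numbered descriptor not currently open in the process, so a number is handed out again as soon as it has been closed. The sketch uses /dev/null instead of sockets to stay self-contained, assumes a single-threaded process, and closed_fd_is_reused() is a hypothetical name.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Open two descriptors, close the first, open a third.
 * Returns 1 if the third reuses the first one's number,
 * 0 if it does not, and -1 if /dev/null could not be opened.
 */
int closed_fd_is_reused(void)
{
    int a = open("/dev/null", O_RDONLY);
    int b = open("/dev/null", O_RDONLY);
    if (a < 0 || b < 0) {
        if (a >= 0) close(a);
        if (b >= 0) close(b);
        return -1;
    }
    assert(b > a);                 /* still increasing: nothing was closed yet */

    close(a);                      /* frees the lower number */
    int c = open("/dev/null", O_RDONLY);
    int reused = (c == a);         /* lowest free number is handed out again */

    if (c >= 0) close(c);
    close(b);
    return reused;
}
```

In a server this happens all the time: a client disconnects, its descriptor is closed, and the next accepted connection gets the same number.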

Resources