How does fget_light work in Linux? - linux

I'm studying system calls in Linux and I read the implementation of the read() system call.
SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count)
{
struct file *file;
ssize_t ret = -EBADF;
int fput_needed;
file = fget_light(fd, &fput_needed);
if (file) {
loff_t pos = file_pos_read(file);
ret = vfs_read(file, buf, count, &pos);
file_pos_write(file, pos);
fput_light(file, fput_needed);
}
return ret;
}
This is the definition of fget_light()
struct file *fget_light(unsigned int fd, int *fput_needed)
{
struct file *file;
struct files_struct *files = current->files;
*fput_needed = 0;
if (likely((atomic_read(&files->count) == 1))) {
file = fcheck_files(files, fd);
} else {
rcu_read_lock();
file = fcheck_files(files, fd);
if (file) {
if (atomic_long_inc_not_zero(&file->f_count))
*fput_needed = 1;
else
/* Didn't get the reference, someone's freed */
file = NULL;
}
rcu_read_unlock();
}
return file;
}
Can you explain what fget_light does?

Each task has a file descriptor table. This file descriptor table is indexed by file descriptor number, and contains information (file descriptions) about each open file.
Like many other objects in the kernel, file descriptions are reference-counted. This means that when some part of the kernel wants to access a file description, it has to take a reference, do whatever it needs to do, and release the reference. When the reference count drops to zero, the object can be freed. For file descriptions, open() increments the reference count and close() decrements it, so file descriptions cannot be released while they are open and/or the kernel is using them (e.g. imagine a thread in your process close()ing a file while another thread is still read()ing the file: the file description will not actually be released until the read fput()s its reference).
To get a reference to a file description from a file descriptor, the kernel has the function fget(), and fput() releases that reference. Since several threads may be accessing the same file description at the same time on different CPUs, fget() and fput() must use appropriate locking. Modern kernels use RCU for this, so mere readers of the file descriptor table incur little or no cost.
But RCU alone is not enough of an optimization. Consider that it's very common to have processes which are not multi-threaded. In this case you don't have to worry about other threads from the same process accessing the same file description. The only task with access to our file descriptor table is us. So, as an optimization, fget_light()/fput_light() don't touch the reference count when the current file descriptor table is only used by a single task.
struct file *fget_light(unsigned int fd, int *fput_needed)
{
struct file *file;
/* The file descriptor table for our _current_ task */
struct files_struct *files = current->files;
/* Assume we won't need to touch the reference count,
* since the count won't reach zero (we are not close(),
* and hope we don't run concurrently to close()),
* fput_light() won't actually need to fput().
*/
*fput_needed = 0;
/* Check whether we are actually the only task with access to the fd table */
if (likely((atomic_read(&files->count) == 1))) {
/* Yep, get the reference to the file description */
file = fcheck_files(files, fd);
} else {
/* Nope, we'll need some locking */
rcu_read_lock();
/* Get the reference to the file description */
file = fcheck_files(files, fd);
if (file) {
/* Increment the reference count */
if (atomic_long_inc_not_zero(&file->f_count))
/* fput_light() will actually need to fput() */
*fput_needed = 1;
else
/* Didn't get the reference, someone's freed */
/* Happens if the file was close()d and all the
* other accessors finished their work and called fput().
*/
file = NULL;
}
rcu_read_unlock();
}
return file;
}
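As a rough illustration of the same idea outside the kernel, here is a minimal user-space sketch using C11 atomics. The toy_* names and the fixed 8-slot table are made up for this example, and it leaves out RCU entirely, so it only models the "skip the reference count when the table has a single user" decision, not the real locking.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define TOY_MAX_FDS 8

/* Hypothetical stand-ins for struct file and struct files_struct. */
struct toy_file  { atomic_long f_count; };
struct toy_table { atomic_int users; struct toy_file *fd[TOY_MAX_FDS]; };

/* Modelled on fget_light(): only touch f_count when the table is shared. */
static struct toy_file *toy_fget_light(struct toy_table *t, unsigned int fd,
                                       int *fput_needed)
{
    struct toy_file *f;
    long c;

    assert(t != NULL && fput_needed != NULL);
    *fput_needed = 0;
    if (fd >= TOY_MAX_FDS || (f = t->fd[fd]) == NULL)
        return NULL;
    if (atomic_load(&t->users) == 1)
        return f;               /* single user: nobody can close() it under us */

    /* Shared table: increment unless the count has already dropped to zero. */
    c = atomic_load(&f->f_count);
    while (c != 0) {
        if (atomic_compare_exchange_weak(&f->f_count, &c, c + 1)) {
            *fput_needed = 1;   /* the caller must drop this reference later */
            return f;
        }
    }
    return NULL;                /* the object was already being freed */
}

/* Modelled on fput_light(): drop the reference only if one was taken. */
static void toy_fput_light(struct toy_file *f, int fput_needed)
{
    if (fput_needed)
        atomic_fetch_sub(&f->f_count, 1);
}
```

With users == 1 the lookup leaves f_count untouched and sets fput_needed to 0. Once a second user shares the table, the same lookup bumps f_count and asks the caller to release it again.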

Basically, the function translates the fd passed by the user to the syscall to the kernel-internal file structure pointer by calling the fcheck_files function that looks into the file table of the process (that would be its files parameter). For more information, read this.

Related

How to create a file with content using debugfs in kernel module?

With this debugfs API I can create a file in /sys/kernel/debug/parent/name, but it's empty, no matter what data I pass in the void *data parameter:
struct dentry *debugfs_create_file(const char *name, mode_t mode, struct dentry *parent, void *data, struct file_operations *fops);
According to the documentation, we need to implement file_operations ourselves to handle file open and write.
A snippet of my code:
static ssize_t myreader(struct file *fp, char __user *user_buffer,
size_t count, loff_t *position)
{
return simple_read_from_buffer(user_buffer, count, position, ker_buf, len);
}
static ssize_t mywriter(struct file *fp, const char __user *user_buffer,
size_t count, loff_t *position)
{
if(count > len )
return -EINVAL;
return simple_write_to_buffer(ker_buf, len, position, user_buffer, count);
}
static const struct file_operations fops_debug = {
.read = myreader,
.write = mywriter,
};
static int __init init_debug(void)
{
dirret = debugfs_create_dir("dell", NULL);
fileret = debugfs_create_file("text", 0644, dirret, "HELLO WORLD", &fops_debug);
debugfs_create_u64("number", 0644, dirret, &intvalue);
return (0);
}
After installing this module to kernel, two files 'text' and 'number' will be created in the folder 'dell'. File 'number' contains the number I passed in as 'intvalue' as expected, but the other file 'text' is empty.
The documentation says that data will be stored in the i_private field of the resulting inode structure.
My expectation: The string "HELLO WORLD" will be written in the file after module is loaded.
I think that the problem should be in the read and write operation functions. Is it possible to create a file with a particular content with the debugfs_create_file method?
Your expectation is reasonable, but the code as written will not produce it. I believe there are other, more efficient and correct ways of doing this, but to explain the current behavior:
You pass "HELLO WORLD" as the data argument for the file text, but myreader() never looks at that pointer. It copies from the buffer ker_buf into user_buffer using simple_read_from_buffer(user_buffer, count, position, ker_buf, len);
Similarly, mywriter() copies from user_buffer into ker_buf using simple_write_to_buffer(ker_buf, len, position, user_buffer, count);
With the existing code, to get the result you want you have to copy the string "HELLO WORLD" into ker_buf in init_debug().
Something like:
strscpy(ker_buf, "HELLO WORLD", strlen("HELLO WORLD") + 1);
or in form of complete function:
static int __init init_debug(void)
{
dirret = debugfs_create_dir("dell", NULL);
fileret = debugfs_create_file("text", 0644, dirret, NULL, &fops_debug);
debugfs_create_u64("number", 0644, dirret, &intvalue);
strscpy(ker_buf, "HELLO WORLD", strlen("HELLO WORLD") + 1);
return (0);
}
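To make it clearer why the bytes returned by read() come only from ker_buf, here is a user-space sketch of the logic simple_read_from_buffer() follows. The name toy_read_from_buffer is made up for this example, memcpy() stands in for copy_to_user(), and the fault-handling path is left out, so treat it as a model rather than the kernel's code.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>

/* User-space model of simple_read_from_buffer(). */
static ssize_t toy_read_from_buffer(void *to, size_t count, off_t *ppos,
                                    const void *from, size_t available)
{
    off_t pos;

    assert(to != NULL && ppos != NULL && from != NULL);
    pos = *ppos;
    if (pos < 0)
        return -1;                      /* the kernel returns -EINVAL here */
    if ((size_t)pos >= available || count == 0)
        return 0;                       /* nothing left: the reader sees EOF */
    if (count > available - (size_t)pos)
        count = available - (size_t)pos;
    memcpy(to, (const char *)from + pos, count);
    *ppos = pos + (off_t)count;         /* the next read continues from here */
    return (ssize_t)count;
}
```

The only source of data is the from buffer, clamped by available and the current position. Whatever was passed as data to debugfs_create_file() plays no part unless the read handler fetches it explicitly.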
Edit:
After checking some online material, I found that the void *data passed to debugfs_create_file() is stored in the i_private field of the resulting inode structure and can be retrieved from there later.
The inode of the file can be reached from struct file *fp, which is the first argument of the read() and write() operations.
struct file holds a pointer to its struct inode, and i_private is a member of struct inode.
To fetch the void *data provided at file creation via debugfs_create_file() inside read(), you can do something like the following:
static ssize_t myreader(struct file *fp, char __user *user_buffer,
size_t count, loff_t *position)
{
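struct inode *l_inode = fp->f_inode;
const char *data = l_inode->i_private;
/* user_buffer is a __user pointer: copy with a uaccess helper, not strscpy() */
ssize_t ret = simple_read_from_buffer(user_buffer, count, position, data, strlen(data));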
...
}

Multithreaded programming in C

I have been given an assignment. There is a directory of 25 files, and each file contains random text that includes random IP addresses. The task is to find and output the count of unique IP addresses across all files using the pthread library in C.
I think I have solved the race condition on the count variable with mutual exclusion, but there is still a bug and the code produces a different count on each execution.
Here is the code; please suggest fixes for the bug:
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <dirent.h>
#include <pthread.h>
#include <string.h>
//declaring structure of arguments to give arguments to thread function
struct arg_struct
{
char *arg1; //argument 1 : to pass directory name to thread function
struct dirent *arg2; //argument 2: to pass file name to thread function
};
//declaring structure of pointer which will point unique ip addresses
struct uniqueip
{
char *ip;
};
struct filenames
{
char full_filename[256];
};
struct uniqueip u[200];
int count=0;// global count variable stores total unique ip addresses.
void *ReadFile(void *thread_no);//thread declaration
pthread_mutex_t mutex;
int main(int argc, char *argv[])
{
DIR *dir; //directory stream
FILE *file; //file stream
struct dirent *ent; // directory entry structure
char *line = NULL; // pointer to
size_t len = 1000; //the length of bytes getline will allocate
size_t read;
char full_filename[256]; //will hold the entire file path with
//file name to read
int x=0;
pthread_attr_t attr;
int rc;
long thread_no;
void *status;
void *ReadFile(void *thread_no);
// check the arguments
if(argc < 2)
{
printf("Not enough arguments supplied\n");
return -1;
}
if(argc > 2)
{
printf("Too many arguments supplied\n");
return -1;
}
struct arg_struct args;
args.arg1 = argv[1];
pthread_mutex_init(&mutex, NULL); // initializing mutex
/* Initialize and set thread detached attribute */
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
// try to open the directory given by the argument
if ((dir = opendir (argv[1])) != NULL)
{
/* print all the files and directories within directory */
while ((ent = readdir (dir)) != NULL)
{
// Check if the list is a regular file
if(ent->d_type == DT_REG)
{
//Get the number of files first so that we would know number
//of threads to be created
x++;
}
}
}
pthread_t thread[x];
struct filenames filenames[x];
thread_no=0;
// try to open the directory given by the argument
if ((dir = opendir (argv[1])) != NULL)
{
/* print all the files and directories within directory */
while ((ent = readdir (dir)) != NULL)
{
// Check if the list is a regular file
if(ent->d_type == DT_REG)
{
// Create the absolute path of the filename
snprintf(filenames[thread_no].full_filename, sizeof filenames[thread_no].full_filename,
"./%s/%s", argv[1], ent->d_name);
//creating threads to read files
args.arg2 = ent; //assigning file name to argument 2
printf("main: creating thread %ld %s \n", thread_no,ent->d_name);
rc = pthread_create(&thread[thread_no], &attr, ReadFile, (void *) &args);
if (rc)
{
printf("ERROR; return code from pthread_create() is %d\n", rc);
exit(-1);
}
thread_no++;
}
}
// Close the directory structure
closedir (dir);
}
else
{
/* could not open directory */
perror ("");
return -1;
}
/* Free attribute and wait for the other threads*/
pthread_attr_destroy(&attr);
for(thread_no=0; thread_no<x; thread_no++)
{
rc = pthread_join(thread[thread_no], &status);
if (rc)
{
printf("ERROR; return code from pthread_join() is %d\n", rc);
exit(-1);
}
printf("Main: completed join with thread %ld having a status of %ld\n",thread_no,(long)status);
}
printf("Main: program completed. Exiting.\n");
printf("total no. of unique ip addresses are %d\n",count-1);
pthread_mutex_destroy(&mutex);
pthread_exit(NULL);
return 0;
}
void *ReadFile(void *thread_no)
{ // in thread function
struct filenames *my_data;
my_data = (struct filenames *)thread_no;
char full_filename[256];
FILE *file; //file stream
char *line = NULL;
char *split = NULL;
size_t len = 1000; // pointer to the length of bytes getline will allocate
size_t read;
const char s[2]=" "; //used as string split to get ip address
char *token;
int flag = 0,j;
// open the file
file = fopen(my_data -> full_filename, "r");
// file was not able to be open
if (file != NULL)
{
// Print out each line in the file
while ((read = getline(&line, &len, file)) != -1)
{
split=line;
token = strtok(split,s);
pthread_mutex_lock(&mutex);
if(count==0){
//locking mutex variable to avoid race condition
u[count].ip=malloc(sizeof(token)+1);
strcpy(u[count].ip,token);
printf("%d ------ %s\n",count,u[count].ip);
free(u[count].ip);
count++;
}
pthread_mutex_unlock(&mutex); // unlocking mutex
//comparing recently received ip address to all the stored unique ip address.
for(j=0;j<count;j++)
{
if(!(strcmp(u[j].ip,token)))
{
break;
}
else
{
if(j==count-1){
pthread_mutex_lock(&mutex); //locking mutex variable to avoid race condition
u[count].ip=malloc(sizeof(read));
strcpy(u[count].ip,token);
printf("%d ------ %s\n",count,u[count].ip);
count++;
free(u[count].ip);
pthread_mutex_unlock(&mutex); // unlocking mutex
}
}
}
}
}
fclose(file);
pthread_exit((void*) thread_no);
}
There are several issues in this code.
You only ever create one instance of arg_struct, but you re-use it and pass it to every thread. This means that by the time a thread starts, the value of the arg_struct you passed it may have changed. You need to give each thread its own arg_struct - e.g. you could declare an array of them alongside the pthread_t array:
pthread_t thread[x];
struct arg_struct args[x];
A similar problem exists with the struct dirent * pointer inside arg_struct - the data pointed to by the struct dirent * returned by readdir() may be overwritten by the next call to readdir() on the same directory stream. There are a few ways to solve this, but one way is to replace the char *arg1; and struct dirent * in arg_struct with a buffer to hold the filename:
struct arg_struct
{
char full_filename[256]; //will hold the entire file path with
//file name to read
};
The main thread can then be changed to put the filename straight into the arg_struct:
snprintf(args[thread_no].full_filename, sizeof args[thread_no].full_filename, "./%s/%s", argv[1], ent->d_name);
In the ReadFile() function, this creates an array of one element and then tries to write to the (non-existent) second element, which has undefined behaviour:
char * argv[1];
argv[1]= my_data->arg1;
That code can be removed entirely, though - now that main() is constructing the full filename for the thread, the thread can just directly open it from the arg_struct:
file = fopen(my_data->full_filename, "r");
(The thread doesn't need to worry about argv[1] at all anymore).
Your thread function is reading the shared count variable without holding the mutex - you need to lock the mutex before executing if (count == 0), and don't unlock it until after the for () loop (otherwise, you might get two threads deciding to add an IP to the same array location).
When you try to create a copy of the string you want to store, you aren't allocating enough space: sizeof read is always the fixed size of a size_t variable and isn't related to the size of the string you're copying. You want:
u[count].ip = malloc(strlen(token) + 1);
strcpy(u[count].ip, token);
You don't want to immediately free the u[count].ip, either: you need that string to stay allocated. Remove the free(u[count].ip); lines.
There are some easy optimisations you could make once you get it working. For example, because count only increases and the entries of u[] below count never change, you can lock the mutex, save a copy of count, then unlock the mutex. Loop up to the saved value of count - if you find the string then you can move straight on to the next line of your input file. Only if you don't find the string do you need to re-lock the mutex and continue from the saved count value up to the current count value (which might have increased in the meantime), adding the new string to the array (and incrementing count) if necessary.
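Putting the locking and allocation fixes together, a minimal sketch of the "search, then add if missing" step could look like the following. The names add_unique_ip, unique_ips and ip_lock are invented for this example, and it is the simple version that holds the mutex across the whole search, not the optimised one described above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define MAX_IPS 200

static char *unique_ips[MAX_IPS];
static int unique_count;
static pthread_mutex_t ip_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns 1 if ip was added, 0 if it was already stored,
 * and -1 if the table is full or memory ran out. */
static int add_unique_ip(const char *ip)
{
    int result = 0;
    char *copy;

    assert(ip != NULL);
    /* Hold the mutex across both the search and the insert, so two
     * threads can never claim the same array slot. */
    pthread_mutex_lock(&ip_lock);
    for (int i = 0; i < unique_count; i++) {
        if (strcmp(unique_ips[i], ip) == 0)
            goto out;                   /* already known */
    }
    result = -1;
    if (unique_count == MAX_IPS)
        goto out;
    copy = malloc(strlen(ip) + 1);      /* string length, not sizeof a variable */
    if (copy == NULL)
        goto out;
    strcpy(copy, ip);
    unique_ips[unique_count++] = copy;  /* stays allocated: no free() here */
    result = 1;
out:
    pthread_mutex_unlock(&ip_lock);
    return result;
}
```

Each thread would call add_unique_ip(token) once per line. After all threads have been joined, unique_count holds the total.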

copy_to_user not working in kernel module

I am trying to use copy_to_user in a kernel module's read function, but I am not able to copy the data from the kernel to the user buffer. Can anyone tell me if I am making a mistake? My kernel version is 2.6.35. I am including the relevant portion of the kernel module as well as the application used to test it. Right now my focus is on why this copy_to_user is not working. Any help would be great.
///////////////////////////////////kernel module//////////////////////////////////////
#define BUF_LEN 80
static char msg[BUF_LEN];
static char *msg_Ptr;
static int device_open(struct inode *inode, struct file *file)
{
static int counter = 0;
if (Device_Open)
return -EBUSY;
Device_Open++;
printk(KERN_ALERT "In open device call\n");
sprintf(msg, "I already told you %d times Hello world!\n", counter++);
msg_Ptr = msg;
try_module_get(THIS_MODULE);
return SUCCESS;
}
static ssize_t device_read(struct file *filp,
char __user *buffer,
size_t length,
loff_t * offset)
{
/*
* Number of bytes actually written to the buffer
*/
int bytes_read = 0;
/*
* If we are at the end of the message,
* return 0 signifying end of file
*/
if (*msg_Ptr == 0)
return 0;
/*
* Actually put the data into the buffer
*/
else {
bytes_read=copy_to_user(buffer, msg, length);
if (bytes_read==-1);
{
printk(KERN_INFO "Error in else while copying the data \n");
}
}
return bytes_read;
}
////////////////////////////////////////application////////////////////////
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>
#define BUF_SIZE 40
int main()
{
ssize_t num_bytes;
int fd, n=0;
char buf[BUF_SIZE];
fd=open("/dev/chardev", O_RDWR);
if(fd== -1){perror("Error while opening device");exit(1);}
printf("fd=%d\n",fd);
num_bytes=read(fd, buf, BUF_SIZE);
if(num_bytes==-1){perror("Error while reading"); exit(2);}
printf("The value fetched is %lu bytes\n", num_bytes);
while(n<=num_bytes)
{
printf("%c",buf[n]);
n++;
}
close(fd);
return 0;
}
There are a few problems in the code snippet you wrote. First of all, it is not a good idea to call try_module_get(THIS_MODULE);
This statement tries to increase the refcount of the module... from within the module itself! Instead, you should set the owner field of the file_ops structure to THIS_MODULE in your init method. This way, the reference handling will happen outside the module code, in the VFS layer. You might take a look at Linux Kernel Modules: When to use try_module_get / module_put.
Then, as Vineet stated, you should retrieve the pointer from the private_data field of struct file.
And last but not least, here is why it looks as if an error happened when, actually, it did not:
The copy_to_user call returns 0 if it has successfully copied all the desired bytes into the destination memory area, and a strictly positive value stating the number of bytes that were NOT copied in case of error. That said, when you run:
/* Kernel part */
bytes_read=copy_to_user(buffer, msg, length);
/*
* Wrong error checking :
* In the below statement, "-1" is viewed as an unsigned long.
* With a simple equality test, this will not bother you
* But this is dangerous with other comparisons like "<" or ">"
* (unsigned long)(-1) is at least 2^32 - 1 so ...
*/
if (-1 == bytes_read) {
/* etc. */
}
return bytes_read;
/* App part */
num_bytes=read(fd, buf, BUF_SIZE);
/* etc.. */
while(n<=num_bytes) {
printf("%c",buf[n]);
n++;
}
Since copy_to_user returns 0 on success, device_read returns 0, so read() in the application returns 0 and the while (n <= num_bytes) loop runs exactly once. You therefore get only one character upon a successful copy, that is, a single "I" in your case.
Moreover, you use your msg_Ptr pointer as a safeguard but you never update it, so the end-of-message check never triggers. This might result in a wrong call to copy_to_user.
copy_to_user checks the user-space pointer with a call to access_ok, but if the kernel-space pointer and the given length are not valid, this might end in a kernel Oops/panic.
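The return-value convention is easy to get backwards, so here is a user-space sketch of a read handler that treats it correctly. toy_copy_to_user and toy_device_read are hypothetical stand-ins written for this example (the writable parameter just simulates how much of the destination is accessible); they are not the kernel functions.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/* Stand-in for copy_to_user(): returns the number of bytes NOT copied. */
static unsigned long toy_copy_to_user(char *to, const char *from,
                                      unsigned long n, unsigned long writable)
{
    unsigned long ok = n < writable ? n : writable;

    memcpy(to, from, ok);
    return n - ok;                      /* 0 means complete success */
}

/* A read handler that returns the number of bytes delivered. */
static ssize_t toy_device_read(char *user_buf, size_t length, long *offset,
                               const char *msg, size_t msg_len,
                               unsigned long writable)
{
    size_t remaining;

    assert(offset != NULL);
    if (*offset < 0)
        return -EINVAL;
    if ((size_t)*offset >= msg_len)
        return 0;                       /* end of message: EOF */
    remaining = msg_len - (size_t)*offset;
    if (length > remaining)
        length = remaining;             /* never read past the kernel buffer */
    if (toy_copy_to_user(user_buf, msg + *offset, length, writable) != 0)
        return -EFAULT;                 /* some bytes could not be copied */
    *offset += (long)length;            /* advance so the next read hits EOF */
    return (ssize_t)length;
}
```

The handler returns the clamped length, not the result of the copy, and it advances the offset so a second read reports end of file.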
I think you should set file->private_data in open and then fetch it in your read function, because I suspect the msg buffer (the kernel buffer) is not being referenced properly.

How do ioctls know which function to call in linux?

So when I call an ioctl on a device, with an ioctl number, how does it know which function to call?
The ioctl(2) enters via the fs/ioctl.c function:
SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd, unsigned long, arg)
{
struct file *filp;
int error = -EBADF;
int fput_needed;
filp = fget_light(fd, &fput_needed);
if (!filp)
goto out;
error = security_file_ioctl(filp, cmd, arg);
if (error)
goto out_fput;
error = do_vfs_ioctl(filp, fd, cmd, arg);
out_fput:
fput_light(filp, fput_needed);
out:
return error;
}
Note that a file descriptor fd is already associated with the call. The kernel then calls fget_light() to look up a filp (roughly, file pointer, but don't confuse this with the standard IO FILE * file pointer). The call into security_file_ioctl() checks whether the loaded security module will allow the ioctl (whether by name, as in AppArmor and TOMOYO, or by labels, as in SMACK and SELinux), as well as whether or not the user has the correct capability (capabilities(7)) to make the call. If the call is allowed, then do_vfs_ioctl() is called, which handles the common ioctls itself:
switch (cmd) {
case FIOCLEX:
set_close_on_exec(fd, 1);
break;
/* ... */
If none of those common cases match, the kernel calls a helper routine:
static long vfs_ioctl(struct file *filp, unsigned int cmd,
unsigned long arg)
{
int error = -ENOTTY;
if (!filp->f_op || !filp->f_op->unlocked_ioctl)
goto out;
error = filp->f_op->unlocked_ioctl(filp, cmd, arg);
if (error == -ENOIOCTLCMD)
error = -EINVAL;
out:
return error;
}
Drivers supply their own .unlocked_ioctl function pointer, like this pipe implementation in fs/pipe.c:
const struct file_operations rdwr_pipefifo_fops = {
.llseek = no_llseek,
.read = do_sync_read,
.aio_read = pipe_read,
.write = do_sync_write,
.aio_write = pipe_write,
.poll = pipe_poll,
.unlocked_ioctl = pipe_ioctl,
.open = pipe_rdwr_open,
.release = pipe_rdwr_release,
.fasync = pipe_rdwr_fasync,
};
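The dispatch itself is just a call through a function pointer stored in a per-file table. A user-space sketch of that mechanism, with made-up toy_* names and a hypothetical command number, might look like this (it skips the -ENOIOCTLCMD translation shown above, since that constant is kernel-internal):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_file;

/* Hypothetical miniature of struct file_operations. */
struct toy_file_operations {
    long (*unlocked_ioctl)(struct toy_file *, unsigned int, unsigned long);
};

struct toy_file {
    const struct toy_file_operations *f_op;
};

/* Modelled on vfs_ioctl(): call the driver's handler if it has one. */
static long toy_vfs_ioctl(struct toy_file *filp, unsigned int cmd,
                          unsigned long arg)
{
    assert(filp != NULL);
    if (!filp->f_op || !filp->f_op->unlocked_ioctl)
        return -ENOTTY;                 /* this file has no ioctl support */
    return filp->f_op->unlocked_ioctl(filp, cmd, arg);
}

#define TOY_GET_ANSWER 0x1234u          /* made-up command number */

/* The "driver" decides what each command number means. */
static long toy_driver_ioctl(struct toy_file *filp, unsigned int cmd,
                             unsigned long arg)
{
    (void)filp;
    (void)arg;
    switch (cmd) {
    case TOY_GET_ANSWER:
        return 42;
    default:
        return -ENOTTY;                 /* unknown command for this driver */
    }
}

static const struct toy_file_operations toy_fops = {
    .unlocked_ioctl = toy_driver_ioctl,
};
```

The generic layer never interprets the command number. It only finds the handler attached to the open file and passes the number through.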
There's a map in the kernel. You can register your own ioctl codes if you write a driver.
Edit: I wrote an ATA over Ethernet driver once and implemented a custom ioctl for tuning the driver at runtime.
A simplified explanation:
The file descriptor you pass to ioctl points to the inode structure that represents the device you are going to ioctl.
The inode structure contains the device number dev_t i_rdev, which is used as an index to find the device driver's file_operations structure. In this structure, there is a pointer to the ioctl function defined by the device driver.
You can read Linux Device Drivers, 3rd Edition for a more detailed explanation. It may be a bit outdated, but a good read nevertheless.

Kernel Panic after changes in sys_close

I'm doing a course on operating systems and we work in Linux Red Hat 8.0.
As part of an assignment I had to change sys_close and sys_open. The changes to sys_open passed without incident, but when I introduce the changes to sys_close the OS suddenly encounters an error during boot, claiming it cannot mount the root fs, and invokes panic. EIP is reportedly at sys_close when this happens.
Here are the changes I made (look for the "HW1 additions" comment):
In fs/open.c:
asmlinkage long sys_open(const char * filename, int flags, int mode)
{
char * tmp;
int fd, error;
event_t* new_event;
#if BITS_PER_LONG != 32
flags |= O_LARGEFILE;
#endif
tmp = getname(filename);
fd = PTR_ERR(tmp);
if (!IS_ERR(tmp)) {
fd = get_unused_fd();
if (fd >= 0) {
struct file *f = filp_open(tmp, flags, mode);
error = PTR_ERR(f);
if (IS_ERR(f))
goto out_error;
fd_install(fd, f);
}
/* HW1 additions */
if (current->record_flag==1){
new_event=(event_t*)kmalloc(sizeof(event_t), GFP_KERNEL);
if (!new_event){
new_event->type=Open;
strcpy(new_event->filename, tmp);
file_queue_add(*new_event, current->queue);
}
}
/* End HW1 additions */
out:
putname(tmp);
}
return fd;
out_error:
put_unused_fd(fd);
fd = error;
goto out;
}
asmlinkage long sys_close(unsigned int fd)
{
struct file * filp;
struct files_struct *files = current->files;
event_t* new_event;
char* tmp = files->fd[fd]->f_dentry->d_name.name;
write_lock(&files->file_lock);
if (fd >= files->max_fds)
goto out_unlock;
filp = files->fd[fd];
if (!filp)
goto out_unlock;
files->fd[fd] = NULL;
FD_CLR(fd, files->close_on_exec);
__put_unused_fd(files, fd);
write_unlock(&files->file_lock);
/* HW1 additions */
if(current->record_flag == 1){
new_event=(event_t*)kmalloc(sizeof(event_t), GFP_KERNEL);
if (!new_event){
new_event->type=Close;
strcpy(new_event->filename, tmp);
file_queue_add(*new_event, current->queue);
}
}
/* End HW1 additions */
return filp_close(filp, files);
out_unlock:
write_unlock(&files->file_lock);
return -EBADF;
}
The task_struct defined in schedule.h was changed at the end to include:
unsigned int record_flag; /* when zero: do not record. when one: record. */
file_queue* queue;
And file_queue as well as event_t are defined in a separate file as follows:
typedef enum {Open, Close} EventType;
typedef struct event_t{
EventType type;
char filename[256];
}event_t;
typedef struct file_quque_t{
event_t queue[101];
int head, tail;
}file_queue;
file_queue_add works like this:
void file_queue_add(event_t event, file_queue* queue){
queue->queue[queue->head]=event;
queue->head = (queue->head+1) % 101;
if (queue->head==queue->tail){
queue->tail=(queue->tail+1) % 101;
}
}
if (!new_event) {
new_event->type = …
That's equivalent to if (new_event == NULL). I think you mean if (new_event != NULL), which the kernel folks typically write as if (new_event).
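For comparison, here is a user-space sketch of the allocation check written the right way round. It reuses the event_t definition from the question; make_event is a name invented for this example, and malloc() stands in for kmalloc().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum { Open, Close } EventType;

typedef struct event_t {
    EventType type;
    char filename[256];
} event_t;

/* Returns a filled-in event, or NULL if the allocation failed. */
static event_t *make_event(EventType type, const char *name)
{
    event_t *ev;

    assert(name != NULL);
    ev = malloc(sizeof(*ev));           /* kmalloc() in the kernel version */
    if (!ev)
        return NULL;                    /* failure: never touch the pointer */

    /* Only reached when ev is a valid pointer. */
    ev->type = type;
    strncpy(ev->filename, name, sizeof(ev->filename) - 1);
    ev->filename[sizeof(ev->filename) - 1] = '\0';  /* always terminated */
    return ev;
}
```

Returning early on failure keeps every dereference of ev on the path where the pointer is known to be valid. The bounded copy also keeps a long name from overrunning the 256-byte field.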
Can you please post the stack dump of the error? I don't see a place where memory is allocated for the queue_info structure. One more thing: you cannot be sure that a process's record_flag will always be zero if it is never assigned, because the kernel is a long-running program and uninitialized memory contains garbage.
It's also possible to find the exact location in the function where the error occurs by looking at the stack trace.
