I'm reading the source code in https://wayland-book.com/surfaces/shared-memory.html .
The author create a shared memory using shm_open(), and shm_unlink() it immediately, then ftruncate() the fd to a specific size, mmap() the fd and fill the region with pixels.
I'm so confused why the fd still available after shm_unlink().
according to the man page:
The operation of shm_unlink() is analogous to unlink(2): it removes a shared memory object name, and, once all processes have unmapped the object, de-allocates and destroys the contents of the associated memory region. After a successful shm_unlink(), attempts to shm_open() an object with the same name will fail (unless O_CREAT was specified, in which case a new, distinct object is created).
so shm_unlink() will cause the memory destroyed because there is no process mmap
the region. But how fd still avaliable?
here is the code:
static int
create_shm_file(void)
{
int retries = 100;
do {
char name[] = "/wl_shm-XXXXXX";
randname(name + sizeof(name) - 7);
--retries;
int fd = shm_open(name, O_RDWR | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
if (fd >= 0) {
shm_unlink(name); // unlink immediately
return fd;
}
} while (retries > 0 && errno == EEXIST);
return -1;
}
static int
allocate_shm_file(size_t size)
{
int fd = create_shm_file();
if (fd < 0)
return -1;
int ret;
do {
ret = ftruncate(fd, size); //why the fd still available?
} while (ret < 0 && errno == EINTR);
if (ret < 0) {
close(fd);
return -1;
}
return fd;
}
//after above, there was mmap
uint32_t *data = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
The manual mentioned splice() can transfer data between two arbitrary filedescriptors, also onto a socketfd. This works if the file is send at once. Therefore the filesize has to be lower than PIPE_BUF_SZ (=65536).
But, how to handle bigger files? I want to understand the difference to sendfile() syscall. How would you rewrite the sendfile() syscall?
The second splice returns with Invalid argument. I guess it is because the socketfd is not seekable.
size_t len = 800000; //e.g.
static int do_copy(int in_fd, int out_fd)
{
loff_t in_off = 0, out_off = 0;
static int buf_size = 65536;
off_t len;
int filedes[2];
int err = -1;
if(pipe(filedes) < 0) {
perror("pipe:");
goto out;
}
while(len > 0) {
if(buf_size > len) buf_size = len;
/* move to pipe buffer. */
err = splice(in_fd, &in_off, filedes[1], NULL, buf_size, SPLICE_F_MOVE | SPLICE_F_MORE);
if(err < 0) {
perror("splice:");
goto out_close;
}
/* move from pipe buffer to out_fd */
err = splice(filedes[0], NULL, out_fd, &out_off, buf_size, SPLICE_F_MOVE | SPLICE_F_MORE);
if(err < 0) {
perror("splice2:");
goto out_close;
}
len -= buf_size;
}
err = 0;
out_close:
close(filedes[0]);
close(filedes[1]);
out:
return err;
}
sendfile() systemcall does not check if the filedescriptor is seekable. The only check onto that fd is, if you can read (FMODE_READ) onto the fd.
splice() does some more checks. Among others, if the fd is seekable (FMODE_PREAD) / (FMODE_PWRITE).
That's why sendfile works, but splice won't.
I have a piece of code in C, and I need to know where I have the TOCTTOU vulnerability and why. Does somebody know where it is and how I can correct it?
int process(char *filename)
{
struct stat aux;
char buffer[1024];
printf("Input to be appended: ");
fgets(buffer, sizeof(buffer), stdin);
if((lstat(filename, &aux) == 0) && !S_ISLNK(aux.st_mode))
{
printf("[+] Opening\n", filename);
int fd = open(filename, O_RDWR | O_APPEND), nb;
nb = write(fd, buffer, strlen(buffer));
printf("[+] Done!\n");
return 0;
}else
printf("[-] ERROR\n", filename);
return 1;
}
int main(int argc, char * argv[])
{
if(argc != 2){
fprintf(stderr, "usage: %s filename\n", argv[0]);
exit(1);
}
return process(argv[1]);
}
Thanks!!
The use of lstat() provides a TOCTOU vulnerability because the file may be deleted after the lstat() and before the open(). Use open() instead and test the return value is a simple solution for this.
How to write char device drivers in Linux?
A very good example is the Linux "softdog", or software watchdog timer. When loaded, it will watch a special device for writes and take action depending on the frequency of those writes.
It also shows you how to implement a rudamentary ioctl interface, which is very useful.
The file to look at is drivers/watchdog/softdog.c
If you learn by example, that is a very good one to start with. The basic character devices (null, random, etc) as others suggest are also good, but do not adequately demonstrate how you need to implement an ioctl() interface.
A side note, I believe the driver was written by Alan Cox. If your going to learn from example, its never a bad idea to study the work of a top level maintainer. You can be pretty sure that the driver also illustrates adhering to proper Linux standards.
As far as drivers go (in Linux), character drivers are the easiest to write and also the most rewarding, as you can see your code working very quickly. Good luck and happy hacking.
Read this book: Linux Device Drivers published by O'Reilly.
Helped me a lot.
My favorite book for learning how the kernel works, BY FAR (and I've read most of them) is:
Linux Kernel Development (2nd Edition)
This book is fairly short, read it first, then read the O'Reilly book on drivers.
Read linux device driver 3rd edition. And the good thing is start coding. I am just pasting a simple char driver so that you can start coding.
#include<linux/module.h>
#include<linux/kernel.h>
#include<linux/fs.h> /*this is the file structure, file open read close */
#include<linux/cdev.h> /* this is for character device, makes cdev avilable*/
#include<linux/semaphore.h> /* this is for the semaphore*/
#include<linux/uaccess.h> /*this is for copy_user vice vers*/
int chardev_init(void);
void chardev_exit(void);
static int device_open(struct inode *, struct file *);
static int device_close(struct inode *, struct file *);
static ssize_t device_read(struct file *, char *, size_t, loff_t *);
static ssize_t device_write(struct file *, const char *, size_t, loff_t *);
static loff_t device_lseek(struct file *file, loff_t offset, int orig);
/*new code*/
#define BUFFER_SIZE 1024
static char device_buffer[BUFFER_SIZE];
struct semaphore sem;
struct cdev *mcdev; /*this is the name of my char driver that i will be registering*/
int major_number; /* will store the major number extracted by dev_t*/
int ret; /*used to return values*/
dev_t dev_num; /*will hold the major number that the kernel gives*/
#define DEVICENAME "megharajchard"
/*inode reffers to the actual file on disk*/
static int device_open(struct inode *inode, struct file *filp) {
if(down_interruptible(&sem) != 0) {
printk(KERN_ALERT "megharajchard : the device has been opened by some other device, unable to open lock\n");
return -1;
}
//buff_rptr = buff_wptr = device_buffer;
printk(KERN_INFO "megharajchard : device opened succesfully\n");
return 0;
}
static ssize_t device_read(struct file *fp, char *buff, size_t length, loff_t *ppos) {
int maxbytes; /*maximum bytes that can be read from ppos to BUFFER_SIZE*/
int bytes_to_read; /* gives the number of bytes to read*/
int bytes_read;/*number of bytes actually read*/
maxbytes = BUFFER_SIZE - *ppos;
if(maxbytes > length)
bytes_to_read = length;
else
bytes_to_read = maxbytes;
if(bytes_to_read == 0)
printk(KERN_INFO "megharajchard : Reached the end of the device\n");
bytes_read = bytes_to_read - copy_to_user(buff, device_buffer + *ppos, bytes_to_read);
printk(KERN_INFO "megharajchard : device has been read %d\n",bytes_read);
*ppos += bytes_read;
printk(KERN_INFO "megharajchard : device has been read\n");
return bytes_read;
}
static ssize_t device_write(struct file *fp, const char *buff, size_t length, loff_t *ppos) {
int maxbytes; /*maximum bytes that can be read from ppos to BUFFER_SIZE*/
int bytes_to_write; /* gives the number of bytes to write*/
int bytes_writen;/*number of bytes actually writen*/
maxbytes = BUFFER_SIZE - *ppos;
if(maxbytes > length)
bytes_to_write = length;
else
bytes_to_write = maxbytes;
bytes_writen = bytes_to_write - copy_from_user(device_buffer + *ppos, buff, bytes_to_write);
printk(KERN_INFO "megharajchard : device has been written %d\n",bytes_writen);
*ppos += bytes_writen;
printk(KERN_INFO "megharajchard : device has been written\n");
return bytes_writen;
}
static loff_t device_lseek(struct file *file, loff_t offset, int orig) {
loff_t new_pos = 0;
printk(KERN_INFO "megharajchard : lseek function in work\n");
switch(orig) {
case 0 : /*seek set*/
new_pos = offset;
break;
case 1 : /*seek cur*/
new_pos = file->f_pos + offset;
break;
case 2 : /*seek end*/
new_pos = BUFFER_SIZE - offset;
break;
}
if(new_pos > BUFFER_SIZE)
new_pos = BUFFER_SIZE;
if(new_pos < 0)
new_pos = 0;
file->f_pos = new_pos;
return new_pos;
}
static int device_close(struct inode *inode, struct file *filp) {
up(&sem);
printk(KERN_INFO "megharajchard : device has been closed\n");
return ret;
}
struct file_operations fops = { /* these are the file operations provided by our driver */
.owner = THIS_MODULE, /*prevents unloading when operations are in use*/
.open = device_open, /*to open the device*/
.write = device_write, /*to write to the device*/
.read = device_read, /*to read the device*/
.release = device_close, /*to close the device*/
.llseek = device_lseek
};
int chardev_init(void)
{
/* we will get the major number dynamically this is recommended please read ldd3*/
ret = alloc_chrdev_region(&dev_num,0,1,DEVICENAME);
if(ret < 0) {
printk(KERN_ALERT " megharajchard : failed to allocate major number\n");
return ret;
}
else
printk(KERN_INFO " megharajchard : mjor number allocated succesful\n");
major_number = MAJOR(dev_num);
printk(KERN_INFO "megharajchard : major number of our device is %d\n",major_number);
printk(KERN_INFO "megharajchard : to use mknod /dev/%s c %d 0\n",DEVICENAME,major_number);
mcdev = cdev_alloc(); /*create, allocate and initialize our cdev structure*/
mcdev->ops = &fops; /*fops stand for our file operations*/
mcdev->owner = THIS_MODULE;
/*we have created and initialized our cdev structure now we need to add it to the kernel*/
ret = cdev_add(mcdev,dev_num,1);
if(ret < 0) {
printk(KERN_ALERT "megharajchard : device adding to the kerknel failed\n");
return ret;
}
else
printk(KERN_INFO "megharajchard : device additin to the kernel succesful\n");
sema_init(&sem,1); /* initial value to one*/
return 0;
}
void chardev_exit(void)
{
cdev_del(mcdev); /*removing the structure that we added previously*/
printk(KERN_INFO " megharajchard : removed the mcdev from kernel\n");
unregister_chrdev_region(dev_num,1);
printk(KERN_INFO "megharajchard : unregistered the device numbers\n");
printk(KERN_ALERT " megharajchard : character driver is exiting\n");
}
MODULE_AUTHOR("A G MEGHARAJ(agmegharaj#gmail.com)");
MODULE_DESCRIPTION("A BASIC CHAR DRIVER");
//MODULE_LICENCE("GPL");
module_init(chardev_init);
module_exit(chardev_exit);
and this is the make file.
obj-m := megharajchard.o
KERNELDIR ?= /lib/modules/$(shell uname -r)/build
PWD := $(shell pwd)
all:
$(MAKE) -C $(KERNELDIR) M=$(PWD)
clean:
rm -rf *.o *~ core .depend .*.cmd *.ko *.mod.c .tmp_versions
load script. make sure the major number is 251 or else change it accordingly.
#!/bin/sh
sudo insmod megharajchard.ko
sudo mknod /dev/megharajchard c 251 0
sudo chmod 777 /dev/megharajchard
unload script,
#!/bin/sh
sudo rmmod megharajchard
sudo rm /dev/megharajchard
also a c program to test the operation of your device
#include<stdio.h>
#include<fcntl.h>
#include<string.h>
#include<malloc.h>
#define DEVICE "/dev/megharajchard"
//#define DEVICE "megharaj.txt"
int debug = 1, fd = 0;
int write_device() {
int write_length = 0;
ssize_t ret;
char *data = (char *)malloc(1024 * sizeof(char));
printf("please enter the data to write into device\n");
scanf(" %[^\n]",data); /* a space added after"so that it reads white space, %[^\n] is addeed so that it takes input until new line*/
write_length = strlen(data);
if(debug)printf("the length of dat written = %d\n",write_length);
ret = write(fd, data, write_length);
if(ret == -1)
printf("writting failed\n");
else
printf("writting success\n");
if(debug)fflush(stdout);/*not to miss any log*/
free(data);
return 0;
}
int read_device() {
int read_length = 0;
ssize_t ret;
char *data = (char *)malloc(1024 * sizeof(char));
printf("enter the length of the buffer to read\n");
scanf("%d",&read_length);
if(debug)printf("the read length selected is %d\n",read_length);
memset(data,0,sizeof(data));
data[0] = '0\';
ret = read(fd,data,read_length);
printf("DEVICE_READ : %s\n",data);
if(ret == -1)
printf("reading failed\n");
else
printf("reading success\n");
if(debug)fflush(stdout);/*not to miss any log*/
free(data);
return 0;
}
int lseek_device() {
int lseek_offset = 0,seek_value = 0;
int counter = 0; /* to check if function called multiple times or loop*/
counter++;
printf("counter value = %d\n",counter);
printf("enter the seek offset\n");
scanf("%d",&lseek_offset);
if(debug) printf("seek_offset selected is %d\n",lseek_offset);
printf("1 for SEEK_SET, 2 for SEEK_CUR and 3 for SEEK_END\n");
scanf("%d", &seek_value);
printf("seek value = %d\n", seek_value);
switch(seek_value) {
case 1: lseek(fd,lseek_offset,SEEK_SET);
return 0;
break;
case 2: lseek(fd,lseek_offset,SEEK_CUR);
return 0;
break;
case 3: lseek(fd,lseek_offset,SEEK_END);
return 0;
break;
default : printf("unknown option selected, please enter right one\n");
break;
}
/*if(seek_value == 1) {
printf("seek value = %d\n", seek_value);
lseek(fd,lseek_offset,SEEK_SET);
return 0;
}
if(seek_value == 2) {
lseek(fd,lseek_offset,SEEK_CUR);
return 0;
}
if(seek_value == 3) {
lseek(fd,lseek_offset,SEEK_END);
return 0;
}*/
if(debug)fflush(stdout);/*not to miss any log*/
return 0;
}
int lseek_write() {
lseek_device();
write_device();
return 0;
}
int lseek_read() {
lseek_device();
read_device();
return 0;
}
int main()
{
int value = 0;
if(access(DEVICE, F_OK) == -1) {
printf("module %s not loaded\n",DEVICE);
return 0;
}
else
printf("module %s loaded, will be used\n",DEVICE);
while(1) {
printf("\t\tplease enter 1 to write\n \
2 to read\n \
3 to lseek and write\
4 to lseek and read\n");
scanf("%d",&value);
switch(value) {
case 1 :printf("write option selected\n");
fd = open(DEVICE, O_RDWR);
write_device();
close(fd); /*closing the device*/
break;
case 2 :printf("read option selected\n");
/* dont know why but i am suppoesed to open it for writing and close it, i cant keep open and read.
its not working, need to sort out why is that so */
fd = open(DEVICE, O_RDWR);
read_device();
close(fd); /*closing the device*/
break;
case 3 :printf("lseek option selected\n");
fd = open(DEVICE, O_RDWR);
lseek_write();
close(fd); /*closing the device*/
break;
case 4 :printf("lseek option selected\n");
fd = open(DEVICE, O_RDWR);
lseek_read();
close(fd); /*closing the device*/
break;
default : printf("unknown option selected, please enter right one\n");
break;
}
}
return 0;
}
Have a look at some of the really simple standard ones - "null", "zero", "mem", "random", etc, in the standard kernel. They show the simple implementation.
Obviously if you're driving real hardware it's more complicated- you need to understand how to interface with the hardware as well as the subsystem APIs (PCI, USB etc) for your device. You might need to understand how to use interrupts, kernel timers etc as well.
Just check the character driver skeleton from here http://www.linuxforu.com/2011/02/linux-character-drivers/....Go ahead and read all the topics here, understand thoroughly.(this is kinda a tutorial-so play along as said).
See how "copy_to_user" and "copy_from_user" functions work, which you can use in read/write part of the driver function callbacks.
Once you are done with this, start reading a basic "tty" driver.
Focus, more on the driver registration architecture first, which means:-
See what structures are to be filled- ex:- struct file_operations f_ops = ....
Which are the function responsible to register a particular structure with core . ex:- _register_driver.
Once you are done with the above, see what functionality you want with the driver(policy), then think of way to implement that policy(called mechanism)- the policy and mechanism allows you to distinguish between various aspects of the driver.
write compilation makefiles(its hard if you have multiple files- but not tht hard).
Try to resolve the errors and warnings,
and you will be through.
When writing mechanism, never forget what it offers to the applications in user space.
I'm doing a course on operating systems and we work in Linux Red Hat 8.0
AS part of an assignment I had to change sys close and sys open. Changes to sys close passed without an incident, but when I introduce the changes to sys close suddenly the OS encounters an error during booting, claiming it cannot mount root fs, and invokes panic. EIP is reportedly at sys close when this happens.
Here are the changes I made (look for the "HW1 additions" comment):
In fs/open.c:
asmlinkage long sys_open(const char * filename, int flags, int mode)
{
char * tmp;
int fd, error;
event_t* new_event;
#if BITS_PER_LONG != 32
flags |= O_LARGEFILE;
#endif
tmp = getname(filename);
fd = PTR_ERR(tmp);
if (!IS_ERR(tmp)) {
fd = get_unused_fd();
if (fd >= 0) {
struct file *f = filp_open(tmp, flags, mode);
error = PTR_ERR(f);
if (IS_ERR(f))
goto out_error;
fd_install(fd, f);
}
/* HW1 additions */
if (current->record_flag==1){
new_event=(event_t*)kmalloc(sizeof(event_t), GFP_KERNEL);
if (!new_event){
new_event->type=Open;
strcpy(new_event->filename, tmp);
file_queue_add(*new_event, current->queue);
}
}
/* End HW1 additions */
out:
putname(tmp);
}
return fd;
out_error:
put_unused_fd(fd);
fd = error;
goto out;
}
asmlinkage long sys_close(unsigned int fd)
{
struct file * filp;
struct files_struct *files = current->files;
event_t* new_event;
char* tmp = files->fd[fd]->f_dentry->d_name.name;
write_lock(&files->file_lock);
if (fd >= files->max_fds)
goto out_unlock;
filp = files->fd[fd];
if (!filp)
goto out_unlock;
files->fd[fd] = NULL;
FD_CLR(fd, files->close_on_exec);
__put_unused_fd(files, fd);
write_unlock(&files->file_lock);
/* HW1 additions */
if(current->record_flag == 1){
new_event=(event_t*)kmalloc(sizeof(event_t), GFP_KERNEL);
if (!new_event){
new_event->type=Close;
strcpy(new_event->filename, tmp);
file_queue_add(*new_event, current->queue);
}
}
/* End HW1 additions */
return filp_close(filp, files);
out_unlock:
write_unlock(&files->file_lock);
return -EBADF;
}
The task_struct defined in schedule.h was changed at the end to include:
unsigned int record_flag; /* when zero: do not record. when one: record. */
file_queue* queue;
And file queue as well as event t are defined in a separate file as follows:
typedef enum {Open, Close} EventType;
typedef struct event_t{
EventType type;
char filename[256];
}event_t;
typedef struct file_quque_t{
event_t queue[101];
int head, tail;
}file_queue;
file queue add works like this:
void file_queue_add(event_t event, file_queue* queue){
queue->queue[queue->head]=event;
queue->head = (queue->head+1) % 101;
if (queue->head==queue->tail){
queue->tail=(queue->tail+1) % 101;
}
}
if (!new_event) {
new_event->type = …
That's equivalent to if (new_event == NULL). I think you mean if (new_event != NULL), which the kernel folks typically write as if (new_event).
Can you please post the stackdump of the error. I don't see a place where queue_info structure is allocated memory. One more thing is you cannot be sure that process record_flag will be always zero if unassigned in kernel, because kernel is a long running program and memory contains garbage.
Its also possible to check the exact location in the function is occurring by looking at the stack trace.