ioctl() call resets file descriptor to 0 - linux

Consider the following code:
file_fd = open(device, O_RDWR);
if (file_fd < 0) {
perror("open");
return -1;
}
printf("File descriptor: %d\n", file_fd);
uint32_t DskSize;
if (ioctl(file_fd, BLKGETSIZE, &DskSize) < 0) {
perror("ioctl");
return -1;
}
printf("File descriptor after: %d\n", file_fd);
This snippet yields this:
File descriptor: 3
File descriptor after: 0
Why does my file descriptor get reset to 0? The program writes the stuff out to stdout instead of my block device.
This should not happen. I expect my file_fd to be non-zero and retain its value.

Looks like you smash your stack.
Since there are only two stack variables file_fd and DskSize and changing DskSize changes file_fd suggests that DiskSize must be unsigned long or size_t (a 64-bit value), not uint32_t.
Looking at BLKGETSIZE implementation confirms that the value type is unsigned long.
You may like to run your applications under valgrind, it reports this kind of errors.

Related

Does every call to write sends switches to kernel mode?

I know that a call to the glibc "write" function calls in it's turn to the sys_call write function which is a kernel function. because sys_call is a kernel function the CPU has to change the ring to zero store the processes registers and so on.
But does it always switches to kernel mode? for example, if i do
write(-1,buffer,LENGTH)
does it still tries to find it in the file descriptors array?
I see in the glibc source code that it does check for fd>0 but i don't see any jump to the sys_call there (it seems like the baracks for main() ends before any call to the alias_write.
/* Write NBYTES of BUF to FD. Return the number written, or -1. */
ssize_t
__libc_write (int fd, const void *buf, size_t nbytes)
{
if (nbytes == 0)
return 0;
if (fd < 0)
{
__set_errno (EBADF);
return -1;
}
if (buf == NULL)
{
__set_errno (EINVAL);
return -1;
}
__set_errno (ENOSYS);
return -1;
}
libc_hidden_def (__libc_write)
stub_warning (write)
weak_alias (__libc_write, __write)
libc_hidden_weak (__write)
weak_alias (__libc_write, write)
#include <stub-tag.h>
So the question is both:
Where does the glibc actually calls the sys_write
Is it true that glibc doesn't call the sys_write if fd<0?
I see in the glibc source code that it does check for fd>0 but i don't see any jump to the sys_call there
You are looking at the wrong code.
There are multiple definitions of __libc_write used under different conditions. The one you looked at is in io/write.c.
The one that is actually used on Linux is generated from sysdeps/unix/syscall-template.S and it does actually execute the switch to kernel mode (and back to user mode) even when fd==-1, etc.

'echo' calls .write function INFINITE times

Context
I wrote a Linux device driver in which the functions read and write are implemented. The problem is with the function write, here the portion of the code:
ssize_t LED_01_write(struct file *filp, const char __user *buf, size_t count, loff_t *f_pos)
{
int retval = 0;
PDEBUG(" reading from user space -> wrinting in kernel space\n");
//struct hello_dev *dev = filp->private_data;
if (count > COMMAND_MAX_LENGHT){
printk(KERN_WARNING "[LEO] LED_01: trying to write more than possible. Aborting write\n");
retval = -EFBIG;
goto out;
}
if (down_interruptible(&(LED_01_devices->sem_LED_01))){
printk(KERN_WARNING "[LEO] LED_01: Device was busy. Operation aborted\n");
return -ERESTARTSYS;
}
if (copy_from_user((void*)&(LED_01_devices-> LED_value), buf, count)) {
printk(KERN_WARNING "[LEO] LED_01: can't use copy_from_user. \n");
retval = -EPERM;
goto out_and_Vsem;
}
write_status_to_LED();
PDEBUG(" Value instert: %u \n", LED_01_devices-> LED_value);
out_and_Vsem:
write_times++;
up(&(LED_01_devices->sem_LED_01));
out:
return retval;
}
Question
If I use the module in a C compiled program, it works properly, as expected.
When I execute echo -n 1 > /dev/LED_01 (from the Command LINE), it writes INFINITE times and, even with the Ctrl+C it doesn't stop. I need to reboot.
Here the snipped code of the test function that works properly:
// ON
result = write(fd, (void*) ON_VALUE, 1);
if ( result != 0 ){
printf("Oh dear, something went wrong with write()! %s\n", strerror(errno));
}
else{
printf("write operation executed succesfully (%u)\n",ON_VALUE[0]);
}
Is the problem in the driver or in the way I use echo?
If you need to whole source code, all the file used are stored in this git repository folder
Value returned by the kernel's .write function is interpreted as:
error code, if it is less than zero (<0),
number of bytes written, if it is more than or equal to zero (>=0)
So, for tell user that all bytes has been written, .write function should return its count parameter.
In case of .write function, returning zero has a little sense: every "standard" utility like echo will just call write() function again.

Logical block number to address (Linux filesystem)

int block_count;
struct stat statBuf;
int block;
int fd = open("file.txt", O_RDONLY);
fstat(fd, &statBuf);
block_count = (statBuf.st_size + statBuf.st_blksize - 1) / statBuf.st_blksize;
int i,add;
for(i = 0; i < block_count; i++) {
block = i;
if (ioctl(fd, FIBMAP, &block)) {
perror("FIBMAP ioctl failed");
}
printf("%3d %10d\n", i, block);
add = block;
}
char buffer[255];
int fd2 = open("/dev/sda1", O_RDONLY);
lseek(fd2, add, SEEK_SET);
read(fd2, buffer, 20);
printf("%s%s\n","ss ",buffer);
Output:
0 5038060
1 5038061
2 5038062
3 5038063
4 5038064
5 5038065
ss
I am using the above code to get the logical block number of a file. Lets suppose I want to read the contents of the last block number, How would I do that?
Is there a way to get the address of a block from logical block number?
PS: I am using linux and filesystem is ext4
The above code actually is getting you the physical block number. The FIBMAP ioctl takes as input the logical block number, and returns the physical block number. You can then multiply that by the blocksize (which is 4k for most ext4 file systems, but you can get that by using the BLKBSZGET ioctl) to get the byte offset if you want to read that disk block using lseek and read.
Note that the more modern interface that people tend to use today is the FIEMAP interface. This doesn't require root, and returns the physical byte offset, plus a lot more data. For more information, please see:
https://www.kernel.org/doc/Documentation/filesystems/fiemap.txt
Or you can look at the source code for the filefrag command, which is part of e2fsprogs:
https://git.kernel.org/cgit/fs/ext2/e2fsprogs.git/tree/misc/filefrag.c

How to get errno when epoll_wait returns EPOLLERR?

Is there a way to find out the errno when epoll_wait returns EPOLLERR for a particular fd?
Is there any further information about the nature of the error?
Edit:
Adding more information to prevent ambiguity
epoll_wait waits on a number of file descriptors. When you call epoll_wait you pass it an array of epoll_event structures:
struct epoll_event {
uint32_t events; /* Epoll events */
epoll_data_t data; /* User data variable */
};
The epoll_data_t structure has the same details as the one you used with epoll_ctl to add a file descriptor to epoll:
typedef union epoll_data {
void *ptr;
int fd;
uint32_t u32;
uint64_t u64;
} epoll_data_t;
What I'm looking for is what happens when there is an error on one of the file descriptors that epoll is waiting on.
ie: (epoll_event.events & EPOLLERR) == 1 - is there a way to find out more details of the error on the file descriptor?
Use getsockopt and SO_ERROR to get the pending error on the socket
int error = 0;
socklen_t errlen = sizeof(error);
if (getsockopt(fd, SOL_SOCKET, SO_ERROR, (void *)&error, &errlen) == 0)
{
printf("error = %s\n", strerror(error));
}
Just a minor point: Your test won't work correctly, for two reasons. If EPOLLERR is defined as, say, 0x8, then your test will be comparing 8 with one (since == has higher precedence than &), giving you a zero, then anding that with the event mask.
What you want is: (epoll_event.events & EPOLLERR) != 0 to test for the EPOLLERR bit being set.
epoll_wait returns -1 when an error occurs and sets errno appropriately. See "man 2 epoll_wait" for more info.
Include errno.h and use perror to see the error message.
Basically error is from the epfd or interupt, it will not arise from the file descriptor in your set.
include "errno.h"
if(epoll_wait() == -1)
{
perror("Epoll error : ");
}

nonblocking I/O behavior is weird for STDIN_FILENO and STDOUT_FILENO

I have the following code:
void
set_fl(int fd, int flags) /* flags are file status flags to turn on */
{
int val;
if ((val = fcntl(fd, F_GETFL, 0)) < 0)
err_sys("fcntl F_GETFL error");
val |= flags; /* turn on flags */
if (fcntl(fd, F_SETFL, val) < 0)
err_sys("fcntl F_SETFL error");
}
int
main(void)
{
char buf[BUFSIZ];
set_fl(STDOUT_FILENO, O_NONBLOCK); //set STDOUT_FILENO to nonblock
if(read(STDIN_FILENO, buf, BUFSIZ)==-1) { //read from STDIN_FILENO
printf("something went wrong with read()! %s\n", strerror(errno));
}
}
As you can see, I set STDOUT_FILENO to non-blocking mode but it seems the read operation on STDIN_FILENO finished immediately. Why?
$ ./testprog
something went wrong with read()! Resource temporarily unavailable
Thanks
That's exactly right: doing a print of errno and a perror call immediately after the read results in a "resource busy" and an error number of 11, or EAGAIN/EWOULDBLOCK, as shown in this code:
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
#include <fcntl.h>
int main (void) {
char buf;
fcntl (STDOUT_FILENO, F_SETFL, fcntl (STDOUT_FILENO, F_GETFL, 0) | O_NONBLOCK);
fprintf (stderr, "%5d: ", errno); perror("");
read (STDIN_FILENO, &buf, 1);
fprintf (stderr, "%5d: ", errno); perror("");
}
which generates:
0: Success
11: Resource temporarily unavailable
The reason is that file descriptors have two different types of flags (see here in the section detailing duplicating file descriptors):
You can duplicate a file descriptor, or allocate another file descriptor that refers to the same open file as the original. Duplicate descriptors share one file position and one set of file status flags (see File Status Flags), but each has its own set of file descriptor flags (see Descriptor Flags).
The first is file descriptor flags and these are indeed unique per file descriptor. According to the documentation, FD_CLOEXEC (close on exec) is the only one currently in this camp.
All other flags are file status flags, and are shared amongst file descriptors that have been duplicated. These include the I/O operating modes such as O_NONBLOCK.
So, what's happening here is that the standard output file descriptor was duplicated from the standard input one (the order isn't relevant, just the fact that one was duplicated from the other) so that setting non-blocking mode on one affects all duplicates (and that would probably include the standard error file descriptor as well, though I haven't confirmed it).
It's not usually a good idea to muck about with blocking mode on file descriptors that are duplicated, nor with file descriptors that will likely be inherited by sub-processes - those sub-processes don't always take kindly to having their standard files misbehaving (from their point of view).
If you want more fine-grained control over individual file descriptors, consider using select to check descriptors before attempting a read.

Resources