Understanding read() in Linux

I am looking at the man page for read(int fd, void *buf, size_t count)
http://man7.org/linux/man-pages/man2/read.2.html
and I need some more explanation of this sentence: "On files that support seeking, the read operation commences at the current file offset, and the file offset is incremented by the number of bytes read."
1) If I want to read a file not from the beginning but, say, from offset 100 (bytes), reading 1 byte, is the offset 100 added to the fd, i.e., read(fd+100, buf, 1)? If not, how can I specify the offset in the code?
2) How do I know whether a file "supports seeking"? I opened the FPGA as an SPI device over the SPI bus to get the fd, and I am using read() to read registers of the FPGA. In this case, does the file support seeking?
Thanks!

You first need to move the file pointer (the current file offset) to 100, via a read or lseek call.

Alternatively, you might be interested in pread, depending on what you are doing. pread is the equivalent of atomically (1) saving the current offset, (2) seeking to the offset you want, (3) reading, and (4) restoring the original offset.
On your second question: you will know a device isn't seekable because your call to lseek will fail (typically with ESPIPE). I don't know of any reliable way to know in advance.
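For example, here is a minimal sketch (assuming a hypothetical regular file data.bin) that reads one byte at offset 100 with pread, leaving the file offset untouched:
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("data.bin", O_RDONLY);    /* hypothetical file name */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    unsigned char byte;
    /* Read 1 byte at offset 100; the file offset of fd is not changed. */
    ssize_t n = pread(fd, &byte, 1, 100);
    if (n < 0)
        perror("pread");                    /* e.g. ESPIPE on non-seekable fds */
    else if (n == 0)
        fprintf(stderr, "offset 100 is at or past end-of-file\n");
    else
        printf("byte at offset 100: 0x%02x\n", byte);

    close(fd);
    return 0;
}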

You use lseek to move the position in the file, so you would do something like
lseek(fd, 100, SEEK_SET);
read(fd, buffer, 1);
to read one byte at position 100.
However, while this is a valid example, I would advise against reading individual bytes from a file this way, as it is very slow/expensive.
If you want to do random I/O, fetching individual bytes from a file at scale, you may be better off using mmap rather than lseek/read.
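A minimal sketch of the mmap approach, again assuming a hypothetical regular file data.bin; once the file is mapped, byte N is simply an array access:
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main(void)
{
    int fd = open("data.bin", O_RDONLY);    /* hypothetical file name */
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) { close(fd); return 1; }

    /* Map the whole file read-only; afterwards byte N is simply p[N]. */
    unsigned char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    if (st.st_size > 100)
        printf("byte at offset 100: 0x%02x\n", p[100]);

    munmap(p, st.st_size);
    close(fd);
    return 0;
}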

Related

How does the file offset of read() or write() change when a file is truncated to zero by another process?

Is the file offset automatically changed to 0, or kept unchanged?
If the file offset is kept unchanged, what happens when read() or write() is called after truncate()?
How does the file offset of read() or write() change when the file is truncated?
The file offset of opened file descriptors remains unchanged[1].
What happens when read() or write() is called after truncate()?
read():
Will read valid data if the offset is within the current extent of the file.
Will read bytes with value 0 if the offset is beyond the old end of the file but within the length to which truncate() extended it[1].
Will return 0 (i.e., no bytes read) if the offset is at or past the end of the file[3].
write():
Will write data to the file at the offset specified[4].
If the write is past the end-of-file, the file will be extended, with the gap padded by zeros[2].
[1] From posix truncate:
If the file previously was larger than length, the extra data is discarded. If the file was previously shorter than length, its size is increased, and the extended area appears as if it were zero-filled.
The truncate() function shall not modify the file offset for any open file descriptions associated with the file.
[2] From posix lseek:
The lseek() function shall allow the file offset to be set beyond the end of the existing data in the file. If data is later written at this point, subsequent reads of data in the gap shall return bytes with the value 0 until data is actually written into the gap.
[3] From posix read:
No data transfer shall occur past the current end-of-file. If the starting position is at or after the end-of-file, 0 shall be returned.
[4] And from posix write:
After a write() to a regular file has successfully returned:
Any successful read() from each byte position in the file that was modified by that write shall return the data specified by the write() for that position until such byte positions are again modified.
The same thing happens when you seek past the end of the file: write() extends the file, and read() returns 0.
Since operating systems and file systems are the most inconsistent software in the world, no answer will spare you from just trying it out.
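In that spirit, here is a small demo sketch (the file name demo.txt is made up) that exercises the behaviour described above: truncate() leaves the offset alone, read() past the new end-of-file returns 0, and write() past it extends the file with a zero-filled gap:
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("demo.txt", O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    write(fd, "hello world", 11);    /* file is 11 bytes, offset is 11 */
    lseek(fd, 6, SEEK_SET);          /* offset is now 6 */

    truncate("demo.txt", 3);         /* shrink the file; offset stays at 6 */

    char c;
    ssize_t n = read(fd, &c, 1);     /* offset 6 is past the new EOF (3) */
    printf("read returned %zd (expected 0: end-of-file)\n", n);

    write(fd, "X", 1);               /* extends the file to 7 bytes */
    printf("bytes 3-5 are now a zero-filled gap\n");

    close(fd);
    unlink("demo.txt");
    return 0;
}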

Retrieve the number of bytes stored in the "write" end of a Linux anonymous pipe

My objective is to be able to determine how many bytes have been transferred into the write end of a pipe. Perhaps one would need to access the f_pos member of the struct file structure from linux/fs.h associated with this pipe.
[struct file snippet from fs.h omitted]
Is it possible to access this value from a userspace program? Again, I'd just like to be able to determine (perhaps based on the f_pos value) how many bytes are stored in the kernel buffer backing the pipe.
I have a feeling this isn't possible, and that one has to keep reading until read(int fd, void *buf, size_t count) returns fewer bytes than count; at that point, I assume, all bytes have been "emptied out".
The number of bytes available for reading from the pipe can be requested with
ioctl(fd, FIONREAD, &nbytes);
Here fd is a file descriptor, and nbytes, where the result will be stored, is an int variable.
Taken from man 7 pipe.
The number of bytes available for writing is a different story.
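A minimal sketch of the FIONREAD query on an anonymous pipe:
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>

int main(void)
{
    int fds[2];
    if (pipe(fds) < 0) { perror("pipe"); return 1; }

    write(fds[1], "hello", 5);       /* put 5 bytes into the pipe */

    int nbytes;
    /* Ask the kernel how many unread bytes are sitting in the pipe buffer. */
    if (ioctl(fds[0], FIONREAD, &nbytes) < 0) { perror("ioctl"); return 1; }
    printf("%d bytes available for reading\n", nbytes);    /* prints 5 */

    close(fds[0]);
    close(fds[1]);
    return 0;
}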

(open + write) vs. (fopen + fwrite) to kernel /proc/

I have a very strange bug. If I do:
int fd = open("/proc/...", O_WRONLY);
write(fd, argv[1], strlen(argv[1]));
close(fd);
everything is working including for a very long string which length > 1024.
If I do:
FILE *fd = fopen("/proc/...", "wb");
fwrite(argv[1], 1, strlen(argv[1]), fd);
fclose(fd);
the string is cut off at around 1024 characters.
I'm running an ARM embedded device with a 3.4 kernel. I have debugged in the kernel and I see that the string is already cut when I reach the very early function vfs_write (I spotted this function with a WARN_ON instruction to get the stack).
The problem is the same with fputs vs. puts.
I can use fwrite for a very long string (>1024) if I write to a standard rootfs file, so the problem is really linked to how the kernel handles /proc.
Any idea what's going on?
The problem is probably buffering.
The issue is that special files, such as those in /proc, are, well... special: they are not always a simple stream of bytes, and may have to be written to (or read from) with specific sizes and/or offsets. You do not say which file you are writing to, so it is impossible to be sure.
The call to fwrite() assumes that the output fd is a simple stream of bytes, so it does clever things such as buffering, splitting, and copying the given data. On a regular file that just works, but on a special file, funny things may happen.
Just to be sure, run strace on both versions of your program and compare the outputs. If you wish, post them for additional comments.
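If buffering turns out to be the culprit, one common workaround is to disable stdio buffering with setvbuf, so the whole string should reach the kernel in a single write, like the raw write() version. A sketch (keeping the question's elided /proc path; whether fwrite then issues exactly one write() is implementation-specific, so verify with strace as suggested above):
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
    if (argc < 2) return 1;

    FILE *fp = fopen("/proc/...", "wb");    /* path elided as in the question */
    if (!fp) { perror("fopen"); return 1; }

    /* Disable stdio buffering so the data is handed to the kernel in one
       write, like the raw write() version does. */
    setvbuf(fp, NULL, _IONBF, 0);

    fwrite(argv[1], 1, strlen(argv[1]), fp);
    fclose(fp);
    return 0;
}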

FUSE's write sequence guarantees

Should write() implementations assume random access, or can some assumptions be made, such as that writes will always be performed sequentially, at increasing offsets?
You'll get extra points for a link to the part of a POSIX or SUS specification that describes the VFS interface.
Random, for certain. There's a reason why the read and write interfaces take both size and offset. You'll notice that there isn't a seek field in the fuse_operations struct; when a user program calls seek/lseek on a FUSE file, the offset in the kernel file descriptor is updated, but the FUSE fs isn't notified at all. Later reads and writes just start coming to you with a different offset, and you should be able to handle that. If something about your implementation makes it impossible, you should probably return -EIO on the writes you can't satisfy.
Unless there is something unusual about your FUSE filesystem that would prevent an existing file from being opened for write, your implementation of the write operation must support writes at any offset: an application can write to any location in a file by lseek()-ing around in the file while it's open, e.g.
fd = open("file", O_WRONLY);
lseek(fd, 100, SEEK_SET);
write(fd, ...);
lseek(fd, 0, SEEK_SET);
write(fd, ...);
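To illustrate, here is a minimal sketch of a high-level libfuse write handler that accepts arbitrary offsets; the in-memory backing store (file_data, file_size) is hypothetical, and the rest of the filesystem (getattr, open, read, fuse_main, etc.) is omitted:
#define FUSE_USE_VERSION 31
#include <fuse.h>
#include <errno.h>
#include <string.h>

/* Hypothetical in-memory backing store for a single file. */
static char file_data[4096];
static size_t file_size;

static int myfs_write(const char *path, const char *buf, size_t size,
                      off_t offset, struct fuse_file_info *fi)
{
    (void)path; (void)fi;

    /* Writes may arrive at any offset, in any order; reject only what
       this toy store genuinely cannot hold. */
    if (offset < 0 || (size_t)offset + size > sizeof(file_data))
        return -EIO;

    memcpy(file_data + offset, buf, size);
    if ((size_t)offset + size > file_size)
        file_size = (size_t)offset + size;

    return (int)size;    /* number of bytes written */
}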

Are file descriptors for Linux sockets always in increasing order?

I have a socket server in C on Linux. Each time I create a new socket it is assigned a file descriptor. I want to use these FDs as unique IDs for each client. If they are guaranteed to always be assigned in increasing order (which is the case on the Ubuntu system I am running) then I could just use them as array indices.
So the question: Are the file descriptors that are assigned from linux sockets guaranteed to always be in increasing order?
Let's look at how this works internally (I'm using kernel 4.1.20). The way file descriptors are allocated in Linux is with __alloc_fd. When you do a open syscall, do_sys_open is called. This routine gets a free file descriptor from get_unused_fd_flags:
long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode)
{
...
fd = get_unused_fd_flags(flags);
if (fd >= 0) {
struct file *f = do_filp_open(dfd, tmp, &op);
get_unused_fd_flags calls __alloc_fd, setting the minimum and maximum fd:
int get_unused_fd_flags(unsigned flags)
{
return __alloc_fd(current->files, 0, rlimit(RLIMIT_NOFILE), flags);
}
__alloc_fd gets the file descriptor table for the process, and gets the fd as next_fd, which is actually set from the previous time it ran:
int __alloc_fd(struct files_struct *files,
unsigned start, unsigned end, unsigned flags)
{
...
fd = files->next_fd;
...
if (start <= files->next_fd)
files->next_fd = fd + 1;
So you can see how file descriptors indeed grow monotonically... up to a certain point. When the fd reaches the maximum, __alloc_fd will try to find the smallest unused file descriptor:
if (fd < fdt->max_fds)
fd = find_next_zero_bit(fdt->open_fds, fdt->max_fds, fd);
At this point the file descriptors no longer grow monotonically; instead they jump around as the kernel searches for free descriptors. After this, if the table gets full, it will be expanded:
error = expand_files(files, fd);
At which point they will grow again monotonically.
Hope this helps
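You can also observe the reuse from user space: POSIX requires open() to return the lowest-numbered unused descriptor, so closing a low fd and opening again breaks monotonicity. A quick sketch:
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int a = open("/dev/null", O_RDONLY);
    int b = open("/dev/null", O_RDONLY);
    int c = open("/dev/null", O_RDONLY);
    printf("a=%d b=%d c=%d\n", a, b, c);    /* typically 3 4 5 */

    close(a);                               /* free the lowest fd */

    int d = open("/dev/null", O_RDONLY);
    printf("d=%d\n", d);                    /* typically 3 again, not 6 */

    close(b); close(c); close(d);
    return 0;
}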
FDs are guaranteed to be unique for the lifetime of the socket. So yes, in theory, you could probably use the FD as an index into an array of clients. However, I'd caution against this for at least a couple of reasons:
As has already been said, there is no guarantee that FDs will be allocated monotonically. accept() would be within its rights to return a highly-numbered FD, which would then make your array inefficient. So short answer to your question: no, they are not guaranteed to be monotonic.
Your server is likely to end up with lots of other open FDs - stdin, stdout and stderr to name but three - so again, your array is wasting space.
I'd recommend some other way of mapping from FDs to clients; a minimal sketch follows. Indeed, unless you're going to be dealing with thousands of clients, searching through a list of clients should be fine; it's not an operation you should need to perform very often.
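For instance, a minimal sketch of such a mapping with a fixed-size table and linear search (struct client and its fields are illustrative):
#include <string.h>

/* Hypothetical client record; the fields are illustrative. */
struct client {
    int fd;                  /* -1 marks a free slot */
    char name[64];
};

#define MAX_CLIENTS 256
static struct client clients[MAX_CLIENTS];

static void clients_init(void)
{
    for (int i = 0; i < MAX_CLIENTS; i++)
        clients[i].fd = -1;  /* 0 is a valid fd (stdin), so use -1 */
}

/* Linear search: fine unless you have thousands of clients. */
static struct client *client_by_fd(int fd)
{
    for (int i = 0; i < MAX_CLIENTS; i++)
        if (clients[i].fd == fd)
            return &clients[i];
    return NULL;
}

static struct client *client_add(int fd)
{
    struct client *c = client_by_fd(-1);    /* find a free slot */
    if (c)
        c->fd = fd;
    return c;
}

static void client_remove(int fd)
{
    struct client *c = client_by_fd(fd);
    if (c)
        c->fd = -1;
}

int main(void)
{
    clients_init();
    client_add(7);                          /* pretend accept() returned 7 */
    strcpy(client_by_fd(7)->name, "alice"); /* illustrative name */
    client_remove(7);
    return 0;
}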
Do not depend on the monotonicity of file descriptors. Always refer to the remote system via an address:port pair.
