(open + write) vs. (fopen + fwrite) to kernel /proc/ - linux

I have a very strange bug. If I do:
int fd = open("/proc/...", O_WRONLY);
write(fd, argv[1], strlen(argv[1]));
close(fd);
everything is working including for a very long string which length > 1024.
If I do:
FILE *fd = fopen("/proc/...", "wb");
fwrite(argv[1], 1, strlen(argv[1]), fd);
fclose(fd);
the string is cut around 1024 characters.
I'm running an ARM embedded device with a 3.4 kernel. I have debugged in the kernel and I see that the string is already cut when I reach the very early function vfs_write (I spotted this function with a WARN_ON instruction to get the stack).
The problem is the same with fputs vs. puts.
I can use fwrite for a very long string (>1024) if I write to a standard rootfs file. So the problem is really linked how the kernel handles /proc.
Any idea what's going on?

Probably the problem is with buffers.
The issue is that special files, such as those at /proc are, well..., special, they are not always simple stream of bytes, and have to be written to (or read from) with specific sizes and or offsets. You do not say what file you are writing to, so it is impossible to be sure.
Then, the call to fwrite() assumes that the output fd is a simple stream of bytes, so it does smart fancy things, such as buffering and splicing and copying the given data. In a regular file it will just work, but in a special file, funny things may happen.
Just to be sure, try to run strace with both versions of your program and compare the outputs. If you wish, post them for additional comments.

Related

Why does the sys_read system call end when it detects a new line?

I'm a beginner in assembly (using nasm). I'm learning assembly through a college course.
I'm trying to understand the behavior of the sys_read linux system call when it's invoked. Specifically, sys_read stops when it reads a new line or line feed. According to what I've been taught, this is true. This online tutorial article also affirms the fact/claim.
When sys_read detects a linefeed, control returns to the program and the users input is located at the memory address you passed in ECX.
I checked the linux programmer's manual for the sys_read call (via "man 2 read"). It does not mention the behavior when it's supposed to, right?
read() attempts to read up to count bytes from file descriptor fd
into the buffer starting at buf.
On files that support seeking, the read operation commences at the
file offset, and the file offset is incremented by the number of bytes
read. If the file offset is at or past the end of file, no bytes are
read, and read() returns zero.
If count is zero, read() may detect the errors described below. In
the absence of any errors, or if read() does not check for errors, a
read() with a count of 0 returns zero and has no other effects.
If count is greater than SSIZE_MAX, the result is unspecified.
So my question really is, why does the behavior happen? Is it a specification in the linux kernel that this should happen or is it a consequence of something else?
It's because you're reading from a POSIX tty in canonical mode (where backspace works before you press return to "submit" the line; that's all handled by the kernel's tty driver). Look up POSIX tty semantics / stty / ioctl. If you ran ./a.out < input.txt, you wouldn't see this behaviour.
Note that read() on a TTY will return without a newline if you hit control-d (the EOF tty control-sequence).
Assuming that read() reads whole lines is ok for a toy program, but don't start assuming that in anything that needs to be robust, even if you've checked that you're reading from a TTY. I forget what happens if the user pastes multiple lines of text into a terminal emulator. Quite probably they all end up in a single read() buffer.
See also my answer on a question about small read()s leaving unread data on the terminal: if you type more characters on one line than the read() buffer size, you'll need at least one more read system call to clear out the input.
As you noted, the read(2) libc function is just a thin wrapper around sys_read. The answer to this question really has nothing to do with assembly language, and is the same for systems programming in C (or any other language).
Further reading:
stty(1) man page: where you can change which control character does what.
The TTY demystified: some history, and some diagrams showing how xterm, the kernel, and the process reading from the tty all interact. And stuff about session management, and signals.
https://en.wikipedia.org/wiki/POSIX_terminal_interface#Canonical_mode_processing and related parts of that article.
This is not an attribute of the read() system call, but rather a property of termios, the terminal driver. In the default configuration, termios buffers incoming characters (i.e. what you type) until you press Enter, after which the entire line is sent to the program reading from the terminal. This is for convenience so you can edit the line before sending it off.
As Peter Cordes already said, this behaviour is not present when reading from other kinds of files (like regular files) and can be turned off by configuring termios.
What the tutorial says is garbage, please disregard it.

FUSE's write sequence guarantees

Should write() implementations assume random-access, or can there be some assumptions, like that they'll ever be performed sequentially, and at increasing offsets?
You'll get extra points for a link to the part of a POSIX or SUS specification that describes the VFS interface.
Random, for certain. There's a reason why the read and write interfaces take both size and offset. You'll notice that there isn't a seek field in the fuse_operations struct; when a user program calls seek/lseek on a FUSE file, the offset in the kernel file descriptor is updated, but the FUSE fs isn't notified at all. Later reads and writes just start coming to you with a different offset, and you should be able to handle that. If something about your implementation makes it impossible, you should probably return -EIO on the writes you can't satisfy.
Unless there is something unusual about your FUSE filesystem that would prevent an existing file from being opened for write, your implementation of the write operation must support writes to any offset — an application can write to any location in a file by lseek()-ing around in the file while it's open, e.g.
fd = open("file", O_WRONLY);
lseek(fd, SEEK_SET, 100);
write(fd, ...);
lseek(fd, SEEK_SET, 0);
write(fd, ...);

Linux read operations requesting duplicate bytes?

This is a bit of a strange question. I'm writing a fuse module using the go-fuse library, and at the moment I have a "fake" file with a size of 6000 bytes, and which will output some unrelated data for all read requests. My read function looks like this:
func (f *MyFile) Read(buf []byte, off int64) (fuse.ReadResult, fuse.Status) {
log.Printf("Reading into buffer of len %d from %d\n",len(buf),off)
FillBuffer(buf, uint64(off), f.secret)
return fuse.ReadResultData(buf), fuse.OK
}
As you can see I'm outputting a log on every read containing the range of the read request. The weird thing is that when I cat the file I get the following:
2013/09/13 21:09:03 Reading into buffer of len 4096 from 0
2013/09/13 21:09:03 Reading into buffer of len 8192 from 0
So cat is apparently reading the first 4096 bytes of data, discarding it, then reading 8192 bytes, which encompasses all the data and so succeeds. I've tried with other programs too, including hexdump and vim, and they all do the same thing. Interestingly, if I do a head -c 3000 dir/fakefile it still does the two reads, even though the later one is completely unnecessary. Does anyone have any insights into why this might be happening?
I suggest you strace your cat process to see for yourself. On my system, cat reads by 64K chunks, and does a final read() to make sure it read the whole file. That last read() is necessary to make the distinction between a reading a "chunk-sized file" and a bigger file. i.e. it makes sure there is nothing left to read, as the file size could have changed between the fstat() and the read() system calls.
Is your "fake file" size being returned correctly to FUSE by stat/fstat() system calls?

Write unformatted (binary data) to stdout

I want to write unformatted (binary) data to STDOUT in a Fortran 90 program. I am using AIX Unix and unfortunately it won't let me open unit 6 as "unformatted". I thought I would try and open /dev/stdout instead under a different unit number, but /dev/stdout does not exist in AIX (although this method worked under Linux).
Basically, I want to pipe my programs output directly into another program, thus avoiding having an intermediate file, a bit like gzip -c does. Is there some other way I can achieve this, considering the two problems I have encountered above?
I would try to convert the data by TRANSFER() to a long character and print it with nonadvancing i/o. The problem will be your processors' limit for the record length. If it is too short you will end up having an unexpected end of record sign somewhere. Also your processor may not write the unprintable characters the way you would like.
i.e., something like
character(len=max_length) :: buffer
buffer = transfer(data,buffer)
write(*,'(a)',advance='no') trim(buffer)
The largest problem I see in the unprintable characters. See also A suprise with non-advancing I/O
---EDIT---
Another possibility, try to use file /proc/self/fd/1 or /dev/fd/1
test:
open(11,file='/proc/self/fd/1',access='stream',action='write')
write(11) 11
write(11) 1.1
close(11)
end
This is more of a comment/addition to #VladimirF than a new answer, but I can't add those yet. You can first inquire about the location of the preconnected I/O units and then open the unformatted connection:
character(1024) :: stdout
inquire(6, name = stdout)
open(11, file = stdout, access = 'stream', action = 'write')
This is probably the most convenient way, but it uses stream access, a Fortran 2003 feature. Without this, you can only use sequential access (which adds header data to each record) or direct access (which does not add headers but requires a fixed record length).

How to get Linux kernel modules list from user space C code?

I want to get the list of kernel modules by C code, and later on print their version.
From a script this is simple:
cat /proc/modules
lsmod
and later on, run for all drivers found:
modinfo driver_name
From C code, I can open /proc/modules, and analyze the data there, but is there a simpler way of reading this drivers list?
From C code, I can open /proc/modules, and analyze the data there, but is there a simpler way of reading this drivers list?
Depends on your definition of simple. The concept in Unix land of everything being a file does make everything simpler in one respect, because:
int fd = open("/proc/modules" | O_RDONLY);
while ( read(fd, &buffer, BUFFER_LIMIT) )
{
// parse buffer
}
close(fd);
involves the same set of function calls as opening and reading any file.
The alternative mechanism would be for the kernel to allocate some memory in your process' address space pointing to that information (and you could probably do this with a custom system call) but there's really no need - as you've seen, this way works very well not just with C, but with scripts also.

Resources