Linux read operations requesting duplicate bytes?

This is a bit of a strange question. I'm writing a FUSE module using the go-fuse library, and at the moment I have a "fake" file with a size of 6000 bytes which outputs some unrelated data for all read requests. My read function looks like this:
func (f *MyFile) Read(buf []byte, off int64) (fuse.ReadResult, fuse.Status) {
    log.Printf("Reading into buffer of len %d from %d\n", len(buf), off)
    FillBuffer(buf, uint64(off), f.secret)
    return fuse.ReadResultData(buf), fuse.OK
}
As you can see I'm outputting a log on every read containing the range of the read request. The weird thing is that when I cat the file I get the following:
2013/09/13 21:09:03 Reading into buffer of len 4096 from 0
2013/09/13 21:09:03 Reading into buffer of len 8192 from 0
So cat is apparently reading the first 4096 bytes of data, discarding it, then reading 8192 bytes, which encompasses all the data and so succeeds. I've tried with other programs too, including hexdump and vim, and they all do the same thing. Interestingly, if I do a head -c 3000 dir/fakefile it still does the two reads, even though the latter one is completely unnecessary. Does anyone have any insights into why this might be happening?

I suggest you strace your cat process to see for yourself. On my system, cat reads in 64K chunks and does a final read() to make sure it has read the whole file. That last read() is necessary to distinguish between reading a "chunk-sized" file and a bigger file, i.e. it makes sure there is nothing left to read, since the file size could have changed between the fstat() and the read() system calls.
Is your "fake file" size being returned correctly to FUSE by stat/fstat() system calls?

Related

Fastest way to read many inputs in PyPy3 and what is BytesIO doing here?

Recently I was working on a problem that required me to read many many lines of numbers (around 500,000).
Early on, I found that using input() was way too slow. Using stdin.readline() was much better. However, it still was not fast enough. I found that using the following code:
import io, os
input = io.BytesIO(os.read(0,os.fstat(0).st_size)).readline
and using input() in this manner improved the runtime. However, I don't actually understand how this code works. Reading the documentation for os.read, the 0 in os.read(0, os.fstat(0).st_size) identifies the file we are reading from. What file does 0 refer to? Also, fstat describes the status of the file we are reading from, but apparently its result is used here as the maximum number of bytes to read?
The code works but I want to understand what it is doing and why it is faster. Any help is appreciated.
0 is the file descriptor for standard input. os.fstat(0).st_size will tell Python how many bytes are currently waiting in the standard input buffer. Then os.read(0, ...) will read that many bytes in bulk, again from standard input, producing a bytestring.
(As an additional note, 1 is the file descriptor of standard output, and 2 is standard error.)
Here's a demo:
echo "five" | python3 -c "import os; print(os.stat(0).st_size)"
# => 5
Python found four single-byte characters and a newline in the standard input buffer, and reported five bytes waiting to be read.
Bytestrings are not very convenient to work with if you want text — for one thing, they don't really understand the concept of "lines" — so BytesIO fakes an input stream with the passed bytestring, allowing you to readline from it. I am not 100% sure why this is faster, but my guesses are:
Normal read is likely done character-wise, so that one can detect a line break and stop without reading too much; bulk read is more efficient (and finding newlines post-facto in memory is pretty fast)
There is no encoding processing done this way
os.read has a signature I am calling fd, size. Setting size to the number of bytes left in fd causes everything to come rushing at you like a tsunami. There are also standard file descriptors: 0 = stdin, 1 = stdout, 2 = stderr.
Code deconstruction:
import io, os                       # Utilities
input = (                           # Replace the input built-in
    io.BytesIO(                     # Create a fake file
        os.read(                    # Read data from a file descriptor
            0,                      # stdin
            os.fstat(0).st_size     # Bytes left in the file
        )
    ).readline                      # When called, gets a line of the file
)
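For comparison, here is a hedged C sketch of the same trick at the syscall level: fstat() file descriptor 0, then pull everything in with a single bulk read() instead of many small line-sized reads. This relies on st_size being meaningful for descriptor 0 (e.g. when stdin is redirected from a regular file), as the answers above assume.
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    struct stat st;
    if (fstat(0, &st) != 0) {               /* 0 = stdin */
        perror("fstat");
        return 1;
    }

    char *buf = malloc(st.st_size + 1);
    if (!buf)
        return 1;

    ssize_t n = read(0, buf, st.st_size);   /* one bulk read for the whole input */
    if (n < 0) {
        perror("read");
        free(buf);
        return 1;
    }
    buf[n] = '\0';

    /* Newlines can now be found in memory, e.g. with strchr(p, '\n'). */
    printf("read %zd bytes from stdin in one call\n", n);
    free(buf);
    return 0;
}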

(open + write) vs. (fopen + fwrite) to kernel /proc/

I have a very strange bug. If I do:
int fd = open("/proc/...", O_WRONLY);
write(fd, argv[1], strlen(argv[1]));
close(fd);
everything works, including for a very long string whose length is > 1024.
If I do:
FILE *fd = fopen("/proc/...", "wb");
fwrite(argv[1], 1, strlen(argv[1]), fd);
fclose(fd);
the string is cut off at around 1024 characters.
I'm running on an ARM embedded device with a 3.4 kernel. I have debugged in the kernel and I can see that the string is already cut when it reaches the very early function vfs_write (I spotted this function with a WARN_ON instruction to get the stack).
The problem is the same with fputs vs. puts.
I can use fwrite for a very long string (> 1024) if I write to a standard rootfs file, so the problem is really linked to how the kernel handles /proc.
Any idea what's going on?
Probably the problem is with buffers.
The issue is that special files, such as those under /proc, are, well... special: they are not always simple streams of bytes, and may have to be written to (or read from) with specific sizes and/or offsets. You do not say which file you are writing to, so it is impossible to be sure.
Then, the call to fwrite() assumes that the output is a simple stream of bytes, so it does smart, fancy things such as buffering, splitting, and copying the given data. On a regular file it will just work, but on a special file, funny things may happen. In particular, stdio buffers output (often in 1024- or 4096-byte chunks), so a long string can reach the kernel as several separate write() calls at increasing offsets, and a /proc handler may only honour the first one, which would match the cut at around 1024 characters.
Just to be sure, try to run strace with both versions of your program and compare the outputs. If you wish, post them for additional comments.
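To illustrate the buffering angle, here is a hedged sketch (the /proc path below is a placeholder, not the questioner's actual file): disabling stdio buffering with setvbuf() makes fwrite() hand the whole string to the kernel in one write(), which is what the raw open()/write() version already does.
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
    if (argc < 2)
        return 1;

    FILE *f = fopen("/proc/some_entry", "wb");   /* placeholder /proc path */
    if (!f) { perror("fopen"); return 1; }

    /* Unbuffered: each fwrite() goes straight to a single write() call. */
    setvbuf(f, NULL, _IONBF, 0);

    if (fwrite(argv[1], 1, strlen(argv[1]), f) != strlen(argv[1]))
        perror("fwrite");

    fclose(f);
    return 0;
}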

Does tee forward data that has not made it into the file?

I'm zeroing a new hard disk like so:
pv /dev/zero | tee /dev/sdb | sha1sum -
The idea is that I will zero the disk and simultaneously compute a checksum of however many zeros got written. Then I'll sha1sum the block device and see if it matches the data that I originally wrote to it.
The question is, what happens when "tee" runs out of space on the device and terminates? Say the block device is exactly 1 million bytes; tee will obviously fill it with 1 million zero bytes, but will it forward exactly 1 million zero bytes to sha1sum?
Answer to the original question:
No, tee will not stop writing to stdout at precisely the point at which a write to a file specified in an argument fails.
But I don't see that it matters much. It appears that your goal is to ensure that the entire disk has been overwritten with zeros, without worrying about how big the disk is. So reading the disk and comparing every block read to a block of zeros should suffice. You can do that with cmp /dev/sdb /dev/zero. If it says "EOF on /dev/sdb", then all the bytes were 0.
For what it's worth, I thought of another way to do the same thing, albeit a little indirectly:
pv /dev/zero | dd bs=100M of=/dev/sdb 2> log
dd's report (captured in "log") should contain a precise count of bytes written, and you can use that to compute the sha1sum (or, alternatively, diff the block device against a generated stream of exactly that many zeros).
(bs=100M is because dd's default block size is 512 bytes, which turns out to perform poorly in my use case)

How can I show the size of files in /proc? It should not be size zero

From the following output we know that there are two characters in the file /proc/sys/net/ipv4/ip_forward, so why does ls show this file as size zero?
I know this is not a file on disk but a file in memory, so is there any command with which I can see the real size of the files in /proc?
root@OpenWrt:/proc/sys/net/ipv4# cat ip_forward | wc -c
2
root@OpenWrt:/proc/sys/net/ipv4# ls -l ip_forward
-rw-r--r-- 1 root root 0 Sep 3 00:20 ip_forward
root@OpenWrt:/proc/sys/net/ipv4# pwd
/proc/sys/net/ipv4
Those are not really files on disk (as you mention) but they are also not files in memory - the names in /proc correspond to calls into the running kernel in the operating system, and the contents are generated on the fly.
The system doesn't know how large the files would be without generating them, but if you read the "file" twice there's no guarantee you get the same data because the system may have changed.
You might be looking for the program
sysctl -a
instead.
Things in /proc are not really files. In most cases, they're not even files in memory. When you access these files, the proc filesystem driver performs a system call that gets data appropriate for the file, and then formats it for output. This is usually dynamic data that's constructed on the fly. An example of this is /proc/net/arp, which contains the current ARP cache.
Getting the size of these things can only be done by formatting the entire output, so it's not done just when listing the file. If you want the sizes, use wc -c as you did.
The /proc/ filesystem is an "illusion" maintained by the kernel, which does not bother giving the size of (most of) its pseudo-files (since computing that "real" size would usually involve having built the entire textual pseudo-file's content), and expects most [pseudo-] textual files from /proc/ to be read in sequence from first to last byte (i.e. till EOF), in reasonably sized (e.g. 1K) blocks. See proc(5) man page for details.
So there is no way to get the true size of a file like /proc/self/maps or /proc/sys/net/ipv4/ip_forward in a single syscall such as stat(2): it would report a size of 0, as shown by the stat(1) or ls(1) commands. A typical way of reading these textual files might be:
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    // or some other textual /proc file,
    // e.g. /proc/sys/net/ipv4/ip_forward
    if (f)
    {
        do {
            // you could use getline(3) instead of fgets
            char line[256];
            memset(line, 0, sizeof(line));
            if (NULL == fgets(line, sizeof(line), f))
                break;
            // do something with line, for example:
            fputs(line, stdout);
        } while (!feof(f));
        fclose(f);
    }
    return 0;
}
Of course, some files (e.g. /proc/self/cmdline) are documented as possibly containing NUL bytes. You'll need some fread for them.
It's not really a file in memory; it's an interface between user space and the kernel.

Does read() clear the kernel ring buffer /proc/kmsg?

I developed my own log processing program. To process logs originating from printk(), I read from the kernel ring buffer like this:
#define _PATH_KLOG "/proc/kmsg"

CGR_INT kernelRingBufferFileDescriptor = open(_PATH_KLOG, O_RDONLY|O_NONBLOCK);
CGR_CHAR kernelLogMessage[MAX_KERNEL_RING_BUFFER + 1] = {'\0'};
while (1)
{
    ...
    read(kernelRingBufferFileDescriptor, kernelLogMessage + residueSize, MAX_KERNEL_RING_BUFFER);
    ...
}
My program is in user space. I remember that whenever someone uses read() to read data from the ring buffer (like I did above), the part that is read is cleared from the ring buffer. Is that the case, or is it not?
I am confused about this, since there is always something in the ring buffer, and as a result my program is very busy processing all these logs. So I am not sure whether some module keeps sending logs to me, or whether I am reading the same logs again and again because they are not cleared.
To figure this out, I used klogctl() to check the ring buffer:
CGR_CHAR buf[MAX_KERNEL_RING_BUFFER] = {0};
int byteCount = klogctl(4, buf, MAX_KERNEL_RING_BUFFER - 1); /* 4 -- Read and clear all messages remaining in the ring buffer */
printf("%s %d: data read from kernel ring buffer = \"%s\"\n",__FILE__, __LINE__, buf);
and I keep getting data all the time. Since klogctl() with argument 4 reads and clears the ring buffer, I am inclined to believe some module DOES keep sending logs to me all the time.
Can anyone tell me - does read() clear ring buffer?
Become root and run cat /proc/kmsg >> File1.txt and then cat /proc/kmsg >> File2.txt. Compare File1.txt and File2.txt; you will immediately know whether the ring buffer is getting cleared on read(), because cat internally invokes read() anyway!
Also read about ring buffers and how they behave in the kernel documentation here:
http://www.mjmwired.net/kernel/Documentation/trace/ring-buffer-design.txt
EDIT: I found something interesting in the book Linux Device Drivers by Jonathan Corbet:
The printk function writes messages into a circular buffer that is __LOG_BUF_LEN bytes long: a value from 4 KB to 1 MB chosen while configuring the kernel. The function then wakes any process that is waiting for messages, that is, any process that is sleeping in the syslog system call or that is reading /proc/kmsg. These two interfaces to the logging engine are almost equivalent, but note that reading from /proc/kmsg consumes the data from the log buffer, whereas the syslog system call can optionally return log data while leaving it for other processes as well. In general, reading the /proc file is easier and is the default behavior for klogd. The dmesg command can be used to look at the content of the buffer without flushing it; actually, the command returns to stdout the whole content of the buffer, whether or not it has already been read.
So in your particular case, if you are using a plain read(), I think the buffer is indeed getting cleared and new data is being constantly written into it and hence you find some data all the time! Kernel experts can correct me here!
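As a side note, if you only want to peek at the ring buffer without consuming it (the way dmesg does, per the quote above), klogctl() command 3 (SYSLOG_ACTION_READ_ALL) reads the buffer without clearing it. A minimal sketch, with an assumed buffer size:
#include <stdio.h>
#include <sys/klog.h>

#define BUF_SIZE 16384   /* assumed size; the real log buffer may be larger */

int main(void)
{
    char buf[BUF_SIZE];
    int n = klogctl(3, buf, sizeof(buf) - 1);   /* 3 -- read all, do not clear */
    if (n < 0) {
        perror("klogctl");
        return 1;
    }
    buf[n] = '\0';
    fputs(buf, stdout);
    return 0;
}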
From reading the do_syslog function, it seems that messages are cleared when they're read.
By your description, you get the same behavior with klogctl(4), which also clears the buffer, so it makes sense.
So maybe there's indeed someone that keeps writing messages.
You can find which printk it is by its text, disable it, and see what you get. Or you can add the jiffies value to the message, so you'll know whether you keep getting new messages or whether these are the same ones.
