Does read() clear the kernel ring buffer /proc/kmsg?

I developed my own log-processing program. To process logs originating from printk(), I read from the kernel ring buffer like this:
#define _PATH_KLOG "/proc/kmsg"

CGR_INT kernelRingBufferFileDescriptor = open(_PATH_KLOG, O_RDONLY | O_NONBLOCK);
CGR_CHAR kernelLogMessage[MAX_KERNEL_RING_BUFFER + 1] = {'\0'};

while (1)
{
    ...
    read(kernelRingBufferFileDescriptor, kernelLogMessage + residueSize, MAX_KERNEL_RING_BUFFER);
    ...
}
My program runs in user space. I remember that whenever someone uses read() on the ring buffer (as I did above), the part that was read is cleared from the buffer. Is that the case or not?
I am confused, because there is always something in the ring buffer, and as a result my program is constantly busy processing logs. I cannot tell whether some module keeps sending me logs, or whether I am reading the same logs again and again because they are never cleared.
To figure this out, I used klogctl() to check the ring buffer:
CGR_CHAR buf[MAX_KERNEL_RING_BUFFER] = {0};
int byteCount = klogctl(4, buf, MAX_KERNEL_RING_BUFFER - 1); /* 4 -- Read and clear all messages remaining in the ring buffer */
printf("%s %d: data read from kernel ring buffer = \"%s\"\n",__FILE__, __LINE__, buf);
I keep getting data all the time. Since klogctl() with argument 4 reads and clears the ring buffer, I am inclined to believe some module really IS sending me logs all the time.
Can anyone tell me: does read() clear the ring buffer?

Become root and run cat /proc/kmsg >> File1.txt followed by cat /proc/kmsg >> File2.txt, then compare File1.txt and File2.txt. You will immediately know whether the ring buffer is getting cleared on read(), because cat internally invokes read() anyway!
Also read about ring buffers and how they behave in the kernel documentation here:
http://www.mjmwired.net/kernel/Documentation/trace/ring-buffer-design.txt
EDIT: I found something interesting in the book Linux Device Drivers by Jonathan Corbet:
The printk function writes messages into a circular buffer that is __LOG_BUF_LEN bytes long: a value from 4 KB to 1 MB chosen while configuring the kernel. The function then wakes any process that is waiting for messages, that is, any process that is sleeping in the syslog system call or that is reading /proc/kmsg. These two interfaces to the logging engine are almost equivalent, but note that reading from /proc/kmsg consumes the data from the log buffer, whereas the syslog system call can optionally return log data while leaving it for other processes as well. In general, reading the /proc file is easier and is the default behavior for klogd. The dmesg command can be used to look at the content of the buffer without flushing it; actually, the command returns to stdout the whole content of the buffer, whether or not it has already been read.
So in your particular case, if you are using a plain read(), I think the buffer is indeed getting cleared, and new data is constantly being written into it, which is why you find data there all the time! Kernel experts can correct me here!
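An alternative check that consumes nothing: klogctl also offers action 3 (SYSLOG_ACTION_READ_ALL), which reads the whole buffer without clearing it; this is what dmesg relies on. Below is a minimal sketch, assuming sufficient privileges (root or CAP_SYSLOG) and that 16 KB is enough for the test: take two snapshots a few seconds apart and compare them.
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/klog.h>   /* klogctl() -- glibc-specific header */

#define SNAP_SZ 16384   /* assumption: large enough for this test */

int main(void)
{
    char first[SNAP_SZ], second[SNAP_SZ];

    /* 3 == SYSLOG_ACTION_READ_ALL: read the whole buffer, do NOT clear it */
    int n1 = klogctl(3, first, sizeof first - 1);
    if (n1 < 0) { perror("klogctl"); return 1; }
    first[n1] = '\0';

    sleep(5);           /* give any chatty module time to log */

    int n2 = klogctl(3, second, sizeof second - 1);
    if (n2 < 0) { perror("klogctl"); return 1; }
    second[n2] = '\0';

    printf("first: %d bytes, second: %d bytes (%s)\n", n1, n2,
           strcmp(first, second) ? "contents differ" : "identical");
    return 0;
}
If the second snapshot keeps growing, something really is logging continuously; if the snapshots stay identical, your read() loop was re-reading its own leftovers.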

From reading the do_syslog function, it seems that messages are cleared when they're read.
By your description, you get the same behavior with klogctl(4), which also clears the buffer, so it makes sense.
So maybe there's indeed someone that keeps writing messages.
You can find which printk it is by its text, disable it, and see what you get. Or you can add the jiffies value to the message, so you'll know whether you keep getting new messages or keep seeing the same ones (a sketch follows below).
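To illustrate the jiffies idea, here is a minimal, hypothetical module (all names made up) that tags its printk output with the current jiffies value; adding the same %lu argument to the suspect printk accomplishes the suggestion. If repeated reads show the same jiffies values, you are re-reading old messages; if the values keep growing, new messages really are arriving.
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/jiffies.h>

static int __init jifftag_init(void)
{
    /* every message carries the jiffies value at the time it was logged */
    printk(KERN_DEBUG "jifftag: loaded at jiffies=%lu\n", jiffies);
    return 0;
}

static void __exit jifftag_exit(void)
{
    printk(KERN_DEBUG "jifftag: unloaded at jiffies=%lu\n", jiffies);
}

module_init(jifftag_init);
module_exit(jifftag_exit);
MODULE_LICENSE("GPL");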

Related

Unexpected periodic, non-continuous output for OCaml program

Someone reports that, given a stream of strings on the serial port piped to the OCaml program below, the output of the program is not continuous; instead it appears in chunks (of a few tens of lines), as if buffered.
What can be the cause of the non-continuous output?
(The output buffer should be flushed after each new line due to the use of '%!'. So this shouldn't be the cause, right?)
let tp = ref 0

let get_next_entry ic =
  try
    let (ts, pred, v) = Scanf.fscanf ic " #%d %s#(%d)\n" (fun x y z -> (x, y, z)) in
    Printf.printf "at timepoint %d (timestamp %d): %s(%d)\n%!" !tp ts pred v;
    incr tp;
    true
  with End_of_file ->
    false

let _ =
  while get_next_entry stdin do
    ()
  done
The OCaml version used is 4.05.
It is a threefold problem; the causes below run from the least likely to the most likely.
The glitching output
It is all in the eye of the beholder: how the program output looks depends on the environment in which it is run, i.e., on the program that runs your program and renders its output on a visual device. In other words, it involves a lot of variables that are beyond the control of this program.
With that said, let me explain what flush means for the printf function. The printf facility relies on buffered channels, and each channel is roughly a pair of a buffer and a system-specific file descriptor. When someone (including printf) outputs to a channel, the information first goes into the buffer and remains there until the next portion of information no longer fits (i.e., there is no more space in the buffer) or until the flush function is called explicitly. Then the buffer is flushed, which means that the information in the buffer is transferred to the operating system (e.g., using the write system call or library function).
What happens afterward is system dependent. If the file descriptor is associated with a regular file, then you might expect the information to be passed to it entirely (though the file system has its own hierarchy of caches, so there are caveats there too). If the descriptor is associated with a Unix-style shell process through a pipe, then it will go into the pipe's buffer, be extracted from it by the shell, and be printed using a terminal interface, usually provided by some terminal emulator. By default shells are line-buffered, so the line should be printed as a whole unless the user of the shell has changed its parameters somehow.
Basically, I hope you get the idea: it is not your program that is actually manipulating the terminal and lighting up pixels on your monitor. Your program just outputs data, and some other program receives this data and draws it on the screen. And that other program (a terminal, or terminal emulator, e.g., minicom) is what makes the output glitchy, not your program. Your program is doing its best to be printed correctly: full line or nothing.
Your program is glitching
And it is. The in_channel is also buffered, so it will accumulate a few bytes before your fscanf sees them. Therefore, you can't just read from the buffered channel and expect a realtime response. The most reliable way for you would be to use the Unix module and process the input using your own buffering.
The glitching input
Finally, the input program can also give you the information in chunks. This is especially true for serial interfaces, so make sure that you have correctly set up your terminal interface using the Unix.tcsetattr function. In particular, when your program is blocked on input, the operating system may decide not to wake it up on every arriving character or line. This behavior is controlled by the terminal interface (see canonical and non-canonical modes; if your input doesn't have newlines, you should use non-canonical mode). A sketch of this configuration follows.
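For illustration, here is roughly the configuration Unix.tcsetattr performs under the hood, sketched in C against the termios API; the device path is an assumption:
#include <stdio.h>
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/ttyUSB0", O_RDONLY | O_NOCTTY); /* hypothetical port */
    if (fd < 0) { perror("open"); return 1; }

    struct termios tio;
    tcgetattr(fd, &tio);
    tio.c_lflag &= ~(ICANON | ECHO); /* non-canonical: no line assembly */
    tio.c_cc[VMIN]  = 1;             /* read() may return after 1 byte  */
    tio.c_cc[VTIME] = 0;             /* no inter-byte timeout           */
    tcsetattr(fd, TCSANOW, &tio);

    char c;
    while (read(fd, &c, 1) == 1)     /* bytes now arrive one at a time */
        putchar(c);
    close(fd);
    return 0;
}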
Beyond that, the device itself could be jittery; if you have an oscilloscope nearby you can observe the signals it is sending. Also make sure that you have configured your serial port as prescribed in the user manual of your device.
One possibility is that fscanf is waiting until it sees everything it's looking for.

What buffers are between commands in a pipeline?

I thought it was 1 buffer, but now it occurs to me that it might be 2.
I mean in a pipeline:
cmd1 | cmd2
cmd2's output might be, e.g., line buffered even though there is no pipe there. That should be the buffer managed by the libc FILE * stream functions like fwrite(); or is that buffer also used by plain write(2)? However, I just remembered that pipe(7) talks about the size of the pipe buffers, which is apparently controlled in the kernel.
Is stdin buffered, too? Are there 3 buffers, 1 in kernel space and 2 in user space?
I previously thought that read(2) blocked when the pipe buffer was empty, but when stdin is not a pipe but rather a terminal, there is no pipe buffer, right? If read(2) doesn't have its own buffer, does it check different buffers depending on whether stdin is a pipe, a terminal, or a regular file?
EDIT: Changed "how many" to "what" in the question. I hope that's not too big a change. I'm interested in knowing what the buffers are, not just a number.
Is stdin buffered, too? Are there 3 buffers, 1 in kernel space and 2 in user space?
Yes, in general there could be 3 buffers here: one for cmd1 stdout, one for cmd2 stdin, and one in kernel space.
I previously thought that read(2) blocked when the pipe buffer was empty, but when stdin is not a pipe but rather a terminal, there is no pipe buffer, right?
The read system call blocks when there is no input, but that has nothing to do with stdio buffering.
The kernel buffer exists regardless of whether the input is from a terminal or not (it would be very inefficient for the kernel to transfer one character at a time).
By default the stdio library will not buffer terminal input, but the application can change that with explicit calls to e.g. setvbuf.
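To make the buffers concrete, here is a small C sketch meant to run as the right-hand side of a pipeline (e.g. seq 10 | ./peek; the program name is made up). It queries the kernel's pipe capacity with the Linux-specific F_GETPIPE_SZ fcntl and adjusts its own stdio buffering with setvbuf; cmd1's stdout stdio buffer, the third buffer, lives in the other process.
#define _GNU_SOURCE /* for F_GETPIPE_SZ (Linux >= 2.6.35) */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    /* the kernel-space buffer: the pipe itself, if stdin is a pipe */
    int sz = fcntl(STDIN_FILENO, F_GETPIPE_SZ);
    if (sz >= 0)
        fprintf(stderr, "kernel pipe buffer: %d bytes\n", sz);
    else
        fprintf(stderr, "stdin is not a pipe\n");

    /* one user-space buffer: this process's stdout; force line mode */
    setvbuf(stdout, NULL, _IOLBF, 0);

    /* another user-space buffer: stdio's stdin buffer, filled by fgets */
    char line[256];
    while (fgets(line, sizeof line, stdin))
        fputs(line, stdout);
    return 0;
}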
A blog post with more details is here.

Linux tty flip buffer lock when reading part of available data

I have a driver that builds on the new serdev bus in the Linux kernel.
In my driver I receive messages from an external device. All messages end with a null byte (0x00), and the protocol ensures that there are no null bytes in my data (COBS). Now I am trying to have the TTY layer hand me full messages by scanning for zeros in the input; if there are none, I just return zero from the callback that the tty layer invokes when bytes are available.
This kind of works, or rather it works for some messages. After a while, though, it locks up, and the tty layer keeps reporting the same number of received bytes indefinitely. My guess is that this happens when one half of the tty flip buffer is full and the rest of my message is in the other half.
I have two questions:
Am I correct in that the tty layer can "hang" until I read out all data in one half of the flip buffer?
If that is so, is there some way to prevent this from happening? I'd rather not implement my own buffering scheme on top of the tty buffer already available.
Thanks
Judging from drivers/tty/tty_buffer.c (the function flush_to_ldisc), it is not possible to do what I attempted. When the tty buffer is about to flip over, the consumer has to read the data out and buffer any half messages itself.
That is, returning zero and hoping for a larger chunk of data in your callback next time only works up until the end of the first part of the buffer; then the last bit of data must be read.
This is not a problem in userspace, because a read call takes an argument giving the most bytes you want, but read is free to return fewer bytes than requested.
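For reference, this is the pattern the conclusion implies, sketched against the serdev API as it looked when the bus was introduced (Linux 4.11-era signatures; all my_* names are hypothetical): consume every byte the tty layer offers, keep partial frames in a driver-private buffer, and act when the 0x00 delimiter shows up.
#include <linux/serdev.h>

#define MY_BUF_SZ 512

struct my_dev {
    unsigned char buf[MY_BUF_SZ];
    size_t        len;
};

static void my_handle_frame(struct my_dev *d)
{
    /* COBS-decode d->buf[0..d->len) here, then reset for the next frame */
    d->len = 0;
}

static int my_receive_buf(struct serdev_device *serdev,
                          const unsigned char *data, size_t count)
{
    struct my_dev *d = serdev_device_get_drvdata(serdev);
    size_t i;

    for (i = 0; i < count; i++) {
        if (data[i] == 0x00)            /* frame delimiter */
            my_handle_frame(d);
        else if (d->len < MY_BUF_SZ)
            d->buf[d->len++] = data[i];
        else
            d->len = 0;                 /* overflow: drop, resync at next 0 */
    }
    return count;                       /* always claim every byte offered */
}

static const struct serdev_device_ops my_ops = {
    .receive_buf = my_receive_buf,
};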

Minimizing copies when writing large data to a socket

I am writing an application server that processes images (large data). I am trying to minimize copies when sending image data back to clients. The processed images I need to send to clients are in buffers obtained from jemalloc. The ways I have thought of sending the data back are:
1) Simple write call.
// Allocate buffer buf.
// Store image data in this buffer.
write(socket, buf, len);
2) I obtain the buffer through mmap instead of jemalloc, though I presume jemalloc already creates the buffer using mmap. I then make a simple call to write.
buf = mmap(file, len); // Imagine proper options.
// Store image data in this buffer.
write(socket, buf, len);
3) I obtain a buffer through mmap like before. I then use sendfile to send the data:
buf = mmap(in_fd, len); // Imagine proper options.
// Store image data in this buffer.
int rc;
rc = sendfile(out_fd, in_fd, &offset, count);
// Deal with rc.
It seems like (1) and (2) will probably do the same thing, given that jemalloc probably allocates memory through mmap in the first place. I am not sure about (3) though. Will it really lead to any benefits? Figure 4 in this article on Linux zero-copy methods suggests that a further copy can be prevented using sendfile:
no data is copied into the socket buffer. Instead, only descriptors with information about the whereabouts and length of the data are appended to the socket buffer. The DMA engine passes data directly from the kernel buffer to the protocol engine, thus eliminating the remaining final copy.
This seems like a win if everything works out. I don't know if my mmapped buffer counts as a kernel buffer, though. Also, I don't know when it is safe to re-use this buffer. Since the fd and length are the only things appended to the socket buffer, I assume that the kernel actually writes this data to the socket asynchronously. If it does, what does the return from sendfile signify? How would I know when to re-use this buffer?
So my questions are:
What is the fastest way to write large buffers (images in my case) to a socket? The images are held in memory.
Is it a good idea to call sendfile on a mmapped file? If yes, what are the gotchas? Does this even lead to any wins?
It seems like my suspicions were correct. I got my information from this article. Quoting from it:
Also these network write system calls, including sendfile, might and in many cases do return before the data sent over TCP by the method call has been acknowledged. These methods return as soon as all data is written into the socket buffers (sk buff) and is pushed to the TCP write queue; the TCP engine can manage alone from that point on. In other words, at the time sendfile returns the last TCP send window is not actually sent to the remote host but queued. In cases where scatter-gather DMA is supported there is no separate buffer which holds these bytes; rather, the buffers (sk buffs) just hold pointers to the pages of the OS buffer cache, where the contents of the file are located. This might lead to a race condition if we modify the content of the file corresponding to the data in the last TCP send window as soon as sendfile returns. As a result the TCP engine may send newly written data to the remote host instead of what we originally intended to send.
Provided the buffer from a mmapped file is even considered "DMA-able", it seems there is no way to know when it is safe to re-use it without an explicit acknowledgement (over the network) from the actual client. I might have to stick to simple write calls and incur the extra copy. There is a paper (also from the article) with more details.
Edit: This article on the splice call also shows the problems. Quoting it:
Be aware, when splicing data from a mmap'ed buffer to a network socket, it is not possible to say when all data has been sent. Even if splice() returns, the network stack may not have sent all data yet. So reusing the buffer may overwrite unsent data.
For cases 1 and 2: does the operation you marked as // Store image data in this buffer require any conversion, or is it just a plain copy from memory to buf?
If it's just a plain copy, you can use write directly on the pointer obtained from jemalloc.
Assuming that img is a pointer obtained from jemalloc and size is the size of your image, just run the following code:
/* img is assumed to point to bytes (char *); write() returns ssize_t */
ssize_t result;
size_t sent = 0;

while (sent < size) {
    result = write(socket, img + sent, size - sent);
    if (result < 0) {
        /* error handling here */
        break;
    }
    sent += result;
}
This works correctly for blocking I/O (the default behavior). If you need to write data in a non-blocking manner, you should be able to rework the code on your own, but now you have the idea.
For case 3: sendfile is for sending data from one descriptor to another. That means you can, for example, send data from a file directly to a TCP socket without allocating any additional buffer. So, if the image you want to send to a client is in a file, just go for sendfile; a sketch follows below. If you have it in memory (because you processed it somehow, or just generated it), use the approach I mentioned earlier.
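A sketch of that file-to-socket case, assuming sock is an already-connected socket and keeping the error handling minimal:
#include <stdio.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

int send_image_file(int sock, const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return -1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); close(fd); return -1; }

    off_t offset = 0;
    while (offset < st.st_size) {
        /* the kernel feeds file pages straight to the socket: no user copy */
        ssize_t n = sendfile(sock, fd, &offset, st.st_size - offset);
        if (n < 0) { perror("sendfile"); close(fd); return -1; }
    }
    close(fd);
    return 0;
}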

Linux read operations requesting duplicate bytes?

This is a bit of a strange question. I'm writing a FUSE module using the go-fuse library, and at the moment I have a "fake" file with a size of 6000 bytes, which outputs some unrelated data for all read requests. My read function looks like this:
func (f *MyFile) Read(buf []byte, off int64) (fuse.ReadResult, fuse.Status) {
	log.Printf("Reading into buffer of len %d from %d\n", len(buf), off)
	FillBuffer(buf, uint64(off), f.secret)
	return fuse.ReadResultData(buf), fuse.OK
}
As you can see I'm outputting a log on every read containing the range of the read request. The weird thing is that when I cat the file I get the following:
2013/09/13 21:09:03 Reading into buffer of len 4096 from 0
2013/09/13 21:09:03 Reading into buffer of len 8192 from 0
So cat is apparently reading the first 4096 bytes of data, discarding them, then reading 8192 bytes, which encompasses all the data and so succeeds. I've tried other programs too, including hexdump and vim, and they all do the same thing. Interestingly, if I do head -c 3000 dir/fakefile it still performs the two reads, even though the second one is completely unnecessary. Does anyone have any insight into why this might be happening?
I suggest you strace your cat process to see for yourself. On my system, cat reads in 64K chunks and does a final read() to make sure it has read the whole file. That last read() is necessary to distinguish a "chunk-sized" file from a bigger one; that is, it makes sure there is nothing left to read, as the file size could have changed between the fstat() and read() system calls.
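To make the point concrete, here is a minimal C sketch of the loop cat (and most readers) effectively performs: the size from fstat() is only a hint, so the loop keeps calling read() until one returns 0.
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: %s FILE\n", argv[0]); return 1; }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    char buf[4096];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)  /* each call reaches FUSE */
        write(STDOUT_FILENO, buf, n);
    /* the loop only ends when read() returns 0: that is the "extra" read */

    close(fd);
    return 0;
}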
Is your "fake file" size being returned correctly to FUSE by stat/fstat() system calls?
