In my application, a TUN interface is created and the process keeps reading the associated fd with read(2) in a select(2) loop. While debugging an issue in the application, I found that read(2) on the TUN file descriptor sometimes returns zero. Is this possible, and under what conditions can it happen?
Thanks in advance.
woody
Here is the information from the manpage on read(2):
Return Value
On success, the number of bytes read is returned (zero indicates end of file), and the file position is advanced by this number. It is not an error if this number is smaller than the number of bytes requested; this may happen for example because fewer bytes are actually available right now (maybe because we were close to end-of-file, or because we are reading from a pipe, or from a terminal), or because read() was interrupted by a signal. On error, -1 is returned, and errno is set appropriately. In this case it is left unspecified whether the file position (if any) changes.
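To make the three outcomes in that Return Value text concrete, here is a minimal sketch of a select(2)/read(2) loop that tells them apart. The helper name drain_until_eof is made up for illustration, and treating 0 as "stop reading" is an assumption: it matches the man page's end-of-file wording, but whether that is the right policy for a TUN descriptor is not something the man page settles.

```c
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: read from fd until read(2) returns 0.
 * Returns the total number of bytes read, or -1 on a real error. */
long drain_until_eof(int fd)
{
    char buf[2048];
    long total = 0;

    for (;;) {
        fd_set rfds;
        ssize_t n;

        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);

        /* Wait until fd is readable; retry if a signal interrupts us. */
        if (select(fd + 1, &rfds, NULL, NULL, NULL) == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }

        n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;          /* got data, possibly fewer bytes than asked */
        } else if (n == 0) {
            return total;        /* zero: treated as end of file here */
        } else if (errno == EINTR || errno == EAGAIN) {
            continue;            /* transient condition, try again */
        } else {
            return -1;           /* real error, errno says which */
        }
    }
}
```

Logging the n == 0 branch separately from the error branch is probably the quickest way to see how often, and when, the zero return shows up.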
Related
I'm looking at the read syscall in Unix, which (at least in Linux) has this signature: [1]
ssize_t read(int fd, void* buf, size_t count);
Let's assume that the call succeeds (i.e. no negative return values) and that count > 0 (i.e. the buffer actually can store a nonzero amount of bytes). Under which circumstances would read() return 0? I can think of the following:
When fd refers to a regular file and the end of the file has been reached.
When fd refers to the receiving end of a pipe, socket or FIFO, the sending end has been closed and the pipe's/socket's/FIFO's own buffer has been exhausted.
When fd refers to the slave side of a terminal device that is in ICANON and Ctrl-D has been sent into the master side while the line buffer was empty.
I'm curious if there are any other situations that I'm not aware of, where read() would return with a result of 0. I'm especially interested (because of reasons) in situations like the last one in the list above, where read() returns 0 once, but subsequent calls to read() on the same FD could return a nonzero result. If an answer only applies to a certain flavor of Unix, I'm still interested in hearing it.
[1] I know this signature is for the libc wrapper, not the actual syscall, but that's not important right now.
If the Physical File System does not support simple reads from directories, read() will return 0 if it is used for a directory.
If no process has the pipe open for writing, read() returns 0 to indicate the end of the file.
If the connection is broken on a stream socket, but no data is available, then the read() function returns 0 bytes as EOF.
Normally a return value of 0 always means end-of-file. However, if you specify 0 as the number of bytes to read, it will always return 0 unless there's an error detected.
Terminal devices are a special case. If the terminal is in cooked mode, typing Control-d tells the device driver to return from any pending read() immediately with whatever is in the input editing buffer, rather than waiting for the user to enter a newline. If the buffer is empty, this results in a zero-length read. This is how typing the EOF character at the beginning of a line is automatically treated as EOF by applications.
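Two of the cases above are easy to reproduce with a pipe. This is only a sketch; the function names are invented for the example, and the zero-count result is what I would expect on Linux.

```c
#include <unistd.h>

/* read(2) on a pipe whose write end is closed and whose buffer is empty.
 * Expected result: 0 (end of file). */
ssize_t read_after_writer_closed(void)
{
    int fds[2];
    char buf[16];
    ssize_t n;

    if (pipe(fds) == -1)
        return -1;
    close(fds[1]);                       /* no process has it open for writing */
    n = read(fds[0], buf, sizeof buf);
    close(fds[0]);
    return n;
}

/* read(2) with count == 0 while one byte is waiting in the pipe.
 * Expected result: 0, even though this is not end of file. */
ssize_t read_with_zero_count(void)
{
    int fds[2];
    char buf[1];
    ssize_t n;

    if (pipe(fds) == -1)
        return -1;
    if (write(fds[1], "x", 1) != 1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    n = read(fds[0], buf, 0);            /* ask for zero bytes */
    close(fds[0]);
    close(fds[1]);
    return n;
}
```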
I have a driver that builds on the new serdev bus in the linux kernel.
In my driver I receive messages from an external device. Every message ends with a null byte (0x00), and the protocol (COBS) ensures that there are no null bytes in the data. I am trying to have the TTY layer hand me full messages: I scan the input for zeros, and if there are none I return zero from the callback that the TTY layer calls when bytes are available.
This kind of works. Or rather it works for some messages. After a while though it locks up and the tty layer keeps sending the same size of received bytes indefinitely. My guess is that this happens when one half of the tty flip buffer is full and the rest of my message is in the other half.
I have two questions:
Am I correct in that the tty layer can "hang" until I read out all data in one half of the flip buffer?
If that is so, is there some way to prevent this from happening? I'd rather not implement my own buffering scheme on top of the tty buffer already available.
Thanks
Judging from drivers/tty/tty_buffer.c and the function flush_to_ldisc, it is not possible to do what I attempted. When the tty buffer is about to flip over, the consumer has to do a read and buffer any half messages itself.
That is, returning zero from your callback and hoping for a larger chunk of data next time only works up to the end of the first part of the buffer; at that point the last bit of data must be read.
This is not a problem in userspace because a read call will have an argument that is the most bytes you want but read is free to return fewer bytes than requested.
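For what it's worth, the "buffer any half messages" part can be small. Below is a plain C model of the idea, not serdev code: it always consumes every byte it is handed and keeps the incomplete tail until the 0x00 delimiter arrives. The struct, the function name and the 256-byte limit are all assumptions made up for this sketch.

```c
#include <stddef.h>

#define FRAME_BUF_SIZE 256   /* assumed upper bound on one encoded message */

struct frame_acc {
    unsigned char buf[FRAME_BUF_SIZE];
    size_t len;      /* bytes of the current, still incomplete message */
    size_t frames;   /* complete messages seen so far */
    int overflow;    /* set while skipping an oversized message */
};

/* Feed one chunk of received bytes. Every byte is consumed, so the
 * return value is always n; partial messages stay in acc->buf. */
size_t frame_acc_feed(struct frame_acc *acc, const unsigned char *data, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (data[i] == 0x00) {
            /* Delimiter: acc->buf[0..len) is one full message, unless we
             * were skipping an oversized one. Hand it off here. */
            if (!acc->overflow)
                acc->frames++;
            acc->len = 0;
            acc->overflow = 0;
        } else if (acc->overflow) {
            /* Still inside an oversized message: drop the byte. */
        } else if (acc->len < FRAME_BUF_SIZE) {
            acc->buf[acc->len++] = data[i];
        } else {
            /* Message too long for the buffer: discard it entirely. */
            acc->overflow = 1;
        }
    }
    return n;
}
```

In a driver, the receive callback would call something like this and report all bytes as consumed, so the tty layer never has to hold data back waiting for the rest of a message.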
A program from the answer https://stackoverflow.com/a/1586277/6362199 uses the system call read() to receive exactly 4 bytes from a pipe. It assumes that the function read() returns -1, 0 or 4. Can the read() function return 1, 2 or 3 for example if it was interrupted by a signal?
In the man page read(2) there is:
On success, the number of bytes read is returned (zero indicates
end of file), and the file position is advanced by this number. It
is not an error if this number is smaller than the number of bytes
requested; this may happen for example because fewer bytes are
actually available right now (maybe because we were close to
end-of-file, or because we are reading from a pipe, or from a
terminal), or because read() was interrupted by a signal.
Does this mean that the read() function can be interrupted during receiving such a small amount of data as 4 bytes? Should the source code from this answer be corrected?
In the man page pipe(7) there is:
POSIX.1-2001 says that write(2)s of less than PIPE_BUF bytes must be atomic: the output data is written to the pipe as a contiguous sequence.
but there is nothing similar about read().
If the write is atomic, that means that the entire content is already present in the buffer when the read happens so the only way to have an incomplete read is if the kernel thread decides to yield before it's finished - which wouldn't happen here.
In general you can rely on small write()s on pipes on the same system mapping to identical read()s. 4 bytes is unquestionably far smaller than any buffer would ever be, so it will definitely be atomic.
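If you would rather not depend on that, the usual defensive pattern is to loop until the requested number of bytes has arrived. A minimal sketch follows; read_full is a name I made up, not a libc function.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keep calling read(2) until count bytes have been
 * read, end of file is hit, or a real error occurs.
 * Returns the number of bytes read (less than count only at EOF), or -1. */
ssize_t read_full(int fd, void *buf, size_t count)
{
    unsigned char *p = buf;
    size_t got = 0;

    while (got < count) {
        ssize_t n = read(fd, p + got, count - got);

        if (n > 0)
            got += (size_t)n;    /* short read: loop and ask for the rest */
        else if (n == 0)
            break;               /* end of file before count bytes */
        else if (errno == EINTR)
            continue;            /* interrupted before any data: retry */
        else
            return -1;
    }
    return (ssize_t)got;
}
```

With this, the caller only has to distinguish "got all 4 bytes", "got fewer because the writer went away", and "error", whatever read() does underneath.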
From the documentation:
store() should return the number of bytes used from the buffer. If the
entire buffer has been used, just return the count argument.
What does it do with this value? What's the difference if from a buffer of size FOO I read 4 and not 6 bytes?
You must realize that by implementing a sysfs file, you are trying to behave like a file.
Let's see this from the other side first. From the man page of fwrite(3):
RETURN VALUE
fread() and fwrite() return the number of items successfully read or written (i.e., not the number of characters). If an error occurs, or the end-of-file is
reached, the return value is a short item count (or zero).
And even better, from the man page of write(2):
The number of bytes written may be less than count if, for example, there is insufficient space on the underlying physical medium, or the RLIMIT_FSIZE resource
limit is encountered (see setrlimit(2)), or the call was interrupted by a signal handler after having written less than count bytes. (See also pipe(7).)
What this means is that store(), which implements the other end of the write(2) function for your particular file, should return the number of bytes written (i.e. read by you), at the very least so that write(2) can return that value to the user.
In most cases, if there is no error in the input, you would just want to return count to acknowledge that you have read everything and all is ok.
If I call write like this: write(fd, buf, 10000000 /* 10MB */), where fd is a socket using blocking I/O, will the kernel try to flush as many bytes as possible so that a single call is enough? Or do I have to call write several times according to its return value? If that happens, does it mean something is wrong with fd?
============================== EDITED ================================
Thanks for all the answers. Furthermore, if I put fd into poll and it returns successfully with POLLOUT, does that mean the call to write cannot block and will write all the data unless something is wrong with fd?
In blocking mode, write(2) normally returns only once the specified number of bytes has been written. If it cannot write, it waits.
In non-blocking (O_NONBLOCK) mode it does not wait; it returns right away. If it can write all the bytes, the call succeeds, and if it can write only some of them, it returns that smaller count. Otherwise it returns -1 and sets errno accordingly. You then have to check errno: if it is EWOULDBLOCK or EAGAIN, you have to invoke the same write again.
From manual of write(2)
The number of bytes written may be less than count if, for example, there is insufficient space on the underlying physical medium, or the RLIMIT_FSIZE resource
limit is encountered (see setrlimit(2)), or the call was interrupted by a signal handler after having written less than count bytes. (See also pipe(7).)
So yes, there can be something wrong with fd.
Also note this
A successful return from write() does not make any guarantee that data has been committed to disk. In fact, on some buggy implementations, it does not even guarantee
that space has successfully been reserved for the data. The only way to be sure is to call fsync(2) after you are done writing all your data.
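Since the man page allows a short count, the safe approach is to loop on the return value rather than assume one call is enough. A minimal sketch for a blocking fd follows; write_all is a made-up name. On a non-blocking fd it simply gives up with -1 when errno is EAGAIN or EWOULDBLOCK, so a caller there would have to wait with poll and call it again for the remainder.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: call write(2) repeatedly until all count bytes
 * have been handed to the kernel. Returns count on success, -1 on error. */
ssize_t write_all(int fd, const void *buf, size_t count)
{
    const unsigned char *p = buf;
    size_t sent = 0;

    while (sent < count) {
        ssize_t n = write(fd, p + sent, count - sent);

        if (n > 0)
            sent += (size_t)n;   /* partial write: continue with the rest */
        else if (n == -1 && errno == EINTR)
            continue;            /* interrupted before writing anything */
        else
            return -1;           /* real error, or an unexpected 0 */
    }
    return (ssize_t)sent;
}
```

A short count by itself does not mean anything is wrong with fd; only the -1 return does.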
/etc/sysctl.conf is used in Linux to set parameters for the TCP protocol, which is what I assume you mean by a socket. There may be a lot of parameters there, but when you dig through it, basically there is a limit to the amount of data the TCP buffers can hold at one time.
So if you try to write 10 MB of data in one go, write returns a ssize_t value telling you how many bytes it actually accepted. Always check the return value of the write() system call. If the system allowed 10MB, then write would return that value.
The value is
net.core.wmem_max = [some number]
If you change some number to a value large enough to allow 10MB you can write that much. DON'T do that! You could cause other problems. Research settings before you do anything. Changing settings can decrease performance. Be careful.
http://linux.die.net/man/7/tcp
has basic C information for TCP settings. Also check out /proc/sys/net on your box.
One other point - TCP is a two way door, so just because you can send a zillion bytes at one time does not mean the other side can read it or even handle it. Your socket may just block for a while. And possibly your write() return value may be less than you hoped for.
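If you just want to see how large the send buffer of a particular socket is, without touching any system-wide settings, getsockopt(2) with SO_SNDBUF will tell you. A small sketch; the helper name is invented for this example.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: report the send buffer size the kernel currently
 * uses for this socket, in bytes, or -1 if getsockopt(2) fails. */
int socket_send_buffer_size(int fd)
{
    int size = 0;
    socklen_t len = sizeof size;

    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &size, &len) == -1)
        return -1;
    return size;
}
```

Comparing that number with the 10 MB you intend to send gives a rough idea of why a single write() may have to wait or may come back short.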