Is partial data read possible from a pipe due to delay in writing?

Is partial data read possible from a pipe due to delay in writing? - linux

I have a situation where one process is writing 512 bytes of data to a pipe every 100ms, and another process is continuously reading from the same pipe. It took three read operation to read complete 512 bytes of data from pipe. The first read operation returns only a portion of the data (e.g. 100 bytes), and the next read returns zero bytes without any error set and the last read operation reads the remaining data left in the pipe (e.g. 412 bytes).
I assume that the write operation is atomic as the 512 bytes is lesser than PIPE_BUF bytes. If so, is it possible for a read operation to return partial data, especially it took three reads for reading 512 bytes and the second read returned zero bytes.
// Create the named pipe
if (mkfifo("/tmp/my_pipe", 0666) == -1) {
perror("mkfifo");
return 1;
}
// Open the named pipe for reading and writing with non-
// blocking behavior
if ((fd = open("/tmp/my_pipe", O_RDWR | O_NONBLOCK)) == -1) {
perror("open");
return 1;
}
// Open the named pipe for reading with non-blocking behavior
if ((fd = open("/tmp/my_pipe", O_RDONLY | O_NONBLOCK)) == -1) {
perror("open");
return 1;
}

The man page for pipe (i.e., man 7 pipe) says
The communication channel provided by a pipe is a byte stream: there
is no concept of message boundaries.
This terse statement implies that reads can see what you call 'partial data' from the pipe; rather that there is no such thing as partial data, because it's a byte stream.
For what it's worth, TCP has the same feature.

Related

Do block devices (emmc partitions) have end-of-file marker?

I need to compute the sha1sum of an emmc partition and obviously this involves reading the contents of the partition (if at all it is relevant - the partition is ext4-formatted).
I am performing the read operation on /dev/mmcblkp** just like any other fd:
while ((ret = read(blk_dev_fd, buffer, BLOCKSIZE))) > 0) {
printf("Read %zd bytes from source_fd\n", ret);
// do something
}
Is this correct? Can I expect read to return 0 on EOF or is there no such thing as EOF on block devices?

Why to print a string in interrupt driven IO, only the first character needs to be copied?

Almost all materials I found online referenced the code below from Tananbaum's OS book. However I don't really understand why this would print the whole string instead of only the first character.
Is it because the interrupts will be generated recursively? But wouldn't that cost a lot of resources? Or did I miss something?
I'm really confused. Any help would be appreciated.
Code executed when print system call is made:
copy_from_user (buffer, p, count);
enable_interrupts ();
while (*printer_status_reg !=READY);
*printer_data_register = p[0];
scheduler ();
Interrupt handler:
if (count == 0) {
unblock_user ();
} else {
*printer_data_register = p[i];
count = count – 1;
i++;
}
acknowledge_interrupt ();
return_from_interrupt ();

You write first character in buffer and start the transmission.
After completion of transmission, Tx_Complete interrupt will be generated.
Now, your interrupt handler checks, whether there are any more bytes to transfer (The else part). If available, it adds next byte to transmit register, decrements number of bytes to transmit and increments buffer index.
This process goes on... When number of bytes to transmit reaches zero, you don't initiate next transfer and your interrupts stop.
By transferring first byte, you initiate the process and rest bytes are transferred by interrupt handler. You have to make sure that count is correct.
You can guess what can happen if count is less or more!

How can I detect whether a WAV file has a 44 or 46-byte header?

I've discovered it is dangerous to assume that all PCM wav audio files have 44 bytes of header data before the samples begin. Though this is common, many applications (ffmpeg for example), will generate wavs with a 46-byte header and ignoring this fact while processing will result in a corrupt and unreadable file. But how can you detect how long the header actually is?
Obviously there is a way to do this, but I searched and found little discussion about this. A LOT of audio projects out there assume 44 (or conversely, 46) depending on the authors own context.

You should be checking all of the header data to see what the actual sizes are. Broadcast Wave Format files will contain an even larger extension subchunk. WAV and AIFF files from Pro Tools have even more extension chunks that are undocumented as well as data after the audio. If you want to be sure where the sample data begins and ends you need to actually look for the data chunk ('data' for WAV files and 'SSND' for AIFF).
As a review, all WAV subchunks conform to the following format:
Subchunk Descriptor (4 bytes)
Subchunk Size (4 byte integer, little endian)
Subchunk Data (size is Subchunk Size)
This is very easy to process. All you need to do is read the descriptor, if it's not the one you are looking for, read the data size and skip ahead to the next. A simple Java routine to do that would look like this:
//
// Quick note for people who don't know Java well:
// 'in.read(...)' returns -1 when the stream reaches
// the end of the file, so 'if (in.read(...) < 0)'
// is checking for the end of file.
//
public static void printWaveDescriptors(File file)
throws IOException {
try (FileInputStream in = new FileInputStream(file)) {
byte[] bytes = new byte[4];
// Read first 4 bytes.
// (Should be RIFF descriptor.)
if (in.read(bytes) < 0) {
return;
}
printDescriptor(bytes);
// First subchunk will always be at byte 12.
// (There is no other dependable constant.)
in.skip(8);
for (;;) {
// Read each chunk descriptor.
if (in.read(bytes) < 0) {
break;
}
printDescriptor(bytes);
// Read chunk length.
if (in.read(bytes) < 0) {
break;
}
// Skip the length of this chunk.
// Next bytes should be another descriptor or EOF.
int length = (
Byte.toUnsignedInt(bytes[0])
| Byte.toUnsignedInt(bytes[1]) << 8
| Byte.toUnsignedInt(bytes[2]) << 16
| Byte.toUnsignedInt(bytes[3]) << 24
);
in.skip(Integer.toUnsignedLong(length));
}
System.out.println("End of file.");
}
}
private static void printDescriptor(byte[] bytes)
throws IOException {
String desc = new String(bytes, "US-ASCII");
System.out.println("Found '" + desc + "' descriptor.");
}
For example here is a random WAV file I had:
Found 'RIFF' descriptor.
Found 'bext' descriptor.
Found 'fmt ' descriptor.
Found 'minf' descriptor.
Found 'elm1' descriptor.
Found 'data' descriptor.
Found 'regn' descriptor.
Found 'ovwf' descriptor.
Found 'umid' descriptor.
End of file.
Notably, here both 'fmt ' and 'data' legitimately appear in between other chunks because Microsoft's RIFF specification says that subchunks can appear in any order. Even some major audio systems that I know of get this wrong and don't account for that.
So if you want to find a certain chunk, loop through the file checking each descriptor until you find the one you're looking for.

The trick is to look at the "Subchunk1Size", which is a 4-byte integer beginning at byte 16 of the header. In a normal 44-byte wav, this integer will be 16 [10, 0, 0, 0]. If it's a 46-byte header, this integer will be 18 [12, 0, 0, 0] or maybe even higher if there is extra extensible meta data (rare?).
The extra data itself (if present), begins in byte 36.
So a simple C# program to detect the header length would look like this:
static void Main(string[] args)
{
byte[] bytes = new byte[4];
FileStream fileStream = new FileStream(args[0], FileMode.Open, FileAccess.Read);
fileStream.Seek(16, 0);
fileStream.Read(bytes, 0, 4);
fileStream.Close();
int Subchunk1Size = BitConverter.ToInt32(bytes, 0);
if (Subchunk1Size < 16)
Console.WriteLine("This is not a valid wav file");
else
switch (Subchunk1Size)
{
case 16:
Console.WriteLine("44-byte header");
break;
case 18:
Console.WriteLine("46-byte header");
break;
default:
Console.WriteLine("Header contains extra data and is larger than 46 bytes");
break;
}
}

In addition to Radiodef's excellent reply, I'd like to add 3 things that aren't obvious.
The only rule for WAV files is the FMT chunk comes before the DATA chunk. Apart from that, you will find chunks you don't know about at the beginning, before the DATA chunk and after it. You must read the header for each chunk to skip forward to find the next chunk.
The FMT chunk is commonly found in 16 byte and 18 byte variations, but the spec actually allows more than 18 bytes as well.
If the FMT chunk' header size field says greater than 16, Bytes 17 and 18 also specify how many extra bytes there are, so if they are both zero, you end up with an 18 byte FMT chunk identical to the 16 byte one.
It is safe to read in just the first 16 bytes of the FMT chunk and parse those, ignoring any more.
Why does this matter? - not much any more, but Windows XP's Media Player was able to play 16 bit WAV files, but 24 bit WAV files only if the FMT chunk was the Extended (18+ byte) version. There used to be a lot of complaints that "Windows doesn't play my 24 bit WAV files", but if it had an 18 byte FMT chunk, it would... Microsoft fixed that sometime during the early days of Windows 7, so 24 bit with 16 byte FMT files work fine now.
(Newly added) Chunk sizes with odd sizes occur quite often. Mostly seen when a 24 bit mono file is made. It is unclear from the spec, but the chunk size specifies the actual data length (the odd value) and a pad byte (zero) is added after the chunk and before the start of the next chunk. So chunks always start on even boundaries, but the chunk size itself is stored as the actual odd value.

using EOF for signaling on unnamed pipes

I have a test program that uses unnamed pipes created with pipe() to communicate between parent and child processes created with fork() on a Linux system.
Normally, when the sending process closes the write fd of the pipe, the receiving process returns from read() with a value of 0, indicating EOF.
However, it seems that if I stuff the pipe with a fairly large amount of data (maybe 100K bytes0 before the receiver starts reading, the receiver blocks after reading all the data in the pipe - even though the sender has closed it.
I have verified that the sending process has closed the pipe with lsof, and it seems pretty clear that the receiver is blocked.
Which leads to the question: is closing one end of the pipe a reliable way to let the receiver know that there is no more data?
If it is, and there are no conditions that can lead to a read() blocking on an empty, closed FIFO, there's something wrong with my code. If not, it means I need to find an alternate method of signalling the end of the data stream.
Resolution
I was pretty sure that the original assumption was correct, that closing a pipe causes an EOF at the reader side, this question was just a shot in the dark - thinking maybe there was some subtle pipe behavior I was overlooking. Nearly every example you ever see with pipes is a toy that sends a few bytes and exits. Things often work differently when you are no longer doing atomic operations.
In any case, I tried to simplify my code to flush out the problem and was successful in finding my problem. In pseudocode, I ended up doing something like this:
create pipe1
if ( !fork() ) {
close pipe1 write fd
do some stuff reading pipe1 until EOF
}
create pipe2
if ( !fork() ) {
close pipe2 write fd
do some stuff reading pipe2 until EOF
}
close pipe1 read fd
close pipe2 read fd
write data to pipe1
get completion response from child 1
close pipe1 write fd
write data to pipe2
get completion response from child 2
close pipe2 write fd
wait for children to exit
The child process reading pipe1 was hanging, but only when the amount of data in the pipe became substantial. This was occurring even though I had closed the pipe that child1 was reading.
A look at the source shows the problem. When I forked the second child process, it grabbed its own copy of the pipe1 file descriptors, which were left open. Even though only one process should be writing to the pipe, having it open in the second process kept it from going into an EOF state.
The problem didn't show up with small data sets, because child2 was finishing its business quickly, and exiting. But with larger data sets, child2 wasn't returning quickly, and I ended up with a deadlock.

read should return EOF when the writers have closed the write end.
Since you do a pipe and then a fork, both processes will have the write fd open. It could be that in the reader process you have forgotten to close the write portion of the pipe.
Caveat: It has been a long time since I programmed on Unix. So it might be inaccurate.
Here is some code from: http://www.cs.uml.edu/~fredm/courses/91.308/files/pipes.html. Look at the "close unused" comments below.
#include <stdio.h>
/* The index of the "read" end of the pipe */
#define READ 0
/* The index of the "write" end of the pipe */
#define WRITE 1
char *phrase = "Stuff this in your pipe and smoke it";
main () {
int fd[2], bytesRead;
char message [100]; /* Parent process message buffer */
pipe ( fd ); /*Create an unnamed pipe*/
if ( fork ( ) == 0 ) {
/* Child Writer */
close (fd[READ]); /* Close unused end*/
write (fd[WRITE], phrase, strlen ( phrase) +1); /* include NULL*/
close (fd[WRITE]); /* Close used end*/
printf("Child: Wrote '%s' to pipe!\n", phrase);
} else {
/* Parent Reader */
close (fd[WRITE]); /* Close unused end*/
bytesRead = read ( fd[READ], message, 100);
printf ( "Parent: Read %d bytes from pipe: %s\n", bytesRead, message);
close ( fd[READ]); /* Close used end */
}
}

recv with MSG_NONBLOCK and MSG_WAITALL

I want to use recv syscall with nonblocking flags MSG_NONBLOCK. But with this flag syscall can return before full request is satisfied. So,
can I add MSG_WAITALL flag? Will it be nonblocking?
or how should I rewrite blocking recv into the loop with nonblocking recv

For IPv4 TCP receives on Linux at least, MSG_WAITALL is ignored if MSG_NONBLOCK is specified (or the file descriptor is set to non-blocking).
From tcp_recvmsg() in net/ipv4/tcp.c in the Linux kernel:
if (copied >= target && !sk->sk_backlog.tail)
break;
if (copied) {
if (sk->sk_err ||
sk->sk_state == TCP_CLOSE ||
(sk->sk_shutdown & RCV_SHUTDOWN) ||
!timeo ||
signal_pending(current))
break;
target in this cast is set to to the requested size if MSG_DONTWAIT is specified or some smaller value (at least 1) if not. The function will complete if:
Enough bytes have been copied
There's a socket error
The socket has been closed or shutdown
timeo is 0 (socket is set to non-blocking)
There's a signal pending for the process
To me this seems like it may be a bug in Linux, but either way it won't work the way you want. It looks like dec-vt100's solution will, but there is a race condition if you try to receive from the same socket in more than one process or thread.That is, another recv() call by another thread/process could occur after your thread has performed a peek, causing your thread to block on the second recv().

EDIT:
Plain recv() will return whatever is in the tcp buffer at the time of the call up to the requested number of bytes. MSG_DONTWAIT just avoids blocking if there is no data at all ready to be read on the socket. MSG_WAITALL requests blocking until the entire number of bytes requested can be read. So you won't get "all or none" behavior. At best you should get EAGAIN if no data is present and block until the full message is available otherwise.
You might be able to fashion something out of MSG_PEEK or ioctl() with a FIONREAD (if your system supports it) that effectively behaves like you want but I am unaware how you can accomplish your goal just using the recv() flags.

This is what I did for the same problem, but I'd like some confirmation that this works as expected...
ssize_t recv_allOrNothing(int socket_id, void *buffer, size_t buffer_len, bool block = false)
{
if(!block)
{
ssize_t bytes_received = recv(socket_id, buffer, buffer_len, MSG_DONTWAIT | MSG_PEEK);
if (bytes_received == -1)
return -1;
if ((size_t)bytes_received != buffer_len)
return 0;
}
return recv(socket_id, buffer, buffer_len, MSG_WAITALL);
}

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string