Reading TCP stream by one byte in Rust

I want to read bytes from a TCP stream, but if I read 100 or 1000 bytes at a time I may consume bytes that belong to the next request. So I decided to read the stream one byte at a time, like this:
let mut res_buf = vec![];
loop {
let mut buf = vec![0; 1];
let n = match self.read(&mut buf) {
Err(e) => {
match e.kind() {
io::ErrorKind::WouldBlock => {
continue;
},
_ => panic!("Got an error: {}", e),
}
},
Ok(m) => {
if m == 0 {
return Err(Error::new(ErrorKind::BrokenPipe, "broken"))
}
m
},
};
buf.truncate(n);
res_buf.extend(buf.iter());
let stringed = String::from_utf8_lossy(&res_buf);
if stringed.contains("\r\n\r\n") {
// END OF PART
return Ok(stringed.to_string());
}
}
It fires read calls on every byte. Is this wrong and inefficient?

1. Reading one byte at a time is fine if you read from "pure" memory (std::io::Cursor over a Vec, an array, etc.) and bad if each read triggers a syscall (file, socket, etc.). In the latter case you should wrap the stream in a BufReader, but with your code you would lose bytes on return: anything the BufReader has buffered but your function has not read is dropped. Consider refactoring your code, for example so that you pass res_buf to another thread that processes the whole request.
2. Your if stringed.contains("\r\n\r\n") should very probably be ends_with("\r\n\r\n").
3. You do not need a heap-allocated buffer here:
let mut byte = 0u8;
// View the single byte as a one-element slice; handle the result as before.
let result = self.read(std::slice::from_mut(&mut byte));
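To illustrate point 1, here is a minimal sketch of a buffered alternative. It assumes a blocking stream and that the same buffered reader is kept alive between requests, so bytes read ahead are not lost. The helper name read_head is hypothetical, and a Cursor stands in for the socket; with a real connection one would presumably wrap the TcpStream in a BufReader once and reuse it.

```rust
use std::io::{self, BufRead, Cursor};

/// Reads one block terminated by "\r\n\r\n" from a buffered reader.
/// Bytes after the terminator stay in the reader for the next call.
fn read_head<R: BufRead>(reader: &mut R) -> io::Result<Vec<u8>> {
    let mut head = Vec::new();
    loop {
        // read_until appends everything up to and including the next b'\n'.
        let n = reader.read_until(b'\n', &mut head)?;
        if n == 0 {
            return Err(io::Error::new(
                io::ErrorKind::UnexpectedEof,
                "stream closed before the end of the block",
            ));
        }
        if head.ends_with(b"\r\n\r\n") {
            return Ok(head);
        }
    }
}

fn main() -> io::Result<()> {
    // Two requests arriving back to back in the same stream.
    let mut reader = Cursor::new(&b"GET /a HTTP/1.1\r\n\r\nGET /b HTTP/1.1\r\n\r\n"[..]);
    let first = read_head(&mut reader)?;
    let second = read_head(&mut reader)?;
    println!("{:?}", String::from_utf8_lossy(&first));
    println!("{:?}", String::from_utf8_lossy(&second));
    Ok(())
}
```

Because the reader is passed by mutable reference, the second call picks up exactly where the first one stopped.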

Related

How to read ANSI DSR Query Responses in Rust

I'm looking for a pure rust solution which either uses Rust's stdlib or doesn't utilize a crate with libc.
One example of a Device Status Report (DSR) is a simple query to find the current size of a terminal in width and height. First a CSI request: ESC + [. Then a command of 19 and finally a termination character of t. For example: \x1b[19t. The terminal will then respond with another CSI: ESC + [, a status identifier, 9 and then the row count followed by the column count and a terminating character, t. For an 80x25 terminal, the response would look like: \x1b[9;25;80t.
In Rust, it is rather straight-forward to send a request to the terminal using a print! macro call:
print!("\x1b[19t");
Alternatively, in a shell, one might run:
$ printf "\x1b[19t"
However, I've been unable to actually capture the response coming back from the terminal. The code below is not my initial approach, but it is supposedly one that provides a non-blocking stdin read, which I expect to need (based on experimentation, it is not actually non-blocking).
use std::sync::mpsc::{channel, Receiver, TryRecvError};
use std::{
io::{self, Read},
thread, time,
};
fn main() {
let timeout = time::Duration::from_micros(50);
let rx = spawn_read();
print!("\x1b[19t");
let mut data = Vec::new();
for _ in 0..200 {
thread::sleep(timeout);
match rx.try_recv() {
Ok(value) => data.push(value),
Err(TryRecvError::Empty) => {}
Err(TryRecvError::Disconnected) => break,
}
}
let string = match std::str::from_utf8(&data) {
Ok(value) => value,
Err(_why) => "",
};
println!("Captured: {:?}", string);
}
fn spawn_read() -> Receiver<u8> {
let (tx, rx) = channel::<u8>();
thread::spawn(move || loop {
let mut buf = [0u8];
io::stdin().read_exact(&mut buf).unwrap();
if buf != [0u8] {
tx.send(buf[0]).ok();
}
});
rx
}
Running the above, I see:
Captured: ""
But with the code above, I expect (some flavor of CSI 9 ; <rows> ; <cols> t):
Captured: "\x1b[9;25;80t"

How is the UDP server example safe?

I have a UDP server example, which is working:
let mut socket = UdpSocket::bind("127.0.0.1:12345")?;
let mut buf = [0; 4096];
loop {
let sock = socket.try_clone()?;
match socket.recv_from(&mut buf) {
Ok((amt, src)) => {
thread::spawn(move || {
println!("Handling connection from {}", &src);
let buf = &mut buf[..amt];
buf.reverse();
sock.send_to(&buf, &src).expect("error sending");
});
},
Err(err) => {
eprintln!("Err: {}", err);
}
}
}
It looks like all incoming UDP packets are sharing the same buffer.
If two UDP packets arrive at the same time, won't the second packet overwrite the first in buf, leading to the second packet's (reversed) content being sent back to both senders?
How does Rust prevent this?
No, the buffer is not shared among threads; it is implicitly copied before each new thread starts. From the array documentation:
Arrays of any size are Copy if the element type is Copy
Therefore the move keyword moves the buffer into the closure, and because the array is Copy that move is a copy: the original buf stays usable for the next iteration of the loop.
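The copy can be observed directly. The following is a small sketch (the function name demo and the sample values are made up for illustration) in which the spawned thread reverses its own copy while the original array is modified independently:

```rust
use std::thread;

/// Returns the original array after the main thread changed it,
/// and the bytes the worker thread produced from its own copy.
fn demo() -> ([u8; 4], Vec<u8>) {
    let mut buf = [1u8, 2, 3, 4];

    let handle = thread::spawn(move || {
        // `buf` inside the closure is a copy made when the closure was created.
        let part = &mut buf[..3];
        part.reverse();
        part.to_vec()
    });

    // The original is still usable because `[u8; 4]` is `Copy`.
    buf[0] = 9;

    let reversed = handle.join().expect("worker thread panicked");
    (buf, reversed)
}

fn main() {
    let (original, reversed) = demo();
    println!("original: {:?}, reversed copy: {:?}", original, reversed);
}
```

If buf were a Vec<u8> instead, the move would transfer ownership and the later write to buf[0] would not compile.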

How can I asynchronously read from both stdout and stderr of a subprocess using Tokio? [duplicate]

I'm making a small ncurses application in Rust that needs to communicate with a child process. I already have a prototype written in Common Lisp. I'm trying to rewrite it because CL uses a huge amount of memory for such a small tool.
I'm having some trouble figuring out how to interact with the sub-process.
What I'm currently doing is roughly this:
Create the process:
let mut program = match Command::new(command)
.args(arguments)
.stdin(Stdio::piped())
.stdout(Stdio::piped())
.stderr(Stdio::piped())
.spawn()
{
Ok(child) => child,
Err(_) => {
println!("Cannot run program '{}'.", command);
return;
}
};
Pass it to an infinite (until user exits) loop, which reads and handles input and listens for output like this (and writes it to the screen):
fn listen_for_output(program: &mut Child, output_viewer: &TextViewer) {
match program.stdout {
Some(ref mut out) => {
let mut buf_string = String::new();
match out.read_to_string(&mut buf_string) {
Ok(_) => output_viewer.append_string(buf_string),
Err(_) => return,
};
}
None => return,
};
}
The call to read_to_string however blocks the program until the process exits. From what I can see read_to_end and read also seem to block. If I try running something like ls which exits right away, it works, but with something that doesn't exit like python or sbcl it only continues once I kill the subprocess manually.
Based on this answer, I changed the code to use BufReader:
fn listen_for_output(program: &mut Child, output_viewer: &TextViewer) {
match program.stdout.as_mut() {
Some(out) => {
let buf_reader = BufReader::new(out);
for line in buf_reader.lines() {
match line {
Ok(l) => {
output_viewer.append_string(l);
}
Err(_) => return,
};
}
}
None => return,
}
}
However, the problem still remains the same. It will read all lines that are available, and then block. Since the tool is supposed to work with any program, there is no way to guess out when the output will end, before trying to read. There doesn't appear to be a way to set a timeout for BufReader either.
Streams are blocking by default. TCP/IP streams, filesystem streams, pipe streams: they are all blocking. When you tell a stream to give you a chunk of bytes, it will stop and wait until it has the requested amount of bytes or until something else happens (an interrupt, an end of stream, an error).
The operating systems are eager to return the data to the reading process, so if all you want is to wait for the next line and handle it as soon as it comes in then the method suggested by Shepmaster in Unable to pipe to or from spawned child process more than once (and also in his answer here) works.
In theory it doesn't have to work, because an operating system is allowed to make the BufReader wait for more data in read, but in practice operating systems prefer early "short reads" to waiting.
This simple BufReader-based approach becomes even more dangerous when you need to handle multiple streams (like the stdout and stderr of a child process) or multiple processes. For example, a BufReader-based approach might deadlock when a child process waits for you to drain its stderr pipe while your process is blocked waiting on its empty stdout.
Similarly, you can't use BufReader when you don't want your program to wait on the child process indefinitely. Maybe you want to display a progress bar or a timer while the child is still working and gives you no output.
You can't use the BufReader-based approach if your operating system happens not to be eager in returning data to the process (prefers "full reads" to "short reads"), because in that case the last few lines printed by the child process might end up in a gray zone: the operating system has them, but they're not large enough to fill the BufReader's buffer.
BufReader is limited to what the Read interface allows it to do with the stream; it's no less blocking than the underlying stream is. In order to be efficient it reads the input in chunks, telling the operating system to fill as much of its buffer as it has data available.
You might be wondering why reading data in chunks is so important here, why can't the BufReader just read the data byte by byte. The problem is that to read the data from a stream we need the operating system's help. On the other hand, we are not the operating system, we work isolated from it, so as not to mess with it if something goes wrong with our process. So in order to call to the operating system there needs to be a transition to "kernel mode" which might also incur a "context switch". That is why calling the operating system to read every single byte is expensive. We want as few OS calls as possible and so we get the stream data in batches.
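The difference in the number of calls can be made visible with a rough sketch. CountingReader is a hypothetical wrapper written for this illustration; each call to its read stands in for one call to the operating system, and an in-memory slice plays the role of the stream.

```rust
use std::io::{self, BufReader, Read};

/// Hypothetical wrapper that counts how often the underlying source is asked for data.
struct CountingReader<R> {
    inner: R,
    calls: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.calls += 1;
        self.inner.read(buf)
    }
}

/// Drains a reader one byte at a time and returns the number of bytes seen.
fn read_one_by_one<R: Read>(reader: &mut R) -> io::Result<usize> {
    let mut byte = [0u8; 1];
    let mut total = 0;
    while reader.read(&mut byte)? == 1 {
        total += 1;
    }
    Ok(total)
}

/// Byte-by-byte reads straight from the source: (bytes read, calls to the source).
fn count_unbuffered(data: &[u8]) -> io::Result<(usize, usize)> {
    let mut source = CountingReader { inner: data, calls: 0 };
    let total = read_one_by_one(&mut source)?;
    Ok((total, source.calls))
}

/// The same byte-by-byte loop, but through a BufReader with a 1024-byte buffer.
fn count_buffered(data: &[u8]) -> io::Result<(usize, usize)> {
    let mut reader = BufReader::with_capacity(1024, CountingReader { inner: data, calls: 0 });
    let total = read_one_by_one(&mut reader)?;
    Ok((total, reader.get_ref().calls))
}

fn main() -> io::Result<()> {
    let data = vec![0u8; 4096];
    println!("unbuffered (bytes, calls): {:?}", count_unbuffered(&data)?);
    println!("buffered   (bytes, calls): {:?}", count_buffered(&data)?);
    Ok(())
}
```

Both variants see the same bytes, but the buffered one reaches the source only a handful of times, once per buffer refill.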
To wait on a stream without blocking you'd need a non-blocking stream. MIO promises to have the required non-blocking stream support for pipes, most probably with PipeReader, but I haven't checked it out so far.
The non-blocking nature of a stream should make it possible to read data in chunks regardless of whether the operating system prefers "short reads" or not, because a non-blocking stream never blocks: if there is no data in the stream, it simply tells you so.
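As an illustration of "it simply tells you so", here is a sketch that uses the standard library's TcpStream::set_nonblocking on a loopback connection instead of a pipe (std offers no portable equivalent for pipes). The helpers try_read and loopback_pair are hypothetical names introduced for this example.

```rust
use std::io::{self, Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;
use std::time::Duration;

/// Attempts a single read on a non-blocking stream.
/// Returns Ok(None) when no data is available yet instead of waiting.
fn try_read(stream: &mut TcpStream, buf: &mut [u8]) -> io::Result<Option<usize>> {
    match stream.read(buf) {
        Ok(n) => Ok(Some(n)),
        Err(e) if e.kind() == io::ErrorKind::WouldBlock => Ok(None),
        Err(e) => Err(e),
    }
}

/// Creates a connected pair on the loopback interface: (writer side, non-blocking reader side).
fn loopback_pair() -> io::Result<(TcpStream, TcpStream)> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let client = TcpStream::connect(listener.local_addr()?)?;
    let (server, _addr) = listener.accept()?;
    server.set_nonblocking(true)?;
    Ok((client, server))
}

fn main() -> io::Result<()> {
    let (mut client, mut server) = loopback_pair()?;
    let mut buf = [0u8; 64];

    // Nothing has been written yet, so this returns immediately without data.
    println!("before write: {:?}", try_read(&mut server, &mut buf)?);

    client.write_all(b"hello")?;
    loop {
        match try_read(&mut server, &mut buf)? {
            Some(n) => {
                println!("read {:?}", String::from_utf8_lossy(&buf[..n]));
                break;
            }
            // No data yet: the thread is free to do other work here.
            None => thread::sleep(Duration::from_millis(1)),
        }
    }
    Ok(())
}
```

The sleep in the loop is only a placeholder; a real program would typically wait on a poller rather than spin.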
In the absence of a non-blocking stream you'll have to resort to spawning threads, so that the blocking reads are performed in a separate thread and thus won't block your primary thread. You might also want to read the stream byte by byte in order to react to the line separator immediately in case the operating system does not prefer "short reads". Here's a working example: https://gist.github.com/ArtemGr/db40ae04b431a95f2b78.
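A variation on the thread idea, sketched here under the assumption that line-oriented output is enough: the reader thread forwards lines over a channel, and the primary thread uses recv_timeout so it can do other work (a progress bar, a timer) while nothing arrives. The name spawn_line_reader is hypothetical, and a Cursor stands in for the child's stdout.

```rust
use std::io::{BufRead, BufReader, Cursor, Read};
use std::sync::mpsc::{channel, Receiver, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Moves the blocking reads into a separate thread and hands lines back over a channel.
fn spawn_line_reader<R>(stream: R) -> Receiver<String>
where
    R: Read + Send + 'static,
{
    let (tx, rx) = channel();
    thread::spawn(move || {
        for line in BufReader::new(stream).lines() {
            match line {
                // Stop if the receiving side has gone away.
                Ok(l) => {
                    if tx.send(l).is_err() {
                        break;
                    }
                }
                Err(_) => break,
            }
        }
        // `tx` is dropped here, which the receiver sees as a disconnect.
    });
    rx
}

fn main() {
    let rx = spawn_line_reader(Cursor::new("first\nsecond\n"));
    loop {
        match rx.recv_timeout(Duration::from_millis(100)) {
            Ok(line) => println!("line: {}", line),
            // Nothing arrived in time: update a progress indicator, check input, etc.
            Err(RecvTimeoutError::Timeout) => println!("still waiting..."),
            // The stream ended or failed and the reader thread exited.
            Err(RecvTimeoutError::Disconnected) => break,
        }
    }
}
```

A ChildStdout taken from a spawned process could be passed in place of the Cursor, since it is also Read + Send + 'static.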
P.S. Here's an example of a function that lets you monitor the standard output of a program via a shared vector of bytes:
use std::io::Read;
use std::process::{Command, Stdio};
use std::sync::{Arc, Mutex};
use std::thread;
/// Pipe streams are blocking, we need separate threads to monitor them without blocking the primary thread.
fn child_stream_to_vec<R>(mut stream: R) -> Arc<Mutex<Vec<u8>>>
where
R: Read + Send + 'static,
{
let out = Arc::new(Mutex::new(Vec::new()));
let vec = out.clone();
thread::Builder::new()
.name("child_stream_to_vec".into())
.spawn(move || loop {
let mut buf = [0];
match stream.read(&mut buf) {
Err(err) => {
println!("{}] Error reading from stream: {}", line!(), err);
break;
}
Ok(got) => {
if got == 0 {
break;
} else if got == 1 {
vec.lock().expect("!lock").push(buf[0])
} else {
println!("{}] Unexpected number of bytes: {}", line!(), got);
break;
}
}
}
})
.expect("!thread");
out
}
fn main() {
let mut cat = Command::new("cat")
.stdin(Stdio::piped())
.stdout(Stdio::piped())
.stderr(Stdio::piped())
.spawn()
.expect("!cat");
let out = child_stream_to_vec(cat.stdout.take().expect("!stdout"));
let err = child_stream_to_vec(cat.stderr.take().expect("!stderr"));
let mut stdin = match cat.stdin.take() {
Some(stdin) => stdin,
None => panic!("!stdin"),
};
}
With a couple of helpers I'm using it to control an SSH session:
try_s! (stdin.write_all (b"echo hello world\n"));
try_s! (wait_forˢ (&out, 0.1, 9., |s| s == "hello world\n"));
P.S. Note that await on a read call in async-std is blocking as well. It's just that instead of blocking a system thread it only blocks a chain of futures (essentially a stack-less green thread). The poll_read is the non-blocking interface. In async-std#499 I've asked the developers whether there's a short read guarantee from these APIs.
P.S. There might be a similar concern in Nom: "we would want to tell the IO side to refill according to the parser's result (Incomplete or not)"
P.S. Might be interesting to see how stream reading is implemented in crossterm. For Windows, in poll.rs, they are using the native WaitForMultipleObjects. In unix.rs they are using mio poll.
Tokio's Command
Here is an example of using tokio 0.2:
use std::process::Stdio;
use futures::StreamExt; // 0.3.1
use tokio::{io::BufReader, prelude::*, process::Command}; // 0.2.4, features = ["full"]
#[tokio::main]
async fn main() {
let mut cmd = Command::new("/tmp/slow.bash")
.stdout(Stdio::piped()) // Can do the same for stderr
.spawn()
.expect("cannot spawn");
let stdout = cmd.stdout().take().expect("no stdout");
// Can do the same for stderr
// To print out each line
// BufReader::new(stdout)
// .lines()
// .for_each(|s| async move { println!("> {:?}", s) })
// .await;
// To print out each line *and* collect it all into a Vec
let result: Vec<_> = BufReader::new(stdout)
.lines()
.inspect(|s| println!("> {:?}", s))
.collect()
.await;
println!("All the lines: {:?}", result);
}
Tokio-Threadpool
Here is an example of using tokio 0.1 and tokio-threadpool. We start the process in a thread using the blocking function. We convert that to a stream with stream::poll_fn
use std::process::{Command, Stdio};
use tokio::{prelude::*, runtime::Runtime}; // 0.1.18
use tokio_threadpool; // 0.1.13
fn stream_command_output(
mut command: Command,
) -> impl Stream<Item = Vec<u8>, Error = tokio_threadpool::BlockingError> {
// Ensure that the output is available to read from and start the process
let mut child = command
.stdout(Stdio::piped())
.spawn()
.expect("cannot spawn");
let mut stdout = child.stdout.take().expect("no stdout");
// Create a stream of data
stream::poll_fn(move || {
// Perform blocking IO
tokio_threadpool::blocking(|| {
// Allocate some space to store anything read
let mut data = vec![0; 128];
// Read 1-128 bytes of data
let n_bytes_read = stdout.read(&mut data).expect("cannot read");
if n_bytes_read == 0 {
// Stdout is done
None
} else {
// Only return as many bytes as we read
data.truncate(n_bytes_read);
Some(data)
}
})
})
}
fn main() {
let output_stream = stream_command_output(Command::new("/tmp/slow.bash"));
let mut runtime = Runtime::new().expect("Unable to start the runtime");
let result = runtime.block_on({
output_stream
.map(|d| String::from_utf8(d).expect("Not UTF-8"))
.fold(Vec::new(), |mut v, s| {
print!("> {}", s);
v.push(s);
Ok(v)
})
});
println!("All the lines: {:?}", result);
}
There are numerous possible tradeoffs that can be made here. For example, always allocating 128 bytes isn't ideal, but it's simple to implement.
Support
For reference, here's slow.bash:
#!/usr/bin/env bash
set -eu
val=0
while [[ $val -lt 10 ]]; do
echo $val
val=$(($val + 1))
sleep 1
done
See also:
How do I synchronously return a value calculated in an asynchronous Future in stable Rust?
If Unix support is sufficient, you can also make the two output streams non-blocking and poll them as you would a TcpStream configured with the set_nonblocking function.
The ChildStdout and ChildStderr handles produced by spawning a Command each wrap a file descriptor, so you can directly change the read behavior of these handles to make them non-blocking.
Based on the work of jcreekmore/timeout-readwrite-rs and anowell/nonblock-rs, I use this wrapper to modify the stream handles:
extern crate libc;
use std::io::Read;
use std::os::unix::io::AsRawFd;
use libc::{F_GETFL, F_SETFL, fcntl, O_NONBLOCK};
fn set_nonblocking<H>(handle: &H, nonblocking: bool) -> std::io::Result<()>
where
H: Read + AsRawFd,
{
let fd = handle.as_raw_fd();
let flags = unsafe { fcntl(fd, F_GETFL, 0) };
if flags < 0 {
return Err(std::io::Error::last_os_error());
}
let flags = if nonblocking {
flags | O_NONBLOCK
} else {
flags & !O_NONBLOCK
};
let res = unsafe { fcntl(fd, F_SETFL, flags) };
if res != 0 {
return Err(std::io::Error::last_os_error());
}
Ok(())
}
You can manage the two streams like any other non-blocking stream. The following example is based on the polling crate, which makes it really easy to handle read events, and on BufReader for line reading:
use std::process::{Command, Stdio};
use std::path::PathBuf;
use std::io::{BufReader, BufRead};
use std::thread;
extern crate polling;
use polling::{Event, Poller};
fn main() -> Result<(), std::io::Error> {
let path = PathBuf::from("./worker.sh").canonicalize()?;
let mut child = Command::new(path)
.stdin(Stdio::null())
.stdout(Stdio::piped())
.stderr(Stdio::piped())
.spawn()
.expect("Failed to start worker");
let handle = thread::spawn({
let stdout = child.stdout.take().unwrap();
set_nonblocking(&stdout, true)?;
let mut reader_out = BufReader::new(stdout);
let stderr = child.stderr.take().unwrap();
set_nonblocking(&stderr, true)?;
let mut reader_err = BufReader::new(stderr);
move || {
let key_out = 1;
let key_err = 2;
let mut out_closed = false;
let mut err_closed = false;
let poller = Poller::new().unwrap();
poller.add(reader_out.get_ref(), Event::readable(key_out)).unwrap();
poller.add(reader_err.get_ref(), Event::readable(key_err)).unwrap();
let mut line = String::new();
let mut events = Vec::new();
loop {
// Wait for at least one I/O event.
events.clear();
poller.wait(&mut events, None).unwrap();
for ev in &events {
// stdout is ready for reading
if ev.key == key_out {
let len = match reader_out.read_line(&mut line) {
Ok(len) => len,
Err(e) => {
println!("stdout read returned error: {}", e);
0
}
};
if len == 0 {
println!("stdout closed (len is null)");
out_closed = true;
poller.delete(reader_out.get_ref()).unwrap();
} else {
print!("[STDOUT] {}", line);
line.clear();
// reload the poller
poller.modify(reader_out.get_ref(), Event::readable(key_out)).unwrap();
}
}
// stderr is ready for reading
if ev.key == key_err {
let len = match reader_err.read_line(&mut line) {
Ok(len) => len,
Err(e) => {
println!("stderr read returned error: {}", e);
0
}
};
if len == 0 {
println!("stderr closed (len is null)");
err_closed = true;
poller.delete(reader_err.get_ref()).unwrap();
} else {
print!("[STDERR] {}", line);
line.clear();
// reload the poller
poller.modify(reader_err.get_ref(), Event::readable(key_err)).unwrap();
}
}
}
if out_closed && err_closed {
println!("Stream closed, exiting process thread");
break;
}
}
}
});
handle.join().unwrap();
Ok(())
}
Additionally, combined with a wrapper over an EventFd, it becomes possible to stop the process easily from another thread, without blocking or active polling, while using only a single thread.
EDIT: According to my tests, the polling crate automatically puts the polled handles into non-blocking mode. The set_nonblocking function is still useful if you want to use nix::poll directly.
I have encountered enough use-cases where it was useful to interact with a subprocess over line-delimited text that I wrote a crate for it, interactive_process.
I expect the original problem has long since been solved, but I thought it might be helpful to others.

Why does calling BufReader::fill_buf after calling consume return fewer bytes than I expect?

I am trying to implement streaming of UTF-8 characters from a file. This is what I've got so far, please excuse the ugly code for now.
use std::fs::File;
use std::io;
use std::io::BufRead;
use std::str;
fn main() -> io::Result<()> {
let mut reader = io::BufReader::with_capacity(100, File::open("utf8test.txt")?);
loop {
let mut consumed = 0;
{
let buf = reader.fill_buf()?;
println!("buf len: {}", buf.len());
match str::from_utf8(&buf) {
Ok(s) => {
println!("====\n{}", s);
consumed = s.len();
}
Err(err) => {
if err.valid_up_to() == 0 {
println!("1. utf8 decoding failed!");
} else {
match str::from_utf8(&buf[..err.valid_up_to()]) {
Ok(s) => {
println!("====\n{}", s);
consumed = s.len();
}
_ => println!("2. utf8 decoding failed!"),
}
}
}
}
}
if consumed == 0 {
break;
}
reader.consume(consumed);
println!("consumed {} bytes", consumed);
}
Ok(())
}
I have a test file with a multibyte character at offset 98 which fails to decode as it does not fit completely into my (arbitrarily-sized) 100 byte buffer. That's fine, I just ignore it and decode what is valid up to the start of that character.
The problem is that after calling consume(98) on the BufReader, the next call to fill_buf() only returns 2 bytes... it seems to have not bothered to read any more bytes into the buffer. I don't understand why. Maybe I have misinterpreted the documentation.
Here is the sample output:
buf len: 100
====
UTF-8 encoded sample plain-text file
‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
consumed 98 bytes
buf len: 2
1. utf8 decoding failed!
It would be nice if from_utf8() would return the partially decoded string and the position of the decoding error so I don't have to call it twice whenever this happens, but there doesn't seem to be such a function in the standard library (that I am aware of).
I encourage you to learn how to produce a Minimal, Complete, and Verifiable example. This is a valuable skill that professional programmers use to better understand problems and focus attention on the important aspects of a problem. For example, you didn't provide the actual input file, so it's very difficult for anyone to reproduce your behavior using the code you provided.
After trial-and-error, I was able to reduce your problem down to this code:
use std::io::{self, BufRead};
fn main() -> io::Result<()> {
let mut reader = io::BufReader::with_capacity(100, io::repeat(b'a'));
let a = reader.fill_buf()?.len();
reader.consume(98);
let b = reader.fill_buf()?.len();
println!("{}, {}", a, b); // 100, 2
Ok(())
}
Unfortunately for your case, this behavior is allowed by the contract of BufRead and is in fact almost required. The point of a buffered reader is to avoid making calls to the underlying reader as much as possible. The trait does not know how many bytes you need to read, and it doesn't know that 2 bytes isn't enough and it should perform another call. Flipping it the other way, pretend you had only consumed 1 byte out of 100 — would you want all 99 of those remaining bytes to be copied in memory and then perform another underlying read? That would be slower than not using a BufRead in the first place!
The trait also doesn't have any provisions for moving the remaining bytes in the buffer to the beginning and then filling the buffer again. This is something that seems like it could be added to the concrete BufReader, so you may wish to provide a pull request to add it.
For now, I'd recommend using Read::read_exact at the end of the buffer:
use std::io::{self, BufRead, Read};
fn main() -> io::Result<()> {
let mut reader = io::BufReader::with_capacity(100, io::repeat(b'a'));
let a = reader.fill_buf()?.len();
reader.consume(98);
let mut leftover = [0u8; 4]; // a single UTF-8 character is at most 4 bytes
// Assume we know we need 3 bytes based on domain knowledge
reader.read_exact(&mut leftover[..3])?;
let b = reader.fill_buf()?.len();
println!("{}, {}", a, b); // 100, 99
Ok(())
}
See also:
What is the maximum number of bytes for a UTF-8 encoded character?
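An alternative that avoids depending on how BufReader refills its buffer is to keep the undecoded tail yourself. The following is only a sketch under that assumption; decode_chunks is a hypothetical helper that reads fixed-size chunks and carries an incomplete trailing character over to the next chunk, using Utf8Error::valid_up_to and Utf8Error::error_len to tell "cut off mid-character" apart from "invalid".

```rust
use std::io::{self, Read};
use std::str;

/// Decodes a reader as UTF-8 in fixed-size chunks.
/// An incomplete character at the end of a chunk is kept and retried with the next chunk.
fn decode_chunks<R: Read>(mut reader: R, chunk_size: usize) -> io::Result<Vec<String>> {
    assert!(chunk_size > 0, "chunk_size must be at least 1");
    let mut pieces = Vec::new();
    let mut pending: Vec<u8> = Vec::new(); // bytes read but not yet decoded
    let mut chunk = vec![0u8; chunk_size];
    loop {
        let n = reader.read(&mut chunk)?;
        if n == 0 {
            break; // end of input
        }
        pending.extend_from_slice(&chunk[..n]);
        let valid = match str::from_utf8(&pending) {
            Ok(s) => s.len(),
            // error_len() == None: the data ends in the middle of a character.
            Err(e) if e.error_len().is_none() => e.valid_up_to(),
            // Otherwise the bytes are genuinely invalid UTF-8.
            Err(e) => return Err(io::Error::new(io::ErrorKind::InvalidData, e)),
        };
        if valid > 0 {
            // This prefix was validated by the call above.
            let text = str::from_utf8(&pending[..valid]).expect("validated prefix");
            pieces.push(text.to_owned());
            pending.drain(..valid);
        }
    }
    if pending.is_empty() {
        Ok(pieces)
    } else {
        Err(io::Error::new(io::ErrorKind::InvalidData, "input ends mid-character"))
    }
}

fn main() -> io::Result<()> {
    // With 2-byte chunks the two-byte character 'é' is split across two reads.
    for piece in decode_chunks("aéb".as_bytes(), 2)? {
        println!("{:?}", piece);
    }
    Ok(())
}
```

This still validates the prefix a second time on the error path; the second call is there to stay within safe code.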

How to check for EOF with `read_line()`?

Given the code below, how can I specifically check for EOF? Or rather, how can I distinguish between "there's nothing here" and "it exploded"?
match io::stdin().read_line() {
Ok(l) => print!("{}", l),
Err(_) => do_something_else(),
}
From the documentation for read_line:
If successful, this function will return the total number of bytes read.
If this function returns Ok(0), the stream has reached EOF.
This means we can check for a successful value of zero:
use std::io::{self, BufRead};
fn main() -> io::Result<()> {
let mut empty: &[u8] = &[];
let mut buffer = String::new();
let bytes = empty.read_line(&mut buffer)?;
if bytes == 0 {
println!("EOF reached");
}
Ok(())
}
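Applied in a loop, the same check separates the three cases: Ok(0) is EOF, any other Ok is a line (possibly an empty one, which still counts its newline), and Err is a real failure. A minimal sketch, with collect_lines as a hypothetical helper name:

```rust
use std::io::{self, BufRead};

/// Reads lines until EOF; I/O errors are propagated to the caller.
fn collect_lines<R: BufRead>(mut input: R) -> io::Result<Vec<String>> {
    let mut lines = Vec::new();
    let mut buffer = String::new();
    loop {
        buffer.clear();
        match input.read_line(&mut buffer)? {
            // Zero bytes read: the stream has reached EOF.
            0 => break,
            // An empty line still reads at least the newline byte, so it lands here.
            _ => lines.push(buffer.trim_end_matches(&['\r', '\n'][..]).to_owned()),
        }
    }
    Ok(lines)
}

fn main() -> io::Result<()> {
    let lines = collect_lines(io::stdin().lock())?;
    println!("read {} lines before EOF", lines.len());
    Ok(())
}
```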
