Reading Bytes From a Reader - rust

I'm writing something to process stdin in blocks of bytes, but can't seem to work out a simple way to do it (though I suspect there is one).
fn run() -> int {
// Doesn't compile: types differ
let mut buffer = [0, ..100];
loop {
let block = match stdio::stdin().read(buffer) {
Ok(bytes_read) => buffer.slice_to(bytes_read),
// This captures the Err from the end of the file,
// but also actual errors while reading from stdin.
Err(message) => return 0
};
process(block).unwrap();
}
}
fn process(block: &[u8]) -> Result<(), IoError> {
// do things
}
My questions:
What's the "standard" way to do this? (I've been trying/hoping to use and_then()/or_else())
How can I differentiate between the Err(IoError) from end of the file, and the Err that's actually an error?

The previously accepted answer is outdated as of Rust 1.0: EOF is no longer considered an error. You can do it like this:
use std::io::{self, Read};
fn main() {
let mut buffer = [0; 100];
while let Ok(bytes_read) = io::stdin().read(&mut buffer) {
if bytes_read == 0 { break; }
process(&buffer[..bytes_read]).unwrap();
}
}
fn process(block: &[u8]) -> Result<(), io::Error> {
Ok(()) // do things
}
Note that this may not result in the expected behavior: read doesn't have to fill the buffer, but may return with any number of bytes read. In the case of stdin the read implementation returns every time a newline is detected (pressing enter in terminal).
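Note also that the while let loop above stops silently on any Err, so a real I/O failure looks the same as end of input. A minimal sketch that keeps the two apart, assuming a hypothetical helper named process_blocks that is generic over Read so it can be tried on in-memory data as well as stdin:

```rust
use std::io::{self, ErrorKind, Read};

/// Reads `reader` in blocks of up to 100 bytes and passes each block to `process`.
/// Returns the total number of bytes handled, or the first real I/O error.
/// (`process_blocks` is a name made up for this sketch.)
pub fn process_blocks<R, F>(mut reader: R, mut process: F) -> io::Result<usize>
where
    R: Read,
    F: FnMut(&[u8]) -> io::Result<()>,
{
    let mut buffer = [0u8; 100];
    let mut total = 0;
    loop {
        match reader.read(&mut buffer) {
            // Ok(0) with a non-empty buffer means end of input.
            Ok(0) => return Ok(total),
            // A read may return fewer bytes than the buffer holds.
            Ok(n) => {
                process(&buffer[..n])?;
                total += n;
            }
            // Interrupted reads are safe to retry.
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            // Anything else is an actual error and is propagated.
            Err(e) => return Err(e),
        }
    }
}
```

For stdin this would be called as process_blocks(io::stdin().lock(), |block| process(block)).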

Rust API documentation states that:
Note that end-of-file is considered an error, and can be inspected for
in the error's kind field.
The IoError struct looks like this:
pub struct IoError {
pub kind: IoErrorKind,
pub desc: &'static str,
pub detail: Option<String>,
}
The list of all kinds is at http://doc.rust-lang.org/std/io/enum.IoErrorKind.html
You can match it like this:
match stdio::stdin().read(buffer) {
Ok(_) => println!("ok"),
Err(io::IoError{kind:io::EndOfFile, ..}) => println!("end of file"),
_ => println!("error")
}
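The IoError and EndOfFile names above come from the pre-1.0 API. In current std, read signals end of input with Ok(0), and the closest thing to an "EOF error" is ErrorKind::UnexpectedEof, which read_exact returns when the input ends before the buffer is filled. A small sketch of matching on it, where the Record type and read_record function are hypothetical names used only for illustration:

```rust
use std::io::{self, ErrorKind, Read};

/// Outcome of trying to read one fixed-size record (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Record {
    Full([u8; 4]),
    EndOfInput,
}

/// Reads exactly four bytes, treating a premature end of input as a normal outcome.
/// Note: a trailing partial record is also reported as `EndOfInput` here.
pub fn read_record<R: Read>(reader: &mut R) -> io::Result<Record> {
    let mut buf = [0u8; 4];
    match reader.read_exact(&mut buf) {
        Ok(()) => Ok(Record::Full(buf)),
        // The input ended before four bytes were available.
        Err(e) if e.kind() == ErrorKind::UnexpectedEof => Ok(Record::EndOfInput),
        // Every other kind is a genuine error.
        Err(e) => Err(e),
    }
}
```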

Related

How to achieve conditional nested formatting without multiple allocations?

I have a format string consisting of multiple conditional components and I'm looking for a solution that doesn't need multiple allocations of Strings for the intermediate steps. If I create each single component of the final format string with the format!-macro then it works but I need an allocation for each component.
I tried using only macros to generate the complex format string and its arguments. However, this always resulted in "temporary value is freed at the end of this statement" errors. I also tried to use one single buffer of type impl core::fmt::Write, but I couldn't get that to work either.
On a high level, I want something like this:
fn main() {
let prefix_include_a = true;
let prefix_include_b = true;
// prefix itself is a formatted string and it is further formatted here
println!("{prefix:<10}{message:>10}!",
prefix = format_prefix(prefix_include_a, prefix_include_b),
message = "message"
);
}
// formats the prefix component of the final string.
// needs multiple String allocations as `format!` is used
fn format_prefix(inc_a: bool, inc_b: bool) -> String {
format!("[{a:<5}{b:<5}]",
a = if inc_a {
format!("{:.1}", 1.234)
} else {
format!("")
},
b = if inc_b {
format!("{:.2}", 1.234)
} else {
format!("")
},
)
}
Is this possible with no or only one single allocation?
The simplest solution is to just write! directly to the underlying stream e.g.
use std::io::{stdout, Write};
fn main() {
let prefix_include_a = true;
let prefix_include_b = true;
let mut stdout = stdout();
let _ = format_prefix(&mut stdout, prefix_include_a, prefix_include_b);
let _ = write!(stdout, "{:>10}", "message");
}
// formats the prefix component of the final string.
// writes straight to the stream, so no intermediate String is allocated
fn format_prefix(mut s: impl Write, inc_a: bool, inc_b: bool) -> std::io::Result<()> {
write!(s, "[")?;
if inc_a {
write!(s, "{:<5.1}", 1.234)?;
} else {
write!(s, "{:<5}", "")?; // empty slot, padded to the same width as a value
}
if inc_b {
write!(s, "{:<5.2}", 1.234)?;
} else {
write!(s, "{:<5}", "")?; // empty slot, padded to the same width as a value
}
write!(s, "]")?;
Ok(())
}
An alternative is to reify prefix into a type, and implement Display for it. I would think (hope?) the formatter is a passthrough to the underlying stream, though I've never actually looked:
use std::io::{stdout, Write};
fn main() {
let prefix_include_a = true;
let prefix_include_b = true;
println!(
"{prefix:<10}{message:>10}!",
prefix = Prefix(prefix_include_a, prefix_include_b),
message = "message"
);
}
struct Prefix(bool, bool);
impl std::fmt::Display for Prefix {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "[")?;
if self.0 {
write!(f, "{:<5.1}", 1.234)?;
} else {
write!(f, "{:<5}", "")?; // empty slot, padded to the same width as a value
}
if self.1 {
write!(f, "{:<5.2}", 1.234)?;
} else {
write!(f, "{:<5}", "")?; // empty slot, padded to the same width as a value
}
write!(f, "]")?;
Ok(())
}
}
Note: I've not handled the padding of the prefix in either version, though I don't think it makes much sense: both prefix values are padded to 5, so the prefix is always at least 12 wide. Padding to 10 makes no sense.
The prefix object could, however, use the externally specified padding to distribute to internal paddings, if that's desirable. See std::fmt::Formatter for information you can obtain about formatting specifiers.
To clean up the conditionals, you could probably use format_args!, though I have little experience with it.
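The idea of distributing an externally specified width to the inner fields could look roughly like the sketch below. It assumes a variant of Prefix that holds optional values rather than two booleans, and an even split of the width between the two slots; both choices are illustrative and not part of the answer above. Formatter::width() returns the width requested by the caller, if any.

```rust
use std::fmt;

/// Hypothetical variant of `Prefix` that carries the values themselves.
pub struct Prefix {
    pub a: Option<f64>,
    pub b: Option<f64>,
}

impl fmt::Display for Prefix {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Width requested by the caller (e.g. 14 for "{:14}"); 12 if none was given.
        let total = f.width().unwrap_or(12);
        // Two columns are taken by the brackets; split the rest between the slots.
        let inner = total.saturating_sub(2);
        let wa = inner / 2;
        let wb = inner - wa;
        f.write_str("[")?;
        match self.a {
            Some(v) => write!(f, "{:<wa$.1}", v, wa = wa)?,
            None => write!(f, "{:<wa$}", "", wa = wa)?,
        }
        match self.b {
            Some(v) => write!(f, "{:<wb$.2}", v, wb = wb)?,
            None => write!(f, "{:<wb$}", "", wb = wb)?,
        }
        f.write_str("]")
    }
}
```

With this, println!("{:14}{:>10}!", prefix, "message") pads the two inner slots to six columns each and builds no intermediate String for the prefix.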

Fastest idiomatic I/O routine in Rust for programming contests?

My question has been partially answered, so I've revised it in response to things I've learned from comments and additional experiments.
In summary, I want a fast I/O routine for programming contests, in which problems are solved with a single file and no external crates. It should read in a sequence of whitespace-separated tokens from a BufRead (either stdin or a file). The tokens may be integers, floats or ASCII words, separated by spaces and newlines, so it seems I should support FromStr types generically. A small minority of problems are interactive, meaning not all of the input is available initially, but it always comes in complete lines.
For context, here's the discussion that led me to post here. Someone wrote very fast custom code to parse integers directly from the &[u8] output of BufRead::fill_buf(), but it's not generic in FromStr.
Here is my best solution so far (emphasis on the Scanner struct):
use std::io::{self, prelude::*};
fn solve<B: BufRead, W: Write>(mut scan: Scanner<B>, mut w: W) {
let n = scan.token();
let mut a = Vec::with_capacity(n);
let mut b = Vec::with_capacity(n);
for _ in 0..n {
a.push(scan.token::<i64>());
b.push(scan.token::<i64>());
}
let mut order: Vec<_> = (0..n).collect();
order.sort_by_key(|&i| b[i] - a[i]);
let ans: i64 = order
.into_iter()
.enumerate()
.map(|(i, x)| a[x] * i as i64 + b[x] * (n - 1 - i) as i64)
.sum();
writeln!(w, "{}", ans);
}
fn main() {
let stdin = io::stdin();
let stdout = io::stdout();
let reader = Scanner::new(stdin.lock());
let writer = io::BufWriter::new(stdout.lock());
solve(reader, writer);
}
pub struct Scanner<B> {
reader: B,
buf_str: String,
buf_iter: std::str::SplitWhitespace<'static>,
}
impl<B: BufRead> Scanner<B> {
pub fn new(reader: B) -> Self {
Self {
reader,
buf_str: String::new(),
buf_iter: "".split_whitespace(),
}
}
pub fn token<T: std::str::FromStr>(&mut self) -> T {
loop {
if let Some(token) = self.buf_iter.next() {
return token.parse().ok().expect("Failed parse");
}
self.buf_str.clear();
self.reader
.read_line(&mut self.buf_str)
.expect("Failed read");
self.buf_iter = unsafe { std::mem::transmute(self.buf_str.split_whitespace()) };
}
}
}
By avoiding unnecessary allocations, this Scanner is quite fast. If we didn't care about unsafety, it could be made even faster: instead of doing read_line() into a String, do read_until(b'\n') into a Vec<u8>, followed by str::from_utf8_unchecked().
However, I'd also like to know what's the fastest safe solution. Is there a clever way to tell Rust that what my Scanner implementation does is actually safe, eliminating the mem::transmute? Intuitively, it seems we should think of the SplitWhitespace object as owning the buffer until it's effectively dropped after it returns None.
All else being equal, I'd like a "nice" idiomatic standard library solution, as I'm trying to present Rust to others who do programming contests.
I'm so glad you asked, as I solved this exact problem in my LibCodeJam rust implementation. Specifically, reading raw tokens from a BufRead is handled by the TokensReader type as well as some tiny related helpers.
Here's the relevant excerpt. The basic idea is to scan the BufRead::fill_buf buffer for whitespace, copying non-whitespace characters into a local buffer that is reused between token calls. Once a whitespace character is found, or the stream ends, the local buffer is interpreted as UTF-8 and returned as an &str.
#[derive(Debug)]
pub enum LoadError {
Io(io::Error),
Utf8Error(Utf8Error),
OutOfTokens,
}
/// TokenBuffer is a reusable buffer into which tokens are
/// read, one by one. It is cleared but not deallocated
/// between tokens.
#[derive(Debug)]
struct TokenBuffer(Vec<u8>);
impl TokenBuffer {
/// Clear the buffer and start reading a new token
fn lock(&mut self) -> TokenBufferLock {
self.0.clear();
TokenBufferLock(&mut self.0)
}
}
/// TokenBufferLock is a helper type that helps manage the lifecycle
/// of reading a new token, then interpreting it as UTF-8.
#[derive(Debug)] // Default cannot be derived: &mut Vec<u8> has no Default impl
struct TokenBufferLock<'a>(&'a mut Vec<u8>);
impl<'a> TokenBufferLock<'a> {
/// Add some bytes to a token
fn extend(&mut self, chunk: &[u8]) {
self.0.extend(chunk)
}
/// Complete the token and attempt to interpret it as UTF-8
fn complete(self) -> Result<&'a str, LoadError> {
from_utf8(self.0).map_err(LoadError::Utf8Error)
}
}
pub struct TokensReader<R: io::BufRead> {
reader: R,
token: TokenBuffer,
}
impl<R: io::BufRead> Tokens for TokensReader<R> {
fn next_raw(&mut self) -> Result<&str, LoadError> {
use std::io::ErrorKind::Interrupted;
// Clear leading whitespace
loop {
match self.reader.fill_buf() {
Err(ref err) if err.kind() == Interrupted => continue,
Err(err) => return Err(LoadError::Io(err)),
Ok([]) => return Err(LoadError::OutOfTokens),
// Got some content; scan for the next non-whitespace character
Ok(buf) => match buf.iter().position(|byte| !byte.is_ascii_whitespace()) {
Some(i) => {
self.reader.consume(i);
break;
}
None => {
// Take the length first so `buf` is no longer borrowed when `consume` runs.
let len = buf.len();
self.reader.consume(len);
}
},
};
}
// If we reach this point, there is definitely a non-empty token ready to be read.
let mut token_buf = self.token.lock();
loop {
match self.reader.fill_buf() {
Err(ref err) if err.kind() == Interrupted => continue,
Err(err) => return Err(LoadError::Io(err)),
Ok([]) => return token_buf.complete(),
// Got some content; scan for the next whitespace character
Ok(buf) => match buf.iter().position(u8::is_ascii_whitespace) {
Some(i) => {
token_buf.extend(&buf[..i]);
self.reader.consume(i + 1);
return token_buf.complete();
}
None => {
token_buf.extend(buf);
// Take the length first so `buf` is no longer borrowed when `consume` runs.
let len = buf.len();
self.reader.consume(len);
}
},
}
}
}
}
This implementation doesn't parse strings into FromStr types (that's handled separately), but it does correctly accumulate bytes, split them into whitespace-separated tokens, and interpret those tokens as UTF-8. It assumes that only ASCII whitespace separates tokens.
It's worth noting that FromStr cannot be used directly on the fill_buf buffer, because there's no guarantee that a token doesn't straddle the boundary between two fill_buf calls, and there's no way to force a BufRead to read more bytes until the existing buffer is fully consumed. I'm assuming it's pretty obvious that once you have an Ok(&str), you can perform FromStr on it at your leisure.
This implementation is not 0-copy, but it is (amortized) 0-allocation, and it minimizes unnecessary copying or buffering. It uses a single persistent buffer that is only resized if it's too small for a single token, and it reuses this buffer between tokens. Bytes are copied into this buffer directly from the input BufRead buffer, without extra intermediary copying.
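Since the excerpt depends on pieces that are not shown (the Tokens trait and its imports), here is a condensed, self-contained sketch of the same fill_buf/consume technique with FromStr parsing layered on top. The TokenReader name and its methods are invented for this sketch and are not the LibCodeJam API. Error handling is simplified: token() folds I/O errors, end of input and parse failures into None.

```rust
use std::io::{self, BufRead, ErrorKind};
use std::str::FromStr;

/// Hypothetical condensed token reader; safe code only.
pub struct TokenReader<R> {
    reader: R,
    token: Vec<u8>, // reused between calls, so allocation is amortized
}

impl<R: BufRead> TokenReader<R> {
    pub fn new(reader: R) -> Self {
        Self { reader, token: Vec::new() }
    }

    /// Returns the next whitespace-separated token, or Ok(None) at end of input.
    pub fn next_raw(&mut self) -> io::Result<Option<&str>> {
        self.token.clear();
        loop {
            let buf = match self.reader.fill_buf() {
                Ok(buf) => buf,
                Err(e) if e.kind() == ErrorKind::Interrupted => continue,
                Err(e) => return Err(e),
            };
            if buf.is_empty() {
                break; // end of input
            }
            let mut used = 0;
            let mut done = false;
            for &byte in buf {
                used += 1;
                if !byte.is_ascii_whitespace() {
                    self.token.push(byte);
                } else if !self.token.is_empty() {
                    // Whitespace after at least one token byte ends the token.
                    done = true;
                    break;
                }
                // Otherwise this is leading whitespace, which is skipped.
            }
            self.reader.consume(used);
            if done {
                break;
            }
        }
        if self.token.is_empty() {
            return Ok(None);
        }
        std::str::from_utf8(&self.token)
            .map(Some)
            .map_err(|e| io::Error::new(ErrorKind::InvalidData, e))
    }

    /// Parses the next token into any FromStr type.
    pub fn token<T: FromStr>(&mut self) -> Option<T> {
        let raw = self.next_raw().ok()??;
        raw.parse().ok()
    }
}
```

Because bytes are accumulated in the reader's own Vec, a token that straddles two fill_buf calls is reassembled before parsing. For a contest solution this would be constructed as TokenReader::new(io::stdin().lock()).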

How to add special NotReady logic to tokio-io?

I'm trying to make a Stream that waits until a specific character is in the buffer. I know there's read_until() on BufRead, but I need a custom solution, as this is a stepping stone to waiting until a specific string is in the buffer (or, for example, a regexp match happens).
In the project where I first encountered the problem, future processing simply hung when I got a Ready(_) from the inner future and returned NotReady from my function. I discovered from the docs (last paragraph) that I shouldn't do that. What I didn't get is the actual alternative promised in that paragraph. I read all the published documentation on the Tokio site and it doesn't make sense to me at the moment.
Here is my current code. Unfortunately I couldn't make it simpler and smaller, as it's already broken. The current result is this:
Err(Custom { kind: Other, error: Error(Shutdown) })
Err(Custom { kind: Other, error: Error(Shutdown) })
Err(Custom { kind: Other, error: Error(Shutdown) })
<ad infinitum>
Expected result is getting some Ok(Ready(_)) out of it, while printing W and W', and waiting for specific character in buffer.
extern crate futures;
extern crate tokio_core;
extern crate tokio_io;
extern crate tokio_io_timeout;
extern crate tokio_process;
use futures::stream::poll_fn;
use futures::{Async, Poll, Stream};
use tokio_core::reactor::Core;
use tokio_io::AsyncRead;
use tokio_io_timeout::TimeoutReader;
use tokio_process::CommandExt;
use std::process::{Command, Stdio};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;
struct Process {
child: tokio_process::Child,
stdout: Arc<Mutex<tokio_io_timeout::TimeoutReader<tokio_process::ChildStdout>>>,
}
impl Process {
fn new(
command: &str,
reader_timeout: Option<Duration>,
core: &tokio_core::reactor::Core,
) -> Self {
let mut cmd = Command::new(command);
let cat = cmd.stdout(Stdio::piped());
let mut child = cat.spawn_async(&core.handle()).unwrap();
let stdout = child.stdout().take().unwrap();
let mut timeout_reader = TimeoutReader::new(stdout);
timeout_reader.set_timeout(reader_timeout);
let timeout_reader = Arc::new(Mutex::new(timeout_reader));
Self {
child,
stdout: timeout_reader,
}
}
}
fn work() -> Result<(), ()> {
let window = Arc::new(Mutex::new(Vec::new()));
let mut core = Core::new().unwrap();
let process = Process::new("cat", Some(Duration::from_secs(20)), &core);
let mark = Arc::new(Mutex::new(b'c'));
let read_until_stream = poll_fn({
let window = window.clone();
let timeout_reader = process.stdout.clone();
move || -> Poll<Option<u8>, std::io::Error> {
let mut buf = [0; 8];
let poll;
{
let mut timeout_reader = timeout_reader.lock().unwrap();
poll = timeout_reader.poll_read(&mut buf);
}
match poll {
Ok(Async::Ready(0)) => Ok(Async::Ready(None)),
Ok(Async::Ready(x)) => {
{
let mut window = window.lock().unwrap();
println!("W: {:?}", *window);
println!("buf: {:?}", &buf[0..x]);
window.extend(buf[0..x].into_iter().map(|x| *x));
println!("W': {:?}", *window);
if let Some(_) = window.iter().find(|c| **c == *mark.lock().unwrap()) {
Ok(Async::Ready(Some(1)))
} else {
Ok(Async::NotReady)
}
}
}
Ok(Async::NotReady) => Ok(Async::NotReady),
Err(e) => Err(e),
}
}
});
let _stream_thread = thread::spawn(move || {
for o in read_until_stream.wait() {
println!("{:?}", o);
}
});
match core.run(process.child) {
Ok(_) => {}
Err(e) => {
println!("Child error: {:?}", e);
}
}
Ok(())
}
fn main() {
work().unwrap();
}
This is a complete example project.
If you need more data you need to call poll_read again until you either find what you were looking for or poll_read returns NotReady.
You might want to avoid looping in one task for too long, so you can build yourself a yield_task function to call instead if poll_read didn't return NotReady; it makes sure your task gets called again ASAP after other pending tasks were run.
To use it just run return yield_task();.
fn yield_inner() {
use futures::task;
task::current().notify();
}
#[inline(always)]
pub fn yield_task<T, E>() -> Poll<T, E> {
yield_inner();
Ok(Async::NotReady)
}
Also see futures-rs issue #354: Handle long-running, always-ready futures fairly.
With the new async/await API futures::task::current is gone; instead you'll need a std::task::Context reference, which is provided as parameter to the new std::future::Future::poll trait method.
If you're already manually implementing the std::future::Future trait you can simply insert:
context.waker().wake_by_ref();
return std::task::Poll::Pending;
Or build yourself a Future-implementing type that yields exactly once:
pub struct Yield {
ready: bool,
}
impl core::future::Future for Yield {
type Output = ();
fn poll(self: core::pin::Pin<&mut Self>, cx: &mut core::task::Context<'_>) -> core::task::Poll<Self::Output> {
let this = self.get_mut();
if this.ready {
core::task::Poll::Ready(())
} else {
cx.waker().wake_by_ref();
this.ready = true; // ready next round
core::task::Poll::Pending
}
}
}
pub fn yield_task() -> Yield {
Yield { ready: false }
}
And then use it in async code like this:
yield_task().await;
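To see what the Yield future does without pulling in an executor, it can be polled by hand. The sketch below repeats the Yield definition so that it is self-contained, and adds a hypothetical CountingWaker and drive_yield helper (both invented for this illustration) built on std::task::Wake:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

pub struct Yield {
    ready: bool,
}

impl Future for Yield {
    type Output = ();
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        let this = self.get_mut();
        if this.ready {
            Poll::Ready(())
        } else {
            // Ask to be polled again, then report Pending exactly once.
            cx.waker().wake_by_ref();
            this.ready = true;
            Poll::Pending
        }
    }
}

/// Hypothetical waker that only counts how often it is woken.
struct CountingWaker(AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Polls a `Yield` to completion and returns (pending polls, wake-ups).
pub fn drive_yield() -> (usize, usize) {
    let counter = Arc::new(CountingWaker(AtomicUsize::new(0)));
    let waker = Waker::from(counter.clone());
    let mut cx = Context::from_waker(&waker);
    let mut fut = Yield { ready: false };
    let mut pending = 0;
    loop {
        match Pin::new(&mut fut).poll(&mut cx) {
            Poll::Ready(()) => break,
            Poll::Pending => pending += 1,
        }
    }
    (pending, counter.0.load(Ordering::SeqCst))
}
```

The first poll wakes the task and returns Pending; the second returns Ready. A real executor would use that wake-up to reschedule the task behind other pending work.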

Is there a compact and idiomatic way to print an error and return without returning the error?

I'm writing a function that will be called in an infinite loop and only execute something when getting well-formed data from a web-service. If the service is down, returns non-json, or returns json we do not understand, the function should just log the error and return (to be called again after a pause).
I found myself copying and pasting something like this:
let v = match v {
Ok(data) => data,
Err(error) => {
println!("Error decoding json: {:?}", error);
return;
}
};
The body of the error matcher would be different each time. Sometimes it's panic, sometimes it has different messages, and sometimes elements of error could be broken down further to form a better message, but the rest of the construct would be the same.
Is there a shorthand for this? I'm aware of the ? syntax, but that's for propagation. I don't feel that propagation will help with the scenario when you need slightly different processing in case of the error like in the scenario described above. This is because the particular differences in handling belong right here, not up the stack.
I have not written a lot of code in Rust yet so it is very likely that I'm missing something obvious.
In C#, the above would look something like this:
if (v == null)
{
Console.WriteLine("Error decoding json!");
return;
}
or
if (error != null)
{
Console.WriteLine($"Error decoding json: {error}");
return;
}
both of which are much less verbose than in Rust.
If I understood the comments below, one way of shortening would be something like this:
if let Err(error) = v {
println!("Error decoding json: {:?}", error);
return;
}
let v = v.unwrap();
This looks more compact, thank you. Is this idiomatic? Would you write it this way?
I don't feel that propagation will help with the scenario when you need slightly different processing in case of the error like in the scenario described above. This is because the particular differences in handling belong right here, not up the stack.
This is something a custom error type can help with. In this case you have a common behavior ("log an error") and you want to do that in slightly different ways for different values. It makes sense to move the "log an error" part up to the caller (let's call the function try_poll):
loop {
if let Err(e) = try_poll() {
println!("{}", e);
}
sleep(100);
}
And create a type that implements Display, and From<E> for each error type E:
enum PollError {
NetworkError(NetworkError),
JsonParseError(JsonParseError),
}
impl fmt::Display for PollError {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match *self {
PollError::NetworkError(ref e) => write!(f, "Error downloading file: {:?}", e),
PollError::JsonParseError(ref e) => write!(f, "Error parsing JSON: {:?}", e),
}
}
}
impl From<NetworkError> for PollError {
fn from(e: NetworkError) -> Self {
PollError::NetworkError(e)
}
}
impl From<JsonParseError> for PollError {
fn from(e: JsonParseError) -> Self {
PollError::JsonParseError(e)
}
}
Now you can use ? to propagate the error, but the caller still doesn't have to be concerned with which error specifically it is.
fn try_poll() -> Result<(), PollError> {
let data = try_fetch_content()?;
let json = try_parse_json(data)?;
println!("Parsed {:?}", json);
Ok(())
}
Ok, I want that, but without all the From implementations.
The tedious part about this is all the impl Froms, which are necessary because of the custom error type. If the only thing that will ever be done with an error is log and ignore it, a custom error type is not particularly useful -- the only thing that really needs to be returned is the error message itself.
In that case, have try_poll instead return Result<(), String>, and use Result::map_err to turn each individual error immediately into an error message, before using ? to propagate it:
fn try_poll() -> Result<(), String> {
let data = try_fetch_content()
.map_err(|e| format!("Error downloading file: {:?}", e))?;
let json = try_parse_json(data)
.map_err(|e| format!("Error parsing JSON: {:?}", e))?;
println!("Parsed {:?}", json);
Ok(())
}
The first edition of The Rust Programming Language has this to say about String as an error type:
A rule of thumb is to define your own error type, but a String error type will do in a pinch, particularly if you're writing an application. If you're writing a library, defining your own error type should be strongly preferred so that you don't remove choices from the caller unnecessarily.
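The try_poll snippets above rely on try_fetch_content and try_parse_json, which are not shown. A runnable sketch of the map_err variant follows, with hypothetical stand-ins for both: the "download" just echoes a string and the "JSON" is a bare integer. The signatures are assumptions made only so the example compiles on its own.

```rust
use std::io;
use std::num::ParseIntError;

// Hypothetical stand-in for the network call.
fn try_fetch_content(service_up: bool, body: &str) -> Result<String, io::Error> {
    if service_up {
        Ok(body.to_string())
    } else {
        Err(io::Error::new(io::ErrorKind::ConnectionRefused, "service down"))
    }
}

// Hypothetical stand-in for JSON parsing: the body must be an integer.
fn try_parse_json(data: &str) -> Result<i64, ParseIntError> {
    data.trim().parse()
}

/// Each error is turned into a message where it occurs, then propagated with `?`.
pub fn try_poll(service_up: bool, body: &str) -> Result<i64, String> {
    let data = try_fetch_content(service_up, body)
        .map_err(|e| format!("Error downloading file: {:?}", e))?;
    let json = try_parse_json(&data)
        .map_err(|e| format!("Error parsing JSON: {:?}", e))?;
    Ok(json)
}
```

The calling loop then only needs if let Err(e) = try_poll(..) { println!("{}", e); } and never has to know which step failed.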
As an alternative to a custom macro_rules! macro, you could also use ? with Option<T> and an extension trait for Result that prints errors and converts successful values.
pub trait ResultOkPrintErrExt<T> {
fn ok_or_print_err(self, msg: &str) -> Option<T>;
}
impl<T, E> ResultOkPrintErrExt<T> for Result<T, E>
where
E: ::std::fmt::Debug,
{
fn ok_or_print_err(self, msg: &str) -> Option<T> {
match self {
Ok(v) => Some(v),
Err(e) => {
eprintln!("{}: {:?}", msg, e);
None
}
}
}
}
fn read_input() -> Result<u32, ()> {
// Ok(5)
Err(())
}
fn run() -> Option<()> {
let v: u32 = read_input().ok_or_print_err("invalid input")?;
println!("got input: {}", v);
Some(())
}
fn main() {
run();
}

Trouble with buffer types in mio

I'd like to write an asynchronous server in Rust using mio and I have trouble with the buffer types. I've tried different buffer types and can't get it to work. My current code is:
extern crate mio;
extern crate bytes;
use std::io;
use std::io::{Error, ErrorKind};
use std::net::SocketAddr;
use std::str::FromStr;
use std::io::Cursor;
use self::mio::PollOpt;
use self::mio::EventLoop;
use self::mio::EventSet;
use self::mio::Token;
use self::mio::Handler;
use self::mio::io::TryRead;
use self::mio::io::TryWrite;
//use self::mio::buf::ByteBuf;
//use self::mio::buf::Buf;
use self::mio::tcp::*;
use self::bytes::buf::Buf;
use self::bytes::buf::byte::ByteBuf;
struct EventHandler;
impl Handler for EventHandler {
type Timeout = ();
type Message = ();
fn ready(&mut self, event_loop: &mut EventLoop<EventHandler>, token: Token, events: EventSet) {
}
}
pub struct Connection {
sock: TcpStream,
send_queue: Vec<ByteBuf>,
}
impl Connection {
pub fn writable(&mut self, event_loop: &mut EventLoop<EventHandler>) -> Result<(), String> {
while !self.send_queue.is_empty() {
if !self.send_queue.first().unwrap().has_remaining() {
self.send_queue.pop();
}
let buf = self.send_queue.first_mut().unwrap();
match self.sock.try_write_buf(&mut buf) {
Ok(None) => {
return Ok(());
}
Ok(Some(n)) => {
continue;
}
Err(e) => {
return Err(format!("{}", e));
}
}
}
Ok(())
}
}
fn main() {
println!("Hello, world!");
}
The Cargo.toml contains the following dependencies:
mio = "*"
bytes = "*"
which currently translates to bytes 0.2.11 and mio 0.4.3 in Cargo.lock.
The error I am getting is this:
main.rs:45:29: 45:52 error: the trait `bytes::buf::Buf` is not implemented
for the type `&mut bytes::buf::byte::ByteBuf` [E0277]
main.rs:45 match self.sock.try_write_buf(&mut buf) {
I'd want to be able to write a Vec<u8> into the socket and handle the case when the buffer is only partially written. How can I accomplish that?
I don't need explanation about the code that properly handles the return values, this question is about the buffer type. I have no idea which buffer type I have to use.
The problem is this:
let buf = self.send_queue.first_mut().unwrap();
match self.sock.try_write_buf(&mut buf) {
You pass in an &mut &mut ByteBuf to try_write_buf because buf is already an &mut ByteBuf. Just drop the extra &mut:
let buf = self.send_queue.first_mut().unwrap();
match self.sock.try_write_buf(buf) {
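The same situation can be reproduced without mio. The sketch below uses hypothetical stand-ins for the Buf trait, ByteBuf and try_write_buf (they are not the real mio or bytes definitions) to show why the extra &mut changes the inferred type:

```rust
/// Stand-in trait, implemented for `ByteBuf` but not for `&mut ByteBuf`.
pub trait Buf {
    fn remaining(&self) -> usize;
}

/// Stand-in buffer: `pos` marks how much of `data` was already written.
pub struct ByteBuf {
    pub data: Vec<u8>,
    pub pos: usize,
}

impl Buf for ByteBuf {
    fn remaining(&self) -> usize {
        self.data.len().saturating_sub(self.pos)
    }
}

/// Stand-in for a generic write call that takes `&mut B` with `B: Buf`.
pub fn try_write_buf<B: Buf>(buf: &mut B) -> usize {
    buf.remaining()
}

pub fn remaining_in_first(queue: &mut Vec<ByteBuf>) -> Option<usize> {
    // `first_mut` already yields `&mut ByteBuf`.
    let buf = queue.first_mut()?;
    // `try_write_buf(&mut buf)` would infer B = `&mut ByteBuf`,
    // which has no `Buf` impl; passing `buf` itself gives B = `ByteBuf`.
    Some(try_write_buf(buf))
}
```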
