Idiomatic way of mimicking Python's input function in Rust

I have three different versions of a function that mimics the input function from Python.
use std::io::{self, BufRead, BufReader, Write};
// Adapted from https://docs.rs/python-input/0.8.0/src/python_input/lib.rs.html#13-23
fn input_1(prompt: &str) -> io::Result<String> {
print!("{}", prompt);
io::stdout().flush()?;
let mut buffer = String::new();
io::stdin().read_line(&mut buffer)?;
Ok(buffer.trim_end().to_string())
}
// https://www.reddit.com/r/rust/comments/6qn3y0/store_user_inputs_in_rust/
fn input_2(prompt: &str) -> io::Result<String> {
print!("{}", prompt);
io::stdout().flush()?;
BufReader::new(io::stdin())
.lines()
.next()
.ok_or_else(|| io::Error::new(io::ErrorKind::Other, "Cannot read stdin"))
.and_then(|inner| inner)
}
// tranzystorek user on Discord (edited for future reference)
fn input_3(prompt: &str) -> io::Result<String> {
print!("{}", prompt);
std::io::stdout().flush()?;
BufReader::new(std::io::stdin().lock())
.lines()
.take(1)
.collect()
}
fn main() {
let name = input_1("What's your name? ").unwrap();
println!("Hello, {}!", name);
let name = input_2("What's your name? ").unwrap();
println!("Hello, {}!", name);
let name = input_3("What's your name? ").unwrap();
println!("Hello, {}!", name);
}
But they seem to be very different approaches and I don't know if there's any advantage to using one over the other. From what I've read, having a function like Python's input is not as simple as it seems, which is why there's none in the standard library.
What problems could I face using any of the versions written above? Is there another, more idiomatic, way of writing this input function? (2018 edition)
Also, in "How can I read a single line from stdin?", some of the answers use the lock() method, but I don't get its purpose.
I'm learning Rust coming from Python.

This is a question of style mostly - both methods are acceptable. Most of the Rustaceans I know would probably favour the second approach, as it's more functional in style but it really doesn't matter in this case.
The key change I'd make is use of the lock method in your second example.
To understand the lock method, consider the following scenario: if your application was multithreaded, and two threads attempted to read from stdin at the same time, what would happen?
The lock ensures that only one thread can access stdin at a time. You always access stdin through a lock. In fact, if you look at the implementation of Stdin::read_line (the method you call in the first example), you'll see it's this very simple one-liner:
self.lock().read_line(buf)
So even when you aren't explicitly calling lock it's still being used behind the scenes.
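Secondly, .next() blocks until a line has been entered, so in interactive use it won't return None and you can use .unwrap() here rather than .ok_or/.and_then. Be aware that it does return None at end of input (for example when stdin is closed or redirected from an empty file), in which case .unwrap() panics.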
Lastly you missed out the .trim_end() that you had in input_1 ;).
fn input_2(prompt: &str) -> io::Result<String> {
print!("{}", prompt);
io::stdout().flush()?;
io::stdin()
.lock()
.lines()
.next()
.unwrap()
.map(|x| x.trim_end().to_owned())
}
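If you want a version that can be unit tested and that reports end of input as an error instead of panicking, one option is to make the function generic over the reader and the writer. This is only a minimal sketch: the names input_from and input are made up for illustration, and mapping end of input to ErrorKind::UnexpectedEof is my own assumption about what is convenient, not something the standard library prescribes.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical helper: writes `prompt` to `writer`, then reads one line from `reader`.
fn input_from<R: BufRead, W: Write>(
    mut reader: R,
    mut writer: W,
    prompt: &str,
) -> io::Result<String> {
    write!(writer, "{}", prompt)?;
    writer.flush()?;
    let mut buffer = String::new();
    // read_line returns Ok(0) only when the end of input has been reached.
    if reader.read_line(&mut buffer)? == 0 {
        return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "end of input"));
    }
    // Same trimming as input_1: drops the newline and any other trailing whitespace.
    Ok(buffer.trim_end().to_string())
}

/// Hypothetical convenience wrapper over the real stdin and stdout.
#[allow(dead_code)]
fn input(prompt: &str) -> io::Result<String> {
    input_from(io::stdin().lock(), io::stdout().lock(), prompt)
}

fn main() -> io::Result<()> {
    // In-memory stand-ins for stdin and stdout, so the example runs unattended.
    let mut shown = Vec::new();
    let name = input_from("Ferris\n".as_bytes(), &mut shown, "What's your name? ")?;
    println!("Hello, {}!", name);
    Ok(())
}
```

In a real program you would call input("What's your name? ") and keep input_from for tests.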

Related

How to capture the content of stdout/stderr when I cannot change the code that prints?

I have a function foo that can't be modified and contains println! and eprintln! code in it.
fn foo() {
println!("hello");
}
After I call the function, I have to test what it printed so I want to capture the stdout/stderr into a variable.
I strongly recommend against doing this, but if you are using nightly and don't mind using a feature that seems unlikely to ever be stabilized, you can directly capture stdout and stderr using hidden functionality of the standard library:
#![feature(internal_output_capture)]
use std::sync::Arc;
fn foo() {
println!("hello");
eprintln!("world");
}
fn main() {
std::io::set_output_capture(Some(Default::default()));
foo();
let captured = std::io::set_output_capture(None);
let captured = captured.unwrap();
let captured = Arc::try_unwrap(captured).unwrap();
let captured = captured.into_inner().unwrap();
let captured = String::from_utf8(captured).unwrap();
assert_eq!(captured, "hello\nworld\n");
}
It's very rare that a function "cannot be changed", so I'd encourage you to do so and use dependency injection instead. For example, if you are able to edit foo but do not want to change its signature, move all the code to a new function with generics which you can test directly:
use std::io::{self, Write};
fn foo() {
foo_inner(io::stdout(), io::stderr()).unwrap()
}
fn foo_inner(mut out: impl Write, mut err: impl Write) -> io::Result<()> {
writeln!(out, "hello")?;
writeln!(err, "world")?;
Ok(())
}
See also:
How can I test stdin and stdout?
How to take ownership of T from Arc<Mutex<T>>?
How do I convert a Vector of bytes (u8) to a string?
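With the foo_inner split shown above, a test can pass in-memory buffers instead of the real streams. A minimal sketch, where capture_foo is a hypothetical helper name and Vec<u8> is used simply because it implements Write:

```rust
use std::io::{self, Write};

fn foo_inner(mut out: impl Write, mut err: impl Write) -> io::Result<()> {
    writeln!(out, "hello")?;
    writeln!(err, "world")?;
    Ok(())
}

/// Hypothetical helper: runs foo_inner against two in-memory buffers
/// and returns what was written to each as a String.
fn capture_foo() -> (String, String) {
    let mut out = Vec::new();
    let mut err = Vec::new();
    // &mut Vec<u8> implements Write, so the buffers can be inspected afterwards.
    foo_inner(&mut out, &mut err).unwrap();
    (
        String::from_utf8(out).unwrap(),
        String::from_utf8(err).unwrap(),
    )
}

fn main() {
    let (out, err) = capture_foo();
    assert_eq!(out, "hello\n");
    assert_eq!(err, "world\n");
    println!("stdout: {:?}, stderr: {:?}", out, err);
}
```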
Not sure if this would work on Windows, but it should work on Unix-like systems. You should replace the file descriptor with something you can read later. I don't think it is really easy.
I would suggest using stdio_override, which already does that for you using files. You can redirect the output, then execute the function and then read the file content.
From the example:
use stdio_override::StdoutOverride;
use std::fs;
let file_name = "./test.txt";
let guard = StdoutOverride::override_file(file_name)?;
println!("Isan to Stdout!");
let contents = fs::read_to_string(file_name)?;
assert_eq!("Isan to Stdout!\n", contents);
drop(guard);
println!("Outside!");
The library also supports anything that implements AsRawFd, through the override_raw call, which suggests that it will probably only work on Unix.
Otherwise, you can look at the implementation to see how it is done internally, and maybe you could pass a writer instead of a file somehow.
Shadow println!:
use std::{fs::File, io::Write, mem::MaybeUninit, sync::Mutex};
static mut FILE: MaybeUninit<Mutex<File>> = MaybeUninit::uninit();
macro_rules! println {
($($tt:tt)*) => {{
unsafe { writeln!(&mut FILE.assume_init_mut().lock().unwrap(), $($tt)*).unwrap(); }
}}
}
fn foo() {
println!("hello");
}
fn main() {
unsafe {
FILE.write(Mutex::new(File::create("out").unwrap()));
}
foo();
}
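The same shadowing trick can also write into memory and avoid both the file and the unsafe block. This is a sketch under a few assumptions: it needs Rust 1.63 or later (where Mutex::new can be used in a static), the names CAPTURED and take_captured are made up for illustration, and like the version above it only affects code compiled in the same crate after the macro definition.

```rust
use std::io::Write;
use std::sync::Mutex;

// Hypothetical in-memory sink that replaces the file.
static CAPTURED: Mutex<Vec<u8>> = Mutex::new(Vec::new());

// Shadows std's println! for code that appears after this point in the crate.
macro_rules! println {
    ($($tt:tt)*) => {{
        writeln!(CAPTURED.lock().unwrap(), $($tt)*).unwrap();
    }};
}

fn foo() {
    println!("hello");
}

/// Hypothetical helper: returns everything captured so far and empties the sink.
fn take_captured() -> String {
    let bytes = std::mem::take(&mut *CAPTURED.lock().unwrap());
    String::from_utf8(bytes).unwrap()
}

fn main() {
    foo();
    let captured = take_captured();
    assert_eq!(captured, "hello\n");
    // The path-qualified macro still reaches the real stdout.
    std::println!("captured: {:?}", captured);
}
```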

Is there a simpler way to pass a BufReader to a function?

To read the bytes of a PNG file, I want to create a function called read_8_bytes which will read the next 8 bytes in the file each time it's called.
use std::fs::File;
use std::io::{BufReader, Read};
fn main() {
let png = File::open("test.png").expect("1");
let mut png_reader = BufReader::new(png);
let mut byte_buffer: Vec<u8> = vec![0; 8];
png_reader.read_exact(&mut byte_buffer).expect("2");
}
This works fine and if I keep calling read_exact from main I can read the next 8 bytes. I tried to create a function to do this and the solution just seems needlessly complicated. I'm wondering if there is a better way.
I thought I have to pass the BufReader to the function, but due to how Rust works this makes things complicated and I end up working out I need to do something like:
fn read_eight_bytes<R: BufRead>(fd: &mut R)
This compiles but I'm not happy because I don't understand why this needed to be done and seems complex. Is there a simple way of having a function I can pass a file descriptor type thing to and have it store the position like in C without having to do this?
Looking at your question, I think you are trying to say that you are confused as to why the <R: BufRead> is necessary or furthermore why this even works.
In your example, this generic is not strictly necessary. One could implement the function you describe like so:
use std::{fs, io};
fn main() -> io::Result<()> {
let mut file = fs::File::open("./path/to/file")?;
let bytes = read_eight_bytes(&mut file)?;
println!("{:?}", bytes);
Ok(())
}
fn read_eight_bytes(file: &mut fs::File) -> io::Result<[u8; 8]> {
use io::Read;
let mut bytes = [0; 8];
file.read_exact(&mut bytes)?;
Ok(bytes)
}
This is perfectly valid and hopefully should make sense.
But then, why does fn read_eight_bytes<R: BufRead>(file: &mut R) -> [u8; 8] work? First of all, I assume you understand the following concepts:
Generics
Traits
Given an understanding of the above concepts, you should know that this syntax means that the function read_eight_bytes is a generic function with a generic type named R. You should then also understand that the generic has a trait bound, requiring the type R to implement BufRead. And that this function takes a parameter which is a mutable reference to the variable file, which is of the type R.
Now taking a look at the definition of BufRead: we see that it contains several functions. But surprisingly there is no read_exact function! Why does a function like this compile?
use std::{fs, io};
use io::BufRead;
fn main() -> io::Result<()> {
let file = fs::File::open("./path/to/file")?;
let mut reader = io::BufReader::new(file);
let bytes = read_eight_bytes(&mut reader)?;
println!("{:?}", bytes);
Ok(())
}
fn read_eight_bytes<R: BufRead>(reader: &mut R) -> io::Result<[u8; 8]> {
let mut bytes = [0; 8];
reader.read_exact(&mut bytes)?;
Ok(bytes)
}
Note: I have altered the return type to io::Result<...>. This is considered to be a better practice compared to unwrapping every Result.
I have also changed the function call to use a BufReader because BufReader implements BufRead whilst File does not. I will cover the difference a little further below.
The reason this works is that Read is a supertrait of BufRead. This means that any type that implements BufRead must also implement Read. And thus it must have the read_exact function!
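If the supertrait relationship is new to you, here is a small self-contained sketch of the same mechanism. The traits Named and Greeter and the type Robot are made up for illustration; Named plays the role of Read and Greeter plays the role of BufRead.

```rust
// `Named` plays the role of `Read`: it is the supertrait.
trait Named {
    fn name(&self) -> String;
}

// `Greeter` plays the role of `BufRead`: every `Greeter` must also be `Named`.
trait Greeter: Named {
    fn greet(&self) -> String {
        format!("Hello from {}!", self.name())
    }
}

struct Robot;

impl Named for Robot {
    fn name(&self) -> String {
        "robot".to_string()
    }
}

impl Greeter for Robot {}

// The bound only mentions `Greeter`, yet `name` (declared on `Named`) is callable.
fn describe<G: Greeter>(g: &G) -> String {
    format!("{} says: {}", g.name(), g.greet())
}

fn main() {
    println!("{}", describe(&Robot));
}
```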
Given our function never requires the functions on BufRead we could change the trait bound to only require Read:
use std::{fs, io};
use io::Read;
fn main() -> io::Result<()> {
let file = fs::File::open("./path/to/file")?;
let mut reader = io::BufReader::new(file);
let bytes = read_eight_bytes(&mut reader)?;
println!("{:?}", bytes);
Ok(())
}
fn read_eight_bytes<R: Read>(reader: &mut R) -> io::Result<[u8; 8]> {
let mut bytes = [0; 8];
reader.read_exact(&mut bytes)?;
Ok(bytes)
}
Now here is something interesting about this change. The read_eight_bytes function can now be called in (at least) two different ways:
use std::{fs, io};
use io::Read;
fn main() -> io::Result<()> {
let mut file = fs::File::open("./path/to/file")?;
let bytes = read_eight_bytes(&mut file)?;
println!("{:?}", bytes);
let file = fs::File::open("./path/to/file")?;
let mut reader = io::BufReader::new(file);
let bytes = read_eight_bytes(&mut reader)?;
println!("{:?}", bytes);
Ok(())
}
fn read_eight_bytes<R: Read>(reader: &mut R) -> io::Result<[u8; 8]> {
let mut bytes = [0; 8];
reader.read_exact(&mut bytes)?;
Ok(bytes)
}
Why is this? This is because both File and BufReader implement the Read trait. And thus can both be used with the read_eight_bytes function!
So then why would someone want to use either File or BufReader over the other?
Well the BufReader documentation explains this:
The BufReader struct adds buffering to any reader.
It can be excessively inefficient to work directly with a Read
instance. For example, every call to read on TcpStream results in a
system call. A BufReader performs large, infrequent reads on the
underlying Read and maintains an in-memory buffer of the results.
BufReader can improve the speed of programs that make small and
repeated read calls to the same file or network socket. It does not
help when reading very large amounts at once, or reading just one or a
few times. It also provides no advantage when reading from a source
that is already in memory, like a Vec.
Now, remember how before we wrote this function just for the File type? The primary reason why one would want to write it with generics would be such that a caller can make the choice presented above. This is common practice in libraries where such a choice really does matter. However, generics come at the cost of increased compile times (when used excessively) and increased code complexity.
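As for keeping the position "like in C": the reader itself tracks it, so each call through the &mut reference continues where the previous one stopped. Here is a sketch that uses an in-memory byte slice (which also implements Read) instead of a file, so it runs without test.png; the 16 bytes are the PNG signature followed by the length and type fields of an IHDR chunk.

```rust
use std::io::{self, Read};

fn read_eight_bytes<R: Read>(reader: &mut R) -> io::Result<[u8; 8]> {
    let mut bytes = [0; 8];
    reader.read_exact(&mut bytes)?;
    Ok(bytes)
}

fn main() -> io::Result<()> {
    // Stand-in for the start of a PNG file: 8-byte signature, then IHDR length and type.
    let data: [u8; 16] = [
        0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A,
        0x00, 0x00, 0x00, 0x0D, b'I', b'H', b'D', b'R',
    ];
    // Reading from a &[u8] advances the slice, just as reading a File advances its cursor.
    let mut reader: &[u8] = &data;

    let signature = read_eight_bytes(&mut reader)?;
    let next = read_eight_bytes(&mut reader)?;
    println!("{:?}", signature);
    println!("{:?}", next);

    // Nothing is left, so a third call reports an error instead of returning stale data.
    assert!(read_eight_bytes(&mut reader).is_err());
    Ok(())
}
```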

Rust chunks method with owned values?

I'm trying to perform a parallel operation on several chunks of strings at a time, and I'm having an issue with the borrow checker:
(for context, identifiers is a Vec<String> from a CSV file, client is reqwest and target is an Arc<String> that is write once read many)
use futures::{stream, StreamExt};
use std::sync::Arc;
async fn nop(
person_ids: &[String],
target: &str,
url: &str,
) -> String {
let noop = format!("{} {}", target, url);
let noop2 = person_ids.iter().for_each(|f| {f.as_str();});
"Some text".into()
}
#[tokio::main]
async fn main() {
let target = Arc::new(String::from("sometext"));
let url = "http://example.com";
let identifiers = vec!["foo".into(), "bar".into(), "baz".into(), "qux".into(), "quux".into(), "quuz".into(), "corge".into(), "grault".into(), "garply".into(), "waldo".into(), "fred".into(), "plugh".into(), "xyzzy".into()];
let id_sets: Vec<&[String]> = identifiers.chunks(2).collect();
let responses = stream::iter(id_sets)
.map(|person_ids| {
let target = target.clone();
tokio::spawn( async move {
let resptext = nop(person_ids, target.as_str(), url).await;
})
})
.buffer_unordered(2);
responses
.for_each(|b| async { })
.await;
}
Playground: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=e41c635e99e422fec8fc8a581c28c35e
Given chunks yields a Vec<&[String]>, the compiler complains that identifiers doesn't live long enough because it potentially goes out of scope while the slices are being referenced. Realistically this won't happen because there's an await. Is there a way to tell the compiler that this is safe, or is there another way of getting chunks as a set of owned Strings for each thread?
There was a similarly asked question that used into_owned() as a solution, but when I try that, rustc complains about the slice size not being known at compile time in the request_user function.
EDIT: Some other questions as well:
Is there a more direct way of using target in each thread without needing Arc? From the moment it is created, it never needs to be modified, just read from. If not, is there a way of pulling it out of the Arc that doesn't require the .as_str() method?
How do you handle multiple error types within the tokio::spawn() block? In the real world use, I'm going to receive quick_xml::Error and reqwest::Error within it. It works fine without tokio spawn for concurrency.
Is there a way to tell the compiler that this is safe, or is there another way of getting chunks as a set of owned Strings for each thread?
You can chunk a Vec<T> into a Vec<Vec<T>> without cloning by using the itertools crate:
use itertools::Itertools;
fn main() {
let items = vec![
String::from("foo"),
String::from("bar"),
String::from("baz"),
];
let chunked_items: Vec<Vec<String>> = items
.into_iter()
.chunks(2)
.into_iter()
.map(|chunk| chunk.collect())
.collect();
for chunk in chunked_items {
println!("{:?}", chunk);
}
}
["foo", "bar"]
["baz"]
This is based on the answers here.
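If you would rather not add a dependency, roughly the same result can be had with the standard library alone. This is a sketch, not what itertools does internally; into_chunks is a hypothetical helper name.

```rust
/// Hypothetical helper: splits `items` into owned chunks of at most `size` elements
/// without cloning any element.
fn into_chunks<T>(items: Vec<T>, size: usize) -> Vec<Vec<T>> {
    assert!(size > 0, "chunk size must be non-zero");
    let mut iter = items.into_iter();
    let mut chunks = Vec::new();
    loop {
        // by_ref lets take() consume part of the iterator and leave the rest for later.
        let chunk: Vec<T> = iter.by_ref().take(size).collect();
        if chunk.is_empty() {
            break;
        }
        chunks.push(chunk);
    }
    chunks
}

fn main() {
    let items = vec![
        String::from("foo"),
        String::from("bar"),
        String::from("baz"),
    ];
    for chunk in into_chunks(items, 2) {
        println!("{:?}", chunk);
    }
}
```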
Your issue here is that id_sets is a vector of slices that borrow from identifiers. tokio::spawn requires the spawned future to be 'static, because the task may outlive the function that created it, so the async move block cannot hold those borrows.
Your solution to the immediate problem is to convert the Vec<&[String]> to a Vec<Vec<String>> type.
A way of accomplishing that would be:
let id_sets: Vec<Vec<String>> = identifiers
.chunks(2)
.map(|x: &[String]| x.to_vec())
.collect();
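To show why the owned chunks help, here is a sketch that moves each one into its own task. It uses std::thread::spawn as a stand-in for tokio::spawn so that it runs without any crates (both require the spawned work to be 'static); process_in_chunks is a hypothetical name and joining the ids with commas is just placeholder work.

```rust
use std::thread;

/// Hypothetical helper: splits `identifiers` into owned chunks and handles each on its own thread.
fn process_in_chunks(identifiers: &[String], size: usize) -> Vec<String> {
    let id_sets: Vec<Vec<String>> = identifiers
        .chunks(size)
        .map(|x: &[String]| x.to_vec())
        .collect();

    // Each closure owns its Vec<String>, so nothing borrowed from `identifiers` crosses the spawn.
    let handles: Vec<thread::JoinHandle<String>> = id_sets
        .into_iter()
        .map(|person_ids| thread::spawn(move || person_ids.join(",")))
        .collect();

    // Joining the handles in order keeps the results in chunk order.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let identifiers: Vec<String> = ["foo", "bar", "baz", "qux", "quux"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    for line in process_in_chunks(&identifiers, 2) {
        println!("{}", line);
    }
}
```

The cost is one clone of every String; if identifiers is not needed afterwards, consuming the Vec instead avoids that.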

Read file character-by-character in Rust

Is there an idiomatic way to process a file one character at a time in Rust?
This seems to be roughly what I'm after:
let mut f = io::BufReader::new(try!(fs::File::open("input.txt")));
for c in f.chars() {
println!("Character: {}", c.unwrap());
}
But Read::chars is still unstable as of Rust v1.6.0.
I considered using Read::read_to_string, but the file may be large and I don't want to read it all into memory.
Let's compare 4 approaches.
1. Read::chars
You could copy Read::chars implementation, but it is marked unstable with
the semantics of a partial read/write of where errors happen is currently unclear and may change
so some care must be taken. Anyway, this seems to be the best approach.
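To give an idea of what such a hand-rolled reader involves, here is a minimal sketch. It is not the standard library's implementation: read_char is a hypothetical helper, it treats any malformed sequence as an InvalidData error, and it should be fed a BufReader, since it reads one to four bytes per call.

```rust
use std::io::{self, ErrorKind, Read};

/// Hypothetical helper: reads one UTF-8 encoded char, or returns Ok(None) at end of input.
fn read_char<R: Read>(reader: &mut R) -> io::Result<Option<char>> {
    let mut buf = [0u8; 4];
    match reader.read_exact(&mut buf[..1]) {
        Ok(()) => {}
        // No bytes left before the start of a character: a clean end of input.
        Err(e) if e.kind() == ErrorKind::UnexpectedEof => return Ok(None),
        Err(e) => return Err(e),
    }
    // The leading byte determines how many bytes the encoded character occupies.
    let len = match buf[0] {
        0x00..=0x7F => 1,
        0xC0..=0xDF => 2,
        0xE0..=0xEF => 3,
        0xF0..=0xF7 => 4,
        _ => return Err(io::Error::new(ErrorKind::InvalidData, "invalid UTF-8 leading byte")),
    };
    reader.read_exact(&mut buf[1..len])?;
    // from_utf8 performs the full validation (continuation bytes, overlong forms, surrogates).
    let s = std::str::from_utf8(&buf[..len])
        .map_err(|e| io::Error::new(ErrorKind::InvalidData, e))?;
    Ok(s.chars().next())
}

fn main() -> io::Result<()> {
    // A byte slice stands in for BufReader::new(File::open("input.txt")?).
    let mut reader = "héllo €😀".as_bytes();
    while let Some(c) = read_char(&mut reader)? {
        println!("Character: {}", c);
    }
    Ok(())
}
```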
2. flat_map
The flat_map alternative does not compile:
use std::io::{BufRead, BufReader};
use std::fs::File;
pub fn main() {
let mut f = BufReader::new(File::open("input.txt").expect("open failed"));
for c in f.lines().flat_map(|l| l.expect("lines failed").chars()) {
println!("Character: {}", c);
}
}
The problem is that chars borrows from the string, but l.expect("lines failed") lives only inside the closure, so the compiler gives the error borrowed value does not live long enough.
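For completeness, the flat_map form can be made to compile if the closure returns something owned. A sketch, with chars_of as a hypothetical helper name; it still allocates per line (now a Vec<char> as well as the String), and lines() drops the line terminators, so it is not an improvement over the other approaches.

```rust
use std::io::{BufRead, Cursor};

/// Hypothetical helper: yields every char of every line, without the line terminators.
fn chars_of<R: BufRead>(reader: R) -> impl Iterator<Item = char> {
    reader.lines().flat_map(|l| {
        // Collecting into an owned Vec means nothing borrows from the temporary line.
        l.expect("lines failed").chars().collect::<Vec<char>>()
    })
}

fn main() {
    // Cursor stands in for BufReader::new(File::open("input.txt")).
    let f = Cursor::new("ab\ncd\n");
    for c in chars_of(f) {
        println!("Character: {}", c);
    }
}
```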
3. Nested for
This code
use std::io::{BufRead, BufReader};
use std::fs::File;
pub fn main() {
let mut f = BufReader::new(File::open("input.txt").expect("open failed"));
for line in f.lines() {
for c in line.expect("lines failed").chars() {
println!("Character: {}", c);
}
}
}
works, but it keeps allocating a string for each line. Besides, if there is no line break in the input file, the whole file would be loaded into memory.
4. BufRead::read_until
A memory-efficient alternative to approach 3 is to use BufRead::read_until, and reuse a single buffer to read each line:
use std::io::{BufRead, BufReader};
use std::fs::File;
pub fn main() {
let mut f = BufReader::new(File::open("input.txt").expect("open failed"));
let mut buf = Vec::<u8>::new();
while f.read_until(b'\n', &mut buf).expect("read_until failed") != 0 {
// this moves the ownership of the read data to s
// there is no allocation
let s = String::from_utf8(buf).expect("from_utf8 failed");
for c in s.chars() {
println!("Character: {}", c);
}
// this returns the ownership of the read data to buf
// there is no allocation
buf = s.into_bytes();
buf.clear();
}
}
I cannot use lines() because my file could be a single line that is gigabytes in size. This is an improvement on @malbarbo's recommendation of copying Read::chars from an old version of Rust. The utf8-chars crate already adds .chars() to BufRead for you.
Inspecting their repository, it doesn't look like they load more than 4 bytes at a time.
Your code will look the same as it did before Rust removed Read::chars:
use std::io::stdin;
use utf8_chars::BufReadCharsExt;
fn main() {
for c in stdin().lock().chars().map(|x| x.unwrap()) {
println!("{}", c);
}
}
Add the following to your Cargo.toml:
[dependencies]
utf8-chars = "1.0.0"
There are two solutions that make sense here.
First, you could copy the implementation of Read::chars() and use it; that would make it completely trivial to move your code over to the standard library implementation if/when it stabilizes.
On the other hand, you could simply iterate line by line (using f.lines()) and then use line.chars() on each line to get the chars. This is a little more hacky, but it will definitely work.
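If you only wanted one loop, you could use flat_map(), but a lambda like |line| line.chars() will not compile on its own, because chars() borrows from a line that is dropped when the closure returns; the closure would have to return something owned, such as the characters collected into a Vec<char>.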

How to create chaining API after read_to_string was changed to take a buffer?

I'm trying to port my library clog to the latest Rust version.
Rust changed a lot in the previous month and so I'm scratching my head over this code asking myself if there's really no way anymore to write this in a completely chained way?
fn get_last_commit () -> String {
let output = Command::new("git")
.arg("rev-parse")
.arg("HEAD")
.output()
.ok().expect("error invoking git rev-parse");
let encoded = String::from_utf8(output.stdout).ok().expect("error parsing output of git rev-parse");
encoded
}
In an older version of Rust the code could be written like that
pub fn get_last_commit () -> String {
Command::new("git")
.arg("rev-parse")
.arg("HEAD")
.spawn()
.ok().expect("failed to invoke rev-parse")
.stdout.as_mut().unwrap().read_to_string()
.ok().expect("failed to get last commit")
}
It seems there is no read_to_string() method anymore that doesn't take a buffer which makes it hard to implement a chaining API unless I'm missing something.
UPDATE
Ok, I figured I can use map to get it chaining.
fn get_last_commit () -> String {
Command::new("git")
.arg("rev-parse")
.arg("HEAD")
.output()
.map(|output| {
String::from_utf8(output.stdout).ok().expect("error reading into string")
})
.ok().expect("error invoking git rev-parse")
}
Actually I wonder if I could use and_then, but it seems the errors don't line up correctly ;)
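One way to make the errors line up for and_then is to convert both of them into a common error type first. The following is a sketch under my own assumptions: Box<dyn Error> as the common type and the helper name stdout_to_string are choices made for illustration, and a non-zero exit status from git is not treated as an error here.

```rust
use std::error::Error;
use std::io;
use std::process::Command;

/// Hypothetical helper: turns the raw stdout bytes of a command into a String.
/// Both error types are boxed so that and_then sees a single error type.
fn stdout_to_string(result: io::Result<Vec<u8>>) -> Result<String, Box<dyn Error>> {
    result
        .map_err(|e| Box::new(e) as Box<dyn Error>)
        .and_then(|bytes| {
            String::from_utf8(bytes).map_err(|e| Box::new(e) as Box<dyn Error>)
        })
}

fn get_last_commit() -> Result<String, Box<dyn Error>> {
    stdout_to_string(
        Command::new("git")
            .arg("rev-parse")
            .arg("HEAD")
            .output()
            .map(|output| output.stdout),
    )
}

fn main() {
    match get_last_commit() {
        Ok(hash) => println!("{}", hash.trim_end()),
        Err(e) => eprintln!("could not get last commit: {}", e),
    }
}
```

Returning a Result also lets the caller decide whether a failure should panic.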
As others have said, this was changed to allow reusing buffers/avoiding allocations.
Another alternative is to use read_to_string and manually provide the buffer:
pub fn get_last_commit () -> String {
let mut string = String::new();
Command::new("git")
.arg("rev-parse")
.arg("HEAD")
.spawn()
.ok().expect("failed to invoke rev-parse")
.stdout.as_mut().unwrap()
.read_to_string(&mut string)
.ok().expect("failed to get last commit");
string
}
This API was changed so that you didn't have to re-allocate a new String each time. However, as you've noticed, there's some convenience loss if you don't care about allocation. It might be a good idea to suggest re-adding this back in, like what happened with Vec::from_elem. Maybe open a small RFC?
While it may make sense to try to add this back to the standard library, here's a version of read_to_string that allocates on its own that you can use today:
use std::io::{self, Cursor, Read};
trait MyRead: Read {
fn read_full_string(&mut self) -> io::Result<String> {
let mut s = String::new();
let r = self.read_to_string(&mut s);
r.map(|_| s)
}
}
impl<T> MyRead for T where T: Read {}
fn main() {
let bytes = b"hello";
let mut input = Cursor::new(bytes);
let s = input.read_full_string();
println!("{}", s.unwrap());
}
This should allow you to use the chaining style you had before.
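For readers on a recent toolchain: since Rust 1.65 the standard library has a free function, std::io::read_to_string, that takes any reader and returns a freshly allocated String, so the extension trait is no longer needed for this. A small sketch, where shout is a hypothetical helper that only exists to show the call being chained:

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads everything from `reader` and chains further processing.
fn shout<R: Read>(reader: R) -> io::Result<String> {
    io::read_to_string(reader).map(|s| s.trim_end().to_uppercase())
}

fn main() -> io::Result<()> {
    // A byte slice stands in for a child process's stdout or a file.
    let s = io::read_to_string("hello".as_bytes())?;
    println!("{}", s);
    println!("{}", shout("hello\n".as_bytes())?);
    Ok(())
}
```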

Resources