How to store an iterator over stdin in a structure? - rust

I created a structure that should store an iterator over a file or stdin, but the compiler is yelling at me :)
I decided that Lines is the struct I need to store in my struct so I can iterate with it later, and that Box will let me store a value of unknown size, so I defined my structure like this:
pub struct A {
pub input: Box<Lines<BufRead>>,
}
I want to do something like this later:
let mut a = A {
input: /* don't know what should be here yet */,
};
if something {
a.input = Box::new(io::stdin().lock().lines());
} else {
a.input = Box::new(BufReader::new(file).lines());
}
And finally
for line in a.input {
// ...
}
But I got an error from the compiler
error[E0277]: the size for values of type `(dyn std::io::BufRead + 'static)` cannot be known at compilation time
--> src/context.rs:11:5
|
11 | pub input: Box<Lines<BufRead>>,
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ doesn't have a size known at compile-time
|
= help: the trait `std::marker::Sized` is not implemented for `(dyn std::io::BufRead + 'static)`
= note: to learn more, visit <https://doc.rust-lang.org/book/second-edition/ch19-04-advanced-types.html#dynamically-sized-types-and-sized>
= note: required by `std::io::Lines`
How can I achieve my goal?

The most generic answer to your question is that you don't / can't. Locking stdin returns a type that references the Stdin value. You cannot create a local value (stdin()), take a reference to it (.lock()), and then return that reference.
If you just want to do this inside of a function without returning it, then you can create a trait object:
use std::io::{self, prelude::*, BufReader};
fn example(file: Option<std::fs::File>) {
let stdin;
let mut stdin_lines;
let mut file_lines;
let input: &mut dyn Iterator<Item = _> = match file {
None => {
stdin = io::stdin();
stdin_lines = stdin.lock().lines();
&mut stdin_lines
}
Some(file) => {
file_lines = BufReader::new(file).lines();
&mut file_lines
}
};
for line in input {
// ...
}
}
Or create a new generic function that you can pass either type of concrete iterator to:
use std::io::{self, prelude::*, BufReader};
fn example(file: Option<std::fs::File>) {
match file {
None => finally(io::stdin().lock().lines()),
Some(file) => finally(BufReader::new(file).lines()),
}
}
fn finally(input: impl Iterator<Item = io::Result<String>>) {
for line in input {
// ...
}
}
You could put either the trait object or the generic type into a structure even though you can't return it:
struct A<'a> {
input: &'a mut dyn Iterator<Item = io::Result<String>>,
}
struct A<I>
where
I: Iterator<Item = io::Result<String>>,
{
input: I,
}
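As a rough sketch of how the generic variant might be used, the struct can be filled from any owned reader. The names `from_reader` and `count_non_empty` are hypothetical helpers invented for this illustration, not part of the original answer:

```rust
use std::io::{self, BufRead};

/// Generic holder: the concrete iterator type is chosen by the caller.
pub struct A<I>
where
    I: Iterator<Item = io::Result<String>>,
{
    pub input: I,
}

impl<I> A<I>
where
    I: Iterator<Item = io::Result<String>>,
{
    /// Hypothetical consumer: counts the non-empty lines,
    /// stopping at the first I/O error.
    pub fn count_non_empty(self) -> io::Result<usize> {
        let mut count = 0;
        for line in self.input {
            if !line?.is_empty() {
                count += 1;
            }
        }
        Ok(count)
    }
}

/// Hypothetical constructor: works for any owned `BufRead` value.
pub fn from_reader<R: BufRead>(reader: R) -> A<io::Lines<R>> {
    A { input: reader.lines() }
}
```

The trade-off is that `A<Lines<BufReader<File>>>` and `A<Lines<StdinLock>>` are different types, so the choice between them has to be made where the value is created, as in the `match` above.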
If you are feeling adventurous, you might be able to use some unsafe code / crates wrapping unsafe code to store the Stdin value and the iterator referencing it together, which is not universally safe.
See also:
Is there a way to use locked standard input and output in a constructor to live as long as the struct you're constructing?
Is there any way to return a reference to a variable created in a function?
Why can't I store a value and a reference to that value in the same struct?
How can I store a Chars iterator in the same struct as the String it is iterating on?
Are polymorphic variables allowed?
input: Box<Lines<BufRead>>,
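This is invalid because Lines is a struct, not a trait, and its type parameter must have a known size, which the trait object type BufRead does not. You want either: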
use std::io::{prelude::*, Lines};
pub struct A {
pub input: Lines<Box<dyn BufRead>>,
}
Or
use std::io;
pub struct A {
pub input: Box<dyn Iterator<Item = io::Result<String>>>,
}
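A minimal sketch of the boxed-iterator variant follows. It assumes Rust 1.61 or later, where `Stdin::lock` returns a `StdinLock<'static>` that no longer borrows from a local `Stdin` value; on older compilers the stdin branch runs into the borrowing problem described in the other answer. The constructor names `from_reader` and `open` are made up for this example:

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

/// Holds any line iterator behind a single boxed trait object.
pub struct A {
    pub input: Box<dyn Iterator<Item = io::Result<String>>>,
}

impl A {
    /// Hypothetical constructor: wraps any owned `BufRead` source.
    pub fn from_reader<R: BufRead + 'static>(reader: R) -> Self {
        A {
            input: Box::new(reader.lines()),
        }
    }

    /// Hypothetical constructor: reads the file if one is given, stdin otherwise.
    pub fn open(file: Option<File>) -> Self {
        match file {
            Some(f) => Self::from_reader(BufReader::new(f)),
            // Assumption: Rust 1.61+, where the lock is `'static`.
            None => Self::from_reader(io::stdin().lock()),
        }
    }
}
```

After construction, `for line in a.input { ... }` works the same way regardless of which source was chosen.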

Related

Pass an iterable to a function and iterate twice in rust

I have a function which looks like
fn do_stuff(values: HashSet<String>) {
// Count stuff
for s in values.iter() {
prepare(s);
}
// Process stuff
for s in values.iter() {
process(s);
}
}
This works fine. For a unit test, I want to pass a two value collection where the elements are passed in a known order. (Processing them in the other order won't test the case I am trying to test.) HashSet doesn't guarantee an order, so I would like to pass a Vec instead.
I would like to change the argument to Iterable, but it appears that only IntoIterator exists. I tried
fn do_stuff<C>(values: C)
where C: IntoIterator<Item=String>
{
// Count stuff
for s in values {
prepare(s);
}
// Process stuff
for s in values {
process(s);
}
}
which fails because the first iteration consumes values. The compiler suggests borrowing values, but
fn do_stuff<C>(values: C)
where C: IntoIterator<Item=String>
{
// Count stuff
for s in &values {
prepare(s);
}
// Process stuff
for s in values {
process(s);
}
}
fails because
the trait Iterator is not implemented for &C
I could probably make something with clone work, but the actual set will be large and I would like to avoid copying it if possible.
Thinking about that, the signature probably should be do_stuff(values: &C), so if that makes the problem simpler, then that is an acceptable solution.
SO suggests Writing a generic function that takes an iterable container as parameter in Rust as a related question, but that is a lifetime problem. I am not having problems with lifetimes.
It looks like How to create an `Iterable` trait for references in Rust? may actually be the solution. But I'm having trouble getting it to compile.
My first attempt is
pub trait Iterable {
type Item;
type Iter: Iterator<Item = Self::Item>;
fn iterator(&self) -> Self::Iter;
}
impl Iterable for HashSet<String> {
type Item = String;
type Iter = HashSet<String>::Iterator;
fn iterator(&self) -> Self::Iter {
self.iter()
}
}
which fails with
error[E0223]: ambiguous associated type
--> src/file.rs:178:17
|
178 | type Iter = HashSet<String>::Iterator;
| ^^^^^^^^^^^^^^^^^^^^^^^^^ help: use fully-qualified syntax: `<HashSet<std::string::String> as Trait>::Iterator`
Following that suggestion:
impl Iterable for HashSet<String> {
type Item = String;
type Iter = <HashSet<std::string::String> as Trait>::Iterator;
fn iterator(&self) -> Self::Iter {
self.iter()
}
}
failed with
error[E0433]: failed to resolve: use of undeclared type `Trait`
--> src/file.rs:178:50
|
178 | type Iter = <HashSet<std::string::String> as Trait>::Iterator;
| ^^^^^ use of undeclared type `Trait`
The rust documents don't seem to include Trait as a known type. If I replace Trait with HashSet, it doesn't recognize Iterator or IntoIter as the final value in the expression.
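For reference, here is one way the attempt above could be made to compile. This is only a sketch: it assumes the trait is changed to carry a lifetime parameter (so the iterator can borrow from `&self`) and that items are yielded by reference. The concrete iterator types are `std::collections::hash_set::Iter` and `std::slice::Iter`; `Trait` in the compiler's hint is just a placeholder, not a real type.

```rust
use std::collections::{hash_set, HashSet};
use std::slice;

/// Modified version of the trait: `'a` is the lifetime of the borrow.
pub trait Iterable<'a> {
    type Item: 'a;
    type Iter: Iterator<Item = &'a Self::Item>;
    fn iterator(&'a self) -> Self::Iter;
}

impl<'a> Iterable<'a> for HashSet<String> {
    type Item = String;
    // The named type of the iterator returned by `HashSet::iter`.
    type Iter = hash_set::Iter<'a, String>;
    fn iterator(&'a self) -> Self::Iter {
        self.iter()
    }
}

impl<'a> Iterable<'a> for Vec<String> {
    type Item = String;
    // `Vec::iter` goes through the slice iterator.
    type Iter = slice::Iter<'a, String>;
    fn iterator(&'a self) -> Self::Iter {
        self.iter()
    }
}
```

Because `iterator` only borrows the collection, it can be called as many times as needed without copying the data.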
Implementation of accepted answer
Attempting to implement @eggyal's answer, I was able to get this to compile:
use std::collections::HashSet;
fn do_stuff<I>(iterable: I)
where
I: IntoIterator + Copy,
I::Item: AsRef<str>,
{
// Count stuff
for s in iterable {
prepare(s.as_ref());
}
// Process stuff
for s in iterable {
process(s.as_ref());
}
}
fn prepare(s: &str) {
println!("prepare: {}", s)
}
fn process(s: &str) {
println!("process: {}", s)
}
#[cfg(test)]
mod test_cluster {
use super::*;
#[test]
fn doit() {
let vec: Vec<String> = vec!["a".to_string(), "b".to_string(), "c".to_string()];
let set = vec.iter().cloned().collect::<HashSet<_>>();
do_stuff(&vec);
do_stuff(&set);
}
}
which had this output
---- simple::test_cluster::doit stdout ----
prepare: a
prepare: b
prepare: c
process: a
process: b
process: c
prepare: c
prepare: b
prepare: a
process: c
process: b
process: a
IntoIterator is not only implemented by the collection types themselves, but in most cases (including Vec and HashSet) it is also implemented by their borrows (yielding an iterator of borrowed items). Moreover, immutable borrows are always Copy. So you can do:
fn do_stuff<I>(iterable: I)
where
I: IntoIterator + Copy,
I::Item: AsRef<str>,
{
// Count stuff
for s in iterable {
prepare(s.as_ref());
}
// Process stuff
for s in iterable {
process(s.as_ref());
}
}
And this would then be invoked by passing in a borrow of the relevant collection:
let vec = vec!["a", "b", "c"];
let set = vec.iter().cloned().collect::<HashSet<_>>();
do_stuff(&vec);
do_stuff(&set);
However, depending on your requirements (whether all items must first be prepared before any can be processed), it may be possible in this case to combine the preparation and processing into a single pass of the iterator.
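To make the ordering observable in a unit test, the two passes can record what they visit. The following is a sketch under that assumption; `visit_twice` and its `prepare:`/`process:` labels are invented for illustration and stand in for the real `prepare` and `process` functions:

```rust
/// Iterates the borrowed collection twice and records the visit order.
pub fn visit_twice<I>(iterable: I) -> Vec<String>
where
    I: IntoIterator + Copy, // a shared borrow such as `&Vec<_>` is `Copy`
    I::Item: AsRef<str>,
{
    let mut seen = Vec::new();
    // First pass: the "count stuff" stage.
    for s in iterable {
        seen.push(format!("prepare:{}", s.as_ref()));
    }
    // Second pass: `iterable` is still usable because it was copied, not moved.
    for s in iterable {
        seen.push(format!("process:{}", s.as_ref()));
    }
    seen
}
```

Passing `&vec` gives a deterministic order for the test, while `&set` exercises the same code path with whatever order the hash set happens to produce. Only the reference is copied, never the collection.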
Iterators over containers can be cloned if you want to iterate the container twice, so accepting an IntoIterator + Clone should work for you. Example code:
fn do_stuff<I>(values: I)
where
I: IntoIterator + Clone,
{
// Count stuff
for s in values.clone() {
prepare(s);
}
// Process stuff
for s in values {
process(s);
}
}
You can now pass in e.g. either a hash set or a vector, and both of them can be iterated twice:
let vec = vec!["a", "b", "c"];
let set: HashSet<_> = vec.iter().cloned().collect();
do_stuff(vec);
do_stuff(set);

Writing to a file or String in Rust

TL;DR: I want to implement trait std::io::Write that outputs to a memory buffer, ideally String, for unit-testing purposes.
I must be missing something simple.
Similar to another question, Writing to a file or stdout in Rust, I am working on code that can work with any std::io::Write implementation.
It operates on structure defined like this:
pub struct MyStructure {
writer: Box<dyn Write>,
}
Now, it's easy to create instance writing to either a file or stdout:
impl MyStructure {
pub fn use_stdout() -> Self {
let writer = Box::new(std::io::stdout());
MyStructure { writer }
}
pub fn use_file<P: AsRef<Path>>(path: P) -> Result<Self> {
let writer = Box::new(File::create(path)?);
Ok(MyStructure { writer })
}
pub fn printit(&mut self) -> Result<()> {
self.writer.write_all(b"hello")?;
Ok(())
}
}
But for unit testing, I also need to have a way to run the business logic (here represented by method printit()) and trap its output, so that its content can be checked in the test.
I cannot figure out how to implement this. This playground code shows how I would like to use it, but it does not compile because it breaks borrowing rules.
// invalid code - does not compile!
fn main() {
let mut buf = Vec::new(); // This buffer should receive output
let mut x2 = MyStructure { writer: Box::new(buf) };
x2.printit().unwrap();
// now, get the collected output
let output = std::str::from_utf8(buf.as_slice()).unwrap().to_string();
// here I want to analyze the output, for instance in unit-test asserts
println!("Output to string was {}", output);
}
Any idea how to write the code correctly? I.e., how to implement a writer on top of a memory structure (String, Vec, ...) that can be accessed afterwards?
Something like this does work:
let mut buf = Vec::new();
{
// Use the buffer by a mutable reference
//
// Also, we're doing it inside another scope
// to help the borrow checker
let mut x2 = MyStructure { writer: Box::new(&mut buf) };
x2.printit().unwrap();
}
let output = std::str::from_utf8(buf.as_slice()).unwrap().to_string();
println!("Output to string was {}", output);
However, in order for this to work, you need to modify your type and add a lifetime parameter:
pub struct MyStructure<'a> {
writer: Box<dyn Write + 'a>,
}
Note that in your case (where you omit the + 'a part) the compiler assumes that you use 'static as the lifetime of the trait object:
// Same as your original variant
pub struct MyStructure {
writer: Box<dyn Write + 'static>
}
This limits the set of types which could be used here, in particular, you cannot use any kinds of borrowed references. Therefore, for maximum genericity we have to be explicit here and define a lifetime parameter.
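Putting the pieces of this answer together, a self-contained sketch might look like the following. The constructor `with_writer` and the helper `capture` are hypothetical names added for the example, and `printit` uses `write_all` so that a partial write cannot silently drop bytes:

```rust
use std::io::{self, Write};

/// The trait object may now borrow data that lives for `'a`.
pub struct MyStructure<'a> {
    writer: Box<dyn Write + 'a>,
}

impl<'a> MyStructure<'a> {
    /// Hypothetical constructor accepting owned or borrowed writers.
    pub fn with_writer(writer: impl Write + 'a) -> Self {
        MyStructure {
            writer: Box::new(writer),
        }
    }

    pub fn printit(&mut self) -> io::Result<()> {
        self.writer.write_all(b"hello")
    }
}

/// Hypothetical test helper: runs the logic and returns what was written.
pub fn capture() -> String {
    let mut buf: Vec<u8> = Vec::new();
    {
        // `&mut Vec<u8>` implements `Write`; the borrow ends with this scope.
        let mut s = MyStructure::with_writer(&mut buf);
        s.printit().unwrap();
    }
    String::from_utf8(buf).unwrap()
}
```

A unit test can then assert on the returned string directly, while `use_stdout` and `use_file` style constructors keep working because owned writers satisfy any `'a`.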
Also note that depending on your use case, you can use generics instead of trait objects:
pub struct MyStructure<W: Write> {
writer: W
}
In this case the types are fully visible at any point of your program, and therefore no additional lifetime annotation is needed.
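A sketch of the generic variant is shown below. The `new` and `into_inner` methods are assumptions added for the example; `into_inner` simply hands the writer back so a test can inspect the bytes:

```rust
use std::io::{self, Write};

pub struct MyStructure<W: Write> {
    writer: W,
}

impl<W: Write> MyStructure<W> {
    /// Hypothetical constructor taking any writer by value.
    pub fn new(writer: W) -> Self {
        MyStructure { writer }
    }

    pub fn printit(&mut self) -> io::Result<()> {
        self.writer.write_all(b"hello")
    }

    /// Hypothetical helper: consumes the structure and returns the writer.
    pub fn into_inner(self) -> W {
        self.writer
    }
}
```

With `W = Vec<u8>` the structure owns the buffer and gives it back through `into_inner`; with `W = &mut Vec<u8>` the buffer stays with the caller. The cost compared to `Box<dyn Write>` is that the writer type becomes part of the structure's type.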

How are trait objects implemented in Rust?

I'm trying to understand how trait objects are implemented in Rust. Please let me know if the following understanding is correct.
I have a function that takes any type that implements the Write trait:
fn some_func(write_to: &mut dyn Write) {}
In any place where we have a type that implements this trait and calls the above function, the compiler generates a "trait object", probably by adding a call to TraitObject::new(data, vtable).
If we have something like:
let input = get_user_input(); // say we are expecting the input to be 1 or 2
let mut file = File::create("blah.txt").unwrap();
let mut vec: Vec<u8> = vec![1, 2, 3];
match input {
1 => some_func(&mut file),
2 => some_func(&mut vec),
}
will probably turn out to be:
match input {
1 => {
let file_write_trait_object: &mut dyn Write =
TraitObject::new(&file, &vtable_for_file_write_trait);
some_func(file_write_trait_object);
}
2 => {
let vec_write_trait_object: &mut dyn Write =
TraitObject::new(&vec, &vtable_for_vec_write_trait);
some_func(vec_write_trait_object);
}
}
Inside some_func the compiler will just access the methods used based on the vtable in the TraitObject passed along.
Trait objects are fat pointers, so fn some_func(write_to: &mut dyn Write) compiles to something like fn some_func(_: *mut OpaqueStruct, _: *const WriteVtable).
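The "fat pointer" claim can be observed from safe code by comparing pointer sizes. This is a small sketch, and the function name `pointer_sizes` is made up for it; the two-word size is what current compilers produce rather than something to rely on as a layout guarantee:

```rust
use std::io::{self, Write};
use std::mem::size_of;

/// Returns the size of a thin reference and of a trait-object reference.
pub fn pointer_sizes() -> (usize, usize) {
    (
        size_of::<&mut Vec<u8>>(),   // data pointer only
        size_of::<&mut dyn Write>(), // data pointer plus vtable pointer
    )
}

/// Calls through the vtable carried by the fat pointer.
pub fn some_func(write_to: &mut dyn Write) -> io::Result<()> {
    write_to.write_all(b"hi")
}
```

At a call site such as `some_func(&mut vec)`, the compiler performs an unsizing coercion that pairs the address of `vec` with the vtable for `Vec<u8>` as `Write`; no user-visible `TraitObject::new` call is involved.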

Cannot split a string into string slices with explicit lifetimes because the string does not live long enough

I'm writing a library that should read from something implementing the BufRead trait; a network data stream, standard input, etc. The first function is supposed to read a data unit from that reader and return a populated struct filled mostly with &'a str values parsed from a frame from the wire.
Here is a minimal version:
mod mymod {
use std::io::prelude::*;
use std::io;
pub fn parse_frame<'a, T>(mut reader: T)
where
T: BufRead,
{
for line in reader.by_ref().lines() {
let line = line.expect("reading header line");
if line.len() == 0 {
// got empty line; done with header
break;
}
// split line
let splitted = line.splitn(2, ':');
let line_parts: Vec<&'a str> = splitted.collect();
println!("{} has value {}", line_parts[0], line_parts[1]);
}
// more reads down here, therefore the reader.by_ref() above
// (otherwise: use of moved value).
}
}
use std::io;
fn main() {
let stdin = io::stdin();
let locked = stdin.lock();
mymod::parse_frame(locked);
}
An error shows up which I cannot fix after trying different solutions:
error: `line` does not live long enough
--> src/main.rs:16:28
|
16 | let splitted = line.splitn(2, ':');
| ^^^^ does not live long enough
...
20 | }
| - borrowed value only lives until here
|
note: borrowed value must be valid for the lifetime 'a as defined on the body at 8:4...
--> src/main.rs:8:5
|
8 | / {
9 | | for line in reader.by_ref().lines() {
10 | | let line = line.expect("reading header line");
11 | | if line.len() == 0 {
... |
22 | | // (otherwise: use of moved value).
23 | | }
| |_____^
The lifetime 'a is defined on a struct and implementation of a data keeper structure because the &str requires an explicit lifetime. These code parts were removed as part of the minimal example.
BufRead has a lines() method that returns an iterator over Result<String, io::Error> values. I handle errors using expect or match and thus unpack each Result, so that the program has the bare String. This is then done multiple times to populate a data structure.
Many answers say that the unwrap result needs to be bound to a variable otherwise it gets lost because it is a temporary value. But I already saved the unpacked Result value in the variable line and I still get the error.
How can I fix this error? I could not get it working after hours of trying.
Does it make sense to do all these lifetime declarations just for &str in a data keeper struct? This will be mostly a readonly data structure, at most replacing whole field values. String could also be used, but I have found articles saying that String has lower performance than &str, and this frame parser function will be called many times and is performance-critical.
Similar questions exist on Stack Overflow, but none quite answers the situation here.
For completeness and better understanding, the following excerpt from the complete source code shows why the lifetime question came up:
Data structure declaration:
// tuple
pub struct Header<'a>(pub &'a str, pub &'a str);
pub struct Frame<'a> {
pub frameType: String,
pub bodyType: &'a str,
pub port: &'a str,
pub headers: Vec<Header<'a>>,
pub body: Vec<u8>,
}
impl<'a> Frame<'a> {
pub fn marshal(&'a self) {
//TODO
println!("marshal!");
}
}
Complete function definition:
pub fn parse_frame<'a, T>(mut reader: T) -> Result<Frame<'a>, io::Error> where T: BufRead {
Your problem can be reduced to this:
fn foo<'a>() {
let thing = String::from("a b");
let parts: Vec<&'a str> = thing.split(" ").collect();
}
You create a String inside your function, then declare that references to that string are guaranteed to live for the lifetime 'a. Unfortunately, the lifetime 'a isn't under your control — the caller of the function gets to pick what the lifetime is. That's how generic parameters work!
What would happen if the caller of the function specified the 'static lifetime? How would it be possible for your code, which allocates a value at runtime, to guarantee that the value lives longer than even the main function? It's not possible, which is why the compiler has reported an error.
Once you've gained a bit more experience, the function signature fn foo<'a>() will jump out at you like a red alert — there's a generic parameter that isn't used. That's most likely going to mean bad news.
return a populated struct filled mostly with &'a str
You cannot possibly do this with the current organization of your code. References have to point to something. You are not providing anywhere for the pointed-at values to live. You cannot return an allocated String as a string slice.
Before you jump to it, no you cannot store a value and a reference to that value in the same struct.
Instead, you need to split the code that creates the String and that which parses a &str and returns more &str references. That's how all the existing zero-copy parsers work. You could look at those for inspiration.
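A minimal sketch of that split, assuming the caller reads the header block into a `String` it owns and then hands a `&str` to the parser. The function names `parse_header` and `parse_headers` are illustrative only; `Header` mirrors the tuple struct from the question:

```rust
/// Borrowed name/value pair; both slices point into the caller's buffer.
pub struct Header<'a>(pub &'a str, pub &'a str);

/// Parses one `name: value` line without allocating.
pub fn parse_header(line: &str) -> Option<Header<'_>> {
    let (name, value) = line.split_once(':')?;
    Some(Header(name.trim(), value.trim()))
}

/// Parses header lines up to the first empty line.
/// The returned slices live exactly as long as `input` does.
pub fn parse_headers(input: &str) -> Vec<Header<'_>> {
    input
        .lines()
        .take_while(|line| !line.is_empty())
        .filter_map(parse_header)
        .collect()
}
```

Here the lifetime is tied to an argument, so the caller decides how long the text lives and the parser never has to promise more than it can deliver.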
String has lower performance than &str
No, it really doesn't. Creating lots of extraneous Strings is a bad idea, sure, just like allocating too much is a bad idea in any language.
Maybe the following program gives clues to others who are also having their first problems with lifetimes:
fn main() {
// using String and &str slice
let my_str: String = "fire".to_owned();
let returned_str: MyStruct = my_func_str(&my_str);
println!("Received return value: {ret}", ret = returned_str.version);
// using Vec<u8> and &[u8] slice
let my_vec: Vec<u8> = "fire".to_owned().into_bytes();
let returned_u8: MyStruct2 = my_func_vec(&my_vec);
println!("Received return value: {ret:?}", ret = returned_u8.version);
}
// using String -> str
fn my_func_str<'a>(some_str: &'a str) -> MyStruct<'a> {
MyStruct {
version: &some_str[0..2],
}
}
struct MyStruct<'a> {
version: &'a str,
}
// using Vec<u8> -> & [u8]
fn my_func_vec<'a>(some_vec: &'a Vec<u8>) -> MyStruct2<'a> {
MyStruct2 {
version: &some_vec[0..2],
}
}
struct MyStruct2<'a> {
version: &'a [u8],
}

How to pass a member function of a struct to another struct as callback

I want to pass a member function of a struct to another struct.
Sorry, my English is poor, so I can't give more details.
use std::thread;
struct Struct1 {}
impl Struct1 {
pub fn do_some(&mut self, s: &str) {
// do something that uses s to modify self
}
}
struct Struct2 {
cb1: Box<dyn Fn(&mut Struct1, &str)>,
}
fn main() {
let s1 = Struct1 {};
let s2 = Struct2 {
cb1: Box::new(s1.do_some), // how to store do_some function in cb1 ?
};
}
You were very close! To refer to a method or any other symbol you use the :: separator and specify the path to said symbol. Methods or associated functions live in the namespace of the type, therefore the path of your method is Struct1::do_some. In Java you would also use the . operator to access those, but in Rust the . operator is only used on existing objects, not on type names.
The solution thus is:
let s2 = Struct2 {
cb1: Box::new(Struct1::do_some),
};
However, you could possibly improve the type of your function a bit. Box<dyn Fn(...)> is a boxed trait object, but you don't necessarily need that if you don't want to work with closures. If you just want to refer to "normal functions" (those that don't capture an environment), you can use a function pointer instead:
struct Struct2 {
cb1: fn(&mut Struct1, &str),
}
Note the lowercase fn and that we don't need the Box.
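For a complete picture, here is a small sketch that stores the method as a function pointer and later invokes it. The `log` field and the `fire` method are assumptions added so that the effect of the callback is visible:

```rust
pub struct Struct1 {
    /// Hypothetical state that the callback modifies.
    pub log: Vec<String>,
}

impl Struct1 {
    pub fn do_some(&mut self, s: &str) {
        self.log.push(s.to_owned());
    }
}

pub struct Struct2 {
    /// Plain function pointer: no allocation, no captured environment.
    pub cb1: fn(&mut Struct1, &str),
}

impl Struct2 {
    /// Hypothetical helper that invokes the stored callback.
    pub fn fire(&self, target: &mut Struct1, s: &str) {
        // Parentheses are required to call a field holding a function.
        (self.cb1)(target, s);
    }
}
```

Note that `Struct1::do_some` is stored without any receiver, so the `Struct1` instance is supplied at call time as the first argument.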
