Unwrapping a skipped Chars iterator - string

Many iterator methods in Rust return iterators wrapped in other iterators. One such case is the skip method, which skips the given number of elements and yields the remaining ones through the Skip struct, which itself implements the Iterator trait.
I would like to read a file line by line, and sometimes skip the first n characters of a line. I figured that using Iterator::skip would work, but now I'm stuck figuring out how I can actually unwrap the yielded Chars iterator so I can materialize the remaining &str with chars.as_str().
What is the idiomatic way of unwrapping an iterator in Rust? The call chain
let line: &String = ...;
let remaining = line.chars().skip(n).as_str().trim();
raises the error
error[E0599]: no method named `as_str` found for struct `std::iter::Skip<std::str::Chars<'_>>` in the current scope
--> src/parser/directive_parsers.rs:367:63
|
367 | let option_val = line.chars().skip(option_val_indent).as_str().trim();
| ^^^^^^ method not found in `std::iter::Skip<std::str::Chars<'_>>`
error: aborting due to previous error

You can retrieve the start byte index of the nth character using the nth() method on the char_indices() iterator on the string. Once you have this byte index, you can use it to get a subslice of the original string:
let line = "This is a line.";
let index = line.char_indices().nth(n).unwrap().0;
let remaining = &line[index..];
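Note that the unwrap() above panics when the line has n or fewer characters, because nth(n) then returns None. A minimal sketch of a non-panicking variant, assuming an empty remainder is acceptable in that case (the name skip_chars is hypothetical, not from the answer):

```rust
/// Returns the part of `line` after its first `n` characters.
/// Hypothetical helper; yields "" when the line has `n` or fewer characters.
fn skip_chars(line: &str, n: usize) -> &str {
    match line.char_indices().nth(n) {
        // `index` is the byte offset where the n-th character starts,
        // so slicing there never splits a multi-byte character.
        Some((index, _)) => &line[index..],
        None => "",
    }
}
```

For example, skip_chars("héllo", 2) gives "llo" even though 'é' occupies two bytes.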

Rather than iterate over chars, you can use char_indices to find the exact point at which to take a slice from the string, ensuring that you don't index into the middle of a multi-byte character. This will save on an allocation for each line in the iterator:
input
.iter()
.map(|line| {
let n = 2; // get n from somewhere?
let (index, _) = line.char_indices().nth(n).unwrap();// better error handling
&line[index..]
})
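A different route, not taken in the answers above, is to keep the plain Chars iterator, advance it by hand, and only then call as_str(), which Chars does provide. The sketch below assumes that running out of characters should simply yield an empty string; the function name is hypothetical:

```rust
/// Advances a `Chars` iterator past `n` characters and returns what is left.
/// Hypothetical helper illustrating `Chars::as_str`.
fn skip_chars_via_as_str(line: &str, n: usize) -> &str {
    let mut chars = line.chars();
    for _ in 0..n {
        // Stop early if the line is shorter than `n` characters.
        if chars.next().is_none() {
            break;
        }
    }
    // `as_str` borrows from the original string, so nothing is allocated.
    chars.as_str()
}
```

This works because the iterator is still a Chars, not a Skip<Chars>, at the point where as_str() is called.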

Related

Get elements from Vector of tab delimited Strings

I have a vector of Strings as in the example below, and for every element in that vector, I want to get the second and third items. I don't know if I should be collecting a &str or String, but I haven't gotten to that part because this does not compile.
Everything is "fine" until I add the slicing [1..]
let elements: Vec<&str> = vec!["foo\tbar\tbaz", "ffoo\tbbar\tbbaz"]
.iter()
.map(|rec| rec.rsplit('\t').collect::<Vec<_>>()[1..])
.collect();
It complains because
the size for values of type `[&str]` cannot be known at compilation time
the trait `std::marker::Sized` is not implemented for `[&str]` (rustc E0277)
As the compiler tells you, the slicing is broken because in Rust indexing with a range yields the slice itself, a [&str], whose size is unknown at compile time (hence the compiler complaining that it's unsized).
That's why you normally reference the slice e.g.
&thing[1..]
unless it's a context where it doesn't matter. Or you immediately convert the slice to a vector or array.
However here it would not work, because a slice is a "borrowing" structure, it doesn't own anything. And it borrows the Vec being created inside the map, which means you'll get a borrowing error, because the Vec will be destroyed at the end of the callback, and thus the slice would be referencing invalid memory:
error[E0515]: cannot return value referencing temporary value
--> src/main.rs:5:16
|
5 | .map(|rec| &rec.rsplit('\t').collect::<Vec<_>>()[1..])
| ^------------------------------------^^^^^
| ||
| |temporary value created here
| returns a value referencing data owned by the current function
The solution is to filter the iterator before collecting the vec, using Iterator::skip:
let elements: Vec<&str> = my_vec
.iter()
.map(|rec| rec.rsplit('\t').skip(1).collect::<Vec<_>>())
.collect();
However this means you now have an Iterator<Item=Vec<&str>>, which doesn't collect to a Vec<&str>.
You could always Iterator::flatten the inner vecs, but in reality they're completely unnecessary: you can just Iterator::flat_map each original string into a stream of strings which automatically get folded into the parent:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=f2c33c1b6a30224202357dc4bd5c1d19
let my_vec = vec!["foo\tbar\tbaz", "ffoo\tbbar\tbbaz"];
let elements: Vec<&str> = my_vec
.iter()
.flat_map(|rec| rec.rsplit('\t').skip(1))
.collect();
dbg!(elements);
By the by, the code you're showing doesn't match the description, you say:
for every element in that vector, I want to get the second and third items
but since you're using rsplit what you're getting is the second and first: rsplit will iterate from the end, hence the r for reverse.
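If the second and third fields really are what is wanted, a sketch along the same lines would use split instead of rsplit. The function name second_and_third is hypothetical, and take(2) is an added assumption so that records with more than three fields are truncated:

```rust
/// Collects the second and third tab-separated fields of every record.
/// Hypothetical helper; records with fewer fields contribute what they have.
fn second_and_third<'a>(records: &[&'a str]) -> Vec<&'a str> {
    records
        .iter()
        // `split` walks the fields front to back: skip the first, keep two.
        .flat_map(|&rec| rec.split('\t').skip(1).take(2))
        .collect()
}
```

With the input from the question this yields bar, baz, bbar, bbaz, in that order.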

Immutable access in rust

I am new to Rust, coming from Python, where I have used the functional style extensively.
What I am trying to do is to take in a string (slice) (or any iterable) and iterate with a reference to the current index and the next index. Here is my attempt:
fn main() {
// intentionally immutable, this should not change
let x = "this is a
multiline string
with more
then 3 lines.";
// initialize multiple (mutable) iterators over the slice
let mut lineiter = x.chars();
let mut afteriter = x.chars();
// to have some reason to do this
afteriter.skip(1);
// zip them together, comparing the current line with the next line
let mut zipped = lineiter.zip(afteriter);
for (char1, char2) in zipped {
println!("{:?} {:?}", char1, char2);
}
}
I think it should be possible to get different iterators that have different positions in the slice but are referring to the same parts of memory without having to copy the string, but the error I get is as follows:
error[E0382]: use of moved value: `afteriter`
--> /home/alex/Documents/projects/simple-game-solver/src/src.rs:15:35
|
10 | let afteriter = x.chars();
| --------- move occurs because `afteriter` has type `std::str::Chars<'_>`, which does not implement the `Copy` trait
11 | // to have some reason to do this
12 | afteriter.skip(1);
| --------- value moved here
...
15 | let mut zipped = lineiter.zip(afteriter);
| ^^^^^^^^^ value used here after move
I also get a warning telling me that zipped does not need to be mutable.
Is it possible to instantiate multiple iterators over a single variable and if so how can it be done?
"Is it possible to instantiate multiple iterators over a single variable and if so how can it be done?"
If you check the signature and documentation for Iterator::skip:
fn skip(self, n: usize) -> Skip<Self>
Creates an iterator that skips the first n elements.
After they have been consumed, the rest of the elements are yielded. Rather than overriding this method directly, instead override the nth method.
You can see that it takes self by value (consumes the input iterator) and returns a new iterator. This is not a method which consumes the first n elements of the iterator in-place, it's one which converts the existing iterator into one which skips the first n elements.
So instead of:
let mut afteriter = x.chars();
afteriter.skip(1);
you just write:
let mut afteriter = x.chars().skip(1);
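Put together, the pairing of each character with its successor could look like the sketch below. The function name adjacent_pairs is hypothetical; both iterators borrow the same string, so nothing is copied except the individual chars:

```rust
/// Pairs every character of `s` with the character that follows it.
/// Hypothetical helper; two independent iterators borrow the same `&str`.
fn adjacent_pairs(s: &str) -> Vec<(char, char)> {
    s.chars()
        // `skip` consumes the second iterator and returns the adapted one.
        .zip(s.chars().skip(1))
        .collect()
}
```

zip stops as soon as the shorter iterator ends, so the last character has no pair of its own.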
"I also get a warning telling me that zipped does not need to be mutable."
That's because Rust's for loop uses the IntoIterator trait, which moves the iterable into the loop. It's not creating a mutable reference, it's just consuming whatever the RHS is.
Therefore it doesn't care whether the variable is mutable. You do need mut if you iterate explicitly (calling next yourself), or if you call some other "terminal" method that takes &mut self (e.g. nth or try_fold or all), or if you want to iterate on the mutable reference (that's mostly useful for collections though), but not to hand off iterators to some other combinator method, or to a for loop.
A for loop takes self, if you will. Just as for_each does in fact.
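A small sketch of that difference, with hypothetical function names: the first function calls next explicitly and so needs a mut binding, the second hands an immutable binding to a for loop.

```rust
/// Calling `next` directly takes `&mut self`, so the binding must be `mut`.
fn first_then_rest(s: &str) -> (Option<char>, String) {
    let mut it = s.chars();
    let first = it.next();
    // `collect` takes the iterator by value and drains what is left.
    let rest: String = it.collect();
    (first, rest)
}

/// A `for` loop moves the iterator into the loop, so no `mut` is needed.
fn count_chars(s: &str) -> usize {
    let it = s.chars();
    let mut count = 0;
    for _ in it {
        count += 1;
    }
    count
}
```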
Thanks to @Stargateur for giving me the solution. The .skip(1) takes ownership of afteriter and returns a new iterator that yields everything but the first element. What was happening before was that .skip moved afteriter and the returned iterator was thrown away, so the variable could not be used anymore (I am pretty sure).

Creating word iterator from line iterator

I have a string iterator lines that I get from stdin with
use std::io::{self, BufRead};
let mut stdin = io::stdin();
let lines = stdin.lock().lines().map(|l| l.unwrap());
The lines iterator yields values of type String, not &str. I want to create an iterator that iterates over the input words instead of lines. It seems like this should be doable but my naive attempt does not work:
let words = lines.flat_map(|l| l.split_whitespace());
The compiler tells me that l is being dropped while still borrowed, which makes sense:
error[E0597]: `l` does not live long enough
--> src/lib.rs:6:36
|
6 | let words = lines.flat_map(|l| l.split_whitespace());
| ^ - `l` dropped here while still borrowed
| |
| borrowed value does not live long enough
7 | }
| - borrowed value needs to live until here
Is there some other clean way that accomplishes this?
In your example code, lines is an iterator over the lines read in from the reader you have obtained from stdin. As you say, it returns String instances, but you are not storing them anywhere.
split_whitespace is a method of str, which String reaches through Deref. It is defined like this:
pub fn split_whitespace(&self) -> SplitWhitespace<'_>
So, it takes a reference to a string - it does not consume the string. It returns an iterator that yields string slices &str - which reference portions of the string, but don't own it.
In fact as soon as the closure you have passed to flat_map is done with the String, no-one owns it, so it is dropped. That would leave the &str yielded by words dangling, thus the error.
One solution is to collect the lines into a vector, like this:
let lines: Vec<String> = stdin.lock().lines().map(|l| l.unwrap()).collect();
let words = lines.iter().flat_map(|l| l.split_whitespace());
The String instances are kept in the Vec<String>, which can live on so that the &str yielded by words have something to refer to.
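As a self-contained sketch of this first approach, the function below reads from any BufRead source rather than stdin so that it can be tried on an in-memory buffer. The name longest_word and the choice of finding the longest word are illustrative assumptions only:

```rust
use std::io::BufRead;

/// Reads all lines, then borrows words out of the stored lines.
/// Hypothetical example: returns the longest word, if there is any.
fn longest_word<R: BufRead>(reader: R) -> Option<String> {
    // The Vec owns every line, so the `&str` words below stay valid.
    let lines: Vec<String> = reader.lines().map(|l| l.unwrap()).collect();
    let words = lines.iter().flat_map(|l| l.split_whitespace());
    // Copy only the single result out before `lines` is dropped.
    words.max_by_key(|w| w.len()).map(str::to_owned)
}
```

A byte slice such as "a bb\nccc d".as_bytes() implements BufRead and can stand in for stdin.lock() when experimenting.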
If there were a lot of lines, and you did not want to keep them all in memory, you might prefer to do it a line at a time:
let lines = stdin.lock().lines().map(|l| l.unwrap());
let words = lines.flat_map(|l| {
l.split_whitespace()
.map(|s| s.to_owned())
.collect::<Vec<String>>()
.into_iter()
});
Here the words of each line are collected into a Vec, a line at a time. The trade-off is less overall memory consumption, against the overhead of constructing a Vec<String> for each line and copying each word into it.
You might have been hoping for a zero-copy implementation, which consumed the Strings that lines produces. I think that would be possible to create, by creating a split_whitespace() function that takes ownership of the String and returns an iterator that owns the string.
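One possible shape for such an owning iterator is sketched below. Everything here (OwnedWords, owned_words, the pos field) is hypothetical, and it is not truly zero-copy: a standard Iterator cannot lend out borrows of its own fields, so each word is still copied into a String. What it does avoid is the intermediate Vec per line.

```rust
use std::io::BufRead;

/// Hypothetical iterator that owns one line and yields its words.
struct OwnedWords {
    line: String,
    pos: usize, // byte offset of the part not yet scanned
}

impl OwnedWords {
    fn new(line: String) -> Self {
        OwnedWords { line, pos: 0 }
    }
}

impl Iterator for OwnedWords {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        // Skip leading whitespace in the unscanned remainder.
        let rest = self.line[self.pos..].trim_start();
        if rest.is_empty() {
            return None;
        }
        let start = self.line.len() - rest.len();
        // The word ends at the next whitespace character or at the end.
        let len = rest.find(char::is_whitespace).unwrap_or(rest.len());
        let word = rest[..len].to_owned();
        self.pos = start + len;
        Some(word)
    }
}

/// Turns a line source into a stream of words, one line in memory at a time.
fn owned_words<R: BufRead>(reader: R) -> impl Iterator<Item = String> {
    reader.lines().map(|l| l.unwrap()).flat_map(OwnedWords::new)
}
```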

How do I create an iterator of lines from a file that have been split into pieces?

I have a file that I need to read line-by-line and break into two sentences separated by a "=". I am trying to use iterators, but I can't find how to use it properly within split. The documentation says that std::str::Split implements the trait, but I'm still clueless how to use it.
use std::{
fs::File,
io::{prelude::*, BufReader},
};
fn example(path: &str) {
for line in BufReader::new(File::open(path).expect("Failed at opening file.")).lines() {
let words = line.unwrap().split("="); //need to make this an iterable
}
}
How can I use a trait I know is already implemented into something like split?
As @Mateen commented, split already returns an iterable. To fix the lifetime problems, save the value returned by unwrap() into a variable before calling split.
I'll try to explain the lifetime issue here.
First it really helps to look at the function signatures.
pub fn unwrap(self) -> T
pub fn split<'a, P: Pattern<'a>>(&'a self, pat: P) -> Split<'a, P>
unwrap is pretty simple, it takes ownership of itself and returns the inner value.
split looks scary, but it's not too difficult, 'a is just a name for the lifetime, and it just states how long the return value can be used for. In this case it means that both the input arguments must live at least as long as the return value.
//                               Takes by reference, no ownership change
//                               v
pub fn split<'a, P: Pattern<'a>>(&'a self, pat: P) -> Split<'a, P>
//           ^              ^     ^                         ^
//           |              |     |                         |
//           |              Both of these values must       |
//           |              last longer than ---------------+
//           This just declares a name.
This is because split doesn't copy any of the string, it just points to the positions in the original string where the splits take place. If the original string were dropped while the Split was still in use, the Split would point to invalid data, and the lifetime is what lets the compiler rule that out.
A value lives until it goes out of scope (unless its ownership is passed to something else): that is at the closing } of its block if it is bound to a name (e.g. with let), or at the end of the statement (the ;) if it is an unnamed temporary.
That's why there is a lifetime problem in your code:
for line in std::io::BufReader::new(std::fs::File::open(path).expect("Failed at opening file.")).lines() {
let words = line
.unwrap() // <--- Unwrap consumes `line`, `line` can not be used after calling unwrap(),
.split("=") // Passed unwrap()'s output to split as a reference
; //<-- end of line, unwrap()'s output is dropped due to it not being saved to a variable, the result of split now points to nothing, so the compiler complains.
}
Solutions
Saving the return value of unwrap()
for line in std::io::BufReader::new(std::fs::File::open("abc").expect("Failed at opening file.")).lines() {
let words = line.unwrap();
let words_split = words.split("=");
} // <--- `words`' lifetime ends here, but there are no lifetime issues since `words_split` also ends here.
You can rename words_split to words to shadow the original variable if you don't want to clutter variable names. This doesn't cause an issue either, since a shadowed variable is not dropped immediately, but at the end of its original scope.
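A minimal sketch of the shadowing variant, under the assumption that each line holds exactly two pieces around a single "=". The function name split_pair and the trimming are illustrative choices, not part of the question:

```rust
/// Splits one line read from a file into the two pieces around '='.
/// Hypothetical helper; returns None on read errors or a missing '='.
fn split_pair(line: std::io::Result<String>) -> Option<(String, String)> {
    // The owned String is bound to a name, so it lives until the end of
    // this function body.
    let words = line.ok()?;
    // Shadowing: the String above is hidden, not dropped, so the Split
    // iterator can keep borrowing from it.
    let mut words = words.split('=');
    let left = words.next()?.trim().to_owned();
    let right = words.next()?.trim().to_owned();
    Some((left, right))
}
```

Inside the loop from the question this could be called as split_pair(line) for each item produced by lines().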
Or
Rather than having an iterator of &str, all of which are just fancy pointers into the original string, you can copy each slice out to its own String, removing the reliance on keeping the original string in scope.
There is almost certainly no reason to do this in your case, since copying each slice takes more processing power and more memory, but Rust gives you this control.
let words = line
.unwrap()
.split("=")
.map(|piece|
piece.to_owned() // <--- This copies all the characters in the str into its own String.
).collect::<Vec<String>>()
; // <--- unwrap()'s output dropped here, but it doesn't matter since the pieces no longer point to the original line string.
let words_iterator = words.iter();
collect gives you the error "cannot infer type" if you don't state what you want to collect into; either use the turbofish syntax above, or state it on words, i.e. let words: Vec<String> = ...
You have to call collect because map is lazy and doesn't do anything until the resulting iterator is consumed, but that's out of the scope of this answer.
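For readers curious about that last point, the sketch below counts how often the closure passed to map has run before and after collect. The function name and the use of a Cell as a counter are illustrative assumptions:

```rust
use std::cell::Cell;

/// Returns how many times the `map` closure ran before and after `collect`.
fn calls_before_and_after_collect() -> (u32, u32) {
    let calls = Cell::new(0u32);
    // Building the adapter chain runs nothing yet.
    let pieces = "a=b=c".split('=').map(|piece| {
        calls.set(calls.get() + 1);
        piece.to_owned()
    });
    let before = calls.get();
    // Consuming the iterator is what drives the closure.
    let _owned: Vec<String> = pieces.collect();
    let after = calls.get();
    (before, after)
}
```

The closure has not run at all before collect and has run once per piece afterwards.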

How to get a slice from an Iterator?

I started to use clippy as a linter. Sometimes, it shows this warning:
writing `&Vec<_>` instead of `&[_]` involves one more reference and cannot be
used with non-Vec-based slices. Consider changing the type to `&[...]`,
#[warn(ptr_arg)] on by default
I changed the parameter to a slice but this adds boilerplate on the call side. For instance, the code was:
let names = args.arguments.iter().map(|arg| {
arg.name.clone()
}).collect();
function(&names);
but now it is:
let names = args.arguments.iter().map(|arg| {
arg.name.clone()
}).collect::<Vec<_>>();
function(&names);
otherwise, I get the following error:
error: the trait `core::marker::Sized` is not implemented for the type
`[collections::string::String]` [E0277]
So I wonder if there is a way to convert an Iterator to a slice or avoid having to specify the collected type in this specific case.
"So I wonder if there is a way to convert an Iterator to a slice"
There is not.
An iterator only provides one element at a time, whereas a slice is about getting several elements at a time. This is why you first need to collect all the elements yielded by the Iterator into a contiguous array (Vec) before being able to use a slice.
The first obvious answer is not to worry about the slight overhead, though personally I would prefer placing the type hint next to the variable (I find it more readable):
let names: Vec<_> = args.arguments.iter().map(|arg| {
arg.name.clone()
}).collect();
function(&names);
Another option would be for function to take an Iterator instead (and an iterator of references, at that):
let names = args.arguments.iter().map(|arg| &arg.name);
function(names);
After all, iterators are more general, and you can always "realize" the slice inside the function if you need to.
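A sketch of what such a signature might look like. The Argument struct and the join_names function are hypothetical stand-ins for the asker's types, and joining the names is just an example of work that needs a slice:

```rust
/// Hypothetical stand-in for the asker's argument type.
struct Argument {
    name: String,
}

/// Accepts anything that iterates over `&String`, so callers never collect.
fn join_names<'a, I>(names: I) -> String
where
    I: IntoIterator<Item = &'a String>,
{
    // "Realize" a Vec (and thus a slice) inside the function, only because
    // `join` needs one here.
    let collected: Vec<&str> = names.into_iter().map(String::as_str).collect();
    collected.join(", ")
}
```

A caller would then write join_names(args.iter().map(|arg| &arg.name)), with no type annotation and no clone of the names.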
"So I wonder if there is a way to convert an Iterator to a slice"
There is. (in applicable cases)
Got here searching "rust iter to slice", for my use-case, there was a solution:
fn main() {
// example struct
#[derive(Debug)]
struct A(u8);
let list = vec![A(5), A(6), A(7)];
// list_ref passed into a function somewhere ...
let list_ref: &[A] = &list;
let mut iter = list_ref.iter();
// consume some ...
let _a5: Option<&A> = iter.next();
// now want to eg. return a slice of the rest
let slice: &[A] = iter.as_slice();
println!("{:?}", slice); // [A(6), A(7)]
}
That said, .as_slice is defined on an iter of an existing slice, so the previous answerer was correct in that if you've got, eg. a map iter, you would need to collect it first (so there is something to slice from).
docs: https://doc.rust-lang.org/std/slice/struct.Iter.html#method.as_slice

Resources