Get elements from Vector of tab delimited Strings - string

I have a vector of Strings as in the example below, and for every element in that vector, I want to get the second and third items. I don't know if I should be collecting a &str or String, but I haven't gotten to that part because this does not compile.
Everything is "fine" until I add the slicing [1..]
let elements: Vec<&str> = vec!["foo\tbar\tbaz", "ffoo\tbbar\tbbaz"]
.iter()
.map(|rec| rec.rsplit('\t').collect::<Vec<_>>()[1..])
.collect();
It complains because
the size for values of type `[&str]` cannot be known at compilation time
the trait `std::marker::Sized` is not implemented for `[&str]`rustcE0277

As the compiler tells you, the slicing is broken because in Rust a slice returns, well, the slice. Whose size is unknown at compile-time (hence the compiler complaining that it's unsized).
That's why you normally reference the slice e.g.
&thing[1..]
unless it's a context where it doesn't matter. Or you immediately convert the slice to a vector or array.
However here it would not work, because a slice is a "borrowing" structure, it doesn't own anything. And it borrows the Vec being created inside the map, which means you'll get a borrowing error, because the Vec will be destroyed at the end of the callback, and thus the slice would be referencing invalid memory:
error[E0515]: cannot return value referencing temporary value
--> src/main.rs:5:16
|
5 | .map(|rec| &rec.rsplit('\t').collect::<Vec<_>>()[1..])
| ^------------------------------------^^^^^
| ||
| |temporary value created here
| returns a value referencing data owned by the current function
The solution is to filter the iterator before collecting the vec, using Iterator::skip:
let elements: Vec<&str> = my_vec
.iter()
.map(|rec| rec.rsplit('\t').skip(1).collect::<Vec<_>>())
.collect();
However this means you now have an Iterator<Item=Vec<&str>>, which doesn't collect to a Vec<&str>.
You could always Iterator::flatten the inner vecs, but in reality they're completely unnecessary: you can just Iterator::flat_map each original string into a stream of strings which automatically get folded into the parent:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=f2c33c1b6a30224202357dc4bd5c1d19
let my_vec = vec!["foo\tbar\tbaz", "ffoo\tbbar\tbbaz"];
let elements: Vec<&str> = my_vec
.iter()
.flat_map(|rec| rec.rsplit('\t').skip(1))
.collect();
dbg!(elements);
By the by, the code you're showing doesn't match the description, you say:
for every element in that vector, I want to get the second and third items
but since you're using rsplit what you're getting is the second and first: rsplit will iterate from the end, hence the r for reverse.

Related

How to return a Vec<String> from a collection in rust?

here I am splitting vec into sub-vectors of equal size 4 and then returning a collection, I want the type returned from the collection to be Vec<String>. How can I do so?
let mut split_operations : Vec<String> = vec[2..].chunks(4).collect::<String>();
This generates the following error:
a value of type std::string::String cannot be built from an iterator over elements of type &[std::string::String]
value of type std::string::String cannot be built from std::iter::Iterator<Item=&[std::string::String]>
help: the trait std::iter::FromIterator<&[std::string::String]> is not implemented for std::string::Stringrustc(E0277)
The type parameter of collect() is what the entire thing should be collected to. So the parameter to collect and the actual type are redundant, only one is needed.
However Rust doesn't want to guess how or what to the chunks are supposed to change: here you have an iterator of slices (Iterator<Item=&[String]). You can collect an Iterator<Item=String> to a Vec<String> or to a single String (it'll join all the items), but the intermediate slice means Rust's got no idea what's supposed to happen.
If you want each chunk to become a single string, you need to do that explicitely in a map e.g.
let mut split_operations : Vec<String> = vec[2..].chunks(4)
.map(|c| c.concat())
.collect();
If you want to play (and / or infuriate your colleagues), you can also write the map callback as <[_]>::concat.

Immutable access in rust

I am new to rust from python and have used the functional style in python extensively.
What I am trying to do is to take in a string (slice) (or any iterable) and iterate with a reference to the current index and the next index. Here is my attempt:
fn main() {
// intentionally immutable, this should not change
let x = "this is a
multiline string
with more
then 3 lines.";
// initialize multiple (mutable) iterators over the slice
let mut lineiter = x.chars();
let mut afteriter = x.chars();
// to have some reason to do this
afteriter.skip(1);
// zip them together, comparing the current line with the next line
let mut zipped = lineiter.zip(afteriter);
for (char1, char2) in zipped {
println!("{:?} {:?}", char1, char2);
}
}
I think it should be possible to get different iterators that have different positions in the slice but are referring to the same parts of memory without having to copy the string, but the error I get is as follows:
error[E0382]: use of moved value: `afteriter`
--> /home/alex/Documents/projects/simple-game-solver/src/src.rs:15:35
|
10 | let afteriter = x.chars();
| --------- move occurs because `afteriter` has type `std::str::Chars<'_>`, which does not implement the `Copy` trait
11 | // to have some reason to do this
12 | afteriter.skip(1);
| --------- value moved here
...
15 | let mut zipped = lineiter.zip(afteriter);
| ^^^^^^^^^ value used here after move
I also get a warning telling me that zipped does not need to be mutable.
Is it possible to instantiate multiple iterators over a single variable and if so how can it be done?
Is it possible to instantiate multiple iterators over a single variable and if so how can it be done?
If you check the signature and documentation for Iterator::skip:
fn skip(self, n: usize) -> Skip<Self>
Creates an iterator that skips the first n elements.
After they have been consumed, the rest of the elements are yielded. Rather than overriding this method directly, instead override the nth method.
You can see that it takes self by value (consumes the input iterator) and returns a new iterator. This is not a method which consumes the first n elements of the iterator in-place, it's one which converts the existing iterator into one which skips the first n elements.
So instead of:
let mut afteriter = x.chars();
afteriter.skip(1);
you just write:
let mut afteriter = x.chars().skip(1);
I also get a warning telling me that zipped does not need to be mutable.
That's because Rust for loop uses the IntoIterator trait, which moves the iterable into the loop. It's not creating a mutable reference, it's just consuming whatever the RHS is.
Therefore it doesn't care what the mutability of the variable. You do need mut if you iterate explicitly, or if you call some other "terminal" method (e.g. nth or try_fold or all), or if you want to iterate on the mutable reference (that's mostly useful for collections though), but not to hand off iterators to some other combinator method, or to a for loop.
A for loop takes self, if you will. Just as for_each does in fact.
Thanks to #Stargateur for giving me the solution. The .skip(1) takes ownership of afteriter and returns ownership to a version without the first element. What was happening before was ownership was lost on the .skip and so the variable could not be mutated anymore (I am pretty sure)

Creating word iterator from line iterator

I have a string iterator lines that I get from stdin with
use std::io::{self, BufRead};
let mut stdin = io::stdin();
let lines = stdin.lock().lines().map(|l| l.unwrap());
The lines iterator yields values of type String, not &str. I want to create an iterator that iterates over the input words instead of lines. It seems like this should be doable but my naive attempt does not work:
let words = lines.flat_map(|l| l.split_whitespace());
The compiler tells me that l is being dropped while still borrowed, which makes sense:
error[E0597]: `l` does not live long enough
--> src/lib.rs:6:36
|
6 | let words = lines.flat_map(|l| l.split_whitespace());
| ^ - `l` dropped here while still borrowed
| |
| borrowed value does not live long enough
7 | }
| - borrowed value needs to live until here
Is there some other clean way that accomplishes this?
In your example code, lines is an iterator over the lines read in from the reader you have obtained from stdin. As you say, it returns String instances, but you are not storing them anywhere.
std::string::String::split_whitespace is defined like this:
pub fn split_whitespace(&self) -> SplitWhitespace
So, it takes a reference to a string - it does not consume the string. It returns an iterator that yields string slices &str - which reference portions of the string, but don't own it.
In fact as soon as the closure you have passed to flat_map is done with it, no-one owns it, so it is dropped. That would leave the &str yielded by words dangling, thus the error.
One solution is to collect the lines into a vector, like this:
let lines: Vec<String> = stdin.lock().lines().map(|l| l.unwrap()).collect();
let words = lines.iter().flat_map(|l| l.split_whitespace());
The String instances are kept in the Vec<String>, which can live on so that the &str yielded by words have something to refer to.
If there were a lot of lines, and you did not want to keep them all in memory, you might prefer to do it a line at a time:
let lines = stdin.lock().lines().map(|l| l.unwrap());
let words = lines.flat_map(|l| {
l.split_whitespace()
.map(|s| s.to_owned())
.collect::<Vec<String>>()
.into_iter()
});
Here the words of each line are collected into a Vec, a line at a time. The trade-off is less overall memory consumption, against the overhead of constructing a Vec<String> for each line, and copy each word into it.
You might have been hoping for a zero-copy implementation, which consumed the Strings that lines produces. I think that would be possible to create, by creating a split_whitespace() function that takes ownership of the String and returns an iterator that owns the string.

How to get a slice from an Iterator?

I started to use clippy as a linter. Sometimes, it shows this warning:
writing `&Vec<_>` instead of `&[_]` involves one more reference and cannot be
used with non-Vec-based slices. Consider changing the type to `&[...]`,
#[warn(ptr_arg)] on by default
I changed the parameter to a slice but this adds boilerplate on the call side. For instance, the code was:
let names = args.arguments.iter().map(|arg| {
arg.name.clone()
}).collect();
function(&names);
but now it is:
let names = args.arguments.iter().map(|arg| {
arg.name.clone()
}).collect::<Vec<_>>();
function(&names);
otherwise, I get the following error:
error: the trait `core::marker::Sized` is not implemented for the type
`[collections::string::String]` [E0277]
So I wonder if there is a way to convert an Iterator to a slice or avoid having to specify the collected type in this specific case.
So I wonder if there is a way to convert an Iterator to a slice
There is not.
An iterator only provides one element at a time, whereas a slice is about getting several elements at a time. This is why you first need to collect all the elements yielded by the Iterator into a contiguous array (Vec) before being able to use a slice.
The first obvious answer is not to worry about the slight overhead, though personally I would prefer placing the type hint next to the variable (I find it more readable):
let names: Vec<_> = args.arguments.iter().map(|arg| {
arg.name.clone()
}).collect();
function(&names);
Another option would be for function to take an Iterator instead (and an iterator of references, at that):
let names = args.arguments.iter().map(|arg| &arg.name);
function(names);
After all, iterators are more general, and you can always "realize" the slice inside the function if you need to.
So I wonder if there is a way to convert an Iterator to a slice
There is. (in applicable cases)
Got here searching "rust iter to slice", for my use-case, there was a solution:
fn main() {
// example struct
#[derive(Debug)]
struct A(u8);
let list = vec![A(5), A(6), A(7)];
// list_ref passed into a function somewhere ...
let list_ref: &[A] = &list;
let mut iter = list_ref.iter();
// consume some ...
let _a5: Option<&A> = iter.next();
// now want to eg. return a slice of the rest
let slice: &[A] = iter.as_slice();
println!("{:?}", slice); // [A(6), A(7)]
}
That said, .as_slice is defined on an iter of an existing slice, so the previous answerer was correct in that if you've got, eg. a map iter, you would need to collect it first (so there is something to slice from).
docs: https://doc.rust-lang.org/std/slice/struct.Iter.html#method.as_slice

Why does cloned() allow this function to compile

I'm starting to learn Rust and I tried to implement a function to reverse a vector of strings. I found a solution but I don't understand why it works.
This works:
fn reverse_strings(strings:Vec<&str>) -> Vec<&str> {
let actual: Vec<_> = strings.iter().cloned().rev().collect();
return actual;
}
But this doesn't.
fn reverse_strings(strings:Vec<&str>) -> Vec<&str> {
let actual: Vec<_> = strings.iter().rev().collect(); // without clone
return actual;
}
Error message
src/main.rs:28:10: 28:16 error: mismatched types:
expected `collections::vec::Vec<&str>`,
found `collections::vec::Vec<&&str>`
(expected str,
found &-ptr) [E0308]
Can someone explain to me why? What happens in the second function? Thanks!
So the call to .cloned() is essentially like doing .map(|i| i.clone()) in the same position (i.e. you can replace the former with the latter).
The thing is that when you call iter(), you're iterating/operating on references to the items being iterated. Notice that the vector already consists of 'references', specifically string slices.
So to zoom in a bit, let's replace cloned() with the equivalent map() that I mentioned above, for pedagogical purposes, since they are equivalent. This is what it actually looks like:
.map(|i: & &str| i.clone())
So notice that that's a reference to a reference (slice), because like I said, iter() operates on references to the items, not the items themselves. So since a single element in the vector being iterated is of type &str, then we're actually getting a reference to that, i.e. & &str. By calling clone() on each of these items, we go from a & &str to a &str, just like calling .clone() on a &i64 would result in an i64.
So to bring everything together, iter() iterates over references to the elements. So if you create a new vector from the collected items yielded by the iterator you construct (which you constructed by calling iter()) you would get a vector of references to references, that is:
let actual: Vec<& &str> = strings.iter().rev().collect();
So first of all realize that this is not the same as the type you're saying the function returns, Vec<&str>. More fundamentally, however, the lifetimes of these references would be local to the function, so even if you changed the return type to Vec<& &str> you would get a lifetime error.
Something else you could do, however, is to use the into_iter() method. This method actually does iterate over each element, not a reference to it. However, this means that the elements are moved from the original iterator/container. This is only possible in your situation because you're passing the vector by value, so you're allowed to move elements out of it.
fn reverse_strings(strings:Vec<&str>) -> Vec<&str> {
let actual: Vec<_> = strings.into_iter().rev().collect();
return actual;
}
playpen
This probably makes a bit more sense than cloning, since we are passed the vector by value, we're allowed to do anything with the elements, including moving them to a different location (in this case the new, reversed vector). And even if we don't, the vector will be dropped at the end of that function anyways, so we might as well. Cloning would be more appropriate if we're not allowed to do that (e.g. if we were passed the vector by reference, or a slice instead of a vector more likely).

Resources