How to avoid deep copy when using groupby in polars rust? - rust

I have a dataset on which I need to run groupby operations over different columns. Here is a minimal working example using polars version "0.21.1":
use polars::prelude::*;
use polars_lazy::prelude::*;
use polars::df;
fn main(){
let df = df![
"x1" => ["a", "b", "c", "a"],
"x2" => ["A", "A", "B", "B"],
"y" => [1, 2, 3, 4],
].unwrap();
let lf: LazyFrame = df.lazy();
let out1 = groupby_x1(&lf);
println!("{:?}", out1.collect());
let out2 = groupby_x2(&lf);
println!("{:?}", out2.collect());
}
fn groupby_x1(lf: &LazyFrame) -> LazyFrame {
let lf1: LazyFrame = lf.clone().groupby([col("x1")]).agg([
col("y").sum().alias("y_sum"),
]);
lf1
}
fn groupby_x2(lf: &LazyFrame) -> LazyFrame {
let lf1: LazyFrame = lf.clone().groupby([col("x2")]).agg([
col("y").sum().alias("y_sum"),
]);
lf1
}
But in this code I am making deep copies of the whole LazyFrame lf (using lf.clone()). How can I avoid that? If I replace lf.clone() with lf in the functions groupby_x1 and groupby_x2, I get the following error:
error[E0507]: cannot move out of `*lf` which is behind a shared reference
--> src/main.rs:22:24
|
22 | let lf1: LazyFrame = lf.groupby([col("x1")]).agg([
| ^^^^^^^^^^^^^^^^^^^^^^^ move occurs because `*lf` has type `polars_lazy::frame::LazyFrame`, which does not implement the `Copy` trait
error[E0507]: cannot move out of `*lf` which is behind a shared reference
--> src/main.rs:29:24
|
29 | let lf1: LazyFrame = lf.groupby([col("x2")]).agg([
| ^^^^^^^^^^^^^^^^^^^^^^^ move occurs because `*lf` has type `polars_lazy::frame::LazyFrame`, which does not implement the `Copy` trait
For more information about this error, try `rustc --explain E0507`.
error: could not compile `polars_try` due to 2 previous errors

Polars Series are a newtype around Arc<Vec<ArrowRef>>. When you clone a DataFrame only the reference count of the Arc is incremented.
In other words, polars never does deep clones. Clones of a DataFrame are super cheap.
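To illustrate the point with the standard library only: cloning an Arc copies a pointer and bumps a reference count, while the underlying buffer stays shared. The Column type and the shares_buffer helper below are hypothetical stand-ins for illustration, not polars' actual Series definition.

```rust
use std::sync::Arc;

// Hypothetical stand-in for a column whose data lives behind an Arc.
#[derive(Clone)]
struct Column {
    values: Arc<Vec<i64>>,
}

// Returns true when both columns point at the same heap buffer.
fn shares_buffer(a: &Column, b: &Column) -> bool {
    Arc::ptr_eq(&a.values, &b.values)
}

fn main() {
    let original = Column { values: Arc::new(vec![1, 2, 3, 4]) };
    // Cloning copies the Arc pointer and increments the count; the Vec is not copied.
    let copy = original.clone();
    println!("shared buffer: {}", shares_buffer(&original, &copy));
    println!("strong count: {}", Arc::strong_count(&original.values));
}
```

Both handles report the same buffer, which is why the cost of such a clone does not depend on the number of rows.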

From the documentation, LazyFrame:
Lazy abstraction over an eager DataFrame. It really is an abstraction
over a logical plan. The methods of this struct will incrementally
modify a logical plan until output is requested (via collect)
This means there is no deep copy of the DataFrame: nothing is executed until you actually collect the plan.
Hence you have two options:
Keep cloning the LazyFrame if you want to keep the original plan intact.
Take ownership of the plan, groupby_x1(lf: LazyFrame), and let the caller of the function clone the original plan if it is still needed.
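The second option can be sketched without polars. The Plan type below is a hypothetical stand-in for a lazy query plan (shared source data plus a list of pending steps); group_by and group_by_x1 are illustrative names, not the real LazyFrame API.

```rust
use std::sync::Arc;

// Hypothetical stand-in for a lazy query plan: shared source data plus a
// list of pending steps. Not polars' real LazyFrame.
#[derive(Clone, Debug)]
struct Plan {
    source: Arc<Vec<i64>>,
    steps: Vec<String>,
}

impl Plan {
    fn new(data: Vec<i64>) -> Self {
        Plan { source: Arc::new(data), steps: Vec::new() }
    }

    // Consumes the plan and returns an extended one, like a builder method.
    fn group_by(mut self, key: &str) -> Self {
        self.steps.push(format!("groupby({})", key));
        self
    }
}

// Take the plan by value; the caller decides whether a clone is needed.
fn group_by_x1(plan: Plan) -> Plan {
    plan.group_by("x1")
}

fn main() {
    let base = Plan::new(vec![1, 2, 3, 4]);
    // `base` is needed again afterwards, so the caller clones here. The clone
    // shares the Arc and copies only the small list of steps.
    let out1 = group_by_x1(base.clone());
    // Last use of `base`: hand it over without cloning.
    let out2 = group_by_x1(base);
    println!("{:?}\n{:?}", out1.steps, out2.steps);
}
```

With this signature the function itself never clones; the cost of a clone is visible at the call site and is paid only where the original plan is reused.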

Related

Converting a Utf8 Series into a Series of List<Utf8> via a custom function in Rust polars

I have a Utf8 column in my DataFrame, and from that I want to create a column of List<Utf8>.
In particular, for each row I take the text of an HTML document, use soup to parse out all the <p> paragraph elements, and store the text of each paragraph as a Vec<String> or Vec<&str>. I have this as a standalone function:
fn parse_paragraph(s: &str) -> Vec<&str> {
let soup = Soup::new(s);
soup.tag(p).find_all().iter().map(|&p| p.text()).collect()
}
In trying to adapt the few available examples of applying custom functions in Rust polars, I can't seem to get the conversion to compile.
Take this MVP example, using a simpler string-to-vec-of-strings example, borrowing from the Iterators example from the documentation:
use polars::prelude::*;
fn vector_split(text: &str) -> Vec<&str> {
text.split(' ').collect()
}
fn vector_split_series(s: &Series) -> PolarsResult<Series> {
let output : Series = s.utf8()
.expect("Text data")
.into_iter()
.map(|t| t.map(vector_split))
.collect();
Ok(output)
}
fn main() {
let df = df! [
"text" => ["a cat on the mat", "a bat on the hat", "a gnat on the rat"]
].unwrap();
df.clone().lazy()
.select([
col("text").apply(|s| vector_split_series(&s), GetOutput::default())
.alias("words")
])
.collect();
}
(Note: I know there is an in-built split function for utf8 Series, but I needed a simpler example than parsing HTML)
I get the following error from cargo check:
error[E0277]: a value of type `polars::prelude::Series` cannot be built from an iterator over elements of type `Option<Vec<&str>>`
--> src/main.rs:11:27
|
11 | let output : Series = s.utf8()
| ___________________________^
12 | | .expect("Text data")
13 | | .into_iter()
14 | | .map(|t| t.map(vector_split))
| |_____________________________________^ value of type `polars::prelude::Series` cannot be built from `std::iter::Iterator<Item=Option<Vec<&str>>>`
15 | .collect();
| ------- required by a bound introduced by this call
|
= help: the trait `FromIterator<Option<Vec<&str>>>` is not implemented for `polars::prelude::Series`
= help: the following other types implement trait `FromIterator<A>`:
<polars::prelude::Series as FromIterator<&'a bool>>
<polars::prelude::Series as FromIterator<&'a f32>>
<polars::prelude::Series as FromIterator<&'a f64>>
<polars::prelude::Series as FromIterator<&'a i32>>
<polars::prelude::Series as FromIterator<&'a i64>>
<polars::prelude::Series as FromIterator<&'a str>>
<polars::prelude::Series as FromIterator<&'a u32>>
<polars::prelude::Series as FromIterator<&'a u64>>
and 15 others
note: required by a bound in `std::iter::Iterator::collect`
What is the correct idiom for this kind of procedure? Is there a simpler way to apply a function?
For future seekers, I will explain the general solution and then the specific code to make the example work. I'll also point out some gotchas for this specific example.
Explanation
If you need to use a custom function instead of using the convenient Expr expressions, at the core of it you'll need to make a function that converts the Series of the input column into a Series backed by a ChunkedArray of the correct output type. This function is what you give to map in the select statement in main. The type of the ChunkedArray is the type you provide as GetOutput.
The code inside vector_split_series in the question works for conversion functions of standard numeric types, or List of numeric types. It does not work automatically for Lists of Utf8 strings, for example, as they are treated specially for ChunkedArrays. This is for performance reasons. You need to build up the Series explicitly, via the correct type builder.
In the question's case, we need to use a ListUtf8ChunkedBuilder which will create a ChunkedArray of List<Utf8>.
So in general, the question's code works for conversion outputs that are numeric or Lists of numerics. But for lists of strings, you need to use a ListUtf8ChunkedBuilder.
Correct code
The correct code for the question's example looks like this:
use polars::prelude::*;
fn vector_split(text: &str) -> Vec<String> {
text.split(' ').map(|x| x.to_owned()).collect()
}
fn vector_split_series(s: Series) -> PolarsResult<Series> {
let ca = s.utf8()?;
let mut builder = ListUtf8ChunkedBuilder::new("words", s.len(), ca.get_values_size());
ca.into_iter()
.for_each(|opt_s| match opt_s {
None => builder.append_null(),
Some(s) => {
builder.append_series(
&Series::new("words", vector_split(s).into_iter() )
)
}});
Ok(builder.finish().into_series())
}
fn main() {
let df = df! [
"text" => ["a cat on the mat", "a bat on the hat", "a gnat on the rat"]
].unwrap();
let df2 = df.clone().lazy()
.select([
col("text")
.apply(|s| vector_split_series(s), GetOutput::from_type(DataType::List(Box::new(DataType::Utf8))))
// Can instead use default if the compiler can determine the types
//.apply(|s| vector_split_series(s), GetOutput::default())
.alias("words")
])
.collect()
.unwrap();
println!("{:?}", df2);
}
The core is in vector_split_series. It has that function definition to be used in map.
The match statement is required because Series can have null entries, and to preserve the length of the Series, you need to pass nulls through. We use the builder here so it appends the appropriate null.
For non-null entries the builder needs to append Series. Normally you can append_from_iter, but there is (as of polars 0.26.1) no implementation of FromIterator for Iterator<Item=Vec<T>>. So you need to convert the collection into an iterator on values, and that iterator into a new Series.
Once the larger ChunkedArray (of type ListUtf8ChunkedArray) is built, you can convert it into a PolarsResult<Series> to return to map.
Gotcha
In the above example, vector_split can return Vec<String> or Vec<&str>. This is because split creates its iterator of &str in a nice way.
If you are using something more complicated, like my original example of extracting text via Soup queries, and it outputs iterators of &str, the references may be considered owned by a temporary, and you will then get errors about returning references to temporaries.
This is why in the working code, I pass Vec<String> back to the builder, even though it is not strictly required.
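The gotcha can be reproduced with the standard library alone. In this sketch, extract_text is a hypothetical stand-in for a parser call that returns owned text; it is not the Soup API.

```rust
// Hypothetical stand-in for a parser call that returns owned text,
// as an HTML library might.
fn extract_text(html: &str) -> String {
    html.replace("<p>", " ").replace("</p>", " ")
}

// Returning Vec<&str> here would not compile: the slices would borrow from
// `text`, a local String that is dropped when the function returns.
// Converting each piece to an owned String avoids the problem.
fn words(html: &str) -> Vec<String> {
    let text = extract_text(html);
    text.split_whitespace().map(str::to_owned).collect()
}

fn main() {
    println!("{:?}", words("<p>a cat</p><p>on the mat</p>"));
}
```

The extra allocation per string is the price for not tying the result's lifetime to an intermediate value.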

How to create my own type wrapping the array type, in order to avoid the Copy trait?

You know how you can create a new type by wrapping an existing type just because you want to avoid the Copy trait?! For example, you have bool and you want a new type MyBool, you do struct MyBool(bool); and then you can use that to avoid the Copy trait.
What I want to know is how to do that (create a new type, that is) for an array type, e.g. for the type of a in let a = [0; 4];, which is [{integer}; 4], an array of four integer elements. Can you do it for an array of a specific type X and a specific length Y? Or only for an array of a generic type T, with or without a specific array length embedded in the new type?
So, for bool, the following code doesn't show any warnings/errors(even if you run cargo clippy on it):
#![deny(clippy::all, clippy::pedantic, clippy::nursery, warnings, future_incompatible,
nonstandard_style, non_ascii_idents, clippy::restriction, rust_2018_compatibility,
rust_2021_compatibility, unused)]
#![allow(clippy::print_stdout, clippy::use_debug, clippy::missing_docs_in_private_items)]
#![allow(clippy::blanket_clippy_restriction_lints)] //workaround clippy
// might want to deny later:
//#![allow(clippy::default_numeric_fallback)] // might want to deny later!
//#![allow(clippy::dbg_macro)]
fn main() {
let mut has_spawned:bool=false;
//...
let handler=std::thread::spawn(move || {
println!("Before {has_spawned}!"); // false
has_spawned=true;
println!("Set {has_spawned}!"); // true
});
#[allow(clippy::unwrap_used)]
handler.join().unwrap();
println!("Current {has_spawned}!"); // false
}
But I want it to show me errors, thus, I use a new type for bool, MyBool:
#![deny(clippy::all, clippy::pedantic, clippy::nursery, warnings, future_incompatible,
nonstandard_style, non_ascii_idents, clippy::restriction, rust_2018_compatibility,
rust_2021_compatibility, unused)]
#![allow(clippy::print_stdout, clippy::use_debug, clippy::missing_docs_in_private_items)]
#![allow(clippy::blanket_clippy_restriction_lints)] //workaround clippy
#[derive(Debug)]
struct MyBool(bool);
fn main() {
let mut my_has_spawned:MyBool=MyBool(false);
//...
let handler=std::thread::spawn(move || {
println!("Before {my_has_spawned:?}!"); //MyBool(false)
//my_has_spawned=MyBool(true);
my_has_spawned.0=true;
println!("Set {my_has_spawned:?}!"); // MyBool(true)
});
#[allow(clippy::unwrap_used)]
handler.join().unwrap();
println!("Current {my_has_spawned:#?}!"); // value borrowed here after move, XXX: this is what
// I wanted!
}
and now it shows me exactly what I want to see:
error[E0382]: borrow of moved value: `my_has_spawned`
--> /home/user/sandbox/rust/copy_trait/gotcha1/copy_trait_thread_newtype/src/main.rs:20:24
|
10 | let mut my_has_spawned:MyBool=MyBool(false);
| ------------------ move occurs because `my_has_spawned` has type `MyBool`, which does not implement the `Copy` trait
11 | //...
12 | let handler=std::thread::spawn(move || {
| ------- value moved into closure here
13 | println!("Before {my_has_spawned:?}!"); //MyBool(false)
| -------------- variable moved due to use in closure
...
20 | println!("Current {my_has_spawned:#?}!"); // value borrowed here after move, XXX: this is what
| -------------------^^^^^^^^^^^^^^-------
| | |
| | value borrowed here after move
| in this macro invocation (#1)
|
::: /usr/lib/rust/1.64.0/lib/rustlib/src/rust/library/std/src/macros.rs:101:1
|
101 | macro_rules! println {
| -------------------- in this expansion of `println!` (#1)
...
106 | $crate::io::_print($crate::format_args_nl!($($arg)*));
| --------------------------------- in this macro invocation (#2)
|
::: /usr/lib/rust/1.64.0/lib/rustlib/src/rust/library/core/src/macros/mod.rs:906:5
|
906 | macro_rules! format_args_nl {
| --------------------------- in this expansion of `$crate::format_args_nl!` (#2)
For more information about this error, try `rustc --explain E0382`.
error: could not compile `copy_trait_thread_newtype` due to previous error
Similarly to the above, I want to use the new type for this array program that yields no warnings or errors(even through cargo clippy):
#![deny(clippy::all, clippy::pedantic, clippy::nursery, warnings, future_incompatible,
nonstandard_style, non_ascii_idents, clippy::restriction, rust_2018_compatibility,
rust_2021_compatibility, unused)]
#![allow(clippy::print_stdout, clippy::use_debug, clippy::missing_docs_in_private_items)]
#![allow(clippy::blanket_clippy_restriction_lints)] //workaround clippy
// might want to deny later:
#![allow(clippy::default_numeric_fallback)] // might want to deny later!
#![allow(clippy::dbg_macro)]
//src: https://users.rust-lang.org/t/rust-book-suggestion-add-a-section-regarding-copy-vs-move/1549/2
fn foo(mut x: [i32; 4]) {
println!("x(before) = {:?}", x);
x = [1, 2, 3, 4];
println!("x(after) = {:?}", x);
}
//src: https://stackoverflow.com/a/58119924/19999437
fn print_type_of<T>(_: &T) {
//println!("{}", std::any::type_name::<T>());
println!("{}", core::any::type_name::<T>());
}
//struct MyArray(array); //TODO: how?
fn main() {
let a = [0; 4];
//a.something();//method not found in `[{integer}; 4]`
//a=1;//so this is an array
//dbg!(a);
println!("{:#?}", print_type_of(&a)); // i32
foo(a); //sneakily copied! thanks Copy trait!
println!("a = {:?}", a);//unchanged, doh! but since it was just copied above, can use it here
//without errors!
}
output of that is:
[i32; 4]
()
x(before) = [0, 0, 0, 0]
x(after) = [1, 2, 3, 4]
a = [0, 0, 0, 0]
So you see, I want a new type in order to avoid the copying of a when foo(a); is called, so that the computer/compiler can keep track of when I would introduce such easy bugs in my code, by error-ing instead, which happens only when a is moved(instead of just copied) when foo(a); is called.
Side note: do you think that this clippy lint https://github.com/rust-lang/rust-clippy/issues/9061 if implemented would be able to warn/err for me in such cases? That would be cool!
Personally I dislike the Copy trait for the only reason that it bypasses the borrow checker and thus allows you to write code that introduces subtle bugs. ie. you can keep using the stale value of the "moved"(copied) variable and the compiler won't complain.
Even if there are better ways to do what I want, please also do answer the title question: how I can wrap the array type in my own new type?
You can wrap an array the same way you would wrap a bool:
struct MyArray ([i32; 4]);
And you can use generics if you don't want to redefine different wrapper types for each array:
struct MyArray<T, const N: usize> ([T; N]);
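A minimal sketch of the generic wrapper in use, assuming the foo function from the question is adapted to take and return the wrapper. Because MyArray does not derive Copy, passing it by value moves it.

```rust
// Newtype over a fixed-size array; deliberately does NOT derive Copy or Clone.
#[derive(Debug, PartialEq)]
struct MyArray<T, const N: usize>([T; N]);

// Takes ownership of the wrapper, overwrites the contents, and hands it back.
fn foo(mut x: MyArray<i32, 4>) -> MyArray<i32, 4> {
    x.0 = [1, 2, 3, 4];
    x
}

fn main() {
    let a = MyArray([0; 4]);
    let b = foo(a); // `a` is moved here, not copied
    // println!("{:?}", a); // error[E0382]: borrow of moved value: `a`
    println!("b = {:?}", b);
}
```

Uncommenting the marked line produces the move error the question asks for.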
I don't understand why you don't want your variable to be copied. Your variable being copied isn't a bug or won't create bugs.
point 1
The only way in rust to modify data via a function is either :
to pass a mutable reference of your variable to a function
to assign the result of the function back to your original mutable value
In the example you show, your variable isn't even defined as mutable in the first place, there is no way it would change.
point 2
The Copy trait doesn't bypass the borrow checker, as the borrow checker's only goals are:
to make sure a pointer always points to valid data
to make sure there is always only one owner of the data
In the case you show, you don't involve any pointer, and the data itself is duplicated, so the borrow checker won't give a crap, it isn't even involved. Whatever happens in the function stays in the function.
anyway
Now if you want, for whatever reason, the borrow checker to show you errors in this situation (once again, there is no reason for the borrow checker to be involved), the proper way would probably be to give ownership of your variable to a dedicated type like a Box, and use the Box for your operations (Box doesn't implement the Copy trait).
The following code involving a bool works
fn main() {
let a = false;
effect(a);
println!("main function : {a}");
}
fn effect(mut a: bool) {
println!("effect before : {a}");
a = !a;
println!("effect after : {a}");
}
The following code involving a bool wrapped inside a box doesn't work
fn main() {
    let a = Box::from(false);
    effect(a);
    println!("main function : {a}"); // error
}
fn effect(mut a: Box<bool>) {
    println!("effect before : {a}");
    *a = !*a;
    println!("effect after : {a}");
}
Now, once again, I don't understand why anyone would want to do this. This change has a performance impact, as you now need to read your variable from the heap before any operation, instead of just reading it from the stack.

Rust check borrow with the whole HashMap, not check the key, is there any good way?

I want to move the elements of a HashMap<u64, Vec<u64>> from key=1 to key=2:
use std::collections::HashMap;
fn main() {
    let mut arr: HashMap<u64, Vec<u64>> = HashMap::new();
    arr.insert(1, vec![10, 11, 12]); // in reality there are more than 1,000,000 elements, so clone() is not an option
    arr.insert(2, vec![20, 21, 22]);
    // in reality the following operation is inside a recursive closure; the expression is simplified here:
    let mut vec1 = arr.get_mut(&1).unwrap();
    let mut vec2 = arr.get_mut(&2).unwrap();
    // move the elements of key=1 to key=2
    for v in vec1 {
        vec2.push(vec1.pop().unwrap());
    }
}
got error:
error[E0499]: cannot borrow `arr` as mutable more than once at a time
--> src/main.rs:10:20
|
9 | let mut vec1 = arr.get_mut(&1).unwrap();
| --- first mutable borrow occurs here
10 | let mut vec2 = arr.get_mut(&2).unwrap();
| ^^^ second mutable borrow occurs here
11 | for v in vec1 {
| ---- first borrow later used here
Rust checks the borrow against the whole HashMap, not against the individual key.
Is there a good way around this?
It's not clear what the context / constraints are, so depending on those there are various possibilities of different impact and levels of complexity
if you don't care about keeping an empty version of the first entry, you can just use HashMap::remove as it returns the value for the removed key: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=1734142acb598bad2ff460fdff028b6e
otherwise, you can use something like mem::swap to swap the vector held by key 1 with an empty vector, then you can update the vector held by key 2: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=e05941cb4d7ddf8982baf7c9437a0446
because HashMap doesn't have splits, the final option would be to use a mutable iterator: iterators are inherently non-overlapping, so they provide mutable references to individual values, meaning they would let you obtain mutable references to both values simultaneously, though the code is a lot more complicated: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=87d3c0a151382ce2f47dda59dc089d70
While the third option has to traverse the entire hashmap (which is less efficient than directly finding the correct entries by hashing), it has the possible advantage of not "losing" v1's allocation which is useful if v1 will be filled again in the future: in the first option v1 is completely dropped, and in the second option v1 becomes a vector of capacity 0 (unless you swap in a vector with a predefined capacity, but that's still an extra allocation)
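Since the playground links may not stay available, here is a rough sketch of the three options. The function names are made up for illustration, and the second variant uses mem::take, which is equivalent to mem::swap with an empty vector.

```rust
use std::collections::HashMap;
use std::mem;

// Option 1: remove the source entry entirely and append it to the target.
fn move_with_remove(map: &mut HashMap<u64, Vec<u64>>, from: u64, to: u64) {
    if let Some(mut src) = map.remove(&from) {
        map.entry(to).or_default().append(&mut src);
    }
}

// Option 2: leave an empty Vec under the source key (mem::take swaps in
// Vec::new()), then append the taken values to the target.
fn move_with_take(map: &mut HashMap<u64, Vec<u64>>, from: u64, to: u64) {
    let mut src = match map.get_mut(&from) {
        Some(values) => mem::take(values),
        None => return,
    };
    map.entry(to).or_default().append(&mut src);
}

// Option 3: a single iter_mut pass yields non-overlapping mutable references,
// so both vectors can be borrowed at once. The source keeps its allocation.
fn move_with_iter_mut(map: &mut HashMap<u64, Vec<u64>>, from: u64, to: u64) {
    let mut src = None;
    let mut dst = None;
    for (key, values) in map.iter_mut() {
        if *key == from {
            src = Some(values);
        } else if *key == to {
            dst = Some(values);
        }
    }
    if let (Some(src), Some(dst)) = (src, dst) {
        dst.append(src);
    }
}

fn main() {
    let mut arr: HashMap<u64, Vec<u64>> = HashMap::new();
    arr.insert(1, vec![10, 11, 12]);
    arr.insert(2, vec![20, 21, 22]);
    move_with_take(&mut arr, 1, 2);
    println!("{:?}", arr.get(&2));
}
```

None of the variants clones an element: Vec::append moves the values, so the cost is proportional to the number of elements moved, plus a full map traversal in the third variant.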
You can put the Vec into a RefCell moving the borrow check to runtime:
use std::cell::RefCell;
use std::collections::HashMap;
fn main() {
    let mut arr: HashMap<u64, RefCell<Vec<u64>>> = HashMap::new();
    arr.insert(1, RefCell::new(vec![10, 11, 12])); // in reality there are more than 1,000,000 elements, so clone() is not an option
    arr.insert(2, RefCell::new(vec![20, 21, 22]));
    // in reality the following operation is inside a recursive closure; the expression is simplified here:
    let mut vec1 = arr.get(&1).unwrap().borrow_mut();
    let mut vec2 = arr.get(&2).unwrap().borrow_mut();
    // move the elements of key=1 to key=2
    vec2.append(&mut vec1);
}
Tip: Use Vec::append which moves the values from one vector to another.
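One caveat of the RefCell approach is that a conflicting borrow is only detected at runtime: borrow_mut panics if both keys turn out to be the same. A possible guard, sketched here with a hypothetical move_between helper, is to use try_borrow_mut and report failure instead of panicking.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Moves all values from `from` to `to`. Returns false if either key is
// missing or if both keys refer to the same RefCell (second borrow fails).
fn move_between(map: &HashMap<u64, RefCell<Vec<u64>>>, from: u64, to: u64) -> bool {
    let (src, dst) = match (map.get(&from), map.get(&to)) {
        (Some(src), Some(dst)) => (src, dst),
        _ => return false,
    };
    match (src.try_borrow_mut(), dst.try_borrow_mut()) {
        (Ok(mut src), Ok(mut dst)) => {
            dst.append(&mut src);
            true
        }
        // The runtime borrow check rejected one of the borrows.
        _ => false,
    }
}

fn main() {
    let mut arr: HashMap<u64, RefCell<Vec<u64>>> = HashMap::new();
    arr.insert(1, RefCell::new(vec![10, 11, 12]));
    arr.insert(2, RefCell::new(vec![20, 21, 22]));
    println!("moved: {}", move_between(&arr, 1, 2));
    println!("same key: {}", move_between(&arr, 2, 2));
}
```

Note that the map itself is only borrowed immutably here; the mutation happens through the RefCell.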

Are there any constructs I can use to enable mutating a HashMap within a call to HashMap::get?

I'm trying to implement Dijkstra's algorithm in Rust. I am getting an error because I'm mutating a HashMap inside a match expression on a value obtained from .get(). I understand why this is a violation of the borrowing and mutation principles of Rust, but I haven't found a workaround.
I have tried using .entry() and .and_modify() to perform an in-place modification, but I also need to insert and/or modify other keys besides the one being matched on.
Are there any constructs/functions I can use to safely enable this mutation within a get, or, failing that, is there another approach to this algorithm that steers clear of this issue?
(Unfinished) Code Snippet:
use std::collections::{HashMap, HashSet};
struct Graph {
vertices: Vec<i32>,
edges: HashMap<i32, HashMap<i32, i32>>, // start -> {end -> weight}
}
fn dijkstra(start: i32, g: Graph) -> HashMap<i32, HashMap<i32, (i32, Vec<i32>)>> {
let mut known_paths: HashMap<i32, (i32, Vec<i32>)> = HashMap::new(); // end -> (total_weight, path)
known_paths.insert(start, (0, Vec::new()));
let mut current = &start;
let mut unvisited: HashSet<i32> = g.edges.keys().cloned().collect();
while unvisited.len() > 0 {
match known_paths.get(current) {
Some((current_dist, current_path)) => if let Some(ref incident_edges) =
g.edges.get(current)
{
for (neighbor, dist) in incident_edges.iter() {
match known_paths.get(&neighbor) {
Some((old_distance, _old_path)) => if current_dist + dist < *old_distance {
let mut new_path = current_path.clone();
new_path.push(*current);
known_paths.insert(*neighbor, (current_dist + dist, new_path));
},
None => {
let mut new_path = current_path.clone();
new_path.push(*current);
known_paths.insert(*neighbor, (current_dist + dist, new_path));
}
}
}
},
None => panic!("Something went wrong with current={}", current),
}
}
HashMap::new()
}
Error:
error[E0502]: cannot borrow `known_paths` as mutable because it is also borrowed as immutable
--> src/lib.rs:26:29
|
17 | match known_paths.get(current) {
| ----------- immutable borrow occurs here
...
26 | known_paths.insert(*neighbor, (current_dist + dist, new_path));
| ^^^^^^^^^^^ mutable borrow occurs here
...
37 | }
| - immutable borrow ends here
error[E0502]: cannot borrow `known_paths` as mutable because it is also borrowed as immutable
--> src/lib.rs:31:29
|
17 | match known_paths.get(current) {
| ----------- immutable borrow occurs here
...
31 | known_paths.insert(*neighbor, (current_dist + dist, new_path));
| ^^^^^^^^^^^ mutable borrow occurs here
...
37 | }
| - immutable borrow ends here
No.
I am getting an error because I'm mutating a HashMap inside a match expression on a value obtained from .get(). I understand why this is a violation of the borrowing and mutation principles of Rust, but I haven't found a workaround.
I am afraid you have not actually understood the borrowing principles.
The key principle which underpins Rust's safety is: Mutation NAND Aliasing.
This principle is then enforced either at run-time or at compile-time, with a strong preference for compile-time when feasible since it is free of run-time overhead.
In your situation, you have:
A reference inside HashMap, obtained from HashMap::get().
You attempt to modify said HashMap while holding onto the reference.
Let's imagine, for a second, that some construct allowed this code to compile and run; what would happen? BOOM.
When inserting an element in the HashMap, the HashMap may:
Shuffle existing elements, since it is using Robin Hood hashing.
Transfer all elements to another (larger) heap-allocated array.
Therefore, HashMap::insert means that any existing reference to an element of the HashMap may become dangling and point to either another element, or random memory.
There is no safe way to insert into the HashMap while holding a reference to an element of the HashMap.
You need to find a solution which does not involve keeping a reference to the HashMap.
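My advice would be to simply clone the entry: match known_paths.get(current).cloned(). Note that calling .clone() directly on the result of get would only copy the Option<&T>, reference included, so the borrow would remain; .cloned() copies the value itself. Once you have a first working implementation of the algorithm, you can look into possibly improving its performance, as necessity dictates.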
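A minimal sketch of the clone-first approach for a single relaxation step. The relax function and the Paths alias are hypothetical names; Option::cloned is used to obtain an owned copy of the entry, so no reference into the map is alive when insert is called.

```rust
use std::collections::HashMap;

// end -> (total_weight, path), as in the question.
type Paths = HashMap<i32, (i32, Vec<i32>)>;

// Relaxes all edges leaving `current`. `edges` maps neighbor -> weight.
fn relax(known_paths: &mut Paths, current: i32, edges: &HashMap<i32, i32>) {
    // Owned copy of the current entry: the borrow of `known_paths` ends here.
    let (current_dist, current_path) = match known_paths.get(&current).cloned() {
        Some(entry) => entry,
        None => return,
    };
    for (&neighbor, &weight) in edges {
        let candidate = current_dist + weight;
        // This lookup produces a bool, so its borrow ends before the insert.
        let improves = match known_paths.get(&neighbor) {
            Some((old_distance, _)) => candidate < *old_distance,
            None => true,
        };
        if improves {
            let mut new_path = current_path.clone();
            new_path.push(current);
            known_paths.insert(neighbor, (candidate, new_path));
        }
    }
}

fn main() {
    let mut known_paths: Paths = HashMap::new();
    known_paths.insert(0, (0, Vec::new()));
    let mut edges = HashMap::new();
    edges.insert(1, 5);
    edges.insert(2, 3);
    relax(&mut known_paths, 0, &edges);
    println!("{:?}", known_paths.get(&1));
}
```

The clone costs one path copy per visited vertex, which is usually acceptable for a first working version.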

How do I cope with lazy iterators?

I'm trying to sort an array with a map() over an iterator.
struct A {
b: Vec<B>,
}
#[derive(PartialEq, Eq, PartialOrd, Ord)]
struct B {
c: Vec<i32>,
}
fn main() {
let mut a = A { b: Vec::new() };
let b = B { c: vec![5, 2, 3] };
a.b.push(b);
a.b.iter_mut().map(|b| b.c.sort());
}
Gives the warning:
warning: unused `std::iter::Map` that must be used
--> src/main.rs:16:5
|
16 | a.b.iter_mut().map(|b| b.c.sort());
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: #[warn(unused_must_use)] on by default
= note: iterators are lazy and do nothing unless consumed
Which is true, sort() isn't actually called here. This warning is described in the book, but I don't understand why this variation with iter_mut() works fine:
a.b.iter_mut().find(|b| b == b).map(|b| b.c.sort());
As the book you linked to says:
If you are trying to execute a closure on an iterator for its side effects, use for instead.
That way it works, and it's much clearer to anyone reading the code. You should use map when you want to transform a vector to a different one.
I don't understand why this variation with iter_mut() works fine:
a.b.iter_mut().find(|b| b == b).map(|b| b.c.sort());
It works because find is not lazy; it's an iterator consumer. It returns an Option not an Iterator. This might be why it is confusing you, because Option also has a map method, which is what you are using here.
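The difference can be checked with a small experiment. In this sketch the two counting helpers are hypothetical names: one shows that a closure given to Iterator::map never runs if the adaptor is not consumed, the other that Option::map runs its closure immediately. The for loop is the variant recommended for side effects.

```rust
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct B {
    c: Vec<i32>,
}

// Side effects belong in a plain loop: every element is visited eagerly.
fn sort_all(items: &mut [B]) {
    for b in items.iter_mut() {
        b.c.sort();
    }
}

// Counts how often a closure passed to Iterator::map runs when the
// adaptor is dropped without being consumed.
fn calls_without_consumer() -> usize {
    let mut calls = 0;
    let data = [1, 2, 3];
    {
        let _lazy = data.iter().map(|x| {
            calls += 1;
            x * 2
        });
    }
    calls
}

// Option::map is not lazy: the closure runs as soon as map is called.
fn calls_on_option() -> usize {
    let mut calls = 0;
    let _ = Some(1).map(|x| {
        calls += 1;
        x * 2
    });
    calls
}

fn main() {
    let mut items = vec![B { c: vec![5, 2, 3] }];
    sort_all(&mut items);
    println!("{:?}", items);
    println!("iterator map calls: {}", calls_without_consumer());
    println!("option map calls: {}", calls_on_option());
}
```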
As others have said, map is intended for transforming data, without modifying it and without any other side-effects. If you really want to use map, you can map over the collection and assign it back:
fn main() {
    let mut a = A { b: Vec::new() };
    let b = B { c: vec![5, 2, 3] };
    a.b.push(b);
    a.b =
        a.b.into_iter()
            .map(|mut b| {
                b.c.sort();
                b
            })
            .collect();
}
Note that vector's sort method returns (), so you have to explicitly return the sorted vector from the mapping function.
I use for_each.
According to the doc:
It is equivalent to using a for loop on the iterator, although break and continue are not possible from a closure. It's generally more idiomatic to use a for loop, but for_each may be more legible when processing items at the end of longer iterator chains. In some cases for_each may also be faster than a loop, because it will use internal iteration on adaptors like Chain.
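A short sketch of for_each at the end of a chain, assuming a struct B like the one in the question. The filter step and the function name sort_non_empty are added only to make the chain longer; they are not part of the original problem.

```rust
struct B {
    c: Vec<i32>,
}

// for_each consumes the iterator, so the closure runs for every item
// that passes the filter.
fn sort_non_empty(items: &mut [B]) {
    items
        .iter_mut()
        .filter(|b| !b.c.is_empty())
        .for_each(|b| b.c.sort());
}

fn main() {
    let mut items = vec![B { c: vec![5, 2, 3] }, B { c: Vec::new() }];
    sort_non_empty(&mut items);
    println!("{:?}", items[0].c);
}
```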
