Why does Rust's collect method work differently on two almost identical functions?

I was doing rustlings to learn Rust and I just finished iterators3, but I don't understand why these functions:
// Output: Ok([1, 11, 1426, 3])
fn result_with_list() -> Result<Vec<i32>, DivisionError> {
let numbers = vec![27, 297, 38502, 81];
let division_results = numbers.into_iter().map(|n| divide(n, 27));
let x: Result<Vec<i32>, DivisionError> = division_results.collect();
println!("{x:?}");
x
}
and
// Output: [Ok(1), Ok(11), Ok(1426), Ok(3)]
fn list_of_results() -> Vec<Result<i32, DivisionError>> {
let numbers = vec![27, 297, 38502, 81];
let division_results = numbers.into_iter().map(|n| divide(n, 27));
let x:Vec<Result<i32, DivisionError>> = division_results.collect();
println!("{x:?}");
x
}
I don't understand why they return different values even though they are very similar.
(P.S. this is what the divide function looks like: pub fn divide(a: i32, b: i32) -> Result<i32, DivisionError>)
rustling iterators3 exercise

I don't understand why they return different values despite the fact that they are almost identical
Because the part which is different is very relevant: one uses the implementation of FromIterator for Result:
Takes each element in the Iterator: if it is an Err, no further elements are taken, and the Err is returned. Should no Err occur, a container with the values of each Result is returned.
while the other uses the implementation of FromIterator for Vec, which just creates a vector from the iterator.
So the first version accumulates successful results, and returns the first failure if there is one, while the second version just collects all the results regardless of their success or failure.
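The difference is easiest to see when one of the divisions fails. The following is a minimal sketch: the DivisionError variant and the body of divide are assumptions (the question only shows the signature), and the helper names all_or_first_error and every_outcome are made up for illustration.

```rust
#[derive(Debug, PartialEq)]
enum DivisionError {
    DivideByZero, // assumed variant; the question only shows the signature
}

// Assumed implementation of `divide`: fails only when the divisor is zero.
fn divide(a: i32, b: i32) -> Result<i32, DivisionError> {
    if b == 0 {
        Err(DivisionError::DivideByZero)
    } else {
        Ok(a / b)
    }
}

// Uses FromIterator for Result: stops at the first Err and returns it.
fn all_or_first_error(divisors: &[i32]) -> Result<Vec<i32>, DivisionError> {
    divisors.iter().map(|&d| divide(100, d)).collect()
}

// Uses FromIterator for Vec: keeps every individual outcome.
fn every_outcome(divisors: &[i32]) -> Vec<Result<i32, DivisionError>> {
    divisors.iter().map(|&d| divide(100, d)).collect()
}
```

With the divisors [5, 0, 10], the first function returns only the error, while the second returns a three-element vector with the error in the middle. The iterator chain is identical in both; only the requested return type selects the behaviour.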

Related

Can I convert an Iterator of Result to Result of Iterator?

Until now, I have used std::fs::read_to_string and then String.lines's std::str::Lines (which is an Iterator<Item = &str>) to read a file "line by line". This obviously reads the whole file into memory, which is not ideal.
So, there's BufRead.lines() to read a file truly line by line. This returns std::io::Lines (which is an Iterator<Item = Result<String>>).
How do I convert from one iterator type to the other without collecting first?
You cannot transform an Iterator<Item = Result<_, _>> into a Result<Iterator<Item = _>, _>, because until the iterator has actually been run to the end we cannot know whether it will yield an error.
What you can do is to collect() all items ahead of time into a Result<Vec<_>, _> (which of course you can iterate over) since Result implements FromIterator.
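A minimal sketch of that collect, assuming a helper named read_all_lines (not from the original answer). Note that it does hold every line in memory at once, so it trades the streaming behaviour for a single error check up front.

```rust
use std::io::{self, BufRead};

// Collects every line, or returns the first I/O error encountered.
// Works for any buffered reader, e.g. BufReader<File>.
fn read_all_lines<R: BufRead>(reader: R) -> io::Result<Vec<String>> {
    reader.lines().collect()
}
```

Once the Result has been checked, the Vec<String> can be iterated without any further error handling.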
If you're fine with getting Err only for the first Err (and successfully iterating over all items until that), you can also use itertools::process_results():
let result: Result<SomeType, _> = itertools::process_results(iter, |iter| -> SomeType {
// Here we have `iter` of type `Iterator<Item = _>`. Process it and return some result.
});
You can't: there has to be an owner of the values, which in the case of String.lines is the full String.
You can, however, turn the Iterator<Item = Result<String>> into an iterator over Strings (here a line that fails to read becomes an empty String):
use std::fs::File;
use std::io::{BufRead, BufReader};
let read = BufReader::new(File::open("src/main.rs").unwrap());
let lines_iter = read.lines().map(Result::unwrap_or_default);
You can take an Iterator over items of either String or &str like this:
fn solve<T: AsRef<str>>(input: impl Iterator<Item = T>) {
for line in input {
let line = line.as_ref();
// do something with line
}
}
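As a small illustration of why the AsRef<str> bound is convenient, here is a sketch of a function with the same signature shape that does some actual work. The name count_non_empty is hypothetical.

```rust
// Hypothetical helper: counts non-empty lines from any iterator of
// string-like items (String, &str, ...).
fn count_non_empty<T: AsRef<str>>(input: impl Iterator<Item = T>) -> usize {
    let mut count = 0;
    for line in input {
        // `as_ref` gives a &str view regardless of the concrete item type.
        if !line.as_ref().trim().is_empty() {
            count += 1;
        }
    }
    count
}
```

The same function accepts "a\n\nb".lines() (items of type &str) and an iterator over owned Strings, such as the one produced from BufRead::lines after handling the errors.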

Rust string comparison is the same speed as Python; I want to parallelize the program

I am new to Rust. I want to write a function which can later be imported into Python as a module using the pyo3 crate.
Below is the Python implementation of the function I want to implement in Rust:
def pcompare(a, b):
letters = []
for i, letter in enumerate(a):
if letter != b[i]:
letters.append(f'{letter}{i + 1}{b[i]}')
return letters
The first Rust implementation I wrote looks like this:
use pyo3::prelude::*;
#[pyfunction]
fn compare_strings_to_vec(a: &str, b: &str) -> PyResult<Vec<String>> {
if a.len() != b.len() {
panic!(
"Reads are not the same length!
First string is length {} and second string is length {}.",
a.len(), b.len());
}
let a_vec: Vec<char> = a.chars().collect();
let b_vec: Vec<char> = b.chars().collect();
let mut mismatched_chars = Vec::new();
for (mut index,(i,j)) in a_vec.iter().zip(b_vec.iter()).enumerate() {
if i != j {
index += 1;
let mutation = format!("{i}{index}{j}");
mismatched_chars.push(mutation);
}
}
Ok(mismatched_chars)
}
#[pymodule]
fn compare_strings(_py: Python<'_>, m: &PyModule) -> PyResult<()> {
m.add_function(wrap_pyfunction!(compare_strings_to_vec, m)?)?;
Ok(())
}
I built it in --release mode. The module could be imported into Python, but the performance was quite similar to that of the Python implementation.
My first question is: why are the Python and Rust functions similar in speed?
Now I am working on a parallelization implementation in Rust. When just printing the result variable, the function works:
use rayon::prelude::*;
fn main() {
let a: Vec<char> = String::from("aaaa").chars().collect();
let b: Vec<char> = String::from("aaab").chars().collect();
let length = a.len();
let index: Vec<_> = (1..=length).collect();
let mut mismatched_chars: Vec<String> = Vec::new();
(a, index, b).into_par_iter().for_each(|(x, i, y)| {
if x != y {
let mutation = format!("{}{}{}", x, i, y).to_string();
println!("{mutation}");
//mismatched_chars.push(mutation);
}
});
}
However, when I try to push the mutation variable to the mismatched_chars vector:
use rayon::prelude::*;
fn main() {
let a: Vec<char> = String::from("aaaa").chars().collect();
let b: Vec<char> = String::from("aaab").chars().collect();
let length = a.len();
let index: Vec<_> = (1..=length).collect();
let mut mismatched_chars: Vec<String> = Vec::new();
(a, index, b).into_par_iter().for_each(|(x, i, y)| {
if x != y {
let mutation = format!("{}{}{}", x, i, y).to_string();
//println!("{mutation}");
mismatched_chars.push(mutation);
}
});
}
I get the following error:
error[E0596]: cannot borrow `mismatched_chars` as mutable, as it is a captured variable in a `Fn` closure
--> src/main.rs:16:13
|
16 | mismatched_chars.push(mutation);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cannot borrow as mutable
For more information about this error, try `rustc --explain E0596`.
error: could not compile `testing_compare_strings` due to previous error
I tried A LOT of different things. When I do:
use rayon::prelude::*;
fn main() {
let a: Vec<char> = String::from("aaaa").chars().collect();
let b: Vec<char> = String::from("aaab").chars().collect();
let length = a.len();
let index: Vec<_> = (1..=length).collect();
let mut mismatched_chars: Vec<&str> = Vec::new();
(a, index, b).into_par_iter().for_each(|(x, i, y)| {
if x != y {
let mutation = format!("{}{}{}", x, i, y).to_string();
mismatched_chars.push(&mutation);
}
});
}
The error becomes:
error[E0596]: cannot borrow `mismatched_chars` as mutable, as it is a captured variable in a `Fn` closure
--> src/main.rs:16:13
|
16 | mismatched_chars.push(&mutation);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cannot borrow as mutable
error[E0597]: `mutation` does not live long enough
--> src/main.rs:16:35
|
10 | let mut mismatched_chars: Vec<&str> = Vec::new();
| -------------------- lifetime `'1` appears in the type of `mismatched_chars`
...
16 | mismatched_chars.push(&mutation);
| ----------------------^^^^^^^^^-
| | |
| | borrowed value does not live long enough
| argument requires that `mutation` is borrowed for `'1`
17 | }
| - `mutation` dropped here while still borrowed
I suspect that the solution is quite simple, but I cannot see it myself.
You have the right idea, but you will want to use an iterator chain with filter and map to remove items or convert them into different values. Rayon's parallel iterators also provide a collect method, similar to the one on regular iterators, which gathers the items into any type implementing FromParallelIterator (such as Vec<T>).
use rayon::prelude::*;
fn compare_strings_to_vec(a: &str, b: &str) -> Vec<String> {
// Same as with the if statement, but just a little shorter to write
// Plus, it will print out the two values it is comparing if it errors.
assert_eq!(a.len(), b.len(), "Reads are not the same length!");
// Zip the character iterators from a and b together
a.chars().zip(b.chars())
// Iterate with the index of each item
.enumerate()
// Rayon function which turns a regular iterator into a parallel one
// (note: par_bridge does not guarantee that the original order is kept)
.par_bridge()
// Filter out values where the characters are the same
.filter(|(_, (a, b))| a != b)
// Convert the remaining values into an error string
.map(|(index, (a, b))| {
format!("{}{}{}", a, index + 1, b)
})
// Turn the items of this iterator into a Vec (or any other FromParallelIterator type).
.collect()
}
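For comparison, the same filter/map/collect chain also works without rayon. The sketch below is a plain sequential version (the name compare_strings_sequential is made up here); unlike the par_bridge variant, it always returns the mismatches in positional order.

```rust
// Sequential variant of the chain above: no rayon, order preserved.
fn compare_strings_sequential(a: &str, b: &str) -> Vec<String> {
    assert_eq!(a.len(), b.len(), "Reads are not the same length!");
    a.chars()
        .zip(b.chars())
        .enumerate()
        // Keep only positions where the characters differ
        .filter(|(_, (x, y))| x != y)
        // 1-based position between the two characters, e.g. "a4b"
        .map(|(index, (x, y))| format!("{x}{}{y}", index + 1))
        .collect()
}
```

Calling it with "aaaa" and "aaab" gives a single entry, "a4b".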
Optimizing for speed
On the other hand, if you want speed we need to approach this problem from a different direction. You may have noticed, but the rayon version is quite slow since the cost of spawning a thread and using concurrency structures is orders of magnitude more than just simply comparing the bytes in the original thread. In my benchmarks, I found that even with better workload distribution, additional threads were only helpful on my machine (64GB RAM, 16 cores) when the strings were at least 1-2 million bytes long. Given that you have stated they are typically ~30,000 bytes long I think using rayon (or really any other threading for comparisons of this size) will only slow down your code.
Using criterion for benchmarking, I eventually came to this implementation. It generally gets about 2.8156 µs per run on strings of 30,000 characters with 10 different bytes. For comparison, the code posted in the original question usually gets around 61.156 µs on my system under the same conditions, so this should give a ~20x speedup. It can vary a bit, but it consistently got the best results in the benchmark. I'm guessing this should be fast enough that this step is no longer the bottleneck in your code.
The key focus of this implementation is to do the comparisons in batches. We can take advantage of the 128-bit registers on most CPUs to compare the input in 16-byte batches. When an inequality is found, the 16-byte section it covers is re-scanned for the exact position of the discrepancy. This gives a decent boost to performance. I initially thought that a usize would work better, but that was not the case. I also attempted to use the portable_simd nightly feature to write a SIMD version of this code, but I was unable to match the speed of this code. I suspect this was due either to missed optimizations or to my lack of experience with using SIMD effectively.
I was worried about drops in speed due to the alignment of chunks not being enforced for u128 values, but it seems to mostly be a non-issue. First of all, it is generally quite difficult to find allocators which are willing to allocate to an address which is not a multiple of the system word size. Of course, this is due to practicality rather than any actual requirement. When I manually gave it unaligned slices (unaligned for u128s), the speed was not significantly affected. This is why I do not attempt to enforce that the start index of the slice be aligned to align_of::<u128>().
use std::mem::size_of;
fn compare_strings_to_vec(a: &str, b: &str) -> Vec<String> {
let a_bytes = a.as_bytes();
let b_bytes = b.as_bytes();
let remainder = a_bytes.len() % size_of::<u128>();
// Strongly suggest to the compiler we are iterating through u128
a_bytes
.chunks_exact(size_of::<u128>())
.zip(b_bytes.chunks_exact(size_of::<u128>()))
.enumerate()
.filter(|(_, (a, b))| {
let a_block: &[u8; 16] = (*a).try_into().unwrap();
let b_block: &[u8; 16] = (*b).try_into().unwrap();
u128::from_ne_bytes(*a_block) != u128::from_ne_bytes(*b_block)
})
.flat_map(|(word_index, (a, b))| {
fast_path(a, b).map(move |x| word_index * size_of::<u128>() + x)
})
.chain(
fast_path(
&a_bytes[a_bytes.len() - remainder..],
&b_bytes[b_bytes.len() - remainder..],
)
.map(|x| a_bytes.len() - remainder + x),
)
.map(|index| {
format!(
"{}{}{}",
char::from(a_bytes[index]),
index + 1,
char::from(b_bytes[index])
)
})
.collect()
}
/// Plain byte-by-byte comparison with nothing fancy: yields the indices at which the two slices differ
#[inline(always)]
fn fast_path<'a>(a: &'a [u8], b: &'a [u8]) -> impl 'a + Iterator<Item = usize> {
a.iter()
.zip(b.iter())
.enumerate()
.filter_map(|(x, (a, b))| (a != b).then_some(x))
}
You cannot directly mutate the variable mismatched_chars from several threads at once.
You can wrap it in an Arc<RwLock<Vec<String>>> so that each thread takes the write lock before pushing.
use rayon::prelude::*;
use std::sync::{Arc, RwLock};
fn main() {
let a: Vec<char> = String::from("aaaa").chars().collect();
let b: Vec<char> = String::from("aaab").chars().collect();
let length = a.len();
let index: Vec<_> = (1..=length).collect();
let mismatched_chars: Arc<RwLock<Vec<String>>> = Arc::new(RwLock::new(Vec::new()));
(a, index, b).into_par_iter().for_each(|(x, i, y)| {
if x != y {
let mutation = format!("{}{}{}", x, i, y);
mismatched_chars
.write()
.expect("could not acquire write lock")
.push(mutation);
}
});
for mismatch in mismatched_chars
.read()
.expect("could not acquire read lock")
.iter()
{
eprintln!("{}", mismatch);
}
}
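The E0596 error arises because rayon's for_each takes an Fn closure that may run on several threads at the same time, so it cannot hold a mutable borrow of a captured variable; a lock moves the mutation behind a shared reference. The sketch below shows the same idea using only the standard library, as an assumption-laden stand-in: it swaps rayon for std::thread::scope with two hand-split workers and uses a Mutex instead of an RwLock. The function name mismatches_with_mutex is hypothetical.

```rust
use std::sync::Mutex;
use std::thread;

// Hypothetical helper: compares two equal-length strings on two scoped threads.
fn mismatches_with_mutex(a: &str, b: &str) -> Vec<String> {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    assert_eq!(a.len(), b.len(), "Reads are not the same length!");

    // The Mutex hands out exclusive access to one thread at a time.
    let mismatched: Mutex<Vec<(usize, String)>> = Mutex::new(Vec::new());
    let half = a.len() / 2;

    thread::scope(|s| {
        for range in [0..half, half..a.len()] {
            // Shared references are enough; the lock provides the mutability.
            let (a, b, mismatched) = (&a, &b, &mismatched);
            s.spawn(move || {
                for i in range {
                    if a[i] != b[i] {
                        let mutation = format!("{}{}{}", a[i], i + 1, b[i]);
                        mismatched.lock().unwrap().push((i, mutation));
                    }
                }
            });
        }
    });

    // Threads push in no particular order, so restore positional order.
    let mut found = mismatched.into_inner().unwrap();
    found.sort_by_key(|(i, _)| *i);
    found.into_iter().map(|(_, m)| m).collect()
}
```

Every push contends for the same lock, so this pattern is mainly useful for understanding the error. Collecting the results from the iterator avoids the lock entirely.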

How does one get an iterator to the max value element in Rust?

I want to access the element next to the maximal one in a Vec<i32>. I'm looking for something like this:
let v = vec![1, 3, 2];
let it = v.iter().max_element();
assert_eq!(Some(&2), it.next());
In C++, I would go with std::max_element and then just increase the iterator (with or without bounds checking, depending on how adventurous I feel at the moment). The Rust max only returns a reference to the element, which is not good enough for my use case.
The only solution I came up with is using enumerate to get the index of the item - but this seems manual and cumbersome when compared to the C++ way.
I would prefer something in the standard library.
This example is simplified - I actually want to attach to the highest value and then from that point loop over the whole container (possibly with cycle() or something similar).
C++ iterators are not the same as Rust iterators. Rust iterators are forward-only and can only be traversed once. C++ iterators can be thought of as cursors. See What are the main differences between a Rust Iterator and C++ Iterator? for more details.
In order to accomplish your goal in the most generic way possible, you have to walk through the entire iterator to find the maximum value. Along the way, you have to duplicate the iterator each time you find a new maximum value. At the end, you can return the iterator corresponding to the point after the maximum value.
trait MaxElement {
type Iter;
fn max_element(self) -> Self::Iter;
}
impl<I> MaxElement for I
where
I: Iterator + Clone,
I::Item: PartialOrd,
{
type Iter = Self;
fn max_element(mut self) -> Self::Iter {
let mut max_iter = self.clone();
let mut max_val = None;
while let Some(val) = self.next() {
if max_val.as_ref().map_or(true, |m| &val > m) {
max_iter = self.clone();
max_val = Some(val);
}
}
max_iter
}
}
fn main() {
let v = vec![1, 3, 2];
let mut it = v.iter().max_element();
assert_eq!(Some(&2), it.next());
}
See also:
How can I add new methods to Iterator?
I actually want to attach to the highest value and then from that point loop over the whole container (possibly with cycle() or something similar).
In that case, I'd attempt to be more obvious:
fn index_of_max(values: &[i32]) -> Option<usize> {
values
.iter()
.enumerate()
.max_by_key(|(_idx, &val)| val)
.map(|(idx, _val)| idx)
}
fn main() {
let v = vec![1, 3, 2];
let idx = index_of_max(&v).unwrap_or(0);
let (a, b) = v.split_at(idx);
let mut it = b.iter().chain(a).skip(1);
assert_eq!(Some(&2), it.next());
}
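Since the question mentions cycle(), here is a sketch of the same rotation written with cycle, skip and take instead of split_at and chain. The function name after_max is hypothetical, and like the code above it yields every element except the maximum itself, starting just after it.

```rust
// Hypothetical helper: iterate over the slice starting just after the
// maximum, wrapping around, and stopping before the maximum is reached again.
fn after_max(values: &[i32]) -> impl Iterator<Item = &i32> + '_ {
    let idx = values
        .iter()
        .enumerate()
        .max_by_key(|&(_, &val)| val)
        .map_or(0, |(idx, _)| idx);
    values
        .iter()
        .cycle()
        .skip(idx + 1)
        // saturating_sub keeps the empty slice from underflowing
        .take(values.len().saturating_sub(1))
}
```

For the slice [1, 3, 2] this yields 2 and then 1.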
See also:
What's the fastest way of finding the index of the maximum value in an array?
Using max_by_key on a vector of floats
What is the idiomatic way to get the index of a maximum or minimum floating point value in a slice or Vec in Rust?
Find the item in an array with the largest property
A simple solution is to use fold.
The following code prints "largest 99":
let vv:Vec<i32> = (1..100).collect();
let largest = vv.iter().fold(std::i32::MIN, |a,b| a.max(*b));
println!("largest {} ", largest);
If all you want is the value of the item following the maximum, I would do it with a simple call to fold, keeping track of the max found so far and the corresponding next value:
fn main() {
let v = vec![1, 3, 2];
let nxt = v.iter().fold (
(None, None),
|acc, x| {
match acc {
(Some (max), _) if x > max => (Some (x), None),
(Some (max), None) => (Some (max), Some (x)),
(None, _) => (Some (x), None),
_ => acc
}
}
).1;
assert_eq!(Some(&2), nxt);
}
Depending on what you want to do with the items following the max, a similar approach may allow you to do it in a single pass.

How to translate "x-y" to vec![x, x+1, … y-1, y]?

This solution seems rather inelegant:
fn parse_range(&self, string_value: &str) -> Vec<u8> {
let values: Vec<u8> = string_value
.splitn(2, "-")
.map(|part| part.parse().ok().unwrap())
.collect();
{ values[0]..(values[1] + 1) }.collect()
}
Since splitn(2, "-") returns exactly two results for any valid string_value, it would be better to assign the tuple directly to two variables first and last rather than a seemingly arbitrary-length Vec. I can't seem to do this with a tuple.
There are two instances of collect(), and I wonder if it can be reduced to one (or even zero).
Trivial implementation
fn parse_range(string_value: &str) -> Vec<u8> {
let pos = string_value.find(|c| c == '-').expect("No valid string");
let (first, second) = string_value.split_at(pos);
let first: u8 = first.parse().expect("Not a number");
let second: u8 = second[1..].parse().expect("Not a number");
{ first..second + 1 }.collect()
}
I would recommend returning a Result<Vec<u8>, Error> instead of panicking with expect/unwrap.
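A sketch of what that could look like. The RangeError enum is a hypothetical error type introduced only for this example, and the range uses the ..= syntax, which is stable since Rust 1.26 and avoids the overflow that second + 1 would hit for 255.

```rust
use std::num::ParseIntError;

// Hypothetical error type for this example.
#[derive(Debug, PartialEq)]
enum RangeError {
    MissingDash,
    BadNumber(ParseIntError),
}

fn parse_range(string_value: &str) -> Result<Vec<u8>, RangeError> {
    let pos = string_value.find('-').ok_or(RangeError::MissingDash)?;
    let (first, second) = string_value.split_at(pos);
    let first: u8 = first.parse().map_err(RangeError::BadNumber)?;
    // `second` still starts with the '-', so skip it before parsing.
    let second: u8 = second[1..].parse().map_err(RangeError::BadNumber)?;
    Ok((first..=second).collect())
}
```

The caller can then decide whether a malformed input is fatal, instead of the function panicking on its behalf.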
Nightly implementation
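My next thought was about the second collect. Here is an example that needed nightly features when it was written, but does not need any collect at all. (Both impl Trait in return position and the ..= syntax have been stable since Rust 1.26, so on a current toolchain the #![feature] line should be dropped.)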
#![feature(conservative_impl_trait, inclusive_range_syntax)]
fn parse_range(string_value: &str) -> impl Iterator<Item = u8> {
let pos = string_value.find(|c| c == '-').expect("No valid string");
let (first, second) = string_value.split_at(pos);
let first: u8 = first.parse().expect("Not a number");
let second: u8 = second[1..].parse().expect("Not a number");
first..=second
}
fn main() {
println!("{:?}", parse_range("3-7").collect::<Vec<u8>>());
}
Instead of calling collect the first time, just advance the iterator:
let mut values = string_value
.splitn(2, "-")
.map(|part| part.parse().unwrap());
let start = values.next().unwrap();
let end = values.next().unwrap();
Do not call .ok().unwrap() — that converts the Result with useful error information to an Option, which has no information. Just call unwrap directly on the Result.
As already mentioned, if you want to return a Vec, you'll want to call collect to create it. If you want to return an iterator, you can. It's not bad even in stable Rust:
fn parse_range(string_value: &str) -> std::ops::Range<u8> {
let mut values = string_value
.splitn(2, "-")
.map(|part| part.parse().unwrap());
let start = values.next().unwrap();
let end = values.next().unwrap();
start..end + 1
}
fn main() {
assert!(parse_range("1-5").eq(1..6));
}
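Inclusive ranges were not yet stable when this was written, so you had to keep using + 1 or switch to nightly. They have been stable since Rust 1.26, so start..=end (with return type std::ops::RangeInclusive<u8>) now works on stable.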
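On a current stable toolchain (the ..= syntax is stable since Rust 1.26), the same function can be sketched with an inclusive range. This is an adaptation of the code above, not part of the original answer; it still panics on malformed input, as the original does.

```rust
use std::ops::RangeInclusive;

fn parse_range(string_value: &str) -> RangeInclusive<u8> {
    let mut values = string_value
        .splitn(2, '-')
        .map(|part| part.parse::<u8>().unwrap());
    let start = values.next().unwrap();
    let end = values.next().unwrap();
    // Inclusive upper bound: no `+ 1`, so an end of 255 cannot overflow.
    start..=end
}
```

With this version, parse_range("1-5") yields the same items as 1..=5, and an input such as "250-255" works even though 255 + 1 does not fit in a u8.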
Since splitn(2, "-") returns exactly two results for any valid string_value, it would be better to assign the tuple directly to two variables first and last rather than a seemingly arbitrary-length Vec. I can't seem to do this with a tuple.
This is not possible with Rust's type system. You are asking for dependent types, a way for runtime values to interact with the type system. You'd want splitn to return a (&str, &str) for a value of 2 and a (&str, &str, &str) for a value of 3. That gets even more complicated when the argument is a variable, especially when it's set at run time.
The closest workaround would be to have a runtime check that there are no more values:
assert!(values.next().is_none());
Such a check doesn't feel valuable to me.
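For the specific case of exactly two parts, newer versions of the standard library offer str::split_once (stable since Rust 1.52, so it postdates this answer), which returns an Option<(&str, &str)> and allows the direct destructuring the question asks for. A minimal sketch, with the hypothetical name split_bounds:

```rust
// Hypothetical helper: parse "x-y" into its two bounds, or None if the
// input has no '-' or either side is not a valid u8.
fn split_bounds(string_value: &str) -> Option<(u8, u8)> {
    let (first, last) = string_value.split_once('-')?;
    Some((first.parse().ok()?, last.parse().ok()?))
}
```

This sketch deliberately returns an Option and so discards the parse error details; a Result would keep them.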
See also:
What is the correct way to return an Iterator (or any other trait)?
How do I include the end value in a range?

What type signature to use for an iterator generated from a slice?

I have this toy example, but it's what I'm trying to accomplish:
fn lazy_vec() {
let vec: Vec<i64> = vec![1, 2, 3, 4, 5];
let mut iter: Box<Iterator<Item = i64>> = Box::new(vec.into_iter());
iter = Box::new(iter.map(|x| x + 1));
// potentially do additional similar transformations to iter
println!("{:?}", iter.collect::<Vec<_>>());
}
This (if I'm not mistaken) is a lazy iterator pattern, and the actual map operation doesn't occur until .collect() is called. I want to do the same thing with slices:
fn lazy_slice() {
let vec: Vec<i64> = vec![1, 2, 3, 4, 5];
let slice: &[i64] = &vec[..3];
let mut iter: Box<Iterator<Item = i64>> = Box::new(slice.into_iter());
iter = Box::new(iter.map(|x| x + 1));
// potentially do additional similar transformations to iter
println!("{:?}", iter.collect::<Vec<_>>());
}
This results in a type mismatch:
error[E0271]: type mismatch resolving `<std::slice::Iter<'_, i64> as std::iter::Iterator>::Item == i64`
--> src/main.rs:4:47
|
4 | let mut iter: Box<Iterator<Item = i64>> = Box::new(slice.into_iter());
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected reference, found i64
|
= note: expected type `&i64`
found type `i64`
= note: required for the cast to the object type `std::iter::Iterator<Item=i64>`
I can't figure out what I need to do to resolve this error. The second note made me think I needed:
iter = Box::new(iter.map(|x| x + 1) as Iterator<Item = i64>);
or
iter = Box::new(iter.map(|x| x + 1)) as Box<Iterator<Item = i64>>;
These fail with other errors depending on the exact syntax (e.g. expected reference, found i64, or expected i64, found &i64). I've tried other ways to declare the types involved, but I'm basically just blindly adding & and * in places and not making any progress.
What am I missing here? What do I need to change in order to make this compile?
Edit
Here's a slightly more concrete example - I need iter to be mut so that I can compose an unknown number of such transformations before actually invoking .collect(). My impression was this was a somewhat common pattern, apologies if that wasn't correct.
fn lazy_vec(n: i64) {
let vec: Vec<i64> = vec![1, 2, 3, 4, 5];
let mut iter: Box<Iterator<Item = i64>> = Box::new(vec.into_iter());
for _ in 0..n {
iter = Box::new(iter.map(|x| x + 1));
}
println!("{:?}", iter.collect::<Vec<_>>());
}
I'm aware I could rewrite this specific task in a simpler way (e.g. a single map that adds n to each element) - it's an oversimplified MCVE of the problem I'm running into. My issue is this works for lazy_vec, but I'm not sure how to do the same with slices.
Edit 2
I'm just learning Rust and some of the nomenclature and concepts are new to me. Here's what I'm envisioning doing in Python, for comparison. My intent is to do the same thing with slices that I can currently do with vectors.
#!/usr/bin/env python3
import itertools
ls = [i for i in range(10)]
def lazy_work(input):
for i in range(10):
input = (i + 1 for i in input)
# at this point no actual work has been done
return input
print("From list: %s" % list(lazy_work(ls)))
print("From slice: %s" % list(lazy_work(itertools.islice(ls, 5))))
Obviously in Python there are no issues with typing, but hopefully that more clearly demonstrates my intent?
As discussed in What is the difference between iter and into_iter?, these methods create iterators which yield different types when called on a Vec compared to a slice.
[T]::iter and [T]::into_iter both return an iterator which yields values of type &T. That means that the returned value doesn't implement Iterator<Item = i64> but instead Iterator<Item = &i64>, as the error message states.
However, your subsequent map statements change the type of the iterator's item to an i64, which means the type of the iterator would also need to change. As an analogy, you've essentially attempted this:
let mut a: &i64 = &42;
a = 99;
Iterator::cloned exists to make clones of the iterated value. In this case, it converts a &i64 to an i64, essentially dereferencing the value:
fn lazy_slice(n: i64) {
let array = [1i64, 2, 3, 4, 5];
let mut iter: Box<dyn Iterator<Item = i64>> = Box::new(array.iter().cloned());
for _ in 0..n {
iter = Box::new(iter.map(|x| x + 1));
}
println!("{:?}", iter.collect::<Vec<_>>());
}
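The same pattern can be sketched for a borrowed slice passed in from outside. This version is an adaptation rather than the original answer's code: it uses the dyn keyword required by current editions, Iterator::copied (equivalent to cloned for Copy types such as i64), and returns the result instead of printing it.

```rust
fn lazy_slice(slice: &[i64], n: usize) -> Vec<i64> {
    // `copied()` turns the `&i64` items into `i64`;
    // `+ '_` lets the boxed iterator borrow from `slice`.
    let mut iter: Box<dyn Iterator<Item = i64> + '_> = Box::new(slice.iter().copied());
    for _ in 0..n {
        // Each pass wraps the previous iterator; nothing runs until collect.
        iter = Box::new(iter.map(|x| x + 1));
    }
    iter.collect()
}
```

For example, lazy_slice(&vec[..3], 2) on the vector from the question yields [3, 4, 5].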

Resources