I'm trying to get use Rayon::prelude::into_par_iter to sum up a bunch of instances of a struct. I've implemented std::iter::Sum for the struct, but am still running into an error. Here is my example code;
use std::iter::Sum;
use rayon::prelude::*;
pub struct TestStruct {
val: f32,
}
impl TestStruct {
pub fn new(v: f32) -> Self {
Self { val: v }
}
}
impl<'a> Sum<&'a Self> for TestStruct {
fn sum<I>(iter: I) -> Self
where
I: Iterator<Item = &'a Self>,
{
iter.fold(Self { val: 0. }, |a, b| Self {
val: a.val + b.val,
})
}
}
fn main() {
let default_val: f32 = 1.1;
let sum: TestStruct = (0..5).into_par_iter().map(|&default_val| {
TestStruct::new(default_val)
})
.sum();
println!("{}", sum.val);
}
and I am told the trait 'std::iter::Sum' is not implemented for 'TestStruct', but I think I am implementing it. Am I missing something here?
Two small compounded issues. One is that the std::iter::Sum is defined as:
pub trait Sum<A = Self> {
fn sum<I>(iter: I) -> Self
where
I: Iterator<Item = A>;
}
The Self here refers to whatever the struct gets the implementation and the default generic parameter has already been filled in. You don't need to specify it any more if implementing Sum for struct itself.
The other is that the map closure parameter default_val, which is an integer, shadows the previous float one with the same name. Since TestStruct::new expects a f32, it leads to a type error.
This works:
impl Sum for TestStruct {
fn sum<I>(iter: I) -> Self
where
I: Iterator<Item = Self>,
{
...
}
}
fn main() {
let default_val: f32 = 1.1;
let sum: TestStruct = (0..5)
.into_par_iter()
.map(|_| TestStruct::new(default_val))
.sum();
println!("{}", sum.val);
}
Related
For example, this works:
pub struct SquareVecIter<'a> {
current: f64,
iter: core::slice::Iter<'a, f64>,
}
pub fn square_iter<'a>(vec: &'a Vec<f64>) -> SquareVecIter<'a> {
SquareVecIter {
current: 0.0,
iter: vec.iter(),
}
}
impl<'a> Iterator for SquareVecIter<'a> {
type Item = f64;
fn next(&mut self) -> Option<Self::Item> {
if let Some(next) = self.iter.next() {
self.current = next * next;
Some(self.current)
} else {
None
}
}
}
// switch to test module
#[cfg(test)]
mod tests_2 {
use super::*;
#[test]
fn test_square_vec() {
let vec = vec![1.0, 2.0];
let mut iter = square_iter(&vec);
assert_eq!(iter.next(), Some(1.0));
assert_eq!(iter.next(), Some(4.0));
assert_eq!(iter.next(), None);
}
}
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=531edc40dcca4a79d11af3cbd29943b7
But if I have to return references to self.current then I can't get the lifetimes to work.
Yes.
An Iterator cannot yield elements that reference itself simply due to how the trait is designed (search "lending iterator" for more info). So even if you wanted to return Some(&self.current) and deal with the implications therein, you could not.
Returning an f64 (non-reference) is perfectly acceptable and would be expected because this is a kind of generative iterator. And you wouldn't need to store current at all:
pub struct SquareVecIter<'a> {
iter: core::slice::Iter<'a, f64>,
}
pub fn square_iter<'a>(vec: &'a Vec<f64>) -> SquareVecIter<'a> {
SquareVecIter {
iter: vec.iter(),
}
}
impl<'a> Iterator for SquareVecIter<'a> {
type Item = f64;
fn next(&mut self) -> Option<Self::Item> {
if let Some(next) = self.iter.next() {
Some(next * next)
} else {
None
}
}
}
For an example of this in the standard library, look at the Chars iterator for getting characters of a string. It keeps a reference to the original str, but it yields owned chars and not references.
What #kmdreko says, of course.
I just wanted to add a couple of nitpicks:
The pattern of the if let Some(...) = ... {Some} else {None} is so common that it made its way into the standard library as the .map function.
Taking a &Vec<f64> is an antipattern. Use &[f64] instead. It is more general without any drawbacks.
All of your lifetime annotations (except of the one in the struct definition) can be derived automatically, so you can simply omit them.
pub struct SquareVecIter<'a> {
iter: core::slice::Iter<'a, f64>,
}
pub fn square_iter(vec: &[f64]) -> SquareVecIter {
SquareVecIter { iter: vec.iter() }
}
impl Iterator for SquareVecIter<'_> {
type Item = f64;
fn next(&mut self) -> Option<Self::Item> {
self.iter.next().map(|next| next * next)
}
}
Rustaceans. when I start to write a BloomFilter example in rust. I found I have serveral problems have to solve. I struggle to solve them but no progress in a day. I need help, any suggestion will help me a lot, Thanks.
Problems
How to solve lifetime when pass a Iterator into another function?
// let bits = self.hash(value); // how to solve such lifetime error without use 'static storage?
// Below is a workaround code but need to computed in advanced.
let bits = Box::new(self.hash(value).collect::<Vec<u64>>().into_iter());
self.0.set(bits);
How to solve cyclic-dependency between struts without modify lower layer code, e.g: bloom_filter ?
// cyclic-dependency:
// RedisCache -> BloomFilter -> Storage
// | ^
// ------------<impl>------------
//
// v--- cache ownership has moved here
let filter = BloomFilter::by(Box::new(cache));
cache.1.replace(filter);
Since rust does not have null value, How can I solve the cyclic-dependency initialization without any stubs?
let mut cache = RedisCache(
Client::open("redis://localhost").unwrap(),
// I found can use Weak::new() to solve it,but need to downgrade a Rc reference.
// v-- need a BloomFilter stub to create RedisCache
RefCell::new(BloomFilter::new()),
);
Code
#![allow(unused)]
mod bloom_filter {
use std::{hash::Hash, marker::PhantomData};
pub type BitsIter = Box<dyn Iterator<Item = u64>>;
pub trait Storage {
fn set(&mut self, bits: BitsIter);
fn contains_all(&self, bits: BitsIter) -> bool;
}
pub struct BloomFilter<T: Hash>(Box<dyn Storage>, PhantomData<T>);
impl<T: Hash> BloomFilter<T> {
pub fn new() -> BloomFilter<T> {
return Self::by(Box::new(ArrayStorage([0; 5000])));
struct ArrayStorage<const N: usize>([u8; N]);
impl<const N: usize> Storage for ArrayStorage<N> {
fn set(&mut self, bits: BitsIter) {
let size = self.0.len() as u64;
bits.map(|bit| (bit % size) as usize)
.for_each(|index| self.0[index] = 1);
}
fn contains_all(&self, bits: BitsIter) -> bool {
let size = self.0.len() as u64;
bits.map(|bit| (bit % size) as usize)
.all(|index| self.0[index] == 1)
}
}
}
pub fn by(storage: Box<dyn Storage>) -> BloomFilter<T> {
BloomFilter(storage, PhantomData)
}
pub fn add(&mut self, value: T) {
// let bits = self.hash(value); // how to solve such lifetime error?
let bits = Box::new(self.hash(value).collect::<Vec<u64>>().into_iter());
self.0.set(bits);
}
pub fn contains(&self, value: T) -> bool {
// lifetime problem same as Self::add(T)
let bits = Box::new(self.hash(value).collect::<Vec<u64>>().into_iter());
self.0.contains_all(bits)
}
fn hash<'a, H: Hash + 'a>(&self, _value: H) -> Box<dyn Iterator<Item = u64> + 'a> {
todo!()
}
}
}
mod spi {
use super::bloom_filter::*;
use redis::{Client, Commands, RedisResult};
use std::{
cell::RefCell,
rc::{Rc, Weak},
};
pub struct RedisCache<'a>(Client, RefCell<BloomFilter<&'a str>>);
impl<'a> RedisCache<'a> {
pub fn new() -> RedisCache<'a> {
let mut cache = RedisCache(
Client::open("redis://localhost").unwrap(),
// v-- need a BloomFilter stub to create RedisCache
RefCell::new(BloomFilter::new()),
);
// v--- cache ownership has moved here
let filter = BloomFilter::by(Box::new(cache));
cache.1.replace(filter);
return cache;
}
pub fn get(&mut self, key: &str, load_value: fn() -> Option<String>) -> Option<String> {
let filter = self.1.borrow();
if filter.contains(key) {
if let Ok(value) = self.0.get::<&str, String>(key) {
return Some(value);
}
if let Some(actual_value) = load_value() {
let _: () = self.0.set(key, &actual_value).unwrap();
return Some(actual_value);
}
}
return None;
}
}
impl<'a> Storage for RedisCache<'a> {
fn set(&mut self, bits: BitsIter) {
todo!()
}
fn contains_all(&self, bits: BitsIter) -> bool {
todo!()
}
}
}
Updated
First, thanks #Colonel Thirty Two give me a lot of information that I haven't mastered and help me fixed the problem of the iterator lifetime.
The cyclic-dependency I have solved by break the responsibility of the Storage into another struct RedisStorage without modify the bloom_filter module, but make the example bloated. Below is their relationships:
RedisCache -> BloomFilter -> Storage <---------------
| |
|-------> redis::Client <- RedisStorage ---<impl>---
I realized the ownership & lifetime system is not only used by borrow checker, but also Rustaceans need a bigger front design to obey the rules than in a GC language, e.g: java. Am I right?
Final Code
mod bloom_filter {
use std::{
hash::{Hash, Hasher},
marker::PhantomData,
};
pub type BitsIter<'a> = Box<dyn Iterator<Item = u64> + 'a>;
pub trait Storage {
fn set(&mut self, bits: BitsIter);
fn contains_all(&self, bits: BitsIter) -> bool;
}
pub struct BloomFilter<T: Hash>(Box<dyn Storage>, PhantomData<T>);
impl<T: Hash> BloomFilter<T> {
#[allow(unused)]
pub fn new() -> BloomFilter<T> {
return Self::by(Box::new(ArrayStorage([0; 5000])));
struct ArrayStorage<const N: usize>([u8; N]);
impl<const N: usize> Storage for ArrayStorage<N> {
fn set(&mut self, bits: BitsIter) {
let size = self.0.len() as u64;
bits.map(|bit| (bit % size) as usize)
.for_each(|index| self.0[index] = 1);
}
fn contains_all(&self, bits: BitsIter) -> bool {
let size = self.0.len() as u64;
bits.map(|bit| (bit % size) as usize)
.all(|index| self.0[index] == 1)
}
}
}
pub fn by(storage: Box<dyn Storage>) -> BloomFilter<T> {
BloomFilter(storage, PhantomData)
}
pub fn add(&mut self, value: T) {
self.0.set(self.hash(value));
}
pub fn contains(&self, value: T) -> bool {
self.0.contains_all(self.hash(value))
}
fn hash<'a, H: Hash + 'a>(&self, value: H) -> BitsIter<'a> {
Box::new(
[3, 11, 31, 71, 131]
.into_iter()
.map(|salt| SimpleHasher(0, salt))
.map(move |mut hasher| hasher.hash(&value)),
)
}
}
struct SimpleHasher(u64, u64);
impl SimpleHasher {
fn hash<H: Hash>(&mut self, value: &H) -> u64 {
value.hash(self);
self.finish()
}
}
impl Hasher for SimpleHasher {
fn finish(&self) -> u64 {
self.0
}
fn write(&mut self, bytes: &[u8]) {
self.0 += bytes.iter().fold(0u64, |acc, k| acc * self.1 + *k as u64)
}
}
}
mod spi {
use super::bloom_filter::*;
use redis::{Client, Commands};
use std::{cell::RefCell, rc::Rc};
pub struct RedisCache<'a>(Rc<RefCell<Client>>, BloomFilter<&'a str>);
impl<'a> RedisCache<'a> {
pub fn new(client: Rc<RefCell<Client>>, filter: BloomFilter<&'a str>) -> RedisCache<'a> {
RedisCache(client, filter)
}
pub fn get<'f>(
&mut self,
key: &str,
load_value: fn() -> Option<&'f str>,
) -> Option<String> {
if self.1.contains(key) {
let mut redis = self.0.as_ref().borrow_mut();
if let Ok(value) = redis.get::<&str, String>(key) {
return Some(value);
}
if let Some(actual_value) = load_value() {
let _: () = redis.set(key, &actual_value).unwrap();
return Some(actual_value.into());
}
}
return None;
}
}
struct RedisStorage(Rc<RefCell<Client>>);
const BLOOM_FILTER_KEY: &str = "bloom_filter";
impl Storage for RedisStorage {
fn set(&mut self, bits: BitsIter) {
bits.for_each(|slot| {
let _: bool = self
.0
.as_ref()
.borrow_mut()
.setbit(BLOOM_FILTER_KEY, slot as usize, true)
.unwrap();
})
}
fn contains_all(&self, mut bits: BitsIter) -> bool {
bits.all(|slot| {
self.0
.as_ref()
.borrow_mut()
.getbit(BLOOM_FILTER_KEY, slot as usize)
.unwrap()
})
}
}
#[test]
fn prevent_cache_penetration_by_bloom_filter() {
let client = Rc::new(RefCell::new(Client::open("redis://localhost").unwrap()));
redis::cmd("FLUSHDB").execute(&mut *client.as_ref().borrow_mut());
let mut filter: BloomFilter<&str> = BloomFilter::by(Box::new(RedisStorage(client.clone())));
assert!(!filter.contains("Rust"));
filter.add("Rust");
assert!(filter.contains("Rust"));
let mut cache = RedisCache::new(client, filter);
assert_eq!(
cache.get("Rust", || Some("System Language")),
Some("System Language".to_string())
);
assert_eq!(
cache.get("Rust", || panic!("must never be called after cached")),
Some("System Language".to_string())
);
assert_eq!(
cache.get("Go", || panic!("reject to loading `Go` from external storage")),
None
);
}
}
pub type BitsIter = Box<dyn Iterator<Item = u64>>;
In this case, the object in the box must be valid for the 'static lifetime. This isn't the case for the iterator returned by hash - its limited to the lifetime of self.
Try replacing with:
pub type BitsIter<'a> = Box<dyn Iterator<Item = u64> + 'a>;
Or using generics instead of boxed trait objects.
So your RedisClient needs a BloomFilter, but the BloomFilter also needs the RedisClient?
Your BloomFilter should not use the RedisCache that itself uses the BloomFilter - that's a recipe for infinitely recursing calls (how do you know what calls to RedisCache::add should update the bloom filter and which calls are from the bloom filter?).
If you really have to, you need some form of shared ownership, like Rc or Arc. Your BloomFilter will also need to use a weak reference, or else the two objects will refer to each other and will never free.
I want to create a vector containing trait objects that have a val method that returns a comparable value:
trait Comparable {
fn val(&self) -> Box<dyn std::cmp::Ord>;
}
struct Int64Array {}
impl Comparable for Int64Array {
fn val(&self, i: usize) -> Box<dyn std::cmp::Ord> {
Box::new(i+1)
}
}
struct StringArray {}
impl Comparable for StringArray {
fn val(&self, i: usize) -> Box<dyn std::cmp::Ord> {
Box::new(format!("foo_{}", i))
}
}
fn main() {
// each row is guaranteed to have vector elements with the same type
let rows: Vec<Box<dyn Comparable>> = vec![
Box::new(Int64Array{}),
Box::new(StringArray{}),
];
println!("{}", rows[0].val(0).cmp(rows[0].val(1)));
println!("{}", rows[1].val(0).cmp(rows[1].val(1)));
}
However, because std::cmp::Ord is not a object-safe trait, this won't compile. What's the recommend way to workaround this limitation?
Thanks everyone for the quick replies.
Combing suggestions from #FrancisGagné and #loganfsmyth, here is a workaround that avoids using Ord trait directly:
trait Comparable {
fn compare(&self, i: usize, j: usize) -> std::cmp::Ordering;
}
struct Int64Array {}
impl Comparable for Int64Array {
fn compare(&self, _i: usize, _j: usize) -> std::cmp::Ordering {
std::cmp::Ordering::Equal
}
}
struct StringArray {}
impl Comparable for StringArray {
fn compare(&self, _i: usize, _j: usize) -> std::cmp::Ordering {
std::cmp::Ordering::Equal
}
}
fn main() {
// each row is guaranteed to have vector elements with the same type
let rows: Vec<Box<dyn Comparable>> = vec![
Box::new(Int64Array{}),
Box::new(StringArray{}),
];
println!("{:#?}", rows[0].compare(0, 1));
println!("{:#?}", rows[1].compare(0, 1));
}
I wonder if there is other ways to implement this feature.
I'm trying to implement a mean method for Iterator, like it is done with sum.
However, sum is Iterator method, so I decided to implement trait for any type that implements Iterator:
pub trait Mean<A = Self>: Sized {
fn mean<I: Iterator<Item = A>>(iter: I) -> f64;
}
impl Mean for u64 {
fn mean<I: Iterator<Item = u64>>(iter: I) -> f64 {
//use zip to start enumeration from 1, not 0
iter.zip((1..))
.fold(0., |s, (e, i)| (e as f64 + s * (i - 1) as f64) / i as f64)
}
}
impl<'a> Mean<&'a u64> for u64 {
fn mean<I: Iterator<Item = &'a u64>>(iter: I) -> f64 {
iter.zip((1..))
.fold(0., |s, (&e, i)| (e as f64 + s * (i - 1) as f64) / i as f64)
}
}
trait MeanIterator: Iterator {
fn mean(self) -> f64;
}
impl<T: Iterator> MeanIterator for T {
fn mean(self) -> f64 {
Mean::mean(self)
}
}
fn main() {
assert_eq!([1, 2, 3, 4, 5].iter().mean(), 3.);
}
Playground
The error:
error[E0282]: type annotations needed
--> src/main.rs:26:9
|
26 | Mean::mean(self)
| ^^^^^^^^^^ cannot infer type for `Self`
Is there any way to fix the code, or it is impossible in Rust?
like it is done with sum
Let's review how sum works:
pub fn sum<S>(self) -> S
where
S: Sum<Self::Item>,
sum is implemented on any iterator, so long as the result type S implements Sum for the iterated value. The caller gets to pick the result type. Sum is defined as:
pub trait Sum<A = Self> {
pub fn sum<I>(iter: I) -> Self
where
I: Iterator<Item = A>;
}
Sum::sum takes an iterator of A and produces a value of the type it is implemented from.
We can copy-paste the structure, changing Sum for Mean and put the straightforward implementations:
trait MeanExt: Iterator {
fn mean<M>(self) -> M
where
M: Mean<Self::Item>,
Self: Sized,
{
M::mean(self)
}
}
impl<I: Iterator> MeanExt for I {}
trait Mean<A = Self> {
fn mean<I>(iter: I) -> Self
where
I: Iterator<Item = A>;
}
impl Mean for f64 {
fn mean<I>(iter: I) -> Self
where
I: Iterator<Item = f64>,
{
let mut sum = 0.0;
let mut count: usize = 0;
for v in iter {
sum += v;
count += 1;
}
if count > 0 {
sum / (count as f64)
} else {
0.0
}
}
}
impl<'a> Mean<&'a f64> for f64 {
fn mean<I>(iter: I) -> Self
where
I: Iterator<Item = &'a f64>,
{
iter.copied().mean()
}
}
fn main() {
let mean: f64 = [1.0, 2.0, 3.0].iter().mean();
println!("{:?}", mean);
let mean: f64 = std::array::IntoIter::new([-1.0, 2.0, 1.0]).mean();
println!("{:?}", mean);
}
You can do it like this, for example:
pub trait Mean {
fn mean(self) -> f64;
}
impl<F, T> Mean for T
where T: Iterator<Item = F>,
F: std::borrow::Borrow<f64>
{
fn mean(self) -> f64 {
self.zip((1..))
.fold(0.,
|s, (e, i)| (*e.borrow() + s * (i - 1) as f64) / i as f64)
}
}
fn main() {
assert_eq!([1f64, 2f64, 3f64, 4f64, 5f64].iter().mean(), 3.);
assert_eq!(vec![1f64, 2f64, 3f64, 4f64, 5f64].iter().mean(), 3.);
assert_eq!(vec![1f64, 2f64, 3f64, 4f64, 5f64].into_iter().mean(), 3.);
}
I used Borrow trait to support iterators over f64 and &f64.
pub struct Storage<T>{
vec: Vec<T>
}
impl<T: Clone> Storage<T>{
pub fn new() -> Storage<T>{
Storage{vec: Vec::new()}
}
pub fn get<'r>(&'r self, h: &Handle<T>)-> &'r T{
let index = h.id;
&self.vec[index]
}
pub fn set(&mut self, h: &Handle<T>, t: T){
let index = h.id;
self.vec[index] = t;
}
pub fn create(&mut self, t: T) -> Handle<T>{
self.vec.push(t);
Handle{id: self.vec.len()-1}
}
}
struct Handle<T>{
id: uint
}
I am currently trying to create a handle system in Rust and I have some problems. The code above is a simple example of what I want to achieve.
The code works but has one weakness.
let mut s1 = Storage<uint>::new();
let mut s2 = Storage<uint>::new();
let handle1 = s1.create(5);
s1.get(handle1); // works
s2.get(handle1); // unsafe
I would like to associate a handle with a specific storage like this
//Pseudo code
struct Handle<T>{
id: uint,
storage: &Storage<T>
}
impl<T> Handle<T>{
pub fn get(&self) -> &T;
}
The problem is that Rust doesn't allow this. If I would do that and create a handle with the reference of a Storage I wouldn't be allowed to mutate the Storage anymore.
I could implement something similar with a channel but then I would have to clone T every time.
How would I express this in Rust?
The simplest way to model this is to use a phantom type parameter on Storage which acts as a unique ID, like so:
use std::kinds::marker;
pub struct Storage<Id, T> {
marker: marker::InvariantType<Id>,
vec: Vec<T>
}
impl<Id, T> Storage<Id, T> {
pub fn new() -> Storage<Id, T>{
Storage {
marker: marker::InvariantType,
vec: Vec::new()
}
}
pub fn get<'r>(&'r self, h: &Handle<Id, T>) -> &'r T {
let index = h.id;
&self.vec[index]
}
pub fn set(&mut self, h: &Handle<Id, T>, t: T) {
let index = h.id;
self.vec[index] = t;
}
pub fn create(&mut self, t: T) -> Handle<Id, T> {
self.vec.push(t);
Handle {
marker: marker::InvariantLifetime,
id: self.vec.len() - 1
}
}
}
pub struct Handle<Id, T> {
id: uint,
marker: marker::InvariantType<Id>
}
fn main() {
struct A; struct B;
let mut s1 = Storage::<A, uint>::new();
let s2 = Storage::<B, uint>::new();
let handle1 = s1.create(5);
s1.get(&handle1);
s2.get(&handle1); // won't compile, since A != B
}
This solves your problem in the simplest case, but has some downsides. Mainly, it depends on the use to define and use all of these different phantom types and to prove that they are unique. It doesn't prevent bad behavior on the user's part where they can use the same phantom type for multiple Storage instances. In today's Rust, however, this is the best we can do.
An alternative solution that doesn't work today for reasons I'll get in to later, but might work later, uses lifetimes as anonymous id types. This code uses the InvariantLifetime marker, which removes all sub typing relationships with other lifetimes for the lifetime it uses.
Here is the same system, rewritten to use InvariantLifetime instead of InvariantType:
use std::kinds::marker;
pub struct Storage<'id, T> {
marker: marker::InvariantLifetime<'id>,
vec: Vec<T>
}
impl<'id, T> Storage<'id, T> {
pub fn new() -> Storage<'id, T>{
Storage {
marker: marker::InvariantLifetime,
vec: Vec::new()
}
}
pub fn get<'r>(&'r self, h: &Handle<'id, T>) -> &'r T {
let index = h.id;
&self.vec[index]
}
pub fn set(&mut self, h: &Handle<'id, T>, t: T) {
let index = h.id;
self.vec[index] = t;
}
pub fn create(&mut self, t: T) -> Handle<'id, T> {
self.vec.push(t);
Handle {
marker: marker::InvariantLifetime,
id: self.vec.len() - 1
}
}
}
pub struct Handle<'id, T> {
id: uint,
marker: marker::InvariantLifetime<'id>
}
fn main() {
let mut s1 = Storage::<uint>::new();
let s2 = Storage::<uint>::new();
let handle1 = s1.create(5);
s1.get(&handle1);
// In theory this won't compile, since the lifetime of s2
// is *slightly* shorter than the lifetime of s1.
//
// However, this is not how the compiler works, and as of today
// s2 gets the same lifetime as s1 (since they can be borrowed for the same period)
// and this (unfortunately) compiles without error.
s2.get(&handle1);
}
In a hypothetical future, the assignment of lifetimes may change and we may grow a better mechanism for this sort of tagging. However, for now, the best way to accomplish this is with phantom types.