I wish to create an IndexedParallelIterator<Item = [I::Item; N]> from an array [I; N] where I: IndexedParallelIterator. Let's call it ParConstZip<I, N>.
As I understand it, there are a few steps here:
1. Create the corresponding non-parallel ConstZip<I, N> iterator (which must implement ExactSizeIterator and DoubleEndedIterator).
2. Create a Producer to split the input and create the ConstZip<I, N> iterators.
3. Implement ParallelIterator and IndexedParallelIterator for ParConstZip<I, N> using the Producer.
I think I'm alright on points 1 and 2. Specifically, here's the ConstZip<I, N> implementation:
pub struct ConstZip<I, const N: usize>([I; N]);
impl<I, const N: usize> Iterator for ConstZip<I, N>
where
I: Iterator,
{
type Item = [I::Item; N];
fn next(&mut self) -> Option<Self::Item> {
let mut dst = MaybeUninit::uninit_array();
for (i, iter) in self.0.iter_mut().enumerate() {
dst[i] = MaybeUninit::new(iter.next()?);
}
// SAFETY: If we reach this point, `dst` has been fully initialized
unsafe { Some(MaybeUninit::array_assume_init(dst)) }
}
}
impl<I, const N: usize> ExactSizeIterator for ConstZip<I, N>
where
I: ExactSizeIterator,
{
fn len(&self) -> usize {
self.0.iter().map(|x| x.len()).min().unwrap()
}
}
impl<I, const N: usize> DoubleEndedIterator for ConstZip<I, N>
where
I: DoubleEndedIterator,
{
fn next_back(&mut self) -> Option<Self::Item> {
let mut dst = MaybeUninit::uninit_array();
for (i, iter) in self.0.iter_mut().enumerate() {
dst[i] = MaybeUninit::new(iter.next_back()?);
}
// SAFETY: If we reach this point, `dst` has been fully initialized
unsafe { Some(MaybeUninit::array_assume_init(dst)) }
}
}
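As a side note, MaybeUninit::uninit_array and MaybeUninit::array_assume_init are nightly-only. A minimal sketch of the same next() on stable Rust, assuming a temporary Vec allocation per item is acceptable, might look like the following. It is an alternative for illustration, not the code from the question:

```rust
use std::convert::TryInto;

pub struct ConstZip<I, const N: usize>([I; N]);

impl<I: Iterator, const N: usize> Iterator for ConstZip<I, N> {
    type Item = [I::Item; N];

    fn next(&mut self) -> Option<Self::Item> {
        // Collecting into Option<Vec<_>> stops at the first exhausted iterator,
        // which mirrors the early return via `?` in the unsafe version.
        let items: Vec<I::Item> = self
            .0
            .iter_mut()
            .map(|iter| iter.next())
            .collect::<Option<Vec<_>>>()?;
        // Vec<T> -> [T; N] succeeds because exactly N items were collected.
        items.try_into().ok()
    }
}
```

The trade-off is one heap allocation per yielded array in exchange for no unsafe code and no nightly features.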
And here's what I believe to be an appropriate producer:
pub struct ParConstZipProducer<P, const N: usize>([P; N]);
impl<P, const N: usize> Producer for ParConstZipProducer<P, N>
where
P: Producer,
{
type Item = [P::Item; N];
type IntoIter = ConstZip<P::IntoIter, N>;
fn into_iter(self) -> Self::IntoIter {
ConstZip(self.0.map(Producer::into_iter))
}
fn split_at(self, index: usize) -> (Self, Self) {
let mut left_array = MaybeUninit::uninit_array();
let mut right_array = MaybeUninit::uninit_array();
for (i, producer) in self.0.into_iter().enumerate() {
let (left, right) = producer.split_at(index);
left_array[i] = MaybeUninit::new(left);
right_array[i] = MaybeUninit::new(right);
}
// SAFETY: Arrays are guaranteed to be fully initialised at length `N`
let left_array = unsafe { MaybeUninit::array_assume_init(left_array) };
let right_array = unsafe { MaybeUninit::array_assume_init(right_array) };
(
ParConstZipProducer(left_array),
ParConstZipProducer(right_array),
)
}
}
However, I stumble when it comes to the actual implementation of IndexedParallelIterator. Most of it seems to be boilerplate, but the with_producer method I cannot figure out how to implement:
pub struct ParConstZip<I, const N: usize>([I; N]);
impl<I, const N: usize> ParallelIterator for ParConstZip<I, N>
where
I: IndexedParallelIterator,
{
type Item = [I::Item; N];
fn drive_unindexed<C>(self, consumer: C) -> C::Result
where
C: UnindexedConsumer<Self::Item>,
{
bridge(self, consumer)
}
}
impl<I, const N: usize> IndexedParallelIterator for ParConstZip<I, N>
where
I: IndexedParallelIterator,
{
fn drive<C>(self, consumer: C) -> C::Result
where
C: Consumer<Self::Item>,
{
bridge(self, consumer)
}
fn len(&self) -> usize {
self.0.iter().map(|x| x.len()).min().unwrap()
}
fn with_producer<CB>(self, callback: CB) -> CB::Output
where
CB: ProducerCallback<Self::Item>,
{
todo!()
}
}
I needed the same thing about a year ago, and asked a question on the users.rust-lang forum. I got some helpful high-level pointers from one of the authors of rayon, but after reading the rayon plumbing README a number of times both then and now, I have to admit I'm still a bit lost.
For example, this works:
pub struct SquareVecIter<'a> {
current: f64,
iter: core::slice::Iter<'a, f64>,
}
pub fn square_iter<'a>(vec: &'a Vec<f64>) -> SquareVecIter<'a> {
SquareVecIter {
current: 0.0,
iter: vec.iter(),
}
}
impl<'a> Iterator for SquareVecIter<'a> {
type Item = f64;
fn next(&mut self) -> Option<Self::Item> {
if let Some(next) = self.iter.next() {
self.current = next * next;
Some(self.current)
} else {
None
}
}
}
// switch to test module
#[cfg(test)]
mod tests_2 {
use super::*;
#[test]
fn test_square_vec() {
let vec = vec![1.0, 2.0];
let mut iter = square_iter(&vec);
assert_eq!(iter.next(), Some(1.0));
assert_eq!(iter.next(), Some(4.0));
assert_eq!(iter.next(), None);
}
}
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=531edc40dcca4a79d11af3cbd29943b7
But if I have to return references to self.current then I can't get the lifetimes to work.
Yes.
An Iterator cannot yield elements that reference itself simply due to how the trait is designed (search "lending iterator" for more info). So even if you wanted to return Some(&self.current) and deal with the implications therein, you could not.
Returning an f64 (non-reference) is perfectly acceptable and would be expected because this is a kind of generative iterator. And you wouldn't need to store current at all:
pub struct SquareVecIter<'a> {
iter: core::slice::Iter<'a, f64>,
}
pub fn square_iter<'a>(vec: &'a Vec<f64>) -> SquareVecIter<'a> {
SquareVecIter {
iter: vec.iter(),
}
}
impl<'a> Iterator for SquareVecIter<'a> {
type Item = f64;
fn next(&mut self) -> Option<Self::Item> {
if let Some(next) = self.iter.next() {
Some(next * next)
} else {
None
}
}
}
For an example of this in the standard library, look at the Chars iterator for getting characters of a string. It keeps a reference to the original str, but it yields owned chars and not references.
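To illustrate the "lending iterator" idea mentioned above, here is a rough sketch using a generic associated type. The LendingIterator trait and the SquareLender type are hypothetical names for illustration; no such trait exists in the standard library, and this assumes a compiler with stable GATs (Rust 1.65 or later):

```rust
// Hypothetical trait: each item may borrow from the iterator itself.
pub trait LendingIterator {
    type Item<'a>
    where
        Self: 'a;

    fn next(&mut self) -> Option<Self::Item<'_>>;
}

pub struct SquareLender<'v> {
    current: f64,
    iter: std::slice::Iter<'v, f64>,
}

impl<'v> LendingIterator for SquareLender<'v> {
    type Item<'a> = &'a f64 where Self: 'a;

    fn next(&mut self) -> Option<Self::Item<'_>> {
        let x = self.iter.next()?;
        self.current = x * x;
        // The returned reference borrows `self`, which std's Iterator cannot express.
        Some(&self.current)
    }
}

// Each lent item must be dropped before the next call, so items are copied out.
pub fn collect_squares(values: &[f64]) -> Vec<f64> {
    let mut lender = SquareLender { current: 0.0, iter: values.iter() };
    let mut out = Vec::new();
    while let Some(square) = lender.next() {
        out.push(*square);
    }
    out
}
```

Because every item borrows the lender, adapters such as collect() are not available, which is one reason returning an owned f64 is the simpler choice here.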
What @kmdreko says, of course.
I just wanted to add a couple of nitpicks:
The pattern of the if let Some(...) = ... {Some} else {None} is so common that it made its way into the standard library as the .map function.
Taking a &Vec<f64> is an antipattern. Use &[f64] instead. It is more general without any drawbacks.
All of your lifetime annotations (except for the one in the struct definition) can be elided, because the compiler infers them automatically, so you can simply omit them.
pub struct SquareVecIter<'a> {
iter: core::slice::Iter<'a, f64>,
}
pub fn square_iter(vec: &[f64]) -> SquareVecIter {
SquareVecIter { iter: vec.iter() }
}
impl Iterator for SquareVecIter<'_> {
type Item = f64;
fn next(&mut self) -> Option<Self::Item> {
self.iter.next().map(|next| next * next)
}
}
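Going one step further, if the named struct is not needed by callers, the same thing can probably be written without a custom iterator type at all. This is a sketch with a hypothetical function name, squares, and it also shows that a &[f64] parameter accepts both a Vec and an array:

```rust
// Returns an opaque iterator that borrows `values` and yields owned squares.
pub fn squares(values: &[f64]) -> impl Iterator<Item = f64> + '_ {
    values.iter().map(|x| x * x)
}
```

Calling squares(&vec) or squares(&array) both work, since references to a Vec<f64> and to a [f64; N] coerce to &[f64].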
I'm working on a Cartesian power implementation for iterators. I'm running into a snag where I don't seem able to store a vector of peekable copies of the iterator I'm passed. No matter how much I finagle with boxes and pointers, it doesn't work due to Vec<Peekable<dyn Iterator<Item = T>>> not having a known size at compile time.
Any ideas for how to make this size known at compile time? I really just need to store a pointer to the vector, right? There's no reason it can't be created on the heap, is there?
Here's what I have so far (ignore the next() implementation, that was just to test whether I could store the iterator and correctly use its next function):
mod cartesian_power {
use core::iter::Peekable;
pub struct CartesianPower<T> {
prototype: Box<dyn Iterator<Item = T>>,
iters: Vec<Peekable<dyn Iterator<Item = T>>>,
}
impl<T> CartesianPower<T> {
pub fn new<I>(vals: I, power: usize) -> CartesianPower<T>
where
I: IntoIterator<Item = T>,
I: Clone,
<I as IntoIterator>::IntoIter: 'static,
{
let iters = Vec::with_capacity(power);
for _ in 0..power {
iters.push(vals.clone().into_iter().peekable());
}
Self {
prototype: Box::new(vals.into_iter()),
iters: iters,
}
}
}
impl<T> Iterator for CartesianPower<T> {
type Item = T;
fn next(&mut self) -> Option<T> {
self.prototype.next()
}
}
}
You cannot store a Vec<dyn Trait> (or Vec<Peekable<dyn Trait>>) directly. Each element could have a different size, so you need an indirection, and because Peekable requires a sized inner iterator, the Box has to go inside it: Vec<Peekable<Box<dyn Iterator<Item = T>>>>.
However, since all iterators in your case are the same, you can use generics and avoid the performance hit of Box and dynamic dispatch:
pub struct CartesianPower<I: Iterator> {
prototype: I,
iters: Vec<Peekable<I>>,
}
impl<Iter: Iterator> CartesianPower<Iter> {
pub fn new<I>(vals: I, power: usize) -> CartesianPower<Iter>
where
I: IntoIterator<IntoIter = Iter>,
I: Clone,
<I as IntoIterator>::IntoIter: 'static,
{
let mut iters = Vec::with_capacity(power);
for _ in 0..power {
iters.push(vals.clone().into_iter().peekable());
}
Self {
prototype: vals.into_iter(),
iters: iters,
}
}
}
impl<I: Iterator> Iterator for CartesianPower<I> {
type Item = I::Item;
fn next(&mut self) -> Option<I::Item> {
self.prototype.next()
}
}
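For comparison, the boxed route could be sketched as below. The Box sits inside Peekable because Peekable needs a sized inner iterator. The BoxedIter alias and make_iters helper are hypothetical names used only for this illustration:

```rust
use std::iter::Peekable;

// Hypothetical alias: a type-erased iterator living on the heap.
type BoxedIter<T> = Box<dyn Iterator<Item = T>>;

// Builds `power` peekable, boxed copies of the same input iterator.
pub fn make_iters<I>(vals: I, power: usize) -> Vec<Peekable<BoxedIter<I::Item>>>
where
    I: IntoIterator + Clone,
    I::IntoIter: 'static,
{
    (0..power)
        .map(|_| {
            let boxed: BoxedIter<I::Item> = Box::new(vals.clone().into_iter());
            boxed.peekable()
        })
        .collect()
}
```

This compiles because Box<dyn Iterator> itself implements Iterator, but every call to next() goes through dynamic dispatch, which is the cost the generic version avoids.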
Rustaceans, I started writing a BloomFilter example in Rust and ran into several problems that I have to solve. I have struggled with them for a day without progress. I need help; any suggestion would help me a lot. Thanks.
Problems
How do I solve the lifetime error when passing an Iterator into another function?
// let bits = self.hash(value); // how to solve such lifetime error without use 'static storage?
// Below is a workaround code but need to computed in advanced.
let bits = Box::new(self.hash(value).collect::<Vec<u64>>().into_iter());
self.0.set(bits);
How do I solve a cyclic dependency between structs without modifying lower-layer code, e.g. bloom_filter?
// cyclic-dependency:
// RedisCache -> BloomFilter -> Storage
// | ^
// ------------<impl>------------
//
// v--- cache ownership has moved here
let filter = BloomFilter::by(Box::new(cache));
cache.1.replace(filter);
Since Rust does not have a null value, how can I initialize the cyclic dependency without any stubs?
let mut cache = RedisCache(
Client::open("redis://localhost").unwrap(),
// I found can use Weak::new() to solve it,but need to downgrade a Rc reference.
// v-- need a BloomFilter stub to create RedisCache
RefCell::new(BloomFilter::new()),
);
Code
#![allow(unused)]
mod bloom_filter {
use std::{hash::Hash, marker::PhantomData};
pub type BitsIter = Box<dyn Iterator<Item = u64>>;
pub trait Storage {
fn set(&mut self, bits: BitsIter);
fn contains_all(&self, bits: BitsIter) -> bool;
}
pub struct BloomFilter<T: Hash>(Box<dyn Storage>, PhantomData<T>);
impl<T: Hash> BloomFilter<T> {
pub fn new() -> BloomFilter<T> {
return Self::by(Box::new(ArrayStorage([0; 5000])));
struct ArrayStorage<const N: usize>([u8; N]);
impl<const N: usize> Storage for ArrayStorage<N> {
fn set(&mut self, bits: BitsIter) {
let size = self.0.len() as u64;
bits.map(|bit| (bit % size) as usize)
.for_each(|index| self.0[index] = 1);
}
fn contains_all(&self, bits: BitsIter) -> bool {
let size = self.0.len() as u64;
bits.map(|bit| (bit % size) as usize)
.all(|index| self.0[index] == 1)
}
}
}
pub fn by(storage: Box<dyn Storage>) -> BloomFilter<T> {
BloomFilter(storage, PhantomData)
}
pub fn add(&mut self, value: T) {
// let bits = self.hash(value); // how to solve such lifetime error?
let bits = Box::new(self.hash(value).collect::<Vec<u64>>().into_iter());
self.0.set(bits);
}
pub fn contains(&self, value: T) -> bool {
// lifetime problem same as Self::add(T)
let bits = Box::new(self.hash(value).collect::<Vec<u64>>().into_iter());
self.0.contains_all(bits)
}
fn hash<'a, H: Hash + 'a>(&self, _value: H) -> Box<dyn Iterator<Item = u64> + 'a> {
todo!()
}
}
}
mod spi {
use super::bloom_filter::*;
use redis::{Client, Commands, RedisResult};
use std::{
cell::RefCell,
rc::{Rc, Weak},
};
pub struct RedisCache<'a>(Client, RefCell<BloomFilter<&'a str>>);
impl<'a> RedisCache<'a> {
pub fn new() -> RedisCache<'a> {
let mut cache = RedisCache(
Client::open("redis://localhost").unwrap(),
// v-- need a BloomFilter stub to create RedisCache
RefCell::new(BloomFilter::new()),
);
// v--- cache ownership has moved here
let filter = BloomFilter::by(Box::new(cache));
cache.1.replace(filter);
return cache;
}
pub fn get(&mut self, key: &str, load_value: fn() -> Option<String>) -> Option<String> {
let filter = self.1.borrow();
if filter.contains(key) {
if let Ok(value) = self.0.get::<&str, String>(key) {
return Some(value);
}
if let Some(actual_value) = load_value() {
let _: () = self.0.set(key, &actual_value).unwrap();
return Some(actual_value);
}
}
return None;
}
}
impl<'a> Storage for RedisCache<'a> {
fn set(&mut self, bits: BitsIter) {
todo!()
}
fn contains_all(&self, bits: BitsIter) -> bool {
todo!()
}
}
}
Updated
First, thanks to @Colonel Thirty Two for giving me a lot of information I hadn't mastered and for helping me fix the iterator lifetime problem.
I solved the cyclic dependency by moving the Storage responsibility into a separate struct, RedisStorage, without modifying the bloom_filter module, although it makes the example more bloated. Below are the relationships:
RedisCache -> BloomFilter -> Storage <---------------
| |
|-------> redis::Client <- RedisStorage ---<impl>---
I realized that the ownership and lifetime system is not only something the borrow checker enforces; Rustaceans also need more up-front design to obey its rules than in a GC language such as Java. Am I right?
Final Code
mod bloom_filter {
use std::{
hash::{Hash, Hasher},
marker::PhantomData,
};
pub type BitsIter<'a> = Box<dyn Iterator<Item = u64> + 'a>;
pub trait Storage {
fn set(&mut self, bits: BitsIter);
fn contains_all(&self, bits: BitsIter) -> bool;
}
pub struct BloomFilter<T: Hash>(Box<dyn Storage>, PhantomData<T>);
impl<T: Hash> BloomFilter<T> {
#[allow(unused)]
pub fn new() -> BloomFilter<T> {
return Self::by(Box::new(ArrayStorage([0; 5000])));
struct ArrayStorage<const N: usize>([u8; N]);
impl<const N: usize> Storage for ArrayStorage<N> {
fn set(&mut self, bits: BitsIter) {
let size = self.0.len() as u64;
bits.map(|bit| (bit % size) as usize)
.for_each(|index| self.0[index] = 1);
}
fn contains_all(&self, bits: BitsIter) -> bool {
let size = self.0.len() as u64;
bits.map(|bit| (bit % size) as usize)
.all(|index| self.0[index] == 1)
}
}
}
pub fn by(storage: Box<dyn Storage>) -> BloomFilter<T> {
BloomFilter(storage, PhantomData)
}
pub fn add(&mut self, value: T) {
self.0.set(self.hash(value));
}
pub fn contains(&self, value: T) -> bool {
self.0.contains_all(self.hash(value))
}
fn hash<'a, H: Hash + 'a>(&self, value: H) -> BitsIter<'a> {
Box::new(
[3, 11, 31, 71, 131]
.into_iter()
.map(|salt| SimpleHasher(0, salt))
.map(move |mut hasher| hasher.hash(&value)),
)
}
}
struct SimpleHasher(u64, u64);
impl SimpleHasher {
fn hash<H: Hash>(&mut self, value: &H) -> u64 {
value.hash(self);
self.finish()
}
}
impl Hasher for SimpleHasher {
fn finish(&self) -> u64 {
self.0
}
fn write(&mut self, bytes: &[u8]) {
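// Wrapping arithmetic avoids an overflow panic in debug builds on long inputs.
self.0 = self.0.wrapping_add(
bytes.iter().fold(0u64, |acc, k| acc.wrapping_mul(self.1).wrapping_add(*k as u64)),
);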
}
}
}
mod spi {
use super::bloom_filter::*;
use redis::{Client, Commands};
use std::{cell::RefCell, rc::Rc};
pub struct RedisCache<'a>(Rc<RefCell<Client>>, BloomFilter<&'a str>);
impl<'a> RedisCache<'a> {
pub fn new(client: Rc<RefCell<Client>>, filter: BloomFilter<&'a str>) -> RedisCache<'a> {
RedisCache(client, filter)
}
pub fn get<'f>(
&mut self,
key: &str,
load_value: fn() -> Option<&'f str>,
) -> Option<String> {
if self.1.contains(key) {
let mut redis = self.0.as_ref().borrow_mut();
if let Ok(value) = redis.get::<&str, String>(key) {
return Some(value);
}
if let Some(actual_value) = load_value() {
let _: () = redis.set(key, &actual_value).unwrap();
return Some(actual_value.into());
}
}
return None;
}
}
struct RedisStorage(Rc<RefCell<Client>>);
const BLOOM_FILTER_KEY: &str = "bloom_filter";
impl Storage for RedisStorage {
fn set(&mut self, bits: BitsIter) {
bits.for_each(|slot| {
let _: bool = self
.0
.as_ref()
.borrow_mut()
.setbit(BLOOM_FILTER_KEY, slot as usize, true)
.unwrap();
})
}
fn contains_all(&self, mut bits: BitsIter) -> bool {
bits.all(|slot| {
self.0
.as_ref()
.borrow_mut()
.getbit(BLOOM_FILTER_KEY, slot as usize)
.unwrap()
})
}
}
#[test]
fn prevent_cache_penetration_by_bloom_filter() {
let client = Rc::new(RefCell::new(Client::open("redis://localhost").unwrap()));
redis::cmd("FLUSHDB").execute(&mut *client.as_ref().borrow_mut());
let mut filter: BloomFilter<&str> = BloomFilter::by(Box::new(RedisStorage(client.clone())));
assert!(!filter.contains("Rust"));
filter.add("Rust");
assert!(filter.contains("Rust"));
let mut cache = RedisCache::new(client, filter);
assert_eq!(
cache.get("Rust", || Some("System Language")),
Some("System Language".to_string())
);
assert_eq!(
cache.get("Rust", || panic!("must never be called after cached")),
Some("System Language".to_string())
);
assert_eq!(
cache.get("Go", || panic!("reject to loading `Go` from external storage")),
None
);
}
}
pub type BitsIter = Box<dyn Iterator<Item = u64>>;
In this case, the object in the box must be valid for the 'static lifetime. This isn't the case for the iterator returned by hash: it's limited to the lifetime of the value it captures.
Try replacing with:
pub type BitsIter<'a> = Box<dyn Iterator<Item = u64> + 'a>;
Or using generics instead of boxed trait objects.
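A rough sketch of the generic route, assuming the Storage trait can be changed and does not need to be used as a trait object. The SALTS constant, the bits helper and the DefaultHasher-based hashing are placeholders for illustration, not the question's hashing scheme:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Generic methods instead of Box<dyn Iterator>: no lifetime on a type alias needed.
// Note: this makes Storage unusable as `dyn Storage`.
pub trait Storage {
    fn set(&mut self, bits: impl Iterator<Item = u64>);
    fn contains_all(&self, bits: impl Iterator<Item = u64>) -> bool;
}

pub struct ArrayStorage<const N: usize>(pub [bool; N]);

impl<const N: usize> Storage for ArrayStorage<N> {
    fn set(&mut self, bits: impl Iterator<Item = u64>) {
        for bit in bits {
            self.0[(bit % N as u64) as usize] = true;
        }
    }

    fn contains_all(&self, mut bits: impl Iterator<Item = u64>) -> bool {
        bits.all(|bit| self.0[(bit % N as u64) as usize])
    }
}

pub struct BloomFilter<S: Storage>(S);

// Hypothetical salts; each one yields a different hash of the same value.
const SALTS: &[u64] = &[3, 11, 31];

fn bits<'a, T: Hash>(value: &'a T) -> impl Iterator<Item = u64> + 'a {
    SALTS.iter().map(move |salt| {
        let mut hasher = DefaultHasher::new();
        salt.hash(&mut hasher);
        value.hash(&mut hasher);
        hasher.finish()
    })
}

impl<S: Storage> BloomFilter<S> {
    pub fn add<T: Hash>(&mut self, value: T) {
        self.0.set(bits(&value));
    }

    pub fn contains<T: Hash>(&self, value: T) -> bool {
        self.0.contains_all(bits(&value))
    }
}
```

The iterator borrows the value only for the duration of the call, so the compiler can check the lifetimes without any boxing.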
So your RedisCache needs a BloomFilter, but the BloomFilter also needs the RedisCache?
Your BloomFilter should not use the RedisCache that itself uses the BloomFilter. That's a recipe for infinitely recursing calls (how do you know which calls to RedisCache::add should update the bloom filter and which calls come from the bloom filter?).
If you really have to, you need some form of shared ownership, like Rc or Arc. Your BloomFilter will also need to use a weak reference, or else the two objects will refer to each other and will never be freed.
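If shared ownership really is needed, one way to build such a cycle without a stub value is Rc::new_cyclic, which hands out a Weak pointer before the owner is fully constructed. The sketch below uses simplified, hypothetical Cache and Filter types rather than the question's RedisCache and BloomFilter:

```rust
use std::rc::{Rc, Weak};

// Hypothetical stand-ins for the two mutually dependent types.
pub struct Cache {
    pub name: String,
    pub filter: Filter,
}

pub struct Filter {
    // Weak back-reference: it does not keep the Cache alive.
    storage: Weak<Cache>,
}

impl Filter {
    pub fn storage_name(&self) -> Option<String> {
        // upgrade() returns None once the Cache has been dropped.
        self.storage.upgrade().map(|cache| cache.name.clone())
    }
}

pub fn build_cache(name: &str) -> Rc<Cache> {
    Rc::new_cyclic(|weak: &Weak<Cache>| Cache {
        name: name.to_string(),
        filter: Filter { storage: weak.clone() },
    })
}
```

Because the filter holds only a Weak, dropping the last Rc<Cache> frees both objects.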
I am trying to write a function template which I want to have the following signature
pub fn convert_to_one_hot<T>(category_id: T, num_classes: usize) -> Vec<bool> {...}
Here T can be usize or Vec<usize>. How should I determine the trait bounds for T to write such a function?
The following are instances of the function for the two cases:
When T is usize
pub fn convert_to_one_hot(category_id: usize, num_classes: usize) -> Vec<bool> {
let mut one_hot = Vec::<bool>::with_capacity(num_classes);
for index in 0usize..num_classes{
one_hot[index] = false;
}
one_hot[category_id] = true;
one_hot
}
When T is Vec<usize>
pub fn convert_to_one_hot(category_id: Vec<usize>, num_classes: usize) -> Vec<bool> {
let mut one_hot = Vec::<bool>::with_capacity(num_classes);
for index in 0usize..num_classes {
one_hot[index] = false;
}
for category in category_id{
one_hot[category] = true;
}
one_hot
}
You can create a helper trait and implement it for usize and Vec<usize>:
pub trait ToOneHot {
fn convert(self, num_classes: usize) -> Vec<bool>;
}
impl ToOneHot for usize {
fn convert(self, num_classes: usize) -> Vec<bool> {
let mut one_hot = vec![false; num_classes];
one_hot[self] = true;
one_hot
}
}
impl ToOneHot for Vec<usize> {
fn convert(self, num_classes: usize) -> Vec<bool> {
let mut one_hot = vec![false; num_classes];
for category in self {
one_hot[category] = true;
}
one_hot
}
}
With that trait in place, the implementation of convert_to_one_hot becomes trivial:
pub fn convert_to_one_hot<T: ToOneHot>(category_id: T, num_classes: usize) -> Vec<bool> {
category_id.convert(num_classes)
}
Note that your vector creation was incorrect. Vec::with_capacity() is just an optimization: it pre-allocates the space but still returns an empty vector, so assigning to an index panics because it is out of bounds. You need to either call push() to append elements or create a non-empty vector to begin with, as the code above does with vec![false; num_classes].
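The difference between capacity and length can be seen in a small sketch. The function name with_capacity_then_push is made up for this illustration:

```rust
// Pre-allocates room for `n` elements, then appends them with push().
pub fn with_capacity_then_push(n: usize) -> Vec<bool> {
    let mut one_hot = Vec::with_capacity(n);
    // The vector is still empty here: capacity is reserved, length is 0,
    // so `one_hot[0] = false` would panic at this point.
    debug_assert!(one_hot.is_empty());
    for _ in 0..n {
        one_hot.push(false);
    }
    one_hot
}
```

After the loop the result is the same as vec![false; n], which is the shorter way to write it.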
You can actually make do without a custom trait here, without sacrificing much convenience. If you accept an IntoIterator<Item = usize>, then you can pass in anything that can be iterated to yield usize items. This includes Vec<usize> and Option<usize>. The latter can be used to conveniently pass in a single usize value.
pub fn convert_to_one_hot<I>(category_id: I, num_classes: usize) -> Vec<bool>
where
I: IntoIterator<Item = usize>,
{
let mut result = vec![false; num_classes];
for id in category_id {
result[id] = true;
}
result
}
This function can be called for vectors like this
convert_to_one_hot(vec![1, 3, 5], 6)
and for single integer values like this
convert_to_one_hot(Some(3), 6)
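A few more call shapes that the IntoIterator bound should accept are collected below. The examples function is a hypothetical helper that exists only to gather the results in one place:

```rust
pub fn convert_to_one_hot<I>(category_id: I, num_classes: usize) -> Vec<bool>
where
    I: IntoIterator<Item = usize>,
{
    let mut result = vec![false; num_classes];
    for id in category_id {
        result[id] = true;
    }
    result
}

// Hypothetical helper: one call per kind of argument.
pub fn examples() -> Vec<Vec<bool>> {
    vec![
        convert_to_one_hot(vec![0, 2], 3),         // several categories
        convert_to_one_hot(Some(1), 3),            // a single category
        convert_to_one_hot(std::iter::once(1), 3), // same, via iter::once
        convert_to_one_hot(0..2, 3),               // a range of categories
        convert_to_one_hot(None::<usize>, 3),      // no category at all
    ]
}
```

As in the original, an id that is not smaller than num_classes still panics on indexing.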
I have a number of generic functions that need to return new instances of the generic. However, the fields in the generic are not known, but I need to get/set the field values to create the instance (a simple Default will not work). For primitive fields, I can update my struct, but for fields with Vector or HashMap types, I get the error:
the size for values of type `[usize]` cannot be known at compilation time
Here is a minimal working example of my issue:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=6db1dd0b5982eca526725f4e5b423263
trait MyTrait {
fn id(&self) -> &usize;
fn id_mut(&mut self) -> &mut usize;
fn data(&self) -> &[usize];
fn data_mut(&mut self) -> &mut [usize];
}
#[derive(Debug)]
struct MyStruct {
id: usize,
data: Vec<usize>,
}
impl MyTrait for MyStruct {
fn id(&self) -> &usize {
&self.id
}
fn id_mut(&mut self) -> &mut usize {
&mut self.id
}
fn data(&self) -> &[usize] {
&self.data
}
fn data_mut(&mut self) -> &mut [usize] {
&mut self.data
}
}
impl Default for MyStruct {
fn default() -> MyStruct {
MyStruct {
id: 0,
data: vec![],
}
}
}
fn my_func<T: MyTrait + Default>() -> T {
let mut d = T::default();
// this correctly updates the struct field "id" to 26
*d.id_mut() = 26;
// this however, does not work. i get an error:
// the size for values of type `[usize]` cannot be known at compilation time
*d.data_mut() = vec![1, 2, 3].as_slice();
d
}
fn main() {
let _my_instance = my_func::<MyStruct>();
}
How can I create a generic function that returns an instance of the generic type?
For your Default instance of MyStruct, data is an empty Vec, but you are exposing it as a mutable slice. This is pretty meaningless because you can't change the length of a mutable slice; you can only mutate existing elements.
You will need to expose data as a &mut Vec<usize> instead, so that you can insert elements. The immutable getter can stay the same.
trait MyTrait {
fn id(&self) -> &usize;
fn id_mut(&mut self) -> &mut usize;
fn data(&self) -> &[usize];
fn data_mut(&mut self) -> &mut Vec<usize>;
}
And change the code that updates it:
fn my_func<T: MyTrait + Default>() -> T {
let mut d = T::default();
*d.id_mut() = 26;
*d.data_mut() = vec![1, 2, 3];
d
}
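Putting the pieces together, a complete version might look like the sketch below. It derives Default instead of implementing it by hand and uses extend_from_slice to fill the vector in place; both are choices made for this illustration rather than requirements:

```rust
trait MyTrait {
    fn id(&self) -> &usize;
    fn id_mut(&mut self) -> &mut usize;
    fn data(&self) -> &[usize];
    // Exposing the Vec itself allows callers to change its length.
    fn data_mut(&mut self) -> &mut Vec<usize>;
}

#[derive(Debug, Default)]
struct MyStruct {
    id: usize,
    data: Vec<usize>,
}

impl MyTrait for MyStruct {
    fn id(&self) -> &usize {
        &self.id
    }
    fn id_mut(&mut self) -> &mut usize {
        &mut self.id
    }
    fn data(&self) -> &[usize] {
        &self.data
    }
    fn data_mut(&mut self) -> &mut Vec<usize> {
        &mut self.data
    }
}

fn my_func<T: MyTrait + Default>() -> T {
    let mut d = T::default();
    *d.id_mut() = 26;
    // Appends to the existing (empty) Vec instead of replacing it.
    d.data_mut().extend_from_slice(&[1, 2, 3]);
    d
}
```

Assigning a whole new Vec with *d.data_mut() = vec![1, 2, 3] works equally well; extend_from_slice just reuses the existing allocation.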