Is there something similar to try_retain()? - rust

Is it possible to apply FnMut(&mut V) -> Result<bool> for collection elements and if result is:
Ok(false) -- remove element and continue
Ok(true) -- leave element in collection and continue
Err(_) -- stop and return the error
Basically, how to code a try_retain(), an equivalent of this C++ code:
struct S {
// ...
};
// Updates `s` (and possibly some external state), returns true if `s` is no longer needed
// throws on error
bool bar(S& s);
void foo(std::map<int, S>& m) {
try
{
for(auto it = m.begin(), it_end = m.end(); it != it_end; ) {
if (bar(it->second))
m.erase(it++);
else
++it;
}
}
catch(...)
{
printf("bailed early\n");
throw;
}
}
For some reason I keep running into this pattern and I can't figure out how to do it in Rust without traversing collection twice or using additional memory...

This won't short-circuit, but you can do it by storing the global result in a mutable variable that's updated by the closure you pass to retain. Something like this:
use std::collections::HashMap;
fn try_retain<K, V, E, F>(m: &mut HashMap<K, V>, mut f: F) -> Result<(), E>
where
F: FnMut(&K, &mut V) -> Result<bool, E>,
{
let mut result = Ok(());
m.retain(|k, v| match f(k, v) {
Ok(b) => b,
Err(e) => {
result = Err(e);
true
}
});
return result;
}
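If the closure is expensive or has side effects, a small variation on the same idea is to stop calling it once the first error has been seen. The sketch below keeps the signature from above; `retain` still walks the remaining entries, but they are kept untouched, so this is not a true short-circuit.

```rust
use std::collections::HashMap;

// Sketch: same idea as above, but `f` is no longer called after the first error.
fn try_retain<K, V, E, F>(m: &mut HashMap<K, V>, mut f: F) -> Result<(), E>
where
    F: FnMut(&K, &mut V) -> Result<bool, E>,
{
    let mut result = Ok(());
    m.retain(|k, v| {
        // After an error, keep every remaining entry without calling `f`.
        if result.is_err() {
            return true;
        }
        match f(k, v) {
            Ok(keep) => keep,
            Err(e) => {
                result = Err(e);
                true // keep the entry that produced the error
            }
        }
    });
    result
}
```

Entries removed before the error stay removed, and which ones those are depends on the map's iteration order.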

If you're fine with switching to hashbrown::HashMap (which is what std::collections::HashMap uses as its inner implementation), you could copy the implementation of fn retain and slightly change it to use the ? operator to return the first error:
fn try_retain<K, V, E, F>(m: &mut hashbrown::HashMap<K, V>, mut f: F) -> Result<(), E>
where
F: FnMut(&K, &mut V) -> Result<bool, E>,
{
let mut raw_table = m.raw_table();
// Here we only use `iter` as a temporary, preventing use-after-free
unsafe {
for item in raw_table.iter() {
let &mut (ref key, ref mut value) = item.as_mut();
if !f(key, value)? {
raw_table.erase(item);
}
}
}
Ok(())
}
This requires hashbrown's raw feature to be enabled, so you can access the underlying raw table.
This stops at the first error, and the elements are visited in a random order, so it's non-deterministic which elements are removed if an error occurs.

We could use try_for_each and remember the keys to remove (not to retain) till the end of the iteration because of the borrow checker:
fn try_retain(
map: &mut HashMap<i32, i32>,
to_retain: fn(k: i32, v: i32) -> bool,
) -> Result<bool, String> {
let mut to_remove: Vec<i32> = vec![];
map.iter().try_for_each(|(k, v)| {
if has_error() {
return Err("bailed early".to_string());
}
if !to_retain(*k, *v) {
to_remove.push(*k);
}
Ok::<_, String>(())
})?; // short-circuiting
if to_remove.is_empty() {
Ok(true)
} else {
(*map).retain(|&k, _| !to_remove.contains(&k));
Ok(false)
}
}
The return value tells whether any keys had to be removed: Ok(true) if every entry was retained, Ok(false) otherwise.
This version can short-circuit.
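The same two-pass idea can be written generically. The following is only a sketch under some assumptions: the name `try_retain_two_pass` is made up for illustration, keys must be `Clone`, and the closure gets `&mut V` as in the question. If the closure fails, nothing is removed, but values already visited may have been mutated.

```rust
use std::collections::HashMap;
use std::hash::Hash;

// Hypothetical helper: collect the keys to remove first, remove them afterwards.
fn try_retain_two_pass<K, V, E, F>(map: &mut HashMap<K, V>, mut f: F) -> Result<(), E>
where
    K: Eq + Hash + Clone,
    F: FnMut(&K, &mut V) -> Result<bool, E>,
{
    let mut to_remove = Vec::new();
    for (k, v) in map.iter_mut() {
        // `?` returns the first error immediately (short-circuit).
        if !f(k, v)? {
            to_remove.push(k.clone());
        }
    }
    // Second pass: only reached if no error occurred.
    for k in &to_remove {
        map.remove(k);
    }
    Ok(())
}
```

Removing by key is one hash lookup per removed entry, which avoids the linear `contains` scan inside `retain`.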
If the extra space for the keys to remove is still too much, here is a solution that relies on the unstable drain_filter:
fn try_retain(
map: &mut HashMap<i32, i32>,
to_retain: fn(k: i32, v: i32) -> bool,
) -> bool {
let mut rslt = true;
map.drain_filter(|k, v| {
if !to_retain(*k, *v) {
rslt = false;
true
} else {
false
}
});
rslt
}
Short-circuiting is only possible through panic.

Related

How to drop a MaybeUninit of vector or array which is partially initialized?

I'm looking for information and good practices for using MaybeUninit
to directly initialize collections (typically arrays or vectors) and
drop them properly if initialization fails.
Thanks to the API examples, I was able to get by fairly quickly with
arrays, but vectors were much trickier. In the example that
follows (a toy simplification of what I did in my project), the
generic function try_new<T: TryFrom<()>, A:ArrayUninit<T>>(len: usize) tries to create an array or a vector of objects T by means
of a fallible data generator, TryFrom::try_from(_:()), implemented by
T. The order in which the array is filled is random
(asynchronism); this is simulated by the function indices(len: usize).
To build the uninitialized array or vector, try_new uses the method
ArrayUninit::try_uninit(len: usize), implemented by Vec<Data> and
[Data;N].
In main, we use the data type Data as an example, for which the
generator TryFrom<()> is implemented.
The following code seems to work, but I'm wondering
how to drop uninitialized data:
(playground)
use core::{ time::Duration, mem::MaybeUninit, };
use std::thread;
use rand::prelude::*;
// trait with method for building uninited array/vector
// implementations for Vec<T> and [T;N] after the main()
trait ArrayUninit<T>: AsMut<[T]> + Sized {
fn try_uninit(len: usize) -> Result<MaybeUninit<Self>,String>;
}
// generate shuffled indices
fn indices(len: usize) -> Box<dyn Iterator<Item = usize>> {
let mut vec: Vec<usize> = (0..len).collect();
vec.shuffle(&mut thread_rng());
Box::new(vec.into_iter())
}
// try to build an array or a vector of objects T
fn try_new<T: TryFrom<()>, A:ArrayUninit<T>>(len: usize) -> Result<A,String> {
// build uninitialized collection
let mut uninited = A::try_uninit(len)?;
// simulate initialization in random order
let indices = indices(len);
// build a mutable ref to the array/vector
let ra: &mut A = unsafe {(uninited.as_mut_ptr() as *mut A).as_mut() }.unwrap();
let mut failed = false;
for i in indices {
// get ptr at i
let ptr_arr: * mut T = unsafe{AsMut::<[T]>::as_mut(ra).as_mut_ptr().add(i)};
// get object and break if failed
let data = match T::try_from(()) {
Ok(data) => data, Err(_) => { failed = true; break; },
};
// set object
unsafe { *ptr_arr = data };
}
if !failed {
Ok(unsafe{ uninited.assume_init() }) // return array, if successful
} else {
// if failed, then
for i in 0..len { // drop all objects within array/vector
let ptr_arr: * mut T = unsafe{AsMut::<[T]>::as_mut(ra).as_mut_ptr().add(i)};
drop(unsafe { ptr_arr.read() });
}
drop(uninited); // and drop uninited array/vector
Err(format!("failed to init"))
}
}
// Object Data
#[derive(Debug)]
struct Data(f64);
impl TryFrom<()> for Data {
type Error = ();
// generate a float with errors; time consuming
fn try_from(_:()) -> Result<Self,()> {
thread::sleep(Duration::from_millis(10));
let f = rand::random();
if f <= 0.99 { Ok(Data(f)) } else { Err(()) }
}
}
fn main() {
let result: Result<Vec<Data>,_> = try_new(3);
println!("result: {:?}",result);
let result: Result<[Data;3],_> = try_new(3);
println!("result: {:?}",result);
let result: Result<Vec<Data>,_> = try_new(1000);
println!("result: {:?}",result);
let result: Result<[Data;1000],_> = try_new(1000);
println!("result: {:?}",result);
}
impl<T> ArrayUninit<T> for Vec<T> {
fn try_uninit(len: usize) -> Result<MaybeUninit<Self>,String> {
let mut v: MaybeUninit<Vec<T>> = MaybeUninit::uninit();
let mut vv = Vec::with_capacity(len);
unsafe { vv.set_len(len) };
v.write(vv);
Ok(v)
}
}
impl<T,const N: usize> ArrayUninit<T> for [T;N] {
fn try_uninit(len: usize) -> Result<MaybeUninit<Self>,String> {
if len == N {
Ok(MaybeUninit::uninit())
} else { Err(format!("len differs from array size")) }
}
}
Here is an example of run (results are random):
Standard Error
Compiling playground v0.0.1 (/playground)
Finished dev [unoptimized + debuginfo] target(s) in 0.84s
Running `target/debug/playground`
Standard Output
result: Ok([Data(0.9778296353515407), Data(0.9319034033060891), Data(0.11046580243682291)])
result: Ok([Data(0.749182522350767), Data(0.5432451150541627), Data(0.6840763419767837)])
result: Err("failed to init")
result: Err("failed to init")
For now, in case of failure, I drop all the addresses within the
array/vector, both initialized and uninitialized, then I drop the
array/vector. It seems to work, but I'm surprised that one can also
drop uninitialized data.
Can anyone confirm if this is a right approach to drop the
uninitialized data? If not, what are the rules to follow?
[EDIT:]
Thanks to the remarks of isaactfa and Chayim, I updated the code as follows (playground):
use core::{ time::Duration, mem::MaybeUninit, };
use std::thread;
use rand::prelude::*;
// trait with method for building uninited array/vector
// implementations for Vec<T> and [T;N] after the main()
trait ArrayUninit<T>: AsMut<[T]> + Sized {
type Uninited: Sized;
fn try_uninit(len: usize) -> Result<Self::Uninited,String>;
unsafe fn set(uninit: &mut Self::Uninited, i: usize, t: T);
unsafe fn destructor(uninit: &mut Self::Uninited,);
unsafe fn finalize(uninit: Self::Uninited) -> Self;
}
// generate shuffled indices
fn indices(len: usize) -> Box<dyn Iterator<Item = usize>> {
let mut vec: Vec<usize> = (0..len).collect();
vec.shuffle(&mut thread_rng());
Box::new(vec.into_iter())
}
// try to build an array or a vector of objects T
fn try_new<T: TryFrom<()>, A:ArrayUninit<T>>(len: usize) -> Result<A,String> {
// build uninitialized collection
let mut uninited = A::try_uninit(len)?;
// simulate initialization in random order
let indices = indices(len);
let mut failed = false;
for i in indices {
// get object and break if failed
let data = match T::try_from(()) {
Ok(data) => { data }, Err(_) => { failed = true; break; },
};
// set object
unsafe { A::set(&mut uninited,i,data) };
}
if !failed {
Ok(unsafe{ A::finalize(uninited) }) // return array, if successful
} else {
unsafe { A::destructor(&mut uninited) };
Err(format!("failed to init"))
}
}
// Object Data
#[derive(Debug)]
struct Data(String);
impl TryFrom<()> for Data {
type Error = ();
// generate a formatted float, failing at random; time consuming
fn try_from(_:()) -> Result<Self,()> {
thread::sleep(Duration::from_millis(10));
let f:f32 = rand::random();
if f <= 0.99 { Ok(Data(format!("Value = {}",f))) } else { Err(()) }
}
}
fn main() {
let result: Result<Vec<Data>,_> = try_new(3);
println!("result: {:?}",result);
let result: Result<[Data;3],_> = try_new(3);
println!("result: {:?}",result);
let result: Result<Vec<Data>,_> = try_new(3);
println!("result: {:?}",result);
let result: Result<[Data;3],_> = try_new(3);
println!("result: {:?}",result);
let result: Result<Vec<Data>,_> = try_new(1000);
println!("result: {:?}",result);
let result: Result<[Data;1000],_> = try_new(1000);
println!("result: {:?}",result);
let result: Result<Vec<Data>,_> = try_new(1000);
println!("result: {:?}",result);
let result: Result<[Data;1000],_> = try_new(1000);
println!("result: {:?}",result);
}
impl<T> ArrayUninit<T> for Vec<T> {
type Uninited = (Vec<T>,Vec<bool>);
fn try_uninit(len: usize) -> Result<Self::Uninited,String> {
Ok((Vec::with_capacity(len),vec![false;len]))
}
unsafe fn set((uninit,flag): &mut Self::Uninited, i: usize, t: T) {
uninit.as_mut_ptr().offset(i as isize).write(t); flag[i] = true;
}
unsafe fn destructor((uninit,flag): &mut Self::Uninited,) {
for i in 0..flag.len() {
if flag[i] { std::ptr::drop_in_place(uninit.as_mut_ptr().offset(i as isize)); }
}
}
unsafe fn finalize((mut uninit,flag): Self::Uninited) -> Self {
uninit.set_len(flag.len());
uninit
}
}
impl<T,const N: usize> ArrayUninit<T> for [T;N] {
type Uninited = ([MaybeUninit<T>;N],[bool;N]);
fn try_uninit(len: usize) -> Result<Self::Uninited,String> {
if len == N {
let uninit = unsafe{ MaybeUninit::uninit().assume_init() };
Ok((uninit,[false;N]))
} else { Err(format!("len differs from array size")) }
}
unsafe fn set((uninit,flag): &mut Self::Uninited, i: usize, t: T) {
uninit[i].write(t); flag[i] = true;
}
unsafe fn destructor((uninit,flag): &mut Self::Uninited,) {
for i in 0..N {
if flag[i] { std::ptr::drop_in_place(uninit[i].as_mut_ptr()); }
}
}
unsafe fn finalize((uninit,_): Self::Uninited) -> Self {
(&uninit as *const _ as *const Self).read()
}
}
The idea here is to use specific approaches for arrays and vecs, which are encoded within trait ArrayUninit. MaybeUninit is used only for arrays, while it is not needed for vecs.
Your code contains multiple points of UB:
Calling set_len() when the elements in range are uninitialized (you're doing that in try_uninit() for Vec<T>) is UB (see set_len()'s docs).
When initializing arrays, you create uninitialized storage for the array in try_uninit() and then turn that into a reference to an initialized array in try_new(). This may be undefined behavior (but not necessarily), see https://github.com/rust-lang/unsafe-code-guidelines/issues/84.
When setting the value at the index (unsafe { *ptr_arr = data } in try_new()), you drop the old value. If the value has no drop glue this is likely fine, but if it has, this is undefined behavior since you drop uninitialized data. You need to use std::ptr::write() instead.
You're doing a typed copy of the values by drop(unsafe { ptr_arr.read() }). Doing a typed copy of uninitialized values is definitely UB (Miri is even flagging this one).
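To illustrate the rules above, here is a minimal sketch for the array case. It is simplified compared to the question: elements are filled in order, so the initialized part is always a prefix and no flag array is needed. The names `try_fill` and `make` are made up for this example. Values are stored with `MaybeUninit::write` (no drop of old contents), and on failure only the slots that were actually written get dropped.

```rust
use std::mem::MaybeUninit;
use std::ptr;

// Hypothetical helper: fill `[T; N]` in index order from a fallible generator.
fn try_fill<T, E, const N: usize>(
    mut make: impl FnMut(usize) -> Result<T, E>,
) -> Result<[T; N], E> {
    // SAFETY: an array of `MaybeUninit<T>` does not require initialization.
    let mut slots: [MaybeUninit<T>; N] = unsafe { MaybeUninit::uninit().assume_init() };
    for i in 0..N {
        match make(i) {
            // `write` stores the value without dropping the (uninitialized) old one.
            Ok(value) => {
                slots[i].write(value);
            }
            Err(e) => {
                // Drop only the prefix that was initialized so far.
                for slot in &mut slots[..i] {
                    // SAFETY: slots 0..i were written in earlier iterations.
                    unsafe { slot.assume_init_drop() };
                }
                return Err(e);
            }
        }
    }
    // SAFETY: every slot is initialized, and `[MaybeUninit<T>; N]` has the
    // same layout as `[T; N]`. `slots` itself drops nothing afterwards.
    Ok(unsafe { ptr::read(slots.as_ptr() as *const [T; N]) })
}
```

If `make` panics, the elements written so far are leaked rather than dropped; that is wasteful but not undefined behavior. For random-order initialization, a per-slot flag as in the edited question is still needed.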

How to resize and rehash for SeparateChainingHashST?

I am implementing the SeparateChainingHashST in Rust. The complete code can be found here.
pub struct SeparateChainingHashST<K, V> {
n: usize, // number of key-value pairs
m: usize, // hash table size
st: Vec<SequentialSearchST<K, V>>,
}
where SequentialSearchST is a linked list.
What bothers me now is how to implement the resize method. Its main idea is: create a new SeparateChainingHashST with the new size, and then put every key/value into this new symbol table.
What I have done:
fn resize(self, chains: usize) -> Self {
let mut tmp = SeparateChainingHashST::new(chains);
for table in self.st.into_iter() {
for (k, v) in table.into_items() {
tmp.put(k, v);
}
}
tmp
}
Since the put() method accepts K and V by value, I added a consuming iterator, into_items, that yields (K, V) pairs.
This version does not really make sense, though: what I want is fn resize(&mut self, chains: usize). Note that I would rather not require the Copy or Clone trait for K or V.
The key point is to take value from self.st (a vector), and since the order is not important here, one solution is to use swap_remove:
fn resize(&mut self, chains: usize) {
let mut tmp = SeparateChainingHashST::new(chains);
for _ in 0..self.m {
for (k, v) in self.st.swap_remove(0).into_items() {
tmp.put(k, v);
}
}
*self = tmp;
}
Alternatively, use pop:
fn resize(&mut self, chains: usize) {
let mut tmp = SeparateChainingHashST::new(chains);
while let Some(table) = self.st.pop() {
for (k, v) in table.into_items() {
tmp.put(k, v);
}
}
*self = tmp;
}
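Another way to move the chains out of &mut self is std::mem::take, which swaps an empty Vec into self.st and hands back the old one. The sketch below is self-contained, so it makes assumptions: `Chain` is a stand-in for SequentialSearchST backed by a Vec of pairs, and the hashing via `DefaultHasher` is not necessarily how the original code does it.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for SequentialSearchST (assumption): a plain Vec of pairs.
struct Chain<K, V>(Vec<(K, V)>);

pub struct SeparateChainingHashST<K, V> {
    n: usize, // number of key-value pairs
    m: usize, // hash table size
    st: Vec<Chain<K, V>>,
}

impl<K: Hash + Eq, V> SeparateChainingHashST<K, V> {
    pub fn new(m: usize) -> Self {
        let st = (0..m).map(|_| Chain(Vec::new())).collect();
        Self { n: 0, m, st }
    }

    pub fn len(&self) -> usize {
        self.n
    }

    fn index(&self, k: &K) -> usize {
        let mut h = DefaultHasher::new();
        k.hash(&mut h);
        (h.finish() as usize) % self.m
    }

    pub fn put(&mut self, k: K, v: V) {
        let i = self.index(&k);
        let chain = &mut self.st[i].0;
        match chain.iter().position(|(key, _)| *key == k) {
            Some(pos) => chain[pos].1 = v, // overwrite existing key
            None => {
                chain.push((k, v));
                self.n += 1;
            }
        }
    }

    pub fn get(&self, k: &K) -> Option<&V> {
        let chain = &self.st[self.index(k)].0;
        chain.iter().find(|(key, _)| key == k).map(|(_, v)| v)
    }

    pub fn resize(&mut self, chains: usize) {
        // Take ownership of the old chains; `self.st` is left as an empty Vec.
        let old = std::mem::take(&mut self.st);
        let mut tmp = Self::new(chains);
        for chain in old {
            for (k, v) in chain.0 {
                tmp.put(k, v); // keys and values are moved, no Clone needed
            }
        }
        *self = tmp;
    }
}
```

self.st.drain(..) would work in the same place if the old vector's allocation should be kept until the end of the loop.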

How do I mutate in a match which borrows an immutable value?

I can understand borrowing/ownership concepts in Rust, but I have no idea how to work around this case:
use std::collections::{HashMap, HashSet};
struct Val {
t: HashMap<u16, u16>,
l: HashSet<u16>,
}
impl Val {
fn new() -> Val {
Val {
t: HashMap::new(),
l: HashSet::new(),
}
}
fn set(&mut self, k: u16, v: u16) {
self.t.insert(k, v);
self.l.insert(v);
}
fn remove(&mut self, v: &u16) -> bool {
self.l.remove(v)
}
fn do_work(&mut self, v: u16) -> bool {
match self.t.get(&v) {
None => false,
Some(r) => self.remove(r),
}
}
}
fn main() {
let mut v = Val::new();
v.set(123, 100);
v.set(100, 1234);
println!("Size before: {}", v.l.len());
println!("Work: {}", v.do_work(123));
println!("Size after: {}", v.l.len());
}
The compiler has the error:
error[E0502]: cannot borrow `*self` as mutable because it is also borrowed as immutable
--> src/main.rs:28:24
|
26 | match self.t.get(&v) {
| ------ immutable borrow occurs here
27 | None => false,
28 | Some(r) => self.remove(r),
| ^^^^^------^^^
| | |
| | immutable borrow later used by call
| mutable borrow occurs here
I don't understand why I can't mutate in the match arm when I did a get (read value) before; the self.t.get is finished when the mutation via remove begins.
Is this due to scope of the result (Option<&u16>) returned by the get? It's true that the lifetime of the result has a scope inside the match expression, but this design-pattern is used very often (mutate in a match expression).
How do I work around the error?
The declaration of function HashMap::<K,V>::get() is, a bit simplified:
pub fn get<'s>(&'s self, k: &K) -> Option<&'s V>
This means that it returns an optional reference to the contained value, not the value itself. Since the returned reference points to a value inside the map, it actually borrows the map; that is, you cannot mutate the map while this reference exists. This restriction is there to protect you: what would happen if you removed this value while the reference was still alive?
So when you write:
match self.t.get(&v) {
None => false,
//r: &u16
Some(r) => self.remove(r)
}
the captured r is of type &u16 and its lifetime is tied to the borrow of self.t; that is, it is borrowing the map. Thus you cannot get the mutable reference to self that is needed to call remove.
The simplest solution for your problem is the "clone() solves every lifetime issue" pattern. Since your values are of type u16, which is Copy, it is actually trivial:
match self.t.get(&v) {
None => false,
//r: u16
Some(&r) => self.remove(&r)
}
Now r is actually of type u16 so it borrows nothing and you can mutate self at will.
If your key/value types weren't Copy you could try and clone them, if you are willing to pay for that. If not, there is still another option as your remove() function does not modify the HashMap but an unrelated HashSet. You can still mutate that set if you take care not to reborrow self:
fn remove2(v: &u16, l: &mut HashSet<u16>) -> bool {
l.remove(v)
}
fn do_work(&mut self, v: u16) -> bool {
match self.t.get(&v) {
None => false,
//self.t is borrowed, now we mut-borrow self.l, no problem
Some(r) => Self::remove2(r, &mut self.l)
}
}
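The disjoint-field approach also works for non-Copy types and needs no clone. As a sketch, here is the same struct with String keys and values (the &str parameters are a choice made for this example): the map is borrowed immutably through self.t while the set is borrowed mutably through self.l, and the borrow checker accepts this because the two fields are accessed directly rather than through a method taking &mut self.

```rust
use std::collections::{HashMap, HashSet};

struct Val {
    t: HashMap<String, String>,
    l: HashSet<String>,
}

impl Val {
    fn new() -> Val {
        Val {
            t: HashMap::new(),
            l: HashSet::new(),
        }
    }

    fn set(&mut self, k: &str, v: &str) {
        self.t.insert(k.to_string(), v.to_string());
        self.l.insert(v.to_string());
    }

    fn do_work(&mut self, k: &str) -> bool {
        match self.t.get(k) {
            None => false,
            // r: &String borrows self.t only; self.l is a different field,
            // so it can be borrowed mutably at the same time. No clone needed.
            Some(r) => self.l.remove(r),
        }
    }
}
```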
You are trying to remove a value from the HashMap by using the value you got, not the key.
Only line 26 is changed: Some(_) => self.remove(&v)
This will work:
use std::collections::HashMap;
struct Val {
t: HashMap<u16, u16>
}
impl Val {
fn new() -> Val {
Val { t: HashMap::new() }
}
fn set(&mut self, k: u16, v: u16) {
self.t.insert(k, v);
}
fn remove(&mut self, v: &u16) -> bool {
match self.t.remove(v) {
None => false,
_ => true,
}
}
fn do_work(&mut self, v: u16) -> bool {
match self.t.get(&v) {
None => false,
Some(_) => self.remove(&v)
}
}
}
fn main() {
let mut v = Val::new();
v.set(123, 100);
v.set(1100, 1234);
println!("Size before: {}", v.t.len());
println!("Work: {}", v.do_work(123));
println!("Size after: {}", v.t.len());
}
The following solution works for Copy types such as u16 here. For non-Copy types, the pattern Some(&v) would try to move the value out of the map, which is not allowed.
use std::collections::HashMap;
struct Val {
t: HashMap<u16, u16>,
}
impl Val {
fn new() -> Val {
Val { t: HashMap::new() }
}
fn set(&mut self, k: u16, v: u16) {
self.t.insert(k, v);
}
fn remove(&mut self, v: &u16) -> bool {
match self.t.remove(v) {
None => false,
_ => true,
}
}
fn do_work(&mut self, v: u16) -> bool {
match self.t.get(&v) {
None => false,
Some(&v) => self.remove(&v)
}
}
}
fn main() {
let mut v = Val::new();
v.set(123, 100);
v.set(100, 1234);
println!("Size before: {}", v.t.len());
println!("Work: {}", v.do_work(123));
println!("Size after: {}", v.t.len());
}
For other types, we must clone the value:
use std::collections::{HashMap, HashSet};
#[derive(Debug)]
struct Val {
t: HashMap<String, String>,
l: HashSet<String>
}
impl Val {
fn new() -> Val {
Val { t: HashMap::new(), l: HashSet::new() }
}
fn set(&mut self, k: String, v: String) {
self.l.insert(v.clone());
self.t.insert(k, v);
}
fn remove(&mut self, v: &String) -> bool {
self.l.remove(v)
}
fn do_work(&mut self, i: &String) -> bool {
match self.t.get(i) {
None => false,
Some(v) => {
let x = v.clone();
self.remove(&x)
}
}
}
fn do_task(&mut self, i: &String) -> bool {
match self.t.get(i) {
None => false,
Some(v) => self.l.insert(v.clone())
}
}
}
fn main() {
let mut v = Val::new();
v.set("AA".to_string(), "BB".to_string());
v.set("BB".to_string(), "CC".to_string());
println!("Start: {:#?}", v);
println!("Size before: {}", v.l.len());
println!("Work: {}", v.do_work(&"AA".to_string()));
println!("Size after: {}", v.l.len());
println!("After: {:#?}", v);
println!("Task [Exist]: {}", v.do_task(&"BB".to_string()));
println!("Task [New]: {}", v.do_task(&"AA".to_string()));
println!("End: {:#?}", v);
}
But I'd like a solution that needs no allocation.

How do I handle/circumvent "Cannot assign to ... which is behind a & reference" in Rust?

I'm implementing a simple linked list. This is the (working) code I had so far:
pub struct LinkedList<T> {
start: Option<Box<Link<T>>>,
}
impl<T> LinkedList<T> {
pub fn new() -> LinkedList<T> {
return LinkedList { start: None };
}
}
struct Link<T> {
value: Box<T>,
next: Option<Box<Link<T>>>,
}
impl<T> Link<T> {
fn new_end(value: T) -> Link<T> {
return Link::new(value, None);
}
fn new(value: T, next: Option<Box<Link<T>>>) -> Link<T> {
return Link {
value: Box::new(value),
next,
};
}
}
Next on the list is a method to append to the list; this is what I came up with:
pub fn append(&mut self, element: T) {
// Create the link to append
let new_link = Some(Box::new(Link::new_end(element)));
// Find the last element of the list. None, if the list is empty
let mut last = &self.start;
while let Some(link) = last {
last = &link.next;
}
// Insert the new link at the correct position
match last {
None => self.start = new_link,
Some(last) => last.next = new_link, // This fails
}
}
The precise compiler error is
error[E0594]: cannot assign to `last.next` which is behind a `&` reference
I vaguely get the problem; you cannot mutate an immutable reference. But making the references mutable does seem to make the errors even worse.
How does one handle these kinds of errors? Is there a simple quick-fix, or do you structure your code completely different in Rust?
Your code almost worked. It will if you bind mutably:
impl<T> LinkedList<T> {
pub fn append(&mut self, element: T) {
// Create the link to append
let new_link = Some(Box::new(Link::new_end(element)));
// Find the last element of the list. None, if the list is empty
let mut last = &mut self.start;
while let Some(link) = last {
last = &mut link.next;
}
// Insert the new link at the correct position
match last {
None => self.start = new_link,
Some(ref mut last) => last.next = new_link,
}
}
}
FYI, the answer to this recent question is very good at clarifying the matter about mutability, type and binding in Rust.
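Since the loop only ends when last refers to a None link, the final match can also be replaced by a plain assignment through the reference. A minimal self-contained sketch follows; the to_vec helper is added here only to make the result observable, and value is stored as T rather than Box<T> for brevity.

```rust
pub struct LinkedList<T> {
    start: Option<Box<Link<T>>>,
}

struct Link<T> {
    value: T,
    next: Option<Box<Link<T>>>,
}

impl<T> LinkedList<T> {
    pub fn new() -> LinkedList<T> {
        LinkedList { start: None }
    }

    pub fn append(&mut self, element: T) {
        // Walk to the first empty link: either `start` or some `next`.
        let mut last = &mut self.start;
        while let Some(link) = last {
            last = &mut link.next;
        }
        // `last` now refers to a `None`; overwrite it in place.
        *last = Some(Box::new(Link {
            value: element,
            next: None,
        }));
    }

    // Helper for inspection only (not part of the original code).
    pub fn to_vec(&self) -> Vec<&T> {
        let mut out = Vec::new();
        let mut cur = &self.start;
        while let Some(link) = cur {
            out.push(&link.value);
            cur = &link.next;
        }
        out
    }
}
```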

Elegant way to borrow and return a mutable reference in Rust

I'm trying to return a mutable reference after doing some operation on it. This is best explained by a piece of code:
#[derive(PartialEq)]
pub enum Value {
Null,
Array(Vec<Value>),
}
impl Value {
pub fn new() -> Value {
Value::Array(Vec::new())
}
pub fn push<'a, T> (&'a mut self, value: T) -> Option<&'a mut Value>
where T:Into<Value> {
let temp = match *self {
Value::Array(ref mut vec) => {
vec.push(value.into());
true
},
_ => false,
};
if temp {
Some(self)
} else {
None
}
}
}
#[test]
fn push_test() {
let mut val = Value::new();
val.push(Value::Null);
assert!(val == Value::Array(vec![Value::Null]));
}
The play version is here. The workaround with boolean values is because I would be borrowing multiple times if I returned Some(self) from within the match block. Is there an elegant way to implement the push function without using boolean values? If it's possible to retain the function signature then it's a bonus. Thank you!
The workaround with boolean values is because I would be borrowing multiple times if I return Some(self) from within the match block
Another option is to replace self temporarily, so v can take ownership of the vector (avoiding the borrow). After adding the new item to v, we reconstruct the self value:
// the lifetime 'a can be omitted
pub fn push<T>(&mut self, value: T) -> Option<&mut Value>
where T: Into<Value>
{
// replace() puts Value::Null into self and returns the old value
match ::std::mem::replace(self, Value::Null) {
Value::Array(mut v) => {
v.push(value.into());
*self = Value::Array(v);
Some(self)
},
_ => None,
}
}
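On compilers with non-lexical lifetimes (Rust 2018 and later), the boolean is no longer needed: the borrow of the vector ends after its last use, so self can be returned right afterwards. A sketch with the same signature, assuming such a compiler:

```rust
#[derive(Debug, PartialEq)]
pub enum Value {
    Null,
    Array(Vec<Value>),
}

impl Value {
    pub fn new() -> Value {
        Value::Array(Vec::new())
    }

    pub fn push<T>(&mut self, value: T) -> Option<&mut Value>
    where
        T: Into<Value>,
    {
        if let Value::Array(vec) = self {
            vec.push(value.into());
            // `vec` is not used past this point, so `self` is free again.
            return Some(self);
        }
        None
    }
}
```

Unlike the mem::replace version, this never overwrites self, so a non-array value is left exactly as it was.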
