Union-Find implementation does not update parent tags - rust

I'm trying to create some sets of Strings and then merge some of these sets so that they have the same tag (of type usize). Once I initialize the map, I start adding strings:
self.clusters.make_set("a");
self.clusters.make_set("b");
When I call self.clusters.find("a") and self.clusters.find("b"), different values are returned, which is fine because I haven't merged the sets yet. Then I call the following method to merge two sets
let _ = self.clusters.union("a", "b");
If I call self.clusters.find("a") and self.clusters.find("b") now, I get the same value. However, when I call the finalize() method and try to iterate through the map, the original tags are returned, as if I never merged the sets.
self.clusters.finalize();
for (address, tag) in &self.clusters.map {
self.clusterizer_writer.write_all(format!("{};{}\n", address,
self.clusters.parent[*tag]).as_bytes()).unwrap();
}
// to output all keys with the same tag as a list.
let a: Vec<(usize, Vec<String>)> = {
let mut x = HashMap::new();
for (k, v) in self.clusters.map.clone() {
x.entry(v).or_insert_with(Vec::new).push(k)
}
x.into_iter().collect()
};
I can't figure out why this is the case, but I'm relatively new to Rust; maybe it's an issue with pointers?
Instead of "a" and "b", I'm actually using something like utils::arr_to_hex(&input.outpoint.txid) of type String.
This is the Rust implementation of the Union-Find algorithm that I am using:
/// Tarjan's Union-Find data structure.
#[derive(RustcDecodable, RustcEncodable)]
pub struct DisjointSet<T: Clone + Hash + Eq> {
set_size: usize,
parent: Vec<usize>,
rank: Vec<usize>,
map: HashMap<T, usize>, // Each T entry is mapped onto a usize tag.
}
impl<T> DisjointSet<T>
where
T: Clone + Hash + Eq,
{
pub fn new() -> Self {
const CAPACITY: usize = 1000000;
DisjointSet {
set_size: 0,
parent: Vec::with_capacity(CAPACITY),
rank: Vec::with_capacity(CAPACITY),
map: HashMap::with_capacity(CAPACITY),
}
}
pub fn make_set(&mut self, x: T) {
if self.map.contains_key(&x) {
return;
}
let len = &mut self.set_size;
self.map.insert(x, *len);
self.parent.push(*len);
self.rank.push(0);
*len += 1;
}
/// Returns Some(num), num is the tag of subset in which x is.
/// If x is not in the data structure, it returns None.
pub fn find(&mut self, x: T) -> Option<usize> {
let pos: usize;
match self.map.get(&x) {
Some(p) => {
pos = *p;
}
None => return None,
}
let ret = DisjointSet::<T>::find_internal(&mut self.parent, pos);
Some(ret)
}
/// Implements path compression.
fn find_internal(p: &mut Vec<usize>, n: usize) -> usize {
if p[n] != n {
let parent = p[n];
p[n] = DisjointSet::<T>::find_internal(p, parent);
p[n]
} else {
n
}
}
/// Union the subsets to which x and y belong.
/// If it returns Ok(tag), tag is the tag of the unified subset.
/// If it returns Err(), at least one of x and y is not in the disjoint-set.
pub fn union(&mut self, x: T, y: T) -> Result<usize, ()> {
let x_root;
let y_root;
let x_rank;
let y_rank;
match self.find(x) {
Some(x_r) => {
x_root = x_r;
x_rank = self.rank[x_root];
}
None => {
return Err(());
}
}
match self.find(y) {
Some(y_r) => {
y_root = y_r;
y_rank = self.rank[y_root];
}
None => {
return Err(());
}
}
// Implements union-by-rank optimization.
if x_root == y_root {
return Ok(x_root);
}
if x_rank > y_rank {
self.parent[y_root] = x_root;
return Ok(x_root);
} else {
self.parent[x_root] = y_root;
if x_rank == y_rank {
self.rank[y_root] += 1;
}
return Ok(y_root);
}
}
/// Forces all laziness, updating every tag.
pub fn finalize(&mut self) {
for i in 0..self.set_size {
DisjointSet::<T>::find_internal(&mut self.parent, i);
}
}
}

I think you're just not extracting the information out of your DisjointSet struct correctly.
I got sniped by this and implemented union find. First, with a basic usize implementation:
pub struct UnionFinderImpl {
parent: Vec<usize>,
}
Then with a wrapper for more generic types:
pub struct UnionFinder<T: Hash> {
rev: Vec<Rc<T>>,
fwd: HashMap<Rc<T>, usize>,
uf: UnionFinderImpl,
}
Both structs implement a groups() method that returns a Vec<Vec<>> of groups. Clone isn't required because I used Rc.
Playground
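To make the "extracting" point concrete: the values stored in map never change after make_set, so grouping by the raw tag always reproduces the original singleton sets. The grouping has to be keyed by the root of each tag instead. Below is a minimal sketch of that idea, not the code from the playground link; Uf and groups are hypothetical names, and the union here skips union-by-rank for brevity.

```rust
use std::collections::HashMap;

/// Minimal union-find over indices 0..n (hypothetical helper type).
pub struct Uf {
    parent: Vec<usize>,
}

impl Uf {
    pub fn new(n: usize) -> Self {
        Uf { parent: (0..n).collect() }
    }

    /// Returns the root of `x`, compressing the path on the way back.
    pub fn find(&mut self, x: usize) -> usize {
        let p = self.parent[x];
        if p == x {
            return x;
        }
        let root = self.find(p);
        self.parent[x] = root;
        root
    }

    /// Merges the sets containing `a` and `b` (no rank heuristic here).
    pub fn union(&mut self, a: usize, b: usize) {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra != rb {
            self.parent[ra] = rb;
        }
    }
}

/// Groups keys by the ROOT of their tag, not by the tag itself.
pub fn groups(map: &HashMap<String, usize>, uf: &mut Uf) -> Vec<Vec<String>> {
    let mut by_root: HashMap<usize, Vec<String>> = HashMap::new();
    for (key, &tag) in map {
        by_root.entry(uf.find(tag)).or_default().push(key.clone());
    }
    // Sort only to make the output order deterministic.
    let mut out: Vec<Vec<String>> = by_root.into_values().collect();
    for group in &mut out {
        group.sort();
    }
    out.sort();
    out
}
```

Applied to the code in the question, the same change would mean keying the x.entry(...) call on self.clusters.parent[v] (after finalize()) rather than on v.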

Related

Peek implementation for linked list in Rust

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=693594655ea355b40e2175542c653879
I want peek() to remove the last element of the list, returning data. What am I missing?
type Link<T> = Option<Box<Node<T>>>;
struct Node<T> {
pub data: T,
pub next: Link<T>,
}
struct List<T> {
pub head: Link<T>,
}
impl<T> List<T> {
fn peek(&mut self) -> Option<T> {
let mut node = &self.head;
while let Some(cur_node) = &mut node {
if cur_node.next.is_some() {
node = &cur_node.next;
continue;
}
}
let last = node.unwrap();
let last = last.data;
return Some(last);
}
}
#[test]
fn peek_test() {
let mut q = List::new();
q.push(1);
q.push(2);
q.push(3);
assert_eq!(q.empty(), false);
assert_eq!(q.peek().unwrap(), 1);
assert_eq!(q.peek().unwrap(), 2);
assert_eq!(q.peek().unwrap(), 3);
assert_eq!(q.empty(), true);
}
To save the head, I need to access the elements by reference, but the puzzle does not fit in my head. I looked at "too-many-lists", but the value is simply returned by reference, and I would like to remove the tail element.
To make this work you have to switch from taking a shared reference (&) to a mutable one.
This results in borrow checker errors with your code, which is why I had to change the while let loop into one that checks whether the next element is Some and only then borrows node's content mutably and advances it.
Finally, I Option::take that last element and return its data. I use Option::map to avoid having to unwrap, which would panic for empty lists anyway. If you wanted to keep your variant, you should replace unwrap with the try operator ?.
So in short you can implement a pop_back like this:
pub fn pop_back(&mut self) -> Option<T> {
let mut node = &mut self.head;
while node.as_ref().map(|n| n.next.is_some()).unwrap_or_default() {
node = &mut node.as_mut().unwrap().next;
}
node.take().map(|last| last.data)
}
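For reference, here is a self-contained sketch that puts pop_back into a complete list so it can be run against the test from the question. The new, push and empty methods are not shown in the question, so their bodies here are assumptions; in particular, push is assumed to prepend at the head, which is what makes the tail hold the oldest element.

```rust
type Link<T> = Option<Box<Node<T>>>;

struct Node<T> {
    data: T,
    next: Link<T>,
}

pub struct List<T> {
    head: Link<T>,
}

impl<T> List<T> {
    pub fn new() -> Self {
        List { head: None }
    }

    /// Assumed behaviour: prepend the new element at the head.
    pub fn push(&mut self, data: T) {
        let next = self.head.take();
        self.head = Some(Box::new(Node { data, next }));
    }

    pub fn empty(&self) -> bool {
        self.head.is_none()
    }

    /// Removes the tail element and returns its data, or None if empty.
    pub fn pop_back(&mut self) -> Option<T> {
        let mut node = &mut self.head;
        // Advance while the current node has a successor.
        while node.as_ref().map(|n| n.next.is_some()).unwrap_or_default() {
            node = &mut node.as_mut().unwrap().next;
        }
        // `node` now points at the link owning the last element (or None).
        node.take().map(|last| last.data)
    }
}
```

With push prepending, pushing 1, 2, 3 and then calling pop_back three times yields 1, 2, 3 in that order, after which the list is empty.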
I suggest something like the code below, just because I spent time on it :-)
fn peek(&mut self) -> Option<T> {
match &self.head {
None => return None,
Some(v) =>
if v.next.is_none() {
let last = self.head.take();
let last = last.unwrap().data;
return Some(last);
}
}
let mut current = &mut self.head;
loop {
match current {
None => return None,
Some(node) if node.next.is_some() && match &node.next { None => false, Some(v) => v.next.is_none()} => {
let last = node.next.take();
let last = last.unwrap().data;
return Some(last);
},
Some(node) => {
current = &mut node.next;
}
}
}
}

Why does wasm-opt fail in wasm-pack builds when generating a function returning a string?

I'm working through the Rust WASM tutorial for Conway's game of life.
One of the simplest functions in the file is called Universe.render (it's the one for rendering a string representing game state). It's causing an error when I run wasm-pack build:
Fatal: error in validating input
Error: failed to execute `wasm-opt`: exited with exit code: 1
full command: "/home/vaer/.cache/.wasm-pack/wasm-opt-4d7a65327e9363b7/wasm-opt" "/home/vaer/src/learn-rust/wasm-game-of-life/pkg/wasm_game_of_life_bg.wasm" "-o" "/home/vaer/src/learn-rust/wasm-game-of-life/pkg/wasm_game_of_life_bg.wasm-opt.wasm" "-O"
To disable `wasm-opt`, add `wasm-opt = false` to your package metadata in your `Cargo.toml`.
If I remove that function, the code builds without errors. If I replace it with the following function, the build fails with the same error:
pub fn wtf() -> String {
String::from("wtf")
}
It seems like any function that returns a String causes this error. Why?
Following is the entirety of my code:
mod utils;
use wasm_bindgen::prelude::*;
// When the `wee_alloc` feature is enabled, use `wee_alloc` as the global
// allocator.
#[cfg(feature = "wee_alloc")]
#[global_allocator]
static ALLOC: wee_alloc::WeeAlloc = wee_alloc::WeeAlloc::INIT;
// Begin game of life impl
use std::fmt;
#[wasm_bindgen]
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Cell {
Dead = 0,
Alive = 1,
}
#[wasm_bindgen]
pub struct Universe {
width: u32,
height: u32,
cells: Vec<Cell>,
}
impl fmt::Display for Universe {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
for line in self.cells.as_slice().chunks(self.width as usize) {
for &cell in line {
let symbol = if cell == Cell::Dead { '◻' } else { '◼' };
write!(f, "{}", symbol)?;
}
write!(f, "\n")?;
}
Ok(())
}
}
impl Universe {
fn get_index(&self, row: u32, column: u32) -> usize {
(row * self.width + column) as usize
}
fn live_neighbor_count(&self, row: u32, column: u32) -> u8 {
let mut count = 0;
for delta_row in [self.height - 1, 0, 1].iter().cloned() {
for delta_col in [self.width - 1, 0, 1].iter().cloned() {
if delta_row == 0 && delta_col == 0 {
continue;
}
let neighbor_row = (row + delta_row) % self.height;
let neighbor_col = (column + delta_col) % self.width;
let idx = self.get_index(neighbor_row, neighbor_col);
count += self.cells[idx] as u8;
}
}
count
}
}
/// Public methods, exported to JavaScript.
#[wasm_bindgen]
impl Universe {
pub fn tick(&mut self) {
let mut next = self.cells.clone();
for row in 0..self.height {
for col in 0..self.width {
let idx = self.get_index(row, col);
let cell = self.cells[idx];
let live_neighbors = self.live_neighbor_count(row, col);
let next_cell = match (cell, live_neighbors) {
// Rule 1: Any live cell with fewer than two live neighbours
// dies, as if caused by underpopulation.
(Cell::Alive, x) if x < 2 => Cell::Dead,
// Rule 2: Any live cell with two or three live neighbours
// lives on to the next generation.
(Cell::Alive, 2) | (Cell::Alive, 3) => Cell::Alive,
// Rule 3: Any live cell with more than three live
// neighbours dies, as if by overpopulation.
(Cell::Alive, x) if x > 3 => Cell::Dead,
// Rule 4: Any dead cell with exactly three live neighbours
// becomes a live cell, as if by reproduction.
(Cell::Dead, 3) => Cell::Alive,
// All other cells remain in the same state.
(otherwise, _) => otherwise,
};
next[idx] = next_cell;
}
}
self.cells = next;
}
pub fn new() -> Universe {
let width = 64;
let height = 64;
let cells = (0..width * height)
.map(|i| {
if i % 2 == 0 || i % 7 == 0 {
Cell::Alive
} else {
Cell::Dead
}
})
.collect();
Universe {
width,
height,
cells,
}
}
pub fn render(&self) -> String {
self.to_string()
}
}
Simply removing the render function at the bottom of this file causes the build to succeed. Replacing the render function with any function returning a String causes the build to fail. Why?
It turns out that this is not expected behavior; instead it is a bug with wasm-pack.
The issue can be resolved for now by adding the following to the project's Cargo.toml:
[package.metadata.wasm-pack.profile.release]
wasm-opt = ["-Oz", "--enable-mutable-globals"]

How can I return the combination of two borrowed RefCells?

I have a struct with two Vecs wrapped in RefCells. I want to have a method on that struct that combines the two vectors and returns them as a new RefCell or RefMut:
use std::cell::{RefCell, RefMut};
struct World {
positions: RefCell<Vec<Option<Position>>>,
velocities: RefCell<Vec<Option<Velocity>>>,
}
type Position = i32;
type Velocity = i32;
impl World {
pub fn new() -> World {
World {
positions: RefCell::new(vec![Some(1), None, Some(2)]),
velocities: RefCell::new(vec![None, None, Some(1)]),
}
}
pub fn get_pos_vel(&self) -> RefMut<Vec<(Position, Velocity)>> {
let mut poses = self.positions.borrow_mut();
let mut vels = self.velocities.borrow_mut();
poses
.iter_mut()
.zip(vels.iter_mut())
.filter(|(e1, e2)| e1.is_some() && e2.is_some())
.map(|(e1, e2)| (e1.unwrap(), e2.unwrap()))
.for_each(|elem| println!("{:?}", elem));
}
}
fn main() {
let world = World::new();
world.get_pos_vel();
}
How would I return the zipped contents of the vectors as a new RefCell? Is that possible?
I know there is RefMut::map() and I tried to nest two calls to map, but didn't succeed with that.
You want to be able to modify the positions and velocities. If these have to be stored in two separate RefCells, what about side-stepping the problem and using a callback to do the modification?
use std::cell::RefCell;
struct World {
positions: RefCell<Vec<Option<Position>>>,
velocities: RefCell<Vec<Option<Velocity>>>,
}
type Position = i32;
type Velocity = i32;
impl World {
pub fn new() -> World {
World {
positions: RefCell::new(vec![Some(1), None, Some(2)]),
velocities: RefCell::new(vec![None, None, Some(1)]),
}
}
pub fn modify_pos_vel<F: FnMut(&mut Position, &mut Velocity)>(&self, mut f: F) {
let mut poses = self.positions.borrow_mut();
let mut vels = self.velocities.borrow_mut();
poses
.iter_mut()
.zip(vels.iter_mut())
.filter_map(|pair| match pair {
(Some(e1), Some(e2)) => Some((e1, e2)),
_ => None,
})
.for_each(|pair| f(pair.0, pair.1))
}
}
fn main() {
let world = World::new();
world.modify_pos_vel(|position, velocity| {
// Some modification goes here, for example:
*position += *velocity;
});
}
If you want to return a new Vec, then you don't need to wrap it in RefMut or RefCell:
Based on your code with filter and map
pub fn get_pos_vel(&self) -> Vec<(Position, Velocity)> {
let mut poses = self.positions.borrow_mut();
let mut vels = self.velocities.borrow_mut();
poses.iter_mut()
.zip(vels.iter_mut())
.filter(|(e1, e2)| e1.is_some() && e2.is_some())
.map(|(e1, e2)| (e1.unwrap(), e2.unwrap()))
.collect()
}
Alternative with filter_map
poses.iter_mut()
.zip(vels.iter_mut())
.filter_map(|pair| match pair {
(Some(e1), Some(e2)) => Some((*e1, *e2)),
_ => None,
})
.collect()
You can wrap it in RefCell with RefCell::new, if you really want to, but I would leave it up to the user of the function to wrap it in whatever they need.
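As a side note, if the method only reads the two vectors, shared borrows are enough. The following is a minimal sketch of that variant, assuming the same World type as in the question; using borrow() instead of borrow_mut() is my substitution, not something the answers above require.

```rust
use std::cell::RefCell;

type Position = i32;
type Velocity = i32;

struct World {
    positions: RefCell<Vec<Option<Position>>>,
    velocities: RefCell<Vec<Option<Velocity>>>,
}

impl World {
    pub fn new() -> World {
        World {
            positions: RefCell::new(vec![Some(1), None, Some(2)]),
            velocities: RefCell::new(vec![None, None, Some(1)]),
        }
    }

    /// Collects the (position, velocity) pairs where both are present.
    /// Only shared borrows are taken because nothing is mutated.
    pub fn get_pos_vel(&self) -> Vec<(Position, Velocity)> {
        let poses = self.positions.borrow();
        let vels = self.velocities.borrow();
        poses
            .iter()
            .zip(vels.iter())
            .filter_map(|(p, v)| match (p, v) {
                (Some(p), Some(v)) => Some((*p, *v)),
                _ => None,
            })
            .collect()
    }
}
```

The returned Vec owns copies of the values, so both borrows end when the method returns and the caller can wrap the result however it likes.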

How to generalise access to struct fields?

I'm trying to find the differences between two streams (represented by iterators) for later analysis. The code below works just fine, but the update_v? functions look a little ugly and error-prone (copy-paste!). Is there a way to generalise this, given that it matters which source a value came from?
#[derive(PartialEq)] // needed for the == comparisons below
struct Data {}
struct S {
v1: Option<Data>,
v2: Option<Data>
}
...
fn update_v1(diffs: &mut HashMap<u64, S>, key: u64, data: Data) {
match diffs.entry(key) {
Entry::Vacant(v) => {
let variant = S {
v1: Some(data),
v2: None
};
v.insert(variant);
},
Entry::Occupied(e) => {
let new_variant = Some(data);
if e.get().v2 == new_variant {
e.remove();
} else {
let existing = e.into_mut();
existing.v1 = new_variant;
}
}
}
}
fn update_v2(diffs: &mut HashMap<u64, S>, key: u64, data: Data) {
match diffs.entry(key) {
Entry::Vacant(v) => {
let variant = S {
v2: Some(data),
v1: None
};
v.insert(variant);
},
Entry::Occupied(e) => {
let new_variant = Some(data);
if e.get().v1 == new_variant {
e.remove();
} else {
let existing = e.into_mut();
existing.v2 = new_variant;
}
}
}
}
Instead of writing one function for each field, receive a pair of Fns as arguments:
fn(&S) -> Option<Data>, which can be used to replace this condition
if e.get().v1 == new_variant { /* ... */ }
with this
if getter(e.get()) == new_variant { /* ... */ }
fn(&mut S, Option<Data>) -> (), which replaces
existing.v2 = new_variant;
with
setter(existing, new_variant);
Then on the call site you pass a couple lambdas like this
Getter: |d| d.v1
Setter: |s, d| s.v2 = d
Or vice-versa for the other function.
And if you want to keep the update_v1 and update_v2 function names, just write those as wrappers to this new generalized function that automatically pass the proper lambdas.
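A minimal sketch of that getter/setter approach could look like the following. The generic update function, its parameter names, and the Data(u32) payload are assumptions made for illustration. The getter returns a reference here (instead of Option<Data> by value) so that Data does not need to be Copy or Clone.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
struct Data(u32); // hypothetical payload

#[derive(Debug, PartialEq)]
struct S {
    v1: Option<Data>,
    v2: Option<Data>,
}

/// `other` reads the field of the opposite stream, `set_own` writes ours.
fn update<G, W>(diffs: &mut HashMap<u64, S>, key: u64, data: Data, other: G, set_own: W)
where
    G: Fn(&S) -> &Option<Data>,
    W: Fn(&mut S, Option<Data>),
{
    let new_variant = Some(data);
    match diffs.entry(key) {
        Entry::Vacant(v) => {
            let mut s = S { v1: None, v2: None };
            set_own(&mut s, new_variant);
            v.insert(s);
        }
        Entry::Occupied(e) => {
            if other(e.get()) == &new_variant {
                // Both streams agree on this key: no longer a difference.
                e.remove();
            } else {
                set_own(e.into_mut(), new_variant);
            }
        }
    }
}

// Thin wrappers that keep the original function names.
fn update_v1(diffs: &mut HashMap<u64, S>, key: u64, data: Data) {
    update(diffs, key, data, |s| &s.v2, |s, d| s.v1 = d);
}

fn update_v2(diffs: &mut HashMap<u64, S>, key: u64, data: Data) {
    update(diffs, key, data, |s| &s.v1, |s, d| s.v2 = d);
}
```

The only thing that differs between the two wrappers is which field is read and which is written, so the copy-paste risk is confined to two one-line closures.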
You can create a trait to facilitate different ways of accessing the structure.
trait SAccessor {
type RV;
fn new(data: Data) -> S;
fn v2(s: &S) -> &Self::RV;
fn v1_mut(s: &mut S) -> &mut Self::RV;
}
struct DirectSAccessor;
impl SAccessor for DirectSAccessor {
type RV = Option<Data>;
fn new(data: Data) -> S {
S {
v1: Some(data),
v2: None
}
}
fn v2(s: &S) -> &Self::RV {
&s.v2
}
fn v1_mut(s: &mut S) -> &mut Self::RV {
&mut s.v1
}
}
fn update<A>(diffs: &mut HashMap<u64, S>, key: u64, data: Data)
where A: SAccessor<RV=Option<Data>>
{
match diffs.entry(key) {
Entry::Vacant(v) => {
let variant = A::new(data);
v.insert(variant);
},
Entry::Occupied(e) => {
let new_variant = Some(data);
if A::v2(e.get()) == &new_variant {
e.remove();
} else {
let existing = e.into_mut();
*A::v1_mut(existing) = new_variant;
}
}
}
}
// ...
// update::<DirectSAccessor>( ... );
Full code

Iterating over the contents of an Option, or over a specific value

Let's say that we have the following C-code (assume that srclen == dstlen and the length is divisible by 64).
void stream(uint8_t *dst, uint8_t *src, size_t dstlen) {
int i;
uint8_t block[64];
while (dstlen > 64) {
some_function_that_initializes_block(block);
for (i=0; i<64; i++) {
dst[i] = ((src != NULL)?src[i]:0) ^ block[i];
}
dst += 64;
dstlen -= 64;
if (src != NULL) { src += 64; }
}
}
That is, a function that takes a source and a destination and XORs the source with some value that
the function computes. When the source is a NULL pointer, dst is just the computed value.
In Rust it is quite simple to do this when src cannot be null; we can do something like:
fn stream(dst: &mut [u8], src: &[u8]) {
let mut block = [0u8, ..64];
for (dstchunk, srcchunk) in dst.chunks_mut(64).zip(src.chunks(64)) {
some_function_that_initializes_block(block);
for (d, (&s, &b)) in dstchunk.iter_mut().zip(srcchunk.iter().zip(block.iter())) {
*d = s ^ b;
}
}
}
However let us assume that we want to be able to mimic the original C-function. Then we would like to do something like:
fn stream(dst: &mut[u8], osrc: Option<&[u8]>) {
let srciter = match osrc {
None => repeat(0),
Some(src) => src.iter()
};
// the rest of the code as above
}
Alas, this won't work, since repeat(0) and src.iter() have different types. It also doesn't seem possible to solve this with a trait object, since the compiler reports: cannot convert to a trait object because trait 'core::iter::Iterator' is not object safe. (There is also no function in the standard library that chunks an iterator.)
Is there any nice way to solve this, or should I just duplicate the code in each arm of the match statement?
Instead of repeating the code in each arm, you can call a generic inner function:
fn stream(dst: &mut[u8], osrc: Option<&[u8]>) {
fn inner<T>(dst: &mut[u8], srciter: T) where T: Iterator<u8> {
let mut block = [0u8, ..64];
//...
}
match osrc {
None => inner(dst, repeat(0)),
Some(src) => inner(dst, src.iter().map(|a| *a))
}
}
Note the additional map to make both iterators compatible (Iterator<u8>).
As you mentioned, Iterator doesn't have a built-in way to do chunking. Let's incorporate Vladimir's solution and use an iterator over chunks:
fn stream(dst: &mut[u8], osrc: Option<&[u8]>) {
const CHUNK_SIZE: uint = 64;
fn inner<'a, T>(dst: &mut[u8], srciter: T) where T: Iterator<&'a [u8]> {
let mut block = [0u8, ..CHUNK_SIZE];
for (dstchunk, srcchunk) in dst.chunks_mut(CHUNK_SIZE).zip(srciter) {
some_function_that_initializes_block(block);
for (d, (&s, &b)) in dstchunk.iter_mut().zip(srcchunk.iter().zip(block.iter())) {
*d = s ^ b;
}
}
}
static ZEROES: &'static [u8] = &[0u8, ..CHUNK_SIZE];
match osrc {
None => inner(dst, repeat(ZEROES)),
Some(src) => inner(dst, src.chunks(CHUNK_SIZE))
}
}
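The snippets above use pre-1.0 syntax ([0u8, ..64], Iterator<u8>, uint). For readers on a current toolchain, here is a sketch of the same inner-function idea in present-day Rust. init_block is a hypothetical stand-in for some_function_that_initializes_block and simply fills the block with 0..64.

```rust
/// Hypothetical stand-in for the question's block initializer.
fn init_block(block: &mut [u8]) {
    for (i, b) in block.iter_mut().enumerate() {
        *b = i as u8;
    }
}

fn stream(dst: &mut [u8], src: Option<&[u8]>) {
    const CHUNK: usize = 64;
    static ZEROES: [u8; CHUNK] = [0; CHUNK];

    // Generic over any iterator that yields source chunks.
    fn inner<'a, I>(dst: &mut [u8], src_chunks: I)
    where
        I: Iterator<Item = &'a [u8]>,
    {
        let mut block = [0u8; CHUNK];
        for (d_chunk, s_chunk) in dst.chunks_mut(CHUNK).zip(src_chunks) {
            init_block(&mut block);
            for (d, (s, b)) in d_chunk.iter_mut().zip(s_chunk.iter().zip(block.iter())) {
                *d = s ^ b;
            }
        }
    }

    match src {
        // No source: XOR against an endless supply of zero chunks.
        None => inner(dst, std::iter::repeat(&ZEROES[..])),
        Some(src) => inner(dst, src.chunks(CHUNK)),
    }
}
```

Each match arm instantiates inner with a different iterator type, so the two arms no longer need a common type.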
Unfortunately, it is impossible to use different iterators directly or with trait objects (which have recently been changed to disallow instantiation of trait objects with inappropriate methods i.e. ones which use Self type in their signature). There is a workaround for your particular case, however. Just use enums:
fn stream(dst: &mut [u8], src: Option<&[u8]>) {
static EMPTY: &'static [u8] = &[0u8, ..64];
enum DifferentIterators<'a> {
FromSlice(std::slice::Chunks<'a, u8>),
FromRepeat(std::iter::Repeat<&'a [u8]>)
}
impl<'a> Iterator<&'a [u8]> for DifferentIterators<'a> {
#[inline]
fn next(&mut self) -> Option<&'a [u8]> {
match *self {
FromSlice(ref mut i) => i.next(),
FromRepeat(ref mut i) => i.next()
}
}
}
let srciter = match src {
None => FromRepeat(repeat(EMPTY)),
Some(src) => FromSlice(src.chunks(64))
};
let mut block = [0u8, ..64];
for (dstchunk, srcchunk) in dst.chunks_mut(64).zip(srciter) {
some_function_that_initializes_block(block);
for (d, (&s, &b)) in dstchunk.iter_mut().zip(srcchunk.iter().zip(block.iter())) {
*d = s ^ b;
}
}
}
This is a lot of code, unfortunately, but in return it is safer and less error-prone than the C version. It is also possible to optimize it so that repeat() is not required at all:
fn stream(dst: &mut [u8], src: Option<&[u8]>) {
static EMPTY: &'static [u8] = &[0u8, ..64];
enum DifferentIterators<'a> {
FromSlice(std::slice::Chunks<'a, u8>),
AlwaysZeros
}
impl<'a> Iterator<&'a [u8]> for DifferentIterators<'a> {
#[inline]
fn next(&mut self) -> Option<&'a [u8]> {
match *self {
FromSlice(ref mut i) => i.next(),
AlwaysZeros => Some(EMPTY),
}
}
}
let srciter = match src {
None => AlwaysZeros,
Some(src) => FromSlice(src.chunks(64))
};
let mut block = [0u8, ..64];
for (dstchunk, srcchunk) in dst.chunks_mut(64).zip(srciter) {
some_function_that_initializes_block(block);
for (d, (&s, &b)) in dstchunk.iter_mut().zip(srcchunk.iter().zip(block.iter())) {
*d = s ^ b;
}
}
}
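For comparison, the enum approach translates to current Rust roughly as follows. This is a sketch under the same assumptions as the answer (input length divisible by 64); SrcChunks and init_block are hypothetical names, with init_block standing in for some_function_that_initializes_block.

```rust
/// Hypothetical stand-in for the question's block initializer.
fn init_block(block: &mut [u8]) {
    for (i, b) in block.iter_mut().enumerate() {
        *b = i as u8;
    }
}

static ZEROES: [u8; 64] = [0; 64];

/// One concrete type that covers both kinds of source.
enum SrcChunks<'a> {
    FromSlice(std::slice::Chunks<'a, u8>),
    AlwaysZeros,
}

impl<'a> Iterator for SrcChunks<'a> {
    type Item = &'a [u8];

    fn next(&mut self) -> Option<&'a [u8]> {
        match self {
            SrcChunks::FromSlice(chunks) => chunks.next(),
            // Never runs out; the zip with dst decides when to stop.
            SrcChunks::AlwaysZeros => Some(&ZEROES[..]),
        }
    }
}

fn stream(dst: &mut [u8], src: Option<&[u8]>) {
    let src_chunks = match src {
        None => SrcChunks::AlwaysZeros,
        Some(s) => SrcChunks::FromSlice(s.chunks(64)),
    };
    let mut block = [0u8; 64];
    for (d_chunk, s_chunk) in dst.chunks_mut(64).zip(src_chunks) {
        init_block(&mut block);
        for (d, (s, b)) in d_chunk.iter_mut().zip(s_chunk.iter().zip(block.iter())) {
            *d = s ^ b;
        }
    }
}
```

Because both match arms produce the same SrcChunks type, the loop body is written once and dispatch happens inside next().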
