Incrementally build data structures which reference the same data - rust

I have an EntityMap object which deals with spatially indexing anything that can have a bounding box. I've implemented it in such a way that it stores references to objects rather than owned values (this may not be the best way to do it, but I think changing it to owned values would only shift my problem).
What I am currently attempting to do is add objects to a vector in such a way that they do not collide with anything that was previously added to the vector. The following is pseudo-rust of what I just described:
let mut final_moves = Vec::new();
let mut move_map = EntityMap::new();
for m in moves.into_iter() {
let close_moves = move_map.find_intersecting(m.bounding_box());
let new_move = m.modify_until_not_intersecting(close_moves);
final_moves.push(new_move);
move_map.insert(final_moves.last().unwrap());
}
final_moves is what is being returned from the function. The issue here is that I'm mutably borrowing final_moves, but I want to store references to its objects inside move_map. The conflict I can't figure out how to resolve is that I want to incrementally build final_moves and move_map at the same time, but this seems to require that they both be mutable, which then stops me from borrowing data from move_map.
How do I restructure my code to accomplish my goals?

You could change the types of move_map and final_moves to contain Rc<Move> instead of just Move, so that both collections share ownership of each move.
use std::rc::Rc;
let mut final_moves = Vec::new();
let mut move_map = EntityMap::new();
for m in moves.into_iter() {
let close_moves = move_map.find_intersecting(m.bounding_box());
let new_move = Rc::new(m.modify_until_not_intersecting(close_moves));
// Rc::clone takes a reference and only bumps the reference count.
final_moves.push(Rc::clone(&new_move));
move_map.insert(new_move);
}
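For a version that compiles on its own, here is a minimal sketch of the same idea. Move and EntityMap below are hypothetical stand-ins (a one-dimensional interval and a linear scan), not the asker's real types, and "modify until not intersecting" is assumed to mean "shift right past whatever overlaps".

```rust
use std::rc::Rc;

// Hypothetical stand-in for the asker's move type: a 1-D interval.
#[derive(Debug, PartialEq)]
struct Move {
    start: i32,
    len: i32,
}

impl Move {
    fn end(&self) -> i32 {
        self.start + self.len
    }

    fn intersects(&self, other: &Move) -> bool {
        self.start < other.end() && other.start < self.end()
    }
}

// Hypothetical stand-in for EntityMap: a linear scan, not a real spatial index.
#[derive(Default)]
struct EntityMap {
    items: Vec<Rc<Move>>,
}

impl EntityMap {
    fn insert(&mut self, m: Rc<Move>) {
        self.items.push(m);
    }

    fn find_intersecting(&self, m: &Move) -> Vec<Rc<Move>> {
        self.items
            .iter()
            .filter(|other| other.intersects(m))
            .cloned()
            .collect()
    }
}

fn place_moves(moves: Vec<Move>) -> Vec<Rc<Move>> {
    let mut final_moves = Vec::new();
    let mut move_map = EntityMap::default();
    for mut m in moves {
        // Shift the move right until nothing already placed overlaps it.
        loop {
            let close_moves = move_map.find_intersecting(&m);
            match close_moves.iter().map(|c| c.end()).max() {
                Some(end) => m.start = end,
                None => break,
            }
        }
        // Both collections hold an Rc to the same allocation.
        let new_move = Rc::new(m);
        final_moves.push(Rc::clone(&new_move));
        move_map.insert(new_move);
    }
    final_moves
}

fn main() {
    let placed = place_moves(vec![
        Move { start: 0, len: 2 },
        Move { start: 1, len: 2 },
        Move { start: 1, len: 1 },
    ]);
    for m in &placed {
        println!("{:?}", m);
    }
}
```

Because neither collection borrows from the other, both can be mutated inside the loop; the cost is one reference count per move.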

Related

Understanding the effects of shared references on a nested data structure

Ownership Tree
Hi,
I was trying to understand ownership concepts in Rust and came across this image (attached in this post) in "Programming Rust" book.
In particular, I am concerned about the "Borrowing a shared reference" part. In the book, the author says
Values borrowed by shared references are read-only. Across the
lifetime of a shared reference, neither its referent, nor anything
reachable from that referent, can be changed by anything. There exist
no live mutable references to anything in that structure, its owner is
held read-only, and so on. It’s really frozen
In the image, he goes on to highlight the path along the ownership tree that becomes immutable once a shared reference is taken to a particular section of the ownership tree. But what confused me is that the author also mentions that certain other parts of the ownership tree are not read only.
So I tried to test out with this code:
fn main(){
let mut v = Vec::new();
v.push(Vec::new());
v[0].push(vec!["alpha".to_string()]);
v[0].push(vec!["beta".to_string(), "gamma".to_string()]);
let r2 = &(v[0][1]); //Taking a shared reference here
v[0][0].push("pi".to_string());
println!("{:?}", r2)
}
I understand that v[0][0] cannot be mutated because v itself is immutably borrowed (as a consequence of the shared reference to v[0][1]), and the Rust compiler helpfully points this out. My question is: when the author marks certain parts of the ownership tree as "not read only", how can we access these parts to change them?
If my code snippet is not a correct example for what the author intended to convey, kindly help me with an example that demonstrates what the author is trying to imply here. Thanks.
There are particular cases where you can split borrows, creating simultaneously existing references that can be any mix of mutable and immutable as long as they don't overlap. These are:
Anything where the compiler can statically track the lack of overlap: that is, fields in a struct, tuple, or enum.
Specifically written unsafe code which provides this feature, such as mutable-reference iterators over collections.
Your code as written does not compile because the compiler does not attempt to understand what indexing a Vec does, so it does not possess and cannot use the fact that v[0][0] does not overlap v[0][1].
Here is a program that works, using a direct translation of the tree shown in the figure:
#[derive(Debug)]
struct Things {
label: &'static str,
a: Option<Box<Things>>,
b: Option<Box<Things>>,
c: Option<Box<Things>>,
}
fn main() {
// Construct depicted structure
let mut root = Box::new(Things {
label: "root",
a: None,
b: None,
c: Some(Box::new(Things {
label: "root.c",
a: None,
b: None,
c: None,
})),
});
// "Borrowing a shared reference"
// .as_ref().unwrap() gets `&Box<Things>` out of `&Option<Box<Things>>`
// (there are several other ways this could be done)
let shared_reference = &root.c.as_ref().unwrap();
let mutable_reference = &mut root.a;
// Now, root and root.a are in the "inaccessible" state because they are
// borrowed. (We could still create an &root.b reference).
// Mutate while the shared reference must still exist
dbg!(shared_reference);
*mutable_reference = Some(Box::new(Things {
label: "new",
a: None,
b: None,
c: None,
}));
dbg!(shared_reference);
// Now the references are not used any more, so we can access the root.
// Let's look at the change we made.
dbg!(root);
}
This program is accepted by the compiler because it understands that struct fields do not overlap, so the root may be split.
It is possible to split borrows of vectors — just not with the indexing operator. You can do it with pattern matching, mutable iteration, or with .split_at_mut(). Here's that last option, which is the most “random access” capable one:
fn main() {
let mut v = Vec::new();
v.push(Vec::new());
v[0].push(vec!["alpha".to_string()]);
v[0].push(vec!["beta".to_string(), "gamma".to_string()]);
let (half1, half2): (&mut [Vec<String>], &mut [Vec<String>]) =
v[0].split_at_mut(1);
let r1 = &mut half1[0];
let r2 = &half2[0];
r1.push("pi".to_string());
println!("{:?}", r2);
}
This program works because split_at_mut() contains unsafe code that specifically creates two non-overlapping slices. This is one of the fundamental tools of Rust: using unsafe inside of libraries to create sound abstractions that wouldn't be possible using just the concepts the compiler understands.
With a pattern match instead, it would be:
if let [r1, r2] = &mut *v[0] {
r1.push("pi".to_string());
println!("{:?}", r2);
} else {
// Pattern failed because the length did not match
panic!("oops, v was not two elements long");
}
This compiles because the compiler understands that pattern-matching a slice (or a struct, or anything else matchable) creates non-overlapping references to each element. (Pattern matching is implemented by the compiler and never runs Rust code to make decisions about the structure being matched.)
(This version has an explicit failure branch; the previous version would panic on the split_at_mut() or on half2[0] if v[0] was too short.)
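The "mutable iteration" option mentioned above is not shown, so here is a minimal sketch of it. The helper name push_to_first is made up for illustration; the point is only that iter_mut() hands out one non-overlapping reference per element.

```rust
// Hypothetical helper: mutate the first row while still holding a
// reference to the second one.
fn push_to_first(rows: &mut [Vec<String>], item: &str) -> Option<usize> {
    let mut it = rows.iter_mut();
    // Each call to next() yields a reference to a different element,
    // so the two references below cannot overlap.
    let first = it.next()?;
    let second: &Vec<String> = it.next()?;
    first.push(item.to_string());
    // `second` is still usable after `first` was mutated.
    Some(second.len())
}

fn main() {
    let mut v = vec![vec![
        vec!["alpha".to_string()],
        vec!["beta".to_string(), "gamma".to_string()],
    ]];
    let second_len = push_to_first(&mut v[0], "pi");
    println!("{:?} {:?}", second_len, v);
}
```

Returning None when there are fewer than two rows plays the same role as the explicit failure branch in the pattern-matching version.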
Someone should probably check my answer, as I am fairly new to Rust myself.
But...
I think this is because the compiler cannot see that two elements of a Vec are disjoint, the way it can for the fields of a tuple or of nested structs.
Here's a tuple version of the example you gave (Although tuples don't support pushing, so I'm just incrementing an integer):
fn main() {
let mut v = (((1, 3), (5)));
let r2 = &v.0.1; //Taking a shared reference here
let v2 = &mut v.0.0;
*v2 += 1;
println!("{:?}", r2);
}
The above compiles. But if you attempt to borrow: let r2 = &v.0.0;, you'll get the same error as before.
Now, if you want to actually use nested vectors for trees, there are some crates to help with that which do not incur runtime costs, namely token_cell (or its inspiration, ghost_cell):
https://docs.rs/token-cell/1.1.0/token_cell/index.html
https://docs.rs/ghost-cell/latest/ghost_cell/
Here's the example with a token_cell wrapping the vec tree structure:
use token_cell::*;
generate_static_token!(Token);
fn main() {
let mut token = Token::new();
let token2 = Token::new();
let v = TokenCell::new(vec![vec![
vec!["beta".to_string()],
vec!["gamma".to_string()],
]]);
let r2 = &v.borrow(&token2)[0][1]; //Taking a shared reference here
v.borrow_mut(&mut token)[0][0].push("pi".to_string());
println!("{:?}", r2)
}
I hope this clears some confusion up at least.

Is there a zero-copy way to find the intersection of an arbitrary number of sets?

Here is a simple example demonstrating what I'm trying to do:
use std::collections::HashSet;
fn main() {
let mut sets: Vec<HashSet<char>> = vec![];
let mut set = HashSet::new();
set.insert('a');
set.insert('b');
set.insert('c');
set.insert('d');
sets.push(set);
let mut set = HashSet::new();
set.insert('a');
set.insert('b');
set.insert('d');
set.insert('e');
sets.push(set);
let mut set = HashSet::new();
set.insert('a');
set.insert('b');
set.insert('f');
set.insert('g');
sets.push(set);
// Simple intersection of two sets
let simple_intersection = sets[0].intersection(&sets[1]);
println!("Intersection of 0 and 1: {:?}", simple_intersection);
let mut iter = sets.iter();
let base = iter.next().unwrap().clone();
let intersection = iter.fold(base, |acc, set| acc.intersection(set).map(|x| x.clone()).collect());
println!("Intersection of all: {:?}", intersection);
}
This solution uses fold to "accumulate" the intersection, using the first element as the initial value.
Intersections are lazy iterators which iterate through references to the involved sets. Since the accumulator has to have the same type as the first element, we have to clone each set's elements. We can't make a set of owned data from references without cloning. I think I understand this.
For example, this doesn't work:
let mut iter = sets.iter();
let mut base = iter.next().unwrap();
let intersection = iter.fold(base, |acc, set| acc.intersection(set).collect());
println!("Intersection of all: {:?}", intersection);
error[E0277]: a value of type `&HashSet<char>` cannot be built from an iterator over elements of type `&char`
--> src/main.rs:41:73
|
41 | let intersection = iter.fold(base, |acc, set| acc.intersection(set).collect());
| ^^^^^^^ value of type `&HashSet<char>` cannot be built from `std::iter::Iterator<Item=&char>`
|
= help: the trait `FromIterator<&char>` is not implemented for `&HashSet<char>`
Even understanding this, I still don't want to clone the data. In theory it shouldn't be necessary, I have the data in the original vector, I should be able to work with references. That would speed up my algorithm a lot. This is a purely academic pursuit, so I am interested in getting it to be as fast as possible.
To do this, I would need to accumulate into a HashSet<&char>, but I can't do that because I can't intersect a HashSet<&char> with a HashSet<char> in the closure. So it seems like I'm stuck. Is there any way to do this?
Alternatively, I could make a set of references for each set in the vector, but that doesn't really seem much better. Would it even work? I might run into the same problem but with double references instead.
Finally, I don't actually need to retain the original data, so I'd be okay moving the elements into the accumulator set. I can't figure out how to make this happen, since I have to go through intersection which gives me references.
Are any of the above proposals possible? Is there some other zero copy solution that I'm not seeing?
Finally, I don't actually need to retain the original data.
This makes it really easy.
First, optionally sort the sets by size. Then:
let (intersection, others) = sets.split_at_mut(1);
let intersection = &mut intersection[0];
for other in others {
intersection.retain(|e| other.contains(e));
}
You can do it in a fully lazy way using filter and all:
sets[0].iter().filter (move |c| sets[1..].iter().all (|s| s.contains (c)))
Finally, I don't actually need to retain the original data, so I'd be okay moving the elements into the accumulator set.
The retain method will work perfectly for your requirements then:
fn intersection(mut sets: Vec<HashSet<char>>) -> HashSet<char> {
if sets.is_empty() {
return HashSet::new();
}
if sets.len() == 1 {
return sets.pop().unwrap();
}
let mut result = sets.pop().unwrap();
result.retain(|item| {
sets.iter().all(|set| set.contains(item))
});
result
}
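If the original sets do need to be kept, the filter/all idea can also be collected into a set of references, which is the HashSet<&char> accumulator the question asked about. This is a sketch only; the function name intersection_refs is made up for illustration.

```rust
use std::collections::HashSet;

// Borrow the elements of the first set that also occur in every other set.
// Nothing is cloned: the result holds references into `sets[0]`.
fn intersection_refs(sets: &[HashSet<char>]) -> HashSet<&char> {
    match sets.split_first() {
        Some((first, rest)) => first
            .iter()
            .filter(|c| rest.iter().all(|s| s.contains(*c)))
            .collect(),
        // No sets at all: treat the intersection as empty.
        None => HashSet::new(),
    }
}

fn main() {
    let sets: Vec<HashSet<char>> = vec![
        "abcd".chars().collect(),
        "abde".chars().collect(),
        "abfg".chars().collect(),
    ];
    let common = intersection_refs(&sets);
    println!("Intersection of all: {:?}", common);
}
```

As with the retain versions, putting the smallest set first keeps the number of lookups down.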

Get mutable reference from immutable array

I don't know if I phrased the title correctly, but here's the issue:
let mut rows: Vec<Box<String>> = vec![];
let row1 = &mut **rows.get(0).unwrap();
I want to store mutable references to multiple strings which are stored in boxes in the vector. This should be perfectly safe since I'm not referencing anything in the vector, just getting a box from the vector, dereferencing it and changing the memory it points to. If the vector gets too large and needs to reallocate its data my strings stay intact. But rust's compiler won't let me do that, I get the error cannot borrow data in a '&' reference as mutable
How do I design around this?
I could make rows mutable and use get_mut, but then I wouldn't be able to, for example, have mutable references to two rows at the same time:
let mut rows: Vec<Box<String>> = vec![];
let row1 = &mut **rows.get_mut(0).unwrap();
let row2 = &mut **rows.get_mut(1).unwrap();
*row1 = String::from("aaa");
*row2 = String::from("bbb");
This gives cannot borrow 'rows' as mutable more than once at a time.
Another solution would be to get each row only when I need to use it and then get it again if I need to use it again, but I don't think that's a very performant idea since I'd have to loop through the array to find the row I want every time I need to change something in it (I wouldn't be getting the array element by index).
EDIT: I am trying to design around storing mutable references to the strings, since Rust's compiler won't let me do that. Either I'm missing something I can do to have multiple mutable references to the strings or I need to find another way to accomplish that. I need to have mutable references to what the boxes contain. In my program they're not strings, they're structs which have mutable functions, but I used strings here for the sake of simplicity. Here's a bit of code to clarify it:
struct Table
{
cols: Vec<Box<Col>>
}
//...
let mut table = Table::new();
let mut id_col = table.new_col("ID" .to_owned());
let mut status_col = table.new_col("Status" .to_owned());
let mut title_col = table.new_col("Title" .to_owned());
let mut deadline_col = table.new_col("Deadline" .to_owned());
let mut tags_col = table.new_col("Tags" .to_owned());
let mut repeat_col = table.new_col("Repeat" .to_owned());
(I used rows in my first example to make it easier to understand)
I will iterate over some data and append stuff to these columns, so I don't want to search for them by name in the vector in each iteration, I want to have "cached" references to them (which are all of these variables). The problem is that my compiler won't let me do that because I can't borrow table as mutable more than once. So what I mean by "designing around" is changing my way of thinking and restructuring my code so I don't have this problem.
I want to store mutable references to multiple strings which are stored in boxes in the vector. This should be perfectly safe since I'm not referencing anything in the vector, just getting a box from the vector, dereferencing it and changing the memory it points to. If the vector gets too large and needs to reallocate its data my strings stay intact.
The compiler has literally no idea about this, and it's not a certainty either: in your scheme nothing prevents getting a second mutable handle on the same box or string and breaking that.
RefCell exists for this sort of situation ("interior mutability" is the concept you want to look for), and it implies a runtime performance hit as it needs to keep track of extant borrows (it's basically a single-threaded RwLock).
I could make rows mutable and use get_mut, but then I wouldn't be able to, for example, have mutable references to two rows at the same time
That's what split_at_mut (and friends) is for.
(I used rows in my first example to make it easier to understand) I will iterate over some data and append stuff to these columns, so I don't want to search for them by name in the vector in each iteration, I want to have "cached" references to them (which are all of these variables). The problem is that my compiler won't let me do that because I can't borrow table as mutable more than once. So what I mean by "designing around" is changing my way of thinking and restructuring my code so I don't have this problem.
Yeah no, that's not just having mutable handles on separate parts of a collection, it's having mutable handles while modifying the parent. You probably need Rc (instead of Box) and a RefCell (for interior mutability).
Or do a second pass of split_first_mut or an unrolled iterator:
let mut t = table.cols.iter_mut();
let id_col = t.next().unwrap();
let status_col = t.next().unwrap();
let title_col = t.next().unwrap();
let deadline_col = t.next().unwrap();
let tags_col = t.next().unwrap();
let repeat_col = t.next().unwrap();
with the issue that you will not be able to modify table until all of these references are dead. And that it's pretty ugly.
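The Rc plus RefCell suggestion could look roughly like the sketch below. Col, its fields, and the body of new_col are assumptions made for illustration, since the question does not show them.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical column type; the real one is not shown in the question.
struct Col {
    name: String,
    cells: Vec<String>,
}

struct Table {
    cols: Vec<Rc<RefCell<Col>>>,
}

impl Table {
    fn new() -> Self {
        Table { cols: Vec::new() }
    }

    // Returns a shared handle, so the caller does not keep `self` borrowed.
    fn new_col(&mut self, name: String) -> Rc<RefCell<Col>> {
        let col = Rc::new(RefCell::new(Col { name, cells: Vec::new() }));
        self.cols.push(Rc::clone(&col));
        col
    }
}

fn build() -> Table {
    let mut table = Table::new();
    // Several "cached" handles can coexist, and the table can still grow.
    let id_col = table.new_col("ID".to_owned());
    let title_col = table.new_col("Title".to_owned());
    for (i, title) in ["first", "second"].iter().enumerate() {
        // borrow_mut() is checked at run time and panics on a conflicting borrow.
        id_col.borrow_mut().cells.push(i.to_string());
        title_col.borrow_mut().cells.push(title.to_string());
    }
    table
}

fn main() {
    let table = build();
    for col in &table.cols {
        let col = col.borrow();
        println!("{}: {:?}", col.name, col.cells);
    }
}
```

Each borrow_mut() here lasts for a single statement, so the run-time borrow check never trips; holding two mutable borrows of the same column at once would panic instead of failing to compile.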

Is there a way to add elements to a container while immutably borrowing earlier elements?

I'm building a GUI and I want to store all used textures in one place, but I have to add new textures while older textures are already immutably borrowed.
let (cat, mouse, dog) = (42, 360, 420); // example values
let mut container = vec![cat, mouse]; // new container
let foo = &container[0]; // now container is immutably borrowed
container.push(dog); // error: mutable borrow
Is there any kind of existing structure that allows something like this,
or can I implement something like this using raw pointers?
The absolute easiest thing is to introduce shared ownership:
use std::rc::Rc;
fn main() {
let (cat, mouse, dog) = (42, 360, 420);
let mut container = vec![Rc::new(cat), Rc::new(mouse)];
let foo = container[0].clone();
container.push(Rc::new(dog));
}
Now container and foo jointly own cat.
Is there any kind of existing structure that allows something like this,
Yes, but there are always tradeoffs. Above, we used Rc to share ownership, which involves a reference counter.
Another potential solution is to use an arena:
extern crate typed_arena;
use typed_arena::Arena;
fn main() {
let container = Arena::new();
let cat = container.alloc(42);
let mouse = container.alloc(360);
let dog = container.alloc(420);
}
This isn't indexable, you cannot take ownership of the value again, and you cannot remove a value.
Being able to remove things from the collection always makes invalidating references dangerous.
can I implement something like this using raw pointers
Almost certainly. Whether you will get it right is always the tricky part.
but I have to add new textures while older textures are already immutably borrowed
Many times, you don't have to do any such thing. For example, you can split up your logic into phases. You have two containers; one that people take references into, and another to collect new values. At the end of the phase, you combine the two collections into one. You have to ensure that no references are used after the end of the phase, of course.
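A minimal sketch of that two-phase idea, using plain integers as in the question. The function name collect_new and the rule for producing new values are invented for illustration.

```rust
// Phase 1: read from `existing` while collecting new values in a
// separate vector, so `existing` is only ever borrowed immutably.
fn collect_new(existing: &[u32]) -> Vec<u32> {
    let mut pending = Vec::new();
    if let Some(first) = existing.first() {
        // `first` stays borrowed for the whole phase.
        for value in existing {
            if value > first {
                pending.push(value + first);
            }
        }
    }
    pending
}

fn main() {
    let (cat, mouse) = (42, 360);
    let mut container = vec![cat, mouse];
    let pending = collect_new(&container);
    // Phase 2: no references into `container` are left, so it can grow.
    container.extend(pending);
    println!("{:?}", container);
}
```

The borrow checker enforces the "no references after the end of the phase" rule here, because the shared borrow ends when collect_new returns.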

Visibility of set outside of infinite loop filling it [duplicate]

I set myself a little task to acquire some basic Rust knowledge. The task was:
Read some key-value pairs from stdin and put them into a hashmap.
This, however, turned out to be a trickier challenge than expected. Mainly due to the understanding of lifetimes. The following code is what I currently have after a few experiments, but the compiler just doesn't stop yelling at me.
use std::io;
use std::collections::HashMap;
fn main() {
let mut input = io::stdin();
let mut lock = input.lock();
let mut lines_iter = lock.lines();
let mut map = HashMap::new();
for line in lines_iter {
let text = line.ok().unwrap();
let kv_pair: Vec<&str> = text.words().take(2).collect();
map.insert(kv_pair[0], kv_pair[1]);
}
println!("{}", map.len());
}
The compiler basically says:
`text` does not live long enough
As far as I understand, this is because the lifetime of 'text' is limited to the scope of the loop.
The key-value pair that I'm extracting within the loop is therefore also bound to the loop's boundaries. Thus, inserting them into the outer map would lead to a dangling pointer since 'text' will be destroyed after each iteration. (Please tell me if I'm wrong)
The big question is: How to solve this issue?
My intuition says:
Make an "owned copy" of the key value pair and "expand" its lifetime to the outer scope .... but I have no idea how to achieve this.
The lifetime of 'text' is limited to the scope of the loop. The key-value pair that I'm extracting within the loop is therefore also bound to the loop's boundaries. Thus, inserting them into the outer map would lead to a dangling pointer since 'text' will be destroyed after each iteration.
Sounds right to me.
Make an "owned copy" of the key value pair.
An owned &str is a String:
map.insert(kv_pair[0].to_string(), kv_pair[1].to_string());
Edit
The original code is below, but I've updated the answer above to be more idiomatic
map.insert(String::from_str(kv_pair[0]), String::from_str(kv_pair[1]));
In Rust 1.1 the function words was marked as deprecated. Now you should use split_whitespace.
Here is an alternative solution which is a bit more functional and idiomatic (works with 1.3).
use std::io::{self, BufRead};
use std::collections::HashMap;
fn main() {
let stdin = io::stdin();
// iterate over all lines, "change" the lines and collect into `HashMap`
let map: HashMap<_, _> = stdin.lock().lines().filter_map(|line_res| {
// convert `Result` to `Option` and map the `Some`-value to a pair of
// `String`s
line_res.ok().map(|line| {
let kv: Vec<_> = line.split_whitespace().take(2).collect();
(kv[0].to_owned(), kv[1].to_owned())
})
}).collect();
println!("{}", map.len());
}
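Note that indexing kv[0] and kv[1] panics on a line with fewer than two words. A variant that skips such lines might look like the sketch below; the function name read_pairs is made up, and it takes any BufRead so it can be tried without stdin.

```rust
use std::collections::HashMap;
use std::io::{self, BufRead};

// Collect "key value" lines into a map of owned Strings, silently
// skipping unreadable lines and lines with fewer than two words.
fn read_pairs<R: BufRead>(reader: R) -> HashMap<String, String> {
    reader
        .lines()
        .filter_map(|line| line.ok())
        .filter_map(|line| {
            let mut words = line.split_whitespace();
            let pair = match (words.next(), words.next()) {
                // to_owned() copies the text out of `line`, so the pair
                // can outlive this closure.
                (Some(key), Some(value)) => Some((key.to_owned(), value.to_owned())),
                _ => None,
            };
            pair
        })
        .collect()
}

fn main() {
    let stdin = io::stdin();
    let map = read_pairs(stdin.lock());
    println!("{}", map.len());
}
```

As in the versions above, a repeated key simply overwrites the earlier value.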

Resources