How to force rust to drop a mutable borrow?

How to force rust to drop a mutable borrow? - rust

How can I force Rust borrow checker to forget about mutable borrow, with unsafe or not? I assumed that even lexical lifetimes should allow reusing mutable reference after a brackets block, but even Non-Lexical-Lifetimes from the fresh Rust release forbids that. With no outcome I tried some variants of unsafe and std::mem::drop.
#[derive(Debug)]
struct Storage {
s: String,
v: Vec<u8>,
}
#[derive(Debug)]
enum Ret<'a> {
S(&'a mut String),
V(&'a mut Storage),
}
fn fake_get_s<'a, 'b:'a>(_unused_storage: &'b mut Storage) -> Option<&'a mut String> {
None
}
fn get_ret<'a> (storage: &'a mut Storage) -> Ret<'a> {
// Borrowed for a limited block, assuming to release..
{
match fake_get_s(storage) {
Some(string_ref) => {
return Ret::S(string_ref);
}
None => (),
}
// ..with no one keeping the borrow..
}
// ..but, surprisingly, I still can't borrow the origin variable
// Ridiculous, right?
Ret::V(storage)
}
fn main() {
let mut storage = Storage{s: "string".to_string(), v: vec![1, 2, 3]};
let res = get_ret(&mut storage);
println!("{:?}", res);
}
make sure this code does not compile

Your code sadly runs into a known false positive in the borrow checker.
At the time of writing, there is no way to fix this issue without unsafe. Luckily, there's a workaround crate that encapsulates the unsafe in a sound manner, called polonius-the-crab.
Here it is, applied to your code:
use polonius_the_crab::{polonius, polonius_return};
#[derive(Debug)]
struct Storage {
s: String,
v: Vec<u8>,
}
#[derive(Debug)]
enum Ret<'a> {
S(&'a mut String),
V(&'a mut Storage),
}
fn fake_get_s<'a, 'b: 'a>(_unused_storage: &'b mut Storage) -> Option<&'a mut String> {
None
}
fn get_ret<'a>(mut storage: &'a mut Storage) -> Ret<'a> {
polonius!(|storage| -> Ret<'polonius> {
match fake_get_s(storage) {
Some(string_ref) => {
polonius_return!(Ret::S(string_ref));
}
None => (),
}
});
Ret::V(storage)
}
fn main() {
let mut storage = Storage {
s: "string".to_string(),
v: vec![1, 2, 3],
};
let res = get_ret(&mut storage);
println!("{:?}", res);
}
Note that storage now has to be a mut variable, because polonius borrows it indefinitely (like before), but then writes it back with the lifetime recovered through unsafe.
Alternatives
One solution I've used myself already is to decouple the decision from the result computation by querying twice:
fn get_ret(storage: &mut Storage) -> Ret<'_> {
if fake_get_s(storage).is_some() {
return Ret::S(fake_get_s(storage).unwrap());
}
Ret::V(storage)
}
Or, if you must overwrite the borrow checker with unsafe:
fn get_ret(storage: &mut Storage) -> Ret<'_> {
{
let storage = unsafe { &mut *(storage as *mut Storage) };
match fake_get_s(storage) {
Some(string_ref) => {
return Ret::S(string_ref);
}
None => (),
}
}
Ret::V(storage)
}

Related

Struct with field holding a mutable reference to another instance of this same struct

I have a struct which has a field (named outer) that I would like to hold a mutable reference to another instance of this same struct. I want to specify to the compiler that this outer field will always live for at least as long as the current instance.
A simplified version of the issue (that does not compile yet):
use std::collections::HashMap;
#[derive(Debug)]
struct Env<'a> {
symbols: HashMap<String, i64>,
outer: Option<&'a mut Env<'a>>
}
impl <'a> Env<'a> {
fn new() -> Self {
Env {
symbols: HashMap::new(),
outer: None,
}
}
fn new_enclosed(outer: &'a mut Env<'a>) -> Self {
Env {
symbols: HashMap::new(),
outer: Some(outer),
}
}
fn resolve(&self, name: &str) -> i64 {
if self.symbols.contains_key(name) {
return *self.symbols.get(name).unwrap();
}
if let Some(outer) = &self.outer {
return outer.resolve(name);
}
return 0;
}
fn insert(&mut self, name: &str, value: i64) {
self.symbols.insert(name.to_owned(), value);
}
fn update(&mut self, name: &str, value: i64) {
if self.symbols.contains_key(name) {
self.symbols.insert(name.to_owned(), value);
} else {
if let Some(outer) = &mut self.outer {
return outer.update(name, value);
}
}
}
}
fn main() {
// outer env
let mut env = Env::new();
env.insert("foo", 0);
env.insert("bar", 0);
one(&mut env);
assert_eq!(env.resolve("foo"), 0); // <-- Cannot borrow as mutable more than once.
assert_eq!(env.resolve("bar"), 2); // <-- Cannot borrow as mutable more than once.
}
fn one<'a>(outer: &'a mut Env<'a>) {
let mut env = Env::new_enclosed(outer);
env.insert("foo", 1);
env.update("bar", 1);
for i in 1..3 {
two(&mut env); // <-- Cannot borrow as mutable more than once.
}
}
fn two<'b>(outer: &'b mut Env<'b>) {
let mut env = Env::new_enclosed(outer);
env.insert("foo", 2);
env.update("bar", 2);
}
Playground link: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=0836f29242e6683a15b57d829657306b
In theory, this sort of structure should be possible without wrapping Env in <Rc<RefCell<...>>, right? So I'm guessing I am messing up the lifetime parameters for the functions one and two in this example, but can't seem to figure it out.
Any pointers (pun intended) would be greatly appreciated.
EDIT 1:
So now I don't think this sort of setup is possible without RefCell because I am passing the mutable reference inside a loop, which is never possible (right?). So there probably isn't an easy answer to this besides a completely different structure altogether, which I would still be very interested to hear other people's thoughts on. But also maybe I should not be afraid to use RefCell at all?
For example, are there any runtime costs associated with RefCell when simply using it to borrow a non-mutable reference?

Borrow checker and shared I/O

I'm hitting a problem in my code where multiple structs need to send data to a shared output sink and the borrow checker doesn't like it.
struct SharedWriter {
count: u32,
}
impl SharedWriter {
pub fn write(&mut self) {
self.count += 1;
}
}
struct Container<'a> {
writer: &'a mut SharedWriter,
}
impl Container<'_> {
pub fn write(&mut self) {
self.writer.write();
}
}
pub fn test() {
let mut writer = SharedWriter { count: 0 };
let mut c0 = Container {
writer: &mut writer,
};
let mut c1 = Container {
// compiler chokes here with:
// cannot borrow `writer` as mutable more than once at a time
writer: &mut writer,
};
c0.write();
c1.write();
}
I understand the problem and why it's happening; you can't borrow something as mutable more than once at a time.
What I don't understand is a good general solution. This pattern happens a lot. You've got a common output sink, like a file or a socket or a database, and you want to feed multiple streams of data to it. It has to be mutable if it maintains any kind of state. It has to be just a single entity if it holds any resources.
You could pass a reference to the sink in every single write() method (write(&mut writer, some_data)), but this clutters the code and will get called (in my particular app) millions of times per second. I'm speculating that there is some extra overhead in passing this parameter over and over.
Is there some syntax that will get past this problem?

Interior mutability.
In your case the easiest way is probably to use RefCell. It will have some runtime overhead, but it is safe.
use std::cell::RefCell;
struct SharedWriter {
count: RefCell<u32>,
}
impl SharedWriter {
pub fn new(count: u32) -> Self {
Self { count: RefCell::new(count) }
}
pub fn write(&self) {
*self.count.borrow_mut() += 1;
}
}
If the data is Copy (like u32, in case this is your real data), you may want to use Cell. It is applicable to less types but zero-cost:
use std::cell::Cell;
struct SharedWriter {
count: Cell<u32>,
}
impl SharedWriter {
pub fn new(count: u32) -> Self {
Self { count: Cell::new(count) }
}
pub fn write(&self) {
self.count.set(self.count.get() + 1);
}
}
There are more interior mutability primitives (for example, UnsafeCell for zero-cost but unsafe access, or mutexes and atomics for thread safe mutation).

One option would be to use a channel. Here is an example of how that might look. This also has the added benefit of allowing you to scale across multiple threads with your io. It takes a handler which it runs in a loop on a new thread. It blocks until a value sent through the sender is received then calls func with a given value and a mutable reference to the handler. The thread exits when all the senders have been dropped. However one downside of this approach is the channel only works in one direction.
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};
pub fn create_shared_io<T, H, F>(mut handler: H, mut func: F) -> (JoinHandle<H>, Sender<T>)
where
T: 'static + Send,
H: 'static + Send,
F: 'static + FnMut(&mut H, T) + Send,
{
let (send, recv) = channel();
let join_handle = thread::spawn(move || loop {
let value = match recv.recv() {
Ok(v) => v,
Err(_) => break handler,
};
func(&mut handler, value);
});
(join_handle, send)
}
And then it can be used similarly to your example. Since no data was passed in your example, it sends () as a placeholder.
pub fn main() {
let writer = SharedWriter { count: 0 };
println!("Starting!");
let (join_handle, sender) = create_shared_io(writer, |writer, _| {
writer.count += 1;
println!("Current count: {}", writer.count);
});
let mut c0 = Container {
writer: sender.clone(),
};
let mut c1 = Container {
writer: sender,
};
c0.write();
c1.write();
// Ensure the senders are dropped before we join the io thread to avoid possible deadlock
// where the compiler attempts to drop these values after the join.
std::mem::drop((c0, c1));
// Writer is returned when the thread is joined
let writer = join_handle.join().unwrap();
println!("Finished!");
}
struct SharedWriter {
count: u32,
}
struct Container {
writer: Sender<()>,
}
impl Container {
pub fn write(&mut self) {
self.writer.send(()).unwrap();
}
}

Rust struct within struct: borrowing, lifetime, generic types and more total confusion

I'm trying to modify an existing application that forces me to learn rust and it's giving me a hard time (reformulating...)
I would like to have a struct with two fields:
pub struct Something<'a> {
pkt_wtr: PacketWriter<&'a mut Vec<u8>>,
buf: Vec<u8>,
}
Where 'buf' will be used as an io for PacketWriter to write its results. So PacketWriter is something like
use std::io::{self};
pub struct PacketWriter<T :io::Write> {
wtr :T,
}
impl <T :io::Write> PacketWriter<T> {
pub fn new(wtr :T) -> Self {
return PacketWriter {
wtr,
};
}
pub fn into_inner(self) -> T {
self.wtr
}
pub fn write(&mut self) {
self.wtr.write_all(&[10,11,12]).unwrap();
println!("wrote packet");
}
}
Then inside 'Something' I want to use PacketWriter this way: let it write what it needs in 'buf' and drain it by pieces.
impl Something<'_> {
pub fn process(&mut self) {
self.pkt_wtr.write();
let c = self.buf.drain(0..1);
}
}
What seems to be impossible is to create a workable constructor for 'Something'
impl Something<'_> {
pub fn new() -> Self {
let mut buf = Vec::new();
let pkt_wtr = PacketWriter::new(&mut buf);
return Something {
pkt_wtr: pkt_wtr,
buf: buf,
};
}
}
What does not seem to be doable is, however I try, to have PacketWriter being constructed on a borrowed reference from 'buf' while 'buf' is also stored in the 'Something' object.
I can give 'buf' fully to 'PacketWriter' (per example below) but I cannot then access the content of 'buf' later. I know that it works in the example underneath, but it's because I can have access to the 'buf' after it is given to the "PacketWriter' (through 'wtr'). In reality, the 'PacketWriter' has that field (wtr) private and in addition it's a code that I cannot modify to, for example, obtain a getter for 'wtr'
Thanks
I wrote a small working program to describe the intent and the problem, with the two options
use std::io::{self};
pub struct PacketWriter<T :io::Write> {
wtr :T,
}
impl <T :io::Write> PacketWriter<T> {
pub fn new(wtr :T) -> Self {
return PacketWriter {
wtr,
};
}
pub fn into_inner(self) -> T {
self.wtr
}
pub fn write(&mut self) {
self.wtr.write_all(&[10,11,12]).unwrap();
println!("wrote packet");
}
}
/*
// that does not work of course because buf is local but this is not the issue
pub struct Something<'a> {
pkt_wtr: PacketWriter<&'a mut Vec<u8>>,
buf: Vec<u8>,
}
impl Something<'_> {
pub fn new() -> Self {
let mut buf = Vec::new();
let pkt_wtr = PacketWriter::new(&mut buf);
//let mut pkt_wtr = PacketWriter::new(buf);
return Something {
pkt_wtr,
buf,
};
}
pub fn process(&mut self) {
self.pkt_wtr.write();
println!("process {:?}", self.buf);
}
}
*/
pub struct Something {
pkt_wtr: PacketWriter<Vec<u8>>,
}
impl Something {
pub fn new() -> Self {
let pkt_wtr = PacketWriter::new(Vec::new());
return Something {
pkt_wtr,
};
}
pub fn process(&mut self) {
self.pkt_wtr.write();
let file = &mut self.pkt_wtr.wtr;
println!("processing Something {:?}", file);
let c = file.drain(0..1);
println!("Drained {:?}", c);
}
}
fn main() -> std::io::Result<()> {
let mut file = Vec::new();
let mut wtr = PacketWriter::new(&mut file);
wtr.write();
println!("Got data {:?}", file);
{
let c = file.drain(0..2);
println!("Drained {:?}", c);
}
println!("Remains {:?}", file);
let mut data = Something::new();
data.process();
Ok(())
}

It's not totally clear what the question is, given that the code appears to compile, but I can take a stab at one part: why can't you use into_inner() on self.wtr inside the process function?
into_inner takes ownership of the PacketWriter that gets passed into its self parameter. (You can tell this because the parameter is spelled self, rather than &self or &mut self.) Taking ownership means that it is consumed: it cannot be used anymore by the caller and the callee is responsible for dropping it (read: running destructors). After taking ownership of the PacketWriter, the into_inner function returns just the wtr field and drops (runs destructors on) the rest. But where does that leave the Something struct? It has a field that needs to contain a PacketWriter, and you just took its PacketWriter away and destroyed it! The function ends, and the value held in the PacketWriter field is unknown: it can't be thing that was in there from the beginning, because that was taken over by into_inner and destroyed. But it also can't be anything else.
Rust generally forbids structs from having uninitialized or undefined fields. You need to have that field defined at all times.
Here's the worked example:
pub fn process(&mut self) {
self.pkt_wtr.write();
// There's a valid PacketWriter in pkt_wtr
let raw_wtr: Vec<u8> = self.pkt_wtr.into_inner();
// The PacketWriter in pkt_wtr was consumed by into_inner!
// We have a raw_wtr of type Vec<u8>, but that's not the right type for pkt_wtr
// We could try to call this function here, but what would it do?
self.pkt_wtr.write();
println!("processing Something");
}
(Note: The example above has slightly squishy logic. Formally, because you don't own self, you can't do anything that would take ownership of any part of it, even if you put everything back neatly when you're done.)
You have a few options to fix this, but with one major caveat: with the public interface you have described, there is no way to get access to the PacketWriter::wtr field and put it back into the same PacketWriter. You'll have to extract the PacketWriter::wtr field and put it into a new PacketWriter.
Here's one way you could do it. Remember, the goal is to have self.packet_wtr defined at all times, so we'll use a function called mem::replace to put a dummy PacketWriter into self.pkt_wtr. This ensures that self.pkt_wtr always has something in it.
pub fn process(&mut self) {
self.pkt_wtr.write();
// Create a new dummy PacketWriter and swap it with self.pkt_wtr
// Returns an owned version of pkt_wtr that we're free to consume
let pkt_wtr_owned = std::mem::replace(&mut self.pkt_wtr, PacketWriter::new(Vec::new()));
// Consume pkt_wtr_owned, returning its wtr field
let raw_wtr = pkt_wtr_owned.into_inner();
// Do anything you want with raw_wtr here -- you own it.
println!("The vec is: {:?}", &raw_wtr);
// Create a new PacketWriter with the old PacketWriter's buffer.
// The dummy PacketWriter is dropped here.
self.pkt_wtr = PacketWriter::new(raw_wtr);
println!("processing Something");
}
Rust Playground
This solution is definitely a hack, and it's potentially a place where the borrow checker could be improved to realize that leaving a field temporarily undefined is fine, as long as it's not accessed before it is assigned again. (Though there may be an edge case I missed; this stuff is hard to reason about in general.) Additionally, this is the kind of thing that can be optimized away by later compiler passes through dead store elimination.
If this turns out to be a hotspot when profiling, there are unsafe techniques that would allow the field to be invalid for that period, but that would probably need a new question.
However, my recommendation would be to find a way to get an "escape hatch" function added to PacketWriter that lets you do exactly what you want to do: get a mutable reference to the inner wtr without taking ownership of PacketWriter.
impl<T: io::Write> PacketWriter<T> {
pub fn inner_mut(&mut self) -> &mut T {
&mut self.wtr
}
}

For clarification, I found a solution using Rc+RefCell or Arc+Mutex. I encapsulated the buffer in a Rc/RefCell and added a Write
pub struct WrappedWriter {
data :Arc<Mutex<Vec<u8>>>,
}
impl WrappedWriter {
pub fn new(data : Arc<Mutex<Vec<u8>>>) -> Self {
return WrappedWriter {
data,
};
}
}
impl Write for WrappedWriter {
fn write(&mut self, buf: &[u8]) -> Result<usize, Error> {
let mut data = self.data.lock().unwrap();
data.write(buf)
}
fn flush(&mut self) -> Result<(), Error> {
Ok(())
}
}
pub struct Something {
wtr: PacketWriter<WrappedWriter>,
data : Arc<Mutex<Vec<u8>>>,
}
impl Something {
pub fn new() -> Result<Self, Error> {
let data :Arc<Mutex<Vec<u8>>> = Arc::new(Mutex::new(Vec::new()));
let wtr = PacketWriter::new(WrappedWriter::new(Arc::clone(&data)));
return Ok(PassthroughDecoder {
wtr,
data,
});
}
pub fn process(&mut self) {
let mut data = self.data.lock().unwrap();
data.clear();
}
}
You can replace Arc by Rc and Mutex by RefCell if you don't have thread-safe issues in which case the reference access becomes
let data = self.data.borrow_mut();

How do I efficiently build a vector and an index of that vector while processing a data stream?

I have a struct Foo:
struct Foo {
v: String,
// Other data not important for the question
}
I want to handle a data stream and save the result into Vec<Foo> and also create an index for this Vec<Foo> on the field Foo::v.
I want to use a HashMap<&str, usize> for the index, where the keys will be &Foo::v and the value is the position in the Vec<Foo>, but I'm open to other suggestions.
I want to do the data stream handling as fast as possible, which requires not doing obvious things twice.
For example, I want to:
allocate a String only once per one data stream reading
not search the index twice, once to check that the key does not exist, once for inserting new key.
not increase the run time by using Rc or RefCell.
The borrow checker does not allow this code:
let mut l = Vec::<Foo>::new();
{
let mut hash = HashMap::<&str, usize>::new();
//here is loop in real code, like:
//let mut s: String;
//while get_s(&mut s) {
let s = "aaa".to_string();
let idx: usize = match hash.entry(&s) { //a
Occupied(ent) => {
*ent.get()
}
Vacant(ent) => {
l.push(Foo { v: s }); //b
ent.insert(l.len() - 1);
l.len() - 1
}
};
// do something with idx
}
There are multiple problems:
hash.entry borrows the key so s must have a "bigger" lifetime than hash
I want to move s at line (b), while I have a read-only reference at line (a)
So how should I implement this simple algorithm without an extra call to String::clone or calling HashMap::get after calling HashMap::insert?

In general, what you are trying to accomplish is unsafe and Rust is correctly preventing you from doing something you shouldn't. For a simple example why, consider a Vec<u8>. If the vector has one item and a capacity of one, adding another value to the vector will cause a re-allocation and copying of all the values in the vector, invalidating any references into the vector. This would cause all of your keys in your index to point to arbitrary memory addresses, thus leading to unsafe behavior. The compiler prevents that.
In this case, there's two extra pieces of information that the compiler is unaware of but the programmer isn't:
There's an extra indirection — String is heap-allocated, so moving the pointer to that heap allocation isn't really a problem.
The String will never be changed. If it were, then it might reallocate, invalidating the referred-to address. Using a Box<[str]> instead of a String would be a way to enforce this via the type system.
In cases like this, it is OK to use unsafe code, so long as you properly document why it's not unsafe.
use std::collections::HashMap;
#[derive(Debug)]
struct Player {
name: String,
}
fn main() {
let names = ["alice", "bob", "clarice", "danny", "eustice", "frank"];
let mut players = Vec::new();
let mut index = HashMap::new();
for &name in &names {
let player = Player { name: name.into() };
let idx = players.len();
// I copied this code from Stack Overflow without reading the prose
// that describes why this unsafe block is actually safe
let stable_name: &str = unsafe { &*(player.name.as_str() as *const str) };
players.push(player);
index.insert(idx, stable_name);
}
for (k, v) in &index {
println!("{:?} -> {:?}", k, v);
}
for v in &players {
println!("{:?}", v);
}
}
However, my guess is that you don't want this code in your main method but want to return it from some function. That will be a problem, as you will quickly run into Why can't I store a value and a reference to that value in the same struct?.
Honestly, there's styles of code that don't fit well within Rust's limitations. If you run into these, you could:
decide that Rust isn't a good fit for you or your problem.
use unsafe code, preferably thoroughly tested and only exposing a safe API.
investigate alternate representations.
For example, I'd probably rewrite the code to have the index be the primary owner of the key:
use std::collections::BTreeMap;
#[derive(Debug)]
struct Player<'a> {
name: &'a str,
data: &'a PlayerData,
}
#[derive(Debug)]
struct PlayerData {
hit_points: u8,
}
#[derive(Debug)]
struct Players(BTreeMap<String, PlayerData>);
impl Players {
fn new<I>(iter: I) -> Self
where
I: IntoIterator,
I::Item: Into<String>,
{
let players = iter
.into_iter()
.map(|name| (name.into(), PlayerData { hit_points: 100 }))
.collect();
Players(players)
}
fn get<'a>(&'a self, name: &'a str) -> Option<Player<'a>> {
self.0.get(name).map(|data| Player { name, data })
}
}
fn main() {
let names = ["alice", "bob", "clarice", "danny", "eustice", "frank"];
let players = Players::new(names.iter().copied());
for (k, v) in &players.0 {
println!("{:?} -> {:?}", k, v);
}
println!("{:?}", players.get("eustice"));
}
Alternatively, as shown in What's the idiomatic way to make a lookup table which uses field of the item as the key?, you could wrap your type and store it in a set container instead:
use std::collections::BTreeSet;
#[derive(Debug, PartialEq, Eq)]
struct Player {
name: String,
hit_points: u8,
}
#[derive(Debug, Eq)]
struct PlayerByName(Player);
impl PlayerByName {
fn key(&self) -> &str {
&self.0.name
}
}
impl PartialOrd for PlayerByName {
fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
Some(self.cmp(other))
}
}
impl Ord for PlayerByName {
fn cmp(&self, other: &Self) -> std::cmp::Ordering {
self.key().cmp(&other.key())
}
}
impl PartialEq for PlayerByName {
fn eq(&self, other: &Self) -> bool {
self.key() == other.key()
}
}
impl std::borrow::Borrow<str> for PlayerByName {
fn borrow(&self) -> &str {
self.key()
}
}
#[derive(Debug)]
struct Players(BTreeSet<PlayerByName>);
impl Players {
fn new<I>(iter: I) -> Self
where
I: IntoIterator,
I::Item: Into<String>,
{
let players = iter
.into_iter()
.map(|name| {
PlayerByName(Player {
name: name.into(),
hit_points: 100,
})
})
.collect();
Players(players)
}
fn get(&self, name: &str) -> Option<&Player> {
self.0.get(name).map(|pbn| &pbn.0)
}
}
fn main() {
let names = ["alice", "bob", "clarice", "danny", "eustice", "frank"];
let players = Players::new(names.iter().copied());
for player in &players.0 {
println!("{:?}", player.0);
}
println!("{:?}", players.get("eustice"));
}
not increase the run time by using Rc or RefCell
Guessing about performance characteristics without performing profiling is never a good idea. I honestly don't believe that there'd be a noticeable performance loss from incrementing an integer when a value is cloned or dropped. If the problem required both an index and a vector, then I would reach for some kind of shared ownership.

not increase the run time by using Rc or RefCell.
#Shepmaster already demonstrated accomplishing this using unsafe, once you have I would encourage you to check how much Rc actually would cost you. Here is a full version with Rc:
use std::{
collections::{hash_map::Entry, HashMap},
rc::Rc,
};
#[derive(Debug)]
struct Foo {
v: Rc<str>,
}
#[derive(Debug)]
struct Collection {
vec: Vec<Foo>,
index: HashMap<Rc<str>, usize>,
}
impl Foo {
fn new(s: &str) -> Foo {
Foo {
v: s.into(),
}
}
}
impl Collection {
fn new() -> Collection {
Collection {
vec: Vec::new(),
index: HashMap::new(),
}
}
fn insert(&mut self, foo: Foo) {
match self.index.entry(foo.v.clone()) {
Entry::Occupied(o) => panic!(
"Duplicate entry for: {}, {:?} inserted before {:?}",
foo.v,
o.get(),
foo
),
Entry::Vacant(v) => v.insert(self.vec.len()),
};
self.vec.push(foo)
}
}
fn main() {
let mut collection = Collection::new();
for foo in vec![Foo::new("Hello"), Foo::new("World"), Foo::new("Go!")] {
collection.insert(foo)
}
println!("{:?}", collection);
}

The error is:
error: `s` does not live long enough
--> <anon>:27:5
|
16 | let idx: usize = match hash.entry(&s) { //a
| - borrow occurs here
...
27 | }
| ^ `s` dropped here while still borrowed
|
= note: values in a scope are dropped in the opposite order they are created
The note: at the end is where the answer is.
s must outlive hash because you are using &s as a key in the HashMap. This reference will become invalid when s is dropped. But, as the note says, hash will be dropped after s. A quick fix is to swap the order of their declarations:
let s = "aaa".to_string();
let mut hash = HashMap::<&str, usize>::new();
But now you have another problem:
error[E0505]: cannot move out of `s` because it is borrowed
--> <anon>:22:33
|
17 | let idx: usize = match hash.entry(&s) { //a
| - borrow of `s` occurs here
...
22 | l.push(Foo { v: s }); //b
| ^ move out of `s` occurs here
This one is more obvious. s is borrowed by the Entry, which will live to the end of the block. Cloning s will fix that:
l.push(Foo { v: s.clone() }); //b
I only want to allocate s only once, not cloning it
But the type of Foo.v is String, so it will own its own copy of the str anyway. Just that type means you have to copy the s.
You can replace it with a &str instead which will allow it to stay as a reference into s:
struct Foo<'a> {
v: &'a str,
}
pub fn main() {
// s now lives longer than l
let s = "aaa".to_string();
let mut l = Vec::<Foo>::new();
{
let mut hash = HashMap::<&str, usize>::new();
let idx: usize = match hash.entry(&s) {
Occupied(ent) => {
*ent.get()
}
Vacant(ent) => {
l.push(Foo { v: &s });
ent.insert(l.len() - 1);
l.len() - 1
}
};
}
}
Note that, previously I had to move the declaration of s to before hash, so that it would outlive it. But now, l holds a reference to s, so it has to be declared even earlier, so that it outlives l.

What are the Rust borrowing rules regarding mutable internal references?

It's not intuitive to me why a program like
#[derive(Debug)]
struct Test {
buf: [u8; 16],
}
impl Test {
fn new() -> Test {
Test {
buf: [0u8; 16],
}
}
fn hi(&mut self) {
self.buf[0] = 'H' as u8;
self.buf[1] = 'i' as u8;
self.buf[2] = '!' as u8;
self.print();
}
fn print(&self) {
println!("{:?}", self);
}
}
fn main() {
Test::new().hi();
}
compiles and runs without any problem, but a program like
#[derive(Debug)]
enum State {
Testing([u8; 16]),
}
#[derive(Debug)]
struct Test {
state: State,
}
impl Test {
fn new() -> Test {
Test {
state: State::Testing([0u8; 16]),
}
}
fn hi(&mut self) {
match self.state {
State::Testing(ref mut buf) => {
buf[0] = 'H' as u8;
buf[1] = 'i' as u8;
buf[2] = '!' as u8;
self.print();
},
}
}
fn print(&self) {
println!("{:?}", self);
}
}
fn main() {
Test::new().hi();
}
errors during compilation with an error of
error[E0502]: cannot borrow *self as immutable because
self.state.0 is also borrowed as mutable
Since both programs do essentially the same thing, the second doesn't seem like it would be somehow more unsafe from a memory perspective. I know there must be something about the borrowing and scoping rules that I must be missing, but have no idea what.

In order to make your hi function work you just need to move print out of the scope of the mutable borrow introduced in its match expression:
fn hi(&mut self) {
match self.state {
State::Testing(ref mut buf) => {
buf[0] = 'H' as u8;
buf[1] = 'i' as u8;
buf[2] = '!' as u8;
},
}
self.print();
}
Your two variants are not equivalent due to the presence of the match block in the second case. I don't know how to directly access the tuple struct in the enum without pattern matching (or if this is even possible right now), but if it was the case, then there would in fact be not much difference and both versions would work.

In the match statement, you borrow self.state. Borrow scopes are lexical, so it is borrowed in the entire match block. When you call self.print(), you need to borrow self. But that is not possible, because part of self is already borrowed. If you move self.print() after the match statement, it will work.
Regarding the lexical borrow scope, you can read more in the second part of Two bugs in the borrow checker every Rust developer should know about. Related issues: #6393, #811.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

How to force rust to drop a mutable borrow? - rust

Related

Struct with field holding a mutable reference to another instance of this same struct

Borrow checker and shared I/O

Rust struct within struct: borrowing, lifetime, generic types and more total confusion

How do I efficiently build a vector and an index of that vector while processing a data stream?

What are the Rust borrowing rules regarding mutable internal references?

Categories

Resources