I think this is a fairly common question. I've read some solutions, but my situation is a bit different...
It complains at line 8. However, I can't change the signature of new_b or anything below the comment line; they all come from external packages.
So how can I design the function new_a to work around this?
fn main() {
new_a();
}
fn new_a() -> A<'static> {
let b = B {};
A { c: new_b(&b) }
}
pub struct A<'a> {
c: C<'a>,
}
// Below are external packages
fn new_b<'a>(b: &'a B) -> C<'a> {
C { b: &b }
}
pub struct B {}
pub struct C<'a> {
b: &'a B,
}
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=160e63f1300506472b7c1b0811b37453
I am struggling with Rust's Rc/RefCell. Essentially, I would like to mutate the state of T inside the struct which owns this data (Rc<RefCell<T>>). However, I also want to share this data immutably (as a plain Rc would) with an external struct.
How do I achieve that?
pub struct A;
pub struct B {
    a: Rc<RefCell<A>>,
}
pub struct C {
    a: Rc<A>,
}
Basically, I am trying to store a reference to A in C, but as you know, in Rust that requires an 'a lifetime annotation, which I cannot make work in general.
I also do not want to give C the ability to mutate A.
Here is roughly equivalent code in C++:
struct A;
struct B
{
    std::unique_ptr<A> a;
};
struct C
{
    const A & a;
};
It's self-contradictory to say that C::a is immutable while B::a is not, since C can be used to observe the mutations. I'm going to assume that you want C to provide read-only access to A. In that case, you must still specify RefCell in the type of C::a for storage purposes, but then you don't expose the mutation part of the RefCell. Just avoid having any public methods of C that expose the RefCell or the ability to get a RefMut from it, and you're done.
use std::rc::Rc;
use std::cell::{RefCell, Ref};
pub struct A;
pub struct B {
a: Rc<RefCell<A>>,
}
pub struct C {
a: Rc<RefCell<A>>,
}
impl B {
// These are just simple examples; create whatever methods make sense for your situation
pub fn replace(&self, a: A) {
*self.a.borrow_mut() = a;
}
pub fn read(&self) -> Ref<'_, A> {
self.a.borrow()
}
pub fn as_read_only(&self) -> C {
C { a: self.a.clone() }
}
}
impl C {
// There are no mutation methods here so A cannot be mutated through C
pub fn read(&self) -> Ref<'_, A> {
self.a.borrow()
}
}
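As a rough usage sketch of the pattern above: a change made through B is visible through C, while C itself offers no way to mutate. The u32 payload in A, the B::new constructor, and the observe_through_c function are assumptions added only so the effect can be observed; they are not part of the original types.

```rust
use std::cell::{Ref, RefCell};
use std::rc::Rc;

// Hypothetical payload so that a change is observable.
pub struct A(pub u32);

pub struct B {
    a: Rc<RefCell<A>>,
}

pub struct C {
    a: Rc<RefCell<A>>,
}

impl B {
    // Hypothetical constructor, not in the original answer.
    pub fn new(a: A) -> Self {
        B { a: Rc::new(RefCell::new(a)) }
    }

    pub fn replace(&self, a: A) {
        *self.a.borrow_mut() = a;
    }

    pub fn as_read_only(&self) -> C {
        C { a: Rc::clone(&self.a) }
    }
}

impl C {
    // Read-only access: no method on C hands out a RefMut.
    pub fn read(&self) -> Ref<'_, A> {
        self.a.borrow()
    }
}

// Returns the value seen through C before and after B replaces it.
fn observe_through_c() -> (u32, u32) {
    let b = B::new(A(1));
    let c = b.as_read_only();
    let before = c.read().0; // the temporary Ref is released at the end of this statement
    b.replace(A(2));
    let after = c.read().0;
    (before, after)
}
```

One caveat: if a Ref obtained from C is still alive when B calls replace, borrow_mut will panic at run time, so readers should keep their borrows short.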
I have the structure B that implements the trait Trait with the method do_something. I need to perform some additional actions when struct B is dropped if this function has not been called. Specifically, if do_something was never called, Vec<A> should be filled with A::None:
enum A {
V1,
V2,
None,
}
struct B {
data: Option<(A, Vec<A>)>,
}
trait Trait {
fn do_something(self) -> Vec<A>;
}
impl Trait for B {
fn do_something(mut self) -> Vec<A> {
let (a, mut vec) = self.data.take().unwrap();
vec.push(a);
vec
}
}
impl Drop for B {
fn drop(&mut self) {
match self.data.take() {
Some((a, mut vec)) => vec.push(A::None),
_ => {}
}
}
}
This has some logically unnecessary match checks. I want to avoid them and came up with the following solution:
struct B {
data: (A, Vec<A>),
}
trait Trait {
fn do_something(self) -> Vec<A>;
}
impl Trait for B {
fn do_something(mut self) -> Vec<A> {
let (a, mut vec) = std::mem::replace(&mut self.data, unsafe {
std::mem::MaybeUninit::<(A, Vec<A>)>::uninit().assume_init()
});
std::mem::forget(self);
vec.push(a);
vec
}
}
impl Drop for B {
fn drop(&mut self) {
self.data.1.push(A::None)
}
}
Is my unsafe solution correct? Does it contain undefined behavior?
Is it possible to avoid using either unsafe or wrapping B.data in Option to achieve the above behavior?
This not only moves the values out of the struct, it also writes uninitialized memory into the struct. The compiler warns that this will cause undefined behavior:
warning: the type `(A, std::vec::Vec<A>)` does not permit being left uninitialized
--> src/main.rs:21:22
|
21 | unsafe { std::mem::MaybeUninit::<(A, Vec<A>)>::uninit().assume_init() },
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| |
| this code causes undefined behavior when executed
| help: use `MaybeUninit<T>` instead, and only call `assume_init` after initialization is done
|
= note: `#[warn(invalid_value)]` on by default
note: std::ptr::Unique<A> must be non-null (in this struct field)
A better solution is to use mem::transmute():
#[repr(transparent)]
struct B {
data: (A, Vec<A>),
}
impl Trait for B {
fn do_something(self) -> Vec<A> {
// Safety: Since `B` is transparent, `B` and (A, Vec<A>) have the same size and layout.
let (a, mut vec): (A, Vec<A>) = unsafe { std::mem::transmute(self) };
vec.push(a);
vec
}
}
Note that this prevents Drop from being called. If the Drop implementation frees memory or other resources, you have to do this manually.
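On the question of avoiding both unsafe and Option, one possibility is to swap a cheap placeholder into the field and then skip the destructor with mem::forget. This is only a sketch and not part of the answer above; it assumes that a placeholder such as (A::None, Vec::new()) is acceptable, which works here because Vec::new() does not allocate, so forgetting it leaks nothing. The method is written as an inherent method instead of a trait method to keep the example short.

```rust
#[derive(Debug, PartialEq)]
enum A {
    V1,
    V2,
    None,
}

struct B {
    data: (A, Vec<A>),
}

impl B {
    fn do_something(mut self) -> Vec<A> {
        // Move the real data out and leave a placeholder behind.
        // Vec::new() performs no heap allocation.
        let (a, mut vec) = std::mem::replace(&mut self.data, (A::None, Vec::new()));
        // Skip B's Drop impl; the placeholder owns no heap memory, so nothing leaks.
        std::mem::forget(self);
        vec.push(a);
        vec
    }
}

impl Drop for B {
    fn drop(&mut self) {
        // Runs only when do_something was never called.
        self.data.1.push(A::None);
    }
}
```

Compared with the transmute version, this keeps the code free of unsafe at the cost of writing one placeholder tuple into the struct.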
My program is structured as a series of function calls building up the resulting value - each function returns (moves) the returned value to its caller. This is a simplified version:
struct Value {}
struct ValueBuilder {}
impl ValueBuilder {
pub fn do_things_with_value(&mut self, v : &Value) {
// expensive computations
}
pub fn make_value(&self) -> Value {
Value {}
}
pub fn f(&mut self) -> Value {
let v = self.make_value();
self.do_things_with_value(&v);
v
}
pub fn g(&mut self) -> Value {
let v = self.f();
self.do_things_with_value(&v);
v
}
}
Imagine that there are many more functions similar to f and g, both between them and above. You can see that do_things_with_value is called twice with the same value. I would like to cache/memoize this call so that in the example below "expensive computations" are performed only once. This is my (obviously incorrect) attempt:
#[derive(PartialEq)]
struct Value {}
struct ValueBuilder<'a> {
seen_values: Vec<&'a Value>,
}
impl<'a> ValueBuilder<'a> {
pub fn do_things_with_value(&mut self, v: &'a Value) {
if self.seen_values.iter().any(|x| **x == *v) {
return;
}
self.seen_values.push(v)
// expensive computations
}
pub fn make_value(&self) -> Value {
Value {}
}
pub fn f(&mut self) -> Value {
let v = self.make_value();
self.do_things_with_value(&v); // error: `v` does not live long enough
v
}
pub fn g(&mut self) -> Value {
let v = self.f();
self.do_things_with_value(&v);
v
}
}
I understand why the compiler is doing it - while in this case it happens that v is not dropped between two calls to do_things_with_value, there is no guarantee that it will not be dropped, and dereferencing it would crash the program.
What is a better way to structure this program? Let's assume that:
cloning and storing Values is expensive, and we can't afford seen_values keeping a copy of everything we've ever seen
we also can't refactor the code / Value object to carry additional data (i.e. a bool indicating whether we did expensive computations with this value). It needs to rely on comparing the values using PartialEq
If you need to keep the same value at different points in the program, it's easiest to copy or clone it.
However, if cloning is not an option because it is too expensive, wrap the values in an Rc. That is a reference-counted smart pointer which allows shared ownership of its content. Cloning the Rc is relatively cheap and does not duplicate the contained value.
Note that simply storing Rc<Value> in seen_values will keep all values alive at least as long as the value builder lives. You can avoid that by storing Weak references.
use std::rc::{Rc, Weak};
#[derive(PartialEq)]
struct Value {}
struct ValueBuilder {
seen_values: Vec<Weak<Value>>,
}
impl ValueBuilder {
pub fn do_things_with_value(&mut self, v: &Rc<Value>) {
if self
.seen_values
.iter()
.any(|x| x.upgrade().as_ref() == Some(v))
{
return;
}
self.seen_values.push(Rc::downgrade(v))
// expensive computations
}
pub fn make_value(&self) -> Rc<Value> {
Rc::new(Value {})
}
pub fn f(&mut self) -> Rc<Value> {
let v = self.make_value();
self.do_things_with_value(&v);
v
}
pub fn g(&mut self) -> Rc<Value> {
let v = self.f();
self.do_things_with_value(&v);
v
}
}
While an Rc<Value> is in use by the chain of functions, do_things_with_value() will remember the value and skip the computations. If a value becomes unused (all references dropped) and is later created again, do_things_with_value() will repeat the computations.
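The behaviour described here can be checked with a small variation of the code. In this sketch, the u32 payload in Value, the expensive_runs counter, and the retain call that prunes dead Weak entries are all additions for illustration; the counter stands in for the expensive computations.

```rust
use std::rc::{Rc, Weak};

#[derive(PartialEq)]
struct Value(u32); // hypothetical payload so that values can differ

struct ValueBuilder {
    seen_values: Vec<Weak<Value>>,
    expensive_runs: usize, // hypothetical counter standing in for the real work
}

impl ValueBuilder {
    fn new() -> Self {
        ValueBuilder { seen_values: Vec::new(), expensive_runs: 0 }
    }

    fn do_things_with_value(&mut self, v: &Rc<Value>) {
        // Forget entries whose value has already been dropped, so the list
        // does not grow without bound.
        self.seen_values.retain(|w| w.strong_count() > 0);

        // Compare by value (PartialEq), not by pointer.
        if self
            .seen_values
            .iter()
            .any(|w| w.upgrade().as_ref() == Some(v))
        {
            return;
        }
        self.seen_values.push(Rc::downgrade(v));
        self.expensive_runs += 1; // expensive computations would go here
    }
}
```

Calling do_things_with_value twice with the same live Rc<Value> increments the counter once; dropping every Rc to the value and creating an equal one later increments it again.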
I have a struct Foo:
struct Foo {
v: String,
// Other data not important for the question
}
I want to handle a data stream and save the result into Vec<Foo> and also create an index for this Vec<Foo> on the field Foo::v.
I want to use a HashMap<&str, usize> for the index, where the keys will be &Foo::v and the value is the position in the Vec<Foo>, but I'm open to other suggestions.
I want to do the data stream handling as fast as possible, which requires not doing obvious things twice.
For example, I want to:
allocate a String only once per item read from the data stream
not search the index twice (once to check that the key does not exist, and once to insert the new key)
not increase the run time by using Rc or RefCell.
The borrow checker does not allow this code:
let mut l = Vec::<Foo>::new();
{
let mut hash = HashMap::<&str, usize>::new();
//here is loop in real code, like:
//let mut s: String;
//while get_s(&mut s) {
let s = "aaa".to_string();
let idx: usize = match hash.entry(&s) { //a
Occupied(ent) => {
*ent.get()
}
Vacant(ent) => {
l.push(Foo { v: s }); //b
ent.insert(l.len() - 1);
l.len() - 1
}
};
// do something with idx
}
There are multiple problems:
hash.entry borrows the key so s must have a "bigger" lifetime than hash
I want to move s at line (b), while I have a read-only reference at line (a)
So how should I implement this simple algorithm without an extra call to String::clone or calling HashMap::get after calling HashMap::insert?
In general, what you are trying to accomplish is unsafe and Rust is correctly preventing you from doing something you shouldn't. For a simple example why, consider a Vec<u8>. If the vector has one item and a capacity of one, adding another value to the vector will cause a re-allocation and copying of all the values in the vector, invalidating any references into the vector. This would cause all of your keys in your index to point to arbitrary memory addresses, thus leading to unsafe behavior. The compiler prevents that.
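The reallocation hazard can be seen in a small sketch. The commented-out lines are what the borrow checker rejects; keeping an index instead of a reference is one safe alternative. The function name first_after_push is made up for illustration.

```rust
fn first_after_push() -> u8 {
    // Small initial capacity, so a later push may have to reallocate.
    let mut v: Vec<u8> = Vec::with_capacity(1);
    v.push(1);

    // Rejected by the compiler (error E0502): `first` would point into the
    // old buffer if `push` reallocated.
    // let first = &v[0];
    // v.push(2);
    // println!("{}", first);

    // An index carries no pointer, so it stays valid across reallocation.
    let first_idx = 0;
    v.push(2);
    v[first_idx]
}
```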
In this case, there are two extra pieces of information that the compiler is unaware of but the programmer isn't:
There's an extra indirection — String is heap-allocated, so moving the pointer to that heap allocation isn't really a problem.
The String will never be changed. If it were, then it might reallocate, invalidating the referred-to address. Using a Box<str> instead of a String would be a way to enforce this via the type system.
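The first point can be checked without any unsafe code: moving a String, or reallocating the Vec that holds it, moves only the (pointer, length, capacity) header, not the heap buffer with the characters. A minimal sketch, with a made-up function name:

```rust
fn heap_buffer_survives_moves() -> bool {
    let name = String::from("alice");
    // Address of the character data on the heap.
    let before: *const u8 = name.as_ptr();

    let mut players: Vec<String> = Vec::new();
    players.push(name); // moves the String header into the vector

    // Push enough elements to force the vector itself to reallocate.
    for i in 0..100 {
        players.push(format!("player{}", i));
    }

    // The character data is still at the same address.
    before == players[0].as_ptr()
}
```

This only holds as long as the String itself is never modified, which is the second point in the list.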
In cases like this, it is OK to use unsafe code, so long as you properly document why it's not unsafe.
use std::collections::HashMap;
#[derive(Debug)]
struct Player {
name: String,
}
fn main() {
let names = ["alice", "bob", "clarice", "danny", "eustice", "frank"];
let mut players = Vec::new();
let mut index = HashMap::new();
for &name in &names {
let player = Player { name: name.into() };
let idx = players.len();
// I copied this code from Stack Overflow without reading the prose
// that describes why this unsafe block is actually safe
let stable_name: &str = unsafe { &*(player.name.as_str() as *const str) };
players.push(player);
index.insert(stable_name, idx);
}
for (k, v) in &index {
println!("{:?} -> {:?}", k, v);
}
for v in &players {
println!("{:?}", v);
}
}
However, my guess is that you don't want this code in your main method but want to return it from some function. That will be a problem, as you will quickly run into Why can't I store a value and a reference to that value in the same struct?.
Honestly, there are styles of code that don't fit well within Rust's limitations. If you run into these, you could:
decide that Rust isn't a good fit for you or your problem.
use unsafe code, preferably thoroughly tested and only exposing a safe API.
investigate alternate representations.
For example, I'd probably rewrite the code to have the index be the primary owner of the key:
use std::collections::BTreeMap;
#[derive(Debug)]
struct Player<'a> {
name: &'a str,
data: &'a PlayerData,
}
#[derive(Debug)]
struct PlayerData {
hit_points: u8,
}
#[derive(Debug)]
struct Players(BTreeMap<String, PlayerData>);
impl Players {
fn new<I>(iter: I) -> Self
where
I: IntoIterator,
I::Item: Into<String>,
{
let players = iter
.into_iter()
.map(|name| (name.into(), PlayerData { hit_points: 100 }))
.collect();
Players(players)
}
fn get<'a>(&'a self, name: &'a str) -> Option<Player<'a>> {
self.0.get(name).map(|data| Player { name, data })
}
}
fn main() {
let names = ["alice", "bob", "clarice", "danny", "eustice", "frank"];
let players = Players::new(names.iter().copied());
for (k, v) in &players.0 {
println!("{:?} -> {:?}", k, v);
}
println!("{:?}", players.get("eustice"));
}
Alternatively, as shown in What's the idiomatic way to make a lookup table which uses field of the item as the key?, you could wrap your type and store it in a set container instead:
use std::collections::BTreeSet;
#[derive(Debug, PartialEq, Eq)]
struct Player {
name: String,
hit_points: u8,
}
#[derive(Debug, Eq)]
struct PlayerByName(Player);
impl PlayerByName {
fn key(&self) -> &str {
&self.0.name
}
}
impl PartialOrd for PlayerByName {
fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
Some(self.cmp(other))
}
}
impl Ord for PlayerByName {
fn cmp(&self, other: &Self) -> std::cmp::Ordering {
self.key().cmp(&other.key())
}
}
impl PartialEq for PlayerByName {
fn eq(&self, other: &Self) -> bool {
self.key() == other.key()
}
}
impl std::borrow::Borrow<str> for PlayerByName {
fn borrow(&self) -> &str {
self.key()
}
}
#[derive(Debug)]
struct Players(BTreeSet<PlayerByName>);
impl Players {
fn new<I>(iter: I) -> Self
where
I: IntoIterator,
I::Item: Into<String>,
{
let players = iter
.into_iter()
.map(|name| {
PlayerByName(Player {
name: name.into(),
hit_points: 100,
})
})
.collect();
Players(players)
}
fn get(&self, name: &str) -> Option<&Player> {
self.0.get(name).map(|pbn| &pbn.0)
}
}
fn main() {
let names = ["alice", "bob", "clarice", "danny", "eustice", "frank"];
let players = Players::new(names.iter().copied());
for player in &players.0 {
println!("{:?}", player.0);
}
println!("{:?}", players.get("eustice"));
}
not increase the run time by using Rc or RefCell
Guessing about performance characteristics without performing profiling is never a good idea. I honestly don't believe that there'd be a noticeable performance loss from incrementing an integer when a value is cloned or dropped. If the problem required both an index and a vector, then I would reach for some kind of shared ownership.
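To make that cost concrete: cloning an Rc<str> bumps a counter and copies a pointer, while the string bytes stay shared. The sketch below only illustrates that behaviour; it is not a benchmark, and the function name is hypothetical.

```rust
use std::rc::Rc;

// Returns the strong count before and after a clone, and whether both
// handles point at the same allocation.
fn clone_cost_demo() -> (usize, usize, bool) {
    let name: Rc<str> = Rc::from("alice");
    let before = Rc::strong_count(&name);

    // No new string allocation happens here, only a counter increment.
    let key = Rc::clone(&name);
    let after = Rc::strong_count(&name);

    (before, after, Rc::ptr_eq(&name, &key))
}
```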
not increase the run time by using Rc or RefCell.
@Shepmaster already demonstrated how to accomplish this using unsafe; once you have that working, I would encourage you to measure how much Rc would actually cost you. Here is a full version with Rc:
use std::{
collections::{hash_map::Entry, HashMap},
rc::Rc,
};
#[derive(Debug)]
struct Foo {
v: Rc<str>,
}
#[derive(Debug)]
struct Collection {
vec: Vec<Foo>,
index: HashMap<Rc<str>, usize>,
}
impl Foo {
fn new(s: &str) -> Foo {
Foo {
v: s.into(),
}
}
}
impl Collection {
fn new() -> Collection {
Collection {
vec: Vec::new(),
index: HashMap::new(),
}
}
fn insert(&mut self, foo: Foo) {
match self.index.entry(foo.v.clone()) {
Entry::Occupied(o) => panic!(
"Duplicate entry for: {}, {:?} inserted before {:?}",
foo.v,
o.get(),
foo
),
Entry::Vacant(v) => v.insert(self.vec.len()),
};
self.vec.push(foo)
}
}
fn main() {
let mut collection = Collection::new();
for foo in vec![Foo::new("Hello"), Foo::new("World"), Foo::new("Go!")] {
collection.insert(foo)
}
println!("{:?}", collection);
}
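The question also asked for the position of an existing key without a second lookup. A possible variation on the code above, offered as a sketch: get_or_insert, position, and key_at are hypothetical helpers that are not in the answer. The entry call does a single hash lookup, and a hit costs one Rc clone (a counter increment) for the probe key.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::rc::Rc;

struct Foo {
    v: Rc<str>,
}

#[derive(Default)]
struct Collection {
    vec: Vec<Foo>,
    index: HashMap<Rc<str>, usize>,
}

impl Collection {
    // Hypothetical helper: returns the position of `v`, inserting it if new.
    fn get_or_insert(&mut self, v: Rc<str>) -> usize {
        match self.index.entry(Rc::clone(&v)) {
            Entry::Occupied(o) => *o.get(),
            Entry::Vacant(e) => {
                let idx = self.vec.len();
                self.vec.push(Foo { v });
                e.insert(idx);
                idx
            }
        }
    }

    // Lookup by plain &str works because Rc<str> implements Borrow<str>.
    fn position(&self, key: &str) -> Option<usize> {
        self.index.get(key).copied()
    }

    fn key_at(&self, idx: usize) -> Option<&str> {
        self.vec.get(idx).map(|foo| &*foo.v)
    }
}
```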
The error is:
error: `s` does not live long enough
--> <anon>:27:5
|
16 | let idx: usize = match hash.entry(&s) { //a
| - borrow occurs here
...
27 | }
| ^ `s` dropped here while still borrowed
|
= note: values in a scope are dropped in the opposite order they are created
The note: at the end is where the answer is.
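That drop-order rule can be observed in isolation. In the sketch below, Noisy and drop_order are made-up names; each Noisy value records its name in a shared log when it is dropped, and the two locals are declared in the same order as hash and s in the question.

```rust
use std::cell::RefCell;

// Records its name in `log` when dropped.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

fn drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _hash = Noisy { name: "hash", log: &log }; // declared first
        let _s = Noisy { name: "s", log: &log }; // declared second
    } // `_s` is dropped first here, then `_hash`
    log.into_inner()
}
```

The log ends up as ["s", "hash"]: the value declared last is dropped first, which is why a map declared before s cannot hold references into s.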
s must outlive hash because you are using &s as a key in the HashMap. This reference will become invalid when s is dropped. But, as the note says, hash will be dropped after s. A quick fix is to swap the order of their declarations:
let s = "aaa".to_string();
let mut hash = HashMap::<&str, usize>::new();
But now you have another problem:
error[E0505]: cannot move out of `s` because it is borrowed
--> <anon>:22:33
|
17 | let idx: usize = match hash.entry(&s) { //a
| - borrow of `s` occurs here
...
22 | l.push(Foo { v: s }); //b
| ^ move out of `s` occurs here
This one is more obvious. s is borrowed by the Entry, which will live to the end of the block. Cloning s will fix that:
l.push(Foo { v: s.clone() }); //b
I only want to allocate s only once, not cloning it
But the type of Foo.v is String, so it will own its own copy of the str anyway. That type alone means you have to copy s.
You can replace it with a &str instead which will allow it to stay as a reference into s:
use std::collections::hash_map::Entry::{Occupied, Vacant};
use std::collections::HashMap;
struct Foo<'a> {
    v: &'a str,
}
pub fn main() {
    // s now lives longer than l
    let s = "aaa".to_string();
    let mut l = Vec::<Foo>::new();
    {
        let mut hash = HashMap::<&str, usize>::new();
        let idx: usize = match hash.entry(&s) {
            Occupied(ent) => {
                *ent.get()
            }
            Vacant(ent) => {
                l.push(Foo { v: &s });
                ent.insert(l.len() - 1);
                l.len() - 1
            }
        };
    }
}
Note that previously I had to move the declaration of s before hash so that s would outlive it. Now l also holds a reference to s, so s has to be declared even earlier, so that it outlives l as well.
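The same idea extends to the loop mentioned in the question if the input strings are collected up front and kept alive for as long as the index is needed. This is a sketch under that assumption; build_index and the positions vector are hypothetical names added for illustration.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

struct Foo<'a> {
    v: &'a str,
}

// `inputs` owns every string; both `l` and `hash` only borrow from it.
fn build_index<'a>(
    inputs: &'a [String],
) -> (Vec<Foo<'a>>, HashMap<&'a str, usize>, Vec<usize>) {
    let mut l = Vec::new();
    let mut hash = HashMap::new();
    let mut positions = Vec::new();

    for s in inputs {
        // One lookup per item: the entry either yields the stored index
        // or lets us insert a new one.
        let idx = match hash.entry(s.as_str()) {
            Entry::Occupied(ent) => *ent.get(),
            Entry::Vacant(ent) => {
                l.push(Foo { v: s.as_str() });
                *ent.insert(l.len() - 1)
            }
        };
        positions.push(idx);
    }
    (l, hash, positions)
}
```

No string is cloned here; the trade-off is that the owner of the strings has to outlive both the vector and the map.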