Refactoring out `clone` when Copy trait is not implemented? - rust

Is there a way to get rid of clone(), given the restrictions I've noted in the comments? I would really like to know if it's possible to use borrowing in this case, where modifying the third-party function signature is not possible.
// We should keep the "data" hidden from the consumer
mod le_library {
    pub struct Foobar {
        data: Vec<i32>, // Something that doesn't implement Copy
    }
    impl Foobar {
        pub fn new() -> Foobar {
            Foobar {
                data: vec![1, 2, 3],
            }
        }
        pub fn foo(&self) -> String {
            let i = third_party(self.data.clone()); // Refactor out clone?
            format!("{}{}", "foo!", i)
        }
    }
    // Can't change the signature, suppose this comes from a crate
    pub fn third_party(data: Vec<i32>) -> i32 {
        data[0]
    }
}
use le_library::Foobar;
fn main() {
    let foobar = Foobar::new();
    let foo = foobar.foo();
    let foo2 = foobar.foo();
    println!("{}", foo);
    println!("{}", foo2);
}

As long as your foo() method accepts &self, it is not possible, because the
pub fn third_party(data: Vec<i32>) -> i32
signature is quite unambiguous: regardless of what this third_party function does, its API states that it needs its own instance of Vec, by value. This precludes borrowing of any form, and because foo() accepts self by reference, you can't really do anything except clone.
Also, this third_party is presumably written without any weird unsafe hacks, so it is quite safe to assume that the Vec passed into it is eventually dropped and deallocated. Therefore, unsafely creating a copy of the original Vec without cloning it (by copying its internal pointers) is out of the question: you would definitely get a use-after-free if you did.
While your question does not state it, the fact that you want to preserve the original value of data is kind of a natural assumption. If this assumption can be relaxed, and you're actually okay with giving the data instance out and e.g. replacing it with an empty vector internally, then there are several things you can potentially do:
Switch foo(&self) to foo(&mut self), then you can quite easily extract data and replace it with an empty vector.
Use Cell or RefCell to store the data. This way, you can continue to use foo(&self), at the cost of some runtime checks when you extract the value out of a cell and replace it with some default value.
Both these approaches, however, will result in you losing the original Vec. With the given third-party API there is no way around that.
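Both options can be sketched as follows. This is only an illustration under stated assumptions: `Foobar` and `third_party` mirror the question, `FoobarCell` and `remaining` are hypothetical names, and the sketch relies on `std::mem::take` and `RefCell::take`, which both leave an empty `Vec` behind.

```rust
use std::cell::RefCell;
use std::mem;

// Stand-in for the third-party function; the signature matches the question.
pub fn third_party(data: Vec<i32>) -> i32 {
    data[0]
}

// Option 1: take `&mut self` and move the vector out.
pub struct Foobar {
    data: Vec<i32>,
}

impl Foobar {
    pub fn new() -> Self {
        Foobar { data: vec![1, 2, 3] }
    }

    pub fn foo(&mut self) -> String {
        // `mem::take` moves the Vec out and leaves an empty Vec in its place.
        let data = mem::take(&mut self.data);
        format!("foo!{}", third_party(data))
    }

    // Hypothetical helper: how many elements are still stored.
    pub fn remaining(&self) -> usize {
        self.data.len()
    }
}

// Option 2 (hypothetical `FoobarCell`): keep `&self` via interior mutability.
pub struct FoobarCell {
    data: RefCell<Vec<i32>>,
}

impl FoobarCell {
    pub fn new() -> Self {
        FoobarCell { data: RefCell::new(vec![1, 2, 3]) }
    }

    pub fn foo(&self) -> String {
        // `RefCell::take` checks at runtime and panics if the cell is borrowed.
        let data = self.data.take();
        format!("foo!{}", third_party(data))
    }

    pub fn remaining(&self) -> usize {
        self.data.borrow().len()
    }
}

fn main() {
    let mut a = Foobar::new();
    println!("{} (left: {})", a.foo(), a.remaining());

    let b = FoobarCell::new();
    println!("{} (left: {})", b.foo(), b.remaining());
}
```

Note that in both variants a second call to `foo` would hand `third_party` an empty vector, and `data[0]` would then panic. That is the "losing the original Vec" cost in practice.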
If you still can somehow influence this external API, then the best solution would be to change it to accept &[i32], which can easily be obtained from Vec<i32> with borrowing.
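For comparison, here is a minimal sketch of what that could look like if the external signature were changed. The slice-taking `third_party` below is hypothetical; it is not the signature from the question.

```rust
// Hypothetical revised signature: borrow a slice instead of taking the Vec.
pub fn third_party(data: &[i32]) -> i32 {
    data[0]
}

pub struct Foobar {
    data: Vec<i32>,
}

impl Foobar {
    pub fn new() -> Self {
        Foobar { data: vec![1, 2, 3] }
    }

    pub fn foo(&self) -> String {
        // `&self.data` coerces from `&Vec<i32>` to `&[i32]`; nothing is copied.
        let i = third_party(&self.data);
        format!("foo!{}", i)
    }
}

fn main() {
    let foobar = Foobar::new();
    // Calling twice is fine because the data is only borrowed.
    println!("{}", foobar.foo());
    println!("{}", foobar.foo());
}
```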

No, you can't get rid of the call to clone here.
The problem here is with the third-party library. As the function third_party is written now, it's true that it could be using an &Vec<i32>; it doesn't require ownership, since it's just moving out a value that's Copy. However, since the implementation is outside of your control, there's nothing preventing the person maintaining the function from changing it to take advantage of owning the Vec. It's possible that whatever it is doing would be easier or require less memory if it were allowed to overwrite the provided memory, and the function writer is leaving the door open to do so in the future. If that's not the case, it might be worth suggesting a change to the third-party function's signature and relying on clone in the meantime.

Related

How Can I Hash By A Raw Pointer?

I want to create a function that provides a two step write and commit, like so:
// Omitting locking for brevity
struct States {
    commited_state: i64,
    // By reference is just a placeholder - I don't know how to do this
    pending_states: HashSet<i64>,
}
impl States {
    fn read_dirty(&self) -> i64 {
        // Sum committed state and all non committed states
        self.commited_state + self.pending_states.iter().sum::<i64>()
    }
    fn read_committed(&self) -> i64 {
        self.commited_state
    }
}
let state_container = States::default();
async fn update_state(mut state_container: States, new_state: i64) -> impl Future<Output = ()> {
    // This is just pseudo code missing locking and such
    // I'd like to add a reference to new_state
    state_container.pending_states.insert(new_state);
    async move {
        // I would like to defer the commit
        // I add the state to the commited state
        state_container.commited_state += new_state;
        // Then remove it *by reference* from the pending states
        state_container.pending_states.remove(&new_state);
    }
}
I'd like to be in a situation where I can call it like so
let commit_handler = update_state(state_container, 3).await;
// Do some external transactional stuff
third_party_transactional_service(...)?
// Commit if the above line does not error
commit_handler.await;
The problem I have is that HashMaps and HashSets hash entries based on their value and not on their address, so I can't remove them by reference.
I appreciate this is a bit of a long question, but I'm just trying to give a bit more context as to what I'm trying to do. I know that in a typical database you'd generally have an atomic counter to generate the transaction ID, but that feels a bit overkill when the pointer reference would be enough.
However, I don't want to get the pointer value using unsafe, because it just seems a bit off to do something relatively simple.
Values in Rust don't have an identity like they do in some other languages. You need to ascribe them an identity somehow. You've hit on two ways to do this in your question: an ID contained within the value, or the address of the value as a pointer.
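To make the problem concrete, here is a small sketch showing that two pending updates with the same value collapse into a single entry of a `HashSet<i64>`. The function name `distinct_pending` is made up for illustration.

```rust
use std::collections::HashSet;

// Returns how many entries the set holds after inserting every update.
pub fn distinct_pending(updates: &[i64]) -> usize {
    let mut pending: HashSet<i64> = HashSet::new();
    for &update in updates {
        // Inserting an equal value a second time does not add a new entry.
        pending.insert(update);
    }
    pending.len()
}

fn main() {
    // Two separate transactions that both add 3 are indistinguishable here.
    println!("{}", distinct_pending(&[3, 3]));
}
```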
Option 1: An ID contained in the value
It's trivial to have a usize ID with a static AtomicUsize (atomics have interior mutability).
use std::sync::atomic::{AtomicUsize, Ordering};
// No impl of clone/copy as we want these IDs to be unique.
#[derive(Debug, Hash, PartialEq, Eq)]
#[repr(transparent)]
pub struct OpaqueIdentifier(usize);
impl OpaqueIdentifier {
    pub fn new() -> Self {
        static COUNTER: AtomicUsize = AtomicUsize::new(0);
        Self(COUNTER.fetch_add(1, Ordering::Relaxed))
    }
    pub fn id(&self) -> usize {
        self.0
    }
}
Now your map key becomes usize, and you're done.
Having this be a separate type that doesn't implement Copy or Clone allows you to have a concept of an "owned unique ID" and then every type with one of these IDs is forced not to be Copy, and a Clone impl would require obtaining a new ID.
(You can use a different integer type than usize. I chose it semi-arbitrarily.)
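As a rough sketch of how such an ID could plug into the two-step write and commit from the question: the `States` layout, `begin`, and `commit` below are assumptions made for illustration, and locking and async are left out entirely.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};

#[derive(Debug, Hash, PartialEq, Eq)]
pub struct OpaqueIdentifier(usize);

impl OpaqueIdentifier {
    pub fn new() -> Self {
        static COUNTER: AtomicUsize = AtomicUsize::new(0);
        Self(COUNTER.fetch_add(1, Ordering::Relaxed))
    }

    pub fn id(&self) -> usize {
        self.0
    }
}

// Hypothetical container: pending updates are keyed by ID, not by value.
#[derive(Default)]
pub struct States {
    committed_state: i64,
    pending_states: HashMap<usize, i64>,
}

impl States {
    // Step 1: record the pending update and hand back its unique ID.
    pub fn begin(&mut self, delta: i64) -> OpaqueIdentifier {
        let id = OpaqueIdentifier::new();
        self.pending_states.insert(id.id(), delta);
        id
    }

    // Step 2: consume the ID, so the same update cannot be committed twice.
    pub fn commit(&mut self, id: OpaqueIdentifier) {
        if let Some(delta) = self.pending_states.remove(&id.id()) {
            self.committed_state += delta;
        }
    }

    pub fn read_dirty(&self) -> i64 {
        self.committed_state + self.pending_states.values().sum::<i64>()
    }

    pub fn read_committed(&self) -> i64 {
        self.committed_state
    }
}

fn main() {
    let mut states = States::default();
    let first = states.begin(3);
    let second = states.begin(3); // same value, different identity
    println!("dirty: {}", states.read_dirty());
    states.commit(first);
    println!("committed: {}", states.read_committed());
    states.commit(second);
    println!("committed: {}", states.read_committed());
}
```

Because `commit` takes the identifier by value and the type is neither `Copy` nor `Clone`, the type system rules out committing the same pending update twice.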
Option 2: A pointer to the value
This is more challenging in Rust since values in Rust are movable by default. In order for this approach to be viable, you have to remove this capability by pinning.
To make this work, both of the following must be true:
You pin the value you're using to provide identity, and
The pinned value is !Unpin (otherwise pinning still allows moves!), which can be forced by adding a PhantomPinned member to the value's type.
Note that the pin contract is only upheld if the object remains pinned for its entire lifetime. To enforce this, your factory for such objects should only dispense pinned boxes.
This could complicate your API as you cannot obtain a mutable reference to a pinned value without unsafe. The pin documentation has examples of how to do this properly.
Assuming that you have done all of this, you can then use *const T as the key in your map (where T is the pinned type). Note that conversion to a pointer is safe -- it's conversion back to a reference that isn't. So you can just use some_pin_box.as_ref().get_ref() as *const _ to obtain the pointer you'll use for lookup.
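A minimal sketch of the pointer-as-key idea, assuming a hypothetical `Pending` type and a hypothetical `key_of` helper. No `unsafe` is needed because the pointer is only hashed and compared, never dereferenced.

```rust
use std::collections::HashSet;
use std::marker::PhantomPinned;
use std::pin::Pin;

// Hypothetical pending-update type; `PhantomPinned` makes it `!Unpin`.
pub struct Pending {
    value: i64,
    _pin: PhantomPinned,
}

impl Pending {
    // The factory only hands out pinned boxes, so the heap value never moves.
    pub fn new(value: i64) -> Pin<Box<Pending>> {
        Box::pin(Pending { value, _pin: PhantomPinned })
    }

    pub fn value(&self) -> i64 {
        self.value
    }
}

// Converting a reference to a raw pointer is safe; only dereferencing is not.
pub fn key_of(p: &Pin<Box<Pending>>) -> *const Pending {
    p.as_ref().get_ref() as *const Pending
}

fn main() {
    let a = Pending::new(3);
    let b = Pending::new(3);

    let mut pending: HashSet<*const Pending> = HashSet::new();
    pending.insert(key_of(&a));
    pending.insert(key_of(&b));
    // Equal values but distinct addresses, so both entries are kept.
    println!("{} pending, sum {}", pending.len(), a.value() + b.value());

    // Moving the `Pin<Box<_>>` handle does not move the heap allocation.
    let moved = a;
    pending.remove(&key_of(&moved));
    println!("{} pending", pending.len());
}
```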
The pinned box approach comes with pretty significant drawbacks:
All values being used to provide identity have to be allocated on the heap (unless using local pinning, which is unlikely to be ergonomic here, even though the std::pin::pin! macro, stable since Rust 1.68, makes it simpler).
The implementation of the type providing identity has to accept self as Pin<&Self> or Pin<&mut Self>, requiring unsafe code to mutate the contents.
In my opinion, it's not even a good semantic fit for the problem. "Location in memory" and "identity" are different things, and it's only kind of by accident that the former can sometimes be used to implement the latter. It's a bit silly that moving a value in memory would change its identity, no?
I'd just go with adding an ID to the value. This is a substantially more obvious pattern, and it has no serious drawbacks.

How to return the contents of an Rc?

I am trying to return a moved value from an Rc:
if let Some(last_elem) = self.tail.take() {
    let last = Rc::clone(&last_elem);
    let tmp_node = last.borrow();
    let tmp = tmp_node.deref();
    return Some(*tmp);
}
Where:
self.tail has type Option<Rc<RefCell<Node<T>>>>;
after borrow the tmp_node has type Ref<Node<T>>; and
I would like to return Option<Node<T>>.
However the compiler complains, "cannot move out of *tmp which is behind a shared reference".
How can I fix this?
In general, it's impossible to move a value out of an Rc, since other Rc handles elsewhere may still point to the same value.
However, if your code's logic can guarantee that this Rc is the sole owner of the underlying data, there's an escape hatch - Rc::try_unwrap, which performs the check at runtime and fails if the condition is not fulfilled. After that, we can easily unwrap the RefCell (not Ref!) with RefCell::into_inner:
pub fn unwrap<T>(last_elem: Rc<RefCell<T>>) -> T {
    let inner: RefCell<T> = Rc::try_unwrap(last_elem)
        .unwrap_or_else(|_| panic!("The last_elem was shared, failed to unwrap"));
    inner.into_inner()
}
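If panicking is not acceptable, the same runtime check can be surfaced as an `Option` instead. The sketch below uses a hypothetical helper name, `take_inner`, and shows both the sole-owner and the shared case. Note that the extra `Rc::clone` in the question's snippet creates a second owner, so `try_unwrap` would fail while that clone is alive.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Returns the inner value only if `shared` is the sole owner.
pub fn take_inner<T>(shared: Rc<RefCell<T>>) -> Option<T> {
    Rc::try_unwrap(shared) // Err(rc) if other strong references exist
        .ok()
        .map(RefCell::into_inner)
}

fn main() {
    let sole = Rc::new(RefCell::new(5));
    println!("{:?}", take_inner(sole));

    let shared = Rc::new(RefCell::new(5));
    let _other_owner = Rc::clone(&shared);
    println!("{:?}", take_inner(shared));
}
```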
Another possible approach, if you want a copy rather than moving the value out of the Rc, is to keep your original approach but use clone instead of deref:
pub fn clone_out<T: Clone>(last_elem: Rc<RefCell<T>>) -> T {
    last_elem.borrow().clone()
}
A side note: it looks like you're trying to implement some kind of linked list. This is notoriously hard to do in Rust, since it plays very badly with single-ownership semantics. But if you're really sure you want to go through all the dirty details, this book is highly recommended.

Rust idiom for "clone if immutable or ref"

I'm trying to make a generic method interface that takes several ref-types and tries to save a clone if possible. I have two issues:
There are a lot of traits about referencing and such, but I'm not sure how to apply them so that a single generic function can tell between a value, a mut value, a ref, and a mut ref.
Is there a best practice for clone if needed? I've played with Cow, From/Into, ToOwned, AsRef, Deref, and more, but I just can't find a sensible way to do it.
Cow has the idea of what I want, but if I expose Cow on input, I can't use Cow::from without implementing it, and Cow::Owned/Borrowed defeats most of the purpose. And I can't figure out how to do it without exposing Cow in the input. From/Into is almost perfect, but I can't get the generics to work. I also tried splitting the functions into separate traits and implementing them, but the compiler can't choose between them.
I have a basic demo below. I have control over the Bar interface as well, if that is needed to get a decent interface, but I'd be surprised if there wasn't a simple way to essentially perfect forward the arguments.
"How does Rust provide move semantics?" is similar, but didn't solve the issue.
struct Foo {
    x: i32
}
impl Foo {
    // Move bar in; no clone, return original
    pub fn add<T: Bar>(&self, mut bar: T) -> T {
        bar.plus(self.x);
        bar
    }
    // bar immutable or reference; clone and return
    pub fn add<T: Bar>(&self, bar: T | &T | &mut T) -> T {
        let mut newbar = bar.clone();
        newbar.plus(self.x);
        newbar
    }
}

Avoiding "cannot move out of borrowed content" without the use of "to_vec"?

I'm learning Rust and have a simple program, shown below. Playground link.
#[derive(Debug)]
pub struct Foo {
    bar: String,
}
pub fn gather_foos<'a>(data: &'a Vec<Vec<&'a Foo>>) -> Vec<Vec<&'a Foo>> {
    let mut ret: Vec<Vec<&Foo>> = Vec::new();
    for i in 0..data.len() {
        if meets_requirements(&data[i]) {
            ret.push(data[i].to_vec());
        }
    }
    return ret
}
fn meets_requirements<'a>(_data: &'a Vec<&'a Foo>) -> bool {
    true
}
fn main() {
    let foo = Foo{
        bar: String::from("bar"),
    };
    let v1 = vec![&foo, &foo, &foo];
    let v2 = vec![&foo, &foo];
    let data = vec![v1, v2];
    println!("{:?}", gather_foos(&data));
}
The program simply loops through an array of arrays of a struct, checks if the array of structs meets some requirement and returns an array of arrays that meets said requirement.
I'm sure there's a more efficient way of doing this without the need to call to_vec(), which I had to implement in order to avoid the error cannot move out of borrowed content, but I'm not sure what that solution is.
I'm learning about Box<T> now and think it might provide a solution to my needs? Thanks for any help!!
The error is showing up because you're trying to move ownership of one of the vectors in the input vector to the output vector, which is not allowed since you've borrowed the input vector immutably. to_vec() creates a copy, which is why it works when you use it.
The solution depends on what you're trying to do. If you don't need the original input (you only want the matched ones), you can simply pass the input by value rather than by reference, which will allow you to consume the vector and move items to the output. Here's an example of this.
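The linked example is not reproduced here, so the following is only a minimal sketch of the by-value variant. The predicate is a made-up stand-in (it keeps vectors with at least three elements) so that the filtering is visible.

```rust
#[derive(Debug)]
pub struct Foo {
    pub bar: String,
}

// Hypothetical predicate standing in for the question's `meets_requirements`.
fn meets_requirements(data: &[&Foo]) -> bool {
    data.len() >= 3
}

// Takes the outer vector by value, so inner vectors can be moved, not copied.
pub fn gather_foos<'a>(data: Vec<Vec<&'a Foo>>) -> Vec<Vec<&'a Foo>> {
    data.into_iter()
        .filter(|v| meets_requirements(v))
        .collect()
}

fn main() {
    let foo = Foo { bar: String::from("bar") };
    let data = vec![vec![&foo, &foo, &foo], vec![&foo, &foo]];
    // `data` is consumed here and cannot be used afterwards.
    println!("{:?}", gather_foos(data));
}
```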
If you do need the original input, but you don't want to copy the vectors with to_vec(), you may want to use references in the output, as demonstrated by this example. Note that the function now returns a vector of references to vectors, rather than a vector of owned vectors.
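Again as a sketch rather than the linked example: the variant below borrows the input and returns references to the matching inner vectors. The name `gather_foo_refs` and the predicate are assumptions made for illustration.

```rust
#[derive(Debug)]
pub struct Foo {
    pub bar: String,
}

// Hypothetical predicate standing in for the question's `meets_requirements`.
fn meets_requirements(data: &[&Foo]) -> bool {
    data.len() >= 3
}

// Returns references into `data`; no inner vector is copied or moved.
pub fn gather_foo_refs<'d, 'f>(data: &'d [Vec<&'f Foo>]) -> Vec<&'d Vec<&'f Foo>> {
    data.iter()
        .filter(|v| meets_requirements(v.as_slice()))
        .collect()
}

fn main() {
    let foo = Foo { bar: String::from("bar") };
    let data = vec![vec![&foo, &foo, &foo], vec![&foo, &foo]];
    let kept = gather_foo_refs(&data);
    println!("{:?}", kept);
    // The original input is still fully usable.
    println!("{}", data.len());
}
```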
For other cases, there are other options. If you need the data to be owned by multiple items for some reason, you could try Rc<T> or Arc<T> for reference-counted smart pointers, which can be cloned to provide immutable access to the same data by multiple owners.
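A small sketch of the reference-counted option, using `Vec<i32>` payloads and a made-up predicate to keep it short. Cloning an `Rc` only bumps a counter; the vector itself is shared between the input and the output.

```rust
use std::rc::Rc;

// Hypothetical predicate standing in for `meets_requirements`.
fn meets_requirements(v: &[i32]) -> bool {
    v.len() >= 3
}

// The output shares ownership of the matching vectors with the input.
pub fn gather_shared(data: &[Rc<Vec<i32>>]) -> Vec<Rc<Vec<i32>>> {
    data.iter()
        .filter(|v| meets_requirements(v.as_slice()))
        .map(Rc::clone) // cheap: increments the reference count only
        .collect()
}

fn main() {
    let data = vec![Rc::new(vec![1, 2, 3]), Rc::new(vec![1, 2])];
    let kept = gather_shared(&data);
    println!("{:?}", kept);
    println!("owners of first vector: {}", Rc::strong_count(&data[0]));
}
```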

Passing a member of a struct to a method of the same struct in Rust

I am now facing a borrowing problem in Rust, and I have an idea to solve it. But I think the way I found is not a good answer. So I am wondering if there is another way to solve it.
I use the following example code to describe my situation:
struct S {
    val: u8
}
impl S {
    pub fn f1(&mut self) {
        println!("F1");
        self.f2(self.val);
    }
    pub fn f2(&mut self, input: u8) {
        println!("F2");
        // Do something with input
    }
}
fn main() {
    let mut s = S {
        val: 0
    };
    s.f1();
}
Structure S has a method, f2, which takes an additional argument, input, to do something. There is another method, f1, which calls f2 with the val of structure S. Outside callers may call either f1 or f2 for different use cases.
When I compiled the above code, I got the following error message:
src\main.rs:9:17: 9:25 error: cannot use `self.val` because it was mutably borrowed [E0503]
src\main.rs:9 self.f2(self.val);
^~~~~~~~
src\main.rs:9:9: 9:13 note: borrow of `*self` occurs here
src\main.rs:9 self.f2(self.val);
^~~~
I roughly understand how borrowing works in Rust. So I know that I can solve the problem by changing the implementation of f1 to:
pub fn f1(&mut self) {
    let v = self.val;
    println!("F1");
    self.f2(v);
}
However, this solution feels a little redundant to me. I am wondering if there is a way to solve this problem without an extra variable binding.
Your solution works not because of an extra variable binding, but rather because of an extra copy. Integer types can be implicitly copied, so let v = self.val creates a copy of the value. That copy is not borrowed from self but owned, so the compiler allows you to call f2 with it.
If you write self.f2(self.val), the compiler will also attempt to make a copy of self.val. However, the compiler that produced the error above cannot make the copy at this location, because self is already mutably borrowed for the method call. So it rejects the call unless you copy the value beforehand. This is not a syntax limitation but an enforcement of the borrow checker. In any case, it's better to write the copying and the call in the order in which they actually happen.
If the type you're trying to use as an argument were not Copy (e.g. a String), you would need to write let v = self.val.clone(); self.f2(v); to ask the compiler for a copy explicitly. Making such calls without making a copy is not allowed. You would probably need to make the method non-mutable or eliminate the argument somehow.
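The sketch below puts the Copy and non-Copy cases side by side. The `name` and `seen` fields and the `g1`/`g2` methods are hypothetical additions for illustration. The `*_inline` methods rest on an assumption about the toolchain: the error shown in the question comes from an older compiler, and to my knowledge compilers with non-lexical lifetimes and two-phase borrows (Rust 2018 and later) accept the inline forms as well.

```rust
pub struct S {
    val: u8,          // Copy
    name: String,     // not Copy
    seen: Vec<String>,
}

impl S {
    pub fn new() -> Self {
        S { val: 7, name: String::from("s"), seen: Vec::new() }
    }

    // Copy field: the value is copied into a local before the mutable call.
    pub fn f1(&mut self) {
        let v = self.val;
        self.f2(v);
    }

    pub fn f2(&mut self, input: u8) {
        self.seen.push(format!("F2 {}", input));
    }

    // Non-Copy field: the copy has to be requested explicitly with `clone`.
    pub fn g1(&mut self) {
        let n = self.name.clone();
        self.g2(n);
    }

    pub fn g2(&mut self, input: String) {
        self.seen.push(format!("G2 {}", input));
    }

    // Assumption: accepted on toolchains with two-phase borrows.
    pub fn f1_inline(&mut self) {
        self.f2(self.val);
    }

    pub fn g1_inline(&mut self) {
        self.g2(self.name.clone());
    }

    pub fn seen(&self) -> &[String] {
        &self.seen
    }
}

fn main() {
    let mut s = S::new();
    s.f1();
    s.g1();
    s.f1_inline();
    s.g1_inline();
    println!("{:?}", s.seen());
}
```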
You can use this trick for copyable values:
pub fn f1(&mut self) {
    println!("F1");
    match self.val { x => self.f2(x) };
}
However, using an explicit temporary variable is clearer and more idiomatic.
