Rust idiom for "clone if immutable or ref" - rust

I'm trying to make a generic method interface that takes several ref-types and tries to save a clone if possible. I have two issues:
There are a lot of traits about referencing and such, but I'm not sure how to apply them so that a single generic function can distinguish between a value, a mut value, a reference, and a mutable reference.
Is there a best practice for "clone if needed"? I've played with Cow, From/Into, ToOwned, AsRef, Deref, and more, but I just can't find a sensible way to do it.
Cow has the idea of what I want, but if I expose Cow on the input, I can't use Cow::from without implementing it myself, and making callers write Cow::Owned/Cow::Borrowed defeats most of the purpose; I can't figure out how to do it without exposing Cow in the signature. From/Into is almost perfect, but I can't get the generics to work. I also tried splitting the functions into separate traits and implementing both, but the compiler can't choose between them.
I have a basic demo below. I have control over the Bar interface as well, if that is needed to get a decent interface, but I'd be surprised if there wasn't a simple way to essentially perfect forward the arguments.
The question "How does Rust provide move semantics?" is similar, but didn't solve the issue.
struct Foo {
    x: i32,
}

impl Foo {
    // Move bar in; no clone, return the original
    pub fn add<T: Bar>(&self, mut bar: T) -> T {
        bar.plus(self.x);
        bar
    }

    // Desired, but not valid Rust: bar is immutable or a reference; clone and return
    pub fn add<T: Bar>(&self, bar: T | &T | &mut T) -> T {
        let mut newbar = bar.clone();
        newbar.plus(self.x);
        newbar
    }
}
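For reference, the Cow version I played with looks something like this minimal sketch (Bar here is a hypothetical stand-in that requires Clone). It does save the clone in the owned case, but callers have to wrap the argument in Cow::Owned/Cow::Borrowed, which is most of what I'm trying to avoid:
use std::borrow::Cow;

// Hypothetical trait standing in for Bar; Clone is needed so the borrowed case can become owned.
pub trait Bar: Clone {
    fn plus(&mut self, x: i32);
}

pub struct Foo {
    x: i32,
}

impl Foo {
    // Cow::Owned moves the value straight through; Cow::Borrowed clones exactly once.
    pub fn add<T: Bar>(&self, bar: Cow<'_, T>) -> T {
        let mut bar = bar.into_owned();
        bar.plus(self.x);
        bar
    }
}

#[derive(Clone)]
struct MyBar(i32);

impl Bar for MyBar {
    fn plus(&mut self, x: i32) {
        self.0 += x;
    }
}

fn main() {
    let foo = Foo { x: 5 };
    let owned = MyBar(1);
    let kept = MyBar(2);
    let a = foo.add(Cow::Owned(owned)); // consumes `owned`, no clone
    let b = foo.add(Cow::Borrowed(&kept)); // clones `kept`
    println!("{} {}", a.0, b.0);
}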

Related

Access to the component type of tuple struct [duplicate]

This question already has an answer here:
Does Rust have an equivalent to C++'s decltype() to get the type of an expression?
I want to use a tuple struct as the key of a HashMap. The tuple struct has exactly one component, which is a u32. The struct itself does not implement Hash, so it can't be used directly as the key. However, I can always use the underlying u32 as the key.
struct St(u32); // defined in a crate not owned by me
let s = St(1);
let mut m = HashMap::<u32, i32>::new();
m.insert(s.0, 2);
Question: is there a way to avoid hard-coding the u32 in the HashMap declaration and instead use the actual component type of St, so that if St changes it to something like isize, everything still works? Something like C++'s decltype:
struct St(isize); // the crate changes this
let mut m = HashMap::<decltype(St.0), i32>::new();
Rust has no equivalent of C++'s decltype, but you can always make a type alias which is used in both places:
type StImpl = u32;
struct St(StImpl);
let mut m = HashMap::<StImpl, i32>::new();
And, of course, you could also have St just directly derive Hash and Eq to use it in the hash map in the first place.
#[derive(Hash, PartialEq, Eq)]
struct St(isize);
let mut m = HashMap::<St, i32>::new();
Now the derived impls will automatically update if you change the internals of St.
In response to the question edit, I'll propose what I would do here. If you really want to future-proof your code against changes to the library you don't control, you can wrap St in a custom struct whose Hash and Eq delegate to the internals. Something like
use std::hash::{Hash, Hasher};

// Derive Clone, Copy, and anything else useful St implements here.
struct StWrapper(St);

impl Hash for StWrapper {
    fn hash<H: Hasher>(&self, hasher: &mut H) {
        self.0.0.hash(hasher);
    }
}

impl PartialEq for StWrapper {
    fn eq(&self, rhs: &Self) -> bool {
        self.0.0 == rhs.0.0
    }
}

impl Eq for StWrapper {}
Our implementations all reference the inside of St (via self.0.0) but we never actually mention the type of it, so as long as the inner type is something that at least gives us Eq and Hash, this will continue to work even if the specific type changes. So we can use StWrapper as our key type in a HashMap and be assured that it's protected against future changes to the library.
It is up to you to weigh whether your particular use case warrants this much boilerplate to future-proof against changes to another library, or whether it's worth it to just use u32 and absorb the cost if they do change the implementation. Note that, while this approach does require a decent bit of code, it is very likely to be a zero-cost abstraction at runtime, since St, StWrapper, and the inner type should all compile down to the same representation in most cases (not guaranteed, but likely), and all of the hash and eq calls are resolved statically.
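For reference, usage would then look something like this short sketch (assuming the St and StWrapper definitions above, and that St's field is public, as in the question):
use std::collections::HashMap;

fn main() {
    let mut m: HashMap<StWrapper, i32> = HashMap::new();
    m.insert(StWrapper(St(1)), 2);
    assert_eq!(m.get(&StWrapper(St(1))), Some(&2));
}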

Best way to model storing an iterator for a vector inside same struct

Context
I'm a beginner, and, at a high level, what I want to do is store some mutable state (to power a state machine) in a struct, with the following constraints:
Mutating the state doesn't require the entire struct to be mut (since I'd have to update a ton of callsites to be mut, and I don't want every field to be mutable)
The state is represented as an enum, and can, in the right state, store a way to index into the correct position in a vec that's in the same struct
I came up with two different approaches/examples that seem quite complicated, and I want to see if there's a way to simplify. Here are some playgrounds that minimally reproduce what I'm exploring.
Using a Cell and a usize
use std::cell::Cell;

#[derive(Clone, Copy)]
enum S {
    A,
    B(usize),
}

struct Test {
    a: Vec<i32>,
    b: Cell<S>,
}
where usage looks like this:
println!("{}", t.a[index]);
t.b.set(S::B(index + 1));
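For context, a fuller runnable version of this approach looks something like the following (step is just a toy method to show that advancing the state only needs &self):
use std::cell::Cell;

#[derive(Clone, Copy)]
enum S {
    A,
    B(usize),
}

struct Test {
    a: Vec<i32>,
    b: Cell<S>,
}

impl Test {
    // Advance the state machine; &self is enough because the state lives in a Cell.
    fn step(&self) {
        match self.b.get() {
            S::A => self.b.set(S::B(0)),
            S::B(index) if index < self.a.len() => {
                println!("{}", self.a[index]);
                self.b.set(S::B(index + 1));
            }
            S::B(_) => self.b.set(S::A),
        }
    }
}

fn main() {
    let t = Test { a: vec![10, 20, 30], b: Cell::new(S::A) };
    for _ in 0..5 {
        t.step();
    }
}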
Using a RefCell and an iterator
use std::cell::RefCell;
use std::slice::Iter;

enum S<'a> {
    A,
    B(Iter<'a, i32>),
}

struct Test<'a> {
    a: Vec<i32>,
    b: RefCell<S<'a>>,
}
where usage looks like this
println!("{}", iter.next().unwrap());
Questions
Is there a better way to model this in general vs. what I've tried?
I like approach #2 with the iterator better in theory since it feels cleaner, but I don't like how it introduces explicit lifetime annotations into the struct...in the actual codebase I'm working on, I'd need to update a ton of callsites to add the lifetime annotation and the tiny bit of convenience doesn't seem worth it. Is there some way to do #2 without introducing lifetimes?

What is the idiomatic way in Rust for a function to accept a closure as an argument or to return a closure?

What is the idiomatic way in Rust for a function to accept a closure as an argument or to return a closure?
I see it can be done in at least the three ways below:
// 1
pub fn run_with_envs_guard1(envs: &HashMap<&str, &str>, f: &dyn FnOnce()) {}
// 2
pub fn run_with_envs_guard2(envs: &HashMap<&str, &str>, f: Box<dyn FnOnce()>) {}
// 3
pub fn run_with_envs_guard3<F: FnOnce()>(envs: &HashMap<&str, &str>, f: F) {}
Are there really differences among these three ways? If yes, please help clarify them, and which is the more idiomatic way I should choose?
I am still learning Rust, sorry if all of the above are bad/strange things.
Maybe a more specific question: why do I need the dyn keyword in ways 1 and 2 but not in 3? From my understanding, all of these need dynamic dispatch, don't they, since the actual function cannot be determined at compile time?
Abdul answers the first half of your question (and I agree completely with what he said), so I'll take a stab at the second half.
If you want to return a closure from a function, you can't return a type parameter, because that would mean returning an instance of any FnOnce, at the caller's choice. You can't return a &dyn FnOnce, because you (usually) need to pass ownership to the caller. You could make it work with Box<dyn FnOnce()>, but that tends to be clunky to work with. When returning closures from functions, I'm partial to the impl Trait syntax.
pub fn test() -> impl FnOnce() {
    || { println!("It worked!") }
}
In argument position, writing impl FnOnce() as the type of something is equivalent to defining a type argument, as Abdul did in his answer. However, in return position, it's an entirely new feature that returns an opaque value. It says "I'm returning an FnOnce, and I'm not telling you which one it is". It's the same concept as a trait object, but without the overhead of throwing it in a box.
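To make that concrete, here is a small usage sketch (repeating the test function from above); the caller never learns the concrete closure type, only that it implements FnOnce():
pub fn test() -> impl FnOnce() {
    || println!("It worked!")
}

fn main() {
    let f = test(); // `f` has an opaque type; we only know it implements FnOnce()
    f(); // consumes `f` and prints "It worked!"
}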
Responding to your edit
From my understanding, all of these need dynamic dispatch, don't they, since the actual function cannot be determined at compile time?
This is actually not necessarily true. If you see the dyn keyword, then there's definitely a dynamic (runtime) dispatch happening. To understand your other example, though, let's consider a simple trait that doesn't have the baggage of FnOnce.
pub trait MyTrait {}

struct Foo;
struct Bar;

impl MyTrait for Foo {}
impl MyTrait for Bar {}

pub fn example<T: MyTrait>(_arg: T) {
    println!("It works!");
}

fn main() {
    example(Foo);
    example(Bar);
}
I claim there's no dynamic dispatch happening here. Rust monomorphizes functions with type parameters. That means that example is like a template function in C++. Every instantiation of it will end up being a separate function. So, really, during Rust's compilation, this will end up being more like
struct Foo;
struct Bar;

pub fn example1(_arg: Foo) {
    println!("It works!");
}

pub fn example2(_arg: Bar) {
    println!("It works!");
}

fn main() {
    example1(Foo);
    example2(Bar);
}
Two unrelated functions that happen to do something similar. Rust resolves all of the linkage statically, so there's no dispatch happening at runtime. In fact, we can prove it. Take the original generic code above and compile it with debugging symbols on (rustc -g filename.rs). Then use a tool like nm (available on most Linux machines by default) to list all of the symbols in the linker table. Assuming you didn't turn any optimizations on, you should see two example functions. This is what they look like in my linker
0000000000005340 t _ZN10code7example17h46383f9ad372dc94E
00000000000053a0 t _ZN10code7example17h97b400359a146fcaE
or, with nm -C to demangle the function names
0000000000005340 t code::example
00000000000053a0 t code::example
Two different functions, each of which takes concrete arguments of specific types.
Your proposed FnOnce would work the same way.
pub fn run_with_envs_guard3<F: FnOnce()>(envs: &HashMap<&str, &str>, f: F) {}
Every closure in Rust has a distinct type, so every time this function is called, a new version of run_with_envs_guard3 will get made, specifically for that closure. That new function will know exactly what to do for the closure you just gave it. In 99% of cases, if you have optimizations turned on, these made-up local functions will get inlined and optimized out, so no harm done. But there's no dynamic dispatch here.
In the other two examples, we have a dyn FnOnce, which is more like what you'd expect coming from a traditionally object-oriented language. dyn FnOnce contains a dynamic pointer to some function somewhere that will be dispatched at runtime, the way you'd expect.
I would prefer the third one, because the Rust documentation suggests using FnOnce as a bound when you want to accept a parameter of function-like type and only need to call it once.
pub fn run_with_envs_guard3<F: FnOnce()>(envs: &HashMap<&str, &str>, f: F) {}
This means that F is bound by FnOnce (i.e., F must implement FnOnce).
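A minimal sketch of how the generic version might be called (the body here is just a placeholder; the real function would presumably set the environment variables before invoking the closure):
use std::collections::HashMap;

pub fn run_with_envs_guard3<F: FnOnce()>(_envs: &HashMap<&str, &str>, f: F) {
    // Placeholder body: set the environment variables, then call the closure exactly once.
    f();
}

fn main() {
    let envs = HashMap::from([("RUST_LOG", "debug")]);
    run_with_envs_guard3(&envs, || println!("running with temporary envs"));
}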

Refactoring out `clone` when Copy trait is not implemented?

Is there a way to get rid of clone(), given the restrictions I've noted in the comments? I would really like to know if it's possible to use borrowing in this case, where modifying the third-party function signature is not possible.
// We should keep the "data" hidden from the consumer
mod le_library {
    pub struct Foobar {
        data: Vec<i32>, // Something that doesn't implement Copy
    }

    impl Foobar {
        pub fn new() -> Foobar {
            Foobar {
                data: vec![1, 2, 3],
            }
        }

        pub fn foo(&self) -> String {
            let i = third_party(self.data.clone()); // Refactor out clone?
            format!("{}{}", "foo!", i)
        }
    }

    // Can't change the signature, suppose this comes from a crate
    pub fn third_party(data: Vec<i32>) -> i32 {
        data[0]
    }
}

use le_library::Foobar;

fn main() {
    let foobar = Foobar::new();
    let foo = foobar.foo();
    let foo2 = foobar.foo();
    println!("{}", foo);
    println!("{}", foo2);
}
playground
As long as your foo() method accepts &self, it is not possible, because the
pub fn third_party(data: Vec<i32>) -> i32
signature is quite unambiguous: regardless of what this third_party function does, its API states that it needs its own instance of Vec, by value. This precludes borrowing in any form, and because foo() accepts self by reference, you can't really do anything except clone.
Also, supposedly this third_party is written without any weird unsafe hacks, so it is quite safe to assume that the Vec which is passed into it is eventually dropped and deallocated. Therefore, unsafely creating a copy of the original Vec without cloning it (by copying its internal pointers) is out of the question - you'll definitely get a use-after-free if you do it.
While your question does not state it, the fact that you want to preserve the original value of data is kind of a natural assumption. If this assumption can be relaxed, and you're actually okay with giving the data instance out and e.g. replacing it with an empty vector internally, then there are several things you can potentially do:
Switch foo(&self) to foo(&mut self), then you can quite easily extract data and replace it with an empty vector.
Use Cell or RefCell to store the data. This way, you can continue to use foo(&self), at the cost of some runtime checks when you extract the value out of a cell and replace it with some default value.
Both these approaches, however, will result in you losing the original Vec. With the given third-party API there is no way around that.
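For example, the first option would look something like this minimal sketch, assuming foo can be changed to take &mut self (std::mem::take swaps an empty Vec into place and hands back the original by value):
struct Foobar {
    data: Vec<i32>,
}

fn third_party(data: Vec<i32>) -> i32 {
    data[0]
}

impl Foobar {
    fn foo(&mut self) -> String {
        // Take ownership of `data`, leaving an empty Vec behind.
        let data = std::mem::take(&mut self.data);
        let i = third_party(data);
        format!("{}{}", "foo!", i)
    }
}

fn main() {
    let mut foobar = Foobar { data: vec![1, 2, 3] };
    println!("{}", foobar.foo()); // works once; `foobar.data` is now empty
}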
If you still can somehow influence this external API, then the best solution would be to change it to accept &[i32], which can easily be obtained from Vec<i32> with borrowing.
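A sketch of what that would look like if the third-party signature could be changed to take a slice (hypothetical, since the question rules this out):
mod le_library {
    pub struct Foobar {
        data: Vec<i32>,
    }

    impl Foobar {
        pub fn new() -> Foobar {
            Foobar { data: vec![1, 2, 3] }
        }

        pub fn foo(&self) -> String {
            // &Vec<i32> coerces to &[i32], so no clone is needed.
            let i = third_party(&self.data);
            format!("{}{}", "foo!", i)
        }
    }

    // Changed signature: borrows a slice instead of taking the Vec by value.
    pub fn third_party(data: &[i32]) -> i32 {
        data[0]
    }
}

fn main() {
    let foobar = le_library::Foobar::new();
    println!("{}", foobar.foo());
    println!("{}", foobar.foo()); // foo can now be called repeatedly without cloning
}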
No, you can't get rid of the call to clone here.
The problem here is with the third-party library. As the function third_party is written now, it's true that it could be using an &Vec<i32>; it doesn't require ownership, since it's just moving out a value that's Copy. However, since the implementation is outside of your control, there's nothing preventing the person maintaining the function from changing it to take advantage of owning the Vec. It's possible that whatever it is doing would be easier or require less memory if it were allowed to overwrite the provided memory, and the function writer is leaving the door open to do so in the future. If that's not the case, it might be worth suggesting a change to the third-party function's signature and relying on clone in the meantime.

Why does Rust not allow the Copy and Drop traits on one type?

From the book:
Rust won’t let us annotate a type with the Copy trait if the type, or any of its parts, has implemented the Drop trait. If the type needs something special to happen when the value goes out of scope and we add the Copy annotation to that type, we’ll get a compile time error.
Why was the design decision made to disallow Copy and Drop on the same type?
The Drop trait is used in an RAII context, typically when some resource needs to be released/closed when the object is destroyed.
On the other hand, a Copy type is a trivial type that can be copied with a memcpy only.
With those two descriptions, it is clear that they are exclusive: it makes no sense to memcpy non-trivial data. What if we copy the data and then drop one of the copies? The inner resource of the other copy will no longer be reliable.
In fact, Copy is not even a "real" trait, in that it does not define any function. It is a special marker that says to the compiler: "you can duplicate me with a simple byte copy". So you cannot provide a custom implementation of Copy, because there is no implementation at all. However, you can mark a type as copyable:
impl Copy for Foo {}
or better, with a derive:
#[derive(Clone, Copy)]
struct Foo { /* ... */ }
This builds only if all the fields implement Copy. Otherwise, the compiler refuses to compile it, because it would be unsafe.
For the sake of an example, let's suppose that the File struct implements Copy. Of course, this is not the case, and this example is wrong and cannot compile:
use std::fs::File;
use std::io::Read;

fn drop_copy_type<T>(x: T)
where
    T: Copy + Drop,
{
    // The inner file descriptor is closed here:
    std::mem::drop(x);
}

fn main() {
    let mut file = File::open("foo.txt").unwrap();
    drop_copy_type(file);
    let mut contents = String::new();
    // Oops, this is unsafe!
    // We try to read from an already closed file descriptor:
    file.read_to_string(&mut contents).unwrap();
}
The other answers here are talking about why we don't usually want to implement both Copy and Drop for the same type, but that's not the same as explaining why it's forbidden. It might seem like a toy example like this should work just fine:
#[derive(Copy, Clone)]
struct Foo {
    i: i32,
}

impl Drop for Foo {
    fn drop(&mut self) {
        // No problematic memory management here. Just print.
        println!("{}", self.i);
    }
}

fn main() {
    let foo1 = Foo { i: 42 };
    let foo2 = foo1;
    // Shouldn't this just print 42 twice?
}
But indeed, if we try to compile that (using Rust 1.52), it fails as expected:
error[E0184]: the trait `Copy` may not be implemented for this type; the type has a destructor
--> src/main.rs:1:10
|
1 | #[derive(Copy, Clone)]
| ^^^^ Copy not allowed on types with destructors
|
= note: this error originates in a derive macro (in Nightly builds, run with -Z macro-backtrace for more info)
error: aborting due to previous error
For more information about this error, try `rustc --explain E0184`.
See the "For more information" note at the bottom? Those are often helpful. Let's run rustc --explain E0184:
The `Copy` trait was implemented on a type with a `Drop` implementation.
Erroneous code example:
```
#[derive(Copy)]
struct Foo; // error!
impl Drop for Foo {
fn drop(&mut self) {
}
}
```
Explicitly implementing both `Drop` and `Copy` trait on a type is currently
disallowed. This feature can make some sense in theory, but the current
implementation is incorrect and can lead to memory unsafety (see
[issue #20126][iss20126]), so it has been disabled for now.
[iss20126]: https://github.com/rust-lang/rust/issues/20126
Following that issue link leads to a discussion of "zeroing-on-drop". Present-day Rust doesn't do this anymore, but up until around 2016 Rust implemented "dynamic drop" by zeroing all the bits of a value when dropping it. But of course that isn't a valid implementation if a type can be both Copy and Drop -- Rust can't zero out a value that you're allowed to keep using -- so implementing both of those traits on the same type was disallowed. The discussion ends with this interesting comment:
Anyhow, it's easiest to forbid it for now. We can always make it legal later if someone comes up with a persuasive use case. Idempotent destructors seem like a bit of an odd thing.
What's above is the explanation for Rust's current behavior, as best I can tell. But I think there's another reason to keep things the way they are, which I haven't seen discussed: Copy currently implies that a value can be both bitwise copied and also bitwise overwritten. Consider this code:
#[derive(Copy, Clone)]
struct Foo {
    i: i32,
}

fn main() {
    let mut ten_foos = [Foo { i: 42 }; 10];
    let ten_more_foos = [Foo { i: 99 }; 10];
    // Overwrite all the bytes of the first array with those of the second.
    unsafe {
        std::ptr::copy_nonoverlapping(&ten_more_foos, &mut ten_foos, 1);
    }
}
This unsafe code is totally fine today. In fact, [T]::copy_from_slice will do exactly the same thing for any T: Copy. But would it still be ok, if Foo (or any other Copy type) were allowed to be Drop? Our code here, and the standard library code in copy_from_slice, would be destroying objects without dropping them!
Now, technically, failing to call the destructor of an object is allowed. There was a very interesting discussion back in the day that led to std::mem::forget going from unsafe to safe shortly before Rust 1.0. So it's possible that Rust could allow Copy + Drop without leading to any undefined behavior, despite this issue. But it would be quite surprising that certain (standard!) functions would fail to call the destructors you expect. The property that "Copy objects can be bitwise copied and bitwise overwritten" seems like a good one to keep.
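For illustration, skipping a destructor is indeed expressible in safe Rust today; a tiny sketch:
fn main() {
    let s = String::from("never dropped");
    // Safe to call: the String's destructor never runs and its buffer is leaked,
    // but no undefined behavior results.
    std::mem::forget(s);
}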
Quoting the documentation:
[...] [A]ny type implementing Drop can't be Copy, because it's managing some resource besides its own size_of::<T> bytes.
