I'm pretty new to Rust and have a couple different implementations of a method that includes a closure referencing self. To use the reference in the closure effectively, I've been using Arc<Self> (I am multithreading) and Pin<Arc<Self>>.
I would like to make this method as generally memory efficient as possible. I assume pinning the Arc in memory would help with this. However, (a) I've read that Arcs are pinned and (b) it seems like Pin<Arc<T>> may require additional allocations.
What is Pin<Arc<T>> good for?
Adding Pin around a pointer type does not change the behavior of the program. It only restricts what further code you can write, and even that restriction applies only if the T in Pin<Arc<T>> is not Unpin (most types are Unpin).
Therefore, there is no "memory efficiency" to be gained by adding Pin.
The only use of Pin is to allow working with types that require they be pinned to use them, such as Futures.
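As a minimal sketch of the point above: the Worker type and the two helper functions are hypothetical names made up for illustration. It shows that Pin<Arc<T>> is the same size as Arc<T>, and that for an Unpin type the Pin can simply be removed again in safe code.

```rust
use std::mem::size_of;
use std::pin::Pin;
use std::sync::Arc;

// Hypothetical type for illustration; like most types, it is Unpin.
struct Worker {
    id: u32,
}

// Pin is a transparent wrapper, so the pinned handle is no larger than the Arc.
fn same_size() -> bool {
    size_of::<Pin<Arc<Worker>>>() == size_of::<Arc<Worker>>()
}

// Because Worker is Unpin, the Pin wrapper can be taken off again in safe code.
fn round_trip(id: u32) -> u32 {
    // Arc::pin performs the same single heap allocation as Arc::new.
    let pinned: Pin<Arc<Worker>> = Arc::pin(Worker { id });
    let plain: Arc<Worker> = Pin::into_inner(pinned);
    plain.id
}

fn main() {
    assert!(same_size());
    assert_eq!(round_trip(7), 7);
}
```

For a type that is not Unpin (for example one containing PhantomPinned), the Pin::into_inner call would not compile, which is the only restriction Pin adds.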
I was wondering, is there a way to know the list of all smart pointers in Rust std?
I know String and Vec<T> are smart pointers, and reading Chp. 15 of the Rust book, I am learning about Box<T>, Rc<T>, Ref<T> and RefMut<T>
I was just wondering, is there a place to know all the available smart pointers in Rust's std?
I don't think an all-encompassing list would be particularly useful since there are lots (especially many which serve more as an implementation detail of another type). If you really want a complete list of everything that's technically a smart pointer, then as eggyal pointed out in a comment on your question you could browse the implementors of Deref, but that will probably give you more noise than useful information. I've listed some of the most common and useful ones below:
Box<T> - a unique pointer to an object on the heap. Analogous to C++'s std::unique_ptr.
Rc<T>/Weak<T> - a shared pointer that provides shared ownership of a value on a single thread. This smart pointer cannot be sent between threads safely since it does not use atomic operations to maintain its refcount (the compiler will make sure you don't accidentally do this).
Arc<T>/Weak<T> - very similar to Rc except it uses atomic operations to update its refcount, and thus is thread-safe. Similar to C++'s std::shared_ptr.
Vec<T>/String/PathBuf/OsString et al. - all of these are smart pointers for owning dynamically allocated arrays of items on the heap. Read their documentation for more specific details.
Cow<'a, B> - a clone-on-write smart pointer. Useful for when you have a value that could be borrowed or owned.
The list above isn't the full picture but it will get you very far with most of the code you write.
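A short sketch of a few of the pointers listed above in use. The helper function names (strip_spaces, sum_on_thread, rc_owner_count) are invented for this example and are not part of std.

```rust
use std::borrow::Cow;
use std::rc::Rc;
use std::sync::Arc;
use std::thread;

// Cow: borrow the input when nothing changes, allocate only when it must.
fn strip_spaces(input: &str) -> Cow<'_, str> {
    if input.contains(' ') {
        Cow::Owned(input.replace(' ', ""))
    } else {
        Cow::Borrowed(input)
    }
}

// Arc: a clone of the handle can be moved into another thread.
fn sum_on_thread(data: &Arc<Vec<i32>>) -> i32 {
    let worker_copy = Arc::clone(data);
    thread::spawn(move || worker_copy.iter().sum::<i32>())
        .join()
        .expect("worker thread panicked")
}

// Rc: cloning the handle only bumps a (non-atomic) reference count.
fn rc_owner_count() -> usize {
    let first = Rc::new(String::from("shared"));
    let _second = Rc::clone(&first);
    Rc::strong_count(&first)
}

fn main() {
    // Box: a single owner of a heap allocation.
    let boxed: Box<i32> = Box::new(5);
    assert_eq!(*boxed + 1, 6);

    assert_eq!(rc_owner_count(), 2);
    assert_eq!(sum_on_thread(&Arc::new(vec![1, 2, 3])), 6);
    assert!(matches!(strip_spaces("abc"), Cow::Borrowed(_)));
    assert!(matches!(strip_spaces("a b c"), Cow::Owned(_)));
}
```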
As you've noted there are other smart pointers like Ref and MutexGuard. These are returned by types with interior mutability, and usually have some kind of specific behavior on drop, such as releasing a lock or decrementing a refcount. Usually you don't interact with these types as much, but you can read their documentation on an as-needed basis.
There is also Pin<P>, which wraps another pointer type P, but this smart pointer is notoriously hard to understand and really only comes up in conversations about the implementation details of futures and generators. You can read more about it in the standard library's std::pin module documentation.
Given a rust object, is it possible to wrap it so that multiple references and a mutable reference are allowed but do not cause problems?
For example, a Vec that has multiple references and a single mutable reference.
Yes, but...
The type you're looking for is RefCell, but read on before jumping the gun!
Rust is a single-ownership language. It always will be. It's exactly that feature that makes Rust as thread-safe and memory-safe as it is. You cannot fully circumvent this, short of wrapping your entire program in unsafe and using raw pointers exclusively, and if you're going to do that, just write C since you're no longer getting any benefits out of using Rust.
So, at any given moment in your program, there must either be one thing writing to this memory or several things reading. That's the fundamental law of single-ownership. Keep that in mind; you cannot get around that. What I'm about to say still follows that rule.
Usually, we enforce this with our type signatures. If I take a &T, then I'm just an alias and won't write to it. If I take a &mut T, then nobody else can see what I'm doing till I forfeit that reference. That's usually good enough, and if we can, we want to do it that way, since we get guarantees at compile-time.
But it doesn't always work that way. Sometimes we can't prove that what we're doing is okay. Sometimes I've got two functions holding an, ostensibly, mutable reference, but I know, due to some other guarantees Rust doesn't know about, that only one will be writing to it at a time. Enter RefCell. RefCell<T> contains a single T and pretends to be immutable but lets you borrow the thing inside either mutably or immutably with try_borrow_mut and try_borrow. When we call one of these functions, we get a reference-like value that can read (and write, in the mutable case) to the original data, even though we started with a &RefCell<T> that doesn't look mutable.
But the fundamental law still holds. Note that those try_* functions return a Result, i.e. they might fail. If two functions simultaneously try to get try_borrow_mut references, the second one will fail, and it's your job to deal with that eventuality (even if "deal with that" means panic! in your particular use case). All we've done is move the single-ownership rules from compile-time to runtime. We haven't gotten rid of them; we've just changed who's responsible for enforcing them.
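A minimal sketch of that runtime check. The two function names are hypothetical; try_borrow and try_borrow_mut are the real RefCell methods mentioned above.

```rust
use std::cell::RefCell;

// While one mutable borrow is alive, every further borrow attempt fails.
fn second_writer_is_rejected() -> bool {
    let cell = RefCell::new(vec![1, 2, 3]);
    let _first = cell.try_borrow_mut().expect("no other borrow is active yet");
    let write_rejected = cell.try_borrow_mut().is_err();
    let read_rejected = cell.try_borrow().is_err();
    write_rejected && read_rejected
}

// Mutation through a shared reference; returns the new length, or 0 if the
// cell was already borrowed (how to "deal with that" is up to the caller).
fn push_through_shared_ref(cell: &RefCell<Vec<i32>>, value: i32) -> usize {
    let new_len = match cell.try_borrow_mut() {
        Ok(mut items) => {
            items.push(value);
            items.len()
        }
        Err(_) => 0,
    };
    new_len
}

fn main() {
    assert!(second_writer_is_rejected());

    let cell = RefCell::new(vec![1, 2, 3]);
    assert_eq!(push_through_shared_ref(&cell, 4), 4);

    // With a reader still alive, the write attempt is refused at runtime.
    let reader = cell.try_borrow().expect("no writer is active");
    assert_eq!(push_through_shared_ref(&cell, 5), 0);
    assert_eq!(reader.len(), 4);
}
```

The plain borrow and borrow_mut methods perform the same check but panic instead of returning an Err.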
I need to create a stack of pointers with the following constraints:
The pointers need to point to the same Trait object (so Box seems like a fit)
Those Trait objects may need to be modified (RefCell may need to be used?)
Two pointers in the stack may need to point to the same object (Rc seems like a fit)
Right now, the only way I've found to accommodate this is to use a Vec<Rc<RefCell<Box<dyn MyTrait>>>>. Is that the best solution though? It looks like a lot of pointer dereferences needed to access the objects.
I'm not quite sure what you exactly mean with:
The pointers need to point to the same Trait object (so Box seems like a fit)
But if you are interested in storing objects of actually different types, then you need trait-objects and those need to be behind some sort of pointer such as a Box. And a Box is generally a good default (but there are alternatives).
Those Trait objects may need to be modified (RefCell may need to be used?)
Well, actually, that could still be done with a Box.
Two pointers in the stack may need to point to the same object (Rc seems like a fit)
Here, it gets difficult because in Rust shareable and mutable more or less exclude each other. To be shareable, we need an Rc, which you can think of as a shared box. Then, to make it mutable anyway, we can use interior mutability via a RefCell. So, essentially, we end up with an Rc<RefCell<_>>, which you can think of as a shareable and mutable Box.
Finally, if you put it all together into a Vec you get: Vec<Rc<RefCell<dyn MyTrait>>> (no Box).
This allows you to have different types in the Vec, having some instances even multiple times in it, and still allowing mutable access to each of them.
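A small sketch of such a stack. The question does not show MyTrait, so its methods (bump, value) and the two implementing types (Counter, Doubler) are assumptions made up for this example.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical trait; the real MyTrait is not shown in the question.
trait MyTrait {
    fn bump(&mut self);
    fn value(&self) -> i32;
}

struct Counter(i32);
struct Doubler(i32);

impl MyTrait for Counter {
    fn bump(&mut self) { self.0 += 1; }
    fn value(&self) -> i32 { self.0 }
}

impl MyTrait for Doubler {
    fn bump(&mut self) { self.0 *= 2; }
    fn value(&self) -> i32 { self.0 }
}

// The stack holds two different concrete types, and one object appears twice.
fn build_stack() -> Vec<Rc<RefCell<dyn MyTrait>>> {
    let shared: Rc<RefCell<dyn MyTrait>> = Rc::new(RefCell::new(Counter(0)));
    let other: Rc<RefCell<dyn MyTrait>> = Rc::new(RefCell::new(Doubler(3)));
    vec![Rc::clone(&shared), other, shared]
}

// Mutate every entry once, then read the values back.
fn bump_all(stack: &[Rc<RefCell<dyn MyTrait>>]) -> Vec<i32> {
    for item in stack {
        item.borrow_mut().bump();
    }
    stack.iter().map(|item| item.borrow().value()).collect()
}

fn main() {
    let stack = build_stack();
    assert!(Rc::ptr_eq(&stack[0], &stack[2]));
    // The shared Counter is bumped twice because it is in the stack twice.
    assert_eq!(bump_all(&stack), vec![2, 6, 2]);
}
```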
I understand that the preferred way to implement something like a global/instance/module variable in Rust is to create said variable in main() or other common entry point and then pass it down to whoever needs it.
It also seems possible to use a lazy_static for an immutable variable, or it can be combined with a mutex to implement a mutable one.
In my case, I am using Rust to create a .so with bindings to Python and I need to have a large amount of mutable state stored within the Rust library (in response to many different function calls invoked by the Python application).
What is the preferred way to store that state?
Is it only via the mutable lazy_static approach since I have no main() (or more generally, any function which does not terminate between function calls from Python), or is there another way to do it?
Bundle it
In general, and absent other requirements, the answer is to bundle your state in some object and hand it over to the client. A popular name is Context.
Then, the client should have to pass the object around in each function call that requires it:
Either by defining the functionality as methods on the object.
Or by requiring the object as parameter of the functions/methods.
This gives full control to the client.
The client may end up creating a global for it, or may actually appreciate the flexibility of being able to juggle multiple instances.
Note: There is no need to provide any access to the inner state of the object; all the client needs is a handle (ref-counted, in Python) to control the lifetime and decide when to use which handle. In C, this would be a void*.
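A plain-Rust sketch of the bundling idea, leaving the Python binding layer out entirely. The Context fields and methods here are hypothetical; the point is only that all state lives in an object the client owns, so two instances stay independent.

```rust
use std::collections::HashMap;

// Hypothetical state bundle; a real library would hold whatever it needs here.
#[derive(Default)]
pub struct Context {
    counters: HashMap<String, u64>,
}

impl Context {
    pub fn new() -> Self {
        Self::default()
    }

    // Functionality defined as a method on the object: mutates only this Context.
    pub fn record(&mut self, key: &str) -> u64 {
        let slot = self.counters.entry(key.to_string()).or_insert(0);
        *slot += 1;
        *slot
    }

    pub fn count(&self, key: &str) -> u64 {
        self.counters.get(key).copied().unwrap_or(0)
    }
}

fn main() {
    // The client decides how many contexts exist and which one each call uses.
    let mut first = Context::new();
    let mut second = Context::new();

    first.record("load");
    first.record("load");
    second.record("load");

    assert_eq!(first.count("load"), 2);
    assert_eq!(second.count("load"), 1);
    assert_eq!(second.count("save"), 0);
}
```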
Exceptions
There are cases, such as a cache, where the functionality is not impacted, only the performance.
In this case, while the flexibility could be appreciated, it may be more of a burden than anything. A global, or thread-local, would then make sense.
I'd be tempted to dip into unsafe code here. You cannot use non-static lifetimes, as the lifetime of your state would be determined by the Python code, which Rust can't see. On the other hand, 'static state has other problems:
It necessarily persists until the end of the program, which means there's no way of recovering memory you're no longer using.
'static variables are essentially singletons, making it very difficult to write an application that makes multiple independent usages of your library.
I would go with a solution similar to what @Matthieu M. suggests, but instead of passing the entire data structure back and forth over the interface, allocate it on the heap, unsafely, and then pass some sort of handle (i.e. pointer) back and forth.
You would probably want to write a cleanup function, and document your library to compel users to call the cleanup function when they're done using a particular handle. Effectively, you're explicitly delegating the management of the lifecycle of the data to the calling code.
With this model, if desired, an application could create, use, and clean up multiple datasets (each represented by its own handle) concurrently and independently. If an application "forgets" to clean up a handle when finished, you have a memory leak, but that's no worse than storing the data in a 'static variable.
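One way this handle model could look, as a sketch only. The State type and the function names state_new, state_push and state_free are hypothetical; a real library would also need to export these symbols (for example with the no_mangle attribute) and wire them up on the Python side, which is omitted here.

```rust
use std::ffi::c_void;

// Hypothetical library state; callers only ever see an opaque pointer to it.
#[derive(Default)]
struct State {
    values: Vec<i64>,
}

// Allocate the state on the heap and hand ownership to the caller as a handle.
pub extern "C" fn state_new() -> *mut c_void {
    Box::into_raw(Box::new(State::default())) as *mut c_void
}

// Safety: `handle` must be null or a live pointer returned by `state_new`.
pub unsafe extern "C" fn state_push(handle: *mut c_void, value: i64) -> usize {
    match unsafe { (handle as *mut State).as_mut() } {
        Some(state) => {
            state.values.push(value);
            state.values.len()
        }
        None => 0,
    }
}

// Cleanup function: takes ownership back and frees the allocation.
// Safety: `handle` must come from `state_new` and must not be used afterwards.
pub unsafe extern "C" fn state_free(handle: *mut c_void) {
    if !handle.is_null() {
        drop(unsafe { Box::from_raw(handle as *mut State) });
    }
}

// Two independent handles, each cleaned up exactly once.
fn demo() -> (usize, usize) {
    let a = state_new();
    let b = state_new();
    // SAFETY: both handles came from state_new and are freed exactly once below.
    unsafe {
        state_push(a, 10);
        let len_a = state_push(a, 20);
        let len_b = state_push(b, 30);
        state_free(a);
        state_free(b);
        (len_a, len_b)
    }
}

fn main() {
    assert_eq!(demo(), (2, 1));
}
```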
There may be helper crates and libraries to assist with doing this sort of thing. I'm not familiar enough with Rust to know.
I have a struct that contains a field that is rather expensive to initialize, so I want to be able to do so lazily. However, this may be necessary in a method that takes &self. The field also needs to be able to be modified once it is initialized, but this will only occur in methods that take &mut self.
What is the correct (as in idiomatic, as well as in thread-safe) way to do this in Rust? It seems to me that it would be trivial with either of the two constraints:
If it only needed to be lazily initialized, and not mutated, I could simply use lazy-init's Lazy<T> type.
If it only needed to be mutable and not lazy, then I could just use a normal field (obviously).
However, I'm not quite sure what to do with both in place. RwLock seems relevant, but it appears that there is considerable trickiness to thread-safe lazy initialization given what I've seen of lazy-init's source, so I am hesitant to roll my own solution based on it.
The simplest solution is RwLock<Option<T>>.
However, I'm not quite sure what to do with both in place. RwLock seems relevant, but it appears that there is considerable trickiness to thread-safe lazy initialization given what I've seen of lazy-init's source, so I am hesitant to roll my own solution based on it.
lazy-init uses tricky code because it guarantees lock-free access after creation. Lock-free is always a bit trickier.
Note that in Rust it's easy to tell whether something is tricky or not: tricky means using an unsafe block. Since you can use RwLock<Option<T>> without any unsafe block there is nothing for you to worry about.
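A sketch of the RwLock<Option<T>> approach under the constraints from the question: lazy initialization from &self, mutation from &mut self. The Report type, its field, and expensive_init are hypothetical stand-ins.

```rust
use std::sync::RwLock;

// Stand-in for the expensive computation.
fn expensive_init() -> Vec<u64> {
    (1..=4).collect()
}

struct Report {
    table: RwLock<Option<Vec<u64>>>,
}

impl Report {
    fn new() -> Self {
        Report { table: RwLock::new(None) }
    }

    // Works with &self: initializes on first use, then only takes a read lock.
    fn total(&self) -> u64 {
        {
            let guard = self.table.read().unwrap();
            if let Some(table) = guard.as_ref() {
                return table.iter().sum::<u64>();
            }
        } // read lock released here, before asking for the write lock

        let mut guard = self.table.write().unwrap();
        // Another thread may have initialized it in the meantime;
        // get_or_insert_with only runs the initializer if it is still None.
        guard.get_or_insert_with(expensive_init).iter().sum::<u64>()
    }

    // With &mut self no locking is needed: get_mut borrows the data directly.
    fn push(&mut self, value: u64) {
        self.table
            .get_mut()
            .unwrap()
            .get_or_insert_with(expensive_init)
            .push(value);
    }
}

fn main() {
    let mut report = Report::new();
    assert_eq!(report.total(), 10);
    report.push(5);
    assert_eq!(report.total(), 15);
}
```

Unlike lazy-init, every read here goes through the lock, which is the price of avoiding unsafe code.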
A variant of RwLock<Option<T>> may be necessary if you want to capture a closure for initialization once, rather than having to pass it at each potential initialization call-site.
In this case, you'll need something like RwLock<SimpleLazy<T>> where:
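enum SimpleLazy<T> {
    Initialized(T),
    // Boxed initializer, to be run at most once. The Send + Sync bounds let
    // RwLock<SimpleLazy<T>> be shared across threads (given T: Send + Sync).
    Uninitialized(Box<dyn FnOnce() -> T + Send + Sync>),
}
You don't have to implement Sync for SimpleLazy<T> yourself. RwLock<SimpleLazy<T>> is Sync whenever its contents are Send + Sync, which is why the boxed closure carries those bounds.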