How to implement long-lived variables/state in a library? - rust

I understand that the preferred way to implement something like a global/instance/module variable in Rust is to create said variable in main() or other common entry point and then pass it down to whoever needs it.
It also seems possible to use a lazy_static for an immutable variable, or it can be combined with a mutex to implement a mutable one.
In my case, I am using Rust to create a .so with bindings to Python and I need to have a large amount of mutable state stored within the Rust library (in response to many different function calls invoked by the Python application).
What is the preferred way to store that state?
Is it only via the mutable lazy_static approach since I have no main() (or more generally, any function which does not terminate between function calls from Python), or is there another way to do it?

Bundle it
In general, and absent other requirements, the answer is to bundle your state in some object and hand it over to the client. A popular name is Context.
Then, the client should have to pass the object around in each function call that requires it:
Either by defining the functionality as methods on the object.
Or by requiring the object as parameter of the functions/methods.
This gives full control to the client.
The client may end up creating a global for it, or may actually appreciate the flexibility of being able to juggle multiple instances.
Note: There is no need to provide any access to the inner state of the object; all the client needs is a handle (ref-counted, in Python) to control the lifetime and decide when to use which handle. In C, this would be a void*.
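As a minimal sketch of the bundle-it approach in plain Rust (the names `Context`, `increment`, and `get` are illustrative, not from any library; for Python bindings the same object would typically be exposed as a class through a binding layer such as PyO3):

```rust
// Sketch: all long-lived state lives in one Context value that the
// client owns and passes around explicitly.
use std::collections::HashMap;

pub struct Context {
    counters: HashMap<String, u64>,
}

impl Context {
    pub fn new() -> Self {
        Context { counters: HashMap::new() }
    }

    // Option 1: functionality defined as methods on the object.
    pub fn increment(&mut self, key: &str) -> u64 {
        let c = self.counters.entry(key.to_string()).or_insert(0);
        *c += 1;
        *c
    }
}

// Option 2: free functions that require the object as a parameter.
pub fn get(ctx: &Context, key: &str) -> Option<u64> {
    ctx.counters.get(key).copied()
}

fn main() {
    // The client controls the lifetime and may juggle several instances.
    let mut a = Context::new();
    let mut b = Context::new();
    a.increment("requests");
    a.increment("requests");
    b.increment("requests");
    println!("{:?} {:?}", get(&a, "requests"), get(&b, "requests"));
}
```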
Exceptions
There are cases, such as a cache, where the functionality is not impacted, only the performance.
In this case, while the flexibility could be appreciated, it may be more of a burden than anything. A global, or thread-local, would then make sense.

I'd be tempted to dip into unsafe code here. You cannot use non-static lifetimes, as the lifetime of your state would be determined by the Python code, which Rust can't see. On the other hand, 'static state has other problems:
It necessarily persists until the end of the program, which means there's no way of recovering memory you're no longer using.
'static variables are essentially singletons, making it very difficult to write an application that makes multiple independent usages of your library.
I would go with a solution similar to what @Matthieu M. suggests, but instead of passing the entire data structure back and forth over the interface, allocate it on the heap, unsafely, and then pass some sort of handle (i.e. pointer) back and forth.
You would probably want to write a cleanup function, and document your library to compel users to call the cleanup function when they're done using a particular handle. Effectively, you're explicitly delegating the management of the lifecycle of the data to the calling code.
With this model, if desired, an application could create, use, and cleanup multiple datasets (each represented by their own handle) concurrently and independently. If an application "forgets" to cleanup a handle when finished, you have a memory leak, but that's no worse than storing the data in a 'static variable.
There may be helper crates and libraries to assist with doing this sort of thing. I'm not familiar enough with Rust to know.
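One crate-free sketch of the handle idea, assuming a C-compatible interface of the kind Python's ctypes or cffi would call (all function names are hypothetical):

```rust
// Sketch: state is heap-allocated, an opaque pointer is handed to the
// caller, and a cleanup function gives ownership back to Rust.

pub struct State {
    data: Vec<u64>,
}

#[no_mangle]
pub extern "C" fn state_new() -> *mut State {
    // Box::into_raw deliberately leaks the allocation: ownership is
    // transferred to the caller as an opaque handle.
    Box::into_raw(Box::new(State { data: Vec::new() }))
}

#[no_mangle]
pub extern "C" fn state_push(handle: *mut State, value: u64) {
    // SAFETY: the caller promises the handle came from state_new()
    // and has not yet been freed.
    let state = unsafe { &mut *handle };
    state.data.push(value);
}

#[no_mangle]
pub extern "C" fn state_len(handle: *const State) -> usize {
    let state = unsafe { &*handle };
    state.data.len()
}

#[no_mangle]
pub extern "C" fn state_free(handle: *mut State) {
    if !handle.is_null() {
        // Box::from_raw reclaims ownership; dropping the Box frees it.
        unsafe { drop(Box::from_raw(handle)); }
    }
}
```

The documentation contract here is exactly the one described above: the caller must invoke `state_free` once per handle when done, and must not use a handle afterwards.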

Related

When should I use Pin<Arc<T>> in Rust?

I'm pretty new to Rust and have a couple different implementations of a method that includes a closure referencing self. To use the reference in the closure effectively, I've been using Arc<Self> (I am multithreading) and Pin<Arc<Self>>.
I would like to make this method as generally memory efficient as possible. I assume pinning the Arc in memory would help with this. However, (a) I've read that Arcs are pinned and (b) it seems like Pin<Arc<T>> may require additional allocations.
What is Pin<Arc<T>> good for?
Adding Pin around some pointer type does not change the behavior of the program. It only adds a restriction on what further code you can write, and even that only applies if the T in Pin<Arc<T>> is not Unpin (most types are Unpin).
Therefore, there is no "memory efficiency" to be gained by adding Pin.
The only use of Pin is to allow working with types that require they be pinned to use them, such as Futures.
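A small illustration of this: for an Unpin target, Pin::new wraps and Pin::into_inner unwraps with no allocation or copying involved.

```rust
use std::pin::Pin;
use std::sync::Arc;

fn main() {
    let arc: Arc<String> = Arc::new("hello".to_string());

    // For Unpin types (most types), Pin::new is a free wrapper:
    // no allocation, no copy; it only restricts what you can do.
    let pinned: Pin<Arc<String>> = Pin::new(arc.clone());
    assert_eq!(*pinned, "hello");

    // Pin::into_inner gives the Arc back unchanged; the original
    // clone is still alive, so two strong references exist.
    let back: Arc<String> = Pin::into_inner(pinned);
    assert_eq!(Arc::strong_count(&back), 2);
}
```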

Is there a way in Rust to mark a type as non-droppable?

I would like to make it a compiler error for a value of a type to be dropped; instead it must be forgotten. My use case is a type that represents a handle of sorts which must be returned to its source for cleanup. This way a user of the API cannot accidentally leak the handle. They would be required to either return the handle to its source or explicitly forget it. In the source, the associated resources would be cleaned up and the handle explicitly forgotten.
The article The Pain Of Real Linear Types in Rust mentions this. Relevant quote:
One extreme option that I've seen is to implement drop() as abort("this value must be used"). All "proper" consumers then mem::forget the value, preventing this "destructor bomb" from going off. This provides a dynamic version of strict must-use values. Although it's still vulnerable to the few ways destructors can leak, this isn't a significant concern in practice. Mostly it just stinks because it's dynamic and Rust users Want Static Verification.
Ultimately, Rust lacks "proper" support for this kind of type.
So, assuming you want static checks, the answer is no.
You could require the user to pass a function object that returns the handle (FnOnce(Handle) -> Handle), as long as there aren't any other ways to create a handle.
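For reference, the "destructor bomb" described in the quote might be sketched like this (Handle, checkout, and give_back are illustrative names, not from any library):

```rust
// Sketch: dropping the handle aborts the process, so every code path
// must either return it to its source or explicitly forget it.
use std::mem;

pub struct Handle {
    id: u32,
}

impl Drop for Handle {
    fn drop(&mut self) {
        // The bomb going off means the handle was leaked instead of
        // being returned to its source.
        eprintln!("handle {} dropped without being returned!", self.id);
        std::process::abort();
    }
}

pub fn checkout() -> Handle {
    Handle { id: 1 }
}

pub fn give_back(handle: Handle) -> u32 {
    let id = handle.id;
    // Defuse the bomb: the source cleans up, then forgets the value
    // so its Drop never runs.
    mem::forget(handle);
    id
}

fn main() {
    let h = checkout();
    // ... use h ...
    let id = give_back(h); // forgetting to return it would abort at drop
    println!("returned handle {}", id);
}
```

As the quote says, this is a dynamic check, not a static one: the mistake is caught at runtime rather than at compile time.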

What's the right way to have a thread-safe lazy-initialized possibly mutable value in Rust?

I have a struct that contains a field that is rather expensive to initialize, so I want to be able to do so lazily. However, this may be necessary in a method that takes &self. The field also needs to be able to be modified once it is initialized, but this will only occur in methods that take &mut self.
What is the correct (as in idiomatic, as well as in thread-safe) way to do this in Rust? It seems to me that it would be trivial with either of the two constraints:
If it only needed to be lazily initialized, and not mutated, I could simply use lazy-init's Lazy<T> type.
If it only needed to be mutable and not lazy, then I could just use a normal field (obviously).
However, I'm not quite sure what to do with both in place. RwLock seems relevant, but it appears that there is considerable trickiness to thread-safe lazy initialization given what I've seen of lazy-init's source, so I am hesitant to roll my own solution based on it.
The simplest solution is RwLock<Option<T>>.
lazy-init uses tricky code because it guarantees lock-free access after creation. Lock-free is always a bit trickier.
Note that in Rust it's easy to tell whether something is tricky or not: tricky means using an unsafe block. Since you can use RwLock<Option<T>> without any unsafe block there is nothing for you to worry about.
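A minimal sketch of the RwLock<Option<T>> pattern (Holder and expensive_init are illustrative names):

```rust
// Sketch: readers take the read lock; on a miss, upgrade to the write
// lock and initialize. Mutation through &mut self needs no lock.
use std::sync::RwLock;

struct Holder {
    value: RwLock<Option<String>>,
}

impl Holder {
    fn new() -> Self {
        Holder { value: RwLock::new(None) }
    }

    // Lazy initialization through &self.
    fn get(&self) -> String {
        if let Some(v) = self.value.read().unwrap().as_ref() {
            return v.clone();
        }
        let mut guard = self.value.write().unwrap();
        // get_or_insert_with re-checks: another thread may have won
        // the race between the read lock and the write lock.
        guard.get_or_insert_with(expensive_init).clone()
    }

    // Mutation through &mut self: exclusive access is guaranteed by
    // the borrow checker, so get_mut bypasses the lock entirely.
    fn set(&mut self, v: String) {
        *self.value.get_mut().unwrap() = Some(v);
    }
}

fn expensive_init() -> String {
    "initialized".to_string()
}
```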
A variant to RwLock<Option<T>> may be necessary if you want to capture a closure for initialization once, rather than have to pass it at each potential initialization call-site.
In this case, you'll need something like RwLock<SimpleLazy<T>> where:
enum SimpleLazy<T> {
    Initialized(T),
    Uninitialized(Box<dyn FnOnce() -> T>),
}
You don't have to worry about making SimpleLazy<T> Sync as RwLock will take care of that for you.
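To complete the picture, here is one way the Uninitialized-to-Initialized transition could be driven; this is a sketch that repeats the enum for self-containedness and uses the modern dyn syntax:

```rust
use std::mem;

enum SimpleLazy<T> {
    Initialized(T),
    Uninitialized(Box<dyn FnOnce() -> T>),
}

impl<T> SimpleLazy<T> {
    /// Runs the closure on first call, then returns a reference to
    /// the initialized value on every call.
    fn force(&mut self) -> &T {
        if matches!(self, SimpleLazy::Uninitialized(_)) {
            // Take the closure out by value; the dummy placeholder is
            // overwritten again immediately below.
            let dummy: SimpleLazy<T> =
                SimpleLazy::Uninitialized(Box::new(|| unreachable!()));
            if let SimpleLazy::Uninitialized(f) = mem::replace(self, dummy) {
                *self = SimpleLazy::Initialized(f());
            }
        }
        match self {
            SimpleLazy::Initialized(v) => v,
            SimpleLazy::Uninitialized(_) => unreachable!(),
        }
    }
}
```

Under RwLock<SimpleLazy<T>>, force would be called through a write() guard, since it needs &mut access.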

magic statics: similar constructs, interesting non-obvious uses?

C++11 introduced threadsafe local static initialization, aka "magic statics": Is local static variable initialization thread-safe in C++11?
In particular, the spec says:
If control enters the declaration concurrently while the variable is
being initialized, the concurrent execution shall wait for completion
of the initialization.
So there's an implicit mutex lock here. This is very interesting, and seems like an anomaly; that is, I don't know of any other implicit mutexes built into C++ (i.e. mutex semantics without any use of things like std::mutex). Are there any others, or is this unique in the spec?
I'm also curious whether magic static's implicit mutex (or other implicit mutexes, if there are any) can be leveraged to implement other synchronization primitives. For example, I see that they can be used to implement std::call_once, since this:
std::call_once(onceflag, some_function);
can be expressed as this:
static int dummy = (some_function(), 0);
Note, however, that the magic static version is more limited than std::call_once, since with std::call_once you could re-initialize onceflag and so use the code multiple times per program execution, whereas with magic statics, you really only get to use it once per program execution.
That's the only somewhat non-obvious use of magic statics that I can think of.
Is it possible to use magic static's implicit mutex to implement other synchronization primitives, e.g. a general std::mutex, or other useful things?
Initialization of block-scope static variables is the only place where the language requires synchronization. Several library functions require synchronization, but aren't directly synchronization functions (e.g. atexit).
Since the synchronization on the initialization of a local static is a one-time affair, it would be hard, if not impossible, to implement a general purpose synchronization mechanism on top of it, since every time you needed a synchronization point you would need to be initializing a different local static object.
Though they can be used in place of call_once in some circumstances, they can't be used as a general replacement for that, since a given once_flag object may be used from many places.

Share immutable reference in an HTTP server

I'm currently building an HTTP service exposing actions on a unique object.
I already created the central object, with several methods taking immutable &self references, and using internally various efficient synchronization structures to access the data inside (all the code is unsafe-free). My thought was that this would be enough to make it safe to use concurrently.
And then comes the hard part of actually connecting it to an HTTP server.
I'm currently trying to use Iron, but I could switch to Nickel.rs or any other if it makes things easier.
Most HTTP server examples I saw used stateless handlers, without any access to local variables. I now understand why: it's near-impossible to do.
Here is an example of what I'd like to do, using Nickel.rs:
https://gist.github.com/Gyscos/42510a335098ce935848
Here is a similar failed attempt using Iron:
https://gist.github.com/Gyscos/92e56e95baee0ebce78f
The basic idea being that obj only lives for the duration of the scope, but so does server, so it shouldn't be a big deal... right?
Unfortunately each of my attempts failed. When trying to give the server a closure that accesses self.object, I get an error saying that the closure might outlive the reference.
I saw Iron provided a shared memory module with a Read structure. Not only does it look overly complicated for my needs, I also would like to avoid paying the Arc price when I don't need it (I have a clear view of the lifecycle of the object and really don't need to count references).
The current solution I see would be to have a static Object and use that instead of one specific to MyServer, but I'd prefer to avoid this level of ugliness if possible.
I come from golang, where this is not a problem (I could use object-bound methods as handler for that). So my question is: how do you easily access a shared immutable reference from your HTTP handlers in Rust?
Note that I have no prior experience with Nickel or Iron (personally I'm using SCGI so far).
The error I got compiling your Nickel example is:
<nickel macros>:7:9: 7:70 error: captured variable `obj` does not outlive the enclosing closure
The error happens in the following snippet:
server.utilize(router! {
    get "/foo" => |_req, _res| {
        obj.foo();
    }
});
Now, router! is just a fancy macro that wraps your closure in a Middleware, with some additional checks built in. For our investigation we might want to get to the root of it and use the Middleware directly.
Unfortunately Nickel explicitly requires the Middleware to have a 'static lifetime. That's a quirk of the Nickel API design and doesn't have much to do with Rust per se (except for the fact that Rust allows the library to require such things from the user).
I see two options then. The first is to use our superior knowledge of the object lifetime (Nickel doesn't know that our object outlives the server, but we do) and tell the compiler so. The compiler lets us assert that knowledge with the unsafe and transmute primitives.
Here it is, working: unsafe.rs.
In Rust unsafe means "this piece of code is safe because the programmer said so". Every unsafe piece must satisfy this safety contract between the programmer and the compiler. In this case we know that the object outlives the server so the safety guarantee is maintained.
The second option is to pay the price for tricks that would satisfy the requirements of the Nickel API. Here we use a scoped thread-local storage for this: thread_local.rs.
Bottom line: the Nickel API has requirements that make you jump through some hoops to get where you want. I haven't investigated the Iron API. I have a feeling that you might have better luck with the lower-level Hyper API.
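As a side note on the "superior knowledge" option: when the actual requirement is only to share a non-'static reference across threads (rather than to satisfy Nickel's specific 'static bound), scoped threads let the compiler verify that same lifetime reasoning safely. This sketch uses std::thread::scope, which has been stable since Rust 1.63; Object and foo are illustrative names:

```rust
// Sketch: worker threads borrow a stack-local object. The scope
// guarantees every thread joins before the object is dropped, so no
// Arc and no 'static bound are needed.
use std::thread;

struct Object {
    name: String,
}

impl Object {
    fn foo(&self) -> usize {
        self.name.len()
    }
}

fn main() {
    let obj = Object { name: "shared".to_string() };

    thread::scope(|s| {
        for _ in 0..4 {
            // Each "handler" thread borrows obj immutably.
            s.spawn(|| {
                assert_eq!(obj.foo(), 6);
            });
        }
    }); // all threads are joined here

    println!("obj still usable: {}", obj.foo());
}
```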
