How to preload lazy_static variables in Rust?

How to preload lazy_static variables in Rust? - rust

I use Rust's lazy_static crate to assign a frequently used database object to a global variable but I do not want lazy loading. Is there a way to trigger lazy_static to preload the variable or is there a better way to achieve this? All the functions of the database are expensive and it seems to not be enough to assign a reference.
lazy_static! {
static ref DB: DataBase = load_db();
}
/// this is too slow
#[allow(unused_must_use)]
pub fn preload1() { DB.slow_function(); }
/// this does not cause lazy static to trigger
pub fn preload2() { let _ = &DB; }

_ does not bind, so use a variable name (with leading underscore to avoid the lint):
use lazy_static::lazy_static; // 1.4.0
struct DataBase;
fn load_db() -> DataBase {
println!("db loaded");
DataBase
}
lazy_static! {
static ref DB: DataBase = load_db();
}
fn main() {
let _db: &DataBase = &*DB; // prints "db loaded"
}
playground

Related

How do I get a simple mutable thread-local struct in Rust?

I'm building an interpreter with a garbage collector. I want a thread-local nursery region, and a shared older region. I am having trouble setting up the nursery. I have:
const NurserySize : usize = 25000;
#[thread_local]
static mut NurseryMemory : [usize;NurserySize] = [0;NurserySize];
thread_local! {
static Nursery: AllocableRegion = AllocableRegion::makeLocal(unsafe{&mut NurseryMemory});
}
#[cfg(test)]
mod testMemory {
use super::*;
#[test]
fn test1() {
Nursery.with(|n| n.allocObject(10));
}
}
First question is why do I need the unsafe - NurseryMemory is thread local, so access can't be unsafe?
Second question is how can I actually use this? The code is at playground, but it doesn't compile and attempts I've made to fix it seem to make it worse.

1. Why is unsafe required to get a reference to a mutable ThreadLocal?
The same reason that you need unsafe for a normal mutable static,
you would be able to create aliasing mut pointers in safe code.
The following incorrectly creates two mutable references to the mutable thread local.
#![feature(thread_local)]
#[thread_local]
static mut SomeValue: Result<&str, usize> = Ok("hello world");
pub fn main() {
let first = unsafe {&mut SomeValue};
let second = unsafe {&mut SomeValue};
if let Ok(string) = first {
*second = Err(0); // should invalidate the string reference, but it doesn't
println!("{}", string) // as first and second are considered to be disjunct
}
}
first wouldn't even need to be a mutable reference for this to be a problem.
2. How to fix the code?
You could use a RefCell around the AllocatableRegion to dynamically enforce the borrowing of the inner value.
const NurserySize : usize = 25000;
#[thread_local]
static mut NurseryMemory : [usize;NurserySize] = [0;NurserySize];
thread_local! {
static Nursery: RefCell<AllocableRegion> = RefCell::new(AllocableRegion::makeLocal(unsafe{&mut NurseryMemory}));
}
#[cfg(test)]
mod testMemory {
use super::*;
#[test]
fn test1() {
Nursery.with(|n| n.borrow_mut().allocObject(10));
}
}

You don't need unsafe to make a mutable thread-local struct in rust. However, Nursery does need to be a RefCell. This is sufficient:
use std::cell::RefCell;
const NURSERY_SIZE: usize = 300;
thread_local! {
static NURSERY: RefCell<[usize; NURSERY_SIZE]> = RefCell::new([0; NURSERY_SIZE]);
}
#[cfg(test)]
mod test_memory {
use super::*;
#[test]
fn test1() {
NURSERY.with(|n| n.borrow_mut()[10] = 20);
}
}
Rust Playground link

"statics cannot evaluate destructors" in Rust

I'm getting the follow compile error:
static optionsRegex: regex::Regex
= match regex::Regex::new(r###"$(~?[\w-]+(?:=[^,]*)?(?:,~?[\w-]+(?:=[^,]*)?)*)$"###) {
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ statics cannot evaluate destructors
Ok(r) => r,
Default => panic!("Invalid optionsRegex")
};
More details: I need to access a compiled regexp to be used by struct when creating. Any Rust documentation links or explanation appreciated.
P.S. I think I understand that Rust needs to know when to destruct it but I have no idea how to make it other than just avoid making it static and pass some struct with all the regexps every time it's needed when creating the struct.

Lazily initializing and safely re-using a static variable such as a regular expression is one of the primary use-cases of the once_cell crate. Here's an example of a validation regex that is only compiled once and re-used in a struct constructor function:
use once_cell::sync::OnceCell;
use regex::Regex;
struct Struct;
impl Struct {
fn new(options: &str) -> Result<Self, &str> {
static OPTIONS_REGEX: OnceCell<Regex> = OnceCell::new();
let options_regex = OPTIONS_REGEX.get_or_init(|| {
Regex::new(r###"$(~?[\w-]+(?:=[^,]*)?(?:,~?[\w-]+(?:=[^,]*)?)*)$"###).unwrap()
});
if options_regex.is_match(options) {
Ok(Struct)
} else {
Err("invalid options")
}
}
}
playground

What is the Rust equivalent of C++'s shared_from_this?

I have an object that I know that is inside an Arc because all the instances are always Arced. I would like to be able to pass a cloned Arc of myself in a function call. The thing I am calling will call me back later on other threads.
In C++, there is a standard mixin called enable_shared_from_this. It enables me to do exactly this
class Bus : public std::enable_shared_from_this<Bus>
{
....
void SetupDevice(Device device,...)
{
device->Attach(shared_from_this());
}
}
If this object is not under shared_ptr management (the closest C++ has to Arc) then this will fail at run time.
I cannot find an equivalent.
EDIT:
Here is an example of why its needed. I have a timerqueue library. It allows a client to request an arbitrary closure to be run at some point in the future. The code is run on a dedicated thread. To use it you must pass a closure of the function you want to be executed later.
use std::time::{Duration, Instant};
use timerqueue::*;
use parking_lot::Mutex;
use std::sync::{Arc,Weak};
use std::ops::{DerefMut};
// inline me keeper cos not on github
pub struct MeKeeper<T> {
them: Mutex<Weak<T>>,
}
impl<T> MeKeeper<T> {
pub fn new() -> Self {
Self {
them: Mutex::new(Weak::new()),
}
}
pub fn save(&self, arc: &Arc<T>) {
*self.them.lock().deref_mut() = Arc::downgrade(arc);
}
pub fn get(&self) -> Arc<T> {
match self.them.lock().upgrade() {
Some(arc) => return arc,
None => unreachable!(),
}
}
}
// -----------------------------------
struct Test {
data:String,
me: MeKeeper<Self>,
}
impl Test {
pub fn new() -> Arc<Test>{
let arc = Arc::new(Self {
me: MeKeeper::new(),
data: "Yo".to_string()
});
arc.me.save(&arc);
arc
}
fn task(&self) {
println!("{}", self.data);
}
// in real use case the TQ and a ton of other status data is passed in the new call for Test
// to keep things simple here the 'container' passes tq as an arg
pub fn do_stuff(&self, tq: &TimerQueue) {
// stuff includes a async task that must be done in 1 second
//.....
let me = self.me.get().clone();
tq.queue(
Box::new(move || me.task()),
"x".to_string(),
Instant::now() + Duration::from_millis(1000),
);
}
}
fn main() {
// in real case (PDP11 emulator) there is a Bus class owning tons of objects thats
// alive for the whole duration
let tq = Arc::new(TimerQueue::new());
let test = Test::new();
test.do_stuff(&*tq);
// just to keep everything alive while we wait
let mut input = String::new();
std::io::stdin().read_line(&mut input).unwrap();
}
cargo toml
[package]
name = "tqclient"
version = "0.1.0"
edition = "2018"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
timerqueue = { git = "https://github.com/pm100/timerqueue.git" }
parking_lot = "0.11"

There is no way to go from a &self to the Arc that self is stored in. This is because:
Rust references have additional assumptions compared to C++ references that would make such a conversion undefined behavior.
Rust's implementation of Arc does not even expose the information necessary to determine whether self is stored in an Arc or not.
Luckily, there is an alternative approach. Instead of creating a &self to the value inside the Arc, and passing that to the method, pass the Arc directly to the method that needs to access it. You can do that like this:
use std::sync::Arc;
struct Shared {
field: String,
}
impl Shared {
fn print_field(self: Arc<Self>) {
let clone: Arc<Shared> = self.clone();
println!("{}", clone.field);
}
}
Then the print_field function can only be called on an Shared encapsulated in an Arc.

having found that I needed this three times in recent days I decided to stop trying to come up with other designs. Maybe poor data design as far as rust is concerned but I needed it.
Works by changing the new function of the types using it to return an Arc rather than a raw self. All my objects are arced anyway, before they were arced by the caller, now its forced.
mini util library called mekeeper
use parking_lot::Mutex;
use std::sync::{Arc,Weak};
use std::ops::{DerefMut};
pub struct MeKeeper<T> {
them: Mutex<Weak<T>>,
}
impl<T> MeKeeper<T> {
pub fn new() -> Self {
Self {
them: Mutex::new(Weak::new()),
}
}
pub fn save(&self, arc: &Arc<T>) {
*self.them.lock().deref_mut() = Arc::downgrade(arc);
}
pub fn get(&self) -> Arc<T> {
match self.them.lock().upgrade() {
Some(arc) => return arc,
None => unreachable!(),
}
}
}
to use it
pub struct Test {
me: MeKeeper<Self>,
foo:i8,
}
impl Test {
pub fn new() -> Arc<Self> {
let arc = Arc::new(Test {
me: MeKeeper::new(),
foo:42
});
arc.me.save(&arc);
arc
}
}
now when an instance of Test wants to call a function that requires it to pass in an Arc it does:
fn nargle(){
let me = me.get();
Ooddle::fertang(me,42);// fertang needs an Arc<T>
}
the weak use is what the shared_from_this does so as to prevent refcount deadlocks, I stole that idea.
The unreachable path is safe because the only place that can call MeKeeper::get is the instance of T (Test here) that owns it and that call can only happen if the T instance is alive. Hence no none return from weak::upgrade

How do I keep internal state in a WebAssembly module written in Rust?

I want to do computations on a large set of data each frame of my web app. Only a subset of this will be used by JavaScript, so instead of sending the entire set of data back and forth between WebAssembly and JavaScript each frame, it would be nice if the data was maintained internally in my WebAssembly module.
In C, something like this works:
#include <emscripten/emscripten.h>
int state = 0;
void EMSCRIPTEN_KEEPALIVE inc() {
state++;
}
int EMSCRIPTEN_KEEPALIVE get() {
return state;
}
Is the same thing possible in Rust? I tried doing it with a static like this:
static mut state: i32 = 0;
pub fn main() {}
#[no_mangle]
pub fn add() {
state += 1;
}
#[no_mangle]
pub fn get() -> i32 {
state
}
But it seems static variables cannot be mutable.

Francis Gagné is absolutely correct that global variables generally make your code worse and you should avoid them.
However, for the specific case of WebAssembly as it is today, we don't have to worry about this concern:
if you have multiple threads
We can thus choose to use mutable static variables, if we have a very good reason to do so:
// Only valid because we are using this in a WebAssembly
// context without threads.
static mut STATE: i32 = 0;
#[no_mangle]
pub extern fn add() {
unsafe { STATE += 1 };
}
#[no_mangle]
pub extern fn get() -> i32 {
unsafe { STATE }
}
We can see the behavior with this NodeJS driver program:
const fs = require('fs-extra');
fs.readFile(__dirname + '/target/wasm32-unknown-unknown/release/state.wasm')
.then(bytes => WebAssembly.instantiate(bytes))
.then(({ module, instance }) => {
const { get, add } = instance.exports;
console.log(get());
add();
add();
console.log(get());
});
0
2

error[E0133]: use of mutable static requires unsafe function or block
In general, accessing mutable global variables is unsafe, which means that you can only do it in an unsafe block. With mutable global variables, it's easy to accidentally create dangling references (think of a reference to an item of a global mutable Vec), data races (if you have multiple threads – Rust doesn't care that you don't actually use threads) or otherwise invoke undefined behavior.
Global variables are usually not the best solution to a problem because it makes your software less flexible and less reusable. Instead, consider passing the state explicitly (by reference, so you don't need to copy it) to the functions that need to operate on it. This lets the calling code work with multiple independent states.
Here's an example of allocating unique state and modifying that:
type State = i32;
#[no_mangle]
pub extern fn new() -> *mut State {
Box::into_raw(Box::new(0))
}
#[no_mangle]
pub extern fn free(state: *mut State) {
unsafe { Box::from_raw(state) };
}
#[no_mangle]
pub extern fn add(state: *mut State) {
unsafe { *state += 1 };
}
#[no_mangle]
pub extern fn get(state: *mut State) -> i32 {
unsafe { *state }
}
const fs = require('fs-extra');
fs.readFile(__dirname + '/target/wasm32-unknown-unknown/release/state.wasm')
.then(bytes => WebAssembly.instantiate(bytes))
.then(({ module, instance }) => {
const { new: newFn, free, get, add } = instance.exports;
const state1 = newFn();
const state2 = newFn();
add(state1);
add(state2);
add(state1);
console.log(get(state1));
console.log(get(state2));
free(state1);
free(state2);
});
2
1
Note — This currently needs to be compiled in release mode to work. Debugging mode has some issues at the moment.
Admittedly, this is not less unsafe because you're passing raw pointers around, but it makes it clearer in the calling code that there is some mutable state being manipulated. Also note that it is now the responsibility of the caller to ensure that the state pointer is being handled correctly.

Recommended way to wrap C lib initialization/destruction routine

I am writing a wrapper/FFI for a C library that requires a global initialization call in the main thread as well as one for destruction.
Here is how I am currently handling it:
struct App;
impl App {
fn init() -> Self {
unsafe { ffi::InitializeMyCLib(); }
App
}
}
impl Drop for App {
fn drop(&mut self) {
unsafe { ffi::DestroyMyCLib(); }
}
}
which can be used like:
fn main() {
let _init_ = App::init();
// ...
}
This works fine, but it feels like a hack, tying these calls to the lifetime of an unnecessary struct. Having the destructor in a finally (Java) or at_exit (Ruby) block seems theoretically more appropriate.
Is there some more graceful way to do this in Rust?
EDIT
Would it be possible/safe to use this setup like so (using the lazy_static crate), instead of my second block above:
lazy_static! {
static ref APP: App = App::new();
}
Would this reference be guaranteed to be initialized before any other code and destroyed on exit? Is it bad practice to use lazy_static in a library?
This would also make it easier to facilitate access to the FFI through this one struct, since I wouldn't have to bother passing around the reference to the instantiated struct (called _init_ in my original example).
This would also make it safer in some ways, since I could make the App struct default constructor private.

I know of no way of enforcing that a method be called in the main thread beyond strongly-worded documentation. So, ignoring that requirement... :-)
Generally, I'd use std::sync::Once, which seems basically designed for this case:
A synchronization primitive which can be used to run a one-time global
initialization. Useful for one-time initialization for FFI or related
functionality. This type can only be constructed with the ONCE_INIT
value.
Note that there's no provision for any cleanup; many times you just have to leak whatever the library has done. Usually if a library has a dedicated cleanup path, it has also been structured to store all that initialized data in a type that is then passed into subsequent functions as some kind of context or environment. This would map nicely to Rust types.
Warning
Your current code is not as protective as you hope it is. Since your App is an empty struct, an end-user can construct it without calling your method:
let _init_ = App;
We will use a zero-sized argument to prevent this. See also What's the Rust idiom to define a field pointing to a C opaque pointer? for the proper way to construct opaque types for FFI.
Altogether, I'd use something like this:
use std::sync::Once;
mod ffi {
extern "C" {
pub fn InitializeMyCLib();
pub fn CoolMethod(arg: u8);
}
}
static C_LIB_INITIALIZED: Once = Once::new();
#[derive(Copy, Clone)]
struct TheLibrary(());
impl TheLibrary {
fn new() -> Self {
C_LIB_INITIALIZED.call_once(|| unsafe {
ffi::InitializeMyCLib();
});
TheLibrary(())
}
fn cool_method(&self, arg: u8) {
unsafe { ffi::CoolMethod(arg) }
}
}
fn main() {
let lib = TheLibrary::new();
lib.cool_method(42);
}

I did some digging around to see how other FFI libs handle this situation. Here is what I am currently using (similar to #Shepmaster's answer and based loosely on the initialization routine of curl-rust):
fn initialize() {
static INIT: Once = ONCE_INIT;
INIT.call_once(|| unsafe {
ffi::InitializeMyCLib();
assert_eq!(libc::atexit(cleanup), 0);
});
extern fn cleanup() {
unsafe { ffi::DestroyMyCLib(); }
}
}
I then call this function inside the public constructors for my public structs.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

How to preload lazy_static variables in Rust? - rust

Related

How do I get a simple mutable thread-local struct in Rust?

"statics cannot evaluate destructors" in Rust

What is the Rust equivalent of C++'s shared_from_this?

How do I keep internal state in a WebAssembly module written in Rust?

Recommended way to wrap C lib initialization/destruction routine

Categories

Resources