I'm using something similar to this answer to load some file data into a Rust binary. I'd like this data to be stored in a HashMap, so I'm able to search by key in the main program. However, I'm not sure how to do it in a way that ensures this data is immutable.
From the vector previously defined in build.rs, I assume I could do something like this in main.rs:
use std::collections::HashMap;

const ALL_THE_FILES: &[(&str, &[u8])] = &include!(concat!(env!("OUT_DIR"), "/all_the_files.rs"));

fn main() {
    let all_the_files_hashmap: HashMap<&str, &[u8]> = ALL_THE_FILES.iter().cloned().collect();
    // ...
}
Is there any way to construct this HashMap directly in build.rs and load it into a const in main.rs? If I replace the tuple-vector approach used in the linked answer with a HashMap definition built by iterating with something like data.insert(<key>, <value>);, can I load it into main.rs with include! in the same way, keeping the HashMap immutable?
The easiest way to create a global read-only hashmap is to use lazy_static:
#[macro_use]
extern crate lazy_static;
use std::collections::HashMap;
const ALL_THE_FILES: &[(&str, &[u8])] = ...;
lazy_static! {
static ref FILE_TO_DATA: HashMap<&'static str, &'static [u8]> = {
let mut m = HashMap::new();
for (file, data) in ALL_THE_FILES {
m.insert(*file, *data);
}
m
};
}
fn main() {
println!("{:?}", FILE_TO_DATA.get("foo").unwrap());
}
You could move this logic into code generated by build.rs, but it would be much less obvious what's going on.
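On Rust 1.70 or later, the standard library's std::sync::OnceLock can do the same job as lazy_static without an extra crate. This is a minimal sketch under that assumption; the two-entry ALL_THE_FILES below is a hypothetical stand-in for the table that include! would pull in from OUT_DIR, and file_to_data is an illustrative name.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

// Hypothetical stand-in for the include!'d table generated by build.rs.
const ALL_THE_FILES: &[(&str, &[u8])] = &[("foo", b"foo contents"), ("bar", b"bar")];

// Built on the first call, then shared read-only for the rest of the program.
// Callers only ever get a shared reference, so the map cannot be mutated.
fn file_to_data() -> &'static HashMap<&'static str, &'static [u8]> {
    static MAP: OnceLock<HashMap<&'static str, &'static [u8]>> = OnceLock::new();
    MAP.get_or_init(|| ALL_THE_FILES.iter().copied().collect())
}

fn main() {
    println!("{:?}", file_to_data().get("foo"));
}
```

The map is still built at run time (once), not at compile time; the immutability comes from only handing out &'static references to it.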
While working with a HashMap that uses &'static str as the key type, I created a newtype to hash by the pointer rather than by the string contents to reduce overhead.
use std::hash::{Hash, Hasher};

// Hashes and compares by the address of the string data, not its contents.
pub struct StaticStr(&'static str);

impl Hash for StaticStr {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.0.as_ptr().hash(state)
    }
}

impl PartialEq for StaticStr {
    fn eq(&self, other: &Self) -> bool {
        self.0.as_ptr() == other.0.as_ptr()
    }
}

impl Eq for StaticStr {}
It turns out that this does not work consistently, as the following example shows.
use std::collections::HashMap;

pub type MyMap = HashMap<StaticStr, u8>;
pub const A: &str = "A";
pub fn make_map() -> MyMap {
let mut map = MyMap::new();
map.insert(StaticStr(A), 1);
map
}
pub fn get_value(control: &MyMap) -> Option<u8> {
control.get(&StaticStr(A)).cloned()
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
pub fn map_made_in_lib() {
let map = make_map();
assert_eq!(get_value(&map), Some(1));
}
#[test]
pub fn map_made_in_test() {
// Same as make_map()
let mut map = MyMap::new();
map.insert(StaticStr(A), 1);
// This check fails
assert_eq!(get_value(&map), Some(1));
}
}
Notice that in the first test, the string constant A is used directly only in the lib crate. In the second test, A is used directly in both the lib crate and the test crate. I discovered that although both tests use the same string constant, the pointers differ depending on which crate refers to the constant by name. This is demonstrated in the minimal reproduction I created. I would have expected the string literal to be included only once, in the crate that defines it, or at least that the linker would be smart enough to deduplicate identical string literals. Is there a reason for this behavior?
Instead of a const, try a static.
A constant item is an optionally named constant value which is not
associated with a specific memory location in the program. Constants
are essentially inlined wherever they are used, meaning that they are
copied directly into the relevant context when used. This includes
usage of constants from external crates, and non-Copy types.
References to the same constant are not necessarily guaranteed to
refer to the same memory address. -- The Rust Reference
A static item is similar to a constant, except that it represents a
precise memory location in the program. All references to the static
refer to the same memory location. Static items have the static
lifetime, which outlives all other lifetimes in a Rust program. Static
items do not call drop at the end of the program. -- The Rust Reference
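Applied to the example above, the only change is declaring A as a static. This is a minimal single-crate sketch: it cannot reproduce the cross-crate case here, so the cross-crate guarantee rests on the Reference passage quoted above ("All references to the static refer to the same memory location"), not on this test.

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Same pointer-keyed newtype as in the question.
pub struct StaticStr(&'static str);

impl Hash for StaticStr {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.0.as_ptr().hash(state)
    }
}

impl PartialEq for StaticStr {
    fn eq(&self, other: &Self) -> bool {
        self.0.as_ptr() == other.0.as_ptr()
    }
}

impl Eq for StaticStr {}

pub type MyMap = HashMap<StaticStr, u8>;

// `static` instead of `const`: every use, from any crate, reads the single
// &str value stored at this one memory location, so the pointer is the same.
pub static A: &str = "A";

fn main() {
    let mut map = MyMap::new();
    map.insert(StaticStr(A), 1);
    assert_eq!(map.get(&StaticStr(A)).cloned(), Some(1));
}
```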
I'm getting the following compile error:
static optionsRegex: regex::Regex
= match regex::Regex::new(r###"$(~?[\w-]+(?:=[^,]*)?(?:,~?[\w-]+(?:=[^,]*)?)*)$"###) {
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ statics cannot evaluate destructors
Ok(r) => r,
Default => panic!("Invalid optionsRegex")
};
More details: I need a compiled regex that a struct can use when it is created. Any Rust documentation links or explanations are appreciated.
P.S. I think I understand that Rust needs to know when to destruct it, but I have no idea how to handle this other than avoiding static entirely and passing around a struct holding all the regexes every time I create the struct.
Lazily initializing and safely re-using a static variable such as a regular expression is one of the primary use-cases of the once_cell crate. Here's an example of a validation regex that is only compiled once and re-used in a struct constructor function:
use once_cell::sync::OnceCell;
use regex::Regex;

struct Struct;

impl Struct {
    fn new(options: &str) -> Result<Self, &'static str> {
        // Compiled on the first call only; later calls reuse the same Regex.
        static OPTIONS_REGEX: OnceCell<Regex> = OnceCell::new();
        let options_regex = OPTIONS_REGEX.get_or_init(|| {
            // Anchored with ^...$ so the whole input must match
            // (a leading $ would make the pattern unmatchable).
            Regex::new(r###"^(~?[\w-]+(?:=[^,]*)?(?:,~?[\w-]+(?:=[^,]*)?)*)$"###).unwrap()
        });
        if options_regex.is_match(options) {
            Ok(Struct)
        } else {
            Err("invalid options")
        }
    }
}
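Since Rust 1.70 the standard library has std::sync::OnceLock, which plays the same role as once_cell::sync::OnceCell. This is a minimal std-only sketch of the same "build once, reuse in the constructor" pattern. To keep it runnable without the regex crate, the compiled regex is replaced by a hypothetical allowed_options() set of option names, and the option names themselves are made up for illustration.

```rust
use std::collections::HashSet;
use std::sync::OnceLock;

// Hypothetical stand-in for the compiled regex: a set of allowed option
// names, built once on first use and shared by every later call.
fn allowed_options() -> &'static HashSet<&'static str> {
    static ALLOWED: OnceLock<HashSet<&'static str>> = OnceLock::new();
    ALLOWED.get_or_init(|| ["verbose", "color", "depth"].iter().copied().collect())
}

#[derive(Debug, PartialEq)]
struct Struct;

impl Struct {
    fn new(options: &str) -> Result<Self, &'static str> {
        let allowed = allowed_options();
        // Accept a comma-separated list such as "verbose,~color,depth=3":
        // strip leading '~' and any "=value" suffix before the lookup.
        let ok = options.split(',').all(|opt| {
            let name = opt.trim_start_matches('~');
            let name = name.split('=').next().unwrap_or("");
            allowed.contains(name)
        });
        if ok {
            Ok(Struct)
        } else {
            Err("invalid options")
        }
    }
}

fn main() {
    assert_eq!(Struct::new("verbose,~color,depth=3"), Ok(Struct));
    assert!(Struct::new("bogus").is_err());
}
```

With the regex crate available, the closure passed to get_or_init would simply build the Regex instead of the set.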
After reading the Rust book, I decided to give WebAssembly a try. I'm creating a simple tracker script to practice and learn more about it. A couple of methods need to access the window, navigator, or cookie API. Every time I access any of those, a lot of boilerplate code is involved:
use wasm_bindgen::JsCast; // provides dyn_into

pub fn start() {
    let window = web_sys::window().unwrap();
    let document = window.document().unwrap();
    let html_document = document.dyn_into::<web_sys::HtmlDocument>().unwrap();
    let cookie = html_document.cookie().unwrap();
}
That's impractical and bothers me. Is there a smart way to solve this? I've tried using lazy_static to keep all of this in a global.rs file:
#[macro_use]
extern crate lazy_static;

use web_sys::Window;

lazy_static! {
    static ref WINDOW: Window = {
        web_sys::window().unwrap()
    };
}
But compilation fails with: *mut u8 cannot be shared between threads safely.
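The error arises because a lazy_static! item is a shared static, so its type must be Sync, and JS handles such as web_sys::Window are not. One std-only alternative is a thread_local!, which accepts non-Sync types because each thread gets its own copy. This is a minimal sketch under that assumption; Rc<String> is a stand-in for the non-Sync Window handle so the example runs outside the browser, and handle() is a hypothetical helper name.

```rust
use std::rc::Rc;

// Rc<...> is !Sync, so it cannot live in a plain `static` or lazy_static!.
// Here it stands in for a non-Sync handle such as web_sys::Window.
thread_local! {
    static HANDLE: Rc<String> = Rc::new(String::from("window handle"));
}

// Initialized lazily on first access from a given thread; each call clones
// the Rc (a cheap reference-count bump) out of the thread-local slot.
fn handle() -> Rc<String> {
    HANDLE.with(|h| Rc::clone(h))
}

fn main() {
    let a = handle();
    let b = handle();
    assert!(Rc::ptr_eq(&a, &b));
    println!("{}", a);
}
```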
You could use the ? operator instead of unwrapping.
Instead of writing
use wasm_bindgen::JsCast; // provides dyn_into

pub fn start() {
    let window = web_sys::window().unwrap();
    let document = window.document().unwrap();
    let html_document = document.dyn_into::<web_sys::HtmlDocument>().unwrap();
    let cookie = html_document.cookie().unwrap();
}
You can write
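use wasm_bindgen::{JsCast, JsValue};

pub fn start() -> Result<(), JsValue> {
    // window() and document() return Option, so ok_or turns them into Results;
    // the error types (&str, Document, JsValue) all convert into JsValue for `?`.
    let cookie = web_sys::window()
        .ok_or("no window")?
        .document()
        .ok_or("no document")?
        .dyn_into::<web_sys::HtmlDocument>()?
        .cookie()?;
    Ok(())
}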
It's about the same number of lines, but with less boilerplate, and in simpler cases it becomes a one-liner.
If you really don't want to return a Result, you can wrap the whole thing in a closure that is called immediately (or in a try block, if you're happy using unstable features):
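use wasm_bindgen::{JsCast, JsValue};

pub fn start() {
    // The closure returns a Result, so `?` works inside it;
    // it is called immediately and the result is unwrapped.
    let cookie = (|| -> Result<String, JsValue> {
        web_sys::window()
            .ok_or("no window")?
            .document()
            .ok_or("no document")?
            .dyn_into::<web_sys::HtmlDocument>()?
            .cookie()
    })()
    .unwrap();
}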
If you find yourself repeating this frequently, you can factor it into functions:
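use wasm_bindgen::{JsCast, JsValue};
use web_sys::{Document, HtmlDocument};

fn document() -> Result<Document, JsValue> {
    web_sys::window()
        .ok_or("no window")?
        .document()
        .ok_or_else(|| JsValue::from("no document"))
}

fn html() -> Result<HtmlDocument, JsValue> {
    Ok(document()?.dyn_into::<HtmlDocument>()?)
}

fn cookie() -> Result<String, JsValue> {
    html()?.cookie()
}

pub fn start() -> Result<(), JsValue> {
    let cookie = cookie()?;
    Ok(())
}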
That's unpractical and bothers me.
I'm not sure what your issue is here, but if you access the same cookie again and again in your application, perhaps you can save it in a struct and just use that struct? In my recent WebAssembly project, I saved some of the elements I used in a struct and passed that struct around.
I also think that perhaps explaining your specific use case might lead to more specific answers :)
I am writing a wrapper/FFI for a C library that requires a global initialization call in the main thread as well as one for destruction.
Here is how I am currently handling it:
struct App;
impl App {
fn init() -> Self {
unsafe { ffi::InitializeMyCLib(); }
App
}
}
impl Drop for App {
fn drop(&mut self) {
unsafe { ffi::DestroyMyCLib(); }
}
}
which can be used like:
fn main() {
let _init_ = App::init();
// ...
}
This works fine, but it feels like a hack, tying these calls to the lifetime of an unnecessary struct. Having the destructor in a finally (Java) or at_exit (Ruby) block seems theoretically more appropriate.
Is there some more graceful way to do this in Rust?
EDIT
Would it be possible/safe to use this setup like so (using the lazy_static crate), instead of my second block above:
lazy_static! {
    static ref APP: App = App::init();
}
Would this reference be guaranteed to be initialized before any other code and destroyed on exit? Is it bad practice to use lazy_static in a library?
This would also make it easier to facilitate access to the FFI through this one struct, since I wouldn't have to bother passing around the reference to the instantiated struct (called _init_ in my original example).
This would also make it safer in some ways, since I could make the App struct default constructor private.
I know of no way of enforcing that a method be called in the main thread beyond strongly-worded documentation. So, ignoring that requirement... :-)
Generally, I'd use std::sync::Once, which seems basically designed for this case:
A synchronization primitive which can be used to run a one-time global
initialization. Useful for one-time initialization for FFI or related
functionality. This type can only be constructed with the ONCE_INIT
value.
Note that there's no provision for any cleanup; many times you just have to leak whatever the library has done. Usually if a library has a dedicated cleanup path, it has also been structured to store all that initialized data in a type that is then passed into subsequent functions as some kind of context or environment. This would map nicely to Rust types.
Warning
Your current code is not as protective as you hope it is. Since your App is an empty struct, an end-user can construct it without calling your method:
let _init_ = App;
We will use a private zero-sized field to prevent this. See also What's the Rust idiom to define a field pointing to a C opaque pointer? for the proper way to construct opaque types for FFI.
Altogether, I'd use something like this:
use std::sync::Once;
mod ffi {
extern "C" {
pub fn InitializeMyCLib();
pub fn CoolMethod(arg: u8);
}
}
static C_LIB_INITIALIZED: Once = Once::new();
#[derive(Copy, Clone)]
struct TheLibrary(());
impl TheLibrary {
fn new() -> Self {
C_LIB_INITIALIZED.call_once(|| unsafe {
ffi::InitializeMyCLib();
});
TheLibrary(())
}
fn cool_method(&self, arg: u8) {
unsafe { ffi::CoolMethod(arg) }
}
}
fn main() {
let lib = TheLibrary::new();
lib.cool_method(42);
}
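To see the Once behaviour without a real C library, here is a sketch in which the FFI call is replaced by a hypothetical fake_initialize_c_lib that only counts how often it runs. Even when TheLibrary::new() is called from several threads, the initializer runs exactly once.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;
use std::thread;

// Hypothetical stand-in for ffi::InitializeMyCLib: counts its invocations.
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

fn fake_initialize_c_lib() {
    INIT_CALLS.fetch_add(1, Ordering::SeqCst);
}

static C_LIB_INITIALIZED: Once = Once::new();

#[derive(Copy, Clone)]
pub struct TheLibrary(());

impl TheLibrary {
    pub fn new() -> Self {
        // call_once blocks concurrent callers until the first call finishes.
        C_LIB_INITIALIZED.call_once(fake_initialize_c_lib);
        TheLibrary(())
    }
}

fn main() {
    let handles: Vec<_> = (0..4)
        .map(|_| thread::spawn(|| {
            TheLibrary::new();
        }))
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let _lib = TheLibrary::new();
    assert_eq!(INIT_CALLS.load(Ordering::SeqCst), 1);
}
```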
I did some digging to see how other FFI libraries handle this situation. Here is what I am currently using (similar to @Shepmaster's answer and loosely based on the initialization routine of curl-rust):
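use std::sync::Once;

fn initialize() {
    // ONCE_INIT is deprecated; Once::new() is a const fn.
    static INIT: Once = Once::new();
    INIT.call_once(|| unsafe {
        ffi::InitializeMyCLib();
        // Register the teardown to run when the process exits normally.
        assert_eq!(libc::atexit(cleanup), 0);
    });

    extern "C" fn cleanup() {
        unsafe { ffi::DestroyMyCLib(); }
    }
}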
I then call this function inside the public constructors for my public structs.
This question already has answers here:
How do I create a global, mutable singleton?
Closed 7 years ago.
I want to have an extendable dictionary linking together Object with a &'static str inside my library. HashMap seems like the right data structure for this, but how do I make it global, initialised on declaration and mutable?
So something like this:
use std::collections::HashMap;
enum Object { A, B, C }
const OBJECT_STR: &'static [&'static str] = &[ "a", "b", "c" ];
static mut word_map: HashMap<&'static str, Object> = {
let mut m = HashMap::new();
m.insert(OBJECT_STR[0], Object::A);
m.insert(OBJECT_STR[1], Object::B);
m.insert(OBJECT_STR[2], Object::C);
m
};
impl Object {
...
}
This is possible with the lazy_static crate, as seen in its example. Since mutably accessing a static variable is unsafe, the map needs to be wrapped in a Mutex. I would recommend not making the HashMap public, but instead providing a set of functions that lock it and give access to the HashMap. See this answer on making a globally mutable singleton.
#[macro_use]
extern crate lazy_static;
use std::collections::HashMap;
use std::sync::Mutex;
lazy_static! {
static ref HASHMAP: Mutex<HashMap<u32, &'static str>> = {
let mut m = HashMap::new();
m.insert(0, "foo");
m.insert(1, "bar");
m.insert(2, "baz");
Mutex::new(m)
};
}
fn main() {
let mut map = HASHMAP.lock().unwrap();
map.insert(3, "sample");
}
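The recommendation to keep the map private and expose locking functions can be sketched with std alone (std::sync::OnceLock, Rust 1.70 or later) instead of lazy_static. This is a minimal sketch under that assumption; the Object enum comes from the question, while word_map, register, and lookup are hypothetical names chosen for illustration.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Object { A, B, C }

// Private global; callers only go through register() and lookup().
fn word_map() -> &'static Mutex<HashMap<&'static str, Object>> {
    static WORD_MAP: OnceLock<Mutex<HashMap<&'static str, Object>>> = OnceLock::new();
    WORD_MAP.get_or_init(|| {
        let mut m = HashMap::new();
        m.insert("a", Object::A);
        m.insert("b", Object::B);
        m.insert("c", Object::C);
        Mutex::new(m)
    })
}

/// Adds or replaces an entry; returns the previous value, if any.
pub fn register(word: &'static str, obj: Object) -> Option<Object> {
    word_map().lock().unwrap().insert(word, obj)
}

/// Looks up a word; the lock guard is dropped before the value is returned.
pub fn lookup(word: &str) -> Option<Object> {
    word_map().lock().unwrap().get(word).copied()
}

fn main() {
    assert_eq!(lookup("a"), Some(Object::A));
    assert_eq!(lookup("d"), None);
    register("d", Object::B);
    assert_eq!(lookup("d"), Some(Object::B));
}
```

Returning copies rather than references keeps the lock scoped to each call, so callers cannot accidentally hold it.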