Static references, multiple crates, and data from compile time to runtime - rust

I have a multi-crate Rust library project. This includes one crate for proc-macro definitions.
For the sake of simplicity, let's suppose that the project contains 3 different crates: crate_a, crate_b, crate_c.
crate_a is mainly used to expose the public APIs, organizing re-exports...
crate_b is where the proc-macros live.
crate_c contains a static reference of contents.
crate_a depends on both other crates. crate_b depends on crate_c.
The purpose of crate_c is to hold data that one macro of crate_b generates at compile time, and then wire it into a client's main() function.
Code in lib.rs of crate_c:
use std::sync::Mutex;
use lazy_static::lazy_static;

lazy_static! {
    pub static ref REGISTER_ENTITIES: Mutex<Vec<RegisterEntity<'static>>> = Mutex::new(Vec::new());
    pub static ref QUERIES_TO_EXECUTE: Mutex<Vec<String>> = Mutex::new(Vec::new());
}
An example of the macro in crate_b:
#[proc_macro_attribute]
// (`macro` is a reserved keyword in Rust, so the function is named `my_macro` here)
pub fn my_macro(_meta: CompilerTokenStream, input: CompilerTokenStream) -> CompilerTokenStream {
    // Parses the function that this attribute is attached to
    let func_res = syn::parse::<FunctionParser>(input);
    if func_res.is_err() {
        return quote! { fn main() {} }.into();
    }

    let func = func_res.unwrap();
    let sign = func.sig.clone();
    let body = func.block.stmts.clone();

    // Execute code that stores data in the `REGISTER_ENTITIES` static ref
    let rt = tokio::runtime::Runtime::new().unwrap();
    rt.block_on(async {
        Handler::run().await;
    });

    // Saves data in the `QUERIES_TO_EXECUTE` static ref
    let mut queries_tokens: Vec<TokenStream> = Vec::new();
    wire_queries_to_execute(&mut queries_tokens);

    // The final code wired into main()
    quote! {
        #[tokio::main]
        async #sign {
            {
                #(#queries_tokens)*
            }
            #(#body)*
        }
    }
    .into()
}
where wire_queries_to_execute is reduced for the example:
pub fn wire_queries_to_execute(manager_tokens: &mut Vec<TokenStream>) {
    let mut queries = String::new();
    for query in QUERIES_TO_EXECUTE.lock().unwrap().iter() {
        queries.push_str(&(query.to_owned() + "->"));
    }
    let tokens = quote! {
        *QUERIES_TO_EXECUTE.lock().unwrap() = #queries
            .split("->")
            .map(str::to_string)
            .collect::<Vec<String>>();
        DatabaseSyncOperations::from_query_register().await;
    };
    manager_tokens.push(tokens);
}
and then the code used by some consumer of the library:
// use statements...

fn main() {
    {
        // here run, for example, queries to the database wired by the macro
    }
    // User's code
}
So I can write some data to the static ref at compile time (when the macro runs) and then retrieve it later. Is this always true?
Edit: For clarification, let's look at an example:
At compile time, the macro saves some data into the static ref. Then, when the user's code runs, it is able to fetch that data from the static ref. I wanted to know if there's any possibility of the data stored in the static ref being overwritten by another program (or service, or whatever element) in the time between compilation and the start of the program.
Suppose that you, as a user of my library, compile your binary, wait two hours, and then execute your binary.
What happens to that data? Change the two hours to whatever amount of time you want, for the example.
How can I ensure that this data will be there, waiting for the runtime execution of the binary? Is it possible that another program, thread, or OS service decides to allocate some data where the data of the static ref resides? Or will it be there until a reboot or a shutdown?
Even more, what is the real connection between the data stored at compile time and the one retrieved at runtime?
Or, because it's a static ref, is it simply stored inside the binary, so that no such case as presented above can happen?

Related

Extracting the saved local variables from the generator data structure of a Future

From the "Rust for Rustaceans" book, I read that "... every await or yield is really a return from the function. After all, there are several local variables in the function, and it’s not clear how they’re restored when we resume later on. This is where the compiler-generated part of generators comes into play. The compiler transparently injects code to persist those variables into and read them from the generator’s associated data structure, rather than the stack, at the time of execution. So if you declare, write to, or read from some local variable a, you are really operating on something akin to self.a"
Say I have something like this:
use futures::future::{AbortHandle, Abortable};
use std::time::Duration;
use tokio::time::sleep;

async fn echo(s: String, times_to_repeat: u32) {
    let mut vec = Vec::new();
    for n in 0..times_to_repeat {
        println!("Iteration {} Echoing {}", n, s.clone());
        vec.push(s.clone());
        sleep(Duration::from_millis(10)).await;
    }
}

async fn child(s: String) {
    echo(s, 100).await
}

#[tokio::main]
async fn main() {
    let (abort_handle, abort_registration) = AbortHandle::new_pair();
    let result_fut = Abortable::new(child(String::from("Hello")), abort_registration);
    tokio::spawn(async move {
        sleep(Duration::from_millis(100)).await;
        abort_handle.abort();
    });
    result_fut.await.unwrap();
}
After abort has been called, how do I save/serialize variables like n and vec? Is there a way to reach inside the generator data structure that is generated for the Future?

How do I write a custom attribute for const fields in my Rust library?

I'd like to write a custom attribute for a const field, which will later be accessed throughout my entire library.
Example
// default declaration in `my_lib`...
pub const INITIAL_VEC_CAPACITY: usize = 10;

// ...but it can be overridden by dependent crates...
#[mylib_initial_vec_capacity]
pub const INITIAL_VEC_CAPACITY: usize = 5;

// ...and then it can be accessed within my crate:
pub fn do_something() {
    let mut vec = Vec::with_capacity(macros::INITIAL_VEC_CAPACITY);
    /* do stuff with vec */
}
How would I go about achieving this?
You can use features to let the user influence the compilation process of your library.
https://doc.rust-lang.org/cargo/reference/features-examples.html
Other than that, I think you should use some configuration object that the user passes into your code, or that they can configure from outside using one of the following crates:
https://crates.io/crates/config
https://crates.io/crates/config-derive/0.1.6
https://crates.io/crates/envconfig

Is there a way to simplify the access to an inner functionality of web_sys?

After reading the Rust book, I've decided to give it a try with WebAssembly. I'm creating a simple tracker script to practice and learn more about it. There are a couple of methods that need to access the window, navigator or cookie API. Every time I have to access any of those, there is a lot of boilerplate code involved:
pub fn start() {
    let window = web_sys::window().unwrap();
    let document = window.document().unwrap();
    let html = document.dyn_into::<web_sys::HtmlDocument>().unwrap();
    let cookie = html.cookie().unwrap();
}
That's impractical and bothers me. Is there a smarter way to solve this? I've in fact tried to use lazy_static to have all of this in a global.rs file:
#[macro_use]
extern crate lazy_static;
use web_sys::*;

lazy_static! {
    static ref WINDOW: Window = web_sys::window().unwrap();
}
But the compile fails with: `*mut u8` cannot be shared between threads safely.
You could use the ? operator instead of unwrapping.
Instead of writing
pub fn start() {
    let window = web_sys::window().unwrap();
    let document = window.document().unwrap();
    let html = document.dyn_into::<web_sys::HtmlDocument>().unwrap();
    let cookie = html.cookie().unwrap();
}
You can write
pub fn start() -> Option<String> {
    // window() and document() return Option; dyn_into() and cookie()
    // return Result, so convert with .ok() to chain with `?`.
    let cookie = web_sys::window()?
        .document()?
        .dyn_into::<web_sys::HtmlDocument>()
        .ok()?
        .cookie()
        .ok()?;
    Some(cookie)
}
It's about the same number of lines, but with less boilerplate, and for simpler cases it collapses to a one-liner.
If you really don't want to return an Option (or Result), you can wrap the whole thing in a closure (or a try block, if you're happy using unstable features).
pub fn start() {
    let cookie = (|| -> Option<String> {
        web_sys::window()?
            .document()?
            .dyn_into::<web_sys::HtmlDocument>()
            .ok()?
            .cookie()
            .ok()
    })()
    .unwrap();
}
If you find you don't like repeating this frequently, you can use functions:
fn document() -> Option<web_sys::Document> {
    web_sys::window()?.document()
}

fn html() -> Option<web_sys::HtmlDocument> {
    document()?.dyn_into::<web_sys::HtmlDocument>().ok()
}

fn cookie() -> Option<String> {
    html()?.cookie().ok()
}

pub fn start() {
    let cookie = cookie().unwrap();
}
That's impractical and bothers me.
Unsure what your issue is here, but if you access the same cookie again and again in your application, perhaps you can save it in a struct and just use that struct? In my recent WebAssembly project I saved some of the elements I used in a struct and used them by passing it around.
I also think that perhaps explaining your specific use case might lead to more specific answers :)

Rust/rocket pass variable to endpoints

Not to my preference, but I'm forced to write some Rust today, so I'm trying to create a Rocket instance with only one endpoint. On that endpoint I need to access a variable that is created during main. The variable takes a long time to instantiate, which is why I do it there.
My problem is that I can't find a way to pass it safely. Whatever I do, the compiler complains about thread safety, even though the library appears to be thread safe: https://github.com/brave/adblock-rust/pull/130 (the committed code is found on my local instance)
This is the error that I get:
|
18 | / lazy_static! {
19 | | static ref rules_engine: Mutex<Vec<Engine>> = Mutex::new(vec![]);
20 | | }
| |_^ `std::rc::Rc<std::cell::RefCell<lifeguard::CappedCollection<std::vec::Vec<u64>>>>` cannot be sent between threads safely
|
...and this is my code:
#![feature(proc_macro_hygiene, decl_macro)]

#[macro_use]
extern crate rocket;

use std::fs::File;
use std::io::{self, BufRead};
use std::path::Path;
use std::sync::Mutex;

use adblock::engine::Engine;
use adblock::lists::FilterFormat;
use lazy_static::lazy_static;
use rocket::request::{Form, FormDataError, FormError};

lazy_static! {
    static ref rules_engine: Mutex<Vec<Engine>> = Mutex::new(vec![]);
}

fn main() {
    if !Path::new("./rules.txt").exists() {
        println!("rules file does not exist")
    } else {
        println!("loading rules");
        let mut rules = vec![];
        if let Ok(lines) = read_lines("./rules.txt") {
            for line in lines {
                if let Ok(ip) = line {
                    rules.insert(0, ip)
                }
            }
            let eng = Engine::from_rules(&rules, FilterFormat::Standard);
            rules_engine.lock().unwrap().push(eng);
            rocket().launch();
        }
    }
}

#[derive(Debug, FromForm)]
struct FormInput {
    #[form(field = "textarea")]
    text_area: String,
}

#[post("/", data = "<sink>")]
fn sink(sink: Result<Form<FormInput>, FormError>) -> String {
    match sink {
        Ok(form) => format!("{:?}", &*form),
        Err(FormDataError::Io(_)) => format!("Form input was invalid UTF-8."),
        Err(FormDataError::Malformed(f)) | Err(FormDataError::Parse(_, f)) => {
            format!("Invalid form input: {}", f)
        }
    }
}

fn rocket() -> rocket::Rocket {
    rocket::ignite().mount("/", routes![sink])
}

fn read_lines<P>(filename: P) -> io::Result<io::Lines<io::BufReader<File>>>
where
    P: AsRef<Path>,
{
    let file = File::open(filename)?;
    Ok(io::BufReader::new(file).lines())
}
Is there any way of having eng available inside the sink endpoint method?
Rc is not thread safe, even behind a mutex. It looks like Rc is used in eng.blocker.pool.pool which is a lifeguard::Pool. So no, the Engine is not thread safe (at least by default).
Fortunately, it appears that the adblock crate has a feature called "object-pooling", which enables that specific functionality. Removing that feature will (hopefully) make it thread safe.
Rocket makes it really easy to share resources between routes (and also between main or any other thread you might have spawned from main). They call their mechanism state. Check out its documentation here.
To give a short example of how it works:
You create your type that you want to share in your application and manage an instance of that type in the instance of rocket that you use for your application. In the guide they give this example:
use std::sync::atomic::AtomicUsize;

struct HitCount {
    count: AtomicUsize
}

rocket::build().manage(HitCount { count: AtomicUsize::new(0) });
In a route then you access the resource like this (again from the guide):
use rocket::State;

#[get("/count")]
fn count(hit_count: &State<HitCount>) -> String {
    let current_count = hit_count.count.load(Ordering::Relaxed);
    format!("Number of visits: {}", current_count)
}
While I learnt Rocket I needed to share a struct containing a String that had to be mutated; shared mutation isn't allowed, so you need to wrap it in a Mutex before you can manage it with Rocket.
Also, as far as I understand, only one resource of any specific type can be shared with manage. But you can just create differently named wrapper types in that case and work around that limitation.

Create immutable HashMap from Cargo build script

I'm using something similar to this answer to load some file data into a Rust binary. I'd like this data to be stored in a HashMap, so I'm able to search by key in the main program. However, I'm not sure how to do it in a way that ensures this data is immutable.
From the vector previously defined in build.rs, I assume I could do something like this in main.rs:
const ALL_THE_FILES: &[(&str, &[u8])] = &include!(concat!(env!("OUT_DIR"), "/all_the_files.rs"));

fn main() {
    let all_the_files_hashmap: HashMap<&str, &[u8]> = ALL_THE_FILES.iter().cloned().collect();
    // ...
}
Is there any way to construct this HashMap directly from build.rs and load it to a const in main.rs? If I replace the tuple-vector approach used in the linked answer with a HashMap definition iterating with something like data.insert(<key>,<value>);, can I load this into main.rs with include! in the same way, keeping the HashMap immutable?
The easiest way to create a global read-only hashmap is to use lazy_static:
#[macro_use]
extern crate lazy_static;

use std::collections::HashMap;

const ALL_THE_FILES: &[(&str, &[u8])] = ...;

lazy_static! {
    static ref FILE_TO_DATA: HashMap<&'static str, &'static [u8]> = {
        let mut m = HashMap::new();
        for (file, data) in ALL_THE_FILES {
            m.insert(*file, *data);
        }
        m
    };
}

fn main() {
    println!("{:?}", FILE_TO_DATA.get("foo").unwrap());
}
You could move this logic into code generated by build.rs, but it would be much less obvious what's going on.
