Rust: initialize a static variable/reference in a lib?

Rust: initialize a static variable/reference in a lib? - rust

I'm new to Rust. I'm trying to create a static variable DATA of Vec<u8> in a library so that it is initialized after the compilation of the lib. I then include the lib in the main code hoping to use DATA directly without calling init_data() again. Here's what I've tried:
my_lib.rs:
use lazy_static::lazy_static;
pub fn init_data() -> Vec<u8> {
// some expensive calculations
}
lazy_static! {
pub static ref DATA: Vec<u8> = init_data(); // supposed to call init_data() only once during compilation
}
main.rs:
use my_lib::DATA;
call1(&DATA); // use DATA here without calling init_data()
call2(&DATA);
But it turned out that init_data() is still called in the main.rs. What's wrong with this code?
Update: as Ivan C pointed out, lazy_static is not run at compile time. So, what's the right choice for 'pre-loading' the data?

There are two problems here: the choice of type, and performing the allocation.
It is not possible to construct a Vec, a Box, or any other type that requires heap allocation at compile time, because the heap allocator and the heap do not yet exist at that point. Instead, you must use a reference type, which can point to data allocated in the binary rather than in the run-time heap, or an array without any reference (if the data is not too large).
Next, we need a way to perform the computation. Theoretically, the cleanest option is constant evaluation — straightforwardly executing parts of your code at compile time.
static DATA: &'static [u8] = {
// code goes here
};
However, in current stable Rust versions (1.58.1 as I'm writing this), constant evaluation is very limited, because you cannot do anything that looks like dropping a value, or use any function belonging to a trait. It can still do some things, mostly integer arithmetic or constructing other "almost literal" data; for example:
const N: usize = 10;
static FIRST_N_FIBONACCI: &'static [u32; N] = &{
let mut array = [0; N];
array[1] = 1;
let mut i = 2;
while i < array.len() {
array[i] = array[i - 1] + array[i - 2];
i += 1;
}
array
};
fn main() {
dbg!(FIRST_N_FIBONACCI);
}
If your computation cannot be expressed using const evaluation, then you will need to perform it another way:
Procedural macros are effectively compiler plugins, and they can perform arbitrary computation, but their output is generated Rust syntax. So, a procedural macro could produce an array literal with the precomputed data.
The main limitation of procedural macros is that they must be defined in dedicated crates (so if your project is one library crate, it would now be two instead).
Build scripts are ordinary Rust code which can compile or generate files used by the main compilation. They don't interact with the compiler, but are run by Cargo before compilation starts.
(Unlike const evaluation, both build scripts and proc macros can't use any of the types or constants defined within the crate being built itself; they can read the source code, but they run too early to use other items in the crate in their own code.)
In your case, because you want to precompute some [u8] data, I think the simplest approach would be to add a build script which writes the data to a file, after which your normal code can embed this data from the file using include_bytes!.

Related

Rust behavior after move [duplicate]

The Rust language website claims move semantics as one of the features of the language. But I can't see how move semantics is implemented in Rust.
Rust boxes are the only place where move semantics are used.
let x = Box::new(5);
let y: Box<i32> = x; // x is 'moved'
The above Rust code can be written in C++ as
auto x = std::make_unique<int>(5);
auto y = std::move(x); // Note the explicit move
As far as I know (correct me if I'm wrong),
Rust doesn't have constructors at all, let alone move constructors.
No support for rvalue references.
No way to create functions overloads with rvalue parameters.
How does Rust provide move semantics?

I think it's a very common issue when coming from C++. In C++ you are doing everything explicitly when it comes to copying and moving. The language was designed around copying and references. With C++11 the ability to "move" stuff was glued onto that system. Rust on the other hand took a fresh start.
Rust doesn't have constructors at all, let alone move constructors.
You do not need move constructors. Rust moves everything that "does not have a copy constructor", a.k.a. "does not implement the Copy trait".
struct A;
fn test() {
let a = A;
let b = a;
let c = a; // error, a is moved
}
Rust's default constructor is (by convention) simply an associated function called new:
struct A(i32);
impl A {
fn new() -> A {
A(5)
}
}
More complex constructors should have more expressive names. This is the named constructor idiom in C++
No support for rvalue references.
It has always been a requested feature, see RFC issue 998, but most likely you are asking for a different feature: moving stuff to functions:
struct A;
fn move_to(a: A) {
// a is moved into here, you own it now.
}
fn test() {
let a = A;
move_to(a);
let c = a; // error, a is moved
}
No way to create functions overloads with rvalue parameters.
You can do that with traits.
trait Ref {
fn test(&self);
}
trait Move {
fn test(self);
}
struct A;
impl Ref for A {
fn test(&self) {
println!("by ref");
}
}
impl Move for A {
fn test(self) {
println!("by value");
}
}
fn main() {
let a = A;
(&a).test(); // prints "by ref"
a.test(); // prints "by value"
}

Rust's moving and copying semantics are very different from C++. I'm going to take a different approach to explain them than the existing answer.
In C++, copying is an operation that can be arbitrarily complex, due to custom copy constructors. Rust doesn't want custom semantics of simple assignment or argument passing, and so takes a different approach.
First, an assignment or argument passing in Rust is always just a simple memory copy.
let foo = bar; // copies the bytes of bar to the location of foo (might be elided)
function(foo); // copies the bytes of foo to the parameter location (might be elided)
But what if the object controls some resources? Let's say we are dealing with a simple smart pointer, Box.
let b1 = Box::new(42);
let b2 = b1;
At this point, if just the bytes are copied over, wouldn't the destructor (drop in Rust) be called for each object, thus freeing the same pointer twice and causing undefined behavior?
The answer is that Rust moves by default. This means that it copies the bytes to the new location, and the old object is then gone. It is a compile error to access b1 after the second line above. And the destructor is not called for it. The value was moved to b2, and b1 might as well not exist anymore.
This is how move semantics work in Rust. The bytes are copied over, and the old object is gone.
In some discussions about C++'s move semantics, Rust's way was called "destructive move". There have been proposals to add the "move destructor" or something similar to C++ so that it can have the same semantics. But move semantics as they are implemented in C++ don't do this. The old object is left behind, and its destructor is still called. Therefore, you need a move constructor to deal with the custom logic required by the move operation. Moving is just a specialized constructor/assignment operator that is expected to behave in a certain way.
So by default, Rust's assignment moves the object, making the old location invalid. But many types (integers, floating points, shared references) have semantics where copying the bytes is a perfectly valid way of creating a real copy, with no need to ignore the old object. Such types should implement the Copy trait, which can be derived by the compiler automatically.
#[derive(Copy)]
struct JustTwoInts {
one: i32,
two: i32,
}
This signals the compiler that assignment and argument passing do not invalidate the old object:
let j1 = JustTwoInts { one: 1, two: 2 };
let j2 = j1;
println!("Still allowed: {}", j1.one);
Note that trivial copying and the need for destruction are mutually exclusive; a type that is Copy cannot also be Drop.
Now what about when you want to make a copy of something where just copying the bytes isn't enough, e.g. a vector? There is no language feature for this; technically, the type just needs a function that returns a new object that was created the right way. But by convention this is achieved by implementing the Clone trait and its clone function. In fact, the compiler supports automatic derivation of Clone too, where it simply clones every field.
#[Derive(Clone)]
struct JustTwoVecs {
one: Vec<i32>,
two: Vec<i32>,
}
let j1 = JustTwoVecs { one: vec![1], two: vec![2, 2] };
let j2 = j1.clone();
And whenever you derive Copy, you should also derive Clone, because containers like Vec use it internally when they are cloned themselves.
#[derive(Copy, Clone)]
struct JustTwoInts { /* as before */ }
Now, are there any downsides to this? Yes, in fact there is one rather big downside: because moving an object to another memory location is just done by copying bytes, and no custom logic, a type cannot have references into itself. In fact, Rust's lifetime system makes it impossible to construct such types safely.
But in my opinion, the trade-off is worth it.

Rust supports move semantics with features like these:
All types are moveable.
Sending a value somewhere is a move, by default, throughout the language. For non-Copy types, like Vec, the following are all moves in Rust: passing an argument by value, returning a value, assignment, pattern-matching by value.
You don't have std::move in Rust because it's the default. You're really using moves all the time.
Rust knows that moved values must not be used. If you have a value x: String and do channel.send(x), sending the value to another thread, the compiler knows that x has been moved. Trying to use it after the move is a compile-time error, "use of moved value". And you can't move a value if anyone has a reference to it (a dangling pointer).
Rust knows not to call destructors on moved values. Moving a value transfers ownership, including responsibility for cleanup. Types don't have to be able to represent a special "value was moved" state.
Moves are cheap and the performance is predictable. It's basically memcpy. Returning a huge Vec is always fast—you're just copying three words.
The Rust standard library uses and supports moves everywhere. I already mentioned channels, which use move semantics to safely transfer ownership of values across threads. Other nice touches: all types support copy-free std::mem::swap() in Rust; the Into and From standard conversion traits are by-value; Vec and other collections have .drain() and .into_iter() methods so you can smash one data structure, move all the values out of it, and use those values to build a new one.
Rust doesn't have move references, but moves are a powerful and central concept in Rust, providing a lot of the same performance benefits as in C++, and some other benefits as well.

let s = vec!["udon".to_string(), "ramen".to_string(), "soba".to_string()];
this is how it is represented in memory
Then let's assign s to t
let t = s;
this is what happens:
let t = s MOVED the vector’s three header fields from s to t; now t is the owner of the vector. The vector’s elements stayed just
where they were, and nothing happened to the strings either. Every value still has a single owner.
Now s is freed, if I write this
let u = s
I get error: "use of moved value: s"
Rust applies move semantics to almost any use of a value (Except Copy types). Passing
arguments to functions moves ownership to the function’s parameters;
returning a value from a function moves ownership to the caller.
Building a tuple moves the values into the tuple. And so on.
Ref for example:Programming Rust by Jim Blandy, Jason Orendorff, Leonora F. S. Tindall
Primitive types cannot be empty and are fixed size while non primitives can grow and can be empty. since primitive types cannot be empty and are fixed size, therefore assigning memory to store them and handling them are relatively easy. however the handling of non primitives involves the computation of how much memory they will take as they grow and other costly operations.Wwith primitives rust will make a copy, with non primitive rust does a move
fn main(){
// this variable is stored in stack. primitive types are fixed size, we can store them on stack
let x:i32=10;
// s1 is stored in heap. os will assign memory for this. pointer of this memory will be stored inside stack.
// s1 is the owner of memory space in heap which stores "my name"
// if we dont clear this memory, os will have no access to this memory. rust uses ownership to free the memory
let s1=String::from("my name");
// s1 will be cleared from the stack, s2 will be added to the stack poniting the same heap memory location
// making new copy of this string will create extra overhead, so we MOVED the ownership of s1 into s2
let s2=s1;
// s3 is the pointer to s2 which points to heap memory. we Borrowed the ownership
// Borrowing is similar borrowing in real life, you borrow a car from your friend, but its ownership does not change
let s3=&s2;
// this is creating new "my name" in heap and s4 stored as the pointer of this memory location on the heap
let s4=s2.clone()
}
Same principle applies when we pass primitive or non-primitive type arguments to a function:
fn main(){
// since this is primitive stack_function will make copy of it so this will remain unchanged
let stack_num=50;
let mut heap_vec=vec![2,3,4];
// when we pass a stack variable to a function, function will make a copy of that and will use the copy. "move" does not occur here
stack_var_fn(stack_num);
println!("The stack_num inside the main fn did not change:{}",stack_num);
// the owner of heap_vec moved here and when function gets executed, it goes out of scope so the variable will be dropped
// we can pass a reference to reach the value in heap. so we use the pointer of heap_vec
// we use "&"" operator to indicate that we are passing a reference
heap_var_fn(&heap_vec);
println!("the heap_vec inside main is:{:?}",heap_vec);
}
// this fn that we pass an argument stored in stack
fn stack_var_fn(mut var:i32){
// we are changing the arguments value
var=56;
println!("Var inside stack_var_fn is :{}",var);
}
// this fn that we pass an arg that stored in heap
fn heap_var_fn(var:&Vec<i32>){
println!("Var:{:?}",var);
}

I would like to add that it is not necessary for move to memcpy. If the object on the stack is large enough, Rust's compiler may choose to pass the object's pointer instead.

In C++ the default assignment of classes and structs is shallow copy. The values are copied, but not the data referenced by pointers. So modifying one instance changes the referenced data of all copies. The values (f.e. used for administration) remain unchanged in the other instance, likely rendering an inconsistent state. A move semantic avoids this situation. Example for a C++ implementation of a memory managed container with move semantic:
template <typename T>
class object
{
T *p;
public:
object()
{
p=new T;
}
~object()
{
if (p != (T *)0) delete p;
}
template <typename V> //type V is used to allow for conversions between reference and value
object(object<V> &v) //copy constructor with move semantic
{
p = v.p; //move ownership
v.p = (T *)0; //make sure it does not get deleted
}
object &operator=(object<T> &v) //move assignment
{
delete p;
p = v.p;
v.p = (T *)0;
return *this;
}
T &operator*() { return *p; } //reference to object *d
T *operator->() { return p; } //pointer to object data d->
};
Such an object is automatically garbage collected and can be returned from functions to the calling program. It is extremely efficient and does the same as Rust does:
object<somestruct> somefn() //function returning an object
{
object<somestruct> a;
auto b=a; //move semantic; b becomes invalid
return b; //this moves the object to the caller
}
auto c=somefn();
//now c owns the data; memory is freed after leaving the scope

Initialization of a static variable in the scope of a function or a module

Example code:
use std::sync::atomic::{AtomicU32, Ordering};
#[derive(Debug)]
struct Token(u32);
impl Token {
fn new() -> Self {
static COUNTER: AtomicU32 = AtomicU32::new(1);
let inner = COUNTER.fetch_add(1, Ordering::Relaxed);
Token(inner)
}
}
fn main() {
let t1 = Token::new();
let t2 = Token::new();
let t3 = Token::new();
println!("{:?}\n{:?}\n{:?}", t1, t2, t3);
}
When I run the code snippet shown above, it prints:
Token(1)
Token(2)
Token(3)
I found in Rust reference that static items' initialization are evaluated at compile time.
I wonder what exactly happens at runtime when the program executes up to the line that initializes the variable COUNTER. What the compiled code looks like so that it could neglect the initialization?

static variables are handled the same whether they're defined at the module level or at the function level. The only difference is the scope for name resolution.
Many file formats for executables, including ELF and PE, structure programs in various sections. Usually, the code for functions is in one section, mutable global variables are in another section and constants (including string literals) are in yet another section. When the operating system loads a program into memory, these sections are mapped into memory with different memory protection options (e.g. constants are not writeable), as specified in the executable file.
When the documentation says that static items are initialized at compile time, it means that the compiler determines the initial value of that item at compile time, then writes that value into the appropriate section in the compiled binary. When your program is run, the value will already be in memory before your program even gets the chance to run a single instruction.
The compiler is able to evaluate the expression AtomicU32::new(1) because AtomicU32::new is defined as a const fn. Adding const to a function definition means that it can be used in expressions that are evaluated at compile time, but const fns are much more limited than regular functions in what they can do. In the case of AtomicU32::new though, all the function needs to do is initialize the AtomicU32 struct, which is just a wrapper around a u32.

Can I use an iterator in global state in Rust?

I want to use an iterator as global state in Rust. Simplified example:
static nums = (0..).filter(|&n|n%2==0);
Is this possible?

You can do it, but you'll have to fight the language along the way.
First, true Rust statics created with the static declaration need to be compile-time constants. So something like static FOO: usize = 10 will compile, but static BAR: String = "foo".to_string() won't, because BAR requires a run-time allocation. While your iterator doesn't require a run-time allocation (though using it will make your life simpler, as you'll see later), its type is complex enough that it doesn't support compile-time initialization.
Second, Rust statics require specifying the full type up-front. This is a problem for arbitrary iterators, which one would like to create by combining iterator adapters and closures. While in this particular case, as mcarton points out, one could specify the type as Filter<RangeFrom<i32>, fn(&i32) -> bool>, it'd be closely tied to the current implementation. You'd have to change the type as soon as you switch to a different combinator. To avoid the hassle it's better to hide the iterator behind a dyn Iterator reference, i.e. type-erase it by putting it in a Box. Erasing the type involves dynamic dispatch, but so would specifying the filter function through a function pointer.
Third, Rust statics are read-only, and Iterator::next() takes &mut self, as it updates the state of the iteration. Statics must be read-only because Rust is multi-threaded, and writing to a static without proof that there are no readers or other writers would allow a data race in safe code. So to advance your global iterator, you must wrap it in a Mutex, which provides both thread safety and interior mutability.
After the long introduction, let's take a look at the fairly short implementation:
use lazy_static::lazy_static;
use std::sync::Mutex;
lazy_static! {
static ref NUMS: Mutex<Box<dyn Iterator<Item = u32> + Send + Sync>> =
Mutex::new(Box::new((0..).filter(|&n| n % 2 == 0)));
}
lazy_static is used to implement the create-on-first-use idiom to work around the non-const initial value. The first time NUMS is accessed, it will create the iterator.
As explained above, the iterator itself is boxed and wrapped in a Mutex. Since global variables are assumed to be accessed from multiple threads, our boxed iterator implements Send and Sync in addition to Iterator.
The result is used as follows:
fn main() {
assert_eq!(NUMS.lock().unwrap().next(), Some(0)); // take single value
assert_eq!(
// take multiple values
Vec::from_iter(NUMS.lock().unwrap().by_ref().take(5)),
vec![2, 4, 6, 8, 10]
);
}
Playground

No. For multiple reasons:
Iterators types tend to be complicated. This is usually not a problem because iterator types must rarely be named, but statics must be explicitly typed. In this case the type is still relatively simple: core::iter::Filter<core::ops::RangeFrom<i32>, fn(&i32) -> bool>.
Iterator's main method, next, needs a &mut self parameter. statics can't be mutable by default, as this would not be safe.
Iterators can only be iterated once. Therefore it makes little sense to have a global iterator in the first place.
The value to initialize a static must be a constant expression. Your initializer is not a constant expression.

Is it available to drop a variable holding a primitive value in Rust?

Updated Question:
Or I can ask this way: for every type T, if it's Copy, then there is no way for it to be moved, right? I mean is there any way like the std::move in C++ can move a copyable value explicitly?
Original Question:
Presume we have below a piece of Rust code, in this code, I defined a variable x holding an i32 value. What I want to do is to drop its value and invalidate it. I tried to use ptr::drop_in_place to drop it through a pointer, but it doesn't work, why?
fn main() {
let mut x = 10;
use std::ptr;
unsafe {
ptr::drop_in_place(&mut x as *mut i32);
}
println!("{}", x); // x is still accessible here.
}

For every type T, if it's Copy, then there is no way for it to be moved, right?
That is one way to word it. The semantics of Copy are such that any move leaves the original object valid.
Because of this, and that Drop and Copy are mutually exclusive traits, there's no way to "drop" a Copy. The traditional method of calling std::mem::drop(x) won't work. The only meaningful thing you can do is let the variable fall out of scope:
fn main() {
{
let x = 10;
}
println!("{}", x); // x is no longer accessible here.
}
I mean is there any way like the std::move in C++ can move a copyable value explicitly?
The specifics of copying vs moving are quite different between C++ and Rust. All types are moveable in Rust, whereas its opt-in for C++. And moving and copying in Rust are always bitwise copies, there's no room for custom code. Moving in Rust leaves the source object invalid whereas its still useable as a value in C++.
I can go on, but I'll leave off one last bit: moving a primitive in C++ isn't different than a copy either.

Idiomatic way to parse binary data into primitive types

I've written the following method to parse binary data from a gzipped file using GzDecoder from the Flate2 library
fn read_primitive<T: Copy>(reader: &mut GzDecoder<File>) -> std::io::Result<T>
{
let sz = mem::size_of::<T>();
let mut vec = Vec::<u8>::with_capacity(sz);
let ret: T;
unsafe{
vec.set_len(sz);
let mut s = &mut vec[..];
try!(reader.read(&mut s));
let ptr :*const u8 = s.as_ptr();
ret = *(ptr as *const T)
}
Ok(ret)
}
It works, but I'm not particularly happy with the code, especially with using the dummy vector and the temporary variable ptr. It all feels very inelegant to me and I'm sure there's a better way to do this. I'd be happy to hear any suggestions of how to clean up this code.

Your code allows any copyable T, not just primitives. That means you could try to parse in something with a reference, which is probably not what you want:
#[derive(Copy)]
struct Foo(&str);
However, the general sketch of your code is what I'd expect. You need a temporary place to store some data, and then you must convert that data to the appropriate primitive (perhaps dealing with endinaness issues).
I'd recommend the byteorder library. With it, you call specific methods for the primitive that is required:
reader.read_u16::<LittleEndian>()
Since these methods know the desired size, they can stack-allocate an array to use as the temporary buffer, which is likely a bit more efficient than a heap-allocation. Additionally, I'd suggest changing your code to accept a generic object with the Read trait, instead of the specific GzDecoder.
You may also want to look into a serialization library like rustc-serialize or serde to see if they fit any of your use cases.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string