I would like to have a global variable holding a BufWriter. This code runs without errors, but nothing is ever written to the file:
lazy_static! {
    static ref WRITER: Mutex<BufWriter<File>> = {
        let file = File::create("test.bin").unwrap();
        BufWriter::new(file).into()
    };
}
WRITER.lock().unwrap().write_all(&vec![1, 2, 3, 4]).unwrap();
Two things come into play:
A BufWriter absorbs writes into its internal buffer before handing them to the operating system, until either
Its buffer is full
It has flush called
It gets Dropped
lazy_static items are never Dropped
So, to make your code work, you must do something like
let mut writer = WRITER.lock().unwrap();
writer.write_all(&vec![1, 2, 3, 4]).unwrap();
writer.flush().unwrap();
Playground (with one minor syntax error fixed)
Alternatively:
You construct the BufWriter in your main function and hand a reference to it through all method calls.
A bit dirty, but you could call flush from a destructor (a Drop impl).
Related
I'm writing my first Rust program and, as expected, I'm having problems making the borrow checker happy. Here is what I'm trying to do:
I would like to have a function that allocates some array, stores the array in some global data structure, and returns a reference to it. Example:
static mut global_data = ...
fn f() -> &str {
let s = String::new();
global.my_string = s;
return &s;
};
Is there any way to make something like this work? If not, what is "the rust way"(tm) to get an array and a pointer into it?
Alternatively, is there any documentation I could read? The rust book is unfortunately very superficial on most topics.
There are a couple things wrong with your code:
Using global state is very unidiomatic in Rust. It can be done in some specific scenarios, but it should never be the go-to method. You could try wrapping your state in Rc or Arc and sharing it that way in your program. If you also want to mutate this state (as you show in your example), you must also wrap it in some kind of interior-mutability type. So try Rc<RefCell<State>> if you only use the state from one thread, or Arc<Mutex<State>> if you use it from multiple threads.
Accessing mutable static memory is unsafe. So even the following code won't compile:
static mut x: i32 = 0;
// neither of these lines works!
println!("{}", x);
x = 42;
You must use unsafe to access or modify any static mutable variable, because you are effectively promising the compiler that no data races (from accessing this data from different threads) will occur.
I can't be sure, since you didn't show what type global_data is, but I assume that my_string is a field of type String. When you write
let s = String::new();
global.my_string = s;
you move ownership of that string to the global. You therefore cannot return (or even create) a reference to it through s. You must do it through its new owner: &global.my_string could work, but not if you do what I wrote in point 1. You could try to return a RefMut or MutexGuard, but that is probably not what you want.
Okay, just in case someone else has the same question, the following code seems to work:
struct foo {
    b: Option<Box<u32>>,
}

static mut global: foo = foo { b: None };

fn f<'a>() -> &'a u32 {
    let b: Box<u32> = Box::new(5);
    unsafe {
        global.b = Some(b);
        match &global.b {
            None => panic!(""),
            Some(a) => return &a,
        }
    }
}
At least it compiles. Hopefully it will also do the right thing when run.
I'm aware that this is not how you are supposed to do things in rust. But I'm currently trying to figure out how to implement various data structures from scratch, and the above is just a reduced example of one of the problems I encountered.
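For what it's worth, the same "allocate, stash in a global, return a reference" shape can be written without static mut or unsafe, using std::sync::OnceLock (stable since Rust 1.70); the names here are illustrative:

```rust
use std::sync::OnceLock;

// The global slot; it is initialized at most once.
static GLOBAL: OnceLock<Box<u32>> = OnceLock::new();

// Allocates on first call, stores the value in the global, and returns
// a reference tied to the 'static lifetime of the slot.
fn f() -> &'static u32 {
    GLOBAL.get_or_init(|| Box::new(5))
}

fn main() {
    let r = f();
    assert_eq!(*r, 5);
    println!("{}", r);
}
```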
I'm new to Rust. I'm trying to create a static variable DATA of type Vec<u8> in a library so that it is initialized during compilation of the lib. I then include the lib in the main code, hoping to use DATA directly without calling init_data() again. Here's what I've tried:
my_lib.rs:
use lazy_static::lazy_static;

pub fn init_data() -> Vec<u8> {
    // some expensive calculations
}

lazy_static! {
    pub static ref DATA: Vec<u8> = init_data(); // supposed to call init_data() only once during compilation
}
main.rs:
use my_lib::DATA;
call1(&DATA); // use DATA here without calling init_data()
call2(&DATA);
But it turned out that init_data() is still called in the main.rs. What's wrong with this code?
Update: as Ivan C pointed out, lazy_static is not run at compile time. So, what's the right choice for 'pre-loading' the data?
There are two problems here: the choice of type, and performing the allocation.
It is not possible to construct a Vec, a Box, or any other type that requires heap allocation at compile time, because the heap allocator and the heap do not yet exist at that point. Instead, you must use a reference type, which can point to data allocated in the binary rather than in the run-time heap, or an array without any reference (if the data is not too large).
Next, we need a way to perform the computation. Theoretically, the cleanest option is constant evaluation — straightforwardly executing parts of your code at compile time.
static DATA: &'static [u8] = {
    // code goes here
};
However, in current stable Rust versions (1.58.1 as I'm writing this), constant evaluation is very limited, because you cannot do anything that looks like dropping a value, or use any function belonging to a trait. It can still do some things, mostly integer arithmetic or constructing other "almost literal" data; for example:
const N: usize = 10;

static FIRST_N_FIBONACCI: &'static [u32; N] = &{
    let mut array = [0; N];
    array[1] = 1;
    let mut i = 2;
    while i < array.len() {
        array[i] = array[i - 1] + array[i - 2];
        i += 1;
    }
    array
};

fn main() {
    dbg!(FIRST_N_FIBONACCI);
}
If your computation cannot be expressed using const evaluation, then you will need to perform it another way:
Procedural macros are effectively compiler plugins, and they can perform arbitrary computation, but their output is generated Rust syntax. So, a procedural macro could produce an array literal with the precomputed data.
The main limitation of procedural macros is that they must be defined in dedicated crates (so if your project is one library crate, it would now be two instead).
Build scripts are ordinary Rust code which can compile or generate files used by the main compilation. They don't interact with the compiler, but are run by Cargo before compilation starts.
(Unlike const evaluation, both build scripts and proc macros can't use any of the types or constants defined within the crate being built itself; they can read the source code, but they run too early to use other items in the crate in their own code.)
In your case, because you want to precompute some [u8] data, I think the simplest approach would be to add a build script which writes the data to a file, after which your normal code can embed this data from the file using include_bytes!.
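A minimal sketch of that build-script route (the file names and the generated data here are invented for illustration; Cargo sets OUT_DIR for both the build script and the include_bytes! expansion):

```rust
// build.rs - compiled and run by Cargo before the crate itself builds
use std::{env, fs, path::Path};

fn main() {
    // Stand-in for the expensive computation.
    let data: Vec<u8> = (0u8..=255).map(|b| b.wrapping_mul(31)).collect();
    let dest = Path::new(&env::var("OUT_DIR").unwrap()).join("data.bin");
    fs::write(dest, &data).unwrap();
}
```

```rust
// src/main.rs - the generated bytes are embedded into the binary at compile time
static DATA: &[u8] = include_bytes!(concat!(env!("OUT_DIR"), "/data.bin"));

fn main() {
    println!("first byte: {}", DATA[0]);
}
```

The expensive work runs once per build, and the program itself just reads embedded static data.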
How can I mutate the variable i inside the closure? Race conditions are considered to be acceptable.
use rayon::prelude::*;

fn main() {
    let mut i = 0;
    let mut closure = |_| {
        i = i + 1;
    };
    (0..100).into_par_iter().for_each(closure);
}
This code fails with:
error[E0525]: expected a closure that implements the `Fn` trait, but this closure only implements `FnMut`
--> src\main.rs:6:23
|
6 | let mut closure = |_| {
| ^^^ this closure implements `FnMut`, not `Fn`
7 | i = i + 1;
| - closure is `FnMut` because it mutates the variable `i` here
...
10 | (0..100).into_par_iter().for_each(closure);
| -------- the requirement to implement `Fn` derives from here
There is a difference between a race condition and a data race.
A race condition is any situation when the outcome of two or more events depends on which one happens first, and nothing enforces a relative ordering between them. This can be fine, and as long as all possible orderings are acceptable, you may accept that your code has a race in it.
A data race is a specific kind of race condition where the events are unsynchronized accesses to the same memory and at least one of them is a mutation. Data races are undefined behavior. You cannot "accept" a data race because its existence invalidates the entire program; a program with an unavoidable data race in it does not have any defined behavior at all, so it does nothing useful.
Here's a version of your code that has a race condition, but not a data race:
use std::sync::atomic::{AtomicI32, Ordering};

let i = AtomicI32::new(0);
let closure = |_| {
    i.store(i.load(Ordering::Relaxed) + 1, Ordering::Relaxed);
};
(0..100).into_par_iter().for_each(closure);
Because the loads and stores are not ordered with respect to the concurrently executing threads, there is no guarantee that the final value of i will be exactly 100. It could be 99, or 72, or 41, or even 1. This code has indeterminate, but defined behavior because although you don't know the exact order of events or the final outcome, you can still reason about its behavior. In this case, you can prove that the final value of i must be at least 1 and no greater than 100.
Note that in order to write this racy code, I still had to use AtomicI32 and atomic load and store. Not caring about the order of events in different threads doesn't free you from having to think about synchronizing memory access.
If your original code compiled, it would have a data race.¹ This means there are no guarantees about its behavior at all. So, assuming you actually accept data races, here's a version of your code that is consistent with what a compiler is allowed to do with it:
fn main() {}
Oh, right, undefined behavior must never occur. So this hypothetical compiler just deleted all your code because it is never allowed to run in the first place.
It's actually even worse than that. Suppose you had written something like this:
fn main() {
    let mut i = 0;
    let mut closure = |_| {
        i = i + 1;
    };
    (0..100).into_par_iter().for_each(closure);
    if i < 100 || i >= 100 {
        println!("this should always print");
    } else {
        println!("this should never print");
    }
}
What should this code print? If there are no data races, this code must emit the following:
this should always print
But if we allow data races, it might also print this:
this should never print
Or it could even print this:
this should never print
this should always print
If you think there is no way it could do the last thing, you are wrong. Undefined behavior in a program cannot be accepted, because it invalidates analysis even of correct code that has nothing obvious to do with the original error.
How likely is any of this to happen if you just use unsafe and ignore the possibility of a data race? Well, probably not very likely, to be honest. If you use unsafe to bypass the checks and look at the generated assembly, it's likely to even be correct. But the only way to be sure is to write in assembly language directly and code to the machine's model: if you want to use Rust, you have to code to Rust's model, even if that means you lose a little performance.
How much performance? Probably not much, if anything. Atomic operations are very efficient and on many architectures, including the one you're probably using right now to read this, they actually are exactly as fast as non-atomic operations in cases like this. If you really want to know how much potential performance you lose, write both versions and benchmark them, or simply compare the assembly code with and without atomic operations.
¹ Technically, we can't say that a data race must occur, because it depends on whether any threads actually access i at the same time or not. If for_each decided for some reason to run all the closures on the same OS thread, for example, this code would not have a data race. But the fact that it may have a data race still poisons our analysis because we can't be sure it doesn't.
You cannot do exactly that; you need to ensure that some safe synchronization happens underneath, for example by using an Arc plus some kind of atomic operations.
You have some examples in the documentation:
use std::sync::Arc;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

let val = Arc::new(AtomicUsize::new(5));

for _ in 0..10 {
    let val = Arc::clone(&val);

    thread::spawn(move || {
        let v = val.fetch_add(1, Ordering::SeqCst);
        println!("{:?}", v);
    });
}
Playground
(as Adien4 points out: there is no need for the Arc or the move in the second example; Rayon only requires the closure to be Send + Sync)
Which leads us to your example, which could be adapted as:
use std::sync::Arc;
use std::sync::atomic::{AtomicUsize, Ordering};
use rayon::prelude::*;

fn main() {
    let i = AtomicUsize::new(5);
    let closure = |_| {
        i.fetch_add(1, Ordering::SeqCst);
    };
    (0..100).into_par_iter().for_each(closure);
}
Playground
This is not possible, as it would require parallel access to i, which causes a data race. You can use a Mutex to allow access from multiple threads.
The accepted answer explains the situation thoroughly - you definitely don't want data races in your code, because they are undefined behavior, and distinct from the more general "race conditions". Nor do you need data races to update shared data, there are better efficient ways to do that. But to satisfy curiosity, this answer attempts to answer the question as literally asked - if you were reckless enough to intentionally ignore data races and incur undefined behavior at your own peril, could you do it in unsafe Rust?
You indeed can. Code and discussion in this answer is provided for educational purposes, such as to check what kind of code the compiler generates. If code that intentionally incurs UB offends you, please stop reading here. You've been warned. :)
The obvious way to convince Rust to allow this data race is to create a raw pointer to i, send the pointer to the closure, and dereference it to mutate i. This dereference is unsafe because it leaves it to the programmer to ensure that no mutable references exist simultaneously and that writes to the underlying data are synchronized with other accesses to it. While we can easily ensure the former by just not creating a reference, we obviously won't ensure the latter:
use rayon::prelude::*;

// Must wrap the raw pointer in a type that implements Sync.
struct Wrap(*mut i32);
unsafe impl Sync for Wrap {}

// Contains undefined behavior - don't use this!
fn main() {
    let mut i = 0;
    let i_ptr = Wrap(&mut i as *mut i32);
    let closure = |_| {
        unsafe { *i_ptr.0 = *i_ptr.0 + 1 }; // XXX: UB!
    };
    (0..100).into_par_iter().for_each(closure);
    println!("{}", i);
}
Playground
Note that pointers don't implement Sync or Send, so they require a wrapper to use them across threads. The wrapper unsafely implements Sync, but this unsafe is actually not UB - accessing the pointer is safe, and there would be no UB if we, say, only printed it, or even dereferenced it for reading (as long as no one else writes to i). Writing through the dereferenced pointer is where we create UB, and that itself requires unsafe.
While this is the kind of code the OP might have been after (it even prints 100 when run), it is of course still undefined behavior and could break on different hardware or under a different compiler version. Even a slight change to the code, such as using let i_ref = unsafe { &mut *i_ptr.0 } to create a mutable reference and updating it with *i_ref += 1, may change its behavior.
In the context of C++11 Hans Boehm wrote an entire article on the danger of so-called "benign" data races, and why they cannot be allowed in the C++ memory model (which Rust shares).
I am writing a program which scrapes data from a list of websites and stores it in a struct called Listing, which is then collected into a final struct called Listings.
use std::{
    thread,
    sync::{Arc, Mutex},
};

fn main() {
    // ... some declarations

    let sites_count = site_list.len(); // site_list is a vector containing the list of websites

    // The variable to be updated by the thread instances (`Listing` is a struct holding the information)
    let listings: Arc<Mutex<Vec<Vec<types::Listing<String>>>>> = Arc::new(Mutex::new(Vec::new()));

    // A vector containing all the JoinHandles for the spawned threads
    let mut fetch_handle: Vec<thread::JoinHandle<()>> = Vec::new();

    // Spawn a thread for each concurrent website
    for i in 0..sites_count {
        let slist = Arc::clone(&site_list);
        let listng = Arc::clone(&listings);
        fetch_handle.push(thread::spawn(move || {
            println!("⌛ Spawned Thread: {}", i);
            let site_profile = read_profile(&slist[i]);
            let results = function1(function(2)); // A long list of functions from a submodule that make the http request and parse the data into `Listing`
            listng.lock().unwrap().push(results);
        }));
    }

    for thread in fetch_handle.iter_mut() {
        thread.join().unwrap();
    }

    // This is the one-line version of the above for loop - yields the same error.
    // fetch_handle.iter().map(|thread| thread.join().unwrap());

    // The final println to just test-feed the target struct `Listings` with the values
    println!("{}", types::Listings {
        date_time: format!("{}", chrono::offset::Local::now()),
        category: category.to_string(),
        query: (&search_query).to_string(),
        listings: listings.lock().unwrap() // It prevents me from owning this variable
    }.to_json());
}
To which I stumble upon the error
error[E0507]: cannot move out of `*thread` which is behind a mutable reference
--> src/main.rs:112:9
|
112 | thread.join().unwrap();
| ^^^^^^ move occurs because `*thread` has type `JoinHandle<()>`, which does not implement the `Copy` trait
It prevents me from owning the variable after the thread.join() for loop.
When I tried assigning to check the output type
let all_listings = listings.lock().unwrap();
all_listings reports a type of MutexGuard (which is also true inside the thread for loop, but there it lets me call vector methods on it) and wouldn't allow me to own the data.
I changed the data type in the Listings struct to hold a reference instead of owning it. But it seems the operations I perform on the struct in .to_json() require me to own its value.
The type declaration for listings inside the Listings struct is Vec<Vec<Listing<T>>>.
This code however works just fine when I move the .join().unwrap() to the end of the thread::spawn() block, or apply it to the handle inside the for loop (whilst disabling the external .join()). But that makes the threads execute in a chain, which is not desirable, since the main intention of using threads was to execute the same functions with different data values simultaneously.
I am quite new to Rust in general (it's been 3 weeks since I started using it) and it's my first time ever implementing multithreading. I have only ever written single-threaded programs in Java and Python before this, so if possible be a little noob-friendly. However, any help is appreciated :).
I figured out what needed to happen. First, for this kind of thing, I agree that into_iter does what you want, but IMO it obscures why. The why is that when you borrow on it, it doesn't own the value, which the join() method on the JoinHandle<()> struct requires. You'll note its signature takes self and not &mut self or anything like that. So it needs the actual object there.
To do that, you need to get your object out of the Vec<thread::JoinHandle<()>> that it's inside. As stated, into_iter does this, because it "destroys" the existing Vec and takes it over, so it fully owns the contents, and the iteration returns the "actual" objects to be joined without a copy. But you can also own the contents one at a time with remove, as demonstrated below:
while fetch_handle.len() > 0 {
    let cur_thread = fetch_handle.remove(0); // moves it into cur_thread
    cur_thread.join().unwrap();
}
This is instead of your for loop above. The complete example in the playground is linked if you want to try that.
I hope this makes it clearer how to work with things that can't be copied but whose methods need to fully own them, and the issues in getting them out of collections. Imagine if you needed to end just one of those threads, and you knew which one, but didn't want to end them all: Vec::remove would work, but into_iter would not.
Thank you for asking a question which made me think, and prompted me to go look up the answer (and try it) myself. I'm still learning Rust as well, so this helped a lot.
Edit:
Another way to do it with pop() and while let:
while let Some(cur_thread) = fetch_handle.pop() {
    cur_thread.join().unwrap();
}
This goes through the vector from the end (pop pulls from the end, not the front), but it avoids both the reallocation and the shifting of the vector contents that pulling from the front causes.
Okay, so the problem, as pointed out by @PiRocks, seems to be in the for loop that joins the threads.
for thread in fetch_handle.iter_mut() {
    thread.join().unwrap();
}
The problem is the iter_mut(). Using into_iter() instead
for thread in fetch_handle.into_iter() {
    thread.join().unwrap();
}
yields no errors and the program runs across the threads simultaneously as required.
The explanation for this, as given by @Kevin Anderson, is:
Using into_iter() causes JoinHandle<()> to move into the for loop.
Also, looking into the docs (std::iter),
I found that iter() and iter_mut() iterate over references to self, whereas into_iter() iterates over self directly (owning it).
So iter_mut() was iterating over &mut thread::JoinHandle<()> instead of thread::JoinHandle<()>.
I have a program that utilizes a Windows API via a C FFI (via winapi-rs). One of the functions expects a pointer to a pointer to a string as an output parameter. The function will store its result into this string. I'm using a variable of type WideCString for this string.
Can I "just" pass in a mutable ref to a ref to a string into this function (inside an unsafe block) or should I rather use a functionality like .into_raw() and .from_raw() that also moves the ownership of the variable to the C function?
Both versions compile and work but I'm wondering whether I'm buying any disadvantages with the direct way.
Here are the relevant lines from my code utilizing .into_raw and .from_raw.
let mut widestr: WideCString = WideCString::from_str("test").unwrap(); // this is the string where the result should be stored
let mut security_descriptor_ptr: winnt::LPWSTR = widestr.into_raw();

let rtrn3 = unsafe {
    advapi32::ConvertSecurityDescriptorToStringSecurityDescriptorW(
        sd_buffer.as_mut_ptr() as *mut std::os::raw::c_void,
        1,
        winnt::DACL_SECURITY_INFORMATION,
        &mut security_descriptor_ptr,
        ptr::null_mut(),
    )
};

if rtrn3 == 0 {
    match IOError::last_os_error().raw_os_error() {
        Some(1008) => println!("Need to fix this errror in get_acl_of_file."), // Do nothing. No idea why this error occurs
        Some(e) => panic!("Unknown OS error in get_acl_of_file {}", e),
        None => panic!("That should not happen in get_acl_of_file!"),
    }
}

let mut rtr: WideCString = unsafe { WideCString::from_raw(security_descriptor_ptr) };
The description of this parameter in MSDN says:
A pointer to a variable that receives a pointer to a null-terminated security descriptor string. For a description of the string format, see Security Descriptor String Format. To free the returned buffer, call the LocalFree function.
I am expecting the function to change the value of the variable. Doesn't that - per definition - mean that I'm moving ownership?
I am expecting the function to change the value of the variable. Doesn't that - per definition - mean that I'm moving ownership?
No. One key way to think about ownership is: who is responsible for destroying the value when you are done with it.
Competent C APIs (and Microsoft generally falls into this category) document expected ownership rules, although sometimes the words are oblique or assume some level of outside knowledge. This particular function says:
To free the returned buffer, call the LocalFree function.
That means that ConvertSecurityDescriptorToStringSecurityDescriptorW is going to perform some kind of allocation and return that to the user. Checking out the function declaration, you can also see that the parameter is documented as an "out" parameter:
_Out_ LPTSTR *StringSecurityDescriptor,
Why is it done this way? Because the caller doesn't know how much memory to allocate to store the string¹!
Normally, you'd pass a reference to uninitialized memory into the function which must then initialize it for you.
This compiles, but you didn't provide enough context to actually call it, so who knows if it works:
extern crate advapi32;
extern crate winapi;
extern crate widestring;

use std::{mem, ptr, io};
use winapi::{winnt, PSECURITY_DESCRIPTOR};
use widestring::WideCString;

fn foo(sd_buffer: PSECURITY_DESCRIPTOR) -> WideCString {
    let mut security_descriptor = unsafe { mem::uninitialized() };
    let retval = unsafe {
        advapi32::ConvertSecurityDescriptorToStringSecurityDescriptorW(
            sd_buffer,
            1,
            winnt::DACL_SECURITY_INFORMATION,
            &mut security_descriptor,
            ptr::null_mut(),
        )
    };
    if retval == 0 {
        match io::Error::last_os_error().raw_os_error() {
            Some(1008) => println!("Need to fix this errror in get_acl_of_file."), // Do nothing. No idea why this error occurs
            Some(e) => panic!("Unknown OS error in get_acl_of_file {}", e),
            None => panic!("That should not happen in get_acl_of_file!"),
        }
    }
    unsafe { WideCString::from_raw(security_descriptor) }
}

fn main() {
    let x = foo(ptr::null_mut());
    println!("{:?}", x);
}
[dependencies]
winapi = { git = "https://github.com/nils-tekampe/winapi-rs/", rev = "1bb62e2c22d0f5833cfa9eec1db2c9cfc2a4a303" }
advapi32-sys = { git = "https://github.com/nils-tekampe/winapi-rs/", rev = "1bb62e2c22d0f5833cfa9eec1db2c9cfc2a4a303" }
widestring = "*"
Answering your questions directly:
Can I "just" pass in a mutable ref to a ref to a string into this function (inside an unsafe block) or should I rather use a functionality like .into_raw() and .from_raw() that also moves the ownership of the variable to the C function?
Neither. The function doesn't expect you to pass it a pointer to a string, it wants you to pass a pointer to a place where it can put a string.
I also just realized after your explanation that (as far as I understood it) in my example, the widestr variable never gets overwritten by the C function. It overwrites the reference to it but not the data itself.
It's very likely that the memory allocated by WideCString::from_str("test") is completely leaked, as nothing has a reference to that pointer after the function call.
Is this a general rule that a C (WinAPI) function will always allocate the buffer by itself (if not following the two step approach where it first returns the size)?
I don't believe there are any general rules between C APIs or even inside of a C API. Especially at a company as big as Microsoft with so much API surface. You need to read the documentation for each method. This is part of the constant drag that can make writing C feel like a slog.
it somehow feels odd for me to hand over uninitialized memory to such a function.
Yep, because there's not really a guarantee that the function initializes it. In fact, it would be wasteful to initialize it in case of failure, so it probably doesn't. It's another thing that Rust seems to have nicer solutions for.
Note that you shouldn't make other function calls (e.g. println!) before calling things like last_os_error; those calls might change the value of the last error!
¹ Other Windows APIs actually require a multistep process: you call the function with NULL, it returns the number of bytes you need to allocate, then you call it again.