How to add state to a function in Rust

Rust has anonymous closures with state. Can I do the same with a named function?
(invalid pseudocode)
fn counting_function()->i32 {
let mut static counter = 0;
counter = counter + 1;
return counter.clone();
}
I understand I can use structs and functions/traits to do this, and I understand that iterators are the proper way to do it. But leaving aside structs with traits and iterators, can I do this without passing any burden (of initializing a structure) to the caller?

This is a thread-safe variant using an atomic:
use std::sync::atomic::{AtomicUsize, Ordering};
fn counting_function() -> usize {
    static COUNTER: AtomicUsize = AtomicUsize::new(0);
    // fetch_add returns the value before the increment, so the first call returns 0.
    COUNTER.fetch_add(1, Ordering::Relaxed)
}
But I'd say it's actually a code smell.
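To check the thread-safety claim, here is a small sketch that calls the atomic counter from several threads and gathers every value handed out. The helper collect_from_threads is hypothetical, not part of the answer. Because fetch_add is a single atomic read-modify-write, no value should be handed out twice, even with Ordering::Relaxed.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

fn counting_function() -> usize {
    static COUNTER: AtomicUsize = AtomicUsize::new(0);
    // fetch_add returns the value before the increment, so the first call yields 0.
    COUNTER.fetch_add(1, Ordering::Relaxed)
}

/// Hypothetical helper: calls `counting_function` `calls_per_thread` times on
/// each of `threads` threads and returns every value that was handed out.
fn collect_from_threads(threads: usize, calls_per_thread: usize) -> Vec<usize> {
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            thread::spawn(move || {
                (0..calls_per_thread)
                    .map(|_| counting_function())
                    .collect::<Vec<usize>>()
            })
        })
        .collect();
    handles
        .into_iter()
        .flat_map(|handle| handle.join().unwrap())
        .collect()
}
```

With 4 threads making 1000 calls each, you should get 4000 distinct values, 0 through 3999, in no particular order.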

Your pseudocode almost works as is. To work with a static mut variable, you need to mark the code that reads and modifies it as unsafe, because these operations are not thread-safe.
fn counting_function() -> u32 {
    static mut COUNTER: u32 = 0;
    // SAFETY: sound only if this function is never called from two threads at once.
    unsafe {
        let retval = COUNTER;
        COUNTER += 1;
        retval
    }
}
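If per-thread state is acceptable, a safe alternative that neither answer mentions is a thread_local! holding a Cell. This is a sketch of that swapped-in technique, assuming each thread may have its own independent counter. It needs no unsafe and no atomics, and, like the question's pseudocode, the first call returns 1.

```rust
use std::cell::Cell;

fn counting_function() -> u32 {
    thread_local! {
        // One counter per thread, initialized lazily on first access.
        static COUNTER: Cell<u32> = Cell::new(0);
    }
    COUNTER.with(|c| {
        let next = c.get() + 1;
        c.set(next);
        next // first call on each thread returns 1
    })
}
```

The trade-off is that a new thread starts again from 1, so this does not give process-wide unique values.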

Related

How to use a macro to generate compile-time unique integers?

I need several parts of a program, in different modules, to have a unique integer.
for example:
pub fn foo() -> u64 {
unique_integer!()
}
pub fn bar() -> u64 {
unique_integer!()
}
(foo() should never return the same value as bar(), but the values themselves are meaningless and do not need to be stable across builds. All invocations of foo() must return the same value, as must all invocations of bar(). It is preferred, but not essential, that the values are contiguous.)
Is there a way of using a macro to do this?
You could compute a compile-time hash from the module path (which contains the crate and the modules leading up to the file), the file name, and the line and column number of the macro invocation, like this:
/// 64-bit FNV-1a hash of an invocation site, usable in const contexts.
pub const fn hash(module_path: &'static str, file: &'static str, line: u32, column: u32) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    let prime: u64 = 0x00000100000001B3; // FNV prime
    // Mix in each byte of the module path...
    let mut bytes = module_path.as_bytes();
    let mut i = 0;
    while i < bytes.len() {
        hash ^= bytes[i] as u64;
        hash = hash.wrapping_mul(prime);
        i += 1;
    }
    // ...then each byte of the file name...
    bytes = file.as_bytes();
    i = 0;
    while i < bytes.len() {
        hash ^= bytes[i] as u64;
        hash = hash.wrapping_mul(prime);
        i += 1;
    }
    // ...and finally the line and column of the invocation.
    hash ^= line as u64;
    hash = hash.wrapping_mul(prime);
    hash ^= column as u64;
    hash = hash.wrapping_mul(prime);
    hash
}
macro_rules! unique_number {
() => {{
const UNIQ: u64 = crate::hash(module_path!(), file!(), line!(), column!());
UNIQ
}};
}
fn foo() -> u64 {
unique_number!()
}
fn bar() -> u64 {
unique_number!()
}
fn main() {
println!("{} {}", foo(), bar()); // 2098219922142993841 2094402417770602149 on the playground
}
Compared with the top answer, which can return different values depending on the order of invocation, this gives consistent results. It is also computed entirely at compile time, which removes the runtime overhead of maintaining a counter.
The only downside is that hash collisions are possible, although the chance is low. If you want, you could try implementing an algorithm that computes perfect hash values. The example uses the FNV-1a algorithm, which should be decent but is not perfect.
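A sketch of the same scheme in use, assuming a behavior-preserving restructuring of the FNV-1a loop into small helpers (mix, mix_bytes and site_hash are hypothetical names, not from the answer). Because the macro expands to a block containing a const item, it can also initialize const items. Note that this version calls site_hash unqualified, so the call sites must be in the same module as site_hash, whereas the answer's crate::hash works from anywhere in the crate.

```rust
// Same FNV-1a computation as the answer's `hash`, split into helpers.
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// One FNV-1a step: xor the value in, then multiply by the prime.
const fn mix(hash: u64, value: u64) -> u64 {
    (hash ^ value).wrapping_mul(FNV_PRIME)
}

/// Applies `mix` to every byte of `bytes`.
const fn mix_bytes(mut hash: u64, bytes: &[u8]) -> u64 {
    let mut i = 0;
    while i < bytes.len() {
        hash = mix(hash, bytes[i] as u64);
        i += 1;
    }
    hash
}

/// Hash of an invocation site: module path, file, line and column.
const fn site_hash(module_path: &str, file: &str, line: u32, column: u32) -> u64 {
    let hash = mix_bytes(FNV_OFFSET, module_path.as_bytes());
    let hash = mix_bytes(hash, file.as_bytes());
    let hash = mix(hash, line as u64);
    mix(hash, column as u64)
}

macro_rules! unique_number {
    () => {{
        // line!() and column!() report where unique_number! was invoked.
        const UNIQ: u64 = site_hash(module_path!(), file!(), line!(), column!());
        UNIQ
    }};
}

// The expansion is a constant block, so it can initialize const items too.
const FOO_ID: u64 = unique_number!();
const BAR_ID: u64 = unique_number!();

fn foo() -> u64 {
    unique_number!()
}

fn bar() -> u64 {
    unique_number!()
}
```

For two sites that differ only in line number (same module, file and column), the values are guaranteed to differ, because each FNV-1a step (xor, then multiply by an odd prime modulo 2^64) is a bijection.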
This is not exactly a macro, but here is a proposal:
#[repr(u64)]
enum Unique {
Foo,
Bar,
}
pub fn foo() -> u64 {
Unique::Foo as u64
}
pub fn bar() -> u64 {
Unique::Bar as u64
}
The compiler should warn you if you don't use a variant.
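Since the question prefers contiguous values, it may help to spell out that enum discriminants start at 0 and increase by one per variant. A sketch, with a hypothetical third call site baz added to show how the enum grows:

```rust
// Discriminants are assigned 0, 1, 2, ... in declaration order.
#[repr(u64)]
enum Unique {
    Foo,
    Bar,
    Baz, // hypothetical third call site
}

pub fn foo() -> u64 {
    Unique::Foo as u64
}

pub fn bar() -> u64 {
    Unique::Bar as u64
}

pub fn baz() -> u64 {
    Unique::Baz as u64
}
```

The cost, compared with the macro answers, is that every new call site requires editing the enum.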
No, you cannot use a regular macro for this. However, you might be able to find a procedural macro crate that provides this functionality.
That being said...
This is not safe Rust, but if we are willing to throw safety out the window, this should do the trick.
macro_rules! unique_u64 {
() => {{
struct PlaceHolder;
let id = ::std::any::TypeId::of::<PlaceHolder>();
unsafe { ::std::mem::transmute::<_, u64>(id) }
}};
}
This is probably undefined behavior, but since every type should have a unique TypeId, it would have the desired effect. The only reason I know this is even possible is that I have looked at the structure of TypeId and know it contains a single u64 to distinguish types. However, there are plans to change TypeId from a u64 to something more stable and less prone to this kind of unsafe code. We have no guarantees about what the contents of TypeId might change to, and when it does change, this could fail silently if the new representation is still the same size as a u64. (It has since changed: on current compilers TypeId is wider than 64 bits, so this transmute is rejected at compile time with a size mismatch.)
Alternatively, we can achieve a similar result in safe Rust by hashing the TypeId. Strictly speaking, this bends the rules, since nothing guarantees that the hash is unique. However, it seems highly unlikely that two different TypeIds would hash to the same value. Plus, this stays within safe Rust and is unlikely to break in future releases of Rust.
macro_rules! unique_u64 {
() => {{
use ::std::hash::{Hash, Hasher};
struct PlaceHolder;
let id = ::std::any::TypeId::of::<PlaceHolder>();
let mut hasher = ::std::collections::hash_map::DefaultHasher::new();
id.hash(&mut hasher);
hasher.finish()
}};
}
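Used from the question's foo and bar, the hashed version behaves as sketched below. Unlike the FNV answer, the hash is computed at run time. DefaultHasher::new() always starts from the same keys, so a given site returns the same value on every call within one build, but the standard library does not promise that DefaultHasher's algorithm stays the same across Rust releases.

```rust
macro_rules! unique_u64 {
    () => {{
        use ::std::hash::{Hash, Hasher};
        // Every expansion declares its own, distinct PlaceHolder type.
        struct PlaceHolder;
        let id = ::std::any::TypeId::of::<PlaceHolder>();
        let mut hasher = ::std::collections::hash_map::DefaultHasher::new();
        id.hash(&mut hasher);
        hasher.finish()
    }};
}

pub fn foo() -> u64 {
    unique_u64!()
}

pub fn bar() -> u64 {
    unique_u64!()
}
```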
It's possible to do something like this with once_cell, using a static atomic variable as a counter:
use core::sync::atomic::{Ordering, AtomicU64};
use once_cell::sync::Lazy;
static COUNTER: AtomicU64 = AtomicU64::new(0);
fn foo() -> u64 {
static LOCAL_COUNTER: Lazy<u64> = Lazy::new(|| COUNTER.fetch_add(1, Ordering::Relaxed));
*LOCAL_COUNTER
}
fn bar() -> u64 {
static LOCAL_COUNTER: Lazy<u64> = Lazy::new(|| COUNTER.fetch_add(1, Ordering::Relaxed));
*LOCAL_COUNTER
}
fn main() {
dbg!(foo()); // 0
dbg!(foo()); // still 0
dbg!(bar()); // 1
dbg!(foo()); // unchanged - 0
dbg!(bar()); // unchanged - 1
}
And, yes, the repeating code can, as usual, be wrapped in a macro:
macro_rules! unique_integer {
() => {{
static LOCAL_COUNTER: Lazy<u64> = Lazy::new(|| COUNTER.fetch_add(1, Ordering::Relaxed));
*LOCAL_COUNTER
}}
}
fn foo() -> u64 {
unique_integer!()
}
fn bar() -> u64 {
unique_integer!()
}
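On Rust 1.70 or later, std::sync::OnceLock covers the same ground without the once_cell dependency. This is a sketch of that swap, assuming the same per-site static plus shared atomic counter design. Ids are assigned in order of first call, so, as with the Lazy version, they are contiguous but depend on call order.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::OnceLock;

// Shared counter handing out the next free id.
static COUNTER: AtomicU64 = AtomicU64::new(0);

macro_rules! unique_integer {
    () => {{
        // One LOCAL static per expansion site; it is filled exactly once,
        // by the first call that reaches this site.
        static LOCAL: OnceLock<u64> = OnceLock::new();
        *LOCAL.get_or_init(|| COUNTER.fetch_add(1, Ordering::Relaxed))
    }};
}

fn foo() -> u64 {
    unique_integer!()
}

fn bar() -> u64 {
    unique_integer!()
}
```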

How do I remove `MutexGuard` around a value?

I'm trying to use ndarray in asynchronous tasks to do linear algebra and such.
I used tokio and ndarray to create the following code.
use std::sync::{Arc, Mutex};
use ndarray::prelude::*;
use futures::future::join_all;
fn print_type_of<T>(_: &T) {
println!("{}", std::any::type_name::<T>())
}
#[tokio::main]
async fn main() {
let db = Arc::new(Mutex::new(array![0,0,0,0,0,0,0,0]));
let mut handles = vec![];
for i in 0..8 {
let db = db.clone();
handles.push(tokio::spawn(async move {
print(i, db).await;
}));
}
join_all(handles).await;
let array = Arc::try_unwrap(db).unwrap();
let array = array.lock().unwrap();
print_type_of(&array); // -> std::sync::mutex::MutexGuard<ndarray::ArrayBase<ndarray::data_repr::OwnedRepr<u32>, ndarray::dimension::dim::Dim<[usize; 1]>>>
}
async fn print(i: u32, db: Arc<Mutex<Array1<u32>>>) {
    let mut tmp = 0;
    // time-consuming process
    for k in 0..100000000 {
        tmp = k;
    }
    tmp += i;
    let mut db = db.lock().unwrap();
    db.fill(i);
    print_type_of(&db);
}
I would like to convert a value of type std::sync::mutex::MutexGuard<ndarray::ArrayBase<OwnedRepr<u32>, Dim<[usize; 1]>>>
into a value of type ndarray::ArrayBase<OwnedRepr<u32>, Dim<[usize; 1]>>.
How can I do this?
You can't. That's the whole point of MutexGuard: if you could take the data out of the MutexGuard, then you would be able to make a reference that can be accessed without locking the mutex, defeating the whole purpose of having a mutex in the first place.
Depending on what you really want to do, one of the following solutions might apply to you:
Most of the time, you don't need to take the data out of the mutex: MutexGuard<T> implements Deref<Target = T> and DerefMut, so you can use a MutexGuard everywhere you would use a &T or a &mut T. Note that if you change your code to call print_type_of(&*array) instead of print_type_of(&array), it will print the inner type.
If you really need to, you can take the data out of the Mutex itself (but not the MutexGuard) with into_inner, which consumes the mutex, ensuring that no one else can ever access it:
let array = Arc::try_unwrap(db).unwrap();
let array = array.into_inner().unwrap();
print_type_of(&array); // -> ndarray::ArrayBase<ndarray::data_repr::OwnedRepr<u32>, ndarray::dimension::dim::Dim<[usize; 1]>>
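A std-only sketch of both options, assuming a plain Vec<u32> in place of the question's ndarray Array1<u32>; fill, total and take_inner are hypothetical helpers. The first two work through the guard, and the third moves the value out only when no other Arc clone is alive.

```rust
use std::sync::{Arc, Mutex};

// A plain Vec<u32> stands in for the question's ndarray Array1<u32>.

/// Writes through the guard: DerefMut lets it act like `&mut Vec<u32>`.
fn fill(db: &Mutex<Vec<u32>>, value: u32) {
    let mut guard = db.lock().unwrap();
    for x in guard.iter_mut() {
        *x = value;
    }
}

/// Reads through the guard: `&*guard` is an ordinary `&Vec<u32>` that lives
/// only as long as the guard (and therefore the lock) does.
fn total(db: &Mutex<Vec<u32>>) -> u32 {
    let guard = db.lock().unwrap();
    let inner: &Vec<u32> = &*guard;
    inner.iter().sum()
}

/// Moves the value out of `Arc<Mutex<T>>`. Returns `None` if other `Arc`
/// clones are still alive or the mutex was poisoned.
fn take_inner<T>(db: Arc<Mutex<T>>) -> Option<T> {
    Arc::try_unwrap(db).ok()?.into_inner().ok()
}
```

Passing &db where db is an Arc<Mutex<Vec<u32>>> works because &Arc<T> deref-coerces to &T.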

Thread-safe mutable non-owning pointer in Rust?

I'm trying to parallelize an algorithm I have. This is a sketch of how I would write it in C++:
void thread_func(std::vector<int>& results, int threadid) {
results[threadid] = threadid;
}
std::vector<int> foo() {
std::vector<int> results(4);
for(int i = 0; i < 4; i++)
{
spawn_thread(thread_func, results, i);
}
join_threads();
return results;
}
The point here is that each thread has a reference to a shared, mutable object that it does not own. It seems like this is difficult to do in Rust. Should I try to cobble it together in terms of (and I'm guessing here) Mutex, Cell and &mut, or is there a better pattern I should follow?
The proper way is to use Arc<Mutex<...>> or, for example, Arc<RwLock<...>>. Arc is a thread-safe, shared-ownership pointer to immutable data, and Mutex/RwLock add synchronized interior mutability. Your code would then look like this:
use std::sync::{Arc, Mutex};
use std::thread;
fn thread_func(results: Arc<Mutex<Vec<i32>>>, thread_id: i32) {
let mut results = results.lock().unwrap();
results[thread_id as usize] = thread_id;
}
fn foo() -> Arc<Mutex<Vec<i32>>> {
let results = Arc::new(Mutex::new(vec![0; 4]));
let handles: Vec<_> = (0..4).map(|i| {
let results = results.clone();
thread::spawn(move || thread_func(results, i))
}).collect();
for handle in handles {
handle.join().unwrap();
}
results
}
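This unfortunately requires you to return Arc<Mutex<Vec<i32>>> from the function, because at the time of writing there was no way to "unwrap" the value; an alternative is to clone the vector before returning. On current Rust, once all threads have been joined, Arc::try_unwrap followed by Mutex::into_inner recovers the plain Vec<i32>.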
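A sketch of foo returning the plain Vec<i32> with Arc::try_unwrap and Mutex::into_inner. It relies on every Arc clone having been moved into a thread that is joined before try_unwrap runs, so the Arc in foo is the last one left.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn thread_func(results: Arc<Mutex<Vec<i32>>>, thread_id: i32) {
    let mut results = results.lock().unwrap();
    results[thread_id as usize] = thread_id;
}

fn foo() -> Vec<i32> {
    let results = Arc::new(Mutex::new(vec![0; 4]));
    let handles: Vec<_> = (0..4)
        .map(|i| {
            let results = Arc::clone(&results);
            thread::spawn(move || thread_func(results, i))
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    // Each clone was dropped when its thread finished, so this is the last Arc
    // and try_unwrap succeeds; into_inner then removes the Mutex.
    Arc::try_unwrap(results)
        .expect("all threads have been joined")
        .into_inner()
        .unwrap()
}
```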
However, using a crate like scoped_threadpool (whose approach could only recently be made sound; something like it will probably make it into the standard library to replace the now deprecated thread::scoped() function, which is unsafe), it can be done in a much nicer way. (It did: std::thread::scope has been stable since Rust 1.63.)
extern crate scoped_threadpool;
use scoped_threadpool::Pool;
fn thread_func(result: &mut i32, thread_id: i32) {
*result = thread_id;
}
fn foo() -> Vec<i32> {
let mut results = vec![0; 4];
let mut pool = Pool::new(4);
pool.scoped(|scope| {
for (i, e) in results.iter_mut().enumerate() {
scope.execute(move || thread_func(e, i as i32));
}
});
results
}
If your thread_func needs to access the whole vector, however, you can't get away without synchronization, so you would need a Mutex, and you would still get the unwrapping problem:
extern crate scoped_threadpool;
use std::sync::Mutex;
use scoped_threadpool::Pool;
fn thread_func(results: &Mutex<Vec<i32>>, thread_id: i32) {
    let mut results = results.lock().unwrap();
    results[thread_id as usize] = thread_id;
}
fn foo() -> Vec<i32> {
    let results = Mutex::new(vec![0; 4]);
    let mut pool = Pool::new(4);
    pool.scoped(|scope| {
        // Borrow once, so each `move` closure copies the reference instead of moving the Mutex.
        let results = &results;
        for i in 0..4 {
            scope.execute(move || thread_func(results, i));
        }
    });
    results.lock().unwrap().clone()
}
But at least you don't need any Arcs here. Also, at the time of writing, the execute() method was unsafe on the stable compiler, which lacked the corresponding fix to make it safe; according to the crate's build script, it is safe on all compiler versions greater than 1.4.0.
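Since Rust 1.63, std::thread::scope handles the scoped_threadpool case without an external crate. A sketch of the per-element version with it, assuming one thread per element is acceptable (scope spawns OS threads rather than reusing a pool):

```rust
use std::thread;

fn thread_func(result: &mut i32, thread_id: i32) {
    *result = thread_id;
}

fn foo() -> Vec<i32> {
    let mut results = vec![0; 4];
    // All threads spawned on `s` are joined before thread::scope returns,
    // so they may borrow elements of `results` mutably, with no Arc or Mutex.
    thread::scope(|s| {
        for (i, e) in results.iter_mut().enumerate() {
            s.spawn(move || thread_func(e, i as i32));
        }
    });
    results
}
```

For large vectors, splitting the work with chunks_mut keeps the number of threads bounded.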

More convenient way to work with strings in winapi calls

I'm looking for more convenient way to work with std::String in winapi calls in Rust.
Using Rust 0.12.0-nightly with winapi 0.1.22 and user32-sys 0.1.1.
Now I'm using something like this:
use std::mem;
use winapi;
use user32;
pub fn get_window_title(handle: i32) -> String {
    let mut v: Vec<u16> = Vec::new();
    v.reserve(255);
    let p = v.as_mut_ptr();
    let cap = v.capacity();
    unsafe {
        mem::forget(v);
        let read_len = user32::GetWindowTextW(handle as winapi::HWND, p, 255);
        if read_len > 0 {
            return String::from_utf16_lossy(Vec::from_raw_parts(p, read_len as usize, cap).as_slice());
        } else {
            return "".to_string();
        }
    }
}
I think this vector-based memory allocation is rather bizarre, so I'm looking for an easier way to convert an LPCWSTR to a std::String.
In your situation, you always want a maximum of 255 UTF-16 code units (u16 values), so you can use a fixed-size array instead of a vector. This reduces the entire boilerplate to a zero-initialized array, an as_mut_ptr() call and a slicing operation. (Avoid mem::uninitialized() here: it is deprecated and is undefined behavior for integer arrays.)
let mut v = [0u16; 255];
let read_len = unsafe {
    user32::GetWindowTextW(handle as winapi::HWND, v.as_mut_ptr(), v.len() as i32)
};
// GetWindowTextW returns the number of characters copied (0 on failure).
String::from_utf16_lossy(&v[..read_len as usize])
In case you want to use a Vec, there's an easier way than destroying the Vec and re-creating it: write to the Vec's buffer directly and let Rust handle everything else.
let mut v: Vec<u16> = Vec::with_capacity(255);
unsafe {
    let read_len = user32::GetWindowTextW(
        handle as winapi::HWND,
        v.as_mut_ptr(),
        v.capacity() as i32,
    );
    v.set_len(read_len as usize); // this is undefined behavior if read_len > v.capacity()
    String::from_utf16_lossy(&v)
}
As a side note, it is idiomatic in Rust not to use return on the last statement of a function, but to let the final expression stand without a semicolon. In your original code, the final if-expression could be written as
if read_len > 0 {
String::from_utf16_lossy(Vec::from_raw_parts(p, read_len as usize, cap).as_slice())
} else {
"".to_string()
}
but I removed the entire condition from my samples, as it is unnecessary to handle 0 read characters differently from n characters.
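For the broader question of moving strings in and out of winapi calls, a pair of small helpers often removes most of the boilerplate. This is a sketch with hypothetical names, to_wide and from_wide, which are not functions provided by winapi; it uses only the standard library, so it compiles on any platform.

```rust
/// Encodes `s` as a NUL-terminated UTF-16 buffer, the shape LPCWSTR parameters expect.
fn to_wide(s: &str) -> Vec<u16> {
    s.encode_utf16().chain(std::iter::once(0)).collect()
}

/// Decodes a UTF-16 buffer filled by a ...W API, stopping at the first NUL
/// (or at the end of the slice if there is none). Invalid surrogates become U+FFFD.
fn from_wide(buf: &[u16]) -> String {
    let len = buf.iter().position(|&c| c == 0).unwrap_or(buf.len());
    String::from_utf16_lossy(&buf[..len])
}
```

With these, the title lookup becomes from_wide(&v[..read_len as usize]) on the zero-initialized array, and an input string is passed as to_wide(title).as_ptr(), keeping the Vec alive for the duration of the call.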

Lazy sequence generation in Rust

How can I create what other languages call a lazy sequence or a "generator" function?
In Python, I can use yield as in the following example (from Python's docs) to lazily generate a sequence that is iterable in a way that does not use the memory of an intermediary list:
# a generator that yields items instead of returning a list
def firstn(n):
num = 0
while num < n:
yield num
num += 1
sum_of_first_n = sum(firstn(1000000))
How can I do something similar in Rust?
Rust does have generators, but they are highly experimental and not currently available in stable Rust.
Works in stable Rust 1.0 and above
Range handles your concrete example. You can use it with the .. syntactic sugar:
fn main() {
let sum: u64 = (0..1_000_000).sum();
println!("{}", sum)
}
What if Range didn't exist? We can create an iterator that models it!
struct MyRange {
start: u64,
end: u64,
}
impl MyRange {
fn new(start: u64, end: u64) -> MyRange {
MyRange {
start: start,
end: end,
}
}
}
impl Iterator for MyRange {
type Item = u64;
fn next(&mut self) -> Option<u64> {
if self.start == self.end {
None
} else {
let result = Some(self.start);
self.start += 1;
result
}
}
}
fn main() {
let sum: u64 = MyRange::new(0, 1_000_000).sum();
println!("{}", sum)
}
The guts are the same, but more explicit than the Python version. Notably, Python's generators keep track of the state for you. Rust prefers explicitness, so we have to create our own state and update it manually. The important part is the implementation of the Iterator trait. We specify that the iterator yields values of a specific type (type Item = u64) and then deal with stepping each iteration and how to tell we have reached the end of iteration.
This example is not as powerful as the real Range, which uses generics, but shows an example of how to go about it.
Works in nightly Rust
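Nightly Rust does have generators, but they are highly experimental. You need to bring in a few unstable features to create one. However, it looks pretty close to the Python example, with some Rust-specific additions. The code targets the 2020 nightly named in its first comment; nightly has since renamed the feature to coroutines (#![feature(coroutines, coroutine_trait)], std::ops::Coroutine), so expect to adapt it: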
// 1.43.0-nightly (2020-02-09 71c7e149e42cb0fc78a8)
#![feature(generators, generator_trait)]
use std::{
ops::{Generator, GeneratorState},
pin::Pin,
};
fn firstn(n: u64) -> impl Generator<Yield = u64, Return = ()> {
move || {
let mut num = 0;
while num < n {
yield num;
num += 1;
}
}
}
Since everything in current Rust operates on iterators, we create an adapter that converts a generator into an iterator in order to play with the broader ecosystem. I'd expect that such an adapter would be present in the standard library eventually:
struct GeneratorIteratorAdapter<G>(Pin<Box<G>>);
impl<G> GeneratorIteratorAdapter<G>
where
G: Generator<Return = ()>,
{
fn new(gen: G) -> Self {
Self(Box::pin(gen))
}
}
impl<G> Iterator for GeneratorIteratorAdapter<G>
where
G: Generator<Return = ()>,
{
type Item = G::Yield;
fn next(&mut self) -> Option<Self::Item> {
match self.0.as_mut().resume(()) {
GeneratorState::Yielded(x) => Some(x),
GeneratorState::Complete(_) => None,
}
}
}
Now we can use it:
fn main() {
let generator_iterator = GeneratorIteratorAdapter::new(firstn(1_000_000));
let sum: u64 = generator_iterator.sum();
println!("{}", sum);
}
What's interesting about this is that it's less powerful than an implementation of Iterator. For example, iterators have the size_hint method, which allows consumers of the iterator to have an idea of how many elements are remaining. This allows optimizations when collecting into a container. Generators do not have any such information.
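To make the size_hint point concrete, here is the MyRange iterator from the stable answer with an exact size_hint added. This is a sketch: the ExactSizeIterator impl is an addition, next uses >= so that a range with start > end is simply empty, and the as usize cast assumes the length fits in usize.

```rust
struct MyRange {
    start: u64,
    end: u64,
}

impl Iterator for MyRange {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        if self.start >= self.end {
            None
        } else {
            let result = self.start;
            self.start += 1;
            Some(result)
        }
    }

    // The iterator knows exactly how many items remain, so it reports that;
    // collect() can then allocate the right capacity up front.
    fn size_hint(&self) -> (usize, Option<usize>) {
        let remaining = self.end.saturating_sub(self.start) as usize;
        (remaining, Some(remaining))
    }
}

// With an exact size_hint, the default ExactSizeIterator::len() is correct.
impl ExactSizeIterator for MyRange {}
```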
As of Rust 1.34 stable, you have the convenient std::iter::from_fn utility. It is not a coroutine (i.e. you still have to return each time), but at least it saves you from defining another struct.
from_fn accepts a closure FnMut() -> Option<T> and repeatedly calls it to create an Iterator<Item = T>. In pseudo-Python: def from_fn(f): while (val := f()) is not None: yield val.
// On compilers older than Rust 1.26, return a boxed trait object instead of impl Iterator
fn firstn(n: u64) -> impl std::iter::Iterator<Item = u64> {
    let mut num = 0;
    // The closure owns `num` and advances it on every call.
    std::iter::from_fn(move || {
        if num < n {
            let result = num;
            num += 1;
            Some(result)
        } else {
            None
        }
    })
}
fn main() {
let sum_of_first_n = firstn(1000000).sum::<u64>();
println!("sum(0 to 999999): {}", sum_of_first_n);
}
std::iter::successors is also available. It is less general but might be a bit easier to use since you just pass around the seed value explicitly. In pseudo-Python: def successors(seed, f): while seed is not None: yield seed; seed = f(seed).
fn firstn(n: u64) -> impl std::iter::Iterator<Item = u64> {
    std::iter::successors(
        // No seed at all when n == 0, so that firstn(0) is empty.
        if n > 0 { Some(0) } else { None },
        move |&num| {
            let next = num + 1;
            if next < n {
                Some(next)
            } else {
                None
            }
        },
    )
}
However, Shepmaster's note about generators applies to these utilities too. (tl;dr: a hand-rolled Iterator can implement size_hint, which lets consumers such as collect preallocate; from_fn and successors report little or no size information.)
(Note: returning impl Trait requires Rust 1.26 or newer and works in every edition. See the Rust 1.26 release announcement or Rust By Example for an explanation.)
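A quick way to see that note in practice, using the from_fn version of firstn from this answer (restated so the block stands alone): FromFn does not override size_hint, so it reports the default (0, None), whereas a Range reports an exact count that collect can use to preallocate.

```rust
fn firstn(n: u64) -> impl Iterator<Item = u64> {
    let mut num = 0;
    std::iter::from_fn(move || {
        if num < n {
            let result = num;
            num += 1;
            Some(result)
        } else {
            None
        }
    })
}

/// Returns (size_hint of the from_fn iterator, size_hint of an equivalent Range).
fn compare_hints(n: u64) -> ((usize, Option<usize>), (usize, Option<usize>)) {
    (firstn(n).size_hint(), (0..n).size_hint())
}
```

For n = 10 you would expect ((0, None), (10, Some(10))): the closure-based iterator yields the same items but gives its consumers no length information.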
Rust 1.0 does not have generator functions, so you'd have to do it manually with explicit iterators.
First, rewrite your Python example as a class with a next() method, since that is closer to the model you're likely to get in Rust. Then you can rewrite it in Rust with a struct that implements the Iterator trait.
You might also be able to use a function that returns a closure to achieve a similar result, but I don't think it would be possible to have that implement the Iterator trait (since it would require being called to generate a new result). Since Rust 1.34, however, std::iter::from_fn can wrap such a closure in an Iterator.
You can use my stackful Rust generator library which supports stable Rust:
#[macro_use]
extern crate generator;
use generator::{Generator, Gn};
fn firstn(n: usize) -> Generator<'static, (), usize> {
Gn::new_scoped(move |mut s| {
let mut num = 0;
while num < n {
s.yield_(num);
num += 1;
}
done!();
})
}
fn main() {
let sum_of_first_n: usize = firstn(1000000).sum();
println!("sum ={}", sum_of_first_n);
}
or more simply:
let n = 100000;
let range = Gn::new_scoped(move |mut s| {
let mut num = 0;
while num < n {
s.yield_(num);
num += 1;
}
done!();
});
let sum: usize = range.sum();
