I want to use an iterator as global state in Rust. Simplified example:
static nums = (0..).filter(|&n|n%2==0);
Is this possible?
You can do it, but you'll have to fight the language along the way.
First, true Rust statics created with the static declaration need to be compile-time constants. So something like static FOO: usize = 10 will compile, but static BAR: String = "foo".to_string() won't, because BAR requires a run-time allocation. While your iterator doesn't require a run-time allocation (though using it will make your life simpler, as you'll see later), its type is complex enough that it doesn't support compile-time initialization.
Second, Rust statics require specifying the full type up-front. This is a problem for arbitrary iterators, which one would like to create by combining iterator adapters and closures. While in this particular case, as mcarton points out, one could specify the type as Filter<RangeFrom<i32>, fn(&i32) -> bool>, it'd be closely tied to the current implementation. You'd have to change the type as soon as you switch to a different combinator. To avoid the hassle it's better to hide the iterator behind a dyn Iterator reference, i.e. type-erase it by putting it in a Box. Erasing the type involves dynamic dispatch, but so would specifying the filter function through a function pointer.
Third, Rust statics are read-only, and Iterator::next() takes &mut self, as it updates the state of the iteration. Statics must be read-only because Rust is multi-threaded, and writing to a static without proof that there are no readers or other writers would allow a data race in safe code. So to advance your global iterator, you must wrap it in a Mutex, which provides both thread safety and interior mutability.
After the long introduction, let's take a look at the fairly short implementation:
use lazy_static::lazy_static;
use std::sync::Mutex;
lazy_static! {
    static ref NUMS: Mutex<Box<dyn Iterator<Item = u32> + Send + Sync>> =
        Mutex::new(Box::new((0..).filter(|&n| n % 2 == 0)));
}
lazy_static is used to implement the create-on-first-use idiom to work around the non-const initial value. The first time NUMS is accessed, it will create the iterator.
As explained above, the iterator itself is boxed and wrapped in a Mutex. Since global variables are assumed to be accessed from multiple threads, our boxed iterator implements Send and Sync in addition to Iterator.
The result is used as follows:
fn main() {
    assert_eq!(NUMS.lock().unwrap().next(), Some(0)); // take single value
    assert_eq!(
        // take multiple values
        Vec::from_iter(NUMS.lock().unwrap().by_ref().take(5)),
        vec![2, 4, 6, 8, 10]
    );
}
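For comparison, the same create-on-first-use idea can be sketched with only the standard library, assuming a toolchain recent enough to provide std::sync::OnceLock. The names GlobalIter, nums, next_even and take_evens are made up for this illustration and are not part of the answer above.

```rust
use std::sync::{Mutex, OnceLock};

// Hypothetical alias: a boxed, type-erased iterator behind a Mutex.
type GlobalIter = Mutex<Box<dyn Iterator<Item = u32> + Send>>;

// Returns the global iterator, creating it on first use.
fn nums() -> &'static GlobalIter {
    static NUMS: OnceLock<GlobalIter> = OnceLock::new();
    NUMS.get_or_init(|| {
        // Box the concrete Filter<RangeFrom<u32>, _> so its type never has to be named.
        let boxed: Box<dyn Iterator<Item = u32> + Send> =
            Box::new((0u32..).filter(|n| n % 2 == 0));
        Mutex::new(boxed)
    })
}

// Takes a single value from the global iterator.
fn next_even() -> Option<u32> {
    nums().lock().unwrap().next()
}

// Takes `n` values while holding the lock only once.
fn take_evens(n: usize) -> Vec<u32> {
    let mut guard = nums().lock().unwrap();
    let values: Vec<u32> = guard.by_ref().take(n).collect();
    values
}
```

Calling next_even() and then take_evens(3) advances the same shared state, just as the NUMS example does.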
No. For multiple reasons:
Iterator types tend to be complicated. This is usually not a problem because iterator types must rarely be named, but statics must be explicitly typed. In this case the type is still relatively simple: core::iter::Filter<core::ops::RangeFrom<i32>, fn(&i32) -> bool>.
Iterator's main method, next, needs a &mut self parameter. statics can't be mutable by default, as this would not be safe.
Iterators can only be iterated once. Therefore it makes little sense to have a global iterator in the first place.
The value to initialize a static must be a constant expression. Your initializer is not a constant expression.
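If the global state itself is not essential, one possible alternative (an assumption about the use case, not something the question states) is a function that builds a fresh iterator on every call. The name evens is hypothetical.

```rust
// Each call returns a new, independent iterator, so no static is needed
// and the concrete adapter type stays hidden behind `impl Iterator`.
fn evens() -> impl Iterator<Item = u32> {
    (0u32..).filter(|n| n % 2 == 0)
}
```

Every caller then starts from 0 again, which sidesteps the "iterated only once" problem but does not share progress between callers.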
My understanding of AsMut is that it is supposed to provide a generic way to take an argument which is "equivalent" to a mutable reference, i.e. it can be cheaply converted to a mutable reference.
However, I have run into a problem with the following example code. I am trying to use AsMut just to be generic over slices, arrays and Vec, but it seems to just copy my array rather than take a mutable reference to it and modify it in place:
pub fn uses_asmut<T, M>(mut m: M)
where M: AsMut<[T]> {
    m.as_mut().swap(0,1);
}
#[test]
pub fn test_swap() {
    let arr = [1,2];
    uses_asmut(arr);
    assert_eq!(arr, [2,1]);
}
(note, I know something must be wrong since I apparently pass ownership of arr to uses_asmut as an argument, but then the borrow checker doesn't complain on the next line when I use arr again! If I change it to uses_asmut(&mut arr) the test passes, but I think the code as written shouldn't even compile!)
The conversion should be cheap. But you're passing the array by value, and that was never said to be cheap. You could pass &mut arr to not copy the array.
When we say the conversion should be cheap, we mean "don't do something like str::from_utf8_mut()" that needs to scan the whole string. Indeed, the conversion from [T; N] to [T] is extremely cheap: so cheap, that it happens automatically by the compiler (coercion).
But it does not mean it is equivalent to mutable reference, because it is not a mutable reference. It is a generic type. If you want a mutable reference, take a mutable reference. You can even use AsMut, like &mut impl AsMut<[T]>.
There is no way AsMut could prevent you from moving (or copying) things but also not require you to type the &mut at the call site.
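A minimal sketch of the &mut impl AsMut<[T]> variant mentioned above. The function name swap_first_two is made up for this example. The original call compiled because an array of Copy elements is itself Copy, so uses_asmut(arr) swapped a copy and left arr untouched.

```rust
// Takes a mutable reference to anything that can be viewed as a mutable slice.
// The caller keeps ownership, so the swap is visible after the call.
fn swap_first_two<T>(m: &mut impl AsMut<[T]>) {
    m.as_mut().swap(0, 1);
}
```

The call site then has to write the borrow explicitly, for example swap_first_two(&mut arr), and the same function accepts a Vec via swap_first_two(&mut v).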
I am converting a variety of types to String when they are passed to a function. I'm not concerned about performance as much as ergonomics, so I want the conversion to be implicit. The original, less generic implementation of the function simply used &[impl Into<String>], but I think that it should be possible to pass a variety of types at once without manually converting each to a string.
The key is that ideally, all of the following cases should be valid calls to my function:
// String literals
perform_tasks(&["Hello", "world"]);
// Owned strings
perform_tasks(&[String::from("foo"), String::from("bar")]);
// Non-string types
perform_tasks(&[1,2,3]);
// A mix of any of them
perform_tasks(&["All", 3, String::from("types!")]);
Some various signatures I've attempted to use:
fn perform_tasks(items: &[impl Into<String>])
The original version fails twice; it can't handle numeric types without manual conversion, and it requires all of the arguments to be the same type.
fn perform_tasks(items: &[impl ToString])
This is slightly closer, but it still requires all of the arguments to be of one type.
fn perform_tasks(items: &[&dyn ToString])
Doing it this way is almost enough, but it won't compile unless I manually add a borrow on each argument.
And that's where we are. I suspect that either Borrow or AsRef will be involved in a solution, but I haven't found a way to get them to handle this situation. For convenience, here is a playground link to the final signature in use (without the needed references for it to compile), alongside the various tests.
If I understand your intention correctly, the following works for the first three cases.
pub fn perform_tasks<I, A>(values: I) -> Vec<String>
where
    A: ToString,
    I: IntoIterator<Item = A>,
{
    values.into_iter().map(|s| s.to_string()).collect()
}
As the other comments pointed out, Rust does not support an array of mixed types. However, you can do one extra step to convert them into a &[&dyn fmt::Display] and then call the same function perform_tasks to get their strings.
let slice: &[&dyn std::fmt::Display] = &[&"All", &3, &String::from("types!")];
perform_tasks(slice);
Here is the playground.
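If writing the borrows by hand is the main annoyance, one option is to let a macro add them. This is only a sketch: the macro name perform_tasks_mixed is hypothetical, and it assumes the perform_tasks function from this answer, repeated here so the block is self-contained.

```rust
use std::fmt::Display;

pub fn perform_tasks<I, A>(values: I) -> Vec<String>
where
    A: ToString,
    I: IntoIterator<Item = A>,
{
    values.into_iter().map(|s| s.to_string()).collect()
}

// Hypothetical helper: borrows every argument as `&dyn Display`,
// so values of different types can appear in a single call.
macro_rules! perform_tasks_mixed {
    ($($item:expr),* $(,)?) => {
        perform_tasks([$(&$item as &dyn Display),*])
    };
}
```

With this, perform_tasks_mixed!["All", 3, String::from("types!")] compiles without any explicit & at the call site, at the cost of using a macro instead of a plain function.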
If I understand your intention correctly, you want something like this:
fn main() {
    let a = 1;
    myfn(a);
}
fn myfn(i: &dyn SomeTrait) {
    //do something
}
So you want a value to be borrowed implicitly when it is passed as a function argument. However, Rust won't borrow implicitly in argument position: borrowing is an important part of Rust's safety model, and the explicit & helps other programmers quickly identify which arguments are references and which are not. Rust therefore requires the & at the call site to avoid confusion.
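For illustration, here is a version that compiles once the borrow is written out. The trait body and the i32 implementation are assumptions made up for this sketch, since the snippet above leaves SomeTrait undefined.

```rust
// Hypothetical trait standing in for `SomeTrait` from the snippet above.
trait SomeTrait {
    fn describe(&self) -> String;
}

impl SomeTrait for i32 {
    fn describe(&self) -> String {
        format!("the integer {}", self)
    }
}

// Accepts any borrowed value whose type implements `SomeTrait`.
fn myfn(item: &dyn SomeTrait) -> String {
    item.describe()
}
```

The call has to be myfn(&a); writing myfn(a) is rejected with a type mismatch.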
So I am trying to define a method in a trait that would spawn a thread and make use of another trait method, but I am a bit stuck on how to "unpack" it from Arc<...>:
use std::sync::Arc;
use std::sync::Mutex;
use websocket::{Message, WebSocketResult};
trait Sender<S>
where
    S: Into<Message<'static>> + Send,
{
    fn send_once(&mut self, message: S) -> WebSocketResult<()>;
    fn send_in_thread(&mut self, sleep_interval: time::Duration) -> WebSocketResult<()> {
        let self_copy = Arc::new(Mutex::new(self)).clone();
        let thread_join_handle = thread::spawn(move || self_copy.send_once(message));
        thread_join_handle.join().unwrap()
    }
}
The error I get is:
no method named `send_once` found for struct `std::sync::Arc<std::sync::Mutex<&mut Self>>` in the current scope
method not found in `std::sync::Arc<std::sync::Mutex<&mut Self>>`
Which is fair, I didn't define such a method on this wrapper type, but how do I get out of this situation the shortest way? Or, the most idiomatic way? I used Arc because previously I had Self cannot be sent between threads safely if I didn't use it.
There's a couple of things going on here:
Arc<T> is used to provide "shared ownership". By default, a value has a single owner, which ensures that each value is dropped exactly once. If the same piece of data could be owned by 2 variables, it would be dropped twice. An Arc bypasses this restriction by providing a different Drop implementation: "if there are other references, decrease the reference count by one, otherwise, drop the wrapped data".
Arc dereferences to T via the Deref trait. This means that something like the following will work:
let string = Arc::new("hello");
println!("{}", string.len());
Note there is no "unpacking" needed, this happens implicitly, and is explained in some detail in this question: What is the relation between auto-dereferencing and deref coercion?
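A small sketch of both points, shared ownership and implicit dereferencing. The helper name len_and_owners is made up for this example.

```rust
use std::sync::Arc;

// `text.len()` reaches `String::len` through `Deref`, with no manual unpacking.
// `Arc::strong_count` reports how many owners currently share the value.
fn len_and_owners(text: &Arc<String>) -> (usize, usize) {
    (text.len(), Arc::strong_count(text))
}
```

Cloning the Arc with Arc::clone raises the owner count by one, and dropping a clone lowers it again; the String itself is freed only when the last owner goes away.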
Mutex does a different job. It allows an otherwise non-thread-safe value to be shared safely between threads, by performing "locking" to prevent simultaneous reads/writes.
Because of this, if you have a Mutex<i32>, you can't just treat that as an i32, you first have to acquire the lock, by calling .lock(), and then handle the Result you get back in case the mutex was poisoned.
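As a sketch of what acquiring the lock and handling the Result can look like, here is a hypothetical increment helper. Recovering the data from a poisoned mutex is just one possible policy, chosen here for illustration.

```rust
use std::sync::Mutex;

// Locks the mutex, recovering the inner value if another thread panicked
// while holding the lock (a "poisoned" mutex), then increments it.
fn increment(counter: &Mutex<i32>) -> i32 {
    let mut guard = match counter.lock() {
        Ok(guard) => guard,
        Err(poisoned) => poisoned.into_inner(),
    };
    *guard += 1;
    *guard
}
```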
TLDR:
use self_copy.lock().unwrap().send_once(), or .lock() and handle the error case
You need to lock the mutex to obtain a MutexGuard before you can call methods on it:
let thread_join_handle = thread::spawn(move || self_copy
    .lock()
    .unwrap()
    .send_once(message));
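To make the pattern concrete without the websocket crate, here is a self-contained sketch. The simplified Sender trait, the Recorder type and the free function send_in_thread are all assumptions for illustration and differ from the question's signatures. One deliberate change: the sender is passed in as an Arc<Mutex<S>> rather than wrapped from &mut self, because thread::spawn requires everything the closure captures to be 'static.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Simplified stand-in for the question's trait (no websocket types).
trait Sender: Send + 'static {
    fn send_once(&mut self, message: String) -> Result<(), String>;
}

// Hypothetical implementation that just records what was "sent".
struct Recorder {
    sent: Vec<String>,
}

impl Sender for Recorder {
    fn send_once(&mut self, message: String) -> Result<(), String> {
        self.sent.push(message);
        Ok(())
    }
}

// Sends one message from a freshly spawned thread and waits for it to finish.
fn send_in_thread<S: Sender>(sender: Arc<Mutex<S>>, message: String) -> Result<(), String> {
    let sender_for_thread = Arc::clone(&sender);
    let handle = thread::spawn(move || {
        // Lock first; the guard dereferences to `S`, so `send_once` is callable.
        let result = sender_for_thread.lock().unwrap().send_once(message);
        result
    });
    handle.join().unwrap()
}
```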
Standard Cell struct provides interior mutability but allows only a few mutation methods such as set(), swap() and replace(). All of these methods change the whole content of the Cell.
However, sometimes more specific manipulations are needed, for example, to change only a part of data contained in the Cell.
So I tried to implement some kind of universal Cell, allowing arbitrary data manipulation.
The manipulation is represented by a user-defined closure that accepts a single argument, a &mut reference to the interior data of the Cell, so users can decide for themselves what to do with the Cell's interior. The code below demonstrates the idea:
use std::cell::UnsafeCell;
struct MtCell<Data>{
    dcell: UnsafeCell<Data>,
}
impl<Data> MtCell<Data>{
    fn new(d: Data) -> MtCell<Data> {
        return MtCell{dcell: UnsafeCell::new(d)};
    }
    fn exec<F, RetType>(&self, func: F) -> RetType where
        RetType: Copy,
        F: Fn(&mut Data) -> RetType
    {
        let p = self.dcell.get();
        let pd: &mut Data;
        unsafe{ pd = &mut *p; }
        return func(pd);
    }
}
// test:
type MyCell = MtCell<usize>;
fn main(){
    let c: MyCell = MyCell::new(5);
    println!("initial state: {}", c.exec(|pd| {return *pd;}));
    println!("state changed to {}", c.exec(|pd| {
        *pd += 10; // modify the interior "in place"
        return *pd;
    }));
}
However, I have some concerns regarding the code.
Is it safe, i.e can some safe but malicious closure break Rust mutability/borrowing/lifetime rules by using this "universal" cell?
I consider it safe since lifetime of the interior reference parameter prohibits its exposition beyond the closure call time. But I still have doubts (I'm new to Rust).
Maybe I'm re-inventing the wheel and there exist some templates or techniques solving the problem?
Note: I posted the question here (not on code review) as it seems more related to the language rather than code itself (which represents just a concept).
[EDIT] I'd want zero cost abstraction without possibility of runtime failures, so RefCell is not perfect solution.
This is a very common pitfall for Rust beginners.
Is it safe, i.e can some safe but malicious closure break Rust mutability/borrowing/lifetime rules by using this "universal" cell? I consider it safe since lifetime of the interior reference parameter prohibits its exposition beyond the closure call time. But I still have doubts (I'm new to Rust).
In a word, no.
fn main() {
    let mt_cell = MtCell::new(123i8);
    mt_cell.exec(|ref1: &mut i8| {
        mt_cell.exec(|ref2: &mut i8| {
            println!("Double mutable ref!: {:?} {:?}", ref1, ref2);
        })
    })
}
You're absolutely right that the reference cannot be used outside of the closure, but inside the closure, all bets are off! In fact, pretty much any operation (read or write) on the cell within the closure is undefined behavior (UB), and may cause corruption/crashes anywhere in your program.
Maybe I'm re-inventing the wheel and there exist some templates or techniques solving the problem?
Using Cell is often not the best technique, but it's impossible to know what the best solution is without knowing more about the problem.
If you insist on Cell, there are safe ways to do this. The unstable (i.e. nightly-only at the time of writing) Cell::update() method is literally implemented with the following code (when T: Copy):
pub fn update<F>(&self, f: F) -> T
where
    F: FnOnce(T) -> T,
{
    let old = self.get();
    let new = f(old);
    self.set(new);
    new
}
Or you could use Cell::get_mut(), but I guess that defeats the whole purpose of Cell.
However, usually the best way to change only part of a Cell is by breaking it up into separate Cells. For example, instead of Cell<(i8, i8, i8)>, use (Cell<i8>, Cell<i8>, Cell<i8>).
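A minimal sketch of the "separate Cells" layout. The Position struct and its bump_y method are made up for this example.

```rust
use std::cell::Cell;

// Each field gets its own Cell, so one part can change through `&self`
// without touching (or copying) the others.
struct Position {
    x: Cell<i8>,
    y: Cell<i8>,
    z: Cell<i8>,
}

impl Position {
    // Updates only `y` using the safe get/set pair and returns the new value.
    fn bump_y(&self, delta: i8) -> i8 {
        let new = self.y.get() + delta;
        self.y.set(new);
        new
    }
}
```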
Still, IMO, Cell is rarely the best solution. Interior mutability is a common design in C and many other languages, but it is somewhat more rare in Rust, at least via shared references and Cell, for a number of reasons (e.g. it's not Sync, and in general people don't expect interior mutability without &mut). Ask yourself why you are using Cell and if it is really impossible to reorganize your code to use normal &mut references.
IMO the bottom line is actually about safety: if no matter what you do, the compiler complains and it seems that you need to use unsafe, then I guarantee you that 99% of the time either:
There's a safe (but possibly complex/unintuitive) way to do it, or
It's actually undefined behavior (like in this case).
EDIT: Frxstrem's answer also has better info about when to use Cell/RefCell.
Your code is not safe, since you can call c.exec inside c.exec to get two mutable references to the cell contents, as demonstrated by this snippet containing only safe code:
let c: MyCell = MyCell::new(5);
c.exec(|n| {
    // need `RefCell` to access mutable reference from within `Fn` closure
    let n = RefCell::new(n);
    c.exec(|m| {
        let n = &mut *n.borrow_mut();
        // now `n` and `m` are mutable references to the same data, despite using
        // no unsafe code. this is BAD!
    })
})
In fact, this is exactly the reason why we have both Cell and RefCell:
Cell only allows you to get and set a value and does not allow you to get a mutable reference from an immutable one (thus avoiding the above issue), but it does not have any runtime cost.
RefCell allows you to get a mutable reference from an immutable one, but needs to perform checks at runtime to ensure that this is safe.
As far as I know, there's not really any safe way around this, so you need to make a choice in your code between no runtime cost but less flexibility, and more flexibility but with a small runtime cost.
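To show what the runtime check buys you, here is a sketch of the same exec idea built on RefCell. CheckedCell is a hypothetical name, and returning None on a re-entrant call (instead of panicking, as borrow_mut would) is a design choice made only for this illustration.

```rust
use std::cell::RefCell;

// A closure-based cell that detects nested access at runtime.
struct CheckedCell<T> {
    inner: RefCell<T>,
}

impl<T> CheckedCell<T> {
    fn new(value: T) -> Self {
        CheckedCell { inner: RefCell::new(value) }
    }

    // Runs `f` with exclusive access, or returns None if the value is
    // already borrowed (for example by an enclosing `exec` call).
    fn exec<R>(&self, f: impl FnOnce(&mut T) -> R) -> Option<R> {
        let mut guard = self.inner.try_borrow_mut().ok()?;
        Some(f(&mut *guard))
    }
}
```

A nested c.exec(|_| c.exec(|inner| *inner)) now yields Some(None): the inner call is refused rather than producing two aliasing mutable references.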
To add the elements of two Vecs I wrote a function like
fn add_components(dest: &mut Vec<i32>, first: &Vec<i32>, second: &Vec<i32>){
    for i in 0..first.len() {
        dest[i] = first[i] + second[i];
    }
}
And this works fine when dest is another Vec.
let mut new_components = vec![0; components.len()];
Vector::add_components(&mut new_components, &components, &other_components);
But it blows up when I am trying to add in-place:
Vector::add_components(&mut components, &components, &other_components);
because now I borrow components as mutable and immutable at the same time. But this obviously is what I am trying to achieve.
Are there any conventional and general (meaning not only concerning Vecs) solutions to this problem which don't involve unsafe code and pointer magic?
Another example of this problem:
Suppose I want to overload AddAssign for a numeric type like
impl AddAssign<&NumericType> for NumericType {
    fn add_assign(&mut self, other: &NumericType) {
        unimplemented!() // concrete implementation is not important
    }
}
Notice that I want to take a reference as second argument to avoid copying. This works fine when adding two different objects, but adding an object to itself creates the exact same scenario:
let mut num = NumericType{};
num += &num;
I am borrowing num mutably and immutably at the same time. So obviously this should work and is safe, but it also is against Rust's borrowing rules.
What are the best practices (apart from copying of course) to deal with this issue, which arises in many forms?
There is no generic solution to this. Rust can't generically abstract over mutability in borrow checking.
You will need to have two versions of the function for in-place and destination versions.
Rust has strict aliasing rules, so dest[i] = first[i] + second[i] actually compiles to different code depending on whether the compiler has a guarantee that dest and first are different. Don't try to fudge it with unsafe, because it will be Undefined Behavior and will get miscompiled.
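A sketch of what the two versions could look like for the Vec example. The function names add_into and add_assign_components are made up, and slices are used instead of &Vec so the functions also accept arrays.

```rust
// Destination version: `dest` must not alias `first` or `second`,
// which the borrow checker enforces at every call site.
fn add_into(dest: &mut [i32], first: &[i32], second: &[i32]) {
    for ((d, a), b) in dest.iter_mut().zip(first).zip(second) {
        *d = a + b;
    }
}

// In-place version: `dest` is both the first operand and the output,
// so only one borrow of it is needed.
fn add_assign_components(dest: &mut [i32], other: &[i32]) {
    for (d, o) in dest.iter_mut().zip(other) {
        *d += o;
    }
}
```

The in-place call becomes add_assign_components(&mut components, &other_components), which borrows components only once.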