Pin/Unpin was introduced as a prerequisite for adding async/await support to Rust. It allows safely polling a future that may contain internal references to its own state.
However, wouldn't introducing move constructors to the language be a simpler solution? That way the future could be moved freely in memory after being polled, with all internal references fixed up by the move constructor. For example:
async fn foo() { /* ... */ }

async fn bar() {
    let mut future = foo();
    poll_fn(|cx| future.poll(cx)).await;
    // Now `future` might have internal references.
    let boxed_future = Box::new(future);
    // The future was moved to the heap, but the move constructor
    // took care of adjusting the internal references, so we can
    // safely poll the future again.
    boxed_future.await;
}
Surely this approach was contemplated, but ultimately rejected. Why?
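For context, the hazard Pin guards against can be illustrated with a hand-rolled self-referential struct (a hedged sketch; `SelfRef` is a made-up name, not from the question):

```rust
use std::ptr;

// A made-up self-referential type: `ptr` is meant to point at this
// struct's own `data` field, much like a suspended future can point
// into its own state.
struct SelfRef {
    data: String,
    ptr: *const String,
}

fn main() {
    let mut s = SelfRef { data: "hello".to_string(), ptr: ptr::null() };
    s.ptr = &s.data; // self-reference established
    let old_addr = s.ptr as usize;

    // A plain Rust move just copies the bytes; nothing runs to fix `ptr` up.
    let moved = s;
    let new_addr = &moved.data as *const String as usize;

    // `moved.ptr` still holds the old address; if that differs from the new
    // one, the pointer is dangling. This is exactly what Pin rules out, and
    // what a hypothetical move constructor would have to repair.
    println!("ptr still holds {:#x}, data now lives at {:#x}", old_addr, new_addr);
}
```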
I want to run two Futures in parallel, and if possible on different threads:
try_join!(
    tokio::spawn(fut1), // fut1 is not 'static
    tokio::spawn(fut2)
)?;
If my understanding is correct, tokio::spawn requires a Future to be 'static because execution starts immediately and there is no guarantee that the future will not outlive the current scope.
However in my case I immediately await them, so I know it won't outlive the current scope.
Is my reasoning correct? If not, what is unsafe about passing non-'static arguments in my case?
However in my case I immediately await them, so I know it won't outlive the current scope.
There are two responses to this line of reasoning.
One is that the fact that you're immediately awaiting simply has no bearing on the checks performed by the compiler. tokio::spawn() requires a future that owns its data, and that's just a fact; how you use it doesn't enter the picture. In other words, the compiler doesn't even attempt to be smart enough to relax such a bound even where it seems safe to do so.
The other response is that what you're saying is not actually true. Yes, you immediately await the result, but that doesn't guarantee that the future passed to spawn() will not outlive the current scope. Awaiting a future just means that if the awaited future chooses to suspend, the async function that awaits it suspends along with it. The outer future created by the async function may be dropped before it's awaited to completion, in which case the scope disappears while fut1 is still running. For example:
// let's assume this function were allowed to compile
async fn foo() {
    let mut i = 0;
    tokio::spawn(async {
        tokio::time::sleep(Duration::from_secs(1)).await;
        i = 1;
    }).await;
    assert!(i == 1);
}
// this function is safe and compiles
async fn bar() {
    {
        // create the foo() future in an inner scope
        let fut = foo();
        // spin up the future created by `foo()` by polling it just once
        let _ = Box::pin(fut)
            .as_mut()
            .poll(&mut Context::from_waker(&futures::task::noop_waker()));
        // leave fut to go out of scope and get dropped
    }
    // what memory does `i = 1` modify after 1s?
}
I'm trying to speed up a computationally-heavy Rust function by making it concurrent using only the built-in thread support. In particular, I want to alternate between quick single-threaded phases (where the main thread has mutable access to a big structure) and concurrent phases (where many worker threads run with read-only access to the structure). I don't want to make extra copies of the structure or force it to be 'static. Where I'm having trouble is convincing the borrow checker that the worker threads have finished.
Ignoring the borrow checker, an Arc reference seems to do all that is needed. The reference count in the Arc increases with the .clone() for each worker, then decreases as the workers conclude and I join all the worker threads. If (and only if) the Arc reference count is 1, it should be safe for the main thread to resume. The borrow checker, however, doesn't seem to know about Arc reference counts, and insists that my structure needs to be 'static.
Here's some sample code which works fine if I don't use threads, but won't compile if I switch the comments to enable the multi-threaded case.
use std::sync::Arc;
use std::thread;

struct BigStruct {
    data: Vec<usize>
    // Lots more
}

pub fn main() {
    let ref_bigstruct = &mut BigStruct { data: Vec::new() };
    for i in 0..3 {
        ref_bigstruct.data.push(i); // Phase where main thread has write access
        run_threads(ref_bigstruct); // Phase where worker threads have read-only access
    }
}

fn run_threads(ref_bigstruct: &BigStruct) {
    let arc_bigstruct = Arc::new(ref_bigstruct);
    {
        let arc_clone_for_worker = arc_bigstruct.clone();
        // SINGLE-THREADED WORKS:
        worker_thread(arc_clone_for_worker);
        // MULTI-THREADED DOES NOT COMPILE:
        // let handle = thread::spawn(move || { worker_thread(arc_clone_for_worker); });
        // handle.join();
    }
    assert!(Arc::strong_count(&arc_bigstruct) == 1);
    println!("??? How can I tell the borrow checker that all borrows of ref_bigstruct are done?")
}

fn worker_thread(my_struct: Arc<&BigStruct>) {
    println!("  worker says len()={}", my_struct.data.len());
}
I'm still learning about Rust lifetimes, but what I think (fear?) I need is an operation that will take an ordinary (not 'static) reference to my structure and give me an Arc that I can clone into immutable references with a 'static lifetime for use by the workers. Once all the worker Arc references are dropped, the borrow checker needs to allow my thread-spawning function to return. For safety, I assume this would panic if the reference count is >1. While this seems like it would generally conform to Rust's safety requirements, I don't see how to do it.
The underlying problem is not the borrow checker failing to follow Arc, and the solution is not to use Arc. The problem is that the borrow checker cannot understand that the reason a thread must be 'static is that it may outlive the spawning thread, and thus that immediately .join()ing it would make it fine.
And the solution is to use scoped threads, that is, threads that allow you to use non-'static data because they are always joined before the scope ends, so the spawned thread cannot outlive the spawning thread. Problem is, there are no scoped threads in the standard library. Well, there are, but they're unstable.
So if you insist on not using crates, for some reason, you have no choice but to use unsafe code (don't, really). But if you can use external crates, then you can use the well-known crossbeam crate with its crossbeam::scope function, at least until std's scoped threads are stabilized.
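For reference, std's scoped threads have since been stabilized as std::thread::scope (Rust 1.63). A minimal sketch of the question's pattern with them, assuming read-only workers (the worker count and return value are illustrative):

```rust
use std::thread;

struct BigStruct {
    data: Vec<usize>,
}

// Each scoped worker reads the struct; thread::scope joins every spawned
// thread before returning, so the borrow of `big` provably ends here.
fn run_threads(big: &BigStruct) -> usize {
    thread::scope(|s| {
        let handles: Vec<_> = (0..3).map(|_| s.spawn(|| big.data.len())).collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    let mut big = BigStruct { data: Vec::new() };
    for i in 0..3 {
        big.data.push(i); // exclusive write access between scopes
        let total = run_threads(&big); // shared read access inside the scope
        assert_eq!(total, 3 * big.data.len());
    }
}
```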
In Rust's Arc<T>, the T is by definition immutable. This means that in order to use Arc to let threads access data that is going to change, you also need to wrap the data in some type with interior mutability.
Rust provides a type especially suited for a single writer or multiple parallel readers, called RwLock.
So for your simple example, this would probably look something like this:
use std::{sync::{Arc, RwLock}, thread};

struct BigStruct {
    data: Vec<usize>
    // Lots more
}

pub fn main() {
    let arc_bigstruct = Arc::new(RwLock::new(BigStruct { data: Vec::new() }));
    for i in 0..3 {
        arc_bigstruct.write().unwrap().data.push(i); // Phase where main thread has write access
        run_threads(&arc_bigstruct); // Phase where worker threads have read-only access
    }
}

fn run_threads(ref_bigstruct: &Arc<RwLock<BigStruct>>) {
    {
        let arc_clone_for_worker = ref_bigstruct.clone();
        // MULTI-THREADED
        let handle = thread::spawn(move || { worker_thread(&arc_clone_for_worker); });
        handle.join().unwrap();
    }
    assert!(Arc::strong_count(ref_bigstruct) == 1);
}

fn worker_thread(my_struct: &Arc<RwLock<BigStruct>>) {
    println!("  worker says len()={}", my_struct.read().unwrap().data.len());
}
Which outputs
worker says len()=1
worker says len()=2
worker says len()=3
As for your question, the borrow checker does not know when an Arc is released, as far as I know. The references are counted at runtime.
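That runtime counting can be observed directly; a small std-only sketch:

```rust
use std::sync::Arc;

fn main() {
    let a = Arc::new(5);
    assert_eq!(Arc::strong_count(&a), 1);

    let b = Arc::clone(&a);
    // The count changed at runtime; the borrow checker never sees this.
    assert_eq!(Arc::strong_count(&a), 2);

    drop(b);
    assert_eq!(Arc::strong_count(&a), 1);

    // Arc::try_unwrap is the runtime check the question was after:
    // it gives the value back only when the count is exactly 1.
    let inner = Arc::try_unwrap(a).unwrap();
    assert_eq!(inner, 5);
}
```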
I ran into a curious issue while debugging some async code in Rust. Mainly, I have the following async block which is part of a test:
let s = Signal::new(0);
let s_clone = s.clone();
assert_eq!(s_clone.clone().await, 0);
assert_eq!(s.clone().await, 0);
join![s.for_each(|_| async {}), async {
    s_clone.set_and_wait(7).await;
    println!("s_clone should drop here...");
    drop(s_clone);
}];
The join![] code in this part doesn't matter too much; what matters is that the Signal I implemented is shared in such a way that only when there are no more unused signals do they stop and drop their values. Thus, if I omit the explicit drop(s_clone);, s_clone won't get dropped, causing join![] to never complete.
The Question
In synchronous code, when a value reaches the end of the block, it just gets dropped, but why not in async? Why do I have to manually drop the value?
Additional Context
In case anyone is curious, I've implemented both Future and Stream for Signal. The future will only complete when all other pending signals cloned from the same source signal have finished reacting to any of their changes.
drop(s_clone) makes the async block consume s_clone and thus forces s_clone to be moved into it. (This is not specific to drop() - doing anything that consumes the value would have the same effect.) Omitting the drop() reverts to the default closure behavior of borrowing the value. When the value is borrowed, it remains owned by the surrounding environment and so it won't (and can't) be dropped by the async block, which only holds a reference.
To force the async block to drop s_clone without an explicit drop(), use the move keyword, i.e. change async to async move. This will ensure that ownership of s_clone (and any other captured value) is moved into the async block, which will then drop it prior to exit.
In synchronous code, when a value reaches the end of the block, it just gets dropped, but why not in async? Why do I have to manually drop the value?
The exact same thing can happen in synchronous code. Here is a non-async example equivalent to yours, where you can either uncomment the move or the drop() to have the value owned and dropped by the closure:
struct Dropper;

impl Drop for Dropper {
    fn drop(&mut self) {
        println!("dropper dropped");
    }
}

fn main() {
    let dropper = Dropper;
    {
        let closure = /*move*/ || {
            let _ = &dropper; // capture dropper
            //drop(dropper);
        };
        closure();
    }
    println!("closure dropped");
}
This is a continuation of How to re-use a value from the outer scope inside a closure in Rust?, opened as a new question for better presentation.
// main.rs
// The value will be modified eventually inside `main`
// and a http request should respond with whatever "current" value it holds.
let mut test_for_closure: Arc<RefCell<String>> = Arc::new(RefCell::from("Foo".to_string()));
// ...
// Handler for HTTP requests
// From https://docs.rs/hyper/0.14.8/hyper/service/fn.service_fn.html
let make_svc = make_service_fn(|_conn| async {
    Ok::<_, Infallible>(service_fn(|req: Request<Body>| async move {
        if req.version() == Version::HTTP_11 {
            let foo: String = *test_for_closure.borrow();
            Ok(Response::new(Body::from(foo.as_str())))
        } else {
            Err("not HTTP/1.1, abort connection")
        }
    }))
});
Unfortunately, I get RefCell<std::string::String> cannot be shared between threads safely:
RefCell only works within a single thread. You will need to use Mutex, which is similar but works across multiple threads. You can read more about Mutex here: https://doc.rust-lang.org/std/sync/struct.Mutex.html.
Here is an example of moving an Arc<Mutex<>> into a closure:
use std::sync::{Arc, Mutex};

fn main() {
    let test: Arc<Mutex<String>> = Arc::new(Mutex::from("Foo".to_string()));
    let test_for_closure = Arc::clone(&test);
    let closure = || async move {
        // lock it so it can't be used by other threads at the same time
        let foo = test_for_closure.lock().unwrap();
        println!("{}", foo);
    };
}
The first error in your error message is that Sync is not implemented for RefCell<String>. This is by design, as stated by Sync's rustdoc:
Types that are not Sync are those that have "interior mutability" in a
non-thread-safe form, such as Cell<T> and RefCell<T>. These types allow for
mutation of their contents even through an immutable, shared
reference. For example the set method on Cell<T> takes &self, so it
requires only a shared reference &Cell<T>. The method performs no
synchronization, thus Cell<T> cannot be Sync.
Thus it's not safe to share RefCells between threads, because you can cause a data race through a regular, shared reference.
But what if you wrap it in Arc? Well, the rustdoc is quite clear again:
Arc<T> will implement Send and Sync as long as the T implements Send
and Sync. Why can't you put a non-thread-safe type T in an Arc<T> to
make it thread-safe? This may be a bit counter-intuitive at first:
after all, isn't the point of Arc<T> thread safety? The key is this:
Arc<T> makes it thread safe to have multiple ownership of the same
data, but it doesn't add thread safety to its data. Consider
Arc<RefCell<T>>. RefCell<T> isn't Sync, and if Arc<T> was always Send,
Arc<RefCell<T>> would be as well. But then we'd have a problem:
RefCell<T> is not thread safe; it keeps track of the borrowing count
using non-atomic operations.
In the end, this means that you may need to pair Arc<T> with some sort
of std::sync type, usually Mutex<T>.
Arc<T> will not be Sync unless T is Sync, for the same reason. Given that, you should probably use a std or tokio Mutex instead of RefCell.
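A minimal std-only sketch of the Arc<Mutex<_>> pattern described above (the string contents are illustrative):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let shared = Arc::new(Mutex::new(String::from("Foo")));

    let handle = {
        let shared = Arc::clone(&shared);
        // The clone is moved into the thread, so no non-'static borrow escapes.
        thread::spawn(move || {
            let mut s = shared.lock().unwrap();
            s.push_str("Bar"); // mutation through a shared handle, synchronized
        })
    };
    handle.join().unwrap();

    assert_eq!(*shared.lock().unwrap(), "FooBar");
}
```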
In the crate I'm developing I have several unsafe functions, which are marked as such for reasons explained in this answer. In unsafe functions, I can perform unsafe operations as if the full function body were wrapped in an unsafe { } block.
The problem is that, in bigger functions, only a small part of the function body is actually performing unsafe operations, while the rest is doing perfectly safe stuff. Often, this safe stuff is even pretty independent of the unsafe code. In these larger functions, I would like to narrow the scope of unsafe operations. The reason should be fairly understandable: I also don't wrap my complete codebase in an unsafe { } block just because I can.
Unfortunately, there isn't a safe { } block to "invert" the behavior of unsafe functions. If there were I would use it like that:
unsafe fn my_function() {
    safe {
        // ... doing safe stuff ...
        unsafe {
            // ... doing `unsafe` stuff ...
        }
        // ... doing safe stuff ...
    }
}
But as this is not possible: what are best practices in these situations to narrow the scope of unsafe operations? Are there established tricks to deal with this?
Just to be clear: this question is not about discussing whether or not narrowing the unsafe scope is good or bad. I stated that I want to do it: this question is about how to do it and what solutions (if any) are most commonly used in practice. (And if you don't understand why I would like to do it, this RFC is very related.)
If you want to use the unsafe keyword as a way to catalogue all unsafe operations, you can draw more accurate boundaries by splitting your code into safe private functions. I'm not sure it exactly meets your requirement of "best practice", since I don't know of any large projects using the technique, but it will work:
// Document the assumptions of this unsafe function here
pub unsafe fn my_function() {
    my_internal_function()
}

// private
fn my_internal_function() {
    // ... doing safe stuff ...
    unsafe {
        // Document the assumptions of this unsafe block here
        // ... doing `unsafe` stuff ...
    }
    // ... doing safe stuff ...
}
If you are concerned that a "safe" function that is actually unsafe to use introduces a risk of accidental misuse, you can nest the private function so it is not callable outside the main unsafe function:
pub unsafe fn my_function() {
    fn my_internal_function() {
        // ... doing safe stuff ...
        unsafe {
            // Document the assumptions of this unsafe block here
            // ... doing `unsafe` stuff ...
        }
        // ... doing safe stuff ...
    }
    my_internal_function();
}
After all of this, properly documenting the assumptions of unsafe code with comments is the most important part to get right. This sort of trick will only help if you are concerned about metrics for the number of unsafe lines.
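To make that documentation point concrete, the widely used convention is a // SAFETY: comment on every unsafe block and a # Safety section on every unsafe fn (clippy's undocumented_unsafe_blocks lint can enforce the former). A small sketch with a made-up first_byte function:

```rust
/// Returns the first byte of `v`.
///
/// # Safety
/// `v` must be non-empty; otherwise this reads out of bounds.
pub unsafe fn first_byte(v: &[u8]) -> u8 {
    // SAFETY: the caller guarantees `v` is non-empty (see the function's
    // `# Safety` section), so index 0 is in bounds.
    unsafe { *v.get_unchecked(0) }
}

fn main() {
    let data = [7u8, 8, 9];
    // SAFETY: `data` is non-empty.
    let b = unsafe { first_byte(&data) };
    assert_eq!(b, 7);
}
```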