future cannot be sent between threads safely after Mutex - rust

I've been trying to move from postgres to tokio_postgres but struggle with some async.
use scraper::Html;
use std::sync::Arc;
use tokio::sync::Mutex;
use tokio::task;
struct Url {}
impl Url {
fn scrapped_home(&self, symbol: String) -> Html {
let url = format!(
"https://finance.yahoo.com/quote/{}?p={}&.tsrc=fin-srch", symbol, symbol
);
let response = reqwest::blocking::get(url).unwrap().text().unwrap();
scraper::Html::parse_document(&response)
}
}
#[derive(Clone)]
struct StockData {
symbol: String,
}
#[tokio::main]
async fn main() {
let stock_data = StockData { symbol: "".to_string() };
let url = Url {};
let mut uri_test: Arc<Mutex<Html>> = Arc::new(Mutex::from(url.scrapped_home(stock_data.clone().symbol)));
let mut uri_test_closure = Arc::clone(&uri_test);
let uri = task::spawn_blocking(|| {
uri_test_closure.lock()
});
}
Without putting a mutex on
url.scrapped_home(stock_data.clone().symbol)),
I would get the error that a runtime cannot drop in a context where blocking is not allowed, so I put in inside spawn_blocking. Then I get the error that Cell cannot be shared between threads safely. This, from what I could gather, is because Cell isn'it Sync. I then wrapped in within a Mutex. This on the other hand throws Cell cannot be shared between threads safely'.
Now, is that because it contains a reference to a Cell and therefore isn't memory-safe? If so, would I need to implement Sync for Html? And how?
Html is from the scraper crate.
UPDATE:
Sorry, here's the error.
error: future cannot be sent between threads safely
--> src/database/queries.rs:141:40
|
141 | let uri = task::spawn_blocking(|| {
| ________________________________________^
142 | | uri_test_closure.lock()
143 | | });
| |_________^ future is not `Send`
|
= help: within `tendril::tendril::NonAtomic`, the trait `Sync` is not implemented for `Cell<usize>`
note: required by a bound in `spawn_blocking`
--> /home/a/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.20.1/src/task/blocking.rs:195:12
|
195 | R: Send + 'static,
| ^^^^ required by this bound in `spawn_blocking`
UPDATE:
Adding Cargo.toml as requested:
[package]
name = "reprod"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
reqwest = { version = "0.11", features = ["json", "blocking"] }
tokio = { version = "1", features = ["full"] }
tokio-postgres = "0"
scraper = "0.12.0"
UPDATE: Added original sync code:
fn main() {
let stock_data = StockData { symbol: "".to_string() };
let url = Url {};
url.scrapped_home(stock_data.clone().symbol);
}
UPDATE: Thanks to Kevin I was able to get it to work. As he pointed out Html was neither Send nor Sync. This part of the Rust lang doc helped me to understand how message passing works.
pub fn scrapped_home(&self, symbol: String) -> Html {
let (tx, rx) = mpsc::channel();
let url = format!(
"https://finance.yahoo.com/quote/{}?p={}&.tsrc=fin-srch", symbol, symbol
);
thread::spawn(move || {
let val = reqwest::blocking::get(url).unwrap().text().unwrap();
tx.send(val).unwrap();
});
scraper::Html::parse_document(&rx.recv().unwrap())
}
Afterwards I had some sort of epiphany and got it to work with tokio, without message passing, as well
pub async fn scrapped_home(&self, symbol: String) -> Html {
let url = format!(
"https://finance.yahoo.com/quote/{}?p={}&.tsrc=fin-srch", symbol, symbol
);
let response = task::spawn_blocking(move || {
reqwest::blocking::get(url).unwrap().text().unwrap()
}).await.unwrap();
scraper::Html::parse_document(&response)
}
I hope that this might help someone.

This illustrates it a bit more clearly now: you're trying to return a tokio::sync::MutexGuard across a thread boundary. When you call this:
let mut uri_test: Arc<Mutex<Html>> = Arc::new(Mutex::from(url.scrapped_home(stock_data.clone().symbol)));
let mut uri_test_closure = Arc::clone(&uri_test);
let uri = task::spawn_blocking(|| {
uri_test_closure.lock()
});
The uri_test_closure.lock() call (tokio::sync::Mutex::lock()) doesn't have a semicolon, which means it's returning the object that's the result of the call. But you can't return a MutexGuard across a thread boundary.
I suggest you read up on the linked lock() call, as well as blocking_lock() and such there.
I'm not certain of the point of your call to task::spawn_blocking here. If you're trying to illustrate a use case for something, that's not coming across.
Edit:
The problem is deeper. Html is both !Send and !Sync which means you can't even wrap it up in an Arc<Mutex<Html>> or Arc<Mutex<Optional<Html>>> or whatever. You need to get the data from another thread in another way, and not as that "whole" object. See this post on the rust user forum for more detailed information. But whatever you're wrapping must be Send and that struct is explicitly not.
So if a type is Send and !Sync, you can wrap in a Mutex and an Arc. But if it's !Send, you're hooped, and need to use message passing, or other synchronization mechanisms.

Related

Return a reference to a T inside a lazy static RwLock<Option<T>>?

I have a lazy static struct that I want to be able to set to some random value in the beginning of the execution of the program, and then get later. This little silly snippet can be used as an example:
use lazy_static::lazy_static;
use std::sync::RwLock;
struct Answer(i8);
lazy_static! {
static ref ANSWER: RwLock<Option<Answer>> = RwLock::new(None);
}
fn answer_question() {
*ANSWER.write().unwrap() = Some(Answer(42));
}
fn what_is_the_answer() -> &'static Answer {
ANSWER
.read()
.unwrap()
.as_ref()
.unwrap()
}
This code fails to compile:
error[E0515]: cannot return value referencing temporary value
--> src/lib.rs:15:5
|
15 | ANSWER
| _____^
| |_____|
| ||
16 | || .read()
17 | || .unwrap()
| ||_________________- temporary value created here
18 | | .as_ref()
19 | | .unwrap()
| |__________________^ returns a value referencing data owned by the current function
I know you can not return a reference to a temporary value. But I want to return a reference to ANSWER which is static - the very opposite of temporary! I guess it is the RwLockReadGuard that the first call to unwrap returns that is the problem?
I can get the code to compile by changing the return type:
fn what_is_the_answer() -> RwLockReadGuard<'static, Option<Answer>> {
ANSWER
.read()
.unwrap()
}
But now the calling code becomes very unergonomic - I have to do two extra calls to get to the actual value:
what_is_the_answer().as_ref().unwrap()
Can I somehow return a reference to the static ANSWER from this function? Can I get it to return a RwLockReadGuard<&Answer> maybe by mapping somehow?
once_cell is designed for this: use .set(...).unwrap() in answer_question and .get().unwrap() in what_is_the_answer.
As far as I understand your intention, the value of Answer can't be computed while it is being initialized in the lazy_static but depends on parameters known only when answer_question is called. The following may not be the most elegant solution, yet it allows for having a &'static-reference to a value that depends on parameters only known at runtime.
The basic approach is to use two lazy_static-values, one of which serves as a "proxy" to do the necessary synchronization, the other being the value itself. This avoids having to access multiple layers of locks and unwrapping of Option-values whenever you access ANSWER.
The ANSWER-value is initialized by waiting on a CondVar, which will signal when the value has been computed. The value is then placed in the lazy_static and from then on unmovable. Hence &'static is possible (see get_the_answer()). I have chosen String as the example-type. Notice that accessing ANSWER without calling generate_the_answer() will cause the initialization to wait forever, deadlocking the program.
use std::{sync, thread};
lazy_static::lazy_static! {
// A proxy to synchronize when the value is generated
static ref ANSWER_PROXY: (sync::Mutex<Option<String>>, sync::Condvar) = {
(sync::Mutex::new(None), sync::Condvar::new())
};
// The actual value, which is initialized from the proxy and stays in place
// forever, hence allowing &'static access
static ref ANSWER: String = {
let (lock, cvar) = &*ANSWER_PROXY;
let mut answer = lock.lock().unwrap();
loop {
// As long as the proxy is None, the answer has not been generated
match answer.take() {
None => answer = cvar.wait(answer).unwrap(),
Some(answer) => return answer,
}
}
};
}
// Generate the answer and place it in the proxy. The `param` is just here
// to demonstrate we can move owned values into the proxy
fn generate_the_answer(param: String) {
// We don't need a thread here, yet we can
thread::spawn(move || {
println!("Generating the answer...");
let mut s = String::from("Hello, ");
s.push_str(&param);
thread::sleep(std::time::Duration::from_secs(1));
let (lock, cvar) = &*ANSWER_PROXY;
*lock.lock().unwrap() = Some(s);
cvar.notify_one();
println!("Answer generated.");
});
}
// Nothing to see here, except that we have a &'static reference to the answer
fn get_the_answer() -> &'static str {
println!("Asking for the answer...");
&ANSWER
}
fn main() {
println!("Hello, world!");
// Accessing `ANSWER` without generating it will deadlock!
//get_the_answer();
generate_the_answer(String::from("John!"));
println!("The answer is \"{}\"", get_the_answer());
// The second time a value is generated, noone is listening.
// This is the flipside of `ANSWER` being a &'static
generate_the_answer(String::from("Peter!"));
println!("The answer is still \"{}\"", get_the_answer());
}

How do I solve "cannot return value referencing local data" when using threads and async/await?

I am learning Rust especially multithreading and async requests in parallel.
I read the documentation and still I do not understand where I made a mistake. I assume I know where, but do not see how to resolve it.
main.rs
use std::thread;
struct Request {
url: String,
}
impl Request {
fn new(name: &str) -> Request {
Request {
url: name.to_string(),
}
}
async fn call(&self, x: &str) -> Result<(), Box<dyn std::error::Error>> {
let resp = reqwest::get(x).await;
Ok(())
}
}
#[tokio::main]
async fn main() {
let requests = vec![
Request::new("https://www.google.com/"),
Request::new("https://www.google.com/"),
];
let handles: Vec<_> = requests
.into_iter()
.map(|request| {
thread::spawn(move || async {
request.call(&request.url).await;
})
})
.collect();
for y in handles {
println!("{:?}", y);
}
}
error[E0515]: cannot return value referencing local data `request`
--> src/main.rs:29:35
|
29 | thread::spawn(move || async {
| ___________________________________^
30 | | request.call(&request.url).await;
| | ------- `request` is borrowed here
31 | | })
| |_____________^ returns a value referencing data owned by the current function
Cargo.toml
[dependencies]
reqwest = "0.10.4"
tokio = { version = "0.2", features = ["full"] }
Like closures, async blocks capture their variables as weakly as possible. In order of preference:
immutable reference
mutable reference
by value
This is determined by how the variable is used in the closure / async block. In your example, request is only used by reference, so it is only captured by reference:
async {
request.call(&request.url).await;
}
However, you need to transfer ownership of the variable to the async block so that the variable is still alive when the future is eventually executed. Like closures, this is done via the move keyword:
thread::spawn(move || async move {
request.call(&request.url).await;
})
See also:
What is the difference between `|_| async move {}` and `async move |_| {}`
Is there a way to have a Rust closure that moves only some variables into it?
It is very unlikely that you want to mix threads and async at this point in your understanding. One is inherently blocking and the other expects code to not block. You should follow the example outlined in How can I perform parallel asynchronous HTTP GET requests with reqwest? instead.
See also:
What is the best approach to encapsulate blocking I/O in future-rs?

Cannot borrow in a Rc as mutable

First of all I'm new with Rust :-)
The problem:
I want to create a module called RestServer that contain the methods ( actix-web ) to add routes and start the server.
struct Route
{
url: String,
request: String,
handler: Box<dyn Fn(HttpRequest) -> HttpResponse>
}
impl PartialEq for Route {
fn eq(&self, other: &Self) -> bool {
self.url == other.url
}
}
impl Eq for Route {}
impl Hash for Route {
fn hash<H: Hasher>(&self, hasher: &mut H) {
self.url.hash(hasher);
}
}
this is the route structure, this structure containe the the route url, the request type ( GET, POST etc ) and hanlder is the function that have to catch the request and return a HTTPResponse
pub struct RestServer
{
scopes: HashMap<String, Rc<HashSet<Route>>>,
routes: HashSet<Route>,
host: String,
}
impl RestServer {
pub fn add_route(self, req: &str, funct: impl Fn(HttpRequest) -> HttpResponse + 'static,
route: &str, scope: Option<&str>) -> RestServer
{
let mut routes_end = self.routes;
let mut scopes_end = self.scopes;
let url = self.host;
let route = Route {
url: String::from(route),
request: String::from(req),
handler: Box::new(funct)
};
if let Some(x) = scope {
if let Some(y) = scopes_end.get(x) {
let mut cloned_y = Rc::clone(y);
cloned_y.insert(route);
scopes_end.insert(String::from(x), cloned_y);
}else {
let mut hash_scopes = HashSet::new();
hash_scopes.insert(route);
scopes_end.insert(String::from(x), Rc::new(hash_scopes));
}
} else {
routes_end.insert(route);
}
RestServer {
scopes: scopes_end,
routes: routes_end,
host: String::from(url)
}
}
the latest code is the implementation of RestServer.
The most important part is the add_route function, this function receive as paramente the route that is a string, the function handler, the request string and the scope.
First i create the route object.
I check if the scope exist into the HashMap, if yes i have to take the actual scope and update the HashSet.
When i build the code i get the following error
error[E0596]: cannot borrow data in an `Rc` as mutable
--> interface/src/rest/mod.rs:60:17
|
60 | cloned_y.insert(route);
| ^^^^^^^^ cannot borrow as mutable
|
= help: trait `DerefMut` is required to modify through a dereference, but it is not
implemented for `std::rc::Rc<std::collections::HashSet<rest::Route>>`
I know that the compiler give me some help but honestly i have no idea how to do that or if i can do with some easy solution.
After a large search in google i found a solution in RefCell, but is not so much clear
Thanks in advance for your help
You cannot borrow a reference-counting pointer as mutable; this is because one of the guarantees it provides is only possible if the structure is read-only.
You can, however, get around it, but it will require some signature changes.
Enter interior mutability
Interior mutability is a concept you may know from other programming languages in the form of mutexes, atomics and synchronization primitives. In practice, those structures allow you to temporarily guarantee that you are the only accessor of a given variable.
In Rust, this is particularly good, as it allows us to extract a mutable reference to an interior member from a structure that only requires immutable references to itself to function. Perfect to fit in Rc.
Depending on what you need for your needs, you will find the Cell and RefCell structures to be exactly what you need for this. These are not thread-safe, but then again, neither is Rc so it's not exactly a deal-breaker.
In practice, it works very simply:
let data = Rc::new(RefCell::new(true));
{
let mut reference = data.borrow_mut();
*reference = false;
}
println!("{:?}", data);
playground
(If you ever want the threaded versions, Arc replaces Rc and Mutex or RwLock replaces Cell/RefCell)

Spawning tasks with non-static lifetimes with tokio 0.1.x

I have a tokio core whose main task is running a websocket (client). When I receive some messages from the server, I want to execute a new task that will update some data. Below is a minimal failing example:
use tokio_core::reactor::{Core, Handle};
use futures::future::Future;
use futures::future;
struct Client {
handle: Handle,
data: usize,
}
impl Client {
fn update_data(&mut self) {
// spawn a new task that updates the data
self.handle.spawn(future::ok(()).and_then(|x| {
self.data += 1; // error here
future::ok(())
}));
}
}
fn main() {
let mut runtime = Core::new().unwrap();
let mut client = Client {
handle: runtime.handle(),
data: 0,
};
let task = future::ok::<(), ()>(()).and_then(|_| {
// under some conditions (omitted), we update the data
client.update_data();
future::ok::<(), ()>(())
});
runtime.run(task).unwrap();
}
Which produces this error:
error[E0477]: the type `futures::future::and_then::AndThen<futures::future::result_::FutureResult<(), ()>, futures::future::result_::FutureResult<(), ()>, [closure#src/main.rs:13:51: 16:10 self:&mut &mut Client]>` does not fulfill the required lifetime
--> src/main.rs:13:21
|
13 | self.handle.spawn(future::ok(()).and_then(|x| {
| ^^^^^
|
= note: type must satisfy the static lifetime
The problem is that new tasks that are spawned through a handle need to be static. The same issue is described here. Sadly it is unclear to me how I can fix the issue. Even some attempts with and Arc and a Mutex (which really shouldn't be needed for a single-threaded application), I was unsuccessful.
Since developments occur rather quickly in the tokio landscape, I am wondering what the current best solution is. Do you have any suggestions?
edit
The solution by Peter Hall works for the example above. Sadly when I built the failing example I changed tokio reactor, thinking they would be similar. Using tokio::runtime::current_thread
use futures::future;
use futures::future::Future;
use futures::stream::Stream;
use std::cell::Cell;
use std::rc::Rc;
use tokio::runtime::current_thread::{Builder, Handle};
struct Client {
handle: Handle,
data: Rc<Cell<usize>>,
}
impl Client {
fn update_data(&mut self) {
// spawn a new task that updates the data
let mut data = Rc::clone(&self.data);
self.handle.spawn(future::ok(()).and_then(move |_x| {
data.set(data.get() + 1);
future::ok(())
}));
}
}
fn main() {
// let mut runtime = Core::new().unwrap();
let mut runtime = Builder::new().build().unwrap();
let mut client = Client {
handle: runtime.handle(),
data: Rc::new(Cell::new(1)),
};
let task = future::ok::<(), ()>(()).and_then(|_| {
// under some conditions (omitted), we update the data
client.update_data();
future::ok::<(), ()>(())
});
runtime.block_on(task).unwrap();
}
I obtain:
error[E0277]: `std::rc::Rc<std::cell::Cell<usize>>` cannot be sent between threads safely
--> src/main.rs:17:21
|
17 | self.handle.spawn(future::ok(()).and_then(move |_x| {
| ^^^^^ `std::rc::Rc<std::cell::Cell<usize>>` cannot be sent between threads safely
|
= help: within `futures::future::and_then::AndThen<futures::future::result_::FutureResult<(), ()>, futures::future::result_::FutureResult<(), ()>, [closure#src/main.rs:17:51: 20:10 data:std::rc::Rc<std::cell::Cell<usize>>]>`, the trait `std::marker::Send` is not implemented for `std::rc::Rc<std::cell::Cell<usize>>`
= note: required because it appears within the type `[closure#src/main.rs:17:51: 20:10 data:std::rc::Rc<std::cell::Cell<usize>>]`
= note: required because it appears within the type `futures::future::chain::Chain<futures::future::result_::FutureResult<(), ()>, futures::future::result_::FutureResult<(), ()>, [closure#src/main.rs:17:51: 20:10 data:std::rc::Rc<std::cell::Cell<usize>>]>`
= note: required because it appears within the type `futures::future::and_then::AndThen<futures::future::result_::FutureResult<(), ()>, futures::future::result_::FutureResult<(), ()>, [closure#src/main.rs:17:51: 20:10 data:std::rc::Rc<std::cell::Cell<usize>>]>`
So it does seem like in this case I need an Arc and a Mutex even though the entire code is single-threaded?
In a single-threaded program, you don't need to use Arc; Rc is sufficient:
use std::{rc::Rc, cell::Cell};
struct Client {
handle: Handle,
data: Rc<Cell<usize>>,
}
impl Client {
fn update_data(&mut self) {
let data = Rc::clone(&self.data);
self.handle.spawn(future::ok(()).and_then(move |_x| {
data.set(data.get() + 1);
future::ok(())
}));
}
}
The point is that you no longer have to worry about the lifetime because each clone of the Rc acts as if it owns the data, rather than accessing it via a reference to self. The inner Cell (or RefCell for non-Copy types) is needed because the Rc can't be dereferenced mutably, since it has been cloned.
The spawn method of tokio::runtime::current_thread::Handle requires that the future is Send, which is what is causing the problem in the update to your question. There is an explanation (of sorts) for why this is the case in this Tokio Github issue.
You can use tokio::runtime::current_thread::spawn instead of the method of Handle, which will always run the future in the current thread, and does not require that the future is Send. You can replace self.handle.spawn in the code above and it will work just fine.
If you need to use the method on Handle then you will also need to resort to Arc and Mutex (or RwLock) in order to satisfy the Send requirement:
use std::sync::{Mutex, Arc};
struct Client {
handle: Handle,
data: Arc<Mutex<usize>>,
}
impl Client {
fn update_data(&mut self) {
let data = Arc::clone(&self.data);
self.handle.spawn(future::ok(()).and_then(move |_x| {
*data.lock().unwrap() += 1;
future::ok(())
}));
}
}
If your data is really a usize, you could also use AtomicUsize instead of Mutex<usize>, but I personally find it just as unwieldy to work with.

How to create a Nickel handler that uses a database connection?

I'm trying to do a simple extension to the comments example by creating a REST API and committing the post to the database. I'm creating the connection outside the scope of the handler itself which I'm assuming is where my problem lies. I'm just not sure how to fix it.
This is the code for the post handler:
server.get("/comments", middleware! {
let mut stmt = conn.prepare("SELECT * FROM comment").unwrap();
let mut iter = stmt.query_map(&[], |row| {
Comment { id: row.get(0), author: row.get(1), text: row.get(2) }
}).unwrap();
let mut out: Vec<Comment> = Vec::new();
for comment in iter {
out.push(comment.unwrap());
}
json::encode(&out).unwrap()
});
This is the error I get:
<nickel macros>:22:50: 22:66 error: the trait `core::marker::Sync` is not implemented for the type `core::cell::UnsafeCell<rusqlite::InnerConnection>` [E0277]
I assume the error is because I have created the instance and then tried to use it in a closure and that variable is probably destroyed once my main function completes.
Here's an MCVE that reproduces the problem (you should provide these when asking questions):
extern crate rusqlite;
#[macro_use]
extern crate nickel;
use nickel::{Nickel, HttpRouter};
use rusqlite::Connection;
fn main() {
let mut server = Nickel::new();
let conn = Connection::open_in_memory().unwrap();
server.get("/comments", middleware! {
let _stmt = conn.prepare("SELECT * FROM comment").unwrap();
""
});
server.listen("127.0.0.1:6767");
}
The Sync trait says:
Types that are not Sync are those that have "interior mutability" in a non-thread-safe way, such as Cell and RefCell
Which matches with the error message you get. Something inside the Connection has interior mutability which means that the compiler cannot automatically guarantee that sharing it across threads is safe. I had a recent question that might be useful to the implementor of Connection, if they can guarantee it's safe to share (perhaps SQLite itself makes guarantees).
The simplest thing you can do is to ensure that only one thread has access to the database object at a time:
use std::sync::Mutex;
fn main() {
let mut server = Nickel::new();
let conn = Mutex::new(Connection::open_in_memory().unwrap());
server.get("/comments", middleware! {
let conn = conn.lock().unwrap();
let _stmt = conn.prepare("SELECT * FROM comment").unwrap();
""
});
server.listen("127.0.0.1:6767");
}

Resources