I am attempting to write a future that continuously finds new work to do and then maintains a set of futures for those work items. I would like to make sure the main future that finds work is never blocked for long periods of time, and that the work items are processed concurrently.
Here is a rough overview of what I am trying to do. Note that isDone does not exist, and from what I can tell from the docs it isn't a valid way to use futures in Rust anyway. What is the idiomatic way of doing this kind of thing?
use std::collections::HashMap;
use tokio::runtime::Runtime;

async fn find_work() -> HashMap<i64, String> {
    // Go read from the DB or something...
    let mut work = HashMap::new();
    work.insert(1, "test".to_string());
    work.insert(2, "test".to_string());
    return work;
}

async fn do_work(id: i64, value: String) -> () {
    // Result<(), Error> {
    println!("{}: {}", id, value);
}

async fn async_main() -> () {
    let mut pending_work = HashMap::new();
    loop {
        for (id, value) in find_work().await {
            if !pending_work.contains_key(&id) {
                let fut = do_work(id, value);
                pending_work.insert(id, fut);
            }
        }
        pending_work.retain(|id, fut| {
            if isDone(fut) {
                // do something with the result
                false
            } else {
                true
            }
        });
    }
}

fn main() {
    let runtime = Runtime::new().unwrap();
    let exec = runtime.executor();
    exec.spawn(async_main());
    runtime.shutdown_on_idle();
}
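For reference, here is a minimal sketch of one idiomatic shape for this, assuming the modern tokio and futures crates rather than the 0.1-era runtime used above: keep the in-flight work in a FuturesUnordered and drive it with tokio::select!, so that looking for new work and finishing existing work never block each other. do_work is changed to return its id purely so the driver can clear the matching pending entry.

use std::collections::{HashMap, HashSet};
use futures::stream::{FuturesUnordered, StreamExt};

async fn find_work() -> HashMap<i64, String> {
    // Placeholder for reading from the DB. In practice this would await a real
    // query (or a sleep), so the loop below does not spin.
    HashMap::from([(1, "test".to_string()), (2, "test".to_string())])
}

// Same as above, but it returns its id so the driver knows which entry finished.
async fn do_work(id: i64, value: String) -> i64 {
    println!("{}: {}", id, value);
    id
}

#[tokio::main]
async fn main() {
    let mut pending_ids: HashSet<i64> = HashSet::new();
    let mut in_flight = FuturesUnordered::new();
    loop {
        tokio::select! {
            // A work item finished: clear its entry so it can be picked up again later.
            Some(finished_id) = in_flight.next() => {
                pending_ids.remove(&finished_id);
            }
            // New work found: start anything that isn't already in flight.
            new_work = find_work() => {
                for (id, value) in new_work {
                    if pending_ids.insert(id) {
                        in_flight.push(do_work(id, value));
                    }
                }
            }
        }
    }
}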
I'm trying to run multiple invocations of the same script on a single deno MainWorker concurrently and wait for their results (since the scripts can be async). Conceptually, I want something like the loop in run_worker below.
type Tx = Sender<(String, Sender<String>)>;
type Rx = Receiver<(String, Sender<String>)>;

struct Runner {
    worker: MainWorker,
    futures: FuturesUnordered<Pin<Box<dyn Future<Output = (String, Result<Global<Value>, Error>)>>>>,
    response_futures: FuturesUnordered<Pin<Box<dyn Future<Output = (String, Result<(), SendError<String>>)>>>>,
    result_senders: HashMap<String, Sender<String>>,
}
impl Runner {
    fn new() ...

    async fn run_worker(&mut self, rx: &mut Rx, main_module: ModuleSpecifier, user_module: ModuleSpecifier) {
        self.worker.execute_main_module(&main_module).await.unwrap();
        self.worker.preload_side_module(&user_module).await.unwrap();
        loop {
            tokio::select! {
                msg = rx.recv() => {
                    if let Some((id, sender)) = msg {
                        let global = self.worker.js_runtime.execute_script("test", "mod.entry()").unwrap();
                        self.result_senders.insert(id, sender);
                        self.futures.push(Box::pin(async {
                            let resolved = self.worker.js_runtime.resolve_value(global).await;
                            return (id, resolved);
                        }));
                    }
                },
                script_result = self.futures.next() => {
                    if let Some((id, out)) = script_result {
                        self.response_futures.push(Box::pin(async {
                            let value = deserialize_value(out.unwrap(), &mut self.worker);
                            let res = self.result_senders.remove(&id).unwrap().send(value).await;
                            return (id.clone(), res);
                        }));
                    }
                },
                // also handle response_futures here
                else => break,
            }
        }
    }
}
The worker can't be borrowed mutably multiple times, so this won't work. The worker therefore has to go behind a RefCell, and I've created a BorrowingFuture:
struct BorrowingFuture {
    worker: RefCell<MainWorker>,
    global: Global<Value>,
    id: String,
}
And its poll implementation:
fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
    match Pin::new(&mut Box::pin(self.worker.borrow_mut().js_runtime.resolve_value(self.global.clone()))).poll(cx) {
        Poll::Ready(result) => Poll::Ready((self.id.clone(), result)),
        Poll::Pending => {
            cx.waker().clone().wake_by_ref();
            Poll::Pending
        }
    }
}
So the above
self.futures.push(Box::pin(async {
    let resolved = self.worker.js_runtime.resolve_value(global).await;
    return (id, resolved);
}));
would become
self.futures.push(Box::pin(BorrowingFuture{worker: self.worker, global: global.clone(), id: id.clone()}));
and this would have to be done for the response_futures above as well.
But I see a few issues with this:
- Creating a new future on every poll and then polling it seems wrong, but it does work.
- It probably has a performance impact, because new objects are created constantly.
- The same issue applies to the response futures, which would call send on every poll, which seems completely wrong.
- waker.wake_by_ref is called on every poll, because there is no way to know when a script result will resolve. This results in the future being polled thousands of times (or more) per second, always creating a new object, which I suppose amounts to the same thing as checking it in a loop.
Note: my current setup doesn't use select!, but an enum as the Output of several Future implementations, all pushed into a single FuturesUnordered and then matched on to handle the correct type (script, send, receive); a generic sketch of that shape follows below. I used select! here because it's far less verbose and gets the point across.
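For reference, a library-agnostic sketch of that enum-based setup (no deno types; placeholder futures stand in for the script/send/receive work):

use std::future::Future;
use std::pin::Pin;
use futures::stream::{FuturesUnordered, StreamExt};

// One output type shared by every kind of future pushed into the set.
enum Event {
    Script(String),
    Send(Result<(), String>),
    Receive(Option<String>),
}

type BoxedEvent = Pin<Box<dyn Future<Output = Event>>>;

#[tokio::main]
async fn main() {
    let mut futures: FuturesUnordered<BoxedEvent> = FuturesUnordered::new();
    futures.push(Box::pin(async { Event::Script("script finished".to_string()) }));
    futures.push(Box::pin(async { Event::Send(Ok(())) }));
    futures.push(Box::pin(async { Event::Receive(Some("incoming".to_string())) }));

    // Single driver loop: match on the variant to handle each kind of completion.
    while let Some(event) = futures.next().await {
        match event {
            Event::Script(s) => println!("script: {}", s),
            Event::Send(r) => println!("send: {:?}", r),
            Event::Receive(m) => println!("receive: {:?}", m),
        }
    }
}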
Is there a way to do this better/more efficiently? Or is it just not the way MainWorker was meant to be used?
main for completeness:
#[tokio::main]
async fn main() {
    let main_module = deno_runtime::deno_core::resolve_url(MAIN_MODULE_SPECIFIER).unwrap();
    let user_module = deno_runtime::deno_core::resolve_url(USER_MODULE_SPECIFIER).unwrap();

    let (tx, mut rx) = channel(1);
    let (result_tx, mut result_rx) = channel(1);

    let handle = thread::spawn(move || {
        let runtime = tokio::runtime::Builder::new_multi_thread().enable_all().build().unwrap();
        let mut runner = Runner::new();
        runtime.block_on(runner.run_worker(&mut rx, main_module, user_module));
    });

    tx.send(("test input".to_string(), result_tx)).await.unwrap();
    let result = result_rx.recv().await.unwrap();
    println!("result from worker {}", result);

    handle.join().unwrap();
}
use tokio::runtime::Runtime;

// Create the runtime
let rt = Runtime::new().unwrap();

// Execute the future, blocking the current thread until completion
let s = rt.block_on(async {
    println!("hello");
});
Is it possible to specify an output type for a future block? In the code above, s has type (), but I want it to be Result<(), Error> so I can return an error from inside the block.
I am not too familiar with async Rust, but as far as I know the return type of an async fn or async block is impl Future<Output = TheRealReturnTypeOfFnBody>. Once you block on it, as you did with rt.block_on(...), you get TheRealReturnTypeOfFnBody back.
So to make s have type Result<(), Error>, the body of the block has to produce that type (i.e. make TheRealReturnTypeOfFnBody be Result<(), Error>):
use tokio::runtime::Runtime;

fn main() {
    // Create the runtime
    let rt = Runtime::new().unwrap();

    // Execute the future, blocking the current thread until completion
    let s: () = rt.block_on(async {
        println!("hello");
    });

    let s_with_error_case: Result<(), &str> = rt.block_on(async {
        if false {
            Err("ran into trouble")
        } else {
            println!("everything is fine");
            Ok(())
        }
    });

    if let Err(err_info) = s_with_error_case {
        eprintln!("{}", err_info);
    }
}
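One extra wrinkle, not covered above: if nothing else pins down the error type of the block, you can fix it with a turbofish on the final Ok. A small sketch (the file path and std::io::Error are just illustrative choices):

use tokio::runtime::Runtime;

fn main() {
    let rt = Runtime::new().unwrap();
    let s = rt.block_on(async {
        // `?` converts errors into the block's error type, just as in a regular fn body.
        std::fs::metadata("Cargo.toml")?;
        // The turbofish fixes the block's output as Result<(), std::io::Error>.
        Ok::<(), std::io::Error>(())
    });
    println!("{:?}", s);
}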
I've spent a few hours trying to figure this out and I'm about out of ideas. I found a question with a similar name, but there the problem was something blocking synchronously, which was interfering with tokio. That may well be the issue here too, but I have absolutely no idea what is causing it.
Here is a heavily stripped down version of my project which hopefully gets the issue across.
use std::io;

use futures_util::{
    SinkExt,
    stream::{SplitSink, SplitStream},
    StreamExt,
};
use tokio::{
    net::TcpStream,
    sync::mpsc::{channel, Receiver, Sender},
};
use tokio_tungstenite::{
    connect_async,
    MaybeTlsStream,
    tungstenite::Message,
    WebSocketStream,
};

#[tokio::main]
async fn main() {
    connect_to_server("wss://a_valid_domain.com".to_string()).await;
}

async fn read_line() -> String {
    loop {
        let mut str = String::new();
        io::stdin().read_line(&mut str).unwrap();
        str = str.trim().to_string();
        if !str.is_empty() {
            return str;
        }
    }
}

async fn connect_to_server(url: String) {
    let (ws_stream, _) = connect_async(url).await.unwrap();
    let (write, read) = ws_stream.split();
    let (tx, rx) = channel::<ChannelMessage>(100);

    tokio::spawn(channel_thread(write, rx));
    tokio::spawn(handle_std_input(tx.clone()));

    read_messages(read, tx).await;
}

#[derive(Debug)]
enum ChannelMessage {
    Text(String),
    Close,
}

// PROBLEMATIC FUNCTION
async fn channel_thread(
    mut write: SplitSink<WebSocketStream<MaybeTlsStream<TcpStream>>, Message>,
    mut rx: Receiver<ChannelMessage>,
) {
    while let Some(msg) = rx.recv().await {
        println!("{:?}", msg); // This only fires when the buffer is full
        match msg {
            ChannelMessage::Text(text) => write.send(Message::Text(text)).await.unwrap(),
            ChannelMessage::Close => {
                write.close().await.unwrap();
                rx.close();
                return;
            }
        }
    }
}

async fn read_messages(
    mut read: SplitStream<WebSocketStream<MaybeTlsStream<TcpStream>>>,
    tx: Sender<ChannelMessage>,
) {
    while let Some(msg) = read.next().await {
        let msg = match msg {
            Ok(m) => m,
            Err(_) => continue,
        };
        match msg {
            Message::Text(m) => println!("{}", m),
            Message::Close(_) => break,
            _ => {}
        }
    }
    if !tx.is_closed() {
        let _ = tx.send(ChannelMessage::Close).await;
    }
}

async fn handle_std_input(tx: Sender<ChannelMessage>) {
    loop {
        let str = read_line().await;
        if tx.is_closed() {
            break;
        }
        tx.send(ChannelMessage::Text(str)).await.unwrap();
    }
}
As you can see, what I'm trying to do is:
- connect to a websocket
- print incoming messages from the websocket
- forward any input from stdin to the websocket
- (plus a custom heartbeat solution, which was trimmed out)
The issue lies in the channel_thread() function, into which I move the websocket writer and the channel receiver. The problem is that it only loops over the sent messages once the channel's buffer is full.
I've spent a lot of time trying to solve this; any help is greatly appreciated.
Here, you make a blocking synchronous call in an async context:
async fn read_line() -> String {
    loop {
        let mut str = String::new();
        io::stdin().read_line(&mut str).unwrap();
        // ^^^^^^^^^^^^^^^^^^^
        // This is sync+blocking
        str = str.trim().to_string();
        if !str.is_empty() {
            return str;
        }
    }
}
You should never make blocking synchronous calls in an async context, because they prevent the entire thread from running other async tasks. Your channel receiver task is likely scheduled on that same thread, so it has to wait until the blocking call finishes and whatever invokes this function yields back to the async runtime.
Tokio has its own async version of stdin, which you should use instead.
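A minimal sketch of what that might look like here, reusing ChannelMessage and the mpsc Sender from the question and replacing the blocking read_line with tokio's buffered async stdin (created once, so no input is lost between reads):

use tokio::io::{stdin, AsyncBufReadExt, BufReader};
use tokio::sync::mpsc::Sender;

async fn handle_std_input(tx: Sender<ChannelMessage>) {
    let mut lines = BufReader::new(stdin()).lines();
    // next_line() yields to the runtime instead of blocking the thread.
    while let Ok(Some(line)) = lines.next_line().await {
        let line = line.trim().to_string();
        if line.is_empty() {
            continue;
        }
        if tx.is_closed() {
            break;
        }
        tx.send(ChannelMessage::Text(line)).await.unwrap();
    }
}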
I am trying to write to a HashMap using the Arc<Mutex<T>> pattern as part of a website-scraping exercise inspired by the Rust cookbook.
This first part uses the tokio runtime. I cannot get past the point where the tasks complete and the HashMap is returned; the program just hangs.
type Db = Arc<Mutex<HashMap<String, bool>>>;

pub async fn handle_async_tasks(db: Db) -> BoxResult<HashMap<String, bool>> {
    let links = NodeUrl::new("https://www.inverness-courier.co.uk/")
        .await
        .unwrap();
    let arc = db.clone();
    let mut handles = Vec::new();

    for link in links.links_with_paths {
        let x = arc.clone();
        handles.push(tokio::spawn(async move {
            process(x, link).await;
        }));
    }

    // for handle in handles {
    //     handle.await.expect("Task panicked!");
    // } < I tried this as well>
    futures::future::join_all(handles).await;

    let readables = arc.lock().await;
    for (key, value) in readables.clone().into_iter() {
        println!("Checking db: k, v ==>{} / {}", key, value);
    }
    let clone_db = readables.clone();
    return Ok(clone_db);
}

async fn process(db: Db, url: Url) {
    let mut db = db.lock().await;
    println!("checking {}", url);
    if check_link(&url).await.is_ok() {
        db.insert(url.to_string(), true);
    } else {
        db.insert(url.to_string(), false);
    }
}

async fn check_link(url: &Url) -> BoxResult<bool> {
    let res = reqwest::get(url.as_ref()).await?;
    Ok(res.status() != StatusCode::NOT_FOUND)
}

pub struct NodeUrl {
    domain: String,
    pub links_with_paths: Vec<Url>,
}

#[tokio::main]
async fn main() {
    let db: Db = Arc::new(Mutex::new(HashMap::new()));
    let db = futures::executor::block_on(task::handle_async_tasks(db));
}
I would like to return the HashMap to main(), where the main thread is blocked. How can I wait for all the spawned tasks to complete and then return the HashMap?
let links = NodeUrl::new("https://www.some-site.com/.co.uk/").await.unwrap();
This doesn't seem like a valid URL to me.
async fn process(db: Db, url: Url) {
    let mut db = db.lock().await;
    println!("checking {}", url);
    if check_link(&url).await.is_ok() {
        db.insert(url.to_string(), true);
    } else {
        db.insert(url.to_string(), false);
    }
}
This is highly problematic. You hold the exclusive lock on the database for the duration of the entire request, which makes your application effectively serial.
The default timeout in reqwest is 30 seconds, so if a server isn't responsive and you have a lot of links to go through, the program may simply appear to hang.
You should hold the database lock for as short a time as possible, just long enough to do the insert:
async fn process(db: Db, url: Url) {
    println!("checking {}", url);
    if check_link(&url).await.is_ok() {
        let mut db = db.lock().await;
        db.insert(url.to_string(), true);
    } else {
        let mut db = db.lock().await;
        db.insert(url.to_string(), false);
    }
}
Or even better, eliminating the useless if:
async fn process(db: Db, url: Url) {
    println!("checking {}", url);
    let valid = check_link(&url).await.is_ok();
    db.lock().await.insert(url.to_string(), valid);
}
Finally, you didn't show your main function; how you invoke handle_async_tasks, or whatever else is running there, might also be problematic.
My main issue was how to handle the MutexGuard, which I solved in the end by cloning and returning the inner value.
There was no need to use a futures::executor in main: since we are already inside the tokio runtime, calling .await was enough to get the final value.
Cloning the Arc once was enough; I had been cloning it twice before passing it into the multi-threaded context.
Thanks to @orlp for pointing out the flawed logic around the check_link function.
pub async fn handle_async_tasks() -> BoxResult<HashMap<String, bool>> {
    let get_links = NodeUrl::new("https://www.invernesscourier.co.uk/")
        .await
        .unwrap();
    let db: Db = Arc::new(Mutex::new(HashMap::new()));
    let mut handles = Vec::new();

    for link in get_links.links_with_paths {
        let x = db.clone();
        handles.push(tokio::spawn(async move {
            process(x, link).await;
        }));
    }
    futures::future::join_all(handles).await;

    let guard = db.lock().await;
    let cloned = guard.clone();
    Ok(cloned)
}

#[tokio::main]
async fn main() {
    let db = task::handle_async_tasks().await.unwrap();
    for (key, value) in db.into_iter() {
        println!("Checking db: {} / {}", key, value);
    }
}
This is by no means the best Rust code, but I wanted to share how I tackled things in the end.
I'm trying to make a Stream that waits until a specific character is in the buffer. I know there's read_until() on BufRead, but I need a custom solution, as this is a stepping stone towards waiting until a specific string is in the buffer (or, for example, until a regex matches).
In the project where I first hit the problem, processing of the future simply hung when I got a Ready(_) from the inner future and returned NotReady from my function. I discovered I shouldn't do that per the docs (last paragraph). What I didn't get, however, is what the actual alternative promised in that paragraph is. I've read all the published documentation on the Tokio site and it doesn't make sense to me at the moment.
So the following is my current code. Unfortunately I couldn't make it any simpler or smaller, as it's already broken. The current result is this:
Err(Custom { kind: Other, error: Error(Shutdown) })
Err(Custom { kind: Other, error: Error(Shutdown) })
Err(Custom { kind: Other, error: Error(Shutdown) })
<ad infinitum>
The expected result is to get some Ok(Ready(_)) out of it, printing W and W' along the way, while waiting for the specific character to appear in the buffer.
extern crate futures;
extern crate tokio_core;
extern crate tokio_io;
extern crate tokio_io_timeout;
extern crate tokio_process;

use futures::stream::poll_fn;
use futures::{Async, Poll, Stream};
use tokio_core::reactor::Core;
use tokio_io::AsyncRead;
use tokio_io_timeout::TimeoutReader;
use tokio_process::CommandExt;

use std::process::{Command, Stdio};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

struct Process {
    child: tokio_process::Child,
    stdout: Arc<Mutex<tokio_io_timeout::TimeoutReader<tokio_process::ChildStdout>>>,
}

impl Process {
    fn new(
        command: &str,
        reader_timeout: Option<Duration>,
        core: &tokio_core::reactor::Core,
    ) -> Self {
        let mut cmd = Command::new(command);
        let cat = cmd.stdout(Stdio::piped());
        let mut child = cat.spawn_async(&core.handle()).unwrap();

        let stdout = child.stdout().take().unwrap();
        let mut timeout_reader = TimeoutReader::new(stdout);
        timeout_reader.set_timeout(reader_timeout);
        let timeout_reader = Arc::new(Mutex::new(timeout_reader));

        Self {
            child,
            stdout: timeout_reader,
        }
    }
}

fn work() -> Result<(), ()> {
    let window = Arc::new(Mutex::new(Vec::new()));

    let mut core = Core::new().unwrap();

    let process = Process::new("cat", Some(Duration::from_secs(20)), &core);
    let mark = Arc::new(Mutex::new(b'c'));

    let read_until_stream = poll_fn({
        let window = window.clone();
        let timeout_reader = process.stdout.clone();
        move || -> Poll<Option<u8>, std::io::Error> {
            let mut buf = [0; 8];
            let poll;
            {
                let mut timeout_reader = timeout_reader.lock().unwrap();
                poll = timeout_reader.poll_read(&mut buf);
            }
            match poll {
                Ok(Async::Ready(0)) => Ok(Async::Ready(None)),
                Ok(Async::Ready(x)) => {
                    {
                        let mut window = window.lock().unwrap();
                        println!("W: {:?}", *window);
                        println!("buf: {:?}", &buf[0..x]);
                        window.extend(buf[0..x].into_iter().map(|x| *x));
                        println!("W': {:?}", *window);
                        if let Some(_) = window.iter().find(|c| **c == *mark.lock().unwrap()) {
                            Ok(Async::Ready(Some(1)))
                        } else {
                            Ok(Async::NotReady)
                        }
                    }
                }
                Ok(Async::NotReady) => Ok(Async::NotReady),
                Err(e) => Err(e),
            }
        }
    });

    let _stream_thread = thread::spawn(move || {
        for o in read_until_stream.wait() {
            println!("{:?}", o);
        }
    });

    match core.run(process.child) {
        Ok(_) => {}
        Err(e) => {
            println!("Child error: {:?}", e);
        }
    }

    Ok(())
}

fn main() {
    work().unwrap();
}
This is the complete example project.
If you need more data you need to call poll_read again until you either find what you were looking for or poll_read returns NotReady.
You might want to avoid looping in one task for too long, so you can build yourself a yield_task function to call whenever poll_read returned data but you still need more; it makes sure your task gets scheduled again as soon as possible, after other pending tasks have run.
To use it, just write return yield_task();.
fn yield_inner() {
    use futures::task;
    task::current().notify();
}

#[inline(always)]
pub fn yield_task<T, E>() -> Poll<T, E> {
    yield_inner();
    Ok(Async::NotReady)
}
Also see futures-rs#354: Handle long-running, always-ready futures fairly.
With the new async/await API, futures::task::current is gone; instead you'll need a std::task::Context reference, which is passed as a parameter to the new std::future::Future::poll trait method.
If you're already manually implementing the std::future::Future trait, you can simply insert:
context.waker().wake_by_ref();
return std::task::Poll::Pending;
Or build yourself a Future-implementing type that yields exactly once:
pub struct Yield {
    ready: bool,
}

impl core::future::Future for Yield {
    type Output = ();

    fn poll(self: core::pin::Pin<&mut Self>, cx: &mut core::task::Context<'_>) -> core::task::Poll<Self::Output> {
        let this = self.get_mut();
        if this.ready {
            core::task::Poll::Ready(())
        } else {
            cx.waker().wake_by_ref();
            this.ready = true; // ready next round
            core::task::Poll::Pending
        }
    }
}

pub fn yield_task() -> Yield {
    Yield { ready: false }
}
And then use it in async code like this:
yield_task().await;
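Under tokio specifically, tokio::task::yield_now() provides essentially the same one-shot yield, so the hand-rolled Yield type above is mainly useful if you want to stay runtime-agnostic:

// Equivalent when running on the tokio runtime.
tokio::task::yield_now().await;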