Using synchronous file-IO library in asynchronous code - rust

I want to use a library with synchronous file IO in an asynchronous application. I also want all file operations to work asynchronously. Is that possible?
Something like this:
// function in other crate with synchronous API
fn some_api_fun_with_sync_io(r: &impl std::io::Read) -> Result<(), std::io::Error> {
// ...
}
async fn my_fun() -> anyhow::Result<()> {
let mut async_file = async_std::fs::File::open("test.txt").await?;
// I want some magic here ))
let mut sync_file = magic_async_to_sync_converter(async_file);
some_api_fun_with_sync_io(&mut sync_file)?;
Ok(())
}

I don't think this magic exists yet, but you can conjure it up yourself with async_std::task::block_on:
fn magic_async_to_sync_converter(async_file: AsyncFile) -> Magic {
Magic(async_file)
}
struct Magic(AsyncFile);
impl SyncRead for Magic {
fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
block_on(self.0.read(buf))
}
}
use std::io::Read as SyncRead;
use async_std::{
fs::File as AsyncFile,
io::ReadExt,
task::{block_on, spawn_blocking},
};
But since some_api_fun_with_sync_io is now doing blocking io, you'll have to shove it into a blocking io thread with spawn_blocking:
spawn_blocking(move || some_api_fun_with_sync_io(sync_file)).await?;
You might want to revise your design and see whether you can do without this though. spawn_blocking is still marked as unstable by async_std.

Benchmarking the idea from @Caesar:
use async_std::prelude::*;
use std::time::*;
struct AsyncToSyncWriteCvt<T: async_std::io::Write + Unpin> (T);
impl<T: async_std::io::Write + Unpin> std::io::Write for AsyncToSyncWriteCvt<T> {
fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
async_std::task::block_on(self.0.write(buf))
}
fn flush(&mut self) -> std::io::Result<()> {
async_std::task::block_on(self.0.flush())
}
}
fn test_sync<W: std::io::Write>(mut w: W) -> Result<(), std::io::Error> {
for _ in 0..1000000 { w.write("test test test test ".as_bytes())?; }
Ok(())
}
async fn test_async<T: async_std::io::Write + Unpin>(mut w: T) -> Result<(), std::io::Error> {
for _ in 0..1000000 { w.write("test test test test ".as_bytes()).await?; }
Ok(())
}
fn main() -> anyhow::Result<()> {
async_std::task::block_on(async {
// bench async -> sync IO
let now = Instant::now();
let async_file = async_std::fs::File::create("test1.txt").await?;
let sync_file = AsyncToSyncWriteCvt(async_file);
test_sync(sync_file)?;
println!("Async -> sync: {:.2}s", now.elapsed().as_secs_f32());
// bench sync IO
let now = Instant::now();
let sync_file = std::fs::File::create("test2.txt")?;
test_sync(sync_file)?;
println!("Sync: {:.2}s", now.elapsed().as_secs_f32());
// bench async IO
let now = Instant::now();
let async_file = async_std::fs::File::create("test3.txt").await?;
test_async(async_file).await?;
println!("Async: {:.2}s", now.elapsed().as_secs_f32());
Ok(())
})
}
This code shows the "async -> sync" adapter writing to a file about as fast as plain async writing, but slower than direct sync writing. Wrapping the writer in a BufWriter speeds things up and closes the speed gap between sync and async.
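To illustrate why a BufWriter helps here, the following is a minimal std-only sketch. CountingWriter and write_chunks are hypothetical names introduced for this example and are not part of the benchmark above. The sketch does not measure time; it only counts how often the underlying writer is called, which is the cost that buffering reduces.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical sink that records how often `write` is called on it.
struct CountingWriter {
    calls: usize,
    bytes: usize,
}

impl Write for CountingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        self.bytes += buf.len();
        Ok(buf.len()) // pretend the whole buffer was written
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Writes `n` small chunks and returns (write calls, bytes) seen by the sink.
fn write_chunks(n: usize, buffered: bool) -> io::Result<(usize, usize)> {
    let chunk = b"test test test test ";
    let mut sink = CountingWriter { calls: 0, bytes: 0 };
    if buffered {
        // BufWriter collects the small writes and forwards them in large batches.
        let mut w = BufWriter::new(&mut sink);
        for _ in 0..n {
            w.write_all(chunk)?;
        }
        w.flush()?;
    } else {
        // Every chunk goes straight to the underlying writer.
        for _ in 0..n {
            sink.write_all(chunk)?;
        }
    }
    Ok((sink.calls, sink.bytes))
}
```

Calling write_chunks(1000, false) reaches the sink once per chunk, while write_chunks(1000, true) delivers the same bytes in far fewer calls. With a real file, or with an adapter that calls block_on on every write, each of those calls is comparatively expensive.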

Related

What is the Rust idiom for turning a bunch of tasks into futures that are executed on a thread pool?

In Java I would use an ExecutorService, which is a thread pool with a fixed size, submit(Callable)s to it, and then get() the results.
What is the idiom that matches this in Rust? I could thread::spawn() a bunch of tasks and join() them, but it would create a thread per task, and I want to limit the number of concurrent threads.
In order to make things a little more concrete, here is a rough sketch of some code:
let a4 = thread_pool.spawn(|| svg.compute_stitches("path674"));
let a1 = thread_pool.spawn(|| svg.compute_stitches("path653"));
let a2 = thread_pool.spawn(|| svg.compute_stitches("g659"));
let a3 = thread_pool.spawn(|| svg.compute_stitches("path664"));
let a5 = thread_pool.spawn(|| svg.compute_stitches("path679"));
stitcher.stitch(a1.join());
stitcher.stitch(a2.join());
stitcher.next_color();
stitcher.stitch(a3.join());
stitcher.next_color();
stitcher.stitch(a4.join());
stitcher.next_color();
stitcher.stitch(a5.join());
I have rolled my own solution for the time being. It looks like this:
use std::sync::mpsc;
use std::sync::mpsc::{Receiver, RecvError};
use std::{panic, thread};
pub struct ThreadPool {
channel: spmc::Sender<Mission>,
}
impl ThreadPool {
pub fn new(thread_count: usize) -> ThreadPool {
let (tx, rx) = spmc::channel();
for _ in 0..thread_count {
let rx2 = rx.clone();
thread::spawn(move || Self::work_loop(rx2));
}
ThreadPool { channel: tx }
}
pub fn spawn<F, T: 'static>(&mut self, task: F) -> Answer<T>
where
F: FnOnce() -> T + std::panic::UnwindSafe + Send + 'static,
{
let (tx, rx) = mpsc::channel();
let mission = Mission {
go: Box::new(move || {
let tmp = panic::catch_unwind(task);
tx.send(tmp).unwrap()
}),
};
self.channel.send(mission).unwrap();
Answer { channel: rx }
}
fn work_loop(channel: spmc::Receiver<Mission>) {
while let Ok(mission) = channel.recv() {
(mission.go)();
}
}
}
struct Mission {
go: Box<dyn FnOnce()>,
}
unsafe impl Send for Mission {}
pub struct Answer<T> {
channel: Receiver<std::thread::Result<T>>,
}
impl<T> Answer<T> {
pub fn get(self) -> Result<T, RecvError> {
let tmp = self.channel.recv();
match tmp {
Ok(rval) => match rval {
Ok(rval) => Ok(rval),
Err(explosion) => panic::resume_unwind(explosion),
},
Err(e) => Err(e),
}
}
}
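For comparison, here is a minimal sketch of the same idea using only the standard library. It is one possible shape, not an established idiom: Pool, Pending and Job are hypothetical names, and the single std mpsc receiver is shared between workers behind an Arc<Mutex<..>> instead of using the spmc crate. Requiring Send on the boxed closure lets the compiler check thread safety, so no unsafe impl Send is needed.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

/// A unit of work; the `Send` bound is what makes it safe to move to a worker.
type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical fixed-size pool.
pub struct Pool {
    jobs: Sender<Job>,
}

/// Handle to a result that a worker will send back.
pub struct Pending<T> {
    result: Receiver<T>,
}

impl<T> Pending<T> {
    /// Blocks until the task has finished. Returns `None` if the task
    /// panicked, because its sender is then dropped without sending.
    pub fn join(self) -> Option<T> {
        self.result.recv().ok()
    }
}

impl Pool {
    pub fn new(workers: usize) -> Pool {
        let (tx, rx) = mpsc::channel::<Job>();
        let rx = Arc::new(Mutex::new(rx));
        for _ in 0..workers {
            let rx = Arc::clone(&rx);
            thread::spawn(move || loop {
                // The lock is held only while receiving, not while running the job.
                let job = rx.lock().unwrap().recv();
                match job {
                    Ok(job) => job(),
                    Err(_) => break, // the pool was dropped; no more jobs will arrive
                }
            });
        }
        Pool { jobs: tx }
    }

    pub fn spawn<F, T>(&self, task: F) -> Pending<T>
    where
        F: FnOnce() -> T + Send + 'static,
        T: Send + 'static,
    {
        let (tx, rx) = mpsc::channel();
        let job: Job = Box::new(move || {
            // The caller may have dropped its handle; ignore a failed send.
            let _ = tx.send(task());
        });
        if self.jobs.send(job).is_err() {
            panic!("all worker threads have exited");
        }
        Pending { result: rx }
    }
}
```

Usage follows the sketch in the question: let a1 = pool.spawn(|| ...); and later a1.join(). Note that in this simplified version a panicking task takes its worker thread down with it. The catch_unwind approach in the code above avoids that.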

How to generalize from `File` to `Read`?

I have some working code that reads a file, but I need to generalize it to pull data from additional sources other than simple disk files.
Is Read the correct generalization I should work with in order to replace File?
If so, how can I fix example2 in the following sample code? As is, it fails with the compile error dyn async_std::io::Read cannot be unpinned at the commented line. If not, what type should I return instead from get_read and are there any corresponding changes required in example2?
//! [dependencies]
//! tokio = { version = "1.0.1", features = ["full"] }
//! async-std = "1.8.0"
//! anyhow = "1.0.32"
use async_std::io::prelude::*;
use async_std::fs::File;
use anyhow::Result;
#[tokio::main]
async fn main() -> Result<()> {
example1().await?;
example2().await?;
Ok(())
}
// Example of consuming `File` ... works great!
async fn example1() -> Result<()> {
let mut file = get_file().await?;
let mut contents = String::new();
let _ = file.read_to_string(&mut contents).await?;
println!("read {} characters", contents.len());
Ok(())
}
// Example of consuming `Read` ... does not compile?
async fn example2() -> Result<()> {
let mut read = get_read().await?;
let mut contents = String::new();
// ERROR: `dyn async_std::io::Read` cannot be unpinned
let _ = read.read_to_string(&mut contents).await?;
println!("read {} characters", contents.len());
Ok(())
}
async fn get_read() -> Result<Box<dyn Read>> {
let file = get_file().await?;
Ok(Box::new(file))
}
async fn get_file() -> Result<File> {
let file = File::open("/etc/hosts").await?;
Ok(file)
}
You need to pin:
use std::pin::Pin;
async fn get_read() -> Result<Pin<Box<dyn Read>>> {
let file = get_file().await?;
Ok(Box::pin(file))
}
Box<File> (without Pin) works because File implements Unpin. Box<dyn Read + Unpin> would work too.
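The same Unpin rule can be seen with the standard library alone. The sketch below is an analogy, not the async_std API: it uses std::future::Future in place of async_std::io::Read, and NoopWake, poll_once, answer, pinned and boxed_unpin are names made up for the example. Box<dyn Future> only implements Future when the trait object is also Unpin, while Pin<Box<dyn Future>> always does.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough to poll futures that are ready immediately.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls a future once. The `Unpin` bound mirrors what `read_to_string` demands.
fn poll_once<F: Future + Unpin>(mut fut: F) -> Option<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    match Pin::new(&mut fut).poll(&mut cx) {
        Poll::Ready(value) => Some(value),
        Poll::Pending => None,
    }
}

async fn answer() -> u32 {
    42
}

/// Option 1: pin the box. Works for any future, `Unpin` or not.
fn pinned() -> Pin<Box<dyn Future<Output = u32>>> {
    Box::pin(answer())
}

/// Option 2: keep a plain box but promise `Unpin` in the trait object type.
/// This only accepts futures that really are `Unpin`, such as `ready`.
fn boxed_unpin() -> Box<dyn Future<Output = u32> + Unpin> {
    Box::new(std::future::ready(7u32))
}
```

Both return types satisfy the bounds of poll_once. Changing pinned to return a plain Box<dyn Future<Output = u32>> makes the call fail to compile, with an error of the same kind as the one in the question.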

How can I chain two futures on the same resource without having to define every single method combination ahead of time?

I am writing the code to bootstrap and connect to a 2G/3G network using a SIM800L modem. This modem is interfaced with a single serial channel, which I've muxed outside of this project into 4 channels (data, text interface, control interface, status messages).
In order to bootstrap this, I need to run a series of sequential commands. This sequence changes based on the output of the modem (is the SIM locked? What kind of info does the SIM need to be unlocked? What kind of APN are we getting on? What kind of network selection do we want?). I initially thought that this would be a perfect application for futures, as each individual operation can be very costly in terms of time spent idling (AT+COPS, one of the commands, takes up to 10s to return).
I've arrived at something like this. It compiles and seems to execute the commands sequentially, but the third operation comes out empty. My question is twofold: why do the commands run not pop up in the result of the last future, and is there a more robust way of doing something like this?
#![feature(conservative_impl_trait)]
extern crate futures;
extern crate tokio_core;
use std::sync::{Arc, Mutex};
use futures::{future, Future};
use tokio_core::reactor::Core;
use futures::sync::oneshot;
use std::thread;
use std::io;
use std::time::Duration;
pub struct Channel {
operations: Arc<Mutex<Vec<String>>>,
}
impl Channel {
pub fn ops(&mut self) -> Box<Future<Item = Vec<String>, Error = io::Error>> {
println!("{:?}", self.operations);
let ops = Arc::clone(&self.operations);
let ops = ops.lock().unwrap();
future::ok::<Vec<String>, io::Error>(ops.to_vec()).boxed()
}
pub fn run(&mut self, command: &str) -> Box<Future<Item = Vec<String>, Error = io::Error>> {
let (tx, rx) = oneshot::channel::<Vec<String>>();
let ops = Arc::clone(&self.operations);
let str_cmd = String::from(command);
thread::spawn(move || {
thread::sleep(Duration::new(0, 10000));
let mut ops = ops.lock().unwrap();
ops.push(str_cmd.clone());
println!("Pushing op: {}", str_cmd.clone());
tx.send(vec!["OK".to_string()])
});
rx.map_err(|_| io::Error::new(io::ErrorKind::NotFound, "Test"))
.boxed()
}
}
pub struct Channels {
inner_object: Arc<Mutex<Channel>>,
}
impl Channels {
pub fn one(&self, cmd: &str) -> Box<Future<Item = Vec<String>, Error = io::Error>> {
let v = Arc::clone(&self.inner_object);
let mut v = v.lock().unwrap();
v.run(&cmd)
}
pub fn ops(&self) -> Box<Future<Item = Vec<String>, Error = io::Error>> {
let v = Arc::clone(&self.inner_object);
let mut v = v.lock().unwrap();
v.ops()
}
pub fn run_command(&self) -> Box<Future<Item = (), Error = io::Error>> {
let a = self.one("AT+CMEE=2");
let b = self.one("AT+CREG=0");
let c = self.ops();
Box::new(a.and_then(|result_1| {
assert_eq!(result_1, vec![String::from("OK")]);
b.and_then(|result_2| {
assert_eq!(result_2, vec![String::from("OK")]);
c.map(move |ops| {
assert_eq!(
ops.as_slice(),
["AT+CMEE=2".to_string(), "AT+CREG=0".to_string()]
);
})
})
}))
}
}
fn main() {
let mut core = Core::new().expect("Core should be created");
let channels = Channels {
inner_object: Arc::new(Mutex::new(Channel {
operations: Arc::new(Mutex::new(vec![])),
})),
};
let result = core.run(channels.run_command()).expect("Should've worked");
println!("{:?}", result);
}
why do the commands run not pop up in the result of the last future
Because you haven't sequenced the operations to occur in that way:
let a = self.one("AT+CMEE=2");
let b = self.one("AT+CREG=0");
let c = self.ops();
This immediately builds:
a, b — promises that sleep a while before they respond
c — a promise that gets the ops in the vector
At the point in time that c is created, the sleeps have yet to terminate, so there have been no operations performed, so the vector will be empty.
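The effect can be reproduced without futures. The following std-only sketch is an analogy under stated assumptions: a thread stands in for the delayed command, a channel stands in for the modem's reply, and eager_vs_deferred is a hypothetical name. It reads the shared vector once before the command has completed and once after.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Returns the operations seen (before, after) the command has completed.
fn eager_vs_deferred() -> (Vec<String>, Vec<String>) {
    let ops = Arc::new(Mutex::new(Vec::new()));
    let (go_tx, go_rx) = mpsc::channel::<()>();

    let worker_ops = Arc::clone(&ops);
    let worker = thread::spawn(move || {
        // Stand-in for the slow command: do nothing until told to proceed.
        go_rx.recv().unwrap();
        worker_ops.lock().unwrap().push("AT+CMEE=2".to_string());
    });

    // Eager read, like building `c` up front: the command has not run yet.
    let eager = ops.lock().unwrap().clone();

    // Let the command finish and wait for it.
    go_tx.send(()).unwrap();
    worker.join().unwrap();

    // Deferred read, like reading inside the closure that runs after the command.
    let deferred = ops.lock().unwrap().clone();
    (eager, deferred)
}
```

The first element of the returned tuple is empty and the second contains the command. This is why the read of the operations has to happen inside the chained closure rather than before the chain is built.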
Future::and_then is intended to be used to define sequential operations. This is complicated in your case as you want to use self in the body of the and_then closure. You can clone the Arc<Channel> and use that instead.
You'll note that I've made a number of simplifications:
Returning a String instead of Vec<String>
Removing unused mut qualifiers and a Mutex
Returning the operations Vec directly.
extern crate futures;
extern crate tokio_core;
use std::sync::{Arc, Mutex};
use futures::Future;
use tokio_core::reactor::Core;
use futures::sync::oneshot;
use std::thread;
use std::io;
use std::time::Duration;
pub struct Channel {
operations: Arc<Mutex<Vec<String>>>,
}
impl Channel {
fn ops(&self) -> Vec<String> {
self.operations.lock().unwrap().clone()
}
fn command(&self, command: &str) -> Box<Future<Item = String, Error = io::Error>> {
let (tx, rx) = oneshot::channel();
let ops = Arc::clone(&self.operations);
let str_cmd = String::from(command);
thread::spawn(move || {
thread::sleep(Duration::new(0, 10000));
println!("Pushing op: {}", str_cmd);
ops.lock().unwrap().push(str_cmd);
tx.send("OK".to_string())
});
Box::new(rx.map_err(|_| io::Error::new(io::ErrorKind::NotFound, "Test")))
}
}
struct Channels {
data: Arc<Channel>,
}
impl Channels {
fn run_command(&self) -> Box<Future<Item = (), Error = io::Error>> {
let d2 = Arc::clone(&self.data);
let d3 = Arc::clone(&self.data);
Box::new(
self.data
.command("AT+CMEE=2")
.and_then(move |cmee_answer| {
assert_eq!(cmee_answer, "OK"); // This should be checked in `command` and be a specific Error
d2.command("AT+CREG=0")
})
.map(move |creg_answer| {
assert_eq!(creg_answer, "OK"); // This should be checked in `command` and be a specific Error
let ops = d3.ops();
assert_eq!(ops, ["AT+CMEE=2", "AT+CREG=0"])
}),
)
}
}
fn main() {
let mut core = Core::new().expect("Core should be created");
let channels = Channels {
data: Arc::new(Channel {
operations: Arc::new(Mutex::new(vec![])),
}),
};
let result = core.run(channels.run_command()).expect("Should've worked");
println!("{:?}", result);
}
However, this isn't the type of code I usually see with futures. Instead of taking &self, many futures take self. Let's see how that would look:
extern crate futures;
extern crate tokio_core;
use std::sync::{Arc, Mutex};
use futures::Future;
use tokio_core::reactor::Core;
use futures::sync::oneshot;
use std::thread;
use std::io;
use std::time::Duration;
#[derive(Clone)]
pub struct Channel {
operations: Arc<Mutex<Vec<String>>>,
}
impl Channel {
fn ops(&self) -> Arc<Mutex<Vec<String>>> {
Arc::clone(&self.operations)
}
fn command(self, command: &str) -> Box<Future<Item = (Self, String), Error = io::Error>> {
let (tx, rx) = oneshot::channel();
let str_cmd = String::from(command);
thread::spawn(move || {
thread::sleep(Duration::new(0, 10000));
println!("Pushing op: {}", str_cmd);
self.operations.lock().unwrap().push(str_cmd);
tx.send((self, "OK".to_string()))
});
Box::new(rx.map_err(|_| io::Error::new(io::ErrorKind::NotFound, "Test")))
}
}
struct Channels {
data: Channel,
}
impl Channels {
fn run_command(self) -> Box<Future<Item = (), Error = io::Error>> {
Box::new(
self.data
.clone()
.command("AT+CMEE=2")
.and_then(|(channel, cmee_answer)| {
assert_eq!(cmee_answer, "OK");
channel.command("AT+CREG=0")
})
.map(|(channel, creg_answer)| {
assert_eq!(creg_answer, "OK");
let ops = channel.ops();
let ops = ops.lock().unwrap();
assert_eq!(*ops, ["AT+CMEE=2", "AT+CREG=0"]);
}),
)
}
}
fn main() {
let mut core = Core::new().expect("Core should be created");
let channels = Channels {
data: Channel {
operations: Arc::new(Mutex::new(vec![])),
},
};
let result = core.run(channels.run_command()).expect("Should've worked");
println!("{:?}", result);
}

How can I pass a socket as an argument to a function being called within a thread?

I'm going to have multiple functions that all need access to one main socket.
Would it be better to:
Pass this socket to each function that needs access to it
Have a globally accessible socket
Can someone provide an example of the best way to do this?
I come from a Python/Nim background where things like this are easily done.
Edit:
How can I pass a socket as an arg to a function being called within a thread?
Ex.
fn main() {
let mut s = BufferedStream::new((TcpStream::connect(server).unwrap()));
let thread = Thread::spawn(move || {
func1(s, arg1, arg2);
});
while true {
func2(s, arg1);
}
}
Answer for updated question
We can use TcpStream::try_clone:
use std::io::Read;
use std::net::{TcpStream, Shutdown};
use std::thread;
fn main() {
let mut stream = TcpStream::connect("127.0.0.1:34254").unwrap();
let stream2 = stream.try_clone().unwrap();
let _t = thread::spawn(move || {
// close this stream after one second
thread::sleep(std::time::Duration::from_secs(1)); // `sleep_ms` is deprecated
stream2.shutdown(Shutdown::Read).unwrap();
});
// wait for some data, will get canceled after one second
let mut buf = [0];
stream.read(&mut buf).unwrap();
}
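The example above needs a server listening on a fixed port. As a self-contained variation, the sketch below starts its own echo server on an ephemeral local port, then writes through a cloned handle on one thread while reading through the original handle on another. The function name echo_via_cloned_handle and the "ping" payload are made up for the illustration.

```rust
use std::io::{self, Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

/// Sends "ping" through a cloned handle and reads the echo on the original one.
fn echo_via_cloned_handle() -> io::Result<Vec<u8>> {
    // Port 0 asks the OS for any free port.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;

    // Tiny echo server: reads four bytes and sends them back.
    let server = thread::spawn(move || -> io::Result<()> {
        let (mut conn, _) = listener.accept()?;
        let mut buf = [0u8; 4];
        conn.read_exact(&mut buf)?;
        conn.write_all(&buf)
    });

    let mut reader = TcpStream::connect(addr)?;
    // Both handles refer to the same underlying socket.
    let mut writer = reader.try_clone()?;
    let sender = thread::spawn(move || writer.write_all(b"ping"));

    let mut reply = [0u8; 4];
    reader.read_exact(&mut reply)?;

    sender.join().expect("sender thread panicked")?;
    server.join().expect("server thread panicked")?;
    Ok(reply.to_vec())
}
```

Each thread owns its own TcpStream value, so no locking is needed. Writes made through several clones at once can still interleave on the wire, so this fits best when one thread reads and another writes.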
Original answer
It's usually (let's say 99.9% of the time) a bad idea to have any global mutable state, if you can help it. Just do as you said: pass the socket to the functions that need it.
use std::io::{self, Write};
use std::net::TcpStream;
fn send_name(stream: &mut TcpStream) -> io::Result<()> {
stream.write_all(&[42])?; // `write` may send only part of the buffer
Ok(())
}
fn send_number(stream: &mut TcpStream) -> io::Result<()> {
stream.write_all(&[1, 2, 3])?;
Ok(())
}
fn main() {
let mut stream = TcpStream::connect("127.0.0.1:31337").unwrap();
let r = send_name(&mut stream).and_then(|_| send_number(&mut stream));
match r {
Ok(..) => println!("Yay, sent!"),
Err(e) => println!("Boom! {}", e),
}
}
You could also pass the TcpStream to a struct that manages it, and thus gives you a place to put similar methods.
use std::io::{self, Write};
use std::net::TcpStream;
struct GameService {
stream: TcpStream,
}
impl GameService {
fn send_name(&mut self) -> io::Result<()> {
self.stream.write_all(&[42])?; // `write` may send only part of the buffer
Ok(())
}
fn send_number(&mut self) -> io::Result<()> {
self.stream.write_all(&[1, 2, 3])?;
Ok(())
}
}
fn main() {
let stream = TcpStream::connect("127.0.0.1:31337").unwrap();
let mut service = GameService { stream };
let r = service.send_name().and_then(|_| service.send_number());
match r {
Ok(..) => println!("Yay, sent!"),
Err(e) => println!("Boom! {}", e),
}
}
None of this is really Rust-specific; these are generally applicable programming practices.

std::sync::Arc of trait in Rust

I am trying to implement a library for making TCP servers.
This is heavily simplified code that shows the problem:
#![crate_name="http_server2"]
#![crate_type="lib"]
use std::io::{TcpListener, Listener, Acceptor, TcpStream, IoResult, Reader, Writer};
use std::ops::Fn;
use std::sync::Arc;
pub trait Handler: Sized + Send {
fn do_it(s: TcpStream) -> IoResult<()>;
}
fn serve(handler: Arc<Handler + Sized>) -> IoResult<()>
{
let listener = TcpListener::bind("127.0.0.1", 1234);
for stream in try!(listener.listen()).incoming() {
let stream = try!(stream);
let handler = handler.clone();
spawn(proc() {
handler.do_it(stream);
});
}
Ok(())
}
The compiler totally ignores my Handler + Sized specification. If I implement a structure with the Handler trait and try to call serve with that structure, the size bound is ignored too ( http://is.gd/OWs22i ).
<anon>:13:1: 25:2 error: the trait `core::kinds::Sized` is not implemented for the type `Handler+'static+Sized`
<anon>:13 fn serve(handler: Arc<Handler + Sized>) -> IoResult<()>
<anon>:14 {
<anon>:15 let listener = TcpListener::bind("127.0.0.1", 1234);
<anon>:16
<anon>:17 for stream in try!(listener.listen()).incoming() {
<anon>:18 let stream = try!(stream);
...
<anon>:13:1: 25:2 note: the trait `core::kinds::Sized` must be implemented because it is required by `alloc::arc::Arc`
<anon>:13 fn serve(handler: Arc<Handler + Sized>) -> IoResult<()>
<anon>:14 {
<anon>:15 let listener = TcpListener::bind("127.0.0.1", 1234);
<anon>:16
<anon>:17 for stream in try!(listener.listen()).incoming() {
<anon>:18 let stream = try!(stream);
...
error: aborting due to previous error
How can I implement a single generic, multithreaded function that accepts different handlers?
As I said in my comment above,
use std::io::{TcpListener, Listener, Acceptor, TcpStream, IoResult, Writer};
use std::sync::Arc;
pub trait Handler: Sized + Send {
fn do_it(&self, s: TcpStream) -> IoResult<()>;
}
fn serve<T: Handler + Sized + Send + Sync>(handler: Arc<T>) -> IoResult<()> {
let listener = TcpListener::bind("127.0.0.1", 1234);
for stream in try!(listener.listen()).incoming() {
let stream = try!(stream);
let handler = handler.clone();
spawn(proc() {
let _ = handler.do_it(stream);
});
}
Ok(())
}
struct Hello {
x: u32,
}
impl Handler for Hello {
fn do_it(&self, mut s: TcpStream) -> IoResult<()> { s.write_le_u32(self.x) }
}
fn main() {
let s = Arc::new(Hello{x: 123,});
let _ = serve(s);
}
compiles fine. (playpen)
Changes
Make do_it take &self.
Make serve generic, by adding a type parameter with the constraints you want.
Make the impl of Handler for Hello in do_it not discard the result of the write (remove ;).
Clarify with let _ = ... that we intentionally discard a result.
You will not be able to execute it in the playpen though (application terminated abnormally with signal 31 (Bad system call)), as the playpen forbids IO (network IO in this case). It runs fine on my local box though.
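The code above targets a pre-1.0 version of Rust (proc, std::io::TcpListener, try!). A rough equivalent against the current standard library might look like the sketch below. It is an adaptation, not the original answer's code: the max_conns parameter is a hypothetical addition so that the function returns, and the listener is passed in so the caller can choose the port.

```rust
use std::io::{self, Write};
use std::net::{TcpListener, TcpStream};
use std::sync::Arc;
use std::thread;

/// `Send + Sync + 'static` lets an `Arc<T>` of any handler move into a thread.
pub trait Handler: Send + Sync + 'static {
    fn do_it(&self, s: TcpStream) -> io::Result<()>;
}

/// Serves `max_conns` connections, one thread each, then returns.
/// (`max_conns` is a hypothetical limit added for this sketch.)
fn serve<T: Handler>(
    listener: TcpListener,
    handler: Arc<T>,
    max_conns: usize,
) -> io::Result<()> {
    let mut workers = Vec::new();
    for stream in listener.incoming().take(max_conns) {
        let stream = stream?;
        let handler = Arc::clone(&handler);
        workers.push(thread::spawn(move || {
            // Intentionally discard the per-connection result.
            let _ = handler.do_it(stream);
        }));
    }
    for worker in workers {
        let _ = worker.join();
    }
    Ok(())
}

struct Hello {
    x: u32,
}

impl Handler for Hello {
    fn do_it(&self, mut s: TcpStream) -> io::Result<()> {
        // Little-endian bytes, matching `write_le_u32` in the original code.
        s.write_all(&self.x.to_le_bytes())
    }
}
```

A caller would bind a listener, for example with TcpListener::bind("127.0.0.1:1234"), and pass it in together with Arc::new(Hello { x: 123 }). The generic parameter keeps static dispatch as in the answer above. In current Rust, Arc<dyn Handler> is also accepted, because Arc no longer requires a Sized payload.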

Resources