futures-rs using Stream combinators on `BoxStream`s - rust

Using the futures-rs library, I've encountered a situation where a stream needs to be mapped through an indeterminate number of other streams before being returned to the user. Since the exact type of the output stream is unknown at the end of this operation, I've been using a BoxStream trait object while storing the stream in a struct and when returning it.
Although this approach works fine, it has the unfortunate side effect of causing the inner Stream object to be unsized. This is a problem because every one of the stream combinators require Self: Sized in their signatures meaning that I can't even wait() on the returned BoxStream in order to convert it into a blocking iterator.
Here's an example of a situation that could lead to this issue:
struct Server {
receiver: Option<Box<Stream<Item = usize, Error = ()> + Send>>,
}
impl Server {
pub fn new() -> Server {
let (tx, rx) = channel(0);
// do things with the tx (subscribe to tcp socket, connect to database, etc.)
Server { receiver: Some(rx.boxed()) }
}
/// Maps the inner `Receiver` through another stream, essentially duplicating it.
pub fn get_stream(&mut self) -> Result<Box<Stream<Item = usize, Error = ()> + Send>, ()> {
let (tx, rx) = channel(0);
let strm = self.receiver.take().unwrap();
let mut tx_opt = Some(tx);
let new_strm = strm.map(move |msg| {
// unfortunate workaround needed since `send()` takes `self`
let mut tx = tx_opt.take().unwrap();
tx = tx.send(msg.clone()).wait().unwrap();
tx_opt = Some(tx);
msg
});
simbroker.receiver = Some(new_strm.boxed());
Ok(rx.boxed())
}
}
pub fn main() {
let server = Server::new();
// possible that this may happen 0..n times
let rx: BoxStream<usize, ()> = server.get_stream();
// can't do this since the inner `Stream` trait object isn't `Sized` and `wait()`
// (along with all other stream combinators) requires that in their signatures.
for msg in rx.wait() {
// compiler error here
// ...
}
}
As shown by the above code, BoxStreams are necessary since calling map() on a stream changes its type from Receiver to Map which would make it impossible to store back into the struct. It's impossible to do pretty much anything with the returned BoxStream since it's ?Sized. In fact, the only function that's available for trait-object Streams is poll() which is supposed to never be called outside of a Task.
Is there any way that I can avoid this problem without doing something like returning an enum containing any one of the possible varieties of stream that could possibly occur? Would writing my own struct that implements Stream fix the issue somehow?

As pointed out by #FrancisGagné in a comment, futures-rs declares impl<S: ?Sized + Stream> Stream for Box<S> in the futures::Stream module. In the test in which my code was, I had failed to import Stream so that trait wasn't in scope.
The compiler didn't trigger an error for the lack of the wait() function because it had the unsized issue first.
This was resolved by adding use futures::Stream; to the start of the function.

Related

Rust member functions as multithreaded mutable callbacks in real-time safe code

I am looking to use a member function of a struct as a mutable callback. The standard way I would do this is by wrapping my struct instance in a mutex and then in an Arc. In this use case however, my callback is a real-time audio callback which needs to be real-time safe and is triggered on a dedicated thread. This means no locking can happen that might delay the callback from occurring.
My actual member callback method must take a mutable reference to the struct (I think this is where things fall apart). The struct does contain members but these can all be atomic variables.
Here is an example of the kind of thing I'm looking for:
use std::sync::Arc;
use clap::error;
use cpal::{traits::{HostTrait, DeviceTrait}, StreamConfig, OutputCallbackInfo, StreamError};
use std::sync::atomic::AtomicI16;
pub struct AudioProcessor {
var: AtomicI16,
}
impl AudioProcessor {
fn audio_block_f32(&mut self, audio: &mut [f32], _info: &OutputCallbackInfo) {
}
fn audio_error(&mut self, error: StreamError) {
}
}
fn initAudio()
{
let out_dev = cpal::default_host().default_output_device().expect("No available output device found");
let mut supported_configs_range = out_dev.supported_output_configs().expect("Could not obtain device configs");
let config = supported_configs_range.next().expect("No available configs").with_max_sample_rate();
let proc = Arc::new(AudioProcessor{var: AtomicI16::new(5)});
let audio_callback_instance = proc.clone();
let error_callback_instance = proc.clone();
out_dev.build_output_stream(&StreamConfig::from(config), move |audio: &mut [f32], info: &OutputCallbackInfo| audio_callback_instance.audio_block_f32(audio, info), move |stream_error| error_callback_instance.clone().audio_error(stream_error));
}
Currently this code fails in the build_output_stream() function with this error message:
cannot borrow data in an Arc as mutable
trait DerefMut is required to modify through a dereference, but it is not implemented for Arc<AudioProcessor>
Are there any standard ways of dealing with problems like this?

How do I define the lifetime for a tokio task spawned from a class?

I'm attempting to write a generic set_interval function helper:
pub fn set_interval<F, Fut>(mut f: F, dur: Duration)
where
F: Send + 'static + FnMut() -> Fut,
Fut: Future<Output = ()> + Send + 'static,
{
let mut interval = tokio::time::interval(dur);
tokio::spawn(async move {
// first tick is at 0ms
interval.tick().await;
loop {
interval.tick().await;
tokio::spawn(f());
}
});
}
This works fine until it's called from inside a class:
fn main() {}
struct Foo {}
impl Foo {
fn bar(&self) {
set_interval(|| self.task(), Duration::from_millis(1000));
}
async fn task(&self) {
}
}
self is not 'static, and we can't restrict lifetime parameter to something that is less than 'static because of tokio::task.
Is it possible to modify set_interval implementation so it works in cases like this?
Link to playground
P.S. Tried to
let instance = self.clone();
set_interval(move || instance.task(), Duration::from_millis(1000));
but I also get an error: error: captured variable cannot escape FnMut closure body
Is it possible to modify set_interval implementation so it works in cases like this?
Not really. Though spawn-ing f() really doesn't help either, as it precludes a simple "callback owns the object" solution (as you need either both callback and future to own the object, or just future).
I think that leaves two solutions:
Convert everything to shared mutability Arc, the callback owns one Arc, then on each tick it clones that and moves the clone into the future (the task method).
Have the future (task) acquire the object from some external source instead of being called on one, this way the intermediate callback doesn't need to do anything. Or the callback can do the acquiring and move that into the future, same diff.
Incidentally at this point it could make sense to just create the future directly, but allow cloning it. So instead of taking a callback set_interval would take a clonable future, and it would spawn() clones of its stored future instead of creating them anew.
As mentioned by #Masklinn, you can clone the Arc to allow for this. Note that cloning the Arc will not clone the underlying data, just the pointer, so it is generally OK to do so, and should not have a major impact on performance.
Here is an example. The following code will produce the error async block may outlive the current function, but it borrows data, which is owned by the current function:
fn main() {
// 🛑 Error: async block may outlive the current function, but it borrows data, which is owned by the current function
let data = Arc::new("Hello, World".to_string());
tokio::task::spawn(async {
println!("1: {}", data.len());
});
tokio::task::spawn(async {
println!("2: {}", data.len());
});
}
Rust unhelpfully suggests adding move to both async blocks, but that will result in a borrowing error because there would be multiple ownership.
To fix the problem, we can clone the Arc for each task and then add the move keyword to the async blocks:
fn main() {
let data = Arc::new("Hello, World".to_string());
let data_for_task_1 = data.clone();
tokio::task::spawn(async move {
println!("1: {}", data_for_task_1.len());
});
let data_for_task_2 = data.clone();
tokio::task::spawn(async move {
println!("2: {}", data_for_task_2.len());
});
}

AsyncRead wrapper over sync read

I met this problem while implementing AsyncRead over a synchronized read to adjust to the async world in Rust.
The sync read implementation I'm handling is a wrapper over a raw C sync implementation, much like the std::fs::File::read; therefore I would use std::io::Read for simplicity hereafter.
Here's the code:
use futures::{AsyncRead, Future};
use std::task::{Context, Poll};
use std::pin::Pin;
use tokio::task;
use std::fs::File;
use std::io::Read;
use std::io::Result;
struct FileAsyncRead {
path: String
}
impl AsyncRead for FileAsyncRead {
fn poll_read(self: Pin<&mut Self>, cx: &mut Context<'_>, buf: &mut [u8]) -> Poll<Result<usize>> {
let path = self.path.to_owned();
let buf_len = buf.len();
let mut handle = task::spawn_blocking(move || {
let mut vec = vec![0u8; buf_len];
let mut file = File::open(path).unwrap();
let len = file.read(vec.as_mut_slice());
(vec, len)
});
match Pin::new(&mut handle).poll(cx) {
Poll::Ready(l) => {
let v_l = l.unwrap();
let _c_l = v_l.0.as_slice().read(buf);
Poll::Ready(v_l.1)
}
Poll::Pending => Poll::Pending
}
}
}
The current implementation is creating a new vector of the same size with the outer buf: &mut [u8] each time because of :
`buf` has an anonymous lifetime `'_` but it needs to satisfy a `'static` lifetime requirement
buf: &mut [u8],
| --------- this data with an anonymous lifetime `'_`...
My question is:
Is that possible to avoid the vector creation in spwan_blocking and mutate the buf in poll_read? To avoid vector allocation as well as copying?
Is there a better way to express this "wrapper" logic instead of spawn_blocking as well as Pin::new(&mut handle).poll(cx)? What's the more idiomatic way to do this in Rust?
Something is odd about this code:
If this code is called once, it will likely return Poll::Pending, because spawn_blocking takes time to even start a task.
If this is called multiple times, then it creates multiple unrelated tasks reading the same part of the file and potentially ignoring the result due to (1), which is probably not what you want.
What you could do to fix this is to remember the task inside the FileAsyncRead struct first time you create it, and then on the next call only start a new task if needed, and poll the existing task.
With this API you have it doesn't seem possible to avoid double buffering, because since your API is blocking, and the ReadBuf buffer is not shared, you need to do a blocking read into some other buffer, and then copy the data over when a new non-blocking call poll_read() arrives.

impl Stream cannot be unpinned

I'm trying to get data using crates_io_api.
I attempted to get data from a stream, but
I can not get it to work.
AsyncClient::all_crates returns an impl Stream. How do I get data from it? It would be helpful if you provide code.
I checked out the async book but it didn't work.
Thank you.
Here's my current code.
use crates_io_api::{AsyncClient, Error};
use futures::stream::StreamExt;
async fn get_all(query: Option<String>) -> Result<crates_io_api::Crate, Error> {
// Instantiate the client.
let client = AsyncClient::new(
"test (test#test.com)",
std::time::Duration::from_millis(10000),
)?;
let stream = client.all_crates(query);
// what should I do after?
// ERROR: `impl Stream cannot be unpinned`
while let Some(item) = stream.next().await {
// ...
}
}
This looks like a mistake on the side of crates_io_api. Getting the next element of a Stream requires that the Stream is Unpin:
pub fn next(&mut self) -> Next<'_, Self> where
Self: Unpin,
Because Next stores a reference to Self, you must guarantee that Self is not moved during the process, or risk pointer invalidation. This is what the Unpin marker trait represents. crates_io_api does not provide this guarantee (although they could, and should be), so you must make it yourself. To convert a !Unpin type to a Unpin type, you can pin it to a heap allocation:
use futures::stream::StreamExt;
let stream = client.all_crates(query).boxed();
// boxed simply calls Box::pin
while let Some(elem) = stream.next() { ... }
Or you can pin it to the stack with the pin_mut!/pin! macro:
let stream = client.all_crates(query);
futures::pin_mut!(stream);
while let Some(elem) = stream.next() { ... }
Alternatively, you could use a combinator that does not require Unpin such as for_each:
stream.for_each(|elem| ...)

Why do I get an error that "Sync is not satisfied" when moving self, which contains an Arc, into a new thread?

I have a struct which holds an Arc<Receiver<f32>> and I'm trying to add a method which takes ownership of self, and moves the ownership into a new thread and starts it. However, I'm getting the error
error[E0277]: the trait bound `std::sync::mpsc::Receiver<f32>: std::marker::Sync` is not satisfied
--> src/main.rs:19:9
|
19 | thread::spawn(move || {
| ^^^^^^^^^^^^^ `std::sync::mpsc::Receiver<f32>` cannot be shared between threads safely
|
= help: the trait `std::marker::Sync` is not implemented for `std::sync::mpsc::Receiver<f32>`
= note: required because of the requirements on the impl of `std::marker::Send` for `std::sync::Arc<std::sync::mpsc::Receiver<f32>>`
= note: required because it appears within the type `Foo`
= note: required because it appears within the type `[closure#src/main.rs:19:23: 22:10 self:Foo]`
= note: required by `std::thread::spawn`
If I change the struct to hold an Arc<i32> instead, or just a Receiver<f32>, it compiles, but not with a Arc<Receiver<f32>>. How does this work? The error doesn't make sense to me as I'm not trying to share it between threads (I'm moving it, not cloning it).
Here is the full code:
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Arc;
use std::thread;
pub struct Foo {
receiver: Arc<Receiver<f32>>,
}
impl Foo {
pub fn new() -> (Foo, Sender<f32>) {
let (sender, receiver) = channel::<f32>();
let sink = Foo {
receiver: Arc::new(receiver),
};
(sink, sender)
}
pub fn run_thread(self) -> thread::JoinHandle<()> {
thread::spawn(move || {
println!("Thread spawned by 'run_thread'");
self.run(); // <- This line gives the error
})
}
fn run(mut self) {
println!("Executing 'run'")
}
}
fn main() {
let (example, sender) = Foo::new();
let handle = example.run_thread();
handle.join();
}
How does this work?
Let's check the requirements of thread::spawn again:
pub fn spawn<F, T>(f: F) -> JoinHandle<T>
where
F: FnOnce() -> T,
F: Send + 'static, // <-- this line is important for us
T: Send + 'static,
Since Foo contains an Arc<Receiver<_>>, let's check if and how Arc implements Send:
impl<T> Send for Arc<T>
where
T: Send + Sync + ?Sized,
So Arc<T> implements Send if T implements Send and Sync. And while Receiver implements Send, it does not implement Sync.
So why does Arc have such strong requirements for T? T also has to implement Send because Arc can act like a container; if you could just hide something that doesn't implement Send in an Arc, send it to another thread and unpack it there... bad things would happen. The interesting part is to see why T also has to implement Sync, which is apparently also the part you are struggling with:
The error doesn't make sense to me as I'm not trying to share it between threads (I'm moving it, not cloning it).
The compiler can't know that the Arc in Foo is in fact not shared. Consider if you would add a #[derive(Clone)] to Foo later (which is possible without a problem):
fn main() {
let (example, sender) = Foo::new();
let clone = example.clone();
let handle = example.run_thread();
clone.run();
// oopsie, now the same `Receiver` is used from two threads!
handle.join();
}
In the example above there is only one Receiver which is shared between threads. And this is no good, since Receiver does not implement Sync!
To me this code raises the question: why the Arc in the first place? As you noticed, without the Arc, it works without a problem: you clearly state that Foo is the only owner of the Receiver. And if you are "not trying to share [the Receiver]" anyway, there is no point in having multiple owners.

Resources