impl Stream cannot be unpinned - rust

I'm trying to get data using crates_io_api.
I attempted to get data from a stream, but I cannot get it to work.
AsyncClient::all_crates returns an impl Stream. How do I get data from it? It would be helpful if you could provide code.
I checked out the async book but it didn't work.
Thank you.
Here's my current code.
use crates_io_api::{AsyncClient, Error};
use futures::stream::StreamExt;
async fn get_all(query: Option<String>) -> Result<crates_io_api::Crate, Error> {
// Instantiate the client.
let client = AsyncClient::new(
"test (test#test.com)",
std::time::Duration::from_millis(10000),
)?;
let stream = client.all_crates(query);
// what should I do after?
// ERROR: `impl Stream cannot be unpinned`
while let Some(item) = stream.next().await {
// ...
}
}

This looks like a mistake on the side of crates_io_api. Getting the next element of a Stream requires that the Stream is Unpin:
pub fn next(&mut self) -> Next<'_, Self> where
Self: Unpin,
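Next only holds a plain &mut Self, but polling a stream requires Pin<&mut Self>. Turning a &mut Self into a Pin<&mut Self> is only safe if Self: Unpin; otherwise the stream could be moved after it has been polled, invalidating any pointers it holds into itself (streams built from async code often contain such self-references). That guarantee is exactly what the Unpin marker trait represents. crates_io_api does not provide it (although it could, and arguably should), so you must provide it yourself. To get an Unpin handle to a !Unpin stream, you can pin it to a heap allocation: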
use futures::stream::StreamExt;
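let mut stream = client.all_crates(query).boxed();
// boxed() is just Box::pin(self), returning Pin<Box<dyn Stream + Send>>;
// use boxed_local() instead if the stream is not Send
while let Some(elem) = stream.next().await { ... }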
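Or you can pin it to the stack with the futures::pin_mut! macro (or std::pin::pin!, stable since Rust 1.68, or tokio::pin!):
let stream = client.all_crates(query);
futures::pin_mut!(stream); // shadows `stream` with a Pin<&mut _>, which is Unpin
while let Some(elem) = stream.next().await { ... }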
Alternatively, you could use a combinator that does not require Unpin, such as for_each, which takes the stream by value:
stream.for_each(|elem| async move { ... }).await;
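The mechanism behind this answer can be reproduced with the standard library alone. In the sketch below, `MiniStream`, `Countdown`, `drain_boxed` and `drain_stack_pinned` are hypothetical names: `MiniStream` is a stripped-down imitation of `futures::Stream` and `StreamExt::next` (no `Context`, no wakers), so treat it as a model of where the `Unpin` requirement comes from, not as the crate's actual code.

```rust
use std::marker::PhantomPinned;
use std::ops::DerefMut;
use std::pin::{pin, Pin};

// Hypothetical, simplified stand-in for `futures::Stream` (no Context/wakers).
trait MiniStream {
    type Item;

    // Like `Stream::poll_next`: the stream may rely on never being moved again.
    fn poll_next(self: Pin<&mut Self>) -> Option<Self::Item>;

    // Shaped like `StreamExt::next`: a plain `&mut self` can only be
    // pinned safely when `Self: Unpin`.
    fn next(&mut self) -> Option<Self::Item>
    where
        Self: Unpin,
    {
        Pin::new(self).poll_next()
    }
}

// Deliberately `!Unpin`, playing the role of the stream returned by `all_crates`.
struct Countdown {
    remaining: u32,
    _pinned: PhantomPinned,
}

impl Countdown {
    fn new(remaining: u32) -> Self {
        Countdown { remaining, _pinned: PhantomPinned }
    }
}

impl MiniStream for Countdown {
    type Item = u32;

    fn poll_next(self: Pin<&mut Self>) -> Option<u32> {
        // SAFETY: we only update a field in place and never move out of `self`.
        let this = unsafe { self.get_unchecked_mut() };
        if this.remaining == 0 {
            return None;
        }
        this.remaining -= 1;
        Some(this.remaining + 1)
    }
}

// A pinned pointer is itself `Unpin`; forwarding the trait through `Pin<P>`
// (as futures does for `Stream`) makes `next()` callable on it.
impl<P> MiniStream for Pin<P>
where
    P: DerefMut + Unpin,
    P::Target: MiniStream,
{
    type Item = <P::Target as MiniStream>::Item;

    fn poll_next(self: Pin<&mut Self>) -> Option<Self::Item> {
        self.get_mut().as_mut().poll_next()
    }
}

// Heap pinning: the analogue of `.boxed()` / `Box::pin(stream)`.
fn drain_boxed() -> Vec<u32> {
    let mut stream = Box::pin(Countdown::new(3));
    let mut out = Vec::new();
    while let Some(n) = stream.next() {
        out.push(n);
    }
    out
}

// Stack pinning: the analogue of `futures::pin_mut!` / `std::pin::pin!`.
fn drain_stack_pinned() -> Vec<u32> {
    let mut stream = pin!(Countdown::new(3));
    let mut out = Vec::new();
    while let Some(n) = stream.next() {
        out.push(n);
    }
    out
}
```

Calling `Countdown::new(3).next()` directly is rejected because `Countdown` is `!Unpin`; both `Pin<Box<Countdown>>` and `Pin<&mut Countdown>` are `Unpin`, so `next()` is available on them.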

Related

Owned variable seems to not issue a borrow that lives as long as required by a serde deserialize lifetime

I am trying to write a trait that allows for gzip encode/decode of arbitrary (de)serializable structs. My primary use case is to persist some stateful struct on disk via a clean API. To that end, any time a struct S implements serde's Serialize and Deserialize, and our trait is in scope, a gzipped + serialized copy of it should be read/written by/to anything that is Read/Write on demand.
For example:
A trait that describes the API for reading/writing of some (de)serializable struct.
use flate2::read::GzDecoder;
use flate2::write::GzEncoder;
use flate2::Compression;
use serde::{Serialize, Deserialize};
use rmp_serde::Serializer;
use std::hash::Hash;
use std::io::{Read, Write};
pub type Result<T, E = std::io::Error> = std::result::Result<T, E>;
pub trait ReadWriteState<S: Serialize + Deserialize> {
/// Write the given persistent state to a stream.
fn write_state(&mut self, state: &S) -> Result<usize>;
/// Read the persistent state back from a stream.
fn read_state(&mut self) -> Result<S>;
}
Blanket implementation of ReadWriteState for (de)serializable states, via anything that is both std::io::Read and std::io::Write.
impl<S, T> ReadWriteState<S> for T
where
S: Serialize + Deserialize, // This doesn't work because of lifetimes in serde Deserializer.
T: Read + Write
{
/// Serialize the state into messagepack and then
/// GzEncode it before sending to the output stream.
fn write_state(&mut self, state: &S) -> Result<usize> {
let mut buf = Vec::new();
state
.serialize(&mut Serializer::new(&mut buf))
.unwrap_or_else(|_| panic!("Could not serialize data."));
let mut e = GzEncoder::new(Vec::new(), Compression::default());
e.write_all(&buf)?;
let compressed_bytes = e.finish()?;
let length = compressed_bytes.len();
self.write_all(&compressed_bytes)?;
// Report the number of compressed bytes written.
Ok(length)
}
/// Decode the gzipped stream into msgpack and then further deserialize it into the generic state struct.
fn read_state(&mut self) -> Result<S, serde_json::Error> {
let mut decoder = GzDecoder::new(self);
let mut buf = Vec::new(); // The buf is created here so it is owned by this function scope.
decoder.read_to_end(&mut buf).expect("Couldn't read the gzipped stream to end.");
serde_json::from_slice::<'de, S>(&buf) // (*)
// This is what I expect should work fine
// but the borrow checker complains that
// `buf` doesn't live long enough.
}
}
A sample stateful struct that is (de)serializable by serde_derive macros.
// Now suppose we have some struct that is Serialize as
// well as Deserialize.
#[derive(Clone, Serialize, Deserialize, PartialEq, Eq)]
pub struct FooMap<K, V>
where
// For a moment, suppose Deserialize doesn't need a lifetime.
// To compile, it should look more like Deserialize<'a> for some defined
// lifetime 'a, but let's ignore that for a moment.
K: Clone + Hash + Eq + Serialize + Deserialize,
V: Eq + Serialize + Deserialize
{
pub key: K,
pub value: V
}
The convenient disk persistence API for our FooMap in action.
// Now I should be able to write gzipped + messagepacked FooMap to file.
pub fn main() {
let foomap = FooMap {
key: "color",
value: "blue"
};
let mut file = std::fs::File::create("/tmp/foomap.gz").expect("Could not create file.");
let bytes_written = file.write_state(&foomap).expect("Could not write state.");
println!("{} bytes written to /tmp/foomap.gz", bytes_written);
let mut file = std::fs::File::open("/tmp/foomap.gz").expect("Could not open file.");
let recovered: FooMap<&str, &str> = file.read_state().expect("Could not recover FooMap.");
assert_eq!(foomap, recovered);
}
You may notice a few problems with the code above. The one that I'm aware of is the lack of lifetime annotations for Deserialize when used as a trait bound. Serde has a beautiful write up regarding Deserializer lifetimes.
I've put together a Playground that tries to address the lifetime issue and by doing so, I've come across another compiler error (*) that seems pretty weird to me, given this situation.
I'm confused about where I took a wrong turn and how to correct it. I would really appreciate help understanding the mistakes I've made in this implementation and how to avoid them in the future.
Your code compiles if you use DeserializeOwned as a bound instead of Deserialize<'de>.
serde's lifetime page has this to say about DeserializeOwned (highlights by me):
This means "T can be deserialized from any lifetime." The callee gets to decide what lifetime. Usually this is because the data that is being deserialized from is going to be thrown away before the function returns, so T must not be allowed to borrow from it. For example a function that accepts base64-encoded data as input, decodes it from base64, deserializes a value of type T, then throws away the result of base64 decoding. [...]
This exactly matches your use case, since buf is dropped before the function returns.
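To see why the `for<'de>` form is what `read_state` needs, here is a std-only model. `FromBytes`, `FromBytesOwned` and `decode` are hypothetical names; `FromBytes<'de>` plays the role of serde's `Deserialize<'de>`, and `FromBytesOwned` is defined the same way serde defines `DeserializeOwned` (as `for<'de> Deserialize<'de>`). The local `buf` stands in for the decompressed bytes.

```rust
// Hypothetical stand-in for `Deserialize<'de>`: the output may borrow from
// input bytes that live for 'de.
trait FromBytes<'de>: Sized {
    fn from_bytes(input: &'de [u8]) -> Option<Self>;
}

// Stand-in for `DeserializeOwned`: "implements FromBytes<'de> for EVERY 'de".
trait FromBytesOwned: for<'de> FromBytes<'de> {}
impl<T> FromBytesOwned for T where T: for<'de> FromBytes<'de> {}

// A borrowing type: `&'de str` can only be produced from input living for 'de.
impl<'de> FromBytes<'de> for &'de str {
    fn from_bytes(input: &'de [u8]) -> Option<Self> {
        std::str::from_utf8(input).ok()
    }
}

// An owning type: it copies the data out, so any input lifetime works.
impl<'de> FromBytes<'de> for String {
    fn from_bytes(input: &'de [u8]) -> Option<Self> {
        std::str::from_utf8(input).ok().map(str::to_owned)
    }
}

// Mirrors `read_state`: the bytes live in a local buffer that is dropped when
// the function returns. With a `T: FromBytes<'de>` bound chosen by the caller,
// `&buf` would have to live for that caller-chosen 'de, which is the
// "`buf` doesn't live long enough" error.
fn decode<T: FromBytesOwned>(encoded: &[u8]) -> Option<T> {
    let buf: Vec<u8> = encoded.to_vec(); // stands in for decompressing into `buf`
    T::from_bytes(&buf)
}
```

The flip side is that the target type cannot borrow from `buf`: `decode::<&str>(...)` does not compile, because `&str` implements the borrowing trait for one particular lifetime rather than for every `'de`. In the question's terms, `FooMap<&str, &str>` would have to become `FooMap<String, String>` to satisfy a `DeserializeOwned` bound.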

AsyncRead wrapper over sync read

I ran into this problem while implementing AsyncRead on top of a synchronous read, to adapt it to the async world in Rust.
The sync read I'm wrapping is itself a wrapper over a raw C synchronous implementation, much like std::fs::File::read, so I use std::io::Read below for simplicity.
Here's the code:
use futures::{AsyncRead, Future};
use std::task::{Context, Poll};
use std::pin::Pin;
use tokio::task;
use std::fs::File;
use std::io::Read;
use std::io::Result;
struct FileAsyncRead {
path: String
}
impl AsyncRead for FileAsyncRead {
fn poll_read(self: Pin<&mut Self>, cx: &mut Context<'_>, buf: &mut [u8]) -> Poll<Result<usize>> {
let path = self.path.to_owned();
let buf_len = buf.len();
let mut handle = task::spawn_blocking(move || {
let mut vec = vec![0u8; buf_len];
let mut file = File::open(path).unwrap();
let len = file.read(vec.as_mut_slice());
(vec, len)
});
match Pin::new(&mut handle).poll(cx) {
Poll::Ready(l) => {
let v_l = l.unwrap();
let _c_l = v_l.0.as_slice().read(buf);
Poll::Ready(v_l.1)
}
Poll::Pending => Poll::Pending
}
}
}
The current implementation creates a new vector of the same size as the outer buf: &mut [u8] on every call, because moving buf itself into the spawn_blocking closure fails with:
`buf` has an anonymous lifetime `'_` but it needs to satisfy a `'static` lifetime requirement
buf: &mut [u8],
| --------- this data with an anonymous lifetime `'_`...
My questions are:
1. Is it possible to avoid creating the vector in spawn_blocking and write into buf directly in poll_read, avoiding both the allocation and the copy?
2. Is there a better way to express this "wrapper" logic than spawn_blocking plus Pin::new(&mut handle).poll(cx)? What's the more idiomatic way to do this in Rust?
Something is odd about this code:
1. If this code is called once, it will likely return Poll::Pending, because spawn_blocking takes time to even start a task.
2. If it is called multiple times, it creates multiple unrelated tasks that each reopen the file and read the same part of it, and because of (1) most of their results are thrown away when the unfinished JoinHandle is dropped. That is probably not what you want.
What you could do to fix this is to remember the task inside the FileAsyncRead struct the first time you create it, and on later calls poll that existing task, only starting a new one once the previous read has finished. The open file belongs in the struct as well; reopening it on every call means every read starts at offset 0.
With the API you have, it doesn't seem possible to avoid double buffering. Your read is blocking, so it has to run on another thread, and the caller's buffer cannot be shared with that thread because it is only borrowed for the duration of a single poll_read() call. You therefore need to do the blocking read into a buffer you own, and copy the data over when a later poll_read() call arrives.
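The suggested fix (remember the in-flight read, poll it again instead of spawning a duplicate, keep a second buffer) can be sketched with the standard library alone. Assumptions, to keep it self-contained: a plain `std::thread` plus an `mpsc` channel stands in for `tokio::task::spawn_blocking` and its `JoinHandle`; the reader is generic over `std::io::Read` so a `Cursor` can replace the C-backed reader; `poll_read` is an inherent method with the same shape as `AsyncRead::poll_read` (but `&mut self` instead of `Pin<&mut Self>`); and `BackgroundReader`, `hand_out` and `read_to_end` are hypothetical names.

```rust
use std::io::{self, Read};
use std::sync::mpsc::{self, Receiver, TryRecvError};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

// What a finished background read sends back: the reader itself plus the bytes read.
type Finished<R> = (R, io::Result<Vec<u8>>);

// Hypothetical wrapper: owns a blocking reader and keeps at most ONE read in flight.
struct BackgroundReader<R> {
    reader: Option<R>,                        // Some(..) only while no read is running
    in_flight: Option<Receiver<Finished<R>>>, // the remembered "task"
    leftover: Vec<u8>,                        // bytes read but not yet handed out
    chunk: usize,                             // bytes requested per blocking read
}

impl<R: Read + Send + 'static> BackgroundReader<R> {
    fn new(reader: R, chunk: usize) -> Self {
        BackgroundReader { reader: Some(reader), in_flight: None, leftover: Vec::new(), chunk }
    }

    // Copies as much buffered data as fits into `buf`: the unavoidable second copy.
    fn hand_out(&mut self, buf: &mut [u8]) -> usize {
        let n = buf.len().min(self.leftover.len());
        buf[..n].copy_from_slice(&self.leftover[..n]);
        self.leftover.drain(..n);
        n
    }

    // Same shape as `AsyncRead::poll_read`: `Ready(Ok(0))` means end of input.
    fn poll_read(&mut self, cx: &mut Context<'_>, buf: &mut [u8]) -> Poll<io::Result<usize>> {
        if !self.leftover.is_empty() {
            return Poll::Ready(Ok(self.hand_out(buf)));
        }
        if self.in_flight.is_none() {
            // No read running: start exactly one on a background thread.
            let mut reader = self.reader.take().expect("reader is present while idle");
            let chunk = self.chunk;
            // Simplification: a real version would refresh the waker on every poll.
            let waker = cx.waker().clone();
            let (tx, rx) = mpsc::channel();
            thread::spawn(move || {
                let mut tmp = vec![0u8; chunk];
                let res = reader.read(&mut tmp).map(|n| {
                    tmp.truncate(n);
                    tmp
                });
                let _ = tx.send((reader, res));
                waker.wake(); // ask the executor to poll us again
            });
            self.in_flight = Some(rx);
        }
        // Poll the remembered read instead of spawning a duplicate.
        let outcome = self.in_flight.as_ref().unwrap().try_recv();
        match outcome {
            Err(TryRecvError::Empty) => Poll::Pending,
            Err(TryRecvError::Disconnected) => {
                // The thread died together with the reader; later polls will panic.
                self.in_flight = None;
                Poll::Ready(Err(io::Error::new(io::ErrorKind::Other, "background read panicked")))
            }
            Ok((reader, res)) => {
                self.in_flight = None;
                self.reader = Some(reader);
                match res {
                    Ok(data) => {
                        self.leftover = data;
                        Poll::Ready(Ok(self.hand_out(buf)))
                    }
                    Err(e) => Poll::Ready(Err(e)),
                }
            }
        }
    }
}

// Demo driver: a waker that does nothing, plus a busy-polling loop until EOF.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn read_to_end<R: Read + Send + 'static>(mut r: BackgroundReader<R>) -> io::Result<Vec<u8>> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let (mut out, mut buf) = (Vec::new(), [0u8; 3]);
    loop {
        match r.poll_read(&mut cx, &mut buf) {
            Poll::Pending => thread::yield_now(),
            Poll::Ready(Ok(0)) => return Ok(out),
            Poll::Ready(Ok(n)) => out.extend_from_slice(&buf[..n]),
            Poll::Ready(Err(e)) => return Err(e),
        }
    }
}
```

The `leftover` buffer matters because a caller may pass a different (possibly smaller) `buf` on each `poll_read` call; whatever did not fit is served on the next call without starting another blocking read.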

Writing to a file or String in Rust

TL;DR: I want to implement trait std::io::Write that outputs to a memory buffer, ideally String, for unit-testing purposes.
I must be missing something simple.
Similar to another question, Writing to a file or stdout in Rust, I am working on a code that can work with any std::io::Write implementation.
It operates on structure defined like this:
pub struct MyStructure {
writer: Box<dyn Write>,
}
It's easy to create an instance that writes to either a file or stdout:
impl MyStructure {
pub fn use_stdout() -> Self {
let writer = Box::new(std::io::stdout());
MyStructure { writer }
}
pub fn use_file<P: AsRef<Path>>(path: P) -> Result<Self> {
let writer = Box::new(File::create(path)?);
Ok(MyStructure { writer })
}
pub fn printit(&mut self) -> Result<()> {
self.writer.write_all(b"hello")?; // write() may write only part of the buffer
Ok(())
}
}
But for unit testing, I also need to have a way to run the business logic (here represented by method printit()) and trap its output, so that its content can be checked in the test.
I cannot figure out how to implement this. This playground code shows how I would like to use it, but it does not compile because it breaks borrowing rules.
// invalid code - does not compile!
fn main() {
let mut buf = Vec::new(); // This buffer should receive output
let mut x2 = MyStructure { writer: Box::new(buf) };
x2.printit().unwrap();
// now, get the collected output
let output = std::str::from_utf8(buf.as_slice()).unwrap().to_string();
// here I want to analyze the output, for instance in unit-test asserts
println!("Output to string was {}", output);
}
Any idea how to write the code correctly? I.e., how to implement a writer on top of a memory structure (String, Vec, ...) that can be accessed afterwards?
Something like this does work:
let mut buf = Vec::new();
{
// Use the buffer by a mutable reference
//
// Also, we're doing it inside another scope
// to help the borrow checker
let mut x2 = MyStructure { writer: Box::new(&mut buf) };
x2.printit().unwrap();
}
let output = std::str::from_utf8(buf.as_slice()).unwrap().to_string();
println!("Output to string was {}", output);
However, in order for this to work, you need to modify your type and add a lifetime parameter:
pub struct MyStructure<'a> {
writer: Box<dyn Write + 'a>,
}
Note that in your case (where you omit the + 'a part) the compiler assumes that you use 'static as the lifetime of the trait object:
// Same as your original variant
pub struct MyStructure {
writer: Box<dyn Write + 'static>
}
This limits the set of types which could be used here, in particular, you cannot use any kinds of borrowed references. Therefore, for maximum genericity we have to be explicit here and define a lifetime parameter.
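Putting the pieces of this answer together, a complete version of the lifetime-parameter approach might look like the sketch below. `capture_output` is a hypothetical test helper, and `write_all` is used instead of `write` so a short write cannot silently drop output.

```rust
use std::io::{self, Write};

pub struct MyStructure<'a> {
    // `+ 'a` lets the trait object hold borrowed writers such as `&mut Vec<u8>`.
    writer: Box<dyn Write + 'a>,
}

impl<'a> MyStructure<'a> {
    pub fn printit(&mut self) -> io::Result<()> {
        // `write_all` keeps writing until the whole slice is accepted.
        self.writer.write_all(b"hello")
    }
}

// Runs the "business logic" against an in-memory buffer and returns what it wrote.
fn capture_output() -> String {
    let mut buf: Vec<u8> = Vec::new();
    {
        let mut x2 = MyStructure { writer: Box::new(&mut buf) };
        x2.printit().unwrap();
    } // `x2`, and with it the `&mut buf` borrow, is dropped here
    String::from_utf8(buf).unwrap()
}
```

The inner scope is still needed: dropping a `Box<dyn Write + 'a>` counts as a use of the borrow, so `buf` cannot be read until `x2` is gone.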
Also note that depending on your use case, you can use generics instead of trait objects:
pub struct MyStructure<W: Write> {
writer: W
}
In this case the types are fully visible at any point of your program, and therefore no additional lifetime annotation is needed.
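A sketch of the generic variant: the structure owns its writer, so a test can pass a `Vec<u8>` and take it back afterwards, while production code passes `std::io::stdout()` or a `File`. The `new`, `into_inner` and `capture_generic` names are hypothetical additions for illustration.

```rust
use std::io::{self, Write};

// Generic over the writer instead of boxing a trait object: no lifetime parameter needed.
pub struct MyStructure<W: Write> {
    writer: W,
}

impl<W: Write> MyStructure<W> {
    pub fn new(writer: W) -> Self {
        MyStructure { writer }
    }

    pub fn printit(&mut self) -> io::Result<()> {
        self.writer.write_all(b"hello")
    }

    // Hypothetical helper: hands the writer back, e.g. to inspect a Vec<u8> in a test.
    pub fn into_inner(self) -> W {
        self.writer
    }
}

// Test-style usage: the structure owns the buffer, then gives it back.
fn capture_generic() -> String {
    let mut s = MyStructure::new(Vec::<u8>::new());
    s.printit().unwrap();
    String::from_utf8(s.into_inner()).unwrap()
}
```

Because ownership of the buffer moves in and back out, no extra scope or borrow is involved; the trade-off is that `MyStructure<Vec<u8>>` and `MyStructure<Stdout>` are distinct types.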

Discerning lifetimes understanding the move keyword

I've been playing around with AudioUnit via Rust and the Rust library coreaudio-rs. Their example seems to work well:
extern crate coreaudio;
use coreaudio::audio_unit::{AudioUnit, IOType};
use coreaudio::audio_unit::render_callback::{self, data};
use std::f32::consts::PI;
struct Iter {
value: f32,
}
impl Iterator for Iter {
type Item = [f32; 2];
fn next(&mut self) -> Option<[f32; 2]> {
self.value += 440.0 / 44_100.0;
let amp = (self.value * PI * 2.0).sin() as f32 * 0.15;
Some([amp, amp])
}
}
fn main() {
run().unwrap()
}
fn run() -> Result<(), coreaudio::Error> {
// 440hz sine wave generator.
let mut samples = Iter { value: 0.0 };
//let buf: Vec<[f32; 2]> = vec![[0.0, 0.0]];
//let mut samples = buf.iter();
// Construct an Output audio unit that delivers audio to the default output device.
let mut audio_unit = AudioUnit::new(IOType::DefaultOutput)?;
// Q: What is this type?
let callback = move |args| {
let Args { num_frames, mut data, .. } = args;
for i in 0..num_frames {
let sample = samples.next().unwrap();
for (channel_idx, channel) in data.channels_mut().enumerate() {
channel[i] = sample[channel_idx];
}
}
Ok(())
};
type Args = render_callback::Args<data::NonInterleaved<f32>>;
audio_unit.set_render_callback(callback)?;
audio_unit.start()?;
std::thread::sleep(std::time::Duration::from_millis(30000));
Ok(())
}
However, changing it to read from a buffer doesn't work:
extern crate coreaudio;
use coreaudio::audio_unit::{AudioUnit, IOType};
use coreaudio::audio_unit::render_callback::{self, data};
fn main() {
run().unwrap()
}
fn run() -> Result<(), coreaudio::Error> {
let buf: Vec<[f32; 2]> = vec![[0.0, 0.0]];
let mut samples = buf.iter();
// Construct an Output audio unit that delivers audio to the default output device.
let mut audio_unit = AudioUnit::new(IOType::DefaultOutput)?;
// Q: What is this type?
let callback = move |args| {
let Args { num_frames, mut data, .. } = args;
for i in 0..num_frames {
let sample = samples.next().unwrap();
for (channel_idx, channel) in data.channels_mut().enumerate() {
channel[i] = sample[channel_idx];
}
}
Ok(())
};
type Args = render_callback::Args<data::NonInterleaved<f32>>;
audio_unit.set_render_callback(callback)?;
audio_unit.start()?;
std::thread::sleep(std::time::Duration::from_millis(30000));
Ok(())
}
It says, correctly, that buf only lives until the end of run and does not live long enough for the audio unit, which makes sense given the message "borrowed value must be valid for the static lifetime...".
In any case, that doesn't bother me; I can modify the iterator to load and read from the buffer just fine. However, it does raise some questions:
Why does the Iter { value: 0.0 } have the 'static lifetime?
If it doesn't have the 'static lifetime, why does it say the borrowed value must be valid for the 'static lifetime?
If it does have the 'static lifetime, why? It seems like it would be on the heap and closed over by callback.
I understand that the move keyword allows moving inside the closure, which doesn't help me understand why it interacts with lifetimes. Why can't it move the buffer? Do I have to move both the buffer and the iterator into the closure? How would I do that?
Over all this, how do I figure out the expected lifetime without trying to be a compiler myself? It doesn't seem like guessing and compiling is always a straightforward method to resolving these issues.
Why does the Iter { value: 0.0 } have the 'static lifetime?
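It doesn't have a lifetime of its own; only references (and types that contain them) carry lifetimes. Iter contains no references, so it satisfies any lifetime bound, including 'static.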
why does it say the borrowed value must be valid for the 'static lifetime
how do I figure out the expected lifetime without trying to be a compiler myself
Read the documentation; it tells you the restriction:
fn set_render_callback<F, D>(&mut self, f: F) -> Result<(), Error>
where
F: FnMut(Args<D>) -> Result<(), ()> + 'static, // <====
D: Data
This restriction means that any references inside of F must live at least as long as the 'static lifetime. Having no references is also acceptable.
All type and lifetime restrictions are expressed at the function boundary; this is a hard rule of Rust.
I understand that the move keyword allows moving inside the closure, which doesn't help me understand why it interacts with lifetimes.
The only thing the move keyword does is force every variable used directly in the closure to be moved (or copied) into it. Without move, the compiler picks the least restrictive capture for each variable based on how the closure uses it: a shared reference, a mutable reference, or the value itself.
Why can't it move the buffer?
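The variable buf is never used inside the closure. Only samples is, and samples (the slice iterator returned by buf.iter()) holds a borrow of buf, so move transfers that borrow into the closure, not the Vec itself.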
Do I have to move both the buffer and the iterator into the closure? How would I do that?
By creating the iterator inside the closure. Now buf is used inside the closure and will be moved:
let callback = move |args| {
let mut samples = buf.iter();
// ...
}
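Creating the iterator inside the closure means it starts again from the first sample on every callback. An alternative that keeps the position between calls is to move an owning iterator into the closure. The std-only sketch below models the `'static` bound with a hypothetical `register` function in place of coreaudio's `set_render_callback`; `make_callback`, `drain` and the sample values are likewise made up for illustration.

```rust
type Callback = Box<dyn FnMut() -> Option<[f32; 2]>>;

// Hypothetical stand-in for `set_render_callback`: it keeps the closure after
// returning, which is why it demands `F: 'static`.
fn register<F>(callback: F) -> Callback
where
    F: FnMut() -> Option<[f32; 2]> + 'static,
{
    Box::new(callback)
}

fn make_callback() -> Callback {
    let buf: Vec<[f32; 2]> = vec![[0.25, 0.25], [0.5, 0.5]];
    // `buf.iter()` would borrow the local `buf` and fail the `'static` bound;
    // `into_iter()` consumes `buf`, so the iterator owns its samples.
    let mut samples = buf.into_iter();
    // `move` transfers the owning iterator into the closure, so the position
    // in the buffer is kept from one call to the next.
    register(move || samples.next())
}

// Test helper: calls the callback until it runs out of samples.
fn drain(cb: Callback) -> Vec<[f32; 2]> {
    std::iter::from_fn(cb).collect()
}
```

In the real callback, `samples.next().unwrap()` would panic once the buffer is exhausted, so handling `None` (for example by writing silence) is worth considering.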
It doesn't seem like guessing and compiling is always a straightforward method to resolving these issues.
Sometimes it is, and sometimes you have to think about why you believe the code to be correct and why the compiler states it isn't and come to an understanding.

futures-rs using Stream combinators on `BoxStream`s

Using the futures-rs library, I've encountered a situation where a stream needs to be mapped through an indeterminate number of other streams before being returned to the user. Since the exact type of the output stream is unknown at the end of this operation, I've been using a BoxStream trait object while storing the stream in a struct and when returning it.
Although this approach works fine, it has the unfortunate side effect that the inner Stream object is unsized. This is a problem because every stream combinator requires Self: Sized in its signature, which means I can't even call wait() on the returned BoxStream to convert it into a blocking iterator.
Here's an example of a situation that could lead to this issue:
struct Server {
receiver: Option<Box<dyn Stream<Item = usize, Error = ()> + Send>>,
}
impl Server {
pub fn new() -> Server {
let (tx, rx) = channel(0);
// do things with the tx (subscribe to tcp socket, connect to database, etc.)
Server { receiver: Some(rx.boxed()) }
}
/// Maps the inner `Receiver` through another stream, essentially duplicating it.
pub fn get_stream(&mut self) -> Result<Box<dyn Stream<Item = usize, Error = ()> + Send>, ()> {
let (tx, rx) = channel(0);
let strm = self.receiver.take().unwrap();
let mut tx_opt = Some(tx);
let new_strm = strm.map(move |msg| {
// unfortunate workaround needed since `send()` takes `self`
let mut tx = tx_opt.take().unwrap();
tx = tx.send(msg.clone()).wait().unwrap();
tx_opt = Some(tx);
msg
});
self.receiver = Some(new_strm.boxed());
Ok(rx.boxed())
}
}
pub fn main() {
let mut server = Server::new();
// possible that this may happen 0..n times
let rx: BoxStream<usize, ()> = server.get_stream().unwrap();
// can't do this since the inner `Stream` trait object isn't `Sized` and `wait()`
// (along with all other stream combinators) requires that in their signatures.
for msg in rx.wait() {
// compiler error here
// ...
}
}
As shown by the above code, BoxStreams are necessary since calling map() on a stream changes its type from Receiver to Map, which would make it impossible to store back into the struct. It's impossible to do pretty much anything with the returned BoxStream since it's ?Sized. In fact, the only function available for trait-object Streams is poll(), which is never supposed to be called outside of a Task.
Is there any way that I can avoid this problem without doing something like returning an enum containing any one of the possible varieties of stream that could possibly occur? Would writing my own struct that implements Stream fix the issue somehow?
As pointed out by @FrancisGagné in a comment, futures-rs declares impl<S: ?Sized + Stream> Stream for Box<S> (see futures::Stream), so the Box itself is a sized Stream and the combinators can be called on it. In the test file containing my code, I had failed to import Stream, so that trait wasn't in scope.
The compiler didn't report the missing wait() method because it ran into the unsized issue first.
This was resolved by adding use futures::Stream; to the start of the function.
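The standard library uses the same pattern for iterators: `impl<I: Iterator + ?Sized> Iterator for Box<I>`. `Iterator::map` requires `Self: Sized`, but when called on a boxed iterator, `Self` is the sized `Box`, not the unsized `dyn Iterator` inside it. A std-only sketch, with `Iterator` standing in for futures' `Stream` and a hypothetical `add_layers` helper playing the role of `get_stream`:

```rust
// Boxed trait object, analogous to futures 0.1's `BoxStream`.
type BoxIter = Box<dyn Iterator<Item = usize> + Send>;

// Hypothetical helper: maps a boxed iterator through `layers` extra stages,
// re-boxing after each one because every `map` changes the concrete type.
fn add_layers(mut it: BoxIter, layers: usize) -> BoxIter {
    for _ in 0..layers {
        // `map` is called on the sized `Box`, thanks to the forwarding impl.
        it = Box::new(it.map(|x| x + 1));
    }
    it
}
```

`Iterator` is in the prelude, which is why the missing-import problem never shows up for iterators; with futures, the `Stream` trait (or `StreamExt` in futures 0.3, where the combinators live) has to be imported explicitly.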
