AsyncRead wrapper over sync read - rust

I ran into this problem while implementing AsyncRead over a synchronous read to fit into the async world in Rust.
The sync read implementation I'm handling is a wrapper over a raw C sync implementation, much like std::fs::File::read; for simplicity I will use std::io::Read from here on.
Here's the code:
use futures::{AsyncRead, Future};
use std::task::{Context, Poll};
use std::pin::Pin;
use tokio::task;
use std::fs::File;
use std::io::Read;
use std::io::Result;

struct FileAsyncRead {
    path: String
}

impl AsyncRead for FileAsyncRead {
    fn poll_read(self: Pin<&mut Self>, cx: &mut Context<'_>, buf: &mut [u8]) -> Poll<Result<usize>> {
        let path = self.path.to_owned();
        let buf_len = buf.len();
        let mut handle = task::spawn_blocking(move || {
            let mut vec = vec![0u8; buf_len];
            let mut file = File::open(path).unwrap();
            let len = file.read(vec.as_mut_slice());
            (vec, len)
        });
        match Pin::new(&mut handle).poll(cx) {
            Poll::Ready(l) => {
                let v_l = l.unwrap();
                let _c_l = v_l.0.as_slice().read(buf);
                Poll::Ready(v_l.1)
            }
            Poll::Pending => Poll::Pending
        }
    }
}
The current implementation allocates a new vector of the same size as the outer buf: &mut [u8] on every call, because moving buf itself into the closure fails with:
`buf` has an anonymous lifetime `'_` but it needs to satisfy a `'static` lifetime requirement
buf: &mut [u8],
| --------- this data with an anonymous lifetime `'_`...
My questions are:
Is it possible to avoid creating the vector in spawn_blocking and instead mutate buf directly in poll_read, avoiding both the allocation and the copy?
Is there a better way to express this "wrapper" logic than spawn_blocking plus Pin::new(&mut handle).poll(cx)? What is the more idiomatic way to do this in Rust?

Something is odd about this code:
1. If this code is called once, it will likely return Poll::Pending, because spawn_blocking takes time to even start a task.
2. If it is called multiple times, it creates multiple unrelated tasks that each read the same part of the file, and their results are potentially ignored because of (1), which is probably not what you want.
What you could do to fix this is to store the task inside the FileAsyncRead struct the first time you create it; on later calls, poll the existing task and start a new one only if needed.
With the API you have, it doesn't seem possible to avoid double buffering: because your read is blocking and the caller's buf cannot be shared with the blocking task, you need to do the blocking read into some other buffer and then copy the data over when the next non-blocking poll_read() call arrives.
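The "remember the task" idea can be sketched with the standard library alone. This is only an illustration under stated assumptions: BlockingAsyncRead, Shared and poll_until_ready are hypothetical names, a plain thread::spawn stands in for tokio's spawn_blocking, and poll_read is an inherent method that mirrors the shape of futures::AsyncRead::poll_read rather than implementing the trait. It still double-buffers, as discussed above.

```rust
use std::io::{self, Read};
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

/// State shared between the poller and the worker thread (hypothetical helper).
#[derive(Default)]
struct Shared {
    result: Option<io::Result<Vec<u8>>>,
    waker: Option<Waker>,
}

/// Hypothetical wrapper: owns a blocking reader and remembers the in-flight read.
pub struct BlockingAsyncRead<R> {
    reader: Arc<Mutex<R>>,
    in_flight: Option<Arc<Mutex<Shared>>>,
}

impl<R: Read + Send + 'static> BlockingAsyncRead<R> {
    pub fn new(reader: R) -> Self {
        Self { reader: Arc::new(Mutex::new(reader)), in_flight: None }
    }

    /// Same shape as `futures::AsyncRead::poll_read`.
    pub fn poll_read(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &mut [u8],
    ) -> Poll<io::Result<usize>> {
        let this = self.get_mut();
        // Start a blocking read only if none is already in flight.
        if this.in_flight.is_none() {
            let shared = Arc::new(Mutex::new(Shared::default()));
            let (reader, worker, len) = (Arc::clone(&this.reader), Arc::clone(&shared), buf.len());
            thread::spawn(move || {
                let mut chunk = vec![0u8; len]; // the unavoidable second buffer
                let res = reader.lock().unwrap().read(&mut chunk);
                let res = res.map(|n| {
                    chunk.truncate(n);
                    chunk
                });
                let mut state = worker.lock().unwrap();
                state.result = Some(res);
                if let Some(waker) = state.waker.take() {
                    waker.wake();
                }
            });
            this.in_flight = Some(shared);
        }
        let shared = Arc::clone(this.in_flight.as_ref().expect("set above"));
        let mut state = shared.lock().unwrap();
        let Some(res) = state.result.take() else {
            // Not finished yet: store the latest waker so the worker can notify us.
            state.waker = Some(cx.waker().clone());
            return Poll::Pending;
        };
        this.in_flight = None;
        // Copy into the caller's buffer; bytes beyond `buf.len()` would be lost
        // if the caller shrank the buffer between polls.
        Poll::Ready(res.map(|chunk| {
            let n = chunk.len().min(buf.len());
            buf[..n].copy_from_slice(&chunk[..n]);
            n
        }))
    }
}

/// Tiny demonstration driver: parks the current thread until the read completes.
pub fn poll_until_ready<R: Read + Send + 'static>(
    reader: &mut BlockingAsyncRead<R>,
    buf: &mut [u8],
) -> io::Result<usize> {
    struct Unpark(thread::Thread);
    impl Wake for Unpark {
        fn wake(self: Arc<Self>) {
            self.0.unpark();
        }
    }
    let waker = Waker::from(Arc::new(Unpark(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match Pin::new(&mut *reader).poll_read(&mut cx, buf) {
            Poll::Ready(res) => return res,
            Poll::Pending => thread::park(),
        }
    }
}
```

Because the reader lives inside the wrapper, consecutive reads continue where the previous one stopped instead of reopening the file on every poll.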

Related

Best way to read a raw struct from a file

Background (Skippable)
On Linux, the file /var/run/utmp contains several utmp structures, each in raw binary format, following one another in the file. utmp itself is relatively large (384 bytes on my machine). I am trying to read this file in its raw form, and then implement checks after the fact that the data makes sense. I'm not new to Rust, but this is my first real experience with the unsafe side of things.
Problem Statement
I have a file that contains several C struct utmp records (docs). In Rust, I would like to read the entire file into a Vec<libc::utmpx>. More specifically, given a reader open to this file, how could I read one struct utmp?
What I have so far
Below are three different implementations of read_raw, which accepts a reader and returns a RawEntry (my alias for struct utmp). Which method is most correct? I am trying to write code that is as performant as possible, and I am worried that read_raw0 might be slower than the others if it involves memcpys. What is the best/fastest way to accomplish this behavior?
use std::io::Read;
use libc::utmpx as RawEntry;

// A const item needs an explicit type annotation.
const RawEntrySize: usize = std::mem::size_of::<RawEntry>();
type RawEntryBuffer = [u8; RawEntrySize];

/// Read a raw utmpx struct
// After testing, this method doesn't work
pub fn read_raw0<R: Read>(reader: &mut R) -> RawEntry {
    let mut entry: RawEntry = unsafe { std::mem::zeroed() };
    unsafe {
        let mut entry_buf = std::mem::transmute::<RawEntry, RawEntryBuffer>(entry);
        reader.read_exact(&mut entry_buf[..]);
    }
    return entry;
}

/// Read a raw utmpx struct
pub fn read_raw1<R: Read>(reader: &mut R) -> RawEntry {
    // Worried this could cause alignment issues, or maybe it's okay
    // because transmute copies
    let mut buffer: RawEntryBuffer = [0; RawEntrySize];
    reader.read_exact(&mut buffer[..]);
    let entry = unsafe {
        std::mem::transmute::<RawEntryBuffer, RawEntry>(buffer)
    };
    return entry;
}

/// Read a raw utmpx struct
pub fn read_raw2<R: Read>(reader: &mut R) -> RawEntry {
    let mut entry: RawEntry = unsafe { std::mem::zeroed() };
    unsafe {
        let entry_ptr = std::mem::transmute::<&mut RawEntry, *mut u8>(&mut entry);
        let entry_slice = std::slice::from_raw_parts_mut(entry_ptr, RawEntrySize);
        reader.read_exact(entry_slice);
    }
    return entry;
}
Note: After more testing, it appears read_raw0 doesn't work. I believe this is because transmute creates a new buffer instead of referencing the struct.
This is what I came up with, which I imagine should be about as fast as it gets for reading a single entry. It follows the spirit of your last version, but avoids the transmute (transmuting &mut T to *mut u8 can be done with two casts: t as *mut T as *mut u8). It also uses MaybeUninit instead of zeroed to be a bit more explicit (the assembly is likely the same once optimized). Lastly, the function will be unsafe either way, so we may as well mark it as such and do away with the unsafe blocks.
use std::io::{self, Read};
use std::slice::from_raw_parts_mut;
use std::mem::{MaybeUninit, size_of};

pub unsafe fn read_raw_struct<R: Read, T: Sized>(src: &mut R) -> io::Result<T> {
    let mut buffer = MaybeUninit::uninit();
    let buffer_slice = from_raw_parts_mut(buffer.as_mut_ptr() as *mut u8, size_of::<T>());
    src.read_exact(buffer_slice)?;
    Ok(buffer.assume_init())
}
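To show how such a function might be used for a whole file, here is a small sketch. It is a variation rather than the exact code above: it starts from MaybeUninit::zeroed() so the byte slice never covers uninitialized memory, and it uses a hypothetical Record struct as a stand-in for libc::utmpx. The approach is only sound for types where every bit pattern is a valid value, which is assumed here.

```rust
use std::io::{self, Read};
use std::mem::{size_of, MaybeUninit};
use std::slice::from_raw_parts_mut;

/// Hypothetical stand-in for `libc::utmpx`: `#[repr(C)]` and integers only,
/// so any sequence of bytes is a valid value.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Record {
    pub id: u32,
    pub kind: u16,
    pub flags: u16,
}

/// Reads one `T` from `src` by filling its bytes in place.
///
/// # Safety
/// `T` must be valid for every possible bit pattern.
pub unsafe fn read_raw_struct<R: Read, T>(src: &mut R) -> io::Result<T> {
    let mut buffer = MaybeUninit::<T>::zeroed();
    // Two casts instead of a transmute: *mut T -> *mut u8.
    let bytes = unsafe { from_raw_parts_mut(buffer.as_mut_ptr() as *mut u8, size_of::<T>()) };
    src.read_exact(bytes)?;
    Ok(unsafe { buffer.assume_init() })
}

/// Reads records until the reader is exhausted.
/// Note: a truncated trailing record is silently dropped in this sketch.
pub fn read_all_records<R: Read>(src: &mut R) -> io::Result<Vec<Record>> {
    let mut out = Vec::new();
    loop {
        // Safety: `Record` consists only of integers.
        match unsafe { read_raw_struct::<_, Record>(src) } {
            Ok(record) => out.push(record),
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => return Ok(out),
            Err(e) => return Err(e),
        }
    }
}
```

Wrapping the source in a BufReader before calling read_all_records keeps the number of system calls low, since each record is a small read.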

Is there a simpler way to pass a BufReader to a function?

To read the bytes of a PNG file, I want to create a function called read_8_bytes which will read the next 8 bytes in the file each time it's called.
use std::fs::File;
use std::io::{BufReader, Read};

fn main() {
    let png = File::open("test.png").expect("1");
    let mut png_reader = BufReader::new(png);
    let mut byteBuffer: Vec<u8> = vec![0; 8];
    png_reader.read_exact(&mut byteBuffer).expect("2");
}
This works fine and if I keep calling read_exact from main I can read the next 8 bytes. I tried to create a function to do this and the solution just seems needlessly complicated. I'm wondering if there is a better way.
I thought I had to pass the BufReader to the function, but due to how Rust works this makes things complicated, and I ended up working out that I need to do something like:
fn read_eight_bytes<R: BufRead>(fd: &mut R)
This compiles, but I'm not happy because I don't understand why it needs to be done this way, and it seems complex. Is there a simple way of having a function I can pass a file descriptor type thing to and have it store the position like in C without having to do this?
Looking at your question, I think you are trying to say that you are confused as to why the <R: BufRead> is necessary or furthermore why this even works.
In your example, this generic is not strictly necessary. One could implement the function you describe like so:
use std::{fs, io};

fn main() -> io::Result<()> {
    let mut file = fs::File::open("./path/to/file")?;
    let bytes = read_eight_bytes(&mut file)?;
    println!("{:?}", bytes);
    Ok(())
}

fn read_eight_bytes(file: &mut fs::File) -> io::Result<[u8; 8]> {
    use io::Read;
    let mut bytes = [0; 8];
    file.read_exact(&mut bytes)?;
    Ok(bytes)
}
This is perfectly valid and hopefully should make sense.
But then, why does fn read_eight_bytes<R: BufRead>(file: &mut R) -> [u8; 8] work? First of all, I assume you understand the following concepts:
Generics
Traits
Given an understanding of the above concepts, you should know that this syntax means that read_eight_bytes is a generic function with a type parameter named R. The type parameter has a trait bound, requiring R to implement BufRead. The function takes a single parameter, file, which is a mutable reference to a value of type R.
Now taking a look at the definition of BufRead: we see that it contains several functions. But surprisingly there is no read_exact function! Why does a function like this compile?
use std::{fs, io};
use io::BufRead;

fn main() -> io::Result<()> {
    let file = fs::File::open("./path/to/file")?;
    let mut reader = io::BufReader::new(file);
    let bytes = read_eight_bytes(&mut reader)?;
    println!("{:?}", bytes);
    Ok(())
}

fn read_eight_bytes<R: BufRead>(reader: &mut R) -> io::Result<[u8; 8]> {
    let mut bytes = [0; 8];
    reader.read_exact(&mut bytes)?;
    Ok(bytes)
}
Note: I have altered the return type to io::Result<...>. This is considered better practice than unwrapping every Result.
I have also changed the function call to use a BufReader because BufReader implements BufRead whilst File does not. I will cover the difference a little further below.
The reason this works is that Read is a supertrait of BufRead. This means that any type that implements BufRead must also implement Read, and thus it must have the read_exact function!
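As a small illustration of the supertrait relationship, the sketch below calls one method from each trait through a single BufRead bound. The function name read_header_and_line and the 4-byte header are made up for the example.

```rust
use std::io::{self, BufRead};

/// Reads a fixed 4-byte header, then the rest of the current line.
/// Only `BufRead` is named in the bound, yet both traits' methods are usable.
pub fn read_header_and_line<R: BufRead>(reader: &mut R) -> io::Result<([u8; 4], String)> {
    let mut header = [0u8; 4];
    reader.read_exact(&mut header)?; // provided by Read, the supertrait
    let mut line = String::new();
    reader.read_line(&mut line)?; // provided by BufRead itself
    Ok((header, line))
}
```

A byte slice (&[u8]) implements BufRead, so the function can be exercised without touching the file system.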
Given our function never requires the functions on BufRead we could change the trait bound to only require Read:
use std::{fs, io};
use io::Read;

fn main() -> io::Result<()> {
    let file = fs::File::open("./path/to/file")?;
    let mut reader = io::BufReader::new(file);
    let bytes = read_eight_bytes(&mut reader)?;
    println!("{:?}", bytes);
    Ok(())
}

fn read_eight_bytes<R: Read>(reader: &mut R) -> io::Result<[u8; 8]> {
    let mut bytes = [0; 8];
    reader.read_exact(&mut bytes)?;
    Ok(bytes)
}
Now here is something interesting about this change. The read_eight_bytes function can now be called in (at least) two different ways:
use std::{fs, io};
use io::Read;

fn main() -> io::Result<()> {
    let mut file = fs::File::open("./path/to/file")?;
    let bytes = read_eight_bytes(&mut file)?;
    println!("{:?}", bytes);

    let file = fs::File::open("./path/to/file")?;
    let mut reader = io::BufReader::new(file);
    let bytes = read_eight_bytes(&mut reader)?;
    println!("{:?}", bytes);
    Ok(())
}

fn read_eight_bytes<R: Read>(reader: &mut R) -> io::Result<[u8; 8]> {
    let mut bytes = [0; 8];
    reader.read_exact(&mut bytes)?;
    Ok(bytes)
}
Why is this? Both File and BufReader implement the Read trait, so both can be used with the read_eight_bytes function!
So then why would someone want to use either File or BufReader over the other?
Well the BufReader documentation explains this:
The BufReader struct adds buffering to any reader.
It can be excessively inefficient to work directly with a Read
instance. For example, every call to read on TcpStream results in a
system call. A BufReader performs large, infrequent reads on the
underlying Read and maintains an in-memory buffer of the results.
BufReader can improve the speed of programs that make small and
repeated read calls to the same file or network socket. It does not
help when reading very large amounts at once, or reading just one or a
few times. It also provides no advantage when reading from a source
that is already in memory, like a Vec.
Now, remember how before we wrote this function just for the File type? The primary reason why one would want to write it with generics would be such that a caller can make the choice presented above. This is common practice in libraries where such a choice really does matter. However, generics come at the cost of increased compile times (when used excessively) and increased code complexity.
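One more practical benefit of the generic version is that it also accepts in-memory data, which is handy for tests. The sketch below assumes a hypothetical is_png helper built on read_eight_bytes; the eight signature bytes are the standard PNG signature. It also shows that the reader keeps its position between calls, which is what the question asked for.

```rust
use std::io::{self, Read};

/// The fixed 8-byte signature at the start of every PNG file.
pub const PNG_SIGNATURE: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];

/// Reads the next 8 bytes from any reader, advancing its position.
pub fn read_eight_bytes<R: Read>(reader: &mut R) -> io::Result<[u8; 8]> {
    let mut bytes = [0; 8];
    reader.read_exact(&mut bytes)?;
    Ok(bytes)
}

/// Hypothetical helper: checks whether the next 8 bytes are the PNG signature.
pub fn is_png<R: Read>(reader: &mut R) -> io::Result<bool> {
    Ok(read_eight_bytes(reader)? == PNG_SIGNATURE)
}
```

The same two functions work unchanged with a File, a BufReader<File>, a TcpStream, or a &[u8] slice.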

Open a single file from a ZIP archive and pass on as Read instance

I am using the zip crate to read data from ZIP archives:
impl<R: Read + Seek> ZipArchive<R> {
    pub fn new(reader: R) -> ZipResult<ZipArchive<R>> {...}
    pub fn by_name<'a>(&'a mut self, name: &str) -> ZipResult<ZipFile<'a>> {...}
    ...
}
I need to implement a function that given the name of a ZIP archive and the name of a contained file returns an instance of std::io::Read. Is this possible?
ZipFile does implement Read, but unfortunately it retains a reference to the ZipArchive and I can't find a way to build a struct that takes ownership of both the ZipArchive and ZipFile.
Unfortunately the zip crate requires a self-referential struct for such usage. Self-referential structs are not allowed by the borrow checker, but you can avoid the underlying problem by heap-allocating ZipArchive to prevent it from moving.
Even with the use of Box for heap allocation, the borrow checker still won't accept the resulting code because it doesn't special-case Box, and because it can't prove that some code won't move the object out of the box. To make it compile you'll need to use unsafe transmute to decouple the borrow of ZipFile from the archive. It will be up to you to maintain the invariants: that ZipArchive doesn't move and that ZipFile gets destroyed before ZipArchive. Fortunately the code is short, so it should be easy to review for correctness.
Here is a possible implementation:
pub fn read_zip(file_name: &str, member_name: &str) -> ZipResult<impl std::io::Read> {
    struct ZipReader {
        // actually has lifetime of `archive`
        // declared first so it's dropped before `archive`
        reader: ZipFile<'static>,
        #[allow(dead_code)]
        // safety: we must never move out of this box as long as reader is alive
        archive: Box<ZipArchive<BufReader<File>>>,
    }

    impl Read for ZipReader {
        fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
            self.reader.read(buf)
        }
    }

    let file = BufReader::new(File::open(file_name)?);
    // Safety: We must never move `archive` out of its Box, and we must destroy
    // `reader` before `archive`. The first is ensured by never giving access to
    // the box, and the second by the drop order guarantees documented by Rust.
    let mut archive = Box::new(ZipArchive::new(file)?);
    let reader = archive.by_name(member_name)?;
    let reader = unsafe { std::mem::transmute(reader) };
    Ok(ZipReader { archive, reader })
}
The above code should be sound even though we lie to the borrow checker about the lifetime of the reference. First, the lie is consistent with the premise of the 'static bound: it is indeed possible to indefinitely extend the lifetime of ZipReader without invalidating the reference. (This is what the borrow checker cannot yet prove on its own.) Second, Rust's lifetime analysis never affects code generation; it only validates the code, so our "lie" cannot cause the code to miscompile.
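The safety argument relies on struct fields being dropped in declaration order. A standard-library-only sketch can make that guarantee visible; Noisy and Holder are hypothetical types whose only job is to record when they are dropped.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Records its name in a shared log when dropped.
struct Noisy {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Mirrors the field layout of `ZipReader`: `reader` is declared before `archive`.
#[allow(dead_code)]
struct Holder {
    reader: Noisy,
    archive: Noisy,
}

/// Returns the order in which the two fields were dropped.
pub fn drop_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    // The order of initialization in the struct literal does not matter;
    // only the declaration order of the fields does.
    let holder = Holder {
        archive: Noisy { name: "archive", log: Rc::clone(&log) },
        reader: Noisy { name: "reader", log: Rc::clone(&log) },
    };
    drop(holder);
    let order = log.borrow().clone();
    order
}
```

Calling drop_order() yields "reader" followed by "archive", which is the order the unsafe version depends on.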
If you're OK with an external dependency, you can use ouroboros to avoid unsafe (or rather confine it to the code generated by its proc macro). That way the code you write should be sound, provided there are no issues in ouroboros. This is what it would look like:
pub fn read_zip(file_name: &str, member_name: &str) -> ZipResult<impl std::io::Read> {
    #[ouroboros::self_referencing]
    struct ZipReader {
        archive: ZipArchive<BufReader<File>>,
        #[borrows(mut archive)]
        #[not_covariant]
        reader: ZipFile<'this>,
    }

    impl Read for ZipReader {
        fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
            self.with_reader_mut(|reader| reader.read(buf))
        }
    }

    let file = BufReader::new(File::open(file_name)?);
    let archive = ZipArchive::new(file)?;
    // ZipReaderBuilder and ZipReaderTryBuilder are generated by ouroboros.
    ZipReaderTryBuilder {
        archive,
        reader_builder: |archive| archive.by_name(member_name),
    }
    .try_build()
}

impl Stream cannot be unpinned

I'm trying to get data using crates_io_api.
I attempted to get data from a stream, but
I can not get it to work.
AsyncClient::all_crates returns an impl Stream. How do I get data from it? It would be helpful if you provide code.
I checked out the async book but it didn't work.
Thank you.
Here's my current code.
use crates_io_api::{AsyncClient, Error};
use futures::stream::StreamExt;

async fn get_all(query: Option<String>) -> Result<crates_io_api::Crate, Error> {
    // Instantiate the client.
    let client = AsyncClient::new(
        "test (test#test.com)",
        std::time::Duration::from_millis(10000),
    )?;
    let stream = client.all_crates(query);
    // what should I do after?
    // ERROR: `impl Stream cannot be unpinned`
    while let Some(item) = stream.next().await {
        // ...
    }
}
This looks like a mistake on the side of crates_io_api. Getting the next element of a Stream requires that the Stream is Unpin:
pub fn next(&mut self) -> Next<'_, Self>
where
    Self: Unpin,
Because Next stores a plain mutable reference to Self but polling a stream requires a pinned reference, next can only be offered safely when Self may be moved freely even after being pinned. This is what the Unpin marker trait represents. The stream returned by crates_io_api does not provide this guarantee (although it could, and arguably should), so you must pin it yourself. To get an Unpin handle to a !Unpin stream, you can pin it to a heap allocation:
use futures::stream::StreamExt;

let mut stream = client.all_crates(query).boxed();
// boxed simply calls Box::pin
while let Some(elem) = stream.next().await { ... }
Or you can pin it to the stack with the pin_mut!/pin! macro:
let stream = client.all_crates(query);
futures::pin_mut!(stream);
while let Some(elem) = stream.next().await { ... }
Alternatively, you could use a combinator that does not require Unpin, such as for_each, whose closure returns a future for each element:
stream.for_each(|elem| async move { ... }).await;
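The two pinning options can be tried without any external crate. Since Stream is not part of the standard library, the sketch below uses a plain async block (which is also !Unpin) as an analogy; poll_once is a hypothetical helper that, like StreamExt::next, only accepts Unpin values.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing; enough for futures that complete immediately.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls a future once through `&mut F`. Like `StreamExt::next`, this requires
/// `F: Unpin`; passing `&mut async { .. }` directly would not compile.
fn poll_once<F: Future + Unpin>(fut: &mut F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    Pin::new(fut).poll(&mut cx)
}

/// Heap pinning: `Pin<Box<_>>` is `Unpin` even when its contents are not.
pub fn pinned_on_heap() -> Poll<u32> {
    let mut fut = Box::pin(async { 42u32 });
    poll_once(&mut fut)
}

/// Stack pinning: `pin!` yields a `Pin<&mut _>`, which is also `Unpin`.
pub fn pinned_on_stack() -> Poll<u32> {
    let mut fut = pin!(async { 42u32 });
    poll_once(&mut fut)
}
```

In both cases the value being polled is the pinned handle rather than the original future, which is why the Unpin bound is satisfied.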

How can I use Stream::map with a function that returns Result?

I've got the following piece of code (see playground):
use futures::{stream, Future, Stream}; // 0.1.25
use std::num::ParseIntError;

fn into_many(i: i32) -> impl Stream<Item = i32, Error = ParseIntError> {
    stream::iter_ok(0..i)
}

fn convert_to_string(number: i32) -> Result<String, ParseIntError> {
    Ok(number.to_string())
}

fn main() {
    println!("start:");
    let vec = into_many(10)
        .map(|number| convert_to_string(number))
        .collect()
        .wait()
        .unwrap();
    println!("vec={:#?}", vec);
    println!("finish:");
}
It outputs the following (i.e., a Vec<Result<String, ParseIntError>>):
start:
vec=[
Ok(
"0"
),
Ok(
"1"
),
Ok(
"2"
), ...
Is there any way to make it output a plain Vec<String> instead, and if any error happens then immediately stop execution and return from the function (e.g., like this example)?
Note: I do want to use use futures::Stream; // 0.1.25 even if it doesn't make sense for this particular example.
The following code (playground link), a modification of the code in your question, gets the result you want:
use futures::{stream, Future, Stream}; // 0.1.25
use std::num::ParseIntError;

fn into_many(i: i32) -> impl Stream<Item = i32, Error = ParseIntError> {
    stream::iter_ok(0..i)
}

fn convert_to_string(number: i32) -> Result<String, ParseIntError> {
    Ok(number.to_string())
}

fn main() {
    println!("start:");
    let vec: Result<Vec<String>, ParseIntError> = into_many(10)
        .map(|number| convert_to_string(number))
        .collect()
        .wait()
        .unwrap()
        .into_iter()
        .collect();
    println!("vec={:#?}", vec);
    println!("finish:");
}
Since your current code returned a Vec, we can turn that into an iterator and collect that into the type you want. Type annotations are needed so that collect knows what type to collect the iterator into.
Note that the collect method on the Iterator trait isn't to be confused with the collect method on a Stream.
Finally, while this works, it may not be exactly what you want, since it still waits for all results from the stream to be collected into a vector, before using collect to transform the vector. I don't have experience with futures so not sure how possible this is (it probably is but may require a less neat functional programming style solution).
map with a function that returns Result
Don't do this, that's not when you should use map. Instead, use and_then:
let vec = into_many(10)
    .and_then(|number| convert_to_string(number))
    .collect()
    .wait()
    .unwrap();
You should practice with simpler Rust concepts like Option, Result, and iterators before diving into futures. Many concepts transfer over.
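The same "stop at the first error" behaviour can be practised with ordinary iterators, where collecting into a Result short-circuits. This is an analogy using only the standard library, not futures 0.1 code; parse_all and parse_counting are made-up names.

```rust
use std::num::ParseIntError;

/// Parses every input, returning the first error encountered (if any).
pub fn parse_all(inputs: &[&str]) -> Result<Vec<i32>, ParseIntError> {
    inputs.iter().map(|s| s.parse::<i32>()).collect()
}

/// Same as `parse_all`, but also reports how many inputs were attempted,
/// to make the short-circuiting visible.
pub fn parse_counting(inputs: &[&str]) -> (Result<Vec<i32>, ParseIntError>, usize) {
    let mut attempts = 0;
    let result: Result<Vec<i32>, ParseIntError> = inputs
        .iter()
        .map(|s| {
            attempts += 1;
            s.parse::<i32>()
        })
        .collect();
    (result, attempts)
}
```

With the inputs "1", "x", "3", parse_counting stops after the second element: the third string is never parsed.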
See also:
How do I unwrap an arbitrary number of nested Option types?
What is the idiomatic way to handle/unwrap nested Result types?

Resources