How can I pipe bytes from a reader to a writer with an interceptor?

How can I perform something like std::io::copy(&mut from, &mut to), but with an interceptor in the middle? So that I can, for instance, compute a SHA-256 hash of the whole stream?

It is common and good practice for byte-stream hashers and other raw stream processors to implement Write, through which the stream can be fed into the process (see e.g. the Hasher type from crypto_hash).
Therefore, a solution to data stream interception is to (1) ensure that the processor indeed implements Write, and (2) make a writer type that replicates each write to two independent writers at once. I found at least one crate providing this mechanism (broadcast), but implementing it by hand should not be overly complex.
With this available, a new function signature for an intercepted copy can be derived.
fn copy_intercept<I: ?Sized, O: ?Sized, M: ?Sized>(
    input: &mut I,
    output: &mut O,
    intercept: &mut M,
) -> std::io::Result<u64>
where
    I: Read,
    O: Write,
    M: Write,
{
    std::io::copy(input, &mut BroadcastWriter::new(output, intercept))
}

Creating a `Pin<Box<[T; N]>>` in Rust when `[T; N]` is too large to be created on the stack

Generalized Question
How can I implement a general function pinned_array_of_default in stable Rust where [T; N] is too large to fit on the stack?
fn pinned_array_of_default<T: Default, const N: usize>() -> Pin<Box<[T; N]>> {
    unimplemented!()
}
Alternatively, T can implement Copy if that makes the process easier.
fn pinned_array_of_element<T: Copy, const N: usize>(x: T) -> Pin<Box<[T; N]>> {
    unimplemented!()
}
Keeping the solution in safe Rust would have been preferable, but it seems unlikely that it is possible.
Approaches
Initially I was hoping that implementing Default would let Box::default() handle the initial allocation; however, it still creates the array on the stack, so this will not work for large values of N.
let boxed: Box<[T; N]> = Box::default();
let foo = Pin::new(boxed);
I suspect I need to use MaybeUninit to achieve this, and there is a Box::new_uninit() function, but it is currently unstable and I would ideally like to keep this within stable Rust. I am also somewhat unsure if transmuting Pin<Box<MaybeUninit<B>>> to Pin<Box<B>> could somehow have negative effects on the Pin.
Background
The purpose behind using a Pin<Box<[T; N]>> is to hold a block of pointers where N is some constant factor/multiple of the page size.
#[repr(C)]
#[derive(Copy, Clone)]
pub union Foo<R: ?Sized> {
    assigned: NonNull<R>,
    next_unused: Option<NonNull<Self>>,
}
Each pointer may or may not be in use at a given point in time. An in-use Foo points to R, and an unused/empty Foo has a pointer to either the next empty Foo in the block or None. A pointer to the first unused Foo in the block is stored separately. When a block is full, a new block is created and the pointer chain of unused positions continues through the next block.
The box needs to be pinned since it will contain self referential pointers as well as outside structs holding pointers into assigned positions in each block.
I know that Foo is wildly unsafe by Rust standards, but the general question of creating a Pin<Box<[T; N]>> still stands.
A way to construct a large array on the heap and avoid creating it on the stack is to proxy through a Vec. You can construct the elements and use .into_boxed_slice() to get a Box<[T]>. You can then use .try_into() to convert it to a Box<[T; N]>. And then use .into() to convert it to a Pin<Box<[T; N]>>:
fn pinned_array_of_default<T: Default, const N: usize>() -> Pin<Box<[T; N]>> {
    let mut vec = vec![];
    vec.resize_with(N, T::default);
    let boxed: Box<[T; N]> = match vec.into_boxed_slice().try_into() {
        Ok(boxed) => boxed,
        Err(_) => unreachable!(),
    };
    boxed.into()
}
You can optionally make this look more straightforward if you add T: Clone so that you can do vec![T::default(); N] and/or add T: Debug so you can use .unwrap() or .expect().
See also:
Creating a fixed-size array on heap in Rust

How to apply a series of iterators to data?

I'm tinkering with Rust by building some basic genetics functionality, e.g. read a file with a DNA sequence, transcribe it to RNA, translate it to an amino acid sequence, etc.
I'd like each of these transformations to accept and return iterators. That way I can string them together (like dna.transcribe().translate()...) and only collect when necessary, so the compiler can optimize the entire chain of transformations. I'm a data scientist coming from Scala/Spark, so this pattern makes a lot of sense, but I'm not sure how to implement it in Rust.
I've read this article about returning iterators but the final solution seems to be to use trait objects (with possibly large performance impact), or to hand roll iterators with associated structs (which allows me to return an iterator, yes, but I don't see how it would allow me to write a transformation that also accepts an iterator).
Any general architectural advice here?
(FYI, my code so far is available here, but I feel like I'm not using Rust idiomatically because (a) I still can't quite get it to compile and (b) this pattern of lazily chaining operations has led to unexpectedly complex and messy code that only works on Rust nightly.)
Iterator adaptors are meant to do operations which can't easily be expressed otherwise. Your two examples, .translate() and .transcribe(), given your explanation of them, could be simplified to the following:
dna
    .map(|x| x.translate())
    .map(|x| x.transcribe())
// or
dna
    .map(|x| x.translate().transcribe())
However, if you are intent on designing your own iterator, the following should work:
struct Transcriber<I: Iterator<Item = Dna>> {
    inner: I,
}

impl<I: Iterator<Item = Dna>> Iterator for Transcriber<I> {
    type Item = TranscribedDna;
    fn next(&mut self) -> Option<Self::Item> {
        // Delegate to the wrapped iterator, not to `self.next()`,
        // which would recurse forever.
        self.inner.next().map(|x| x.transcribe())
    }
}

// Extension trait to add the `.transcribe` method to existing iterators
trait TranscribeIteratorExt: Iterator<Item = Dna> + Sized {
    fn transcribe(self) -> Transcriber<Self>;
}

impl<I: Iterator<Item = Dna>> TranscribeIteratorExt for I {
    fn transcribe(self) -> Transcriber<Self> {
        Transcriber { inner: self }
    }
}
Then you can use
dna
    .transcribe() // yields TranscribedDna

How do I interpret the signature of read_until and what is AsyncRead + BufRead in Tokio?

I'm trying to understand asynchronous I/O in Rust. The following code is based on a snippet from Katharina Fey's Jan 2019 talk, which works for me:
use futures::future::Future;
use std::io::BufReader;
use tokio::io::*;

fn main() {
    let reader = BufReader::new(tokio::io::stdin());
    let buffer = Vec::new();

    println!("Type something:");
    let fut = tokio::io::read_until(reader, b'\n', buffer)
        .and_then(move |(stdin, buffer)| {
            tokio::io::stdout()
                .write_all(&buffer)
                .map_err(|e| panic!(e))
        })
        .map_err(|e| panic!(e));

    tokio::run(fut);
}
Before finding that code, I attempted to figure it out from the read_until documentation.
How do I interpret the signature of read_until to use it in a code sample like the one above?
pub fn read_until<A>(a: A, byte: u8, buf: Vec<u8>) -> ReadUntil<A>
where
    A: AsyncRead + BufRead,
Specifically, how can I know from reading the documentation, what are the parameters passed into the and_then closure and the expected result?
Parameters to and_then
Unfortunately the standard layout of the Rust documentation makes futures quite hard to follow.
Starting from the read_until documentation you linked, I can see that it returns ReadUntil<A>. I'll click on that to go to the ReadUntil documentation.
This return value is described as:
A future which can be used to easily read the contents of a stream into a vector until the delimiter is reached.
I would expect it to implement the Future trait — and I can see that it does. I would also assume that the Item that the future resolves to is some sort of vector, but I don't know exactly what, so I keep digging:
First I look under "Trait implementations" and find impl<A> Future for ReadUntil<A>
I click the [+] expander
Finally I see the associated type Item = (A, Vec<u8>). This means it's a Future that's going to return a pair of values: the A, so it is presumably giving me back the original reader that I passed in, plus a vector of bytes.
When the future resolves to this tuple, I want to attach some additional processing with and_then. This is part of the Future trait, so I can scroll down further to find that function.
fn and_then<F, B>(self, f: F) -> AndThen<Self, B, F>
where
    F: FnOnce(Self::Item) -> B,
    B: IntoFuture<Error = Self::Error>,
    Self: Sized,
The function and_then is documented as taking two parameters, but self is passed implicitly by the compiler when using dot syntax to chain functions, which tells us that we can write read_until(A, '\n', buffer).and_then(...). The second parameter in the documentation, f: F, becomes the first argument passed to and_then in our code.
I can see that f is a closure because the type F is shown as FnOnce(Self::Item) -> B (which, if I click through, links to the Rust book's closures chapter).
The closure f that is passed in takes Self::Item as the parameter. I just found out that Item is (A, Vec<u8>), so I expect to write something like .and_then(|(reader, buffer)| { /* ... */ }).
AsyncRead + BufRead
This is putting constraints on what type of reader can be read from. The created BufReader implements BufRead.
Helpfully, Tokio provides an implementation of AsyncRead for BufReader so we don't have to worry about it, we can just go ahead and use the BufReader.

When is it safe to move a member value out of a pinned future?

I'm writing a future combinator that needs to consume a value that it was provided with. With futures 0.1, Future::poll took self: &mut Self, which effectively meant that my combinator contained an Option and I called Option::take on it when the underlying future resolves.
The Future::poll method in the standard library takes self: Pin<&mut Self> instead, so I've been reading about the guarantees required in order to safely make use of Pin.
From the pin module documentation on the Drop guarantee (emphasis mine):
Concretely, for pinned data you have to maintain the invariant that its memory will not get invalidated from the moment it gets pinned until when drop is called. Memory can be invalidated by deallocation, but also by replacing a Some(v) by None, or calling Vec::set_len to "kill" some elements off of a vector.
And Projections and Structural Pinning (emphasis mine):
You must not offer any other operations that could lead to data being moved out of the fields when your type is pinned. For example, if the wrapper contains an Option<T> and there is a take-like operation with type fn(Pin<&mut Wrapper<T>>) -> Option<T>, that operation can be used to move a T out of a pinned Wrapper<T> -- which means pinning cannot be structural.
However, the existing Map combinator calls Option::take on a member value when the underlying future has resolved:
fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<T> {
    match self.as_mut().future().poll(cx) {
        Poll::Pending => Poll::Pending,
        Poll::Ready(output) => {
            let f = self.f().take()
                .expect("Map must not be polled after it returned `Poll::Ready`");
            Poll::Ready(f(output))
        }
    }
}
The f method is generated by the unsafe_unpinned macro and looks roughly like:
fn f<'a>(self: Pin<&'a mut Self>) -> &'a mut Option<F> {
    unsafe { &mut Pin::get_unchecked_mut(self).f }
}
It appears that Map violates the requirements that are described in the pin documentation, but I believe that the authors of the Map combinator know what they are doing and that this code is safe.
What logic allows them to perform this operation in a safe manner?
It is all about structural pinning.
First, I will use the syntax P<T> to mean something like impl Deref<Target = T> — some (smart) pointer type P that Deref::derefs to a T. Pin only "applies" to / makes sense on such (smart) pointers.
Let's say we have:
struct Wrapper<Field> {
    field: Field,
}
The initial question is
Can we get a Pin<P<Field>> from a Pin<P<Wrapper<Field>>>, by "projecting" our Pin<P<_>> from the Wrapper to its field?
This requires the basic projection P<Wrapper<Field>> -> P<Field>, which is only possible for:
shared references (P<T> = &T). This is not a very interesting case given that Pin<P<T>> always derefs to T.
unique references (P<T> = &mut T).
I will use the syntax &[mut] T for this type of projection.
The question now becomes:
Can we go from Pin<&[mut] Wrapper<Field>> to Pin<&[mut] Field>?
The point that may be unclear from the documentation is that it is up to the creator of Wrapper to decide!
There are two possible choices for the library author for each struct field.
There is a structural Pin projection to that field
For instance, the pin_utils::unsafe_pinned! macro is used to define such a projection (Pin<&mut Wrapper<Field>> -> Pin<&mut Field>).
For the Pin projection to be sound:
the whole struct must only implement Unpin when all the fields for which there is a structural Pin projection implement Unpin.
no implementation is allowed to use unsafe to move such fields out of a Pin<&mut Wrapper<Field>> (or Pin<&mut Self> when Self = Wrapper<Field>). For instance, Option::take() is forbidden.
the whole struct may only implement Drop if Drop::drop does not move any of the fields for which there is a structural projection.
the struct cannot be #[repr(packed)] (a corollary of the previous item).
In your given future::Map example, this is the case of the future field of the Map struct.
There is no structural Pin projection to that field
For instance, the pin_utils::unsafe_unpinned! macro is used to define such a projection (Pin<&mut Wrapper<Field>> -> &mut Field).
In this case, that field is not considered pinned by a Pin<&mut Wrapper<Field>>.
whether Field is Unpin or not does not matter.
implementations are allowed to use unsafe to move such fields out of a Pin<&mut Wrapper<Field>>. For instance, Option::take() is allowed.
Drop::drop is also allowed to move such fields.
In your given future::Map example, this is the case of the f field of the Map struct.
Example of both types of projection
impl<Fut, F> Map<Fut, F> {
    unsafe_pinned!(future: Fut);    // pin projection ------+
    unsafe_unpinned!(f: Option<F>); // not pinned --+       |
                                    //              |       |
    // ...                          //              |       |
                                    //              |       |
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<T> {
        //                                          |       |
        match self.as_mut().future().poll(cx) { // <--------+ required here
            Poll::Pending => Poll::Pending,     //  |
            Poll::Ready(output) => {            //  |
                let f = self.f().take()         // <+ allows this
edit: This answer is incorrect. It remains here for posterity.
Let's begin by recalling why Pin was introduced in the first place: we want to statically ensure that self-referential futures cannot be moved, thus invalidating their internal references.
With that in mind, let's take a look at the definition of Map.
pub struct Map<Fut, F> {
    future: Fut,
    f: Option<F>,
}
Map has two fields, the first one stores a future, the second stores a closure which maps the result of that future to another value. We wish to support storing self-referential types directly in future without placing them behind a pointer. This means that if Fut is a self-referential type, Map cannot be moved once it is constructed. That is why we must use Pin<&mut Map> as the receiver for Future::poll. If a normal mutable reference to a Map containing a self-referential future was ever exposed to an implementor of Future, users could cause UB using only safe code by causing the Map to be moved using mem::replace.
However, we don't need to support storing self-referential types in f. If we assume that the self-referential part of a Map is wholly contained in future, we can freely modify f as long as we don't allow future to be moved.
While a self-referential closure would be very unusual, the assumption that f be safe to move (which is equivalent to F: Unpin) is not explicitly stated anywhere. However, we still move the value in f in Future::poll by calling take! I think this is indeed a bug, but I'm not 100% sure. I think the f() getter should require F: Unpin which would mean Map can only implement Future when the closure argument is safe to be moved from behind a Pin.
It's very possible that I'm overlooking some subtleties in the pin API here, and the implementation is indeed safe. I'm still wrapping my head around it as well.

How to take a subslice of an Arc<[T]>

fn subslice<T>(a: Arc<[T]>, begin: usize, end: usize) -> Arc<[T]> {
    Arc::new(a[begin..end])
}
The above "obvious implementation" of the subslicing operation for Arc<[T]> does not work because a[begin..end] has type [T], which is unsized. Arc<T> has the curious property that the type itself does not require T: Sized, but the constructor Arc::new does, so I'm at a loss for how to construct this subslice.
You can't.
To explain why, let's look at what Arc actually is under the covers.
pub struct Arc<T: ?Sized> {
    ptr: Shared<ArcInner<T>>,
}
Shared<T> is an internal wrapper type that essentially amounts to "*const T, but can't be zero"; so it's basically a &T without a lifetime. This means that you can't adjust the slice at this level; if you did, you'd end up trying to point to an ArcInner that doesn't exist. Thus, if this is possible, it must involve some manipulation of the ArcInner.
ArcInner<T> is defined as follows:
struct ArcInner<T: ?Sized> {
    strong: atomic::AtomicUsize,
    weak: atomic::AtomicUsize,
    data: T,
}
strong and weak are just the number of strong and weak handles to this allocation respectively. data is the actual contents of the allocation, stored inline. And that's the problem.
In order for your code to work as you want, Arc would not only have to refer to data by another pointer (rather than storing it inline), but it would also have to store the reference counts and data in different places, so that you could take a slice of the data, but retain the same reference counts.
So you can't do what you're asking.
One thing you can do instead is to store the slicing information alongside the Arc. The owning_ref crate has an example that does exactly this.
