How to use actix field stream by two consumers? - rust

I have an actix web service and would like to parse the contents of a multipart field while streaming with async-gcode and in addition store the contents e.g. in a database.
However, I have no clue how to feed in the stream to the Parser and at the same time collect the bytes into a Vec<u8> or a String.
The first problem I face is that field is a stream of actix::web::Bytes and not of u8.
#[post("/upload")]
pub async fn upload_job(
mut payload: Multipart,
) -> Result<HttpResponse, Error> {
let mut contents : Vec<u8> = Vec::new();
while let Ok(Some(mut field)) = payload.try_next().await {
let content_disp = field.content_disposition().unwrap();
match content_disp.get_name().unwrap() {
"file" => {
while let Some(chunk) = field.next().await {
contents.append(&mut chunk.unwrap().to_vec());
// already parse the contents
// and additionally store contents somewhere
}
}
_ => (),
}
}
Ok(HttpResponse::Ok().finish())
}
Any hint or suggestion is very much appreciated.

One of the options is to wrap field in a struct and implement Stream trait for it.
use actix_web::{HttpRequest, HttpResponse, Error};
use futures_util::stream::Stream;
use std::pin::Pin;
use actix_multipart::{Multipart, Field};
use futures::stream::{self, StreamExt};
use futures_util::TryStreamExt;
use std::task::{Context, Poll};
use async_gcode::{Parser, Error as PError};
use bytes::BytesMut;
use std::cell::RefCell;
pub struct Wrapper {
field: Field,
buffer: RefCell<BytesMut>,
index: usize,
}
impl Wrapper {
pub fn new(field: Field, buffer: RefCell<BytesMut>) -> Self {
buffer.borrow_mut().truncate(0);
Wrapper {
field,
buffer,
index: 0
}
}
}
impl Stream for Wrapper {
type Item = Result<u8, PError>;
fn poll_next(
mut self: Pin<&mut Self>,
cx: &mut Context<'_>,
) -> Poll<Option<Result<u8, PError>>> {
if self.index == self.buffer.borrow().len() {
match Pin::new(&mut self.field).poll_next(cx) {
Poll::Ready(Some(Ok(chunk))) => self.buffer.get_mut().extend_from_slice(&chunk),
Poll::Pending => return Poll::Pending,
Poll::Ready(None) => return Poll::Ready(None),
Poll::Ready(Some(Err(_))) => return Poll::Ready(Some(Err(PError::BadNumberFormat/* ??? */)))
};
} else {
let b = self.buffer.borrow()[self.index];
self.index += 1;
return Poll::Ready(Some(Ok(b)));
}
Poll::Ready(None)
}
}
#[post("/upload")]
pub async fn upload_job(
mut payload: Multipart,
) -> Result<HttpResponse, Error> {
while let Ok(Some(field)) = payload.try_next().await {
let content_disp = field.content_disposition().unwrap();
match content_disp.get_name().unwrap() {
"file" => {
let mut contents: RefCell<BytesMut> = RefCell::new(BytesMut::new());
let mut w = Wrapper::new(field, contents.clone());
let mut p = Parser::new(w);
while let Some(res) = p.next().await {
// Do something with results
};
// Do something with the buffer
let a = contents.get_mut()[0];
}
_ => (),
}
}
Ok(HttpResponse::Ok().finish())
}
Copying the Bytes from the Field won't be necessary when
Bytes::try_unsplit will be implemented. (https://github.com/tokio-rs/bytes/issues/287)

The answer from dmitryvm (thanks for your effort) showed me that there are actually two problems. At first, flatten the Bytes into u8's and, secondly, to "split" the stream into a buffer for later storage and the async-gcode parser.
This shows how I solved it:
#[post("/upload")]
pub async fn upload_job(
mut payload: Multipart,
) -> Result<HttpResponse, Error> {
let mut contents : Vec<u8> = Vec::new();
while let Ok(Some(mut field)) = payload.try_next().await {
let content_disp = field.content_disposition().unwrap();
match content_disp.get_name().unwrap() {
"file" => {
let field_stream = field
.map_err(|_| async_gcode::Error::BadNumberFormat) // Translate error
.map_ok(|y| { // Translate Bytes into stream with Vec<u8>
contents.extend_from_slice(&y); // Copy and store for later usage
stream::iter(y).map(Result::<_, async_gcode::Error>::Ok)
})
.try_flatten(); // Flatten the streams of u8's
let mut parser = Parser::new(field_stream);
while let Some(gcode) = parser.next().await {
// Process result from parser
}
}
_ => (),
}
}
Ok(HttpResponse::Ok().finish())
}

Related

What lifetimes and bounds are needed to generalize this async code? [duplicate]

This question already has answers here:
How to fix lifetime error when function returns a serde Deserialize type?
(2 answers)
Why do Rust lifetimes matter when I move values into a spawned Tokio task?
(1 answer)
Closed 1 year ago.
I have this websocket code that uses tokio and serde here:
use async_once::AsyncOnce;
use common_wasm::models::status::{CommandMessage, StatusMessage};
use futures_util::{SinkExt, StreamExt};
use lazy_static::lazy_static;
use std::{collections::VecDeque, net::SocketAddr};
use tokio::{
net::{TcpListener, TcpStream}, sync::{broadcast, mpsc}
};
use tokio_tungstenite::{
accept_async, tungstenite::{Error, Message, Result}
};
use tracing::*;
// https://stackoverflow.com/questions/67650879/rust-lazy-static-with-async-await
lazy_static! {
pub static ref STATUS_REPORTER: AsyncOnce<StatusWs> = AsyncOnce::new(async {
info!("Init lazy static WS");
let server = StatusWs::init("ws://localhost:44444").await;
server
});
}
use StatusMessage as SenderType;
use CommandMessage as ReceiveType;
pub struct StatusWs {
buf: VecDeque<ReceiveType>,
rx_client_msg: mpsc::Receiver<ReceiveType>,
tx_server_msg: broadcast::Sender<SenderType>,
}
impl StatusWs {
pub async fn init(addr: &str) -> StatusWs {
info!("Init Status WS on {}", addr);
let listener = TcpListener::bind(&addr).await.expect("Can't listen");
// Clients producting to server, they use the tx to send and server uses the rx to read
let (tx_client_msg, rx_client_msg) = mpsc::channel::<ReceiveType>(32);
// spmc for server to broadcast status to listeners. Server uses tx to send and client uses rx to read
let (tx_server_msg, _rx_server_msg) = broadcast::channel::<SenderType>(10);
let tx_server_2 = tx_server_msg.clone();
tokio::spawn(async move {
while let Ok((stream, peer)) = listener.accept().await {
info!("Peer address connected: {}", peer);
let tx_client = tx_client_msg.clone();
let rx_server = tx_server_msg.subscribe();
tokio::spawn(async move {
accept_connection(peer, stream, tx_client, rx_server).await;
});
}
});
StatusWs { buf: VecDeque::new(), rx_client_msg, tx_server_msg: tx_server_2 }
}
pub async fn reportinfo(&self, msg: &SenderType) {
let my_msg = msg.clone();
match &self.tx_server_msg.send(my_msg) {
Ok(_size) => {
//trace!("Server Sending OK {}", size)
},
Err(_err) => {
//trace!("Server Sending ERR {:?}", err)
},
}
}
pub async fn next(&mut self) -> Result<Option<ReceiveType>> {
loop {
// If buffer contains data, we can directly return it.
if let Some(data) = self.buf.pop_front() {
return Ok(Some(data));
}
// Fetch new response if buffer is empty.
let response = self.next_response().await?;
// Handle the response, possibly adding to the buffer
self.handle_response(response)?;
}
}
async fn next_response(&mut self) -> Result<ReceiveType> {
loop {
tokio::select! { // TODO don't need select if there's only one thing?
Some(msg) = self.rx_client_msg.recv() => {
return Ok(msg)
},
}
}
}
fn handle_response(&mut self, response: ReceiveType) -> Result<()> {
self.buf.push_back(response);
Ok(())
}
}
async fn accept_connection(peer: SocketAddr, stream: TcpStream, tx_client: mpsc::Sender<ReceiveType>, rx_server: broadcast::Receiver<SenderType>) {
info!("Accepting connection from {}", peer);
if let Err(e) = handle_connection(peer, stream, tx_client, rx_server).await {
match e {
Error::ConnectionClosed | Error::Protocol(_) | Error::Utf8 => error!("Connection closed"),
err => error!("Error processing connection: {}", err),
}
}
}
async fn handle_connection(
_peer: SocketAddr, stream: TcpStream, tx_client: mpsc::Sender<ReceiveType>, mut rx_server: broadcast::Receiver<SenderType>,
) -> Result<()> {
let ws_stream = accept_async(stream).await.expect("Failed to accept");
let (mut ws_sender, mut ws_receiver) = ws_stream.split();
loop {
tokio::select! {
remote_msg = ws_receiver.next() => {
match remote_msg {
Some(msg) => {
let msg = msg?;
match msg {
Message::Text(resptxt) => {
match serde_json::from_str::<ReceiveType>(&resptxt) {
Ok(cmd) => { let _ = tx_client.send(cmd).await; },
Err(err) => error!("Error deserializing: {}", err),
}
},
Message::Close(_) => break,
_ => { },
}
}
None => break,
}
}
Ok(msg) = rx_server.recv() => {
match serde_json::to_string(&msg) {
Ok(txt) => ws_sender.send(Message::Text(txt)).await?,
Err(_) => todo!(),
}
}
}
}
Ok(())
}
The sender and receiver types are simple (simple types all the way down):
use std::{collections::BTreeMap, fmt::Debug};
use serde::{Deserialize, Serialize};
#[derive(Default, Clone, Debug, Serialize, Deserialize)]
pub struct StatusMessage {
pub name: String,
pub entries: BTreeMap<i32, GuiEntry>,
}
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct CommandMessage {
pub sender: String,
pub entryid: i32,
pub command: GuiValue,
}
Now I want to generalize the code so that I can create a struct that takes some other kind of Sender and Receiver type. Yes, I could just change the aliases, but I want to be able to use the generic type arguments rather than duplicate the whole file. The problem is as I follow the suggestions from the compiler, I end up in a place where I don't know what to do next. It's telling me resptext does not live long enough:
`resptxt` does not live long enough
borrowed value does not live long enoughrust cE0597
status_ws.rs(133, 29): `resptxt` dropped here while still borrowed
status_ws.rs(115, 28): lifetime `'a` defined here
status_ws.rs(129, 39): argument requires that `resptxt` is borrowed for `'a`
Here's what I have thus far:
use async_once::AsyncOnce;
use common_wasm::models::status::{CommandMessage, StatusMessage};
use futures_util::{SinkExt, StreamExt};
use lazy_static::lazy_static;
use serde::{Serialize, Deserialize};
use std::{collections::VecDeque, net::SocketAddr};
use tokio::{
net::{TcpListener, TcpStream}, sync::{broadcast, mpsc}
};
use tokio_tungstenite::{
accept_async, tungstenite::{Error, Message, Result}
};
use tracing::*;
// https://stackoverflow.com/questions/67650879/rust-lazy-static-with-async-await
lazy_static! {
pub static ref STATUS_REPORTER: AsyncOnce<StatusWs<CommandMessage, StatusMessage>> = AsyncOnce::new(async {
info!("Init lazy static WS");
let server = StatusWs::init("ws://localhost:44444").await;
server
});
}
// use StatusMessage as SenderType;
// use CommandMessage as ReceiveType;
pub struct StatusWs<ReceiveType, SenderType> {
buf: VecDeque<ReceiveType>,
rx_client_msg: mpsc::Receiver<ReceiveType>,
tx_server_msg: broadcast::Sender<SenderType>,
}
impl <'a, ReceiveType: Deserialize<'a> + Send, SenderType: Serialize + Clone + Send + Sync> StatusWs <ReceiveType, SenderType> {
pub async fn init(addr: &str) -> StatusWs<ReceiveType, SenderType> {
info!("Init Status WS on {}", addr);
let listener = TcpListener::bind(&addr).await.expect("Can't listen");
// Clients producting to server, they use the tx to send and server uses the rx to read
let (tx_client_msg, rx_client_msg) = mpsc::channel::<ReceiveType>(32);
// spmc for server to broadcast status to listeners. Server uses tx to send and client uses rx to read
let (tx_server_msg, _rx_server_msg) = broadcast::channel::<SenderType>(10);
let tx_server_2 = tx_server_msg.clone();
tokio::spawn(async move {
while let Ok((stream, peer)) = listener.accept().await {
info!("Peer address connected: {}", peer);
let tx_client = tx_client_msg.clone();
let rx_server = tx_server_msg.subscribe();
tokio::spawn(async move {
accept_connection(peer, stream, tx_client, rx_server).await;
});
}
});
StatusWs { buf: VecDeque::new(), rx_client_msg, tx_server_msg: tx_server_2 }
}
pub async fn reportinfo(&self, msg: &SenderType) {
let my_msg = msg.clone();
match &self.tx_server_msg.send(my_msg) {
Ok(_size) => {
//trace!("Server Sending OK {}", size)
},
Err(_err) => {
//trace!("Server Sending ERR {:?}", err)
},
}
}
pub async fn next(&mut self) -> Result<Option<ReceiveType>> {
loop {
// If buffer contains data, we can directly return it.
if let Some(data) = self.buf.pop_front() {
return Ok(Some(data));
}
// Fetch new response if buffer is empty.
let response = self.next_response().await?;
// Handle the response, possibly adding to the buffer
self.handle_response(response)?;
}
}
async fn next_response(&mut self) -> Result<ReceiveType> {
loop {
tokio::select! { // TODO don't need select if there's only one thing?
Some(msg) = self.rx_client_msg.recv() => {
return Ok(msg)
},
}
}
}
fn handle_response(&mut self, response: ReceiveType) -> Result<()> {
self.buf.push_back(response);
Ok(())
}
}
async fn accept_connection<'a, ReceiveType: Deserialize<'a>, SenderType: Clone + Serialize>(peer: SocketAddr, stream: TcpStream, tx_client: mpsc::Sender<ReceiveType>, rx_server: broadcast::Receiver<SenderType>) {
info!("Accepting connection from {}", peer);
if let Err(e) = handle_connection(peer, stream, tx_client, rx_server).await {
match e {
Error::ConnectionClosed | Error::Protocol(_) | Error::Utf8 => error!("Connection closed"),
err => error!("Error processing connection: {}", err),
}
}
}
async fn handle_connection<'a, ReceiveType: Deserialize<'a>, SenderType: Clone + Serialize>(
_peer: SocketAddr, stream: TcpStream, tx_client: mpsc::Sender<ReceiveType>, mut rx_server: broadcast::Receiver<SenderType>,
) -> Result<()> {
let ws_stream = accept_async(stream).await.expect("Failed to accept");
let (mut ws_sender, mut ws_receiver) = ws_stream.split();
loop {
tokio::select! {
remote_msg = ws_receiver.next() => {
match remote_msg {
Some(msg) => {
let msg = msg?;
match msg {
Message::Text(resptxt) => {
match serde_json::from_str::<ReceiveType>(&resptxt) {
Ok(cmd) => { let _ = tx_client.send(cmd).await; },
Err(err) => error!("Error deserializing: {}", err),
}
},
Message::Close(_) => break,
_ => { },
}
}
None => break,
}
}
Ok(msg) = rx_server.recv() => {
match serde_json::to_string(&msg) {
Ok(txt) => ws_sender.send(Message::Text(txt)).await?,
Err(_) => todo!(),
}
}
}
}
Ok(())
}
I think there's some confusion about the necessary lifetimes and bounds, in particular the lifetime on the Deserializer from Serde and the Send/Sync auto trait markers on the message types.
In any case, it seems a bit brute force to just copy the whole original file and change out the aliases, which would definitely work, when it seems there's some sort of useful lesson here.
You should use serde::de::DeserializeOwned instead of Deserialize<'a>.
The Deserialize trait takes a lifetime parameter to support zero-cost deserialization, but you can't take advantage of that since the source, resptxt, is a transient value that isn't persisted anywhere. The DeserializeOwned trait can be used to constrain that the deserialized type does not keep references to the source and can therefore be used beyond it.
After fixing that, you'll get errors that ReceiveType and SenderType must be 'static to be used in a tokio::spawn'd task. Adding that constraint finally makes your code compile.
See the full compiling code on the playground for brevity.

How to print a response body in actix_web middleware?

I'd like to write a very simple middleware using actix_web framework but it's so far beating me on every front.
I have a skeleton like this:
let result = actix_web::HttpServer::new(move || {
actix_web::App::new()
.wrap_fn(move |req, srv| {
srv.call(req).map(move|res| {
println!("Got response");
// let s = res.unwrap().response().body();
// ???
res
})
})
})
.bind("0.0.0.0:8080")?
.run()
.await;
and I can access ResponseBody type via res.unwrap().response().body() but I don't know what can I do with this.
Any ideas?
This is an example of how I was able to accomplish this with 4.0.0-beta.14:
use std::cell::RefCell;
use std::pin::Pin;
use std::rc::Rc;
use std::collections::HashMap;
use std::str;
use erp_contrib::{actix_http, actix_web, futures, serde_json};
use actix_web::dev::{Service, ServiceRequest, ServiceResponse, Transform};
use actix_web::{HttpMessage, body, http::StatusCode, error::Error ,HttpResponseBuilder};
use actix_http::{h1::Payload, header};
use actix_web::web::{BytesMut};
use futures::future::{ok, Future, Ready};
use futures::task::{Context, Poll};
use futures::StreamExt;
use crate::response::ErrorResponse;
pub struct UnhandledErrorResponse;
impl<S: 'static> Transform<S, ServiceRequest> for UnhandledErrorResponse
where
S: Service<ServiceRequest, Response = ServiceResponse, Error = Error>,
S::Future: 'static,
{
type Response = ServiceResponse;
type Error = Error;
type Transform = UnhandledErrorResponseMiddleware<S>;
type InitError = ();
type Future = Ready<Result<Self::Transform, Self::InitError>>;
fn new_transform(&self, service: S) -> Self::Future {
ok(UnhandledErrorResponseMiddleware { service: Rc::new(RefCell::new(service)), })
}
}
pub struct UnhandledErrorResponseMiddleware<S> {
service: Rc<RefCell<S>>,
}
impl<S> Service<ServiceRequest> for UnhandledErrorResponseMiddleware<S>
where
S: Service<ServiceRequest, Response = ServiceResponse, Error = Error> + 'static,
S::Future: 'static,
{
type Response = ServiceResponse;
type Error = Error;
type Future = Pin<Box<dyn Future<Output = Result<Self::Response, Self::Error>>>>;
fn poll_ready(&self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
self.service.poll_ready(cx)
}
fn call(&self, mut req: ServiceRequest) -> Self::Future {
let svc = self.service.clone();
Box::pin(async move {
/* EXTRACT THE BODY OF REQUEST */
let mut request_body = BytesMut::new();
while let Some(chunk) = req.take_payload().next().await {
request_body.extend_from_slice(&chunk?);
}
let mut orig_payload = Payload::empty();
orig_payload.unread_data(request_body.freeze());
req.set_payload(actix_http::Payload::from(orig_payload));
/* now process the response */
let res: ServiceResponse = svc.call(req).await?;
let content_type = match res.headers().get("content-type") {
None => { "unknown"}
Some(header) => {
match header.to_str() {
Ok(value) => {value}
Err(_) => { "unknown"}
}
}
};
return match res.response().error() {
None => {
Ok(res)
}
Some(error) => {
if content_type.to_uppercase().contains("APPLICATION/JSON") {
Ok(res)
} else {
let error = error.to_string();
let new_request = res.request().clone();
/* EXTRACT THE BODY OF RESPONSE */
let _body_data =
match str::from_utf8(&body::to_bytes(res.into_body()).await?){
Ok(str) => {
str
}
Err(_) => {
"Unknown"
}
};
let mut errors = HashMap::new();
errors.insert("general".to_string(), vec![error]);
let new_response = match ErrorResponse::new(&false, errors) {
Ok(response) => {
HttpResponseBuilder::new(StatusCode::BAD_REQUEST)
.insert_header((header::CONTENT_TYPE, "application/json"))
.body(serde_json::to_string(&response).unwrap())
}
Err(_error) => {
HttpResponseBuilder::new(StatusCode::BAD_REQUEST)
.insert_header((header::CONTENT_TYPE, "application/json"))
.body("An unknown error occurred.")
}
};
Ok(ServiceResponse::new(
new_request,
new_response
))
}
}
}
})
}
}
The extraction of the Request Body is straightforward and similar to how Actix example's illustrate. However, with the update to version Beta 14, pulling the bytes directly from AnyBody has changed with the introduction of BoxedBody. Fortunately, I found a utility function body::to_bytes (use actix_web::body::to_bytes) which does a good job. It's current implementation looks like this:
pub async fn to_bytes<B: MessageBody>(body: B) -> Result<Bytes, B::Error> {
let cap = match body.size() {
BodySize::None | BodySize::Sized(0) => return Ok(Bytes::new()),
BodySize::Sized(size) => size as usize,
// good enough first guess for chunk size
BodySize::Stream => 32_768,
};
let mut buf = BytesMut::with_capacity(cap);
pin!(body);
poll_fn(|cx| loop {
let body = body.as_mut();
match ready!(body.poll_next(cx)) {
Some(Ok(bytes)) => buf.extend_from_slice(&*bytes),
None => return Poll::Ready(Ok(())),
Some(Err(err)) => return Poll::Ready(Err(err)),
}
})
.await?;
Ok(buf.freeze())
}
which I believe should be fine to extract the body in this way, as the body is extracted from the body stream by to_bytes().
If someone has a better way, let me know, but it was a little bit of pain, and I only had recently determined how to do it in Beta 13 when it switched to Beta 14.
This particular example intercepts errors and rewrites them to JSON format if they're not already json format. This would be the case, as an example, if an error occurs outside of a handler, such as parsing JSON in the handler function itself _data: web::Json<Request<'a, LoginRequest>> and not in the handler body. Extracting the Request Body and Response Body is not necessary to accomplish the goal, and is just here for illustration.

Why is my Future implementation blocked after it is polled once and NotReady?

I implemented the future and made a request of it, but it blocked my curl and the log shows that poll was only invoked once.
Did I implement anything wrong?
use failure::{format_err, Error};
use futures::{future, Async};
use hyper::rt::Future;
use hyper::service::{service_fn, service_fn_ok};
use hyper::{Body, Method, Request, Response, Server, StatusCode};
use log::{debug, error, info};
use std::{
sync::{Arc, Mutex},
task::Waker,
thread,
};
pub struct TimerFuture {
shared_state: Arc<Mutex<SharedState>>,
}
struct SharedState {
completed: bool,
resp: String,
}
impl Future for TimerFuture {
type Item = Response<Body>;
type Error = hyper::Error;
fn poll(&mut self) -> futures::Poll<Response<Body>, hyper::Error> {
let mut shared_state = self.shared_state.lock().unwrap();
if shared_state.completed {
return Ok(Async::Ready(Response::new(Body::from(
shared_state.resp.clone(),
))));
} else {
return Ok(Async::NotReady);
}
}
}
impl TimerFuture {
pub fn new(instance: String) -> Self {
let shared_state = Arc::new(Mutex::new(SharedState {
completed: false,
resp: String::new(),
}));
let thread_shared_state = shared_state.clone();
thread::spawn(move || {
let res = match request_health(instance) {
Ok(status) => status.clone(),
Err(err) => {
error!("{:?}", err);
format!("{}", err)
}
};
let mut shared_state = thread_shared_state.lock().unwrap();
shared_state.completed = true;
shared_state.resp = res;
});
TimerFuture { shared_state }
}
}
fn request_health(instance_name: String) -> Result<String, Error> {
std::thread::sleep(std::time::Duration::from_secs(1));
Ok("health".to_string())
}
type BoxFut = Box<dyn Future<Item = Response<Body>, Error = hyper::Error> + Send>;
fn serve_health(req: Request<Body>) -> BoxFut {
let mut response = Response::new(Body::empty());
let path = req.uri().path().to_owned();
match (req.method(), path) {
(&Method::GET, path) => {
return Box::new(TimerFuture::new(path.clone()));
}
_ => *response.status_mut() = StatusCode::NOT_FOUND,
}
Box::new(future::ok(response))
}
fn main() {
let endpoint_addr = "0.0.0.0:8080";
match std::thread::spawn(move || {
let addr = endpoint_addr.parse().unwrap();
info!("Server is running on {}", addr);
hyper::rt::run(
Server::bind(&addr)
.serve(move || service_fn(serve_health))
.map_err(|e| eprintln!("server error: {}", e)),
);
})
.join()
{
Ok(e) => e,
Err(e) => println!("{:?}", e),
}
}
After compile and run this code, a server with port 8080 is running. Call the server with curl and it will block:
curl 127.0.0.1:8080/my-health-scope
Did I implement anything wrong?
Yes, you did not read and follow the documentation for the method you are implementing (emphasis mine):
When a future is not ready yet, the Async::NotReady value will be returned. In this situation the future will also register interest of the current task in the value being produced. This is done by calling task::park to retrieve a handle to the current Task. When the future is then ready to make progress (e.g. it should be polled again) the unpark method is called on the Task.
As a minimal, reproducible example, let's use this:
use futures::{future::Future, Async};
use std::{
mem,
sync::{Arc, Mutex},
thread,
time::Duration,
};
pub struct Timer {
data: Arc<Mutex<String>>,
}
impl Timer {
pub fn new(instance: String) -> Self {
let data = Arc::new(Mutex::new(String::new()));
thread::spawn({
let data = data.clone();
move || {
thread::sleep(Duration::from_secs(1));
*data.lock().unwrap() = instance;
}
});
Timer { data }
}
}
impl Future for Timer {
type Item = String;
type Error = ();
fn poll(&mut self) -> futures::Poll<Self::Item, Self::Error> {
let mut data = self.data.lock().unwrap();
eprintln!("poll was called");
if data.is_empty() {
Ok(Async::NotReady)
} else {
let data = mem::replace(&mut *data, String::new());
Ok(Async::Ready(data))
}
}
}
fn main() {
let v = Timer::new("Some text".into()).wait();
println!("{:?}", v);
}
It only prints out "poll was called" once.
You can call task::current (previously task::park) in the implementation of Future::poll, save the resulting value, then use the value with Task::notify (previously Task::unpark) whenever the future may be polled again:
use futures::{
future::Future,
task::{self, Task},
Async,
};
use std::{
mem,
sync::{Arc, Mutex},
thread,
time::Duration,
};
pub struct Timer {
data: Arc<Mutex<(String, Option<Task>)>>,
}
impl Timer {
pub fn new(instance: String) -> Self {
let data = Arc::new(Mutex::new((String::new(), None)));
let me = Timer { data };
thread::spawn({
let data = me.data.clone();
move || {
thread::sleep(Duration::from_secs(1));
let mut data = data.lock().unwrap();
data.0 = instance;
if let Some(task) = data.1.take() {
task.notify();
}
}
});
me
}
}
impl Future for Timer {
type Item = String;
type Error = ();
fn poll(&mut self) -> futures::Poll<Self::Item, Self::Error> {
let mut data = self.data.lock().unwrap();
eprintln!("poll was called");
if data.0.is_empty() {
let v = task::current();
data.1 = Some(v);
Ok(Async::NotReady)
} else {
let data = mem::replace(&mut data.0, String::new());
Ok(Async::Ready(data))
}
}
}
fn main() {
let v = Timer::new("Some text".into()).wait();
println!("{:?}", v);
}
See also:
Why does Future::select choose the future with a longer sleep period first?
Why is `Future::poll` not called repeatedly after returning `NotReady`?
What is the best approach to encapsulate blocking I/O in future-rs?

How to copy data from a stream while also forwarding a stream

I am using hyper 0.12 to build a proxy service. When receiving a response body from the upstream server I want to forward it back to the client ASAP, and save the contents in a buffer for later processing.
So I need a function that:
takes a Stream (a hyper::Body, to be precise)
returns a Stream that is functionally identical to the input stream
also returns some sort of Future<Item = Vec<u8>, Error = ...> that is resolved with the buffered contents of the input stream, when the output stream is completely consumed
I can't for the life of me figure out how to do this.
I guess the function I'm looking for will look something like this:
type BufferFuture = Box<Future<Item = Vec<u8>, Error = ()>>;
pub fn copy_body(body: hyper::Body) -> (hyper::Body, BufferFuture) {
let body2 = ... // ???
let buffer = body.fold(Vec::<u8>::new(), |mut buf, chunk| {
buf.extend_from_slice(&chunk);
// ...somehow send this chunk to body2 also?
});
(body2, buffer);
}
Below is what I have tried, and it works until send_data() fails (obviously).
type BufferFuture = Box<Future<Item = Vec<u8>, Error = ()>>;
pub fn copy_body(body: hyper::Body) -> (hyper::Body, BufferFuture) {
let (mut sender, body2) = hyper::Body::channel();
let consume =
body.map_err(|_| ()).fold(Vec::<u8>::new(), move |mut buf, chunk| {
buf.extend_from_slice(&chunk);
// What to do if this fails?
if sender.send_data(chunk).is_err() {}
Box::new(future::ok(buf))
});
(body2, Box::new(consume));
}
However, something tells me I am on the wrong track.
I have found Sink.fanout() which seems like it is what I want, but I do not have a Sink, and I don't know how to construct one. hyper::Body implements Stream but not Sink.
What I ended up doing was implement a new type of stream that does what I need. This appeared to be necessary because hyper::Body does not implement Sink nor does hyper::Chunk implement Clone (which is required for Sink.fanout()), so I cannot use any of the existing combinators.
First a struct that contains all details that we need and methods to append a new chunk, as well as notify that the buffer is completed.
struct BodyClone<T> {
body: T,
buffer: Option<Vec<u8>>,
sender: Option<futures::sync::oneshot::Sender<Vec<u8>>>,
}
impl BodyClone<hyper::Body> {
fn flush(&mut self) {
if let (Some(buffer), Some(sender)) = (self.buffer.take(), self.sender.take()) {
if sender.send(buffer).is_err() {}
}
}
fn push(&mut self, chunk: &hyper::Chunk) {
use hyper::body::Payload;
let length = if let Some(buffer) = self.buffer.as_mut() {
buffer.extend_from_slice(chunk);
buffer.len() as u64
} else {
0
};
if let Some(content_length) = self.body.content_length() {
if length >= content_length {
self.flush();
}
}
}
}
Then I implemented the Stream trait for this struct.
impl Stream for BodyClone<hyper::Body> {
type Item = hyper::Chunk;
type Error = hyper::Error;
fn poll(&mut self) -> futures::Poll<Option<Self::Item>, Self::Error> {
match self.body.poll() {
Ok(Async::Ready(Some(chunk))) => {
self.push(&chunk);
Ok(Async::Ready(Some(chunk)))
}
Ok(Async::Ready(None)) => {
self.flush();
Ok(Async::Ready(None))
}
other => other,
}
}
}
Finally I could define an extension method on hyper::Body:
pub type BufferFuture = Box<Future<Item = Vec<u8>, Error = ()> + Send>;
trait CloneBody {
fn clone_body(self) -> (hyper::Body, BufferFuture);
}
impl CloneBody for hyper::Body {
fn clone_body(self) -> (hyper::Body, BufferFuture) {
let (sender, receiver) = futures::sync::oneshot::channel();
let cloning_stream = BodyClone {
body: self,
buffer: Some(Vec::new()),
sender: Some(sender),
};
(
hyper::Body::wrap_stream(cloning_stream),
Box::new(receiver.map_err(|_| ())),
)
}
}
This can be used as follows:
let (body: hyper::Body, buffer: BufferFuture) = body.clone_body();

What might cause a difficult-to-reproduce truncation of a Hyper HTTP response?

I am experiencing a bug where my Hyper HTTP response is being truncated to a specific size (7829 bytes). Making the same request with cURL works fine.
The request queries a JSON endpoint for data. The response struct is then shuffled around a lot, because a relatively complex rate-limiting procedure is used to make a number of these requests at once. However, if only one request is made, the response is still truncated.
Before implementing rate-limiting and doing some heavy refactoring, the program made these responses properly.
I made the minimal example below, but it fails to reproduce the problem. At this point I'm not sure where to look. The codebase is moderately complicated and iteratively expanding the reproduction example is difficult, especially when I don't know what might possibly cause this.
What are some ways that Hyper's Response body might get truncated? The response body is acquired as in the handle function below.
#![feature(use_nested_groups)]
extern crate futures;
extern crate hyper;
extern crate hyper_tls;
extern crate tokio_core;
use futures::{Future, Stream};
use hyper::{Body, Chunk, Client, Method, Request, Response};
use hyper_tls::HttpsConnector;
use tokio_core::reactor::Core;
use std::env;
fn main() {
let mut core = Core::new().unwrap();
let client = Client::configure()
.connector(HttpsConnector::new(4, &core.handle()).unwrap())
.build(&core.handle());
fn handle(response: Response<Body>) -> Box<Future<Item = usize, Error = hyper::Error>> {
Box::new(
response
.body()
.concat2()
.map(move |body: Chunk| -> usize { body.len() }),
)
}
let args: Vec<String> = env::args().collect();
let uri = &args[1];
let req = Request::new(Method::Get, uri.parse().unwrap());
let response_body_length = {
let future = Box::new(client.request(req).map(handle).flatten());
core.run(future).unwrap()
};
println!("response body length: {}", response_body_length);
}
Offending code:
extern crate serde;
extern crate serde_json;
use futures::{future, stream, Future, Stream};
use hyper;
use hyper::{client, Body, Chunk, Client, Headers, Method, Request, Response, header::Accept,
header::Date as DateHeader, header::RetryAfter};
use hyper_tls::HttpsConnector;
use tokio_core::reactor::Core;
use models::Bucket;
use std::thread;
use std::time::{Duration, UNIX_EPOCH};
use std::str;
header! { (XRateLimitRemaining, "x-ratelimit-remaining") => [String] }
#[derive(Debug)]
struct Uri(pub String);
const MAX_REQ_SIZE: u32 = 500;
fn make_uri(symbol: &str, page_ix: u32) -> Uri {
Uri(format!(
"https://www.bitmex.com/api/v1/trade/bucketed?\
symbol={symbol}&\
columns={columns}&\
partial=false&\
reverse=true&\
binSize={bin_size}&\
count={count}&\
start={start}",
symbol = symbol,
columns = "close,timestamp",
bin_size = "5m",
count = MAX_REQ_SIZE,
start = 0 + MAX_REQ_SIZE * page_ix
))
}
#[derive(Debug)]
struct RateLimitInfo {
remaining_reqs: u32,
retry_after: Option<Duration>,
}
impl RateLimitInfo {
fn default() -> RateLimitInfo {
RateLimitInfo {
remaining_reqs: 1,
retry_after: None,
}
}
fn from<T>(resp: &Response<T>) -> RateLimitInfo {
let headers = resp.headers();
let remaining_reqs = headers
.get::<XRateLimitRemaining>()
.unwrap_or_else(|| panic!("x-ratelimit-remaining not on request."))
.parse()
.unwrap();
let retry_after = match headers.get::<RetryAfter>() {
Some(RetryAfter::Delay(duration)) => Some(*duration),
_ => None,
};
RateLimitInfo {
remaining_reqs,
retry_after,
}
}
}
fn resp_dated_later<'a>(a: &'a Response<Body>, b: &'a Response<Body>) -> &'a Response<Body> {
let get_date = |resp: &Response<Body>| {
let headers: &Headers = resp.headers();
**headers.get::<DateHeader>().unwrap()
};
if get_date(&a) > get_date(&b) {
a
} else {
b
}
}
#[derive(Debug)]
struct Query {
uri: Uri,
response: Option<Response<Body>>,
}
impl Query {
fn from_uri(uri: Uri) -> Query {
Query {
uri: uri,
response: None,
}
}
}
fn query_good(q: &Query) -> bool {
match &q.response {
Some(response) => response.status().is_success(),
_ => false,
}
}
type HttpsClient = hyper::Client<HttpsConnector<client::HttpConnector>>;
type FutureQuery = Box<Future<Item = Query, Error = hyper::Error>>;
fn to_future(x: Query) -> FutureQuery {
Box::new(future::ok(x))
}
fn exec_if_needed(client: &HttpsClient, query: Query) -> FutureQuery {
fn exec(client: &HttpsClient, q: Query) -> FutureQuery {
println!("exec: {:?}", q);
let uri = q.uri;
let req = {
let mut req = Request::new(Method::Get, uri.0.parse().unwrap());
req.headers_mut().set(Accept::json());
req
};
Box::new(
client
.request(req)
.inspect(|resp| println!("HTTP {}", resp.status()))
.map(|resp| Query {
uri: uri,
response: Some(resp),
}),
)
}
if query_good(&query) {
to_future(query)
} else {
exec(client, query)
}
}
type BoxedFuture<T> = Box<Future<Item = T, Error = hyper::Error>>;
fn do_batch(client: &HttpsClient, queries: Vec<Query>) -> BoxedFuture<Vec<Query>> {
println!("do_batch() {} queries", queries.len());
let exec_if_needed = |q| exec_if_needed(client, q);
let futures = queries.into_iter().map(exec_if_needed);
println!("do_batch() futures {:?}", futures);
Box::new(
stream::futures_ordered(futures).collect(), //future::join_all(futures)
)
}
fn take<T>(right: &mut Vec<T>, suggested_n: usize) -> Vec<T> {
let n: usize = if right.len() < suggested_n {
right.len()
} else {
suggested_n
};
let left = right.drain(0..n);
return left.collect();
}
type BoxedResponses = Box<Vec<Response<Body>>>;
fn batched_throttle(uris: Vec<Uri>) -> BoxedResponses {
println!("batched_throttle({} uris)", uris.len());
let mut core = Core::new().unwrap();
let client = Client::configure()
.connector(HttpsConnector::new(4, &core.handle()).unwrap())
.build(&core.handle());
let mut rate_limit_info = RateLimitInfo::default();
let mut queries_right: Vec<Query> = uris.into_iter().map(Query::from_uri).collect();
loop {
let mut queries_left: Vec<Query> = Vec::with_capacity(queries_right.len());
println!("batched_throttle: starting inner loop");
loop {
// throttle program during testing
thread::sleep(Duration::from_millis(800));
println!("batched_throttle: {:?}", rate_limit_info);
if let Some(retry_after) = rate_limit_info.retry_after {
println!("batched_throttle: retrying after {:?}", retry_after);
thread::sleep(retry_after)
}
if queries_right.is_empty() {
break;
}
let mut queries_mid = {
let ri_count = rate_limit_info.remaining_reqs;
let iter_req_count = if ri_count == 0 { 1 } else { ri_count };
println!("batched_throttle: iter_req_count {}", iter_req_count);
take(&mut queries_right, iter_req_count as usize)
};
println!(
"batched_throttle: \
queries_right.len() {}, \
queries_left.len() {}, \
queries_mid.len() {})",
queries_right.len(),
queries_left.len(),
queries_mid.len()
);
if queries_mid.iter().all(query_good) {
println!("batched_throttle: queries_mid.iter().all(query_good)");
continue;
}
queries_mid = { core.run(do_batch(&client, queries_mid)).unwrap() };
rate_limit_info = {
let create_very_old_response =
|| Response::new().with_header(DateHeader(UNIX_EPOCH.into()));
let very_old_response = create_very_old_response();
let last_resp = queries_mid
.iter()
.map(|q| match &q.response {
Some(r) => r,
_ => panic!("Impossible"),
})
.fold(&very_old_response, resp_dated_later);
RateLimitInfo::from(&last_resp)
};
&queries_left.append(&mut queries_mid);
}
queries_right = queries_left;
if queries_right.iter().all(query_good) {
break;
}
}
println!(
"batched_throttle: finishing. queries_right.len() {}",
queries_right.len()
);
Box::new(
queries_right
.into_iter()
.map(|q| q.response.unwrap())
.collect(),
)
}
fn bucket_count_to_req_count(bucket_count: u32) -> u32 {
let needed_req_count = (bucket_count as f32 / MAX_REQ_SIZE as f32).ceil() as u32;
return needed_req_count;
}
type BoxedBuckets = Box<Vec<Bucket>>;
fn response_to_buckets(response: Response<Body>) -> BoxedFuture<Vec<Bucket>> {
Box::new(response.body().concat2().map(|body: Chunk| -> Vec<Bucket> {
println!("body.len(): {}", body.len());
println!("JSON: {}", str::from_utf8(&body).unwrap());
serde_json::from_slice(&body).unwrap()
}))
}
pub fn get_n_last(symbol: &str, bucket_count: u32) -> BoxedBuckets {
let req_count = bucket_count_to_req_count(bucket_count);
let uris = (0..req_count)
.map(|page_ix| make_uri(symbol, page_ix))
.collect();
let responses = batched_throttle(uris);
let mut core = Core::new().unwrap();
let boxed_buckets = {
let futures = responses.into_iter().map(response_to_buckets);
let future = stream::futures_ordered(futures).collect();
let groups_of_buckets = core.run(future).unwrap();
Box::new(
groups_of_buckets
.into_iter()
.flat_map(|bs| bs.into_iter())
.rev()
.collect(),
)
};
return boxed_buckets;
}
You first create a Core and start lots of requests and gather the Response "results".
After you got all the Responses you start a new Core and try to start reading the data from those Responses - but the server probably closed them long ago due to write timeouts, and you only get partial data.
You shouldn't keep the server waiting; start reading the Responses as soon as possible.

Resources