How to convert hyper's Body stream into a Result<Vec<String>>?

I'm updating code to the newest versions of hyper and futures, but everything I've tried fails because some trait or other is not implemented.
A non-working example (playground) for this:
extern crate futures; // 0.3.5
extern crate hyper; // 0.13.6

use futures::{future, FutureExt, StreamExt, TryFutureExt, TryStreamExt};
use hyper::body;

fn get_body_as_vec<'a>(b: body::Body) -> future::BoxFuture<'a, Result<Vec<String>, hyper::Error>> {
    let f = b.and_then(|bytes| {
        let s = std::str::from_utf8(&bytes).expect("sends no utf-8");
        let mut lines: Vec<String> = Vec::new();
        for l in s.lines() {
            lines.push(l.to_string());
        }
        future::ok(lines)
    });
    Box::pin(f)
}
This produces the error:
error[E0277]: the trait bound `futures::stream::AndThen<hyper::Body, futures::future::Ready<std::result::Result<std::vec::Vec<std::string::String>, hyper::Error>>, [closure#src/lib.rs:8:24: 15:6]>: futures::Future` is not satisfied
--> src/lib.rs:17:5
|
17 | Box::pin(f)
| ^^^^^^^^^^^ the trait `futures::Future` is not implemented for `futures::stream::AndThen<hyper::Body, futures::future::Ready<std::result::Result<std::vec::Vec<std::string::String>, hyper::Error>>, [closure#src/lib.rs:8:24: 15:6]>`
|
= note: required for the cast to the object type `dyn futures::Future<Output = std::result::Result<std::vec::Vec<std::string::String>, hyper::Error>> + std::marker::Send`
I'm unable to create a compatible future. Body is a stream and I can't find any "converter" function with the required traits implemented.
With hyper 0.12, I used concat2().

From the reference documentation for and_then:
Note that this function consumes the receiving stream and returns a
wrapped version of it.
To process the entire stream and return a single future representing
success or error, use try_for_each instead.
Yes, your f is still a Stream. try_for_each would work, as the reference suggests, and try_fold is an even better fit for representing the bytes as lines in a vector. But as Shepmaster points out in the comments, if we convert each chunk to UTF-8 on its own, we can lose the integrity of multi-byte characters that are split across chunks of the response.
To keep the data consistent, the easiest solution is to collect all the bytes before converting to UTF-8.
use futures::{future, FutureExt, TryStreamExt};
use hyper::body;

fn get_body_as_vec<'a>(
    b: body::Body,
) -> future::BoxFuture<'a, Result<Vec<String>, Box<dyn std::error::Error>>> {
    let f = b
        .try_fold(vec![], |mut vec, bytes| {
            vec.extend_from_slice(&bytes);
            future::ok(vec)
        })
        .map(|x| {
            Ok(std::str::from_utf8(&x?)?
                .lines()
                .map(ToString::to_string)
                .collect())
        });
    Box::pin(f)
}
Playground
You can test the multiple-chunk behavior by using a channel from hyper's Body. Below I've created a scenario where a line is split across two chunks; this works fine with the code above, but if you process the chunks directly you will lose that consistency.
let (mut sender, body) = body::Body::channel();

tokio::spawn(async move {
    sender
        .send_data("Line1\nLine2\nLine3\nLine4\nLine5".into())
        .await;
    sender
        .send_data("next bytes of Line5\nLine6\nLine7\nLine8\n----".into())
        .await;
});

println!("{:?}", get_body_as_vec(body).await);
println!("{:?}", get_body_as_vec(body).await);
Playground (success scenario)
Playground (fail scenario: "next bytes of Line5" will be represented as a new line in the Vec)
Note: I've used Box<dyn std::error::Error> as the error type, since both hyper::Error and Utf8Error implement it; you may still use your expect strategy with hyper::Error.

I found two solutions, and each of them is pretty simple:
/*
WARNING for beginners!!! This use statement is
important so we can call the .data() method below!!!
*/
use hyper::body::{to_bytes, HttpBody};

// Takes only a single chunk of data!
let my_vector: Vec<u8> = request.into_body().data().await.unwrap().unwrap().to_vec();

// Takes all data chunks, not just the first one:
let my_bytes = to_bytes(res.into_body()).await?;

let my_string = String::from_utf8(my_vector).unwrap();
This example doesn't handle errors properly; make sure your code does.
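If you want errors propagated instead of unwrapped, a minimal sketch of the to_bytes route could look like this (the Box<dyn Error> return type is just one convenient choice, not something the answer prescribes):
use hyper::{body, Body};

// Collects every chunk of the body, then converts the bytes to a String,
// propagating hyper errors and invalid UTF-8 instead of panicking.
async fn body_to_string(b: Body) -> Result<String, Box<dyn std::error::Error>> {
    let bytes = body::to_bytes(b).await?;
    Ok(String::from_utf8(bytes.to_vec())?)
}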

Related

Rust chunks method with owned values?

I'm trying to perform a parallel operation on several chunks of strings at a time, and I'm running into an issue with the borrow checker:
(For context, identifiers is a Vec<String> from a CSV file, client is reqwest, and target is an Arc<String> that is written once and read many times.)
use futures::{stream, StreamExt};
use std::sync::Arc;

async fn nop(person_ids: &[String], target: &str, url: &str) -> String {
    let noop = format!("{} {}", target, url);
    let noop2 = person_ids.iter().for_each(|f| { f.as_str(); });
    "Some text".into()
}

#[tokio::main]
async fn main() {
    let target = Arc::new(String::from("sometext"));
    let url = "http://example.com";
    let identifiers = vec![
        "foo".into(), "bar".into(), "baz".into(), "qux".into(), "quux".into(),
        "quuz".into(), "corge".into(), "grault".into(), "garply".into(),
        "waldo".into(), "fred".into(), "plugh".into(), "xyzzy".into(),
    ];
    let id_sets: Vec<&[String]> = identifiers.chunks(2).collect();
    let responses = stream::iter(id_sets)
        .map(|person_ids| {
            let target = target.clone();
            tokio::spawn(async move {
                let resptext = nop(person_ids, target.as_str(), url).await;
            })
        })
        .buffer_unordered(2);
    responses.for_each(|b| async {}).await;
}
Playground: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=e41c635e99e422fec8fc8a581c28c35e
Given chunks yields a Vec<&[String]>, the compiler complains that identifiers doesn't live long enough because it potentially goes out of scope while the slices are being referenced. Realistically this won't happen because there's an await. Is there a way to tell the compiler that this is safe, or is there another way of getting chunks as a set of owned Strings for each thread?
There was a similarly asked question that used into_owned() as a solution, but when I try that, rustc complains about the slice size not being known at compile time in the request_user function.
EDIT: Some other questions as well:
Is there a more direct way of using target in each thread without needing Arc? From the moment it is created, it never needs to be modified, just read from. If not, is there a way of pulling it out of the Arc that doesn't require the .as_str() method?
How do you handle multiple error types within the tokio::spawn() block? In the real world use, I'm going to receive quick_xml::Error and reqwest::Error within it. It works fine without tokio spawn for concurrency.
Is there a way to tell the compiler that this is safe, or is there another way of getting chunks as a set of owned Strings for each thread?
You can chunk a Vec<T> into a Vec<Vec<T>> without cloning by using the itertools crate:
use itertools::Itertools;
fn main() {
let items = vec![
String::from("foo"),
String::from("bar"),
String::from("baz"),
];
let chunked_items: Vec<Vec<String>> = items
.into_iter()
.chunks(2)
.into_iter()
.map(|chunk| chunk.collect())
.collect();
for chunk in chunked_items {
println!("{:?}", chunk);
}
}
["foo", "bar"]
["baz"]
This is based on the answers here.
Your issue here is that id_sets is a vector of references to slices of identifiers. Those slices will not necessarily still be around once you've left the scope of your function, which is what the async move blocks in there can outlive.
Your solution to the immediate problem is to convert the Vec<&[String]> to a Vec<Vec<String>> type.
A way of accomplishing that would be:
let id_sets: Vec<Vec<String>> = identifiers
    .chunks(2)
    .map(|x: &[String]| x.to_vec())
    .collect();
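Putting that together with the original spawning code, a rough sketch might look like this (note the assumptions: nop is adapted to take an owned Vec<String>, its body is trimmed, and the identifier list is shortened):
use futures::{stream, StreamExt};
use std::sync::Arc;

// Adapted signature: each task now owns its chunk outright.
async fn nop(person_ids: Vec<String>, target: &str, url: &str) -> String {
    format!("{} {} ({} ids)", target, url, person_ids.len())
}

#[tokio::main]
async fn main() {
    let target = Arc::new(String::from("sometext"));
    let url = "http://example.com";
    let identifiers: Vec<String> = vec!["foo".into(), "bar".into(), "baz".into()];

    // Owned chunks: nothing borrows from `identifiers` any more.
    let id_sets: Vec<Vec<String>> = identifiers.chunks(2).map(|x| x.to_vec()).collect();

    stream::iter(id_sets)
        .map(|person_ids| {
            let target = target.clone();
            tokio::spawn(async move { nop(person_ids, target.as_str(), url).await })
        })
        .buffer_unordered(2)
        .for_each(|res| async {
            // The JoinHandle yields a Result; a real program would handle the error.
            if let Ok(text) = res {
                println!("{}", text);
            }
        })
        .await;
}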

Error handling when forwarding from AsyncRead to an futures 0.3 mpsc::UnboundedSender<T>

I would like to forward the output of tokio::process::child::ChildStdout which implements tokio::io::AsyncRead to a futures::channel::mpsc::UnboundedSender<MyType>, which implements futures::sink::Sink.
I am using a custom codec which produces items of MyType, but to stay true to the M in MRE, I will use Tokio's LinesCodec and say that MyType = String for this question.
use futures::StreamExt; // 0.3.8
use tokio; // 1.0.1
use tokio_util; // 0.6.0

#[tokio::main]
pub async fn main() {
    let (tx, mut rx) = futures::channel::mpsc::unbounded::<String>();

    let mut process = tokio::process::Command::new("dmesg")
        .arg("-w")
        .stdout(std::process::Stdio::piped())
        .spawn()
        .unwrap();
    let stdout = process.stdout.take().unwrap();

    let codec = tokio_util::codec::LinesCodec::new();
    let framed_read = tokio_util::codec::FramedRead::new(stdout, codec);
    let forward = framed_read.forward(tx);

    // read from the other end of the channel
    tokio::spawn(async move {
        while let Some(line) = rx.next().await {
            eprintln!("{}", line);
        }
    });

    //forward.await;
}
However, the compiler is reporting a mismatch in error types:
error[E0271]: type mismatch resolving `<futures::channel::mpsc::UnboundedSender<String> as futures::Sink<String>>::Error == LinesCodecError`
--> src/main.rs:19:31
|
19 | let forward = framed_read.forward(tx);
| ^^^^^^^ expected struct `futures::channel::mpsc::SendError`, found enum `LinesCodecError`
Assuming that I am not doing something fundamentally wrong here, how can I handle/convert these error types properly?
A similar question has been asked before, but it covers the opposite direction and futures 0.1, and I suspect it is outdated given how quickly Rust's async ecosystem is changing.
The items in the stream can fail (LinesCodecError) and sending the value into the channel can fail (SendError), but the whole forwarding process can only result in a single error type.
You can use SinkExt::sink_err_into and TryStreamExt::err_into to convert the errors into a compatible unified type. Here, I've chosen Box<dyn Error>:
type Error = Box<dyn std::error::Error>;

let forward = framed_read
    .err_into::<Error>()
    .forward(tx.sink_err_into::<Error>());
In many cases, you'd create a custom error type. You also probably wouldn't need to use the turbofish as much as the above example, as type inference will likely kick in at some point.
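A minimal sketch of that custom-error route, assuming the same LinesCodec/unbounded-channel setup as above (the ForwardError name is made up for illustration):
use futures::channel::mpsc::SendError;
use futures::{SinkExt, TryStreamExt};
use tokio_util::codec::LinesCodecError;

// Hypothetical unified error type for the forwarding pipeline.
#[derive(Debug)]
enum ForwardError {
    Codec(LinesCodecError),
    Send(SendError),
}

impl From<LinesCodecError> for ForwardError {
    fn from(e: LinesCodecError) -> Self {
        ForwardError::Codec(e)
    }
}

impl From<SendError> for ForwardError {
    fn from(e: SendError) -> Self {
        ForwardError::Send(e)
    }
}

// Both the stream and the sink now agree on ForwardError, so forward() type-checks.
let forward = framed_read
    .err_into::<ForwardError>()
    .forward(tx.sink_err_into::<ForwardError>());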
See also:
How do you define custom `Error` types in Rust?

How to use std::slice::Chunks on Arc<Mutex<Vec<u8>>> properly between threads in Rust?

I'd like to understand why the following doesn't seem to work properly in Rust.
I'd like to chunk a vector and give every thread a chunk to work on. I tried an Arc and Mutex combination to get shared mutable access to my vec.
This was my first (obvious) attempt:
Declare the vec, chunk it, and send a chunk into each thread. In my understanding this should work, because the chunks method guarantees non-overlapping chunks.
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let data = Arc::new(Mutex::new(vec![0; 20]));
    let chunk_size = 5;
    let mut threads = vec![];

    let chunks: Vec<&mut [u8]> = data.lock().unwrap().chunks_mut(chunk_size).collect();
    for chunk in chunks.into_iter() {
        threads.push(thread::spawn(move || inside_thread(chunk)));
    }
}

fn inside_thread(chunk: &mut [u8]) {
    // now do something with chunk
}
The error says data does not live long enough. Silly me: with the chunking I created references into the vector, but passed no Arc reference into the thread.
So I changed a few lines, although it makes little sense because I'd end up with an unused reference in my thread:
for i in 0..data.lock().unwrap().len() / 5 {
    let ref_to_data = data.clone();
    threads.push(thread::spawn(move || inside_thread(chunk, ref_to_data)));
}
The error still says that data does not live long enough.
The next attempt didn't work either. I thought I could work around it by chunking inside the thread and getting my chunk by index, but even if it worked, the code wouldn't be very idiomatic Rust:
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let data = Arc::new(Mutex::new(vec![0; 20]));
    let chunk_size = 5;
    let mut threads = vec![];

    for i in 0..data.lock().unwrap().len() / chunk_size {
        let ref_to_data = data.clone();
        threads.push(thread::spawn(move || {
            inside_thread(ref_to_data, i, chunk_size)
        }));
    }
}

fn inside_thread(data: Arc<Mutex<Vec<u8>>>, index: usize, chunk_size: usize) {
    let chunk: &mut [u8] = data.lock().unwrap().chunks_mut(chunk_size).collect()[index];
    // now do something with chunk
}
The error says:
--> src/main.rs:18:72
|
18 | let chunk: &mut [u8] = data.lock().unwrap().chunks_mut(chunk_size).collect()[index];
| ^^^^^^^
| |
| cannot infer type for type parameter `B` declared on the method `collect`
| help: consider specifying the type argument in the method call: `collect::<B>`
|
= note: type must be known at this point
And when I try to annotate the type, it doesn't work either. By now I've tried a lot, nothing has worked, and I'm out of ideas.
What I could always do is the chunking on my own: just mutating disjoint ranges of the vector inside the threads works fine. But that is not a nice, idiomatic way to do it (in my opinion).
Question: is there an idiomatic Rust way to solve this problem?
Hint: I know I could use a scoped thread pool or something like that, but I want to gain this knowledge for my thesis.
Much thanks for spending your time on this!
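For what it's worth, here is a minimal sketch of the scoped-thread route mentioned in the hint, using std::thread::scope (stable since Rust 1.63). Scoped threads may borrow from the enclosing stack frame, so each thread can take its own non-overlapping &mut chunk directly and no Arc<Mutex<..>> is needed:
use std::thread;

fn main() {
    let mut data = vec![0u8; 20];
    let chunk_size = 5;

    // All spawned threads are joined before scope() returns, so the borrows
    // of `data` cannot outlive it.
    thread::scope(|s| {
        for chunk in data.chunks_mut(chunk_size) {
            s.spawn(move || inside_thread(chunk));
        }
    });

    println!("{:?}", data);
}

fn inside_thread(chunk: &mut [u8]) {
    // now do something with chunk
    for byte in chunk.iter_mut() {
        *byte += 1;
    }
}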

Communicate Rust pnet packets between threads using channels

I'm working on a simple Rust program that reads and parses network packets. For reading the network packets I'm using the pnet library.
Because the parsing may take some time, I'm using two separate threads for reading and parsing the packets.
My idea was to pass the read packets from the first thread to the second thread via message passing (using mpsc::channel()).
Here's a simplified version of my code which I wrote based on the example given in the pnet doc:
extern crate pnet;

use std::sync::mpsc;
use std::thread;

use pnet::datalink;
use pnet::datalink::Channel::Ethernet;

fn main() {
    let (sender, receiver) = mpsc::channel();

    thread::spawn(move || {
        for packet in receiver.recv() {
            println!("{:?}", packet)
        }
    });

    let interface = datalink::interfaces()
        .into_iter()
        .find(|interface| interface.name == "enp5s0")
        .unwrap();

    let (_, mut package_receiver) =
        match datalink::channel(&interface, Default::default()) {
            Ok(Ethernet(tx, rx)) => (tx, rx),
            _ => panic!(),
        };

    loop {
        match package_receiver.next() {
            Ok(packet) => {
                // sender.send("foo"); // this works fine
                sender.send(packet);
            }
            _ => panic!(),
        }
    }
}
This works fine for sending primitive types or Strings over the channel, but not for the network packets. When I try to send a packet to the parser thread via the channel I get the following compiler error:
error[E0597]: `*package_receiver` does not live long enough
--> src/main.rs:28:15
|
28 | match package_receiver.next() {
| ^^^^^^^^^^^^^^^^ borrowed value does not live long enough
...
36 | }
| - borrowed value only lives until here
|
= note: borrowed value must be valid for the static lifetime...
I'm pretty new to Rust and would really appreciate the help!
packet has type &[u8], with a lifetime 'a that is the same as the lifetime of the mutable borrow of package_receiver taken in the next() call. With lifetimes spelled out, the next() definition looks like this:
fn next(&'a mut self) -> Result<&'a [u8]>
You send the &[u8] to a thread, but the thread can outlive the reference you send to it, leading to a dangling reference. As a result the compiler complains that the data needs to have the 'static lifetime. "foo" works because it is a &'static str.
One way is to take ownership of the data and send it by value to the other thread:
Ok(packet) => {
    // sender.send("foo"); // this works fine
    sender.send(packet.to_owned());
}
You can also have a look at using scoped threads with crossbeam.
The package_receiver.next() call is defined as:
pub trait DataLinkReceiver: Send {
fn next(&mut self) -> Result<&[u8]>;
}
And the mspc::Sender defines send as:
pub fn send(&self, t: T) -> Result<(), SendError<T>>
So package_receiver.next() returns a result containing a reference to a slice of bytes, &[u8]. When you then call sender.send(packet);, you are asking to send that reference to the other thread. However, the match scope around package_receiver.next() doesn't promise that the reference lives beyond the end of the scope, so the other thread can't be guaranteed that the reference is still valid by the time it accesses the data.
The &str works because it is a string with a 'static lifetime: that memory is always valid to read, no matter which thread reads it.
If you change your call to:
sender.send(Vec::from(packet))
This creates a Vec, copies the packet slice into newly allocated memory, and then passes ownership of that Vec to the other thread, which guarantees the receiving thread always has valid access to the data. Because ownership is transferred, the compiler also knows exactly where the lifetime of each received Vec ends: in the receiving thread.
There will also be some warnings about ignoring the Result returned by .send(), which could be taken care of with something like this:
if sender.send(Vec::from(packet)).is_err() {
    println!("Send error");
}
The other answers here do a great job of explaining why this happens; EthernetPacket (which is generated by pnet's packet macro) is essentially a wrapper around a reference to bytes. To make an owned EthernetPacket from that reference, one could do the following:
let packet = EthernetPacket::owned(packet.to_owned()).unwrap();
sender.send(packet).unwrap();
This will clone the packet bytes and send them through the channel.
Documentation for EthernetPacket::owned can be found here.
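As a sketch of the receiving side (assuming pnet's generated accessors such as get_source, get_destination and get_ethertype; the parse_loop name is made up), the parser thread can then work with the owned packets directly:
use std::sync::mpsc::Receiver;

use pnet::packet::ethernet::EthernetPacket;

// Hypothetical parser loop: consumes owned packets until the sender hangs up.
fn parse_loop(receiver: Receiver<EthernetPacket<'static>>) {
    for packet in receiver {
        println!(
            "{} -> {}: {:?}",
            packet.get_source(),
            packet.get_destination(),
            packet.get_ethertype()
        );
    }
}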

tokio-curl: capture output into a local `Vec` - may outlive borrowed value

I do not know Rust well enough to understand lifetimes and closures yet...
Trying to collect the downloaded data into a vector using tokio-curl:
extern crate curl;
extern crate futures;
extern crate tokio_core;
extern crate tokio_curl;

use std::io::{self, Write};
use std::str;

use curl::easy::Easy;
use tokio_core::reactor::Core;
use tokio_curl::Session;

fn main() {
    // Create an event loop that we'll run on, as well as an HTTP `Session`
    // which we'll be routing all requests through.
    let mut lp = Core::new().unwrap();
    let mut out = Vec::new();
    let session = Session::new(lp.handle());

    // Prepare the HTTP request to be sent.
    let mut req = Easy::new();
    req.get(true).unwrap();
    req.url("https://www.rust-lang.org").unwrap();
    req.write_function(|data| {
        out.extend_from_slice(data);
        io::stdout().write_all(data).unwrap();
        Ok(data.len())
    })
    .unwrap();

    // Once we've got our session, issue an HTTP request to download the
    // rust-lang home page
    let request = session.perform(req);

    // Execute the request, and print the response code as well as the error
    // that happened (if any).
    let mut req = lp.run(request).unwrap();
    println!("{:?}", req.response_code());

    println!("out: {}", str::from_utf8(&out).unwrap());
}
Produces an error:
error[E0373]: closure may outlive the current function, but it borrows `out`, which is owned by the current function
--> src/main.rs:25:24
|
25 | req.write_function(|data| {
| ^^^^^^ may outlive borrowed value `out`
26 | out.extend_from_slice(data);
| --- `out` is borrowed here
|
help: to force the closure to take ownership of `out` (and any other referenced variables), use the `move` keyword, as shown:
| req.write_function(move |data| {
Investigating further, I see that Easy::write_function requires the 'static lifetime, but the example of how to collect output from the curl-rust docs uses Transfer::write_function instead:
use curl::easy::Easy;

let mut data = Vec::new();
let mut handle = Easy::new();
handle.url("https://www.rust-lang.org/").unwrap();
{
    let mut transfer = handle.transfer();
    transfer
        .write_function(|new_data| {
            data.extend_from_slice(new_data);
            Ok(new_data.len())
        })
        .unwrap();
    transfer.perform().unwrap();
}
println!("{:?}", data);
The Transfer::write_function does not require the 'static lifetime:
impl<'easy, 'data> Transfer<'easy, 'data> {
    /// Same as `Easy::write_function`, just takes a non `'static` lifetime
    /// corresponding to the lifetime of this transfer.
    pub fn write_function<F>(&mut self, f: F) -> Result<(), Error>
        where F: FnMut(&[u8]) -> Result<usize, WriteError> + 'data
    {
    ...
But I can't use a Transfer instance on tokio-curl's Session::perform because it requires the Easy type:
pub fn perform(&self, handle: Easy) -> Perform {
transfer.easy is a private field that is directly passed to session.perform.
Is this an issue with tokio-curl? Maybe it should make the transfer.easy field public, or implement a new function like perform_transfer? Is there another way to collect output per transfer using tokio-curl?
The first thing you have to understand when using the futures library is that you don't have any control over what thread the code is going to run on.
In addition, the documentation for curl's Easy::write_function says:
Note that the lifetime bound on this function is 'static, but that is often too restrictive. To use stack data consider calling the transfer method and then using write_function to configure a callback that can reference stack-local data.
The most straight-forward solution is to use some type of locking primitive to ensure that only one thread at a time may have access to the vector. You also have to share ownership of the vector between the main thread and the closure:
use std::sync::Mutex;
use std::sync::Arc;

let out = Arc::new(Mutex::new(Vec::new()));
let out_closure = out.clone();

// ...

req.write_function(move |data| {
    let mut out = out_closure.lock().expect("Unable to lock output");
    // ...
}).expect("Cannot set writing function");

// ...

let out = out.lock().expect("Unable to lock output");
println!("out: {}", str::from_utf8(&out).expect("Data was not UTF-8"));
Unfortunately, the tokio-curl library does not currently support using the Transfer type that would allow for stack-based data.
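For completeness, filled in with the closure body from the question, the whole locking approach might look like this sketch (same tokio-core / tokio-curl setup as the question; the stdout echo is dropped):
extern crate curl;
extern crate tokio_core;
extern crate tokio_curl;

use std::str;
use std::sync::{Arc, Mutex};

use curl::easy::Easy;
use tokio_core::reactor::Core;
use tokio_curl::Session;

fn main() {
    let mut lp = Core::new().unwrap();
    let session = Session::new(lp.handle());

    // Shared, lockable buffer: the closure gets one handle, main keeps the other.
    let out = Arc::new(Mutex::new(Vec::new()));
    let out_closure = out.clone();

    let mut req = Easy::new();
    req.get(true).unwrap();
    req.url("https://www.rust-lang.org").unwrap();
    req.write_function(move |data| {
        let mut out = out_closure.lock().expect("Unable to lock output");
        out.extend_from_slice(data);
        Ok(data.len())
    })
    .expect("Cannot set writing function");

    // Drive the transfer on the event loop and print the response code.
    let mut req = lp.run(session.perform(req)).unwrap();
    println!("{:?}", req.response_code());

    let out = out.lock().expect("Unable to lock output");
    println!("out: {}", str::from_utf8(&out).expect("Data was not UTF-8"));
}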
