Getting multiple URLs concurrently with Hyper

I am trying to adapt the Hyper basic client example to get multiple URLs concurrently.
This is the code I currently have:
extern crate futures;
extern crate hyper;
extern crate tokio_core;
use std::io::{self, Write};
use std::iter;
use futures::{Future, Stream};
use hyper::Client;
use tokio_core::reactor::Core;
fn get_url() {
let mut core = Core::new().unwrap();
let client = Client::new(&core.handle());
let uris: Vec<_> = iter::repeat("http://httpbin.org/ip".parse().unwrap()).take(50).collect();
for uri in uris {
let work = client.get(uri).and_then(|res| {
println!("Response: {}", res.status());
res.body().for_each(|chunk| {
io::stdout()
.write_all(&chunk)
.map_err(From::from)
})
});
core.run(work).unwrap();
}
}
fn main() {
get_url();
}
It doesn't seem to be running concurrently (it takes a long time to complete). Am I giving the work to the core in the wrong way?

am I giving the work to the core in the wrong way?
Yes, you are giving one request to Tokio and requiring that it complete before starting the next request. You've taken asynchronous code and forced it to be sequential.
You need to give the reactor a single future that will perform different kinds of concurrent work.
Hyper 0.14
use futures::prelude::*;
use hyper::{body, client::Client};
use std::{
io::{self, Write},
iter,
};
use tokio;
const N_CONCURRENT: usize = 1;
#[tokio::main]
async fn main() {
let client = Client::new();
let uri = "http://httpbin.org/ip".parse().unwrap();
let uris = iter::repeat(uri).take(50);
stream::iter(uris)
.map(move |uri| client.get(uri))
.buffer_unordered(N_CONCURRENT)
.then(|res| async {
let res = res.expect("Error making request"); // expect() takes a plain message, not a format string
println!("Response: {}", res.status());
body::to_bytes(res).await.expect("Error reading body")
})
.for_each(|body| async move {
io::stdout().write_all(&body).expect("Error writing body");
})
.await;
}
With N_CONCURRENT set to 1:
real 1.119 1119085us
user 0.012 12021us
sys 0.011 11459us
And set to 10:
real 0.216 216285us
user 0.014 13596us
sys 0.021 20640us
Cargo.toml
[dependencies]
futures = "0.3.17"
hyper = { version = "0.14.13", features = ["client", "http1", "tcp"] }
tokio = { version = "1.12.0", features = ["full"] }
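The effect of N_CONCURRENT can also be sketched without any async runtime. The following is an analogy only: it assumes OS threads in place of futures, and a hypothetical fake_request that sleeps instead of touching the network. run_bounded is likewise an illustrative name, not part of Hyper or the futures crate. Like buffer_unordered, it keeps at most n_concurrent requests in flight and yields results in completion order.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a network request: sleeps, then returns a value.
fn fake_request(id: usize) -> usize {
    thread::sleep(Duration::from_millis(20));
    id * 2
}

/// Runs `fake_request` for every id with at most `n_concurrent` in flight.
/// Results are returned in completion order, not input order.
fn run_bounded(ids: Vec<usize>, n_concurrent: usize) -> Vec<usize> {
    // Shared queue of pending ids; each worker pulls the next one when free.
    let queue = Arc::new(Mutex::new(ids.into_iter()));
    let (tx, rx) = mpsc::channel();

    let workers: Vec<_> = (0..n_concurrent)
        .map(|_| {
            let queue = Arc::clone(&queue);
            let tx = tx.clone();
            thread::spawn(move || loop {
                // The lock guard is dropped at the end of this statement,
                // so the slow call below runs without holding the lock.
                let next = queue.lock().unwrap().next();
                match next {
                    Some(id) => tx.send(fake_request(id)).unwrap(),
                    None => break,
                }
            })
        })
        .collect();

    // Drop the original sender so `rx` ends once every worker has finished.
    drop(tx);
    let results: Vec<usize> = rx.iter().collect();
    for worker in workers {
        worker.join().unwrap();
    }
    results
}
```

With 50 ids and the 20 ms fake latency, n_concurrent = 1 should take roughly a second and n_concurrent = 10 roughly a tenth of that, which is the same shape as the timings above.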
Hyper 0.12
use futures::{stream, Future, Stream}; // 0.1.25
use hyper::Client; // 0.12.23
use std::{
io::{self, Write},
iter,
};
use tokio; // 0.1.15
const N_CONCURRENT: usize = 1;
fn main() {
let client = Client::new();
let uri = "http://httpbin.org/ip".parse().unwrap();
let uris = iter::repeat(uri).take(50);
let work = stream::iter_ok(uris)
.map(move |uri| client.get(uri))
.buffer_unordered(N_CONCURRENT)
.and_then(|res| {
println!("Response: {}", res.status());
res.into_body()
.concat2()
.map_err(|e| panic!("Error collecting body: {}", e))
})
.for_each(|body| {
io::stdout()
.write_all(&body)
.map_err(|e| panic!("Error writing: {}", e))
})
.map_err(|e| panic!("Error making request: {}", e));
tokio::run(work);
}
With N_CONCURRENT set to 1:
real 0m2.279s
user 0m0.193s
sys 0m0.065s
And set to 10:
real 0m0.529s
user 0m0.186s
sys 0m0.075s
See also:
How can I perform parallel asynchronous HTTP GET requests with reqwest?

Related

Calling an async function synchronously with tokio [duplicate]

I am trying to use hyper to grab the content of an HTML page and would like to synchronously return the output of a future. I realized I could have picked a better example since synchronous HTTP requests already exist, but I am more interested in understanding whether we could return a value from an async calculation.
extern crate futures;
extern crate hyper;
extern crate hyper_tls;
extern crate tokio;
use futures::{future, Future, Stream};
use hyper::Client;
use hyper::Uri;
use hyper_tls::HttpsConnector;
use std::str;
fn scrap() -> Result<String, String> {
let scraped_content = future::lazy(|| {
let https = HttpsConnector::new(4).unwrap();
let client = Client::builder().build::<_, hyper::Body>(https);
client
.get("https://hyper.rs".parse::<Uri>().unwrap())
.and_then(|res| {
res.into_body().concat2().and_then(|body| {
let s_body: String = str::from_utf8(&body).unwrap().to_string();
futures::future::ok(s_body)
})
}).map_err(|err| format!("Error scraping web page: {:?}", &err))
});
scraped_content.wait()
}
fn read() {
let scraped_content = future::lazy(|| {
let https = HttpsConnector::new(4).unwrap();
let client = Client::builder().build::<_, hyper::Body>(https);
client
.get("https://hyper.rs".parse::<Uri>().unwrap())
.and_then(|res| {
res.into_body().concat2().and_then(|body| {
let s_body: String = str::from_utf8(&body).unwrap().to_string();
println!("Reading body: {}", s_body);
Ok(())
})
}).map_err(|err| {
println!("Error reading webpage: {:?}", &err);
})
});
tokio::run(scraped_content);
}
fn main() {
read();
let content = scrap();
println!("Content = {:?}", &content);
}
The example compiles and the call to read() succeeds, but the call to scrap() returns the following error:
Content = Err("Error scraping web page: Error { kind: Execute, cause: None }")
I understand that I failed to launch the task properly before calling .wait() on the future but I couldn't find how to properly do it, assuming it's even possible.

Standard library futures
Let's use this as our minimal, reproducible example:
async fn example() -> i32 {
42
}
Call executor::block_on:
use futures::executor; // 0.3.1
fn main() {
let v = executor::block_on(example());
println!("{}", v);
}
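To make it less mysterious what a block_on does, here is a minimal sketch of one using only the standard library. It is not how futures::executor::block_on is actually implemented; block_on_sketch and ThreadWaker are illustrative names. The idea is to poll the future on the current thread and park the thread until the future's waker fires.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread blocked in `block_on_sketch`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Illustrative single-future executor: polls `fut` until it is ready,
/// parking the current thread between polls.
fn block_on_sketch<F: Future>(fut: F) -> F::Output {
    // Pin the future on the heap so it can be polled.
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            // Spurious wake-ups are harmless: the loop simply polls again.
            Poll::Pending => thread::park(),
        }
    }
}
```

Calling block_on_sketch(example()) with the example function above returns 42. A real executor adds much more (I/O drivers, timers, task scheduling), which is why the library versions should be preferred in practice.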
Tokio
Use the tokio::main attribute on any function (not just main!) to convert it from an asynchronous function to a synchronous one:
use tokio; // 0.3.5
#[tokio::main]
async fn main() {
let v = example().await;
println!("{}", v);
}
tokio::main is a macro that transforms this
#[tokio::main]
async fn main() {}
Into this:
fn main() {
tokio::runtime::Builder::new_multi_thread()
.enable_all()
.build()
.unwrap()
.block_on(async { {} })
}
This uses Runtime::block_on under the hood, so you can also write this as:
use tokio::runtime::Runtime; // 0.3.5
fn main() {
let v = Runtime::new().unwrap().block_on(example());
println!("{}", v);
}
For tests, you can use tokio::test.
async-std
Use the async_std::main attribute on the main function to convert it from an asynchronous function to a synchronous one:
use async_std; // 1.6.5, features = ["attributes"]
#[async_std::main]
async fn main() {
let v = example().await;
println!("{}", v);
}
For tests, you can use async_std::test.
Futures 0.1
Let's use this as our minimal, reproducible example:
use futures::{future, Future}; // 0.1.27
fn example() -> impl Future<Item = i32, Error = ()> {
future::ok(42)
}
For simple cases, you only need to call wait:
fn main() {
let s = example().wait();
println!("{:?}", s);
}
However, this comes with a pretty severe warning:
This method is not appropriate to call on event loops or similar I/O situations because it will prevent the event loop from making progress (this blocks the thread). This method should only be called when it's guaranteed that the blocking work associated with this future will be completed by another thread.
Tokio
If you are using Tokio 0.1, you should use Tokio's Runtime::block_on:
use tokio; // 0.1.21
fn main() {
let mut runtime = tokio::runtime::Runtime::new().expect("Unable to create a runtime");
let s = runtime.block_on(example());
println!("{:?}", s);
}
If you peek in the implementation of block_on, it actually sends the future's result down a channel and then calls wait on that channel! This is fine because Tokio guarantees to run the future to completion.
See also:
How can I efficiently extract the first element of a futures::Stream in a blocking manner?

As this is the top result that comes up in search engines for the query "How to call async from sync in Rust", I decided to share my solution here. I think it might be useful.
As @Shepmaster mentioned, back in version 0.1 the futures crate had a beautiful .wait() method that could be used to call an async function from a sync one. This must-have method, however, was removed from later versions of the crate.
Luckily, it's not that hard to re-implement it:
trait Block {
fn wait(self) -> <Self as futures::Future>::Output
where Self: Sized, Self: futures::Future
{
futures::executor::block_on(self)
}
}
impl<F,T> Block for F
where F: futures::Future<Output = T>
{}
After that, you can just do the following:
async fn example() -> i32 {
42
}
fn main() {
let s = example().wait();
println!("{:?}", s);
}
Beware that this comes with all the caveats of the original .wait() explained in @Shepmaster's answer.

This works for me using tokio:
tokio::runtime::Runtime::new()?.block_on(fooAsyncFunction())?;

How can I make a piece of running code timeout? [duplicate]

How do I set a timeout for HTTP request using asynchronous Hyper (>= 0.11)?
Here is the example of the code without timeout:
extern crate hyper;
extern crate tokio_core;
extern crate futures;
use futures::Future;
use hyper::Client;
use tokio_core::reactor::Core;
fn main() {
let mut core = Core::new().unwrap();
let client = Client::new(&core.handle());
let uri = "http://stackoverflow.com".parse().unwrap();
let work = client.get(uri).map(|res| {
res.status()
});
match core.run(work) {
Ok(status) => println!("Status: {}", status),
Err(e) => println!("Error: {:?}", e)
}
}

Answering my own question with a working code example, based on the link provided by seanmonstar to the Hyper Guide / General Timeout:
extern crate hyper;
extern crate tokio_core;
extern crate futures;
use futures::Future;
use futures::future::Either;
use hyper::Client;
use tokio_core::reactor::Core;
use std::time::Duration;
use std::io;
fn main() {
let mut core = Core::new().unwrap();
let handle = core.handle();
let client = Client::new(&handle);
let uri: hyper::Uri = "http://stackoverflow.com".parse().unwrap();
let request = client.get(uri.clone()).map(|res| res.status());
let timeout = tokio_core::reactor::Timeout::new(Duration::from_millis(170), &handle).unwrap();
let work = request.select2(timeout).then(|res| match res {
Ok(Either::A((got, _timeout))) => Ok(got),
Ok(Either::B((_timeout_error, _get))) => {
Err(hyper::Error::Io(io::Error::new(
io::ErrorKind::TimedOut,
"Client timed out while connecting",
)))
}
Err(Either::A((get_error, _timeout))) => Err(get_error),
Err(Either::B((timeout_error, _get))) => Err(From::from(timeout_error)),
});
match core.run(work) {
Ok(status) => println!("OK: {:?}", status),
Err(e) => println!("Error: {:?}", e)
}
}

Just FYI, this has gotten a lot easier with Tokio >= 1.0, because it now has a dedicated timeout wrapper that can be applied to a future (such as a request). The wrapper resolves to a Result whose Ok holds the original future's output and whose Err is a timeout error.
Thus your code in the question can now handle timeouts as follows:
extern crate tokio; // 1.7.1, full features
use hyper::Client;
use std::time::Duration;
#[tokio::main]
async fn main() {
let client = Client::new();
let uri = "http://stackoverflow.com".parse().unwrap();
let work = client.get(uri);
match tokio::time::timeout(Duration::from_millis(10), work).await {
Ok(result) => match result {
Ok(response) => println!("Status: {}", response.status()),
Err(e) => println!("Network error: {:?}", e),
},
Err(_) => println!("Timeout: no response in 10 milliseconds."),
};
}
(Of course, this code will always give you a timeout. To see the expected 301 response from the network, try going to 200 milliseconds.)
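For blocking code that is not a future, a similar "result or timeout" shape can be sketched with the standard library alone. This is an analogy under stated assumptions, not Tokio's mechanism: run_with_timeout and TimedOut are hypothetical names, and the work runs on a helper thread whose result comes back over a channel.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Illustrative marker error: the time limit passed before a result arrived.
#[derive(Debug, PartialEq)]
struct TimedOut;

/// Runs `work` on a helper thread and waits at most `limit` for its result.
/// Unlike dropping a future, this does not cancel the work: after a timeout
/// the helper thread keeps running in the background.
/// In this sketch a panic inside `work` is also reported as `TimedOut`.
fn run_with_timeout<T, F>(limit: Duration, work: F) -> Result<T, TimedOut>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone after a timeout; ignore that error.
        let _ = tx.send(work());
    });
    rx.recv_timeout(limit).map_err(|_| TimedOut)
}
```

If work itself returns a Result, the caller gets the same nested shape as in the Tokio example: the outer Err means the limit was hit, and the inner Result carries the operation's own success or failure.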

Tokio echo server. Cannot read and write in the same future

I'm trying to build an echo server in Tokio. I've seen examples, but all of them seem to use io::copy from Tokio IO which I can't use because I want to modify the output.
However, I can't compile a server that uses writer and reader at the same time. I want to build a task based on futures that enables reading/writing in a loop (an echo server).
My actual code is this:
extern crate futures;
extern crate futures_cpupool;
extern crate tokio;
extern crate tokio_io;
use futures::prelude::*;
use futures_cpupool::CpuPool;
use tokio_io::AsyncRead;
use futures::Stream;
use futures::stream;
use tokio_io::codec::*;
use std::rc::Rc;
fn main() {
let pool = CpuPool::new_num_cpus();
use std::net::*;
let socket = SocketAddr::new(IpAddr::V4(Ipv4Addr::new(127, 0, 0, 1)), 8080);
let listener = tokio::net::TcpListener::bind(&socket).unwrap();
let server = listener.incoming().for_each(|socket| {
let (writer, reader) = socket.framed(LinesCodec::new()).split();
let writer = Rc::new(writer);
let action = reader.for_each(|line| {
println!("ECHO: {}", line);
writer.send(line);
Ok(())
});
pool.spawn(action); // error: `std::rc::Rc<futures::stream::SplitSink<tokio_io::codec::Framed<tokio::net::TcpStream, tokio_io::codec::LinesCodec>>>` cannot be shared between threads safely
Ok(())
});
server.wait().unwrap();
}
You might say that I must use Arc because there are different threads involved. I've tried with Arc and Mutex, but another error arises and I can't figure a way to make it compile:
extern crate futures;
extern crate futures_cpupool;
extern crate tokio;
extern crate tokio_io;
use futures::prelude::*;
use std::time;
use std::thread;
use futures_cpupool::CpuPool;
use tokio_io::AsyncRead;
use futures::Stream;
use tokio_io::codec::*;
use std::sync::Arc;
use std::sync::Mutex;
fn main() {
let pool = CpuPool::new_num_cpus();
use std::net::*;
let socket = SocketAddr::new(IpAddr::V4(Ipv4Addr::new(127, 0, 0, 1)), 8080);
let listener = tokio::net::TcpListener::bind(&socket).unwrap();
let server = listener.incoming().for_each(|socket| {
let (writer, reader) = socket.framed(LinesCodec::new()).split();
let writer = Arc::new(Mutex::new(writer));
let action = reader.for_each(move |line| {
println!("ECHO: {}", line);
writer.lock().unwrap().send(line); // cannot move out of borrowed content
Ok(())
});
pool.spawn(action);
Ok(())
});
server.wait().unwrap();
}
The error is: cannot move out of borrowed content

I finally found that forward was the answer to my question.
extern crate tokio;
extern crate tokio_io;
extern crate futures;
use futures::prelude::*;
use tokio_io::AsyncRead;
use futures::Stream;
use tokio_io::codec::*;
struct Cancellable{
rx: std::sync::mpsc::Receiver<()>,
}
impl Future for Cancellable {
type Item = ();
type Error = std::sync::mpsc::RecvError;
fn poll(&mut self) -> Result<Async<Self::Item>,Self::Error> {
match self.rx.try_recv() {
Ok(_) => Ok(Async::Ready(())),
Err(_) => Ok(Async::NotReady)
}
}
}
fn main() {
use std::net::*;
let socket = SocketAddr::new(IpAddr::V4(Ipv4Addr::new(127, 0, 0, 1)), 8080);
let listener = tokio::net::TcpListener::bind(&socket).unwrap();
let server = listener.incoming().for_each(|socket|{
let (writer,reader) = socket.framed(LinesCodec::new()).split();
let (tx,rx) = std::sync::mpsc::channel();
let cancel = Cancellable {
rx: rx,
};
let action = reader
.map(move |line|{
println!("ECHO: {}",line);
if line == "bye"{
println!("BYE");
tx.send(()).unwrap();
}
line
})
.forward(writer)
.select2(cancel)
.map(|_|{
})
.map_err(|err|{
println!("error");
});
tokio::executor::current_thread::spawn(action);
Ok(())
}).map_err(|err|{
println!("error = {:?}",err);
});
tokio::executor::current_thread::run(|_|{
tokio::executor::current_thread::spawn(server);
});
}
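The pipeline above (read a line, optionally modify it, write it back, stop on "bye") can be illustrated in a synchronous form with only the standard library. This is a sketch of the data flow, not a Tokio server: echo_lines is a hypothetical helper, and upper-casing stands in for whatever modification of the output is actually wanted.

```rust
use std::io::{self, BufRead, Write};

/// Echoes each line from `reader` to `writer` in upper case (an assumed,
/// illustrative transformation) and stops after echoing a line equal to "bye".
fn echo_lines(reader: impl BufRead, mut writer: impl Write) -> io::Result<()> {
    for line in reader.lines() {
        let line = line?;
        // Transform, then write: the same order as `map` followed by `forward`.
        writeln!(writer, "{}", line.to_uppercase())?;
        if line == "bye" {
            break;
        }
    }
    writer.flush()
}
```

Because the function is generic over BufRead and Write, it can be exercised with in-memory buffers, for example a byte slice as input and a Vec<u8> as output, before wiring the same logic to a socket.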

Displaying the response body with Hyper only shows the size of the body

I tried to display the content (body) of a URL as text using Hyper:
extern crate hyper;
use hyper::client::Client;
use std::io::Read;
fn main () {
let client = Client::new();
let mut s = String::new();
let res = client.get("https://www.reddit.com/r/programming/.rss")
.send()
.unwrap()
.read_to_string(&mut s)
.unwrap();
println!("Result: {}", res);
}
But running this script just returns the size of the body:
Result: 22871
What did I do wrong? Did I misunderstand something?

You are reading the result of the get into s but you are printing the result of this function, which is the number of bytes read. See the documentation for Read::read_to_string.
Thus the code which prints the retrieved content is:
extern crate hyper;
use hyper::client::Client;
use std::io::Read;
fn main () {
let client = Client::new();
let mut s = String::new();
let res = client.get("https://www.reddit.com/r/programming/.rss")
.send()
.unwrap()
.read_to_string(&mut s)
.unwrap();
println!("Result: {}", s);
}
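The same behaviour can be seen without any network access, since Read::read_to_string works on any reader. A minimal sketch, where read_all is a hypothetical helper that returns both values so the difference is explicit:

```rust
use std::io::{self, Read};

/// Reads everything from `reader` and returns the number of bytes read
/// together with the text that was read.
fn read_all(mut reader: impl Read) -> io::Result<(usize, String)> {
    let mut text = String::new();
    // The return value is the byte count; the content lands in `text`.
    let n = reader.read_to_string(&mut text)?;
    Ok((n, text))
}
```

For the input "hello".as_bytes() this yields the count 5 and the string "hello"; printing the first gives a number, and printing the second gives the content.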

Here is how to print the response status and body using tokio 0.2, hyper 0.13, and async/await syntax.
use std::error::Error;
use hyper::body;
use hyper::{Body, Client, Response};
use hyper_tls::HttpsConnector;
use tokio;
#[tokio::main]
async fn main() -> Result<(), Box<dyn Error + Send + Sync>> {
let https = HttpsConnector::new();
let client = Client::builder().build::<_, Body>(https);
let res = client
.get("https://www.reddit.com/r/programming/.rss".parse().unwrap())
.await?;
println!("Status: {}", res.status());
let body_bytes = body::to_bytes(res.into_body()).await?;
let body = String::from_utf8(body_bytes.to_vec()).expect("response was not valid utf-8");
println!("Body: {}", body);
Ok(())
}

As of hyper 0.12, the following works, provided the webpage is valid UTF-8:
extern crate hyper;
extern crate hyper_tls;
use hyper::Client;
use hyper::rt::{self, Future, Stream};
use hyper_tls::HttpsConnector;
fn main() {
rt::run(rt::lazy(|| {
let https = HttpsConnector::new(4).unwrap();
let client = Client::builder().build::<_, hyper::Body>(https);
client.get("https://www.reddit.com/r/programming/.rss".parse().unwrap())
.and_then(|res| {
println!("status {}", res.status());
res.into_body().concat2()
}).map(|body| {
println!("Body {}", String::from_utf8(body.to_vec()).unwrap());
})
.map_err(|err| {
println!("error {}", err)
})
}));
}

Read from a channel or timeout?

With Rust 1.9, I'd like to read from an mpsc::channel or time out. Is there a clear idiom to make this work? I've seen the unstable approach described in mpsc::Select, but this GitHub discussion suggests it is not a robust approach. Is there a better-recommended way for me to achieve receive-or-timeout semantics?

Rust 1.12 introduced Receiver::recv_timeout:
use std::sync::mpsc::channel;
use std::time::Duration;
fn main() {
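let (_tx, rx) = channel::<bool>(); // keep the sender alive; if it is dropped, recv_timeout returns at once with a disconnected error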
let timeout = Duration::new(3, 0);
println!("start recv");
let _ = rx.recv_timeout(timeout);
println!("done!");
}
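recv_timeout can fail in two different ways, and it is usually worth telling them apart. A small sketch, where describe_wait is a hypothetical helper name:

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Waits up to `timeout` for one message and reports which of the three
/// possible outcomes occurred.
fn describe_wait<T>(rx: &Receiver<T>, timeout: Duration) -> &'static str {
    match rx.recv_timeout(timeout) {
        Ok(_) => "received a value",
        // A sender still exists, but nothing arrived in time.
        Err(RecvTimeoutError::Timeout) => "timed out",
        // Every sender is gone, so no value can ever arrive.
        Err(RecvTimeoutError::Disconnected) => "all senders dropped",
    }
}
```

The Disconnected case is reported immediately rather than after the timeout, so a receiver whose senders have all been dropped never actually waits.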

I don't know how you'd do it with the standard library channels, but the chan crate provides a chan_select! macro:
#[macro_use]
extern crate chan;
use std::time::Duration;
fn main() {
let (_never_sends, never_receives) = chan::sync::<bool>(1);
let timeout = chan::after(Duration::from_millis(50));
chan_select! {
timeout.recv() => {
println!("timed out!");
},
never_receives.recv() => {
println!("Shouldn't have a value!");
},
}
}

I was able to get something working using the standard lib.
use std::sync::mpsc::channel;
use std::thread;
use std::time::{Duration, Instant};
use std::sync::mpsc::TryRecvError;
fn main() {
let (send, recv) = channel();
thread::spawn(move || {
send.send("Hello world!").unwrap();
thread::sleep(Duration::from_secs(1)); // block for one second
send.send("Delayed").unwrap();
});
println!("{}", recv.recv().unwrap()); // Received immediately
println!("Waiting...");
let mut resolved: bool = false;
let mut result: Result<&str, TryRecvError> = Ok("Null");
let now = Instant::now();
let timeout: u64= 2;
while !resolved {
result = recv.try_recv();
resolved = !result.is_err();
if now.elapsed().as_secs() > timeout {
break;
}
}
if result.is_ok(){
println!("Results: {:?}", result.unwrap());
}
println!("Time elapsed: {}", now.elapsed().as_secs());
println!("Resolved: {}", resolved.to_string());
}
This busy-waits for up to timeout seconds and ends with either the received value or an Err result.
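On Rust versions that have Receiver::recv_timeout, the polling loop can be replaced by a deadline-based wait that sleeps instead of consuming CPU. A minimal sketch, assuming a hypothetical helper named collect_until that gathers every message arriving within an overall time budget:

```rust
use std::sync::mpsc::Receiver;
use std::time::{Duration, Instant};

/// Collects messages from `rx` until `budget` has elapsed or all senders
/// are gone, blocking (not spinning) while it waits.
fn collect_until<T>(rx: &Receiver<T>, budget: Duration) -> Vec<T> {
    let deadline = Instant::now() + budget;
    let mut items = Vec::new();
    loop {
        // Time left until the overall deadline; zero once it has passed.
        let remaining = deadline.saturating_duration_since(Instant::now());
        if remaining.is_zero() {
            break;
        }
        match rx.recv_timeout(remaining) {
            Ok(item) => items.push(item),
            // Either the deadline was reached or every sender was dropped.
            Err(_) => break,
        }
    }
    items
}
```

Recomputing the remaining time on each iteration keeps the total wait bounded by the budget, no matter how many messages arrive along the way.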
