How do I limit simultaneous tasks with Tokio? [duplicate]

The async example is useful, but being new to Rust and Tokio, I am struggling to work out how to do N requests at once, using URLs from a vector, and creating an iterator of the response HTML for each URL as a string.
How could this be done?

Concurrent requests
As of reqwest 0.10:
use futures::{stream, StreamExt}; // 0.3.5
use reqwest::Client; // 0.10.6
use tokio; // 0.2.21, features = ["macros"]

const CONCURRENT_REQUESTS: usize = 2;

#[tokio::main]
async fn main() {
    let client = Client::new();

    let urls = vec!["https://api.ipify.org"; 2];

    let bodies = stream::iter(urls)
        .map(|url| {
            let client = &client;
            async move {
                let resp = client.get(url).send().await?;
                resp.bytes().await
            }
        })
        .buffer_unordered(CONCURRENT_REQUESTS);

    bodies
        .for_each(|b| async {
            match b {
                Ok(b) => println!("Got {} bytes", b.len()),
                Err(e) => eprintln!("Got an error: {}", e),
            }
        })
        .await;
}
stream::iter(urls)
stream::iter
Take a collection of strings and convert it into a Stream.
.map(|url| {
StreamExt::map
Run an asynchronous function on every element in the stream and transform the element to a new type.
let client = &client;
async move {
Take an explicit reference to the Client and move the reference (not the original Client) into an anonymous asynchronous block.
let resp = client.get(url).send().await?;
Start an asynchronous GET request using the Client's connection pool and wait for the request.
resp.bytes().await
Request and wait for the bytes of the response.
.buffer_unordered(CONCURRENT_REQUESTS);
StreamExt::buffer_unordered
Convert a stream of futures into a stream of those futures' values, executing the futures concurrently.
bodies
    .for_each(|b| async {
        match b {
            Ok(b) => println!("Got {} bytes", b.len()),
            Err(e) => eprintln!("Got an error: {}", e),
        }
    })
    .await;
StreamExt::for_each
Convert the stream back into a single future, printing out the amount of data received along the way, then wait for the future to complete.
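The question asks for the response bodies as strings rather than printed byte counts. Here is a minimal variation of the example above (same crates, an untested sketch) that collects them into a Vec using reqwest's Response::text:
use futures::{stream, StreamExt}; // 0.3.5
use reqwest::Client; // 0.10.6

const CONCURRENT_REQUESTS: usize = 2;

#[tokio::main]
async fn main() {
    let client = Client::new();
    let urls = vec!["https://api.ipify.org"; 2];

    // Collect each body as a String; order follows completion, not input.
    let bodies: Vec<reqwest::Result<String>> = stream::iter(urls)
        .map(|url| {
            let client = &client;
            async move { client.get(url).send().await?.text().await }
        })
        .buffer_unordered(CONCURRENT_REQUESTS)
        .collect()
        .await;

    for body in bodies {
        match body {
            Ok(html) => println!("Got {} characters", html.len()),
            Err(e) => eprintln!("Got an error: {}", e),
        }
    }
}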
See also:
Join futures with limited concurrency
How to merge iterator of streams?
How do I synchronously return a value calculated in an asynchronous Future in stable Rust?
What is the difference between `then`, `and_then` and `or_else` in Rust futures?
Without bounded execution
If you wanted to, you could also convert an iterator into an iterator of futures and use future::join_all:
use futures::future; // 0.3.4
use reqwest::Client; // 0.10.1
use tokio; // 0.2.11

#[tokio::main]
async fn main() {
    let client = Client::new();

    let urls = vec!["https://api.ipify.org"; 2];

    let bodies = future::join_all(urls.into_iter().map(|url| {
        let client = &client;
        async move {
            let resp = client.get(url).send().await?;
            resp.bytes().await
        }
    }))
    .await;

    for b in bodies {
        match b {
            Ok(b) => println!("Got {} bytes", b.len()),
            Err(e) => eprintln!("Got an error: {}", e),
        }
    }
}
I'd encourage using the first example, as you usually want to limit the concurrency, which buffered and buffer_unordered help with.
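Note that buffer_unordered yields results in whatever order the futures complete. If you need responses back in the same order as the input URLs, StreamExt::buffered is the ordered counterpart; a sketch of the one change (same crates as above):
// Same pipeline as the first example, but StreamExt::buffered preserves
// input order: results are yielded in URL order, not completion order.
let bodies = stream::iter(urls)
    .map(|url| {
        let client = &client;
        async move {
            let resp = client.get(url).send().await?;
            resp.bytes().await
        }
    })
    .buffered(CONCURRENT_REQUESTS);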
Parallel requests
Concurrent requests are generally good enough, but there are times when you need parallel requests. In that case, you need to spawn a task.
use futures::{stream, StreamExt}; // 0.3.8
use reqwest::Client; // 0.10.9
use tokio; // 0.2.24, features = ["macros"]

const PARALLEL_REQUESTS: usize = 2;

#[tokio::main]
async fn main() {
    let urls = vec!["https://api.ipify.org"; 2];

    let client = Client::new();

    let bodies = stream::iter(urls)
        .map(|url| {
            let client = client.clone();
            tokio::spawn(async move {
                let resp = client.get(url).send().await?;
                resp.bytes().await
            })
        })
        .buffer_unordered(PARALLEL_REQUESTS);

    bodies
        .for_each(|b| async {
            match b {
                Ok(Ok(b)) => println!("Got {} bytes", b.len()),
                Ok(Err(e)) => eprintln!("Got a reqwest::Error: {}", e),
                Err(e) => eprintln!("Got a tokio::JoinError: {}", e),
            }
        })
        .await;
}
The primary differences are:
We use tokio::spawn to perform work in separate tasks.
We have to give each task its own reqwest::Client. As recommended, we clone a shared client to make use of the connection pool.
There's an additional error case when the task cannot be joined.
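If you prefer to bound the number of simultaneous spawned tasks without a stream combinator, a tokio::sync::Semaphore can serve the same purpose. This is a sketch, not part of the original answer, and it assumes a modern tokio 1.x where Semaphore::acquire_owned is available:
use std::sync::Arc;
use reqwest::Client;
use tokio::sync::Semaphore;

#[tokio::main]
async fn main() {
    let client = Client::new();
    let urls = vec!["https://api.ipify.org"; 2];
    // At most 2 tasks hold a permit (and thus run a request) at once.
    let semaphore = Arc::new(Semaphore::new(2));

    let mut handles = Vec::new();
    for url in urls {
        let permit = semaphore.clone().acquire_owned().await.unwrap();
        let client = client.clone();
        handles.push(tokio::spawn(async move {
            let _permit = permit; // released when the task finishes
            client.get(url).send().await?.bytes().await
        }));
    }

    for handle in handles {
        match handle.await {
            Ok(Ok(b)) => println!("Got {} bytes", b.len()),
            Ok(Err(e)) => eprintln!("Got a reqwest::Error: {}", e),
            Err(e) => eprintln!("Got a JoinError: {}", e),
        }
    }
}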
See also:
What is the difference between concurrent programming and parallel programming?
What is the difference between concurrency and parallelism?
What is the difference between concurrency, parallelism and asynchronous methods?

If it's possible for your problem, I recommend using async-std and rayon. They are both mature now and really easy to get started with, given async-std's async { /* code here */ } block syntax. You can also work into/alongside Tokio with feature integration: https://docs.rs/async-std/1.10.0/async_std/#features
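For reference, a minimal sketch of that block syntax (assuming async-std 1.x with default features; not from the original comment):
use async_std::task;

fn main() {
    // block_on drives an async block to completion on the current thread;
    // task::spawn runs one on async-std's executor and can be awaited.
    task::block_on(async {
        let handle = task::spawn(async { 1 + 2 });
        assert_eq!(handle.await, 3);
    });
}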

Related

Stress site with HTTP requests for rate limiting test

I want to test if the rate limiting of my site is working.
To do this, I would like to send a controlled number of requests: for example, exactly 100 requests per second, and probably save the responses.
From How can I perform parallel asynchronous HTTP GET requests with reqwest? and this gist I wrote the following:
# [dependencies]
# reqwest = { version = "0.11.6" }
# tokio = { version = "1.14.0", features = ["full"] }
# futures = "0.3.24"

use futures::stream::StreamExt;
use reqwest::Client;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let paths: Vec<String> = vec![String::from("https://api.ipify.org/"); 50];

    let fetches = futures::stream::iter(paths.into_iter().map(|path| {
        let client = Client::new();
        let send_fut = client.get(&path).send();
        async move {
            let response = send_fut.await;
            match response {
                Ok(resp) => {
                    println!("{}", resp.status());
                }
                Err(_) => println!("Error"),
            }
        }
    }))
    .buffer_unordered(100)
    .collect::<Vec<()>>();

    fetches.await;

    Ok(())
}
The point is that I don't know how to control how many requests are executed per second.
Any ideas are welcome!
You can use tokio::time::interval:
use tokio::time;

async fn request() {
    println!("send request");
}

#[tokio::main]
async fn main() {
    // One tick every 10 ms = at most 100 requests per second.
    let mut interval = time::interval(time::Duration::from_millis(10));
    for _i in 0..100 {
        interval.tick().await;
        request().await;
    }
}
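To tie this back to the original code, here is a sketch (assuming the tokio 1.x / reqwest 0.11 / futures dependencies listed in the question) that starts one real request per tick while letting the requests themselves run concurrently:
use futures::stream::{FuturesUnordered, StreamExt};
use reqwest::Client;
use tokio::time::{interval, Duration};

#[tokio::main]
async fn main() {
    let client = Client::new();
    // One tick every 10 ms = at most 100 request starts per second.
    let mut ticker = interval(Duration::from_millis(10));
    let mut in_flight = FuturesUnordered::new();

    for _ in 0..100 {
        ticker.tick().await; // wait for the next slot before starting a request
        let client = client.clone();
        in_flight.push(tokio::spawn(async move {
            client.get("https://api.ipify.org/").send().await.map(|r| r.status())
        }));
    }

    // Drain the responses as they complete.
    while let Some(joined) = in_flight.next().await {
        match joined {
            Ok(Ok(status)) => println!("{}", status),
            Ok(Err(e)) => eprintln!("request error: {}", e),
            Err(e) => eprintln!("join error: {}", e),
        }
    }
}
Note that this throttles when requests start; responses still arrive whenever the server answers.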

Receiver on tokio's mpsc channel only receives messages when buffer is full

I've spent a few hours trying to figure this out and I'm pretty much done. I found a question with a similar name, but there it looked like something was blocking synchronously and interfering with Tokio. That may well be the issue here too, but I have absolutely no idea what is causing it.
Here is a heavily stripped down version of my project which hopefully gets the issue across.
use std::io;

use futures_util::{
    stream::{SplitSink, SplitStream},
    SinkExt, StreamExt,
};
use tokio::{
    net::TcpStream,
    sync::mpsc::{channel, Receiver, Sender},
};
use tokio_tungstenite::{
    connect_async, tungstenite::Message, MaybeTlsStream, WebSocketStream,
};

#[tokio::main]
async fn main() {
    connect_to_server("wss://a_valid_domain.com".to_string()).await;
}

async fn read_line() -> String {
    loop {
        let mut str = String::new();
        io::stdin().read_line(&mut str).unwrap();
        str = str.trim().to_string();
        if !str.is_empty() {
            return str;
        }
    }
}

async fn connect_to_server(url: String) {
    let (ws_stream, _) = connect_async(url).await.unwrap();
    let (write, read) = ws_stream.split();
    let (tx, rx) = channel::<ChannelMessage>(100);

    tokio::spawn(channel_thread(write, rx));
    tokio::spawn(handle_std_input(tx.clone()));

    read_messages(read, tx).await;
}

#[derive(Debug)]
enum ChannelMessage {
    Text(String),
    Close,
}

// PROBLEMATIC FUNCTION
async fn channel_thread(
    mut write: SplitSink<WebSocketStream<MaybeTlsStream<TcpStream>>, Message>,
    mut rx: Receiver<ChannelMessage>,
) {
    while let Some(msg) = rx.recv().await {
        println!("{:?}", msg); // This only fires when the buffer is full
        match msg {
            ChannelMessage::Text(text) => write.send(Message::Text(text)).await.unwrap(),
            ChannelMessage::Close => {
                write.close().await.unwrap();
                rx.close();
                return;
            }
        }
    }
}

async fn read_messages(
    mut read: SplitStream<WebSocketStream<MaybeTlsStream<TcpStream>>>,
    tx: Sender<ChannelMessage>,
) {
    while let Some(msg) = read.next().await {
        let msg = match msg {
            Ok(m) => m,
            Err(_) => continue,
        };
        match msg {
            Message::Text(m) => println!("{}", m),
            Message::Close(_) => break,
            _ => {}
        }
    }
    if !tx.is_closed() {
        let _ = tx.send(ChannelMessage::Close).await;
    }
}

async fn handle_std_input(tx: Sender<ChannelMessage>) {
    loop {
        let str = read_line().await;
        if tx.is_closed() {
            break;
        }
        tx.send(ChannelMessage::Text(str)).await.unwrap();
    }
}
As you can see, what I'm trying to do is:
Connect to a websocket
Print outgoing messages from the websocket
Forward any input from stdin to the websocket
Also a custom heartbeat solution which was trimmed out
The problem lies in the channel_thread() function. I move the websocket writer into this function, as well as the channel receiver. It only loops over the sent messages when the buffer is full.
I've spent a lot of time trying to solve this, any help is greatly appreciated.
Here, you make a blocking synchronous call in an async context:
async fn read_line() -> String {
    loop {
        let mut str = String::new();
        io::stdin().read_line(&mut str).unwrap();
        //          ^^^^^^^^^^^^^^^^^^^
        //          This is sync + blocking
        str = str.trim().to_string();
        if !str.is_empty() {
            return str;
        }
    }
}
Never make blocking synchronous calls in an async context, because that prevents the entire thread from running other async tasks. Your channel receiver task is likely also assigned to this thread, so it has to wait until all the blocking calls are done and whatever invokes this function yields back to the async runtime.
Tokio has its own async version of stdin, which you should use instead.
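A minimal sketch of what that replacement might look like (assuming tokio 1.x with the "io-std" and "io-util" features, since the project's tokio version isn't shown):
use tokio::io::{self, AsyncBufReadExt, BufReader};

// Async replacement for the blocking read_line: tokio's stdin runs the
// blocking reads on a background thread for you.
async fn read_line() -> String {
    let mut lines = BufReader::new(io::stdin()).lines();
    while let Ok(Some(line)) = lines.next_line().await {
        let line = line.trim().to_string();
        if !line.is_empty() {
            return line;
        }
    }
    String::new() // stdin closed or errored
}
In a real program you would create the BufReader once and reuse it across calls, since each new reader may discard already-buffered input.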

How to share reqwest::Client between concurrent requests? [duplicate]

This question and its accepted answer are identical to "How do I limit simultaneous tasks with Tokio?" above: share a single Client by reference inside buffer_unordered for concurrent requests, or clone the Client into tokio::spawn tasks for parallel requests.

Why does reading from a Rusoto S3 stream inside an Actix Web handler cause a deadlock?

I'm writing an application using actix_web and rusoto_s3.
When I run a command directly from main, outside of an actix request, it runs fine and get_object works as expected. When it is encapsulated inside an actix-web request, the stream blocks forever.
I have a client that is shared across all requests, wrapped in an Arc (this happens inside actix-web's Data internals).
Full code:
fn index(
    _req: HttpRequest,
    path: web::Path<String>,
    s3: web::Data<S3Client>,
) -> impl Future<Item = HttpResponse, Error = actix_web::Error> {
    s3.get_object(GetObjectRequest {
        bucket: "my_bucket".to_owned(),
        key: path.to_owned(),
        ..Default::default()
    })
    .and_then(move |res| {
        info!("Response {:?}", res);
        let mut stream = res.body.unwrap().into_blocking_read();
        let mut body = Vec::new();
        stream.read_to_end(&mut body).unwrap();
        match process_file(body.as_slice()) {
            Ok(result) => Ok(result),
            Err(error) => Err(RusotoError::from(error)),
        }
    })
    .map_err(|e| match e {
        RusotoError::Service(GetObjectError::NoSuchKey(key)) => {
            actix_web::error::ErrorNotFound(format!("{} not found", key))
        }
        error => {
            error!("Error: {:?}", error);
            actix_web::error::ErrorInternalServerError("error")
        }
    })
    .from_err()
    .and_then(move |img| HttpResponse::Ok().body(Body::from(img)))
}

fn health() -> HttpResponse {
    HttpResponse::Ok().finish()
}

fn main() -> std::io::Result<()> {
    let name = "rust_s3_test";
    env::set_var("RUST_LOG", "debug");
    pretty_env_logger::init();

    let sys = actix_rt::System::builder().stop_on_panic(true).build();
    let prometheus = PrometheusMetrics::new(name, "/metrics");

    let s3 = S3Client::new(Region::Custom {
        name: "eu-west-1".to_owned(),
        endpoint: "http://localhost:9000".to_owned(),
    });
    let s3_client_data = web::Data::new(s3);

    Server::build()
        .bind(name, "0.0.0.0:8080", move || {
            HttpService::build().keep_alive(KeepAlive::Os).h1(App::new()
                .register_data(s3_client_data.clone())
                .wrap(prometheus.clone())
                .wrap(actix_web::middleware::Logger::default())
                .service(web::resource("/health").route(web::get().to(health)))
                .service(web::resource("/{file_name}").route(web::get().to_async(index))))
        })?
        .start();

    sys.run()
}
In stream.read_to_end, the thread is blocked and the future never resolves.
I have tried cloning the client per request and also creating a new client per request, but I got the same result in all scenarios.
Am I doing something wrong?
It works if I don't use it asynchronously:
let mut stream = s3
    .get_object(GetObjectRequest {
        bucket: "my_bucket".to_owned(),
        key: path.to_owned(),
        ..Default::default()
    })
    .sync()
    .unwrap()
    .body
    .unwrap()
    .into_blocking_read();

let mut body = Vec::new();
io::copy(&mut stream, &mut body);
Is this an issue with Tokio?
let mut stream = res.body.unwrap().into_blocking_read();
Check the implementation of into_blocking_read(): it calls .wait(). You shouldn't call blocking code inside a Future.
Since Rusoto's body is a Stream, there is a way to read it asynchronously:
.and_then(move |res| {
    info!("Response {:?}", res);
    let stream = res.body.unwrap();

    stream
        .concat2()
        .map(move |file| process_file(&file[..]).unwrap())
        .map_err(RusotoError::from)
})
process_file should not block the enclosing Future. If it needs to block, consider running it on a new thread or wrapping it with tokio_threadpool's blocking.
Note: you can use tokio_threadpool's blocking in your implementation, but I recommend you understand how it works first.
If you are not aiming to load the whole file into memory, you can use for_each:
stream.for_each(|part| {
    // Process each part in here.
    // Warning! Do not add blocking code here either.
    Ok(()) // for_each in futures 0.1 expects an IntoFuture, e.g. a Result
})
See also:
What is the best approach to encapsulate blocking I/O in future-rs?
Why does Future::select choose the future with a longer sleep period first?

Rusoto async using FuturesOrdered combinator

I am trying to send off parallel asynchronous Rusoto SQS requests using FuturesOrdered:
use futures::prelude::*; // 0.1.26
use futures::stream::futures_unordered::FuturesUnordered;
use rusoto_core::{HttpClient, Region}; // 0.38.0
use rusoto_credential::EnvironmentProvider; // 0.17.0
use rusoto_sqs::{SendMessageBatchRequest, SendMessageBatchRequestEntry, Sqs, SqsClient}; // 0.38.0

fn main() {
    let client = SqsClient::new_with(
        HttpClient::new().unwrap(),
        EnvironmentProvider::default(),
        Region::UsWest2,
    );

    let messages: Vec<u32> = (1..12).collect();
    let chunks: Vec<_> = messages.chunks(10).collect();

    let tasks: FuturesUnordered<_> = chunks
        .into_iter()
        .map(|c| {
            let batch = create_batch(c);
            client.send_message_batch(batch)
        })
        .collect();

    let tasks = tasks
        .for_each(|t| {
            println!("{:?}", t);
            Ok(())
        })
        .map_err(|e| println!("{}", e));

    tokio::run(tasks);
}
fn create_batch(ids: &[u32]) -> SendMessageBatchRequest {
    let queue_url = "https://sqs.us-west-2.amazonaws.com/xxx/xxx".to_string();

    let entries = ids
        .iter()
        .map(|id| SendMessageBatchRequestEntry {
            id: id.to_string(),
            message_body: id.to_string(),
            ..Default::default()
        })
        .collect();

    SendMessageBatchRequest {
        entries,
        queue_url,
    }
}
The tasks complete correctly, but tokio::run(tasks) doesn't stop. I assume that is because tasks.for_each() forces it to keep running and looking for more futures?
Why doesn't tokio::run(tasks) stop? Am I using FuturesOrdered correctly?
I am also a little worried about memory usage when creating up to 60,000 futures to complete and pushing them into the FuturesUnordered combinator.
I discovered that it was the SqsClient in the main function that was causing the blocking: it still does some housekeeping even after the tasks are finished.
A solution provided by one of the Rusoto maintainers was to add this just above tokio::run:
std::mem::drop(client);
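In context, the end of main from the code above becomes (only the drop line is new):
let tasks = tasks
    .for_each(|t| {
        println!("{:?}", t);
        Ok(())
    })
    .map_err(|e| println!("{}", e));

// Drop the client before starting the runtime so its background
// resources don't keep the runtime alive after the batches finish.
std::mem::drop(client);

tokio::run(tasks);
This works because send_message_batch has already produced the futures; they do not borrow the client, so it can be dropped before the runtime starts.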
