How can I use hyper::client from another thread? - multithreading

I have multiple threads performing some heavy operations and I need to use a client in middle of work. I'm using Hyper v0.11 as a HTTP client and I would like to reuse the connections so I need to share the same hyper::Client in order to keep open the connections (under keep-alive mode).
The client is not shareable among threads (it doesn't implement Sync or Send). Here a small snippet with the code I've tried to do:
let mut core = Core::new().expect("Create Client Event Loop");
let handle = core.handle();
let remote = core.remote();
let client = Client::new(&handle.clone());
thread::spawn(move || {
// intensive operations...
let response = &client.get("http://google.com".parse().unwrap()).and_then(|res| {
println!("Response: {}", res.status());
Ok(())
});
remote.clone().spawn(|_| {
response.map(|_| { () }).map_err(|_| { () })
});
// more intensive operations...
});
core.run(futures::future::empty::<(), ()>()).unwrap();
This code doesn't compile:
thread::spawn(move || {
^^^^^^^^^^^^^ within `[closure#src/load-balancer.rs:46:19: 56:6 client:hyper::Client<hyper::client::HttpConnector>, remote:std::sync::Arc<tokio_core::reactor::Remote>]`, the trait `std::marker::Send` is not implemented for `std::rc::Weak<std::cell::RefCell<tokio_core::reactor::Inner>>`
thread::spawn(move || {
^^^^^^^^^^^^^ within `[closure#src/load-balancer.rs:46:19: 56:6 client:hyper::Client<hyper::client::HttpConnector>, remote:std::sync::Arc<tokio_core::reactor::Remote>]`, the trait `std::marker::Send` is not implemented for `std::rc::Rc<std::cell::RefCell<hyper::client::pool::PoolInner<tokio_proto::util::client_proxy::ClientProxy<tokio_proto::streaming::message::Message<hyper::http::MessageHead<hyper::http::RequestLine>, hyper::Body>, tokio_proto::streaming::message::Message<hyper::http::MessageHead<hyper::http::RawStatus>, tokio_proto::streaming::body::Body<hyper::Chunk, hyper::Error>>, hyper::Error>>>>`
...
remote.clone().spawn(|_| {
^^^^^ the trait `std::marker::Sync` is not implemented for `futures::Future<Error=hyper::Error, Item=hyper::Response> + 'static`
Is there any way to reuse the same client from different threads or some other approach?

The short answer is no, but it's better that way.
Each Client object holds a pool of connections. Here's how Hyper's Pool is defined in version 0.11.0:
pub struct Pool<T> {
inner: Rc<RefCell<PoolInner<T>>>,
}
As inner is reference-counted with an Rc and borrow-checked in run-time with RefCell, the pool is certainly not thread-safe. When you tried to move that Client to a new thread, that object would be holding a pool that lives in another thread, which would have been a source of data races.
This implementation is understandable. Attempting to reuse an HTTP connection across multiple threads is not very usual, as it requires synchronized access to a resource that is mostly I/O intensive. This couples pretty well with Tokio's asynchronous nature. It is actually more reasonable to perform multiple requests in the same thread, and let Tokio's core take care of sending messages and receiving them asynchronously, without waiting for each response in sequence. Moreover, computationally intensive tasks can be executed by a CPU pool from futures_cpupool. With that in mind, the code below works fine:
extern crate tokio_core;
extern crate hyper;
extern crate futures;
extern crate futures_cpupool;
use tokio_core::reactor::Core;
use hyper::client::Client;
use futures::Future;
use futures_cpupool::CpuPool;
fn main() {
let mut core = Core::new().unwrap();
let handle = core.handle();
let client = Client::new(&handle.clone());
let pool = CpuPool::new(1);
println!("Begin!");
let req = client.get("http://google.com".parse().unwrap())
.and_then(|res| {
println!("Response: {}", res.status());
Ok(())
});
let intensive = pool.spawn_fn(|| {
println!("I'm working hard!!!");
std::thread::sleep(std::time::Duration::from_secs(1));
println!("Phew!");
Ok(())
});
let task = req.join(intensive)
.map(|_|{
println!("End!");
});
core.run(task).unwrap();
}
If the response is not received too late, the output will be:
Begin!
I'm working hard!!!
Response: 302 Found
Phew!
End!
If you have multiple tasks running in separate threads, the problem becomes open-ended, since there are multiple architectures feasible. One of them is to delegate all communications to a single actor, thus requiring all other worker threads to send their data to it. Alternatively, you can have one client object to each worker, thus also having separate connection pools.

Related

What are idiomatic ways to send data between threads?

I want to do some calculation in a separate thread, and then recover the data from the main thread. What are the canonical ways to pass some data from a thread to another in Rust?
fn main() {
let handle = std::thread::spawn(|| {
// I want to send this to the main thread:
String::from("Hello world!")
});
// How to recover the data from the other thread?
handle.join().unwrap();
}
There are lots of ways to send send data between threads -- without a clear "best" solution. It depends on your situation.
Using just thread::join
Many people do not realize that you can very easily send data with only the thread API, but only twice: once to the new thread and once back.
use std::thread;
let data_in = String::from("lots of data");
let handle = thread::spawn(move || {
println!("{}", data_in); // we can use the data here!
let data_out = heavy_compuations();
data_out // <-- simply return the data from the closure
});
let data_out = handle.join().expect("thread panicked :(");
println!("{}", data_out); // we can use the data generated in the thread here!
(Playground)
This is immensely useful for threads that are just spawned to do one specific job. Note the move keyword before the closure that makes sure all referenced variables are moved into the closure (which is then moved to another thread).
Channels from std
The standard library offers a multi producer single consumer channel in std::sync::mpsc. You can send arbitrarily many values through a channel, so it can be used in more situations. Simple example:
use std::{
sync::mpsc::channel,
thread,
time::Duration,
};
let (sender, receiver) = channel();
thread::spawn(move || {
sender.send("heavy computation 1").expect("receiver hung up :(");
thread::sleep(Duration::from_millis(500));
sender.send("heavy computation 2").expect("receiver hung up :(");
});
let result1 = receiver.recv().unwrap();
let result2 = receiver.recv().unwrap();
(Playground)
Of course you can create another channel to provide communication in the other direction as well.
More powerful channels by crossbeam
Unfortunately, the standard library currently only provides channels that are restricted to a single consumer (i.e. Receiver can't be cloned). To get more powerful channels, you probably want to use the channels from the awesome crossbeam library. Their description:
This crate is an alternative to std::sync::mpsc with more features and better performance.
In particular, it is a mpmc (multi consumer!) channel. This provides a nice way to easily share work between multiple threads. Example:
use std::thread;
// You might want to use a bounded channel instead...
let (sender, receiver) = crossbeam_channel::unbounded();
for _ in 0..num_cpus::get() {
let receiver = receiver.clone(); // clone for this thread
thread::spawn(move || {
for job in receiver {
// process job
}
});
}
// Generate jobs
for x in 0..10_000 {
sender.send(x).expect("all threads hung up :(");
}
(Playground)
Again, adding another channel allows you to communicate results back to the main thread.
Other methods
There are plenty of other crates that offer some other means of sending data between threads. Too many to list them here.
Note that sending data is not the only way to communicate between threads. There is also the possibility to share data between threads via Mutex, atomics, lock-free data structures and many other ways. This is conceptually very different. It depends on the situation whether sending or sharing data is the better way to describe your cross thread communication.
The idiomatic way to do so is to use a channel. It conceptually behaves like an unidirectional tunnel: you put something in one end and it comes out the other side.
use std::sync::mpsc::channel;
fn main() {
let (sender, receiver) = channel();
let handle = std::thread::spawn(move || {
sender.send(String::from("Hello world!")).unwrap();
});
let data = receiver.recv().unwrap();
println!("Got {:?}", data);
handle.join().unwrap();
}
The channel won't work anymore when the receiver is dropped.
They are mainly 3 ways to recover the data:
recv will block until something is received
try_recv will return immediately. If the channel is not closed, it is either Ok(data) or Err(TryRevcError::Empty).
recv_timeout is the same as try_recv but it waits to get a data a certain amount of time.

How can I send a message to a specific thread?

I need to create some threads where some of them are going to run until their runner variable value has been changed. This is my minimal code.
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;
fn main() {
let mut log_runner = Arc::new(Mutex::new(true));
println!("{}", *log_runner.lock().unwrap());
let mut threads = Vec::new();
{
let mut log_runner_ref = Arc::clone(&log_runner);
// log runner thread
let handle = thread::spawn(move || {
while *log_runner_ref.lock().unwrap() == true {
// DO SOME THINGS CONTINUOUSLY
println!("I'm a separate thread!");
}
});
threads.push(handle);
}
// let the main thread to sleep for x time
thread::sleep(Duration::from_millis(1));
// stop the log_runner thread
*log_runner.lock().unwrap() = false;
// join all threads
for handle in threads {
handle.join().unwrap();
println!("Thread joined!");
}
println!("{}", *log_runner.lock().unwrap());
}
It looks like I'm able to set the log_runner_ref in the log runner thread after 1 second to false. Is there a way to mark the treads with some name / ID or something similar and send a message to a specific thread using its specific marker (name / ID)?
If I understand it correctly, then the let (tx, rx) = mpsc::channel(); can be used for sending messages to all the threads simultaneously rather than to a specific one. I could send some identifier with the messages and each thread will be looking for its own identifier for the decision if to act on received message or not, but I would like to avoid the broadcasting effect.
MPSC stands for Multiple Producers, Single Consumer. As such, no, you cannot use that by itself to send a message to all threads, since for that you'd have to be able to duplicate the consumer. There are tools for this, but the choice of them requires a bit more info than just "MPMC" or "SPMC".
Honestly, if you can rely on channels for messaging (there are cases where it'd be a bad idea), you can create a channel per thread, assign the ID outside of the thread, and keep a HashMap instead of a Vec with the IDs associated to the threads. Receiver<T> can be moved into the thread (it implements Send if T implements Send), so you can quite literally move it in.
You then keep the Sender outside and send stuff to it :-)

How to use Rust shiplift from a hyper server

I'm trying to write a simple Rust program that reads Docker stats using shiplift and exposes them as Prometheus metrics using rust-prometheus.
The shiplift stats example runs correctly on its own, and I'm trying to integrate it in the server as
fn handle(_req: Request<Body>) -> Response<Body> {
let docker = Docker::new();
let containers = docker.containers();
let id = "my-id";
let stats = containers
.get(&id)
.stats().take(1).wait();
for s in stats {
println!("{:?}", s);
}
// ...
}
// in main
let make_service = || {
service_fn_ok(handle)
};
let server = Server::bind(&addr)
.serve(make_service);
but it appears that the stream hangs forever (I cannot produce any error message).
I've also tried the same refactor (using take and wait instead of tokio::run) in the shiplift example, but in that case I get the error executor failed to spawn task: tokio::spawn failed (is a tokio runtime running this future?). Is tokio somehow required by shiplift?
EDIT:
If I've understood correctly, my attempt does not work because wait will block tokio executor and stats will never produce results.
shiplift's API is asynchronous, meaning wait() and other functions return a Future, instead of blocking the main thread until a result is ready. A Future won't actually do any I/O until it is passed to an executor. You need to pass the Future to tokio::run as in the example you linked to. You should read the tokio docs to get a better understanding of how to write asynchronous code in rust.
There were quite a few mistakes in my understanding of how hyper works. Basically:
if a service should handle futures, do not use service_fn_ok to create it (it is meant for synchronous services): use service_fn;
do not use wait: all futures use the same executor, the execution will just hang forever (there is a warning in the docs but oh well...);
as ecstaticm0rse notices, hyper::rt::spawn could be used to read stats asynchronously, instead of doing it in the service
Is tokio somehow required by shiplift?
Yes. It uses hyper, which throws executor failed to spawn task if the default tokio executor is not available (working with futures nearly always requires an executor anyway).
Here is a minimal version of what I ended up with (tokio 0.1.20 and hyper 0.12):
use std::net::SocketAddr;
use std::time::{Duration, Instant};
use tokio::prelude::*;
use tokio::timer::Interval;
use hyper::{
Body, Response, service::service_fn_ok,
Server, rt::{spawn, run}
};
fn init_background_task(swarm_name: String) -> impl Future<Item = (), Error = ()> {
Interval::new(Instant::now(), Duration::from_secs(1))
.map_err(|e| panic!(e))
.for_each(move |_instant| {
futures::future::ok(()) // unimplemented: call shiplift here
})
}
fn init_server(address: SocketAddr) -> impl Future<Item = (), Error = ()> {
let service = move || {
service_fn_ok(|_request| Response::new(Body::from("unimplemented")))
};
Server::bind(&address)
.serve(service)
.map_err(|e| panic!("Server error: {}", e))
}
fn main() {
let background_task = init_background_task("swarm_name".to_string());
let server = init_server(([127, 0, 0, 1], 9898).into());
run(hyper::rt::lazy(move || {
spawn(background_task);
spawn(server);
Ok(())
}));
}

Forced to use of Mutex when it's not required

I am writing a game and have a player list defined as follows:
pub struct PlayerList {
by_name: HashMap<String, Arc<Mutex<Player>>>,
by_uuid: HashMap<Uuid, Arc<Mutex<Player>>>,
}
This struct has methods for adding, removing, getting players, and getting the player count.
The NetworkServer and Server shares this list as follows:
NetworkServer {
...
player_list: Arc<Mutex<PlayerList>>,
...
}
Server {
...
player_list: Arc<Mutex<PlayerList>>,
...
}
This is inside an Arc<Mutex> because the NetworkServer accesses the list in a different thread (network loop).
When a player joins, a thread is spawned for them and they are added to the player_list.
Although the only operation I'm doing is adding to player_list, I'm forced to use Arc<Mutex<Player>> instead of the more natural Rc<RefCell<Player>> in the HashMaps because Mutex<PlayerList> requires it. I am not accessing players from the network thread (or any other thread) so it makes no sense to put them under a Mutex. Only the HashMaps need to be locked, which I am doing using Mutex<PlayerList>. But Rust is pedantic and wants to protect against all misuses.
As I'm only accessing Players in the main thread, locking every time to do that is both annoying and less performant. Is there a workaround instead of using unsafe or something?
Here's an example:
use std::cell::Cell;
use std::collections::HashMap;
use std::ffi::CString;
use std::rc::Rc;
use std::sync::{Arc, Mutex};
use std::thread;
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct Uuid([u8; 16]);
struct Player {
pub name: String,
pub uuid: Uuid,
}
struct PlayerList {
by_name: HashMap<String, Arc<Mutex<Player>>>,
by_uuid: HashMap<Uuid, Arc<Mutex<Player>>>,
}
impl PlayerList {
fn add_player(&mut self, p: Player) {
let name = p.name.clone();
let uuid = p.uuid;
let p = Arc::new(Mutex::new(p));
self.by_name.insert(name, Arc::clone(&p));
self.by_uuid.insert(uuid, p);
}
}
struct NetworkServer {
player_list: Arc<Mutex<PlayerList>>,
}
impl NetworkServer {
fn start(&mut self) {
let player_list = Arc::clone(&self.player_list);
thread::spawn(move || {
loop {
// fake network loop
// listen for incoming connections, accept player and add them to player_list.
player_list.lock().unwrap().add_player(Player {
name: "blahblah".into(),
uuid: Uuid([0; 16]),
});
}
});
}
}
struct Server {
player_list: Arc<Mutex<PlayerList>>,
network_server: NetworkServer,
}
impl Server {
fn start(&mut self) {
self.network_server.start();
// main game loop
loop {
// I am only accessing players in this loop in this thread. (main thread)
// so Mutex for individual player is not needed although rust requires it.
}
}
}
fn main() {
let player_list = Arc::new(Mutex::new(PlayerList {
by_name: HashMap::new(),
by_uuid: HashMap::new(),
}));
let network_server = NetworkServer {
player_list: Arc::clone(&player_list),
};
let mut server = Server {
player_list,
network_server,
};
server.start();
}
As I'm only accessing Players in the main thread, locking everytime to do that is both annoying and less performant.
You mean, as right now you are only accessing Players in the main thread, but at any time later you may accidentally introduce an access to them in another thread?
From the point of view of the language, if you can get a reference to a value, you may use the value. Therefore, if multiple threads have a reference to a value, this value should be safe to use from multiple threads. There is no way to enforce, at compile-time, that a particular value, although accessible, is actually never used.
This raises the question, however:
If the value is never used by a given thread, why does this thread have access to it in the first place?
It seems to me that you have a design issue. If you can manage to redesign your program so that only the main thread has access to the PlayerList, then you will immediately be able to use Rc<RefCell<...>>.
For example, you could instead have the network thread send a message to the main thread announcing that a new player connected.
At the moment, you are "Communicating by Sharing", and you could shift toward "Sharing by Communicating" instead. The former usually has synchronization primitives (such as mutexes, atomics, ...) all over the place, and may face contention/dead-lock issues, while the latter usually has communication queues (channels) and requires an "asynchronous" style of programming.
Send is a marker trait that governs which objects can have ownership transferred across thread boundaries. It is automatically implemented for any type that is entirely composed of Send types. It is also an unsafe trait because manually implementing this trait can cause the compiler to not enforce the concurrency safety that we love about Rust.
The problem is that Rc<RefCell<Player>> isn't Send and thus your PlayerList isn't Send and thus can't be sent to another thread, even when wrapped in an Arc<Mutex<>>. The unsafe workaround would be to unsafe impl Send for your PlayerList struct.
Putting this code into your playground example allows it to compile the same way as the original with Arc<Mutex<Player>>
struct PlayerList {
by_name: HashMap<String, Rc<RefCell<Player>>>,
by_uuid: HashMap<Uuid, Rc<RefCell<Player>>>,
}
unsafe impl Send for PlayerList {}
impl PlayerList {
fn add_player(&mut self, p: Player) {
let name = p.name.clone();
let uuid = p.uuid;
let p = Rc::new(RefCell::new(p));
self.by_name.insert(name, Rc::clone(&p));
self.by_uuid.insert(uuid, p);
}
}
Playground
The Nomicon is sadly a little sparse at explaining what rules have have to be enforced by the programmer when unsafely implementing Send for a type containing Rcs, but accessing in only one thread seems safe enough...
For completeness, here's TRPL's bit on Send and Sync
I suggest solving this threading problem using a multi-sender-single-receiver channel. The network threads get a Sender<Player> and no direct access to the player list.
The Receiver<Player> gets stored inside the PlayerList. The only thread accessing the PlayerList is the main thread, so you can remove the Mutex around it. Instead in the place where the main-thread used to lock the mutexit dequeue all pending players from the Receiver<Player>, wraps them in an Rc<RefCell<>> and adds them to the appropriate collections.
Though looking at the bigger designing, I wouldn't use a per-player thread in the first place. Instead I'd use some kind single threaded event-loop based design. (I didn't look into which Rust libraries are good in that area, but tokio seems popular)

Preferred method for awaiting concurrent threads

I have a program that loops over HTTP responses. These don't depend on each other, so they can be done simultaneously. I am using threads to do this:
extern crate hyper;
use std::thread;
use std::sync::Arc;
use hyper::Client;
fn main() {
let client = Arc::new(Client::new());
for num in 0..10 {
let client_helper = client.clone();
thread::spawn(move || {
client_helper.get(&format!("http://example.com/{}", num))
.send().unwrap();
}).join().unwrap();
}
}
This works, but I can see other possibilities of doing this such as:
let mut threads = vec![];
threads.push(thread::spawn(move || {
/* snip */
for thread in threads {
let _ = thread.join();
}
It would also make sense to me to use a function that returns the thread handler, but I couldn't figure out how to do that ... not sure what the return type has to be.
What is the optimal/recommended way to wait for concurrent threads in Rust?
Your first program does not actually have any parallelism. Each time you spin up a worker thread, you immediately wait for it to finish before you start the next one. This is, of course, worse than useless.
The second way works, but there are crates that do some of the busywork for you. For example, scoped_threadpool and crossbeam have thread pools that allow you to write something like (untested, may contain mistakes):
let client = &Client::new();// No Arc needed
run_in_pool(|scope| {
for num in 0..10 {
scope.spawn(move || {
client.get(&format!("http://example.com/{}", num)).send().unwrap();
}
}
})

Resources