Rust and tokio_postgres, use of moved value - multithreading

I am working on this kind of code:
database.rs
use tokio::task::JoinHandle;
use tokio_postgres::{Client, Connection, Error, NoTls, Socket, tls::NoTlsStream};

use crate::secret;

pub struct DatabaseConnection {
    pub client: Client,
    pub connection: Connection<Socket, NoTlsStream>
    // pub connection: JoinHandle<Connection<Socket, NoTlsStream>>
}
impl DatabaseConnection {
    async fn new() -> Result<DatabaseConnection, Error> {
        let (new_client, new_connection) = DatabaseConnection::create_connection()
            .await
            .expect("Error, amazing is amazing");
        Ok(Self {
            client: new_client,
            // connection: tokio::spawn(async move { new_connection })
            connection: new_connection
        })
    }

    async fn create_connection() -> Result<(Client, Connection<Socket, NoTlsStream>), Error> {
        // Connect to the database.
        let (client, connection) =
            tokio_postgres::connect(
                &format!(
                    "postgres://{user}:{pswd}@localhost/{db}",
                    user = secret::USERNAME,
                    pswd = secret::PASSWORD,
                    db = secret::DATABASE
                )[..],
                NoTls)
            .await?;

        Ok((client, connection))
    }
}
pub async fn init_db() -> Result<(), Error> {
    let database_connection =
        DatabaseConnection::new()
            .await;

    let db_connection = tokio::spawn(async move {
        if let Err(e) = database_connection {
            println!("Connection error: {:?}", e);
        }
    });

    let create = database_connection.unwrap().client
        .query("CREATE TABLE person (
            id SERIAL PRIMARY KEY,
            name VARCHAR NOT NULL
        )", &[]).await?;

    Ok(())
}
main.rs
#[tokio::main]
async fn main() {
    match database::init_db().await {
        Ok(()) => println!("Successfully connected to the database"),
        Err(e) => eprintln!("On main: {}", e)
    }
}
I can't use the database_connection variable to perform my SQL statements because it's moved into the tokio task.
I already tried storing `connection: tokio::spawn(async move { new_connection })` in my struct, but the task is never spawned until I access that field, and by then it returns an already-ended database connection.
How can I solve this?
Thanks in advance.

Going with two concurrent tasks (a dedicated event loop for the connection) lets you share the connection between various program modules and gives you a non-blocking (potentially even non-async) query API.
This can be achieved by implementing a reactor pattern, but it is not as trivial as in some other languages, because Rust will make sure you follow strict multi-threading correctness.
Let's say task 1 is your main program; it spawns task 2, a DatabaseConnection reactor event loop. Since this instance might be accessed by multiple threads at the same time to start or process an SQL query, wrap it in Arc<Mutex<DatabaseConnection>>.
To execute a query, task 1 needs to send an SQL command to task 2 and wait for the result. One way to do this is to use an mpsc channel for sending commands and a oneshot channel for the result. A oneshot is similar in principle to a promise/future: you can await on one end, and push the value and wake the waiter from the other end.
For a code example, check out the tokio channels tutorial. The "Spawn manager task" chapter corresponds to your DatabaseConnection task 2, which waits for SQL queries and processes them. The "Receive responses" chapter shows how to use oneshot to send the result back.
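As a rough sketch of how those pieces fit together (the Query type, the send_query helper, and the use of client.execute below are my own illustrative choices, not code from the tutorial):

use tokio::sync::{mpsc, oneshot};
use tokio_postgres::Client;

// Hypothetical command type: the SQL text plus a oneshot for the reply.
struct Query {
    sql: String,
    // e.g. number of rows affected; pick whatever result type you need
    respond_to: oneshot::Sender<u64>,
}

// Task 2: owns the Client and processes commands one by one.
async fn run_db_task(mut rx: mpsc::Receiver<Query>, client: Client) {
    while let Some(query) = rx.recv().await {
        let result = client.execute(query.sql.as_str(), &[]).await.unwrap_or(0);
        // The requester may have given up in the meantime; ignore a closed oneshot.
        let _ = query.respond_to.send(result);
    }
}

// Task 1 side: send a command and await the oneshot for the answer.
async fn send_query(tx: &mpsc::Sender<Query>, sql: &str) -> u64 {
    let (respond_to, response) = oneshot::channel();
    tx.send(Query { sql: sql.to_owned(), respond_to })
        .await
        .expect("database task has shut down");
    response.await.expect("database task dropped the reply")
}

In this variant only the database task ever touches the Client, so cloning the mpsc Sender is often all the sharing you need.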
Also note that async doesn't imply that you have to block on await. If your program is simple enough, there is a possibility to avoid the reactor while still not blocking. This can be done with tokio::select! inside a loop. An example of such usage can be found in the select tutorial, in the chapter "Resuming an async operation". Imagine that action() is your .query() method. Note that they call it but do not await it. The select! then returns when the query results are ready, and while they are not ready you are free to do any other async work.
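Transposed to the database case, that select! loop could look roughly like this (a sketch only; the query text and the sleep branch are placeholders standing in for your real work):

use tokio::time::{sleep, Duration};
use tokio_postgres::Client;

async fn poll_query(client: &Client) {
    // Create the future but do not await it yet.
    let query = client.query("SELECT 1", &[]);
    tokio::pin!(query);

    loop {
        tokio::select! {
            rows = &mut query => {
                // The query finished (successfully or not); stop looping.
                println!("query finished: {:?}", rows.map(|r| r.len()));
                break;
            }
            _ = sleep(Duration::from_millis(50)) => {
                // The query is still running; do any other async work here.
                println!("still waiting...");
            }
        }
    }
}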

Related

Why is the channel in the example code of tokio::sync::Notify an mpsc?

I'm learning the synchronization primitives of tokio. From the example code of Notify, I am confused about why Channel<T> is an mpsc.
use tokio::sync::Notify;
use std::collections::VecDeque;
use std::sync::Mutex;

struct Channel<T> {
    values: Mutex<VecDeque<T>>,
    notify: Notify,
}

impl<T> Channel<T> {
    pub fn send(&self, value: T) {
        self.values.lock().unwrap()
            .push_back(value);

        // Notify the consumer a value is available
        self.notify.notify_one();
    }

    // This is a single-consumer channel, so several concurrent calls to
    // `recv` are not allowed.
    pub async fn recv(&self) -> T {
        loop {
            // Drain values
            if let Some(value) = self.values.lock().unwrap().pop_front() {
                return value;
            }

            // Wait for values to be available
            self.notify.notified().await;
        }
    }
}
If there are elements in values, a consumer task will take one away.
If there are no elements in values, a consumer task will yield until the producer notifies it.
But after writing some test code, I could not find a case where a consumer loses the notification from the producer.
Could someone give me test code to prove that the above Channel<T> fails to work as an mpmc?
The following code shows why it is unsafe to use the above channel as an mpmc.
use std::sync::Arc;

#[tokio::main]
async fn main() {
    let mut i = 0;
    loop {
        let ch = Arc::new(Channel {
            values: Mutex::new(VecDeque::new()),
            notify: Notify::new(),
        });
        let mut handles = vec![];
        for i in 0..100 {
            if i % 2 == 1 {
                for _ in 0..2 {
                    let sender = ch.clone();
                    tokio::spawn(async move {
                        sender.send(1);
                    });
                }
            } else {
                for _ in 0..2 {
                    let receiver = ch.clone();
                    let handle = tokio::spawn(async move {
                        receiver.recv().await;
                    });
                    handles.push(handle);
                }
            }
        }
        futures::future::join_all(handles).await;
        i += 1;
        println!("No.{i} loop finished.");
    }
}
If the next loop iteration never starts, it means some consumer tasks never finished, i.e. they missed a notification.
Quote from the documentation you linked:
If you have two calls to recv and two calls to send in parallel, the following could happen:
Both calls to try_recv return None.
Both new elements are added to the vector.
The notify_one method is called twice, adding only a single permit to the Notify.
Both calls to recv reach the Notified future. One of them consumes the permit, and the other sleeps forever.
Replace try_recv with self.values.lock().unwrap().pop_front() in our case; the rest of the explanation stays identical.
The third point is the important one: multiple calls to notify_one only result in a single permit if no thread is waiting yet. And there is a short time window where multiple threads may already have checked for the existence of an item but are not yet waiting.
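If multiple consumers are actually needed, one way to avoid the lost wakeup is to count pending items with a permit-based primitive instead of Notify. A minimal sketch of that idea (my own example, not from the linked docs), using tokio::sync::Semaphore:

use std::collections::VecDeque;
use std::sync::Mutex;
use tokio::sync::Semaphore;

// Every send adds exactly one permit, so the number of permits always matches
// the number of queued values and no wakeup can be lost, even with many
// concurrent receivers.
struct MpmcChannel<T> {
    values: Mutex<VecDeque<T>>,
    items: Semaphore,
}

impl<T> MpmcChannel<T> {
    fn new() -> Self {
        Self {
            values: Mutex::new(VecDeque::new()),
            items: Semaphore::new(0),
        }
    }

    fn send(&self, value: T) {
        self.values.lock().unwrap().push_back(value);
        self.items.add_permits(1);
    }

    async fn recv(&self) -> T {
        // Wait until at least one value is guaranteed to be queued.
        let permit = self.items.acquire().await.expect("semaphore closed");
        // The permit corresponds to the value we are about to pop, so consume
        // it permanently instead of returning it to the pool.
        permit.forget();
        self.values.lock().unwrap().pop_front().expect("permit without value")
    }
}

Because send leaves a durable permit behind even when nobody is waiting yet, a receiver that wins the acquire is guaranteed to find a value, and the race described above cannot occur.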

Why does my asynchronous request pool (using crossbeam channels) block?

My main goal is to write an API server that retrieves part of its information from another, external API server. However, that external server is quite fragile, so I would like to limit the global number of concurrent requests made to it, for example to 10 or 20.
Thus, my idea was to write something like an HttpPool, which consumes tasks via a bounded crossbeam channel and distributes them among tokio tasks. The idea was to use a bounded channel to avoid publishing too much work, and a fixed set of tasks to limit the number of concurrent requests to the external API.
It seems to work if I do not create more than 8 tasks. If I define more, it blocks after fetching the first tasks from the queue.
use std::{error::Error, result::Result};
use tokio::sync::oneshot::Sender;
use tokio::time::timeout;
use tokio::time::{sleep, Duration};

use crossbeam_channel;

#[derive(Debug)]
struct HttpTaskRequest {
    url: String,
    result: Sender<String>,
}

type PoolSender = crossbeam_channel::Sender<HttpTaskRequest>;
type PoolReceiver = crossbeam_channel::Receiver<HttpTaskRequest>;

#[derive(Debug)]
struct HttpPool {
    size: i32,
    sender: PoolSender,
    receiver: PoolReceiver,
}

impl HttpPool {
    fn new(capacity: i32) -> Self {
        let (tx, rx) = crossbeam_channel::bounded::<HttpTaskRequest>(capacity as usize);
        HttpPool {
            size: capacity,
            sender: tx,
            receiver: rx,
        }
    }

    async fn start(self) -> Result<HttpPool, Box<dyn Error>> {
        for i in 0..self.size {
            let task_receiver = self.receiver.clone();
            tokio::spawn(async move {
                loop {
                    match task_receiver.recv() {
                        Ok(request) => {
                            if request.result.is_closed() {
                                println!("Task[{i}] received url {} already closed by receiver, seems to reach timeout already", request.url);
                            } else {
                                println!("Task[{i}] started to work {:?}", request.url);
                                let resp = reqwest::get("https://httpbin.org/ip").await;
                                println!("Resp: {:?}", resp);
                                println!("Done Send request for url {}", request.url);
                                request.result.send("Result".to_owned()).expect("Failed to send result");
                            }
                        }
                        Err(err) => println!("Error: {err}"),
                    }
                }
            });
        }

        Ok(self)
    }

    pub async fn request(&self, url: String) -> Result<(), Box<dyn Error>> {
        let (os_sender, os_receiver) = tokio::sync::oneshot::channel::<String>();
        let request = HttpTaskRequest {
            result: os_sender,
            url: url.clone(),
        };
        self.sender.send(request).expect("Failed to publish message to task group");

        // check if a timeout or value was returned
        match timeout(Duration::from_millis(100), os_receiver).await {
            Ok(res) => {
                println!("Request finished without reaching the timeout {}", res.unwrap());
            }
            Err(_) => {
                println!("Request {url} run into timeout");
            }
        }
        Ok(())
    }
}

#[tokio::main]
async fn main() {
    let http_pool = HttpPool::new(20).start().await.expect("Failed to start http pool");

    for i in 0..10 {
        let url = format!("T{}", i.to_string());
        http_pool.request(url).await.expect("Failed to request message");
    }

    loop {}
}
Maybe somebody can explain why the code blocks? Is it related to tokio::spawn?
I guess my attempt is wrong, so please let me know if there is another way to handle it. The goal can be summarized like this: I would like to request URLs and process them in such a way that no more than N concurrent requests are made against the API server.
I have read this question: How can I perform parallel asynchronous HTTP GET requests with reqwest?. But there, in that answer, the work is known up front, which is not the case in my example. Requests arrive on the fly, so I am not sure how to handle them.
I have finally solved the mystery of the blocking in my code example above. As we can see, I used the crate crossbeam_channel, which does not cooperate with async code. If we call recv on this type of channel, the thread blocks until a message is received. Hence, there is no way to return to the tokio scheduler, which implies that no other task is able to run. To refresh your memory, async code only returns to the scheduler when a .await is reached.
Furthermore, the code was working when fewer tasks were spawned than there are worker threads. The default number of worker threads is equal to the CPU core count, in my case eight. Hence, when I started more than that, all worker threads were blocked and the application froze.
The fix was to replace the crate crossbeam-channel with async-channel, as stated on the tokio tutorial page.
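As an illustration only (the names worker and spawn_workers are mine, and the request shape is simplified from the code above), the receiving side could look roughly like this after switching to async-channel:

use async_channel::{bounded, Receiver, Sender};
use tokio::sync::oneshot;

struct HttpTaskRequest {
    url: String,
    result: oneshot::Sender<String>,
}

// Worker loop: recv().await yields back to the tokio scheduler while the
// channel is empty, so it never pins a worker thread the way a blocking
// crossbeam recv() does.
async fn worker(id: i32, rx: Receiver<HttpTaskRequest>) {
    while let Ok(request) = rx.recv().await {
        println!("Task[{id}] started to work on {}", request.url);
        let resp = reqwest::get("https://httpbin.org/ip").await;
        let _ = request.result.send(format!("{resp:?}"));
    }
}

// Construction side: a bounded channel still gives back-pressure, as in the
// crossbeam version, but both ends are async now.
fn spawn_workers(size: i32) -> Sender<HttpTaskRequest> {
    let (tx, rx) = bounded::<HttpTaskRequest>(size as usize);
    for i in 0..size {
        tokio::spawn(worker(i, rx.clone()));
    }
    tx
}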
In case my answer is vague, I recommend reading the following posts:
https://github.com/tokio-rs/tokio/discussions/3858
https://ryhl.io/blog/async-what-is-blocking/
https://crates.io/crates/async-channel

Async loop on a new thread in rust: the trait `std::future::Future` is not implemented for `()`

I know this question has been asked many times, but I still can't figure out what to do (more below).
I'm trying to spawn a new thread using std::thread::spawn and then run an async loop inside of it.
The async function I want to run:
#[tokio::main]
pub async fn pull_tweets(pg_pool2: Arc<PgPool>, config2: Arc<Settings>) {
    let mut scheduler = AsyncScheduler::new();

    scheduler.every(10.seconds()).run(move || {
        let arc_pool = pg_pool2.clone();
        let arc_config = config2.clone();
        async {
            pull_from_main(arc_pool, arc_config).await;
        }
    });

    tokio::spawn(async move {
        loop {
            scheduler.run_pending().await;
            tokio::time::sleep(Duration::from_millis(100)).await;
        }
    });
}
Spawning a thread to run it in:
#[actix_web::main]
async fn main() -> std::io::Result<()> {
    let handle = thread::spawn(move || async {
        pull_tweets(pg_pool2, config2).await;
    });
}
The error:
error[E0277]: `()` is not a future
--> src/main.rs:89:9
|
89 | pull_tweets(pg_pool2, config2).await;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `()` is not a future
|
= help: the trait `std::future::Future` is not implemented for `()`
= note: required by `poll`
The last comment here does an amazing job at generalizing the problem. It seems at some point a return value is expected that implements IntoFuture and I don't have that. I tried adding Ok(()) to both the closure and the function, but no luck.
Adding it to the closure does nothing.
Adding it to the async function gives me a new, but similar-sounding error:
`Result<(), ()>` is not a future
Then I noticed that the answer specifically talks about extension functions, which I'm not using. This also talks about extension functions.
Some other SO answers:
This is caused by a missing async.
This and this are reqwest library specific.
So none of these seem to apply. Could someone help me understand 1) why the error exists here and 2) how to fix it?
Note 1: All of this can be easily solved by replacing std::thread::spawn with tokio::task::spawn_blocking. But I'm purposefully experimenting with thread spawn as per this article.
Note 2: Broader context on what I want to achieve: I'm pulling 150,000 tweets from twitter in an async loop. I want to compare 2 implementations: running on the main runtime vs running on separate thread. The latter is where I struggle.
Note 3: In my mind, threads and async tasks are two different primitives that don't mix, i.e. spawning a new thread doesn't affect how tasks behave, and spawning a new task only adds work to existing threads. Please let me know if this worldview is mistaken (and what I can read).
#[tokio::main] converts your function into the following:
pub fn pull_tweets(pg_pool2: Arc<PgPool>, config2: Arc<Settings>) {
    let rt = tokio::runtime::Runtime::new().unwrap();
    rt.block_on(async {
        let mut scheduler = AsyncScheduler::new();
        // ...
    });
}
Notice that it is a synchronous function that creates a new runtime and runs the inner future to completion. You do not await it; it is a separate runtime with its own dedicated thread pool and scheduler:
#[actix_web::main]
async fn main() -> std::io::Result<()> {
    let handle = thread::spawn(move || {
        pull_tweets(pg_pool2, config2);
    });
}
Note that your original example was wrong in another way:
#[actix_web::main]
async fn main() -> std::io::Result<()> {
    let handle = thread::spawn(move || async {
        pull_tweets(pg_pool2, config2).await;
    });
}
Even if pull_tweets were an async function, the thread would not do anything, as all you do is create another future in the async block. The created future is never executed, because futures are lazy (and there is no executor context in that thread anyway).
I would structure the code to spawn the runtime directly in the new thread, and call any async functions you want from there:
#[actix_web::main]
async fn main() -> std::io::Result<()> {
    let handle = thread::spawn(move || {
        let rt = tokio::runtime::Builder::new_multi_thread()
            .enable_all()
            .build()
            .unwrap();

        rt.block_on(async {
            pull_tweets(pg_pool2, config2).await;
        });
    });
}

pub async fn pull_tweets(pg_pool2: Arc<PgPool>, config2: Arc<Settings>) {
    // ...
}

Why can't I communicate with a forked child process using Tokio UnixStream?

I'm trying to get a parent process and a child process to communicate with each other using a tokio::net::UnixStream. For some reason the child is unable to read whatever the parent writes to the socket, and presumably the other way around.
The function I have is similar to the following:
pub async fn run() -> Result<(), Error> {
    let mut socks = UnixStream::pair()?;

    match fork() {
        Ok(ForkResult::Parent { .. }) => {
            socks.0.write_u32(31337).await?;
            Ok(())
        }
        Ok(ForkResult::Child) => {
            eprintln!("Reading from master");
            let msg = socks.1.read_u32().await?;
            eprintln!("Read from master {}", msg);
            Ok(())
        }
        Err(_) => Err(Error),
    }
}
The socket doesn't get closed, otherwise I'd get an immediate error trying to read from socks.1. If I move the read into the parent process it works as expected. The first line "Reading from master" gets printed, but the second line never does.
I cannot change the communication paradigm, since I'll be using execve to start another binary that expects to be talking to a socketpair.
Any idea what I'm doing wrong here? Is it something to do with the async/await?
When you call the fork() system call:
The child process is created with a single thread—the one that called fork().
The default executor in tokio is a thread pool executor. The child process will only get one of the threads in the pool, so it won't work properly.
I found I was able to make your program work by setting the thread pool to contain only a single thread, like this:
use tokio::prelude::*;
use tokio::net::UnixStream;
use nix::unistd::{fork, ForkResult};
use nix::sys::wait;
use std::io::Error;
use std::io::ErrorKind;
use wait::wait;

// Limit to 1 thread
#[tokio::main(core_threads = 1)]
async fn main() -> Result<(), Error> {
    let mut socks = UnixStream::pair()?;

    match fork() {
        Ok(ForkResult::Parent { .. }) => {
            eprintln!("Writing!");
            socks.0.write_u32(31337).await?;
            eprintln!("Written!");
            wait().unwrap();
            Ok(())
        }
        Ok(ForkResult::Child) => {
            eprintln!("Reading from master");
            let msg = socks.1.read_u32().await?;
            eprintln!("Read from master {}", msg);
            Ok(())
        }
        Err(_) => Err(Error::new(ErrorKind::Other, "oh no!")),
    }
}
Another change I had to make was to force the parent to wait for the child to complete, by calling wait() - also something you probably do not want to be doing in a real async program.
Most of the advice I have read says that if you need to fork from a threaded program, either do it before creating any threads, or call execve() in the child immediately after forking (which is what you plan to do anyway).

How to use Rust shiplift from a hyper server

I'm trying to write a simple Rust program that reads Docker stats using shiplift and exposes them as Prometheus metrics using rust-prometheus.
The shiplift stats example runs correctly on its own, and I'm trying to integrate it into the server as
fn handle(_req: Request<Body>) -> Response<Body> {
    let docker = Docker::new();
    let containers = docker.containers();
    let id = "my-id";
    let stats = containers
        .get(&id)
        .stats().take(1).wait();
    for s in stats {
        println!("{:?}", s);
    }
    // ...
}

// in main
let make_service = || {
    service_fn_ok(handle)
};

let server = Server::bind(&addr)
    .serve(make_service);
but it appears that the stream hangs forever (I cannot produce any error message).
I've also tried the same refactor (using take and wait instead of tokio::run) in the shiplift example, but in that case I get the error executor failed to spawn task: tokio::spawn failed (is a tokio runtime running this future?). Is tokio somehow required by shiplift?
EDIT:
If I've understood correctly, my attempt does not work because wait blocks the tokio executor, so stats will never produce results.
shiplift's API is asynchronous, meaning wait() and other functions return a Future, instead of blocking the main thread until a result is ready. A Future won't actually do any I/O until it is passed to an executor. You need to pass the Future to tokio::run as in the example you linked to. You should read the tokio docs to get a better understanding of how to write asynchronous code in rust.
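For illustration, a sketch under the futures 0.1 API of what handing the stream to the executor could look like (the container id and the print body are placeholders, and this mirrors the stats call chain from the question rather than the hyper handler above):

use shiplift::Docker;
use tokio::prelude::*;

fn main() {
    let docker = Docker::new();
    let stats = docker
        .containers()
        .get("my-id")
        .stats()
        .take(1)
        .for_each(|stat| {
            // Runs on the tokio executor instead of blocking the current thread.
            println!("{:?}", stat);
            Ok(())
        })
        .map_err(|e| eprintln!("Error: {}", e));

    // tokio::run drives the future to completion on the default runtime.
    tokio::run(stats);
}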
There were quite a few mistakes in my understanding of how hyper works. Basically:
if a service should handle futures, do not use service_fn_ok to create it (it is meant for synchronous services): use service_fn;
do not use wait: all futures use the same executor, the execution will just hang forever (there is a warning in the docs but oh well...);
as ecstaticm0rse notices, hyper::rt::spawn could be used to read stats asynchronously, instead of doing it in the service
Is tokio somehow required by shiplift?
Yes. It uses hyper, which throws executor failed to spawn task if the default tokio executor is not available (working with futures nearly always requires an executor anyway).
Here is a minimal version of what I ended up with (tokio 0.1.20 and hyper 0.12):
use std::net::SocketAddr;
use std::time::{Duration, Instant};
use tokio::prelude::*;
use tokio::timer::Interval;
use hyper::{
    Body, Response, service::service_fn_ok,
    Server, rt::{spawn, run}
};

fn init_background_task(swarm_name: String) -> impl Future<Item = (), Error = ()> {
    Interval::new(Instant::now(), Duration::from_secs(1))
        .map_err(|e| panic!(e))
        .for_each(move |_instant| {
            futures::future::ok(()) // unimplemented: call shiplift here
        })
}

fn init_server(address: SocketAddr) -> impl Future<Item = (), Error = ()> {
    let service = move || {
        service_fn_ok(|_request| Response::new(Body::from("unimplemented")))
    };
    Server::bind(&address)
        .serve(service)
        .map_err(|e| panic!("Server error: {}", e))
}

fn main() {
    let background_task = init_background_task("swarm_name".to_string());
    let server = init_server(([127, 0, 0, 1], 9898).into());

    run(hyper::rt::lazy(move || {
        spawn(background_task);
        spawn(server);
        Ok(())
    }));
}
