I am writing a multi-threaded, concurrent Kafka producer using Rust and Tokio. The project has two modes: an interactive mode that runs in an infinite loop, and a file mode that takes a file as an argument, reads it, and sends its lines to Kafka as messages from multiple threads. Interactive mode works fine, but file mode has issues.
I initially started with Rayon, but then switched to a more flexible runtime, Tokio. I am now able to parallelize the sending over a specified number of threads within Tokio; however, it seems that the runtime is getting dropped before all messages are produced. Here is my code:
pub fn worker(brokers: String, f: File, t: usize, topic: Arc<String>) {
let reader = BufReader::new(f);
let mut rt = runtime::Builder::new()
let producers: Arc<Vec<Mutex<BaseProducer>>> = Arc::new(
.map(|_| get_producer(&brokers))
let acounter = atomic::AtomicUsize::new(0);
let _results: Vec<_> = reader
.map(|line| line.unwrap())
.map(move |line| {
let prods = producers.clone();
let tp = topic.clone();
let cnt = acounter.swap(
(acounter.load(atomic::Ordering::SeqCst) + 1) % t,
rt.block_on(async move {
match prods[cnt]
Ok(_) => (),
Err(e) => eprintln!("{:?}", e),
fn get_producer(brokers: &String) -> Mutex<BaseProducer> {
.set("bootstrap.servers", &brokers)
.set("", "5000"),
.expect("Producer creation error"),
As a high-level walkthrough: I create mutable producers equal to the number of threads specified and every task within this thread will use one of these producers. The file is read line by line sequentially and every line is moved into the closure that produces it as a message to Kafka.
The code works fine for the most part, but the runtime appears to exit without completing all tasks, even though I am using the runtime's block_on function, which is supposed to block until the future (the async block in my case) is complete.
I believe the issue is that the runtime is getting dropped before all the threads within Tokio have exited successfully.
I tried this approach on a file with 100,000 records. On a single thread, I was able to produce 28,000 records; on 2 threads, close to 46,000 records. While utilising all 8 cores of my CPU, I was getting 99,000-100,000 messages, nondeterministically.
I have checked several answers on SO, but none help in my case. I also read through the documentation of tokio::runtime::Runtime here and tried to use spawn and then use futures::future::join, but that didn't work either.
Any help is appreciated!
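Separately from the runtime question, the producer index in the code above is computed with a load followed by a swap, which are two distinct atomic operations. A minimal sketch of the same round-robin selection done in a single atomic step is shown below. The RoundRobin type and next_index method are hypothetical names introduced for illustration, and this sketch does not by itself address the missing messages.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical helper: hands out indices 0, 1, ..., slots - 1, 0, 1, ...
pub struct RoundRobin {
    next: AtomicUsize,
    slots: usize,
}

impl RoundRobin {
    pub fn new(slots: usize) -> Self {
        assert!(slots > 0, "at least one slot is required");
        RoundRobin { next: AtomicUsize::new(0), slots }
    }

    /// Returns the next index. fetch_add increments and returns the previous
    /// value as one atomic operation, so concurrent callers never observe a
    /// value between a separate load and store.
    pub fn next_index(&self) -> usize {
        self.next.fetch_add(1, Ordering::Relaxed) % self.slots
    }
}
```

With t producers, calling next_index() once per line would yield the index into the producers vector.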


Rust: Safe multi threading with recursion

I'm new to Rust.
For learning purposes, I'm writing a simple program to search for files in Linux, and it uses a recursive function:
fn ffinder(base_dir:String, prmtr:&'static str, e:bool, h:bool) -> std::io::Result<()>{
let mut handle_vec = vec![];
let pth = std::fs::read_dir(&base_dir)?;
for p in pth {
let p2 = p?.path().clone();
if p2.is_dir() {
if !h{ //search doesn't include hidden directories
let sstring:String = get_fname(p2.display().to_string());
let slice:String = sstring[..1].to_string();
if slice != ".".to_string() {
let handle = thread::spawn(move || {
else {//search include hidden directories
let handle2 = thread::spawn(move || {
else {
let handle3 = thread::spawn(move || {
if compare(rmv_underline(get_fname(p2.display().to_string())),rmv_underline(prmtr.to_string()),e){
println!("File found at: {}",p2.display().to_string().blue());
for h in handle_vec{
I've tried to use multithreading (thread::spawn); however, it can create too many threads, exceeding the OS limit, which breaks the program's execution.
Is there a way to multithread with recursion using a safe, limited (fixed) number of threads?
As one of the commenters mentioned, this is an absolutely perfect case for using Rayon. The blog post mentioned doesn't show how Rayon might be used in recursion; it only alludes to crossbeam's scoped threads with a broken link. However, Rayon provides its own scoped threads implementation, which also solves your problem: it only uses as many threads as you have cores available, avoiding the error you ran into.
Here's the documentation for it:
Here's an example from some code I recently wrote. Basically what it does is recursively scan a folder, and each time it nests into a folder it creates a new job to scan that folder while the current thread continues. In my own tests it vastly outperforms a single threaded approach.
let source = PathBuf::from("/foo/bar/");
let (tx, rx) = mpsc::channel();
rayon::scope(|s| scan(&source, tx, s));

fn scan<'a, U: AsRef<Path>>(
    src: &U,
    tx: Sender<(Result<DirEntry, std::io::Error>, u64)>,
    scope: &Scope<'a>,
) {
    let dir = fs::read_dir(src).unwrap();
    dir.into_iter().for_each(|entry| {
        let info = entry.as_ref().unwrap();
        let path = info.path();
        if path.is_dir() {
            let tx = tx.clone();
            scope.spawn(move |s| scan(&path, tx, s)) // Recursive call here
        } else {
            // dbg!("{}", path.as_os_str().to_string_lossy());
            let size = info.metadata().unwrap().len();
            tx.send((entry, size)).unwrap();
        }
    });
}
I'm not an expert on Rayon, but I'm fairly certain the threading strategy works like this:
Rayon creates a pool of threads to match the number of logical cores you have available in your environment. The first call to the scoped function creates a job that the first available thread "steals" from the queue of jobs available. Each time we make another recursive call, it doesn't necessarily execute immediately, but it creates a new job that an idle thread can then "steal" from the queue. If all of the threads are busy, the job queue just fills up each time we make another recursive call, and each time a thread finishes its current job it steals another job from the queue.
The full code can be found here:
(Note that the repo is a work in progress, and the code there is often broken and may not work at the time you're reading this.)
As a caveat for the reader: I'm more of a sys admin than an actual developer, so I also don't know if this is the ideal approach. From Rayon's documentation linked earlier:
scope() is a more flexible building block compared to join(), since a loop can be used to spawn any number of tasks without recursing
The language of that is a bit confusing. I'm not sure what they mean by "without recursing". Join seems to intend for you to already have tasks known about ahead of time and to execute them in parallel as threads become available, whereas scope seems to be more aimed at only creating jobs when you need them rather than having to know everything you need to do in advance. Either that or I'm understanding their meaning backwards.
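The job-queue strategy described in this answer can be imitated with the standard library alone. The following is a minimal sketch under stated assumptions, not Rayon's implementation: a fixed number of worker threads share a single queue (Rayon uses per-thread queues with stealing), and visiting a node pushes its children onto the queue instead of spawning a thread. Node, count_matches, and the in-memory tree are hypothetical stand-ins for directories and files.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Condvar, Mutex};
use std::thread;

/// Hypothetical in-memory tree standing in for a directory hierarchy.
pub struct Node {
    pub name: String,
    pub children: Vec<Node>,
}

impl Node {
    pub fn new(name: &str, children: Vec<Node>) -> Self {
        Node { name: name.to_string(), children }
    }
}

struct Jobs<'a> {
    queue: VecDeque<&'a Node>,
    /// Nodes that are queued or currently being visited.
    pending: usize,
}

/// Counts nodes named `target`, using at most `workers` threads.
pub fn count_matches(root: &Node, target: &str, workers: usize) -> usize {
    let jobs = Mutex::new(Jobs { queue: VecDeque::from([root]), pending: 1 });
    let wakeup = Condvar::new();
    let matches = AtomicUsize::new(0);

    thread::scope(|s| {
        for _ in 0..workers.max(1) {
            s.spawn(|| loop {
                // Wait for a job, or exit once no work is queued or in progress.
                let node = {
                    let mut guard = jobs.lock().unwrap();
                    loop {
                        if let Some(node) = guard.queue.pop_front() {
                            break node;
                        }
                        if guard.pending == 0 {
                            return;
                        }
                        guard = wakeup.wait(guard).unwrap();
                    }
                };

                if node.name == target {
                    matches.fetch_add(1, Ordering::Relaxed);
                }

                // "Recursing" only enqueues jobs; the thread count stays fixed.
                let mut guard = jobs.lock().unwrap();
                for child in &node.children {
                    guard.queue.push_back(child);
                    guard.pending += 1;
                }
                guard.pending -= 1;
                drop(guard);
                wakeup.notify_all();
            });
        }
    });

    matches.load(Ordering::Relaxed)
}
```

The pending counter is what lets idle workers tell "the queue is momentarily empty" apart from "the whole traversal is finished"; without it, a worker could exit while another is still about to enqueue children.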

What's a good detailed explanation of Tokio's simple TCP echo server example (on GitHub and API reference)?

Tokio has the same example of a simple TCP echo server on its:
GitHub main page
API reference main page
However, in both pages, there is no explanation of what's actually going on. Here's the example, slightly modified so that the main function does not return Result<(), Box<dyn std::error::Error>>:
use tokio::net::TcpListener;
use tokio::prelude::*;

#[tokio::main]
async fn main() {
    if let Ok(mut tcp_listener) = TcpListener::bind("").await {
        while let Ok((mut tcp_stream, _socket_addr)) = tcp_listener.accept().await {
            tokio::spawn(async move {
                let mut buf = [0; 1024];
                // In a loop, read data from the socket and write the data back.
                loop {
                    let n = match tcp_stream.read(&mut buf).await {
                        // socket closed
                        Ok(n) if n == 0 => return,
                        Ok(n) => n,
                        Err(e) => {
                            eprintln!("failed to read from socket; err = {:?}", e);
                            return;
                        }
                    };
                    // Write the data back
                    if let Err(e) = tcp_stream.write_all(&buf[0..n]).await {
                        eprintln!("failed to write to socket; err = {:?}", e);
                        return;
                    }
                }
            });
        }
    }
}
After reading the Tokio documentation, here's my mental model of this example. A task is spawned for each new TCP connection, and a task ends whenever a read/write error occurs or when the client ends the connection (i.e. the n == 0 case). Therefore, if there are 20 connected clients at a point in time, there would be 20 spawned tasks. However, under the hood, this is NOT equivalent to spawning 20 threads to handle the connected clients concurrently. As far as I understand, this is basically the problem that asynchronous runtimes are trying to solve. Correct so far?
Next, my mental model is that a Tokio scheduler (e.g. the multi-threaded threaded_scheduler, which is the default for apps, or the single-threaded basic_scheduler, which is the default for tests) will schedule these tasks concurrently on 1 to N threads. (Side question: for the threaded_scheduler, is N fixed during the app's lifetime? If so, is it equal to num_cpus::get()?) If one task is .awaiting the read or write_all operation, then the scheduler can use the same thread to perform more work for one of the other 19 tasks. Still correct?
Finally, I'm curious whether the outer code (i.e. the code that is .awaiting for tcp_listener.accept()) is itself a task? Such that in the 20 connected clients example, there aren't really 20 tasks but 21: one to listen for new connections + one per connection. All of these 21 tasks could be scheduled concurrently on one or many threads, depending on the scheduler. In the following example, I wrap the outer code in a tokio::spawn and .await the handle. Is it completely equivalent to the example above?
use tokio::net::TcpListener;
use tokio::prelude::*;
async fn main() {
let main_task_handle = tokio::spawn(async move {
if let Ok(mut tcp_listener) = TcpListener::bind("").await {
while let Ok((mut tcp_stream, _socket_addr)) = tcp_listener.accept().await {
tokio::spawn(async move {
// ... same as above ...
This answer is a summary of an answer I received on Tokio's Discord from Alice Ryhl. Big thank you!
First of all, indeed, for the multi-threaded scheduler, the number of OS threads is fixed to num_cpus.
Second, Tokio can swap the currently running task at every .await on a per-thread basis.
Third, the main function runs in its own task, which is spawned by the #[tokio::main] macro.
Therefore, for the first code block example, if there are 20 connected clients, there would be 21 tasks: one for the main macro + one for each of the 20 open TCP streams. For the second code block example, there would be 22 tasks because of the extra outer tokio::spawn but it's needless and doesn't add any concurrency.

How can I stop reading from a tokio::io::lines stream?

I want to terminate reading from a tokio::io::lines stream. I merged it with a oneshot future and terminated it, but tokio::run was still working.
use futures::{sync::oneshot, *}; // 0.1.27
use std::{io::BufReader, time::Duration};
use tokio::prelude::*; // 0.1.21
fn main() {
let (tx, rx) = oneshot::channel::<()>();
let lines = tokio::io::lines(BufReader::new(tokio::io::stdin()));
let lines = lines.for_each(|item| {
println!("> {:?}", item);
std::thread::spawn(move || {
println!("system shutting down");
let _ = tx.send(());
let lines = lines.select2(rx);
tokio::run(lines.map(|_| ()).map_err(|_| ()));
How can I stop reading from this?
There's nothing wrong with your strategy, but it will only work with futures that don't execute a blocking operation via Tokio's blocking (the traditional kind of blocking should never be done inside a future).
You can test this by replacing the tokio::io::lines(..) future with a simple interval future:
let lines = Interval::new(Instant::now(), Duration::from_secs(1));
The problem is that tokio::io::Stdin internally uses tokio_threadpool::blocking.
The documentation for Tokio thread pool's blocking says (emphasis mine):
NB: The entire task that called blocking is blocked whenever the
supplied closure blocks, even if you have used future combinators such
as select - the other futures in this task will not make progress
until the closure returns. If this is not desired, ensure that
blocking runs in its own task (e.g. using
Since this will block every other future in the combinator, your Receiver will not be able to get a signal from the Sender until the blocking ends.
Please see How can I read non-blocking from stdin? or you can use tokio-stdin-stdout, which creates a channel to consume data from stdin thread. It also has a line-by-line example.
Thank you for your comment and correcting my sentences.
I tried to stop this non-blocking Future and succeeded.
let lines = Interval::new(Instant::now(), Duration::from_secs(1));
My understanding is that, for this case, it would work to wrap the blocking Future with tokio_threadpool::blocking.
I'll try it later.
Thank you very much.

Join futures with limited concurrency

I have a large vector of Hyper HTTP request futures and want to resolve them into a vector of results. Since there is a limit of maximum open files, I want to limit concurrency to N futures.
I've experimented with Stream::buffer_unordered, but it seems to execute the futures one by one.
We've used code like this in a project to avoid opening too many TCP sockets. These futures have Hyper futures within, so it seems exactly the same case.
// Convert the iterator into a `Stream`. We will process
// `PARALLELISM` futures at the same time, but with no specified
// order.
let all_done =
// Everything after here is just using the stream in
// some manner, not directly related
let mut successes = Vec::with_capacity(LIMIT);
let mut failures = Vec::with_capacity(LIMIT);
// Pull values off the stream, dividing them into success and
// failure buckets.
let mut all_done = all_done.into_future();
loop {
match {
Ok((None, _)) => break,
Ok((Some(v), next_all_done)) => {
all_done = next_all_done.into_future();
Err((v, next_all_done)) => {
all_done = next_all_done.into_future();
This is used in a piece of example code, so the event loop (core) is explicitly driven. Watching the number of file handles used by the program showed that it was capped. Additionally, before this bottleneck was added, we quickly ran out of allowable file handles, whereas afterward we did not.
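For readers who want to see the capping idea in isolation, here is a thread-based analogue written with the standard library only. It is not the futures API used in the answer: instead of a stream buffering up to PARALLELISM futures, a fixed number of worker threads pull jobs from a shared iterator, so at most that many jobs run at once. The run_limited function and its peak counter are hypothetical, added only so the cap can be observed.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Runs `work` on every job using at most `limit` threads.
/// Returns the results (in completion order) and the highest number of
/// jobs that were observed running at the same time.
pub fn run_limited<T, R, F>(jobs: Vec<T>, limit: usize, work: F) -> (Vec<R>, usize)
where
    T: Send,
    R: Send,
    F: Fn(T) -> R + Sync,
{
    let queue = Mutex::new(jobs.into_iter());
    let results = Mutex::new(Vec::new());
    let in_flight = AtomicUsize::new(0);
    let peak = AtomicUsize::new(0);

    thread::scope(|s| {
        for _ in 0..limit.max(1) {
            s.spawn(|| loop {
                // Take the next job; the lock is released at the end of this
                // statement, before the work itself starts.
                let job = queue.lock().unwrap().next();
                let Some(job) = job else { break };

                let now = in_flight.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                let out = work(job);
                in_flight.fetch_sub(1, Ordering::SeqCst);

                results.lock().unwrap().push(out);
            });
        }
    });

    (results.into_inner().unwrap(), peak.load(Ordering::SeqCst))
}
```

As with the unordered stream in the answer, results arrive in completion order rather than submission order, so callers that need the original order would have to carry an index alongside each job.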

How can I read non-blocking from stdin?

Is there a way to check whether data is available on stdin in Rust, or to do a read that returns immediately with the currently available data?
My goal is to be able to read the input produced, for instance, by cursor keys in a shell that is set up to return all read data immediately. For instance, with an equivalent to: stty -echo -echok -icanon min 1 time 0.
I suppose one solution would be to use ncurses or similar libraries, but I would like to avoid any kind of large dependencies.
So far, I got only blocking input, which is not what I want:
let mut reader = stdin();
let mut s = String::new();
match reader.read_to_string(&mut s) {...} // this blocks :(
Converting OP's comment into an answer:
You can spawn a thread and send data over a channel. You can then poll that channel in the main thread using try_recv.
use std::io;
use std::sync::mpsc;
use std::sync::mpsc::Receiver;
use std::sync::mpsc::TryRecvError;
use std::{thread, time};
fn main() {
let stdin_channel = spawn_stdin_channel();
loop {
match stdin_channel.try_recv() {
Ok(key) => println!("Received: {}", key),
Err(TryRecvError::Empty) => println!("Channel empty"),
Err(TryRecvError::Disconnected) => panic!("Channel disconnected"),
fn spawn_stdin_channel() -> Receiver<String> {
let (tx, rx) = mpsc::channel::<String>();
thread::spawn(move || loop {
let mut buffer = String::new();
io::stdin().read_line(&mut buffer).unwrap();
fn sleep(millis: u64) {
let duration = time::Duration::from_millis(millis);
Most operating systems default to working with standard input and output in a blocking way. No wonder, then, that the Rust standard library follows suit.
To read from a blocking stream in a non-blocking way you might create a separate thread, so that the extra thread blocks instead of the main one. Checking whether a blocking file descriptor produced some input is similar: spawn a thread, make it read the data, check whether it produced any data so far.
Here's a piece of code that I use with a similar goal of processing a pipe output interactively and that can hopefully serve as an example. It sends the data over a channel, which supports the try_recv method - allowing you to check whether the data is available or not.
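A minimal sketch of this thread-plus-channel idea, not the answerer's own code, might look like the following. The function name spawn_line_channel is hypothetical, and it is generic over any BufRead source so that it can be used with a pipe, a file, or stdin.

```rust
use std::io::BufRead;
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Reads lines from `reader` on a dedicated thread and forwards them over a
/// channel, so the blocking read never happens on the caller's thread.
pub fn spawn_line_channel<R>(reader: R) -> Receiver<String>
where
    R: BufRead + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        for line in reader.lines() {
            match line {
                // Stop if the receiving side has been dropped.
                Ok(text) => {
                    if tx.send(text).is_err() {
                        break;
                    }
                }
                // Stop on a read error.
                Err(_) => break,
            }
        }
        // `tx` is dropped here, so the receiver sees the channel as
        // disconnected once all buffered lines have been consumed.
    });
    rx
}
```

For standard input, the reader could be std::io::BufReader::new(std::io::stdin()); the caller then polls the returned receiver with try_recv, which returns immediately whether or not a line is available.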
Someone has told me that mio might be used to read from a pipe in a non-blocking way, so you might want to check it out too. I suspect that passing the stdin file descriptor (0) to Receiver::from_raw_fd should just work.
You could also potentially look at using ncurses, which would allow you to read in raw mode. There are a few examples in the GitHub repository that show how to do this.
