The code below runs much slower when calling read_line() than when using lines()
You can't run it in the playground, but for me this prints the following:
lines() took Duration { secs: 0, nanos: 41660031 }
read_line() took Duration { secs: 2, nanos: 379397138 }
The implementation of Lines does pretty much what I wrote (but more!), so why is there such a difference?
use std::net::{TcpListener, TcpStream};
use std::io::{BufRead, BufReader, Write};
use std::thread;

fn main() {
    let listener = TcpListener::bind("127.0.0.1:80")
        .expect("listen failed");
    thread::spawn(move || {
        for stream in listener.incoming() {
            let mut stream = stream.unwrap();
            thread::spawn(move || {
                for x in 1..1000 + 1 {
                    stream.write_all(format!("{}\n", x).as_bytes())
                        .expect("write failed");
                }
            });
        }
    });

    let start_a = std::time::Instant::now();
    {
        let stream_a = TcpStream::connect("127.0.0.1:80")
            .expect("connect_a failed");
        let br = BufReader::new(stream_a);
        for l in br.lines() {
            println!("{}", l.unwrap());
        }
    }
    let end_a = std::time::Instant::now();

    let start_b = std::time::Instant::now();
    {
        let stream_b = TcpStream::connect("127.0.0.1:80")
            .expect("connect_b failed");
        let mut br = BufReader::new(stream_b);
        let mut s = String::with_capacity(10);
        while br.read_line(&mut s).unwrap_or(0) > 0 {
            println!("{}", s);
        }
    }
    let end_b = std::time::Instant::now();

    let dur_a = end_a - start_a;
    let dur_b = end_b - start_b;
    println!("lines() took {:?}", dur_a);
    println!("read_line() took {:?}", dur_b);
}
same code on the playground
Let's take a look at the output of your program:
1
2
...
999
1000
1
1
2
1
2
3
1
2
3
4
1
2
3
4
5
...
Oops. It's just a simple bug in your code: you never clear() the string. Each read_line() call appends to the string, so it keeps growing. When I add an s.clear() to your while loop, the timings are much more comparable:
lines() took Duration { secs: 0, nanos: 7323617 }
read_line() took Duration { secs: 0, nanos: 2877078 }
In your buggy program, most of the time was probably wasted reallocating the ever-growing string and printing it to the terminal.
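For reference, the fixed loop is just the original one plus the buffer reset:

while br.read_line(&mut s).unwrap_or(0) > 0 {
    println!("{}", s);
    s.clear(); // reset the buffer so each read_line() starts from empty
}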
Related
I am doing some quick experiments to learn the Rust language. I have done a few successful async tests; this is my starting point:
use async_std::task;
use futures;
use std::time::SystemTime;

fn main() {
    let now = SystemTime::now();
    task::block_on(async {
        let mut fs = Vec::new();
        let sum = 100000000;
        let chunks: u64 = 5; // only valid for factors of sum
        let chunk_size: u64 = sum / chunks;
        for n in 1..=chunks {
            fs.push(task::spawn(async move {
                add_range((n - 1) * chunk_size + 1, n * chunk_size + 1)
            }));
        }
        let vals = futures::future::join_all(fs).await;
        // 5000000050000000 for this configuration of inputs
        println!("{}", vals.iter().sum::<u64>());
    });
    println!("{}ms", now.elapsed().unwrap().as_millis());
}

fn add_range(start: u64, end: u64) -> u64 {
    println!("{}, {}", start, end);
    let mut total: u64 = 0;
    for n in start..end {
        total += n;
    }
    return total;
}
By changing the value of chunks, you can control how many task::spawn calls there are. Now, rather than a flat set of workers, I want the add_range function to be recursive and keep forking off workers based on its inputs. However, following the compiler errors, I have gotten myself quite tangled up:
use async_std::task;
use futures;
use std::future::Future;
use std::pin::Pin;

fn main() {
    let pin_box_u64 = task::block_on(add_range(0, 10, 10, 1, 1001));
    println!("{}", pin_box_u64 /* how do i get u64 out of this */)
}

// recursively calls itself in a branching tree structure
// forking off more worker threads
async fn add_range(
    depth: u64,
    chunk_split: u64,
    chunk_size: u64,
    start: u64,
    end: u64,
) -> Pin<Box<dyn Future<Output = u64>>> {
    println!("{}, {}, {}", depth, start, end);
    // if the range of start to end is more than the allowed
    // chunk_size then fork off more workers dividing
    // the work up further.
    if end - start > chunk_size {
        let mut fs = Vec::new();
        let next_chunk_size = (end - start) / chunk_split;
        for n in 0..chunk_split {
            let s = start + (next_chunk_size * n);
            let mut e = start + (next_chunk_size * (n + 1));
            if e > end {
                e = end;
            }
            // spawn more workers
            fs.push(task::spawn(add_range(depth + 1, chunk_split, chunk_size, s, e)));
        }
        return Box::pin(async move {
            // join workers back up and do joining sum.
            return futures::future::join_all(fs).await.iter().map(/* how do i get u64s out of here */).sum::<u64>();
        });
    } else {
        // else the work is less than the allowed chunk_size
        // so lets now do the actual sum for my chunk
        let mut total: u64 = 0;
        for n in start..end {
            total += n;
        }
        return Box::pin(async move { total });
    }
}
I have played around with this for a while, but I feel like I'm just getting more and more lost in the compiler errors.
You need to box the returned future; otherwise, the compiler can't determine the size of the return type.
Additional context can be found here: https://rust-lang.github.io/async-book/07_workarounds/04_recursion.html
use std::pin::Pin;
use async_std::task;
use futures::Future;
use futures::FutureExt;

fn main() {
    let pin_box_u64 = task::block_on(add_range(0, 10, 10, 1, 1001));
    println!("{}", pin_box_u64)
}

// recursively calls itself in a branching tree structure
// forking off more worker threads
fn add_range(
    depth: u64,
    chunk_split: u64,
    chunk_size: u64,
    start: u64,
    end: u64,
) -> Pin<Box<dyn Future<Output = u64> + Send + 'static>> {
    println!("{}, {}, {}", depth, start, end);
    // if the range of start to end is more than the allowed
    // chunk_size then fork off more workers dividing
    // the work up further.
    if end - start > chunk_size {
        let mut fs = Vec::new();
        let next_chunk_size = (end - start) / chunk_split;
        for n in 0..chunk_split {
            let s = start + (next_chunk_size * n);
            let mut e = start + (next_chunk_size * (n + 1));
            if e > end {
                e = end;
            }
            // spawn more workers
            fs.push(task::spawn(add_range(
                depth + 1,
                chunk_split,
                chunk_size,
                s,
                e,
            )));
        }
        // join workers back up and do joining sum.
        return futures::future::join_all(fs)
            .map(|v| v.iter().sum::<u64>())
            .boxed();
    } else {
        // else the work is less than the allowed chunk_size
        // so lets now do the actual sum for my chunk
        let mut total: u64 = 0;
        for n in start..end {
            total += n;
        }
        return futures::future::ready(total).boxed();
    }
}
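The recursion workaround from the linked async-book page boils down to this pattern; BoxFuture<'static, T> is just an alias for Pin<Box<dyn Future<Output = T> + Send + 'static>>. A minimal sketch:

use futures::future::{BoxFuture, FutureExt};

// A plain (non-async) fn returning a boxed future gives the
// recursive call site a concrete, sized return type.
fn sum_to(n: u64) -> BoxFuture<'static, u64> {
    async move {
        if n == 0 {
            0
        } else {
            n + sum_to(n - 1).await
        }
    }
    .boxed()
}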
I want to read a file (>1 GB) in chunks without loading them all into memory, so that I can stream each chunk using as few resources as possible.
For reading the file in chunks of 10 MB I am using this:
use anyhow::Result;
use futures::stream::FuturesUnordered;
use std::fs::metadata;
use std::io::SeekFrom;
use std::time::Instant;
use std::{env, process::exit};
use tokio::fs::File;
use tokio::io::AsyncReadExt;
use tokio::stream::StreamExt;
use tokio_util::codec::{BytesCodec, FramedRead};

#[tokio::main]
async fn main() -> Result<()> {
    let now = Instant::now();
    let args: Vec<String> = env::args().collect();
    if args.len() < 2 {
        eprintln!("missing arguments: file");
        exit(1);
    }
    let file_path = &args[1];
    let fsize = metadata(&file_path).map(|m| m.len()).unwrap();
    let mut chunk_size = 10_485_760;
    println!(
        "file size: {}, chunk size: {}, parts: {}",
        fsize,
        chunk_size,
        fsize / chunk_size
    );
    let mut seek: u64 = 0;
    let mut parts: Vec<(u64, u64)> = Vec::new();
    while seek < fsize {
        if (fsize - seek) <= chunk_size {
            chunk_size = fsize % chunk_size;
        }
        println!("seek: {}, chunk: {}", seek, chunk_size);
        parts.push((seek, chunk_size));
        seek += chunk_size;
    }
    let mut tasks = FuturesUnordered::new();
    for (pos, part) in parts.iter().enumerate() {
        tasks.push(read_file(&file_path, part.0, part.1, pos));
        // limit to only 4 concurrent tasks
        if tasks.len() == 4 {
            if let Some(t) = tasks.next().await {
                println!("{:#?}", t.unwrap());
            }
        }
    }
    while let Some(t) = tasks.next().await {
        println!("{:#?}", t.unwrap());
    }
    println!("Elapsed: {:?}", now.elapsed());
    Ok(())
}

async fn read_file(path: &str, seek: u64, chunk: u64, part: usize) -> Result<String> {
    let mut file = File::open(&path).await?;
    file.seek(SeekFrom::Start(seek)).await?;
    let file = file.take(chunk);
    let mut stream = FramedRead::with_capacity(file, BytesCodec::new(), 1024 * 64);
    let mut count = 0;
    while let Some(bytes) = stream.try_next().await? {
        count += bytes.len();
        // do something here, upload/stream the file PUT/POST;
        // not CPU intensive, so spawn may not help much
    }
    Ok(format!("part: {}, size: {}", part, count))
}
I am limiting the tasks to a maximum of 4 at a time, which also helps prevent a "too many open files" error. I believe the same four-at-a-time cap could be expressed with buffer_unordered from the futures crate; an untested sketch, assuming the same parts vector and read_file function as above:
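use futures::stream::{self, StreamExt};
// (futures' StreamExt would need to replace tokio's in scope to avoid method ambiguity)

// untested: at most 4 reads in flight, same output handling as above
let results: Vec<_> = stream::iter(parts.into_iter().enumerate())
    .map(|(pos, (seek, chunk))| read_file(&file_path, seek, chunk, pos))
    .buffer_unordered(4)
    .collect()
    .await;
for r in results {
    println!("{:#?}", r.unwrap());
}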
The code works, but reading a file of approximately 4 GB (4466092032 bytes) on my system with SATA disks takes 230 seconds on average.
I also tried with tokio::task::spawn(...), but the speed was about the same.
Any idea how to improve the code so that it could read the file faster?
While practicing, I wrote some code to learn about locking and unlocking a mutex from multiple threads. I run 10 different threads in a for loop and share a mutex-wrapped counter between them. The incrementing itself works fine, but the values printed from the loop are not in accordance with the mutex counter.
let counter = Arc::new(Mutex::new(0));
let mut handles = vec![];

for _ in 0..10 {
    let counter = Arc::clone(&counter);
    println!("Result: {:?}", counter);
    let handle = thread::spawn(move || {
        let mut num = counter.lock().unwrap();
        *num += 1;
    });
    handles.push(handle);
}

for handle in handles {
    handle.join().unwrap();
}

println!("Result: {}", *counter.lock().unwrap());
The output shows some absurd results for the mutex counter; I need to know why that is happening.
Result: Mutex { data: 0 }
Result: Mutex { data: 0 }
Result: Mutex { data: 1 }
Result: Mutex { data: 2 }
Result: Mutex { data: 3 }
Result: Mutex { data: 4 }
Result: Mutex { data: 5 }
Result: Mutex { data: 6 }
Result: Mutex { data: 6 }
Result: Mutex { data: 8 }
Result: 10
You're printing the results in the main thread, which runs in parallel to the other threads, and thus doesn't give you deterministic results. If you instead print the results in the spawned threads after the mutex has been locked (to ensure only one thread sees the value at a time), you'll get more reasonable results:
let counter = Arc::new(Mutex::new(0));
let mut handles = vec![];

for _ in 0..10 {
    let counter = Arc::clone(&counter);
    let handle = thread::spawn(move || {
        let mut num = counter.lock().unwrap();
        *num += 1;
        println!("Result: {:?}", num);
    });
    handles.push(handle);
}

for handle in handles {
    handle.join().unwrap();
}

println!("Result: {}", *counter.lock().unwrap());
Output:
Result: 1
Result: 2
Result: 3
Result: 4
Result: 5
Result: 6
Result: 7
Result: 8
Result: 9
Result: 10
Result: 10
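As an aside, if a shared counter is all you need, an atomic integer avoids the lock entirely; a minimal sketch:

use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

let counter = Arc::new(AtomicUsize::new(0));
let mut handles = vec![];
for _ in 0..10 {
    let counter = Arc::clone(&counter);
    handles.push(thread::spawn(move || {
        // fetch_add returns the previous value, so print prev + 1;
        // note: print order may still vary, since the print happens after the atomic op
        let prev = counter.fetch_add(1, Ordering::SeqCst);
        println!("Result: {}", prev + 1);
    }));
}
for handle in handles {
    handle.join().unwrap();
}
println!("Result: {}", counter.load(Ordering::SeqCst));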
This question already has an answer here: How do I pass disjoint slices from a vector to different threads? (closed as a duplicate)
I've got an embarrassingly parallel bit of graphics-rendering code that I would like to run across my CPU cores. I've coded up a test case (the computed function is nonsense) to explore how I might parallelize it. I'd like to write this in std Rust in order to learn about std::thread, but I don't understand how to give each thread a portion of the framebuffer. I'll put the full test-case code below, but I'll try to break it down first.
The sequential form is super simple:
let mut buffer0 = vec![vec![0i32; WIDTH]; HEIGHT];
for j in 0..HEIGHT {
    for i in 0..WIDTH {
        buffer0[j][i] = compute(i as i32, j as i32);
    }
}
I thought that it would help to make a buffer that was the same size, but re-arranged to be 3D & indexed by core first. This is the same computation, just a reordering of the data to show the workings.
let mut buffer1 = vec![vec![vec![0i32; WIDTH]; y_per_core]; num_logical_cores];
for c in 0..num_logical_cores {
    for y in 0..y_per_core {
        let j = y * num_logical_cores + c;
        if j >= HEIGHT {
            break;
        }
        for i in 0..WIDTH {
            buffer1[c][y][i] = compute(i as i32, j as i32)
        }
    }
}
But when I try to put the inner part of the code in a closure and create a thread, I get errors about the buffer and lifetimes. I basically don't understand what to do and could use some guidance. I want per_core_buffer to temporarily refer to the data in buffer2 that belongs to that core, allow it to be written, synchronize all the threads, and then read buffer2 afterwards. Is this possible?
let mut buffer2 = vec![vec![vec![0i32; WIDTH]; y_per_core]; num_logical_cores];
let mut handles = Vec::new();
for c in 0..num_logical_cores {
    let per_core_buffer = &mut buffer2[c]; // <<< lifetime error
    let handle = thread::spawn(move || {
        for y in 0..y_per_core {
            let j = y * num_logical_cores + c;
            if j >= HEIGHT {
                break;
            }
            for i in 0..WIDTH {
                per_core_buffer[y][i] = compute(i as i32, j as i32)
            }
        }
    });
    handles.push(handle)
}
for handle in handles {
    handle.join().unwrap();
}
The error is this, and I don't understand it:
error[E0597]: `buffer2` does not live long enough
  --> src/main.rs:50:36
   |
50 |     let per_core_buffer = &mut buffer2[c]; // <<< lifetime error
   |                                ^^^^^^^ borrowed value does not live long enough
...
88 | }
   | - borrowed value only lives until here
   |
   = note: borrowed value must be valid for the static lifetime...
The full testcase is:
extern crate num_cpus;
use std::time::Instant;
use std::thread;

fn compute(x: i32, y: i32) -> i32 {
    (x * y) % (x + y + 10000)
}

fn main() {
    let num_logical_cores = num_cpus::get();
    const WIDTH: usize = 40000;
    const HEIGHT: usize = 10000;
    let y_per_core = HEIGHT / num_logical_cores + 1;

    // ------------------------------------------------------------
    // Serial Calculation...
    let mut buffer0 = vec![vec![0i32; WIDTH]; HEIGHT];
    let start0 = Instant::now();
    for j in 0..HEIGHT {
        for i in 0..WIDTH {
            buffer0[j][i] = compute(i as i32, j as i32);
        }
    }
    let dur0 = start0.elapsed();

    // ------------------------------------------------------------
    // On the way to Parallel Calculation...
    // Reorder the data buffer to be 3D with one 2D region per core.
    let mut buffer1 = vec![vec![vec![0i32; WIDTH]; y_per_core]; num_logical_cores];
    let start1 = Instant::now();
    for c in 0..num_logical_cores {
        for y in 0..y_per_core {
            let j = y * num_logical_cores + c;
            if j >= HEIGHT {
                break;
            }
            for i in 0..WIDTH {
                buffer1[c][y][i] = compute(i as i32, j as i32)
            }
        }
    }
    let dur1 = start1.elapsed();

    // ------------------------------------------------------------
    // Actual Parallel Calculation...
    let mut buffer2 = vec![vec![vec![0i32; WIDTH]; y_per_core]; num_logical_cores];
    let mut handles = Vec::new();
    let start2 = Instant::now();
    for c in 0..num_logical_cores {
        let per_core_buffer = &mut buffer2[c]; // <<< lifetime error
        let handle = thread::spawn(move || {
            for y in 0..y_per_core {
                let j = y * num_logical_cores + c;
                if j >= HEIGHT {
                    break;
                }
                for i in 0..WIDTH {
                    per_core_buffer[y][i] = compute(i as i32, j as i32)
                }
            }
        });
        handles.push(handle)
    }
    for handle in handles {
        handle.join().unwrap();
    }
    let dur2 = start2.elapsed();

    println!("Runtime: Serial={0:.3}ms, AlmostParallel={1:.3}ms, Parallel={2:.3}ms",
        1000. * dur0.as_secs() as f64 + 1e-6 * (dur0.subsec_nanos() as f64),
        1000. * dur1.as_secs() as f64 + 1e-6 * (dur1.subsec_nanos() as f64),
        1000. * dur2.as_secs() as f64 + 1e-6 * (dur2.subsec_nanos() as f64));

    // Sanity check
    for j in 0..HEIGHT {
        let c = j % num_logical_cores;
        let y = j / num_logical_cores;
        for i in 0..WIDTH {
            if buffer0[j][i] != buffer1[c][y][i] {
                println!("wtf1? {0} {1} {2} {3}", i, j, buffer0[j][i], buffer1[c][y][i])
            }
            if buffer0[j][i] != buffer2[c][y][i] {
                println!("wtf2? {0} {1} {2} {3}", i, j, buffer0[j][i], buffer2[c][y][i])
            }
        }
    }
}
Thanks to @Shepmaster for the pointers and the clarification that this is not an easy problem for Rust, and that I needed to consider crates to find a reasonable solution. I'm only just starting out in Rust, so this really wasn't clear to me.
I liked the control over the number of threads that scoped_threadpool gives, so I went with that. Translating my code from above directly, I tried to use the 3D buffer with core as the most-significant index, and that ran into trouble because the per-core vector does not implement the Copy trait. The copying that approach would require also made me concerned about performance, so I went back to the original problem and implemented it more directly, finding a reasonable speedup by making each row a thread. Copying each row will not be a large memory overhead.
The code that works for me is:
use scoped_threadpool::Pool;

let mut buffer2 = vec![vec![0i32; WIDTH]; HEIGHT];
let mut pool = Pool::new(num_logical_cores as u32);
pool.scoped(|scope| {
    let mut y = 0;
    for e in &mut buffer2 {
        scope.execute(move || {
            for x in 0..WIDTH {
                (*e)[x] = compute(x as i32, y as i32);
            }
        });
        y += 1;
    }
});
On a 6-core, 12-thread i7-8700K, a 400000x4000 test case runs in 3.2 seconds serially and 481 ms in parallel, a reasonable speedup.
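As a newer alternative: std's thread::scope (stable since Rust 1.63, so after this was written) together with chunks_mut gives each thread a disjoint mutable slice without any external crate. An untested sketch over a flat buffer:

let rows_per_core = HEIGHT / num_logical_cores + 1;
let mut buffer = vec![0i32; WIDTH * HEIGHT];
std::thread::scope(|scope| {
    // chunks_mut hands each thread a disjoint &mut [i32] region
    for (c, chunk) in buffer.chunks_mut(WIDTH * rows_per_core).enumerate() {
        scope.spawn(move || {
            for (r, row) in chunk.chunks_mut(WIDTH).enumerate() {
                let j = c * rows_per_core + r;
                for i in 0..WIDTH {
                    row[i] = compute(i as i32, j as i32);
                }
            }
        });
    }
});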
EDIT: I continued to think about this issue and got a suggestion from Rustlang on Twitter that I should consider rayon. I converted my code to rayon and got a similar speedup with the following code.
use rayon::prelude::*;

let mut buffer2 = vec![vec![0i32; WIDTH]; HEIGHT];
buffer2
    .par_iter_mut()
    .enumerate()
    .map(|(y, e): (usize, &mut Vec<i32>)| {
        for x in 0..WIDTH {
            (*e)[x] = compute(x as i32, y as i32);
        }
    })
    .collect::<Vec<_>>();
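One small note on that last snippet: since the closure returns (), rayon's for_each expresses the side-effecting loop directly and avoids collecting a vector of unit values:

buffer2
    .par_iter_mut()
    .enumerate()
    .for_each(|(y, row)| {
        for x in 0..WIDTH {
            row[x] = compute(x as i32, y as i32);
        }
    });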
I have a Rust application on OSX firing up a large number of threads, as can be seen in the code below. However, after looking at how many threads my version of OSX allows a task to create, via the sysctl kern.num_taskthreads command, I can see that it is kern.num_taskthreads: 2048, which explains why I can't spin up over 2048 threads.
How do I go about getting past this hard limit?
let threads = 300000;
let requests = 1;

for _x in 0..threads {
    println!("{}", _x);
    let request_clone = request.clone();
    let handle = thread::spawn(move || {
        for _y in 0..requests {
            request_clone.lock().unwrap().push(request::Request::new(request::Request::create_request()));
        }
    });
    child_threads.push(handle);
}
Before starting, I'd encourage you to read about the C10K problem. When you get to this scale, there are a lot more things you need to keep in mind.
That being said, I'd suggest looking at mio...
a lightweight IO library for Rust with a focus on adding as little overhead as possible over the OS abstractions.
Specifically, mio provides an event loop, which allows you to handle a large number of connections without spawning threads. Unfortunately, I don't know of an HTTP library that currently supports mio. You could create one and be a hero to the Rust community!
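To give a flavor of the readiness model, here is a minimal sketch against mio's present-day Poll API (which differs from the mio API available at the time of writing); nothing in it is HTTP-specific:

use mio::net::TcpStream;
use mio::{Events, Interest, Poll, Token};

fn main() -> std::io::Result<()> {
    let mut poll = Poll::new()?;
    let mut events = Events::with_capacity(128);

    // One non-blocking connection; real code would register many, each under a distinct token.
    let addr = "127.0.0.1:8080".parse().unwrap();
    let mut stream = TcpStream::connect(addr)?;
    poll.registry()
        .register(&mut stream, Token(0), Interest::READABLE | Interest::WRITABLE)?;

    loop {
        // Block until at least one registered source is ready.
        poll.poll(&mut events, None)?;
        for event in events.iter() {
            if event.token() == Token(0) && event.is_readable() {
                // read from `stream` here without blocking
            }
        }
    }
}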
Not sure how helpful this will be, but I was trying to create a small pool of threads that will create connections and then send them over to an event loop via a channel for reading.
I'm sure this code is probably pretty bad, but here it is anyway as an example. It uses the Hyper library, as you mentioned.
extern crate hyper;

use std::io::Read;
use std::thread;
use std::thread::JoinHandle;
use std::sync::{Arc, Mutex};
use std::sync::mpsc::channel;

use hyper::Client;
use hyper::client::Response;
use hyper::header::Connection;

const TARGET: i32 = 100;
const THREADS: i32 = 10;

struct ResponseWithString {
    index: i32,
    response: Response,
    data: Vec<u8>,
    complete: bool
}

fn main() {
    // Create a client.
    let url: &'static str = "http://www.gooogle.com/";
    let mut threads = Vec::<JoinHandle<()>>::with_capacity((TARGET * 2) as usize);
    let conn_count = Arc::new(Mutex::new(0));
    let (tx, rx) = channel::<ResponseWithString>();
    for _ in 0..THREADS {
        // Move var references into thread context
        let conn_count = conn_count.clone();
        let tx = tx.clone();
        let t = thread::spawn(move || {
            loop {
                let idx: i32;
                {
                    // Lock, increment, and release
                    let mut count = conn_count.lock().unwrap();
                    *count += 1;
                    idx = *count;
                }
                if idx > TARGET {
                    break;
                }
                let mut client = Client::new();
                // Creating an outgoing request.
                println!("Creating connection {}...", idx);
                let res = client.get(url) // Get URL...
                    .header(Connection::close()) // Set headers...
                    .send().unwrap(); // Fire!
                println!("Pushing response {}...", idx);
                tx.send(ResponseWithString {
                    index: idx,
                    response: res,
                    data: Vec::<u8>::with_capacity(1024),
                    complete: false
                }).unwrap();
            }
        });
        threads.push(t);
    }

    let mut responses = Vec::<ResponseWithString>::with_capacity(TARGET as usize);
    let mut buf: [u8; 1024] = [0; 1024];
    let mut completed_count = 0;
    loop {
        if completed_count >= TARGET {
            break; // No more work!
        }
        match rx.try_recv() {
            Ok(r) => {
                println!("Incoming response! {}", r.index);
                responses.push(r)
            },
            _ => { }
        }
        for r in &mut responses {
            if r.complete {
                continue;
            }
            // Read the Response.
            let res = &mut r.response;
            let data = &mut r.data;
            let idx = &r.index;
            match res.read(&mut buf) {
                Ok(i) => {
                    if i == 0 {
                        println!("No more data! {}", idx);
                        r.complete = true;
                        completed_count += 1;
                    } else {
                        println!("Got data! {} => {}", idx, i);
                        for x in 0..i {
                            data.push(buf[x]);
                        }
                    }
                }
                Err(e) => {
                    panic!("Oh no! {} {}", idx, e);
                }
            }
        }
    }
}