How do I avoid obfuscating logic in a `loop`?

How do I avoid obfuscating logic in a `loop`? - rust

Trying to respect Rust safety rules leads me to write code that is, in this case, less clear than the alternative.
It's marginal, but must be a very common pattern, so I wonder if there's any better way.
The following example doesn't compile:
async fn query_all_items() -> Vec<u32> {
let mut items = vec![];
let limit = 10;
loop {
let response = getResponse().await;
// response is moved here
items.extend(response);
// can't do this, response is moved above
if response.len() < limit {
break;
}
}
items
}
In order to satisfy Rust safety rules, we can pre-compute the break condition:
async fn query_all_items() -> Vec<u32> {
let mut items = vec![];
let limit = 10;
loop {
let response = getResponse().await;
let should_break = response.len() < limit;
// response is moved here
items.extend(response);
// meh
if should_break {
break;
}
}
items
}
Is there any other way?

I agree with Daniel's point that this should be a while rather than a loop, though I'd move the logic to the while rather than creating a boolean:
let mut len = limit;
while len >= limit {
let response = queryItems(limit).await?;
len = response.len();
items.extend(response);
}

Not that you should do this, but an async stream version is possible. However a plain old loop is much easier to read.
use futures::{future, stream, StreamExt}; // 0.3.19
use rand::{
distributions::{Distribution, Uniform},
rngs::ThreadRng,
};
use std::sync::{Arc, Mutex};
use tokio; // 1.15.0
async fn get_response(rng: Arc<Mutex<ThreadRng>>) -> Vec<u32> {
let mut rng = rng.lock().unwrap();
let range = Uniform::from(0..100);
let len_u32 = range.sample(&mut *rng);
let len_usize = usize::try_from(len_u32).unwrap();
vec![len_u32; len_usize]
}
async fn query_all_items() -> Vec<u32> {
let rng = Arc::new(Mutex::new(ThreadRng::default()));
stream::iter(0..)
.then(|_| async { get_response(Arc::clone(&rng)).await })
.take_while(|v| future::ready(v.len() >= 10))
.collect::<Vec<_>>()
.await
.into_iter()
.flatten()
.collect()
}
#[tokio::main]
async fn main() {
// [46, 46, 46, ..., 78, 78, 78], or whatever random list you get
println!("{:?}", query_all_items().await);
}

I would do this in a while loop since the while will surface the flag more easily.
fn query_all_items () -> Vec<Item> {
let items = vec![];
let limit = 10;
let mut limit_reached = false;
while limit_reached {
let response = queryItems(limit).await?;
limit_reached = response.len() >= limit;
items.extend(response);
}
items
}

Without context it's hard to advise ideal code. I would do:
fn my_body_is_ready() -> Vec<u32> {
let mut acc = vec![];
let min = 10;
loop {
let foo = vec![42];
if foo.len() < min {
acc.extend(foo);
break acc;
} else {
acc.extend(foo);
}
}
}

Related

How to create threads that last entire duration of program and pass immutable chunks for threads to operate on?

I have a bunch of math that has real time constraints. My main loop will just call this function repeatedly and it will always store results into an existing buffer. However, I want to be able to spawn the threads at init time and then allow the threads to run and do their work and then wait for more data. The synchronization I will use a Barrier and have that part working. What I can't get working and have tried various iterations of Arc or crossbeam is splitting the thread spawning up and the actual workload. This is what I have now.
pub const WORK_SIZE: usize = 524_288;
pub const NUM_THREADS: usize = 6;
pub const NUM_TASKS_PER_THREAD: usize = WORK_SIZE / NUM_THREADS;
fn main() {
let mut work: Vec<f64> = Vec::with_capacity(WORK_SIZE);
for i in 0..WORK_SIZE {
work.push(i as f64);
}
crossbeam::scope(|scope| {
let threads: Vec<_> = work
.chunks(NUM_TASKS_PER_THREAD)
.map(|chunk| scope.spawn(move |_| chunk.iter().cloned().sum::<f64>()))
.collect();
let threaded_time = std::time::Instant::now();
let thread_sum: f64 = threads.into_iter().map(|t| t.join().unwrap()).sum();
let threaded_micros = threaded_time.elapsed().as_micros() as f64;
println!("threaded took: {:#?}", threaded_micros);
let serial_time = std::time::Instant::now();
let no_thread_sum: f64 = work.iter().cloned().sum();
let serial_micros = serial_time.elapsed().as_micros() as f64;
println!("serial took: {:#?}", serial_micros);
assert_eq!(thread_sum, no_thread_sum);
println!(
"Threaded performace was {:?}",
serial_micros / threaded_micros
);
})
.unwrap();
}
But I can't find a way to spin these threads up in an init function and then in a do_work function pass work into them. I attempted to do something like this with Arc's and Mutex's but couldn't get everything straight there either. What I want to turn this into is something like the following
use std::sync::{Arc, Barrier, Mutex};
use std::{slice::Chunks, thread::JoinHandle};
pub const WORK_SIZE: usize = 524_288;
pub const NUM_THREADS: usize = 6;
pub const NUM_TASKS_PER_THREAD: usize = WORK_SIZE / NUM_THREADS;
//simplified version of what actual work that code base will do
fn do_work(data: &[f64], result: Arc<Mutex<f64>>, barrier: Arc<Barrier>) {
loop {
barrier.wait();
let sum = data.into_iter().cloned().sum::<f64>();
let mut result = *result.lock().unwrap();
result += sum;
}
}
fn init(
mut data: Chunks<'_, f64>,
result: &Arc<Mutex<f64>>,
barrier: &Arc<Barrier>,
) -> Vec<std::thread::JoinHandle<()>> {
let mut handles = Vec::with_capacity(NUM_THREADS);
//spawn threads, in actual code these would be stored in a lib crate struct
for i in 0..NUM_THREADS {
let result = result.clone();
let barrier = barrier.clone();
let chunk = data.nth(i).unwrap();
handles.push(std::thread::spawn(|| {
//Pass the particular thread the particular chunk it will operate on.
do_work(chunk, result, barrier);
}));
}
handles
}
fn main() {
let mut work: Vec<f64> = Vec::with_capacity(WORK_SIZE);
let mut result = Arc::new(Mutex::new(0.0));
for i in 0..WORK_SIZE {
work.push(i as f64);
}
let work_barrier = Arc::new(Barrier::new(NUM_THREADS + 1));
let threads = init(work.chunks(NUM_TASKS_PER_THREAD), &result, &work_barrier);
loop {
work_barrier.wait();
//actual code base would do something with summation stored in result.
println!("{:?}", result.lock().unwrap());
}
}
I hope this expresses the intent clearly enough of what I need to do. The issue with this specific implementation is that the chunks don't seem to live long enough and when I tried wrapping them in an Arc as it just moved the argument doesn't live long enough to the Arc::new(data.chunk(_)) line.

use std::sync::{Arc, Barrier, Mutex};
use std::thread;
pub const WORK_SIZE: usize = 524_288;
pub const NUM_THREADS: usize = 6;
pub const NUM_TASKS_PER_THREAD: usize = WORK_SIZE / NUM_THREADS;
//simplified version of what actual work that code base will do
fn do_work(data: &[f64], result: Arc<Mutex<f64>>, barrier: Arc<Barrier>) {
loop {
barrier.wait();
let sum = data.iter().sum::<f64>();
*result.lock().unwrap() += sum;
}
}
fn init(
work: Vec<f64>,
result: Arc<Mutex<f64>>,
barrier: Arc<Barrier>,
) -> Vec<thread::JoinHandle<()>> {
let mut handles = Vec::with_capacity(NUM_THREADS);
//spawn threads, in actual code these would be stored in a lib crate struct
for i in 0..NUM_THREADS {
let slice = work[i * NUM_TASKS_PER_THREAD..(i + 1) * NUM_TASKS_PER_THREAD].to_owned();
let result = Arc::clone(&result);
let w = Arc::clone(&barrier);
handles.push(thread::spawn(move || {
do_work(&slice, result, w);
}));
}
handles
}
fn main() {
let mut work: Vec<f64> = Vec::with_capacity(WORK_SIZE);
let result = Arc::new(Mutex::new(0.0));
for i in 0..WORK_SIZE {
work.push(i as f64);
}
let work_barrier = Arc::new(Barrier::new(NUM_THREADS + 1));
let _threads = init(work, Arc::clone(&result), Arc::clone(&work_barrier));
loop {
thread::sleep(std::time::Duration::from_secs(3));
work_barrier.wait();
//actual code base would do something with summation stored in result.
println!("{:?}", result.lock().unwrap());
}
}

Pushing data back upstream from thread

I'm completely open to any and all critiques. I'm completely new to multi-threading and the docs don't make much sense to me.
The code provided is min-reproducible. The matrix I'm using is much more elaborate, but this works. The point is to use a global process for various vectors within the structs concurrently, then push all the data upstream and back into the global.
I'm receiving an error at line 59 the first unwrap(). Is there a better way to approach this problem altogether???
use std::{
sync::mpsc::channel,
sync::{Arc, Mutex},
thread,
};
#[derive(Debug, Clone)]
struct Data {
block: Vec<List>,
}
#[derive(Debug, Clone)]
struct List {
multiplier: f64,
num: f64,
}
static mut IterationData: Data = Data { block: Vec::new() };
const N: u128 = 100;
fn main() {
unsafe {
for n in 0..N {
let mut number = 0.0;
let list = List {
multiplier: n as f64,
num: 1.0,
};
IterationData.block.push(list);
}
let mut data = Arc::new(Mutex::new(IterationData.clone()));
let (tx, rx) = channel();
for n in 0..IterationData.block.len() {
let (data, tx) = (Arc::clone(&data), tx.clone());
thread::spawn(move || {
let mut data = data.lock().unwrap();
let mut data_2 = Arc::new(Mutex::new(IterationData.block[n].clone()));
let (t, r) = channel();
for m in 0..IterationData.block.len() {
let (data_2, t) = (Arc::clone(&data_2), t.clone());
thread::spawn(move || {
let mut data_2 = data_2.lock().unwrap();
data_2.num += N as f64;
if m + 1 == IterationData.block.len() {
t.send(()).unwrap();
}
});
}
r.recv().unwrap();
let test_2 = Arc::try_unwrap(data_2).unwrap().into_inner().unwrap();
data.block[n] = test_2;
if n + 1 == IterationData.block.len() {
tx.send(()).unwrap();
}
});
}
rx.recv().unwrap();
let test_1 = Arc::try_unwrap(data).unwrap().into_inner().unwrap();
IterationData = test_1;
}
}

How to use a clone in a Rust thread

In this rust program, inside the run function, I am trying to pass the "pair_clone" as a parameter for both threads but I keep getting a mismatched type error? I thought I was passing the pair but it says I'm passing an integer instead.
use std::sync::{Arc, Mutex, Condvar};
fn producer(pair: &(Mutex<bool>, Condvar), num_of_loops: u32) {
let (mutex, cv) = pair;
//prints "producing"
}
}
fn consumer(pair: &(Mutex<bool>, Condvar), num_of_loops: u32) {
let (mutex, cv) = pair;
//prints "consuming"
}
}
pub fn run() {
println!("Main::Begin");
let num_of_loops = 5;
let num_of_threads = 4;
let mut array_of_threads = vec!();
let pair = Arc ::new((Mutex::new(true), Condvar::new()));
for pair in 0..num_of_threads {
let pair_clone = pair.clone();
array_of_threads.push(std::thread::spawn( move || producer(&pair_clone, num_of_loops)));
array_of_threads.push(std::thread::spawn( move || consumer(&pair_clone, num_of_loops)));
}
for i in array_of_threads {
i.join().unwrap();
}
println!("Main::End");
}

You have two main errors
The first: you are using the name of the pair as the loop index. This makes pair be the integer the compiler complains about.
The second: you are using one copy while you need two, one for the producer and the other for the consumer
After Edit
use std::sync::{Arc, Mutex, Condvar};
fn producer(pair: &(Mutex<bool>, Condvar), num_of_loops: u32) {
let (mutex, cv) = pair;
//prints "producing"
}
fn consumer(pair: &(Mutex<bool>, Condvar), num_of_loops: u32) {
let (mutex, cv) = pair;
//prints "consuming"
}
pub fn run() {
println!("Main::Begin");
let num_of_loops = 5;
let num_of_threads = 4;
let mut array_of_threads = vec![];
let pair = Arc ::new((Mutex::new(true), Condvar::new()));
for _ in 0..num_of_threads {
let pair_clone1 = pair.clone();
let pair_clone2 = pair.clone();
array_of_threads.push(std::thread::spawn( move || producer(&pair_clone1, num_of_loops)));
array_of_threads.push(std::thread::spawn( move || consumer(&pair_clone2, num_of_loops)));
}
for i in array_of_threads {
i.join().unwrap();
}
println!("Main::End");
}
Demo
Note that I haven't given any attention to the code quality. just fixed the compile errors.

Application on OSX cannot spawn more than 2048 threads

I have a Rust application on on OSX firing up a large amount of threads as can be seen in the code below, however, after looking at how many max threads my version of OSX is allowed to create via the sysctl kern.num_taskthreads command, I can see that it is kern.num_taskthreads: 2048 which explains why I can't spin up over 2048 threads.
How do I go about getting past this hard limit?
let threads = 300000;
let requests = 1;
for _x in 0..threads {
println!("{}", _x);
let request_clone = request.clone();
let handle = thread::spawn(move || {
for _y in 0..requests {
request_clone.lock().unwrap().push((request::Request::new(request::Request::create_request())));
}
});
child_threads.push(handle);
}

Before starting, I'd encourage you to read about the C10K problem. When you get into this scale, there's a lot more things you need to keep in mind.
That being said, I'd suggest looking at mio...
a lightweight IO library for Rust with a focus on adding as little overhead as possible over the OS abstractions.
Specifically, mio provides an event loop, which allows you to handle a large number of connections without spawning threads. Unfortunately, I don't know of a HTTP library that currently supports mio. You could create one and be a hero to the Rust community!

Not sure how helpful this will be, but I was trying to create a small pool of threads that will create connections and then send them over to an event loop via a channel for reading.
I'm sure this code is probably pretty bad, but here it is anyways for examples. It uses the Hyper library, like you mentioned.
extern crate hyper;
use std::io::Read;
use std::thread;
use std::thread::{JoinHandle};
use std::sync::{Arc, Mutex};
use std::sync::mpsc::channel;
use hyper::Client;
use hyper::client::Response;
use hyper::header::Connection;
const TARGET: i32 = 100;
const THREADS: i32 = 10;
struct ResponseWithString {
index: i32,
response: Response,
data: Vec<u8>,
complete: bool
}
fn main() {
// Create a client.
let url: &'static str = "http://www.gooogle.com/";
let mut threads = Vec::<JoinHandle<()>>::with_capacity((TARGET * 2) as usize);
let conn_count = Arc::new(Mutex::new(0));
let (tx, rx) = channel::<ResponseWithString>();
for _ in 0..THREADS {
// Move var references into thread context
let conn_count = conn_count.clone();
let tx = tx.clone();
let t = thread::spawn(move || {
loop {
let idx: i32;
{
// Lock, increment, and release
let mut count = conn_count.lock().unwrap();
*count += 1;
idx = *count;
}
if idx > TARGET {
break;
}
let mut client = Client::new();
// Creating an outgoing request.
println!("Creating connection {}...", idx);
let res = client.get(url) // Get URL...
.header(Connection::close()) // Set headers...
.send().unwrap(); // Fire!
println!("Pushing response {}...", idx);
tx.send(ResponseWithString {
index: idx,
response: res,
data: Vec::<u8>::with_capacity(1024),
complete: false
}).unwrap();
}
});
threads.push(t);
}
let mut responses = Vec::<ResponseWithString>::with_capacity(TARGET as usize);
let mut buf: [u8; 1024] = [0; 1024];
let mut completed_count = 0;
loop {
if completed_count >= TARGET {
break; // No more work!
}
match rx.try_recv() {
Ok(r) => {
println!("Incoming response! {}", r.index);
responses.push(r)
},
_ => { }
}
for r in &mut responses {
if r.complete {
continue;
}
// Read the Response.
let res = &mut r.response;
let data = &mut r.data;
let idx = &r.index;
match res.read(&mut buf) {
Ok(i) => {
if i == 0 {
println!("No more data! {}", idx);
r.complete = true;
completed_count += 1;
}
else {
println!("Got data! {} => {}", idx, i);
for x in 0..i {
data.push(buf[x]);
}
}
}
Err(e) => {
panic!("Oh no! {} {}", idx, e);
}
}
}
}
}

Finding a way to solve "...does not live long enough"

I'm building a multiplex in rust. It's one of my first applications and a great learning experience!
However, I'm facing a problem and I cannot find out how to solve it in rust:
Whenever a new channel is added to the multiplex, I have to listen for data on this channel.
The new channel is allocated on the stack when it is requested by the open() function.
However, this channel must not be allocated on the stack but on the heap somehow, because it should stay alive and should not be freed in the next iteration of my receiving loop.
Right now my code looks like this (v0.10-pre):
extern crate collections;
extern crate sync;
use std::comm::{Chan, Port, Select};
use std::mem::size_of_val;
use std::io::ChanWriter;
use std::io::{ChanWriter, PortReader};
use collections::hashmap::HashMap;
use sync::{rendezvous, SyncPort, SyncChan};
use std::task::try;
use std::rc::Rc;
struct MultiplexStream {
internal_port: Port<(u32, Option<(Port<~[u8]>, Chan<~[u8]>)>)>,
internal_chan: Chan<u32>
}
impl MultiplexStream {
fn new(downstream: (Port<~[u8]>, Chan<~[u8]>)) -> ~MultiplexStream {
let (downstream_port, downstream_chan) = downstream;
let (p1, c1): (Port<u32>, Chan<u32>) = Chan::new();
let (p2, c2):
(Port<(u32, Option<(Port<~[u8]>, Chan<~[u8]>)>)>,
Chan<(u32, Option<(Port<~[u8]>, Chan<~[u8]>)>)>) = Chan::new();
let mux = ~MultiplexStream {
internal_port: p2,
internal_chan: c1
};
spawn(proc() {
let mut pool = Select::new();
let mut by_port_num = HashMap::new();
let mut by_handle_id = HashMap::new();
let mut handle_id2port_num = HashMap::new();
let mut internal_handle = pool.handle(&p1);
let mut downstream_handle = pool.handle(&downstream_port);
unsafe {
internal_handle.add();
downstream_handle.add();
}
loop {
let handle_id = pool.wait();
if handle_id == internal_handle.id() {
// setup new port
let port_num: u32 = p1.recv();
if by_port_num.contains_key(&port_num) {
c2.send((port_num, None))
}
else {
let (p1_,c1_): (Port<~[u8]>, Chan<~[u8]>) = Chan::new();
let (p2_,c2_): (Port<~[u8]>, Chan<~[u8]>) = Chan::new();
/********************************/
let mut h = pool.handle(&p1_); // <--
/********************************/
/* the error is HERE ^^^ */
/********************************/
unsafe { h.add() };
by_port_num.insert(port_num, c2_);
handle_id2port_num.insert(h.id(), port_num);
by_handle_id.insert(h.id(), h);
c2.send((port_num, Some((p2_,c1_))));
}
}
else if handle_id == downstream_handle.id() {
// demultiplex
let res = try(proc() {
let mut reader = PortReader::new(downstream_port);
let port_num = reader.read_le_u32().unwrap();
let data = reader.read_to_end().unwrap();
return (port_num, data);
});
if res.is_ok() {
let (port_num, data) = res.unwrap();
by_port_num.get(&port_num).send(data);
}
else {
// TODO: handle error
}
}
else {
// multiplex
let h = by_handle_id.get_mut(&handle_id);
let port_num = handle_id2port_num.get(&handle_id);
let port_num = *port_num;
let data = h.recv();
try(proc() {
let mut writer = ChanWriter::new(downstream_chan);
writer.write_le_u32(port_num);
writer.write(data);
writer.flush();
});
// todo check if chan was closed
}
}
});
return mux;
}
fn open(self, port_num: u32) -> Result<(Port<~[u8]>, Chan<~[u8]>), ()> {
let res = try(proc() {
self.internal_chan.send(port_num);
let (n, res) = self.internal_port.recv();
assert!(n == port_num);
return res;
});
if res.is_err() {
return Err(());
}
let res = res.unwrap();
if res.is_none() {
return Err(());
}
let (p,c) = res.unwrap();
return Ok((p,c));
}
}
And the compiler raises this error:
multiplex_stream.rs:81:31: 81:35 error: `p1_` does not live long enough
multiplex_stream.rs:81 let mut h = pool.handle(&p1_);
^~~~
multiplex_stream.rs:48:16: 122:4 note: reference must be valid for the block at 48:15...
multiplex_stream.rs:48 spawn(proc() {
multiplex_stream.rs:49 let mut pool = Select::new();
multiplex_stream.rs:50 let mut by_port_num = HashMap::new();
multiplex_stream.rs:51 let mut by_handle_id = HashMap::new();
multiplex_stream.rs:52 let mut handle_id2port_num = HashMap::new();
multiplex_stream.rs:53
...
multiplex_stream.rs:77:11: 87:7 note: ...but borrowed value is only valid for the block at 77:10
multiplex_stream.rs:77 else {
multiplex_stream.rs:78 let (p1_,c1_): (Port<~[u8]>, Chan<~[u8]>) = Chan::new();
multiplex_stream.rs:79 let (p2_,c2_): (Port<~[u8]>, Chan<~[u8]>) = Chan::new();
multiplex_stream.rs:80
multiplex_stream.rs:81 let mut h = pool.handle(&p1_);
multiplex_stream.rs:82 unsafe { h.add() };
Does anyone have an idea how to solve this issue?

The problem is that the new channel that you create does not live long enough—its scope is that of the else block only. You need to ensure that it will live longer—its scope must be at least that of pool.
I haven't made the effort to understand precisely what your code is doing, but what I would expect to be the simplest way to ensure the lifetime of the ports is long enough is to place it into a vector at the same scope as pool, e.g. let ports = ~[];, inserting it with ports.push(p1_); and then taking the reference as &ports[ports.len() - 1]. Sorry, that won't cut it—you can't add new items to a vector while references to its elements are active. You'll need to restructure things somewhat if you want that appraoch to work.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

How do I avoid obfuscating logic in a `loop`? - rust

I agree with Daniel's point that this should be a while rather than a loop, though I'd move the logic to the while rather than creating a boolean: let mut len = limit; while len >= limit { let response = queryItems(limit).await?; len = response.len(); items.extend(response); }

Without context it's hard to advise ideal code. I would do: fn my_body_is_ready() -> Vec<u32> { let mut acc = vec![]; let min = 10; loop { let foo = vec![42]; if foo.len() < min { acc.extend(foo); break acc; } else { acc.extend(foo); } } }

Related

How to create threads that last entire duration of program and pass immutable chunks for threads to operate on?

Pushing data back upstream from thread

How to use a clone in a Rust thread

Application on OSX cannot spawn more than 2048 threads

Finding a way to solve "...does not live long enough"

Categories

Resources