Multithreaded branch and bound search in Rust

I am trying to solve a mathematical minimization problem using a branch and bound algorithm, in Rust. For simplicity, let's say that we're trying to find the objective value only.
Functional
Solving the problem requires solving an easier version ("the relaxation"), which then may give rise to precisely two subproblems. Solving a relaxation takes a long time and needs to happen on a separate thread.
The difficulty is that a problem's subproblems are known only after the problem is solved. As such, the binary tree in which the problem and its subproblems live grows during the computation. Moreover, solving a relaxation might result in nodes of the tree getting pruned. For each known problem, a sizable table has to be stored. Because of limited memory capacity, I want to search this tree in depth-first order.
Non-functional
Performance of the tree doesn't matter as much; the vast majority of the time will be spent solving the relaxations. I would like to avoid using manual relative references and similar constructs, and instead use Rust's toolbox of references to solve this problem. As a bonus, I'd like to capture the life-cycle of the problem-nodes in the type system:
1. The problem is known
2. It is being solved
3. The relaxation is solved, and:
   - If the problem turns out to be feasible:
     - If the objective value of the relaxation is below a global maximum, the subproblems are computed
     - If not, the problem is marked suboptimal; further subproblems are irrelevant
   - If not, the problem is marked infeasible; further subproblems are irrelevant
4. Both subproblems are solved and only the objective value of this problem is stored
Example of attempt
I've attempted several approaches, but I keep running into problems. My latest approach is best summarized by the definition of the node in the tree. The problem data is stored in the Tableau.
enum Node<'a, T, TP> {
    /// The problem to solve. Nothing is known.
    Undecided {
        parent: Option<rc::Weak<Self>>,
        depth: u64,
        tableau: Tableau<'a, T, TP>,
    },
    /// Being calculated
    ActiveCalculation {
        parent: Option<rc::Weak<Self>>,
        depth: u64,
        tableau: Arc<Mutex<Tableau<'a, T, TP>>>,
        sender_to_active_thread: Sender<PruneReason>,
    },
    /// The problem is solved, and the children (if any) should be created while this variant is
    /// being instantiated.
    NodeOptimal {
        parent: Option<Weak<Self>>,
        relaxation_value: f64,
        best_lower_bound: Cell<Option<f64>>,
        lower: Rc<Self>,
        upper: Rc<Self>,
    },
    /// This problem and all generated subproblems are solved.
    SubTreeOptimal {
        lower_bound: f64,
    },
    /// Pruned.
    Pruned(PruneReason), // e.g. SubOptimal, Infeasible
}
I tried to manage the tree with the main thread, while providing worker threads with an Arc to the problem data. The sender_to_active_thread field on the ActiveCalculation variant is used to terminate a calculation, when newly found information determines that the calculation could only yield a suboptimal result.
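For illustration, here is a minimal, compiling sketch (the solve_with_cancellation function is hypothetical; only the PruneReason name is taken from the node definition above) of how a worker could poll such a channel between iterations of the expensive solve:

use std::sync::mpsc::{Receiver, TryRecvError};

enum PruneReason {
    SubOptimal,
    Infeasible,
}

// Hypothetical worker loop: between steps of the expensive relaxation
// solve, poll the channel to see whether the main thread asked us to stop.
fn solve_with_cancellation(termination: &Receiver<PruneReason>) -> Option<f64> {
    let mut objective = 0.0;
    for _step in 0..1_000 {
        match termination.try_recv() {
            // Told to stop, or the main thread is gone: abandon the work.
            Ok(_) | Err(TryRecvError::Disconnected) => return None,
            Err(TryRecvError::Empty) => {}
        }
        objective += 1.0; // stand-in for one step of solving the relaxation
    }
    Some(objective)
}

fn main() {
    let (_sender, receiver) = std::sync::mpsc::channel::<PruneReason>();
    // The sender stays alive and silent, so the solve runs to completion.
    assert_eq!(solve_with_cancellation(&receiver), Some(1000.0));
}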
The problem with the above attempt is that I don't know how to update the tree once a solution is found. Below is the code that fetches the next problem from the tree, hands it off to a thread, and processes the result:
let (solution_sender, solution_receiver) = channel();
// Stop condition
while !tree.finished() {
    let mut possible_next_problem = tree.next_problem();
    // Wait condition: all threads busy, or no unsolved problem available yet
    while active_threads == max_threads || possible_next_problem.is_none() {
        // Wait for a signal; block until a thread has terminated
        let (solved_problem, result) = solution_receiver.recv().unwrap();
        active_threads -= 1;
        let new_node = match result {
            None => Node::Pruned(PruneReason::Infeasible),
            Some((solution, objective_value)) => {
                unimplemented!()
            }
        };
        tree.update(solved_problem, new_node);
        possible_next_problem = tree.next_problem();
    }
    // Assumed to be of `Undecided` variant
    let next_problem = possible_next_problem.unwrap();
    let solution_sender_clone = solution_sender.clone();
    let (termination_sender, termination_receiver) = channel();
    *next_problem = next_problem.undecided_to_active_calculation(termination_sender);
    let pointer_to_problem_in_tree = next_problem.clone();
    if let Node::ActiveCalculation { tableau, .. } = *next_problem {
        thread::spawn(move || {
            let result = solve_in_separate_thread(
                &mut *tableau.lock().expect("Should be of variant `Undecided`"),
                termination_receiver,
            );
            solution_sender_clone.send((pointer_to_problem_in_tree, result)).unwrap();
        });
    } else {
        panic!("Should be of variant `ActiveCalculation`.")
    };
}
The compiler tells me that just moving an Arc<Node> to the thread (and sending it to the main thread again) requires that Node and all its fields are Sync.
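For what it's worth, a reduced, compiling sketch (a hypothetical two-field Node, not the full enum above) of the direction such an error usually points in: Arc<T> only implements Send when T: Send + Sync, so every field of Node must be thread-safe; swapping rc::Weak for std::sync::Weak (and Rc/Cell fields for Arc/Mutex or atomics) satisfies that bound:

use std::sync::{Arc, Weak};
use std::thread;

// With only thread-safe fields, `Node` is automatically Send + Sync,
// so an `Arc<Node>` may be moved into a spawned thread.
struct Node {
    parent: Option<Weak<Node>>,
    depth: u64,
}

fn main() {
    let node = Arc::new(Node { parent: None, depth: 0 });
    let handle = thread::spawn(move || node.depth);
    assert_eq!(handle.join().unwrap(), 0);
}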

Related

How can I mutate a shared variable from multiple threads, disregarding data races?

How can I mutate the variable i inside the closure? Race conditions are considered to be acceptable.
use rayon::prelude::*;

fn main() {
    let mut i = 0;
    let mut closure = |_| {
        i = i + 1;
    };
    (0..100).into_par_iter().for_each(closure);
}
This code fails with:
error[E0525]: expected a closure that implements the `Fn` trait, but this closure only implements `FnMut`
--> src\main.rs:6:23
|
6 | let mut closure = |_| {
| ^^^ this closure implements `FnMut`, not `Fn`
7 | i = i + 1;
| - closure is `FnMut` because it mutates the variable `i` here
...
10 | (0..100).into_par_iter().for_each(closure);
| -------- the requirement to implement `Fn` derives from here
There is a difference between a race condition and a data race.
A race condition is any situation when the outcome of two or more events depends on which one happens first, and nothing enforces a relative ordering between them. This can be fine, and as long as all possible orderings are acceptable, you may accept that your code has a race in it.
A data race is a specific kind of race condition where the events are unsynchronized accesses to the same memory and at least one of them is a mutation. Data races are undefined behavior. You cannot "accept" a data race because its existence invalidates the entire program; a program with an unavoidable data race in it does not have any defined behavior at all, so it does nothing useful.
Here's a version of your code that has a race condition, but not a data race:
use rayon::prelude::*;
use std::sync::atomic::{AtomicI32, Ordering};

fn main() {
    let i = AtomicI32::new(0);
    let closure = |_| {
        i.store(i.load(Ordering::Relaxed) + 1, Ordering::Relaxed);
    };
    (0..100).into_par_iter().for_each(closure);
}
Because the loads and stores are not ordered with respect to the concurrently executing threads, there is no guarantee that the final value of i will be exactly 100. It could be 99, or 72, or 41, or even 1. This code has indeterminate, but defined behavior because although you don't know the exact order of events or the final outcome, you can still reason about its behavior. In this case, you can prove that the final value of i must be at least 1 and no greater than 100.
Note that in order to write this racy code, I still had to use AtomicI32 and atomic load and store. Not caring about the order of events in different threads doesn't free you from having to think about synchronizing memory access.
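For contrast, here is a sketch of the same loop using a single atomic read-modify-write instead of the separate load and store. The increments can no longer lose updates, so the final value is guaranteed to be exactly 100, even though the ordering between threads is still unconstrained:

use rayon::prelude::*;
use std::sync::atomic::{AtomicI32, Ordering};

fn main() {
    let i = AtomicI32::new(0);
    (0..100).into_par_iter().for_each(|_| {
        // One indivisible fetch-and-add per iteration: no update is lost.
        i.fetch_add(1, Ordering::Relaxed);
    });
    assert_eq!(i.load(Ordering::Relaxed), 100);
}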
If your original code compiled, it would have a data race.¹ This means there are no guarantees about its behavior at all. So, assuming you actually accept data races, here's a version of your code that is consistent with what a compiler is allowed to do with it:
fn main() {}
Oh, right, undefined behavior must never occur. So this hypothetical compiler just deleted all your code because it is never allowed to run in the first place.
It's actually even worse than that. Suppose you had written something like this:
use rayon::prelude::*;

fn main() {
    let mut i = 0;
    let mut closure = |_| {
        i = i + 1;
    };
    (0..100).into_par_iter().for_each(closure);
    if i < 100 || i >= 100 {
        println!("this should always print");
    } else {
        println!("this should never print");
    }
}
What should this code print? If there are no data races, this code must emit the following:
this should always print
But if we allow data races, it might also print this:
this should never print
Or it could even print this:
this should never print
this should always print
If you think there is no way it could do the last thing, you are wrong. Undefined behavior in a program cannot be accepted, because it invalidates analysis even of correct code that has nothing obvious to do with the original error.
How likely is any of this to happen, if you just use unsafe and ignore the possibility of a data race? Well, probably not very likely, to be honest. If you use unsafe to bypass the checks and look at the generated assembly, it's likely to even be correct. But the only way to be sure is to write in assembly language directly, where you understand and code to the machine's model. If you want to use Rust, you have to code to Rust's model, even if that means you lose a little performance.
How much performance? Probably not much, if anything. Atomic operations are very efficient and on many architectures, including the one you're probably using right now to read this, they actually are exactly as fast as non-atomic operations in cases like this. If you really want to know how much potential performance you lose, write both versions and benchmark them, or simply compare the assembly code with and without atomic operations.
¹ Technically, we can't say that a data race must occur, because it depends on whether any threads actually access i at the same time or not. If for_each decided for some reason to run all the closures on the same OS thread, for example, this code would not have a data race. But the fact that it may have a data race still poisons our analysis because we can't be sure it doesn't.
You cannot do exactly that; you need to ensure that some safe synchronisation happens in the layers underneath, for example by using an Arc plus some kind of atomic operations.
You have some examples in the documentation:
use std::sync::Arc;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

let val = Arc::new(AtomicUsize::new(5));

for _ in 0..10 {
    let val = Arc::clone(&val);

    thread::spawn(move || {
        let v = val.fetch_add(1, Ordering::SeqCst);
        println!("{:?}", v);
    });
}
(As Adien4 points out, there is no need for the Arc or the move in the second example: Rayon only requires the closure to be Send + Sync.)
Which leads us to your example, which could be adapted as:
use std::sync::atomic::{AtomicUsize, Ordering};
use rayon::prelude::*;

fn main() {
    let i = AtomicUsize::new(5);
    let closure = |_| {
        i.fetch_add(1, Ordering::SeqCst);
    };
    (0..100).into_par_iter().for_each(closure);
}
This is not possible as it would require unsynchronized parallel mutable access to i, which is a data race. You can use a Mutex to allow access from multiple threads, as in the sketch below.
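A minimal sketch of that suggestion (using rayon, as in the question), which makes every increment happen under a lock:

use rayon::prelude::*;
use std::sync::Mutex;

fn main() {
    let i = Mutex::new(0);
    (0..100).into_par_iter().for_each(|_| {
        // The lock serializes the increments, so none is lost.
        *i.lock().unwrap() += 1;
    });
    assert_eq!(i.into_inner().unwrap(), 100);
}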
The accepted answer explains the situation thoroughly - you definitely don't want data races in your code, because they are undefined behavior and distinct from the more general "race conditions". Nor do you need data races to update shared data; there are better, efficient ways to do that. But to satisfy curiosity, this answer attempts to answer the question as literally asked - if you were reckless enough to intentionally ignore data races and incur undefined behavior at your own peril, could you do it in unsafe Rust?
You indeed can. The code and discussion in this answer are provided for educational purposes, such as to check what kind of code the compiler generates. If code that intentionally incurs UB offends you, please stop reading here. You've been warned. :)
The obvious way to convince Rust to allow this data race is to create a raw *mut pointer to i, send the pointer to the closure, and dereference it to mutate i. This dereference is unsafe because it leaves it to the programmer to ensure that no mutable references exist simultaneously, and that writes to the underlying data are synchronized with other accesses to it. While we can easily ensure the former by just not creating a reference, we obviously won't ensure the latter:
use rayon::prelude::*;

// Must wrap the raw pointer in a type that implements Sync.
struct Wrap(*mut i32);
unsafe impl Sync for Wrap {}

// Contains undefined behavior - don't use this!
fn main() {
    let mut i = 0;
    let i_ptr = Wrap(&mut i as *mut i32);
    let closure = |_| {
        unsafe { *i_ptr.0 = *i_ptr.0 + 1 }; // XXX: UB!
    };
    (0..100).into_par_iter().for_each(closure);
    println!("{}", i);
}
Note that raw pointers don't implement Sync or Send, so they require a wrapper to be usable across threads. The wrapper unsafely implements Sync, but this unsafe is actually not UB - accessing the pointer itself is safe, and there would be no UB if we, say, only printed it, or even dereferenced it for reading (as long as no one else writes to i). Writing through the dereferenced pointer is where we create UB, and that itself requires unsafe.
While this is the kind of code the OP might have been after (it even prints 100 when run), it is of course still undefined behavior, and could break on different hardware, or when upgraded to a different compiler. Making even a slight change to the code, such as creating a mutable reference with let i_ref = unsafe { &mut *i_ptr.0 } and updating it with *i_ref += 1, can make it change behavior.
In the context of C++11 Hans Boehm wrote an entire article on the danger of so-called "benign" data races, and why they cannot be allowed in the C++ memory model (which Rust shares).

Why do we need Rc<T> when immutable references can do the job?

To illustrate the necessity of Rc<T>, the Book presents the following snippet (spoiler: it won't compile) to show that we cannot enable multiple ownership without Rc<T>.
enum List {
    Cons(i32, Box<List>),
    Nil,
}

use crate::List::{Cons, Nil};

fn main() {
    let a = Cons(5, Box::new(Cons(10, Box::new(Nil))));
    let b = Cons(3, Box::new(a));
    let c = Cons(4, Box::new(a));
}
It then claims (emphasis mine)
We could change the definition of Cons to hold references instead, but then we would have to specify lifetime parameters. By specifying lifetime parameters, we would be specifying that every element in the list will live at least as long as the entire list. The borrow checker wouldn’t let us compile let a = Cons(10, &Nil); for example, because the temporary Nil value would be dropped before a could take a reference to it.
Well, not quite. The following snippet compiles under rustc 1.52.1:
enum List<'a> {
    Cons(i32, &'a List<'a>),
    Nil,
}

use crate::List::{Cons, Nil};

fn main() {
    let a = Cons(5, &Cons(10, &Nil));
    let b = Cons(3, &a);
    let c = Cons(4, &a);
}
Note that by taking a reference, we no longer need a Box<T> indirection to hold the nested List. Furthermore, I can point both b and c to a, which gives a multiple conceptual owners (which are actually borrowers).
Question: why do we need Rc<T> when immutable references can do the job?
With "ordinary" borrows you can very roughly think of a statically proven order-by-relationship, where the compiler needs to prove that the owner of something always comes to life before any borrows and always dies after all borrows died (a owns String, it comes to life before b which borrows a, then b dies, then a dies; valid). For a lot of use-cases, this can be done, which is Rust's insight to make the borrow-system practical.
There are cases where this can't be done statically. In the example you've given, you're sort of cheating, because all borrows have the 'static lifetime; 'static items can be "ordered" before or after anything out to infinity, so there actually is no constraint in the first place. The example becomes much more complex when you take different lifetimes (many List<'a>, List<'b>, etc.) into account. The issue becomes apparent when you try to pass values into functions and those functions try to add items: values created inside a function die when that function returns, so we cannot keep references to them afterwards, or there would be dangling references.
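A sketch of that failure mode (a hypothetical prepend function, not from the question; the compiler rejects it, which is the point):

enum List<'a> {
    Cons(i32, &'a List<'a>),
    Nil,
}

use crate::List::Cons;

// error[E0515]: cannot return value referencing local variable `head`
fn prepend<'a>(tail: &'a List<'a>) -> List<'a> {
    let head = Cons(0, tail);
    Cons(1, &head) // `head` dies when `prepend` returns
}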
Rc comes in when one can't prove statically who is the original owner, whose lifetime starts before any other and ends after any other(!). A classic example is a graph structure derived from user input, where multiple nodes can refer to one other node. They need to form a "born after, dies before" relationship with the node they are referencing at runtime, to guarantee that they never reference invalid data.

The Rc is a very simple solution to that, because a simple counter can represent these relationships. As long as the counter is not zero, some "born after, dies before" relationship is still active. The key insight here is that it does not matter in which order the nodes are created and die, because any order is valid. Only the points at either end - where the counter gets to 0 - actually matter; any increase or decrease in between is the same (0=+1+1+1-1-1-1=0 is the same as 0=+1+1-1+1-1-1=0).

The Rc is destroyed when the counter reaches zero. In the graph example, this is when a node is no longer referred to. This tells the owner of that Rc (the last node referring to it): "Oh, it turns out I am the owner of the underlying node - nobody knew! - and I get to destroy it".
Even single-threaded, there are still times the destruction order is determined dynamically, whereas for the borrow checker to work, there must be a determined lifetime tree (stack).
use std::rc::Rc;

fn run() {
    let writer = Rc::new(std::io::sink());
    let mut counters = vec![
        (7, Rc::clone(&writer)),
        (7, writer),
    ];
    while !counters.is_empty() {
        let idx = read_counter_index();
        counters[idx].0 -= 1;
        if counters[idx].0 == 0 {
            counters.remove(idx);
        }
    }
}

fn read_counter_index() -> usize {
    unimplemented!()
}
As you can see in this example, the order of destruction is determined by user input.
Another reason to use smart pointers is simplicity: satisfying the borrow checker does incur some code complexity. For example, with smart pointers you are able to maneuver around the self-referential struct problem with a tiny overhead.
use std::rc::Rc;

struct SelfRefButDynamic {
    a: Rc<u32>,
    b: Rc<u32>,
}

impl SelfRefButDynamic {
    pub fn new() -> Self {
        let a = Rc::new(0);
        let b = Rc::clone(&a);
        Self { a, b }
    }
}
This is not possible with static (compile-time) references:
struct WontDo {
    a: u32,
    b: &u32,
}

Searching for a String in a Vec<String> in Rust

I'm writing a program that interprets a language.
I need to search for a string (not known at compile time) in a Vec.
fn get_name_index(name: &String, array: &Vec<String>) -> usize {
    match array.binary_search(name) {
        Ok(index) => index,
        Err(_) => {
            eprintln!("Error : variable {:?} not found in name array", name);
            std::process::exit(1)
        }
    }
}
This happens multiple times during execution, but at the moment, the array.binary_search() function does not return the right answer.
I searched for the error, but my array is what it should be (printing each element, or examining with gdb: the same), and the error is still there.
Is there any other way to search for a String in a Vec<String>? Or is there an error in my code?
Thanks
First, a few issues: the data must be sorted before using a binary search. A binary search is a fast search algorithm (O(log n), scaling with the log of the size of the container), much faster than a linear search (O(n), scaling linearly with the size of the container). However, for a single search, any speed improvement from a binary search is dwarfed by the overhead of first sorting the container (O(n log n)).
Single Search
Therefore, the best approach depends on how often you search your container. If you are only going to check it a few times, you should use a linear search, as follows:
fn get_name_index(name: &String, array: &Vec<String>) -> Option<usize> {
    array.iter().position(|x| x == name)
}
Repeated Searches
If you are going to repeatedly call get_name_index, you should use a binary search (or possibly even better, below):
// Sort the array before using
array.sort_unstable();

// Repeatedly call this function
fn get_name_index(name: &String, array: &Vec<String>) -> Option<usize> {
    match array.binary_search(name) {
        Ok(index) => Some(index),
        Err(_) => None,
    }
}
However, this may be suboptimal for some cases. A few considerations: a HashSet may be faster for certain sets of data (O(1) complexity at its best). However, this is slightly misleading, since all the characters of the name must be processed on each compare for a HashSet, while generally only a few characters must be compared to determine whether to jump left or right for a binary search. For data that is highly uniform and mostly differs with a few characters at the end, a HashSet might be better, otherwise, I'd generally recommend using binary_search on the vector.
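For completeness, here is a sketch of the map-based alternative (hypothetical data; a HashMap from name to index rather than a bare HashSet, since the function needs to return a position). Note that the map is built once and returns positions in the original, unsorted vector, which a sort-then-binary-search approach would not:

use std::collections::HashMap;

fn main() {
    let array = vec!["x".to_string(), "y".to_string(), "_res".to_string()];

    // Build once: O(n). Each later lookup is O(1) on average and returns
    // the position in the *original, unsorted* vector.
    let index: HashMap<&str, usize> = array
        .iter()
        .enumerate()
        .map(|(i, name)| (name.as_str(), i))
        .collect();

    assert_eq!(index.get("y"), Some(&1));
    assert_eq!(index.get("missing"), None);
}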
As mcarton said, the vector needs to be sorted before you can do a binary search. Here's an example:
let mut v = vec![String::from("_res"), String::from("b"), String::from("a")];
println!("{:?}", &v);
v.sort_unstable();
println!("{:?}", &v);
I tried this with your code and it found "a" in the second position. Without the call to sort_unstable() it failed to find "a".

Does partial application in Rust have overhead?

I like using partial application because (among other things) it lets me split up a complicated function call, which is more readable.
An example of partial application:
fn add(x: i32, y: i32) -> i32 {
    x + y
}

fn main() {
    let add7 = |x| add(7, x);
    println!("{}", add7(35));
}
Is there overhead to this practice?
Here is the kind of thing I like to do (from real code):
fn foo(n: u32, mut things: Vec<Things>) {
    let create_new_multiplier = |thing| ThingMultiplier::new(thing, n); // ThingMultiplier is an Iterator
    let new_things = things.clone().into_iter().flat_map(create_new_multiplier);
    things.extend(new_things);
}
This is purely stylistic. I do not like nesting things too deeply.
There should not be a performance difference between defining the closure before it's used and defining and using it directly. There is a type-system difference - the compiler doesn't fully know how to infer types in a closure that isn't immediately called.
In code:
let create_new_multiplier = |thing| ThingMultiplier::new(thing, n);
things.clone().into_iter().flat_map(create_new_multiplier)
will be the exact same as
things.clone().into_iter().flat_map(|thing| {
    ThingMultiplier::new(thing, n)
})
In general, there should not be a performance cost for using closures. This is what Rust means by "zero cost abstraction": the programmer could not have written it better themselves.
The compiler converts a closure into implementations of the Fn* traits on an anonymous struct. At that point, all the normal compiler optimizations kick in. Because of techniques like monomorphization, it may even be faster. This does mean that you need to do normal profiling to see if they are a bottleneck.
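To make the first sentence concrete, here is roughly the shape of what the compiler generates for |x| add(7, x). Implementing the Fn traits by hand is unstable, so in this sketch an ordinary method stands in for the call operator:

fn add(x: i32, y: i32) -> i32 {
    x + y
}

// The closure `|x| add(7, x)` captures nothing, so its anonymous
// struct is zero-sized; calling it compiles down to a plain call.
struct Add7Closure;

impl Add7Closure {
    fn call(&self, x: i32) -> i32 {
        add(7, x)
    }
}

fn main() {
    let add7 = Add7Closure;
    assert_eq!(add7.call(35), 42);
}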
In your particular example, yes, extend can get inlined as a loop, containing another loop for the flat_map which in turn just puts ThingMultiplier instances into the same stack slots holding n and thing.
But you're barking up the wrong efficiency tree here. Instead of wondering whether the allocation of a small struct holding two fields gets optimized away, you should rather wonder how efficient that clone is, especially for large inputs.

Use reference to new struct in the struct initializer

I am trying to create a disjoint set structure in Rust. It looks like this
struct DisjointSet<'s> {
    id: usize,
    parent: &'s mut DisjointSet<'s>,
}
The default disjoint set is a singleton structure, in which the parent refers to itself. Hence, I would like to have the option to do the following:
let a: DisjointSet = DisjointSet {
    id: id,
    parent: self,
};
where self is a reference to the object that will be created.
I have tried working around this issue by creating a custom constructor. However, my attempts failed because partial initialization is not allowed. The compiler suggests using Option<DisjointSet<'s>>, but this is quite ugly. Do you have any suggestions?
My question differs from Structure containing fields that know each other
because I am interested in getting the reference to the struct that will be created.
As @delnan says, at their core these sorts of data structures are directed acyclic graphs (DAGs), with all the sharing that entails. Rust is strict about what sharing can happen, so it takes a bit of extra effort to convince the compiler to accept your code in this case.
Fortunately though, "all the sharing that entails" isn't literally "all the sharing": a DAG is acyclic (modulo wanting to have parent: self), so a reference counting type like Rc or Arc is a perfect way to handle the sharing (reference counting is not so good if there are cycles). Specifically:
struct DisjointSet {
    id: Cell<usize>,
    parent: Rc<DisjointSet>,
}
The Cell has zero runtime overhead (there is definitely some syntactic overhead) for such a small type.
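For reference, a tiny demonstration (not from the original answer) of the Cell API the id field relies on: mutation through a shared reference, with plain copies in and out:

use std::cell::Cell;

fn main() {
    let id = Cell::new(1usize);
    let shared = &id; // only a shared reference is needed
    shared.set(2); // copies the new value in; no lock, no runtime check
    assert_eq!(shared.get(), 2);
}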
Unfortunately, this still isn't quite right for the same reason that the compiler suggests using Option<...>. There's no way to create the first DisjointSet. However, the suggested fix still works:
struct DisjointSet {
    id: Cell<usize>,
    parent: Option<Rc<DisjointSet>>,
}
(The Option<...> is free: Option<Rc<...>> is a single nullable pointer, just like Rc<...> is a single non-nullable pointer, and presumably one would need a branch on "do I have a parent or not" anyway.)
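That claim is easy to check on current rustc: the all-zero bit pattern is never a valid Rc, so None can reuse it.

use std::mem::size_of;
use std::rc::Rc;

fn main() {
    // The niche optimization makes the `Option` wrapper free in space.
    assert_eq!(size_of::<Option<Rc<u32>>>(), size_of::<Rc<u32>>());
}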
If you are going to take this approach, I would recommend not trying to use the Option for partial initialisation, but instead use it to represent the fact that the given set is a "root". It is easy to traverse up a chain with this representation, e.g.
fn find_root(mut x: &DisjointSet) -> &DisjointSet {
    while let Some(ref parent) = x.parent {
        x = parent
    }
    x
}
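Putting the pieces together, a self-contained usage sketch (hypothetical values; the assert just exercises the traversal):

use std::cell::Cell;
use std::rc::Rc;

struct DisjointSet {
    id: Cell<usize>,
    parent: Option<Rc<DisjointSet>>,
}

fn find_root(mut x: &DisjointSet) -> &DisjointSet {
    while let Some(ref parent) = x.parent {
        x = parent;
    }
    x
}

fn main() {
    let root = Rc::new(DisjointSet { id: Cell::new(0), parent: None });
    let child = DisjointSet {
        id: Cell::new(1),
        parent: Some(Rc::clone(&root)),
    };
    // Following `parent` links ends at the node whose parent is `None`.
    assert_eq!(find_root(&child).id.get(), 0);
}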
The same approach should work fine with references, but the lifetimes can often be hard to juggle.
