Rust SeqCst ordering - multithreading

I'm trying to understand how Ordering::SeqCst works. For that I have a few code examples where this ordering is mandatory for obtaining a consistent result. In the first example we just want to increment a counter variable:
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::thread::spawn;

let a: &'static _ = Box::leak(Box::new(AtomicBool::new(false)));
let b: &'static _ = Box::leak(Box::new(AtomicBool::new(false)));
let counter: &'static _ = Box::leak(Box::new(AtomicUsize::new(0)));
let _thread_a = spawn(move || a.store(true, Ordering::Release));
let _thread_b = spawn(move || b.store(true, Ordering::Release));
let thread_1 = spawn(move || {
    while !a.load(Ordering::Acquire) {} // prevents reordering of everything after it
    if b.load(Ordering::Relaxed) { // no need for Acquire due to the previous restriction
        counter.fetch_add(1, Ordering::Relaxed);
    }
});
let thread_2 = spawn(move || {
    while !b.load(Ordering::Acquire) {} // prevents reordering of everything after it
    if a.load(Ordering::Relaxed) { // no need for Acquire due to the previous restriction
        counter.fetch_add(1, Ordering::Relaxed);
    }
});
thread_1.join().unwrap();
thread_2.join().unwrap();
println!("{}", counter.load(Ordering::Relaxed));
Possible values of counter in this example are 1 or 2, depending on thread scheduling. But surprisingly, 0 is also possible, and I don't understand how.
If thread_1 starts and only a has been set to true by _thread_a, counter will be left untouched when thread_1 exits.
If thread_2 runs after thread_1, counter will be incremented once, because thread_1 has finished (so we know a is already true) and thread_2 just has to wait for b to become true.
Or, if thread_2 runs first and b was already set to true, counter will also be incremented only once.
There is also the possibility that _thread_a and _thread_b both run before thread_1 and thread_2, and then both of them will increment counter. So that's why 1 and 2 are valid outcomes for counter. But as I said, 0 is also a possible result, unless I replace all loads and stores of a and b with Ordering::SeqCst:
let _thread_a = spawn(move || a.store(true, Ordering::SeqCst));
let _thread_b = spawn(move || b.store(true, Ordering::SeqCst));
let thread_1 = spawn(move || {
    while !a.load(Ordering::SeqCst) {}
    if b.load(Ordering::SeqCst) {
        counter.fetch_add(1, Ordering::Relaxed);
    }
});
let thread_2 = spawn(move || {
    while !b.load(Ordering::SeqCst) {}
    if a.load(Ordering::SeqCst) {
        counter.fetch_add(1, Ordering::Relaxed);
    }
});
thread_1.join().unwrap();
thread_2.join().unwrap();
println!("{}", counter.load(Ordering::SeqCst));
Now 0 isn't possible, but I don't know why.
The second example was taken from here:
use std::sync::atomic::AtomicBool;
use std::sync::atomic::Ordering::SeqCst;
use std::thread;

static A: AtomicBool = AtomicBool::new(false);
static B: AtomicBool = AtomicBool::new(false);
static mut S: String = String::new();

fn main() {
    let a = thread::spawn(|| {
        A.store(true, SeqCst);
        if !B.load(SeqCst) {
            unsafe { S.push('!') };
        }
    });
    let b = thread::spawn(|| {
        B.store(true, SeqCst);
        if !A.load(SeqCst) {
            unsafe { S.push('!') };
        }
    });
    a.join().unwrap();
    b.join().unwrap();
}
Threads a and b could start at the same time and both modify A and B, so neither of them will modify S. Or one of them could run before the other and modify S, leaving the other thread with S unmodified. If I understood correctly, there is no possibility of S being modified in parallel by both threads? The only reason Ordering::SeqCst is useful here is to prevent reordering. But what if I replace all the orderings like this:
let a = thread::spawn(|| {
    A.store(true, Release); // nothing can be placed before
    if !B.load(Acquire) { // nothing can be reordered after
        unsafe { S.push('!') };
    }
});
let b = thread::spawn(|| {
    B.store(true, Release); // nothing can be placed before
    if !A.load(Acquire) { // nothing can be reordered after
        unsafe { S.push('!') };
    }
});
Isn't it the same as the original?
Also, the Rust docs refer to the C++ docs on orderings, where Ordering::SeqCst is described as:
Atomic operations tagged memory_order_seq_cst not only order memory the same way as release/acquire ordering (everything that happened-before a store in one thread becomes a visible side effect in the thread that did a load), but also establish a single total modification order of all atomic operations that are so tagged.
What is this single total modification order on a concrete example?

Although Chayim's answer is correct, I think it's still very difficult to wrap one's head around the concept. A useful mental model is that operations may be reordered whenever the reordering is not forbidden and the operations access different parts of memory, from the perspective of the current thread.
The Rustonomicon describes two types of reordering:
Compiler Reordering - if the result is the same and performance is better, why not reorder?
Hardware Reordering - this topic is more subtle, and the Rustonomicon's reasoning about caches causing reordering is, strictly speaking, incorrect (thanks a lot to #peter-cordes for pointing that out). CPU caches are always coherent (the MESI protocol guarantees invalidation of other copies before a write to a cache line is committed). Out-of-order execution is one reason reordering may happen in a CPU core, but it is not the only one. Here are two more examples of what may cause reordering in the CPU, and here is a detailed explanation of one of the examples (with still a lot of discussion afterwards).
And here is an excellent document (Howells, McKenney, Deacon, Zijlstra) touching on both hardware and compiler reordering, which claims to still be incomplete.
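To make the compiler-reordering case concrete, the Rustonomicon's own illustration is roughly the following (wrapped in functions here so the sketch compiles; the point is only that the two versions are indistinguishable for a single thread):

// As written: another thread might hope to observe x == 1 before y == 3.
fn as_written(x: &mut i32, y: &mut i32) {
    *x = 1;
    *y = 3;
    *x = 2;
}

// What the compiler is allowed to emit instead: same single-threaded result,
// but the intermediate x == 1 state is gone entirely.
fn as_optimized(x: &mut i32, y: &mut i32) {
    *x = 2;
    *y = 3;
}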
Either hardware or compiler reordering can explain the first example:
   let a: &'static _ = Box::leak(Box::new(AtomicBool::new(false)));
   let b: &'static _ = Box::leak(Box::new(AtomicBool::new(false)));
   let counter: &'static _ = Box::leak(Box::new(AtomicUsize::new(0)));
1) let _thread_a = spawn(move || a.store(true, Ordering::Release));
2) let _thread_b = spawn(move || b.store(true, Ordering::Release));
   let thread_1 = spawn(move || {
3)     while !a.load(Ordering::Acquire) {} // prevents reordering of everything after it
4)     if b.load(Ordering::Relaxed) { // no need for Acquire due to the previous restriction
           counter.fetch_add(1, Ordering::Relaxed);
       }
   });
   let thread_2 = spawn(move || {
5)     while !b.load(Ordering::Acquire) {} // prevents reordering of everything after it
6)     if a.load(Ordering::Relaxed) { // no need for Acquire due to the previous restriction
           counter.fetch_add(1, Ordering::Relaxed);
       }
   });
When thread_2 is running on its core, it has a and b and the operations 5) and 6) in flight. 6) is relaxed, and 5) and 6) touch different memory locations and don't affect each other, so there is nothing that prevents 6) from happening before 5).
As for the second example:
   let a = thread::spawn(|| {
1)     A.store(true, Release); // nothing can be placed before
2)     if !B.load(Acquire) { // nothing can be reordered after
           unsafe { S.push('!') };
       }
   });
   let b = thread::spawn(|| {
3)     B.store(true, Release); // nothing can be placed before
4)     if !A.load(Acquire) { // nothing can be reordered after
           unsafe { S.push('!') };
       }
   });
There is this similar question talking about it.
If we look at this paper (it is about C++, but as you rightly noticed, Rust uses the same memory model), it describes the memory guarantees of Release and Acquire as:
- atomic operations on the same object may never be reordered [CSC12, 1.10.19, p. 14],
- (non-)atomic write operations that are sequenced before a release operation A may not be reordered after A,
- (non-)atomic load operations that are sequenced after an acquire operation A may not be reordered before A.
So in principle it is not forbidden to reorder 1) <-> 2) or 3) <-> 4).
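To see what this permission to reorder means in practice, here is a small litmus-test harness (not from the question; it uses std::thread::scope from newer Rust). It runs the Release/Acquire version of the second example many times and counts the runs in which both threads saw the other flag as false, which is exactly the outcome SeqCst forbids. Whether it ever fires depends on your CPU and compiler (x86, for instance, does allow this particular store/load reordering), so treat it as an experiment rather than a proof:

use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

fn main() {
    let a = AtomicBool::new(false);
    let b = AtomicBool::new(false);
    let mut both_saw_false = 0u32;

    for _ in 0..100_000 {
        // reset the flags; no other thread is running at this point
        a.store(false, Ordering::Relaxed);
        b.store(false, Ordering::Relaxed);

        let (t1_saw_b_false, t2_saw_a_false) = thread::scope(|s| {
            let t1 = s.spawn(|| {
                a.store(true, Ordering::Release);
                !b.load(Ordering::Acquire) // did this thread see b == false?
            });
            let t2 = s.spawn(|| {
                b.store(true, Ordering::Release);
                !a.load(Ordering::Acquire) // did this thread see a == false?
            });
            (t1.join().unwrap(), t2.join().unwrap())
        });

        // Under SeqCst this combination would be impossible.
        if t1_saw_b_false && t2_saw_a_false {
            both_saw_false += 1;
        }
    }
    println!("both threads saw the other flag as false in {both_saw_false} runs");
}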

You seem to misunderstand the basics of reordering.
We do not care which thread started first: it is possible for one thread to observe a result implying thread A started first, while another thread observes a result implying thread B started first.
In your first snippet, for example, it is possible for thread_1 to see a == true && b == false, yet thread_2 can at the same time see a == false && b == true. As long as there is no happens-before relationship that forces otherwise.
The only thing we care about (putting aside the global modification order) is the happens-before relationship. If point B establishes a happens-before relationship with point A, then everything before point A (inclusive) will be observable at and after point B. But in snippet 1 we only establish a happens-before relationship with _thread_a for thread_1 and with _thread_b for thread_2, leaving our relationship with the other thread, the one that sets the other variable, undefined; therefore the two threads can observe opposite values of the variables. In snippet 2 we may not establish a happens-before relationship at all: we may establish one between the other thread's store and our load, but our own store is not included in it, and therefore the other thread may not see it.
With SeqCst, however, all operations take part in the global modification order. This is a virtual list of all SeqCst operations, where each element happens-after all elements before it and happens-before all elements after it. This can be used to establish a happens-before relationship between atomic operations on different variables, as in your examples.
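To make the happens-before relationship itself concrete, here is a minimal message-passing sketch (not taken from the question). The Release store to FLAG, paired with the Acquire load that reads true, forms a single happens-before edge, so the earlier Relaxed write to DATA is guaranteed to be visible to the reader:

use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::thread;

static DATA: AtomicUsize = AtomicUsize::new(0);
static FLAG: AtomicBool = AtomicBool::new(false);

fn main() {
    let writer = thread::spawn(|| {
        DATA.store(42, Ordering::Relaxed);   // sequenced before the Release store
        FLAG.store(true, Ordering::Release); // publish
    });
    let reader = thread::spawn(|| {
        while !FLAG.load(Ordering::Acquire) {} // spin until the store is observed
        // The Acquire load synchronized with the Release store, so this holds.
        assert_eq!(DATA.load(Ordering::Relaxed), 42);
    });
    writer.join().unwrap();
    reader.join().unwrap();
}

In the question's second snippet no such edge is guaranteed to exist between the two threads, which is why the SeqCst total order is needed there.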

Related

How to define a macro vecvec to initialize a vector of vectors?

Just like vec![2,3,4], can we define a similar macro vecvec to initialize a vector of vectors? E.g.
let vv0 = vecvec![[2,3,4],[5,6,7]]; // vec of 2 vecs
let vv1 = vecvec![[1,2,3]];
let vv2 = vecvec![[1,2,3], []];
let vv3 = vecvec![[1,3,2]; 2];
You just need to think through the problem. You really only have two main cases: the first where the elements are listed (e.g. a, b, c) and the second where a single value and a length are given (e.g. a; b). We can even check our work by reading the documentation for vec!, where we can see that vec! is defined as follows:
macro_rules! vec {
    () => { ... };
    ($elem:expr; $n:expr) => { ... };
    ($($x:expr),+ $(,)?) => { ... };
}
As you can see, they have 3 cases. We didn't specify the case where no items are included, but that does not really matter, since your macro can call vec! and have it handle that case for you.
We can just copy the cases from their macro and add the functionality inside. The only other issue that might stop you is that [a, b, c] is an expression in and of itself. Luckily we can skip that by requiring brackets around the items and picking out the items ourselves before passing them off to vec!.
macro_rules! vecvec {
    ([$($elem:expr),*]; $n:expr) => {{
        let mut vec = Vec::new();
        vec.resize_with($n, || vec![$($elem),*]);
        vec
    }};
    ($([$($x:expr),*]),* $(,)?) => {
        vec![$(vec![$($x),*]),*]
    };
}
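Assuming the vecvec! macro above is in scope, a quick sanity check against the examples from the question might look like this (a hypothetical test, not from the original answer):

fn main() {
    let vv0 = vecvec![[2, 3, 4], [5, 6, 7]]; // vec of 2 vecs
    let vv3 = vecvec![[1, 3, 2]; 2];         // 2 copies of vec![1, 3, 2]
    assert_eq!(vv0, vec![vec![2, 3, 4], vec![5, 6, 7]]);
    assert_eq!(vv3, vec![vec![1, 3, 2], vec![1, 3, 2]]);

    // An empty sublist is also accepted:
    let vv2 = vecvec![[1, 2, 3], []];
    assert_eq!(vv2[1].len(), 0);
}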
Instead of defining a new macro, you can initialize the vector of vectors directly.
In the example below, I'm explicitly setting the type. It's not necessary, but it's good practice.
let vv0:Vec<Vec<u32>> = vec![vec![2,3,4],vec![5,6,7]];
let vv1:Vec<Vec<u32>> = vec![vec![2,3,4],vec![5]];
let vv2:Vec<Vec<u32>> = vec![vec![],vec![5,6,7]];
let vv3:Vec<Vec<u32>> = vec![vec![2,3,4],vec![]];

Do I have to create a copy of objects that threads need [duplicate]

This question already has an answer here:
How can I pass a reference to a stack variable to a thread?
(1 answer)
Closed 1 year ago.
I created two methods, one synchronous and one with multiple threads, because I wanted to compare the performance of the synchronous and parallel methods. But I have one issue: every time I want to use my data in threads, I have to copy it first, even though I know it won't be dropped before the end of the method. If I don't copy the data before using it in threads, I get an error saying I have to make my data 'static:
fn parallel_kronecker_product(&self, matrix: &Matrix) -> Matrix {
    let product_rows = self.rows_count * matrix.rows_count;
    let product_columns_count = self.columns_count * matrix.columns_count;
    let product = Arc::new(Mutex::new(Matrix::new_zeros_matrix(
        product_rows,
        product_columns_count,
    )));
    let mut handles = vec![];
    for m1_row_index in 0..self.rows_count {
        let product = Arc::clone(&product);
        let matrix_a = self.to_owned();
        let matrix_b = matrix.to_owned();
        handles.push(thread::spawn(move || {
            for m1_column_index in 0..matrix_a.columns_count {
                for m2_row_index in 0..matrix_b.rows_count {
                    for m2_column_index in 0..matrix_b.columns_count {
                        let product_row_index = m1_row_index * matrix_b.rows_count + m2_row_index;
                        let product_column_index =
                            m1_column_index * matrix_b.columns_count + m2_column_index;
                        let mut prod = product.lock().unwrap();
                        (*prod)[product_row_index][product_column_index] =
                            matrix_a[m1_row_index][m1_column_index]
                                * matrix_b[m2_row_index][m2_column_index];
                    }
                }
            }
        }));
    }
    for handle in handles {
        handle.join().unwrap();
    }
    return product.lock().unwrap().clone();
}
So here I have two matrices: the base one, which is the immutable self, and one from the parameter matrix. Inside the for m2_row_index in 0..matrix_b.rows_count loop I am only multiplying some data, which doesn't change the original data. Then I iterate over all the handles to tell Rust to wait until all threads finish their job, so nothing outside this method's scope should drop the matrices.
Can you tell me what I can do to avoid copying this data?
You can use a scoped thread from a third-party crate. There are a few to choose from, but a popular one is from crossbeam. The reason this is needed is that the types used for threads spawned with std::thread::spawn do not carry information about how long they live, even if you are explicitly joining them. Crossbeam's scoped threads are bound to the lifetime of the surrounding Scope, so the borrow checker can be sure that they are finished with borrowed data when the scope ends.
Your provided code has a lot of definitions missing, so I didn't try to compile it, but the general idea would be this:
fn parallel_kronecker_product(&self, matrix: &Matrix) -> Matrix {
    // Create a new thread scope and make it own the locals
    thread::scope(move |scope| {
        let product_rows = self.rows_count * matrix.rows_count;
        let product_columns_count = self.columns_count * matrix.columns_count;
        let product = Arc::new(Mutex::new(Matrix::new_zeros_matrix(
            product_rows,
            product_columns_count,
        )));
        let mut handles = vec![];
        for m1_row_index in 0..self.rows_count {
            let product = Arc::clone(&product);
            let matrix_a = self.to_owned();
            let matrix_b = matrix.to_owned();
            // spawn a new thread inside the scope which owns the data it needs to borrow
            handles.push(scope.spawn(move |_| {
                for m1_column_index in 0..matrix_a.columns_count {
                    for m2_row_index in 0..matrix_b.rows_count {
                        for m2_column_index in 0..matrix_b.columns_count {
                            let product_row_index =
                                m1_row_index * matrix_b.rows_count + m2_row_index;
                            let product_column_index =
                                m1_column_index * matrix_b.columns_count + m2_column_index;
                            let mut prod = product.lock().unwrap();
                            (*prod)[product_row_index][product_column_index] =
                                matrix_a[m1_row_index][m1_column_index]
                                    * matrix_b[m2_row_index][m2_column_index];
                        }
                    }
                }
            }));
        }
        for handle in handles {
            handle.join().unwrap();
        }
        // probably need a new local binding here. For... reasons...
        let product = product.lock().unwrap().clone();
        product
    }).unwrap()
}
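For reference, newer Rust versions also ship scoped threads in the standard library (std::thread::scope), so the same idea works without an extra crate and without cloning the matrices or wrapping the result in an Arc. This is only a sketch: Matrix, new_zeros_matrix, the rows_count/columns_count fields and the indexing operators are assumed to be the ones from the question, and it has not been compiled against them:

use std::sync::Mutex;
use std::thread;

fn parallel_kronecker_product(a: &Matrix, b: &Matrix) -> Matrix {
    let product = Mutex::new(Matrix::new_zeros_matrix(
        a.rows_count * b.rows_count,
        a.columns_count * b.columns_count,
    ));
    thread::scope(|scope| {
        let product = &product; // each thread only needs a shared borrow
        for m1_row_index in 0..a.rows_count {
            scope.spawn(move || {
                for m1_column_index in 0..a.columns_count {
                    for m2_row_index in 0..b.rows_count {
                        for m2_column_index in 0..b.columns_count {
                            let row = m1_row_index * b.rows_count + m2_row_index;
                            let col = m1_column_index * b.columns_count + m2_column_index;
                            let mut prod = product.lock().unwrap();
                            (*prod)[row][col] = a[m1_row_index][m1_column_index]
                                * b[m2_row_index][m2_column_index];
                        }
                    }
                }
            });
        }
        // all threads spawned in the scope are joined before the scope returns
    });
    product.into_inner().unwrap()
}

Locking the mutex for every single element write is also a big serialization point; giving each thread its own row buffer and merging at the end would avoid that, but it is a separate concern from the borrowing question.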

How to avoid memory leaks when building a slice of slices using ArrayLists

I'm trying to build a slice of slices using multiple std.ArrayLists.
The code below works, but the memory allocator std.testing.allocator warns me of memory leaks whenever I append new elements to a sublist.
const std = @import("std");
const mem = std.mem;

fn sliceOfSlices(allocator: *mem.Allocator) ![][]usize {
    var list = std.ArrayList([]usize).init(allocator);
    var i: usize = 0;
    while (i < 3) : (i += 1) {
        var sublist = std.ArrayList(usize).init(allocator);
        // errdefer sublist.deinit(); // here?
        var n: usize = 0;
        while (n < 5) : (n += 1) {
            try sublist.append(n); // leaks
            // errdefer sublist.deinit(); // here?
            // errdefer allocator.free(sublist.items);
        }
        try list.append(sublist.toOwnedSlice());
    }
    return list.toOwnedSlice();
}
const testing = std.testing;

test "memory leaks" {
    const slice = try sliceOfSlices(testing.allocator);
    testing.expectEqual(@intCast(usize, 3), slice.len);
    testing.expectEqual(@intCast(usize, 5), slice[0].len);
}
I tried to use errdefer in several places to free the allocated sublist, but it didn't work. From the documentation it seems to be a lifetime issue, but I'm not sure how to handle it.
the std.ArrayList(T).items slice has a lifetime that remains valid until the next time the list is resized, such as by appending new elements.
— https://ziglang.org/documentation/master/#Lifetime-and-Ownership
What's the appropriate error handling when list.append() fails?
I am a beginner in Zig, so perhaps I am totally wrong here, but I think the reason you are getting the memory leak is not that something failed!
As you use an ArrayList, its memory is allocated via an allocator and has to be freed explicitly at the end of its usage. For an ArrayList you could simply use the deinit() function. However, since your function sliceOfSlices() converts that ArrayList wrapper into a slice, you have to use testing.allocator.free(slice) to get rid of the memory used by that slice.
But note: every element of your slice is itself a slice (or rather a pointer to one), also obtained via ArrayList.toOwnedSlice(). Therefore you also have to get rid of those slices before you can deallocate the containing slice.
So I would change your test to
test "memory leaks" {
const slice = try sliceOfSlices(testing.allocator);
defer {
for (slice) |v| {
testing.allocator.free(v);
}
testing.allocator.free(slice);
}
testing.expectEqual(#intCast(usize, 3), slice.len);
testing.expectEqual(#intCast(usize, 5), slice[0].len);
}
and now no memory leak should occur anymore.
Perhaps somebody knows a better solution, but lacking experience here, this would be the way to go, IMO.
And after some thinking, to answer your question about what to do in case of an error, I would rewrite your function sliceOfSlices() to:
fn sliceOfSlices(allocator: *mem.Allocator) ![][]usize {
    var list = std.ArrayList([]usize).init(allocator);
    errdefer {
        for (list.items) |slice| {
            allocator.free(slice);
        }
        list.deinit();
    }
    var i: usize = 0;
    while (i < 3) : (i += 1) {
        var sublist = std.ArrayList(usize).init(allocator);
        errdefer sublist.deinit();
        var n: usize = 0;
        while (n < 5) : (n += 1) {
            try sublist.append(n);
        }
        try list.append(sublist.toOwnedSlice());
    }
    return list.toOwnedSlice();
}
Now if any error happens in your function, both list and sublist should be cleaned up properly. If no error is returned from the function, your calling code is still responsible for the cleanup to avoid memory leaks, as implemented in the test block above.

Getting the payload from a Substrate event back in Rust tests

I've created my first Substrate project successfully and the built pallet works fine. Now I want to create tests for the flow and the provided functions.
My flow is to generate a random hash and store this hash associated with the sender of the transaction:
let _sender = ensure_signed(origin)?;
let nonce = Nonce::get();
let _random_seed = <randomness_collective_flip::Module<T>>::random_seed();
let random_hash = (_random_seed, &_sender, nonce).using_encoded(T::Hashing::hash);
ensure!(!<Hashes<T>>::contains_key(random_hash), "This new id already exists");
let _now = <timestamp::Module<T>>::get();
let new_elem = HashElement {
    id: random_hash,
    parent: parent,
    updated: _now,
    created: _now
};
<Hashes<T>>::insert(random_hash, new_elem);
<HashOwner<T>>::insert(random_hash, &_sender);
Self::deposit_event(RawEvent::Created(random_hash, _sender));
Ok(())
This works well so far. Now, when I test the flow with a written test, I want to check that the hash emitted in the Created event is also assigned in the HashOwner map. For this I need to get the value out of the event.
And this is my problem :D I'm not a professional in Rust, and all the examples I found expect all the values emitted in the event, like this:
// construct event that should be emitted in the method call directly above
let expected_event = TestEvent::generic_event(RawEvent::EmitInput(1, 32));
// iterate through array of `EventRecord`s
assert!(System::events().iter().any(|a| a.event == expected_event));
When debugging my written test:
assert_ok!(TemplateModule::create_hash(Origin::signed(1), None));
let events = System::events();
let lastEvent = events.last().unwrap();
let newHash = &lastEvent.event;
I can see in VS Code that the values are available:
[screenshot: VS Code debug window]
but I don't know how to get this hash into a variable... maybe it's only a one-liner... but my Rust knowledge is damn too small :D
Thank you for your help.
Here's a somewhat generic example of how to parse and check events, if you only care about the last event that your module put into System and nothing else.
assert_eq!(
    System::events()
        // this gives you an EventRecord { event: ..., ... }
        .into_iter()
        // map into the inner `event`.
        .map(|r| r.event)
        // the inner event is like `OuterEvent::moduleEvent(EventEnum)`. The name of the outer
        // event comes from whatever you have placed in your `decl_event! {}` in the test mocks.
        .filter_map(|e| {
            if let MetaEvent::templateModule(inner) = e {
                Some(inner)
            } else {
                None
            }
        })
        .last()
        .unwrap(),
    // RawEvent is defined and imported in the template.rs file.
    // val1 and val2 are the things that you want to assert against.
    RawEvent::Created(val1, val2),
);
Indeed you can also omit the first map or do it in more compact ways, but I have done it like this so you can see it step by step.
Printing System::events() also helps.
I got it now, thanks to the response from kianenigma :)
I wanted to reuse the data given in the event:
let lastEvent = System::events()
    // this gives you an EventRecord { event: ..., ... }
    .into_iter()
    // map into the inner `event`.
    .map(|r| r.event)
    // the inner event is like `OuterEvent::moduleEvent(EventEnum)`. The name of the outer
    // event comes from whatever you have placed in your `decl_event! {}` in the test mocks.
    .filter_map(|e| {
        if let TestEvent::pid(inner) = e {
            Some(inner)
        } else {
            None
        }
    })
    .last()
    .unwrap();
if let RawEvent::Created(newHash, initiatedAccount) = lastEvent {
    // there are the values :D
}
This can maybe be written better, but it helps me :)
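Building on that, the map and filter_map steps can also be collapsed into a single filter_map. The names here (System, TestEvent::pid, RawEvent::Created) are the ones from the snippets above and depend on how the event is declared in your test mock:

let last_event = System::events()
    .into_iter()
    .filter_map(|record| match record.event {
        TestEvent::pid(inner) => Some(inner),
        _ => None,
    })
    .last()
    .expect("create_hash should have deposited an event");

if let RawEvent::Created(new_hash, sender) = last_event {
    // new_hash and sender can now be asserted against storage,
    // e.g. that HashOwner maps new_hash to sender.
}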

Why is this hashmap search slower than expected?

What is the best way to check a hash map for a key?
Currently I am using this:
use std::collections::HashMap;

let hashmap = HashMap::<&str, &str>::new(); // Empty hashmap
let name = "random";
for _ in 0..5000000 {
    if !hashmap.contains_key(&name) {
        // Do nothing
    }
}
This seems to be fast in most cases and takes 0.06 seconds when run as shown, but when I use it in the following loop it becomes very slow and takes almost 1 minute on my machine (this is compiled with cargo run --release).
The code aims to open an external program, and loop over the output from that program.
let a = vec!["view", "-h"]; // Arguments to open process with
let mut child = Command::new("samtools").args(&a)
    .stdout(Stdio::piped())
    .spawn()
    .unwrap();
let collect_pairs = HashMap::<&str, &str>::new();
if let Some(ref mut stdout) = child.stdout {
    for line in BufReader::new(stdout).lines() {
        // Do stuff here
        let name = "random";
        if !collect_pairs.contains_key(&name) {
            // Do nothing
        }
    }
}
For some reason, adding the if !collect_pairs.contains_key( line increases the run time by almost a minute. The output from child is around 5 million lines. All this code is inside fn main().
EDIT
This appears to fix the problem, resulting in a fast run time, but I do not know why !hashmap.contains_key does not work well here:
let n: Option<&&str> = collect_pairs.get(name);
if match n { Some(v) => 1, None => 0 } == 1 {
    // Do something
}
One thing to consider is that HashMap<K, V> uses a DoS-resistant hashing algorithm (SipHash) by default, so it will always be a bit slow by nature.
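If the hashing cost itself turns out to matter for a workload like this, the hasher can be swapped while keeping std's HashMap. A minimal sketch, assuming the fxhash crate has been added as a dependency (you trade HashDoS resistance for speed):

use fxhash::FxHashMap; // type alias for HashMap<K, V, FxBuildHasher>

let mut collect_pairs: FxHashMap<&str, &str> = FxHashMap::default();
collect_pairs.insert("read_name", "flags");
assert!(collect_pairs.contains_key("read_name"));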
get() boils down to
self.search(k).map(|bucket| bucket.into_refs().1)
contains_key is
self.search(k).is_some()
As such, it seems strange to me that get() is faster for you; it's doing more work!
Also,
if match n {Some(v) => 1, None => 0} == 1 {
This can be written more idiomatically as
if let Some(v) = n {
I've found my problem, I'm sorry I didn't pick up on it until now. I wasn't checking the return value of if !collect_pairs.contains_key(&name) properly. It returns true for some reason, resulting in the rest of the if block being run. I assumed it was evaluating to false. Thanks for the help.
