How to implement mutex lock using overloaded functions - multithreading

While learning C++ and multithreading, I came across the question of how to lock a shared function that is called by several threads. I understand that a mutex object will do the work for me. If a plain mutex lock is used in a recursive function, it will create a deadlock - clear. There is also a special recursive mutex object available for such cases - also clear.
Now I have a set of overloaded functions that prepare data for a sub-function. These overloaded functions will be called by other threads.
I'm having trouble finding an answer to what the right way to implement the mutex lock is.
Example: The mutex lock is only in Foo(int, int), the function that executes Bar(). The other Foo overloads call Foo(int, int) but have no lock of their own.
#include <thread>
#include <mutex>

std::mutex m;
std::thread th;

bool Bar(int a, int b); // forward declaration so Foo(int, int) can call it

void one_of_many_threads(void) {
    // Will call one of the Foos ... who knows ...
}

bool Foo(int a, int b) {
    bool result;
    // Lock for others ...
    m.lock();
    // Do some checks on a and b before sending to Bar.
    // We assume everything is fine ...
    result = Bar(a, b);
    // Unlock for others ...
    m.unlock();
    return result;
}

bool Foo(int a) {
    return Foo(a, 0);
}

bool Foo(void) {
    return Foo(0, 0);
}

bool Bar(int a, int b) {
    // Some magic actions with a, b to modify something ...
    // will return true for now:
    return true;
}
Question 1: Will this lead to issues when Foo(); or Foo(1); is called by multiple threads? Those overloads don't have a lock of their own, so they could be called and then end up in the lock ...
Question 2: Because Foo(int, int) can also be called directly, using a mutex lock in all Foo functions would make a recursive mutex object necessary. But if the answer to question 1 is "No issues!", would it make sense to implement the mutex in the Bar(int, int) function instead?
Thank you for any answer.

Question 1: Will this lead to issues when Foo(); or Foo(1); is called by multiple threads? Those overloads don't have a lock of their own, so they could be called and then end up in the lock ...
No, because both of those Foo functions only call
bool Foo(int a, int b)
which already has a mutex wrapped around everything except result. And since result is a local variable, it won't cause a data race among different threads.
Question 2: Because Foo(int, int) can also be called directly, using a mutex lock in all Foo functions would make a recursive mutex object necessary. But if the answer to question 1 is "No issues!", would it make sense to implement the mutex in the Bar(int, int) function instead?
If Bar can only be called through the Foo functions, then there is no need to put an additional mutex in Bar since Bar is already inside the mutex in Foo.
If Bar can be called anywhere freely (and if it accesses/modifies some global/shared resources), then yes, you'd better put a mutex in it (i.e. a different mutex object, or a recursive_mutex as you mentioned) to prevent race conditions (and deadlock from Foo).
However, there are only local variables in your current example. And if you are only going to work with local objects/copies, there is no need for any mutex at all.
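As a side note, manual lock()/unlock() pairs leak the lock if Bar throws. Here is a minimal sketch (using the same names as the example above) with std::lock_guard, which releases the mutex automatically when it goes out of scope:

#include <mutex>

std::mutex m;

bool Bar(int a, int b);

bool Foo(int a, int b) {
    // RAII lock: acquired here, released automatically at the end of the
    // scope, even if Bar throws an exception.
    std::lock_guard<std::mutex> lock(m);
    return Bar(a, b);
}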

Related

Does calling `into_inner()` on an atomic take into account all the relaxed writes?

Does into_inner() return all the relaxed writes in this example program? If so, which concept guarantees this?
extern crate crossbeam;

use std::sync::atomic::{AtomicUsize, Ordering};

fn main() {
    let thread_count = 10;
    let increments_per_thread = 100000;
    let i = AtomicUsize::new(0);
    crossbeam::scope(|scope| {
        for _ in 0..thread_count {
            scope.spawn(|| {
                for _ in 0..increments_per_thread {
                    i.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });
    println!(
        "Result of {}*{} increments: {}",
        thread_count,
        increments_per_thread,
        i.into_inner()
    );
}
(https://play.rust-lang.org/?gist=96f49f8eb31a6788b970cf20ec94f800&version=stable)
I understand that crossbeam guarantees that all threads are finished and since the ownership goes back to the main thread, I also understand that there will be no outstanding borrows, but the way I see it, there could still be outstanding pending writes, if not on the CPUs, then in the caches.
Which concept guarantees that all writes are finished and all caches are synced back to the main thread when into_inner() is called? Is it possible to lose writes?
Does into_inner() return all the relaxed writes in this example program? If so, which concept guarantees this?
It's not into_inner that guarantees it, it's join.
What into_inner guarantees is that either some synchronization has been performed since the final concurrent write (join of thread, last Arc having been dropped and unwrapped with try_unwrap, etc.), or the atomic was never sent to another thread in the first place. Either case is sufficient to make the read data-race-free.
Crossbeam documentation is explicit about using join at the end of a scope:
This [the thread being guaranteed to terminate] is ensured by having the parent thread join on the child thread before the scope exits.
Regarding losing writes:
Which concept guarantees that all writes are finished and all caches are synced back to the main thread when into_inner() is called? Is it possible to lose writes?
As stated in various places in the documentation, Rust inherits the C++ memory model for atomics. In C++11 and later, the completion of a thread synchronizes with the corresponding successful return from join. This means that by the time join completes, all actions performed by the joined thread must be visible to the thread that called join, so it is not possible to lose writes in this scenario.
In terms of atomics, you can think of a join as an acquire read of an atomic that the thread performed a release store on just before it finished executing.
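To make that mental model concrete, here is a minimal C++ sketch (an illustration of the synchronizes-with relation, not how join is actually implemented) where a release store by the worker and an acquire load by the main thread play the roles described above:

#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                  // plain, non-atomic data
std::atomic<bool> done{false};

int main()
{
    std::thread t([] {
        payload = 42;             // happens-before the release store below
        done.store(true, std::memory_order_release);
    });
    while (!done.load(std::memory_order_acquire)) {
        // spin until the worker's release store becomes visible
    }
    // The acquire load synchronizes with the release store, so the
    // worker's write to payload is guaranteed to be visible here.
    assert(payload == 42);
    t.join();
}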
I will include this answer as a potential complement to the other two.
The kind of inconsistency that was mentioned, namely whether some writes could be missing before the final reading of the counter, is not possible here. It would have been undefined behaviour if writes to a value could be postponed until after its consumption with into_inner. However, there are no unexpected race conditions in this program, even without the counter being consumed with into_inner, and even without the help of crossbeam scopes.
Let us write a new version of the program without crossbeam scopes and where the counter is not consumed (Playground):
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    let thread_count = 10;
    let increments_per_thread = 100000;
    let i = Arc::new(AtomicUsize::new(0));
    let threads: Vec<_> = (0..thread_count)
        .map(|_| {
            let i = i.clone();
            thread::spawn(move || for _ in 0..increments_per_thread {
                i.fetch_add(1, Ordering::Relaxed);
            })
        })
        .collect();
    for t in threads {
        t.join().unwrap();
    }
    println!(
        "Result of {}*{} increments: {}",
        thread_count,
        increments_per_thread,
        i.load(Ordering::Relaxed)
    );
}
This version still works pretty well! Why? Because a synchronizes-with relation is established between the ending thread and its corresponding join. And so, as well explained in a separate answer, all actions performed by the joined thread must be visible to the caller thread.
One could probably also wonder whether even the relaxed memory ordering constraint is sufficient to guarantee that the full program behaves as expected. This part is addressed by the Rust Nomicon, emphasis mine:
Relaxed accesses are the absolute weakest. They can be freely re-ordered and provide no happens-before relationship. Still, relaxed operations are still atomic. That is, they don't count as data accesses and any read-modify-write operations done to them occur atomically. Relaxed operations are appropriate for things that you definitely want to happen, but don't particularly otherwise care about. For instance, incrementing a counter can be safely done by multiple threads using a relaxed fetch_add if you're not using the counter to synchronize any other accesses.
The mentioned use case is exactly what we are doing here. Each thread is not required to observe the incremented counter in order to make decisions, and yet all operations are atomic. In the end, the thread joins synchronize with the main thread, thus implying a happens-before relation, and guaranteeing that the operations are made visible there. As Rust adopts the same memory model as C++11's (this is implemented by LLVM internally), we can see regarding the C++ std::thread::join function that "The completion of the thread identified by *this synchronizes with the corresponding successful return". In fact, the very same example in C++ is available in cppreference.com as part of the explanation on the relaxed memory order constraint:
#include <vector>
#include <iostream>
#include <thread>
#include <atomic>

std::atomic<int> cnt = {0};

void f()
{
    for (int n = 0; n < 1000; ++n) {
        cnt.fetch_add(1, std::memory_order_relaxed);
    }
}

int main()
{
    std::vector<std::thread> v;
    for (int n = 0; n < 10; ++n) {
        v.emplace_back(f);
    }
    for (auto& t : v) {
        t.join();
    }
    std::cout << "Final counter value is " << cnt << '\n';
}
The fact that you can call into_inner (which consumes the AtomicUsize) means that there are no more borrows on that backing storage.
Each fetch_add is atomic with the Relaxed ordering, so once the threads are complete there shouldn't be anything that changes it (if there were, it would be a bug in crossbeam).
See the description on into_inner for more info

How to create a dynamic array of structs in D with the default constructor disabled?

I have code like this:
struct MyStruct {
    immutable int id;
    immutable int value;

    @disable this();

    this(immutable int pId) {
        id = pId;
        value = getValueById(id);
    }
}

void main() {
    MyStruct[] structs = new MyStruct[](256); // No default initializer
    foreach (ulong id, MyStruct struct_; structs) {
        structs[id] = MyStruct(id); // Cannot edit immutable members
    }
}
I know I could just declare a dynamic array and append to it, but I'm interested to see if there is a more efficient way of doing this. I'm mostly concerned that it will have to reallocate on every append even though I know in advance how much memory it needs.
Simplest solution is to use the dynamic array and call the .reserve method before doing any appends. Then it will preallocate the space and future appends will be cheap.
void main() {
    MyStruct[] structs;
    structs.reserve(256); // prealloc memory
    foreach (id; 0 .. 256)
        structs ~= MyStruct(id); // won't reallocate
}
That's how I'd do it with dynamic arrays; I don't think writing to individual members will ever work with immutability involved like this.
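For comparison, the same reserve-then-append pattern in C++ looks like the sketch below (the struct and its value computation are made up for illustration; std::vector::reserve plays the role of D's .reserve, and emplace_back constructs each element in place):

#include <vector>

struct MyStruct {
    const int id;    // const members, analogous to the immutable fields above
    const int value;
    explicit MyStruct(int pId) : id(pId), value(pId * 2) {}  // made-up value
};

int main()
{
    std::vector<MyStruct> structs;
    structs.reserve(256);          // preallocate, so the appends below are cheap
    for (int id = 0; id < 256; ++id)
        structs.emplace_back(id);  // construct in place; no default ctor needed
}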
BTW, if you wanted a static array, calling reserve won't work, but you can explicitly initialize it ... to void. That leaves the memory completely random, but since you explicitly requested it, the disabled default constructor won't stop you. (This is prohibited in @safe functions.) In this case, though, those immutable members would stay garbage forever unless you cast away immutability to prepare them, so it's not really workable here - just a nice thing to know if you ever need it in the future.

Is there a safe way to hold on to a reference to a Go variable from C code using CGo?

When using CGo to interface C code with Go, if I keep a reference to a Go variable on the C side, do I run the risk of that object being freed by the garbage collector or will the GC see the pointer in the variables managed by the C side?
To illustrate what I'm asking, consider the following sample program:
Go code:
package main

/*
typedef struct _Foo Foo;

Foo *foo_new(void);
void foo_send(Foo *foo, int x);
int foo_recv(Foo *foo);
*/
import "C"

//export makeChannel
func makeChannel() chan int {
    return make(chan int, 1)
}

//export sendInt
func sendInt(ch chan int, x int) {
    ch <- x
}

//export recvInt
func recvInt(ch chan int) int {
    return <-ch
}

func main() {
    foo := C.foo_new()
    C.foo_send(foo, 42)
    println(C.foo_recv(foo))
}
C code:
#include <stdlib.h>
#include "_cgo_export.h"

struct _Foo {
    GoChan ch;
};

Foo *foo_new(void) {
    Foo *foo = malloc(sizeof(Foo));
    foo->ch = makeChannel();
    return foo;
}

void foo_send(Foo *foo, int x) {
    sendInt(foo->ch, x);
}

int foo_recv(Foo *foo) {
    return recvInt(foo->ch);
}
Do I run the risk of foo->ch being freed by the garbage collector between the foo_new and foo_send calls? If so, is there a way to pin the Go variable from the C side to prevent it from being freed while I hold a reference to it?
According to the gmp CGo example:
Garbage collection is the big problem. It is fine for the Go world to
have pointers into the C world and to free those pointers when they
are no longer needed. To help, the Go code can define Go objects
holding the C pointers and use runtime.SetFinalizer on those Go objects.
It is much more difficult for the C world to have pointers into the Go
world, because the Go garbage collector is unaware of the memory
allocated by C. The most important consideration is not to
constrain future implementations, so the rule is that Go code can
hand a Go pointer to C code but must separately arrange for
Go to hang on to a reference to the pointer until C is done with it.
So I'm not sure if you can pin the variable from the C side, but you may be able to control the garbage collection of the variable from the Go side by using the runtime.SetFinalizer function.
Hope that helps.

C++11 Can only primitive data types be declared atomic?

I was wondering, can only primitive data types be declared std::atomic in C++11? Is it possible, say, to declare a library class object to be "atomically" mutated or accessed?
For example, I might have
#include <chrono>

using namespace std::chrono;

time_point<high_resolution_clock> foo;

// setter method
void set_foo() {
    foo = high_resolution_clock::now();
}

// getter method
time_point<high_resolution_clock> get_foo() {
    return foo;
}
But, if these setter and getter methods are called in different threads, I think that may cause undefined behavior. It would be nice if I could declare foo something like:
std::atomic<time_point<high_resolution_clock>> foo;
...so that all operations on foo would be conducted in an atomic fashion. In my project's application there are possibly hundreds of such foo variables declared across dozens of classes, and I feel it would be far more convenient to make object mutation and access "atomic", so to speak, instead of having to declare mutexes and lock_guards all over the place.
Is this not possible, or is there a better approach, or do I really have to use a mutex and lock_guard everywhere?
Update:
Any takers? I've been fishing around the web for decent information, but there are so few examples using atomic that I can't be sure the extent to which it can be applied.
atomic<> is not restricted to primitive types. It is permitted to use atomic<> with a type T that is trivially copyable. From section 29.5 Atomic types of the C++11 standard (it is also stated at std::atomic):
There is a generic class template atomic. The type of the template argument T shall be trivially copyable (3.9).
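As a quick illustration, a small trivially copyable struct works with atomic<> out of the box (a minimal sketch; the Point type is made up, and whether the atomic is lock-free is implementation-dependent, which is_lock_free() reports):

#include <atomic>
#include <iostream>

struct Point {   // trivially copyable: no user-defined copy/move/destructor
    int x;
    int y;
};

int main()
{
    std::atomic<Point> p{Point{1, 2}};
    p.store(Point{3, 4});    // replaces the whole struct atomically
    Point q = p.load();      // takes an atomic snapshot
    std::cout << q.x << ", " << q.y << '\n';
    std::cout << "lock-free: " << p.is_lock_free() << '\n';
}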
If the objects for which atomic access is required cannot be used with atomic<> then define new objects, containing the original object and a std::mutex. This means the lock_guard<> is used within the getter and setter only of the new thread safe object, and not littered throughout the code. A template might be able to define the thread safety machinery required:
#include <chrono>
#include <mutex>
#include <string>
#include <utility>

template <typename T>
class mutable_object
{
public:
    mutable_object() : t_() {}
    explicit mutable_object(T a_t) : t_(std::move(a_t)) {}

    T get() const
    {
        std::lock_guard<std::mutex> lk(mtx_);
        return t_;
    }

    void set(T const& a_t)
    {
        std::lock_guard<std::mutex> lk(mtx_);
        t_ = a_t;
    }

private:
    T t_;
    mutable std::mutex mtx_;
};

using mutable_high_resolution_clock =
    mutable_object<std::chrono::time_point<
        std::chrono::high_resolution_clock>>;
using mutable_string = mutable_object<std::string>;

int main()
{
    mutable_high_resolution_clock c;
    c.set(std::chrono::high_resolution_clock::now());
    auto c1 = c.get();

    mutable_string s;
    s.set(std::string("hello"));
    auto s1 = s.get();
}
Atomics are limited to trivially copyable classes (i.e. classes which have no custom copy constructor, and whose members are also trivially copyable).
This requirement has huge advantages for atomics:
No atomic operation can throw because a constructor threw
All atomics could be modeled with a lock (spinlock or mutex) and memcpy to copy data.
All atomics have a finite run time (bounded).
The latter is particularly useful, as atomics are sometimes implemented using spinlocks, and it is highly desired to avoid unbounded tasks while holding a spinlock. If any constructor was allowed, implementations would tend to need to fall back on full blown mutexes, which are slower than spinlocks for very small critical sections.
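To see why trivial copyability matters for the locked fallback, here is a conceptual sketch of how an implementation could model a non-lock-free atomic with a mutex and memcpy (purely illustrative; real standard libraries are more sophisticated than this):

#include <cstring>
#include <mutex>

// T must be trivially copyable, so memcpy is a valid way to copy it and
// every operation completes in bounded time (no user code runs under the lock).
template <typename T>
class locked_atomic
{
public:
    explicit locked_atomic(T t) : t_(t) {}

    void store(T const& desired)
    {
        std::lock_guard<std::mutex> lk(mtx_);
        std::memcpy(&t_, &desired, sizeof(T));
    }

    T load() const
    {
        T result;
        std::lock_guard<std::mutex> lk(mtx_);
        std::memcpy(&result, &t_, sizeof(T));
        return result;
    }

private:
    T t_;
    mutable std::mutex mtx_;
};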

storing objects of a different class in a vector in C++

I have a class, User, which stores objects of another class, Foo, in a vector. I am trying to figure out the best way to create objects of class Foo and store them in the vector of class User.
#include <string>
#include <vector>

using std::string;
using std::vector;

class Foo {
public:
    Foo();
    Foo(std::string str);
    std::string name;
    Foo* GiveMeFoo(std::string arg);
};

Foo::Foo() {}
Foo::Foo(std::string args) : name(args) {}
Foo* Foo::GiveMeFoo(std::string arg) { return new Foo(arg); }

class User {
public:
    vector<Foo*> v;
    vector<Foo> v2;
    vector<Foo*> v3;
    void start();
};

void User::start()
{
    const int SIZE = 3;
    string a[SIZE] = {"one", "two", "three"};
    for (int i = 0; i < SIZE; i++) {
        // method1
        Foo* f = new Foo();
        f->name = a[i];
        v.push_back(f);

        // method2
        Foo f2;
        f2.name = a[i];
        v2.push_back(f2);

        // method3
        Foo f3;
        v3.push_back(f3.GiveMeFoo(a[i]));
    }
}
Question 1: Between methods 1, 2 and 3, is there a preferred way of doing things? Is there a better way of creating objects of another class locally and storing them in a vector?
Question 2: Are all the objects that get stored in the vectors of class User persistent? (E.g. even if the original Foo object goes away, once I have pushed those objects onto a vector in User, copies of those Foo objects will be persistently stored in the vector, correct?)
Unless there are other considerations, method 2 is preferred.
Vectors (and other STL containers) store copies of the argument to push_back().
This means that for method 1, you are allocating a Foo on the heap and then storing a pointer to it in the vector. If the User object is destroyed, the Foo objects will leak unless you take steps to delete them (like delete them in the User destructor).
For method 2, you have allocated a Foo on the stack. This object is then copied into the vector. When start() exits, the original Foo is destroyed but the copy survives in the vector. If the User object is destroyed then so is the vector and consequently the Foo.
For method 3, you have allocated a Foo on the stack. You then call GiveMeFoo() on it which allocates another Foo on the heap and then returns a pointer to it. This pointer is then copied into the vector. When start() exits, the original Foo will be destroyed but the heap allocated object will survive. If the User object is destroyed then so is the vector but the heap allocated Foo survives and leaks unless you destroy it manually in the destructor of User.
If you need to store a pointer to an object rather than a copy, then you are better off using std::tr1::shared_ptr (or std::shared_ptr in C++11, if your compiler supports it) to manage the lifetime of the Foo objects.
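Here is a minimal sketch of that approach using the C++11 spelling, std::shared_ptr (the Foo class is trimmed down from the example above):

#include <memory>
#include <string>
#include <vector>

class Foo {
public:
    explicit Foo(std::string str) : name(std::move(str)) {}
    std::string name;
};

int main()
{
    std::vector<std::shared_ptr<Foo>> v;
    v.push_back(std::make_shared<Foo>("one"));  // heap-allocated Foo
    v.push_back(std::make_shared<Foo>("two"));
    // No manual delete needed: each Foo is destroyed automatically when
    // the last shared_ptr referring to it goes away.
}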
In case it helps someone: as of C++11, std::vector (along with some of the other STL containers) has the ability to "emplace" an instance into the collection. This has the advantage of avoiding a possible extra temporary object or copy.
For the by-value vector v2 in the example above, code like this can be used instead:
v2.emplace_back(a[i]);
More information: std::vector::emplace_back
