Can there be a deadlock in Bakery Algorithm max() operation? - multithreading

According to the resources I have read, the Bakery algorithm is supposed to be deadlock-free.
But while working through the pseudocode, I found a line that, as far as I can tell, could cause a deadlock.
Referring to the code below,
in the lock() function, we have the line
label[i] = max( label[0], ..., label[n-1] ) + 1;
What if two threads reach that line at the same time? Since max is not atomic, the two labels could get the same value.
Then, since the two labels have the same value, both threads holding those labels will get permission to enter the critical section at the same time. Wouldn't that cause a deadlock?
I tried my best to explain the problem here. Comment if it is still not clear. Thanks.
class Bakery implements Lock {
    volatile boolean[] flag;
    volatile Label[] label;
    public Bakery (int n) {
        flag = new boolean[n];
        label = new Label[n];
        for (int i = 0; i < n; i++) {
            flag[i] = false; label[i] = 0;
        }
    }
    public void lock() {
        // i is the index of the calling thread
        flag[i] = true;
        label[i] = max(label[0], ..., label[n-1]) + 1;
        // wait while some other interested thread k is ahead of thread i
        while (∃k: flag[k] && (label[i],i) > (label[k],k)) {}
    }
    public void unlock() {
        flag[i] = false;
    }
}

"Then, since the two labels have the same value, both threads holding those labels will get permission to enter the critical section at the same time. Wouldn't that cause a deadlock?"
To begin with, you probably mean a race (a violation of mutual exclusion), not a deadlock.
However, no, there won't be a race here. If you look, the wait loop has the condition
(label[i],i) > (label[k],k)
and as long as this holds for some other thread k whose flag is set, thread i busy-waits.
The pairs are compared lexicographically: first by label, then by thread index. This means that even if label[i] is the same as label[k] (because both performed the max concurrently), the higher-numbered thread defers to the lower-numbered one.
(Arguably, this is a weakness of the algorithm, as ties are always resolved in favor of lower-numbered threads.)
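To make the tie-break concrete, here is a minimal runnable sketch of the same idea in Java. It is not the textbook class: the names BakeryLock, BakeryDemo, someoneAhead and run are made up for illustration, labels are plain ints (overflow is ignored), the thread index is passed in explicitly, and AtomicIntegerArray is used because declaring an array volatile does not make its elements volatile.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// Illustrative sketch only; class and method names are assumptions, not from the original pseudocode.
class BakeryLock {
    private final int n;
    private final AtomicIntegerArray flag;   // 1 while thread i wants or holds the lock
    private final AtomicIntegerArray label;  // ticket numbers; two threads may hold equal labels

    BakeryLock(int n) {
        this.n = n;
        this.flag = new AtomicIntegerArray(n);
        this.label = new AtomicIntegerArray(n);
    }

    void lock(int i) {
        flag.set(i, 1);
        int max = 0;
        for (int k = 0; k < n; k++) max = Math.max(max, label.get(k));
        label.set(i, max + 1);               // not atomic with the scan: duplicates are possible
        while (someoneAhead(i)) Thread.yield();
    }

    void unlock(int i) {
        flag.set(i, 0);
    }

    // True if an interested thread k has a smaller (label, index) pair than thread i.
    private boolean someoneAhead(int i) {
        int mine = label.get(i);
        for (int k = 0; k < n; k++) {
            if (k == i || flag.get(k) == 0) continue;
            int theirs = label.get(k);
            if (theirs < mine || (theirs == mine && k < i)) return true;  // equal labels: lower index wins
        }
        return false;
    }
}

class BakeryDemo {
    static volatile int counter;

    // Each worker increments the shared counter `iters` times while holding the lock.
    static int run(int threads, int iters) {
        counter = 0;
        BakeryLock lock = new BakeryLock(threads);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int j = 0; j < iters; j++) {
                    lock.lock(id);
                    counter = counter + 1;   // read-modify-write; safe only because of the lock
                    lock.unlock(id);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        return counter;
    }
}
```

If equal labels let two threads into the critical section together, BakeryDemo.run would lose increments; with the index tie-break it should return threads * iters.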

Related

How to join all threads before deleting the ThreadPool

I am using a MultiThreading class which creates the required number of threads in its own threadpool and deletes itself after use.
std::thread *m_pool; //number of threads according to available cores
std::mutex m_locker;
std::condition_variable m_condition;
std::atomic<bool> m_exit;
int m_processors;
m_pool = new std::thread[m_processors + 1];
void func()
{
    //code
}
for (int i = 0; i < m_processors; i++)
{
    m_pool[i] = std::thread(func);
}
void reset(void)
{
    {
        std::lock_guard<std::mutex> lock(m_locker);
        m_exit = true;
    }
    m_condition.notify_all();
    for(int i = 0; i <= m_processors; i++)
        m_pool[i].join();
    delete[] m_pool;
}
After running through all tasks, the for-loop is supposed to join all running threads before delete[] is executed.
But there seems to be one last thread still running when m_pool no longer exists.
As a result, I can't close my program anymore.
Is there any way to check if all threads are joined or wait for all threads to be joined before deleting the threadpool?
Simple typo bug, I think.
The loop condition i <= m_processors is an off-by-one bug: the join loop visits one more entry than the number of threads you started. Suppose m_processors is 2. The first loop starts threads only in m_pool[0] and m_pool[1], yet the join loop also calls join() on m_pool[2]. With the allocation shown (new std::thread[m_processors + 1]), that element is a default-constructed std::thread that is not joinable, so join() throws std::system_error. If the array had only m_processors elements, the access would be past the end of the array and undefined behavior. Either way, you're likely going to crash or hang there.
You likely intended i < m_processors.
The real source of the problem is addressed by Wick's answer. I will extend it with some tips that also solve your problem while improving other aspects of your code.
If you use C++11 for std::thread, then you shouldn't create your thread handles using operator new[]. There are better ways of doing that with other C++ constructs, which will make everything simpler and exception safe (you don't leak memory if an unexpected exception is thrown).
Store your thread objects in a std::vector. It will manage the memory allocation and deallocation for you (no more new and delete). You can use other more flexible containers such as std::list if you insert/delete threads dynamically.
Fill the vector in place with std::generate_n or similar:
std::vector<std::thread> m_pool;
m_pool.reserve(m_processors);
// Fill the vector (needs <algorithm>, <iterator>, <thread>, <vector>)
std::generate_n( std::back_inserter(m_pool), m_processors,
    [](){ return std::thread(func); } );
Join all the elements using a range-for loop, then delete the handles using the container's functions:
for( std::thread& t: m_pool ) {
t.join();
}
m_pool.clear();

Allocating a pool of equivalent resources to a group of threads using semaphores

I have a question about a concurrent programming problem.
More specifically, we are working with the shared memory model (i.e. threads). The problem is: given a pool of N equivalent resources, with the constraint that at any instant t at most one thread can be using a given resource R, write a program that allocates these resources on demand to the threads that ask for them. To do this we have to use semaphores. Note that what these resources are and what the threads do with them is out of scope; the focus is on how the resources are managed and allocated. My professor gave us a C/Java-like pseudocode solution for the resources manager class:
class ResourcesManager {
    semaphore mutex = 1;
    semaphore availableResourcesSemaphore = N;
    boolean available[N];
    Resource resources[N];
    public ResourcesManager(){
        for(int i = 0; i < N; i++) {
            available[i] = true;
            resources[i] = new Resource();
        }
    }
    public int acquireResource() {
        int i = 0;
        P(availableResourcesSemaphore);
        P(mutex);
        while(available[i]==false) i++;
        available[i] = false;
        V(mutex);
        return i;
    }
    public void releaseResource(int i) {
        P(mutex);
        available[i] = true; //HERE IS THE PROBLEM
        V(mutex);
        V(availableResourcesSemaphore);
    }
}
While I see that this solution works, there is something I don't get about the releaseResource(int i) method. Why is there the need of having the line marked with the comment "HERE IS THE PROBLEM":
available[i] = true;
executed in mutual exclusion? I have thought about it and to me it looks like nothing bad happens if we do otherwise.
What I mean is that while in the original solution we have:
P(mutex);
available[i] = true;
V(mutex);
we can replace these three lines simply with
available[i] = true;
and the solution is still correct.
Now, of course I see that mutual exclusion is needed when operating on the array "available" in the other method acquireResource(), and since the instruction
available[i] = true;
operates on the same variable it is more elegant and conceptually cleaner to operate on it in mutual exclusion too. On the other hand, as a beginner in concurrent programming, I don't think it's good to have mutual exclusion where it is not needed.
So am I right (the instruction can be executed without mutual exclusion), or am I missing something, and removing mutual exclusion causes some issues? One final note on the execution environment: it can be either uniprocessor or multiprocessor, meaning that the solution has to work in both cases. Thanks for your help!
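For anyone who wants to experiment with the question, here is a minimal runnable sketch of the professor's pseudocode using java.util.concurrent.Semaphore. It keeps the mutex around the write in releaseResource, exactly as in the original, and does not settle whether it is required. The class name ResourcePool and the helper freeCount are made up for illustration, and the Resource objects themselves are left out because only the bookkeeping matters here.

```java
import java.util.Arrays;
import java.util.concurrent.Semaphore;

// Illustrative sketch only; ResourcePool and freeCount are assumed names.
class ResourcePool {
    private final Semaphore mutex = new Semaphore(1);  // guards the available[] array
    private final Semaphore free;                      // counts currently free resources
    private final boolean[] available;

    ResourcePool(int n) {
        free = new Semaphore(n);
        available = new boolean[n];
        Arrays.fill(available, true);
    }

    int acquireResource() {
        free.acquireUninterruptibly();    // P(availableResourcesSemaphore): blocks while nothing is free
        mutex.acquireUninterruptibly();   // P(mutex)
        int i = 0;
        while (!available[i]) i++;        // scan for the first free slot
        available[i] = false;
        mutex.release();                  // V(mutex)
        return i;
    }

    void releaseResource(int i) {
        mutex.acquireUninterruptibly();   // P(mutex): the line the question asks about
        available[i] = true;
        mutex.release();                  // V(mutex)
        free.release();                   // V(availableResourcesSemaphore)
    }

    int freeCount() {
        return free.availablePermits();
    }
}
```

A single-threaded walk-through: with three resources, three calls to acquireResource return 0, 1 and 2; after releaseResource(1), the next acquireResource returns 1 again.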

Shared counter using combining tree deadlock issue

I am working on a shared counter increment application using the combining tree concept. My goal is to make this application work on 2^n cores, such as 4, 8, 16, 32, etc. This algorithm might fail if any thread fails. The assumption is that there are no thread failures and no very slow threads.
Two threads compete at each leaf node, and the one that arrives later goes up the tree.
The first one that arrives waits until the second one goes up the hierarchy and comes down with the correct return value.
The second thread wakes the first thread up.
Each thread gets the correct fetchAndAdd value.
But this algorithm sometimes gets stuck inside the while (nodes[index].isActive == 1) or the while (nodes[index].waiting == 1) loop. I don't see any possibility of a deadlock because only two threads are competing at each node. Could you enlighten me on this problem?
int increment(int threadId, int index, int value) {
    int lastValue = __sync_fetch_and_add(&nodes[index].firstValue, value);
    if (index == 0) return lastValue;
    while (nodes[index].isActive == 1) {
    }
    if (lastValue == 0) {
        while(nodes[index].waiting == 1) {
        }
        nodes[index].waiting = 1;
        nodes[index].isActive = 0;
    } else {
        nodes[index].isActive = 1;
        nodes[index].result = increment(threadId, (index - 1)/2, nodes[index].firstValue);
        nodes[index].firstValue = 0;
        nodes[index].waiting = 0;
    }
    return nodes[index].result + lastValue;
}
I don't think that will work on 1 core. You loop forever on isActive, because the only statement that sets isActive to 0 sits after the loop that waits for it to be 0.
I'm not sure if your code has a mechanism to stop this, but here's my best crack at it. Here are the threads that run and cause problems:
ex)
thread1 thread 2
nodes[10].isActive = 1
//next run on index 10
while (nodes[index].isActive == 1) {//here is the deadlock}
It's hard to understand exactly what's going on here and what you're trying to do, but I would recommend adding some way to deactivate nodes[index].isActive. You may want to set it to 0 at the end of the function.

Why do I get a different result every time I use Parallel.For?

using System;
using System.Threading.Tasks;
class Program
{
    const int _Total = 1000000;
    [ThreadStatic]
    static long count = 0;
    static void Main(string[] args)
    {
        Parallel.For(0, _Total, (i) =>
        {
            count++;
        });
        Console.WriteLine(count);
    }
}
I get a different result every time. Can anybody help me and tell me why?
Most likely your "count" variable isn't atomic in any form, so you are getting concurrent modifications that aren't synchronized. Thus, the following sequence of events is possible:
Thread 1 reads "count"
Thread 2 reads "count"
Thread 1 stores value+1
Thread 2 stores value+1
Thus, the "for" loop has done 2 iterations, but the value has only increased by 1. As thread ordering is "random", so will be the result.
Things can get a lot worse, of course:
Thread 1 reads count
Thread 2 increases count 100 times
Thread 1 stores value+1
In that case, all those 100 increases done by thread 2 are undone. Although that can really only happen if the "++" is actually split into at least 2 machine instructions, so it can be interrupted in the middle of the operation. In the one-instruction case, you're only dealing with interleaved hardware threads.
It's a typical race condition scenario.
So, most likely, [ThreadStatic] is what goes wrong here: it gives every thread its own copy of count, so Main prints only the increments that ran on the main thread. In this concrete sample, use a single shared counter with System.Threading.Interlocked:
void Main()
{
    int total = 1000000;
    int count = 0;
    System.Threading.Tasks.Parallel.For(0, total, (i) =>
    {
        System.Threading.Interlocked.Increment(ref count);
    });
    Console.WriteLine(count);
}
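The thread's samples are C#; as an analogous illustration, here is a small Java sketch contrasting an unsynchronized shared counter (the lost-update interleaving described in the first answer) with an atomic one (the counterpart of Interlocked.Increment). The class and method names CounterRace, unsafeCount and atomicCount are made up for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

// Illustrative sketch only; names are assumptions, not from the original thread.
class CounterRace {
    // Unsynchronized read-modify-write: concurrent increments can overwrite each other.
    static int unsafeCount(int total) {
        int[] count = {0};
        IntStream.range(0, total).parallel().forEach(i -> count[0]++);
        return count[0];
    }

    // Atomic increment: every one of the `total` increments is applied exactly once.
    static int atomicCount(int total) {
        AtomicInteger count = new AtomicInteger();
        IntStream.range(0, total).parallel().forEach(i -> count.incrementAndGet());
        return count.get();
    }

    public static void main(String[] args) {
        int total = 1_000_000;
        System.out.println("unsafe: " + unsafeCount(total));  // may differ between runs, never above total
        System.out.println("atomic: " + atomicCount(total));  // always equals total
    }
}
```

On a multi-core machine the unsafe variant will usually print a smaller number that changes between runs, while the atomic variant always prints the full total.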
Similar question
C# ThreadStatic + volatile members not working as expected

How can I make this prime finder operate in parallel

I know prime finding is well studied, and there are a lot of different implementations. My question is, using the provided method (code sample), how can I go about breaking up the work? The machine it will be running on has 4 quad core hyperthreaded processors and 16GB of ram. I realize that there are some improvements that could be made, particularly in the IsPrime method. I also know that problems will occur once the list has more than int.MaxValue items in it. I don't care about any of those improvements. The only thing I care about is how to break up the work.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace Prime
{
    class Program
    {
        static List<ulong> primes = new List<ulong>() { 2 };
        static void Main(string[] args)
        {
            ulong reportValue = 10;
            for (ulong possible = 3; possible <= ulong.MaxValue; possible += 2)
            {
                if (possible > reportValue)
                {
                    Console.WriteLine(String.Format("\nThere are {0} primes less than {1}.", primes.Count, reportValue));
                    try
                    {
                        checked
                        {
                            reportValue *= 10;
                        }
                    }
                    catch (OverflowException)
                    {
                        reportValue = ulong.MaxValue;
                    }
                }
                if (IsPrime(possible))
                {
                    primes.Add(possible);
                    Console.Write("\r" + possible);
                }
            }
            Console.WriteLine(primes[primes.Count - 1]);
            Console.ReadLine();
        }
        static bool IsPrime(ulong value)
        {
            foreach (ulong prime in primes)
            {
                if (value % prime == 0) return false;
                if (prime * prime > value) break;
            }
            return true;
        }
    }
}
There are 2 basic schemes I see: 1) using all threads to test a single number, which is probably great for higher primes but I cannot really think of how to implement it, or 2) using each thread to test a single possible prime, which can cause a non-continuous string of primes to be found and run into unused resources problems when the next number to be tested is greater than the square of the highest prime found.
To me it feels like both of these situations are challenging only in the early stages of building the list of primes, but I'm not entirely sure. This is being done as a personal exercise in breaking up this kind of work.
If you want, you can parallelize both operations: the checking of a single prime, and the checking of multiple primes at once. Though I'm not sure this would help. To be honest, I'd consider removing the threading in main().
I've tried to stay faithful to your algorithm, but to speed it up a lot I've used x*x instead of reportValue; this is something you could easily revert if you wish.
To further improve on my core splitting, you could estimate the number of computations required to perform the divisions based on the size of the numbers and split the list that way (smaller numbers take less time to divide by, so make the first partitions larger).
Also, my concept of a thread pool may not exist in the form I want to use it.
Here's my go at it (pseudo-ish code):
List<int> primes = {2};
List<int> nextPrimes = {};
int cores = 4;
main()
{
    for (int x = 3; x < MAX; x=x*x){
        int localmax = x*x;
        for(int y = x; y < localmax; y+=2){
            thread{primecheck(y);}
        }
        "wait for all threads to be executed"
        primes.add(nextPrimes);
        nextPrimes = {};
    }
}
void primecheck(int y)
{
    bool primality = true; // assume prime until a divisor is found
    threadpool? pool;
    for(int x = 0; x < cores; x++){
        pool.add(thread{
            if (!smallcheck(x*primes.length/cores,(x+1)*primes.length/cores ,y)){
                primality = false;
                pool.kill();
            }
        });
    }
    "wait for all threads to be executed or killed"
    if (primality)
        nextPrimes.add(y);
}
bool smallcheck(int a, int b, int y){
    foreach (int div in primes[a to b])
        if (y%div == 0)
            return false;
    return true;
}
Edit: I added what I think pooling should look like; look at the revision history if you want to see it without.
Use the sieve of Eratosthenes instead. It's not worthwhile to parallelize unless you use a good algorithm in the first place.
Separate the space to sieve into large regions and sieve each in its own thread. Or better use some workqueue concept for large regions.
Use a bit array to represent the prime numbers, it takes less space than representing them explicitly.
See also this answer for a good implementation of a sieve (in Java, no split into regions).
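As a rough sketch of the region-splitting idea (not the implementation linked above), the following Java code sieves the base primes up to the square root once, then sieves fixed-size segments independently on a parallel stream, each with its own BitSet. The names SegmentedSieve, basePrimes, countSegment and countPrimes, and the segmentSize parameter, are assumptions made for this example.

```java
import java.util.BitSet;
import java.util.stream.IntStream;
import java.util.stream.LongStream;

// Illustrative sketch only; all names here are assumptions.
class SegmentedSieve {
    // Classic sieve of Eratosthenes: all primes in [2, limit].
    static int[] basePrimes(int limit) {
        BitSet composite = new BitSet(limit + 1);
        for (int p = 2; (long) p * p <= limit; p++) {
            if (!composite.get(p)) {
                for (int m = p * p; m <= limit; m += p) composite.set(m);
            }
        }
        return IntStream.rangeClosed(2, limit).filter(p -> !composite.get(p)).toArray();
    }

    // Counts primes in [lo, hi) using only the shared, read-only base primes.
    static int countSegment(long lo, long hi, int[] base) {
        BitSet composite = new BitSet((int) (hi - lo));   // one bit per number in the segment
        for (int p : base) {
            if ((long) p * p >= hi) break;                // no multiple >= p*p falls in this segment
            long start = Math.max((long) p * p, ((lo + p - 1) / p) * p);
            for (long m = start; m < hi; m += p) composite.set((int) (m - lo));
        }
        int count = 0;
        for (long v = Math.max(lo, 2); v < hi; v++) {
            if (!composite.get((int) (v - lo))) count++;
        }
        return count;
    }

    // Counts primes in [2, limit], sieving each segment as an independent parallel task.
    static long countPrimes(long limit, int segmentSize) {
        if (limit < 2) return 0;
        int root = (int) Math.sqrt((double) limit) + 1;
        int[] base = basePrimes(root);
        long segments = (limit + segmentSize) / segmentSize;   // enough segments to cover [0, limit]
        return LongStream.range(0, segments).parallel()
                .map(s -> countSegment(s * segmentSize,
                                       Math.min((s + 1) * segmentSize, limit + 1), base))
                .sum();
    }
}
```

Because segments share nothing but the read-only base primes, no locking is needed; the segment size is a tuning knob, and something that fits in cache is a reasonable starting point.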
