How to implement barrier using posix semaphores?

How to implement barrier using posix semaphores? - multithreading

How to implement barrier using posix semaphores?
void my_barrier_init(int a){
int i;
bar.number = a;
bar.counter = 0;
bar.arr = (sem_t*) malloc(sizeof(sem_t)*bar.number);
bar.cont = (sem_t*) malloc(sizeof(sem_t)*bar.number);
for(i = 0; i < bar.number; i++){
sem_init(&bar.arr[i], 0, 0);
sem_init(&bar.cont[i], 0, 0); }
}
void my_barrier_wait(){
int i;
bar.counter++;
if(bar.number == bar.counter){
for(i = 0; i < bar.number-1; i++){ sem_wait(&bar.arr[i]); }
for(i = 0; i < bar.number-1; i++){ sem_post(&bar.cont[i]); }
bar.counter = 0;
}else{
sem_post(&bar.arr[pthread_self()-2]);
sem_wait(&bar.cont[pthread_self()-2]);
}
}
When function my_barrier_wait is called, first (N-1) times it would set(+1) for semaphores in array 'arr' and go to sleep(calling sem_wait). N-th time it decrements semaphores in 'arr' and SHOULD (as I expect) wake up [0..bar.number-1] threads posting +1 for semaphores in 'cont' array. It doesn't work like barrier.

You need to look at this (PDF), The Little Book of Semaphores by Allen Downey. Specifically section 3.6.7. It's in Python, but gist of it should be clear enough.

Related

I have some trouble with multi threading

I am struggling with this code. It's about counting some patterns from a text file. I tried to use thread(divide and conquer) processing, but it return a wrong value. I used mutex value to synchronize critical section.. The code is below. First main argument is number of threads, second is the name of a text file I want counting patterns from, and following patterns I want to look up on the text.
please save me..
Code is below
char *buffer;
int fsize, count;
char **searchword;
int *wordcount;
int *strlength;
typedef struct _params{
int num1;
int num2;
}params;
pthread_mutex_t mutex;
void *childfunc(void *arg)
{
int size, i, j, k, t, start, end, len, flag = -1;
int result;
params *a = (params *)arg;
start = a->num1;
end = a->num2;
while(1){
if(start == 0 || start == fsize)
break;
if(buffer[start]!= ' ' && buffer[start] != '\n' && buffer[start] != '\t')
start++;
else
break;
}
while(1){
if(end == fsize)
break;
if(buffer[end] != ' ' && buffer[end] != '\n' && buffer[end] != '\t')
end++;
else
break;
}
for(i = 0; i < count; i++){
len = strlength[i];
for(j = start; j<(end - len + 1); j++){
if(buffer[j] == searchword[i][0]){
flag = 0;
for(k = j +1; k<j + len; k++){
if(buffer[k] != searchword[i][k-j])
{
flag = 1;
break;
}
}
if(flag == 0){
pthread_mutex_lock(&mutex);
wordcount[i]++;
pthread_mutex_unlock(&mutex);// mutex unlocking
sleep(1);
flag = -1;
}
}
}
}
}
int main(int argc, char **argv){
FILE *fp;
char *inputFile;
pthread_t *tid;
int *status;
int inputNumber, i, j, diff, searchstart, searchend;
int result = 0;
count = argc -3;
inputNumber = atoi(argv[1]);
inputFile = argv[2];
searchword = (char **)malloc(sizeof(char *)*count);
tid = malloc(sizeof(pthread_t)*inputNumber);
strlength = (int *)malloc(4*count);
status = (int *)malloc(4*inputNumber);
wordcount = (int *)malloc(4*count);
for(i = 0; i < count; i++)
searchword[i] = (char*)malloc(sizeof(char)*(strlen(argv[i+3]) + 1));
for(i = 3; i < argc; i++)
strcpy(searchword[i-3], argv[i]);
fp = fopen(inputFile, "r");
fseek(fp, 0, SEEK_END);
fsize = ftell(fp);
rewind(fp);
buffer = (char *)malloc(1*fsize);
fread(buffer, fsize, 1, fp);
diff = fsize / inputNumber;
if(diff == 0)
diff = 1;
for(i = 0; i < count ; i++){
strlength[i] = strlen(searchword[i]);
wordcount[i] = 0;
}
for(i = 0; i < inputNumber; i++){
searchstart = 0 + i*diff;
searchend = searchstart + diff;
if(searchstart > fsize)
searchstart = fsize;
if(searchend > fsize)
searchend = fsize;
if( i == inputNumber -1)
searchend = fsize;
params a;
a.num1 = searchstart;
a.num2 = searchend;
pthread_mutex_init(&mutex, NULL);
result = pthread_create(&tid[i], NULL, childfunc, (void *)&a);
if(result < 0){
perror("pthread_create()");
}
}
//스레드 받는 부분
for(i = 0; i < inputNumber; i++){
result = pthread_join(tid[i], (void **)status);
if(result < 0)
perror("pthread_join()");
}
pthread_mutex_destroy(&mutex); // mutex 해제
for(i = 0; i < count; i++)
printf("%s : %d \n", searchword[i], wordcount[i]);
for(i = 0; i < count; i++) //동적메모리해제
free(searchword[i]);
free(searchword);
free(buffer);
free(tid);
free(strlength);
free(wordcount);
free(status);
fclose(fp);
return 0;
}

params a; // new a for each loop, previous a no longer exists
a.num1 = searchstart;
a.num2 = searchend;
pthread_mutex_init(&mutex, NULL);
result = pthread_create(&tid[i], NULL, childfunc, (void *)&a);
if(result < 0){
perror("pthread_create()");
}
} // a goes out of scope here
You pass each thread the address of a, but then a goes out of scope immediately after you create the thread. So the thread now has the address of some random leftover junk on the stack.
You need to have some conception of ownership of any object that's accessed by more than one thread like a is here. It can be owned by the thread that called pthread_create, owned by the newly-created thread, or jointly owned. But you have to be consistent. You have neither of these, why?
Is a owned only be the thread that called pthread_create? No, because the newly-created thread has a pointer to it and accesses it through that pointer. So the thread that called pthread_create cannot destroy it.
Is a owned only be the thread created by pthread_create? No, because it's on the stack of the thread that called pthread_create and will cease to exist when the next loop comes around.
Is a jointly owned? Well, no, because the thread that called pthread_create can destroy the object before the newly-created thread accesses it.
So no sane model of multi-thread use is followed by a. It's broken.
One common solution to this problem is to allocate a new structure (using malloc or new) for each thread, fill it in, and pass the thread the address of the structure. Let the thread free (or delete) the structure when it's done with it.

mutex and its effect on execution time (and cpu usage)

I wrote a very simple test program to examine efficiency of pthread mutex. But I'm not able to analyse the results I get. (I can see 4 CPUs in Linux System Monitor and that's why I have at least 4 active threads, because I want to keep all of them busy.) The existence of mutex is not necessary in the code.
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
pthread_mutex_t lock1, lock2, lock3, lock4;
void do_sth() { /* just open a files, read it and copy to another file */
int i;
for (i = 0; i < 1; i++) {
FILE* fp = fopen("(2) Catching Fire.txt", "r");
if (fp == NULL) {
fprintf(stderr, "could not open file\n");
exit(1);
}
char filename[20];
sprintf(filename, "a%d", (int)pthread_self());
FILE* wfp = fopen(filename, "w");
if (wfp == NULL) {
fprintf(stderr, "could not open file for write\n");
exit(1);
}
int c;
while (c = fgetc(fp) != EOF) {
c++;
fputc(c, wfp);
}
close(fp);
close(wfp);
}
}
void* routine1(void* param) {
pthread_mutex_lock(&lock1);
do_sth();
pthread_mutex_unlock(&lock1);
}
void* routine2(void* param) {
pthread_mutex_lock(&lock2);
do_sth();
pthread_mutex_unlock(&lock2);
}
void* routine3(void* param) {
pthread_mutex_lock(&lock3);
do_sth();
pthread_mutex_unlock(&lock3);
}
void* routine4(void* param) {
pthread_mutex_lock(&lock4);
do_sth();
pthread_mutex_unlock(&lock4);
}
int main(int argc, char** argv) {
int i ;
pthread_mutex_init(&lock1, 0);
pthread_mutex_init(&lock2, 0);
pthread_mutex_init(&lock3, 0);
pthread_mutex_init(&lock4, 0);
pthread_t thread1[4];
pthread_t thread2[4];
pthread_t thread3[4];
pthread_t thread4[4];
for (i = 0; i < 4; i++)
pthread_create(&thread1[i], NULL, routine1, NULL);
for (i = 0; i < 4; i++)
pthread_create(&thread2[i], NULL, routine2, NULL);
for (i = 0; i < 4; i++)
pthread_create(&thread3[i], NULL, routine3, NULL);
for (i = 0; i < 4; i++)
pthread_create(&thread4[i], NULL, routine4, NULL);
for (i = 0; i < 4; i++)
pthread_join(thread1[i], NULL);
for (i = 0; i < 4; i++)
pthread_join(thread2[i], NULL);
for (i = 0; i < 4; i++)
pthread_join(thread3[i], NULL);
for (i = 0; i < 4; i++)
pthread_join(thread4[i], NULL);
printf("Hello, World!\n");
}
I execute this program in two ways, with and without all the mutex. and I measure time of execution (using time ./a.out) and average cpu load (using htop). here is the results:
first: when I use htop, I can see that loadavg of the system considerably increases when I do not use any mutex in the code. I have no idea why this happens. (is 4 active threads not enough to get the most out of 4 CPUs?)
second: It takes (a little) less time for the program to execute with all those mutex than without it. why does it happen? I mean, it should take some time to sleep and wake up a thread.
edit: I guess, when I use locks I put other threads to sleep and it eliminates a lot of context-switch (saving some time), could this be the reason?

You are using one lock per thread, so that's why when you use all the mutexes you don't see an increase in the execution time of the application: dosth() is not actually being protected from concurrent execution.
Since all the threads are working on the same file, all they should be accessing it using the same lock (otherwise you will have incorrect results: all the threads trying to modify the file at the same time).
Try running again the experiments using just one global lock.

Communication between two pthreads to display

I have looked at other questions and still cant seem to understand this concept. I have created four threads and each thread needs to communicate with the same thread. (Min and Display), (Max and Display), and so on..
I know I need to use:
pthread_mutex_t myLock1 = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_lock(&myLock1);
(some global variable) = ....;
pthread_mutex_unlock(&myLock1);
Here is one of the threads I have created:
pthread_create(&minThread, NULL, (void *) mini, (void *) numbers);
pthread_join(minThread, (void *) &min);
void *mini(void *numbs1)
{
int *numbers = (int *) numbs1;
min = malloc(sizeof(int));
*min = (numbers[0]);
for (i = 0; i < length; i++) //Computing min value
{
if ( numbers[i] < *min )
{
*min = numbers[i];
}
}
printf("Min has been computed.\n");
pthread_exit(min);
}
I am now supposed to incorporate the use of another thread (disThread) to display the variable I have computed in minThread (min) and other similar threads that have done computations using those mutex system calls I have specified. How can I set this up correctly? Any tips are appreciated to further my understanding.

Turns out that using the mutex system calls are pretty simple. For anyone else who runs into a similar problem, the thread functions void *mini and void *disi follow this format:
void *mini(void *numbs1)
{
pthread_mutex_lock(&theLock);
int *numbers = (int *) numbs1;
min = malloc(sizeof(int));
*min = (numbers[0]);
for (i = 0; i < LENGTH; i++) //Computing min value
{
if ( numbers[i] < *min )
{
*min = numbers[i];
}
}
pthread_mutex_unlock(&theLock);
printf("Min has been computed.\n");
pthread_exit(min);
}
void *disi()
{
pthread_mutex_lock(&theLock);
x = *min; //Displaying min value (x initialized as global variable)
pthread_mutex_unlock(&theLock);
printf("The Minimum value is %d\n", x);
}
A mutex lock and mutex unlock should cover the range of a shared variable between two threads. This is so there is no interference between the two threads working simultaneously.

pthread_join skipping first thread

So I have the following code, some of it left out so it's easier to understand.
for (unsigned int t = 0; t < NUM_THREADS; t++)
{
if (pthread_create(&threads[t], NULL, thread_run, (void*) &threadData) != 0)
{
perror("pthread_create");
}//end if
}
for (unsigned int z = 0; z < NUM_THREADS; z++)
{
if (pthread_join(threads[z], NULL) != 0)
{
perror("pthread_join");
}
}
My Problem is the join function, it is skipping the first thread I create, and continuing on. The current solution I have is adding an extra thread and not making the first one do any work.
Any ideas why this might be happening?

IMO there is not pthreads problem; you are just creating NUM_THREADS + 1 threads and join only first NUM_THREADS of them.

How to parallelize Sudoku solver using Grand Central Dispatch?

As a programming exercise, I just finished writing a Sudoku solver that uses the backtracking algorithm (see Wikipedia for a simple example written in C).
To take this a step further, I would like to use Snow Leopard's GCD to parallelize this so that it runs on all of my machine's cores. Can someone give me pointers on how I should go about doing this and what code changes I should make? Thanks!
Matt

Please let me know if you end up using it. It is run of the mill ANSI C, so should run on everything. See other post for usage.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
short sudoku[9][9];
unsigned long long cubeSolutions=0;
void* cubeValues[10];
const unsigned char oneLookup[64] = {0x8b, 0x80, 0, 0x80, 0, 0, 0, 0x80, 0, 0,0,0,0,0,0, 0x80, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0x80,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
int ifOne(int val) {
if ( oneLookup[(val-1) >> 3] & (1 << ((val-1) & 0x7)) )
return val;
return 0;
}
void init_sudoku() {
int i,j;
for (i=0; i<9; i++)
for (j=0; j<9; j++)
sudoku[i][j]=0x1ff;
}
void set_sudoku( char* initialValues) {
int i;
if ( strlen (initialValues) != 81 ) {
printf("Error: inputString should have length=81, length is %2.2d\n", strlen(initialValues) );
exit (-12);
}
for (i=0; i < 81; i++)
if ((initialValues[i] > 0x30) && (initialValues[i] <= 0x3a))
sudoku[i/9][i%9] = 1 << (initialValues[i] - 0x31) ;
}
void print_sudoku ( int style ) {
int i, j, k;
for (i=0; i < 9; i++) {
for (j=0; j < 9; j++) {
if ( ifOne(sudoku[i][j]) || !style) {
for (k=0; k < 9; k++)
if (sudoku[i][j] & 1<<k)
printf("%d", k+1);
} else
printf("*");
if ( !((j+1)%3) )
printf("\t");
else
printf(",");
}
printf("\n");
if (!((i+1) % 3) )
printf("\n");
}
}
void print_HTML_sudoku () {
int i, j, k, l, m;
printf("<TABLE>\n");
for (i=0; i<3; i++) {
printf(" <TR>\n");
for (j=0; j<3; j++) {
printf(" <TD><TABLE>\n");
for (l=0; l<3; l++) { printf(" <TR>"); for (m=0; m<3; m++) { printf("<TD>"); for (k=0; k < 9; k++) { if (sudoku[i*3+l][j*3+m] & 1<<k)
printf("%d", k+1);
}
printf("</TD>");
}
printf("</TR>\n");
}
printf(" </TABLE></TD>\n");
}
printf(" </TR>\n");
}
printf("</TABLE>");
}
int doRow () {
int count=0, new_value, row_value, i, j;
for (i=0; i<9; i++) {
row_value=0x1ff;
for (j=0; j<9; j++)
row_value&=~ifOne(sudoku[i][j]);
for (j=0; j<9; j++) {
new_value=sudoku[i][j] & row_value;
if (new_value && (new_value != sudoku[i][j]) ) {
count++;
sudoku[i][j] = new_value;
}
}
}
return count;
}
int doCol () {
int count=0, new_value, col_value, i, j;
for (i=0; i<9; i++) {
col_value=0x1ff;
for (j=0; j<9; j++)
col_value&=~ifOne(sudoku[j][i]);
for (j=0; j<9; j++) {
new_value=sudoku[j][i] & col_value;
if (new_value && (new_value != sudoku[j][i]) ) {
count++;
sudoku[j][i] = new_value;
}
}
}
return count;
}
int doCube () {
int count=0, new_value, cube_value, i, j, l, m;
for (i=0; i<3; i++)
for (j=0; j<3; j++) {
cube_value=0x1ff;
for (l=0; l<3; l++)
for (m=0; m<3; m++)
cube_value&=~ifOne(sudoku[i*3+l][j*3+m]);
for (l=0; l<3; l++)
for (m=0; m<3; m++) {
new_value=sudoku[i*3+l][j*3+m] & cube_value;
if (new_value && (new_value != sudoku[i*3+l][j*3+m]) ) {
count++;
sudoku[i*3+l][j*3+m] = new_value;
}
}
}
return count;
}
#define FALSE -1
#define TRUE 1
#define INCOMPLETE 0
int validCube () {
int i, j, l, m, r, c;
int pigeon;
int solved=TRUE;
//check horizontal
for (i=0; i<9; i++) {
pigeon=0;
for (j=0; j<9; j++)
if (ifOne(sudoku[i][j])) {
if (pigeon & sudoku[i][j]) return FALSE;
pigeon |= sudoku[i][j];
} else {
solved=INCOMPLETE;
}
}
//check vertical
for (i=0; i<9; i++) {
pigeon=0;
for (j=0; j<9; j++)
if (ifOne(sudoku[j][i])) {
if (pigeon & sudoku[j][i]) return FALSE;
pigeon |= sudoku[j][i];
}
else {
solved=INCOMPLETE;
}
}
//check cube
for (i=0; i<3; i++)
for (j=0; j<3; j++) {
pigeon=0;
r=j*3; c=i*3;
for (l=0; l<3; l++)
for (m=0; m<3; m++)
if (ifOne(sudoku[r+l][c+m])) {
if (pigeon & sudoku[r+l][c+m]) return FALSE;
pigeon |= sudoku[r+l][c+m];
}
else {
solved=INCOMPLETE;
}
}
return solved;
}
int solveSudoku(int position ) {
int status, i, k;
short oldCube[9][9];
for (i=position; i < 81; i++) {
while ( doCube() + doRow() + doCol() );
status = validCube() ;
if ((status == TRUE) || (status == FALSE))
return status;
if ((status == INCOMPLETE) && !ifOne(sudoku[i/9][i%9]) ) {
memcpy( &oldCube, &sudoku, sizeof(short) * 81) ;
for (k=0; k < 9; k++) {
if ( sudoku[i/9][i%9] & (1<<k) ) {
sudoku[i/9][i%9] = 1 << k ;
if (solveSudoku(i+1) == TRUE ) {
/* return TRUE; */
/* Or look for entire set of solutions */
if (cubeSolutions < 10) {
cubeValues[cubeSolutions] = malloc ( sizeof(short) * 81 ) ;
memcpy( cubeValues[cubeSolutions], &sudoku, sizeof(short) * 81) ;
}
cubeSolutions++;
if ((cubeSolutions & 0x3ffff) == 0x3ffff ) {
printf ("cubeSolutions = %llx\n", cubeSolutions+1 );
}
//if ( cubeSolutions > 10 )
// return TRUE;
}
memcpy( &sudoku, &oldCube, sizeof(short) * 81) ;
}
if (k==8)
return FALSE;
}
}
}
return FALSE;
}
int main ( int argc, char** argv) {
int i;
if (argc != 2) {
printf("Error: number of arguments on command line is incorrect\n");
exit (-12);
}
init_sudoku();
set_sudoku(argv[1]);
printf("[----------------------- Input Data ------------------------]\n\n");
print_sudoku(1);
solveSudoku(0);
if ((validCube()==1) && !cubeSolutions) {
// If sudoku is effectively already solved, cubeSolutions will not be set
printf ("\n This is a trivial sudoku. \n\n");
print_sudoku(1);
}
if (!cubeSolutions && validCube()!=1)
printf("Not Solvable\n");
if (cubeSolutions > 1) {
if (cubeSolutions >= 10)
printf("10+ Solutions, returning first 10 (%lld) [%llx] \n", cubeSolutions, cubeSolutions);
else
printf("%llx Solutions. \n", cubeSolutions);
}
for (i=0; (i < cubeSolutions) && (i < 10); i++) {
memcpy ( &sudoku, cubeValues[i], sizeof(short) * 81 );
printf("[----------------------- Solution %2.2d ------------------------]\n\n", i+1);
print_sudoku(0);
//print_HTML_sudoku();
}
return 0;
}

For one, since backtracking is a depth-first search it is not directly parallelizable, since any newly computed result cannot be used be directly used by another thread. Instead, you must divide the problem early, i.e. thread #1 starts with the first combination for a node in the backtracking graph, and proceeds to search the rest of that subgraph. Thread #2 starts with the second possible combination at the first and so forth. In short, for n threads find the n possible combinations on the top level of the search space (do not "forward-track"), then assign these n starting points to n threads.
However I think the idea is fundamentally flawed: Many sudoku permutations are solved in a matter of a couple thousands of forward+backtracking steps, and are solved within milliseconds on a single thread. This is in fact so fast that even the small coordination required for a few threads (assume that n threads reduce computation time to 1/n of original time) on a multi-core/multi-CPU is not negligible compared to the total running time, thus it is not by any chance a more efficient solution.

Are you sure you want to do that? Like, what problem are you trying to solve? If you want to use all cores, use threads. If you want a fast sudoku solver, I can give you one I wrote, see output below. If you want to make work for yourself, go ahead and use GCD ;).
Update:
I don't think GCD is bad, it just isn't terribly relevant to the task of solving sudoku. GCD is a technology to tie GUI events to code. Essentially, GCD solves two problems, a Quirk in how the MacOS X updates windows, and, it provides an improved method (as compared to threads) of tying code to GUI events.
It doesn't apply to this problem because Sudoku can be solved significantly faster than a person can think (in my humble opinion). That being said, if your goal was to solve Sudoku faster, you would want to use threads, because you would want to directly use more than one processor.
[bear#bear scripts]$ time ./a.out ..1..4.......6.3.5...9.....8.....7.3.......285...7.6..3...8...6..92......4...1...
[----------------------- Input Data ------------------------]
*,*,1 *,*,4 *,*,*
*,*,* *,6,* 3,*,5
*,*,* 9,*,* *,*,*
8,*,* *,*,* 7,*,3
*,*,* *,*,* *,2,8
5,*,* *,7,* 6,*,*
3,*,* *,8,* *,*,6
*,*,9 2,*,* *,*,*
*,4,* *,*,1 *,*,*
[----------------------- Solution 01 ------------------------]
7,6,1 3,5,4 2,8,9
2,9,8 1,6,7 3,4,5
4,5,3 9,2,8 1,6,7
8,1,2 6,4,9 7,5,3
9,7,6 5,1,3 4,2,8
5,3,4 8,7,2 6,9,1
3,2,7 4,8,5 9,1,6
1,8,9 2,3,6 5,7,4
6,4,5 7,9,1 8,3,2
real 0m0.044s
user 0m0.041s
sys 0m0.001s

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

How to implement barrier using posix semaphores? - multithreading

You need to look at this (PDF), The Little Book of Semaphores by Allen Downey. Specifically section 3.6.7. It's in Python, but gist of it should be clear enough.

Related

I have some trouble with multi threading

mutex and its effect on execution time (and cpu usage)

Communication between two pthreads to display

pthread_join skipping first thread

How to parallelize Sudoku solver using Grand Central Dispatch?

Categories

Resources