Will this cause a race condition? - multithreading

I made a program in C that creates 10 threads; inside each thread it adds 10,000 integers in the range [0-100]. When a thread finishes, it adds its partial sum to the total sum. It is unlikely that two threads will finish at exactly the same time, but if they do, will there be a problem?
#include <stdio.h>
#include <stdlib.h>     /* rand, srand */
#include <time.h>
#include <unistd.h>     /* sleep */
#include <pthread.h>

pthread_t pid[10];
int i = 0;
int sum;                /* shared total, updated by every thread */

void* partial(void *arg)
{
    int partial = 0;
    pthread_t id = pthread_self();
    int k = 0;
    for (k = 0; k < 10000; k++) {
        int r = rand() % 101;
        partial += r;
    }
    sum += partial;     /* unsynchronized update of the shared total */
    return NULL;
}

int main(void)
{
    srand(time(NULL));
    clock_t begin, end;
    double timeSpent;
    begin = clock();
    while (i < 10) {
        pthread_create(&(pid[i]), NULL, &partial, NULL);
        printf("\n Thread created successfully\n");
        i++;
    }
    sleep(10);          /* crude wait for the threads to finish */
    end = clock();
    timeSpent = (double)(end - begin);
    printf("\n Time taken: %f", timeSpent);
    printf("\n sum: %d \n", sum);
    return 0;
}

Yes, without locking with a mutex there is an (unlikely but possible) chance of a race condition.
If two threads finish at the same time, they will both try to modify the shared resource (sum) at the same time, and the shared resource may not be updated properly: in the statement sum += partial each thread reads sum, adds its partial result, and writes the value back, so both threads can read the same old value of sum and one of the updates is lost.
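One common fix is to protect the update with a pthread mutex. A minimal sketch of that change to the thread function above (sum_lock is a name introduced here, not part of the original code):

pthread_mutex_t sum_lock = PTHREAD_MUTEX_INITIALIZER;   /* guards the shared total */

void* partial(void *arg)
{
    int p = 0;
    int k;
    for (k = 0; k < 10000; k++)
        p += rand() % 101;

    pthread_mutex_lock(&sum_lock);    /* only one thread at a time may update sum */
    sum += p;
    pthread_mutex_unlock(&sum_lock);
    return NULL;
}

It would also be more robust to pthread_join each thread in main instead of sleep(10), so that sum is only read after every thread has actually finished.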

Related

Execution time of very short function

I am required to display the execution time of some searching algorithms. However, when I use start_t/end_t = clock(), it always displays 0.00000 because the resolution is too low for such short functions (even when the result is stored in a double).
Please tell me how to display those running times.
int LinearSearch(int M[], int target, int size)
{
    int k;
    for (k = 0; k < size; k++)
    {
        if (M[k] == target)
        {
            return k;
        }
    }
    return -1;   /* not found; the original version fell off the end without returning */
}

int LinearSentinelSearch(int M[], int target, int size)
{
    int k = 0;
    M[size] = target;          /* sentinel: M must have room for size + 1 elements */
    while (M[k] != target)
        k++;
    return k;
}

int binSearch(int List[], int Target, int Size)
{
    int Mid;
    int low = 0;
    int high = Size - 1;
    while (low <= high)
    {
        Mid = (low + high) / 2;
        if (List[Mid] == Target)
            return Mid;
        else if (Target < List[Mid])
            high = Mid - 1;
        else
            low = Mid + 1;
    }
    return -1;
}
You can calculate the mean execution time by executing the algorithm N times (for some large N) and dividing the total time by N. Using your binSearch as an example:
int i;
clock_t start, end;

start = clock();
for (i = 0; i < 1000; i++) {
    binSearch(/* your actual parameters here */);
}
end = clock();

printf("Mean ticks to execute binSearch: %f\n", (end - start) / 1000.0);

Having trouble with pointers in C++

I am trying to access an array from inside a function, but I get "Error C2065: 'i': undeclared identifier." I know I am making a mistake with the pointer. I was able to pull information from the array in the function below the one I'm having issues with, so I'm not sure why I can't do the same thing here. Thank you for your time.
#include <iostream>
#include <cmath>
using namespace std;

double mean(int size, int* numbers);
double sDeviation(int numOfScores, int average, int* scores);
int histogram(int numOfScores, int* scores); //<<<This is what I'm having trouble with

int main()
{
    int count = 0;
    int scores[100];
    while (true)
    {
        int scoreToBeEntered;
        cout << "Please enter a score: ";
        cin >> scoreToBeEntered;
        if (scoreToBeEntered == NULL)
            cout << "No value entered" << endl;
        else if (scoreToBeEntered != -1)
            scores[count++] = scoreToBeEntered;
        else
            break;
    }
    for (int i = 9; i >= 0; i--)
        cout << i << "|" << endl;
    cout << "SD: " << sDeviation(count, mean(count, scores), scores) << endl;
    system("pause");
    return 0;
}

int histogram(int numOfScores, int* scores) //this is where the issue starts
{
    int* bins = new int[10];
    for(int i = 0; i < numOfScores; i++);
        if(scores[i] >= 90) //<<<<This is the undeclared "i"
        {
            bins[9]++;
        }
}

double sDeviation(int numOfScores, int average, int* scores)
{
    double deviation = 0;
    for (int i = 0; i < numOfScores; i++)
        deviation += pow(scores[i] - average, 2);
    return sqrt(deviation / numOfScores);
}

double mean(int size, int* numbers)
{
    double sum = 0;
    for (int i = 0; i < size; i++)
        sum += numbers[i];
    return sum / size;
}
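For what it's worth, the undeclared-identifier error most likely comes from the stray semicolon at the end of for(int i = 0; i < numOfScores; i++); — it closes the loop with an empty body, so i has already gone out of scope when the if line uses it. A minimal sketch of histogram with that semicolon removed (the zero-initialization, delete[], and return value are additions made here so the fragment is self-contained, not something the original code had):

int histogram(int numOfScores, int* scores)
{
    int* bins = new int[10]();              // () value-initializes all bins to zero
    for (int i = 0; i < numOfScores; i++)   // no trailing semicolon: the if below is the loop body
    {
        if (scores[i] >= 90)
        {
            bins[9]++;
        }
    }
    int topBin = bins[9];
    delete[] bins;
    return topBin;                          // placeholder return; the original returned nothing
}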

Why can I duplicate a string this way but not the other way?

This code works fine for me and I understand it:
#include <stdio.h>
#include <string.h>

/* appends a copy of s to itself; s must have room for 2*strlen(s)+1 characters */
char *strduplica(char *s)
{
    int i, len = strlen(s);
    for (i = 0; i < len; i++)
        s[i + len] = s[i];
    s[i + len] = '\0';
    return s;
}

int main(void)
{
    char s[20] = "Ana";
    puts(strduplica(s));
    return 0;
}
Before, I tried this and I got a "Segmentation Fault". Why?:
for (i = 0; i < len; i++)
    s[len++] = s[i];
s[len] = '\0';
The output should be: "AnaAna".
Because you were incrementing len, which is used in the for loop's termination condition:
for (i = 0; i < len; i++)
In every iteration both i and len are incremented, so i always stays less than len and the loop never terminates.
Eventually the loop writes beyond the end of the array, which causes the segmentation fault.
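One way to make the second version work is to keep the loop bound in a variable that the loop body does not modify, so that incrementing len no longer affects termination. A small sketch of that idea:

int i, n = strlen(s);   /* n is the original length and is never modified */
int len = n;            /* len is the position where the copy is written */
for (i = 0; i < n; i++)
    s[len++] = s[i];
s[len] = '\0';          /* for "Ana" this yields "AnaAna" */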

time command shows user time greater than real time

I have more or less the same question as
linux time command resulting real is less than user
and
user time larger than real time
but can't post a comment on those questions.
When I run the non-multi-threaded program given below, I occasionally get user time greater than real time with both /usr/bin/time and bash's builtin time. I don't see anything that might use a different core. Is rand() somehow the culprit? How? Thanks!
#include <stdio.h>
#include <stdlib.h>

#define N 100
#define MM_MAX 50000

int
main(int ac, char **av)
{
    unsigned int i, j, k, n;
    int A[N][N], B[N][N], C[N][N];

    if (ac != 2) {
        fprintf(stderr, "Usage: matmul <seed>");
        exit(1);
    }
    srand((unsigned int) atoi(av[1]));
    for (n = 0; n < atoi(av[1]); n++) {
        for (i = 0; i < N; i++) {
            for (j = 0; j < N; j++) {
                A[i][j] = rand() % MM_MAX;
                B[i][j] = rand() % MM_MAX;
            }
        }
        for (i = 0; i < N; i++) {
            for (j = 0; j < N; j++) {
                C[i][j] = 0;
                for (k = 0; k < N; k++) {
                    C[i][j] += A[i][k] * B[k][j];
                }
                printf("%7d ", C[i][j]);
            }
            putchar('\n');
        }
    }
    return 0;
}
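One way to narrow this down is to measure wall-clock and CPU time from inside the program itself and compare them with what time reports. A sketch using the standard clock_gettime clocks (CLOCK_MONOTONIC for wall time, CLOCK_PROCESS_CPUTIME_ID for process CPU time; older glibc versions need -lrt when linking):

#include <stdio.h>
#include <time.h>

/* returns end - start in seconds */
static double diff_sec(struct timespec start, struct timespec end)
{
    return (end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1e9;
}

int main(void)
{
    struct timespec wall0, wall1, cpu0, cpu1;

    clock_gettime(CLOCK_MONOTONIC, &wall0);
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &cpu0);

    /* the matrix-multiply loop from the question would go here */

    clock_gettime(CLOCK_MONOTONIC, &wall1);
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &cpu1);

    printf("wall: %f s, cpu: %f s\n",
           diff_sec(wall0, wall1), diff_sec(cpu0, cpu1));
    return 0;
}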

OpenMP nested tasking, 1 thread not executing tasks

I'm doing some tests with the simple code written below.
The problem is that on a four-core machine I only get 75% load; the fourth core idles, doing nothing. The code has an omp parallel, then an omp single inside of which the thread creates a task. That task creates a number of grandchild tasks. The task waits on a taskwait until all of its children (grandchildren of the thread in the single region) finish, and the thread executing the single region waits on another taskwait until its direct descendant task finishes. The problem is that the thread executing the single region does not execute any of the grandchild tasks. Given the block size I'm using, I'm creating thousands of tasks, so it's not a problem of available parallelism.
Am I misunderstanding OpenMP tasking? Is this related to taskwait waiting only for direct children? If so, how can I get the idle thread to execute the available work? Imagine I wanted to create tasks with dependencies as in OpenMP 4.0: then I would not be able to exploit all the available threads. The taskwait in the parent task would still be needed, since I would not want to release tasks that depend on it until all of its children have finished.
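In other words, my understanding is that each taskwait only fences the tasks created directly by the task (or thread) that executes it; a self-contained toy version of the structure I mean (do_block is just a stand-in for the real work):

#include <cstdio>
#include <omp.h>

void do_block(int b)
{
    std::printf("block %d run by thread %d\n", b, omp_get_thread_num());
}

int main()
{
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task                         // child task, created by the single thread
        {
            for (int b = 0; b < 8; b++)
            {
                #pragma omp task firstprivate(b) // grandchild tasks doing the work
                do_block(b);
            }
            #pragma omp taskwait                 // the child waits for its grandchildren
        }
        #pragma omp taskwait                     // the single thread waits for the child only
    }
    return 0;
}

The full test code: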
#include <iostream>
#include <cstdlib>
#include <omp.h>
using namespace std;

#define VECSIZE 200000000

float* A;
float* B;
float* C;

void LoopDo(int start, int end)
{
    for (int i = start; i < end; i++)
    {
        C[i] += A[i]*B[i];
        A[i] *= (B[i]+C[i]);
        B[i] = C[i] + A[i];
        C[i] *= (A[i]*C[i]);
        C[i] += A[i]*B[i];
        C[i] += A[i]*B[i];
        // .... (more statements of the same form elided)
    }
}

void StartTasks(int bsize)
{
    int nthreads = omp_get_num_threads();
    cout << "bsize is: " << bsize << endl;
    cout << "nthreads is: " << nthreads << endl;
    #pragma omp task default(shared)
    {
        for (int i = 0; i < VECSIZE; i += bsize)
        {
            #pragma omp task default(shared) firstprivate(i,bsize)
            LoopDo(i, i+bsize);
            if (i + bsize >= VECSIZE) bsize = VECSIZE - i;
        }
        cerr << "Task creation ended" << endl;
        #pragma omp taskwait
    }
    #pragma omp taskwait
}
int main(int argc, char** argv)
{
    A = (float*)malloc(VECSIZE*sizeof(float));
    B = (float*)malloc(VECSIZE*sizeof(float));
    C = (float*)malloc(VECSIZE*sizeof(float));
    int bsize = atoi(argv[1]);
    for (int i = 0; i < VECSIZE; i++)
    {
        A[i] = i; B[i] = i; C[i] = i;
    }
    #pragma omp parallel
    {
        #pragma omp single
        {
            StartTasks(bsize);
        }
    }
    free(A);
    free(B);
    free(C);
    return 0;
}
EDIT:
I tested with ICC 15.0 and it uses all the cores of my machine, although ICC forks 5 threads instead of the 4 that GCC does; the fifth ICC thread remains idle.
EDIT 2:
The following change, adding a loop that creates as many top-level tasks as there are threads, keeps all threads fed with tasks. If there are fewer top-level tasks than threads, then in some runs the master thread does not execute any task and remains idle as before. ICC, as before, generates a binary that uses all cores.
for (int i = 0; i < nthreads; i++)
{
    #pragma omp task default(shared)
    {
        for (int i = 0; i < VECSIZE; i += bsize)   // this i shadows the outer loop variable
        {
            #pragma omp task default(shared) firstprivate(i,bsize)
            LoopDo(i, i+bsize);
            if (i + bsize >= VECSIZE) bsize = VECSIZE - i;
        }
        cerr << "Task creation ended" << endl;
        #pragma omp taskwait
    }
}
#pragma omp taskwait
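If your compiler supports OpenMP 4.5, taskloop is another option worth considering: it packages the loop iterations into tasks for you and ends with an implicit taskgroup, so the producer task and the hand-written blocking loop are not needed at all. A sketch of what StartTasks could look like under that assumption (untested against this exact setup):

void StartTasksTaskloop(int bsize)
{
    // called from inside the parallel/single region, like StartTasks;
    // grainsize(bsize) asks for roughly bsize iterations per generated task
    #pragma omp taskloop grainsize(bsize)
    for (int i = 0; i < VECSIZE; i++)
    {
        C[i] += A[i]*B[i];
        A[i] *= (B[i]+C[i]);
        B[i] = C[i] + A[i];
        C[i] *= (A[i]*C[i]);
        C[i] += A[i]*B[i];
    }
    // implicit taskgroup: every taskloop task has completed by this point
}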
