Pthread create function - multithreading

I am having difficulty understanding the creation of a pthread.
This is the function I declared at the beginning of my code:
void *mini(void *numbers); //Thread calls this function
Initialization of the thread:
pthread_t minThread;
pthread_create(&minThread, NULL, (void *) mini, NULL);
void *mini(void *numbers)
{
    min = (numbers[0]);
    for (i = 0; i < 8; i++)
    {
        if ( numbers[i] < min )
        {
            min = numbers[i];
        }
    }
    pthread_exit(0);
}
numbers is an array of integers
int numbers[8];
I'm not sure if I created the pthread correctly.
In the function, mini, I get the following error about setting min (declared as an int) equal to numbers[0]:
Assigning to 'int' from incompatible type 'void'
My objective is to compute the minimum value in numbers[] (min) in this thread and later pass that value to another thread to display it. Thanks for any help I can get.

You need to pass 'numbers' as the last argument to pthread_create(). The new thread can then call 'mini' on its own stack with 'numbers' as the argument.
In 'mini', you should cast the void* back to an integer array in order to dereference it correctly - you cannot dereference a void* directly; it does not point to anything. :)
Also, it's very confusing to have multiple vars in different threads with the name 'numbers'.

There are some minor improprieties in this program, but it illustrates basically what you want to do. You should play around with it, break it, and improve it.
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

void *mini(void *numbs)
{
    int *numbers = (int *) numbs;
    int *min = malloc(sizeof(int));
    *min = numbers[0];
    for (int i = 0; i < 8; i++)
        if (numbers[i] < *min)
            *min = numbers[i];
    pthread_exit(min);
}

int main(int argc, char *argv[])
{
    pthread_t minThread;
    int *min;
    int numbers[8] = {28, 47, 36, 45, 14, 23, 32, 16};
    pthread_create(&minThread, NULL, mini, numbers);
    pthread_join(minThread, (void **) &min);
    printf("min: %d\n", *min);
    free(min);
    return 0;
}

Related

Visual C++ function call explicit / implicit parameter count

// Function prototype with default parameters
void func(int p1, int p2 = 0, int p3 = 0);

// Function body
void func(int p1, int p2, int p3)
{
    printf("The # of explicit parameters is %d\n", ??);
    printf("The # of implicit parameters is %d\n", ??);
}

// Use in code
int main(int argc, char* argv[])
{
    func(0);       // Should print 1 explicit, 2 implicit
    func(0, 0);    // Should print 2 explicit, 1 implicit
    func(0, 0, 0); // Should print 3 explicit, 0 implicit
    return 0;
}
In the above example, how can I get the actual number of parameters passed in explicitly? And the number of parameters passed in implicitly?
As #Igor pointed out in a comment, it is not possible to detect whether a parameter is the default or was actually passed.
One possible way to emulate it is to switch from default parameters to overloads, as in the sketch below.
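A minimal sketch of the overload approach (the report and func_impl helpers are illustrative names, not from the original code): each overload knows exactly how many arguments the caller supplied and fills in defaults for the rest.

#include <cstdio>

// Hypothetical helpers: report() prints the counts, func_impl() would do the real work.
static void report(int explicitCount)
{
    printf("The # of explicit parameters is %d\n", explicitCount);
    printf("The # of implicit parameters is %d\n", 3 - explicitCount);
}

static void func_impl(int, int, int) { /* real work would go here */ }

void func(int p1, int p2, int p3) { report(3); func_impl(p1, p2, p3); }
void func(int p1, int p2)         { report(2); func_impl(p1, p2, 0); }
void func(int p1)                 { report(1); func_impl(p1, 0, 0); }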
Another is to make the default value distinct from any explicit value. This can be done with std::optional, as long as you never pass an empty optional explicitly:
#include <stdio.h>
#include <optional>

void func(std::optional<int> p1 = std::nullopt, std::optional<int> p2 = std::nullopt, std::optional<int> p3 = std::nullopt);

// Function body
void func(std::optional<int> p1, std::optional<int> p2, std::optional<int> p3)
{
    printf("The # of explicit parameters is %d\n", p1.has_value() + p2.has_value() + p3.has_value());
    printf("The # of implicit parameters is %d\n", !p1.has_value() + !p2.has_value() + !p3.has_value());
}

// Use in code
int main(int argc, char* argv[])
{
    func(0);       // Should print 1 explicit, 2 implicit
    func(0, 0);    // Should print 2 explicit, 1 implicit
    func(0, 0, 0); // Should print 3 explicit, 0 implicit
    return 0;
}
It can also be emulated with variadic template parameters:
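A minimal sketch of that idea, assuming C++17 (the func_impl helper and the fold expression are illustrative, not from the original question): sizeof...(Args) gives the number of arguments the caller actually supplied.

#include <cstdio>

// Hypothetical implementation function that receives the explicit count.
static void func_impl(int p1, int p2, int p3, int explicitCount)
{
    printf("The # of explicit parameters is %d\n", explicitCount);
    printf("The # of implicit parameters is %d\n", 3 - explicitCount);
}

// The variadic front end counts the caller's arguments with sizeof...(Args),
// fills in defaults for the rest, and forwards everything to func_impl().
template <typename... Args>
void func(Args... args)
{
    static_assert(sizeof...(Args) <= 3, "func takes at most 3 arguments");
    int values[3] = {0, 0, 0};      // default values
    int i = 0;
    ((values[i++] = args), ...);    // C++17 fold expression copies the explicit arguments
    func_impl(values[0], values[1], values[2], static_cast<int>(sizeof...(Args)));
}

int main()
{
    func(0);       // prints 1 explicit, 2 implicit
    func(0, 0);    // prints 2 explicit, 1 implicit
    func(0, 0, 0); // prints 3 explicit, 0 implicit
    return 0;
}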

MPI_Recv takes a long and strange time to return

I am trying to compare the execution time of two programs:
- the first one uses the non-blocking functions MPI_Ssend and MPI_Irecv, which allow some calculations to be done "while messages are being sent and received";
- the other one uses blocking OpenMPI functions.
I have no problem assessing the performance of the first program, and it looks "good".
My problem is that the second program, which uses MPI_Recv, often takes a very long and kind of "strange" time to finish: always a little bit more than 1 second (for example, 1.001033 seconds), as if the process had to do something that takes exactly 1 second.
I modified my initial code to show you an equivalent one.
#include "mpi.h"
#include <stdlib.h>
#include <stdio.h>
#define SIZE 5000
int main(int argc, char * argv[])
{
MPI_Request trash;
int rank;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
int myVector[SIZE];
for(int i = 0 ; i < SIZE ; i++)
myVector[i] = 100;
int buff[SIZE];
double startTime = MPI_Wtime();
if(rank == 0)
MPI_Issend(myVector, SIZE, MPI_INT, 1, 1, MPI_COMM_WORLD, &trash);
if(rank == 1) {
MPI_Recv(buff, SIZE, MPI_INT, 0, 1, MPI_COMM_WORLD, NULL);
printf("Processus 1 finished in %lf seconds\n", MPI_Wtime()-startTime);
}
MPI_Finalize();
return 0;
}
Thank you.

pthreads code not scaling up

I wrote the following very simple pthread code to test how it scales up. I am running the code on a machine with 8 logical processors and at no time do I create more than 8 threads (to avoid context switching).
With an increasing number of threads, each thread has to do less work. Also, it is evident from the code that there are no shared data structures between the threads that might be a bottleneck. But still, performance degrades as I increase the number of threads.
Can somebody tell me what I am doing wrong here?
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int NUM_THREADS = 3;
unsigned long int COUNTER = 10000000000000;
unsigned long int LOOP_INDEX;

void* addNum(void *data)
{
    unsigned long int sum = 0;
    for(unsigned long int i = 0; i < LOOP_INDEX; i++) {
        sum += 100;
    }
    return NULL;
}

int main(int argc, char** argv)
{
    NUM_THREADS = atoi(argv[1]);
    pthread_t *threads = (pthread_t*)malloc(sizeof(pthread_t) * NUM_THREADS);
    int rc;
    clock_t start, diff;
    LOOP_INDEX = COUNTER/NUM_THREADS;
    start = clock();
    for (int t = 0; t < NUM_THREADS; t++) {
        rc = pthread_create((threads + t), NULL, addNum, NULL);
        if (rc) {
            printf("ERROR; return code from pthread_create() is %d", rc);
            exit(-1);
        }
    }
    void *status;
    for (int t = 0; t < NUM_THREADS; t++) {
        rc = pthread_join(threads[t], &status);
    }
    diff = clock() - start;
    int sec = diff / CLOCKS_PER_SEC;
    printf("%d",sec);
}
Note: All the answers I found online said that the overhead of creating the threads is more than the work they are doing. To test this, I commented out everything in the addNum() function. After doing that, no matter how many threads I create, the time taken by the code is 0 seconds. So there is no overhead as such, I think.
clock() counts CPU time used, across all threads. So all that's telling you is that you're using a little bit more total CPU time, which is exactly what you would expect.
It's the total wall clock elapsed time which should be going down if your parallelisation is effective. Measure that with clock_gettime() specifying the CLOCK_MONOTONIC clock instead of clock().
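A minimal sketch of that change (only the timing calls are shown; the thread creation and joining from the question would go in between):

#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);   /* wall-clock start */

    /* ... create and join the worker threads here ... */

    clock_gettime(CLOCK_MONOTONIC, &end);     /* wall-clock end */
    double elapsed = (end.tv_sec - start.tv_sec)
                   + (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("elapsed: %f seconds\n", elapsed);
    return 0;
}

With this measurement, the elapsed time should drop as threads are added, while clock() keeps reporting roughly the same total CPU time.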

Non-collective write using a file view

When trying to write blocks to a file, with the blocks unevenly distributed across my processes, one can use MPI_File_write_at with the right offset. As this function is not a collective operation, this works well.
Example:
#include <cstdio>
#include <cstdlib>
#include <string>
#include <mpi.h>

int main(int argc, char* argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int global = 7; // prime helps have unbalanced procs
    int local = (global/size) + (global%size>rank?1:0);
    int strsize = 5;
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "output.txt", MPI_MODE_CREATE|MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    for (int i=0; i<local; ++i)
    {
        size_t idx = i * size + rank;
        std::string buffer = std::string(strsize, 'a' + idx);
        size_t offset = buffer.size() * idx;
        MPI_File_write_at(fh, offset, buffer.c_str(), buffer.size(), MPI_CHAR, MPI_STATUS_IGNORE);
    }
    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
However, for more complex writes, particularly when writing multidimensional data like raw images, one may want to create a view on the file with MPI_Type_create_subarray. However, when using this method with a simple MPI_File_write (which is supposed to be non-collective) I run into deadlocks. Example:
#include <cstdio>
#include <cstdlib>
#include <string>
#include <mpi.h>

int main(int argc, char* argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int global = 7; // prime helps have unbalanced procs
    int local = (global/size) + (global%size>rank?1:0);
    int strsize = 5;
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "output.txt", MPI_MODE_CREATE|MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    for (int i=0; i<local; ++i)
    {
        size_t idx = i * size + rank;
        std::string buffer = std::string(strsize, 'a' + idx);
        int dim = 2;
        int gsizes[2] = { buffer.size(), global };
        int lsizes[2] = { buffer.size(), 1 };
        int offset[2] = { 0, idx };
        MPI_Datatype filetype;
        MPI_Type_create_subarray(dim, gsizes, lsizes, offset, MPI_ORDER_C, MPI_CHAR, &filetype);
        MPI_Type_commit(&filetype);
        MPI_File_set_view(fh, 0, MPI_CHAR, filetype, "native", MPI_INFO_NULL);
        MPI_File_write(fh, buffer.c_str(), buffer.size(), MPI_CHAR, MPI_STATUS_IGNORE);
    }
    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
How can I avoid such code deadlocking? Keep in mind that my real code will really use the multidimensional capabilities of MPI_Type_create_subarray and cannot just use MPI_File_write_at.
Also, it is difficult for me to know the maximum number of blocks in a process, so I'd like to avoid doing an allreduce and then looping over the maximum number of blocks with empty writes when localnb <= id < maxnb.
You don't use MPI_REDUCE when you have a variable number of blocks per node. You use MPI_SCAN or MPI_EXSCAN: MPI IO Writing a file when offset is not known
MPI_File_set_view is collective, so if 'local' is different on each processor, you'll find yourself calling a collective routine from less than all processors in the communicator. If you really really need to do so, open the file with MPI_COMM_SELF.
the MPI_SCAN approach means each process can set the file view as needed, and then blammo you can call the collective MPI_File_write_at_all (even if some processes have zero work -- they still need to participate) and take advantage of whatever clever optimizations your MPI-IO implementation provides.
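A rough sketch of that idea under some assumptions (the variable names and the per-rank data are illustrative, not from the question): MPI_Exscan gives each rank the sum of the byte counts of all lower ranks, which is exactly its write offset, and then every rank joins the collective write even if its buffer is empty.

#include <mpi.h>
#include <string>

int main(int argc, char* argv[])
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Uneven amount of data per rank, as in the question.
    std::string mydata((rank % 3) * 5, 'a' + rank);
    long long mycount = (long long) mydata.size();
    long long myoffset = 0;

    // Exclusive prefix sum of the byte counts: rank r receives the total
    // written by ranks 0..r-1, i.e. its starting offset in the file.
    MPI_Exscan(&mycount, &myoffset, 1, MPI_LONG_LONG, MPI_SUM, MPI_COMM_WORLD);
    if (rank == 0)
        myoffset = 0; // MPI_Exscan leaves the receive buffer of rank 0 undefined

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "output.txt",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    // Collective write: every rank participates, even with a zero-length buffer.
    MPI_File_write_at_all(fh, (MPI_Offset) myoffset, mydata.c_str(),
                          (int) mydata.size(), MPI_CHAR, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}

The same offsets could instead be used with MPI_File_set_view plus a subarray type, as long as every rank makes the same sequence of collective calls.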

Pass multiple args to thread using struct (pthread)

I'm learning to program with pthreads for an adder program. After referencing several examples, I still don't get how to pass multiple arguments into a thread using a struct. Here is my buggy program:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <pthread.h>

typedef struct s_addition {
    int num1;
    int num2;
    int sum;
} addition;

void *thread_add_function (void *ad)
{
    printf ("ad.num1:%d, ad.num2:%d\n", ad.num1, ad.num2);
    ad.sum = ad.num1 + ad.num2;
    pthread_exit(0);
}

int main()
{
    int N = 5;
    int a[N], b[N], c[N];
    srand (time(NULL));
    // fill them with random numbers
    for ( int j = 0; j < N; j++ ) {
        a[j] = rand() % 392;
        b[j] = rand() % 321;
    }
    addition ad1;
    pthread_t thread[N];
    for (int i = 0; i < N; i++) {
        ad1.num1 = a[i];
        ad1.num2 = b[i];
        printf ("ad1.num1:%d, ad1.num2:%d\n", ad1.num1, ad1.num2);
        pthread_create (&thread[i], NULL, thread_add_function, &ad1);
        pthread_join(thread[i], NULL);
        c[i] = ad.sum;
    }
    printf( "This is the result of using pthread.\n");
    for ( int i = 0; i < N; i++) {
        printf( "%d + %d = %d\n", a[i], b[i], c[i]);
    }
}
But when compiling I got the following error:
vecadd_parallel.c:15:39: error: member reference base type 'void *' is not a
structure or union
printf ("ad.num1:%d, ad.num2:%d\n",ad.num1, ad.num2);
I've tried but still can't figure out what I am doing wrong. Any clues?
Seems like you have a problem with trying to access the members of a void datatype.
You will need to add a line casting the parameter of thread_add_function to the correct datatype, similar to addition* add = (addition*)ad;, and then use this variable in your function (note that you also have to change your .'s to ->, because it's a pointer).
You also should only pass data to threads that was malloc()'d, as stack allocated data may not be permanent. It should be fine for the current implementation, but changes later could easily give strange, unpredictable behaviour.
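A minimal sketch of the corrected thread function along those lines (a drop-in replacement inside the question's program, which already defines the addition struct; the local name add is illustrative):

/* The void* argument is cast back to the struct type before its members are used. */
void *thread_add_function(void *ad)
{
    addition *add = (addition *) ad;   /* cast back to the real type */
    printf("add->num1:%d, add->num2:%d\n", add->num1, add->num2);
    add->sum = add->num1 + add->num2;  /* result is visible to main() through the pointer */
    pthread_exit(0);
}

In main(), the corresponding line would then read c[i] = ad1.sum; (the ad in c[i] = ad.sum; is a typo in the original code).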
