How to use PTHREAD_SETAFFINITY_NP correctly? - multithreading

I am working on a program which have computationa based on a lot of data .So I created two threads.Their work is similar,but their data are different. I do this using the code below:
status1 = pthread_create(&pthread1,NULL,func_1,(void*)t1);
if(status1 != 0)
printf("pthread1 is not created sucessfully!\n");
status = pthread_create(&pthread2,NULL,func_2,(void*)t2);
if(status != 0)
printf("pthread2 is not created successfully!\n");
func_1 and func_2 are similar.
Then I want to dispatch them to different processors.So I use PTHREAD_SETAFFINITY_NP,but it failed.
My codes are below.
void func_1()
{
cpu_set_t mask1;
CPU_ZERO(&mask1);
CPU_SET(1,&mask1);
pthread_setaffinity_np(pthread_self(),sizeof(mask1),&mask1);
for (j = 0; j < cpu_num; j++) {
if (CPU_ISSET(j, &mask1))
{
//printf("thread %lld is running in processor %d\n", pthread_self(), j);
cout<<"thread "<<pthread_self()<<" is running in processor "<< j <<endl;
}
}
void func_2()
{
cpu_set_t mask2;
CPU_ZERO(&mask2);
CPU_SET(0,&mask2);
pthread_setaffinity_np(pthread_self(),sizeof(mask2),&mask2);
for (j = 0; j < cpu_num; j++) {
if (CPU_ISSET(j, &mask2))
{
//printf("thread %lld is running in processor %d\n", pthread_self(), j);
cout<<"thread "<<pthread_self()<<" is running in processor "<< j <<endl;
}
}
int main()
{
status1 = pthread_create(&pthread1,NULL,func_1,(void*)t1);
if(status1 != 0)
printf("pthread1 is not created sucessfully!\n");
status = pthread_create(&pthread2,NULL,func_2,(void*)t2);
if(status != 0)
printf("pthread2 is not created successfully!\n");
pthread_join(pthread1,NULL);
pthread_join(pthread2,NULL);
......
}
But to my surprised, result comes out like this:
thread 7369190208 is running in processor 0
thread 3065830208 is running in processor 0
How can I do to dispatch each thread to a specific processor?
EDIT:
I want to dispatch threads to different processors because if they two are running on one processor,it is like serial.How to dispatch?as shown below:
serial:
for(i = 0 ; i < N ; i ++)
do something;
I use two threads,then it becomes like:
pthread1:
for(i = 0 ; i < N ; i += 2)
do something;
pthread2:
for(i = 1 ; i < N ; i += 2)
do something;
and the speedup can be nearly 2

Related

operating issues qustions - threads, processes etc. for the above code:

int S1 = 0;
int S2 = 0;
int x = 0;
int run = 1;
void Producer(void) {
while(run) {
while (S1 == S2);
x++;
__sync_synchronize();
S1 = S2;
__sync_synchronize();
}
}
void Consumer(void) {
while(run) {
while (S1 != S2);
x--;
__sync_synchronize();
S1 = !S2;
__sync_synchronize();
}
}
void* Worker(void *func) {
long func_id = (long)func & 0x1;
printf("%s %d\n",__func__, (int)func_id);
switch (func_id) {
case 0:
Producer();
break;
case 1:
Consumer();
break;
}
return NULL;
}
int main(int argc, char *argv[]) {
pthread_t t[argc];
pthread_attr_t at;
cpu_set_t cpuset;
int threads;
int i;
#define MAX_PROCESSORS 4 // Minimal processors is 2.
threads = argc > 1 ? (( atoi(argv[1]) < 4) ? atoi(argv[1]): MAX_PROCESSORS ) : 1;
for (i = 0;i < threads; i++){
CPU_ZERO(&cpuset);
CPU_SET(i, &cpuset);
pthread_attr_init(&at);
(&at, sizeof(cpuset), &cpuset);
if (pthread_create(&t[i], &at , Worker, (void *) (long)i) ) {
perror("pthread create 1 error\n"); }
}
do {
sleep(1);
} while(x < 0);
run = 0;
void *val;
for(i = 0; i < threads; i++)
pthread_join(t[i], &val);
printf("x=%d\n", x);
}
The questions:
In ex1.c (6.1), which of the following properties achieved:
(1) Mutual exclusion but not progress
(2) Progress but not mutual exclusion
(3) Neither mutual exclusion nor progress
(4) Both mutual exclusion and progress
Please explain?
1.2
To which arguments (in 6.1) is correct and which does not:
(1) always exits. when threads = 2 or threads <= 0
(2) always hangs. threads = 1 or thread > 2
Any help would be much appreeciated

Looping a fork()

This is a sample of my code for some piping I want to do. The problem is that pid2[0] does not supply me a child process. How do I fix it? pid2[1] and pid2[2] and so on do supply the parent with children.
int numCommands = numPipes + 1; /// not worrying about '>' and '<' right now
int *pipes = newint[numPipes*2]; /// two ends for each pipe
for(int i = 0; i < numPipes*2; i+=2) /// Offset by two since each pipe has two ends
pipe(pipes + i);
int *pid2 = new int [numCommands];
for(int i = 0; i < numCommands; i++)
{
pid2[i] = fork();
if(pid2[i] < 0)
{
std::cerr << "Failure to fork..." << std:endl;
return EXIT_FAILURE
}
if(pid2[i] == 0) /// Child process
{
if(i == 0) /// First Command
{
dup2(pipes[1], 1);
}
else if(i == numCommands -1) /// Last Command
{
dup2(pipes[2*(numCommands-1)-1], 0);
}
else /// Middle commands
{
dup2(pipes[2*(i-1)], 0);
dup2(pipes[(2*i)+1],1);
}
for(int j = 0; j < numPipes*2;j++)
close(pipes[j]);
execvp(pipeCommands[i][0], pipeCommands[i].data()); ///pipeCommands is a vector<vector<char*>>
perror("exec failed");
return EXIT_SUCCESS;
}
else /// The parent
{
for(int j = 0; j <numPipes*2;j++)
close(pipes[j]);
for(int k = 0; k < numCommands; k++)
waitpid(pid2[k],nullptr,0);
}
}
Your indexing into the pipes array is a little bit problematic. For sort file.txt | head | wc I'm assuming that numPipes is 2. Let's walk through your for loop for each value of i.
i==0
dup2(pipes[1], 1); // Send stdout to pipes[1]
i==1
dup2(pipes[2*(i-1)], 0); // dup2(pipes[0], 0), stdin from pipes[0]
dup2(pipes[(2*i)+1],1); // dup2(pipes[3], 1), stdout to pipes[3]
i == 2
dup2(pipes[2*(numCommands-1)-1], 0); //dup2(pipes[3], stdin from pipes[3]
In other words, any stdout from process 0 is going to a dead end. The second process will never read from its standard output. So the debug statement you put in there (cout == stdout, remember) will get lost as well.

Select in loop - work all the time - linux

I got next question about select:
How to make select in loop ?
I try to do like that:
struct timeval timeout;
int sel;
size_t rozmiar = sizeof(pid_t);
char buf[rozmiar];
int i;
FD_ZERO(&set);
for(i = 0; i< val; i++)
{ FD_SET(fd[i][0], &set); // val -> N pipe2
}
timeout.tv_sec = 2;
timeout.tv_usec = 0;
while(1)
{
sel = select(val+1,&set,NULL,NULL,&timeout);
if(sel < 0)
perror("select");
else if(sel == 0)
printf("No communicate \n");
else{
for(i = 0; i < val; i++)
{
if(FD_ISSET(fd[i][0],&set))
{
while(read(fd[i][0],&buf,rozmiar) > 0)
write(1,&buf,rozmiar);
} // check if exist and write to stdout
}
} // end SELECT
timeout.tv_sec = 2;
timeout.tv_usec = 0;
}
But there all the time show: ,, no communicate". Is it the correct way to create select which work all the time? I am not sure so I prefer to ask. I try to find information in books but with no lucky.
The set is changed by select, you need to refill it each time

I have some trouble with multi threading

I am struggling with this code. It's about counting some patterns from a text file. I tried to use thread(divide and conquer) processing, but it return a wrong value. I used mutex value to synchronize critical section.. The code is below. First main argument is number of threads, second is the name of a text file I want counting patterns from, and following patterns I want to look up on the text.
please save me..
Code is below
char *buffer;
int fsize, count;
char **searchword;
int *wordcount;
int *strlength;
typedef struct _params{
int num1;
int num2;
}params;
pthread_mutex_t mutex;
void *childfunc(void *arg)
{
int size, i, j, k, t, start, end, len, flag = -1;
int result;
params *a = (params *)arg;
start = a->num1;
end = a->num2;
while(1){
if(start == 0 || start == fsize)
break;
if(buffer[start]!= ' ' && buffer[start] != '\n' && buffer[start] != '\t')
start++;
else
break;
}
while(1){
if(end == fsize)
break;
if(buffer[end] != ' ' && buffer[end] != '\n' && buffer[end] != '\t')
end++;
else
break;
}
for(i = 0; i < count; i++){
len = strlength[i];
for(j = start; j<(end - len + 1); j++){
if(buffer[j] == searchword[i][0]){
flag = 0;
for(k = j +1; k<j + len; k++){
if(buffer[k] != searchword[i][k-j])
{
flag = 1;
break;
}
}
if(flag == 0){
pthread_mutex_lock(&mutex);
wordcount[i]++;
pthread_mutex_unlock(&mutex);// mutex unlocking
sleep(1);
flag = -1;
}
}
}
}
}
int main(int argc, char **argv){
FILE *fp;
char *inputFile;
pthread_t *tid;
int *status;
int inputNumber, i, j, diff, searchstart, searchend;
int result = 0;
count = argc -3;
inputNumber = atoi(argv[1]);
inputFile = argv[2];
searchword = (char **)malloc(sizeof(char *)*count);
tid = malloc(sizeof(pthread_t)*inputNumber);
strlength = (int *)malloc(4*count);
status = (int *)malloc(4*inputNumber);
wordcount = (int *)malloc(4*count);
for(i = 0; i < count; i++)
searchword[i] = (char*)malloc(sizeof(char)*(strlen(argv[i+3]) + 1));
for(i = 3; i < argc; i++)
strcpy(searchword[i-3], argv[i]);
fp = fopen(inputFile, "r");
fseek(fp, 0, SEEK_END);
fsize = ftell(fp);
rewind(fp);
buffer = (char *)malloc(1*fsize);
fread(buffer, fsize, 1, fp);
diff = fsize / inputNumber;
if(diff == 0)
diff = 1;
for(i = 0; i < count ; i++){
strlength[i] = strlen(searchword[i]);
wordcount[i] = 0;
}
for(i = 0; i < inputNumber; i++){
searchstart = 0 + i*diff;
searchend = searchstart + diff;
if(searchstart > fsize)
searchstart = fsize;
if(searchend > fsize)
searchend = fsize;
if( i == inputNumber -1)
searchend = fsize;
params a;
a.num1 = searchstart;
a.num2 = searchend;
pthread_mutex_init(&mutex, NULL);
result = pthread_create(&tid[i], NULL, childfunc, (void *)&a);
if(result < 0){
perror("pthread_create()");
}
}
//스레드 받는 부분
for(i = 0; i < inputNumber; i++){
result = pthread_join(tid[i], (void **)status);
if(result < 0)
perror("pthread_join()");
}
pthread_mutex_destroy(&mutex); // mutex 해제
for(i = 0; i < count; i++)
printf("%s : %d \n", searchword[i], wordcount[i]);
for(i = 0; i < count; i++) //동적메모리해제
free(searchword[i]);
free(searchword);
free(buffer);
free(tid);
free(strlength);
free(wordcount);
free(status);
fclose(fp);
return 0;
}
params a; // new a for each loop, previous a no longer exists
a.num1 = searchstart;
a.num2 = searchend;
pthread_mutex_init(&mutex, NULL);
result = pthread_create(&tid[i], NULL, childfunc, (void *)&a);
if(result < 0){
perror("pthread_create()");
}
} // a goes out of scope here
You pass each thread the address of a, but then a goes out of scope immediately after you create the thread. So the thread now has the address of some random leftover junk on the stack.
You need to have some conception of ownership of any object that's accessed by more than one thread like a is here. It can be owned by the thread that called pthread_create, owned by the newly-created thread, or jointly owned. But you have to be consistent. You have neither of these, why?
Is a owned only be the thread that called pthread_create? No, because the newly-created thread has a pointer to it and accesses it through that pointer. So the thread that called pthread_create cannot destroy it.
Is a owned only be the thread created by pthread_create? No, because it's on the stack of the thread that called pthread_create and will cease to exist when the next loop comes around.
Is a jointly owned? Well, no, because the thread that called pthread_create can destroy the object before the newly-created thread accesses it.
So no sane model of multi-thread use is followed by a. It's broken.
One common solution to this problem is to allocate a new structure (using malloc or new) for each thread, fill it in, and pass the thread the address of the structure. Let the thread free (or delete) the structure when it's done with it.

How to parallelize Sudoku solver using Grand Central Dispatch?

As a programming exercise, I just finished writing a Sudoku solver that uses the backtracking algorithm (see Wikipedia for a simple example written in C).
To take this a step further, I would like to use Snow Leopard's GCD to parallelize this so that it runs on all of my machine's cores. Can someone give me pointers on how I should go about doing this and what code changes I should make? Thanks!
Matt
Please let me know if you end up using it. It is run of the mill ANSI C, so should run on everything. See other post for usage.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
short sudoku[9][9];
unsigned long long cubeSolutions=0;
void* cubeValues[10];
const unsigned char oneLookup[64] = {0x8b, 0x80, 0, 0x80, 0, 0, 0, 0x80, 0, 0,0,0,0,0,0, 0x80, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0x80,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
int ifOne(int val) {
if ( oneLookup[(val-1) >> 3] & (1 << ((val-1) & 0x7)) )
return val;
return 0;
}
void init_sudoku() {
int i,j;
for (i=0; i<9; i++)
for (j=0; j<9; j++)
sudoku[i][j]=0x1ff;
}
void set_sudoku( char* initialValues) {
int i;
if ( strlen (initialValues) != 81 ) {
printf("Error: inputString should have length=81, length is %2.2d\n", strlen(initialValues) );
exit (-12);
}
for (i=0; i < 81; i++)
if ((initialValues[i] > 0x30) && (initialValues[i] <= 0x3a))
sudoku[i/9][i%9] = 1 << (initialValues[i] - 0x31) ;
}
void print_sudoku ( int style ) {
int i, j, k;
for (i=0; i < 9; i++) {
for (j=0; j < 9; j++) {
if ( ifOne(sudoku[i][j]) || !style) {
for (k=0; k < 9; k++)
if (sudoku[i][j] & 1<<k)
printf("%d", k+1);
} else
printf("*");
if ( !((j+1)%3) )
printf("\t");
else
printf(",");
}
printf("\n");
if (!((i+1) % 3) )
printf("\n");
}
}
void print_HTML_sudoku () {
int i, j, k, l, m;
printf("<TABLE>\n");
for (i=0; i<3; i++) {
printf(" <TR>\n");
for (j=0; j<3; j++) {
printf(" <TD><TABLE>\n");
for (l=0; l<3; l++) { printf(" <TR>"); for (m=0; m<3; m++) { printf("<TD>"); for (k=0; k < 9; k++) { if (sudoku[i*3+l][j*3+m] & 1<<k)
printf("%d", k+1);
}
printf("</TD>");
}
printf("</TR>\n");
}
printf(" </TABLE></TD>\n");
}
printf(" </TR>\n");
}
printf("</TABLE>");
}
int doRow () {
int count=0, new_value, row_value, i, j;
for (i=0; i<9; i++) {
row_value=0x1ff;
for (j=0; j<9; j++)
row_value&=~ifOne(sudoku[i][j]);
for (j=0; j<9; j++) {
new_value=sudoku[i][j] & row_value;
if (new_value && (new_value != sudoku[i][j]) ) {
count++;
sudoku[i][j] = new_value;
}
}
}
return count;
}
int doCol () {
int count=0, new_value, col_value, i, j;
for (i=0; i<9; i++) {
col_value=0x1ff;
for (j=0; j<9; j++)
col_value&=~ifOne(sudoku[j][i]);
for (j=0; j<9; j++) {
new_value=sudoku[j][i] & col_value;
if (new_value && (new_value != sudoku[j][i]) ) {
count++;
sudoku[j][i] = new_value;
}
}
}
return count;
}
int doCube () {
int count=0, new_value, cube_value, i, j, l, m;
for (i=0; i<3; i++)
for (j=0; j<3; j++) {
cube_value=0x1ff;
for (l=0; l<3; l++)
for (m=0; m<3; m++)
cube_value&=~ifOne(sudoku[i*3+l][j*3+m]);
for (l=0; l<3; l++)
for (m=0; m<3; m++) {
new_value=sudoku[i*3+l][j*3+m] & cube_value;
if (new_value && (new_value != sudoku[i*3+l][j*3+m]) ) {
count++;
sudoku[i*3+l][j*3+m] = new_value;
}
}
}
return count;
}
#define FALSE -1
#define TRUE 1
#define INCOMPLETE 0
int validCube () {
int i, j, l, m, r, c;
int pigeon;
int solved=TRUE;
//check horizontal
for (i=0; i<9; i++) {
pigeon=0;
for (j=0; j<9; j++)
if (ifOne(sudoku[i][j])) {
if (pigeon & sudoku[i][j]) return FALSE;
pigeon |= sudoku[i][j];
} else {
solved=INCOMPLETE;
}
}
//check vertical
for (i=0; i<9; i++) {
pigeon=0;
for (j=0; j<9; j++)
if (ifOne(sudoku[j][i])) {
if (pigeon & sudoku[j][i]) return FALSE;
pigeon |= sudoku[j][i];
}
else {
solved=INCOMPLETE;
}
}
//check cube
for (i=0; i<3; i++)
for (j=0; j<3; j++) {
pigeon=0;
r=j*3; c=i*3;
for (l=0; l<3; l++)
for (m=0; m<3; m++)
if (ifOne(sudoku[r+l][c+m])) {
if (pigeon & sudoku[r+l][c+m]) return FALSE;
pigeon |= sudoku[r+l][c+m];
}
else {
solved=INCOMPLETE;
}
}
return solved;
}
int solveSudoku(int position ) {
int status, i, k;
short oldCube[9][9];
for (i=position; i < 81; i++) {
while ( doCube() + doRow() + doCol() );
status = validCube() ;
if ((status == TRUE) || (status == FALSE))
return status;
if ((status == INCOMPLETE) && !ifOne(sudoku[i/9][i%9]) ) {
memcpy( &oldCube, &sudoku, sizeof(short) * 81) ;
for (k=0; k < 9; k++) {
if ( sudoku[i/9][i%9] & (1<<k) ) {
sudoku[i/9][i%9] = 1 << k ;
if (solveSudoku(i+1) == TRUE ) {
/* return TRUE; */
/* Or look for entire set of solutions */
if (cubeSolutions < 10) {
cubeValues[cubeSolutions] = malloc ( sizeof(short) * 81 ) ;
memcpy( cubeValues[cubeSolutions], &sudoku, sizeof(short) * 81) ;
}
cubeSolutions++;
if ((cubeSolutions & 0x3ffff) == 0x3ffff ) {
printf ("cubeSolutions = %llx\n", cubeSolutions+1 );
}
//if ( cubeSolutions > 10 )
// return TRUE;
}
memcpy( &sudoku, &oldCube, sizeof(short) * 81) ;
}
if (k==8)
return FALSE;
}
}
}
return FALSE;
}
int main ( int argc, char** argv) {
int i;
if (argc != 2) {
printf("Error: number of arguments on command line is incorrect\n");
exit (-12);
}
init_sudoku();
set_sudoku(argv[1]);
printf("[----------------------- Input Data ------------------------]\n\n");
print_sudoku(1);
solveSudoku(0);
if ((validCube()==1) && !cubeSolutions) {
// If sudoku is effectively already solved, cubeSolutions will not be set
printf ("\n This is a trivial sudoku. \n\n");
print_sudoku(1);
}
if (!cubeSolutions && validCube()!=1)
printf("Not Solvable\n");
if (cubeSolutions > 1) {
if (cubeSolutions >= 10)
printf("10+ Solutions, returning first 10 (%lld) [%llx] \n", cubeSolutions, cubeSolutions);
else
printf("%llx Solutions. \n", cubeSolutions);
}
for (i=0; (i < cubeSolutions) && (i < 10); i++) {
memcpy ( &sudoku, cubeValues[i], sizeof(short) * 81 );
printf("[----------------------- Solution %2.2d ------------------------]\n\n", i+1);
print_sudoku(0);
//print_HTML_sudoku();
}
return 0;
}
For one, since backtracking is a depth-first search it is not directly parallelizable, since any newly computed result cannot be used be directly used by another thread. Instead, you must divide the problem early, i.e. thread #1 starts with the first combination for a node in the backtracking graph, and proceeds to search the rest of that subgraph. Thread #2 starts with the second possible combination at the first and so forth. In short, for n threads find the n possible combinations on the top level of the search space (do not "forward-track"), then assign these n starting points to n threads.
However I think the idea is fundamentally flawed: Many sudoku permutations are solved in a matter of a couple thousands of forward+backtracking steps, and are solved within milliseconds on a single thread. This is in fact so fast that even the small coordination required for a few threads (assume that n threads reduce computation time to 1/n of original time) on a multi-core/multi-CPU is not negligible compared to the total running time, thus it is not by any chance a more efficient solution.
Are you sure you want to do that? Like, what problem are you trying to solve? If you want to use all cores, use threads. If you want a fast sudoku solver, I can give you one I wrote, see output below. If you want to make work for yourself, go ahead and use GCD ;).
Update:
I don't think GCD is bad, it just isn't terribly relevant to the task of solving sudoku. GCD is a technology to tie GUI events to code. Essentially, GCD solves two problems, a Quirk in how the MacOS X updates windows, and, it provides an improved method (as compared to threads) of tying code to GUI events.
It doesn't apply to this problem because Sudoku can be solved significantly faster than a person can think (in my humble opinion). That being said, if your goal was to solve Sudoku faster, you would want to use threads, because you would want to directly use more than one processor.
[bear#bear scripts]$ time ./a.out ..1..4.......6.3.5...9.....8.....7.3.......285...7.6..3...8...6..92......4...1...
[----------------------- Input Data ------------------------]
*,*,1 *,*,4 *,*,*
*,*,* *,6,* 3,*,5
*,*,* 9,*,* *,*,*
8,*,* *,*,* 7,*,3
*,*,* *,*,* *,2,8
5,*,* *,7,* 6,*,*
3,*,* *,8,* *,*,6
*,*,9 2,*,* *,*,*
*,4,* *,*,1 *,*,*
[----------------------- Solution 01 ------------------------]
7,6,1 3,5,4 2,8,9
2,9,8 1,6,7 3,4,5
4,5,3 9,2,8 1,6,7
8,1,2 6,4,9 7,5,3
9,7,6 5,1,3 4,2,8
5,3,4 8,7,2 6,9,1
3,2,7 4,8,5 9,1,6
1,8,9 2,3,6 5,7,4
6,4,5 7,9,1 8,3,2
real 0m0.044s
user 0m0.041s
sys 0m0.001s

Resources