I have a process with multiple threads. I have registered prepare function and parent handler using __register_atfork(blocksigprof,restoresigprof,NULL,NULL);
function.
Now let us assume that two threads call fork at the same time. And I have a counter increment in blocksigprof and counter decrement in restoresigprof.
Considering above scenario, will the blocksigprof and restoresigprof be called in pair always?
Is there any locking mechanism which inherently done in __register_atfork.
#define NUM_THREADS 8
static int go=0;
static int exec = 1;
static int ev_od = 0;
static void *
test_thread (void *arg) {
int j;
pid_t c, d;
while(!go) // Wait, so that all threads are here.
continue;
// All will fork, hopefully at same time because of go signal wait.
while(exec) {
c = fork();
if (c < 0) {
printf("SANJAY: fork() failed.\n");
exit(1);
} else if (c == 0) { // Child
exit(0);
}
else { // parent
d = waitpid(c, NULL, 0);
}
}
return NULL;
}
extern int __register_atfork(void (*)(void),void (*)(void),void (*)(void),void *);
static sigset_t s_new;
static sigset_t s_old;
static int count = 0;
static void blocksigprof(void){
count++;
#ifdef SYS_gettid
pid_t tid = syscall(SYS_gettid);
if (tid % 2) {
printf("sleep.\n");
usleep(1);
}
#else
#error "SYS_gettid unavailable on this system"
#endif
printf("Pre-fork. Count should be one. %d\n", count);
}
static void restoresigprof(void){
printf("Post-fork. Count should be one. %d\n", count);
count--;
}
int
main () {
pthread_t t[NUM_THREADS];
void *ptr;
long size = 500 * 1024 * 1024;
int i, m;
volatile int result = 0;
int g_iters = 100;
(void) __register_atfork(blocksigprof,restoresigprof,NULL,NULL);
// Increase size, so fork takes time.
printf("SANJAY: Increasing process size.\n");
ptr = malloc(size);
memset(ptr, 0, size);
ptr = malloc(size);
memset(ptr, 0, size);
ptr = malloc(size);
memset(ptr, 0, size);
ptr = malloc(size);
memset(ptr, 0, size);
ptr = malloc(size);
memset(ptr, 0, size);
ptr = malloc(size);
memset(ptr, 0, size);
ptr = malloc(size);
memset(ptr, 0, size);
ptr = malloc(size);
memset(ptr, 0, size);
ptr = malloc(size);
memset(ptr, 0, size);
// Create threads.
for (i = 0; i < NUM_THREADS; ++i) {
pthread_create(&t[i], NULL, test_thread, NULL);
}
printf("SANJAY: Killing time.\n");
// Kill time, so that all threads are at same place post it, waiting for go. 100M cycles.
for (m = 0; m < 1000000; ++m)
for (i = 0; i < g_iters; ++i )
result ^= i;
// Give all threads go at same time.
printf("SANJAY: Let threads execute.\n");
go = 1;
usleep(10000000); // Wait for 10 sec.
exec = 0;
// Wait for all threads to finish.
for (i = 0; i < NUM_THREADS; ++i) {
pthread_join(t[i], NULL);
}
printf("SANJAY: Done.\n");
return 0;
}
pthread_atfork specification doesn't require its implementation to serialize calls to prepare and parent handlers, so a safe assumption is that there is no syncronization.
glibc implementation does lock an internal mutex that prevents multiple threads from entering the handlers in parallel. However, that is an implementation detail. The comments in the code say that such an implementation is not POSIX-compliant because POSIX requires pthread_atfork to be async-signal-safe, and using a mutex there makes it not async-signal-safe.
To make your code robust, I recommend using atomics or a mutex to protect your shared state from race condition.
I am struggling with this code. It's about counting some patterns from a text file. I tried to use thread(divide and conquer) processing, but it return a wrong value. I used mutex value to synchronize critical section.. The code is below. First main argument is number of threads, second is the name of a text file I want counting patterns from, and following patterns I want to look up on the text.
please save me..
Code is below
char *buffer;
int fsize, count;
char **searchword;
int *wordcount;
int *strlength;
typedef struct _params{
int num1;
int num2;
}params;
pthread_mutex_t mutex;
void *childfunc(void *arg)
{
int size, i, j, k, t, start, end, len, flag = -1;
int result;
params *a = (params *)arg;
start = a->num1;
end = a->num2;
while(1){
if(start == 0 || start == fsize)
break;
if(buffer[start]!= ' ' && buffer[start] != '\n' && buffer[start] != '\t')
start++;
else
break;
}
while(1){
if(end == fsize)
break;
if(buffer[end] != ' ' && buffer[end] != '\n' && buffer[end] != '\t')
end++;
else
break;
}
for(i = 0; i < count; i++){
len = strlength[i];
for(j = start; j<(end - len + 1); j++){
if(buffer[j] == searchword[i][0]){
flag = 0;
for(k = j +1; k<j + len; k++){
if(buffer[k] != searchword[i][k-j])
{
flag = 1;
break;
}
}
if(flag == 0){
pthread_mutex_lock(&mutex);
wordcount[i]++;
pthread_mutex_unlock(&mutex);// mutex unlocking
sleep(1);
flag = -1;
}
}
}
}
}
int main(int argc, char **argv){
FILE *fp;
char *inputFile;
pthread_t *tid;
int *status;
int inputNumber, i, j, diff, searchstart, searchend;
int result = 0;
count = argc -3;
inputNumber = atoi(argv[1]);
inputFile = argv[2];
searchword = (char **)malloc(sizeof(char *)*count);
tid = malloc(sizeof(pthread_t)*inputNumber);
strlength = (int *)malloc(4*count);
status = (int *)malloc(4*inputNumber);
wordcount = (int *)malloc(4*count);
for(i = 0; i < count; i++)
searchword[i] = (char*)malloc(sizeof(char)*(strlen(argv[i+3]) + 1));
for(i = 3; i < argc; i++)
strcpy(searchword[i-3], argv[i]);
fp = fopen(inputFile, "r");
fseek(fp, 0, SEEK_END);
fsize = ftell(fp);
rewind(fp);
buffer = (char *)malloc(1*fsize);
fread(buffer, fsize, 1, fp);
diff = fsize / inputNumber;
if(diff == 0)
diff = 1;
for(i = 0; i < count ; i++){
strlength[i] = strlen(searchword[i]);
wordcount[i] = 0;
}
for(i = 0; i < inputNumber; i++){
searchstart = 0 + i*diff;
searchend = searchstart + diff;
if(searchstart > fsize)
searchstart = fsize;
if(searchend > fsize)
searchend = fsize;
if( i == inputNumber -1)
searchend = fsize;
params a;
a.num1 = searchstart;
a.num2 = searchend;
pthread_mutex_init(&mutex, NULL);
result = pthread_create(&tid[i], NULL, childfunc, (void *)&a);
if(result < 0){
perror("pthread_create()");
}
}
//스레드 받는 부분
for(i = 0; i < inputNumber; i++){
result = pthread_join(tid[i], (void **)status);
if(result < 0)
perror("pthread_join()");
}
pthread_mutex_destroy(&mutex); // mutex 해제
for(i = 0; i < count; i++)
printf("%s : %d \n", searchword[i], wordcount[i]);
for(i = 0; i < count; i++) //동적메모리해제
free(searchword[i]);
free(searchword);
free(buffer);
free(tid);
free(strlength);
free(wordcount);
free(status);
fclose(fp);
return 0;
}
params a; // new a for each loop, previous a no longer exists
a.num1 = searchstart;
a.num2 = searchend;
pthread_mutex_init(&mutex, NULL);
result = pthread_create(&tid[i], NULL, childfunc, (void *)&a);
if(result < 0){
perror("pthread_create()");
}
} // a goes out of scope here
You pass each thread the address of a, but then a goes out of scope immediately after you create the thread. So the thread now has the address of some random leftover junk on the stack.
You need to have some conception of ownership of any object that's accessed by more than one thread like a is here. It can be owned by the thread that called pthread_create, owned by the newly-created thread, or jointly owned. But you have to be consistent. You have neither of these, why?
Is a owned only be the thread that called pthread_create? No, because the newly-created thread has a pointer to it and accesses it through that pointer. So the thread that called pthread_create cannot destroy it.
Is a owned only be the thread created by pthread_create? No, because it's on the stack of the thread that called pthread_create and will cease to exist when the next loop comes around.
Is a jointly owned? Well, no, because the thread that called pthread_create can destroy the object before the newly-created thread accesses it.
So no sane model of multi-thread use is followed by a. It's broken.
One common solution to this problem is to allocate a new structure (using malloc or new) for each thread, fill it in, and pass the thread the address of the structure. Let the thread free (or delete) the structure when it's done with it.
I'm getting a weird error. I implemented these two functions:
int flag_and_sleep(volatile unsigned int *flag)
{
int res = 0;
(*flag) = 1;
res = syscall(__NR_futex, flag, FUTEX_WAIT, 1, NULL, NULL, 0);
if(0 == res && (0 != (*flag)))
die("0 == res && (0 != (*flag))");
return 0;
}
int wake_up_if_any(volatile unsigned int *flag)
{
if(1 == (*flag))
{
(*flag) = 0;
return syscall(__NR_futex, flag, FUTEX_WAKE, 1, NULL, NULL, 0);
}
return 0;
}
and test them by running two Posix threads:
static void die(const char *msg)
{
fprintf(stderr, "%s %u %lu %lu\n", msg, thread1_waits, thread1_count, thread2_count);
_exit( 1 );
}
volatile unsigned int thread1_waits = 0;
void* threadf1(void *p)
{
int res = 0;
while( 1 )
{
res = flag_and_sleep( &thread1_waits );
thread1_count++;
}
return NULL;
}
void* threadf2(void *p)
{
int res = 0;
while( 1 )
{
res = wake_up_if_any( &thread1_waits );
thread2_count++;
}
return NULL;
}
After thread2 has had a million or so iterations, I get the assert fire on me:
./a.out
0 == res && (0 != (*flag)) 1 261129 1094433
This means that the syscall - and thereby do_futex() - returned 0. Man says it should only do so if woken up by a do_futex(WAKE) call. But then before I do a WAKE call, I set the flag to 0. Here it appears that flag is still 1.
This is Intel, which means strong memory model. So if in thread1 I see results from a syscall in thread2, I must also see the results of the write in thread 2 which was before the call.
Flag and all pointers to it are volatile, so I don't see how gcc could fail to read the correct value.
I'm baffled.
Thanks!
the race happens when thread 1 goes the full cycle and re-enters WAIT call when thread 2 goes from
(*flag) = 0;
to
return syscall(__NR_futex, flag, FUTEX_WAKE, 1, NULL, NULL, 0);
So the test is faulty.
As a programming exercise, I just finished writing a Sudoku solver that uses the backtracking algorithm (see Wikipedia for a simple example written in C).
To take this a step further, I would like to use Snow Leopard's GCD to parallelize this so that it runs on all of my machine's cores. Can someone give me pointers on how I should go about doing this and what code changes I should make? Thanks!
Matt
Please let me know if you end up using it. It is run of the mill ANSI C, so should run on everything. See other post for usage.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
short sudoku[9][9];
unsigned long long cubeSolutions=0;
void* cubeValues[10];
const unsigned char oneLookup[64] = {0x8b, 0x80, 0, 0x80, 0, 0, 0, 0x80, 0, 0,0,0,0,0,0, 0x80, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0x80,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
int ifOne(int val) {
if ( oneLookup[(val-1) >> 3] & (1 << ((val-1) & 0x7)) )
return val;
return 0;
}
void init_sudoku() {
int i,j;
for (i=0; i<9; i++)
for (j=0; j<9; j++)
sudoku[i][j]=0x1ff;
}
void set_sudoku( char* initialValues) {
int i;
if ( strlen (initialValues) != 81 ) {
printf("Error: inputString should have length=81, length is %2.2d\n", strlen(initialValues) );
exit (-12);
}
for (i=0; i < 81; i++)
if ((initialValues[i] > 0x30) && (initialValues[i] <= 0x3a))
sudoku[i/9][i%9] = 1 << (initialValues[i] - 0x31) ;
}
void print_sudoku ( int style ) {
int i, j, k;
for (i=0; i < 9; i++) {
for (j=0; j < 9; j++) {
if ( ifOne(sudoku[i][j]) || !style) {
for (k=0; k < 9; k++)
if (sudoku[i][j] & 1<<k)
printf("%d", k+1);
} else
printf("*");
if ( !((j+1)%3) )
printf("\t");
else
printf(",");
}
printf("\n");
if (!((i+1) % 3) )
printf("\n");
}
}
void print_HTML_sudoku () {
int i, j, k, l, m;
printf("<TABLE>\n");
for (i=0; i<3; i++) {
printf(" <TR>\n");
for (j=0; j<3; j++) {
printf(" <TD><TABLE>\n");
for (l=0; l<3; l++) { printf(" <TR>"); for (m=0; m<3; m++) { printf("<TD>"); for (k=0; k < 9; k++) { if (sudoku[i*3+l][j*3+m] & 1<<k)
printf("%d", k+1);
}
printf("</TD>");
}
printf("</TR>\n");
}
printf(" </TABLE></TD>\n");
}
printf(" </TR>\n");
}
printf("</TABLE>");
}
int doRow () {
int count=0, new_value, row_value, i, j;
for (i=0; i<9; i++) {
row_value=0x1ff;
for (j=0; j<9; j++)
row_value&=~ifOne(sudoku[i][j]);
for (j=0; j<9; j++) {
new_value=sudoku[i][j] & row_value;
if (new_value && (new_value != sudoku[i][j]) ) {
count++;
sudoku[i][j] = new_value;
}
}
}
return count;
}
int doCol () {
int count=0, new_value, col_value, i, j;
for (i=0; i<9; i++) {
col_value=0x1ff;
for (j=0; j<9; j++)
col_value&=~ifOne(sudoku[j][i]);
for (j=0; j<9; j++) {
new_value=sudoku[j][i] & col_value;
if (new_value && (new_value != sudoku[j][i]) ) {
count++;
sudoku[j][i] = new_value;
}
}
}
return count;
}
int doCube () {
int count=0, new_value, cube_value, i, j, l, m;
for (i=0; i<3; i++)
for (j=0; j<3; j++) {
cube_value=0x1ff;
for (l=0; l<3; l++)
for (m=0; m<3; m++)
cube_value&=~ifOne(sudoku[i*3+l][j*3+m]);
for (l=0; l<3; l++)
for (m=0; m<3; m++) {
new_value=sudoku[i*3+l][j*3+m] & cube_value;
if (new_value && (new_value != sudoku[i*3+l][j*3+m]) ) {
count++;
sudoku[i*3+l][j*3+m] = new_value;
}
}
}
return count;
}
#define FALSE -1
#define TRUE 1
#define INCOMPLETE 0
int validCube () {
int i, j, l, m, r, c;
int pigeon;
int solved=TRUE;
//check horizontal
for (i=0; i<9; i++) {
pigeon=0;
for (j=0; j<9; j++)
if (ifOne(sudoku[i][j])) {
if (pigeon & sudoku[i][j]) return FALSE;
pigeon |= sudoku[i][j];
} else {
solved=INCOMPLETE;
}
}
//check vertical
for (i=0; i<9; i++) {
pigeon=0;
for (j=0; j<9; j++)
if (ifOne(sudoku[j][i])) {
if (pigeon & sudoku[j][i]) return FALSE;
pigeon |= sudoku[j][i];
}
else {
solved=INCOMPLETE;
}
}
//check cube
for (i=0; i<3; i++)
for (j=0; j<3; j++) {
pigeon=0;
r=j*3; c=i*3;
for (l=0; l<3; l++)
for (m=0; m<3; m++)
if (ifOne(sudoku[r+l][c+m])) {
if (pigeon & sudoku[r+l][c+m]) return FALSE;
pigeon |= sudoku[r+l][c+m];
}
else {
solved=INCOMPLETE;
}
}
return solved;
}
int solveSudoku(int position ) {
int status, i, k;
short oldCube[9][9];
for (i=position; i < 81; i++) {
while ( doCube() + doRow() + doCol() );
status = validCube() ;
if ((status == TRUE) || (status == FALSE))
return status;
if ((status == INCOMPLETE) && !ifOne(sudoku[i/9][i%9]) ) {
memcpy( &oldCube, &sudoku, sizeof(short) * 81) ;
for (k=0; k < 9; k++) {
if ( sudoku[i/9][i%9] & (1<<k) ) {
sudoku[i/9][i%9] = 1 << k ;
if (solveSudoku(i+1) == TRUE ) {
/* return TRUE; */
/* Or look for entire set of solutions */
if (cubeSolutions < 10) {
cubeValues[cubeSolutions] = malloc ( sizeof(short) * 81 ) ;
memcpy( cubeValues[cubeSolutions], &sudoku, sizeof(short) * 81) ;
}
cubeSolutions++;
if ((cubeSolutions & 0x3ffff) == 0x3ffff ) {
printf ("cubeSolutions = %llx\n", cubeSolutions+1 );
}
//if ( cubeSolutions > 10 )
// return TRUE;
}
memcpy( &sudoku, &oldCube, sizeof(short) * 81) ;
}
if (k==8)
return FALSE;
}
}
}
return FALSE;
}
int main ( int argc, char** argv) {
int i;
if (argc != 2) {
printf("Error: number of arguments on command line is incorrect\n");
exit (-12);
}
init_sudoku();
set_sudoku(argv[1]);
printf("[----------------------- Input Data ------------------------]\n\n");
print_sudoku(1);
solveSudoku(0);
if ((validCube()==1) && !cubeSolutions) {
// If sudoku is effectively already solved, cubeSolutions will not be set
printf ("\n This is a trivial sudoku. \n\n");
print_sudoku(1);
}
if (!cubeSolutions && validCube()!=1)
printf("Not Solvable\n");
if (cubeSolutions > 1) {
if (cubeSolutions >= 10)
printf("10+ Solutions, returning first 10 (%lld) [%llx] \n", cubeSolutions, cubeSolutions);
else
printf("%llx Solutions. \n", cubeSolutions);
}
for (i=0; (i < cubeSolutions) && (i < 10); i++) {
memcpy ( &sudoku, cubeValues[i], sizeof(short) * 81 );
printf("[----------------------- Solution %2.2d ------------------------]\n\n", i+1);
print_sudoku(0);
//print_HTML_sudoku();
}
return 0;
}
For one, since backtracking is a depth-first search it is not directly parallelizable, since any newly computed result cannot be used be directly used by another thread. Instead, you must divide the problem early, i.e. thread #1 starts with the first combination for a node in the backtracking graph, and proceeds to search the rest of that subgraph. Thread #2 starts with the second possible combination at the first and so forth. In short, for n threads find the n possible combinations on the top level of the search space (do not "forward-track"), then assign these n starting points to n threads.
However I think the idea is fundamentally flawed: Many sudoku permutations are solved in a matter of a couple thousands of forward+backtracking steps, and are solved within milliseconds on a single thread. This is in fact so fast that even the small coordination required for a few threads (assume that n threads reduce computation time to 1/n of original time) on a multi-core/multi-CPU is not negligible compared to the total running time, thus it is not by any chance a more efficient solution.
Are you sure you want to do that? Like, what problem are you trying to solve? If you want to use all cores, use threads. If you want a fast sudoku solver, I can give you one I wrote, see output below. If you want to make work for yourself, go ahead and use GCD ;).
Update:
I don't think GCD is bad, it just isn't terribly relevant to the task of solving sudoku. GCD is a technology to tie GUI events to code. Essentially, GCD solves two problems, a Quirk in how the MacOS X updates windows, and, it provides an improved method (as compared to threads) of tying code to GUI events.
It doesn't apply to this problem because Sudoku can be solved significantly faster than a person can think (in my humble opinion). That being said, if your goal was to solve Sudoku faster, you would want to use threads, because you would want to directly use more than one processor.
[bear#bear scripts]$ time ./a.out ..1..4.......6.3.5...9.....8.....7.3.......285...7.6..3...8...6..92......4...1...
[----------------------- Input Data ------------------------]
*,*,1 *,*,4 *,*,*
*,*,* *,6,* 3,*,5
*,*,* 9,*,* *,*,*
8,*,* *,*,* 7,*,3
*,*,* *,*,* *,2,8
5,*,* *,7,* 6,*,*
3,*,* *,8,* *,*,6
*,*,9 2,*,* *,*,*
*,4,* *,*,1 *,*,*
[----------------------- Solution 01 ------------------------]
7,6,1 3,5,4 2,8,9
2,9,8 1,6,7 3,4,5
4,5,3 9,2,8 1,6,7
8,1,2 6,4,9 7,5,3
9,7,6 5,1,3 4,2,8
5,3,4 8,7,2 6,9,1
3,2,7 4,8,5 9,1,6
1,8,9 2,3,6 5,7,4
6,4,5 7,9,1 8,3,2
real 0m0.044s
user 0m0.041s
sys 0m0.001s