mutex and its effect on execution time (and cpu usage) - linux

I wrote a very simple test program to examine the efficiency of pthread mutexes, but I'm not able to analyse the results I get. (I can see 4 CPUs in the Linux System Monitor, which is why I keep at least 4 threads active: I want to keep all of them busy.) The mutexes are not actually necessary for correctness in this code.
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
pthread_mutex_t lock1, lock2, lock3, lock4;
void do_sth() { /* just open a file, read it and copy it to another file */
int i;
for (i = 0; i < 1; i++) {
FILE* fp = fopen("(2) Catching Fire.txt", "r");
if (fp == NULL) {
fprintf(stderr, "could not open file\n");
exit(1);
}
char filename[20];
sprintf(filename, "a%d", (int)pthread_self());
FILE* wfp = fopen(filename, "w");
if (wfp == NULL) {
fprintf(stderr, "could not open file for write\n");
exit(1);
}
int c;
while ((c = fgetc(fp)) != EOF) { /* parentheses needed: != binds tighter than = */
c++;
fputc(c, wfp);
}
fclose(fp); /* fclose(), not close(): these are FILE* streams */
fclose(wfp);
}
}
void* routine1(void* param) {
pthread_mutex_lock(&lock1);
do_sth();
pthread_mutex_unlock(&lock1);
return NULL;
}
void* routine2(void* param) {
pthread_mutex_lock(&lock2);
do_sth();
pthread_mutex_unlock(&lock2);
return NULL;
}
void* routine3(void* param) {
pthread_mutex_lock(&lock3);
do_sth();
pthread_mutex_unlock(&lock3);
return NULL;
}
void* routine4(void* param) {
pthread_mutex_lock(&lock4);
do_sth();
pthread_mutex_unlock(&lock4);
return NULL;
}
int main(int argc, char** argv) {
int i ;
pthread_mutex_init(&lock1, 0);
pthread_mutex_init(&lock2, 0);
pthread_mutex_init(&lock3, 0);
pthread_mutex_init(&lock4, 0);
pthread_t thread1[4];
pthread_t thread2[4];
pthread_t thread3[4];
pthread_t thread4[4];
for (i = 0; i < 4; i++)
pthread_create(&thread1[i], NULL, routine1, NULL);
for (i = 0; i < 4; i++)
pthread_create(&thread2[i], NULL, routine2, NULL);
for (i = 0; i < 4; i++)
pthread_create(&thread3[i], NULL, routine3, NULL);
for (i = 0; i < 4; i++)
pthread_create(&thread4[i], NULL, routine4, NULL);
for (i = 0; i < 4; i++)
pthread_join(thread1[i], NULL);
for (i = 0; i < 4; i++)
pthread_join(thread2[i], NULL);
for (i = 0; i < 4; i++)
pthread_join(thread3[i], NULL);
for (i = 0; i < 4; i++)
pthread_join(thread4[i], NULL);
printf("Hello, World!\n");
}
I execute this program in two ways, with and without all the mutexes, and I measure the execution time (using time ./a.out) and the average CPU load (using htop). Here are the results:
First: in htop I can see that the load average of the system increases considerably when I do not use any mutex in the code. I have no idea why this happens. (Are 4 active threads not enough to get the most out of 4 CPUs?)
Second: it takes (a little) less time for the program to execute with all those mutexes than without them. Why does that happen? I mean, it should take some time to put a thread to sleep and wake it up again.
Edit: I guess that when I use the locks I put the other threads to sleep, which eliminates a lot of context switches (saving some time). Could this be the reason?

You are using one lock per thread, which is why you don't see an increase in the application's execution time when you use all the mutexes: do_sth() is not actually being protected from concurrent execution.
Since all the threads are working on the same file, they should all be accessing it using the same lock (otherwise you will get incorrect results, with all the threads trying to modify the file at the same time).
Try running the experiments again using just one global lock.
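A minimal sketch of that experiment (based on the program above, with do_sth() unchanged; only the locking scheme differs): every thread now contends on the same global lock, so the do_sth() calls are serialized.
#include <stdio.h>
#include <pthread.h>
pthread_mutex_t global_lock;            /* one lock shared by all threads */
void do_sth(void);                      /* the same copy routine as above */
void* routine(void* param) {
    (void)param;                        /* unused */
    pthread_mutex_lock(&global_lock);   /* every thread queues up here */
    do_sth();
    pthread_mutex_unlock(&global_lock);
    return NULL;
}
int main(void) {
    int i;
    pthread_t threads[16];
    pthread_mutex_init(&global_lock, 0);
    for (i = 0; i < 16; i++)
        pthread_create(&threads[i], NULL, routine, NULL);
    for (i = 0; i < 16; i++)
        pthread_join(threads[i], NULL);
    pthread_mutex_destroy(&global_lock);
    return 0;
}
With a single lock the copies run one after another, so the wall-clock time should approach the sum of the individual do_sth() runs rather than (roughly) their maximum, and the cost the mutex adds becomes visible.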

Related

Scheduling policy and priority using pthread does not make any difference

I am running a simple multi-threaded program using pthreads. With a real-time scheduler (the SCHED_FIFO policy), lower-priority threads should not be able to run until the higher-priority ones are finished. But when I run two versions of this program (the only difference being the priority, 99 vs. 1) at the same time, they finish at almost the same time. I even changed the policy to SCHED_OTHER, but still no difference.
# include <stdio.h>
# include <string.h>
# include <pthread.h>
# include <stdlib.h>
# include <unistd.h>
# include <time.h>   /* clock_gettime, struct timespec */
# include <math.h>
# define NUM_THREADS 128
pthread_t tid[NUM_THREADS];
int indexes[NUM_THREADS];
void* dummyThread(void *arg)
{
unsigned long i = 0;
pthread_t id = pthread_self();
float a, b = 5, c = 8;
printf("Thread %d started.\n", *(int*)arg + 1);
for(i = 0; i < 10000000; i++)
a = sin(b) + sqrt(b);
printf("Thread %d finished.\n", *(int*)arg + 1);
return NULL;
}
int main(void)
{
int i = 0;
pthread_attr_t attr;
struct sched_param schedParam;
struct timespec start, finish;
double elapsed;
pthread_attr_init(&attr);
pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
schedParam.sched_priority = 1;
pthread_attr_setschedparam(&attr, &schedParam);
pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
clock_gettime(CLOCK_MONOTONIC, &start);
for (i = 0 ; i < NUM_THREADS; i++)
{
indexes[i] = i;
if (!pthread_create(&tid[i], &attr, &dummyThread, &indexes[i]))
printf("Thread %d created successfully.\n", i + 1);
else
printf("Failed to create Thread %d.\n", i + 1);
}
for (i = 0 ; i < NUM_THREADS; i++)
pthread_join(tid[i], NULL);
clock_gettime(CLOCK_MONOTONIC, &finish);
elapsed = (finish.tv_sec - start.tv_sec);
elapsed += (finish.tv_nsec - start.tv_nsec) / 1000000000.0;
printf("%lf\n", elapsed);
return 0;
}
Edit 1: I updated my code by adding pthread_attr_setschedparam and error checking. I don't get any errors when running it without sudo, and changing the priority or the scheduling policy still does not change the result.
Edit 2: I noticed that when I create threads with different priorities within the same process, it works well. In the following code, threads with an even index get priority 1 while threads with an odd index get priority 99. It works as expected: the odd threads finish before the even ones.
# include <stdio.h>
# include <string.h>
# include <pthread.h>
# include <stdlib.h>
# include <unistd.h>
# include <time.h>   /* clock_gettime, struct timespec */
# include <math.h>
# define NUM_THREADS 128
pthread_t tid[NUM_THREADS];
int indexes[NUM_THREADS];
void* dummyThread(void *arg)
{
unsigned long i = 0;
pthread_t id = pthread_self();
float a, b = 5, c = 8;
printf("Thread %d started.\n", *(int*)arg);
for(i = 0; i < 10000000; i++)
a = sin(b) + sqrt(b);
printf("Thread %d finished.\n", *(int*)arg);
return NULL;
}
int main(void)
{
int i = 0;
pthread_attr_t attr;
struct sched_param schedParam;
struct timespec start, finish;
double elapsed;
clock_gettime(CLOCK_MONOTONIC, &start);
for (i = 0 ; i < NUM_THREADS; i++)
{
indexes[i] = i;
pthread_attr_init(&attr);
pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
schedParam.sched_priority = i % 2 == 0 ? 1 : 99;
pthread_attr_setschedparam(&attr, &schedParam);
pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
if (!pthread_create(&tid[i], &attr, &dummyThread, &indexes[i]))
printf("Thread %d created successfully.\n", i);
else
printf("Failed to create Thread %d.\n", i);
}
for (i = 0 ; i < NUM_THREADS; i++)
pthread_join(tid[i], NULL);
clock_gettime(CLOCK_MONOTONIC, &finish);
elapsed = (finish.tv_sec - start.tv_sec);
elapsed += (finish.tv_nsec - start.tv_nsec) / 1000000000.0;
printf("%lf\n", elapsed);
return 0;
}
Since threads from different processes are all handled by the same scheduler in the kernel, I don't understand why it does not work across different processes.
From man pthread_attr_setschedpolicy:
In order for the policy setting made by pthread_attr_setschedpolicy() to have effect when calling pthread_create(3), the caller must use pthread_attr_setinheritsched(3) to set the inherit-scheduler attribute of the attributes object attr to PTHREAD_EXPLICIT_SCHED.
You've neglected to do that, so your SCHED_FIFO didn't have any effect.
As soon as I add the pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED); call, pthread_create() starts failing with EPERM (since only root can create SCHED_FIFO threads).
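For reference, a minimal sketch of the kind of check that makes that failure visible (the helper name create_fifo_thread is only for illustration; the attribute setup mirrors the code in the question):
#include <stdio.h>
#include <string.h>
#include <sched.h>
#include <pthread.h>
/* returns 0 on success; otherwise prints why the SCHED_FIFO thread could not be created */
static int create_fifo_thread(pthread_t *tid, int priority, void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    struct sched_param sp;
    int err;
    pthread_attr_init(&attr);
    pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    sp.sched_priority = priority;
    pthread_attr_setschedparam(&attr, &sp);
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    err = pthread_create(tid, &attr, fn, arg);
    if (err != 0) /* pthread_create returns the error code; it does not set errno */
        fprintf(stderr, "pthread_create: %s\n", strerror(err));
    pthread_attr_destroy(&attr);
    return err;
}
When EPERM shows up, the usual options are to run the program as root or to raise the process's real-time priority limit (RLIMIT_RTPRIO, e.g. via /etc/security/limits.conf).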

Pthread Mutex Lock Linux

I created a simple program that shows the use of a mutex lock. Here is the code:
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#define NUM_THREAD 2
pthread_mutex_t mutex;
int call_time;
void *makeCall(void *param)
{
call_time = 10;
pthread_mutex_lock(&mutex);
printf("Hi I'm thread #%u making a call\n", (unsigned int) pthread_self());
do{
printf("%d\n", call_time);
call_time--;
sleep(1);
}
while(call_time > 0);
pthread_mutex_unlock(&mutex);
return 0;
}
int main()
{
int i;
pthread_t thread[NUM_THREAD];
//init mutex
pthread_mutex_init(&mutex, NULL);
//create thread
for(i = 0; i < NUM_THREAD; i++)
pthread_create(&thread[i], NULL, makeCall, NULL);
//join thread
for(i = 0; i < NUM_THREAD; i++)
pthread_join(thread[i], NULL);
pthread_mutex_destroy(&mutex);
return 0;
}
The output is...
Hi I'm thread #3404384000 making a call
10
10
9
8
7
6
5
4
3
2
1
Hi I'm thread #3412776704 making a call
0
However, if I modify the function makeCall and move the assignment to call_time inside the mutex lock...
pthread_mutex_lock(&mutex);
call_time = 10;
/*
*
*
*
*/
pthread_mutex_unlock(&mutex);
The program now gives me the correct output, where each of the threads counts down from 10 to 0. I don't understand what difference moving call_time inside the lock makes. I hope someone can help me understand this behaviour of my program. Cheers!
call_time is a shared variable that is accessed from 2 threads and so must be protected. What is happening is that the first thread starts, sets call_time to 10 and starts printing its countdown. Then the second thread starts, resets call_time back to 10 and waits for the mutex. The first thread now comes back and keeps running with call_time reset to 10. After it is done and releases the mutex, the second thread can finally run; call_time is now 0, since the first thread left it at 0, so the second thread only prints the last round.
Try this program, I think it will demonstrate threads better:
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#define NUM_THREAD 2
pthread_mutex_t mutex;
int call_time;
void *makeCall(void *param)
{
int temp;
do{
pthread_mutex_lock(&mutex);
printf("Hi I'm thread #%u making a call\n", (unsigned int) pthread_self());
printf("%d\n", call_time);
temp = call_time--;
pthread_mutex_unlock(&mutex);
//sleep(1); //try with and without this line and see the difference.
}
while(temp > 0);
return 0;
}
int main()
{
int i;
call_time = 100;
pthread_t thread[NUM_THREAD];
//init mutex
pthread_mutex_init(&mutex, NULL);
//create thread
for(i = 0; i < NUM_THREAD; i++)
pthread_create(&thread[i], NULL, makeCall, NULL);
//join thread
for(i = 0; i < NUM_THREAD; i++)
pthread_join(thread[i], NULL);
pthread_mutex_destroy(&mutex);
return 0;
}
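For reference, this is roughly what the question's fix looks like when written out in full: the original makeCall() with the assignment moved inside the critical section, as in the snippet in the question. Each thread now sets call_time to 10 only after it owns the mutex, so its countdown cannot be disturbed by the other thread:
void *makeCall(void *param)
{
    pthread_mutex_lock(&mutex);
    call_time = 10;                   /* the reset happens while we hold the lock */
    printf("Hi I'm thread #%u making a call\n", (unsigned int) pthread_self());
    do{
        printf("%d\n", call_time);
        call_time--;
        sleep(1);
    }
    while(call_time > 0);
    pthread_mutex_unlock(&mutex);     /* the other thread starts its own countdown here */
    return 0;
}
The downside is that the two calls are now completely serialized; the program above keeps the critical section short instead, which is usually what you want.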

My semaphore module is not working properly (Dining Philosophers)

I'm implementing my own semaphore methods to understand synchronization and threading.
Using my semaphore, I tried to solve the Dining Philosophers problem.
My plan was to create the deadlock situation first.
But I found that only one philosopher eats, repeatedly.
I have checked that my semaphore works quite well on other synchronization problems, so I think there is some problem with the way I am using it here.
Please let me know what the problem is.
Here is my code.
dining.c (includes the main function)
#include "sem.h"
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
static tsem_t *chopstick[5];
static tsem_t *updating;
static int update_status (int i, int eating)
{
static int status[5] = { 0, };
static int duplicated;
int idx;
int sum;
tsem_wait (updating);
status[i] = eating;
/* Check invalid state. */
duplicated = 0;
sum = 0;
for (idx = 0; idx < 5; idx++)
{
sum += status[idx];
if (status[idx] && status[(idx + 1) % 5])
duplicated++;
}
/* Avoid printing empty table. */
if (sum == 0)
{
tsem_signal (updating);
return 0;
}
for (idx = 0; idx < 5; idx++)
fprintf (stdout, "%3s ", status[idx] ? "EAT" : "...");
/* Stop on invalid state. */
if (sum > 2 || duplicated > 0)
{
fprintf (stdout, "invalid %d (duplicated:%d)!\n", sum, duplicated);
exit (1);
}
else
fprintf (stdout, "\n");
tsem_signal (updating);
return 0;
}
void *thread_func (void *arg)
{
int i = (int) (long) arg;
int k = (i + 1) % 5;
do
{
tsem_wait (chopstick[i]);
tsem_wait (chopstick[k]);
update_status (i, 1);
update_status (i, 0);
tsem_signal (chopstick[i]);
tsem_signal (chopstick[k]);
}
while (1);
return NULL;
}
int main (int argc,
char **argv)
{
int i;
for (i = 0; i < 5; i++)
chopstick[i] = tsem_new (1);
updating = tsem_new (1);
for (i = 0; i < 5; i++)
{
pthread_t tid;
pthread_create (&tid, NULL, thread_func, (void *) (long) i);
}
/* endless thinking and eating... */
while (1)
usleep (10000000);
return 0;
}
sem.c (the semaphore implementation)
#include "sem.h"
.
sem.h (header for sem.c)
#ifndef __SEM_H__
#define __SEM_H__
#include <pthread.h>
typedef struct test_semaphore tsem_t;
tsem_t *tsem_new (int value);
void tsem_free (tsem_t *sem);
void tsem_wait (tsem_t *sem);
int tsem_try_wait (tsem_t *sem);
void tsem_signal (tsem_t *sem);
#endif /* __SEM_H__ */
compile command
gcc sem.c dining.c -pthread -o dining
One problem is that in tsem_wait() you have the following code sequence outside of a lock:
while(sem->count <= 0)
continue;
There's no guarantee that the program will actually re-read sem->count - the compiler is free to produce machine code that does something like the following:
int temp = sem->count;
while(temp <= 0)
continue;
In fact, this will likely happen in an optimized build.
Try changing your busy wait loop to something like this so the count is checked while holding the lock:
void tsem_wait (tsem_t *sem)
{
pthread_mutex_lock(&(sem->mutexLock));
while (sem->count <= 0) {
pthread_mutex_unlock(&(sem->mutexLock));
usleep(1);
pthread_mutex_lock(&(sem->mutexLock));
}
// sem->mutexLock is still held here...
sem->count--;
pthread_mutex_unlock(&(sem->mutexLock));
}
Strictly speaking, you should do something similar for tsem_try_wait() (which you're not using yet).
Note that you might want to consider using a pthread_cond_t to make waiting on the counter changing more efficient.
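A sketch of what that could look like, assuming struct test_semaphore gains a pthread_cond_t member (called countNonzero here purely for illustration) next to mutexLock and count:
void tsem_wait (tsem_t *sem)
{
  pthread_mutex_lock (&(sem->mutexLock));
  while (sem->count <= 0)                      /* re-checked after every wakeup */
    pthread_cond_wait (&(sem->countNonzero), &(sem->mutexLock));
  sem->count--;
  pthread_mutex_unlock (&(sem->mutexLock));
}
void tsem_signal (tsem_t *sem)
{
  pthread_mutex_lock (&(sem->mutexLock));
  sem->count++;
  pthread_cond_signal (&(sem->countNonzero));  /* wake one waiter, if any */
  pthread_mutex_unlock (&(sem->mutexLock));
}
pthread_cond_wait() releases the mutex while it waits and re-acquires it before returning, so the count is always checked under the lock and there is no busy loop.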
Finally, your code to 'get' the chopsticks in thread_func() has the classic Dining Philosophers deadlock: if every philosopher simultaneously acquires the 'left' chopstick (chopstick[i]), each of them then waits forever for the 'right' chopstick (chopstick[k]), since every chopstick is already in some philosopher's left hand.
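One standard way to break that symmetry (if you want to move past the intentional deadlock) is to have every philosopher take the lower-numbered chopstick first, so a circular wait cannot form. A minimal sketch against the thread_func() above:
void *thread_func (void *arg)
{
  int i = (int) (long) arg;
  int k = (i + 1) % 5;
  int first = i < k ? i : k;    /* always lock the lower-numbered chopstick first */
  int second = i < k ? k : i;
  do
    {
      tsem_wait (chopstick[first]);
      tsem_wait (chopstick[second]);
      update_status (i, 1);
      update_status (i, 0);
      tsem_signal (chopstick[second]);
      tsem_signal (chopstick[first]);
    }
  while (1);
  return NULL;
}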

The time consumed is not normal with multiple threads on Windows

The time consumed is not normal with multiple threads on Windows. Our device has 5 nozzles, and the process is:
1. The nozzles pick chips up at the same time, so I use 5 threads to do it
2. Move the nozzles to another place
3. Put the chips
It is smooth most of the time, but sometimes there is a short pause before moving to the other place (we can see it clearly). Picking chips up normally takes about 80 milliseconds, but sometimes it becomes 130 milliseconds. I wrote a simple program to test it:
#include "stdafx.h"
#include <WINDOWS.H>
#include <PROCESS.H>
#include <iostream>
#include <Mmsystem.h>
#pragma comment(lib, "winmm.lib")
using namespace std;
static TIMECAPS l_timecaps;
UINT WINAPI MainThread(LPVOID lParam /* = NULL */);
UINT WINAPI TestThread(LPVOID lParam /* = NULL */);
void MainProcess();
int _tmain(int argc, _TCHAR* argv[])
{
//set current process priority as real time
SetPriorityClass(GetCurrentProcess(), REALTIME_PRIORITY_CLASS);
//use more accurate time
timeGetDevCaps(&l_timecaps, sizeof(l_timecaps));
timeBeginPeriod(l_timecaps.wPeriodMin);
UINT uiThreadId = 0;
HANDLE hEvents = (HANDLE) _beginthreadex(NULL, 0, MainThread, NULL, 0, &uiThreadId);
SetThreadPriority(hEvents, THREAD_PRIORITY_TIME_CRITICAL);
WaitForSingleObject(hEvents, INFINITE);
cerr << endl << "Press Enter to exit." << endl;
while (cin.get() != '\n');
timeEndPeriod(l_timecaps.wPeriodMin);
return 0;
}
UINT WINAPI MainThread(LPVOID lParam /* = NULL */)
{
int i = 0;
while (i < 100)
{
MainProcess();
i++;
}
return 0;
}
void MainProcess()
{
const int THREAD_NUMBER = 5;
static HANDLE hEvents[THREAD_NUMBER];
for (int i = 0; i < THREAD_NUMBER; ++i)
hEvents[i] = NULL;
//log time with more accurate time
LARGE_INTEGER liPerfFreq={0};
LARGE_INTEGER liBeginRunTime = {0};
long lBeginRunTime = 0;
QueryPerformanceFrequency(&liPerfFreq);
QueryPerformanceCounter(&liBeginRunTime);
lBeginRunTime = liBeginRunTime.QuadPart * 1000 / liPerfFreq.QuadPart;
for (int i = 0; i < THREAD_NUMBER; ++i)
{
UINT uiThreadId = 0;
hEvents[i] = (HANDLE) _beginthreadex(NULL, 0, TestThread, NULL, 0, &uiThreadId);
SetThreadPriority(hEvents[i], THREAD_PRIORITY_TIME_CRITICAL);
//pin each thread to its own CPU (the second argument is a bit mask, not an index)
SetThreadAffinityMask(hEvents[i], (DWORD_PTR)1 << i);
}
//wait all threads finished
WaitForMultipleObjects(THREAD_NUMBER, hEvents, TRUE, INFINITE);
LARGE_INTEGER liEndRunTime = {0};
long lEndRunTime = 0;
QueryPerformanceCounter(&liEndRunTime);
lEndRunTime = liEndRunTime.QuadPart * 1000 / liPerfFreq.QuadPart;
cout << "time: " << lEndRunTime - lBeginRunTime << endl;
}
UINT WINAPI TestThread(LPVOID lParam /* = NULL */)
{
//do nothing
return 0;
}
The output time is usually 2, 3 or 4 milliseconds, but sometimes it becomes 57 or 62 milliseconds. That is bad for our device while it is running; the device becomes slow.
Your test threads do nothing, so all the time is spent creating and shutting down the threads; overheads in the kernel object manager and scheduler will dominate. Some of the threads may also have to wait on other threads that are holding (via API calls) kernel locks, and thus see delays.
And of course those inner threads could be completing before the call to set their priority completes: to set the priority reliably you really need to create the thread suspended, set the priority, and then resume it.
Because you are measuring nothing, all you have are overheads, which will depend on whatever else is going on.
Also remember that, while Windows has names like THREAD_PRIORITY_TIME_CRITICAL, it is not a real-time OS.
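A minimal sketch of the suspended-start approach mentioned above (the same _beginthreadex call as in the question; only the creation flag and the explicit resume differ):
UINT uiThreadId = 0;
HANDLE hThread = (HANDLE) _beginthreadex(NULL, 0, TestThread, NULL, CREATE_SUSPENDED, &uiThreadId);
if (hThread != NULL)
{
    // the thread cannot run (or exit) until it is resumed, so the priority
    // and affinity are guaranteed to be applied before it does any work
    SetThreadPriority(hThread, THREAD_PRIORITY_TIME_CRITICAL);
    SetThreadAffinityMask(hThread, 1);   // example: pin to CPU 0
    ResumeThread(hThread);
}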

Condition variable and rwlock deadlock

I have a simple threaded program which uses a condition variable and an rwlock. I've been staring at it for hours trying different approaches. The problem is that after a while one or more threads stop at the rwlock, although it is not locked for writing. Maybe I am missing something about how these locks work or how they are implemented.
#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>
#include <math.h>
#include <unistd.h>
//global variables
pthread_mutex_t mutex;
pthread_cond_t cond;
pthread_rwlock_t rwlock;
int counter;
int listLength = 1;
void* worker(void* arg){
do {
usleep(200);
printf("Before rwlock\n");
pthread_rwlock_rdlock(&rwlock);
printf("Before mutex\n");
pthread_mutex_lock(&mutex);
printf("Afer mutex\n");
counter++;
//signal the main
if (counter == 5 ||
(listLength < 5 && counter == listLength)){
printf("Signal main\n");
pthread_cond_signal(&cond);
counter = 0;
}
pthread_mutex_unlock(&mutex);
pthread_rwlock_unlock(&rwlock);
} while(listLength != 0);
return NULL;
}
int main(int argc, char* argv[]){
if (argc != 2){
perror("Invalid number of args");
exit(1);
}
//get arguments
int workers = atoi(argv[1]);
//initialize sync vars
pthread_rwlockattr_t attr;
pthread_rwlockattr_setkind_np(&attr,
PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
pthread_mutex_init(&mutex, NULL);
pthread_cond_init(&cond, NULL);
pthread_rwlock_init(&rwlock, &attr);
counter = 0;
//create threads
pthread_t threadArray[workers];
int threadOrder[workers];
for (int i = 0; i < workers; i++){
threadOrder[i] = i;
if (pthread_create(&threadArray[i], NULL,
worker, &threadOrder[i]) != 0){
perror("Cannot create thread");
exit(1);
}
}
while(listLength != 0) {
//wait for signal and lock the list
pthread_mutex_lock(&mutex);
while (pthread_cond_wait(&cond, &mutex) != 0);
pthread_rwlock_wrlock(&rwlock);
printf("In write lock\n");
pthread_mutex_unlock(&mutex);
pthread_rwlock_unlock(&rwlock);
printf("release wrlock\n");
}
//join the threads
for (int i = 0; i < workers; i++){
if (pthread_join(threadArray[i], NULL) !=0){
perror("Cannot join thread");
exit(1);
}
}
//release resources
pthread_mutex_destroy(&mutex);
pthread_cond_destroy(&cond);
pthread_rwlock_destroy(&rwlock);
return 0;
}
Looks like this code has several inconsistencies in it.
You're using the mutex together with the rwlock, which means the worker threads are serialized anyway; if you remove the rwlock code, the behaviour won't change.
I cannot see a pthread_rwlockattr_init() call: attr is passed to pthread_rwlockattr_setkind_np() uninitialized. Pay attention that you do call it, and that you don't initialize the same object twice or more.
The same applies to pthread_rwlockattr_destroy().
I cannot see a reason why pthread_rwlock_rdlock() would block when nothing holds the write lock. Be sure that is really what is happening; otherwise you may simply be deadlocking on your mutex.
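To illustrate the first point, here is a minimal sketch of the same worker/main handshake with the rwlock removed entirely, using only the mutex and the condition variable. It also waits on a flag (a hypothetical batchReady variable, protected by the same mutex) instead of on the return value of pthread_cond_wait(), which is the usual pattern:
int batchReady = 0;                       /* hypothetical flag, protected by mutex */
void* worker(void* arg){
    do {
        usleep(200);
        pthread_mutex_lock(&mutex);
        counter++;
        if (counter == 5 ||
            (listLength < 5 && counter == listLength)){
            counter = 0;
            batchReady = 1;               /* record the event under the lock */
            pthread_cond_signal(&cond);   /* then wake main */
        }
        pthread_mutex_unlock(&mutex);
    } while (listLength != 0);
    return NULL;
}
/* in main's loop: */
pthread_mutex_lock(&mutex);
while (!batchReady)                       /* re-check after every wakeup */
    pthread_cond_wait(&cond, &mutex);
batchReady = 0;
/* ... update the list here, still holding the mutex ... */
pthread_mutex_unlock(&mutex);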
