How to execute a while loop precisely every 10 seconds in windows vc++ - visual-c++

Please help me run the following loop precisely every 10 seconds in Windows VC++.
It should start at, say, 12:12:40.000, ignore the milliseconds taken by the commented work, and start the next iteration at 12:12:50.000, and so on, precisely every 10 seconds.
void controlloop()
{
struct timeb start, end;
while(1)
{
ftime(&start);
if(start.time %10 == 0)
break;
else
Sleep(100);
}
while(1)
{
ftime(&start);
if(start.time %10 == 0)
{
// some work here which will roughly take 100 ms
ftime(&end);
elapsedtime = (int) (1000.0 * (end.time - start.time) + (end.millitm - start.millitm));
if(elapsedtime > 10000)
{
sleeptime = 0;
}
else
{
sleeptime = 10000-(elapsedtime);
}
}
Sleep(sleeptime);
}//1
}

The Sleep approach only guarantees that you sleep at least 10 seconds. After that your thread is considered eligible for scheduling, and it will be considered again at the next quantum. You are still subject to the priority of any other threads on the system, the number of logical cores, etc. You are also still subject to the resolution of the system timer tick, which is ~15 ms by default. You can change it with timeBeginPeriod, but that has system-wide power implications.
For more information on Windows scheduling see Microsoft Docs. For more on the power issues, see this blog post.
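To see the "at least" behaviour for yourself, a minimal sketch in portable C++11 is enough. It uses std::this_thread::sleep_for rather than the Win32 Sleep call, and the helper name sleep_overshoot is made up for this illustration.

```cpp
#include <chrono>
#include <thread>

// Sleeps for `requested` and returns how much longer than requested the
// call actually took, measured with a monotonic clock.
std::chrono::nanoseconds sleep_overshoot(std::chrono::milliseconds requested)
{
    using Clock = std::chrono::steady_clock;
    const Clock::time_point before = Clock::now();
    std::this_thread::sleep_for(requested);
    const Clock::duration elapsed = Clock::now() - before;
    return std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed - requested);
}
```

Calling sleep_overshoot(std::chrono::milliseconds(100)) a few times on a loaded machine should show a non-negative overshoot that varies from call to call; that variation is what a fixed Sleep(10000 - elapsed) cannot compensate for.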
For Windows, the best option is to use the high-resolution performance counter via QueryPerformanceCounter. You use QueryPerformanceFrequency to convert between counter ticks and seconds.
LARGE_INTEGER qpcFrequency;
QueryPerformanceFrequency(&qpcFrequency);
LARGE_INTEGER startTime;
QueryPerformanceCounter(&startTime);
LARGE_INTEGER tenSeconds;
tenSeconds.QuadPart = startTime.QuadPart + qpcFrequency.QuadPart * 10;
while (true)
{
LARGE_INTEGER currentTime;
QueryPerformanceCounter(&currentTime);
if (currentTime.QuadPart >= tenSeconds.QuadPart)
break;
}
The timer resolution for QPC is typically well below one microsecond; QueryPerformanceFrequency reports the exact tick rate on your machine.
If you want to run a thread for as close to 10 seconds as you can while still yielding the processor use:
LARGE_INTEGER qpcFrequency;
QueryPerformanceFrequency(&qpcFrequency);
LARGE_INTEGER startTime;
QueryPerformanceCounter(&startTime);
LARGE_INTEGER tenSeconds;
tenSeconds.QuadPart = startTime.QuadPart + qpcFrequency.QuadPart * 10;
while (true)
{
    LARGE_INTEGER currentTime;
    QueryPerformanceCounter(&currentTime);
    if (currentTime.QuadPart >= tenSeconds.QuadPart)
    {
        // do a thing
        // advance from the previous deadline, not from "now", so lateness does not accumulate
        tenSeconds.QuadPart += qpcFrequency.QuadPart * 10;
    }
    SwitchToThread(); // yield the rest of this time slice on every pass
}
This is not really the most efficient way to do a periodic timer, but you asked for precision not efficiency.
If you are using VS 2015 or later, you can use the C++11 type high_resolution_clock, which uses QPC for its implementation. Older versions of Visual C++ used 'file system time', which brings back the original resolution problem you had with ftime.
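For the wall-clock alignment the question asks about (12:12:40.000, 12:12:50.000, ...), one possible sketch uses std::chrono::system_clock with sleep_until and a fixed schedule of deadlines. The names next_boundary and run_periodic are invented for this example, and the approach assumes the system clock is not adjusted while the loop runs. Each wake-up can still be late by whatever the scheduler adds, but the lateness does not carry over to the next period.

```cpp
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

using Clock = std::chrono::system_clock;

// Returns the first multiple of `period` (counted from the clock's epoch)
// that lies strictly after `now`.
Clock::time_point next_boundary(Clock::time_point now, Clock::duration period)
{
    const Clock::duration remainder = now.time_since_epoch() % period;
    return now + (period - remainder);
}

// Runs `work` once per period, `iterations` times, and returns the
// deadlines that were used so the schedule can be inspected.
std::vector<Clock::time_point> run_periodic(Clock::duration period, int iterations,
                                            const std::function<void()>& work)
{
    std::vector<Clock::time_point> deadlines;
    Clock::time_point deadline = next_boundary(Clock::now(), period);
    for (int i = 0; i < iterations; ++i) {
        std::this_thread::sleep_until(deadline);
        deadlines.push_back(deadline);
        work();
        // Step from the previous deadline, not from "now", so the time
        // spent in work() and any wake-up lateness do not accumulate.
        deadline += period;
    }
    return deadlines;
}
```

With a period of std::chrono::seconds(10), the deadlines fall on whole multiples of 10 seconds since the epoch, which corresponds to :00, :10, :20 and so on. If work() ever takes longer than one period, sleep_until returns immediately and the loop catches up on the next iteration.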

Related

Multithreading in Windows Forms

I wrote a Caesar cipher app in Windows Forms (C++/CLI) with dynamic-link libraries (in C++ and in ASM) containing my algorithms for the model (enciphering and deciphering). That part of my app works.
There is also multithreading driven from Windows Forms. The user can choose the number of threads (1-64). If they choose 2, the message to encipher (decipher) is divided into two substrings, which are handed to two threads. I want to execute these threads in parallel and so reduce the execution time.
When the user pushes the encipher or decipher button, the enciphered or deciphered text is displayed along with the execution times of the C++ and ASM functions. Everything works, but the times for more than 1 thread aren't smaller, they are bigger.
There is some code:
/*Function which concats string for substrings to threads*/
array<String^>^ ThreadEncipherFuncCpp(int nThreads, string str2){
//Array of threads
array<String^>^ arrayOfThreads = gcnew array <String^>(nThreads);
//Holds the n-th part of the message to be processed
string loopSubstring;
//Length of a substring of the message
int numberOfSubstring = str2.length() / nThreads;
int isModulo = str2.length() % nThreads;
array<Thread^>^ xThread = gcnew array < Thread^ >(nThreads);
for (int i = 0; i < nThreads; i++)
{
if (i == 0 && numberOfSubstring != 0)
loopSubstring = str2.substr(0, numberOfSubstring);
else if ((i == nThreads - 1) && numberOfSubstring != 0){
if (isModulo != 0)
loopSubstring = str2.substr(numberOfSubstring*i, numberOfSubstring + isModulo);
else
loopSubstring = str2.substr(numberOfSubstring*i, numberOfSubstring);
}
else if (numberOfSubstring == 0){
loopSubstring = str2.substr(0, isModulo);
i = nThreads - 1;
}
else
loopSubstring = str2.substr(numberOfSubstring*i, numberOfSubstring);
ThreadExample::inputString = gcnew String(loopSubstring.c_str());
xThread[i] = gcnew Thread(gcnew ThreadStart(&ThreadExample::ThreadEncipher));
xThread[i]->Start();
xThread[i]->Join();
arrayOfThreads[i] = ThreadExample::outputString;
}
return arrayOfThreads;
}}
Here is a fragment which is responsible for the calculation of the time for C++:
/*****************C++***************/
auto start = chrono::high_resolution_clock::now();
array<String^>^ arrayOfThreads = ThreadEncipherFuncCpp(nThreads, str2);
auto elapsed = chrono::high_resolution_clock::now() - start;
long long milliseconds = chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
double micro = milliseconds;
this->label4->Text = Convert::ToString(micro + " microseconds");
String^ str3;
String^ str4;
str4 = str3->Concat(arrayOfThreads);
this->textBox2->Text = str4;
/**********************************/
An example run:
For input data: "Some example text. Some example text2."
Program will display: "Vrph hadpsoh whaw. Vrph hadpsoh whaw2."
Times of execution for 1 thread:
C++ time: 31231us.
Asm time: 31212us.
Times of execution for 2 threads:
C++ time: 62488us.
Asm time: 62505us.
Times of execution for 4 threads:
C++ time: 140254us.
Asm time: 124587us.
Times of execution for 32 threads:
C++ time: 1002548us.
Asm time: 1000020us.
How to solve this problem?
I need this structure of program, this is academic project.
My CPU has 4 cores.
The reason it's not going any faster is because you aren't letting your threads run in parallel.
xThread[i] = gcnew Thread(gcnew ThreadStart(&ThreadExample::ThreadEncipher));
xThread[i]->Start();
xThread[i]->Join();
These three lines create the thread, start it running, and then wait for it to finish. You're not getting any parallelism here, you're just adding the overhead of spawning & waiting for threads.
If you want to have a speedup from multithreading, the way to do it is to start all the threads at once, let them all run, and then collect up the results.
In this case, I'd make it so that ThreadEncipher (which you haven't shown us the source of, so I'm making assumptions) takes a parameter, which is used as an array index. Instead of having ThreadEncipher read from inputString and write to outputString, have it read from & write to one index of an array. That way, each thread can read & write at the same time. After you've spawned all these threads, then you can wait for all of them to finish, and you can either process the output array, or since array<String^>^ is already your return type, just return it as-is.
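The start-everything-then-join pattern described above can be sketched in standard C++ with std::thread. This is not C++/CLI and not the asker's code: caesar_shift is a hypothetical stand-in for ThreadEncipher (a shift of 3 on ASCII letters), encipher_parallel is an invented name, and nThreads is assumed to be at least 1.

```cpp
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for ThreadEncipher: shifts ASCII letters by
// `shift` positions (assumed non-negative) and leaves everything else alone.
std::string caesar_shift(const std::string& text, int shift)
{
    std::string out = text;
    for (char& c : out) {
        if (c >= 'a' && c <= 'z')
            c = static_cast<char>('a' + (c - 'a' + shift) % 26);
        else if (c >= 'A' && c <= 'Z')
            c = static_cast<char>('A' + (c - 'A' + shift) % 26);
    }
    return out;
}

// Splits `text` into nThreads chunks, enciphers them concurrently, and
// concatenates the results in order. Assumes nThreads >= 1.
std::string encipher_parallel(const std::string& text, std::size_t nThreads)
{
    std::vector<std::string> results(nThreads);
    std::vector<std::thread> threads;
    threads.reserve(nThreads);
    const std::size_t len = text.size();

    // Phase 1: start every thread before waiting for any of them.
    for (std::size_t i = 0; i < nThreads; ++i) {
        const std::size_t begin = len * i / nThreads;
        const std::size_t end = len * (i + 1) / nThreads;
        // Each thread writes only to its own slot, so no locking is needed.
        threads.emplace_back([&text, &results, i, begin, end] {
            results[i] = caesar_shift(text.substr(begin, end - begin), 3);
        });
    }

    // Phase 2: only now wait for all of them.
    for (std::thread& t : threads)
        t.join();

    std::string joined;
    for (const std::string& part : results)
        joined += part;
    return joined;
}
```

For an input as short as the 38-character example, the cost of creating threads will probably still dominate, so a speedup should only be expected for much longer messages.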
Other thoughts:
You've got a mix of unmanaged and managed objects here. It will be better if you pick one and stick with it. Since you're in C++/CLI, I'd recommend that you stick with the managed objects. I'd stop using std::string, and use System::String^ exclusively.
Since your CPU has 4 cores, you're not going to get any speedup by using more than 4 threads. Don't be surprised when 32 threads takes longer than 4, because you're doing 8x the string manipulation, and you've got 32 threads fighting over 4 processor cores.
Your string splitting code is more complex than it needs to be. You've got five different cases in there, I'd have to sit down and think about it for a while to be sure it's correct. Try this:
int totalLen = str2->Length;
for (int i = 0; i < nThreads; i++)
{
    int startIndex = totalLen * i / nThreads;
    int endIndex = totalLen * (i+1) / nThreads;
    int substrLen = endIndex - startIndex;
    String^ substr = str2->Substring(startIndex, substrLen);
    ...
}
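As a quick check of the index arithmetic, here is a small sketch in standard C++ (the helper name chunk_bounds is made up for this example, and nChunks is assumed to be at least 1). It returns the nChunks + 1 offsets that delimit the chunks.

```cpp
#include <cstddef>
#include <vector>

// Returns nChunks + 1 offsets; chunk i covers [bounds[i], bounds[i + 1]).
// Assumes nChunks >= 1.
std::vector<std::size_t> chunk_bounds(std::size_t totalLen, std::size_t nChunks)
{
    std::vector<std::size_t> bounds;
    bounds.reserve(nChunks + 1);
    for (std::size_t i = 0; i <= nChunks; ++i)
        bounds.push_back(totalLen * i / nChunks);
    return bounds;
}
```

For the 38-character example message and 4 threads this gives 0, 9, 19, 28, 38, so the chunk lengths are 9, 10, 9 and 10. When the message is shorter than the thread count (say length 2 with 4 threads) the offsets are 0, 0, 1, 1, 2: some chunks are simply empty, and no special cases are needed.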

how to use pthread_cond_timedwait with millisecond

I am trying to use pthread_cond_timedwait for a millisecond sleep interval, but I am not getting the expected sleep duration: my thread sleeps longer than I specified. Below is my implementation. Let me know if I am doing anything wrong.
struct timeval tp;
struct timespec ts;
int rc = gettimeofday(&tp, NULL);
ts.tv_sec = tp.tv_sec;
ts.tv_nsec = tp.tv_usec * 1000;
ts.tv_nsec += 30 * 1000000; //30 is my milliseconds
pthread_mutex_lock(&mtxPlaybackWait);
pthread_cond_timedwait(&playbackSignal, &mtxPlaybackWait, &ts);
pthread_mutex_unlock(&mtxPlaybackWait);
The tv_nsec field of the timespec might be exceeding one second's worth of nanoseconds, which would give an incorrect timeout.
Try following:
ts.tv_sec = tp.tv_sec;
ts.tv_nsec = tp.tv_usec * 1000;
ts.tv_nsec += 30 * 1000000;
ts.tv_sec += ts.tv_nsec / 1000000000L;
ts.tv_nsec = ts.tv_nsec % 1000000000L;
You have an addition of seconds and microseconds on one side, and milliseconds on the other. The result is in seconds and nanoseconds.
If you try to express seconds in nanoseconds, this may overflow quickly: 1 second = 1,000,000,000 nanoseconds, which takes up ~30 bits. A 32-bit integer can therefore hold only about 4 seconds' worth of nanoseconds if unsigned (about 2 if signed) and will overflow beyond that.
Also, I am not sure if all functions behave correctly under all circumstances when passed a struct where the fractional seconds amount to more than a second. I’d expect widely used standard libraries to have done their homework and normalize first (or otherwise ensure correct behavior), but some quickly assembled niche product might not handle such cases properly.
To prevent both the overflow and strange side effects of anomalies, shave off integer seconds wherever you can and store them in the seconds part rather than in the fractional seconds.
Here is a version of your calculation which avoids both these things:
gettimeofday(&tp, NULL);
/* if msec is 1 s or more, add its integer part to tv_sec */
ts.tv_sec = tp.tv_sec + floor(msec / 1000);
/* for now, these are really µsec, not nsec, to prevent overflow */
ts.tv_nsec = tp.tv_usec + (msec % 1000) * 1000;
/* if tv_nsec is 1s or more, move integer second part to tv_sec */
ts.tv_sec += floor(ts.tv_nsec / 1000000);
ts.tv_nsec %= 1000000;
/* and finally, convert µsec to nsec */
ts.tv_nsec *= 1000;
You might not need floor if you are certain that you are operating on integer types (i.e. for msec and ts.tv_nsec)—in that case, a simple division will do.
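The same normalization can be packaged as a small helper that works purely in integers. This is a sketch with an invented name (add_milliseconds); it assumes msec is non-negative and that the incoming timespec is already normalized (tv_nsec below one second).

```cpp
#include <time.h>

// Returns `base` advanced by `msec` milliseconds, with tv_nsec kept in
// the range [0, 1000000000). Assumes msec >= 0 and a normalized `base`.
timespec add_milliseconds(timespec base, long msec)
{
    const long kNsecPerSec = 1000000000L;
    // Whole seconds go straight into tv_sec.
    base.tv_sec += msec / 1000;
    // At most 999,000,000 ns is added here, so the sum stays below
    // 2,000,000,000 and fits even in a 32-bit signed long.
    base.tv_nsec += (msec % 1000) * 1000000L;
    if (base.tv_nsec >= kNsecPerSec) {
        base.tv_sec += 1;
        base.tv_nsec -= kNsecPerSec;
    }
    return base;
}
```

A typical use would be to fill a timespec with clock_gettime(CLOCK_REALTIME, &ts), call ts = add_milliseconds(ts, 30), and pass &ts to pthread_cond_timedwait. For example, 10 s + 990,000,000 ns advanced by 30 ms becomes 11 s + 20,000,000 ns.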

How can i measure the overhead due to task migration/load balancing on linux with the real time patch?

I am trying to measure the overhead due to task migration. By overhead I mean the latency involved in such an activity. I know there are separate run queues for each core, and the kernel periodically checks the run queues for an imbalance and wakes up a kernel thread (perhaps a higher-priority one) that does the migration.
Could anyone provide pointers to the kernel source code where I can insert time stamps to measure this value?
Is there any other performance metric I could investigate to get at this overhead?
I remember an earlier post that discussed this topic, and someone there also posted code showing how to measure system overhead.
I see you want to add code that inserts time stamps. Do you think that's feasible, given how frequently tasks are scheduled? I think you can follow the earlier post.
I saved the source code from that post, thanks to the author!
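/* Counters from the previous call; zero until the first sample. */
static unsigned long long lastTotalUser, lastTotalUserLow, lastTotalSys, lastTotalIdle;
double getCurrentValue() {
double percent;
FILE* file;
unsigned long long totalUser, totalUserLow, totalSys, totalIdle, total;
file = fopen("/proc/stat", "r");
if (file == NULL)
return -1.0; /* /proc/stat not available */
fscanf(file, "cpu %llu %llu %llu %llu", &totalUser, &totalUserLow,
&totalSys, &totalIdle);
fclose(file);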
if (totalUser < lastTotalUser || totalUserLow < lastTotalUserLow ||
totalSys < lastTotalSys || totalIdle < lastTotalIdle) {
//Overflow detection. Just skip this value.
percent = -1.0;
}
else {
total = (totalUser - lastTotalUser) + (totalUserLow - lastTotalUserLow) +
(totalSys - lastTotalSys);
percent = total;
total += (totalIdle - lastTotalIdle);
percent /= total;
percent *= 100;
}
lastTotalUser = totalUser;
lastTotalUserLow = totalUserLow;
lastTotalSys = totalSys;
lastTotalIdle = totalIdle;
return percent;
}

how to generate a delay

I'm new to kernel programming and I'm trying to understand some basics of operating systems. I am trying to generate a delay using a technique which I've implemented successfully on a 20 MHz microcontroller.
I know this is a totally different environment, as I'm using Linux (CentOS) on a 2 GHz Core 2 Duo processor.
I've tried the following code, but I'm not getting a delay.
#include<linux/kernel.h>
#include<linux/module.h>
int init_module (void)
{
unsigned long int i, j, k, l;
for (l = 0; l < 100; l ++)
{
for (i = 0; i < 10000; i ++)
{
for ( j = 0; j < 10000; j ++)
{
for ( k = 0; k < 10000; k ++);
}
}
}
printk ("\nhello\n");
return 0;
}
void cleanup_module (void)
{
printk ("bye");
}
When I run dmesg as quickly as I can after inserting the module, the string "hello" is already there. If my calculation is right, the above code should give me at least a 10-second delay.
Why is it not working? Is there anything related to threading? How could a 2 GHz processor execute the above code instantly, without any noticeable delay?
The compiler is optimizing your loop away since it has no side effects.
To actually get a 10 second (non-busy) delay, you can do something like this:
#include <linux/sched.h>
//...
unsigned long to = jiffies + (10 * HZ); /* current time + 10 seconds */
while (time_before(jiffies, to))
{
schedule();
}
or better yet:
#include <linux/delay.h>
//...
msleep(10 * 1000);
For short delays you may use mdelay, udelay and ndelay.
For more information, I suggest you read Linux Device Drivers, 3rd edition, chapter 7.3, which deals with delays.
To answer the question directly, it's likely your compiler seeing that these loops don't do anything and "optimizing" them away.
As for this technique, what it looks like you're trying to do is use all of the processor to create a delay. While this may work, an OS should be designed to maximize processor time. This will just waste it.
I understand it's experimental, but just the heads up.

CPU contention (wait time) for a process in Linux

How can I check how long a process spends waiting for the CPU in a Linux box?
For example, in a loaded system I want to check how long a SQL*Loader (sqlldr) process waits.
It would be useful if there is a command line tool to do this.
I've quickly slapped this together. It prints out the smallest and largest "interferences" from task switching...
#include <sys/time.h>
#include <stdio.h>
double seconds()
{
timeval t;
gettimeofday(&t, NULL);
return t.tv_sec + t.tv_usec / 1000000.0;
}
int main()
{
double min = 999999999, max = 0;
while (true)
{
double t0 = seconds();
double c = seconds() - t0; /* two explicit calls: the evaluation order of a - b is unspecified */
if (c < min)
{
min = c;
printf("%f\n", c);
fflush(stdout);
}
if (c > max)
{
max = c;
printf("%f\n", c);
fflush(stdout);
}
}
return 0;
}
Here's how you should go about measuring it. Have a number of processes, greater than the number of your processors * cores * threading capability wait (block) on an event that will wake them up all at the same time. One such event is a multicast network packet. Use an instrumentation library like PAPI (or one more suited to your needs) to measure the differences in real and virtual "wakeup" time between your processes. From several iterations of the experiment you can get an estimate of the CPU contention time for your processes. Obviously, it's not going to be at all accurate for multicore processors, but maybe it'll help you.
Cheers.
I had this problem some time back. I ended up using getrusage :
You can get detailed help at :
http://www.opengroup.org/onlinepubs/009695399/functions/getrusage.html
getrusage populates the rusage struct.
Measuring Wait Time with getrusage
You can call getrusage at the beginning of your code and then call it again at the end, or at some appropriate point during execution. You then have initial_rusage and final_rusage. The user time spent by your process is given by ru_utime and the system time by ru_stime; each is a timeval, so include tv_usec as well as tv_sec if you need sub-second precision.
Thus the total user time spent by the process will be:
user_time = (final_rusage.ru_utime.tv_sec - initial_rusage.ru_utime.tv_sec) + (final_rusage.ru_utime.tv_usec - initial_rusage.ru_utime.tv_usec) / 1e6
The total system time spent by the process will be:
system_time = (final_rusage.ru_stime.tv_sec - initial_rusage.ru_stime.tv_sec) + (final_rusage.ru_stime.tv_usec - initial_rusage.ru_stime.tv_usec) / 1e6
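If total_time is the wall-clock time elapsed between the two calls of getrusage, then the wait time will be
wait_time = total_time - (user_time + system_time)
Note that this figure also includes any time the process spent sleeping or blocked on I/O, not only time spent waiting for a CPU.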
Hope this helps
