how to use pthread_cond_timedwait with millisecond - linux

I am trying to use pthread_cond_timedwait for millisecond sleep interval but I am not getting sleep duration. my thread is sleeping more than I have mentioned. below is my implementation. Let me know if i am wrong anywhere.
struct timeval tp;
struct timespec ts;
int rc = gettimeofday(&tp, NULL);
ts.tv_sec = tp.tv_sec;
ts.tv_nsec = tp.tv_usec * 1000;
ts.tv_nsec += 30 * 1000000; //30 is my milliseconds
pthread_mutex_lock(&mtxPlaybackWait);
pthread_cond_timedwait(&playbackSignal, &mtxPlaybackWait, &ts);
pthread_mutex_unlock(&mtxPlaybackWait);

timespac might be overflowed and causing timeout.
Try following:
ts.tv_sec = tp.tv_sec;
ts.tv_nsec = tp.tv_usec * 1000;
ts.tv_nsec += 30 * 1000000;
ts.tv_sec += ts.tv_nsec / 1000000000L;
ts.tv_nsec = ts.tv_nsec % 1000000000L;

You have an addition of seconds and microseconds on one side, and milliseconds on the other. The result is in seconds and nanoseconds.
If you try to express seconds in nanoseconds, this may overflow quickly: 1 second = 1,000,000,000 nanoseconds, which takes up ~30 bits. An unsigned 32-bit integer value can hold up to ~4 seconds if unsigned (~2 for a signed int) and will overflow beyond that.
Also, I am not sure if all functions behave correctly under all circumstances when passed a struct where the fractional seconds amount to more than a second. I’d expect widely used standard libraries to have done their homework and normalize first (or otherwise ensure correct behavior), but some quickly assembled niche product might not handle such cases properly.
To prevent both the overflow and strange side effects of anomalies, shave off integer seconds wherever you can and store them in the seconds part rather than in the fractional seconds.
Here is a version of your calculation which avoids both these things:
gettimeofday(&tp, NULL);
/* if msec is 1 s or more, add its integer part to tv_sec */
ts.tv_sec = tp.tv_sec + floor(msec / 1000);
/* for now, these are really µsec, not nsec, to prevent overflow */
ts.tv_nsec = tp.tv_usec + (msec % 1000) * 1000000;
/* if tv_nsec is 1s or more, move integer second part to tv_sec */
ts.tv_sec += floor(ts.tv_nsec / 1000000);
ts.tv_nsec %= 1000000;
/* and finally, convert µsec to nsec */
ts.tv_nsec *= 1000;
You might not need floor if you are certain that you are operating on integer types (i.e. for msec and ts.tv_nsec)—in that case, a simple division will do.

Related

GMP setting last digit to zero

I’m looking for fastest way to set last digit of positive number l declated as mpz_t to zero. I didn’t find the function could to this already. For example 6531489321483 should be changed to 6531489321480.
Update
It appears that subtraction and modulo is the superior method for zeroing out the last digit with mpz_t types. Just as #MarkDickinson and #MarcGlisse pointed out, the asymptotic behavior greatly favors using mpz_tdiv_r_ui (or mpz_fdiv_r_ui) over mpz_div_ui followed by mpz-mul_ui. My original benchmarks were on relatively small numbers (25 digits). I retested on a 175 digit number and the sub_mod method was nearly 40% faster.
Test value: 1234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789
Result with div_mul: 1234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456780
Result with sub_mod: 1234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456780
time with division followed by multiplication: 6.145656
time with subtraction and modulo: 4.413998
And with a 350 digit number we see that sub_mod is around 85% faster:
Test value: 12345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789
Result with div_mul: 12345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456780
Result with sub_mod: 12345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456780
time with division followed by multiplication: 10.256122
time with subtraction and modulo: 5.522990
It should be noted that whether we use mpz_tdiv_r_ui or mpz_fdiv_r_ui, the results were almost identical.
Since the sub_mod method was only marginally slower with smaller numbers, it seems reasonable to only use this method for all cases.
It would be nice to tests this on different compilers. I'm currently using clang 5.0.1.
Original
Benchmarks on my machine show that division followed by multiplication is faster than finding the remainder via modulo operator and subtracting.
#include <stdio.h>
#include <time.h>
#include <gmp.h>
void div_mul(mpz_t x) {
mpz_tdiv_q_ui(x, x, 10u);
mpz_mul_ui(x, x, 10u);
}
// Maybe this could be simpler?
void sub_mod(mpz_t x, mpz_t y) {
// N.B. mpz_mod_ui is equivalent to mpz_fdiv_r_ui. Changed to
// mpz_tdiv_r_ui for consistency with div_mul.
mpz_tdiv_r_ui(y, x, 10u);
mpz_sub(x, x, y);
}
Main:
int main() {
mpz_t testVal;
mpz_init(testVal);
mpz_set_str(testVal, "1234567898765432123456789", 10);
gmp_printf("Test value: %Zd\n", testVal);
mpz_t x;
mpz_t y;
mpz_init(x);
mpz_init(y);
mpz_set(x, testVal);
div_mul(x);
gmp_printf("Result with div_mul: %Zd\n", x);
mpz_set(x, testVal);
sub_mod(x, y);
gmp_printf("Result with sub_mod: %Zd\n", x);
const int limit = 100000000;
const double checkPoint0 = (double) clock() / CLOCKS_PER_SEC;
for (int i = 0; i < limit; ++i) {
mpz_set(x, testVal);
div_mul(x);
}
const double checkPoint1 = (double) clock() / CLOCKS_PER_SEC;
const double time_div_mul = checkPoint1 - checkPoint0;
printf("time with division followed by multiplication: %f\n", time_div_mul);
const double checkPoint2 = (double) clock() / CLOCKS_PER_SEC;
for (int i = 0; i < limit; ++i) {
mpz_set(x, testVal);
sub_mod(x, y);
}
const double checkPoint3 = (double) clock() / CLOCKS_PER_SEC;
const double time_sub_mod = checkPoint3 - checkPoint2;
printf("time with subtraction and modulo: %f\n", time_sub_mod);
mpz_clear(testVal);
mpz_clear(x);
mpz_clear(y);
return 0;
}
Output:
Test value: 1234567898765432123456789
Result with div_mul: 1234567898765432123456780
Result with sub_mod: 1234567898765432123456780
time with division followed by multiplication: 2.941251
time with subtraction and modulo: 3.171949
I suspect that one of the reasons that the latter method is slightly slower is that 2 variables are needed as complicated multi operations on the same line are not accessible in the C api. If we could use gmpxx, we could write x - x % 10.
Another thought as to why the first method is faster, is that the div_mul involves two operations with unsigned integers while the sub_mod method involves an operation with an unsigned integer followed by an operation with mpz_t.
I tried to get this reproduced on ideone.com but could not get gmp.h loaded. I opted to implement a similar benchmark with type long long int just for fun. You will note the presence of volatile and that the limit is one billion instead of one hundred million as seen above. The volatile was need to keep the for loop from being optimized away.
Converting the number to a string and changing last character wouldn't be the fastest way?

How to execute a while loop precisely every 10 seconds in windows vc++

Please help me in running the following loop precisely every 10 seconds in windows vc++.
Initially It should start at something like say 12:12:40:000, It should neglect the milliseconds it takes to do some work commented, and restart the next loop at 12:12:50:000 and so on every 10 seconds precisely.
void controlloop()
{
struct timeb start, end;
while(1)
{
ftime(&start);
if(start.time %10 == 0)
break;
else
Sleep(100);
}
while(1)
{
ftime(&start);
if(start.time %10 == 0)
{
// some work here which will roughly take 100 ms
ftime(&end);
elapsedtime = (int) (1000.0 * (end.time - start.time) + (end.millitm - start.millitm));
if(elapsedtime > 10000)
{
sleeptime = 0;
}
else
{
sleeptime = 10000-(elapsedtime);
}
}
Sleep(sleeptime);
}//1
}
The Sleep approach only guarantees you sleep at least 10 seconds. After that your thread is considered eligible for scheduling and on the next quanta it will be considered again. You are still subject to the priority of any other threads on the system, the number of logical cores, etc. You are also still subject to the resolution of the threading quanta which is by default ~15 ms. You can change it with timeBeginPeriod, but that has system-wide power implications.
For more information on Windows scheduling see Microsoft Docs. For more on the power issues, see this blog post.
For Windows the best option is to use the high-frequency performance counter via QueryPerformanceCounter. You use QueryPerformanceFrequency to convert between cycles and seconds.
LARGE_INTEGER qpcFrequency;
QueryPerformanceFrequency(&qpcFrequency);
LARGE_INTEGER startTime;
QueryPerformanceCounter(&startTime);
LARGE_INTEGER tenSeconds;
tenSeconds.QuadPart = startTime .QuadPart + qpcFrequency.QuadPart * 10;
while (true)
{
LARGE_INTEGER currentTime;
QueryPerformanceCounter(&currentTime);
if (currentTime.QuadPart >= tenSeconds.QuadPart)
break;
}
The timer resolution for QPC is typically close the cycle speed of your CPU processor.
If you want to run a thread for as close to 10 seconds as you can while still yielding the processor use:
LARGE_INTEGER qpcFrequency;
QueryPerformanceFrequency(&qpcFrequency);
LARGE_INTEGER startTime;
QueryPerformanceCounter(&startTime);
LARGE_INTEGER tenSeconds;
tenSeconds.QuadPart = startTime .QuadPart + qpcFrequency.QuadPart * 10;
while (true)
{
LARGE_INTEGER currentTIme;
QueryPerformanceCounter(&currentTIme);
if (currentTime.QuadPart >= tenSeconds.QuadPart)
{
// do a thing
tenSeconds.QuadPart = currentTime.QuadPart + qpcFrequency.QuadPart * 10;
SwitchToThread();
}
This is not really the most efficient way to do a periodic timer, but you asked for precision not efficiency.
If you are using VS 2015 or later, you can use the C++11 type high_resolution_clock which uses QPC for it’s implementation. In older versions of Visual C++ used ‘file system time’ which is back to your original resolution problem with ftime.

Linux kernel: Why add_timer() is modifying my "expires" value?

I am trying to setup a periodic timer triggering a function every seconds, but there is a small drift between each call. After some investigations, I found that this is the add_timer() call which adds an offset of 2 to the expires field (~2ms in my case).
Why is this drift added? Is there a clean way to prevent it? I am not trying to get an accurate millisecond precision, I have a vague understanding of the kernel real-time limitations, but at least to avoid this intentional delay at each call.
Here is the output from a test module. Each couple of numbers is the value of the expires field just before and after the call:
[100047.127123] Init timer 1000
[100048.127986] Expired timer 99790884 99790886
[100049.129578] Expired timer 99791886 99791888
[100050.131146] Expired timer 99792888 99792890
[100051.132728] Expired timer 99793890 99793892
[100052.134315] Expired timer 99794892 99794894
[100053.135882] Expired timer 99795894 99795896
[100054.137411] Expired timer 99796896 99796898
[...]
[100071.164276] Expired timer 99813930 99813932
[100071.529455] Exit timer
And here is the source:
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/jiffies.h>
#include <linux/time.h>
static struct timer_list t;
static void timer_func(unsigned long data)
{
unsigned long pre, post;
t.expires = jiffies + HZ;
pre = t.expires;
add_timer(&t);
post = t.expires;
printk("Expired timer %lu %lu\n", pre, post);
}
static int __init timer_init(void)
{
init_timer(&t);
t.function = timer_func;
t.expires = jiffies + HZ;
add_timer(&t);
printk("Init timer %d\n", HZ);
return 0;
}
static void __exit timer_exit(void)
{
del_timer(&t);
printk("Exit timer\n");
}
module_init(timer_init);
module_exit(timer_exit);
I found the cause. Let's trace the add_timer function:
The add_timer function calls:
mod_timer(timer, timer->expires);
The mod_timer function calls:
expires = apply_slack(timer, expires);
and then goes on to actually modify the timer.
The apply_slack function says:
/*
* Decide where to put the timer while taking the slack into account
*
* Algorithm:
* 1) calculate the maximum (absolute) time
* 2) calculate the highest bit where the expires and new max are different
* 3) use this bit to make a mask
* 4) use the bitmask to round down the maximum time, so that all last
* bits are zeros
*/
Before continuing, let's see what is the timer's slack. The init_timer macro eventually calls do_init_timer which sets the slack by default to -1.
With this knowledge, let's reduce apply_slack and see what remains of it:
static inline
unsigned long apply_slack(struct timer_list *timer, unsigned long expires)
{
unsigned long expires_limit, mask;
int bit;
if (timer->slack >= 0) {
expires_limit = expires + timer->slack;
} else {
long delta = expires - jiffies;
if (delta < 256)
return expires;
expires_limit = expires + delta / 256;
}
mask = expires ^ expires_limit;
if (mask == 0)
return expires;
bit = find_last_bit(&mask, BITS_PER_LONG);
mask = (1 << bit) - 1;
expires_limit = expires_limit & ~(mask);
return expires_limit;
}
The first if, checking for timer->slack >= 0 fails, so the else part is applied. In that part the difference between expires and jiffies is slightly less than HZ (you just did t.expires = jiffies + HZ. Therefore, the delta in the function (with your data) is most likely about 4 and delta / 4 is non zero.
This in turn implies that mask (which is expires ^ expires_limit) is not zero. The rest really depends on the value of expires, but for sure, it gets changed.
So there you have it, since slack is automatically set to -1, the apply_slack function is changing your expires time to align with, I guess, the timer ticks.
If you don't want this slack, you can set t.slack = 0; when you are initializing the timer in timer_init.
This is the old answer! It doesn't address the issue in your question, but it is an issue with what you are trying to achieve nonetheless: having a periodic function.
Let's visualize your program in a timeline (assuming start time 1000 and HZ=50 with imaginary time units):
time (jiffies) event
1000 in timer_init(): t.expires = jiffies + HZ; // t.expires == 1050
1050 timer_func() is called by timer
1052 in timer_func(): t.expires = jiffies + HZ; // t.expires == 1102
1102 timer_func() is called by timer
1104 in timer_func(): t.expires = jiffies + HZ; // t.expires == 1154
I hope you see where this is going! The problem is that there is a delay between the time the timer expires and the time you calculate when the next expiration should be. That's where the drift comes from. The drift could get even larger, by the way, if the system is busy and your function call is delayed.
The way to fix it is very very easy. The problem was that when you update t.expires by jiffies, which is the current time. What you should do is update t.expires by the last time it expired (which is already in t.expires!).
So, in your timer_func function, instead of:
t.expires = jiffies + HZ;
simply do:
t.expires += HZ;

Accurately Calculating CPU Utilization in Linux using /proc/stat

There are a number of posts and references on how to get CPU Utilization using statistics in /proc/stat. However, most of them use only four of the 7+ CPU stats (user, nice, system, and idle), ignoring the remaining jiffie CPU counts present in Linux 2.6 (iowait, irq, softirq).
As an example, see Determining CPU utilization.
My question is this: Are the iowait/irq/softirq numbers also counted in one of the first four numbers (user/nice/system/idle)? In other words, does the total jiffie count equal the sum of the first four stats? Or, is the total jiffie count equal to the sum of all 7 stats? If the latter is true, then a CPU utilization formula should take all of the numbers into account, like this:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
long double a[7],b[7],loadavg;
FILE *fp;
for(;;)
{
fp = fopen("/proc/stat","r");
fscanf(fp,"%*s %Lf %Lf %Lf %Lf",&a[0],&a[1],&a[2],&a[3],&a[4],&a[5],&a[6]);
fclose(fp);
sleep(1);
fp = fopen("/proc/stat","r");
fscanf(fp,"%*s %Lf %Lf %Lf %Lf",&b[0],&b[1],&b[2],&b[3],&b[4],&b[5],&b[6]);
fclose(fp);
loadavg = ((b[0]+b[1]+b[2]+b[4]+b[5]+b[6]) - (a[0]+a[1]+a[2]+a[4]+a[5]+a[6]))
/ ((b[0]+b[1]+b[2]+b[3]+b[4]+b[5]+b[6]) - (a[0]+a[1]+a[2]+a[3]+a[4]+a[5]+a[6]));
printf("The current CPU utilization is : %Lf\n",loadavg);
}
return(0);
}
I think iowait/irq/softirq are not counted in one of the first 4 numbers. You can see the comment of irqtime_account_process_tick in kernel code for more detail:
(for Linux kernel 4.1.1)
2815 * Tick demultiplexing follows the order
2816 * - pending hardirq update <-- this is irq
2817 * - pending softirq update <-- this is softirq
2818 * - user_time
2819 * - idle_time <-- iowait is included in here, discuss below
2820 * - system time
2821 * - check for guest_time
2822 * - else account as system_time
For the idle time handling, see account_idle_time function:
2772 /*
2773 * Account for idle time.
2774 * #cputime: the cpu time spent in idle wait
2775 */
2776 void account_idle_time(cputime_t cputime)
2777 {
2778 u64 *cpustat = kcpustat_this_cpu->cpustat;
2779 struct rq *rq = this_rq();
2780
2781 if (atomic_read(&rq->nr_iowait) > 0)
2782 cpustat[CPUTIME_IOWAIT] += (__force u64) cputime;
2783 else
2784 cpustat[CPUTIME_IDLE] += (__force u64) cputime;
2785 }
If the cpu is idle AND there is some IO pending, it will count the time in CPUTIME_IOWAIT. Otherwise, it is count in CPUTIME_IDLE.
To conclude, I think the jiffies in irq/softirq should be counted as "busy" for cpu because it was actually handling some IRQ or soft IRQ. On the other hand, the jiffies in "iowait" should be counted as "idle" for cpu because it was not doing something but waiting for a pending IO to happen.
from busybox, its top magic is:
static const char fmt[] ALIGN1 = "cp%*s %llu %llu %llu %llu %llu %llu %llu %llu";
int ret;
if (!fgets(line_buf, LINE_BUF_SIZE, fp) || line_buf[0] != 'c' /* not "cpu" */)
return 0;
ret = sscanf(line_buf, fmt,
&p_jif->usr, &p_jif->nic, &p_jif->sys, &p_jif->idle,
&p_jif->iowait, &p_jif->irq, &p_jif->softirq,
&p_jif->steal);
if (ret >= 4) {
p_jif->total = p_jif->usr + p_jif->nic + p_jif->sys + p_jif->idle
+ p_jif->iowait + p_jif->irq + p_jif->softirq + p_jif->steal;
/* procps 2.x does not count iowait as busy time */
p_jif->busy = p_jif->total - p_jif->idle - p_jif->iowait;
}

Converting jiffies to milli seconds

How do I manually convert jiffies to milliseconds and vice versa in Linux? I know kernel 2.6 has a function for this, but I'm working on 2.4 (homework) and though I looked at the code it uses lots of macro constants which I have no idea if they're defined in 2.4.
As a previous answer said, the rate at which jiffies increments is fixed.
The standard way of specifying time for a function that accepts jiffies is using the constant HZ.
That's the abbreviation for Hertz, or the number of ticks per second. On a system with a timer tick set to 1ms, HZ=1000. Some distributions or architectures may use another number (100 used to be common).
The standard way of specifying a jiffies count for a function is using HZ, like this:
schedule_timeout(HZ / 10); /* Timeout after 1/10 second */
In most simple cases, this works fine.
2*HZ /* 2 seconds in jiffies */
HZ /* 1 second in jiffies */
foo * HZ /* foo seconds in jiffies */
HZ/10 /* 100 milliseconds in jiffies */
HZ/100 /* 10 milliseconds in jiffies */
bar*HZ/1000 /* bar milliseconds in jiffies */
Those last two have a bit of a problem, however, as on a system with a 10 ms timer tick, HZ/100 is 1, and the precision starts to suffer. You may get a delay anywhere between 0.0001 and 1.999 timer ticks (0-2 ms, essentially). If you tried to use HZ/200 on a 10ms tick system, the integer division gives you 0 jiffies!
So the rule of thumb is, be very careful using HZ for tiny values (those approaching 1 jiffie).
To convert the other way, you would use:
jiffies / HZ /* jiffies to seconds */
jiffies * 1000 / HZ /* jiffies to milliseconds */
You shouldn't expect anything better than millisecond precision.
Jiffies are hard-coded in Linux 2.4. Check the definition of HZ, which is defined in the architecture-specific param.h. It's often 100 Hz, which is one tick every (1 sec/100 ticks * 1000 ms/sec) 10 ms.
This holds true for i386, and HZ is defined in include/asm-i386/param.h.
There are functions in include/linux/time.h called timespec_to_jiffies and jiffies_to_timespec where you can convert back and forth between a struct timespec and jiffies:
#define MAX_JIFFY_OFFSET ((~0UL >> 1)-1)
static __inline__ unsigned long
timespec_to_jiffies(struct timespec *value)
{
unsigned long sec = value->tv_sec;
long nsec = value->tv_nsec;
if (sec >= (MAX_JIFFY_OFFSET / HZ))
return MAX_JIFFY_OFFSET;
nsec += 1000000000L / HZ - 1;
nsec /= 1000000000L / HZ;
return HZ * sec + nsec;
}
static __inline__ void
jiffies_to_timespec(unsigned long jiffies, struct timespec *value)
{
value->tv_nsec = (jiffies % HZ) * (1000000000L / HZ);
value->tv_sec = jiffies / HZ;
}
Note: I checked this info in version 2.4.22.
I found this sample code on kernelnewbies. Make sure you link with -lrt
#include <unistd.h>
#include <time.h>
#include <stdio.h>
int main()
{
struct timespec res;
double resolution;
printf("UserHZ %ld\n", sysconf(_SC_CLK_TCK));
clock_getres(CLOCK_REALTIME, &res);
resolution = res.tv_sec + (((double)res.tv_nsec)/1.0e9);
printf("SystemHZ %ld\n", (unsigned long)(1/resolution + 0.5));
return 0;
}
To obtain the USER_HZ value (see the comments under the accepted answer) using CLI:
getconf CLK_TCK

Resources