pthread_cond_timedwait returns one second early - linux

The program below produces this output:
$ ./test_condvar 9000
1343868189.623067126 1343868198.623067126 FIRST
1343868197.623132345 1343868206.623132345 TIMEOUT
1343868205.623190120 1343868214.623190120 TIMEOUT
1343868213.623248184 1343868222.623248184 TIMEOUT
1343868221.623311549 1343868230.623311549 TIMEOUT
1343868229.623369718 1343868238.623369718 TIMEOUT
1343868237.623428856 1343868246.623428856 TIMEOUT
Note that reading across rows shows a time delta of the intended 9 seconds, but reading down columns shows that pthread_cond_timedwait returns ETIMEDOUT in 8 seconds.
The pthread lib is glibc 2.12, running on Red Hat EL6; uname -a shows 2.6.32-131.12.1.el6.x86_64 #1 SMP Tue Aug 23 11:13:45 CDT 2011 x86_64 x86_64 x86_64 GNU/Linux.
It looks like pthread_cond_timedwait relies on lll_futex_timed_wait for the timeout behavior.
Any ideas on where else to search for an explanation?
#include <time.h>
#include <sys/time.h>
#include <pthread.h>
#include <errno.h>
#include <stdlib.h>
#include <stdio.h>
int main ( int argc, char *argv[] )
{
    pthread_mutexattr_t mtx_attr;
    pthread_mutex_t mtx;
    pthread_condattr_t cond_attr;
    pthread_cond_t cond;
    int milliseconds;
    const char *res = "FIRST";

    if ( argc < 2 )
    {
        fputs ( "must specify interval in milliseconds", stderr );
        exit ( EXIT_FAILURE );
    }
    milliseconds = atoi ( argv[1] );

    pthread_mutexattr_init ( &mtx_attr );
    pthread_mutexattr_settype ( &mtx_attr, PTHREAD_MUTEX_NORMAL );
    pthread_mutexattr_setpshared ( &mtx_attr, PTHREAD_PROCESS_PRIVATE );
    pthread_mutex_init ( &mtx, &mtx_attr );
    pthread_mutexattr_destroy ( &mtx_attr );

#ifdef USE_CONDATTR
    pthread_condattr_init ( &cond_attr );
    if ( pthread_condattr_setclock ( &cond_attr, CLOCK_REALTIME ) != 0 )
    {
        fputs ( "pthread_condattr_setclock failed", stderr );
        exit ( EXIT_FAILURE );
    }
    pthread_cond_init ( &cond, &cond_attr );
    pthread_condattr_destroy ( &cond_attr );
#else
    pthread_cond_init ( &cond, NULL );
#endif

    for (;;)
    {
        struct timespec now, ts;
        clock_gettime ( CLOCK_REALTIME, &now );
        ts.tv_sec = now.tv_sec + milliseconds / 1000;
        ts.tv_nsec = now.tv_nsec + (milliseconds % 1000) * 1000000;
        if (ts.tv_nsec > 1000000000)
        {
            ts.tv_nsec -= 1000000000;
            ++ts.tv_sec;
        }
        printf ( "%ld.%09ld %ld.%09ld %s\n", now.tv_sec, now.tv_nsec,
                 ts.tv_sec, ts.tv_nsec, res );
        pthread_mutex_lock ( &mtx );
        if ( pthread_cond_timedwait ( &cond, &mtx, &ts ) == ETIMEDOUT )
            res = "TIMEOUT";
        else
            res = "OTHER";
        pthread_mutex_unlock ( &mtx );
    }
}

There was a Linux kernel bug triggered by the insertion of a leap second on July 1st this year, which resulted in futexes expiring one second too early until either the machine was rebooted or you ran the workaround:
# date -s "`date`"
It sounds like you've been bitten by that.

I'm not sure that this is related to the specific issue but your line:
if (ts.tv_nsec > 1000000000)
should really be:
if (ts.tv_nsec >= 1000000000)
And, in fact, if you do something unexpected and pass in 10000 (for example), you may want to consider making it:
while (ts.tv_nsec >= 1000000000)
though it's probably better at some point to use modulus arithmetic so that loop doesn't run for too long.
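For example, a division/modulus version of the deadline computation might look like this (just a sketch of the arithmetic; deadline_in_ms is a made-up helper, not part of the original program):
#include <time.h>

/* Sketch: compute an absolute CLOCK_REALTIME deadline "milliseconds" from
 * now, normalising tv_nsec with division/modulus instead of a loop. */
struct timespec deadline_in_ms(long milliseconds)
{
    struct timespec now, ts;
    clock_gettime(CLOCK_REALTIME, &now);
    ts.tv_sec  = now.tv_sec + milliseconds / 1000;
    ts.tv_nsec = now.tv_nsec + (milliseconds % 1000) * 1000000L;
    ts.tv_sec += ts.tv_nsec / 1000000000L;   /* carry whole seconds */
    ts.tv_nsec %= 1000000000L;               /* keep 0 <= tv_nsec < 1e9 */
    return ts;
}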
Other than that, this appears to be some sort of issue with your environment. The code works fine for me under Debian, Linux MYBOX 2.6.32-5-686 #1 SMP Sun May 6 04:01:19 UTC 2012 i686 GNU/Linux:
1343871427.442705862 1343871436.442705862 FIRST
1343871436.442773672 1343871445.442773672 TIMEOUT
1343871445.442832158 1343871454.442832158 TIMEOUT
:
One possibility is the fact that the system clock is not sacrosanct - it may be modified periodically by NTP or other time synchronisation processes. I mention that as a possibility but it seems a little strange that it would happen in the short time between the timeout and getting the current time.
One test would be to use a different timeout (such as alternating seven and thirteen seconds) to see if the effect is the same (those numbers were chosen to be prime and unlikely to be a multiple of any other activity on the system).
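Another thing worth trying, if your glibc honours pthread_condattr_setclock with CLOCK_MONOTONIC, is to take the wall clock out of the picture entirely, so NTP steps or leap-second handling cannot affect the timeout. A minimal sketch (init_monotonic_cond and wait_ms are made-up helpers, not part of the original program):
#include <pthread.h>
#include <time.h>

/* Sketch: a condvar bound to CLOCK_MONOTONIC, so adjustments to
 * CLOCK_REALTIME cannot shorten or lengthen the wait. */
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond;

static void init_monotonic_cond(void)
{
    pthread_condattr_t attr;
    pthread_condattr_init(&attr);
    pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
    pthread_cond_init(&cond, &attr);
    pthread_condattr_destroy(&attr);
}

/* Wait up to "milliseconds"; returns ETIMEDOUT on expiry. */
static int wait_ms(long milliseconds)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);   /* deadline against the same clock */
    ts.tv_sec  += milliseconds / 1000;
    ts.tv_nsec += (milliseconds % 1000) * 1000000L;
    ts.tv_sec  += ts.tv_nsec / 1000000000L;
    ts.tv_nsec %= 1000000000L;

    pthread_mutex_lock(&mtx);
    int rc = pthread_cond_timedwait(&cond, &mtx, &ts);
    pthread_mutex_unlock(&mtx);
    return rc;
}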

Related

Why does my process take too long to die?

Basically I'm using Linux 2.6.34 on PowerPC (Freescale e500mc). I have a process (a kind of VM that was developed in-house) that uses about 2.25 G of mlocked VM. When I kill it, I notice that it takes upwards of 2 minutes to terminate.
I investigated a little. First, I closed all open file descriptors, but that didn't seem to make a difference. Then I added some printk calls in the kernel, and through them I found that all the delay comes from the kernel unlocking my VMAs. The delay is uniform across pages, which I verified by repeatedly checking the locked page count in /proc/meminfo. I've checked with programs that allocate that much memory, and they all die as soon as I signal them.
What do you think I should check now? Thanks for your replies.
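For reference, the /proc/meminfo check mentioned above can be done with a small helper along these lines (a minimal sketch that just prints the Mlocked line once per second):
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Sketch: poll the Mlocked counter from /proc/meminfo once a second. */
int main(void)
{
    char line[256];
    for (;;) {
        FILE *f = fopen("/proc/meminfo", "r");
        if (!f)
            return 1;
        while (fgets(line, sizeof line, f)) {
            if (strncmp(line, "Mlocked:", 8) == 0)
                fputs(line, stdout);
        }
        fclose(f);
        sleep(1);
    }
}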
Edit: I had to find a way to share more information about the problem so I wrote this below program:
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <string.h>
#include <errno.h>
#include <signal.h>
#include <sys/time.h>
#define MAP_PERM_1 (PROT_WRITE | PROT_READ | PROT_EXEC)
#define MAP_PERM_2 (PROT_WRITE | PROT_READ)
#define MAP_FLAGS (MAP_ANONYMOUS | MAP_FIXED | MAP_PRIVATE)
#define PG_LEN 4096
#define align_pg_32(addr) (addr & 0xFFFFF000)
#define num_pg_in_range(start, end) ((end - start + 1) >> 12)
inline void __force_pgtbl_alloc(unsigned int start)
{
    volatile int *s = (int *) start;
    *s = *s;
}

int __map_a_page_at(unsigned int start, int whichperm)
{
    int perm = whichperm ? MAP_PERM_1 : MAP_PERM_2;
    if(MAP_FAILED == mmap((void *)start, PG_LEN, perm, MAP_FLAGS, 0, 0)){
        fprintf(stderr,
                "mmap failed at 0x%x: %s.\n",
                start, strerror(errno));
        return 0;
    }
    return 1;
}

int __mlock_page(unsigned int addr)
{
    if (mlock((void *)addr, (size_t)PG_LEN) < 0){
        fprintf(stderr,
                "mlock failed on page: 0x%x: %s.\n",
                addr, strerror(errno));
        return 0;
    }
    return 1;
}

void sigint_handler(int p)
{
    struct timeval start = {0, 0}, end = {0, 0}, diff = {0, 0};
    gettimeofday(&start, NULL);
    munlockall();
    gettimeofday(&end, NULL);
    timersub(&end, &start, &diff);
    printf("Munlock'd entire VM in %ld secs %ld usecs.\n",
           (long)diff.tv_sec, (long)diff.tv_usec);
    exit(0);
}

int make_vma_map(unsigned int start, unsigned int end)
{
    int num_pg = num_pg_in_range(start, end);
    if (end < start){
        fprintf(stderr,
                "Bad range: start: 0x%x end: 0x%x.\n",
                start, end);
        return 0;
    }
    for (; num_pg; num_pg--, start += PG_LEN){
        if (__map_a_page_at(start, num_pg % 2) && __mlock_page(start))
            __force_pgtbl_alloc(start);
        else
            return 0;
    }
    return 1;
}

void display_banner()
{
    printf("-----------------------------------------\n");
    printf("Virtual memory allocator. Ctrl+C to exit.\n");
    printf("-----------------------------------------\n");
}

int main()
{
    unsigned int vma_start, vma_end, input = 0;
    int start_end = 0; // 0: start; 1: end;

    display_banner();
    // Bind SIGINT handler.
    signal(SIGINT, sigint_handler);

    while (1){
        if (!start_end)
            printf("start:\t");
        else
            printf("end:\t");
        scanf("%i", &input);
        if (start_end){
            vma_end = align_pg_32(input);
            make_vma_map(vma_start, vma_end);
        }
        else{
            vma_start = align_pg_32(input);
        }
        start_end = !start_end;
    }
    return 0;
}
As you can see, the program accepts ranges of virtual addresses, each range being defined by a start and an end. Each range is then further subdivided into page-sized VMAs by giving different permissions to adjacent pages. Interrupting the program (using SIGINT) triggers a call to munlockall(), and the time for that call to complete is duly noted.
Now, when I run it on the Freescale e500mc with Linux 2.6.34 over the range 0x30000000-0x35000000, I get a total munlockall() time of almost 45 seconds. However, if I do the same thing with smaller start-end ranges in random orders (that is, not necessarily increasing addresses) such that the total number of pages (and locked VMAs) is roughly the same, I observe the total munlockall() time to be no more than 4 seconds.
I tried the same thing on x86_64 with Linux 2.6.34 and my program compiled with the -m32 flag, and it seems the variations, though not as pronounced as on ppc, are still 8 seconds for the first case and under a second for the second case.
I tried the program on Linux 2.6.10 at one end and on 3.19 at the other, and it seems these monumental differences don't exist there. What's more, munlockall() always completes in under a second.
So, it seems that the problem, whatever it is, exists only around the 2.6.34 version of the Linux kernel.
You said the VM was developed in-house. Does this mean you have access to the source? I would start by checking to see if it has anything to stop it from immediately terminating to avoid data loss.
Otherwise, could you potentially try to provide more information? You may also want to check out https://unix.stackexchange.com/, as they would be better suited to help with any issues the Linux kernel may be having.

Execute a command in shell and print out the summary (time + memory)?

Sorry if the question is basic: are there options in the shell to show how much time and how much memory (maximum memory occupied) the execution of a command took?
Example: I want to call a binary file as follow: ./binary --option1 option1 --option2 option2
After the execution of this command I would like to know how much time and memory this command took.
Thanks
The time(1) command, either as a separate executable or as a shell built-in, can be used to measure the time used by a program, both in terms of wall-clock time and CPU time.
But measuring the memory usage of a program, or even agreeing how to define it, is a bit different. Do you want the sum of its allocations? The maximum amount of memory allocated at a single moment? Are you interested in what the code does, or in the program behavior as a whole, where the memory allocator makes a difference? And how do you consider the memory used by shared objects? Or memory-mapped areas?
Valgrind may help with some memory-related questions, but it is more of a development tool, rather than a day-to-day system administrator tool. More specifically the Massif heap profiler can be used to profile the memory usage of an application, but it does have a measurable performance impact, especially with stack profiling enabled.
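If a single number for peak memory is enough, another option is to fork/exec the command yourself and read the child's peak resident set size from getrusage(). A minimal sketch (on Linux, ru_maxrss is reported in kilobytes):
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sketch: run a command, then report wall-clock time and the peak
 * resident set size of the waited-for child via getrusage(RUSAGE_CHILDREN). */
int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
        return 1;
    }

    struct timeval start, end;
    gettimeofday(&start, NULL);

    pid_t pid = fork();
    if (pid == 0) {
        execvp(argv[1], &argv[1]);
        perror("execvp");
        _exit(127);
    }

    int status;
    waitpid(pid, &status, 0);
    gettimeofday(&end, NULL);

    long sec  = end.tv_sec - start.tv_sec;
    long usec = end.tv_usec - start.tv_usec;
    if (usec < 0) { usec += 1000000; sec--; }

    struct rusage ru;
    getrusage(RUSAGE_CHILDREN, &ru);
    printf("wall time: %ld.%06ld s, peak RSS: %ld kB\n", sec, usec, ru.ru_maxrss);

    return WIFEXITED(status) ? WEXITSTATUS(status) : 1;
}
GNU time (/usr/bin/time -v) reports a "Maximum resident set size" line that comes from the same counter, which may be all you need.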
There are several files in /proc that might be simpler than using a profiler, assuming you know the PID of the process in which you're interested.
Of primary interest is /proc/$PID/status, which lists (among other things) the virtual memory peak and current size (VmPeak and VmSize, respectively), and the resident set "high water mark" and current size (VmHWM and VmRSS, respectively).
I set up a simple C program to grab memory and then free it, and then watched the file in /proc corresponding to that program's PID, and it seemed to verify the man page.
See man proc for a complete list of files that may interest you.
Here's the command line and program I used for the test:
Monitored with
PID="<program's pid>"
watch "cat /proc/$PID/status | grep ^Vm"
( compile with gcc -o grabmem grabmem.c -std=c99 )
#include <sys/types.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#define INTNUM ( 1024 * 1024 )
#define PASSES 16
int main(){
    int* mems[16];
    int cur = 0;

    while( mems[cur] = (int*)calloc( sizeof( int ), INTNUM ) ){
        for( int i = 0; i < INTNUM; i++)
            mems[cur][i] = rand();

        printf("PID %i Pass %i: Grabbed another %lu of mem.\n",
               (int)getpid(),
               cur,
               sizeof(int) * INTNUM
        );
        sleep( 1 );
        cur++;

        if( cur >= 16 ){
            printf("Freeing memory...");
            for( cur = (PASSES - 1); cur >= 0; cur-- ){
                free( mems[cur] );
                mems[cur] = NULL;
            }
            cur = 0;
            printf("OK\n");
        }
    }
    fprintf(stderr, "Couldn't calloc() memory.\n");
    exit( -1 );
}

C code makes successfully on Mac but fails on Linux

I have some source files (e.g., layout.cpp). I can use the 'make' command on my local Mac machine. After making successfully, I copied all the files to a remote Linux machine. However, the executable file built on my Mac could not be executed on the remote Linux machine. The error message is below.
layout is not a binary executable file
I think the failure is because the format of the layout file is 'Mach-O 64-bit executable', which can't run on a Linux machine.
Therefore, I tried to make the source files on the remote Linux machine. However, it showed a lot of error messages like the ones below.
layout.cpp:75: error: ‘exit’ was not declared in this scope
layout.cpp:88: error: ‘strcpy’ was not declared in this scope
But these errors didn't show up in the make process on the Mac. Are these errors caused by differences between the compilers on Mac and Linux? Since there are so many different errors, we can't simply add '#include <cstdlib>' or '#include <string.h>' to solve them. Thanks.
Source Code:
#include <iostream>
#include <fstream>
#include <map>
#include <set>
#include <string>
#include <deque>
#include <vector>
using namespace std;
// layout routines and constants
#include <layout.h>
#include <parse.h>
#include <graph.h>
// MPI
#ifdef MUSE_MPI
#include <mpi.h>
#endif
int main(int argc, char **argv) {

    // initialize MPI
    int myid, num_procs;

#ifdef MUSE_MPI
    MPI_Init ( &argc, &argv );
    MPI_Comm_size ( MPI_COMM_WORLD, &num_procs );
    MPI_Comm_rank ( MPI_COMM_WORLD, &myid );
#else
    myid = 0;
    num_procs = 1;
#endif

    // parameters that must be broadcast to all processors
    int rand_seed;
    float edge_cut;
    char int_file[MAX_FILE_NAME];
    char coord_file[MAX_FILE_NAME];
    char real_file[MAX_FILE_NAME];
    char parms_file[MAX_FILE_NAME];
    int int_out = 0;
    int edges_out = 0;
    int parms_in = 0;
    float real_in = -1.0;

    // user interaction is handled by processor 0
    if ( myid == 0 )
    {
        if ( num_procs > MAX_PROCS )
        {
            cout << "Error: Maximum number of processors is " << MAX_PROCS << "." << endl;
            cout << "Adjust compile time parameter." << endl;
#ifdef MUSE_MPI
            MPI_Abort ( MPI_COMM_WORLD, 1 );
#else
            exit (1);
#endif
        }

        // get user input
        parse command_line ( argc, argv );
        rand_seed = command_line.rand_seed;
        edge_cut = command_line.edge_cut;
        int_out = command_line.int_out;
        edges_out = command_line.edges_out;
        parms_in = command_line.parms_in;
        real_in = command_line.real_in;
        strcpy ( coord_file, command_line.coord_file.c_str() );
        strcpy ( int_file, command_line.sim_file.c_str() );
        strcpy ( real_file, command_line.real_file.c_str() );
        strcpy ( parms_file, command_line.parms_file.c_str() );
    }

    // now we initialize all processors by reading .int file
#ifdef MUSE_MPI
    MPI_Bcast ( &int_file, MAX_FILE_NAME, MPI_CHAR, 0, MPI_COMM_WORLD );
#endif
    graph neighbors ( myid, num_procs, int_file );

    // finally we output file and quit
    float tot_energy;
    tot_energy = neighbors.get_tot_energy ();
    if ( myid == 0 )
    {
        neighbors.write_coord ( coord_file );
        cout << "Total Energy: " << tot_energy << "." << endl
             << "Program terminated successfully." << endl;
    }

    // MPI finalize
#ifdef MUSE_MPI
    MPI_Finalize ();
#endif
}
You are missing several standard #include directives, notably the standard C++ header <cstdlib> or the C header <stdlib.h> (for exit), and the standard C++ header <cstring> or the C header <string.h> (for strcpy), as commented by Jonathan Leffler (who also explained why that works on Mac OS X but not on Linux).
You should probably switch to the C++11 standard. This means installing a recent GCC (4.8) and compiling with g++ -std=c++11, and of course -Wall -g (to get all warnings and debugging information).
And you did not search enough on these issues. You could have typed man exit to get the exit(3) man page, or man strcpy to get the strcpy(3) man page. Both man pages give the relevant C include headers.
BTW, the bug really is in your source code. Code using exit really should include <stdlib.h> or <cstdlib> explicitly by itself (at least for readability). You should not assume that some other system header is (accidentally) including it.
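Concretely, the fix is to add the missing headers near the other includes at the top of layout.cpp; using the C headers it would look like this (the <cstdlib>/<cstring> forms work just as well):
// Add near the other #include lines at the top of layout.cpp:
#include <stdlib.h>   // declares exit()
#include <string.h>   // declares strcpy()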

setitimer and signal count on Linux. Is signal count directly proportional to run time?

There is a test program that works with setitimer on Linux (kernel 2.6; HZ=100). It sets various itimers to send a signal every 10 ms (actually the interval is set to 9 ms, but the timeslice is 10 ms). The program then runs for some fixed time (e.g. 30 sec) and counts signals.
Is it guaranteed that the signal count will be proportional to the running time? Will the count be the same in every run and with every timer type (-r -p -v)?
Note: there should be no other CPU-active processes on the system, and the question is about a fixed-HZ kernel.
#include <stdlib.h>
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <sys/time.h>
/* Use 9 ms timer */
#define usecs 9000
int events = 0;
void count(int a) {
    events++;
}

int main(int argc, char **argv)
{
    int timer,j,i,k=0;
    struct itimerval timerval = {
        .it_interval = {.tv_sec=0, .tv_usec=usecs},
        .it_value = {.tv_sec=0, .tv_usec=usecs}
    };

    if ( (argc!=2) || (argv[1][0]!='-') ) {
        printf("Usage: %s -[rpv]\n -r - ITIMER_REAL\n -p - ITIMER_PROF\n -v - ITIMER_VIRTUAL\n", argv[0]);
        exit(0);
    }

    switch(argv[1][1]) {
    case 'r':
        timer=ITIMER_REAL;
        break;
    case 'p':
        timer=ITIMER_PROF;
        break;
    case 'v':
        timer=ITIMER_VIRTUAL;
    };

    signal(SIGALRM,count);
    signal(SIGPROF,count);
    signal(SIGVTALRM,count);

    setitimer(timer, &timerval, NULL);

    /* constants should be tuned to some huge value */
    for (j=0; j<4; j++)
        for (i=0; i<2000000000; i++)
            k += k*argc + 5*k + argc*3;

    printf("%d events\n",events);
    return 0;
}
Is it guaranteed that signal count will be proportional to running time?
Yes. In general, for all three timers, the longer the code runs, the more signals are received.
Will count be the same in every run and with every timer type (-r -p -v)?
No.
When the timer is set using ITIMER_REAL, the timer decrements in real time.
When it is set using ITIMER_VIRTUAL, the timer decrements only when the process is executing in the user address space. So, it doesn't decrement when the process makes a system call or during interrupt service routines.
So we can expect that #real_signals > #virtual_signals
ITIMER_PROF timers decrement both during user space execution of the process and when the OS is executing on behalf of the process i.e. during system calls.
So #prof_signals > #virtual_signals
ITIMER_PROF does not decrement while the process is not running at all (for example, while it is blocked or another process is scheduled), whereas ITIMER_REAL does. So #real_signals > #prof_signals.
To summarise, #real_signals > #prof_signals > #virtual_signals.
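If you want to check this empirically rather than by reasoning, you can have the test program print the reference quantities each timer should track: wall-clock time for ITIMER_REAL, user CPU time for ITIMER_VIRTUAL, and user plus system CPU time for ITIMER_PROF. A sketch of such a helper (an assumed addition, not part of the original program; record wall_start with clock_gettime(CLOCK_MONOTONIC, ...) at the top of main and call this just before the final printf):
#include <stdio.h>
#include <sys/resource.h>
#include <time.h>

/* Sketch: print the reference times the three itimers should track.
 * Dividing each by the timer interval gives the signal count that
 * timer type should approach. */
void print_reference_times(struct timespec wall_start)
{
    struct timespec wall_end;
    struct rusage ru;

    clock_gettime(CLOCK_MONOTONIC, &wall_end);
    getrusage(RUSAGE_SELF, &ru);

    double wall = (wall_end.tv_sec - wall_start.tv_sec)
                + (wall_end.tv_nsec - wall_start.tv_nsec) / 1e9;
    double user = ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6;
    double sys  = ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;

    printf("wall=%.3f user=%.3f sys=%.3f\n", wall, user, sys);
}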

Getting stack traces on Unix systems, automatically

What methods are there for automatically getting a stack trace on Unix systems? I don't mean just getting a core file or attaching interactively with GDB, but having a SIGSEGV handler that dumps a backtrace to a text file.
Bonus points for the following optional features:
Extra information gathering at crash time (eg. config files).
Email a crash info bundle to the developers.
Ability to add this in a dlopened shared library
Not requiring a GUI
FYI,
the suggested solution (using backtrace_symbols in a signal handler) is dangerously broken. DO NOT USE IT -
Yes, backtrace and backtrace_symbols will produce a backtrace and translate it to symbolic names; however:
backtrace_symbols allocates memory using malloc, and you use free to free it. If you're crashing because of memory corruption, your malloc arena is very likely to be corrupt and to cause a double fault.
malloc and free protect the malloc arena with a lock internally. You might have faulted in the middle of a malloc/free with the lock taken, which will cause these functions, or anything that calls them, to deadlock.
You use puts, which uses the standard stream, which is also protected by a lock. If you faulted in the middle of a printf, you once again have a deadlock.
On 32-bit platforms (e.g. your normal PC of 2 years ago), the kernel will plant a return address to an internal glibc function instead of your faulting function on your stack, so the single most important piece of information you are interested in (which function the program faulted in) will actually be corrupted on those platforms.
So, the code in the example is the worst kind of wrong: it LOOKS like it's working, but it will really fail you in unexpected ways in production.
BTW, interested in doing it right? check this out.
Cheers,
Gilad.
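For completeness: glibc also provides backtrace_symbols_fd(), which writes the symbol strings straight to a file descriptor and does not call malloc, so it sidesteps the allocator and stdio locks (the other caveats above, such as the 32-bit return-address issue, still apply, and the first call to backtrace() may itself allocate while loading libgcc, so some people call it once at startup). A minimal sketch; the log path is just a placeholder:
#include <execinfo.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

/* Sketch: dump a raw backtrace to a log file from a signal handler,
 * avoiding malloc and stdio by using backtrace_symbols_fd. */
static void crash_handler(int sig)
{
    void *frames[64];
    int n = backtrace(frames, 64);

    int fd = open("/tmp/crash-backtrace.log",
                  O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd >= 0) {
        backtrace_symbols_fd(frames, n, fd);
        close(fd);
    }

    /* Re-raise with the default action so a core file is still produced. */
    signal(sig, SIG_DFL);
    raise(sig);
}

int main(void)
{
    signal(SIGSEGV, crash_handler);
    /* install for SIGABRT, SIGBUS, ... as desired */
    raise(SIGSEGV);
    return 0;
}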
If you are on systems with the BSD backtrace functionality available (Linux, OS X 10.5, BSD of course), you can do this programmatically in your signal handler.
For example (backtrace code derived from IBM example):
#include <execinfo.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
void sig_handler(int sig)
{
    void * array[25];
    int nSize = backtrace(array, 25);
    char ** symbols = backtrace_symbols(array, nSize);

    for (int i = 0; i < nSize; i++)
    {
        puts(symbols[i]);
    }

    free(symbols);

    signal(sig, &sig_handler);
}

void h()
{
    kill(0, SIGSEGV);
}

void g()
{
    h();
}

void f()
{
    g();
}

int main(int argc, char ** argv)
{
    signal(SIGSEGV, &sig_handler);
    f();
}
Output:
0 a.out 0x00001f2d sig_handler + 35
1 libSystem.B.dylib 0x95f8f09b _sigtramp + 43
2 ??? 0xffffffff 0x0 + 4294967295
3 a.out 0x00001fb1 h + 26
4 a.out 0x00001fbe g + 11
5 a.out 0x00001fcb f + 11
6 a.out 0x00001ff5 main + 40
7 a.out 0x00001ede start + 54
This doesn't get bonus points for the optional features (except not requiring a GUI), however, it does have the advantage of being very simple, and not requiring any additional libraries or programs.
Here is an example of how to get some more info using a demangler. As you can see, this one also logs the stack trace to a file.
#include <iostream>
#include <sstream>
#include <string>
#include <fstream>
#include <stdlib.h>   // free
#include <signal.h>   // signal
#include <execinfo.h> // backtrace, backtrace_symbols
#include <cxxabi.h>

void sig_handler(int sig)
{
    std::stringstream stream;
    void * array[25];
    int nSize = backtrace(array, 25);
    char ** symbols = backtrace_symbols(array, nSize);

    for (int i = 0; i < nSize; i++) {
        int status;
        char *realname;
        std::string current = symbols[i];
        size_t start = current.find("(");
        size_t end = current.find("+");
        realname = NULL;
        if (start != std::string::npos && end != std::string::npos) {
            std::string symbol = current.substr(start+1, end-start-1);
            realname = abi::__cxa_demangle(symbol.c_str(), 0, 0, &status);
        }
        if (realname != NULL)
            stream << realname << std::endl;
        else
            stream << symbols[i] << std::endl;
        free(realname);
    }
    free(symbols);

    std::cerr << stream.str();

    std::ofstream file("/tmp/error.log");
    if (file.is_open()) {
        if (file.good())
            file << stream.str();
        file.close();
    }

    signal(sig, &sig_handler);
}
Derek's solution is probably the best, but here's an alternative anyway:
Recent Linux kernel versions allow you to pipe core dumps to a script or program. You could write a script to catch the core dump, collect any extra information you need, and mail everything back.
This is a global setting though, so it'd apply to any crashing program on the system. It will also require root rights to set up.
It can be configured through the /proc/sys/kernel/core_pattern file. Set that to something like '|/home/myuser/bin/my-core-handler-script' (the leading pipe is what tells the kernel to treat the rest as a command).
The Ubuntu people use this feature as well.
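A handler for that pipe can be very small: the kernel runs it with the core image on stdin and expands %-specifiers such as %p (PID) and %e (executable name) into its arguments. A rough sketch (all paths and names here are only placeholders):
#include <stdio.h>
#include <stdlib.h>

/* Sketch: a minimal core-dump handler for core_pattern, registered e.g. with
 *   echo '|/usr/local/bin/core-handler %p %e' > /proc/sys/kernel/core_pattern
 * The kernel feeds the core file on stdin and passes the expanded
 * %-specifiers as argv[1], argv[2], ... */
int main(int argc, char *argv[])
{
    const char *pid = argc > 1 ? argv[1] : "unknown";
    const char *exe = argc > 2 ? argv[2] : "unknown";

    char path[256];
    snprintf(path, sizeof path, "/var/crash/core.%s.%s", exe, pid);

    FILE *out = fopen(path, "w");
    if (!out)
        return 1;

    char buf[65536];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, stdin)) > 0)
        fwrite(buf, 1, n, out);
    fclose(out);

    /* Collect extra information / send mail to the developers here. */
    return 0;
}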
