Execute a command in the shell and print out a summary (time + memory)? - linux

Sorry if the question is basic: is there an option in the shell to show how much time and how much memory (the maximum memory occupied) the execution of a command took?
Example: I want to call a binary as follows: ./binary --option1 option1 --option2 option2
After this command finishes, I would like to know how much time and memory it took.
Thanks

The time(1) command, either as a separate executable or as a shell built-in, can be used to measure the time used by a program, both in terms of wall-clock time and CPU time. The GNU time binary (usually /usr/bin/time, as opposed to the shell built-in) can also report a memory figure: its -v flag prints, among other statistics, the maximum resident set size of the command.
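If you want the same numbers programmatically, the kernel exposes them through getrusage() after you wait for a child process, which is essentially the mechanism behind GNU time's report. Below is a minimal sketch of such a wrapper; the file name measure.c and the output format are my own, not any standard tool.

#include <stdio.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
        return 1;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    pid_t pid = fork();
    if (pid == 0) {
        /* child: replace ourselves with the command to be measured */
        execvp(argv[1], &argv[1]);
        perror("execvp");
        _exit(127);
    }
    waitpid(pid, NULL, 0);           /* parent: wait for the command to finish */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    struct rusage ru;
    getrusage(RUSAGE_CHILDREN, &ru); /* resources used by waited-for children */

    double wall = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("wall: %.3f s, user: %ld.%06ld s, sys: %ld.%06ld s, max RSS: %ld kB\n",
           wall,
           (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec,
           (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec,
           ru.ru_maxrss);            /* ru_maxrss is in kilobytes on Linux */
    return 0;
}

Compile with gcc -o measure measure.c and run it as ./measure ./binary --option1 option1 --option2 option2.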
But measuring the memory usage of a program, or even agreeing on how to define it, is a bit different. Do you want the sum of its allocations? The maximum amount of memory allocated at a single moment? Are you interested in what the code does, or in the program's behavior as a whole, where the memory allocator makes a difference? And how do you count the memory used by shared objects? Or memory-mapped areas?
Valgrind may help with some memory-related questions, but it is more of a development tool than a day-to-day system administration tool. More specifically, the Massif heap profiler can be used to profile the memory usage of an application (typically valgrind --tool=massif ./binary ..., then ms_print on the generated massif.out.<pid> file), but it does have a measurable performance impact, especially with stack profiling enabled.

There are several files in /proc that might be simpler than using a profiler, assuming you know the PID of the process you're interested in.
Of primary interest is /proc/$PID/status, which lists (among other things) the peak and current virtual memory sizes (VmPeak and VmSize, respectively) and the resident set "high water mark" and current resident set size (VmHWM and VmRSS, respectively).
I set up a simple C program that grabs memory and then frees it, watched the corresponding file in /proc while it ran, and it seemed to confirm the man page.
See man proc for a complete list of files that may interest you.
Here's the command line and program I used for the test:
Monitored with:
PID="<program's pid>"
watch "cat /proc/$PID/status | grep ^Vm"
(compile with gcc -o grabmem grabmem.c -std=c99)
#include <sys/types.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>

#define INTNUM ( 1024 * 1024 )
#define PASSES 16

int main(){
    int* mems[PASSES];   /* one slot per pass (was a hard-coded 16) */
    int cur = 0;

    while( ( mems[cur] = (int*)calloc( sizeof( int ), INTNUM ) ) ){
        /* touch every page so the memory actually counts against the RSS */
        for( int i = 0; i < INTNUM; i++ )
            mems[cur][i] = rand();

        printf("PID %i Pass %i: Grabbed another %zu of mem.\n",
            (int)getpid(),
            cur,
            sizeof(int) * INTNUM
        );
        sleep( 1 );

        cur++;
        if( cur >= PASSES ){
            printf("Freeing memory...");
            for( cur = (PASSES - 1); cur >= 0; cur-- ){
                free( mems[cur] );
                mems[cur] = NULL;
            }
            cur = 0;
            printf("OK\n");
        }
    }
    fprintf(stderr, "Couldn't calloc() memory.\n");
    exit( -1 );
}

Related


Linux clock_gettime() elapse spikes?

I'm trying to get high-resolution timestamps on Linux. Using clock_gettime() as below, I see "spikes" in the elapsed times that look pretty horrible, at almost 26 microseconds. Most of the "dt"s are around 30 ns. I was on Linux 2.6.32, Red Hat 4.4.6. lscpu shows CPU MHz=2666.121, which I take to mean that each clock tick needs about 2 ns, so asking for ns resolution didn't seem too unreasonable here.
Output of the program:
1397534268,40823395 1397534268,40827950,dt=4555
1397534268,41233555 1397534268,41236716,dt=3161
1397534268,41389902 1397534268,41392922,dt=3020
1397534268,46488430 1397534268,46491674,dt=3244
1397534268,46531297 1397534268,46534279,dt=2982
1397534268,46823368 1397534268,46849336,dt=25968
1397534268,46915657 1397534268,46918663,dt=3006
1397534268,51488643 1397534268,51491791,dt=3148
1397534268,51530490 1397534268,51533496,dt=3006
1397534268,51823307 1397534268,51826904,dt=3597
1397534268,55823359 1397534268,55827826,dt=4467
1397534268,60531184 1397534268,60534183,dt=2999
1397534268,60823381 1397534268,60844866,dt=21485
1397534268,60913003 1397534268,60915998,dt=2995
1397534268,65823269 1397534268,65827742,dt=4473
1397534268,70823376 1397534268,70835280,dt=11904
1397534268,75823489 1397534268,75828872,dt=5383
1397534268,80823503 1397534268,80859500,dt=35997
1397534268,86823381 1397534268,86831907,dt=8526
Any ideas? thanks
#include <vector>
#include <iostream>
#include <time.h>

long long elapse( const timespec& t1, const timespec& t2 )
{
    return ( (long long)t2.tv_sec * 1000000000LL + t2.tv_nsec ) -
           ( (long long)t1.tv_sec * 1000000000LL + t1.tv_nsec );
}

int main()
{
    const unsigned n=30000;
    timespec ts;
    std::vector<timespec> t( n );
    for( unsigned i=0; i < n; ++i )
    {
        clock_gettime( CLOCK_REALTIME, &ts );
        t[i] = ts;
    }
    std::vector<long long> dt( n );
    for( unsigned i=1; i < n; ++i )
    {
        dt[i] = elapse( t[i-1], t[i] );
        if( dt[i] > 1000 )
        {
            std::cerr <<
                t[i-1].tv_sec << ","
                << t[i-1].tv_nsec << " "
                << t[i].tv_sec << ","
                << t[i].tv_nsec
                << ",dt=" << dt[i] << std::endl;
        }
        else
        {
            //normally I get dt[i] = approx 30-35 nano secs
        }
    }
    return 0;
}
The numbers you quoted are in the 3 to 30 microsecond range (3,000 to 30,000 nanoseconds). That is too short a time to be a context switch to another thread/process, let the other thread run, and context switch back to your thread. Most likely the core where your process was running was used by the kernel to service an external interrupt (e.g. network card, disk, timer), then returned to running your process.
You can watch the Linux interrupt counters (per CPU core and per source) with this command:
watch -d -n 0.2 cat /proc/interrupts
The -n 0.2 causes the command to be reissued at 5 Hz, and the -d flag highlights what has changed.
The source of the interrupt could also be a TLB shootdown, which results in an IPI (Inter-Processor Interrupt). You can read more about TLB shootdowns here.
If you want to reduce the number of interrupts serviced by the core running your thread/process, you need to set the interrupt affinity (done by writing a hexadecimal CPU mask to /proc/irq/<N>/smp_affinity). You can learn more about Red Hat interrupt and IRQ (interrupt request) tuning here, and here.
Worth noting is that you are using CLOCK_REALTIME, which isn't guaranteed to be "smooth": it can jump around as the system clock is "disciplined" to keep accurate time by a service like NTP (Network Time Protocol) or PTP (Precision Time Protocol). For your purposes it is better to use CLOCK_MONOTONIC; you can read more about the difference here. When a clock is "disciplined" it can jump by a "step", but this is unusual and certainly not the cause of the many spikes you see.
Could you check the resolution with clock_getres()?
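For what it's worth, querying the advertised resolution, and taking a timestamp from the monotonic clock instead, is only a few lines. A minimal sketch (on older glibc you may need to link with -lrt):

#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec res, now;

    /* the granularity the kernel advertises for this clock */
    clock_getres(CLOCK_MONOTONIC, &res);
    printf("CLOCK_MONOTONIC resolution: %ld ns\n", res.tv_nsec);

    /* a monotonic timestamp, unaffected by NTP/PTP stepping the system clock */
    clock_gettime(CLOCK_MONOTONIC, &now);
    printf("now: %ld.%09ld\n", (long)now.tv_sec, now.tv_nsec);
    return 0;
}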
I suspect what you are measuring here is called "OS Noise". This is often caused by your program getting pre-empted by the operating system. The operating system then performs other work. There are numerous causes, but commonly it is: other runnable tasks, hardware interrupts, or timer events.
The FTQ/FWQ benchmarks were designed to measure this characteristic and the summary contains some further information:
https://asc.llnl.gov/sequoia/benchmarks/FTQ_summary_v1.1.pdf

malloc large memory never returns NULL

When I run this, it seems to have no problem allocating memory over and over, with cnt going into the thousands. I don't understand why: aren't I supposed to get NULL at some point? Thanks!
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>

int main(void)
{
    long C = pow(10, 9);
    int cnt = 0;
    int conversion = 8 * 1024 * 1024;
    int *p;
    while (1)
    {
        p = (int *)malloc(C * sizeof(int));
        if (p != NULL)
            cnt++;
        else
            break;
        if (cnt % 10 == 0)
            printf("number of successful malloc is %d with %ld Mb\n",
                   cnt, cnt * C / conversion);
    }
    return 0;
}
Are you running this on Linux? Linux has a highly surprising feature known as overcommit. It doesn't actually allocate memory when you call malloc(), but rather when you actually use that memory. malloc() will happily let you allocate as much memory as your heart desires, never returning a NULL pointer.
It's only when you actually access the memory that Linux takes you seriously and goes looking for free memory to give you. Of course there may not actually be enough memory to meet the promise it gave your program. You say, "Give me 8 GB," and malloc() says, "Sure." Then you try to write through your pointer and Linux says, "Oops! I lied. How about I just kill off processes (probably yours) until I free up enough memory?"
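To see the difference in practice, here is a minimal sketch (my own variation, not the original code) that touches the pages after each malloc(). Depending on the overcommit settings, malloc() either starts returning NULL much earlier, or the process is eventually killed by the OOM killer; be prepared for the machine to slow down while it runs.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    size_t chunk = (size_t)1 << 30;  /* 1 GiB per allocation */
    int cnt = 0;
    while (1)
    {
        void *p = malloc(chunk);
        if (p == NULL)
        {
            printf("malloc returned NULL after %d GiB\n", cnt);
            break;
        }
        memset(p, 1, chunk);         /* touching the pages forces the kernel
                                        to actually commit the memory */
        cnt++;
        printf("touched %d GiB\n", cnt);
    }
    return 0;
}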
You're allocating virtual memory. On a 64-bit OS, virtual memory is available in almost unlimited supply.

How to calculate CPU utilization of a process & all its child processes in Linux?

I want to know the CPU utilization of a process and all the child processes, for a fixed period of time, in Linux.
To be more specific, here is my use-case:
There is a process which waits for requests from users to execute programs. To execute the programs, this process invokes child processes (with a maximum limit of 5 at a time), and each of these child processes executes one of the submitted programs (say the user submitted 15 programs at once). So, if the user submits 15 programs, then 3 batches of 5 child processes each will run. Child processes are killed as soon as they finish executing their program.
I want to know the % CPU utilization for the parent process and all its child processes during the execution of those 15 programs.
Is there any simple way to do this using top or another command? (Or any tool I should attach to the parent process?)
You can find this information in /proc/PID/stat, where PID is your parent process's process ID. Assuming that the parent process waits for its children, the total CPU usage can be calculated from utime, stime, cutime and cstime:
utime %lu
    Amount of time that this process has been scheduled in user mode,
    measured in clock ticks (divide by sysconf(_SC_CLK_TCK)). This includes
    guest time, guest_time (time spent running a virtual CPU, see below),
    so that applications that are not aware of the guest time field do not
    lose that time from their calculations.
stime %lu
    Amount of time that this process has been scheduled in kernel mode,
    measured in clock ticks (divide by sysconf(_SC_CLK_TCK)).
cutime %ld
    Amount of time that this process's waited-for children have been
    scheduled in user mode, measured in clock ticks (divide by
    sysconf(_SC_CLK_TCK)). (See also times(2).) This includes guest time,
    cguest_time (time spent running a virtual CPU, see below).
cstime %ld
    Amount of time that this process's waited-for children have been
    scheduled in kernel mode, measured in clock ticks (divide by
    sysconf(_SC_CLK_TCK)).
See the proc(5) manpage for details.
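For example, here is a minimal sketch in C that reads those four fields (numbers 14 through 17 in /proc/PID/stat) and converts them to seconds. The PID is hard-coded for illustration, error handling is omitted, and note that this naive parse breaks if the comm field (the process name in parentheses) happens to contain spaces.

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long pid = 1;                    /* PID of the parent process to inspect */
    char path[64];
    snprintf(path, sizeof path, "/proc/%ld/stat", pid);

    FILE *f = fopen(path, "r");
    unsigned long utime, stime;
    long cutime, cstime;
    /* skip fields 1-13 (pid, comm, state, ppid, ..., cmajflt),
       then read utime, stime, cutime, cstime (fields 14-17) */
    fscanf(f, "%*d %*s %*c %*d %*d %*d %*d %*d %*u %*u %*u %*u %*u"
              " %lu %lu %ld %ld", &utime, &stime, &cutime, &cstime);
    fclose(f);

    long hz = sysconf(_SC_CLK_TCK);  /* clock ticks per second */
    printf("total CPU: %.2f s\n",
           (double)(utime + stime + cutime + cstime) / hz);
    return 0;
}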
And of course you can do it the hardcore way, using good old C:
find_cpu.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MAX_CHILDREN 100

/**
 * Capture the first line of output from a system command.
 * @param command - system command to execute
 * @return execution output
 */
char *system_output (const char *command)
{
    FILE *pipe;
    static char out[1000];
    pipe = popen (command, "r");
    fgets (out, sizeof(out), pipe);
    pclose (pipe);
    return out;
}

/**
 * Find all of a process's children.
 * @param pid - process ID
 * @param children - array to fill with child PIDs
 */
void find_children (int pid, int children[])
{
    char empty_command[] = "/bin/ps h -o pid --ppid ";
    char pid_string[16];                    /* large enough for any PID */
    snprintf(pid_string, sizeof(pid_string), "%d", pid);
    char *command = (char*) malloc(strlen(empty_command) + strlen(pid_string) + 1);
    sprintf(command, "%s%s", empty_command, pid_string);

    FILE *fp = popen(command, "r");
    int child_pid, i = 1;                   /* slot 0 holds the parent itself */
    while (fscanf(fp, "%i", &child_pid) != EOF && i < MAX_CHILDREN)
    {
        children[i] = child_pid;
        i++;
    }
    pclose(fp);
    free(command);
}

/**
 * Parse `ps` command output.
 * @param out - ps command output
 * @return cpu utilization
 */
float parse_cpu_utilization (const char *out)
{
    float cpu;
    sscanf (out, "%f", &cpu);
    return cpu;
}

int main(void)
{
    unsigned pid = 1;                       /* parent PID to inspect */
    // getting array with process children
    int process_children[MAX_CHILDREN] = { 0 };
    process_children[0] = pid;              // parent PID as first element
    find_children(pid, process_children);

    // summing processor utilization over parent and children
    unsigned i;
    float common_cpu_usage = 0.0;
    for (i = 0; i < sizeof(process_children)/sizeof(int); ++i)
    {
        if (process_children[i] > 0)
        {
            char command[100];
            snprintf (command, sizeof(command),
                      "/bin/ps -p %i -o 'pcpu' --no-headers", process_children[i]);
            common_cpu_usage += parse_cpu_utilization(system_output(command));
        }
    }
    printf("%f\n", common_cpu_usage);
    return 0;
}
Compile:
gcc -Wall -pedantic --std=gnu99 find_cpu.c
Enjoy!
It might not be the exact command, but you can do something like the following to get the CPU usage of several processes and add it up:
ps -C sendmail,firefox -o pcpu= | awk '{s+=$1} END {print s}'
/proc/[pid]/stat holds status information about the process; this is what ps reads and turns into human-readable form.
Another way is to use cgroups and the cpuacct controller (each group's cpuacct.usage file reports its cumulative CPU time in nanoseconds).
http://www.kernel.org/doc/Documentation/cgroups/cpuacct.txt
https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpuacct.html
Here's a one-liner to compute the total CPU for all processes. You can adjust it by filtering on columns of the top output:
top -b -d 5 -n 2 | awk '$1 == "PID" {block_num++; next} block_num == 2 {sum += $9;} END {print sum}'

max threads per process in linux

I wrote a simple program to determine the maximum number of threads that a process can have in Linux (CentOS 5). Here is the code:
#include <iostream>
#include <pthread.h>
#include <unistd.h>
using namespace std;

void * thread(void* i)
{
    sleep(100); // keep the thread alive
    return 0;
}

int main()
{
    pthread_t thrd[400];
    for(int i=0;i<400;i++)
    {
        int err=pthread_create(&thrd[i],NULL,thread,(void*)(long)i);
        if(err!=0)
            cout << "thread creation failed: " << i <<" error code: " << err << endl;
    }
    return 0;
}
I found that the max number of threads is only 300!? What if I need more than that?
I should mention that pthread_create returns 12 (ENOMEM) as the error code.
Thanks in advance
There is a thread limit in Linux, and it can be modified at runtime by writing the desired limit to /proc/sys/kernel/threads-max. The default value is computed from the available system memory. In addition to that limit, there's another: /proc/sys/vm/max_map_count, which limits the maximum number of mmapped segments, and at least recent kernels will mmap memory per thread. It should be safe to increase that limit a lot if you hit it.
However, the limit you're hitting is lack of virtual memory in a 32-bit operating system. Install a 64-bit Linux if your hardware supports it and you'll be fine. I can easily start 30000 threads with a stack size of 8 MB. The system has a single Core 2 Duo + 8 GB of system memory (I'm using 5 GB for other stuff at the same time) and it's running 64-bit Ubuntu with kernel 2.6.32. Note that memory overcommit (/proc/sys/vm/overcommit_memory) must be allowed, because otherwise the system would need at least 240 GB of committable memory (the sum of real memory and swap space).
If you need lots of threads and cannot use a 64-bit system, your only choice is to minimize the memory usage per thread to conserve virtual memory. Start by requesting as little stack as you can live with, as in the sketch below.
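For instance, here is a minimal sketch of asking for a small per-thread stack; the 64 KiB figure is an arbitrary example, and PTHREAD_STACK_MIN is the portable floor (compile with gcc -pthread):

#include <limits.h>
#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg)
{
    (void)arg;  /* a real worker must take care not to overflow its small stack */
    return NULL;
}

int main(void)
{
    pthread_attr_t attr;
    pthread_attr_init(&attr);

    /* request a 64 KiB stack instead of the 8-10 MB default */
    size_t stack_size = 64 * 1024;
    if (stack_size < PTHREAD_STACK_MIN)
        stack_size = PTHREAD_STACK_MIN;
    pthread_attr_setstacksize(&attr, stack_size);

    pthread_t t;
    int err = pthread_create(&t, &attr, worker, NULL);
    if (err != 0)
        fprintf(stderr, "pthread_create failed: %d\n", err);
    else
        pthread_join(t, NULL);

    pthread_attr_destroy(&attr);
    return 0;
}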
Your system limits may not be allowing you to map the stacks of all the threads you require. Look at /proc/sys/vm/max_map_count, and see this answer. I'm not 100% sure this is your problem, because most people run into problems at much larger thread counts.
I also encountered the same problem when my number of threads crossed some threshold.
It was because of the user-level limit (the number of processes a user can run at a time) being set to 1024 in /etc/security/limits.conf.
So check your /etc/security/limits.conf and look for an entry like:
username  <soft|hard>  nproc  1024
Change it to some larger value, e.g. 100k (requires sudo/root privileges), and it should work for you.
To learn more about the security policy, see http://linux.die.net/man/5/limits.conf.
Check the stack size per thread with ulimit; in my case, Red Hat Linux 2.6:
ulimit -a
...
stack size (kbytes, -s) 10240
Each of your threads will get this amount of memory (10 MB) assigned for its stack. With a 32-bit program and a maximum address space of 4 GB, that is a maximum of only 4096 MB / 10 MB = 409 threads!!! Minus program code, minus heap space, that probably leads to your observed max of 300 threads.
You should be able to raise this by compiling a 64-bit application or by setting ulimit -s 8192 or even ulimit -s 4096. But whether this is advisable is another discussion...
You will run out of memory too unless you shrink the default thread stack size. It's 10 MB on our version of Linux.
EDIT:
Error code 12 = out of memory, so I think the 1 MB stack is still too big for you. Compiled for 32-bit, I can get a 100k stack to give me 30k threads. Beyond 30k threads I get error code 11, which means no more threads allowed. A 1 MB stack gives me about 4k threads before error code 12. 10 MB gives me 427 threads. 100 MB gives me 42 threads. 1 GB gives me 4... We have a 64-bit OS with 64 GB of RAM. Is your OS 32-bit? When I compile for 64-bit, I can use any stack size I want and get the limit of threads.
I also noticed that if I turn the profiling stuff (Tools|Profiling) on for NetBeans and run from the IDE... I can only get 400 threads. Weird. NetBeans also dies if you use up all the threads.
Here is a test app you can run:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>
#include <signal.h>
#include <sched.h>

// this prevents the compiler from reordering code over this COMPILER_BARRIER;
// the empty asm emits no instructions, it only constrains the optimizer
#define COMPILER_BARRIER() __asm__ __volatile__ ("" ::: "memory")

sigset_t _fSigSet;
volatile int _cActive = 0;
pthread_t thrd[1000000];

void * thread(void *i)
{
    int nSig, cActive;
    cActive = __sync_fetch_and_add(&_cActive, 1);
    COMPILER_BARRIER(); // make sure the active count is incremented before sigwait
    // sigwait is a handy way to sleep a thread and wake it on command
    sigwait(&_fSigSet, &nSig); //make the thread still alive
    COMPILER_BARRIER(); // make sure the active count is decremented after sigwait
    cActive = __sync_fetch_and_add(&_cActive, -1);
    //printf("%d(%d) ", (int)(long)i, cActive);
    return 0;
}

int main(int argc, char** argv)
{
    pthread_attr_t attr;
    int cThreadRequest, cThreads, i, err, cActive, cbStack;

    cbStack = (argc > 1) ? atoi(argv[1]) : 0x100000;
    cThreadRequest = (argc > 2) ? atoi(argv[2]) : 30000;

    sigemptyset(&_fSigSet);
    sigaddset(&_fSigSet, SIGUSR1);
    sigaddset(&_fSigSet, SIGSEGV);

    printf("Start\n");

    pthread_attr_init(&attr);
    if ((err = pthread_attr_setstacksize(&attr, cbStack)) != 0)
        printf("pthread_attr_setstacksize failed: err: %d %s\n", err, strerror(err));

    for (i = 0; i < cThreadRequest; i++)
    {
        if ((err = pthread_create(&thrd[i], &attr, thread, (void*)(long)i)) != 0)
        {
            printf("pthread_create failed on thread %d, error code: %d %s\n",
                i, err, strerror(err));
            break;
        }
    }
    cThreads = i;
    printf("\n");

    // wait for threads to all be created, although we might not wait for
    // all threads to make it through sigwait
    while (1)
    {
        cActive = _cActive;
        if (cActive == cThreads)
            break;
        printf("Waiting A %d/%d,", cActive, cThreads);
        sched_yield();
    }

    // wake em all up so they exit
    for (i = 0; i < cThreads; i++)
        pthread_kill(thrd[i], SIGUSR1);

    // wait for them all to exit, although we might be able to exit before
    // the last thread returns
    while (1)
    {
        cActive = _cActive;
        if (!cActive)
            break;
        printf("Waiting B %d/%d,", cActive, cThreads);
        sched_yield();
    }

    printf("\nDone. Threads requested: %d. Threads created: %d. StackSize=%lfmb\n",
        cThreadRequest, cThreads, (double)cbStack/0x100000);
    return 0;
}
