/proc/[pid]/stat file unrecognizable output - Linux

When I read the stat file, I get the following output:
15465 (out1) S 15290 15465 15290 34817 15465 4202496 185 0 0 0 0 0 0 0 20 0 1 0 1505506 4263936 89 18446744073709551615 4194304 4196524 140733951429456 140733951428040 139957189597360 0 0 0 0 18446744071582981369 0 0 17 1 0 0 0 0 0 6295080 6295608 23592960 140733951431498 140733951431506 140733951431506 140733951434736 0
That is, 52 fields are present, whereas man proc describes only around 44.
Why is this extra information appearing?
Can anyone please elaborate? I am working on Ubuntu 12.04, kernel 3.5.0-40-generic.

Very good documentation of the stat file's contents is available in the Linux /proc filesystem manual:
Table 1-4: Contents of the stat files (as of 2.6.30-rc7)
..............................................................................
Field Content
pid process id
tcomm filename of the executable
state state (R is running, S is sleeping, D is sleeping in an
uninterruptible wait, Z is zombie, T is traced or stopped)
ppid process id of the parent process
pgrp pgrp of the process
sid session id
tty_nr tty the process uses
tty_pgrp pgrp of the tty
flags task flags
min_flt number of minor faults
cmin_flt number of minor faults with child's
maj_flt number of major faults
cmaj_flt number of major faults with child's
utime user mode jiffies
stime kernel mode jiffies
cutime user mode jiffies with child's
cstime kernel mode jiffies with child's
priority priority level
nice nice level
num_threads number of threads
it_real_value (obsolete, always 0)
start_time time the process started after system boot
vsize virtual memory size
rss resident set memory size
rsslim current limit in bytes on the rss
start_code address above which program text can run
end_code address below which program text can run
start_stack address of the start of the main process stack
esp current value of ESP
eip current value of EIP
pending bitmap of pending signals
blocked bitmap of blocked signals
sigign bitmap of ignored signals
sigcatch bitmap of caught signals
0 (place holder, used to be the wchan address, use /proc/PID/wchan instead)
0 (place holder)
0 (place holder)
exit_signal signal to send to parent thread on exit
task_cpu which CPU the task is scheduled on
rt_priority realtime priority
policy scheduling policy (man sched_setscheduler)
blkio_ticks time spent waiting for block IO
gtime guest time of the task in jiffies
cgtime guest time of the task children in jiffies
start_data address above which program data+bss is placed
end_data address below which program data+bss is placed
start_brk address above which program heap can be expanded with brk()
arg_start address above which program command line is placed
arg_end address below which program command line is placed
env_start address above which program environment is placed
env_end address below which program environment is placed
exit_code the thread's exit_code in the form reported by the waitpid system call
..............................................................................
source

Here is the code that forms the /proc/[pid]/stat file contents
http://lxr.free-electrons.com/source/fs/proc/stat.c?v=3.5#L74

Related

How to illustrate the effect of a lower nice value on a process in Linux?

I want to see the real change in process launch and execution after changing the nice value.
When I assign a lower nice value to a process, what changes should I see?
$ ps -l | head -2
UID PID PPID F CPU PRI NI SZ RSS WCHAN S
501 25164 25144 4006 0 31 10 4280144 1584 - SN+
I executed
$ renice -6 25164
and got a new nice value of -6 (it was 10 before):
$ ps -l | head -2
UID PID PPID F CPU PRI NI SZ RSS WCHAN S
501 25164 25144 4006 0 31 -6 4280144 1584 - S<+
So, what changes should I see now? Should processing speed increase, or should launch time decrease?
I want to see the change in process execution time, since the process now has higher priority. What benefit does the user get?
You will only see a difference in execution time if the CPU is fully utilized, since niceness affects the priority of a process. To benchmark a difference, run some other program that fully utilizes the CPU, and then run the program you are benchmarking. Then change the niceness so that its priority is above or below the other program's, and you will see a difference in execution time.

Cannot allocate exclusively a CPU for my process

I have a simple single-threaded application that does almost pure processing:
It uses two int buffers of the same size.
It reads, one by one, all the values of the first buffer;
each value is a random index into the second buffer.
It reads the value at that index in the second buffer.
It sums all the values taken from the second buffer.
It repeats all the previous steps for bigger and bigger buffers.
At the end, I print the number of voluntary and involuntary CPU context switches.
If the buffers become quite big, my PC starts to slow down: why? I have 4 cores with hyper-threading, so 3 cores are remaining, and only one is 100% busy. Is it because my process uses almost 100% of the "RAM bus"?
Then, I created a CPU-set that I want to dedicate to my process (my CPU-set contains both CPU-threads of the same core)
$ cat /sys/devices/system/cpu/cpu3/topology/core_id
3
$ cat /sys/devices/system/cpu/cpu7/topology/core_id
3
$ cset set -c 3,7 -s my_cpuset
$ cset set -l
cset:
Name CPUs-X MEMs-X Tasks Subs Path
------------ ---------- - ------- - ----- ---- ----------
root 0-7 y 0 y 934 1 /
my_cpuset 3,7 n 0 n 0 0 /my_cpuset
It seems that absolutely no task at all is running on my CPU-set. I can relaunch my process and while it is running, I launch:
$ taskset -c 7 ./TestCpuset # Here, I launch my process
...
$ ps -mo pid,tid,fname,user,psr -p 25244 # 25244 being the PID of my process
PID TID COMMAND USER PSR
25244 - TestCpus phil -
- 25244 - phil 7
PSR = 7: my process is indeed running on the expected CPU thread. I hope it is the only one running on it, but at the end my process displays:
Number of voluntary context switch: 2
Number of involuntary context switch: 1231
Since I had involuntary context switches, other processes must be running on my core: how is that possible? What must I do in order to get the number of involuntary context switches down to 0?
Last question: When my process is running, if I launch
$ cset set -l
cset:
Name CPUs-X MEMs-X Tasks Subs Path
------------ ---------- - ------- - ----- ---- ----------
root 0-7 y 0 y 1031 1 /
my_cpuset 3,7 n 0 n 0 0 /my_cpuset
Once again I get 0 tasks on my CPU-set. But I know that there is a process running on it: is a task not the same thing as a process?
If the buffers become quite big, my PC starts to slow down: why? I have 4 cores with hyper-threading, so 3 cores are remaining, and only one is 100% busy. Is it because my process uses almost 100% of the "RAM bus"?
You have reached the hardware performance limit of a single-threaded application: 100% CPU time on the single CPU your program is allocated to. Your application thread will not run on more than one CPU at a time (reference).
What must I do in order to get Number of involuntary context switch = 0?
Aren't you missing the --cpu_exclusive option in your cset set command?
By the way, if you want to achieve a lower execution time, I suggest making the application multithreaded and letting the operating system and the hardware beneath it parallelize execution instead. Locking a process to a CPU set and preventing it from context-switching might degrade overall operating system performance, and it is not a portable solution.

Why is pidstat not reflecting the change I made in CPU affinity using taskset?

I wanted to change the CPU affinity of a process with PID 1132, so I used the following command and successfully changed its CPU affinity:
abc#abc:~$ taskset -pc 1132
pid 1132's current affinity list: 0
But when I check which CPU PID 1132 is using, I see the same old CPU, i.e. CPU 3:
abc#abc:~$ pidstat |grep '1132'
10:01:37 AM 1132 0.00 0.00 0.00 0.00 3 runsv
Why is it so?
The affinity list is wrong. You should assign process 1132 to one or more CPUs; currently the affinity list is 0, which is wrong. If you want to assign it to CPU 0, you should use the mask 0x00000001, as:
0x00000001 is processor #0
0x00000003 is processors #0 and #1
0xFFFFFFFF is all processors (#0 through #31).
The CPU affinity is represented as a bitmask, with the lowest-order bit corresponding to the first logical CPU and the highest-order bit corresponding to the last logical CPU. Not all CPUs may exist on a given system, and a mask may specify more CPUs than are present.
A retrieved mask will reflect only the bits that correspond to CPUs physically present on the system. If an invalid mask is given (i.e., one that corresponds to no valid CPUs on the current system), an error is returned.
On my Linux (kernel 4.0), I cannot assign the empty CPU mask 0x0 to a process:
# taskset -p 0x0 1089
pid 1089's current affinity mask: ff000000ff0000ff00ff
taskset: failed to set pid 1089's affinity: Invalid argument

How is each process pinned to a specific core by scheduler (Linux)

I am studying the Linux scheduler. Regarding CPU core affinity, I would like to know the following:
1) How is each process (thread) pinned to a core?
There is a system call, sched_setaffinity, to change the core affinity on which a process is executed. But internally, when a process (or a thread) is created, how does the default Linux scheduler assign it to a specific core? I modified the sched_setaffinity system call to dump information about the task being moved from one core to another:
printk(KERN_INFO "%d %d %ld %lu %s\n", current->pid, current->tgid,
       current->state, current->cpus_allowed,
       current->comm);
It seems that there is no dump of the above information in /var/log/messages. So the default scheduler pins each process in a different way, but I cannot figure out how.
2) Is it possible to get core ID by PID or other information?
This is what I want to implement inside the Linux kernel. In task_struct, there is a member called cpus_allowed, but that is a mask for setting affinity, not a core ID. I want to retrieve data identifying the core on which a specified process is running.
Thanks,
Each CPU has its own runqueue; AFAIK we can find the current CPU of a process by looking at which runqueue it belongs to. Given a task_struct *p, we can get its runqueue with struct rq *rq = task_rq(p), and struct rq has a field named cpu; I guess this should be the answer.
I have not tried this in practice, just read some code online, so I am not quite sure whether it will work. I hope it helps.
You can determine the CPU ID on which a thread is running by using its task_struct:
#include <linux/sched.h>

struct task_struct *p;
int cpu_id = task_cpu(p);
Field 39 in /proc/pid/stat tells the current core/cpu of the process.
e.g.:
#cat /proc/6128/stat
6128 (perl) S 3390 6128 3390 34821 6128 4202496 2317 268 0 0 1621 59 0 0 16 0 1 0 6860821 10387456 1946 18446744073709551615 1 1 0 0 0 0 0 128 0 18446744073709551615 0 0 17 8 0 0 0
Process 6128 is running on core 8.
CPU core affinity is OS specific. The OS knows how to do this, you do not have to. You could run into all sorts of issues if you specified which core to run on, some of which could actually slow the process down.
In the Linux kernel, the data structure associated with processes, task_struct, contains a cpus_allowed bitmask field. This contains n bits, one for each of the n processors in the machine. A machine with four physical cores would have four bits; if those cores were hyperthread-enabled, it would have an eight-bit bitmask. If a given bit is set for a given process, that process may run on the associated core. Therefore, if a process is allowed to run on any core and may migrate across processors as needed, the bitmask is entirely 1s. This is, in fact, the default state for processes under Linux. For example:
PID 2441: PRIO 0, POLICY N: SCHED_NORMAL, NICE 0, AFFINITY 0x3
Process 2441 has a CPU affinity of 0x3, which means it can run on Core 0 and Core 1.
Applications can also set the affinity by altering this bitmask through the sched_setaffinity() API.

profile program's speed on Linux

I have a couple variants of a program that I want to compare on performance. Both perform essentially the same task.
One does it all in C and memory. The other calls an external utility and does file IO.
How do I reliably compare them?
1) Getting "time on CPU" using "time" favors the second variant, which calls system() and does IO. Even if I add "system" time to "user" time, it still will not account for time spent blocked in wait().
2) I can't just clock them, since they run on a server and can be pushed off the CPU at any time. Averaging across thousands of experiments is a soft option, since I have no idea how my server is utilized; it's a VM on a cluster, so it's kind of complicated.
3) profilers do not help since they'll give me time spent in the code, which again favors the version that does system()
I need to add up all CPU time that these programs consume, including user, kernel, IO, and children's recursively.
I expected this to be a common problem, but still don't seem to find a solution.
(Solved with times() - see below. Thanks everybody)
If I've understood correctly, typing "time myapplication" on a bash command line is not what you are looking for.
If you want accuracy, you must use a profiler... You have the source, yes?
Try something like Oprofile or Valgrind, or take a look at this for a more extended list.
If you don't have the source, honestly I don't know...
/usr/bin/time (not built-in "time" in bash) can give some interesting stats.
$ /usr/bin/time -v xeyes
Command being timed: "xeyes"
User time (seconds): 0.00
System time (seconds): 0.01
Percent of CPU this job got: 0%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:04.57
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 0
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 9
Minor (reclaiming a frame) page faults: 517
Voluntary context switches: 243
Involuntary context switches: 0
Swaps: 0
File system inputs: 1072
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
Run them a thousand times, measure actual time taken, then average the results. That should smooth out any variances due to other applications running on your server.
I seem to have found it at last.
NAME
times - get process times
SYNOPSIS
#include <sys/times.h>
clock_t times(struct tms *buf);
DESCRIPTION
times() stores the current process times in the struct tms that buf
points to. The struct tms is as defined in <sys/times.h>:
struct tms {
    clock_t tms_utime;  /* user time */
    clock_t tms_stime;  /* system time */
    clock_t tms_cutime; /* user time of children */
    clock_t tms_cstime; /* system time of children */
};
The children's times are a recursive sum of all waited-for children.
I wonder why it hasn't been made a standard CLI utility yet. Or may be I'm just ignorant.
I'd probably lean toward adding "time -o somefile" to the front of the system() command, and then adding that to the time reported for your main program to get a total. Unless I had to do this lots of times, in which case I'd find a way to add the two time outputs together (using awk, shell, perl, or something).
