Is the Linux scheduler aware of hardware interrupts (scheduler jitter)?

If a process is interrupted by a hardware interrupt (First-Level Interrupt Handler), does the CPU scheduler become aware of that (e.g. does the scheduler count execution time for hardware interrupts separately from the interrupted process)?
More details:
I am trying to troubleshoot an issue where CPU utilization in htop is far too low for the packet encryption task at hand (CPU is at <10% while encrypting packets at 400 Mbps; raw encryption speed is only 1.6 Gbps, so packet encryption cannot go any faster than raw encryption, and I would therefore expect at least ~25% CPU).
Explanation:
My hypothesis is that packet encapsulation happens in hardware interrupt context, hence giving me the illusion of low CPU usage in htop. Usually FLIHs are implemented so that they finish their task as quickly as possible and defer the rest of their work to SLIHs (Second-Level Interrupt Handlers, which I guess are executed on behalf of ksoftirqd/X). But what happens if an FLIH interrupts a process for a very long time? Does that introduce some kind of OS jitter?
I am using Ubuntu 10.04.1 on x86-64 platform.
Additional debugging info:
while true; do grep "cpu " /proc/stat; sleep 1; done
cpu 288 1 1677 356408 1145 0 20863 0 0
cpu 288 1 1677 356772 1145 0 20899 0 0
cpu 288 1 1677 357108 1145 0 20968 0 0
cpu 288 1 1677 357392 1145 0 21083 0 0
cpu 288 1 1677 357620 1145 0 21259 0 0
cpu 288 1 1677 357972 1145 0 21310 0 0
cpu 288 1 1677 358289 1145 0 21398 0 0
cpu 288 1 1677 358517 1145 0 21525 0 0
cpu 288 1 1678 358838 1145 0 21652 0 0
cpu 289 1 1678 359141 1145 0 21704 0 0
cpu 289 1 1678 359563 1145 0 21729 0 0
cpu 290 1 1678 359886 1145 0 21758 0 0
cpu 290 1 1678 360296 1145 0 21801 0 0
The seventh column (or sixth numeric column) here is, I guess, the time spent inside hardware interrupt handlers (htop uses this proc file to get its statistics). I am wondering whether this will turn out to be a bug in Linux or in the driver. When I took these /proc/stat snapshots, traffic was flowing at 500 Mbps in and 500 Mbps out.

The time spent in interrupt handlers is accounted for.
htop shows it as "si" (soft interrupt) and "hi" (hard interrupt); "ni" is nice time and "wa" is I/O wait.
Edit:
From man proc:
the sixth column is hardware IRQ time,
the seventh column is softirq time,
the eighth is steal time,
the ninth is guest time.
The latter two are only meaningful for virtualized systems.
Do you have a kernel built with the CONFIG_IRQ_TIME_ACCOUNTING (Processor type and features/Fine granularity task level IRQ time accounting) option set?
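You can also diff two /proc/stat snapshots yourself to see what fraction of the elapsed time went to hard and soft interrupt context. A minimal Python sketch (field layout per man proc; the helper names are my own), fed with the first and last snapshots from the question:

```python
# Compute how each CPU time category changed between two /proc/stat
# "cpu " lines (values are clock ticks; layout per `man 5 proc`).
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq",
          "steal", "guest"]

def parse_cpu_line(line):
    """Map field names to tick counts for one 'cpu ...' line."""
    values = [int(v) for v in line.split()[1:]]
    return dict(zip(FIELDS, values))

def delta_shares(before, after):
    """Fraction of the elapsed ticks spent in each category."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    deltas = {k: b[k] - a[k] for k in a}
    total = sum(deltas.values())
    return {k: d / total for k, d in deltas.items()}

# First and last snapshots taken from the question:
first = "cpu 288 1 1677 356408 1145 0 20863 0 0"
last  = "cpu 290 1 1678 360296 1145 0 21801 0 0"
shares = delta_shares(first, last)
print(f"softirq: {shares['softirq']:.1%}, irq: {shares['irq']:.1%}")
```

With the OP's numbers this shows roughly 19% of the interval spent in softirq context and essentially none in hard IRQ, which is consistent with FLIHs deferring their work to ksoftirqd.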


Advice on stopping compaction to reduce slowness

I am seeing high CPU and memory usage from Cassandra on the seed node. Is it advisable to stop compaction (nodetool stop) and enable it in off-peak hours? Should I do manual compaction or enable autocompaction? I see a lot of Native-Transport-Requests. I have three seed nodes; this is the first seed node.
Pool Name Active Pending Completed Blocked All time blocked
ReadStage 0 0 54255 0 0
MiscStage 0 0 0 0 0
CompactionExecutor 2 2566 352765 0 0
MutationStage 0 0 2659921760 0 0
MemtableReclaimMemory 0 0 180958 0 0
PendingRangeCalculator 0 0 21 0 0
GossipStage 0 0 338375 0 0
SecondaryIndexManagement 0 0 0 0 0
HintsDispatcher 0 0 63 0 0
RequestResponseStage 0 1 1684328696 0 0
Native-Transport-Requests 4 0 1538523706 0 47006391
ReadRepairStage 0 0 2197 0 0
CounterMutationStage 0 0 0 0 0
MigrationStage 0 0 0 0 0
MemtablePostFlush 1 1 216220 0 0
PerDiskMemtableFlushWriter_0 1 1 180958 0 0
ValidationExecutor 0 0 33250 0 0
Sampler 0 0 0 0 0
MemtableFlushWriter 1 1 180958 0 0
InternalResponseStage 0 0 141677 0 0
ViewMutationStage 0 0 0 0 0
AntiEntropyStage 0 0 166254 0 0
CacheCleanupExecutor 0 0 0 0 0
Repair#9 0 0 5719 0 0
I do see high compaction activity. Is it advisable to disable compactions using nodetool stop?
$ nodetool info
ID : ebeda774-cea8-40bb-9322-69c6fcded5a9
Gossip active : true
Thrift active : true
Native Transport active: true
Load : 535.37 GiB
Generation No : 1636316595
Uptime (seconds) : 73152
Heap Memory (MB) : 19542.18 / 32168.00
Off Heap Memory (MB) : 1337.98
Data Center : us-west2
Rack : a
Exceptions : 15
Key Cache : entries 152283, size 23.07 MiB, capacity 100 MiB, 23835 hits, 280738 requests, 0.085 recent hit rate, 14400 save period in seconds
Row Cache : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
Counter Cache : entries 0, size 0 bytes, capacity 50 MiB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
Chunk Cache : entries 6782, size 423.88 MiB, capacity 480 MiB, 23947952 misses, 24381819 requests, 0.018 recent hit rate, 250.977 microseconds miss latency
Percent Repaired : 0.49796724500672584%
Token : (invoke with -T/--tokens to see all 256 tokens)
$ free -h
total used free shared buff/cache available
Mem: 62G 53G 658M 1.0M 8.5G 8.5G
Swap: 0B 0B 0B
~$ nodetool compactionstats
pending tasks: 197
....
id compaction type keyspace table completed total unit progress
5e555610-40b2-11ec-9b5a-27bc920e6e55 Compaction mykeyspace table1 27299674 89930474 bytes 30.36%
5e55f251-40b2-11ec-9b5a-27bc920e6e55 Compaction mykeyspace table2 13922048 74426264 bytes 18.71%
Active compaction remaining time : 0h00m02s
I would definitely not run compaction manually. Most of the compaction thresholds are file-size based, which means that forcing it creates files sized outside of the normal progression. The result is that the chances of compaction running on that table again are extremely slim. Basically, once you start down that path, you'll be running manual compactions forever.
I would also say that compaction is a good thing. You want it to happen, as compacted files are necessary to keep reads performing well. Of course, that's not much of a consolation when the compaction process is affecting operational activity.
tl;dr;
One thing I have done in the past is to lower compaction throughput during the day. I'm not sure what throughput you're running with currently, but you can find out by running nodetool getcompactionthroughput:
% bin/nodetool getcompactionthroughput
Current compaction throughput: 64 MB/s
So at the times when customer/operational traffic is high, you can reduce that significantly:
% bin/nodetool setcompactionthroughput 1
% bin/nodetool getcompactionthroughput
Current compaction throughput: 1 MB/s
1 MB/second is the lowest that compaction throughput can be set to. If you set it to zero, it is "un-throttled," which means it will consume all the resources it can get. Setting it to 1 brings its resource use (and speed) down to a trickle.
Once the busy daily traffic subsides, that setting can be turned back up:
% bin/nodetool setcompactionthroughput 256
Current compaction throughput: 256 MB/s
This can be accomplished with a scheduled job for each command.
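For instance (the times and install path here are hypothetical; adjust both to your environment), two crontab entries could throttle compaction at the start of the business day and open it back up at night:

```shell
# min hour dom mon dow  command        (hypothetical schedule and paths)
0 8  * * * /opt/cassandra/bin/nodetool setcompactionthroughput 1
0 20 * * * /opt/cassandra/bin/nodetool setcompactionthroughput 256
```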

"vmstat" and "perf stat -a" show different numbers for context-switching

I'm trying to understand the context-switching rate on my system (running on AWS EC2), and where the switches are coming from. Just getting the number is already confusing, as two tools that I know can output such a metric give me different results. Here's the output from vmstat:
$ vmstat -w 2
procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
r b swpd free buff cache si so bi bo in cs us sy id wa st
8 0 0 443888 492304 8632452 0 0 0 1 0 0 14 2 84 0 0
37 0 0 444820 492304 8632456 0 0 0 20 131602 155911 43 5 52 0 0
8 0 0 445040 492304 8632460 0 0 0 42 131117 147812 46 4 50 0 0
13 0 0 446572 492304 8632464 0 0 0 34 129154 142260 49 4 46 0 0
The number is ~140k-160k/sec.
But perf tells something else:
$ sudo perf stat -a
Performance counter stats for 'system wide':
2980794.013800 cpu-clock (msec) # 35.997 CPUs utilized
12,335,935 context-switches # 0.004 M/sec
2,086,162 cpu-migrations # 0.700 K/sec
11,617 page-faults # 0.004 K/sec
...
0.004 M/sec is apparently 4k/sec.
Why is there a disparity between the two tools? Am I misinterpreting something in one of them, or are their context-switch metrics somehow different?
FWIW, I've tried the same on a machine running a different workload, and the discrepancy there is even twice as large.
Environment:
AWS EC2 c5.9xlarge instance
Amazon Linux, kernel 4.14.94-73.73.amzn1.x86_64
The service runs on Docker 18.06.1-ce
Some recent versions of perf have a unit-scaling bug in the printing code. Manually compute 12.3M / wall-time and see if that's sane. (Spoiler alert: it is, according to the OP's comment.)
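The sanity check is simple arithmetic: perf's cpu-clock total divided by its "CPUs utilized" figure recovers wall time, and the raw context-switch count divided by that should match vmstat. A quick sketch with the numbers from the question:

```python
# perf printed whole-run totals; recover the true per-second rate by hand.
cpu_clock_ms = 2980794.013800   # system-wide cpu-clock (msec)
cpus_utilized = 35.997          # so wall time = cpu-clock / CPUs utilized
context_switches = 12_335_935

wall_seconds = cpu_clock_ms / cpus_utilized / 1000.0
rate = context_switches / wall_seconds
print(f"~{rate:,.0f} context switches/sec")  # ≈ 149,000/sec
```

That lands squarely inside vmstat's ~140k-160k/sec range, confirming the printed "0.004 M/sec" is the scaling bug rather than a real difference in what the tools count.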
https://lore.kernel.org/patchwork/patch/1025968/
Commit 0aa802a79469 ("perf stat: Get rid of extra clock display
function") introduced the bug in mainline Linux 4.19-rc1 or so.
Thus, perf_stat__update_shadow_stats() now saves scaled values of clock events
in msecs, instead of the original nsecs. But while calculating shadow
stat values we still consider clock event values to be in nsecs. This
results in wrong shadow stat values.
Commit 57ddf09173c1 on Mon, 17 Dec 2018 fixed it in 5.0-rc1, eventually being released with perf upstream version 5.0.
Vendor kernel trees that cherry-pick commits for their stable kernels might have the bug or have fixed the bug earlier.

/proc/[pid]/stat refresh period

Hi, I am a Linux programmer.
I have a task to monitor process CPU usage, so I use fields 14 and 15 of /proc/[pid]/stat. Those values are called utime and stime.
Example [/proc/[pid]/stat]
30182 (TTTTest) R 30124 30182 30124 34845 30182 4218880 142 0 0 0 5274 0 0 0 20 0 1 0 55611251 17408000 386 18446744073709551615 4194304 4260634 140733397159392 140733397158504 4203154 0 0 0 0 0 0 0 17 2 0 0 0 0 0 6360520 6361584 33239040 140733397167447 140733397167457 140733397167457 140733397168110 0
State after 5 sec
30182 (TTTTest) R 30124 30182 30124 34845 30182 4218880 142 0 0 0 5440 0 0 0 20 0 1 0 55611251 17408000 386 18446744073709551615 4194304 4260634 140733397159392 140733397158504 4203154 0 0 0 0 0 0 0 17 2 0 0 0 0 0 6360520 6361584 33239040 140733397167447 140733397167457 140733397167457 140733397168110 0
In a test environment, this file refreshed every 1-2 seconds, so I assumed the file is updated by the system at least once per second.
So I use this calculation
process_cpu_usage = ((utime - old_utime) + (stime - old_stime))/ period
With the values above:
33.2 = ((5440 - 5274) + (0 - 0)) / 5
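That calculation can be sketched as a small Python helper (note that utime/stime are in clock ticks, usually 100 per second, so at HZ=100 the tick delta divided by the period in seconds is directly a percentage; field extraction must skip past the parenthesised comm, which may itself contain spaces):

```python
def cpu_fields(stat_line):
    """(utime, stime) in clock ticks from a /proc/[pid]/stat line.
    comm (field 2) is parenthesised and may contain spaces, so split
    on the *last* ')' before indexing the remaining fields."""
    rest = stat_line.rsplit(")", 1)[1].split()
    return int(rest[11]), int(rest[12])   # fields 14 and 15

def cpu_percent(old_line, new_line, period_s, hz=100):
    """Percent CPU over period_s, assuming hz clock ticks per second."""
    ou, ostime = cpu_fields(old_line)
    nu, nstime = cpu_fields(new_line)
    return 100.0 * ((nu - ou) + (nstime - ostime)) / (hz * period_s)

# The two samples from the question, 5 seconds apart (truncated):
old = "30182 (TTTTest) R 30124 30182 30124 34845 30182 4218880 142 0 0 0 5274 0 0 0 20 0 1 0"
new = "30182 (TTTTest) R 30124 30182 30124 34845 30182 4218880 142 0 0 0 5440 0 0 0 20 0 1 0"
print(cpu_percent(old, new, 5))  # → 33.2
```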
But in a commercial server environment, with processes running under high load (CPU and file I/O), the /proc/[pid]/stat update period increases to as much as 20-60 seconds!
So the top/htop utilities can't measure the correct process usage value.
Why is this phenomenon occurring?
Our system is [CentOS Linux release 7.1.1503 (Core)]
Most (if not all) files in the /proc filesystem are special files; their content at any given moment reflects the actual OS/kernel data at that very moment. They are not files whose contents are periodically updated. See the /proc filesystem doc.
In particular the /proc/[pid]/stat content changes whenever the respective process state changes (for example after every scheduling event) - for processes mostly sleeping the file will appear to be "updated" at slower rates while for active/running processes at higher rates on lightly loaded systems. Check, for example, the corresponding files for a shell process which doesn't do anything and for a browser process playing some video stream.
On heavily loaded systems with many processes in the ready state (like the one mentioned in this Q&A, for example) there can be scheduling delays making the file content "updates" appear less often despite the processes being ready/active. Such conditions seem to be more often encountered in commercial/enterprise environments (debatable, I agree).

Process niceness (priority) setting has no effect on Linux

I wrote a test program which consists of just an infinite loop with some
computations inside, and performs no
I/O operations. I tried starting two instances of the program, one with a high
niceness value, and the other with a low niceness value:
sudo nice -n 19 taskset 1 ./test
sudo nice -n -20 taskset 1 ./test
The taskset command ensures that both programs execute on the same core.
Contrary to my expectation, top reports that both programs get about 50% of the
computation time. Why is that? Does the nice command even have an effect?
The behavior you are seeing is almost certainly because of the autogroup feature that was added in Linux 2.6.38 (in 2010). Presumably when you described running the two commands, they were run in different terminal windows. If you had run them in the same terminal window, then you should have seen the nice value have an effect. The rest of this answer elaborates the story.
The kernel provides a feature known as autogrouping to improve interactive desktop performance in the face of multiprocess, CPU-intensive workloads such as building the Linux kernel with large numbers of parallel build processes (i.e., the make(1) -j flag).
A new autogroup is created when a new session is created
via setsid(2); this happens, for example, when a new terminal window is started. A new process created by fork(2) inherits its
parent's autogroup membership. Thus, all of the processes in a
session are members of the same autogroup.
When autogrouping is enabled, all of the members of an autogroup
are placed in the same kernel scheduler "task group". The Linux kernel scheduler employs an algorithm that equalizes the distribution of
CPU cycles across task groups. The benefits of this for interactive desktop performance can be described via the following example.
Suppose that there are two autogroups competing for the same CPU
(i.e., presume either a single CPU system or the use of taskset(1)
to confine all the processes to the same CPU on an SMP system).
The first group contains ten CPU-bound processes from a kernel
build started with make -j10. The other contains a single
CPU-bound process: a video player. The effect of autogrouping is that
the two groups will each receive half of the CPU cycles. That is,
the video player will receive 50% of the CPU cycles, rather than
just 9% of the cycles, which would likely lead to degraded video
playback. The situation on an SMP system is more complex, but the
general effect is the same: the scheduler distributes CPU cycles
across task groups such that an autogroup that contains a large
number of CPU-bound processes does not end up hogging CPU cycles
at the expense of the other jobs on the system.
The nice value and group scheduling
When scheduling non-real-time processes (e.g., those scheduled
under the default SCHED_OTHER policy), the
scheduler employs a technique known as "group scheduling", under which threads are scheduled in "task groups".
Task groups are formed in various circumstances, with the relevant case here being autogrouping.
If autogrouping is enabled, then all of the threads that are
(implicitly) placed in an autogroup (i.e., the same session, as
created by setsid(2)) form a task group. Each new autogroup is
thus a separate task group.
Under group scheduling, a thread's nice value has an effect for
scheduling decisions only relative to other threads in the same
task group. This has some surprising consequences in terms of the
traditional semantics of the nice value on UNIX systems. In particular, if autogrouping is enabled (which is the default in various Linux distributions), then
employing nice(1) on a process has an effect
only for scheduling relative to other processes executed in the
same session (typically: the same terminal window).
Conversely, for two processes that are (for example) the sole
CPU-bound processes in different sessions (e.g., different terminal
windows, each of whose jobs are tied to different autogroups),
modifying the nice value of the process in one of the sessions has
no effect in terms of the scheduler's decisions relative to the
process in the other session. This presumably is the scenario you saw, though you don't explicitly mention using two terminal windows.
If you want to prevent autogrouping from interfering with the traditional nice behavior as described here, you can disable the feature:
echo 0 > /proc/sys/kernel/sched_autogroup_enabled
Be aware though that this will also have the effect of disabling the benefits for desktop interactivity that the autogroup feature was intended to provide (see above).
The autogroup nice value
A process's autogroup membership can be viewed via
the file /proc/[pid]/autogroup:
$ cat /proc/1/autogroup
/autogroup-1 nice 0
This file can also be used to modify the CPU bandwidth allocated
to an autogroup. This is done by writing a number in the "nice"
range to the file to set the autogroup's nice value. The allowed
range is from +19 (low priority) to -20 (high priority).
The autogroup nice setting has the same meaning as the process
nice value, but applies to distribution of CPU cycles to the
autogroup as a whole, based on the relative nice values of other
autogroups. For a process inside an autogroup, the CPU cycles that it
receives will be a product of the autogroup's nice value (compared
to other autogroups) and the process's nice value (compared to
other processes in the same autogroup).
I put together a test.c that just does:
for(;;)
{
}
And then ran it with your nice's. I didn't run a different sudo for each one, but rather sudo'd an interactive shell and ran them both from there. I used two &'s.
I got one ./test hitting my CPU hard, and one barely touching it.
Naturally, the system still felt quite responsive; it takes a lot of CPU-hogging processes on modern processors to get so much load you can "feel" it.
That stands in contrast to I/O-hogging processes and memory-hogging processes; in these cases a single greedy process can make a system painful to use.
I'd guess either your system has a relatively unique priority-related bug (or subtlety), or there's something up with your methodology.
I ran my test on an Ubuntu 11.04 system.
I'm assuming that there's a & missing at the end of the command line. Otherwise, the second line won't run until the first completes.
While both processes are running, use something like top and make sure that they each have the nice value that you assigned.
What happens if you launch the processes using only taskset and then adjust their priority with renice after they are running?
Process niceness (priority) setting HAS an effect on Linux! (in practice, but ONLY if you give it enough work to do!)
On my system, as long as all cores are fully loaded, nice does have an impact. On Ubuntu 14.04, processes run with nice -N get through 0.807 ** N operations compared to processes run without altering the nice value (given you are running one instance per core for each nice level).
In my case I have a quad-core i7 with hyper-threading turned off, so if I run four or fewer processes, it doesn't matter what their nice values are: they each get a full core. If I run four processes at nice level 0 and four at nice level 12, the ones at level 12 get through 0.807 ^ 12, i.e. approximately 7% of the work the ones at nice level 0 do. The ratio seems to be a reasonable predictor from nice levels 0 through 14; after that it fluctuates (a few runs had nice level 18 processing more than nice level 16, for instance). Running the test for longer may smooth the results out.
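The measured base is close to what CFS's weight table predicts: each nice step changes a task's weight by a factor of about 1.25, which works out to roughly 0.8x throughput per step when competing against a fixed load. A one-liner check of the claimed numbers (using the answer's measured 0.807 as the base):

```python
# Each nice step is ~1.25x scheduler weight in CFS, i.e. roughly 0.8x
# throughput per step against a fixed competing load; check the
# measured base of 0.807 against the "nice 12 ~ 7%" observation.
base = 0.807
print(round(base ** 12, 3))  # → 0.076, i.e. ~7.6% of nice-0 throughput
```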
(ruby 2.1.2 used)
,cl file:
uptime
nices='-0 -6 -12 -18'
nices='-0 -18'
nices='-0 -2 -4 -6 -8 -10 -12 -14 -16 -18'
rm -f ,n-*
for i in 1 2 3 4
do
  for n in $nices
  do
    nice $n ruby ,count_loops.rb > ,n${n}-$i &
  done
done
ps -l
uptime
wait
uptime
ps -l
c=`cat ,n-0-[1234] | total`
last=$c
for n in $nices
do
  echo
  c2=`cat ,n${n}-[1234] | total`
  echo total of `cat ,n${n}-[1234]` is $c2
  echo -n "nice $n count $c2, percentage: "
  echo "3 k $c2 100 * $c / p" | dc
  echo -n "  percent of last: "
  echo "3 k $c2 100 * $last / p" | dc
  last=$c2
done
uptime
echo total count: `cat ,n-*-[1234] | total`
,count_loops.rb file
#!/usr/bin/env ruby
limit = Time.new + 70
i = 0
while Time.new < limit
  i += 1
  j = 0
  while (j += 1) < 10000
    t = j
  end
end
puts i
results of sh ,cl - initial diagnostic output:
19:16:25 up 20:55, 2 users, load average: 3.58, 3.59, 2.88
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
0 S 1000 4987 4977 0 80 0 - 7297 wait pts/3 00:00:00 bash
0 S 1000 11743 2936 0 80 0 - 2515 wait pts/3 00:00:00 rubymine.sh
0 S 1000 11808 11743 6 80 0 - 834604 futex_ pts/3 00:18:10 java
0 S 1000 11846 11808 0 80 0 - 4061 poll_s pts/3 00:00:02 fsnotifier64
0 S 1000 19613 4987 0 80 0 - 2515 wait pts/3 00:00:00 sh
0 R 1000 19616 19613 0 80 0 - 7582 - pts/3 00:00:00 ruby
0 R 1000 19617 19613 0 82 2 - 7582 - pts/3 00:00:00 ruby
0 R 1000 19618 19613 0 84 4 - 7582 - pts/3 00:00:00 ruby
0 R 1000 19619 19613 0 86 6 - 7416 - pts/3 00:00:00 ruby
0 R 1000 19620 19613 0 88 8 - 6795 - pts/3 00:00:00 ruby
0 R 1000 19621 19613 0 90 10 - 6247 - pts/3 00:00:00 ruby
0 R 1000 19622 19613 0 92 12 - 6247 - pts/3 00:00:00 ruby
0 R 1000 19623 19613 0 94 14 - 6247 - pts/3 00:00:00 ruby
0 R 1000 19624 19613 0 96 16 - 6078 - pts/3 00:00:00 ruby
0 R 1000 19625 19613 0 98 18 - 6012 - pts/3 00:00:00 ruby
0 R 1000 19626 19613 0 80 0 - 7582 - pts/3 00:00:00 ruby
0 R 1000 19627 19613 0 82 2 - 7582 - pts/3 00:00:00 ruby
0 R 1000 19628 19613 0 84 4 - 7582 - pts/3 00:00:00 ruby
0 R 1000 19629 19613 0 86 6 - 7582 - pts/3 00:00:00 ruby
0 R 1000 19630 19613 0 88 8 - 6247 - pts/3 00:00:00 ruby
0 R 1000 19631 19613 0 90 10 - 6247 - pts/3 00:00:00 ruby
0 R 1000 19632 19613 0 92 12 - 6247 - pts/3 00:00:00 ruby
0 R 1000 19633 19613 0 94 14 - 6144 - pts/3 00:00:00 ruby
0 R 1000 19634 19613 0 96 16 - 4971 - pts/3 00:00:00 ruby
0 R 1000 19635 19613 0 98 18 - 4971 - pts/3 00:00:00 ruby
0 R 1000 19636 19613 0 80 0 - 7582 - pts/3 00:00:00 ruby
0 R 1000 19637 19613 0 82 2 - 7449 - pts/3 00:00:00 ruby
0 R 1000 19638 19613 0 84 4 - 7344 - pts/3 00:00:00 ruby
0 R 1000 19639 19613 0 86 6 - 7582 - pts/3 00:00:00 ruby
0 R 1000 19640 19613 0 88 8 - 7416 - pts/3 00:00:00 ruby
0 R 1000 19641 19613 0 90 10 - 6210 - pts/3 00:00:00 ruby
0 R 1000 19642 19613 0 92 12 - 6247 - pts/3 00:00:00 ruby
0 R 1000 19643 19613 0 94 14 - 5976 - pts/3 00:00:00 ruby
0 R 1000 19644 19613 0 96 16 - 6111 - pts/3 00:00:00 ruby
0 R 1000 19645 19613 0 98 18 - 4971 - pts/3 00:00:00 ruby
0 R 1000 19646 19613 0 80 0 - 7582 - pts/3 00:00:00 ruby
0 R 1000 19647 19613 0 82 2 - 7516 - pts/3 00:00:00 ruby
0 R 1000 19648 19613 0 84 4 - 7416 - pts/3 00:00:00 ruby
0 R 1000 19649 19613 0 86 6 - 6247 - pts/3 00:00:00 ruby
0 R 1000 19650 19613 0 88 8 - 6177 - pts/3 00:00:00 ruby
0 R 1000 19651 19613 0 90 10 - 6247 - pts/3 00:00:00 ruby
0 R 1000 19652 19613 0 92 12 - 6078 - pts/3 00:00:00 ruby
0 R 1000 19653 19613 0 94 14 - 6247 - pts/3 00:00:00 ruby
0 R 1000 19654 19613 0 96 16 - 4971 - pts/3 00:00:00 ruby
0 R 1000 19655 19613 0 98 18 - 4971 - pts/3 00:00:00 ruby
0 R 1000 19656 19613 0 80 0 - 3908 - pts/3 00:00:00 ps
19:16:26 up 20:55, 2 users, load average: 3.58, 3.59, 2.88
19:17:37 up 20:56, 3 users, load average: 28.92, 11.25, 5.59
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
0 S 1000 4987 4977 0 80 0 - 7297 wait pts/3 00:00:00 bash
0 S 1000 11743 2936 0 80 0 - 2515 wait pts/3 00:00:00 rubymine.sh
0 S 1000 11808 11743 6 80 0 - 834604 futex_ pts/3 00:18:10 java
0 S 1000 11846 11808 0 80 0 - 4061 poll_s pts/3 00:00:02 fsnotifier64
0 S 1000 19613 4987 0 80 0 - 2515 wait pts/3 00:00:00 sh
0 R 1000 19794 19613 0 80 0 - 3908 - pts/3 00:00:00 ps
results of sh ,cl - statistics: (percentage of last is the percentage of this total compares to the count for the last group of processes)
total of 99951 101725 100681 104046 is 406403
nice -0 count , percentage: 100.000
percent of last: 100.000
total of 64554 62971 64006 63462 is 254993
nice -2 count , percentage: 62.743
percent of last: 62.743
total of 42997 43041 43197 42717 is 171952
nice -4 count , percentage: 42.310
percent of last: 67.434
total of 26882 28250 27151 27244 is 109527
nice -6 count , percentage: 26.950
percent of last: 63.696
total of 17228 17189 17427 17769 is 69613
nice -8 count , percentage: 17.129
percent of last: 63.557
total of 10815 10792 11021 11307 is 43935
nice -10 count , percentage: 10.810
percent of last: 63.113
total of 7023 6923 7885 7323 is 29154
nice -12 count , percentage: 7.173
percent of last: 66.357
total of 5005 4881 4938 5159 is 19983
nice -14 count , percentage: 4.917
percent of last: 68.542
total of 3517 5537 3555 4092 is 16701
nice -16 count , percentage: 4.109
percent of last: 83.576
total of 4372 4307 5552 4527 is 18758
nice -18 count , percentage: 4.615
percent of last: 112.316
19:17:37 up 20:56, 3 users, load average: 28.92, 11.25, 5.59
total count: 1141019
( Purists will note I am mixing ruby, shell and dc - they will have to forgive me for old habits from last century showing through ;) )
I ran an example program from APUE, and nice does have an effect.
The example program forks a child, and both the parent and child execute an i++ increment loop for a given time (10 s). By giving the child a different nice value, the result shows whether nice makes a difference.
The book warns that the program should be run on a uniprocessor PC. First I tried it on my own PC, an i5-7500 CPU @ 3.40GHz (4 cores); with different nice values there was almost no difference.
Then I logged into my remote server (1 processor, 1 GB) and got the expected difference.
1 core processor 1 GB Test result:
./a.out
NZERO = 20
current nice value in parent is 0
current nice value in child is 0, adjusting by 0
now child nice value is 0
parent count = 13347219
child count = 13357561
./a.out 20 //child nice set to 20
NZERO = 20
current nice value in parent is 0
current nice value in child is 0, adjusting by 20
now child nice value is 19
parent count = 29770491
child count = 441330
Test program (I made a little modification), from Section 8.16 of APUE:
apue.h is merely a header wrapper.
err_sys() is also an error handler wrapper; you can use printf temporarily.
#include "apue.h"
#include <errno.h>
#include <sys/time.h>
#if defined(MACOS)
#include <sys/syslimits.h>
#elif defined(SOLARIS)
#include <limits.h>
#elif defined(BSD)
#include <sys/param.h>
#endif
unsigned long long count;
struct timeval end;
void
checktime(char *str)
{
struct timeval tv;
gettimeofday(&tv, NULL);
if (tv.tv_sec >= end.tv_sec && tv.tv_usec >= end.tv_usec) {
printf("%s count = %lld\n", str, count);
exit(0);
}
}
int
main(int argc, char *argv[])
{
pid_t pid;
char *s;
int nzero, ret;
int adj = 0;
setbuf(stdout, NULL);
#if defined(NZERO)
nzero = NZERO;
#elif defined(_SC_NZERO)
nzero = sysconf(_SC_NZERO);
#else
#error NZERO undefined
#endif
printf("NZERO = %d\n", nzero);
if (argc == 2)
adj = strtol(argv[1], NULL, 10);
gettimeofday(&end, NULL);
end.tv_sec += 10; /* run for 10 seconds */
if ((pid = fork()) < 0) {
err_sys("fork failed");
} else if (pid == 0) { /* child */
s = "child";
printf("current nice value in child is %d, adjusting by %d\n",
nice(0), adj);
errno = 0;
if ((ret = nice(adj)) == -1 && errno != 0)
err_sys("child set scheduling priority");
printf("now child nice value is %d\n", ret);
} else { /* parent */
s = "parent";
printf("current nice value in parent is %d\n", nice(0));
}
for(;;) {
if (++count == 0)
err_quit("%s counter wrap", s);
checktime(s);
}
}
Complete source code link: https://wandbox.org/permlink/8iryAZ48sIbaq27y

Finding usage of resources (CPU and memory) by threads of a process in unix (solaris/linux)

I have a multi-threaded application (C++ with the pthread library) and I would like to know how much of each resource (CPU and memory) each thread uses.
Is there a way to find these details on both Solaris and Linux, or on either of them?
You could use the ps command with some options:
ps -eLo pid,ppid,lwp,nlwp,osz,rss,ruser,pcpu,stime,etime,args | more
PID PPID LWP NLWP SZ RSS RUSER %CPU STIME ELAPSED COMMAND
0 0 1 1 0 0 root 0.0 Oct_02 4-02:13:37 sched
1 0 1 1 298 528 root 0.0 Oct_02 4-02:13:36 /sbin/init
2 0 1 1 0 0 root 0.0 Oct_02 4-02:13:36 pageout
Have a look at the ps man page for more information (an LWP, light-weight process, is a thread).
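On Linux specifically, per-thread numbers are also exposed under /proc/[pid]/task/[tid]/stat, which has the same layout as the per-process stat file. A small Python sketch (helper name is mine) that lists each thread's utime/stime tick counts for a given process:

```python
import os

def thread_cpu_ticks(pid="self"):
    """Map each tid of `pid` to its (utime, stime) in clock ticks,
    read from /proc/<pid>/task/<tid>/stat (fields 14 and 15).
    The comm field is parenthesised, so split on the last ')'."""
    usage = {}
    for tid in os.listdir(f"/proc/{pid}/task"):
        with open(f"/proc/{pid}/task/{tid}/stat") as f:
            rest = f.read().rsplit(")", 1)[1].split()
        usage[tid] = (int(rest[11]), int(rest[12]))
    return usage

print(thread_cpu_ticks())  # e.g. {'1234': (5, 2), ...}
```

Sampling this twice and diffing the tick counts gives per-thread CPU usage over the interval, just as top -H does.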
