FreeBSD iostat - How to tell if there's a problem? [closed]

I run a FreeBSD NFS server and recently I've been having odd issues throughout the cluster (the Apache servers are hanging in "lockf" state when loading files from the NFS share, etc).
I'm fairly new to this, so my question is how can I tell if a server's IO is getting overloaded?
Here is my current iostat:
[root@host ~]# iostat 1 10
tty mfid0 cpu
tin tout KB/t tps MB/s us ni sy in id
0 55 16.03 194 3.04 0 0 5 0 95
0 490 21.73 238 5.05 0 0 5 0 95
0 43 20.09 402 7.88 0 0 7 0 93
0 407 12.58 531 6.53 0 0 5 0 94
0 43 15.69 416 6.37 0 0 8 1 91
0 437 30.23 287 8.46 0 0 9 1 91
0 43 23.50 109 2.50 0 0 2 0 98
0 273 11.58 76 0.86 0 0 2 0 98
0 43 15.70 243 3.72 0 0 5 0 95
0 320 20.35 248 4.92 0 0 3 0 96
[root@host ~]#
Do any of the values seem high? Are there any other tests I can do to see if the system is handling the load efficiently?
Thanks!

Try gstat or systat -iostat, but like iostat they will only show you I/O usage, not what causes it. You are probably more interested in trying:
procstat -f $ApachePIDinLockfState
or ktrace -p $ApachePIDinLockfState and kdump -R | less. Remember to run ktrace -C when you have finished.
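Putting those commands together, a minimal sketch of such a session (the PID value is a hypothetical placeholder for an httpd process you observed stuck in lockf):
pid=12345                # hypothetical PID of a hung httpd process
procstat -f "$pid"       # list the files/descriptors it has open
ktrace -p "$pid"         # attach kernel tracing to the process
sleep 30                 # let it record syscall activity for a while
ktrace -C                # stop all tracing, as noted above
kdump -R | less          # decode the trace with relative timestamps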

Related

Understand output of vmstat memory utilization

I have a Solaris box and I'm trying to determine whether it's running out of memory or whether it's stable.
Below is the output of vmstat:
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr vc vc vc vc in sy cs us sy id
1 0 0 11426696 4603520 613 1477 449 6 6 0 0 78 22 28 29 8970 37714 22961 43 6 51
4 0 0 4975280 0 1747 3487 805 0 0 0 0 233 41 33 44 9558 53713 15845 74 8 18
4 0 0 4936944 0 933 1837 0 0 0 0 0 56 28 12 39 9317 46898 14648 82 7 11
5 0 0 4943080 0 1056 2806 805 0 0 0 0 103 21 18 18 9286 46900 14866 78 8 14
5 0 0 4942264 0 1088 2173 804 6 6 0 0 109 8 40 31 9927 56484 16495 84 8 8
3 0 0 4942520 0 308 1018 1756 3 3 0 0 166 87 29 44 10638 64146 21413 83 9 8
0 0 0 4942512 0 156 326 1740 0 0 0 0 370 12 33 52 11554 40375 21897 75 9 16
2 0 0 4947384 0 294 560 845 0 0 0 0 121 18 23 20 9445 52382 17016 77 6 17
I can see the free column shows 0; however, the sr column also shows 0.
And the output from top doesn't show how much free memory is available. Swap shows 0.0%:
load averages: 11.4, 9.12, 9.24;
9021 processes: 9018 sleeping, 1 running, 2 on cpu
CPU states: 0.0% idle, 71.4% user, 28.6% kernel, 0.0% iowait, 0.0% swap
Memory: 24G phys mem, 16G total swap, 13G free swap
Am I running out of RAM?
Please suggest how to interpret this data. Do I need to increase my physical memory?
I'd appreciate some insights.
From the Solaris 11.4 vmstat man page, there's one important thing to note:
Without options, vmstat displays a one-line summary of the virtual memory activity since the system was booted.
That also applies to the first line of output from Solaris vmstat: it's a summary of all activity since the system was booted.
A good description of the output fields is found in the EXAMPLES section of the Solaris man vmstat page:
Examples
Example 1 Using vmstat
The following command displays a summary of what the system is doing
every five seconds.
example% vmstat 5
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr s0 s1 s2 s3 in sy cs us sy id
0 0 0 11456 4120 1 41 19 1 3 0 2 0 4 0 0 48 112 130 4 14 82
0 0 1 10132 4280 0 4 44 0 0 0 0 0 23 0 0 211 230 144 3 35 62
0 0 1 10132 4616 0 0 20 0 0 0 0 0 19 0 0 150 172 146 3 33 64
0 0 1 10132 5292 0 0 9 0 0 0 0 0 21 0 0 165 105 130 1 21 78
1 1 1 10132 5496 0 0 5 0 0 0 0 0 23 0 0 183 92 134 1 20 79
1 0 1 10132 5564 0 0 25 0 0 0 0 0 18 0 0 131 231 116 4 34 62
1 0 1 10124 5412 0 0 37 0 0 0 0 0 22 0 0 166 179 118 1 33 67
1 0 1 10124 5236 0 0 24 0 0 0 0 0 14 0 0 109 243 113 4 56 39
example%
The fields of vmstat's display are
kthr
Report the number of kernel threads in each of the three following
states:
r
the number of kernel threads in run queue
b
the number of blocked kernel threads that are waiting for
resources (I/O, paging, and so forth)
w
the number of swapped out lightweight processes (LWPs) that
are waiting for processing resources to finish.
memory
Report on usage of virtual and real memory.
swap
available swap space (Kbytes)
free
size of the free list (Kbytes)
page
Report information about page faults and paging activity. The
information on each of the following activities is given in units per
second.
re
page reclaims — but see the –S option for how this field is modified.
mf
minor faults — but see the –S option for how this field is modified.
pi
kilobytes paged in
po
kilobytes paged out
fr
kilobytes freed
de
anticipated short-term memory shortfall (Kbytes)
sr
pages scanned by clock algorithm
When executed in a zone and if the pools facility is active, all of
the above (except for ‘de’) only report activity on the processors in
the processor set of the zone's pool.
disk
Report the number of disk operations per second. There are slots for
up to four disks, labeled with a single letter and number. The letter
indicates the type of disk (s = SCSI, i = IPI, and so forth); the
number is the logical unit number.
faults
Report the trap/interrupt rates (per second).
in
interrupts
sy
system calls
cs
CPU context switches
When executed in a zone and if the pools facility is active, all of
the above only report activity on the processors in the processor set
of the zone's pool.
cpu
Give a breakdown of percentage usage of CPU time. On MP systems, this
is an average across all processors.
us
user time
sy
system time
id
idle time
When executed in a zone and if the pools facility is active, all of
the above only report activity on the processors in the processor set
of the zone's pool.
This may help you: https://www.howtogeek.com/424334/how-to-use-the-vmstat-command-on-linux/. It explains those abbreviations, though note it describes Linux vmstat, whose columns differ somewhat from the Solaris output above.
Memory
swpd: the amount of virtual memory used. In other words, how much memory has been swapped out.
free: the amount of idle (currently unused) memory.
buff: the amount of memory used as buffers.
cache: the amount of memory used as cache.
Swap
si: Amount of virtual memory swapped in from swap space.
so: Amount of virtual memory swapped out to swap space.
IO
bi: Blocks received from a block device, i.e. blocks read in per second (this includes pages being swapped back into RAM).
bo: Blocks sent to a block device, i.e. blocks written out per second (this includes pages being swapped out to swap space).
System
in: The number of interrupts per second, including the clock.
cs: The number of context switches per second. A context switch is when the kernel switches the processor from one process or thread to another.
"0" is not a valid free memory value.
By design, Solaris always makes sure a minimal amount of free memory is available. The fact that the sr column also equals zero suggests there is no memory shortage: the page scanner only runs when free memory is actually tight. In any case, you wouldn't have been able to run vmstat or top in the first place with such an extreme RAM shortage.
You should investigate further to understand why the free memory is reported as zero. mdb's ::memstat command would be a good start:
# echo "::memstat" | mdb -k
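If you want to double-check the page-scanner reasoning, a minimal sketch that watches the sr column over a sampling window (the field position, $12, is assumed from the vmstat header shown above; adjust it if your disk columns differ):
vmstat 5 12 | awk 'NR > 2 && $12 > 0 { print "page scanner active, sr=" $12 }'
If this prints nothing for the whole window, the scanner never ran and there is no real memory pressure.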

How to interpret the time column values in /proc/self/mountstats - do they indicate a performance issue?

I have a bladefs volume and I just checked /proc/self/mountstats, where I see per-operation statistics:
...
opts: rw,vers=3,rsize=131072,wsize=131072,namlen=255,acregmin=1800,acregmax=1800,acdirmin=1800,acdirmax=1800,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.0.2.100,mountvers=3,mountport=903,mountproto=tcp,local_lock=all
age: 18129
caps: caps=0x3fc7,wtmult=512,dtsize=32768,bsize=0,namlen=255
sec: flavor=1,pseudoflavor=1
events: 18840 116049 23 5808 22138 21048 146984 13896 287 2181 0 7560 31380 0 9565 5106 0 6471 0 0 13896 0 0 0 0 0 0
bytes: 339548407 48622919 0 0 311167118 48622919 76846 13896
RPC iostats version: 1.0 p/v: 100003/3 (nfs)
xprt: tcp 875 1 7 0 0 85765 85764 1 206637 0 37 1776 35298
per-op statistics
NULL: 0 0 0 0 0 0 0 0
GETATTR: 18840 18840 0 2336164 2110080 92 8027 8817
SETATTR: 0 0 0 0 0 0 0 0
LOOKUP: 21391 21392 0 3877744 4562876 118 103403 105518
ACCESS: 20183 20188 0 2584304 2421960 72 10122 10850
READLINK: 0 0 0 0 0 0 0 0
READ: 3425 3425 0 465848 311606600 340 97323 97924
WRITE: 2422 2422 0 48975488 387520 763 200645 201522
CREATE: 2616 2616 0 447392 701088 21 870 1088
MKDIR: 858 858 0 188760 229944 8 573 705
SYMLINK: 0 0 0 0 0 0 0 0
MKNOD: 0 0 0 0 0 0 0 0
REMOVE: 47 47 0 6440 6768 0 8 76
RMDIR: 23 23 0 4876 3312 0 3 5
RENAME: 23 23 0 7176 5980 0 5 6
LINK: 0 0 0 0 0 0 0 0
READDIR: 160 160 0 23040 4987464 0 16139 16142
READDIRPLUS: 15703 15703 0 2324044 8493604 43 1041634 1041907
FSSTAT: 1 1 0 124 168 0 0 0
FSINFO: 2 2 0 248 328 0 0 0
PATHCONF: 1 1 0 124 140 0 0 0
COMMIT: 68 68 0 9248 10336 2 272 275...
These statistics are for my bladefs mount. I am interested in the READ operation statistics. As I understand it, the last column (97924) means:
execute: How long ops of this type take to execute (from
rpc_init_task to rpc_exit_task) (microsecond)
How do I interpret this? Is it the average time of each read operation, regardless of the block size? I have a strong suspicion that I have problems with NFS: am I right? A value of 0.1 s looks bad to me, but I am not sure how exactly to interpret this time: an average, some sum...?
After reading the kernel source: the statistics are printed from rpc_clnt_show_stats() in net/sunrpc/stats.c, and the 8th column of the per-op statistics appears to be printed by _print_rpc_iostats, which prints the om_execute member of struct rpc_iostats. (The newest kernels print 9 columns, with errors as the last column.)
That member appears to be referenced/changed only in rpc_count_iostats_metrics, with:
execute = ktime_sub(now, task->tk_start);
op_metrics->om_execute = ktime_add(op_metrics->om_execute, execute);
Assuming ktime_add does what it says, the value of om_execute only ever increases. So the 8th column of mountstats is the cumulative sum of the time spent on operations of this type, not a per-operation average; divide it by the operation count to get the average.
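A minimal sketch of that division (field positions assumed from the per-op layout shown above: $2 is the op count and $9 the cumulative execute column; the unit is whatever the kernel uses for that column):
awk '/^[[:space:]]*(READ|WRITE):/ && $2 > 0 {
    printf "%s average execute = %.1f per op\n", $1, $9 / $2
}' /proc/self/mountstats
For the READ line above that gives 97924 / 3425, roughly 28.6 per operation, which is far smaller than the raw 97924 that looked alarming.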

Linux top: getting output for both CPU cores from the command line [closed]

I have a system with 2 cores running Linux. I want to log the CPU usage of the individual cores at regular intervals of, say, 15 minutes.
I can use top and a regex to get the info, but top only gives me the overall CPU figures. When I manually press "1", the usage of both cores is shown separately.
My question is: how can I display both cores' CPU usage without manually pressing "1" after invoking top?
Current research by me:
- I can use the -b option to run in batch mode and write the output to a file. But the next question is how I can send input to top in batch mode. Is there a script that top reads when it runs in batch mode?
The Linux top command obtains its information from /proc/stat, which is (somewhat) dependent upon the kernel version. Perhaps you could write a program which reads from that. Here is a sample from a 2.6.32 system with 20 cores:
cpu 46832272 794980 8521784 1312627944 853989 247 34947 0 0
cpu0 6404288 173468 806918 60455445 377313 1 1799 0 0
cpu1 2980140 137898 937163 64278592 68373 0 118 0 0
cpu2 5099227 86676 841568 62395343 27685 0 64 0 0
cpu3 11255325 20062 767603 56427120 9388 0 85 0 0
cpu4 2618170 1002 501629 65394095 4369 0 62 0 0
cpu5 635453 867 154898 67725523 2981 212 58 0 0
cpu6 343657 32 66510 68113208 2769 0 64 0 0
cpu7 327935 688 38431 68158263 1703 0 55 0 0
cpu8 118687 78 27436 68382190 1992 0 33 0 0
cpu9 329990 49 42224 68138515 1643 0 49 0 0
cpu10 3462177 160918 814788 63701724 202763 3 5444 0 0
cpu11 3006524 112533 484490 64877526 37455 0 6840 0 0
cpu12 2696919 61285 695966 65004324 17277 0 133 0 0
cpu13 3453005 34509 957663 64035215 10938 0 101 0 0
cpu14 2068954 2039 679830 65764151 6418 0 50 0 0
cpu15 628390 159 367213 67531841 2593 0 41 0 0
cpu16 331139 77 76690 68120995 2971 0 51 0 0
cpu17 616895 2482 182239 67595814 70070 29 19797 0 0
cpu18 343472 51 38712 68148369 2481 0 46 0 0
cpu19 111916 96 39803 68379681 2797 0 47 0 0
intr 1991637171 173 2 0 0 2 0 0 0 1 0 0 0 4 0 0 0 0 1 56 1416833 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1644 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2285 0 0 0 0 0 0 0 3211641 4799987 3235 31624105 11000098 0 ...
ctxt 3201588026
btime 1460672984
processes 2430161
procs_running 2
procs_blocked 0
softirq 1391193131 0 626556634 166050 33864038 3892307 0 11210298 67287467 2880340 645335997
According to the man page (man 5 proc then search for /proc/stat), the lines for cpu entries are:
The amount of time, measured in units of USER_HZ (1/100ths of a second on most architectures, use sysconf(_SC_CLK_TCK) to obtain the right value), that the system spent in user mode, user mode with low priority (nice), system mode, and the idle task, respectively. The last value should be USER_HZ times the second entry in the uptime pseudo-file.
iowait - time waiting for I/O to complete; irq - time servicing interrupts; softirq - time servicing softirqs.
steal - stolen time, which is the time spent in other operating systems when running in a virtualized environment
guest, which is the time spent running a virtual CPU for guest operating systems under the control of the Linux kernel.
guest_nice, which is the time spent running a niced guest (a virtual CPU for guest operating systems under the control of the Linux kernel).
I looked at a 4.4.6 kernel system too; there the cpu entries have a tenth field (guest_nice).
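As a minimal sketch of the "write a program which reads from that" suggestion (the 15-second window and the /tmp file names are arbitrary choices; the idle calculation uses fields $5 and $6, idle and iowait, per the man page excerpt above, and works whether a line has 9 or 10 counters):
grep '^cpu[0-9]' /proc/stat > /tmp/stat.1   # first sample, per-core lines only
sleep 15                                    # measurement window
grep '^cpu[0-9]' /proc/stat > /tmp/stat.2   # second sample
awk '
    NR == FNR { for (i = 2; i <= NF; i++) tot1[$1] += $i
                idle1[$1] = $5 + $6; next }    # idle + iowait, sample 1
    { tot = 0; for (i = 2; i <= NF; i++) tot += $i
      dt = tot - tot1[$1]                      # total ticks elapsed per core
      di = ($5 + $6) - idle1[$1]               # idle ticks elapsed per core
      printf "%s %.1f%% busy\n", $1, 100 * (dt - di) / dt }
' /tmp/stat.1 /tmp/stat.2
Run from cron every 15 minutes with the output appended to a log file, this gives a per-core history without ever entering interactive top.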

Benchmarking CPU and File IO for an application running on Linux

I wrote two programs to run on Linux, each using a different algorithm, and I want to find a way (preferably using benchmarking software) to compare the CPU usage and I/O operations of these two programs.
Is there such a thing? And if so, where can I find it? Thanks.
You can try hardinfo.
There are also any number of tools that measure system performance, if measuring the system while your app runs serves your purpose.
And you can also check this thread.
You might try the vmstat command:
vmstat 2 20 > vmstat.txt
20 samples at 2-second intervals
bi = KB in, bo = KB out, with wa = time spent waiting for I/O
I/O can also increase cache demands
%CPU utilisation = us (user) + sy (system)
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 0 277504 17060 82732 0 0 91 87 1432 236 11 3 84 1
0 0 0 277372 17068 82732 0 0 0 24 1361 399 23 8 59 10
test start
0 1 0 275240 17068 82732 0 0 0 512 1342 305 24 4 69 4
2 1 0 275232 17068 82780 0 0 24 10752 4176 216 7 8 0 85
1 1 0 275240 17076 82732 0 0 12288 2590 5295 243 15 8 0 77
0 1 0 275240 17076 82748 0 0 8 11264 4329 214 6 12 0 82
0 1 0 275240 17076 82780 0 0 16 11264 4278 233 15 10 0 75
0 1 0 275240 17084 82780 0 0 19456 542 6563 255 10 7 0 83
0 1 0 275108 17084 82748 0 0 5128 3072 3501 265 16 37 0 47
3 1 0 275108 17084 82748 0 0 924 5120 8369 3845 12 33 0 55
0 1 0 275116 17092 82748 0 0 1576 85 11483 6645 5 50 0 45
1 1 0 275116 17092 82748 0 0 0 136 2304 689 3 9 0 88
2 1 0 275084 17100 82732 0 0 0 352 2374 800 14 26 0 61
0 0 0 275076 17100 82732 0 0 546 118 2408 1014 35 17 47 1
0 1 0 275076 17104 82732 0 0 0 62 1324 76 3 2 89 7
1 1 0 275076 17108 82732 0 0 0 452 1879 442 8 13 66 12
0 0 0 275116 17108 82732 0 0 800 352 2456 1195 19 17 56 8
0 1 0 275116 17112 82732 0 0 0 54 1325 76 4 1 88 8
test end
1 1 0 275116 17116 82732 0 0 0 510 1717 286 6 10 72 11
1 0 0 275076 17116 82732 0 0 1600 1152 3087 1344 23 29 41 7
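A minimal wrapper along those lines, so each run gets its own trace (the script and the ./algo1 name are placeholders; run it once per program and compare the two logs):
#!/bin/sh
# Sample system-wide vmstat while one program under test runs.
vmstat 2 20 > vmstat-algo1.txt &   # 20 samples, 2 s apart, as above
sampler=$!
./algo1                            # placeholder: the program under test
wait $sampler                      # let the sampler finish its samples
# Compare bi/bo (I/O), wa (I/O wait) and us/sy (CPU) between the logs.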

Debian Wheezy with default Kernel (3.2.0-4-amd64) high load average

I have recently upgraded to a new machine and to Debian Wheezy.
Everything is running fine, except that even with low actual load, the load average is too high.
Example:
14:29:35 up 9:49, 1 user, load average: 1.96, 2.22, 2.14
This happens even though all components are under low load (almost no I/O and all CPU cores are <50%).
top:
top - 14:30:31 up 9:50, 1 user, load average: 2.38, 2.32, 2.18
Tasks: 156 total, 3 running, 153 sleeping, 0 stopped, 0 zombie
%Cpu(s): 28.2 us, 1.3 sy, 0.0 ni, 69.8 id, 0.4 wa, 0.0 hi, 0.3 si, 0.0 st
KiB Mem: 32878740 total, 8036624 used, 24842116 free, 106544 buffers
KiB Swap: 16768892 total, 0 used, 16768892 free, 2897400 cached
iotop:
Total DISK READ: 0.00 B/s | Total DISK WRITE: 1004.39 B/s
vmstat:
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 0 24809288 107236 2921816 0 0 10 1 102 138 17 2 81 0
1 0 0 24809364 107244 2921908 0 0 6 6 14334 15108 24 1 75 0
4 0 0 24808784 107260 2921952 0 0 2 16 14407 15222 24 1 74 0
0 0 0 24808660 107272 2922096 0 0 4 14 14570 15373 26 1 73 0
1 0 0 24808156 107280 2922220 0 0 0 13 14783 15499 27 1 72 0
2 0 0 24807420 107292 2922684 0 0 0 23 14590 15344 26 1 72 0
uname -r:
3.2.0-4-amd64
Anyone got a clue?
Load average is usually the number of processes waiting for execution, as in the run queue; on Linux it also counts processes in uninterruptible sleep (state D), which are typically blocked on I/O or locks.
As it doesn't seem to be a problem with CPU or I/O, I would expect something like a shared-memory semaphore or network-dependent code.
Try to see which processes are in that state with:
# top -b -n 1 | awk '{if (NR <= 7) print; else if ($8 == "D") {print; count++}} END {print "Total status D: " count+0}'
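A leaner alternative along the same lines, using ps instead of a full top snapshot (the state keyword prints the one-character process state):
ps -eo state,pid,cmd | awk '$1 == "D" { print; n++ } END { print n+0, "processes in D state" }'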
