block tracepoints show dev 0,0. Isn't this invalid?

# trace-cmd record $OPTS systemctl suspend
# dmesg
...
[21976.161716] PM: suspend entry (deep)
[21976.161720] PM: Syncing filesystems ... done.
[21976.551178] Freezing user space processes ... (elapsed 0.003 seconds) done.
[21976.554240] OOM killer disabled.
[21976.554241] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[21976.555801] Suspending console(s) (use no_console_suspend to debug)
[21976.564650] sd 1:0:0:0: [sda] Synchronizing SCSI cache
[21976.573482] e1000e: EEE TX LPI TIMER: 00000011
[21976.622307] sd 1:0:0:0: [sda] Stopping disk
[21976.803789] PM: suspend devices took 0.248 seconds
...
# trace-cmd report -F 'block_rq_insert, block_rq_complete, block_rq_requeue' | less
...
<...>-27919 [003] 21976.567169: block_rq_insert: 0,0 N 0 () 0 + 0 [kworker/u8:12]
<idle>-0 [000] 21976.624751: block_rq_complete: 0,0 N () 18446744073709551615 + 0 [0]
<...>-27919 [003] 21976.624820: block_rq_insert: 0,0 N 0 () 0 + 0 [kworker/u8:12]
<idle>-0 [000] 21976.806090: block_rq_complete: 0,0 N () 18446744073709551615 + 0 [0]
kworker/u8:92-27999 [003] 21977.271943: block_rq_insert: 0,0 N 0 () 0 + 0 [kworker/u8:92]
kworker/u8:92-27999 [003] 21977.271948: block_rq_requeue: 0,0 N () 0 + 0 [0]
kworker/u8:92-27999 [003] 21977.271948: block_rq_insert: 0,0 N 0 () 0 + 0 [kworker/u8:92]
kworker/3:1H-478 [003] 21977.283873: block_rq_requeue: 0,0 N () 0 + 0 [0]
kworker/3:1H-478 [003] 21977.283874: block_rq_insert: 0,0 N 0 () 0 + 0 [kworker/3:1H]
kworker/3:1H-478 [003] 21977.287802: block_rq_requeue: 0,0 N () 0 + 0 [0]
kworker/3:1H-478 [003] 21977.287803: block_rq_insert: 0,0 N 0 () 0 + 0 [kworker/3:1H]
kworker/3:1H-478 [003] 21977.291781: block_rq_requeue: 0,0 N () 0 + 0 [0]
kworker/3:1H-478 [003] 21977.291781: block_rq_insert: 0,0 N 0 () 0 + 0 [kworker/3:1H]
kworker/3:1H-478 [003] 21977.295777: block_rq_requeue: 0,0 N () 0 + 0 [0]
kworker/3:1H-478 [003] 21977.295778: block_rq_insert: 0,0 N 0 () 0 + 0 [kworker/3:1H]
Other requests show dev 8,0, which is sda as expected. dev 0,0 is a reserved value for a null device. Why would the tracepoint show a bio on a null device? Isn't this an invalid operation?
Version of the Linux kernel and trace-cmd:
# uname -r
4.15.14-300.fc27.x86_64
# rpm -q trace-cmd
trace-cmd-2.6.2-1.fc27.x86_64

The 0,0 requests in this trace appear to be associated with non-data requests, e.g. SCSI SYNCHRONIZE_CACHE and START_STOP.
It seems to always happen like this: the tracepoints are hit for non-data requests (as well as the normal data ones), but in that case the block dev field is not set. This does not apply to userspace SG_IO requests, though; those seem to hit the tracepoints and show the real device value.
EDIT: this is how the block tracepoints are handled when there is no struct bio associated, e.g. blk_add_trace_getrq() in kernel/trace/blktrace.c:
static void blk_add_trace_getrq(void *ignore,
                                struct request_queue *q,
                                struct bio *bio, int rw)
{
        if (bio)
                blk_add_trace_bio(q, bio, BLK_TA_GETRQ, 0);
        else {
                struct blk_trace *bt = q->blk_trace;

                /* No bio: the event is emitted with sector 0 and length 0,
                 * so no device/sector information is recorded. */
                if (bt)
                        __blk_add_trace(bt, 0, 0, rw, 0, BLK_TA_GETRQ, 0, 0,
                                        NULL, NULL);
        }
}
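For the request-based events above (block_rq_insert, block_rq_issue, block_rq_complete) the pattern is similar: the dev field is only filled in when the request has a gendisk attached. A rough sketch of the assignment logic, loosely based on include/trace/events/block.h (the exact helpers and field names vary between kernel versions, so treat this as illustrative rather than exact):

/* Illustrative sketch of the TP_fast_assign logic for the block_rq_* events;
 * exact helpers differ between kernel versions. */
__entry->dev       = rq->rq_disk ? disk_devt(rq->rq_disk) : 0;  /* 0,0 when no gendisk is set */
__entry->sector    = blk_rq_is_passthrough(rq) ? 0 : blk_rq_pos(rq);
__entry->nr_sector = blk_rq_is_passthrough(rq) ? 0 : blk_rq_sectors(rq);

Passthrough requests such as the SYNCHRONIZE_CACHE and START_STOP commands in this trace carry no bio and no gendisk, which would explain why they are logged as dev 0,0 with sector and length 0 rather than 8,0.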
Example trace:
# trace-cmd report | less
...
<...>-28415 [001] 21976.558455: suspend_resume: dpm_suspend[2] begin
<...>-27919 [003] 21976.567166: block_getrq: 0,0 R 0 + 0 [kworker/u8:12]
<...>-27919 [003] 21976.567169: block_rq_insert: 0,0 N 0 () 0 + 0 [kworker/u8:12]
<...>-27919 [003] 21976.567171: block_rq_issue: 0,0 N 0 () 0 + 0 [kworker/u8:12]
<...>-27919 [003] 21976.567175: scsi_dispatch_cmd_start: host_no=1 channel=0 id=0 lun=0 data_sgl=0 prot_sgl=0 prot_op=0x0 cmnd=(SYNCHRONIZE_CACHE - raw=35 00 00 00 00 00 00 00 00 00)
<...>-27965 [000] 21976.574023: scsi_eh_wakeup: host_no=0
<...>-28011 [003] 21976.575989: block_touch_buffer: 253,0 sector=9961576 size=4096
<...>-28011 [003] 21976.576000: block_touch_buffer: 253,0 sector=9961576 size=4096
<...>-28011 [003] 21976.576003: block_dirty_buffer: 253,0 sector=6260135 size=4096
<...>-28011 [003] 21976.576006: block_touch_buffer: 253,0 sector=9961576 size=4096
irq/49-mei_me-28010 [000] 21976.578250: block_touch_buffer: 253,0 sector=9961576 size=4096
irq/49-mei_me-28010 [000] 21976.578256: block_touch_buffer: 253,0 sector=9961576 size=4096
irq/49-mei_me-28010 [000] 21976.578258: block_dirty_buffer: 253,0 sector=6260135 size=4096
irq/49-mei_me-28010 [000] 21976.578259: block_touch_buffer: 253,0 sector=9961576 size=4096
<idle>-0 [000] 21976.624746: scsi_dispatch_cmd_done: host_no=1 channel=0 id=0 lun=0 data_sgl=0 prot_sgl=0 prot_op=0x0 cmnd=(SYNCHRONIZE_CACHE - raw=35 00 00 00 00 00 00 00 00 00) result=(driver=DRIVER_OK host=DID_OK message=COMMAND_COMPLETE status=SAM_STAT_GOOD)
<idle>-0 [000] 21976.624751: block_rq_complete: 0,0 N () 18446744073709551615 + 0 [0]
<...>-27919 [003] 21976.624817: block_getrq: 0,0 R 0 + 0 [kworker/u8:12]
<...>-27919 [003] 21976.624820: block_rq_insert: 0,0 N 0 () 0 + 0 [kworker/u8:12]
<...>-27919 [003] 21976.624821: block_rq_issue: 0,0 N 0 () 0 + 0 [kworker/u8:12]
<...>-27919 [003] 21976.624824: scsi_dispatch_cmd_start: host_no=1 channel=0 id=0 lun=0 data_sgl=0 prot_sgl=0 prot_op=0x0 cmnd=(START_STOP - raw=1b 00 00 00 00 00)
<idle>-0 [000] 21976.806085: scsi_dispatch_cmd_done: host_no=1 channel=0 id=0 lun=0 data_sgl=0 prot_sgl=0 prot_op=0x0 cmnd=(START_STOP - raw=1b 00 00 00 00 00) result=(driver=DRIVER_OK host=DID_OK message=COMMAND_COMPLETE status=SAM_STAT_GOOD)
<idle>-0 [000] 21976.806090: block_rq_complete: 0,0 N () 18446744073709551615 + 0 [0]
kworker/u8:66-27973 [000] 21976.806190: scsi_eh_wakeup: host_no=1
<...>-28415 [001] 21976.806290: suspend_resume: dpm_suspend[2] end
...
<...>-28415 [000] 21977.261494: suspend_resume: dpm_resume[16] begin
kworker/u8:31-27938 [002] 21977.271875: scsi_eh_wakeup: host_no=0
kworker/u8:33-27940 [000] 21977.271884: scsi_eh_wakeup: host_no=1
kworker/u8:92-27999 [003] 21977.271928: funcgraph_entry: | sd_resume() {
kworker/u8:92-27999 [003] 21977.271941: block_getrq: 0,0 R 0 + 0 [kworker/u8:92]
kworker/u8:92-27999 [003] 21977.271943: block_rq_insert: 0,0 N 0 () 0 + 0 [kworker/u8:92]
kworker/u8:92-27999 [003] 21977.271945: block_rq_issue: 0,0 N 0 () 0 + 0 [kworker/u8:92]
kworker/u8:92-27999 [003] 21977.271948: block_rq_requeue: 0,0 N () 0 + 0 [0]
kworker/u8:92-27999 [003] 21977.271948: block_rq_insert: 0,0 N 0 () 0 + 0 [kworker/u8:92]
kworker/3:1H-478 [003] 21977.283872: block_rq_issue: 0,0 N 0 () 0 + 0 [kworker/3:1H]
kworker/3:1H-478 [003] 21977.283873: block_rq_requeue: 0,0 N () 0 + 0 [0]
kworker/3:1H-478 [003] 21977.283874: block_rq_insert: 0,0 N 0 () 0 + 0 [kworker/3:1H]
kworker/3:1H-478 [003] 21977.287801: block_rq_issue: 0,0 N 0 () 0 + 0 [kworker/3:1H]
kworker/3:1H-478 [003] 21977.287802: block_rq_requeue: 0,0 N () 0 + 0 [0]
kworker/3:1H-478 [003] 21977.287803: block_rq_insert: 0,0 N 0 () 0 + 0 [kworker/3:1H]
kworker/3:1H-478 [003] 21977.291780: block_rq_issue: 0,0 N 0 () 0 + 0 [kworker/3:1H]
kworker/3:1H-478 [003] 21977.291781: block_rq_requeue: 0,0 N () 0 + 0 [0]
kworker/3:1H-478 [003] 21977.291781: block_rq_insert: 0,0 N 0 () 0 + 0 [kworker/3:1H]
...
kworker/3:1H-478 [003] 21977.811763: block_rq_insert: 0,0 N 0 () 0 + 0 [kworker/3:1H]
kworker/3:1H-478 [003] 21977.818229: block_rq_issue: 0,0 N 0 () 0 + 0 [kworker/3:1H]
kworker/3:1H-478 [003] 21977.818231: block_rq_requeue: 0,0 N () 0 + 0 [0]
kworker/3:1H-478 [003] 21977.818231: block_rq_insert: 0,0 N 0 () 0 + 0 [kworker/3:1H]
<...>-28415 [001] 21977.819038: suspend_resume: dpm_resume[16] end
<...>-28415 [001] 21977.819039: suspend_resume: dpm_complete[16] begin
<...>-28415 [001] 21977.819228: suspend_resume: dpm_complete[16] end
<...>-28415 [001] 21977.819230: suspend_resume: resume_console[3] begin
<...>-28415 [001] 21977.819231: suspend_resume: resume_console[3] end
<...>-28415 [001] 21977.821284: suspend_resume: thaw_processes[0] begin
kworker/3:1H-478 [003] 21977.821775: block_rq_issue: 0,0 N 0 () 0 + 0 [kworker/3:1H]
kworker/3:1H-478 [003] 21977.821778: block_rq_requeue: 0,0 N () 0 + 0 [0]
kworker/3:1H-478 [003] 21977.821779: block_rq_insert: 0,0 N 0 () 0 + 0 [kworker/3:1H]
...
kworker/3:1H-478 [003] 21979.121804: block_rq_insert: 0,0 N 0 () 0 + 0 [kworker/3:1H]
kworker/0:3-27785 [000] 21979.121918: block_getrq: 0,0 R 0 + 0 [kworker/0:3]
kworker/0:3-27785 [000] 21979.121928: block_rq_insert: 0,0 N 255 () 0 + 0 [kworker/0:3]
kworker/0:3-27785 [000] 21979.121930: block_rq_issue: 0,0 N 255 () 0 + 0 [kworker/0:3]
kworker/0:3-27785 [000] 21979.121934: block_rq_requeue: 0,0 N () 0 + 0 [0]
kworker/0:3-27785 [000] 21979.121935: block_rq_insert: 0,0 N 255 () 0 + 0 [kworker/0:3]
scsi_eh_1-107 [000] 21979.122665: block_rq_issue: 0,0 N 255 () 0 + 0 [scsi_eh_1]
scsi_eh_1-107 [000] 21979.122669: scsi_dispatch_cmd_start: host_no=1 channel=0 id=0 lun=0 data_sgl=1 prot_sgl=0 prot_op=0x0 cmnd=(INQUIRY - raw=12 01 00 00 ff 00)
scsi_eh_1-107 [000] 21979.122675: scsi_dispatch_cmd_done: host_no=1 channel=0 id=0 lun=0 data_sgl=1 prot_sgl=0 prot_op=0x0 cmnd=(INQUIRY - raw=12 01 00 00 ff 00) result=(driver=DRIVER_OK host=DID_OK message=COMMAND_COMPLETE status=SAM_STAT_GOOD)
scsi_eh_1-107 [000] 21979.122679: block_rq_issue: 0,0 N 0 () 0 + 0 [scsi_eh_1]
scsi_eh_1-107 [000] 21979.122681: scsi_dispatch_cmd_start: host_no=1 channel=0 id=0 lun=0 data_sgl=0 prot_sgl=0 prot_op=0x0 cmnd=(START_STOP - raw=1b 00 00 00 01 00)
...
...
<...>-7 [000] 21979.122875: block_rq_complete: 0,0 N () 18446744073709551615 + 1 [0]
hdparm-28438 [002] 21979.123335: funcgraph_entry: | sd_ioctl() {
hdparm-28438 [002] 21979.123342: funcgraph_entry: | scsi_cmd_blk_ioctl() {
<idle>-0 [000] 21979.151036: scsi_dispatch_cmd_done: host_no=1 channel=0 id=0 lun=0 data_sgl=0 prot_sgl=0 prot_op=0x0 cmnd=(START_STOP - raw=1b 00 00 00 01 00) result=(driver=DRIVER_OK host=DID_OK message=COMMAND_COMPLETE status=SAM_STAT_GOOD)
<idle>-0 [000] 21979.151040: block_rq_complete: 0,0 N () 18446744073709551615 + 0 [0]
kworker/u8:92-27999 [003] 21979.151083: funcgraph_exit: $ 1879152 us | }
hdparm-28438 [002] 21979.151135: block_getrq: 0,0 R 0 + 0 [hdparm]
hdparm-28438 [002] 21979.151139: block_rq_insert: 8,0 N 0 () 0 + 0 [hdparm]
hdparm-28438 [002] 21979.151141: block_rq_issue: 8,0 N 0 () 0 + 0 [hdparm]
hdparm-28438 [002] 21979.151145: scsi_dispatch_cmd_start: host_no=1 channel=0 id=0 lun=0 data_sgl=0 prot_sgl=0 prot_op=0x0 cmnd=(ATA_16 - raw=85 06 20 00 05 00 fe 00 00 00 00 00 00 40 ef 00)
hdparm-28438 [003] 21979.151250: funcgraph_exit: # 27907.436 us | }
hdparm-28438 [003] 21979.151251: funcgraph_exit: # 27914.313 us | }
# dmesg
...
[21977.269427] sd 1:0:0:0: [sda] Starting disk
...
[21977.816724] PM: resume devices took 0.558 seconds
[21977.818781] OOM killer enabled.
[21977.818782] Restarting tasks ...
...
[21979.032279] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[21979.120143] ata2.00: configured for UDMA/133

Related

cgroup and process memory usage mismatch

I have a memory cgroup with one process in it.
When I look at the rss memory usage of that cgroup (in memory.stat), it is much bigger than the rss memory of the process (from /proc/[pid]/status).
The only process pid in cgroup:
$ cat /sys/fs/cgroup/memory/karim/master/cgroup.procs
3744924
The memory limit in cgroup:
$ cat /sys/fs/cgroup/memory/karim/master/memory.limit_in_bytes
7340032000
rss of the cgroup is 990 MB:
$ cat /sys/fs/cgroup/memory/karim/master/memory.stat
cache 5990449152
rss 990224384
rss_huge 0
shmem 0
mapped_file 13516800
dirty 1081344
writeback 270336
pgpgin 4195191
pgpgout 2490628
pgfault 5264589
pgmajfault 0
inactive_anon 0
active_anon 990240768
inactive_file 5862830080
active_file 127021056
unevictable 0
hierarchical_memory_limit 7340032000
total_cache 5990449152
total_rss 990224384
total_rss_huge 0
total_shmem 0
total_mapped_file 13516800
total_dirty 1081344
total_writeback 270336
total_pgpgin 4195191
total_pgpgout 2490628
total_pgfault 5264589
total_pgmajfault 0
total_inactive_anon 0
total_active_anon 990240768
total_inactive_file 5862830080
total_active_file 127021056
total_unevictable 0
rss of the process is 165 MB:
$ cat /proc/3744924/status
Name: [main] /h
Umask: 0002
State: S (sleeping)
Tgid: 3744924
Ngid: 0
Pid: 3744924
PPid: 3744912
TracerPid: 0
Uid: 1000 1000 1000 1000
Gid: 1001 1001 1001 1001
FDSize: 256
Groups: 1000 1001
NStgid: 3744924
NSpid: 3744924
NSpgid: 3744912
NSsid: 45028
VmPeak: 2149068 kB
VmSize: 2088876 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 245352 kB
VmRSS: 198964 kB
RssAnon: 165248 kB
RssFile: 33660 kB
RssShmem: 56 kB
VmData: 575400 kB
VmStk: 132 kB
VmExe: 3048 kB
VmLib: 19180 kB
VmPTE: 1152 kB
VmSwap: 0 kB
HugetlbPages: 0 kB
CoreDumping: 0
THP_enabled: 1
Threads: 17
SigQ: 0/241014
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000001001000
SigCgt: 0000000180000002
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
NoNewPrivs: 0
Seccomp: 0
Speculation_Store_Bypass: thread vulnerable
Cpus_allowed: fff
Cpus_allowed_list: 0-11
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 94902
nonvoluntary_ctxt_switches: 1903
Why is there such a big difference?

The trace-cmd report does not show the function name

I used the following commands to trace the kernel:
$ trace-cmd record -p function_graph ls
$ trace-cmd report
but the resulting report just shows addresses instead of function names:
MtpServer-4877 [000] 1706.014074: funcgraph_exit: + 23.875 us | }
trace-cmd-4892 [001] 1706.014075: funcgraph_entry: 2.000 us | ffff0000082cdc24();
trace-cmd-4895 [002] 1706.014076: funcgraph_entry: 1.250 us | ffff00000829e704();
MtpServer-4877 [000] 1706.014076: funcgraph_entry: 1.375 us | ffff0000083266bc();
kswapd0-1024 [003] 1706.014078: funcgraph_entry: | ffff00000827956c() {
kswapd0-1024 [003] 1706.014081: funcgraph_entry: | ffff00000827801c() {
trace-cmd-4895 [002] 1706.014081: funcgraph_entry: 1.375 us | ffff0000082bd8b4();
MtpServer-4877 [000] 1706.014082: funcgraph_entry: 1.375 us | ffff0000082ccefc();
trace-cmd-4892 [001] 1706.014082: funcgraph_entry: | ffff0000082c5adc() {
kswapd0-1024 [003] 1706.014084: funcgraph_entry: 1.500 us | ffff00000828c8f0();
trace-cmd-4892 [001] 1706.014085: funcgraph_entry: 1.250 us | ffff0000082c5a58();
MtpServer-4877 [000] 1706.014088: funcgraph_entry: 1.125 us | ffff0000082e3a30();
trace-cmd-4895 [002] 1706.014089: funcgraph_exit: + 19.125 us | }
kswapd0-1024 [003] 1706.014090: funcgraph_entry: 1.500 us | ffff0000090b6c04();
trace-cmd-4895 [002] 1706.014090: funcgraph_entry: | ffff0000082d4ffc() {
trace-cmd-4892 [001] 1706.014092: funcgraph_exit: 6.875 us | }
trace-cmd-4895 [002] 1706.014093: funcgraph_entry: 1.000 us | ffff0000090b3a40();
How can I get trace-cmd to show the actual function names in its output?
I ran into this issue, and for me it was caused by /proc/kallsyms (a file which maps kernel addresses to symbol names) displaying all zeros for kernel addresses.
Since this is a procfs "file", its behavior can, and does, vary depending on what context you are accessing it from.
In this case, it is due to security checks.
If you pass the checks, the addresses will be nonzero and look similar to this:
000000000000c000 A exception_stacks
0000000000014000 A entry_stack_storage
0000000000015000 A espfix_waddr
0000000000015008 A espfix_stack
If you fail the checks they will be zero, like this:
(In my case I did not have the CAP_SYSLOG capability because I was running in a container and systemd-nspawn's default behaviour was dropping the capability. [1])
0000000000000000 A exception_stacks
0000000000000000 A entry_stack_storage
0000000000000000 A espfix_waddr
0000000000000000 A espfix_stack
This is influenced by kernel settings; more information (along with the rather simple checking code) can be found in the answers to Allow single user to access /proc/kallsyms.
To fix this issue, you need to run trace-cmd with the proper permissions/capabilities.
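If you want a quick programmatic check of whether your current context sees real addresses, something like the following works; this is just an illustrative standalone sketch, not part of trace-cmd:

#include <stdio.h>

/* Check whether /proc/kallsyms shows real addresses or only zeros in the
 * current context (illustrative sketch, not part of trace-cmd). */
int main(void)
{
        FILE *f = fopen("/proc/kallsyms", "r");
        char line[512], type, name[256];
        unsigned long long addr;

        if (!f) {
                perror("/proc/kallsyms");
                return 1;
        }
        while (fgets(line, sizeof(line), f)) {
                if (sscanf(line, "%llx %c %255s", &addr, &type, name) == 3 &&
                    addr != 0) {
                        printf("addresses visible, e.g. %llx %c %s\n",
                               addr, type, name);
                        fclose(f);
                        return 0;
                }
        }
        printf("addresses read as zero (missing CAP_SYSLOG or kptr_restrict)\n");
        fclose(f);
        return 1;
}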
[1] This is what the capability settings look like for an nspawned process with no modifications:
# cat /proc/2894/status | grep -i cap
CapInh: 0000000000000000
CapPrm: 00000000fdecafff
CapEff: 00000000fdecafff
CapBnd: 00000000fdecafff
CapAmb: 0000000000000000
This is what it looks like for a process started by root in a "normal" context:
# cat /proc/616428/status | grep -i cap
CapInh: 0000000000000000
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000
Alternative solution
This is speculation on my part, but I expect this manner of censoring in the kallsyms file is to prevent defeating kASLR by just reading addresses of memory locations out of the file.
It seems that if you use the tracefs ftrace interface directly, as opposed to recording raw trace events and then resolving symbols afterwards (I don't know if there is a way to make trace-cmd do this), then the kernel resolves the symbol names for you without exposing kernel addresses, so this is not an issue.
For example, working directly in the tracefs mount (usually /sys/kernel/tracing); note how capsh says !cap_syslog:
# capsh --print
Current: =ep cap_syslog-ep
Bounding set = [...elided for brevity ...]
Ambient set =
Current IAB: !cap_syslog
Securebits: 00/0x0/1'b0 (no-new-privs=0)
secure-noroot: no (unlocked)
secure-no-suid-fixup: no (unlocked)
secure-keep-caps: no (unlocked)
secure-no-ambient-raise: no (unlocked)
uid=0(root) euid=0(root)
gid=0(root)
groups=0(root)
Guessed mode: HYBRID (4)
# echo 1 > events/enable
# echo function_graph > current_tracer
# echo 1 > tracing_on
# head trace
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
2) 0.276 us | } /* fpregs_assert_state_consistent */
1) 2.052 us | } /* exit_to_user_mode_prepare */
1) | syscall_trace_enter.constprop.0() {
1) | __traceiter_sys_enter() {
1) | /* sys_futex(uaddr: 7fc3565b9378, op: 80, val: 0, utime: 0, uaddr2: 0, val3: 0) */
1) | /* sys_enter: NR 202 (7fc3565b9378, 80, 0, 0, 0, 0) */

Unexpectedly high latency spikes measured with hwlat

I am using the hwlat_detector tracer to measure system latencies that are caused outside of Linux.
The machine used for the measurements is an Asus Zenbook ux331un. Processor power states are disabled with the idle=poll kernel command-line parameter, hyper-threading is disabled, and turbo boost is disabled.
Here is an example output of hwlat tracer.
<...>-4976 [002] d...... 3258.677286: #81 inner/outer(us): 31/33 ts:1572267806.608681762
<...>-4976 [003] dn..... 3276.085073: #82 inner/outer(us): 32/34 ts:1572267824.016656104
<...>-4976 [000] d.L.... 3277.113063: #83 inner/outer(us): 32/0 ts:1572267825.044656725
<...>-4976 [003] dn..... 3280.181030: #84 inner/outer(us): 32/32 ts:1572267828.112657027
<...>-4976 [000] dnL.... 3281.205020: #85 inner/outer(us): 32/32 ts:1572267829.136657127
<...>-4976 [002] dn..... 3283.252999: #86 inner/outer(us): 456/33 ts:1572267831.184659048
<...>-4976 [003] dn..... 3284.276986: #87 inner/outer(us): 16/32 ts:1572267832.208656275
<...>-4976 [000] dnL.... 3285.300975: #88 inner/outer(us): 16/33 ts:1572267833.232656714
<...>-4976 [002] dn..... 3287.348959: #89 inner/outer(us): 20/33 ts:1572267835.280662760
<...>-4976 [003] dn..... 3288.372943: #90 inner/outer(us): 33/34 ts:1572267836.304657092
<...>-4976 [000] dnL.... 3289.396938: #91 inner/outer(us): 32/32 ts:1572267837.328663631
<...>-4976 [002] dn..... 3291.444912: #92 inner/outer(us): 20/33 ts:1572267839.376659443
As can be seen from the output, latencies of over 400 us occur. I am also getting a lot of latencies in the 30 us range. At first I thought the latencies were caused by SMIs, so I started reading MSR register 0x34, which counts the number of SMIs that have occurred since the system was booted. It does not increase while the test is running. What else could be the reason for such latency?
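For reference, this is roughly how that SMI counter can be read; a minimal sketch using the msr driver (the device path and the MSR number 0x34, MSR_SMI_COUNT on Intel CPUs, are assumptions based on Intel documentation, not something taken from the trace above):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Minimal sketch: read MSR 0x34 (MSR_SMI_COUNT on Intel CPUs) for CPU 0
 * via the msr driver. Requires root and the "msr" kernel module. */
int main(void)
{
        uint64_t smi_count;
        int fd = open("/dev/cpu/0/msr", O_RDONLY);

        if (fd < 0) {
                perror("open /dev/cpu/0/msr");
                return 1;
        }
        /* For the msr device, the file offset selects the MSR number. */
        if (pread(fd, &smi_count, sizeof(smi_count), 0x34) != sizeof(smi_count)) {
                perror("read MSR 0x34");
                close(fd);
                return 1;
        }
        printf("SMI count since boot: %llu\n", (unsigned long long)smi_count);
        close(fd);
        return 0;
}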

Linux, C, lock analysis

I am writing a TCP socket application.
In the case of a high number of TCP connections, I found there is a lot of lock activity:
slock-AF_INET
sk_lock-AF_INET
read rcu_read_loc
I then did some simple stats on those locks:
root#ubuntu1604:~/test-tool-for-linux/src# perf lock script
......
test-tool 21098 [000] 7591.565861: lock:lock_acquire: 0xffff9eea2ffeec20 slock-AF_INET
test-tool 21098 [000] 7591.565865: lock:lock_acquire: 0xffffffff93e68160 read rcu_read_loc
test-tool 21098 [000] 7591.565866: lock:lock_acquire: 0xffff9eea5e804be0 slock-AF_INET
test-tool 21098 [000] 7591.565866: lock:lock_acquire: 0xffff9eea5e804c70 sk_lock-AF_INET
test-tool 21098 [000] 7591.565867: lock:lock_acquire: 0xffff9eea5e804be0 slock-AF_INET
test-tool 21098 [000] 7591.565868: lock:lock_acquire: 0xffffffff93e68160 read rcu_read_loc
test-tool 21098 [000] 7591.565869: lock:lock_acquire: 0xffff9eea5e804be0 slock-AF_INET
test-tool 21098 [000] 7591.565870: lock:lock_acquire: 0xffff9eea5e804c70 sk_lock-AF_INET
test-tool 21098 [000] 7591.565872: lock:lock_acquire: 0xffff9eea5e804be0 slock-AF_INET
test-tool 21098 [000] 7591.565875: lock:lock_acquire: 0xffffffff93e68160 read rcu_read_loc
test-tool 21098 [000] 7591.565876: lock:lock_acquire: 0xffff9eea1d96a0e0 slock-AF_INET
root#ubuntu1604:~/test-tool-for-linux/src# perf lock script > lock.log
Warning:
Processed 9195881 events and lost 224 chunks!
Check IO/CPU overload!
root#ubuntu1604:~/test-tool-for-linux/src# cat lock.log | grep slock-AF_INET | wc -l
604748
root#ubuntu1604:~/test-tool-for-linux/src# cat lock.log | grep sk_lock-AF_INET | wc -l
59991
root#ubuntu1604:~/test-tool-for-linux/src# cat lock.log | grep rcu_read_loc | wc -l
4806082
Could you please help me understand what those locks are and how the high lock usage could potentially be reduced?
Thanks!

Disable kernel tracer

I have installed Ubuntu 12.04 (32-bit). The current tracer is set to nop.
cat current_tracer
nop
Although the current tracer is nop, the following messages keep being printed continuously while I am performing other operations. Why is this happening? How can I stop these messages from being printed?
<...>-573 [003] .... 6.304043: do_sys_open: "/etc/modprobe.d/blacklist-firewire.conf" 0 666
<...>-573 [003] .... 6.304055: do_sys_open: "/etc/modprobe.d/blacklist-framebuffer.conf" 0 666
<...>-569 [000] .... 6.304073: do_sys_open: "/run/udev/data/c4:73" 88000 666
<...>-573 [003] .... 6.304077: do_sys_open: "/etc/modprobe.d/blacklist-modem.conf" 0 666
<...>-573 [003] .... 6.304087: do_sys_open: "/etc/modprobe.d/blacklist-oss.conf" 0 666
<...>-573 [003] .... 6.304119: do_sys_open: "/etc/modprobe.d/blacklist-rare-network.conf" 0 666
<...>-573 [003] .... 6.304135: do_sys_open: "/etc/modprobe.d/blacklist-watchdog.conf" 0 666
<...>-573 [003] .... 6.304166: do_sys_open: "/etc/modprobe.d/blacklist.conf" 0 666
<...>-569 [000] .... 6.304180: do_sys_open: "/run/udev/data/c4:73.tmp" 88241 666
<...>-573 [003] .... 6.304190: do_sys_open: "/etc/modprobe.d/vmwgfx-fbdev.conf" 0 666
Thank you in advance.
Have you tried echo 0 > tracing_on?
Have you tried echo notrace_printk > trace_options?
However, if you are concerned about ftrace's overhead, you should do more than this and disable ftrace entirely.
If you are unsure of how to deal with ftrace, you can also look at the trace-cmd command.
In particular, try trace-cmd reset.

Resources