BASH - is it possible to check memory usage of command? - linux

I want to check how much resources took each command in my script
example code
#!/bin/bash
mkdir /tmp/{1..100}
touch /tmp/{1..100}
I want to know what resource was taken by mkdir and what by touch

You can use the command perf, and it's really complete.
Intall it with those 2 commands:
sudo apt install linux-tools-common
sudo apt install linux-tools-5.4.0-47-generic
test#test:~/work$ sudo perf stat ls
perf.data
Performance counter stats for 'ls':
0.83 msec task-clock # 0.793 CPUs utilized
2 context-switches # 0.002 M/sec
0 cpu-migrations # 0.000 K/sec
113 page-faults # 0.136 M/sec
<not supported> cycles
<not supported> instructions
<not supported> branches
<not supported> branch-misses
0.001051057 seconds time elapsed
0.000989000 seconds user
0.000000000 seconds sys
This my repository
test#test:~$ cat /etc/apt/sources.list
deb http://fr.archive.ubuntu.com/ubuntu focal-security universe
deb http://fr.archive.ubuntu.com/ubuntu focal-security multiverse
My distribution
test#test:~$ uname -a
Linux test 5.4.0-47-generic #51-Ubuntu SMP Fri Sep 4 19:50:52 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Related

How to enable hardware events to use in the perf program?

I'm running the following command line using perf
$ perf stat ./program
[...]
<not supported> instructions
<not supported> branches
<not supported> branch-misses
<not supported> cycles
[...]
$ cat /proc/sys/kernel/perf_event_paranoid
3
How do I enable these events? Mainly instructions and cycles?
I already changed the value to -1 in the /proc/sys/kernel/perf_event_paranoid file, but it keeps the events off, when I reboot, it goes back to the original value which is 3.

Linux perf record not generating any samples

I am trying to profile my userspace program on aria10 fpga board (with 2 ARM Cortex A9 CPUs) which has PMU support. I am running windriver linux version 9.x. I built my kernel with almost all of the CONFIG_ options people suggested over the internet. Also, my pgm is compiled with –fno-omit-frame-pointer and –g options.
What I see is that ‘perf record’ doesn’t generate any samples at all. ‘perf stat true’ output looks to be valid though (not sure what to make out of it). Does anyone have suggestion/ideas why I am not seeing any sample being generated?
~: perf record --call-graph dwarf -- my_app
^C
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.003 MB perf.data ]
~: perf report -g graph --no-children
Error:
The perf.data file has no samples!
To display the perf.data header info, please use --header/--header-only options.
~: perf stat true
Performance counter stats for 'true':
1.095300 task-clock (msec) # 0.526 CPUs utilized
0 context-switches # 0.000 K/sec
0 cpu-migrations # 0.000 K/sec
22 page-faults # 0.020 M/sec
1088056 cycles # 0.993 GHz
312708 instructions # 0.29 insn per cycle
29159 branches # 26.622 M/sec
16386 branch-misses # 56.20% of all branches
0.002082030 seconds time elapsed
I don't use a VM in this setup. Arria10 is intel FPGA with 2 ARM CPUs that supports PMU.
Edit:
1. I realize now that ARM CPU has HW PMU support (opposite to what I mentioned earlier). Even with HW PMU support, I am not able to do 'perf record' successfully.
This is an old question, but for people who find this via search:
perf record -e cpu-clock <command>
works for me. The problem seems to be that th default event (cycles) is not available

Why does perf show that sleep takes all cores?

I am trying to familiarize myself with perf and run it against various programs I wrote.
When I launch it against program that is 100% single threaded, perf shows that it takes two cores on machine (task-clock event).
Here's the example output:
perf stat -a --per-core python3 test.py
Performance counter stats for 'system wide':
S0-C0 1 19004.951263 task-clock (msec) # 1.000 CPUs utilized (100.00%)
S0-C0 1 5,582 context-switches (100.00%)
S0-C0 1 19 cpu-migrations (100.00%)
S0-C0 1 3,746 page-faults
S0-C0 1 <not supported> cycles
S0-C0 1 <not supported> stalled-cycles-frontend
S0-C0 1 <not supported> stalled-cycles-backend
S0-C0 1 <not supported> instructions
S0-C0 1 <not supported> branches
S0-C0 1 <not supported> branch-misses
S0-C1 1 19004.950059 task-clock (msec) # 1.000 CPUs utilized (100.00%)
S0-C1 1 6,752 context-switches (100.00%)
S0-C1 1 25 cpu-migrations (100.00%)
S0-C1 1 935 page-faults
S0-C1 1 <not supported> cycles
S0-C1 1 <not supported> stalled-cycles-frontend
S0-C1 1 <not supported> stalled-cycles-backend
S0-C1 1 <not supported> instructions
S0-C1 1 <not supported> branches
S0-C1 1 <not supported> branch-misses
19.004688019 seconds time elapsed
It even shows that simple sleep command takes two cores on my computer and I can't explain this. I understand that OS scheduler can reassign active core for any process, but in this case CPU utilization would reflect that.
Can anyone explain this?
According to man page of perf stat subocmmand, you have -a option to profile full system:
http://man7.org/linux/man-pages/man1/perf-stat.1.html
-a, --all-cpus
system-wide collection from all CPUs (default if no target is
specified)
In this "system-wide" mode perf stat (and perf record too) will count events on (or profile for record) all CPUs in the system. When used without additional argument of command, perf will run until interrupted by Ctrl-C. With argument of command, perf will count/profile until the command works. Typical usage is
perf stat -a sleep 10 # Profile counting every CPU for 10 seconds
perf record -a sleep 10 # Profile with cycles every CPU for 10 seconds to perf.data
For getting stats of single command use single process profiling (without -a option)
perf stat python3 test.py
For profiling (perf record) you may run without -a option; or you may use -a and later do some manual filtering in perf report, focusing only on the pids/tids/dsos of your application (This can be very useful if command to profile uses some interprocess requests to other daemons to do lot of CPU work).
--per-core, -A, -C <cpulist>, --per-socket options are only for system-wide -a mode. Try --per-thread with -p pid attach to process option.

perf get time elasped with field separator option

I have a program which parses the output of the linux command perf. It requires the use of option -x, (the field separator option. I want to extract elapsed time (not task-time or cpu-clock) using perf. However when I use the -x option, the elapsed time is not present in the output and I cannot find a corresponding perf event. Here are the sample outputs
perf stat ls
============
Performance counter stats for 'ls':
0.934889 task-clock (msec) # 0.740 CPUs utilized
6 context-switches # 0.006 M/sec
0 cpu-migrations # 0.000 K/sec
261 page-faults # 0.279 M/sec
1,937,910 cycles # 2.073 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
1,616,944 instructions # 0.83 insns per cycle
317,016 branches # 339.095 M/sec
12,439 branch-misses # 3.92% of all branches
0.001262625 seconds time elapsed //here we have it
Now with field separator option
perf stat -x, ls
================
2.359807,task-clock
6,context-switches
0,cpu-migrations
261,page-faults
1863028,cycles
<not supported>,stalled-cycles-frontend
<not supported>,stalled-cycles-backend
1670644,instructions
325047,branches
12251,branch-misses
Any help is appreciated
# perf stat ls 2>&1 >/dev/null | tail -n 2 | sed 's/ \+//' | sed 's/ /,/'
0.002272536,seconds time elapsed
Starting with kernel 5.2-rc1, a new event called duration_time is exposed by perf statto solve exactly this problem. The value of this event is exactly equal to the time elapsed value, but the unit is nanoseconds instead of seconds.

Perf does not support some performance events

I want to measure stalled cycles for my application using perf.
When I try: perf stat -B dd if=/dev/zero of=/dev/null count=1000000
1000000+0 records in
1000000+0 records out
512000000 bytes (512 MB) copied, 0.218456 s, 2.3 GB/s
Performance counter stats for 'dd if=/dev/zero of=/dev/null count=1000000':
218.420011 task-clock # 0.995 CPUs utilized
25 context-switches # 0.000 M/sec
1 CPU-migrations # 0.000 M/sec
255 page-faults # 0.001 M/sec
821,183,099 cycles # 3.760 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
1,526,427,190 instructions # 1.86 insns per cycle
292,281,624 branches # 1338.163 M/sec
1,013,837 branch-misses # 0.35% of all branches
0.219551862 seconds time elapsed
As you can see, I'm getting for stalled-cycles* events. I couldn't find a solution or explanation for this online.
My kernel version is 3.2.0-59, perf version is 3.2.54, and my CPU is an i7-3770.

Resources