Profiling for Node.js Application

I followed this guide, https://nodejs.org/uk/docs/guides/simple-profiling/, to profile a particular endpoint in my app (Express.js).
The endpoint downloads a PDF from S3, uses a thread pool (a pool of 4 worker_threads) to fill the PDF with data (I use HummusJS for the PDF filling), then uploads the filled file to S3 and responds with a signed URL for the filled file.
The load test was done with Apache Bench:
ab -p req.json -T application/json -c 20 -n 2000 http://{endpoint}
The output from profiling looked like this:
[Bottom up (heavy) profile]:
Note: percentage shows a share of a particular caller in the total
amount of its parent calls.
Callers occupying less than 1.0% are not shown.
ticks parent name
287597 89.2% epoll_pwait
[Bottom up (heavy) profile]:
Note: percentage shows a share of a particular caller in the total
amount of its parent calls.
Callers occupying less than 1.0% are not shown.
ticks parent name
1515166 98.5% epoll_wait
So, my question is: what do epoll_wait and epoll_pwait mean, given that they account for almost 100% of the CPU time attributed to the program?

See google.com/search?q=epoll_wait.
In short, the thread was waiting for something (maybe the network? maybe another thread?). epoll_wait/epoll_pwait is where the event loop blocks until a file descriptor becomes ready, so ticks attributed to it are time spent idle waiting for I/O, not CPU actually being burned.
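For anyone reproducing the measurement, a minimal sketch of the workflow (app.js is a placeholder for the actual entry point):

node --prof app.js                                                 # start the server with V8's tick profiler
ab -p req.json -T application/json -c 20 -n 2000 http://{endpoint} # in another shell
node --prof-process isolate-*.log > processed.txt                 # after stopping the server

With the PDF filling offloaded to the 4 worker_threads, the main thread mostly sits in the event loop, so the bulk of its ticks landing in epoll_wait/epoll_pwait is expected rather than a sign of wasted CPU.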

Related

Is it possible to monitor all write access to the filesystem of all processes under Linux?

I have several different mounted filesystems, a lot of them tmpfs.
I'm interested in all writes to the root filesystem, excluding tmpfs, devtmpfs, etc.
I'm looking for something that will output: <PID xy> wrote n bytes to /target/filepath.
What monitoring tool can list all these write syscalls? Can they be filtered by mount point?
iotop (needs kernel version 2.6.20 or higher) or dstat could help you, e.g. iotop -o -b -d 10, as discussed in this similar thread.
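dstat gives a similar rolling per-process view through its top-io/top-bio plugins; a minimal sketch (10-second intervals, to match the iotop example):

dstat --top-io --top-bio 10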
/proc/diskstats has data for all the block devices.
https://www.kernel.org/doc/Documentation/iostats.txt
The /proc/diskstats file displays the I/O statistics of block devices. Each line contains the following 14 fields:
1 - major number
2 - minor number
3 - device name
4 - reads completed successfully
5 - reads merged
6 - sectors read
7 - time spent reading (ms)
8 - writes completed
9 - writes merged
10 - sectors written
11 - time spent writing (ms)
12 - I/Os currently in progress
13 - time spent doing I/Os (ms)
14 - weighted time spent doing I/Os (ms)
For more details refer to Documentation/iostats.txt
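As a quick illustration of reading those fields, the write counters for one device can be pulled out with awk (sda is a placeholder device name):

awk '$3 == "sda" { print $8, "writes completed,", $10, "sectors written" }' /proc/diskstats

Note that this is per-device, not per-process, so it shows that writes are happening but not who issues them; for the per-process view you still need a tool like iotop or fatrace.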
You can write a SystemTap script to monitor filesystem operations. You can also visit Brendan Gregg's blog, where there are many monitoring tools.
fatrace (File Activity Trace)
fatrace reports file access events (Open, Read, Write, Close) from all running processes. Its main purpose is to find processes which keep waking up the disk unnecessarily and thus prevent some power saving.
When running, it outputs one line per event in this format:
<timestamp> <processName(id)>: <accessType> </path/to/file>
For example:
23:10:21.375341 Plex Media Serv(2290): W /srv/dev-disk-by-uuid-UID/Plex/Library/Application Support/Plex Media Server/Logs/Plex Media Server.log
From this you easily get all the necessary info:
Timestamp, from the --timestamp option
Process name (who is accessing)
File operation (O-pen, R-ead, W-rite, C-lose)
File path (where it is writing to).
You can limit the search scope with --current-mount to only record events on the partition/mount of the current directory.
So simply cd into the volume which corresponds to your spinning HDD first, and there run fatrace with the --current-mount option.
Without this option, all (real) partitions/mount points are watched.
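A minimal sketch of that workflow (the mount point below is a placeholder; fatrace needs root):

cd /srv/your-data-volume          # the filesystem you want to watch
sudo fatrace --current-mount --timestamp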
Very practical: with it I easily found out that the reason my NAS disk was spinning 24/7, even when nobody accessed the NAS and no maintenance tasks were about to run, was unnecessary logging by the Plex Media Server.

How do you determine which process is using up Linux aio context capacity?

In Linux, you can read the value of /proc/sys/fs/aio-nr, which returns the total number of events allocated across all active AIO contexts in the system. The maximum value is controlled by /proc/sys/fs/aio-max-nr.
Is there a way to tell which process is responsible for allocating these aio contexts?
There isn't a simple way. At least, not that I've ever found! However, you can see them being consumed and freed using systemtap.
https://blog.pythian.com/troubleshooting-ora-27090-async-io-errors/
Attempting to execute the complete script in that article produced errors on my CentOS 7 system. But if you just take the first part of it, the part that logs allocations, it may give you enough insight:
stap -ve '
global allocated, allocatedctx
# fires on every io_setup(2) call; maxevents is the requested queue depth
probe syscall.io_setup {
    allocatedctx[pid()] += maxevents; allocated[pid()]++;
    printf("%d AIO events requested by PID %d (%s)\n",
           maxevents, pid(), cmdline_str());
}
'
You'll need to coordinate things such that systemtap is running before your workload kicks in.
Install systemtap, then execute the above command. (Note: I've altered this slightly from the linked article to remove the unused freed symbol.) After a few seconds, it'll be running. Then, start your workload.
Pass 1: parsed user script and 469 library scripts using 227564virt/43820res/6460shr/37524data kb, in 260usr/10sys/263real ms.
Pass 2: analyzed script: 5 probes, 14 functions, 101 embeds, 4 globals using 232632virt/51468res/11140shr/40492data kb, in 80usr/150sys/240real ms.
Missing separate debuginfos, use: debuginfo-install kernel-lt-4.4.70-1.el7.elrepo.x86_64
Pass 3: using cached /root/.systemtap/cache/55/stap_5528efa47c2ab60ad2da410ce58a86fc_66261.c
Pass 4: using cached /root/.systemtap/cache/55/stap_5528efa47c2ab60ad2da410ce58a86fc_66261.ko
Pass 5: starting run.
Then, once your workload starts, you'll see the context requests logged:
128 AIO events requested by PID 28716 (/Users/blah/awesomeprog)
128 AIO events requested by PID 28716 (/Users/blah/awesomeprog)
So, not as simple as lsof, but I think it's all we have!
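While the probe is running, you can also watch the kernel's global counter tick up in a second terminal, to correlate it with the logged PIDs; a trivial sketch:

watch -n1 cat /proc/sys/fs/aio-nr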

Measure CPU usage of a multithreaded program

In a GCC bug report (Bug 51617), the poster times his asynchronous (C++11) program's execution with an output looking like this:
/tmp/tst 81.54s user 0.23s system 628% cpu 13.001 total
What could the poster be using which gives that (or similar) output?
NB: An inspection of my man entry for time doesn't suggest anything useful to this end.
The time(1) man page on my (Ubuntu) Linux system says:
-f FORMAT, --format FORMAT
       Use FORMAT as the format string that controls the output of time. See below for more information.
:
FORMATTING THE OUTPUT
The format string FORMAT controls the contents of the time output. The format string can be set using the `-f' or `--format', `-v' or `--verbose', or `-p' or `--portability' options. If they are not given, but the TIME environment variable is set, its value is used as the format string. Otherwise, a built-in default format is used. The default format is:
       %Uuser %Ssystem %Eelapsed %PCPU (%Xtext+%Ddata %Mmax)k
       %Iinputs+%Ooutputs (%Fmajor+%Rminor)pagefaults %Wswaps
:
The resource specifiers, which are a superset of those recognized by the tcsh(1) builtin `time' command, are:
%      A literal `%'.
C      Name and command line arguments of the command being timed.
D      Average size of the process's unshared data area, in Kilobytes.
E      Elapsed real (wall clock) time used by the process, in [hours:]minutes:seconds.
F      Number of major, or I/O-requiring, page faults that occurred while the process was running. These are faults where the page has actually migrated out of primary memory.
I      Number of file system inputs by the process.
K      Average total (data+stack+text) memory use of the process, in Kilobytes.
M      Maximum resident set size of the process during its lifetime, in Kilobytes.
O      Number of file system outputs by the process.
P      Percentage of the CPU that this job got. This is just user + system times divided by the total running time. It also prints a percentage sign.
R      Number of minor, or recoverable, page faults. These are pages that are not valid (so they fault) but which have not yet been claimed by other virtual pages. Thus the data in the page is still valid but the system tables must be updated.
S      Total number of CPU-seconds used by the system on behalf of the process (in kernel mode), in seconds.
U      Total number of CPU-seconds that the process used directly (in user mode), in seconds.
W      Number of times the process was swapped out of main memory.
X      Average amount of shared text in the process, in Kilobytes.
Z      System's page size, in bytes. This is a per-system constant, but varies between systems.
c      Number of times the process was context-switched involuntarily (because the time slice expired).
e      Elapsed real (wall clock) time used by the process, in seconds.
k      Number of signals delivered to the process.
p      Average unshared stack size of the process, in Kilobytes.
r      Number of socket messages received by the process.
s      Number of socket messages sent by the process.
t      Average resident set size of the process, in Kilobytes.
w      Number of times that the program was context-switched voluntarily, for instance while waiting for an I/O operation to complete.
x      Exit status of the command.
So you can get CPU percentage as %P in the format.
Note that this is for the /usr/bin/time binary; the shell's time builtin is usually different (and less capable).
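A quick sketch that prints just the interesting numbers (the backslash bypasses the shell builtin; ./myprog is a placeholder command):

\time -f '%C: %e s elapsed, %U s user, %S s system, %P CPU' ./myprog

On a program that keeps several cores busy, %P will exceed 100%, just like in the bug report's output.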

How could the uptime command show a CPU time greater than 100%?

Running an application with an infinite loop (no sleep, no system calls inside the loop) on a Linux system with kernel 2.6.11 and a single-core processor results in 98-99% of CPU time being consumed. That's normal.
I have another single-threaded server application, normally with an average sleep of 98% and a maximum of 20% CPU time. Once a network client connects, the sleep average drops to 88% (nothing strange), but the CPU time (1-minute average) rises constantly, though not immediately, above 100%... I even saw 160%!? The network traffic is quite slow (one small packet every 0.5 seconds). Usually the 15-minute average of the uptime command shows about 115%.
I also ran gprof, but I did not find it useful... I get a similar picture:
IDLE SCENARIO
%time name
59.09 Run()
25.00 updateInfos()
5.68 updateMcuInfo()
2.27 getLastEvent()
....
CONNECTED SCENARIO
%time name
38.42 updateInfo()
34.49 Run()
10.57 updateCUinfo()
4.36 updateMcuInfo()
3.90 ...
1.77 ...
....
None of the listed functions is directly involved with client communication.
How can you explain this behaviour? Is it a known bug? Could the extra time be consumed in kernel space after a system call, leading to a calculation like
% = 100 * (app time + kernel time) / (total elapsed time)?
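For what it's worth, the user/kernel split can be observed directly while a client is connected, which would confirm or refute that guess; a sketch using pidstat from the sysstat package (<PID> is a placeholder):

pidstat -u -p <PID> 1     # %usr and %system are reported as separate columns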

How much data can be fetched by submit_bio() at a time

Here is my LAN structure (topology diagram omitted).
I want to download a .zip file of 258.6 MB from the Samba server and, meanwhile, start profiling the router's Linux stack just before the download.
When it finished, I stopped the profiling and found this in the profiling report:
samples % image name app name symbol name
...
16 0.0064 vmlinux smbd submit_bio
...
The sampling rate is 100000 and the event is CPU_CYCLES.
Because this is the first download of the file, that is to say it is not in the page cache, submit_bio() should be pretty busy. Thus, I don't understand why submit_bio() gets such a small share. Does that mean that each time submit_bio() is called, we fetch about (258.6/16) MB of data?
Thanks
That's statistical sampling. It means that of the x times the profiler sampled the system, 16 times it happened to find the CPU running in submit_bio(). It does not mean that submit_bio() was called 16 times.
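If you want actual call counts rather than CPU samples, a tracer is the right tool; a sketch using ftrace (assuming debugfs is mounted at /sys/kernel/debug):

cd /sys/kernel/debug/tracing
echo submit_bio > set_ftrace_filter    # trace only this function
echo function > current_tracer
cat trace_pipe                         # prints one line per actual call

And the amount of data per call is bounded by the size of each bio, not by dividing the file size by the number of samples.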
