If I run top -p $(pgrep -d',' scrapy) I get information on the scrapy process, but this process probably spawns other Python-related processes. How can I get information on these processes as well, in real time, as the top command does?
Thanks,
Dani
What you're looking for is a program or script that will gather the CPU usage of all child processes spawned by scrapy.
If you wanted to script this yourself, you could look at the output of ps -p {scrapy pid} -L to get all the threads spawned by the instantiation of scrapy.
Or, you could chain together a couple of Linux commands into a one-liner:
ps -C scrapy -o pcpu= | awk '{cpu_usage+=$1} END {print cpu_usage}'
ps:
-C selects processes by command name
-o pcpu= tells ps to display only CPU usage, with no header line
awk:
{cpu_usage+=$1} runs once for each line of ps output and adds its value to a running total
END {print cpu_usage} sends the total to STDOUT once all lines have been read.
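If you want to reuse the summing step, here is a minimal sketch of it as a shell function. The name sum_cpu is my own, not a standard tool. It also prints 0 instead of an empty line when ps matches nothing.

```shell
# sum_cpu: add up the numbers in the first column of stdin
# (one %CPU value per line, as printed by "ps -o pcpu=").
sum_cpu() {
    # "total + 0" forces a numeric result, so empty input prints 0.
    awk '{ total += $1 } END { print total + 0 }'
}
```

For something closer to top, you could call it in a loop, e.g. while sleep 2; do ps -C scrapy -o pcpu= | sum_cpu; done. Keep in mind that ps reports CPU time divided by the time the process has been running, so the figure is an average over the process lifetime rather than the instantaneous value top shows.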
How can I get information about a specific process, given its process ID, using the 'ps' command on Linux? I also want to know what proportion of memory the process occupies.
Is it 'ps processID'?
You could use
pmap $PID
or perhaps
cat /proc/$PID/maps
and/or
cat /proc/$PID/status
See proc(5) for details.
ps -o pmem h -p processID
pmem: Ratio of the process's resident set size to the physical memory on the machine, expressed as a percentage.
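If you would rather have the absolute figure than a percentage, the resident set size is also on the VmRSS line of /proc/$PID/status. A small sketch for pulling it out (rss_kb is a name I made up; it takes the path to a status file as its only argument):

```shell
# rss_kb STATUS_FILE: print the VmRSS value (in kB) from a
# /proc/PID/status file. Prints nothing if the line is absent,
# which can happen for kernel threads.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "$1"
}
```

Usage would be something like rss_kb /proc/$PID/status.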
I'm working on a simulation model, where I want to determine when the storage IOPS capacity becomes a bottleneck (e.g. an HDD has ~150 IOPS, while an SSD can have 150,000). So I'm trying to come up with a way to benchmark the IOPS of a command (git) for some of its different operations (push, pull, merge, clone).
So far, I have found tools like iostat, however, I am not sure how to limit the report to what a single command does.
The best idea I can come up with is to determine my HDD IOPS capacity, use time on the actual command, see how long it lasts, multiply that by IOPS and those are my IOPS:
HDD ->150 IOPS
time df -h
real 0m0.032s
150 * .032 = 4.8 IOPS
But, this is of course very stupid, because the duration of the execution may have been related to CPU usage rather than HDD usage, so unless usage of HDD was 100% for that time, it makes no sense to measure things like that.
So, how can I measure the IOPS for a command?
There are multiple time(1) commands on a typical Linux system; the one you get by default is built into bash(1) (strictly speaking it is a shell keyword) and is somewhat basic. There is also /usr/bin/time, which you can run either by calling it exactly like that, or by telling bash(1) not to use aliases and keywords by prefixing it with a backslash thus: \time. Debian has it in the "time" package which is installed by default, Ubuntu is likely identical, and other distributions will be quite similar.
Invoking it in a similar fashion to the shell builtin is already more verbose and informative, albeit perhaps more opaque unless you're already familiar with what the numbers really mean:
$ \time df
[output elided]
0.00user 0.00system 0:00.01elapsed 66%CPU (0avgtext+0avgdata 864maxresident)k
0inputs+0outputs (0major+261minor)pagefaults 0swaps
However, I'd like to draw your attention to the man page which lists the -f option to customise the output format, and in particular the %w format which counts the number of times the process gave up its CPU timeslice for I/O:
$ \time -f 'ios=%w' du Maildir >/dev/null
ios=184
$ \time -f 'ios=%w' du Maildir >/dev/null
ios=1
Note that the first run stopped for I/O 184 times, but the second run stopped just once. The first figure is credible, as there are 124 directories in my ~/Maildir: reading each directory and its inode gives roughly two I/O operations per directory, less a bit because some inodes were likely next to each other and read in one operation, plus some extra again for mapping in the du(1) binary, shared libraries, and so on.
The second figure is of course lower due to Linux's disk cache. So the final piece is to flush the cache. sync(1) is a familiar command which flushes dirty writes to disk, but doesn't flush the read cache. You can flush that one by writing 3 to /proc/sys/vm/drop_caches. (Other values are also occasionally useful, but you want 3 here.) As a non-root user, the simplest way to do this is:
echo 3 | sudo tee /proc/sys/vm/drop_caches
Combining that with /usr/bin/time should allow you to build the scripts you need to benchmark the commands you're interested in.
As a minor aside, tee(1) is used because this won't work:
sudo echo 3 >/proc/sys/vm/drop_caches
The reason? Although the echo(1) runs as root, the redirection is as your normal user account, which doesn't have write permissions to drop_caches. tee(1) effectively does the redirection as root.
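Putting the pieces of this answer together, a benchmark wrapper might look roughly like the sketch below. The function name cold_ios is hypothetical, and it assumes GNU time is installed at /usr/bin/time and that you are allowed to use sudo.

```shell
# cold_ios CMD [ARGS...]: run CMD with a cold page cache and print its
# %w count (voluntary context switches, e.g. waits for I/O).
cold_ios() (
    report=$(mktemp) || exit 1
    sync                                                   # write out dirty pages first
    echo 3 | sudo tee /proc/sys/vm/drop_caches >/dev/null  # then drop the read caches
    # GNU time normally reports on stderr; -o sends the report to a file
    # instead, so it cannot get mixed up with the command's own output.
    /usr/bin/time -f '%w' -o "$report" "$@" >/dev/null 2>&1
    cat "$report"
    rm -f "$report"
)
```

You would then call it as, for example, cold_ios du Maildir. If the command exits with a non-zero status, GNU time adds a line saying so to the report, so check for that before treating the output as a plain number.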
The iotop command collects I/O usage information about processes on Linux. By default, it is an interactive command but you can run it in batch mode with -b / --batch. Also, you can specify a list of processes with -p / --pid. Thus, you can monitor the activity of a git command with:
$ sudo iotop -p $(pidof git) -b
You can change the delay with -d / --delay.
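One caveat, as far as I know: iotop expects one PID per -p flag, while pidof prints every matching PID on a single line. If several git processes are running, a small helper (pid_flags is just an illustrative name) can turn the PID list into repeated flags:

```shell
# pid_flags PID...: print "-p PID " for every argument, so the result
# can be passed to a tool that wants one -p flag per process.
pid_flags() {
    for pid in "$@"; do
        printf '%s %s ' -p "$pid"
    done
}
```

The call would then be sudo iotop -b $(pid_flags $(pidof git)), left unquoted on purpose so the shell splits the flags into separate words. Note that this only sees git processes that are already running when you start iotop.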
You can use pidstat:
pidstat -d 2
More specifically pidstat -d 2 | grep COMMAND or pidstat -C COMMANDNAME -d 2
The pidstat command is used for monitoring individual tasks currently being managed by the Linux kernel. It writes to standard output activities for every task selected with option -p or for every task managed by the Linux kernel if option -p ALL has been used. Not selecting any tasks is equivalent to specifying -p ALL but only active tasks (tasks with non-zero statistics values) will appear in the report.
The pidstat command can also be used for monitoring the child processes of selected tasks.
-C comm: Display only tasks whose command name includes the string comm. This string can be a regular expression.
Currently, I am taking up the long method of doing this by getting a list of processes using the following command
sudo ps -eo pid,command | grep -v grep | awk '{print $1}' > pids.txt
And then iterating through the process ids and executing in background the strace of each process and generating logs for each process with the process id in the log's extension
filename="$1"
while read -r line
do
chmod +x straceProgram.sh
./straceProgram.sh $line &
done < "$filename"
straceProgram.sh
pid="$1"
sudo strace -p $pid -o log.$pid
However, the problem with this approach is that if there is any new process which gets started, it will not be straced since the strace is on the process ids stored in the pids.txt during the first run.
The list in pids.txt can be updated with new process IDs; however, I was curious about running strace at the operating system level, so that it would trace all the activity being performed.
Could there be a better way to do this?
If your resulting filesystem is going to be a kernel filesystem driver, I would recommend using tracefs to gather the information you require. I would recommend against making this a kernel filesystem unless you have a lot of time and a lot of testing resources. It is not trivial.
If you want an easier, safer alternative, write your filesystem using fuse. The downside is that performance is not quite as good and there are a few places where it cannot be used, but it is often acceptable. Note that there is already an implementation of a logging filesystem under fuse.
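Use strace's -f option, which follows child processes as they are created by fork. I also suggest -s 9999, which raises the maximum length of the strings strace prints, for more detail.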
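To keep the one-log-per-process layout from the question, strace also has -ff, which, combined with -o, writes each followed process to its own file named prefix.PID. A rough sketch, where trace_tree is a name I chose for illustration:

```shell
# trace_tree PREFIX CMD [ARGS...]: run CMD under strace, following every
# child it forks, with one log file per process (PREFIX.<pid>).
trace_tree() (
    prefix=$1
    shift
    strace -ff -s 9999 -o "$prefix" "$@"
)
```

For example, trace_tree log ./myprogram would leave files such as log.1234 behind (./myprogram is a placeholder). This covers a process and everything it spawns, not every process on the system.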
I need some command line utility able to run specified command and measure process group memory usage at peak and average (RSS, virtual and shared). As I understand that should be a combination of ptrace(2) and libprocps, but I can't find anything similar.
Any ideas?
/usr/bin/time -f "max RSS: %MKb" <command>
See man time for more details.
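That covers the peak but not the average. One way to approximate both, and this is a different technique from /usr/bin/time, is to sample the RSS with ps while the command runs and summarise the samples afterwards. A sketch of the summarising step (rss_stats is a made-up name; it only sees what was sampled, so short spikes between samples are missed):

```shell
# rss_stats: read one RSS sample (in KB) per line on stdin and print
# "peak average", both as whole numbers.
rss_stats() {
    awk '{ sum += $1; if ($1 > peak) peak = $1 }
         END { if (NR) printf "%d %d\n", peak, sum / NR; else print "0 0" }'
}
```

A sampling loop could feed it like this: start the command in the background, then run while kill -0 "$pid" 2>/dev/null; do ps -o rss= -p "$pid"; sleep 1; done | rss_stats. As written this watches a single PID, not a whole process group.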