What exactly happens when you download multiple files from firefox? - multithreading

I'm trying to understand what happens when you download multiple files from Firefox at the same time. Is it using multithreading or multiprocessing? Assuming it's using multithreading, how many threads is it using? How can I figure this out?

One way to figure it out is to look at the source.
Also, a script like this might give you a good starting point:
while ps x | grep -i firefox | wc; do sleep 1; done
which prints something like this:
16 392 6607
16 392 6607
16 392 6607
...
then start downloading a file that takes more than a few seconds and see how many new processes it starts.
Depending upon your OS, you might want to consult the man page for ps to figure out how to report threads as well as processes. You might need a little more smarts than grep.
If you need to dive deeper, but don't want to go into the source, most OSes have some mechanism to view a trace of the system calls a program makes. This could be truss, strace, dtrace, ...
There is also a possibility that it uses no threads or processes to download files; it merely relies on select (man 2 select).
Happy hunting.

Related

How to monitor a process in Linux CPU, Memory and time

How can I benchmark a process in Linux? I need something like "top" and "time" put together for a particular process name (it is a multiprocess program so many PIDs will be given)?
Moreover I would like to have a plot over time of memory and cpu usage for these processes and not just final numbers.
Any ideas?
I typically throw together a simple script for this type of work.
Take a look at the kernel documentation for the proc filesystem (Google 'linux proc.txt').
The first line of /proc/stat (Section 1.8 in proc.txt) will give you cumulative cpu usage stats (i.e. user, nice, system, idle, ...). For each process, the file /proc/$PID/stat (Table 1-4 in proc.txt) will provide you with both process-specific cpu usage stats and memory usage stats (see rss).
If you google a bit you'll find plenty of detailed info on these files, and pointers to libraries / apps / code snippets that can help you obtain / derive the values you need. With that in mind, I'll focus on the high-level strategy.
For CPU stats, use your favorite scripting language to create an executable that takes a set of process ids for monitoring. At a fixed interval (ex: 1 second) poll / calculate the cumulative totals for each process and the system as a whole. During each poll interval, write all results on a single line to stdout.
For memory stats, write a similar script, but simply log the per-process memory usage. Memory is a bit easier as we directly obtain the instantaneous values.
Run these script for the duration of your test, passing the set of processes ids that you'd like to monitor and redirecting its output to a log file.
./logcpu $(pidof foo) $(pidof bar) > cpustats
./logmem $(pidof foo) $(pidof bar) > memstats
Import the contents of these files into a spreadsheet (for certain applications this is as easy as copy / paste). For CPU, you are after instantaneous values but have cumulative values, so you'll need to do some minor spreadsheet work to derive these values (it's just the delta 't(x + 1) - t(x)'). Of course you could have your cpu logger write the delta, but you'll be spending a bit more time up front on the script.
Finally, use your spreadsheet to generate a nice plot.
Following are the tools for monitoring a linux system
System commands like top, free -m, vmstat, iostat, iotop, sar, netstat, etc. Nothing comes near these linux utility when you are debugging a problem. These command give you a clear picture that is going inside your server
SeaLion: Agent executes all the commands mentioned in #1 (also user defined) and outputs of these commands can be accessed in a beautiful web interface. This tool comes handy when you are debugging across hundreds of servers as installation is clear simple. And its FREE
Nagios: It is the mother of all monitoring/alerting tools. It is very much customization but very much difficult to setup for beginners. There are sets of tools called nagios plugins that covers pretty much all important Linux metrics
Munin
Server Density: A cloudbased paid service that collects important Linux metrics and gives users ability to write own plugins.
New Relic: Another well know hosted monitoring service.
Zabbix

Benchmark a linux Bash script

Is there a way to benchmark a bash script's performance? the script downloads a remote file, and then makes calls to multiple commandline programs to manipulate. I would like to know (or as much as possible):
Total time
Time spent downloading
Time spent on each command called
-=[ I think these could be wrapped in "time" calls right? ]=-
Average download speed
uses wget
Total Memory used
Total CPU usage
CPU usage per command called
I'm able to make edits to the bash script to insert any benchmark commands needed at specific points (ie, between app calls). Not sure if some "top" ninja-ry could solve this or not. Not able to find anything useful (at least to limited understanding) in man file.
Will be running the benchmarks on OSX Terminal as well as Ubuntu (if either matter).
strace -o trace -c -Ttt ./scrip
-c is to trace the time spent by cpu on specific call.
-Ttt will tell you time in microseconds at time of each system call running.
-o will save output in file "trace".
You should be able to achieve this a number of ways. One way is to use time built-in function for each command of interest and capture the results. You may have to be careful about any pipes and redirects;
You may also consider trapping SIGCHLD, DEBUG, RETURN, ERR and EXIT signals and putting timing information in there, but you may not get some results.
This concept of CPU usage of each command won't give you any thing useful, all commands use 100% of cpu. Memory usage is something you can pull out but you should look at
If you want to get deep process statistics then you would want to use strace... See strace(1) man page for details. I doubt that -Ttt as it is suggest elsewhere is useful all that tells you are system call times and you want other process trace info.
You may also want to see ltrace and dstat tools.
A similar question is answered here Linux benchmarking tools

Is tesseract 3.00 multi-threaded?

I read some other posts suggesting that they would add multi-threading support in 3.00. But I'm not sure if it's added in 3.00 when it was released.
Other than multi-threading, is running multiple processes of tesseract a feasible option to achieve concurrency?
Thanks.
One thing I've done is invoked GNU Parallel to run as many instances of Tess* as able on a multi-core system for multi-page documents converted to single page images.
It's a short program, easily compiled on most Linux distros (I'm using OpenSuSE 11.4).
Here's the command line that I use:
/usr/local/bin/parallel -j 4 \
/usr/local/bin/tesseract -psm 1 -l eng {} {.} \
::: /tmp/tmp/*.jpg
The -j 4 tells parallel to use all four CPU cores that I have on a server.
If you run this, and in another terminal do a 'top,' you'll see up to four processes at one time until it rummages through all of the JPG's in the directory specified.
Your load should never exceed the number of CPU cores in your system (if you run Linux).
Here's the link to GNU Parallel:
http://www.gnu.org/software/parallel/
No. You can browse the code in http://code.google.com/p/tesseract-ocr/source/browse/ None of the current code in trunk seems to make use of multi-threading. (at least looking through the base classes, api, and neural networking classes)
I did use parallel as well, on a Centos, this way:
ls | parallel --gnu "tesseract {} {.}"
I used the --gnu option as suggested from the stdout log which was:
parallel: Warning: YOU ARE USING --tollef. IF THINGS ARE ACTING WEIRD USE --gnu.
the {} and {.} are placeholders for parallel: in this case you're telling tesseract to use the file listed as first argument, and the same file name without extension as second argument - everything is well explained in parallel man pages.
Now, if you have - say - three .tif files and you run tesseract three times, one for each file, summing up the execution time, and then you run the command above with time before parallel, you can easily check the speedup.

How to set core dump naming scheme without su/sudo?

I am developing a MPI program on a Linux machine where I do not have sudo/su access. As my program currently segfaults, I would like to examine the core dumps via gdb. Unfortunately, as the program is multi-threaded, all the threads write to one core dump. So I would like to be able to append the PID to each separate core dump for every process.
I know there is a way to do it via /proc/sys/kernel/core_pattern, however I do not have access to write to this.
Thanks for any help.
It can be a pain to debug MPI apps on systems that are configured this way when you do not have root access. One option for working around this is to use Valgrind to get stack traces for your segfault(s). This will only be useful provided that your application will fail in a reasonable period of time when slowed down via Valgrind, and that it still segfaults at all in this case.
I usually run MPI apps under Valgrind like this:
% mpiexec -n 5 valgrind -q /path/to/my_app
That will send all of the Valgrind output to standard error. But if I want the output separated into different files, then you can get a bit fancier:
% mpiexec -n 5 valgrind -q --log-file='vg_out.%q{PMI_RANK}' /path/to/my_app
That's the setup for MPICH2. I think that for Open MPI you'll need to replace PMI_RANK with OMPI_MCA_ns_nds_vpid, but if that doesn't work for you then you'll need to check with the Open MPI developers on their discussion list. In either case, this will yield N files, where N is the size of MPI_COMM_WORLD, each named vg_out.0, vg_out.1, ..., to vg_out.$(($N-1)), each corresponding to a rank in MPI_COMM_WORLD.

How can I record what process or kernel activity is using the disk in GNU/Linux?

On a particular Debian server, iostat (and similar) report an unexpectedly high volume (in bytes) of disk writes going on. I am having trouble working out which process is doing these writes.
Two interesting points:
Tried turning off system services one at a time to no avail. Disk activity remains fairly constant and unexpectedly high.
Despite the writing, do not seem to be consuming more overall space on the disk.
Both of those make me think that the writing may be something that the kernel is doing, but I'm not swapping, so it's not clear to me what Linux might try to write.
Could try out atop:
http://www.atcomputing.nl/Tools/atop/
but would like to avoid patching my kernel.
Any ideas on how to track this down?
iotop is good (great, actually).
If you have a kernel from before 2.6.20, you can't use most of these tools.
Instead, you can try the following (which should work for almost any 2.6 kernel IIRC):
sudo -s
dmesg -c
/etc/init.d/klogd stop
echo 1 > /proc/sys/vm/block_dump
rm /tmp/disklog
watch "dmesg -c >> /tmp/disklog"
CTRL-C when you're done collecting data
echo 0 > /proc/sys/vm/block_dump
/etc/init.d/klogd start
exit (quit root shell)
cat /tmp/disklog | awk -F"[() \t]" '/(READ|WRITE|dirtied)/ {activity[$1]++} END {for (x in activity) print x, activity[x]}'| sort -nr -k2
The dmesg -c lines clear your kernel log . The logger is then shut off, manually (using watch) dumped to a disk (the memory buffer is small, which is why we need to do this). Let it run for about five minutes or so, and then CTRL-c the watch process. After shutting off the logging and restarting klogd, analyze the results using the little bit of awk at the end.
If you are using a kernel newer than 2.6.20 that is very easy, as that is the first version of Linux kernel that includes I/O accounting. If you are compiling your own kernel, be sure to include:
CONFIG_TASKSTATS=y
CONFIG_TASK_IO_ACCOUNTING=y
Kernels from Debian packages already include these flags, so there is no need for recompiling your kernel. Standard utility for accessing I/O accounting data in real time is iotop(1). It gives you a complete list of processes managed by I/O scheduler, and displays per process statistics for read, write and total I/O bandwidth used.
You may want to investigate iotop for Linux. There are some Solaris versions floating around, but there is a Debian package for example.
You can use the UNIX-command lsof (list open files). That prints out the process, process-id, user for any open file.
You could also use htop, enabling IO_RATR column. Htop is an exelent top replacement.
Brendan Gregg's iosnoop script can (heuristically) tell you about currently using the disk on recent kernels (example iosnoop output).
You could try to use SystemTap , it has a lot of examples , and if I'm not mistaken , it shows how to do this sort of thing .
I've recently heard about Mortadelo, a Filemon clone, but have not checked it out myself yet:
http://gitorious.org/mortadelo

Resources