How to benchmark Linux threaded programs? - multithreading

I'm trying to compare the performance of threaded programs (on Linux). Since the programs use different thread synchronization methods and different lock granularity, running them on a shared server or desktop would not be good, since other tasks may interfere with the scheduling of my programs. I don't have dedicated hosts, so I thought that using QEMU would be a good option.
What I want to know is:
- Are there any alternatives for this task?
- I suppose that there is no way to reproduce the scheduling done by the guest Linux system on QEMU, if I need to? (Suppose my program runs unusually slowly or fast -- I'd like to know if I can run it again, but keeping exactly the same scheduling for its threads.) Or is there a way?

Related

Multithreaded applications on different CPUs

Suppose, for example, there is an embedded application which runs on a unicore (single-core) CPU, and that application is then ported to a multi-core CPU. Would that app run on a single core or on multiple cores?
To be more specific, I am interested in ARM CPUs (but not only) and in toolchain specifics, e.g. the standard C/C++ libraries.
The intention of this question is this: is it the CPU's responsibility to "decide" to execute on multiple cores, or is it up to the compiler toolchain, the developer, and the standard platform-specific libraries? And again, I am also interested in the tendencies of other systems out there.
There are plenty of applications and operating systems (Linux, for example, or an RTOS) that run on different CPUs of the same architecture, so does that mean they are compiled differently?
Generally speaking, single-threaded code will always run on one core. To take advantage of multiple cores you need to have either multiple processes, multiple threads, or both.
There's nothing your compiler can do to help you here. This is an architectural consideration.
If you have multiple threads, for example, most multi-core systems will run them on whatever cores are available if the operating system you're running is properly compiled to support that. Running an OS that's been compiled single-core only will obviously limit your options here.
A single threaded program will run in one thread. It is theoretically possible for the thread to be scheduled to move to a different core, but the scheduler cannot turn a single thread into multiple threads and give you any parallel processing.
EDIT
I misunderstood your question. If there are multiple threads in the application, and that application is binary compatible with the new multicore CPU, the threads will indeed be scheduled to run on different CPUs, if the OS scheduler deems it appropriate.
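To make that concrete, here is a minimal sketch (assuming Linux with glibc, which provides sched_getcpu(), and C++11) that starts one thread per available core and prints which core the scheduler placed each one on; the placement is entirely up to the kernel:

    // Sketch: start one thread per available core and print which core the
    // scheduler actually placed each thread on. Assumes Linux with glibc.
    #include <sched.h>      // sched_getcpu()
    #include <cstdio>
    #include <thread>
    #include <vector>

    int main() {
        unsigned n = std::thread::hardware_concurrency();  // cores visible to this process
        if (n == 0) n = 2;                                  // hardware_concurrency() may return 0 if unknown
        std::vector<std::thread> workers;
        for (unsigned i = 0; i < n; ++i) {
            workers.emplace_back([i] {
                // The scheduler decides the placement and may migrate the thread at any time.
                std::printf("thread %u is running on core %d\n", i, sched_getcpu());
            });
        }
        for (auto& t : workers) t.join();
        return 0;
    }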
Well, it all depends on whether the software wants to utilize the other cores or not (if they are present). Let's take the example of Linux on ARM's Cortex-A53.
Initially a vendor-provided boot loader, the FSBL (First Stage Boot Loader), runs. It then passes control to ARM Trusted Firmware (ATF), and ATF then runs U-Boot. All of these run on a single core. U-Boot then loads the Linux kernel and passes control to it. Linux initializes some things and looks at a few options, first checking the bootargs for the smp or nosmp flags. If smp, it gets the number of CPUs assigned to it from the DTB, and then, using SMC calls to ATF, it starts the other cores and assigns work to them to give a true multiprocessing environment. This is normally called load balancing, and in Linux it is mostly done in the fair.c file.
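As a small, hedged addition: from user space you can check which CPUs the kernel actually brought online after that SMP bring-up by reading the sysfs files it exposes (Linux only; the paths below are the standard ones):

    // Sketch: see which CPUs the kernel brought online after SMP bring-up.
    #include <fstream>
    #include <iostream>
    #include <string>

    int main() {
        std::ifstream possible_f("/sys/devices/system/cpu/possible");
        std::ifstream online_f("/sys/devices/system/cpu/online");
        std::string possible, online;
        std::getline(possible_f, possible);   // e.g. "0-3"
        std::getline(online_f, online);       // e.g. "0-3", or just "0" when booted with nosmp
        std::cout << "possible CPUs: " << possible << "\n"
                  << "online CPUs:   " << online   << "\n";
        return 0;
    }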

virtual machine or dual boot when measuring code performance

I am trying to measure code performance (basically the speed-up when using threads). So far I have been using Cygwin on Windows, or Linux on a separate machine. Now I have the ability to set up a new system and I am not sure whether I should have a dual boot (Windows and Ubuntu) or a virtual machine.
My concern is whether I can get reliable speed-up measurements (and possibly other things, such as performance monitors) via a Linux virtual machine, or whether I have to go with booting normally into Linux.
Does anybody have an opinion?
If your "threading" relies heavily on scheduling, I wouldn't recommend using a VM. A VM is just a normal process from the host OS's point of view, so the guest kernel and its scheduler will be affected by the host kernel's scheduling.
If your "threading" is more like parallel computation, I think it's OK to use a VM.
For me, it is much safer to boot directly into the system and avoid using a VM in your case. Even when you don't use a VM, it is already hard to get the same results twice in multi-threading because the system is also being used for OS tasks, so having two OSes running at the same time, as with a VM, increases the uncertainty of the results even further. For instance, running your tests 1000 times on a VM might give, let's say, 100 over-estimated timings, while it would maybe be only 60 on a dedicated OS. It is your call whether this uncertainty is acceptable or not.
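To make that uncertainty concrete, the usual mitigation is to repeat the measurement and look at the distribution rather than a single run. A rough sketch (workload() is just a placeholder for the code under test):

    // Sketch: run the same workload many times and report min/median/max, so
    // occasional interference from the OS (or from a host OS under a VM) shows
    // up as outliers instead of silently skewing a single measurement.
    #include <algorithm>
    #include <chrono>
    #include <cstdio>
    #include <vector>

    static void workload() {
        volatile unsigned long x = 0;
        for (unsigned long i = 0; i < 10000000UL; ++i) x += i;   // stand-in work
    }

    int main() {
        const int runs = 100;
        std::vector<double> ms;
        for (int i = 0; i < runs; ++i) {
            auto t0 = std::chrono::steady_clock::now();
            workload();
            auto t1 = std::chrono::steady_clock::now();
            ms.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
        }
        std::sort(ms.begin(), ms.end());
        std::printf("min %.2f ms, median %.2f ms, max %.2f ms\n",
                    ms.front(), ms[ms.size() / 2], ms.back());
        return 0;
    }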

Linux/Windows: User mode vs Privileged mode time spent

What percentage of time does the CPU spend in user mode vs. privileged mode for different programs/operations?
Different Operations could be:
- running an application without I/O interaction
- an application with I/O interaction, like copying a file to USB
I know for a fact that a network operating system spends most of its time in interrupt context. Does this hold true for a general-purpose OS like Ubuntu/Windows?
I'm not much of an OS expert but I imagine it will depend a great deal on what background processes are running on the system. On any OS you might or might not be running some system (i.e. non-user) processes that are heavy resource users. Or you might have put some effort into stripping the system down so that very little CPU time is being used by the system for background maintenance.
If your question is how things compare for "clean" installations of these operating systems, then all I can tell you is that on my laptop running Ubuntu right now (running top from the command line to look at resource usage), only about 5-10% of CPU time is being used by non-user processes; in my case Xorg and compiz are the main ones. I don't really know how that compares to Windows, but I think most Linux users have a knee-jerk reaction that Windows is greedier for system resources than most Linux distros.
So, I guess the short answer is that I doubt there is a short answer to your question.
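If you care about a single program rather than the whole system, you can also measure the user/kernel split directly. A minimal sketch using getrusage() (POSIX/Linux; the filler work is only there to generate both kinds of time):

    // Sketch: report how much CPU time this process spent in user mode vs.
    // kernel (system) mode, using getrusage().
    #include <sys/resource.h>
    #include <unistd.h>
    #include <cstdio>

    int main() {
        volatile unsigned long x = 0;
        for (unsigned long i = 0; i < 50000000UL; ++i) x += i;   // mostly user time
        for (int i = 0; i < 20000; ++i) (void)getppid();         // cheap syscalls -> some system time

        struct rusage ru;
        if (getrusage(RUSAGE_SELF, &ru) != 0) return 1;
        std::printf("user time:   %ld.%06ld s\n", (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec);
        std::printf("system time: %ld.%06ld s\n", (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
        return 0;
    }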

Looking for a Linux thread pool API with OS scheduler support

I'm looking for a thread pool abstraction in Linux that provides the same level of kernel scheduler support that the Win32 thread pool provides. Specifically, I'm interested in finding a thread pool that maintains a certain number of running threads. When a running pool thread blocks on I/O, I want the thread pool to be smart enough to start another thread running.
Does anyone know of anything like this for Linux?
You really can't do this without OS support. There's no good way to tell that a thread is blocked on I/O. You wind up having to atomically increment a counter before each operation that might block and decrement it after. Then you need a thread to monitor that counter and create an additional thread if it's above zero. (Remove threads if they're idle more than a second or so.)
Generally speaking, it's not worth the effort. This only works so well on Windows because it's the "Windows way" and Windows is built from the ground up for it. For Linux, you should be using epoll or boost::asio. Use something that does things the "Linux way" rather than trying to make the Windows way work on non-Windows operating systems.
You can write your own wrappers that use IOCP on Windows, epoll on Linux, and so on. But these already exist, so you need not bother.
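For illustration only, here is a rough sketch of the counter-based idea described above; every name and number in it is made up, the blocking call is faked with a sleep, and a real pool would also retire idle threads and join everything properly:

    #include <atomic>
    #include <chrono>
    #include <cstdio>
    #include <thread>

    std::atomic<int> total_workers{0};
    std::atomic<int> blocked_workers{0};
    std::atomic<bool> stop{false};
    const int target_running = 2;   // threads we want actually runnable
    const int max_workers    = 8;   // hard cap for this demo

    template <typename F>
    void blocking(F&& f) {           // wrap any call that might block on I/O
        blocked_workers.fetch_add(1);
        f();
        blocked_workers.fetch_sub(1);
    }

    void worker_loop() {
        while (!stop.load()) {
            blocking([] {            // stand-in for a blocking read()/recv()
                std::this_thread::sleep_for(std::chrono::milliseconds(50));
            });
            // ...process the result (CPU-bound work) here...
        }
    }

    int main() {
        std::thread monitor([] {
            while (!stop.load()) {
                int running = total_workers.load() - blocked_workers.load();
                if (running < target_running && total_workers.load() < max_workers) {
                    total_workers.fetch_add(1);
                    std::thread(worker_loop).detach();   // demo only: real code would track and join
                }
                std::this_thread::sleep_for(std::chrono::milliseconds(10));
            }
        });
        std::this_thread::sleep_for(std::chrono::seconds(1));
        stop.store(true);
        monitor.join();
        std::printf("workers created: %d\n", total_workers.load());
        std::this_thread::sleep_for(std::chrono::milliseconds(200)); // let detached workers wind down
        return 0;
    }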

Designing multithread application (looking for design patterns)

I'm preparing to write a multithreaded network application. At the moment I'm wondering what's the best thread pattern for my program. The whole application will handle up to 1000 descriptors (local files, network connections over various protocols, and additional descriptors for timer and signal handling). The application will be optimized for Linux. The program will run on regular personal computers, so I assume they will have at least a Pentium 4.
Here's my current idea:
- One thread will handle network I/O using epoll.
- A second thread will handle local-like I/O (disk I/O, timers, signal handling) using epoll.
- A third thread will handle the UI (CLI, GTK+ or Qt).
Handling each network connection in a separate thread would kill the CPU because of too many context switches.
Maybe there's a better way to do this?
Do you know of any documents/books about designing multithreaded applications? I'm looking for answers to questions like: what's a reasonable number of threads? etc.
You're on the right track. You want to use a thread pool pattern to handle the networking rather than one thread per network connection.
This website may also be helpful to you; it lists the most common design patterns and the situations in which they can be used.
http://sourcemaking.com/design_patterns/
To handle the disk I/O you might like to consider using mmap under Linux. It's very fast and efficient. That way, you will let the kernel do the work and you probably won't need a separate thread for that.
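For reference, a minimal mmap sketch (Linux/POSIX; the file path is just an example):

    // Sketch: read a file via mmap. The kernel pages the data in on demand,
    // so there is no explicit read() loop and no separate I/O thread needed.
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>
    #include <cstdio>
    #include <cstdlib>

    int main() {
        const char* path = "/etc/hostname";             // example file, adjust as needed
        int fd = open(path, O_RDONLY);
        if (fd < 0) { perror("open"); return EXIT_FAILURE; }

        struct stat st;
        if (fstat(fd, &st) < 0 || st.st_size == 0) { close(fd); return EXIT_FAILURE; }

        // Map the whole file read-only into our address space.
        char* data = static_cast<char*>(mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0));
        if (data == MAP_FAILED) { perror("mmap"); close(fd); return EXIT_FAILURE; }

        fwrite(data, 1, st.st_size, stdout);            // use the mapping like ordinary memory
        munmap(data, st.st_size);
        close(fd);
        return 0;
    }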
I'm currently playing with Boost.Asio, which seems to be quite good. It uses epoll on Linux. As it appears you are using a cross-platform GUI toolkit like Qt, Boost.Asio will also provide cross-platform support, so you will be able to use it on Windows or Linux. I think there might be a cross-platform mmap too.
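For reference, a single-threaded epoll loop, which is roughly what Boost.Asio drives for you on Linux, looks like this minimal sketch; here it only watches stdin, but a network thread would register its sockets the same way:

    // Sketch: a minimal single-threaded epoll event loop (Linux).
    #include <sys/epoll.h>
    #include <unistd.h>
    #include <cstdio>
    #include <cstdlib>

    int main() {
        int ep = epoll_create1(0);
        if (ep < 0) { perror("epoll_create1"); return EXIT_FAILURE; }

        struct epoll_event ev{};
        ev.events = EPOLLIN;
        ev.data.fd = STDIN_FILENO;                      // watch stdin for readability
        if (epoll_ctl(ep, EPOLL_CTL_ADD, STDIN_FILENO, &ev) < 0) {
            perror("epoll_ctl");
            return EXIT_FAILURE;
        }

        struct epoll_event events[64];
        for (;;) {
            int n = epoll_wait(ep, events, 64, -1);     // block until something is ready
            for (int i = 0; i < n; ++i) {
                char buf[4096];
                ssize_t len = read(events[i].data.fd, buf, sizeof buf);
                if (len <= 0) { close(ep); return 0; }  // EOF or error: leave the loop
                std::printf("read %zd bytes\n", len);
            }
        }
    }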

Resources