I had some problems with a Cilk++ program that works well on Windows but not on Linux:
on Windows, the execution time decreases as the number of threads increases,
but on Linux, the execution time increases as the number of threads increases.
I used Ubuntu Linux, kernel 2.6.35-22-generic x86_64 GNU/Linux.
I can't understand the source of the problem. Can someone help me, please?

Without sources, there's no way to know. There may be a resource that has a per-thread implementation on Windows and a shared implementation on Linux.
I'd recommend using a performance analyzer like Intel's VTune/Amplifier to figure out where your application is spending its time.
- Barry Tannenbaum
Intel Cilk Plus Runtime Development


So, from my understanding, there are two types of programs: those that are interpreted and those that are compiled. Interpreted programs are executed by an interpreter that is a native application for the platform it's on, and compiled programs are themselves native applications (or system software) for the platform they are on.
But my question is this: is anything besides the kernel actually being directly run by the CPU? A Windows Executable is a "Windows Executable", not an x86 or amd64 executable. Does that mean every other process that's not the kernel is literally being interpreted by the kernel in the same way that a browser interprets JavaScript? Or is the kernel placing these processes on the "bare metal" that the kernel sits on top of?
If they're on the "bare metal", how, say, does Windows know that a program is a Windows program and not a Linux program, since they're both compiled for amd64 processors? If it's because of the "format" of the executable, how is that executable able to run on the "bare metal", since, to me, the fact that it's formatted to run on a particular OS would mean that some interpretation would be required for it to run?
They run on the "bare metal", but they do contain operating system-specific things. An executable file will typically provide some instructions to the kernel (which are, arguably, "interpreted") as to how the program should be loaded into memory, and the file's code will provide ways for it to "hook" in to the running operating system, such as by an operating system's API or via device drivers. Once such a non-interpreted program is loaded into memory, it runs on the bare metal but continues to communicate with the operating system, which is also running on the bare metal.
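To make the "format" point concrete, here is a minimal sketch of how a loader can tell the two kinds of file apart before running anything: ELF files (Linux) begin with the bytes 0x7F 'E' 'L' 'F', and Windows PE files begin with the DOS header bytes 'MZ'. The function name is made up for illustration, and a real loader checks far more than these leading bytes.

```python
import sys

ELF_MAGIC = b"\x7fELF"  # first four bytes of an ELF file
MZ_MAGIC = b"MZ"        # DOS header that starts a Windows PE file


def guess_executable_format(header: bytes) -> str:
    """Hypothetical helper: classify a file by its leading bytes only."""
    if header.startswith(ELF_MAGIC):
        return "ELF"
    if header.startswith(MZ_MAGIC):
        return "PE"
    return "unknown"


if __name__ == "__main__":
    # Inspect the Python interpreter binary that is running this script.
    with open(sys.executable, "rb") as f:
        print(guess_executable_format(f.read(4)))
```

Only the header is "interpreted" in this sense. The machine code that follows it is handed to the CPU as-is.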
In the days of single-process operating systems, it was common for executables to essentially "seize" control of the entire computer and communicate with hardware directly. Computers like the Apple ][ and the Commodore 64 work like that. In a modern multitasking operating system like Windows or Linux, applications and the operating system share use of the CPU via a complex multitasking arrangement, and applications access the hardware via a set of abstractions built in to the operating system's API and its device drivers. Take a course in Operating System design if you are interested in learning lots of details.
Bouncing off Junaid's answer, the way that the kernel blocks a program from doing something "funny" is by controlling the allocation and usage of memory. The kernel requires that memory be requested through its API, and it uses the CPU's memory protection hardware to block "unauthorized" access to memory a program was not given. In the days of single-process operating systems, applications had much more freedom to access memory and other things directly, without involving the operating system. An application running on an old Apple ][ can read from or write to any address in RAM that it wants to on the entire computer.
One of the reasons why a compiled application won't just "run" on another operating system is that these "hooks" are different for different operating systems. For example, an application that knows how to request the allocation of RAM from Windows might not have any idea how to request it from Linux or the Mac OS. As Disk Crasher mentioned, these low level access instructions are inserted by the compiler.
I think you are confusing things. A compiled program is in machine-readable format. When you run the program, the kernel will allocate memory, CPU time, etc. and ensure that the program does not interfere with other programs. If the program requires access to hardware resources, the disk, etc., the kernel will handle it, so the kernel will always be between the hardware and any software you run in user space.
If the program is interpreted, then the relevant interpreter for that language will convert the code to machine-readable form on the fly, and the kernel will still provide the same functionality, like access to hardware and making sure programs aren't doing anything funny like trying to access another program's memory.
The only thing that runs on "bare metal" is machine code (the binary form of assembly language), which is abstracted from the programmer by many layers in the OS and compiler. Generally speaking, applications are compiled for a specific OS and CPU architecture. They will not run on other OSes, at least not without a compatible framework in place (e.g. Mono on Linux).
Back in the day a lot of code used to be written on bare metal using macro assemblers, but that's pretty much unheard of on PCs today. (And there was even a time before macro assemblers.)

What percentage of time does the CPU spend in user mode vs. privileged mode for different programs/operations?
The different operations could be:
- running application without I/O interaction.
- application with I/O interaction like copying a file to USB
I know for a fact that a network operating system spends most of its time in interrupt context. Does this hold true for a general-purpose OS like Ubuntu/Windows?
I'm not much of an OS expert but I imagine it will depend a great deal on what background processes are running on the system. On any OS you might or might not be running some system (i.e. non-user) processes that are heavy resource users. Or you might have put some effort into stripping the system down so that very little CPU time is being used by the system for background maintenance.
If your question is how things compare for "clean" installations of these operating systems, then all I can tell you is that on my laptop running Ubuntu right now (running top from the command line to look at resource usage) only about 5-10% of CPU time is being used by non-user processes; in my case Xorg and compiz are the main ones. I don't really know how that compares to Windows, but I think most Linux users have a knee-jerk reaction that Windows is greedier for system resources than most Linux distros.
So, I guess the short answer is that I doubt there is a short answer to your question.
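If you want a number for your own machine rather than a general answer, Linux exposes cumulative per-mode CPU counters on the first line of /proc/stat. Below is a rough sketch that turns that line into user/kernel/idle shares. The function name and the way the columns are grouped are my own choices, and the counters are totals since boot, so for a specific workload you would sample before and after and subtract.

```python
import os


def cpu_mode_shares(stat_line: str) -> dict:
    """Hypothetical helper: split the aggregate 'cpu' line into shares.

    Column order on Linux: user nice system idle iowait irq softirq steal
    (guest columns may follow; they are already counted inside user/nice).
    """
    fields = stat_line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    values = [int(v) for v in fields[1:9]]
    values += [0] * (8 - len(values))  # older kernels report fewer columns
    user, nice, system, idle, iowait, irq, softirq, steal = values
    total = sum(values)
    return {
        "user": (user + nice) / total,
        "kernel": (system + irq + softirq) / total,  # includes interrupt time
        "idle": (idle + iowait) / total,
    }


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    with open("/proc/stat") as f:
        print(cpu_mode_shares(f.readline()))
```

Note that this measures which mode the CPU was in, which is not the same thing as which processes top lists as the heaviest users: a user process that makes many system calls also accumulates kernel-mode time.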

I'm experiencing very high CPU usage (~100%) using the Qt version of T32 on Linux, even when the program is waiting for user interaction. The executable is t32marm-qt.
This does not happen when I use the standard Tcl-based t32marm executable.
A strace shows that the executable continuously cycles on the
The Linux distribution is Mint 14 32-bit (derivation of Ubuntu 12.10).
Has anybody experienced this behavior?
If so, is it a bug or just a wrong configuration?
Yes, I have just had it confirmed that it is a software bug, fixed in more recent versions of the tool. If you encounter this problem, update to a newer version.

I am running an Intel Hyper-Threading system under Linux, and I would like to find out if there is a way to know how many instructions (actual work) a core (or a virtual core, if that can be done) executed over a period of time.
Is there any register that can tell me how many instructions were executed?
You can install oprofile (
When using it, you start sampling, then stop after a while.
Then you can get a report of various CPU counters, one of them is the number of instructions.

If I run a parallelized application (using e.g. OpenMP) on a Windows multicore machine within Cygwin, do I get the full multicore performance the Windows machine is offering, or is there a significant speed reduction to expect due to the Cygwin layer?
Any experiences?
I know this is an old question, but in light of my recent findings about a Cygwin bug with multithreaded apps on multicore CPUs (see my bug report on the Cygwin mailing list), I just want to point out that multithreaded applications on Cygwin are a no-go. In my case, a multithreaded application on a dual core runs 8x slower than if you force it to run on a single core (by setting the CPU affinity mask).
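For anyone unsure what the affinity mask looks like: it is a bitmask with one bit per logical core, where bit 0 is the first core. A small sketch for computing it follows. The helper name is made up, and this only builds the number; how you hand it to the OS (Task Manager, `start /affinity` with the hex value, or something else) is up to you.

```python
def affinity_mask(cores) -> int:
    """Hypothetical helper: build a CPU affinity bitmask from core indices."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core indices start at 0")
        mask |= 1 << core  # set the bit that corresponds to this core
    return mask


if __name__ == "__main__":
    # Pin to the first core only, as in the single-core workaround above.
    print(hex(affinity_mask([0])))
    # Allow both cores of a dual-core machine.
    print(hex(affinity_mask([0, 1])))
```

With a mask of 1, every thread of the process is scheduled on the first core only.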
