find out how many instructions were executed by the core - linux

I am running Linux on an Intel system with Hyper-Threading, and I would like to find out whether there is a way to know how many instructions (actual work) a core (or a virtual core, if that is possible) executed over a period of time.
Is there a register that can tell me how many instructions were executed?

You can install oprofile (http://oprofile.sourceforge.net/).
When using it, you start sampling, then stop after a while.
You can then get a report of various CPU counters, one of which is the number of retired instructions.
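If you would rather read the counter programmatically, Linux exposes the same hardware event through the perf_event_open(2) system call. Here is a minimal sketch that counts retired user-space instructions for the calling thread; the busy loop is just a stand-in for the work you want to measure:

    #include <linux/perf_event.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main(void)
    {
        struct perf_event_attr attr;
        memset(&attr, 0, sizeof(attr));
        attr.type = PERF_TYPE_HARDWARE;
        attr.size = sizeof(attr);
        attr.config = PERF_COUNT_HW_INSTRUCTIONS; /* retired instructions */
        attr.disabled = 1;
        attr.exclude_kernel = 1;  /* count user-space work only */
        attr.exclude_hv = 1;

        /* pid = 0, cpu = -1: measure this thread on whatever CPU it runs */
        int fd = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
        if (fd < 0) { perror("perf_event_open"); return 1; }

        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

        volatile unsigned long x = 0;  /* placeholder for real work */
        for (int i = 0; i < 1000000; i++) x += i;

        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
        uint64_t count;
        read(fd, &count, sizeof(count));
        printf("instructions retired: %llu\n", (unsigned long long)count);
        close(fd);
        return 0;
    }

Since the hardware counters live in each logical CPU, you can also attribute counts to a specific virtual core by passing pid = -1 and cpu = N instead (this usually requires elevated perf_event permissions).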

Related

Linux and RTOS using SoC (ARM, Xilinx)

I am facing a design issue. I have a board with a Xilinx Zynq SoC, which includes a dual-core ARM Cortex-A9, and I need to develop an application with real-time requirements (deadlines on response time), an application that does heavy processing (image processing, etc.), and some basic communication between the two. Most importantly, I need to be able to control the Linux part (at least to suspend or "pause" it, and ideally to shut it down and then run it again). So I was wondering how to combine all of this.
One option could be RTLinux, which, at least according to the descriptions I found, offers the possibility of running a real-time kernel with the Linux kernel next to it as a thread, but it seems that it is now proprietary, owned by Wind River.
Then I stumbled upon MicroBlaze, where it could be possible to create a soft processor in the programmable logic, but I am not sure whether I can run an RTOS on the ARM cores and Linux there?
There are two things that seem to be known as RTLinux. The one you mention, a Wind River revival of the MERT system, is a product of that company. The other, seemingly just "RT Linux", is the real-time patch to the mainline kernel, which provides deterministic scheduling and fine-grained kernel preemption.
I think it is the latter that you want. Ten seconds of googling indicates that there is a kconfig target for this SoC, so all the pieces you need should be there.
Do remember there is more to a real-time system than just the ability to be real time; the subsystems also have to be well behaved.
Given your description, you have (at least) the following design options:
Dual-kernel approach: this means patching the Linux kernel with a (quite invasive) patch that runs a tiny real-time kernel alongside the standard kernel. This approach can reach very good real-time performance (latencies on the order of microseconds) at the cost of complexity. It was implemented by the RTLinux project (acquired and then discontinued by Wind River), then by RTAI (mostly focused on x86) and Xenomai.
If you go down this path, check whether Xenomai supports your specific SoC; then patch, configure, and rebuild the kernel; and finally write the real-time code against Xenomai's API (see the first sketch below).
Improving the responsiveness of the standard Linux kernel: this is what the PREEMPT_RT project aims at. The achievable real-time performance is lower than with the previous approach, but you don't have to write real-time-specific code (see the second sketch below). With this approach, you can patch and build the kernel, then check whether the real-time performance is sufficient for your needs.
Synthesizing a MicroBlaze soft core on the programmable logic, then running Linux on the ARM cores and the real-time code (either bare-metal or on an RTOS) on the MicroBlaze.
Unfortunately, your specific SoC does not support ARM's virtualization extensions. Otherwise there would be the additional option of a multi-OS approach: running Linux on one ARM core and the real-time code (either bare-metal or on an RTOS like ERIKA Enterprise) on the other ARM core, under a hypervisor like Jailhouse or Xen.
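For a flavor of the dual-kernel option, here is a minimal periodic-task sketch against Xenomai's Alchemy API. The task name, priority, and 1 ms period are illustrative placeholders, and the exact headers and build flags depend on your Xenomai version:

    #include <alchemy/task.h>
    #include <alchemy/timer.h>
    #include <unistd.h>

    RT_TASK control_task;

    static void control_loop(void *arg)
    {
        /* Make this task periodic: wake up every 1 ms (period in ns). */
        rt_task_set_periodic(NULL, TM_NOW, 1000000);
        for (;;) {
            rt_task_wait_period(NULL);
            /* deterministic control work goes here */
        }
    }

    int main(void)
    {
        /* name "control", default stack size, priority 50, no mode flags */
        rt_task_create(&control_task, "control", 0, 50, 0);
        rt_task_start(&control_task, &control_loop, NULL);
        pause();  /* keep the main thread alive */
        return 0;
    }

With PREEMPT_RT, by contrast, the real-time code is plain POSIX. A minimal sketch of a 1 ms periodic loop in a SCHED_FIFO thread (the priority and period are again placeholders; running it requires root or CAP_SYS_NICE):

    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <time.h>

    static void *rt_loop(void *arg)
    {
        struct timespec next;
        clock_gettime(CLOCK_MONOTONIC, &next);
        for (;;) {
            /* advance the absolute deadline by 1 ms */
            next.tv_nsec += 1000000;
            if (next.tv_nsec >= 1000000000) {
                next.tv_nsec -= 1000000000;
                next.tv_sec++;
            }
            clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
            /* deterministic control work goes here */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t tid;
        pthread_attr_t attr;
        struct sched_param prio = { .sched_priority = 80 };

        /* Lock all memory so page faults cannot stall the RT thread. */
        mlockall(MCL_CURRENT | MCL_FUTURE);

        pthread_attr_init(&attr);
        pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
        pthread_attr_setschedparam(&attr, &prio);
        pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);

        if (pthread_create(&tid, &attr, rt_loop, NULL) != 0) {
            perror("pthread_create");
            return 1;
        }
        pthread_join(tid, NULL);
        return 0;
    }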

Best way to simulate old, slow processor on modern hardware?

I really like the idea of running and optimizing my software on old hardware, because you can viscerally feel when things are slower (or faster!). The most obvious way to do this is to buy an old system and literally use it for development, but that would slow down my IDE, compiler, and all other development tasks, which is less helpful and (possibly) unnecessary.
I want to be able to:
Run my application at various levels of performance, on demand
At the same time, run my IDE, debugger, compiler at full speed
On a single system
Nice to have:
Simulate real, specific old systems, with some accuracy
Similarly throttle memory speed, and size
Optionally run my build system slowly
Try QEMU in full-system emulation mode, but keep in mind that it uses more CPU resources.
https://stuff.mit.edu/afs/sipb/project/phone-project/OldFiles/share/doc/qemu/qemu-doc.html
QEMU has two operating modes:
Full system emulation. In this mode, QEMU emulates a full system (for example a PC), including one or several processors and various peripherals. It can be used to launch different Operating Systems without rebooting the PC or to debug system code.
User mode emulation (Linux host only). In this mode, QEMU can launch Linux processes compiled for one CPU on another CPU.
The supported architectures are listed here:
https://wiki.qemu.org/Documentation/Platforms

virtual clock speed throttling on linux

I want to throttle, at will, the execution and display speed of a particular process, for example a game, a Flash game, or an OpenGL game. I want to be able to slow it down to 20% or even 0.5%. This is simply not possible on the host side in Linux.
But Linux supports two kernel-level virtualization environments: KVM and LXC.
Question: is it possible to provide a fake system clock to a virtual LXC or KVM machine, so that a Flash game running in the guest will not run faster than the speed it is set to?
Some choices:
The QEMU brake patch (will no doubt require some work to apply).
Bochs has ips=NNNN to define the CPU's "instructions per second".
cpulimit, a tool for limiting the CPU usage of a process (does not require virtualization).
Update: You want this: https://superuser.com/questions/454534/how-can-i-slow-down-the-framerate-of-a-flash-game
I found a prototype version of the CheatEngine speed hack that works for linux.
http://forum.cheatengine.org/viewtopic.php?t=533437&sid=1a83d81ee08f8479eb8b190939b2e1aa
http://code.google.com/p/xeat-engine/source/checkout
http://pastebin.com/ZLryd20D
Basically it replaces gettimeofday with a hacked version using LD_PRELOAD magic. It works perfectly!
thanks lilezek! wherever you are!
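For reference, the core of that speed hack fits in a few lines. Below is a minimal sketch of such a shim (the file name timeslow.c and the SLOWDOWN factor are made up for illustration; a real game may also call clock_gettime or SDL timers, which would need the same treatment):

    /* timeslow.c: report time passing SLOWDOWN times slower than reality */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <sys/time.h>

    #define SLOWDOWN 5.0

    int gettimeofday(struct timeval *tv, struct timezone *tz)
    {
        static int (*real_gtod)(struct timeval *, struct timezone *);
        static struct timeval start;

        if (!real_gtod)  /* look up the real implementation once */
            real_gtod = (int (*)(struct timeval *, struct timezone *))
                        dlsym(RTLD_NEXT, "gettimeofday");

        struct timeval now;
        int ret = real_gtod(&now, tz);
        if (ret != 0 || !tv)
            return ret;

        if (start.tv_sec == 0)  /* remember the first real timestamp */
            start = now;

        /* scale the elapsed time since the first call */
        long long elapsed_us = (long long)(now.tv_sec - start.tv_sec) * 1000000
                             + (now.tv_usec - start.tv_usec);
        long long scaled_us = (long long)(elapsed_us / SLOWDOWN);

        tv->tv_sec  = start.tv_sec + scaled_us / 1000000;
        tv->tv_usec = start.tv_usec + scaled_us % 1000000;
        if (tv->tv_usec >= 1000000) { tv->tv_sec++; tv->tv_usec -= 1000000; }
        return ret;
    }

Build it with gcc -shared -fPIC timeslow.c -o timeslow.so -ldl and launch the target as LD_PRELOAD=./timeslow.so ./game.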

cilk++ on linux system

I have a problem with a Cilk++ program that works well on Windows but not on Linux:
on Windows, the execution time decreases as the number of threads increases,
but on Linux, the execution time increases as the number of threads increases.
I am using Ubuntu Linux, kernel 2.6.35-22-generic, x86_64 GNU/Linux.
I can't understand the source of the problem, so can someone help me please?
Without the sources, there's no way to know. There may be a resource that has a per-thread implementation on Windows and a shared implementation on Linux.
I'd recommend using a performance analyzer like Intel's VTune Amplifier to figure out where your application is spending its time.
- Barry Tannenbaum
Intel Cilk Plus Runtime Development

How well does Valgrind handle threads and machine-level synchronization instructions?

I have a highly parallel Windows program that uses lots of threads, hand-coded machine synchronization instructions, and home-rolled parallel-safe storage allocators. Alas, the storage management has a hole (not a synchronization hole in the allocators, I'm pretty sure) and I'd like to find it.
Valgrind has been suggested as a good tool for finding storage management errors.
Any experience here with Valgrind used under these circumstances?
Valgrind does not run on Windows, but it does work with Windows programs running under Wine on Linux. If your program will run under Wine, it has a decent chance of working with Valgrind. See winehq.org for details.
The latest version is pretty good at handling all the 32-bit x86 instructions. It can handle programs that create many threads; just don't expect them to run simultaneously under Valgrind. It runs only one thread at a time, as if on a single-core machine.
