How to measure power consumed by my algorithm? - linux

I have an image processing algorithm running on an ARM Cortex-A8 / Ubuntu 9.01 platform, and I have to measure the power consumed by my algorithm. Does anyone know how to do this? Are there any tools available for this?

Strictly speaking, your algorithm doesn't consume power; the hardware it runs on does.
Presumably you have some hardware that can accurately measure the power usage of the device. If so, you should be able to repeatedly run your code (on an otherwise idle device) on various test data sets and measure the cumulative power usage, then compare that with the idle power consumption of the device over the same period; the difference is the additional energy the device used while running your code.
Like any kind of benchmark, you'll need to run it repeatedly in a loop to get accurate data.
Because the input data may change the algorithm's performance characteristics, you'll need a corpus of different test data to simulate different use cases. Talk to your QA team about it.
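As a rough illustration of that protocol, here is a minimal sketch in Python. The read_meter_joules callable and run_algorithm function are assumptions, stand-ins for whatever measurement hardware and workload you actually have (this assumes a meter that reports cumulative energy):

    import time

    def extra_energy_joules(read_meter_joules, run_algorithm, runs=100, idle_seconds=60):
        """Estimate the extra energy used by run_algorithm.

        read_meter_joules and run_algorithm are hypothetical stand-ins: a callable
        returning the meter's cumulative energy reading in joules, and your code."""
        # 1. Baseline: average idle power over a quiet period.
        e0, t0 = read_meter_joules(), time.time()
        time.sleep(idle_seconds)
        idle_watts = (read_meter_joules() - e0) / (time.time() - t0)

        # 2. Energy and time while running the workload repeatedly.
        e1, t1 = read_meter_joules(), time.time()
        for _ in range(runs):
            run_algorithm()
        elapsed = time.time() - t1
        active_joules = read_meter_joules() - e1

        # 3. Subtract what the device would have used anyway while idle.
        return active_joules - idle_watts * elapsed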

You can try powertop and powerstat: measure once while the system is idle and once while your program is running, and the difference should give you the information you need.
http://www.hecticgeek.com/2012/02/powerstat-power-calculator-ubuntu-linux/
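Once you have the two average-power figures from powerstat (one idle, one while your program is running), turning them into an energy estimate is simple arithmetic. A minimal sketch with placeholder numbers:

    # Placeholder figures: substitute the averages reported by powerstat/powertop.
    idle_watts = 2.1          # average power with the system idle
    active_watts = 3.4        # average power while the program was running
    runtime_seconds = 120.0   # how long the program ran during the measurement

    extra_watts = active_watts - idle_watts
    extra_joules = extra_watts * runtime_seconds   # energy attributable to the program
    print(f"~{extra_watts:.2f} W extra, ~{extra_joules:.0f} J over the run")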

Related

How can I determine if a Raspberry Pi is powerful enough to run my code?

Perhaps I posted this with the wrong tags, but hopefully someone can help me. I am an engineer finding myself deeper and deeper in automation. Recently I designed an automated system on a Raspberry Pi. I wrote a pretty simple script that I duplicated so it could read sensor values from different serial ports simultaneously; I did it this way so I could shut down one script without compromising the others if need be. It runs very well now, but I had problems overloading my CPU when I first started (I believe it was because I launched all of the scripts at once rather than one at a time).
My question is:
How can I determine how much computing power is required by code I have written? How can I spec out a computer to run my code before I start building the robot?
The three resources you're likely to be bounded by on any computer are disk, RAM, and CPU (cores). MicroSD cards are cheap, and easily swapped, so the bigger concern is the latter two.
Depending on the language you're writing in, you'll have more or less control over memory usage. Python in particular "saves" the developer by "handling" memory automatically. There are a few good articles on memory management in Python, like this one. When running a simple script (e.g. activate these IO pins) on a machine with gigabytes of memory, this is rarely an issue. When running data intensive applications (e.g. do linear algebra on this gigantic array) then you have to worry about how much memory you need to do the computation and whether the interpreter actually frees it when you're done. This is not always easy to calculate but if you profile your software on another machine you may be able to estimate it.
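If you want a rough per-function estimate while profiling on another machine, Python's built-in tracemalloc module is one option. A minimal sketch; the workload function is just an illustrative stand-in, and tracemalloc only sees allocations made through Python's allocator, so C extensions that allocate their own buffers may not be fully counted:

    import tracemalloc

    def workload():
        # Stand-in for your real code, e.g. building a large data structure.
        return [i * i for i in range(1_000_000)]

    tracemalloc.start()
    workload()
    current, peak = tracemalloc.get_traced_memory()   # bytes
    tracemalloc.stop()
    print(f"current: {current / 1e6:.1f} MB, peak: {peak / 1e6:.1f} MB")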
CPU utilization is comparatively easy to prepare for. Reserve one core for the OS and other functions, and the rest are available to your software. If you write single-threaded code, this should be plenty. If you use parallel processing, either stick to N-1 workers or you'll need to get creative with the software design.
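For example, with Python's standard multiprocessing module the N-1 idea looks roughly like this (process_reading is a hypothetical stand-in for your per-sensor work):

    import os
    from multiprocessing import Pool

    def process_reading(raw):
        # Hypothetical stand-in for whatever you do with one sensor reading.
        return raw * 2

    if __name__ == "__main__":
        workers = max(1, (os.cpu_count() or 2) - 1)   # leave one core for the OS
        with Pool(processes=workers) as pool:
            results = pool.map(process_reading, range(100))
        print(len(results), "readings processed with", workers, "workers")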
Edit: all of this is with the Raspberry Pi in mind. The Pi is a full computer in a tiny form factor: OS, BIOS, boot time, etc. Many embedded problems can be solved with an Arduino or some other microcontroller, which has a different set of considerations.

Linux: CPU benchmark requiring longer time and different CPU utilization levels

For my research I need a CPU benchmark to do some experiments on my Ubuntu laptop (Ubuntu 15.10, Memory 7.7 GiB, Intel Core i7-4500U CPU @ 1.80GHz x 4, 64-bit). In an ideal world, I would like to have a benchmark satisfying the following:
The benchmark should be an official one rather than something I wrote myself, for transparency purposes.
The time needed to execute the benchmark on my laptop should be at least 5 minutes (the more the better).
The benchmark should produce different levels of CPU utilization throughout its execution. For example, I don't want a benchmark that permanently keeps the CPU utilization at around 100%; I want one that makes the CPU utilization vary over time.
Points 2 and 3 in particular are key for my research. However, I haven't found any suitable benchmarks so far. The benchmarks I have found include sysbench, CPU Fibonacci, CPU Blowfish, CPU Cryptofish, and CPU N-Queens, but all of them complete within a couple of seconds and keep the CPU utilization on my laptop at a constant 100%.
Question: Does anyone know about a suitable benchmark for me? I am also happy to hear any other comments/questions you have. Thank you!
To choose a benchmark, you need to know exactly what you're trying to measure. Your question doesn't include that, so there's not much anyone can tell you without taking a wild guess.
If you're trying to measure how well Turbo clock speed works to make a power-limited CPU like your laptop run faster for bursty workloads (e.g. to compare Haswell against Skylake's new and improved power management), you could just run something trivial that's 1 second on, 2 seconds off, and count how many loop iterations it manages.
The duty cycle and cycle length should be benchmark parameters, so you can make plots. e.g. with very fast on/off cycles, Skylake's faster-reacting Turbo will ramp up faster and drop down to min power faster (leaving more headroom in the bank for the next burst).
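A minimal sketch of that idea in Python (duty cycle and cycle length as parameters; in practice you might write the inner loop in C to cut interpreter overhead, but the structure is the same):

    import time

    def bursty_benchmark(on_seconds=1.0, off_seconds=2.0, cycles=20):
        """Alternate busy and idle periods; count iterations completed in each burst.
        More iterations per burst means the clocks ramped up faster / stayed higher."""
        per_burst = []
        for _ in range(cycles):
            iterations = 0
            x = 0
            deadline = time.perf_counter() + on_seconds
            while time.perf_counter() < deadline:
                x += 1                  # trivial busy work
                iterations += 1
            per_burst.append(iterations)
            time.sleep(off_seconds)     # idle period lets the CPU drop to low power
        return per_burst

    if __name__ == "__main__":
        print(bursty_benchmark())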
In a talk on Skylake's power management, the lead architect for power management on Intel CPUs said that JavaScript benchmarks are actually bursty enough for Skylake's power management to give a measurable speedup, unlike most other benchmarks, which just peg the CPU at 100% the whole time. So maybe have a look at JavaScript benchmarks if you want to use well-known off-the-shelf benchmarks.
If rolling your own, put a loop-carried dependency chain in the loop, preferably with something that's not too variable in latency across microarchitectures. A long chain of integer adds would work, and Fibonacci is a good way to stop the compiler from optimizing it away. Either pick a fixed iteration count that works well for current CPU speeds, or check the clock every 10M iterations.
Or set a timer that fires after some interval and sets a flag (e.g. from a signal handler) that you check inside the loop; alarm(2) may be a good choice. Record how many iterations you completed in each burst of work.
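A rough sketch of that pattern, using a loop-carried Fibonacci chain and signal.alarm (the Python counterpart of alarm(2)); in a compiled language the dependency chain matters more, but the timer-and-flag structure is the same:

    import signal

    stop = False

    def handle_alarm(signum, frame):
        global stop
        stop = True

    signal.signal(signal.SIGALRM, handle_alarm)
    signal.alarm(5)                # deliver SIGALRM after 5 seconds

    a, b = 0, 1
    iterations = 0
    while not stop:
        # Loop-carried dependency chain (Fibonacci), wrapped to 64 bits so the
        # integers stay a fixed size.
        a, b = b, (a + b) & 0xFFFFFFFFFFFFFFFF
        iterations += 1

    print("iterations in this burst:", iterations)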

Profiling resource usage - CPU, memory, hard-drive - of a long-running process on Linux?

We have a process that takes about 20 hours to run on our Linux box. We would like to make it faster, and as a first step need to identify bottlenecks. What is our best option to do so?
I am thinking of sampling the process's CPU, RAM, and disk usage every N seconds. So unless you have other suggestions, my specific questions would be:
How large should N be?
Which tool can provide accurate readings of these stats, with minimal interference or disruption from the fact that the tool itself is running?
Any other tips, nuggets of wisdom, or references to other helpful documents would be appreciated, since this seems to be one of those tasks where a newbie can make a lot of time-consuming mistakes and false starts.
First of all, what you want and what you are asking for are two different things.
Monitoring is what you need when you run the process for the first time, i.e. when you don't yet know its resource utilization (CPU, memory, disk, etc.).
You can follow the procedure below to drill down to the bottleneck:
Monitor system resources (an interval of 10-20 seconds should generally be fine, with Munin, Ganglia, or another tool).
From this you should be able to identify whether your hardware is the bottleneck, i.e. whether you are running out of resources: for example 100% CPU utilization, very little free memory, or high I/O wait.
If that is the case, think about upgrading the hardware or tuning the existing setup.
Then tune your application/utility. Use profilers/loggers to find out which method or process is taking the time, and try to tune it. If your code is single-threaded, consider parallelism. If a database is involved, try to tune your queries and DB parameters.
Then run the test again with monitoring to drill down further :)
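If you want something lighter than a full monitoring stack for the first pass, a small sampler over /proc is often enough. A minimal sketch (pass the PID and an interval of, say, 10 seconds; deltas of the CPU-tick column divided by the interval and by os.sysconf('SC_CLK_TCK') give CPU utilization, and deltas of the byte counters give disk throughput):

    import sys, time

    def sample(pid):
        """Return (rss_kb, read_bytes, write_bytes, cpu_ticks) for pid."""
        with open(f"/proc/{pid}/status") as f:
            rss_kb = next(int(line.split()[1]) for line in f if line.startswith("VmRSS:"))
        io = {}
        with open(f"/proc/{pid}/io") as f:          # readable by the same user or root
            for line in f:
                key, value = line.split(":")
                io[key] = int(value)
        with open(f"/proc/{pid}/stat") as f:
            fields = f.read().split()               # naive split; assumes the process name has no spaces
        cpu_ticks = int(fields[13]) + int(fields[14])   # utime + stime
        return rss_kb, io["read_bytes"], io["write_bytes"], cpu_ticks

    if __name__ == "__main__":
        pid, interval = int(sys.argv[1]), float(sys.argv[2])
        while True:
            print(time.strftime("%H:%M:%S"), *sample(pid), flush=True)
            time.sleep(interval)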
I think a graphical representation would help with your problem, and I'd recommend Munin.
It's a resource monitoring tool with a web interface. By default it monitors disk I/O, memory, CPU, load average, network usage, and more. It's light and easy to install, it's easy to develop your own plugins, and you can set alert thresholds.
http://munin-monitoring.org/
Here is an example of what you can get from Munin : http://demo.munin-monitoring.org/munin-monitoring.org/demo.munin-monitoring.org/
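Writing your own plugin is only a few lines: Munin runs an executable that prints name.value pairs, plus a config section when called with the config argument. A hypothetical plugin tracking the resident set size of one process might look like this:

    #!/usr/bin/env python3
    # Hypothetical Munin plugin: RSS of a single long-running process.
    import sys

    PID = 12345   # assumption: replace with the PID of your process

    if len(sys.argv) > 1 and sys.argv[1] == "config":
        print("graph_title Process RSS")
        print("graph_vlabel kB")
        print("rss.label resident set size")
    else:
        with open(f"/proc/{PID}/status") as f:
            rss_kb = next(int(l.split()[1]) for l in f if l.startswith("VmRSS:"))
        print(f"rss.value {rss_kb}")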

Classifying a program as compute intensive based on performance counters

I'm trying to classify a few parallel programs as compute-, memory-, or data-intensive. Can I classify them from values obtained from performance counters, e.g. with perf? That command gives a couple of values, like the number of page faults, which I think could be used to tell whether a program needs to access memory frequently or not.
Is this approach correct and feasible? If not, can someone guide me in classifying programs into the respective categories?
Yes, in theory you should be able to do that with perf. However, I don't think page-fault events are the right ones to observe if you want to analyse memory activity. For this purpose, on Intel processors you should use the uncore events that let you count memory traffic (reads and writes separately). On my Westmere-EP these counters are UNC_QMC_NORMAL_READS.ANY and UNC_QMC_WRITES_FULL.ANY.
The following article deals exactly with your problem (on Intel processors):
http://spiral.ece.cmu.edu:8080/pub-spiral/pubfile/ispass-2013_177.pdf
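If uncore events aren't available to you (they need root and vary between microarchitectures), a first-order classification can already be made from the ratio of generic cache/memory events to instructions collected with perf stat. A rough sketch; the event names are the generic ones, so check perf list on your machine, and the CSV parsing is an assumption to adapt to your perf version:

    import csv, subprocess, sys

    EVENTS = "instructions,cache-references,cache-misses,LLC-load-misses"

    def perf_counts(cmd):
        """Run `perf stat -x,` on cmd and return {event: count} parsed from stderr."""
        result = subprocess.run(["perf", "stat", "-x,", "-e", EVENTS, "--"] + cmd,
                                capture_output=True, text=True)
        counts = {}
        for row in csv.reader(result.stderr.splitlines()):
            if len(row) >= 3 and row[0].strip().isdigit():
                counts[row[2]] = int(row[0])
        return counts

    if __name__ == "__main__":
        c = perf_counts(sys.argv[1:])
        print(c)
        print("LLC load misses per 1000 instructions:",
              round(1000 * c["LLC-load-misses"] / c["instructions"], 2))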

Thermal aware scheduler in linux

Currently I'm working on making a temperature-aware version of Linux for my university project. Right now I have to create a temperature-aware scheduler that takes processor temperature into account when making scheduling decisions. Is there any generalized way to get the temperature of the processor cores, or can I integrate the coretemp driver with the Linux kernel in some way? (I didn't find a way to do so on the internet.)
lm-sensors simply reads device files that the kernel exports for CPU temperature; you can read whatever backing variables in the kernel those files expose to get the temperature information. As for the scheduler, I would not write one from scratch; start with the kernel's CFS implementation and, in your case, modify the load-balancer check to include temperature (currently it uses a metric that is the calculated cost of moving a task from one core to another in terms of cache effects, etc.; I'm not sure whether you want to keep that).
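For a quick user-space check, the files lm-sensors reads are typically under /sys/class/thermal (or /sys/class/hwmon) and report millidegrees Celsius. A minimal sketch (zone names vary by platform):

    from pathlib import Path

    # Each thermal zone exposes its temperature in millidegrees Celsius.
    for zone in sorted(Path("/sys/class/thermal").glob("thermal_zone*")):
        zone_type = (zone / "type").read_text().strip()
        temp_c = int((zone / "temp").read_text()) / 1000.0
        print(f"{zone_type}: {temp_c:.1f} °C")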
Temperature control is very difficult. The difficulty lies in thermal capacity and conductance. It is quite easy to read a temperature; how you control it depends on the system model. A Kalman filter or some higher-order filter will be helpful. You don't know:
Sources of heat.
Distance from sensors.
Number of sensors.
Control elements, like a fan.
If you only measure at the CPU itself, the hard drive could have overheated 10 minutes ago, but the heat is only arriving at the CPU now. Throttling the CPU at that instant is not going to help. Only with a good thermal model of the system can you control the heat. Yet you say you don't really know anything about the system; I don't see how a scheduler by itself can do this.
I have worked on a mobile freezer application where operators would load pallets of ice cream, etc., from a freezer onto a truck. Even very small distances between sensors and control elements can create havoc in a control system. Also, you want your ambient temperature to be read as instantly as possible; there is a lot of lag in temperature control, and a small distance can delay a reading by 5-15 minutes (i.e., it can take 5-15 minutes for heat to travel 1 cm).
I don't see the utility of what you are proposing. If you want this for a PC, then video cards, hard drives, power supplies, sound cards, etc. can create as much heat as the CPU. You can not generically model a PC; maybe you could with an Apple product. I don't think you will have a lot of success, but you will learn a lot from trying!
