Thermal-aware scheduler in Linux

Currently I'm working on making a temperature-aware version of Linux for my university project. Right now I have to create a temperature-aware scheduler that takes processor core temperature into account when making scheduling decisions. Is there any generalized way to get the temperature of the processor cores, or can I integrate the coretemp driver with the Linux kernel in some way? (I didn't find a way to do so on the internet.)

lm-sensors simply reads device files exported by the kernel for CPU temperature; you can read whatever backing variables those device files expose in the kernel to get the temperature information. As for the scheduler, I would not write one from scratch: start with the kernel's CFS implementation and, in your case, modify the load-balancer check to include temperature (currently it uses a metric that estimates the cost of moving a task from one core to another in terms of cache effects, etc.; I'm not sure whether you want to keep this or not).
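For example, with the coretemp driver loaded, per-core temperatures usually show up as millidegree-Celsius values under the hwmon sysfs interface. Below is a minimal user-space sketch; the hwmon0/temp1 path is an assumption, so enumerate /sys/class/hwmon/ and check each "name" file for "coretemp" on your machine (inside the kernel itself you would hook into the thermal/hwmon framework rather than read sysfs):

    /* Minimal sketch: read one coretemp sensor through the hwmon sysfs
     * interface. The exact hwmon index and temp label vary per machine;
     * values are reported in millidegrees Celsius. */
    #include <stdio.h>

    int main(void)
    {
        const char *path = "/sys/class/hwmon/hwmon0/temp1_input"; /* verify this path */
        FILE *f = fopen(path, "r");
        long millideg;

        if (!f) {
            perror("fopen");
            return 1;
        }
        if (fscanf(f, "%ld", &millideg) != 1) {
            fprintf(stderr, "unexpected format in %s\n", path);
            fclose(f);
            return 1;
        }
        fclose(f);

        printf("core temperature: %.1f C\n", millideg / 1000.0);
        return 0;
    }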

Temperature control is very difficult. The difficulty lies in thermal capacity and conductance. It is quite easy to read a temperature; how you control it will depend on the system model. A Kalman filter, or some higher-order filter, will be helpful. You don't know:
Sources of heat.
Distance from sensors.
Number of sensors.
Control elements, like a fan.
If you only measure at the CPU itself, the hard drive could have overheated 10 minutes ago, but the heat is only arriving at the CPU now. Throttling the CPU at this instant is not going to help. Only by getting a good thermal model of the system can you control the heat. Yet you say you don't really know anything about the system? I don't see how a scheduler by itself can do this.
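To make the lag concrete, here is a toy first-order thermal model (a much cruder stand-in for the Kalman or higher-order filter mentioned above; the time constant and temperatures are invented for illustration). The sensor at the CPU only sees a heat pulse from a distant component after a delay governed by the thermal time constant:

    /* Toy first-order thermal model: the CPU sensor relaxes toward the
     * temperature of a distant heat source with time constant TAU_S.
     * All numbers are illustrative only. */
    #include <stdio.h>

    #define TAU_S 300.0 /* source-to-sensor thermal time constant, seconds */
    #define DT_S   10.0 /* simulation step, seconds */

    int main(void)
    {
        double sensor = 40.0; /* temperature seen at the CPU sensor, C */

        for (int t = 0; t <= 1200; t += (int)DT_S) {
            /* the heat source (e.g. a hard drive) spikes for the first 5 minutes */
            double source = (t < 300) ? 70.0 : 40.0;

            /* discrete first-order lag: the sensor slowly follows the source */
            sensor += (DT_S / TAU_S) * (source - sensor);
            printf("t=%4d s  source=%.0f C  sensor=%.1f C\n", t, source, sensor);
        }
        return 0;
    }

The sensor's reading keeps rising and stays elevated well after the source event, which is why reacting to the instantaneous CPU reading alone comes too late.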
I have worked on a mobile freezer application where operators would load pallets of ice cream, etc. from a freezer to a truck. Very small distances between sensors and control elements can create havoc with a control system. Also, you want your ambient temperature to be read instantly if possible. There is a lot of lag in temperature control. A small distance could delay a reading by 5-15 minutes (i.e., it takes 5-15 minutes for heat to travel 1 cm).
I don't see the utility of what you are proposing. If you want this for a PC, then video cards, hard drives, power supplies, sound cards, etc. can create as much heat as the CPU. You cannot generically model a PC; maybe you could with an Apple product. I don't think you will have a lot of success, but you will learn a lot from trying!

Related

How can I determine if a Raspberry Pi is powerful enough to run my code?

Perhaps I posted this with the wrong tags, but hopefully someone can help me. I am an engineer finding myself deeper and deeper in automation. Recently I designed an automated system on a Raspberry Pi. I wrote a pretty simple script which was duplicated to read sensor values from different serial ports simultaneously. I did it this way so I could shut down one script without compromising the others if need be. It runs very well now, but I had problems overloading my CPU when I first started (I believe it was because I opened all of the scripts at once rather than one at a time).
My question is:
How can I determine how much computing power is required by code I have written? How can I spec out a computer to run my code before I start building the robot?
The three resources you're likely to be bounded by on any computer are disk, RAM, and CPU (cores). MicroSD cards are cheap and easily swapped, so the bigger concerns are the latter two.
Depending on the language you're writing in, you'll have more or less control over memory usage. Python in particular "saves" the developer by "handling" memory automatically. There are a few good articles on memory management in Python, like this one. When running a simple script (e.g. activate these IO pins) on a machine with gigabytes of memory, this is rarely an issue. When running data-intensive applications (e.g. do linear algebra on this gigantic array), you have to worry about how much memory you need to do the computation and whether the interpreter actually frees it when you're done. This is not always easy to calculate, but if you profile your software on another machine you may be able to estimate it.
CPU utilization is comparatively easy to prepare for. Reserve one core for the OS and other functions, and the rest are available to your software. If you write single-threaded code, this should be plenty. If you have parallel processing, then either stick to N-1 workers or you'll need to get creative with the software design.
Edit: all of this is with the Raspberry Pi in mind. The Pi is a full computer in a tiny form factor: OS, BIOS, boot time, etc. Many embedded problems can be solved with an Arduino or some other controller, which has a different set of considerations.
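To put rough numbers on the RAM and CPU questions above, one option is the standard getrusage() call, which reports peak resident memory and CPU time for the calling process. A minimal sketch (nothing here is Pi-specific, so you can profile on any Linux box first):

    /* Minimal sketch: report peak resident memory and CPU time for this
     * process via getrusage(). On Linux, ru_maxrss is in kilobytes. */
    #include <stdio.h>
    #include <sys/resource.h>

    static void report_usage(void)
    {
        struct rusage ru;

        if (getrusage(RUSAGE_SELF, &ru) != 0) {
            perror("getrusage");
            return;
        }
        printf("peak RSS: %ld kB\n", ru.ru_maxrss);
        printf("user CPU: %ld.%06ld s, system CPU: %ld.%06ld s\n",
               (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec,
               (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
    }

    int main(void)
    {
        /* ... run the workload you want to size here ... */
        report_usage();
        return 0;
    }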

Something similar to RAPL for non-Sandy Bridge/Xeon processors

First post ever here.
I wanted to know if there is something similar to the Running Average Power Limit for other processors (Intel i7) that aren't Sandy Bridge or Xeon, like the machine I'm working on in the lab.
For those who do not know, I pulled this description to bring you up to speed:
"RAPL(Running Average Power Limit) interface provides platform software
with the ability to monitor, control, and get notifications on SOC
power consumptions."
What I am looking for in particular is to acquire energy consumption measurements on a processor's individual cores after running some code like matrix multiplication or vector addition. Temperature would be excellent too, but that's another question for another day (lm-sensors is a bit puzzling to me).
Thanks and Take Care.
Late answer on this: there's PowerTOP on Linux, but that works for laptops only, as it needs the battery discharge rate. It can display watts per process, but don't ask me how accurate that is (personally I think there might be some problems with it). IIRC it counts the number of CPU wakeups from a CPU sleep state to estimate the energy consumption per process. Also, for AMD processors there's the fam15h_power driver in the lm-sensors software package. For rather new (2011 and newer) Bulldozer AMD CPUs you can get the energy consumption that way.
Note that RAPL does not provide energy consumption per core on a multicore CPU, but only for the whole CPU. You can get the energy consumption of core and non-core (like integrated graphics) separately, but per-core is not possible.
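On Intel machines where the intel_rapl powercap driver is loaded, those package-level counters are exposed under /sys/class/powercap/ as microjoule values. Here is a minimal sketch of sampling one before and after a workload; the intel-rapl:0 domain path is an assumption (list the directory to see what your CPU exposes), the counter wraps around, and on recent kernels reading it may require root:

    /* Minimal sketch: sample the RAPL package energy counter exposed by the
     * intel_rapl powercap driver before and after a workload. The counter
     * is in microjoules; the exact domain path varies per machine. */
    #include <stdio.h>
    #include <unistd.h>

    static int read_energy_uj(const char *path, unsigned long long *out)
    {
        FILE *f = fopen(path, "r");
        int ok;

        if (!f)
            return -1;
        ok = (fscanf(f, "%llu", out) == 1);
        fclose(f);
        return ok ? 0 : -1;
    }

    int main(void)
    {
        const char *path = "/sys/class/powercap/intel-rapl:0/energy_uj"; /* verify */
        unsigned long long before, after;

        if (read_energy_uj(path, &before) != 0) {
            perror("read energy_uj");
            return 1;
        }
        sleep(1); /* placeholder: run your matrix multiply / vector add here */
        if (read_energy_uj(path, &after) != 0) {
            perror("read energy_uj");
            return 1;
        }
        /* ignores counter wrap-around for brevity */
        printf("package energy: %.3f J\n", (after - before) / 1e6);
        return 0;
    }

As the answer above notes, this is a package-level figure; attributing it to a single core is still not possible through RAPL.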

How to get millisecond-precision uptime from user-space in Linux?

I'm working on a Raspberry Pi-based project that has a GPS module, from which my boss wants me to get the time for the system clock. However, we also need to take readings on different sensors while the GPS may not have a fix, and we need to know to millisecond precision (a tolerance of 50-100 ms is fine) when these readings were taken.
Personally I want a hardware RTC for this, but I've been instructed to work around it. My idea is to mark each reading with a time relative to system boot, since the system time is not reliable and is updated by NTP/satellite time when available (I can then fix up the records using the relative time once a synchronized time is available).
So, how can I get a millisecond-precise uptime in Linux from user-space C code? Something like the jiffies value available in the kernel would be perfect.
I think you have to check the main controller (CPU) on your board. Usually there will be a hardware timer module integrated into the CPU, or a decrementer (DEC) register implemented in the CPU core.
If there is a hardware timer or DEC register on your CPU, then use it to implement a periodic interrupt (the frequency can be 1000 Hz or whatever you need). The interrupt handler can notify/wake up the user-space process to do the necessary real-time work.
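If a purely user-space timestamp is acceptable (and the 50-100 ms tolerance in the question suggests it is), a simpler route than a custom timer interrupt is the standard clock_gettime() call with CLOCK_MONOTONIC, which counts from a point near boot and is not stepped when NTP or the GPS later adjusts the wall-clock time. A minimal sketch (on older glibc you may need to link with -lrt):

    /* Minimal sketch: millisecond-resolution "uptime" stamp from user space.
     * CLOCK_MONOTONIC starts near boot and is not stepped by NTP/GPS
     * adjustments to the wall clock. */
    #include <stdio.h>
    #include <stdint.h>
    #include <time.h>

    /* Return monotonic time in milliseconds. */
    static int64_t monotonic_ms(void)
    {
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
    }

    int main(void)
    {
        printf("monotonic time: %lld ms\n", (long long)monotonic_ms());
        return 0;
    }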

How to measure power consumed by my algorithm?

I have an image-processing algorithm running on an ARM Cortex-A8/Ubuntu 9.01 platform, and I have to measure the power consumed by my algorithm. Does anyone know how to do this? Are there any tools available for this?
Strictly speaking, your algorithm doesn't consume power; the hardware it runs on does.
Presumably you have some hardware which can accurately measure the power usage of the device. You should be able to repeatedly run your code (on an otherwise idle device) on various test data sets, measure the cumulative power usage, and compare that with the idle power consumption of the device over the same period; the difference is the additional energy the device used running your code.
Like any kind of benchmark, you'll need to run it repeatedly in a loop to get accurate data.
As the data may change its performance characteristics, you'll need a corpus of different test data to simulate different use-cases. Talk to your QA team about it.
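A rough sketch of such a benchmark loop is below; run_algorithm() is a hypothetical stand-in for the image-processing code, and the power numbers still come from your external meter. The harness only gives you the matching time window and repetition count so the cumulative reading can be divided out:

    /* Rough harness sketch: run the workload repeatedly over a corpus and
     * print the wall-clock window, so cumulative power-meter readings can
     * be divided out. run_algorithm() is a placeholder stub. */
    #include <stdio.h>
    #include <time.h>

    #define NUM_DATASETS 10
    #define REPEATS      100

    /* Replace this stub with the real image-processing algorithm under test. */
    static void run_algorithm(int dataset_index)
    {
        volatile double x = 0.0;
        for (int i = 0; i < 1000000; i++)
            x += (double)i * dataset_index;
        (void)x;
    }

    int main(void)
    {
        struct timespec start, end;
        double seconds;

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (int r = 0; r < REPEATS; r++)
            for (int d = 0; d < NUM_DATASETS; d++)
                run_algorithm(d);
        clock_gettime(CLOCK_MONOTONIC, &end);

        seconds = (end.tv_sec - start.tv_sec)
                + (end.tv_nsec - start.tv_nsec) / 1e9;
        printf("ran %d x %d iterations in %.2f s\n", REPEATS, NUM_DATASETS, seconds);
        printf("energy = (average power while running - idle power) * %.2f s\n", seconds);
        return 0;
    }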
I think you can try PowerTOP and powerstat: measure once while the system is idle and once while running your program, and the difference might give you the necessary information.
http://www.hecticgeek.com/2012/02/powerstat-power-calculator-ubuntu-linux/
Thanks--
S Teja

Highly concurrent multi-threaded application requires hardware

I am looking for hardware that must run about 256 computationally intensive, real-time concurrent tasks 24 hours a day (one multi-threaded C application). Each task takes about 40-50 MFLOPS, so all tasks together require about 10 GFLOPS. CPU-RAM speed is insignificant. All tasks must be managed by a Linux kernel (32-bit, with SMP).
I am looking for a single-mainboard solution with one multi-core CPU (if such a CPU exists). If such a CPU doesn't exist, then I need a multi-socket mainboard solution (with multiple CPUs).
Can you please recommend any professional CPU/mainboard solution that will satisfy these requirements? It is also very important that there are no issues with the Linux kernel (2.6.25). No virtualization, no need for huge RAM or CPU cache. I would also prefer Intel architecture and well-proven stability. I still have doubts that it is feasible at all.
Thank you in advance.
UPDATE:
I think I have found the right answer here and here.
UltraSPARC T2 has 8 cores with 8 threads each. Integrated high-bandwidth memory and IO. The T5140 carries two of them for 128 hardware threads.
The theoretical max raw performance of the 8 floating point units is 11 Giga flops per second (GFlops/s). A huge advantage over other implementations however is that 64 threads can share the units and thus we can achieve an extremely high percentage of theoretical peak. Our experiments have achieved nearly 90% of the 11 Gflop/s. - (http://blogs.oracle.com/deniss/entry/floating_point_performance_on_the)
Rent some Amazon EC2 nodes.
Updated: How about PS3s then? NASA uses them for their simulation engines.
Maybe use CPUs + GPUs in commercial servers?
Build it around FPGAs: nowadays, some variants include processors that can run Linux.
Even though you've given us the specs you think you need, we might be able to help you out better if you tell us what the application is intended to accomplish, and how it was implemented.
There may be a better way to split the work up or deal with it rather than your current solution.
Not Intel architecture, but these run Linux and have 64 cores on a single die:
TILEPro64
Get a bunch of four- or eight-core machines and split the processing across the machines using some sort of grid or clustering software. Maybe have a look at Beowulf.
As you mentioned, 10 GFLOPS isn't exactly to be sneezed at, so in a single machine it'll be expensive. There's also the problem of what you do when the machine breaks; you're unlikely to have a second machine of similar spec available. If you build a cluster using commodity hardware, you're a little more resilient and it's easier to find replacement machines.
MFLOPS and GFLOPS are very poor indicators of how well a program can run on any given CPU. These days, cache footprint is much more important; perhaps branch prediction accuracy as well.
There's almost no way to gauge performance of a given application on different architectures without actually giving it a spin. And even then, you may not get a good idea if you were unlucky enough to unknowingly build with compiler options that ruined your cache footprint, or used a bad threading library, or any of a hundred other things.
I see you'd prefer Intel, but if you need one chip, I will again suggest the Cell processor. Its theoretical peak performance is around 25 GFLOPS, and kernel 2.6.25 already had support for it.
You could try a pre-slim PlayStation 3 for experimenting (that would cost you little) or get yourself a server-based solution at around US$8K. You will have to rewrite and fine-tune your threads to take advantage of the SPU co-processors there, but you could achieve your computational needs without breaking a sweat with a single Cell (1 PPC core + 8 SPUs).
NB: with a PlayStation 3 you'd have only 6 available co-processors, but you don't seem to be on a tight budget with this project.
So you could at least try IBM's Cell developer kit, which offers an emulator, to see if you can code your solution to run on it.
There are commercially available Cell products, both as stand-alone servers in blade form factor and as PCI Express add-on boards for PC workstations, from Mercury Computer Systems:
http://www.mc.com/microsites/cell/products.aspx?id=6986
Mercury does not list any prices on the site, but the pricing seems to be around the previously mentioned US$8,000.00 for these PCI Express cards.
A PlayStation 3 video game console can be purchased for about US$300.00 and would allow you to prototype your application and check whether it is up to the needed performance. (I got one myself and have Fedora 9 running on it, although I did that as a hobbyist and have not, so far, used it for any calculations. I also put together a 12-machine PlayStation 3 cluster for molecular simulations at the local university. The application they ran did not take advantage of the multimedia SPUs while I was in touch with them, but even so, clocked at 3.5 GHz they performed better than standard, similarly priced PCs, even considering PS3s are priced 5x higher around here.)
