Linux utility for Disk health Monitoring - linux

We are looking into implementing an in-memory utility which can recover the system in case of disk/filesystem lockup. This utility has to detect the lockup and take corrective action like rebooting or just shutting down interface.
The server platform is Gentoo Linux 2.4
Any suggestions on - any existing utility or which scripting method will work best (expect, native C++)?

you'll want S.M.A.R.T. monitoring tools (smartmontools)
http://en.wikipedia.org/wiki/S.M.A.R.T.
Note that not all statistics correlate with impending drive failure, and sometimes (for some brands and models) you may need to pass in special flags or you will get garbage. See the wikipedia article for which attributes really indicate danger.
The command is smartctl. You may need to be sudo. smartctl --all will give a summary of all drives, spinning them up very briefly to check their health.

What type of errors are you looking for?
smartmontools and smartd which ship with most distros should be able to help you. They work at a low level with the disk.
SMART on Wikipedia
smartmontools

Related

SuperCollider without jack server

One of the requirements in my project is to reduce runtime footprint on an embedded system. It looks like jackd is required on Linux and seem like it's currently a hard dependency and it cannot use libasound directly instead, is it true? It'd be also great to hear from someone who use jackd on an embedded device and could summaries it's resource usage. Although, I'm planing to use BeagleBone with relatively enough memory, I'd rather spare it for a longer delay line instead of running jackd.
Jack is definitely the standard way of doing it for SuperCollider on Linux. There is an AUDIOAPI flag in the cmake build settings - you can set -DAUDIOAPI=portaudio when you make your own build. (There's no direct libasound implementation; supercollider is cross-platform.) However, be warned that the portaudio approach is rarely used and might not even work at the moment. If you need help getting a build working, ask the sc-devel mailing list.
On the other hand I know people have run jack+supercollider on small ARM devices such as beaglebones. You might find it a better use of your time to go with the flow and use jack.

How to do power save on a ARM-based Embedded Linux system?

I plan to develop a nice little application that will run on an arm-based embedded Linux platform; however, since that platform will be battery-powered, I'm searching for relevant information on how to handle power save.
It is kind of important to get decent battery time.
I think the Linux kernel implemented some support for this, but I can't find any documentation on this subject.
Any input on how to design my program and the system is welcome.
Any input on how the Linux kernel tries to solves this type of problem is also welcome.
Other questions:
How much does the program in user space need to do?
And do you need to modify the kernel?
What kernel system calls or APIs are good to know about?
Update:
It seems like the folks involved with the "Free Electrons" site have produced some nice presentations on this subject.
http://free-electrons.com/services/power-management/
http://free-electrons.com/docs/power
http://free-electrons.com/docs/optimizations
But maybe someone else has even more information on this subject?
Update:
It seems like Adam Shiemke's idea to go look at the MeeGo project may be the best tip so far.
It may be the best battery powered Embedded Linux project out there at this moment.
And Nokia is usually kind of good at this type of thing.
Update:
One has to be careful about Android since it has a "modified" Linux kernel in the bottom, and some of the things the folks at Google have done do not use baseline/normal Linux kernels. I think that some of their power management ideas could be troublesome to reuse for other projects.
I haven't actually done this, but I have experience with the two apart (Linux and embedded power management). There are two main Linux distributions that come to mind when thinking about power management, Android and MeeGo. MeeGo uses (as far as I can tell) an unmodified 2.6 kernel with some extras hanging on. I wasn't able to find a lot on exactly what their power management strategy is, although I suspect more will be coming out about it in the near future as the product approaches maturity.
There is much more information available on Android, however. They run a fairly heavily modified 2.6 kernel. You can see a good bit on the different strategies implemented in http://elinux.org/Android_Power_Management (as well as kernel drama). Some other links:
https://groups.google.com/group/android-kernel/browse_thread/thread/ee356c298276ad00/472613d15af746ea?lnk=raot&pli=1
http://www.ok-labs.com/blog/entry/context-switching-in-context/
I'm sure that you can find more links of this nature. Since both projects are open source, you can grab the kernel code, and probably get further information from people who actually know what they are talking about in forms and groups.
At the driver level, you need to make sure that your drivers can properly handle suspend and shut devices off that are not in use. Most devices aimed at the mobile market offer very fine-grained support to turn individual components off, and to tweak clock settings (remember, power is proportional to clock^2).
Hope this helps.
You can do quite a bit of power-saving without requiring any special support from the OS, assuming you are writing (or at least have the source code for) your application and drivers.
Your drivers need to be able to disable their associated devices and bring them back up without requiring a restart or introducing system instability. If your devices are connected to a PCI/PCIe bus, research which power states they support (D0 - D3) and what your driver needs to do to transition between these low-power modes. If you are selecting hardware devices to use, look for devices that adhere to the PCI Power Management Specification or have similar functionality (such as a sleep mode and a "wake up" interrupt signal).
When your device boots up, every device that has the ability to detect whether it is connected to anything needs to do so. If any ports or buses detect that they are not being used, power them down or put them to sleep. A port running at full power but sitting unused can waste more power than you might think it would. Depending on your particular hardware and use case, it might also be useful to have a background app that monitors device usage, identifies unused/idle resources, and acts appropriately (like a "screen saver" for your hardware).
Your application software should make sure to detect whether hardware devices are powered up before attempting to use them. If you need to access a device that might be placed in a low-power mode, your application needs to be able to handle a potentially lengthy delay in waiting for the device to wake up and respond. Your applications should also be considerate of a device's need to sleep. If you need to send a series of commands to a hardware device, try to buffer them up and send them out all at once instead of spacing them out and requiring multiple wakeup->send->sleep cycles.
Don't be afraid to under-clock your system components slightly. Besides saving power, this can help them run cooler (which requires less power for cooling). I have seen some designs that use a CPU that is more powerful than necessary by a decent margin, which is then under-clocked by as much as 40% (bringing the performance down to the original level but at a fraction of the power cost). Also, don't be afraid to spend power to save power. That is, don't be afraid to use CPU time monitoring hardware devices for opportunities to disable/hibernate them (even if it will cause your CPU to use a bit more power). Most of the time, this tradeoff results in a net power savings.
One of the most important things to think of as a power aware application developer is to avoid unnecessary timers. If possible use interrupt driven solutions instead of polled solutions. If a timer must be used then use as long poll interval as is possible.
For example if something special should be done at a certain room temperature it is unnecessary to check the temperature every 100 ms since temperature in a room changes slowly. A more reasonable polling interval is could be 60 s.
This affects the power consumption in several ways. In Linux the CPUIDLE subsystem takes the CPU (SOC) to as deep power saving state as possible depending on when it predicts the next wakeup to occur. Having a lot of timers in a system will fragment the sleep making it impossible to go to the deeper sleep states for longer periods. A typical deep sleep state for CPUIDLE turns the CPU off but keeps the RAM in self refresh. When a timer triggers the CPU will boot and serve the timer of the application.
It's not actually your topic, but it might come in handy to log your progress: i was looking for testing / measuring my embedded linux system. chris desjardins from this forum recommended me this:
I have successfully used bootchart in the past:
http://elinux.org/Bootchart
Here is a list of other things that may also help:
http://elinux.org/Boot_Time

Minimum configuration to run embedded Linux on an ARM processor?

I need to produce an embedded ARM design that has requirements to do many things that embedded Linux would do. However the design is cost sensitive and does not need huge amounts of horse power. Mostly will be talking to serial interfaces. Ideally I would like to use one of the low end ARMs. What is the lowest configuration of an ARM that you have successfully used embedded Linux on.
Edit:
The application needs a file system on some kind of flash device and the ability to run applications for processing the data. Some of the applications might be written by others than myself. I also need to ability to load new applications or update old apps using the serial ports to accept the apps.
When I have looked at other embedded OSes they seem to be more of a real time threading solution than having the ability to run applications. I am open to what ever will get the job done.
I think you need to weigh your cost options here.
ARM + linux is an option but you will be paying a very high operating overhead for such a simple (from your description) set of features. You can't just look at the cost of the ARM chip but must also consider external RAM which will very likely be required as well as flash to get enough space available to run the kernel + apps.
NOTE: you may be able to avoid the external requirements with a very minimal kernel and simple apps combined with a uC with large internal resources.
A second option is a much simpler microcontroller with a light weight OS. This will cut your hardware costs on the CPU and you can likely run something like this without external RAM or flash (dependent on application RAM and program space requirement)
third option: I don't actually see anything in your requirements that demands any OS at all be used. Basic file systems are very simple, for instance there are even FAT drivers out there for 8 bit PIC's. Interfacing to an SD card only requires a SPI port and minimal external circuitry.
The application bit could be simple or complex. I've built systems around PIC18 microcontollers that run a web server and allow program updates via a simple upload screen, it just stores the new program into an EEPROM or flash, reboots into a bootloader and copies the new program into internal program memory. You could likely design a way to do this without the reboot via a cooperative multitasking type of architecture. Any way you go the programmers writing the apps are going to need to have knowledge of the architecture and access to libraries / driver you write. Your best bet to simplify this is to provide as simple an API as possible and to try to automate the build process for them.
The third option will be the "cheapest" in terms of hardware as there will be very little overhead in the processing of your applications allowing you to get away with minimal processing power and memory. It likely will require some more programming/software architecting on your part but won't require nearly the research you will need to undertake to get linux up and running in addition to learning to write the needed device drivers under a linux paradigm.
As always you have to include the software development costs in the build cost of the device. If you plan to build 10,000+ of these your likely better off keeping hardware costs down and putting more man power into designing a software solution that allows that hardware to meet the design goals. If your building 10 of them, your better off spending an extra $15-20 on hardware if it can cut down on your software development costs. For example an ARM with MMU with full linux kernel support and available device drivers.
I kind of feel that your selecting the worst of both worlds at the moment, your paying extra to get a uC you can run linux on but by doing so your also selecting a part that will likely be the most complex to get linux up and running on, especially having not worked with linux on embedded platforms before.
I've had success even on ARM7TDMI, so I don't think you're going to have any trouble. If you have a low-requirements system, you could use any kind of lightweight real-time executive and have a lot better experience than you would getting Linux to work.
I've used a TS-7200 for about five years to run a web server and mail server, using Debian GNU Linux. It is 200 MHz and has 32 MB of RAM, and is quite adequate for these tasks. It has serial port built in. It's based on a ARM920T.
This would be overkill for your job; I mention it so you have another data point.
For several years I've been using a gumstix to do prototyping and testing and I've had good results with it. I don't know if the processor they are using (Intel PXA255 on my board) is considered low-cost, but the entire Verdex line seems pretty cheap to me for an adaptable device.
ucLinux is designed specifically for resource constrained targets, but perhaps more importantly for targets without an MMU.
However you have to have a good reason to use Linux on such a system rather than a small real-time executive. Out-of-the-box networking, readily available drivers and protocol stacks for complex hardware and support for existing POSIX legacy or open source code are a few perhaps. However if you don't need that, Linux is still large, and you may be squandering resources for no real benefit. In most cases you will still need off-chip SDRAM and Flash if you choose Linux of any flavour.
I would not regard serial I/O as 'complex hardware', so unless you are running a complex, but standard protocol, your brief description does not appear to warrant the use of Linux IMO
My DLINK DIR-320 router runs Linux inside.
And I know some handymen, flashing it with Optware and connecting USB-hub, HDDs, USB-flash, and much more.
It's low-cost ready for use "platform". (If you don't need mass production). But maybe more powerful than you need.
Additionally, it can be configured wirelessly via web-interface even through your pda :)

Current Linux Kernel debugging techniques

A linux machine freezes few hours after booting and running software (including custom drivers). I'm looking a method to debug such problem. Recently, there has been significant progress in Linux Kernel debugging techniques, hasn't it?
I kindly ask to share some experience on the topic.
If you can reproduce the problem inside a VM, there is indeed a fairly new (AFAIK) technique which might be useful: debugging the virtual machine from the host machine it runs on.
See for example this:
Debugging Linux Kernel in VMWare with Windows host
VMware Workstation 7 also enables a powerful technique that lets you record system execution deterministically and then replay it as desired, even backwards. So as soon as the system crashes you can go backwards and see what was happening then (and even try changing something and see if it still crashes). IIRC I read somewhere you can't do this and debug the kernel using VMware/gdb at the same time.
Obviously, you need a VMM for this. I don't know what VMM's other than VMware's VMM family support this, and I don't know if any free VMware versions support this. Likely not; one can't really expect a commercial company to give away everything for free. The trial version is 30 days.
If your custom drivers are for hardware inside the machine, then I suppose this probably won't work.
SystemTap seems to be to Linux what Dtrace is to Solaris .. however I find it rather hostile to use. Still, you may want to give it a try. NB: compile the kernel with debug info and spend some time with the kernel instrumentation hooks.
This is why so many are still using printk() after empirically narrowing a bug down to a specific module.
I'm not recommending it, just pointing out that it exists. I may not be smart enough to appreciate some underlying beauty .. I just write drivers for odd devices.
There are many and varied techniques depending on the sort of problems you want to debug. In your case the first question is "is the system really frozen?". You can enable the magic sysrq key and examine the system state at freeze and go from there.
Probably the most directly powerful method is to enable the kernel debugger and connect to it via a serial cable.
One option is to use Kprobes. A quick search on google will show you all the information you need. It isn't particularly hard to use. Kprobes was created by IBM I believe as a solution for kernel debugging. It is essentially a elaborate form of printk() however it allows you to handle any "breakpoints" you insert using handlers. It may be what you are looking for. All you need to do is write and 'insmod' a module into the kernel which will handle any "breakpoints" hit that you specify in the module.
Hope that can be a useful option...
How I debug this kind of bug, was to run my OS inside the VirtualBox, and compile the kernel with kgdb builtin. Then I setup a serial console on the VirtualBox so that I can gdb to the kernel inside the VirtualBox's OS via the serial console. Anytime the OS hang, just like magic sysrq key, I can enter ctrl-c on the gdb to stop and understand the kernel at that point in time.
Normally kernel stack tracing is just too difficult to pinpoint the culprit process, so the best way I think is still generic "top" command, just looking at the application logs to see what are the cause of hanging - this will need a reboot to see the log of course.

Windows Performance Counter Port to Linux, HP-UX and AIX

We implemented a server application available on Windows only. Now we like to port it to Linux, HP-UX and AIX, too. This application provides internal statistics through performance counters into the Windows Performance Monitor.
To be more precise: The application is a data base, and we like to provide information like number of connected users or number of requests executed to the administrator. So these are "new" information, proprietary to our application. But we like to make them available in the same environment where the operating system delivers information like the CPU, etc. The goal is to make them easily readable for the administrator.
What is the appropriate and commonly used performance monitor under Linux, HP-UX and AIX?
I would say: that depends on which performance you want to monitor. Used CPU time? Free RAM? Disk IO? Number of beers in your freezer...
But regardless of this you can look at any files below /proc. I'm not sure for HP, but at least Linux and AIX should have that tree (if it's not deactivated at kernel compile time).
Management is where most OSes depart from one another. For this reason there are not many tools that are common between all the OSes.
Additionally, Unix tools follow the single process single responsibility idiom where one tool gets cpu info, another gets memory etc.
The only tool i have seen in the Unix world that gets all this info in one place is top. Almost all sys admins are familiar with this tool and works on all the flavors of OSes you are interested in. It also has the additional advantage of being open source. You could simply extend this tool to expose the counters you are interested in and ship it along with your application.
Another way to do this might be to expose your counters through SNMP and leave it to some third party SNMP tool like HP open view that can collect and present a consistent view along with other management info. This might be a more enterprisy solution, which might appeal to the marketing folks.
I would also say its a good idea to write a standalone console tool that admins can use from their custom home grown scripts (there are many firsm out there with super human admins / over paid it staff that does this).
All together would be a healthy solution for your requirement i think.
The most standard unix tools for such data are the *stat (iostat, vmstat, netstat) tools and sar. On Linux you'll find all this information in /proc, but most Unixes don't have /proc nicely filled with what you are looking for. The mentioned tools are quite standardized and can be used to gather the data you need.

Resources