Soft real time optimization for Applications under Windows/Linux possible?

Soft real time optimization for Applications under Windows/Linux possible? - linux

I'm trying to figure out how to optimize different OS and Hardware for latency.
In the past few weeks I've been tinkering with different versions of Windows and different Hardware and became pretty disappointed with what room for improvement these things have to offer in that regard.
I wanted to ask here if maybe I'm missing something or am wrong about something.
Let's say we have a system with all the standard components. All of these components send interrupt requests to the cpu when then want to complete an operation. Cpu spends cycles completing this request. All cycles have fixed length.
Basically, most of how the interrupt requests are handled is written within the drivers for your hardware and if it's badly optimized your only option is to try another version of the drivers.
The most you can do to optimize the system is disconnecting devices you don't need or enabling MSI-Mode through the registry if the hardware supports it.
Are there some advantages that you have when choosing a particular OS (w7/w8/w10/linux) that give you some benefits or more control over your system's latency.
Any input would be greatly appreciated.
Regards

Related

Selecting the right Linux I/O scheduler for a host equipped with NVMe SSD?

We are writing a highly concurrent software in C++ for a few hosts, all equipped with a single ST9500620NS as the system drive and an Intel P3700 NVMe Gen3 PCIe SSD card for data. Trying to understand the system more for tuning our software, I dug around the system (two E5-2620 v2 # 2.10GHz CPUs, 32GB RAM, running CentOS 7.0) and was surprised to spot the following:
[root#sc2u0n0 ~]# cat /sys/block/nvme0n1/queue/scheduler
none
This contradicts to everything that I learned about selecting the correct Linux I/O scheduler, such as from the official doc on kernel.org.
I understand that NVMe is a new kid on the block, so for now I won't touch the existing scheduler setting. But I really feel odd about the "none" put in by the installer. If anyone who has some hints as to where I can find more info or share your findings, I would be grateful. I have spent many hours googling without finding anything concrete so far.

The answer given by Sanne in the comments is correct:
"The reason is that NVMe bypasses the scheduler. You're not using the "noop" implementation: you're not using a scheduler."
noop is not the same as none, noop still performs block merging (unless you disable it with nomerges)
If you use an nvme device, or if you enable "scsi_mod.use_blk_mq=Y" at compile time or boot time, then you bypass the traditional request queue and its associated schedulers.
Schedulers for blk-mq might be developed in the future.

"none" (aka "noop") is the correct scheduler to use for this device.
I/O schedulers are primarily useful for slower storage devices with limited queueing (e.g, single mechanical hard drives) — the purpose of an I/O scheduler is to reorder I/O requests to get more important ones serviced earlier. For a device with a very large internal queue, and very fast service (like a PCIe SSD!), an I/O scheduler won't do you any good; you're better off just submitting all requests to the device immediately.

Linux/Windows: User mode vs Privilege mode time spent

How much percentage of time CPU spends in user mode vs privilege mode for different programs/operations.
Different Operations could be:
- running application without I/O interaction.
- application with I/O interaction like copying a file to USB
I know for a fact that Network operating system spends most of the time in interrupt context. Does this hold true for general purpose OS like Ubuntu/Windows?

I'm not much of an OS expert but I imagine it will depend a great deal on what background processes are running on the system. On any OS you might or might not be running some system (i.e. non-user) processes that are heavy resource users. Or you might have put some effort into stripping the system down so that very little CPU time is being used by the system for background maintenance.
If your question is how things compare for "clean" installations of these operating systems then all I can tell you is that on my laptop running Ubuntu right now (running top from the command line to look at resource usage) only about 5-10% of CPU time is being used by non-user processes; in my case Xorg and compiz are the main ones. I don't really know how that compares to Windows, but I think most linux users have a knee jerk reaction that Windows is greedier for system resources than most linux distros.
So, I guess the short answer is that I doubt there is a short answer to your question.

Hardware clock signals implementation in Linux Kernel

I am looking at some pointers for understanding how the Linux kernel implements the setting up of various hardware clocks. This basically relates to working with setting up the various clocks that hardware features like the LCD, UART etc will use. For example when Linux boots how does it handle setting up the clocks for UART or USB. Maybe something like a Clock manager or something.
I am basically trying to implement something similar for a different OS on a new hardware that i am working on. Any help would be really appreciated.
[Edit]
Thanks for the replies and the links. So here is what i have implemented up until now. This should give you an idea of where I'm headed.
I looked up the Hardware Reference Manual for the particular system I'm targeting and wrote some code to monitor/modify the signals/pins of the peripherals I am interested in i.e. turning them ON/OFF from the command line.Now a collection of these clocks/signals together control a peripheral.The HRM would say that if you want to turn on the UART or something then turn on such and such signals/pins. And #BjoernD yes I am using something like a mmap() function to talk to the peripherals.
The meat of my question is that I want to understand the design and implementation of a Clock/Peripheral Manager which uses the utility that I have already written. This Clock/Peripheral Manager would give me the control of enabling/disabling the peripherals I want.Basically this Manager would enable me to make changes in the init code that is right now running. Also during run time processes can call this Manager to turn ON/OFF the devices so that power consumption is optimized. It might not have made perfect sense but I'm myself trying to wrap my head around this.
Now I'm sure something like this would have been implemented in Linux or for that matter any OS for performance issues (nobody would want to waste power by turning on all peripherals at boot time). I want to understand the Software Architecture of it. Reference from any OS would do as of now to atleast get a headstart. Also I am not writing my own OS, there is an OS in place but Im looking more at a board level software aka BSP to work on. But thanks for the OS link anyways, they are really good. Appreciate it.
Thanks!

What you want to achieve is highly specific to a) the platform you are using and b) the device you want to use. For instance, on x86 there are 3 ways to communicate with a device:
Interrupts allow the device to signal the CPU. The OS usually provides mechanisms to register interrupt handlers - functions that are called upon occurrence of an interrupt. In Linux see request_irq() and friends in linux/include/interrupt.h
Memory-mapped I/O is physical memory of the device that the platform's BIOS makes available in the same way you also access plain physical memory - simply by writing to a memory address. What exactly is behind such memory (e.g., network interface config registers or an LCD frame buffer) depends on the device and is usually specified in the device's data sheet.
I/O ports are accessed through a special address space and special instructions (INB/OUTB & co.). Other than that they work similar to I/O memory.
There's a multitude of ways to find out what resources a device provies and where the BIOS mapped them. Some platforms use ACPI tables (google yourself for the 1,000k page spec), PCI provides info on devices in a standardized way through the PCI config space, USB has similar ways of discovering devices attached to the bus, and some devices, e.g., UARTS, are simply specified to be available at a pre-configured I/O range that is fixed for your platform.
As a start for understanding Linux, I'd recommend "Understanding the Linux kernel". For specifics on how Linux handles devices and what is there to write drivers, have a look at Linux Device Drivers. Furthermore, you will need to have a look at the peculiarities of your platform and the device you want to drive.
If you want to start an own OS, a UART is certainly something that will be veeery helpful to print debug output, so you might want to go for this first.
Now that I wrote down all this, it seems that your actual question is: How to get started with Operating System design. This question should be highly valuable for you: What are some resources for getting started in operating system development?

The two big power users in most computers are the CPU and the disks. Both of these have capabilities for power saving in Linux. The CPU clock can be slowed down when the system is not busy, and the disk motors can be stopped when no I/O is happening. For a UART, even if you save all of the power that it uses by turning off its clock, it is still tiny compared to the others because a UART doesn't have much logic in it.
Best ways to save power are
1) more efficient power supply
2) replace rotating disk with SSD
3) Slow down the CPU and memory bus

How to do power save on a ARM-based Embedded Linux system?

I plan to develop a nice little application that will run on an arm-based embedded Linux platform; however, since that platform will be battery-powered, I'm searching for relevant information on how to handle power save.
It is kind of important to get decent battery time.
I think the Linux kernel implemented some support for this, but I can't find any documentation on this subject.
Any input on how to design my program and the system is welcome.
Any input on how the Linux kernel tries to solves this type of problem is also welcome.
Other questions:
How much does the program in user space need to do?
And do you need to modify the kernel?
What kernel system calls or APIs are good to know about?
Update:
It seems like the folks involved with the "Free Electrons" site have produced some nice presentations on this subject.
http://free-electrons.com/services/power-management/
http://free-electrons.com/docs/power
http://free-electrons.com/docs/optimizations
But maybe someone else has even more information on this subject?
Update:
It seems like Adam Shiemke's idea to go look at the MeeGo project may be the best tip so far.
It may be the best battery powered Embedded Linux project out there at this moment.
And Nokia is usually kind of good at this type of thing.
Update:
One has to be careful about Android since it has a "modified" Linux kernel in the bottom, and some of the things the folks at Google have done do not use baseline/normal Linux kernels. I think that some of their power management ideas could be troublesome to reuse for other projects.

I haven't actually done this, but I have experience with the two apart (Linux and embedded power management). There are two main Linux distributions that come to mind when thinking about power management, Android and MeeGo. MeeGo uses (as far as I can tell) an unmodified 2.6 kernel with some extras hanging on. I wasn't able to find a lot on exactly what their power management strategy is, although I suspect more will be coming out about it in the near future as the product approaches maturity.
There is much more information available on Android, however. They run a fairly heavily modified 2.6 kernel. You can see a good bit on the different strategies implemented in http://elinux.org/Android_Power_Management (as well as kernel drama). Some other links:
https://groups.google.com/group/android-kernel/browse_thread/thread/ee356c298276ad00/472613d15af746ea?lnk=raot&pli=1
http://www.ok-labs.com/blog/entry/context-switching-in-context/
I'm sure that you can find more links of this nature. Since both projects are open source, you can grab the kernel code, and probably get further information from people who actually know what they are talking about in forms and groups.
At the driver level, you need to make sure that your drivers can properly handle suspend and shut devices off that are not in use. Most devices aimed at the mobile market offer very fine-grained support to turn individual components off, and to tweak clock settings (remember, power is proportional to clock^2).
Hope this helps.

You can do quite a bit of power-saving without requiring any special support from the OS, assuming you are writing (or at least have the source code for) your application and drivers.
Your drivers need to be able to disable their associated devices and bring them back up without requiring a restart or introducing system instability. If your devices are connected to a PCI/PCIe bus, research which power states they support (D0 - D3) and what your driver needs to do to transition between these low-power modes. If you are selecting hardware devices to use, look for devices that adhere to the PCI Power Management Specification or have similar functionality (such as a sleep mode and a "wake up" interrupt signal).
When your device boots up, every device that has the ability to detect whether it is connected to anything needs to do so. If any ports or buses detect that they are not being used, power them down or put them to sleep. A port running at full power but sitting unused can waste more power than you might think it would. Depending on your particular hardware and use case, it might also be useful to have a background app that monitors device usage, identifies unused/idle resources, and acts appropriately (like a "screen saver" for your hardware).
Your application software should make sure to detect whether hardware devices are powered up before attempting to use them. If you need to access a device that might be placed in a low-power mode, your application needs to be able to handle a potentially lengthy delay in waiting for the device to wake up and respond. Your applications should also be considerate of a device's need to sleep. If you need to send a series of commands to a hardware device, try to buffer them up and send them out all at once instead of spacing them out and requiring multiple wakeup->send->sleep cycles.
Don't be afraid to under-clock your system components slightly. Besides saving power, this can help them run cooler (which requires less power for cooling). I have seen some designs that use a CPU that is more powerful than necessary by a decent margin, which is then under-clocked by as much as 40% (bringing the performance down to the original level but at a fraction of the power cost). Also, don't be afraid to spend power to save power. That is, don't be afraid to use CPU time monitoring hardware devices for opportunities to disable/hibernate them (even if it will cause your CPU to use a bit more power). Most of the time, this tradeoff results in a net power savings.

One of the most important things to think of as a power aware application developer is to avoid unnecessary timers. If possible use interrupt driven solutions instead of polled solutions. If a timer must be used then use as long poll interval as is possible.
For example if something special should be done at a certain room temperature it is unnecessary to check the temperature every 100 ms since temperature in a room changes slowly. A more reasonable polling interval is could be 60 s.
This affects the power consumption in several ways. In Linux the CPUIDLE subsystem takes the CPU (SOC) to as deep power saving state as possible depending on when it predicts the next wakeup to occur. Having a lot of timers in a system will fragment the sleep making it impossible to go to the deeper sleep states for longer periods. A typical deep sleep state for CPUIDLE turns the CPU off but keeps the RAM in self refresh. When a timer triggers the CPU will boot and serve the timer of the application.

It's not actually your topic, but it might come in handy to log your progress: i was looking for testing / measuring my embedded linux system. chris desjardins from this forum recommended me this:
I have successfully used bootchart in the past:
http://elinux.org/Bootchart
Here is a list of other things that may also help:
http://elinux.org/Boot_Time

Minimum configuration to run embedded Linux on an ARM processor?

I need to produce an embedded ARM design that has requirements to do many things that embedded Linux would do. However the design is cost sensitive and does not need huge amounts of horse power. Mostly will be talking to serial interfaces. Ideally I would like to use one of the low end ARMs. What is the lowest configuration of an ARM that you have successfully used embedded Linux on.
Edit:
The application needs a file system on some kind of flash device and the ability to run applications for processing the data. Some of the applications might be written by others than myself. I also need to ability to load new applications or update old apps using the serial ports to accept the apps.
When I have looked at other embedded OSes they seem to be more of a real time threading solution than having the ability to run applications. I am open to what ever will get the job done.

I think you need to weigh your cost options here.
ARM + linux is an option but you will be paying a very high operating overhead for such a simple (from your description) set of features. You can't just look at the cost of the ARM chip but must also consider external RAM which will very likely be required as well as flash to get enough space available to run the kernel + apps.
NOTE: you may be able to avoid the external requirements with a very minimal kernel and simple apps combined with a uC with large internal resources.
A second option is a much simpler microcontroller with a light weight OS. This will cut your hardware costs on the CPU and you can likely run something like this without external RAM or flash (dependent on application RAM and program space requirement)
third option: I don't actually see anything in your requirements that demands any OS at all be used. Basic file systems are very simple, for instance there are even FAT drivers out there for 8 bit PIC's. Interfacing to an SD card only requires a SPI port and minimal external circuitry.
The application bit could be simple or complex. I've built systems around PIC18 microcontollers that run a web server and allow program updates via a simple upload screen, it just stores the new program into an EEPROM or flash, reboots into a bootloader and copies the new program into internal program memory. You could likely design a way to do this without the reboot via a cooperative multitasking type of architecture. Any way you go the programmers writing the apps are going to need to have knowledge of the architecture and access to libraries / driver you write. Your best bet to simplify this is to provide as simple an API as possible and to try to automate the build process for them.
The third option will be the "cheapest" in terms of hardware as there will be very little overhead in the processing of your applications allowing you to get away with minimal processing power and memory. It likely will require some more programming/software architecting on your part but won't require nearly the research you will need to undertake to get linux up and running in addition to learning to write the needed device drivers under a linux paradigm.
As always you have to include the software development costs in the build cost of the device. If you plan to build 10,000+ of these your likely better off keeping hardware costs down and putting more man power into designing a software solution that allows that hardware to meet the design goals. If your building 10 of them, your better off spending an extra $15-20 on hardware if it can cut down on your software development costs. For example an ARM with MMU with full linux kernel support and available device drivers.
I kind of feel that your selecting the worst of both worlds at the moment, your paying extra to get a uC you can run linux on but by doing so your also selecting a part that will likely be the most complex to get linux up and running on, especially having not worked with linux on embedded platforms before.

I've had success even on ARM7TDMI, so I don't think you're going to have any trouble. If you have a low-requirements system, you could use any kind of lightweight real-time executive and have a lot better experience than you would getting Linux to work.

I've used a TS-7200 for about five years to run a web server and mail server, using Debian GNU Linux. It is 200 MHz and has 32 MB of RAM, and is quite adequate for these tasks. It has serial port built in. It's based on a ARM920T.
This would be overkill for your job; I mention it so you have another data point.

For several years I've been using a gumstix to do prototyping and testing and I've had good results with it. I don't know if the processor they are using (Intel PXA255 on my board) is considered low-cost, but the entire Verdex line seems pretty cheap to me for an adaptable device.

ucLinux is designed specifically for resource constrained targets, but perhaps more importantly for targets without an MMU.
However you have to have a good reason to use Linux on such a system rather than a small real-time executive. Out-of-the-box networking, readily available drivers and protocol stacks for complex hardware and support for existing POSIX legacy or open source code are a few perhaps. However if you don't need that, Linux is still large, and you may be squandering resources for no real benefit. In most cases you will still need off-chip SDRAM and Flash if you choose Linux of any flavour.
I would not regard serial I/O as 'complex hardware', so unless you are running a complex, but standard protocol, your brief description does not appear to warrant the use of Linux IMO

My DLINK DIR-320 router runs Linux inside.
And I know some handymen, flashing it with Optware and connecting USB-hub, HDDs, USB-flash, and much more.
It's low-cost ready for use "platform". (If you don't need mass production). But maybe more powerful than you need.
Additionally, it can be configured wirelessly via web-interface even through your pda :)

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string