What kind of API does a SATA hard drive expose? (Linux)

I understand that the Linux kernel uses a driver to communicate with the hard disk device and that there is firmware code on the device to service the driver's requests. My questions are:
What kind of functionality (i.e., API) does the firmware expose? For example, does it only expose an address space that the kernel manages, or is there code in the Linux kernel that deals with some of the physics of the hard drive (i.e., data layout on track/sector/platter, etc.)?
Does the kernel schedule the disk's head movement, or does the firmware?
Is there a standard spec for the APIs exposed by hard disk devices?

I understand that the linux kernel uses a driver to communicate with the hard disk device
That's true for all peripherals.
there is firmware code on the device to service the driver's requests
Modern HDDs (since the advent of IDE) have an integrated disk controller.
"Firmware" by itself isn't going to do anything, and is an ambiguous description. I.E. what is executing this "firmware"?
what kind of functionality (i.e., API) does the firmware expose? For example, does it only expose an address space that the kernel manages, or is there code in the Linux kernel that deals with some of the physics of the hard drive (i.e., data layout on track/sector/platter, etc.)?
SATA drives use the ATA command set, as published in the ATA/ATAPI specifications (ATAPI, the packet interface, is the feature set used by devices such as optical drives).
The old SMD and ST506 drive interfaces used cylinder, head, and sector (aka CHS) addressing. Disk controllers for such drives typically kept a similar interface on the host side, so the operating system was obligated to be aware of the drive (physical) geometry. OSes would try to optimize performance by aligning partitions to cylinders, and minimize seek/access time by ordering requests by cylinder address.
Although the disk controller typically required CHS addressing, the higher layers of an OS would use a sequential logical sector address. Conversion from a logical sector address to a cylinder, head, and sector address is straightforward as long as the drive geometry is known.
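To make that conversion concrete, here is a small sketch (my addition; the geometry numbers are illustrative, not from any real drive):

#include <stdio.h>

/* Classic LBA -> CHS conversion, assuming the old fixed-geometry model
 * (heads-per-cylinder and sectors-per-track constant across the drive).
 * The geometry values below are illustrative only. */
struct chs { unsigned c, h, s; };

static struct chs lba_to_chs(unsigned lba, unsigned heads, unsigned spt)
{
    struct chs a;
    a.c = lba / (heads * spt);          /* cylinder */
    a.h = (lba / spt) % heads;          /* head within the cylinder */
    a.s = (lba % spt) + 1;              /* sector: 1-based by convention */
    return a;
}

int main(void)
{
    struct chs a = lba_to_chs(5000, 16, 63);   /* 16 heads, 63 sectors/track */
    printf("LBA 5000 -> C=%u H=%u S=%u\n", a.c, a.h, a.s);
    return 0;
}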
The SCSI and IDE (ATA) interfaces for the host side of the disk controller offered logical block addressing (block = sector) rather than CHS addressing. The OS no longer had to be aware of the physical geometry of the drive, and the disk controller was able to use the abstraction of logical addressing to implement a more consistent areal density per sector using zone-bit recording.
So the OS should only issue a read or write block operation with a logical block address, and not be too concerned with the drive's geometry.
For example, low-level format is no longer possible through the ATA interface, and the drive's geometry is variable (and unknown to the host) due to zone-bit recording. Bad sector management is typically under sole control of the integrated controller.
However, you can probably still find remnants of CHS optimization in various OSes (e.g., drive partitions aligned to a "cylinder").
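To illustrate the LBA-only view (my own sketch, not part of the original answer): user space on Linux can read any logical block with a plain pread(); drive geometry never enters into it. The device path and LBA below are placeholders.

/* Sketch: read one 512-byte logical block by LBA from a block device.
 * /dev/sda and LBA 2048 are placeholders; run as root. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const uint64_t lba = 2048;               /* logical block address */
    unsigned char sector[512];
    int fd = open("/dev/sda", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }
    if (pread(fd, sector, sizeof sector, (off_t)(lba * 512)) != sizeof sector) {
        perror("pread");
        close(fd);
        return 1;
    }
    printf("first byte of LBA %llu: 0x%02x\n",
           (unsigned long long)lba, sector[0]);
    close(fd);
    return 0;
}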
Does the kernel schedule the disk's head movement, or is it the firmware?
A seek can be requested explicitly with a seek operation, but more likely the OS uses read/write operations with auto-seek, or LBA read/write operations.
However, with LBA and modern HDDs that have sizeable caches and zone-bit recording, such explicit seek operations are not needed and can be counterproductive.
Ultimately the disk controller performs the actual seek.
Is there a standard spec for the apis exposed by hard disk devices?
ATA/ATAPI is a published specification (although it seems to have remained in a "working draft" state for 20 years).
See http://www.t13.org/Documents/UploadedDocuments/docs2013/d2161r5-ATAATAPI_Command_Set_-_3.pdf
ABSTRACT
This standard specifies the AT Attachment command set used to communicate between host systems and storage devices. This provides a common command set for systems manufacturers, system integrators, software suppliers, and suppliers of storage devices. The AT Attachment command set includes the PACKET feature set implemented by devices commonly known as ATAPI devices. This standard maintains a high degree of compatibility with the ATA/ATAPI Command Set - 2 (ACS-2).

Related

What is the difference between DMA-Engine and DMA-Controller?

As mentioned above, what is the difference between a DMA engine and a DMA controller (with a focus on Linux)?
When does the Linux DMA engine come into play? Is it a special device, or is it always part of every peripheral device that supports DMA?
When browsing the Linux source, I found the driver ste_dma40.c. How does a driver use this engine?
DMA - direct memory access: the operation of your driver reading or writing from/to your HW memory without the CPU being involved (freeing it to do other work).
DMA controller - reading and writing can't be done by magic. If the CPU doesn't do it, we need another piece of HW to do it. Many years ago (at the time of ISA/EISA) it was common to use shared HW on the motherboard that performed this operation. In recent years, each HW device has its own DMA mechanism.
In all cases this specific HW is given the source address and the destination address and moves the data, usually triggering an interrupt when done.
DMA engine - here I am not sure what you mean. I believe you are probably referring to the SW side that handles the DMA.
DMA is a little more complicated than usual I/O since all source and destination memory has to be physically present for the entire duration of the DMA operation. If the destination address is swapped out to disk, the HW will write to a bad address and the system will crash.
This and other aspects of DMA are handled by the driver, in code sections you are probably referring to as the "DMA engine".
*Another interpretation of "DMA engine" is a piece of firmware (or HW) that drives the DMA controller on the HW side.
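For the Linux side specifically, here is a hedged sketch of how a kernel driver acts as a client of the dmaengine framework (which a controller driver such as ste_dma40.c plugs into). The dst/src arguments are assumed to be bus addresses already mapped with the DMA API; error handling is trimmed.

#include <linux/completion.h>
#include <linux/dmaengine.h>
#include <linux/errno.h>

static void demo_dma_done(void *param)
{
    complete(param);                    /* wake the submitting thread */
}

/* Copy len bytes from src to dst using any memcpy-capable DMA channel.
 * dst and src must already be DMA (bus) addresses, e.g. from dma_map_single(). */
static int demo_dma_memcpy(dma_addr_t dst, dma_addr_t src, size_t len)
{
    struct dma_async_tx_descriptor *tx;
    struct completion done;
    struct dma_chan *chan;
    dma_cap_mask_t mask;

    dma_cap_zero(mask);
    dma_cap_set(DMA_MEMCPY, mask);
    chan = dma_request_channel(mask, NULL, NULL);   /* any suitable channel */
    if (!chan)
        return -ENODEV;

    tx = dmaengine_prep_dma_memcpy(chan, dst, src, len, DMA_PREP_INTERRUPT);
    if (!tx) {
        dma_release_channel(chan);
        return -EIO;
    }

    init_completion(&done);
    tx->callback = demo_dma_done;
    tx->callback_param = &done;
    dmaengine_submit(tx);
    dma_async_issue_pending(chan);      /* actually start the transfer */

    wait_for_completion(&done);         /* controller raises an interrupt when done */
    dma_release_channel(chan);
    return 0;
}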
According to this document, http://www.asprom.com/application/intel_3.pdf:
The 82C37 DMA controllers should not be confused with the DMA engines found in some earlier MCH (Memory Controller Hub) components. These DMA controllers are tied to the ISA/LPC bus and used mostly for transfers to/from slow devices such as floppy disk controllers.
So it seems to be a device found on older platforms that used MCH components.

Application processor memory map

What information is contained in the memory map of an application processor? Does it tell which subsystem can access which area of RAM, or does it mean that when the CPU accesses an address, the memory map determines whether it is a RAM address or a device address? I am referring to this documentation:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0515b/CIHIJJJA.html.
Here 0x00_0000_0000 to 0x00_0800_0000 is mapped to the boot region; what does that imply?
The style of memory map diagram you've linked to shows how the processor and peripherals will decode physical memory addresses. This is a normal diagram for any System-on-Chip device, though the precise layout will vary. The linked page actually lists which units of the SoC use this memory map for their address decoding, and it includes the ARM and the Mali graphics processor. In a Linux system, much of this information will be passed to the kernel in the device tree. It's important to remember that this tells us nothing about how the operating system chooses to organise the virtual memory addresses.
Interesting regions of this are:
DRAM - these addresses will be passed to the DRAM controller. There is no guarantee that the specific board being used has DRAM at all of that address space. The boot firmware will set up the DRAM controller and pass those details to the operating system.
PCIe - these addresses will be mapped to the PCIe controller, and ultimately to transfers on the PCIe links.
The boot region on this chip by default contains an on-chip boot ROM and working space. On this particular chip there's added complexity caused by ARM's TrustZone security architecture, which means that application code loaded after boot may not have access to this region. On the development board it should be possible to override this mapping and boot from external devices.
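As a toy illustration (my addition): physical address decoding on an SoC amounts to a table lookup. Only the boot region's range below comes from the question; the other entries are placeholders, not the real map from the linked manual.

#include <stdint.h>
#include <stdio.h>

/* Toy physical-address decoder. The boot region range is from the question;
 * the PCIe and DRAM entries are placeholders for illustration. */
struct region { uint64_t base, size; const char *target; };

static const struct region soc_map[] = {
    { 0x0000000000ULL, 0x08000000ULL, "boot ROM / working space" },
    { 0x0040000000ULL, 0x10000000ULL, "PCIe controller (placeholder)" },
    { 0x0080000000ULL, 0x80000000ULL, "DRAM controller (placeholder)" },
};

static const char *decode(uint64_t pa)
{
    for (unsigned i = 0; i < sizeof soc_map / sizeof soc_map[0]; i++)
        if (pa >= soc_map[i].base && pa < soc_map[i].base + soc_map[i].size)
            return soc_map[i].target;
    return "unmapped -> bus error";
}

int main(void)
{
    printf("0x0000_1000 -> %s\n", decode(0x1000ULL));
    printf("0x90_0000_00 -> %s\n", decode(0x0090000000ULL));
    return 0;
}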
The memory map describes the layout of the memory of your device.
It tells your OS where it can place data and how that memory is accessed, as some areas may only be accessible in a privileged state.
Your boot image will be placed in the boot area. Among other things, this defines your entry point.

What is partition checker in ARM Secure Mode

As per this link
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0333h/Chdfjdgi.html
under
System boot sequence
...
Program the partition checker to allocate physical memory available to the Non-secure OS.
What is the partition checker? Is it a subsystem which has registers? What is its programming model?
What is the partition checker?
It is outside of the TrustZone specification for the CPU. However, in a nutshell, it partitions (or divides) memory spaces into different permitted accesses. If an access is not permitted, it raises an external bus error.
Is it a subsystem which has registers? What is its programming model?
Typically, it is a bunch of registers. It may be multiple register files. For instance, an APB (peripheral bus), an AHB (older ARM bus), and a newer AXI (TrustZone-aware) bus may all be present in one system. There may even be multiple APB buses, etc.
From the same page,
The principle of TrustZone memory management is to partition the physical memory into Secure and Non-secure regions.
It should be added that partitioning the masters as secure and non-secure is also important. The partitioning is outside the ARM CPU TrustZone specification; it is part of the BUS architecture. It is up to a bus controller/structure to implement this. The bus controller has both masters (CPUs, DMA peripherals, etc) and slaves (memory devices, register interfaces, etc) connected.
Partitioning in the context of the ARM TrustZone document is a little nebulous, as it is up to each SoC and its bus controllers (and hierarchy) to implement the details. As above, it partitions (or divides) memory spaces into different permitted accesses. This is just like supervisor versus user access with traditional ARM AMBA AHB buses. The AXI interface adds an NS bit.
Here are possible combinations for a bus controller to support.
             | Read   | Write
-------------+--------+-------
Normal User  | yes/no | yes/no
Normal Super | yes/no | yes/no
Secure User  | yes/no | yes/no
Secure Super | yes/no | yes/no
The SCR NS bit dynamically determines whether the NS bit is set on bus accesses; this is a TrustZone difference. For super versus user, there is the traditional HPROT bit. As well, each master will assert a WRITE/~READ signal (the polarity may differ, but we are software, not hardware).
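For illustration (my addition, not part of the original answer): on an ARMv7 CPU the SCR lives in CP15 (c1, c1, 0) and its bit 0 is NS. A privileged read from the secure world looks like this; it traps if executed from the normal world or user mode.

/* Sketch: read the ARMv7 Secure Configuration Register (CP15 c1, c1, 0).
 * Bit 0 is NS. Only executable from a privileged secure mode. */
#include <stdint.h>

static inline uint32_t read_scr(void)
{
    uint32_t scr;
    __asm__ volatile("mrc p15, 0, %0, c1, c1, 0" : "=r"(scr));
    return scr;
}

static inline int in_normal_world(void)
{
    return read_scr() & 1;              /* SCR.NS */
}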
A DMA master (Ethernet, USB, etc.) may also send out requests to a bus. Typically, these are set up and locked at boot time. If your secure world uses the Ethernet, then it is probably a secure DMA master accessing secure memory. The Ethernet chip also typically has a slave register interface, which must be marked (or partitioned) as secure. If the normal world accesses the Ethernet register file, a bus error is thrown. A vendor may also make DMA peripherals that dynamically set the NS bit depending on the command structure. The CAAM is a crypto engine whose driver can set up job descriptors handling both normal and secure access; it is an example of a DMA master which does both.
A CPU (say Cortex-M4 or Cortex-R) may also be globally secure or normal. Only the Cortex-A series (and ARMv6) with full TrustZone will dynamically toggle the NS bit allowing the CPU to be both secure and normal, depending on context.
Slave peripherals may be partitioned. For example, the first 10MB of SDRAM may be readable and writable by both normal and secure worlds, for inter-world communication. The next 54MB may be read/write for the normal world only. Then a final 64MB may be read/write for the secure world only. Typically, register interfaces for peripherals are an all-or-none setup.
These are all outside the scope of an MMU and deal only with physical addresses. If the SoC locks them after boot, it is impossible for anyone to change the mapping. If the secure-world code is read-only, it may be more difficult to engineer an exploit.
Typically, all APB buses are layered on an AHB bus, which connects to an AXI main bus like a tree. The AXI bus is the default for a Cortex-A. Each bus will have a list of slaves and masters and will support various yes/no configurations, which may be a subset of the list above; i.e., it may not care about read/write or super/user or some other permutations. It will be different for each ARM system. In some cases, a vendor may not even support it, in which case it may be more difficult to make the system secure or even to use TrustZone. See: Handling ARM TrustZones, where some of the bus issues are touched on in less detail.
See: TrustZone versus Hypervisor which gives some more details.

How can the linux kernel be forced to enumerate the PCI-e bus?

Linux kernel 2.6
I've got an FPGA that is loaded over GPIO, connected to a development board running Linux.
The FPGA will transmit and receive data over the PCI Express bus. However, the bus is enumerated at boot, and as such no link is discovered (because the FPGA is not loaded at boot).
How can I force re-enumeration of the PCIe bus in Linux?
Is there a simple command, or will I have to make kernel changes?
I need the capability to hotplug PCIe devices.
As root, try the following command:
echo "1" > /sys/bus/pci/rescan
See this link for more information: http://www.kernel.org/doc/Documentation/ABI/testing/sysfs-bus-pci
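As a hedged sketch of doing the same from C, per the sysfs-bus-pci ABI documented at that link, including first removing a stale device entry (the BDF path is a placeholder):

/* Sketch: remove a stale PCI device node, then rescan the bus, via sysfs.
 * The BDF 0000:01:00.0 is a placeholder; run as root. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int write_one(const char *path)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0) { perror(path); return -1; }
    if (write(fd, "1", 1) != 1) { perror(path); close(fd); return -1; }
    close(fd);
    return 0;
}

int main(void)
{
    /* drop the old device entry, if one exists */
    write_one("/sys/bus/pci/devices/0000:01:00.0/remove");
    /* walk the bus again and (re)discover devices */
    return write_one("/sys/bus/pci/rescan") ? 1 : 0;
}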
I wonder what platform you are on: a workaround (aka hack) that works on x86 systems is to have the BIOS statically configure a PCI device at whatever bus, device, and function the FPGA normally lands on; then the OS will enumerate the device and reserve the PCI space for it (even though the device isn't really there). Then, in your device driver, you will have to do some extra things, like setting up the BARs and interrupt lines manually after the FPGA has been programmed. Of course this requires modifying the BIOS. If you are working with a BIOS vendor, you can contract them to make this change for you; if you are not, it will be much harder... Also keep in mind that I was working on VxWorks on x86, and we had AMI make a custom BIOS for our boards.
If you don't have a BIOS, then consider programming the FPGA in the bootloader; there you already have the ability to read from disk, and adding GPIO capability probably isn't too difficult (assuming you are using JTAG and GPIOs?). In fact, depending on which bootloader you use, it might already support GPIO.
The issue with modifying the kernel to do this is that you have to find the sweet spot where you can read the bitfile before PCI enumeration... If, for example, the disk device drivers are initialized after PCI, then you must make some radical changes to the kernel just to read the bitfile prior to PCI enumeration, which might cause other annoying problems...
One other option, which you may have already discovered, and which is really only OK during development: power up the system, program the FPGA board, then do a reset without a power cycle (for example: sudo reboot now). The FPGA should keep its configuration, and Linux should enumerate it.
After turning on your computer, the BIOS enumerates the PCI bus and attempts to fulfill all I/O-space and memory-mapped I/O (MMIO) requests. It sets up these BARs initially, and when the operating system loads, the BARs can be changed by the OS as it sees fit while the PCI bus driver enumerates the bus yet again. It is even possible for the superuser to run the setpci command to change the BARs after the BIOS has configured them and the OS has loaded (which may cause drivers to fail, and several other bad things, if done improperly).
I have had to do this in cases where the card in question was not assigned any resources by the BIOS, because the region requested required a 64-bit address and the BIOS only operated with 32-bit address assignments. I was able to go in after the fact and change those BIOS-assigned addresses to whatever I saw fit, insert the kernel module, and my driver would map and use the newly assigned addresses for the card without knowing the difference.
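As an aside (my sketch, not part of the original answer): once the BARs are assigned, user space can map a BAR through the device's sysfs resource file. The BDF path and BAR size below are placeholders.

/* Sketch: mmap BAR0 of a PCI device via sysfs (path is a placeholder).
 * Requires root; the resource0 file corresponds to BAR0. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    const size_t bar_len = 4096;         /* assume a 4 KiB BAR for the demo */
    int fd = open("/sys/bus/pci/devices/0000:01:00.0/resource0", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    volatile uint32_t *bar = mmap(NULL, bar_len, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, 0);
    if (bar == MAP_FAILED) { perror("mmap"); return 1; }

    printf("first register: 0x%08x\n", bar[0]);
    munmap((void *)bar, bar_len);
    close(fd);
    return 0;
}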
The problem with hotplugging PCI Express cards is that the power to the slot itself cannot be turned on or off without specific hotplug controllers on the motherboard/backplane. Without these hotplug controllers to cut the slot's power, shorts may occur between the tiny pins when the card is physically inserted or removed while power is present. Hotplug events, however, can be initiated by either end (the host or the endpoint device). That does not seem to be the case here; however, if your FPGA already has a link established with the root complex, a possible solution to your problem would be to generate hotplug interrupts to cause a bus rescan in the OS.
There is a major problem, though: if your card does not actually establish a link to the root complex, it won't be able to generate any hotplug events, which seems to be your situation. After booting, the FPGA should toggle the PRESENT line on the PCIe bus to tell the OS there is a card ready to be enumerated. Once it is detected, the OS should attempt to establish a link to the card and assign memory regions to the device. After the OS enumerates the card you'll be able to load drivers against it and see it in lspci. You stated you're using kernel 2.6, which does support hotplugging and dynamic resource allocation, so this method should work as long as your FPGA supports toggling the PRESENT PCIe line, too.

Hardware clock signals implementation in Linux Kernel

I am looking for some pointers to understand how the Linux kernel implements the setting up of various hardware clocks. This basically relates to setting up the clocks that hardware features like the LCD, UART, etc. will use. For example, when Linux boots, how does it handle setting up the clocks for the UART or USB? Maybe there is something like a clock manager.
I am basically trying to implement something similar for a different OS on new hardware that I am working on. Any help would be really appreciated.
[Edit]
Thanks for the replies and the links. Here is what I have implemented up until now; this should give you an idea of where I'm headed.
I looked up the hardware reference manual (HRM) for the particular system I'm targeting and wrote some code to monitor/modify the signals/pins of the peripherals I am interested in, i.e., turning them on/off from the command line. A collection of these clocks/signals together controls a peripheral. The HRM says that if you want to turn on the UART or something, then turn on such-and-such signals/pins. And @BjoernD, yes, I am using something like mmap() to talk to the peripherals.
The meat of my question is that I want to understand the design and implementation of a clock/peripheral manager which uses the utility that I have already written. This clock/peripheral manager would give me control of enabling/disabling the peripherals I want. Basically, this manager would let me make changes to the init code that is running right now. Also, at run time, processes could call this manager to turn devices on/off so that power consumption is optimized. It might not make perfect sense yet; I'm still trying to wrap my head around it myself.
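To make this concrete, here is a minimal sketch of the kind of manager I have in mind (the register address and bit assignments are hypothetical; the real ones come from the HRM):

/* Hypothetical clock/peripheral manager built on /dev/mem + mmap().
 * CLK_GATE_BASE and the bit positions are made up for illustration;
 * real values come from the SoC's hardware reference manual. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define CLK_GATE_BASE 0x4A000000u   /* hypothetical clock-gating register */

enum periph { PERIPH_UART = 0, PERIPH_USB = 1, PERIPH_LCD = 2 };

static volatile uint32_t *clk_gate;

static int clk_mgr_init(void)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) { perror("open /dev/mem"); return -1; }
    clk_gate = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED,
                    fd, CLK_GATE_BASE);
    close(fd);                       /* the mapping stays valid after close */
    return clk_gate == MAP_FAILED ? -1 : 0;
}

static void clk_enable(enum periph p)  { *clk_gate |=  (1u << p); }
static void clk_disable(enum periph p) { *clk_gate &= ~(1u << p); }

int main(void)
{
    if (clk_mgr_init())
        return 1;
    clk_enable(PERIPH_UART);         /* power up the UART clock */
    clk_disable(PERIPH_LCD);         /* gate off the LCD to save power */
    return 0;
}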
Now I'm sure something like this has been implemented in Linux, or for that matter in any OS, for power reasons (nobody would want to waste power by turning on all peripherals at boot time). I want to understand its software architecture. A reference from any OS would do for now, to at least get a head start. Also, I am not writing my own OS; there is an OS in place, but I'm looking more at board-level software, aka a BSP, to work on. But thanks for the OS links anyway; they are really good. I appreciate it.
Thanks!
What you want to achieve is highly specific to a) the platform you are using and b) the device you want to use. For instance, on x86 there are 3 ways to communicate with a device:
Interrupts allow the device to signal the CPU. The OS usually provides mechanisms to register interrupt handlers - functions that are called upon the occurrence of an interrupt. In Linux, see request_irq() and friends in include/linux/interrupt.h (a sketch follows this list).
Memory-mapped I/O is physical memory of the device that the platform's BIOS makes available in the same way you also access plain physical memory - simply by writing to a memory address. What exactly is behind such memory (e.g., network interface config registers or an LCD frame buffer) depends on the device and is usually specified in the device's data sheet.
I/O ports are accessed through a special address space and special instructions (INB/OUTB & co.). Other than that they work similarly to I/O memory.
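Here is a minimal sketch of the request_irq() pattern mentioned above (the IRQ number and device name are placeholders, not from any real board):

/* Sketch: registering an interrupt handler in a Linux driver.
 * DEMO_IRQ and the "demo" name are placeholders. */
#include <linux/init.h>
#include <linux/interrupt.h>
#include <linux/module.h>

#define DEMO_IRQ 42                       /* placeholder IRQ line */

static irqreturn_t demo_isr(int irq, void *dev_id)
{
    /* acknowledge the device here, then report the IRQ as handled */
    return IRQ_HANDLED;
}

static int __init demo_init(void)
{
    /* IRQF_SHARED lets several devices share the line; dev_id must then
     * be a unique, non-NULL cookie per handler. */
    return request_irq(DEMO_IRQ, demo_isr, IRQF_SHARED, "demo", &demo_isr);
}

static void __exit demo_exit(void)
{
    free_irq(DEMO_IRQ, &demo_isr);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");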
There's a multitude of ways to find out what resources a device provides and where the BIOS mapped them. Some platforms use ACPI tables (google for the 1,000+ page spec), PCI provides info on devices in a standardized way through the PCI config space, USB has similar ways of discovering devices attached to the bus, and some devices, e.g., UARTs, are simply specified to be available at a pre-configured I/O range that is fixed for your platform.
As a start for understanding Linux, I'd recommend "Understanding the Linux kernel". For specifics on how Linux handles devices and what is there to write drivers, have a look at Linux Device Drivers. Furthermore, you will need to have a look at the peculiarities of your platform and the device you want to drive.
If you want to start your own OS, a UART is certainly something that will be very helpful for printing debug output, so you might want to go for this first.
Now that I have written all this down, it seems that your actual question is: how do I get started with operating system design? This question should be highly valuable for you: What are some resources for getting started in operating system development?
The two big power users in most computers are the CPU and the disks. Both have power-saving capabilities in Linux: the CPU clock can be slowed down when the system is not busy, and the disk motors can be stopped when no I/O is happening. For a UART, even if you save all of the power it uses by turning off its clock, the saving is still tiny compared to the others, because a UART doesn't have much logic in it.
The best ways to save power are:
1) a more efficient power supply,
2) replacing the rotating disk with an SSD,
3) slowing down the CPU and memory bus.
