Does disk IO correspond directly to its physical sector location?

I've been playing around with disk IO on flash drives, HDDs, and SSDs by opening /dev/sd* paths in Linux the way I would any other file.
I understand that the memory controller on the disk can hide true block order (via a mapping) from the OS.
This boils down to these questions:
Are the blocks in /dev/sd* in the order perceived by the OS, or in the order as perceived by the disk's memory controller?
Does the order of blocks in /dev/sd* vary between POSIX OSes?
Can these properties change on an NT or Cygwin system?
Is this property different among Flash, HDD, and SSD?
Can a write occur to a specific index in an opened /dev/sd* path, or is this determined by the memory controller?
Thanks in advance!

If you use the device nodes for entire disks (/dev/sda, /dev/sdb, and so on), then the file offsets for the block device correspond to logical block addresses and will be portable across systems (assuming that the disk sector size is supported). This is independent of the storage technology.
However, the names of the device nodes are different from system to system.
If you use sub-devices (partitions), this is not necessarily the case because interpretation of and support for partition tables varies considerably.
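To make the whole-disk case concrete, here is a minimal sketch (the device node /dev/sdX and the LBA are placeholders; on Linux the logical sector size can be queried with the BLKSSZGET ioctl): a pread() at offset LBA * sector size returns exactly that logical block, whatever remapping the drive performs internally.

    /* Minimal sketch: read one logical block from a whole-disk device node.
     * The device path and LBA are placeholders; needs root to open the disk.
     * A read-only open cannot modify anything. */
    #include <fcntl.h>
    #include <linux/fs.h>      /* BLKSSZGET */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    int main(void)
    {
        const char *dev = "/dev/sdX";   /* placeholder device node */
        long long lba = 2048;           /* placeholder logical block address */

        int fd = open(dev, O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        int sector_size = 512;
        if (ioctl(fd, BLKSSZGET, &sector_size) < 0)
            perror("BLKSSZGET (assuming 512)");

        char *buf = malloc(sector_size);
        /* File offset == LBA * logical sector size, regardless of how the
         * drive remaps or wear-levels blocks internally. */
        ssize_t n = pread(fd, buf, sector_size, (off_t)lba * sector_size);
        if (n < 0) perror("pread");
        else printf("read %zd bytes at LBA %lld\n", n, lba);

        free(buf);
        close(fd);
        return 0;
    }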

Related

where is disk scheduling implemented

I've recently been learning about the disk scheduling part of operating systems. I understand the various algorithms for this, like FCFS, LIFO, SSTF, SCAN and so on, but I was wondering where these algorithms are implemented.
I don't think the operating system is the answer, because the OS can't know the details of the I/O devices. So are they implemented on the devices themselves? Could anyone clarify this for me? Any related literature or links would be appreciated.
The simple answer is that these days, this all takes place in the drive controller.
In ye olde days, operating systems usually implemented disk I/O in two layers. At the top was a drive-independent logical layer, which viewed the drive as an array of blocks. Below this was a physical layer that viewed disks as platters, tracks, and sectors. Because the physical details varied among drives, the physical layer was usually implemented in a device driver specific to the disk (or class of disks).
In these dark times, you often had to wait for your drive vendor to create a new device driver before you could upgrade your operating system.
In the mid-1980s it started to become common for disk drives to provide a logical I/O interface. The device driver no longer saw disks/platters/sectors. Instead, it just saw an array of logical blocks. The drive took care of physical locations and the remapping of bad blocks (tasks that the operating system used to handle). This allowed a single device driver to manage multiple types of devices, sharing the same interface and differing only in the number of logical blocks.
These days, you'd be hard pressed to find a disk drive that does not provide a logical interface.
All the scheduling algorithms that involve physical locations have to take place within the disk drive.
Unless you are doing disk drive engineering, such scheduling algorithms are quite meaningless. If you are learning hard drive engineering, expect that occupation to disappear soon.
In practice, disk scheduling (in the sense of e.g. reordering the pending disk reads to minimize rotational delay) is less important today than it was in the 20th century.
Hard disks are used less these days, in favor of SSDs, and both remain very slow compared to RAM access times.
The disk sectors as seen by the kernel have been reorganized by the disk controller itself, so CHS addressing (as seen by the OS kernel) does not correspond to the physical geometry.
Hard disk drives are smarter today, and their internal controllers have significant memory and computing capabilities. The SATA protocol has some "higher level" requests (e.g. TRIM). Read about SMART and hybrid drives.
However, application code can give hints to the operating system about access patterns. Look for example into posix_fadvise(2).
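A minimal sketch of such a hint, assuming a placeholder path and keeping error handling to the bare minimum:

    /* Sketch: tell the kernel we will scan this file sequentially, so it can
     * read ahead more aggressively.  The path is a placeholder. */
    #define _POSIX_C_SOURCE 200112L
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/path/to/datafile", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        /* posix_fadvise returns an error number (it does not set errno). */
        int err = posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
        if (err != 0)
            fprintf(stderr, "posix_fadvise: %s\n", strerror(err));

        /* ... a normal read(2) loop would follow here ... */
        close(fd);
        return 0;
    }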
Read also Operating Systems: Three Easy Pieces

In a non-DMA scenario, does storage device/disk content go to CPU registers first and then to main memory during a disk read?

I am learning computer organization but struggling with the following concept. In non-DMA scenarios, do all disk reads follow the following sequence to get into main memory:
Disk storage surface -> Disk registers -> CPU registers -> Main memory
Similarly for writes, is the sequence:
Main memory -> CPU registers -> Disk registers -> Disk storage surface
(I know that in a DMA scenario, the CPU only initiates the transfer, after which the contents of the disk are transferred directly to main memory.)
If yes, before DMA came along, was the above sequence a serious bottleneck, given that the total capacity of the CPU registers is much smaller than main memory or disk storage? Or is it so fast that a human user won't notice in non-DMA modes?
PS: Please bear with my rudimentary terminology, but I hope I conveyed what I want to ask.
Yes, what you describe is what happened in the bad old days with programmed-I/O instead of DMA.
For example, IDE disk-controller hardware used to be less well standardized, so the Linux drivers defaulted to programmed I/O (i.e. a copy loop using x86 IN instructions, since ATA predated memory-mapped I/O registers being common). For decent performance, you had to manually enable DMA in your boot scripts.
But before doing that, you had to check that manually enabling DMA didn't lead to lockups or, far worse, data corruption.
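Purely as an illustration of what that programmed-I/O copy loop meant, here is a sketch against the conventional legacy ATA ports (0x1F0/0x1F7). This is not something to run against a live disk, since the kernel driver owns those ports:

    #include <stdint.h>
    #include <sys/io.h>     /* inb, insw -- x86 Linux only */

    #define ATA_DATA    0x1F0
    #define ATA_STATUS  0x1F7
    #define ATA_DRQ     0x08    /* data request: a word is ready */
    #define ATA_BSY     0x80    /* controller busy */

    /* Copy one 512-byte sector that the controller has already prepared:
     * busy-wait on the status port, then pull 256 words over the data port.
     * This is the essence of programmed I/O: the CPU does all the copying. */
    static void pio_read_sector(uint16_t *buf)
    {
        while (inb(ATA_STATUS) & ATA_BSY)
            ;                              /* wait until not busy */
        while (!(inb(ATA_STATUS) & ATA_DRQ))
            ;                              /* wait until data is ready */
        insw(ATA_DATA, buf, 256);          /* 256 x 16-bit IN transfers */
    }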
re: memory-mapped file: nothing to do with how the data gets from disk into the pagecache (or vice versa). mmap() just means your process's address space includes a shared mapping of the same pages that the OS is using to cache the file's contents.

Load a Disc (HD) trail into memory (Minix)

Minix is a micro-kernel OS, programmed in C, based on the Unix architecture and sometimes used in embedded systems, and I have a task to alter the way it works in some ways.
In Minix there is a cache for disk blocks (used to make access to the disk fast). I need to alter that cache so it will keep disk trails instead of disk blocks.
A trail is a circular area of the HD, composed of sectors.
So I'm a bit lost here: how can I load a disk trail into memory? (Answers related to Linux systems might help.)
Should I alter the disk driver or use the functions and methods of an existing one?
How do I calculate where on the HD a disk block is located?
Thanks for your attention.
The usual term for what you're describing is a disk track, not a "trail" (a cylinder is the set of tracks at the same position across all platters).
What you're trying to do isn't precisely possible; modern hard drives do not expose their physical organization to the operating system. While cylinder/head/sector addressing is still supported for compatibility, the numbers used have no relationship to the actual location of data on the drive.
Instead, consider defining fixed "chunks" of the disk which will always be loaded into cache together. (For instance, perhaps you could group every 128 sectors together, creating a 64 KB "chunk". So a read for sector 400 would cause the cache to pull in sectors 384-511, for example.) Figuring out how to make the Minix disk cache do this will be your project. :)
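A sketch of that chunk arithmetic (the sector and chunk sizes are just the illustrative numbers above; the Minix cache changes themselves are not shown):

    /* Sketch: group sectors into fixed 64 KB chunks that are always cached
     * together.  Sector size and chunk size are illustrative. */
    #include <stdio.h>

    #define SECTOR_SIZE        512
    #define SECTORS_PER_CHUNK  128                  /* 128 * 512 = 64 KB */

    int main(void)
    {
        unsigned long sector = 400;                 /* requested sector */
        unsigned long chunk  = sector / SECTORS_PER_CHUNK;
        unsigned long first  = chunk * SECTORS_PER_CHUNK;
        unsigned long last   = first + SECTORS_PER_CHUNK - 1;

        /* A miss on sector 400 would load sectors 384..511 in one go. */
        printf("sector %lu -> chunk %lu (sectors %lu-%lu)\n",
               sector, chunk, first, last);
        return 0;
    }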

Using a hard disk without filesystem for big data

I'm working on a web crawler and have to handle big data (about 160 TB of raw data in trillions of data files).
The data should be stored sequentially as one big bz2 file on the magnetic hard disk. An SSD is used to hold the metadata. The most important operation on the hard disk is a sequential read over all 4 TB of the disk, which should happen at the full maximum speed of 150 MB/s.
I don't want to waste the overhead of a file system and instead want to use the /dev/ device files directly. Does this access use the OS block buffer? Are the access operations queued or synchronous, in a FIFO style?
Is it better to use the /dev/ device file directly or to write your own user-level file system?
Does anyone have experience with this?
If you don't use any file system but read your disk device (e.g. /dev/sdb) directly, you are losing all the benefit of file system cache. I am not at all sure it is worthwhile.
Remember that you could use syscalls like readahead(2) or posix_fadvise(2) or madvise(2) to give hints to the kernel to improve performance.
Also, when making your file system, you might use a larger-than-usual block size. And don't forget to use big blocks (e.g. 64 to 256 Kbytes) when read(2)-ing data. You could also use mmap(2) to get the data from disk.
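As a rough sketch of such a big-block sequential read (the path and the 256 KB buffer size are illustrative; the same loop works on a regular file or a raw /dev/sdX node):

    /* Sketch: scan a file or raw device sequentially with 256 KB read(2) calls. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define BUF_SIZE (256 * 1024)

    int main(void)
    {
        int fd = open("/dev/sdX", O_RDONLY);   /* or a big regular file */
        if (fd < 0) { perror("open"); return 1; }

        char *buf = malloc(BUF_SIZE);
        long long total = 0;
        ssize_t n;
        while ((n = read(fd, buf, BUF_SIZE)) > 0)
            total += n;                        /* decompress/process here */
        if (n < 0) perror("read");

        printf("read %lld bytes\n", total);
        free(buf);
        close(fd);
        return 0;
    }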
I would recommend against "coding your own file system". Existing file systems are quite well tuned (and some are used on petabytes of storage). You may want to choose big blocks when making them (e.g. -b with mke2fs(8)...)
BTW, choosing between a filesystem and raw disk data is mostly a configuration issue (you specify a /dev/sdb path if you want the raw disk, and /home/somebigfile if you want a file). You could code a web crawler to be able to do both, then benchmark both approaches. Very likely, performance will depend upon the actual system and hardware.
As a case in point, relational database engines often used raw disk partitions in the previous century (e.g. the 1990s), but they seem to mostly use big files today.
Remember that the real bottleneck is the hardware (i.e. the disk): CPU time used by filesystems is often insignificant and hard to even measure.
PS. I don't have much recent real-world experience with these issues.

What is the difference between DMA and memory-mapped IO?

What is the difference between DMA and memory-mapped IO? They both look similar to me.
Memory-mapped I/O allows the CPU to control hardware by reading and writing specific memory addresses. Usually, this would be used for low-bandwidth operations such as changing control bits.
DMA allows hardware to directly read and write memory without involving the CPU. Usually, this would be used for high-bandwidth operations such as disk I/O or camera video input.
Here is a paper with a thorough comparison between MMIO and DMA:
Design Guidelines for High Performance RDMA Systems
Since others have already answered the question, I'll just add a little bit of history.
Back in the old days, on x86 (PC) hardware, there was only I/O space and memory space. These were two different address spaces, accessed with different bus protocols and different CPU instructions, but able to talk over the same plug-in card slot.
Most devices used I/O space for both the control interface and the bulk data-transfer interface. The simple way to access data was to execute lots of CPU instructions to transfer data one word at a time from an I/O address to a memory address (sometimes known as "bit-banging.")
The ISA bus protocol provided no way for devices to initiate transfers and move data into host memory autonomously, so a compromise solution was invented: the DMA controller. This was a piece of hardware that sat up by the CPU and initiated transfers to move data from a device's I/O address to memory, or vice versa. Because the I/O address is the same, the DMA controller is doing exactly the same operations a CPU would, but a little more efficiently, and it leaves the CPU some freedom to keep running in the background (though possibly not for long, as it can't talk to memory while the transfer is in progress).
Fast-forward to the days of PCI, and the bus protocols got a lot smarter: any device can initiate a transfer. So it's possible for, say, a RAID controller card to move any data it likes to or from the host at any time it likes. This is called "bus master" mode, but for no particular reason people continue to refer to this mode as "DMA" even though the old DMA controller is long gone. Unlike old DMA transfers, there is frequently no corresponding I/O address at all, and the bus master mode is frequently the only interface present on the device, with no CPU "bit-banging" mode at all.
Memory-mapped IO means that the device registers are mapped into the machine's memory space - when those memory regions are read or written by the CPU, it's reading from or writing to the device, rather than real memory. To transfer data from the device to an actual memory buffer, the CPU has to read the data from the memory-mapped device registers and write it to the buffer (and the converse for transferring data to the device).
With a DMA transfer, the device is able to directly transfer data to or from a real memory buffer itself. The CPU tells the device the location of the buffer, and then can perform other work while the device is directly accessing memory.
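Here is a hedged sketch of that division of labour for a purely hypothetical device; the register names and layout are invented, and a real driver would obtain the register mapping from the bus (e.g. a PCI BAR) and a DMA-able buffer address from the OS:

    #include <stdint.h>

    /* Hypothetical register block of an imaginary device. */
    struct fake_dev_regs {
        volatile uint32_t control;     /* bit 0 = start transfer */
        volatile uint32_t status;      /* bit 0 = transfer complete */
        volatile uint64_t dma_addr;    /* bus address of the data buffer */
        volatile uint32_t dma_len;     /* transfer length in bytes */
    };

    static void start_transfer(struct fake_dev_regs *regs,
                               uint64_t buf_bus_addr, uint32_t len)
    {
        /* Memory-mapped I/O: these stores hit device registers, not RAM.
         * This is the low-bandwidth "control" side. */
        regs->dma_addr = buf_bus_addr;
        regs->dma_len  = len;
        regs->control  = 1;            /* tell the device to start */

        /* DMA: the device now reads/writes the buffer by itself; the CPU
         * merely polls here (a real driver would sleep until an interrupt). */
        while (!(regs->status & 1))
            ;
    }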
Direct Memory Access (DMA) is a technique to transfer data from I/O to memory and from memory to I/O without the intervention of the CPU. For this purpose, a special chip, named the DMA controller, is used to control all activities and synchronization of the data. As a result, compared to other data transfer techniques, DMA is much faster.
On the other hand, virtual memory acts as a cache between main memory and secondary memory. Data is fetched in advance from the secondary memory (hard disk) into the main memory so that it is already available in main memory when needed. It also allows us to run more applications on the system than we have physical memory to support.
