When does a device get a 512B request from the filesystem?

I'm new to Linux and have been doing a bit of reading, but I'm a little confused about the following. Can the device receive a request for a single 512B sector? Under what conditions does this happen? From what I understand, while the sector size defines the smallest unit by which a device can be addressed, the FS usually has a block size of 4K (the smallest unit of access for the FS). So this means most (or all) commands are issued by the FS at a 4K granularity.
Can a file system generate traffic smaller than 4K (1-7 512-byte sectors) from application traffic?
Is there some file system metadata that can cause this kind of traffic?
If we align the partition to a 4k boundary, will the device always get commands aligned on 4k boundaries?

This can happen for a variety of reasons (assuming your disk exposes a logical sector size of 512 bytes), generally when something outside of the filesystem sends a correctly aligned direct request for 512 bytes.
Some cases when this can happen during general usage:
Reading an old-style MBR partition table, which fits in the 512 bytes at the start of the disk (see the sketch below)
Rewriting the bootloader, or any direct I/O you explicitly asked for
Trying to read a single smallest-sized sector when recovering data from a failing disk with 512-byte sectors
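
A minimal sketch of the first case, assuming a disk node such as /dev/sda that exposes 512-byte logical sectors (the path is an example; running this needs root). Opening the device with O_DIRECT and reading 512 aligned bytes reaches the device as a single 512-byte request:

#define _GNU_SOURCE            /* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    void *buf;
    /* O_DIRECT requires the buffer to be aligned to the logical sector size */
    if (posix_memalign(&buf, 512, 512) != 0)
        return 1;

    int fd = open("/dev/sda", O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    ssize_t n = read(fd, buf, 512);    /* one 512-byte request to the device */
    if (n != 512) { perror("read"); close(fd); return 1; }

    /* bytes 510-511 of a valid MBR hold the 0x55 0xAA boot signature */
    unsigned char *p = buf;
    printf("boot signature: %02x %02x\n", p[510], p[511]);

    close(fd);
    free(buf);
    return 0;
}

On a drive whose logical sector size is 4096 bytes (4Kn), the same 512-byte read would fail with EINVAL, which is one way to observe the difference.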


Storing and writing to a text file in STM32 F4 internal flash memory

I need to store a text file in the STM32F446RE internal flash memory. This text file will contain log data that needs to be written and updated consistently. I know there are a couple of ways of doing this, including embedding the text as a constant string/data in the source code, or implementing a file system like FatFs (not suitable for the STM32 F4 flash due to its sector layout). The chip has a total of 8 sectors (0-7) that vary in size: sectors 0-3 each contain 16 kB, sector 4 contains 64 kB, and sectors 5-7 each contain 128 kB. This translates to a total of 512 kB of flash memory. These are not sufficient for what I'm looking for, and I was wondering if anyone has ideas? I'm using the STM32CubeIDE.
Writing to FLASH memory requires an erase operation first. It seems that you already know that erase operations must be performed on whole sectors. Note also that FLASH memory wears out with repeated erase/write cycles.
I suggest one of three approaches depending upon how much data you must store and your coding abilities.
An "in-chip" approach is to implement a circular buffer in RAM and maintain your log there. If power is lost then you need code to commit that RAM buffer to FLASH. On power-up you need code to restore the RAM buffer from FLASH. This implies that your design does not suffer frequent power cycles and that you can maintain power to the microcontroller long enough to save the buffer from RAM to FLASH.
The next option is to use an external memory chip. EEPROMs are not terribly fast and are also subject to wear. FRAM is fast and endures trillions of writes before wear becomes an issue. It is available with I2C or SPI interfaces, so you can use a number of chips to provide a reasonable buffer size and handle the chip-to-memory mapping in your handler code. FRAM is not cheap, though.
Finally, there is the option of adding an SSD drive. These devices include "wear levelling" to extend their working lives. However, you need a suitable interface such as USB or PCI.
HTH

Do partition tables use logical block size or 512 bytes as the unit?

When I read the partition table (MBR or GPT) from a device, are the numbers in units of the device's logical block size, or of nominal 512-byte sectors? Surprisingly, I couldn't find a canonical answer through googling.
Conclusion has been reversed based on further investigation
Although almost all drives use 512-byte logical sectors, modern partition tables use LBA addresses, and LBA unit size is the logical sector size of the device, which today may be as great as 4096 bytes.
In the end I posted the question about unit size to the main GNU parted (partition editor) mailing list and have received this response. Specifically:
"LBA always refers to the drive's block size. So it may be 512 or 4096
or some other value, depending on what the drive reports."
Incorrect previous answer version: [[Partition tables (in the MBR and otherwise) refer to 512 byte blocks / logical sectors. See for example https://en.wikipedia.org/wiki/Master_boot_record#PTE.]]
Background information
Reporting of physical disk sector sizes seems to be fundamentally done through commands in the ATA-8 specification, specifically the "IDENTIFY DEVICE" command. The compatibility issues most often discussed concern the alignment of I/O operations. Apparently most drives handle 512-byte alignment, but with performance penalties, though there are some drives advertised as "4K native" or "4Kn" that do not support 512-byte-aligned I/O at all. In general, drives with physical 4K sectors use what is called "Advanced Format", which may help you search if you want more info.
This article https://linuxconfig.org/linux-wd-ears-advanced-format has some relatively clear discussion, especially if you are a Linux user. For what it's worth, on Linux the "parted -l" command reports physical and logical sector size, and parted also knows how to align partitions appropriately for Advanced Format devices.
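For what it's worth, you can also query both sizes programmatically on Linux with the BLKSSZGET (logical) and BLKPBSZGET (physical) ioctls; a minimal sketch, with /dev/sda as an example device:

#include <fcntl.h>
#include <linux/fs.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/sda", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    int logical = 0;
    unsigned int physical = 0;
    ioctl(fd, BLKSSZGET, &logical);    /* logical sector size: the LBA unit */
    ioctl(fd, BLKPBSZGET, &physical);  /* physical sector size */

    printf("logical: %d bytes, physical: %u bytes\n", logical, physical);
    close(fd);
    return 0;
}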
Also, you might find this article http://www.seagate.com/tech-insights/advanced-format-4k-sector-hard-drives-master-ti/ informative and reassuring on the issue.

How file system block size works?

All Linux file systems have a 4 kB block size. Let's say I have 10 MB of hard disk storage. That means I have 2560 blocks available, and let's say I copied 2560 files, each 1 kB in size. Each 1 kB file will occupy one block even though it does not fill the entire block.
So my entire disk is now filled, but I still have 2560 × 3 kB of free space. If I want to store another file of, say, 1 MB, will the file system allow me to store it? Will it write into the free space left in the individual blocks? Is there any concept addressing this problem?
I would appreciate some clarification.
Thanks in advance.
It is true: you are in a way wasting disk space if you are storing a lot of files that are much smaller than the block size of the file system.
The reason why the block size is around 4 kB is the amount of metadata associated with blocks. The smaller the block size, the more metadata there is about the locations of the blocks relative to the actual data, and the more fragmented the worst-case scenario becomes.
However, there are filesystems with different block sizes; most filesystems let you define the block size, and typically the minimum block size is 512 bytes. If you are storing a lot of very small files, having a small block size might make sense.
http://www.tldp.org/LDP/sag/html/filesystems.html
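
If you want to see what block size a mounted filesystem actually uses, here is a minimal sketch with statvfs(3) ("/" is just an example mount point):

#include <stdio.h>
#include <sys/statvfs.h>

int main(void)
{
    struct statvfs vfs;
    if (statvfs("/", &vfs) != 0) { perror("statvfs"); return 1; }

    /* f_bsize is the filesystem block size; a 1 kB file still consumes
     * a whole block, which is the waste discussed above (compare
     * stat(2)'s st_size with st_blocks for a given file) */
    printf("block size: %lu bytes\n", vfs.f_bsize);
    printf("free blocks: %llu\n", (unsigned long long)vfs.f_bfree);
    return 0;
}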
The XFS filesystem documentation has some comments on how to select the filesystem block size - it is also possible to define the directory block size:
http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi?coll=linux&db=bks&srch=&fname=/SGI_Admin/LX_XFS_AG/sgi_html/ch02.html
You should consider setting a logical block size for a filesystem
directory that is greater than the logical block size for the
filesystem if you are supporting an application that reads directories
(with the readdir(3C) or getdents(2) system calls) many times in
relation to how much it creates and removes files. Using a small
filesystem block size saves on disk space and on I/O throughput for
the small files.

writeback of dirty pages in linux

I have a question regarding the writeback of dirty pages. If a portion of a page's data is modified, will writeback write the whole page to disk, or only the partial page with the modified data?
The memory management hardware on x86 systems has a granularity of 4096 bytes. This means it is not possible to find out which bytes of a 4096-byte page have really changed and which ones are unchanged.
Theoretically the disk driver system could check whether bytes have changed and skip writing the 512-byte blocks that have not changed.
However, this would mean that - if the blocks are no longer in the disk cache - the page must first be read from the hard disk to check what has changed before writing.
I do not think that Linux does it that way, because reading the page from disk would cost too much time.
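
You can see the page granularity from user space: dirty a single byte of a file-backed mapping and the kernel still writes back the whole page. A minimal sketch ("testfile" is an example path to an existing, non-empty file):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    long page = sysconf(_SC_PAGESIZE);          /* typically 4096 */
    int fd = open("testfile", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    char *map = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); return 1; }

    map[0] ^= 1;                  /* dirties exactly one byte... */
    msync(map, page, MS_SYNC);    /* ...but the whole page is written back */

    munmap(map, page);
    close(fd);
    return 0;
}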
Upon each hardware interrupt, the CPU wants to write as much data as the hard disk controller can handle - this size is defined as the block size (or one sector, in Linux):
http://en.wikipedia.org/wiki/Disk_sector
https://superuser.com/questions/121252/how-do-i-find-the-hardware-block-read-size-for-my-hard-drive
But waiting too long on a single interrupt for a large file can make the system appear unresponsive, so it is logical to break the transfer into smaller chunks (like 512 bytes) so that the CPU can handle other tasks while each 512-byte chunk is transferred. Therefore, whether you changed one byte or 511 bytes, as long as the change is within that single block, all of its data gets written at the same time. And throughout the Linux kernel, flagging blocks as dirty for writing (or not) goes by a single unique identifier, the sector number, so anything smaller than the sector size is too difficult to manage efficiently.
All that said, don't forget that the hard disk controller itself also has a minimum block size for write operations.

Linux: writes are split into 512K chunks

I have a user-space application that generates big SCSI writes (details below). However, when I look at the SCSI commands that reach the SCSI target (i.e. the storage, connected over FC), something is splitting these writes into 512K chunks.
The application basically does 1M-sized direct writes to the device:
fd = open("/dev/sdab", ..|O_DIRECT);
write(fd, ..., 1024 * 1024);
This code causes two SCSI WRITEs to be sent, 512K each.
However, if I issue a direct SCSI command, without the block layer, the write is not split.
I issue the following command from the command line:
sg_dd bs=1M count=1 blk_sgio=1 if=/dev/urandom of=/dev/sdab oflag=direct
I can see one single 1M-sized SCSI WRITE.
The question is, what is splitting the write and, more importantly, is it configurable?
The Linux block layer seems to be the culprit (because SG_IO doesn't pass through it), and 512K seems too arbitrary a number not to be some sort of configurable parameter.
As described in an answer to the "Why is the size of my IO requests being limited, to about 512K" Unix & Linux Stack Exchange question and the "Device limitations" section of the "When 2MB turns into 512KB" document by kernel block layer maintainer Jens Axboe, this can be because your device and kernel have size restrictions (visible in /sys/block/<disk>/queue/):
max_hw_sectors_kb: the maximum size of a single I/O the hardware can accept
max_sectors_kb: the maximum size the block layer will send
max_segment_size and max_segments: the DMA engine limitations for scatter-gather (SG) I/O (the maximum size of each segment and the maximum number of segments for a single I/O)
The segment restrictions matter a lot when the buffer the I/O is coming from is not contiguous: in the worst case each segment can be as small as a page (4096 bytes on x86 platforms), which means SG I/O for one request can be limited to a size of 4096 * max_segments.
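
A minimal sketch that reads these queue limits from sysfs so you can check them on your own system ("sda" is an example disk name):

#include <stdio.h>

static void show(const char *name)
{
    char path[256];
    snprintf(path, sizeof path, "/sys/block/sda/queue/%s", name);

    FILE *f = fopen(path, "r");
    if (!f) { perror(path); return; }

    char val[64];
    if (fgets(val, sizeof val, f))
        printf("%-20s %s", name, val);   /* sysfs values end with '\n' */
    fclose(f);
}

int main(void)
{
    show("max_hw_sectors_kb");
    show("max_sectors_kb");
    show("max_segments");
    show("max_segment_size");
    return 0;
}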
The question is, what is splitting the write
As you guessed the Linux block layer.
and, more importantly, is it configurable?
You can fiddle with max_sectors_kb, but the rest is fixed and comes from device/driver restrictions (so in your case I'm going to guess probably not, though you might see bigger I/Os directly after a reboot due to less memory fragmentation).
512K seems too arbitrary a number not to be some sort of a configurable parameter
The value is likely related to fragmented SG buffers. Let's assume you're on an x86 platform and have a max_segments of 128, so:
4096 * 128 / 1024 = 512
and that's where 512K could come from.
Bonus chatter: according to https://twitter.com/axboe/status/1207509190907846657, if your device uses an IOMMU rather than a DMA engine then you shouldn't be segment limited...
The blame is indeed on the block layer; the SCSI layer itself has little regard for the size. You should check, though, that the underlying layers are indeed able to pass your request through, especially with regard to direct I/O, since that may be split into many small pages and require a scatter-gather list longer than what the hardware or even just the drivers can support (libata is/was somewhat limited).
You should look at and tune /sys/class/block/$DEV/queue. There are assorted files there, and the one most likely to match what you need is max_sectors_kb, but you can just try them out and see what works for you. You may need to tune the partitions' variables as well.
There's a max-sectors-per-request attribute of the block driver. I'd have to check how to modify it. You used to be able to get this value via blockdev --getmaxsect, but I'm not seeing the --getmaxsect option on my machine's blockdev.
Looking at the following files should tell you if the logical block size is different, possibly 512 in your case. I am not sure, however, whether you can write to these files to change those values (the logical block size, that is):
/sys/block/<disk>/queue/physical_block_size
/sys/block/<disk>/queue/logical_block_size
Try ioctl(fd, BLKSECTSET, &blocks).
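
For what it's worth, the companion BLKSECTGET ioctl reads the current max-sectors-per-request value; a minimal sketch (whether BLKSECTSET is still honored depends on the kernel and driver, so treat this as a starting point):

#include <fcntl.h>
#include <linux/fs.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/sdab", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    unsigned short sectors = 0;
    if (ioctl(fd, BLKSECTGET, &sectors) == 0)  /* max sectors per request */
        printf("max sectors per request: %hu\n", sectors);
    else
        perror("BLKSECTGET");

    close(fd);
    return 0;
}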
