How to force immediate writes to a disk from our own driver on Linux

We have a driver that writes data directly to disk sectors by submitting BIOs through the make_request_fn of another block device in the block layer.
However, the data is not written to the disk immediately: when I reboot the machine, the data I wrote before the reboot is gone.
The data survives the reboot ONLY if I do one of the following before rebooting:
Call a function to flush the block device (from a program) after writing.
Drop the system cache with "sync ; echo 1 > /proc/sys/vm/drop_caches" after writing.
We also tried the following, but neither worked:
Setting flags such as FLUSH/FUA on the BIO.
Calling blkdev_issue_flush().
This happens on both real machines and virtual environments such as VMware.
The OS is Ubuntu 14.04.3 with kernel 3.19 and an ext4 file system.
Can somebody explain the reason and help me out?

Related

redirect output to other partition linux

So, I have a scientific server with an HDD and an SSD.
Users can use the SSD for computations that involve a lot of data reading/writing, but all the home directories are on the HDD.
Is there an automatic way to redirect the output of any program writing to the SSD to the home directory of the user running the program if the SSD is full?
If the best solution is to write my own script, then what is the best way to determine whether the SSD is running out of space?
My OS is Ubuntu 18.04 LTS.
In short, I do not think there is such a thing. I believe you should write a bash script that checks (my tool of choice would simply be df) that there is enough space for the next computation run before actually starting it. Maybe you should pre-allocate the space you intend to use, if possible, so that other concurrent runs do not crash or run out of space? Maybe you should have an automated procedure to clean up some space?
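The pre-run check described above can be sketched as follows. This is a minimal illustration in Python using shutil.disk_usage rather than the bash/df approach the answer suggests. The function names, the directory paths in the usage note, and the idea of passing the required size as an argument are assumptions made for the example.

```python
import errno
import shutil


def free_bytes(path):
    """Return the free space, in bytes, of the file system holding `path`."""
    return shutil.disk_usage(path).free


def choose_output_dir(ssd_dir, home_dir, required_bytes, free=free_bytes):
    """Pick the SSD if it has room for the run, else fall back to the HDD.

    `free` is injectable so the policy can be exercised without real disks.
    Raises OSError(ENOSPC) when neither location has enough space.
    """
    for candidate in (ssd_dir, home_dir):
        if free(candidate) >= required_bytes:
            return candidate
    raise OSError(errno.ENOSPC, "not enough space on SSD or HDD")
```

A job wrapper might call choose_output_dir("/scratch/ssd", "/home/alice", 50 * 2**30) before launching the computation and pass the result to the program as its output directory (both paths are hypothetical). The check is only a snapshot: a concurrent job can still fill the disk afterwards, which is why the answer also suggests pre-allocation.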
Obviously, you could make the SSD available on some mountpoint in /home/ and then periodically check with a cron job whether it is full, and then maybe unmount it and send a warning mail. This will sort of do what you want. Sort of. But what happens when the HDD gets full as well? Watch out: these kinds of problems can easily cause a server to crash or otherwise experience issues.
This looks like a problem you might partially solve/mitigate by e.g., using a quota scheme (that is, limiting the amount of space that each user can allocate) or better yet by using a dedicated system for queueing jobs and allocating resources.

Linux: How to enable Execute in place (XIP) for RAMFS/TMPFS

I'm working on an embedded system where the rootfs is constructed in a tmpfs partition by the init process. After the rootfs is complete, it will do a pivot-root and start spawning processes located in the rootfs.
But it seems that XIP is not working for our tmpfs, so all the applications are loaded into RAM twice (once in the tmpfs and again when they are executed).
Can this really be true?
I found an old discussion thread at https://ez.analog.com/thread/45262 which describes the same issue I am seeing.
How can I achieve XIP for a file-system located in memory?
What you are attempting should indeed be possible (though I haven't tried it myself). The problem is simply that you are not going about it the correct way. If you use the block RAM device driver ("brd"), you can create a block device that is actually backed by RAM. To enable this (you do not say which kernel you have, so I will assume kernel 4.14), you need to enable both CONFIG_BLK_DEV_RAM and CONFIG_BLK_DEV_RAM_DAX in your kernel configuration. They are both under "Device Drivers" -> "Block Devices". Then create such a RAM-backed block device, create a file system on it (for example ext2, ext4 or XFS), build your rootfs in that file system, and pivot-root into it. You are now executing from a RAM-backed file system with XIP (now replaced by DAX) functionality, so applications should, at least in theory, run directly out of the RAM pages of the block RAM device without a second copy of the data being created.
Please beware that this approach has limitations: for example, kernel modules themselves will still be copied into RAM, and get_user_pages(), O_DIRECT, RDMA, sendfile() and splice() may not work.
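Whether a given kernel was built with the two options named above can be checked before investing in the rootfs changes. The following is a small sketch in Python that scans kernel configuration text. The helper names are made up for the example, and treating both "y" and "m" as enabled is an assumption (a rootfs on the RAM disk will usually need the driver built in).

```python
REQUIRED = ("CONFIG_BLK_DEV_RAM", "CONFIG_BLK_DEV_RAM_DAX")


def parse_kconfig(text):
    """Parse kernel .config text into a {option: value} dict.

    Lines such as '# CONFIG_FOO is not set' are comments and are skipped,
    so disabled options are simply absent from the result.
    """
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            options[key] = value
    return options


def missing_options(text, required=REQUIRED):
    """Return the required options that are not set to 'y' or 'm'."""
    options = parse_kconfig(text)
    return [name for name in required if options.get(name) not in ("y", "m")]
```

The text can come from the .config file of the kernel build tree, or from /proc/config.gz on a running system if the kernel exposes it. An empty list from missing_options() means both options are enabled.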
Some relevant things to look at include:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/block/Kconfig?h=v3.19#n359
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/block/Kconfig?h=v3.19#n396
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/blockdev/ramdisk.txt?h=v3.19
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/xip.txt?h=v3.19
Note that XIP was replaced by DAX in kernel 4.0, so for later kernels see:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/dax.txt?h=4.14
Also note that support for DAX was removed from the block RAM driver in kernel 4.15, so you will no longer be able to do this once you move to kernel 4.15 or later. See commit 7a862fbbdec665190c5ef298c0c6ec9f3915cf45 for the reasoning behind removing the functionality.
I hope this is enough to set you on the right track, and sorry about the bad news that the functionality has been removed as of kernel 4.15.

Hibernate Linux on ARM

I am working on implementing hibernation (suspend to disk) on ARM and have done so successfully using the swsusp ARM patch by Sebastian Capella. I can now hibernate the kernel (suspend to a swap partition on the SD card) with the command echo disk > /sys/power/state, and the system resumes its state at the next power-on. But if I then press reset again, the kernel follows a normal boot sequence.
My question is: how can I make that swap area and the hibernation image in it permanent, so that on every reset the system wakes up from that permanent image? I have set swappiness=0, so I expect that no pages will be swapped while the system is running. How does the kernel decide whether to do a normal boot or to resume from hibernation (resume=/dev/swap_partition)?
I searched a lot on the internet but did not get a clear idea of how the Linux kernel resumes from hibernation and what it does with the swap area after resuming once. Thank you for your time.
My kernel version is 3.14.
Here is some code trace of Linux Hibernation APIs calls:
http://www.srcmap.org/sd_share/4/839d1dea/Linux_kernel_Hibernation_Resume.html
Most of the code trace is for PowerPC. But it might give you some idea on the kernel restore-from-hibernation flow.
For ARM, maybe you need to:
On hibernation, just mark the permanent swap file as the swap file.
On resume, prevent the system from "unmarking" that swap file as the hibernation file.
Be very careful about kernel image upgrades: the swap file contents are tightly coupled to the kernel image. Any minor change to or recompile of the kernel will mark the swap file invalid and trigger a reboot. Add a lot of printk() logs for this.
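The "mark" mentioned above can be inspected from user space, which helps when checking whether a resume has consumed the image. This is a minimal sketch in Python. It assumes the swsusp header layout from kernel/power/swap.c, in which the last 10 bytes of the first page of the swap area hold a signature that reads S1SUSPEND while an image is present and a normal swap signature otherwise. It also assumes 4 KiB pages. Both assumptions are not stated in the answer and should be verified against the 3.14 tree in use.

```python
PAGE_SIZE = 4096                      # assumption: 4 KiB pages on the target
SIG_LEN = 10                          # the signature occupies the page's last 10 bytes
HIBERNATE_SIG = b"S1SUSPEND\x00"      # assumed value written when an image is saved
SWAP_SIGS = (b"SWAPSPACE2", b"SWAP-SPACE")


def read_swap_signature(path, page_size=PAGE_SIZE):
    """Read the signature bytes at the end of the first page of `path`."""
    with open(path, "rb") as dev:
        dev.seek(page_size - SIG_LEN)
        return dev.read(SIG_LEN)


def classify(signature):
    """Map raw signature bytes to a human-readable state."""
    if signature == HIBERNATE_SIG:
        return "hibernation image"
    if signature in SWAP_SIGS:
        return "plain swap"
    return "unknown"
```

Running classify(read_swap_signature("/dev/mmcblk0p3")) as root right after hibernating and again after a resume would show whether the signature has been reset (the device name is hypothetical). If it has, that is the "unmarking" step the answer suggests preventing.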

fsync not working on ext3 or ext4 system

I tried to use fsync to write a file to an SD card as soon as possible. However, fsync does not actually block until the file is physically written to the SD card; it seems to take about 5-6 seconds before the data is actually on the card. Mounting the file system (I tried ext3 and ext4) with commit=1 or with the sync option does seem to work: the data is safe after a reboot within 1 second. My question is: is there any way to achieve flushing without resorting to a partition-wide solution? I'm using Linux kernel 2.6.37. Thank you.
If you want to be sure the content is written to the SD card, you should call blockdev with --flushbufs before exiting the program.
If you want to benchmark the writing process, you can call it after every write:
/sbin/blockdev --flushbufs $dev
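For comparison, here is a rough sketch in Python of both levels of flushing from inside a program: a per-file fsync (of the file and of its directory entry) and a programmatic equivalent of the blockdev command above. The function names are made up for the example. The BLKFLSBUF request number is the Linux value from <linux/fs.h> and is assumed to be the ioctl that blockdev --flushbufs issues. Note that the asker reports fsync alone not being sufficient on their 2.6.37 system, so the per-file path may not behave differently there.

```python
import fcntl
import os

BLKFLSBUF = 0x1261  # _IO(0x12, 97) from <linux/fs.h>; Linux-specific


def durable_write(path, data):
    """Write `data` to `path`, then fsync the file and its parent directory."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:                       # os.write may write fewer bytes than asked
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                      # flush file data and metadata
    finally:
        os.close(fd)
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)                  # make the directory entry durable too
    finally:
        os.close(dir_fd)


def flush_block_device(device):
    """Flush the buffer cache of a whole block device (requires root)."""
    fd = os.open(device, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, BLKFLSBUF, 0)
    finally:
        os.close(fd)
```

durable_write() affects only the one file, which is what the question asks for. flush_block_device() is the partition-wide fallback and needs a real block device path, such as the $dev used in the command above.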

Is it possible to shutdown linux kernel and resume in Real Mode?

Let's say I'd like to start a small Linux distro before my ordinary operating system starts.
1. The BIOS loads the MBR and executes it.
2. The MBR locates the active partition, which is my Linux partition.
3. Linux starts and I perform what I need to do.
4. Linux shuts down and I switch to Real Mode again.
5. The original partition boot sector is loaded and my ordinary OS starts.
AFAIK, step 4 will be the difficult task: restoring the state of all devices to what it was before Linux started. Will INT 13h be functional? Do I need to restore the Interrupt Vector Table? To mention a few.
Has this been done in any existing project perhaps?
Linux does not normally support this, particularly since it reinitializes hardware in a way that the BIOS and DOS programs may not expect. However, there is some infrastructure to switch back to real mode in specific cases, particularly for a reboot (see machine_real_restart in arch/x86/kernel/reboot.c), and there is code to reinitialize hardware for kexec or suspend. I suspect you might be able to do something with a combination of these, but I don't know if the result will truly match what DOS or Windows would expect to see on reboot.
A much easier plan would be to use a chainloading bootloader that can be set to boot in a particular configuration once, like GRUB. You could invoke grub-set-default, then reboot. When GRUB comes up, it would then pass control off to Windows. By then setting the fallback OS to the Linux partition, control would return to Linux on the next boot.
Yet another option may be to use Coreboot, but I'm not sure if it is production-ready for booting Windows yet.
I haven't tried this, so I don't know if it would work, but here goes:
There is an option in the header of a bzImage format kernel file that specifies the address of real mode code to execute before the protected mode code starts. You could create a minimal bzImage-compliant file which has no actual kernel, but which has real mode code to load your MBR using INT 0x13 to 0x7c00 and jmp into it like the BIOS does.
If you use kexec to load the bzImage with the "-t bzImage-x86 --real-mode" options, it should reset the PE bit in CR0 to drop to real mode (as bdonlan mentioned above) and execute the code pointed to by the bzImage header option.
The bzImage header option is called realmode_swtch and is documented in /usr/src/linux/Documentation/x86/boot.txt; the header format code is in /usr/src/linux/arch/x86/boot/header.S.
Have you looked into kexec?
