Can regular file reading benefited from nonblocking-IO? - io

It seems not to me and I found a link that supports my opinion. What do you think?

The content of the link you posted is correct. A regular file socket, opened in non-blocking mode, will always be "ready" for reading; when you actually try to read it, blocking (or more accurately as your source points out, sleeping) will occur until the operation can succeed.
In any case, I think your source needs some sedatives. One angry person, that is.

I've been digging into this quite heavily for the past few hours and can attest that the author of the link you cited is correct. However, the appears to be "better" (using that term very loosely) support for non-blocking IO against regular files in native Linux Kernel for v2.6+. The "libaio" package contains a library that exposes the functionality offered by the kernel, but it has some caveats about the different types of file systems which are supported and it's not portable to anything outside of Linux 2.6+.
And here's another good article on the subject.

You're correct that nonblocking mode has no benefit for regular files, and is not allowed to. It would be nice if there were a secondary flag that could be set, along with O_NONBLOCK, to change this, but due to the way cache and virtual memory work, it's actually not an easy task to define what correct "non-blocking" behavior for ordinary files would mean. Certainly there would be race conditions unless you allowed programs to lock memory associated with the file. (In fact, one way to implement a sort of non-sleeping IO for ordinary files would be to mmap the file and mlock the map. After that, on any reasonable implementation, read and write would never sleep as long as the file offset and buffer size remained within the bounds of the mapped region.)

Related

How do I open a file in a kernel module if calling process is in user space?

I am trying to create a character device driver that dumps /etc/shadow when read from as a non-privileged user. This is for purely academic purposes of course.
I was reading about how reading/writing files in kernel space opens a system to possible exploits. I am trying to implement this in practice.
Please spare me the "don't touch the filesystem in kernel mode" talk. I am precisely trying to exploit the nuances of doing so.
Problem is that the only way I have found so far that works to open a file in kernel mode is filp_open, which is currently producing EACCESS when I read from the device file as a non-privileged user. This was confounding at first as I assumed that I can do anything in kernel space.
For example, when I cat the device file I have created as a non-root user, filp_open produces EACCESS in kernel space???
Further investigation has led me to believe that filp_open checks the capabilities of the calling process. This would make sense as it is used internally by open(), but I am in kernel mode here! There must be a way!
I am very new to programming in kernel space. I have extensive application C experience, but I am finding it difficult to navigate the kernel documentation for precisely what I am looking for. Additionally, it seems that more and more symbols within the kernel are not exported for use in modules. As I am developing an exploit proof of concept, I would like it to work without recompiling the kernel. I am finding a lot of code (vfs and syscalls) that is deprecated as the symbols are no longer exported to kernel modules.
Is what I am trying to do a thing that is specifically engineered against? Loading a kernel module requires root to begin with, so I would see this more in the lens of a persistence focused attack rather than an access one.
Also, I got the proof of concept working by just reading from the file when the module is loaded, but this is no fun! Any pointers here are much appreciated.
After some rethinking and digging I have found two solutions to my problem. Thank you to Tsyvarev and stark for the pointers.
Solution 1
The first solution is to elevate the privileges of the calling process before making a call of filp_open. This is also basically making a rootkit, so not as interesting.
Here is a link to the guide that I found on the subject.
https://0x00sec.org/t/kernel-rootkits-getting-your-hands-dirty/1485
Solution 2
The module will have an init function that by nature must be run with elevated privs when the module is loaded. So you can open the file pointer there and just close it when the module is unloaded. Caveats are that you have the file pointer open the whole time, so all of the gotchas there are still present. Better to only read, writing is where things can get a bit tricky. This is the solution I chose in the interim, as I didn't want this thing to be a full rootkit.
Another direction is workqueue or to spawn a thread. Probably the most tricky but also the most inline with what my original vision of this demo was. I did not test this direction but it probably is the best solution.

How to safely use mmap() for reading?

I have a need to do a lot of random-access reads in a large file so I use mmap(). This solution seems to be perfect as long as the mapped file is untouched. But this is not always the case. If the mapped file is tampered with, several problems arise:
If a change to a file reduces its length or the file becomes inaccessible then a process in order to live must handle SIGBUS signal (at least, in Linux implementation). This adds additional complications since I'm writing a library.
To make things even worse, mmap() manpage says it is unspecified if changes to the original
file are propagated to the memory. So they can very well be propagated.
This essentially means the contents of the file I work with can become white noise at any moment.
Does all of this mean that any program that maps a freely accessible file and does not handle these problems can be brought down by a DoS attack? Even while I do not expect evil hackers to go after my program, I can easily see a user modifying my mapped file, replacing it with another one or making the file inaccessible by, for example, removing a USB drive. And while I can write a signal handler (and this is a bit messy, so I am looking for a better solution) to solve the first problem,
I have no idea how to solve the second one.
The file can not be copied and can be freely moved around if it's not used by a program (just like any other media file). Linux file locks do not always work.
So, how to safely use mmap() for reading?

Persistant storage values handling in Linux

I have a QSPI flash on my embedded board.
I have a driver + process "Q" to handle reading and writing into.
I want to store variables like SW revisions, IP, operation time, etc.
I would like to ask for suggestions how to handle the different access rights to write and read values from user space and other processes.
I was thinking to have file for each variable. Than I can assign access rights for those files and process Q can change the value in file if value has been changed. So process Q will only write into and other processes or users can only read.
But I don't know about writing. I was thinking about using message queue or zeroMQ and build the software around it but i am not sure if it is not overkill. But I am not sure how to manage access rights anyway.
What would be the best approach? I would really appreciate if you could propose even totally different approach.
Thanks!
This question will probably be downvoted / flagged due to the "Please suggest an X" nature.
That said, if a file per variable is what you're after, you might want to look at implementing a FUSE file system that wraps your SPI driver/utility "Q" (or build it into "Q" if you get to compile/control source to "Q"). I'm doing this to store settings in an EEPROM on a current work project and its turned out nicely. So I have, for example, a file, that when read, retrieves 6 bytes from EEPROM (or a cached copy) provides a MAC address in std hex/colon-separated notation.
The biggest advantage here, is that it becomes trivial to access all your configuration / settings data from shell scripts (e.g. your init process) or other scripting languages.
Another neat feature of doing it this way is that you can use inotify (which comes "free", no extra code in the fusefs) to create applications that efficiently detect when settings are changed.
A disadvantage of this approach is that it's non-trivial to do atomic transactions on multiple settings and still maintain normal file semantics.

How can I find file system concurrency issues?

I have an application running on Linux, and I find myself wanting windows (!).
The problem is that every 1000 times or so I run into concurrency problems that are consistent with concurrent reading/writing of files. I am fairly sure that this behavior would be prohibited by file locking under Windows, but I don't have any sufficiently fast windows box to check.
There is simply too much file access (too much data) to expect strace to work reliably - the sheer volume of output is likely to change the problem too. It also happens on different files every time. Ideally I would like to change/reconfigure the linux file system to be more restrictive (as in fail-fast) wrt concurrent access.
Are there any tools/settings I can use to achieve this ?
Hmmm. Concurrent access to files is perfectly legitimate on Posix-like systems so there is no kind of "failure" mode associated with it. Is there a reason you can't use file-locking on Linux? It's difficult to tell from your description what the actual problem is (1000 times of what?) but it sounds like the traditional flock() or lockf() system calls might be what you're looking for.
For some reason I thought you were using C++. The following applies if you are.
If you are using multi-threading and fstream IO and custom streambufs or you disabled sync_with_stdio, then yes, the C++ iostreams will act differently from iostreams on Windows.
I ran into this with one of my own projects.
Windows defines a mutex in its iostream sentry. Linux does not. Linux does seem to have locking in its C stdio functions, so usually that works out anyway.
However, I defined a custom debug streambuf that didn't go through stdio and got all sorts of corruption in Linux.
I got around it by using a mutex that is preprocessed out if the OS is Windows.

Limiting the File System Usage Programmatically in Linux

I was assigned to write a system call for Linux kernel, which oddly determines (and reduces) users´ maximum transfer amount per minute (for file operations). This system call will be called lim_fs_usage and will take a parameter for maximum number of bytes all users can access in a minute. For short, I am going to determine bandwidth of all filesystem operations in Linux. The project also asks for choosing appropriate method for distribution of this restricted resource (file access) among the users but I think this
won´t be a big problem.
I did a long long search and scan but could not find a method for managing file system access programmatically. I thought of mapping (mmap())hard drive to memory and manage memory operations but this turned to be useless. I also tried to find an API for virtual file system in order to monitor and limit it but I could not find one. Any ideas, please... Any help is greatly appreciated. Thank you in advance...
I wonder if you could do this as an IO scheduler implementation.
The main difficulty of doing IO bandwidth limitation under Linux is, by the time it reaches anywhere near the device, the kernel has probably long since forgotten who caused it.
Likewise, you can get on some very tricky ground in determining who is responsible for a given piece of IO:
If a binary is demand-loaded, who owns the IO doing that?
A mapped section of memory (demand-loaded executable or otherwise) might be kicked out of memory because someone else used too much ram, thus causing the kernel to choose to evict those pages, which places an unfair burden on the quota of the other user to then page it back in
IO operations can be combined, and might come from different users
A write operation might cause an IO sooner or later depending on how the kernel schedules it; a later schedule may mean that fewer IOs need to be done in the long run, as another write gets done to the same block in the interim; writing to an already dirty block in cache does not make it any dirtier.
If you understand all these and more caveats, and still want to, I imagine doing it as an IO scheduler is the way to go.
IO schedulers are pluggable under Linux (2.6) and can be changed dynamically - the kernel waits for all IO on the device (IO scheduler is switchable per block device) to end and then switches to the new one.
Since it's urgent I'll give you an idea out of the top of my head without doing any research on the feasibility -- what about inserting a hook to monitor system calls that deal with file system access?
You might end up writing specialised kernel modules to handle the various filesystems (ext3, ext4, etc) but as a proof-of-concept you can start with one. Do not forget that root has reserved blocks in memory, process space and disk for his own operations.
Managing memory operations does not sound related to what you're trying to do (but perhaps I am mistaken here).
After a long period of thinking and searching, I decided to use the ¨hooking¨ method proposed. I am thinking of creating a new system call which initializes and manages a global variable like hdd_ bandwith _limit. This variable will be used in Read() and Write() system calls´ modified implementation (instead of ¨count¨ variable). Then I will decide distribution of this resource which is the real issue. Probably I will find out how many users are using the system for a certain moment and divide this resource equally. Will be a Round-Robin-like distribution. But still, I am open to suggestions on this distribution issue. Will it be a SJF or FCFS or Round-Robin? Synchronization is another issue. How can I know a user´s job is short or long? Or whether he is done with the operation or not?

Resources