Write file on read request - Linux

I want a specific file to be written whenever a program attempts to read it. For example, I create an empty file (or one filled with zeros); a program tries to read N bytes starting from the M-th byte of the file (using the read/seek syscalls), and I need the read call to wait until I have written the requested bytes to the file, so that the syscall then returns the written bytes without errors. The file should look like the one the program expects to read. Or is there a way to "send" the needed bytes to the read() call without writing them to the file beforehand? I need this to work with any program, without editing its code.

This must be done by intercepting the filesystem read requests. The simplest approach is to use FUSE (Filesystem in Userspace) to implement a filesystem filter whose logic runs in user space: the other programs would be reading from your filesystem, giving you full control over what they read.

You would need to create your own synchronization protocol between the reader and the writer using some IPC mechanism.
Alternatively, you could do this in a database using stored procedures.
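For the FUSE route, here is a minimal sketch using the third-party fusepy bindings. It exposes a single file whose read() blocks until your code has supplied the requested bytes. The names DelayedFile and provide_bytes are made up for this example; error handling and the actual IPC with the writer are left out.

```python
# Minimal FUSE sketch (pip install fusepy): /data blocks in read() until
# provide_bytes() has supplied the bytes the reader asked for.
import errno
import stat
import threading
from fuse import FUSE, FuseOSError, Operations  # fusepy

class DelayedFile(Operations):
    def __init__(self, size):
        self.size = size                  # size the readers expect to see
        self.buf = bytearray(size)        # backing store, filled lazily
        self.have = 0                     # how many leading bytes are valid
        self.cond = threading.Condition()

    def provide_bytes(self, data):
        """Called by the 'writer' side, e.g. from another thread."""
        with self.cond:
            self.buf[self.have:self.have + len(data)] = data
            self.have += len(data)
            self.cond.notify_all()

    # --- FUSE callbacks -------------------------------------------------
    def getattr(self, path, fh=None):
        if path == '/':
            return dict(st_mode=stat.S_IFDIR | 0o755, st_nlink=2)
        if path == '/data':
            return dict(st_mode=stat.S_IFREG | 0o444, st_nlink=1,
                        st_size=self.size)
        raise FuseOSError(errno.ENOENT)

    def readdir(self, path, fh):
        return ['.', '..', 'data']

    def read(self, path, size, offset, fh):
        # Block until the bytes the caller asked for have been provided.
        with self.cond:
            self.cond.wait_for(
                lambda: self.have >= min(offset + size, self.size))
            return bytes(self.buf[offset:offset + size])

if __name__ == '__main__':
    fs = DelayedFile(size=4096)
    # ...start whatever thread/IPC feeds fs.provide_bytes() here...
    FUSE(fs, '/mnt/virtual', foreground=True)
```

Any program that opens /mnt/virtual/data will simply block inside read() until the data shows up, with no changes to its code.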

Related

Python 3: working effectively with remote binary data

I have some remote devices that are accessed via sftp using paramiko over (usually) a cellular connection. I need to do two things to some binary files on them:
Read them byte-by-byte to enter the contents into a database
Write the entire file out to a networked drive.
I have code to do each of these things that works, but I need to do both things in the most reasonably efficient way.
I could pass around the file_handle, read byte by byte to do the database entry, then do file_handle.seek(0) and pass that to other_file_handle.write, but I'm a little concerned about the flakiness of the cellular connection as I'm reading remote files byte by byte and processing the results and it means effectively iterating the thing twice.
I could fix the double iteration part of that problem by both translating the bytes to meaningful data and simultaneously writing them to a buffer to later dump to disk, but that seems awfully...manual?
I could read the entire remote binary file, write it to disk, open that for byte-by-byte processing but that's really inefficient compared to doing all the work in memory.
I could read in the remote data to an IO stream, and then manually both convert bytes and also put them into a write stream. But then the file writing code is totally coupled to the parsing code and again it's a lot of lower level manipulation.
The last is probably the "best" way but I'm hoping there's a better higher-level abstraction to use that lets me maintain a better separation of concerns. Is there an equivalent of the posix tee command or something here?
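One way to keep the parsing and the copying separated while still iterating the remote file only once is a small tee-style wrapper around the readable stream. This is just a sketch, not an existing library class; TeeReader and parse_records are hypothetical names.

```python
# A tee-style reader: everything read for parsing is also copied to a second
# destination, so the remote file is iterated only once.
class TeeReader:
    def __init__(self, source, sink):
        self.source = source   # e.g. the paramiko SFTPFile
        self.sink = sink       # e.g. a local file opened with open(path, "wb")

    def read(self, n=-1):
        chunk = self.source.read(n)
        if chunk:
            self.sink.write(chunk)
        return chunk

# Usage: parse_records() only ever sees a file-like object with read(), and
# never knows the bytes are also being written to disk.
# with sftp.open(remote_path, "rb") as remote, open(local_copy, "wb") as local:
#     parse_records(TeeReader(remote, local))
```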

Transactionally writing files in Node.js

I have a Node.js application that stores some configuration data in a file. If you change some settings, the configuration file is written to disk.
At the moment, I am using a simple fs.writeFile.
Now my question is: What happens when Node.js crashes while the file is being written? Is there the chance to have a corrupt file on disk? Or does Node.js guarantee that the file is written in an atomic way, so that either the old or the new version is valid?
If not, how could I implement such a guarantee? Are there any modules for this?
What happens when Node.js crashes while the file is being written? Is there the chance to have a corrupt file on disk? Or does Node.js guarantee that the file is written in an atomic way, so that either the old or the new version is valid?
Node implements only a (thin) async wrapper over system calls, thus it does not provide any guarantees about atomicity of writes. In fact, fs.writeAll repeatedly calls fs.write until all data is written. You are right that when Node.js crashes, you may end up with a corrupted file.
If not, how could I implement such a guarantee? Are there any modules for this?
The simplest solution I can come up with is the one used e.g. for FTP uploads:
Save the content to a temporary file with a different name.
When the content is written on disk, rename temporary file to destination file.
The rename(2) man page says that rename guarantees to leave an instance of newpath in place (on Unix systems such as Linux or OS X), so a crash leaves you with either the old file or the new one.
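The pattern itself is language-agnostic. A minimal sketch (shown in Python for brevity; in Node you would write the temp file with fs.writeFile and then fs.rename it over the target):

```python
# Write-to-temp-then-rename: os.replace() maps to rename(2), which atomically
# replaces the destination, so readers only ever see a complete file.
import os
import tempfile

def atomic_write(path, data):
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)     # temp file on the same filesystem
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())                  # make sure the bytes are on disk
        os.replace(tmp, path)                     # atomic rename over the target
    except BaseException:
        os.unlink(tmp)
        raise
```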
fs.writeFile, just like all the other methods in the fs module, is implemented as a simple wrapper around standard POSIX functions (as stated in the docs).
Digging a bit into Node.js' code, one can see that fs.js, where all the wrappers are defined, uses the native fs bindings (implemented in C/C++) for all its file system calls. More specifically, the write method is used to write the contents of the buffer. It turns out that the POSIX specification for write explicitly says that:
Atomic/non-atomic: A write is atomic if the whole amount written in one operation is not interleaved with data from any other process. This is useful when there are multiple writers sending data to a single reader. Applications need to know how large a write request can be expected to be performed atomically. This maximum is called {PIPE_BUF}. This volume of IEEE Std 1003.1-2001 does not say whether write requests for more than {PIPE_BUF} bytes are atomic, but requires that writes of {PIPE_BUF} or fewer bytes shall be atomic.
So it seems it is pretty safe to write, as long as the size of the buffer is no larger than PIPE_BUF. This constant is system-dependent, though, so you might need to check it on the target system.
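If you want to check the limit at runtime, pathconf(3) exposes it per path; for example (Python shown for brevity):

```python
# PIPE_BUF is system-dependent, so query it rather than hard-coding a value.
import os
print(os.pathconf("/tmp", "PC_PIPE_BUF"))   # e.g. 4096 on Linux
```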
write-file-atomic will do what you need: it writes to a temporary file and then renames it over the target, which is safe.

Identifying that a file is being copied outside the computer in LKM

Assume I have a loadable kernel module (LKM) inserted into the Linux kernel and have hooked the read, write, open and close functions. I can now block access to any file, but I want to stop files from being copied off the device, e.g. to a USB stick, memory card or external disk. What I want to know is: sitting in the LKM with those calls hooked, how can I identify that a file is being written to an external device?
I also want to know which system calls are used during a copy operation. My understanding is that a program opens the file, reads from it (read system call) and then writes to a second file (write system call), but I observed strange behaviour when I tried to block write access to a file: a process that opens a file never calls the write operation on that file when saving it (tested with a PDF viewer).
If anybody has an idea about this strange behaviour, or about how to stop writes to a file, please share it as well.
They could mmap it to do read/write. Or they could read the entire original file into memory, close it, then open the destination.
Or they could encrypt the file, then write it out to a new file on the USB.
Or they could do minor edits to the contents, then save it out.
Or they could use gvfs to access the network/USB device.
Or the user could reboot and copy the file in a different OS.
All that really highlights is that the problem is difficult: a determined user will always find a way to extract data from a system they have access to.
Your best bet is just to prevent accidental leakage - so scan files after close on the removable media, and check they don't have contents you don't want leaked. Overwrite and delete them if they do.
Or else block the devices from being mounted in the first place, and disable gvfs as well.
As to why your hook isn't intercepting the write(), either:
Your hook isn't actually intercepting the operation.
The application isn't using write() to put the content in a file.
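The second case is more common than it sounds. As mentioned above, an application can save a file through a memory mapping and never issue a write() on it, so a write() hook sees nothing. A minimal Python sketch of that route (file names are placeholders):

```python
# Copy src to dst without calling write() on the destination's contents:
# the bytes are placed through an mmap'd region, and the kernel writes the
# dirty pages back on its own.
import mmap
import os

def mmap_copy(src_path, dst_path):
    with open(src_path, "rb") as src, open(dst_path, "wb+") as dst:
        size = os.fstat(src.fileno()).st_size
        if size == 0:
            return                                  # mmap cannot map an empty file
        os.ftruncate(dst.fileno(), size)            # reserve space in the target
        with mmap.mmap(src.fileno(), size, access=mmap.ACCESS_READ) as inp, \
             mmap.mmap(dst.fileno(), size) as out:
            out[:] = inp[:]                         # data moves through the mappings
            out.flush()                             # msync, still not write()
```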

File access methods in Linux

I read in textbooks that there are mainly two file access methods: sequential and direct. Which one do we use in Linux?
In the read call we specify how many bytes to read and into which buffer, so do we have sequential access in Linux?
But physically files are stored in blocks; I couldn't relate the two.
Is direct access possible in Linux?
I read about these access models in Operating System Concepts by Galvin.
Both are possible.
When you do a read on an ordinary file, it does read the file sequentially, advancing the file pointer each time by the right amount.
But you can also use seek to move to an arbitrary point in the file.
Not all files support random/direct access. Pipes for instance are typically only sequential access (you can't rewind or skip forward).
So pretty much everything is possible, but some file types have restrictions.
(File access with direct I/O (O_DIRECT flag) is a different concept altogether.)
You can certainly read/write from an arbitrary position in an open (disc) file.
There are a number of methods of doing random IO, which are optimised for different kinds of usage.
The simplest method is seek() followed by read() or write(). The file pointer moves on by the number of bytes read/written, and it allows sequential IO following a random jump. Consider seek() as logically spinning an old "reel-to-reel" tape drive (even though we don't have these any more).
The pread and pwrite system calls combine seek() and read/write(), specifically for use in multithreaded programs (where two syscalls would result in a race condition). They don't change the file pointer, so you can think of it logically just taking or putting a random bit of data.
mmap() maps a file into memory, where you can then do with it what you will, using conventional pointer/memory manipulation (for example memset, memcpy, etc.).
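A quick sketch of those methods using the os module's thin wrappers over the corresponding system calls (example.bin is a placeholder and is assumed to exist and be at least a few kilobytes):

```python
import mmap
import os

fd = os.open("example.bin", os.O_RDWR)

# 1. seek() + read(): moves the shared file offset, then reads from it.
os.lseek(fd, 1024, os.SEEK_SET)
chunk = os.read(fd, 16)

# 2. pread(): reads at an absolute offset without touching the file offset,
#    so it is safe to use from several threads at once.
chunk = os.pread(fd, 16, 1024)

# 3. mmap(): map the file and treat it like a byte array.
with mmap.mmap(fd, 0) as m:       # length 0 maps the whole file
    chunk = m[1024:1040]
    m[0:4] = b"ABCD"              # writes go straight through the mapping

os.close(fd)
```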

Reading file in Kernel Mode

I am building a driver and I want to read some files.
Is there any way to use ZwReadFile() or a similar function to read the contents of the files line by line, so that I can process them in a loop?
The documentation on MSDN states that:
ZwReadFile begins reading from the given ByteOffset or the current file position into the given Buffer. It terminates the read operation under one of the following conditions:
The buffer is full because the number of bytes specified by the Length parameter has been read. Therefore, no more data can be placed into the buffer without an overflow.
The end of file is reached during the read operation, so there is no more data in the file to be transferred into the buffer.
Thanks.
No, there is not. You'll have to create a wrapper to achieve what you want.
However, given that kernel-mode code can crash the whole system rather than just the process it runs in, you have to make sure that problems familiar from user mode, such as very long lines, will not cause issues.
If the amount of data is (and will stay) below the threshold of what registry values can hold, you should use that instead - in particular REG_MULTI_SZ, which has the properties you are looking for ("line-wise" storage of data).
In this situation, unless performance is critical (as in 'realtime'), I would pass the filtering to a user-mode service or application: send the file name to the application to process. A user-mode application is easier to test and easier to debug. It won't blue-screen or hang your box either.
