Use in-memory buffer as a file - node.js

During my program's execution I get a buffer that I need to feed into an external program and read its output. Unfortunately, this program only accepts filenames for input and output, so I'd have to save the buffer to disk first. I'd prefer not to create temp files on the filesystem, if that's possible.
Is there any way to do this?
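One common Linux workaround, assuming the external tool can read its input sequentially (pipes are not seekable, so tools that seek in their input won't work), is to pass /dev/stdin and /dev/stdout as the "filenames" and stream the buffer through the child process. A minimal sketch, with someTool standing in for the hypothetical external program:

const { spawn } = require('child_process');

function runTool(inputBuffer) {
  return new Promise((resolve, reject) => {
    // /dev/stdin and /dev/stdout resolve to the child's stdio pipes, so
    // no real file ever touches the disk.
    const child = spawn('someTool', ['/dev/stdin', '/dev/stdout']);

    const chunks = [];
    child.stdout.on('data', (c) => chunks.push(c));
    child.on('error', reject);
    child.on('close', (code) => {
      if (code !== 0) return reject(new Error(`exited with code ${code}`));
      resolve(Buffer.concat(chunks));
    });

    // Feed the in-memory buffer to the child's stdin, then close it.
    child.stdin.end(inputBuffer);
  });
}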

Related

Node.js: manipulate file like a stack

I'm envisioning an implementation in node.js that can manipulate a file on disk as if it were a stack data structure.
Suppose the file is UTF-8-encoded plain text, each element of the stack corresponds to a '\n'-delimited line in the file, and the top of the stack points to the first line of the file. I want something that can simultaneously read and write the file.
const file = new FileAsStack("/path/to/file");
// read the first line from the file,
// also remove that line from the file.
let line = await file.pop();
To implement such an interface naively, I could simply read the whole file into memory, serve .pop() from memory, and write the remainder back to disk. Obviously that approach isn't ideal: imagine dealing with a 10 GB file; it would be both memory-intensive and I/O-intensive.
With fs.read() I can read just a slice of the file, so the "read" part is solved. But I have no idea how to do the "write" part. How can I efficiently take just one line and write the rest of the file back? I hope I don't have to read every byte of the file into memory and then write it back to disk...
I vaguely remember that a file in a filesystem is essentially a pointer to a position on disk. Is there any way I can simply move that pointer to the start of the next line?
I need some insight into which syscalls (or whatever else) can do this efficiently, but I'm fairly ignorant of low-level systems programming. Any help is appreciated!
What you're asking for is not something that a standard file system can do. You can't insert data into the beginning of a file in any traditional OS file system without rewriting the entire file. That's just the way they work.
Systems that absolutely need this capability without rewriting the entire file, while still using a traditional OS file system, build their own mini file system on top of the regular one, so that a single virtual file consists of many pieces written to separate files or to separate blocks of a file. In such a system you can insert data at the beginning of a virtual file without rewriting any existing data: write a new block of data to disk, then update your virtual file index (stored in some other file) to record that the first block of the virtual file now comes from that location. The index specifies the order of the blocks in the virtual file and where each one lives.
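As a rough illustration of that index idea (a sketch, not production code): keep a small side file that stores the byte offset where the live data begins, so pop() just advances the offset instead of rewriting anything. That is effectively the "move the pointer" the question asks for, done in user space:

const fs = require('fs/promises');

// Sketch: the data file is never rewritten on pop(). A small side file
// (the "index") stores the byte offset where the live data begins.
class FileAsStack {
  constructor(dataPath) {
    this.dataPath = dataPath;
    this.idxPath = dataPath + '.idx';
  }

  async _offset() {
    try {
      return parseInt(await fs.readFile(this.idxPath, 'utf8'), 10) || 0;
    } catch {
      return 0; // no index yet: live data starts at byte 0
    }
  }

  // Read one line at the current offset, then advance the offset past it.
  // Assumes lines are shorter than the 64 KiB read buffer.
  async pop() {
    const offset = await this._offset();
    const fh = await fs.open(this.dataPath, 'r');
    try {
      const buf = Buffer.alloc(64 * 1024);
      const { bytesRead } = await fh.read(buf, 0, buf.length, offset);
      if (bytesRead === 0) return null; // stack is empty
      const chunk = buf.subarray(0, bytesRead);
      const nl = chunk.indexOf(0x0a); // '\n'
      if (nl === -1) { // final line with no trailing newline
        await fs.writeFile(this.idxPath, String(offset + bytesRead));
        return chunk.toString('utf8');
      }
      await fs.writeFile(this.idxPath, String(offset + nl + 1));
      return chunk.subarray(0, nl).toString('utf8');
    } finally {
      await fh.close();
    }
  }
}

A real implementation would keep the file handle open, make the offset update crash-safe (write-then-rename), and periodically compact by copying the live tail into a new file and resetting the offset, which is the only point at which the dead space at the front is reclaimed.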
Most programs that need to do something like this will instead use a database for storing records, use indexes and queries to control order, and let the underlying database worry about where the individual bits land on disk. That way you can very efficiently insert records anywhere you want in the resulting query order.
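For instance, a sketch of the database route, assuming the third-party better-sqlite3 package: pop() becomes a tiny transaction and SQLite decides where the bytes live on disk.

const Database = require('better-sqlite3'); // assumed third-party driver

const db = new Database('stack.db');
db.exec('CREATE TABLE IF NOT EXISTS stack (id INTEGER PRIMARY KEY, line TEXT)');

const insert = db.prepare('INSERT INTO stack (line) VALUES (?)');
const newest = db.prepare('SELECT id, line FROM stack ORDER BY id DESC LIMIT 1');
const remove = db.prepare('DELETE FROM stack WHERE id = ?');

const push = (line) => insert.run(line);

// Pop = read the newest row and delete it, atomically.
const pop = db.transaction(() => {
  const row = newest.get();
  if (!row) return undefined;
  remove.run(row.id);
  return row.line;
});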

Write file on read request - Linux

I want a specific file to be written on demand whenever any program attempts to read it. For example: I create an empty file (or one filled with zeros); a program tries to read N bytes starting from the M-th byte of the file (using the read/seek syscalls); I need the read call to wait until I write the requested bytes, so that the syscall successfully returns the written bytes without errors. To the reading program, the file should look like the one it expects to read. Alternatively, is there a way to "send" the needed bytes to the read() call without first writing them to the file? I need this to work with any program, without editing its code.
This must be done by intercepting the filesystem read requests at the kernel level. The simplest approach is to leverage FUSE to implement a filesystem filter in userspace. The other programs would then be reading from your filesystem, giving you full control over what they read.
You would need to create your own synchronization protocol between the reader and the writer using some IPC mechanism.
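A sketch of that FUSE approach in node.js, assuming the third-party fuse-native package (callback conventions per that package's documentation; the same structure applies with libfuse in C). The read handler simply withholds its callback until the writer side has supplied the bytes, which is exactly what makes the reader's read() syscall wait:

const Fuse = require('fuse-native'); // assumed third-party FUSE binding

let contents = null;  // the bytes readers should eventually see
const waiters = [];   // parked read() calls, answered once data arrives

// Your IPC mechanism calls this when the writer supplies the data.
function supply(buf) {
  contents = buf;
  while (waiters.length) waiters.pop()();
}

const SIZE = 1 << 20; // advertised size; real code would coordinate this too

const ops = {
  readdir(path, cb) {
    if (path === '/') return cb(null, ['pending']);
    return cb(Fuse.ENOENT);
  },
  getattr(path, cb) {
    const now = new Date();
    if (path === '/')
      return cb(null, { mode: 0o40755, size: 4096, mtime: now, atime: now, ctime: now });
    if (path === '/pending')
      return cb(null, { mode: 0o100644, size: SIZE, mtime: now, atime: now, ctime: now });
    return cb(Fuse.ENOENT);
  },
  open(path, flags, cb) {
    return cb(0, 42); // arbitrary fd token
  },
  read(path, fd, buf, len, pos, cb) {
    const answer = () => {
      const slice = contents.subarray(pos, Math.min(pos + len, contents.length));
      slice.copy(buf);
      cb(slice.length); // the reader's syscall returns only now
    };
    if (contents) answer();
    else waiters.push(answer); // block the reader until supply() runs
  },
};

const fuse = new Fuse('/mnt/virtual', ops, { debug: false });
fuse.mount((err) => { if (err) throw err; });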
Alternatively, you could do this in a database using stored procedures.

Does the Linux system lock the file when I copy it?

I have written a program that updates a file periodically, and sometimes I want to copy the file to another computer to check its contents. If I copy the file while the program is not writing it, there is no problem; but if I copy it while the program is writing, the copy is partial. So I wonder: does Linux have a locking strategy to prevent this situation?
In fact I copy the file from a bash script, so I want to check from the script whether the program is currently writing the file. If it is, the script should wait a few seconds and then copy the completed version. So, in a bash script, how can I check whether the file is open or being modified by another program?
You could check from your script whether the file is currently being written to, and pause or abort the copy if it is:
fuser -v /path/to/your/file | awk 'BEGIN{FS=""}$38=="F"{num++}END{print num}'
The awk script counts the lines of fuser -v output whose ACCESS column (assumed here to land at character 38 of the line) shows F, i.e. processes that have the file open for writing. If the output is less than 1, you're good to copy :)
When your code writes to the file through buffered I/O (such as C's stdio), it actually writes into an output buffer in memory. The buffer is flushed out to disk only when it becomes full. So if you copy the file while its buffer has not yet been flushed, you will observe a partial file.
You can change the buffering with a call to setvbuf; if you make the stream unbuffered, output is flushed as it is written. Alternatively, call fflush() to flush the output explicitly. Either of these keeps the on-disk file up to date as you write.
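Those calls (setvbuf, fflush) are C stdio. If the writer happens to be a node.js program, a sketch of the analogous fix is to write through a raw file descriptor, so each write reaches the OS immediately with no user-space buffer to flush:

const fs = require('fs');

const fd = fs.openSync('/path/to/your/file', 'a');
fs.writeSync(fd, 'a complete record\n'); // visible to cp and other readers now
fs.fsyncSync(fd); // optionally force the OS page cache out to the disk itself
fs.closeSync(fd);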

Ghostscript: Convert PDFs to other filetypes without using the filesystem

I want to use the C API to Ghostscript on Linux to convert PDFs to other things: PDFs with fewer pages and images being two examples.
My understanding was that by supplying callback functions with gsapi_set_stdio I could read and write data through them. However, from my experimentation and reading, this doesn't seem to be the case.
My motivation is that I will be processing PDFs at scale and don't want my throughput to be held back by a spinning disk.
Am I missing something?
The stdio API allows you to provide your own replacements for stdin, stdout and stderr; it doesn't affect any activity by the interpreter that doesn't use those streams.
The pdfwrite device makes extensive use of the filesystem to write temporary files, which hold various intermediate portions of the PDF file as it is interpreted; these are later reassembled into the new PDF file. The temporary files are not written to stdout or stderr.
There is no way to avoid this behaviour.
Rendering to images again uses the file system, unless you specify stdout as the destination of the bitmap, in which case you can use the stdio API call to redirect stdout elsewhere. If the image is rendered at a high enough resolution, Ghostscript will use a display list, and that display list is stored in a temporary file which is unaffected by stdio redirection.
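For example, a sketch of the render-to-stdout case, driving the gs CLI from node.js (with the C API, the same arguments go to gsapi_init_with_args and your gsapi_set_stdio stdout callback receives the bytes). The paths and page range here are illustrative:

const { spawn } = require('child_process');

function pdfPageToPng(pdfPath) {
  return new Promise((resolve, reject) => {
    // -q keeps the banner off stdout; -sOutputFile=- sends the PNG there.
    // Per the answer above, the interpreter may still create its own
    // temporary files behind the scenes.
    const gs = spawn('gs', [
      '-q', '-dBATCH', '-dNOPAUSE',
      '-sDEVICE=png16m', '-r150',
      '-dFirstPage=1', '-dLastPage=1',
      '-sOutputFile=-',
      pdfPath,
    ]);
    const chunks = [];
    gs.stdout.on('data', (c) => chunks.push(c));
    gs.on('error', reject);
    gs.on('close', (code) =>
      code === 0 ? resolve(Buffer.concat(chunks)) : reject(new Error(`gs exited ${code}`)));
  });
}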

How can we create 'special' files, like /dev/random, in linux?

In the Linux filesystem, there are files such as /dev/zero and /dev/random which are not real files on the hard disk.
Is there any way we can create a similar file and tell it to get its output from executing a program?
For example, can I create a file, say /tmp/tarfile, such that any program reading it actually gets the output from the execution of a different program (/usr/bin/tar ...)?
It is possible to create such a file, but it would require creating a special filesystem in order to insert hooks into the VFS so that accesses can be detected and handled properly.
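Short of writing a special filesystem (e.g. with FUSE), a named pipe (FIFO) gives a limited approximation for a single sequential reader; to be clear, this is a different technique, not a true device file, and seeking or concurrent readers won't work. A sketch in node.js:

const { execFileSync, spawn } = require('child_process');
const fs = require('fs');

// Create the FIFO. A process that opens /tmp/tarfile for reading will
// block until a writer opens the other end and starts writing.
execFileSync('mkfifo', ['/tmp/tarfile']);

// Stream a tar archive into the pipe; the first reader receives it as it
// is produced. This is one-shot: the writer must be restarted for each
// new reader.
const tar = spawn('tar', ['-cf', '-', '/some/dir']);
tar.stdout.pipe(fs.createWriteStream('/tmp/tarfile'));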
