LD_PRELOAD with file functions - Linux

I have a rather peculiar file format to work with:
Every line begins with the checksum of its content and ends with a newline character.
It looks like this:
[CHECKSUM OF LINE_1][LINE_1]\n
[CHECKSUM OF LINE_2][LINE_2]\n
[CHECKSUM OF LINE_3][LINE_3]\n
...
My goal: to allow any application to work with these files as it would with any other text file, unaware of the additional checksum at the beginning of each line.
Since I work on a Linux machine running Debian Wheezy (kernel 3.18.26), I want to use the LD_PRELOAD mechanism to override the relevant file functions.
I have seen something like this with zlibc on https://zlibc.linux.lu/index.html - with an explanation of how it works ( https://zlibc.linux.lu/zlibc.html#SEC8 ).
But I don't get it. They only replace the file-opening functions. No read. No write. No fseek. Nothing. So how does it work?
Or: which functions would I have to intercept to catch every read or write operation on such a file and handle it accordingly?

I didn't check exactly how it works, but the reason seems to be quite simple.
A possible implementation:

zlibc open:
- uncompress the file you want to open into a temporary file
- open this temporary file instead of the original

zlibc close:
- compress the temporary file
- overwrite the original file

In this case you don't need to override read/write/etc. because you can use the original ones.
In your case you have two possible solutions:
1. An open that makes a copy of the file with stripped checksums, plus a close that recalculates the checksums and overwrites the original file.
2. A read and write that skip/calculate the checksums on the fly.
Regarding option 2:
From What is the difference between read() and fread()?:
fread() is part of the C library, and provides buffered reads. It is
usually implemented by calling read() in order to fill its buffer
Given that, I believe overriding open and close will be less error-prone, because you can safely reuse the original read, write, fread, fseek, etc.
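
To make the open/close approach concrete, here is a minimal, untested sketch of such an LD_PRELOAD interposer. strip_checksums() and restore_checksums() are hypothetical helpers you would implement for your specific checksum format, the ".chk" suffix test is an invented placeholder rule, and the fixed-size fd table is a simplification:

/* preload.c - build: gcc -shared -fPIC preload.c -o preload.so -ldl
   run:   LD_PRELOAD=./preload.so some_application file.chk */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <limits.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* hypothetical helpers: you implement these for your format */
extern int strip_checksums(const char *src, const char *dst);   /* on open  */
extern int restore_checksums(const char *src, const char *dst); /* on close */

static int is_checksummed_file(const char *path)
{
    /* hypothetical rule: files of your format end in ".chk" */
    size_t n = strlen(path);
    return n > 4 && strcmp(path + n - 4, ".chk") == 0;
}

#define MAX_FD 1024
static char orig_path[MAX_FD][PATH_MAX]; /* fd -> original file, "" if plain */
static char temp_path[MAX_FD][PATH_MAX]; /* fd -> checksum-free copy */

int open(const char *path, int flags, ...)
{
    static int (*real_open)(const char *, int, ...);
    mode_t mode = 0;

    if (!real_open)
        real_open = (int (*)(const char *, int, ...))dlsym(RTLD_NEXT, "open");
    if (flags & O_CREAT) {
        va_list ap;
        va_start(ap, flags);
        mode = va_arg(ap, int);
        va_end(ap);
    }
    if (!is_checksummed_file(path))          /* plain files pass through */
        return real_open(path, flags, mode);

    /* make a checksum-free copy and hand the application that instead */
    char tmp[PATH_MAX];
    static long counter;
    snprintf(tmp, sizeof tmp, "/tmp/nochk.%d.%ld", getpid(), counter++);
    if (strip_checksums(path, tmp) < 0)
        return -1;
    int fd = real_open(tmp, flags, mode);
    if (fd >= 0 && fd < MAX_FD) {
        strncpy(orig_path[fd], path, PATH_MAX - 1);
        strncpy(temp_path[fd], tmp, PATH_MAX - 1);
    }
    return fd;
}

int close(int fd)
{
    static int (*real_close)(int);
    if (!real_close)
        real_close = (int (*)(int))dlsym(RTLD_NEXT, "close");
    int r = real_close(fd);
    if (fd >= 0 && fd < MAX_FD && orig_path[fd][0]) {
        restore_checksums(temp_path[fd], orig_path[fd]); /* write back */
        unlink(temp_path[fd]);
        orig_path[fd][0] = '\0';
    }
    return r;
}

Note that this only catches open and close; programs using fopen, open64, openat or creat would bypass it, so in practice you would interpose those entry points the same way.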

Related

Heroku cannot store files temporarily

I am writing a Node.js app which works with fonts. One action it performs is downloading a .ttf font from the web, converting it to a base64 string, deleting the .ttf, and using that string in other stuff. I need the .ttf file stored somewhere so that I can convert it. This process takes 1-2 seconds. I know Heroku has an ephemeral file system, but I only need to store the file for a very short time. Is there any way I can store my files? Using fs.writeFile currently returns this error:
Error: EROFS: read-only file system, open '/app\test.txt'
I had an idea: make an action that gets the font, converts it, and stores it in a global variable before it is used by another task.
When you want to use it again, first check whether that global variable has already been filled with the font buffer.
Reference: Singleton
I didn't know that you could store stuff in the /tmp directory. It is working for the moment, but according to the dyno/ephemeral-filesystem documentation it gets cleaned frequently, so I don't know if it may cause other problems in the long run.

What happens when calling `touch .` in Linux?

This is a very specific question.
I'm mainly interested in the open() system calls that happen when running touch ..
So I ran strace touch . and saw that openat() is called three times.
But I'm not really understanding what's going on: touch . does not print anything to the console, and it does not create a new file named "." since "." refers to the current folder (as can be seen by running ls -a), so nothing is created because that name is already in use.
This is my assumption:
open() is called to check whether the specified file name already exists; if a file descriptor is returned, the name is already in use and the operation is canceled.
Please correct me if I'm wrong.
GNU touch prefers to use a file descriptor when touching files, since it's possible to write touch - > foo and expect the file foo to be touched. As a result, it always tries to open the specified path as a writable file, and if that's possible, it then uses that file descriptor to update the file timestamp.
In this case, it's not possible to open . for writing, so openat returns EISDIR. touch notices that it's a directory, so its call to its internal fdutimensat function gets an invalid file descriptor and falls back to using utimensat instead of futimens.
It isn't the case that the openat call is used to check that the file exists; rather, using a file descriptor for many operations means that you don't have to deal with path resolution multiple times or handle symlinks, since all of those are resolved when the file descriptor is opened. This is why many long-lived programs choose to open a file descriptor to their current working directory, change directories, and then use the file descriptor with fchdir to change back: any changes to permissions after the program starts are not a problem.
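
A rough C sketch of that fallback logic (not the actual coreutils source; the open flags are only approximately what GNU touch uses):

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

static int touch_path(const char *path)
{
    /* try to get a writable fd first, as GNU touch does */
    int fd = open(path, O_WRONLY | O_CREAT | O_NONBLOCK | O_NOCTTY, 0666);
    if (fd >= 0) {
        int r = futimens(fd, NULL);   /* NULL means "set both times to now" */
        close(fd);
        return r;
    }
    if (errno == EISDIR)              /* "." lands here: it's a directory */
        return utimensat(AT_FDCWD, path, NULL, 0);
    return -1;
}

int main(void)
{
    return touch_path(".") == 0 ? 0 : 1;
}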

How do I get the filename of an open std::fs::File in Rust?

I have an open std::fs::File, and I want to get its filename, e.g. as a PathBuf. How do I do that?
The simple solution would be to just save the path used in the call to File::open. Unfortunately, this does not work for me. I am trying to write a program that reads log files, and the program that writes the logs keeps changing the filenames as part of its log rotation. So the file may very well have been renamed since it was opened. This is on Linux, so renaming open files is possible.
How do I get around this issue, and get the current filename of an open file?
On a typical Unix filesystem, a file may have multiple filenames at once, or even none at all. The file metadata is stored in an inode, which has a unique inode number, and this inode number can be linked from any number of directory entries. However, there are no reverse links from the inode back to the directory entries.
Given an open File object in Rust, you can get the inode number using the ino() method (via std::os::unix::fs::MetadataExt on the file's metadata). If you know the directory the log file is in, you can use std::fs::read_dir() to iterate over all entries in that directory; each entry also has an ino() method, so you can find the one(s) matching your open file object. Of course this approach is subject to race conditions: the directory entry may already be gone again by the time you try to do anything with it.
On Linux, file handles held by the current process can be found under /proc/self/fd. These look and act like symlinks to the original files (though I think they may technically be something else; perhaps someone who knows more can chip in).
You can therefore recover the (possibly changed) file name by constructing the correct path in /proc/self/fd using your file descriptor, and then following the symlink back to the filesystem.
This snippet shows the steps:
use std::fs::{read_link, File};
use std::os::unix::io::AsRawFd;
use std::path::PathBuf;

fn main() -> std::io::Result<()> {
    // open any file; f stands in for your std::fs::File
    let f = File::open("/etc/hostname")?;
    // first construct the path to the symlink under /proc
    let path_in_proc = PathBuf::from(format!("/proc/self/fd/{}", f.as_raw_fd()));
    // ...and follow it back to the original file
    let new_file_name = read_link(path_in_proc)?;
    println!("{}", new_file_name.display());
    Ok(())
}

Deleting original files as you go along adding files to a TAR file

I've written a small server function which is intended to tar together a bunch of locally downloaded files, then delete the originals. It looks something like this:
with tarfile.open(archive_filename, "w:gz") as tar:
    for pb in designated_objects:
        bucket.download_file(pb.key, pb.key)
        tar.add(pb.key)
        os.remove(pb.key)
My expectation is that this will generate a tarfile with all of my desired data and leave an otherwise empty directory. The idea here is that I would like to minimize my disk usage as much as possible. However, I'm unsure whether deleting a file before the tarfile has finished being generated (as done here) is allowed.
Will this expression work as expected?
If it will not, is there something akin to an append mode that will?
As expected, the original files are populated, then deleted. However, the behavior of the archive is unusual: when this code block is run, no archive is generated. In fact, this code block does nothing at all (except delete your files).
I find this behavior particularly unusual and surprising given that merely executing pass inside the with statement (as in the code that follows) will actually write an empty archive to disk. So in a sense, the given code block does even less than nothing!
with tarfile.open('archive_filename.xy.gz', "w:gz") as tar:
    pass
For reference, this behavior is what I get with Python 3.6. Behavior with other versions of Python may differ.

Create a hard link from a file handle on Unix?

If I've got a handle to an open file, is it possible to create a hard link to that file after all references to it have been removed from the filesystem?
For example, something like this:
FILE *fp = fopen("/tmp/foo", "w");
unlink("/tmp/foo");
fputs("Hello, world!\n", fp);
create_link_from_fd(fileno(fp), "/tmp/hello");  /* hypothetical */
fclose(fp);
Specifically, I'd like to do this so that I can safely write to large data files, then move them into place atomically without having to worry about cleaning up after myself if my program is killed in the middle of writing the file.
The newly released Linux 3.11 offers a solution to this problem with the new O_TMPFILE open(2) flag. With this flag you can create an "invisible" file (i.e. an inode with no hard links) on some file system (specified by a directory on that file system). Then, after the file is fully set up, you can create a hard link using linkat. It works like this:
fd = open("/tmp", O_TMPFILE | O_RDWR, 0600);
// write something to the file here
// fchown()/fchmod() it
linkat(fd, "", AT_FDCWD, "/tmp/test", AT_EMPTY_PATH);
Note that aside from the >=3.11 kernel requirement, this also requires support from the underlying file system (I tried the above snippet on ext3 and it worked, but it did not seem to work on btrfs).
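For reference, a fuller sketch of this approach. One caveat worth flagging: linkat with AT_EMPTY_PATH requires the CAP_DAC_READ_SEARCH capability, and the open(2) man page suggests the /proc/self/fd detour used below for unprivileged processes:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/tmp", O_TMPFILE | O_RDWR, 0600);
    if (fd < 0) { perror("open"); return 1; }

    dprintf(fd, "Hello, world!\n");   /* write the file's contents */

    /* give the anonymous inode a name, atomically */
    char pbuf[64];
    snprintf(pbuf, sizeof pbuf, "/proc/self/fd/%d", fd);
    if (linkat(AT_FDCWD, pbuf, AT_FDCWD, "/tmp/test", AT_SYMLINK_FOLLOW) < 0) {
        perror("linkat");
        return 1;
    }
    close(fd);
    return 0;
}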
Not generally, no. [Edit: since Linux 3.11 there is now linkat; see safsaf32's answer. This does not work on POSIX systems in general since POSIX linkat is restricted to directories only.] There are security considerations here: someone can pass to you an open file descriptor that you could not normally open on your own, e.g.:
mkdir lock; chmod 700 lock
echo secret contents > lock/in
sudoish cmd < lock/in
Here cmd runs as a user who has no permission to open the input file (lock/in) by name, but can still read from it. If cmd could create a new name on the same file system, it could pass the file contents on to a later process. (Obviously it can copy those contents, so this issue is more of a "pass the contents on by mistake" thing than "pass the contents on, on purpose".)
That said, people have come up with ways of "relinking" files by inode/vnode internally (it's pretty easy to do inside most file systems), so you could make your own private system call for it. The descriptor must refer to a real file on the appropriate mount point, of course—there's no way of "relinking" a pipe or socket or device into becoming a regular file.
Otherwise you're stuck with "catch signals and clean up and hope for the best", or a similar trick, "fork off a subprocess, run it, and if it succeeds/fails, take appropriate move/clean-up action".
Edit to add historical note: the above lock example is not particularly good, but back in the days of V6 Unix, MDQS used a fancier version of this trick. Bits and pieces of MDQS survive in various forms today.
On Linux, you might try the unportable trick of using /proc/self/fd by trying to call
char pbuf[64];
snprintf (pbuf, sizeof(pbuf), "/proc/self/fd/%d", fd);
link(pbuf, "/tmp/hello");
I would be surprised if that trick worked after an unlink("/tmp/foo") ... I did not try that.
A more portable (but less robust) way would be to generate a "unique temporary path" perhaps like
int p = (int) getpid();
int t = (int) time(0);
int r = (int) random();
snprintf(pbuf, sizeof(pbuf), "/tmp/out-p%d-r%d-t%d.tmp", p, r, t);
int fd = open(pbuf, O_CREAT|O_WRONLY, 0644);
Once the file has been written and closed, you rename(2) it to some more sensible path. You could use atexit in your program to do the renaming (or the removing).
And have some cron job to clean the [old] /tmp/out*.tmp every hour...
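
Putting that write-then-rename flow together, here is a minimal sketch, with mkstemp(3) swapped in for the hand-rolled unique name:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    char pbuf[] = "/tmp/out-XXXXXX";
    int fd = mkstemp(pbuf);               /* creates and opens a unique file */
    if (fd < 0) { perror("mkstemp"); return 1; }

    dprintf(fd, "Hello, world!\n");       /* write the real contents */
    close(fd);

    /* rename(2) is atomic within one file system: readers see either
       the old file or the new one, never a half-written file */
    if (rename(pbuf, "/tmp/hello") < 0) { perror("rename"); return 1; }
    return 0;
}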
