How to create a "fake filesystem" that forwards system calls to my program? - linux

I would like to write a tool that can be used to mount archives such as tar, tgz, zip, 7z, etc. to some directory for as long as it's running, such that I can then open it with whatever file manager I want.
To do this, I would somehow need to make a fake filesystem that forwards system calls such as opening and reading files to my program. How would I do this? Would I have to write my own filesystem driver, or does a library for this already exist?

FUSE is what you're looking for, in principle. One existing implementation of the archive mounting you describe is archivemount, though I am unsure how well it is maintained.

You want to write a FUSE filesystem. It is explained in this article and there is an example in python: https://thepythoncorner.com/dev/writing-a-fuse-filesystem-in-python/
There is no need to call this a "fake filesystem": on Linux everything can be a file, so you are simply writing a "filesystem".
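To make this concrete, here is a minimal read-only FUSE filesystem sketch using the third-party fusepy package (assumptions: fusepy is installed, e.g. via pip, and FUSE is available on the system). It exposes a single file, hello, whose contents come straight from the Python process — the same forwarding mechanism an archive mounter would use, with read() pulling bytes out of the archive instead of a constant.

```python
import errno
import stat
import sys

from fuse import FUSE, FuseOSError, Operations  # third-party: fusepy

class HelloFS(Operations):
    """Serves a single file /hello whose bytes come from this process."""
    DATA = b"hello world\n"

    def getattr(self, path, fh=None):
        if path == "/":
            return {"st_mode": stat.S_IFDIR | 0o755, "st_nlink": 2}
        if path == "/hello":
            return {"st_mode": stat.S_IFREG | 0o444,
                    "st_nlink": 1, "st_size": len(self.DATA)}
        raise FuseOSError(errno.ENOENT)

    def readdir(self, path, fh):
        return [".", "..", "hello"]

    def read(self, path, size, offset, fh):
        # An archive mounter would decompress the right member here.
        return self.DATA[offset:offset + size]

if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python hellofs.py /some/mountpoint
    FUSE(HelloFS(), sys.argv[1], foreground=True, ro=True)
```

After mounting, `cat /some/mountpoint/hello` triggers the read() method above; any file manager can browse the mountpoint.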

Your question is very general. I will assume you want to write your program in Python, because that is what I would do.
Having chosen the language, we now need to look for examples of somebody doing something similar. GitHub will do the trick:
To mount wikipedia: wikifs
To mount Google Music: GmusicFS
N.B. wikifs is less than experimental, but the script is short and serves perfectly as an example.
For Python, the most popular library is fuse-python.
Happy hacking

Related

Is there a data: URI-like construct for paths on Linux?

If you "open" a URI like data:text/html,<p>test</p>, the opened "file" contains <p>test</p>.
Is there a corresponding approach to apply this principle to Linux paths?
Example:
I want a path to a "virtual file" that "contains" example-data, ideally without actually creating this file.
So I'm basically looking for something you can replace some_special_path_results_in with in /some_very_special_path_results_in/example-data so that the opened "file" just "contains" example-data.
You can use process substitution in bash.
some_command <(printf '%s' '<p>test</p>')
I want a path to a "virtual file" that "contains" example-data, ideally without actually creating this file.
Maybe you should consider using tmpfs.
On Linux, creating a file is a very common and basic operation.
Why can't you create some "temporary" file, or use some FUSE filesystem?
You could technically write your own kernel module providing a new file system.
Be aware that files are mostly inode(7)-s (see stat(2)). They do have some meta data (but no MIME types). And any process can try to open(2) a file (sometimes, two processes are opening or accessing the same file). See path_resolution(7) and credentials(7).
Maybe you want pipe(7), fifo(7) or unix(7) sockets.
Read also Advanced Linux Programming, syscalls(2), a good textbook on operating systems, and see the KernelNewbies, Linux From Scratch, and Linux BootPrompt websites.
Technically, Linux is open source: you are allowed to download, study, improve, and recompile its source code. See this for the kernel code (it is free software), GNU libc, GCC, etc.
PS. Take into account legal software-licensing considerations. Ask your lawyer to explain the GPL licenses to you.
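The fifo(7) suggestion above is easy to try from Python's standard library: a named pipe looks like an ordinary path on disk, but data written to it flows directly between the two processes (modelled here with a thread) without being stored anywhere — a minimal sketch, using a throwaway temp directory.

```python
import os
import tempfile
import threading

# Create a named pipe (FIFO) at a path other programs could open.
fifo = os.path.join(tempfile.mkdtemp(), "fifo")
os.mkfifo(fifo)

def writer():
    # open() for writing blocks until a reader opens the other end.
    with open(fifo, "w") as f:
        f.write("hello from the pipe")

t = threading.Thread(target=writer)
t.start()
with open(fifo) as f:  # blocks until the writer connects
    message = f.read()
t.join()
print(message)
```

Unlike a regular file, nothing persists: once both ends close, the data is gone and only the empty FIFO node remains.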

Retrieve information from Linux kernel virtual filesystem

I want to write a utility that does some reporting based on data available in the /proc directory.
Is this as simple as reading and parsing the contents of the virtual file I am interested in? I have seen this approach implemented in Python when doing similar things.
Is there a superior way to do this in Go?
For backstory, I am using ZFS on Linux and want to retrieve data from this virtual file: /proc/spl/kstat/zfs/arcstats
This is a Python program that operates directly on that file.
Is this as simple as reading and parsing the contents of the virtual file I am interested in?
As far as I know: yes.
But you might try looking at github.com/c9s/goprocinfo to see what they do there, or if you can use that package instead.
See also this SO question and answer.
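Yes — reading and splitting the text is all there is to it. A minimal sketch in Python (the field names and numbers in the sample below are illustrative, but the layout — one header line, a "name type data" row, then one statistic per line — matches the usual SPL kstat format):

```python
def parse_kstat(text):
    """Parse the text of a kstat file such as
    /proc/spl/kstat/zfs/arcstats into a name -> value dict."""
    stats = {}
    # Skip the header line and the 'name type data' column row.
    for line in text.strip().splitlines()[2:]:
        name, _type, data = line.split()
        stats[name] = int(data)
    return stats

# Sample input in the arcstats layout (values illustrative).
sample = """\
13 1 0x01 96 4608 7916996082 368357327035652
name                            type data
hits                            4    217868
misses                          4    1148
size                            4    46164928
"""
stats = parse_kstat(sample)
print(stats["hits"], stats["size"])
```

For the real file, replace `sample` with `open("/proc/spl/kstat/zfs/arcstats").read()`; the same two-line skip applies.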

Read folders as files

After doing some research and finding nothing, I wanted to ask:
Is it possible to read folders as files?
What I mean by this is: is it possible to read a folder as an object, the way one can read a text file? (example below)
with open("sometext.txt", "r") as data:
    print(data.read())

fo = open("sometext.txt", "w")
information = "somestring"
fo.write(information)
fo.close()
Thanks in advance for any help.
Yes, on some operating systems, but probably not in the way you intend. There is a library, directory (disclaimer: I have never used it; I found it while researching your question), that binds the low-level system calls opendir and readdir. These are C functions that interact with the Unix kernel to manipulate directories.
This is possible because in Unix, directories are a special type of file that contain pointers to the contents of that directory. This may be true of other operating systems, but I wouldn't count on there being existing tools for interacting with the low-level components of those systems with Python.
As the people who commented above have pointed out, there are very few (though not none) scenarios that would require you to get this close to the machinery of the operating system. It's likely that whatever you're trying to do can be done using either the os module or some other part of the standard library.
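As a sketch of that standard-library route: os.scandir is the modern stdlib analogue of the opendir()/readdir() C calls mentioned above, and it lets you iterate over a directory's entries as objects (the throwaway directory below is just for demonstration).

```python
import os
import tempfile

def list_dir(path):
    """Enumerate a directory's entries, reporting which entries
    are themselves directories."""
    with os.scandir(path) as it:
        return sorted((entry.name, entry.is_dir()) for entry in it)

# Demo on a throwaway directory containing one file and one subfolder.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "subdir"))
open(os.path.join(root, "file.txt"), "w").close()

entries = list_dir(root)
print(entries)  # [('file.txt', False), ('subdir', True)]
```

This is as close to "reading a folder like a file" as portable Python gets; the raw on-disk directory format is deliberately hidden by the kernel.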

Pseudo filesystems on *nix

I need some opinions/pointers on creating pseudo-filesystems for Linux/*nix systems.
Firstly when I say pseudo-filesystem I mean something like /proc where the structure within does not represent actual files on disks or such but the state of the kernel. I would like to try something similar as an interface to an application.
As an example, you could mount an FTP URL into your filesystem, and your browser app could then allow you to interact with the remote system, doing ls et al. on it and translating standard filesystem requests into FTP ones.
So the first question is: how does one go about doing that? I have read a bit about it, and it looks like you need to implement a new kernel module. If possible I would like to avoid that; my thinking is that someone may already have provided a tool for doing this sort of thing, along with the module to assist.
My second question is: does anyone have a good list of examples of applications/services/whatever using this sort of technique to provide a filesystem based interface.
Lastly if anyone has any opinions on why this might be a good/bad idea to do such a thing on a generic level I would like to hear it.
A userspace filesystem via fuse would probably be your best way to go.
Regarding the next part of your question (which applications use this method), there is the window manager wmii, it uses the 9p filesystem via v9fs, which is a port of 9p to Linux. There are many examples on plan9, most notably acme. I suggested fuse because it seems more actively developed and mainstream in the Linux world, but plan9 is pretty much the reference for this approach as far as I know.

File search algorithms using indexing in linux

I am thinking of implementing a file search program using indexing on Linux. I know that there are several other file search programs, like beagled, but I am doing this for study purposes. I am stuck on how to do the indexing. I have the following idea, which I took from the maemo-mapper application:
For example, if you have a file named "suresh", it is indexed in the file system as:
/home/$USERNAME/.file_search_index/s/u/r/e/s/h/list.txt. This list.txt contains the locations of all files with name = "suresh". Please suggest a better idea/algorithm to implement it, and if there is any material on various file search techniques, please post it.
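For reference, the scheme described above (one directory level per character of the file name) boils down to a single path construction — a sketch, with the root path as a parameter:

```python
import os

def index_path(root, name):
    """Build the one-directory-per-character index path, e.g.
    index_path("/idx", "suresh") -> "/idx/s/u/r/e/s/h/list.txt"."""
    return os.path.join(root, *name, "list.txt")

print(index_path("/idx", "suresh"))  # /idx/s/u/r/e/s/h/list.txt
```

Note how deep the tree gets for long names; the answers below point out the cost of that.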
You haven't seen the locate command that comes with findutils? Like beagled, it's free software, so you can study the code.
The findutils package is always looking for contributors.
Information on the database format is at http://www.gnu.org/software/findutils/manual/html_node/find_html/Database-Formats.html
Beagle uses a very interesting approach with inotify. It starts, establishes a watch on the parent directory and starts another thread that does a recursive scan. As more directories are accessed, the parent sees them and adds more watches, while watching what it already knows about.
So, when its started, you're watching an entire tree quite cheaply (one watch per directory) and have the whole thing indexed. This also helps to ensure that no files are 'missed' during the scan.
So that's most of your battle: typically, FS search programs hit their sluggish point when indexing, 'updatedb' for instance.
As for storing the index, I would not favor splitting it up into directories. You would, in essence, be calling stat() on each character of a file name; some-very-long-shared-object-name.so.0, for instance, would mean one stat() call for every character in the name. You might try a well-designed SQLite3 database instead.
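A minimal sketch of the SQLite3 alternative using only the standard library: one table with an index on the name column replaces the whole directory tree, and a lookup is a single query rather than a stat() per character (the paths below are made-up sample data; use a file path instead of ":memory:" for a persistent index).

```python
import sqlite3

# One indexed table instead of one directory level per character.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE files (name TEXT, path TEXT)")
con.execute("CREATE INDEX idx_files_name ON files (name)")

# Sample data standing in for a filesystem crawl.
con.executemany(
    "INSERT INTO files VALUES (?, ?)",
    [("suresh", "/home/user/docs/suresh"),
     ("suresh", "/tmp/suresh"),
     ("notes.txt", "/home/user/notes.txt")],
)
con.commit()

# Lookup: all locations of files named "suresh".
paths = [row[0] for row in con.execute(
    "SELECT path FROM files WHERE name = ? ORDER BY path", ("suresh",))]
print(paths)
```

The index also makes prefix queries cheap (`WHERE name LIKE 'sur%'`), which the per-character directory scheme cannot do without walking the tree.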
I'm working on something very similar: a program to provide a slightly cheaper auditing mechanism for PCI certification (credit card processing), without using kernel auditing hooks.
