Read folders as files - python-3.x

After doing some research and finding nothing, I wanted to ask:
Is it possible to read folders as files?
What I mean by this is: is it possible to read a folder as an object, the way one can read a text file? (Example below.)
with open("sometext.txt", "r") as data:
    print(data)
fo = open("sometext.txt", "w")
information = "somestring"
fo.write(information)
fo.close()
Thanks in advance for any help.

Yes, on some operating systems, but probably not in the way you intend. There is a library, directory (disclaimer: I have never used it; I found it while researching your question), that binds the low-level system calls opendir and readdir. These are C functions that interact with the Unix kernel to manipulate directories.
This is possible because in Unix, directories are a special type of file that contain pointers to the contents of that directory. This may be true of other operating systems, but I wouldn't count on there being existing tools for interacting with the low-level components of those systems with Python.
As the people who commented above have pointed out, there are very few (though not none) scenarios that would require you to get this close to the machinery of the operating system. It's likely that whatever you're trying to do can be done using either the os module or some other part of the standard library.
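For the common case, the standard library already covers this: on POSIX systems, os.scandir is implemented on top of opendir/readdir, so you rarely need to bind them yourself. A minimal sketch (Python 3.6+, listing the current directory as an example):
# Iterate over a directory's entries with the standard library instead of
# binding opendir/readdir directly; os.scandir wraps those calls internally.
import os

with os.scandir(".") as entries:        # any directory path works here
    for entry in entries:
        kind = "dir " if entry.is_dir() else "file"
        print(kind, entry.name)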

Related

Is there a data: URI-like construct for paths on Linux?

If you "open" an URI like data:text/html,<p>test</p>, the opened "file" contains <p>test</p>.
Is there a corresponding approach to apply this principle to Linux paths?
Example:
I want a path to a "virtual file" that "contains" example-data, ideally without actually creating this file.
So I'm basically looking for something you can replace some_special_path_results_in with, so that opening /some_special_path_results_in/example-data yields a "file" that just "contains" example-data.
You can use process substitution in bash.
some_command <(printf '%s' '<p>test</p>')
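Under the hood, process substitution hands the command a path like /dev/fd/63 that refers to a pipe. If you are coordinating things from Python rather than bash, here is a rough sketch of the same trick (Linux-specific because of /proc/self/fd; cat merely stands in for some_command):
# Pass in-memory data to a command as a readable "virtual file" path,
# mimicking bash process substitution with a plain pipe.
import os
import subprocess

data = b"<p>test</p>"

read_fd, write_fd = os.pipe()
os.set_inheritable(read_fd, True)       # the child must inherit the read end
os.write(write_fd, data)                # fine for small payloads; larger ones need a writer thread
os.close(write_fd)                      # signals EOF to the reader

path = f"/proc/self/fd/{read_fd}"       # resolves inside the child to the inherited descriptor
subprocess.run(["cat", path], close_fds=False)
os.close(read_fd)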
I want a path to a "virtual file" that "contains" example-data, ideally without actually creating this file.
Maybe you should consider using tmpfs.
On Linux, creating a file is a very common and basic operation.
Why can't you create some "temporary" file, or use some FUSE filesystem?
You could technically write your own kernel module providing a new file system.
Be aware that files are mostly inode(7)-s (see stat(2)). They do have some meta data (but no MIME types). And any process can try to open(2) a file (sometimes, two processes are opening or accessing the same file). See path_resolution(7) and credentials(7).
Maybe you want pipe(7), fifo(7) or unix(7) sockets.
Read also Advanced Linux Programming, syscalls(2), and a good textbook on operating systems, and see the KernelNewbies, Linux From Scratch, and Linux BootPrompt websites.
Technically, Linux is open source: you are allowed to download, study, improve, and recompile its source code. See this for the kernel code (it is free software), GNU libc, GCC, etc.
PS. Take into account legal software licensing considerations. Ask your lawyer to explain the GPL licenses to you.

How to create a "fake filesystem" that forwards system calls to my program?

I would like to write a tool that can be used to mount archives such as tar, tgz, zip, 7z, etc. to some directory for as long as it's running, such that I can then open it with whatever file manager I want.
To do this, I would somehow need to make a fake filesystem that forwards system calls such as opening and reading files to my program. How would I do this? Would I have to make my own filesystem driver, or does a library for this already exist?
FUSE is what you're looking for, in principle. One implementation of the archive mounting you're looking for is/was archivemount, but I am unsure how well it is maintained.
You want to write a FUSE filesystem. It is explained in this article, and there is an example in Python: https://thepythoncorner.com/dev/writing-a-fuse-filesystem-in-python/
There is no need to call this a "fake filesystem": for Linux, everything can be a file, so you would just be writing a "filesystem".
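To make that concrete, here is a minimal read-only FUSE filesystem that exposes a single in-memory file. This is only a sketch and assumes the fusepy binding (pip install fusepy); note that the fuse-python library has a different API:
# A tiny read-only FUSE filesystem that exposes one in-memory file.
# Usage: python singlefs.py /some/empty/mountpoint
import errno
import stat
import sys
from fuse import FUSE, FuseOSError, Operations

CONTENT = b"hello from a virtual file\n"

class SingleFileFS(Operations):
    def getattr(self, path, fh=None):
        if path == "/":
            return dict(st_mode=stat.S_IFDIR | 0o755, st_nlink=2)
        if path == "/hello.txt":
            return dict(st_mode=stat.S_IFREG | 0o444, st_nlink=1,
                        st_size=len(CONTENT))
        raise FuseOSError(errno.ENOENT)

    def readdir(self, path, fh):
        return [".", "..", "hello.txt"]

    def read(self, path, size, offset, fh):
        return CONTENT[offset:offset + size]

if __name__ == "__main__":
    FUSE(SingleFileFS(), sys.argv[1], foreground=True)
An archive-mounting tool would follow the same shape, just backed by tarfile or zipfile instead of a byte string.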
Your question is very general. I will assume you want to write your program using Python because it is what I would do.
Having chosen the language, we now need to look for examples of somebody doing something similar. GitHub does the trick:
To mount wikipedia: wikifs
To mount Google Music: GmusicFS
N.B. Wikifs is less than experimental, but the script is short and serves perfectly as an example.
For python the most popular library is fuse-python.
Happy hacking

Is it possible for the same file to exist in more than one directory?

Just a simple question, borne out of learning about File Systems;
Is it possible for a single file to simultaneously exist in two or more directories?
I'd like to know whether this is possible in Linux as well as Windows.
Yes, you can do this with either hard or soft links (and maybe with shortcuts on Windows; I'm not sure about that). Note that this is different from making a copy of the file! In both cases, the file is only stored once, unlike when you make a copy.
In the case of hard links, the same file (on disk) will be referenced in two different places. You cannot distinguish between the 'original' and the 'new one'. If you delete one of them, the other will be unaffected; a file will only actually be deleted when the last "reference" is removed. An important detail is that the way hard links work means that you cannot create them for directories.
Soft links, also referred to as symbolic links, are somewhat similar to shortcuts in Windows, but at a lower level. If you open one for reading or writing, you operate on the target file, but you can still distinguish between accessing the file directly and accessing it through the soft link.
In Windows, the use of soft links is fairly uncommon, but there is support for it (I don't know about the filesystem APIs, but there's a command-line tool, mklink, that works much like ln on Unix).
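Both kinds of link can be created straight from Python. A small sketch, assuming a Unix-like system (on Windows, os.symlink may require extra privileges) and throwaway file names:
# Make the same file reachable from two directories: once via a hard link
# and once via a symbolic link.
import os

with open("original.txt", "w") as f:
    f.write("shared contents\n")

os.makedirs("other_dir", exist_ok=True)
os.link("original.txt", "other_dir/hard_link.txt")    # a second name for the same inode
os.symlink(os.path.abspath("original.txt"),
           "other_dir/soft_link.txt")                 # a pointer to the original path

print(os.stat("original.txt").st_nlink)               # 2: the file now has two hard names
print(os.path.islink("other_dir/soft_link.txt"))      # True: the symlink is detectable as such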

File systems with support to directory hard-linking

Does anybody know of one, preferably with a Linux implementation?
Alternatively, does anybody know how much effort it would take to add it to an open-source implementation? (I mean: maybe it's enough to change an if statement, or maybe I'd have to go carefully through the whole filesystem implementation adding tests; do you have a sense of that?)
Thanks.
HFS+ allows directory hard links in OS X 10.5. Since OS X 10.6, only Time Machine can create them, and HFS+ does some sanity checking that they do not introduce cycles.
However, Linux will not read them. Besides individual filesystems, this could also be enforced at the VFS layer. Even if there are no cycles, some userspace tools rely on there being no directory hard links (e.g., a GNU find optimisation that lets it skip many directories; it can be disabled with -noleaf).
Technically nothing keeps you from opening /dev/sda with a hex editor and creating one. However everything else in your system will fall apart if you do.
The best explanation I could find is this quote from jta:
"User-added hardlinks to directories are forbidden because they break the directed acyclic graph structure of the filesystem (which is an ASSERT in Unixiana, roughly), and because they confuse the hell out of file-tree-walkers (a term Multicians will recognize at sight, but Unix geeks can probably figure out without problems too)."

File search algorithms using indexing in linux

I'm thinking of implementing a file search program using indexing on Linux. I know there are several other file search programs like beagled, but I am doing this for study purposes. I am stuck on how to do the indexing. I have the following idea that I took from the maemo-mapper application.
For example, if you have a file named "suresh", it is indexed in the file system as files:
/home/$USERNAME/.file_search_index/s/u/r/e/s/h/list.txt. This list.txt contains the locations of all files with name = "suresh". Please suggest a better idea/algorithm to implement this. And if there is any material on various file search techniques, please post it.
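For concreteness, the scheme described above might look roughly like this (a sketch only; the index root and helper name are made up, and it ignores names containing dots or path separators):
# Naive character-per-directory index, as described in the question.
import os

INDEX_ROOT = os.path.expanduser("~/.file_search_index")

def index_file(path):
    name = os.path.basename(path)
    bucket = os.path.join(INDEX_ROOT, *name)          # one directory level per character
    os.makedirs(bucket, exist_ok=True)
    with open(os.path.join(bucket, "list.txt"), "a") as listing:
        listing.write(path + "\n")                    # append this file's location

index_file("/home/user/documents/suresh")
# appends to ~/.file_search_index/s/u/r/e/s/h/list.txt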
You haven't seen the locate command that comes with findutils? Like beagled, it's free software, so you can study the code.
The findutils package is always looking for contributors.
Information on the database format is at http://www.gnu.org/software/findutils/manual/html_node/find_html/Database-Formats.html
Beagle uses a very interesting approach with inotify. It starts, establishes a watch on the parent directory and starts another thread that does a recursive scan. As more directories are accessed, the parent sees them and adds more watches, while watching what it already knows about.
So, when its started, you're watching an entire tree quite cheaply (one watch per directory) and have the whole thing indexed. This also helps to ensure that no files are 'missed' during the scan.
So that's most of your battle: typically FS search programs hit their sluggish point when indexing, 'updatedb' for instance.
As for storing the index, I would not favor splitting it up into directories. You'd essentially be calling stat() once per character of each file name; some-very-long-shared-object-name.so.0, for instance, would be one stat() call for every character in the name. You might try using a well-designed SQLite3 database instead.
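A minimal sketch of that SQLite alternative (the database path and table layout are made up for illustration):
# Index file names in a single SQLite database instead of one directory
# per character; lookups use an index on the name column.
import os
import sqlite3

db = sqlite3.connect(os.path.expanduser("~/.file_search_index.db"))
db.execute("CREATE TABLE IF NOT EXISTS files (name TEXT, path TEXT)")
db.execute("CREATE INDEX IF NOT EXISTS idx_name ON files (name)")

def rebuild(root):
    db.execute("DELETE FROM files")
    rows = ((name, os.path.join(dirpath, name))
            for dirpath, _dirs, names in os.walk(root)
            for name in names)
    db.executemany("INSERT INTO files VALUES (?, ?)", rows)
    db.commit()

def lookup(name):
    return [path for (path,) in
            db.execute("SELECT path FROM files WHERE name = ?", (name,))]

rebuild(os.path.expanduser("~"))
print(lookup("suresh"))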
I'm working on something very similar, a program to provide slightly cheaper auditing means for PCI certification (credit card processor), without using kernel auditing hooks.
