Implementing a basic file system - Linux

As a college project, I need to implement a basic file system stored inside a regular file. How do I go about this, and what do I need to know? The requirements include a daemon process running in the background, and the applications that use the file system must connect to that server over a Unix domain socket.
The file system should have the following capabilities:
List files stored along with their sizes.
Create files
Allow changes to files
Delete files
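Since the assignment calls for a background daemon that clients reach over a Unix domain socket, the server side would presumably look something like the following sketch (the socket path /tmp/myfs.sock and the one-request-per-connection protocol are placeholders, not part of the requirements):
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

int main(void)
{
    const char *path = "/tmp/myfs.sock";          /* placeholder socket path */

    int srv = socket(AF_UNIX, SOCK_STREAM, 0);
    if (srv < 0) { perror("socket"); return 1; }

    struct sockaddr_un addr;
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof addr.sun_path - 1);
    unlink(path);                                 /* remove a stale socket file */

    if (bind(srv, (struct sockaddr *)&addr, sizeof addr) < 0) { perror("bind"); return 1; }
    if (listen(srv, 5) < 0) { perror("listen"); return 1; }

    for (;;) {                                    /* daemon loop: one request per connection */
        int cli = accept(srv, NULL, NULL);
        if (cli < 0) continue;

        char buf[256];
        ssize_t n = read(cli, buf, sizeof buf - 1);
        if (n > 0) {
            buf[n] = '\0';
            /* dispatch on the command here: LIST, CREATE, WRITE, DELETE, ... */
            write(cli, "OK\n", 3);
        }
        close(cli);
    }
}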

Check this out; it may help:
http://www.geocities.ws/ravikiran_uvs/articles/rkfs.html
If you want to create a file system in user space, FUSE can help you:
http://fuse.sourceforge.net/

Here is an example of a very, very basic FUSE implementation that is backed by a glorified shared memory segment (xenstore). It's a fork of the original xenstore FUSE file system that I maintain.
You will also find some code to show you how to make Valgrind more helpful when debugging FUSE implementations.
You write functions for open / create / read / write / truncate / getattr / etc. and pass them to FUSE (line numbers are from the linked example):
343 static struct fuse_operations const xsfs_ops = {
344 .getattr = xsfs_getattr,
345 .mknod = xsfs_mknod,
346 .mkdir = xsfs_mkdir,
347 .unlink = xsfs_rm,
348 .rmdir = xsfs_rmdir,
349 .truncate = xsfs_truncate,
350 .open = xsfs_open,
351 .read = xsfs_read,
352 .write = xsfs_write,
353 .readdir = xsfs_readdir,
354 .create = xsfs_create,
355 .destroy = xsfs_destroy,
356 .utime = xsfs_utime,
357 .symlink = xsfs_symlink,
358 .init = (void *)xsfs_init
359 };
As you can see, it's extremely self-explanatory. A little searching will also turn up many basic file-backed FUSE implementations.
I highly recommend doing it entirely in user space unless you have the time to become properly familiar with the kernel.
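For orientation, here is a stripped-down stub (a sketch of the general FUSE 2.x pattern, not the linked project's actual code; the names are mine) showing how an operations table like the one above is handed to fuse_main(); a real implementation would fill in the remaining callbacks:
#define FUSE_USE_VERSION 26
#include <fuse.h>
#include <errno.h>
#include <string.h>
#include <sys/stat.h>

/* Minimal getattr: only the root directory exists in this stub. */
static int stub_getattr(const char *path, struct stat *st)
{
    memset(st, 0, sizeof *st);
    if (strcmp(path, "/") == 0) {
        st->st_mode  = S_IFDIR | 0755;
        st->st_nlink = 2;
        return 0;
    }
    return -ENOENT;           /* a real fs would look the path up in its store */
}

static const struct fuse_operations stub_ops = {
    .getattr = stub_getattr,
    /* .read, .write, .readdir, .create, ... go here */
};

int main(int argc, char *argv[])
{
    /* fuse_main() parses the mount options, mounts the file system
       and runs the request loop until it is unmounted. */
    return fuse_main(argc, argv, &stub_ops, NULL);
}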

A file system is essentially a database for files. The main thing you'll need is a lookup table that stores byte offsets and file lengths. The file names can also be stored in the table, or they can be stored in the first few bytes at each offset. It will be far easier on you if you make your file system a fixed size.
This would be similar to how the FAT file system works.
You can also have a look at http://en.wikipedia.org/wiki/Database_storage_structures, since at the lowest levels file systems and databases are very similar.
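A minimal sketch of that idea in C (the layout, field names and limits below are arbitrary choices for illustration, not a standard format): a fixed-size table at the front of the backing file records, for each file, a name, a byte offset into the data area and a length.
#include <stdint.h>
#include <stdio.h>

#define MAX_FILES 128
#define NAME_LEN   32

/* One fixed-size directory entry; an empty name marks a free slot. */
struct fs_entry {
    char     name[NAME_LEN];
    uint64_t offset;   /* where the file's bytes start in the backing file */
    uint64_t length;   /* current size of the file in bytes                */
};

/* The table sits at offset 0 of the backing file; the data area follows it. */
struct fs_table {
    uint32_t        magic;        /* identifies the on-disk format */
    uint32_t        entry_count;  /* always MAX_FILES here         */
    struct fs_entry entries[MAX_FILES];
};

/* "List files along with their sizes" is then just a scan of the table;
   creating a file claims a free slot, deleting one clears the slot. */
void list_files(const struct fs_table *t)
{
    for (uint32_t i = 0; i < t->entry_count; i++)
        if (t->entries[i].name[0] != '\0')
            printf("%-32s %llu bytes\n", t->entries[i].name,
                   (unsigned long long)t->entries[i].length);
}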

The simplest way to do this would be to build a template for storing the data and parse the files into RAM; of course, this isn't the most efficient approach.
Something like...
SOME/LOCATION/Filename >>>
contents of the file here, blah blah blah
<<<
SOME/OTHER/LOCATION/File2Name >>>
contents of another file here
<<<
Then, to list a directory, use a regex to find all lines ending in >>>, parse up to the Xth slash (where X is the number of slashes in the folder being searched), and do a case-sensitive or case-insensitive match depending on whether you want the file system to be case sensitive. Of course, since the data is loaded into memory as mentioned, you could instead search a key->value hashmap, which would probably be a lot simpler.
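For concreteness, here is a rough C sketch of the listing step (the store path and helper name are made up): scan the store and print every line that ends with the >>> marker; a fuller version would also collect the contents up to the matching <<< into a map.
#include <stdio.h>
#include <string.h>

/* Print every stored path, i.e. every line that ends with ">>>". */
int list_entries(const char *store_path)
{
    FILE *f = fopen(store_path, "r");
    if (!f)
        return -1;

    char line[4096];
    while (fgets(line, sizeof line, f)) {
        size_t len = strcspn(line, "\r\n");
        line[len] = '\0';
        if (len >= 3 && strcmp(line + len - 3, ">>>") == 0) {
            line[len - 3] = '\0';                     /* drop the marker       */
            size_t p = strlen(line);
            while (p > 0 && line[p - 1] == ' ')       /* trim trailing spaces  */
                line[--p] = '\0';
            printf("%s\n", line);                     /* e.g. SOME/LOCATION/Filename */
        }
    }
    fclose(f);
    return 0;
}

int main(void)
{
    return list_entries("store.txt") == 0 ? 0 : 1;    /* "store.txt" is a placeholder */
}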

Related

Read nth line in Node.js without reading entire file

I'm trying to use Node.js to get a specific line for a binary search in a 48-million-line file, but I don't want to read the entire file into memory. Is there some function that will let me read, say, line 30 million? I'm looking for something like Python's linecache module.
Update on how this is different: I would like not to read the entire file into memory. The question this was identified as a duplicate of reads the entire file into memory.
You should use the readline module from Node's standard library. I deal with files of 30-40 million rows in my project, and this works great.
If you want to do that in a less verbose manner and don't mind using a third-party dependency, use the nthline package:
const nthline = require('nthline')
, filePath = '/path/to/100-million-rows-file'
, rowNumber = 42
nthline(rowNumber, filePath)
.then(line => console.log(line))
According to the documentation, you can use fs.createReadStream(path[, options]), where:
options can include start and end values to read a range of bytes from the file instead of the entire file.
Unfortunately, you have to approximate the desired position/line, as there seems to be no seek-like function in Node.js.
EDIT
The above solution works well with lines that have a fixed length.
A newline character is just a character like any other, so looking for newlines is no different from looking for lines that start with, say, the character a.
Because of that, if your lines have variable length, the only viable approach is to load them into memory one at a time and discard the ones you are not interested in.
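To make the fixed-length case concrete (shown here in C only because the arithmetic is language-independent; in Node the computed offset would go into the start option of fs.createReadStream mentioned above): with records of a known length, the byte offset of line n is simply n * record_length, so nothing before that line has to be read.
#include <stdio.h>

/* Read 0-based line `n` from a file whose lines all have the same length
   `reclen`, counting the trailing '\n'.  Only that one record is read,
   because its offset can be computed directly.  (For offsets beyond 2 GB
   on 32-bit systems, fseeko/off_t would be needed instead of fseek.) */
int read_fixed_line(const char *path, long n, size_t reclen,
                    char *buf, size_t buflen)
{
    if (buflen <= reclen)
        return -1;

    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    if (fseek(f, n * (long)reclen, SEEK_SET) != 0) {   /* offset = n * reclen */
        fclose(f);
        return -1;
    }

    size_t got = fread(buf, 1, reclen, f);
    fclose(f);
    buf[got] = '\0';
    return got == reclen ? 0 : -1;
}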

How to check the available free space in Haxe?

I can't find a way to check the free space available on a device using Haxe, OpenFL, Lime, or another library.
I would like to avoid downloading data that would exceed the size recommended for an app on each device.
What do you do to check that?
Try creating a file of that size! Then either delete it or reopen and write (not append) over its contents.
I don't know whether all platforms Haxe supports will work fine with this trick, but this algorithm is reported to work in many places and languages (I personally tested it in Ruby and saw the same suggestion for C++/.NET). To check whether X bytes of disk space are available:
open a new file for writing
seek X-1 bytes from the beginning
write a byte of data (whatever you want, 0, 42...)
close the file (probably unrelated to the task at hand, but don't forget to do that anyway)
If there's insufficient disk space, you'll likely get an exception at some point in this algorithm. You'll have to find out what errors to expect and process them properly.
Using ihx, I've found that this works and requires nothing but the Haxe standard library:
haxe interactive shell v0.3.4
type "help" for help
>> import sys.io.*;
>> var f = File.write('loca', true)
sys.io.FileOutput : { __f => #abstract }
>> f.seek(39999, FileSeek.SeekBegin)
Void : null
>> f.writeByte(0)
Void : null
>> f.close()
Void : null
After these manipulations, I had a file named loca of exactly 40000 bytes in my working directory.
By the way, be careful when doing things like this in ihx, since it re-runs the entire session with the last entered line appended each time.
Ongoing experimentation:
However, when there's insufficient disk space, it may not fail with errors. In this case you'll have to check the real size with sys.FileSystem.stat(path).size. And don't forget to delete the file if there's not enough space.

Output other than .txt

I'm looking to build a simple program that modifies existing output files from another program, so I don't have to open that program and enter a bunch of data the long way. The program is very specific to my domain, and its output files use the .wcc extension. However, when I change the extension of one of these output files to .txt, I get half gibberish:
ÿÿ WPointÿÿ WPolygonÿÿ  WQuadrilateralÿÿ  WMemberDataÿÿ
WLoadÿÿ WLStandardMembersÿÿ WLSavedDesignSettingsÿÿ WLSavedFormatSettingsÿÿ  WLSavedViewSettingsÿÿ WLSavedProjectSettingsÿÿ  WLSavedSettingsÿÿ  WLSavedLoadSettingsÿÿ WLSavedDefaultSettingsÿÿ WLineÿÿ WProductÿÿ WBeamDataÿÿ  WColumnDataÿÿ
WJoistDataÿÿ
WWallStudDataÿÿ WSupportingMemberDataÿÿ WSavedAnalysisSettingsÿÿ WSavedGravityDesignSettingsÿÿ WSavedPreferencesSettingsÿÿ WNotchÿÿ WIJoistÿÿ WFloorCWC37 ÀAE LumberS-P-F No.1/No.2 # À# lumwall.cww ÿÿÿÿ1.2.3.1.Mur_1_EX-D ÿÿÿÿÿÿ B Cÿÿ B C €? 4C 4C   Neige #F #F ÈC ÿÿÿ
WLStandardMembersÿÿ "
There are also musical notes and perpendicular signs that I can't copy-paste here. I can sort of read the text, but still not enough to make modifications via a text file. What type of file could this be? Is it even possible to do what I'm trying to do? Thanks!
I am surprised that you are trying to open a .wcc file as a text file (as you have seen, its contents don't lend themselves to being converted to such a file type); however, the attempt to open the file as a .txt file seems to be specific to your domain.
I noticed part of your question is as follows: "What type of file could this be?"
You are right in thinking that the .wcc file is a rather obscure file type; it is not one we think about a lot (or are even conscious of existing). A .wcc file is a WinCam 2000 Cache file that allows WinCam 2000 movies to be previewed in the slide browser; these were often generated by older WinCam 2000 screen recording and editing programs.
Again, the file extension is very rare these days (a Google search only returns ~700 results). But it appears you have a program that is producing the file, which, as you say, "is quite specific to your domain". You may be out of luck with regard to opening these files for modification purposes.
Supposedly, you can convert .wac files to .wav files, which are much more relevant to today's technology (and definitely alterable from code); however, without knowing the purpose of the file, e.g. what you are trying to do with it domain-side, I can't say that this will suit your needs.
Also, the above comments are "correct": changing a file extension will not convert the file to that file type. Typically, converters, i.e. dedicated software, are needed to convert files.

How to use GNU sort with numeric data in binary format?

Is there any way to use GNU Coreutils sort with 64-bit numbers stored in a binary file?
If the file weren't binary, sort -n would be the solution, but I haven't found any option for using it with binary data.
The file is quite large (~100 GB), and if possible I don't want to make a text (non-binary) copy of it.
Sample of data:
$ xxd file
00292e0: 4036 1eb7 6888 d319 de6b 7402 9ca9 f116 #6..h....kt.....
00292f0: db68 7f05 199f 9d36 cf01 cb28 e49f 1116 .h.....6...(....
0029300: 0c7c 8b55 2963 ef0c 277a f2b0 38d7 2b19 .|.U)c..'z..8.+.
0029310: c83b 2614 4327 d838 820c 1bb8 444f 1731 .;&.C'.8....DO.1
0029320: 1695 cab3 cd12 092a 0691 d7e4 5fcc b01d .......*...._...
0029330: b12b 7c1b a209 7c1c 568a 125c 541c d334 .+|...|.V..\T..4
0029340: 09a3 ecbc 8370 e205 9265 7759 a378 4e2f .....p...ewY.xN/
The bsort utility does this.
It is a lightning-fast, in-place radix sort written in C. One of the test cases for its development was a 100 GB file on a machine with 16 GB of RAM; it took about 22 seconds or so to sort.
sort(1) will not help you here. For a small file it might be possible to split your data into lines and feed it to sort(1), but not for a 100 GB file, of course.
The answer to this question on Serverfault has a link to a tool written to solve exactly this task. You can check the GitHub project there (it seems to be written in Go, so you will need to install a compiler if you decide to use it).
Quick googling does not find any other popular tool for this task written in a more mainstream language (which surprises me a bit, since the task itself is just a merge sort that thousands of students implement each year in their CS courses, but that's off-topic).
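For what it's worth, the in-memory core of the task is small; a 100 GB file would need this applied chunk by chunk, followed by a k-way merge of the sorted chunks (a classic external merge sort). A rough C sketch, assuming native-endian unsigned 64-bit keys (swap in a signed comparator if the data is signed):
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Compare two native-endian unsigned 64-bit keys. */
static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s FILE\n", argv[0]); return 1; }

    FILE *f = fopen(argv[1], "rb+");
    if (!f) { perror("fopen"); return 1; }

    fseek(f, 0, SEEK_END);
    long bytes = ftell(f);
    size_t n = (size_t)bytes / sizeof(uint64_t);

    uint64_t *keys = malloc(n * sizeof(uint64_t));
    if (!keys) { perror("malloc"); return 1; }

    rewind(f);
    if (fread(keys, sizeof(uint64_t), n, f) != n) { perror("fread"); return 1; }

    qsort(keys, n, sizeof(uint64_t), cmp_u64);   /* sort the keys in place */

    rewind(f);
    fwrite(keys, sizeof(uint64_t), n, f);        /* overwrite the file, now sorted */
    fclose(f);
    free(keys);
    return 0;
}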

How to get the name of a file acting as stdin/stdout?

I'm having the following problem: I want to write a Fortran 90 program that I can call like this:
./program.x < main.in > main.out
In addition to "main.out" (whose name I can set when calling the program), secondary outputs have to be written, and I want them to have names similar to either "main.in" or "main.out" (they are not actually called "main"). However, when I use:
INQUIRE(UNIT=5,NAME=sInputName)
The content of sInputName becomes "Stdin" instead of the name of the file. Is there some way to obtain the names of the files that are linked to stdin/stdout when the program is called?
Unfortunately, the point of I/O redirection is that your program doesn't have to know what the input/output files are. On Unix-based systems you cannot look at the command-line arguments, as the < main.in > main.out parts are actually processed by the shell, which uses these files to set up standard input and output before your program is invoked.
You have to remember that sometimes standard input and output will not even be files; they could be a terminal or a pipe, e.g.
./generate_input | ./program.x | less
So one solution is to redesign your program so that the output file is an explicit argument.
./program.x --out=main.out
That way your program knows the filename. The cost is that your program is now responsible for opening (and maybe creating) the file.
That said, on Linux systems you can actually find out where your standard file handles are pointing via the special /proc filesystem. There will be symbolic links in place for each file descriptor:
/proc/<process_id>/fd/0 -> standard_input
/proc/<process_id>/fd/1 -> standard_output
/proc/<process_id>/fd/2 -> standard_error
Sorry, I don't know Fortran, but a pseudo-code way of checking the output file could be:
out_name = realLink( "/proc/"+getpid()+"/fd/1" )
if( isNormalFile( out_name ) )
...
Keep in mind what I said earlier: there is no guarantee this will actually be a normal file. It could be a terminal device, a pipe, a network socket, whatever... Also, I do not know which operating systems other than Red Hat/CentOS Linux this works on, so it may not be that portable; it's more of a diagnostic tool.
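For reference, the same check written as a small C helper (Linux-specific; a Fortran program would have to call something equivalent, e.g. via ISO_C_BINDING):
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Resolve where file descriptor 1 (standard output) points and report
   whether that is a regular file.  Returns 1 for a regular file, 0 for
   anything else (terminal, pipe, socket, ...), -1 on error. */
static int stdout_target(char *buf, size_t len)
{
    ssize_t n = readlink("/proc/self/fd/1", buf, len - 1);
    if (n < 0)
        return -1;
    buf[n] = '\0';                    /* readlink does not NUL-terminate */

    struct stat st;
    if (stat(buf, &st) == 0 && S_ISREG(st.st_mode))
        return 1;
    return 0;
}

int main(void)
{
    char name[4096];
    /* report on stderr so the redirected stdout is not disturbed */
    if (stdout_target(name, sizeof name) == 1)
        fprintf(stderr, "stdout is the file: %s\n", name);
    else
        fprintf(stderr, "stdout is not a regular file\n");
    return 0;
}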
Maybe the intrinsic subroutines get_command and/or get_command_argument can be of help. They were introduced in Fortran 2003, and they return either the full command line that was used to invoke the program or the specified argument.

Resources