Read a file into memory and re-use in script - linux

I have a large (1 GB) file that I want to process with a script. I'm still experimenting with how I want to process this file. So my script keeps changing as I try different things. The problem is, it takes a long time to read the file into memory. Is there a way to read the file into memory once and just keep accessing that memory each time I run my script? That would make my script a lot faster. I switched to using a REPL for now, but curious if this can be done with a script.

You can do this on Linux using ramfs or tmpfs.
In Windows, you can use a tool like Imdisk.
The idea is that you create a disk backed by your RAM. After creating the disk, copy the file to it - you are essentially writing the file to RAM. Your script can then read the file from the ramfs/tmpfs/ramdisk.
This should be faster since no physical disk access is involved, although it will require at least 1 GB of your RAM.
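For example, a minimal sketch on Linux (the mount point, tmpfs size, and file names are placeholders; size the tmpfs at least as large as the file):

sudo mkdir -p /mnt/ramcache
sudo mount -t tmpfs -o size=2G tmpfs /mnt/ramcache
cp /path/to/bigfile /mnt/ramcache/
./myscript.sh /mnt/ramcache/bigfile   # point the script at the copy held in RAM

The copy disappears when the tmpfs is unmounted or the machine reboots, so treat it as a cache, not as storage.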

Simple answer: No, there is no such way to do this. :/

Related

redirect output to other partition linux

So, I have a scientific server with an HDD and an SSD.
For computations involving lots of data reading/writing, users can use the SSD, but all the home directories are on the HDD.
Is there an automatic way to redirect the output of any program writing on the SSD to the home directory of the user running the program if the SSD is full?
If the best solution is to write my own script, then what is the best way to determine if the SSD runs out of space?
My OS is Ubuntu 18.04 LTS
In short, I do not think there is such a thing, and I believe you should implement a bash script that checks (my tool of choice would simply be df) that there is enough space before actually starting the next computation run; a minimal sketch of such a check follows below. Maybe you should pre-allocate the space you intend to use, if possible, to prevent concurrent runs from crashing or running out of space? Maybe you should have an automated procedure to clean up some space?
Obviously, you could have the SSD available on some mountpoint in /home/, and then periodically check with a cron job whether it is full, and then maybe unmount it and send a warning mail. This will sort of do what you want. Sort of. But what happens when the HDD also gets full? Watch out - these kinds of problems can easily cause a server to crash or otherwise experience issues.
This looks like a problem you might partially solve/mitigate by, e.g., using a quota scheme (that is, limiting the amount of space each user can allocate) or, better yet, by using a dedicated system for queueing jobs and allocating resources.
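A rough sketch of the df-based check mentioned above (the mount point /mnt/ssd, the fallback directory, and the 50 GB threshold are assumptions - adjust them to your setup):

REQUIRED_KB=$((50 * 1024 * 1024))                 # space needed for the next run, here 50 GB
AVAIL_KB=$(df -Pk /mnt/ssd | awk 'NR==2 {print $4}')
if [ "$AVAIL_KB" -lt "$REQUIRED_KB" ]; then
    echo "SSD nearly full, writing results to the home directory instead" >&2
    OUTPUT_DIR="$HOME/results"
else
    OUTPUT_DIR="/mnt/ssd/results"
fi
mkdir -p "$OUTPUT_DIR"
# ... launch the computation with its output directed to "$OUTPUT_DIR"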

disk usage increasing indefinitely with php script

I am using the following code to create backups of the PHP variables.
if(file_exists(old_backup.txt))
unlink('old_backup.txt');
copy('new_backup.txt', 'old_backup.txt');
$content = serialize($some_ar);
file_put_contents('new_backup.txt', $content);
new_backup.txt will have the current variable dump and old_backup.txt will have the dump from some time back.
The dump size is constant, around 300 MB. But every time the above code is run, disk usage increases indefinitely. When the PHP script is killed, disk usage returns to normal.
Not sure whether a file handle is still open for the deleted files.
How do I make the above code work without this increase in disk usage?
Not sure about what exactly is causing the disk usage increase, because you posted only a snippet and not the full script. However, there are a few things that are not correct for sure:
if(file_exists(old_backup.txt))
should be
if(file_exists('old_backup.txt'))
Then the mere existence of the file does not mean you can unlink it; you should check permissions too.
That being said, those issues are not good reasons to fill the disk, but we need to see where you get the $some_ar variable from to give better advice.
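As a side note on the question's guess about file handles: if the growing usage really comes from deleted files that some process still holds open, that is easy to confirm from a shell (assuming lsof is installed):

lsof +L1                      # list open files whose link count is 0, i.e. deleted but still held open
# or, equivalently:
lsof | grep '(deleted)'

Space used by such files is only released when the owning process closes them or exits, which would match the observation that disk usage returns to normal once the PHP script is killed.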

How can we create 'special' files, like /dev/random, in linux?

In Linux file system, there are files such as /dev/zero and /dev/random which are not real files on hard disk.
Is there any way that we can create a similar file and tell it to get its output from executing a program?
For example, can I create a file, say /tmp/tarfile, such that any program reading it actually gets the output from the execution of a different program (/usr/bin/tar ...)?
It is possible to create such a file/program, but it would require creating a special filesystem in order to insert hooks into the VFS so that accesses can be detected and handled properly (FUSE is the usual way to do this from user space).
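A much more limited, but far simpler, alternative is a named pipe (FIFO): a single reader gets the output of whatever process writes into it, once. A sketch, where the pipe path and the directory being archived are placeholders:

mkfifo /tmp/tarfile
tar -cf /tmp/tarfile -C /path/to/dir . &   # tar blocks until something opens the pipe for reading
cat /tmp/tarfile > snapshot.tar            # this single reader receives the tar stream

Unlike a /dev/random-style device, the pipe is consumed by one reader and has to be refilled for the next one, so it only covers the simplest use cases.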

How to wait for the dd command to finish copying before continuing my script?

Title pretty much nailed it. I'm copying files to a flash drive and then doing some things to those files. Well, I have noticed that after running the dd command the flash drive is still flashing and not all the files are on the device.
Does anyone know how to maybe run a simple loop (in a script) to wait on the dd process to finish? I have been googling for about 2-3 hours now trying to figure it out and it's beyond me if it's even possible.
Thanks in advance!
Try the sync command:
sync writes any data buffered in memory out to disk. This can
include (but is not limited to) modified superblocks, modified inodes,
and delayed reads and writes. This must be implemented by the kernel;
The sync program does nothing but exercise the sync system call.
The kernel keeps data in memory to avoid doing (relatively slow)
disk reads and writes. This improves performance, but if the computer
crashes, data may be lost or the file system corrupted as a result.
The sync command ensures everything in memory is written to disk.
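In script form that could look like this (the image and device names are placeholders; conv=fsync is a GNU dd option that flushes the output before dd exits):

dd if=backup.img of=/dev/sdX bs=4M conv=fsync status=progress
sync        # flush anything still sitting in the page cache to the drive
# ... it is now safe to work with the copied files (or to unmount the drive)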
Most likely you are seeing the operating system caching the writes. If you really want to make sure that everything is written to the flash drive so that it is safe to remove, it needs to be unmounted.

Does the Linux filesystem cache files efficiently?

I'm creating a web application running on a Linux server. The application is constantly accessing a 250K file - it loads it in memory, reads it and sends back some info to the user. Since this file is read all the time, my client is suggesting to use something like memcache to cache it to memory, presumably because it will make read operations faster.
However, I'm thinking that the Linux filesystem is probably already caching the file in memory since it's accessed frequently. Is that right? In your opinion, would memcache provide a real improvement? Or is it going to do the same thing that Linux is already doing?
I'm not really familiar with either Linux or memcache, so I would really appreciate it if someone could clarify this.
Yes, if you do not modify the file each time you open it.
Linux will hold the file's information in copy-on-write pages in memory, and "loading" the file into memory should be very fast (page table swap at worst).
Edit: Though, as cdhowie points out, there is no 'linux filesystem'. However, I believe the relevant code is in linux's memory management, and is therefore independent of the filesystem in question. If you're curious, you can read in the linux source about handling vm_area_struct objects in linux/mm/mmap.c, mainly.
As people have mentioned, mmap is a good solution here.
But, one 250K file is very small. You might want to read it in at startup and put it into some sort of in-memory structure that matches what you want to send back to the user. E.g., if it is a text file, an array of lines might be a good choice, etc.
The file should be cached, but make sure the noatime option is set on the mount; otherwise every read will update the file's access time, causing an extra metadata write on each access.
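If you want to try that, remounting the filesystem with the option is enough (the mount point is a placeholder):

sudo mount -o remount,noatime /srv
# make it permanent by adding noatime to the relevant line in /etc/fstab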
Yes, definitely. It will keep accessed files in memory indefinitely, unless something else needs the memory.
You can control this behaviour (to some extent) with the fadvise system call. See its "man" page for more details.
A read/write system call will still normally need to copy the data, so if you see a real bottleneck doing this, consider using mmap(), which can avoid the copy by mapping the cache pages directly into the process.
I guess putting that file into a ramdisk (tmpfs) may give you enough of an advantage without big modifications, unless you are really serious about response times on the order of microseconds.

Resources