I am using the following code to create backups of PHP variables.
if(file_exists(old_backup.txt))
unlink('old_backup.txt');
copy('new_backup.txt', 'old_backup.txt');
$content = serialize($some_ar);
file_put_contents('new_backup.txt', $content);
new_backup.txt will hold the current variable dump and old_backup.txt will hold the dump from some time back.
The dump size is roughly constant, around 300 MB, but every time the above code runs, disk usage keeps increasing. When the PHP script is killed, disk usage returns to normal.
I suspect a file handle is still being held open for the deleted files, but I can't see where.
How do I make the above code work without the disk usage growing?
I'm not sure what exactly is causing the disk usage to increase, because you posted only a snippet and not the full script. However, there are a few things that are definitely not correct:
if(file_exists(old_backup.txt))
should be
if(file_exists('old_backup.txt'))
Also, the mere existence of the file does not mean you can unlink it; you should check permissions too.
That being said, those issues alone wouldn't fill the disk, so we need to see where the $some_ar variable comes from to give better advice.
So I have a Python 3 job that is being run by Slurm. The Python job uses lots of multiprocessing, spawning about 20 or so processes. The code is far from perfect, uses lots of memory, and occasionally hits some unexpected data and throws an error. That in itself is not a problem; I don't need every one of the 20 processes to complete.
The issue is that sometimes something causes the program to create files named like core.356729 (the number after the dot changes), and these files are massive - gigabytes each. Eventually I end up with so many that I have no disk space left and all my jobs stop. I can't tell what they are; their contents are not human-readable. Google searches for "core files slurm" or "core.number files" are not turning up anything relevant.
The quick and dirty solution would be just to add a process that deletes these files as soon as they appear. But I'd rather understand why they are being created first.
Does anyone know what would create a file of the format "core.######"? Is there a name for this type of file? Is there any way to identify which slurm job created the file?
Those are core dump files used for debugging. They're essentially the contents of memory for the process that crashed. You can disable their creation with ulimit -c 0.
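If you would rather turn the dumps off from inside the Python job itself, the resource module exposes the same limit; a minimal sketch, not Slurm-specific, assuming a Unix-like compute node:

import resource

# Equivalent of `ulimit -c 0`: ask the kernel not to write core dump files
# for this process and for children that inherit the limit (e.g. the
# multiprocessing workers).
resource.setrlimit(resource.RLIMIT_CORE, (0, 0))

Call this early, before the worker processes are spawned, so they inherit the limit.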
I have a large (1 GB) file that I want to process with a script. I'm still experimenting with how I want to process this file, so my script keeps changing as I try different things. The problem is that it takes a long time to read the file into memory. Is there a way to read the file into memory once and just keep accessing that memory each time I run my script? That would make my script a lot faster. I switched to using a REPL for now, but I'm curious whether this can be done with a script.
You can do this on Linux using ramfs or tmpfs.
On Windows, you can use a tool like ImDisk.
The idea is that you create a disk backed by your RAM.
After creating the disk, copy the file to it - you are essentially writing the file to RAM.
Your script can then read the file from the ramfs/tmpfs/ramdisk.
This should be faster since no physical disk access is involved, although it will require at least 1 GB of your RAM.
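A minimal sketch of the same idea, assuming a Python script on a Linux box where /dev/shm is already mounted as tmpfs (true on most distributions); the file names below are just placeholders:

import os
import shutil

SOURCE = "/data/big_input.dat"        # hypothetical 1 GB file on the physical disk
RAM_COPY = "/dev/shm/big_input.dat"   # /dev/shm is a RAM-backed tmpfs mount

# Copy the file into RAM once; later runs of the script skip the slow disk read.
if not os.path.exists(RAM_COPY):
    shutil.copy(SOURCE, RAM_COPY)

with open(RAM_COPY, "rb") as f:
    data = f.read()                   # served from RAM, not from the physical disk

The copy persists between runs of the script (until a reboot or until the file is removed), which is the "read it once, keep it in memory" behaviour asked for.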
Simple answer: no, there is no way to do this. :/
The title pretty much nails it: I'm copying files to a flash drive and then doing some things to those files. I have noticed that after running the dd command the flash drive is still flashing, and not all the files are on the device yet.
Does anyone know how to run a simple loop (in a script) to wait for the dd process to finish? I have been googling for about 2-3 hours now trying to figure this out, and I can't even tell whether it's possible.
Thanks in advance!
Try the sync command:
sync writes any data buffered in memory out to disk. This can
include (but is not limited to) modified superblocks, modified inodes,
and delayed reads and writes. This must be implemented by the kernel;
The sync program does nothing but exercise the sync system call.
The kernel keeps data in memory to avoid doing (relatively slow)
disk reads and writes. This improves performance, but if the computer
crashes, data may be lost or the file system corrupted as a result.
The sync command ensures everything in memory is written to disk.
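The same flush can be triggered from inside a script as well; a small Python 3 sketch under the assumption that the copying is done in Python rather than with dd, and with a made-up mount point for the flash drive:

import os
import shutil

DEST = "/media/usb/image.bin"    # hypothetical mount point of the flash drive

shutil.copy("image.bin", DEST)   # at this point the data may still sit in the kernel's write cache
os.sync()                        # same effect as the sync command: flush buffered writes to the device

After os.sync() returns, it is reasonable to start working on the copied files (or to unmount the drive before removing it).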
Most likely you are seeing the operating system caching the writes. If you really want to make sure that everything is written to the flash drive so that it is safe to remove, it needs to be unmounted.
My problem is that my application takes too long to load thousands of files. Yes, I know it's going to take a long time, but I would like to make it faster by any amount of time. What I mean by "load" is open the file to get its descriptor and then read the first 100 bytes or so of it.
So my main strategy has been to create a second thread that opens and closes (without reading any contents) all the files. This seems to help because the thread runs ahead of the main thread, and I'm guessing the OS caches the file information ahead of time so that when my main thread opens a file it's a quick open. It has actually helped, because the lookahead thread can keep warming the cache while my main thread is parsing the data read in from the files it has already opened.
So my real question is...what else can I do to make this faster? What approaches are there? Has anyone had success doing this?
I've heard of OS prefetching calls, but that was for virtual memory pages. Is there a way to tell the OS, "hey, I'm going to be needing all these files pretty soon - I suggest you start gathering them for me ahead of time"? My lookahead thread is pretty crude.
Are there low-level disk techniques I could use? Is there possibly a pattern of file access that would help? Right now, the files that are loaded all come from the same folder. I suppose there is no way to determine where exactly on disk they lie and which ordering of file opens would be fastest for the disk. I'm also guessing that the disk has some hardware to make this as efficient as possible too.
My application is mainly for Windows, but Unix suggestions would help as well.
I am programming in C++ if that makes a difference.
Thanks,
-julian
My first thought is that this is going to be hard to work around at a programmatic level.
You'll find Linux and OS X can access thousands of files like this in a fraction of the time it takes Windows. I don't know how much control you have over the machine. If you can keep the thousands of files on a FAT partition, you should see better results than with NTFS.
How often are you scanning these files, and how often are they changing? If the ratio is heavily on the reading side, it would make sense to copy the start of each file into a cache. The cache could store the filename, modification time, and the first 100 bytes of each of the thousands of files.
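A rough sketch of that cache, shown in Python for brevity even though the asker is working in C++; the folder name, cache file, and pickle-based storage are all assumptions, not part of the original question:

import os
import pickle

DATA_DIR = "data"                 # hypothetical folder holding the thousands of files
CACHE_FILE = "header_cache.pkl"   # hypothetical on-disk cache

def load_headers(directory):
    try:
        with open(CACHE_FILE, "rb") as f:
            cache = pickle.load(f)
    except (FileNotFoundError, pickle.PickleError):
        cache = {}
    headers = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        mtime = os.path.getmtime(path)
        entry = cache.get(name)
        if entry is None or entry["mtime"] != mtime:
            # Only re-read files that changed since the cache was written.
            with open(path, "rb") as f:
                entry = {"mtime": mtime, "head": f.read(100)}
            cache[name] = entry
        headers[name] = entry["head"]
    with open(CACHE_FILE, "wb") as f:
        pickle.dump(cache, f)
    return headers

headers = load_headers(DATA_DIR)   # maps filename -> first 100 bytes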
I'm creating a web application running on a Linux server. The application is constantly accessing a 250K file - it loads it in memory, reads it and sends back some info to the user. Since this file is read all the time, my client is suggesting to use something like memcache to cache it to memory, presumably because it will make read operations faster.
However, I'm thinking that the Linux filesystem is probably already caching the file in memory since it's accessed frequently. Is that right? In your opinion, would memcache provide a real improvement? Or is it going to do the same thing that Linux is already doing?
I'm not really familiar with either Linux or memcache, so I would really appreciate it if someone could clarify this.
Yes, if you do not modify the file each time you open it.
Linux will hold the file's information in copy-on-write pages in memory, and "loading" the file into memory should be very fast (page table swap at worst).
Edit: though, as cdhowie points out, there is no 'Linux filesystem'. However, I believe the relevant code is in Linux's memory management, and is therefore independent of the filesystem in question. If you're curious, you can read up in the Linux source on the handling of vm_area_struct objects, mainly in linux/mm/mmap.c.
As people have mentioned, mmap is a good solution here.
But one 250K file is very small. You might want to read it in at startup and put it into some sort of in-memory structure that matches what you want to send back to the user. I.e., if it is a text file, an array of lines might be a good choice, etc.
The file should be cached, but make sure the noatime option is set on the mount; otherwise each access will try to write the access time back to the file, invalidating the cache.
Yes, definitely. It will keep accessed files in memory indefinitely, unless something else needs the memory.
You can control this behaviour (to some extent) with the fadvise system call. See its "man" page for more details.
A read/write system call will still normally need to copy the data, so if you see a real bottleneck doing this, consider using mmap() which can avoid the copy, by mapping the cache pages directly into the process.
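As an illustration of those two calls, here is a minimal Python sketch, assuming Python 3.3+ on Linux and a made-up path for the 250K file; os.posix_fadvise and the mmap module wrap the same kernel interfaces mentioned above:

import mmap
import os

PATH = "/var/data/info.dat"   # hypothetical 250K file

fd = os.open(PATH, os.O_RDONLY)

# Hint that the whole file will be needed soon so the kernel can
# populate the page cache before the reads happen.
os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)

# Map the cached pages directly into this process; reads through the
# mapping hit the page cache without a read() system call per access.
m = mmap.mmap(fd, 0, prot=mmap.PROT_READ)
data = m[:]                   # the file contents, pulled from the mapped pages
m.close()
os.close(fd)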
I guess putting that file onto a ramdisk (tmpfs) may give enough of an advantage without big modifications, unless you are really serious about response times at the microsecond level.