Release disk space used by cgi.FieldStorage temp files - linux

I am writing a Pyramid application that accepts many large file uploads (as POSTs). Similar to How can I serve temporary files from Python Pyramid, I'm having a problem where the temp files created by cgi.FieldStorage are orphaned, consuming GBs of disk space. lsof indicates that my WSGI process has deleted files from /tmp but hasn't closed them. Restarting the application clears the orphans.
How can I cause these files to be closed so that the disk space is returned to the OS?

The problem I encountered was unrelated to cgi.FieldStorage; Pyramid actually uses WebOb for serializing data.
The cause of the high disk space usage was pyramid_debugtoolbar. Its documentation states that it retains the data from the previous 100 requests, which in my case consumed a great deal of memory and disk space. Removing the include for the debugger from __init__.py and restarting the server resolved the problem.
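For reference, the change amounted to something like the following (a hypothetical sketch of __init__.py; in a scaffolded project the toolbar may instead be enabled via pyramid.includes in development.ini):
from pyramid.config import Configurator
def main(global_config, **settings):
    config = Configurator(settings=settings)
    # config.include('pyramid_debugtoolbar')  # removed: the toolbar keeps data
    # from the last 100 requests alive, which pins their deleted temp files
    config.scan()
    return config.make_wsgi_app()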

Related

Python3 pathlib's Path.glob() generator keeps increasing memory usage when performed on large file structure

I used pathlib's Path(<path>).glob() function to walk through directories and grab each file's name and extension. My Python script is meant to run on a large file system, so I tested it on the root directory of my Linux machine. After leaving it running for a few hours, I noticed that my machine's memory usage had increased by over a GB.
After using memray and memory_profiler, I found that whenever I looped through directory items using the generator the memory usage kept climbing.
Here's the problematic code (path is the path to the root directory):
from pathlib import Path
dir_items = Path(path).glob("**/*")
for item in dir_items:
    pass
Since I was using a generator, my expectation was that my memory requirements would remain constant throughout. I think I might have some fundamental misunderstanding. Can anyone explain where I've gone wrong?
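For reference, the growth can be tracked with just the standard library as well (this sketch uses tracemalloc, which only sees Python-level allocations; memray and memory_profiler also catch native ones):
import tracemalloc
from pathlib import Path
path = "/"  # the directory tree to walk, as in the snippet above
tracemalloc.start()
for item in Path(path).glob("**/*"):
    pass
current, peak = tracemalloc.get_traced_memory()
print(f"current={current / 2**20:.1f} MiB, peak={peak / 2**20:.1f} MiB")
tracemalloc.stop()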

SYSTEM ERROR: I/O error 0 in writeto, ret 2048, file 56(/mfgtmp/tmp/srtE5yybD), addr 77010944. (290) - PROGRESS 4GL

I am suddenly getting the error below when my Progress program has been running for more than 80 minutes. I think this is an OS error, and error 0 indicates it is out of disk space. I checked the disk space and it shows 14 GB available, but I am not sure why I am getting this error.
Is it because a write ran out of disk space (exceeding the 14 GB) and stopped, so that the available 14 GB stayed the same?
SYSTEM ERROR: I/O error 0 in writeto, ret 2048, file 56(/mfgtmp/tmp/srtE5yybD), addr 77010944. (290)
By default temp files are created "unlinked". Because of this, the space they were using is automatically reclaimed by the OS if the session crashes, so you will often have a situation where your temp file ran out of space, the session crashed, and by the time you investigate there is plenty of free space.
You can change the default behavior by using the -t (lower case) startup parameter. This will result in the files not being removed if a session crashes - so the space will not be returned to the OS. You will have to manually delete "stale" files if you enable -t.
On UNIX -t will also make the files visible in the -T (upper case) directory so that you can see their growth in real time.
On Windows the files are always visible, but the current length is not consistently reported by system tools.
If your temp files are being written to a different filesystem than your working directory (the -T startup parameter controls where temp files go), then you should have a "protrace.pid" file corresponding to the crashed session's process id and the timestamp of the crash. This will then lead you to the 4GL code that was creating the very large srt file.
14GB is far beyond "reasonable" so you really should look at that code and see if there is a better way to do whatever it is doing.
There are a number of k-base articles on that issue, for instance: https://knowledgebase.progress.com/articles/Knowledge/000027351
When you check disk space, please make sure you're checking the correct file system (/mfgtmp in this case).
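For example, from Python (a quick sketch, not part of the Progress tooling), the filesystem actually holding the temp files can be checked directly:
import shutil
usage = shutil.disk_usage("/mfgtmp")  # check the temp-file filesystem, not /
print(f"total={usage.total // 2**30} GiB used={usage.used // 2**30} GiB free={usage.free // 2**30} GiB")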
The error message references an srt file, so you might want to try to make the srt file usage less heavy; see this article for some initial help: https://knowledgebase.progress.com/articles/Knowledge/P95930
Or: https://knowledgebase.progress.com/articles/Knowledge/P84475

How to flush SQLite3 database changes to disk?

My application is running on a portable Debian (5 and 8) computer. This computer may lose power at unpredictable times. The application frequently updates a specific SQLite3 database and flushes to disk immediately, using a sync() command. This is done to avoid corruption of the database, which would happen if the power disappears before the changes are fully written to disk.
This has been working nicely, but now the problem is that the sync() command flushes ALL buffered changes to disk, for all open files. This causes a slowdown in other parts of the system. One possible solution is to flush only the critical file changes, such as this specific database file. But the question is: how can I do that? I have no access to file descriptors, and I can't find any SQLite3 function that does this for me. Any ideas?
You can use file-specific syncing; fsync() will be useful for this.
See https://www.sqlite.org/c3ref/db_cacheflush.html (the sqlite3_db_cacheflush() interface).
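A minimal sketch of the fsync() approach from Python, assuming a hypothetical database path (note that SQLite's journal or WAL file sits alongside the database and may need the same treatment, and that PRAGMA synchronous already governs SQLite's own fsync() calls at commit time):
import os
import sqlite3
DB_PATH = "/var/lib/myapp/state.db"  # hypothetical path to the critical database
conn = sqlite3.connect(DB_PATH)
with conn:  # commits on success
    conn.execute("CREATE TABLE IF NOT EXISTS readings (ts REAL, value REAL)")
    conn.execute("INSERT INTO readings VALUES (?, ?)", (1700000000.0, 42.0))
# Flush only this file's dirty data to disk, instead of calling sync(),
# which forces out every dirty buffer on the system.
fd = os.open(DB_PATH, os.O_RDWR)
try:
    os.fsync(fd)
finally:
    os.close(fd)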

centos free space on disk not updating

I am new to Linux and working on a CentOS system.
Running df -H shows the disk is 82% full, with only 15 GB free.
I wanted some extra space, so using WinSCP I shift-deleted a 15 GB file
and ran df -H once again, but it still shows only 15 GB free. Where did the
space from the deleted file go?
Please help me find a solution to this.
In most Unix filesystems, if you delete a file that is still open, the OS removes the directory entry right away but does not release the space until the file is closed. Why? Because the file is still visible to the process that opened it.
Windows, on the other hand, used to complain that it could not delete a file because it was in use; it seems that in later incarnations Explorer will pretend to delete the file.
Some applications are famous for bad behavior related to this fact. For example, I have to deal with some versions of MySQL that will not properly close some files; over time I can find several GB of space wasted in /tmp.
You can use the lsof command to list open files (man lsof). If the problem is related to open files and you can afford a reboot, that is most likely the easiest way to fix it.
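The effect is easy to see from Python (a small demonstration, assuming /tmp has around 100 MB to spare):
import os
import shutil
import tempfile
f = tempfile.NamedTemporaryFile(dir="/tmp", delete=False)
f.write(b"\0" * (100 * 1024 * 1024))  # ~100 MB
f.flush()
free_before = shutil.disk_usage("/tmp").free
os.unlink(f.name)                                   # "delete" while still open
free_after_unlink = shutil.disk_usage("/tmp").free  # essentially unchanged
f.close()                                           # last handle closed
free_after_close = shutil.disk_usage("/tmp").free   # ~100 MB larger again
print(free_before, free_after_unlink, free_after_close)
Running lsof +L1 lists open files whose link count is zero, which is a quick way to find the processes holding onto such "deleted" space.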

how does kernel handle new file creation

I wish to understand how the kernel works when a user/app tries to create a file in a directory.
The background - We have a Java application which consumes messages over JMS, processes them, and then writes the XML to an outbound queue plus a local directory. Yesterday we observed unusual delays in writing to the directory. Running 'ls|wc -l' we found >300,000 files in there. A quick strace on the process showed it was full of mutex calls (more than 3/4 of the calls in the strace were mutex).
So I thought that new file creation was taking time because every time the system has to check certain things (e.g. file names, to make sure that a new file with a specific name can be created) among 300,000 files before creating the file.
I cleared the directory and the application returned to normal service levels.
My questions
Was my analysis correct (it seems so, because the app started working fine after the clear-down)?
More important, how does the kernel work when you try to create a new file in a directory?
Can the abnormal number of mutex calls be attributed to the high number of files in the directory?
Many thanks
J
Please read about the Linux Filesystem, i-nodes and d-nodes.
http://en.wikipedia.org/wiki/Inode_pointer_structure
The file system is organized into fixed-sized blocks. If your directory is relatively small, it fits in the direct blocks and things are fast. If your directory is not too big, it fits in the direct blocks and some indirect blocks, and is still reasonably fast. If your directory becomes too big, it spills into double indirect blocks and becomes slow.
Actual sizes depend on file system and kernel configuration.
Rule of thumb is to keep the directory under 12 blocks, depending on your block size. Many systems use 8K blocks; a fast directory is under 98,304 bytes.
A file entry is something like 16*4 bytes in size (IIRC), so plan on no more than 1500 files per directory as a practical upper limit.
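Putting those assumed numbers together (8K blocks, 12 direct blocks, roughly 64 bytes per entry):
block_size = 8 * 1024        # assumed 8K filesystem blocks
direct_blocks = 12           # direct block pointers in a classic inode
entry_size = 16 * 4          # rough per-entry size quoted above
fast_directory_bytes = direct_blocks * block_size   # 98,304 bytes
practical_max_entries = fast_directory_bytes // entry_size
print(fast_directory_bytes, practical_max_entries)  # 98304 1536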
Directories with large numbers of entries are often slow - how slow depends on the underlying filesystem.
The common solution is to create a hierarchy of directories, so each dir only has a few hundred entries.
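As a rough illustration of that approach (the helper below is hypothetical, not from the original answer), one or two subdirectory levels can be derived from a hash of the file name so each directory stays small:
import hashlib
from pathlib import Path
def sharded_path(root: str, filename: str, levels: int = 2) -> Path:
    # Returns e.g. <root>/5a/3f/<filename>, creating the intermediate
    # directories, so no single directory accumulates huge numbers of entries.
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    parts = [digest[2 * i:2 * i + 2] for i in range(levels)]
    target = Path(root).joinpath(*parts)
    target.mkdir(parents=True, exist_ok=True)
    return target / filename
print(sharded_path("/tmp/outbox", "order-12345.xml"))  # subdirs depend on the hash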
Mutex system calls are a result of the application (probably something in the JVM or the Java libraries) making mutex calls.
Synchronisation internal to the kernel you will not see via strace, as this only examines system calls themselves.
A directory with lots of files should not become inefficient if you are using a filesystem which uses directory indexes; most now do (ext3 does optionally but it's normally enabled nowadays).
Non-indexed directories (like those used on the bad old filesystems - ext2, vfat etc) get really bad with lots of files, and you'll see the "open" system call taking a lot longer.

Resources