about managing file system space

about managing file system space - linux

Space Issues in a filesystem on Linux
Lets call it FILESYSTEM1
Normally, space in FILESYSTEM1 is only about 40-50% used
and clients run some reports or run some queries and these reports produce massive files about 4-5GB in size and this instantly fills up FILESYSTEM1.
We have some cleanup scripts in place but they never catch this because it happens in a matter of minutes and the cleanup scripts usually clean data that is more than 5-7 days old.
Another set of scripts are also in place and these report when free space in a filesystem is less than a certain threshold
we thought of possible solutions to detect and act on this proactively.
Increase the FILESYSTEM1 file system to double its size.
set the threshold in the Alert Scripts for this filesystem to alert when 50% full.
This will hopefully give us enough time to catch this and act before the client reports issues due to FILESYSTEM1 being full.
Even though this solution works, does not seem to be the best way to deal with the situation.
Any suggestions / comments / solutions are welcome.
thanks

It sounds like what you've found is that simple threshold-based monitoring doesn't work well for the usage patterns you're dealing with. I'd suggest something that pairs high-frequency sampling (say, once a minute) with a monitoring tool that can do some kind of regression on your data to predict when space will run out.
In addition to knowing when you've already run out of space, you also need to know whether you're about to run out of space. Several tools can do this, or you can write your own. One existing tool is Zabbix, which has predictive trigger functions that can be used to alert when file system usage seems likely to cross a threshold within a certain period of time. This may be useful in reacting to rapid changes that, left unchecked, would fill the file system.

Related

Netlogo 5.1 (and 5.05) Behavior Space Memory Leak

I have posted on this before, but thought I had tracked it down to the NW extension, however, memory leakage still occurs in the latest version. I found this thread, which discusses a similar issues, but attributes it to Behavior Space:
http://netlogo-users.18673.x6.nabble.com/Behaviorspace-Memory-Leak-td5003468.html
I have found the same symptoms. My model starts out at around 650mb, but over each run the private working set memory rises, to the point where it hits the 1024 limit. I have sufficient memory to raise this, but in reality it will only delay the onset. I am using the table output, as based on previous discussions this helps, and it does, but it only slows the rate of increase. However, eventually the memory usage rises to a point where the PC starts to struggle. I am clearing all data between runs so there should be no hangover. I noticed in the highlighted thread that they were going to run headless. I will try this, but I wondered if anyone else had noticed the issue? My other option is to break the BehSpc simulation into a few batches so the issues never arises, bit i would be nice to let the model run and walk away as it takes around 2 hours to go through.

Some possible next steps:
1) Isolate the exact conditions under which the problem does or not occur. Can you make it happen without involving the nw extension, or not? Does it still happen if you remove some of the code from your model? What if you keep removing code — when does the problem go away? What is the smallest code that still causes the problem? Almost any bug can be demonstrated with only a small amount of code — and finding that smallest demonstration is exactly what is needed in order to track down the cause and fix it.
2) Use standard memory profiling tools for the JVM to see what kind of objects are using the memory. This might provide some clues to possible causes.
In general, we are not receiving other bug reports from users along these lines. It's routine, and has been for many years now, for people to use BehaviorSpace (both headless and not) and do experiments that last for hours or even for days. So whatever it is you're experiencing almost certainly has a more specific cause -- mostly likely, in the nw extension -- that could be isolated.

Disk failure detection perl script

I need to write a script to check the disk every minute and report if it is failing by any reason. The error could be the absolute disk failure and a bad sector and so on .
First, I wonder if there is any script out there that does the same as it should be a standard procedure (because I really do not want to reinvent the wheel).
Second, I wonder if I want to look for errors in /var/log/messages, is there any list of standard error strings for disks that I can use?
I look for that on the net a lot, there are lots of info and at the same time no info about that.
Any help will be much appreciated.
Thanks,

You could simply parse the output of dmesg which usually reports fairly detailed information about drive errors, well that's how I've collected stats on failing drives before.
You might get better more well documented information by using Parse::Syslog or lower level kernel reporting directly though.

Logwatch does the /var/log/messages part of the ordeal (as well as any other logfiles that you choose to add). You can either choose to use that, or to use its code to roll your own sollution (it's all written in perl).
If your harddrives support SMART, i suggest you use smartctl output for diagnostics as it includes a lot of nice info that can be monitored over time to detect failure.

Buffering on top of VFS

the problem I try to deal with it is the saving of big number (millions) of small files (up to 50KB), which are sent via network. The saving is done sequential: server receives a file or a dir (via network), it saves it on disk; the next one arrives, it's saved etc.
Apparently, the performance is not acceptable, if multiple server processes coexist (let's say I have 5 processes which all read from network and write at the same time), because the I/O scheduler doesn't manage to merge efficiently the I/O writes.
A suggested solution is to implement some sort of buffering: each server process should have a 50MB cache, in which it should write the current file, do a chdir etc; when the buffer is full, it should be synced to disk, therefore obtaining an I/O burst.
My questions to you:
1) I know that already exists a buffer mechanism (disk buffer); do you think that the above scenario is going to add some improvement? (the design is much more complicated and it's not easy to implement a simple test case)
2) do you have any suggestions, where to look if I would implement this?
Many thanks.

You're going to need to do better than
"apparently the performance is not acceptable".
Specifically
How are you measuring it? Do you have an exact, reproducible figure
What is your target?
In order to do optimisation, you need two things- a method of measuring it (a metric) and a target (so you know when to stop, or how useful or useless a particular technique is).
Without either, you're sunk, I'm afraid.

How important are those writes? I have three suggestions (which can be combined), but one of them is a lot of work, and one of them is less safe...
Journaling
I'm guessing you're seeing some poor performance due in part to the journaling common to most modern Linux filesystems. The journaling causes barriers to be inserted into the IO queue when file metadata is written. You can try turning down the safety (and maybe turning up the speed) with mount(8) options barrier=0 and data=writeback.
But if there is a crash, the journal might not be able to prevent a lengthy fsck(8). And there's a chance the fsck(8) will wind up throwing away your data when fixing the problem. On the one hand, it's not a step to take lightly, on the other hand, back in the old days, we ran our ext2 filesystems in async mode without a journal both ways in the snow and we liked it.
IO Scheduler elevator
Another possibility is to swap the IO elevator; see Documentation/block/switching-sched.txt in the Linux kernel source tree. The short version is that deadline, noop, as, and cfq are available. cfq is the kernel default, and probably what your system is using. You can check:
$ cat /sys/block/sda/queue/scheduler
noop deadline [cfq]
The most important parts from the file:
As of the Linux 2.6.10 kernel, it is now possible to change the
IO scheduler for a given block device on the fly (thus making it possible,
for instance, to set the CFQ scheduler for the system default, but
set a specific device to use the deadline or noop schedulers - which
can improve that device's throughput).
To set a specific scheduler, simply do this:
echo SCHEDNAME > /sys/block/DEV/queue/scheduler
where SCHEDNAME is the name of a defined IO scheduler, and DEV is the
device name (hda, hdb, sga, or whatever you happen to have).
The list of defined schedulers can be found by simply doing
a "cat /sys/block/DEV/queue/scheduler" - the list of valid names
will be displayed, with the currently selected scheduler in brackets:
# cat /sys/block/hda/queue/scheduler
noop deadline [cfq]
# echo deadline > /sys/block/hda/queue/scheduler
# cat /sys/block/hda/queue/scheduler
noop [deadline] cfq
Changing the scheduler might be worthwhile, but depending upon the barriers inserted into the queue by the journaling requirements, there might not be much reordering possible. Still, it is less likely to lose your data, so it might be the first step.
Application changes
Another possibility is to drastically change your application to bundle files itself, and write fewer, larger, files to disk. I know it sounds strange, but (a) the iD development team packaged their maps, textures, objects, etc., into giant zip files that they would read into the program with a few system calls, unpack, and run with, because they found the performance much better than reading a few hundred or few thousand smaller files. Load times between levels was drastically shorter. (b) The Gnome desktop team and KDE desktop teams took different approaches to loading their icons and resource files: the KDE team packages their many small files into larger packages of some sort, and the Gnome team did not. The Gnome team had longer startup delays and were hoping the kernel could make some efforts to improve their startup time. The kernel team kept suggesting the fewer, larger, files approach.

Creating/renaming a file, syncing it, having lots of files in a directory and having lots of files (with tail waste) are some of the slow operations in your scenario. However to avoid them it would only help to write lesser files (for example writing out archives, concatenated file or similiar). I would actually try a (limited) parallel async or sync approach. The IO scheduler and caches are typically quite good.

How might one go about implementing a disk fragmenter?

I have a few ideas I would like to try out in the Disk Defragmentation Arena. I came to the conclusion that as a precursor to the implementation, it would be useful, to be able to put a disk into a state where it was fragmented. This seems to me to be a state that is more difficult to achieve than a defragmented one. I would assume that the commercial defragmenter companies probably have solved this issue.
So my question.....
How might one go about implementing a fragmenter? What makes sense in the context that it would be used, to test a defragmenter?

Maybe instead of fragmenting the actual disk, you should really test your defragmentation algorithm on a simulation/mock disk? Only once you're satisfied the algorithm itself works as specified, you could do the testing on actual disks using the actual disk API.
You could even take snapshots of actual fragmented disks (yours or of someone you know) and use this data as a mock model for testing.

How you can best fragement depends on the file system.
In general, concurrently open a large number of files. Opening a file will create a new directory entry but won't cause a block to be written for that file. But now go through each file in turn, writing one block. This typically will cause the next free block to be consumed, which will lead to all your files being fragmented with regard to each other.
Fragmenting existing files is another matter. Basically, do the same, but do it on a file copy of existing files, doing a delete of the original and rename of copy.

I may be oversimplifying here but if you artificially fragment the disk won't any tests you run will be only true for the fragmentation created by your fragmenter rather than any real world fragmentation. You may end up optimising for assumptions in the fragmenter tool that don't represent real world occurrences.
Wouldn't it be easier and more accurate to take some disk images of fragmented disks? Do you have any friends or colleagues who trust you not to do anything anti-social with their data?

Fragmentation is a mathematical problem such that you are trying to maximize the distance the head of the hard drive is traveling while performing a specific operation. So in order to effectively fragment something you need to define the specific operation first

Using "top" in Linux as semi-permanent instrumentation

I'm trying to find the best way to use 'top' as semi-permanent instrumentation in the development of a box running embedded Linux. (The instrumentation will be removed from the final-test and production releases.)
My first pass is to simply add this to init.d:
top -b -d 15 >/tmp/toploop.out &
This runs top in "batch" mode every 15 seconds. Let's assume that /tmp has plenty of space…
Questions:
Is 15 seconds a good value to choose for general-purpose monitoring?
Other than disk space, how seriously is this perturbing the state of the system?
What other (perhaps better) tools could be used like this?

Look at collectd. It's a very light weight system monitoring framework coded for performance.

We use sysstat to monitor things like this.

You might find that vmstat and iostat with a delay and no repeat counter is a better option.

I suspect 15 seconds would be more than adequate unless you actually want to watch what's happening in real time, but that doesn't appear to be the case here.
As far as load, on an idling PIII 900Mhz w/ 768MB of RAM running Ubuntu (not sure which version, but not more than a year old) I have top updating every 0.5 seconds and it's about 2% CPU utilization. At 15s updates, I'm seeing 0.1% CPU utilization.
depending upon what exactly you want, you could use the output of uptime, free, and ps to get most, if not all, of top's information.

If you are looking for overall load, uptime is probably sufficient. However, if you want specific information about processes, you are adventurous, and have the /proc filessystem enabled, you may want to write your own tools. The primary benefit in this environment is that you can focus on exactly what you want and minimize the load introduced to the system.
The proc file system gives your application read access to the kernel memory that keeps track of many of the interesting variables. Reading from /proc is one of the lightest ways to get this information. Additionally, you may be able to get more information than provided by top. I've done this in the past to get amount of time spent in user and system by this process. Additionally, you can use this to get information about the number of file descriptors open by the process. You might also use this to get detailed information about how the network system is working.
Much of this information is pre-processed by other applications which can be used if you get the information you need. However, it is rather straight-forward to read the raw information. Do a man proc for more information.

Pity you haven't said what you are monitoring for.
You should decide whether 15 seconds is ok or not. Feel free to drop it way lower if you wish (and have a fast HDD)
No worries unless you are running a soft real-time system
Have a look at tools suggested in other answers. I'll add another sugestion: "iotop", for answering a "who is thrashing the HDD" questions.

At work for system monitoring during stress tests we use a tool called nmon.
What I love about nmon is it has the ability to export to XLS and generate beautiful graphs for you.
It generates statistics for:
Memory Usage
CPU Usage
Network Usage
Disk I/O
Good luck :)

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string