Bash PATH length restrictions - linux

Inside a particular quarantine is all of the stuff one needs to run an application (bin, share, lib, etc.). Ideally, the quarantine has no leaks, which means it's not relying on any code outside of itself on the system. A quarantine can be defined as a set of executables (and some environment settings needed to make them run).
I think it will be beneficial to separate the built packages enough such that upgrading to a newer version of the quarantine won't require rebuilding the whole thing. I'll be able to update just a few packages, and then the new quarantine can use some of old parts and some of the new parts.
One issue I'm wondering about is the environment variables I'll be setting up to use a particular quarantines.
Is there a hard limit on how big PATH can be? (either in number of characters, or in the number of directories it contains) Does path length affect performance?

There's a hard limit. It's something like 32MB.
Yes, you can get it long enough to affect performance easily. Number of entries is the primary limiting factor, followed by number of / characters (this should not show itself unless the path depth exceeds some outrageous number like 30).

Related

Is there a way to dynamically determine the vhdSize flag?

I am using the MSIX manager tool to convert a *.msix (an application installer) to a *.vhdx so that it can be mounted in an Azure virtual machine. One of the flags that the tool requires is -vhdSize, which is in megabytes. This has proven to be problematic because I have to guess what the size should be based off the MSIX. I have ran into numerous creation errors due to too small of a vhdSize.
I could set it to an arbitrarily high value in order to get around these failures, but that is not ideal. Alternatively, guessing the correct size is an imprecise science and a chore to do repeatedly.
Is there a way to have the tool dynamically set the vhdSize, or am I stuck guessing a value that is both large enough to accommodate the file, but not too large as to waste disk space? Or, is there a better way to create a *.vhdx file?
https://techcommunity.microsoft.com/t5/windows-virtual-desktop/simplify-msix-image-creation-with-the-msixmgr-tool/m-p/2118585
There is an MSIX Hero app that could select a size for you, it will automatically check how big the uncompressed files are, add an extra buffer for safety (currently double the original size), and round it to the next 10MB. Reference from https://msixhero.net/documentation/creating-vhd-for-msix-app-attach/

about managing file system space

Space Issues in a filesystem on Linux
Lets call it FILESYSTEM1
Normally, space in FILESYSTEM1 is only about 40-50% used
and clients run some reports or run some queries and these reports produce massive files about 4-5GB in size and this instantly fills up FILESYSTEM1.
We have some cleanup scripts in place but they never catch this because it happens in a matter of minutes and the cleanup scripts usually clean data that is more than 5-7 days old.
Another set of scripts are also in place and these report when free space in a filesystem is less than a certain threshold
we thought of possible solutions to detect and act on this proactively.
Increase the FILESYSTEM1 file system to double its size.
set the threshold in the Alert Scripts for this filesystem to alert when 50% full.
This will hopefully give us enough time to catch this and act before the client reports issues due to FILESYSTEM1 being full.
Even though this solution works, does not seem to be the best way to deal with the situation.
Any suggestions / comments / solutions are welcome.
thanks
It sounds like what you've found is that simple threshold-based monitoring doesn't work well for the usage patterns you're dealing with. I'd suggest something that pairs high-frequency sampling (say, once a minute) with a monitoring tool that can do some kind of regression on your data to predict when space will run out.
In addition to knowing when you've already run out of space, you also need to know whether you're about to run out of space. Several tools can do this, or you can write your own. One existing tool is Zabbix, which has predictive trigger functions that can be used to alert when file system usage seems likely to cross a threshold within a certain period of time. This may be useful in reacting to rapid changes that, left unchecked, would fill the file system.

Compare filesystem space usage over time

Is there a good, graphical way to represent disk usage changes in a linux/unix filesystem over time?
Let me elaborate: there are several good ways to represent disk usage in a filesystem. I'm not interested in summary statistics such total space used (as given by du(1)), but more advanced interactive/visualization tools such as ncdu, gdmap, filelight or baobab, that can give me an idea of where the space is being used.
From a technical perspective, I think the best approach is squarified tree-maps (as available in gdmap), since it makes a better use of the visual space available. The circular approach used by filelight for instance cannot represent huge hierarchies efficiently, and it's dubious how to account for the increasing area of the outer rings in the representation from a human perspective. Looks nice, but that's about it.
Treemaps are perfect to have the current snapshot of disk usage in the filesystem, but I'd like to have something similar to see how disk usage has been evolving over time.
My current solution is very simple: I'm dumping the filesystem usage state using "ncdu -o" over time, and then I compare them side-by-side using two ncdu instances. It's very inefficient, but does the job. I'd like something more visual though.
All the relevant information can be dumped using:
find [dir] -printf "%P\t%s\n"
I did a crappy hack to load this state information in gdmap, so I can use two gdmap instances instead. Still not optimal though, as a treemap will fit the total allocated space into the same rectangle. As such, you cannot really tell if the same area is equivalent to more or less space. If two big directories grow proportionately, they will not change the visualization.
I need something better than that. Obviously, I cannot plot the cumulative directory sizes in a simple line plot, as I would have too many directories.
I'd like something similar to a treemap, where maybe the color of the square represents size increase/decrease using some colormap. However, since a treemap will show individual files as opposed to directories, it's not obvious on how to color-map a directory in which the allocated space has been growing/shrinking due to new/removed files.
What kind of visualization techniques could be used to see the evolution of allocated space over time, which take the whole underlying tree into account?
To elaborate even more, in a squarified treemap the whole allocated space is proportionally divided by file size, and each directory logically clusters the allocated space within it. As such, we don't "see" directories, we see the proportional space taken by it's content.
How we could extend and/or improve the visualization in order to see how the allocated space has been moved to a different area of the treemap?
You can usee Cacti for this.
You need to install snmp deamon on you machine and install cacti (freeware) localy or on any other PC and monitor you linux machine.
http://blog.securactive.net/wp-content/uploads/2012/12/cacti_performance_vision1.png
You can monitor network interfaces, spaces of any partitions and lot of other parameters of your LINUX OS.
apt-get install cacti
vim /etc/snmp/snmpd.conf
add this at about 42 line:
view systemonly included .1.3.6.1
close and restart snmpd deamon
go to cacti config and try to discover your linux machine.

Where is the best place to store your Smarty template cache files?

I'm considering either
/tmp
or
/var/cache
or
some folder in your code
I like /temp more, because if it grows too much, the system will usually take care of it, and it's universally writeable so probably more portable code.
But at the other hand I will have to store files in a folder within any of these, so making a folder and checking if it exists has to be done on /tmp, not on /var/cache, since /var/cache is not likely to get removed by linux or any other sort of common software.
What do you think? What is the best practice?
There are many approaches to storing smarty cache and, apparently, no best-case scenario i.e. the matter being more a matter of preference.
I can only say that I have witnessed hundreds of projects where Smarty cache was stored in the project's relative folders (for example /projects/cache/compiled/) for a number of reasons:
Full control of the application's cache
Ability to share the same cache amongst several servers
No need to re-create the cache after the system has tidied the /tmp folder
Moreover, we see compiled templates residing inside memcache more and more each day.

Is it OK (performance-wise) to have hundreds or thousands of files in the same Linux directory?

It's well known that in Windows a directory with too many files will have a terrible performance when you try to open one of them. I have a program that is to execute only in Linux (currently it's on Debian-Lenny, but I don't want to be specific about this distro) and writes many files to the same directory (which acts somewhat as a repository). By "many" I mean tens each day, meaning that after one year I expect to have something like 5000-10000 files. They are meant to be kept (once a file is created, it's never deleted) and it is assumed that the hard disk has the required capacity (if not, it should be upgraded). Those files have a wide range of sizes, from a few KB to tens of MB (but not much more than that). The names are always numeric values, incrementally generated.
I'm worried about long-term performance degradation, so I'd ask:
Is it OK to write all to the same directory? Or should I think about creating a set of subdirectories for every X files?
Should I require a specific filesystem to be used for such directory?
What would be the more robust alternative? Specialized filesystem? Which?
Any other considerations/recomendations?
It depends very much on the file system.
ext2 and ext3 have a hard limit of 32,000 files per directory. This is somewhat more than you are asking about, but close enough that I would not risk it. Also, ext2 and ext3 will perform a linear scan every time you access a file by name in the directory.
ext4 supposedly fixes these problems, but I cannot vouch for it personally.
XFS was designed for this sort of thing from the beginning and will work well even if you put millions of files in the directory.
So if you really need a huge number of files, I would use XFS or maybe ext4.
Note that no file system will make "ls" run fast if you have an enormous number of files (unless you use "ls -f"), since "ls" will read the entire directory and the sort the names. A few tens of thousands is probably not a big deal, but a good design should scale beyond what you think you need at first glance...
For the application you describe, I would probably create a hierarchy instead, since it is hardly any additional coding or mental effort for someone looking at it. Specifically, you can name your first file "00/00/01" instead of "000001".
If you use a filesystem without directory-indexing, then it is a very bad idea to have lots of files in one directory (say, > 5000).
However, if you've got directory indexing (which is enabled by default on more recent distros in ext3), then it's not such a problem.
However, it does break quite a few tools to have many files in one directory (For example, "ls" will stat() all the files, which takes a long time). You can probably easily split it into subdirectories.
But don't overdo it. Don't use many levels of nested subdirectory unnecessarily, this just uses lots of inodes and makes metadata operations slower.
I've seen more cases of "too many levels of nested directories" than I've seen of "too many files per directory".
The best solution I have for you (rather than quoting some values from a micro-filesystem-benchmark) is to test it yourself.
Just use the file system of your choice. Create some random test data for 100, 1000 and 10000 entries. Then, measure the time it takes your system to perform the action you are concerned about time-wise (opening a file, reading 100 random files, etc).
Then, you compare the times and use the best solution (put them all into one directory; put each year into a new directory; put each month of each year into a new directory).
I do not know in detail what you are using, but creating a directory is a one time (and probably quite easy) operation, so why not do it instead of changing filesystems or trying some other more time-consuming stuff?
In addition to the other answers, if the huge directory is managed by a known application or library, you could consider replacing it by something else, e.g:
a GDBM index file; GDBM is a very common library providing indexed file, which associates to an arbitrary key (a sequence of bytes) an arbitrary value (another sequence of byte).
perhaps a table inside a database like MySQL or PostGresQL. Be careful about indexing.
some other way to index data
The advantages of the above approaches include:
space performance for a large collection of small items (less than a kilobyte each). A filesystem need an inode for each item. Indexed systems may have much less granularity
time performance: you don't access the filesystem for every item
scalability: indexed approaches are designed to fit large needs: either a GDBM index file, or a database can handle many millions of items. I'm not sure your directory approach will scale as easily.
The disadvantage of such approach is that they don't show as files. But as MarkR's answer remind you, ls is behaving quite poorly on huge directories.
If you stick to a filesystem approach, many software using large number of files are organizing them in subdirectories like aa/ ab/ ac/ ...ay/ az/ ba/ ... bz/ ...
Is it OK to write all to the same directory? Or should I think about creating a set of subdirectories for every X files?
In my experience the only slow down a directory with many files will give is if you do things such as getting a listing with ls. But that mostly is the fault of ls, there are faster ways of listing the contents of a directory using tools such as echo and find (see below).
Should I require a specific filesystem to be used for such directory?
I don't think so with regards to amount of files in one directory. I am sure some filesystems perform better with many small files in one dir whilst others do a better job on huge files. It's also a matter of personal taste, akin to vi vs. emacs. I prefer to use the XFS filesystem so that'd be my advice. :-)
What would be the more robust alternative? Specialized filesystem? Which?
XFS is definitely robust and fast, I use it in many places, as boot partition, oracle tablespaces, space for source control you name it. It lacks a bit on delete performance, but otherwise it's a safe bet. Plus it supports growing the size whilst it is still mounted (that's a requirement actually). That is you just delete the partition, recreate it at the same starting block and whatever ending block that's larger than the original partition, then you run xfs_growfs on it with the filesystem mounted.
Any other considerations/recomendations?
See above. With the addition that having 5000 to 10000 files in one directory should not be a problem. In practice it doesn't arbitrarily slow down the filesystem as far as I know, except for utilities such as "ls" and "rm". But you could do:
find * | xargs echo
find * | xargs rm
The benefit that a directory tree with files, such as directory "a" for file names starting with an "a" etc., will give you is that of looks, it looks more organised. But then you have less of an overview... So what you're trying to do should be fine. :-)
I neglected to say you could consider using something called "sparse files" http://en.wikipedia.org/wiki/Sparse_file
It is bad for performance to have a huge number of files in one directory. Checking for the existence of a file will typically require an O(n) scan of the directory. Creating a new file will require that same scan with the directory locked to prevent the directory state changing before the new file is created. Some file systems may be smarter about this (using B-trees or whatever), but the fewer ties your implementation has to the filesystem's strengths and weaknesses the better for long term maintenance. Assume someone might decide to run the app on a network filesystem (storage appliance or even cloud storage) someday. Huge directories are a terrible idea when using network storage.

Resources