how to disable writing to /var/log/lastlog

how to disable writing to /var/log/lastlog - linux

how do i disable writing to /var/log/lastlog file. In our system we had to use a vfat file system for /var/log and since it doesn't support sparse files, our lastlog file gets real huge. I'm tried 'session required pam_lastlog.so noupdate' as last rule in /etc/pam.d/login but it doesn't seem to be working.
having 'LASTLOG_ENAB no' in /etc/login.defs isn't also working.
Appreciate any help. Don't get any hits for this on web.

First, the answer: There are a number of ways to do it.
Make the file immutable with chattr. Zero it's size first. Now nobody is allowed to write to it.
ln -s it to /dev/null. Writes will still be allowed, but they go off into never-never land.
Delete it and create a directory with the same name.
Delete it and see if your version of the logging system will recreate it, several of them won't.
Obligatory warnings:
1) Removing lastlog disrupts the audit-trail on your system and may cause various security tools to fail to function properly, or at all.
2) vfat does not support sparse files... It also does not support journaling or individual file ownership. By putting /var/log on vfat you are making it much, much easier for crackers (or even power failures) to potentially alter, damage, or destroy your logfiles. There are lots of better filesystems out there, you should probably find one. But that would be a separate question.

Related

Protect file from system modifications

I am working on a linux computer which is locked down and used in kiosk mode to run only one application. This computer cannot be updated or modified by the user. When the computer crashes or freezes the OS rebuilds or modifies the ld-2.5.so file. This file needs to be locked down without allowing even the slightest change to it (there is an application resident which requires ld-2.5.so to remain unchanged and that is out of my control). Below are the methods I can think of to protect ld-2.5.so but wanted to run it by the experts to see if I am missing anything.
I modified the fstab to mount the EXT3 filesystem as EXT2 to disable journaling. Also set the DUMP and FSCK values to "0" to disable those processes.
Performed a "chattr +i ld-2.5.so" on the file but there are still system processes that can overwrite this protection.
I could attempt to trap the name of the processes which are hitting ld-2.5.so and prevent this.
Any ideas or hints would be greatly appreciated.
-Matt (CentOS 5.0.6)

chattr +i should be fine in most circumstances.
The ld-*.so files are under /usr/lib/ and /usr/lib64/. If /usr/ is a separate partition, you also might want to mount that partition read only on a kiosk system.
Do you have, by any chance, some automated updating/patching of said PC configured? ld-*.so is part of glibc and basically should only change if the glibc package is updated.

Mandatory file lock on linux

On Linux I can dd a file on my hard drive and delete it in Nautilus while the dd is still going on.
Can Linux enforce a mandatory file lock to protect R/W?
[EDIT] The original question wasn't about linux file locking capabilities but about a supposed bug in linux, reproducing it here as it is responded below and others may have the same question.
People keep telling me Linux/Unix is better OS. I am coding Java on Linux now and come across a problem, that I can easily reproduce: I can dd a file on my hard drive and delete it in Nautilus while the dd is still going on. How come linux cannot enforce a mandatory file lock to protect R/W??

To do mandatory locking on Linux, the filesystem must be mounted with the -o mand option, and you must set g-x,g+s permissions on the file. That is, you must disable group execute, and enable setgid. Once this is performed, all access will either block or error with EAGAIN based on the value of O_NONBLOCK on the file descriptor. But beware: "The implementation of mandatory locking in all known versions of Linux is subject to race conditions which render it unreliable... It is therefore inadvisable to rely on mandatory locking." See fcntl(2).

You don't need locking. This is not a bug but a choice, your assumptions are wrong.
The file system uses reference counting and it will mark a file as free only when all hard links to the file are removed and all file descriptors are closed.
This approach allows safe file system operations that Windows, for example, doesn't. Operations like delete, move and rename over files in use without needing locking or breaking anything.
Your dd operation is going to succeed despite the file removal, which will actually be deferred till the dd finishes.
http://en.wikipedia.org/wiki/Reference_counting#Disk_operating_systems
[EDIT] My response doesn't make much sense now as the question was edited by someone else. The original question was about a supposed bug in linux and not about if linux can lock a file:
People keep telling me Linux/Unix is better OS. I am coding Java on Linux now and come across a problem, that I can easily reproduce: I can dd a file on my hard drive and delete it in Nautilus while the dd is still going on. How come linux cannot enforce a mandatory file lock to protect R/W??

Linux and Unix OS's can enforce file locks, but it does not do so by default becuase of its multiuser design. Try reading the manual pages for flock and fcntl. That might get you started.

Is it OK (performance-wise) to have hundreds or thousands of files in the same Linux directory?

It's well known that in Windows a directory with too many files will have a terrible performance when you try to open one of them. I have a program that is to execute only in Linux (currently it's on Debian-Lenny, but I don't want to be specific about this distro) and writes many files to the same directory (which acts somewhat as a repository). By "many" I mean tens each day, meaning that after one year I expect to have something like 5000-10000 files. They are meant to be kept (once a file is created, it's never deleted) and it is assumed that the hard disk has the required capacity (if not, it should be upgraded). Those files have a wide range of sizes, from a few KB to tens of MB (but not much more than that). The names are always numeric values, incrementally generated.
I'm worried about long-term performance degradation, so I'd ask:
Is it OK to write all to the same directory? Or should I think about creating a set of subdirectories for every X files?
Should I require a specific filesystem to be used for such directory?
What would be the more robust alternative? Specialized filesystem? Which?
Any other considerations/recomendations?

It depends very much on the file system.
ext2 and ext3 have a hard limit of 32,000 files per directory. This is somewhat more than you are asking about, but close enough that I would not risk it. Also, ext2 and ext3 will perform a linear scan every time you access a file by name in the directory.
ext4 supposedly fixes these problems, but I cannot vouch for it personally.
XFS was designed for this sort of thing from the beginning and will work well even if you put millions of files in the directory.
So if you really need a huge number of files, I would use XFS or maybe ext4.
Note that no file system will make "ls" run fast if you have an enormous number of files (unless you use "ls -f"), since "ls" will read the entire directory and the sort the names. A few tens of thousands is probably not a big deal, but a good design should scale beyond what you think you need at first glance...
For the application you describe, I would probably create a hierarchy instead, since it is hardly any additional coding or mental effort for someone looking at it. Specifically, you can name your first file "00/00/01" instead of "000001".

If you use a filesystem without directory-indexing, then it is a very bad idea to have lots of files in one directory (say, > 5000).
However, if you've got directory indexing (which is enabled by default on more recent distros in ext3), then it's not such a problem.
However, it does break quite a few tools to have many files in one directory (For example, "ls" will stat() all the files, which takes a long time). You can probably easily split it into subdirectories.
But don't overdo it. Don't use many levels of nested subdirectory unnecessarily, this just uses lots of inodes and makes metadata operations slower.
I've seen more cases of "too many levels of nested directories" than I've seen of "too many files per directory".

The best solution I have for you (rather than quoting some values from a micro-filesystem-benchmark) is to test it yourself.
Just use the file system of your choice. Create some random test data for 100, 1000 and 10000 entries. Then, measure the time it takes your system to perform the action you are concerned about time-wise (opening a file, reading 100 random files, etc).
Then, you compare the times and use the best solution (put them all into one directory; put each year into a new directory; put each month of each year into a new directory).
I do not know in detail what you are using, but creating a directory is a one time (and probably quite easy) operation, so why not do it instead of changing filesystems or trying some other more time-consuming stuff?

In addition to the other answers, if the huge directory is managed by a known application or library, you could consider replacing it by something else, e.g:
a GDBM index file; GDBM is a very common library providing indexed file, which associates to an arbitrary key (a sequence of bytes) an arbitrary value (another sequence of byte).
perhaps a table inside a database like MySQL or PostGresQL. Be careful about indexing.
some other way to index data
The advantages of the above approaches include:
space performance for a large collection of small items (less than a kilobyte each). A filesystem need an inode for each item. Indexed systems may have much less granularity
time performance: you don't access the filesystem for every item
scalability: indexed approaches are designed to fit large needs: either a GDBM index file, or a database can handle many millions of items. I'm not sure your directory approach will scale as easily.
The disadvantage of such approach is that they don't show as files. But as MarkR's answer remind you, ls is behaving quite poorly on huge directories.
If you stick to a filesystem approach, many software using large number of files are organizing them in subdirectories like aa/ ab/ ac/ ...ay/ az/ ba/ ... bz/ ...

Is it OK to write all to the same directory? Or should I think about creating a set of subdirectories for every X files?
In my experience the only slow down a directory with many files will give is if you do things such as getting a listing with ls. But that mostly is the fault of ls, there are faster ways of listing the contents of a directory using tools such as echo and find (see below).
Should I require a specific filesystem to be used for such directory?
I don't think so with regards to amount of files in one directory. I am sure some filesystems perform better with many small files in one dir whilst others do a better job on huge files. It's also a matter of personal taste, akin to vi vs. emacs. I prefer to use the XFS filesystem so that'd be my advice. :-)
What would be the more robust alternative? Specialized filesystem? Which?
XFS is definitely robust and fast, I use it in many places, as boot partition, oracle tablespaces, space for source control you name it. It lacks a bit on delete performance, but otherwise it's a safe bet. Plus it supports growing the size whilst it is still mounted (that's a requirement actually). That is you just delete the partition, recreate it at the same starting block and whatever ending block that's larger than the original partition, then you run xfs_growfs on it with the filesystem mounted.
Any other considerations/recomendations?
See above. With the addition that having 5000 to 10000 files in one directory should not be a problem. In practice it doesn't arbitrarily slow down the filesystem as far as I know, except for utilities such as "ls" and "rm". But you could do:
find * | xargs echo
find * | xargs rm
The benefit that a directory tree with files, such as directory "a" for file names starting with an "a" etc., will give you is that of looks, it looks more organised. But then you have less of an overview... So what you're trying to do should be fine. :-)
I neglected to say you could consider using something called "sparse files" http://en.wikipedia.org/wiki/Sparse_file

It is bad for performance to have a huge number of files in one directory. Checking for the existence of a file will typically require an O(n) scan of the directory. Creating a new file will require that same scan with the directory locked to prevent the directory state changing before the new file is created. Some file systems may be smarter about this (using B-trees or whatever), but the fewer ties your implementation has to the filesystem's strengths and weaknesses the better for long term maintenance. Assume someone might decide to run the app on a network filesystem (storage appliance or even cloud storage) someday. Huge directories are a terrible idea when using network storage.

Best POSIX way to determine if a filesystem is mounted read only

If I have a POSIX system like Linux or Mac OS X, what's the best and most portable way to determine if a path is on a read-only filesystem? I can think of 4 ways off the top of my head:
open(2) a file with O_WRONLY - You would need to come up with a unique filename and also pass in O_CREAT and O_EXCL. If it fails and you have an errno of EROFS then you know it's a read-only filesystem. This would have the annoying side effect of actually creating a file you didn't care about, but you could unlink(2) it immediately after creating it.
statvfs(3) - One of the fields of the returned struct statvfs is f_flag, and one of the flags is ST_RDONLY for a read-only filesystem. However, the spec for statvfs(3) makes it clear that applications cannot depend on any of the fields containing valid information. It would seem there's a decent possibility ST_RDONLY might not be set for a read-only filesystem.
access(2) - If you know the mount point, you can use access(2) with the W_OK flag as long as you are running as a user who would have write access to the mountpoint. Ie, either you are root or it was mounted with your UID as a mount parameter. You would get a return value of -1 and an errno of EROFS.
Parsing /etc/mtab or /proc/mounts - Doesn't seem portable. Mac OS X seems to have neither of these, for example. Even if the system did have /etc/mtab I'm not sure the fields are consistent between OSes or if the mount options for read-only (ro on Linux) are portable.
Are there other ways I'm missing? If you needed to know if a filesystem was mounted read-only, how would you do it?

You could also popen the command mount and examine the output looking for your file system and seeing if it held the text " (ro,".
But again, that's not necessarily portable.
My option would be to not worry about whether the file system was mounted read only at all. Just try and create your file and, if it fails, tell the user what the error was. And, of course, give them the option of saving it somewhere else.
You really have to do that sort of thing anyway since, in any scenario where there's even a small gap between testing and doing, you may find the situation changes (probably not to the extent of making an entire file system read only but, who knows, maybe there is (or will be in the future) a file system that allows this).

utime(path, NULL);
If you have write perms, then that will give you ROFS or -- if permitted -- simply update the mtime on the directory, which is basically harmless.

What happens if there are too many files under a single directory in Linux?

If there are like 1,000,000 individual files (mostly 100k in size) in a single directory, flatly (no other directories and files in them), is there going to be any compromises in efficiency or disadvantages in any other possible ways?

ARG_MAX is going to take issue with that... for instance, rm -rf * (while in the directory) is going to say "too many arguments". Utilities that want to do some kind of globbing (or a shell) will have some functionality break.
If that directory is available to the public (lets say via ftp, or web server) you may encounter additional problems.
The effect on any given file system depends entirely on that file system. How frequently are these files accessed, what is the file system? Remember, Linux (by default) prefers keeping recently accessed files in memory while putting processes into swap, depending on your settings. Is this directory served via http? Is Google going to see and crawl it? If so, you might need to adjust VFS cache pressure and swappiness.
Edit:
ARG_MAX is a system wide limit to how many arguments can be presented to a program's entry point. So, lets take 'rm', and the example "rm -rf *" - the shell is going to turn '*' into a space delimited list of files which in turn becomes the arguments to 'rm'.
The same thing is going to happen with ls, and several other tools. For instance, ls foo* might break if too many files start with 'foo'.
I'd advise (no matter what fs is in use) to break it up into smaller directory chunks, just for that reason alone.

My experience with large directories on ext3 and dir_index enabled:
If you know the name of the file you want to access, there is almost no penalty
If you want to do operations that need to read in the whole directory entry (like a simple ls on that directory) it will take several minutes for the first time. Then the directory will stay in the kernel cache and there will be no penalty anymore
If the number of files gets too high, you run into ARG_MAX et al problems. That basically means that wildcarding (*) does not always work as expected anymore. This is only if you really want to perform an operation on all the files at once
Without dir_index however, you are really screwed :-D

Most distros use Ext3 by default, which can use b-tree indexing for large directories.
Some of distros have this dir_index feature enabled by default in others you'd have to enable it yourself. If you enable it, there's no slowdown even for millions of files.
To see if dir_index feature is activated do (as root):
tune2fs -l /dev/sdaX | grep features
To activate dir_index feature (as root):
tune2fs -O dir_index /dev/sdaX
e2fsck -D /dev/sdaX
Replace /dev/sdaX with partition for which you want to activate it.

When you accidently execute "ls" in that directory, or use tab completion, or want to execute "rm *", you'll be in big trouble. In addition, there may be performance issues depending on your file system.
It's considered good practice to group your files into directories which are named by the first 2 or 3 characters of the filenames, e.g.
aaa/
aaavnj78t93ufjw4390
aaavoj78trewrwrwrwenjk983
aaaz84390842092njk423
...
abc/
abckhr89032423
abcnjjkth29085242nw
...
...

The obvious answer is the folder will be extremely difficult for humans to use long before any technical limit, (time taken to read the output from ls for one, their are dozens of other reasons) Is there a good reason why you can't split into sub folders?

Not every filesystem supports that many files.
On some of them (ext2, ext3, ext4) it's very easy to hit inode limit.

I've got a host with 10M files in a directory. (don't ask)
The filesystem is ext4.
It takes about 5 minutes to
ls
One limitation I've found is that my shell script to read the files (because AWS snapshot restore is a lie and files aren't present till first read) wasn't able to handle the argument list so I needed to do two passes. Firstly construct a file list with find (wholename in case you want to do partial matches)
find /path/to_dir/ -wholename '*.ldb'| tee filenames.txt
then secondly read from a the file containing filenames and read all files. (with limited parallelism)
while read -r line; do
if test "$(jobs | wc -l)" -ge 10; then
wait -n
fi
{
#do something with 10x fanout
} &
done < filenames.txt
Posting here in case anyone finds the specific work-around useful when working with too many files.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string