I'm trying to determine whether it's possible to distinguish between two separate handles on the same file, and a single handle with two file descriptors pointing to it, using metadata from procfs.
Case 1: Two File Handles
# setup
exec 3>test.lck
exec 4>test.lck
# usage
flock -x 3 # this grabs an exclusive lock
flock -s 4 # this blocks
echo "This code is never reached"
Case 2: One Handle, Two FDs
# setup
exec 3>test.lck
exec 4>&3
# usage
flock -x 3 # this grabs an exclusive lock
flock -s 4 # this converts that lock to a shared lock
echo "This code gets run"
If I'm inspecting a system's state from userland after the "setup" stage has finished and before the "usage", and I want to distinguish between those two cases, is the necessary metadata available? If not, what's the best way to expose it? (Is adding kernelspace pointers to /proc/*/fdinfo a reasonable action, which upstream is likely to accept as a patch?)
I'm unaware of anything exposing this in proc as it is. Figuring this out may be useful when debugging some crap, but then you can just inspect the state with the kernel debugger or a systemtap script.
From your question it seems you want to achieve this in a manner which can be easily scripted and here I have to ask what is the real problem.
I have no idea if linux folks would be interested in exposing this. One problem is that exposing a pointer to file adds another infoleak and thus would be likely plugged in the future. Other means would require numbering all file objects and that's not going to happen. Regardless, you would be asked for a justification in a similar way I asked you above.
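This does not answer the procfs question, but the difference between the two cases can at least be observed from inside the process that owns the descriptors: descriptors created by dup (or `exec 4>&3`) share a single file offset, while descriptors from separate opens each have their own. A minimal sketch, assuming Python on Linux; the helper name `shares_open_file_description` is made up for illustration, and the probe is racy if anything else is using the descriptors at the same time:

```python
import os
import tempfile


def shares_open_file_description(fd_a, fd_b):
    """Heuristic: move fd_a's offset and check whether fd_b's offset follows."""
    saved_a = os.lseek(fd_a, 0, os.SEEK_CUR)
    saved_b = os.lseek(fd_b, 0, os.SEEK_CUR)
    probe = saved_b + 1  # any offset that differs from fd_b's current one
    try:
        os.lseek(fd_a, probe, os.SEEK_SET)
        return os.lseek(fd_b, 0, os.SEEK_CUR) == probe
    finally:
        # Put both offsets back where we found them.
        os.lseek(fd_a, saved_a, os.SEEK_SET)
        os.lseek(fd_b, saved_b, os.SEEK_SET)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.lck")
    fd3 = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    fd4 = os.open(path, os.O_WRONLY)  # case 1: a second, independent open
    fd5 = os.dup(fd3)                 # case 2: second descriptor, same handle
    separate_result = shares_open_file_description(fd3, fd4)
    dup_result = shares_open_file_description(fd3, fd5)
    for fd in (fd3, fd4, fd5):
        os.close(fd)

print(separate_result, dup_result)  # False True
```

From outside the process, the `pos` field in /proc/PID/fdinfo/N gives a weaker version of the same test: two descriptors showing different positions cannot be the same handle, but equal positions prove nothing.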
Related
Is there a way to make a bash script process messages that have been sent to it using the "write" command? So for example, if a user wants to activate a feature in my script, could I make it so that they can send the script a command using the write command?
One possible method I thought of was to configure logging for a screen session and then have the bash script parse text through there, but I'm not sure if there would be a simpler or more efficient way to tackle this
EDIT: I was thinking as an alternative solution I could use a named pipe. I'm worried that it would break though if the tmp partition gets filled up completely (not sure if this would impact write as well?). I'm going to be running this script on a shared box, and every once in a while someone will completely fill up the /tmp partition and then just leave it like that until people start complaining
Hmm, you are really trying to bend a simple Unix command into doing something it was not designed for. From the man page (emphasis mine):
The write utility allows you to communicate with other users, by copying
lines from your terminal to theirs
That means that write is intended to copy lines directly to terminals. As soon as you say "I will dump terminal output with screen, and then parse the dump file", you lose the simplicity of write (and you also need disk space, with the problem of removing old lines from a sequential file).
Worse, as your script lives on its own, it could (should?) be a daemon script attached to no terminal
So if I have correctly understood your question, your requirements are:
a script that does some tasks and should be able to respond to asynchronous requests. Common mechanisms are named pipes, or network or Unix domain sockets; less common are files in a dedicated folder, with an optional signal to trigger immediate processing. Appending lines to a sequential file is possible but uncommon, because access to the file has to be synchronized
a simple and convenient way for users to pass requests. OK, write is nice for that part, but much too hard to interface with, IMHO
If you want to save time on that part by using standard tools, I would recommend the mail system. It is trivial to alias a mail address to a program that will be called with the mail message as input. But I am not sure it is worth it, because the user could directly call the program with the request as input or as a command line parameter.
So the client part could be simply a program that:
create a temporary file in a dedicated folder (mkstemp is your friend in C or C++, or mktemp in shell - but beware of race conditions)
write the request to that file
optionally send a signal to a PID, provided the script writes its own PID to a dedicated file on startup
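The three client steps above can be sketched as follows. This is only an illustration under stated assumptions: the function name `submit_request`, the `req-` prefix, the spool directory layout, and the choice of SIGUSR1 as the "new request" signal are all made up here, not something the server side is known to expect.

```python
import os
import signal
import tempfile


def submit_request(spool_dir, text, pid_file=None):
    """Drop a request file into spool_dir and optionally signal the server."""
    # mkstemp creates the file atomically under a unique name, so two
    # clients cannot race for the same file name.
    fd, request_path = tempfile.mkstemp(prefix="req-", dir=spool_dir)
    with os.fdopen(fd, "w") as handle:
        handle.write(text)

    if pid_file is not None:
        # The server is assumed to have written its PID here on startup.
        with open(pid_file) as handle:
            server_pid = int(handle.read().strip())
        os.kill(server_pid, signal.SIGUSR1)  # assumed "new request" signal
    return request_path


# Usage with a throwaway spool directory and no signal:
demo_spool = tempfile.mkdtemp()
demo_path = submit_request(demo_spool, "enable feature-x\n")
```

If the server polls the directory rather than waiting for the signal, it may see a file before the request is fully written; writing under a temporary name and renaming when done is the usual remedy.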
What is the most straightforward way to create a "virtual" file in Linux that would allow read operations on it, always returning the output of some particular command (run every time the file is read from)? So, every read operation would cause the command to be executed, with its output captured and passed on as the "content" of the file.
There is no way to create such a so-called "virtual file". On the other hand, you would be able to achieve this behaviour by implementing a simple synthetic filesystem in userspace via FUSE. Moreover, you don't have to use C; there are bindings even for scripting languages such as Python.
Edit: And chances are that something like this already exists: see for example scriptfs.
This is a great answer I copied below.
Basically, named pipes let you do this in scripting, and FUSE lets you do it easily in Python.
You may be looking for a named pipe.
mkfifo f
{
echo 'V cebqhpr bhgchg.'
sleep 2
echo 'Urer vf zber bhgchg.'
} >f &   # run the writer in the background: opening the pipe blocks until a reader appears
rot13 < f
Writing to the pipe doesn't start the listening program. If you want to process input in a loop, you need to keep a listening program running.
while true; do rot13 <f >decoded-output-$(date +%s.%N); done
Note that all data written to the pipe is merged, even if there are multiple processes writing. If multiple processes are reading, only one gets the data. So a pipe may not be suitable for concurrent situations.
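The same writer/reader pairing can be sketched in Python, with the writer running in a thread so that both ends of the pipe get opened. This is a minimal illustration, assuming a POSIX system; the function names are made up, and `codecs.decode(..., "rot13")` stands in for the rot13 command:

```python
import codecs
import os
import tempfile
import threading


def write_messages(fifo_path, lines):
    # Opening a FIFO for writing blocks until something opens it for reading.
    with open(fifo_path, "w") as fifo:
        for line in lines:
            fifo.write(line + "\n")


def read_decoded(fifo_path):
    # The reader sees end-of-file once every writer has closed its end.
    with open(fifo_path) as fifo:
        return [codecs.decode(line.rstrip("\n"), "rot13") for line in fifo]


with tempfile.TemporaryDirectory() as tmp:
    fifo_path = os.path.join(tmp, "f")
    os.mkfifo(fifo_path)
    writer = threading.Thread(
        target=write_messages,
        args=(fifo_path, ["V cebqhpr bhgchg.", "Urer vf zber bhgchg."]),
    )
    writer.start()
    decoded = read_decoded(fifo_path)
    writer.join()

print(decoded)  # ['I produce output.', 'Here is more output.']
```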
A named socket can handle concurrent connections, but this is beyond the capabilities for basic shell scripts.
At the most complex end of the scale are custom filesystems, which let you design and mount a filesystem where each open, write, etc., triggers a function in a program. The minimum investment is tens of lines of nontrivial coding, for example in Python. If you only want to execute commands when reading files, you can use scriptfs or fuseflt.
No one mentioned this but if you can choose the path to the file you can use the standard input /dev/stdin.
Every time the cat program runs, it reads the output of the program writing to the pipe, which here is simply echo my input:
for i in 1 2 3; do
echo my input | cat /dev/stdin
done
outputs:
my input
my input
my input
I'm afraid this is not easily possible. When a process reads from a file, it uses system calls like open, fstat, read. You would need to intercept these calls and output something different from what they would return. This would require writing some sort of kernel module, and even then it may turn out to be impossible.
However, if you simply need to trigger something whenever a certain file is accessed, you could play with inotifywait:
#!/bin/bash
while inotifywait -qq -e access /path/to/file; do
echo "$(date +%s)" >> /tmp/access.txt
done
Run this as a background process, and you will get an entry in /tmp/access.txt each time your file is being read.
I want to create a temporary file on linux while making sure that the file will disappear after my program has terminated, even if it got killed or someone performs a hard reboot in the wrong moment. Does tmpfile() handle all this for me?
You seem preoccupied with the idea that files might somehow get left behind because of a race condition, but I don't see an explanation of why this is a concern.
"A race condition occurs when a program doesn't work as it's supposed to because of an unexpected ordering of events that produces contention over the same resource."
From your comments on other answers, I assumed your concern was specifically a deadlock resulting from an attempt to remedy a race condition (contention over the shared resource). It is still not clear what your concern is. Calling tmpfile() and having the program exit abnormally before that function gets to call unlink() is the least of your worries if your application is really that fragile.
Given that there isn't any mention of concurrency, threading or other processes sharing this file descriptor to this temp file, I still don't see the possibility for a race condition, maybe the concept of an incomplete logical transaction, but that can be detected and cleaned up.
The correct way to make absolutely sure that any allocated file system resources are cleaned up is to do so not only on exit of an application but also on start-up. All my server code makes sure that everything is cleaned up from a previous run before it starts and makes itself available.
Put your temp files in a sub-dir in /tmp, and make sure your application cleans this sub-dir on startup and normal shutdown. You can wrap your app start-up with a shell script that detects an abnormal (kill -9) shutdown based on PID existence and also does the clean-up.
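A start-up clean-up along those lines might look like the sketch below. The function name `prepare_workdir` and the directory layout are assumptions for illustration only, and the sketch does nothing to stop two instances of the application from wiping each other's files; that would need a lock as well.

```python
import os
import shutil
import tempfile


def prepare_workdir(workdir):
    """Remove leftovers from a previous run, then recreate an empty directory."""
    shutil.rmtree(workdir, ignore_errors=True)  # fine if it does not exist yet
    os.makedirs(workdir, mode=0o700)
    return workdir


# Simulate a previous run that was killed and left a file behind.
demo_base = tempfile.mkdtemp()
demo_work = os.path.join(demo_base, "myapp")  # hypothetical application sub-dir
os.makedirs(demo_work)
open(os.path.join(demo_work, "stale.tmp"), "w").close()

prepare_workdir(demo_work)
leftovers = os.listdir(demo_work)
print(leftovers)  # []
```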
If you don't want to use tmpfile(), you can unlink() your file immediately after creating it. It will stay open and present and allocated until it is closed.
But on a hard reboot, a fsck might be needed in order to recover the space. But as this is always the case, it is no special drawback of this approach.
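The unlink-right-after-create approach can be sketched like this, assuming a POSIX system; the file name is arbitrary. Python's own `tempfile.TemporaryFile()` relies on the same behaviour on Unix.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "scratch.dat")
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o600)
    os.unlink(path)                       # the name is gone immediately ...
    name_still_exists = os.path.exists(path)

    os.write(fd, b"still usable")         # ... but the open descriptor still works
    os.lseek(fd, 0, os.SEEK_SET)
    data = os.read(fd, 100)
    os.close(fd)                          # the storage is released here

print(name_still_exists, data)  # False b'still usable'
```

Because the kernel drops the file when the last descriptor is closed, this also covers the case where the process is killed with kill -9.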
According to the tmpfile() man page:
The file will be automatically deleted when it is closed or the
program terminates.
I have not tested, but it seems it should do what you want.
Moreover:
The default location, if TMPDIR is not set, is /tmp.
So after a reboot, /tmp will normally be empty, at least on systems that clear it at boot or keep it on a RAM-based filesystem.
EDIT: Yes
I checked the tmpfile source, and it does indeed use glglgl's trick: it unlinks the file immediately after creating it.
Original:
I would say no. Getting killed should work, but I would assume that after a hard reboot (e.g. due to a power outage) the file can still be there. That depends on your Linux distribution and the settings used.
If the temp file is created in a ramdisk, it is gone (there are Unix distributions out there that e.g. use a RAM-based tmpfs for temporary files).
Or if you use an environment that has certain policy regarding tmp, it could be also gone (maybe not instant, but often there are policies, like e.g. remove all files in /tmp that are not accessed within one month), but it could be also on a standard file system where such rules are not enforced. In this case the file would stay.
The customary approach is to set up a signal handler to clean up if the program is interrupted. This will not handle kill -9 or a physical reboot, which can't be trapped. Create temporary files in /tmp, which is normally cleaned out when the system boots. All that remains then is to teach people not to use kill -9 when they don't need to, but that appears to be an uphill battle.
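A minimal sketch of that customary approach, assuming Python and a made-up `myapp-` file prefix. It covers normal exit, plain kill (SIGTERM) and Ctrl-C (SIGINT); as noted above, nothing can be done about kill -9.

```python
import atexit
import os
import signal
import sys
import tempfile

fd, temp_path = tempfile.mkstemp(prefix="myapp-")  # "myapp-" is a made-up prefix
os.close(fd)


def cleanup():
    # Safe to call more than once: ignore a file that is already gone.
    try:
        os.unlink(temp_path)
    except FileNotFoundError:
        pass


def on_signal(signum, frame):
    cleanup()
    sys.exit(128 + signum)  # conventional exit status for "terminated by signal"


atexit.register(cleanup)                   # normal exit
signal.signal(signal.SIGTERM, on_signal)   # plain `kill`
signal.signal(signal.SIGINT, on_signal)    # Ctrl-C
# SIGKILL (kill -9) cannot be caught, so no handler can be installed for it.
```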
On Linux, the mktemp command works.
I wanted to quickly implement some sort of locking in perl program on linux, which would be shareable between different processes.
So I used mkdir as an atomic operation, which returns 1 if the directory doesn't exist and 0 if it does. I remove the directory right after the critical section.
Now, it was pointed out to me that this is not good practice in general (independent of the language). I think it's quite OK, but I would like to ask your opinion.
edit:
to show an example, my code looked something like this:
while (!mkdir "lock_dir") { sleep 1 }   # retry until the directory can be created
# ... critical section ...
rmdir "lock_dir";
IMHO this is a very bad practice. What if the perl script which created the lock directory somehow got killed during the critical section? Another perl script waiting for the lock dir to be removed will wait forever, because it won't get removed by the script which originally created it.
To use safe locking, use flock() on a lock file (see perldoc -f flock).
This is fine until an unexpected failure (e.g. program crash, power failure) happens while the directory exists.
After this, the program will never run because the lock is locked forever (assuming the directory is on a persistent filesystem).
Normally I'd use flock with LOCK_EXCL instead.
Open a file for reading+writing, creating it if it doesn't exist. Then take the exclusive lock. If that fails (when you use LOCK_NB), some other process has it locked.
After you've got the lock, you need to keep the file open.
The advantage of this approach is that if the process dies unexpectedly (for example, it crashes, is killed, or the machine fails), the lock is automatically released.
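The question is about Perl, but the same flock pattern is easy to show in Python as a sketch; the helper name `try_lock` and the lock file name are made up for illustration. Two separate opens within one process conflict with each other just as two processes would, which makes the behaviour easy to demonstrate:

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Return an open fd holding an exclusive lock, or None if it is taken."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd  # keep this open for as long as the lock is needed


with tempfile.TemporaryDirectory() as tmp:
    lock_path = os.path.join(tmp, "app.lock")
    first = try_lock(lock_path)    # succeeds
    second = try_lock(lock_path)   # separate open: the lock is already held
    os.close(first)                # closing the descriptor releases the lock
    third = try_lock(lock_path)    # succeeds again
    os.close(third)

print(first is not None, second is None, third is not None)  # True True True
```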
I'm trying to write a program that automatically sets process priorities based on a configuration file (basically path - priority pairs).
I thought the best solution would be a kernel module that replaces the execve() system call. Too bad, the system call table isn't exported in kernel versions > 2.6.0, so it's not possible to replace system calls without really ugly hacks.
I do not want to do the following:
-Replace binaries with shell scripts, that start and renice the binaries.
-Patch/recompile my stock Ubuntu kernel
-Do ugly hacks like reading kernel executable memory and guessing the syscall table location
-Polling of running processes
I really want to be:
-Able to control the priority of any process based on its executable path, and a configuration file. Rules apply to any user.
Does anyone of you have any ideas on how to complete this task?
If you've settled for a polling solution, most of the features you want to implement already exist in the Automatic Nice Daemon. You can configure nice levels for processes based on process name, user and group. It's even possible to adjust process priorities dynamically based on how much CPU time it has used so far.
Sometimes polling is a necessity, and even more optimal in the end -- believe it or not. It depends on a lot of variables.
If the polling overhead is low enough, its simplicity far outweighs the added complexity, cost, and RISK of developing your own kernel hooks to get notified of the changes you need. That said, when hooks or notification events are available, or can be easily injected, they should certainly be used if the situation calls for it.
This is classic programmer 'perfection' thinking. As engineers, we strive for perfection. This is the real world though and sometimes compromises must be made. Ironically, the more perfect solution may be the less efficient one in some cases.
I develop a similar 'process and process priority optimization automation' tool for Windows called Process Lasso (not an advertisement, it's free). I had a similar choice to make and have a hybrid solution in place. Kernel mode hooks are available for certain process related events in Windows (creation and destruction), but they not only aren't exposed at user mode, but also aren't helpful at monitoring other process metrics. I don't think any OS is going to natively inform you of any change to any process metric. The overhead for that many different hooks might be much greater than simple polling.
Lastly, considering the HIGH frequency of process changes, it may be better to handle all changes at once (polling at interval) vs. notification events/hooks, which may have to be processed many more times per second.
You are RIGHT to stay away from scripts. Why? Because they are slow(er). Of course, the linux scheduler does a fairly good job at handling CPU bound threads by downgrading their priority and rewarding (upgrading) the priority of I/O bound threads -- so even in high loads a script should be responsive I guess.
There's another point of attack you might consider: replace the system's dynamic linker with a modified one which applies your logic. (See this paper for some nice examples of what's possible from the largely neglected art of linker hacking).
Where this approach will have problems is with purely statically linked binaries. I doubt there's much on a modern system which actually doesn't link something dynamically (things like busybox-static being the obvious exceptions, although you might regard the ability to get a minimal shell outside of your controls as a feature when it all goes horribly wrong), so this may not be a big deal. On the other hand, if the priority policies are intended to bring some order to an overloaded shared multi-user system then you might see smart users preparing static-linked versions of apps to avoid linker-imposed priorities.
Sure, just iterate through /proc/nnn/exe to get the pathname of the running image. Only use the ones with slashes, the others are kernel procs.
Check to see if you have already processed that one, otherwise look up the new priority in your configuration file and use renice(8) to tweak its priority.
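One polling pass along those lines could be sketched as follows, assuming Linux with /proc mounted. The function names and the `rules` mapping (executable path to nice value) are hypothetical stand-ins for the configuration file, and `os.setpriority` is used in place of calling renice.

```python
import os


def find_matches(rules):
    """Yield (pid, exe_path, nice) for running processes listed in rules."""
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            exe = os.readlink(f"/proc/{entry}/exe")
        except OSError:
            continue  # kernel thread, process already gone, or no permission
        if exe in rules:
            yield int(entry), exe, rules[exe]


def apply_rules(rules, seen):
    """Renice every matching process that has not been handled yet."""
    for pid, exe, nice in find_matches(rules):
        if pid in seen:
            continue
        try:
            os.setpriority(os.PRIO_PROCESS, pid, nice)
            seen.add(pid)
        except (PermissionError, ProcessLookupError):
            pass  # not ours to change, or it exited in the meantime
```

A daemon would call `apply_rules(rules, seen)` in a loop with a sleep in between. The `seen` set is a simplification: PIDs get reused, so a long-running version would also need to forget PIDs that have disappeared.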
If you want to do it as a kernel module then you could look into making your own binary loader. See the following kernel source files for examples:
$KERNEL_SOURCE/fs/binfmt_elf.c
$KERNEL_SOURCE/fs/binfmt_misc.c
$KERNEL_SOURCE/fs/binfmt_script.c
They can give you a first idea where to start.
You could just modify the ELF loader to check for an additional section in ELF files and when found use its content for changing scheduling priorities. You then would not even need to manage separate configuration files, but simply add a new section to every ELF executable you want to manage this way and you are done. See objcopy/objdump of the binutils tools for how to add new sections to ELF files.
Does anyone of you have any ideas on how to complete this task?
As an idea, consider using apparmor in complain-mode. That would log certain messages to syslog, which you could listen to.
If the processes in question are started by executing an executable file with a known path, you can use the inotify mechanism to watch for events on that file. Executing it will trigger an IN_OPEN and an IN_ACCESS event.
Unfortunately, this won't tell you which process caused the event to trigger, but you can then check which /proc/*/exe are a symlink to the executable file in question and renice the process id in question.
E.g. here is a crude implementation in Perl using Linux::Inotify2 (which, on Ubuntu, is provided by the liblinux-inotify2-perl package):
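perl -MLinux::Inotify2 -e '
use warnings;
use strict;
my $x = shift(@ARGV);    # path of the executable to watch
my $w = new Linux::Inotify2;
$w->watch($x, IN_ACCESS, sub
{
    # find every process whose /proc/PID/exe points at the watched file
    for (glob("/proc/*/exe"))
    {
        if (-r $_ && readlink($_) eq $x && m#^/proc/(\d+)/#)
        {
            system(@ARGV, $1)    # run the remaining arguments with the PID appended
        }
    }
});
1 while $w->poll
' /bin/ls renice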
You can of course save the Perl code to a file, say onexecuting, prepend a first line #!/usr/bin/env perl, make the file executable, put it on your $PATH, and from then on use onexecuting /bin/ls renice.
Then you can use this utility as a basis for implementing various policies for renicing executables. (or doing other things).