Why is File::FcntlLock's l_type always "F_UNLCK" even if the file is locked? - linux

The Perl subroutine below uses File::FcntlLock to check if a file is locked.
Why does it return 0 and print /tmp/test.pid is unlocked. even if the file is locked?
sub getPidOwningLock {
my $filename = shift;
my $fs = new File::FcntlLock;
$fs->l_type( F_WRLCK );
$fs->l_whence( SEEK_SET );
$fs->l_start( 0 );
$fs->l_len( 0 );
my $fd;
if (!open($fd, '+<', $filename)) {
print "Could not open $filename\n";
return -1;
}
if (!$fs->lock($fd, F_GETLK)) {
print "Could not get lock information on $filename, error: $fs->error\n";
close($fd);
return -1;
}
close($fd);
if ($fs->l_type() == F_UNLCK) {
print "$filename is unlocked.\n";
return 0;
}
return $fs->l_pid();
}
The file is locked as follows (lock.sh):
#!/bin/sh
(
flock -n 200
while true; do sleep 1; done
) 200>/tmp/test.pid
The file is indeed locked:
~$ ./lock.sh &
[2] 16803
~$ lsof /tmp/test.pid
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
bash 26002 admin 200w REG 8,5 0 584649 test.pid
sleep 26432 admin 200w REG 8,5 0 584649 test.pid

fcntl and flock locks are invisible to each other.
This is a big problem for your use case because the flock utility that you're using in your shell script depends on flock semantics: the shell script runs a flock child process, which locks an inherited file descriptor and then exits. The shell keeps that file descriptor open (because the redirection is on a whole sequence of commands) until it wants to release the lock.
That plan can't work with fcntl because fcntl locks are not shared among processes. If there was a utility identical to flock but using fcntl, the lock would be released too early (as soon as the child process exits).
For coordination of a file lock between a perl process and a shell script, some options you can consider are:
port the shell script to zsh and use the zsystem flock builtin from the zsh/system module (note: in the documentation it claims to use fcntl in spite of its name being flock)
rewrite the shell script in perl
just use flock in the perl script (give up byte range locking and the "get locker PID" feature - but you can emulate that on Linux by reading /proc/locks)
write your own fcntl utility in C for use in the shell script (the usage pattern will be different - the shell script will have to background it and then kill it later to unlock - and it will need some way to tell the parent process when it has obtained or failed to obtain the lock, which will be hard because it's happening asynchronously now... maybe use the coprocess feature that some shells have).
run a small perl script from the shell script to do the locking (will need the same background treatment that a dedicated fcntl utility would need)
For more information on features of the different kinds of locks, see What is the difference between locking with fcntl and flock.

Related

How to get PID of perl daemon in init script?

I have the following perl script:
#!/usr/bin/perl
use strict;
use warnings;
use Proc::Daemon;
Proc::Daemon::Init;
my $continue = 1;
$SIG{TERM} = sub { $continue = 0 };
while ($continue) {
# stuff
}
I have the following in my init script:
DAEMON='/path/to/perl/script.pl'
start() {
PID=`$DAEMON > /dev/null 2>&1 & echo $!`
echo $PID > /var/run/mem-monitor.pid
}
The problem is, this returns the wrong PID! This returns the PID of the parent process which is started when the daemon is run, but that process is immediately killed off. I need to get the PID of the child process!
The Proc::Daemon says
Proc::Daemon does the following:
...
9. The first child transfers the PID of the second child (daemon) to the parent. Additionally the PID of the daemon process can be written into a file if 'pid_file' is defined. Then the first child exits.
and then later, under new ( %ARGS )
pid_file
Defines the path to a file (owned by the parent user) where the PID of the daemon process will be stored. Defaults to undef (= write no file).
Also look at Init() method description. This all implies that you may want to use new first.
The point is that it is the grand-child process that is the daemon. However, the childr passes the pid along and it is available to the parent. If pid_file => $file_name is set in the constructor (the daemon's) pid is written to that file.
A comment asks to not have shell script rely on a file written by another script.
I can see two ways to do that.
Print the pid, returned by the $daemon->Init(), from the parent and pick it up in the shell. This is defeated by redirects in the question, but I don't know why they are needed. The parent and child exit right as all is set up, while the daemon is detached from everything.
Shell script can start the Perl script with the desired log-file name as an argument, letting it write the daemon pid to that file by the above process. The file is still output by Perl, but what matters about it is decided by the shell script.
I'd like to include a statement from my comment below. I consider these superior to two other things that come to mind: picking the filename from a config-style file kept by the shell is more complicated, while parsing the process table may be unreliable.
I've seen this before and had to resort to using STDERR to send back the childs PID to the calling shell script. I've always assumed it was due to the mentioned unreliability of exit codes - but details were not clear in the documentation. Please try something like this:
#!/usr/bin/perl
use strict;
use warnings;
use Proc::Daemon;
if( my $pid = Proc::Daemon::Init() ) {
print STDERR $pid;
exit;
}
my $continue = 1;
$SIG{TERM} = sub { $continue = 0 };
while ($continue) {
sleep(20);
exit;
}
With a calling script like this:
#!/bin/bash
DAEMON='./script.pl'
start() {
PID=$($DAEMON 2>&1 >/dev/null)
echo $PID > ./mem-monitor.pid
}
start;
When the bash script is ran, it will capture the STDERR output (containing the correct PID), and store it in the file. Any STDOUT the Perl script produces would be sent to /dev/null - though this is unlikely as the 1st level Perl script does (in this case) exit fairly early on.
Thank you to the suggestions from zdim and Hakon. They are certainly workable, and got me on the right track, but ultimately I went a different route. Rather than relying on $!, I used ps and awk to get the PID, as follows:
DAEMON='/path/to/perl/script.pl'
start() {
$DAEMON > /dev/null 2>&1
PID=`ps aux | grep -v 'grep' | grep "$DAEMON" | awk '{print $2}'`
echo $PID > /var/run/mem-monitor.pid
}
This works and satisfies my OCD! Note the double quotes around "$DAEMON" in grep "$DAEMON".

How to handle updates from an continuous process pipe in Perl

I am trying to follow log files in Perl on Fedora but unfortunately, Fedora uses journalctl to read binary log files that I cannot parse directly. This, according to my understanding, means I can only read Fedora's log files by calling journalctl.
I tried using IO::Pipe to do this, but the problem is that $p->reader(..) waits until journalctl --follow is done writing output (which will be never since --follow is like tail -F) and then allows me to print everything out which is not what I want. I would like to be able to set a callback function to be called each time a new line is printed to the process pipe so that I can parse/handle each new log event.
use IO::Pipe;
my $p = IO::Pipe->new();
$p->reader("journalctl --follow"); #Waits for process to exit
while (<$p>) {
print;
}
I assume that journalctl is working like tail -f. If this is correct, a simple open should do the job:
use Fcntl; # Import SEEK_CUR
my $pid = open my $fh, '|-', 'journalctl --follow'
or die "Error $! starting journalctl";
while (kill 0, $pid) {
while (<$fh>) {
print $_; # Print log line
}
sleep 1; # Wait some time for new lines to appear
seek($fh,0,SEEK_CUR); # Reset EOF
}
open opens a filehandle for reading the output of the called command: http://perldoc.perl.org/functions/open.html
seek is used to reset the EOF marker: http://perldoc.perl.org/functions/seek.html Without reset, all subsequent <$fh> calls will just return EOF even if the called script issued additional output in the meantime.
kill 0,$pid will be true as long as the child process started by open is alive.
You may replace sleep 1 by usleep from Time::HiRes or select undef,undef,undef,$fractional_seconds; to wait less than a second depending on the frequency of incoming lines.
AnyEvent should also be able to do the job via it's AnyEvent::Handle.
Update:
Adding use POSIX ":sys_wait_h"; at the beginning and waitpid $pid, WNOHANG) to the outer loop would also detect (and reap) a zombie journalctl process:
while (kill(0, $pid) and waitpid($pid, WNOHANG) != $pid) {
A daemon might also want to check if $pid is still a child of the current process ($$) and if it's still the original journalctl process.
I have no access to journalctl, but if you avoid IO::Pipe and open the piped output directly then the data will not be buffered
use strict;
use warnings 'all';
open my $follow_fh, '-|', 'journalctl --follow' or die $!;
print while <$follow_fh>;

Linux flock, how to "just" lock a file?

In Bash, I'm trying to make a function getLock to be used with different lock names.
function getLock
{
getLock_FILE="${1}"
getLock_OP="${2}"
case "${getLock_OP}" in
"LOCK_UN")
flock -u "${getLock_FILE}"
rm -fr "${getLock_FILE}"
;;
"LOCK_EX")
flock -x "${getLock_FILE}"
esac
}
But flock says flock: bad number: myfilelock
How can I just lock a file, and release it when I want, without having to execute a command in the flock?
It is to be used like this:
getLock myfilelock LOCK_EX
somecommands
........
getLock myfilelock LOCK_UN
To lock the file:
exec 3>filename # open a file handle; this part will always succeed
flock -x 3 # lock the file handle; this part will block
To release the lock:
exec 3>&- # close the file handle
You can also do it the way the flock man page describes:
{
flock -x 3
...other stuff here...
} 3>filename
...in which case the file is automatically closed when the block exits. (A subshell can also be used here, via using ( ) rather than { }, but this should be a deliberate decision -- as subshells have a performance penalty, and scope variable modifications and other state changes to themselves).
If you're running a new enough version of bash, you don't need to manage file descriptor numbers by hand:
# this requires a very new bash -- 4.2 or so.
exec {lock_fd}>filename # open filename, store FD number in lock_fd
flock -x "$lock_fd" # pass that FD number to flock
exec $lock_fd>&- # later: release the lock
...now, for your function, we're going to need associative arrays and automatic FD allocation (and, to allow the same file to be locked and unlocked from different paths, GNU readlink) -- so this won't work with older bash releases:
declare -A lock_fds=() # store FDs in an associative array
getLock() {
local file=$(readlink -f "$1") # declare locals; canonicalize name
local op=$2
case $op in
LOCK_UN)
[[ ${lock_fds[$file]} ]] || return # if not locked, do nothing
exec {lock_fds[$file]}>&- # close the FD, releasing the lock
unset lock_fds[$file] # ...and clear the map entry.
;;
LOCK_EX)
[[ ${lock_fds[$file]} ]] && return # if already locked, do nothing
local new_lock_fd # don't leak this variable
exec {new_lock_fd}>"$file" # open the file...
flock -x "$new_lock_fd" # ...lock the fd...
lock_fds[$file]=$new_lock_fd # ...and store the locked FD.
;;
esac
}
If you're on a platform where GNU readlink is unavailable, I'd suggest replacing the readlink -f call with realpath from sh-realpath by Michael Kropat (relying only on widely-available readlink functionality, not GNU extensions).

Need an in-depth explanation of how to use flock in Linux shell scripting

I am working on a tiny Raspberry Pi cluster (4 pis). I have 3 Raspberry Pi nodes that will be leaving a message in a message.txt file on the head Pi. The head Pi will be in a loop checking the message.txt file to see if it has any lines. When it does I want to lock the file and then extract the info I need. The problem I am having is that I need to do multiple commands. The only ways I have found that allows multiple commands look like this...
(
flock -s 200
# ... commands executed under lock ...
) 200>/var/lock/mylockfile
The problem with this way is that it uses a sub shell. The problem with that is that I have "job" files labeled job_1 job_2 etc..... that I want to be able to use a counter with. If I place the increment of the counter in the subshell it will be considered only in the scope of the subshell. If I pull the incrementation out there is a chance that another pi will add an entry before I increment the counter and lock the file.
I have heard talk that there is a way to lock the file and run multiple commands and flow control and then unlock it all using flock. I have not seen any good examples though.
Here is my current code.
# Now go into loop to send out jobs as pis ask for more work
while [ $jobsLeftCount -gt 0 ]
do
echo "launchJobs.sh: About to check msg file"
msgLines=$(wc -l < $msgLocation)
if [ $msgLines ]; then
#FIND WAY TO LOCK FILE AND DO THAT HERE
echo "launchJobs.sh: Messages found. Locking message file to read contents"
(
flock -e 350
echo "Message Received"
while read line; do
#rename file to be sent to node "job"
mv $jobLocation$jobName$jobsLeftCount /home/pi/algo2/Jobs/job
#transfer new job to each script that left a message
scp /home/pi/algo2/Jobs/job pi#192.168.0.$line:/home/pi/algo2/Jobs/
jobsLeftCount=$jobsLeftCount-1;
echo $line
done < $msgLocation
#clear msg file
>$msgLocation
#UNLOCK MESG FILE HERE
) 350>>$msgLocation
echo "Head node has $jobsLeftCount remaining"
fi
#jobsLeftCount=$jobsLeftCount-1;
#echo "here is $jobsLeftCount file"
done
If the sub-shell environment is not acceptable, use braces in place of parentheses to group the commands:
{
flock -s 200
# ... commands executed under lock ...
} 200>/var/lock/mylockfile
This runs the commands executed under lock in a new I/O context, but does not start a sub-shell. Within the braces, all the commands executed will have file descriptor 200 open to the locked lock file.

How can I change the current directory in a thread-safe manner in Perl?

I'm using Thread::Pool::Simple to create a few working threads. Each working thread does some stuff, including a call to chdir followed by an execution of an external Perl script (from the jbrowse genome browser, if it matters). I use capturex to call the external script and die on its failure.
I discovered that when I use more then one thread, things start to be messy. after some research. it seems that the current directory of some threads is not the correct one.
Perhaps chdir propagates between threads (i.e. isn't thread-safe)?
Or perhaps it's something with capturex?
So, how can I safely set the working directory for each thread?
** UPDATE **
Following the suggestions to change dir while executing, I'd like to ask how exactly should I pass these two commands to capturex?
currently I have:
my #args = ( "bin/flatfile-to-json.pl", "--gff=$gff_file", "--tracklabel=$track_label", "--key=$key", #optional_args );
capturex( [0], #args );
How do I add another command to #args?
Will capturex continue die on errors of any of the commands?
I think that you can solve your "how do I chdir in the child before running the command" problem pretty easily by abandoning IPC::System::Simple as not the right tool for the job.
Instead of doing
my $output = capturex($cmd, #args);
do something like:
use autodie qw(open close);
my $pid = open my $fh, '-|';
unless ($pid) { # this is the child
chdir($wherever);
exec($cmd, #args) or exit 255;
}
my $output = do { local $/; <$fh> };
# If child exited with error or couldn't be run, the exception will
# be raised here (via autodie; feel free to replace it with
# your own handling)
close ($fh);
If you were getting a list of lines instead of scalar output from capturex, the only thing that needs to change is the second-to-last line (to my #output = <$fh>;).
More info on forking-open is in perldoc perlipc.
The good thing about this in preference to capture("chdir wherever ; $cmd #args") is that it doesn't give the shell a chance to do bad things to your #args.
Updated code (doesn't capture output)
my $pid = fork;
die "Couldn't fork: $!" unless defined $pid;
unless ($pid) { # this is the child
chdir($wherever);
open STDOUT, ">/dev/null"; # optional: silence subprocess output
open STDERR, ">/dev/null"; # even more optional
exec($cmd, #args) or exit 255;
}
wait;
die "Child error $?" if $?;
I don't think "current working directory" is a per-thread property. I'd expect it to be a property of the process.
It's not clear exactly why you need to use chdir at all though. Can you not launch the external script setting the new process's working directory appropriately instead? That sounds like a more feasible approach.

Resources