fork wget downloads with the ability to control specific ones - linux

I'm writing a bash script, a simple download manager. The point of interest is making these simple lines more advanced:
for link in ${links}; do
    wget -q --show-progress "${link}"
done
How can I fork all downloads and give the script's user a friendly way to kill one specific download after they have all started?
Does wget -bqc run downloads in parallel or not?
Is there something to use instead of --show-progress that lets the script's user see the current status of a specific download?

So the idea is:
# declare associative array [download_url]=pid_of_download
declare -A downloads
# declare array of killed downloads
declare -a killedDownloads=()

function download() {
    local url=$1
    # killed downloads can be re-downloaded, but not successfully downloaded URLs
    if [[ -n ${downloads[${url}]} ]] && [[ ! ${killedDownloads[*]} =~ ${downloads[${url}]} ]]; then
        echo "Already download-[ing/ed]!"
    else
        wget -q "${url}" &
        downloads[${url}]=$!
    fi
}
$! contains the process ID of the most recently executed background command, i.e. the PID of the wget we just launched.
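A minimal illustration of how $! behaves (sleep stands in for any background job):
sleep 30 &          # start any background job
echo "PID: $!"      # $! now holds the PID of that sleep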
When the user wants to kill some download, we print the whole downloads array. While printing, we can also show each download's status, obtained as a substring from:
jobs -l | grep ${downloads[$i]}
...and use killedDownloads to know which URLs were actually killed rather than finished:
if [[ ${killedDownloads[*]} =~ ${downloads[$i]} ]]
    # set status var to killed
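Put together, a listing function might look like this (a sketch; the grep against jobs -l is a simple substring match, as in the original idea):
function list() {
    local url pid status
    for url in "${!downloads[@]}"; do
        pid=${downloads[$url]}
        if [[ ${killedDownloads[*]} =~ ${pid} ]]; then
            status="killed"
        elif jobs -l | grep -q "${pid}"; then
            status="running"
        else
            status="finished"
        fi
        echo "${pid}  ${status}  ${url}"
    done
}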
The kill itself:
function stop() {
    local pid=$1
    if ps -p ${pid} > /dev/null; then
        kill -9 ${pid}
        wait ${pid} 2>/dev/null
        echo "Killed downloading of ${pid}"
        killedDownloads+=("${pid}")
    else
        echo "No active download with id = ${pid} exists"
    fi
}
And of course you'll need to put all this inside some interactive loop (a rough sketch follows), plus checks for invalid URLs, etc.
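A minimal sketch of such a loop, assuming the download/stop functions above and a list function like the earlier sketch:
while true; do
    read -r -p "> " cmd arg
    case ${cmd} in
        get)  download "${arg}" ;;
        list) list ;;
        stop) stop "${arg}" ;;
        quit) break ;;
        *)    echo "commands: get <url> | list | stop <pid> | quit" ;;
    esac
done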

Related

How to make sure child processes have correct user ID?

As an étude, I wrote a quick-and-dirty bash script to set a radio alarm a few months ago. It sets a cron/at-job to start streaming an internet radio station at a specified time (I use mplayer), and records the job id in a file for easy undoing. However, as it stands, the logic for turning off a running alarm is simply to kill the most recent couple of mplayer instances. This is potentially a problem if you're watching a video when the alarm goes off, or running a batch job converting audio or video files...
So I thought I'd create a designated virtual user for running this script and, instead of killing the most recent mplayer instances, kill all and only those invoked by this user. I thus created a user radiowecker and invoke the script with sudo -u radiowecker /var/lib/radiowecker/wecker $1. However, this doesn't seem to do the job: while the at-job does show up as radiowecker's, the mplayer instances it spawns are filed under my UID. How do I ensure the child processes are also filed as radiowecker's?
if [ "$2" ]; then
    stream="$2"
else
    stream=http://mp3stream1.apasf.apa.at:8000
fi
if [ "$1" ]; then
    # with argument, set new radio alarm using 'at' and log the at-job-id
    remove_log_when_playing="rm ~/.local/bin/weckerlogs/${*}"
    play_radio="mplayer $stream -cache 1024"
    show_running="touch ~/.local/bin/alarm_running"
    printf "$remove_log_when_playing && $show_running && $play_radio" | at "${*}" \
        && echo $(atq | sort -nr | { read first last ; echo $first ; }) >> ~/.local/bin/weckerlogs/"${*}"
else
    if [[ $(pgrep mplayer) && -e ~/.local/bin/alarm_running ]]; then
        rm ~/.local/bin/alarm_running
        # turn off running mplayer, assumed to be called from an earlier alarm
        for i in 0 1; do
            for id in $(pgrep mplayer); do
                WECKER=$id
            done
            kill $WECKER
        done
    else
        # turning off an alarm in the future has its own tool
        echo "No active mplayer instances found."
        echo "To turn off a future alarm, instead"
        echo "use 'wecker-aus' <time>!"
        echo "Currently set alarms:"
        ls ~/.local/bin/weckerlogs/
    fi
fi
Turning off a future alarm:
#!/bin/bash
log=~/.local/bin/weckerlogs/"${*}"
atrm $(cat "$log")
rm "$log"

Shell scripts and how to avoid running the same script at the same time on a Linux machine

I have a centralized Linux server – Linux 5.x.
In some cases the get_hosts.ksh script on my Linux server can be started from several other hosts.
For example, get_hosts.ksh could run on my Linux machine three or more times at the same time.
My question:
How do I avoid running multiple instances of the process/script?
A common solution for this problem on *nix systems is to check for the existence of a lock file. Usually the lock file contains the current process's PID.
This is an example ksh script:
#!/bin/ksh
pid="/var/run/get_hosts.pid"
trap "rm -f $pid" SIGSEGV
trap "rm -f $pid" SIGINT

if [ -e $pid ]; then
    exit # pid file exists, another instance is running, so we politely exit
else
    echo $$ > $pid # pid file doesn't exist, create one and go on
fi

# your normal workflow here...

rm -f $pid # remove pid file just before exiting
exit
UPDATE: Answering the OP's comment, I added handling of program interruptions and segfaults with the trap command.
The normal way of doing this is to write the process id into a file. The first thing the script does is check for the existence of that file, read the pid, check whether a process with that pid exists, and, for extra paranoia points, whether that process is actually running the script. If yes, the script exits.
Here's a simple example. The process in question is a binary, and this script makes sure the binary runs only once. This is not exactly what you need, but you should be able to adapt it:
RUNNING=0
PIDFILE=$PATH_TO/var/run/example.pid
if [ -f $PIDFILE ]; then
    PID=$(cat $PIDFILE)
    ps -eo pid | grep $PID >/dev/null 2>&1
    if [ $? -eq 0 ]; then
        RUNNING=1
    fi
fi

if [ $RUNNING -ne 1 ]; then
    run_binary
    PID=$!
    echo $PID > $PIDFILE
fi
This is not very elaborate but should get you on the right track.
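For the "extra paranoia points" mentioned above — making sure the stored PID belongs to this script rather than to an unrelated process that happens to have the recycled PID — a sketch along these lines (matching the script name in the output of ps -o args= is an assumption about your ps, not a guarantee):
PIDFILE=/var/run/example.pid
if [ -f "$PIDFILE" ]; then
    PID=$(cat "$PIDFILE")
    # check not only that the PID exists, but that it is running this script
    if ps -p "$PID" -o args= 2>/dev/null | grep -q "$(basename "$0")"; then
        echo "already running as PID $PID"
        exit 1
    fi
fi
echo $$ > "$PIDFILE"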
You can use a pid file to keep track of whether the process is running. At the top of the script, check for the existence of the pid file; if it doesn't exist, create it and run the script, otherwise exit.
Some sample code can be seen in this answer to a similar question.
You might consider using the (optional) lockfile(1) command (provided by the procmail package on Debian).
I have a lot of scripts, and I use the code below in them to prevent multiple/simultaneous runs:
PID="/var/scripts/PID.txt" # temp file
if [ ! -f "$PID" ]; then
    echo $$ > "$PID" # print actual PID into the file
else
    ps -p $(cat "$PID") > /dev/null && exit || echo $$ > "$PID"
fi
Building on wallenborn's answer, I also added a "staleness" check, in case the PID lock file is older than a certain expected age in seconds.
# prevent simultaneous executions within an hourish
pid_file="$HOME/.harness.pid"
max_stale_seconds=3600
if [ -f "$pid_file" ]; then
    pid="$(cat "$pid_file")"
    let age_in_seconds="$(date +%s) - $(date -r "$pid_file" +%s)"
    if ps $pid >/dev/null && [ $age_in_seconds -lt $max_stale_seconds ]; then
        exit 1
    fi
fi
echo $$ > "$pid_file"
trap "rm -f \"$pid_file\"" SIGSEGV
trap "rm -f \"$pid_file\"" SIGINT
This could be made "smarter" by killing off the other execution when its PID is valid, but that would be dangerous. Consider a sudden power failure and reset, after which the PID file contains a number that may now refer to a completely different process.
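One way to reduce that PID-reuse risk is to compare the candidate process's age with the lock file's age: the genuine lock holder must have started before it wrote the file, so a process younger than the file cannot be it. A sketch, assuming a ps that supports the etimes (elapsed seconds) keyword:
pid=$(cat "$pid_file")
file_age=$(( $(date +%s) - $(date -r "$pid_file" +%s) ))
proc_age=$(ps -p "$pid" -o etimes= 2>/dev/null)
# treat the lock as live only if the process predates its own lock file
if [ -n "$proc_age" ] && [ "$proc_age" -ge "$file_age" ]; then
    exit 1
fi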

How to check if script is running or not from script itself?

I have the sample script sample.sh below:
#!/bin/bash
if ps aux | grep -o "sample.sh" > /dev/null; then
    echo "Already script running"
    exit 0
fi
echo "start script"
while true; do
    echo "script running"
    sleep 5
done
In the above script I want to check whether this script is already running; if it is, it should not run again.
The problem is that the check always evaluates to true (because checking the condition requires running the script), so it always shows the "Already script running" message.
Any idea how to solve this?
You need a proper lock. I'd do it using flock, like this:
exec 201> /tmp/lock.$(basename $0).file
if ! flock -n 201; then
    echo "another instance of $0 is running"
    exit 1
fi

# cmds

exec 201>&-
rm -rf /tmp/lock.$(basename $0).file
This basically creates a lock for the script using a temporary file. The temporary file has no particular significance other than being the object the lock is taken on; it is just used to tell whether your script holds the lock.
While an instance of this program is running, the next run of the same program can't proceed, because the lock prevents it.
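A related pattern from the flock(1) man page makes the script re-exec itself under flock, so the locking is self-contained in one line near the top of the script (the FLOCKER variable only guards against re-executing forever):
[ "${FLOCKER}" != "$0" ] && exec env FLOCKER="$0" flock -en "$0" "$0" "$@" || :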
For me it is safer to use a lock file: create it when the process starts and delete it after completion.
Let the script record its own PID in a file. Before doing so, it first checks whether that file currently contains an active PID, in which case it exits:
pid=$(< "${PID_FILE:?}") || exit
kill -0 "$pid" && exit
The next exercise is to prevent race conditions when writing the file (see the sketch below).
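One way to close that race in bash is the noclobber option, which makes the > redirection itself fail atomically if the file already exists. A sketch, reusing the PID_FILE variable from above:
set -o noclobber
if ! echo $$ > "${PID_FILE:?}" 2>/dev/null; then
    pid=$(< "$PID_FILE")
    kill -0 "$pid" 2>/dev/null && exit  # holder still alive, give up
    echo $$ >| "$PID_FILE"              # stale file: force-overwrite
fi
set +o noclobber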
Try this; it gives the number of sample.sh instances run by the user:
ps -aux | awk -v app='sample.sh' '$0 ~ app { print $1 }' | grep $USERNAME | wc -l
Write a tmp file to the /tmp directory.
Have your script check whether the file exists; if it does, don't run.
#!/bin/sh
# our tmpfile
tmpfile="/tmp/mytmpfile"

# check to see if it exists;
# if it does, then exit the script
if [ -f "${tmpfile}" ]; then
    echo "script already running."
    exit
fi

# it doesn't exist at this point, so let's make one
touch "${tmpfile}"

# do whatever now.

# end of script
rm "${tmpfile}"

wait child process but get error: 'pid is not a child of this shell'

I wrote a script to get data from HDFS in parallel, and then I wait for these child processes in a for loop, but sometimes it returns "pid is not a child of this shell". Sometimes it works well. It's puzzling. I use "jobs -l" to show all the jobs running in the background, and I am sure these PIDs are children of the shell process; I also use "ps aux" to make sure these PIDs are not assigned to other processes. Here is my script.
PID=()
FILE=()
let serial=0
while read index_tar
do
    echo $index_tar | grep index > /dev/null 2>&1
    if [[ $? -ne 0 ]]; then
        continue
    fi
    suffix=$(printf '%03d' $serial)
    mkdir input/output_$suffix
    $HADOOP_HOME/bin/hadoop fs -cat $index_tar | tar zxf - -C input/output_$suffix \
        && mv input/output_$suffix/index_* input/output_$suffix/index &
    PID[$serial]=$!
    FILE[$serial]=$index_tar
    let serial++
done < file.list

for ((i = 0; i < serial; i++))
do
    wait ${PID[$i]}
    if [[ $? -ne 0 ]]; then
        LOG "get ${FILE[$i]} failed, PID:${PID[$i]}"
        exit 1
    else
        LOG "get ${FILE[$i]} success, PID:${PID[$i]}"
    fi
done
Just find the process id of the process you want to wait for, and replace 12345 with it in the script below. Further changes can be made to fit your requirements.
#!/bin/sh
PID=12345
while [ -e /proc/$PID ]; do
    echo "Process: $PID is still running" >> /home/parv/waitAndRun.log
    sleep .6
done
echo "Process $PID has finished" >> /home/parv/waitAndRun.log
/usr/bin/waitingScript.sh
http://iamparv.blogspot.in/2013/10/unix-wait-for-running-process-not-child.html
Either your while loop or the for loop runs in a subshell, which is why you cannot wait for a child of the (parent, outer) shell.
Edit: this might happen if the while loop or the for loop is actually
(a) in a (...) subshell, or
(b) participating in a pipe (e.g. for ... done | somepipe); see the illustration below.
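A minimal illustration of case (b), plus the usual fix of feeding the loop through process substitution so it stays in the current shell (sleep and printf are just stand-ins):
# broken: the pipe puts the while loop in a subshell, so the
# background job is a child of that subshell, and $pid is either
# empty here or names a process that is not this shell's child
printf '%s\n' a b c | while read x; do
    sleep 1 &
    pid=$!
done
wait "$pid"

# fixed: process substitution keeps the loop in the current shell
while read x; do
    sleep 1 &
    pid=$!
done < <(printf '%s\n' a b c)
wait "$pid"   # works: the last sleep is our child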
If you're running this in a container of some sort, the condition can apparently be caused by a bug in bash that is easier to hit in a containerized environment.
From my reading of the bash source (specifically, see the comments around RECYCLES_PIDS and CHILD_MAX in bash-4.2/jobs.c), it looks like in their effort to optimize the tracking of background jobs, they leave themselves vulnerable to PID aliasing (where a new process might obscure the status of an old one); to mitigate that, they prune their background process history (apparently as mandated by POSIX?). If you happen to want to wait on a pruned process, the shell can't find it in the history and assumes this means it never knew about it (i.e., that it "is not a child of this shell").

How to make sure only one instance of a Bash script is running at a time?

I want to make a sh script that will run at most one instance at any point.
Say, if I execute the script and then execute it again, how do I make it so that if the first execution is still working, the second one fails with an error? I.e. I need to check whether the script is running elsewhere before doing anything. How would I go about doing this?
The script I have runs a long-running process (i.e. runs forever). I want to use something like cron to call the script every 15 minutes, so that if the process fails, it will be restarted by the next cron run.
You want a pid file, maybe something like this:
pidfile=/path/to/pidfile
if [ -f "$pidfile" ] && kill -0 $(cat "$pidfile") 2>/dev/null; then
    echo still running
    exit 1
fi
echo $$ > "$pidfile"
I think you need to use the lockfile command. See using lockfiles in shell scripts (BASH) or http://www.davidpashley.com/articles/writing-robust-shell-scripts.html.
The second article uses a "hand-made" lock file and shows how to catch script termination to release the lock; using lockfile -l <timeout seconds> will probably be a good enough alternative for most cases.
Example of usage without timeout:
lockfile script.lock
<do some stuff>
rm -f script.lock
This will ensure that any second script started while this one is running will wait indefinitely for the file to be removed before proceeding.
If we know that the script should not run for more than X seconds, and script.lock is still there, that probably means a previous instance of the script was killed before it could remove script.lock. In that case we can tell lockfile to force re-creation of the lock after a timeout (X = 10 below):
lockfile -l 10 /tmp/mylockfile
<do some stuff>
rm -f /tmp/mylockfile
Since lockfile can create multiple lock files, there is a parameter to tell it how long to wait before retrying to acquire the next file it needs (-<sleep before retry, seconds>), along with -r <number of retries>. There is also a parameter -s <suspend seconds> for the wait time after a lock has been removed by force (which complements the timeout used to decide when to force-break the lock).
You can use the run-one package, which provides run-one, run-this-one and keep-one-running.
The package: https://launchpad.net/ubuntu/+source/run-one
The blog introducing it: http://blog.dustinkirkland.com/2011/02/introducing-run-one-and-run-this-one.html
Write the process id into a file; then, when a new instance starts, check the file to see whether the old instance is still running.
(
    if ! flock -n 9; then
        echo 'Not doing the critical operation (lock present).'
        exit
    fi

    # critical section goes here

) 9>'/run/lock/some_lock_file'
rm -f '/run/lock/some_lock_file'
Adapted from the example in the flock(1) man page. Very practical for use in shell scripts.
I just wrote a tool that does this:
https://github.com/ORESoftware/quicklock
Writing a good one takes about 15 LOC, so it's not something you want to include in every shell script.
Basically it works like this:
$ ql_acquire_lock
The above calls this bash function:
function ql_acquire_lock {
    set -e
    name="${1:-$PWD}"  # the lock name is the first argument; if that is empty, use $PWD
    mkdir -p "$HOME/.quicklock/locks"
    fle=$(echo "${name}" | tr "/" _)
    qln="$HOME/.quicklock/locks/${fle}.lock"
    mkdir "${qln}" &> /dev/null || { echo "${ql_magenta}quicklock: could not acquire lock with name '${qln}'${ql_no_color}."; exit 1; }
    export quicklock_name="${qln}"  # export the var *only if* the mkdir above succeeds
    trap on_ql_trap EXIT
}
When the script exits, it automatically releases the lock using the trap:
function on_ql_trap {
    echo "quicklock: process with pid $$ was trapped."
    ql_release_lock
}
To manually release the lock at will, use ql_release_lock (ql_maybe_fail is its error-handling helper):
function ql_maybe_fail {
    if [[ "$1" == "true" ]]; then
        echo -e "${ql_magenta}quicklock: exiting with 1 since fail flag was set for your 'ql_release_lock' command.${ql_no_color}"
        exit 1
    fi
}

function ql_release_lock () {
    if [[ -z "${quicklock_name}" ]]; then
        echo -e "quicklock: no lockname was defined. (\$quicklock_name was not set)."
        ql_maybe_fail "$1"
        return 0
    fi
    if [[ "$HOME" == "${quicklock_name}" ]]; then
        echo -e "quicklock: dangerous value set for \$quicklock_name variable..was equal to user home directory, not good."
        ql_maybe_fail "$1"
        return 0
    fi
    rm -r "${quicklock_name}" &> /dev/null &&
        { echo -e "quicklock: lock with name '${quicklock_name}' was released."; } ||
        { echo -e "quicklock: no lock existed for lockname '${quicklock_name}'."; ql_maybe_fail "$1"; }
    trap - EXIT  # clear/unset the trap
}
I suggest using flock, but in a different way than suggested by @Josef Kufner. I think this is quite easy, and flock should be available on most systems by default:
flock -n lockfile myscript.sh
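This wrapper style also fits the cron use case from the question: each cron invocation simply exits if the previous one is still running. A sketch, with placeholder paths:
*/15 * * * * flock -n /tmp/myscript.lock /path/to/myscript.sh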
