Bash script processing limited number of commands in parallel


I have a bash script that looks like this:
#!/bin/bash
wget LINK1 >/dev/null 2>&1
wget LINK2 >/dev/null 2>&1
wget LINK3 >/dev/null 2>&1
wget LINK4 >/dev/null 2>&1
# ..
# ..
wget LINK4000 >/dev/null 2>&1
But running each command to completion before starting the next one is very time-consuming. I want to process, for instance, 20 lines at once, and when they have finished, process another 20.
I thought of using wget LINK1 >/dev/null 2>&1 & to send each command to the background and carry on, but with 4000 lines that would cause performance problems, and there is a limit on how many processes I should start at the same time, so this is not a good idea.
One solution I'm considering is to check whether any of the commands is still running. For instance, after every 20 lines I could add this loop:
while [ $(ps -ef | grep KEYWORD | grep -v grep | wc -l) -gt 0 ]; do
sleep 1
done
Of course, in this case I would need to append & to the end of each line. But I feel this is not the right way to do it.
So how do I actually group every 20 lines together and wait for them to finish before moving on to the next 20? The script is dynamically generated, so I can do whatever math I want while generating it. It does NOT have to use wget, though; that was just an example, so a wget-specific solution will not help me.

Use the wait built-in:
process1 &
process2 &
process3 &
process4 &
wait
process5 &
process6 &
process7 &
process8 &
wait
For the above example, 4 processes process1 ... process4 would be started in the background, and the shell would wait until those are completed before starting the next set.
From the GNU manual:
wait [jobspec or pid ...]
Wait until the child process specified by each process ID pid or job specification jobspec exits and return the exit status of the last
command waited for. If a job spec is given, all processes in the job
are waited for. If no arguments are given, all currently active child
processes are waited for, and the return status is zero. If neither
jobspec nor pid specifies an active child process of the shell, the
return status is 127.
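Since the script is generated, the same pattern can also be written as a loop instead of spelling out every batch. The following is a minimal sketch, not a definitive implementation: the sleep commands are hypothetical stand-ins for the real lines, and the names batch_size, started and batches are made up for illustration.

```shell
#!/bin/bash
batch_size=2    # the question would use 20
started=0
batches=0

# The sleep commands are hypothetical stand-ins for the generated lines.
for cmd in "sleep 0.2" "sleep 0.1" "sleep 0.3" "sleep 0.1" "sleep 0.2"; do
    sh -c "$cmd" >/dev/null 2>&1 &      # start the command in the background
    started=$((started + 1))
    # After every batch_size launches, block until the whole batch has exited.
    if [ $((started % batch_size)) -eq 0 ]; then
        wait
        batches=$((batches + 1))
    fi
done
wait    # catch the final, possibly partial, batch
echo "started $started commands, waited on $batches full batches"
```

Each batch takes as long as its slowest command, because wait with no arguments returns only when every background child has exited.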

See parallel. Its syntax is similar to xargs, but it runs the commands in parallel.

In fact, xargs can run commands in parallel for you. There is a special -P max_procs command-line option for that. See man xargs.
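As a rough illustration of the -P option, the sketch below feeds one argument per line to xargs and lets it keep at most 4 commands running at once. The echo is a placeholder for the real command, and the LINK names are the placeholders from the question. Note that -P is available in GNU and BSD xargs but is not part of POSIX.

```shell
#!/bin/bash
# One argument per input line; xargs keeps at most 4 commands running at once.
# The echo is a placeholder for the real command (for example wget).
results=$(printf '%s\n' LINK1 LINK2 LINK3 LINK4 LINK5 LINK6 |
    xargs -P 4 -n 1 sh -c 'echo "fetched $1"' _)

# Completion order is not guaranteed when jobs run in parallel, so sort it.
sorted=$(printf '%s\n' "$results" | sort)
echo "$sorted"
```

Unlike the batch approach, xargs starts a new command as soon as one of the running ones exits, so a single slow command does not hold up the rest.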

You can run 20 processes and use the command:
wait
Your script will wait and continue when all your background jobs are finished.
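A different technique from the batches described above is a rolling limit: instead of waiting for a whole batch, start a new job whenever one of the running jobs exits. This is only a sketch under assumptions: it relies on wait -n, which needs bash 4.3 or newer, the sleep calls are placeholders for the real commands, and max_jobs, running and completed are illustrative names.

```shell
#!/bin/bash
# Assumes bash 4.3 or newer, where `wait -n` waits for any single job.
max_jobs=3      # the question would use 20
running=0
completed=0

for i in 1 2 3 4 5 6 7; do
    # When the limit is reached, wait for any one job before launching another.
    if [ "$running" -ge "$max_jobs" ]; then
        wait -n
        running=$((running - 1))
        completed=$((completed + 1))
    fi
    sleep "0.$i" &      # placeholder for the real command
    running=$((running + 1))
done

# Drain the jobs that are still running.
while [ "$running" -gt 0 ]; do
    wait -n
    running=$((running - 1))
    completed=$((completed + 1))
done
echo "completed $completed jobs"
```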

Related

Parallel run and wait for processes from subshell

Hi all. I'm trying to make something like the parallel tool for the shell, simply because the functionality of parallel is not enough for my task: I need to run different versions of a compiler.
Imagine that I need to compile 12 programs with different compilers, but I can run only 4 of them simultaneously (otherwise the PC runs out of memory and crashes :). I also want to be able to observe what's going on with each compile, so I run every compile in a new window.
To keep things simple, here I'll replace the compiler with a small script, sleep.sh, that waits and then prints its process ID:
#!/bin/bash
sleep 30
echo $$
So the main script should look like parallel_run.sh :
#!/bin/bash
for i in {0..11}; do
xfce4-terminal -H -e "./sleep.sh" &
pids[$i]=$!
pstree -p $pids
if (( $i % 4 == 0 ))
then
for pid in ${pids[*]}; do
wait $pid
done
fi
done
The problem is that $! gives me the PID of xfce4-terminal and not of the program it executes. So if I look at the pstree output from the first iteration of the main script, I see:
xfce4-terminal(31666)----{xfce4-terminal}(31668)
|--{xfce4-terminal}(31669)
and sleep.sh says that its PID was 30876 at that time. Thus wait doesn't work at all in this case.
Q: How do I get the right PID of the compiler that runs in the subshell?
Maybe there is another way to solve a task like this?

It seems there is no way to trace the PID from parent to child if you invoke the process in a new xfce4-terminal, because the terminal process dies right after it has executed the given command. So I came to a solution which is not perfect, but acceptable in my situation. I run the compiler processes in the background and redirect their output to .log files. Then I run tail on these log files, and when the compilers from the current batch are done I kill all tail processes that belong to the current $USER and start the next batch.
#!/bin/bash
for i in {1..8}; do
./sleep.sh > ./process_$i.log &
prcid=$!
xfce4-terminal -e "tail -f ./process_$i.log" &
pids[$i]=$prcid
if (( $i % 4 == 0 ))
then
for pid in ${pids[*]}; do
wait $pid
done
killall -u $USER tail
fi
done
Hopefully there will be no other tails running at that time :)

PID of all child processes of a command

In a bash script, I want to launch a process in the foreground, then print a list of all the process names and PIDs that were started as children of that process. For example, suppose I have the following scripts, but I can only modify the first one:
A.sh:
#!/bin/bash
B.sh
B.sh:
#!/bin/bash
C.sh
C.sh:
#!/bin/bash
echo "Running C.sh"
Without modifying B.sh, C.sh or the echo command, and without starting any of the child processes in the background, I would like A.sh to print the following:
B.sh 1208
C.sh 1210
echo 1211
Can A.sh fork a process that records this information while the child processes are running in the foreground of A.sh?

Update: In the comments below my answer it turned out that:
I need something that observes the creation of all child processes during a span of time. Given that, filtering to isolate my subtree will not be difficult.
... was the intention behind the question and it was for debugging purposes.
In that case I'd recommend using strace like this:
strace -f command
-f will track child processes - recursively. Since forking and exec-ing requires system calls, strace will list any child creation plus the pids.
Original answer:
You can use pgrep for that:
run_process &
pid=${!}
pgrep --parent "${pid}"
wait # wait for run_process to finish
Btw, you may want to use the pstree command, it is nice to use:
run_process &
pid=${!}
pstree -p "${pid}"
wait # wait for run_process to finish
Anyhow, you'll need to install pstree.

You can try doing this with A.sh
#!/usr/bin/env bash
./B.sh &
b_PID=$!
./C.sh &
c_PID=$!
echo "B.sh $b_PID"
echo "C.sh $c_PID"
The output will look something like this
B.sh 22802
C.sh 22803
Running C.sh

suspend a shell command without pid

I need something like $command & stop. This should execute a command and suspend it. The application later resumes the command to get the complete results.
I understand that a job can be suspended by sending the stop signal to the corresponding PID:
$ kill -SIGSTOP 12753
When we execute a command, we do not readily know its PID. An extra command is needed to obtain the PID and act on it. I want to avoid the extra command and the time interval it introduces.
Basically, the application measures network performance: it triggers all the commands and puts them in a halted state. The halted commands are then resumed according to the kind of traffic needed.

The process ID of the most recently started background command is available in the shell parameter $!:
$ command & kill -SIGSTOP $!
(Check the documentation for your shell's implementation of kill for the correct format.)
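To make the stop-then-resume sequence concrete, here is a small sketch. The background job is a hypothetical stand-in that creates a marker file when it finishes, so the script can show that nothing completes while the job is stopped. The names workdir, marker, before and after are illustrative only.

```shell
#!/bin/bash
workdir=$(mktemp -d)
marker="$workdir/done"      # the job creates this file when it finishes

# Start the job and stop it straight away; $! is the PID of the job.
sh -c 'sleep 0.2; echo finished > "$1"' _ "$marker" &
pid=$!
kill -STOP "$pid"

sleep 1     # long enough for the job to have finished if it were running
if [ -e "$marker" ]; then before=yes; else before=no; fi

kill -CONT "$pid"   # resume the job
wait "$pid"         # block until it exits
if [ -e "$marker" ]; then after=yes; else after=no; fi

rm -rf "$workdir"
echo "finished before resume: $before, after resume: $after"
```

There is still a very short interval between starting the command and stopping it, so the command may execute a few instructions before the signal arrives.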

Try killall with the --signal option where you can specify the name of the process.
linux:~ # killall
Usage: killall [OPTION]... [--] NAME...
killall -l, --list
killall -V, --version
-e,--exact require exact match for very long names
-I,--ignore-case case insensitive process name match
-g,--process-group kill process group instead of process
-i,--interactive ask for confirmation before killing
-l,--list list all known signal names
-q,--quiet don't print complaints
-r,--regexp interpret NAME as an extended regular expression
-s,--signal SIGNAL send this signal instead of SIGTERM
-u,--user USER kill only process(es) running as USER
-v,--verbose report if the signal was successfully sent
-V,--version display version information
-w,--wait wait for processes to die
Verified by starting md5sum in a shell session:
linux$ md5sum
and in another session, ran:
killall -s SIGSTOP md5sum
yielding the following in the md5sum session:
[1]+ Stopped md5sum

Please confirm whether you want to halt your command or run it in the background (by appending '&' to the command).
If your application is expected to start the halted command later, why not start the command in that application itself?
This may help:
sleep 5 & kill -SIGSTOP $!
The above runs sleep (a demo command) for 5 seconds in the background.
It then uses kill to stop it, with the PID obtained from $!.

Demo and kludge using timeout (for some reason timeout interprets a '0s' duration as "run forever") to stop yes before it outputs anything:
# run 'yes' command, let it print 5 numbered lines, but stop it immediately
timeout -s SIGSTOP .000000001s yes | head -n 5 | cat -n
Output (to STDERR):
[1]+ Stopped timeout -s SIGSTOP .000000001s yes | head -n 5 | cat -n
Now restart it:
fg > /dev/null
Output:
1 y
2 y
3 y
4 y
5 y
Technique for users stuck with v8.12 or earlier coreutils, (pre-2011), wherein timeout lacks sub-second intervals. Requires waiting a second.
Wrap the command string in a shell invocation, preceded by a 1s wait -- so timeout waits 1 second, and simultaneously, so does the command string. Total wait time 1 second:
timeout -s SIGSTOP 1s sh -c "sleep 1s; yes | head -n 5 | cat -n"
Output is the same as before, fg is the same too.
Finesse, if waiting even 1 second before sleeping is too much, it can be run in the background like so:
timeout -s SIGSTOP 1s sh -c "sleep 1s; yes | head -n 5 | cat -n" &
Output (process number will vary):
[1] 14601
Then after a second, the output will be the same as the previous two timeout examples.

Assuming you always use the same command, find the command name in the ps output: launch the command in one terminal, then open a new terminal and run:
ps -ely
After retrieving the command name:
command & kill -SIGSTOP $(pidof command_name)
pidof needs the exact command name to be able to find the PID.
Then, to resume it:
kill -SIGCONT $(pidof command_name)
If the command name is not constant but follows a pattern, you can create a script like this (call it pof.sh):
ps -ely | grep "$1" | tr -s ' ' | cut -d" " -f3
command & kill -SIGSTOP $(bash pof.sh pattern)
One drawback of this script is that if many lines match the pattern, it returns all of their PIDs. If this is a problem, you can put the output in an array and go on from there.

Don't show the output of kill command in a Linux bash script [duplicate]

How can you suppress the Terminated message that comes up after you kill a
process in a bash script?
I tried set +bm, but that doesn't work.
I know another solution involves calling exec 2> /dev/null, but is that
reliable? How do I reset it back so that I can continue to see stderr?

In order to silence the message, you must be redirecting stderr at the time the message is generated. Because the kill command sends a signal and doesn't wait for the target process to respond, redirecting stderr of the kill command does you no good. The bash builtin wait was made specifically for this purpose.
Here is a very simple example that kills the most recent background command. (Learn more about $! here.)
kill $!
wait $! 2>/dev/null
Because both kill and wait accept multiple pids, you can also do batch kills. Here is an example that kills all background processes (of the current process/script of course).
kill $(jobs -rp)
wait $(jobs -rp) 2>/dev/null
I was led here from bash: silently kill background function process.
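A side benefit of pairing kill with wait is that wait also reports how the job ended. The sketch below uses sleep as a placeholder job; the variable names are illustrative.

```shell
#!/bin/bash
sleep 100 &     # placeholder for a long-running background job
pid=$!

kill "$pid"                 # sends SIGTERM and returns immediately
wait "$pid" 2>/dev/null     # reaps the job; any notice goes to /dev/null
status=$?                   # 128 + signal number for a job killed by a signal

echo "exit status: $status"   # 143, that is 128 + 15 (SIGTERM)
```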

The short answer is that you can't. Bash always prints the status of foreground jobs. The monitoring flag only applies for background jobs, and only for interactive shells, not scripts.
see notify_of_job_status() in jobs.c.
As you say, you can redirect so standard error is pointing to /dev/null but then you miss any other error messages. You can make it temporary by doing the redirection in a subshell which runs the script. This leaves the original environment alone.
(script 2> /dev/null)
which will lose all error messages, but just from that script, not from anything else run in that shell.
You can save and restore standard error, by redirecting a new filedescriptor to point there:
exec 3>&2 # 3 is now a copy of 2
exec 2> /dev/null # 2 now points to /dev/null
script # run script with redirected stderr
exec 2>&3 # restore stderr to saved
exec 3>&- # close saved version
But I wouldn't recommend this -- the only upside from the first one is that it saves a sub-shell invocation, while being more complicated and, possibly even altering the behavior of the script, if the script alters file descriptors.
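The difference between the two approaches can be seen in a small sketch. Here noisy is a hypothetical stand-in for the script, and demo and captured are illustrative names; the sketch also assumes that file descriptor 3 is not otherwise in use.

```shell
#!/bin/bash
noisy() { echo "noise from $1" >&2; }   # hypothetical stand-in for the script

demo() {
    ( noisy subshell ) 2>/dev/null   # scoped: only this subshell is silenced

    exec 3>&2           # save stderr on fd 3
    exec 2>/dev/null    # silence stderr for everything that follows
    noisy exec
    exec 2>&3           # restore stderr from the saved copy
    exec 3>&-           # close the saved copy

    noisy restored      # stderr is visible again
}

captured=$(demo 2>&1)
echo "stderr that got through: $captured"
```

Only the line written after the restore reaches stderr; the two silenced calls leave no trace.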
EDIT:
For a more appropriate answer, see the answer given by Mark Edgar.

Solution: use SIGINT (works only in non-interactive shells)
Demo:
cat > silent.sh <<"EOF"
sleep 100 &
kill -INT $!
sleep 1
EOF
sh silent.sh
http://thread.gmane.org/gmane.comp.shells.bash.bugs/15798

Maybe detach the process from the current shell process by calling disown?

The Terminated message is logged by the default signal handler of bash 3.x and 4.x. Just trap the TERM signal at the very start of the child process:
#!/bin/sh
## assume script name is test.sh
foo() {
trap 'exit 0' TERM ## here is the key
while true; do sleep 1; done
}
echo before child
ps aux | grep 'test\.s[h]\|slee[p]'
foo &
pid=$!
sleep 1 # wait trap is done
echo before kill
ps aux | grep 'test\.s[h]\|slee[p]'
kill $pid ## no need to redirect stdin/stderr
sleep 1 # wait kill is done
echo after kill
ps aux | grep 'test\.s[h]\|slee[p]'

Is this what we are all looking for?
Not wanted:
$ sleep 3 &
[1] 234
<pressing enter a few times....>
$
$
[1]+ Done sleep 3
$
Wanted:
$ (set +m; sleep 3 &)
<again, pressing enter several times....>
$
$
$
$
$
As you can see, no job end message. Works for me in bash scripts as well, also for killed background processes.
'set +m' disables job control (see 'help set') for the current shell. So if you enter your command in a subshell (as done here in brackets) you will not influence the job control settings of the current shell. Only disadvantage is that you need to get the pid of your background process back to the current shell if you want to check whether it has terminated, or evaluate the return code.

This also works for killall (for those who prefer it):
killall -s SIGINT (yourprogram)
suppresses the message... I was running mpg123 in background mode.
It could only silently be killed by sending a ctrl-c (SIGINT) instead of a SIGTERM (default).

disown did exactly the right thing for me -- the exec 3>&2 is risky for a lot of reasons -- set +bm didn't seem to work inside a script, only at the command prompt

Had success with adding 'jobs 2>&1 >/dev/null' to the script, not certain if it will help anyone else's script, but here is a sample.
while true; do echo $RANDOM; done | while read line
do
echo Random is $line the last jobid is $(jobs -lp)
jobs 2>&1 >/dev/null
sleep 3
done

Another way to disable job notifications is to place your command to be backgrounded in a sh -c 'cmd &' construct.
#!/bin/bash
# ...
pid="`sh -c 'sleep 30 & echo ${!}' | head -1`"
kill "$pid"
# ...
# or put several cmds in sh -c '...' construct
sh -c '
sleep 30 &
pid="${!}"
sleep 5
kill "${pid}"
'

I found that putting the kill command in a function and then backgrounding the function suppresses the termination output
function killCmd() {
kill $1
}
killCmd $somePID &

Simple:
{ kill $!; } 2>/dev/null
Advantage? can use any signal
ex:
{ kill -9 $PID; } 2>/dev/null

Kill ssh or\and remote process from bash script

I am trying to run the following command as part of a bash script that is supposed to open an ssh channel, run a program on the remote machine, save its output to a file for 10 seconds, kill the process that was writing to the file, and then give control back to the bash script.
#!/bin/bash
ssh hostname '/root/bin/nodes-listener > /tmp/nodesListener.out </dev/null; sshpid=!$; sleep 10; kill -9 $sshpid 2>/dev/null &'
Unfortunately, what it seems to do is start the program nodes-listener remotely, but it never gets any further and never gives control back to the bash script. So the only way to stop the execution is to press Ctrl+C.
Killing ssh doesn't help (or rather can't be done), since control is not with the bash script: it is waiting for the command within the ssh session to complete, which of course never happens, as the command has to be killed to stop.

Here's the command line that you're running on the remote system:
/root/bin/nodes-listener > /tmp/nodesListener.out </dev/null
sshpid=!$
sleep 10
kill -9 $sshpid 2>/dev/null &
You should change it to this:
/root/bin/nodes-listener > /tmp/nodesListener.out </dev/null &   # ampersand goes here
sshpid=$!
sleep 10
kill -9 $sshpid 2>/dev/null
You want to start nodes-listener and then kill it after ten seconds. To do this, you need to start nodes-listener as a background process, so that the shell which is executing this command line can move on to the next command after starting nodes-listener. The & in your command line is in the wrong place, and would apply only to the kill command. You need to apply it to the nodes-listener command.
I'll also note that your sshpid=!$ line was incorrect. You want sshpid=$!. $! is the process ID of the last command started in the background.
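The corrected sequence can be tried locally, without ssh. In this sketch a small loop is a hypothetical stand-in for nodes-listener, the collection time is shortened to one second, and out, listener_pid and lines are illustrative names.

```shell
#!/bin/bash
out=$(mktemp)

# Hypothetical stand-in for nodes-listener: prints a line every 0.1 s, forever.
sh -c 'while :; do echo tick; sleep 0.1; done' > "$out" </dev/null &
listener_pid=$!     # $! (not !$) holds the PID of the background job

sleep 1             # the question would collect output for 10 seconds
kill "$listener_pid"
wait "$listener_pid" 2>/dev/null

lines=$(( $(wc -l < "$out") ))
rm -f "$out"
echo "collected $lines lines"
```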

You need to place the ampersand after the first command, then put the remaining commands onto the next line:
ssh hostname -- '/root/bin/nodes-listener > /tmp/nodesListener.out </dev/null &
sshpid=$!; sleep 10; kill $sshpid 2>/dev/null'
Btw, ssh returns after all commands have been executed. This means it will close the allocated pty as well. If there are still background jobs running in that shell session, they would be killed by SIGHUP. This means you can probably omit the explicit kill command. (It depends on whether nodes-listener handles SIGHUP and SIGTERM differently.) With this in mind, you could simplify the code to the following:
ssh hostname -- sh -c '/root/bin/nodes-listener > /tmp/nodesListener.out </dev/null &
sleep 10'

I have resolved this by pushing the shell script to the remote machine and executing it there. It is actually less tidy and relies on space being available on the remote computer.
Since my remote machine is a small physical device, the issue of the space usage is important (even for the tiny amount of space required in this case).
/root/bin/nodes-listener > /tmp/nodesListener.out </dev/null &
sshpid=$!
sleep 20
sync
# killing nodes-listener process and giving control back to the base bash
killall -9 nodes-listener 2>/dev/null && echo "nodes-listener is killed"
