Need help to make my program automated - linux

I just wanted to get some idea of how I should approach this. I am trying to automate reporting back to a database with a bunch of commands like this (i.e., java -jar snet_client.jar -mode report -id 13528 -props /int2/contact/client0.properties & ). Let's say I have hundreds of these commands, each with a unique number where 13528 appears. I need to put that in a loop so that I do not have to write or copy-and-paste those hundreds of commands over and over to execute them. Any suggestion would be helpful. It has to be on unix.

This first bash script iterates over each line in the file textfile, assuming that each of the id values is on its own line, starts the java process, and waits for it to complete before starting the next one.
# Queueing
# This one will only start the next process when the previous one completes.
OLD_IFS=$IFS
while IFS=$'\n' read -r line_data; do
    java -jar snet_client.jar -mode report -id ${line_data} -props /int2/contact/client0.properties &
    wait
done < /path/to/textfile
IFS=$OLD_IFS
Alternatively, this script does the same as far as getting id values from a text file, but doesn't wait for the first to complete before the next is started. This will likely cause problems if the snet_client.jar program is very resource intensive:
# Non-queueing
# This starts and runs all the processes
OLD_IFS=$IFS
while IFS=$'\n' read -r line_data; do
    java -jar snet_client.jar -mode report -id ${line_data} -props /int2/contact/client0.properties &
done < /path/to/textfile
IFS=$OLD_IFS
On both, we store the current IFS value before we begin so we can reset it after the process runs, just in case we need it set back for something later in the script file.
I have not tested these (since I don't have the dependencies available), so you might have to make adjustments for your own environment.
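If running them one at a time is too slow and running them all at once is too heavy, a middle ground is to cap how many run simultaneously. The following is only a rough, untested sketch: the limit of 4 parallel processes and the use of xargs are my own additions, and the id file is the same one-id-per-line text file as above.
# Run at most 4 reports at a time; xargs substitutes each id line from the file for {}
xargs -P 4 -I{} java -jar snet_client.jar -mode report -id {} -props /int2/contact/client0.properties < /path/to/textfile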

Related

Use more than one core in bash

I have a Linux tool that (greatly simplifying) cuts out the sequences specified in an illumnaSeq file. I have 32 files to grind through. One file is processed in about 5 hours. I have a CentOS server with 128 cores.
I've found a few solutions, but each one works in a way that only uses one core. The last one seems to fire off 32 nohups, but it still pushes everything through a single core.
My question is: does anyone have any idea how to use the server's potential? Basically every file can be processed independently; there are no relations between them.
This is the current version of the script, and I don't know why it only uses one core. I wrote it with the help of advice here on Stack and found on the Internet:
#!/bin/bash
FILES=/home/daw/raw/*
count=0
for f in $FILES
do
    base=${f##*/}
    echo "process $f file..."
    nohup /home/daw/scythe/scythe -a /home/daw/scythe/illumina_adapters.fa -o "OUT$base" $f &
    (( count ++ ))
    if (( count = 31 )); then
        wait
        count=0
    fi
done
To explain: FILES is a list of files from the raw folder.
The "core" line is the nohup one: the first path is the path to the tool, the -a path is the path to the file with the patterns to cut, -o saves under the same file name as the processed file with OUT prepended, and the last parameter is the input file to be processed.
Here is the tool's readme:
https://github.com/vsbuffalo/scythe
Does anybody know how to handle this?
P.S. I also tried moving nohup before count, but it still uses only one core. I have no limitations on the server.
IMHO, the most likely solution is GNU Parallel, so you can run, say, up to 64 jobs in parallel with something like this:
parallel -j 64 /home/daw/scythe/scythe -a /home/daw/scythe/illumina_adapters.fa -o OUT{/} {} ::: /home/daw/raw/*
This has the benefit that jobs are not batched: it keeps 64 running at all times, starting a new one as each job finishes, which is better than waiting potentially 4.9 hours for all 32 of your jobs to finish before starting the last one, which then takes a further 5 hours after that. Note that I chose 64 jobs arbitrarily here; if you don't specify otherwise, GNU Parallel will run one job per CPU core you have.
Useful additional parameters are:
parallel --bar ... gives a progress bar
parallel --dry-run ... does a dry run so you can see what it would do without actually doing anything
If you have multiple servers available, you can add them in a list and GNU Parallel will distribute the jobs amongst them too:
parallel -S server1,server2,server3 ...
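If GNU Parallel cannot be installed, a similar "keep N jobs running" effect is possible in plain bash with wait -n (bash 4.3 and later). This is only an untested sketch built on the loop from the question; the limit of 64 simultaneous jobs is arbitrary:
#!/bin/bash
MAXJOBS=64
for f in /home/daw/raw/*; do
    base=${f##*/}
    # If MAXJOBS scythe processes are already running, block until one of them finishes
    while (( $(jobs -rp | wc -l) >= MAXJOBS )); do
        wait -n
    done
    /home/daw/scythe/scythe -a /home/daw/scythe/illumina_adapters.fa -o "OUT$base" "$f" &
done
wait    # wait for the jobs that are still running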

Linux | Background assignment of command output to variable

I need 3 commands to be run and their (single-line) outputs assigned to 3 different variables, which I then use to write to a file. I want to wait until the variable assignment is complete for all 3 before I echo the variables to the file. I am running these in a loop within a bash script.
This is what I have tried -
var1=$(longRunningCommand1) &
var2=$(longRunningCommand2) &
var3=$(longRunningCommand3) &
wait %1 %2 %3
echo "$var1,$var2,$var3">>$logFile
This gives no values at all for the variables. I get -
,,
,,
,,
However, if I try this -
var1=$(longRunningCommand1 &)
var2=$(longRunningCommand2 &)
var3=$(longRunningCommand3 &)
wait %1 %2 %3
echo "$var1,$var2,$var3">>$logFile
I get the desired output,
o/p of longRunningCommand1, o/p of longRunningCommand2, o/p of longRunningCommand3
o/p of longRunningCommand1, o/p of longRunningCommand2, o/p of longRunningCommand3
o/p of longRunningCommand1, o/p of longRunningCommand2, o/p of longRunningCommand3
but the nohup.out for this shell script indicates that there was no background job to wait for -
netmon.sh: line 35: wait: %1: no such job
netmon.sh: line 35: wait: %2: no such job
netmon.sh: line 35: wait: %3: no such job
I would not have bothered much about this, but I definitely need to make sure that my script waits for all 3 variables to be assigned before attempting the write, whereas nohup.out tells me otherwise! I want to know whether the 2nd approach is the right way when any of those 3 commands runs for more than a few seconds. I have not yet been able to get a really long-running command, or resource contention on the box, to actually resolve this doubt of mine.
Thank you very much for any helpful thoughts.
-MT
Your goal of writing the output of echo "$var1,$var2,$var3">>$logFile while backgrounding the actual longRunningCommand1, ..2, ..3 processes can be accomplished using a list and redirection. As @that_other_guy notes, you cannot assign the result of a command substitution to a variable in the background to begin with. However, in a shell like bash you can write the output of each process to a file in the background, and separating your processes and redirections with ';' will ensure the sequential write of command1, ..2, ..3 to the log file, e.g.:
Commands that are separated by a <semicolon> ( ';' )
shall be executed sequentially.
POSIX Specification - lists
Putting those pieces together, you would sequentially write the results of your commands to $logfile with something similar to the following,
( (longRunningCommand1) >> $logfile; (longRunningCommand2) >> $logfile; \
(longRunningCommand3) >> $logfile) &
(note: the ';' between commands writing to $logfile)
If you wanted to wait within your script until all commands had been written to $logfile (and your script supports $! as the PID of the last backgrounded process), you could simply wait $!, though that is not required to ensure the write to the file completes.
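If the goal really is to end up with three shell variables rather than three lines appended to the log, a common workaround (a sketch of my own, not part of the answer above) is to let each command write to its own temporary file in the background, wait for all of them, and then read the files back into variables:
#!/bin/bash
logFile=/tmp/netmon.log                 # placeholder path for this sketch
tmpdir=$(mktemp -d) || exit 1

longRunningCommand1 > "$tmpdir/1" &
longRunningCommand2 > "$tmpdir/2" &
longRunningCommand3 > "$tmpdir/3" &
wait                                    # block until all three background jobs have finished

var1=$(<"$tmpdir/1")                    # $(<file) reads a file without spawning cat
var2=$(<"$tmpdir/2")
var3=$(<"$tmpdir/3")
echo "$var1,$var2,$var3" >> "$logFile"

rm -rf "$tmpdir"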

How to use one environment variable when calling a bash script repeatedly

I have a task to monitor the system against a quota; if the monitored result is over the quota, send a warning email. This monitoring program is called once every half hour, and after a warning email has been sent out, if the monitored state is still the same the next time, the same warning email should not be sent again.
In order to do this, I would like to use an environment variable to store the state of the last monitored result, so that the next time it can be checked and a duplicate email is not sent. One of my solutions is to add or update an export statement in .bashrc, but in order to activate the updated export I have to start a new bash, which seems unnecessary.
So I would like to ask: is there any way to update the environment variable so that every time the monitoring Bash script is called, it gets the freshly updated value?
This is a self-contained solution using a heredoc. At first glance it may seem an elaborate and imperfect solution, but it has its uses: it's resilient, it works well when deploying across more than one machine, it requires no special monitoring of or permissions on external files, and most importantly, there are no unwanted surprises with the environment.
This example uses bash, but it will work with sh if the $thisfile variable is set manually or in some other way.
This example assumes that 20 is already stored in the script file as mymonitorval, and uses argument $1 as a proof of concept. You would obviously change newvalue="$1" to whatever calculates the quota:
Example usage:
#bash $>./script 10
Value from previous run was 20
Value from this test was 10
Updating value ...
#bash $>./script 10
not doing anything ... result is the same as last time
#bash $>
Script:
#!/bin/bash
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )" ; thisfile="${DIR}/${0##*/}"
read -d '' currentvalue <<'EOF'
mymonitorval=20
EOF
eval "$currentvalue"
function changeval () {
    sed -E -i "s/(^mymonitorval=)(.*)/mymonitorval=$1/g" "$thisfile"
}
newvalue="$1"
if [[ "$newvalue" != "$mymonitorval" ]]; then
    echo "Value from previous run was $mymonitorval"
    echo "Value from this test was $1"
    echo "Updating value ..."
    changeval "$newvalue"
else
    echo "not doing anything ... result is the same as last time"
fi
Explanation:
thisfile= can be set manually for script location. This example uses the automated solution from here: https://stackoverflow.com/a/246128
read -d...EOF is the heredoc which is saved into variable $currentvalue
eval "$currentvalue" in this case is the equivalent of typing mymonitorval=20 into a terminal
function changeval...} updates the contents of the heredoc in place (it changes the physical .sh script file)
newvalue="$1" is for the purpose of testing. $newvalue would be determined by whatever your script is that is calculating quota
if.... block is to perform two alternate sets of actions depending on whether $newvalue is the same as it was last time or not.
Store the environment variable in a different .file and then source <.file>
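That idea can be as small as a state file that the script sources on every run and rewrites whenever the value changes. A rough sketch, with made-up paths and variable names:
#!/bin/bash
statefile="$HOME/.monitor_state"    # hypothetical location of the saved state

# Load the previous value, if any (defines $lastvalue)
[ -f "$statefile" ] && . "$statefile"

newvalue="$1"                       # stand-in for whatever computes the monitored result

if [ "$newvalue" != "$lastvalue" ]; then
    echo "State changed from '$lastvalue' to '$newvalue' - a warning email would be sent here"
    printf 'lastvalue=%q\n' "$newvalue" > "$statefile"
else
    echo "State unchanged - not sending the same email again"
fi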

Update Bash commands every 2 seconds (without re-running code everytime)

For my first bash project I am developing a simple bash script that shows basic information about my system:
#!/bin/bash
UPTIME=$(w)
MHZ=$(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq)
TEMP=$(cat /sys/class/thermal/thermal_zone0/temp)
#UPTIME shows the uptime of the device
#MHZ shows the overclocked specs
#TEMP shows the current CPU Temperature
echo "$UPTIME" #displays uptime
echo "$MHZ" #displays overclocked specs
echo "$TEMP" #displays CPU Temperature
MY QUESTION: How can I code this so that the uptime and CPU temperature refresh every 2 seconds without re-running the whole script each time? (I just want these two variables to update without having to enter the file path again and re-run everything.)
This code is already working fine on my system, but after it executes on the command line the information isn't updated: the script ran the commands once and is standing by for the next command instead of updating variables such as UPTIME in real time.
I hope someone understands what I am trying to achieve; sorry about my poor wording of this idea.
Thank you in advance...
I think this will help you. You can use the watch command to refresh the output every two seconds, without a loop.
watch ./filename.sh
It will refresh the output of that command every two seconds.
watch - execute a program periodically, showing output fullscreen
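For example, to make the two-second interval explicit and highlight what changed between refreshes (both are standard watch flags):
watch -n 2 -d ./filename.sh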
I'm not sure I really understand the main goal, but here's an answer to the basic question "How can I code this so that the uptime and CPU temperature refresh every two seconds?":
#!/bin/bash
while :; do
    UPTIME=$(w)
    MHZ=$(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq)
    TEMP=$(cat /sys/class/thermal/thermal_zone0/temp)
    #UPTIME shows the uptime of the device
    #MHZ shows the overclocked specs
    #TEMP shows the current CPU Temperature
    echo "$UPTIME" #displays uptime
    echo "$MHZ" #displays overclocked specs
    echo "$TEMP" #displays CPU Temperature
    sleep 2
done
Let me suggest some modifications.
For such a simple job I would recommend not using external utilities. So instead of $(cat file) you could use $(<file). This is cheaper, as bash does not have to launch cat.
On the other hand, if reading those devices returns only one line, you can use the bash built-in read, like: read ENV_VAR <single_line_file. That is even cheaper. If there are more lines and you want to read, for example, the 2nd line, you could use something like this: { read line_1; read line_2; } <file.
As I see it, w provides much more information, and I assume you need only the header line, which is exactly what uptime prints. The external uptime utility reads the /proc/uptime pseudo-file, so to avoid calling externals you can read this pseudo-file directly.
The looping part also uses the external sleep(1) utility. For this, the timeout feature of the read built-in can be used instead.
So in short the script would look like this:
while :; do
    # /proc/uptime has two fields, uptime and idle time
    read UPTIME IDLE </proc/uptime
    # Not having these pseudo files on my system, the whole line is read
    # Maybe some formatting is needed. For MHZ /proc/cpuinfo may be used
    read MHZ </sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
    read TEMP </sys/class/thermal/thermal_zone0/temp
    # Bash supports only integer arithmetic, so chomp off the fractional part
    UPTIME_SEC=${UPTIME%.*}
    UPTIME_HOURS=$((UPTIME_SEC/3600))
    echo "Uptime: $UPTIME_HOURS hours"
    echo $MHZ
    echo $TEMP
    # read consumes stdin, so pressing ENTER makes it return immediately
    read -t 2
done
This does not call any external utility and does not fork at all. So instead of executing 3 external utilities (using the expensive fork and execve system calls) every 2 seconds, this executes none. Far fewer system resources are used.
You could use while [ : ] and sleep 2.
You need the awesome power of loops! Something like this should be a good starting point:
while true ; do
    echo 'Uptime:'
    w 2>&1 | sed 's/^/ /'
    echo 'Clocking:'
    sed 's/^/ /' /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
    echo 'Temperature:'
    sed 's/^/ /' /sys/class/thermal/thermal_zone0/temp
    echo '=========='
    sleep 2
done
That should give you your three sections, with the data of each nicely indented.

Bash output happening after prompt, not before, meaning I have to manually press enter

I am having a problem getting bash to do exactly what I want, it's not a major issue, but annoying.
1.) I have a third-party program that I run which produces some output on stderr. Some of it is useful, some of it is stuff I regularly don't care about, and I don't want that dumped to the screen; however, I do want the useful parts of the stderr dumped to screen. I figured the best way to achieve this was to pass stderr to a function, then use conditions in that function to either show the stderr or not.
2.) This works fine. However, the solution I have implemented dumps my errors at the right time but then returns a bash prompt, and I want to summarise the status of the errors at the end of the function; echo-ing there prints the text after the prompt, meaning that I have to press enter to get back to a clean prompt. It will become clear with the example below.
My error stream generator:
./TestErrorStream.sh
#!/bin/bash
echo "test1" >&2
My function to process this:
./Function.sh
#!/bin/bash
function ProcessErrors()
{
    while read data;
    do
        echo Line was:"$data"
    done
    sleep 5 # This is used simply to simulate the processing work I'm doing on the errors.
    echo "Completed"
}
I source the Function.sh file to make ProcessErrors() available, then I run:
2> >(ProcessErrors) ./TestErrorStream.sh
I expect (and want) to get:
user@user-desktop:~/path$ 2> >(ProcessErrors) ./TestErrorStream.sh
Line was:test1
Completed
user@user-desktop:~/path$
However what I really get is:
user@user-desktop:~/path$ 2> >(ProcessErrors) ./TestErrorStream.sh
Line was:test1
user@user-desktop:~/path$ Completed
And no clean prompt. Of course the prompt is there, but "Completed" is being printed after the prompt; I want it printed before, and then a clean prompt to appear.
NOTE: This is a minimum working example, and it's contrived. While other solutions to my error stream problem are welcome, I also want to understand how to make bash run this script the way I want it to.
Thanks for your help
Joey
Your problem is that the while loop stays attached to stdin until the program exits.
stdin is only released at the end of TestErrorStream.sh, so your prompt is back almost immediately, while the function still has lines left to process.
I suggest you wrap the command inside a script so you can control how long to wait before your prompt comes back (I suggest 1 second more than the time you expect the function to need to process the remaining lines).
I successfully managed to do this like this:
./Functions.sh
#!/bin/bash
function ProcessErrors()
{
    while read data;
    do
        echo Line was:"$data"
    done
    sleep 5 # simulate required time to process end of function (after TestErrorStream.sh is over and stdin is released)
    echo "Completed"
}
./TestErrorStream.sh
#!/bin/bash
echo "first"
echo "firsterr" >&2
sleep 20 # any number here
./WrapTestErrorStream.sh
#!/bin/bash
source ./Functions.sh
2> >(ProcessErrors) ./TestErrorStream.sh
sleep 6 # <= this one is important
With the above you'll get a nice "Completed" before your prompt after 26 seconds of processing. (Works fine with or without the additional "time" command)
user@host:~/path$ time ./WrapTestErrorStream.sh
first
Line was:firsterr
Completed
real 0m26.014s
user 0m0.000s
sys 0m0.000s
user@host:~/path$
Note: the process substitution ">(ProcessErrors)" runs as a separate subprocess; the script "./TestErrorStream.sh" only writes to it. When the script ends, neither it nor the wrapper waits for that subprocess. That's why we need the final "sleep 6".
#!/bin/bash
function ProcessErrors {
    while read data; do
        echo Line was:"$data"
    done
    sleep 5
    echo "Completed"
}
# Open subprocess
exec 60> >(ProcessErrors)
P=$!
# Do the work
2>&60 ./TestErrorStream.sh
# Close connection or else subprocess would keep on reading
exec 60>&-
# Wait for process to exit (wait "$P" doesn't work). There are many ways
# to do this too like checking `/proc`. I prefer the `kill` method as
# it's more explicit. We'd never know if /proc updates itself quickly
# among all systems. And using an external tool is also a big NO.
while kill -s 0 "$P" &>/dev/null; do
    sleep 1s
done
Off topic side-note: I'd love to see how posturing bash veterans/authors try to own this. Or perhaps they already did way way back from seeing this.
