eliminate subshells for faster process? - linux

I've read that scripts that call subshells are slow, which would explain why my script is slow.
For example, here I'm running a loop that gets a number from an array. Does this start a subshell every time, and can it be solved without subshells?
mmode=1
modes[1,2]="9,12,18,19,20,30,43,44,45,46,47,48,49"
until [[ -z $kik ]];do
((++mloop))
kik=$(echo ${modes[$mmode,2]} | cut -d "," -f $mloop)
filename=$(basename "$f")
# is all these lines
xcolorall=$((xcolorall+storednr))
# also triggering
pros2=$(echo "100/$totpix*$xcolorall" | bc -l)
IFS='.' read -r pros5 pros6 <<< "$pros2"
procenthittotal2=$pros5.${pros6:0:2}
#subshells and if,
# is it possible to circumvent it?
#and more of the same code..
done
Update:
The pros2 variable calculates a percentage (what percent xcolorall is of totpix), and the kik variable gets a number from the modes array that tells the loop which color to count in this pass.
I suspect these are the main resource hogs. Is there any way to do this without subshells?

You can replace all the subshells and external commands shown in your question with bash built-ins.
kik=$(echo ${modes[$mmode,2]} | cut -d "," -f $mloop) can be replaced by
mapfile -d, -t -s$((mloop-1)) -n1 kik <<< "${modes[$mmode,2]}" (the -d option needs bash 4.4 or newer).
If $mmode is constant here, better split the string once and replace the whole loop with
IFS=, read -ra kiks <<< "${modes[$mmode,2]}"; for kik in "${kiks[@]}"; do ...; done.
filename=$(basename "$f") can be replaced by
filename=${f##*/} which runs 100 times faster, see benchmark.
pros2=$(echo "100/$totpix*$xcolorall" | bc -l) can be replaced by
(( pros2 = 100 * xcolorall / totpix )) if you don't care for the decimals, or by
precision=2; (( pros = 10**precision * 100 * xcolorall / totpix )); printf -v pros "%0${precision}d" "$pros"; pros="${pros:0: -precision}.${pros: -precision}" if you want 2 decimal places.
Of course you can leave out the last commands (for turning 12345 into 123.45) until you really need the decimal number.
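To see how the replacements fit together, here is a minimal sketch of the loop using only built-ins. The sample values, the kiks array, the percent helper and its percent_result variable are illustrative assumptions, not part of the original script. The helper pads to three digits so that results below 1 keep a leading zero, and it truncates instead of rounding.

```shell
#!/usr/bin/env bash
# Sketch: the loop from the question without subshells or external commands.
# All values below are made-up sample data.

modes_csv="9,12,18"            # stands in for ${modes[$mmode,2]}
f=/tmp/pictures/sample.png     # hypothetical input file
totpix=2000
xcolorall=0

# percent PART TOTAL: store PART as a percentage of TOTAL (two decimals,
# truncated) in the global variable percent_result. Hypothetical helper.
percent() {
    local scaled cut
    scaled=$(( 100 * 100 * $1 / $2 ))    # scale first, integer-divide last
    printf -v scaled '%03d' "$scaled"    # pad so that 5 becomes 005
    cut=$(( ${#scaled} - 2 ))
    percent_result=${scaled:0:cut}.${scaled:cut}
}

filename=${f##*/}                        # replaces $(basename "$f")

IFS=, read -r -a kiks <<< "$modes_csv"   # split once instead of cut per pass
for kik in "${kiks[@]}"; do
    storednr=$(( kik * 10 ))             # placeholder for the real pixel count
    xcolorall=$(( xcolorall + storednr ))
    percent "$xcolorall" "$totpix"
    printf '%s colour %s: %s%%\n' "$filename" "$kik" "$percent_result"
done
```

Returning the result in a variable instead of printing it avoids the $(...) that a printing function would need at the call site.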
But if speed really matters, write the script in another language. I think awk, perl, or python would be a good match here.

Related

Bash Script: Decimal increments in loop (cannot do) [duplicate]

Here is my script:
d1=0.003
d2=0.0008
d1d2=$((d1 + d2))
mean1=7
mean2=5
meandiff=$((mean1 - mean2))
echo $meandiff
echo $d1d2
But instead of getting my intended output of:
0.0038
2
I am getting the error Invalid Arithmetic Operator, (error token is ".003")?
bash does not support floating-point arithmetic. You need to use an external utility like bc.
# Like everything else in shell, these are strings, not
# floating-point values
d1=0.003
d2=0.0008
# bc parses its input to perform math
d1d2=$(echo "$d1 + $d2" | bc)
# These, too, are strings (not integers)
mean1=7
mean2=5
# $((...)) is a built-in construct that can parse
# its contents as integers; valid identifiers
# are recursively resolved as variables.
meandiff=$((mean1 - mean2))
Another way to do floating-point arithmetic is awk, which by default prints results rounded to six significant digits, for example:
a=502.709672592
b=501.627497268
echo "$a $b" | awk '{print $1 - $2}'
1.08218
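The six-digit result comes from awk's default output format for print (OFMT, "%.6g"). If more digits are needed, printf inside awk gives explicit control. A small sketch, where fsub is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# fsub A B: print A - B with nine decimal places.
# Passing the values with -v keeps them out of the awk program text.
fsub() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.9f\n", a - b }'
}

fsub 502.709672592 501.627497268
```

The number of decimals in the format string is an arbitrary choice for this example.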
In case you do not need floating point precision, you may simply strip off the decimal part.
echo $var | cut -d "." -f 1 | cut -d "," -f 1
keeps only the integer part of the value. cut is used twice because some regional settings use a dot as the decimal separator and others use a comma.
Edit:
Or, to pick up the regional setting automatically, use locale:
echo "$var" | cut -d "$(locale decimal_point)" -f 1
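The same truncation can be sketched with parameter expansion alone, which avoids starting cut. The helper name int_part is made up for this example, and the pattern assumes that a dot or a comma is only ever used as the decimal separator:

```shell
#!/usr/bin/env bash
# int_part VALUE: drop everything from the first '.' or ',' onwards.
int_part() {
    printf '%s\n' "${1%%[.,]*}"
}

int_part 3.75
int_part 12,5
```

This truncates rather than rounds, just like the cut pipeline.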
You can change the shell you are using. If you run your script with bash (bash scriptname.sh), try running it with ksh instead. Bash doesn't support arithmetic on floating-point numbers, but ksh93 does.
Big shout-out to the bc command - it totally saved my day! It's a simple answer, but it worked like a charm.
a=1.1
b=1.1
echo $a + $b | bc -l
# Output:
2.2
#SUM
sum=$(echo $a + $b | bc -l)
echo $sum
# Output
2.2
bc is a command-line calculator, which allows users to perform mathematical calculations on the terminal.

More convenient way to do arithmetic with program output at the shell?

I usually need to run programs to do some file checking, like say use wc to count the lines of a file and then do some arithmetic with it. Usually the way I do this is just getting the output and then doing the arithmetic by opening a python terminal or whichever software can be used to do so.
If I have to do it many times, then this gets a bit annoying, and I'd like to have some method to take the output directly and do the arithmetic that I want. For instance, one that I like is using perl in the following way, assuming I have to take the output of wc and divide it by 12:
perl -e 'print `wc -l file`/12'
This can be useful but gets annoying after a while. Since this is probably something people need to do all the time, I'd like to know what quicker methods people use. I've seen that expr might be even better, but I get a syntax error when passing it the output of something wrapped in backticks, like above. So basically I'm looking for the shortest, most efficient way to do this kind of simple arithmetic on command output in a Linux terminal.
Double parentheses ((...)) perform arithmetic, and with a dollar sign $((...)) you can get the result as a string.
echo $((`wc -l < file` / 12))
echo $(($(wc -l < file) / 12))
You can use variables and they don't need dollar signs. Both var and $var are acceptable:
lines=$(wc -l < file)
echo $((lines / 12))
if ((lines * 42 + 17 > 630)); then
...
fi
I tested the following in bash.
Multiline code:
a=$(echo "hi" | wc -l)
echo $a
b=`expr $a + 2`
echo $b
Which I then condensed into one line:
echo `expr $(echo "hi" | wc -l) + 20`
echo "hi" | wc -l counts the number of lines; wrapping it in $() substitutes its output into the command line.
expr then takes its operands as separate arguments, so make sure to put a space before and after the operator. The backticks (`) evaluate the expr command, and echo prints the result.

Finding a line that shows in a file only once

Assuming that I have files with 100 lines. There are a lot of lines that repeat themselves in the file, and only one line that does not.
I want to find the line that shows only once. Is there a command for that or do I have to build some complicated loop as below?
My code so far:
#!/bin/bash
filename="repeat_lines.txt"
var="$(wc -l <$filename )"
echo "length:" $var
#cp ex4.txt ex4_copy.txt
for((index=0; index < var; index++));
do
one="$(head -n $index $filename | tail -1)"
counter=0
for((index2=0; index2 < var; index2++));
do
two="$(head -n $index2 $filename | tail -1)"
if [ "$one" == "$two" ]; then
counter=$((counter+1))
fi
done
echo $one"is "$counter" times in the text: "
done
If I understood your question correctly, then
sort repeat_lines.txt | uniq -u should do the trick.
e.g. for file containing:
a
b
a
c
b
it will output c.
For further reference, see sort manpage, uniq manpage.
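If the original line order matters, or sorting is undesirable, a single awk pass is one alternative. This is a sketch rather than a replacement for the pipeline above; unique_lines is a hypothetical helper name, and the whole input is held in memory:

```shell
#!/usr/bin/env bash
# unique_lines [FILE...]: print lines that occur exactly once, in input order.
unique_lines() {
    awk '
        { count[$0]++; line[NR] = $0 }      # count each line, remember order
        END {
            for (i = 1; i <= NR; i++)
                if (count[line[i]] == 1)
                    print line[i]
        }
    ' "$@"
}

printf 'a\nb\na\nc\nb\n' | unique_lines
```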
You've got a reasonable answer that uses standard shell tools sort and uniq. That's probably the solution you want to use, if you want something that is portable and doesn't require bash.
But an alternative would be to use functionality built into your bash shell. One method might be to use an associative array, which is a feature of bash 4 and above.
$ cat file.txt
a
b
c
a
b
$ declare -A lines
$ while read -r x; do ((lines[$x]++)); done < file.txt
$ for x in "${!lines[@]}"; do [[ ${lines["$x"]} -gt 1 ]] && unset lines["$x"]; done
$ declare -p lines
declare -A lines='([c]="1" )'
What we're doing here is:
declare -A creates the associative array. This is the bash 4 feature I mentioned.
The while loop reads each line of the file, and increments a counter that uses the content of a line of the file as the key in the associative array.
The for loop steps through the array, deleting any element whose counter is greater than 1.
declare -p prints the details of an array in a predictable, re-usable format. You could alternately use another for loop to step through the remaining array elements (of which there might be only one) in order to do something with them.
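The steps above can be wrapped into a function that prints the remaining lines instead of dumping the array. This is a sketch with a made-up name, print_singletons. It prefixes every key with "x" so that an empty line is still a valid array key, and the output order is whatever order bash returns the keys in.

```shell
#!/usr/bin/env bash
# print_singletons: read lines from stdin, print those seen exactly once.
# Needs bash 4 or newer for associative arrays.
print_singletons() {
    local -A count=()
    local line key
    while IFS= read -r line; do
        key="x$line"
        count[$key]=$(( ${count[$key]:-0} + 1 ))
    done
    for key in "${!count[@]}"; do
        if [[ ${count[$key]} -eq 1 ]]; then
            printf '%s\n' "${key#x}"     # strip the prefix again
        fi
    done
}

printf 'a\nb\nc\na\nb\n' | print_singletons
```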
Note that this solution, while fine for small files (say, up to a few thousand lines), may not scale well for very large files of, say, millions of lines. Bash isn't the fastest at reading input this way, and one must be cognizant of memory limits when using arrays.
The sort alternative has the benefit of memory optimization using files on disk for extremely large files, at the expense of speed.
If you're dealing with files of only a few hundred lines, then it's hard to predict which solution will be faster. In the end, the form of output may dictate your choice of solution. The sort | uniq pipe generates a list to standard output. The bash solution above generates the same list as keys in an array. Otherwise, they are functionally equivalent.

How to monitor CPU usage automatically and return results when it reaches a threshold

I am new to shell scripting. I want to write a script that monitors CPU usage and, when the usage reaches a threshold, prints the output of the top command. Here is my script, which gives me a "bad number" error and also stores nothing in the log files:
while sleep 1;do if [ "$(top -n1 | grep -i ^cpu | awk '{print $2}')">>sy.log - ge "$Threshold" ]; then echo "$(top -n1)">>sys.log;fi;done
Your script HAS to be indented and stored to a file, especially if you are new to shell !
#!/bin/sh
while sleep 1
do
    if [ "$(top -n1 | grep -i ^cpu | awk '{print $2}')">>sy.log - ge "$Threshold" ]
    then
        echo "$(top -n1)" >> sys.log
    fi
done
Your condition looks a bit odd. It may work, but it looks really complex. Store intermediate results in variables, and evaluate them.
Then, you will immediately see the syntax error on the “-ge”.
You HAVE to store logfiles within an absolute path for security reasons. Use variables to simplify the reading.
#!/bin/sh
LOGFILE=/absolute_path/sy.log
WHOLEFILE=/absolute_path/sys.log
Threshold=80
while sleep 1
do
    TOP="$(top -n1)"
    # Quote $TOP so the line breaks survive and grep can anchor on ^cpu
    CPU="$(echo "$TOP" | grep -i ^cpu | awk '{print $2}')"
    echo "$CPU" >> "$LOGFILE"
    if [ "$CPU" -ge "$Threshold" ] ; then
        echo "$TOP" >> "$WHOLEFILE"
    fi
done
You have a couple of errors.
If you write output to sy.log with a redirection then that output is no longer available to the shell. You can work around this with tee.
The dash in -ge must not be followed by a space; - ge is parsed as two separate words.
Also, a few stylistic remarks:
grep x | awk '{y}' is a useless use of grep; this can usefully and more economically (as well as more elegantly) be rewritten as awk '/x/{y}'
echo "$(command)" is a useless use of echo -- not a deal-breaker, but you simply want command; there is no need to capture what it prints to standard output just so you can print that text to standard output.
If you are going to capture the output of top -n 1 anyway, there is no need really to run it twice.
Further notes:
If you know the capitalization of the field you want to extract, maybe you don't need to search case-insensitively. (I could not find a version of top which prints a CPU prefix with the load in the second field. Is the expression really correct?)
The shell only supports integer arithmetic. Is this a bug? Maybe you want to use Awk (which has floating-point support) to perform the comparison? This also allows for a moderately tricky refactoring. We make Awk output an exit code of 1 if the comparison fails, and use that as the condition for the if.
#!/bin/sh
while sleep 1
do
    if top=$(top -n 1 |
        awk -v thres="$Threshold" '1; # print every line
            tolower($1) ~ /^cpu/ { print $2 >>"sy.log";
                exitcode = ($2 >= thres ? 0 : 1) }
            END { exit exitcode }')
    then
        echo "$top" >>sys.log
    fi
done
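The exit-code idea can also be isolated into a small helper, which makes the floating-point comparison easy to try on its own. A sketch, where at_least is a hypothetical name:

```shell
#!/usr/bin/env bash
# at_least VALUE THRESHOLD: succeed (exit status 0) when VALUE >= THRESHOLD.
# Adding 0 forces awk to compare the values as numbers, not as strings.
at_least() {
    awk -v value="$1" -v limit="$2" 'BEGIN { exit !(value + 0 >= limit + 0) }'
}

if at_least 80.5 80; then
    echo "threshold reached"
fi
```

This costs one awk process per comparison, which is negligible in a loop that sleeps for a second on every pass.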
Do you really mean to have two log files with nearly the same name, or is that a typo? Including a time stamp in the log might be useful both for troubleshooting and for actually using the log files.

Bash script to copy numbered files in reverse order

I have a sequence of files:
image001.jpg
image002.jpg
image003.jpg
Can you help me with a bash script that copies the images in reverse order so that the final result is:
image001.jpg
image002.jpg
image003.jpg
image004.jpg <-- copy of image003.jpg
image005.jpg <-- copy of image002.jpg
image006.jpg <-- copy of image001.jpg
The text after the arrow is not part of the file name.
Why do I need it? I am creating video from a sequence of images and would like the video to play "forwards" and then "backwards" (looping the resulting video).
You can use printf to print a number with leading 0s.
$ printf '%03d\n' 1
001
$ printf '%03d\n' 2
002
$ printf '%03d\n' 3
003
Throwing that into a for loop yields:
MAX=6
for ((i=1; i<=MAX; i++)); do
cp $(printf 'image%03d.jpg' $i) $(printf 'image%03d.jpg' $((MAX-i+1)))
done
I think that I'd use an array for this... that way, you don't have to hard code a value for $MAX.
image=( image*.jpg )
MAX=${#image[@]}
for i in "${image[@]}"
do
    num=${i:5:3} # grab the digits
    complement=$(printf '%03d' "$(echo "$MAX-$num" | bc)")
    ln "$i" "copy_of_image$complement.jpg"
done
I used 'bc' for arithmetic because bash interprets leading 0s as an indicator that the number is octal, and the parameter expansion in bash isn't powerful enough to strip them without jumping through hoops. I could have done that in sed, but as long as I was calling something outside of bash, it made just as much sense to do the arithmetic directly.
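For comparison, bash arithmetic also accepts a base prefix of the form base#number, so 10#008 is read as decimal 8 without calling an external program. A minimal sketch, where next_name is a made-up helper:

```shell
#!/usr/bin/env bash
# next_name NNN: print the file name that follows imageNNN.jpg.
# The 10# prefix makes bash read the zero-padded digits as base 10.
next_name() {
    printf 'image%03d.jpg\n' "$(( 10#$1 + 1 ))"
}

next_name 008
```

The prefix only works for unsigned digit strings; a leading minus sign would need to be handled separately.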
I suppose that Kuegelman's script could have done something like this:
MAX=$(ls image*.jpg | wc -l)
That script has bigger problems though, because it's overwriting half of the images:
cp image001.jpg image006.jpg # wait wait!!! what happened to image006.jpg???
Also, once you do arithmetic on zero-padded numbers above 007 (such as 008), you run into the octal problem.
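One way to avoid the overwriting is to number the copies after the highest existing file. The sketch below assumes the files are named image001.jpg to imageNNN.jpg without gaps and that three digits are enough for the result; reverse_copy is a hypothetical function name.

```shell
#!/usr/bin/env bash
# reverse_copy DIR: for image001..imageN in DIR, create image(N+1)..image(2N)
# as copies in reverse order, leaving the originals untouched.
reverse_copy() {
    local dir=$1 files n i dst
    files=( "$dir"/image[0-9][0-9][0-9].jpg )   # glob results are sorted
    [[ -e ${files[0]} ]] || return 1            # nothing matched
    n=${#files[@]}
    for (( i = 0; i < n; i++ )); do
        # New numbers start right after the highest existing one.
        printf -v dst '%s/image%03d.jpg' "$dir" "$(( n + i + 1 ))"
        cp -- "${files[n - 1 - i]}" "$dst"
    done
}
```

Calling reverse_copy . in a directory holding image001.jpg to image003.jpg would create image004.jpg as a copy of image003.jpg, and so on down to image006.jpg as a copy of image001.jpg. The file list is captured before any copy is made, so the new files are not picked up by the loop.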

Resources