Script gives me Syntax error: Bad fd number - linux

I have a sh script that throws me the 'bad fd number' error, and I can't really find a solution. So this is my script:
rr_path='RR/'
filename='sample_generation.txt'
logdir='RR/'
bash RR/compute-RR.sh $filename
And this is the error:
sample_generation.txt RR: RR/nonSingletonesRate.sh: 40: Syntax error:
Bad fd number
According to the error, something is wrong in the nonSingletonesRate.sh, which is this one:
#! /bin/bash
# This script computes the rate of non-singletone events (N-grams with
# n=1,2,3,4) of the input textual file. Computation is averaged on
# equally-sized non-overlapped sub-samples, spanning the whole text
#
# Its results can be used to compute the repetition rate of the text
# as geometric mean by means of /hltsrv0/cettolo/bin/geomMean.pl
bindir=`dirname $0`
irstlmdir=/hltsrv1/software/irstlm/irstlm-5.80.01
# Size of subsamples:
sampleSize=1000
fn=$1
out=`basename $fn`
echo $out
for n in 1 2 3 4
do
ls $fn
echo "n=$n"
totN=0
totC=0
offset=0
while [ $offset -gt -1 ]; do
offset=`perl $bindir/selectSubSample.pl $fn __subsample__$$ $sampleSize $offset`
cat __subsample__$$ | perl $bindir/word2ngrams.pl $n > __tmp__$$
$irstlmdir/bin/dict -cs=3 -f=yes -i=__tmp__$$ -sort=yes -c=yes >& __tmp.histo__$$
N=`egrep '^>0' __tmp.histo__$$ | awk '{print $2}'`
totN=`expr $totN + $N`
C=`egrep '^>1' __tmp.histo__$$ | awk '{print $2}'`
totC=`expr $totC + $C`
done
echo $totC $totN | awk '{print $0,100*$1/$2}'
echo ""
done
rm __tmp__$$ __tmp.histo__$$ __subsample__$$
exit
From what I've seen online, it could be a problem of having a >&, and it should be solved by changing it to >. So changing this line should be enough:
$irstlmdir/bin/dict -cs=3 -f=yes -i=__tmp__$$ -sort=yes -c=yes > __tmp.histo__$$
But if I do this, then I get the error:
RR/nonSingletonesRate.sh: 31:
/hltsrv1/software/irstlm/irstlm-5.80.01/bin/dict: not found
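A note on the >& form mentioned above: cmd >& file is a bash (and csh) shorthand that sends both standard output and standard error to the file. A shell such as dash, which is what /bin/sh points to on Debian and Ubuntu, does not accept it and reports "Bad fd number". The line-numbered "Syntax error" format of the message suggests, but does not prove, that the script was run by such a shell rather than by bash. Below is a minimal sketch of the portable spelling, using a made-up function emit_both in place of the dict command:

```shell
#!/bin/sh
# emit_both is a hypothetical stand-in for a command that writes to
# both standard output and standard error.
emit_both() {
    echo "to stdout"
    echo "to stderr" >&2
}

log=$(mktemp)

# Portable equivalent of bash's `emit_both >& "$log"`:
# first point stdout at the file, then duplicate stderr onto stdout.
emit_both > "$log" 2>&1

captured=$(cat "$log")
rm -f "$log"
printf '%s\n' "$captured"
```

Note that changing >& to a plain > keeps only standard output in the file; anything written to standard error then goes to the terminal instead.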

Related

How to efficiently loop through the lines of a file in Bash?

I have a file example.txt with about 3000 lines with a string in each line. A small file example would be:
>cat example.txt
saudifh
sometestPOIFJEJ
sometextASLKJND
saudifh
sometextASLKJND
IHFEW
foo
bar
I want to check all repeated lines in this file and output them. The desired output would be:
>checkRepetitions.sh
found two equal lines: index1=1 , index2=4 , value=saudifh
found two equal lines: index1=3 , index2=5 , value=sometextASLKJND
I made a script checkRepetitions.sh:
#!/bin/bash
size=$(cat example.txt | wc -l)
for i in $(seq 1 $size); do
i_next=$((i+1))
line1=$(cat example.txt | head -n$i | tail -n1)
for j in $(seq $i_next $size); do
line2=$(cat example.txt | head -n$j | tail -n1)
if [ "$line1" = "$line2" ]; then
echo "found two equal lines: index1=$i , index2=$j , value=$line1"
fi
done
done
However, this script is very slow: it takes more than 10 minutes to run, while in Python it takes less than 5 seconds. I tried to store the file in memory by doing lines=$(cat example.txt) and doing line1=$(cat $lines | cut -d',' -f$i), but this is still very slow.
When you do not want to use awk (a good tool for the job, parsing the input only once),
you can run through the lines several times. Sorting is expensive, but this solution avoids the loops you tried.
grep -Fnxf <(uniq -d <(sort example.txt)) example.txt
With uniq -d <(sort example.txt) you find all lines that occur more than once. Next, grep searches for these lines (option -f reads the patterns from the given file) as complete lines (-x), treating them as fixed strings rather than regular expressions (-F), and prints the line number of each occurrence (-n).
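As a quick check of the pipeline above, the sketch below recreates the sample file from the question and runs the same steps. It writes the duplicated lines to a temporary file instead of using process substitution, an assumption made only so that the example also runs in shells without <( ); the file name dups.txt is arbitrary.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input from the question.
printf '%s\n' saudifh sometestPOIFJEJ sometextASLKJND saudifh \
    sometextASLKJND IHFEW foo bar > example.txt

# Step 1: lines that occur more than once (uniq -d needs sorted input).
sort example.txt | uniq -d > dups.txt

# Step 2: print every occurrence with its line number.
#   -F fixed strings, -x whole-line match, -n line numbers, -f patterns from file
result=$(grep -Fnxf dups.txt example.txt)
printf '%s\n' "$result"

cd / && rm -rf "$workdir"
```

For the sample input this prints one line per occurrence in the form number:text, for example 1:saudifh and 4:saudifh.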
See why-is-using-a-shell-loop-to-process-text-considered-bad-practice for some of the reasons why your script is so slow.
$ cat tst.awk
{ val2hits[$0] = val2hits[$0] FS NR }
END {
    for (val in val2hits) {
        numHits = split(val2hits[val],hits)
        if ( numHits > 1 ) {
            printf "found %d equal lines:", numHits
            for ( hitNr=1; hitNr<=numHits; hitNr++ ) {
                printf " index%d=%d ,", hitNr, hits[hitNr]
            }
            print " value=" val
        }
    }
}
$ awk -f tst.awk file
found 2 equal lines: index1=1 , index2=4 , value=saudifh
found 2 equal lines: index1=3 , index2=5 , value=sometextASLKJND
To give you an idea of the performance difference using a bash script that's written to be as efficient as possible and an equivalent awk script:
bash:
$ cat tst.sh
#!/bin/bash
case $BASH_VERSION in ''|[123].*) echo "ERROR: bash 4.0 required" >&2; exit 1;; esac
# initialize an associative array, mapping each string to the last line it was seen on
declare -A lines=( )
lineNum=0
while IFS= read -r line; do
    (( ++lineNum ))
    if [[ ${lines[$line]} ]]; then
        printf 'Content previously seen on line %s also seen on line %s: %s\n' \
            "${lines[$line]}" "$lineNum" "$line"
    fi
    lines[$line]=$lineNum
done < "$1"
$ time ./tst.sh file100k > ou.sh
real 0m15.631s
user 0m13.806s
sys 0m1.029s
awk:
$ cat tst.awk
lines[$0] {
printf "Content previously seen on line %s also seen on line %s: %s\n", \
lines[$0], NR, $0
}
{ lines[$0]=NR }
$ time awk -f tst.awk file100k > ou.awk
real 0m0.234s
user 0m0.218s
sys 0m0.016s
There are no differences between the outputs of the two scripts:
$ diff ou.sh ou.awk
$
The above is using 3rd-run timing to avoid caching issues and being tested against a file generated by the following awk script:
awk 'BEGIN{for (i=1; i<=10000; i++) for (j=1; j<=10; j++) print j}' > file100k
When the input file had zero duplicate lines (generated by seq 100000 > nodups100k) the bash script executed in about the same amount of time as it did above while the awk script executed much faster than it did above:
$ time ./tst.sh nodups100k > ou.sh
real 0m15.179s
user 0m13.322s
sys 0m1.278s
$ time awk -f tst.awk nodups100k > ou.awk
real 0m0.078s
user 0m0.046s
sys 0m0.015s
To demonstrate a relatively efficient (within the limits of the language and runtime) native-bash approach, which you can see running in an online interpreter at https://ideone.com/iFpJr7:
#!/bin/bash
case $BASH_VERSION in ''|[123].*) echo "ERROR: bash 4.0 required" >&2; exit 1;; esac
# initialize an associative array, mapping each string to the last line it was seen on
declare -A lines=( )
lineNum=0
while IFS= read -r line; do
    lineNum=$(( lineNum + 1 ))
    if [[ ${lines[$line]} ]]; then
        printf 'found two equal lines: index1=%s, index2=%s, value=%s\n' \
            "${lines[$line]}" "$lineNum" "$line"
    fi
    lines[$line]=$lineNum
done <example.txt
Note the use of while read to iterate line-by-line, as described in BashFAQ #1: How can I read a file line-by-line (or field-by-field)?; this permits us to open the file only once and read through it without needing any command substitutions (which fork off subshells) or external commands (which need to be individually started up by the operating system every time they're invoked, and are likewise expensive).
The other part of the improvement here is that we're reading the whole file only once -- implementing an O(n) algorithm -- as opposed to running O(n^2) comparisons as the original code did.
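To make the while read idiom concrete, here is a small sketch (the function name count_and_show is made up for illustration) that reads a file once without starting any external command inside the loop. IFS= keeps leading and trailing whitespace, and -r keeps backslashes literal; the extra [ -n "$line" ] test picks up a last line that has no trailing newline.

```shell
#!/bin/sh
# count_and_show FILE: hypothetical helper that prints each line
# of FILE prefixed with its line number, in a single pass.
count_and_show() {
    n=0
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        printf '%d:%s\n' "$n" "$line"
    done < "$1"
}

tmp=$(mktemp)
# Three lines: leading spaces, a backslash, and no final newline.
printf '  indented\nback\\slash\nlast' > "$tmp"
numbered=$(count_and_show "$tmp")
rm -f "$tmp"
printf '%s\n' "$numbered"
```

The only processes started here are the ones outside the loop, which is where the speed-up over the head/tail approach comes from.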

BASH: compare the size to a percent part of another file's size

I have the following situation:
A=$(df / | awk 'END{print $4}')
B=$(du -s /tmp | awk '{print $1}')
The condition is to alert when $B is less than 10% of $A.
The approach I tried below doesn't seem to recognize '-lt':
A=$(df / | awk 'END{print $4}')
B=$(du -s /tmp | awk '{print $1}')
if $(($A / $B)) -lt '10'
then echo "Bad case"
fi
: line 8: syntax error near unexpected token `-lt'
: line 8: `if (($A / $B)) -lt '10''
Any ideas how can it be achieved?
A=$(df / | awk 'END{print $4}')
B=$(du -s /tmp | awk '{print $1}')
echo "$A" "$B" | awk '{ if (10 * $2 < $1) print "Bad case" }'
You forgot the [ ] in the if statement:
if [ $(($A / $B)) -lt '10' ]
then echo "Bad case"
fi
if (( A < 10 * B ))
then
echo "Bad case"
fi
In bash, you can't do fractional arithmetic. Hence you should either rewrite the expression so that it doesn't need fractions or use zsh, which has floating point arithmetic.
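One way to rewrite the test so that it needs no fractions is to multiply instead of divide: following the wording of the question, "B is less than 10% of A" is the same as "10 * B < A". Here is a sketch with a hypothetical helper below_ten_percent and made-up sizes rather than live df and du output:

```shell
#!/bin/sh
# below_ten_percent A B: succeeds when B is less than 10% of A.
# Integer-only: compares 10*B with A instead of computing B/A.
below_ten_percent() {
    [ $(( 10 * $2 )) -lt "$1" ]
}

A=1000000   # stand-in for the free space reported by df
B=50000     # stand-in for the size reported by du

if below_ten_percent "$A" "$B"; then
    verdict="Bad case"
else
    verdict="OK"
fi
echo "$verdict"
```

If the intended condition is the opposite one, only the direction of the comparison changes; the multiplication trick stays the same.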

Bash integer expression on IF statement when retrieving free memory

I am getting the below error while running the script memory.sh script:
[root@test tmp]# ./memory.sh
./memory.sh: line 3: [: 2.05028: integer expression expected
Normal
The content of the memory.sh script is:
[root@test tmp]# cat memory.sh
threshold=80
MEMORY=$(free | grep Mem | awk '{print $3/$2 * 100.0}')
if [ ${MEMORY} -gt ${threshold} ]; then
sudo sync;echo 3 > /proc/sys/vm/drop_caches
else
echo "Normal"
fi
Does anyone know how to prevent this error?
The error in the if statement occurs because the memory computation returns a
floating-point value, while [ ... -gt ... ] only accepts integers.
The following script casts the float memory value to an integer for the comparison:
#!/bin/bash
threshold=80
memory=$(free | grep Mem | awk '{print $3/$2 * 100.0}')
castedMemory=$(echo $memory | cut -d'.' -f1)
if [[ "$castedMemory" -gt $threshold ]]; then
sudo sync
echo 3 > /proc/sys/vm/drop_caches
else
echo "Normal"
fi
It looks like you need to convert the result of the computation to an integer to be able to compare it. How about this:
#!/bin/bash
threshold=80
MEMORY=$(free | grep Mem | awk '{print int($3/$2 * 100.0)}')
if [ ${MEMORY} -gt ${threshold} ]
then
sudo sync;echo 3 > /proc/sys/vm/drop_caches
else
echo "Normal"
fi
This does not throw an error for me but outputs a friendly "Normal" ;-)
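Both answers rely on the same idea: [ ... -gt ... ] only accepts integers, so the percentage has to be an integer before the comparison. The sketch below isolates that step, with made-up numbers in place of the free output (the function name used_percent is hypothetical):

```shell
#!/bin/sh
# used_percent USED TOTAL: integer percentage, truncated by awk's int()
used_percent() {
    awk -v used="$1" -v total="$2" 'BEGIN { print int(used / total * 100) }'
}

threshold=80
memory=$(used_percent 1650000 2000000)   # sample values, not real free output

if [ "$memory" -gt "$threshold" ]; then
    state="High"
else
    state="Normal"
fi
echo "$memory $state"
```

Because int() truncates, 82.5 becomes 82; the same is true of the cut -d'.' -f1 approach.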

Find the first missing file in a series of numbered files

I have a directory containing files:
$> ls blender/output/celebAnim/
0100.png 0107.png 0114.png 0121.png 0128.png 0135.png 0142.png 0149.png 0156.png 0163.png 0170.png 0177.png 0184.png 0191.png 0198.png 0205.png 0212.png 0219.png 0226.png 0233.png 0240.png 0247.png 0254.png 0261.png 0268.png 0275.png 0282.png
0101.png 0108.png 0115.png 0122.png 0129.png 0136.png 0143.png 0150.png 0157.png 0164.png 0171.png 0178.png 0185.png 0192.png 0199.png 0206.png 0213.png 0220.png 0227.png 0234.png 0241.png 0248.png 0255.png 0262.png 0269.png 0276.png 0283.png
0102.png 0109.png 0116.png 0123.png 0130.png 0137.png 0144.png 0151.png 0158.png 0165.png 0172.png 0179.png 0186.png 0193.png 0200.png 0207.png 0214.png 0221.png 0228.png 0235.png 0242.png 0249.png 0256.png 0263.png 0270.png 0277.png 0284.png
0103.png 0110.png 0117.png 0124.png 0131.png 0138.png 0145.png 0152.png 0159.png 0166.png 0173.png 0180.png 0187.png 0194.png 0201.png 0208.png 0215.png 0222.png 0229.png 0236.png 0243.png 0250.png 0257.png 0264.png 0271.png 0278.png
0104.png 0111.png 0118.png 0125.png 0132.png 0139.png 0146.png 0153.png 0160.png 0167.png 0174.png 0181.png 0188.png 0195.png 0202.png 0209.png 0216.png 0223.png 0230.png 0237.png 0244.png 0251.png 0258.png 0265.png 0272.png 0279.png
0105.png 0112.png 0119.png 0126.png 0133.png 0140.png 0147.png 0154.png 0161.png 0168.png 0175.png 0182.png 0189.png 0196.png 0203.png 0210.png 0217.png 0224.png 0231.png 0238.png 0245.png 0252.png 0259.png 0266.png 0273.png 0280.png
0106.png 0113.png 0120.png 0127.png 0134.png 0141.png 0148.png 0155.png 0162.png 0169.png 0176.png 0183.png 0190.png 0197.png 0204.png 0211.png 0218.png 0225.png 0232.png 0239.png 0246.png 0253.png 0260.png 0267.png 0274.png 0281.png
For some script, I will need to find out what the number of the first missing file is. In the above output, it would be 0285.png. However, it is also possible that files in between are missing. In the end, I am only interested in the number 285, which is part of the file name.
This is part of recovery logic: The files should be created by the script, but this step can fail. Therefore I want to have a means to check which files are missing and try to create them in a second step.
This is what I got so far (from how to extract part of a filename before '.' or before extension):
ls blender/output/celebAnim/ | awk -F'[.]' '{print $1}'
What I cannot figure out is how do I find the smallest number missing from that result, above a certain offset? The offset in this case is 100.
You could loop over all numbers from 100 to 500 and check if the corresponding file exists; if it doesn't, you'd print the number you're looking at:
for i in {100..500}; do
[[ ! -f 0$i.png ]] && { echo "$i missing!"; break; }
done
This prints, for your example, 285 missing!.
This solution could be made a bit more flexible by, for example, looping over zero padded numbers and then extracting the unpadded number:
for i in {0100..0500}; do
[[ ! -f $i.png ]] && { echo "${i##*(0)} missing!"; break; }
done
This requires extended globs (shopt -s extglob) for the *(0) pattern ("zero or more repetitions of 0").
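The same loop can also be wrapped in a function that takes the directory and the range as arguments and builds the zero-padded name with printf, which avoids the need for extglob. This is only a sketch; the function name first_missing and the four-digit padding are assumptions based on the file names in the question.

```shell
#!/bin/sh
# first_missing DIR START END: print the first number in START..END
# for which DIR/NNNN.png does not exist; return 1 if none is missing.
first_missing() {
    i=$2
    while [ "$i" -le "$3" ]; do
        name=$(printf '%04d.png' "$i")
        if [ ! -f "$1/$name" ]; then
            echo "$i"
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}

# Demo in a scratch directory: 0100..0105 exist, except 0103.
dir=$(mktemp -d)
for n in 0100 0101 0102 0104 0105; do
    : > "$dir/$n.png"
done
missing=$(first_missing "$dir" 100 110)
rm -rf "$dir"
echo "$missing"
```

Keeping the counter unpadded and padding only when the file name is built also sidesteps the question of how to strip the leading zero afterwards.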
begin=100
end=500
for i in `seq $begin 1 $end`; do
fname="0"$i".png"
if [ ! -f $fname ]; then
echo "$fname is missing"
fi
done
#!/bin/bash
search_dir=blender/output/celebAnim/
ls $search_dir > file_list
count=`wc -l file_list | awk '{ print $1 }'`
if [[ $count -eq 0 ]]
then
    echo "No files in given directory!"
    exit 1
fi
file_extension=`head -1 file_list | tail -1 | awk -F "." '{ print $2 }'`
init_file_value=`head -1 file_list | tail -1 | awk -F "." '{ print $1 }'`
i=2
while [ $i -le $count ]
do
    next_file_value=`head -$i file_list | tail -1 | awk -F "." '{ print $1 }'`
    # 10# forces base 10; otherwise a zero-padded value such as 0100 is read as octal
    next_value=$((10#$init_file_value+1));
    if [ $next_file_value -ne $next_value ]
    then
        echo $next_value"."$file_extension
        break
    fi
    init_file_value=$next_value;
    i=$((i+1));
done
Try this:
ls blender/output/celebAnim/ | sort -r | head -n1 | awk -F'.' '{print $1+1}'
The command returns 285, the number after the highest existing file (it does not detect gaps in between).
If you need 0285 instead, try:
ls blender/output/celebAnim/ | sort -r | head -n1 | awk -F'.' '{print 0($1+1)}'
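If the zero padding matters, awk's printf with a %04d format is a more explicit way to get it than concatenating a literal 0. Here is a sketch on a single sample name; like the commands above, it yields the number after the highest existing file and does not look for gaps:

```shell
#!/bin/sh
# awk converts the field "0284" to the decimal number 284 before adding 1
next=$(echo 0284.png | awk -F'.' '{ printf "%04d\n", $1 + 1 }')
echo "$next"
```

With %04d the width stays at four digits for any number below 10000, whereas prepending a fixed 0 only works while the number has exactly three digits.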

Retrieve string between characters and assign on new variable using awk in bash

I'm new to bash scripting and am still learning how commands work. I stumbled on this problem:
I have a file /home/fedora/file.txt
Its contents look like this:
[apple] This is a fruit.
[ball] This is a sport's equipment.
[cat] This is an animal.
What I want is to retrieve the words between "[" and "]".
What I have tried so far is:
while IFS='' read -r line || [[ -n "$line" ]];
do
echo $line | awk -F"[" '{print$2}' | awk -F"]" '{print$1}'
done < /home/fedora/file.txt
I can print the words between "[" and "]".
Then I want to put the echoed word into a variable, but I don't know how.
Any help would be appreciated.
Try this:
variable="$(echo $line | awk -F"[" '{print$2}' | awk -F"]" '{print$1}')"
or
variable="$(awk -F'[\[\]]' '{print $2}' <<< "$line")"
or complete
while IFS='[]' read -r foo fruit rest; do echo $fruit; done < file
or with an array:
while IFS='[]' read -ra var; do echo "${var[1]}"; done < file
In addition to using awk, you can use the native parameter expansion/substring extraction provided by bash. Below, # indicates a trim from the left, while % is used to trim from the right. (Note: a single # or % removes the shortest match of the pattern, while ## or %% removes the longest match.)
#!/bin/bash
[ -r "$1" ] || { ## validate input is readable
printf "error: insufficient input. usage: %s filename\n" "${0##*/}"
exit 1
}
## read each line and separate label and value
while read -r line || [ -n "$line" ]; do
label=${line#[} # trim initial [ from left
label=${label%%]*} # trim through ] from right
value=${line##*] } # trim from left through '] '
printf " %-8s -> '%s'\n" "$label" "$value"
done <"$1"
exit 0
Input
$ cat dat/labels.txt
[apple] This is a fruit.
[ball] This is a sport's equipment.
[cat] This is an animal.
Output
$ bash readlabel.sh dat/labels.txt
apple -> 'This is a fruit.'
ball -> 'This is a sport's equipment.'
cat -> 'This is an animal.'
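The difference between the single and double forms of # and % only shows up when the pattern can match in more than one place. Here is a short sketch with a made-up string that contains two bracketed words:

```shell
#!/bin/sh
s='[apple] This is [a] fruit.'

short_suffix=${s%]*}      # drop shortest suffix matching "]*"
long_suffix=${s%%]*}      # drop longest suffix matching "]*"
short_prefix=${s#*\[}     # drop shortest prefix matching "*["
long_prefix=${s##*\[}     # drop longest prefix matching "*["

printf '%s\n' "$short_suffix" "$long_suffix" "$short_prefix" "$long_prefix"
```

This is why the script above uses %% to cut the label at the first ] on the line: a single % would cut at the last one instead.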

Resources