Move a file if all 6th fields are 0 - linux

I want to check whether the 6th field of every record in a file is 0 and, if so, move the file.
I wrote the script below and it runs without errors, but it does not move the file. Can anyone suggest why?
#!/bin/bash
result=`cat conc_upld_atp.11002.20141204151900.dat | awk -F , '{ print $6 }' | uniq`
if [ result = "1" ]; then
mv conc_upld_atp.11002.20141204151900.dat home/stephenb/scripttest
fi

In Bash, = compares strings; to compare integers you need -eq. You also need the $ to expand the variable, since a bare result is just the literal string "result":
if [ "$result" -eq 1 ]; then
Note that it is preferred to say var=$(command). Also, your command cat file | awk '...' can be simplified to just awk '...' file. And depending on what exactly you want to do, probably awk can handle all of it.
For example, if you just want to check whether any of the 6th fields is not 0, use:
awk -F, '$6 != 0 {v=1} END {print v+0}' file
and then the rest of your code.
However, you can do it in an extremely fast way by using what 999999999999999999999999999999 suggested in comments:
awk -F, '$6!=0{exit 1}' file && mv file newfile
This loops through the file and exits with a non-zero status as soon as a line contains a 6th field different from 0. If that never happens, awk's exit code is 0, so the command after && runs and mv file newfile happens. You can even handle the other case by saying:
awk -F, '$6!=0{exit 1}' file && mv file newfile || echo "bad data"
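The same idea can be wrapped in a small function. This is only a sketch, assuming a comma-separated file; the name move_if_all_zero and its two arguments are made up for illustration. It uses if/else rather than && ... ||, so "bad data" is reported only when the check fails and not when mv itself fails. Note that an empty file passes the check as well.

```shell
#!/bin/bash
# Hypothetical helper (not from the original posts):
# move FILE to DEST only if the 6th comma-separated field of every record is 0.
move_if_all_zero() {
    local file=$1 dest=$2
    # awk stops with exit status 1 at the first record whose 6th field is not 0;
    # if no such record exists it reaches the end of the file and exits with 0.
    if awk -F, '$6 != 0 { exit 1 }' "$file"; then
        mv -- "$file" "$dest"
    else
        echo "bad data" >&2
        return 1
    fi
}
```

It could then be called as move_if_all_zero conc_upld_atp.11002.20141204151900.dat /home/stephenb/scripttest.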

You're wanting to check if all records are 0, but you specifically check result = "1".
I would recommend two things: use numerical comparison and compare against the correct value:
if (( result == 0 )); then

If you want to check whether all 6th fields are 0, then try this script (note the -F , since the file is comma-separated):
result=`cat conc_upld_atp.11002.20141204151900.dat | awk -F , '{ print $6 }' | uniq`
if [ "$result" == "0" ]; then
mv conc_upld_atp.11002.20141204151900.dat /home/stephenb/scripttest
fi

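In the end I used the script below.
#!/bin/bash
result=`cat conc_upld_atp.11002.20141204151900.dat | awk -F , '{ print $6 }' | uniq`
# Quote "$result": unquoted, echo joins all lines into one and wc -l always reports 1
resultcount=`echo "$result" | wc -l`
echo $resultcount
# One distinct value is not enough; that value must also be 0
if [ "$resultcount" -eq 1 ] && [ "$result" = "0" ]; then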
echo match
mv conc_upld_atp.11002.20141204151900.dat /home/stephenb/scripttest
fi

Related

Bash function with input fails awk command

I am writing a function in a BASH shell script, that should return lines from csv-files with headers, having more commas than the header. This can happen, as there are values inside these files, that could contain commas. For quality control, I must identify these lines to later clean them up. What I have currently:
#!/bin/bash
get_bad_lines () {
local correct_no_of_commas=$(head -n 1 $1/$1_0_0_0.csv | tr -cd , | wc -c)
local no_of_files=$(ls $1 | wc -l)
for i in $(seq 0 $(( ${no_of_files}-1 )))
do
# Check that the file exist
if [ ! -f "$1/$1_0_${i}_0.csv" ]; then
echo "File: $1_0_${i}_0.csv not found!"
continue
fi
# Search for error-lines inside the file and print them out
echo "$1_0_${i}_0.csv has over $correct_no_of_commas commas in the following lines:"
grep -o -n '[,]' "$1/$1_0_${i}_0.csv" | cut -d : -f 1 | uniq -c | awk '$1 > $correct_no_of_commas {print}'
done
}
get_bad_lines products
get_bad_lines users
The output of this program is now all the comma-counts with all of the line numbers in all the files,
and I suspect this is due to the input $1 (foldername, i.e. products & users) conflicting with the call to awk with reference to $1 as well (where I wish to grab the first column being the count of commas for that line in the current file in the loop).
Is this the issue? If so, would it be solvable by referring to either the first column or the folder name through a different variable name, instead of both of them using $1?
Example, current output:
5 6667
5 6668
5 6669
5 6670
(should only show lines for that file having more than 5 commas).
I also tried declaring a variable in the call to awk (as in the accepted answer to "Awk field variable clash with function argument"), with the same effect:
get_bad_lines () {
local table_name=$1
local correct_no_of_commas=$(head -n 1 $table_name/${table_name}_0_0_0.csv | tr -cd , | wc -c)
local no_of_files=$(ls $table_name | wc -l)
for i in $(seq 0 $(( ${no_of_files}-1 )))
do
# Check that the file exist
if [ ! -f "$table_name/${table_name}_0_${i}_0.csv" ]; then
echo "File: ${table_name}_0_${i}_0.csv not found!"
continue
fi
# Search for error-lines inside the file and print them out
echo "${table_name}_0_${i}_0.csv has over $correct_no_of_commas commas in the following lines:"
grep -o -n '[,]' "$table_name/${table_name}_0_${i}_0.csv" | cut -d : -f 1 | uniq -c | awk -v table_name="$table_name" '$1 > $correct_no_of_commas {print}'
done
}
You can use awk the full way to achieve that :
get_bad_lines () {
find "$1" -maxdepth 1 -name "$1_0_*_0.csv" | while read -r my_file ; do
awk -v table_name="$1" '
NR==1 { num_comma=gsub(/,/, ""); }
/,/ { if (gsub(/,/, ",", $0) > num_comma) wrong_array[wrong++]=NR":"$0;}
END { if (wrong > 0) {
print(FILENAME" has over "num_comma" commas in the following lines:");
for (i=0;i<wrong;i++) { print(wrong_array[i]); }
}
}' "${my_file}"
done
}
As for why your original awk command did not print only the lines with too many commas: you are using the shell variable correct_no_of_commas inside a single-quoted awk program ('$1 > $correct_no_of_commas {print}'). The shell therefore performs no substitution, and awk sees $correct_no_of_commas as is. More precisely, awk looks up the variable correct_no_of_commas, which is undefined in the awk script and so evaluates to an empty string (0 in a numeric context). awk then treats $correct_no_of_commas as $0, so the condition compares the count in $1 with the whole input line. The whole line has the form <whitespace><count><space><line_number>, which does not look like a single number, so awk falls back to a string comparison, in which a digit sorts after the leading whitespace. Thus $1 > $correct_no_of_commas is always true. Your second attempt has the same problem: -v table_name=... defines an awk variable that the program never uses, while correct_no_of_commas is still undefined inside awk.
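A minimal sketch of the usual fix, which is to hand the shell value to awk with -v and refer to it there without a $. The function name lines_over_limit, the awk variable max and the output format are assumptions made for this illustration, not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper (names are illustrative):
# print "line_number:comma_count" for every line of FILE with more than MAX commas.
lines_over_limit() {
    local file=$1 max=$2
    # -v copies the shell value into the awk variable "max".
    # With -F, a line with NF fields contains NF-1 commas.
    awk -F, -v max="$max" 'NF - 1 > max { print FNR ":" (NF - 1) }' "$file"
}
```

Inside the single-quoted program, max is an awk variable and $1 would be an awk field, so neither can clash with the function's own $1 and $2, which the shell only expands in the double-quoted parts.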
You can identify all the bad lines with a single awk command (ENDFILE is a GNU awk extension, so this needs gawk):
awk -F, 'FNR==1{print FILENAME; headerCount=NF;} NF>headerCount{print} ENDFILE{print "#######\n"}' /path/here/*.csv
If you want the line number also to be printed, use this
awk -F, 'FNR==1{print FILENAME"\nLine#\tLine"; headerCount=NF;} NF>headerCount{print FNR"\t"$0} ENDFILE{print "#######\n"}' /path/here/*.csv

Check if all lines in a file are in the same format

I would like to write a little shell script that checks whether all lines in a file have the same number of ;
I have a file containing the following format :
$ cat filename.txt
34567890;098765456789;098765567;9876;9876;EXTG;687J;
4567800987987;09876789;9667876YH;9876;098765;098765;09876;
SLKL987H;09876LKJ;POIUYT;PÖIUYT;88765K;POIUYTY;LKJHGFDF;
TYUIO;09876LKJ;POIUYT;LKJHG;88765K;POIUYTY;OIUYT;
...
...
...
SDFGHJK;RTYUIO9876;4567890LKJHGFD;POIUYTRF56789;POIUY;POIUYT;9876;
I use the following command to determine the number of ; in each line:
awk -F';' 'NF{print (NF-1)}' filename.txt
I have the following output :
7
7
7
7
...
...
...
7
Because the number of ; on each line of this file is 7.
Now, I want to write a script that lets me verify that all the lines in the file have 7 semicolons. If so, it should tell me that the file is correct. Otherwise, if even a single line contains more than 7 semicolons, it should tell me that the file is not correct.
Rather than printing output, return a value, e.g.
awk -F';' 'NR==1{count = NF} NF!=count{status=1}END{exit status}' filename.txt
If there are no lines or if all lines contain the same number of semicolons, this will return 0. Otherwise, it returns 1 to indicate failure.
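As a sketch of how that exit status might be consumed, the check can sit directly in an if. The function names same_field_count and report_format are invented for this example, and the messages are only placeholders.

```shell
#!/bin/bash
# Hypothetical helper: succeed (status 0) only if every line of FILE has the
# same number of ;-separated fields as its first line.
same_field_count() {
    awk -F';' 'NR == 1 { count = NF }
               NF != count { status = 1 }
               END { exit status + 0 }' "$1"
}

# Hypothetical wrapper: turn the exit status into a message.
report_format() {
    if same_field_count "$1"; then
        echo "file is correct"
    else
        echo "file is not correct"
    fi
}
```

Calling report_format filename.txt would then print one of the two messages without any intermediate output to parse.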
Count the number of unique lines and verify that the count is 1.
if (($(awk -F';' 'NF{print (NF-1)}' filename.txt | uniq | wc -l) == 1)); then
echo good
else
echo bad
fi
Just pipe the result through sort -u | wc -l. If all lines have the same number of fields, this will produce one line of output.
Alternatively, just look for a line in awk that doesn't have the same number of fields as the first line.
awk -F';' 'NR==1 {linecount=NF}
linecount != NF { print "Bad line " $0; exit 1}
' filename.txt && echo "Good file"
You can also adapt the old trick used to output only the first of duplicate lines.
awk -F';' '{a[NF]=1}; length(a) > 1 {exit 1}' filename.txt
Each line records its number of fields as a key in a. Exit with status 1 as soon as a has more than one entry. Basically, a acts as a set of all field counts seen so far.
Based on all the information you have given me, I ended up doing the following. And it works for me.
nbCol=`awk -F';' '(NR==1){print NF;}' $1`
val=7
awk -F';' 'NR==1{count = NF} NF != count { exit 1}' $1
result=`echo $?`
if [ $result -eq 0 ] && [ $nbCol -eq $val ];then
echo "Good Format"
else
echo "Bad Format"
fi

bash count sequential files

I'm pretty new to bash scripting so some of the syntaxes may not be optimal. Please do point them out if you see one.
I have files in a directory named sequentially.
Example: prob01_01 prob01_03 prob01_07 prob02_01 prob02_03 ....
I am trying to have the script iterate through the current directory and count how many extensions each problem has, then print the pre-extension name followed by the count.
Sample output for above would be:
prob01 3
prob02 2
This is my code:
#!/bin/bash
temp=$(mktemp)
element=''
count=0
for i in *
do
current=${i%_*}
if [[ $current == $element ]]
then
let "count+=1"
else
echo $element $count >> temp
element=$current
count=1
fi
done
echo 'heres the temp:'
cat temp
rm 'temp'
The Problem:
Current output:
prob1 3
Desired output:
prob1 3
prob2 2
The last count isn't appended because it's not seeing a different element after it
My Guess on possible solutions:
Have the last append occur at the end of the for loop?
Your code has 2 problems.
The first problem is not what you asked about. You make a temporary file whose name is stored in $temp. You should use that one, and not a file with the fixed name temp.
The second problem is that you only write a result when you see a new problem name, so the last one is never printed.
Fixing only these two problems results in
results() {
if (( count == 0 )); then
return
fi
echo $element $count >> "${temp}"
}
temp=$(mktemp)
element=''
count=0
for i in prob*
do
current=${i%_*}
if [[ $current == $element ]]
then
let "count+=1" # Better is using ((count++))
else
results
element=$current
count=1
fi
done
results
echo 'heres the temp:'
cat "${temp}"
rm "${temp}"
You can do without the script with
ls prob* | cut -d"_" -f1 | sort | uniq -c
If you want to have the output displayed as given, you need one more step.
ls prob* | cut -d"_" -f1 | sort | uniq -c | awk '{print $2 " " $1}'
You may use printf + awk solution:
printf '%s\n' *_* | awk -F_ '{a[$1]++} END{for (i in a) print i, a[i]}'
prob01 3
prob02 2
We use printf to print each file that has at least one _
We use awk to get a count of each file's first element delimited by _ by using an associative array.
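The associative-array approach can be sketched as a filter that reads names from standard input. The function name count_by_prefix is an assumption for this example. Because awk's for (p in seen) visits keys in an unspecified order, the sketch pipes the result through sort to make the output order predictable.

```shell
#!/bin/bash
# Hypothetical helper: read one name per line on stdin and print
# "prefix count", where prefix is the part before the first underscore.
count_by_prefix() {
    awk -F_ '{ seen[$1]++ }
             END { for (p in seen) print p, seen[p] }' | sort
}
```

It could be fed with printf '%s\n' *_* | count_by_prefix from the directory that holds the files.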
I would do it like this:
$ ls | awk -F_ '{print $1}' | sort | uniq -c | awk '{print $2 " " $1}'
prob01 3
prob02 2

Find the first missing file in a series of numbered files

I have directory containing files:
$> ls blender/output/celebAnim/
0100.png 0107.png 0114.png 0121.png 0128.png 0135.png 0142.png 0149.png 0156.png 0163.png 0170.png 0177.png 0184.png 0191.png 0198.png 0205.png 0212.png 0219.png 0226.png 0233.png 0240.png 0247.png 0254.png 0261.png 0268.png 0275.png 0282.png
0101.png 0108.png 0115.png 0122.png 0129.png 0136.png 0143.png 0150.png 0157.png 0164.png 0171.png 0178.png 0185.png 0192.png 0199.png 0206.png 0213.png 0220.png 0227.png 0234.png 0241.png 0248.png 0255.png 0262.png 0269.png 0276.png 0283.png
0102.png 0109.png 0116.png 0123.png 0130.png 0137.png 0144.png 0151.png 0158.png 0165.png 0172.png 0179.png 0186.png 0193.png 0200.png 0207.png 0214.png 0221.png 0228.png 0235.png 0242.png 0249.png 0256.png 0263.png 0270.png 0277.png 0284.png
0103.png 0110.png 0117.png 0124.png 0131.png 0138.png 0145.png 0152.png 0159.png 0166.png 0173.png 0180.png 0187.png 0194.png 0201.png 0208.png 0215.png 0222.png 0229.png 0236.png 0243.png 0250.png 0257.png 0264.png 0271.png 0278.png
0104.png 0111.png 0118.png 0125.png 0132.png 0139.png 0146.png 0153.png 0160.png 0167.png 0174.png 0181.png 0188.png 0195.png 0202.png 0209.png 0216.png 0223.png 0230.png 0237.png 0244.png 0251.png 0258.png 0265.png 0272.png 0279.png
0105.png 0112.png 0119.png 0126.png 0133.png 0140.png 0147.png 0154.png 0161.png 0168.png 0175.png 0182.png 0189.png 0196.png 0203.png 0210.png 0217.png 0224.png 0231.png 0238.png 0245.png 0252.png 0259.png 0266.png 0273.png 0280.png
0106.png 0113.png 0120.png 0127.png 0134.png 0141.png 0148.png 0155.png 0162.png 0169.png 0176.png 0183.png 0190.png 0197.png 0204.png 0211.png 0218.png 0225.png 0232.png 0239.png 0246.png 0253.png 0260.png 0267.png 0274.png 0281.png
For some script, I will need to find out what the number of the first missing file is. In the above output, it would be 0285.png. However, it is also possible that files in between are missing. In the end, I am only interested in the number 285, which is part of the file name.
This is part of recovery logic: The files should be created by the script, but this step can fail. Therefore I want to have a means to check which files are missing and try to create them in a second step.
This is what I got so far (from how to extract part of a filename before '.' or before extension):
ls blender/output/celebAnim/ | awk -F'[.]' '{print $1}'
What I cannot figure out is how do I find the smallest number missing from that result, above a certain offset? The offset in this case is 100.
You could loop over all numbers from 100 to 500 and check if the corresponding file exists; if it doesn't, you'd print the number you're looking at:
for i in {100..500}; do
[[ ! -f 0$i.png ]] && { echo "$i missing!"; break; }
done
This prints, for your example, 285 missing!.
This solution could be made a bit more flexible by, for example, looping over zero padded numbers and then extracting the unpadded number:
for i in {0100..0500}; do
[[ ! -f $i.png ]] && { echo "${i##*(0)} missing!"; break; }
done
This requires extended globs (shopt -s extglob) for the *(0) pattern ("zero or more repetitions of 0").
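The loop can also be written without brace expansion or extglob, which allows the range to come from variables. This is a sketch only; the function name first_missing and its arguments are invented here, and it assumes four-digit, zero-padded .png names as in the listing.

```shell
#!/bin/bash
# Hypothetical helper: print the first number N in START..END for which
# DIR/<N padded to 4 digits>.png does not exist.
# Pass START and END without leading zeros (a leading 0 would mean octal).
first_missing() {
    local dir=$1 start=$2 end=$3 n name
    for (( n = start; n <= end; n++ )); do
        printf -v name '%04d.png' "$n"   # build the padded name, e.g. 0285.png
        if [[ ! -f $dir/$name ]]; then
            echo "$n"
            return 0
        fi
    done
    return 1   # nothing is missing in the given range
}
```

For the directory in the question, first_missing blender/output/celebAnim 100 500 would be the corresponding call.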
begin=100
end=500
for i in `seq $begin 1 $end`; do
fname="0"$i".png"
if [ ! -f $fname ]; then
echo "$fname is missing"
fi
done
#!/bin/bash
# bash rather than sh, because [[ ]] is used below
search_dir=blender/output/celebAnim/
ls "$search_dir" > file_list
count=$(wc -l < file_list)
if [[ $count -eq 0 ]]
then
echo "No files in given directory!"
exit 1 # break is only valid inside a loop
fi
file_extension=$(head -1 file_list | awk -F "." '{ print $2 }')
init_file_value=$(head -1 file_list | awk -F "." '{ print $1 }')
i=2
while [ "$i" -le "$count" ]
do
next_file_value=$(head -"$i" file_list | tail -1 | awk -F "." '{ print $1 }')
# 10# forces base 10; a leading zero (0100) would otherwise be read as octal
next_value=$((10#$init_file_value+1))
if [ "$((10#$next_file_value))" -ne "$next_value" ]
then
echo "$next_value.$file_extension"
break
fi
init_file_value=$next_value
i=$((i+1))
done
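Try this:
ls blender/output/celebAnim/ | sort -r | head -n1 | awk -F'.' '{print $1+1}'
The command returns 285. Note that it only looks at the highest-numbered file, so gaps in the middle of the series are not detected.
If you need 0285 instead, try:
ls blender/output/celebAnim/ | sort -r | head -n1 | awk -F'.' '{printf "%04d\n", $1+1}'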

Hex compare in bash scripting

I am facing an issue when I read the 3rd word (a hex string) of each line in a text file and compare it with a hex number. Can someone please help me with it?
#!/bin/bash
A=$1
cat $A | while read a; do
a1=$(echo \""$a"\" | awk '{ print $3 }')
#echo $a > cut -d " " -f 3
echo $a1
(("$a1" == 0x10F7))
echo $?
done
But when I use below, the comparison happens correctly,
a1= 0xADCAFE
(( "$a1" == 0x10F7 ))
echo $?
Then why does it fail when I read the value like below,
a1=$(echo \""$a"\" | awk '{ print $3 }')
or
a1=$(echo $a | awk '{ print $3 }')
echo $a prints the intended hex value, but the comparison does not work.
Running Awk inside a while read loop is an antipattern. Just do the loop in Awk; it's good at that.
awk '$3 == 4343' "$1"
If you want to compare against a string whose value is "0x10F7" then it's
awk '$3 == "0x10F7"' "$1"
If you want to match either, case insensitively etc, a regex is a good way to do that.
awk '$3 ~ /^(0x10[Ff]7|4343)$/' "$1"
Notice how the $1 in double quotes is handled by the shell, and gets replaced by a (properly quoted!) copy of the script's first command-line argument before Awk runs, while the Awk script in single quotes has its own namespace, so $3 is an Awk variable which refers to the third field in the current input line.
Either way, avoid the useless use of cat and always always always quote variables which contain file names with double quotes.
That's literal double quotes. You seem to have tried both a dangerous bare $a and a doubly double-quoted "\"$a\"" where the simple "$a" would be what you actually want.
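If the comparison has to be numeric, so that 0x10F7 and 0x10f7 both match, one option is to stay in Bash, because how awk treats hex strings in input data varies between implementations. The following is a sketch under that assumption; the function name match_third_hex is invented, and the trailing carriage return is stripped on the guess that the input may have DOS line endings. No external process is started per line.

```shell
#!/bin/bash
# Hypothetical helper: print every line of FILE whose third
# whitespace-separated field is a hex number equal to VALUE (e.g. 0x10F7).
match_third_hex() {
    local file=$1 value=$2 line third
    while IFS= read -r line || [[ -n $line ]]; do
        line=${line%$'\r'}                 # drop a trailing CR, if any
        read -r _ _ third _ <<< "$line"    # pick out the third field
        # Only hand tokens that really look like hex numbers to (( )),
        # which understands the 0x prefix.
        if [[ $third =~ ^0[xX][0-9a-fA-F]+$ ]] && (( third == value )); then
            printf '%s\n' "$line"
        fi
    done < "$file"
}
```

Called as match_third_hex "$1" 0x10F7, it plays the same role as the awk one-liners above.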
Thank you all for your responses. My script is now working fine. I was trying to match two files, and the script below does the job:
#!/bin/bash
A=$1
B=$2
dos2unix -f "$A"
dos2unix -f "$B"
rm search_match.txt search_data_match.txt search_nomatch.txt search_data_nomatch.txt
while read line;do
search_word=$(echo $line | awk '{ print $1 }')
grep "$search_word" $B >> temp_file.txt
while read var;do
file1_hex=$(echo $line | awk '{ print $2 }')
file2_hex=$(echo $var | awk '{ print $3 }')
(("$file1_hex" == "$file2_hex"))
zero=$(echo $?)
if [ "$zero" -eq 0 ] ; then
echo $line >> search_match.txt
echo $var >> search_data_match.txt
else
echo $line >> search_nomatch.txt
echo $var >> search_data_nomatch.txt
fi
done < "temp_file.txt"
rm temp_file.txt
done < "$A"

Resources