Find the first missing file in a series of numbered files - linux

I have a directory containing files:
$> ls blender/output/celebAnim/
0100.png 0107.png 0114.png 0121.png 0128.png 0135.png 0142.png 0149.png 0156.png 0163.png 0170.png 0177.png 0184.png 0191.png 0198.png 0205.png 0212.png 0219.png 0226.png 0233.png 0240.png 0247.png 0254.png 0261.png 0268.png 0275.png 0282.png
0101.png 0108.png 0115.png 0122.png 0129.png 0136.png 0143.png 0150.png 0157.png 0164.png 0171.png 0178.png 0185.png 0192.png 0199.png 0206.png 0213.png 0220.png 0227.png 0234.png 0241.png 0248.png 0255.png 0262.png 0269.png 0276.png 0283.png
0102.png 0109.png 0116.png 0123.png 0130.png 0137.png 0144.png 0151.png 0158.png 0165.png 0172.png 0179.png 0186.png 0193.png 0200.png 0207.png 0214.png 0221.png 0228.png 0235.png 0242.png 0249.png 0256.png 0263.png 0270.png 0277.png 0284.png
0103.png 0110.png 0117.png 0124.png 0131.png 0138.png 0145.png 0152.png 0159.png 0166.png 0173.png 0180.png 0187.png 0194.png 0201.png 0208.png 0215.png 0222.png 0229.png 0236.png 0243.png 0250.png 0257.png 0264.png 0271.png 0278.png
0104.png 0111.png 0118.png 0125.png 0132.png 0139.png 0146.png 0153.png 0160.png 0167.png 0174.png 0181.png 0188.png 0195.png 0202.png 0209.png 0216.png 0223.png 0230.png 0237.png 0244.png 0251.png 0258.png 0265.png 0272.png 0279.png
0105.png 0112.png 0119.png 0126.png 0133.png 0140.png 0147.png 0154.png 0161.png 0168.png 0175.png 0182.png 0189.png 0196.png 0203.png 0210.png 0217.png 0224.png 0231.png 0238.png 0245.png 0252.png 0259.png 0266.png 0273.png 0280.png
0106.png 0113.png 0120.png 0127.png 0134.png 0141.png 0148.png 0155.png 0162.png 0169.png 0176.png 0183.png 0190.png 0197.png 0204.png 0211.png 0218.png 0225.png 0232.png 0239.png 0246.png 0253.png 0260.png 0267.png 0274.png 0281.png
For some script, I will need to find out what the number of the first missing file is. In the above output, it would be 0285.png. However, it is also possible that files in between are missing. In the end, I am only interested in the number 285, which is part of the file name.
This is part of recovery logic: The files should be created by the script, but this step can fail. Therefore I want to have a means to check which files are missing and try to create them in a second step.
This is what I got so far (from how to extract part of a filename before '.' or before extension):
ls blender/output/celebAnim/ | awk -F'[.]' '{print $1}'
What I cannot figure out is how do I find the smallest number missing from that result, above a certain offset? The offset in this case is 100.

You could loop over all numbers from 100 to 500 and check if the corresponding file exists; if it doesn't, you'd print the number you're looking at:
for i in {100..500}; do
[[ ! -f 0$i.png ]] && { echo "$i missing!"; break; }
done
This prints, for your example, 285 missing!.
This solution could be made a bit more flexible by, for example, looping over zero padded numbers and then extracting the unpadded number:
for i in {0100..0500}; do
[[ ! -f $i.png ]] && { echo "${i##*(0)} missing!"; break; }
done
This requires extended globs (shopt -s extglob) for the *(0) pattern ("zero or more repetitions of 0").
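For a recovery script, the same loop could be wrapped in a function that prints only the bare number. This is a minimal sketch rather than part of the answer above: the name first_missing and its three arguments are made up for illustration, and the four-digit .png naming is assumed from the listing in the question.

```shell
#!/bin/bash
# first_missing DIR START END (hypothetical helper)
# Prints the first number in START..END for which DIR/NNNN.png does not exist.
# Returns 1 if every file in the range is present.
# START and END are expected without leading zeros (printf would read 0100 as octal).
first_missing() {
    local dir="$1" i="$2" end="$3" name
    while [ "$i" -le "$end" ]; do
        name=$(printf '%04d.png' "$i")   # zero-pad to four digits, e.g. 285 -> 0285.png
        if [ ! -f "$dir/$name" ]; then
            echo "$i"
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}
```

Called as first_missing blender/output/celebAnim 100 500, this would print 285 for the listing in the question. The return status lets the caller tell "nothing missing" apart from "found a gap".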

begin=100
end=500
for i in $(seq "$begin" 1 "$end"); do
fname="0$i.png"
if [ ! -f "$fname" ]; then
echo "$fname is missing"
fi
done

#!/bin/bash
search_dir=blender/output/celebAnim/
ls "$search_dir" > file_list
count=$(wc -l < file_list)
if [[ $count -eq 0 ]]
then
echo "No files in given directory!"
exit 1
fi
file_extension=$(head -n 1 file_list | awk -F "." '{ print $2 }')
# 10# forces base 10; a name such as 0108 would otherwise be parsed as (invalid) octal
init_file_value=$((10#$(head -n 1 file_list | awk -F "." '{ print $1 }')))
i=2
while [ "$i" -le "$count" ]
do
next_file_value=$((10#$(head -n "$i" file_list | tail -n 1 | awk -F "." '{ print $1 }')))
next_value=$((init_file_value+1))
if [ "$next_file_value" -ne "$next_value" ]
then
# Report the first gap between two existing files
echo "$next_value.$file_extension"
break
fi
init_file_value=$next_value
i=$((i+1))
done

Try this:
ls blender/output/celebAnim/ | sort -r | head -n1 | awk -F'.' '{print $1+1}'
The command returns 285 (the highest existing number plus one).
If you need 0285 instead, try:
ls blender/output/celebAnim/ | sort -r | head -n1 | awk -F'.' '{printf "%04d\n", $1+1}'
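The pipeline above looks only at the highest number, so it would not notice a gap in the middle of the series. A possible one-pass variant that does is sketched below. The name first_gap is hypothetical, and the sketch assumes the names arrive in ascending order, which ls gives for zero-padded names of equal width.

```shell
# first_gap OFFSET (hypothetical helper)
# Reads sorted names such as 0100.png on stdin and prints the smallest
# number >= OFFSET that does not appear in the input.
first_gap() {
    awk -F. -v n="$1" '
        $1 + 0 == n { n++ }   # expected number is present, so expect the next one
        END { print n }       # first number that never showed up
    '
}
```

For the directory in the question, ls blender/output/celebAnim/ | first_gap 100 would print 285. If 0103.png were absent it would print 103 instead.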

Related

Bash function with input fails awk command

I am writing a function in a BASH shell script, that should return lines from csv-files with headers, having more commas than the header. This can happen, as there are values inside these files, that could contain commas. For quality control, I must identify these lines to later clean them up. What I have currently:
#!/bin/bash
get_bad_lines () {
local correct_no_of_commas=$(head -n 1 $1/$1_0_0_0.csv | tr -cd , | wc -c)
local no_of_files=$(ls $1 | wc -l)
for i in $(seq 0 $(( ${no_of_files}-1 )))
do
# Check that the file exist
if [ ! -f "$1/$1_0_${i}_0.csv" ]; then
echo "File: $1_0_${i}_0.csv not found!"
continue
fi
# Search for error-lines inside the file and print them out
echo "$1_0_${i}_0.csv has over $correct_no_of_commas commas in the following lines:"
grep -o -n '[,]' "$1/$1_0_${i}_0.csv" | cut -d : -f 1 | uniq -c | awk '$1 > $correct_no_of_commas {print}'
done
}
get_bad_lines products
get_bad_lines users
The output of this program is now all the comma-counts with all of the line numbers in all the files,
and I suspect this is due to the input $1 (foldername, i.e. products & users) conflicting with the call to awk with reference to $1 as well (where I wish to grab the first column being the count of commas for that line in the current file in the loop).
Is this the issue? and if so, would it be solvable by either referencing the 1.st column or the folder name by different variable names instead of both of them using $1 ?
Example, current output:
5 6667
5 6668
5 6669
5 6670
(should only show lines for that file having more than 5 commas).
I also tried declaring the variable in the call to awk, with the same effect (as in the accepted answer to Awk field variable clash with function argument):
get_bad_lines () {
local table_name=$1
local correct_no_of_commas=$(head -n 1 $table_name/${table_name}_0_0_0.csv | tr -cd , | wc -c)
local no_of_files=$(ls $table_name | wc -l)
for i in $(seq 0 $(( ${no_of_files}-1 )))
do
# Check that the file exist
if [ ! -f "$table_name/${table_name}_0_${i}_0.csv" ]; then
echo "File: ${table_name}_0_${i}_0.csv not found!"
continue
fi
# Search for error-lines inside the file and print them out
echo "${table_name}_0_${i}_0.csv has over $correct_no_of_commas commas in the following lines:"
grep -o -n '[,]' "$table_name/${table_name}_0_${i}_0.csv" | cut -d : -f 1 | uniq -c | awk -v table_name="$table_name" '$1 > $correct_no_of_commas {print}'
done
}
You can use awk all the way to achieve that:
get_bad_lines () {
find "$1" -maxdepth 1 -name "$1_0_*_0.csv" | while read -r my_file ; do
awk -v table_name="$1" '
NR==1 { num_comma=gsub(/,/, ""); }
/,/ { if (gsub(/,/, ",", $0) > num_comma) wrong_array[wrong++]=NR":"$0;}
END { if (wrong > 0) {
print(FILENAME" has over "num_comma" commas in the following lines:");
for (i=0;i<wrong;i++) { print(wrong_array[i]); }
}
}' "${my_file}"
done
}
As for why your original awk command failed to print only the lines with too many commas: you are using the shell variable correct_no_of_commas inside a single-quoted awk program ('$1 > $correct_no_of_commas {print}'). Thus there is no substitution by the shell, and awk reads $correct_no_of_commas as is. More precisely, awk looks up the variable correct_no_of_commas, which is undefined in the awk script and therefore evaluates to an empty string (0 in a numeric context). awk then evaluates $1 > $"" as the matching condition, and since $"" is equivalent to $0, it compares the count in $1 with the full input line. The full input line has the form <spaces><count><space><line number>, which does not look like a number, so awk compares the two as strings, and the digits of the count sort after the leading whitespace that uniq -c emits. Thus, $1 > $correct_no_of_commas is always true.
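The usual way around this quoting problem is to hand the shell value to awk with -v, so the awk program itself can stay single-quoted. A minimal sketch follows; the function name filter_over and the awk variable limit are made up for illustration.

```shell
# filter_over LIMIT (hypothetical helper)
# Reads "count line_number" pairs, as produced by uniq -c, on stdin and
# prints only those whose count is greater than LIMIT.
filter_over() {
    awk -v limit="$1" '$1 > limit'
}

# Example: only the pair with a count above 5 survives
printf '%s\n' '      5 6667' '      7 6668' '      5 6669' | filter_over 5
```

In the original function this would mean ending the pipeline with awk -v limit="$correct_no_of_commas" '$1 > limit'.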
You can identify all the bad lines with a single awk command (ENDFILE is a GNU awk extension, so this needs gawk):
awk -F, 'FNR==1{print FILENAME; headerCount=NF;} NF>headerCount{print} ENDFILE{print "#######\n"}' /path/here/*.csv
If you want the line number also to be printed, use this
awk -F, 'FNR==1{print FILENAME"\nLine#\tLine"; headerCount=NF;} NF>headerCount{print FNR"\t"$0} ENDFILE{print "#######\n"}' /path/here/*.csv
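Where gawk is not available, a similar check can be sketched without ENDFILE by prefixing every bad line with its file name and line number. The helper name bad_lines is hypothetical. Like the commands above, it simply counts comma-separated fields and does not understand quoted CSV values.

```shell
# bad_lines FILE... (hypothetical helper)
# Prints file:line_number:line for every line that has more
# comma-separated fields than the header of its own file.
bad_lines() {
    awk -F, '
        FNR == 1    { header = NF; next }               # first line of each file sets the limit
        NF > header { print FILENAME ":" FNR ":" $0 }   # more fields than the header
    ' "$@"
}
```

It could be called as bad_lines /path/here/*.csv, and the output is easy to post-process with cut -d: -f1,2.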

Filtering a list by 5 files per directory

So I have a list of files inside a tree of folders:
/home/user/Scripts/example/tmp/folder2/2
/home/user/Scripts/example/tmp/folder2/3
/home/user/Scripts/example/tmp/folder2/4
/home/user/Scripts/example/tmp/folder2/5
/home/user/Scripts/example/tmp/folder2/6
/home/user/Scripts/example/tmp/folder2/7
/home/user/Scripts/example/tmp/folder2/8
/home/user/Scripts/example/tmp/folder2/9
/home/user/Scripts/example/tmp/folder2/10
/home/user/Scripts/example/tmp/other_folder/files/1
/home/user/Scripts/example/tmp/other_folder/files/2
/home/user/Scripts/example/tmp/other_folder/files/3
/home/user/Scripts/example/tmp/other_folder/files/4
/home/user/Scripts/example/tmp/other_folder/files/5
/home/user/Scripts/example/tmp/other_folder/files/6
/home/user/Scripts/example/tmp/other_folder/files/7
/home/user/Scripts/example/tmp/other_folder/files/8
/home/user/Scripts/example/tmp/other_folder/files/9
/home/user/Scripts/example/tmp/other_folder/files/10
/home/user/Scripts/example/tmp/test/example/1
/home/user/Scripts/example/tmp/test/example/2
/home/user/Scripts/example/tmp/test/example/3
/home/user/Scripts/example/tmp/test/example/4
/home/user/Scripts/example/tmp/test/example/5
/home/user/Scripts/example/tmp/test/example/6
/home/user/Scripts/example/tmp/test/example/7
/home/user/Scripts/example/tmp/test/example/8
/home/user/Scripts/example/tmp/test/example/9
/home/user/Scripts/example/tmp/test/example/10
/home/user/Scripts/example/tmp/test/other/1
/home/user/Scripts/example/tmp/test/other/2
/home/user/Scripts/example/tmp/test/other/3
/home/user/Scripts/example/tmp/test/other/4
/home/user/Scripts/example/tmp/test/other/5
/home/user/Scripts/example/tmp/test/other/6
/home/user/Scripts/example/tmp/test/other/7
/home/user/Scripts/example/tmp/test/other/8
/home/user/Scripts/example/tmp/test/other/9
/home/user/Scripts/example/tmp/test/other/10
I want to basically filter out the content of this list so I only have the highest 5 numbers for each directory.
Any ideas?
preferable in bash/shell
Expected output (a small sample, because SO complains about too much code):
/home/user/Scripts/example/tmp/test/example/6
/home/user/Scripts/example/tmp/test/example/7
/home/user/Scripts/example/tmp/test/example/8
/home/user/Scripts/example/tmp/test/example/9
/home/user/Scripts/example/tmp/test/example/10
/home/user/Scripts/example/tmp/test/other/6
/home/user/Scripts/example/tmp/test/other/7
/home/user/Scripts/example/tmp/test/other/8
/home/user/Scripts/example/tmp/test/other/9
/home/user/Scripts/example/tmp/test/other/10
Thanks
Edit: using for i in $(for i in $(dirname $(find $(pwd) -type f -name "*[0-9]*" | sort -V) | uniq) ;do ls $i | sort -V |tail -n 5 ; done) ; do readlink -f $i ; done works for a small sample size. However, with a larger sample the argument list appears to be too long for dirname.
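Assuming your input data is sorted. Note that this prints the first five records of each directory, so sort in descending order if you want the five highest numbers.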
Try:
awk -F'/[^/]*$' '{if (NR==1 || prev_dir == $1) {i=i+1} else {i=1}; if ( i<=5){ prev_dir=$1 ; print $0}; }'
Explanation:
'/[^/]*$' <-- Use a regex field separator so that the directory path becomes the first field.
if (NR==1 || prev_dir == $1) {i=i+1} else {i=1}; <-- Check whether the record is from the same directory as the previous one. If yes, increment the counter by 1, else reset it to 1.
if ( i<=5){ prev_dir=$1 ; print $0}; }' <-- Print only the first 5 records of the current directory.
Demo:
$awk -F'/[^/]*$' '{if (NR==1 || prev_dir == $1) {i=i+1} else {i=1}; if ( i<=5){ prev_dir=$1 ; print $0 }; }' temp.txt
/home/user/Scripts/example/tmp/folder2/2
/home/user/Scripts/example/tmp/folder2/3
/home/user/Scripts/example/tmp/folder2/4
/home/user/Scripts/example/tmp/folder2/5
/home/user/Scripts/example/tmp/folder2/6
/home/user/Scripts/example/tmp/other_folder/files/1
/home/user/Scripts/example/tmp/other_folder/files/2
/home/user/Scripts/example/tmp/other_folder/files/3
/home/user/Scripts/example/tmp/other_folder/files/4
/home/user/Scripts/example/tmp/other_folder/files/5
$cat temp.txt
/home/user/Scripts/example/tmp/folder2/2
/home/user/Scripts/example/tmp/folder2/3
/home/user/Scripts/example/tmp/folder2/4
/home/user/Scripts/example/tmp/folder2/5
/home/user/Scripts/example/tmp/folder2/6
/home/user/Scripts/example/tmp/folder2/7
/home/user/Scripts/example/tmp/folder2/8
/home/user/Scripts/example/tmp/folder2/9
/home/user/Scripts/example/tmp/folder2/10
/home/user/Scripts/example/tmp/other_folder/files/1
/home/user/Scripts/example/tmp/other_folder/files/2
/home/user/Scripts/example/tmp/other_folder/files/3
/home/user/Scripts/example/tmp/other_folder/files/4
/home/user/Scripts/example/tmp/other_folder/files/5
/home/user/Scripts/example/tmp/other_folder/files/6
/home/user/Scripts/example/tmp/other_folder/files/7
/home/user/Scripts/example/tmp/other_folder/files/8
/home/user/Scripts/example/tmp/other_folder/files/9
/home/user/Scripts/example/tmp/other_folder/files/10
$
Here is an implementation in plain bash:
#!/bin/bash
prevdir=
while read -r line; do
dir=${line%/*}
[[ $dir == "$prevdir" ]] || { n=0; prevdir=$dir; }
((n++ < 5)) && echo "$line"
done
You can use it like:
./script < file.list # If file.list already sorted by a reverse version sort
or,
sort -rV file.list | ./script # If the file.list is not sorted
or,
find /home/user/Scripts -type f | sort -rV | ./script
Also, you may want to append | tac to the pipelines above.
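Putting the pieces together, the whole filter could be sketched as a single function that reads the list on stdin. The name top_n_per_dir and its argument are hypothetical, and the sketch assumes a sort that supports -V (GNU coreutils), as the pipelines above already do.

```shell
# top_n_per_dir N (hypothetical helper)
# Reads one path per line on stdin and prints, in ascending version order,
# the N highest-numbered entries of each directory.
top_n_per_dir() {
    local n="$1"
    sort -rV | awk -v n="$n" '
        { dir = $0; sub(/\/[^\/]*$/, "", dir) }   # strip the last path component
        dir != prev { count = 0; prev = dir }     # new directory: restart the counter
        count++ < n                               # keep the first n lines per directory
    ' | sort -V
}
```

It could be used as top_n_per_dir 5 < file.list. The final sort -V plays the role of tac but also puts the directories themselves back in ascending order.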

Difficulty to create .txt file from loop in bash

I have this data:
cat >data1.txt <<'EOF'
2020-01-27-06-00;/dev/hd1;100;/
2020-01-27-12-00;/dev/hd1;100;/
2020-01-27-18-00;/dev/hd1;100;/
2020-01-27-06-00;/dev/hd2;200;/usr
2020-01-27-12-00;/dev/hd2;200;/usr
2020-01-27-18-00;/dev/hd2;200;/usr
EOF
cat >data2.txt <<'EOF'
2020-02-27-06-00;/dev/hd1;120;/
2020-02-27-12-00;/dev/hd1;120;/
2020-02-27-18-00;/dev/hd1;120;/
2020-02-27-06-00;/dev/hd2;230;/usr
2020-02-27-12-00;/dev/hd2;230;/usr
2020-02-27-18-00;/dev/hd2;230;/usr
EOF
cat >data3.txt <<'EOF'
2020-03-27-06-00;/dev/hd1;130;/
2020-03-27-12-00;/dev/hd1;130;/
2020-03-27-18-00;/dev/hd1;130;/
2020-03-27-06-00;/dev/hd2;240;/usr
2020-03-27-12-00;/dev/hd2;240;/usr
2020-03-27-18-00;/dev/hd2;240;/usr
EOF
I would like to create a .txt file for each filesystem (so hd1.txt, hd2.txt, hd3.txt and hd4.txt) and put in each .txt file the sum of the values for that filesystem from each dataX.txt. I have some difficulty explaining in English what I want, so here is an example of the desired result.
Expected content for the output file hd1.txt:
2020-01;/dev/hd1;300;/
2020-02;/dev/hd1;360;/
2020-03;/dev/hd1;390;/
Expected content for the file hd2.txt:
2020-01;/dev/hd2;600;/usr
2020-02;/dev/hd2;690;/usr
2020-03;/dev/hd2;720;/usr
The implementation I've currently tried:
for i in $(cat *.txt | awk -F';' '{print $2}' | cut -d '/' -f3| uniq)
do
cat *.txt | grep -w $i | awk -F';' -v date="$(cat *.txt | awk -F';' '{print $1}' | cut -d'-' -f-2 | uniq )" '{sum+=$3} END {print date";"$2";"sum}' >> $i
done
But it doesn't work...
Can you show me how to do that?
Because the format seems to be so constant, you can delimit the input with multiple separators and parse it easily in awk:
awk -v FS='[-;/]' '
# Key on year-month plus device so each month/filesystem pair gets its own sum
{ key = $1 "-" $2 ";" $8 }
prev != key {
if (length(output)) {
print output >> fileoutput
}
prev = key
sum = 0
}
{
sum += $9
output = sprintf("%s-%s;/%s/%s;%d;/%s", $1, $2, $7, $8, sum, $11)
fileoutput = $8 ".txt"
}
END {
print output >> fileoutput
}
' data*.txt
Tested on repl generates:
+ cat hd1.txt
2020-01;/dev/hd1;300;/
2020-02;/dev/hd1;360;/
2020-03;/dev/hd1;390;/
+ cat hd2.txt
2020-01;/dev/hd2;600;/usr
2020-02;/dev/hd2;690;/usr
2020-03;/dev/hd2;720;/usr
Alternatively, you could use -v FS=';' and use split on the first and second columns to extract the year and month and the hdX number.
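The FS=';' variant could look roughly like the sketch below. It is not the code that was tested above: the function name sum_per_month is hypothetical, and for simplicity it prints all sums to standard output instead of writing one file per filesystem.

```shell
# sum_per_month [FILE...] (hypothetical helper)
# Sums the third column per year-month and device, keeping first-seen order.
sum_per_month() {
    awk -F';' '
        {
            split($1, t, "-")                    # 2020-01-27-06-00 -> t[1]=2020, t[2]=01
            key = t[1] "-" t[2] ";" $2 ";" $4    # month;device;mount point
            if (!(key in sum)) order[++n] = key  # remember the order of first appearance
            sum[key] += $3
        }
        END {
            for (i = 1; i <= n; i++) {
                split(order[i], p, ";")
                print p[1] ";" p[2] ";" sum[order[i]] ";" p[3]
            }
        }
    ' "$@"
}
```

Running sum_per_month data*.txt and then splitting the result by device, for example with grep '/dev/hd1;' > hd1.txt, would give the per-filesystem files. Because the sums are kept in an array, the input does not have to be grouped by device.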
If you seek a bash solution, I suggest you invert the loops - first iterate over files, then over identifiers in second column.
for file in *.txt; do
prev=
output=
while IFS=';' read -r date dev num path; do
hd=$(basename "$dev")
if [[ "$hd" != "${prev:-}" ]]; then
if ((${#output})); then
printf "%s\n" "$output" >> "$fileoutput"
fi
sum=0
prev="$hd"
fi
sum=$((sum + num))
output=$(
printf "%s;%s;%d;%s" \
"$(cut -d'-' -f1-2 <<<"$date")" \
"$dev" "$sum" "$path"
)
fileoutput="${hd}.txt"
done < "$file"
printf "%s\n" "$output" >> "$fileoutput"
done
You could also almost translate awk to bash 1:1 by doing IFS='-;/' in while read loop.

bash count sequential files

I'm pretty new to bash scripting so some of the syntaxes may not be optimal. Please do point them out if you see one.
I have files in a directory named sequentially.
Example: prob01_01 prob01_03 prob01_07 prob02_01 prob02_03 ....
I am trying to have the script iterate through the current directory and count how many extensions each problem has. Then print the pre-extension name then count
Sample output for above would be:
prob01 3
prob02 2
This is my code:
#!/bin/bash
temp=$(mktemp)
element=''
count=0
for i in *
do
current=${i%_*}
if [[ $current == $element ]]
then
let "count+=1"
else
echo $element $count >> temp
element=$current
count=1
fi
done
echo 'heres the temp:'
cat temp
rm 'temp'
The Problem:
Current output:
prob01 3
Desired output:
prob01 3
prob02 2
The last count isn't appended because it's not seeing a different element after it
My Guess on possible solutions:
Have the last append occur at the end of the for loop?
Your code has 2 problems.
The first problem is unrelated to your question: you create a temporary file whose name is stored in $temp. You should use that one, and not a file with the fixed name temp.
The second problem is that you only write results when you see a new problem/filename, so the last one is never printed.
Fixing only these problems will result in
results() {
if (( count == 0 )); then
return
fi
echo $element $count >> "${temp}"
}
temp=$(mktemp)
element=''
count=0
for i in prob*
do
current=${i%_*}
if [[ $current == $element ]]
then
let "count+=1" # Better is using ((count++))
else
results
element=$current
count=1
fi
done
results
echo 'heres the temp:'
cat "${temp}"
rm "${temp}"
You can do without the script with
ls prob* | cut -d"_" -f1 | sort | uniq -c
If you want the output displayed as given, you need one more step.
ls prob* | cut -d"_" -f1 | sort | uniq -c | awk '{print $2 " " $1}'
You may use printf + awk solution:
printf '%s\n' *_* | awk -F_ '{a[$1]++} END{for (i in a) print i, a[i]}'
prob01 3
prob02 2
We use printf to print each file that has at least one _
We use awk to get a count of each file's first element delimited by _ by using an associative array.
I would do it like this:
$ ls | awk -F_ '{print $1}' | sort | uniq -c | awk '{print $2 " " $1}'
prob01 3
prob02 2

Move a file if all 6th fields are 0

I want to be able to check a file to see if all records are 0 within a file and if they are to then move the file.
I have written the script, ran it, no errors, but it does not move the file, can anyone please suggest why?
#!/bin/bash
result=`cat conc_upld_atp.11002.20141204151900.dat | awk -F , '{ print $6 }' | uniq`
if [ result = "1" ]; then
mv conc_upld_atp.11002.20141204151900.dat home/stephenb/scripttest
fi
In Bash, = compares strings; to compare integers you need -eq. Also note that result without a leading $ is the literal string "result", not the variable's value:
if [ "$result" -eq 1 ]; then
Note that it is preferred to say var=$(command). Also, your command cat file | awk '...' can be simplified to just awk '...' file. And depending on what exactly you want to do, probably awk can handle all of it.
For example, if you just want to check if any of the 6th fields are not 0, just use:
awk -F, '$6 != 0 {v=1} END {print v+0}' file
and then the rest of your code.
However, you can do it in an extremely fast way by using what 999999999999999999999999999999 suggested in comments:
awk -F, '$6!=0{exit 1}' file && mv file newfile
This loops through the file and exits with an error code if any line contains a 6th field different from 0. If this does not happen, awk's exit code is 0, so the command after && is performed and, hence, mv file newfile happens. You can even handle the other case by saying:
awk -F, '$6!=0{exit 1}' file && mv file newfile || echo "bad data"
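One caveat with the a && b || c form is that c also runs when b fails, so a failed mv would be reported as "bad data" as well. A sketch with an explicit if/else avoids that. The helper names all_sixth_zero and move_if_all_zero are hypothetical.

```shell
# all_sixth_zero FILE (hypothetical helper)
# Succeeds (exit status 0) only if no line has a non-zero 6th comma-separated field.
all_sixth_zero() {
    awk -F, '$6 != 0 { exit 1 }' "$1"
}

# move_if_all_zero FILE DEST (hypothetical helper)
# Moves FILE to DEST when the check passes, otherwise reports it on stderr.
move_if_all_zero() {
    if all_sixth_zero "$1"; then
        mv -- "$1" "$2"
    else
        echo "bad data: $1" >&2
        return 1
    fi
}
```

For the file in the question this could be called as move_if_all_zero conc_upld_atp.11002.20141204151900.dat /home/stephenb/scripttest.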
You're wanting to check if all records are 0, but you specifically check result = "1".
I would recommend two things: use numerical comparison and compare against the correct value:
if (( result == 0 )); then
If you want to check whether all 6th fields are 0, then try this script:
result=`cat conc_upld_atp.11002.20141204151900.dat | awk -F , '{ print $6 }' | uniq`
if [ "$result" == "0" ]; then
mv conc_upld_atp.11002.20141204151900.dat /home/stephenb/scripttest
fi
In the end I used the below.
#!/bin/bash
result=`cat conc_upld_atp.11002.20141204151900.dat | awk -F , '{ print $6 }' | uniq`
resultcount=`echo "$result" | wc -l`
echo $resultcount
if [ "$resultcount" == "1" ]; then
echo match
mv conc_upld_atp.11002.20141204151900.dat /home/stephenb/scripttest
fi
