I have a very crude script getinfo.sh that gets me information from all files with name FILENAME1 and FILENAME2 in all subfolders and the path of the subfolder. The awk result should only pick the nth line from FILENAME2 if the script is called with "getinfo.sh n". I want all the info printed in one line!
The problem is that if i use print instead of printf the info is written to a new line but my script works. If i use printf i can see the last bit of the awk command in the command propt after the script ist done, but it is not paset after the grep command in the same line. All in all the complete line would be pretty long, but that is intentionally. Would you be willing to tell me what i am doing wrong?
#!/bin/bash
IFS=$'\n'
while read -r fname ;
do
pushd $(dirname "${fname}") > /dev/null
printf '%q' "${PWD##*/}"
grep 'Search_term ' FILENAME1 | tail -1
awk '{ if(NR==n) printf "%s",$0 }' n=$1 $2 FILENAME2
popd > /dev/null
done < <(find . -type f -name 'FILENAME1')
I would also be happy to grep the nth line if this is easier?
SOLUTION:
#!/bin/bash
IFS=$'\n'
while read -r fname ;
do
pushd $(dirname "${fname}") > /dev/null
{
printf '%q' "${PWD##*/}"
grep 'Search_term' FILENAME1 | tail -1
} | tr -d '\n'
if [ "$1" -eq "$1" ] 2>/dev/null
then
awk '{ if(NR==n) printf "%s",$0 }' n="$1" FILENAME2
fi
printf "\n"
popd > /dev/null
done < <(find . -type f -name 'FILENAME1')
You made it clearer in the comments.
I want the output of printf '%q' "${PWD##*/}" and grep 'Search_term ' FILENAME1 | tail -1 and awk '{ if(NR==n) printf "%s",$0 }' n=$1 $2 FILENAME2 to be printed in one line
So first, we have three commands, that each print a single line of output. As the commands do not matter, let's wrap them in functions to simplify the answer:
cmd1() { printf '%q\n' "${PWD##*/}"; }
cmd2() { grep .... ; }
cmd3() { awk ....; }
To print them without newlines between them, we can:
Use a command substitution, which removes trailing empty newlines. With some printf:
printf "%s%s%s\n" "$(cmd1)" "$(cmd2)" "$(cmd3)"
or some echo:
echo "$(cmd1) $(cmd2) $(cmd3)"
or append to a variable:
str="$(cmd1)"
str+=" $(cmd2)"
str+=" $(cmd3)"
printf" %s\n" "$str"
and so on.
We can remove newlines from the stream, using tr -d '\n':
{
cmd1
cmd2
cmd3
} | tr -d '\n'
echo # newlines were removed, so add one to the end.
or we can also remove the newlines only from the first n-1 commands, but I think this is less readable:
{
cmd1
cmd2
} | tr -d'\n'
cmd3 # the trailing newline will be added by cmd3
If i do not pass a number the awk command should be omited.
I see that your awk command expands both $1 and $2, and i see only $1 to be passed as the n=$1 environment variable to awk. I don't know what is $2. You can write if-s on the value of $# the number of arguments:
if (($# == 2)); then
awk '{ if(NR==n) printf "%s",$0 }' n="$1" "$2" FILENAME2
fi
and similar for each case you want to handle. Remember about proper quoting.
Your command shows the unused parameter $2, I deleted that one.
You can add a newline at the end of the awk using the END block, but you also want an extra newline when you call your script without a line number. echo will do.
#!/bin/bash
IFS=$'\n'
while read -r fname ;
do
pushd $(dirname "${fname}") > /dev/null
# Add result of grep in same printf statement
printf '%s %s' "${PWD##*/}" "$(grep 'Search_term ' FILENAME1 | tail -1)"
if (( $# -eq 1 )); then
# use $1 as an awk variable, number n
# use $2 as a different file to read from
awk -v n=$1 '{ if(NR==n) printf "%s ",$0 }' FILENAME2
fi
# Add line-ending
echo
popd > /dev/null
done < <(find . -type f -name 'FILENAME1')
Related
I am writing a function in a BASH shell script, that should return lines from csv-files with headers, having more commas than the header. This can happen, as there are values inside these files, that could contain commas. For quality control, I must identify these lines to later clean them up. What I have currently:
#!/bin/bash
get_bad_lines () {
local correct_no_of_commas=$(head -n 1 $1/$1_0_0_0.csv | tr -cd , | wc -c)
local no_of_files=$(ls $1 | wc -l)
for i in $(seq 0 $(( ${no_of_files}-1 )))
do
# Check that the file exist
if [ ! -f "$1/$1_0_${i}_0.csv" ]; then
echo "File: $1_0_${i}_0.csv not found!"
continue
fi
# Search for error-lines inside the file and print them out
echo "$1_0_${i}_0.csv has over $correct_no_of_commas commas in the following lines:"
grep -o -n '[,]' "$1/$1_0_${i}_0.csv" | cut -d : -f 1 | uniq -c | awk '$1 > $correct_no_of_commas {print}'
done
}
get_bad_lines products
get_bad_lines users
The output of this program is now all the comma-counts with all of the line numbers in all the files,
and I suspect this is due to the input $1 (foldername, i.e. products & users) conflicting with the call to awk with reference to $1 as well (where I wish to grab the first column being the count of commas for that line in the current file in the loop).
Is this the issue? and if so, would it be solvable by either referencing the 1.st column or the folder name by different variable names instead of both of them using $1 ?
Example, current output:
5 6667
5 6668
5 6669
5 6670
(should only show lines for that file having more than 5 commas).
Tried variable declaration in call to awk as well, with same effect
(as in the accepted answer to Awk field variable clash with function argument)
:
get_bad_lines () {
local table_name=$1
local correct_no_of_commas=$(head -n 1 $table_name/${table_name}_0_0_0.csv | tr -cd , | wc -c)
local no_of_files=$(ls $table_name | wc -l)
for i in $(seq 0 $(( ${no_of_files}-1 )))
do
# Check that the file exist
if [ ! -f "$table_name/${table_name}_0_${i}_0.csv" ]; then
echo "File: ${table_name}_0_${i}_0.csv not found!"
continue
fi
# Search for error-lines inside the file and print them out
echo "${table_name}_0_${i}_0.csv has over $correct_no_of_commas commas in the following lines:"
grep -o -n '[,]' "$table_name/${table_name}_0_${i}_0.csv" | cut -d : -f 1 | uniq -c | awk -v table_name="$table_name" '$1 > $correct_no_of_commas {print}'
done
}
You can use awk the full way to achieve that :
get_bad_lines () {
find "$1" -maxdepth 1 -name "$1_0_*_0.csv" | while read -r my_file ; do
awk -v table_name="$1" '
NR==1 { num_comma=gsub(/,/, ""); }
/,/ { if (gsub(/,/, ",", $0) > num_comma) wrong_array[wrong++]=NR":"$0;}
END { if (wrong > 0) {
print(FILENAME" has over "num_comma" commas in the following lines:");
for (i=0;i<wrong;i++) { print(wrong_array[i]); }
}
}' "${my_file}"
done
}
For why your original awk command failed to give only lines with too many commas, that is because you are using a shell variable correct_no_of_commas inside a single quoted awk statement ('$1 > $correct_no_of_commas {print}'). Thus there no substitution by the shell, and awk read "$correct_no_of_commas" as is, and perceives it as an undefined variable. More precisely, awk look for the variable correct_no_of_commas which is undefined in the awk script so it is an empty string . awk will then execute $1 > $"" as matching condition, and as $"" is a $0 equivalent, awk will compare the count in $1 with the full input line. From a numerical point of view, the full input line has the form <tab><count><tab><num_line>, so it is 0 for awk. Thus, $1 > $correct_no_of_commas will be always true.
You can identify all the bad lines with a single awk command
awk -F, 'FNR==1{print FILENAME; headerCount=NF;} NF>headerCount{print} ENDFILE{print "#######\n"}' /path/here/*.csv
If you want the line number also to be printed, use this
awk -F, 'FNR==1{print FILENAME"\nLine#\tLine"; headerCount=NF;} NF>headerCount{print FNR"\t"$0} ENDFILE{print "#######\n"}' /path/here/*.csv
I've this data :
cat >data1.txt <<'EOF'
2020-01-27-06-00;/dev/hd1;100;/
2020-01-27-12-00;/dev/hd1;100;/
2020-01-27-18-00;/dev/hd1;100;/
2020-01-27-06-00;/dev/hd2;200;/usr
2020-01-27-12-00;/dev/hd2;200;/usr
2020-01-27-18-00;/dev/hd2;200;/usr
EOF
cat >data2.txt <<'EOF'
2020-02-27-06-00;/dev/hd1;120;/
2020-02-27-12-00;/dev/hd1;120;/
2020-02-27-18-00;/dev/hd1;120;/
2020-02-27-06-00;/dev/hd2;230;/usr
2020-02-27-12-00;/dev/hd2;230;/usr
2020-02-27-18-00;/dev/hd2;230;/usr
EOF
cat >data3.txt <<'EOF'
2020-03-27-06-00;/dev/hd1;130;/
2020-03-27-12-00;/dev/hd1;130;/
2020-03-27-18-00;/dev/hd1;130;/
2020-03-27-06-00;/dev/hd2;240;/usr
2020-03-27-12-00;/dev/hd2;240;/usr
2020-03-27-18-00;/dev/hd2;240;/usr
EOF
I would like to create a .txt file for each filesystem ( so hd1.txt, hd2.txt, hd3.txt and hd4.txt ) and put in each .txt file the sum of the value from each FS from each dataX.txt. I've some difficulties to explain in english what I want, so here an example of the result wanted
Expected content for the output file hd1.txt:
2020-01;/dev/hd1;300;/
2020-02;/dev/hd1;360;/
2020-03;/dev/hd1;390:/
Expected content for the file hd2.txt:
2020-01;/dev/hd2;600;/usr
2020-02;/dev/hd2;690;/usr
2020-03;/dev/hd2;720;/usr
The implementation I've currently tried:
for i in $(cat *.txt | awk -F';' '{print $2}' | cut -d '/' -f3| uniq)
do
cat *.txt | grep -w $i | awk -F';' -v date="$(cat *.txt | awk -F';' '{print $1}' | cut -d'-' -f-2 | uniq )" '{sum+=$3} END {print date";"$2";"sum}' >> $i
done
But it doesn't works...
Can you show me how to do that ?
Because the format seems to be so constant, you can delimit the input with multiple separators and parse it easily in awk:
awk -v FS='[;-/]' '
prev != $9 {
if (length(output)) {
print output >> fileoutput
}
prev = $9
sum = 0
}
{
sum += $9
output = sprintf("%s-%s;/%s/%s;%d;/%s", $1, $2, $7, $8, sum, $11)
fileoutput = $8 ".txt"
}
END {
print output >> fileoutput
}
' *.txt
Tested on repl generates:
+ cat hd1.txt
2020-01;/dev/hd1;300;/
2020-02;/dev/hd1;360;/
2020-03;/dev/hd1;390;/
+ cat hd2.txt
2020-01;/dev/hd2;600;/usr
2020-02;/dev/hd2;690;/usr
2020-03;/dev/hd2;720;/usr
Alternatively, you could -v FS=';' and use split to split first and second column to extract the year and month and the hdX number.
If you seek a bash solution, I suggest you invert the loops - first iterate over files, then over identifiers in second column.
for file in *.txt; do
prev=
output=
while IFS=';' read -r date dev num path; do
hd=$(basename "$dev")
if [[ "$hd" != "${prev:-}" ]]; then
if ((${#output})); then
printf "%s\n" "$output" >> "$fileoutput"
fi
sum=0
prev="$hd"
fi
sum=$((sum + num))
output=$(
printf "%s;%s;%d;%s" \
"$(cut -d'-' -f1-2 <<<"$date")" \
"$dev" "$sum" "$path"
)
fileoutput="${hd}.txt"
done < "$file"
printf "%s\n" "$output" >> "$fileoutput"
done
You could also almost translate awk to bash 1:1 by doing IFS='-;/' in while read loop.
I'm monitoring from an actively written to file:
My current solution is:
ws_trans=0
sc_trans=0
tail -F /var/log/file.log | \
while read LINE
echo $LINE | grep -q -e "enterpriseID:"
if [ $? = 0 ]
then
((ws_trans++))
fi
echo $LINE | grep -q -e "sc_ID:"
if [ $? = 0 ]
then
((sc_trans++))
fi
printf "\r WSTRANS: $ws_trans \t\t SCTRANS: $sc_trans"
done
However when attempting to do this with AWK I don't get the output - the $ws_trans and $sc_trans remains 0
ws_trans=0
sc_trans=0
tail -F /var/log/file.log | \
while read LINE
echo $LINE | awk '/enterpriseID:/ {++ws_trans} END {print | ws_trans}'
echo $LINE | awk '/sc_ID:/ {++sc_trans} END {print | sc_trans}'
printf "\r WSTRANS: $ws_trans \t\t SCTRANS: $sc_trans"
done
Attempting to do this to reduce load. I understand that AWK doesn't deal with bash variables, and it can get quite confusing, but the only reference I found is a non tail application of AWK.
How can I assign the AWK Variable to the bash ws_trans and sc_trans? Is there a better solution? (There are other search terms being monitored.)
You need to pass the variables using the option -v, for example:
$ var=0
$ printf %d\\n {1..10} | awk -v awk_var=${var} '{++awk_var} {print awk_var}'
To set the variable "back" you could use declare, for example:
$ declare $(printf %d\\n {1..10} | awk -v awk_var=${var} '{++awk_var} END {print "var=" awk_var}')
$ echo $var
$ 10
Your script could be rewritten like this:
ws_trans=0
sc_trans=0
tail -F /var/log/system.log |
while read LINE
do
declare $(echo $LINE | awk -v ws=${ws_trans} '/enterpriseID:/ {++ws} END {print "ws_trans="ws}')
declare $(echo $LINE | awk -v sc=${sc_trans} '/sc_ID:/ {++sc} END {print "sc_trans="sc}')
printf "\r WSTRANS: $ws_trans \t\t SCTRANS: $sc_trans"
done
I have directory containing files:
$> ls blender/output/celebAnim/
0100.png 0107.png 0114.png 0121.png 0128.png 0135.png 0142.png 0149.png 0156.png 0163.png 0170.png 0177.png 0184.png 0191.png 0198.png 0205.png 0212.png 0219.png 0226.png 0233.png 0240.png 0247.png 0254.png 0261.png 0268.png 0275.png 0282.png
0101.png 0108.png 0115.png 0122.png 0129.png 0136.png 0143.png 0150.png 0157.png 0164.png 0171.png 0178.png 0185.png 0192.png 0199.png 0206.png 0213.png 0220.png 0227.png 0234.png 0241.png 0248.png 0255.png 0262.png 0269.png 0276.png 0283.png
0102.png 0109.png 0116.png 0123.png 0130.png 0137.png 0144.png 0151.png 0158.png 0165.png 0172.png 0179.png 0186.png 0193.png 0200.png 0207.png 0214.png 0221.png 0228.png 0235.png 0242.png 0249.png 0256.png 0263.png 0270.png 0277.png 0284.png
0103.png 0110.png 0117.png 0124.png 0131.png 0138.png 0145.png 0152.png 0159.png 0166.png 0173.png 0180.png 0187.png 0194.png 0201.png 0208.png 0215.png 0222.png 0229.png 0236.png 0243.png 0250.png 0257.png 0264.png 0271.png 0278.png
0104.png 0111.png 0118.png 0125.png 0132.png 0139.png 0146.png 0153.png 0160.png 0167.png 0174.png 0181.png 0188.png 0195.png 0202.png 0209.png 0216.png 0223.png 0230.png 0237.png 0244.png 0251.png 0258.png 0265.png 0272.png 0279.png
0105.png 0112.png 0119.png 0126.png 0133.png 0140.png 0147.png 0154.png 0161.png 0168.png 0175.png 0182.png 0189.png 0196.png 0203.png 0210.png 0217.png 0224.png 0231.png 0238.png 0245.png 0252.png 0259.png 0266.png 0273.png 0280.png
0106.png 0113.png 0120.png 0127.png 0134.png 0141.png 0148.png 0155.png 0162.png 0169.png 0176.png 0183.png 0190.png 0197.png 0204.png 0211.png 0218.png 0225.png 0232.png 0239.png 0246.png 0253.png 0260.png 0267.png 0274.png 0281.png
For some script, I will need to find out what the number of the first missing file is. In the above output, it would be 0285.png. However, it is also possible that files in between are missing. In the end, I am only interested in the number 285, which is part of the file name.
This is part of recovery logic: The files should be created by the script, but this step can fail. Therefore I want to have a means to check which files are missing and try to create them in a second step.
This is what I got so far (from how to extract part of a filename before '.' or before extension):
ls blender/output/celebAnim/ | awk -F'[.]' '{print $1}'
What I cannot figure out is how do I find the smallest number missing from that result, above a certain offset? The offset in this case is 100.
You could loop over all number from 100 to 500 and check if the corresponding file exists; if it doesn't, you'd print the number you're looking at:
for i in {100..500}; do
[[ ! -f 0$i.png ]] && { echo "$i missing!"; break; }
done
This prints, for your example, 285 missing!.
This solution could be made a bit more flexible by, for example, looping over zero padded numbers and then extracting the unpadded number:
for i in {0100..0500}; do
[[ ! -f $i.png ]] && { echo "${i##*(0)} missing!"; break; }
done
This requires extended globs (shopt -s extglob) for the *(0) pattern ("zero or more repetitions of 0").
begin=100
end=500
for i in `seq $begin 1 $end`; do
fname="0"$i".png"
if [ ! -f $fname ]; then
echo "$fname is missing"
fi
done
#!/bin/sh
search_dir=blender/output/celebAnim/
ls $search_dir > file_list
count=`wc -l file_list | awk '{ print $1 }'`
if [[ $count -eq 0 ]]
then
echo "No files in given directory!"
break
fi
file_extension=`head -1 file_list | tail -1 | awk -F "." '{ print $2 }'`
init_file_value=`head -1 file_list | tail -1 | awk -F "." '{ print $1 }'`
i=2
while [ $i -le $count ]
do
next_file_value=`head -$i file_list | tail -1 | awk -F "." '{ print $1 }'`
next_value=$((init_file_value+1));
if [ $next_file_value -ne $next_value ]
then
echo $next_value"."$file_extension
break
fi
init_file_value=$next_value;
i=$((i+1));
done
try it:
ls blender/output/celebAnim/ | sort -r | head -n1 | awk -F'.' '{print $1+1}'
command return 285
if need return 0285 than try it:
ls blender/output/celebAnim/ | sort -r | head -n1 | awk -F'.' '{print 0($1+1)}'
There are some directories in the working directory with this template
cas-2-32
sat-4-64
...
I want to loop over the directory names and grab the second and third part of folder names. I have wrote this script. The body shows what I want to do. But the awk command seems to be wrong
#!/bin/bash
for file in `ls`; do
if [ -d $file ]; then
arg2=`awk -F "-" '{print $2}' $file`
echo $arg2
arg3=`awk -F "-" '{print $3}' $file`
echo $arg3
fi
done
but it says
awk: cmd. line:1: fatal: cannot open file `cas-2-32' for reading (Invalid argument)
awk expects a filename as input. Since you have said the cas-2-32 etc are directories, awk fails for the same reason.
Feed the directory names to awk using echo:
#!/bin/bash
for file in `ls`; do
if [ -d $file ]; then
arg2=$(echo $file | awk -F "-" '{print $2}')
echo $arg2
arg3=$(echo $file | awk -F "-" '{print $3}')
echo $arg3
fi
done
Simple comand: ls | awk '{ FS="-"; print $2" "$3 }'
If you want the values in each line just add "\n" instead of a space in awk's print.
When executed like this
awk -F "-" '{print $2}' $file
awk treats $file's value as the file to be parsed, instead of parsing $file's value itself.
The minimal fix is to use a here-string which can feed the value of a variable into stdin of a command:
awk -F "-" '{print $2}' <<< $file
By the way, you don't need ls if you merely want a list of files in current directory, use * instead, i.e.
for file in *; do
One way:
#!/bin/bash
for file in *; do
if [ -d $file ]; then
tmp="${file#*-}"
arg2="${tmp%-*}"
arg3="${tmp#*-}"
echo "$arg2"
echo "$arg3"
fi
done
The other:
#!/bin/bash
IFS="-"
for file in *; do
if [ -d $file ]; then
set -- $file
arg2="$2"
arg3="$3"
echo "$arg2"
echo "$arg3"
fi
done