I have this script that is meant to trim whitespace from the field specified as an argument to the script,
e.g. sh script.sh file.txt "|" 2
#!/bin/bash
filename="$1"
delim="$2"
arg="$3"
gsubber="\"gsub("^[ \t]*|[ \t]*$","",'\$$arg')\""
myout=`nawk -F"$delim" -v fl="$gsubber" \'{ { fl } }1\' OFS="$delim" "$filename"`
echo "$myout"
So with this file 'file.txt' as input:
sid|storeNo|latitude
9| gerdy| fd¿kjhn422-405
0000543210 |gfdjk39
gfd|fd||fd
becomes this output:
sid|storeNo|latitude
9|gerdy| fd¿kjhn422-405
0000543210 |gfdjk39
gfd|fd||fd
I get this error:
nawk: syntax error at source line 1
context is
' <<<
missing }
nawk: bailing out at source line 1
Once someone can assist with the correct syntax, I should have no trouble extending it to support multiple fields, i.e. sh script.sh file.txt "|" 2 3 would then trim only the 2nd and 3rd fields.
Thanks in advance!
Try:
#!/bin/bash
filename=$1
delim=$2
arg=$3
regex='^[ \t]*|[ \t]*$'
myout=$(
nawk -F"$delim" -v regex="$regex" -v arg="$arg" '
{ gsub(regex, "", $arg) }
1' OFS="$delim" "$filename"
)
printf '%s\n' "$myout"
Edit:
In order to handle multiple fields in the arguments (see comments below):
#!/bin/bash
filename=$1
delim=$2
shift 2
args="$*"
regex='^[ \t]*|[ \t]*$'
myout=$(
nawk -F"$delim" -v regex="$regex" -v args="$args" '{
n = split(args, t, " ")
for (i = 1; i <= n; i++)
gsub(regex, "", $t[i])
}1' OFS="$delim" "$filename"
)
printf '%s\n' "$myout"
This should work:
#!/bin/bash
filename="$1"
delim="$2"
arg="$3"
myout=`nawk -F"$delim" -v f="$arg" '{gsub(/^[ \t]*|[ \t]*$/,"",$f) }1' OFS="$delim" "$filename"`
echo "$myout"
You don't have to extract the gsub out into a shell string: in the gsub call only the field index varies, so you can pass the field index as a variable to awk.
I am writing a function in a Bash shell script that should return the lines of CSV files (with headers) that have more commas than the header. This can happen because some values inside these files may contain commas. For quality control, I must identify these lines to clean them up later. What I currently have:
#!/bin/bash
get_bad_lines () {
local correct_no_of_commas=$(head -n 1 $1/$1_0_0_0.csv | tr -cd , | wc -c)
local no_of_files=$(ls $1 | wc -l)
for i in $(seq 0 $(( ${no_of_files}-1 )))
do
# Check that the file exist
if [ ! -f "$1/$1_0_${i}_0.csv" ]; then
echo "File: $1_0_${i}_0.csv not found!"
continue
fi
# Search for error-lines inside the file and print them out
echo "$1_0_${i}_0.csv has over $correct_no_of_commas commas in the following lines:"
grep -o -n '[,]' "$1/$1_0_${i}_0.csv" | cut -d : -f 1 | uniq -c | awk '$1 > $correct_no_of_commas {print}'
done
}
get_bad_lines products
get_bad_lines users
The output of this program is currently all the comma counts with all of the line numbers in all the files,
and I suspect this is because the input $1 (the folder name, i.e. products and users) conflicts with the $1 in the call to awk (where I want the first column, the comma count for that line of the current file in the loop).
Is this the issue? And if so, can it be solved by referring to the first column and the folder name by different names instead of both of them using $1?
Example, current output:
5 6667
5 6668
5 6669
5 6670
(should only show lines for that file having more than 5 commas).
I tried declaring the variable in the call to awk as well, with the same effect (as in the accepted answer to Awk field variable clash with function argument):
get_bad_lines () {
local table_name=$1
local correct_no_of_commas=$(head -n 1 $table_name/${table_name}_0_0_0.csv | tr -cd , | wc -c)
local no_of_files=$(ls $table_name | wc -l)
for i in $(seq 0 $(( ${no_of_files}-1 )))
do
# Check that the file exist
if [ ! -f "$table_name/${table_name}_0_${i}_0.csv" ]; then
echo "File: ${table_name}_0_${i}_0.csv not found!"
continue
fi
# Search for error-lines inside the file and print them out
echo "${table_name}_0_${i}_0.csv has over $correct_no_of_commas commas in the following lines:"
grep -o -n '[,]' "$table_name/${table_name}_0_${i}_0.csv" | cut -d : -f 1 | uniq -c | awk -v table_name="$table_name" '$1 > $correct_no_of_commas {print}'
done
}
You can do the whole job in awk:
get_bad_lines () {
find "$1" -maxdepth 1 -name "$1_0_*_0.csv" | while read -r my_file ; do
awk -v table_name="$1" '
NR==1 { num_comma=gsub(/,/, ""); }
/,/ { if (gsub(/,/, ",", $0) > num_comma) wrong_array[wrong++]=NR":"$0;}
END { if (wrong > 0) {
print(FILENAME" has over "num_comma" commas in the following lines:");
for (i=0;i<wrong;i++) { print(wrong_array[i]); }
}
}' "${my_file}"
done
}
As for why your original awk command printed every line instead of only the ones with too many commas: you are using the shell variable correct_no_of_commas inside a single-quoted awk program ('$1 > $correct_no_of_commas {print}'). The shell performs no substitution inside single quotes, so awk reads $correct_no_of_commas as-is and looks up an awk variable named correct_no_of_commas, which is undefined in the awk script and therefore an empty string. awk then evaluates $1 > $"" as the condition, and since "" converts to 0, $"" is equivalent to $0, the whole input line. A line from uniq -c has the form <spaces><count><space><line_number>, which does not look like a plain number, so awk compares as strings, and the digits in $1 always sort after a line that begins with a space. Thus $1 > $correct_no_of_commas is always true.
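The fix, in line with the -v approach in the answer you linked, is to pass the shell value into awk as an awk variable (max is my name for it) and compare against that:
grep -o -n '[,]' "$table_name/${table_name}_0_${i}_0.csv" | cut -d : -f 1 | uniq -c | awk -v max="$correct_no_of_commas" '$1 > max {print}'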
You can identify all the bad lines with a single awk command (ENDFILE is a GNU awk extension):
awk -F, 'FNR==1{print FILENAME; headerCount=NF;} NF>headerCount{print} ENDFILE{print "#######\n"}' /path/here/*.csv
If you also want the line numbers printed, use this:
awk -F, 'FNR==1{print FILENAME"\nLine#\tLine"; headerCount=NF;} NF>headerCount{print FNR"\t"$0} ENDFILE{print "#######\n"}' /path/here/*.csv
I have a very crude script, getinfo.sh, that gets me information from all files named FILENAME1 and FILENAME2 in all subfolders, plus the path of each subfolder. The awk part should only pick the nth line from FILENAME2 if the script is called as "getinfo.sh n". I want all the info printed on one line!
The problem is that if I use print instead of printf, the info is written to a new line, but my script works. If I use printf, I can see the last bit of the awk output in the command prompt after the script is done, but it is not pasted after the grep output on the same line. All in all the complete line will be pretty long, but that is intentional. Would you be willing to tell me what I am doing wrong?
#!/bin/bash
IFS=$'\n'
while read -r fname ;
do
pushd $(dirname "${fname}") > /dev/null
printf '%q' "${PWD##*/}"
grep 'Search_term ' FILENAME1 | tail -1
awk '{ if(NR==n) printf "%s",$0 }' n=$1 $2 FILENAME2
popd > /dev/null
done < <(find . -type f -name 'FILENAME1')
I would also be happy to grep the nth line if that is easier.
SOLUTION:
#!/bin/bash
IFS=$'\n'
while read -r fname ;
do
pushd $(dirname "${fname}") > /dev/null
{
printf '%q' "${PWD##*/}"
grep 'Search_term' FILENAME1 | tail -1
} | tr -d '\n'
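# run the awk part only when $1 is an integer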
if [ "$1" -eq "$1" ] 2>/dev/null
then
awk '{ if(NR==n) printf "%s",$0 }' n="$1" FILENAME2
fi
printf "\n"
popd > /dev/null
done < <(find . -type f -name 'FILENAME1')
You made it clearer in the comments.
I want the output of printf '%q' "${PWD##*/}" and grep 'Search_term ' FILENAME1 | tail -1 and awk '{ if(NR==n) printf "%s",$0 }' n=$1 $2 FILENAME2 to be printed in one line
So first, we have three commands that each print a single line of output. As the exact commands do not matter, let's wrap them in functions to simplify the answer:
cmd1() { printf '%q\n' "${PWD##*/}"; }
cmd2() { grep .... ; }
cmd3() { awk ....; }
To print them without newlines between them, we can:
Use a command substitution, which removes trailing newlines. With some printf:
printf "%s%s%s\n" "$(cmd1)" "$(cmd2)" "$(cmd3)"
or some echo:
echo "$(cmd1) $(cmd2) $(cmd3)"
or append to a variable:
str="$(cmd1)"
str+=" $(cmd2)"
str+=" $(cmd3)"
printf" %s\n" "$str"
and so on.
We can remove newlines from the stream, using tr -d '\n':
{
cmd1
cmd2
cmd3
} | tr -d '\n'
echo # newlines were removed, so add one to the end.
or we can also remove the newlines only from the first n-1 commands, but I think this is less readable:
{
cmd1
cmd2
} | tr -d '\n'
cmd3 # the trailing newline will be added by cmd3
If I do not pass a number, the awk command should be omitted.
I see that your awk command expands both $1 and $2, but only $1 is passed to awk, as the n=$1 variable. I don't know what $2 is. You can write ifs on the value of $#, the number of arguments:
if (($# == 2)); then
awk '{ if(NR==n) printf "%s",$0 }' n="$1" "$2" FILENAME2
fi
and similar for each case you want to handle. Remember about proper quoting.
Your command contains the unused parameter $2; I deleted it.
You can add a newline at the end of the awk output using an END block, but you also want the extra newline when you call your script without a line number, so a plain echo will do.
#!/bin/bash
IFS=$'\n'
while read -r fname ;
do
pushd $(dirname "${fname}") > /dev/null
# Add result of grep in same printf statement
printf '%s %s' "${PWD##*/}" "$(grep 'Search_term ' FILENAME1 | tail -1)"
if (( $# == 1 )); then
# use $1 as an awk variable, the line number n
awk -v n="$1" '{ if(NR==n) printf "%s ",$0 }' FILENAME2
fi
# Add line-ending
echo
popd > /dev/null
done < <(find . -type f -name 'FILENAME1')
In a directory, there are several files, such as:
file1
file2
file3
Is there a simple way to concatenate those file names into one line (connected by "OR") in bash, as follows:
file1 OR file2 OR file3
Or do I need to write a script for it?
You can use this function to print all filenames (including ones with space, newline or special characters) with " OR " as separator (assuming your filename doesn't contain ASCII code 4):
orfiles() {
local IFS=$'\4'
local out="$*"
echo "${out//$'\4'/ OR }"
}
Then call it as:
orfiles *
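For example, with the files file1, file2 and file3 from the question in the directory, this prints:
file1 OR file2 OR file3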
How it works:
We set IFS (the Internal Field Separator) to ASCII 4, locally inside the function.
We store the expansion of "$*" in the local variable out. "$*" joins the positional parameters using the first character of IFS, so $out contains the filenames separated by \4.
Finally, using bash string substitution, we globally replace \4 with " OR " while printing $out.
Since only a single character (the first character of IFS) is used to join "$*", it cannot be the multi-character string " OR ", which is why we need the two steps above.
You can simply do that with:
printf '%s OR ' $(ls -1 *) | sed 's/ OR $//'; echo
where ls -1 * lists the directory.
One thing to consider is that a filename could contain whitespace.
Use the following ls + awk solution:
ls -1 * | awk '{ r=(r)? r" OR "$0 : $0 }END{ print r }'
Workaround for filenames with newline(s):
echo -e $(ls -1b hello* | awk -v RS= '{gsub(/\n/," OR ",$0); gsub(/\\ /," ",$0); print $0}')
(-b is the ls option to print C-style escapes for nongraphic characters.)
ls -1|awk -v q='"' '{printf "%s%s", NR==1?"":" OR ", q $0 q}END{print ""}'
The ls & awk way to do it, with an example where a filename contains spaces:
kent$ ls -1
file1
file2
'file with OR and space'
kent$ ls -1|awk -v q='"' '{printf "%s%s", NR==1?"":" OR ", q $0 q}END{print ""}'
"file1" OR "file2" OR "file with OR and space"
$ for f in *; do printf '%s%s' "$s" "$f"; s=" OR "; done; printf '\n'
file1 OR file2 OR file3
($s is empty on the first iteration, so no leading separator is printed.)
I am making a bash script. I have to get 3 variables
VAR1=$(cat /path to my file/ | grep "string1" | awk '{ print $2 }')
VAR2=$(cat /path to my file/ | grep "string2" | awk '{ print $2 }')
VAR3=$(cat /path to my file/ | grep "string3" | awk '{ print $4 }')
My problem is that if I write
echo $VAR1
echo $VAR2
echo $VAR3
I can see values correctly
But when I try to write them in one line like this
echo "VAR1: $VAR1 VAR2: $VAR2 VAR3: $VAR3"
the value of $VAR3 is written at the beginning of the output, overwriting the values of $VAR1 and $VAR2.
I hope my explanation is clear. If you have any doubts, please let me know.
Thanks and regards.
Rambert
It seems to me that $VAR3 contains \r (a carriage return), which the terminal will render by moving the cursor to the beginning of the line. Use printf instead:
printf "VAR1: %s VAR2: %s VAR3: %s\n" "$VAR1" "$VAR2" "$VAR3"
Also note that the way you extract the values is highly inefficient and can be reduced to one call to awk:
read -r var1 var2 var3 _ < <(awk '/string1/ { a=$2 }
/string2/ { b=$2 }
/string3/ { c=$4 }
END { print(a, b, c) }' /path/to/file)
printf "VAR1: %s VAR2: %s VAR3: %s\n" "$var1" "$var2" "$var3"
A nitpick: by convention, uppercase variable names are reserved for environment variables, so I changed them all to lowercase.
<(...) is a process substitution and will make ... write to a "file" and return the file name:
$ echo <(ls)
/dev/fd/63
And command < file is a redirection, changing the standard input of command to come from the file file.
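Combining the two gives the < <(...) construct used above; a tiny example:
read -r first_line < <(ls) # reads the first line of ls output into first_line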
You could write:
cat /path to my file/ | grep "string1" | awk '{ print $2 }'
as
awk '/string1/{print $2}' /path/to/file
In other words, you can do with awk alone what you intended to do with cat, grep and awk.
So you finally get:
VAR1=$(awk '/string1/{print $2}' /path/to/file) #mind the closing ')'
Regarding the issue you face, it looks like you have carriage returns (\r) in your variables. In bash, echo will not interpret escape sequences without the -e option, but the printf approach that andlrc pointed out is a good choice, as he mentioned in his answer:
which the terminal will render by moving the cursor to the beginning of the line
Notes:
Another subtle point to keep in mind is to avoid uppercase variable names like VAR1 in user scripts, so replace it with var1 or similar.
When assigning a value to a variable, spaces are not allowed around =, so
VAR1="Note there are no spaces around the = sign"
is the right usage.
Here I'm accepting a few mount points from the user and using each value to get the space available on the host.
./user_input.ksh -string /m01,/m02,/m03
#!/bin/ksh
STR=$2
function showMounts {
echo "$STR"
arr=($(tr ',' ' ' <<< "$STR"))
printf "%s\n" "$(arr[#]}"
for x in "${arr[#]}"
do
free_space=`df -h "$x" | grep -v "Avail" | awk '{print $4}'`
echo "$x": free_space "$free_space"
done
#echo "$total_free_space"
}
Problems:
How can I exit the for loop if one of the user-supplied mount points is not available? Currently it only adds an error to the log.
How do I get total_free_space (i.e. the sum of free_space)?
If you want to keep your code structured this way, try this (I have no ksh here to test it). If you don't care, read Ed Morton's answer.
./user_input.ksh -string /m01,/m02,/m03
#!/bin/ksh
STR=$2
function showMounts {
echo "$STR"
arr=($(tr ',' ' ' <<< "$STR"))
printf "%s\n" "${arr[#]}"
for x in "${arr[#]}"; do
free_space=$(df -P "$x" | awk 'NR > 1 && !/Avail/{print $4}')
echo "$x: free_space $free_space"
((total_free_space+=$free_space))
done
echo "$((total_free_space/1024/1000))G"
}
showMounts
Caution:
"${arr[#]}"
not
"$(arr[#]}"
As I said in your last question, you do not need ANY of that, all you need is a one-liner like:
df -h "${STR//,/ }" | awk '/^ /{print $5, $3; sum+=$3} END{print sum}'
I have to say "like" because you haven't shown us the df -h /m01 /m02 /m03 output yet so I don't know exactly how to parse it.