awk: compare a value with user input - linux

Can a user input variable ($userinput) be compared with a value?
awk -F: '$1 < $userinput { printf .... }'
This comparison expression looks OK to me, but it gives an error. Why?

Try this (shell variables are not expanded inside single quotes, so pass them to awk with -v):
awk -v userinput="$userinput" -F: '$1 < userinput {}'
A real example:
read -p "Give me an integer >>> " int
awk -v input="$int" '$1 < input {print $1, "is less than", input}' <<< 1
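A minimal self-contained sketch of the same pattern (the data and the limit here are invented for illustration):

```shell
limit=10
# Quote "$limit" so the shell cannot word-split it before awk sees it;
# inside the program, "limit" is a plain awk variable set via -v.
printf '3\n7\n12\n' |
awk -v limit="$limit" '$1 < limit { printf "%s is below %s\n", $1, limit }'
# prints:
# 3 is below 10
# 7 is below 10
```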


Adding double quotes around non-numeric columns by awk

I have a file like this:
2018-01-02;1.5;abcd;111
2018-01-04;2.75;efgh;222
2018-01-07;5.25;lmno;333
2018-01-09;1.25;prs;444
I'd like to add double quotes to the non-numeric columns, so the new file should look like:
"2018-01-02";1.5;"abcd";111
"2018-01-04";2.75;"efgh";222
"2018-01-07";5.25;"lmno";333
"2018-01-09";1.25;"prs";444
I tried this so far; I know this is not the correct way:
head myfile.csv -n 4 | awk 'BEGIN{FS=OFS=";"} {gsub($1,echo $1 ,$1)} 1' | awk 'BEGIN{FS=OFS=";"} {gsub($3,echo "\"" $3 "\"",$3)} 1'
Thanks in advance.
You may use this awk that sets ; as input/output delimiter and then wraps each field with "s if that field is non-numeric:
awk '
BEGIN {
  FS = OFS = ";"
}
{
  for (i = 1; i <= NF; ++i)
    $i = ($i+0 == $i ? $i : "\"" $i "\"")
} 1' file
"2018-01-02";1.5;"abcd";111
"2018-01-04";2.75;"efgh";222
"2018-01-07";5.25;"lmno";333
"2018-01-09";1.25;"prs";444
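The numeric test here is the $i+0 == $i idiom: adding 0 forces a numeric conversion, and the comparison only holds when the value round-trips cleanly. A standalone sketch of the idiom (the sample values are invented):

```shell
awk 'BEGIN {
  n = split("1.5 abcd 111 2018-01-02", vals, " ")
  for (i = 1; i <= n; i++)
    # vals[i]+0 coerces to a number; equality holds only for numeric strings
    print vals[i], (vals[i]+0 == vals[i] ? "numeric" : "string")
}'
# prints:
# 1.5 numeric
# abcd string
# 111 numeric
# 2018-01-02 string
```

Note that "2018-01-02" converts to the number 2018, which does not equal the original string, so it is correctly treated as non-numeric.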
Alternative gnu-awk solution:
awk -v RS='[;\n]' '$0+0 != $0 {$0 = "\"" $0 "\""} {ORS=RT} 1' file
Using GNU awk and typeof(): fields that are numeric strings have the strnum attribute; otherwise, they have the string attribute.
$ gawk 'BEGIN {
  FS = OFS = ";"
}
{
  for (i = 1; i <= NF; i++)
    if (typeof($i) == "string")
      $i = sprintf("\"%s\"", $i)
}1' file
Some output:
"2018-01-02";1.5;"abcd";111
Edit:
If some of the fields are already quoted:
$ gawk 'BEGIN {
  FS = OFS = ";"
}
{
  for (i = 1; i <= NF; i++)
    if (typeof($i) == "string")
      gsub(/^"?|"?$/, "\"", $i)
}1' <<< 'string;123;"quoted string"'
Output:
"string";123;"quoted string"
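The gsub(/^"?|"?$/, "\"", $i) call is idempotent because "? matches either an existing quote (which gets replaced by a quote, so nothing visibly changes) or an empty string at the anchor (so a quote gets inserted). A standalone sketch with made-up values:

```shell
awk 'BEGIN {
  s = "plain"; t = "\"already quoted\""
  gsub(/^"?|"?$/, "\"", s)   # no quotes present: quotes are inserted at both ends
  gsub(/^"?|"?$/, "\"", t)   # quotes present: they are replaced by themselves
  print s
  print t
}'
# prints:
# "plain"
# "already quoted"
```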
Further enhancing upon anubhava's solution (including handling fields that are already double-quoted):
gawk -e 'sub(".+",$-_==+$-_?"&":(_)"&"_\
)^gsub((_)_, _)^(ORS = RT)' RS='[;\n]' \_='\42'
"2018-01-02";1.5;"abcd";111
"2018-01-04";2.75;"efgh";222
"2018-01-07";5.25;"lmno";333
"2018-01-09";1.25;"prs";444

How to use var inside awk printf

Hi I have something like this in my bash script
gawk '{ printf "%s %s", $1, $2 }' test.txt
Normally I can limit the length of the strings with a number before the s, like this:
# Limit strings to 10 characters
gawk '{ printf "%10s %10s", $1, $2 }' test.txt
How can I use a variable for that limit? I get the width of the current terminal with tput cols and I want to set the length of the strings dynamically.
You can use * in the format specifier to get the width from an argument to printf:
$ gawk -v width=10 '{ printf "%*s %*s\n", width, $1, width, $2 }' <<<"a b"
         a          b
$ gawk -v width=3 '{ printf "%*s %*s\n", width, $1, width, $2 }' <<<"a b"
  a   b
This also works with the precision argument (the number after a . in the format).
One way is to break up your format string and insert the variable.
awk -v w=10 '{printf "%."w"f %."w"f", $1, $2}' <<< '1.2 3.5'
1.2000000000 3.5000000000
awk -v w=3 '{printf "%."w"f %."w"f", $1, $2}' <<< '1.2 3.5'
1.200 3.500

How to copy a value from one column to another?

I have CSV data with two price columns. If a value exists in the $4 column I want to copy it over the $3 column of the same row. If $4 is empty then $3 should be left as is.
Neither of these work:
awk -F',' '{ if (length($4) == 0) $3=$4 }'
awk -F',' '{ if(!length($4) == 0 ) print $4 }'
This will output every line with the sample table
awk -F',' '{ if(!length($4) == 0 ) print $0 }' inputfile
This will output nothing with the sample table
awk -F',' '{ if(length($4) == 0 ) print $3 }' inputfile
I've cleaned my two input files, fixed the header row, and joined them using sed, awk, sort, and join. Now what I am left with is a CSV which looks like this:
itemnumber,available,regprice,mapprice
00061,9,19.30,
00061030,31,2.87,3.19
00062,9,15.44,
00062410,2,3.59,3.99
00064,9,15.44,
00066850,29,2.87,3.99
00066871,49,4.19,5.99
00066878,3,5.63,7.99
I need to overwrite the $3 column if the $4 column in the same row has a value. The end result would be:
itemnumber,available,regprice,mapprice
00061,9,19.30,
00061030,31,3.19,3.19
00062,9,15.44,
00062410,2,3.99,3.99
00064,9,15.44,
00066850,29,3.99,3.99
00066871,49,5.99,5.99
00066878,3,7.99,7.99
$ awk 'BEGIN{FS=OFS=","} (NR>1) && ($4!=""){$3=$4} 1' file
itemnumber,available,regprice,mapprice
00061,9,19.30,
00061030,31,3.19,3.19
00062,9,15.44,
00062410,2,3.99,3.99
00064,9,15.44,
00066850,29,3.99,3.99
00066871,49,5.99,5.99
00066878,3,7.99,7.99
Let's have a look at all the things you tried:
awk -F',' '{ if (length($4) == 0) $3=$4 }'
This states: if the length of field 4 is zero, then set field 3 equal to field 4. You do not ask awk to print anything, so it will not do anything. This would have printed something:
awk -F',' '{ if (length($4) == 0) $3=$4 }{print $0}'
but the output would then have spaces as field separators, because OFS defaults to a space; to keep the commas you should have done:
awk 'BEGIN{FS=OFS=","}{ if (length($4) == 0) $3=$4 }{print $0}'
awk -F',' '{ if(!length($4) == 0 ) print $4 }'
Here you state: if it is not true that the length of field 4 equals zero, print field 4. (Strictly speaking, !length($4) == 0 parses as (!length($4)) == 0 because ! binds tighter than ==, but that happens to give the same result.)
As you mention that nothing is printed, it most likely indicates that you have hidden characters in field 4, such as a CR (See: Remove carriage return in Unix), or even just spaces. You could attempt something like
awk -F',' '{ sub(/ *\r?$/, "") } { if (length($4) != 0) print $4 }'
awk -F',' '{ if(!length($4) == 0 ) print $0 }' inputfile
See the previous point about hidden characters in field 4.
awk -F',' '{ if(length($4) == 0 ) print $3 }' inputfile
This confirms my suspicion that field 4 contains hidden characters.
My solution for your problem would be based on that observation and the solution of Ed Morton:
awk 'BEGIN{FS=OFS=","} {sub(/ *\r?$/,"")} (NR>1) && ($4!=""){$3=$4} 1' file
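A quick way to verify the hidden-CR suspicion is to measure the last field's length on a simulated CRLF line (the data below is made up):

```shell
# $4 looks like "3.19" but carries a trailing CR, so its length is 5
printf '00061,9,19.30,3.19\r\n' | awk -F',' '{ print length($4) }'
# prints: 5

# stripping the CR first (assigning to $0 re-splits the fields) gives 4
printf '00061,9,19.30,3.19\r\n' | awk -F',' '{ sub(/\r$/, "") } { print length($4) }'
# prints: 4
```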
Here's code that matches your results:
awk -F, -v OFS=, '
NR == 1
NR > 1 {
if ( $4 == "" )
print $1,$2,$3,$4
else
print $1,$2,$4,$4 }
' $*
I've run into trouble in the past with expressions like $3 = $4, so I just print out all of the fields.
Edit: I got shamed by Ed Morton for avoiding the $3 = $4 without troubleshooting. I gave it another shot here below:
awk -F, -v OFS=, '
NR == 1
NR > 1 {
if ( $4 != "" )
$3 = $4
print
}
' $*
The above achieves the same results.
Tried on GNU awk:
awk -F, -vOFS=, '/[0-9.]+/{if($4)$3=$4} {print}' file

Merging Multiple records into a Unique records with all the non-null values

Suppose I have 3 records :
P1||1234|
P1|56001||
P1|||NJ
I want to merge these 3 records into one with all the attributes. Final record :
P1|56001|1234|NJ
Is there any way to achieve this in Unix/Linux?
I assume you are asking for a solution with bash, awk, sed, etc.
You could try something like
$ cat test.txt
P1||1234|
P1|56001||
P1|||NJ
$ cat test.txt | awk -F'|' '{ for (i = 1; i <= NF; i++) print $i }' | egrep '.+' | sort | uniq | awk 'BEGIN{ c = "" } { printf c $0; c = "|" } END{ printf "\n" }'
1234|56001|NJ|P1
Briefly, awk splits the lines with '|' separator and prints each field to a line. egrep removes the empty lines. After that, sort and uniq removes multiple attributes. Finally, awk merges the lines with '|' separator.
Update:
If I understand correctly, here's what you're looking for:
$ cat test.txt | awk -F'|' '{ for (i = 1; i <= NF; i++) if($i) col[i]=$i } END{ for (i = 1; i <= length(col); i++) printf col[i] (i == length(col) ? "\n" : "|")}'
P1|56001|1234|NJ
In your example, 1st row you have 1234, 2nd row you have 56001.
I don't get why in your final result, the 56001 goes before 1234. I assume it is a typo/mistake.
an awk-oneliner could do the job:
awk -F'|' '{for(i=2;i<=NF;i++)if($i)a[$1]=(a[$1]?a[$1]"|":"")$i}END{print $1"|"a[$1]}'
with your data:
kent$ echo "P1||1234|
P1|56001||
P1|||NJ"|awk -F'|' '{for(i=2;i<=NF;i++)if($i)a[$1]=(a[$1]?a[$1]"|":"")$i}END{print $1"|"a[$1]}'
P1|1234|56001|NJ
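If the input can contain more than one key and you want each value kept in its original column (as in the update above), a POSIX-awk sketch along the same lines (the sample data, including the P2 rows, is invented, and it assumes exactly four columns):

```shell
printf 'P1||1234|\nP1|56001||\nP1|||NJ\nP2|9||\nP2||8|CA\n' |
awk -F'|' -v OFS='|' '
  !($1 in seen) { seen[$1]; order[++n] = $1 }      # remember key order
  { for (i = 2; i <= NF; i++)                      # first non-empty value wins
      if ($i != "" && col[$1, i] == "") col[$1, i] = $i }
  END {
    for (j = 1; j <= n; j++) {
      k = order[j]
      print k, col[k, 2], col[k, 3], col[k, 4]
    }
  }'
# prints:
# P1|56001|1234|NJ
# P2|9|8|CA
```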

String comparison in awk

I need to compare two strings in alphabetic order, not only an equality test. I want to know: is there a way to do string comparison in awk?
Sure, it can:
pax$ echo 'hello
goodbye' | gawk '{if ($0 == "hello") {print "HELLO"}}'
HELLO
You can also do inequality (ordered) testing as well:
pax> printf 'aaa\naab\naac\naad\n' | gawk '{if ($1 < "aac"){print}}'
aaa
aab
You can do string comparison in awk using the standard comparison operators, unlike in C where you would have to use strcmp().
echo "xxx yyy" > test.txt
cat test.txt | awk '$1!=$2 { print($1 $2); }'
You can check the answer in the nawk manual:
echo aaa bbb | awk '{ print ($1 >= $2) ? "true" : "false" }'
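One caveat worth knowing: fields that look like numbers get the strnum attribute, so awk compares them numerically rather than alphabetically. You can force a string comparison by concatenating an empty string. A small sketch (the values are chosen to make the two orderings disagree):

```shell
# fields look numeric, so this is a numeric comparison: 10 < 9 is false
echo '10 9' | awk '{ r = ($1 < $2) ? "yes" : "no"; print r }'
# prints: no

# concatenating "" forces strings: "10" sorts before "9" lexically
echo '10 9' | awk '{ r = (($1 "") < ($2 "")) ? "yes" : "no"; print r }'
# prints: yes
```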
