I have a file like this:
2018-01-02;1.5;abcd;111
2018-01-04;2.75;efgh;222
2018-01-07;5.25;lmno;333
2018-01-09;1.25;prs;444
I'd like to add double quotes around the non-numeric columns, so the new file should look like this:
"2018-01-02";1.5;"abcd";111
"2018-01-04";2.75;"efgh";222
"2018-01-07";5.25;"lmno";333
"2018-01-09";1.25;"prs";444
This is what I have tried so far; I know it is not the correct way:
head myfile.csv -n 4 | awk 'BEGIN{FS=OFS=";"} {gsub($1,echo $1 ,$1)} 1' | awk 'BEGIN{FS=OFS=";"} {gsub($3,echo "\"" $3 "\"",$3)} 1'
Thanks in advance.
You may use this awk that sets ; as input/output delimiter and then wraps each field with "s if that field is non-numeric:
awk '
BEGIN {
FS = OFS = ";"
}
{
for (i=1; i<=NF; ++i)
$i = ($i+0 == $i ? $i : "\"" $i "\"")
} 1' file
"2018-01-02";1.5;"abcd";111
"2018-01-04";2.75;"efgh";222
"2018-01-07";5.25;"lmno";333
"2018-01-09";1.25;"prs";444
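If you would rather not rely on awk's number-versus-string comparison rules, a regex test is one possible alternative. This is only a sketch: the function name quote_non_numeric and the pattern (optional sign, digits, optional decimal part, no exponent notation) are my own assumptions about what should count as numeric, not part of the answer above.

```shell
# Hypothetical helper: quote every field that does not look like a plain decimal number.
quote_non_numeric() {
    awk 'BEGIN { FS = OFS = ";" }
    {
        for (i = 1; i <= NF; i++)
            # Assumed definition of "numeric": [sign] digits [. digits] or [sign] . digits
            if ($i !~ /^[-+]?([0-9]+[.]?[0-9]*|[.][0-9]+)$/)
                $i = "\"" $i "\""
        print
    }' "$@"
}

printf '2018-01-02;1.5;abcd;111\n' | quote_non_numeric
```

Note that with this pattern an empty field is treated as non-numeric and comes out as "".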
Alternative gnu-awk solution:
awk -v RS='[;\n]' '$0+0 != $0 {$0 = "\"" $0 "\""} {ORS=RT} 1' file
Using GNU awk and typeof(): "Fields ... that are numeric strings have the strnum attribute. Otherwise, they have the string attribute."
$ gawk 'BEGIN {
FS=OFS=";"
}
{
for(i=1;i<=NF;i++)
if(typeof($i)=="string")
$i=sprintf("\"%s\"",$i)
}1' file
Some output:
"2018-01-02";1.5;"abcd";111
...
Edit:
If some of the fields are already quoted:
$ gawk 'BEGIN {
FS=OFS=";"
}
{
for(i=1;i<=NF;i++)
if(typeof($i)=="string")
gsub(/^"?|"?$/,"\"",$i)
}1' <<< 'string;123;"quoted string"'
Output:
"string";123;"quoted string"
Further enhancing anubhava's solution (including handling fields that are already double-quoted):
gawk -e 'sub(".+",$-_==+$-_?"&":(_)"&"_\
)^gsub((_)_, _)^(ORS = RT)' RS='[;\n]' \_='\42'
"2018-01-02";1.5;"abcd";111
"2018-01-04";2.75;"efgh";222
"2018-01-07";5.25;"lmno";333
"2018-01-09";1.25;"prs";444
"2018-01-09";1.25;"prs";111111111111111111112222222222
222222223333333333333333333333
333344444444444444444499999999
999991111111111111111111122222
222222222222233333333333333333
333333333444444444444444444999
999999999991111111111111111111
122222222222222222233333333333
333333333333333444444444444444
444999999999999991111111111111
111111122222222222222222233333
333333333333333333333444444444
444444444999999999999991111111
111111111111122222222222222222
233333333333333333333333333444
444444444444444999999999999991
111111111111111111122222222222
222222233333333333333333333333
333444444444444444444999999999
999991111111111111111111122222
222222222222233333333333333333
333333333444444444444444444999
999999999999
Hi, I have something like this in my bash script:
gawk '{ printf "%s %s", $1, $2 }' test.txt
Normally I can limit the length of the strings with a number before the s, like this:
# Limit strings to 10 characters
gawk '{ printf "%10s %10s", $1, $2 }' test.txt
How can I use a variable for that limit? I get the width of the current terminal with tput cols and I want to set the length of the strings dynamically.
You can use * in the format specifier to get the width from an argument to printf:
$ gawk -v width=10 '{ printf "%*s %*s\n", width, $1, width, $2 }' <<<"a b"
         a          b
$ gawk -v width=3 '{ printf "%*s %*s\n", width, $1, width, $2 }' <<<"a b"
  a   b
This also works with the precision argument (the number after a . in the format), e.g. %.*s. For strings it is the precision that truncates; the width only pads.
One way is to break up your format string and insert the variable.
awk -v w=10 '{printf "%."w"f %."w"f", $1, $2}' <<< '1.2 3.5'
1.2000000000 3.5000000000
awk -v w=3 '{printf "%."w"f %."w"f", $1, $2}' <<< '1.2 3.5'
1.200 3.500
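The same format-building trick can be applied to strings, which is closer to what the question asks for. The sketch below is an assumption-laden illustration: the function name fit_columns and the "|" separator are made up for the example, and the width is passed in as an argument rather than read from tput cols so that it can be tried anywhere.

```shell
# Hypothetical helper: pad or truncate the first two fields to exactly w characters.
# In a real script the width might come from the terminal, for example:
#   fit_columns "$(( $(tput cols) / 2 ))" < test.txt
fit_columns() {
    awk -v w="$1" '
    # Build a format such as "%-5.5s|%-5.5s\n": width pads, precision truncates.
    BEGIN { fmt = "%-" w "." w "s|%-" w "." w "s\n" }
    { printf fmt, $1, $2 }'
}

printf 'abcdefgh xy\n' | fit_columns 5
```

With a width of 5 the first field is cut to abcde and the second is padded with three trailing spaces.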
I have CSV data with two price columns. If a value exists in the $4 column I want to copy it over the $3 column of the same row. If $4 is empty then $3 should be left as is.
Neither of these work:
awk -F',' '{ if (length($4) == 0) $3=$4 }'
awk -F',' '{ if(!length($4) == 0 ) print $4 }'
This will output every line with the sample table
awk -F',' '{ if(!length($4) == 0 ) print $0 }' inputfile
This will output nothing with the sample table
awk -F',' '{ if(length($4) == 0 ) print $3 }' inputfile
I've cleaned my two input files, fixed the header row, and joined them using sed, awk, sort, and join. Now what I am left with is a CSV which looks like this:
itemnumber,available,regprice,mapprice
00061,9,19.30,
00061030,31,2.87,3.19
00062,9,15.44,
00062410,2,3.59,3.99
00064,9,15.44,
00066850,29,2.87,3.99
00066871,49,4.19,5.99
00066878,3,5.63,7.99
I need to overwrite the $3 column if the $4 column in the same row has a value. The end result would be:
itemnumber,available,regprice,mapprice
00061,9,19.30,
00061030,31,3.19,3.19
00062,9,15.44,
00062410,2,3.99,3.99
00064,9,15.44,
00066850,29,3.99,3.99
00066871,49,5.99,5.99
00066878,3,7.99,7.99
$ awk 'BEGIN{FS=OFS=","} (NR>1) && ($4!=""){$3=$4} 1' file
itemnumber,available,regprice,mapprice
00061,9,19.30,
00061030,31,3.19,3.19
00062,9,15.44,
00062410,2,3.99,3.99
00064,9,15.44,
00066850,29,3.99,3.99
00066871,49,5.99,5.99
00066878,3,7.99,7.99
Let's have a look at all the things you tried:
1. awk -F',' '{ if (length($4) == 0) $3=$4 }'
This states: if the length of field 4 is zero, set field 3 equal to field 4. You do not ask awk to print anything, so it prints nothing. This would have printed something:
awk -F',' '{ if (length($4) == 0) $3=$4 }{print $0}'
but every modified line would come out with its field separators replaced by a space, because assigning to a field rebuilds the record using OFS. You should have done:
awk 'BEGIN{FS=OFS=","}{ if (length($4) == 0) $3=$4 }{print $0}'
2. awk -F',' '{ if(!length($4) == 0 ) print $4 }'
Here you state: if it is not true that the length of field 4 equals zero, print field 4.
As you mention that nothing is printed, it most likely indicates that you have hidden characters in field 4, such as a CR (See: Remove carriage return in Unix), or even just spaces. You could attempt something like
awk -F',' '{sub(/ *\r?$/,"")} { if(!length($4) == 0 ) print $4 }'
3. awk -F',' '{ if(!length($4) == 0 ) print $0 }' inputfile
See 2.
4. awk -F',' '{ if(length($4) == 0 ) print $3 }' inputfile
This confirms my suspicion in 2.
My solution for your problem would be based on the suggestion in 2 and the solution of Ed Morton.
awk 'BEGIN{FS=OFS=","} {sub(/ *\r?$/,"")}(NR>1) && ($4!=""){$3=$4} 1' file
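The effect of assigning to a field without setting OFS is easy to see on a single line. This is just a small illustration with made-up input (a,b,c), not the data from the question:

```shell
# Only FS is set: the modified record is rebuilt with the default OFS (a space).
echo 'a,b,c' | awk -F, '{ $2 = "X" } { print $0 }'

# FS and OFS are both set: the commas survive the assignment.
echo 'a,b,c' | awk 'BEGIN { FS = OFS = "," } { $2 = "X" } { print $0 }'
```

The first command prints a X c and the second prints a,X,c.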
Here's code that matches your results:
awk -F, -v OFS=, '
NR == 1
NR > 1 {
if ( $4 == "" )
print $1,$2,$3,$4
else
print $1,$2,$4,$4 }
' "$@"
I've run into trouble in the past with expressions like $3 = $4, so I just print out all of the fields.
Edit: I got shamed by Ed Morton for avoiding the $3 = $4 without troubleshooting. I gave it another shot here below:
awk -F, -v OFS=, '
NR == 1
NR > 1 {
if ( $4 != "" )
$3 = $4
print
}
' "$@"
The above achieves the same results.
Tried on GNU awk:
awk -F, -vOFS=, '/[0-9.]+/{if($4)$3=$4} {print}' file
Suppose I have 3 records :
P1||1234|
P1|56001||
P1|||NJ
I want to merge these 3 records into one with all the attributes. Final record :
P1|56001|1234|NJ
Is there any way to achieve this in Unix/Linux?
I assume you are asking for a solution with bash, awk, sed, etc.
You could try something like
$ cat test.txt
P1||1234|
P1|56001||
P1|||NJ
$ cat test.txt | awk -F'|' '{ for (i = 1; i <= NF; i++) print $i }' | egrep '.+' | sort | uniq | awk 'BEGIN{ c = "" } { printf c $0; c = "|" } END{ printf "\n" }'
1234|56001|NJ|P1
Briefly, the first awk splits the lines on the '|' separator and prints each field on its own line. egrep removes the empty lines. After that, sort and uniq remove duplicate attributes. Finally, the second awk joins the lines with the '|' separator.
Update:
If I understand correctly, here's what you are looking for:
$ cat test.txt | awk -F'|' '{ for (i = 1; i <= NF; i++) if($i) col[i]=$i } END{ for (i = 1; i <= length(col); i++) printf col[i] (i == length(col) ? "\n" : "|")}'
P1|56001|1234|NJ
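If the file can contain more than one key in the first column, the same idea can be extended to merge per key while keeping each attribute in its own column. The following is only a sketch under my own assumptions: the function name merge_records is invented, the first field is taken to be the key, a later non-empty value overwrites an earlier one, and keys are printed in order of first appearance.

```shell
# Hypothetical helper: merge rows that share the same first field, column by column.
merge_records() {
    awk -F'|' -v OFS='|' '
    {
        # Remember each key once, in order of first appearance.
        if (!($1 in seen)) { seen[$1] = 1; order[++n] = $1 }
        if (NF > maxnf) maxnf = NF
        # Keep every non-empty attribute under (key, column).
        for (i = 2; i <= NF; i++)
            if ($i != "") val[$1, i] = $i
    }
    END {
        for (k = 1; k <= n; k++) {
            key = order[k]
            line = key
            for (i = 2; i <= maxnf; i++)
                line = line OFS val[key, i]   # missing entries read as ""
            print line
        }
    }' "$@"
}

printf 'P1||1234|\nP1|56001||\nP1|||NJ\n' | merge_records
```

Comparing with != "" instead of testing if($i) means a value of 0 is kept rather than treated as empty.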
In your example, the 1st row has 1234 and the 2nd row has 56001.
I don't get why 56001 goes before 1234 in your final result. I assume it is a typo/mistake.
An awk one-liner could do the job:
awk -F'|' '{for(i=2;i<=NF;i++)if($i)a[$1]=(a[$1]?a[$1]"|":"")$i}END{print $1"|"a[$1]}'
with your data:
kent$ echo "P1||1234|
P1|56001||
P1||NJ"|awk -F'|' '{for(i=2;i<=NF;i++)if($i)a[$1]=(a[$1]?a[$1]"|":"")$i}END{print $1"|"a[$1]}'
P1|1234|56001|NJ
I need to compare two strings in alphabetical order, not only test them for equality. Is there a way to do string comparison in awk?
Sure it can:
pax$ echo 'hello
goodbye' | gawk '{if ($0 == "hello") {print "HELLO"}}'
HELLO
You can also do inequality (ordered) testing:
pax> printf 'aaa\naab\naac\naad\n' | gawk '{if ($1 < "aac"){print}}'
aaa
aab
You can do string comparison in awk using standard boolean operators, unlike in C where you would have to use strcmp().
echo "xxx yyy" > test.txt
cat test.txt | awk '$1!=$2 { print($1 $2); }'
You can check the answer in the nawk manual
echo aaa bbb | awk '{ print ($1 >= $2) ? "true" : "false" }'
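One caveat worth keeping in mind: when both input fields look like numbers, awk compares them numerically rather than alphabetically. Concatenating an empty string is a common way to force a string comparison. The function names cmp_default and cmp_as_string below are made up for this illustration.

```shell
# Both fields look numeric, so awk compares them as numbers: 10 < 9 is false.
cmp_default() { awk '{ print (($1 < $2) ? "less" : "not less") }'; }

# Appending "" turns each operand into a string, so "10" < "9" is true.
cmp_as_string() { awk '{ print ((($1 "") < ($2 "")) ? "less" : "not less") }'; }

echo '10 9' | cmp_default
echo '10 9' | cmp_as_string
```

The first call prints not less and the second prints less. For fields such as aaa and bbb, which do not look numeric, both versions compare as strings.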