I have CSV data with two price columns. If a value exists in the $4 column, I want to copy it over the $3 column of the same row. If $4 is empty, then $3 should be left as-is.
Neither of these works:
awk -F',' '{ if (length($4) == 0) $3=$4 }'
awk -F',' '{ if(!length($4) == 0 ) print $4 }'
This outputs every line when run on the sample table:
awk -F',' '{ if(!length($4) == 0 ) print $0 }' inputfile
This outputs nothing when run on the sample table:
awk -F',' '{ if(length($4) == 0 ) print $3 }' inputfile
I've cleaned my two input files, fixed the header row, and joined them using sed, awk, sort, and join. Now what I am left with is a CSV which looks like this:
itemnumber,available,regprice,mapprice
00061,9,19.30,
00061030,31,2.87,3.19
00062,9,15.44,
00062410,2,3.59,3.99
00064,9,15.44,
00066850,29,2.87,3.99
00066871,49,4.19,5.99
00066878,3,5.63,7.99
I need to overwrite the $3 column if the $4 column in the same row has a value. The end result would be:
itemnumber,available,regprice,mapprice
00061,9,19.30,
00061030,31,3.19,3.19
00062,9,15.44,
00062410,2,3.99,3.99
00064,9,15.44,
00066850,29,3.99,3.99
00066871,49,5.99,5.99
00066878,3,7.99,7.99
$ awk 'BEGIN{FS=OFS=","} (NR>1) && ($4!=""){$3=$4} 1' file
itemnumber,available,regprice,mapprice
00061,9,19.30,
00061030,31,3.19,3.19
00062,9,15.44,
00062410,2,3.99,3.99
00064,9,15.44,
00066850,29,3.99,3.99
00066871,49,5.99,5.99
00066878,3,7.99,7.99
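If you want to update the file in place rather than write to stdout, GNU awk 4.1 or later has an inplace extension; a sketch, assuming gawk is available:
gawk -i inplace 'BEGIN{FS=OFS=","} (NR>1) && ($4!=""){$3=$4} 1' file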
Let's have a look at all the things you tried:
awk -F',' '{ if (length($4) == 0) $3=$4 }'
This states: if the length of field 4 is zero, then set field 3 equal to field 4. You do not ask awk to print anything, so it will not print anything. This would have printed something:
awk -F',' '{ if (length($4) == 0) $3=$4 }{print $0}'
but then all the output field separators would have been spaces, so you should have done:
awk 'BEGIN{FS=OFS=","}{ if (length($4) == 0) $3=$4 }{print $0}'
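A quick demo of the difference, using one of the sample rows and an unconditional assignment purely for illustration (the commas are lost in the first case because $0 is rebuilt with the default OFS, a space, as soon as a field is assigned):
$ echo '00061030,31,2.87,3.19' | awk -F',' '{ $3=$4 } {print $0}'
00061030 31 3.19 3.19
$ echo '00061030,31,2.87,3.19' | awk 'BEGIN{FS=OFS=","} { $3=$4 } {print $0}'
00061030,31,3.19,3.19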
awk -F',' '{ if(!length($4) == 0 ) print $4 }'
Here you state: if it is not true that the length of field 4 equals zero, print field 4.
As you mention that nothing is printed, it most likely indicates that you have hidden characters in field 4, such as a CR (See: Remove carriage return in Unix), or even just spaces. You could attempt something like
awk -F',' '{sub(/ *\r?$/,"")}{ if(!length($4) == 0 ) print $4 }'
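To check whether a stray CR is actually present, something like this can help (assuming GNU cat, which shows a carriage return as ^M just before the end-of-line marker $):
cat -A inputfile | head -3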
awk -F',' '{ if(!length($4) == 0 ) print $0 }' inputfile
See the discussion of your second attempt above.
awk -F',' '{ if(length($4) == 0 ) print $3 }' inputfile
This confirms my suspicion from the second attempt: field 4 is most likely never truly empty because of trailing hidden characters.
My solution for your problem would be based on that observation combined with the solution of Ed Morton:
awk 'BEGIN{FS=OFS=","} {sub(/ *\r?$/,"")} (NR>1) && ($4!=""){$3=$4} 1' file
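If the CR does turn out to be the culprit, another option is to strip it before awk ever sees the data, for example (a sketch assuming GNU sed):
sed 's/\r$//' file | awk 'BEGIN{FS=OFS=","} (NR>1) && ($4!=""){$3=$4} 1'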
Here's code that matches your results:
awk -F, -v OFS=, '
NR == 1
NR > 1 {
if ( $4 == "" )
print $1,$2,$3,$4
else
print $1,$2,$4,$4 }
' $*
I've run into trouble in the past with expressions like $3 = $4, so I just print out all of the fields.
Edit: I got shamed by Ed Morton for avoiding $3 = $4 without troubleshooting it. I gave it another shot below:
awk -F, -v OFS=, '
NR == 1
NR > 1 {
if ( $4 != "" )
$3 = $4
print
}
' $*
The above achieves the same results.
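Both snippets are meant to be saved as a small shell script, with $* passing the filename(s) through to awk. Assuming you save it as fixprices.sh (the name is just an example), add a #!/bin/sh first line, and make it executable, you would run it as:
./fixprices.sh file > fixed.csv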
Tried on GNU awk. The /[0-9.]+/ guard only runs the assignment on lines containing digits, so the header line is left untouched:
awk -F, -vOFS=, '/[0-9.]+/{if($4)$3=$4} {print}' file
I am using grep to read search strings from a file and awk to print out the sum of the columns based on the search result, using:
grep -f input data.txt |awk '{ sum+=$2} END {print sum}'
This gives me a single sum across all the input strings. How do I get the sum for each input string separately?
Sample input
a
b
c
Sample data.txt
a/cell1 5
b/cell1 5
a/cell2 8
c/cell1 10
Number of lines in input: ~32
Size of data.txt: 5 GB
Expected results:
a 13
b 5
c 5
$ awk 'NR==FNR{sum[$0]=0;next} $1 in sum{sum[$1]+=$2} END{for (key in sum) print key, sum[key]}' input data.txt
a 2
b 1
c 1
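Note that with the sample data.txt shown above, the first field is a path like a/cell1 rather than a bare key, so the key may need to be extracted before the lookup; a sketch along those lines (untested against your real data):
awk 'NR==FNR{sum[$0]=0; next} {split($1,p,"/")} p[1] in sum{sum[p[1]]+=$2} END{for (key in sum) print key, sum[key]}' input data.txt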
Hard to tell without seeing your files, but maybe:
grep -f input data.txt | \
awk '{sum[$1] += $2} END { for (key in sum) { print key, sum[key] } }'
The following avoids accumulating unnecessary entries, and therefore may circumvent the memory allocation error. It assumes the list of the strings of interest is in a file named input:
awk -v dict=input '
BEGIN {while((getline<dict) > 0) {a[$1]=1}}
a[$1] {sum[$1] += $2}
END { for (key in sum) { print key, sum[key] } }'
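It reads the large file as its regular input, so it could be invoked, for instance, like this (assuming the awk program above is saved in a file named sumbykey.awk; the name is just an example):
awk -v dict=input -f sumbykey.awk data.txt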
If this does not resolve the memory issue, then please give some details about your awk, OS, and anything else that may be relevant.
Is this fast enough running against your 5GB file?
awk 'NR == FNR {sum[$1]+=$2} NR != FNR {printf "%s %s\n", $1, sum[$1] }' file1 file2
Where file1 is the 5GB file and file2 is the file containing the strings you want to find in file1.
EDIT
As #EdMorton commented earlier, my solution will print blank for sum[$1] when $1 is not found.
In addition, #EdMorton provided an answer which will print 0 instead.
I suggest checking out his answer first, as it likely meets your needs better.
I am trying to find the minimum value from a file.
input.txt
1
2
4
5
6
4
This is the code that I am using:
awk '{sum += $1; min = min < $1 ? min : $1} !(FNR%6){print min;sum=min = ""}' input.txt
But it is not working. Can anybody see the error in my code?
Use the script below to find the min value in the txt file:
awk 'min=="" || $1 < min {min=$1} END {print min}' input.txt
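Run against the sample input.txt, this should print:
$ awk 'min=="" || $1 < min {min=$1} END {print min}' input.txt
1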
Set min to $1 on the first line:
awk 'NR == 1 {min = $1} {sum += $1; min = min < $1 ? min : $1} !(FNR%6){print min;sum=min = ""}' input.txt
output:
1
Note that sum isn't used; you could simplify to this:
awk 'NR == 1 {min = $1;} {min = min < $1 ? min : $1} !(FNR%6){print min;}' input.txt
To allow any number of lines:
awk 'NR == 1 {min = $1;} {min = min < $1 ? min : $1} END{print min;}' input.txt
Suppose I have 3 records:
P1||1234|
P1|56001||
P1|||NJ
I want to merge these 3 records into one with all the attributes. The final record:
P1|56001|1234|NJ
Is there any way to achieve this in Unix/Linux?
I assume you are asking for a solution using bash, awk, sed, etc.
You could try something like:
$ cat test.txt
P1||1234|
P1|56001||
P1|||NJ
$ cat test.txt | awk -F'|' '{ for (i = 1; i <= NF; i++) print $i }' | egrep '.+' | sort | uniq | awk 'BEGIN{ c = "" } { printf c $0; c = "|" } END{ printf "\n" }'
1234|56001|NJ|P1
Briefly: awk splits the lines on the '|' separator and prints each field on its own line, egrep removes the empty lines, sort and uniq remove the duplicate attributes, and finally awk merges the lines back together with the '|' separator.
Update:
If I understand correctly, here's what you're looking for:
$ cat test.txt | awk -F'|' '{ for (i = 1; i <= NF; i++) if($i) col[i]=$i } END{ for (i = 1; i <= length(col); i++) printf col[i] (i == length(col) ? "\n" : "|")}'
P1|56001|1234|NJ
In your example, the 1st row has 1234 and the 2nd row has 56001. I don't see why 56001 comes before 1234 in your final result; I assume it is a typo/mistake.
An awk one-liner could do the job:
awk -F'|' '{for(i=2;i<=NF;i++)if($i)a[$1]=(a[$1]?a[$1]"|":"")$i}END{print $1"|"a[$1]}'
with your data:
kent$ echo "P1||1234|
P1|56001||
P1|||NJ"|awk -F'|' '{for(i=2;i<=NF;i++)if($i)a[$1]=(a[$1]?a[$1]"|":"")$i}END{print $1"|"a[$1]}'
P1|1234|56001|NJ
Can a user input variable ($userinput) be compared with a value?
awk -F: '$1 < $userinput { printf .... }'
This comparison expression seems OK to me, but it gives an error.
Try doing this:
awk -vuserinput="$userinput" -F: '$1 < userinput {}'
A real example :
read -p "Give me an integer >>> " int
awk -v input=$int '$1 < input {print $1, "is less than", input}' <<< 1
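For example, entering 5 at the prompt should give:
Give me an integer >>> 5
1 is less than 5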