I have CSV data with two price columns. If a value exists in the $4 column I want to copy it over the $3 column of the same row. If $4 is empty then $3 should be left as is.
Neither of these work:
awk -F',' '{ if (length($4) == 0) $3=$4 }'
awk -F',' '{ if(!length($4) == 0 ) print $4 }'
This will output every line with the sample table
awk -F',' '{ if(!length($4) == 0 ) print $0 }' inputfile
This will output nothing with the sample table
awk -F',' '{ if(length($4) == 0 ) print $3 }' inputfile
I've cleaned my two input files, fixed the header row, and joined them using sed, awk, sort, and join. Now what I am left with is a CSV which looks like this:
itemnumber,available,regprice,mapprice
00061,9,19.30,
00061030,31,2.87,3.19
00062,9,15.44,
00062410,2,3.59,3.99
00064,9,15.44,
00066850,29,2.87,3.99
00066871,49,4.19,5.99
00066878,3,5.63,7.99
I need to overwrite the $3 column if the $4 column in the same row has a value. The end result would be:
itemnumber,available,regprice,mapprice
00061,9,19.30,
00061030,31,3.19,3.19
00062,9,15.44,
00062410,2,3.99,3.99
00064,9,15.44,
00066850,29,3.99,3.99
00066871,49,5.99,5.99
00066878,3,7.99,7.99
$ awk 'BEGIN{FS=OFS=","} (NR>1) && ($4!=""){$3=$4} 1' file
itemnumber,available,regprice,mapprice
00061,9,19.30,
00061030,31,3.19,3.19
00062,9,15.44,
00062410,2,3.99,3.99
00064,9,15.44,
00066850,29,3.99,3.99
00066871,49,5.99,5.99
00066878,3,7.99,7.99
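The trailing 1 in that one-liner is an always-true pattern whose default action is print, so every record, modified or not, is emitted. A quick sanity check on a couple of the sample rows:

```shell
# The trailing 1 is an always-true pattern; its default action is print,
# so both modified and untouched rows are emitted.
printf '%s\n' 'itemnumber,available,regprice,mapprice' \
              '00061,9,19.30,' \
              '00061030,31,2.87,3.19' |
awk 'BEGIN{FS=OFS=","} (NR>1) && ($4!=""){$3=$4} 1'
# prints:
# itemnumber,available,regprice,mapprice
# 00061,9,19.30,
# 00061030,31,3.19,3.19
```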
Let's have a look at all the things you tried:
awk -F',' '{ if (length($4) == 0) $3=$4 }'
This states: if the length of field 4 is zero, then set field 3 equal to field 4 (note that this condition is inverted; you want to copy field 4 only when it is not empty). You also never ask awk to print anything, so it produces no output. This would have printed something:
awk -F',' '{ if (length($4) == 0) $3=$4 }{print $0}'
but the output field separators would then default to a space, so you should have done:
awk 'BEGIN{FS=OFS=","}{ if (length($4) == 0) $3=$4 }{print $0}'
awk -F',' '{ if(!length($4) == 0 ) print $4 }'
Here you state: if it is not true that the length of field 4 equals zero, print field 4.
As you mention that nothing is printed, it most likely indicates that you have hidden characters in field 4, such as a CR (See: Remove carriage return in Unix), or even just spaces. You could attempt something like
awk -F',' '{sub(/ *\r?$/,""); if(!length($4) == 0 ) print $4 }'
awk -F',' '{ if(!length($4) == 0 ) print $0 }' inputfile
See point 2 above.
awk -F',' '{ if(length($4) == 0 ) print $3 }' inputfile
This confirms my suspicion from point 2.
My solution for your problem would be based on the suggestion of 2 and the solution of Ed Morton.
awk 'BEGIN{FS=OFS=","} {sub(/ *\r?$/,"")} (NR>1) && ($4!=""){$3=$4} 1' file
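A minimal sketch of the carriage-return problem: with DOS line endings the last field silently contains a \r, so length($4) is never zero even when the field looks empty, and the sub() strips it (the sample row is illustrative):

```shell
# With a CRLF line ending, $4 is "\r" rather than empty.
printf '00061,9,19.30,\r\n' | awk -F',' '{ print length($4) }'
# prints: 1

# Stripping trailing spaces and the CR re-splits $0, and $4 becomes empty.
printf '00061,9,19.30,\r\n' | awk -F',' '{ sub(/ *\r?$/, ""); print length($4) }'
# prints: 0
```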
Here's code that matches your results:
awk -F, -v OFS=, '
NR == 1
NR > 1 {
if ( $4 == "" )
print $1,$2,$3,$4
else
print $1,$2,$4,$4 }
' $*
I've run into trouble in the past with expressions like $3 = $4, so I just print out all of the fields.
Edit: I got shamed by Ed Morton for avoiding $3 = $4 without troubleshooting, so I gave it another shot below:
awk -F, -v OFS=, '
NR == 1
NR > 1 {
if ( $4 != "" )
$3 = $4
print
}
' $*
The above achieves the same results.
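A quick run of the second variant on a couple of sample rows (piping instead of a file, purely for illustration):

```shell
# NR == 1 passes the header through; later rows copy $4 over $3 when non-empty.
printf '%s\n' 'itemnumber,available,regprice,mapprice' '00066871,49,4.19,5.99' |
awk -F, -v OFS=, 'NR == 1; NR > 1 { if ($4 != "") $3 = $4; print }'
# prints:
# itemnumber,available,regprice,mapprice
# 00066871,49,5.99,5.99
```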
Tried on GNU awk:
awk -F, -vOFS=, '/[0-9.]+/{if($4)$3=$4} {print}' file
Related
I have a large tsv.gz file (40 GB) from which I want to extract a string from an existing column, col3, store it in a new column, New_var (placed at the beginning), and save everything to a new file.
An example of the data, "old_file.tsv.gz":
col1 col2 col3 col4
1 positive 12:1234A 100
2 negative 10:9638B 110
3 positive 5:0987A 100
4 positive 8:5678A 170
Desired data, "new_file.tsv.gz":
New_var col1 col2 col3 col4
12 1 positive 12:1234A 100
10 2 negative 10:9638B 110
5 3 positive 5:0987A 100
8 4 positive 8:5678A 170
I am new to bash, so I have tried multiple things but I get stuck. I have tried:
zcat old_file.tsv.gz | awk '{print New_var=$3,$0 }' | awk '$1 ~ /^[0-9]:/{print $0 | (gzip -c > new_file.tsv.gz) }'
I think I have multiple problems. { print New_var=$3,$0 } does create a duplicate of col3, but it doesn't rename it. Then when I add the last part of the code, awk '$1 ~ /^[0-9]:/{print $0 | (gzip -c > new_file.tsv.gz) }', nothing comes up (I looked for a missing parenthesis but cannot find the problem).
Also I am not sure if this way is the best way to do it.
Any idea how to make it work?
Make an AWK script in a separate file (for readability), say 1.awk:
BEGIN { OFS = "\t" }       # keep the output tab-separated, like the input
{ if (NR > 1) {
    # all data lines: field before the ":" first, then the original fields
    split($3, a, ":");
    print a[1], $1, $2, $3, $4;
  } else {
    # header line
    print "New_var", $1, $2, $3, $4;
  }
}
Now process the input (say 1.csv.gz) with the AWK file:
zcat 1.csv.gz | awk -f 1.awk | gzip -c > 1_new.csv.gz
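For a quick check without gzip in the way, the same logic can be run inline on a couple of sample rows (tab separators assumed, as in the example data):

```shell
# Header gets a New_var column name; data rows get the part of $3 before ':'.
printf 'col1\tcol2\tcol3\tcol4\n1\tpositive\t12:1234A\t100\n' |
awk 'BEGIN { OFS = "\t" }
     NR > 1 { split($3, a, ":"); print a[1], $1, $2, $3, $4; next }
     { print "New_var", $1, $2, $3, $4 }'
```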
I suggest using both tab (\t) and : as input field separators:
awk 'BEGIN { FS="[\t:]"; OFS="\t" }
NR==1 { $1="New_var" OFS $1 }
NR>1 { $0=$3 OFS $0 }
{ print }'
As one line:
awk 'BEGIN{ FS="[\t:]"; OFS="\t" } NR==1{ $1="New_var" OFS $1 } NR>1{ $0=$3 OFS $0 } { print }'
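Applied to the header and one data row of the sample, for example:

```shell
# FS="[\t:]" splits on tabs AND colons, so $3 of a data row is the part
# before the ':'; assigning to $1 or $0 rebuilds the record with OFS tabs.
printf 'col1\tcol2\tcol3\tcol4\n1\tpositive\t12:1234A\t100\n' |
awk 'BEGIN{ FS="[\t:]"; OFS="\t" } NR==1{ $1="New_var" OFS $1 } NR>1{ $0=$3 OFS $0 } { print }'
# prints (tab-separated):
# New_var  col1  col2      col3      col4
# 12       1     positive  12:1234A  100
```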
See: 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
Hi, I have a simple AWK script for combining two files.
awk -v OFS='\t' '
FNR == NR { a[$1] = $2 OFS $10 OFS $11 OFS $13; next }
{ $1 = $1 }
FNR != 1 { print $0, a[$1] }
' $2 $1 > $3
One column in file $1 contains the string 'Not Perfect'.
After combining the two files, this value becomes tab-delimited,
like 'Not\tPerfect'.
Does anyone have a good idea why this is happening?
You've set the output separator character OFS to '\t' so any place where you print two things separated by a comma, such as:
print $0, a[$1]
You'll get:
<contents of $0 i.e. the whole input line>\t<the '$1'th value of 'a'>
Also note that the { $1 = $1 } line forces awk to rebuild $0 using OFS, which is what turns the space inside 'Not Perfect' into a tab. So either set OFS to a space, or to whatever you want, using:
OFS=' '
or just use printf instead to avoid implicit use of OFS like:
printf("%s %s\n", $0, a[$1])
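The rebuild behavior behind this is easy to demonstrate: assigning to any field (even $1 = $1) makes awk reassemble $0 with OFS between all fields:

```shell
# The space between the two input fields is replaced by OFS ('\t') on rebuild.
echo 'Not Perfect' | awk -v OFS='\t' '{ $1 = $1; print }'
```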
Can a user input variable ($userinput) be compared with a value?
awk -F: '$1 < $userinput { printf .... }'
This comparison expression seems OK to me, but it gives an error.
Try doing this:
awk -vuserinput="$userinput" -F: '$1 < userinput {}'
A real example:
read -p "Give me an integer >>> " int
awk -v input="$int" '$1 < input {print $1, "is less than", input}' <<< 1
I have a nawk command as shown.
Can anyone explain what this command is supposed to do?
set sampFile = $cur_dir/${qtr}.SAMP
nawk -F "," '{OFS=","; if (($4 == "0000" || $4 == "00000000")) {print $0} }' $samp_input_file >! $sampFile
Given a CSV file pointed to by the variable $samp_input_file, this command will print the lines where the 4th field is either 0000 or 00000000 and store the output in the file pointed to by $sampFile (the >! redirection is csh syntax that overwrites the target even when noclobber is set).
1,2,3,00
2,2,3,0000
3,2,3,000
4,2,3,00000000
5,2,3,0000
# Cleaner version (FS must be set in BEGIN, before the first record is split)
awk 'BEGIN{FS=OFS=","} { if ($4 == "0000" || $4 == "00000000") print }' file
2,2,3,0000
4,2,3,00000000
5,2,3,0000
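One awk subtlety worth noting here: FS assigned inside the main action only takes effect for records read afterwards, so the first line is still split on whitespace. A small sketch of the pitfall:

```shell
# The first record is split before the action runs, so FS="," arrives too
# late for it; NF is 1 for line one and 4 from line two onward.
printf '2,2,3,0000\n4,2,3,00000000\n' | awk '{ FS = ","; print NF }'
# prints:
# 1
# 4
```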
I need to select for example:
Column 2 of row 7.
Column 3 of row 8.
Columns 1 and 3 of row 11.
of a specific file and place the result in another file.
This is what I have tried so far:
sed -n -e '7p' -e '8p' -e '11p' Old_File | awk '{printf("%s %s %s\n", $2, $3, $1);}' > New_File
Awk alone can get it done:
awk 'NR==7{print $2} NR==8{print $3} NR==11{print $1, $3}' Old_file > New_file
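A quick check with generated input (the a/b/c field names are purely illustrative):

```shell
# Line i holds fields "a<i> b<i> c<i>"; pick the requested cells.
seq 12 | awk '{ print "a"$1, "b"$1, "c"$1 }' |
awk 'NR==7{print $2} NR==8{print $3} NR==11{print $1, $3}'
# prints:
# b7
# c8
# a11 c11
```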
Awk is enough for that task:
awk '
FNR == 7 { print $2; next; }
FNR == 8 { print $3; next; }
FNR == 11 { print $1, $3; exit; }
' input-file