Add/Sub/Mul/Div a constant to a column in a csv file in linux shell scripting - linux

I am trying to modify the contents of a particular column in a csv file by dividing it by a constant.
For example, if the contents are
1000,abc,0,1
2000,cde,2,3 and so on..
I would like to change it to
1,abc,0,1
2,cde,2,3
I went through all the previous solutions on this blog, and I tried this:
awk -F\; '{$1=($1/1000)}1' file.csv > tmp.csv && mv tmp.csv file.csv
The above command opens file.csv, performs $1/1000, saves the result to a temporary file, and then overwrites the original file.
The problem I see is that in the final file.csv the contents are as follows:
1
2
3
4 and so on ..
All the other columns are dropped; only column 1 is kept.
How can I fix this?

Because your file is comma-separated, you need to specify a comma as the field separator on both input and output:
$ awk -F, '{$1=($1/1000)}1' OFS=, file.csv
1,abc,0,1
2,cde,2,3
-F, tells awk to use a comma as the field separator on input.
OFS=, tells awk to use a comma as the field separator on output.
Changing the file in-place
With a modern GNU awk:
awk -i inplace -F, '{$1=($1/1000)}1' OFS=, file.csv
With BSD/OSX or other non-GNU awk:
awk -F, '{$1=($1/1000)}1' OFS=, file.csv >tmp && mv tmp file.csv
Alternate style
Some stylists prefer OFS to be set before the code:
awk -F, -v OFS=, '{$1=($1/1000)}1' file.csv
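The same idea covers the other operations in the title. A minimal sketch, assuming you still want to transform the first column and keep the rest of the row intact:
awk -F, -v OFS=, '{$1=$1+5}1' file.csv    # add a constant
awk -F, -v OFS=, '{$1=$1-5}1' file.csv    # subtract a constant
awk -F, -v OFS=, '{$1=$1*1000}1' file.csv # multiply by a constant
Redirect to a temporary file (or use -i inplace with GNU awk) as shown above to write the result back.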

Related

How do you change column names to lowercase with linux and store the file as it is?

I am trying to change the column names to lowercase in a csv file. I found the code to do that online, but I don't know how to replace the old column names (uppercase) with the new column names (lowercase) in the original file. I did something like this:
$ head -n1 xxx.csv | tr "[A-Z]" "[a-z]"
But it simply prints out the column names in lowercase, which is not enough for me.
I tried to add sed -i but it did not do any good. Thanks!!
Using awk (readability winner):
Concise way:
awk 'NR==1{print tolower($0);next}1' file.csv
or using ternary operator:
awk '{print (NR==1) ? tolower($0): $0}' file.csv
or using if/else statements:
awk '{if (NR==1) {print tolower($0)} else {print $0}}' file.csv
To change the file for real:
awk 'NR==1{print tolower($0);next}1' file.csv | tee /tmp/temp
mv /tmp/temp file.csv
For your information, sed with the in-place edit switch -i does the same: it uses a temporary file under the hood.
You can check this by using:
strace -f -s 800 sed -i'' '...' file
Using perl:
perl -i -pe '$_=lc() if $.==1' file.csv
It replaces the file in place with the -i switch.
You can use sed to tell it to replace the first line with all lower-case and then print the rest as-is:
sed '1s/.*/\L&/' ./xxx.csv
Redirect the output or use -i to do an in-place edit.
Proof of Concept
$ echo -e "COL1,COL2,COL3\nFoO,bAr,baZ" | sed '1s/.*/\L&/'
col1,col2,col3
FoO,bAr,baZ
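To change the header in the original file, the same substitution can be combined with in-place editing. A sketch assuming GNU sed, since both -i without a backup suffix and \L are GNU extensions:
sed -i '1s/.*/\L&/' ./xxx.csv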

Awk 3rd column if second column matches a variable

I am new to Awk and Linux. I want to print the 3rd column if the 2nd column matches a variable.
file.txt
1;XYZ;123
2;ABC;987
3;ZZZ;999
So I want to print 987, after checking whether the 2nd column is ABC. I tried:
name="ABC"
awk -F';' '$2==$name { print $3 }' file.txt
But this is not working. Please help. Note that I want to use awk only, to understand how this can be achieved with awk.
Do the following and it should work. In awk, variables don't work like they do in the shell; you have to pass them in explicitly with -v var_name in the awk command.
name="ABC"
awk -F';' -v name="$name" '$2==name{ print $3 }' file.txt
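As an alternative sketch (not required for the above), POSIX awk also exposes environment variables through the ENVIRON array, so the value can be passed via the environment instead of -v:
name="ABC"
name="$name" awk -F';' '$2==ENVIRON["name"]{ print $3 }' file.txt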

How does the shell generate input for awk

Say I have a file1 containing:
1,2,3,4
I can use awk to process that file like this;
awk -v FS="," '{print $1}' file1
Also I can invoke awk with a Here String, meaning I read from stdin:
awk -v FS="," '{print $1}' <<<"9,10,11,12"
Command 1 yields the result 1 and command 2 yields 9 as expected.
Now say I have a second file2:
4,5
If I parse both files with awk sequentially:
awk -v FS="," '{print $1}' file1 file2
I get:
1
4
as expected.
But if I mix reading from stdin and reading from files, the content I'm reading from stdin gets ignored and only the content in the files gets processed sequentially:
awk -v FS="," '{print $1}' file1 file2 <<<"9,10,11,12"
awk -v FS="," '{print $1}' file1 <<<"9,10,11,12" file2
awk -v FS="," '{print $1}' <<<"9,10,11,12" file1 file2
All three commands yield:
1
4
which means the content from stdin simply gets thrown away. Now what is the shell doing?
Interestingly if I change command 3 to:
awk -v FS="," '{print $1}' <<<"9,10,11,12",file1,file2
I simply get 9, which makes sense, as file1/2 are just two more fields from stdin. But why then is
awk -v FS="," '{print $1}' <<<"9,10,11,12" file1 file2
not expanded to
awk -v FS="," '{print $1}' <<<"9,10,11,12 file1 file2"
which would also yield the result 9?
And why does the content from stdin get ignored? The same question arises for commands 1 and 2. What is the shell doing here?
I tried out the commands on: GNU bash, version 4.2.53(1)-release
Standard input and input from files don't mix together well. This behavior is not exclusive to awk; you will find it in a lot of command-line applications. It is logical if you think of it like this:
Files need to be processed one by one. The consuming application does not have control over when the input behind STDIN starts and stops. Look at echo a,b,c | awk -F, '{print $1}' file1 file2. In what order do the incoming "files" need to be read? If you think about when FNR would need to be reset, or what FILENAME should be, it becomes clear that it is hard to get this right.
One trick you can play is to let awk (or any other program) read from a file descriptor generated by the shell. awk -F, '{print $1}' file1 <(echo 4,5,6) file2 will do what you expected in the first place.
What happens here is that a proper file descriptor is created with the <(...) syntax (say /proc/self/fd/11), and the reading program can treat it just like a file. It is the second argument, so it is the second file, and FNR and FILENAME are unambiguous.
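Another option is to pass - as a file name: awk, like many utilities, treats - in the file list as standard input, so you can place the here-string exactly where you want it read. A sketch using the files from the question:
awk -F, '{print $1}' file1 - file2 <<<"9,10,11,12"
This reads file1, then stdin, then file2, and prints 1, 9 and 4.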

Awk split file gives incomplete lines

My file is a csv file with comma-delimited fields.
I tried to split the file into multiple files by the first field. I did the following:
cat myfile.csv | awk -F',' '{print $0 > "Mydata"$1".csv"}'
It does split the file, but the output files are corrupted: the last line of each file is incomplete, and the breaking position seems random. Has anyone had the same problem?
These types of problems are invariably because you created your input file on Windows, so it has spurious control-Ms (carriage returns) at the end of the lines. Run dos2unix on your input file to clean it up, then re-run your awk command, but rewrite it as:
awk -F',' '{print > ("Mydata" $1 ".csv") }' myfile.csv
to solve a couple of unrelated problems: the unnecessary use of cat, and the unparenthesized expression on the right-hand side of >, which is not portable across awk implementations.
Use this awk command to ignore \r characters before \n:
awk -F ',' -v RS='\r\n' '{print > ("Mydata" $1 ".csv") }' myfile.csv
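Note that a multi-character RS is a gawk extension (POSIX only guarantees a single-character record separator). A more portable sketch strips the trailing \r explicitly:
awk -F ',' '{ sub(/\r$/, ""); print > ("Mydata" $1 ".csv") }' myfile.csv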
Just don't forget to close your files. Note that after a close(), a later print > f would truncate the file again, so append with >> instead:
awk -F ',' '{ f = "Mydata" $1 ".csv"; print $0 >> f; close(f) }' myfile.csv
Use a real CSV parser/generator instead. It's safe for unusual inputs, including those with multi-line values. Here's a one-liner in Ruby:
ruby -e 'require "csv";CSV.foreach(ARGV.shift){|r| File.open("Mydata#{r[0]}.csv","a"){|f| f.puts CSV.generate_line(r)}}' file.csv

How to reverse order of fields using AWK?

I have a file with the following layout:
123,01-08-2006
124,01-09-2007
125,01-10-2009
126,01-12-2010
How can I convert it into the following by using AWK?
123,2006-08-01
124,2007-09-01
125,2009-10-01
126,2010-12-01
Didn't read the question properly the first time. You need a field separator that can be either a dash or a comma. Once you have that you can use the dash as an output field separator (as it's the most common) and fake the comma using concatenation:
awk -F',|-' 'OFS="-" {print $1 "," $4,$3,$2}' file
Pure awk
awk -F"," '{ n=split($2,b,"-");$2=b[3]"-"b[2]"-"b[1];$i=$1","$2 } 1' file
sed
sed -r 's/(^.[^,]*,)([0-9]{2})-([0-9]{2})-([0-9]{4})/\1\4-\3-\2/' file
sed 's/\(^.[^,]*,\)\([0-9][0-9]\)-\([0-9][0-9]\)-\([0-9]\+\)/\1\4-\3-\2/' file
Bash
#!/bin/bash
while IFS="," read -r a b
do
IFS="-"
set -- $b
echo "$a,$3-$2-$1"
done <"file"
Unfortunately, I think standard awk only allows one field separator character, so you'll have to pre-process the data. You can do this with tr, but if you really want an awk-only solution, use:
pax> echo '123,01-08-2006
124,01-09-2007
125,01-10-2009
126,01-12-2010' | awk -F, '{print $1"-"$2}' | awk -F- '{print $1","$4"-"$3"-"$2}'
This outputs:
123,2006-08-01
124,2007-09-01
125,2009-10-01
126,2010-12-01
as desired.
The first awk changes the , characters to - so that you have four fields separated with the same character (this is the bit I'd usually use tr ',' '-' for).
The second awk prints them out in the order you specified, correcting the field separators at the same time.
If you're using an awk implementation that allows multiple FS characters, you can use something like:
gawk -F ',|-' '{print $1","$4"-"$3"-"$2}'
If it doesn't need to be awk, you could use Perl too:
$ perl -nle 'print "$1,$4-$3-$2" while (/(\d{3}),(\d{2})-(\d{2})-(\d{4})\s*/g)' < file.txt
