awk to print some parameters of a line - linux

I have lines in a file in linux, and i am trying print the line without the | and without some parameters
$cat file
2013-07-15,Provider 1.99,3|30000055|2347|0,12222,1,3,0,0,0,19,aaa,bbb
2013-07-15,Provider 1.99,3|30000055|2347|0,12222,44,12,0,0,0,33,aaa,bbb
and i need the output like:
2013-07-15,Provider,2347,12222,1,3,0,0,0,19,aaa,bbb
2013-07-15,Provider,2347,12222,44,12,0,0,0,33,aaa,bbb
and i am trying with awk, but i have some problems.

If your lines have similar pattern you would to retain then you can do:
awk 'BEGIN{FS=OFS=","}{$2="Provider";$3=2347}1' file
If you don't know what the patterns are then here is a more generic one:
awk 'BEGIN{FS=OFS=","}{split($2,a,/ /);split($3,b,/\|/);$2=a[1];$3=b[3]}1' file
If it doesn't solve your problem, I am pretty sure it would help you guide to get one.

Using sed:
sed 's/ [^|]*|[^|]*|\([^|]*\)|[^,]/,\1/' input
and some shorter version:
sed 's/ .*|\([^|]*\)|[^,]*/,\1/' input
and even shorter:
sed 's/ .*|\(.*\)|[^,]*/,\1/' input

Use awk, and let blank or comma or pipe be the field separators:
awk -F '[[:blank:],|]' -v OFS=, '{
print $1,$2,$6,$8,$9,$10,$11,$12,$13,$14,$15,$16
}' file
2013-07-15,Provider,2347,12222,1,3,0,0,0,19,aaa,bbb
2013-07-15,Provider,2347,12222,44,12,0,0,0,33,aaa,bbb

Related

How do you change column names to lowercase with linux and store the file as it is?

I am trying to change the column names to lowercase in a csv file. I found the code to do that online but I dont know how to replace the old column names(uppercase) with new column names(lowercase) in the original file. I did something like this:
$cat head -n1 xxx.csv | tr "[A-Z]" "[a-z]"
But it simply just prints out the column names in lowercase, which is not enough for me.
I tried to add sed -i but it did not do any good. Thanks!!
Using awk (readability winner) :
concise way:
awk 'NR==1{print tolower($0);next}1' file.csv
or using ternary operator:
awk '{print (NR==1) ? tolower($0): $0}' file.csv
or using if/else statements:
awk '{if (NR==1) {print tolower($0)} else {print $0}}' file.csv
To change the file for real:
awk 'NR==1{print tolower($0);next}1' file.csv | tee /tmp/temp
mv /tmp/temp file.csv
For your information, sed using the in place edit switch -i do the same: it use a temporary file under the hood.
You can check this by using :
strace -f -s 800 sed -i'' '...' file
Using perl:
perl -i -pe '$_=lc() if $.==1' file.csv
It replace the file on the fly with -i switch
You can use sed to tell it to replace the first line with all lower-case and then print the rest as-is:
sed '1s/.*/\L&/' ./xxx.csv
Redirect the output or use -i to do an in-place edit.
Proof of Concept
$ echo -e "COL1,COL2,COL3\nFoO,bAr,baZ" | sed '1s/.*/\L&/'
col1,col2,col3
FoO,bAr,baZ

Print between special characters with sed,grep

I need to print the string between these characters....
atob(' ')
I am using a = in the second part as an attempt to stop the code on an equal signs (which the base64 string I'm trying to get ends in.)
I use this script, but it prints the entire line containing the above characters. I need just the data in between.
sed -n '/atob/,${p;/==/q;}'
I appreciate any help. Thank you.
Does this work (tested for GNU sed 4.2.2)?
 sed -n -e "s/atop('\(.*\)')/\1/p" b.txt
where b.txt is
atop('safdasdfasf')
or you can try awk
awk -F\' '/atop/ {print $2}' b.txt
(tested for gnu awk 4.0.2 and added the suggestion by Jotne)
And another working sed:
echo "atop('safdasdfasf')" | sed -r "/atop/ s/^[^']+'([^']+)'.*/\1/"
safdasdfasf

How to replace one or more consecutive symbols with one symbol in shell

I have a file containing consecutive symbols (as pipe "|") like
ANKRD54,LIAR,allergy,|||
ANKRD54,LIAR,asthma,||20447076||
ANKRD54,LIAR,autism,||||
ANKRD54,LIAR,cancer,|||
ANKRD54,LIAR,chronic_obstructive_pulmonary_disease,|||
ANKRD54,LIAR,dental_caries,||||
Now using shell or a sed command in shell is it possible to replace multiple pipe with one pipe like
ANKRD54,LIAR,allergy,|
ANKRD54,LIAR,asthma,|20447076|
ANKRD54,LIAR,autism,|
ANKRD54,LIAR,cancer,|
ANKRD54,LIAR,chronic_obstructive_pulmonary_disease,|
ANKRD54,LIAR,dental_caries,|
I guess the easiest way is use built-in commands: cat your_file | tr -s '|'
Pass your text to sed (e.g. via a pipe)
cat your_file | sed "s/|\+/|/g"
You can do that with a simple awk gsub as:-
awk -F"," -v OFS="," '{gsub(/[|]+/,"|",$4)}1' file
See it in action:-
$ cat file
ANKRD54,LIAR,allergy,|||
ANKRD54,LIAR,asthma,||20447076||
ANKRD54,LIAR,autism,||||
ANKRD54,LIAR,cancer,|||
ANKRD54,LIAR,chronic_obstructive_pulmonary_disease,|||
ANKRD54,LIAR,dental_caries,||||
$ awk -F"," -v OFS="," '{gsub(/[|]+/,"|",$4)}1' file
NKRD54,LIAR,allergy,|
ANKRD54,LIAR,asthma,|20447076|
ANKRD54,LIAR,autism,|
ANKRD54,LIAR,cancer,|
ANKRD54,LIAR,chronic_obstructive_pulmonary_disease,|
ANKRD54,LIAR,dental_caries,|

How to reverse order of fields using AWK?

I have a file with the following layout:
123,01-08-2006
124,01-09-2007
125,01-10-2009
126,01-12-2010
How can I convert it into the following by using AWK?
123,2006-08-01
124,2007-09-01
125,2009-10-01
126,2009-12-01
Didn't read the question properly the first time. You need a field separator that can be either a dash or a comma. Once you have that you can use the dash as an output field separator (as it's the most common) and fake the comma using concatenation:
awk -F',|-' 'OFS="-" {print $1 "," $4,$3,$2}' file
Pure awk
awk -F"," '{ n=split($2,b,"-");$2=b[3]"-"b[2]"-"b[1];$i=$1","$2 } 1' file
sed
sed -r 's/(^.[^,]*,)([0-9]{2})-([0-9]{2})-([0-9]{4})/\1\4-\3-\2/' file
sed 's/\(^.[^,]*,\)\([0-9][0-9]\)-\([0-9][0-9]\)-\([0-9]\+\)/\1\4-\3-\2/' file
Bash
#!/bin/bash
while IFS="," read -r a b
do
IFS="-"
set -- $b
echo "$a,$3-$2-$1"
done <"file"
Unfortunately, I think standard awk only allows one field separator character so you'll have to pre-process the data. You can do this with tr but if you really want an awk-only solution, use:
pax> echo '123,01-08-2006
124,01-09-2007
125,01-10-2009
126,01-12-2010' | awk -F, '{print $1"-"$2}' | awk -F- '{print $1","$4"-"$3"-"$2}'
This outputs:
123,2006-08-01
124,2007-09-01
125,2009-10-01
126,2010-12-01
as desired.
The first awk changes the , characters to - so that you have four fields separated with the same character (this is the bit I'd usually use tr ',' '-' for).
The second awk prints them out in the order you specified, correcting the field separators at the same time.
If you're using an awk implementation that allows multiple FS characters, you can use something like:
gawk -F ',|-' '{print $1","$4"-"$3"-"$2}'
If it doesn't need to be awk, you could use Perl too:
$ perl -nle 'print "$1,$4-$3-$2" while (/(\d{3}),(\d{2})-(\d{2})-(\d{4})\s*/g)' < file.txt

in place editing using awk

I want to add a line at top of file say f1 using awk.
Is there a better way than the following?
awk 'BEGIN{print "word"};{print $0}' f1 > aux;cp aux f1;\rm aux<br/>
Does awk has something like -i option in sed?
Why not use sed - it would make the solution more straightforward
$sed -i.bak '1i\
word
' <filename>
An alternate way to do this is:
sed -i '1s:^: Word1\nWord2 :' file

Resources