Awk script to put value in column basis on another column value - linux

I am trying to use below script to replace column values.
But below data is huge and have around 33000 rows.
so when i run the script i get error "Argument list too long"
Please let me know other way to do it..
if($33="100000000"){$36="EA"}
if($33="100000001"){$36="EA"}
if($33="100000002"){$36="EA"}
if($33="100000003"){$36="EA"}
if($33="100000004"){$36="EA"}
if($33="100000005"){$36="EA"}
if($33="100000006"){$36="EA"}
if($33="100000007"){$36="EA"}
if($33="100000008"){$36="EA"}
if($33="100000009"){$36="EA"}
if($33="100000010"){$36="EA"}
if($33="100000011"){$36="EA"}
if($33="100000012"){$36="EA"}
if($33="100000013"){$36="EA"}
if($33="100000014"){$36="EA"}
if($33="100000015"){$36="EA"}
if($33="100000016"){$36="EA"}
if($33="100000017"){$36="EA"}
if($33="100000018"){$36="EA"}
if($33="100000019"){$36="EA"}
if($33="100000020"){$36="EA"}
sample input file
SourceIifier|SourleName|GntCode|Dision|Suvision|ProfitCe1|Profie2|Plade|Retuiod|SuppliN|DocType|Suppe|Docummber|Docte|Originer|OrigDate|CRDST|LineNumber|CustoN|UINorComposition|OriginaN|Custoame|Custoe|BillTe|Shite|POS|PortCode|ShippingBillNumber|ShippingBillDate|FOB|ExportDuty|HSNorSAC|ProductCode|ProductDescription|Categorduct|UnitOement|Quantity|Taxabue|Integratede|Integratount|Centraate|CentralTt|StaURate|StateUTTaxAmount|CessRateAdvalorem|CessAmountAdvalorem|CessRateSpecific|CessAmountSpecific|Invoalue|ReverseChargeFlag|TCSFlag|eComGSTIN|ITCFlag|ReasonForCreditDebitNote|AccountingVoucmber|Accountinate|Userdefinedfield1|Userdefinedfield2|Userdefinedfield3|Additionalfield1|Additionalfield2|Additlfield3|Additionalfield4|Additionalfield5
SAP|SAP_OSR_INV|||||||date+%m%Y|08AAACT2T|IN|EXPWT|262881626|02.02.2018||||10||||TVVAHALI|1151040011|8|8|8||||||9984|EVD0|EVDCOCOaterial|||0|8.47|0|0|9|0.76|9|0.76|||||||||||1301312397||ZEVD|1210||||||0
SAP|SAP_OSR_INV|||||||date+%m%Y|08AAACZT|IV|EXPWT|2627|02.02.2018||||10||||TVVHALI|1151040011|8|8|8||||||9984|EVD0|EVDCOAMaterial|||0|8.47|0|0|9|0.76|9|0.76|||||||||||130139||ZEVD|1210||||||0
SAP|SAP_OSR_INV|||||||date+%m%Y|08AAAZT|NV|AN|2628|02.02.2018||||20||||TVHVAISHALI|1151040011|8|8|8||||||9984|EVD0|EVDCOCOCDMAMaterial|||0|8.47|0|0|9|0.76|9|0.76|||||||||||13014||ZEVD|1210||||||0
My code :
awk -F"|" -v OFS="|" '{
if($33="100000000"){$36="EA"}
if($33="100000001"){$36="EA"}
if($33="100000002"){$36="EA"}
if($33="100000003"){$36="EA"}
if($33="100000004"){$36="EA"}
if($33="100000005"){$36="EA"}
if($33="100000006"){$36="EA"}
if($33="100000007"){$36="EA"}
if($33="100000008"){$36="EA"}
if($33="100000009"){$36="EA"}
if($33="100000010"){$36="EA"}1' inputfile > outputfile
Here the above code is just sample but in actual has around 33000 rows.
below is Sample Awk code..
BEGIN {
FS="|";
OFS="|";
}
{
if($33="100000000"){$36="EA"}
if($33="100000001"){$36="EA"}
if($33="100000002"){$36="EA"}
if($33="100000003"){$36="EA"}
if($33="100000004"){$36="EA"}
if($33="100000005"){$36="EA"}1 inputfile > outputfile
and called it like below
awk -f script.awk
Below is the error by calling awk script.
awk: fpostp.awk:33445: if($36=="M") {$36="MTR"}} TFinaloutputp7_6_3_d_OYMNC_w.csv > TFinaloutputt_w36.csv
awk: fpostp.awk:33445: ^ syntax error
awk: fpostp.awk:33445: if($36=="M") {$36="MTR"}} TFinaloutputp7_6_3_d_OYMNC_w.csv > TFinaloutputt_w36.csv
awk: fpostp.awk:33445: ^ syntax error
Can't i redirect output in someother file when executing by awk -f script.awk

Programming is not like that, if writing it could get boring there is bound to be another way.
awk -F"|" -v OFS="|" '$33>=100 && $33<200{$36="EA";print} $33>=200 && $33<300{$36="FB";print}' inputfile > outputfile
First awk is a pattern matching language, on rows the pattern outside the curly braces matches, it does what is inside the curly braces.
no need for the if syntax as it is inherent.
The patterns can be compound and awk knows what numbers are without being told (and does math arbitrarily well).
I shortened the values in $33 and made up what and where $36 becomes
bit in general make a statement per change in $36 for ranges of $36
If that is not your goal the question will need some refining.
Edit:
maybe you are masking $36 to a constant based on a arbitrary condition involving
$33 which only you know and there are lots of them ... in a file somewhere.
(I am pretending you have the list isolated in a file named filter.list )
so maybe something like
awk 'FNR==NR{filter[$1]++}FNR!=NR && $33 in filter {$36="EA"}' filter.list inputfile > outputfile
FNR is the File's number row and NR is the overall scripts number row
they are ony equal for the first file,
so using it here to treat the first file differently from the second.

Related

AWK how to process multiple files and comparing them IN CONTROL FILE! (not command line one-liner)

I read all of answers for similar problems but they are not working for me because my files are not uniformal, they contain several control headers and in such case is safer to create script than one-liner and all the answers focused on one-liners. In theory one-liners commands should be convertible to script but I am struggling to achieve:
printing the control headers
print only the records started with 16 in <file 1> where value of column 2 NOT EXISTS in column 2 of the <file 2>
I end up with this:
BEGIN {
FS="\x01";
OFS="\x01";
RS="\x02\n";
ORS="\x02\n";
file1=ARGV[1];
file2=ARGV[2];
count=0;
}
/^#/ {
print;
count++;
}
# reset counters after control headers
NR=1;
FNR=1;
# Below gives syntax error
/^16/ AND NR==FNR {
a[$2];next; 'FNR==1 || !$2 in a' file1 file2
}
END {
}
Googling only gives me results for command line processing and documentation is also silent in that regard. Does it mean it cannot be done?
Perhaps try:
script.awk:
BEGIN {
OFS = FS = "\x01"
ORS = RS = "\x02\n"
}
NR==FNR {
if (/^16/) a[$2]
next
}
/^16/ && !($2 in a) || /^#/
Note the parentheses: !$2 in a would be parsed as (!$2) in a
Invoke with:
awk -f script.awk FILE2 FILE1
Note order of FILE1 / FILE2 is reversed; FILE2 must be read first to pre-populate the lookup table.
First of all, short answer to my question should be "NOT POSSIBLE", if anyone read question carefully and knew AWK in full that is obvious answer, I wish I knew it sooner instead of wasting few days trying to write script.
Also, there is no such thing as minimal reproducible example (this was always constant pain on TeX groups) - I need full example working, if it works on 1 row there is no guarantee if it works on 2 rows and my number of rows is ~ 127 mln.
If you read code carefully than you would know what is not working - I put in comment section what is giving syntax error. Anyway, as #Daweo suggested there is no way to use logic operator in pattern section. So because we don't need printing in first file the whole trick is to do conditional in second brackets:
awk -F, 'BEGIN{} NR==FNR{a[$1];next} !($1 in a) { if (/^16/) print $0} ' set1.txt set2.txt
assuming in above example that separator is comma. I don't know where assumption about multiple RS support only in gnu awk came from. On MacOS BSD awk it works exactly the same, but in fact RS="\x02\n" is single separator not two separators.

Using awk to delete multiple lines using argument passed on via function

My input.csv file is semicolon separated, with the first line being a header for attributes. The first column contains customer numbers. The function is being called through a script that I activate from the terminal.
I want to delete all lines containing the customer numbers that are entered as arguments for the script. EDIT: And then export the file as a different file, while keeping the original intact.
bash deleteCustomers.sh 1 3 5
Currently only the last argument is filtered from the csv file. I understand that this is happening because the output file gets overwritten each time the loop runs, restoring all previously deleted arguments.
How can I match all the lines to be deleted, and then delete them (or print everything BUT those lines), and then output it to one file containing ALL edits?
delete_customers () {
echo "These customers will be deleted: "$#""
for i in "$#";
do
awk -F ";" -v customerNR=$i -v input="$inputFile" '($1 != customerNR) NR > 1 { print }' "input.csv" > output.csv
done
}
delete_customers "$#"
Here's some sample input (first piece of code is the first line in the csv file). In the output CSV file I want the same formatting, with the lines for some customers completely deleted.
Klantnummer;Nationaliteit;Geslacht;Title;Voornaam;MiddleInitial;Achternaam;Adres;Stad;Provincie;Provincie-voluit;Postcode;Land;Land-voluit;email;gebruikersnaam;wachtwoord;Collectief ;label;ingangsdatum;pakket;aanvullende verzekering;status;saldo;geboortedatum
1;Dutch;female;Ms.;Josanne;S;van der Rijst;Bliek 189;Hellevoetsluis;ZH;Zuid-Holland;3225 XC;NL;Netherlands;JosannevanderRijst#dayrep.com;Sourawaspen;Lae0phaxee;Klant;CZ;11-7-2010;best;tand1;verleden;-137;30-12-1995
2;Dutch;female;Mrs.;Inci;K;du Bois;Castorweg 173;Hengelo;OV;Overijssel;7557 KL;NL;Netherlands;InciduBois#gustr.com;Hisfireeness;jee0zeiChoh;Klant;CZ;30-8-2015;goed ;geen;verleden;188;1-8-1960
3;Dutch;female;Mrs.;Lusanne;G;Hijlkema;Plutostraat 198;Den Haag;ZH;Zuid-Holland;2516 AL;NL;Netherlands;LusanneHijlkema#dayrep.com;Digum1969;eiTeThun6th;Klant;Achmea;12-2-2010;best;mix;huidig;-335;9-3-1973
4;Dutch;female;Dr.;Husna;M;Hoegee;Tiendweg 89;Ameide;ZH;Zuid-Holland;4233 VW;NL;Netherlands;HusnaHoegee#fleckens.hu;Hatimon;goe5OhS4t;Klant;VGZ;9-8-2015;goed ;gezin;huidig;144;12-8-1962
5;Dutch;male;Mr.;Sieds;D;Verspeek;Willem Albert Scholtenstraat 38;Groningen;GR;Groningen;9711 XA;NL;Netherlands;SiedsVerspeek#armyspy.com;Thade1947;Taexiet9zo;Intern;CZ;17-2-2004;beter;geen;verleden;-49;12-10-1961
6;Dutch;female;Ms.;Nazmiye;R;van Spronsen;Noorderbreedte 180;Amsterdam;NH;Noord-Holland;1034 PK;NL;Netherlands;NazmiyevanSpronsen#jourrapide.com;Whinsed;Oz9ailei;Intern;VGZ;17-6-2003;beter;mix;huidig;178;8-3-1974
7;Dutch;female;Ms.;Livia;X;Breukers;Everlaan 182;Veenendaal;UT;Utrecht;3903
Try this in loop..
awk -v variable=$var '$1 != variable' input.csv
awk - to make decision based on columns
-v - to use a variable into a awk command
variable - store the value for awk to process
$var - to search for a specific string in run-time
!= - to check if not exist
input.csv - your input file
It's awk's behavior, when you use -v it can will work with variable on run-time and provide an output that doesn't contain the value you passed. This way, you get all the values that are not matching to your variable. Hope this is helpful. :)
Thanks
This bash script should work:
!/bin/bash
FILTER="!/(^"$(echo "$#" | sed -e "s/ /\|^/g")")/ {print}"
awk "$FILTER" input.csv > output.csv
The idea is to build an awk relevant FILTER and then use it.
Assuming the call parameters are: 1 2 3, the filter will be: !/(^1|^2|^3)/ {print}
!: to invert matching
^: Beginning of the line
The input data are in the input.csv file and output result will be in the output.csv file.

Use sed to find and replace a number following by its successor in bash

I have a string that contains multiple occurrences of number ranges, which are separated by a comma, e.g.,
2-12,59-89,90-102,103-492,593-3990,3991-4930
Now I would like to remove all directly neighbouring ranges and remove them from the string, i.e., remove anything that is of the form -(x),(x+1), to get something like this:
2-12,59-492,593-4930
Can anyone think of a method to accomplish this? I can honestly not post anything that I have tried, because all my tries were highly unsuccessful. To me it seems like it is not possible to actually find anything of the form -(x),(x+1) using sed, since that would require doing operations or comparisons of a found number by another number that has to be part of the command that is currently searching for numbers.
If everybody agrees that sed is NOT the correct tool for doing this, I will do it another way, but I am still interested if it's possible.
with awk
awk -F, -v RS="-" -v ORS="-" '$2!=$1+1' file
with appropriate separator setting, print the record when second field is not +1.
RS is the record separator and ORS is the outpout record separator.
test:
> awk -F, -v RS="-" -v ORS="-"
'$2!=$1+1' <<< "2-12,59-89,90-102,103-492,593-3990,3991-4930"
2-12,59-492,593-4930
awk solution:
awk -F'-' '{ r=$1;
for (i=2; i<=NF; i++) {
split($i, a, ",");
r=sprintf("%s%s", r, a[2]-a[1]==1? "" : FS $i)
}
print r
}' file
-F'-' - treat -(hyphen) as field separator
r - resulting string
split($i, a, ",") - split adjacent range boundaries into array a by separator ,
a[2]-a[1]==1 - crucial condition, reflects (x),(x+1)
The output:
2-12,59-492,593-4930
This might work for you (GNU sed):
sed -r ' s/^/\n/;:a;ta;s/\n([^-]*-)([0-9]*)(.*,)/\1\n\2\n\2\n\3/;Td;:b;s/(\n.*\n.*)9(_*\n)/\1_\2/;tb;s/(\n.*\n)(_*\n)/\10\2/;s/$/\n0123456789/;s/(\n.*\n[0-9]*)([0-8])(_*\n.*)\n.*\2(.).*/\1\4\3/;:z;tz;s/(\n.*\n[^_]*)_([^\n]*\n)/\10\2/;tz;:c;tc;s/([0-9]*-)\n(.*)\n(.*)\n,(\3)-/\n\1/;ta;s/\n(.*)\n.*\n,/\1,\n/;ta;:d;s/\n//g' file
This proof-of-concept sed solution, iteratively increments and compares the end of one range with the start of another. If the comparison is true it removes both and repeats, otherwise it moves on to the next range and repeats until all ranges have been compared.

Awk using index with Substring

I have one command to cut string.
I wonder detail of control index of command in Linux "awk"
I have two different case.
I want to get word "Test" in below example string.
1. "Test-01-02-03"
2. "01-02-03-Test-Ref1-Ref2
First one I can get like
substr('Test-01-02-03',0,index('Test-01-02-03',"-"))
-> Then it will bring result only "test"
How about Second case I am not sure how can I get Test in that case using index function.
Do you have any idea about this using awk?
Thanks!
This is how to use index() to find/print a substring:
$ cat file
Test-01-02-03
01-02-03-Test-Ref1-Ref2
$ awk -v tgt="Test" 's=index($0,tgt){print substr($0,s,length(tgt))}' file
Test
Test
but that may not be the best solution for whatever your actual problem is.
For comparison here's how to do the equivalent with match() for an RE:
$ awk -v tgt="Test" 'match($0,tgt){print substr($0,RSTART,RLENGTH)}' file
Test
Test
and if you like the match() synopsis, here's how to write your own function to do it for strings:
awk -v tgt="Test" '
function strmatch(source,target) {
SSTART = index(source,target)
SLENGTH = length(target)
return SSTART
}
strmatch($0,tgt){print substr($0,SSTART,SLENGTH)}
' file
If these lines are the direct input to awk then the following work:
echo 'Test-01-02-03' | awk -F- '{print $1}' # First field
echo '01-02-03-Test-Ref1-Ref2' | awk -F- '{print $NF-2}' # Third field from the end.
If these lines are pulled out of a larger line in an awk script and need to be split again then the following snippets will do that:
str="Test-01-02-03"; split(str, a, /-/); print a[1]
str="01-02-03-Test-Ref1-Ref2"; numfields=split(str, a, /-/); print a[numfields-2]

sed - pass match to external command

I have written a little script using sed to transform this:
kaefert#Ultrablech ~ $ cat /sys/class/power_supply/BAT0/uevent
POWER_SUPPLY_NAME=BAT0
POWER_SUPPLY_STATUS=Full
POWER_SUPPLY_PRESENT=1
POWER_SUPPLY_TECHNOLOGY=Li-ion
POWER_SUPPLY_CYCLE_COUNT=0
POWER_SUPPLY_VOLTAGE_MIN_DESIGN=7400000
POWER_SUPPLY_VOLTAGE_NOW=8370000
POWER_SUPPLY_POWER_NOW=0
POWER_SUPPLY_ENERGY_FULL_DESIGN=45640000
POWER_SUPPLY_ENERGY_FULL=44541000
POWER_SUPPLY_ENERGY_NOW=44541000
POWER_SUPPLY_MODEL_NAME=UX32-65
POWER_SUPPLY_MANUFACTURER=ASUSTeK
POWER_SUPPLY_SERIAL_NUMBER=
into a csv file format like this:
kaefert#Ultrablech ~ $ Documents/Asus\ Zenbook\ UX32VD/power_to_csv.sh
"date";"status";"voltage µV";"power µW";"energy full µWh";"energy now µWh"
2012-07-30 11:29:01;"Full";8369000;0;44541000;44541000
2012-07-30 11:29:02;"Full";8369000;0;44541000;44541000
2012-07-30 11:29:04;"Full";8369000;0;44541000;44541000
... (in a loop)
What I would like now is to divide each of those numbers by 1.000.000 so that they don't represent µV but V and W instead of µW, so that they are easily interpretable on a quick glance. Of course I could do this manually afterwards once I've opened this csv inside libre office calc, but I would like to automatize it.
So what I found is, that I can call external programs in between sed, like this:
...
s/\nPOWER_SUPPLY_PRESENT=1\nPOWER_SUPPLY_TECHNOLOGY=Li-ion\nPOWER_SUPPLY_CYCLE_COUNT=0\nPOWER_SUPPLY_VOLTAGE_MIN_DESIGN=7400000\nPOWER_SUPPLY_VOLTAGE_NOW=\([0-9]\{1,\}\)/";'`echo 0`'\1/
and that I could get values like I want by something like this:
echo "scale=6;3094030/1000000" | bc | sed 's/0\{1,\}$//'
But the problem now is, how do I pass my match "\1" into the external command?
If you are interested in looking at the full script, you'll find it there:
http://koega.no-ip.org/mediawiki/index.php/Battery_info_to_csv
if your sed is GNU sed. you can use 'e' to pass matched group to external command/tools within sed command.
an example might be helpful to make it clear:
say, you have a problem:
you have a string "120+20foobar" now you want to get the calculation result of 120+20 part, and replace "oo" to "xx" in "foobar" part.
Note that this example is not for solving the problem above, just for
showing the sed 'e' usage
so you could make 120+20 in the first match group, and rest in 2nd group, then pass two groups to different command/tools and then get the result. like:
kent$ echo "100+20foobar"|sed -r 's#([0-9+]*)(.*)#echo \1 \|bc\;echo \2 \| sed "s/oo/xx/g"#ge'
120
fxxbar
in this way, you could nest many seds one in another one, till you get lost. :D
As sed doesn't do arithmetic on its own I would recommend using awk for something like this, e.g. to divide 3rd, 5th and 6th field by a million do something like this:
awk -F';' -v OFS=';' '
NR == 1
NR != 1 {
$3 /= 1e6
$5 /= 1e6
$6 /= 1e6
print
}'
Explanation
-F';' and -v OFS=';' specify the input and output field separator.
NR == 1 pass first line through without change.
NR != 1 if it is not the first line, divide and print.
To divide by 1,000,000 directly, you do so :
Q='3094030/1000000'
sed ':r /^[[:digit:]]\{7\}/{s$\([[:digit:]]*\)\([[:digit:]]\{6\}\)/1000000$\1.\2$;p;d};s:^:0:;br;d'

Resources