How to print output in table format in shell script - linux

I am new to shell scripting. I want to distribute all the data of a file in a table format and redirect the output into another file.
I have below input file File.txt
Fruit_label:1 Fruit_name:Apple
Color:Red
Type: S
No.of seeds:10
Color of seeds :brown
Fruit_label:2 fruit_name:Banana
Color:Yellow
Type:NS
I want it to look like this:
Fruit_label| Fruit_name |color| Type |no.of seeds |Color of seeds
1 | apple | red | S | 10 | brown
2 | banana| yellow | NS
I want to read all the data line by line from the text file, build a header such as fruit_label, fruit_name, color, type, no.of seeds, color of seeds, and then print all the assigned values in rows. The data differs between fruits; for example, a banana has no seeds, so I want to keep those cells blank in its row.
Can anyone help me here.

Another approach, is a "Decorate & Process" approach. What is "Decorate & Process"? To Decorate is to take the text you have and decorate it with another separator to make field-splitting easier -- like in your case your fields can contain included whitespace along with the ':' separator between the field-names and values. With your inconsistent whitespace around ':' -- that makes it a nightmare to process ... simply.
So instead of worrying about what the separator is, think about "What should the fields be?" and then add a new separator (Decorate) between the fields and then Process with awk.
Here sed is used to Decorate your input with '|' as separators (a second call eliminates the '|' after the last field) and then a simpler awk process is used to split() the fields on ':' to obtain the field-name and field-value, where the field-value is simply printed and the field-names are stored in an array. When a duplicate field-name is found, it is used like a seen variable to mark the change between records, e.g.
sed -E 's/([^:]+:[[:blank:]]*[^[:blank:]]+)[[:blank:]]*/\1|/g' file |
sed 's/|$//' |
awk '
BEGIN { FS = "|" }
{
for (i=1; i<=NF; i++) {
if (split ($i, parts, /[[:blank:]]*:[[:blank:]]*/)) {
if (! n || parts[1] in fldnames) {
printf "%s %s", n ? "\n" : "", parts[2]
delete fldnames
n = 1
}
else
printf " | %s", parts[2]
fldnames[parts[1]]++
}
}
}
END { print "" }
'
Example Output
With your input in file you would have:
1 | Apple | Red | S | 10 | brown
2 | Banana | Yellow | NS
You will also see "Decorate-Sort-Undecorate" used to sort data on a new, otherwise non-existent column of values: you "Decorate" your data with a new last field, sort on that field, and then "Undecorate" to remove the additional field when sorting is done. This allows sorting by data that may be the sum (or combination) of any two columns, etc.
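As a rough illustration of Decorate-Sort-Undecorate, here is a minimal sketch that orders rows by the sum of their first two columns. The function name sort_by_sum and the sample rows are made up for the example, and it assumes single-space-separated numeric columns.

```shell
# Decorate-Sort-Undecorate: order rows by the sum of columns 1 and 2.
# sort_by_sum is a hypothetical helper name, not part of the answer above.
sort_by_sum() {
    awk '{ print $0, $1 + $2 }' |   # decorate: append the sum as field 3
    sort -n -k3,3 |                 # sort numerically on the added field
    cut -d' ' -f1-2                 # undecorate: drop the added field
}

printf '%s\n' '5 9' '1 2' '4 3' | sort_by_sum
```

The added field only exists inside the pipeline, so the output has the same columns as the input, just reordered.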

Here is my solution. It is a new year gift, usually you have to demonstrate what you have tried so far and we help you, not do it for you.
Disclaimer: some guru will probably come up with a simpler awk version, but this works.
File script.awk
# Remove space prefix
function ltrim(s) { sub(/^[ \t\r\n]+/, "", s); return s }
# Remove space suffix
function rtrim(s) { sub(/[ \t\r\n]+$/, "", s); return s }
# Remove both suffix and prefix spaces
function trim(s) { return rtrim(ltrim(s)); }
# Initialise or reset a fruit array
function array_init() {
for (i = 0; i <= 6; ++i) {
fruit[i] = ""
}
}
# Print the content of the fruit
function array_print() {
# Track whether anything was printed, so that the trailing
# newline is skipped for an empty array.
printedsomething = 0
for (i = 0; i <= 6; ++i) {
# Do not print if the content is empty
if (fruit[i] != "") {
printedsomething = 1
if (i == 1) {
# The first field must be further split, to remove "Fruit_name"
# Split on the space
split(fruit[i], temparr, / /)
printf "%s", trim(temparr[1])
}
else {
printf " | %s", trim(fruit[i])
}
}
}
if ( printedsomething == 1 ) {
print ""
}
}
BEGIN {
FS = ":"
print "Fruit_label| Fruit_name |color| Type |no.of seeds |Color of seeds"
array_init()
}
/Fruit_label/ {
array_print()
array_init()
fruit[1] = $2
fruit[2] = $3
}
/Color:/ {
fruit[3] = $2
}
/Type/ {
fruit[4] = $2
}
/No.of seeds/ {
fruit[5] = $2
}
/Color of seeds/ {
fruit[6] = $2
}
END { array_print() }
To execute, call awk -f script.awk File.txt
awk processes a file line per line. So the idea is to store fruit information into an array.
Every time the line "Fruit_label:....." is found, print the current fruit and start a new one.
Since each line is read in sequence, you tell awk what to do with each line, based on a pattern.
The patterns are what are enclosed between // characters at the beginning of each section of code.
Difficulty: since the first line carries two pieces of information for every fruit, and I split the lines on the : character, the Fruit_label value will include "Fruit_name".
I.e. the first line is split like this: $1 = Fruit_label, $2 = 1 Fruit_name, $3 = Apple
This is why the array_print() function is so complicated.
The trim functions are there to remove spaces.
For example, for the Apple, Type: S split on the : yields " S" with a leading space, which trim() reduces to S.
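Note that array_print() skips empty values, so a fruit with a missing attribute in the middle would have its later values shift left. If the cells should stay blank instead, one possible variation is sketched below. It is not the script above: the function names fruit_table and emit, the lower-cased key map and the plain "|" separator are all assumptions made for this example.

```shell
# Sketch: fixed column order, empty cells for attributes a fruit lacks.
fruit_table() {
    awk -F: '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    function emit(    i, line) {
        if (!have) return
        line = row[1]
        for (i = 2; i <= 6; i++) line = line "|" row[i]
        print line
        for (i = 1; i <= 6; i++) row[i] = ""
        have = 0
    }
    BEGIN {
        # Hypothetical key-to-column map; keys are compared in lower case.
        col["color"] = 3; col["type"] = 4
        col["no.of seeds"] = 5; col["color of seeds"] = 6
        print "Fruit_label|Fruit_name|Color|Type|No.of seeds|Color of seeds"
    }
    tolower($1) == "fruit_label" {
        emit()                        # print the previous fruit, if any
        split($2, t, " ")             # $2 looks like "1 Fruit_name"
        row[1] = t[1]; row[2] = trim($3); have = 1
        next
    }
    { k = tolower(trim($1)); if (k in col) row[col[k]] = trim($2) }
    END { emit() }'
}

printf '%s\n' 'Fruit_label:1 Fruit_name:Apple' 'Color:Red' 'Type: S' \
    'No.of seeds:10' 'Color of seeds :brown' \
    'Fruit_label:2 fruit_name:Banana' 'Color:Yellow' 'Type:NS' | fruit_table
```

With the sample data the Banana row ends in two empty cells, because its seed attributes were never set.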
If it meets your requirements, please see https://stackoverflow.com/help/someone-answers to accept it.

Related

Split file with multiple delimited entries in some columns into separate lines

I have a very large file with the following basic format, with a number of additional fields:
posA,id1,id2,posB,id3,name,(n additional fields)
1,ENST7,ENSP93,1,ENSG92,Gene1
2,ENST25;ENST76;ENST35,ENSP91;ENSP77;ENSP78,515;544;544,ENSG765,Gene2
3,ENST25;ENST76;ENST35,ENSP91;ENSP77;ENSP78,515;544;544,ENSG765,Gene2
4,ENST54;ENST93,ENSP83;ENSP36,1864;722,ENSG48,Gene3
5,ENST54;ENST93,ENSP83;ENSP36,1864;722,ENSG48,Gene3
6,ENST54;ENST93,ENSP83;ENSP36,1864;722,ENSG48,Gene3
Line one (posA=1) has a single entry for each column, and does not need to be modified. For lines with a variable number of multiple entries for some columns, for the third line (posA=2), the first entry for "id1" (ENST25) is paired with the first entry for "id2" (ENSP91) and the first entry for "posB" (515), and so on, but the columns with a single entry (eg, "posA", "id3", "name") apply to all of the paired entries in columns 2-4. Some fields in addition to columns 2-4 also rarely contain multiple entries.
I want to split the columns with multiple entries into separate lines, while retaining the data from the other columns, like so:
posA,id1,id2,posB,id3,name,(n additional fields)
1,ENST7,ENSP93,1,ENSG92,Gene1
2,ENST25,ENSP91,515,ENSG765,Gene2
2,ENST76,ENSP77,544,ENSG765,Gene2
2,ENST35,ENSP78,544,ENSG765,Gene2
3,ENST25,ENSP91,515,ENSG765,Gene2
3,ENST76,ENSP77,544,ENSG765,Gene2
3,ENST35,ENSP78,544,ENSG765,Gene2
4,ENST54,ENSP83,1864,ENSG48,Gene3
4,ENST93,ENSP36,722,ENSG48,Gene3
...
What is the best approach for this problem?
Thanks!
Taking your example to mean that each compound attribute holds at most two entries, you can use simple parameter expansion with substring removal to accomplish what you intend fairly easily, e.g.
#!/bin/bash
while IFS=, read -r p a1 a2 a3; do
[[ $a1 =~ ';' ]] && {
printf "%s,%s,%s,%s\n" "$p" "${a1%;*}" "${a2%;*}" "$a3"
printf "%s,%s,%s,%s\n" "$p" "${a1#*;}" "${a2#*;}" "$a3"
} || printf "%s,%s,%s,%s\n" "$p" "$a1" "$a2" "$a3"
done < "$1"
Where [[ $a1 =~ ';' ]] checks for a ';' in $a1 and if found then picks off the first attribute in $a1 and $a2 with ${a1%;*} and ${a2%;*}. Then for the second attribute in each, ${a1#*;} and ${a2#*;} are used.
If no ';' is contained in $a1, the attributes are printed unchanged. IFS=, ensures the parameters are word-split on ','.
(note: you should add validation that the filename is valid, etc. to your final script. You can also use echo if you like)
Example Use/Output
$ splitattrib.sh file
Pos,Attribute1,Attribute2,Attribute3
1,a,b,-
2,c,e,+
2,d,f,+
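If a column can hold more than two entries, the same substring-removal expansions can be applied in a loop. The following is only a sketch under that assumption; split_row is a made-up helper name, and it expects the two compound columns to hold the same number of ';'-separated entries.

```shell
# Sketch: peel one ';'-separated entry per iteration from both columns.
# split_row is a hypothetical helper, not part of the script above.
split_row() {
    p=$1 a1=$2 a2=$3 a3=$4
    while :; do
        # ${var%%;*} keeps everything before the first ';'
        printf '%s,%s,%s,%s\n' "$p" "${a1%%;*}" "${a2%%;*}" "$a3"
        case $a1 in
            *\;*) a1=${a1#*;} a2=${a2#*;} ;;   # drop the entry just printed
            *)    break ;;                     # no ';' left: last entry done
        esac
    done
}

split_row 3 'g;h;i' 'j;k;l' -
```

${a1%%;*} removes the longest suffix starting at a ';', while ${a1#*;} removes the shortest prefix ending in one, so each pass consumes exactly one entry.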
The best approach is to break it into three parts.
You have 3 line patterns (counting fields after splitting on both , and ;). One has 6 columns, another has 12, and the last has 9.
6 columns => 1 line
12 columns => 3 lines
9 columns => 2 lines
Your 6-column lines should not be modified. That leaves 12 and 9, which you can separate with if, else if and else, like:
if( column == 6 ){...}
else if( column == 12 ){...}
else {...}
And here is a Perl one-liner solution:
perl -a -F",|;" -lne '$s=scalar @F;if($s==6){print join ",",@F}elsif($s==12){print join",",@F[0,1,4,7,-2,-1];print join",",@F[0,2,5,8,-2,-1];print join",",@F[0,3,6,9,-2,-1];}else{print join",",@F[0,1,3,5,-2,-1];print join",",@F[0,2,4,6,-2,-1]} ' file
and for your input, the output is:
1,ENST7,ENSP93,1,ENSG92,Gene1
2,ENST25,ENSP91,515,ENSG765,Gene2
2,ENST76,ENSP77,544,ENSG765,Gene2
2,ENST35,ENSP78,544,ENSG765,Gene2
3,ENST25,ENSP91,515,ENSG765,Gene2
3,ENST76,ENSP77,544,ENSG765,Gene2
3,ENST35,ENSP78,544,ENSG765,Gene2
4,ENST54,ENSP83,1864,ENSG48,Gene3
4,ENST93,ENSP36,722,ENSG48,Gene3
5,ENST54,ENSP83,1864,ENSG48,Gene3
5,ENST93,ENSP36,722,ENSG48,Gene3
6,ENST54,ENSP83,1864,ENSG48,Gene3
6,ENST93,ENSP36,722,ENSG48,Gene3
Assuming your multiple entries are separated by semicolons (;), here is an awk version.
BEGIN {
FS="[,]"
}
{
if ($0 ~ /^[0-9].*/) {
end_split_field = 0
for (f=2;f<=NF;f++) {
if ($f ~ /.*;.*/) {
end_split_field=f
}
}
if (end_split_field == 0) {
print $0
} else {
for (f=2;f<=end_split_field;f++) {
n = split($f, a, ";") #split and return the number
for (i=1;i<=n;i++) {
b[f, i] = a[i]
}
}
for (i=1;i<=n;i++) {
printf "%s,", $1
for (j=2;j<=end_split_field;j++) {
printf "%s,", b[j, i]
}
# start after the last split field so it is not printed twice
for (k=end_split_field+1;k<NF;k++) {
printf "%s,", $k
}
printf "%s\n", $NF
}
}
} else {
print $0
}
}
Save the content above as input.awk. Example input and output:
$ cat input
Pos,Attribute1,Attribute2,Attribute3
1,a,b,-
2,c;d,e;f,+
3,g;h;i,j;k;l,-
We can get the split output
$ awk -f input.awk input
Pos,Attribute1,Attribute2,Attribute3
1,a,b,-
2,c,e,+
2,d,f,+
3,g,j,-
3,h,k,-
3,i,l,-

How to split column by matching header?

I'm wondering whether there is a way to select columns by matching the header.
The data looks like this
ID_1 ID_2 ID_3 ID_6 ID_15
value1 0 2 4 7 6
value2 0 4 4 3 8
value3 2 2 3 7 8
I would like to get the columns only on ID_3 & ID_15
ID_3 ID_15
4 6
4 8
3 8
awk can separate them easily if I know the order of the columns.
However, I have a very large table and only have a list of IDs in hand.
Can I still use awk, or is there an easier way in Linux?
The input format isn't well defined, but there are a few simple ways, awk, perl and sqlite.
(FNR==1) {
nocol=split(col,ocols,/,/) # col contains the requested column names
ncols=split("vals " $0,cols) # header line
for (nn=1; nn<=ncols; nn++) colmap[cols[nn]]=nn # map names
OFS="\t" # to align output
for (nn=1; nn<=nocol; nn++) printf("%s%s",ocols[nn],OFS)
printf("\n") # output header line
}
(FNR>1) { # read data
for (nn=1; nn<=nocol; nn++) {
if (nn>1) printf(OFS) # pad
if (ocols[nn] in colmap) { printf("%s",$(colmap[ocols[nn]])) }
else { printf "--" } # named column not in data
}
printf("\n") # wrap line
}
$ nawk -f mycols.awk -v col=ID_3,ID_15 data
ID_3 ID_15
4 6
4 8
3 8
Perl, just a variation on the above with some perl idioms to confuse/entertain:
use strict;
use warnings;
our @ocols=split(/,/,$ENV{cols}); # cols contains named columns
our $nocol=scalar(@ocols);
our ($nn,%colmap);
$,="\t"; # OFS equiv
# while (<>) {...} implicit with perl -an
if ($. == 1) { # FNR equiv
%colmap = map { $F[$_] => $_+1 } 0..$#F ; # create name map hash
$colmap{vals}=0; # name anon 1st col
print @ocols,"\n"; # output header
} else {
for ($nn = 0; $nn < $nocol; $nn++) {
print "\t" if ($nn>0);
if (exists($colmap{$ocols[$nn]})) { printf("%s",$F[$colmap{$ocols[$nn]}]) }
else { printf("--") } # named column not in data
}
printf("\n")
}
$ cols="ID_3,ID_15" perl -an mycols.pl < data
That uses an environment variable to skip effort parsing the command line. It needs the perl options -an which set up field-splitting and an input read loop (much like awk does).
And with sqlite (I used v3.11, v3.8 or later is required for useful .import I believe). This uses an in-memory temporary database (name a file if too large for memory, or for a persistent copy of the parsed data), and automatically creates a table based on the first line. The advantages here are that you might not need any scripting at all, and you can perform multiple queries on your data with just one parse overhead.
You can skip this next step if you have a single hard-tab delimiting the columns, in which case replace .mode csv with .mode tab in the sqlite example below.
Otherwise, to convert your data to a suitable CSV-ish format:
nawk -v OFS="," '(FNR==1){$0="vals " $0} {$1=$1;print}' < data > data.csv
This adds a dummy first column "vals" to the first line, then prints each line as comma-separated, it does this by a seemingly pointless assignment to $1, but this causes $0 to be recomputed replacing FS (space/tab) with OFS (comma).
$ sqlite3
sqlite> .mode csv
sqlite> .import data.csv mytable
sqlite> .schema mytable
CREATE TABLE mytable(
"vals" TEXT,
"ID_1" TEXT,
"ID_2" TEXT,
"ID_3" TEXT,
"ID_6" TEXT,
"ID_15" TEXT
);
sqlite> select ID_3,ID_15 from mytable;
ID_3,ID_15
4,6
4,8
3,8
sqlite> .mode column
sqlite> select ID_3,ID_15 from mytable;
ID_3 ID_15
---------- ----------
4 6
4 8
3 8
Use .once or .output to send output to a file (sqlite docs). Use .headers on or .headers off as required.
sqlite is quite happy to create an unnamed column, so you don't have to add a name to the first column of the header line, but you do need to make sure the number of columns is the same for all input lines and formats.
If you get "expected X columns but found Y" errors during the .import then you'll need to clean up the data format a little for this.
$ cat c.awk
NR == 1 {
for (i=1; i<=NF; ++i) {
if ($i == "ID_3") col_3 = (i + 1)
if ($i == "ID_15") col_15 = (i + 1)
}
print "ID_3", "ID_15"
}
NR > 1 { print $col_3, $col_15 }
$ awk -f c.awk c.txt
ID_3 ID_15
4 6
4 8
3 8
You could go for something like this:
BEGIN {
keys["ID_3"]
keys["ID_15"]
}
NR == 1 {
for (i = 1; i <= NF; ++i)
if ($i in keys) cols[++n] = i
}
{
for (i = 1; i <= n; ++i)
printf "%s%s", $(cols[i]+(NR>1)), (i < n ? OFS : ORS)
}
Save the script to a file and run it like awk -f script.awk file.
Alternatively, as a "one-liner":
awk 'BEGIN { keys["ID_3"]; keys["ID_15"] }
NR == 1 { for (i = 1; i <= NF; ++i) if ($i in keys) cols[++n] = i }
{ for (i = 1; i <= n; ++i) printf "%s%s", $(cols[i]+(NR>1)), (i < n ? OFS : ORS) }' file
Before the file is processed, keys are set in the keys array, corresponding to the column headings of interest.
On the first line, record all the column numbers that contain one of the keys in the cols array.
Loop through each of the cols and print them out, followed by either the output field separator OFS or the output record separator ORS, depending on whether it's the last one. $(cols[i]+(NR>1)) handles the fact that rows after the first have an extra field at the start, because NR>1 will be true (1) for those lines and false (0) for the first line.
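Since the question mentions having only a list of IDs, the same idea can be adapted to read the wanted headers from a file rather than hard-coding them in BEGIN. This is a sketch, not the answer's script: select_cols, the one-ID-per-line file layout and the temporary demo files are assumptions, and it expects the ID file to be non-empty.

```shell
# Sketch: take the wanted column names from a file, one ID per line.
# select_cols is a hypothetical helper; usage: select_cols ids_file data_file
select_cols() {
    awk '
    NR == FNR { want[$1]; next }      # first file: remember each wanted ID
    FNR == 1 {                        # header line of the data file
        for (i = 1; i <= NF; i++) if ($i in want) cols[++n] = i
    }
    {                                 # data rows have one extra leading field
        for (i = 1; i <= n; i++)
            printf "%s%s", $(cols[i] + (FNR > 1)), (i < n ? OFS : ORS)
    }' "$1" "$2"
}

# Demo with throwaway files mirroring the sample data from the question.
tmp=$(mktemp -d)
printf '%s\n' ID_3 ID_15 > "$tmp/ids"
printf '%s\n' 'ID_1 ID_2 ID_3 ID_6 ID_15' 'value1 0 2 4 7 6' \
    'value2 0 4 4 3 8' 'value3 2 2 3 7 8' > "$tmp/data"
select_cols "$tmp/ids" "$tmp/data"
rm -r "$tmp"
```

Columns come out in header order, not in the order the IDs appear in the list file.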
Try below script:
#!/bin/sh
file="$1"; shift
awk -v cols="$*" '
BEGIN{
split(cols,C)
OFS=FS="\t"
getline
split($0,H)
for(c in C){
for(h in H){
if(C[c]==H[h])F[i++]=h
}
}
}
{ l="";for(f in F){l=l $F[f] OFS}print l }
' "$file"
In command line type:
[sumit.gupta@rpm01 ~]$ test.sh filename ID_3 ID_5

AWK to find first occurrence of string and assign to variable for compare

I have written following line of code which explodes the string by the first occurrence of the string after a delimiter.
echo "$line" | awk -F':' '{ st = index($0,":"); print "field1: " $1 " => " substr($0,st+1) }';
But I don't want to display it. Want to take both occurrences in variable so I tried the following code
explodetext="$line" | awk -F':' '{ st = index($0,":")}';
Sample data:
id:1
url:http://test.com
Expected output:
key=id
val=1
key=url
val=http://test.com
But it is not working as expected. Any solution?
Thanks
Your code, expanded:
echo "$line" \
| awk -F':' '
{
st = index($0,":")
print "field1: " $1 " => " substr($0,st+1)
}'
The output of this appears merely to split the line according to the first colon. From the sample data you've provided, it seems that your lines contain two fields, which are separated by the first colon found. This means you can't safely use awk's field separator to find your data (though you can use it for field names), making index() a reasonable approach.
One strategy might be to place your input into an array, for assessment:
#!/usr/bin/awk -f
BEGIN {
FS=":"
}
{
record[$1]=substr($0,index($0,":")+1);
}
END {
if (record["id"] > 0) {
printf("Record ID %d had a value of %s.\n", record["id"], record["url"])
} else {
print "No valid records found."
}
}
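If the goal is simply to get the key and the value into shell variables, awk may not be needed at all. Here is a minimal sketch using shell parameter expansion, which also splits on the first colon only; parse_kv is a made-up function name and the input lines are the sample data from the question.

```shell
# Sketch: split each line on its FIRST colon into shell variables.
# parse_kv is a hypothetical helper name.
parse_kv() {
    while IFS= read -r line; do
        key=${line%%:*}    # everything before the first ':'
        val=${line#*:}     # everything after the first ':'
        printf 'key=%s\nval=%s\n' "$key" "$val"
    done
}

printf '%s\n' 'id:1' 'url:http://test.com' | parse_kv
```

Inside the loop, $key and $val are ordinary variables, so they can be compared or passed on instead of printed.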
I suppose that your text file input.txt is stored in the format as given below:
id:1
url:http://test1.com
You could use the below piece of code, say awkscript, to achieve what you wish to do :
#!/bin/bash
awk '
BEGIN{FS=":"}
{
if ($2 > 0) {
if ( getline > 0){
st = index($0,":")
url = substr($0,st+1);
system("echo Do something with " url);
}
}
}' $1
Run the code as ./awkscript input.txt
Note: I assume that the input file contains only one id/url pair as you confirmed in your comment.

Transliteration script for linux shell

I have multiple .txt files containing text in an alphabet; I want to transliterate the text into an other alphabet; some characters of alphabet1 are 1:1 with those of alphabet2 (i.e. a becomes e), whereas others are 1:2 (i.e. x becomes ch).
I would like to do this using a simple script for the Linux shell.
With tr or sed I can convert 1:1 characters:
sed 'y/abcdefghijklmnopqrstuvwxyz/nopqrstuvwxyzabcdefghijklm/'
a will become n, b will become o et cetera (a Caesar's cipher, I think)
But how can I deal with 1:2 characters?
Not an answer, just to show a briefer, idiomatic way to populate the table[] array from @konsolebox's answer as discussed in the related comments:
BEGIN {
split("a e b", old)
split("x ch o", new)
for (i in old)
table[old[i]] = new[i]
FS = OFS = ""
}
so the mapping of old to new chars is clearly shown in that the char in the first split() is mapped to the char(s) below it and for any other mapping you want you just need to change the string(s) in the split(), not change 26-ish explicit assignments to table[].
You can even create a general script to do mappings and just pass in the old and new strings as variables:
BEGIN {
split(o, old)
split(n, new)
for (i in old)
table[old[i]] = new[i]
FS = OFS = ""
}
then in shell anything like this:
old="a e b"
new="x ch o"
awk -v o="$old" -v n="$new" -f script.awk file
and you can protect yourself from your own mistakes populating the strings, e.g.:
BEGIN {
numOld = split(o, old)
numNew = split(n, new)
if (numOld != numNew) {
printf "ERROR: #old vals (%d) != #new vals (%d)\n", numOld, numNew | "cat>&2"
exit 1
}
for (i=1; i <= numOld; i++) {
if (old[i] in table) {
printf "ERROR: \"%s\" duplicated at position %d in old string\n", old[i], i | "cat>&2"
exit 1
}
if (newvals[new[i]]++) {
printf "WARNING: \"%s\" duplicated at position %d in new string\n", new[i], i | "cat>&2"
}
table[old[i]] = new[i]
}
}
Wouldn't it be good to know if you wrote that b maps to x and then later mistakenly wrote that b maps to y? The above really is the best way to do this but your call of course.
Here's one complete solution as discussed in the comments below
BEGIN {
numOld = split("a e b", old)
numNew = split("x ch o", new)
if (numOld != numNew) {
printf "ERROR: #old vals (%d) != #new vals (%d)\n", numOld, numNew | "cat>&2"
exit 1
}
for (i=1; i <= numOld; i++) {
if (old[i] in map) {
printf "ERROR: \"%s\" duplicated at position %d in old string\n", old[i], i | "cat>&2"
exit 1
}
if (newvals[new[i]]++) {
printf "WARNING: \"%s\" duplicated at position %d in new string\n", new[i], i | "cat>&2"
}
map[old[i]] = new[i]
}
FS = OFS = ""
}
{
for (i = 1; i <= NF; ++i) {
if ($i in map) {
$i = map[$i]
}
}
print
}
I renamed the table array to map just because IMHO that better represents the purpose of the array.
Save the above in a file script.awk and run it as awk -f script.awk inputfile
Using Awk:
#!/usr/bin/awk -f
BEGIN {
FS = OFS = ""
table["a"] = "e"
table["x"] = "ch"
# and so on...
}
{
for (i = 1; i <= NF; ++i) {
if ($i in table) {
$i = table[$i]
}
}
}
1
Usage:
awk -f script.awk file
Test:
# echo "the quick brown fox jumps over the lazy dog" | awk -f script.awk
the quick brown foch jumps over the lezy dog
This can be done quite concisely using a Perl one-liner:
perl -pe '%h=(a=>"xy",c=>"z"); s/(.)/defined $h{$1} ? $h{$1} : $1/eg'
or equivalently (thanks jaypal):
perl -pe '%h=(a=>"xy",c=>"z"); s|(.)|$h{$1}//=$1|eg'
%h is a hash containing the characters (keys) and their substitutions (values). s is the substitution command (as in sed). The g modifier means that the substitution is global and the e means that the replacement part is evaluated as an expression. It captures each character one by one and substitutes them with the value in the hash if it exists, otherwise keeps the original value. The -p switch means that each line in the input is automatically printed.
Testing it out:
$ perl -pe '%h=(a=>"xy",c=>"z"); s|(.)|$h{$1}//=$1|eg' <<<"abc"
xybz
Using sed.
Write a file transliterate.sed containing:
s/a/e/g
s/x/ch/g
and then run from your command line to get the transliterated output.txt from input.txt:
sed -f transliterate.sed input.txt > output.txt
If you need this more often consider adding #!/bin/sed -f as first line and making your file executable with chmod 744 transliterate.sed as described at the Wikipedia page for sed.
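One thing to watch with a list of s commands is that they run in sequence, so a later rule can re-translate the output of an earlier one (a becomes e, and a following rule for e would then change it again). A per-character lookup avoids that. The sketch below also walks the line with substr() instead of setting FS to the empty string, since splitting on an empty FS is, as far as I know, not guaranteed by POSIX awk. The function name translit and the two-entry map are just for illustration.

```shell
# Sketch: translate each character exactly once via a lookup table.
# translit is a hypothetical helper name; extend map[] as needed.
translit() {
    awk '
    BEGIN { map["a"] = "e"; map["x"] = "ch" }
    {
        out = ""
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)                  # one character at a time
            out = out ((c in map) ? map[c] : c)   # mapped value or original
        }
        print out
    }'
}

printf '%s\n' 'the lazy fox' | translit
```

Because every input character is looked up once and appended to out, a 1:2 replacement such as x to ch is never fed back through the table.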

Using awk on large txt to extract specific characters of fields

I have a large txt file ("," as delimiter) with some data and string:
2014:04:29:00:00:58:GMT: subject=BMRA.BM.T_GRIFW-1.FPN, message={SD=2014:04:29:00:00:00:GMT,SP=5,NP=3,TS=2014:04:29:01:00:00:GMT,VP=4.0,TS=2014:04:29:01:29:00:GMT,VP=4.0,TS=2014:04:29:01:30:00:GMT,VP=3.0}
2014:04:29:00:00:59:GMT: subject=BMRA.BM.T_GRIFW-2.FPN, message={SD=2014:04:29:00:00:00:GMT,SP=5,NP=2,TS=2014:04:29:01:00:00:GMT,VP=3.0,TS=2014:04:29:01:30:00:GMT,VP=3.0}
I would like to find lines that contain 'T_GRIFW' and then print the $1 field from 'subject' onwards and only the times and floats from $2 onwards. Furthermore, I want to incorporate an if statement so that if field $4 == 'NP=3', only fields $5,$6,$9,$10 are printed after the previous fields and if $4 == 'NP=2', all following fields are printed (times and floats only)
For instance, the result of the two sample lines will be:
subject=BMRA.BM.T_GRIFW-1.FPN,2014:04:29:00:00:00,5,3,2014:04:29:01:00:00,4.0,2014:04:29:01:30:00,3.0
subject=BMRA.BM.T_GRIFW-2.FPN,2014:04:29:00:00:00,5,2,2014:04:29:01:00:00,3.0,2014:04:29:01:30:00,3.0
I know this is complex and I have tried my best to be thorough in my description. The basic code I have thus far is:
awk 'BEGIN {FS=","}{OFS=","} /T_GRIFW-1.FPN/ {print $1}' tib_messages.2014-04-29
THANKS A MILLION!
Here's an awk executable file that'll create your desired output:
#!/usr/bin/awk -f
# use a more complicated FS => field numbers counted differently
BEGIN { FS="=|,"; OFS="," }
$2 ~ /T_GRIFW/ && $8=="NP" {
str="subject=" $2 OFS
# strip ":GMT" from dates and "}" from everywhere
gsub( /:GMT|[\}]/, "")
# append common fields to str with OFS
for(i=5;i<=13;i+=2) str=str $i OFS
# print the remaining fields and line separator
if($9==3) { print str $19, $21 }
else if($9==2) { print str $15, $17 }
}
Placing that in a file called awko and chmod'ing it then running awko data yields:
subject=BMRA.BM.T_GRIFW-1.FPN,2014:04:29:00:00:00,5,3,2014:04:29:01:00:00,4.0,2014:04:29:01:30:00,3.0
subject=BMRA.BM.T_GRIFW-2.FPN,2014:04:29:00:00:00,5,2,2014:04:29:01:00:00,3.0,2014:04:29:01:30:00,3.0
I've placed comments in the script, but here are some things that could be spelled out better:
Using a more complicated FS means you don't have to reparse on = to work with the field data
I "cheated" and just hard-coded subject (which now falls at the end of $1) for str
:GMT and } appeared to be the only data that needed to be forcibly removed
With this FS Dates and numbers are two apart from each other but still loop-able
In either final print call, the str already ends in an OFS, so the comma between it and next field can be skipped
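To see where the field numbers used above come from, it can help to dump the fields awk produces with this FS. The following is a small sketch on a shortened, made-up line in the same shape as the data; show_fields is a hypothetical helper name.

```shell
# Sketch: list every field with its number when splitting on '=' or ','.
# show_fields is a hypothetical helper; the input line is abbreviated.
show_fields() {
    awk -F'=|,' '{ for (i = 1; i <= NF; i++) printf "%d:[%s]\n", i, $i }'
}

printf '%s\n' 'x: subject=S.FPN, message={SD=1:GMT,SP=5,NP=3}' | show_fields
```

Names and values alternate, which is why the script tests $8 for NP and reads its value from $9.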
If I understand your requirements, the following will work:
BEGIN {
FS=","
OFS=","
}
/T_GRIFW/ {
split($1, subject, " ")
result = subject[2] OFS
delete arr
counter = 1
for (i = 2; i <= NF; i++) {
add = 0
if ($4 == "NP=3") {
if (i == 5 || i == 6 || i == 9 || i == 10) {
add = 1
}
}
else if ($4 == "NP=2") {
add = 1
}
if (add) {
counter = counter + 1
split($i, field, "=")
if (match(field[2], "[0-9]*\.[0-9]+|GMT")) {
arr[counter] = field[2]
}
}
}
for (i in arr) {
gsub(/{|}/,"", arr[i]) # remove curly braces
result = result arr[i] OFS
}
print substr(result, 1, length(result)-1) # drop the trailing OFS
}

Resources