How to append data with the same header into one header line in Linux

My data is separated by a comma delimiter.
Taking the value before the comma as the main header column: if the same header occurs somewhere else, append its data onto that one header line, wrapping each value in curly braces.
Please consider my example for better understanding.
Input file data
19,66:BILL
19,34
19,02
21,:0
21,:0
21,:1
21,37
26,:19
26,87
27,35
31,77
31,12
31,202
Output file data
19,{66:BILL}{34}{02}
21,{:0}{:0}{:1}
21,37
26,{:19}{87}
27,35
31,{77}{12}{202}

A solution using awk:
$ awk -F, '{a[$1]=a[$1]"{"$2"}"} END{for (i in a) print i FS a[i]}' input.csv
Assuming that the input file contains only two columns, the script builds an array a by appending the value $2 of every row with the same key $1 onto the same element a[$1].
input.csv
19,66:BILL
19,34
19,02
21,:0
21,:0
21,:1
21,37
26,:19
26,87
27,35
31,77
31,12
31,202
output
19,{66:BILL}{34}{02}
21,{:0}{:0}{:1}{37}
26,{:19}{87}
27,{35}
31,{77}{12}{202}
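A note on the awk solution above: for (i in a) visits keys in an unspecified order, so the grouped lines may not come out in input order. A minimal sketch that also records the order in which keys first appear (a small sample of the input is recreated inline; the file name input.csv is just for illustration):

```shell
# Recreate a small sample of the input (hypothetical file name).
printf '%s\n' '19,66:BILL' '19,34' '19,02' '21,:0' '27,35' > input.csv

# Same grouping as the answer, but an order[] array remembers the
# first-seen order of keys so the output follows the input.
awk -F, '!($1 in a){order[++n]=$1}
         {a[$1] = a[$1] "{" $2 "}"}
         END{for (j=1; j<=n; j++) print order[j] FS a[order[j]]}' input.csv
```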

Related

insert column with same row content to csv in cli

I have a CSV to which I need to add a new column at the end, filling that new column with a fixed string on every row.
Example csv:
os,num1,alpha1
Unix,10,A
Linux,30,B
Solaris,40,C
Fedora,20,D
Ubuntu,50,E
I tried the following awk command and did not get the expected result. I am not sure whether my indexing or column counting is right.
awk -F'[[:null:]]' '$2 && !$1{ $4="NA" }1'
Expected result is:
os,num1,alpha1,code
Unix,10,A,NA
Linux,30,B,NA
Solaris,40,C,NA
Fedora,20,D,NA
Ubuntu,50,E,NA
You can use sed:
sed 's/$/,NA/' db1.csv > db2.csv
then edit the first line containing the column titles.
I'm not quite sure how you came up with that awk statement of yours, or why you'd think that your file has NUL-terminated lines or that [[:null:]] is a valid character class ...
The following, however, will do your bidding:
awk 'NR==1{print $0",code"}; NR>1{print $0",NA"}' example.csv
os,num1,alpha1,code
Unix,10,A,NA
Linux,30,B,NA
Solaris,40,C,NA
Fedora,20,D,NA
Ubuntu,50,E,NA
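A sed-only variant is also possible that handles the header in the same pass, so no manual edit of the first line is needed (the sample CSV is recreated inline; the file name example.csv is just for illustration):

```shell
# Recreate the sample CSV (hypothetical file name).
printf '%s\n' 'os,num1,alpha1' 'Unix,10,A' 'Linux,30,B' > example.csv

# Append ",code" to the header line and ",NA" to every other line.
sed '1s/$/,code/; 2,$s/$/,NA/' example.csv
```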

Reformat data using awk

I have a dataset that contains rows of UUIDs followed by locations and transaction IDs. The UUIDs are separated by a semi-colon (';') and the transactions are separated by tabs, like the following:
01234;LOC_1=ABC LOC_1=BCD LOC_2=CDE
56789;LOC_2=DEF LOC_3=EFG
I know all of the location codes in advance. What I want to do is transform this data into a format I can load into SQL/Postgres for analysis, like this:
01234;LOC_1=ABC
01234;LOC_1=BCD
01234;LOC_2=CDE
56789;LOC_2=DEF
56789;LOC_3=EFG
I'm pretty sure I can do this easily using awk (or similar) by looking up location IDs from a file (ex. LOC_1) and matching any instance of the location ID and printing that out next to the UUID. I haven't been able to get it right yet, and any help is much appreciated!
My locations file is named location and my dataset is data. Note that I can edit the original file or write the results to a new file, either is fine.
awk without using split: use semicolon or tab as the field separator
awk -F'[;\t]' -v OFS=';' '{for (i=2; i<=NF; i++) print $1,$i}' file
I don't think you need to match against a known list of locations; you should be able to just print each line as you go:
$ awk '{print $1; split($1,a,";"); for (i=2; i<=NF; ++i) print a[1] ";" $i}' file
01234;LOC_1=ABC
01234;LOC_1=BCD
01234;LOC_2=CDE
56789;LOC_2=DEF
56789;LOC_3=EFG
Your comment about knowing the locations, plus the mapping file, make me suspect your example isn't exactly what is being asked - but it seems like you want to reformat each set of tab-delimited LOC= values into rows with their UUID in front.
If so, this will do the trick:
awk ' BEGIN {OFS=FS=";"} {split($2,locs,"\t"); for (n in locs) { print $1,locs[n]}}'
Given:
$ cat -A data.txt
01234;LOC_1=ABC^ILOC_1=BCD^ILOC_2=CDE$
56789;LOC_2=DEF^ILOC_3=EFG$
Then:
$ awk ' BEGIN {OFS=FS=";"} {split($2,locs,"\t"); for (n in locs) { print $1,locs[n]}}' data.txt
01234;LOC_1=ABC
01234;LOC_1=BCD
01234;LOC_2=CDE
56789;LOC_2=DEF
56789;LOC_3=EFG
The BEGIN {OFS=FS=";"} block sets the input and output delimiter to ;.
For each row, we then split the second field into an array named locs, splitting on tab, via - split($2,locs,"\t")
And then loop through locs printing the UUID and each loc value - for (n in locs) { print $1,locs[n]}
How about one without a loop or split, as follows (assuming the Input_file is exactly like the samples shown):
awk 'BEGIN{FS=OFS=";"}{gsub(/[[:space:]]+/,"\n"$1 OFS)} 1' Input_file
This might work for you (GNU sed):
sed -r 's/((.*;)\S+)\s+(\S+)/\1\n\2\3/;P;D' file
Repeatedly replace the white space between locations with a newline, followed by the UUID and a ;, printing/deleting each line as it appears.
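One caveat shared by the split-based answers above: for (n in locs) does not guarantee that the array is visited in order. Since split() returns the number of elements, a counted loop keeps the locations in their original left-to-right order (sample data recreated inline; the file name data.txt is just for illustration):

```shell
# Recreate the tab-delimited sample (hypothetical file name).
printf '01234;LOC_1=ABC\tLOC_1=BCD\tLOC_2=CDE\n56789;LOC_2=DEF\tLOC_3=EFG\n' > data.txt

# split() returns the element count, so a counted loop preserves
# the left-to-right order of the LOC= values.
awk 'BEGIN{OFS=FS=";"}
     {n = split($2, locs, "\t"); for (i=1; i<=n; i++) print $1, locs[i]}' data.txt
```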

Getting output from many lines in one single table

I have a file as below:
number=49090005940;
NUMBER TRANSLATION DATA
NUMBER DATA
NUMBER TYPE SUBCOND
49090005940 IN
NUMPRE
1117230111
END
number=49090005942;
NUMBER TRANSLATION DATA
NUMBER DATA
NUMBER TYPE SUBCOND
49090005942 IN
NUMPRE
1117230111
END
I want to have an output with NUMBER, TYPE, and NUMPRE as below:
NUMBER=49090005940; TYPE=IN; NUMPRE=1117230111;
NUMBER=49090005942; TYPE=IN; NUMPRE=1117230111;
This is a mouthful, but it works.
awk '/number=/{split($0, a, "[=;]"); nump=a[2]} nextrec==1 && /[^ ]/{num=$0; nextrec=0} /NUMPRE/{nextrec=1} $1==nump{ty=$2} /END/{print "NUMBER="nump"; TYPE="ty"; NUMPRE="num";"}' infile
Here is what awk does:
If it finds a record matching number= (/number=/), it splits the record on an equals sign or a semicolon and stores the pieces in array a (split($0, a, "[=;]")). It then puts the second element of the array, the number itself, into variable nump (nump=a[2]).
It looks for a line containing the word NUMPRE (/NUMPRE/); if it finds one, it sets variable nextrec to 1 (nextrec=1).
If nextrec is set to 1 and the record contains at least one non-space character (nextrec==1 && /[^ ]/), it sets variable num to that line (num=$0); this is the NUMPRE value.
If the line starts with what is stored in nump ($1==nump), it stores the second field of that record in variable ty (ty=$2).
Finally, if we hit a record containing END (/END/), it prints the desired output (print "NUMBER="nump"; TYPE="ty"; NUMPRE="num";").
With GNU awk for multi-char RS:
$ awk -v RS='\n\n\n' -v OFS='; ' -v ORS=';\n' '{print $7"="$10, $8"="$11, $12"="$13 }' file
NUMBER=49090005940; TYPE=IN; NUMPRE=1117230111;
NUMBER=49090005942; TYPE=IN; NUMPRE=1117230111;
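The multi-char RS answer assumes each record is followed by two blank lines and that the word positions never shift. If the file looks exactly like the sample shown (no blank lines), a line-oriented sketch like the following tracks the three values as it reads and emits one summary line per END marker:

```shell
# Recreate one record of the sample (hypothetical file name).
printf '%s\n' 'number=49090005940;' 'NUMBER TRANSLATION DATA' 'NUMBER DATA' \
  'NUMBER TYPE SUBCOND' '49090005940 IN' 'NUMPRE' '1117230111' 'END' > nums.txt

# Capture the number, its TYPE row, and the value on the line after
# NUMPRE; print one summary line at each END marker.
awk '/^number=/ { split($0, a, "[=;]"); nump = a[2] }  # "number=...;" line
     $1 == nump { ty = $2 }                            # "<number> IN" row
     grab       { np = $1; grab = 0 }                  # line after NUMPRE
     /^NUMPRE/  { grab = 1 }
     /^END/     { printf "NUMBER=%s; TYPE=%s; NUMPRE=%s;\n", nump, ty, np }' nums.txt
```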

How to update a flat file using the same flat file in bash (Linux)?

I have a flat file separated by | that I want to update using information already inside the flat file. I want to fill the third field using information from the first and second fields. When comparing first fields, I want to ignore the last two digits; the second field must match exactly. I do not want to create a new flat file; I want to update the existing one. I researched a way to pull out the first two fields from the file, but I do not know if that will even be helpful. To sum up: I want to compare the first and second fields against other lines in the file to fill in the third field that is missing on some of the lines.
awk -F'|' -v OFS='|' '{sub(/[0-9 ]+$/,"",$1)}1 {print $1 "\t" $2}' tstfile
first field|second field|third field
Original input:
t1ttt01|/a1
t1ttt01|/b1
t1ttt01|/c1
t1ttt03|/a1|1
t1ttt03|/b1|1
t1ttt03|/c1|1
l1ttt03|/a1|3
l1ttt03|/b1|3
l1ttt03|/c1|3
What it should do:
t1ttt03|/a1|1 = t1ttt01|/a1
when comparing t1ttt|/a1 = t1ttt|/a1
Therefore
t1ttt01|/a1 becomes t1ttt01|/a1|1
What I want the Output to look like:
t1ttt01|/a1|1
t1ttt01|/b1|1
t1ttt01|/c1|1
t1ttt03|/a1|1
t1ttt03|/b1|1
t1ttt03|/c1|1
l1ttt03|/a1|3
l1ttt03|/b1|3
l1ttt03|/c1|3
One way with awk:
awk '
# set the input and output field separator to "|"
BEGIN{FS=OFS="|"}
# Do this action when number of fields on a line is 3 for first file only. The
# action is to strip the number portion from first field and store it as a key
# along with the second field. The value of this should be field 3
NR==FNR&&NF==3{sub(/[0-9]+$/,"",$1);a[$1$2]=$3;next}
# For the second file if number of fields is 2, store the line in a variable
# called line. Validate if field 1 (without numbers) and 2 is present in
# our array. If so, print the line followed by "|" followed by value from array.
NF==2{line=$0;sub(/[0-9]+$/,"",$1);if($1$2 in a){print line OFS a[$1$2]};next}1
' file file
Test:
$ cat file
t1ttt01|/a1
t1ttt01|/b1
t1ttt01|/c1
t1ttt03|/a1|1
t1ttt03|/b1|1
t1ttt03|/c1|1
l1ttt03|/a1|3
l1ttt03|/b1|3
l1ttt03|/c1|3
$ awk 'BEGIN{FS=OFS="|"}NR==FNR&&NF==3{sub(/[0-9]+$/,"",$1);a[$1$2]=$3;next}NF==2{line=$0;sub(/[0-9]+$/,"",$1);if($1$2 in a){print line OFS a[$1$2]};next}1' file file
t1ttt01|/a1|1
t1ttt01|/b1|1
t1ttt01|/c1|1
t1ttt03|/a1|1
t1ttt03|/b1|1
t1ttt03|/c1|1
l1ttt03|/a1|3
l1ttt03|/b1|3
l1ttt03|/c1|3
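Since the question asks to update the existing file rather than create a new one: awk cannot edit a file in place portably, so a common pattern is to write to a temporary file and move it over the original (GNU awk alternatively offers -i inplace). A sketch wrapping the command above (a two-line sample is recreated inline; the file name is just for illustration):

```shell
# Recreate a small sample (hypothetical file name).
printf '%s\n' 't1ttt01|/a1' 't1ttt03|/a1|1' > file

# Run the two-pass awk, then replace the original file with the result.
tmp=$(mktemp)
awk 'BEGIN{FS=OFS="|"}
     NR==FNR && NF==3 {sub(/[0-9]+$/,"",$1); a[$1$2]=$3; next}
     NF==2 {line=$0; sub(/[0-9]+$/,"",$1);
            if ($1$2 in a) print line OFS a[$1$2]; next} 1' file file > "$tmp" &&
mv "$tmp" file
```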

In a Unix system, replace a string in a file at varying positions

I work on a Unix server.
I have many CSV files containing, among other info, date fields.
I have to replace some of these date fields with another value, for example 20110915 with 20110815. Their position varies from one file to another.
The problem is that the substitution is specific to the field position. For example, if my file has a row like this:
blablabla;12;0.2121;20110915;20110915;19951231;popopo;other text;321;20101010
I have to replace only the first date field and not the others, transforming the row into:
blablabla;12;0.2121;20110815;20110915;19951231;popopo;other text;321;20101010
Is there a way to restrict the replacement in the file, using some constraints?
Thanks
You can try awk:
awk 'BEGIN {FS=";";OFS=";"} {if($4=="20110915")$4="20110815"; print}' input.csv
How it works:
FS and OFS define the input and output field separators. It compares the fourth field ($4) against 20110915. If it matches, it is changed to 20110815. The line is then printed.
Here is an alternative using gsub in awk:
awk 'BEGIN {FS=";";OFS=";"} {gsub(/20110915/,"20110815",$4); print}' input.csv
Here is a method if you have to substitute in a range of fields/columns (here 4 through 4; widen the loop bounds for a wider range):
awk 'BEGIN {FS=";";OFS=";"} {for(i=4;i<=4;i++){gsub(/20110915/,"20110815",$i)}; print}' input.csv
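If sed is preferred, the same restriction to the fourth field can be expressed by skipping exactly three ;-terminated fields before matching the date (adjust the \{3\} count for other positions; sample row recreated inline with a hypothetical file name):

```shell
# Recreate the sample row (hypothetical file name).
printf '%s\n' 'blablabla;12;0.2121;20110915;20110915;19951231' > dates.csv

# Skip three ";"-terminated fields, then replace the date only if it
# is the fourth field.
sed 's/^\(\([^;]*;\)\{3\}\)20110915/\120110815/' dates.csv
```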
