How can I convert this format string to CSV? - linux

my string is:
AA:: aaaaaaaaaaaaaaaaa
BB:: bbbbbbbbbbbbbbbb
C: ccccccccccccccccc
DD:: DDDDDDDDDDDD
E: EEEEEEEEEEEEE

AA: aaaaaaaaaaaaaaaaa2
BB:: bbbbbbbbbbbbbbbb2
C:: ccccccccccccccccc2
DD: DDDDDDDDDDDD2
E: EEEEEEEEEEEEE
....
I need to get this format with a standard Linux command like awk or ... or a Perl function:
AA,BB,C,DD,E
aaaaaaaa,bbbbbb,ccccc,dddddd,eeeee
aaaaaaaa2,bbbbbb2,ccccc2,dddddd2,eeeee2
e.g.: OUTPUT_STRING | awk ....
or perlFunction(OUTPUT_STRING){ .....
return formated_string; }
I searched Google and tried many suggestions from several sites, and none worked, so please don't just send me a link.
Some fields have a single : and some fields have a double :: (this is random).
I tried some suggestions and they did not work for me:
sed -r 's/\\,|,|CN=|OU*//g' |awk -F "|=|:" '{printf $2"|"}'
or
sed -n '1h; 2,$H;${g;s/\n/,/g;p}' | sed 's/,,/\n/g'
or
awk -F ":" '{printf $2} {if (NF==0) {printf "\n"}}' | sed "s/ //" | sed "s/ /;/g"

One of many ways to achieve the desired result:
use strict;
use warnings;

my $file = do { local $/; <DATA> };        # read whole file
my @blocks = split /\n\n/, $file;          # split file into blocks
my $print_header = 1;                      # flag to print header

foreach my $block (@blocks) {              # process each block
    $block =~ s/:+/:/g;                    # clean up the block :: -> :
    my @lines = split /\n/, $block;        # split the block into lines
    my(@header, @data);                    # arrays to store header and data
    foreach my $line (@lines) {            # process each line
        my($h, $d) = split /:\s*/, $line;  # split line into header and data part
        push @header, $h;                  # add header names into array
        push @data, $d;                    # add data into array
    }
    if( $print_header ){                   # if header not printed yet
        print join(',', @header) . "\n";   # print header array
        $print_header = 0;                 # flag the header is printed
    }
    print join(',', @data) . "\n";         # print data array
}
__DATA__
AA:: aaaaaaaaaaaaaaaaa
BB:: bbbbbbbbbbbbbbbb
C: ccccccccccccccccc
DD:: DDDDDDDDDDDD
E: EEEEEEEEEEEEE

AA: aaaaaaaaaaaaaaaaa2
BB:: bbbbbbbbbbbbbbbb2
C:: ccccccccccccccccc2
DD: DDDDDDDDDDDD2
E: EEEEEEEEEEEEE2
Output:
AA,BB,C,DD,E
aaaaaaaaaaaaaaaaa,bbbbbbbbbbbbbbbb,ccccccccccccccccc,DDDDDDDDDDDD,EEEEEEEEEEEEE
aaaaaaaaaaaaaaaaa2,bbbbbbbbbbbbbbbb2,ccccccccccccccccc2,DDDDDDDDDDDD2,EEEEEEEEEEEEE2

This GNU awk should do:
awk -v RS='' -F':* ?|\n' 'NR==1{print $1","$3","$5","$7","$9} {print $2","$4","$6","$8","$10}' t
AA,BB,C,DD,E
aaaaaaaaaaaaaaaaa,bbbbbbbbbbbbbbbb,ccccccccccccccccc,DDDDDDDDDDDD,EEEEEEEEEEEEE
aaaaaaaaaaaaaaaaa2,bbbbbbbbbbbbbbbb2,ccccccccccccccccc2,DDDDDDDDDDDD2,EEEEEEEEEEEEE
RS='' sets the record separator to empty, so awk works in block (paragraph) mode, reading one blank-line-separated block per record.
-F':* ?|\n' sets the field separator to one or more colons followed by an optional space (covering both : and ::), or a newline.
NR==1{print $1","$3","$5","$7","$9} prints the header from the first block.
{print $2","$4","$6","$8","$10} prints the data fields of every block.
A more generic solution that should work with more fields:
awk -v RS='' -F':* ?|\n' 'NR==1{for(i=1;i<=NF-2;i+=2) printf "%s,",$i;print $i} {for(i=2;i<=NF-2;i+=2) printf "%s,",$i;print $i}' file
AA,BB,C,DD,E
aaaaaaaaaaaaaaaaa,bbbbbbbbbbbbbbbb,ccccccccccccccccc,DDDDDDDDDDDD,EEEEEEEEEEEEE
aaaaaaaaaaaaaaaaa2,bbbbbbbbbbbbbbbb2,ccccccccccccccccc2,DDDDDDDDDDDD2,EEEEEEEEEEEEE
P.S. If not all records have all IDs, then it's a whole other story to program.
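If the IDs can vary per block, one approach is to key fields by name rather than by position, so a missing ID simply becomes an empty column. A sketch of that idea (an extension, not part of the answer above), assuming the first block defines the full header set, values contain no colons, and GNU awk for delete on a whole array:
awk -v RS='' -F'\n' '
NR == 1 {                              # learn the column names from the first block
    nf = NF
    for (i = 1; i <= nf; i++) {
        split($i, kv, /:+ */)
        name[i] = kv[1]
        printf "%s%s", name[i], (i < nf ? "," : "\n")
    }
}
{
    delete val                         # GNU awk: clear the whole array
    for (i = 1; i <= NF; i++) {
        split($i, kv, /:+ */)
        val[kv[1]] = kv[2]
    }
    for (i = 1; i <= nf; i++)          # emit in header order, empty if missing
        printf "%s%s", val[name[i]], (i < nf ? "," : "\n")
}' file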

Using Text::CSV to handle edge cases:
use strict;
use warnings;
use Text::CSV 'csv';

my $input = do { local $/; readline };  # input from STDIN or filename argument
my @aoh;
my %headers;

foreach my $block (split /\n\n+/, $input) {
    my %row;
    foreach my $line (split /^/, $block) {
        if ($line =~ m/^([^:]+):+\s*(.*)$/) {
            $row{$1} = $2;
            $headers{$1} = 1;
        }
    }
    push @aoh, \%row;
}

csv(in => \@aoh, out => *STDOUT, headers => [sort keys %headers],
    encoding => 'UTF-8', auto_diag => 2);

Related

Using sed on line break element

Hello, let's say I have a file such as:
$OUT some text
some text
some text
$OUT
$OUT
$OUT
How can I use sed to replace the 3 consecutive $OUT lines with "replace-thing" and get:
$OUT some text
some text
some text
replace-thing
With sed:
sed -n '1h; 1!H; ${g; s/\$OUT\n\$OUT\n\$OUT/replace-thing/g; p;}' file
GNU sed does not require the semicolon after p.
With commentary
sed -n ' # without printing every line:
# next 2 lines read the entire file into memory
1h # line 1, store current line in the hold space
1!H # not line 1, append a newline and current line to hold space
# now do the search-and-replace on the file contents
${ # on the last line:
g # replace pattern space with contents of hold space
s/\$OUT\n\$OUT\n\$OUT/replace-thing/g # do replacement
p # and print the revised contents
}
' file
This is the main reason I only use sed for very simple things: once you start using the lesser-used commands, you need extensive commentary to understand the program.
Note the commented version does not work on the BSD-derived sed on MacOS -- the comments break it, but removing them is OK.
In plain bash:
pattern=$'$OUT\n$OUT\n$OUT' # using ANSI-C quotes
contents=$(< file)
echo "${contents//$pattern/replace-thing}"
And the perl one-liner:
perl -0777 -pe 's/\$OUT(\n\$OUT){2}/replace-thing/g' file
For this particular task, I recommend using awk instead (hope that's an option too).
Update: to replace every 3 consecutive $OUT lines, use the following (thanks to @thanasisp and @glenn jackman):
cat input.txt | awk '
BEGIN {
    i = 0
    p = "$OUT"           # pattern to match
    n = 3                # number of consecutive matches
    r = "replace-thing"
}
$0 == p {
    ++i
    if (i == n) {
        print(r)
        i = 0            # reset counter (optional)
    }
}
$0 != p {
    i = 0
    print($0)
}'
If you just want to replace the 3rd $OUT occurrence, use:
cat input.txt | awk '
BEGIN {
    i = 0
    p = "\\$OUT"         # pattern to match ($ must be escaped in a regex)
    n = 3                # Nth match
    r = "replace-thing"
}
$0 ~ p {
    ++i
    if (i == n) {
        print(r)
    }
}
$0 !~ p || i != n {      # print every line except the Nth match itself
    print($0)
}'
This might work for you (GNU sed):
sed -E ':a;N;s/[^\n]*/&/3;Ta;/^(\$OUT\n?){3}$/d;P;D' file
Gather up 3 lines in the pattern space and if those 3 lines each contain $OUT, delete them. Otherwise, print/delete the first line and repeat.

AWK to find first occurrence of string and assign to variable for compare

I have written the following line of code, which splits the string at the first occurrence of the delimiter:
echo "$line" | awk -F':' '{ st = index($0,":");print "field1: "$1 "
=> " substr($0,st+1)}';
But I don't want to display it; I want to capture both parts in variables, so I tried the following code:
explodetext="$line" | awk -F':' '{ st = index($0,":")}';
Sample data:
id:1
url:http://test.com
Expected OutPUt will be:
key=id
val=1
key=url
val=http://test.com
but it is not working as expected. Any solution?
Thanks
Your code, expanded:
echo "$line" \
| awk -F':' '
  {
    st = index($0, ":")
    print "field1: " $1 " => " substr($0, st+1)
  }'
The output of this appears merely to split the line according to the first colon. From the sample data you've provided, it seems that your lines contain two fields, which are separated by the first colon found. This means you can't safely use awk's field separator to find your data (though you can use it for field names), making index() a reasonable approach.
One strategy might be to place your input into an array, for assessment:
#!/usr/bin/awk -f
BEGIN {
    FS = ":"
}
{
    record[$1] = substr($0, index($0, ":") + 1);
}
END {
    if (record["id"] > 0) {
        printf("Record ID %d had a value of %s.\n", record["id"], record["url"])
    } else {
        print "No valid records found."
    }
}
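For the exact key=/val= output shown in the question, a minimal sketch built on the same index() idea (assuming one key:value pair per line):
awk '{
    st = index($0, ":")               # the first colon splits key from value
    print "key=" substr($0, 1, st - 1)
    print "val=" substr($0, st + 1)
}' input.txt
This keeps values that themselves contain colons (such as URLs) intact, because only the first colon is used as the split point.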
I suppose your text file input.txt is stored in the format given below:
id:1
url:http://test1.com
You could use the piece of code below, say awkscript, to achieve what you wish to do:
#!/bin/bash
awk '
BEGIN { FS = ":" }
{
    if ($2 > 0) {                 # an id line: a numeric id follows the colon
        if (getline > 0) {        # read the next line (the url line)
            st = index($0, ":")
            url = substr($0, st+1)
            system("echo Do something with " url)
        }
    }
}' "$1"
Run the code as ./awkscript input.txt
Note: I assume that the input file contains only one id/url pair, as you confirmed in your comment.

file manipulation with command line tools on linux

I want to transform a file from this format
1;a;34;34;a
1;a;34;23;d
1;a;34;23;v
1;a;4;2;r
1;a;3;2;d
2;f;54;3;f
2;f;34;23;e
2;f;23;5;d
2;f;23;23;g
3;t;26;67;t
3;t;34;45;v
3;t;25;34;h
3;t;34;23;u
3;t;34;34;z
to this format
1;a;34;34;a;34;23;d;34;23;v;4;2;r;3;2;d
2;f;54;3;f;34;23;e;23;5;d;23;23;g;;;
3;t;26;67;t;34;45;v;25;34;h;34;23;u;34;34;z
These are CSV files, so it should work with awk or sed, but I have failed so far. If the first value is the same, I want to append the last three values to the first line, and this should run until the last entry in the file.
Here is some code in awk, but it does not work:
#!/usr/bin/awk -f
BEGIN{ FS = " *; *"}
{ ORS = "\;" }
{
x = $1
print $0
}
{ if (x == $1)
print $3, $4, $5
else
print "\n"
}
END{
print "\n"
}
$ cat tst.awk
BEGIN { FS=OFS=";" }
{ curr = $1 FS $2 }
curr == prev {
sub(/^[^;]*;[^;]*/,"")
printf "%s", $0
next
}
{
printf "%s%s", (NR>1?ORS:""), $0
prev = curr
}
END { print "" }
$ awk -f tst.awk file
1;a;34;34;a;34;23;d;34;23;v;4;2;r;3;2;d
2;f;54;3;f;34;23;e;23;5;d;23;23;g
3;t;26;67;t;34;45;v;25;34;h;34;23;u;34;34;z
If I understand you correctly that you want to build a line from fields 3-5 of all lines with the same first two fields (preceded by those two fields), then
awk -F \; 'key != $1 FS $2 { if(NR != 1) print line; key = $1 FS $2; line = key } { line = line FS $3 FS $4 FS $5 } END { print line }' filename
That is
key != $1 FS $2 { # if the key (first two fields) changed
if(NR != 1) print line; # print the line (except at the very
# beginning, to not get an empty line there)
key = $1 FS $2 # remember the new key
line = key # and start building the next line
}
{
line = line FS $3 FS $4 FS $5 # take the value fields from each line
}
END { # and at the very end,
print line # print the last line (that the block above
} # cannot handle)
You got good answers in awk. Here is one in perl:
perl -F';' -lane'
    $key = join ";", @F[0..1];            # Establish your key
    $seen{$key}++ or push @rec, $key;     # Remember the order
    push @{ $h{$key} }, @F[2..$#F]        # Build your data structure
    }{
    $, = ";";                             # Set the output list separator
    print $_, @{ $h{$_} } for @rec' file  # Print as per order
This is going to seem a lot more complicated than the other answers, but it's adding a few things:
It computes the maximum number of fields from all built up lines
Appends any missing fields as blanks to the end of the built up lines
The POSIX awk on a Mac doesn't maintain the order of array elements, even when the keys are numbered, when using the for(key in array) syntax. To maintain the output order, you can keep track of it as I've done, or pipe to sort afterwards.
Having matching numbers of fields in the output appears to be a requirement per the specified output. Without knowing what it should be, this awk script is built to load all the lines first, compute the maximum number of fields in an output line, then output the lines, with any adjustments, in order.
#!/usr/bin/awk -f
BEGIN { FS = OFS = ";" }
{
    key = $1
    # create an order array for the Mac's version of awk
    if (key != last_key) {
        order[++key_cnt] = key
        last_key = key
    }
    val = a[key]
    # build up an output line in array a for the given key
    start = (val == "" ? $1 OFS $2 : val)
    a[key] = start OFS $3 OFS $4 OFS $5
    # count number of fields for each built up output line
    nf_a[key] += 3
}
END {
    # compute the max number of fields per any built up output line
    for (k in nf_a) {
        nf_max = (nf_a[k] > nf_max ? nf_a[k] : nf_max)
    }
    for (i = 1; i <= key_cnt; i++) {
        key = order[i]
        # compute the number of blank fields necessary
        nf_pad = nf_max - nf_a[key]
        blank_flds = nf_pad != 0 ? sprintf("%*s", nf_pad, OFS) : ""
        gsub(/ /, OFS, blank_flds)
        # output lines along with appended blank fields in order
        print a[key] blank_flds
    }
}
If the desired number of fields in the output lines is known ahead of time, simply appending the blank fields on key switch, without all these arrays, would make a simpler script (see the sketch after the output below).
I get the following output:
1;a;34;34;a;34;23;d;34;23;v;4;2;r;3;2;d
2;f;54;3;f;34;23;e;23;5;d;23;23;g;;;
3;t;26;67;t;34;45;v;25;34;h;34;23;u;34;34;z
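As noted above, if the number of value groups per line is known ahead of time, the buffering arrays can be dropped. A minimal sketch, assuming five groups per output line (the count in this sample):
#!/usr/bin/awk -f
BEGIN { FS = OFS = ";"; groups = 5 }   # assumed fixed group count
function flush(   i) {
    if (line == "") return
    for (i = cnt; i < groups; i++)     # pad missing groups with empty fields
        line = line ";;;"
    print line
}
$1 != prev {
    flush()                            # key switch: emit the finished line
    prev = $1
    line = $1 OFS $2
    cnt = 0
}
{
    line = line OFS $3 OFS $4 OFS $5
    cnt++
}
END { flush() }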

Check variables from different lines with awk

I want to use awk to combine values from multiple lines of different lengths into one line when they match. In the following sample, match on the first field and aggregate the values of the second field into a list.
Input, sample csv:
222;a;DB;a
222;b;DB;a
555;f;DB;a
4444;a;DB;a
4444;d;DB;a
4444;z;DB;a
Output:
222;a|b
555;f
4444;a|d|z
How can I write an awk expression (or maybe some other shell expression) to check whether the first field value matches that of the next/previous line, and then print the second-field values aggregated and separated by a pipe?
awk '
BEGIN { FS = ";" }
{
    if ($1 == prev) { sec = sec "|" $2 }
    else {
        if (prev) { print prev ";" sec }
        prev = $1; sec = $2
    }
}
END { if (prev) { print prev ";" sec } }'
This, as you requested, checks the consecutive lines.
Does this one-liner work?
awk -F';' '{a[$1]=a[$1]?a[$1]"|"$2:$2;} END{for(x in a) print x";"a[x]}' file
tested here:
kent$ cat a
222;a;DB;a
222;b;DB;a
555;f;DB;a
4444;a;DB;a
4444;d;DB;a
4444;z;DB;a
kent$ awk -F';' '{a[$1]=a[$1]?a[$1]"|"$2:$2;} END{for(x in a) print x";"a[x]}' a
555;f
4444;a|d|z
222;a|b
If you want the output sorted, add a | sort at the end; for numeric key order, see the sketch below.
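For numeric ordering on the first field, that could look like this (a sketch):
awk -F';' '{a[$1]=a[$1]?a[$1]"|"$2:$2;} END{for(x in a) print x";"a[x]}' file |
sort -t';' -k1,1n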
Slightly convoluted, but does the job:
awk -F';' '
{
    if (a[$1]) {
        a[$1] = a[$1] "|" $2
    } else {
        a[$1] = $2
    }
}
END {
    for (k in a) {
        print k ";" a[k]
    }
}' file
Assuming that you have set the field separator (-F) to ;:
{
    if ($1 != last) { print s; s = ""; }
    last = $1;
    s = s "|" $2;
}
END {
    print s;
}
The first line and the first character are slightly wrong, but that's an exercise for the reader :-). Two simple ifs suffice to fix that; one possible fix is sketched below.
(Edit: missed out the last line.)
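One possible version with those fixes applied (a sketch; it also prepends the key, which the fragment above leaves out):
awk -F';' '
{
    if ($1 != last) {
        if (NR > 1) print last ";" s   # no spurious print before the first group
        last = $1
        s = $2                         # no leading "|" on a fresh group
    } else {
        s = s "|" $2
    }
}
END { if (NR) print last ";" s }' file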
This should work. Command:
awk -F';' '{if(a[$1]){a[$1]=a[$1]"|"$2}else{a[$1]=$2}}END{for (i in a){print i";" a[i] }}' fil
Input:
222;a;DB;a
222;b;DB;a
555;f;DB;a
4444;a;DB;a
4444;d;DB;a
4444;z;DB;a
Output:
222;a|b
555;f
4444;a|d|z

Sorting List and Adding Together Amounts Shellscript

I have a list such as:
10,Car Tyres
8,Car Tyres
4,Wheels
18,Crowbars
5,Jacks
5,Jacks
8,Jacks
The first number is quantity, second is item name. I need to get this list so that it only shows each item once and it adds together the quantity if the item appears more than once. The output of this working correctly would be:
18,Car Tyres
4,Wheels
18,Crowbars
18,Jacks
This will need to work on lists in this format of a few thousand lines, preferably coded as a Linux shell script. Any help appreciated, thanks!
awk -F"," '{ t[$2] = t[$2] + $1 }
END{
for(o in t){
print o, t[o]
}
}' file
output
$ ./shell.sh
Crowbars 18
Wheels 4
Car Tyres 18
Jacks 18
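The column layout differs from the requested quantity,item format (and for(o in t) traversal order is unspecified); a variant that emits the requested format (a sketch):
awk -F"," '{ t[$2] += $1 }
END {
    for (o in t)
        print t[o] "," o   # quantity first, as in the requested output
}' file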
How about a Perl script?
#!/usr/bin/perl -w
use strict;

my %parts;
while (<>) {
    chomp;
    my @fields = split /,/, $_;
    if (scalar @fields > 1) {
        if ($parts{$fields[1]}) {
            $parts{$fields[1]} += $fields[0];
        } else {
            $parts{$fields[1]} = $fields[0];
        }
    }
}
foreach my $k (keys %parts) {
    print $parts{$k}, ",$k\n";
}
awk -v FS=, '{
    if (!($2 in a)) {    # parentheses needed: "! $2 in a" parses as (!$2) in a
        a[$2] = $1;
    }
    else {
        a[$2] += $1;
    }
}
END {
    for (name in a) {
        printf("%s\t%d\n", name, a[name]);
    }
}'
Look at:
man sort
man awk
The actual command you need is:
sort -t, -k2 yourfile.txt | awk ......
You could also do this entirely in awk (see the related question "Sum by group").
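One possible completion of that pipeline (a sketch; the answer leaves the awk part open), summing consecutive runs of the same item after the sort:
sort -t, -k2 yourfile.txt | awk -F, '
$2 != item {                       # item changed: emit the finished total
    if (NR > 1) print total "," item
    item = $2
    total = 0
}
{ total += $1 }
END { if (NR) print total "," item }'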
