parse text vertical to horizontal - linux

I'm looking to parse the following data:
T
E
S
T
_
7
TTTTTTT
EEEEEEE
SSSSSSS
TTTTTTT
_______
5679111
012
into something like:
TEST_7
TEST_5, TEST_6, TEST_7, TEST_9, TEST_10, TEST_11, TEST_12
Any suggestions would help. Thank you.

awk to the rescue!
This is basically a transpose operation
awk 'BEGIN {FS=""}
{for(i=1;i<=NF;i++) a[NR,i]=$i;
if(max<NF)max=NF}
END {for(i=1;i<=max;i++)
{for(j=1;j<=NR;j++) printf "%s",a[j,i];
print ""}}' file
TEST_7TEST_5
TEST_6
TEST_7
TEST_9
TEST_10
TEST_11
TEST_12
You need to explain the rules for transforming this into your desired layout.

Python:
#!/usr/bin/python
txt='''\
T
E
S
T
_
7
TTTTTTT
EEEEEEE
SSSSSSS
TTTTTTT
_______
5679111
012 '''
row_len=max(len(line.rstrip()) for line in txt.splitlines())
arr=[list('{:{w}}'.format(line.rstrip(), w=row_len)) for line in txt.splitlines()]
print('\n'.join(''.join(t) for t in zip(*arr)))
Or, awk:
awk 'BEGIN{RS="[ ]*\n"}
{lines[NR]=$0
max=length($0)>max ? length($0) : max }
END{ for (i=1; i in lines; i++)
lines[i]=sprintf("%-*s", max, lines[i])
for (i=1;i<=max; i++){
for (j=1; j in lines; j++)
printf "%s", substr(lines[j], i, 1)
print ""
}
}' file
Prints:
TEST_7TEST_5
TEST_6
TEST_7
TEST_9
TEST_10
TEST_11
TEST_12

In awk (well GNU awk for -F ''):
$ awk -F '' '
NR!=1 && NF!=p {
for(i=1;i<=p;i++)
printf "%s%s",a[i],(i==p?ORS:"")
delete a
p=NF }
NR==1 || NF==p {
for(i=1;i<=NF;i++)
a[i]=a[i] $i
p=NF
j++ }
END {
for(i=1;i<=p;i++)
printf "%s%s",a[i],(i==p?ORS:", ") }
' file
TEST_7
TEST_5 , TEST_6 , TEST_7 , TEST_9 , TEST_10, TEST_11, TEST_12
It buffers the columns and prints the buffered output whenever the record length (actually NF) changes.
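For comparison, here is a minimal Python sketch of the same idea: group consecutive lines of equal length, then transpose each group. It assumes the last input line is right-aligned with leading spaces ("    012"), which the padded awk output above suggests; the function name vertical_to_horizontal is just an illustrative choice.

```python
from itertools import groupby

def vertical_to_horizontal(text):
    """Transpose each run of equal-length lines; join the columns with ', '."""
    lines = [line for line in text.splitlines() if line]
    out = []
    for _, group in groupby(lines, key=len):
        columns = zip(*group)  # one tuple of characters per column
        # rstrip() drops the padding spaces at the bottom of short columns
        out.append(", ".join("".join(col).rstrip() for col in columns))
    return out

# Assumption: the final line is padded on the left to the same width as the others.
sample = (
    "T\nE\nS\nT\n_\n7\n"
    "TTTTTTT\nEEEEEEE\nSSSSSSS\nTTTTTTT\n_______\n5679111\n    012\n"
)

for row in vertical_to_horizontal(sample):
    print(row)
```

If the last line is not padded, it forms its own group and is transposed separately, so the padding matters.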

Related

In bash how to move row field to column in a text file

I have a .txt file with this record:
field_1 value01a value01b value01c
field_2 value02
field_3 value03a value03b value03c
field_1 value11
field_2 value12a value12b
field_3 value13
field_1 value21
field_2 value22
field_3 value23
...
field_1 valuen1
field_2 valuen2
field_3 valuen3
I would like to convert them like that:
field1 field2 field3
value01a value01b value01c value02 value03a value03b value03c
value11 value12a value12b value13
value21 value22 value23
...
valuen1 valuen2 valuen3
I have tried something like:
awk '{for (i = 1; i <NR; i ++) FNR == i {print i, $ (i + 1)}}' filename
or like
awk '
{
for (i=1; i<=NF; i++) {
a[NR,i] = $i
}
}
NF>p { p = NF }
END {
for(j=1; j<=p; j++) {
str=a[1,j]
for(i=2; i<=NR; i++){
str=str" "a[i,j]
}
print str
}
}'
but I can't get it to work.
I would like the values to be transposed, with each tuple of values belonging to a field aligned with the others.
Any suggestions?
Thank you in advance.

I have downloaded your bigger sample file. And here is what I have come up with:
awk -v OFS='\t' -v RS= '
((n = split($0, a, / {2,}| *\n/)) % 2) == 0 {
# print header
if (NR==1)
for (i=1; i<=n; i+=2)
printf "%s", a[i] (i < n-1 ? OFS : ORS)
# print all records
for (i=2; i<=n; i+=2)
printf "%s", a[i] (i < n ? OFS : ORS)
}' reclamiTestFile.txt | column -t -s $'\t'

Could you please try the following? It was written and tested with the shown samples in GNU awk.
awk '
{
first=$1
$1=""
sub(/^ +/,"")
if(!arr[first]++){
++indArr
counter[indArr]=first
}
++count[first]
arr[first OFS count[first]]=$0
}
END{
for(j=1;j<=indArr;j++){
printf("%s\t%s",counter[j],j==indArr?ORS:"\t")
}
for(i=1;i<=FNR;i++){
for(j=1;j<=indArr;j++){
if(arr[counter[j] OFS i]){
printf("%s\t%s",arr[counter[j] OFS i],j==indArr?ORS:"\t")
}
}
}
}' Input_file | column -t -s $'\t'
The column command is taken from @anubhava's answer in this thread.
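If awk is not a hard requirement, the same regrouping can be sketched in Python. This is only a sketch under assumptions: each line is "name value...", and a new record starts whenever a field name repeats. The function name records_to_rows is hypothetical.

```python
def records_to_rows(text):
    """Return (header, rows); a new record starts when a field name repeats."""
    header, records, current = [], [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, values = line.partition(" ")
        if name in current:        # name seen again: previous record is complete
            records.append(current)
            current = {}
        if name not in header:     # remember field names in first-seen order
            header.append(name)
        current[name] = values.strip()
    if current:
        records.append(current)
    rows = [[rec.get(name, "") for name in header] for rec in records]
    return header, rows

sample = (
    "field_1 value01a value01b value01c\n"
    "field_2 value02\n"
    "field_3 value03a value03b value03c\n"
    "field_1 value11\n"
    "field_2 value12a value12b\n"
    "field_3 value13\n"
)

header, rows = records_to_rows(sample)
print("\t".join(header))
for row in rows:
    print("\t".join(row))
```

The output is tab-separated, so it can be piped through column -t -s $'\t' just like the awk versions.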

How to print total count of multiple strings from a text file using awk

I'm extremely new to awk, so I'm having a bit of trouble with this problem. I need to write a script that prints the counts of three strings: "They", "He" and "She". I can only use awk, and the match is case-sensitive. So far it only prints the number of times "They" appears in the text file (which is an essay), not the others. Some input on how to do this using only awk would be great. Here's what I have so far:
awk 'BEGIN {print "They" " " "He" " " "She"} #printing header
{for (i=0;i<=NF;i++)if ( $i =="They" ) numA++;
if ( $i =="He" ) numB++;
if ( $i =="She" ) numC++ } END {print numA," ", numB, " ", numC}' myFile.txt
The expected output should be:
They He She
24 16 17

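You're missing the braces around the body of the 'for' loop, so only the first 'if' runs inside it. The loop should also start at i=1, because $0 is the whole line. You should have:
awk 'BEGIN {print "They" " " "He" " " "She"} #printing header
{for (i=1;i<=NF;i++)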
{
if ( $i =="They" ) numA++;
if ( $i =="He" ) numB++;
if ( $i =="She" ) numC++;
}
} END {print numA," ", numB, " ", numC}' myFile.txt

Assuming your input is space-separated lines of individual words, here's how to do it:
awk '
BEGIN{
numWords = split("They He She",tmp)
for (i in tmp) {
words[tmp[i]]
}
}
{
for (i=1;i<=NF;i++) {
if ($i in words) {
cnt[$i]++
}
}
}
END {
for (wordNr=1; wordNr <= numWords; wordNr++) {
printf "%s%s", tmp[wordNr], (wordNr<numWords?OFS:ORS)
}
for (wordNr=1; wordNr <= numWords; wordNr++) {
printf "%d%s", cnt[tmp[wordNr]], (wordNr<numWords?OFS:ORS)
}
}' file
If that's not what your input is then update your question to show it.
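Although the question asks for awk only, a short Python sketch of the same counting logic may help when checking the numbers. It makes the same assumption as above: words are whitespace-separated and matched exactly, so "He," with a trailing comma is not counted as "He". The function name count_words is illustrative.

```python
from collections import Counter

def count_words(text, words=("They", "He", "She")):
    """Count exact, case-sensitive occurrences of each word among whitespace-separated tokens."""
    counts = Counter(text.split())
    return [counts[word] for word in words]

words = ("They", "He", "She")
essay = "They said He and She saw He . they left"
print(" ".join(words))
print(" ".join(str(n) for n in count_words(essay, words)))
```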

Awk between two patterns with pattern in the middle

Hi, I am looking for an awk command that finds two patterns and prints the data between them to
a file, but only if a third pattern appears in between.
For example:
Start
1
2
middle
3
End
Start
1
2
End
And the output will be:
Start
1
2
middle
3
End
I found awk '/patterns1/, /patterns2/' path > text.txt on the web,
but I only need the blocks that contain the third pattern in the middle.

And here is a solution without flags:
$ awk 'BEGIN{RS="End"}/middle/{printf "%s", $0; print RT}' file
Start
1
2
middle
3
End
Explanation: RS is the record separator. Setting it to "End" makes each record run up to the next "End".
The /middle/ pattern then selects the records that contain "middle". For each of those we print the record itself ($0), followed by the separator text that ended it (RT).

This awk should work:
awk '$1=="Start"{ok++} ok>0{a[b++]=$0} $1=="middle"{ok++} $1=="End"{if(ok>1) for(i=0; i<length(a); i++) print a[i]; ok=0;b=0;delete a}' file
Start
1
2
middle
3
End
Expanded:
awk '$1 == "Start" {
ok++
}
ok > 0 {
a[b++] = $0
}
$1 == "middle" {
ok++
}
$1 == "End" {
if (ok > 1)
for (i=0; i<length(a); i++)
print a[i];
ok=0;
b=0;
delete a
}' file

Just use some flags with awk:
/Start/ {
start_flag=1
n=NR # remember where this block starts
}
/middle/ {
mid_flag=1
}
start_flag {
lines[NR]=$0
}
/End/ {
if (start_flag && mid_flag)
for(i=n;i<=NR;i++)
print lines[i]
start_flag=mid_flag=0
delete lines
}

A modified version of user000001's awk:
awk '/middle/{printf "%s%s\n",$0,RT}' RS="End" file
EDIT:
Added test for Start tag
awk '/Start/ && /middle/{printf "%s%s\n",$0,RT}' RS="End" file

This will work with any modern awk:
awk '/Start/{f=1;rec=""} f{rec=rec $0 ORS} /End/{if (rec~/middle/) printf "%s",rec}' file
The solutions that set RS to "End" are gawk-specific, which may be fine but it's definitely worth mentioning.
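The buffering approach is not tied to awk. Here is a minimal Python sketch of it, assuming the markers occupy whole lines and are compared exactly rather than as regular expressions; the function name blocks_with_middle and its parameters are illustrative.

```python
def blocks_with_middle(lines, start="Start", middle="middle", end="End"):
    """Yield each start..end block (a list of lines) that contains a middle line."""
    block = None
    for line in lines:
        if line.strip() == start:
            block = []                 # begin a new block, dropping any unfinished one
        if block is not None:
            block.append(line)
            if line.strip() == end:
                if any(item.strip() == middle for item in block):
                    yield block
                block = None           # ignore lines until the next start marker

sample = "Start\n1\n2\nmiddle\n3\nEnd\nStart\n1\n2\nEnd".split("\n")
for block in blocks_with_middle(sample):
    print("\n".join(block))
```

Lines outside any Start..End pair are ignored, which matches the expected output in the question.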

Search and replace null and dot in a column of file

I want to find and replace empty (null) values and dots in a column of a file using awk or sed.
The file's content is:
02-01-12 28.46
02-02-12 27.15
02-03-12
02-04-12 27.36
02-05-12 47.57
02-06-12 27.01
02-07-12 27.41
02-08-12 27.27
02-09-12 27.39
02-10-12 .
02-11-12 27.44
02-12-12 49.93
02-13-12 26.99
02-14-12 27.47
02-15-12 27.21
02-16-12 27.48
02-17-12 27.66
02-18-12 27.15
02-19-12 51.74
02-20-12 27.37
The dots and null values can appear in any row of the file. I want to replace each of them with the value from the row above, like this:
02-01-12 28.46
02-02-12 27.15
02-03-12 27.15 ****** replace with the above value
02-04-12 27.36
02-05-12 47.57
02-06-12 27.01
02-07-12 27.41
02-08-12 27.27
02-09-12 27.39
02-10-12 27.39 ****** replace with the above value
02-11-12 27.44
02-12-12 49.93
02-13-12 26.99
02-14-12 27.47
02-15-12 27.21
02-16-12 27.48
02-17-12 27.66
02-18-12 27.15
02-19-12 51.74
02-20-12 27.37

This might work for you (GNU sed):
sed -i '$!N;s/^\(.\{9\}\(.*\)\n.\{9\}\)\.\?$/\1\2/;P;D' file

awk 'BEGIN {prev = "00.00"} NF < 2 || $2 == "." {$2 = prev} {prev = $2; print}' filename
If you have multiple columns which might have missing data:
awk 'BEGIN {p = "00.00"} {for (i = 1; i <= NF; i++) {if (! $i || $i == ".") {if (prev[i]) {$i = prev[i]} else {$i = p}}; prev[i] = $i}; print}' filename

The following awk script should work:
BEGIN {
last="00.00"
}
{
if ($2 != "" && $2 != ".") {
last=$2
}
print $1 " " last
}

$ awk -v prev=00.00 'NF<2 || $2=="." { print $1, prev; next }{prev=$2}1' input-file
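All of the answers above carry the last good value forward. A hedged Python sketch of the same logic follows. It assumes two whitespace-separated columns and uses "00.00" as the fallback when the first row has no value, as the awk answers do; the function name fill_missing is illustrative.

```python
def fill_missing(lines, default="00.00"):
    """Replace a missing or '.' second column with the previous row's value."""
    prev = default
    out = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue                    # skip blank lines entirely
        if len(fields) < 2:
            fields.append(prev)         # second column absent
        elif fields[1] == ".":
            fields[1] = prev            # second column is a placeholder dot
        prev = fields[1]
        out.append(" ".join(fields))
    return out

sample = ["02-02-12 27.15", "02-03-12", "02-04-12 27.36", "02-10-12 ."]
for row in fill_missing(sample):
    print(row)
```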

unfolding a file on linux

I have a huge text file on Linux, approximately 400,000 lines, each 80 characters wide.
I need to "unfold" the file by merging every four lines into one,
ending up with 1/4 of the lines, each 80*4 characters long.
Any suggestions?

perl -pe 'chomp if (++$i % 4);'

An easier way to do it with awk would be:
awk '{ printf "%s", $0 } (NR % 4 == 0) { print "" }' filename
If the line count is not a multiple of four, the output would end without a trailing newline. Guarding against that is a little more complicated:
awk '{ printf "%s", $0 } (NR % 4 == 0) { print "" } END { if (NR % 4 != 0) print "" }' filename

I hope I understood your question correctly. You have an input file like this (except that your lines are longer):
abcdef
ghijkl
mnopqr
stuvwx
yz0123
456789
ABCDEF
You want output like this:
abcdefghijklmnopqrstuvwx
yz0123456789ABCDEF
The following awk program should do it:
{ line = line $0 }
(NR % 4) == 0 { print line; line = "" }
END { if (line != "") print line }
Run it like this:
awk -f merge.awk data.txt
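The same merge can be sketched in Python as a generator, so the whole file never has to be held in memory. The function name unfold and the default group size of 4 are illustrative choices.

```python
from itertools import islice

def unfold(lines, n=4):
    """Join every n consecutive lines into one; a shorter final group is kept."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, n))     # take up to n lines at a time
        if not chunk:
            return
        yield "".join(line.rstrip("\n") for line in chunk)

sample = ["abcdef", "ghijkl", "mnopqr", "stuvwx", "yz0123", "456789", "ABCDEF"]
for line in unfold(sample):
    print(line)
```

To run it on a file, iterate over the open file object instead of the sample list, for example: for line in unfold(open("data.txt")): print(line)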
