Put every N rows of input into a new column - linux

In bash, given input
1
2
3
4
5
6
7
8
...
and N = 5, for example, I want the output
1 6 11
2 7 12
3 8 ...
4 9
5 10
How do I do this?

Using a little-known gem, pr (the question's N=5 rows over 20 input lines means 20/5 = 4 columns):
$ seq 20 | pr -ts' ' --column 4
1 6 11 16
2 7 12 17
3 8 13 18
4 9 14 19
5 10 15 20
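pr takes a column count rather than the row count N, but the two are related by cols = ceil(lines / N). A minimal sketch of deriving it in the shell (the file name input.txt is just for illustration; note that pr balances columns itself, so when the line count is not an exact multiple of N the last column can come out shorter than N rows):

```shell
# Derive pr's column count from the desired rows-per-column N.
N=5
seq 20 > input.txt                      # sample input: 20 lines
lines=$(wc -l < input.txt)
cols=$(( (lines + N - 1) / N ))         # ceiling division: 20/5 = 4
pr -ts' ' --column "$cols" input.txt
```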

Replace the 5 in the following script with your number:
seq 20|xargs -n5| awk '{for (i=1;i<=NF;i++) a[i,NR]=$i; }END{
for(i=1;i<=NF;i++) {for(j=1;j<=NR;j++)printf a[i,j]" "; print "" }}'
output:
1 6 11 16
2 7 12 17
3 8 13 18
4 9 14 19
5 10 15 20
Note: seq 20 above is just there to generate a number sequence for testing; you don't need it in your real work.
EDIT
As pointed out by sudo_O, I've added a pure awk solution:
awk -vn=5 '{a[NR]=$0}END{ x=1; while (x<=n){ for(i=x;i<=length(a);i+=n) printf a[i]" "; print ""; x++; } }' file
Test:
kent$ seq 20| awk -vn=5 '{a[NR]=$0}END{ x=1; while (x<=n){ for(i=x;i<=length(a);i+=n) printf a[i]" "; print ""; x++; } }'
1 6 11 16
2 7 12 17
3 8 13 18
4 9 14 19
5 10 15 20
kent$ seq 12| awk -vn=5 '{a[NR]=$0}END{ x=1; while (x<=n){ for(i=x;i<=length(a);i+=n) printf a[i]" "; print ""; x++; } }'
1 6 11
2 7 12
3 8
4 9
5 10

Here's how I'd do it with awk:
awk -v n=5 '{ c++ } c>n { c=1 } { a[c] = (a[c] ? a[c] FS : "") $0 } END { for (i=1;i<=n;i++) print a[i] }'
Some simple testing:
seq 21 | awk -v n=5 '{ c++ } c>n { c=1 } { a[c] = (a[c] ? a[c] FS : "") $0 } END { for (i=1;i<=n;i++) print a[i] | "column -t" }'
Results:
1 6 11 16 21
2 7 12 17
3 8 13 18
4 9 14 19
5 10 15 20
And another:
seq 40 | awk -v n=6 '{ c++ } c>n { c=1 } { a[c] = (a[c] ? a[c] FS : "") $0 } END { for (i=1;i<=n;i++) print a[i] | "column -t" }'
Results:
1 7 13 19 25 31 37
2 8 14 20 26 32 38
3 9 15 21 27 33 39
4 10 16 22 28 34 40
5 11 17 23 29 35
6 12 18 24 30 36
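If neither pr nor awk is available, the same column-major layout can be sketched with split and paste (the chunk_ prefix is arbitrary; note paste separates columns with tabs and leaves trailing tabs on the short rows when the input doesn't divide evenly):

```shell
# Split the input into files of N lines each, then paste them side by side.
N=5
seq 12 | split -l "$N" - chunk_    # writes chunk_aa, chunk_ab, chunk_ac
paste chunk_*                      # tab-separated, one chunk per column
rm chunk_*
```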

Related

How to rearrange the columns using awk?

I have a file with 120 columns. A part of it is here with 12 columns.
A1 B1 C1 D1 A2 B2 C2 D2 A3 B3 C3 D3
4 4 5 2 3 3 2 1 9 17 25 33
5 6 4 6 8 2 3 5 3 1 -1 -3
7 8 3 10 13 1 4 9 -3 -15 -27 -39
9 10 2 14 18 0 5 13 -9 -31 -53 -75
11 12 1 18 23 -1 6 17 -15 -47 -79 -111
13 14 0 22 28 -2 7 21 -21 -63 -105 -147
15 16 -1 26 33 -3 8 25 -27 -79 -131 -183
17 18 -2 30 38 -4 9 29 -33 -95 -157 -219
19 20 -3 34 43 -5 10 33 -39 -111 -183 -255
21 22 -4 38 48 -6 11 37 -45 -127 -209 -291
I would like to rearrange it by bringing all A columns together (A1 A2 A3 A4) and similarly all Bs (B1 B2 B3 B4), Cs (C1 C2 C3 C4), Ds (D1 D2 D3 D4) together.
I am looking to print the columns as
A1 A2 A3 A4 B1 B2 B3 B4 C1 C2 C3 C4 D1 D2 D3 D4
My script is:
#!/bin/sh
sed -i '1d' input.txt
for i in {1..4};do
j=$(( 1 + $(( 3 * $(( i - 1 )) )) ))
awk '{print $'$j'}' input.txt >> output.txt
done
for i in {1..4};do
j=$(( 2 + $(( 3 * $(( i - 1 )) )) ))
awk '{print $'$j'}' input.txt >> output.txt
done
for i in {1..4};do
j=$(( 3 + $(( 3 * $(( i - 1 )) )) ))
awk '{print $'$j'}' input.txt >> output.txt
done
It is printing all in one column.
Here are two generic solutions that avoid hard-coding the field numbers from the Input_file; values can come in any order and they will be sorted automatically. Written and tested in GNU awk with the shown samples.
1st solution: Traverse all the lines and their respective fields, then sort by value to index the headers.
awk '
FNR==1{
for(i=1;i<=NF;i++){
arrInd[i]=$i
}
next
}
{
for(i=1;i<=NF;i++){
value[FNR,arrInd[i]]=$i
}
}
END{
PROCINFO["sorted_in"]="#val_num_asc"
for(i in arrInd){
printf("%s%s",arrInd[i],i==length(arrInd)?ORS:OFS)
}
for(i=2;i<=FNR;i++){
for(k in arrInd){
printf("%s%s",value[i,arrInd[k]],k==length(arrInd)?ORS:OFS)
}
}
}
' Input_file
Or, if you want the output in tabular format, apply a small tweak to the above solution:
awk '
BEGIN { OFS="\t" }
FNR==1{
for(i=1;i<=NF;i++){
arrInd[i]=$i
}
next
}
{
for(i=1;i<=NF;i++){
value[FNR,arrInd[i]]=$i
}
}
END{
PROCINFO["sorted_in"]="#val_num_asc"
for(i in arrInd){
printf("%s%s",arrInd[i],i==length(arrInd)?ORS:OFS)
}
for(i=2;i<=FNR;i++){
for(k in arrInd){
printf("%s%s",value[i,arrInd[k]],k==length(arrInd)?ORS:OFS)
}
}
}
' Input_file | column -t -s $'\t'
2nd solution: Almost the same concept as the 1st solution, but here the array is traversed inside conditions in a single loop rather than in separate loops in the END block.
awk '
BEGIN { OFS="\t" }
FNR==1{
for(i=1;i<=NF;i++){
arrInd[i]=$i
}
next
}
{
for(i=1;i<=NF;i++){
value[FNR,arrInd[i]]=$i
}
}
END{
PROCINFO["sorted_in"]="#val_num_asc"
for(i=1;i<=FNR;i++){
if(i==1){
for(k in arrInd){
printf("%s%s",arrInd[k],k==length(arrInd)?ORS:OFS)
}
}
else{
for(k in arrInd){
printf("%s%s",value[i,arrInd[k]],k==length(arrInd)?ORS:OFS)
}
}
}
}
' Input_file | column -t -s $'\t'
Is it just A,B,C,D,A,B,C,D all the way across? Something like this should work (quick and dirty and specific though it be):
awk -v OFS='\t' '{
for (i=0; i<4; ++i) { # i=0:A, i=1:B,etc.
for (j=0; 4*j+i<NF; ++j) {
if (i || j) printf "%s", OFS;
printf "%s", $(4*j+i+1);
}
}
printf "%s", ORS;
}'
A similar approach to @MarkReed's that manipulates the increment instead of the test condition can be written as:
awk '{
for (n=1; n<=4; n++)
for (c=n; c<=NF; c+=4)
printf "%s%s", ((c>1)?"\t":""), $c
print ""
}
' cols.txt
Example Use/Output
With your sample input in cols.txt you would have:
$ awk '{
> for (n=1; n<=4; n++)
> for (c=n; c<=NF; c+=4)
> printf "%s%s", ((c>1)?"\t":""), $c
> print ""
> }
> ' cols.txt
A1 A2 A3 B1 B2 B3 C1 C2 C3 D1 D2 D3
4 3 9 4 3 17 5 2 25 2 1 33
5 8 3 6 2 1 4 3 -1 6 5 -3
7 13 -3 8 1 -15 3 4 -27 10 9 -39
9 18 -9 10 0 -31 2 5 -53 14 13 -75
11 23 -15 12 -1 -47 1 6 -79 18 17 -111
13 28 -21 14 -2 -63 0 7 -105 22 21 -147
15 33 -27 16 -3 -79 -1 8 -131 26 25 -183
17 38 -33 18 -4 -95 -2 9 -157 30 29 -219
19 43 -39 20 -5 -111 -3 10 -183 34 33 -255
21 48 -45 22 -6 -127 -4 11 -209 38 37 -291
Here's a succinct generic solution that is not memory-bound, as RavinderSingh13's solution is. (That is, it does not store the entire input in an array for printing in END.)
BEGIN {
OFS="\t" # output field separator
}
NR==1 {
# Sort column titles
for (i=1;i<=NF;i++) { sorted[i]=$i; position[$i]=i }
asort(sorted)
# And print them
for (i=1;i<=NF;i++) { $i=sorted[i] }
print
next
}
{
# Make an array of our input line...
split($0,line)
for (i=1;i<=NF;i++) { $i=line[position[sorted[i]]] }
print
}
The idea here is that at the first line of input, we record the position of our columns in the input, then sort the list of column names with asort(). It is important here that column names are not duplicated, as they are used as the index of an array.
As we step through the data, each line is reordered by replacing each field with the value from the position as sorted by the first line.
It is important that you set your input field separator correctly (whitespace, tab, comma, whatever), and have the complete set of fields in each line, or output will be garbled.
Also, this doesn't create columns. You mentioned A4 in your question, but there is no A4 in your sample data. We are only sorting what is there.
Lastly, this is a GNU awk program, due to the use of asort().
Using any awk, for any number of tags (non-numeric leading strings in the header line) and/or numbers associated with them in the header line, including different counts of each letter, so you could have A1 A2 but then B1 B2 B3 B4. This reproduces the input order in the output and only stores one line at a time in memory:
$ cat tst.awk
BEGIN { OFS="\t" }
NR == 1 {
for ( fldNr=1; fldNr<=NF; fldNr++ ) {
tag = $fldNr
sub(/[0-9]+$/,"",tag)
if ( !seen[tag]++ ) {
tags[++numTags] = tag
}
fldNrs[tag,++numTagCols[tag]] = fldNr
}
}
{
out = ""
for ( tagNr=1; tagNr<=numTags; tagNr++ ) {
tag = tags[tagNr]
for ( tagColNr=1; tagColNr<=numTagCols[tag]; tagColNr++ ) {
fldNr = fldNrs[tag,tagColNr]
out = (out=="" ? "" : out OFS) $fldNr
}
}
print out
}
$ awk -f tst.awk file
A1 A2 A3 B1 B2 B3 C1 C2 C3 D1 D2 D3
4 3 9 4 3 17 5 2 25 2 1 33
5 8 3 6 2 1 4 3 -1 6 5 -3
7 13 -3 8 1 -15 3 4 -27 10 9 -39
9 18 -9 10 0 -31 2 5 -53 14 13 -75
11 23 -15 12 -1 -47 1 6 -79 18 17 -111
13 28 -21 14 -2 -63 0 7 -105 22 21 -147
15 33 -27 16 -3 -79 -1 8 -131 26 25 -183
17 38 -33 18 -4 -95 -2 9 -157 30 29 -219
19 43 -39 20 -5 -111 -3 10 -183 34 33 -255
21 48 -45 22 -6 -127 -4 11 -209 38 37 -291
or with different formats of tags and different numbers of columns per tag:
$ cat file
foo1 bar1 bar2 bar3 foo2 bar4
4 4 5 2 3 3
5 6 4 6 8 2
$ awk -f tst.awk file
foo1 foo2 bar1 bar2 bar3 bar4
4 3 4 5 2 3
5 8 6 4 6 2
The above assumes you want the output order per tag to match the input order, not be based on the numeric values after each tag so if you have input of A2 B1 A1 then the output will be A2 A1 B1, not A1 A2 B1.

How to print contents of column fields that have strings composed of "n" character/s using bash?

Say I have a file which contains:
22 30 31 3a 31 32 3a 32 " 0 9 : 1 2 : 2
30 32 30 20 32 32 3a 31 1 2 7 2 2 : 1
And, I want to print only the column fields that have string composed of 1 character. I want the output to be like this:
" 0 9 : 1 2 : 2
1 2 7 2 2 : 1
Then, I want to print only those strings that are composed of two characters, the output should be:
22 30 31 3a 31 32 3a 32
30 32 30 20 32 32 3a 31
I am a beginner and I really don't know how to do this. Thanks for your help!
Could you please try the following; I am trying it in a different way for the provided samples. Written and tested with the provided samples only.
To get the values before the BULK SPACE, try:
awk '
{
line=$0
while(match($0,/[[:space:]]+/)){
arr=arr>RLENGTH?arr:RLENGTH
start[arr]+=RSTART+prev_start
prev_start=RSTART
$0=substr($0,RSTART+RLENGTH)
}
var=substr(line,1,start[arr]-1)
sub(/ +$/,"",var)
print var
delete start
var=arr=""
}
' Input_file
Output will be as follows.
22 30 31 3a 31 32 3a 32
30 32 30 20 32 32 3a 31
To get the values after the BULK SPACE, try:
awk '
{
line=$0
while(match($0,/[[:space:]]+/)){
arr=arr>RLENGTH?arr:RLENGTH
start[arr]+=RSTART+prev_start
prev_start=RSTART
$0=substr($0,RSTART+RLENGTH)
}
var=substr(line,start[arr])
sub(/^ +/,"",var)
print var
delete start
var=arr=""
}
' Input_file
Output will be as follows:
" 0 9 : 1 2 : 2
1 2 7 2 2 : 1
You can try
awk '{for(i=1;i<=NF;++i)if(length($i)==1)printf("%s ", $i);print("")}'
For each field, check the length and print it if it's desired. You may pass the -F option to awk if it's not separated by blanks.
The awk script is expanded as:
for( i = 1; i <= NF; ++i )
if( length( $i ) == 1 )
printf( "%s ", $i );
print( "" );
The print outside loop is to print a newline after each input line.
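Only the length test changes for the two-character case; for example, on a small whitespace-separated sample (the input line here is made up for illustration):

```shell
# Keep only the fields that are exactly two characters long.
echo '22 x 3a y 31 z' |
awk '{ for (i = 1; i <= NF; ++i) if (length($i) == 2) printf("%s ", $i); print("") }'
```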
Assuming all the columns are tab-separated (So you can have a space as a column value like the second line of your sample), easy to do with a perl one-liner:
$ perl -F"\t" -lane 'BEGIN { $, = "\t" } print grep { /^.$/ } @F' foo.txt
" 0 9 : 1 2 : 2
1 2 7 2 2 : 1
$ perl -F"\t" -lane 'BEGIN { $, = "\t" } print grep { /^..$/ } @F' foo.txt
22 30 31 3a 31 32 3a 32
30 32 30 20 32 32 3a 31

Use printf to format list that is uneven

I have a small list of student grades; I need to format them side by side depending on the gender of the student, so one column is Male and the other Female. The problem is the list doesn't alternate male, female, male, female; it is uneven.
I've tried using printf to format the output so the 2 columns are side by side, but the format is ruined because of the uneven list.
Name Gender Mark1 Mark2 Mark3
AA M 20 15 35
BB F 22 17 44
CC F 19 14 25
DD M 15 20 42
EE F 18 22 30
FF M 0 20 45
This is the list I am talking about ^^
awk 'BEGIN {print "Male" " Female"} {if (NR!=1) {if ($2 == "M") {printf "%-s %-s %-s", $3, $4, $5} else if ($2 == "F") {printf "%s %s %s\n", $3, $4 ,$5}}}' text.txt
So I'm getting results like
Male Female
20 15 35 22 17 44
19 14 25
15 20 42 18 22 30
0 20 45
But I want it like this:
Male Female
20 15 35 22 17 44
15 20 42 19 14 25
0 20 45 18 22 30
I haven't added separators yet I'm just trying to figure this out, not sure if it would be better to put the marks into 2 arrays depending on gender then printing them out.
Another solution, which tries to address the case where the M/F counts are unbalanced:
$ awk 'NR==1 {print "Male\tFemale"}
NR>1 {k=$2;$1=$2="";sub(/ +/,"");
if(k=="M") m[++mc]=$0; else f[++fc]=$0}
END {max=mc>fc?mc:fc;
for(i=1;i<=max;i++) print (m[i]?m[i]:"-") "\t" (f[i]?f[i]:"-")}' file |
column -ts$'\t'
Male Female
20 15 35 22 17 44
15 20 42 19 14 25
0 20 45 18 22 30
Something like this?
awk 'BEGIN{format="%2s %2s %2s %2s\n";printf("Male Female\n"); }NR>1{if (s) { if ($2=="F") {printf(format, s, $3, $4, $5);} else {printf(format, $3,$4,$5,s);} s=""} else {s=sprintf("%2s %2s %2s", $3, $4, $5)}}' file
Another approach using awk
awk '
BEGIN {
print "Male\t\tFemale"
}
NR > 1 {
I = ++G[$2]
A[$2 FS I] = sprintf("%2d %2d %2d", $(NF-2), $(NF-1), $NF)
}
END {
M = ( G["M"] > G["F"] ? G["M"] : G["F"] )
for ( i = 1; i <= M; i++ )
print A["M" FS i] ? A["M" FS i] : OFS, A["F" FS i] ? A["F" FS i] : OFS
}
' OFS='\t' file
This might work for you (GNU sed):
sed -Ee '1c\Male Female' -e 'N;s/^.. M (.*)\n.. F(.*)/\1\2/;s/^.. F(.*)\n.. M (.*)/\2\1/' file
Change the header line. Then compare a pair of lines and re-arrange them as appropriate.
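For reference, a quick run against the sample data from the question (GNU sed assumed; grades.txt is just an illustrative file name):

```shell
printf '%s\n' 'Name Gender Mark1 Mark2 Mark3' \
    'AA M 20 15 35' 'BB F 22 17 44' 'CC F 19 14 25' \
    'DD M 15 20 42' 'EE F 18 22 30' 'FF M 0 20 45' > grades.txt
# Replace the header, then join each pair of lines with the M row first.
sed -Ee '1c\Male Female' \
    -e 'N;s/^.. M (.*)\n.. F(.*)/\1\2/;s/^.. F(.*)\n.. M (.*)/\2\1/' grades.txt
```

Note that this relies on the rows pairing up as M/F or F/M, which holds for the sample data but not for arbitrary orderings.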

How to append a special character in awk?

I have three files with different column and row size. For example,
ifile1.txt ifile2.txt ifile3.txt
1 2 2 1 6 3 8
2 5 6 3 8 9 0
3 8 7 6 8 23 6
6 7 6 23 6 44 5
9 87 87 44 7 56 7
23 6 6 56 8 78 89
44 5 76 99 0 95 65
56 6 7 99 78
78 7 8 106 0
95 6 7 110 6
99 6 4
106 5 34
110 6 4
Here ifile1.txt has 3 columns and 13 rows,
ifile2.txt has 2 columns and 7 rows,
ifile3.txt has 2 columns and 10 rows.
The 1st column of each ifile is the ID; this ID is sometimes missing in ifile2.txt and ifile3.txt.
I would like to make an outfile.txt with 4 columns: the 1st column would have all the IDs as in ifile1.txt, the 2nd column will be $3 from ifile1.txt, and the 3rd and 4th columns will be $2 from ifile2.txt and ifile3.txt respectively, with the missing stations in ifile2.txt and ifile3.txt assigned the special character '?'.
Desire output:
outfile.txt
1 2 6 ?
2 6 ? ?
3 7 8 8
6 6 8 ?
9 87 ? 0
23 6 6 6
44 76 7 5
56 7 8 7
78 8 ? 89
95 7 ? 65
99 4 0 78
106 34 ? 0
110 4 ? 6
I was trying with the following algorithm, but I am not able to turn it into a script:
for each i in $1, awk '{printf "%3s %3s %3s %3s\n", $1, $3 (from ifile1.txt),
check if i is present in $1 (ifile2.txt), then
write corresponding $2 values from ifile2.txt
else write ?
similarly check for ifile3.txt
You can do that with GNU AWK using this script:
script.awk
# read lines from the three files
ARGIND == 1 { file1[ $1 ] = $3
# init the other files with ?
file2[ $1 ] = "?"
file3[ $1 ] = "?"
next;
}
ARGIND == 2 { file2[ $1 ] = $2
next;
}
ARGIND == 3 { file3[ $1 ] = $2
next;
}
# output the collected information
END { for( k in file1) {
printf("%3s%6s%6s%6s\n", k, file1[ k ], file2[ k ], file3[ k ])
}
}
Run the script like this: awk -f script.awk ifile1.txt ifile2.txt ifile3.txt > outfile.txt
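ARGIND is GNU-awk-specific. In any POSIX awk the same bookkeeping can be sketched by bumping a counter each time FNR resets to 1 on a new file (the variable name argind is just illustrative, and this assumes none of the input files is empty):

```shell
printf '1 2 2\n2 5 6\n' > ifile1.txt    # tiny sample inputs for illustration
printf '1 6\n'          > ifile2.txt
printf '2 9\n'          > ifile3.txt
awk '
FNR == 1 { argind++ }                   # new input file: bump the counter
argind == 1 { f1[$1] = $3; f2[$1] = "?"; f3[$1] = "?"; next }
argind == 2 { f2[$1] = $2; next }
argind == 3 { f3[$1] = $2; next }
END { for (k in f1)
          printf("%3s%6s%6s%6s\n", k, f1[k], f2[k], f3[k]) }
' ifile1.txt ifile2.txt ifile3.txt
```

As with the original script, the for (k in f1) iteration order is unspecified in POSIX awk; pipe the output through sort -n if the ID order matters.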

Sum each element of a row from two files

I want to write a shell script in which the column elements of each row from file1 and file2 are added.
file1:
A 10 12 13 14
B 2 5 6 10
C 1
file2:
A 11 13 11 15
B 3 1 1 1
C 2
output:
A 21 25 24 29
B 5 6 7 11
C 3
I have tried to write this, but it seems very chaotic.
So I'd like to get some help to make it better!
awk '{getline v < "file1"; split( v, a );
for (i = 2; i <= NF; i++)
{print a[1], a[i]+ $i}
}' file2 > temp
awk '{a[$1]=a[$1]" "$2}
END{for(i in a)print i,a[i]
}' temp > out
file1
A 10 12 13 14
B 2 5 6 10
C 1
file2
A 11 13 11 15
B 3 1 1 1
C 2
Program
cat file1 file2 | cut -d" " -f1 | sort -u | while read i
do
line1="`grep ^$i file1 | sed -e "s/ */ /g" | cut -d" " -f2-` "
line2="`grep ^$i file2 | sed -e "s/ */ /g" | cut -d" " -f2-` "
(
echo $i
while [ "${line1}${line2}" != "" ]
do
v1=0`echo "$line1" | cut -d" " -f1`
v2=0`echo "$line2" | cut -d" " -f1`
line1="`echo "$line1" | cut -d" " -f2-`"
line2="`echo "$line2" | cut -d" " -f2-`"
echo `expr $v1 + $v2`
done
) | xargs
done > file3
file3
A 21 25 24 29
B 5 6 7 11
C 3
This solution remains valid if the number of columns or lines is not identical; the missing values are treated as 0.
#file1
A 10 12 13 14
B 2 5 6
C 1 10
D 1 1
#file2
A 11 13 11 15
B 3 1 1 1 5
C 2
F 3 3
#file3
A 21 25 24 29
B 5 6 7 1 5
C 3 10
D 1 1
F 3 3
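The whole job can also be sketched in one pass by letting paste merge the two files line by line and awk add the matching halves, assuming both files list the same IDs in the same order with equal field counts per row:

```shell
printf 'A 10 12 13 14\nB 2 5 6 10\nC 1\n' > file1
printf 'A 11 13 11 15\nB 3 1 1 1\nC 2\n'  > file2
paste file1 file2 | awk '{
    half = NF / 2                  # each file contributed half of the fields
    out = $1
    for (i = 2; i <= half; i++)
        out = out " " ($i + $(half + i))
    print out
}'
```

For the sample files this prints A 21 25 24 29, B 5 6 7 11, C 3.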
