How do I turn a text file with a single column into a matrix? - linux

I have a text file that has a single column of numbers, like this:
1
2
3
4
5
6
I want to convert it into two columns, filled left to right, like this:
1 2
3 4
5 6
I can do it with:
awk '{print>"line-"NR%2}' file
paste line-0 line-1 >newfile
But I think the reliance on two intermediate files will make it fragile in a script.
I'd like to use something like cat file | mystery-zip-command >newfile

You can use paste to do this:
paste -d " " - - < file > newfile
You can also use pr:
pr -ats" " -2 file > newfile
-a - use round robin order
-t - suppress header and trailer
-s " " - use single space as the delimiter
-2 - two column output
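Both commands give the same result; for instance, feeding them six numbers (an illustrative run):
$ seq 6 | paste -d " " - -
1 2
3 4
5 6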
See also:
Convert a text file into columns

Another alternative:
$ seq 6 | xargs -n2
1 2
3 4
5 6
Or with awk (ORS is set to a space on odd-numbered lines and to a newline on even ones; the trailing 1 prints every record):
$ seq 6 | awk '{ORS=NR%2?FS:RS}1'
1 2
3 4
5 6
If you want the output to be terminated with a newline when there is an odd number of input lines:
$ seq 7 | awk '{ORS=NR%2?FS:RS}1; END{ORS=NR%2?RS:FS; print ""}'
1 2
3 4
5 6
7

awk 'NR % 2 == 1 { printf("%s", $1) }
NR % 2 == 0 { printf(" %s\n", $1) }
END { if (NR % 2 == 1) print "" }' file
The odd lines are printed with no newline after them, to print the first column. The even lines are printed with a space first and a newline after, to print the second column. At the end, if there were an odd number of lines, we print a newline so we don't end in the middle of the line.
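An illustrative run with an odd number of input lines, showing the trailing newline being added:
$ seq 7 | awk 'NR % 2 == 1 { printf("%s", $1) }
               NR % 2 == 0 { printf(" %s\n", $1) }
               END { if (NR % 2 == 1) print "" }'
1 2
3 4
5 6
7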

With bash:
while IFS= read -r odd; do IFS= read -r even; echo "$odd $even"; done < file
Output:
1 2
3 4
5 6
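A hedged refinement: with an odd number of input lines the second read fails, $even stays empty, and the last output line ends with a trailing space; guarding the echo avoids that:
while IFS= read -r odd; do
    if IFS= read -r even; then echo "$odd $even"; else echo "$odd"; fi
done < file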

$ seq 6 | awk '{ORS=(NR%2?FS:RS); print} END{if (ORS==FS) printf RS}'
1 2
3 4
5 6
$
$ seq 7 | awk '{ORS=(NR%2?FS:RS); print} END{if (ORS==FS) printf RS}'
1 2
3 4
5 6
7
$
Note that it always adds a terminating newline - that is important as future commands might depend on it, e.g.:
$ seq 6 | awk '{ORS=(NR%2?FS:RS); print}' | wc -l
3
$ seq 7 | awk '{ORS=(NR%2?FS:RS); print}' | wc -l
3
$ seq 7 | awk '{ORS=(NR%2?FS:RS); print} END{if (ORS==FS) printf RS}' | wc -l
4
Just change the single occurrence of 2 to 3 or however many columns you want if your requirements change:
$ seq 6 | awk '{ORS=(NR%3?FS:RS); print} END{if (ORS==FS) printf RS}'
1 2 3
4 5 6
$ seq 7 | awk '{ORS=(NR%3?FS:RS); print} END{if (ORS==FS) printf RS}'
1 2 3
4 5 6
7
$ seq 8 | awk '{ORS=(NR%3?FS:RS); print} END{if (ORS==FS) printf RS}'
1 2 3
4 5 6
7 8
$ seq 9 | awk '{ORS=(NR%3?FS:RS); print} END{if (ORS==FS) printf RS}'
1 2 3
4 5 6
7 8 9
$

Short awk approach:
awk '{print ( ((getline nl) > 0)? $0" "nl : $0 )}' file
The output:
1 2
3 4
5 6
(getline nl)>0 - getline will get the next record and assign it to variable nl. The getline command returns 1 if it finds a record and 0 if it encounters the end of the file
Short GNU sed approach:
sed 'N;s/\n/ /' file
N - add a newline to the pattern space, then append the next line of input to the pattern space
s/\n/ / - replace the newline with a space within the pattern space
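With GNU sed an odd trailing line is printed on its own, because N at end of input prints the pattern space before exiting (many other seds simply exit without printing in that case). An illustrative run:
$ seq 7 | sed 'N;s/\n/ /'
1 2
3 4
5 6
7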

seq 6 | tr '\n' ' ' | sed -r 's/([^ ]* [^ ]* )/\1\n/g'

Related

Sort the tab-delimited numbers on each line of a file

I'm trying to sort the numbers on each line of a file individually. The numbers within one line are separated by tabs. (I used spaces but they're actually tabs.)
For example, for the following input
5 8 7 6
1 5 6 8
8 9 7 1
the desired output would be:
5 6 7 8
1 5 6 8
1 7 8 9
My attempt so far is:
let i=1
while read line
do
echo "$line" | tr " " "\n" | sort -g
cut -f $i fileName | paste -s >> tempFile$$
((++i))
done < fileName
This is the best I got - I'm sure it can be done in 6 characters with awk/sed/perl:
while read line
do
echo $(printf "%d\n" $line | sort -n) | tr ' ' \\t >> another-file.txt
done < my-input-file.txt
Using a few features that are specific to GNU awk:
$ awk 'BEGIN{ PROCINFO["sorted_in"] = "@ind_num_asc" }
{ delete a; n = 0; for (i=1;i<=NF;++i) a[$i];
for (i in a) printf "%s%s", i, (++n<NF?FS:RS) }' file
5 6 7 8
1 5 6 8
1 7 8 9
Each field is set as a key in the array a. In GNU awk it is possible to specify the order in which the for (i in a) loop traverses the array - here, I've set it to do so in ascending numerical order.
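One caveat: because the fields become array keys, duplicate values on a line would collapse into one. A hedged sketch that tolerates duplicates by counting occurrences (still GNU awk only):
awk 'BEGIN{ PROCINFO["sorted_in"] = "@ind_num_asc" }
     { delete a; n = 0
       for (i=1; i<=NF; ++i) a[$i]++              # count each value
       for (v in a)                               # keys visited in ascending numeric order
           for (j=1; j<=a[v]; ++j)
               printf "%s%s", v, (++n<NF?FS:RS) }' file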
Here is a bash script that can do it. It takes a filename argument or reads stdin; it was tested on CentOS and assumes the default IFS=$' \t\n'.
#!/bin/bash
if [ "$1" ] ; then exec < "$1" ; fi
cat - | while read line
do
set $line
echo $(for var in "$@"; do echo $var; done | sort -n) | tr " " "\t"
done
If you want to put the output in another file run it as:
cat input_file | sorting_script > another_file
or
sorting_script input_file > another_file
Consider using perl for this:
perl -ape '@F=sort @F;$_="@F\n"' input.txt
Here -a turns on automatic field splitting (like awk does) into the array @F, -p makes it execute the script for each line and print $_ each time, and -e specifies the script directly on the command line.
Not quite 6 characters, I'm afraid, Sean.
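A hedged aside: perl's sort compares as strings by default, which happens to work for the single-digit sample; for general numbers a numeric comparator would be safer:
perl -ape '@F = sort { $a <=> $b } @F; $_ = "@F\n"' input.txt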
This should have been simple in awk, but it doesn't quite have the features needed. If there had been an array $@ corresponding to the fields $1, $2, etc., then the solution would have been awk '{asort $@}' input.txt, but sadly no such array exists. The loops required to move the fields into an array and out of it again make it longer than the bash version:
awk '{for(i=1;i<=NF;i++)a[i]=$i;asort(a);for(i=1;i<=NF;i++)printf("%s ",a[i]);printf("\n")}' input.txt
So awk isn't the right tool for the job here. It's also a bit odd that sort itself doesn't have a switch to control its sorting direction.
Using awk
$ cat file
5 8 7 6
1 5 6 8
8 9 7 1
$ awk '{c=1;while(c!=""){c=""; for(i=1;i<NF;i++){n=i+1; if($i>$n){c=$i;$i=$n;$n=c}}}}1' file
5 6 7 8
1 5 6 8
1 7 8 9
More readable version:
awk '{
c=1
while(c!="")
{
c=""
for(i=1;i<NF;i++)
{
n=i+1
if($i>$n)
{
c=$i
$i=$n
$n=c
}
}
}
}1
' file
If you have ksh, you may try this
#!/usr/bin/env ksh
while read line ; do
set -s +A cols $line
echo ${cols[*]}
done < "input_file"
Test
[akshay@localhost tmp]$ cat test.ksh
#!/usr/bin/env ksh
cat <<EOF | while read line ; do set -s +A cols $line; echo ${cols[*]};done
5 8 7 6
1 5 6 8
8 9 7 1
EOF
[akshay@localhost tmp]$ ksh test.ksh
5 6 7 8
1 5 6 8
1 7 8 9

How can I print the upper triangle of a matrix

Using an awk command, I tried to print the upper triangle of a matrix:
awk '{for (i=1;i<=NF;i++) if (i>=NR) printf $i FS "\n"}' matrix
but the output is shown as a single row
Consider this sample matrix:
$ cat matrix
1 2 3
4 5 6
7 8 9
To print the upper-right triangle:
$ awk '{for (i=1;i<=NF;i++) printf "%s%s",(i>=NR)?$i:" ",FS; print""}' matrix
1 2 3
5 6
9
Or:
$ awk '{for (i=1;i<=NF;i++) printf "%2s",(i>=NR)?$i:" "; print""}' matrix
1 2 3
5 6
9
To print the upper-left triangle:
$ awk '{for (i=1;i<=NF+1-NR;i++) printf "%s%s",$i,FS; print""}' matrix
1 2 3
4 5
7
Or:
$ awk '{for (i=1;i<=NF+1-NR;i++) printf "%2s",$i; print""}' matrix
1 2 3
4 5
7
This might work for you (GNU sed):
sed -r ':a;n;H;G;s/\n//;:b;s/^\S+\s*(.*)\n.*/\1/;tb;$!ba' file
Use the hold space as a counter of how many lines have been processed so far, and for each current line remove that many fields from the front of the line.
N.B. The counter is incremented after the current line is printed, otherwise the first line would lose its first field.
On reflection an alternative/more elegant solution is:
sed -r '1!G;h;:a;s/^\S+\s*(.*)\n.*/\1/;ta' file
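For the sample matrix above this gives (an illustrative run with GNU sed):
$ sed -r '1!G;h;:a;s/^\S+\s*(.*)\n.*/\1/;ta' matrix
1 2 3
5 6
9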
And to print the upper-left triangle:
sed -r '1!G;h;:a;s/^([^\n]*)\S+[^\n]*(.*)\n.*/\1\2/;ta' file
$ awk '{for (i=NR;i<=NF;i++) printf "%s%s",$i,(i<NF?FS:RS)}' file
1 2 3
5 6
9

Searching a column in a unix file?

I have the data file below:
136110828724515000007700877
137110904734015000007700877
138110911724215000007700877
127110626724515000007700871
127110626726015000007700871
131110724724515000007700871
134110814725015000007700871
134110814734015000007700871
104110122726027000001810072
107110208724527000002900000
And I want to extract the value of column 3, i.e. the values 6787714447 (the third character of each line).
I tried using:
awk "print $3" <filename>
but it didn't work. What should I use instead?
This is a better job for cut:
$ cut -c 3 < file
6
7
8
7
7
1
4
4
4
7
As per man cut:
-c, --characters=LIST
select only these characters
To make them all appear on the same line, pipe to tr -d '\n':
$ cut -c 3 < file | tr -d '\n'
6787714447
Or even pipe it on to sed to add the newline at the end:
$ cut -c 3 < file | tr -d '\n' | sed 's/$/\n/'
6787714447
With grep:
$ grep -oP "^..\K." file
6
7
8
7
7
1
4
4
4
7
with sed:
$ sed -r 's/..(.).*/\1/' file
6
7
8
7
7
1
4
4
4
7
with awk:
$ awk '{split ($0, a, ""); print a[3]}' file
6
7
8
7
7
1
4
4
4
7
Cut is probably the simpler/cleaner option, but here are two alternatives:
AWK version:
awk '{print substr($1, 3, 1) }' <filename>
Python version:
python -c 'print "\n".join(map(lambda x: x[2], open("<filename>").readlines()))'
EDIT: Please see 1_CR's comments and disregard this option in favour of his.

Select the second line to last line of a file

How can I select the lines from the second line to the line before the last line of a file by using head and tail in unix?
For example if my file has 15 lines I want to select lines from 2 to 14.
tail -n +2 /path/to/file | head -n -1
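tail -n +2 starts printing at line 2, and head -n -1 drops the last line (the negative count form is a GNU head feature). An illustrative run:
$ seq 5 | tail -n +2 | head -n -1
2
3
4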
perl -ne 'print if($.!=1 and !(eof))' your_file
tested below:
> cat temp
1
2
3
4
5
6
7
> perl -ne 'print if($.!=1 and !(eof))' temp
2
3
4
5
6
>
Alternatively, in awk you can use:
awk '{a[count++]=$0}END{for(i=1;i<count-1;i++) print a[i]}' your_file
To print all lines but the first and last ones you can also use this awk:
awk 'NR==1 {next} {if (f) print f; f=$0}'
This always prints the previous line. To prevent the first one from being printed, we skip the line when NR is 1. Then, the last one won't be printed because when reading it we are printing the penultimate!
Test
$ seq 10 | awk 'NR==1 {next} {if (f) print f; f=$0}'
2
3
4
5
6
7
8
9

Count occurrences of character per line/field on Unix

Given a file with data like this (i.e. the stores.dat file):
sid|storeNo|latitude|longitude
2tt|1|-28.0372000t0|153.42921670
9|2t|-33tt.85t09t0000|15t1.03274200
What is the command that would return the number of occurrences of the 't' character per line?
e.g. it would return:
count lineNum
4 1
3 2
6 3
Also, to count occurrences by field, what is the command to return the following results?
e.g. input of column 2 and character 't'
count lineNum
1 1
0 2
1 3
e.g. input of column 3 and character 't'
count lineNum
2 1
1 2
4 3
To count occurrence of a character per line you can do:
awk -F'|' 'BEGIN{print "count", "lineNum"}{print gsub(/t/,"") "\t" NR}' file
count lineNum
4 1
3 2
6 3
To count occurrence of a character per field/column you can do:
column 2:
awk -F'|' -v fld=2 'BEGIN{print "count", "lineNum"}{print gsub(/t/,"",$fld) "\t" NR}' file
count lineNum
1 1
0 2
1 3
column 3:
awk -F'|' -v fld=3 'BEGIN{print "count", "lineNum"}{print gsub(/t/,"",$fld) "\t" NR}' file
count lineNum
2 1
1 2
4 3
The gsub() function's return value is the number of substitutions made, so we use that to print the count.
NR holds the line number, so we use it to print the line number.
For counting occurrences in a particular field, we create a variable fld holding the number of the field we wish to extract counts from.
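If the character itself should also be a parameter, a hedged generalization (note the character is used as a regular expression here, so metacharacters such as . would need escaping):
awk -F'|' -v ch='t' -v fld=2 'BEGIN{print "count", "lineNum"}{print gsub(ch,"",$fld) "\t" NR}' file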
grep -n -o "t" stores.dat | sort -n | uniq -c | cut -d : -f 1
gives almost exactly the output you want (note that lines with no 't' at all would simply not appear):
4 1
3 2
6 3
Thanks to @raghav-bhushan for the grep -o hint, what a useful flag. The -n flag includes the line number as well.
To count occurrences of a character per line:
$ awk -F 't' '{print NF-1, NR}' input.txt
4 1
3 2
6 3
This sets the field separator to the character that needs to be counted, then uses the fact that the number of fields is one greater than the number of separators.
To count occurrences in a particular column, cut out that column first:
$ cut -d '|' -f 2 input.txt | awk -F 't' '{print NF-1, NR}'
1 1
0 2
1 3
$ cut -d '|' -f 3 input.txt | awk -F 't' '{print NF-1, NR}'
2 1
1 2
4 3
One possible solution using perl:
Content of script.pl:
use warnings;
use strict;
## Check arguments:
## 1.- Input file
## 2.- Char to search.
## 3.- (Optional) field to search. If blank, zero or bigger than number
## of columns, default to search char in all the line.
(@ARGV == 2 || @ARGV == 3) or die qq(Usage: perl $0 input-file char [column]\n);
my ($char,$column);
## Get values or arguments.
if ( @ARGV == 3 ) {
($char, $column) = splice @ARGV, -2;
} else {
$char = pop @ARGV;
$column = 0;
}
## Check that $char must be a non-white space character and $column
## only accept numbers.
die qq[Bad input\n] if $char !~ m/^\S$/ or $column !~ m/^\d+$/;
print qq[count\tlineNum\n];
while ( <> ) {
## Remove last '\n'
chomp;
## Get fields.
my @f = split /\|/;
## If column is a valid one, select it to the search.
if ( $column > 0 and $column <= scalar @f ) {
$_ = $f[ $column - 1];
}
## Count.
my $count = eval qq[tr/$char/$char/];
## Print result.
printf qq[%d\t%d\n], $count, $.;
}
The script accepts three parameters:
Input file
Char to search
Column to search: if it is not a valid column, it searches the whole line.
Running the script without arguments:
perl script.pl
Usage: perl script.pl input-file char [column]
With arguments and its output:
Here 0 is not a valid column, so it searches the whole line.
perl script.pl stores.dat 't' 0
count lineNum
4 1
3 2
6 3
Here it searches in column 1.
perl script.pl stores.dat 't' 1
count lineNum
0 1
2 2
0 3
Here it searches in column 3.
perl script.pl stores.dat 't' 3
count lineNum
2 1
1 2
4 3
'th' is not a single character.
perl script.pl stores.dat 'th' 3
Bad input
No need for awk or perl, only with bash and standard Unix utilities:
cat file | tr -c -d "t\n" | cat -n |
{ echo "count lineNum"
while read num data; do
test ${#data} -gt 0 && printf "%4d %5d\n" ${#data} $num
done; }
And for a particular column:
cut -d "|" -f 2 file | tr -c -d "t\n" | cat -n |
{ echo -e "count lineNum"
while read num data; do
test ${#data} -gt 0 && printf "%4d %5d\n" ${#data} $num
done; }
And we can even avoid tr and the cats:
echo "count lineNum"
num=1
while read data; do
new_data=${data//t/}
count=$((${#data}-${#new_data}))
test $count -gt 0 && printf "%4d %5d\n" $count $num
num=$(($num+1))
done < file
and even the cut:
echo "count lineNum"
num=1; OLD_IFS=$IFS; IFS="|"
while read -a array_data; do
data=${array_data[1]}
new_data=${data//t/}
count=$((${#data}-${#new_data}))
test $count -gt 0 && printf "%4d %5d\n" $count $num
num=$(($num+1))
done < file
IFS=$OLD_IFS
awk '{gsub("[^t]",""); print length($0),NR;}' stores.dat
The call to gsub() deletes everything in the line that is not a t, then just prints the length of what remains, and the current line number.
Want to do it just for column 2?
awk 'BEGIN{FS="|"} {gsub("[^t]","",$2); print NR,length($2);}' stores.dat
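For the sample stores.dat this prints the line number first and then the count (an illustrative run):
$ awk 'BEGIN{FS="|"} {gsub("[^t]","",$2); print NR,length($2);}' stores.dat
1 1
2 0
3 1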
You could also split the line or field on "t" and use the length of the resulting array minus 1. Set the col variable to 0 for the whole line, or 1 through 3 for individual columns:
awk -F'|' -v col=0 -v OFS=$'\t' 'BEGIN {
print "count", "lineNum"
}{
split($col, a, "t"); print length(a) - 1, NR
}
' stores.dat
$ cat -n test.txt
1 test 1
2 you want
3 void
4 you don't want
5 ttttttttttt
6 t t t t t t
$ awk '{n=split($0,c,"t")-1;if (n!=0) print n,NR}' test.txt
2 1
1 2
2 4
11 5
6 6
cat stores.dat | awk 'BEGIN {FS = "|"}; {print $1}' | awk 'BEGIN {FS = "\t"}; {print NF}'
Where $1 would be a column number you want to count.
perl -e 'while(<>) { $count = tr/t//; print "$count ".++$x."\n"; }' stores.dat
Another perl answer, yay! The tr/t// function returns the number of times the translation occurred on that line, in other words the number of times tr found the character 't'. ++$x maintains the line number count.
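Applied to the sample file (an illustrative run):
$ perl -e 'while(<>) { $count = tr/t//; print "$count ".++$x."\n"; }' stores.dat
4 1
3 2
6 3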
