Searching a column in a unix file? - linux

I have the data file below:
136110828724515000007700877
137110904734015000007700877
138110911724215000007700877
127110626724515000007700871
127110626726015000007700871
131110724724515000007700871
134110814725015000007700871
134110814734015000007700871
104110122726027000001810072
107110208724527000002900000
I want to extract the value of column 3 (the third character of each line), i.e. the values 6787714447.
I tried by using:-
awk "print $3" <filename>
but it didn't work. What should I use instead?

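This is a better job for cut. awk splits each line into fields on whitespace, and these lines contain no whitespace, so each line is a single field and $3 is empty. cut -c selects by character position instead: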
$ cut -c 3 < file
6
7
8
7
7
1
4
4
4
7
As per man cut:
-c, --characters=LIST
select only these characters
To make them all appear on the same line, pipe the output through tr -d '\n':
$ cut -c 3 < file | tr -d '\n'
6787714447
Or pipe it on to GNU sed to add a newline at the end:
$ cut -c 3 < file | tr -d '\n' | sed 's/$/\n/'
6787714447
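The awk attempt in the question fails for two reasons: inside double quotes the shell expands $3 before awk ever sees it, and an awk action must be wrapped in braces. Even with correct syntax, $3 would be empty here. A quick check on two of the sample lines (the variable data is only for illustration):

```shell
# Two sample lines from the question; neither contains a space or tab
data='136110828724515000007700877
137110904734015000007700877'

# With the default field separator, each line is exactly one field: prints 1, 1
printf '%s\n' "$data" | awk '{ print NF }'

# Single quotes keep $3 away from the shell, but the field is still empty: prints [], []
printf '%s\n' "$data" | awk '{ print "[" $3 "]" }'

# substr() works on characters rather than fields: prints 6, 7
printf '%s\n' "$data" | awk '{ print substr($0, 3, 1) }'
```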
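With GNU grep (-P enables Perl-compatible regular expressions, and \K drops everything matched so far from the output):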
$ grep -oP "^..\K." file
6
7
8
7
7
1
4
4
4
7
With sed (-r enables extended regular expressions in GNU sed; -E works in both GNU and BSD sed):
$ sed -r 's/..(.).*/\1/' file
6
7
8
7
7
1
4
4
4
7
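With awk (splitting on an empty string into single characters works in GNU awk and several other awks, but is not specified by POSIX):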
$ awk '{split ($0, a, ""); print a[3]}' file
6
7
8
7
7
1
4
4
4
7

cut is probably the simpler and cleaner option, but here are two alternatives:
AWK version:
awk '{print substr($1, 3, 1) }' <filename>
Python 3 version:
python3 -c 'print("\n".join(line[2] for line in open("<filename>")))'
EDIT: Please see 1_CR's comments and disregard this option in favour of his.

Related

How to print lines between 2 values using tail & head and pipe?

For example, how can I print lines 5 through 8 of a .txt file using only tail and head?
Copied from here
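infile.txt contains a numerical value on each line (line n holds the number n). Note that this prints lines X through Y-1; to include line Y as well, use head -n "$((Y - X + 1))".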
➜ X=3
➜ Y=10
➜ < infile.txt tail -n +"$X" | head -n "$((Y - X))"
3
4
5
6
7
8
9
➜
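Wrapped as a function that includes both end lines, the same tail/head idea might look like the sketch below. The name print_range is hypothetical; it reads standard input and assumes 1 <= X <= Y.

```shell
# print_range X Y: print lines X through Y (inclusive) of standard input.
# Hypothetical helper name; assumes 1 <= X <= Y.
print_range() {
    # tail -n +X starts at line X; head keeps the Y - X + 1 lines that follow
    tail -n +"$1" | head -n "$(( $2 - $1 + 1 ))"
}

# Lines 5 through 8 of a 10-line input: prints 5, 6, 7 and 8
seq 10 | print_range 5 8
```

For the original question this would be called as print_range 5 8 < file.txt, with file.txt standing in for the real file name.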

How do I turn a text file with a single column into a matrix?

I have a text file that has a single column of numbers, like this:
1
2
3
4
5
6
I want to convert it into two columns, in the left to right order this way:
1 2
3 4
5 6
I can do it with:
awk '{print>"line-"NR%2}' file
paste line-0 line-1 >newfile
But I think the reliance on two intermediate files will make it fragile in a script.
I'd like to use something like cat file | mystery-zip-command >newfile
You can use paste to do this:
paste -d " " - - < file > newfile
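Each - operand makes paste read one more line from standard input for every output line, so the number of dashes sets the number of columns. A quick illustration with three columns:

```shell
# Three "-" operands: paste consumes three consecutive stdin lines per output line
seq 6 | paste -d ' ' - - -
# 1 2 3
# 4 5 6
```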
You can also use pr:
pr -ats" " -2 file > newfile
-a - use round robin order
-t - suppress header and trailer
-s " " - use single space as the delimiter
-2 - two column output
See also:
Convert a text file into columns
Another alternative:
$ seq 6 | xargs -n2
1 2
3 4
5 6
Or with awk:
$ seq 6 | awk '{ORS=NR%2?FS:RS}1'
1 2
3 4
5 6
If you want the output to end with a newline when the number of input lines is odd:
$ seq 7 | awk '{ORS=NR%2?FS:RS}1; END{if (NR%2) printf "\n"}'
1 2
3 4
5 6
7
awk 'NR % 2 == 1 { printf("%s", $1) }
NR % 2 == 0 { printf(" %s\n", $1) }
END { if (NR % 2 == 1) print "" }' file
Odd-numbered lines are printed without a trailing newline, forming the first column. Even-numbered lines are printed with a leading space and a trailing newline, forming the second column. At the end, if there was an odd number of lines, a newline is printed so the output does not end in the middle of a line.
With bash:
while IFS= read -r odd; do IFS= read -r even; echo "$odd $even"; done < file
Output:
1 2
3 4
5 6
$ seq 6 | awk '{ORS=(NR%2?FS:RS); print} END{if (ORS==FS) printf RS}'
1 2
3 4
5 6
$
$ seq 7 | awk '{ORS=(NR%2?FS:RS); print} END{if (ORS==FS) printf RS}'
1 2
3 4
5 6
7
$
Note that it always adds a terminating newline - that is important as future commands might depend on it, e.g.:
$ seq 6 | awk '{ORS=(NR%2?FS:RS); print}' | wc -l
3
$ seq 7 | awk '{ORS=(NR%2?FS:RS); print}' | wc -l
3
$ seq 7 | awk '{ORS=(NR%2?FS:RS); print} END{if (ORS==FS) printf RS}' | wc -l
4
Just change the single occurrence of 2 to 3 or however many columns you want if your requirements change:
$ seq 6 | awk '{ORS=(NR%3?FS:RS); print} END{if (ORS==FS) printf RS}'
1 2 3
4 5 6
$ seq 7 | awk '{ORS=(NR%3?FS:RS); print} END{if (ORS==FS) printf RS}'
1 2 3
4 5 6
7
$ seq 8 | awk '{ORS=(NR%3?FS:RS); print} END{if (ORS==FS) printf RS}'
1 2 3
4 5 6
7 8
$ seq 9 | awk '{ORS=(NR%3?FS:RS); print} END{if (ORS==FS) printf RS}'
1 2 3
4 5 6
7 8 9
$
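Rather than editing the 2 in the script, the column count can be passed in with -v. A small sketch, where the function name to_columns is hypothetical:

```shell
# to_columns N: join every N lines of stdin into one space-separated line.
# Hypothetical helper wrapping the awk one-liner above.
to_columns() {
    # ORS is a space inside a row and a newline after every Nth record;
    # END adds the final newline when the last row is incomplete
    awk -v n="$1" '{ ORS = (NR % n ? FS : RS); print } END { if (ORS == FS) printf RS }'
}

seq 6 | to_columns 3
# 1 2 3
# 4 5 6

seq 8 | to_columns 4
# 1 2 3 4
# 5 6 7 8
```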
Short awk approach:
awk '{print ( ((getline nl) > 0)? $0" "nl : $0 )}' file
The output:
1 2
3 4
5 6
(getline nl) > 0 - getline reads the next record into the variable nl. It returns 1 if it finds a record, 0 at end of file, and -1 on error.
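With an odd number of input lines, getline finds no record after the last line, so the conditional falls back to printing $0 on its own:

```shell
# On line 7 there is no next record: getline returns 0 and only $0 is printed
seq 7 | awk '{ print ( ((getline nl) > 0) ? $0 " " nl : $0 ) }'
# 1 2
# 3 4
# 5 6
# 7
```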
Short GNU sed approach:
sed 'N;s/\n/ /' file
N - add a newline to the pattern space, then append the next line of input to the pattern space
s/\n/ / - replace the embedded newline in the pattern space with a space
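With an odd number of lines, POSIX sed quits without printing the pattern space when N finds no next line, so the last line is lost; GNU sed prints it anyway. Guarding N with $! (skip N on the last line) gives the same result with either:

```shell
# $!N: append the next line unless the current line is already the last one
seq 7 | sed '$!N;s/\n/ /'
# 1 2
# 3 4
# 5 6
# 7
```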
With tr and GNU sed (each output line keeps a trailing space):
seq 6 | tr '\n' ' ' | sed -r 's/([^ ]* [^ ]* )/\1\n/g'

How to sort a specific column in bash (Linux) without modifying other columns and without creating a temporary file?

Input file
0 1.0069770730517629 A
1 1.0068122761874614 A
2 1.0004297763706849 B
3 1.0069220626905635 C
4 1.0079998216945956 C
5 1.0006092898635817 D
6 1.0071274842017928 A
7 1.0083750686808803 A
8 1.0006868227863552 B
9 1.0073693844413083 C
10 1.0086546525825624 C
11 1.0007234442925264 D
Expected output:
0 1.0086546525825624 A
1 1.0083750686808803 A
2 1.0079998216945956 B
3 1.0073693844413083 C
4 1.0071274842017928 C
5 1.0069770730517629 D
6 1.0069220626905635 A
7 1.0068122761874614 A
8 1.0007234442925264 B
9 1.0006868227863552 C
10 1.0006092898635817 C
11 1.0004297763706849 D
My solution, using a temporary file:
awk '{print $2}' input.txt | sort -gr > temp.txt
paste input.txt temp.txt | awk '{print $1,$4,$3}'
rm temp.txt
Question
Is it possible to sort a specific column in bash (Linux) without modifying the other columns and without creating a temporary file?
You can use - as a filename argument to paste to tell it to read standard input. Add -d' ' so paste joins with a space instead of its default tab, and let awk pick the output columns:
cut -d' ' -f2 input.txt | sort -gr | paste -d' ' input.txt - | awk '{print $1, $4, $3}'
And if paste didn't support -, you could use process substitution:
paste -d' ' input.txt <(cut -d' ' -f2 input.txt | sort -gr) | awk '{print $1, $4, $3}'
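Note that cut prints the selected fields in their input order, whatever order is given to -f, so -f1,4,3 behaves like -f1,3,4 and cannot move the sorted value into the second column. awk prints fields in exactly the order requested:

```shell
# cut ignores the order of the field list: prints "a c", not "c a"
echo 'a b c' | cut -d ' ' -f 3,1

# awk honours the requested order: prints "c a"
echo 'a b c' | awk '{ print $3, $1 }'
```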
awk to the rescue (asort() is a GNU awk extension):
awk '{c1[NR]=$1; c2[NR]=$2; c3[NR]=$3} END {asort(c2); for(i=1;i<=NR;i++) print c1[i],c2[NR+1-i],c3[i]}' input.txt

Sort the tab-delimited numbers on each line of a file

I'm trying to sort the numbers on each line of a file individually. The numbers within one line are separated by tabs. (They are shown here as spaces, but they are actually tabs.)
For example, for the following input
5 8 7 6
1 5 6 8
8 9 7 1
the desired output would be:
5 6 7 8
1 5 6 8
1 7 8 9
My attempt so far is:
let i=1
while read line
do
echo "$line" | tr " " "\n" | sort -g
cut -f $i fileName | paste -s >> tempFile$$
((++i))
done < fileName
This is the best I got - I'm sure it can be done in 6 characters with awk/sed/perl:
while read line
do
echo $(printf "%d\n" $line | sort -n) | tr ' ' \\t >> another-file.txt
done < my-input-file.txt
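For comparison, here is a sketch of the loop from the question reworked with standard tools only: each line is split into one number per line, sorted numerically, and joined back with tabs by paste -s. The function name sort_fields is hypothetical.

```shell
# sort_fields: sort the tab-separated numbers within each line of stdin.
# Hypothetical helper name. tr puts one number per line, sort -n orders
# them numerically, and paste -s joins them back into a tab-separated line.
sort_fields() {
    # The extra -n test also handles a final line with no trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line" | tr '\t' '\n' | sort -n | paste -s -
    done
}

# Example with two lines from the question (tabs between the numbers)
printf '5\t8\t7\t6\n1\t5\t6\t8\n' | sort_fields
```

It would be run as sort_fields < fileName > another-file.txt.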
Using a few features that are specific to GNU awk:
$ awk 'BEGIN{ PROCINFO["sorted_in"] = "#ind_num_asc" }
{ delete(a); n = 0; for (i=1;i<=NF;++i) a[$i];
for (i in a) printf "%s%s", i, (++n<NF?FS:RS) }' file
5 6 7 8
1 5 6 8
1 7 8 9
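Each field is set as a key in the array a. In GNU awk it is possible to specify the order in which the for (i in a) loop traverses the array; here, I've set it to ascending numerical order. Because array keys are unique, this assumes the numbers on each line are distinct: duplicates collapse into one key, and that line then never gets its terminating newline.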
Here is a bash script that can do it. It takes a filename argument or reads stdin, was tested on CentOS and assumes IFS=$' \t\n'.
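#!/bin/bash
# Read from the file named in $1, or from stdin if no argument is given
if [ "$1" ] ; then exec < "$1" ; fi
while read -r line
do
  # Split the line into positional parameters (relies on the default IFS)
  set -- $line
  # Sort the fields numerically and rejoin them with tabs
  echo $(for var in "$@"; do echo "$var"; done | sort -n) | tr " " "\t"
done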
If you want to put the output in another file run it as:
cat input_file | sorting_script > another_file
or
sorting_script input_file > another_file
Consider using perl for this:
perl -ape '@F = sort { $a <=> $b } @F; $_ = "@F\n"' input.txt
Here -a turns on automatic field splitting (as awk does) into the array @F, -p makes Perl run the script for each line and print $_ afterwards, and -e supplies the script directly on the command line. The { $a <=> $b } block makes sort compare numerically, and "@F\n" joins the sorted fields with spaces.
Not quite 6 characters, I'm afraid, Sean.
This should have been simple in awk, but awk doesn't quite have the features needed. If there had been an array $@ corresponding to the fields $1, $2, etc., then the solution would have been awk '{asort $@}' input.txt, but sadly no such array exists. The loops required to move the fields into an array and out of it again make it longer than the bash version (asort() is GNU awk only, and delete a clears elements left over from a previous, longer line):
awk '{delete a; for(i=1;i<=NF;i++)a[i]=$i; asort(a); for(i=1;i<=NF;i++)printf("%s ",a[i]); printf("\n")}' input.txt
So awk isn't the right tool for the job here. It's also a bit odd that sort itself has no switch to sort the fields within a line rather than the lines themselves.
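Using awk (a bubble sort over the fields of each line; swapping fields makes awk rebuild the line with OFS, a space by default, so add -v OFS='\t' to keep tabs):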
$ cat file
5 8 7 6
1 5 6 8
8 9 7 1
$ awk '{c=1;while(c!=""){c=""; for(i=1;i<NF;i++){n=i+1; if($i>$n){c=$i;$i=$n;$n=c}}}}1' file
5 6 7 8
1 5 6 8
1 7 8 9
A more readable version:
awk '{
c=1
while(c!="")
{
c=""
for(i=1;i<NF;i++)
{
n=i+1
if($i>$n)
{
c=$i
$i=$n
$n=c
}
}
}
}1
' file
If you have ksh, you may try this
#!/usr/bin/env ksh
while read line ; do
set -s +A cols $line
echo ${cols[*]}
done < "input_file"
Test
[akshay@localhost tmp]$ cat test.ksh
#!/usr/bin/env ksh
cat <<EOF | while read line ; do set -s +A cols $line; echo ${cols[*]};done
5 8 7 6
1 5 6 8
8 9 7 1
EOF
[akshay@localhost tmp]$ ksh test.ksh
5 6 7 8
1 5 6 8
1 7 8 9

How to read n-th line from a text file in bash?

Say I have a text file called "demo.txt" that looks like this:
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
Now I want to read a certain line, say line 2, with a command which will look something like this:
Line2 = read 2 "demo.txt"
So when I'll print it:
echo "$Line2"
I'll get:
5 6 7 8
I know how to use sed to print the n-th line of a file, but not how to read it into a variable. I also know the read command, but don't know how to use it to read a specific line.
Thanks in advance for the help.
Using head and tail
$ head -2 inputFile | tail -1
5 6 7 8
OR
a generalized version
$ line=2
$ head -"$line" input | tail -1
5 6 7 8
Using sed
$ sed -n '2 p' input
5 6 7 8
$ sed -n "$line p" input
5 6 7 8
What does it do?
-n suppresses normal printing of pattern space.
'2 p' selects line 2 (or line $line, in the general version), and p prints the current pattern space.
input: the input file.
Edit
To capture the output in a variable, use command substitution:
$ content=`sed -n "$line p" input`
$ echo $content
5 6 7 8
OR
$ content=$(sed -n "$line p" input)
$ echo $content
5 6 7 8
To capture the output in a bash array (note there must be no space around =):
$ content=( $(sed -n "$line p" input) )
$ echo ${content[0]}
5
$ echo ${content[1]}
6
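Since the question asks specifically about read, here is a minimal sketch that finds line N using only the shell's read builtin, in POSIX sh. The function name read_line is hypothetical, and it is slower than sed or awk on large files because the shell processes input line by line.

```shell
# read_line N: print line N of standard input using only the read builtin.
# Hypothetical helper; returns 1 if the input has fewer than N lines.
read_line() {
    n=0                                   # current line number
    while IFS= read -r l; do
        n=$((n + 1))
        if [ "$n" -eq "$1" ]; then
            printf '%s\n' "$l"
            return 0
        fi
    done
    return 1
}

# Sample data from the question; with a real file: Line2=$(read_line 2 < demo.txt)
Line2=$(printf '1 2 3 4\n5 6 7 8\n9 10 11 12\n13 14 15 16\n' | read_line 2)
echo "$Line2"    # prints: 5 6 7 8
```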
Using awk
Perhaps an awk solution might look like this:
$ awk -v line="$line" 'NR==line' input
5 6 7 8
Thanks to Fredrik Pihl for the suggestion.
Perl has convenient support for this, too, and it's actually the most intuitive!
The flip-flop operator can be used with line numbers:
$ printf "0\n1\n2\n3\n4" | perl -ne 'print if 2 .. 4'
1
2
3
Note that it's 1-based.
You can also mix regular expressions:
$ printf "0\n1\nfoo\n3\n4" | perl -ne 'print if /foo/ .. -1'
foo
3
4
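(A constant operand is compared with the current line number, and -1 never matches one, so the range simply runs to the end of the input.)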
