How to sort data while printing line by line in Bash

I'm writing a shell script traversing a list of directories and counting words from files inside them. The code prints data each time I read a file. So the output is not sorted. How can I sort it?
The output right now is like this:
cat 5
door 1
bird 3
dog 1
and I want to sort it first by second column and then by first column:
dog 1
door 1
bird 3
cat 5

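You can pipe the output of your shell script to:
sort -k2,2n -k1,1
-k2,2n sorts numerically on the second field only, and -k1,1 then breaks ties on the first field as text. The shorter sort -n -k2 -k1 gives the same result on this data, but only incidentally: the global -n makes the first-field key numeric too, so all names compare equal and sort falls back to comparing whole lines.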
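A minimal sketch of a two-key sort on the sample data from the question, with each key restricted to a single field. The helper name sort_by_count is made up for illustration.

```shell
# sort_by_count is a hypothetical helper name, not part of the original script.
sort_by_count() {
    # Key 1: field 2 only, compared numerically.
    # Key 2: field 1 only, compared as text (used to break ties).
    sort -k2,2n -k1,1
}

sorted=$(printf '%s\n' 'cat 5' 'door 1' 'bird 3' 'dog 1' | sort_by_count)
printf '%s\n' "$sorted"
# dog 1
# door 1
# bird 3
# cat 5
```

In a real script the whole loop would be piped into the sort once, after the last file has been processed, because sort cannot emit anything until it has seen all of its input.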

First of all, I tried to reproduce what OP is doing, so after creating the files, I tried this command:
% for i in *; do printf '%s ' "$i"; wc -w < "$i"; done
bird 3
cat 5
dog 1
door 1
Then I added the sort:
% (for i in *; do printf '%s ' "$i"; wc -w < "$i"; done) | sort -k2,2n -k1,1
dog 1
door 1
bird 3
cat 5

Related

Check values in file is greater or equal in bash script

I have a file.txt that contains:
2
10
60
90
How can I check whether any number in that file is greater than or equal to 50 and then do something? The "something" in my case is sending an email, and I already have that part.
I have tried to do this with awk, but it does not work in a script.
The following command will output the greatest value of your file:
sort -nr file.txt | head -1
Then just compare it to the value of your choice and voilà. Something like:
if [ "$(sort -nr file.txt | head -1)" -ge 50 ]
then
<do something>
fi
Explanation:
sort -n sorts the file as numbers (otherwise 12 would be considered greater than 100).
sort -r reverses the sort (by default it displays lower numbers first, with -r it displays higher first).
head -1 displays only the first output.
This awk command prints every value that is greater than or equal to 50:
$ awk '$1 >= 50 { print $1 }' <file>
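The check can also be done without sorting, by letting awk stop at the first value that reaches the threshold and report the result through its exit status. This is only a sketch, assuming one number per line; the function name any_at_least and the temporary file are illustrative, and the assignment to result stands in for the email step from the question.

```shell
# any_at_least FILE LIMIT: succeed if any line of FILE holds a number >= LIMIT.
# (Hypothetical helper name, introduced for this example.)
any_at_least() {
    awk -v limit="$2" '
        $1 + 0 >= limit + 0 { found = 1; exit }   # stop at the first hit
        END { exit !found }                       # exit status 0 means "found"
    ' "$1"
}

tmp=$(mktemp)
printf '%s\n' 2 10 60 90 > "$tmp"

if any_at_least "$tmp" 50; then
    result="threshold reached"   # placeholder for the real action, e.g. sending the email
else
    result="below threshold"
fi
echo "$result"
rm -f "$tmp"
```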

Finding the Number of strings in a File

I'm trying to write a very small program that will check the number of sub strings in a large text file. All it will do is count the first 2000 lines of the text file, find any "TTT" sub-strings, count them, and set a variable to that total. I'm a bit new to shell, so any help would be amazingly appreciated!
#!/bin/bash
$counter=(head -2000 [file name] | grep TTT | grep -o TTT | wc -l)
echo $counter
For what it's worth, you might find awk better suited for this task:
awk -F"ttt" '{j=(NF-1)+j}END{print j}' filename
This splits each record in your file on the delimiter "ttt", counts the number of fields, subtracts one, and adds that to the total. The delimiter is case-sensitive, so use -F"TTT" to match the uppercase string from the question.
A file like:
ttt tttttt something
1 5 ttt
tt
one more ttt record
Would be split (visualizing with pipe delim) like:
| || something
1 5 |
tt
one more | record
Counting the number of fields per record:
4
2
1
2
Subtracting one from that:
3
1
0
1
Which totals to 5, which is how many "ttt" substrings are present.
To incorporate this into your script (and fixing your other issue):
#!/bin/bash
counter=$(awk -F"ttt" '{j=(NF-1)+j}END{print j}' filename)
echo $counter
The change here is that when we set a variable in Bash we don't include the $ sign at the front. Only in referencing the variable do we include the $.
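The field-counting idea can be adapted to the original requirements (uppercase "TTT", first 2000 lines only). The sketch below also skips empty lines, because awk sets NF to 0 for them and NF-1 would then subtract one from the total. The function name count_ttt and the sample file are made up for illustration.

```shell
# count_ttt FILE: count occurrences of "TTT" in the first 2000 lines of FILE.
# (Hypothetical helper name.)
count_ttt() {
    awk -F'TTT' '
        NR > 2000 { exit }          # stop after the first 2000 lines
        NF > 0    { n += NF - 1 }   # skip empty lines, where NF is 0
        END       { print n + 0 }   # n + 0 prints 0 when nothing matched
    ' "$1"
}

sample=$(mktemp)
printf '%s\n' 'TTT TTTTTT something' '1 5 TTT' 'TT' '' 'one more TTT record' > "$sample"
counter=$(count_ttt "$sample")
echo "$counter"   # 5 for this sample
rm -f "$sample"
```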
You have some minor syntax errors there, probably you meant this:
counter=$(head -2000 [file name] | grep TTT | grep -o TTT | wc -l)
echo $counter
Notice the tiny changes I made there to make it work.
Btw the grep TTT in the middle is redundant, you can simply drop it, that is:
counter=$(head -2000 [file name] | grep -o TTT | wc -l)
grep can also count for you: counter=$(grep -c TTT "$infile"). Note that -c counts matching lines, not occurrences, so a line containing TTT twice is counted once. You can limit the number of matching lines with -m NUM, --max-count=NUM, which makes grep stop at the end of the file OR after NUM matching lines.
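The difference between counting lines and counting occurrences is easy to see on a small sample. This is a sketch with made-up sample text; -m is available in GNU and BSD grep but is not required by POSIX.

```shell
sample=$(mktemp)
printf '%s\n' 'TTT and TTT again' 'no match here' 'TTT' > "$sample"

# Lines that contain TTT at least once.
lines=$(grep -c TTT "$sample")

# Every occurrence of TTT; $(( )) strips the padding some wc versions add.
hits=$(( $(grep -o TTT "$sample" | wc -l) ))

# -m 1 stops after the first matching line, which here holds two occurrences.
first_line_hits=$(( $(grep -m 1 TTT "$sample" | grep -o TTT | wc -l) ))

echo "lines=$lines hits=$hits first_line_hits=$first_line_hits"
# lines=2 hits=3 first_line_hits=2
rm -f "$sample"
```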

Obtaining the total of coincidences with multiple pattern using grep command

I have a file in Linux that contains these strings:
CALLTMA
Starting
Starting
Ending
Starting
Ending
Ending
CALLTMA
Ending
I need the count of each string (e.g. #Ending, #Starting, #CALLTMA). For my example I need to obtain:
CALLTMA : 2
Starting: 3
Ending : 4
I can obtain this output when I execute 3 commands:
grep -i "Starting" "/myfile.txt" | wc -l
grep -i "Ending" "/myfile.txt" | wc -l
grep -i "CALLTMA" "/myfile.txt" | wc -l
I want to know if it is possible to obtain the same output using only one command.
I tried running this command:
grep -iE "CALLTMA|Starting|Ending" "/myfile.txt" | wc -l
But this returned the total number of matches for all patterns together. I appreciate your help.
Use sort and uniq:
sort myfile.txt | uniq -c
The -c adds the counts to the unique lines. If you want to sort the output by frequency, add
| sort -n
to the end (and change to -nr if you want the descending order).
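Put together on the sample data from the question, the pipeline might look like the sketch below. The final awk step is an addition that only rearranges the output of uniq -c into the "name : count" layout the question asks for.

```shell
counts=$(printf '%s\n' CALLTMA Starting Starting Ending Starting Ending Ending CALLTMA Ending |
    sort | uniq -c | sort -nr |
    awk '{ printf "%-8s : %d\n", $2, $1 }')   # uniq -c prints "count name"; swap the columns
printf '%s\n' "$counts"
```

With this input the counts are 4 for Ending, 3 for Starting and 2 for CALLTMA, listed in that order because of sort -nr.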
A simple awk way to handle this:
awk '{counts[$1]++} END{for (c in counts) print c, counts[c]}' file
Starting 3
Ending 4
CALLTMA 2
grep -c will work. You can put it all together in a short script:
for i in Starting CALLTMA Ending; do
printf "%-8s : %d\n" "$i" $(grep -c "$i" file.txt)
done
(to enter the search terms as arguments, just use the arguments array for the loop list, e.g. for i in "$@"; do)
Output
Starting : 3
CALLTMA : 2
Ending : 4
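A sketch of the same loop wrapped in a function that takes the file and the search terms as arguments. The name count_terms and the temporary sample file are assumptions made for this example.

```shell
# count_terms FILE TERM...: print "TERM : N" for each search term,
# where N is the number of lines in FILE that contain TERM.
# (Hypothetical helper name.)
count_terms() {
    local file=$1
    shift                      # the remaining arguments are the search terms
    for term in "$@"; do
        printf '%-8s : %d\n' "$term" "$(grep -c -- "$term" "$file")"
    done
}

sample=$(mktemp)
printf '%s\n' CALLTMA Starting Starting Ending Starting Ending Ending CALLTMA Ending > "$sample"
report=$(count_terms "$sample" Starting CALLTMA Ending)
printf '%s\n' "$report"
rm -f "$sample"
```

Each term costs one pass over the file, so for many terms or large files the single-pass sort | uniq -c or awk approaches scale better.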

combining the text files in to one text file

I have a requirement like the following.
I am using Linux.
I have a set of text files: text1.txt, text2.txt, text3.txt.
Now I am combining them into one final text file.
text1.txt
1
NULL
NULL
4
text2.txt
1
2
NULL
4
text3.txt
a
b
c
d
I am using the following command:
paste -d ' ' text1.txt text2.txt text3.txt >> text4.txt
I am getting:
text4.txt
1 1 a
2 b
c
4 4 d
But I want the output to look like this:
text4.txt
1 1 a
NULL 2 b
NULL NULL c
4 4 d
NOTE: NULL means a space.
I am passing text4.txt to another loop as input, where I read the values by position.
Thanks in advance.
I expect that you want TABs separating your records in file4.txt... what about this?
NLINES=$(wc -l file1.txt | awk '{print $1}')
rm -f file4.txt
for i in $(seq 1 $NLINES); do
rec1=$(sed -n "$i p" file1.txt)
rec2=$(sed -n "$i p" file2.txt)
rec3=$(sed -n "$i p" file3.txt)
echo -e "$rec1\t$rec2\t$rec3" >> file4.txt
done
But actually paste, without "-d ' '" gave the same exact result!
You can achieve the same with awk:
awk '{a[FNR]=a[FNR]$0" "}END{for(i=1;i<=length(a);i++)print a[i]}' text1.txt text2.txt text3.txt >> text4.txt
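Because the combined file is later read by position, one option is to join the columns with a delimiter that is not whitespace. The shell's read collapses runs of spaces and tabs, but it keeps empty fields between non-whitespace separators. The sketch below assumes that the missing values are empty lines and that '|' never occurs in the data; the temporary directory is only there to make the example self-contained.

```shell
dir=$(mktemp -d)
printf '%s\n' 1 '' '' 4 > "$dir/text1.txt"   # '' stands for a missing value
printf '%s\n' 1 2 '' 4  > "$dir/text2.txt"
printf '%s\n' a b c d   > "$dir/text3.txt"

# Join the three files line by line, separated by '|'.
paste -d '|' "$dir/text1.txt" "$dir/text2.txt" "$dir/text3.txt" > "$dir/text4.txt"

# Read the columns back by position; empty fields stay empty.
rows=""
while IFS='|' read -r c1 c2 c3; do
    rows="$rows[$c1][$c2][$c3] "
done < "$dir/text4.txt"
echo "$rows"
# [1][1][a] [][2][b] [][][c] [4][4][d]
rm -rf "$dir"
```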

Find unique lines

How can I find the unique lines and remove all duplicates from a file?
My input file is
1
1
2
3
5
5
7
7
I would like the result to be:
2
3
sort file | uniq will not do the job; it shows every value once.
uniq has the option you need:
-u, --unique
only print unique lines
$ cat file.txt
1
1
2
3
5
5
7
7
$ uniq -u file.txt
2
3
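Use as follows (note that this keeps one copy of each duplicated line; use uniq -u instead to drop duplicated lines entirely):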
sort < filea | uniq > fileb
You could also print out the unique value in "file" using the cat command by piping to sort and uniq
cat file | sort | uniq -u
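While sort takes O(n log(n)) time, I prefer using
awk '!seen[$0]++'
awk '!seen[$0]++' is an abbreviation for awk '!seen[$0]++ {print}': it prints a line (= $0) only if seen[$0] is still zero, i.e. the first time that line appears, and then increments the counter. Note that this keeps the first copy of each duplicated line.
It takes more space but only O(n) time.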
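The sketch below contrasts the two behaviours on the input from the question: collapsing duplicates versus keeping only the lines that occur exactly once. The second awk program is an illustration written for this example, not one of the original answers; it buffers the lines in memory so that the input order is preserved.

```shell
input=$(printf '%s\n' 1 1 2 3 5 5 7 7)

# Keeps the first copy of every line (duplicates collapsed, order preserved).
deduped=$(printf '%s\n' "$input" | awk '!seen[$0]++' | tr '\n' ' ')

# Keeps only the lines that occur exactly once, as the question asks.
singles=$(printf '%s\n' "$input" | awk '
    { lines[NR] = $0; count[$0]++ }          # remember each line and count it
    END {
        for (i = 1; i <= NR; i++)
            if (count[lines[i]] == 1) print lines[i]
    }' | tr '\n' ' ')

echo "deduped: $deduped"
echo "singles: $singles"
# deduped: 1 2 3 5 7
# singles: 2 3
```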
I find this easier.
sort -u input_filename > output_filename
-u stands for unique.
You can use:
sort data.txt | uniq -u
This sorts the data and keeps only the lines that occur exactly once.
uniq -u has been driving me crazy because it did not work.
So instead of that, if you have python (most Linux distros and servers already have it):
Assuming you have the data file in notUnique.txt
# Python 3
# Assumes the file has one value per line;
# otherwise adjust split() accordingly.
uniqueData = []
with open('notUnique.txt') as f:
    fileData = f.read().split('\n')
for i in fileData:
    # Keep non-blank values that have not been collected yet.
    if i.strip() != '' and i not in uniqueData:
        uniqueData.append(i)
print(uniqueData)
# Another option (fewer keystrokes):
print(set(open('notUnique.txt').read().split('\n')))
Note that due to empty lines, the final set may contain '' or whitespace-only strings. You can remove those later. Or just get away with copying from the terminal ;)
Just FYI, From the uniq Man page:
"Note: 'uniq' does not detect repeated lines unless they are adjacent. You may want to sort the input first, or use 'sort -u' without 'uniq'. Also, comparisons honor the rules specified by 'LC_COLLATE'."
One correct way to invoke it:
sort nonUnique.txt | uniq
Example run:
$ cat x
3
1
2
2
2
3
1
3
$ uniq x
3
1
2
3
1
3
$ uniq -u x
3
1
3
1
3
$ sort x | uniq
1
2
3
Spaces might be printed, so be prepared!
uniq -u < file will do the job.
uniq should do fine if your file is or can be sorted; if you can't sort the file for some reason, you can use awk:
awk '{a[$0]++}END{for(i in a)if(a[i]<2)print i}'
sort -d "file name" | uniq -u
This worked for me in a similar case. Use it if the file is not sorted.
You can drop the sort if the file is already sorted.
This was the first thing I tried:
skilla:~# uniq -u all.sorted
76679787
76679787
76794979
76794979
76869286
76869286
......
After doing a cat -e all.sorted
skilla:~# cat -e all.sorted
$
76679787$
76679787 $
76701427$
76701427$
76794979$
76794979 $
76869286$
76869286 $
Every second line has a trailing space :(
After removing all trailing spaces it worked!
thank you
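A minimal sketch of that clean-up step: strip trailing whitespace before comparing lines, so that "76679787" and "76679787 " count as the same value. The first six values come from the listing above; the last one (76999999) is a made-up entry added so that the example has a line that really is unique.

```shell
singles=$(printf '%s\n' '76679787' '76679787 ' '76701427' '76701427' \
                        '76794979' '76794979 ' '76999999' |
    sed 's/[[:space:]]*$//' |   # remove trailing spaces and tabs
    sort | uniq -u)             # then keep only the lines that occur once
echo "$singles"
# 76999999
```

cat -e, as used above, is a quick way to spot such invisible differences because it marks each line end with a $.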
Instead of sorting and then using uniq, you could also just use sort -u. From sort --help:
-u, --unique with -c, check for strict ordering;
without -c, output only the first of an equal run
