I would like to count the occurrences of strings from a certain file using pipelines, without the awk and sed commands.
my_file content:
ls -al
bash
cat datoteka.txt
cat d.txt | sort | less
/bin/bash
The terminal command that I use:
cat $my_file | cut -d ' ' -f 1 | tr '|' '\n' | xargs -r -L1 basename | sort | uniq -c | xargs -r -L1 sh -c 'echo $1 $0'
Desired output:
bash 2
cat 2
less 1
ls 1
sort 1
In my case, I get:
bash 2
cat 2
ls 1
_sort 1 (not counted)
_less 1 (not counted)
The sort and less commands are not counted because of the whitespace (which I marked with _) in front of those two strings. How can I improve my code to remove this blank space before "sort" and "less"? Thanks in advance!
Update: Here is a second and longer example of an input file:
nl /etc/passwd
seq 1 10 | tr "\n" ","
seq 1 10 | tr -d 13579 | tr -s "\n "
seq 1 100 | split -d -a 2 -l10 - blabla-
uname -a | cut -d" " -f1,3
cut -d: -f1 /etc/passwd > fst
cut -d: -f3 /etc/passwd > scnd
ps -e | column
echo -n ABC | wc -m -c
cmp -s dat1.txt dat1.txt ; echo $?
diff dat1 dat2
ps -e | grep firefox
echo dat1 dat2 dat3 | tr " " "\n" | xargs -I {} -p ln -s {}
The problem with the code in the question, as you were aware, was with the cut statement. The following replaces cut with a shell while loop that also includes the basename command:
$ tr '|' '\n' <my_file | while read cmd other; do basename "$cmd"; done | sort | uniq -c | xargs -r -L1 sh -c 'echo $1 $0'
bash 2
cat 2
less 1
ls 1
sort 1
Alternate Sorting
The above sorts the results alphabetically by the name of the command. If instead we want to sort in descending numerical order of number of occurrences, then:
tr '|' '\n' <my_file | while read cmd other; do basename "$cmd"; done | sort | uniq -c | xargs -r -L1 sh -c 'echo $1 $0' | sort -snrk2
Applying this command to the second input example in the question:
$ tr '|' '\n' <file2 | while read cmd other; do basename "$cmd"; done | sort | uniq -c | xargs -r -L1 sh -c 'echo $1 $0' | sort -snrk2
tr 4
cut 3
seq 3
echo 2
ps 2
cmp 1
column 1
diff 1
grep 1
nl 1
split 1
uname 1
wc 1
xargs 1
while IFS='|' read -ra commands; do
    for cmd in "${commands[@]}"; do
        set -- $cmd    # unquoted to discard irrelevant whitespace
        basename "$1"
    done
done < my_file |
sort |
uniq -c |
while read num cmd; do
    echo "$cmd $num"
done
bash 2
cat 2
less 1
ls 1
sort 1
Related
tr -c '[:alnum:]' '[\n*]' < 4300-0.txt | sort | uniq -c | sort -nr | head
The above command retrieves unique words along with their counts. I'd like to retrieve punctuation marks along with the unique word counts.
What is the way to achieve this?
You could split your input with tee and extract punctuation and alphanumeric characters separately.
echo "Helo, world!" |
{
  tee >(tr -c '[:alnum:]' '\n' >&3) |
  tr -c '[:punct:]' '\n'
} 3>&1 |
sed '/^$/d' |
sort | uniq -c | sort -nr | head
should output:
1 world
1 Helo
1 !
1 ,
A short sed script also seems to work:
echo "Helo, world!
OK!" |
sed '
s/\([[:alnum:]]\+\)\([^[:alnum:]]\)/\1\n\2/g
s/\([[:punct:]]\+\)\([^[:punct:]]\)/\1\n\2/g
s/[^[:punct:][:alnum:]]/\n/g
' |
sed '/^$/d' |
sort | uniq -c | sort -nr | head
should output:
2 !
1 world
1 OK
1 Helo
1 ,
You can use [:punct:] to retrieve the punctuation marks, and you can run:
tr -c '[:alnum:][:punct:]' '[\n*]' < 4300-0.txt | sort | uniq -c | sort -nr | head
It will print out the punctuation marks as well. For example, if your txt file contains:
aaa,
aaa
the output will be:
1 aaa
1 aaa,
I have logs in Redmine about users that have connected from an IP of my server. I can get the user name via this command:
tail -n 100000 /usr/share/redmine/log/production.log | grep -A 3 "192.168.110.50" | grep "Current user" | awk '{print $3}' | head -n 1 | tail -n 1
I need to store this value in a variable.
No problem:
temp=$(tail -n 100000 /usr/share/redmine/log/production.log | grep -A 3 "192.168.110.50" | grep "Current user" | awk '{print $3}' | head -n 1 | tail -n 1)
It works, but it can return the user name anonymous. To get a different user name I should write head -n 2; if it is still anonymous, I can change my formula to head -n 3.
So of course I can do this:
#!/bin/bash
temp=$(tail -n 100000 /usr/share/redmine/log/production.log | grep -A 3 "192.168.110.50" | grep "Current user" | awk '{print $3}' | head -n 1 | tail -n 1)
if [[ $temp == "anonymous" ]]; then
temp=$(tail -n 100000 /usr/share/redmine/log/production.log | grep -A 3 "192.168.110.50" | grep "Current user" | awk '{print $3}' | head -n 2 | tail -n 1)
fi
But that will only work for one iteration. I tried:
while [ $tmp != "anonymous" ]; do
temp=$(tail -n 100000 /usr/share/redmine/log/production.log | grep -A 3 "192.168.110.50" | grep "Current user" | awk '{print $3}' | head -n ((var=var+1)) | tail -n 1)
done
But of course it does not work. I can't figure out the logic: how can I do this? Can you help me? Thanks for your attention.
The immediate problem is that you're setting the variable temp, but checking tmp. You also need double-quotes around the variable in case it's blank, a wildcard, or something else troublesome.
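For what it's worth, the corrected loop might look something like this (a minimal sketch, untested against real Redmine logs, with an arbitrary cap of 10 tries so it can't loop forever if every match is anonymous):
n=1
temp="anonymous"
while [ "$temp" = "anonymous" ] && [ "$n" -le 10 ]; do
    # same pipeline as in the question, but take the n-th match on each pass
    temp=$(tail -n 100000 /usr/share/redmine/log/production.log |
        grep -A 3 "192.168.110.50" | grep "Current user" |
        awk '{print $3}' | head -n "$n" | tail -n 1)
    n=$((n + 1))
done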
But there's an easier way, because awk can handle most of the work by itself:
tail -n 100000 /usr/share/redmine/log/production.log |
grep -A 3 "192.168.110.50" |
awk '/Current user/ {if ($3 != "anonymous") {print $3; exit}}'
It'd actually be possible to have awk handle looking at the lines after "192.168.110.50" too, but that'd be a little more complicated. BTW, since I don't have any redmine logs (let alone yours) to test with, this has not been properly tested.
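If you did want awk to do that part as well, a rough sketch (equally untested) could keep a small countdown of lines after each IP hit instead of relying on grep -A 3:
tail -n 100000 /usr/share/redmine/log/production.log |
awk '/192\.168\.110\.50/ { after = 3; next }   # open a 3-line window after each IP match
     after-- > 0 && /Current user/ && $3 != "anonymous" { print $3; exit }'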
You can also use grep -v:
temp=$(tail -n 100000 /usr/share/redmine/log/production.log | grep -A 3 "192.168.110.50" | grep "Current user" | grep -v "anonymous" | awk '{print $3}' | head -n 1 )
Note that you don't need the final tail -n 1 after head -n 1.
I am trying to find the highest number among the strings in a file. For example, in the file password.txt we have:
jaime:45:/home/jaime:/bin/bash
sofia:113:/home/sofia:/bin/bash
marta:2015:/home/marta:/bin/bash
pedro:2024:/home/pedro:/bin/bash
So the highest number should be 2024, and we have to save it in a variable:
number=2024
I've tried several things with grep, awk, sed, and even sort, but without success.
I suggest:
number=$(cut -d: -f 2 file | sort -n | tail -n 1)
Awk to the rescue!
awk -F":" 'BEGIN{max=0}{if(($2)>max) max=$2}END {print max}' file
2024
To save it in a variable,
max="$( awk -F":" 'BEGIN{max=0}{if(($2)>max) max=$2}END {print max}' file)"
printf "%d\n" "$max"
2024
Try this:
number=$(grep -o '[0-9]*' password.txt | sort -nr | head -1)
@Thotensar: If your Input_file is the same as the sample input shown, then the following may help:
awk -F":" '{Q=Q>$2?Q:$2} END{print Q}' Input_file
I hope this helps you.
Fahrad's answer gives me the expected result in a somewhat different context: finding the highest of a range of counts, in order to use it for a percentage calculation afterwards:
cat $(dirname "$0")/wordcount.txt | xargs -n 1 | sort -g | uniq -c | paste -s --delimiters=";" | tr -s ' ' 'x' | sed 's/;x/; /g' | sed 's/x/ x /g' | cut -c 4- > $(dirname "$0")/wordcount_temp.txt
RESULT=$(cat $(dirname "$0")/wordcount_temp.txt)
echo "$RESULT"
echo
MAXIMUM=$(grep -Eo '[0-9]{1,} x' $(dirname "$0")/wordcount_temp.txt | sort -nr | head -1 | sed 's/ x//g')
echo "$MAXIMUM"
Gives:
12 x 0; 14 x 1; 17 x 2; 7 x 3; 3 x 4; 8 x 5; 11 x 6; 18 x 7; 9 x 8; 13 x 9
18
Thank you.
I'm trying to run a shell script using cron at 15 and 45 minutes past every hour. But for some vague reason it always produces empty output when executed by cron, whereas when I run it from a terminal with ./my_script.sh it produces the expected results. I have read many answers relating to this question, but none could solve my problem.
Code:
#!/bin/bash
PATH=/bin:/home/mpadmin/bin:/opt/ant/bin:/opt/cc/bin:/opt/cvsconsole:/opt/cvsconsole:/opt/cvsconsole:/sbin:/usr/bin:/usr/lib/qt-3.3/bin:/usr/local/bin:/usr/local/sbin:/usr/sbin:/var/mp/95930/scripts:/var/mp/95930/opt/bin:/opt/prgs/dlc101c/bin:/opt/cvsconsole
tail -n 1000000 conveyor2.log | grep -P 'curingresult OK' | sed 's/FT\ /FT/g' |awk '{print $5 $13}' |sed 's/\"//g' | uniq | sort -n |uniq > /var/www/html/95910/master_list.txt
tail -n 1000000 registration.log | grep -P 'TirePresent: 8' | sed 's/GT\ /GT/g' |awk '{print $7 $15}' |sed 's/\"//g' | uniq | sort -n |uniq > /var/www/html/95910/TBM_list.txt
My crontab:
# run 15 and 45 mins every hour, every day
15,47 * * * * sh /var/mp/95910/log/update_master_list.sh
Permissions: all files have read, write, and execute permissions for all users.
I hope I have given all the relevant and necessary info.
Probably you need to change to the /var/mp/95910/log/ directory first...
#!/bin/bash
cd /var/mp/95910/log/
PATH=/bin:/home/mpadmin/bin:/opt/ant/bin:/opt/cc/bin:/opt/cvsconsole:/opt/cvsconsole:/opt/cvsconsole:/sbin:/usr/bin:/usr/lib/qt-3.3/bin:/usr/local/bin:/usr/local/sbin:/usr/sbin:/var/mp/95930/scripts:/var/mp/95930/opt/bin:/opt/prgs/dlc101c/bin:/opt/cvsconsole
tail -n 1000000 conveyor2.log | grep -P 'curingresult OK' | sed 's/FT\ /FT/g' |awk '{print $5 $13}' |sed 's/\"//g' | uniq | sort -n |uniq > /var/www/html/95910/master_list.txt
tail -n 1000000 registration.log | grep -P 'TirePresent: 8' | sed 's/GT\ /GT/g' |awk '{print $7 $15}' |sed 's/\"//g' | uniq | sort -n |uniq > /var/www/html/95910/TBM_list.txt
or specify the file paths explicitly
#!/bin/bash
PATH=/bin:/home/mpadmin/bin:/opt/ant/bin:/opt/cc/bin:/opt/cvsconsole:/opt/cvsconsole:/opt/cvsconsole:/sbin:/usr/bin:/usr/lib/qt-3.3/bin:/usr/local/bin:/usr/local/sbin:/usr/sbin:/var/mp/95930/scripts:/var/mp/95930/opt/bin:/opt/prgs/dlc101c/bin:/opt/cvsconsole
tail -n 1000000 /var/mp/95910/log/conveyor2.log | grep -P 'curingresult OK' | sed 's/FT\ /FT/g' |awk '{print $5 $13}' |sed 's/\"//g' | uniq | sort -n |uniq > /var/www/html/95910/master_list.txt
tail -n 1000000 /var/mp/95910/log/registration.log | grep -P 'TirePresent: 8' | sed 's/GT\ /GT/g' |awk '{print $7 $15}' |sed 's/\"//g' | uniq | sort -n |uniq > /var/www/html/95910/TBM_list.txt
So I've been goofing around with this since last night, and I can get a lot of things to happen, just not what I want.
I need code to find the file with the most lines in a directory and then print the name of that file and the number of lines it has.
I can get the entire directory's lines to print but can't seem to narrow the field so to speak.
Any help for a fool of a learner?
wc -l $1/* 2>/dev/null
| grep -v ' total$'
| sort -n -k1
| tail -1l
After some pro help in another question, this is where I got to, but it returns them all, and doesn't print their line counts.
The following awk command should do the job for you, and you can avoid all the redundant piped commands:
wc -l $1/* | awk '$2 != "total"{if($1>max){max=$1;fn=$2}} END{print max, fn}'
UPDATE: To avoid the last line of wc's output, this might be a better awk command:
wc -l $1/* | awk '{arr[cnt++]=$0} END {for (i=0; i<length(arr)-1; i++)
{split(arr[i], a, " "); if(a[1]>max) {max=a[1]; fn=a[2]}} print max, fn}'
You can try:
wc -l $1/* | grep -v total | sort -g | tail -1
Actually, to avoid the grep, which would also remove files whose names contain "total":
for f in $1/*; do wc -l $f; done | sort -g | tail -1
Or even better, as suggested in the comments:
wc -l $1/* | sort -rg | sed -n '2p'
You can even make it a function:
function get_biggest_file() {
    wc -l $* | sort -rg | sed -n '2p'
}
% ls -l
... 0 Jun 12 17:33 a
... 0 Jun 12 17:33 b
... 0 Jun 12 17:33 c
... 0 Jun 12 17:33 d
... 25 Jun 12 17:33 total
% get_biggest_file ./*
5 total
EDIT2: Using the function I gave, you can simply output what you need as follows:
get_biggest_file $1/* | awk '{print "The file \"" $2 "\" has the maximum number of lines: " $1}'
EDIT: If you try to write the pipeline as you've written it in the question, you should add a line-continuation character at the end of each line, as follows, or your shell will think you're trying to issue four separate commands:
wc -l $1/* 2>/dev/null \
| grep -v ' total$' \
| sort -n -k1 \
| tail -1l