About sorting shell output in Linux

I have output from a customised log file like this:
8 24 yum
8 24 yum
8 24 make
8 24 make
8 24 cd
8 24 cd
8 25 make
8 25 make
8 25 make
8 26 yum
8 26 yum
8 26 make
8 27 yum
8 27 install
8 28 ./linux
8 28 yum
I'd like to know if there's any way to count the number of specific values in the third field. For example, I may want to count only the number of cd, yum and install.

You can use awk to get the third-field values and wc -l to count the number of matching lines.
awk '$3=="cd"||$3=="yum"||$3=="install"||$3=="cat" {print $0}' file | wc -l
You can also use egrep, but this will look for these words not only in the third field but anywhere else in the line.
egrep "(cd|yum|install|cat)" file | wc -l
If you only want to count one specific word in the third field, you can do the above with a single comparison:
awk '$3=="cd" {print $0}' file | wc -l
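If you prefer, awk can also do the counting itself, without wc (a small sketch of the same idea):
awk '$3=="cd" {n++} END {print n+0}' file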

A classic shell script to do the job is:
awk '{print $3}' "$file" | sort | uniq -c | sort -n
Extract the values from column 3 with awk, sort so the identical names are grouped together, count the repeats, then sort the output in increasing order of count. The sort | uniq -c | sort -n part is a very common idiom.
If you're using GNU awk, you can do it all in the awk script; it might be more efficient, but for really humongous files it can run out of memory where the pipeline doesn't (sort spills to disk when necessary; writing code to spill to disk in awk is not sensible).
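For reference, an all-in-awk version might look like this (a sketch; PROCINFO["sorted_in"] is a gawk extension, not POSIX awk):
awk '{count[$3]++}
     END {
         PROCINFO["sorted_in"] = "@val_num_asc"   # gawk: iterate in ascending numeric order of the counts
         for (cmd in count) print count[cmd], cmd
     }' "$file"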

Use cut, sort and uniq:
$ cut -d" " -f3 inputfile | sort | uniq -c
2 cd
1 install
1 ./linux
6 make
6 yum

For your input, this awk one-liner:
awk '{++a[$3]}END{for(i in a)print i "\t" a[i];}' file
would print:
cd 2
install 1
./linux 1
make 6
yum 6

Using awk to count the occurrences of field three and sort to order the output:
$ awk '{a[$3]++}END{for(k in a)print a[k],k}' file | sort -n
1 install
1 ./linux
2 cd
6 make
6 yum
To filter by specific commands:
$ awk '/cd|yum|install/{a[$3]++}END{for(k in a)print a[k],k}' file | sort -n
1 install
2 cd
6 yum
To stop partial matches (such as grep matching inside egrep), use the word boundaries \< and \>, so the filter would be /\<cd\>|\<yum\>|\<install\>/.
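Alternatively, you can avoid the boundary issue entirely by comparing the third field exactly (a sketch of the same idea):
awk '$3 ~ /^(cd|yum|install)$/ {a[$3]++} END {for (k in a) print a[k], k}' file | sort -n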

You can use grep to filter by multiple terms at the same time:
cut -f3 -d' ' file | grep -x -e yum -e make -e install | sort | uniq -c
Explanation:
The -x flag makes grep match only lines that match the pattern exactly, as if it were anchored with ^pattern$.
The cut extracts the 3rd column only.
The sort | uniq -c is done at the end, for efficiency, after all the junk has been removed from the input.
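If any of the commands you filter for contain regex metacharacters (such as ./linux), you could also add -F so the patterns are treated as fixed strings (a small variation, not in the original answer):
cut -f3 -d' ' file | grep -Fx -e yum -e make -e install | sort | uniq -c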

I guess you want to count the values of yum, install and cd separately. If so, you should go for three separate awk statements:
awk '$3=="cd" {print $0}' file | wc -l
awk '$3=="yum" {print $0}' file | wc -l
awk '$3=="install" {print $0}' file | wc -l
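If you would rather not repeat yourself, the same three counts can come from one small loop (a sketch; assumes a POSIX shell):
for cmd in cd yum install; do
    printf '%s %s\n' "$cmd" "$(awk -v c="$cmd" '$3 == c' file | wc -l)"
done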

Related

How would I disable accounts that have been inactive for 90 days in Linux?

I'm working on a script that disables accounts that have been inactive for 90 days. I couldn't really find an answer after researching my problem for a few days, but I did find this command on a forum:
lastlog -t 10000 > temp1; lastlog -t 90 > temp2; diff temp1 temp2; rm temp1; rm temp2
This command outputs the users that have been inactive for 90 days. I think the solution to my problem would be to:
Filter the output of this command so only the usernames are displayed (in a list, with 1 username per line).
Take this output and write it to a text file.
Run a loop where, for each line in the file, the contents of the line (which should be just a single username) are stored in a variable called "inactiveUser", and then the command usermod -L $inactiveUser is executed.
Would my proposed solution work? If so, how could it be achieved? Is there a much easier method to lock inactive accounts that I am not aware of?
You can simplify this with:
lastlog -b 90
which directly lists users who have not logged in in the past 90 days.
However, it also prints a header row and lists lots of system users.
Use tail to skip the header row:
lastlog -b 90 | tail -n+2
Then you could use grep to filter out system users:
lastlog -b 90 | tail -n+2 | grep -v 'Never log'
There may be a safer way to find real, non-system users, though, e.g.:
cd /home; find * -maxdepth 0 -type d
That issue aside, you can get just the usernames out with awk:
lastlog -b 90 | tail -n+2 | grep -v 'Never log' | awk '{print $1}'
Then either write the list to a file, or run usermod directly via a while read loop or xargs:
lastlog -b 90 | tail -n+2 | grep -v 'Never log' | awk '{print $1}' |
xargs -I{} usermod -L {}
Perhaps you should also log what you've done:
lastlog -b 90 | tail -n+2 | grep -v 'Never log' | awk '{print $1}' |
tee -a ~/usermod-L.log | xargs -I{} usermod -L {}
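For completeness, the same thing with a while read loop instead of xargs, using the inactiveUser variable from your proposed solution (a sketch; assumes it is run as root):
lastlog -b 90 | tail -n+2 | grep -v 'Never log' | awk '{print $1}' |
while read -r inactiveUser; do
    usermod -L "$inactiveUser"
done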
While the other answer works, it can be made much cleaner by using a single awk instead of tail | grep | awk:
lastlog -b 90 | awk '!/Never log/ {if (NR > 1) print $1}' | xargs -I{} usermod -L {}
The awk command checks for lines that don't contain the expression 'Never log' (!/Never log/).
NR > 1 emulates tail -n +2.
print $1 prints the first column.
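Before locking anything for real, you could preview what would be run by prefixing echo (a simple dry run):
lastlog -b 90 | awk '!/Never log/ {if (NR > 1) print $1}' | xargs -I{} echo usermod -L {}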

Validating file records shell script

I have a file with the content below and want to validate it as follows:
1. I have entries of rec$NUM, and each such record should be repeated exactly 7 times; for example, for rec1.any_attribute, rec1 should appear only 7 times in the whole file.
2. I need a validating script for this: if there are fewer than 7 or more than 7 records for a rec$NUM, the script should report that record.
The file is as follows:
rec1:sourcefile.name=
rec1:mapfile.name=
rec1:outputfile.name=
rec1:logfile.name=
rec1:sourcefile.nodename_col=
rec1:sourcefle.snmpnode_col=
rec1:mapfile.enc=
rec2:sourcefile.name=abc
rec2:mapfile.name=
rec2:outputfile.name=
rec2:logfile.name=
rec2:sourcefile.nodename_col=
rec2:sourcefle.snmpnode_col=
rec2:mapfile.enc=
rec3:sourcefile.name=abc
rec3:mapfile.name=
rec3:outputfile.name=
rec3:logfile.name=
rec3:sourcefile.nodename_col=
rec3:sourcefle.snmpnode_col=
rec3:mapfile.enc=
Please help. Thanks in advance. :)
Simple awk:
awk -F: '/^rec/{a[$1]++}END{for(t in a){if(a[t]!=7){print "Some error for record: " t}}}' test.rc
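If you also want the script's exit status to reflect the result (so it can be used from other scripts), the same awk can be extended slightly (a sketch):
awk -F: '/^rec/{a[$1]++}
         END{bad=0
             for(t in a) if(a[t]!=7){print "Some error for record: " t; bad=1}
             exit bad}' test.rc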
grep '^rec1' file.txt | wc -l
grep '^rec2' file.txt | wc -l
grep '^rec3' file.txt | wc -l
All of the above should return 7.
The commands:
grep rec file2.txt | cut -d':' -f1 | uniq -c | egrep -v '^ *7'
will print nothing if the file follows your rules, and will print the failing record if it doesn't.
(Replace uniq -c with sort | uniq -c if the record numbers can be mixed rather than grouped together.)
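A variant of the same pipeline that names the offending record and its count (a sketch):
grep '^rec' file2.txt | cut -d':' -f1 | sort | uniq -c | awk '$1 != 7 {print "record " $2 " appears " $1 " times"}'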

history and cut command: get the second field

I am trying to get the commands from the history command.
ubuntu@ip-172-31-13-192:~/redacted$ history
1 ls
2 sudo apt-get install git -y
3 git clone https://redacted@bitbucket.org/redacted/redacted.git
4 ls
5 cd redacted
ubuntu@ip-172-31-13-192:~/redacted$ history | cut -d ' ' -f 2
No output. What's wrong?
There are also spaces in the beginning of each row, so column 2 is most likely just another space. Since history's format is fixed, you could base your cut on the number of characters, like so:
[mureinik#computer /]$ history | cut -c8-
With sed:
history | sed 's/^ *[^ ]* *//'
It removes all the leading spaces along with the numbers.
This is because cut takes a single space as the field separator, so every space starts a new field.
So when your history looks like this:
1 ls
2 sudo apt-get install git -y
3 git clone https://redacted@bitbucket.org/redacted/redacted.git
4 ls
5 cd redacted
(note that each line actually begins with leading spaces)
When you do cut -d' ' -f2 you get one of the empty fields produced by those leading spaces, not the command.
How can you solve it?
Squeeze the spaces with tr:
history | tr -s ' ' | cut -d' ' -f2
Use awk to print the second field. awk treats any run of whitespace as a single separator, so the following will always print the second block of text:
history | awk '{print $2}'
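Note that this prints only the first word of each command (for example sudo or git). If you want the whole command line, you can blank out the number field instead (a sketch):
history | awk '{ $1=""; sub(/^ /, ""); print }'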

How to return substring from a linux command

I'm connecting to an Exadata machine and want to get information about the "ORACLE_HOME" variable on it. So I'm using this command:
ls -l /proc/<pid>/cwd
this is the output:
2 oracle oinstall 0 Jan 23 21:20 /proc/<pid>/cwd -> /u01/app/database/11.2.0/dbs/
I need to get the last part:
/u01/app/database/11.2.0 (I don't want the "/dbs/" there)
I will be using this command several times on different machines. So how can I get this substring from the whole output?
Awk and grep are good for these types of issues.
New:
ls -l /proc/<pid>/cwd | awk '{print ($NF) }' | sed 's#/dbs/##'
Old:
ls -l /proc/<pid>/cwd | awk '{print ($NF) }' | egrep -o '^.+[.0-9]'
Awk prints the last column of the input (which comes from your ls command), and then grep grabs the beginning of that string up to the last occurrence of digits and dots. This is a situational solution and perhaps not the best.
Parsing the output of ls is generally considered sub-optimal. I would use something more like this instead:
dirname $(readlink -f /proc/<pid>/cwd)
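To capture the result in a variable for reuse on each machine (replace <pid> as before):
oracle_home=$(dirname "$(readlink -f /proc/<pid>/cwd)")
echo "$oracle_home"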

Sorting in bash

I have been trying to get the unique values in each column of a tab delimited file in bash. So, I used the following command.
cut -f <column_number> <filename> | sort | uniq -c
It works fine and I can get the unique values in a column and its count like
105 Linux
55 MacOS
500 Windows
What I want to do is, instead of sorting by the column value names (which in this example are OS names), sort them by count, and possibly have the count in the second column of the output. So it will have to look like:
Windows 500
MacOS 105
Linux 55
How do I do this?
Use:
cut -f <col_num> <filename> |
    sort |
    uniq -c |
    sort -r -k1 -n |
    awk '{print $2" "$1}'
The sort -r -k1 -n sorts in reverse order, using the first field as a numeric value. The awk simply reverses the order of the columns. You can test the added pipeline commands thus (with nicer formatting):
pax> echo '105 Linux
55 MacOS
500 Windows' | sort -r -k1 -n | awk '{printf "%-10s %5d\n",$2,$1}'
Windows 500
Linux 105
MacOS 55
Mine:
cut -f <column_number> <filename> | sort | uniq -c | awk '{ print $2" "$1}' | sort
This swaps the column order with awk and then just sorts the output (alphabetically by name rather than by count).
Hope this will help you
Using sed, based on tagged regular expressions:
cut -f <column_number> <filename> | sort | uniq -c | sort -r -k1 -n | sed 's/^ *\([0-9]*\)[ ]*\(.*\)/\2 \1/'
Doesn't produce output in a neat format though.
