Sort logs by date field in bash - linux

Suppose we have log lines like these:
126 Mar 8 07:45:09 nod1 /sbin/ccccilio[12712]: INFO: sadasdasdas
2 Mar 9 08:16:22 nod1 /sbin/zzzzo[12712]: sadsdasdas
1 Mar 8 17:20:01 nod1 /usr/sbin/cron[1826]: asdasdas
4 Mar 9 06:24:01 nod1 /USR/SBIN/CRON[27199]: aaaasdsd
1 Mar 9 06:24:01 nod1 /USR/SBIN/CRON[27201]: aaadas
I would like to sort this output by date and time key.
Thank you very much.
Martin

For GNU sort: sort -k2M -k3n -k4
-k2M sorts the second column as month names (so "Mar" sorts before "Apr", which a plain alphabetical sort would reverse)
-k3n sorts the third column numerically (so that "9" comes before "10")
-k4 sorts by the fourth column (the time).
See more details in the manual.
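Putting the keys together on the sample input above (a sketch; LC_ALL=C is set so the English month abbreviations parse regardless of locale, and the here-document just reproduces the sample lines):

```shell
# Sort the sample log lines by month (-k2M), day (-k3n) and time (-k4).
LC_ALL=C sort -k2M -k3n -k4 <<'EOF'
126 Mar 8 07:45:09 nod1 /sbin/ccccilio[12712]: INFO: sadasdasdas
2 Mar 9 08:16:22 nod1 /sbin/zzzzo[12712]: sadsdasdas
1 Mar 8 17:20:01 nod1 /usr/sbin/cron[1826]: asdasdas
4 Mar 9 06:24:01 nod1 /USR/SBIN/CRON[27199]: aaaasdsd
1 Mar 9 06:24:01 nod1 /USR/SBIN/CRON[27201]: aaadas
EOF
```

The two Mar 9 06:24:01 lines tie on the month, day and time-prefix comparison, so -k4 (which runs to the end of the line) breaks the tie on the rest of the text.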

A little off-topic, but anyway: this is only useful when working within file trees.
ls -l -r --sort=time
From this you can build a one-liner which, for example, deletes the oldest backup around:
ls -l -r --sort=time | grep backup | head -n 1 | while read -r line; do oldbackup=$(echo "$line" | awk '{print $NF}'); rm "$oldbackup"; done

Days need a numeric (not lexical) sort, so it should be sort -s -k 2M -k 3n -k 4,4.

You can use the sort command:
sort -M -k 2 "$logfile"
That means: sort by month (-M) starting from the second column (-k 2).


Bash sort files by filename containing year and abbreviated month [closed]

I have files for many years and all months, with example names: dir1/dir2/dir3/file_name_2017_v_2017.Jan.exp.var.txt, dir1/dir2/dir3/file_name_2017_v_2017.Feb.exp.var.txt, …, and dir1/dir2/dir3/file_name_2017_v_2017.Dec.exp.var.txt.
There is a script executing a one line command to store a list of files in an array.
ls dir1/dir2/dir3/file_name_2017_v_*.exp.var.txt
This works; however, they are out of order by month. I would like them to be sorted by YYYY.MMM. I have tried various sort commands using -M to sort lastly by month, but nothing is working. What am I missing to sort these files? I would prefer a one-line command.
Edit 1:
Using ls *.txt | sort --field-separator='.' -k 1,2M -r reverses the year order and the alphabetical order of the months. Removing the -r puts the years in chronological order, but the months are still in alphabetical order. This is not what I want: I want the files in chronological order.
Try this command:
ls dir1/dir2/dir3/file_name_2017_v_*.exp.var.txt | sort -t '.' -k 1.33,1.36n -k 2,2M
Or use _ as the field-separator:
ls dir1/dir2/dir3/file_name_2017_v_*.exp.var.txt | sort -t '_' -k 5.1,5.4n -k 5.6,5.8M
If the years before and after the _v_ can differ, you need to add another -k:
ls dir1/dir2/dir3/file_name_*_v_*.exp.var.txt | sort -t '_' -k 3.1,3.4n -k 5.1,5.4n -k 5.6,5.8M
Example (updated):
$ mkdir -p dir1/dir2/dir3
$ touch dir1/dir2/dir3/file_name_2017_v_201{5..7}.{Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec}.exp.var.txt
$ ls dir1/dir2/dir3/file_name_2017_v_*.exp.var.txt | sort -t '.' -k 1.33,1.36n -k 2,2M
$ ls dir1/dir2/dir3/file_name_2017_v_*.exp.var.txt | sort -t '_' -k 5.1,5.4n -k 5.6,5.8M
dir1/dir2/dir3/file_name_2017_v_2015.Jan.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2015.Feb.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2015.Mar.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2015.Apr.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2015.May.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2015.Jun.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2015.Jul.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2015.Aug.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2015.Sep.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2015.Oct.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2015.Nov.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2015.Dec.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2016.Jan.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2016.Feb.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2016.Mar.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2016.Apr.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2016.May.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2016.Jun.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2016.Jul.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2016.Aug.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2016.Sep.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2016.Oct.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2016.Nov.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2016.Dec.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2017.Jan.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2017.Feb.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2017.Mar.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2017.Apr.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2017.May.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2017.Jun.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2017.Jul.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2017.Aug.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2017.Sep.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2017.Oct.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2017.Nov.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2017.Dec.exp.var.txt
P.S. You need to count the character positions of the year within the field, e.g. it's 1.33,1.36n in your example.
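Rather than counting characters by hand, you can let the shell compute the positions from a sample name (a sketch; the path is the question's example, and it assumes the year always follows "_v_"):

```shell
# Compute sort's -k character positions for the year after "_v_" in field 1,
# instead of counting up to the 33rd..36th characters manually.
name='dir1/dir2/dir3/file_name_2017_v_2017.Jan.exp.var.txt'
prefix=${name%%_v_*}_v_            # everything up to and including "_v_"
printf -- '-k 1.%d,1.%dn\n' "$(( ${#prefix} + 1 ))" "$(( ${#prefix} + 4 ))"
# → -k 1.33,1.36n
```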
Just for fun, this one-liner sorts independently of the directory names and, with a bit of work, also independently of the filenames.
First, add the year and month as fields at the beginning of each record and sort by them:
find dir1/ -name '*.exp.var.txt' | sed -re 's/^.*_v_(201[5-7])\.([A-Za-z]{3})\.exp\.var\.txt$/\1 \U\2 \E&/' | LC_TIME=en_US sort -k 1n -k 2M
This returns:
2015 JAN dir1/dir2/dir3/file_name_2017_v_2015.Jan.exp.var.txt
2015 FEB dir1/dir2/dir3/file_name_2017_v_2015.Feb.exp.var.txt
2015 MAR dir1/dir2/dir3/file_name_2017_v_2015.Mar.exp.var.txt
2015 APR dir1/dir2/dir3/file_name_2017_v_2015.Apr.exp.var.txt
2015 MAY dir1/dir2/dir3/file_name_2017_v_2015.May.exp.var.txt
Then, just print the needed field:
find dir1/ -name '*.exp.var.txt' | \
sed -re 's/^.*_v_(201[5-7])\.([A-Za-z]{3})\.exp\.var\.txt$/\1 \U\2 \E&/' | \
LC_TIME=en_US sort -k 1n -k 2M | \
gawk '{ print $3 }'
Result:
dir1/dir2/dir3/file_name_2017_v_2015.Jan.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2015.Feb.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2015.Mar.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2015.Apr.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2015.May.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2015.Jun.exp.var.txt
dir1/dir2/dir3/file_name_2017_v_2015.Jul.exp.var.txt

Linux - About sorting shell output

I have output from a customised log file like this:
8 24 yum
8 24 yum
8 24 make
8 24 make
8 24 cd
8 24 cd
8 25 make
8 25 make
8 25 make
8 26 yum
8 26 yum
8 26 make
8 27 yum
8 27 install
8 28 ./linux
8 28 yum
I'd like to know if there's any way to count the number of occurrences of specific values in the third field. For example, I may want to count only cd, yum and install.
You can use awk to get the third-field values and wc -l to count them.
awk '$3=="cd"||$3=="yum"||$3=="install"||$3=="cat" {print $0}' file | wc -l
You can also use egrep, but this will look for these words not only on the third field, but everywhere else in the line.
egrep "(cd|yum|install|cat)" file | wc -l
If you want to count a specific word in the third field, you can do the above without the multiple comparisons:
awk '$3=="cd" {print $0}' file | wc -l
A classic shell script to do the job is:
awk '{print $3}' "$file" | sort | uniq -c | sort -n
Extract values from column 3 with awk, sort the identical names together, count the repeats, sort the output in increasing order of count. The sort | uniq -c | sort -n part is a common meme.
If you're using GNU awk, you can do it all in the awk script; it might be more efficient, but for really humongous files it can run out of memory where the pipeline doesn't (sort spills to disk when necessary; writing code to spill to disk in awk is not sensible).
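For example, GNU awk can do the counting and the ordering in one script via PROCINFO["sorted_in"], with no external sort/uniq pipeline (a sketch; requires gawk 4.0+, and the input is the question's sample fed on stdin):

```shell
# GNU awk only: count field 3, then print results in ascending order of count.
gawk '{ cnt[$3]++ }
      END {
          PROCINFO["sorted_in"] = "@val_num_asc"   # iterate by ascending count
          for (cmd in cnt) print cnt[cmd], cmd
      }' <<'EOF'
8 24 yum
8 24 yum
8 24 make
8 24 make
8 24 cd
8 24 cd
8 25 make
8 25 make
8 25 make
8 26 yum
8 26 yum
8 26 make
8 27 yum
8 27 install
8 28 ./linux
8 28 yum
EOF
```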
Use cut, sort and uniq:
$ cut -d" " -f3 inputfile | sort | uniq -c
2 cd
1 install
1 ./linux
6 make
6 yum
For your input this
awk '{++a[$3]}END{for(i in a)print i "\t" a[i];}' file
Would print:
cd 2
install 1
./linux 1
make 6
yum 6
Using awk to count the occurrences of field three and sort to order the output:
$ awk '{a[$3]++}END{for(k in a)print a[k],k}' file | sort -n
1 install
1 ./linux
2 cd
6 make
6 yum
So filter by command:
$ awk '/cd|yum|install/{a[$3]++}END{for(k in a)print a[k],k}' file | sort -n
1 install
2 cd
6 yum
To stop partial matches (such as /grep/ matching inside "egrep"), use the word boundaries \< and \>, so the filter becomes /\<cd\>|\<yum\>|\<install\>/.
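Note that \< and \> are a GNU extension; a portable alternative is to anchor the pattern against the field itself, which any POSIX awk accepts (a sketch, with the question's sample fed on stdin instead of a file):

```shell
# Anchoring ^...$ against field 3 makes partial matches impossible in any awk.
awk '$3 ~ /^(cd|yum|install)$/ {a[$3]++} END {for (k in a) print a[k], k}' <<'EOF' | sort -n
8 24 yum
8 24 yum
8 24 make
8 24 make
8 24 cd
8 24 cd
8 25 make
8 25 make
8 25 make
8 26 yum
8 26 yum
8 26 make
8 27 yum
8 27 install
8 28 ./linux
8 28 yum
EOF
# 1 install
# 2 cd
# 6 yum
```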
You can use grep to filter by multiple terms at the same time:
cut -f3 -d' ' file | grep -x -e yum -e make -e install | sort | uniq -c
Explanation:
The -x flag makes grep match only lines that match the pattern exactly, as if it were anchored with ^pattern$
The cut extracts the 3rd column only
We sort and count with uniq -c at the end, for efficiency, after all the junk has been removed from the input
I guess you want to count the values of yum, install and cd separately. If so, you should use three separate awk statements:
awk '$3=="cd" {print $0}' file | wc -l
awk '$3=="yum" {print $0}' file | wc -l
awk '$3=="install" {print $0}' file | wc -l

How can I list (ls) the 5 last modified files in a directory?

I know ls -t will list all files by modified time. But how can I limit these results to only the last n files?
Try using head or tail. If you want the 5 most-recently modified files:
ls -1t | head -5
The -1 (that's a one) says one file per line and the head says take the first 5 entries.
If you want the oldest 5, try
ls -1t | tail -5
The accepted answer lists only the filenames, but to get the top 5 files one can also use:
ls -lht | head -6
where:
-l outputs in a list format
-h makes output human readable (i.e. file sizes appear in kb, mb, etc.)
-t sorts output by placing most recently modified file first
head -6 shows 5 files because ls -l prints a "total" line (the total block count) as the first line of output.
I think this is a slightly more elegant and possibly more useful approach.
Example output:
total 26960312
-rw-r--r--# 1 user staff 1.2K 11 Jan 11:22 phone2.7.py
-rw-r--r--# 1 user staff 2.7M 10 Jan 15:26 03-cookies-1.pdf
-rw-r--r--# 1 user staff 9.2M 9 Jan 16:21 Wk1_sem.pdf
-rw-r--r--# 1 user staff 502K 8 Jan 10:20 lab-01.pdf
-rw-rw-rw-# 1 user staff 2.0M 5 Jan 22:06 0410-1.wmv
Use the tail command:
ls -t | tail -n 5
By default ls -t sorts output from newest to oldest, so the combination of commands to use depends in which direction you want your output to be ordered.
For the newest 5 files ordered from newest to oldest, use head to take the first 5 lines of output:
ls -t | head -n 5
For the newest 5 files ordered from oldest to newest, use the -r switch to reverse ls's sort order, and use tail to take the last 5 lines of output:
ls -tr | tail -n 5
ls -t sorts files by last-modified time (not creation time, which most filesystems don't track). Use ls -ltc if you want to sort by status-change (ctime) time instead. Thus, to list the last n: ls -lt | head -n "$n"
None of the other answers worked for me. The results contained both folders and files, which is not what I would expect.
The solution that worked for me was:
find . -type f -mmin -10 -ls
This lists all the files in the current directory modified in the last 10 minutes. It will not list the last 5 files, but it might help nevertheless.
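If you have GNU find and GNU sort, here is a sketch that does list only the newest 5 regular files (directories excluded), without parsing ls or worrying about its "total" line; %T@ and -printf are GNU extensions:

```shell
# Print each file's mtime (epoch seconds) plus its path, newest first, keep 5,
# then strip the timestamp column.
find . -type f -printf '%T@ %p\n' | sort -rn | head -n 5 | cut -d' ' -f2-
```

This still breaks on filenames containing newlines, but it is robust to spaces in names.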
If you want to watch the last five modified files, refreshed every 2 seconds, with the total number of files shown at the top, use this:
watch 'ls -Art | wc -l ; ls -ltr | tail -n 5'

How to get the latest filename alone in a directory?

I am using
ls -ltr /homedir/mydirectory/work/ |tail -n 1|cut -d ' ' -f 10
But this is a very crude way of getting the desired result, and it's also unreliable.
The output I get on simply executing
ls -ltr /homedir/mydirectory/work/ |tail -n 1
is
-rw-r--r-- 1 user pusers 1764 Apr 1 12:06 firstfile.xml
So here I get the file name.
But if the output on doing the above command is like
-rw-r--r-- 100 user pusers 1764 Apr 1 12:06 firstfile.xml
the first command fails! And understandably so, as I am cutting field 10, which is no longer where the filename sits. So how can this be refined?
Why do you use the -l flag for ls if you don't need it? Make ls simply output the filenames if you don't need more information, instead of trying to "parse" its non-uniform output (and torturing poor text-processing utilities in the process).
LAST_MODIFIED_FILE=$(ls -tr | tail -n 1)
If you really want to achieve this using your method, then use awk instead of cut:
ls -ltr /var/log/ |tail -n 1| awk '{print $9}'
Extending user529758's answer, this gives the result for a particular filename pattern:
ls -tr Filename* | tail -n 1

Unix command "uniq" & "sort"

As we know,
uniq [options] [file1 [file2]]
removes duplicate adjacent lines from sorted file1. The option -c prints each line once, prefixed with a count of its occurrences. So if we have the following result:
34 Operating System
254 Data Structure
5 Crypo
21 C++
1435 C Language
589 Java 1.6
And we sort the above data using "sort -k1nr", the result is as below:
1435 C Language
589 Java 1.6
254 Data Structure
34 Operating System
21 C++
5 Crypo
Can anyone help me out with how to output only the book names in this order (without the numbers)?
uniq -c filename | sort -k 1nr | awk '{$1="";print}'
You can also use sed for that, as follows:
uniq -c filename | sort -k1nr | sed 's/[0-9]\+ \(.\+\)/\1/g'
Test:
echo "34 Data Structure" | sed 's/[0-9]\+ \(.\+\)/\1/g'
Data Structure
This can also be done with a simplified regex (courtesy William Pursell):
echo "34 Data Structure" | sed 's/[0-9]* *//'
Data Structure
Why do you use uniq -c to print the number of occurrences, which you then want to remove with some cut/awk/sed dance?
Instead, you could just use
sort -u "$file1" "$file2" /path/to/more_files_to_glob*
Or do some systems come with a version of sort which doesn't support -u ?
