Linux Grouping and Counting Files by attribute

I am trying to list the months in which files were created, with a count per month, using the following code.
ls -l|awk '{A[$6":"]++}END{for (i in A){print i" "A[i]}}'
I am using the code below to validate each count.
ls -la | grep -c "Jan"
However, as you can see from my output:
: 1
Jan: 19
Feb: 11
Mar: 28
Apr: 10
May: 14
Jun: 24
Jul: 4
Aug: 16
Sep: 10
Oct: 30
Nov: 4
Dec: 1
Output of ls|grep
I end up with one record showing no month. Also, both January and December are short by 1. Can anyone assist?

You could do it this way, using awk and sort:
$ ls -l | awk '$6!=""{m[$6]++}END{for(i in m){printf "%s : %s%s",i,m[i],ORS }}' | sort -k1M
Jan : 7
Mar : 1
Apr : 8
Aug : 2
The problem comes from the first line of ls -l (the "total" line), which doesn't contain a month field.
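The same idea can be sketched as a small function that skips the "total" line by field count rather than by testing for an empty month. This is only an illustration: the function name count_by_month and the sample listing are made up, and it assumes the usual nine-column ls -l layout and a sort that supports -M.

```shell
# Count entries per month from `ls -l`-style text read on stdin.
# NF >= 9 skips the leading "total N" line, which has no month field.
count_by_month() {
    awk 'NF >= 9 { m[$6]++ } END { for (i in m) printf "%s : %d\n", i, m[i] }' |
        LC_ALL=C sort -k1,1M
}

# Hypothetical sample listing, so the sketch does not depend on a real directory.
sample='total 12
-rw-r--r-- 1 user user 10 Jan  3 10:00 a.txt
-rw-r--r-- 1 user user 20 Mar 14 11:00 b.txt
-rw-r--r-- 1 user user 30 Jan 20 12:00 c.txt'

result=$(printf '%s\n' "$sample" | count_by_month)
printf '%s\n' "$result"
```

On a real directory this would be called as ls -l | count_by_month. File names containing spaces still count correctly, since only the field count and the sixth field are used.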

Related

Grep total amount of specific elements based on date

Is there a way in Linux to filter multiple files containing a bunch of data in one command, without writing a script?
For this example I want to know how many males appear per date. A complication is that a specific date (January 3rd) appears in 2 separate files:
file1
Jan 1 john male=yes
Jan 1 james male=yes
Jan 2 kate male=no
Jan 3 jonathan male=yes
file2
Jan 3 alice male=no
Jan 4 john male=yes
Jan 4 jonathan male=yes
Jan 4 alice male=no
I want the total number of males for each date across all files. If there are no males for a specific date, no output should be given for that date.
Jan 1 2
Jan 3 1
Jan 4 2
The only way I can think of is to count the males for one specific date at a time, but this would not scale: in real-world cases there could be many more files, and manually entering all the dates would be a waste of time. Any help would be appreciated, thank you!
localhost:~# cat file1 file2 | grep "male=yes" | grep "Jan 1" | wc -l
2

grep -h 'male=yes' file? | \
cut -c-6 | \
awk '{c[$0] += 1} END {for(i in c){printf "%6s %4d\n", i, c[i]}}'
grep prints the male=yes lines, cut keeps only the first 6 characters (the date), and awk counts each date and prints every date with its count at the end.
Given your files, the output will be:
Jan 1 2
Jan 3 1
Jan 4 2
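An awk-only variant is also possible. The sketch below keys on the month and day fields instead of fixed character columns, so it does not depend on how the day is padded. It assumes the four-column layout from the question (month, day, name, male=...) and recreates the two sample files in a temporary directory.

```shell
# Recreate the two sample files from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'Jan 1 john male=yes' 'Jan 1 james male=yes' \
              'Jan 2 kate male=no' 'Jan 3 jonathan male=yes' > "$tmp/file1"
printf '%s\n' 'Jan 3 alice male=no' 'Jan 4 john male=yes' \
              'Jan 4 jonathan male=yes' 'Jan 4 alice male=no' > "$tmp/file2"

# Count lines whose 4th field is exactly male=yes, keyed by "month day".
# The order of "for (d in c)" is unspecified, so the result is sorted afterwards.
result=$(awk '$4 == "male=yes" { c[$1 " " $2]++ }
              END { for (d in c) print d, c[d] }' "$tmp"/file? |
         LC_ALL=C sort -k1,1M -k2,2n)
printf '%s\n' "$result"

rm -rf "$tmp"
```

Dates with no males never get an entry in the array, so they produce no output line.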

Awk to find lines within date range in a file with custom date format

I'm trying to find all lines between a date range in a file. However dates are formatted in a non standard way. Is there a way for awk to read these? The log file is formatted like so:
Jan 5 11:34:00 log messages here
Jan 13 16:21:00 log messages here
Feb 1 01:14:00 log messages here
Feb 10 16:32:00 more messages
Mar 7 16:32:00 more messages
Apr 21 16:32:00 more messages
For example, to pull all lines between January 1st and February 10th, I've tried:
awk 'BEGIN{IGNORECASE=1} ($0>=from&&$0<=to)' from="Jan 1 00:00:00" to="Feb 10 23:59:59"
It's a system that only has access to awk so I am kind of limited. Any help would be greatly appreciated.
EDIT:
Thanks a lot for the answers so far! They've worked great and have helped my understanding of awk. However, I forgot to mention that I need to be able to include the time as well.
For example finding lines in the range including and between:
Jan 1 12:34:00
and
Feb 20 14:23:01
EDIT2: Based on the answer provided by @Cyrus, I decided to use this to parse through times as well:
awk -v start="0101 10:23:22" -v stop="0210 14:21:02" \
'BEGIN{m["Jan"]="01"; m["Feb"]="02"; m["Mar"]="03"; m["Apr"]="04"}
{original = $0; $1 = m[$1]; $2 = sprintf("%.2d", $2)}
($1 $2 " " $3) >= start && ($1 $2 " " $3) <= stop {print original}' file

$ cat tst.awk
{
mthNr = (index("JanFebMarAprMayJunJulAugSepOctNovDec",$1)+2)/3
date = sprintf("%02d%02d", mthNr, $2)
}
(date >= from) && (date <= to)
$ awk -v from='0101' -v to='0210' -f tst.awk file
Jan 5 11:34:00 log messages here
Jan 13 16:21:00 log messages here
Feb 1 01:14:00 log messages here
Feb 10 16:32:00 more messages
Massage to suit...

With awk, where 0101 is January 1st and 0210 is February 10th:
awk -v start="0101" -v stop="0210" \
'BEGIN{m["Jan"]="01"; m["Feb"]="02"; m["Mar"]="03"; m["Apr"]="04"}
{original = $0; $1 = m[$1]; $2 = sprintf("%.2d", $2)}
$1$2 >= start && $1$2 <= stop {print original}' file
Output:
Jan 5 11:34:00 log messages here
Jan 13 16:21:00 log messages here
Feb 1 01:14:00 log messages here
Feb 10 16:32:00 more messages
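To take the time of day into account as well, the comparison key can be extended with the third field. The following is a minimal sketch that combines the index()-based month lookup with an "MMDD HH:MM:SS" key. The from and to bounds are illustrative values, and it assumes every line starts with a three-letter English month, a day, and a zero-padded time.

```shell
# Sample log from the question.
log='Jan 5 11:34:00 log messages here
Jan 13 16:21:00 log messages here
Feb 1 01:14:00 log messages here
Feb 10 16:32:00 more messages
Mar 7 16:32:00 more messages
Apr 21 16:32:00 more messages'

# Build a sortable "MMDD HH:MM:SS" string per line and compare it as text.
# from/to must use the same layout, including the single space.
result=$(printf '%s\n' "$log" |
    awk -v from='0101 12:34:00' -v to='0210 14:23:01' '
    {
        mthNr = (index("JanFebMarAprMayJunJulAugSepOctNovDec", $1) + 2) / 3
        key = sprintf("%02d%02d %s", mthNr, $2, $3)
    }
    key >= from && key <= to')
printf '%s\n' "$result"
```

With these bounds the Feb 10 16:32:00 line is left out, because it falls after 14:23:01 on the final day. Since the log carries no year, a range that crosses New Year cannot be expressed this way.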

extract header if pattern in a column matches

I am trying to extract and print the header of a column if a pattern in that column matches.
Here is an example:
[user ~]$ cal |sed 's/July 2014//'
Su Mo Tu We Th Fr Sa
       1  2  3  4  5
 6  7  8  9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28 29 30 31
Expected output :
If the input date is 31, print the day of the week that the 31st falls on.
Just to be clear, I cannot use the date -d flag, as it's not supported by my OS. It will probably take awk to crack this.
[user ~]$ date -d 20140731 +%A
Thursday
I hope I am able to convey my question and concern clearly.

Using awk:
cal | awk -v date=31 'NR == 2 { split($0, header) } NR > 2 { for (i = 1; i <= NF; ++i) if ($i == date) { print header[NR == 3 ? i + 7 - NF : i]; exit } }'
Output:
Th

Here is a GNU awk solution:
cal | awk -v date=31 -v FIELDWIDTHS="3 3 3 3 3 3 3 3" 'NR==2 {split($0,a)} {for (i=1;i<=NF;i++) if ($i==date) print a[i]}'
Th
The date to look up is passed in as a variable, so it can be changed to whatever you like.
Or it could be written like this:
cal | awk 'NR==2 {split($0,a)} {for (i=1;i<=NF;i++) if ($i==date) print a[i]}' FIELDWIDTHS="3 3 3 3 3 3 3 3" date=31
PS: FIELDWIDTHS was introduced in GNU awk 2.13.

Parsing the output of cal isn't really that advisable...
Can your OS's date handle -j?
date -j 073100002014 "+%a"
Thu
How is your OS at perl?
perl -MDateTime -E '$dt=DateTime->new(year=>2014,month=>7,day=>31);say $dt->day_name'
Thursday
Or, if it doesn't do perl -E, you could do
perl -MDateTime -e '$dt=DateTime->new(year=>2014,month=>7,day=>31);print $dt->day_name'
Thursday
How is your OS at php?
php -r '$jd=cal_to_jd(CAL_GREGORIAN,7,31,2014);echo(jddayofweek($jd,2));'
Thu
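One further awk-only possibility, not taken from the answers above, is to compute the weekday arithmetically instead of parsing cal. The sketch below uses Sakamoto's day-of-week method for Gregorian dates. The function name weekday is hypothetical, and no input validation is done.

```shell
# usage: weekday YEAR MONTH DAY   (Gregorian calendar)
weekday() {
    awk -v y="$1" -v m="$2" -v d="$3" 'BEGIN {
        # Per-month offsets used by Sakamoto'"'"'s method (index 1 = January).
        split("0 3 2 5 0 3 5 1 4 6 2 4", t, " ")
        split("Sunday Monday Tuesday Wednesday Thursday Friday Saturday", name, " ")
        m += 0                     # force numeric, so "07" indexes t[7]
        if (m < 3) y -= 1          # January and February count with the previous year
        w = (y + int(y / 4) - int(y / 100) + int(y / 400) + t[m] + d) % 7
        print name[w + 1]          # w is 0 for Sunday
    }'
}

weekday 2014 7 31
```

For the date in the question this prints Thursday, matching the date -d output shown there.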

Show a list of users that logged in exactly 5 days ago from today in linux?

The last command displays the history of logins. How can I filter its output so that it displays only the users who logged in 5 days before the current date?
Here is what I've been able to do so far:
last | grep Dec | grep -v reboot | awk '{print$5}'
This extracts the dates from the output of the last command.
#!/bin/bash
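count=$(date "+%d")
# 10# forces base 10, so that days 08 and 09 are not read as invalid octal
count=$((10#$count - 5))
last | grep -v reboot | grep Dec | awk -v count="$count" '($5 >= count) {print $0}'
worked for me :) Thanks for the help @Olivier Dulac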

I couldn't do it in one line, but here's a little bash script which might get the job done:
#! /bin/bash
# Find the date string we want
x=$(date --date="5 days ago" +"%a %b %e");
# And now chain a heap of commands together to...
# 1. Get the list of user
# 2. Ignore reboot
# 3. Filter the date lines we want
# 4. Print the user name using awk
# 5+6. Sort them and extract the unique values
last | grep -v "reboot" | grep "$x" | awk '{print $1}' | sort | uniq

In your awk (I don't have "last" here, so I can't check the format), just add a condition so that the whole line is printed only when it matches what you want.
For example, if the month is the 3rd field and the day is the 4th field:
last | grep -v reboot | awk ' ( ($3 == "Dec") && ($4 == "07") ) { print $0 ; }'
(Once again, without an actual excerpt of "last" output, I can't tell whether the above works, but I hope you get the general idea.)
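Going by the last excerpt quoted elsewhere in this thread (user, tty, host, weekday, month, day), the field-based condition could be sketched as follows. The function name users_on_day and the sample lines are hypothetical, and the field positions are an assumption: they shift when the host column is empty.

```shell
# usage: users_on_day MONTH DAY   (e.g. users_on_day Dec 6)
# Reads `last`-style lines on stdin and prints each matching user once.
users_on_day() {
    awk -v mon="$1" -v day="$2" '
        $1 != "reboot" && $5 == mon && $6 + 0 == day + 0 { print $1 }' | sort -u
}

# Hypothetical excerpt in the layout assumed above.
sample='root pts/1 mytst.xyzzy.tv Wed Dec 11 12:45 still logged in
alice pts/0 mytst.xyzzy.tv Fri Dec  6 16:01 - 16:07 (00:06)
root pts/0 mytst.xyzzy.tv Fri Dec  6 15:52 - 16:08 (00:15)
reboot system boot 3.2.0 Fri Dec  6 15:50 - 16:08 (00:18)'

result=$(printf '%s\n' "$sample" | users_on_day Dec 6)
printf '%s\n' "$result"
```

With GNU date, the month and day for "5 days ago" could be supplied as "$(date -d '5 days ago' +%b)" and "$(date -d '5 days ago' +%e)". Comparing the day numerically means the padding of the day does not matter.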

I think chooban's solution is the closest, but it lists only the matching lines. I found a better solution, and it most probably handles the 2013-12-31 / 2014-01-01 boundary properly (I found no description of the output format when a user has been logged in for more than one year, or when the login time is in the previous year). It is a grep-less (long) one-liner:
last | awk -v l="$(last -t $(date -d '-5 day' +%Y%m%d%H%M%S)|head -n 1)" 'BEGIN {l=substr(l,1,55)} /^reboot / {next} substr($0,1,55) == l {exit} 1'
I assumed that there is no user named 'reboot'. It relies on the fact that last -t YYYYMMDDHHMMSS prints the lines before the specified date. Unfortunately, it changes the format if the logout falls inside the specified period (it shows "gone - no logout"), so that part of the line has to be cut off.
This is not the nicest solution, as it calls last twice, but it seems to work.
Output:
root pts/1 mytst.xyzzy.tv Wed Dec 11 12:45 still logged in
root pts/0 mytst.xyzzy.tv Wed Dec 11 11:25 still logged in
root pts/0 mytst.xyzzy.tv Tue Dec 10 16:02 - 17:14 (01:12)
root pts/0 mytst.xyzzy.tv Tue Dec 10 10:59 - 15:04 (04:05)
root pts/0 mytst.xyzzy.tv Mon Dec 9 13:23 - 17:10 (03:46)
root pts/1 mytst.xyzzy.tv Fri Dec 6 16:01 - 16:07 (00:06)
root pts/0 mytst.xyzzy.tv Fri Dec 6 15:52 - 16:08 (00:15)
I hope this could help!

Using variables with sed

I'm trying to delete part of a file using sed on Linux (Ubuntu). Specifically, I want to delete the first lines of a log file, up to the first occurrence of the current system date (using the pattern '10 Jan 13').
So, I store the date in a variable:
root@server:/# VAR_DATE=`date -R | cut -c6-11`
And after that, I use sed:
root@server:/# cat log_file.txt | sed -n -e '/$VAR_DATE/,$p'
But it doesn't work. I've tried a lot of combinations, with the same result:
root@server:/# cat log_file.txt | sed -n -e '/"$VAR_DATE"/,$p'
root@server:/# cat log_file.txt | sed -n -e '/"${VAR_DATE}"/,$p'
root@server:/# cat log_file.txt | sed -n -e "/$VAR_DATE/,$p"
What am I doing wrong?

Use double quotes so that the shell expands the variable $vardate, and escape the last $ so that the shell does not try to expand it, as in sed -n "/$vardate/,\$p" file:
$ cat file
6 Jan 13
7 Jan 13
8 Jan 13
9 Jan 13
10 Jan 13
11 Jan 13
12 Jan 13
13 Jan 13
$ vardate="10 Jan 13"
$ sed -n "/$vardate/,\$p" file
10 Jan 13
11 Jan 13
12 Jan 13
13 Jan 13
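The effect of the quoting can be checked side by side. This sketch writes a small sample file to a temporary location and runs three variants. The variable names single, double and mixed are just for illustration.

```shell
tmp=$(mktemp)
printf '%s\n' '8 Jan 13' '9 Jan 13' '10 Jan 13' '11 Jan 13' > "$tmp"
vardate='10 Jan 13'

# Single quotes: the shell passes $vardate to sed literally, so nothing matches.
single=$(sed -n '/$vardate/,$p' "$tmp")

# Double quotes: the shell expands $vardate; \$ keeps sed's "last line" address.
double=$(sed -n "/$vardate/,\$p" "$tmp")

# Mixed quoting: only the variable sits inside double quotes.
mixed=$(sed -n '/'"$vardate"'/,$p' "$tmp")

printf 'single: [%s]\n' "$single"
printf 'double:\n%s\n' "$double"
printf 'mixed:\n%s\n' "$mixed"

rm -f "$tmp"
```

If the variable could contain a slash, the address needs a different delimiter (for example \|...|) or the slash must be escaped before it reaches sed.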
