I can use the following command to search log files over the past 10 days:
find . -type f -mtime -10 |xargs grep -i -n 'exception' 2> /dev/null
But I want to further limit the search to lines that were logged between 6am and 6pm. How can I modify the grep command to filter for these, given that the lines look like this:
2012-09-04 03:50:41,658 [MainLogger: ] EXCEPTION AppLog - some exception 1
2012-09-04 10:01:32,902 [MainLogger: ] EXCEPTION AppLog - some exception 2
2012-09-04 15:39:51,901 [MainLogger: ] EXCEPTION AppLog - some exception 3
2012-09-04 18:12:51,901 [MainLogger: ] EXCEPTION AppLog - some exception 4
In the above case, only lines 2 and 3 should be returned, since they are between 6am and 6pm.
Any help would be appreciated.
One easy but ugly way to do it is to chain a lot of greps, like this:
find . -type f -mtime -10 |xargs grep -i -n 'exception' | grep -v " 00" | grep -v " 01" | ... | grep -v " 18" | grep -v " 19" ... 2> /dev/null
Or more concisely:
find . -type f -mtime -10 |xargs grep -i -n 'exception' | grep -v -e " \(0[012345]\|18\|19\|2[0123]\)" 2> /dev/null
You can hack it like this:
grep ' 0[6-9]:\| 1[0-7]:\| 18:00:00,000'
But if you need more elaborate time handling, I recommend switching to a more powerful language (e.g. Perl with DateTime).
In any language with a proper datetime library, converting all dates to a canonical representation makes the problem trivial. The usual canonical form is seconds since midnight, Jan 1, 1970 (the Unix epoch). Then just check whether the canonical number for the input line is greater than the start time and smaller than the end time.
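For example, a minimal shell sketch of that approach, assuming GNU date and a hypothetical app.log whose lines start with "YYYY-MM-DD HH:MM:SS,mmm" (the bounds shown cover a single day's window; a pure time-of-day filter would compare only the hour field):
start=$(date -d '2012-09-04 06:00:00' +%s)
end=$(date -d '2012-09-04 18:00:00' +%s)
grep -i 'exception' app.log | while IFS= read -r line; do
    ts=$(date -d "${line:0:19}" +%s) || continue   # first 19 chars: "YYYY-MM-DD HH:MM:SS"
    (( ts >= start && ts <= end )) && printf '%s\n' "$line"
done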
Related
Specify a command (or set of commands) that displays the number of lines of code in the .c and .h files in the current directory, listing each file in alphabetical order followed by ':' and its number of lines, and finally the total number of lines of code.
An example that might be displayed would be:
main.c: 202
util.c: 124
util.h: 43
TOTAL: 369
After many attempts, my final one-liner was:
wc -l *.c *.h | awk '{print $2 ": " $1}' | sed "$ s/total/TOTAL/g"
The problem is that I don't know how to sort the files alphabetically without moving the TOTAL line as well (considering we don't know how many files are in that folder). I'm also not sure the command above is efficient, so feel free to suggest better variations.
Perhaps easier would be to sort the input arguments to wc; perhaps something like this:
$ find . -maxdepth 1 '(' -name '*.py' -o -name '*.md' ')' | sort | xargs -d'\n' wc -l | awk '{print $2": "$1}' | sed 's/^total:/TOTAL:/'
./__main__.py: 70
./README.md: 96
./SCREENS.md: 76
./setup.py: 2
./t.py: 10
TOTAL: 254
Note that I use xargs -d'\n' to avoid problems with filenames containing spaces (if I were targeting GNU tools only, I would drop the . from find . and perhaps use -print0 | sort -z | xargs -0 instead, as sketched below).
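For reference, a sketch of that GNU-only variant (assumes GNU find, sort and xargs; the awk step still prints only the first word of a filename containing spaces):
find -maxdepth 1 '(' -name '*.py' -o -name '*.md' ')' -print0 |
  sort -z |
  xargs -0 wc -l |
  awk '{print $2": "$1}' |
  sed 's/^total:/TOTAL:/'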
You could save input into a variable, extract the last line, sort everything except the last line and then output the last line:
printf '%s\n' 'main.c: 202' 'util.c: 124' 'util.h: 43' 'TOTAL: 369' | {
v=$(cat);
total=$(tail -n1 <<<"$v");
head -n-1 <<<"$v" | sort -r;
printf "%s\n" "$total";
}
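Applying the same idea to the question's pipeline (a sketch; plain sort gives the alphabetical order the question asks for):
wc -l *.c *.h | awk '{print $2 ": " $1}' | sed "$ s/total/TOTAL/g" | {
    v=$(cat)
    total=$(tail -n1 <<<"$v")
    head -n -1 <<<"$v" | sort
    printf '%s\n' "$total"
}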
Here's a meme-ier answer which uses Python 3.8+ (hey, awk is Turing complete, why not use Python?)
python3.8 -c 'import sys;s=0;[print(x+":",-s+(s:=s+len(list(open(x)))))for x in sorted(sys.argv[1:])];print("TOTAL:",s)' *.py *.cfg
__main__.py: 70
setup.cfg: 55
setup.py: 2
t.py: 10
TOTAL: 137
Expanding it out, it abuses a few things:
len(list(open(x))) - open(x) returns a file object, list exhausts the iterator (by lines), and the length of that is the number of lines
-s+(s:=s+...) - this is an assignment expression which has the side effect of accumulating s, while the value of the whole expression is just that file's line count (the difference)
sorted(sys.argv[1:]) satisfies the sorting part
How can I pick 100 files randomly from a directory in the Linux shell? I read in another topic that the 'shuf' command can do this: find . -type f | shuf -n100, but our environment does not have the 'shuf' command. Is there another method, using bash, awk, sed or something else?
You can get a directory listing, then randomize it, then pick the top N lines.
ls | sort -R | head -n 100
Replace ls with an appropriate find command if you want a recursive listing or need finer control of the files to be included.
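For instance, a minimal sketch of the recursive variant (note that sort -R is a GNU coreutils extension):
find . -type f | sort -R | head -n 100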
This should work on your CentOS5:
shuf() { awk 'BEGIN{srand()}{print rand()"\t"$0}' "$@" | sort | cut -f2- ;}
This comes from a comment by Meow on https://stackoverflow.com/a/2153889/5844347
Use like so: find . -type f | shuf | head -100
# To get an integer between 1 and 100:
N=$(awk 'BEGIN{srand(); print int(rand()*100) + 1}')
echo $N
# To get the Nth file:
find . -type f | head -n "$N" | tail -n 1
# To get 100 files randomly (note: srand() seeds from the current second,
# so calls made within the same second will repeat the same N):
for i in $(seq 1 100); do
    N=$(awk 'BEGIN{srand(); print int(rand()*100) + 1}')
    find . -type f | head -n "$N" | tail -n 1
done
I'm trying to wc -l an entire directory and then display the filename in an echo with the number of lines.
To add to my frustration, the directory has to come from a passed argument. So, without looking stupid, can someone first tell me why a simple wc -l $1 doesn't give me the line count for the directory I pass as the argument? I know I'm not understanding it completely.
On top of that, I need validation too: the script should complain if the argument given is not a directory or if there is more than one argument.
wc works on files rather than directories, so if you want the line count of all files in the directory, you would start with:
wc -l $1/*
With various gyrations to get rid of the total, sort it and extract only the largest, you could end up with something like (split across multiple lines for readability but should be entered on a single line):
pax> wc -l $1/* 2>/dev/null
| grep -v ' total$'
| sort -n -k1
| tail -1
2892 target_dir/big_honkin_file.txt
As to the validation, you can check the number of parameters passed to your script with something like:
if [[ $# -ne 1 ]] ; then
echo 'Whoa! Wrong parameter count'
exit 1
fi
and you can check if it's a directory with:
if [[ ! -d $1 ]] ; then
echo 'Whoa!' "[$1]" 'is not a directory'
exit 1
fi
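Putting those pieces together, a sketch of the whole script could look like this (the last line is the pipeline from the top of this answer, printing the largest line count in the given directory):
#!/bin/bash
# Sketch: validate the single directory argument, then report the file with the most lines
if [[ $# -ne 1 ]] ; then
    echo 'Whoa! Wrong parameter count'
    exit 1
fi
if [[ ! -d $1 ]] ; then
    echo 'Whoa!' "[$1]" 'is not a directory'
    exit 1
fi
wc -l "$1"/* 2>/dev/null | grep -v ' total$' | sort -n -k1 | tail -1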
Is this what you want?
> find ./test1/ -type f|xargs wc -l
1 ./test1/firstSession_cnaiErrorFile.txt
77 ./test1/firstSession_cnaiReportFile.txt
14950 ./test1/exp.txt
1 ./test1/test1_cnaExitValue.txt
15029 total
So your directory, which is the argument, goes here:
find "$your_complete_directory_path"/ -type f | xargs wc -l
I'm trying to wc -l an entire directory and then display the filename in an echo with the number of lines.
You can do a find on the directory and use the -exec option to trigger wc -l, something like this:
$ find ~/Temp/perl/temp/ -exec wc -l '{}' \;
wc: /Volumes/Data/jaypalsingh/Temp/perl/temp/: read: Is a directory
11 /Volumes/Data/jaypalsingh/Temp/perl/temp//accessor1.plx
25 /Volumes/Data/jaypalsingh/Temp/perl/temp//autoincrement.pm
12 /Volumes/Data/jaypalsingh/Temp/perl/temp//bless1.plx
14 /Volumes/Data/jaypalsingh/Temp/perl/temp//bless2.plx
22 /Volumes/Data/jaypalsingh/Temp/perl/temp//classatr1.plx
27 /Volumes/Data/jaypalsingh/Temp/perl/temp//classatr2.plx
7 /Volumes/Data/jaypalsingh/Temp/perl/temp//employee1.pm
18 /Volumes/Data/jaypalsingh/Temp/perl/temp//employee2.pm
26 /Volumes/Data/jaypalsingh/Temp/perl/temp//employee3.pm
12 /Volumes/Data/jaypalsingh/Temp/perl/temp//ftp.plx
14 /Volumes/Data/jaypalsingh/Temp/perl/temp//inherit1.plx
16 /Volumes/Data/jaypalsingh/Temp/perl/temp//inherit2.plx
24 /Volumes/Data/jaypalsingh/Temp/perl/temp//inherit3.plx
33 /Volumes/Data/jaypalsingh/Temp/perl/temp//persisthash.pm
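A small hedged refinement: adding -type f keeps find from passing the directory itself to wc, which avoids the "Is a directory" error shown on the first line above:
find ~/Temp/perl/temp/ -type f -exec wc -l '{}' \;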
Nice question!
I saw the answers, and some are pretty good. The find ... | xargs one is my favourite. It could be simplified using the find ... -exec wc -l {} + syntax, but there is a problem: when the command-line buffer fills up, a new wc -l ... is invoked, and each invocation prints its own "<number> total" line. As wc has no option to disable this, wc has to be reimplemented (and filtering those lines out with grep is not nice).
So my complete answer is:
#!/usr/bin/bash
[ $# -ne 1 ] && echo "Bad number of args">&2 && exit 1
[ ! -d "$1" ] && echo "Not dir">&2 && exit 1
find "$1" -type f -exec awk '{++n[FILENAME]}END{for(i in n) printf "%8d %s\n",n[i],i}' {} +
Or, using less temporary space but with slightly longer awk code:
find "$1" -type f -exec awk 'function pr(){printf "%8d %s\n",n,f}FNR==1{f&&pr();n=0;f=FILENAME}{++n}END{pr()}' {} +
Misc
If it should not be called for subdirectories, add -maxdepth 1 before -type in the find command (see the sketch after this list).
It is pretty fast. I was afraid it would be much slower than the find ... wc + version, but for a directory containing 14770 files (in several subdirectories) the wc version ran in 3.8 s and the awk version in 5.2 s.
awk and wc treat a final line without a trailing \n differently: wc does not count it, but awk does. I prefer awk's behaviour.
It does not print empty files.
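For reference, here is the non-recursive variant mentioned above; only the -maxdepth 1 is new:
find "$1" -maxdepth 1 -type f -exec awk '{++n[FILENAME]}END{for(i in n) printf "%8d %s\n",n[i],i}' {} +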
To find the file with most lines in the current directory and its subdirectories, with zsh:
lines() REPLY=$(wc -l < "$REPLY")
wc -l -- **/*(D.nO+lines[1])
That defines a lines function which is going to be used as a glob sorting function that returns in $REPLY the number of lines of the file whose path is given in $REPLY.
Then we use zsh's recursive globbing **/* to find regular files (.), numerically (n) reverse sorted (O) with the lines function (+lines), and select the first one [1]. (D to include dotfiles and traverse dotdirs).
Doing it with standard utilities is a bit tricky if you don't want to make assumptions on what characters file names may contain (like newline, space...). With GNU tools as found on most Linux distributions, it's a bit easier as they can deal with NUL terminated lines:
find . -type f -exec sh -c '
  for file do
    lines=$(wc -l < "$file") &&
      printf "%s\0" "$lines:$file"
  done' sh {} + |
  tr '\n\0' '\0\n' |
  sort -rn |
  head -n1 |
  tr '\0' '\n'
Or with zsh or GNU bash syntax:
biggest= max=-1
find . -type f -print0 |
{
while IFS= read -rd '' file; do
size=$(wc -l < "$file") &&
((size > max)) &&
max=$size biggest=$file
done
[[ -n $biggest ]] && printf '%s\n' "$max: $biggest"
}
Here's one that works for me with Git Bash (mingw32) under Windows:
find . -type f -print0 | xargs -0 wc -l
This will list the files and line counts in the current directory and subdirectories. You can also direct the output to a text file and import it into Excel if needed:
find . -type f -print0 | xargs -0 wc -l > fileListingWithLineCount.txt
I am writing a script that calculates the age of the oldest file in a directory. The first commands run are:
OLDFILE=`ls -lt $DIR | grep "^-" | tail -1 `
echo $OLDFILE
The output contains a lot more than just the filename, e.g.
-rwxrwxr-- 1 abc abc 334 May 10 2011 ABCD_xyz20110510113817046.abc.bak
Q1/. How do I obtain the output after the last space of the above line? This would give me the filename. I realise some sort of string manipulation is required but am new to shell scripting.
Q2/. How do I obtain the age of this file in minutes?
To obtain just the oldest file's name,
ls -lt | awk '/^-/{file=$NF}END{print file}'
However, this is not robust if you have files with spaces in their names, etc. Generally, you should try to avoid parsing the output from ls.
With stat you can obtain a file's last modification time in machine-readable format, expressed as seconds since Jan 1, 1970; with date +%s you can obtain the current time in the same format. Subtract and divide by 60. (More Awk skills would come in handy for the arithmetic.)
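A minimal sketch of that arithmetic (assumes GNU stat; $file stands for whatever name you extracted above):
mtime=$(stat -c %Y "$file")      # last modification time, in seconds since the epoch
now=$(date +%s)
echo $(( (now - mtime) / 60 ))   # age in minutes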
Finally, for an alternate solution, look at the options for find; in particular, its -printf format strings let you extract a file's timestamp. The following will directly get you the modification time (in seconds since the epoch) and inode number of the oldest file:
find . -maxdepth 1 -type f -printf '%T@ %i\n' |
sort -n | head -n 1
Using the inode number avoids the issues of funny file names; once you have a single inode, converting that to a file name is a snap:
find . -maxdepth 1 -inum "$number"
Tying the two together, you might want something like this:
# set -- replaces the positional parameters ($1, $2, ...) with the command's output
set -- $(find . -maxdepth 1 -type f -printf '%T@ %i\n' |
sort -n | head -n 1)
# now $1 is the timestamp and $2 is the inode
oldest_filename=$(find . -maxdepth 1 -inum "$2")
age_in_minutes=$(date +%s | awk -v d="$1" '{ print ($1 - d) / 60 }')
An awk solution, giving you how old the file is in minutes (since your ls output does not contain the minutes of the timestamp, 00 is assumed by default). Also, as tripleee pointed out, ls output is inherently risky to parse.
[[bash_prompt$]]$ echo $l; echo "##############";echo $l | awk -f test.sh ; echo "#############"; cat test.sh
-rwxrwxr-- 1 abc abc 334 May 20 2013 ABCD_xyz20110510113817046.abc.bak
##############
File ABCD_xyz20110510113817046.abc.bak is 2074.67 min old
#############
BEGIN{
m=split("Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec",d,"|")
for(o=1;o<=m;o++){
date[d[o]]=sprintf("%02d",o)
}
}
{
month=date[$6];
old_time=mktime($8 " " month " " $7 " 00 00 00");
curr_time=systime();
print "File " $NF " is " (curr_time-old_time)/60 " min old";
}
For Q2 a bash one-liner could be:
mtime_minutes=$(( ( $(date +%s) - $(stat -c %Y "$file_to_inspect") ) / 60 ))
You probably want ls -t. Including the '-l' option explicitly asks for ls to give you all that other info that you actually want to get rid of.
However what you probably REALLY want is find, first:
NEWEST=$(find . -maxdepth 1 -type f | cut -c3- |xargs ls -t| head -1)
this will get you the plain filename of the newest item, no other processing needed.
Directories will be correctly excluded. (thanks to tripleee for pointing out that's what you were aiming for.)
For the second question, you can use stat and bc:
TIME_MINUTES=$(echo "( $(date +%s) - $(stat --format %Y "$NEWEST") ) / 60" | bc -l)
remove the '-l' option from bc if you only want a whole number of minutes.
Suppose I want to count the lines of code in a project. If all of the files are in the same directory I can execute:
cat * | wc -l
However, if there are sub-directories, this doesn't work. For this to work cat would have to have a recursive mode. I suspect this might be a job for xargs, but I wonder if there is a more elegant solution?
First, you do not need to use cat to count lines; this is an antipattern called Useless Use of Cat (UUoC). To count lines in the files in the current directory, use wc:
wc -l *
Then, the find command recurses into the subdirectories:
find . -name "*.c" -exec wc -l {} \;
. is the name of the top directory to start searching from
-name "*.c" is the pattern of the file you're interested in
-exec gives a command to be executed
{} is the result of the find command to be passed to the command (here wc -l)
\; indicates the end of the command
This command produces a list of all files found with their line counts. If you want the sum for all the files found, you can use find to list the files (with the -print option) and then use xargs to pass this list as arguments to wc -l.
find . -name "*.c" -print | xargs wc -l
EDIT to address Robert Gamble's comment (thanks): if you have spaces or newlines (!) in file names, then you have to use the -print0 option instead of -print, and xargs -0 (or --null), so that the list of file names is exchanged as null-terminated strings.
find . -name "*.c" -print0 | xargs -0 wc -l
The Unix philosophy is to have tools that do one thing only, and do it well.
If you want a code-golfing answer:
grep '' -R . | wc -l
The problem with just using wc -l on its own is that it can't descend into directories, and the one-liners using
find . -exec wc -l {} \;
won't give you a total line count, because it runs wc once for every file (lol!),
and
find . -exec wc -l {} +
will get confused as soon as find hits the ~200k character argument limit for parameters (see the footnote on limits below) and instead calls wc multiple times, each time only giving you a partial summary.
Additionally, the above grep trick will not add more than 1 line to the output when it encounters a binary file, which could be circumstantially beneficial.
For the cost of 1 extra command character, you can ignore binary files completely:
grep '' -IR . | wc -l
If you want to run line counts on binary files too:
grep '' -aR . | wc -l
Footnote on limits:
The docs are a bit vague as to whether it's a string size limit or a number-of-tokens limit.
cd /usr/include;
find -type f -exec perl -e 'printf qq[%s => %s\n], scalar @ARGV, length join q[ ], @ARGV' {} +
# 4066 => 130974
# 3399 => 130955
# 3155 => 130978
# 2762 => 130991
# 3923 => 130959
# 3642 => 130989
# 4145 => 130993
# 4382 => 130989
# 4406 => 130973
# 4190 => 131000
# 4603 => 130988
# 3060 => 95435
This implies it's going to chunk very easily.
I think you're probably stuck with xargs
find -name '*php' | xargs cat | wc -l
chromakode's method gives the same result but is much, much slower. If you use xargs, your cat-ing and wc-ing can start as soon as find starts finding.
Good explanation at Linux: xargs vs. exec {}
Try using the find command, which recurses directories by default:
find . -type f -execdir cat {} \; | wc -l
The correct way is:
find . -name "*.c" -print0 | xargs -0 cat | wc -l
You must use -print0 because there are only two invalid characters in Unix filenames: the null byte and "/" (slash). So, for example, "xxx\npasswd" is a valid name. In reality, you're more likely to encounter names with spaces in them; without the NUL handling, the commands above would count each word as a separate file.
You might also want to use "-type f" instead of -name to limit the search to files.
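For example, a sketch that restricts the count to regular files only, dropping the name pattern:
find . -type f -print0 | xargs -0 cat | wc -l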
Using cat or grep in the solutions above is wasteful if you can use relatively recent GNU tools, including Bash:
wc -l --files0-from=<(find . -name \*.c -print0)
This handles file names with spaces, arbitrary recursion and any number of matching files, even if they exceed the command line length limit.
wc -cl `find . -name "*.php" -type f`
I like to use find and head together for a "recursive cat" on all the files in a project directory, for example:
find . -name "*rb" -print0 | xargs -0 head -10000
The advantage is that head will add the filename and path for you:
==> ./recipes/default.rb <==
DOWNLOAD_DIR = '/tmp/downloads'
MYSQL_DOWNLOAD_URL = 'http://cdn.mysql.com/Downloads/MySQL-5.6/mysql-5.6.10-debian6.0-x86_64.deb'
MYSQL_DOWNLOAD_FILE = "#{DOWNLOAD_DIR}/mysql-5.6.10-debian6.0-x86_64.deb"
package "mysql-server-5.5"
...
==> ./templates/default/my.cnf.erb <==
#
# The MySQL database server configuration file.
#
...
==> ./templates/default/mysql56.sh.erb <==
PATH=/opt/mysql/server-5.6/bin:$PATH
For the complete example, please see my blog post:
http://haildata.net/2013/04/using-cat-recursively-with-nicely-formatted-output-including-headers/
Note I used 'head -10000'; clearly, if I have files with over 10,000 lines this is going to truncate the output ... however, I could use 'head -100000', but for "informal project/directory browsing" this approach works very well for me.
If you want to generate only a total line count and not a line count for each file, something like:
find . -type f -exec wc -l {} \; | awk '{total += $1} END{print total}'
works well. This saves you the need to do further text filtering in a script.
Here's a Bash script that counts the lines of code in a project. It traverses a source tree recursively, and it excludes blank lines and single line comments that use "//".
# $excluded is a regex for paths to exclude from line counting
excluded="spec\|node_modules\|README\|lib\|docs\|csv\|XLS\|json\|png"
countLines(){
# $total is the total lines of code counted
total=0
# -mindepth excludes the current directory (".")
for file in `find . -mindepth 1 -name "*.*" |grep -v "$excluded"`; do
# First sed: only count lines of code that are not commented with //
# Second sed: don't count blank lines
# $numLines is the lines of code
numLines=`cat $file | sed '/\/\//d' | sed '/^\s*$/d' | wc -l`
total=$(($total + $numLines))
echo " " $numLines $file
done
echo " " $total in total
}
echo Source code files:
countLines
echo Unit tests:
cd spec
countLines
Here's what the output looks like for my project:
Source code files:
2 ./buildDocs.sh
24 ./countLines.sh
15 ./css/dashboard.css
53 ./data/un_population/provenance/preprocess.js
19 ./index.html
5 ./server/server.js
2 ./server/startServer.sh
24 ./SpecRunner.html
34 ./src/computeLayout.js
60 ./src/configDiff.js
18 ./src/dashboardMirror.js
37 ./src/dashboardScaffold.js
14 ./src/data.js
68 ./src/dummyVis.js
27 ./src/layout.js
28 ./src/links.js
5 ./src/main.js
52 ./src/processActions.js
86 ./src/timeline.js
73 ./src/udc.js
18 ./src/wire.js
664 in total
Unit tests:
230 ./ComputeLayoutSpec.js
134 ./ConfigDiffSpec.js
134 ./ProcessActionsSpec.js
84 ./UDCSpec.js
149 ./WireSpec.js
731 in total
Enjoy! --Curran
find . -name "*.h" -print | xargs wc -l