Grep files with numeric extension - linux

Consider a directory of 20 files numbered as follows:
ll *test*
> test.dat
> test.dat.1
> test.dat.2
...
> test.dat.20
A subset of the files that match a given date can be found via
ll *test* | grep "Sep 29"
> test.dat
> test.dat.1
> test.dat.2
How can I search for a line pattern in ONLY this subset of files? I want to grep for the string WARNING in each line of the above three files. How can I tell grep to limit its search to only this subset?

The -l option is made for that: it lists files that match.
The -L option does the opposite: it lists files that don't match.
grep WARNING $(grep -l "Sep 29" *test.dat*)
EDIT
I misunderstood the question: you don't want to grep "WARNING" on files already containing "Sep 29", you want to grep "WARNING" on files last modified on Sep 29.
Therefore I suggest:
grep WARNING $(ll *test.dat* | grep "Sep 29")
But I wouldn't rely on ll output.

Use command substitution (which runs in a subshell):
grep "WARNING" $(ll *test* | grep "Sep 29")
That way, the output of your command will become the <files_to_search_in> argument of your outer-most grep command.
Keep in mind that since you are using ll in your original command, the output of it will give you not only the file names you want, but other file details (permissions, date, etc). You might have to do further processing in your "inner" grep, so that the information passed to the outer-most grep command will be limited to file names.
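For instance, a rough sketch of that extra processing (assuming ll is the usual ls -l alias, that the file name is the last field of each line, and that the names contain no spaces) could use awk to keep only the last column:
grep WARNING $(ls -l *test* | grep "Sep 29" | awk '{print $NF}')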
While you're at it, consider doing your file filtering in that inner command with the find command (man page) instead of a combination of ll + grep: use the right tool for the job (:

Another way of doing this:
find . -type f -name "test.dat*" -newermt 2017-09-29 ! -newermt 2017-09-30 -exec grep WARNING {} \;
Details
-type f: search for regular files only
-name "test.dat*": only files whose names begin with "test.dat"
-newermt 2017-09-29 ! -newermt 2017-09-30: only files with a modification date of 29 September 2017
-exec grep WARNING {} \;: each time a file is found, execute grep WARNING on it
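One caveat: with -exec ... \; grep is invoked on one file at a time, so matches are printed without a file-name prefix. If you want the names in the output, something along these lines (using -H to force the prefix and + to batch files) should do it:
find . -type f -name "test.dat*" -newermt 2017-09-29 ! -newermt 2017-09-30 -exec grep -H WARNING {} +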

Related

using grep in single-line files to find the number of occurrences of a word/pattern

I have json files in the current directory, and subdirectories. All the files have a single line of content.
I want a list of all files that contain the word XYZ, and the number of times it occurs in that file.
I want to print the list according to the following format:
file_name pattern_occurence_times
It should look something like:
.\x1\x2\file1.json 3
.\x1\file3.json 2
The problem is that grep counts the NUMBER of lines containing XYZ, not the number of occurrences.
Since the whole content of the files is always contained in a single line, the count is always 1 (if the pattern occurs in the file).
I used this command for that:
find . -type f -name "*.json" -exec grep --files-with-matches -i 'xyz' {} \; -exec grep -wci 'xyz' {} \;
I wrote a python code, and it works, but I would like to know if there is any way of doing that using find and grep or any other command line tools.
Thanks
The classical approach to this problem is the pipeline grep -o regex file | wc -l. However, to execute a pipeline in find's -exec you have to run a shell (e.g. sh -c ... ). But all these things together will only print the number of matches, not the file names. Also, files with no matches have to be filtered out.
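For what it's worth, that sh -c route might look roughly like this (a sketch, untested):
find . -type f -name "*.json" -exec sh -c '
for f; do
  n=$(grep -io xyz "$f" | wc -l)
  [ "$n" -gt 0 ] && printf "%s %s\n" "$f" "$n"
done' sh {} +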
Because of all of this I think a single awk command would be preferable:
find ... -type f -exec awk '{$0=tolower($0); c+=gsub(/xyz/,"")}
END {if(c>0) print FILENAME " " c}' {} \;
Here the tolower($0) emulates grep's -i option. Make sure to write your search pattern xyz only in lowercase.
If you want to combine this with subsequent filters in find you can add else exit 1 at the end of the last awk block to continue (inside find) only with the printed files.
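For example (the -mmin -60 test afterwards is just a hypothetical stand-in for whatever find filter you want to chain on):
find . -type f -exec awk '{$0=tolower($0); c+=gsub(/xyz/,"")}
END {if(c>0) print FILENAME " " c; else exit 1}' {} \; -mmin -60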
Use the -o option of grep in conjunction with wc, e.g.
find . -name "*.json" | while read -r f ; do
echo "$f" : $(grep -ow XYZ "$f" | wc -l)
done

How can I use grep to get all the lines that contain string1 and string2 separated by space?

Line1: .................
Line2: #hello1 #hello2 #hello3
Line3: .................
Line4: .................
Line5: #hello1 #hello4 #hello3
Line6: #hello1 #hello2 #hello3
Line7: .................
I have files that look similar to this in one of my project directories. I want to get the count of all the lines that contain #hello1 and #hello2. In this case I would get 2 as the result for this file. However, I want to do this recursively.
The canonical way to "do something recursively" is to use the find command. If you want to find lines that have two words on them, a simple regex will do:
grep -lr '#hello1.*#hello2' .
The option -l instructs grep to show us only filenames rather than file content, and the option -r tells grep to traverse the filesystem recursively. The start of the search is the path at the end of the line. Once you have the list of files, you can parse that list using commands run by xargs.
For example, this will count all the lines in files matching the pattern you specified.
grep -lr '#hello1.*#hello2' . | xargs -n 1 wc -l
This uses xargs to run the wc command on each of the files listed by grep. You could probably also run this without the -n 1, unless you're dealing with many thousands of files that would exceed your maximum command line length.
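Without the per-file batching, that would simply be (note that wc then also appends a final "total" line):
grep -lr '#hello1.*#hello2' . | xargs wc -l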
Or, if I'm interpreting your question correctly, the following will count just the patterns in those files.
grep -lr '#hello1.*#hello2' . | xargs -n 1 grep -Hc '#hello1.*#hello2'
This runs a similar grep to the one used to generate your recursive list of files, and presents the output with filename (-H) and count (-c).
But if you want complex rules like finding two patterns possibly on different lines in the file, then grep probably is not the optimal tool, unless you use multiple greps launched by find:
find /path/to/base -type f \
-exec grep -q '#hello1' {} \; \
-exec grep -q '#hello2' {} \; \
-print
(Lines split for easier reading.)
This is somewhat costly, as find needs to launch up to two children for each file. So another approach would be to use awk instead:
find /path/to/base -type f \
-exec awk '/#hello1/{a=1} /#hello2/{b=1} a&&b{exit} END{exit !(a && b)}' {} \; \
-print
Alternately, if your shell is bash version 4 or above, you can avoid using find and use the bash option globstar:
$ shopt -s globstar
$ awk 'FNR==1{a=b=0} /#hello1/{a=1} /#hello2/{b=1} a&&b{print FILENAME; nextfile}' **/*
Note: none of this is tested.
If you are not also interested in the individual file names,
then just something along the lines of:
find "$BASEDIRECTORY" -type f -print0 | xargs -0 grep -h PATTERN | wc -l
If you want to count lines containing #hello1 and #hello2 separated by space in a specific file you can:
$ grep -c '#hello1 #hello2' file
If you want to count in more than one file:
$ grep -c '#hello1 #hello2' file1 file2 ...
And if you want to get the grand total:
$ grep -c '#hello1 #hello2' file1 file2 ... | paste -s -d+ - | bc
Of course you can let your shell expand the file names. So, for example:
$ grep -c '#hello1 #hello2' *.txt | paste -s -d+ - | bc
or so...
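An equivalent way to sum the per-file counts without paste and bc, if you prefer awk (untested sketch; $NF is the count field after grep's filename:count output):
grep -c '#hello1 #hello2' *.txt | awk -F: '{s+=$NF} END{print s+0}'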
find . -type f | xargs -n1 awk '/#hello1/ && /#hello2/{c++} END{print FILENAME, c+0}'

How to list the files using sort command but not ls -lrt command

I am writing a shell script to check for some parameters like errors or exceptions inside the log files generated in the last 2 hours inside the directory /var/log. So this is the command I am using:
find /var/log -mmin -120|xargs egrep -i "error|exception"
It is displaying the list of file names and their corresponding parameters (errors and exceptions), but the files are not listed in time sequence. I mean the output is something like this (the sequence):
/var/log/123.log:RPM returned error
/var/log/361.log:There is error in line 1
/var/log/4w1.log:Error in configuration line
But the sequence how these 3 log files have been generated is different.
/var/log>ls -lrt
Dec24 1:19 361.log
Dec24 2:01 4w1.log
Dec24 2:15 123.log
So I want the output in the same sequence,I mean like this:
/var/log/361.log:There is error in line 1
/var/log/4w1.log:Error in configuration line
/var/log/123.log:RPM returned error
I tried this:
find /var/log -mmin -120|ls -ltr|xargs egrep -i "error|exception"
but it is not working.
Any help on this is really appreciated.
If your filenames don't have any special characters (like newline characters, etc), all you need is another call to xargs:
find . -type f -mmin -120 | xargs ls -tr | xargs egrep -i "error|exception"
Or if your filenames contain said special chars:
find . -type f -mmin -120 -print0 | xargs -0 ls -tr | xargs egrep -i "error|exception"
You can prepend the modified time using the -printf argument to find, then sort, and then remove the modified time with sed:
find /var/log -mmin -120 -printf '%T@:%p\n' | sort -V | sed -r 's/^[^:]+://' | xargs egrep -i "error|exception"
find ... -printf '%T@:%p\n' prints out each found file (%p) prepended by the seconds since the UNIX epoch (%T@; e.g., 1419433217.1835886710) and a colon separator (:), each on a new line (\n).
sort -V sorts the files naturally by modification time because it is at the beginning of each line (e.g., 1419433217.1835886710:path/to/the/file).
sed -r 's/^[^:]+://' takes each line in the format 123456789.1234:path/to/the/file and strips out the modification time leaving just the file path path/to/the/file
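Since the key is purely numeric, a slightly simpler variant (untested) using a space separator, sort -n and cut should behave the same way:
find /var/log -mmin -120 -printf '%T@ %p\n' | sort -n | cut -d' ' -f2- | xargs egrep -i "error|exception"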

grep a pattern in some files and print the sum in each file

I want to grep a pattern in some files and count the occurrences together with the filename. Right now, if I use
grep -r "month" report* | wc -l
it will sum all instances in all files. So the output is a single value 324343. I want something like this
report1: 3433
report2: 24399
....
The grep command will show the filename but will print every instance.
grep -c will give you a count of matches for each file:
grep -rc "month" report*
You need to pass each file to grep: echo report* | xargs grep -c month
If recursively, use find report* -type f -exec grep -Hc month '{}' \;.

Copy the three newest files under one directory (recursively) to another specified directory

I'm using bash.
Suppose I have a log file directory /var/myprogram/logs/.
Under this directory I have many sub-directories and sub-sub-directories that include different types of log files from my program.
I'd like to find the three newest files (modified most recently), whose name starts with 2010, under /var/myprogram/logs/, regardless of sub-directory and copy them to my home directory.
Here's what I would do manually
1. Go through each directory and do ls -lt 2010* to see which files starting with 2010 were modified most recently.
2. Once I go through all directories, I'd know which three files are the newest. So I copy them manually to my home directory.
This is pretty tedious, so I wondered if maybe I could somehow pipe some commands together to do this in one step, preferably without using shell scripts?
I've been looking into find, ls, head, and awk that I might be able to use but haven't figured the right way to glue them together.
Let me know if I need to clarify. Thanks.
Here's how you can do it:
find . -type f -name '2010*' -printf "%C@\t%P\n" | sort -nr -k1,1 | head -3 | cut -f 2-
This outputs a list of files prefixed by their last change time, sorts them based on that value, takes the top 3 and removes the timestamp.
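To actually copy those three files to your home directory, as the question asks, you could append an xargs stage (assuming the file names contain no whitespace):
find . -type f -name '2010*' -printf "%C@\t%P\n" | sort -nr -k1,1 | head -3 | cut -f 2- | xargs -I{} cp {} ~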
Your answers feel very complicated, how about
for FILE in $(find . -type d); do ls -t -1 -F "$FILE" | grep -v "/" | grep "^2010" | head -n3 | xargs -I{} mv "$FILE/{}" ~; done;
or laid out nicely
for FILE in $(find . -type d);
do
ls -t -1 -F "$FILE" | grep -v "/" | grep "^2010" | head -n3 | xargs -I{} mv "$FILE/{}" ~;
done;
My "shortest" answer after quickly hacking it up.
for file in $(find . -iname '*.php' -mtime -1 | xargs ls -l | awk '{ print $6" "$7" "$8" "$9 }' | sort -r | sed -n '1,3p' | awk '{ print $4 }'); do cp "$file" ../; done
The main command stored in $() does the following:
Find all files recursively in current directory matching (case insensitive) the name *.php and having been modified in the last 24 hours.
Pipe to ls -l, required to be able to sort by modification date, so we can have the first three
Extract the modification date and file name/path with awk
Sort these files based on datetime, newest first
With sed print only the first 3 files
With awk print only their name/path
Used in a for loop and as action copy them to the desired location.
Or use @Hasturkun's variant, which popped up as a response while I was editing this post :)
