How to list files in time order using the sort command instead of ls -lrt - Linux

I am writing a shell script to check for parameters like errors or exceptions inside the log files generated in the last 2 hours in the directory /var/log. This is the command I am using:
find /var/log -mmin -120|xargs egrep -i "error|exception"
It displays the list of file names and their corresponding matches (errors and exceptions), but the files are not in time order. The output looks something like this:
/var/log/123.log:RPM returned error
/var/log/361.log:There is error in line 1
/var/log/4w1.log:Error in configuration line
But the order in which these 3 log files were generated is different:
/var/log>ls -lrt
Dec24 1:19 361.log
Dec24 2:01 4w1.log
Dec24 2:15 123.log
So I want the output in that same order, i.e. like this:
/var/log/361.log:There is error in line 1
/var/log/4w1.log:Error in configuration line
/var/log/123.log:RPM returned error
I tried this:
find /var/log -mmin -120|ls -ltr|xargs egrep -i "error|exception"
but it is not working.
Any help on this is really appreciated.

If your filenames don't have any special characters (like newline characters, etc), all you need is another call to xargs:
find . -type f -mmin -120 | xargs ls -tr | xargs egrep -i "error|exception"
Or if your filenames contain said special chars:
find . -type f -mmin -120 -print0 | xargs -0 ls -tr | xargs egrep -i "error|exception"
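Note that the second xargs in both pipelines still splits its input on whitespace, so the null-safety only covers the first hop. On GNU coreutils 9.0 or newer, ls --zero can keep the list null-delimited end to end (a sketch, assuming GNU tools; as with the commands above, the ordering is only global if all names fit into a single ls invocation):
find . -type f -mmin -120 -print0 | xargs -0 ls -tr --zero | xargs -0 grep -Ei "error|exception"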

You can prepend the modified time using the -printf argument to find, then sort, and then remove the modified time with sed:
find /var/log -mmin -120 -printf '%T@:%p\n' | sort -V | sed -r 's/^[^:]+://' | xargs egrep -i "error|exception"
find ... -printf '%T@:%p\n' prints out each found file (%p) prepended by its modification time in seconds since the UNIX epoch (%T@; e.g., 1419433217.1835886710) and a colon separator (:), each on its own line (\n).
sort -V sorts the lines naturally by modification time, because the timestamp is at the beginning of each line (e.g., 1419433217.1835886710:path/to/the/file).
sed -r 's/^[^:]+://' takes each line in the format 123456789.1234:path/to/the/file and strips out the modification time, leaving just the file path path/to/the/file.
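The same idea can be made safe for filenames containing spaces or newlines by keeping every stage null-delimited. A sketch, assuming GNU find, sort, cut, and xargs (a tab separates the timestamp from the path, and cut -z -f2- strips it back off):
find /var/log -type f -mmin -120 -printf '%T@\t%p\0' \
    | sort -zn \
    | cut -z -f2- \
    | xargs -0 grep -Ei "error|exception"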

Related

Format xargs output to grep

I have a script that I'm trying to optimize with xargs. The current version uses find with -exec to call the command:
find -type f -iname "*.mp4" -print0 -printf '\n' -exec getfattr -d --absolute-names {} \;
after which I can pipe to grep with something like:
grep -z -P user\.md5\=\"$input_search_hash\"
to filter the results while keeping the whole output with -z.
I need the whole output returned from getfattr to be "preserved", per file, because I need the filename for which there is a matching extended attribute, which is then passed to sed to extract it. There are also cases where I have multiple grep commands in sequence if I need to search for files with multiple matches in the extended attributes. The problem is that the output of:
find -type f -iname "*.mp4" -print0 | xargs -0 getfattr -d --absolute-names
is not formatted in such a way that grep will filter it in this way. It does work with the -exec method. Can I pass an additional option to xargs, or pipe in some additional command, that will format the output so grep properly replicates the behaviour of -exec? I'm guessing I need some sort of line-break before feeding it to grep, like what -printf '\n' does in the -exec method. I would just use getfattr to "search" the extended attributes instead of needing to grep the output at all, but it has no way to do this by supplying an xattr name and value.
Example
The input comes from the find command, which is a list of video files in an arbitrary directory structure. The output of each getfattr command, for each file is such:
# file: /path/to/file/test.mp4
user.md5="0e29a7f555af518872771689e28d998d"
user.quality="10"
user.sha256="d49ba58e3b30f4ef8c81d19ce960edcf6552977bb8adb79b5b9a677ba9a54b2b"
user.size="1645645"
If I attempt to grep the output of the + method (the xargs pipeline above), say for a quality value of "10", I get results like this:
# file: /path/to/file/test.mp4
user.md5="8cf97b888e6fdbed27b02233cd6779f5"
user.quality="12"
user.sha256="613d16b2a0270e2e5f81cfd58b1eacf710a65b82ce2dab49a1e415275440f429"
user.size="1645645"
# file: /path/to/file/test1.mp4
user.md5="3c5a39f1ceefce1e124bcd6786a99155"
user.quality="10"
user.sha256="0d7128a7642d24ea879bbfb3de812b7939b618d8af639f07d5104c954c8049c3"
user.size="5674567"
# file: /path/to/file/test2.mp4
user.md5="0e29a7f555af518872771689e28d998d"
user.quality="6"
user.sha256="d49ba58e3b30f4ef8c81d19ce960edcf6552977bb8adb79b5b9a677ba9a54b2b"
user.size="15645"
All files that find locates are returned, and the string being searched for by grep, in this example user.quality="10", is highlighted, but the other files test.mp4 and test2.mp4 still have their output printed post-grep. In other words, find may locate 1000 mp4 files of which maybe 20 have a user.quality="10" entry, but grepping for that string still returns 1000 filenames (after sed).
This does not happen when using \;. The only thing I would get from grep would be:
# file: /path/to/file/test1.mp4
user.md5="3c5a39f1ceefce1e124bcd6786a99155"
user.quality="10"
user.sha256="0d7128a7642d24ea879bbfb3de812b7939b618d8af639f07d5104c954c8049c3"
user.size="5674567"
This is the expected behaviour.
xargs vs find -exec
To me it seems like you want to use xargs instead of find -exec {} \; to speed things up.
Yes, xargs is faster than find -exec {} \;, not because it does the same work more efficiently, but because it does different work!
find -exec {} \; calls once for each file (getfattr file1, then getfattr file2, and so on).
xargs crams as many files into one call as possible (getfattr file1 file2 file3 ...).
The same behavior (and even more speedup) can be achieved with find -exec {} + -- no need to use xargs for that.
With xargs and find -exec {} + you lose control over the output format. There is only one call of getfattr, so that program decides what to print between file1, file2 and so on. getfattr has no option to customize its output format.
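To make the difference concrete, here are the two forms side by side (a sketch using the getfattr call from the question):
# One getfattr process per file: slow, but the output of each call is separate.
find . -type f -iname '*.mp4' -exec getfattr -d --absolute-names {} \;
# One getfattr process per batch of files: fast, but getfattr alone decides
# how the output for file1, file2, ... is formatted.
find . -type f -iname '*.mp4' -exec getfattr -d --absolute-names {} +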
No problem! You can ...
Parse getfattr's output
... pretty easily.
For starters, we assume that all path names are pretty normal. Spaces, *, and ? are ok though. For really unusual path names containing backslashes and linebreaks see the last section.
If you output only the relevant attribute using -n user.md5 instead of -d, then you know that the output (if any) for each file is always of the form
# file: path in a single line
user.md5=encoded value of the attribute
Files without the attribute user.md5 are not printed at all. They cause a warning on stderr which can be suppressed by 2> /dev/null.
Now, grep for matching attributes. Use grep -B1 to print the line above each match (i.e. the path) too. Then use sed -n or grep -o to extract the filenames.
find -type f -iname '*.mp4' -exec getfattr -n user.md5 --absolute-names {} + 2> /dev/null |
grep -B1 -Fx "user.md5=\"$input_search_hash\"" |
sed -n 's/^# file: //p'
The above command prints the paths of all mp4 files having the attribute user.md5 with the value $input_search_hash.
Handling Unusual Filenames
At least my version (getfattr 2.4.48 by Andreas Gruenbacher) on Debian 10 always prints the file name in a single line. Linebreaks are encoded using \012 and backslashes are encoded using \134. Therefore, safe processing of those files is possible.
The above command works, but prints only the encoded file names. To get the actual filenames you have to extend the sed command or add another command that interprets the octal escape sequences. For me, getfattr only escapes \n, \r and \\, so sed 's:\\012:\n:g;s:\\015:\r:g;s:\\134:\\:g' should be sufficient for printing. For further processing, you may want to use tr \\n \\0 | sed -z ... instead, so that filenames are separated by null bytes.
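Putting it all together, a sketch of the full pipeline from above with the decoding step appended (suitable for printing only, per the caveats just described; note the backslash decode runs last so it cannot create spurious escape sequences):
find -type f -iname '*.mp4' -exec getfattr -n user.md5 --absolute-names {} + 2> /dev/null |
grep -B1 -Fx "user.md5=\"$input_search_hash\"" |
sed -n 's/^# file: //p' |
sed 's:\\012:\n:g;s:\\015:\r:g;s:\\134:\\:g'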
To test which characters are escaped for you, create a filename containing all allowed bytes and let getfattr print its name:
f=$(printf $(printf '\\%o' $(seq 1 255)) | tr -d /)
touch "$f"
setfattr -n user.md5 -v 123 "$f"
getfattr -n user.md5 "$f"
rm "$f"

Grep files with numeric extension

Consider a directory of 20 files numbered as follows:
ll *test*
> test.dat
> test.dat.1
> test.dat.2
...
> test.dat.20
A subset of the files matching a given date can be found via:
ll *test* | grep "Sep 29"
> test.dat
> test.dat.1
> test.dat.2
How can I search for a line pattern in ONLY this subset of files? I want to grep for the string WARNING in each line of the above three files. How can I tell grep to limit its search to only this subset?
The -l option is made for that: it lists the files that match.
The -L option does the opposite: it lists the files that don't match.
grep WARNING $(grep -l "Sep 29" *test.dat*)
EDIT
I misunderstood the question: you don't want to grep "WARNING" in files already containing "Sep 29", you want to grep "WARNING" in files last modified on Sep 29.
Therefore I suggest:
grep WARNING $(ll *test.dat* | grep "Sep 29")
But I wouldn't rely on ll output.
Use a subshell:
grep "WARNING" $(ll *test* | grep "Sep 29")
That way, the output of your command will become the <files_to_search_in> argument of your outer-most grep command.
Keep in mind that since you are using ll in your original command, the output of it will give you not only the file names you want, but other file details (permissions, date, etc). You might have to do further processing in your "inner" grep, so that the information passed to the outer-most grep command will be limited to file names.
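For example, a sketch of that extra processing, assuming ll is the usual alias for ls -l and that the filenames contain no spaces: awk keeps only the last field of each matching line, which is the file name:
grep WARNING $(ls -l *test* | grep "Sep 29" | awk '{print $NF}')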
While at it, consider doing your file filtering in your inner-most subshell with the find command (man page) instead of a combination of ll + grep: use the right tool for the job (:
Another way of doing this:
find . -type f -name "test.dat*" -newermt 2017-09-29 ! -newermt 2017-09-30 -exec grep WARNING {} \;
Details
-type f: search for regular files only
-name "test.dat*": only files whose names begin with "test.dat"
-newermt 2017-09-29 ! -newermt 2017-09-30: only files with a modification date of 29 September 2017
-exec grep WARNING {} \;: for each file found, execute grep WARNING on it
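A small refinement (a sketch): terminating -exec with + batches many files into one grep call, and -H forces the filename prefix even when a batch happens to contain a single file:
find . -type f -name "test.dat*" -newermt 2017-09-29 ! -newermt 2017-09-30 -exec grep -H WARNING {} +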

Grep regular files in a Linux file system and show their content

How do I display the content of regular files matched with the grep command? For example, I grep a directory listing in order to see the regular files it has. I used the following command to see the regular files only:
ls -lR | grep ^-
Then I would like to display the content of the files found there. How do I do it?
I would do something like:
$ cat `ls -lR | egrep "^-" | rev | cut -d ' ' -f 1 | rev`
Use ls to find the files
grep finds your pattern
rev reverses each line of the listing
cut takes the first space-separated field to get the file name (files with spaces are problematic)
rev reverses the file name back to its normal direction
Backticks will execute that and return the list of file names to cat.
Or, the way I would probably do it, is to use vim to look at each file:
$ vim `ls -lR | egrep "^-" | rev | cut -d ' ' -f 1 | rev`
It feels like you are trying to find only the files recursively. This is what I do in those cases:
$ vim `find . -type f -print`
There are multiple ways of doing it. I will try to give you a few easy and clean ways here. The first two handle filenames with spaces; the backtick forms below do not.
$ find . -type f -print0 | xargs -0 cat
-print0 adds a null character '\0' as the delimiter, and you need to call xargs -0 to recognise the null delimiter. If you don't do that, whitespace in filenames creates problems.
e.g. without -print0, the filenames abc 123.txt and 1.inc would be read as three separate files: abc, 123.txt and 1.inc.
With -print0 this becomes abc 123.txt'\0' and 1.inc'\0', which is correctly read as abc 123.txt and 1.inc.
As for xargs, it accepts input on stdin and turns it into arguments: command1 | xargs command2 means the output of command1 is passed to command2 as arguments.
cat displays the content of the file.
$ find . -type f -exec echo {} \; -exec cat {} \;
This is just using the find command. It finds all the files (type f), calls echo to output the filename, then calls cat to display its content.
If you don't want the filename, omit -exec echo {} \;
Alternatively, you can use the cat command and pass it the output of find:
$ cat `find . -type f -print`
If you want to scroll through the contents of multiple files one by one, you can use:
$ less `find . -type f -print`
When using less, you can navigate between files with :n and :p (next and previous file, respectively). Press q to quit less.

How to count the number of files with a specific entry?

In my script I am using the following...
edi824Files=`find $EDI_824_FILE_DIR -name "*$ediFileDate*.edi" -a "(" -mtime -1 ")"`
Here $ediFileDate should be the current date, and $EDI_824_FILE_DIR has a value like DIR/, e.g. DIR/2014-01-24.
The above works fine, but it pulls all the files with the .edi extension. I would like to get only the files with a specific entry, e.g. the files containing the value "824". How can I achieve this?
Try this:
find $EDI_824_FILE_DIR -name "*$ediFileDate*.edi" -a "(" -mtime -1 ")" -exec grep -l "824" {} \;
Pipe find through xargs and grep:
find ... | xargs grep '824'
This will list all the lines that contain "824", prefixed with the filename. If you just want the filenames, then:
find ... | xargs grep -l '824'
xargs reads from stdin and appends what it reads as arguments to the command, fitting as many as possible into each invocation.
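Since the title asks for a count, the list of matching files can simply be piped through wc -l (a sketch built from the commands above; it assumes filenames without whitespace):
find "$EDI_824_FILE_DIR" -name "*$ediFileDate*.edi" -mtime -1 | xargs grep -l '824' | wc -l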

Linux command: How to 'find' only text files?

After a few searches on Google, what I came up with is:
find my_folder -type f -exec grep -l "needle text" {} \; -exec file {} \; | grep text
which is very unhandy and outputs unneeded text such as MIME type information. Any better solutions? I have lots of images and other binary files in the same folder, together with a lot of text files that I need to search through.
I know this is an old thread, but I stumbled across it and thought I'd share my method which I have found to be a very fast way to use find to find only non-binary files:
find . -type f -exec grep -Iq . {} \; -print
The -I option tells grep to treat binary files as if they do not match, and the . pattern along with -q makes it match text files immediately, so it goes very fast. You can change the -print to a -print0 for piping into an xargs -0 or something if you are concerned about spaces (thanks for the tip, @lucas.werkmeister!)
Also the first dot is only necessary for certain BSD versions of find such as on OS X, but it doesn't hurt anything just having it there all the time if you want to put this in an alias or something.
EDIT: As @ruslan correctly pointed out, the -and can be omitted since it is implied.
Based on this SO question:
grep -rIl "needle text" my_folder
Why is it unhandy? If you need to use it often, and don't want to type it every time just define a bash function for it:
function findTextInAsciiFiles {
    # usage: findTextInAsciiFiles DIRECTORY NEEDLE_TEXT
    find "$1" -type f -exec grep -l "$2" {} \; -exec file {} \; | grep text
}
put it in your .bashrc and then just run:
findTextInAsciiFiles your_folder "needle text"
whenever you want.
EDIT to reflect OP's edit:
if you want to cut out the MIME information you can just add a further stage to the pipeline that filters it out. This should do the trick: take only what comes before the colon with cut -d':' -f1:
function findTextInAsciiFiles {
    # usage: findTextInAsciiFiles DIRECTORY NEEDLE_TEXT
    find "$1" -type f -exec grep -l "$2" {} \; -exec file {} \; | grep text | cut -d ':' -f1
}
find . -type f -print0 | xargs -0 file | grep -P text | cut -d: -f1 | xargs grep -Pil "search"
This is unfortunately not space safe. Putting it into a bash script makes it a bit easier.
This is space safe:
#!/bin/bash
if [ ! "$1" ] ; then
    echo "Usage: $0 <search>";
    exit
fi
find . -type f -print0 \
| xargs -0 file \
| grep -P text \
| cut -d: -f1 \
| xargs -i% grep -Pil "$1" "%"
Another way of doing this:
# find . | xargs file | grep "ASCII text"
If you want empty files too:
# find . | xargs file | egrep "ASCII text|empty"
How about this:
$ grep -rl "needle text" my_folder | tr '\n' '\0' | xargs -r -0 file | grep -e ':[^:]*text[^:]*$' | grep -v -e 'executable'
If you want the filenames without the file types, just add a final sed filter.
$ grep -rl "needle text" my_folder | tr '\n' '\0' | xargs -r -0 file | grep -e ':[^:]*text[^:]*$' | grep -v -e 'executable' | sed 's|:[^:]*$||'
You can filter out unneeded file types by adding more -e 'type' options to the last grep command.
EDIT:
If your xargs version supports the -d option, the commands above become simpler:
$ grep -rl "needle text" my_folder | xargs -d '\n' -r file | grep -e ':[^:]*text[^:]*$' | grep -v -e 'executable' | sed 's|:[^:]*$||'
Here's how I've done it ...
1. Make a small script to test if a file is plain text
istext:
#!/bin/bash
[[ "$(file -bi "$1")" == *"text"* ]]
2. Use find as before
find . -type f -exec istext {} \; -exec grep -nHi mystring {} \;
Here's a simplified version with extended explanation for beginners like me who are trying to learn how to put more than one command in one line.
If you were to write out the problem in steps, it would look like this:
// For every file in this directory
// Check the filetype
// If it's an ASCII file, then print out the filename
To achieve this, we can use three UNIX commands: find, file, and grep.
find will check every file in the directory.
file will give us the filetype. In our case, we're looking for a return of 'ASCII text'
grep will look for the keyword 'ASCII' in the output from file
So how can we string these together in a single line? There are multiple ways to do it, but I find that doing it in order of our pseudo-code makes the most sense (especially to a beginner like me).
find ./ -exec file {} ";" | grep 'ASCII'
Looks complicated, but not bad when we break it down:
find ./ = look through every file in this directory. The find command prints out the filename of any file that matches the 'expression', or whatever comes after the path, which in our case is the current directory or ./
The most important thing to understand is that everything after that first bit is going to be evaluated as either True or False. If True, the file name will get printed out. If not, then the command moves on.
-exec = this flag is an option within the find command that allows us to use the result of some other command as the search expression. It's like calling a function within a function.
file {} = the command being called inside of find. The file command returns a string that tells you the filetype of a file. Regularly, it would look like this: file mytextfile.txt. In our case, we want it to use whatever file is being looked at by the find command, so we put in the curly braces {} to act as an empty variable, or parameter. In other words, we're just asking for the system to output a string for every file in the directory.
";" = this is required by find and is the punctuation mark at the end of our -exec command. See the manual for 'find' for more explanation if you need it by running man find.
| grep 'ASCII' = | is a pipe. A pipe takes the output of whatever is on the left and uses it as input to whatever is on the right. Here it takes the output of the file command (a string describing the filetype of a single file) and tests whether it contains the string 'ASCII'. If it does, grep prints the line.
NOW, only the lines whose file output contains 'ASCII' survive the pipe, so what you see is just the text files. Voila.
I have two issues with histumness' answer:
It only lists text files. It does not actually search them as requested. To actually search, use
find . -type f -exec grep -Iq . {} \; -and -print0 | xargs -0 grep "needle text"
It spawns a grep process for every file, which is very slow. A better solution is then
find . -type f -print0 | xargs -0 grep -IZl . | xargs -0 grep "needle text"
or simply
find . -type f -print0 | xargs -0 grep -I "needle text"
This only takes 0.2s compared to 4s for the solution above (2.5GB of data / 7700 files), i.e. 20x faster.
Also, nobody has cited ag, the Silver Searcher, or ack-grep as alternatives. If one of these is available, it is a much better alternative:
ag -t "needle text" # Much faster than ack
ack -t "needle text" # or ack-grep
As a last note, beware of false positives (binary files taken as text files). I have already had false positives using grep/ag/ack, so better to list the matched files first before editing them.
Although it is an old question, I think the info below will add to the quality of the answers here.
When ignoring files with the executable bit set, I just use this command:
find . ! -perm -111
To keep it from recursively entering other directories:
find . -maxdepth 1 ! -perm -111
No need for pipes to mix lots of commands, just the powerful plain find command.
Disclaimer: it is not exactly what OP asked, because it doesn't check if the file is binary or not. It will, for example, filter out bash script files, that are text themselves but have the executable bit set.
That said, I hope this is useful to anyone.
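If you do want to combine this permission filter with the actual search, the same find expression can feed grep directly (a sketch; grep -l prints just the matching file names):
find . -maxdepth 1 -type f ! -perm -111 -exec grep -l "needle text" {} +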
I do it this way:
1) Since there are too many files (~30k) to search through, I generate the text file list daily for use via crontab with the command below:
find /to/src/folder -type f -exec file {} \; | grep text | cut -d: -f1 > ~/.src_list &
2) create a function in .bashrc:
findex() {
cat ~/.src_list | xargs grep "$*" 2>/dev/null
}
Then I can use the command below to do the search:
findex "needle text"
HTH:)
I prefer xargs
find . -type f | xargs grep -I "needle text"
if your filenames are weird, look into using the -0 option:
find . -type f -print0 | xargs -0 grep -I "needle text"
A bash example to search for the text "eth0" in all text/ASCII files under /etc:
grep eth0 $(find /etc/ -type f -exec file {} \; | egrep -i "text|ascii" | cut -d ':' -f1)
If you are interested in finding any file type by its magic bytes using the awesome file utility combined with the power of find, this can come in handy:
$ # Let's make some test files
$ mkdir ASCII-finder
$ cd ASCII-finder
$ dd if=/dev/urandom of=binary.file bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.009023 s, 116 MB/s
$ file binary.file
binary.file: data
$ echo 123 > text.txt
$ # Let the magic begin
$ find -type f -print0 | \
xargs -0 -I @@ bash -c 'file "$@" | grep ASCII &>/dev/null && echo "file is ASCII: $@"' -- @@
Output:
file is ASCII: ./text.txt
Legend: $ is the interactive shell prompt where we enter our commands
You can modify the part after && to call some other script or do some other stuff inline as well, e.g. if that file contains a given string, cat the entire file or look for a secondary string in it.
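For instance, a sketch that greps each ASCII file for a string instead of merely printing its name (grep -q replaces the &>/dev/null redirection, and grep -Hn prints filename and line number for each hit):
$ find -type f -print0 | \
xargs -0 -I @@ bash -c 'file "$@" | grep -q ASCII && grep -Hn "needle text" "$@"' -- @@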
Explanation:
find finds items that are files
xargs feeds each item into a one-liner bash command/script
file checks the type of file by its magic bytes; grep checks whether ASCII appears in file's output, and if so, the command after && executes
find prints its results null separated, which is good for escaping filenames with spaces and meta-characters in them
xargs, using the -0 option, reads them null separated; -I @@ takes each record and uses it as a positional parameter/argument to the bash script
-- for bash ensures whatever comes after it is an argument, even if it starts with - (like -c), which could otherwise be interpreted as a bash option
If you need to find types other than ASCII, simply replace grep ASCII with another type, like grep "PDF document, version 1.4"
find . -type f | xargs file | grep "ASCII text" | awk -F: '{print $1}'
Use the find command to list all files, use the file command to verify they are text (not tar, key, etc.), and finally use the awk command to filter and print the result.
How about this:
find . -type f | xargs grep "needle text"
