Find and show information from logs inside a folder in Linux

I'm trying to create a little bash script on Linux that lets me find whether there is any tag 103=16 inside a log.
I have multiple folders named, for example, l51prdsrv-api1.nebex.local, l51prdsrv-oe1.nebex.local, etc. Inside those folders are .log files like TRADX_gsoe3.log, TRADX_gseuoe2.log, etc.
I need to find whether those logs contain the tag 103=16.
I'm trying this command:
find . /opt/FIXLOGS/l51prdsrv* -iname "TRADX_" -type f | grep -e 103=16
But it only shows the log file names, not their content, so I can't see whether there is a tag 103=16.

First of all, you are not searching files of the form TRADX_something.log, but only files which are just named TRADX_ (case-insensitively, so TradX_ would also be found).
Then you are feeding to grep the names of the files, but never look into the content of those files. From the grep man page, you see that the file content can be supplied either via stdin, or by specifying the file name on the command line. In your case, the latter is the way to go. Therefore you can either do a
find . /opt/FIXLOGS/l51prdsrv* -iname "TRADX_*.log" -type f -exec grep -F 103=16 {} \;
if you are only interested in the matching lines, or a
find . /opt/FIXLOGS/l51prdsrv* -iname "TRADX_*.log" -type f -exec grep -F 103=16 {} /dev/null \;
if you also want to see the names of the files where the pattern matches. The reason is that grep prints the file name only if it sees more than one file name on the command line, and /dev/null provides a second, always-empty dummy file. find replaces {} with the name of each file it finds.
BTW, I used -F for grep instead of your -e, because you don't seem to use any specific regular expression pattern anyway; -F makes grep treat 103=16 as a fixed string.
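The sketch below shows the content search in practice. It builds a throwaway tree with hypothetical folder and file names modeled on the ones in the question, with | standing in for the FIX field separator. It uses the batched -exec ... {} + form instead of \;, so one grep handles many files. With +, {} must come right before the +. This is an adaptation, not the exact command above.

```shell
#!/usr/bin/env bash
# Sketch: search the *contents* of TRADX_*.log files for the literal tag 103=16.
# Folder/file names and log lines are hypothetical test data, not the real /opt/FIXLOGS tree.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

mkdir -p "$tmp/l51prdsrv-api1.nebex.local" "$tmp/l51prdsrv-oe1.nebex.local"
printf '8=FIX.4.4|35=D|103=16|\n8=FIX.4.4|35=0|\n' > "$tmp/l51prdsrv-api1.nebex.local/TRADX_gsoe3.log"
printf '8=FIX.4.4|35=0|\n' > "$tmp/l51prdsrv-oe1.nebex.local/TRADX_gseuoe2.log"

# -exec ... {} + passes many files to one grep; /dev/null guarantees at least two
# file arguments, so grep prefixes every matching line with its file name.
matches=$(find "$tmp"/l51prdsrv* -iname 'TRADX_*.log' -type f \
            -exec grep -F '103=16' /dev/null {} +)
echo "$matches"
```

With \; instead of +, the printed lines are the same. The + form just starts fewer grep processes.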
But you don't need find for this task. An alternative would be an explicit loop:
shopt -s nocaseglob # make globbing case-insensitive
shopt -s globstar # turn on ** globbing
for f in {.,/opt/FIXLOGS/l51prdsrv*}/**/tradx_*.log
do
[[ -f $f ]] && grep -F 103=16 "$f" /dev/null
done
While the loop looks more complicated at first glance, it is easier to extend the logic in case you want to do more with the files instead of just grepping the lines, for instance taking specific actions on those files which contain the pattern.
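Here is a sketch of such an extension. It assumes bash 4+ (needed for globstar) and uses a hypothetical test tree. The loop only collects the files that contain the tag. It uses grep -q, so grep reports a match through its exit status instead of printing lines. What to do with the collected files is left open; here they are just listed.

```shell
#!/usr/bin/env bash
# Sketch: collect the files that contain 103=16 instead of printing matching lines.
# Requires bash 4+ (globstar). The tree below is hypothetical test data.
shopt -s globstar nocaseglob
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/srv1/sub" "$tmp/srv2"
printf '35=D|103=16|\n' > "$tmp/srv1/sub/TRADX_a.log"
printf '35=0|\n'        > "$tmp/srv2/TRADX_b.log"
printf '103=16|\n'      > "$tmp/srv2/other.log"   # not a TRADX_ file, so ignored

hits=()
for f in "$tmp"/**/tradx_*.log; do
    # grep -q prints nothing; its exit status says whether the literal tag is present
    [[ -f $f ]] && grep -qF '103=16' "$f" && hits+=("$f")
done

printf 'files with 103=16: %d\n' "${#hits[@]}"
printf '%s\n' "${hits[@]}"
```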

You are doing:
find . /opt/FIXLOGS/l51prdsrv* -iname "TRADX_" -type f | grep -e 103=16
I propose you do:
find . /opt/FIXLOGS/l51prdsrv* -iname "TRADX_*.log" -type f -exec grep -e "103=16" {} /dev/null \;
What's the difference?
find ... -type f
=> gives you a list of files.
When you add | grep -e 103=16, grep searches the file names.
When you add -exec grep ..., grep searches the contents of the files themselves.
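Here is a minimal demonstration of that difference. It runs on a throwaway directory with one hypothetical log file whose content contains the tag. The file pattern is written as TRADX_*.log so that the file is actually found.

```shell
#!/usr/bin/env bash
# Sketch contrasting the two pipelines on a throwaway directory (hypothetical data).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'x|103=16|\n' > "$tmp/TRADX_one.log"

# Pipe: grep only sees the *names* printed by find, so nothing matches
# (|| true because grep exits with status 1 when it finds nothing).
by_name=$(find "$tmp" -iname 'TRADX_*.log' -type f | grep -e '103=16' || true)
# -exec: grep opens each file and searches its *contents*.
by_content=$(find "$tmp" -iname 'TRADX_*.log' -type f -exec grep -e '103=16' {} /dev/null \;)

echo "pipe:  '${by_name}'"
echo "-exec: '${by_content}'"
```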

Related

How to use grep to reverse search files in a folder

I'm trying to create a script that finds missing topics in multiple log files. These log files are filled top down, so the newest entries are at the bottom of each file. I would like to grep only the last line in each file that includes UNKNOWN_TOPIC_OR_PARTITION. This should be done in multiple files with completely different names. Is grep the best solution, or is there another solution that suits my needs? I already tried adding tail, but that doesn't seem to work.
missingTopics=$(grep -Ri -m1 --exclude=*.{1,2,3,4,5} UNKNOWN_TOPIC_OR_PARTITION /app/tibco/log/tra/domain/)
You could try a combination of find, tac and grep:
find /app/tibco/log/tra/domain -type f ! -name '*.[1-5]' -exec sh -c \
'tac "$1" | grep -im1 UNKNOWN_TOPIC_OR_PARTITION' "sh" '{}' \;
tac prints a file with its lines in reverse order, so the first match that grep -m1 finds is the last matching line of the file. The -exec sh -c SCRIPT "sh" '{}' \; action of find executes the shell SCRIPT once for each file that passes the preceding tests. SCRIPT is executed with "sh" as parameter $0 and the path of the found file as parameter $1.
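Here is a small sketch of the tac idea on a single made-up log file. It assumes GNU tac (coreutils); on macOS, tail -r is the usual replacement.

```shell
#!/usr/bin/env bash
# Sketch: reversing the log with tac makes grep -m1 return the *last* matching line.
# The log content is hypothetical.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cat > "$tmp/app.log" <<'EOF'
10:00 WARN UNKNOWN_TOPIC_OR_PARTITION topic=orders
10:05 INFO heartbeat
10:10 WARN unknown_topic_or_partition topic=payments
EOF

# -i: case-insensitive, -m1: stop after the first match (i.e. the newest entry)
last=$(tac "$tmp/app.log" | grep -im1 UNKNOWN_TOPIC_OR_PARTITION)
echo "$last"
```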
If performance is an issue you can probably improve it with:
find . -type f ! -name '*.[1-5]' -exec sh -c 'for f in "$@"; do \
tac "$f" | grep -im1 UNKNOWN_TOPIC_OR_PARTITION; done' "sh" '{}' +
which will spawn fewer shells. With +, find passes many paths to each sh invocation, and "$@" expands to all of them. If security is also a concern, you can replace -exec with -execdir (although with this SCRIPT I do not immediately see any exploit).
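If you also want to know which file each line came from, the batched script can print the path in front of the match. This variation is my own illustration, not part of the answer above. That covers the "path: line" output format and the sample files, including a rotated .1 file that the ! -name test skips.

```shell
#!/usr/bin/env bash
# Sketch: batched -exec ... + form that labels each "last match" with its file.
# The sample files and the output format are hypothetical.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'UNKNOWN_TOPIC_OR_PARTITION a\nUNKNOWN_TOPIC_OR_PARTITION b\n' > "$tmp/one.log"
printf 'ok\n' > "$tmp/two.log"
printf 'UNKNOWN_TOPIC_OR_PARTITION old\n' > "$tmp/one.log.1"   # rotated file, excluded

out=$(find "$tmp" -type f ! -name '*.[1-5]' -exec sh -c '
    for f in "$@"; do
        # print only when grep found something in this file
        line=$(tac "$f" | grep -im1 UNKNOWN_TOPIC_OR_PARTITION) &&
            printf "%s: %s\n" "$f" "$line"
    done' sh {} +)
echo "$out"
```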

Piping find results into grep for fast directory exclusion

I am successfully using find to create a list of all files in the current subdirectory, excluding those in the subdirectory "cache." Here's my first bit of code:
find . -wholename './cach*' -prune -o -print
I now wish to pipe this into a grep command. It seems like that should be simple:
find . -wholename './cach*' -prune -o -print | xargs grep -r -R -i "samson"
... but this is returning results that are mostly from the cache directory. I've tried removing the xargs reference, but that does what you'd expect, running the grep on text of the file names, rather than on the files themselves. My goal is to find "samson" in any files that aren't cached content.
I'll probably get around this issue by just using doubled greps in this instance, but I'm very curious about why this one-liner behaves this way. I'd love to hear thoughts on a way to modify it while still using these two commands (as there are speed advantages to doing it this way).
(This is in CentOS 5, btw.)
The wholename match may be the reason why it's still including "cache" files. If you're executing the find command in the directory that contains the "cache" folder, it should work. If not, try changing it to -name '*cache*' instead.
Also, you do not need the -r or -R for your grep; those tell it to recurse through directories, but you're testing individual files.
You can update your command using the piped version, or a single-command:
find . -name '*cache*' -prune -o -type f -print0 | xargs -0 grep -il "samson"
or
find . -name '*cache*' -prune -o -exec grep -iq "samson" {} \; -print
Note, the -l in the first command tells grep to "list the file" and not the line(s) that match. The -q in the second does the same; it tells grep to respond quietly so find will then just print the filename.
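Here is a quick check of the piped version on a hypothetical tree, where cache/ and src/ are made-up names. It includes -type f so that only regular files, not directories, are handed to grep.

```shell
#!/usr/bin/env bash
# Sketch: prune the cache directory and list files mentioning "samson".
# The directory layout is hypothetical.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
mkdir -p cache src
echo 'Samson was here' > cache/page.html
echo 'samson'          > src/notes.txt
echo 'nothing'         > src/other.txt

# -prune stops find from descending into cache; -print0/-0 survive spaces in names.
found=$(find . -name '*cache*' -prune -o -type f -print0 | xargs -0 grep -il "samson")
echo "$found"
```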
You've told grep itself to recurse (twice! -r and -R are synonyms). Since one of the arguments you're passing is . (the top directory), grep is searching in every file (some of them twice, or even more if they're in subdirectories).
If you're going to use find and grep, do this:
find . -path './cach*' -prune -o -type f -print0 | xargs -0 grep -i "samson"
Using -print0 and -0 makes your script work even with file names that contain spaces or punctuation characters.
However, you probably don't need to bother with find here, since GNU grep is capable of excluding directories:
grep -R --exclude-dir='cach*' -i "samson" .
(This also excludes ./deeply/nested/directory/cache. If you only want to exclude cache directories at the toplevel, use find as you did.)
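The difference between the two exclusion styles shows up on a small hypothetical tree. It has one cache directory at the top level and another nested deeper. This sketch assumes GNU grep for --exclude-dir.

```shell
#!/usr/bin/env bash
# Sketch comparing grep --exclude-dir with find -path ... -prune (hypothetical tree).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
mkdir -p cache deeply/nested/cache src
echo samson > cache/a.txt
echo samson > deeply/nested/cache/b.txt
echo samson > src/c.txt

# GNU grep: --exclude-dir skips every directory named cach* at any depth.
by_grep=$(grep -R -l --exclude-dir='cach*' -i samson . | LC_ALL=C sort)
# find: -path './cach*' only matches the top-level ./cache.
by_find=$(find . -path './cach*' -prune -o -type f -print0 | xargs -0 grep -il samson | LC_ALL=C sort)

printf 'grep --exclude-dir:\n%s\n' "$by_grep"
printf 'find -path prune:\n%s\n' "$by_find"
```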
Use the -exec option on find instead of piping them to another command. From there you can use grep "samson" {} \; to look for samson in each file listed.
For example:
find . -wholename './cach*' -prune -o -exec grep "samson" "{}" +

Remove files not containing a specific string

I want to find the files not containing a specific string (in a directory and its sub-directories) and remove those files. How I can do this?
The following will work:
find . -type f -print0 | xargs --null grep -Z -L 'my string' | xargs --null rm
This will first use find to print the names of all the files in the current directory and any subdirectories. These names are printed with a null terminator rather than the usual newline separator (try piping the output to od -c to see the effect of the -print0 argument).
Then the --null parameter to xargs tells it to accept null-terminated inputs. xargs will then call grep on a list of filenames.
The -Z argument to grep works like the -print0 argument to find, so grep will print out its results null-terminated (which is why the final call to xargs needs a --null option too). The -L argument to grep causes grep to print the filenames of those files on its command line (that xargs has added) which don't match the regular expression:
my string
If you want simple matching without regular expression magic, add the -F option. If you want more powerful regular expressions, give a -E argument. It's a good habit to use single quotes rather than double quotes, as this protects you against any shell magic being applied to the string (such as variable substitution).
Finally you call xargs again to get rid of all the files that you've found with the previous calls.
The problem with calling grep directly from the find command with -exec ... \; is that grep then gets invoked once per file. xargs instead invokes it once for a whole batch of files (find's -exec ... {} + form batches too). Batching is much faster if you have lots of files. Also, don't be tempted to do stuff like:
rm $(some command that produces lots of filenames)
It's always better to pass the names to xargs, as it knows the maximum command-line length and will call rm multiple times, each time with as many arguments as it can.
Note that this solution would have been simpler without the need to cope with files containing white space and new lines.
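Before running the destructive version, it can help to put echo in front of rm. Below is a sketch of that dry run on two hypothetical files whose names contain spaces, followed by the real run. It assumes GNU grep and GNU xargs for -Z and --null.

```shell
#!/usr/bin/env bash
# Sketch: preview with "echo rm", then delete. File names are hypothetical.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
echo 'my string here' > 'keep me.txt'
echo 'nothing'        > 'delete me.txt'

# Dry run: prints the rm command that would be executed.
preview=$(find . -type f -print0 | xargs --null grep -Z -L 'my string' | xargs --null echo rm)
echo "$preview"

# Real run: same pipeline, ending in rm.
find . -type f -print0 | xargs --null grep -Z -L 'my string' | xargs --null rm
remaining=$(echo *)
echo "$remaining"
```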
Alternatively
grep -r -L -Z 'my string' . | xargs --null rm
will work too (and is shorter). The -r argument to grep causes it to read all files in the directory and recursively descend into any subdirectories. Use the find ... approach if you want to do other tests on the files as well (such as age or permissions).
Note that any of the single letter arguments, with a single dash introducer, can be grouped together (for instance as -rLZ). But note also that find does not use the same conventions and has multi-letter arguments introduced with a single dash. This is for historical reasons and hasn't ever been fixed because it would have broken too many scripts.
GNU grep and bash.
grep -rLZ "$str" . | while IFS= read -rd '' x; do rm "$x"; done
Use a find solution if portability is needed. This one is slightly faster.
EDIT: This is how you SHOULD NOT do this! The reason is given here. Thanks to @ormaaj for pointing it out!
find . -type f | grep -v "exclude string" | xargs rm
Note: grep pattern will match against full file path from current directory (see find . -type f output)
One possibility is
find . -type f '!' -exec grep -q "my string" {} \; -exec echo rm {} \;
You can remove the echo if the output of this preview looks correct.
The equivalent with -delete is
find . -type f '!' -exec grep -q "my string" {} \; -delete
but then you don't get the nice preview option.
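Here is a short sketch of both variants on two hypothetical files: first the echo preview, then -delete.

```shell
#!/usr/bin/env bash
# Sketch: preview with -exec echo rm, then delete with -delete (hypothetical files).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
echo 'my string' > keep.txt
echo 'other'     > drop.txt

# '!' negates grep's exit status: act on files that do NOT contain the string.
preview=$(find . -type f '!' -exec grep -q "my string" {} \; -exec echo rm {} \;)
echo "$preview"

find . -type f '!' -exec grep -q "my string" {} \; -delete
left=$(echo *)
echo "left: $left"
```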
To remove files whose names do not contain a specific string (note that these patterns test file names in the current directory, not file contents):
Bash:
To use extended glob patterns, enable the extglob shell option as follows:
shopt -s extglob
Then remove all files whose names don't contain the string "fix":
rm !(*fix*)
To delete all files whose names contain neither "fix" nor "class":
rm !(*fix*|*class*)
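Here is a small sketch of the bash variant in a throwaway directory with hypothetical file names. It also shows the limitation: the pattern is applied to names in the current directory only, and file contents are never read.

```shell
#!/usr/bin/env bash
# extglob must be switched on before bash parses a line that uses !( ... ).
shopt -s extglob
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
# Hypothetical file names; their contents are irrelevant to the pattern.
touch bugfix.txt myclass.java notes.txt readme.md

# Deletes every entry whose *name* contains neither "fix" nor "class".
rm -- !(*fix*|*class*)
left=$(echo *)
echo "$left"
```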
Zsh:
To use extended globs, enable the extended_glob zsh option as follows:
setopt extended_glob
Remove all files whose names don't contain the string, in this example "fix":
rm -- ^*fix*
To delete all files whose names contain neither "fix" nor "class":
rm -- ^(*fix*|*class*)
It also works for extensions; you only need to change the glob pattern: (*.zip), (*.doc), etc.
Here are the sources:
https://www.tecmint.com/delete-all-files-in-directory-except-one-few-file-extensions/
https://codeday.me/es/qa/20190819/1296122.html
I can think of a few ways to approach this. Here's one: find and grep to generate a list of files with no match, and then xargs rm them.
find yourdir -type f -exec grep -F -L 'yourstring' '{}' + | xargs -d '\n' rm
This assumes GNU tools (grep -L and xargs -d are non-portable) and of course no filenames with newlines in them. It has the advantage of not running grep and rm once per file, so it'll be reasonably fast. I recommend testing it with "echo" in place of "rm" just to make sure it picks the right files before you unleash the destruction.
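This worked for me. Note that it compares each path against $myString (as a regular expression), not the file contents. The -f test restricts it to regular files.
myString="keepThis"
find ./ -print0 | while IFS= read -r -d '' x
do if [[ -f $x && ! $x =~ $myString ]]
then rm -- "$x"
fi
done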
Another solution (although not as fast). The top solution didn't work in my case because the string I needed to use in place of 'my string' has special characters.
find -type f ! -name "*my string*" -exec rm {} \; -print

Run expand on find results

I'm trying to run the expand shell command on all files found by a find command. I've tried -exec and xargs, but both failed. Can anyone explain why? I'm on a Mac, for the record.
find . -name "*.php" -exec expand -t 4 {} > {} \;
This just creates a file {} with all the output instead of overwriting each individual found file itself.
find . -name "*.php" -print0 | xargs -0 -I expand -t 4 {} > {}
And this just outputs
4 {}
xargs: 4: No such file or directory
Your command does not work for two reasons.
The output redirection is done by the shell, not by find. That means the shell redirects find's output into a file literally named {}.
Even if the redirection were applied per file, it would happen first: the shell truncates the target file before expand gets to read it. So it's not possible to redirect a command's output into its own input file.
Unfortunately, expand has no option to write its output to a file, so you have to use output redirection. If you use bash, you can define a function that runs expand, redirects the output into a temporary file, and moves the temporary file back over the original. The problem is that find -exec cannot call a shell function directly; it has to start a new bash to run it.
But there is a solution:
expand_func () {
# only overwrite the original if expand succeeded
expand -t 4 "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}
export -f expand_func
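find . -name \*.php -exec bash -c 'expand_func "$1"' bash {} \;
You export the function expand_func to subshells using export -f. Then, instead of running expand itself with find -exec, you run a new bash that executes the exported expand_func. The file name is passed as "$1" rather than pasted into the script text, so names containing spaces or quotes are handled safely.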
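Here is a self-contained sketch of the function-based approach on one hypothetical file whose name contains a space. It checks that awkward names are handled too.

```shell
#!/usr/bin/env bash
# Sketch: exported bash function called from find -exec (hypothetical file).
expand_func () {
    # write to a temp file first, then replace the original only on success
    expand -t 4 "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}
export -f expand_func

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '\tindented\n' > "$tmp/my page.php"

# The path arrives in the child bash as "$1"; "bash" fills $0.
find "$tmp" -name '*.php' -exec bash -c 'expand_func "$1"' bash {} \;
result=$(cat "$tmp/my page.php")
printf '[%s]\n' "$result"
```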
'expand' isn't really worth the trouble.
You can just use sed instead:
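find . -name "*.php" -print0 | xargs -0 sed -i -e 's/\t/    /g'
This assumes GNU sed. BSD sed on macOS needs an explicit (possibly empty) backup suffix after -i, as in -i '', and may not understand \t. Also, unlike expand, it replaces every tab with exactly four spaces regardless of the column the tab starts in.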

How to list specific type of files in recursive directories in shell?

How can I find files of a specific type, i.e. .doc and .pdf files, in nested directories?
The command I tried:
$ ls -R | grep .doc
but if there is a file named like alok.doc.txt, the command will display that too, which is obviously not what I want. What command should I use instead?
If you are more comfortable with "ls" and "grep", you can do what you want using a regular expression in the grep command (the ending '$' character indicates that .doc must be at the end of the line, which excludes "file.doc.txt"):
ls -R |grep "\.doc$"
More information about using grep with regular expressions is in the grep man page.
ls command output is mainly intended for reading by humans. For advanced querying for automated processing, you should use more powerful find command:
find /path -type f \( -iname "*.doc" -o -iname "*.pdf" \)
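Here is a quick sketch showing that the find version anchors the match at the end of the name. A file like alok.doc.txt is skipped, while nested .doc and uppercase .PDF files are found. The file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: find -iname matches the whole name, so "*.doc" cannot match alok.doc.txt.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/a/b"
touch "$tmp/a/report.doc" "$tmp/a/b/Scan.PDF" "$tmp/a/alok.doc.txt"

# \( ... -o ... \) groups the two name tests; -iname ignores case.
found=$(cd "$tmp" && find . -type f \( -iname "*.doc" -o -iname "*.pdf" \) | LC_ALL=C sort)
echo "$found"
```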
Or, if you have bash 4.0+:
#!/bin/bash
shopt -s globstar
shopt -s nullglob
for file in **/*.{pdf,doc}
do
echo "$file"
done
find . | grep "\.doc$"
This will show the path as well.
Some other methods that can be used (note that these only look at the current directory, not subdirectories):
echo *.{pdf,docx,jpeg}
stat -c %n * | grep 'pdf\|docx\|jpeg'
We had a similar question. We wanted a list - with paths - of all the config files in the etc directory. This worked:
find /etc -type f \( -iname "*.conf" \)
It gives a nice list of all the .conf files with their paths. Output looks like:
/etc/conf/server.conf
But, we wanted to DO something with ALL those files, like grep those files to find a word, or setting, in all the files. So we use
find /etc -type f \( -iname "*.conf" \) -print0 | xargs -0 grep -Hi "ServerName"
to find via grep ALL the config files in /etc that contain a setting like "ServerName". Output looks like:
/etc/conf/server.conf: ServerName "default-118_11_170_172"
Hope you find it useful.
Sid
Similarly, if you prefer the wildcard character * (not quite like the regex suggestions), you can use ls with both the -l flag to list one file per line (like grep) and the -R flag like you had. Then you can specify the files you want to search for with *.doc. Note that the shell expands *.doc only in the current directory, so this does not find .doc files in subdirectories.
I.e., either
ls -l -R *.doc
or, if you want it to list the files on fewer lines:
ls -R *.doc
If you have files with extensions that don't match the file type, you could use the file utility.
find "$PWD" -type f -exec file -N \{\} \; | grep "PDF document" | awk -F: '{print $1}'
Instead of $PWD, you can use the directory you want to start the search in. file even prints out the PDF version.
