I have multiple .bz2 files located in subdirectories, and I want to search all of them for a text string using the bzcat and grep Linux commands.
I am able to search one file at a time with the following command:
bzcat <filename.bz2> | grep -ia 'text string' | less
But now I need to do the above for all files in the subdirectories.
You can use bzgrep instead of bzcat and grep. It is shorter: bzgrep decompresses each file and runs grep on it for you.
To grep recursively in a directory tree, use find:
find . -type f -name '*.bz2' -execdir bzgrep "pattern" {} \;
find searches recursively for all files with the .bz2 extension and runs the command given to -execdir on each of them, from inside the directory that contains the file.
There are several methods:
bzgrep regexp $(find -name \*.bz2)
This method works if the number of files found is not very large (and their paths contain no spaces or other special characters). Otherwise, you had better use this one:
find -name \*.bz2 -exec bzgrep regexp {} /dev/null \;
Note the /dev/null in the second method: it gives bzgrep two file arguments, which makes it print the name of the file in which the regexp was found.
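As a sketch that does not depend on how a particular bzgrep version labels its output, the same recursive search can be done with bzcat plus GNU grep's --label option, which names the stream grep is reading so that each match is prefixed with its .bz2 file. It assumes bzip2 and GNU grep are installed; the scratch directory, file names and contents are made up for the demo.

```shell
#!/usr/bin/env bash
# Scratch tree with two compressed files in nested subdirectories (demo data).
demo=$(mktemp -d)
mkdir -p "$demo/logs/2023" "$demo/logs/2024"
printf 'start\nTEXT STRING here\n' | bzip2 > "$demo/logs/2023/a.bz2"
printf 'nothing to see\n' | bzip2 > "$demo/logs/2024/b.bz2"

# For each .bz2 file: decompress to stdout, then grep case-insensitively.
# -H together with --label makes grep print the archive name instead of
# "(standard input)" in front of every matching line.
matches=$(find "$demo" -type f -name '*.bz2' -exec sh -c '
  pattern=$1; shift
  for f; do
    bzcat -- "$f" | grep -iaH --label="$f" -- "$pattern"
  done' sh 'text string' {} +)

printf '%s\n' "$matches"
```

The pattern is passed to the inner shell as its first argument, so it never has to be spliced into the quoted script.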
Just try:
bzgrep --help
grep through bzip2 files
Usage: bzgrep [grep_options] pattern [files]
For example, I need to grep a list of files for the number 1941974:
'billing_log_1.bz'
'billing_log_2.bz'
'billing_log_3.bz'
'billing_log_4.bz'
'billing_log_5.bz'
What can I do?
bzgrep '1941974' billing_log_*.bz
Continuing your code, with fixes, using bzcat:
find . -type f -name "*.bz2" | while IFS= read -r file
do
    bzcat "$file" | grep -ia 'text string'
done | less
I am running the command find . -name '*.txt' and getting a list of files.
I get the following output:
./bsd/contrib/amd/ldap-id.txt
./bsd/contrib/expat/tests/benchmark/README.txt
./bsd/contrib/expat/tests/README.txt
./bsd/lib/libc/softfloat/README.txt
and so on.
How can I run grep on these files to read their contents and keep only the files that contain a certain keyword, e.g. "version"?
xargs is a great way to accomplish this, and it's already been covered.
The -exec option of find is also useful for this. It will perform a command over all files returned from find.
To invoke grep as few times as possible, passing multiple filenames to each call:
find . -name '*.txt' -exec grep -H 'foo' {} +
Alternately, to invoke grep exactly once for each file found:
find . -name '*.txt' -exec grep -H 'foo' {} ';'
In either case, {} is like a placeholder for the values from find; if your shell is zsh, it may be necessary to escape it, as in '{}'.
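To see the difference between the two terminators, this small demo (scratch directory and file names are hypothetical) has find start a shell that only reports how many file names it received.

```shell
#!/usr/bin/env bash
# Three hypothetical .txt files in a scratch directory.
demo=$(mktemp -d)
touch "$demo/one.txt" "$demo/two.txt" "$demo/three.txt"

# The inner shell only reports how many file names it was given ($#).
report='echo "called with $# file(s)"'

# ';' terminator: one process per file.
per_file=$(find "$demo" -name '*.txt' -exec sh -c "$report" sh {} ';')

# '+' terminator: as many names as fit are packed into one command line.
batched=$(find "$demo" -name '*.txt' -exec sh -c "$report" sh {} +)

printf 'per file:\n%s\nbatched:\n%s\n' "$per_file" "$batched"
```

With ';' each of the three files gets its own process; with '+' all three arrive in one call. For a very large tree, find would still split the list over several command lines.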
There are several ways to accomplish this.
If there are non-.txt files which might usefully contain the keyword:
grep -r KEYWORD *
This uses the recursive directory search option of grep.
To search only .txt files:
find . -name '*.txt' -exec grep KEYWORD {} \;
or
find . -name '*.txt' -exec grep KEYWORD {} +
or
find . -name '*.txt' -execdir grep KEYWORD {} +
The first runs grep once for each matching file. The second runs grep far fewer times, accumulating many matched files before each invocation. The third runs grep from inside each subdirectory, roughly once per directory, passing it the matching files of that directory.
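A sketch of the -execdir form, assuming GNU find and that grep lives in /usr/bin or /bin: the command runs inside each file's directory and receives names like ./b.txt. PATH is set explicitly because GNU find refuses -execdir when PATH contains relative entries. The scratch tree is made up.

```shell
#!/usr/bin/env bash
# Hypothetical tree: one .txt file at the top, two in a subdirectory.
demo=$(mktemp -d)
mkdir -p "$demo/sub"
echo 'KEYWORD top' > "$demo/a.txt"
echo 'KEYWORD deep' > "$demo/sub/b.txt"
echo 'no match' > "$demo/sub/c.txt"

# grep runs from inside each file's directory and sees names like ./b.txt;
# with '+' the matching files of one directory are passed together.
# PATH is limited to absolute directories (assumed to contain grep),
# because GNU find rejects -execdir when PATH has relative entries.
result=$(PATH=/usr/bin:/bin find "$demo" -name '*.txt' -execdir grep -H KEYWORD {} +)
printf '%s\n' "$result"
```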
There is usually an option built into find for that (-exec), but to be portable across platforms, I typically use xargs. Say you want to find all the XML files in or below the current directory and get a list of each occurrence of 'foo'; you can do this:
find ./ -type f -name '*.xml' -print0 | xargs -0 -n 1 grep -H foo
It should be self-explanatory except for the -print0, which separates filenames with NULs rather than newlines, and the -0, which tells xargs to use those NULs rather than interpreting spaces and quotes as syntax (which can confuse it if filenames contain either).
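A quick way to check the claim about awkward names: the demo below (hypothetical files in a scratch directory) puts a space in a directory and a file name and a quote in another name, and the NUL-separated pipeline still hands each path to grep intact.

```shell
#!/usr/bin/env bash
# Hypothetical files; note the space and quote characters in the names.
demo=$(mktemp -d)
mkdir -p "$demo/conf dir"
echo '<a>foo</a>' > "$demo/conf dir/my file.xml"
echo '<b>bar</b>' > "$demo/it's.xml"

# NUL-separated names survive spaces and quotes unchanged.
# -n 1 starts one grep per file, as in the answer; -H keeps the file name
# in the output even though grep only sees one file at a time.
hits=$(find "$demo" -type f -name '*.xml' -print0 | xargs -0 -n 1 grep -H foo)
printf '%s\n' "$hits"
```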
I want to search for an expression, say "abcd", in all files of a directory, but exclude files of certain types from the search. Something like:
grep -rn 'abcd' *
But the results should not include matches found in files with the extensions .js and .h. How can I do that?
Use the --exclude option:
grep 'your string here' -r --exclude=\*.{js,h}
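A self-contained check of --exclude, assuming GNU grep (the scratch layout is made up): the expression appears in .js, .h and .c files, and only the .c match survives.

```shell
#!/usr/bin/env bash
# Hypothetical project: the same line in a .js, a .h and a .c file.
demo=$(mktemp -d)
mkdir -p "$demo/src"
for ext in js h c; do echo 'abcd' > "$demo/src/file.$ext"; done

# Brace expansion turns --exclude=\*.{js,h} into --exclude=*.js --exclude=*.h;
# the backslash keeps the shell from globbing the * itself.
found=$(grep -rn 'abcd' --exclude=\*.{js,h} "$demo")
printf '%s\n' "$found"
```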
At this point I typically recommend ack, which is like grep but has many built-in features, such as file type selection.
But with grep, and a bit of shell magic, this can work:
find . -type f -not -iname '*.h' -not -iname '*.js' -print0 | xargs -0 grep -Hn 'abcd'
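A sketch of the find/xargs route restricted to regular files with -type f. Without it, directories (including the starting one) are passed to grep -r, which then recurses into them and finds the excluded .h and .js files again. The scratch layout is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical project: the expression appears in .js, .h and .c files.
demo=$(mktemp -d)
mkdir -p "$demo/src"
for ext in js h c; do echo 'abcd' > "$demo/src/file.$ext"; done

# -type f keeps directories out of the list, so grep needs no -r and never
# re-enters excluded files; -H forces file names even if xargs hands grep
# a single file. '!' is the POSIX spelling of -not.
found=$(find "$demo" -type f ! -iname '*.h' ! -iname '*.js' -print0 |
        xargs -0 grep -Hn 'abcd')
printf '%s\n' "$found"
```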
I have files like a_dbg.txt, b_dbg.txt, ... on a SUSE 10 system. I want to write a bash shell script that renames these files by removing "_dbg" from their names.
Google suggested the rename command, so I ran rename _dbg.txt .txt *dbg* in CURRENT_FOLDER.
My CURRENT_FOLDER contains the following files:
CURRENT_FOLDER/a_dbg.txt
CURRENT_FOLDER/b_dbg.txt
CURRENT_FOLDER/XX/c_dbg.txt
CURRENT_FOLDER/YY/d_dbg.txt
After running the rename command, it contains:
CURRENT_FOLDER/a.txt
CURRENT_FOLDER/b.txt
CURRENT_FOLDER/XX/c_dbg.txt
CURRENT_FOLDER/YY/d_dbg.txt
It does not work recursively. How can I make this command rename files in all subdirectories? Besides XX and YY, I will have many subdirectories whose names are unpredictable, and CURRENT_FOLDER will also contain other files.
You can use find to find all matching files recursively:
find . -iname "*dbg*" -exec rename _dbg.txt .txt '{}' \;
EDIT: what are '{}' and \;?
The -exec argument makes find execute rename for every matching file found. '{}' will be replaced with the path name of the file. The last token, \; is there only to mark the end of the exec expression.
All that is described nicely in the man page for find:
-exec utility [argument ...] ;
True if the program named utility returns a zero value as its
exit status. Optional arguments may be passed to the utility.
The expression must be terminated by a semicolon (``;''). If you
invoke find from a shell you may need to quote the semicolon if
the shell would otherwise treat it as a control operator. If the
string ``{}'' appears anywhere in the utility name or the argu-
ments it is replaced by the pathname of the current file.
Utility will be executed from the directory from which find was
executed. Utility and arguments are not subject to the further
expansion of shell patterns and constructs.
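If the rename utility is missing, or its syntax differs between distributions (util-linux rename and Perl rename take different arguments), the same recursive rename can be sketched with only find, sh and mv. The scratch layout mirrors the question; ${f%_dbg.txt} strips the suffix from the end of the path only, so directory names are left alone.

```shell
#!/usr/bin/env bash
# Recreate the question's layout in a scratch directory.
demo=$(mktemp -d)
mkdir -p "$demo/XX" "$demo/YY"
touch "$demo/a_dbg.txt" "$demo/b_dbg.txt" "$demo/XX/c_dbg.txt" \
      "$demo/YY/d_dbg.txt" "$demo/other.txt"

# For every *_dbg.txt file at any depth, drop the _dbg.txt suffix and
# append .txt again; other files are not matched by -name at all.
find "$demo" -type f -name '*_dbg.txt' -exec sh -c '
  for f; do
    mv -- "$f" "${f%_dbg.txt}.txt"
  done' sh {} +

find "$demo" -type f | sort
```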
For renaming recursively, I use the following command (Perl rename reads the file names from standard input when none are given on the command line):
find -iname \*.* | rename -v "s/ /-/g"
A small script I wrote to change the extension of all .txt files under /tmp and its subdirectories to .cpp:
#!/bin/bash
# NUL-separated names survive spaces; ${file%.txt} strips only the final .txt
find /tmp -type f -name '*.txt' -print0 |
while IFS= read -r -d '' file
do
mv -- "$file" "${file%.txt}.cpp"
done
With bash 4.0 or later (globstar was introduced in bash 4.0):
shopt -s globstar nullglob
rename _dbg.txt .txt **/*dbg*
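The same globstar idea also works without the rename utility, by letting bash strip the suffix itself. This is a sketch assuming bash 4 or later; the scratch tree is hypothetical.

```shell
#!/usr/bin/env bash
# Scratch copy of the question's layout, one level deeper than XX.
demo=$(mktemp -d)
mkdir -p "$demo/XX/deeper"
touch "$demo/a_dbg.txt" "$demo/XX/c_dbg.txt" "$demo/XX/deeper/e_dbg.txt"

cd "$demo" || exit 1
shopt -s globstar nullglob   # ** recurses (also matches the top level); no match -> empty list
for f in **/*_dbg.txt; do
  mv -- "$f" "${f%_dbg.txt}.txt"
done
printf '%s\n' **/*.txt
```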
find -execdir rename also works for non-suffix replacements on basenames
https://stackoverflow.com/a/16541670/895245 works directly only for suffixes, but this will work for arbitrary regex replacements on basenames:
PATH=/usr/bin find . -depth -execdir rename 's/_dbg\.txt$/.txt/' '{}' \;
or to affect files only:
PATH=/usr/bin find . -type f -execdir rename 's/_dbg\.txt$/.txt/' '{}' \;
-execdir first cds into the file's directory and then runs the command on the basename only.
Tested on Ubuntu 20.04, find 4.7.0, rename 1.10.
Convenient and safer helper for it
find-rename-regex() (
set -eu
find_and_replace="$1"
PATH="$(echo "$PATH" | sed -E 's/(^|:)[^\/][^:]*//g')" \
find . -depth -execdir rename "${2:--n}" "s/${find_and_replace}" '{}' \;
)
Sample usage to replace spaces ' ' with hyphens '-'.
Dry run that shows what would be renamed to what without actually doing it:
find-rename-regex ' /-/g'
Do the replace:
find-rename-regex ' /-/g' -v
Command explanation
The awesome -execdir option does a cd into the directory before executing the rename command, unlike -exec.
-depth ensures that the renaming happens first on children and then on parents, to prevent potential problems with missing parent directories.
-execdir is required because rename does not play well with non-basename input paths, e.g. the following fails:
rename 's/findme/replaceme/g' acc/acc
The PATH hacking is required because -execdir has one very annoying drawback: find is extremely opinionated and refuses to do anything with -execdir if you have any relative paths in your PATH environment variable, e.g. ./node_modules/.bin, failing with:
find: The relative path ‘./node_modules/.bin’ is included in the PATH environment variable, which is insecure in combination with the -execdir action of find. Please remove that entry from $PATH
See also: https://askubuntu.com/questions/621132/why-using-the-execdir-action-is-insecure-for-directory-which-is-in-the-path/1109378#1109378
-execdir is a GNU find extension to POSIX. rename is Perl based and comes from the rename package.
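A note on the PATH filter in the helper: the sed expression removes relative entries but can leave an empty entry behind. For example, with PATH=./node_modules/.bin:/usr/bin it produces :/usr/bin, and an empty PATH entry also stands for the current directory, which GNU find's -execdir check may reject as well. A sketch of a stricter filter in bash that keeps only entries starting with /; abs_path_only is a hypothetical helper name, not part of the original script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only the absolute entries of a PATH-like string.
# Relative entries (./node_modules/.bin, bin) and empty ones (which mean
# "current directory") are dropped; the remaining order is preserved.
abs_path_only() {
  local -a parts
  local out="" dir
  IFS=: read -r -a parts <<< "$1"
  for dir in "${parts[@]}"; do
    case $dir in
      /*) out=${out:+$out:}$dir ;;
    esac
  done
  printf '%s\n' "$out"
}

abs_path_only './node_modules/.bin:/usr/bin::bin:/bin'   # prints /usr/bin:/bin

# Usage: sanitize PATH only for the find call, even with a relative entry
# prepended, then run an -execdir action in a scratch directory.
demo=$(mktemp -d)
touch "$demo/x.txt"
out=$(PATH=$(abs_path_only "./node_modules/.bin:$PATH") find "$demo" -name '*.txt' -execdir echo {} \;)
printf '%s\n' "$out"
```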
Rename lookahead workaround
If your input paths don't come from find, or if you've had enough of the relative path annoyance, we can use some Perl lookahead to safely rename directories as in:
git ls-files | sort -r | xargs rename 's/findme(?!.*\/)\/?$/replaceme/g'
I haven't found a convenient analogue for -execdir with xargs: https://superuser.com/questions/893890/xargs-change-working-directory-to-file-path-before-executing/915686
The sort -r is required to ensure that files come after their respective directories, since longer paths come after shorter ones with the same prefix.
Tested in Ubuntu 18.10.
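The ordering argument behind sort -r can be checked directly. This sketch forces a byte-wise sort with LC_ALL=C (an assumption: locale-aware collation may weigh the / character differently) and lists made-up paths.

```shell
#!/usr/bin/env bash
# Made-up paths: a directory, its children and a grandchild.
# In a byte-wise (C locale) sort, a prefix sorts before anything that extends
# it, so a parent directory comes before its contents; sort -r reverses that.
order=$(printf '%s\n' dir dir/sub dir/sub/file.txt dir/other.txt | LC_ALL=C sort -r)
printf '%s\n' "$order"
```

The output lists dir/sub/file.txt, dir/sub, dir/other.txt and finally dir, so every path comes before its parent directory.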
The script above can be written in one line:
find /tmp -type f -name '*.txt' -exec bash -c 'mv -- "$0" "${0%.txt}.cpp"' '{}' \;
If you just want to rename and don't mind using an external tool, then you can use rnm. The command would be:
# in the current folder
rnm -dp -1 -fo -ssf '_dbg' -rs '/_dbg//' *
-dp -1 makes it recurse into all subdirectories.
-fo implies file only mode.
-ssf '_dbg' searches for files with _dbg in the filename.
-rs '/_dbg//' replaces _dbg with empty string.
You can run the above command with the path of the CURRENT_FOLDER too:
rnm -dp -1 -fo -ssf '_dbg' -rs '/_dbg//' /path/to/the/directory
You can use the following (note that it only covers the current directory and one level of subdirectories):
rename --no-act 's/\.html$/\.php/' *.html */*.html
This command worked for me. Remember to install the Perl rename package first:
find -iname \*.* | grep oldname | rename -v "s/oldname/newname/g"
To expand on the excellent answer by @CiroSantilliПутлерКапут六四事: do not let find match files that we don't have to rename.
I have found this to improve performance significantly on Cygwin.
Please feel free to correct my ineffective bash coding.
FIND_STRING="ZZZZ"
REPLACE_STRING="YYYY"
FIND_PARAMS="-type d"
find-rename-regex() (
set -eu
find_and_replace="${1}/${2}/g"
echo "${find_and_replace}"
find_params="${3}"
mode="${4}"
if [ "${mode}" = 'real' ]; then
PATH="$(echo "$PATH" | sed -E 's/(^|:)[^\/][^:]*//g')" \
find . -depth -name "*${1}*" ${find_params} -execdir rename -v "s/${find_and_replace}" '{}' \;
elif [ "${mode}" = 'dryrun' ]; then
echo "${mode}"
PATH="$(echo "$PATH" | sed -E 's/(^|:)[^\/][^:]*//g')" \
find . -depth -name "*${1}*" ${find_params} -execdir rename -n "s/${find_and_replace}" '{}' \;
fi
)
find-rename-regex "${FIND_STRING}" "${REPLACE_STRING}" "${FIND_PARAMS}" "dryrun"
# find-rename-regex "${FIND_STRING}" "${REPLACE_STRING}" "${FIND_PARAMS}" "real"
In case anyone is comfortable with fd and rnr, the command is:
fd -t f -x rnr '_dbg.txt' '.txt'
The rnr-only command is:
rnr -f -r '_dbg.txt' '.txt' *
rnr has the benefit of being able to undo the command.
On Ubuntu (after installing rename), this simpler solution worked the best for me. This replaces space with underscore, but can be modified as needed.
find . -depth | rename -d -v -n "s/ /_/g"
The -depth flag is telling find to traverse the depth of a directory first, which is good because I want to rename the leaf nodes first.
The -d flag on rename tells it to only rename the filename component of the path. I don't know how general this behavior is, but on my installation (Ubuntu 20.04) it can be a file or a directory, as long as it is the leaf node of the path.
I recommend the -n (no action) flag first along with -v, so you can see what would get renamed and how.
Using the two flags together, it renames all the files in a directory first and then the directory itself, working backwards, which is exactly what I needed.
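If Perl rename is not available, the same depth-first idea can be sketched with find -depth and mv alone: only the last path component is rewritten, and because children come before their parents, no path goes stale. The scratch tree and names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical tree with spaces in both directory and file names.
demo=$(mktemp -d)
mkdir -p "$demo/my dir/sub dir"
touch "$demo/my dir/a file.txt" "$demo/my dir/sub dir/b file.txt"

# -depth yields "my dir/sub dir/b file.txt" before "my dir/sub dir" before
# "my dir", so a directory is renamed only after everything inside it.
# Only the basename (text after the last /) has its spaces turned into _.
find "$demo" -depth -name '* *' -exec sh -c '
  for p; do
    dir=${p%/*}
    base=${p##*/}
    mv -- "$p" "$dir/$(printf "%s" "$base" | tr " " "_")"
  done' sh {} +

find "$demo" | sort
```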
Classic solution:
find . -type f -name '*_dbg*' | while IFS= read -r f; do mv -- "$f" "$(dirname -- "$f")/$(basename -- "$f" | sed 's/_dbg//')"; done
How can I find files of a specific type, e.g. .doc and .pdf files, in nested directories?
The command I tried:
$ ls -R | grep .doc
But if there is a file named, e.g., alok.doc.txt, the command displays that too, which is obviously not what I want. What command should I use instead?
If you are more comfortable with ls and grep, you can do what you want with a regular expression in the grep command (the trailing '$' means that .doc must be at the end of the line, which excludes "file.doc.txt"):
ls -R |grep "\.doc$"
See the grep man page for more information about using regular expressions.
The output of ls is mainly intended to be read by humans. For advanced querying and automated processing, you should use the more powerful find command:
find /path -type f \( -iname "*.doc" -o -iname "*.pdf" \)
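A quick check of this find command against the case from the question, using a hypothetical scratch tree: alok.doc.txt and notes.docx are skipped, while Paper.PDF is found because -iname ignores case.

```shell
#!/usr/bin/env bash
# Hypothetical nested tree mixing real .doc/.pdf files with look-alikes.
demo=$(mktemp -d)
mkdir -p "$demo/a/b"
touch "$demo/report.doc" "$demo/a/b/Paper.PDF" "$demo/a/alok.doc.txt" "$demo/a/notes.docx"

# -iname matches the whole basename case-insensitively, so '*.doc' requires
# the name to end in .doc; the parentheses group the -o alternatives so
# -type f applies to both of them.
docs=$(find "$demo" -type f \( -iname '*.doc' -o -iname '*.pdf' \) | sort)
printf '%s\n' "$docs"
```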
Or, if you have bash 4.0 or later:
#!/bin/bash
shopt -s globstar
shopt -s nullglob
for file in **/*.{pdf,doc}
do
echo "$file"
done
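The globstar loop above is case-sensitive, so a file named Report.PDF would be missed. A variant that also sets nocaseglob (an addition, not part of the original answer) and collects the matches in an array, shown on a hypothetical scratch tree:

```shell
#!/usr/bin/env bash
# Hypothetical tree: one match at the top, one nested, one look-alike.
demo=$(mktemp -d)
mkdir -p "$demo/x/y"
touch "$demo/top.pdf" "$demo/x/y/Deep.DOC" "$demo/x/alok.doc.txt"

cd "$demo" || exit 1
# globstar: ** recurses; nullglob: a pattern with no match expands to nothing;
# nocaseglob: .PDF and .DOC match as well.
shopt -s globstar nullglob nocaseglob
found=()
for file in **/*.{pdf,doc}; do
  found+=("$file")
done
printf '%s\n' "${found[@]}"
```

Brace expansion runs first, so the loop sees the two globs **/*.pdf and **/*.doc in that order.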
find . | grep "\.doc$"
This will show the path as well.
Some other methods that can be used (note that these only look in the current directory):
echo *.{pdf,docx,jpeg}
stat -c %n * | grep 'pdf\|docx\|jpeg'
We had a similar question. We wanted a list - with paths - of all the config files in the etc directory. This worked:
find /etc -type f \( -iname "*.conf" \)
It gives a nice list of all the .conf files with their paths. The output looks like:
/etc/conf/server.conf
But we wanted to DO something with ALL those files, like grep them to find a word or setting in all of them. So we use:
find /etc -type f \( -iname "*.conf" \) -print0 | xargs -0 grep -Hi "ServerName"
to find, via grep, ALL the config files in /etc that contain a setting like "ServerName". The output looks like:
/etc/conf/server.conf: ServerName "default-118_11_170_172"
Hope you find it useful.
Similarly, if you prefer using the wildcard character * (not quite like the regex suggestions), you can just use ls with the -l flag, which lists one file per line (like grep), together with the -R flag you had. Then specify the files you want to search for with *.doc. Note that the shell expands *.doc only in the current directory, so .doc files inside subdirectories are not listed this way.
That is, either:
ls -l -R *.doc
or, if you want it to list the files on fewer lines:
ls -R *.doc
If you have files with extensions that don't match the file type, you could use the file utility.
find "$PWD" -type f -exec file -N {} \; | grep "PDF document" | awk -F: '{print $1}'
Instead of $PWD, you can use the directory you want to start the search in. file even prints out the PDF version.
I have to search for a particular text in files, and I am using grep for that, but it only searches the current folder. How can I use a single grep command to search the current folder as well as all of its subfolders?
POSIX grep does not support recursive searching - the GNU version of grep does.
find . -type f -exec grep 'pattern' {} \;
would be runnable on any POSIX compliant UNIX.
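A sketch of the POSIX route that also shows file names: grep prefixes each match with its file name whenever it is given more than one file, so adding /dev/null as an extra operand guarantees that, even when find passes a single file. The scratch files are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical tree with matches at two depths and one non-matching file.
demo=$(mktemp -d)
mkdir -p "$demo/sub/deeper"
echo 'pattern at top' > "$demo/top.txt"
echo 'pattern deep' > "$demo/sub/deeper/d.txt"
echo 'nothing here' > "$demo/sub/other.txt"

# Only POSIX features: -type f, -exec ... {} +, and grep naming files when
# it has several operands. /dev/null is always one extra (empty) operand.
hits=$(find "$demo" -type f -exec grep 'pattern' /dev/null {} +)
printf '%s\n' "$hits"
```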
man grep says
-R, -r, --recursive
Read all files under each directory, recursively; this is
equivalent to the -d recurse option.
Even more common is to use find with xargs, e.g.:
find <dir> -type f -name '<shellglob>' -print0 | xargs -0 grep 'pattern'
where -print0 and -0 make find and xargs, respectively, separate entries with the null character, which avoids problems with filenames that contain spaces.