Regarding grep in Solaris / Linux

I want to grep for a particular word in multiple files. The file names are stored in the variable TESTING.
TESTING=$(ls -tr *.txt)
echo $TESTING
test.txt ab.txt bc.txt
grep "word" "$TESTING"
grep: can't open test.txt
ab.txt
bc.txt
This gives me an error. Is there any way to do it other than a for loop?

Take the double quotes out from around $TESTING.
grep "word" $TESTING
The double quotes are making your whole file list expand to a single argument to grep. The right way to do this is:
find . -name \*.txt -print0 | xargs -0 grep "word"
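If you control how the list is built in the first place, a bash array (just a sketch, not from the original answers) keeps every filename as its own word, which a plain scalar variable cannot do for names containing spaces:
files=( *.txt )               # collect the .txt files into an array
grep "word" "${files[@]}"     # each element expands as a separate argument
Note that this drops the time ordering of ls -tr, which grep does not need for matching anyway.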

No quotes needed I guess.
grep "word" $TESTING
works for me (Ubuntu, bash).

Related

Count files in a directory with filename matching a string

The command:
ls /some/path/some/dir/ | grep some_mask_*.txt | wc -l
returns the correct number of files when run via ssh in bash. When I put this into a .sh script:
iFiles=`ls /some/path/some/dir/ | grep some_mask_*.txt | wc -l`
echo "iFiles: ${iFiles}"
it is always 0. What's wrong here?
Solution:
While working on it, I found out that my wildcard mask was the problem; using grep some_mask_ | grep '\.txt' instead of the single grep above solved the problem for now.
I marked the answer that describes exactly what I did wrong as the solution. I'm going to edit my script now. Thanks, everyone.
The problem here is that grep some_mask_*.txt is expanded by the shell and not by grep, so most likely there is a file in the directory where grep is executed which matches some_mask_*.txt, and that filename is then used by grep as the pattern.
If you want to ensure that the pattern is interpreted by grep, you need to enclose it in single quotes. In addition, you need to write the pattern as a regexp and not as a wildcard match (which is what bash uses for matching). Putting this together, your command-line version should be:
ls /some/path/some/dir/ | grep 'some_mask_.*\.txt' | wc -l
and the script:
iFiles=`ls /some/path/some/dir/ | grep 'some_mask_.*\.txt' | wc -l`
echo "iFiles: ${iFiles}"
Note that . needs to be prefixed with a backslash since it has special significance as a regexp that matches a single character.
I would also suggest that you append $ to the regexp in order to anchor it to the end (thus ensuring that the regexp only matches filenames that end with ".txt"):
ls /some/path/some/dir/ | grep 'some_mask_.*\.txt$' | wc -l
Parsing ls is not a good thing. If you want to find files, use find:
find /some/path/some/dir/ -maxdepth 1 -name "some_mask_*.txt" -print0
This prints the files matching the condition within that directory, without descending into subdirectories. Using -print0 prevents problems when a file name contains unusual characters:
-print0
True; print the full file name on the standard output, followed by a null character (instead of the newline character that -print uses). This allows file names that contain newlines or other types of white space to be correctly interpreted by programs that process the find output. This option corresponds to the -0 option of xargs.
Then count the results; note that wc -l counts newline characters, so it pairs with a plain -print, while -print0 output has to be counted by its NUL terminators instead.
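For example, a small sketch of both counting variants (GNU find and tr assumed):
find /some/path/some/dir/ -maxdepth 1 -name 'some_mask_*.txt' | wc -l                        # newline-separated names
find /some/path/some/dir/ -maxdepth 1 -name 'some_mask_*.txt' -print0 | tr -dc '\0' | wc -c  # count the NUL terminators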
By the way, note that
ls /some/path/some/dir/ | grep some_mask_*.txt
can be reduced to a simple
ls /some/path/some/dir/some_mask_*.txt
A simple solution (for bash with GNU find, which defaults to the current directory when no path is given) is:
find -name "*pattern*" | wc -l
* matches anything (as a prefix: anything before the pattern; as a suffix: anything after it)
wc -l gives the count
find -name finds files whose names match the quoted pattern
I suggest using find as shown below. The reason is that filenames may contain newlines, which would break a script that uses wc -l. I print just a dot per filename and count the dots with wc -c:
find /some/path/some/dir/ -maxdepth 1 -name 'some_mask_*.txt' -printf '.' | wc -c
or if you want to write the results to variable:
ifiles=$(find /some/path/some/dir/ -maxdepth 1 -name 'some_mask_*.txt' -printf '.' | wc -c)
Try this,
iFiles=$(ls /some/path/some/dir/ | grep some_mask_*.txt | wc -l)
echo "iFiles: ${iFiles}"
I don't think there would be a shell version problem this way.
Try using an escape character in your command, like below:
ls /some/path/some/dir/ | grep some_mask_\*.txt | wc -l
Your problem is due to shell expansion. You probably tested the command line in the original directory, but if you try it from another directory then it will not work anymore.
When you type:
grep *.txt
then the shell replaces *.txt with all the file names that match the pattern and then executes the command (something like grep a.txt dummy.txt). But you want the pattern to be interpreted by grep, not expanded by the shell, so:
ls /tmp | grep '.*\.cpp'
will do it. Here the pattern is written in grep's own syntax (each command has its own syntax) and is not expanded by the shell because it is protected by the surrounding single quotes.
Modify your command like this:
a=`ls /tmp | grep '.*\.cpp'`
This is quite similar to other answers, but with a bit more robustness
iFiles=$( find /some/path/ -name "some_mask_*.txt" -type f 2> /dev/null | wc -l )
echo "Number of files: $iFiles"
This limits the find to files and also pipes stderr to null, so if the find command doesn't work or has permission issues you don't get a bogus result.
I was writing a shell script to count the files of the same format in a directory. For that I used the commands below:
LOCATION=/home/students/run_date/FILENAME                  # store the location in a variable
DIRECTORYCOUNT=$(find $LOCATION -type d -print | wc -l)    # count directories with find
FILECOUNT=$(find $LOCATION -type f -print | wc -l)         # count files with find
These commands worked well for me.

How can I use grep to get all the lines that contain string1 and string2 separated by a space?

Line1: .................
Line2: #hello1 #hello2 #hello3
Line3: .................
Line4: .................
Line5: #hello1 #hello4 #hello3
Line6: #hello1 #hello2 #hello3
Line7: .................
I have files that look like this in one of my project directories. I want to count all the lines that contain both #hello1 and #hello2. For this file alone the result would be 2. However, I want to do this recursively.
The canonical way to "do something recursively" is to use the find command. If you want to find lines that have two words on them, a simple regex will do:
grep -lr '#hello1.*#hello2' .
The option -l instructs grep to show us only filenames rather than file content, and the option -r tells grep to traverse the filesystem recursively. The start of the search is the path at the end of the line. Once you have the list of files, you can parse that list using commands run by xargs.
For example, this will count all the lines in files matching the pattern you specified.
grep -lr '#hello1.*#hello2' . | xargs -n 1 wc -l
This uses xargs to run the wc command on each of the files listed by grep. You could probably also run this without the -n 1, unless you're dealing with many many thousands of files that would exceed your maximum command line length.
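As a side note, if any of the matched filenames contain spaces, a null-separated variant of the same pipeline (a sketch assuming GNU grep and xargs) is safer:
grep -rlZ '#hello1.*#hello2' . | xargs -0 wc -l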
Or, if I'm interpreting your question correctly, the following will count just the patterns in those files.
grep -lr '#hello1.*#hello2' . | xargs -n 1 grep -Hc '#hello1.*#hello2'
This runs a similar grep to the one used to generate your recursive list of files, and presents the output with filename (-H) and count (-c).
But if you want complex rules like finding two patterns possibly on different lines in the file, then grep probably is not the optimal tool, unless you use multiple greps launched by find:
find /path/to/base -type f \
-exec grep -q '#hello1' {} \; \
-exec grep -q '#hello2' {} \; \
-print
(Lines split for easier reading.)
This is somewhat costly, as find needs to launch up to two children for each file. So another approach would be to use awk instead:
find /path/to/base -type f \
-exec awk '/#hello1/{a=1} /#hello2/{b=1} a&&b{r=1; exit} END{exit 1-r}' {} \; \
-print
Alternately, if your shell is bash version 4 or above, you can avoid using find and use the bash option globstar:
$ shopt -s globstar
$ awk 'FNR==1{a=b=0} /#hello1/{a=1} /#hello2/{b=1} a&&b{print FILENAME; nextfile}' **/*
Note: none of this is tested.
If you are not interested in the number of files either, then just something along the lines of:
find $BASEDIRECTORY -type f -print0 | xargs -0 grep -h PATTERN | wc -l
If you want to count lines containing #hello1 and #hello2 separated by a space in a specific file, you can do:
$ grep -c '#hello1 #hello2' file
If you want to count in more than one file:
$ grep -c '#hello1 #hello2' file1 file2 ...
And if you want to get the grand total (add -h so grep does not prefix each count with the file name, which would break the arithmetic):
$ grep -hc '#hello1 #hello2' file1 file2 ... | paste -s -d+ - | bc
Of course you can let your shell expand the file names. So, for example:
$ grep -hc '#hello1 #hello2' *.txt | paste -s -d+ - | bc
or so...
find . -type f | xargs -n 1 awk '/#hello1/ && /#hello2/ {c++} END {print FILENAME, c+0}'

How to search and replace using grep

I need to recursively search for a specified string within all files and subdirectories within a directory and replace this string with another string.
I know that the command to find it might look like this:
grep 'string_to_find' -r ./*
But how can I replace every instance of string_to_find with another string?
Another option is to use find and then pass it through sed.
find /path/to/files -type f -exec sed -i 's/oldstring/new string/g' {} \;
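A small efficiency variation on the above (not from the original answer): terminating -exec with + instead of \; lets find pass many files to each sed invocation; GNU sed is assumed for -i without a backup suffix:
find /path/to/files -type f -exec sed -i 's/oldstring/new string/g' {} +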
I got the answer.
grep -rl matchstring somedir/ | xargs sed -i 's/string1/string2/g'
You could even do it like this:
Example
grep -rl 'windows' ./ | xargs sed -i 's/windows/linux/g'
This will search for the string 'windows' in all files relative to the current directory and replace 'windows' with 'linux' for each occurrence of the string in each file.
This works best for me on OS X:
grep -r -l 'searchtext' . | sort | uniq | xargs perl -e "s/matchtext/replacetext/" -pi
Source: http://www.praj.com.au/post/23691181208/grep-replace-text-string-in-files
Usually not with grep, but rather with sed -i 's/string_to_find/another_string/g' or perl -i.bak -pe 's/string_to_find/another_string/g'.
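For instance, a hedged sketch combining that perl invocation with find (the .bak suffix keeps a backup copy of every file that gets rewritten):
find . -type f -exec perl -i.bak -pe 's/string_to_find/another_string/g' {} +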
Other solutions mix regex syntaxes. To use perl/PCRE patterns for both search and replace, and process only matching files, this works quite well:
grep -rlIZPi 'match1' | xargs -0r perl -pi -e 's/match2/replace/gi;'
match1 and match2 are usually identical but match2 can contain more advanced features that are only relevant to the substitution, e.g. capturing groups.
Translation: grep recursively and list matching filenames, each separated by null to protect any special characters; pipe any filenames to xargs which is expecting a null-separated list; if any filenames are received, pass them to perl to perform the actual substitutions.
For case-sensitive matching, drop the i flag from grep and the i pattern modifier from the s/// expression, but not the i flag from perl itself. To include binary files, remove the I flag from grep.
Be very careful when using find and sed in a git repo! If you don't exclude the binary files you can end up with this error:
error: bad index file sha1 signature
fatal: index file corrupt
To solve this error you need to revert the sed by replacing your new_string with your old_string. This will revert your replaced strings, so you will be back to the beginning of the problem.
The correct way to search for a string and replace it is to skip find and use grep instead in order to ignore the binary files:
sed -ri -e "s/old_string/new_string/g" $(grep -Elr --binary-files=without-match "old_string" "/files_dir")
Credit to @hobs
Here is what I would do:
find /path/to/dir -type f -iname "*filename*" -print0 | xargs -0 sed -i '/searchstring/s/old/new/g'
This will look for all files whose names contain filename under /path/to/dir, then, for every file found, search for lines containing searchstring and replace old with new on those lines.
If you don't want to restrict this to files with a particular filename string in their names, simply do:
find /path/to/dir -type f -print0 | xargs -0 sed -i '/searchstring/s/old/new/g'
This will do the same thing as above, but for all files found under /path/to/dir.
Modern rust tools can be used to do this job.
For example, to replace "oldstring" and "oldString" with "newstring" and "newString" respectively in all (non-ignored) files, you can:
Use fd and sd
fd -tf -x sd 'old([Ss]tring)' 'new$1' {}
Use ned
ned -R -p 'old([Ss]tring)' -r 'new$1' .
Use ruplacer
ruplacer --go 'old([Ss]tring)' 'new$1' .
Ignored files
To include files ignored by .gitignore, as well as hidden files, you have to say so explicitly:
use -IH for fd,
use --ignored --hidden for ruplacer.
Another option would be to just use perl with globstar.
Enabling shopt -s globstar in your .bashrc (or wherever) allows the ** glob pattern to match all sub-directories and files recursively.
Thus using perl -pXe 's/SEARCH/REPLACE/g' -i ** will recursively
replace SEARCH with REPLACE.
The -X flag tells perl to "disable all warnings" - which means that
it won't complain about directories.
The globstar also allows you to do things like sed -i 's/SEARCH/REPLACE/g' **/*.ext if you wanted to replace SEARCH with REPLACE in all child files with the extension .ext.

Shell: find files in a list under a directory

I have a list containing about 1000 file names to search under a directory and its subdirectories. There are hundreds of subdirs with more than 1,000,000 files. The following command will run find for 1000 times:
cat filelist.txt | while read f; do find /dir -name $f; done
Is there a much faster way to do it?
If filelist.txt has a single filename per line:
find /dir | grep -f <(sed 's#^#/#; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt)
(The -f option means that grep searches for all the patterns in the given file.)
Explanation of <(sed 's#^#/#; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt):
The <( ... ) is called a process substitution, and is a little similar to $( ... ). The situation is equivalent to the following (but using the process substitution is neater and possibly a little faster):
sed 's#^#/#; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt > processed_filelist.txt
find /dir | grep -f processed_filelist.txt
The call to sed runs the commands s#^#/#, s/$/$/ and s/\([\.[\*]\|\]\)/\\\1/g on each line of filelist.txt and prints them out. These commands convert the filenames into a format that will work better with grep.
s#^#/# means put a / before each filename. (The ^ means "start of line" in a regex.)
s/$/$/ means put a $ at the end of each filename. (The first $ means "end of line", the second is just a literal $ which is then interpreted by grep to mean "end of line").
The combination of these two rules means that grep will only look for matches like .../<filename>, so that a.txt doesn't match ./a.txt.backup or ./abba.txt.
s/\([\.[\*]\|\]\)/\\\1/g puts a \ before each occurrence of . [ ] or *. Grep uses regexes and those characters are considered special, but we want them to be plain so we need to escape them (if we didn't escape them, then a file name like a.txt would match files like abtxt).
As an example:
$ cat filelist.txt
file1.txt
file2.txt
blah[2012].txt
blah[2011].txt
lastfile
$ sed 's#^#/#; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt
/file1\.txt$
/file2\.txt$
/blah\[2012\]\.txt$
/blah\[2011\]\.txt$
/lastfile$
Grep then uses each line of that output as a pattern when it is searching the output of find.
If filelist.txt is a plain list:
$ find /dir | grep -F -f filelist.txt
If filelist.txt is a pattern list:
$ find /dir | grep -f filelist.txt
Using xargs(1) instead of the while loop can be a bit faster than doing it in bash.
Like this
xargs -a filelist.txt -I filename find /dir -name filename
Be careful if the file names in filelist.txt contain whitespace; read the second paragraph in the DESCRIPTION section of the xargs(1) manpage about this problem.
An improvement based on some assumptions: for example, if a.txt is in filelist.txt and you can be sure there is only one a.txt in /dir, then you can tell find(1) to exit early once it finds that instance.
xargs -a filelist.txt -I filename find /dir -name filename -print -quit
Another solution: you can pre-process filelist.txt into a find(1) argument list like the one below (see the sketch after the example), which reduces everything to a single find(1) invocation:
find /dir -name 'a.txt' -or -name 'b.txt' -or -name 'c.txt'
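A sketch of how such an argument list could be built from filelist.txt in bash (assuming the names contain no newlines):
args=()
while IFS= read -r name; do
    args+=( -o -name "$name" )        # add "-o -name <file>" for every list entry
done < filelist.txt
find /dir \( "${args[@]:1}" \)        # "${args[@]:1}" drops the leading -o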
I'm not entirely sure of the question here, but I came to this page after trying to find a way to discover which 4 of 13000 files had failed to copy.
None of the answers did it for me, so I did this:
cp file-list file-list2
find dir/ >> file-list2
sort file-list2 | uniq -u
Which resulted with a list of the 4 files I needed.
The idea is to combine the two file lists to determine the unique entries.
sort is used to make duplicate entries adjacent to each other, which is the only way uniq will filter them out.
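The same idea can be expressed with comm (a sketch; it assumes both lists are sorted the same way and use identical path formats), which prints only the names from the list that find never produced:
comm -23 <(sort file-list) <(find dir/ | sort)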

open 100 files in vim

I need to grep tons (10k+) of files for specific words.
That returns a list of files that I also need to grep for another word.
I found that grep can do this, so I use:
grep -rl word1 *
which returns the list of files I want to check.
Now, from these files (100+), I need to grep for another word, so I have to do another grep:
vim `grep word2 `grep -rl word1 *``
but that hangs and does not do anything.
Why?
Because you have nested backticks, you need to use $() instead:
vi `grep -l 'word2' $(grep -rl 'word1' *)`
Or you can use nested $(...) (like goblar mentioned)
vi $(grep -l 'word2' $(grep -rl 'word1' *))
grep -rl 'word1' . | xargs grep -l 'word2' | xargs vi
is another option.
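If some of those filenames contain spaces, a null-separated version of the same pipeline (assuming GNU grep and xargs) avoids word-splitting problems:
grep -rlZ 'word1' . | xargs -0 grep -lZ 'word2' | xargs -0 vim
As with the xargs vi pipeline above, vim may warn that its input is not from a terminal.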
