Linux list files recursively ignoring a pattern - linux

I have exactly the same problem as this post.
Only instead of finding all the .txt files, I want a list of all files that are not .txt files.
Something like
$ ls -LR | grep -v .java
Which definitely does not do what I want.

Use find, as suggested in that post, and negate the -name test with ! to invert the match:
find . -type f ! -name "*.txt"
# -type f          : regular files only
# ! -name "*.txt"  : file names not ending with .txt
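A minimal sketch of the negated test in action, assuming a throwaway directory created with mktemp -d; the file names are made up for illustration and are not from the original post:

```shell
#!/bin/sh
# Build a scratch tree; every name below is hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/sub"
touch "$demo/a.txt" "$demo/b.java" "$demo/sub/c.txt" "$demo/sub/notes.md"

# List regular files whose names do NOT end in .txt (sorted for stable output).
non_txt=$(cd "$demo" && find . -type f ! -name '*.txt' | sort)
printf '%s\n' "$non_txt"   # prints ./b.java and ./sub/notes.md

rm -rf "$demo"
```

Several exclusions can be chained the same way, for example ! -name '*.txt' ! -name '*.md'.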

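Does this work for you?
ls -R | grep -E -v '\.txt$'
I tested it on my end and it worked. -R lists recursively, -E enables extended regular expressions, and the escaped dot followed by $ matches only names ending in .txt.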

Related

Grep files in subdirectories and write out files for each directory

I am working on a bioinformatics workflow in which the tool in question, 'salmon' creates multiple directories having a 'quant.sf' file. I want to find all 'lnc' entries within these files and save them as 'lnc.sf' for all directories.
I was previously running
cat quant.sf | grep 'lnc' > lnc.sf
in all directories individually that seemed to solve my problem. Now I want to write a script that goes into each directory and generates a lnc.sf file.
I have tried doing
find . -name "quant.sf" | while read A
do
cat $A | grep 'lnc' > lnc.sf
done
But this just creates a concatenated lnc.sf file in the current directory. Any help is highly appreciated.
Thank You!
If all your quant.sf files are at the same hierarchy level, the following should work, assuming a folder structure like month/day/quant.sf:
grep -h 'lnc' */*/quant.sf > lnc.sf
Otherwise, find the files. If you pipe find into read instead of using -exec or xargs, use IFS= and read -r, quote the variable so that paths containing whitespace survive expansion, get rid of the redundant cat process, and write each output file to the directory of its input:
find . -name 'quant.sf' | while IFS= read -r A
do
grep 'lnc' "$A" > "${A%/*}/lnc.sf"
done
If you have GNU find + xargs, use -print0 combined with -0:
find . -name 'quant.sf' -print0 | xargs -0 -n1 sh -c 'grep "lnc" "$1" > "${1%/*}/lnc.sf"' -
Or use find's -exec, which avoids problems with unusual file names:
find . -name 'quant.sf' -exec sh -c 'grep "lnc" "$1" > "${1%/*}/lnc.sf"' - {} ';'
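To see the per-directory behaviour end to end, here is a small sketch under assumed conditions: the directory names and file contents are invented stand-ins for salmon output, and mktemp -d is assumed to be available.

```shell
#!/bin/sh
# Scratch layout standing in for salmon output; sample names are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/sample 1" "$demo/sample2"
printf 'lnc_A\t10\nmRNA_B\t5\n' > "$demo/sample 1/quant.sf"
printf 'mRNA_C\t7\nlnc_D\t3\n' > "$demo/sample2/quant.sf"

# For each quant.sf, write its matching lines next to it as lnc.sf.
# "$1" is the path passed in by find; ${1%/*} strips the file name,
# leaving the directory that contains it.
find "$demo" -name 'quant.sf' -exec sh -c '
  grep "lnc" "$1" > "${1%/*}/lnc.sf"
' sh {} \;

first=$(cat "$demo/sample 1/lnc.sf")
second=$(cat "$demo/sample2/lnc.sf")
printf '%s\n%s\n' "$first" "$second"

rm -rf "$demo"
```

Each directory ends up with its own lnc.sf, including the one whose name contains a space.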

Grep into given filenames

I have a directory that contains many subdirectories that contain many files.
I list the contents of the current directory using ls *. I see that there are certain files that are relevant, in terms of their names. Therefore, the relevant files can be obtained as such ls * | grep "abc\|def\|ghi".
Now I want to search within the given filenames. So I try something like:
ls * | grep "abc\|def\|ghi" | zgrep -i "ERROR" *, however, this is not looking into the file contents, rather the names. Is there an easy way to do this with pipes?
To use grep to search the contents of files within a directory, try using the find command, using xargs to couple it with the grep command, like so:
find . -type f | xargs grep '...'
You can do it like this:
find -E . -type f -regex ".*/.*(abc|def).*" -exec grep -H ERROR {} \+
The -E option (BSD/macOS find; GNU find uses -regextype posix-extended instead) enables extended regexes, so you can use the pipe (|) to express alternation. The + at the end passes as many files as possible to each invocation of grep, rather than starting a whole new process for every single file.
You should use xargs to grep the contents of each file:
ls * | grep "abc\|def\|ghi" | xargs zgrep -i "ERROR"
I know you asked for a solution with pipes, but they are not necessary for this task. grep has many parameters, and can solve this problem alone:
grep . -rh --include "*abc*" --include "*def*" -e "ERROR"
Parameters:
--include : Search only files whose base name matches the given wildcard pattern (not regex!)
-h : Suppress the prefixing of file names on output.
-r : recursive
-e : regex filter pattern
grep -i "ERROR" `ls * | grep "abc\|def\|ghi"`
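The approaches above share one idea: select files by name first, then search only their contents. A hedged sketch of that idea using find's own name tests, with invented file names and a scratch directory from mktemp -d:

```shell
#!/bin/sh
# Hypothetical log files; only names containing abc or def should be searched.
demo=$(mktemp -d)
mkdir -p "$demo/logs"
printf 'ok\nERROR disk full\n' > "$demo/logs/abc_1.log"
printf 'all fine\n' > "$demo/logs/def_2.log"
printf 'ERROR ignored\n' > "$demo/logs/xyz_3.log"

# Name filter first (abc OR def), then a case-insensitive content search.
# grep -l prints only the names of files that contain a match.
hits=$(cd "$demo" && find . -type f \( -name '*abc*' -o -name '*def*' \) \
         -exec grep -il 'error' {} + | sort)
printf '%s\n' "$hits"   # prints ./logs/abc_1.log

rm -rf "$demo"
```

xyz_3.log contains ERROR but is never opened, because its name fails the filter. For gzip-compressed files, zgrep (as used in the question) could stand in for grep.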

Select files by extension using grep

I need to count all the .txt files in the current folder.
I tried ls | grep .txt but if my folder content is: a.txt btxt c.c it will select a.txt and btxt and I only want files that end with .txt. I tried various combinations of regexp but with no result.
find may be better than ls in this case, since it is designed for handling file names:
find . -maxdepth 1 -name '*.txt' | wc -l
But if you are very cautious about possibly strange file names:
find . -maxdepth 1 -name '*.txt' -exec echo 1 \; | wc -l
For grep, the character '.' means "any character", so you'll need to escape the dot, and anchor the pattern with $ so that it matches only at the end of the name:
ls | grep -e "\.txt$"
Edit: in fact, the -e option is not even necessary. This will do the trick:
ls | grep "\.txt$"
If all you need is the number of files with the extension '.txt' in the current directory only, then this will also help.
ls -l *.txt | wc -l
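Another way to count, sketched here as an assumption rather than taken from the answers above, is to let the shell glob do the matching so that no output has to be parsed at all. The function name count_txt is hypothetical:

```shell
#!/bin/sh
# count_txt DIR: count entries in DIR whose names end in .txt (non-recursive).
count_txt() {
  (
    cd "$1" || exit 1
    n=0
    for f in *.txt; do
      # If nothing matches, the pattern stays literal; skip that case.
      [ -e "$f" ] || continue
      n=$((n + 1))
    done
    printf '%s\n' "$n"
  )
}

# Same example as in the question, plus a name containing a space.
demo=$(mktemp -d)
touch "$demo/a.txt" "$demo/btxt" "$demo/c.c" "$demo/two words.txt"
total=$(count_txt "$demo")
printf '%s\n' "$total"   # prints 2
rm -rf "$demo"
```

btxt is not counted because the glob requires a literal dot before txt.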

Unix Command to List files containing string but *NOT* containing another string

How do I recursively view a list of files that has one string and specifically doesn't have another string? Also, I mean to evaluate the text of the files, not the filenames.
Conclusion:
As per comments, I ended up using:
find . -name "*.html" -exec grep -lR 'base\-maps' {} \; | xargs grep -L 'base\-maps\-bot'
This returned files with "base-maps" and not "base-maps-bot". Thank you!!
Try this:
grep -rl <string-to-match> | xargs grep -L <string-not-to-match>
Explanation: grep -lr makes grep recursively (r) output a list (l) of all files that contain <string-to-match>. xargs passes these file names as arguments to grep -L, which outputs a file name only when that file does not contain <string-not-to-match>.
The use of xargs in the answers above is not necessary; you can achieve the same thing like this:
find . -type f -exec grep -q <string-to-match> {} \; -not -exec grep -q <string-not-to-match> {} \; -print
grep -q means run quietly but return an exit code indicating whether a match was found; find can then use that exit code to determine whether to keep evaluating the rest of its expression. If -exec grep -q <string-to-match> {} \; returns 0 (match found), find goes on to evaluate -not -exec grep -q <string-not-to-match> {} \;. If that second grep returns non-zero (no match), the -not makes the test true, and find goes on to execute -print, which prints the name of the file.
As another answer has noted, using find in this way has major advantages over grep -Rl where you only want to search files of a certain type. If, on the other hand, you really want to search all files, grep -Rl is probably quicker, as it uses one grep process to perform the first filter for all files, instead of a separate grep process for each file.
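The exit-code chaining can be checked on a tiny scratch tree. This is a sketch with invented file names; it uses ! in place of -not, which is the same operator in its portable spelling:

```shell
#!/bin/sh
# Three hypothetical files: match only, match plus excluded string, neither.
demo=$(mktemp -d)
printf 'uses base-maps\n' > "$demo/one.html"
printf 'uses base-maps and base-maps-bot\n' > "$demo/two.html"
printf 'nothing relevant\n' > "$demo/three.html"

# A file is printed only if the first grep succeeds (exit 0)
# and the second grep fails (non-zero, inverted by !).
wanted=$(cd "$demo" && find . -type f -name '*.html' \
           -exec grep -q 'base-maps' {} \; \
           ! -exec grep -q 'base-maps-bot' {} \; \
           -print | sort)
printf '%s\n' "$wanted"   # prints ./one.html

rm -rf "$demo"
```

two.html passes the first test but is dropped by the second; three.html never reaches the second test.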
These answers seem off, as they match BOTH strings. The following command should work better:
grep -l <string-to-match> * | xargs grep -c <string-not-to-match> | grep ':0$'
Here is a more generic construction:
find . -name <nameFilter> -print0 | xargs -0 grep -Z -l <patternYes> | xargs -0 grep -L <patternNo>
This command outputs files whose name matches <nameFilter> (adjust find predicates as you need) which contain <patternYes>, but do not contain <patternNo>.
The enhancements are:
It works with filenames containing whitespace.
It lets you filter files by name.
If you don't need to filter by name (one often wants to consider all the files in current directory), you can strip find and add -R to the first grep:
grep -R -Z -l <patternYes> | xargs -0 grep -L <patternNo>
find . -maxdepth 1 -name "*.py" -exec grep -L "string-not-to-match" {} \;
This command lists all ".py" files in the current directory that don't contain "string-not-to-match".
To list files with lines that match string A but do not also contain string B or string C on the same line, I use the following; the quotes allow a search string to contain a space:
grep -r <string A> | grep -v -e <string B> -e "<string C>" | awk -F ':' '{print $1}'
Explanation: grep -r searches recursively and prints every matching line in the format
filename: line
grep -v then excludes those lines that also contain either string B or string C (each given with -e). awk prints only the first field (the file name), using the colon as the field separator (-F).
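Because this pipeline filters lines rather than whole files, a file is still listed when the excluded string appears on a different line. A small sketch with made-up contents illustrates the difference; sort -u is added here only to remove duplicate file names:

```shell
#!/bin/sh
# Hypothetical files: menu.txt has one "clean" apple line and one with tart.
demo=$(mktemp -d)
printf 'apple pie\napple tart\n' > "$demo/menu.txt"
printf 'apple tart\n' > "$demo/other.txt"

# Keep lines containing apple, drop lines that also contain tart or "green tea",
# then reduce the surviving filename:line records to unique file names.
files=$(cd "$demo" && grep -r 'apple' . \
          | grep -v -e 'tart' -e 'green tea' \
          | awk -F: '{print $1}' | sort -u)
printf '%s\n' "$files"   # prints ./menu.txt

rm -rf "$demo"
```

menu.txt is reported even though it contains tart elsewhere, so this is not equivalent to the file-level grep -l / grep -L combinations above. File names that contain a colon would also confuse the awk step.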

How to list specific type of files in recursive directories in shell?

How can we find files of specific types, e.g. doc and pdf files, in nested directories?
Command I tried:
$ ls -R | grep .doc
but if there is a file name like alok.doc.txt the command will display that too which is obviously not what I want. What command should I use instead?
If you are more comfortable with "ls" and "grep", you can do what you want using a regular expression in the grep command (the trailing '$' indicates that .doc must be at the end of the line, which excludes "file.doc.txt"):
ls -R |grep "\.doc$"
More information about using grep with regular expressions is in the grep man page.
The output of ls is mainly intended for reading by humans. For advanced querying or automated processing, you should use the more powerful find command:
find /path -type f \( -iname "*.doc" -o -iname "*.pdf" \)
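A quick sketch of that command on a scratch tree, with invented file names, shows that alok.doc.txt from the question is excluded while a differently cased extension is still found. It assumes a find that supports -iname, as GNU and BSD find do:

```shell
#!/bin/sh
# Hypothetical nested tree with a decoy name (alok.doc.txt).
demo=$(mktemp -d)
mkdir -p "$demo/nested/deep"
touch "$demo/report.doc" "$demo/alok.doc.txt" \
      "$demo/nested/paper.PDF" "$demo/nested/deep/notes.txt"

# Match names ending in .doc or .pdf, ignoring case; the parentheses group
# the two alternatives so that -type f applies to both.
docs=$(cd "$demo" && find . -type f \( -iname '*.doc' -o -iname '*.pdf' \) | sort)
printf '%s\n' "$docs"   # prints ./nested/paper.PDF and ./report.doc

rm -rf "$demo"
```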
Or, if you have bash 4.0 or later:
#!/bin/bash
shopt -s globstar
shopt -s nullglob
for file in **/*.{pdf,doc}
do
echo "$file"
done
find . | grep "\.doc$"
This will show the path as well.
Some of the other methods that can be used:
echo *.{pdf,docx,jpeg}
stat -c %n * | grep 'pdf\|docx\|jpeg'
We had a similar question. We wanted a list - with paths - of all the config files in the etc directory. This worked:
find /etc -type f \( -iname "*.conf" \)
It gives a nice list of all the .conf file with their path. Output looks like:
/etc/conf/server.conf
But, we wanted to DO something with ALL those files, like grep those files to find a word, or setting, in all the files. So we use
find /etc -type f \( -iname "*.conf" \) -print0 | xargs -0 grep -Hi "ServerName"
to find via grep ALL the config files in /etc that contain a setting like "ServerName" Output looks like:
/etc/conf/server.conf: ServerName "default-118_11_170_172"
Hope you find it useful.
Sid
Similarly, if you prefer the wildcard character * (not quite like the regex suggestions), you can use ls with the -1 flag to list one file per line (like grep) and the -R flag, as you had. Then specify the files you want with *.doc. Note that the shell expands *.doc in the current directory only, so this does not find .doc files in subdirectories.
Either
ls -1 -R *.doc
or, if you want the files listed on fewer lines:
ls -R *.doc
If you have files with extensions that don't match the file type, you could use the file utility.
find $PWD -type f -exec file -N \{\} \; | grep "PDF document" | awk -F: '{print $1}'
Instead of $PWD you can use the directory you want to start the search in. file even prints out the PDF version.
