Is it possible to search within a directory to scan all files for a particular string, then return the file(s) if the string is found?
For example, I am trying to find files that contain "120854". Take the example below, using a directory called /users/TCP/ that contains two files, File1 and File2.
File1
-----
Product1:432153
Product2:8614
Product3:975
File2
-----
Product76:87
Product324:684
Product965:120854
The expected outcome would return /users/TCP/File2, as "120854" is found on line 3 of that file. The directory I'm actually using has thousands of files, so I'm wondering whether this is possible. I can't find anything online myself.
Thanks!
grep -Ril "120854" /users/TCP/
-- R stands for recursive.
-- i stands for ignore case (optional in your case).
-- l stands for "show the file name, not the result itself".
-- /users/TCP/ is the directory you are searching in.
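As a rough illustration, the sketch below recreates the two example files in a temporary directory (standing in for /users/TCP/; the variable names are made up for the demo) and runs the same search. It assumes a grep that supports -r and -l, as GNU and BSD grep do.

```shell
#!/bin/sh
# Throwaway directory that plays the role of /users/TCP/.
demo_dir=$(mktemp -d)
printf 'Product1:432153\nProduct2:8614\nProduct3:975\n' > "$demo_dir/File1"
printf 'Product76:87\nProduct324:684\nProduct965:120854\n' > "$demo_dir/File2"

# -r: recurse into the directory, -l: print only the names of matching files.
matches=$(grep -rl "120854" "$demo_dir")
echo "$matches"          # path of File2 only

# Swapping -l for -n shows the line number of each match instead.
with_line=$(grep -rn "120854" "$demo_dir")
echo "$with_line"

rm -rf "$demo_dir"
```

The -i flag is left out here because the search string is purely numeric, so case does not matter.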
In my directory there are the files:
file1.txt fix.log fixRRRRRR.log fixXXXX.log output.txt
To understand the find command I tried a lot of things; among others, I wanted to use two wildcards. The goal was to find files that start with an f and have an extension starting with an l.
$ find . f*.l*
./file1.txt
./fix.log
./fixRRRRRR.log
./output.txt
./fixXXXX.log
fix.log
fixRRRRRR.log
fixXXXX.log
I read in a forum answer that I should use quotation marks with find, so I tried find . "f*.l*", with the result:
./file1.txt
./fix.log
./fixRRRRRR.log
./output.txt
./fixXXXX.log
followed by the error find: ‘f*.l*’: No such file or directory
What am I doing wrong, where is my error in reasoning?
Thanks for an answer.
find doesn't work like that. In general find's call form looks like:
find [entry1] [entry2] ... [expressions ...]
where each entry is a starting point from which find begins its search for files.
In your case, you haven't actually supplied any expressions.
In the first command (without quotes), the shell expands the wildcards to a list of matching files (in the current directory), then passes the list to find as arguments. So find . f*.l* is essentially equivalent to find . fix.log fixRRRRRR.log fixXXXX.log. As a result, find treats all of those arguments as directories/files to search (not patterns to search for). It lists all files under . (everything), then all files under fix.log (it's not a directory, so that's just the file itself), then all files under fixRRRRRR.log and finally all files under fixXXXX.log.
In the second command (with quotes), the shell passes the pattern through unexpanded, so find lists everything beneath the current directory (.) and then tries to do the same for a file literally named "f*.l*", which does not exist.
What you are most likely looking for is the -name expression, which is used like this:
find . -name "f*.l*"
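To make the difference concrete, here is a small sketch that recreates the five files from the question in a temporary directory (the variable names are invented for the demo). It uses echo to show what the shell hands to find when the pattern is unquoted, then runs the -name form.

```shell
#!/bin/sh
export LC_ALL=C   # fixed collation so glob and sort order are predictable

demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch file1.txt fix.log fixRRRRRR.log fixXXXX.log output.txt

# Unquoted: the shell expands the pattern before find ever runs.
# echo reveals the argument list find would receive.
expanded=$(echo find . f*.l*)
echo "$expanded"   # find . fix.log fixRRRRRR.log fixXXXX.log

# Quoted and given to -name: find does the matching itself.
found=$(find . -name "f*.l*" | sort)
echo "$found"

cd - >/dev/null
rm -rf "$demo_dir"
```

With -name, only the three .log files are printed, each once, which is the result the question was after.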
I want to pick files with a specific name format from the list of files in a directory. Please see the example below.
I have the following list of files (6 files).
Set-1
1) MAG_L_NT_AA_SUM_2017_01_20.dat
2) MAG_L_NT_AA_2017_01_20.dat
Set-2
1) MAG_L_NT_BB_SUM_2017_01_20.dat
2) MAG_L_NT_BB_2017_01_20.dat
Set-3
1) MAG_L_NT_CC_SUM_2017_01_20.dat
2) MAG_L_NT_CC_2017_01_20.dat
From the above three sets I need only 3 files.
1) MAG_L_NT_AA_2017_01_20.dat
2) MAG_L_NT_BB_2017_01_20.dat
3) MAG_L_NT_CC_2017_01_20.dat
Note: the solution can span multiple commands, because I am writing a script for the above requirement. Thanks
Probably the easiest and least complex solution to your problem is to combine find (a tool for searching for files in a directory hierarchy) and grep (a tool for printing lines that match a pattern). You can read the manuals for both tools by typing man find and man grep.
Before going straight to the solution, we need to understand how we will approach the problem. To match a pattern against the names of files, we use the find command with the -name option:
-name pattern
Base of file name (the path with the leading directories removed) matches shell pattern pattern. The metacharacters ('*', '?', and '[]')
match a '.' at the start of the base name (this is a change in
findutils-4.2.2; see section STANDARDS CONFORMANCE below). To ignore a
directory and the files under it, use -prune; see an example in the
description of -path. Braces are not recognised as being special,
despite the fact that some shells including Bash imbue braces with a
special meaning in shell patterns. The filename matching is performed
with the use of the fnmatch(3) library function. Don't forget to
enclose the pattern in quotes in order to protect it from expansion by
the shell.
For instance, if we want to search for files whose names contain the string 'abc' in a directory called 'words_directory', we enter the following:
$ find words_directory -name "*abc*"
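find already descends into every subdirectory of words_directory, so the command above covers the whole tree. You may also see the starting point written with a trailing /*, which makes the shell pass each non-hidden entry of the directory as a separate starting point: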
$ find words_directory/* -name "*abc*"
So first we need to find all files whose names begin with "MAG_L_NT_" and end with ".dat". To find every matching name under /your/specified/path/, including its subdirectories:
$ find /your/specified/path/* -name "MAG_L_NT_*.dat"
This prints every matching filename, including the names that contain the "SUM" string. This is where grep comes in. To exclude names containing the unwanted string, we use its -v option:
-v, --invert-match
Invert the sense of matching, to select non-matching lines. (-v is
specified by POSIX.)
To have grep filter the first command's output, we use a pipe (|):
The standard shell syntax for pipelines is to list multiple commands,
separated by vertical bars ("pipes" in common Unix verbiage). For
example, to list files in the current directory (ls), retain only the
lines of ls output containing the string "key" (grep), and view the
result in a scrolling page (less), a user types the following into the
command line of a terminal:
ls -l | grep key | less
"ls -l" produces a process, the output (stdout) of which is piped to
the input (stdin) of the process for "grep key"; and likewise for the
process for "less". Each process takes input from the previous process
and produces output for the next process via standard streams. Each
"|" tells the shell to connect the standard output of the command on
the left to the standard input of the command on the right by an
inter-process communication mechanism called an (anonymous) pipe,
implemented in the operating system. Pipes are unidirectional; data
flows through the pipeline from left to right.
process1 | process2 | process3
Now that you are acquainted with the commands and options needed to achieve your goal, you are ready for the solution:
$ find /your/specified/path/* -name "MAG_L_NT_*.dat" | grep -v "SUM"
The find command outputs every name that begins with "MAG_L_NT_" and ends with ".dat". grep -v takes that output as its input and removes all lines containing the "SUM" string.
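The sketch below tries the pipeline on the six file names from the question, created as empty files in a temporary directory (the directory and variable names are made up for the demo). It also shows a possible find-only variant that uses ! -name; that variant is an alternative suggestion, not part of the answer above.

```shell
#!/bin/sh
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
for tag in AA BB CC; do
    touch "MAG_L_NT_${tag}_SUM_2017_01_20.dat" "MAG_L_NT_${tag}_2017_01_20.dat"
done

# Variant 1: find lists the candidates, grep -v drops lines containing "SUM".
piped=$(find . -name "MAG_L_NT_*.dat" | grep -v "SUM" | LC_ALL=C sort)
echo "$piped"

# Variant 2: negate a second -name test, so no second process is needed.
find_only=$(find . -name "MAG_L_NT_*.dat" ! -name "*SUM*" | LC_ALL=C sort)
echo "$find_only"

cd - >/dev/null
rm -rf "$demo_dir"
```

The two variants differ in one respect: grep -v sees the whole path, so a directory with "SUM" in its name would hide every file beneath it, whereas ! -name inspects only the base name of each file.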
If I want to get the difference between the 2 directories, I use the command below:
diff -aruN dir1/ dir2/ > dir.patch
so the dir.patch file should contain all the differences I want, right?
But suppose dir2/ contains an empty file that does not exist in dir1/, for example:
dir1/
dir2/empty_content_file.txt ------ with empty content.
Then the diff command does not generate any patch for empty_content_file.txt, even though it is a needed file.
Is there a trick or an alternative way to handle this?
Thank you in advance.
That's because you're using the -N option, which tells diff to treat an absent file as empty. An absent file compared with an empty file shows no difference, so nothing is written to the patch. man diff says:
-N, --new-file
treat absent file as empty
Without -N (that is, with "diff -aru"), a file that does not exist in the first directory is reported with an "Only in xxx" message instead.
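The behaviour can be reproduced with a small sketch like the following, which builds dir1 and dir2 in a temporary directory (variable names are invented for the demo). The final find command is only a suggestion for listing zero-byte files so they can be recreated separately; it is not something the patch itself will do.

```shell
#!/bin/sh
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
mkdir dir1 dir2
: > dir2/empty_content_file.txt   # zero-byte file that exists only in dir2

# With -N the missing file counts as empty: empty vs. empty gives no diff.
with_n=$(diff -aruN dir1 dir2)
printf 'with -N: [%s]\n' "$with_n"

# Without -N, diff reports the file rather than silently skipping it.
without_n=$(diff -aru dir1 dir2)
echo "$without_n"

# List zero-byte regular files in dir2 (-size 0 matches empty files).
empty_files=$(find dir2 -type f -size 0)
echo "$empty_files"

cd - >/dev/null
rm -rf "$demo_dir"
```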
I have created a test directory structure:
t1.html
t2.php
b/t1.html
b/t2.php
c/t1.html
c/t2.php
All files contain the string "HELLO".
The following commands are run from the root folder above:
> grep -r "HELLO" *
b/t1.html:HELLO
b/t2.php:HELLO
c/t1.html:HELLO
c/t2.php:HELLO
t1.html:HELLO
t2.php:HELLO
> grep -r --include=*.html "HELLO" *
b/t1.html:HELLO
c/t1.html:HELLO
t2.php:HELLO
Why is it including the correct .html files from the sub-directories, but the .php file from the current directory?
If I pop up a level to the directory above my whole structure, then it gives following result:
grep -r --include=*.html "HELLO" *
a/t1.html:HELLO
a/c/t1.html:HELLO
a/b/t1.html:HELLO
This is what I expected when running it from within my structure.
I assume I can achieve the goal using find+grep together, but I thought this was valid usage of grep.
Thanks for any help.
Andy
Use a dot instead of the asterisk:
grep -r HELLO .
The asterisk is expanded by the shell and replaced with a list of all entries in the current directory (those whose names don't start with a dot). Each of them is then grepped, directories recursively. With a dot, grep itself walks the tree starting from the current directory.
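Combining the dot with the --include filter from the question might look like the sketch below. It rebuilds the test tree in a temporary directory, using the subdirectory names b and c that appear in the grep output, and assumes a grep that supports --include (GNU grep does). The glob is quoted so that the shell cannot expand it before grep sees it.

```shell
#!/bin/sh
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
mkdir b c
for f in t1.html t2.php b/t1.html b/t2.php c/t1.html c/t2.php; do
    echo HELLO > "$f"
done

# Start from "." so grep, not the shell, decides which files to visit.
html_hits=$(grep -r --include='*.html' "HELLO" . | LC_ALL=C sort)
echo "$html_hits"

cd - >/dev/null
rm -rf "$demo_dir"
```

Only the three .html files are reported; t2.php no longer appears because it is never named directly on the command line.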
I have been working on this for quite some time and decided to ask for some help. I'm trying to use a command to find multiple occurrences of a function (basically a string) within a directory (that has multiple files), and I would like to view only the names of the files in which the string is found.
Let's say the directory I want to search, which is filled with multiple .h and .cpp files, is:
~/Project/Files
and I was looking for occurrences of a function called 'doThis'
So far I have tried:
grep -r doThis ~/Project/Files
But I get the path and the line where it occurs in the file; I only need the file names.
Also, grep -f won't work because I get an error message saying "No such file or directory", and when using just grep I get an error message saying "path is a directory".
Any help would be great: Thanks guys!
Simply use the -l switch ;)
So :
grep -rl foobar dir
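Applied to the question, a sketch could look like this. The file names worker.h, worker.cpp and README.txt are hypothetical stand-ins for the contents of ~/Project/Files, and the two --include options (an optional extra, assuming a grep that supports them) limit the search to .h and .cpp files.

```shell
#!/bin/sh
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
mkdir src
printf 'void doThis();\n' > src/worker.h
printf '#include "worker.h"\nvoid doThis() {}\n' > src/worker.cpp
printf 'call doThis from the docs\n' > README.txt

# -r: recurse, -l: file names only; --include narrows the file types searched.
names=$(grep -rl --include='*.h' --include='*.cpp' doThis . | LC_ALL=C sort)
echo "$names"

cd - >/dev/null
rm -rf "$demo_dir"
```

README.txt also mentions doThis but is skipped by the --include filters; dropping them gives the plain grep -rl behaviour from the answer.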