How to find the files (with their full paths) that contain a particular word on the line below another particular word, in directories and subdirectories on Linux

I have 200 folders, each containing multiple shell and SQL files. I need to grep/find all the directories and files that contain the following:
Insert into dbname.table_name
Select
I want to know which files (with their paths) contain insert into ${dbname}.{table_name} followed by select on the next line. The database name and table name are the same in all files.

You could use grep -r -i -A1 "insert.into" | grep -i -B1 select
-r will grep on all files in the current directory and recursively in all subdirectories.
-A1 prints one line After the matching line,
-B1 prints one line Before the matching line.
So the first grep above will print all lines matching insert.into plus the next; the second grep will keep only those pairs that have a select on their second line.
(-i to ignore case)
You may then append | grep -i insert.into | cut -d: -f1 | sort -u to get only the file names.
Note this makes some assumptions:
options -A/-B are available in GNU grep (Linux) and BSD grep but are not part of POSIX, so they may be missing on plain Unixes like HP-UX.
if you have lines containing both insert.into and select, you'll get some funky output.
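To see the whole pipeline in action, here is a minimal sketch run against a throwaway directory. The directory layout, file names and file contents are made up for illustration, and it assumes a grep that supports -A/-B.

```shell
# Build a throwaway tree (hypothetical file names) to try the pipeline on.
tmp=$(mktemp -d)
mkdir -p "$tmp/a" "$tmp/b"
printf 'insert into mydb.mytable\nselect * from src\n' > "$tmp/a/load.sh"
printf 'insert into mydb.mytable\nvalues (1)\n' > "$tmp/b/other.sql"

# Keep only files where the line after "insert into" contains "select".
result=$(cd "$tmp" &&
    grep -r -i -A1 "insert.into" . |
    grep -i -B1 select |
    grep -i insert.into |
    cut -d: -f1 |
    sort -u)

echo "$result"
rm -rf "$tmp"
```

Only ./a/load.sh is reported, because in ./b/other.sql the line after the insert does not contain select.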

Related

Recursively grep unique pattern in different files

Sorry, the title is not very clear.
So let's say I'm grepping recursively for URLs like this:
grep -ERo '(http|https)://[^/"]+' /folder
and in the folder there are several files containing the same URL. My goal is to output each URL only once. I tried piping the grep output to uniq or sort -u, but that doesn't help.
example result:
/www/tmpl/button.tpl.php:http://www.w3.org
/www/tmpl/header.tpl.php:http://www.w3.org
/www/tmpl/main.tpl.php:http://www.w3.org
/www/tmpl/master.tpl.php:http://www.w3.org
/www/tmpl/progress.tpl.php:http://www.w3.org
If you only want the address and never the file in which it was found, there is a grep option -h to suppress the file name in the output; the list can then be piped to sort -u to make sure every address appears only once:
$ grep -hERo 'https?://[^/"]+' folder/ | sort -u
http://www.w3.org
If you don't want the https?:// part, you can use Perl regular expressions (-P instead of -E) with \K, which drops everything matched so far from the reported match and so behaves like a variable-length look-behind:
$ grep -hPRo 'https?://\K[^/"]+' folder/ | sort -u
www.w3.org
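As a quick check of the -h plus sort -u approach, the sketch below builds a few sample files (names and contents are invented for illustration) and collects the distinct URLs from them.

```shell
# Sample templates (hypothetical) that share one URL and add a second one.
tmp=$(mktemp -d)
printf '<a href="http://www.w3.org/TR/">spec</a>\n' > "$tmp/button.tpl.php"
printf '<a href="http://www.w3.org/1999/xhtml">ns</a>\n' > "$tmp/header.tpl.php"
printf '<a href="http://example.com/about">about</a>\n' > "$tmp/main.tpl.php"

# -h drops the file names, so identical URLs become identical lines for sort -u.
urls=$(grep -hERo 'https?://[^/"]+' "$tmp" | sort -u)

echo "$urls"
rm -rf "$tmp"
```

Each of the two hosts is printed once, even though http://www.w3.org occurs in two files.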
If the structure of the output is always:
/some/path/to/file.php:http://www.someurl.org
you can use the command cut :
cut -d ':' -f 2- should work. Basically, it cuts each line into fields separated by a delimiter (here ":") and you select the 2nd and following fields (-f 2-)
After that, pipe the result to sort -u to remove duplicates (uniq on its own only collapses adjacent duplicate lines).
Pipe to Awk:
grep -ERo 'https?://[^/"]+' /folder |
awk -F: '!a[substr($0, length($1) + 2)]++'
The basic Awk idiom !a[key]++ is true the first time we see key, and forever false after that. Here the key is the URL, i.e. everything after the first colon: substr($0, length($1) + 2) skips the file name in $1 and the colon that follows it. This assumes the file names themselves contain no colon.
This prints the whole input line if the key is one we have not seen before, i.e. it will print the file name and the URL for the first occurrence of each URL from the grep output.
Doing the whole thing in Awk should not be too hard, either.
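The !a[key]++ idiom is easier to see on small inputs. The sketch below first uses the whole line as the key, then uses everything after the first colon as the key on lines shaped like grep output. The sample lines and the array name seen are illustrative only.

```shell
# Whole line as key: each distinct line is printed only the first time it appears.
unique=$(printf '%s\n' b a b c a | awk '!seen[$0]++')
echo "$unique"

# Key = text after the first colon, so only the first file per URL is kept.
first_hits=$(printf '%s\n' \
    '/www/tmpl/button.tpl.php:http://www.w3.org' \
    '/www/tmpl/header.tpl.html:http://www.w3.org' \
    '/www/tmpl/main.tpl.php:http://example.com' |
    awk -F: '{ key = substr($0, length($1) + 2) } !seen[key]++')
echo "$first_hits"
```

The first command keeps the input order and prints b, a, c. The second prints the button.tpl.php and main.tpl.php lines and drops the repeated URL from header.tpl.html.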

Find files from a folder when a specific word appears on at least a specific number of lines

How can I find the files from a folder where a specific word appears on more than 3 lines? I tried using recursive grep for finding that word and then using -c to count the number of lines where the word appears.
This command will recursively list the files in the current directory where word appears on more than 3 lines, along with the matches count for each file:
grep -c -r 'word' . | grep -v -e ':[0123]$' | sort -n -t: -k2
The final sort is not necessary if you don't want the results sorted, but I'd say it's convenient.
The first command in the pipeline (grep -c -r 'word' .) recursively visits every file under the current directory and prints, for each file, the number of lines that contain word (including files with a count of 0). The intermediate grep discards every count that is 0, 1, 2 or 3, so you just get counts greater than 3 (this is because -v in grep(1) inverts the sense of matching to select non-matching lines). The final sort step sorts the list by the count for each file; it sets the field delimiter to : and instructs sort(1) to sort numerically using the 2nd field (the count) as the sort key.
Here's a sample output from some tests I ran:
./file1:4
./dir1/dir2/file3:5
./dir1/file2:8
If you just want the filenames without the match counts, you can use sed(1) to discard the :count portions:
grep -m 4 -c -r 'word' . | grep -v -e ':[0123]$' | sed -r 's/:[0-9]+$//'
If the exact match count is not important, as in this case, the first grep can be optimized with -m 4, which stops reading a file after 4 matching lines.
UPDATE
The solution above works fine for small thresholds, but it does not scale well to larger numbers. If you want to filter on an arbitrary number, you can use awk(1) (and in fact it ends up being much cleaner), like so:
grep -c -r 'word' . | awk -F: '$2 > 10'
The -F: argument to awk(1) is necessary; it instructs awk(1) to separate fields by : rather than the default (runs of spaces and tabs). This solution generalizes well to any number.
Again, if matches count doesn't matter and all you want is to get a list of the filenames, do this instead:
grep -c -r 'word' . | awk -F: '$2 > 10 { print $1 }'
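The threshold can also be passed in as a parameter. The sketch below wraps the pipeline in a hypothetical helper named files_over (not part of any tool) and runs it on two made-up files. It compares the last colon-separated field so that a colon in a file name does not confuse the count.

```shell
# Hypothetical helper: print files under DIR with more than MIN lines matching WORD.
files_over() {
    min=$1 word=$2 dir=$3
    grep -c -r -- "$word" "$dir" |
        awk -F: -v min="$min" '$NF > min { sub(/:[0-9]+$/, ""); print }'
}

# Two sample files: one with 5 matching lines, one with 2.
tmp=$(mktemp -d)
printf 'word %s\n' 1 2 3 4 5 > "$tmp/many.txt"
printf 'word %s\n' 1 2 > "$tmp/few.txt"

over=$(files_over 3 word "$tmp")
echo "$over"
rm -rf "$tmp"
```

With a threshold of 3, only many.txt is listed.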

How do I grep in a list of files targeted by a previous grep?

I am using grep to get a list of files that I want to use for another grep search (and not simply piping it).
For example I got as an output:
file1.h:XXX: linecontent
file2.h:XXX: linecontent
file3.h:XXX: linecontent
file4.h:XXX: linecontent
and I want to grep only file1.h, file2.h ...
I'm assuming you want to search for files that contain two different patterns. If so this is what you want:
grep 'your pattern 2' `grep -l 'your pattern 1' *`
The contents of the back quotes will be executed first and the output substituted into the command line. Use of the -l flag will restrict the output of the grep command to just the file names.
If a very large number of files match your pattern 1, this could fail because the resulting command line becomes too long. The solution for that is to use xargs:
grep -l 'your pattern 1' * | xargs grep 'your pattern 2'
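Plain xargs splits its input on whitespace, so file names containing spaces would break the pipeline. A minimal sketch of a more robust variant follows, assuming a grep that supports --null and an xargs that supports -0 (both are extensions, available in the GNU and BSD tools). The file names and patterns are made up.

```shell
tmp=$(mktemp -d)
printf 'alpha\nbeta\n' > "$tmp/one file.h"   # both patterns, space in the name
printf 'alpha\n' > "$tmp/two.h"               # first pattern only
printf 'beta\n' > "$tmp/three.h"              # second pattern only

# --null ends each file name with a NUL byte; xargs -0 splits on NUL only.
both=$(cd "$tmp" && grep -l --null 'alpha' ./*.h | xargs -0 grep -l 'beta')

echo "$both"
rm -rf "$tmp"
```

Only the file containing both patterns is reported, and the space in its name survives intact.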
Assuming what you want is the names of files that contain 'lineofcontent', you could use:
grep -l 'lineofcontent' file*.h

Find specific string in subdirectories and order top directories by modification date

I have a directory structure containing some files. I'm trying to find the names of top directories that do contain a file with specific string in it.
I've got this:
grep -r abcdefg . | grep commit_id | sed -r 's/\.\/(.+)\/.*/\1/';
Which returns something like:
topDir1
topDir2
topDir3
I would like to be able to take this output and somehow feed it into this command:
ls -t | grep -e topDir1 -e topDir2 -e topDir3
which would return the output filtered by the first command and ordered by modification date.
I'm hoping for a one liner. Or maybe there is a better way of doing it?
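This should work as long as none of the directory names contain whitespace or wildcard characters (dirname does not read standard input, so the sed expression from the question extracts the directory names, and sort -u removes duplicates):
ls -td $(grep -r abcdefg . | grep commit_id | sed -r 's/\.\/(.+)\/.*/\1/' | sort -u)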
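The idea of feeding the matching directory names to ls -td can be tried on a small tree. This sketch is a variant, not the exact command above: it assumes the matching files sit directly inside the top directories, uses grep -l to list file names only, and uses cut to take the first path component. The directory names, file contents and timestamps are invented.

```shell
tmp=$(mktemp -d)
mkdir "$tmp/topDir1" "$tmp/topDir2" "$tmp/topDir3"
echo abcdefg > "$tmp/topDir1/commit_id"
echo abcdefg > "$tmp/topDir2/commit_id"
echo zzzzzzz > "$tmp/topDir3/commit_id"
# Give the matching directories known modification times (topDir2 is newer).
touch -t 202001010000 "$tmp/topDir1"
touch -t 202201010000 "$tmp/topDir2"

# List matching files, reduce them to their top directory, then sort by mtime.
sorted=$(cd "$tmp" &&
    ls -td $(grep -rl abcdefg . | grep commit_id | cut -d/ -f2 | sort -u))

echo "$sorted"
rm -rf "$tmp"
```

topDir2 is listed before topDir1 because ls -t puts the most recently modified entry first; topDir3 is absent because its file does not contain the string.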

grep and sed command

I have a truckload of files with SQL commands in them, and I have been asked to extract all database table names from the files.
How can I use grep and sed to parse the files and create a list of the unique table names in a text file, one per line?
The table names all seem to start with "db_", which is handy!
What would be the best way to use grep and sed together to pull the table names out?
This will search for lines containing the table names. The output of this will quickly reveal if a more selective search is needed:
grep "\<db_[a-zA-Z0-9_]*" *.sql
Once the proper search is sorted out, remove all other characters from lines with tablenames:
grep "\<db_[a-zA-Z0-9_]*" *.sql | sed 's/.*\(\<db_[a-zA-Z0-9_]*\).*/\1/'
Once that's running, add on a sort and remove duplicates:
(same last pipe expression) | sort | uniq
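You just need grep (the underscore is included in the character class so that names such as db_order_items are matched in full):
grep -owE "db_[a-zA-Z0-9_]+" file | sort -u
or awk:
awk '{for(i=1;i<=NF;i++)if($i~/^db_[a-zA-Z0-9_]+/){print $i} }' file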
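For several files at once, the grep-only approach can write the list straight into a text file. A minimal sketch, assuming the table names consist of letters, digits and underscores; the SQL statements and the output file name tables.txt are made up for illustration.

```shell
tmp=$(mktemp -d)
echo 'SELECT * FROM db_users u JOIN db_order_items i ON u.id = i.user_id;' > "$tmp/q1.sql"
echo 'INSERT INTO db_users (id) VALUES (1);' > "$tmp/q2.sql"

# -o prints only the match, -h drops file names, -w requires whole-word matches.
grep -ohwE 'db_[a-zA-Z0-9_]+' "$tmp"/*.sql | sort -u > "$tmp/tables.txt"

tables=$(cat "$tmp/tables.txt")
echo "$tables"
rm -rf "$tmp"
```

tables.txt ends up with db_order_items and db_users, one per line, with the duplicate db_users removed.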
