Searching and moving files - linux

I have 9000+ XML files in a folder. I want to find those that contain a certain word and copy them to another location. I'm currently using the terminal:
grep -r "the word I'm searching"
It's working but I'm looking for a better and faster way if anybody has an idea.

An easy and efficient way (use cp -t instead of mv -t if you want to copy rather than move):
find . -name '*.xml' | xargs grep -l 'Your search string' \
| xargs mv -t your_target_directory
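The pipeline above splits file names on whitespace. A minimal sketch of a whitespace-safe variant that copies instead of moving, assuming GNU find, grep, xargs and cp (for -print0, -Z, -0, -r and -t); the directory and file names are made up for the demo:

```shell
# Demo in a throwaway directory; all names here are illustrative.
work=$(mktemp -d)
mkdir -p "$work/src/sub dir" "$work/dest"
printf '<a>needle</a>\n' > "$work/src/match one.xml"
printf '<a>needle</a>\n' > "$work/src/sub dir/match2.xml"
printf '<a>other</a>\n'  > "$work/src/nomatch.xml"

# -print0 / -0 / -Z pass NUL-terminated names, so spaces in names survive.
# -F treats the search string literally; -l prints only matching file names.
# -r stops xargs from running cp when nothing matched.
find "$work/src" -name '*.xml' -type f -print0 \
  | xargs -0 grep -lZF 'needle' \
  | xargs -0 -r cp -t "$work/dest"

copied=$(ls "$work/dest")
printf '%s\n' "$copied"
rm -rf "$work"
```

Only the two files containing the search string should end up in the destination directory.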

You can do it in a single line:
mv $(grep -rl 'the word you are searching for' .) directoryname/
This is only appropriate if the directory contains nothing but XML files, and it breaks on file names that contain whitespace.

Related

Grep files in subdirectories and write out files for each directory

I am working on a bioinformatics workflow in which the tool in question, 'salmon' creates multiple directories having a 'quant.sf' file. I want to find all 'lnc' entries within these files and save them as 'lnc.sf' for all directories.
I was previously running
cat quant.sf | grep 'lnc' > lnc.sf
in all directories individually that seemed to solve my problem. Now I want to write a script that goes into each directory and generates a lnc.sf file.
I have tried doing
find . -name "quant.sf" | while read A
do
cat $A | grep 'lnc' > lnc.sf
done
But this just creates a concatenated lnc.sf file in the current directory. Any help is highly appreciated.
Thank You!
If all your quant.sf files are at the same hierarchy level, a glob is enough. Assuming a folder structure like month/day/quant.sf, this writes an lnc.sf next to each quant.sf:
for f in */*/quant.sf; do grep 'lnc' "$f" > "${f%/*}/lnc.sf"; done
Otherwise, use find. Prefer -exec or xargs over find + read where you can. If you do use read, quote the variable expansions so whitespace in paths survives, drop the redundant cat process, and write each output file into the directory of its quant.sf:
find . -name 'quant.sf' | while IFS= read -r A
do
grep 'lnc' "$A" > "${A%/*}/lnc.sf"
done
If you have GNU find + xargs, use -print0 combined with -0:
find . -name 'quant.sf' -print0 | xargs -0 -n1 sh -c 'grep "lnc" "$1" > "${1%/*}/lnc.sf"' -
Or use -exec of find, which avoids problems with weird file names:
find . -name 'quant.sf' -exec sh -c 'grep "lnc" "$1" > "${1%/*}/lnc.sf"' - {} ';'
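To see the per-directory behavior, here is a small self-contained sketch of the -exec approach. The sample directory names and file contents are invented for the demo; only quant.sf, lnc.sf and the 'lnc' pattern come from the question:

```shell
# Build two hypothetical sample directories, one with a space in its name.
work=$(mktemp -d)
mkdir -p "$work/sample 1" "$work/sample2"
printf 'lnc-A\t1\nmRNA-B\t2\n' > "$work/sample 1/quant.sf"
printf 'mRNA-C\t3\nlnc-D\t4\n' > "$work/sample2/quant.sf"

# {} is passed to the inline script as $1; ${1%/*} strips the file name,
# leaving the directory that holds this particular quant.sf.
find "$work" -name 'quant.sf' -type f \
  -exec sh -c 'grep "lnc" "$1" > "${1%/*}/lnc.sf"' sh {} ';'

out1=$(cat "$work/sample 1/lnc.sf")
out2=$(cat "$work/sample2/lnc.sf")
printf '%s\n%s\n' "$out1" "$out2"
rm -rf "$work"
```

Each lnc.sf should contain only the 'lnc' lines from the quant.sf in the same directory.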

How to know which file holds grep result?

There is a directory which contains 100 text files. I used grep to search for a given text in the directory as follows:
cat *.txt | grep Ya_Mahdi
and grep shows Ya_Mahdi.
I need to know which file holds the text. Is it possible?
Just get rid of cat and provide the list of files to grep:
grep Ya_Mahdi *.txt
While this would generally work, depending on the number of .txt files in that folder, the argument list for grep might get too large.
You can use find for a bulletproof solution:
find . -maxdepth 1 -name '*.txt' -exec grep -H Ya_Mahdi {} +
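If only the file names are needed rather than the matching lines, grep -l can take the place of -H. A minimal sketch, assuming a find that supports -maxdepth (GNU and BSD find do); the two sample files are invented:

```shell
work=$(mktemp -d)
printf 'hello\n'    > "$work/a.txt"
printf 'Ya_Mahdi\n' > "$work/b.txt"

# -l prints just the name of each file that contains a match.
# "{} +" batches many files per grep call and avoids the argument-list limit.
found=$(find "$work" -maxdepth 1 -name '*.txt' -type f -exec grep -l 'Ya_Mahdi' {} +)
printf '%s\n' "$found"
rm -rf "$work"
```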

Search and replace entire files

I've seen numerous examples for replacing one string with another among multiple files but what I want to do is a bit different. Probably a lot simpler :)
Find all the files that match a certain string and replace them completely with the contents of a new file.
I have a find command that works
find /home/*/public_html -name "index.php" -exec grep "version:1.23" '{}' \; -print
This finds all the files I need to update.
Now how do I replace their entire content with the CONTENTS of /home/indexnew.txt (I could also name it /home/index.php)
I emphasize content because I don't want to change the name or ownership of the files I'm updating.
find ... | while IFS= read -r filename; do cat static_file > "$filename"; done
Use grep -q in the find command: it suppresses the matched lines, so only the file names from -print reach the loop, and it returns "true" as soon as the first match is found without reading the rest of the file.
If you have a bunch of files you want to replace, and you can get all of their names using wildcards, you can feed the new content to the tee command:
tee /home/*/update.txt < my_file > /dev/null
This writes the text of my_file to every update.txt that already exists one level below /home. The wildcard only matches existing files, so it will not create update.txt in directories that lack one.
Let me know if this helps or isn't what you want.
Instead of running grep without -l and then using -print, you can add -l to grep so that it lists the matching files directly:
find /home/*/public_html -name "index.php" -exec grep -l "version:1.23" '{}' \; | xargs -I {} cp /home/index.php {}
Here is the description of the -l option:
-l, --files-with-matches
Suppress normal output; instead print the name of each input
file from which output would normally have been printed. The
scanning will stop on the first match. (-l is specified by
POSIX.)
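The point about keeping the name and permissions can be checked with a small sketch. It chains two -exec actions so that the second one runs only for files where grep -q succeeded. The site1/site2 layout and file contents are invented for the demo, and redirecting into an existing file with cat is assumed to be acceptable because it rewrites the content without recreating the file:

```shell
work=$(mktemp -d)
mkdir -p "$work/site1" "$work/site2"
printf 'version:1.23\nold\n'  > "$work/site1/index.php"
printf 'version:2.00\nkeep\n' > "$work/site2/index.php"
printf 'new content\n'        > "$work/indexnew.txt"
chmod 640 "$work/site1/index.php"

# The first -exec acts as a test; the second -exec overwrites the content
# in place, so the existing file keeps its name, owner and mode.
find "$work" -name 'index.php' -type f \
  -exec grep -q 'version:1.23' {} ';' \
  -exec sh -c 'cat "$1" > "$2"' sh "$work/indexnew.txt" {} ';'

updated=$(cat "$work/site1/index.php")
untouched=$(cat "$work/site2/index.php")
mode=$(ls -l "$work/site1/index.php" | cut -c1-10)
printf '%s\n%s\n%s\n' "$updated" "$untouched" "$mode"
rm -rf "$work"
```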

How can I search multiple files for a single word or phrase using grep and strings?

I'm trying to look for a word like "numbers" in multiple files, not just .txt files, using the terminal. I have tried
strings -r /media/E016-5484/* | grep numbers
but it still doesn't work!
Let's say you are looking for 1234 in all files whose names contain file_pattern:
grep 1234 $(find . -name "*file_pattern*")
or
find . -name "*file_pattern*" -exec grep 1234 {} \;
If I am not mistaken, you are looking for
grep numbers -r /media/E016-5484
From the manpage:
-r, --recursive
Read all files under each directory, recursively, following symbolic links only if they are on the command line. This is equivalent to the -d recurse option.

Shell: find files in a list under a directory

I have a list of about 1000 file names to search for under a directory and its subdirectories. There are hundreds of subdirectories with more than 1,000,000 files. The following command runs find 1000 times:
cat filelist.txt | while read f; do find /dir -name $f; done
Is there a much faster way to do it?
If filelist.txt has a single filename per line:
find /dir | grep -f <(sed 's#^#/#; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt)
(The -f option means that grep searches for all the patterns in the given file.)
Explanation of <(sed 's#^#/#; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt):
The <( ... ) is called process substitution, and is somewhat similar to $( ... ). The command above is equivalent to the following (but process substitution is neater and possibly a little faster):
sed 's#^#/#; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt > processed_filelist.txt
find /dir | grep -f processed_filelist.txt
The call to sed runs the commands s#^#/#, s/$/$/ and s/\([\.[\*]\|\]\)/\\\1/g on each line of filelist.txt and prints them out. These commands convert the filenames into a format that will work better with grep.
s#^#/# means put a / before each filename. (The ^ means "start of line" in a regex.)
s/$/$/ means put a $ at the end of each filename. (The first $ means "end of line", the second is just a literal $ which is then interpreted by grep to mean "end of line").
The combination of these two rules means that grep will only look for matches like .../<filename>, so that a.txt doesn't match ./a.txt.backup or ./abba.txt.
s/\([\.[\*]\|\]\)/\\\1/g puts a \ before each occurrence of . [ ] or *. Grep uses regexes and those characters are considered special, but we want them to be plain so we need to escape them (if we didn't escape them, then a file name like a.txt would match files like abtxt).
As an example:
$ cat filelist.txt
file1.txt
file2.txt
blah[2012].txt
blah[2011].txt
lastfile
$ sed 's#^#/#; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt
/file1\.txt$
/file2\.txt$
/blah\[2012\]\.txt$
/blah\[2011\]\.txt$
/lastfile$
Grep then uses each line of that output as a pattern when it is searching the output of find.
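The following sketch runs the same sed expression end to end against a throwaway tree, writing the patterns to a temporary file instead of using process substitution. It assumes GNU sed and grep (the \| alternation is a GNU extension); the directory layout and the patterns.txt name are invented for the demo:

```shell
work=$(mktemp -d)
mkdir -p "$work/dir/sub"
touch "$work/dir/a.txt" "$work/dir/sub/a.txt.backup" \
      "$work/dir/sub/abba.txt" "$work/dir/sub/blah[2012].txt"
printf 'a.txt\nblah[2012].txt\n' > "$work/filelist.txt"

# Turn each file name into an anchored, escaped pattern such as /a\.txt$
sed 's#^#/#; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' "$work/filelist.txt" \
  > "$work/patterns.txt"

# One pass over the tree; grep checks every path against all patterns.
matches=$(cd "$work" && find dir -type f | grep -f patterns.txt | sort)
printf '%s\n' "$matches"
rm -rf "$work"
```

Only dir/a.txt and dir/sub/blah[2012].txt should be listed; a.txt.backup and abba.txt are rejected by the leading / and trailing $ anchors.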
If filelist.txt is a plain list:
$ find /dir | grep -F -f filelist.txt
If filelist.txt is a pattern list:
$ find /dir | grep -f filelist.txt
Using xargs(1) instead of the while loop can be a bit faster than looping in bash.
Like this:
xargs -a filelist.txt -I filename find /dir -name filename
Be careful if the file names in filelist.txt contain whitespace; read the second paragraph in the DESCRIPTION section of the xargs(1) manpage about this problem.
A further improvement is possible under some assumptions. For example, if a.txt is in filelist.txt and you can be sure there is only one a.txt in /dir, you can tell find(1) to exit as soon as it finds that instance:
xargs -a filelist.txt -I filename find /dir -name filename -print -quit
Another solution: pre-process filelist.txt into a find(1) argument list like this, which reduces the number of find(1) invocations to one:
find /dir -name 'a.txt' -or -name 'b.txt' -or -name 'c.txt'
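One way to build that argument list from filelist.txt is to accumulate it in the shell's positional parameters. This is a sketch under a few assumptions: the list is non-empty, it is small enough to fit in one command line, and the names contain no glob characters (-name treats its argument as a pattern). It uses the portable -o spelling of -or, and the sample tree is invented:

```shell
work=$(mktemp -d)
mkdir -p "$work/dir/x" "$work/dir/y"
touch "$work/dir/x/a.txt" "$work/dir/y/b.txt" "$work/dir/y/other.txt"
printf 'a.txt\nb.txt\n' > "$work/filelist.txt"

# Accumulate: -name a.txt -o -name b.txt ...
set --
while IFS= read -r name; do
  if [ "$#" -gt 0 ]; then set -- "$@" -o; fi
  set -- "$@" -name "$name"
done < "$work/filelist.txt"

# Parentheses group the alternatives so -type f applies to all of them.
found=$(cd "$work" && find dir -type f \( "$@" \) | sort)
printf '%s\n' "$found"
rm -rf "$work"
```

The directory tree is walked once, regardless of how many names are in the list.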
I'm not entirely sure of the question here, but I came to this page after trying to find a way to discover which 4 of 13000 files had failed to copy.
Neither of the answers did it for me so I did this:
cp file-list file-list2
find dir/ >> file-list2
sort file-list2 | uniq -u
This resulted in a list of the 4 files I needed.
The idea is to combine the two file lists to determine the unique entries.
sort is used to make duplicate entries adjacent to each other which is the only way uniq will filter them out.
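A tiny sketch of the same idea with made-up lists, where expected.txt stands for the names that should exist and copied.txt for the names actually found:

```shell
work=$(mktemp -d)
printf 'a.txt\nb.txt\nc.txt\nd.txt\n' > "$work/expected.txt"
printf 'a.txt\nc.txt\n'               > "$work/copied.txt"

# Names present in both lists appear twice after concatenation;
# uniq -u keeps only the lines that occur exactly once.
missing=$(cat "$work/expected.txt" "$work/copied.txt" | sort | uniq -u)
printf '%s\n' "$missing"
rm -rf "$work"
```

Note that this is symmetric: a name that appears only in the second list is reported as well. If only one direction matters, comm -23 on the two sorted lists is an alternative.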

Resources