grep filenames matching a pattern and move to desired folder - linux

I have a list of patterns in a text file, list.txt. For each line in list.txt, I want to find all the files at a location whose names begin with that pattern, and then move these files to another location.
Consider an example case.
At /home/ana/folder_a I have list.txt, which looks like this:
list.txt
1abc
2def
3xyz
At this location, i.e. /home/ana/folder_a/, there are multiple files whose names begin with the patterns in list.txt. So there are files like 1abc_a.txt, 1abc_c.txt, 1abc_f.txt, 2def_g.txt, 3xyz_a.txt
So what I want to achieve is this:
for i in cat list.txt; do
ls | grep '^$i' [thats the pattern] |
mv [files containing the pattern] to /home/ana/folder_b/
Please note that at the other location, i.e /home/ana/folder_b/ I have already created directories, specific for each pattern.
So /home/ana/folder_b/ contains subdirectories like 1abc/ , 2def/ , 3xyz/
In effect, I wish to move all the files matching pattern '1abc', '2def' and '3xyz' from /home/ana/folder_a/ to their respective sub-directories in /home/ana/folder_b/, such that /home/ana/folder_b/1abc will have 1abc_a.txt , 1abc_c.txt , and 1abc_f.txt ; /home/ana/folder_b/2def/ will have 2def_g.txt and /home/ana/folder_b/3xyz/ will have 3xyz_a.txt

Grep's -f option matches patterns from a file so you don't have to loop over each line in the file in shell:
$ ls # List all files in dir, some match, some don't
1abc_a.txt 1abc_c.txt 1abc_f.txt 2def_g.txt 3xyz_a.txt file1 file2 list.txt
$ cat list.txt # List patterns to match against
1abc
2def
3xyz
$ ls | grep -f list.txt # keep only the file names that match a pattern
1abc_a.txt
1abc_c.txt
1abc_f.txt
2def_g.txt
3xyz_a.txt
Pipe to xargs to do the move:
ls | grep -f list.txt | xargs -I {} -t mv {} ../folder_b
mv 1abc_a.txt ../folder_b
mv 1abc_c.txt ../folder_b
mv 1abc_f.txt ../folder_b
mv 2def_g.txt ../folder_b
mv 3xyz_a.txt ../folder_b
Edit: I realised I missed the subdirectory part of the question; @Thor's answer is the best approach for that. Still, I think you might find some use in this answer.
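One caveat with the grep -f approach: the patterns in list.txt are not anchored, so a name such as x1abc.txt would match as well, even though the question asks for names that begin with the pattern. A minimal sketch of one way around that, assuming the prefixes contain no regex metacharacters (the scratch directory, the sample file names and anchored.txt are made up for the demo):

```shell
# Demo in a scratch directory (hypothetical file names).
demo=$(mktemp -d)
cd "$demo" || exit 1
touch 1abc_a.txt 2def_g.txt x1abc.txt file1
printf '%s\n' 1abc 2def 3xyz > list.txt

# Prefix every pattern with ^ so it can only match at the start of a name.
sed 's/^/^/' list.txt > anchored.txt

# x1abc.txt contains "1abc" but does not begin with it, so it is dropped.
matches=$(ls | grep -f anchored.txt)
printf '%s\n' "$matches"
```

The result can be piped to xargs exactly as before; only the pattern file changes.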

I think glob expansion is the way to go here:
while read pattern; do
mv "${pattern}"* ../folder_b/"$pattern"
done < list.txt
Start with an echo in front of the mv command, and remove it when you're happy with the output.
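One thing to watch for with the glob loop: if a pattern matches no files, the shell passes the unexpanded glob to mv, which then complains about a missing file. A hedged sketch of the same idea with a guard, assuming the layout from the question (the scratch directory and sample files exist only for this demo):

```shell
# Scratch layout mirroring the question (created only for this demo).
demo=$(mktemp -d)
mkdir -p "$demo/folder_a" "$demo/folder_b/1abc" "$demo/folder_b/2def" "$demo/folder_b/3xyz"
cd "$demo/folder_a" || exit 1
touch 1abc_a.txt 1abc_c.txt 2def_g.txt
printf '%s\n' 1abc 2def 3xyz > list.txt

while IFS= read -r pattern; do
  [ -n "$pattern" ] || continue          # skip blank lines (an empty prefix matches everything)
  for f in "$pattern"*; do
    [ -e "$f" ] || continue              # glob matched nothing (3xyz in this demo)
    mv -- "$f" "../folder_b/$pattern/"
  done
done < list.txt
```

The inner for loop moves files one at a time, which is slower than a single mv per pattern but makes the "no match" case easy to skip.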

I'd suggest using the -exec action of find to call mv in your loop.
Beginning file structure (as you can see, I'm calling this from the parent of folder_a and folder_b):
$ find
.
./folder_a
./folder_a/1abc_a.txt
./folder_a/1abc_c.txt
./folder_a/1abc_f.txt
./folder_a/2def_g.txt
./folder_a/3xyz_a.txt
./folder_b
./folder_b/1abc
./folder_b/2def
./folder_b/3xyz
./list.txt
$ cat list.txt
1abc
2def
3xyz
command:
while read pattern
do
find ./folder_a -type f -name "$pattern*" -exec mv "{}" "./folder_b/$pattern" \;
done <list.txt
alternate command (same thing, just all on one line):
while read pattern; do find ./folder_a -type f -name "$pattern*" -exec mv "{}" "./folder_b/$pattern" \;; done <list.txt
resulting file structure:
$ find
.
./folder_a
./folder_b
./folder_b/1abc
./folder_b/1abc/1abc_a.txt
./folder_b/1abc/1abc_c.txt
./folder_b/1abc/1abc_f.txt
./folder_b/2def
./folder_b/2def/2def_g.txt
./folder_b/3xyz
./folder_b/3xyz/3xyz_a.txt
./list.txt

Related

Searching through every file in a directory (and in any sub-directories) one by one

I'm trying to loop through every file in a directory (including files in its subdirectories) and perform some action if the file meets an if-condition.
Part of my code is as follows:
for f in $direc/*
do
if grep -q 'search_term' $f; then
#action on this file
fi
done
However, this fails in the case of subdirectories. I would be very grateful if someone could help me out.
Thank you!
The -R option to grep will read all files in the directory tree including subdirectories. Combined with the -l option to print only the matching file names, you can use that to perform an action on each file that matches.
grep -ERl pattern directory | while IFS= read -r path; do echo "$path" && mv -- "$path" /tmp; done
For example, that would print the file name and move the file to a different directory.
Find | xargs is the usual pattern I use, and has the advantage of not getting hung up on special characters in file names (spaces etc.) if you use the -print0 option of find.
find . -type f -print0 | xargs -0 -I{} sh -c 'if grep -q "search string" "$1"; then cmd-to-run "$1"; fi' sh {}
Yes, because with this syntax grep expects to process files, not directories. A minimal change to your script would be to test whether $f is a regular file:
...
if [ -f "$f" ] && grep -q 'search_term' "$f"; then
...
In practice you would probably want to get the list of files that match the pattern and act on those:
while IFS= read -r f; do
: # action on file "$f"
done < <(grep -rl 'search_term' "$direc"/)
I've opted to get the list of files through <(list) because piping it into while would cause the body of the loop to run in another process, which could be a problem, in particular if you expect variable changes to be visible outside the loop. And unlike a simple for with ``, it's not as sensitive to the file names you encounter (I have spaces in mind; it would still get confused by newlines, though). Speaking of which:
while IFS= read -r -d "" f; do
: # action on file "$f"
done < <(grep -rZl 'search_term' "$direc"/)
Nothing should be able to confuse that, as entries are delimited by the NUL character, which cannot appear in a file name.
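To make the point about the loop running in another process concrete, here is a small sketch that counts matches both ways. It assumes bash with default options (process substitution is not available in plain sh), and the scratch files are invented for the demo:

```shell
#!/usr/bin/env bash
# Process substitution is a bash feature, so this sketch assumes bash.
demo=$(mktemp -d)
printf 'search_term\n' > "$demo/a.txt"
printf 'search_term\n' > "$demo/b c.txt"
printf 'nothing\n'     > "$demo/d.txt"

piped=0
grep -rl 'search_term' "$demo" | while IFS= read -r f; do
  piped=$((piped + 1))               # incremented in a subshell, lost afterwards
done

substituted=0
while IFS= read -r f; do
  substituted=$((substituted + 1))   # runs in the current shell, so it sticks
done < <(grep -rl 'search_term' "$demo")

echo "piped=$piped substituted=$substituted"
```

With the pipe, the counter is still 0 after the loop; with process substitution it holds the number of matching files.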
Assuming no newlines in your file names:
find "$direc" -type f -exec grep -q 'search_term' {} \; -print |
while IFS= read -r f; do
: # action on this file (the ":" keeps the loop body from being empty, which is a syntax error)
done

Grep regular files in a linux File System and show their content

How do I display the content of the regular files matched by a grep command? For example, I grep a directory listing in order to see the regular files it contains. I used the following line to see the regular files only:
ls -lR | grep ^-
Then I would like to display the content of the files found there. How do I do it?
I would do something like:
$ cat `ls -lR | egrep "^-" | rev | cut -d ' ' -f 1 | rev`
Use ls to find the files
grep finds your pattern
reverse the whole result
cut out the first space-separated field to get the file name (file names with spaces are problematic, and files in subdirectories are listed without their directory, so cat will not find them)
reverse the file name back to normal direction
Backticks will execute that and return the list of file names to cat.
or the way I would probably do it is use vim to look at each file.
$ vim `ls -lR | egrep "^-" | rev | cut -d ' ' -f 1 | rev`
It feels like you are trying to find only the files recursively. This is what I do in those cases:
$ vim `find . -type f -print`
There are multiple ways of doing it. I'll try to give you a few easy and clean ones here. The find-based commands handle file names containing spaces; the backtick variants at the end do not.
$ find . -type f -print0 | xargs -0 cat
-print0 adds a null character '\0' as the delimiter, and you need to call xargs -0 so that it recognises the null delimiter. If you don't do that, whitespace in file names creates problems.
e.g. without -print0 filenames: abc 123.txt and 1.inc would be read as three separate files abc, 123.txt and 1.inc.
with -print0 this becomes abc 123.txt'\0' and 1.inc'\0' and would be read as abc 123.txt and 1.inc
As for xargs, it turns its input into arguments: command1 | xargs command2 means the output of command1 is passed to command2 as command-line arguments.
cat displays the content of the file.
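A quick way to see the difference for yourself, using the two file names from the example above (the scratch directory and file contents are made up; this assumes the temporary path itself contains no spaces):

```shell
demo=$(mktemp -d)
printf 'one\n' > "$demo/abc 123.txt"
printf 'two\n' > "$demo/1.inc"

# Whitespace splitting: "abc 123.txt" is broken into two bogus names,
# cat fails on both, and only the content of 1.inc comes through.
plain=$(find "$demo" -type f | xargs cat 2>/dev/null | sort)

# NUL-delimited: both files are read.
safe=$(find "$demo" -type f -print0 | xargs -0 cat | sort)

printf 'without -print0: %s\n' "$plain"
printf 'with -print0:\n%s\n' "$safe"
```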
$ find . -type f -exec echo {} \; -exec cat {} \;
This is just using the find command. It finds all the files (type f), calls echo to output the filename, then calls cat to display its content.
If you don't want the filename, omit -exec echo {} \;
Alternatively you can use cat command and pass the output of find.
$ cat `find . -type f -print`
If you want to scroll through the content of multiple files one by one, you can use:
$ less `find . -type f -print`
When using less, you can move between files with :n and :p (next and previous file respectively). Press q to quit less.

Find all directories containing a file that contains a keyword in linux

In my hierarchy of directories I have many text files called STATUS.txt. These text files each contain one keyword such as COMPLETE, WAITING, FUTURE or OPEN. I wish to execute a shell command of the following form:
./mycommand OPEN
which will list all the directories that contain a file called STATUS.txt, where this file contains the text "OPEN"
In future I will want to extend this script so that the directories returned are sorted. Sorting will be determined by a numeric value stored in the file PRIORITY.txt, which lives in the same directories as STATUS.txt. However, this can wait until my competence level improves. For the time being I am happy to list the directories in any order.
I have searched Stack Overflow for the following, but to no avail:
unix filter by file contents
linux filter by file contents
shell traverse directory file contents
bash traverse directory file contents
shell traverse directory find
bash traverse directory find
linux file contents directory
unix file contents directory
linux find name contents
unix find name contents
shell read file show directory
bash read file show directory
bash directory search
shell directory search
I have tried the following shell commands:
This helps me identify all the directories that contain STATUS.txt
$ find ./ -name STATUS.txt
This reads STATUS.txt for every directory that contains it
$ find ./ -name STATUS.txt | xargs -I{} cat {}
This doesn't return any text, I was hoping it would return the name of each directory
$ find . -type d | while read d; do if [ -f STATUS.txt ]; then echo "${d}"; fi; done
... or the other way around:
find . -name "STATUS.txt" -exec grep -lF "OPEN" \{} +
If you want to wrap that in a script, a good starting point might be:
#!/bin/sh
[ $# -ne 1 ] && echo "One argument required" >&2 && exit 2
find . -name "STATUS.txt" -exec grep -lF "$1" \{} +
As pointed out by @BroSlow, if you are looking for directories containing the matching STATUS.txt files, this might be more what you are looking for:
grep -F --include='STATUS.txt' -rl 'OPEN' | xargs -L 1 dirname
Or better
grep -F --include='STATUS.txt' -rl 'OPEN' |
sed -e 's|^[^/]*$|./&|' -e 's|/[^/]*$||'
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# simulate `xargs -L 1 dirname` using `sed`
# (no trailing `/`; returns `.` for path without dir part)
Maybe you can try this:
grep -rl "OPEN" . --include='STATUS.txt'| sed 's/STATUS.txt//'
where grep -r means recursive, -l means list only the names of matching files, and '.' is the directory to search. You can pipe the result to sed to remove the file name (the trailing slash of the directory is left in place).
You can then wrap this in a bash script file where you can pass in keywords such as 'OPEN', 'FUTURE' as an argument.
#!/bin/bash
grep -rl "$1" . --include='STATUS.txt'| sed 's/STATUS.txt//'
Try something like this
find -type f -name "STATUS.txt" -exec grep -q "OPEN" {} \; -exec dirname {} \;
or in a script
#!/bin/bash
(($#==1)) || { echo "Usage: $0 <pattern>" && exit 1; }
find -type f -name "STATUS.txt" -exec grep -q "$1" {} \; -exec dirname {} \;
You could use grep and awk instead of find:
grep -r OPEN * | awk '{split($1, path, ":"); print path[1]}' | xargs -I{} dirname {}
The above grep will list all files containing "OPEN" recursively inside your directory structure. The result will be something like:
dir_1/subdir_1/STATUS.txt:OPEN
dir_2/subdir_2/STATUS.txt:OPEN
dir_2/subdir_3/STATUS.txt:OPEN
Then the awk script will split this output at the colon and print the first part of it (the dir path).
dir_1/subdir_1/STATUS.txt
dir_2/subdir_2/STATUS.txt
dir_2/subdir_3/STATUS.txt
The dirname will then return only the directory path, not the file name, which I suppose is what you want.
I'd consider using Perl or Python if you want to evolve this further, though, as it might get messier if you want to add priorities and sorting.
Taking up the accepted answer, it does not output a sorted and unique directory list. At the end of the "find" command, add:
| sort -u
or:
| sort | uniq
to get the unique list of the directories.
Credits go to Get unique list of all directories which contain a file whose name contains a string.
IMHO you should write a Python script which:
Examines your directory structure and finds all files named STATUS.txt.
For each found file:
reads the file and executes mycommand depending on what the file contains.
If you want to extend the script later with sorting, you can find all the interesting files first, save them to a list, sort the list and execute the commands on the sorted list.
Hint: http://pythonadventures.wordpress.com/2011/03/26/traversing-a-directory-recursively/
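For what it's worth, the steps listed above (find the files, collect them, sort, then act) can also be sketched in shell rather than Python. This is only a rough sketch under assumptions that are not in the question: PRIORITY.txt holds a single integer, lower numbers sort first, directory names contain no tabs or newlines, and the demo tree (proj_a, proj_b, proj_c) is made up:

```shell
demo=$(mktemp -d)
mkdir -p "$demo/proj_a" "$demo/proj_b" "$demo/proj_c"
echo OPEN     > "$demo/proj_a/STATUS.txt"; echo 20 > "$demo/proj_a/PRIORITY.txt"
echo OPEN     > "$demo/proj_b/STATUS.txt"; echo 5  > "$demo/proj_b/PRIORITY.txt"
echo COMPLETE > "$demo/proj_c/STATUS.txt"; echo 1  > "$demo/proj_c/PRIORITY.txt"

keyword=OPEN
cd "$demo" || exit 1

sorted=$(
  find . -type f -name STATUS.txt -exec grep -lF -- "$keyword" {} + |
  while IFS= read -r status; do
    dir=$(dirname "$status")
    # A missing PRIORITY.txt is treated as priority 0 (an arbitrary choice).
    prio=$(cat "$dir/PRIORITY.txt" 2>/dev/null || echo 0)
    printf '%s\t%s\n' "$prio" "$dir"
  done | sort -n | cut -f2-
)
printf '%s\n' "$sorted"
```

Each matching directory is emitted as "priority, tab, directory", sorted numerically, and the priority column is cut off again at the end.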

Search&Replace into multiple files with the name of the containing folder

I have multiple folders with names :
1_1,1_2,...,2_1,...,
each of these folders contains the same file with the name file.sh. The file has the following form :
job_name=NAME
Partition = Long
I want to use a search&replace command in the terminal (Linux) for all my folders, like for example the following
find . -type f -name "file.sh" -print |xargs sed -i 's/job_name/REPLACED_TEXT/g'
and in the position of the REPLACED_TEXT I want the name of the folder. For example, inside folder 1_1, there will be the file.sh file with the modified form:
job_name=1_1
Partition = Long
I haven't found a solution for that yet.
You didn't specify how many subdirectories you might have to traverse, e.g.
./1_1/file.sh
./1_2/file.sh
./a/b/c/1_1/file.sh
So for this I'll just assume one subdirectory like so:
./1_1/file.sh
./1_2/file.sh
Something like the below should be able to get you started, not tested, just writing it off the top of my head. It's bash scripted but you can turn it into one big long command. Make sure to back up your directory first in case the script has unpredictable results.
for i in `find . -type f -name "file.sh" -print`
do
# second /-separated field of ./1_1/file.sh is the folder name
subdir=`echo "$i" | awk -F/ '{print $2}'`
sed -e "s/job_name=NAME/job_name=$subdir/" "$i" > "$i.bak"
mv "$i.bak" "$i"
done
You can try this line to print all the sed commands you want to execute:
find . -type f -name 'file.sh' | \
sed 's=.*/\([^/]*\)/[^/]*$=sed -i "s/NAME/\1/" "&"='
For each file found, it extracts the name of the directory that contains it and builds a sed command that replaces NAME with that name.
Output should be something like:
sed -i "s/NAME/1_1/" "./1_1/file.sh"
sed -i "s/NAME/1_2/" "./1_2/file.sh"
Then, if it looks good to you, you can repeat it with the e flag of the s command (a GNU sed extension), which makes the outer sed execute its result (i.e. the inner sed command), like this:
find . -type f -name 'file.sh' | \
sed 's=.*/\([^/]*\)/[^/]*$=sed -i "s/NAME/\1/" "&"=e'
# 'e' flag added here -----------------------------^
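If generating sed commands with sed feels too indirect, the same idea (take the name of the containing folder, substitute it into the file) can be written as a loop run by find. A sketch under a few assumptions: GNU sed (for -i without a backup suffix), folder names free of /, & and backslashes, and a made-up scratch tree:

```shell
demo=$(mktemp -d)
mkdir -p "$demo/1_1" "$demo/1_2"
for d in 1_1 1_2; do
  printf 'job_name=NAME\nPartition = Long\n' > "$demo/$d/file.sh"
done

cd "$demo" || exit 1
find . -type f -name 'file.sh' -exec sh -c '
  for f in "$@"; do
    folder=$(basename "$(dirname "$f")")      # e.g. ./1_1/file.sh -> 1_1
    # Rewrite the whole job_name line in place (GNU sed assumed for -i).
    sed -i "s/^job_name=.*/job_name=$folder/" "$f"
  done
' sh {} +

cat 1_1/file.sh
```

Because it uses the immediate parent directory, this also works for files nested deeper, such as ./a/b/c/1_1/file.sh.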

Shell: find files in a list under a directory

I have a list containing about 1000 file names to search under a directory and its subdirectories. There are hundreds of subdirs with more than 1,000,000 files. The following command will run find for 1000 times:
cat filelist.txt | while read f; do find /dir -name $f; done
Is there a much faster way to do it?
If filelist.txt has a single filename per line:
find /dir | grep -f <(sed 's#^#/#; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt)
(The -f option means that grep searches for all the patterns in the given file.)
Explanation of <(sed 's#^#/#; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt):
The <( ... ) is called a process substitution, and is a little similar to $( ... ). The command above is equivalent to the following (but using process substitution is neater and possibly a little faster):
sed 's#^#/#; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt > processed_filelist.txt
find /dir | grep -f processed_filelist.txt
The call to sed runs the commands s#^#/#, s/$/$/ and s/\([\.[\*]\|\]\)/\\\1/g on each line of filelist.txt and prints them out. These commands convert the filenames into a format that will work better with grep.
s#^#/# means put a / before each filename. (The ^ means "start of line" in a regex.)
s/$/$/ means put a $ at the end of each filename. (The first $ means "end of line", the second is just a literal $ which is then interpreted by grep to mean "end of line").
The combination of these two rules means that grep will only look for matches like .../<filename>, so that a.txt doesn't match ./a.txt.backup or ./abba.txt.
s/\([\.[\*]\|\]\)/\\\1/g puts a \ before each occurrence of . [ ] or *. Grep uses regexes and those characters are considered special, but we want them to be plain so we need to escape them (if we didn't escape them, then a file name like a.txt would match files like abtxt).
As an example:
$ cat filelist.txt
file1.txt
file2.txt
blah[2012].txt
blah[2011].txt
lastfile
$ sed 's#^#/#; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt
/file1\.txt$
/file2\.txt$
/blah\[2012\]\.txt$
/blah\[2011\]\.txt$
/lastfile$
Grep then uses each line of that output as a pattern when it is searching the output of find.
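If the escaping feels fragile, a different technique (not the grep one above) is to compare the last path component against the list with awk, which needs no regexes at all. A sketch, assuming one file name per line in a non-empty filelist.txt and no newlines in file names; the scratch tree is invented for the demo:

```shell
demo=$(mktemp -d)
mkdir -p "$demo/dir/sub"
touch "$demo/dir/file1.txt" "$demo/dir/sub/blah[2012].txt" \
      "$demo/dir/file1.txt.backup" "$demo/dir/sub/other.txt"
printf '%s\n' 'file1.txt' 'blah[2012].txt' 'lastfile' > "$demo/filelist.txt"

# First input (filelist.txt): remember each wanted name.
# Second input (find output on stdin): print paths whose last /-separated
# field is one of those names. No regex is involved, so nothing needs escaping.
found=$(find "$demo/dir" -type f |
  awk -F/ 'NR == FNR { want[$0]; next } $NF in want' "$demo/filelist.txt" - |
  sort)
printf '%s\n' "$found"
```

file1.txt.backup is not reported, because the whole base name has to equal an entry in the list.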
If filelist.txt is a plain list:
$ find /dir | grep -F -f filelist.txt
If filelist.txt is a pattern list:
$ find /dir | grep -f filelist.txt
Using xargs(1) instead of the while loop can be a bit faster than looping in bash.
Like this
xargs -a filelist.txt -I filename find /dir -name filename
Be careful if the file names in filelist.txt contain whitespace; read the second paragraph in the DESCRIPTION section of the xargs(1) manpage about this problem.
An improvement based on some assumptions. For example, a.txt is in filelist.txt, and you can make sure there is only one a.txt in /dir. Then you can tell find(1) to exit early when it finds the instance.
xargs -a filelist.txt -I filename find /dir -name filename -print -quit
Another solution. You can pre-process the filelist.txt, make it into a find(1) arguments list like this. This will reduce find(1) invocations:
find /dir -name 'a.txt' -or -name 'b.txt' -or -name 'c.txt'
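The pre-processing step could look roughly like this. It is a sketch that builds the argument list in the positional parameters, assuming filelist.txt is non-empty and small enough to fit on one command line; note that -name treats entries as glob patterns, so names containing * ? or [ would need escaping. -o is the POSIX spelling of -or, and the scratch tree is made up:

```shell
demo=$(mktemp -d)
mkdir -p "$demo/dir/sub"
touch "$demo/dir/a.txt" "$demo/dir/sub/b.txt" "$demo/dir/sub/z.txt"
printf '%s\n' a.txt b.txt c.txt > "$demo/filelist.txt"

# Build: -name a.txt -o -name b.txt -o -name c.txt
set --
while IFS= read -r name; do
  [ -n "$name" ] || continue
  [ "$#" -eq 0 ] || set -- "$@" -o
  set -- "$@" -name "$name"
done < "$demo/filelist.txt"

# One traversal of the tree instead of one per file name.
found=$(find "$demo/dir" -type f \( "$@" \) | sort)
printf '%s\n' "$found"
```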
I'm not entirely sure of the question here, but I came to this page after trying to find a way to discover which 4 of 13000 files had failed to copy.
None of the answers did it for me, so I did this:
cp file-list file-list2
find dir/ >> file-list2
sort file-list2 | uniq -u
Which resulted in a list of the 4 files I needed.
The idea is to combine the two file lists to determine the unique entries.
sort is used to make duplicate entries adjacent to each other which is the only way uniq will filter them out.
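A tiny illustration of the same trick, with made-up list contents standing in for file-list and the output of find:

```shell
demo=$(mktemp -d)
printf '%s\n' dir/a dir/b dir/c dir/d > "$demo/file-list"   # what should exist
printf '%s\n' dir/a dir/c > "$demo/copied"                   # stand-in for `find dir/`

# Entries present in both lists appear twice and are dropped by uniq -u.
missing=$(cat "$demo/file-list" "$demo/copied" | sort | uniq -u)
printf '%s\n' "$missing"
```

Keep in mind that uniq -u also reports names that appear only in the find output, so this works best when the directory contains nothing beyond what the list expects.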

Resources