Search filenames for a list of patterns and copy to destination - linux

I have a list of patterns in patterns.txt, and I want to search a folder for filenames containing any of them.
patterns.txt:
254b
0284ee
001ty
288qa
I want to find every file whose name contains any of these patterns and copy all found files to a destination directory.
So far I found a solution to list the files as follows:
set -f; find ./ -type f \( $(printf -- ' -o -iname *%s*' $(cat patterns.txt) | cut -b4-) \); set +f
I can find all files based on the patterns in my patterns.txt file, but how do I copy them to a new folder?

Assuming the target folder does not need to maintain the original hierarchy (or that the input directory does not have subdirectories), using find, grep, and xargs should work:
find . -type f -print0 |
grep -z -i -F -f patterns.txt |
xargs -0 -s1000 cp -t /new/folder
This sequence has the advantage of batching the copy, which is efficient for a large number of files. Using NUL to separate file names allows any special character in the names.
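If the target does need to keep the original hierarchy, a minimal variation (assuming GNU cp, which supports --parents) recreates each file's path under the destination:
find . -type f -print0 |
grep -z -i -F -f patterns.txt |
xargs -0 cp --parents -t /new/folder
Each source path like ./a/b/file then lands at /new/folder/a/b/file.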

Related

Bash How to find directories of given files

I have a folder with several subfolders, and in some of those subfolders I have text files example_1.txt, example_2.txt and so on. For instance, example_1.txt may be found in subfolder1; some subfolders do not contain any text files.
How can I list all directories that contain a text file starting with example?
I can find all those files by running this command
find . -name "example*"
But what I need is to find the directories these files are located in. I would need a list like this: subfolder1, subfolder4, subfolder8 and so on. Not sure how to do that.
You can use the following command:
find . -name "example*" | awk -F/ '{print $2}' | sort -u
- find all files in subfolders with -mindepth 2 to avoid this folder (.)
- get the dirname with xargs dirname
- sort the output list and make the folders unique with sort -u
- print only basenames with awk (the delimiter is / and the last string is $NF), adding "," after every subfolder
- translate newlines to blanks with tr
- remove the trailing ", " with sed
list=$(find ./ -mindepth 2 -type f -name "example*" | xargs dirname | sort -u | awk -F/ '{print $NF","}' | tr '\n' ' ' | sed 's/, $//')
echo $list
flow, over, stack
Suggesting a find command that prints only the directory paths, then sorts the paths and removes duplicates:
find . -type f -name "example*" -printf "%h\n" | sort | uniq
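If -printf is not available (it is a GNU find feature), a more portable sketch gets the same list via dirname:
find . -type f -name "example*" -exec dirname {} \; | sort -u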

How can I make a bash script where I can move certain files to certain folders which are named based on a string in the files?

This is the script that I'm using to move files with the string "john" in them (124334_john_rtx.mp4, 3464r64_john_gty.mp4, etc.) to a certain folder:
find /home/peter/Videos -maxdepth 1 -type f -iname '*john*' -print0 | \
xargs -0 --no-run-if-empty echo mv --target-directory=/home/peter/Videos/john/
Since I have a large number of videos with various names written in the files, I want to make a bash script which moves videos with a string between the underscores to a folder named after the string between the underscores. So for example, if a file is named 4345655_ben_rts.mp4, the script would identify the string "ben" between the underscores, create a folder named after it, and move the file to that folder. Any advice is greatly appreciated!
My way to do it:
cd /home/peter/Videos # change to your start directory
for name in $(ls *.mp4 | cut -d'_' -f2 | sort -u) # loop over the names found between the first and second underscores
do
    mkdir -p /home/peter/Videos/${name} # create the target directory if it doesn't exist
    mv *_${name}_*.mp4 /home/peter/Videos/${name} # move the files
done
This bash loop should do what you need:
find dir -maxdepth 1 -type f -iname '*mp4' -print0 | while IFS= read -r -d '' file
do
    if [[ $file =~ _([^_]+)_ ]]; then
        TARGET_DIR="/PARENTPATH/${BASH_REMATCH[1]}"
        mkdir -p "$TARGET_DIR"
        mv "$file" "$TARGET_DIR"
    fi
done
It'll only move the files if it finds a directory token.
I used _([^_]+)_ to make sure there is no _ in the dir name, but you didn't specify what you want if there are more than two _ in the file name. _(.+)_ will work if foo_bar_baz_buz.mp4 is meant to go into directory bar_baz.
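A quick way to compare the two patterns in bash (hypothetical filename):
f="foo_bar_baz_buz.mp4"
[[ $f =~ _([^_]+)_ ]] && echo "${BASH_REMATCH[1]}"   # prints: bar
[[ $f =~ _(.+)_ ]] && echo "${BASH_REMATCH[1]}"      # prints: bar_baz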
And this answer to a different question explains the find | while logic: https://stackoverflow.com/a/64826172/3216427 .
EDIT: As per a question in the comments, I added mkdir -p to create the target directory. The -p means recursively create any part of the path that doesn't already exist, and will not error out if the full directory already exists.

Use a text file (containing file names) to copy files from current directory to new directory

I have created a file (search.txt) containing file names of .fasta files I want to copy from the current directory (which also contains many unwanted .fasta files). Is it possible to use this text file to find and copy the matching files in the current directory to a new location?
The search.txt file contains a list of names like this:
name_1
name_2
name_3
I tried to build the search term using find and grep, like this:
find . *.fasta | grep -f search.txt
which is returning output like this for each matching file:
./name_1.fasta
./name_2.fasta
./name_3.fasta
name_1.fasta
name_2.fasta
name_3.fasta
It's finding the correct files, but I'm not sure whether this output is useful or can be used to copy these files.
To get only matching filenames from search.txt I would do this:
find . -type f -name '*.fasta' -print0 | grep -zf search.txt | xargs -r0 cp -t target-dir/
It will find all files with the extension .fasta, keep only the ones matching the patterns in search.txt, and bulk-cp them to target-dir. Each filename is terminated with a null byte in case filenames contain newlines.
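One caveat: grep matches substrings, so a pattern like name_1 in search.txt would also select name_10.fasta. A sketch that anchors each name to a complete ./name.fasta path (assuming the names contain no regex metacharacters and the files live directly in the current directory):
find . -maxdepth 1 -type f -name '*.fasta' -print0 |
grep -z -E -f <(sed 's|.*|^\./&\\.fasta$|' search.txt) |
xargs -r0 cp -t target-dir/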
Using Bash, you can read all the files from the list into an array:
$ mapfile -t files < search.txt
$ declare -p files
declare -a files=([0]="name_1" [1]="name_2" [2]="name_3")
Then, you can append the desired file extension to all array elements:
$ files=("${files[@]/%/.fasta}")
$ declare -p files
declare -a files=([0]="name_1.fasta" [1]="name_2.fasta" [2]="name_3.fasta")
And finally, move them to the desired location:
$ mv "${files[@]}" path/to/new/location
You don't actually need the intermediate step:
mapfile -t files < search.txt
mv "${files[@]/%/.fasta}" path/to/new/location
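Since the question asks about copying rather than moving, the same expansion works with cp:
mapfile -t files < search.txt
cp "${files[@]/%/.fasta}" path/to/new/location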

Copying a type of file, in specific directories, to another directory

I have a .txt file that contains a list of directories. I want to make a script that goes through this .txt file and copies any files of a certain type from each listed directory to another directory.
I've never done this with directories, only files.
How can I edit this simple script so that it reads the directory list, looks for .csv files in each directory, and copies them to another directory?
cat filenames.list | \
while read FILENAME
do
    find . -name "$FILENAME" -exec cp '{}' new_dir \;
done
for DIRNAME in $(<dirname.list); do find "$DIRNAME" -type f -name "*.csv" -exec cp {} dest \; ; done
Sorry, in my first answer I didn't understand what you were asking for.
The first line of code simply takes each dirname entry in your directory list as a path and searches it for every file ending with the ".csv" extension, then copies it to the destination you want.
But you could do it with less code:
for DIRNAME in $(<dirname.list); do cp "$DIRNAME"/*.csv dest ; done
Despite the name of the list file, filenames.list, let me assume the file contains a list of directory names, not filenames. Then would you please try:
while IFS= read -r dir; do
    find "$dir" -type f -name "*.csv" -exec cp -p -- {} new_dir \;
done < filenames.list
The find command searches "$dir" for files which have the extension .csv, then copies them to new_dir.
The script above does not handle duplicate filenames. If you want to keep the original directory tree and/or need a countermeasure against duplicate filenames, please let me know.
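As a minimal sketch of one such countermeasure (assuming GNU cp), --backup=numbered keeps clashing names as file.csv.~1~, file.csv.~2~, and so on, instead of overwriting:
while IFS= read -r dir; do
    find "$dir" -type f -name "*.csv" -exec cp -p --backup=numbered -- {} new_dir \;
done < filenames.list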
Using find inside a while loop works, but find will run once per line of the file. An alternative is to save the list in an array; that way find can search all the directories in the list in one invocation.
If you have bash 4+, you can use mapfile.
mapfile -t directories < filenames.list
If you're stuck on bash 3:
directories=()
while IFS= read -r line; do
    directories+=("$line")
done < filenames.list
Now if you're just after one file type, like files ending in *.csv:
find "${directories[#]}" -type f -name '*.csv' -exec sh -c 'cp -v -- "$#" /newdirectory' _ {} +
If you have multiple file types to match and multiple directories to copy the files to:
while IFS= read -r -d '' file; do
    case $file in
        *.csv) cp -v -- "$file" /foodirectory;; ##: csv files go to foodirectory
        *.mp3) cp -v -- "$file" /bardirectory;; ##: mp3 files go to bardirectory
        *.avi) cp -v -- "$file" /bazdirectory;; ##: avi files go to bazdirectory
    esac
done < <(find "${directories[@]}" -type f -print0)
find's -print0 works with read's -d '' when dealing with file names that contain whitespace or newlines; see How can I find and deal with file names containing newlines, spaces or both?
The -- is there so that if you have a problematic filename starting with a dash, cp will not interpret it as an option.
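A quick illustration with a hypothetical troublesome name:
touch -- '-v.csv'             # a file whose name starts with a dash
cp '-v.csv' /foodirectory     # fails: cp parses -v.csv as options
cp -- '-v.csv' /foodirectory  # works: -- ends option parsing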
Given find's ability to process multiple folders, and assuming the goal is to 'flatten' all csv files into a single destination, consider the following.
Note that it assumes folder names do not contain special characters (including spaces, tabs, newlines, etc.).
As a side benefit, it will minimize the number of 'cp' calls, making the process efficient across a large number of files/folders.
find $(<filenames.list) -name '*.csv' | xargs cp -t DESTINATION/
For the more complex case, where folder names/file names can be anything (including spaces, '*', etc.), consider using a NUL separator (-print0 and -0):
xargs -I{} -t find '{}' -name '*.csv' -print0 < filenames.list | xargs -0 -I{} -t cp -t new/ '{}'
This will fork multiple find and cp processes.

Find all directories containing a file that contains a keyword in linux

In my hierarchy of directories I have many text files called STATUS.txt. These text files each contain one keyword such as COMPLETE, WAITING, FUTURE or OPEN. I wish to execute a shell command of the following form:
./mycommand OPEN
which will list all the directories that contain a file called STATUS.txt, where this file contains the text "OPEN"
In the future I will want to extend this script so that the directories returned are sorted. Sorting will be determined by a numeric value stored in the file PRIORITY.txt, which lives in the same directory as STATUS.txt. However, this can wait until my competence level improves. For the time being I am happy to list the directories in any order.
I have searched Stack Overflow for the following, but to no avail:
unix filter by file contents
linux filter by file contents
shell traverse directory file contents
bash traverse directory file contents
shell traverse directory find
bash traverse directory find
linux file contents directory
unix file contents directory
linux find name contents
unix find name contents
shell read file show directory
bash read file show directory
bash directory search
shell directory search
I have tried the following shell commands:
This helps me identify all the directories that contain STATUS.txt
$ find ./ -name STATUS.txt
This reads STATUS.txt for every directory that contains it
$ find ./ -name STATUS.txt | xargs -I{} cat {}
This doesn't return any text; I was hoping it would return the name of each directory
$ find . -type d | while read d; do if [ -f STATUS.txt ]; then echo "${d}"; fi; done
... or the other way around:
find . -name "STATUS.txt" -exec grep -lF "OPEN" \{} +
If you want to wrap that in a script, a good starting point might be:
#!/bin/sh
[ $# -ne 1 ] && echo "One argument required" >&2 && exit 2
find . -name "STATUS.txt" -exec grep -lF "$1" \{} +
As pointed out by @BroSlow, if you are looking for the directories containing the matching STATUS.txt files, this might be more like what you are looking for:
fgrep --include='STATUS.txt' -rl 'OPEN' | xargs -L 1 dirname
Or better
fgrep --include='STATUS.txt' -rl 'OPEN' |
sed -e 's|^[^/]*$|./&|' -e 's|/[^/]*$||'
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# simulate `xargs -L 1 dirname` using `sed`
# (no trailing `\`; returns `.` for path without dir part)
Maybe you can try this:
grep -rl "OPEN" . --include='STATUS.txt'| sed 's/STATUS.txt//'
where grep -r means recursive, -l means only list the files matching, and '.' is the directory location. You can pipe it to sed to remove the file name.
You can then wrap this in a bash script file where you can pass in keywords such as 'OPEN', 'FUTURE' as an argument.
#!/bin/bash
grep -rl "$1" . --include='STATUS.txt'| sed 's/STATUS.txt//'
Try something like this:
find -type f -name "STATUS.txt" -exec grep -q "OPEN" {} \; -exec dirname {} \;
or in a script
#!/bin/bash
(($#==1)) || { echo "Usage: $0 <pattern>" && exit 1; }
find -type f -name "STATUS.txt" -exec grep -q "$1" {} \; -exec dirname {} \;
You could use grep and awk instead of find:
grep -r OPEN * | awk '{split($1, path, ":"); print path[1]}' | xargs -I{} dirname {}
The above grep will list all files containing "OPEN" recursively inside your dir structure. The result will be something like:
dir_1/subdir_1/STATUS.txt:OPEN
dir_2/subdir_2/STATUS.txt:OPEN
dir_2/subdir_3/STATUS.txt:OPEN
Then the awk script splits this output at the colon and prints the part before it (the path of the matching file):
dir_1/subdir_1/STATUS.txt
dir_2/subdir_2/STATUS.txt
dir_2/subdir_3/STATUS.txt
dirname will then return only the directory path, not the file name, which I suppose is what you want.
I'd consider using Perl or Python if you want to evolve this further, though, as it might get messier if you want to add priorities and sorting.
Taking up the accepted answer, it does not output a sorted and unique directory list. At the end of the "find" command, add:
| sort -u
or:
| sort | uniq
to get the unique list of the directories.
Credits go to Get unique list of all directories which contain a file whose name contains a string.
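For example, combined with the find command from the accepted answer:
find -type f -name "STATUS.txt" -exec grep -q "OPEN" {} \; -exec dirname {} \; | sort -u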
IMHO you should write a Python script which:
Examines your directory structure and finds all files named STATUS.txt.
For each found file:
reads the file and executes mycommand depending on what the file contains.
If you want to extend the script later with sorting, you can find all the interesting files first, save them to a list, sort the list and execute the commands on the sorted list.
Hint: http://pythonadventures.wordpress.com/2011/03/26/traversing-a-directory-recursively/
