Bash script to recursively search a directory for files (file types) defined in an external file, using the find command - linux

I'd like to search the directory structure recursively for files of specific file types, but I need to pass the file types from an external file. The output should be a list where each line is the absolute path to a file. I will use the output for further processing.
The external file containing the list of file types looks, for example, like this (filter.lst):
*.properties
I've tried this (searchfiles.sh):
while read line
do
echo "$(find $1 -type f -name $line)"
done < $2
The echo command inside the script is only for test purposes. I ran the script:
./searchfiles.sh test_scripting filter.lst
The output of the echo of the find command was empty. Why? I altered the script as follows to test whether the command is built correctly and whether the *.properties files exist:
while read line
do
echo "find $1 -type f -name $line"
echo "$(find $1 -type f -name $line)"
done < $2
I've got output:
./searchfiles.sh test_scripting filter.lst
find test_scripting -type f -name *.properties
If I manually copy "find test_scripting -type f -name *.properties" and paste it into the shell, the files are found correctly:
find test_scripting -type f -name *.properties
test_scripting/dir1/audit.properties
test_scripting/audit.properties
test_scripting/dir2/audit.properties
Why doesn't the "find" command process the variables correctly?

The cause of the strange behaviour was hidden characters in the input filter.lst file. The file was created on Windows and then copied to Linux, so the find command didn't find the expected files. Check whether the input file contains hidden characters:
od -c filter.lst
0000000 * . p r o p e r t i e s \r \n
0000016
The hidden character is "\r". Edit the script to remove it from each line using the sed command:
while read line
do
echo "$(find $1 -type f -name $(echo $line | sed -e 's/\r$//'))"
done < $2
More about removing hidden characters is in this thread.
Note: It is best to run the script from an empty directory. If a file named, for example, example.properties exists in the directory where you run the script, the unquoted "echo $line" (executed as echo *.properties) is glob-expanded by the shell and only the matching .properties files are passed on - in this case only example.properties.
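A slightly more defensive variant of the script is sketched below. It is only a sketch, assuming the patterns in filter.lst should always reach find unexpanded: the carriage return is stripped without a subshell and the pattern is quoted so the shell never glob-expands it, regardless of which directory the script is run from.
#!/bin/bash
# searchfiles.sh - sketch of a more robust variant (assumes bash)
dir=$1
filter=$2
while IFS= read -r pattern; do
    pattern=${pattern%$'\r'}               # drop a trailing carriage return, if present
    [ -n "$pattern" ] || continue          # skip empty lines
    find "$dir" -type f -name "$pattern"   # quoting the pattern prevents shell glob expansion
done < "$filter"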

Related

combine linux `find` and `cp` with output if file is not found

Hello stack overflow -
I would like to take an input text file (one line per file to find) and do two things: copy the files that are found to a different directory, and print a message when a file is not found. The message does not have to say only that a file is not found; it can also include the location of the files that are found, like the output displayed below. I have not been able to combine the two commands below. Is this possible? I am sure there are alternative solutions and am open to those.
#will tell you if a file is not found or the location of the file if found:
command:
for i in $(cat toGet.txt); do find . -name "$i" | grep . || echo "$i - file not found" ; done
output:
file1_L001_R*_001.fastq.gz - file not found
./file2_S13_L001_R2_001.fastq.gz
./file2_S13_L001_R1_001.fastq.gz
file3_L001_R*_001.fastq.gz - file not found
#will copy files found to new directory
for i in $(cat toGet.txt); do find . -name "$i" -exec cp {} /path/to/directory \; ; done
Any suggestions?
Write a script that receives the filename to copy on its standard input. If the input is empty, it reports that the file is not found, otherwise it copies it. Then pipe the find output to it.
copy_to.sh:
#!/bin/sh
looking_for=$1
dest_dir=$2
found=$(cat)
if [ -z "$found" ]
then echo "$looking_for - file not found"
else cp "$found" "$dest_dir"
while read -r i; do
find . -name "$i" | ./copy_to.sh "$i" /path/to/directory
done < toGet.txt
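If you prefer to keep everything in one place, a single loop can both report and copy. This is only a sketch under the same assumptions as the question (toGet.txt in the current directory, /path/to/directory as the destination); it calls find twice per name for simplicity:
#!/bin/bash
# Sketch: report each name's matches (or "file not found") and copy any matches.
while IFS= read -r name; do
    matches=$(find . -name "$name")
    if [ -n "$matches" ]; then
        printf '%s\n' "$matches"                          # show where the files were found
        find . -name "$name" -exec cp {} /path/to/directory \;
    else
        printf '%s - file not found\n' "$name"
    fi
done < toGet.txt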

Save command output in a variable and write a for loop

I want to write a shell script. I list my jpg files inside nested subdirectories with the following command line:
find . -type f -name "*.jpg"
How can I save the output of this command inside a variable and write a for loop for that? (I want to do some processing steps for each jpg file)
You don't want to store output containing multiple file names in a variable/array and post-process it later. You can act on each file as it is read.
Assuming you have bash shell available, you could write a small script as
#!/usr/bin/env bash
# ^^^^ bash shell needed over any POSIX shell because
# of the need to use process-substitution <()
while IFS= read -r -d '' image; do
printf '%s\n' "$image"
# Your other actions can be done here
done < <(find . -type f -name "*.jpg" -print0)
The -print0 option writes each filename with a null byte terminator, which is then read back by the read command with the -d '' delimiter. This ensures that file names containing special characters are handled without the loop choking on them.
Rather than storing the output in a variable, use this:
find . -type f -name "*.jpg" -exec command {} \;
If you want, command can even be a full-blown shell script.
A demo is better than an explanation, no? Copy and paste the following lines into a terminal:
cat<<'EOF' >/tmp/test
#!/bin/bash
echo "I play with $1 and I can replay with $1, even 3 times: $1"
EOF
chmod +x /tmp/test
find . -type f -name "*.jpg" -exec /tmp/test {} \;
Edit: a new demo (prompted by follow-up questions in the comments)
find . -type f -name "*.jpg" | head -n 10 | xargs -n1 command
(this alternative solution does not handle filenames containing newlines or spaces)
This one does:
#!/bin/bash
shopt -s globstar
count=0
for file in **/*.jpg; do
if ((++count <= 10)); then
echo "process file $file number $count"
else
break
fi
done
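For completeness, the same ten-file limit can also be made safe for names with spaces or newlines by keeping everything null-delimited end to end; a sketch, assuming GNU find and GNU coreutils (head -z, xargs -0):
find . -type f -name "*.jpg" -print0 | head -z -n 10 | xargs -0 -n1 command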

unix bash find file directories with 2 explicit file extensions

I am trying to create a small bash script that looks through a directory containing hundreds of subdirectories. Some of these subdirectories contain a textfile.txt and an htmlfile.html, where the names textfile and htmlfile are variable.
I only care about subdirectories that have both the .txt and the .html file; all other subdirectories can be ignored.
I then want to list all the .html and .txt files that are in the same subdirectory.
This seems like a pretty simple issue to solve, but I am at a loss. All I can get working is a line of code that outputs subdirectories containing either an .html or a .txt file, with no association with the subdirectory they are actually in, and since I am pretty new to bash scripting I can't get any further.
#!/bin/bash
files="$(find ~/file/ -type f -name '*.txt' -or -name '*.html')"
for file in $files
do
echo $file
done
The following find command checks every subdirectory and, if it has both html and txt files, lists all of them:
find . -type d -exec env d={} bash -c 'ls "$d"/*.html &>/dev/null && ls "$d"/*.txt &>/dev/null && ls "$d/"*.{html,txt}' \;
Explanation:
find . -type d
This looks for all subdirectories of the current directory.
-exec env d={} bash -c '...' \;
This sets the environment variable d to the value of the found subdirectory and then executes the bash command that is contained within the single quotes (see below).
ls "$d"/*.html &>/dev/null && ls "$d"/*.txt &>/dev/null && ls "$d/"*.{html,txt}
This is the bash command that is executed. It consists of three statements and-ed together. The first checks to see if directory d has any html files. If so, the second statement runs and it checks to see if there are any txt files. If so, the last statement is executed and it lists all html and txt files in the directory d.
This command is safe for all file and directory names containing spaces, tabs, or other difficult characters.
You could do it by searching recursively with the globstar option:
shopt -s globstar
for file in **; do
if [[ -d $file ]]; then
for sub_file in "$file"/*; do
case "$sub_file" in
*.html)
html=1;;
*.txt)
txt=1;;
esac
done
[[ $html && $txt ]] && echo "$file"
html=""
txt=""
fi
done
You can make use of -o
#!/bin/bash
files=$(find ~/file/ -type f \( -name '*.txt' -o -name '*.html' \))
for file in $files
do
echo $file
done
#!/bin/bash
#A quick peek into a dir to see if there's at least one file that matches pattern
dir_has_file() { dir="$1"; pattern="$2";
[ -n "$(find "$dir" -maxdepth 1 -type f -name "$pattern" -print -quit)" ]
}
#Assumes there are no newline characters in the filenames, but will behave correctly with subdirectories that match *.html or *.txt
find "$1" -type d|\
while read d
do
dir_has_file "$d" '*.txt' &&
dir_has_file "$d" '*.html' &&
#Now print all the matching files
find "$d" -maxdepth 1 -type f -name '*.txt' -o -name '*.html'
done
This script takes the root directory to look into as the first argument ($1).
The test command is what you need to check for the existence of each file in each of the subdirs:
find . -type d -exec sh -c "if test -f {}/$file1 -a -f {}/$file2 ; then ls {}/*.{txt,html} ; fi" \;
where $file1 and $file2 are the two .txt and .html files you are looking for.
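Since the question says the file names are variable, the same test-in-each-subdirectory idea can be written with wildcards instead of fixed names; this is only a sketch of that adaptation:
find . -type d -exec sh -c '
    for d; do
        set -- "$d"/*.txt;  [ -e "$1" ] || continue    # at least one .txt file?
        set -- "$d"/*.html; [ -e "$1" ] || continue    # at least one .html file?
        ls "$d"/*.txt "$d"/*.html                      # list both sets
    done' sh {} +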

directory path as command line argument in bash

The following bash script finds a .txt file under the given directory path, then changes one word (mountain to sea) in the .txt file:
#!/bin/bash
FILE=`find /home/abc/Documents/2011.11.* -type f -name "abc.txt"`
sed -e 's/mountain/sea/g' $FILE
The output I am getting is ok in this case.
My problem is that if I give the directory path as a command line argument, it does not work. Suppose I modify my bash script to:
#!/bin/bash
FILE=`find $1 -type f -name "abc.txt"`
sed -e 's/mountain/sea/g' $FILE
and invoke it like:
./test.sh /home/abc/Documents/2011.11.*
Error is:
./test.sh: line 2: /home/abc/Documents/2011.11.10/abc.txt: Permission denied
Can anybody suggest how to access the directory path as a command line argument?
Your first line should be:
FILE=`find "$#" -type f -name "abc.txt"`
The wildcard will be expanded before calling the script, so you need to use "$#" to get all the directories that it expands to and pass these as the arguments to find.
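To make the difference concrete, here is what the shell does before the script ever runs (the directory names are hypothetical):
./test.sh /home/abc/Documents/2011.11.*
# the shell expands the wildcard first, so the script actually receives
./test.sh /home/abc/Documents/2011.11.09 /home/abc/Documents/2011.11.10
# find $1 searches only the first directory; find "$@" searches all of them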
You don't need to pass .* to your script.
Have your script like this:
#!/bin/bash
# some sanity checks here
path="$1"
find "$path".* -type f -name "abc.txt" -exec sed -i.bak 's/mountain/sea/g' '{}' \;
And run it like:
./test.sh "/home/abc/Documents/2011.11"
PS: Note how sed can be invoked directly from find itself using the -exec option.
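Putting both suggestions together, here is a sketch of a corrected script that accepts any number of expanded directories and edits every abc.txt it finds (sed -i.bak keeps a backup, as above):
#!/bin/bash
# test.sh - sketch: pass one or more directories, e.g. ./test.sh /home/abc/Documents/2011.11.*
[ $# -ge 1 ] || { echo "usage: $0 directory..." >&2; exit 1; }
find "$@" -type f -name "abc.txt" -exec sed -i.bak 's/mountain/sea/g' '{}' \;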

bash script collecting filenames seems to get confused by spaces

I'm trying to build a script that lists all the zip files in a set of directories, with some filters, and writes them out to a file, but when a filename contains a space, the parts of the name end up on separate lines.
This list will eventually be used as an input to tar to gzip all the zip files, script is below:
#!/bin/bash
rm -f set1.txt
rm -f set2.txt
for line in $(find /home -type d -name assets ;);
do
echo $line >> set1.txt
for line in $(find $line -type f -name \*.zip -mtime +2 ;);
do
echo \"$line\" >> set2.txt
done;
done
This works as expected until you get a space in a filename; then set2.txt contains entries like this:
"/home/xxxxxx/oldwebroot/htdocs/upload/assets/jobbags/rbjbCost"
"in"
"use"
"sept"
"2010.zip"
Does anyone know how I can keep these filenames with spaces on a single line, with the whole name wrapped in one set of quotes?
Thanks!
The correct way to loop over a set of files located via find is with a while read construct, thus:
while IFS= read -r -d '' line ; do
echo "$line" >> set1.txt
while IFS= read -r -d '' file ; do
printf '"%s"\n' "$file" >> set2.txt
done < <(find "$line" -type f -name \*.zip -mtime +2 -print0)
done < <(find /home -type d -name assets -print0)
For clarity I have given the inner loop variable a different name.
If you didn't have bash you'd have to issue the find command separately and redirect the output to a file, then read the file with while read ; do .. done < filename.
Note that each expansion of each variable is double-quoted. This is necessary.
Note also, however, that for what you want you can simply use the -printf switch to find, if you have GNU find.
find /home -type f -path '*/assets/*.zip' -mtime +2 -printf '"%p"\n' > set2.txt
Although, as @sarnold notes, this is not safe.
You should probably be executing your tar(1) command through some other mechanism; the find(1) program supports a -print0 option to request ASCII NUL-separated filename output, and the xargs(1) program supports a -0 option to tell it that the input is separated by ASCII NUL characters. (Since NUL is the only character that is not allowed in filenames, this is the only way to get reliable filename handling.)
Simply using the -print0 and -0 options will help but this still leaves the script open to another problem -- xargs(1) might decide to execute the tar(1) command two, three, or more times, depending upon its input. The last execution is the one that will "win", and the data from earlier invocations will be lost for ever. (This is useless as a backup.)
So you should also look into adding the --concatenate command line option to tar(1), too, so that it will add to the archive. It might make sense to perform the compression after all the files have been added, via gzip(1) or bzip2(1). (This does mean you need to remove the archive before a "fresh run" of this script.)
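With GNU find and GNU tar, the multiple-invocation problem can be avoided entirely by letting tar read the NUL-delimited file list itself in a single invocation and compressing afterwards; a sketch (the archive name backup.tar is just an example):
find /home -type f -path '*/assets/*.zip' -mtime +2 -print0 |
    tar --null --files-from=- -cf backup.tar
gzip backup.tar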
