script to traverse through directories and subdirectories to list files - linux

#!/bin/bash
#script to loop through directories to merge files
mydir=/data/
files="/data/*"
for f in $files
do
if[ -d "$f" ]
then
for ff in $f/*
do
echo "Processing $ff"
done
else
echo "Processing $f"
fi
done
I have the above code to go through directories and sub-directories and list all the files. I am getting the error: syntax error near unexpected token `then'
What am I doing wrong here?

if [ -d "$f" ]
^
There needs to be a space between if and [. If you don't have a space, bash thinks you're trying to execute a command named if[.
files="/data/*"
for f in $files
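Also be aware that this is fragile. It only works because $files is expanded unquoted, and it breaks if the pattern itself (for example the directory name) contains whitespace. To store a wildcard expansion in a variable reliably you need to use an array. The syntax is a bit hairier...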
files=(/data/*)
for f in "${files[@]}"
Or you could write the wildcard inline the way you do with the inner loop. That would work fine.
for f in "$mydir"/*
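The fixes above can be combined into one corrected version of the original loop. This is only a sketch: the function name list_two_levels is hypothetical, and the [ -e ... ] guards are an addition so that an empty directory does not print the unexpanded pattern.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the files in a directory and in its immediate
# subdirectories, using inline globs instead of a pattern stored in a string.
list_two_levels() {
    local top=$1 f ff
    for f in "$top"/*; do
        if [ -d "$f" ]; then                  # note the space between "if" and "["
            for ff in "$f"/*; do
                [ -e "$ff" ] || continue      # empty subdirectory: skip the literal pattern
                echo "Processing $ff"
            done
        elif [ -e "$f" ]; then
            echo "Processing $f"
        fi
    done
}
```

Calling list_two_levels /data would cover the same two levels as the script in the question.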
For what it's worth, you could use find to recurse through all files and sub-directories.
find /data/ -type f -print0 | while read -d $'\0' file; do
echo "Processing $file"
done
-type f matches files only. -print0 combined with -d $'\0' is a way to be extra careful with file names containing characters like spaces, tabs, and even newlines. It is legal to have these characters in file names so I like to write my scripts in a way that can handle them.
Note that this will recurse deeper than just sub-directories. It'll go all the way down. If that's not what you want, add -maxdepth 2.
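As a sketch of the depth-limited variant, the function below combines -maxdepth 2 with the NUL-delimited read loop. The name process_tree is hypothetical, and -maxdepth and -print0 are assumed to be available (they are in GNU and BSD find, but they are not POSIX).

```bash
#!/usr/bin/env bash
# Hypothetical helper: process files at most two levels below the given root.
process_tree() {
    local root=$1 file
    # -maxdepth is placed before tests such as -type, as GNU find expects
    find "$root" -maxdepth 2 -type f -print0 |
        while IFS= read -r -d '' file; do     # IFS= and -r keep the name byte-for-byte
            printf 'Processing %s\n' "$file"
        done
}
```

The loop body runs in a pipeline subshell, so variables assigned inside it are not visible after the loop; use done < <(find ...) instead when that matters.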

As an alternative, you could probably replace this entire loop with something like
# find all files either in /data or /data/subdir
find /data -maxdepth 2 -type f | while IFS= read -r file; do
echo "$file"
done

Here's a function that does what you ask. You pass it a folder; see the call at the bottom, for example func_process_folder_set "/folder".
# --- -------------------------------- --- #
# FUNC: Process a folder of files
# --- -------------------------------- --- #
func_process_folder_set(){
FOLDER="${1}"
while read -rd $'\0' file; do
fileext=${file##*.} # -- get the extension of the file (the text after the last dot)
fileext=${fileext,,} # -- lowercase the extension (bash 4+), e.g. for use in a case statement
echo "FILE: $file" # -- print the file (always quote "$file" to handle spaces)
done < <(find "${FOLDER}" -maxdepth 20 -type f -name '*.*' -print0)
}
# -- call the function above with this:
func_process_folder_set "/some/folder"

Related

Rename all files in multiple folders with some condition in a single Linux command or script.

I have multiple folders, each containing multiple files. I need to rename each file to the name of the folder it is stored in, followed by a "_partN" suffix.
For example,
I have a folder named "new_folder_for_upload" that contains 2 files. I need to rename these 2 files to:
new_folder_for_upload_part1
new_folder_for_upload_part2
I have many folders like this one, each with multiple files, and I need to rename all the files as described above.
Can anybody help me find a single Linux command or script to do this automatically?
Assuming a bash shell, that the file numbering should restart for each subdirectory, and that all files should be moved to the top directory (leaving the subdirectories empty). Formatted as a script for easier reading:
find . -type f -print0 | while IFS= read -r -d '' file
do
myfile=${file#./}
mydir=$(dirname "$myfile")
if [[ "$mydir" != "$lastdir" ]]
then
NR=1
fi
lastdir=${mydir}
mv "$myfile" "$(dirname "$myfile")_part${NR}"
((NR++))
done
Or as one-line command:
find . -type f -print0 | while IFS= read -r -d '' file; do myfile=${file#./}; mydir=$(dirname "$myfile"); if [[ "$mydir" != "$lastdir" ]]; then NR=1; fi; lastdir=${mydir}; mv "$myfile" "$(dirname "$myfile")_part${NR}"; ((NR++)); done
Beware. This is armed, and will do a bulk renaming / moving of every file in or below your current work directory. Use at your own risk.
To delete the empty subdirs:
find . -depth -empty -type d -delete
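Because the rename is destructive, it can help to preview the result first. The following is a rough dry-run sketch, not the answer's method: the function name plan_renames is hypothetical, it only prints what would be renamed, and it assumes the files stay inside their own folder rather than being moved to the top directory.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper: print "old -> new" for every regular file in
# each immediate subdirectory of the given top directory. Nothing is renamed.
plan_renames() {
    local top=$1 dir file n
    for dir in "$top"/*/; do
        [ -d "$dir" ] || continue             # no subdirectories: pattern stayed literal
        dir=${dir%/}                          # drop the trailing slash added by the glob
        n=1                                   # numbering restarts for each folder
        for file in "$dir"/*; do
            [ -f "$file" ] || continue
            printf '%s -> %s_part%d\n' "$file" "$dir/${dir##*/}" "$n"
            n=$((n + 1))
        done
    done
}
```

Once the printed plan looks right, the printf line could be swapped for mv -- "$file" "$dir/${dir##*/}_part$n".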

How to remove all but a few selected files in a directory?

I want to remove all files in a directory except some through a shell script. The name of files will be passed as command line argument and number of arguments may vary.
Suppose the directory has these 5 files:
1.txt, 2.txt, 3.txt, 4.txt, 5.txt
I want to remove two files from it through a shell script using file name. Also, the number of files may vary.
There are several ways this could be done, but the one that's most robust and highest performance with large directories is probably to construct a find command.
#!/usr/bin/env bash
# first argument is the directory name to search in
dir=$1; shift
# subsequent arguments are filenames to absolve from deletion
find_args=( )
for name; do
find_args+=( -name "$name" -prune -o )
done
if [[ $dry_run ]]; then
exec find "$dir" -mindepth 1 -maxdepth 1 "${find_args[@]}" -print
else
exec find "$dir" -mindepth 1 -maxdepth 1 "${find_args[@]}" -exec rm -f -- '{}' +
fi
Thereafter, to list files which would be deleted (if the above is in a script named delete-except):
dry_run=1 delete-except /path/to/dir 1.txt 2.txt
or, to actually delete those files:
delete-except /path/to/dir 1.txt 2.txt
A simple, straightforward way could be using the GLOBIGNORE variable.
GLOBIGNORE is a colon-separated list of patterns defining the set of filenames to be ignored by pathname expansion. If a filename matched by a pathname expansion pattern also matches one of the patterns in GLOBIGNORE, it is removed from the list of matches.
Thus, the solution is to iterate through the command line args, appending file names to the list. Then call rm *. Don't forget to unset GLOBIGNORE var at the end.
#!/bin/bash
for arg in "$@"
do
if [ "$arg" = "$1" ]
then
GLOBIGNORE=$arg
else
GLOBIGNORE=${GLOBIGNORE}:$arg
fi
done
rm *
unset GLOBIGNORE
Note: if GLOBIGNORE was already set, you can store its value in a temporary variable and restore it at the end.
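To see what GLOBIGNORE would leave for rm without deleting anything, a listing-only sketch like the one below may help. The function name list_except is hypothetical. It runs in a subshell so that the cd and the GLOBIGNORE setting do not leak into the caller, and it assumes the file names contain no colon.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the names in a directory that "rm *" would
# delete after the given file names have been put into GLOBIGNORE.
list_except() (
    # The ( ... ) function body is a subshell: cd, IFS and GLOBIGNORE stay local.
    cd "$1" || exit 1
    shift
    shopt -s nullglob        # if everything is ignored, expand to nothing
    IFS=:
    GLOBIGNORE="$*"          # join the remaining arguments with ":"
    files=(*)                # a non-empty GLOBIGNORE also makes * match dotfiles
    if (( ${#files[@]} )); then
        printf '%s\n' "${files[@]}"
    fi
)
```

For the five-file example, list_except /path/to/dir 1.txt 2.txt prints 3.txt, 4.txt and 5.txt, one per line.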
We can accomplish this in pure Bash, without the need for any external tools:
#!/usr/bin/env bash
# build an associative array that contains all the filenames to be preserved
declare -A skip_list
for f in "$@"; do
skip_list[$f]=1
done
# walk through all files and build an array of files to be deleted
declare -a rm_list
for f in *; do # loop through all files
[[ -f "$f" ]] || continue # not a regular file
[[ "${skip_list[$f]}" ]] && continue # skip this file
rm_list+=("$f") # now it qualifies for rm
done
# remove the files
printf '%s\0' "${rm_list[@]}" | xargs -0 rm -- # Thanks to Charles' suggestion
This solution will also work for files that have whitespaces or glob characters in them.
Thanks all for your answers. I have figured out my solution; below is what worked for me:
find /home/mydir -type f | grep -vw "goo" | xargs rm

A bash script to run a program for directories that do not have a certain file

I need a Bash script that executes a program for all directories that do not contain a specific file and creates the output file in the same directory. This program needs an input file, which exists in every directory with the name *.DNA.fasta. Suppose I have the following directories, which may also contain subdirectories:
dir1/a.protein.fasta
dir2/b.protein.fasta
dir3/anyfile
dir4/x.orf.fasta
I have started by finding the directories that do not contain that specific file, whose name is *.protein.fasta.
In this case I want dir3 and dir4 to be listed (since they do not contain *.protein.fasta).
I have tried this code:
find . -maxdepth 1 -type d \! -exec test -e '{}/*protein.fasta' \; -print
but it seems I missed something; it does not work.
I also do not know how to proceed with the rest.
This is a tricky one.
I can't think of a good solution. But here's a solution, nevertheless. Note that this is guaranteed not to work if your directory or file names contain newlines, and it's not guaranteed to work if they contain other special characters. (I've only tested with the samples in your question.)
Also, I haven't included a -maxdepth because you said you need to search subdirectories too.
#!/bin/bash
# Create an associative array
declare -A excludes
# Build an associative array of directories containing the file
while read line; do
excludes[$(dirname "$line")]=1
echo "excluded: $(dirname "$line")" >&2
done <<EOT
$(find . -name "*protein.fasta" -print)
EOT
# Walk through all directories, print only those not in array
find . -type d \
| while read line ; do
if [[ ! ${excludes[$line]} ]]; then
echo "$line"
fi
done
For me, this returns:
.
./dir3
./dir4
All of which are directories that do not contain a file matching *.protein.fasta. Of course, you can replace the last echo "$line" with whatever you need to do with these directories.
Alternately:
If what you're really looking for is just the list of top-level directories that do not contain the matching file in any subdirectory, the following bash one-liner may be sufficient:
for i in *; do test -d "$i" && ( find "$i" -name '*protein.fasta' | grep -q . || echo "$i" ); done
To run the program for each of those directories instead of printing them:
#!/bin/bash
for dir in *; do
test -d "$dir" && ( find "$dir" -name '*protein.fasta' | grep -q . || Programfoo "$dir/$dir.DNA.fasta" )
done
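For the per-directory check from the question (a directory qualifies when it does not directly contain a matching file), a NUL-safe sketch could look like this. The function name dirs_without is hypothetical. Unlike the one-liner above, it looks only at each directory's own files, not at its subdirectories.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every directory under $1 (including $1 itself)
# that does not directly contain a regular file matching the pattern $2.
dirs_without() {
    local top=$1 pattern=$2 dir
    while IFS= read -r -d '' dir; do
        # The inner find prints matching files in this directory only;
        # grep -q . succeeds when at least one name was printed.
        if ! find "$dir" -maxdepth 1 -type f -name "$pattern" -print | grep -q .; then
            printf '%s\n' "$dir"
        fi
    done < <(find "$top" -type d -print0)
}
```

With the sample layout, dirs_without . '*protein.fasta' should list ., ./dir3 and ./dir4; the printf line is where the real program would be invoked.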

bash script collecting filenames seems to get confused by spaces

I'm trying to build a script that lists all the zip files in a set of directories, with some filters, and writes them to a file, but when a filename has a space in it, each word of the name appears on a new line.
This list will eventually be used as an input to tar to gzip all the zip files, script is below:
#!/bin/bash
rm -f set1.txt
rm -f set2.txt
for line in $(find /home -type d -name assets ;);
do
echo $line >> set1.txt
for line in $(find $line -type f -name \*.zip -mtime +2 ;);
do
echo \"$line\" >> set2.txt
done
done
This works as expected until you get a space in a filename then set2.txt contains entries like this:
"/home/xxxxxx/oldwebroot/htdocs/upload/assets/jobbags/rbjbCost"
"in"
"use"
"sept"
"2010.zip"
Does anyone know how I can keep these filenames with spaces on a single line, with the whole name wrapped in one set of quotes?
Thanks!
The correct way to loop over a set of files located via find is with a while read construct, thus:
while IFS= read -r -d '' line ; do
echo "$line" >> set1.txt
while IFS= read -r -d '' file ; do
printf '"%s"\n' "$file" >> set2.txt
done < <(find "$line" -type f -name \*.zip -mtime +2 -print0)
done < <(find /home -type d -name assets -print0)
For clarity I have given the inner loop variable a different name.
If you didn't have bash you'd have to issue the find command separately and redirect the output to a file, then read the file with while read ; do .. done < filename.
Note that each expansion of each variable is double-quoted. This is necessary.
Note also, however, that for what you want you can simply use the -printf switch to find, if you have GNU find.
find /home -type f -path '*/assets/*.zip' -mtime +2 -printf '"%p"\n' > set2.txt
Although, as @sarnold notes, this is not safe.
You should probably be executing your tar(1) command through some other mechanism; the find(1) program supports a -print0 option to request ASCII NUL-separated filename output, and the xargs(1) program supports a -0 option to tell it that the input is separated by ASCII NUL characters. (Since NUL is the only character that is not allowed in filenames, this is the only way to get reliable filename handling.)
Simply using the -print0 and -0 options will help but this still leaves the script open to another problem -- xargs(1) might decide to execute the tar(1) command two, three, or more times, depending upon its input. The last execution is the one that will "win", and the data from earlier invocations will be lost for ever. (This is useless as a backup.)
So you should also look into adding the --concatenate command line option to tar(1), too, so that it will add to the archive. It might make sense to perform the compression after all the files have been added, via gzip(1) or bzip2(1). (This does mean you need to remove the archive before a "fresh run" of this script.)
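One way to sidestep the repeated-invocation problem is to let tar read the NUL-separated list itself, so there is exactly one tar run. This is a sketch that assumes GNU tar (for --null and -T -) and GNU or BSD find; the function name archive_old_zips and its two parameters are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: archive zip files older than two days that live in
# any "assets" directory below $1, writing a gzip-compressed tarball to $2.
archive_old_zips() {
    local root=$1 archive=$2
    # find emits NUL-terminated names; tar reads them from stdin (-T -)
    # and --null tells it the list is NUL-separated.
    find "$root" -type f -path '*/assets/*.zip' -mtime +2 -print0 |
        tar --null -T - -czf "$archive"
}
```

Since tar is started only once, nothing is overwritten by a later invocation, and --concatenate is not needed in this variant.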

How to loop over directories in Linux?

I am writing a script in bash on Linux and need to go through all subdirectory names in a given directory. How can I loop through these directories (and skip regular files)?
For example:
the given directory is /tmp/
it has the following subdirectories: /tmp/A, /tmp/B, /tmp/C
I want to retrieve A, B, C.
All answers so far use find, so here's one with just the shell. No need for external tools in your case:
for dir in /tmp/*/ # list directories in the form "/tmp/dirname/"
do
dir=${dir%*/} # remove the trailing "/"
echo "${dir##*/}" # print everything after the final "/"
done
cd /tmp
find . -maxdepth 1 -mindepth 1 -type d -printf '%f\n'
A short explanation:
find finds files (quite obviously)
. is the current directory, which after the cd is /tmp (IMHO this is more flexible than having /tmp directly in the find command. You have only one place, the cd, to change, if you want more actions to take place in this folder)
-maxdepth 1 and -mindepth 1 make sure that find only looks in the current directory and doesn't include . itself in the result
-type d looks only for directories
-printf '%f\n' prints only the found folder's name (plus a newline) for each hit.
Et voilà!
You can loop through all directories including hidden directories (beginning with a dot) with:
for file in */ .*/ ; do echo "$file is a directory"; done
Note: using the list */ .*/ works in zsh only if at least one hidden directory exists in the folder. In bash it will also show . and ..
Another possibility for bash to include hidden directories would be to use:
shopt -s dotglob;
for file in */ ; do echo "$file is a directory"; done
If you want to exclude symlinks:
for file in */ ; do
if [[ -d "$file" && ! -L "$file" ]]; then
echo "$file is a directory";
fi;
done
To output only the trailing directory name (A, B, C, as asked in the question) in each solution, use this within the loops:
file="${file%/}" # strip trailing slash
file="${file##*/}" # strip path and leading slash
echo "$file is the directoryname without slashes"
Example (this also works with directories which contain spaces):
mkdir /tmp/A /tmp/B /tmp/C "/tmp/ dir with spaces"
for file in /tmp/*/ ; do file="${file%/}"; echo "${file##*/}"; done
Works with directories that contain spaces
Inspired by Sorpigal
while IFS= read -d $'\0' -r file ; do
echo "$file"; ls "$file"
done < <(find /path/to/dir/ -mindepth 1 -maxdepth 1 -type d -print0)
Original post (Does not work with spaces)
Inspired by Boldewyn: Example of loop with find command.
for D in $(find /path/to/dir/ -mindepth 1 -maxdepth 1 -type d) ; do
echo $D ;
done
find . -mindepth 1 -maxdepth 1 -type d -printf "%P\n"
The technique I use most often is find | xargs. For example, if you want to make every file in this directory and all of its subdirectories world-readable, you can do:
find . -type f -print0 | xargs -0 chmod go+r
find . -type d -print0 | xargs -0 chmod go+rx
The -print0 option terminates each file name with a NUL character instead of a newline. The -0 option tells xargs to split its input on the same character. So this is the combination to use on files with spaces.
You can picture this chain of commands as taking every line output by find and sticking it on the end of a chmod command.
If the command you want to run takes its argument in the middle instead of at the end, you have to be a bit creative. For instance, I needed to change into every subdirectory and run the command latexmk -c. So I used (from Wikipedia):
find . -type d -depth 1 -print0 | \
xargs -0 sh -c 'for dir; do pushd "$dir" && latexmk -c && popd; done' fnord
This has the effect of for dir in $(subdirs); do stuff; done, but is safe for directories with spaces in their names. Also, the separate calls to stuff are made in the same shell, which is why the command has to return to the original directory with popd.
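A possible alternative to pushd and popd, which are bash features and are not available in every sh, is to run each command in a subshell so that the directory change is discarded automatically. The sketch below uses a plain glob instead of find | xargs; the function name run_in_subdirs is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command inside every immediate subdirectory
# of $1. The remaining arguments are the command and its arguments.
run_in_subdirs() {
    local top=$1 dir
    shift
    for dir in "$top"/*/; do
        [ -d "$dir" ] || continue     # no subdirectories: pattern stayed literal
        (cd "$dir" && "$@")           # subshell: the cd ends with the parentheses
    done
}
```

For the example above this would be called as run_in_subdirs . latexmk -c. Hidden directories are skipped unless dotglob is set.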
A minimal bash loop you can build on (based on ghostdog74's answer):
for dir in directory/*
do
echo "${dir}"
done
To zip a whole bunch of files by directory:
for dir in directory/*
do
zip -r "${dir##*/}" "${dir}"
done
If you want to execute multiple commands in a for loop, you can save the result of find with mapfile (bash >= 4) as a variable and go through the array with ${dirlist[@]}. It also works with directories containing spaces.
The find command is based on the answer by Boldewyn. Further information about the find command can be found there.
IFS=""
mapfile -t dirlist < <( find . -maxdepth 1 -mindepth 1 -type d -printf '%f\n' )
for dir in "${dirlist[@]}"; do
echo ">${dir}<"
# more commands can go here ...
done
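If directory names might even contain newlines, the same idea can be sketched with NUL-delimited input. This assumes bash 4.4 or later, where mapfile accepts -d; the function name show_subdirs is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: collect the immediate subdirectories of $1 into an
# array, one element per directory, then print each base name.
show_subdirs() {
    local top=$1 dir
    local -a dirlist
    # -d '' splits on NUL (bash >= 4.4); -t strips the delimiter from each entry
    mapfile -d '' -t dirlist < <(find "$top" -mindepth 1 -maxdepth 1 -type d -print0)
    for dir in "${dirlist[@]}"; do
        printf '>%s<\n' "${dir##*/}"
        # more commands can go here ...
    done
}
```

Quoting "${dirlist[@]}" keeps every element intact, so IFS does not need to be changed.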
TL;DR:
(cd /tmp; for d in */; do echo "${d%/}"; done)
Explanation.
There's no need to use external programs. What you need is a shell globbing pattern. To avoid the need to remove /tmp afterward, I'm running it in a subshell, which may or may not be suitable for your purposes.
Shell globbing patterns in a nutshell:
* Matches any string, including the empty string.
? Matches exactly one character.
[...] Matches one character from between the brackets. You can also specify ranges ([a-z], [A-F0-9], etc.) or classes ([[:digit:]], [[:alpha:]], etc.).
[^...] Matches one character that is not between the brackets.
Note: if no file names match the pattern, the shell returns the pattern unchanged. Any character or string that is not one of the above represents itself.
Consequently, the pattern */ will match any file name that ends with a /. A trailing / in a file name unambiguously identifies a directory.
The last bit is removing the trailing slash, which is achieved with the variable substitution ${var%PATTERN}, which removes the shortest matching pattern from the end of the string contained in var, and where PATTERN is any valid globbing pattern. So we write ${d%/}, meaning we want to remove the trailing slash from the string represented by d.
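The pieces described above (the */ pattern, the subshell, and ${d%/}) can be wrapped into a small reusable sketch. The function name subdir_names is hypothetical, and nullglob is an addition that covers the case where the pattern matches nothing.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the names of the immediate subdirectories of $1.
subdir_names() (
    # The ( ... ) function body is a subshell, so the cd does not affect the caller.
    cd "$1" || exit 1
    shopt -s nullglob                 # an unmatched */ expands to nothing
    for d in */; do                   # */ also matches symlinks to directories
        printf '%s\n' "${d%/}"        # strip the trailing slash
    done
)
```

Calling subdir_names /tmp on the example from the question would print A, B and C on separate lines.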
find . -maxdepth 1 -type d
In short, put the results of find into an array and iterate the array and do what you want. Not the quickest but more organized thinking.
#!/bin/bash
cd /tmp
declare -a results=(`find -type d`)
#Iterate the results
for path in ${results[@]}
do
echo "Your path is $path"
#Do something with the path..
if [[ $path =~ "/A" ]]; then
echo $path | awk -F / '{print $NF}'
#prints A
elif [[ $path =~ "/B" ]]; then
echo $path | awk -F / '{print $NF}'
#Prints B
elif [[ $path =~ "/C" ]]; then
echo $path | awk -F / '{print $NF}'
#Prints C
fi
done
This can be reduced to:
find -type d | grep "/A" | awk -F / '{print $NF}' # prints A
find -type d | grep "/B" | awk -F / '{print $NF}' # prints B
find -type d | grep "/C" | awk -F / '{print $NF}' # prints C
