find returning inverted results - linux

In short, I wrote this little script to clean up some directories where I had consolidated directories and files from multiple sources. I used cp with the --backup=numbered option, so files with identical names got a suffix like .~1~ appended instead of being overwritten. I then ran fdupes to remove duplicate files. In some cases fdupes removed the file without the suffix (the original file), so I wanted to scan the directories for files carrying the cp suffix: if no file exists with the suffix removed, I would mv the file to that name; otherwise I would leave it alone, since fdupes did not consider it a duplicate.
The issue is that the test condition, the if [ -f ... ] part of the code below, returns the inverse of what it should, and I cannot understand why. For example, when the file exists it returns false, and when the file does not exist it returns true. I worked around it by swapping the two actions, verified that it then behaved as intended, and ran it that way, but I would like to know if anyone knows why it behaves like this. I am not a bash expert by any means, so it's possible that I missed something simple.
#!/bin/bash
logfile=$$.log
exec > $logfile 2>&1
IFS='
'
#set -f
for FILE in $(find . -type f -regextype posix-extended -regex '^.*(\.~[0-9]+~)+$')
do
FILE2=${FILE%%.~[0-9]*} # remove the suffix
if [ -f "${FILE2}" ]
then
echo ERROR: "${FILE2}" already exists!
else
echo "${FILE}" renamed "${FILE2}"
mv "${FILE}" "${FILE2}"
fi
done

You might be able to see the problem by modifying your script to show both FILE and FILE2 in the error message. There are a few minor problems with the script which could cause some confusion (but not the "inverted" logic):
find output is not sorted. If a file had more than one backup, whichever one find happened to list first would replace the original file;
you could sort the output by appending something like |sort -t~ -n -k2 to the find command.
the regular expression allows multiple repetitions of the \.~[0-9]+~ pattern. Conceivably you could have some odd file whose name ends with .~1~.~2~.
the part where the suffix is removed assumes a single .~[0-9]+~ at the end of the filename. Because %% removes the longest match of .~[0-9]*, an embedded .~0, e.g. foo.~0bar.~1~, would reduce FILE2 to foo. The workaround for that would be more cumbersome (since the suffix-stripping uses globbing), but could be done with a case statement which matched an explicit number of digits (likely three digits would be enough).
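The case-statement workaround mentioned above could look roughly like the sketch below. The function name strip_backup_suffix is made up for illustration, and the limit of one to three digits is the assumption suggested in the answer, not something cp guarantees.

```bash
# strip_backup_suffix NAME
# Print NAME without one trailing cp-style backup suffix (.~N~ with one to
# three digits). Names that do not end in such a suffix are printed unchanged.
strip_backup_suffix() {
    case $1 in
        *.~[0-9]~ | *.~[0-9][0-9]~ | *.~[0-9][0-9][0-9]~)
            # A single % removes the shortest trailing match of ".~*~",
            # which here is exactly the final suffix.
            printf '%s\n' "${1%.~*~}"
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}
```

Inside the loop this would replace the FILE2=${FILE%%.~[0-9]*} line with FILE2=$(strip_backup_suffix "$FILE"), so a name such as foo.~0bar.~1~ becomes foo.~0bar rather than foo.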

Related

`mv somedir/* someotherdir` when somedir is empty

I am writing an automated bash script that moves some files from one directory to another directory, but the first directory may be empty:
$ mv somedir/* someotherdir/
mv: cannot stat 'somedir/*': No such file or directory
How can I write this command without generating an error if the directory is empty? Should I just use rm and cp instead? I could write a conditional check to see if the directory is empty first, but that feels overweight.
I'm surprised the command fails if the directory is empty, so I'm trying to find out if I'm missing some simple solution.
Environment:
bash
RHEL
If you really want full control over the process, it might look like:
#!/usr/bin/env bash
# ^^^^- bash, not sh
restore_nullglob=$(shopt -p nullglob) # store the initial state of the nullglob setting
shopt -s nullglob # unconditionally enable nullglob
source_files=( somedir/* ) # store matching files in an array
if (( ${#source_files[@]} )); then # if that array isn't empty...
mv -- "${source_files[@]}" someotherdir/ # ...move the files it contains...
else # otherwise...
echo "No files to move; doing nothing" >&2 # ...write an error message.
fi
eval "$restore_nullglob" # restore nullglob to its original setting
Explaining the moving parts:
When nullglob is set, the shell expands *.txt to an empty list if no .txt files exist; otherwise (by default), it expands *.txt to the string *.txt when there are no matching files.
source_files is an array above -- bash's native mechanism to store a list. ${#source_files[@]} expands to the length of that array, whereas ${source_files[@]} on its own expands to its contents.
(( )) creates an arithmetic context, in which expressions are treated as math. In such a context, 0 is falsey, and positive numbers are truthy. Thus, if (( ${#source_files[@]} )) is true only if there is at least one file listed in the array source_files.
BTW, note that saving and restoring nullglob isn't really essential in an independent script; the purpose of showing how to do it is so you can safely use this code in larger scripts that might make assumptions about whether or not nullglob is set, without disrupting other code.
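To see the save/enable/restore pattern in isolation, here is a minimal sketch wrapped in a function. The name count_entries is hypothetical; it only counts what DIR/* matches (so dotfiles are not included) and leaves the caller's nullglob setting as it found it.

```bash
# count_entries DIR: print how many entries the glob DIR/* matches.
count_entries() {
    local restore_nullglob entries
    restore_nullglob=$(shopt -p nullglob)   # remember the caller's setting
    shopt -s nullglob                       # an unmatched glob now expands to nothing
    entries=( "$1"/* )                      # empty array for an empty directory
    eval "$restore_nullglob"                # put the setting back
    echo "${#entries[@]}"
}
```

A caller could then write something like if (( $(count_entries somedir) )); then mv -- somedir/* someotherdir/; fi, at the cost of expanding the glob twice.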
find somedir -type f -exec mv -t someotherdir/. '{}' +
Saves you the check, may not be what you want, though.
Are you aware of the output stream and the error stream? Output stream has number 1, while error stream has number 2. In case you don't want to see a result, you can redirect that result to the garbage bin.
Excuse me?
Well, let's have a look at this case: when the directory is empty, an error is generated and that error is shown in the error stream (2). You can redirect this, using 2>/dev/null (/dev/null being the UNIX/Linux garbage bin), so your command becomes:
$ mv somedir/* someotherdir/ 2>/dev/null
Following up on Dominique, to report all errors except the empty directory one use:
mv somedir/* someotherdir 2>&1 | grep -v No.such

how to iterate over files using find in bash/ksh shell

I am using find in a loop to search recursively for files of a specific extension, and then do something with that loop.
cd $DSJobs
jobs=$(find $DSJobs -name "*.dsx")
for j in jobs; do
echo "$j"
done
Assuming $DSJobs is a relevant folder, the output of $j is "jobs" one time; it doesn't even repeat.
I want to list all *.dsx files in a folder recursively, through subfolders as well.
How do I make this work?
Thanks
The idiomatic way to do this is:
cd "$DSJobs"
find . -name "*.dsx" -print0 | while IFS= read -r -d "" job; do
echo "$job"
done
The complication derives from the fact that space and newline are perfectly valid filename characters, so you get find to output the filenames separated by the null character (which is not allowed to appear in a filename). Then you tell read to use the null character (with -d "") as the delimiter while reading the names.
IFS= read -r var is the way to get bash to read the characters verbatim, without dropping any leading/trailing whitespace or any backslashes.
There are further complications regarding the use of the pipe, which may or may not matter to you depending on what you do inside the loop.
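One such complication, as far as I can tell the usual one, is that in bash each part of a pipeline runs in a subshell by default, so variables set inside the while loop are gone once the loop ends. A sketch of a way around it is to feed the loop through process substitution instead of a pipe; the function name count_dsx is made up for this example.

```bash
# count_dsx DIR: print the number of *.dsx files found under DIR.
count_dsx() {
    local count=0 job
    # Process substitution keeps the loop in the current shell, so the
    # counter is still set after "done". With "find ... | while ..." the
    # loop would normally run in a subshell and count would remain 0 here.
    while IFS= read -r -d '' job; do
        count=$((count + 1))
    done < <(find "$1" -name '*.dsx' -print0)
    echo "$count"
}
```

Calling count_dsx "$DSJobs" prints the number of matches, including files whose names contain spaces or newlines.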
Note: take care to quote your variables, unless you know exactly when to leave the quotes off. Very detailed discussion here.
Having said that, bash can do this without find:
shopt -s globstar
cd "$DSJobs"
for job in **/*.dsx; do
echo "$job"
done
This approach removes all the complications of find | while read.
Incorporating @Gordon's comment:
shopt -s globstar nullglob
for job in "$DSJobs"/**/*.dsx; do
do_stuff_with "$job"
done
The "nullglob" setting is useful when no files match the pattern. Without it, the for loop will have a single iteration where job will have the value job='/path/to/DSJobs/**/*.dsx' (or whatever the contents of the variable) -- including the literal asterisks.
Since all you want is to find files with a specific extension...
find ${DSJobs} -name "*.dsx"
Want to do this for several directories?
for d in <some list of directories>; do
find "${d}" -name "*.dsx"
done
Want to do something interesting with the files?
find ${DSJobs} -name "*.dsx" -exec dostuffwith.sh "{}" \;

Deleting all files in a directory except the ones mentioned in a list [duplicate]

I have a directory called a00 containing 3000 files with the extension .SAC. I have a text file called gd.list containing the names of 88 of those 3000 files. I am trying to write a script that will delete all .SAC files except those mentioned in gd.list.
How can I do that using shell/bash?
The rm command is commented out so that you can check and verify that it's working as needed. Then just un-comment that line.
The check directory section will ensure you don't accidentally run the script from the wrong directory and clobber the wrong files.
You can remove the echo deleting line to run silently.
#!/bin/bash
cd /home/me/myfolder2tocleanup/
# Exit if the directory isn't found.
if (($?>0)); then
echo "Can't find work dir... exiting"
exit
fi
for i in *; do
if ! grep -qxFe "$i" filelist.txt; then
echo "Deleting: $i"
# The next line is commented out. Test it, then uncomment it to remove the files.
# rm "$i"
fi
done
You can find the answer here https://askubuntu.com/questions/830776/remove-file-but-exclude-all-files-in-a-list by L. D. James
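There are a few alternatives.
I'd prefer find with NUL-separated output and grep -z, as that more clearly demarcates the file names (this needs GNU find and grep; -printf '%f\0' prints bare names so that -x can match them against the list, and -v keeps only the names that are not in gd.list):
find . -maxdepth 1 -name '*.SAC' -printf '%f\0' | grep -z -v -x -F -f gd.list | xargs -0 echo rm --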
Again, test this first. Perhaps sort the output and make sure it is unique versus the original file.
For a smaller list of filenames I would recommend just using find with -and -not -name and -delete, but with a larger list that can be tricky.
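For a handful of names, the -not -name idea might be sketched as follows. The function name list_deletable is hypothetical, the keep names are passed as arguments rather than read from gd.list, and the function only prints candidates. Note that -name takes a glob pattern, so a keep name containing *, ? or [ would need escaping.

```bash
# list_deletable DIR KEEP...: print the *.SAC files in DIR whose names are
# not among the KEEP arguments. It only prints; nothing is deleted.
list_deletable() {
    local dir=$1 name
    local -a excludes=()
    shift
    for name in "$@"; do
        excludes+=( ! -name "$name" )   # one "not this name" test per keeper
    done
    find "$dir" -maxdepth 1 -type f -name '*.SAC' "${excludes[@]}" -print
}
```

Once the printed list looks right, -print could be swapped for -delete (a GNU and BSD find extension).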
You could tag the files you want to keep as read-only, then delete the wildcard with the appropriate setting in rm or find to skip read-only files. That assumes you own the read-only flag. You could tag the files as executable, and use find, if the read-only flag is not for you.
Another option would be to move the matching files to a temp folder, delete the wildcard, then move the files you want to keep back. That is assuming you can afford for the files to disappear temporarily.
To make them disappear for a shorter time, move the kept files out to a temp directory, move the original directory out, move the temp directory in, then delete the moved-out directory.
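The simpler of the two move-based variants (park the keepers, delete the rest, move the keepers back) could be sketched like this. The function name keep_only_listed and the hidden temp directory inside DIR are assumptions of this example, and it expects the list to hold one plain file name per line with no directory parts. Try it on a copy of the data first.

```bash
# keep_only_listed DIR LIST: delete DIR/*.SAC except the names listed in LIST.
keep_only_listed() {
    local dir=$1 list=$2 tmp name f
    tmp=$(mktemp -d "$dir/.keep.XXXXXX") || return 1
    # Park every listed file that actually exists in the temp directory.
    while IFS= read -r name || [ -n "$name" ]; do
        if [ -n "$name" ] && [ -f "$dir/$name" ]; then
            mv -- "$dir/$name" "$tmp/"
        fi
    done < "$list"
    rm -f -- "$dir"/*.SAC               # only unlisted files are left to match
    for f in "$tmp"/*; do               # move the keepers back
        if [ -e "$f" ]; then mv -- "$f" "$dir/"; fi
    done
    rmdir -- "$tmp"
}
```

For the question's layout this would be called as keep_only_listed a00 gd.list.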
If you are feeling brave, try something like
ls *.SAC | fgrep -v -f gd.list | xargs echo rm
Note that I've put an echo in that xargs, just to make sure no one has a cut and paste accident.
Note also the limitations of this approach mentioned in the comments. As I said, if you are feeling brave...

Linux copy file and rename to substring of filename

My files got this structure:
mynewfile-runtime-tested-1102-19.4-alpha.zip
mysdk-sdk-tested-1102-19.4-alpha.zip
sources-tested-1102-19.4-alpha.zip
I am looking for a way to dynamically detect and drop a suffix such as tested-1102-19.4-alpha and to copy the files under new names, so that they look like:
mynewfile-runtime.zip
mysdk-sdk.zip
sources.zip
The suffix should be detected dynamically by its delimiters ('-'): another batch of my files has a suffix like nottested-404-11.2.34-beta, and yet another has final-01-1-release. The only thing that remains constant is the '-' delimiter.
for file in *.zip; do
mv "$file" "${file%-*-*-*-*.zip}.zip"
done
This is fully portable POSIX shell, without forks to sed or other programs.
The ${file%pattern} bit says to remove the shortest matching string.
You can also remove the longest match with %%, or from the left with # and ##, respectively.
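Since the question asks for copies rather than renames, the same expansion can be combined with cp. This is only a sketch: copy_trimmed and its two directory arguments are names invented here, and it assumes the suffix always consists of exactly four dash-separated fields, as in the examples given.

```bash
# copy_trimmed SRC_DIR DEST_DIR
# Copy each SRC_DIR/*.zip whose name ends in four dash-separated fields to
# DEST_DIR under a name with those four fields removed.
copy_trimmed() {
    local path name
    for path in "$1"/*-*-*-*-*.zip; do
        [ -e "$path" ] || continue        # the glob matched nothing
        name=${path##*/}                  # ## drops the longest "*/" prefix: the base name
        cp -- "$path" "$2/${name%-*-*-*-*.zip}.zip"
    done
}
```

Here ## strips from the left and % strips from the right, so both operators from the explanation above appear in one place.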
To only move the files that match the pattern you can do this:
#!/bin/sh
suffix='-*-*-*-*.zip'
for file in *$suffix
do
trimmed=${file%$suffix}
echo mv "$file" "$trimmed".zip
done
Remove the echo when you are confident with the result.

How to remove the extension of a file?

I have a folder that is full of .bak files and some other files also. I need to remove the extension of all .bak files in that folder. How do I make a command which will accept a folder name and then remove the extension of all .bak files in that folder ?
Thanks.
To remove a string from the end of a BASH variable, use the ${var%ending} syntax. It's one of a number of string manipulations available to you in BASH.
Use it like this:
# Run in the same directory as the files
for FILENAME in *.bak; do mv "$FILENAME" "${FILENAME%.bak}"; done
That works nicely as a one-liner, but you could also wrap it as a script to work in an arbitrary directory:
# If we're passed a parameter, cd into that directory. Otherwise, do nothing.
if [ -n "$1" ]; then
cd "$1"
fi
for FILENAME in *.bak; do mv "$FILENAME" "${FILENAME%.bak}"; done
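Note that quoting your variables is what keeps this safe for filenames containing spaces: the names produced by the glob in for FILENAME in *.bak are not split on whitespace. The remaining pitfall is a directory with no .bak files, where the loop runs once with the literal string *.bak. Read David W.'s answer for a solution that also handles subdirectories, and this document for alternative solutions.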
There are several ways to remove file suffixes:
In BASH and Kornshell, you can use variable filtering (the manual calls it parameter expansion). Search for ${parameter%word} in the BASH manpage for complete information. Basically, # is a left filter and % is a right filter. You can remember this because # is to the left of % on the keyboard.
If you use a double filter (i.e. ## or %%), you are trying to filter on the biggest match. If you have a single filter (i.e. # or %), you are trying to filter on the smallest match.
What matches is filtered out and you get the rest of the string:
file="this/is/my/file/name.txt"
echo "${file#*/}" # Matches "this/" and will print out "is/my/file/name.txt"
echo "${file##*/}" # Matches "this/is/my/file/" and will print out "name.txt"
echo "${file%/*}" # Matches "/name.txt" and will print out "this/is/my/file"
echo "${file%%/*}" # Matches "/is/my/file/name.txt" and will print out "this"
Notice this is a glob match and not a regular expression match! If you want to remove a file suffix:
file_sans_ext=${file%.*}
The .* will match on the period and all characters after it. Since it is a single %, it will match on the smallest glob on the right side of the string. If the filter can't match anything, the result is the same as your original string.
You can verify a file suffix with something like this:
if [ "${file}" != "${file%.bak}" ]
then
echo "$file is a type '.bak' file"
else
echo "$file is not a type '.bak' file"
fi
Or you could do this:
file_suffix=${file##*.}
echo "My file is a file '.$file_suffix'"
Note that this will remove the period of the file extension.
Next, we will loop:
find . -name "*.bak" -print0 | while IFS= read -r -d $'\0' file
do
echo "mv '$file' '${file%.bak}'"
done | tee find.out
The find command finds the files you specify. The -print0 separates the names of the files with a NUL character, which is one of the few characters not allowed in a file name. The -d $'\0' means that your input separator is the NUL character. See how nicely find -print0 and read -d $'\0' work together?
You should almost never use the for file in $(*.bak) method. This will fail if the files have any white space in the name.
Notice that this command doesn't actually move any files. Instead, it produces a find.out file with a list of all the file renames. You should always do something like this when you do commands that operate on massive amounts of files just to be sure everything is fine.
Once you've determined that all the commands in find.out are correct, you can run it like a shell script:
$ bash find.out
rename .bak '' *.bak
(rename is in the util-linux package)
Caveat: there is no error checking:
#!/bin/bash
cd "$1"
for i in *.bak ; do mv -f "$i" "${i%%.bak}" ; done
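A version with some error checking might look like the sketch below. The function name strip_bak is invented here; it takes the folder name the question asks for, defaults to the current directory, and declines to overwrite an existing file instead of using mv -f.

```bash
# strip_bak [DIR]: remove the .bak extension from every *.bak file in DIR.
strip_bak() {
    local dir=${1:-.} f target
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    for f in "$dir"/*.bak; do
        [ -e "$f" ] || continue          # no match: the glob stayed literal
        target=${f%.bak}
        if [ -e "$target" ]; then
            echo "skipping $f: $target already exists" >&2
            continue
        fi
        mv -- "$f" "$target"
    done
}
```

Because the directory is part of the glob, no cd is needed and the caller's working directory is left alone.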
You can always use the find command to get all the subdirectories
for FILENAME in `find . -name "*.bak"`; do mv --force "$FILENAME" "${FILENAME%.bak}"; done

Resources