How to find, copy and rename files in Linux? - linux

I am trying to find all files in a directory and its sub-directories and then copy them to a different directory. However, some of them have the same name, so I need to copy the files over and, when two files have the same name, rename one of them.
So far I have managed to copy all found files with a unique name over using:
#!/bin/bash
if [ ! -e "$2" ] ; then
    mkdir "$2"
    echo "Directory created"
fi
if [ ! -e "$1" ] ; then
    echo "image source does not exist"
    exit 1
fi
find "$1" -name 'IMG_*.JPG' -exec cp {} "$2" \;
However, I now need some sort of if statement to figure out if a file has the same name as another file that has been copied.

Since you are on Linux, you are probably using cp from GNU coreutils. If that is the case, let it do the backup for you by using cp --backup=t (numbered backups).
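For example, combined with the find command from the question (a sketch; "$1" and "$2" are the source and destination directories, as in the script above):
find "$1" -name 'IMG_*.JPG' -exec cp --backup=t {} "$2" \;
When a name collides, the preexisting file in "$2" is renamed to a numbered backup such as IMG_0001.JPG.~1~ and the new copy takes the plain name.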

Try this approach: put the list of files in a variable and copy each file, checking whether the copy operation succeeds. If not, try a different name.
In code:
FILES=$(find "$1" -name 'IMG_*.JPG')
# note: the unquoted $FILES word-splits, so this breaks on filenames with spaces
for FILE in $FILES; do
    cp -n "$FILE" destination
    # Check the return status of the latest command (i.e. cp)
    # through the $? variable and, in case of failure,
    # choose a different name for the destination
done
Inside the for statement, you can also put some incremental integer to try different names incrementally (e.g., name_1, name_2 and so on, until the cp command succeeds).
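A sketch of that idea, spelled out for the loop body above (hypothetical names; note that whether cp -n reports a skipped copy through its exit status varies between coreutils versions, so testing for the name explicitly is more portable):
base=$(basename "$FILE")
target="destination/$base"
n=0
# keep trying name_1, name_2, ... until the name is free
while [ -e "$target" ]; do
    n=$((n + 1))
    target="destination/${base}_$n"
done
cp "$FILE" "$target"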

You can do:
shopt -s globstar   # make ** match sub-directories (bash 4+)
for file in "$1"/**/IMG_*.JPG ; do
    target=$2/$(basename "$file")
    SUFF=""   # try the plain name first, then 1, 2, ...
    while [[ -e "$target$SUFF" ]] ; do
        (( SUFF++ ))
    done
    cp "$file" "$target$SUFF"
done
in your script, in place of the find command, to append integer suffixes to identically-named files.

You can use rsync with the following switches for more control:
rsync --backup --backup-dir=DIR --suffix=SUFFIX -az <source dir> <destination dir>
Here is what they do (from the man page):
-b, --backup
With this option, preexisting destination files are renamed as each file is transferred or deleted. You can control where the backup file goes and what (if any) suffix gets appended using the --backup-dir and --suffix options.
--backup-dir=DIR
In combination with the --backup option, this tells rsync to store all backups in the specified directory on the receiving side. This can be used for incremental backups. You can additionally specify a backup suffix using the --suffix option (otherwise the files backed up in the specified directory will keep their original filenames).
--suffix=SUFFIX
This option allows you to override the default backup suffix used with the --backup (-b) option. The default suffix is a ~ if no --backup-dir was specified, otherwise it is an empty string.
You can use rsync to sync two folders on the local file system or with a remote file system. You can even sync over an ssh connection.
rsync is amazingly powerful. See the man page for all the options.
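For instance (a sketch with hypothetical paths; a relative --backup-dir is interpreted relative to the destination):
rsync -az --backup --backup-dir=old_versions --suffix=.bak /data/photos/ /backup/photos/
Any file in /backup/photos/ that would be overwritten is first moved into /backup/photos/old_versions/ and given a .bak suffix.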

Related

Create folders automatically and move files

I have a lot of daily files, sorted by hour, that come from a data-logger (waveform). I downloaded them onto a USB stick; now I need to save them inside folders named with the first 8 digits of each waveform filename.
Those files have the following pattern:
Year-Month-Day-hourMinute-##.Code_Station_location_Channel
for example, inside the USB I have:
2020-10-01-0000-03.AM_REDDE_00_EHE; 2020-10-01-0100-03.AM_REDDE_00_EHE; 2020-10-02-0300-03.AM_REDDE_00_EHE; 2020-10-20-0000-03.AM_REDDE_00_EHE; 2020-10-20-0100-03.AM_REDDE_00_EHE; 2020-11-15-2000-03.AM_REDDE_00_EHE; 2020-11-15-2100-03.AM_REDDE_00_EHE; 2020-11-19-0400-03.AM_REDDE_00_EHE; 2020-11-19-0900-03.AM_REDDE_00_EHE;
I slightly modified some code from #user3360767 (shell script to create folder daily with time-stamp and push time-stamp generated logs) to speed up the procedure of creating a folder and moving the files into it:
for filename in 2020-10-01*EHE; do
    foldername=$(echo "$filename" | awk '{print (201001)}');
    mkdir -p "$foldername"
    mv "$filename" "$foldername"
    echo "$filename $foldername" ;
done
2020-10-01*EHE
Here I list all the hourly files from 2020-10-01, such as 2020-10-01-0000-03.AM_REDDE_00_EHE.
foldername=$(echo "$filename" | awk '{print (201001)}');
Here I build the folder name that belongs to 2020-10-01, and with the following lines I create the folder and then move all the files into it.
mkdir -p "$foldername"
mv "$filename" "$foldername"
echo "$filename $foldername" ;
As you may notice, I always need to modify the line for filename in 2020-10-01*EHE each time the date changes.
Is there a way to create the folders from the first 8 digits of each filename?
Tonino
Use date.
And since the foldername doesn't change, you don't need to keep creating it inside the loop.
files="$(date +%Y-%m-%d)*EHE"
foldername=$(date +%Y%m%d)
mkdir -p "$foldername"
for filename in $files; do
    mv "$filename" "$foldername"
    echo "$filename $foldername"
done
Edit:
If you want to specify the folder each time, you can pass it as an argument and use sed to derive the filename pattern:
foldername=$1
files=$(echo "$1" | sed 's/\(....\)\(..\)\(..\)/\1-\2-\3/')
filepattern="$files*EHE"
mkdir -p "$foldername"
for filename in $filepattern; do
    mv "$filename" "$foldername"
    echo "$filename $foldername"
done
You call it with
./<yourscriptname>.sh 20201001
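The sed expression simply re-inserts the dashes into the 8-digit argument, e.g.:
$ echo 20201001 | sed 's/\(....\)\(..\)\(..\)/\1-\2-\3/'
2020-10-01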
I think you want to move all files whose names end in *EHE into subdirectories. The subdirectories will be created as necessary and will be named according to the date at the start of each filename without the dashes/hyphens.
Please test the following on a copy of your files in a temporary directory somewhere.
#!/bin/bash
for filename in *EHE ; do
    # Derive folder by deleting all dashes from filename, then taking first 8 characters
    folder=${filename//-/}
    folder=${folder:0:8}
    echo "Would move $filename to $folder"
    # Uncomment next 2 lines to actually move file
    # mkdir -p "$folder"
    # mv "$filename" "$folder"
done
Sample Output
Would move 2020-10-01-0000-03.AM_REDDE_00_EHE to 20201001
Would move 2020-10-01-0100-03.AM_REDDE_00_EHE to 20201001
Note that the 2 lines:
folder=${filename//-/}
folder=${folder:0:8}
use bash "parameter expansion" (described in the Shell Parameter Expansion section of the bash manual if you want to learn about it), and obviate the need to create whole new processes to run awk, sed or cut to extract the fields.

Bash: "No such file or directory" despite directory existing

I am making a custom command that moves or duplicates a file to a wastebasket directory instead of deleting it. I am trying to make the directory if it isn't already there, make a duplicate if a file has already been operated on, and simply move it if it hasn't. The issue is that I keep getting a "no such file or directory" error regardless of where I place the wastebasket directory. Note that simply moving or copying the file with base Linux commands works fine, and that running as root doesn't fix the issue. What steps should I take?
#!/bin/bash
set -x
mkdir -p /home/WASTEBASKIT # This makes a wastebasket directory if it doesn't already exist.
if test -e "$1"; then
    if test -e /home/WASTEBASKIT/"$1"; then # Checking for duplicate files.
        cp "$1" "/home/WASTEBASKIT/$1.$$"
    else
        mv "$1" "/home/WASTEBASKIT"
    fi
else
    printf '%s\n' "File not found." # Error if a file is not there.
fi
Here are the results:
++ mkdir -p /home/WASTEBASKIT
++ test -e config.sh
++ test -e /home/WASTEBASKIT/config.sh
++ cp config.sh config.sh.945 ' /home/WASTEBASKIT'
cp: cannot stat 'config.sh.945': No such file or directory
The problem is on this line:
cp "$1" "$1.$$" "/home/WASTEBASKIT"
You try to copy two files into /home/WASTEBASKIT, namely $1 and $1.$$. The latter does not exist.
Change it to:
cp "$1" "/home/WASTEBASKIT/$1.$$"
I suggest that you instead create a unique file since process numbers aren't unique, so instead of the copy above, do something like:
newfile=$(mktemp "/home/WASTEBASKIT/$1.XXXXXXX")
cp -p "$1" "$newfile"
You can then list all the copies with ls -t /home/WASTEBASKIT to get them in historical order, newest first - or with ls -tr /home/WASTEBASKIT to get the oldest first.
Also note: printf'%s\n' "File not found." will likely generate an error like printf%s\n: command not found.... You need to insert a space between the command printf and the argument '%s\n'.
The moving part is also wrong since you have a space before /home. It should be:
mv "$1" /home/WASTEBASKIT
mv "$1" " /home/WASTEBASKIT"
First issue: spaces matter. The destination " /home/WASTEBASKIT" begins with a space, so it is not the absolute path /home/WASTEBASKIT at all: it is a relative path whose first component is a directory literally named " ". Unless a directory of that name happens to exist, mv fails with "No such file or directory"; if it does exist, the file disappears into it.
Either way, it won't go where you want it to go.
Secondly, the command below is not doing what you seem to think. It will try to copy two files to the directory, the second of which probably does not even exist (config.sh.945 in your case):
cp "$1" "$1.$$" "/home/WASTEBASKIT"
If you want to create a "uniquely" versioned file so as to not overwrite an existing one, that would be:
mv "$1" "/home/WASTEBASKIT/$1.$$"
Note the quotes around the word "uniquely": there's no guarantee that $1.$$ doesn't also exist in the wastebasket - PIDs eventually wrap around, and also do so on reboot.
I suspect a better approach (though still not bullet-proof) would be just to prefix every file with the date and time so that:
you can sort duplicates to establish the order of creation; and
barring clock changes, the date/time won't give you duplicates (unless you're doing it more than once per second).
That approach would be something like:
mv "$1" "/home/WASTEBASKIT/$(date -u +%Y%m%d_%H%M%S)$1"
or, making duplicates even less likely:
mv "$1" "/home/WASTEBASKIT/$(date -u +%Y%m%d_%H%M%S)_${RANDOM}_$1"

Deleting all files in a directory except the ones mentioned in a list [duplicate]

This question already has answers here:
Shell script: How to delete all files in a directory except ones listed in a file?
(2 answers)
Closed 2 years ago.
I have a directory called a00 containing 3000 files with extension .SAC. I have a text file called gd.list containing the names of 88 of those 3000 files. I am trying to write a script that will delete all .SAC files except those mentioned in gd.list.
How can I do that using shell/bash?
The rm command is commented out so that you can check and verify that it's working as needed. Then just un-comment that line.
The check directory section will ensure you don't accidentally run the script from the wrong directory and clobber the wrong files.
You can remove the echo deleting line to run silently.
#!/bin/bash
cd /home/me/myfolder2tocleanup/
# Exit if the directory isn't found.
if (($? > 0)); then
    echo "Can't find work dir... exiting"
    exit
fi
for i in *; do
    if ! grep -qxFe "$i" filelist.txt; then
        echo "Deleting: $i"
        # the next line is commented out. Test it. Then uncomment to remove the files.
        # rm "$i"
    fi
done
You can find the answer here https://askubuntu.com/questions/830776/remove-file-but-exclude-all-files-in-a-list by L. D. James
There are a few alternatives.
I'd prefer to pass NUL-delimited names between the tools, which unambiguously demarcates the file names (note -printf '%f\0' so grep sees bare filenames, and -v so the listed files are the ones kept):
find . -maxdepth 1 -name '*.sac' -printf '%f\0' | grep -z -x -v -F -f gd.list | xargs -0 echo rm
Again, test this first. Perhaps sort the output and make sure it is unique versus the original file.
For a smaller list of filenames I would recommend just using find with -and -not -name and -delete, but with a larger list that can be tricky.
You could tag the files you want to keep as read-only, then delete the wildcard with the appropriate setting in rm or find to skip read-only files. That assumes you own the read-only flag. You could tag the files as executable, and use find, if the read-only flag is not for you.
Another option would be to move the matching files to a temp folder, delete the wildcard, then move the files you want to keep back. That is assuming you can afford for the files to disappear temporarily.
To make them disappear for a shorter time, move the kept files out to a temp directory, move the original directory out, move the temp directory in, then delete the moved-out directory.
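A sketch of that temporary-folder idea (assumes GNU xargs, that gd.list holds one bare filename per line, and that the names contain no newlines):
mkdir ../keep_tmp
xargs -d '\n' -I{} mv {} ../keep_tmp/ < gd.list   # move the keepers out of the way
echo rm -- *.sac                                  # drop the echo once the output looks right
mv ../keep_tmp/* . && rmdir ../keep_tmp           # bring the keepers back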
If you are feeling brave, try something like
ls *.sac | fgrep -v -f gd.list | xargs echo rm
Note that I've put an echo in that xargs, just to make sure no one has a cut and paste accident.
Note also the limitations of this approach: parsing ls output is fragile, and fgrep -f matches substrings rather than whole names. As I said, if you are feeling brave...
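A slightly safer variant of the same pipeline matches whole lines rather than substrings (the caveats about parsing ls still apply):
ls *.sac | grep -v -x -F -f gd.list | xargs echo rm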

Delete files in one directory that do not exist in another directory or its child directories

I am still a newbie in shell scripting and trying to come up with a simple script. Could anyone give me some direction here? Here is what I need.
Files in path 1: /tmp
100abcd
200efgh
300ijkl
Files in path2: /home/storage
backupfile_100abcd_str1
backupfile_100abcd_str2
backupfile_200efgh_str1
backupfile_200efgh_str2
backupfile_200efgh_str3
Now I need to delete the file 300ijkl in /tmp, as the corresponding backup file is not present in /home/storage. The /tmp directory contains more than 300 files. I need to delete the files in /tmp for which the corresponding backup files are not present, and the file names in /tmp will match file names in /home/storage or directories under /home/storage.
Appreciate your time and response.
You can also approach the deletion using grep. You can loop through the files in /tmp, checking with ls piped to grep, and deleting if there is not a match:
#!/bin/bash
[ -z "$1" -o -z "$2" ] && { ## validate input
    printf "error: insufficient input. Usage: %s tmpfiles storage\n" "${0//*\//}"
    exit 1
}
for i in "$1"/*; do
    fn=${i##*/} ## strip path, leaving filename only
    ## if file in backup matches filename, skip rest of loop
    ls "${2}"* | grep -q "$fn" &>/dev/null && continue
    printf "removing %s\n" "$i"
    # rm "$i" ## remove file
done
Note: the actual removal is commented out above; test and ensure there are no unintended consequences before performing the actual delete. Call it passing the path to tmp (without the trailing /) as the first argument and /home/storage as the second argument:
$ bash scriptname /path/to/tmp /home/storage
You can solve this by
making a list of the files in /home/storage
testing each filename in /tmp to see if it is in the list from /home/storage
Given the linux+shell tags, one might use bash:
make the list of files from /home/storage an associative array
make the subscript of the array the filename
Here is a sample script to illustrate ($1 and $2 are the parameters to pass to the script, i.e., /home/storage and /tmp):
#!/bin/bash
declare -A InTarget
# index the backup filenames: InTarget[filename] = full path
while read -r path
do
    name=${path##*/}
    InTarget[$name]=$path
done < <(find "$1" -type f)
# delete anything in $2 whose name was not seen in $1
while read -r path
do
    name=${path##*/}
    [[ -z ${InTarget[$name]} ]] && rm -f "$path"
done < <(find "$2" -type f)
It uses two interesting shell features:
name=${path##*/} is a POSIX shell feature which allows the script to perform the basename function without an extra process (per filename). That makes the script faster.
done < <(find $2 -type f) is a bash feature which lets the script read the list of filenames from find without making the assignments to the array run in a subprocess. Here the reason for using the feature is that if the array is updated in a subprocess, it would have no effect on the array value in the script which is passed to the second loop.
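A minimal sketch of the subshell issue that the process substitution avoids:
count=0
while read -r line; do
    (( count++ ))
done < <(printf 'a\nb\n')
echo "$count"   # prints 2; with "printf 'a\nb\n' | while ..." count would still be 0 here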
For related discussion:
Extract File Basename Without Path and Extension in Bash
Bash Script: While-Loop Subshell Dilemma
I spent quite some time on this today because I needed to delete files which have the same name but different extensions, so if anyone is looking for a quick implementation, here you go:
#!/bin/bash
# We need some reference to the files which we want to keep and not delete;
# let's assume you want to keep files in the first folder with .jpeg, so you
# need to map the names into the desired file extension first.
FILES_TO_KEEP=`ls -1 ${2} | sed 's/\.pdf$/.jpeg/g'`
# iterate through files in the first argument path
for file in ${1}/*; do
    # In my case, I did not want to do anything with directories, so continue the cycle when hitting one.
    if [[ -d $file ]]; then
        continue
    fi
    # omit the path from the iterated file with basename so we can compare it to the files we want to keep
    NAME_WITHOUT_PATH=`basename $file`
    # I use a mac, which is equal to having poor-quality CLI tools
    # when it comes to operating with strings;
    # this should be a safe check to see whether FILES_TO_KEEP contains NAME_WITHOUT_PATH
    if [[ $FILES_TO_KEEP == *"$NAME_WITHOUT_PATH"* ]]; then
        echo "Not deleting: $NAME_WITHOUT_PATH"
    else
        # If the other directory does not contain this file, remove it.
        echo "deleting: $NAME_WITHOUT_PATH"
        rm -rf $file
    fi
done
Usage: sh deleteDifferentFiles.sh path/from/where path/source/of/truth

Move files and rename - one-liner

I'm encountering many files with the same content and the same name on some of my servers. I need to quarantine these files for analysis, so I can't just remove the duplicates. The OS is Linux (CentOS and Ubuntu).
I enumerate the file names and locations and put them into a text file.
Then I do a for statement to move the files to quarantine.
for file in $(cat bad-stuff.txt); do mv $file /quarantine ;done
The problem is that they have the same file name and I just need to add something unique to the filename to get it to save properly. I'm sure it's something simple but I'm not good with regex. Thanks for the help.
Since you're using Linux, you can take advantage of GNU mv's --backup.
while read -r file
do
    mv --backup=numbered "$file" "/quarantine"
done < "bad-stuff.txt"
Here's an example that shows how it works:
$ cat bad-stuff.txt
./c/foo
./d/foo
./a/foo
./b/foo
$ while read -r file; do mv --backup=numbered "$file" "./quarantine"; done < "bad-stuff.txt"
$ ls quarantine/
foo foo.~1~ foo.~2~ foo.~3~
$
I'd use this:
for file in $(cat bad-stuff.txt); do mv "$file" "/quarantine/$(basename "$file").$(date -u +%s%N)"; done
You'll get every file with a timestamp appended (in nanoseconds); basename strips any leading directories so the target lands directly in /quarantine.
You can create a new file name composed of the directory path and the filename. Thus you can adjust the destination in your original code:
for ...; do mv $file /quarantine/$(echo $file | sed 's:/:_:g') ; done
Please note that you should pick a separator character that is unlikely to occur in your filenames.
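For instance:
$ echo ./c/foo | sed 's:/:_:g'
._c_foo
so ./c/foo would be saved as /quarantine/._c_foo.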
