Moving multiple files into a directory that might have duplicate file names - linux

Can anyone help me with this?
I am trying to copy images from my USB stick to an archive on my computer, and I have decided to write a Bash script to make this job easier. I want to copy files (e.g. IMG_0101.JPG), and if there is already a file with that name in the archive (which there will be, as I wipe my camera every time I use it) the new file should be named IMG_0101.JPG.JPG so that I don't lose it.
# pseudocode: if a file with that name already exists in the archive, then
#     mv IMG_0101.JPG IMG_0101.JPG.JPG
# else
#     mv IMG_0101.JPG path/to/destination

for file in "$source"/*; do
newfile="$dest"/"$file"
while [ -e "$newfile" ]; do
newfile=$newfile.JPG
done
cp "$file" "$newfile"
done
There is a race condition here (if another process were to create a file with the same name between the existence check and the cp), but that's fairly theoretical.
It would not be hard to come up with a less primitive renaming policy; perhaps replace .JPG at the end with an increasing numeric suffix plus .JPG?
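For example, a minimal sketch of such a policy (assuming the same $source and $dest variables as above, and that all the files end in .JPG) might insert a counter before the extension instead of stacking extensions:
for file in "$source"/*; do
    base=${file##*/}                          # e.g. IMG_0101.JPG
    newfile="$dest/$base"
    n=1
    while [ -e "$newfile" ]; do
        newfile="$dest/${base%.JPG}_$n.JPG"   # IMG_0101_1.JPG, IMG_0101_2.JPG, ...
        n=$((n + 1))
    done
    cp "$file" "$newfile"
done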

Use the last-modified timestamp of the file to tag each filename, so that if it is the same file it doesn't get copied over again.
Here's a bash specific script that you can use to move files from a "from" directory to a "to" directory:
#!/bin/bash
for f in from/*
do
    filename="${f##*/}"$(stat -c %Y "$f")   # original name plus last-modified timestamp
    if [ ! -f "to/$filename" ]
    then
        mv "$f" "to/$filename"
    fi
done
Here's some sample output (using the above code in a script called "movefiles"):
# ls from
# ls to
# touch from/a
# touch from/b
# touch from/c
# touch from/d
# ls from
a b c d
# ls to
# ./movefiles
# ls from
# ls to
a1385541573 b1385541574 c1385541576 d1385541577
# touch from/a
# touch from/b
# ./movefiles
# ls from
# ls to
a1385541573 a1385541599 b1385541574 b1385541601 c1385541576 d1385541577

Related

Create folders automatically and move files

I have a lot of daily files, sorted by hour, that come from a data logger (waveform). I downloaded them onto a USB stick; now I need to save them inside folders named with the first 8 characters of each waveform filename.
Those files have the following pattern:
Year-Month-Day-hourMinute-##.Code_Station_location_Channel
for example, inside the USB I have:
2020-10-01-0000-03.AM_REDDE_00_EHE; 2020-10-01-0100-03.AM_REDDE_00_EHE; 2020-10-02-0300-03.AM_REDDE_00_EHE; 2020-10-20-0000-03.AM_REDDE_00_EHE; 2020-10-20-0100-03.AM_REDDE_00_EHE; 2020-11-15-2000-03.AM_REDDE_00_EHE; 2020-11-15-2100-03.AM_REDDE_00_EHE; 2020-11-19-0400-03.AM_REDDE_00_EHE; 2020-11-19-0900-03.AM_REDDE_00_EHE;
I slightly modified some code from #user3360767 (shell script to create folder daily with time-stamp and push time-stamp generated logs) to speed up the procedure of creating a folder and moving the files into it:
for filename in 2020-10-01*EHE; do
    foldername=$(echo "$filename" | awk '{print (201001)}');
    mkdir -p "$foldername"
    mv "$filename" "$foldername"
    echo "$filename $foldername" ;
done
2020-10-01*EHE
Here I list all the hourly files for 2020-10-01 (such as 2020-10-01-0000-03.AM_REDDE_00_EHE).
foldername=$(echo "$filename" | awk '{print (201001)}');
Here I build the folder name that belongs to 2020-10-01, and with the following lines I create the folder and then move all matching files into it.
mkdir -p "$foldername"
mv "$filename" "$foldername"
echo "$filename $foldername" ;
As you may notice, I will always need to modify the line for filename in 2020-10-01*EHE each time the date changes.
Is there a way to create the folders from the first 8 digits of each file name?
Tonino
Use date
And since the foldername doesn't change, you don't need to keep creating one inside the loop.
files="$(date +%Y-%m-%d)*EHE"
foldername=$(date +%Y%m%d)
mkdir -p "$foldername"
for filename in $files; do
mv "$filename" "$foldername"
echo "$filename $foldername"
done
Edit:
If you want to specify the folder each time, you can pass it as an argument and use sed to get the filename pattern
foldername=$1
files=$(echo $1 | sed 's/\(....\)\(..\)\(..\)/\1-\2-\3/')
filepattern="$files*EHE"

mkdir -p "$foldername"
for filename in $filepattern; do
    mv "$filename" "$foldername"
    echo "$filename $foldername"
done
You call it with
./<yourscriptname>.sh 20101001
I think you want to move all files whose names end in *EHE into subdirectories. The subdirectories will be created as necessary and will be named according to the date at the start of each filename without the dashes/hyphens.
Please test the following on a copy of your files in a temporary directory somewhere.
#!/bin/bash
for filename in *EHE ; do
    # Derive folder by deleting all dashes from filename, then taking first 8 characters
    folder=${filename//-/}
    folder=${folder:0:8}
    echo "Would move $filename to $folder"
    # Uncomment next 2 lines to actually move file
    # mkdir -p "$folder"
    # mv "$filename" "$folder"
done
Sample Output
Would move 2020-10-01-0000-03.AM_REDDE_00_EHE to 20201001
Would move 2020-10-01-0100-03.AM_REDDE_00_EHE to 20201001
Note that the 2 lines:
folder=${filename//-/}
folder=${folder:0:8}
use "bash parameter substitution", which is described here if you want to learn about it, and obviate the need to create whole new processes to run awk, sed or cut to extract the fields.

Looping over all files of certain extension in a directory

I wrote a small script that unzips all the *.zip files in the current directory, extracting only the *.srt files into a newly created directory. It then loops over all the *.mkv files in the current directory to get their names, and renames each subs/*.srt file so that its name exactly matches the corresponding *.mkv file name.
The script works when there is one zip file and one mkv file, but when there are more files it produces bad filenames. I could not track down why, but I have now figured out when this happens.
EDIT
I managed to narrow down the scenarios in which file names are changed erroneously.
Let's say in the current directory we have three *.mkv files (sorted alphabetically):
$ ls -1a *.mkv
Home.S06E10.1080p.BluRay.x264-PRINTER.mkv
Home.S06E11.1080p.BluRay.x264-PRINTER.mkv
Home.S06E12.1080p.BluRay.x264-PRINTER.mkv
and three *.srt files:
$ ls -1a *.srt
Home.S06E10.srt
Home.S06E11.BDRip.X264-PRINTER.srt
Home.S06E12.BDRip.X264-PRINTER.srt
When I run the script, I get:
subs/Home.S06E10.srt -> subs/Home.S06E10.1080p.BluRay.x264-PRINTER.srt
subs/Home.S06E10.1080p.BluRay.x264-PRINTER.srt -> subs/Home.S06E11.1080p.BluRay.x264-PRINTER.srt
subs/Home.S06E11.1080p.BluRay.x264-PRINTER.srt -> subs/Home.S06E12.1080p.BluRay.x264-PRINTER.srt
As you can see, Home.S06E10.srt is used twice.
#!/usr/bin/env bash
mkdir -p subs
mkdir -p mkv-out
mkdir -p subs-bak

# unzip files, maybe there are subtitles in them...
for zip in *.zip; do
    if [ -f "$zip" ]; then
        unzip "$zip" -d subs "*.srt" >/dev/null
    fi
done

# move all loose subtitles into the subs directory
for srt in *.srt; do
    if [ -f "$srt" ]; then
        mv "$srt" subs
    fi
done

mkvCount=(*.mkv)
srtCount=(subs/*.srt)

if [ ${#mkvCount[@]} != ${#srtCount[@]} ]; then
    echo "Different number of srt and mkv files!"
    exit 1
fi

for MOVIE in *.mkv; do
    for SUBTITLE in subs/*.srt; do
        NAME=$(basename "$MOVIE" .mkv)
        SRT="subs/$NAME.srt"
        if [ ! -f "$SRT" ]; then
            echo "$SUBTITLE -> ${SRT}"
            mv "$SUBTITLE" "$SRT"
        fi
    done
done
You seem to be relying on the lexicographical order of the files to associate one SRT with one MKV. If all you have are season-episode files for the same series, then I suggest a completely different approach: iterate season and episode counters, then generate masks in the form S##E## and look for a movie file and a subtitle file matching each mask. If you find both, move the subtitle.
for season in {01..06}; do
    for episode in {01..24}; do
        # Count how many movies and subtitles we have in the form S##E##
        nummovies=$(find -name "*S${season}E${episode}*.mkv" | wc -l)
        numsubs=$(find -name "*S${season}E${episode}*.srt" | wc -l)
        if [[ $nummovies -gt 1 || $numsubs -gt 1 ]]; then
            echo "Multiple movies/subtitles for S${season}E${episode}"
            exit 1
        fi
        # Skip if there is no movie or subtitle for this particular
        # season/episode combination
        if [[ $nummovies -eq 0 ]]; then
            continue
        fi
        if [[ $numsubs -eq 0 ]]; then
            echo "No subtitle for S${season}E${episode}"
            continue
        fi
        # Now actually take the MKV file, get its basename, then find the
        # SRT file with the same S##E## and rename it to match the movie
        moviename=$(find -name "*S${season}E${episode}*.mkv")
        basename=$(basename -s .mkv "$moviename")
        subfile=$(find -name "*S${season}E${episode}*.srt")
        mv "${subfile}" "${basename}.srt"
    done
done
If you don't want to rewrite everything, just change your last loop (a rough sketch follows after these notes):
Drop the inner loop
Take the movie name instead and use sed to find the particular S##E## substring
Use find to find one SRT file like in my code
Move it
This has the benefit of not relying on a hard-coded number of seasons/episodes. I guessed six seasons and no season with more than 24 episodes. However, I thought my code would do the job and would look simpler.
Make certain that there will be exactly one SRT file. Having zero or more than one file will probably just give an error from mv, but it's better to be safe. In my code I used a separate call to find with wc to count the number of lines, but if you are more knowledgeable in bash-fu, then perhaps there's a way to treat the output of find as an array instead.
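On that last point, one possible sketch (assuming bash 4+ for mapfile, and reusing the season, episode and basename variables from the code above):
# Read the matching subtitles into an array instead of counting lines
mapfile -t subs < <(find . -name "*S${season}E${episode}*.srt")
if (( ${#subs[@]} != 1 )); then
    echo "Expected exactly one subtitle for S${season}E${episode}, found ${#subs[@]}"
    exit 1
fi
mv "${subs[0]}" "${basename}.srt"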
In both my suggestions you can also drop that check for # movies = # subtitles. This gives you more flexibility. The subtitles can be in whatever directories you want, but the movies are assumed to be in the current working directory. With find you can also use the -or operator to accept other extensions, such as AVI and MPG.
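Here is the rough sketch of that smaller change mentioned above. It uses grep -oE rather than sed to pull out the S##E## tag, and it assumes each movie name contains exactly one such tag and that the subtitles already sit in subs/:
for MOVIE in *.mkv; do
    NAME=$(basename "$MOVIE" .mkv)
    # Extract the S##E## tag from the movie name
    TAG=$(echo "$NAME" | grep -oE 'S[0-9]{2}E[0-9]{2}')
    [ -z "$TAG" ] && continue
    # Find the single subtitle carrying the same tag and rename it to match the movie
    SUBTITLE=$(find subs -name "*${TAG}*.srt")
    if [ -n "$SUBTITLE" ]; then
        echo "$SUBTITLE -> subs/$NAME.srt"
        mv "$SUBTITLE" "subs/$NAME.srt"
    fi
done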

Delete files in one directory that do not exist in another directory or its child directories

I am still a newbie at shell scripting and am trying to come up with a simple script. Could anyone give me some direction here? Here is what I need.
Files in path 1: /tmp
100abcd
200efgh
300ijkl
Files in path2: /home/storage
backupfile_100abcd_str1
backupfile_100abcd_str2
backupfile_200efgh_str1
backupfile_200efgh_str2
backupfile_200efgh_str3
Now I need to delete the file 300ijkl in /tmp, as the corresponding backup file is not present in /home/storage. The /tmp directory contains more than 300 files. I need to delete the files in /tmp for which the corresponding backup files are not present, where the file names in /tmp will match file names in /home/storage or in directories under /home/storage.
Appreciate your time and response.
You can also approach the deletion using grep. You can loop through the files in /tmp, checking with ls piped to grep, and deleting when there is no match:
#!/bin/bash
[ -z "$1" -o -z "$2" ] && { ## validate input
printf "error: insufficient input. Usage: %s tmpfiles storage\n" ${0//*\//}
exit 1
}
for i in "$1"/*; do
fn=${i##*/} ## strip path, leaving filename only
## if file in backup matches filename, skip rest of loop
ls "${2}"* | grep -q "$fn" &>/dev/null && continue
printf "removing %s\n" "$i"
# rm "$i" ## remove file
done
Note: the actual removal is commented out above; test and ensure there are no unintended consequences before performing the actual delete. Call it by passing the path to tmp (without a trailing /) as the first argument and /home/storage as the second argument:
$ bash scriptname /path/to/tmp /home/storage
You can solve this by
making a list of the files in /home/storage
testing each filename in /tmp to see if it is in the list from /home/storage
Given the linux+shell tags, one might use bash:
make the list of files from /home/storage an associative array
make the subscript of the array the filename
Here is a sample script to illustrate ($1 and $2 are the parameters to pass to the script, i.e., /home/storage and /tmp):
#!/bin/bash
declare -A InTarget

# Build an associative array keyed by filename from everything under the first path
while read path
do
    name=${path##*/}
    InTarget[$name]=$path
done < <(find "$1" -type f)

# Delete any file under the second path whose name was not seen above
while read path
do
    name=${path##*/}
    [[ -z ${InTarget[$name]} ]] && rm -f "$path"
done < <(find "$2" -type f)
It uses two interesting shell features:
name=${path##*/} is a POSIX shell feature which allows the script to perform the basename function without an extra process (per filename). That makes the script faster.
done < <(find $2 -type f) is a bash feature which lets the script read the list of filenames from find without making the assignments to the array run in a subprocess. Here the reason for using the feature is that if the array is updated in a subprocess, it would have no effect on the array value in the script which is passed to the second loop.
For related discussion:
Extract File Basename Without Path and Extension in Bash
Bash Script: While-Loop Subshell Dilemma
I spent quite some time on this today because I needed to delete files which have the same name but different extensions, so if anyone is looking for a quick implementation, here you go:
#!/bin/bash
# We need some reference to the files which we want to keep and not delete;
# let's assume you want to keep the files in the first folder as jpeg, so you
# need to map them onto the desired file extension first.
FILES_TO_KEEP=`ls -1 ${2} | sed 's/\.pdf$/.jpeg/g'`

# Iterate through the files in the first argument path
for file in ${1}/*; do
    # In my case, I did not want to do anything with directories, so continue the cycle when hitting one.
    if [[ -d $file ]]; then
        continue
    fi
    # Strip the path from the iterated file with basename so we can compare it to the files we want to keep
    NAME_WITHOUT_PATH=`basename $file`
    # I use a Mac, which is equal to having poor-quality CLI tools
    # when it comes to operating on strings;
    # this should be a safe check to see if FILES_TO_KEEP contains NAME_WITHOUT_PATH
    if [[ $FILES_TO_KEEP == *"$NAME_WITHOUT_PATH"* ]]; then
        echo "Not deleting: $NAME_WITHOUT_PATH"
    else
        # If it does not contain a file from the other directory, remove it.
        echo "deleting: $NAME_WITHOUT_PATH"
        rm -rf $file
    fi
done
Usage: sh deleteDifferentFiles.sh path/from/where path/source/of/truth

How to find, copy and rename files in linux?

I am trying to find all files in a directory and its sub-directories and then copy them to a different directory. However, some of them have the same name, so I need to copy the files over and, if two files have the same name, rename one of them.
So far I have managed to copy all found files with a unique name over using:
#!/bin/bash
if [ ! -e "$2" ] ; then
    mkdir "$2"
    echo "Directory created"
fi
if [ ! -e "$1" ] ; then
    echo "image source does not exist"
    exit 1
fi
find "$1" -name 'IMG_****.JPG' -exec cp {} "$2" \;
However, I now need some sort of if statement to figure out if a file has the same name as another file that has been copied.
Since you are on linux, you are probably using cp from coreutils. If that is the case, let it do the backup for you by using cp --backup=t
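For example, plugged into the find command from the question this might look like the following; on a name clash cp then creates numbered backups such as IMG_0101.JPG.~1~ instead of overwriting:
find "$1" -name 'IMG_****.JPG' -exec cp --backup=t {} "$2" \;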
Try this approach: put the list of files in a variable and copy each file, checking whether the copy operation succeeds. If it does not, try a different name.
In code:
FILES=$(find "$1" -name 'IMG_****.JPG' | xargs -r)
for FILE in $FILES; do
    cp -n "$FILE" destination
    # Check the exit status of the latest command (i.e. cp)
    # through the $? variable and, if necessary,
    # choose a different name for the destination
done
Inside the for statement, you can also put some incremental integer to try different names incrementally (e.g., name_1, name_2 and so on, until the cp command succeeds).
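A rough sketch of that idea, reusing the FILES list from above and a destination directory literally named destination. Note that older versions of cp -n exit with status 0 even when they skip a file, so this sketch checks for the target's existence instead of relying on $?:
for FILE in $FILES; do
    base=$(basename "$FILE")
    target="destination/$base"
    i=0
    # Keep trying name_1, name_2, ... until the name is free
    while [ -e "$target" ]; do
        i=$((i + 1))
        target="destination/${base}_$i"
    done
    cp "$FILE" "$target"
done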
You can do:
shopt -s globstar   # needed so that ** matches files in subdirectories
for file in "$1"/**/IMG_*.JPG ; do
    target=$2/$(basename "$file")
    SUFF=''
    while [[ -f "$target$SUFF" ]] ; do
        (( SUFF++ ))
    done
    cp "$file" "$target$SUFF"
done
in your script, in place of the find command, to append integer suffixes to identically-named files.
You can use rsync with the following switches for more control:
rsync --backup --backup-dir=DIR --suffix=SUFFIX -az <source dir> <destination dir>
Here is what these options do (from the man page):
-b, --backup
With this option, preexisting destination files are renamed as each file is transferred or deleted. You can control where the backup file goes and what (if any) suffix gets appended using the --backup-dir and --suffix options.
--backup-dir=DIR
In combination with the --backup option, this tells rsync to store all backups in the specified directory on the receiving side. This can be used for incremental backups. You can additionally specify a backup suffix using the --suffix option (otherwise the files backed up in the specified directory will keep their original filenames).
--suffix=SUFFIX
This option allows you to override the default backup suffix used with the --backup (-b) option. The default suffix is a ~ if no --backup-dir was specified, otherwise it is an empty string.
You can use rsync to sync two folders either on the local file system or on a remote file system. You can even do the syncing over an ssh connection.
rsync is amazingly powerful. See the man page for all the options.
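For the USB-to-archive scenario from the original question, a hypothetical invocation (the paths and the suffix are only placeholders) could be:
rsync -az --backup --backup-dir=old-versions --suffix=.dup /media/usb/DCIM/ ~/photo-archive/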

Script for renaming files with logic

Someone has very kindly helped me get started on a mass-rename script for renaming PDF files.
As you can see, I need to add a bit of logic to stop the situation below from happening - something like adding a unique number to a duplicate file name?
rename 's/^(.{5}).*(\..*)$/$1$2/' *
rename -n 's/^(.{5}).*(\..*)$/$1$2/' *
Annexes 123114345234525.pdf renamed as Annex.pdf
Annexes 123114432452352.pdf renamed as Annex.pdf
Hope this makes sense?
Thanks
for i in *
do
    x=''                   # counter
    j="${i:0:2}"           # new name (first 2 characters here; use 5 for your case)
    e="${i##*.}"           # extension
    while [ -e "$j$x.$e" ] # try to find a name that is still free
    do
        ((x++))            # increment the counter
    done
    mv "$i" "$j$x.$e"      # rename
done
before
$ ls
he.pdf hejjj.pdf hello.pdf wo.pdf workd.pdf world.pdf
after
$ ls
he.pdf he1.pdf he2.pdf wo.pdf wo1.pdf wo2.pdf
This should check whether there will be any duplicates:
rename -n [...] | grep -o ' renamed as .*' | sort | uniq -d
If you get any output of the form renamed as [...], then you have a collision.
Of course, this won't work in a couple of corner cases - if your files contain newlines or the literal string " renamed as ", for example.
As noted in my answer on your previous question:
for f in *.pdf; do
    tmp=`echo $f | sed -r 's/^(.{5}).*(\..*)$/$1$2/'`
    mv -b ./"$f" ./"$tmp"
done
That will make backups of deleted or overwritten files. A better alternative would be this script:
#!/bin/bash
for f in "$@"; do
    tar -rvf /tmp/backup.tar "$f"              # append the original file to an uncompressed tar backup
    tmp=`echo "$f" | sed -r 's/^(.{5}).*(\..*)$/$1$2/'`
    i=1
    while [ -e "$tmp" ]; do
        tmp=`echo "$tmp" | sed "s/\./-$i./"`   # e.g. Annex.pdf -> Annex-1.pdf
        i=$((i + 1))
    done
    mv -b ./"$f" ./"$tmp"
done
Run the script like this:
find . -exec thescript '{}' \;
The find command gives you lots of options for specifying which files to run on, works recursively, and passes all the filenames in to the script. The script backs all the files up with tar (uncompressed) and then renames them.
This isn't the best script, since it isn't smart enough to avoid the manual loop and check for identical file names.
