Looping over all files with a certain extension in a directory - linux

I wrote a small script that unzips all the *.zip files in the current directory, extracting only the *.srt files into a newly created directory. It then loops over all the *.mkv files in the current directory to get their names, and renames each subs/*.srt file so that it exactly matches the corresponding *.mkv file name.
The script works when there is one zip file and one mkv file, but with more files it produces bad filenames. I could not track down why, but I have now figured out when this happens (see the edit below).
EDIT
I managed to narrow down the scenarios in which file names are changed erroneously.
Let's say the current directory contains three *.mkv files (sorted alphabetically):
$ ls -1a *.mkv
Home.S06E10.1080p.BluRay.x264-PRINTER.mkv
Home.S06E11.1080p.BluRay.x264-PRINTER.mkv
Home.S06E12.1080p.BluRay.x264-PRINTER.mkv
and three *.srt files:
$ ls -1a *.srt
Home.S06E10.srt
Home.S06E11.BDRip.X264-PRINTER.srt
Home.S06E12.BDRip.X264-PRINTER.srt
When I run the script, I get:
subs/Home.S06E10.srt -> subs/Home.S06E10.1080p.BluRay.x264-PRINTER.srt
subs/Home.S06E10.1080p.BluRay.x264-PRINTER.srt -> subs/Home.S06E11.1080p.BluRay.x264-PRINTER.srt
subs/Home.S06E11.1080p.BluRay.x264-PRINTER.srt -> subs/Home.S06E12.1080p.BluRay.x264-PRINTER.srt
As you can see, Home.S06E10.srt is used twice. Here is the script:
#!/usr/bin/env bash

mkdir -p subs
mkdir -p mkv-out
mkdir -p subs-bak

# unzip files, maybe there are subtitles in them...
for zip in *.zip; do
    if [ -f "$zip" ]; then
        unzip "$zip" -d subs "*.srt" >/dev/null
    fi
done

# move all subtitles to the subs catalog
for srt in *.srt; do
    if [ -f "$srt" ]; then
        mv "$srt" subs
    fi
done

mkvCount=(*.mkv)
srtCount=(subs/*.srt)
if [ ${#mkvCount[@]} != ${#srtCount[@]} ]; then
    echo "Different number of srt and mkv files!"
    exit 1
fi

for MOVIE in *.mkv; do
    for SUBTITLE in subs/*.srt; do
        NAME=$(basename "$MOVIE" .mkv)
        SRT="subs/$NAME.srt"
        if [ ! -f "$SRT" ]; then
            echo "$SUBTITLE -> ${SRT}"
            mv "$SUBTITLE" "$SRT"
        fi
    done
done

You seem to be relying on the lexicographical order of the files to associate one SRT with one MKV. If all you have are season-episode files for the same series, then I suggest a completely different approach: iterate over season and episode counters, generate masks of the form S##E##, and look for a matching movie file and subtitle file. If you find both, you move the subtitle.
for season in {01..06}; do
    for episode in {01..24}; do
        # Count how many movies and subtitles we have in the form S##E##
        nummovies=$(find . -name "*S${season}E${episode}*.mkv" | wc -l)
        numsubs=$(find . -name "*S${season}E${episode}*.srt" | wc -l)
        if [[ $nummovies -gt 1 || $numsubs -gt 1 ]]; then
            echo "Multiple movies/subtitles for S${season}E${episode}"
            exit 1
        fi
        # Skip if there is no movie or subtitle for this particular
        # season/episode combination
        if [[ $nummovies -eq 0 ]]; then
            continue
        fi
        if [[ $numsubs -eq 0 ]]; then
            echo "No subtitle for S${season}E${episode}"
            continue
        fi
        # Now actually take the MKV file, get its basename, then find the
        # SRT file with the same S##E## and move it
        moviename=$(find . -name "*S${season}E${episode}*.mkv")
        basename=$(basename -s .mkv "$moviename")
        subfile=$(find . -name "*S${season}E${episode}*.srt")
        mv "${subfile}" "${basename}.srt"
    done
done
If you don't want to rewrite everything, just change your last loop (a sketch follows this list):
Drop the inner loop.
Take the movie name instead and use sed to extract the particular S##E## substring.
Use find to locate the single matching SRT file, as in my code.
Move it.
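A minimal sketch of that modified loop (my own illustration, assuming every movie name contains exactly one S##E## token and at most one subtitle matches):
for MOVIE in *.mkv; do
    NAME=$(basename "$MOVIE" .mkv)
    # extract the S##E## token from the movie name
    code=$(printf '%s\n' "$MOVIE" | sed -E 's/.*(S[0-9]{2}E[0-9]{2}).*/\1/')
    # locate the single matching subtitle, as in the full rewrite above
    SUBTITLE=$(find subs -name "*${code}*.srt")
    if [ -n "$SUBTITLE" ] && [ "$SUBTITLE" != "subs/$NAME.srt" ]; then
        echo "$SUBTITLE -> subs/$NAME.srt"
        mv "$SUBTITLE" "subs/$NAME.srt"
    fi
done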
This has the benefit of not relying on a hard-coded number of seasons/episodes (in my code I guessed six seasons and no season with more than 24 episodes). However, I think my code does the job and looks simpler.
Make certain that there will be exactly one SRT file. Having zero or more than one will probably just produce an error from mv, but it's better to be safe. In my code I used a separate call to find with wc to count the number of matches, but if you are more knowledgeable in bash-fu, there may be a way to treat the output of find as an array instead.
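For instance (a sketch, assuming bash 4+ and filenames without newlines), mapfile can read the output of find straight into an array:
mapfile -t matches < <(find . -name "*S01E01*.srt")
echo "found ${#matches[@]} subtitle file(s) for S01E01"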
In both my suggestions you can also drop the check that the number of SRT files equals the number of MKV files. This gives you more flexibility: the subtitles can be in whatever directories you want, but the movies are assumed to be in the current working directory. With find you can also use the -or operator to accept other extensions, such as AVI and MPG.

Related

Find out if a backup ran by searching the newest file

I'd like to write a short and simple script that searches for a file using a specific filter and checks the age of that file. It should write a short output and an exit code, so that it is usable from an NRPE server.
The script itself works; I only have a problem when the file does not exist. That happens with this command:
newestfile=$(ls -t $path/$filter | head -1)
When files exist, everything works as it should. When nothing matches my filter, I get the following output (I changed the filter to *.zip to demonstrate):
ls: cannot access '/backup/*.zip': No such file or directory
But I want to get the following output and then just exit the script with code 1:
there are no backups with the filter *.zip in the directory /backup
I am pretty sure this is a very easy problem, but I just don't know what's wrong. By the way, I am still "new" to bash scripts.
Here is my whole code:
#!/bin/bash

# Set the variables
path=/backup
filter=*.tar.gz

# Find the newest file
newestfile=$(ls -t $path/$filter | head -1)

# check if we even have a file
if [ ! -f $newestfile ]; then
    echo "there are no backups with the filter $filter in the directory $path"
    exit 1
fi

# check how old the file is that we found
if [[ $(find "$newestfile" -mtime +1 -print) ]]; then
    echo "File $newestfile is older than 24 hours"
    exit 2
else
    echo "the file $newestfile is younger than 24 hours"
    exit 0
fi
Actually, with your code you might also get the error message bash: no match: /backup/*.zip if the failglob shell option is set.
UPDATE: Fixed the proposed solution, and the missing quotes in the original solution:
I suggest the following approach:
shopt -u failglob              # Turn off error reporting from globbing
pathfilter="/backup/*.tar.gz"  # Quotes avoid the wildcards being expanded here already

# First see whether we have any matching files
files=($pathfilter)
if [[ ! -e ${files[0]} ]]
then
    :  # .... No matching files
else
    # Now you can safely fetch the newest file
    # Note: This does NOT work if you have filenames
    # containing newlines
    newestfile=$(ls -tA $pathfilter | head -1)
fi
I don't like using ls for this task, but I don't see an easy way in bash to do it better.
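For what it's worth, here is a possible ls-free sketch (my own illustration, not the asker's code): let bash compare modification times directly with its -nt test, and let nullglob handle the no-match case.
shopt -s nullglob
files=(/backup/*.tar.gz)
if [ ${#files[@]} -eq 0 ]; then
    echo "there are no backups with the filter *.tar.gz in the directory /backup"
    exit 1
fi
# start with the first match and keep whichever file is newer
newestfile=${files[0]}
for f in "${files[@]}"; do
    [ "$f" -nt "$newestfile" ] && newestfile=$f
done
echo "newest file: $newestfile"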

Delete files in one directory that do not exist in another directory or its child directories

I am still a newbie at shell scripting and am trying to come up with some simple code. Could anyone give me some direction here? Here is what I need.
Files in path 1: /tmp
100abcd
200efgh
300ijkl
Files in path 2: /home/storage
backupfile_100abcd_str1
backupfile_100abcd_str2
backupfile_200efgh_str1
backupfile_200efgh_str2
backupfile_200efgh_str3
Now I need to delete the file 300ijkl in /tmp, as the corresponding backup file is not present in /home/storage. /tmp contains more than 300 files; I need to delete the ones whose corresponding backup files are not present. The file names in /tmp will match file names in /home/storage or in directories under /home/storage.
Appreciate your time and response.
You can also approach the deletion using grep. You can loop through the files in /tmp, checking with ls piped to grep, and delete when there is no match:
#!/bin/bash

[ -z "$1" -o -z "$2" ] && {  ## validate input
    printf "error: insufficient input. Usage: %s tmpfiles storage\n" "${0//*\//}"
    exit 1
}

for i in "$1"/*; do
    fn=${i##*/}  ## strip path, leaving filename only
    ## if a file in backup matches the filename, skip the rest of the loop
    ls "${2}"* | grep -q "$fn" &>/dev/null && continue
    printf "removing %s\n" "$i"
    # rm "$i"  ## remove file
done
Note: the actual removal is commented out above; test and ensure there are no unintended consequences before performing the actual delete. Call it passing the path to tmp (without a trailing /) as the first argument and /home/storage as the second argument:
$ bash scriptname /path/to/tmp /home/storage
You can solve this by
making a list of the files in /home/storage
testing each filename in /tmp to see if it is in the list from /home/storage
Given the linux+shell tags, one might use bash:
make the list of files from /home/storage an associative array
make the subscript of the array the filename
Here is a sample script to illustrate ($1 and $2 are the parameters to pass to the script, i.e., /home/storage and /tmp):
#!/bin/bash
declare -A InTarget

while read -r path
do
    name=${path##*/}
    InTarget[$name]=$path
done < <(find "$1" -type f)

while read -r path
do
    name=${path##*/}
    [[ -z ${InTarget[$name]} ]] && rm -f "$path"
done < <(find "$2" -type f)
It uses two interesting shell features:
name=${path##*/} is a POSIX shell feature which allows the script to perform the basename function without an extra process (per filename). That makes the script faster.
done < <(find $2 -type f) is a bash feature which lets the script read the list of filenames from find without making the assignments to the array run in a subprocess. Here the reason for using the feature is that if the array is updated in a subprocess, it would have no effect on the array value in the script which is passed to the second loop.
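As a quick illustration of that second point (a self-contained demo, assuming bash; /home/storage stands in for any directory):
# pipe version: the loop runs in a subshell, so the assignments are lost
declare -A InTarget
find /home/storage -type f | while read -r path; do
    InTarget[${path##*/}]=$path
done
echo "${#InTarget[@]}"   # prints 0

# process-substitution version: the loop runs in the current shell
declare -A InTarget2
while read -r path; do
    InTarget2[${path##*/}]=$path
done < <(find /home/storage -type f)
echo "${#InTarget2[@]}"  # prints the number of files found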
For related discussion:
Extract File Basename Without Path and Extension in Bash
Bash Script: While-Loop Subshell Dilemma
I spent some really nice time on this today because I needed to delete files that have the same name but different extensions, so if anyone is looking for a quick implementation, here you go:
#!/bin/bash

# We need some reference to the files we want to keep and not delete;
# let's assume the source-of-truth folder ($2) holds .pdf files while the
# folder to clean ($1) holds .jpeg files, so map the names onto the
# desired file extension first.
FILES_TO_KEEP=$(ls -1 "${2}" | sed 's/\.pdf$/.jpeg/g')

# iterate through the files in the first argument's path
for file in "${1}"/*; do
    # In my case, I did not want to do anything with directories, so
    # continue the cycle when hitting one.
    if [[ -d $file ]]; then
        continue
    fi
    # strip the path from the iterated file with basename so we can
    # compare it to the files we want to keep
    NAME_WITHOUT_PATH=$(basename "$file")
    # I use a Mac, which is like having poor-quality CLI tools when it
    # comes to operating on strings; this should be a safe check to see
    # if FILES_TO_KEEP contains NAME_WITHOUT_PATH
    if [[ $FILES_TO_KEEP == *"$NAME_WITHOUT_PATH"* ]]; then
        echo "Not deleting: $NAME_WITHOUT_PATH"
    else
        # If the list does not contain the file, remove it.
        echo "deleting: $NAME_WITHOUT_PATH"
        rm -rf "$file"
    fi
done
Usage: bash deleteDifferentFiles.sh path/from/where path/source/of/truth

Split multiple files

I have a directory with hundreds of files, and I have to split each of them into files of 400 lines (or fewer).
I have tried ls and split, wc and split, and writing some scripts.
Honestly, I'm lost.
Please, can anybody help me?
EDIT:
Thanks to John Bollinger and his answer, this is the script we will use for our purpose:
#!/bin/bash

# The arguments, in order:
# $1 = number of lines (required)
# $2 = origin dir (optional)
# $3 = destination dir (optional)
if [ $# -gt 0 ]; then
    lin=$1
    if [ $# -gt 1 ]; then
        dirOrg=$2
        if [ $# -gt 2 ]; then
            dirDest=$3
            if [ ! -d "$dirDest" ]; then
                mkdir -p "$dirDest"
            fi
        else
            dirDest=$dirOrg
        fi
    else
        dirOrg=.
        dirDest=.
    fi
else
    echo "Missing parameters: NumLineas [DirectorioOrigen] [DirectorioDestino]"
    exit 1
fi

# The shell glob expands to all the files in the target directory; a different
# glob pattern could be used if you want to restrict splitting to a subset,
# or if you want to include dotfiles.
for file in "$dirOrg"/*; do
    # Details of the split command are up to you. This one splits each file
    # into pieces named by appending a sequence number to the original file's
    # name. The original file is left in place.
    fileDest=${file##*/}
    split --lines="$lin" --numeric-suffixes "$file" "$dirDest"/"$fileDest"
done

exit 0
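A hypothetical invocation (the script name and directories are examples, not from the original post):
# split every file in ./logs into 400-line pieces, written into ./pieces
bash splitfiles.sh 400 ./logs ./pieces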
Since you seem to know about split, and to want to use it for the job, I guess your issue revolves around using one script to wrap the whole task. The details are unclear, but something along these lines is probably what you want:
#!/bin/bash

# If an argument is given then it is the name of the directory containing the
# files to split. Otherwise, the files in the working directory are split.
if [ $# -gt 0 ]; then
    dir=$1
else
    dir=.
fi

# The shell glob expands to all the files in the target directory; a different
# glob pattern could be used if you want to restrict splitting to a subset,
# or if you want to include dotfiles.
for file in "$dir"/*; do
    # Details of the split command are up to you. This one splits each file
    # into pieces named by appending a sequence number to the original file's
    # name. The original file is left in place.
    split --lines=400 --numeric-suffixes "$file" "$file"
done

Find and delete files whose filenames contain the same string, in the Linux terminal

I want to delete all files from a folder that contain a non-unique numerical string in the filename, using the Linux terminal. E.g.:
werrt-110009.jpg => delete
asfff-110009.JPG => delete
asffa-123489.jpg => maintain
asffa-111122.JPG => maintain
Any suggestions?
I only now understand your question, I think. You want to remove all files that contain a numeric value that is not unique (in a particular folder). If a filename contains a value that is also found in another filename, you want to remove both files, right?
This is how I would do that (it may not be the fastest way):
# put all the files in your folder into a list
# (for array=(*) to work, make sure you have enabled nullglob: shopt -s nullglob)
array=(*)
delete=()
for elem in "${array[@]}"; do
    # for each elem in the list, extract the number
    num_regex='([0-9]+)\.'
    [[ "$elem" =~ $num_regex ]]
    num="${BASH_REMATCH[1]}"
    # use the extracted number to check whether it is unique
    dup_regex="[^0-9]($num)\..+?(\1)"
    # if it is not unique, put the file in the files-to-delete list
    if [[ "${array[*]}" =~ $dup_regex ]]; then
        delete+=("$elem")
    fi
done
# delete all found duplicates
for elem in "${delete[@]}"; do
    rm "$elem"
done
In your example, array would be:
array=(werrt-110009.jpg asfff-110009.JPG asffa-123489.jpg asffa-111122.JPG)
And the result in delete would be:
delete=(werrt-110009.jpg asfff-110009.JPG)
Is this what you meant?
You can use the Linux find command along with the -regex and -delete parameters to do it in one command.
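For example (a sketch assuming GNU find and a known duplicated value such as 110009; note that find alone cannot discover which numbers are duplicated):
# -iregex matches the whole path case-insensitively, catching .jpg and .JPG
find . -maxdepth 1 -type f -iregex '.*-110009\.jpg' -delete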
Use "rm" command to delete all matching string files in directory
cd <path-to-directory>/ && rm *110009*
This command helps to delete all files with matching string and it doesn't depend on the position of string in file name.
I was mentioned rm command option as another option to delete files with matching string.
Below is the complete script to achieve your requirement,
#!/bin/bash -eu

# provide the destination folder path
DEST_FOLDER_PATH="$1"
TEMP_BUILD_DIR="/tmp/$(date +%Y%m%d-%H%M%S)_cleanup_duplicate_files"

#++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
clean_up()
{
    if [ -d "$TEMP_BUILD_DIR" ]; then
        rm -rf "$TEMP_BUILD_DIR"
    fi
}
trap clean_up EXIT

[ ! -d "$TEMP_BUILD_DIR" ] && mkdir -p "$TEMP_BUILD_DIR"
TEMP_FILES_LIST_FILE="$TEMP_BUILD_DIR/folder_file_names.txt"
ls "$DEST_FOLDER_PATH" > "$TEMP_FILES_LIST_FILE"

while read -r filename
do
    # check for files with a number pattern
    if [[ "$filename" =~ ([0-9]+)\. ]]; then
        # fetch the number so we can find files with the same number
        matching_string="${BASH_REMATCH[1]}"
        # count the files containing matching_string to check uniqueness
        if [ $(ls -1 "$DEST_FOLDER_PATH"/*"$matching_string"* | wc -l) -gt 1 ]; then
            rm "$DEST_FOLDER_PATH"/*"$matching_string"*
        fi
    fi
    # reload the remaining files in the folder (this keeps the list current
    # and speeds things up when the folder contains many files)
    ls "$DEST_FOLDER_PATH" > "$TEMP_FILES_LIST_FILE"
done < "$TEMP_FILES_LIST_FILE"

exit 0
exit 0
How to execute this script:
Save the script to a file, e.g. {path-to-script}/delete_duplicate_files.sh (you can rename it to whatever you want).
Make the script executable: chmod +x {path-to-script}/delete_duplicate_files.sh
Execute the script, providing the path of the directory in which duplicate files (files with a matching number pattern) need to be deleted: {path-to-script}/delete_duplicate_files.sh "{path-to-directory}"

Moving multiple files in directory that might have duplicate file names

Can anyone help me with this?
I am trying to copy images from my USB stick to an archive on my computer, and I have decided to write a bash script to make this job easier. I want to copy files (e.g. IMG_0101.JPG), and if there is already a file with that name in the archive (which there will be, as I wipe my camera every time I use it), the file should be renamed to IMG_0101.JPG.JPG so that I don't lose it. Roughly:
if the file already exists, then
    mv IMG_0101.JPG IMG_0101.JPG.JPG
else
    mv IMG_0101.JPG path/to/destination
for file in "$source"/*; do
    newfile="$dest"/"${file##*/}"
    while [ -e "$newfile" ]; do
        newfile=$newfile.JPG
    done
    cp "$file" "$newfile"
done
There is a race condition here (if another process could create a file by the same name between the first done and the cp), but that's fairly theoretical.
It would not be hard to come up with a less primitive renaming policy; perhaps replace the .JPG at the end with an increasing numeric suffix plus .JPG, as sketched below.
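A possible sketch of such a policy (my own illustration, reusing the $source and $dest variables from the loop above): insert an increasing counter before the extension instead of stacking .JPG suffixes.
for file in "$source"/*; do
    base=${file##*/}
    newfile="$dest/$base"
    n=1
    # bump the counter until we find a name that is still free
    while [ -e "$newfile" ]; do
        newfile="$dest/${base%.JPG}-$n.JPG"
        n=$((n + 1))
    done
    cp "$file" "$newfile"
done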
Use the last-modified timestamp of the file to tag each filename, so that if it is the same file it doesn't get copied over again.
Here's a bash specific script that you can use to move files from a "from" directory to a "to" directory:
#!/bin/bash
for f in from/*
do
    filename="${f##*/}$(stat -c %Y "$f")"
    if [ ! -f "to/$filename" ]
    then
        mv "$f" "to/$filename"
    fi
done
Here's some sample output (using the above code in a script called "movefiles"):
# ls from
# ls to
# touch from/a
# touch from/b
# touch from/c
# touch from/d
# ls from
a b c d
# ls to
# ./movefiles
# ls from
# ls to
a1385541573 b1385541574 c1385541576 d1385541577
# touch from/a
# touch from/b
# ./movefiles
# ls from
# ls to
a1385541573 a1385541599 b1385541574 b1385541601 c1385541576 d1385541577
