Find files missing in one directory that are in a second - needs to ignore file extension - linux

I need to find files that are missing from one directory but present in a second, ignoring file extensions. The comparison must be based on file name only, not file contents. If my file names were identical (including extensions), I could use diff, something like:
diff dirA dirB
However, files in directory A have a different extension from those in directory B. I need something like diff that ignores the extension differences between the two directories.
Another important point is that each directory may contain hundreds of thousands of files, so I have a need for a relatively efficient process.
Grateful for any ideas.

I hope this helps.
I created sample files that look like this:
sample
/
├── a
│   ├── 1
│   └── 3
└── b
    ├── 1
    ├── 2
    └── 3.txt
$ comm -13 <(find a/ -type f -exec bash -c 'basename "${0%.*}"' {} \; | sort) <(find b/ -type f -exec bash -c 'basename "${0%.*}"' {} \; | sort)
output:
2
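The command above starts one bash process per file, which may be slow when each directory holds hundreds of thousands of files. Below is a minimal sketch of the same comparison with a single find per directory. It assumes GNU find (for -printf) and file names without embedded newlines. The function name missing_in_first is hypothetical, and temporary files stand in for process substitution so the sketch also runs in plain sh.

```shell
# Print base names (last extension stripped) that exist in directory $2
# but not in directory $1.
# Assumes GNU find (-printf) and file names without embedded newlines.
missing_in_first() (
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || exit 1
    trap 'rm -f "$tmp_a" "$tmp_b"' EXIT
    # %f prints the name without its directory; sed drops the last ".ext".
    find "$1" -type f -printf '%f\n' | sed 's/\.[^.]*$//' | LC_ALL=C sort -u > "$tmp_a"
    find "$2" -type f -printf '%f\n' | sed 's/\.[^.]*$//' | LC_ALL=C sort -u > "$tmp_b"
    # comm -13 keeps only the lines unique to the second list.
    LC_ALL=C comm -13 "$tmp_a" "$tmp_b"
)

# Usage with the sample tree above:
#   missing_in_first a b
```

Sorting both lists under the same locale (LC_ALL=C here) matters, because comm expects its inputs in the collation order it uses itself.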

Related

BEST Linux script; to rename SRT to name of movie file in same folder; multiple sub folders [closed]

There have been multiple attempts to answer this question, but no correct script can be found.
The problem:
SRT subtitles will not load unless they have the same name as the movie, or the movie name plus .en.srt, .es.srt, .fr.srt, and so on.
There are thousands of movie directories within a main movie directory, and each one sometimes contains one or more .srt files (1_English.srt, 2_English.srt, *French.srt, etc.).
My media server is using Ubuntu, so the solution should use a BASH script.
Here is a snippet of my file structure:
Test-dir$ tree
.
├── renamer.sh
├── Saga.of.the.Phoenix.1990.1080p
│   ├── 1_French.srt
│   ├── 1_Spanish.srt
│   ├── 2_English.srt
│   ├── 3_English.srt
│   └── Saga.of.the.Phoenix.1990.1080p.BluRay.x265.mp4
├── Salt.and.Pepper.1968.1080p
│   ├── 1_French.srt
│   ├── 1_Spanish.srt
│   ├── 2_English.srt
│   ├── 4_English.srt
│   └── Salt.and.Pepper.1968.1080p.mp4
└── Salyut-7.2017.1080p.BluRay.x265
    ├── 2_English.srt
    └── Salyut-7.2017.1080p.BluRay.x265.mp4
The questions:
In writing a BASH script:
1. There are multiple srt files with the same language. I usually choose the bigger file and remove the smaller one, so the first part of the script would have to sort the srt files of the same language and delete the smaller ones. How can I script this?
2. How can I rename the srt files to have the same name as the movie file (not always mp4, sometimes mkv or avi) while appending a language code (en, es, fr, ru, ...)? For example, if the file is English.srt, rename it to "MovieName".en.srt.
I have started the script by moving the srt files out of the Subs directories into their movie directory and then deleting the Subs directories.
I also added commands to delete unwanted parts of the movie name and to delete unwanted files.
#!/bin/bash
# Directory the script is located in
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
# Moves srt files from Subs folders to their movie folder.
for i in */Subs; do
    mv "$i"/* "$i"/..
done
# Removes Subs directories.
find "$DIR"/* -type d -name "Subs" -exec rm -rf {} +
# Removing the additional rar string from the folders and their movie names.
find . -depth -name '*-rar*' -execdir bash -c 'for f; do mv -i "$f" "${f//-rar/}"; done' bash {} +
# Removing unwanted files from all movie folders.
find . -maxdepth 2 -type f \( -name "RAR.txt" -o -name "RAR.nfo" \) -delete
######## Your helper code starts from here to answer questions 1 and 2 #####################
Many thanks for helping with this conundrum, not only will this help one person, but many, on our quest to free many hours of copying, deleting, pasting, and all with a single script.
Update:
BTB91 gave a brilliant answer and it worked; however, to help others learn the many ways to solve the same problem, I would like to keep this thread open.
IFS=$'\n' eval "MOVS=(\$( find \"\$DIR\" -mindepth 1 -maxdepth 1 -type d -printf '%f\n' ))" # list of movies
for M in "${MOVS[@]}" ; do
    cd "$DIR/$M"
    IFS=$'\n' eval "LANGS=(\$( ls | sed -nr 's/.*_([[:alpha:]]+)\.srt\$/\1/p' | sort -u ))" # list of languages for movie
    for L in "${LANGS[@]}" ; do
        IFS=$'\n' eval "FILES=(\$( ls -S *_$L.srt))" # list files for language sorted by size, largest first
        case "${L,,}" in
            en*)
                L=en
                ;;
            sp*|es*)
                L=es
                ;;
        esac
        mv -v "${FILES[0]}" "$M.$L.srt" # keep the largest file under the movie's name
        unset 'FILES[0]'
        rm -vf "${FILES[@]}" # delete the remaining, smaller files
    done
    cd "$OLDPWD"
done
I used "IFS=$'\n' eval ..." because the directory or file names might contain spaces.
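As an alternative to parsing ls output, the "keep the largest subtitle per language" step can be sketched with a plain glob loop, which copes with spaces in names without IFS or eval. This is only a sketch under stated assumptions: the function name keep_largest_srt and its three arguments (language suffix, movie name, language code) are hypothetical, and it is meant to be run from inside one movie directory.

```shell
# Keep only the largest "*_<language>.srt" in the current directory and
# rename it to "<movie>.<code>.srt".
# Arguments (all hypothetical, supplied by the caller):
#   $1 language suffix, e.g. English   $2 movie name   $3 code, e.g. en
keep_largest_srt() (
    lang=$1 movie=$2 code=$3
    best= best_size=-1
    for f in ./*_"$lang".srt; do
        [ -f "$f" ] || continue              # glob matched nothing
        size=$(( $(wc -c < "$f") ))          # size in bytes
        if [ "$size" -gt "$best_size" ]; then
            best=$f best_size=$size
        fi
    done
    [ -n "$best" ] || exit 0                 # no subtitles for this language
    for f in ./*_"$lang".srt; do
        [ "$f" = "$best" ] || rm -f -- "$f"  # drop the smaller duplicates
    done
    mv -- "$best" "./$movie.$code.srt"
)

# Usage, from inside a movie directory:
#   keep_largest_srt English "Salt.and.Pepper.1968.1080p" en
```

If two files have the same size, the first one in glob order is kept.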

Delete all files and folders recursively but excluding one folder

I have a folder structure like this:
my-root-folder
├── .git
├── README.md
├── index.css
├── index.js
├── index.html
├── assets
│ ├── image
│ │ ├── logo.jpg
│ │ ├── header.jpg
I want to delete all files and folders inside my-root-folder except the .git folder (./my-root-folder/.git) and its children.
I have tried this command, but it always deletes everything:
shopt -s extglob && rm -rf ./* !(".git")
Side note: I'm in a Docker image based on Alpine.
You can try with find:
find my-root-folder -mindepth 1 -not -path "*/.git*" -delete
Roughly translated, this means: find everything inside my-root-folder that is at least one level deep (so, excluding the folder itself) and whose path does not contain "/.git", and delete it.
Note that the argument "*/.git*" is a shell glob pattern matched against the whole path, not a regular expression, so it also spares names such as .gitignore and any nested .git directories; be careful when adapting it to other situations. find is a very powerful and useful tool, so be sure to check out its documentation!
This find command deletes nested .git directories as well:
find . -mindepth 1 -path ./.git -prune -o -exec echo rm -rf {} \;
-path ./.git -prune -o excludes only top level .git
One drawback is you may see messages because find tries to access directories it has already deleted.
Remove echo once you are happy.
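A different technique that sidesteps the "already deleted" messages is to let find visit only the top-level entries and leave the recursion to rm -rf. The sketch below assumes a find that supports -mindepth and -maxdepth (GNU find does; check your BusyBox build on Alpine). The function name clean_except_git is hypothetical.

```shell
# Remove every top-level entry of directory $1 except ".git".
# Hidden files are included. Nested ".git" directories further down are
# removed together with their parent directories.
clean_except_git() {
    find "$1" -mindepth 1 -maxdepth 1 ! -name .git -exec rm -rf -- {} +
}

# Usage:
#   clean_except_git my-root-folder
```

Because find never descends below the first level, it does not try to enter directories that rm has just removed.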

How to copy folders and subfolders which have selected files

I have a directory oridir with structure as follows:
.
├── DIRA
│   ├── DIRA1
│   │   └── file2.txt
│   └── DIRA2
│       ├── file1.xls
│       └── file1.txt
└── DIRB
    ├── DIRB1
    │   └── file1.txt
    └── DIRB2
        └── file2.xls
I have to copy files which have extension .xls while maintaining the directory structure. So I need to have following directory and files in a newdir folder:
.
├── DIRA
│   └── DIRA2
│       └── file1.xls
└── DIRB
    └── DIRB2
        └── file2.xls
I tried following command but it copies all files and folders:
cp -r oridir newdir
Finding required files can be done as follows:
$ find oridir | grep xls$
oridir/DIRB/DIRB2/file2.xls
oridir/DIRA/DIRA2/file1.xls
Also as follows:
$ find oridir -type f -iname '*.xls'
./oridir/DIRB/DIRB2/file2.xls
./oridir/DIRA/DIRA2/file1.xls
But how do I create these folders and copy the files? How can I achieve this selective creation of directories and copying of files with bash in Linux?
Edit: There are also spaces in some file and directory names.
cp's --parents flag recreates the full source path under the destination directory (the man page describes it as "use full source file name under DIRECTORY").
For example, if recursive glob ** is enabled (shopt -s globstar):
cp --parents origin/**/*.xls target
If recursive glob is not enabled, you have to add a wildcard for each level on directory hierarchy:
cp --parents origin/*/*/*.xls target
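When globstar is not available or the depth of the hierarchy varies, the same --parents idea can be combined with find. This is a minimal sketch assuming GNU cp (for --parents and -t). The function name copy_xls_tree is hypothetical. It changes into the source directory first, so the paths that --parents recreates are relative to it.

```shell
# Copy every *.xls under source directory $1 into destination directory $2,
# recreating the sub-directory layout. Assumes GNU cp (--parents, -t).
copy_xls_tree() (
    mkdir -p "$2" || exit 1
    dest=$(cd "$2" && pwd) || exit 1   # absolute path, still valid after cd
    cd "$1" || exit 1
    # Paths are relative to the source dir, so --parents rebuilds DIRA/DIRA2/...
    find . -type f -name '*.xls' -exec cp --parents -t "$dest" {} +
)

# Usage:
#   copy_xls_tree oridir newdir
```

find passes each path to cp as a separate argument, so spaces in file and directory names need no special handling.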
Assuming the destination directory is "dest":
foo.sh
#!/bin/bash
dest=./dest
# IFS= and -r keep leading spaces and backslashes in file names intact
find . -type f -name "*.xls" | while IFS= read -r f
do
    d=$(dirname "${f}")
    d="${dest}/${d}"
    mkdir -p "${d}"
    cp "${f}" "${d}"
done
Make dirs and files.
$ mkdir -p DIRA/DIRA1
$ mkdir -p DIRA/DIRA2
$ mkdir -p DIRB/DIRB1
$ mkdir -p DIRB/DIRB2
$ touch DIRA/DIRA1/file2.txt
$ touch DIRA/DIRA2/file1.xls
$ touch DIRA/DIRA2/file1.txt
$ touch DIRB/DIRB1/file1.txt
$ touch DIRB/DIRB1/file2.xls
A result is
$ find dest
dest
dest/DIRB
dest/DIRB/DIRB1
dest/DIRB/DIRB1/file2.xls
dest/DIRA
dest/DIRA/DIRA2
dest/DIRA/DIRA2/file1.xls
See Yuji's excellent answer first, but I think tar is also a good option here:
cd oridir; find . -name "*.xls" | xargs tar c | (cd ../newdir; tar x)
You may need to adjust oridir and/or ../newdir depending on the precise paths of your directories.
Possible improvement: Here is a version that may be better in that it will handle files (and paths) with spaces (or other strange characters) in their names, and that uses tar's own options instead of xargs and cd:
cd oridir; find . -name "*.xls" -print0 | tar -c --null -T- -f - | tar -C ../newdir -x -f -
Explanation:
The -print0 and the --null cause the respective commands to separate filenames by the null (ASCII 0) character only.
-T- causes tar to read filenames from standard input.
-C causes tar to cd before extracting.

How to run a command on file in directory recursively creating a new file in each directory being visited (Bash)?

I have a directory with a few sub-directories.
.
├── alembic
│   └── results.csv
├── bigdata
│   └── results.csv
├── calchipan
│   └── results.csv
I'd like to create a copy of each results.csv file in the same directory named results_cleaned.csv where certain lines will be removed. Each sub-directory is known to contain only a single file, results.csv.
Running this on a single directory works:
find . -name 'results.csv' | xargs grep -v "Pattern" > results_cleaned.csv
However, running the same command on a root, produces just a single results_cleaned.csv file. I understand that I have to specify that I want to create a file in each sub directory individually, how do I do this?
You can't use xargs here, since all the file names are fed to a single grep whose output goes to a single redirection.
It would be better to iterate over each line of find's output (the file names):
find . -name 'results.csv' | while IFS= read -r f
do
    grep -v "Pattern" "$f" > "$(dirname "$f")/results_cleaned.csv"
done
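Another way to get one output file per directory is find's -execdir, which runs the command from the directory that holds each match. This is a sketch that assumes a find with -execdir (GNU find has it, but refuses to run it when PATH contains a relative entry such as "."). The function name clean_results is hypothetical.

```shell
# Write a filtered copy next to every results.csv below directory $1.
# $2 is the grep pattern for the lines to drop.
clean_results() {
    # -execdir runs sh inside each file's own directory, so the relative
    # redirect target lands beside the results.csv being read.
    find "$1" -type f -name 'results.csv' \
        -execdir sh -c 'grep -v -e "$1" "$2" > results_cleaned.csv' sh "$2" {} \;
}

# Usage:
#   clean_results . "Pattern"
```

The pattern and the file name are passed to the inner sh as positional parameters rather than pasted into the script text, which keeps unusual characters in either one from being interpreted by the shell.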

Recursively scan directories in bash

I need to write a bash script that scans the directories in the current directory and generates an md5 checksum for each file in the directory tree. It should also keep the relative path to each file and write the checksums to a file.
For example if directory tree looks like this:
.
├── d
│   ├── file1.c
│   └── file2.c
├── e
│   └── file3.c
└── f
    └── file4.cpp
The output should be something like this:
d8e8fca2dc0f896fd7cb4cb0031ba249 d/file1.c
d8e8fca2dc0f896fd7cb4cb0031ba249 d/file2.c
d8e8fca2dc0f896fd7cb4cb0031ba249 e/file3.c
d8e8fca2dc0f896fd7cb4cb0031ba249 f/file4.cpp
But I can't find a way to keep the path to each file when I cd into the directories...
find . -type f -exec md5sum {} \;
or...
find . -type f | xargs -n 1 -d "\n" md5sum
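Both commands above print paths with a leading "./" and write to the terminal. A sketch that produces the "d/file1.c" form from the question and stores the result in a file could look like this. It assumes GNU find (-printf), GNU sort (-z) and GNU xargs (-0, -r). The function name write_checksums is hypothetical.

```shell
# Write "checksum  relative/path" lines for every file under directory $1
# into file $2, sorted by path.
write_checksums() (
    out=$(mktemp) || exit 1
    # %P is the path with the starting directory removed (no leading "./").
    # NUL separators keep names with spaces or newlines in one piece.
    ( cd "$1" && find . -type f -printf '%P\0' | LC_ALL=C sort -z |
        xargs -0 -r md5sum -- ) > "$out" || { rm -f "$out"; exit 1; }
    mv "$out" "$2"
)

# Usage:
#   write_checksums . checksums.md5
```

The list is first written to a temporary file, so a new output file inside the scanned tree is not picked up half-written; an output file left over from an earlier run would still be checksummed like any other file.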

Resources