Recursively scan directories in bash

I need to write a bash script that scans the directories in the current directory and generates an MD5 checksum for each file in the directory tree. It should also keep the relative path to each file and write the checksums to a file.
For example if directory tree looks like this:
.
├── d
│   ├── file1.c
│   └── file2.c
├── e
│   └── file3.c
└── f
    └── file4.cpp
The output should be something like this:
d8e8fca2dc0f896fd7cb4cb0031ba249 d/file1.c
d8e8fca2dc0f896fd7cb4cb0031ba249 d/file2.c
d8e8fca2dc0f896fd7cb4cb0031ba249 e/file3.c
d8e8fca2dc0f896fd7cb4cb0031ba249 f/file4.cpp
But I can't find a way to keep the path to each file when I cd into the directories...

find . -type f -exec md5sum {} \;
or...
find . -type f | xargs -n 1 -d "\n" md5sum
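The question also wants the checksums saved to a file, with paths like d/file1.c rather than ./d/file1.c. A minimal sketch, assuming GNU find (for -printf '%P') and GNU md5sum; the throwaway tree and the checksums.md5 name are illustrative only:

```shell
#!/usr/bin/env bash
# Sketch: write "checksum  relative/path" for every file in a directory tree.
# Assumes GNU find (-printf '%P') and GNU coreutils (sort -z, md5sum).
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Throwaway tree mirroring the question's example.
mkdir -p "$work/tree/d" "$work/tree/e" "$work/tree/f"
for f in d/file1.c d/file2.c e/file3.c f/file4.cpp; do
  echo test > "$work/tree/$f"
done

# %P prints each path relative to the starting point ("d/file1.c", not "./d/file1.c").
# NUL separators keep names with spaces intact; the output file lives outside
# the scanned tree so it is not hashed itself.
(cd "$work/tree" && find . -type f -printf '%P\0' | LC_ALL=C sort -z | xargs -0 md5sum) \
  > "$work/checksums.md5"
cat "$work/checksums.md5"
```

Because the paths are relative, the list can later be verified with md5sum -c checksums.md5 run from the scanned directory.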

Related

Find files missing in one directory that are in a second - needs to ignore file extension

I need to find files missing in one directory that are in a second, but I need to ignore file extensions. I need to do this based on file name only, not file contents. If my file names were identical (including extensions), I could use diff, something like:
diff dirA dirB
However, files in directory A have a different extension from those in directory B. I need a way to use something like diff but ignore the extension differences between the two directories.
Another important point is that each directory may contain hundreds of thousands of files, so I need a relatively efficient process.
Grateful for any ideas.

I hope it helps.
I created sample files that look like this:
sample/
├── a
│   ├── 1
│   └── 3
└── b
    ├── 1
    ├── 2
    └── 3.txt
$ comm -13 <(find a/ -type f -exec bash -c 'basename "${0%.*}"' {} \; | sort) <(find b/ -type f -exec bash -c 'basename "${0%.*}"' {} \; | sort)
Output:
2
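With hundreds of thousands of files, starting one bash process per file is the slow part. A sketch of the same comm idea that strips extensions in a single sed pass per directory instead; it assumes GNU find (for -printf '%f') and file names without embedded newlines, and stem_list is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: print base names (extension removed) that exist in dir B but not in dir A.
# Assumes GNU find (-printf) and no newlines inside file names.

# stem_list (hypothetical helper): one line per regular file, directory part dropped,
# last extension removed. The (.) before \. keeps dotfiles such as ".bashrc" whole.
stem_list() {
  find "$1" -type f -printf '%f\n' | sed 's/\(.\)\.[^.]*$/\1/' | LC_ALL=C sort -u
}

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/a" "$work/b"
touch "$work/a/1" "$work/a/3" "$work/b/1" "$work/b/2" "$work/b/3.txt"

# comm needs both inputs sorted in the same locale; -13 hides lines unique to A
# and lines common to both, leaving the stems that only B has.
missing=$(LC_ALL=C comm -13 <(stem_list "$work/a") <(stem_list "$work/b"))
printf '%s\n' "$missing"
```

Swapping the two arguments (or using comm -23) lists the stems present in A but missing from B.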

How to copy folders and subfolders which have selected files

I have a directory oridir with structure as follows:
.
├── DIRA
│   ├── DIRA1
│   │   └── file2.txt
│   └── DIRA2
│       ├── file1.xls
│       └── file1.txt
└── DIRB
    ├── DIRB1
    │   └── file1.txt
    └── DIRB2
        └── file2.xls
I need to copy only the files with the .xls extension while preserving the directory structure, so newdir should end up with the following directories and files:
.
├── DIRA
│   └── DIRA2
│       └── file1.xls
└── DIRB
    └── DIRB2
        └── file2.xls
I tried the following command, but it copies all files and folders:
cp -r oridir newdir
Finding required files can be done as follows:
$ find oridir | grep xls$
oridir/DIRB/DIRB2/file2.xls
oridir/DIRA/DIRA2/file1.xls
Also as follows:
$ find oridir -type f -iname '*.xls'
./oridir/DIRB/DIRB2/file2.xls
./oridir/DIRA/DIRA2/file1.xls
But how do I create these directories and copy the files? How can I achieve this selective creation of directories and copying of files with bash on Linux?
Edit: Some file and directory names also contain spaces.

cp's --parents flag makes it use the full source file name under the destination DIRECTORY.
For example, if recursive glob ** is enabled (shopt -s globstar):
cp --parents origin/**/*.xls target
If recursive glob is not enabled, you have to add a wildcard for each level of the directory hierarchy:
cp --parents origin/*/*/*.xls target
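A runnable sketch of the globstar variant against a throwaway copy of the question's tree (oridir and newdir as in the question; one directory and one file name contain spaces on purpose). It assumes bash 4 or later and GNU cp, and it runs from inside oridir so the copies land at newdir/DIRA/... rather than newdir/oridir/DIRA/...:

```shell
#!/usr/bin/env bash
# Sketch: copy only *.xls files from oridir into newdir, keeping their directories.
# Assumes bash >= 4 (globstar) and GNU cp (--parents); the tree is test data.
shopt -s globstar nullglob

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/oridir/DIRA/DIRA1" "$work/oridir/DIRA/DIRA2" "$work/oridir/DIR B/DIRB2"
touch "$work/oridir/DIRA/DIRA1/file2.txt" "$work/oridir/DIRA/DIRA2/file1.xls" \
      "$work/oridir/DIR B/DIRB2/file 2.xls"
mkdir -p "$work/newdir"   # cp --parents needs an existing destination directory

(
  # Run from inside oridir so the copied paths start at DIRA/..., not oridir/DIRA/...
  cd "$work/oridir" || exit 1
  files=(**/*.xls)        # an array keeps names with spaces as single words
  if (( ${#files[@]} > 0 )); then
    cp --parents -- "${files[@]}" ../newdir/
  fi
)
```

nullglob makes the array empty when nothing matches, so cp is not handed a literal **/*.xls pattern.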

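Assuming the destination directory is dest:
foo.sh
#!/bin/bash
dest=./dest
# Prune the destination so files already copied there are not found again.
find . -path "$dest" -prune -o -type f -name "*.xls" -print | while IFS= read -r f
do
    d=$(dirname "${f}")
    d="${dest}/${d}"
    mkdir -p "${d}"
    cp "${f}" "${d}"
done
Create the directories and files: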
$ mkdir -p DIRA/DIRA1
$ mkdir -p DIRA/DIRA2
$ mkdir -p DIRB/DIRB1
$ mkdir -p DIRB/DIRB2
$ touch DIRA/DIRA1/file2.txt
$ touch DIRA/DIRA2/file1.xls
$ touch DIRA/DIRA2/file1.txt
$ touch DIRB/DIRB1/file1.txt
$ touch DIRB/DIRB1/file2.xls
The result is:
$ find dest
dest
dest/DIRB
dest/DIRB/DIRB1
dest/DIRB/DIRB1/file2.xls
dest/DIRA
dest/DIRA/DIRA2
dest/DIRA/DIRA2/file1.xls

See Yuji's excellent answer first, but I think tar is also a good option here:
cd oridir; find . -name "*.xls" | xargs tar c | (cd ../newdir; tar x)
You may need to adjust oridir and/or ../newdir depending on the precise paths of your directories.
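Possible improvement: here is a version that handles files (and paths) with spaces or other unusual characters in their names, and that uses tar's own options instead of xargs and a second cd:
cd oridir && find . -type f -name "*.xls" -print0 | tar -c --null -T - -f - | tar -C ../newdir -x -f -
Explanation:
The -print0 and the --null cause the respective commands to separate file names by the null (ASCII 0) character only. -print0 must come after -name: find evaluates its expression left to right, so a leading -print0 would print every file before the name test is applied.
-T - causes tar to read file names from standard input.
-f - makes the first tar write the archive to standard output and the second tar read it from standard input.
-C causes tar to change to the given directory (which must already exist) before extracting.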
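A sketch of the tar pipe on throwaway data, including a directory and a file whose names contain spaces. It assumes GNU tar (for --null and -T -); the layout mimics the question's oridir:

```shell
#!/usr/bin/env bash
# Sketch: copy only *.xls files from oridir to newdir through a tar pipe.
# Assumes GNU find and GNU tar (--null, -T -); the tree is test data.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/oridir/DIRA/DIRA1" "$work/oridir/DIRA/DIRA2" "$work/oridir/DIR B/DIRB2" "$work/newdir"
touch "$work/oridir/DIRA/DIRA1/file2.txt" "$work/oridir/DIRA/DIRA2/file1.xls" \
      "$work/oridir/DIR B/DIRB2/file 2.xls"

# -print0 comes after the tests so only matching files are emitted;
# --null must precede -T so tar splits the list on NUL bytes.
(cd "$work/oridir" && find . -type f -name '*.xls' -print0 |
  tar -cf - --null -T - | tar -C ../newdir -xf -)
```

tar recreates the intermediate directories on extraction, so no separate mkdir step is needed beyond creating newdir itself.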

How to run a command on file in directory recursively creating a new file in each directory being visited (Bash)?

I have a directory with a few sub-directories.
.
├── alembic
│   └── results.csv
├── bigdata
│   └── results.csv
├── calchipan
│   └── results.csv
I'd like to create a copy of each results.csv file in the same directory named results_cleaned.csv where certain lines will be removed. Each sub-directory is known to contain only a single file, results.csv.
Running this on a single directory works:
find . -name 'results.csv' | xargs grep -v "Pattern" > results_cleaned.csv
However, running the same command from the root directory produces just a single results_cleaned.csv file. I understand that I have to create a file in each subdirectory individually; how do I do this?

You can't use xargs here: the > redirection applies to the whole pipeline, so every grep that xargs starts writes into the same results_cleaned.csv.
It is better to iterate over find's output, one file name per line:
find . -name 'results.csv' | while IFS= read -r f
do
    grep -v "Pattern" "$f" > "$(dirname "$f")/results_cleaned.csv"
done
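The same per-directory output can be produced without a read loop by letting find hand batches of paths to a small sh script, which also copes with unusual characters in names. A sketch; "Pattern" is the placeholder from the question, and the sample CSV contents are made up for the demo:

```shell
#!/usr/bin/env bash
# Sketch: write a filtered results_cleaned.csv next to every results.csv.
# "Pattern" is the question's placeholder; the CSV contents are demo data.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
for d in alembic bigdata calchipan; do
  mkdir -p "$work/$d"
  printf 'keep 1\nPattern line\nkeep 2\n' > "$work/$d/results.csv"
done

# With "{} +", find passes many paths at once; the inner loop handles each one
# and redirects into the directory that contains it.
find "$work" -type f -name 'results.csv' -exec sh -c '
  for f in "$@"; do
    # ${f%/*} strips "/results.csv", leaving the containing directory
    grep -v "Pattern" "$f" > "${f%/*}/results_cleaned.csv"
  done
' sh {} +
```

The redirection sits inside the loop, so each grep gets its own output file instead of sharing one.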

Move files to parent directory of current location

I have a lot of folders that have a folder inside them, with files inside. I want to move the 2nd level files into the 1st level and do so without knowing their names.
Simple example:
Before running a script:
/temp/1stlevel/test.txt
/temp/1stlevel/2ndlevel/test.rtf
After running a script:
/temp/1stlevel/test.txt
/temp/1stlevel/test.rtf
I'm getting very close but I'm missing something and I'm sure it's simple/stupid. Here's what I'm running:
find . -mindepth 3 -type f -exec sh -c 'mv -i "$1" "${1%/*}"' sh {} \;
Here's what that's getting me:
mv: './1stlevel/2ndlevel/test.rtf' and './1stlevel/2ndlevel/test.rtf' are the same file
Any suggestions?
UPDATE: George, this is great stuff, thank you! I'm learning a lot and taking notes. Using the mv command instead of the more complicated one is brilliant. Far from the first time I've been accused of doing something the hardest way possible!
However, while it works great with 1 set of folders, if I have more, it doesn't work as intended. Here's what I mean:
Before:
new
└── temp
    ├── Folder1
    │   ├── SubFolder1
    │   │   └── SubTest1.txt
    │   └── Test1.txt
    ├── Folder2
    │   ├── SubFolder2
    │   │   └── SubTest2.txt
    │   └── Test2.txt
    └── Folder3
        ├── SubFolder3
        │   └── SubTest3.txt
        └── Test3.txt
After:
new
└── temp
    └── Folder3
        ├── Folder1
        │   ├── SubFolder1
        │   └── Test1.txt
        ├── Folder2
        │   ├── SubFolder2
        │   └── Test2.txt
        ├── SubFolder3
        ├── SubTest1.txt
        ├── SubTest2.txt
        ├── SubTest3.txt
        └── Test3.txt
Desired:
new
└── temp
    ├── Folder1
    │   ├── SubFolder1
    │   ├── SubTest1.txt
    │   └── Test1.txt
    ├── Folder2
    │   ├── SubFolder2
    │   ├── SubTest2.txt
    │   └── Test2.txt
    └── Folder3
        ├── SubFolder3
        ├── SubTest3.txt
        └── Test3.txt
If one wanted to get fancy*:
new
└── temp
    ├── Folder1
    │   ├── SubTest1.txt
    │   └── Test1.txt
    ├── Folder2
    │   ├── SubTest2.txt
    │   └── Test2.txt
    └── Folder3
        ├── SubTest3.txt
        └── Test3.txt
I don't need to get fancy, though, 'cause later in my script I just remove empty folders.
BTW, that took me forever in Notepad++ to draw. What did you use?

Your find . -mindepth 3 -type f -exec sh -c 'mv -i "$1" "${1%/*}"' sh {} \;
attempt is very close to being right. 
A useful technique when debugging complex commands
is to insert echo statements to see what is happening. 
So, if we say
$ find . -mindepth 3 -type f -exec sh -c 'echo mv -i "$1" "${1%/*}"' sh {} \;
we get
mv -i ./Folder1/SubFolder1/SubTest1.txt ./Folder1/SubFolder1
mv -i ./Folder2/SubFolder2/SubTest2.txt ./Folder2/SubFolder2
mv -i ./Folder3/SubFolder3/SubTest3.txt ./Folder3/SubFolder3
which makes perfect sense — it’s finding all the files at depth 3 (and beyond),
stripping the last level off the pathname, and moving the file there. 
But mv (path_to_file) (path_to_directory) means
move the file into the directory.
So the command mv -i ./Folder1/SubFolder1/SubTest1.txt ./Folder1/SubFolder1
means move Folder1/SubFolder1/SubTest1.txt into Folder1/SubFolder1 —
but that’s where it already is. 
Therefore, you got error messages saying
that you were moving a file to where it already was.
As is clear from your illustration,
you want to move SubTest1.txt into Folder1. 
One quick fix is
$ find . -mindepth 3 -type f -exec sh -c 'mv -i "$1" "${1%/*}/.."' sh {} \;
which uses .. to go up from SubFolder1 to Folder1:
mv -i ./Folder1/SubFolder1/SubTest1.txt ./Folder1/SubFolder1/..
mv -i ./Folder2/SubFolder2/SubTest2.txt ./Folder2/SubFolder2/..
mv -i ./Folder3/SubFolder3/SubTest3.txt ./Folder3/SubFolder3/..
I believe that that’s bad style, although I can’t figure out quite why. 
I would prefer
$ find . -mindepth 3 -type f -exec sh -c 'mv -i "$1" "${1%/*/*}"' sh {} \;
which uses %/*/* to remove two components from the pathname of the file
to get what you really want,
mv -i ./Folder1/SubFolder1/SubTest1.txt ./Folder1
mv -i ./Folder2/SubFolder2/SubTest2.txt ./Folder2
mv -i ./Folder3/SubFolder3/SubTest3.txt ./Folder3
You can then use
$ find . -mindepth 2 -type d -delete
to delete the empty SubFolderN directories. 
If, through some malfunction, any of them is not empty,
find will leave it alone and issue a warning message.
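The two commands above can be tried on a throwaway copy of the Folder/SubFolder layout from the question before touching real data. A sketch, assuming GNU find (for -delete); the directory and file names come from the question's example:

```shell
#!/usr/bin/env bash
# Sketch: recreate the question's layout in a temp dir and apply the ${1%/*/*} fix.
# Assumes GNU find (-delete); names follow the question's example.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
for n in 1 2 3; do
  mkdir -p "$work/temp/Folder$n/SubFolder$n"
  touch "$work/temp/Folder$n/Test$n.txt" "$work/temp/Folder$n/SubFolder$n/SubTest$n.txt"
done

cd "$work/temp" || exit 1
# ${1%/*/*} strips the file name and its parent directory, e.g.
# ./Folder1/SubFolder1/SubTest1.txt -> ./Folder1
find . -mindepth 3 -type f -exec sh -c 'mv -i "$1" "${1%/*/*}"' sh {} \;
# Remove the now-empty SubFolderN directories; a non-empty one would be left alone.
find . -mindepth 2 -type d -delete
```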

Let me use this example to illustrate:
Tree structure:
new
└── temp
    └── 1stlevel
        ├── 2ndlevel
        │   └── text.rtf
        └── test.txt
Move with:
find . -mindepth 4 -type f -exec mv {} ./*/* \;
Result after move:
new
└── temp
    └── 1stlevel
        ├── 2ndlevel
        ├── test.txt
        └── text.rtf
Where you run it from matters: I am running it from one folder above temp. If you want to run it from the temp folder, the command would be:
find 1stlevel/ -mindepth 2 -type f -exec mv {} ./* \;
Or:
find ./ -mindepth 3 -type f -exec mv {} ./* \;
Look closely at the find ./ -mindepth 3 part, and remember that -mindepth 1 means process all files except the starting points. So if you start from temp and are after a file in temp/1stlevel/2ndlevel/, you reach it with -mindepth 3. See man find.
For the destination I used ./*/*, which the shell expands before find runs: from the current directory (one above temp; mine was new) down to temp, then to 1stlevel, so:
./: => new folder
./*: => new/temp folder
./*/*: => new/temp/1stlevel
All of that is for the find command, but another trick is to use only the mv command, run from the new folder:
mv ./*/*/*/* ./*/*
This is run from the new folder in my example (in other words, from one folder above temp). Adjust the number of wildcard levels to run it from a different level.
To run from the temp folder:
mv ./*/*/* ./*
If you're bothered about time, since you mentioned you have a lot of files, the mv option beats the find option. Here are the time results for just three files:
find:
real 0m0.004s
user 0m0.000s
sys 0m0.000s
mv:
real 0m0.001s
user 0m0.000s
sys 0m0.000s
Update:
Since the OP wants a script that handles multiple folders, I came up with this:
#!/usr/bin/env bash
for i in ./*/*/*;
do
    if [[ -d "$i" ]];
    then
        # Move the files up to the parent folder, then remove the now-empty
        # directory (rmdir refuses to delete a directory that still has files)
        mv "$i"/* "${i%/*}/" && rmdir "$i"
    fi
done
Usage: run it from the new folder as ./move.sh; remember to make the script executable with chmod +x move.sh.
Target directory structure:
new
├── move.sh
└── temp
    ├── folder1
    │   ├── subfolder1
    │   │   └── subtext1.txt
    │   └── test1.txt
    ├── folder2
    │   ├── subfolder2
    │   │   └── subtext2.txt
    │   └── test1.txt
    └── folder3
        ├── subfolder3
        │   └── subtext3.txt
        └── test1.txt
Fancy result:
new
├── move.sh
└── temp
    ├── folder1
    │   ├── subtext1.txt
    │   └── text1.txt
    ├── folder2
    │   ├── subtext2.txt
    │   └── text2.txt
    └── folder3
        ├── subtext3.txt
        └── text3.txt
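A sketch of the move.sh loop run against a throwaway copy of the example tree. It swaps rm -rf for rmdir (an editorial change, not the answer's original) so that a subfolder is only deleted if mv really emptied it; the file names are demo data:

```shell
#!/usr/bin/env bash
# Sketch of the move.sh loop on a temp copy of the example tree.
# Uses rmdir instead of rm -rf: a subfolder is removed only once it is empty.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
for n in 1 2 3; do
  mkdir -p "$work/new/temp/folder$n/subfolder$n"
  touch "$work/new/temp/folder$n/test$n.txt" "$work/new/temp/folder$n/subfolder$n/subtext$n.txt"
done

cd "$work/new" || exit 1
for i in ./*/*/*; do
  if [[ -d "$i" ]]; then
    # "${i%/*}" is the parent folder, e.g. ./temp/folder1
    mv -- "$i"/* "${i%/*}/" && rmdir -- "$i"
  fi
done
```

Note that "$i"/* does not match hidden files, so a subfolder containing dotfiles is kept by rmdir rather than silently deleted.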

mv YOUR-FILE-NAME ../
It should work this way if you have write permissions.

Have your script navigate to each directory whose files need to be moved "up", then have find locate each file in that directory and move it up one directory:
$ find . -type f -exec mv {} ../. \;
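A sketch of the "navigate to each directory" idea for the question's temp/FolderN/SubFolderN layout. It adds -maxdepth 1 (not in the answer's command) so find only moves files directly inside each subfolder and not from any deeper directories; it assumes GNU or BSD find, and the tree is demo data:

```shell
#!/usr/bin/env bash
# Sketch: cd into each second-level folder and move its files up one level.
# -maxdepth 1 is an addition to the answer's command; the tree is demo data.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
for n in 1 2 3; do
  mkdir -p "$work/temp/Folder$n/SubFolder$n"
  touch "$work/temp/Folder$n/Test$n.txt" "$work/temp/Folder$n/SubFolder$n/SubTest$n.txt"
done

# The trailing slash makes the glob match directories only.
for sub in "$work"/temp/*/*/; do
  # The subshell keeps each cd local to one iteration.
  (cd "$sub" && find . -maxdepth 1 -type f -exec mv -- {} ../ \;)
done
```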

bash script to rename following a pattern in subdirectories and make a copy

I am trying to do an iterative renaming of certain files in all directories.
homefolder/folder1/ouput/XXXXX_ab.png
homefolder/folder1/ouput/XXXXX_abcdefg.png
homefolder/folder2/ouput/XXXXX_ab.png
homefolder/folder2/ouput/XXXXX_abcdefg.png
homefolder/folder3/ouput/XXXXX_ab.png
homefolder/folder3/ouput/XXXXX_abcdefg.png
...
homefolder/folder500/ouput/XXXXX_ab.png
homefolder/folder500/ouput/XXXXX_abcdefg.png
I want to take the folder name (e.g. folder1, folder2, ... folder500), use it as a prefix for the two png files, and remove the five Xs at the beginning of each file name.
The png file names follow this pattern:
XXXXX_ab.png
XXXXX_abcdefg.png
so only the first five characters differ between subdirectories, and they will be replaced by the name of the enclosing folder (folder1, folder2, ...).
The results will be:
homefolder/folder1/ouput/folder1_ab.png
homefolder/folder1/ouput/folder1_abcdefg.png
homefolder/folder2/ouput/folder2_ab.png
homefolder/folder2/ouput/folder2_abcdefg.png
homefolder/folder3/ouput/folder3_ab.png
homefolder/folder3/ouput/folder3_abcdefg.png
...
homefolder/folder500/ouput/folder500_ab.png
homefolder/folder500/ouput/folder500_abcdefg.png
At the end of the renaming, copy these two newly renamed files into another folder inside homefolder, for example all_png_folder.
find . -iname "*_ab.png" -exec rename _ab.png folder1_ab.png '{}' \;
find . -name "*_ab.png" -exec cp {} ./all_png_folder \;

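Here is a start; the copying at the end should be a trivial addition.
#!/usr/bin/env bash
# Collect the matches first (NUL-separated, bash >= 4.4) so renamed files
# are not picked up again and names with spaces survive.
mapfile -d '' files < <(find . -type f \( -name "*_ab.png" -o -name "*_abcdefg.png" \) -print0)
for file in "${files[@]}"; do
    # Second path component, e.g. ./folder1/ouput/x.png -> folder1
    foldername=$(cut -d '/' -f 2 <<< "$file")
    # The name of the png file minus the leading XXXXX_ prefix
    pngfile=$(basename "$file" | cut -d '_' -f 2-)
    destinationdir=$(dirname "$file")
    mv -- "$file" "$destinationdir/${foldername}_${pngfile}"
done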
Demo
$ tree
.
├── folder1
│   └── ouput
│       ├── foo_bar.png
│       ├── xxxxx_abcdefg.png
│       └── xxxxx_ab.png
├── folder2
│   └── ouput
│       ├── xxxxx_abcdefg.png
│       └── xxxxx_ab.png
└── rename.sh
4 directories, 6 files
$ ./rename.sh
$ tree
.
├── folder1
│   └── ouput
│       ├── folder1_abcdefg.png
│       ├── folder1_ab.png
│       └── foo_bar.png
├── folder2
│   └── ouput
│       ├── folder2_abcdefg.png
│       └── folder2_ab.png
└── rename.sh
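The copy step could be sketched like this. It assumes the command runs from homefolder and that all_png_folder (the example name from the question) sits directly inside it; the -prune stops find from descending into the destination and finding the copies again. The tree is demo data that already has the renamed files:

```shell
#!/usr/bin/env bash
# Sketch: gather every renamed png into homefolder/all_png_folder.
# Runs from homefolder; the folders and files below are demo data.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
for n in 1 2; do
  mkdir -p "$work/homefolder/folder$n/ouput"
  touch "$work/homefolder/folder$n/ouput/folder${n}_ab.png" \
        "$work/homefolder/folder$n/ouput/folder${n}_abcdefg.png"
done

cd "$work/homefolder" || exit 1
mkdir -p all_png_folder
# Skip the destination itself, then copy each matching file into it.
find . -path ./all_png_folder -prune -o -type f \
  \( -name '*_ab.png' -o -name '*_abcdefg.png' \) -exec cp -- {} all_png_folder/ \;
```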
