Sorting 271,568 Files in Bash Based on File Names

I have a collection of 271,568 files I need to sort, all in the same directory. Luckily, they are all conveniently named based on what folder they should be in.
For example, a small portion of the files might look like:
.
├── file.sort.shamwow
├── file.sort.shamwow.abc
├── file.sort.shamwow.example.alsoafile
├── file.sort.shamwow.example.file
├── foo.bar
├── foo.bar.a
├── foo.bar.b
├── foo.lel
├── foo.wow.a.50
└── foo.wow.b
When they are finished being sorted, they should look like:
.
├── file
│   └── sort
│       └── shamwow
│           ├── example
│           │   ├── file.sort.shamwow.example.alsoafile
│           │   └── file.sort.shamwow.example.file
│           ├── file.sort.shamwow
│           └── file.sort.shamwow.abc
└── foo
    ├── bar
    │   ├── foo.bar
    │   ├── foo.bar.a
    │   └── foo.bar.b
    ├── foo.lel
    └── wow
        ├── foo.wow.a.50
        └── foo.wow.b
So that file foo.wow.a.50 would be placed inside of directory wow that is inside of directory foo, and so on for all the files.
The program I want would sort the files into directories based on where the dots are. However, if a folder would contain only one file (e.g. foo/wow/a for foo.wow.a.50), it should not create a new folder just for that file.
Right now, my half-functional script looks like:
#!/bin/bash
#organization for gigantic folder
> foo.txt
for f in *; do
    d=${f:3}
    d=${d%%.*}
    d=${d%%.*}
    echo $d
    if grep -Fxq "$d" foo.txt
    then
        mkdir -p $d
        mv $f $d
    else
        echo $d >> foo.txt
    fi
done
rm foo.txt
But it doesn't really work that well.
Can someone either fix my code, or make their own to sort this mess? Thanks!

Ignoring that your requested output is impossible to represent on a filesystem (requires the same names to refer to both a file and a directory):
#!/bin/bash
# ^^^^- must be a bash shebang; needs bash 4.0 or newer (associative arrays)
# first pass: count directory references
declare -A refcounts=( )
for f in *; do
  f_part=$f
  while [[ $f_part = *.* ]]; do
    refcounts[$f_part]=$(( ${refcounts[$f_part]} + 1 ))
    f_part=${f_part%.*}
  done
  refcounts[$f_part]=$(( ${refcounts[$f_part]} + 1 ))
done
# second pass: use that information
# ...this is some ugly code, but I don't have the time right now to make it simpler.
for f in *; do
  f_part=${f%%.*}
  f_rest=${f#*.}
  # the argument to ":" is ignored; it only shows the state when running with bash -x
  while : "f=$f; f_part=$f_part; f_rest=$f_rest"; do
    new_piece=${f_rest%%.*}
    [[ $new_piece ]] || break
    f_part_next=${f_part}.$new_piece
    f_rest_next=${f_rest#"$new_piece"}; f_rest_next=${f_rest_next#.}
    if [[ $f_rest = *.* ]] && (( ${refcounts[${f_part_next}]:-0} > 1 )); then
      f_part=$f_part_next
      f_rest=$f_rest_next
    else
      break
    fi
  done
  dest="${f_part//"."/"/"}/${f_rest}"
  mkdir -p -- "${dest%/*}"
  mv -- "$f" "$dest"
done
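The first pass above is what decides which directories are worth creating. Here is a minimal sketch of just that counting step, assuming bash 4 or newer; count_prefixes is a hypothetical name used for illustration and is not part of the script above. It takes file names as arguments and prints how many names share each dotted prefix.

```shell
#!/bin/bash
# Hypothetical helper (not from the answer): count how many of the given
# names share each dotted prefix. Requires bash 4+ for associative arrays.
count_prefixes() {
    local -A counts=()
    local name prefix
    for name in "$@"; do
        prefix=$name
        while :; do
            counts[$prefix]=$(( ${counts[$prefix]:-0} + 1 ))
            [[ $prefix == *.* ]] || break   # no dots left: this was the top level
            prefix=${prefix%.*}             # drop the last dotted component
        done
    done
    for prefix in "${!counts[@]}"; do
        printf '%s %d\n' "$prefix" "${counts[$prefix]}"
    done | sort
}

# Sample names taken from the question.
count_prefixes foo.bar foo.bar.a foo.bar.b foo.lel foo.wow.a.50 foo.wow.b
```

A prefix with a count above 1 (foo.bar or foo.wow in the sample) is shared by several files, which is the condition the second pass checks before descending one level deeper.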

Related

Using bash command to copy files from a subfolder to another

I have the following structure:
.
├── dag_1
│   ├── dag
│   │   ├── current
│   │   └── deprecated
│   └── sparkjobs
│       ├── current
│       │   └── spark_3.py
│       └── deprecated
│           ├── spark_1.py
│           └── spark_2.py
└── dag_2
    ├── dag
    │   ├── current
    │   └── deprecated
    └── sparkjobs
        ├── current
        │   └── spark_3.py
        └── deprecated
            ├── spark_1.py
            └── spark_2.py
I want to create a new folder getting only current spark jobs, my expected output folder is:
.
├── dag_1
│   └── spark_3.py
└── dag_2
    └── spark_3.py
I've tried to use
find /mnt/c/Users/User/Test/ -type f -wholename "sparkjob/current" | xargs -i cp {} /mnt/c/Users/User/Test/output/
However, my command does not copy any files and returns no error. How can I solve this?

Use this. The install command copies the input file into another directory structure, creating the whole tree of directories if necessary, as mkdir -p would do transparently:
(you need to add the wildcard * to the -wholename pattern for find to match any files; note also that the directory in your tree is named sparkjobs, not sparkjob)
find . -type f -wholename "*/sparkjobs/current/*" -exec bash -c '
    dir=${1#./} dir=${dir%%/*} file=${1##*/}
    install -D "$1" "./$dir/$file"
' bash {} \;
Example of what is done:
install -D ./dag_2/sparkjobs/current/spark_3.py ./dag_2/spark_3.py
install -D ./dag_1/sparkjobs/current/spark_3.py ./dag_1/spark_3.py
The source path is only an example; a longer path is not a problem.

First you should check what find returns by removing everything after |. You'll see find doesn't find any files. The reasons:
as the name implies, -wholename matches the whole name, so you need */sparkjob/current/*
according to your tree output, the folder is not named sparkjob but sparkjobs.
I'd start with something like this:
find /mnt/c/Users/User/Test/ -type f -wholename "*/sparkjobs/current/*" -print0 | while IFS= read -r -d '' file; do
    echo mv "$file" "$(realpath "$(dirname "$file")"/../..)"
done
I added an echo so you can check all paths and commands are correct.
You may want to trade simplicity for performance. See https://mywiki.wooledge.org/BashFAQ/001 if performance is important (many files or frequent runs).
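To see the effect of the wildcards for yourself, here is a small sketch that builds a throwaway tree and counts matches with and without them. It assumes a find that supports -path (the POSIX spelling; GNU find treats -wholename as a synonym) and uses mktemp for the scratch directory. The variable names are made up for the demonstration.

```shell
#!/bin/bash
# Build a throwaway tree whose names follow the layout in the question.
demo=$(mktemp -d)
mkdir -p "$demo/dag_1/sparkjobs/current" "$demo/dag_1/sparkjobs/deprecated"
touch "$demo/dag_1/sparkjobs/current/spark_3.py" \
      "$demo/dag_1/sparkjobs/deprecated/spark_1.py"

# Without wildcards the pattern has to equal the entire path, so nothing matches.
no_wildcards=$(( $(find "$demo" -type f -path 'sparkjobs/current' | wc -l) ))

# With a wildcard on each side the pattern matches files below sparkjobs/current.
with_wildcards=$(( $(find "$demo" -type f -path '*/sparkjobs/current/*' | wc -l) ))

echo "without wildcards: $no_wildcards"
echo "with wildcards: $with_wildcards"
rm -rf "$demo"
```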

You'll want to do the following (bash -c rather than sh -c, because the ${f/pattern/} substitution is a bash feature and not POSIX sh):
mkdir ../new_folder
find . -type f \
    -path '*/sparkjobs/current/*' \
    -exec bash -c 'f=$1
        new=${f/sparkjobs\/current\//}
        dest="../new_folder/$(dirname "$new")"
        mkdir -p "$dest"
        cp -v "$f" "$dest"' bash '{}' \;
which prints:
‘./dag_1/sparkjobs/current/spark_3.py’ -> ‘../new_folder/./dag_1/spark_3.py’
‘./dag_2/sparkjobs/current/spark_3.py’ -> ‘../new_folder/./dag_2/spark_3.py’

This looks pretty straightforward.
for d in "$old_loc"/dag_*
do mkdir -p "$new_loc/${d##*/}"
   cp "$d"/sparkjobs/current/spark_*.py "$new_loc/${d##*/}"
done

Recursively compressing images in a folder-structure, preserving the folder-structure

I have this folder structure, with really heavy high-quality images in each subfolder
tree -d ./
Output:
./
├── 1_2dia-pedro
├── 3dia-pedro
│   ├── 101MSDCF
│   └── 102MSDCF
├── 4dia-pedro
└── Wagner
    ├── 410_0601
    ├── 411_0701
    ├── 412_0801
    ├── 413_2101
    ├── 414_2801
    ├── 415_0802
    ├── 416_0902
    ├── 417_1502
    ├── 418_1602
    ├── 419_1702
    ├── 420_2502
    └── 421_0103
18 directories
And, I want to compress it, just like I would do with ffmpeg, or imagemagick.
e.g.,
ffmpeg -i %%F -q:v 10 processed/%%F"
mogrify -quality 3 $F.png
I'm currently thinking of building an array of the directories, using shopt, as discussed here
shopt -s globstar dotglob nullglob
printf '%q\n' **/*/
Then, create a new folder-compressed, with the same structure
mkdir folder-compressed
<<iterate the array-vector-out-of-shopt>>
Finally, compress, for each subfolder, something in the lines of
mkdir processed
for f in *.jpg; do
    ffmpeg -i "$f" -q:v 1 processed/"${f%.jpg}.jpg"
done
Also, I read this question, and this procedure seems close to what I would like,
for f in mydir/**/*
do
# operations here
done
Major problem: I'm a bash newbie. I know all the tools needed are at my disposal!
EDIT: There is a program, image_optim, that compresses images losslessly with a one-liner and offers a lot of options for the compression. The caveat: work on a copy of the original folder structure, because it permanently changes the files in the folder you give it.
cp -r ./<folder-structure-files> ./<folder-structure-files-copy>
image_optim -r ./<folder-structure-files-copy>
I think @m0hithreddy's solution is pretty cool, though. I will certainly be using that logic elsewhere soon.

Instead of creating the directories beforehand, you can create the required directories on the fly. Recursive solutions look more elegant to me than loops here. Here is a straightforward approach. I echo the file names and directories to keep track of what's going on. I am not an ffmpeg pro, so I used cp instead, but it should work fine for your use case.
Shell script:
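source=original/
destination=compressed/
# Recursively mirror $source into $destination; $1 is the path relative to $source.
f1() {
    mkdir -p "${destination}${1}"
    # Globs instead of parsing ls, so names containing spaces survive.
    for file in "${source}${1}"*.jpg
    do
        [ -f "$file" ] || continue    # the glob matched nothing
        echo "Original Path: ${file}"
        echo "Compressed Path: ${destination}${1}$(basename "$file")"
        cp "${file}" "${destination}${1}$(basename "$file")"
    done
    for dir in "${source}${1}"*/
    do
        [ -d "$dir" ] || continue     # the glob matched nothing
        echo "Enter sub-directory: ${dir}"
        f1 "${dir#*/}"
    done
}
f1 ''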
Terminal Session:
$ ls
original script.sh
$ tree original/
original/
├── f1
│   ├── f16
│   │   └── f12.jpg
│   ├── f5
│   │   └── t4.jpg
│   └── t3.txt
├── f2
│   └── t5.txt
├── f3
├── f4
│   └── f10
│       ├── f2
│       │   └── f6.jpg
│       └── f3.jpg
├── t1.jpg
└── t2.txt
8 directories, 8 files
$ sh script.sh
Original Path: original/t1.jpg
Compressed Path: compressed/t1.jpg
Enter sub-directory: original/f1/
Enter sub-directory: original/f1/f16/
Original Path: original/f1/f16/f12.jpg
Compressed Path: compressed/f1/f16/f12.jpg
Enter sub-directory: original/f1/f5/
Original Path: original/f1/f5/t4.jpg
Compressed Path: compressed/f1/f5/t4.jpg
Enter sub-directory: original/f2/
Enter sub-directory: original/f3/
Enter sub-directory: original/f4/
Enter sub-directory: original/f4/f10/
Original Path: original/f4/f10/f3.jpg
Compressed Path: compressed/f4/f10/f3.jpg
Enter sub-directory: original/f4/f10/f2/
Original Path: original/f4/f10/f2/f6.jpg
Compressed Path: compressed/f4/f10/f2/f6.jpg
$ tree compressed/
compressed/
├── f1
│   ├── f16
│   │   └── f12.jpg
│   └── f5
│       └── t4.jpg
├── f2
├── f3
├── f4
│   └── f10
│       ├── f2
│       │   └── f6.jpg
│       └── f3.jpg
└── t1.jpg
8 directories, 5 files
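The globstar loop mentioned in the question can do the same job without recursion. Below is a minimal sketch, assuming bash 4 or newer; mirror_jpgs is a hypothetical name, and cp again stands in for the real compression command (ffmpeg, mogrify, or similar), which you would substitute yourself.

```shell
#!/bin/bash
# Hypothetical helper: copy every .jpg under $1 to the same relative path
# under $2, creating directories on the fly. cp is only a stand-in for the
# real compression command.
mirror_jpgs() {
    local src=${1%/} dst=${2%/} file rel
    # globstar makes ** match any depth; nullglob makes an empty match expand
    # to nothing. Both stay enabled in the calling shell afterwards.
    shopt -s globstar nullglob
    for file in "$src"/**/*.jpg; do
        [[ -f $file ]] || continue          # skip directories named *.jpg
        rel=${file#"$src"/}                 # path relative to the source root
        mkdir -p -- "$dst/$(dirname -- "$rel")"
        cp -- "$file" "$dst/$rel"
    done
}
```

Usage would be something like mirror_jpgs original compressed. Directories that contain no .jpg files are not created in the destination, which differs from the recursive script above.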

Renaming files with the same names as directory - bash script

I want to rename my files so that they are named after the folder they are in.
I have a main folder that contains around 1000 folders. Each of these folders has another folder within it. In that innermost folder, I have files with different extensions, and I want to rename the files that have the pdb extension.
Here's the structure of my folders:
pv----|
|--m10\ pk\ result0.pdb result1.pdb result2.pdb
|--m20\ pk\ result0.pdb result1.pdb result2.pdb
|--m30\ pk\ result0.pdb result1.pdb result2.pdb
I want something like this :
pv----|
|--m10\ pk\ m10_result0.pdb m10_result1.pdb m10_result2.pdb
|--m20\ pk\ m20_result0.pdb m20_result1.pdb m20_result2.pdb
|--m30\ pk\ m30_result0.pdb m30_result1.pdb m30_result2.pdb
This is the code I wrote, but it's not working:
for d in MD_PR2/*/*/
do
    (cd "$d" && for file in *.pdb ; do mv "$file" "${file/result/$d_result}" ; done)
done
My code removes "result" from each file's name and I don't know why: the files become 0.pdb, 1.pdb, etc.
Thank you very much.

Before:
user@pc:~$ tree
.
├── m10
│   └── pk
│       ├── result0.pdb
│       ├── result1.pdb
│       └── result2.pdb
├── m20
│   └── pk
│       ├── result0.pdb
│       ├── result1.pdb
│       └── result2.pdb
└── m30
    └── pk
        ├── result0.pdb
        ├── result1.pdb
        └── result2.pdb
Your code is not working because $d_result is being interpreted as a variable name, not as a concatenation of $d and _result. I suggest using ${d}_result.
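A quick way to convince yourself of this, as a small sketch with made-up values (d is set to m10 here purely for illustration, and d_result is set to the empty string to mimic it being unset in the question's script):

```shell
#!/bin/bash
d=m10
d_result=""        # unset in the question's script; empty has the same effect here
file=result0.pdb

# The shell reads $d_result as one variable name, so "result" is replaced by nothing.
wrong="${file/result/$d_result}"
# Braces end the variable name after d, so "_result" is appended literally.
right="${file/result/${d}_result}"

echo "$wrong"
echo "$right"
```

The first echo prints 0.pdb, which is exactly the symptom described in the question; the second prints m10_result0.pdb.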
However I would suggest another approach, one that doesn't need to cd into each directory.
Code:
shopt -s globstar
for file in **; do
    if [[ "$file" =~ ".pdb" ]]; then
        mv "$file" "$(echo "$file" | sed -e 's/\(.*\)\/\(.*\)\/\(.*.pdb\)/\1\/\2\/\1_\2_\3/')"
    fi
done
After:
user@pc:~$ tree
.
├── m10
│   └── pk
│       ├── m10_pk_result0.pdb
│       ├── m10_pk_result1.pdb
│       └── m10_pk_result2.pdb
├── m20
│   └── pk
│       ├── m20_pk_result0.pdb
│       ├── m20_pk_result1.pdb
│       └── m20_pk_result2.pdb
└── m30
    └── pk
        ├── m30_pk_result0.pdb
        ├── m30_pk_result1.pdb
        └── m30_pk_result2.pdb
Code explanation:
shopt -s globstar: Allow for ** to be expanded into "all files and directories recursively"
Variable "file" contains filenames including directories
Check "file" against "$file" =~ ".pdb" to ignore working with directories
Generate newfilename with sed:
Search and replace: s/search/replace/
Find something like dir1/dir2/smthg.pdb: (.*)/(.*)/(.*.pdb)
Replace with dir1/dir2/dir1_dir2_smthg.pdb: \1/\2/\1_\2_\3 (replace with \1_\2_\3 if you also want to move renamed files into parent dir)
(I removed some backslashes for readability)
mv file to newfilename
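The same new name can also be built with parameter expansion alone, without calling sed for every file. This is only a sketch of an alternative, not the code from the answer; new_name is a hypothetical helper that prints the target path and leaves the actual mv to the caller.

```shell
#!/bin/bash
# Hypothetical helper: print dir1/dir2/dir1_dir2_name.pdb for dir1/dir2/name.pdb.
# Only the two directories directly above the file are used for the prefix.
new_name() {
    local path=$1 base dir parent grand
    base=${path##*/}        # file name, e.g. result0.pdb
    dir=${path%/*}          # containing directory, e.g. m10/pk
    parent=${dir##*/}       # last directory component, e.g. pk
    grand=${dir%/*}         # everything above it, e.g. m10
    grand=${grand##*/}      # keep only its last component
    printf '%s/%s_%s_%s\n' "$dir" "$grand" "$parent" "$base"
}

new_name m10/pk/result0.pdb
```

It could be used as mv "$file" "$(new_name "$file")" inside the loop. It assumes every path has at least two directory levels, as in the example tree.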

How to change all hidden folders/files to visible in a multiple sub directories

I have a directory with hundreds of subdirectories, all of which contain hidden files. I need to remove the period at the beginning of each name to make them visible. I found a command that does this when run inside a single directory, but I need to know how to make it work from one directory up.
rename 's/\.//;' .*
I have tried about an hour to modify this to work one level up but don't understand the Perl string enough to do it. If someone could help out I am sure it's simple and I just can't land on the right answer.

This requires a find that supports the + (can use \; instead, which will call rename multiple times), but even POSIX find specifies it:
find -mindepth 1 -depth -exec rename -n 's{/\.([^\/]*$)}{/$1}' {} +
The -depth option prevents directories from being renamed before all the files in them are renamed
-mindepth 1 prevents find from trying to rename the current directory (.).
-n is to just print what would be renamed instead of actually renaming (has to be removed to do the renaming).
The regular expression removes the last period after which there are no forward slashes, if it is preceded by a forward slash.
rename doesn't overwrite existing files, unless the -f ("force") option is used.
For a test directory structure like this:
.
├── .dir1
│   ├── .dir2
│   │   ├── .dir3
│   │   │   └── .file2
│   │   └── .file1
│   ├── file3
│   └── .file6
├── dir5
│   └── .file5
├── .file4
├── test1.bar
└── test1.foo
the output is
rename(./dir5/.file5, ./dir5/file5)
rename(./.file4, ./file4)
rename(./.dir1/.file6, ./.dir1/file6)
rename(./.dir1/.dir2/.file1, ./.dir1/.dir2/file1)
rename(./.dir1/.dir2/.dir3/.file2, ./.dir1/.dir2/.dir3/file2)
rename(./.dir1/.dir2/.dir3, ./.dir1/.dir2/dir3)
rename(./.dir1/.dir2, ./.dir1/dir2)
rename(./.dir1, ./dir1)
and the result after removing -n is
.
├── dir1
│   ├── dir2
│   │   ├── dir3
│   │   │   └── file2
│   │   └── file1
│   ├── file3
│   └── file6
├── dir5
│   └── file5
├── file4
├── test1.bar
└── test1.foo
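If the Perl rename utility is not installed, the same depth-first idea can be sketched in plain bash. This is an illustration under assumptions, not the answer's method: unhide_tree is a hypothetical name, it needs a find with -mindepth and -print0 (GNU and BSD find have both), and it removes only one leading dot per name.

```shell
#!/bin/bash
# Hypothetical helper: strip one leading dot from every hidden name below $1.
# By default it only prints what it would do; pass "apply" as the second
# argument to really rename.
unhide_tree() {
    local root=$1 mode=${2:-dry-run} path dir base target
    # -depth lists a directory's contents before the directory itself,
    # so a path is still valid when its turn comes.
    find "$root" -mindepth 1 -depth -name '.*' -print0 |
    while IFS= read -r -d '' path; do
        dir=${path%/*}
        base=${path##*/}
        target=$dir/${base#.}
        if [[ -e $target ]]; then
            echo "skip: $target already exists" >&2
        elif [[ $mode == apply ]]; then
            mv -- "$path" "$target"
        else
            echo "$path -> $target"
        fi
    done
}
```

Run unhide_tree . first to review the list, then unhide_tree . apply once it looks right.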

safely_unhide:
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw( fileparse );
for (@ARGV) {
   my $o = $_;
   my ($fn, $dir_qfn) = fileparse($_);
   $fn =~ s/^\.//
      or next;
   my $n = "$dir_qfn/$fn";
   if (stat($n)) {
      warn("Skipping \"$o\": \"$n\" already exists\n");
      next;
   }
   elsif (!$!{ENOENT}) {
      warn("Skipping \"$o\": Can't stat \"$n\": $!\n");
      next;
   }
   # rename(OLDNAME, NEWNAME): move the hidden name $o to the visible name $n
   rename($o, $n)
      or warn("Skipping \"$o\": Can't rename to \"$n\": $!\n");
}
Usage:
find -type f -exec safely_unhide {} + # Supports all file names. Requires GNU find
find -type f | xargs safely_unhide # Doesn't support newlines in file names.
find -type f -print0 | xargs -0 safely_unhide # Supports all file names.
Drop -type f and add -depth if you want to rename hidden dirs too.

Recursively remove directories inside folder on same level Linux

My structure is as follows:
├── Proj 1
│   ├── .git
│   ├── LICENSE
│   ├── README.md
│   └── example.cpp
├── Proj 2
│   ├── .git
│   ├── root_folder
│   └── README.md
├── Proj 3
│   ├── .git
│   ├── root_folder
│   └── README.md
...
Why is it when I do a rm -ri \.git it says:
rm: cannot remove `.git': No such file or directory

you could try
rm -ri */.git
(not sure that's what you want)

The semantics of rm's recursive search are not right for finding and deleting directories below the current one. The -ri flag will probably show each file beneath the .git folder right?
Happily if you are using bash, a one-liner with find will do what you need:
find . -name .git -type d -exec bash -c 'read -p "$0: Delete? " -n 1 -r && echo "" && case $REPLY in y) rm -rf "$0" ;; esac' {} \; -prune
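Before deleting anything it may be worth listing what would be hit. A non-interactive sketch of the same search follows; list_git_dirs is a hypothetical name, and the function only prints paths.

```shell
#!/bin/bash
# Hypothetical helper: print every directory named .git below $1.
# -prune stops find from descending into a .git directory once it is found.
list_git_dirs() {
    find "$1" -type d -name .git -prune -print
}
```

Calling list_git_dirs . from the directory that holds the projects prints one line per repository. This also shows why rm -ri .git failed: rm -r recurses into the operands you name, it does not search subdirectories for a matching name, and there is no .git directly in the top-level directory.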
