Linux: Move 1 million files into prefix-based folders

I have a directory called "images" filled with about one million images. Yep.
I want to write a shell command to rename all of those images into the following format:
original: filename.jpg
new: /f/i/l/filename.jpg
Any suggestions?
Thanks,
Dan

for i in *.*; do mkdir -p "${i:0:1}/${i:1:1}/${i:2:1}"; mv "$i" "${i:0:1}/${i:1:1}/${i:2:1}/"; done
The ${i:0:1}/${i:1:1}/${i:2:1} part could be pulled into a variable to avoid the repetition (a sketch follows below), but the command above gets the job done. You'll probably face performance issues; if you really want to use it, narrow the *.* glob to fewer matches per run (a*.*, b*.*, or whatever fits you).
edit: added a $ before i for mv, as noted by Dan
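For instance, the prefix could be factored into a variable like this (a minimal sketch of the same command, untested at this scale):
for i in *.*; do
    d="${i:0:1}/${i:1:1}/${i:2:1}"   # e.g. filename.jpg -> f/i/l
    mkdir -p "$d" && mv -- "$i" "$d/"
done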

You can generate the new file name using, e.g., sed:
$ echo "test.jpg" | sed -e 's/^\(\(.\)\(.\)\(.\).*\)$/\2\/\3\/\4\/\1/'
t/e/s/test.jpg
So, you can do something like this (assuming all the directories are already created):
for f in *; do
    mv -i "$f" "$(echo "$f" | sed -e 's/^\(\(.\)\(.\)\(.\).*\)$/\2\/\3\/\4\/\1/')"
done
or, if you can't use the bash $( syntax:
for f in *; do
    mv -i "$f" "`echo "$f" | sed -e 's/^\(\(.\)\(.\)\(.\).*\)$/\2\/\3\/\4\/\1/'`"
done
However, considering the number of files, you may just want to use perl, as that's a lot of sed and mv processes to spawn:
#!/usr/bin/perl -w
use strict;
# warning: untested
opendir DIR, "." or die "opendir: $!";
my @files = readdir(DIR); # can't change dir while reading: read in advance
closedir DIR;
foreach my $f (@files) {
    next if $f eq '.' or $f eq '..';  # skip the directory entries
    (my $new_name = $f) =~ s!^((.)(.)(.).*)$!$2/$3/$4/$1! or next;  # skip names shorter than 3 chars
    -e $new_name and die "$new_name already exists";
    rename($f, $new_name) or die "rename: $!";  # assumes the target directories exist
}
That perl script can only move files within the same filesystem, since rename() can't cross filesystem boundaries; you can use File::Copy::move to get around that.

You can do it as a bash script:
#!/bin/bash
base=base
mkdir -p "$base/shorts"
for n in *
do
    if [ ${#n} -lt 3 ]
    then
        mv "$n" "$base/shorts"
    else
        dir="$base/${n:0:1}/${n:1:1}/${n:2:1}"
        mkdir -p "$dir"
        mv "$n" "$dir"
    fi
done
Needless to say, the quotes guard against spaces in the names, and names shorter than three characters end up in the shorts directory.

I suggest a short python script. Most shell tools will balk at that much input (though xargs may do the trick). Will update with example in a sec.
#!/usr/bin/python
import os, shutil

src_dir = '/src/dir'
dest_dir = '/dest/dir'
for fn in os.listdir(src_dir):
    if len(fn) < 3:
        continue  # names shorter than three characters would need special handling
    d = os.path.join(dest_dir, fn[0], fn[1], fn[2])
    if not os.path.isdir(d):  # makedirs raises an error if the directory already exists
        os.makedirs(d)
    # use shutil.move instead if you want to move rather than copy
    shutil.copyfile(os.path.join(src_dir, fn), os.path.join(d, fn))

Expanding a wildcard over a million names is at best slow and memory-hungry in the shell, and it will exceed the ARG_MAX limit outright if the glob is passed to an external command. Of the current proposed solutions, the perl one is probably the best.
However, you can easily adapt any of the shell script methods to deal with any number of files thus:
ls -1 | \
while IFS= read -r filename
do
    # insert the loop body of your preference here, operating on "$filename"
done
I would still use perl, but if you're limited to only having simple unix tools around, then combining one of the above shell solutions with a loop like I've shown should get you there. It'll be slow, though.
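For example, combining that loop with the prefix scheme from the first answer might look like this (a sketch; it assumes the names are at least three characters long and contain no newlines):
ls -1 | \
while IFS= read -r filename
do
    d="${filename:0:1}/${filename:1:1}/${filename:2:1}"
    mkdir -p "$d" && mv -- "$filename" "$d/"
done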

Related

Moving files to subfolders based on prefix in bash

I currently have a long list of files, which look somewhat like this:
Gmc_W_GCtl_E_Erz_Aue_Dl_281_heart_xerton
Gmc_W_GCtl_E_Erz_Aue_Dl_254_toe_taixwon
Gmc_W_GCtl_E_Erz_Homersdorf_Dl_201_head_xaubadan
Gmc_W_GCtl_E_Erz_Homersdorf_Dl_262_bone_bainan
Gmc_W_GCtl_E_Thur_Peuschen_Dl_261_blood_blodan
Gmc_W_GCtl_E_Thur_Peuschen_Dl_281_heart_xerton
The names all follow the same pattern. I'm mainly seeking to group the files by the field containing "Aue", "Homersdorf", "Peuschen", and so forth (there are many others down the list); the position of these keywords is always the same (they are all followed by Dl, they all come after the fifth underscore, etc.).
All the files are in the same folder, and I am trying to move these files into subfolders based on these keywords in bash, but I'm not quite certain how. Any help on this would be appreciated, thanks!
I am guessing you want something like this:
$ find . -type f | awk -F_ '{system("mkdir -p "$5"/"$6";mv "$0" "$5"/"$6)}'
This will move, say, Gmc_W_GCtl_E_Erz_Aue_Dl_281_heart_xerton into Erz/Aue/Gmc_W_GCtl_E_Erz_Aue_Dl_281_heart_xerton.
Using the bash shell with a for loop.
#!/usr/bin/env bash
shopt -s nullglob
for file in Gmc*; do
    [[ -d $file ]] && continue
    IFS=_ read -ra dir <<< "$file"
    echo mkdir -pv "${dir[4]}/${dir[5]}" || exit
    echo mv -v "$file" "${dir[4]}/${dir[5]}" || exit
done
Place the script inside the directory in question, make it executable, and execute it.
Remove the echos so it actually creates the directories and moves the files.

I am trying to use mlocate and for loops in bash to search for a multitude of files

So in bash, if I want, I can simply do (where foo is a list of paths to files):
for i in `cat foo`; do ls -lah $i; done
I have a list of files I need to search for. My thought is: why not run them through a for loop with mlocate? I could do:
for i in `cat foo`; do locate $i; done
...but is that the best way to do what I'm trying to do?
Find is SLOW and takes forever when there are millions of files and directories whereas mlocate is super quick.
If files.txt contains a list of absolute paths, one per line, you can do this to ensure they all exist:
set -o errexit
mapfile -t < files.txt
for path in "${MAPFILE[@]}"
do
    [[ -e "$path" ]]
done
You can then expand on this if you want to do certain things with existing/non-existing files:
if [[ -e "$path" ]]
then
    …
else
    …
fi
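For example, to report found files and collect the missing ones (a sketch; missing.txt is just an illustrative name):
for path in "${MAPFILE[@]}"
do
    if [[ -e "$path" ]]
    then
        echo "found: $path"
    else
        echo "$path" >> missing.txt   # collect paths that no longer exist
    fi
done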
If files.txt is so huge that the list does not fit in memory, you can use a much slower while read loop:
while read -r -u 9 path
do
    [[ -e "$path" ]]
done 9< files.txt
If speed really is of the essence you probably want to do this in a different language, like Java or Rust.
On a technical note, mlocate is fast because it queries a pre-generated list of files on your system, but its database does not stay in sync with the actual filesystem contents automatically. Instead you need to run updatedb to populate the database with the current filesystem contents. This is usually done by a root cron job daily.
In terms of style, backticks are deprecated in favor of $(COMMAND), and Use More Quotes™.
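Applied to the loop in the question, that advice would look something like this (a sketch; it assumes one pattern per line in foo):
sudo updatedb                  # optional: bring the mlocate database up to date first
while IFS= read -r name
do
    locate -- "$name"          # -- keeps patterns starting with - from being read as options
done < foo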

linux shell command mv many files

I have many files like 1a1, 2a2, 3a3 and I want to mv them to 1b1, 2b2, 3b3. That means replacing 'a' with 'b' in these file names.
I have tried the command like:
for f in */*; do
mv "$f" "${f/a/b}"
done
ls | xargs -i mv {} ${{}/a/b}
ls | xargs -i mv {} \`echo {}|tr -t 'a' 'b'\`
but none works.
I know a command
rename 'a' 'b' *
can work.
But I still want to figure out how to use for and xargs, combined with other commands, to do this work. After all, in everyday use they are much more general than the simple rename command.
Please help me, thanks.
#!/bin/bash
for old in *
do
    new=$(echo "$old" | sed -e 's/a/b/')
    echo mv "$old" "$new" >&2   # log the transformation to stderr
    mv "$old" "$new"
done
This example lets you work up to more complex name transformations as you learn how to use the sed(1) command.
The loop walks over every name matched by the glob; on each iteration it derives the new variable new from the original $old name, then executes mv with the old and new values.
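Since the question also asks about xargs: the usual trick is to have xargs hand each name to a small shell that performs the substitution (a sketch; it assumes no newlines in the file names):
printf '%s\n' * | xargs -I{} bash -c 'mv -- "$1" "${1/a/b}"' _ {}
Here bash -c receives each file name as $1 (the _ fills $0), so the ${1/a/b} expansion and the quoting both happen safely inside the inner shell.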
Just in case you want to know with rename:
rename 's/(.*)a(.*)/$1b$2/' *
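If your rename is the Perl File::Rename variant (the one that accepts an expression like the above), you can preview the result with -n before running it for real:
rename -n 's/(.*)a(.*)/$1b$2/' *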

awk system syntax throwing error

I'm trying to move all the files in my directory individually into new directories phonex, renaming them at the same time to phonex.txt.
so e.g.
1.txt, 2.txt, jim.txt
become:
phone1.txt in directory phone1
phone2.txt in directory phone2
phone3.txt in directory phone3.
I'm a newbie to awk. I have managed to create the directories, but I cannot get the rename right.
I have tried:
ls|xargs -n1|awk ' {i++;system("mkdir phone"i);system("mv "$0" phone”i ”.txt -t phone"i)}'
which errors with lots of:
mv: cannot stat `phone”i': No such file or directory
and:
ls|xargs -n1|awk ' {i++;system("mkdir phone"i);system("mv "$0" phone”i ”/phone"i”.txt”)}'
error:
awk: 1: unexpected character 0xe2
xargs: /bin/echo: terminated by signal 13
Can anyone help me finish it off? TIA!
Piping ls into xargs into awk is completely unnecessary in this scenario. What you are trying to do can be accomplished using a simple loop:
for f in *.txt; do
    i=$((i+1))
    dir="phone$i"
    mkdir "$dir" && mv "$f" "$dir/$dir.txt"
done
Depending on your shell, the increment of $i can be done in a different way (like ((++i)) in bash) but this way is POSIX-compliant so should work on any modern shell.
By the way, the reason for your original error is that you are using curly quotes (”), which are not understood by the shell. Use only straight single (') and double (") quotes in the shell.
If you want it as a one-liner with incrementing numbers (otherwise Tom Fenech's way is the way to go!):
for i in *.txt;do d=phone$((++X));mkdir "$d"; mv "$i" "$d/$d.txt";done
Also, you may want to set X to zero (X=0) before doing this, in case it is already set to something else.

Move files and rename - one-liner

I'm encountering many files with the same content and the same name on some of my servers. I need to quarantine these files for analysis so I can't just remove the duplicates. The OS is Linux (centos and ubuntu).
I enumerate the file names and locations and put them into a text file.
Then I do a for statement to move the files to quarantine.
for file in $(cat bad-stuff.txt); do mv $file /quarantine ;done
The problem is that they have the same file name and I just need to add something unique to the filename to get it to save properly. I'm sure it's something simple but I'm not good with regex. Thanks for the help.
Since you're using Linux, you can take advantage of GNU mv's --backup.
while read -r file
do
    mv --backup=numbered "$file" "/quarantine"
done < "bad-stuff.txt"
Here's an example that shows how it works:
$ cat bad-stuff.txt
./c/foo
./d/foo
./a/foo
./b/foo
$ while read -r file; do mv --backup=numbered "$file" "./quarantine"; done < "bad-stuff.txt"
$ ls quarantine/
foo foo.~1~ foo.~2~ foo.~3~
$
I'd use this:
for file in $(cat bad-stuff.txt); do mv "$file" "/quarantine/$(basename "$file").$(date -u +%s%N)"; done
You'll get every file with a timestamp appended (in nanoseconds); basename strips the directory part so the destination lands directly in /quarantine. As in the answer above, a while read loop would be safer for names containing spaces.
You can create a new file name composed of the directory and the file name. Thus you can add one more argument to your original code:
for ...; do mv $file /quarantine/$(echo $file | sed 's:/:_:g') ; done
Please note that you should replace the _ with a character distinctive enough not to collide with existing file names.
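Put together with a read loop instead of the original for (a sketch, so that names containing spaces survive):
while IFS= read -r file
do
    mv -- "$file" "/quarantine/$(echo "$file" | sed 's:/:_:g')"
done < bad-stuff.txt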
