Moving files to subfolders based on prefix in bash - linux

I currently have a long list of files, which look somewhat like this:
Gmc_W_GCtl_E_Erz_Aue_Dl_281_heart_xerton
Gmc_W_GCtl_E_Erz_Aue_Dl_254_toe_taixwon
Gmc_W_GCtl_E_Erz_Homersdorf_Dl_201_head_xaubadan
Gmc_W_GCtl_E_Erz_Homersdorf_Dl_262_bone_bainan
Gmc_W_GCtl_E_Thur_Peuschen_Dl_261_blood_blodan
Gmc_W_GCtl_E_Thur_Peuschen_Dl_281_heart_xerton
The names all follow the same pattern, and I'm mainly seeking to group the files based on the part with "Aue", "Homersdorf", "Peuschen", and so forth (there are many others down the list). The position of these keywords is always the same (e.g. they are always followed by Dl; they always come after the fifth underscore; etc.).
All the files are in the same folder, and I am trying to move these files into subfolders based on these keywords in bash, but I'm not quite certain how. Any help on this would be appreciated, thanks!

I am guessing you want something like this:
$ find . -maxdepth 1 -type f | awk -F_ '{system("mkdir -p "$5"/"$6";mv "$0" "$5"/"$6)}'
This will move, say, Gmc_W_GCtl_E_Erz_Aue_Dl_281_heart_xerton into Erz/Aue/Gmc_W_GCtl_E_Erz_Aue_Dl_281_heart_xerton. The -maxdepth 1 keeps find from descending into the subfolders it has just created.
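Note that system() builds the command by pasting the file name into a string, so it would break on names containing spaces or shell metacharacters (not the case in the sample, but worth knowing). A variant that passes the names as arguments instead, sketched under the same underscore-layout assumptions:
find . -maxdepth 1 -type f -name 'Gmc_*' -exec bash -c '
  for f in "$@"; do
    base=${f#./}                    # strip the leading ./ that find prints
    IFS=_ read -ra p <<< "$base"    # split the name on underscores
    mkdir -p -- "${p[4]}/${p[5]}"   # e.g. Erz/Aue
    mv -- "$f" "${p[4]}/${p[5]}/"
  done
' bash {} +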

Using the bash shell with a for loop.
#!/usr/bin/env bash
shopt -s nullglob
for file in Gmc*; do
[[ -d $file ]] && continue
IFS=_ read -ra dir <<< "$file"
echo mkdir -pv "${dir[4]}/${dir[5]}" || exit
echo mv -v "$file" "${dir[4]}/${dir[5]}" || exit
done
Place the script inside the directory in question, make it executable, and execute it.
With the echos in place it only prints what it would do; remove them so it actually creates the directories and moves the files.
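A dry run over the sample files should print something like:
mkdir -pv Erz/Aue
mv -v Gmc_W_GCtl_E_Erz_Aue_Dl_254_toe_taixwon Erz/Aue
mkdir -pv Erz/Aue
mv -v Gmc_W_GCtl_E_Erz_Aue_Dl_281_heart_xerton Erz/Aue
and so on, one mkdir/mv pair per file.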

Related

I am trying to use mlocate and for loops in bash to search for a multitude of files

So in bash, if I want, I can simply do (where foo is a file containing a list of paths to files):
for i in `cat foo`; do ls -lah $i; done
I have a list of files I need to search for. My thought is; why not run them through a for loop with mlocate? I could do:
for i in `cat foo`; do locate $i; done
...but is that the best way to do what I'm trying to do?
find is SLOW and takes forever when there are millions of files and directories, whereas mlocate is super quick.
If files.txt contains a list of absolute paths, one per line, you can do this to ensure they all exist:
set -o errexit
mapfile -t < files.txt
for path in "${MAPFILE[@]}"
do
[[ -e "$path" ]]
done
You can then expand on this if you want to do certain things with existing/non-existing files:
if [[ -e "$path" ]]
then
…
else
…
fi
If files.txt is so huge that the list does not fit in memory, you can use a much slower while read loop:
while read -r -u 9 path
do
[[ -e "$path" ]]
done 9< files.txt
If speed really is of the essence you probably want to do this in a different language, like Java or Rust.
On a technical note, mlocate is fast because it queries a pre-generated list of files on your system, but its database does not stay in sync with the actual filesystem contents automatically. Instead you need to run updatedb to populate the database with the current filesystem contents. This is usually done by a root cron job daily.
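Since each locate invocation scans the whole database, querying once with many patterns should be much faster than calling locate in a loop. A sketch, assuming GNU xargs and mlocate:
# one database scan for all names in foo; -e prints only entries
# that still exist on disk (the database may be stale)
xargs -a foo -d '\n' locate -e
locate treats each argument as a separate pattern and prints entries matching any of them, so this replaces the whole for loop with a single process.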
In terms of style, backticks are deprecated in favor of $(COMMAND), and Use More Quotes™.

script to move files based on extension criteria

I have a number of files that always share the same name but have different extensions, for example sample.dat, sample.txt, etc.
I would like to create a script that looks where sample.dat is present and then moves all files named sample*.* into another directory.
I know how to identify them with ls *.dat | sed 's/\(.*\)\..*/\1/', but I would like to concatenate that with something like || mv (the result of the first part) *.* /otherdirectory/
You can use this bash one-liner:
for f in $(ls | grep YOUR_PATTERN); do mv "${f}" NEW_DESTINATION_DIRECTORY/"${f}"; done
It iterates through the result of the operation ls | grep, which is the list of your files you wish to move, and then it moves each file to the new destination.
Something simple like this?
dat_roots=$(ls *.dat | sed 's/\.dat$//')
for i in $dat_roots; do
echo mv ${i}*.* other-directory
done
This will break for file names containing spaces, so be careful.
Or if spaces are an issue, this will do the job, but is less readable.
ls *.dat | sed 's/\.dat$//' | while read -r root; do
mv "${root}"*.* other-directory
done
Not tested, but this should do the job:
shopt -s nullglob
for f in *.dat
do
mv "${f%.dat}".* other-directory
done
Setting the nullglob option ensures that the loop is not executed if no .dat file exists. If you use this code as part of a larger script, you might want to unset it afterwards (shopt -u nullglob).
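If you would rather not touch the shell's option state at all, you can confine the whole thing to a subshell (a minor stylistic variant of the same approach):
(
  shopt -s nullglob
  for f in *.dat
  do
      mv "${f%.dat}".* other-directory
  done
)
# the shopt change dies with the subshell; nothing to unset here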

Delete files in one directory that do not exist in another directory or its child directories

I am still a newbie in shell scripting and trying to come up with a simple script. Could anyone give me some direction here? Here is what I need.
Files in path 1: /tmp
100abcd
200efgh
300ijkl
Files in path 2: /home/storage
backupfile_100abcd_str1
backupfile_100abcd_str2
backupfile_200efgh_str1
backupfile_200efgh_str2
backupfile_200efgh_str3
Now I need to delete the file 300ijkl in /tmp, as the corresponding backup file is not present in /home/storage. The /tmp directory contains more than 300 files. I need to delete every file in /tmp for which no corresponding backup file exists; the file names in /tmp will match file names in /home/storage or in directories under /home/storage.
Appreciate your time and response.
You can also approach the deletion using grep. You can loop through the files in /tmp, checking with ls piped to grep, and deleting if there is no match:
#!/bin/bash
[ -z "$1" -o -z "$2" ] && { ## validate input
printf "error: insufficient input. Usage: %s tmpfiles storage\n" "${0//*\//}"
exit 1
}
for i in "$1"/*; do
fn=${i##*/} ## strip path, leaving filename only
## if file in backup matches filename, skip rest of loop
ls "${2}"* | grep -q "$fn" && continue
printf "removing %s\n" "$i"
# rm "$i" ## remove file
done
Note: the actual removal is commented out above; test and ensure there are no unintended consequences before performing the actual delete. Call it passing the path to tmp (without a trailing /) as the first argument and /home/storage as the second argument:
$ bash scriptname /path/to/tmp /home/storage
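One caveat: grep treats $fn as a regular expression, so a name containing a dot or another metacharacter could match more than intended. Passing -F makes it a fixed-string match, a small hardening of the check above:
ls "${2}"* | grep -qF "$fn" && continue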
You can solve this by
making a list of the files in /home/storage
testing each filename in /tmp to see if it is in the list from /home/storage
Given the linux+shell tags, one might use bash:
make the list of files from /home/storage an associative array
make the subscript of the array the filename
Here is a sample script to illustrate ($1 and $2 are the parameters to pass to the script, i.e., /home/storage and /tmp):
#!/bin/bash
declare -A InTarget
while read -r path
do
name=${path##*/}
InTarget[$name]=$path
done < <(find "$1" -type f)
while read -r path
do
name=${path##*/}
[[ -z ${InTarget[$name]} ]] && rm -f "$path"
done < <(find "$2" -type f)
It uses two interesting shell features:
name=${path##*/} is a POSIX shell feature which lets the script perform the basename function without spawning an extra process per filename. That makes the script faster.
done < <(find "$2" -type f) is a bash process substitution, which lets the script read the list of filenames from find without running the array assignments in a subprocess. If the array were updated in a subprocess, the updates would have no effect on the array in the script, which the second loop needs (demonstrated below).
For related discussion:
Extract File Basename Without Path and Extension in Bash
Bash Script: While-Loop Subshell Dilemma
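A minimal demonstration of that subshell pitfall, independent of the script above:
#!/bin/bash
declare -A seen
# pipeline: the while loop runs in a subshell, so the assignments are lost
printf '%s\n' a b | while read -r x; do seen[$x]=1; done
echo "${#seen[@]}"    # prints 0
# process substitution: the loop runs in the current shell
while read -r x; do seen[$x]=1; done < <(printf '%s\n' a b)
echo "${#seen[@]}"    # prints 2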
I spent some really nice time on this today because I needed to delete files which have the same name but different extensions, so if anyone is looking for a quick implementation, here you go:
#!/bin/bash
# We need some reference to the files we want to keep and not delete.
# Let's assume you want to keep the files in the first folder that have a
# .jpeg counterpart, so you need to map the names to the desired file
# extension first.
FILES_TO_KEEP=$(ls -1 "${2}" | sed 's/\.pdf$/.jpeg/g')
#iterate through files in first argument path
for file in "${1}"/*; do
# In my case, I did not want to do anything with directories, so let's continue cycle when hitting one.
if [[ -d $file ]]; then
continue
fi
# let's strip the path from the iterated file with basename so we can
# compare it to the files we want to keep
NAME_WITHOUT_PATH=$(basename "$file")
# I use a Mac, which is equal to having poor-quality CLI tools when it
# comes to operating with strings; this should be a safe check to see
# if FILES_TO_KEEP contains NAME_WITHOUT_PATH
if [[ $FILES_TO_KEEP == *"$NAME_WITHOUT_PATH"* ]];then
echo "Not deleting: $NAME_WITHOUT_PATH"
else
# If it does not contain file from the other directory, remove it.
echo "deleting: $NAME_WITHOUT_PATH"
rm -rf "$file"
fi
done
Usage: sh deleteDifferentFiles.sh path/from/where path/source/of/truth

How to loop a shell script across a specific file in all directories?

Shell Scripting sed Errors:
Cannot view /home/xx/htdocs/*/modules/forms/int.php
/bin/rm: cannot remove `/home/xx/htdocs/tmp.26758': No such file or directory
I am getting an error in my shell script. I am not sure if this for loop will work; it is intended to climb a large directory tree of PHP files and prepend a little validation to a function in every int.php file. Don't ask me why this wasn't centralized/OO, but it wasn't. I copied the script as best I could from here: http://www.cyberciti.biz/faq/unix-linux-replace-string-words-in-many-files/
#!/bin/bash
OLD="public function displayFunction(\$int)\n{"
NEW="public function displayFunction(\$int)\n{if(empty(\$int) || !is_numeric(\$int)){return '<p>Invalid ID.</p>';}"
DPATH="/home/xx/htdocs/*/modules/forms/int.php"
BPATH="/home/xx/htdocs/BAK/"
TFILE="/home/xx/htdocs/tmp.$$"
[ ! -d $BPATH ] && mkdir -p $BPATH || :
for f in $DPATH
do
if [ -f $f -a -r $f ]; then
/bin/cp -f $f $BPATH
sed "s/$OLD/$NEW/g" "$f" > $TFILE && mv $TFILE "$f"
else
echo "Error: Cannot view ${f}"
fi
done
/bin/rm $TFILE
Do wildcards like this even work? Can I check every subdirectory across a tree like this? Do I need to precompute an array and loop over that? How would I go about doing this?
Also, is the $ in the PHP code breaking the script at all?
I am terribly confused.
Problems in your code
You cannot use sed to replace multiple lines this way.
You are using / in OLD, which is also the delimiter of the s/// sed command. This won't work.
[ ! -d $BPATH ] && mkdir -p $BPATH || : is horrible. Use mkdir -p "$bpath" instead; with -p there is no error to silence if the directory already exists.
Yes, wildcards like this will work, but only because your paths contain no spaces.
Double-quote your variables, or your code will be very dangerous.
Single-quote your strings, or you won't understand what you are escaping.
Do not use capital variable names; you could accidentally overwrite a built-in bash variable.
Do not rm a file that may not exist.
Your backups will be overwritten, as all the files are named int.php.
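To see why the double-quoting point matters, consider a file name containing a space (a toy example):
f='int backup.php'
rm $f      # word-splits into two arguments, "int" and "backup.php"
rm "$f"    # one argument: removes the single file "int backup.php"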
I'm assuming GNU sed here; I'm not used to other sed flavors. If you are not using GNU sed, replacing the \n with a literal newline (inside the string) should work.
Fixed Code
#!/usr/bin/env bash
old='public function displayFunction(\$int)\n{'
old=${old//,/\\,} # escaping eventual commas
# the \$ is for escaping the sed-special meaning of $ in the search field
new='public function displayFunction($int)\n{if(empty($int) || !is_numeric($int)){return "<p>Invalid ID.</p>";}\n'
new=${new//,/\\,} # escaping eventual commas
dpath='/home/xx/htdocs/*/modules/forms/int.php'
for f in $dpath; do
if [ -r "$f" ]; then
sed -i.bak ':a;N;$!ba;'"s,$old,$new,g" "$f"
else
echo "Error: Cannot view $f" >&2
fi
done
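The :a;N;$!ba prefix is what makes the multi-line match possible: it loops, appending each input line to the pattern space, so the whole file becomes one string in which s,old,new, can match across \n. A minimal GNU sed demo:
$ printf 'foo\nbar\n' | sed ':a;N;$!ba;s,foo\nbar,MATCHED,'
MATCHED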
Links
Replace newline in sed
Inplace replace with sed with a backup
Using a different sed substitution separator
Existence test not necessary if readable
Bash search and replace inside a variable
Bash guide

Linux: Move 1 million files into prefix-based created Folders

I have a directory called "images" filled with about one million images. Yep.
I want to write a shell command to rename all of those images into the following format:
original: filename.jpg
new: /f/i/l/filename.jpg
Any suggestions?
Thanks,
Dan
for i in *.*; do mkdir -p "${i:0:1}/${i:1:1}/${i:2:1}/"; mv "$i" "${i:0:1}/${i:1:1}/${i:2:1}/"; done
The ${i:0:1}/${i:1:1}/${i:2:1} part could probably be a variable, or shorter or different, but the command above gets the job done. You'll probably face performance issues, but if you really want to use it, narrow the *.* to fewer matches (a*.*, b*.*, or whatever fits you).
edit: added a $ before i for mv, as noted by Dan
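For reference, ${i:offset:length} is bash substring expansion; with length 1 it picks out single characters:
$ i=filename.jpg
$ echo "${i:0:1}/${i:1:1}/${i:2:1}"
f/i/l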
You can generate the new file name using, e.g., sed:
$ echo "test.jpg" | sed -e 's/^\(\(.\)\(.\)\(.\).*\)$/\2\/\3\/\4\/\1/'
t/e/s/test.jpg
So, you can do something like this (assuming all the directories are already created):
for f in *; do
mv -i "$f" "$(echo "$f" | sed -e 's/^\(\(.\)\(.\)\(.\).*\)$/\2\/\3\/\4\/\1/')"
done
or, if you can't use the bash $( syntax:
for f in *; do
mv -i "$f" "`echo "$f" | sed -e 's/^\(\(.\)\(.\)\(.\).*\)$/\2\/\3\/\4\/\1/'`"
done
However, considering the number of files, you may just want to use perl as that's a lot of sed and mv processes to spawn:
#!/usr/bin/perl -w
use strict;
# warning: untested; assumes the target directories already exist
opendir DIR, "." or die "opendir: $!";
my @files = readdir(DIR); # can't change dir while reading: read in advance
closedir DIR;
foreach my $f (@files) {
    next if $f =~ /^\.\.?$/;  # skip . and ..
    next if length($f) < 3;   # names too short to split
    (my $new_name = $f) =~ s!^((.)(.)(.).*)$!$2/$3/$4/$1!;
    -e $new_name and die "$new_name already exists";
    rename($f, $new_name) or die "rename $f: $!";
}
That rename is limited to moves within the same filesystem, though you can use File::Copy::move to get around that.
You can do it as a bash script:
#!/bin/bash
base=base
mkdir -p $base/shorts
for n in *
do
if [ ${#n} -lt 3 ]
then
mv "$n" "$base/shorts"
else
dir=$base/${n:0:1}/${n:1:1}/${n:2:1}
mkdir -p "$dir"
mv "$n" "$dir"
fi
done
With the quoting above, spaces in file names are handled; note that files with names shorter than three characters simply land in $base/shorts.
I suggest a short Python script, since most shell tools will balk at that much input (though xargs may do the trick):
#!/usr/bin/python
import os, shutil

src_dir = '/src/dir'
dest_dir = '/dest/dir'

for fn in os.listdir(src_dir):
    dest = os.path.join(dest_dir, fn[0], fn[1], fn[2])
    if not os.path.isdir(dest):
        os.makedirs(dest)  # without the check, makedirs raises if the dir exists
    shutil.copyfile(os.path.join(src_dir, fn), os.path.join(dest, fn))
Any of the proposed solutions that expand a wildcard in the shell may struggle with the sheer number of files you have. Of the current proposed solutions, the perl one is probably the best.
However, you can easily adapt any of the shell script methods to deal with any number of files thus:
ls -1 | \
while IFS= read -r filename
do
# insert the loop body of your preference here, operating on "$filename"
done
I would still use perl, but if you're limited to only having simple unix tools around, then combining one of the above shell solutions with a loop like I've shown should get you there. It'll be slow, though.
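For example, combining that loop with the substring approach from the first answer (a sketch, untested at this scale):
ls -1 | while IFS= read -r filename
do
    d="${filename:0:1}/${filename:1:1}/${filename:2:1}"
    mkdir -p -- "$d" && mv -- "$filename" "$d/"
done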
