I am trying to use mlocate and for loops in bash to search for a multitude of files

So in bash, if I want, I can simply do this (where foo is a file listing paths to files):
for i in `cat foo`; do ls -lah $i; done
I have a list of files I need to search for. My thought is: why not run them through a for loop with mlocate? I could do:
for i in `cat foo`; do locate $i; done
...but is that the best way to do what I'm trying to do?
find is SLOW and takes forever when there are millions of files and directories, whereas mlocate is super quick.

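If files.txt contains a list of absolute paths, one per line, you can do this to ensure they all exist:
set -o errexit           # exit non-zero at the first path that does not exist
mapfile -t < files.txt   # read the lines (without newlines) into the default array MAPFILE
for path in "${MAPFILE[@]}"
do
[[ -e "$path" ]]
done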
You can then expand on this if you want to do certain things with existing/non-existing files:
if [[ -e "$path" ]]
then
…
else
…
fi
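Filled in, that if/else might look like the following minimal sketch. The function name report_paths is hypothetical, and it assumes one path per line in the list file. Instead of relying on errexit, it reports every path and returns non-zero if any of them is missing.

```shell
#!/usr/bin/env bash
# report_paths (hypothetical name): read one path per line from the file $1,
# print "OK <path>" or "MISSING <path>" for each, and return 1 if any is missing.
report_paths() {
    local list=$1 path missing=0
    local -a paths=()
    mapfile -t paths < "$list"
    for path in "${paths[@]}"; do
        [[ -n $path ]] || continue          # ignore blank lines
        if [[ -e $path ]]; then
            printf 'OK %s\n' "$path"
        else
            printf 'MISSING %s\n' "$path"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, report_paths files.txt | grep '^MISSING' shows only the paths that need attention.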
If files.txt is so huge that the list does not fit in memory, you can use a much slower while read loop:
while read -r -u 9 path
do
[[ -e "$path" ]]
done 9< files.txt
If speed really is of the essence, you probably want to do this in a different language, like Java or Rust.
On a technical note, mlocate is fast because it queries a pre-generated database of the files on your system, but that database is not kept in sync with the actual filesystem automatically. Instead you need to run updatedb to refresh it with the current filesystem contents; this is usually done daily by a root cron job.
In terms of style, backticks are deprecated in favour of $(COMMAND), and Use More Quotes™.
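If you do go the mlocate route, one way to avoid re-reading the database once per name is to pass the whole list to a single locate call, since locate prints entries matching any of the patterns it is given. A minimal sketch, assuming mlocate's locate (for its -b and -e options and the leading-backslash exact-match trick), a database freshly refreshed by updatedb, and a list file holding one basename per line; locate_all is a hypothetical name. Names containing glob characters (*, ?, [) are still interpreted as patterns.

```shell
#!/usr/bin/env bash
# locate_all (hypothetical name): look up every basename listed in the file $1
# with a single locate call instead of one call per name.
# Assumes mlocate's locate and a database refreshed by updatedb.
locate_all() {
    local list=$1 name
    local -a names=() patterns=()
    mapfile -t names < "$list"
    for name in "${names[@]}"; do
        [[ -n $name ]] || continue       # ignore blank lines
        # A leading backslash stops locate from turning NAME into *NAME*,
        # so only entries whose basename is exactly NAME match.
        patterns+=("\\$name")
    done
    (( ${#patterns[@]} > 0 )) || return 0
    # -b: match the pattern against the basename only
    # -e: print only entries that still exist on disk
    locate -b -e "${patterns[@]}"
}
```

The -e option also covers the staleness problem described above for files deleted since the last updatedb, though files created since then will still be missing from the results.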

Related

Moving files to subfolders based on prefix in bash

I currently have a long list of files, which look somewhat like this:
Gmc_W_GCtl_E_Erz_Aue_Dl_281_heart_xerton
Gmc_W_GCtl_E_Erz_Aue_Dl_254_toe_taixwon
Gmc_W_GCtl_E_Erz_Homersdorf_Dl_201_head_xaubadan
Gmc_W_GCtl_E_Erz_Homersdorf_Dl_262_bone_bainan
Gmc_W_GCtl_E_Thur_Peuschen_Dl_261_blood_blodan
Gmc_W_GCtl_E_Thur_Peuschen_Dl_281_heart_xerton
The names all follow the same pattern. I mainly want to group the files by the part with "Aue", "Homersdorf", "Peuschen" and so forth (there are many others further down the list); these keywords are always in the same position (e.g. they are all followed by Dl, and they all come after the fifth underscore).
All the files are in the same folder, and I am trying to move them into subfolders based on these keywords in bash, but I'm not quite sure how. Any help would be appreciated, thanks!
I am guessing you want something like this:
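$ find . -maxdepth 1 -type f -name 'Gmc_*' | awk -F_ '{system("mkdir -p "$5"/"$6"; mv "$0" "$5"/"$6)}'
This will move, say, Gmc_W_GCtl_E_Erz_Aue_Dl_281_heart_xerton into ./Erz/Aue/Gmc_W_GCtl_E_Erz_Aue_Dl_281_heart_xerton. The -maxdepth 1 keeps find from descending into the folders it has just created, and -name 'Gmc_*' leaves other files alone.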
Using the bash shell with a for loop.
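#!/usr/bin/env bash
shopt -s nullglob                          # run zero iterations if nothing matches Gmc*
for file in Gmc*; do
[[ -d $file ]] && continue                 # skip directories
IFS=_ read -ra dir <<< "$file"             # split the name on underscores into an array
echo mkdir -pv "${dir[4]}/${dir[5]}" || exit
echo mv -v "$file" "${dir[4]}/${dir[5]}" || exit
done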
Place the script inside the directory in question, make it executable and run it.
Remove the echos so that it actually creates the directories and moves the files.
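Both answers above nest the files two levels deep (Erz/Aue/...). If you only want one folder per keyword, as the question describes, a minimal sketch is below. It assumes the keyword is always the sixth underscore-separated field; group_by_keyword is a hypothetical name, and the folder to process is passed as an argument.

```shell
#!/usr/bin/env bash
# group_by_keyword (hypothetical name): move each Gmc_* file in directory $1
# into a subfolder named after its sixth underscore-separated field
# (Aue, Homersdorf, Peuschen, ...).
group_by_keyword() {
    local dir=$1 file name
    local -a fields=()
    for file in "$dir"/Gmc_*; do
        [[ -f $file ]] || continue             # skips directories and an unmatched glob
        name=${file##*/}                       # strip the directory part
        IFS=_ read -ra fields <<< "$name"
        (( ${#fields[@]} >= 6 )) || continue   # skip names without a keyword field
        mkdir -p -- "$dir/${fields[5]}" || return
        mv -- "$file" "$dir/${fields[5]}/" || return
    done
}
```

Calling group_by_keyword . from inside the folder would produce Aue/, Homersdorf/, Peuschen/ and so on.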

Delete files in one directory that do not exist in another directory or its child directories

I am still a newbie in shell scripting and am trying to come up with a simple script. Could anyone give me some direction? Here is what I need.
Files in path 1: /tmp
100abcd
200efgh
300ijkl
Files in path 2: /home/storage
backupfile_100abcd_str1
backupfile_100abcd_str2
backupfile_200efgh_str1
backupfile_200efgh_str2
backupfile_200efgh_str3
Now I need to delete the file 300ijkl in /tmp, as the corresponding backup file is not present in /home/storage. The /tmp directory contains more than 300 files. I need to delete the files in /tmp for which no corresponding backup file is present; each file name in /tmp appears within the names of its backup files in /home/storage or in directories under /home/storage.
Appreciate your time and response.
You can also approach the deletion using grep. Loop through the files in /tmp, check each one with ls piped to grep, and delete it if there is no match:
#!/bin/bash
[ -z "$1" -o -z "$2" ] && { ## validate input
printf "error: insufficient input. Usage: %s tmpfiles storage\n" ${0//*\//}
exit 1
}
for i in "$1"/*; do
fn=${i##*/} ## strip path, leaving filename only
## if any name under the backup dir (searched recursively) contains the filename, skip rest of loop
ls -R "$2" | grep -qF -- "$fn" && continue
printf "removing %s\n" "$i"
# rm "$i" ## remove file
done
Note: the actual removal is commented out above; test and ensure there are no unintended consequences before performing the actual delete. Call it passing the path to tmp (without a trailing /) as the first argument and /home/storage as the second argument:
$ bash scriptname /path/to/tmp /home/storage
You can solve this by
making a list of the files in /home/storage
testing each filename in /tmp to see if it is in the list from /home/storage
Given the linux+shell tags, one might use bash:
make the list of files from /home/storage an associative array
make the subscript of the array the filename
Here is a sample script to illustrate ($1 and $2 are the parameters to pass to the script, i.e., /home/storage and /tmp):
#!/bin/bash
declare -A InTarget
while IFS= read -r path
do
name=${path##*/}
InTarget[$name]=$path
done < <(find "$1" -type f)
while IFS= read -r path
do
name=${path##*/}
[[ -z ${InTarget[$name]} ]] && rm -f -- "$path"
done < <(find "$2" -type f)
It uses two interesting shell features:
name=${path##*/} is a POSIX shell feature which allows the script to perform the basename function without an extra process (per filename). That makes the script faster.
done < <(find "$1" -type f) uses process substitution, a bash feature which lets the script read the list of filenames from find without running the loop, and therefore its assignments to the array, in a subshell. If the array were filled in a subshell (as with find ... | while read ...), the updates would be lost, and the second loop would see an empty array.
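The script above compares whole basenames, but in the question the /tmp name (100abcd) only appears inside the backup name (backupfile_100abcd_str1). A variant of the same associative-array idea, assuming every backup is named backupfile_KEY_SUFFIX with no underscore inside KEY, extracts the key before storing it. prune_unbacked is a hypothetical name; it only looks at files directly in the target directory, and it deletes for real, so try it on a copy first.

```shell
#!/usr/bin/env bash
# prune_unbacked (hypothetical name): delete files directly in $2 (e.g. /tmp)
# whose name is not the KEY of some backupfile_KEY_SUFFIX found anywhere under $1
# (e.g. /home/storage, searched recursively).
prune_unbacked() {
    local storage=$1 target=$2 path name key
    local -A backed_up=()
    while IFS= read -r -d '' path; do
        name=${path##*/}                   # basename without a subprocess
        [[ $name == backupfile_*_* ]] || continue
        key=${name#backupfile_}            # drop the fixed prefix
        key=${key%%_*}                     # keep everything up to the next underscore
        backed_up[$key]=1
    done < <(find "$storage" -type f -print0)
    while IFS= read -r -d '' path; do
        name=${path##*/}
        [[ -n ${backed_up[$name]-} ]] || rm -f -- "$path"
    done < <(find "$target" -maxdepth 1 -type f -print0)
}
```

Usage would be prune_unbacked /home/storage /tmp; replacing rm -f with echo gives a dry run.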
For related discussion:
Extract File Basename Without Path and Extension in Bash
Bash Script: While-Loop Subshell Dilemma
I spent some really nice time on this today because I needed to delete files which have the same name but different extensions, so if anyone is looking for a quick implementation, here you go:
#!/bin/bash
# We need some reference to the files which we want to keep and not delete.
# Let's assume you want to keep the files in the first folder that have a .jpeg
# counterpart of a .pdf in the second folder, so map the names to that extension first.
FILES_TO_KEEP=$(ls -1 "$2" | sed 's/\.pdf$/.jpeg/')
# iterate through the files in the first argument's path
for file in "$1"/*; do
# In my case, I did not want to do anything with directories, so let's continue the loop when hitting one.
if [[ -d $file ]]; then
continue
fi
# strip the path from the iterated file with basename so we can compare it to the files we want to keep
NAME_WITHOUT_PATH=$(basename "$file")
# I use a Mac, whose command-line tools are limited when it comes to string handling,
# so plain bash pattern matching is a safe way to check whether FILES_TO_KEEP contains NAME_WITHOUT_PATH.
# Padding both sides with newlines makes it a whole-name match, so "a.jpeg" does not match "ba.jpeg".
if [[ $'\n'$FILES_TO_KEEP$'\n' == *$'\n'"$NAME_WITHOUT_PATH"$'\n'* ]]; then
echo "Not deleting: $NAME_WITHOUT_PATH"
else
# If it does not correspond to a file in the other directory, remove it.
echo "deleting: $NAME_WITHOUT_PATH"
rm -f -- "$file"
fi
done
Usage: bash deleteDifferentFiles.sh path/from/where path/source/of/truth

Centos copy file into another file, if exists, create a version

Does anyone know of a way to set up (via bash) a "versioning" copy of a file into another? For example: I am copying file into file.bak. If file.bak exists, I am currently overwriting it. What I'd like is for it to create multiple files: file, file.bak, file.bak.1, file.bak.2, etc.
Right now, I'm using:
cp -rf file file.bak
This currently overwrites the file (as expected).
GNU cp can keep numbered backups for you:
cp --backup=t file1 file2
Repeat it a few times to see the result: file2 plus file2.~1~, file2.~2~, and so on.
see https://www.gnu.org/software/coreutils/manual/html_node/cp-invocation.html
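Simply use a test:
if [ -e file.bak ]; then cp -r file "file.bak.$(date +%s)"; else cp -r file file.bak; fi
This will create a unique backup in the form file.bak.1411505497 if file.bak already exists (as long as you don't make two backups within the same second). An if is safer than A && B || C here: with the latter, a failing first cp would fall through and overwrite file.bak.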
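The exact naming asked for in the question (file.bak, then file.bak.1, file.bak.2, ...) can also be produced by probing for the first free name. A minimal sketch; backup_copy is a hypothetical name, cp -p is used to preserve mode and timestamps, and two copies started at the same moment could still race for the same name.

```shell
#!/usr/bin/env bash
# backup_copy (hypothetical name): copy $1 to $1.bak; if that already exists,
# use the first free name among $1.bak.1, $1.bak.2, ... and print the name used.
backup_copy() {
    local src=$1 dest=$1.bak n=1
    [[ -e $src ]] || { printf 'Not found: %s\n' "$src" >&2; return 1; }
    while [[ -e $dest ]]; do
        dest=$src.bak.$n
        n=$((n + 1))
    done
    cp -p -- "$src" "$dest" && printf '%s\n' "$dest"
}
```

Because it takes the first free number, deleting file.bak.1 later would let the next backup reuse that name.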
There are many ways to skin this cat.
Since you're using Linux, it's likely you've got the GNU mv command, which may include a --backup option. You could wrap this in a shell function:
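bkp() {
local file="$1"
if [ -f "$file" ]; then
# Move a fresh empty temp file over "$file"; --backup=numbered first
# renames the existing "$file" to "$file.~N~".
/bin/mv -v --backup=numbered "$(mktemp "${file}XXX")" "$file"
#/bin/rm "$file"
fi
}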
You can put this in your .bashrc, for example. Then you can use this as follows:
# bkp foo
This moves foo aside to a numbered backup (foo.~1~, foo.~2~, ...) and leaves an empty foo in its place. You can uncomment the rm if this is, for example, a log file that you're rotating.
Another option, which is more portable to operating systems that don't use GNU tools (e.g. FreeBSD, OS X), is something like this quick-and-dirty solution:
bkp() {
local file="$1" n
if [ -f "$file" ]; then
# shift existing backups up by one, keeping at most 10 (an existing .10 is overwritten)
for n in {9..1}; do
if [ -f "$file.$n" ]; then
# remove -v if you want less noise.
mv -v "$file.$n" "$file.$((n+1))"
fi
done
# move the original to first backup position
mv "$file" "$file.1"
else
echo "Not found: $file" >&2
fi
}
It won't compact your list of files if some numbers are missing, and once a .10 exists it is silently overwritten, but that's stuff you can add if it's important. You'd use it pretty much the same way; change the final mv to a cp if you need to keep the original in place.
The final option I'll mention came up in the comments as well. Since you've said that you're using this solution to back up "system files" (which I assume you mean to be things in /etc/), you should consider using an actual version control system to manage your versions of these files.
Many options exist, but I'd recommend RCS for its simplicity and low overhead. Simply install the package, mkdir /etc/RCS to keep your /etc directory clean, read the man pages for rcs, ci, co, rlog, rcsdiff and perhaps rcsintro, and you're good to go. You'll get better control of diffs and history and the opportunity for comments, with none of the overhead of a repository for a large VCS like SVN or Git. I've been using this on various servers for years, as RCS is still built into the base system in FreeBSD. :)

Shell script best way to remove files not in a pair

I have a set of files that come in pairs:
/var/log/messages-20111001
/var/log/messages-20111001.hash
I've had several of these rotate away and now I'm left with a ton of /var/log/messages-201110xx.hash files with no associated log. I'd like to clean up the mess, but I'm uncertain how to remove a file that isn't part of a "pair". I can use bash or zsh (or any LSB tool, really). I need to remove all the .hash files that don't have an associated log.
Example
/var/log/messages-20111001.hash
/var/log/messages-20111002.hash
/var/log/messages-20111003.hash
/var/log/messages-20111004.hash
/var/log/messages-20111005
/var/log/messages-20111005.hash
/var/log/messages-20111006
/var/log/messages-20111006.hash
Should be reduced to:
/var/log/messages-20111005
/var/log/messages-20111005.hash
/var/log/messages-20111006
/var/log/messages-20111006.hash
for file in *.hash; do test -f "${file%.hash}" || rm -- "$file"; done
Something like this?
for f in /var/log/messages-????????.hash ; do
[[ -e "${f%.hash}" ]] || rm "$f"
done
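Before deleting anything in /var/log, it can be worth listing the orphans first. A minimal sketch using the same ${f%.hash} test as the answers above; list_orphan_hashes is a hypothetical name, and it only looks directly inside the directory passed to it.

```shell
#!/usr/bin/env bash
# list_orphan_hashes (hypothetical name): print every *.hash file in directory $1
# whose companion log (the same name without .hash) does not exist.
list_orphan_hashes() {
    local f
    for f in "$1"/*.hash; do
        [[ -e $f ]] || continue            # the glob matched nothing
        [[ -e ${f%.hash} ]] || printf '%s\n' "$f"
    done
}
```

Once list_orphan_hashes /var/log shows what you expect, either of the loops above can do the actual deletion.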

Linux: Move 1 million files into prefix-based created Folders

I have a directory called "images" filled with about one million images. Yep.
I want to write a shell command to rename all of those images into the following format:
original: filename.jpg
new: /f/i/l/filename.jpg
Any suggestions?
Thanks,
Dan
for i in *.*; do mkdir -p "${i:0:1}/${i:1:1}/${i:2:1}"; mv -- "$i" "${i:0:1}/${i:1:1}/${i:2:1}/"; done
The ${i:0:1}/${i:1:1}/${i:2:1} part could be stored in a variable, but the command above gets the job done. You'll probably face performance issues, though; if so, narrow the *.* down to smaller batches (a*.*, b*.* or whatever fits you).
edit: added a $ before i for mv, as noted by Dan
You can generate the new file name using, e.g., sed:
$ echo "test.jpg" | sed -e 's/^\(\(.\)\(.\)\(.\).*\)$/\2\/\3\/\4\/\1/'
t/e/s/test.jpg
So, you can do something like this (assuming all the directories are already created):
for f in *; do
mv -i "$f" "$(echo "$f" | sed -e 's/^\(\(.\)\(.\)\(.\).*\)$/\2\/\3\/\4\/\1/')"
done
or, if your shell doesn't support the $( ) syntax (e.g. an old Bourne shell):
for f in *; do
mv -i "$f" "`echo "$f" | sed -e 's/^\(\(.\)\(.\)\(.\).*\)$/\2\/\3\/\4\/\1/'`"
done
However, considering the number of files, you may just want to use perl as that's a lot of sed and mv processes to spawn:
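#!/usr/bin/perl -w
use strict;
use File::Path qw(make_path);
# warning: untested
opendir DIR, "." or die "opendir: $!";
my @files = readdir(DIR); # can't change dir while reading: read in advance
closedir DIR;
foreach my $f (@files) {
next unless -f $f; # skip ".", ".." and directories
(my $new_name = $f) =~ s!^((.)(.)(.).*)$!$2/$3/$4/$1! or next; # names under 3 chars don't match
-e $new_name and die "$new_name already exists";
my ($dir) = $new_name =~ m!^(.*)/!; # e.g. "f/i/l" for "f/i/l/filename.jpg"
make_path($dir); # rename() does not create the target directories
rename($f, $new_name) or die "rename $f: $!";
}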
That perl is surely limited to same-filesystem only, though you can use File::Copy::move to get around that.
You can do it as a bash script:
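#!/bin/bash
base=base
mkdir -p "$base/shorts"
for n in *
do
if [ "$n" = "$base" ]
then
continue    # don't try to move the target directory into itself
elif [ ${#n} -lt 3 ]
then
mv -- "$n" "$base/shorts"
else
dir=$base/${n:0:1}/${n:1:1}/${n:2:1}
mkdir -p "$dir"
mv -- "$n" "$dir"
fi
done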
Needless to say, you might need to worry about spaces and the files with short names.
I suggest a short Python script. Most shell tools will balk at that much input (though xargs may do the trick).
#!/usr/bin/env python3
import os, shutil
src_dir = '/src/dir'
dest_dir = '/dest/dir'
for fn in os.listdir(src_dir):
    src = os.path.join(src_dir, fn)
    if len(fn) < 3 or not os.path.isfile(src):
        continue  # skip subdirectories and names too short for three prefix folders
    sub_dir = os.path.join(dest_dir, fn[0], fn[1], fn[2])
    os.makedirs(sub_dir, exist_ok=True)  # many files share the same prefix
    shutil.copyfile(src, os.path.join(sub_dir, fn))
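Any of the proposed solutions which use a wildcard in the shell may struggle with the sheer number of files you have: the glob has to be expanded and sorted in memory before the loop even starts, and passing it to an external command (e.g. mv *) would fail with "Argument list too long". Of the current proposed solutions, the perl one is probably the best.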
However, you can easily adapt any of the shell script methods to deal with any number of files thus:
ls -1 | \
while IFS= read -r filename
do
: # insert the loop body of your preference here, operating on "$filename"
done
I would still use perl, but if you're limited to only having simple unix tools around, then combining one of the above shell solutions with a loop like I've shown should get you there. It'll be slow, though.
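Combining those ideas in bash, a sketch that streams names from find -print0 (so no giant glob is expanded and names with spaces survive) and files each one under its first three characters, with short names going to a shorts folder as in the bash answer above. shard_files is a hypothetical name; it assumes GNU or BSD find (for -maxdepth and -print0) and skips hidden files, whose leading dot would otherwise become a "." path component.

```shell
#!/usr/bin/env bash
# shard_files (hypothetical name): move each regular, non-hidden file directly in $1
# into $1/<1st char>/<2nd char>/<3rd char>/, e.g. filename.jpg -> f/i/l/filename.jpg.
# Names shorter than three characters go to $1/shorts/.
shard_files() {
    local top=$1 path name sub
    while IFS= read -r -d '' path; do
        name=${path##*/}                   # basename without a subprocess
        if (( ${#name} < 3 )); then
            sub=shorts
        else
            sub=${name:0:1}/${name:1:1}/${name:2:1}
        fi
        mkdir -p -- "$top/$sub" && mv -- "$path" "$top/$sub/"
    done < <(find "$top" -maxdepth 1 -type f ! -name '.*' -print0)
}
```

It still runs one mkdir and one mv per file, so for a million files it will be slow, as noted above; the gain is robustness rather than speed.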
