Going through files recursively and receiving file information - linux

I am fairly new to bash scripts and right now I am trying to go through files recursively in order to receive some info about these files (name, size, ...)
My attempt so far:
for i in *.txt; do
    stat -c '%n' "$i" >> $2
    wc -l -w >> $2
    stat -c '%a %A %U' "$i" >> $2
done
$2 is the file where I want to log this info...
Thanks in advance!
EDIT: I should have posted the problem as well, sorry.
I am receiving this error message:
stat: cannot stat '*.txt': No such file or directory
But the file name should be in the $i variable, shouldn't it?

If there are no files matching the glob, Bash will by default return the glob itself, so you are trying to process a file literally named "*.txt" (which, by the way, is actually a valid file name). What you probably want is shopt -s nullglob, which makes a non-matching glob expand to nothing.
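For instance, the loop from the question could be reworked like this (a sketch; note that the wc call also needs "$i" passed to it, or it reads from standard input, and the log file "$2" should be quoted):
shopt -s nullglob               # a non-matching glob now expands to nothing
for i in *.txt; do
    stat -c '%n' "$i" >> "$2"
    wc -l -w "$i" >> "$2"       # pass "$i"; without it, wc reads stdin
    stat -c '%a %A %U' "$i" >> "$2"
done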

The go-to tool for recursive file operations is find - it is very powerful, so go make a cup of tea, settle back and type man find
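As a rough sketch of how find could drive the same logging (the question's glob only covers the current directory, while find descends into subdirectories; "$2" is still the log file):
find . -type f -name '*.txt' -print0 |
while IFS= read -r -d '' f; do
    stat -c '%n %a %A %U' "$f" >> "$2"
    wc -l -w "$f" >> "$2"
done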

Related

Delete files in one directory that do not exist in another directory or its child directories

I am still a newbie in shell scripting and trying to come up with a simple script. Could anyone give me some direction here? Here is what I need.
Files in path 1: /tmp
100abcd
200efgh
300ijkl
Files in path 2: /home/storage
backupfile_100abcd_str1
backupfile_100abcd_str2
backupfile_200efgh_str1
backupfile_200efgh_str2
backupfile_200efgh_str3
Now I need to delete the file 300ijkl in /tmp, as the corresponding backup file is not present in /home/storage. The /tmp directory contains more than 300 files. I need to delete those files in /tmp for which the corresponding backup files are not present; the file names in /tmp will match file names in /home/storage or in directories under /home/storage.
Appreciate your time and response.
You can also approach the deletion using grep. You can loop through the files in /tmp, checking each with ls piped to grep, and deleting it if there is no match:
#!/bin/bash
[ -z "$1" -o -z "$2" ] && { ## validate input
    printf "error: insufficient input. Usage: %s tmpfiles storage\n" "${0//*\//}"
    exit 1
}
for i in "$1"/*; do
    fn=${i##*/}                ## strip path, leaving filename only
    ## if a file in backup matches the filename, skip the rest of the loop
    ls "${2}"* | grep -q "$fn" && continue
    printf "removing %s\n" "$i"
    # rm "$i"                  ## remove file
done
Note: the actual removal is commented out above; test and ensure there are no unintended consequences before performing the actual delete. Call it passing the path to tmp (without a trailing /) as the first argument and /home/storage as the second argument:
$ bash scriptname /path/to/tmp /home/storage
You can solve this by:
- making a list of the files in /home/storage
- testing each filename in /tmp to see if it is in the list from /home/storage
Given the linux+shell tags, one might use bash:
- make the list of files from /home/storage an associative array
- make the subscript of the array the filename
Here is a sample script to illustrate ($1 and $2 are the parameters to pass to the script, i.e., /home/storage and /tmp):
#!/bin/bash
declare -A InTarget
while read -r path
do
    name=${path##*/}
    InTarget[$name]=$path
done < <(find "$1" -type f)
while read -r path
do
    name=${path##*/}
    [[ -z ${InTarget[$name]} ]] && rm -f "$path"
done < <(find "$2" -type f)
It uses two interesting shell features:
name=${path##*/} is a POSIX shell feature which allows the script to perform the basename function without an extra process (per filename). That makes the script faster.
done < <(find $2 -type f) is a bash feature (process substitution) which lets the script read the list of filenames from find without running the array assignments in a subprocess. The reason for using it here is that if the array were updated in a subprocess, the updates would have no effect on the array value that the script passes to the second loop.
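For illustration, here is a sketch of the pitfall that process substitution avoids: piping find into the loop runs the loop body in a subshell, so the array assignments vanish when the pipeline ends:
declare -A InTarget
find "$1" -type f | while read -r path; do
    InTarget[${path##*/}]=$path   # assignment happens in a subshell
done
echo "${#InTarget[@]}"            # prints 0: the assignments are gone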
For related discussion:
Extract File Basename Without Path and Extension in Bash
Bash Script: While-Loop Subshell Dilemma
I spent some quality time on this today because I needed to delete files which have the same name but different extensions, so if anyone is looking for a quick implementation, here you go:
#!/bin/bash
# We need some reference to the files we want to keep and not delete.
# Let's assume you want to keep the files in the first folder with the
# jpeg extension, so you need to map the names to that extension first.
FILES_TO_KEEP=$(ls -1 "$2" | sed 's/\.pdf$/.jpeg/g')
# iterate through the files in the first argument path
for file in "$1"/*; do
    # In my case, I did not want to do anything with directories, so skip to the next iteration when hitting one.
    if [[ -d $file ]]; then
        continue
    fi
    # strip the path from the iterated file with basename so we can compare it to the files we want to keep
    NAME_WITHOUT_PATH=$(basename "$file")
    # I use a Mac, which means poor-quality command-line tools when it comes
    # to operating with strings; this should be a safe check to see whether
    # FILES_TO_KEEP contains NAME_WITHOUT_PATH
    if [[ $FILES_TO_KEEP == *"$NAME_WITHOUT_PATH"* ]]; then
        echo "Not deleting: $NAME_WITHOUT_PATH"
    else
        # If the other directory does not contain the file, remove it.
        echo "deleting: $NAME_WITHOUT_PATH"
        rm -rf "$file"
    fi
done
Usage: sh deleteDifferentFiles.sh path/from/where path/source/of/truth

awk system syntax throwing error

I'm trying to move all the files in my directory individually into new directories named phonex, renaming each file to phonex.txt at the same time.
so e.g.
1.txt, 2.txt, jim.txt
become:
phone1.txt in directory phone1
phone2.txt in directory phone2
phone3.txt in directory phone3.
I'm a newbie to awk, but I have managed to create the directories, but I cannot get the rename right.
I have tried:
ls|xargs -n1|awk ' {i++;system("mkdir phone"i);system("mv "$0" phone”i ”.txt -t phone"i)}'
which errors with lots of:
mv: cannot stat `phone”i': No such file or directory
and:
ls|xargs -n1|awk ' {i++;system("mkdir phone"i);system("mv "$0" phone”i ”/phone"i”.txt”)}'
error:
awk: 1: unexpected character 0xe2
xargs: /bin/echo: terminated by signal 13
Can anyone help me finish it off? TIA!
Piping ls into xargs into awk is completely unnecessary in this scenario. What you are trying to do can be accomplished using a simple loop:
for f in *.txt; do
    i=$((i+1))
    dir="phone$i"
    mkdir "$dir" && mv "$f" "$dir/$dir.txt"
done
Depending on your shell, the increment of $i can be done in a different way (like ((++i)) in bash), but this way is POSIX-compliant so it should work in any modern shell.
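A trivial sketch of the two increment styles side by side:
i=0
i=$((i+1))    # POSIX arithmetic expansion: works in any POSIX shell
((++i))       # arithmetic command: bash/ksh/zsh only, not POSIX
echo "$i"     # prints 2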
By the way, the reason for your original error is that you are using curly quotes (”), which are not understood by the shell. You should only ever use straight single (') and double (") quotes in the shell.
If you want it with incrementing numbers, here is a one-liner. Otherwise Tom Fenech's way is the way to go!
for i in *.txt; do d=phone$((++X)); mkdir "$d"; mv "$i" "$d/$d.txt"; done
Also, you may want to set X to zero (X=0) before doing this, in case it is already set to something else.
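Putting both together (a sketch):
X=0; for i in *.txt; do d=phone$((++X)); mkdir "$d" && mv "$i" "$d/$d.txt"; done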

Move files and rename - one-liner

I'm encountering many files with the same content and the same name on some of my servers. I need to quarantine these files for analysis, so I can't just remove the duplicates. The OS is Linux (CentOS and Ubuntu).
I enumerate the file names and locations and put them into a text file.
Then I do a for statement to move the files to quarantine.
for file in $(cat bad-stuff.txt); do mv $file /quarantine ;done
The problem is that they have the same file name and I just need to add something unique to the filename to get it to save properly. I'm sure it's something simple but I'm not good with regex. Thanks for the help.
Since you're using Linux, you can take advantage of GNU mv's --backup.
while read -r file
do
    mv --backup=numbered "$file" "/quarantine"
done < "bad-stuff.txt"
Here's an example that shows how it works:
$ cat bad-stuff.txt
./c/foo
./d/foo
./a/foo
./b/foo
$ while read -r file; do mv --backup=numbered "$file" "./quarantine"; done < "bad-stuff.txt"
$ ls quarantine/
foo foo.~1~ foo.~2~ foo.~3~
$
I'd use this (note basename, since the listed paths come from different directories):
for file in $(cat bad-stuff.txt); do mv "$file" "/quarantine/$(basename "$file").$(date -u +%s%N)"; done
You'll get every file with a timestamp appended (in nanoseconds).
You can create a new file name composed of the directory and the filename. Thus you can add one more argument to your original code:
for ...; do mv "$file" "/quarantine/$(echo "$file" | sed 's:/:_:g')"; done
Please note that you should replace the _ with a character that is special enough, i.e. one that cannot otherwise appear in your filenames.
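For example, with the paths from the earlier example, the slashes fold into underscores, so files with the same basename get distinct names in /quarantine:
$ echo "./c/foo" | sed 's:/:_:g'
._c_foo
$ echo "./d/foo" | sed 's:/:_:g'
._d_foo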

Removing 11 Characters of Filename in Linux

I just downloaded about 600 files from my server and need to remove the last 11 characters from the filename (not including the extension). I use Ubuntu and I am searching for a command to achieve this.
Some examples are as follows:
aarondyne_kh2_13thstruggle_or_1250556383.mus should be renamed to aarondyne_kh2_13thstruggle_or.mus
aarondyne_kh2_darknessofunknow_1250556659.mp3 should be renamed to aarondyne_kh2_darknessofunknow.mp3
It seems that some duplicates might exist after I do this, but if the command fails to complete and tells me what the duplicates would be, I can always remove those manually.
Try using the rename command. It allows you to rename files based on a regular expression.
The following line should work out for you:
rename 's/_\d+(\.[a-z0-9A-Z]+)$/$1/' *
The following changes will occur:
aarondyne_kh2_13thstruggle_or_1250556383.mus renamed as aarondyne_kh2_13thstruggle_or.mus
aarondyne_kh2_darknessofunknow_1250556659.mp3 renamed as aarondyne_kh2_darknessofunknow.mp3
You can check the actions rename will perform by specifying the -n flag, like this:
rename -n 's/_\d+(\.[a-z0-9A-Z]+)$/$1/' *
For more information on how to use rename, simply open the manpage via: man rename
Not the prettiest, but very simple:
echo "$filename" | sed -e 's!\(.*\)...........\(\.[^.]*\)!\1\2!'
You'll still need to write the rest of the script, but it's pretty simple.
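A minimal sketch of such a script, looping over the files and feeding each name through that sed expression (the eleven dots match the eleven characters to strip; mv -i prompts instead of silently overwriting, which helps with the duplicates mentioned in the question):
for f in *; do
    new=$(echo "$f" | sed -e 's!\(.*\)...........\(\.[^.]*\)!\1\2!')
    [ "$new" != "$f" ] && mv -i "$f" "$new"
done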
find . -type f -exec sh -c 'mv "$1" "$(echo "$1" | sed -E -e "s/[^/]{11}(\.[^.]+)?$/\1/")"' sh {} \;
Another way to go:
get a list of your files, one per line (with ls, maybe), then:
ls....|awk '{o=$0;sub(/_[^_.]*\./,".",$0);print "mv "o" "$0}'
This will print the mv a b commands without executing them, e.g.
kent$ echo "aarondyne_kh2_13thstruggle_or_1250556383.mus"|awk '{o=$0;sub(/_[^_.]*\./,".",$0);print "mv "o" "$0}'
mv aarondyne_kh2_13thstruggle_or_1250556383.mus aarondyne_kh2_13thstruggle_or.mus
To execute, just pipe it to sh.
I assume there are no spaces in your filenames.
This script assumes each file has just one extension. It would, for instance, rename "foo.something.mus" to "foo.mus". To keep all extensions, remove one hash mark (#) from the first line of the loop body. It also assumes that the base of each filename has at least 12 characters, so that removing 11 doesn't leave you with an empty name.
for f in *; do
    ext=${f##*.}
    new_f=${f%???????????.$ext}.$ext
    if [ -f "$new_f" ]; then
        echo "Will not rename $f, $new_f already exists" >&2
    else
        mv "$f" "$new_f"
    fi
done

Search MS word files in a directory for specific content in Linux

I have a directory structure full of MS Word files and I have to search the directory for a particular string. Until now I was using the following commands to search files in a directory:
find . -exec grep -li 'search_string' {} \;
find . -name '*' -print | xargs grep 'search_string'
But this search doesn't work for MS Word files.
Is it possible to do a string search in MS Word files on Linux?
I'm a translator and know next to nothing about scripting, but I was so pissed off about grep not being able to scan inside Word .doc files that I worked out how to make this little shell script that uses catdoc and grep to search a directory of .doc files for a given input string.
You need to install the catdoc and docx2txt packages.
#!/bin/bash
echo -e "\n
Welcome to scandocs. This will search .doc AND .docx files in this directory for a given string. \n
Type in the text string you want to find... \n"
read -r response
find . -name "*.doc" |
while read -r i; do
    catdoc "$i" | grep --color=auto -iH --label="$i" "$response"
done
find . -name "*.docx" |
while read -r i; do
    docx2txt < "$i" | grep --color=auto -iH --label="$i" "$response"
done
All improvements and suggestions welcome!
Here's a way to use "unzip" to print the entire contents to standard output, then pipe to "grep -q" to detect whether the desired string is present in the output. It works for docx format files.
#!/bin/bash
PROG=$(basename "$0")
if [ $# -eq 0 ]
then
    echo "Usage: $PROG string file.docx [file.docx...]"
    exit 1
fi
findme="$1"
shift
for file in "$@"
do
    unzip -p "$file" | grep -q "$findme"
    [ $? -eq 0 ] && echo "$file"
done
Save the script as "inword" and search for "wombat" in three files with:
$ ./inword wombat file1.docx file2.docx file3.docx
file2.docx
Now you know file2.docx contains "wombat". You can get fancier by adding support for other grep options. Have fun.
The more recent versions of MS Word intersperse ASCII 0 (NUL) bytes between the letters of the text, for purposes I cannot yet understand. I have written my own MS Word search utilities that insert a NUL between each of the characters in the search field, and they work just fine. Clumsy, but OK. A lot of questions remain. Perhaps the junk characters are not always the same. More tests need to be done. It would be nice if someone could write a utility that would take all this into account. On my Windows machine the same files respond well to searches.
We can do it!
In a .doc file the text is generally present and can be found by grep, but that text is broken up and interspersed with field codes and formatting information, so searching for a phrase you know is there may not match. A search for something very short has a better chance of matching.
A .docx file is actually a zip archive collecting several files together in a directory structure (try renaming a .docx to .zip then unzipping it!) -- with zip compression it's unlikely that grep will find anything at all.
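You can see this for yourself (a sketch; report.docx stands in for any real file, and word/document.xml is the archive member that holds the body text):
$ unzip -l report.docx                 # list the archive members
$ unzip -p report.docx word/document.xml | grep -q 'search_string' && echo match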
The open-source command-line utility crgrep will search most MS document formats (I'm the author).
Have you tried with awk '/Some|Word|In|Word/' document.docx ?
If it's not too many files, you can write a script that incorporates something like catdoc (http://manpages.ubuntu.com/manpages/gutsy/man1/catdoc.1.html) by looping over each file, performing catdoc and grep, storing the result in a bash variable, and outputting it if it's satisfactory.
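A minimal sketch of that loop (assuming catdoc is installed; search_string is a placeholder):
for f in *.doc; do
    match=$(catdoc "$f" | grep -i 'search_string')
    [ -n "$match" ] && printf '%s:\n%s\n' "$f" "$match"
done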
If you have installed a program called antiword, you can use this command:
find . -iname "*.doc" | xargs -I {} bash -c 'if (antiword {} | grep "string_to_search") > /dev/null 2>&1; then echo {}; fi'
replace "string_to_search" in above command with your text. This command spits file name(s) of files containing "string_to_search"
The command is not perfect because works weird on small files (the result can be untrustful), becasue for some reseaon antiword spits this text:
"I'm afraid the text stream of this file is too small to handle."
if file is small (whatever it means .o.)
The best solution I came upon was to use unoconv to convert the Word documents to HTML. It also has a .txt output, but that dropped content in my case.
http://linux.die.net/man/1/unoconv
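A sketch of that approach (report.doc is a hypothetical input; unoconv writes report.html next to it):
unoconv -f html report.doc
grep -i 'search_string' report.html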
I've found a way of searching Word files (doc and docx) that uses the preprocessor functionality of ripgrep.
This depends on the following being installed:
ripgrep (more information about the preprocessor here)
LibreOffice
docx2txt
this catdoc2 script, which I've added to my $PATH:
#!/bin/bash
temp_dir=$(mktemp -d)
trap "rm $temp_dir/* && rmdir $temp_dir" 0 2 3 15
libreoffice --headless --convert-to "txt:Text (encoded):UTF8" --outdir ${temp_dir} $1 1>/dev/null
cat ${temp_dir}/$(basename -s .doc $1).txt
The command pattern for a one-level recursive search is:
$ rg --pre <preprocessor> --glob <glob with filetype> <search string>
Example:
$ ls *
one:
a.docx
two:
b.docx c.doc
$ rg --pre docx2txt --glob '*.docx' This
two/b.docx
1:This is file b.
one/a.docx
1:This is file a.
$ rg --pre catdoc2 --glob '*.doc' This
two/c.doc
1:This is file c.
Here's the full script I use on macOS (Catalina, Big Sur, Monterey).
It's based on Ralph's suggestion, but uses the built-in textutil for .doc files:
#!/bin/bash

searchInDoc() {
    # in .doc
    find "$DIR" -name "*.doc" |
    while read -r i; do
        textutil -stdout -cat txt "$i" | grep --color=auto -iH --label="$i" "$PATTERN"
    done
}

searchInDocx() {
    for i in "$DIR"/*.docx; do
        # extract
        docx2txt.sh "$i" 1> /dev/null
        # point, grep, remove
        txtExtracted="$i"
        txtExtracted="${txtExtracted//.docx/.txt}"
        grep -iHn "$PATTERN" "$txtExtracted"
        rm "$txtExtracted"
    done
}

askPrompts() {
    local i
    for i in DIR PATTERN; do
        # prompt
        printf "\n%s to search: \n" "$i"
        # read & assign
        read -e REPLY
        eval "$i=$REPLY"
    done
}

makeLogs() {
    local i
    for i in results errors; do
        # extract dir for the log name
        dirNAME="${DIR##*/}"
        # set var
        eval "${i}LOG=$HOME/$i-$PATTERN-$dirNAME.log"
        local VAR="${i}LOG"
        # warn if it already exists
        if [ -f "${!VAR}" ]; then
            printf "WARNING: %s will be overwritten.\n" "${!VAR}"
        fi
        # touch file
        touch "${!VAR}"
    done
}

checkDocx2txt() {
    # see if the software exists
    if ! command -v docx2txt.sh 1>/dev/null; then
        printf "\nWARNING: docx2txt is required.\n"
        printf "Use \e[3mbrew install docx2txt\e[0m.\n\n"
        exit
    else
        printf "\n~~~~~~~~~~~~~~~~~~~~~~~~\n"
        printf "Welcome to scandocs macOS.\n"
        printf "~~~~~~~~~~~~~~~~~~~~~~~~\n"
    fi
}

parseLogs() {
    # header
    printf "\n------\n"
    printf "Scandocs finished.\n"
    # results
    if [ ! -s "$resultsLOG" ]; then
        printf "But no results were found."
        printf "\"%s\" did not match in \"%s\"" "$PATTERN" "$DIR" > "$resultsLOG"
    else
        printf "See match results in %s" "$resultsLOG"
    fi
    # errors
    if [ ! -s "$errorsLOG" ]; then
        rm -f "$errorsLOG"
    else
        printf "\nWARNING: there were some errors. See %s" "$errorsLOG"
    fi
    # footer
    printf "\n------\n\n"
}

# the program
checkDocx2txt
askPrompts
makeLogs
{
    searchInDoc
    searchInDocx
} 1>"$resultsLOG" 2>"$errorsLOG"
parseLogs
