Best way to tar and zip files meeting specific name criteria? - linux

I'm writing a shell script on a Linux machine to be run via a crontab which is meant to move all files older than the current day to a new folder, and then tar and zip the entire folder. Seems like a simple task but for some reason, I'm running into all kinds of roadblocks. I'm new to this and self-taught so any help or redirection would be greatly appreciated.
Specific criteria for which files to archive:
All log files are in /home/tech/logs/ and all pdfs are in /home/tech/logs/pdf
All files are over a day old as indicated by the file name (file name does not include $CURRENT_DATE)
All files must be *.log or *.pdf (i.e. don't archive a file that lacks $CURRENT_DATE in its name unless it is a log or pdf file).
Filename formatting specifics:
All the log file names are in /home/tech/logs in the format NAME 00_20180510.log, and all the pdf files are in a "pdf" subdirectory (/home/tech/logs/pdf) with the format NAME 00_20180510_00000000.pdf ("20180510" would be whenever the file was created and the 0's would be any number). I need to use the name rather than the file metadata for the creation date, and all files (pdf/log) whose name does not include the current date are "old". I also can't just move all files that don't contain $CURRENT_DATE in the name because it would take any non-*.pdf or *.log files with it.
Right now the script creates a new folder with a new pdf subdir for the old files (mkdir -p /home/tech/logs/$ARCHIVE_NAME/pdf). I then want to move the old logs into $ARCHIVE_NAME, and move all old pdfs from the original pdf subdirectory into $ARCHIVE_NAME/pdf.
Current code:
find /home/tech/logs -maxdepth 1 -name ( "*[^$CURRENT_DATE].log" "*.log" ) -exec mv -t "$ARCHIVE_NAME" '{}' ';'
find /home/tech/logs/pdf -maxdepth 1 -name ( "*[^$CURRENT_DATE]*.pdf" "*.pdf" ) -exec mv -t "$ARCHIVE_NAME/pdf" '{}' ';'
This hasn't been working because it treats the numbers in $CURRENT_DATE as a list of numbers to exclude rather than a literal string.
I've considered just using tar's exclude options like this:
tar -cvzPf "$ARCHIVE_NAME.tgz" --directory /home/tech/logs --exclude="$CURRENT_DATE" --no-unquote --recursion --remove-files --files-from="/home/tech/logs/"
But a) it doesn't work, and b) it would theoretically include all files that weren't *.pdf or *.log files, which would be a problem.
Am I overcomplicating this? Is there a better way to go about this?

I would go about this using bash's extended glob features, which allow you to negate a pattern:
#!/bin/bash
shopt -s extglob
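# The negation must cover the whole stem: in *!(DATE)*.log the surrounding * would still match every .log file
mv /home/tech/logs/!(*"$CURRENT_DATE"*).log "$ARCHIVE_NAME"
mv /home/tech/logs/pdf/!(*"$CURRENT_DATE"*).pdf "$ARCHIVE_NAME"/pdf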
With extglob enabled, !(pattern) expands to everything that doesn't match the pattern (or list of pipe-separated patterns).
Using find it should also be possible:
find /home/tech/logs -name '*.log' -not -name "*$CURRENT_DATE*" -exec mv -t "$ARCHIVE_NAME" {} +
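For comparison, the same selection can be written without extglob or find, using a plain loop and a case pattern, followed by the tar step the question asks about. This is only a sketch: it runs in a scratch directory created with mktemp, and the values of CURRENT_DATE and ARCHIVE_NAME, the helper name move_old and the sample file names are all made up for illustration.

```shell
#!/bin/sh
# Sketch: archive every *.log / *.pdf whose name lacks today's date.
# Scratch directory, date, archive name and sample files are assumptions.
logs=$(mktemp -d)
CURRENT_DATE=20180510
ARCHIVE_NAME=archive_$CURRENT_DATE

mkdir -p "$logs/pdf" "$logs/$ARCHIVE_NAME/pdf"
touch "$logs/NAME 00_20180509.log" "$logs/NAME 00_20180510.log" "$logs/notes.txt"
touch "$logs/pdf/NAME 00_20180509_00000001.pdf" "$logs/pdf/NAME 00_20180510_00000002.pdf"

# Move the given files unless their base name contains the current date.
move_old() {  # usage: move_old DEST FILE...
    dest=$1; shift
    for f in "$@"; do
        [ -f "$f" ] || continue            # the glob matched nothing
        case ${f##*/} in
            *"$CURRENT_DATE"*) ;;          # today's file: leave it alone
            *) mv -- "$f" "$dest"/ ;;
        esac
    done
}
move_old "$logs/$ARCHIVE_NAME"     "$logs"/*.log
move_old "$logs/$ARCHIVE_NAME/pdf" "$logs"/pdf/*.pdf

# Pack the archive folder and remove it only if tar succeeded.
tar -czf "$logs/$ARCHIVE_NAME.tgz" -C "$logs" "$ARCHIVE_NAME" &&
    rm -r "$logs/$ARCHIVE_NAME"
tar -tzf "$logs/$ARCHIVE_NAME.tgz"
```

Because the globs are restricted to *.log and *.pdf, other file types such as notes.txt are never touched, which was the concern with the tar --exclude approach.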

Building on @tom-fenech's answer, optimized to avoid many mv invocations:
find /home/tech/logs -maxdepth 1 -name '*.log' -not -name "*_${CURRENT_DATE?}.log" | \
xargs mv -t "${ARCHIVE_NAME?}"
A side benefit of passing the file names through a pipe is that you can filter them with extra tools (such as grep), which is arguably more readable:
find /home/tech/logs -maxdepth 1 -name '*.log' | grep -F -v "_${CURRENT_DATE?}" | \
xargs mv -t "${ARCHIVE_NAME?}"
Do the same for the pdf files. You can "dry-run" either command by replacing mv with echo mv. Note that plain xargs splits its input on whitespace, so file names containing spaces (as in NAME 00_20180510.log) need find -print0 together with xargs -0.
--jjo
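Since the file names in the question contain a space, here is a minimal sketch of the same pipeline made whitespace-safe with NUL separators. It assumes the GNU versions of find, xargs and mv (for -maxdepth, -print0, -0, -r and -t); the scratch directory and sample files are invented for the demonstration.

```shell
#!/bin/sh
# Sketch: NUL-separated find | xargs, safe for names with spaces.
# Scratch directory, date and sample files are assumptions.
work=$(mktemp -d)
mkdir -p "$work/logs" "$work/archive"
CURRENT_DATE=20180510
touch "$work/logs/NAME 00_20180509.log" \
      "$work/logs/NAME 00_20180510.log" \
      "$work/logs/notes.txt"

# -print0 / -0 pass each name as one item; -r skips mv when nothing matches.
find "$work/logs" -maxdepth 1 -type f -name '*.log' \
     ! -name "*_${CURRENT_DATE}.log" -print0 |
    xargs -0 -r mv -t "$work/archive"

ls "$work/archive"
```

Replacing mv with echo mv still works as a dry run here.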

Related

Copying a type of file, in specific directories, to another directory

I have a .txt file that contains a list of directories. I want to make a script that goes through this .txt file, copies anything in the directory thats listed of a certain file type, to another directory.
I've never done this with directories, only files.
How can i edit this simple script to work for reading a directory list, looking for a .csv file, and copy it to another directory?
cat filenames.list | \
while read FILENAME
do
find . -name "$FILENAME" -exec cp '{}' new_dir \;
done
for DIRNAME in $(cat dirname.list); do find "$DIRNAME" -type f -name "*.csv" -exec cp {} dest \; ; done
Sorry, in my first answer I didn't understand what you were asking for.
The line above takes each entry in your directory list as a path, searches it for files ending in ".csv", and copies them into the destination you want. (The unquoted $(cat ...) splits on whitespace, so it assumes directory names without spaces.)
But you could do it with less code, if the .csv files sit directly inside each listed directory:
for DIRNAME in $(cat dirname.list); do cp "$DIRNAME"/*.csv dest ; done
Despite the filename of the list filenames.list, let me assume the file contains the list of directory names, not filenames. Then would you please try:
while IFS= read -r dir; do
find "$dir" -type f -name "*.mp3" -exec cp -p -- {} new_dir \;
done < filenames.list
The find command searches "$dir" for files with the extension .mp3 and copies them to new_dir.
The script above does not handle duplicate file names: a later copy overwrites an earlier file with the same name. If you want to keep the original directory tree and/or need a countermeasure for duplicate file names, please let me know.
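One possible countermeasure for duplicate names, sketched here under the assumption that GNU cp is available, is its --parents option, which recreates each source path below the destination. The scratch directory, the sample directories a and b and the use of .csv (the type from the question) are made up for illustration.

```shell
#!/bin/sh
# Sketch: keep each file's directory path under new_dir so that equal
# file names from different directories cannot collide (GNU cp --parents).
# Scratch directory and sample directories are assumptions.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p a b/sub new_dir
echo "from a" > a/data.csv
echo "from b" > b/sub/data.csv
printf '%s\n' a b > filenames.list

while IFS= read -r dir; do
    find "$dir" -type f -name '*.csv' -exec cp --parents -p -- {} new_dir \;
done < filenames.list

find new_dir -type f | sort
```

With relative directory names in the list, the copies end up as new_dir/a/data.csv and new_dir/b/sub/data.csv; with absolute names, the full absolute path is recreated below new_dir.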
Using find inside a while loop works, but find runs once per line of the file. An alternative is to save the list in an array, so that a single find invocation searches every directory in the list.
If you have bash 4+, you can use mapfile:
mapfile -t directories < filenames.list
If you're stuck on bash 3:
directories=()
while IFS= read -r line; do
directories+=("$line")
done < filenames.list
Now, if you're after just one file type, such as files ending in .csv:
find "${directories[@]}" -type f -name '*.csv' -exec sh -c 'cp -v -- "$@" /newdirectory' _ {} +
If you have multiple file types to match and a different destination directory for each:
while IFS= read -r -d '' file; do
case $file in
*.csv) cp -v -- "$file" /foodirectory;; ##: csv file copy to foodirectory
*.mp3) cp -v -- "$file" /bardirectory;; ##: mp3 file copy to bardirectory
*.avi) cp -v -- "$file" /bazdirectory;; ##: avi file copy to bazdirectory
esac
done < <(find "${directories[@]}" -type f -print0)
find's -print0 pairs with read's -d '' to handle file names containing whitespace and newlines. See "How can I find and deal with file names containing newlines, spaces or both?"
The -- is there so that cp does not interpret a problematic file name starting with a dash (-) as an option.
Given find's ability to process multiple folders, and assuming the goal is to 'flatten' all csv files into a single destination, consider the following.
Note that it assumes folder names do not contain special characters (spaces, tabs, newlines, etc.).
As a side benefit, it minimizes the number of cp calls, which keeps the process efficient across a large number of files and folders.
find $(<filename.list) -name '*.csv' | xargs cp -t DESTINATION/
For the more complex case, where folder and file names can contain anything (including spaces, '*', etc.), consider using the NUL separator (-print0 and -0). Here dd is the file that holds the directory list:
xargs -I{} -t find '{}' -name '*.csv' -print0 <dd | xargs -0 -I{} -t cp -t new/ '{}'
This forks one find per directory and one cp per file.

Linux rename files as dirname

I have lots of files like this:
./1/wwuhw.mp3
./2/nweiewe.mp3
./3/iwqjoiw.mp3
./4/ncionw.MP3
./5/joiwqfm.wmv
./6/jqoifiew.WMV
How can I rename them like this in Linux Bash:
./1/1.mp3
./2/2.mp3
./3/3.mp3
./4/4.MP3
./5/5.wmv
./6/6.WMV
Try this:
for i in */*; do mv "$i" "$(dirname "$i")/$(dirname "$i").${i##*.}"; done
The for loop iterates over every file one level below the current directory, and the mv statement renames each one to its directory name while keeping its extension.
Something like this should do the job:
for i in */*; do
echo mv "${i}" "${i%/*}/${i%/*}.${i##*.}"
done
See e.g. here for what these cryptic parameter expansions (like ${i%/*}) mean in bash.
The script above will only print the commands in the console, without invoking them. Once you are sure you want to proceed, you can remove the echo statement and let it run.
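To make the two expansions concrete, here is a tiny sketch that applies them to one sample path from the question and prints the resulting rename without touching any files. The variable names dir, ext and new are only for illustration.

```shell
#!/bin/sh
# Sketch: how the parameter expansions build the new name.
i='4/ncionw.MP3'      # sample path from the question
dir=${i%/*}           # strip the shortest suffix matching "/*"  -> 4
ext=${i##*.}          # strip the longest prefix matching "*."   -> MP3
new="$dir/$dir.$ext"
printf '%s -> %s\n' "$i" "$new"   # prints: 4/ncionw.MP3 -> 4/4.MP3
```

The extension keeps its original case because it is copied verbatim from the old name.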
If you don't mind using external tool, then rnm can do this pretty easily:
rnm -ns '/pd0/./e/' */*
/pd0/ is the immediate parent directory, /pd1/ is the directory before that and so forth.
-ns means name string and /pd/ and /e/ are name string rules which expands to parent directory and file extension respectively.
The general format of the /pd/ rule is /pd<digit>-<digit>-<delim>/, for example, a rule like /pd0-2-_/ will construct dir0_dir1_dir2 from a directory structure of dir2/dir1/dir0
More examples can be found here.
The for loop method, as outlined in some of the other answers, would suffice and work great for most cases where you need to rename every file in a directory to the first parent's directory name. My particular case called for a bit more granularity, where I only wanted to rename a subset of the files in a directory and assert that the operand was, in fact, an actual file, not an empty directory, symbolic link, etc. Using find can achieve exactly what you want in addition to the added ability to apply filtration and processing to the file inputs and outputs.
#####################################
# Same effect as using a `for` loop #
#####################################
#
# -mindepth 2 : ensures that the file has a parent directory.
# -type f : ensures that we are working with a `regular file` (not directory, symlink, etc.).
find . -mindepth 2 -type f -exec bash -c 'file=$1; dir=$(dirname "$file"); mv "$file" "$dir/${dir##*/}.${file##*.}"' _ {} \;
#########################
# Additional filtration #
#########################
# mp3 ONLY (case insensitive)
find . -mindepth 2 -type f -iname "*.mp3" -exec bash -c 'file=$1; dir=$(dirname "$file"); mv "$file" "$dir/${dir##*/}.${file##*.}"' _ {} \;
# mp3 OR mp4 ONLY (case insensitive)
find . -mindepth 2 -type f \( -iname "*.mp3" -or -iname "*.mp4" \) -exec bash -c 'file=$1; dir=$(dirname "$file"); mv "$file" "$dir/${dir##*/}.${file##*.}"' _ {} \;

Getting all files from various folders and copying them with unique names

Currently using this command to get all my "fanart" from my TV folder, and dump it into a single folder.
find /volume1/tv/ -type f \( -name '*fanart.jpg'* -o -path '*/fanart/*.jpg' -o -path '*/extrafanart/*.jpg' \) -exec cp {} /volume1/tv/_FANART \;
Here's the issue: a lot of these files have the same name, and can't be dumped into the same folder. Example:
Folder A
fanart.jpg
Folder B
fanart.jpg
Is there a way to copy these files from their respective folders and give them a unique name in the destination folder? Name needn't be anything descriptive, random is just fine.
Thanks!
find /volume1/tv/ -type f \( -name '*fanart.jpg'* -o -path '*/fanart/*.jpg' -o -path '*/extrafanart/*.jpg' \) -exec cp --backup=numbered {} /volume1/tv/_FANART \;
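cp --backup=numbered {}
If the destination file already exists, cp does not simply overwrite it but first renames the existing file to a numbered backup (fanart.jpg.~1~, fanart.jpg.~2~, and so on).
Some file managers hide backup files whose names end in ~; press Ctrl+H to show hidden files.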
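A quick way to see what --backup=numbered leaves behind is to try it in a scratch directory. This sketch assumes GNU cp; the folders A and B mirror the example in the question, and everything else is made up.

```shell
#!/bin/sh
# Sketch: two source files with the same name copied into one folder.
# Scratch directory and file contents are assumptions.
work=$(mktemp -d)
mkdir -p "$work/A" "$work/B" "$work/dest"
echo "from A" > "$work/A/fanart.jpg"
echo "from B" > "$work/B/fanart.jpg"

for src in "$work/A/fanart.jpg" "$work/B/fanart.jpg"; do
    cp --backup=numbered -- "$src" "$work/dest"
done

ls -A "$work/dest"
```

After the second copy, fanart.jpg holds the file from B and fanart.jpg.~1~ holds the earlier copy from A, so the backups no longer end in .jpg.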
You could copy the files while giving them names according to their locations in the original directory tree, for instance by replacing each "/" in the path with ":" (which is legal but unusual in filenames). Your "find" command could call a shell script (rather than "cp" directly), which might look like this:
#!/bin/sh
case "x$1" in
x/volume1/tv/_FANART/*)
;;
*)
target=`echo "$1" | sed -e 's,^/volume1/tv/,,' -e s,/,:,g`
cp "$1" "$2/$target"
;;
esac
and the corresponding "-exec" would be
-exec myscript "{}" /volume1/tv/_FANART \;
By the way, the source/destination on the original example are in the same directory tree "/volume1/tv", which is why the sample script uses a case statement - to exclude files already copied to the _FANART folder.
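To preview the names that script would produce, the sed transformation can be tried on its own. In this sketch the function name flat_name and the two sample paths are hypothetical; only the sed expressions come from the script above.

```shell
#!/bin/sh
# Sketch: turn a path below /volume1/tv/ into a flat, colon-separated name.
flat_name() {
    printf '%s\n' "$1" | sed -e 's,^/volume1/tv/,,' -e 's,/,:,g'
}

flat_name '/volume1/tv/Show A/fanart.jpg'          # prints: Show A:fanart.jpg
flat_name '/volume1/tv/Show B/extrafanart/1.jpg'   # prints: Show B:extrafanart:1.jpg
```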
If you want to use the md5sum as the new name:
find /volume1/tv/ -type d -path '/volume1/tv/_FANART' -prune -o -type f \( -name '*fanart.jpg'* -o -path '*/fanart/*.jpg' -o -path '*/extrafanart/*.jpg' \) -exec sh -c 'md5=$(md5sum < "$0") && md5=${md5%% *}.jpg && echo cp "$0" "/volume1/tv/_FANART/$md5"' {} \;
Everything happens in the sh command (the commands are chained with &&, omitted here for clarity):
md5=$(md5sum < "$0")
md5=${md5%% *}.jpg
cp "$0" "/volume1/tv/_FANART/$md5"
Here $0 expands to the name of the file being processed. We first compute the md5sum of the file, then keep only the hash (md5sum prints a hyphen after the hash when reading standard input) and append .jpg to it, and finally copy the file into the target folder under the computed name.
Notes.
I added
-type d -path '/volume1/tv/_FANART' -prune -o
to your command to omit this folder, since you very likely don't want to process it; it would actually be weird to process it, as its content is changed throughout find's traversal.
I left an echo in the command, so that absolutely nothing is copied (as is, it's 100% safe, you can just copy and paste it in your terminal): it only shows what commands are going to be performed (and you'll also see how fast/slow it is).
The command is 100% safe regarding funny filenames with spaces, newlines, globs, etc.
I used md5sum < file and not md5sum file because, if the file name contains special characters (like backslashes or newlines), md5sum (at least my version) prepends a backslash to the hash. Weird. Without a file name argument, this cannot happen.
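The naming step can be checked in isolation. This is a small sketch with a throwaway file created by mktemp; the variable names sum, md5 and target are only for illustration.

```shell
#!/bin/sh
# Sketch: derive a target file name from the MD5 hash of a file's content.
file=$(mktemp)
printf 'sample image bytes\n' > "$file"

sum=$(md5sum < "$file")     # hash, two spaces, then "-" for standard input
md5=${sum%% *}              # drop everything from the first space onward
target="$md5.jpg"
printf '%s\n' "$target"
```

Note that identical files get identical names, so exact duplicates collapse into a single copy in the destination.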

How do I rename lots of files changing the same filename element for each in Linux?

I'm trying to rename a load of files (I count over 200) that either have the company name in the filename, or in the text contents. I basically need to change any references to "company" to "newcompany", maintaining capitalisation where applicable (i.e. "Company" becomes "Newcompany", "company" becomes "newcompany"). I need to do this recursively.
Because the name could occur pretty much anywhere I've not been able to find example code anywhere that meets my requirements. It could be any of these examples, or more:
company.jpg
company.php
company.Class.php
company.Company.php
companysomething.jpg
Hopefully you get the idea. I not only need to do this with filenames, but also the contents of text files, such as HTML and PHP scripts. I'm presuming this would be a second command, but I'm not entirely sure what.
I've searched the codebase and found nearly 2000 mentions of the company name in nearly 300 files, so I don't fancy doing it manually.
Please help! :)
bash has powerful looping and substitution capabilities:
for filename in `find /root/of/where/files/are -name '*company*'`; do
mv "$filename" "${filename/company/newcompany}"
done
for filename in `find /root/of/where/files/are -name '*Company*'`; do
mv "$filename" "${filename/Company/Newcompany}"
done
For the file and directory names, use for, find, mv and sed.
For each path (f) that has company in the name, rename it (mv) from f to the new name where company is replaced by newcompany.
for f in `find . -name '*company*'` ; do mv "$f" "`echo "$f" | sed s/company/newcompany/`" ; done
For the file contents, use find, xargs and sed.
For every file, replace company with newcompany in its content, keeping the original file with the extension .backup (GNU sed requires the suffix to be attached directly to -i).
find . -type f -print0 | xargs -0 sed -i.backup 's/company/newcompany/g'
I'd suggest you take a look at man rename, an extremely powerful Perl utility for, well, renaming files.
Standard syntax is
rename 's/\.htm$/\.html/' *.htm
The clever part is that the tool accepts any Perl regexp as the pattern for the file names to change.
You might want to run it with the -n switch, which makes the tool only report what it would have changed.
I can't think of a nice way to preserve the capitalization right now, but since you can already search through the file structure, you can issue several rename commands with different capitalizations until all files are changed.
To loop through all files below current folder and to search for a particular string, you can use
find . -type f -exec grep -n -i STRING_TO_SEARCH_FOR /dev/null {} \;
The output from that command can be directed to a file (after some filtering to just extract the file names of the files that need to be changed).
find . -type ... > files_to_operate_on
Then wrap that in a while read loop and do some Perl magic for in-place replacement:
while IFS= read -r file
do
perl -pi -e 's/stringtoreplace/replacementstring/g' "$file"
done < files_to_operate_on
There are few right ways to recursively process files. Here's one:
while IFS= read -d $'\0' -r file ; do
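# Replace the lowercase form first: "Newcompany" itself contains "company",
# so the opposite order would produce "Newnewcompany".
newfile="${file//company/newcompany}"
newfile="${newfile//Company/Newcompany}"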
mv -f "$file" "$newfile"
done < <(find /basedir/ -iname '*company*' -print0)
This will work with all possible file names, not just ones without whitespace in them.
Presumes bash.
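One caveat with renaming whole paths is that a matching directory may be renamed before the files inside it, which invalidates the paths find already printed. A hedged sketch of one way around that is to let find list children before parents (-depth) and to change only the last path component. It presumes bash and GNU find; the scratch tree and the temporary list file are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Sketch: rename deepest entries first and change only the last path component.
# Scratch tree and sample names are assumptions.
base=$(mktemp -d)
mkdir -p "$base/src/company/lib"
touch "$base/src/company/lib/Company.Class.php" \
      "$base/src/company/companysomething.jpg"

# Collect the names first so the renames cannot disturb the traversal.
list=$(mktemp)
find "$base/src" -depth -iname '*company*' -print0 > "$list"

while IFS= read -r -d '' path; do
    dir=${path%/*}                        # everything before the last slash
    name=${path##*/}                      # last path component only
    new=${name//company/newcompany}       # lowercase first: "Newcompany" contains "company"
    new=${new//Company/Newcompany}
    [ "$name" = "$new" ] || mv -- "$path" "$dir/$new"
done < "$list"
rm -f "$list"

find "$base/src" | sort
```

Names that match only in another capitalisation (for example COMPANY) are left unchanged by the two substitutions and skipped by the final test.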
For changing the contents of files I would advise caution because a blind replacement within a file could break things if the file is not plain text. That said, sed was made for this sort of thing.
while IFS= read -d $'\0' -r file ; do
# GNU sed (Linux): -i takes no separate argument; BSD/macOS sed needs -i '' instead.
sed -i -e 's/company/newcompany/g;s/Company/Newcompany/g' "$file"
done < <(find /basedir/ -iname '*company*' -print0)
For this run I recommend adding some additional switches to find to limit the files it will process, perhaps
find /basedir/ \( -iname '*company*' -and \( -iname '*.txt' -or -iname '*.html' \) \) -print0
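Since the company name can appear in files whose names do not contain it, another option is to select files by content rather than by name. The following is a sketch only, assuming GNU grep, xargs and sed; the scratch directory and the two sample files are invented.

```shell
#!/bin/sh
# Sketch: edit only text files that actually mention the name.
# Scratch directory and sample files are assumptions.
root=$(mktemp -d)
printf 'Company site, (c) company ltd\n' > "$root/index.html"
printf 'nothing to change\n' > "$root/readme.txt"

# -r recurse, -l print file names only, -I skip binary files,
# -i ignore case, -Z end each name with NUL (GNU grep options).
grep -rlIiZ 'company' "$root" |
    xargs -0 -r sed -i.backup -e 's/company/newcompany/g' -e 's/Company/Newcompany/g'

cat "$root/index.html"
```

The -I option addresses the caution about non-text files, and each edited file keeps its original next to it with a .backup suffix until you are satisfied with the result.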

Merging Sub-Folders together, Linux

I have a main folder "Abc" which has about 800 sub-folders. Each of these sub-folders contains numerous files (all of the same format, say ".doc"). How do I create one master folder with all these files (and not being distributed into subfolders). I am doing this on a Windows 7 machine, using cygwin terminal.
The cp -r command copies it but leaves the files in the sub-folders, so it doesn't really help much. I'd appreciate assistance with this. Thank you!
Assuming there could be name collisions and multiple extensions, this will create unique names, changing directory paths to dashes (e.g. a/b/c.doc would become a-b-c.doc). Run this from within the folder you want to collapse:
# if globstar is not enabled, you'll need it.
shopt -s globstar
for file in */**; do [ -f "$file" ] && mv -i "$file" "${file//\//-}"; done
# get rid of the now-empty subdirectories.
find . -type d -empty -delete
If you can guarantee unique names, this will move the files and remove the subdirectories. You can change the two .s to the name of a folder and run it from outside said folder:
find . -depth \( -type f -exec mv -i {} . \; \) -o \( -type d -empty -delete \)
This may not be the most elegant or efficient way to do it, but I believe it'd accomplish what you want:
for file in `find abc`
do
if [ -f "$file" ]
then
mv "$file" "abc/$(basename "$file")"
fi
done
Iterate through everything in abc, check if it's a file (not a directory) and if it is then move it from its current location (eg abc/d/example.txt) to abc/
Edit: This would leave all the subfolders in place (but they'd be empty now)
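A variant of the same idea that copes with spaces in names and also removes the emptied subfolders could look like the sketch below. It assumes GNU find and mv (for -mindepth, -delete, -t and -n); the scratch directory and sample files are made up.

```shell
#!/bin/sh
# Sketch: pull every file up into abc/ and drop the emptied subfolders.
# Scratch directory and sample files are assumptions.
top=$(mktemp -d)
mkdir -p "$top/abc/d one" "$top/abc/e"
touch "$top/abc/d one/first file.doc" "$top/abc/e/second.doc"

# Move every regular file found below a subfolder (-n: never overwrite).
find "$top/abc" -mindepth 2 -type f -exec mv -n -t "$top/abc" -- {} +
# Remove the subfolders that are now empty.
find "$top/abc" -mindepth 1 -type d -empty -delete

ls "$top/abc"
```

Because of -n, a file whose name already exists in abc/ stays where it is, and its subfolder is then kept rather than deleted.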
