How to write a shell script that creates zip files from files sharing the same string in their file names - linux

How do I write a simple shell script to create zip files?
I want to create a zip file by collecting the files in a folder whose names share the same string pattern.
For example, a folder may contain many files:
xxxxx_20140502_xxx.txt
xxxxx_20140502_xxx.txt
xxxxx_20140503_xxx.txt
xxxxx_20140503_xxx.txt
xxxxx_20140504_xxx.txt
xxxxx_20140504_xxx.txt
After running the shell script, the result must be the following three zip files:
20140502.zip
20140503.zip
20140504.zip
Please point me in the right direction for a simple shell script that produces the result above.

#!/bin/bash
for file in *_????????_*.csv *_????????_*.txt; do
[ -f "${file}" ] || continue
date=${file#*_} # adjust this and next line depending
date=${date%_*} # on your actual prefix/suffix
echo "${date}"
done | sort -u | while read -r date; do
zip "${date}.zip" *"_${date}_"* # the underscores keep an existing ${date}.zip out of the match
done
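The two parameter expansions in the loop above do the date extraction. Here is a minimal sketch of what they do, wrapped in a helper named extract_date (a hypothetical name, not part of the original answer) and assuming the date is whatever sits between the first and the last underscore:

```bash
#!/bin/bash
# extract_date is a hypothetical helper used only for illustration.
# Assumption: names look like prefix_YYYYMMDD_suffix.ext, with no extra
# underscores in the prefix or suffix.
extract_date() {
    local name=$1
    name=${name#*_}   # remove the shortest leading match of "*_" (the prefix)
    name=${name%_*}   # remove the shortest trailing match of "_*" (the suffix)
    printf '%s\n' "$name"
}

extract_date "xxxxx_20140502_xxx.txt"   # prints 20140502
```

If the prefix itself contains underscores, the result keeps part of the prefix, which is why the original comments say to adjust the two lines to your actual naming scheme.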

Since zip updates an existing archive when you add files to it, this will do:
shopt -s nullglob
for file in *.{txt,csv}; do [[ $file =~ _([[:digit:]]{8})_ ]] && zip "${BASH_REMATCH[1]}.zip" "$file"; done
The shopt -s nullglob is there because you don't want the loop to receive unexpanded globs when there are no matching files.
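To see the effect of nullglob in isolation, here is a small sketch. The helper count_matches is a hypothetical name. It counts loop iterations over *.txt inside an empty scratch directory, first without and then with the option:

```bash
#!/bin/bash
# count_matches is a hypothetical helper: it reports how many words the
# pattern *.txt expands to inside an empty scratch directory.
count_matches() {
    local dir count=0 f
    dir=$(mktemp -d) || return 1
    (
        cd "$dir" || exit 1
        for f in *.txt; do
            count=$((count + 1))
        done
        echo "$count"
    )
    rm -rf "$dir"
}

shopt -u nullglob
count_matches   # prints 1: the loop runs once with the literal string "*.txt"
shopt -s nullglob
count_matches   # prints 0: the pattern expands to nothing
```

Without the option, a command such as zip would be handed the literal pattern as a file name.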
Everything below this line is my old answer...
First, get all the possible dates. Heuristically, this could be the files ending in .txt and .csv that match the regex _[[:digit:]]{8}_:
#!/bin/bash
shopt -s nullglob
declare -A dates=()
for file in *.{csv,txt}; do
[[ $file =~ _([[:digit:]]{8})_ ]] && dates[${BASH_REMATCH[1]}]=
done
printf "Date found: %s\n" "${!dates[@]}"
This will output to stdout all the dates found in the file names. E.g. (I called the previous snippet gorilla, ran chmod +x gorilla, and touched a few files for the demo):
$ ls
banana_20010101_gorilla.csv gorilla_20140502_bonobo.csv
gorilla notthisone_123_lol.txt
gorilla_20140502_banana.txt
$ ./gorilla
Date found: 20140502
Date found: 20010101
Next step: for each date found, collect all the files ending in .txt and .csv and zip them into the archive for that date. Appending this to gorilla will do the job:
for date in "${!dates[@]}"; do
zip "$date.zip" *"_${date}_"*.{csv,txt}
done
Full script, with the printf line that floods the output removed:
#!/bin/bash
shopt -s nullglob
declare -A dates=()
for file in *.{csv,txt}; do
[[ $file =~ _([[:digit:]]{8})_ ]] && dates[${BASH_REMATCH[1]}]=
done
for date in "${!dates[@]}"; do
zip "$date.zip" *"_${date}_"*.{csv,txt}
done
Edit. I overlooked your requirement for a one-line command. Here is the one-liner:
shopt -s nullglob; declare -A dates=(); for file in *.{csv,txt}; do [[ $file =~ _([[:digit:]]{8})_ ]] && dates[${BASH_REMATCH[1]}]=; done; for date in "${!dates[@]}"; do zip "$date.zip" *"_${date}_"*.{csv,txt}; done
:)

#! /bin/bash
dates=$(ls ?????_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_???.{csv,txt} \
| cut -f2 -d_ | sort -u)
for date in $dates ; do
zip "$date.zip" ?????_"$date"_???.{csv,txt}
done

Related

Copy a folder contents and save the file with diff name Unix

I have a bunch of .txt files in a directory.
I'm looking for a command to copy all .txt files and save each copy as <filename>_2.txt.
E.g.: abc.txt -> abc_2.txt (after the copy)
Thanks in tons in advance
EDIT: Following the OP's follow-up request, adding this code:
for file in *.txt
do
if [[ ! -f "${file%.*}_MED_2.txt" ]]
then
cp "$file" "${file%.*}_MED_2.txt"
fi
done
Try the following.
for file in *.txt
do
echo "cp $file ${file%.*}_2.txt"
done
The above prints the cp commands; if you are OK with them, run the following.
for file in *.txt
do
if [[ ! -f "${file%.*}_2.txt" ]]
then
cp "$file" "${file%.*}_2.txt"
fi
done
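The ${file%.*} expansion used in these loops strips the last extension from the name. Here is a minimal sketch of how the target name can be assembled, using a hypothetical helper called copy_name and assuming every input name contains at least one dot:

```bash
#!/bin/bash
# copy_name is a hypothetical helper that builds the target name for a copy.
# Assumption: the input name contains at least one dot.
copy_name() {
    local file=$1
    local base=${file%.*}    # strip the shortest trailing ".something"
    local ext=${file##*.}    # keep everything after the last dot
    printf '%s_2.%s\n' "$base" "$ext"
}

copy_name "abc.txt"        # prints abc_2.txt
copy_name "notes.v1.txt"   # prints notes.v1_2.txt
```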

How can I batch rename multiple images with their path names and reordered sequences in bash?

My pictures are kept in folders named after the picture date. For example, these are the original paths and file names:
.../Pics/2016_11_13/wedding/DSC0215.jpg
.../Pics/2016_11_13/afterparty/DSC0234.jpg
.../Pics/2016_11_13/afterparty/DSC0322.jpg
How do I rename the pictures into the format below, with continuous sequences and 4-digit padding?
.../Pics/2016_11_13_wedding.0001.jpg
.../Pics/2016_11_13_afterparty.0002.jpg
.../Pics/2016_11_13_afterparty.0003.jpg
I'm using Bash 4.1, so only the mv command is available. Here is what I have now, but it's not working:
#!/bin/bash
p=0
for i in *.jpg;
do
mv "$i" "$dirname.%03d$p.JPG"
((p++))
done
exit 0
Let's say you have something like .../Pics/2016_11_13/wedding/XXXXXX.jpg. Go into the directory .../Pics/2016_11_13; from there, you should see a bunch of subdirectories like wedding, afterparty, and so on. Run this script (disclaimer: I didn't test it):
#!/bin/bash
for subdir in *; do # scan directory
    [ -d "$subdir" ] || continue # skip non-directories
    prognum=0 # progressive number, restarted for each subdirectory
    for file in "$subdir"/*; do # scan subdirectory without parsing ls
        [ -f "$file" ] || continue # skip anything that is not a regular file
        prognum=$((prognum + 1)) # increment progressive
        newname=$(printf %4.4d "$prognum") # format it
        newname="$subdir.$newname.jpg" # compose the new name
        if [ -f "$newname" ]; then # check to not overwrite anything
            echo "error: $newname already exists."
            exit 1
        fi
        # do the job, move or copy
        cp "$file" "$newname"
    done
done
Please note that I skipped the "date" (2016_11_13) part, as I am not sure about it. If you have a single date, it is easy to add those digits at the # compose the new name step. If you have several dates, you can add a nested for loop to scan the "date" directories. One more reason I skipped this is to let you develop something by yourself, something you can be proud of...
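For the several-dates case described above, here is one possible sketch of the nested loops. It is a dry run that only prints the mv commands. The function name plan_renames is hypothetical, and it assumes a layout of root/date/event/file.jpg with one counter running across all pictures:

```bash
#!/bin/bash
# plan_renames is a hypothetical helper: it prints the mv commands
# instead of running them.
# Assumption: the layout is <root>/<date>/<event>/<file>.jpg.
plan_renames() {
    local root=$1 n=0 datedir eventdir file date event newname
    for datedir in "$root"/*/; do            # one pass per date directory
        [ -d "$datedir" ] || continue
        date=$(basename "$datedir")
        for eventdir in "$datedir"*/; do     # one pass per event directory
            [ -d "$eventdir" ] || continue
            event=$(basename "$eventdir")
            for file in "$eventdir"*.jpg; do
                [ -f "$file" ] || continue
                n=$((n + 1))                 # counter is never reset
                printf -v newname '%s_%s.%04d.jpg' "$date" "$event" "$n"
                echo mv "$file" "$root/$newname"
            done
        done
    done
}
```

Calling plan_renames Pics lists the planned renames. Globs expand in sorted order, so events are numbered alphabetically rather than in shooting order. Remove the echo only once the printed commands look right.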
Using only mv and bash builtins:
#! /bin/bash
shopt -s globstar
cd Pics
p=1
# recursive glob for .jpg files
for i in **/*.jpg
do
# (date)/(event)/(filename).jpg
if [[ $i =~ (.*)/(.*)/(.*)\.jpg$ ]]
then
newname=$(printf "%s_%s.%04d.jpg" "${BASH_REMATCH[@]:1:2}" "$p")
echo mv "$i" "$newname"
((p++))
fi
done
globstar is a bash 4.0 feature; regex matching is available even in OS X's antique bash.

Linux: Piping output to unique files

I have a folder filled with hundreds of text files, and I want to run a Linux command called mint on each of them. This command outputs a text value that I want stored in a separate file for each input file in the folder. Is there a way to run the command using the * character to represent all my input files, while still redirecting each output to its own file?
Example:
$ mint * > uniqueFile.krn
With the bugs fixed and caveats closed:
#!/bin/bash
# ^^^^ - bash, not sh, for [[ ]] support
for f in *; do
[[ $f = *.krn ]] && continue # skip files already ending in .krn
mint "$f" >"$f.krn"
done
Or, with a prefix:
for f in *; do
[[ $f = int_* ]] && continue
mint "$f" >"int_$f"
done
You can also avoid recreating output files that already exist unless the source file has changed:
for f in *; do
# don't process files that are themselves outputs
[[ $f = int_* ]] && continue
# if a non-empty output file exists and is newer than our source file, don't run mint again
[[ -s "int_$f" && "int_$f" -nt "$f" ]] && continue
# ...if we got through the above conditions, go ahead and create the output
mint "$f" >"int_$f"
done
done
To explain:
test -s filename is true only if a file by the given name exists and is non-empty
test file1 -nt file2 is true only if both files exist, and file1 is newer than file2.
[[ ]] is a ksh-derived extended shell syntax modeled on the test command. It adds support for pattern-matching tests (i.e. [[ $string = *.txt ]] is true only if $string expands to a value ending in .txt) and relaxes the quoting rules (it's safe to write [[ -s $f ]], whereas test -s "$f" needs the quotes to work with all possible filenames).
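The -s and -nt checks explained above can be tried out in a scratch directory. This is a minimal sketch. The helper name is_up_to_date and the file names are hypothetical, and touch -t is used to force explicit timestamps:

```bash
#!/bin/bash
# is_up_to_date is a hypothetical helper mirroring the checks explained above.
# It returns 0 only when "$out" exists, is non-empty, and is newer than "$src".
is_up_to_date() {
    local src=$1 out=$2
    [[ -s $out && $out -nt $src ]]
}

demo=$(mktemp -d)
touch -t 202001010000 "$demo/song"        # source with an old timestamp
echo "result" > "$demo/int_song"          # output written just now, so newer
is_up_to_date "$demo/song" "$demo/int_song" && echo "skip"

touch "$demo/song"                        # source modified just now
touch -t 201901010000 "$demo/int_song"    # output made to look stale
is_up_to_date "$demo/song" "$demo/int_song" || echo "rebuild"
rm -rf "$demo"
```

This prints skip followed by rebuild. An empty or missing output file also counts as stale because of the -s test.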
Thanks for all the suggestions! Shiping's solution worked great; I just added a prefix to the file name, like so:
$ for file in * ; do mint "$file" > "int_$file" ; done

Create .txt of all files in each subdirectory

I need to create a text file in each subdirectory that lists all the files in that subdirectory.
For example, subdirectory1 would contain a list of all of its files as a .txt and subdirectory2 would also contain a list of all of subdirectory2 files as a .txt.
I have tried
#!/bin/bash
for X in "$directory" *
do
if [ -d "$X" ];
then
cd "$X"
files="$(ls)"
echo "$files" >> filesNames.txt
fi
done
However this did not generate anything. I absolutely need it as a shell script because it will be part of a pipeline script, but I cannot seem to get it to work.
Here is the adjusted script, which gives me a "No such file or directory" error. I know that the folder exists, and I have used it in commands that run before this one.
#!/bin/bash
#Retrieve the base directory path
baseDir=$(dirname "$ini")
#Retrieve the reference genome path
ref=$(dirname "$genome")
#Create required directory structure
tested="$baseDir/tested"
MarkDups1="$baseDir/MarkDups1"
#don't create if already exists
[[ -d "tested" ]] || mkdir "$tested"
[[ -d "MarkDups1" ]] || mkdir "$MarkDups1"
#create a text file with all sorted and indexed bam files paths
#!/bin/bash
for x in $MarkDups1/*/;
do
(cd "$x"; ls > filesNames.txt)
done
The sequence to iterate over should be "$directory"/*/.
for x in "$directory"/*/; do
(cd "$x"
files=(*)
printf '%s\n' "${files[@]}" > filesNames.txt
)
done

linux zip and exclude dir via bash/shell script

I am trying to write a bash/shell script to zip up a specific folder and ignore certain sub-dirs in that folder.
This is the folder I am trying to zip "sync_test5":
My bash script generates an ignore list (based on) and calls the zip function like this:
#!/bin/bash
SYNC_WEB_ROOT_BASE_DIR="/home/www-data/public_html"
SYNC_WEB_ROOT_BACKUP_DIR="sync_test5"
SYNC_WEB_ROOT_IGNORE_DIR="dir_to_ignore dir2_to_ignore"
ignorelist=""
if [ "$SYNC_WEB_ROOT_IGNORE_DIR" != "" ];
then
for ignoredir in $SYNC_WEB_ROOT_IGNORE_DIR
do
ignorelist="$ignorelist $SYNC_WEB_ROOT_BACKUP_DIR/$ignoredir/**\*"
done
fi
FILE="$SYNC_BACKUP_DIR/$DATETIMENOW.website.zip"
cd $SYNC_WEB_ROOT_BASE_DIR;
zip -r $FILE $SYNC_WEB_ROOT_BACKUP_DIR -x $ignorelist >/dev/null
echo "Done"
Now this script runs without error, however it is not ignoring/excluding the dirs I've specified.
So, I had the shell script output the command it tried to run, which was:
zip -r 12-08-2014_072810.website.zip sync_test5 -x sync_test5/dir_to_ignore/**\* sync_test5/dir2_to_ignore/**\*
Now if I run the above command directly in PuTTY like this, it works:
So why doesn't the exclude in my shell script work as intended? The command being executed is identical (in the script and typed directly into PuTTY).
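Because backslashes stored in a variable are not treated as quoting when the variable is expanded. After the expansion only word splitting and globbing take place, so the backslash stays in the argument.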
If you have a='123\4', echo $a would give
123\4
But if you do it directly like echo 123\4, you'd get
1234
Clearly the arguments you pass with the variable and without the variables are different.
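A small sketch makes the difference visible by capturing what a command actually receives in each case (the variable names from_variable and typed_directly are only illustrative):

```bash
#!/bin/bash
# Compare an argument typed directly with the same text held in a variable.
a='123\4'

# Unquoted on purpose: after expansion the backslash is ordinary data.
from_variable=$(printf '%s\n' $a)
# Typed directly: the shell removes the backslash while parsing the command.
typed_directly=$(printf '%s\n' 123\4)

printf 'from variable:  %s\n' "$from_variable"    # from variable:  123\4
printf 'typed directly: %s\n' "$typed_directly"   # typed directly: 1234
```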
You probably just meant not to escape your argument with a backslash:
ignorelist="$ignorelist $SYNC_WEB_ROOT_BACKUP_DIR/$ignoredir/***"
By the way, what actually works is a glob pattern that reaches zip unexpanded:
zip -r 12-08-2014_072810.website.zip sync_test5 -x 'sync_test5/dir_to_ignore/***' 'sync_test5/dir2_to_ignore/***'
You can verify this with
echo zip -r 12-08-2014_072810.website.zip sync_test5 -x sync_test5/dir_to_ignore/**\* sync_test5/dir2_to_ignore/**\*
And this is my suggestion:
#!/bin/bash
SYNC_WEB_ROOT_BASE_DIR="/home/www-data/public_html"
SYNC_WEB_ROOT_BACKUP_DIR="sync_test5"
SYNC_WEB_ROOT_IGNORE_DIR=("dir_to_ignore" "dir2_to_ignore")
IGNORE_LIST=()
if [[ -n $SYNC_WEB_ROOT_IGNORE_DIR ]]; then
for IGNORE_DIR in "${SYNC_WEB_ROOT_IGNORE_DIR[@]}"; do
IGNORE_LIST+=("$SYNC_WEB_ROOT_BACKUP_DIR/$IGNORE_DIR/***") ## "$SYNC_WEB_ROOT_BACKUP_DIR/$IGNORE_DIR/*" perhaps is enough?
done
fi
FILE="$SYNC_BACKUP_DIR/$DATETIMENOW.website.zip" ## Where is $SYNC_BACKUP_DIR set?
cd "$SYNC_WEB_ROOT_BASE_DIR";
zip -r "$FILE" "$SYNC_WEB_ROOT_BACKUP_DIR" -x "${IGNORE_LIST[@]}" >/dev/null
echo "Done"
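The reason for switching from a space-separated string to an array can be shown without running zip at all. In this sketch, build_excludes and the EXCLUDES array are hypothetical names, and the second directory name contains a space on purpose:

```bash
#!/bin/bash
# build_excludes is a hypothetical helper. It fills the global array
# EXCLUDES with one exclude pattern per directory name passed to it.
build_excludes() {
    local base=$1 dir
    shift
    EXCLUDES=()
    for dir in "$@"; do
        EXCLUDES+=("$base/$dir/*")
    done
}

build_excludes "sync_test5" "dir_to_ignore" "dir with space"
# Quoted "${EXCLUDES[@]}" passes each element as exactly one argument,
# with no word splitting and no glob expansion by the shell.
printf '<%s>\n' "${EXCLUDES[@]}"
```

This prints one bracketed pattern per line, including the one with spaces, which a space-separated string would have split into several arguments.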
This is what I ended up with:
#!/bin/bash
# This script zips a directory, excluding specified files, types and subdirectories.
# while zipping the directory it excludes hidden directories and certain file types
[[ "`/usr/bin/tty`" == "not a tty" ]] && . ~/.bash_profile
DIRECTORY=$(cd `dirname $0` && pwd)
if [[ -z $1 ]]; then
echo "Usage: managed_directory_compressor /your-directory/ zip-file-name"
else
DIRECTORY_TO_COMPRESS=${1%/}
ZIPPED_FILE="$2.zip"
COMPRESS_IGNORE_FILE=("\.git" "*.zip" "*.csv" "*.json" "gulpfile.js" "*.rb" "*.bak" "*.swp" "*.back" "*.merge" "*.txt" "*.sh" "bower_components" "node_modules")
COMPRESS_IGNORE_DIR=("bower_components" "node_modules")
IGNORE_LIST=("*/\.*" "\.*" "\/\.*")
if [[ -n $COMPRESS_IGNORE_FILE ]]; then
for IGNORE_FILES in "${COMPRESS_IGNORE_FILE[@]}"; do
IGNORE_LIST+=("$DIRECTORY_TO_COMPRESS/$IGNORE_FILES/*")
done
for IGNORE_DIR in "${COMPRESS_IGNORE_DIR[@]}"; do
IGNORE_LIST+=("$DIRECTORY_TO_COMPRESS/$IGNORE_DIR/")
done
fi
zip -r "$ZIPPED_FILE" "$DIRECTORY_TO_COMPRESS" -x "${IGNORE_LIST[@]}" # >/dev/null
# echo zip -r "$ZIPPED_FILE" "$DIRECTORY_TO_COMPRESS" -x "${IGNORE_LIST[@]}" # >/dev/null
echo "$DIRECTORY_TO_COMPRESS compressed as $ZIPPED_FILE."
fi
After some trial and error, I managed to fix this problem by changing this line:
ignorelist="$ignorelist $SYNC_WEB_ROOT_BACKUP_DIR/$ignoredir/**\*"
to:
ignorelist="$ignorelist $SYNC_WEB_ROOT_BACKUP_DIR/$ignoredir/***"
Not sure why this worked, but it does :)

Resources