How do I recursively unzip nested ZIP files? - linux

Given there is a secret file deep inside a nested ZIP file, i.e. a zip file inside a zip file inside a zip file, etc.
The zip files are named 1.zip, 2.zip, 3.zip, and so on.
We don't know how deeply the zip files are nested, but it may be thousands of levels.
What would be the easiest way to loop through all of them until the last one and read the secret file?
My initial approach would have been to call unzip recursively, but my Bash skills are limited. What are your ideas for solving this?

Thanks Cyrus! The master wizard Shawn J. Goff had the perfect script for this:
while [ "`find . -type f -name '*.zip' | wc -l`" -gt 0 ]; do find -type f -name "*.zip" -exec unzip -- '{}' \; -exec rm -- '{}' \;; done
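The same loop, spread out for readability (behavior is identical: unzip extracts into the current directory, and newly revealed zips are picked up on the next pass):
#!/bin/bash
# repeat until no .zip files remain anywhere under the current directory
while [ "$(find . -type f -name '*.zip' | wc -l)" -gt 0 ]; do
    # extract each zip into the current directory, then delete the archive
    find . -type f -name '*.zip' -exec unzip -- '{}' \; -exec rm -- '{}' \;
done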

Here's my 2 cents.
#!/bin/bash
# unzip "$1" into a directory named after it, run the cleanup command
# passed in "$2", then descend and recurse into any zips found inside
extract() {
    unzip "$1" -d "${1%.zip}" && eval "$2" && cd "${1%.zip}" || return
    for zip in $(find . -maxdepth 1 -iname '*.zip'); do
        extract "$zip" 'rm -- "$1"'
    done
}
extract '1.zip'

Probably not the cleanest way, but that should do the trick:
#!/bin/bash
IDX=1                     # number of your first zip file
while :
do
    if ! unzip "$IDX.zip" # extract; stop when unzip fails (no more files)
    then
        break
    fi
    if [ "$IDX" -ne 1 ]
    then
        rm "$IDX.zip"     # remove the zip to keep the directory clean
    fi
    (( IDX++ ))           # next file
done

Check out nzip, a Java-based utility for nested zips.
Extracting and compressing nested zips can be done easily with the following commands:
java -jar nzip.jar -c list -s readme.zip
java -jar nzip.jar -c extract -s "C:\project\readme.zip" -t readme
java -jar nzip.jar -c compress -s readme -t "C:\project\readme.zip"
PS. I am the author and will be happy to fix any bugs quickly.

Here is a solution for Windows, assuming 7-Zip is installed in the default location.
@echo off
Setlocal EnableDelayedExpansion
Set source=%1
Set SELF=%~dpnx0
For %%Z in (!source!) do (
set FILENAME=%%~nxZ
)
set FILENAME=%FILENAME:"=%
"%PROGRAMFILES%\7-zip\7z.exe" x -o* -y "%FILENAME%"
REM DEL "%FILENAME%"
rem " This is just to satisfy stackoverflow code formatting!
For %%Z in (!source!) do (
set FILENAME=%%~nZ
)
for %%a in (zip rar jar z bz2 gz gzip tgz tar lha iso wim cab rpm deb) do (
forfiles /P ^"%FILENAME%^" /S /M *.%%a /C "cmd /c if #isdir==FALSE \"%SELF%\" #path"
)
This has been adapted from https://social.technet.microsoft.com/Forums/ie/en-US/ccd7172b-85e3-4b4a-ad93-5902e0abd903/batch-file-extracting-all-files-from-nested-archives?forum=ITCG
Notes:
The only way to do variable modification using the ~ modifiers is to use a dummy for..in loop. If there is a better way, please edit.
~nx modifies the variable to keep just the file name and extension.
~dpnx applied to %0 gets the full path and file name of the script.
-o* in the 7-Zip command line makes 7-Zip create folder names without the .zip extension, as it does when extracting via right-click in the GUI.
~n modifies the variable to keep the file name without its extension, i.e. it drops the .zip.
Note that the escape character (for quotes) in FORFILES /P is ^ (caret), while for CMD /C it is \. This ensures that paths and file names containing spaces are also handled recursively without any problem.
You can remove the REM from the DEL statement if you want each zip file to be deleted after unzipping.

Related

Recursively unzip all subdirectories while retaining file structure

I'm new to bash scripting, and I'm finding it hard to solve this one.
I have a parent folder containing a mixture of subdirectories and zipped subdirectories.
Within those subdirectories are also more nested zip files.
There are not only .zip files, but also .rar and .7z files, which themselves contain nested zips/rars/7zs.
I want to unzip, unrar and un7z all my nested subdirectories recursively until the parent folder no longer contains any .zip, .rar or .7z files (these eventually need to be removed once they have been extracted). There could be thousands of subdirectories at different nesting depths. You could have zipped folders or zipped files.
However, I want to retain my folder structure, so the unzipped folders must stay in the place where they were unzipped.
I have tried this script, which works for unzipping, but it does not retain the file structure.
#!/bin/bash
while [ "`find . -type f -name '*.zip' | wc -l`" -gt 0 ]
do
find . -type f -name "*.zip" -exec unzip -- '{}' \; -exec rm -- '{}' \;
done
I want, for example:
folder 'a' contains zipped folder 'b.zip', which contains a zipped text file 'pear.zip' (pear.txt zipped into pear.zip, i.e. a/b.zip(/pear.zip)).
I would like folder 'a' to contain 'b', which contains pear.txt: 'a/b/pear.txt'.
The script above brings 'b' (left empty) and pear.txt both into folder 'a' where the script is executed, i.e. 'a/b' and 'a/pear.txt', which is not what I want.
You could try this:
#!/bin/bash
while :; do
    mapfile -td '' archives \
        < <(find . -type f \( -name '*.zip' -o -name '*.7z' \) -print0)
    [[ ${#archives[@]} -eq 0 ]] && break
    for i in "${archives[@]}"; do
        case $i in
            *.zip) unzip -d "$(dirname "$i")" -- "$i";;
            *.7z)  7z x "-o$(dirname "$i")" -- "$i";;
        esac
    done
    rm -rf "${archives[@]}" || break
done
Every archive is listed by find. Each archive in that list is extracted into its own location and then removed. This repeats until zero archives are found.
You can add an equivalent unrar command (I'm not familiar with it).
Add -o -name '*.rar' to the find (inside the parentheses), and another case to the case statement. If there's no option to specify a target directory with unrar, you could use cd "$(dirname "$i")" && unrar "$i".
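As a sketch, the new case branch might look like this (assuming unrar's x command, which extracts with the archive's internal paths; the subshell keeps the cd local, and basename is needed because find prints paths relative to the starting directory):
*.rar) ( cd "$(dirname "$i")" && unrar x "$(basename "$i")" );;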
There are some issues with this script. In particular, if extraction fails, the archive is still removed; otherwise it would cause an infinite loop. You can use unzip ... || exit 1 to exit if extraction fails, and deal with that manually.
It's possible to avoid both the removal and an infinite loop by counting files which weren't extracted (a sketch follows below), but hopefully that's not necessary.
I couldn't test this properly. YMMV.
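For completeness, a sketch of that counting idea, shown for .zip only (failed archives stay on disk, and the loop gives up once every remaining archive has already failed, so there is neither data loss nor an infinite loop):
#!/bin/bash
while :; do
    mapfile -td '' archives \
        < <(find . -type f -name '*.zip' -print0)
    [[ ${#archives[@]} -eq 0 ]] && break
    failed=0
    for i in "${archives[@]}"; do
        # extract next to the archive; remove it only on success
        unzip -d "$(dirname "$i")" -- "$i" && rm -f -- "$i" || (( failed++ ))
    done
    # every remaining archive failed to extract: give up instead of looping forever
    (( failed == ${#archives[@]} )) && break
done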

Batch file convert to Linux script

Due to migrating a batch job to a Linux server, I'm having trouble finding the equivalents of the following commands in Linux:
The Y drive is mapped to the NAS drive, which is also connected to the Ubuntu server at /NAS/CCTV. I need to search every subfolder for all .264 files.
The Z drive is on the Ubuntu server itself. Every .mp4 file is just moved here; there are no folders. The path on Ubuntu is /Share/CCTV/.
It's just a simple script to convert the CCTV capture from .264 format to mp4, move it to the server to be processed, and delete any h264 files and any folder that's older than 1 day; the script will be scheduled to run every 3 minutes.
I have ffmpeg installed on the Ubuntu server; I'm just unable to find the equivalent of "for each file in the folders" to do the same.
The same goes for the last forfiles command, which deletes folders older than 1 day.
FOR /r y:\ %%F in (*.h264) do c:\scripts\ffmpeg -i %%F %%F.mp4
FOR /r y:\ %%F in (*.h264) do del %%F
FOR /r y:\ %%G in (*.mp4) do move %%G Z:\
forfiles -p "Y:\" -d -1 -c "cmd /c IF #isdir == TRUE rd /S /Q #path"
I'd appreciate any form of help, or a pointer to the right guide, so I can rewrite it on the Linux server. I did try searching for "for loop", but everything shows me how to count numbers; maybe I searched wrongly.
Find all .h264 files (recursively)
find /NAS/CCTV -type f -name '*.h264'
Convert all such files to .mp4
while IFS= read -d '' -r file ; do
ffmpeg -i "$file" "$file".mp4
done < <(find /NAS/CCTV -type f -name '*.h264' -print0)
Note that this will create files named like filename.h264.mp4. This matches your batch file's behavior. If you would prefer to replace the extension, use ffmpeg -i "$file" "${file%.*}".mp4 instead and you will get a name like filename.mp4.
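For example (the path is hypothetical):
file=/NAS/CCTV/cam1/clip0001.h264
echo "$file.mp4"       # /NAS/CCTV/cam1/clip0001.h264.mp4 (the batch file's behavior)
echo "${file%.*}.mp4"  # /NAS/CCTV/cam1/clip0001.mp4 (extension replaced)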
Also move those mp4 files to another directory
while IFS= read -d '' -r file ; do
ffmpeg -i "$file" "$file".mp4
if [[ -f $file.mp4 ]] ; then
mv -f -- "$file".mp4 /Share/CCTV
fi
done < <(find /NAS/CCTV -type f -name '*.h264' -print0)
Delete old directories (recursively)
find /NAS/CCTV -type d -not -newermt '1 day ago' -exec rm -rf {} +
Documentation.
The find command recursively lists files according to criteria you specify. Any time you need to deal with files in multiple directories or with very large numbers of files, it is probably what you want to use. For safety against malicious file names, it's important to use -print0 so file names are delimited by null rather than newline, which then requires the IFS= read -d '' construct to interpret them later.
The while read variable ; do ... done construct reads data from its input and assigns each record to the named variable. This allows each matching file to be handled one at a time inside the loop. The inside of the loop should be fairly obvious.
Again find is used to select files, but in this case the files are directories. The switches -not -newermt select files which are not newer (in other words, which are older) than a given time: m refers to the modification time, and t means the next argument is text describing a time. Here you can use any expression understood by GNU date's -d switch, so you can write it in plain English and it will work as expected.
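For example, you can test the time expression with date first, and preview what find would delete before adding the -exec:
date -d '1 day ago'                               # see what the expression resolves to
find /NAS/CCTV -type d -not -newermt '1 day ago'  # dry run: list without deleting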
As you embark on your shell scripting journey you should keep two things by your side:
shellcheck - Always run scripts you write through shellcheck to catch basic errors.
Bash FAQ - The Bash FAQ at wooledge.org. Most of the answers to questions you have not thought of yet will be there. For example, FAQ 15 is highly relevant to this question.
for f in /NAS/CCTV/*.h264; do ffmpeg -i "$f" "$f".mp4; done
rm /NAS/CCTV/*.h264
mv /NAS/CCTV/*.mp4 /Share/CCTV
find /NAS/CCTV/ -type d -ctime +1 -exec rm -rf {} \;

Extracting specific file types from all tar files from specific folder

I need a shell script which accepts two arguments.
The first one is the path to a specific folder and the second one is an int value (1 or 2).
If the second argument is 1, I have to go through all tar files in the mentioned folder and extract just the executable files into a specific folder inside the path from the first argument; in this case the name of that folder is "unpacked".
If the second argument is 2, I have to extract all *.txt files from all tar files in the folder given by the first argument.
I am trying something like this, but I don't know how to catch every tar file and extract one of these two file types.
#!/bin/bash
cd "$1"
if [ "$2" -eq 1 ]
then
    for f in *.tar; do
        tar -xv -f "$f" --wildcards EXECUTABLE FILES -C ./unpacked
    done
fi
if [ "$2" -eq 2 ]
then
    for f in *.tar; do
        tar -xv -f "$f" --wildcards "*.txt" -C ./unpacked
    done
fi
The [MEMBER...] argument must come last.
#!/bin/bash
cd "$1"
if [ "$2" -eq 1 ]
then
    for f in *.tar; do
        tar -xv -f "$f" --wildcards -C ./unpacked EXECUTABLE FILES
    done
fi
if [ "$2" -eq 2 ]
then
    for f in *.tar; do
        tar -xv -f "$f" --wildcards -C ./unpacked "*.txt"
    done
fi
To extract specific files from a tar file, execute in your terminal:
$ tar -zxvf TARNAME.tar.gz PATH/FILENAME
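If you are unsure what an archive contains before extracting, you can list its members first:
$ tar -tf TARNAME.tar.gz                  # list all members
$ tar -tf TARNAME.tar.gz | grep '\.txt$'  # show only the .txt members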

Execute multiple commands on target files from find command

Let's say I have a bunch of *.tar.gz files located in a hierarchy of folders. What would be a good way to find those files and then execute multiple commands on each of them?
I know if I just need to execute one command on the target file, I can use something like this:
$ find . -name "*.tar.gz" -exec tar xvzf {} \;
But what if I need to execute multiple commands on the target file? Must I write a bash script here, or is there any simpler way?
Samples of the commands that need to be executed on an A.tar.gz file:
$ tar xvzf A.tar.gz # assume it untars to folder logs
$ mv logs logs_A
$ rm A.tar.gz
Here's what works for me (thanks to Etan Reisner's suggestions):
#!/bin/bash
# the target folder (to search for tar.gz files) is parsed from the command line
find "$1" -name "*.tar.gz" -print0 | while IFS= read -r -d '' file; do
    # this does the magic of assigning each tar.gz file to the shell variable `file`
    echo "$file"   # then we can do everything with the `file` variable
    tar xvzf "$file"
    # mv untar_folder "$file".suffix   # untar_folder is the name of the folder after untarring
    rm -- "$file"
done
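A loop-free alternative is to have find hand the files to a small inline shell in batches (a sketch; as in the script above, extraction happens in the current directory, and the logs folder name comes from the question's example):
find "$1" -name '*.tar.gz' -exec sh -c '
    for f do
        name=${f##*/}; name=${name%.tar.gz}   # A.tar.gz -> A
        tar xvzf "$f" || continue             # untar; skip this archive on failure
        mv logs "logs_$name"                  # assumes the tar unpacks to "logs"
        rm -- "$f"
    done
' sh {} +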
As suggested, the array approach below is unsafe if a file name contains space(s), and it also doesn't seem to work properly in this case.
Writing a shell script is probably easiest. Take a look at sh for loops. You could store the output of a find command in an array, and then loop over that array to perform a set of commands on each element.
For example,
arr=( $(find . -name "*.tar.gz" -print0) )
for i in "${arr[@]}"; do
# $i now holds each of the filenames output by find
tar xvzf $i
mv $i $i.suffix
rm $i
# etc., etc.
done

Recursively rename .jpg files in all subdirectories

I am on a Linux system and I am trying to rename all .jpg files in many subdirectories to sequential filenames, so all the jpeg files in each subdirectory are renamed 0001.jpg, 0002.jpg, etc. I have a 'rename' command that works in a single directory:
rename -n 's/.*/sprintf("%04d",$::iter++ +1).".jpg"/e' *.jpg
I am trying to use it like this:
for i in ls -D; do rename -n 's/.*/sprintf("%04d",$::iter++ +1).".jpg"/e' *.jpg; done
but for output I get this:
*.jpg renamed as 0001.jpg
for each subdirectory. What am I doing wrong?
You need to put the command in backticks (or use the $( ... ) bash syntax) in order to iterate over its output. Also use the $i variable together with the *.jpg file name pattern, e.g.
for i in `ls -D`
do
rename -n 's/.*/sprintf("%04d",$::iter++ +1).".jpg"/e' $i/*.jpg
done
However, for this scenario you want to iterate over all the subdirectories, and you are better off using the find command:
for i in `find . -type d`; do rename ...
It seems to me you've forgotten to change the current working directory, so it should look like this:
for i in *; do
[ -d "$i" ] || continue
pushd "$i"
# rename is here
popd
done
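Putting it together, a sketch that runs the question's rename command in each immediate subdirectory (the counter starts over in every directory, because each rename invocation is a fresh process):
#!/bin/bash
for i in */; do
    pushd "$i" >/dev/null || continue
    rename -n 's/.*/sprintf("%04d",$::iter++ +1).".jpg"/e' *.jpg
    popd >/dev/null
done
Drop the -n once the dry-run output looks right.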
