Unzip all gz files in all subdirectories in the terminal - linux

Is there a way to unzip all .gz files in place, each into the folder that contains it, when the archives are spread across subdirectories?
A query for
find -type f -name "*.gz"
gives results like this:
./datasets/auto/auto.csv.gz
./datasets/prnn_synth/prnn_synth.csv.gz
./datasets/sleep/sleep.csv.gz
./datasets/mfeat-zernike/mfeat-zernike.csv.gz
./datasets/sonar/sonar.csv.gz
./datasets/wine-quality-white/wine-quality-white.csv.gz
./datasets/ring/ring.csv.gz
./datasets/diabetes/diabetes.csv.g

If you want to run gzip -d on each of those:
cd theparentdir && gzip -d $(find ./ -type f -name '*.gz')
and then, to gzip them back:
cd theparentdir && gzip $(find ./ -type f -name '*.csv')
This will, however, break if filenames contain special characters (spaces, tabs, newlines, etc.) and in other similar cases, or if there are too many files to fit on a single gzip command line.
Instead, if you have GNU find, you can do:
find ... -print0 | xargs -0 gzip -d   # to decompress; NUL-separated names are safe even with spaces or newlines
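To check that NUL-delimited names really are safe, here is a small sketch. It assumes GNU find and xargs (for -print0 and -0) and works in a throwaway directory from mktemp; the directory and file names are invented for the demo, including one name that contains a newline.

```shell
#!/bin/sh
# Throwaway tree; all names here are invented for the demo.
demo=$(mktemp -d)
mkdir -p "$demo/sub dir"
odd=$(printf 'odd\nname')              # a file name containing a newline
printf 'a,b\n1,2\n' | gzip > "$demo/sub dir/$odd.csv.gz"
printf 'x,y\n3,4\n' | gzip > "$demo/plain.csv.gz"

# -print0 ends every path with a NUL byte and xargs -0 splits only on NUL,
# so spaces and newlines inside names reach gzip intact.
find "$demo" -type f -name '*.gz' -print0 | xargs -0 gzip -d

# List what is left (the odd name prints across two lines).
find "$demo" -type f
```

Since NUL is the only byte that cannot appear in a path, splitting on it is what makes this form robust.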
Another (arguably better?) solution, if you have GNU find at your disposal:
cd theparentdir && find ./ -type f -name '*.gz' -exec gzip -d '{}' '+'
and to recompress all .csv files in that parent directory and all its subdirectories:
cd theparentdir && find ./ -type f -name '*.csv' -exec gzip '{}' '+'
"+" tells GNU find to put as many found files as it can on each gzip invocation (instead of one gzip invocation per file, which is very resource-intensive, inefficient and slow). This is similar to xargs, but with some benefits: only one command, and no pipe needed.

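The cost difference between \; and + is easy to observe by counting how many times find starts a command. A minimal sketch, assuming a throwaway directory from mktemp; sh -c 'echo run' stands in for gzip so nothing is modified. The {} + form is also specified by POSIX, so it is not limited to GNU find.

```shell
#!/bin/sh
# Five empty files in a throwaway directory (names invented for the demo).
demo=$(mktemp -d)
for i in 1 2 3 4 5; do : > "$demo/file$i.csv"; done

# The inner sh prints one line per invocation, so counting lines
# counts how many times find launched the command.
per_file=$(find "$demo" -type f -name '*.csv' -exec sh -c 'echo run' sh {} \; | wc -l)
batched=$(find "$demo" -type f -name '*.csv' -exec sh -c 'echo run' sh {} + | wc -l)

echo "invocations with \\; : $per_file"
echo "invocations with +  : $batched"
```

With five matching files this prints 5 for \; and 1 for +.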
gzip has a recursive option (-r):
gzip -dr ./datasets
Each archive is decompressed in place, in its own directory.
Example, running gzip -dr ./a on this tree:
a/b/c/test1.gz
a/b/d/test2.gz
a/e/test3.gz
After execution:
a/b/c/test1
a/b/d/test2
a/e/test3
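The example above can be reproduced in a throwaway directory. A sketch, assuming a gzip with -r support (GNU gzip has it) and mktemp; the file contents are invented:

```shell
#!/bin/sh
# Recreate the example tree in a throwaway directory (contents invented).
demo=$(mktemp -d)
mkdir -p "$demo/a/b/c" "$demo/a/b/d" "$demo/a/e"
for f in a/b/c/test1 a/b/d/test2 a/e/test3; do
  echo "contents of $f" | gzip > "$demo/$f.gz"
done

# -d decompresses and -r descends into directories; each file is restored
# next to its .gz, and the .gz is removed.
gzip -dr "$demo/a"

# Should list a/b/c/test1, a/b/d/test2 and a/e/test3.
(cd "$demo" && find a -type f | sort)
```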

Related

Command that compresses files older than X and then removes the originals

I'm trying to write a bash script that compresses files older than X and, after compressing, removes the uncompressed version. I tried something like this, but it doesn't work:
find /home/randomcat -mtime +11 -exec gzip {}\ | -exec rm;
By default, gzip removes the uncompressed file (it replaces it with the compressed variant). And you don't want it to run on anything other than a plain file (not on directories or devices, not on symbolic links).
So you want at least
find /home/randomcat -mtime +11 -type f -exec gzip {} \;
You might even want find(1) to skip files with several hard links. You might also want it to ask you before running the command. Then you could try:
find /home/randomcat -mtime +11 -type f -links 1 -ok gzip {} \;
The find command with -exec or -ok needs a terminating semicolon (or a + sign), and you need to escape that semicolon from your shell. You could also write ';' instead of \; to quote it.
If you use +, find will group several arguments into a single gzip process, so it runs fewer processes (though each one lasts longer). So you could try:
find /home/randomcat -mtime +11 -type f -links 1 -exec gzip -v {} +
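A quick way to see what that command does is to run it on a scratch directory with backdated files. This sketch assumes GNU touch (for -d '20 days ago') and mktemp; the file names are invented. One file is hard-linked so that -links 1 skips it.

```shell
#!/bin/sh
# Assumes GNU touch (-d '20 days ago'); names are invented for the demo.
demo=$(mktemp -d)
echo old > "$demo/old.log"
echo new > "$demo/new.log"
echo shared > "$demo/linked.log"
touch -d '20 days ago' "$demo/old.log" "$demo/linked.log"
ln "$demo/linked.log" "$demo/linked-copy.log"   # 2 links now, so -links 1 skips it

# Regular files, last modified more than 11 days ago, with a single link.
# gzip replaces each one with its .gz, so no separate rm is needed.
find "$demo" -mtime +11 -type f -links 1 -exec gzip -v {} +

ls "$demo"
```

old.log should end up as old.log.gz, while new.log (too recent) and the two hard-linked names are left alone.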
You may want to read more about globbing and how a shell works.
BTW, you don't need any command pipeline (as suggested by the wrong use of | in your question).
You could even consider using GNU parallel to run things in parallel, or feed a shell with background jobs, e.g.:
find /home/randomcat -mtime +11 -type f -links 1 \
-exec printf "gzip %s &\n" {} \; | bash -x
but in practice this won't speed up your processing much.
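GNU parallel is one option; as a swapped-in alternative that needs no extra install, GNU xargs can also run several gzip processes at once with -P. A sketch, assuming GNU findutils, GNU touch and mktemp; the worker count (4) and batch size (2) are arbitrary choices for the demo:

```shell
#!/bin/sh
# Assumes GNU findutils (-print0, xargs -0 -r -P) and GNU touch;
# file names and sizes are invented for the demo.
demo=$(mktemp -d)
for i in 1 2 3 4 5 6; do
  seq 1 1000 > "$demo/old$i.log"
done
touch -d '20 days ago' "$demo"/old*.log

# Run up to 4 gzip processes at once (-P 4), each given at most 2 files (-n 2).
# -r means gzip is not run at all when find matches nothing.
find "$demo" -mtime +11 -type f -links 1 -print0 |
  xargs -0 -r -P 4 -n 2 gzip

ls "$demo"
```

Because compressing many small files is often bound by disk access rather than CPU, the speedup from more workers may be modest.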
find /home/randomcat -mtime +11 -exec gzip {} +
This command compresses the files found by find. gzip does not create separate compressed copies; it converts the files themselves to gzip format. Say you have three files older than X, named a, b and c.
After running find /home/randomcat -mtime +11 -exec gzip {} +,
you will see a.gz, b.gz and c.gz instead of a, b and c in the /home/randomcat directory.
find /location/location -type f -ctime +15 -exec mv {} /location/backup_location \;
This finds all regular files whose status last changed more than 15 days ago and moves them to a backup folder.
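A scratch-directory sketch of the move-to-backup command. Assumptions: GNU touch and GNU mv (for -t, which lets + batch many files into one mv call), mktemp for both directories, and -mtime +15 in place of -ctime +15, because a file's ctime cannot be set back by hand for a demo:

```shell
#!/bin/sh
# Assumes GNU touch and GNU mv (-t). ctime cannot be backdated by hand,
# so this demo uses -mtime +15; keep -ctime +15 on real data if that is
# the age you mean. The two directories stand in for /location/location
# and /location/backup_location.
src=$(mktemp -d)
backup=$(mktemp -d)      # outside $src, so find never walks into it
echo stale > "$src/stale.txt"
echo fresh > "$src/fresh.txt"
touch -d '30 days ago' "$src/stale.txt"

# mv -t DIR takes the destination first, so {} + can batch many
# source files into a single mv call.
find "$src" -type f -mtime +15 -exec mv -t "$backup" -- {} +

ls "$src" "$backup"
```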

Find all files and unzip specific file to local folder

find -name archive.zip -exec unzip {} file.txt \;
This command finds all files named archive.zip and extracts file.txt into the folder I run the command from. Is there a way to extract the file into the same folder where each .zip file was found? I would like file.txt to be unzipped to folder1.
folder1/archive.zip
folder2/archive.zip
I realize dirname is available in a script, but I'm looking for a one-line command if possible.
@iheartcpp - I successfully ran three alternatives using the same base command...
find . -iname "*.zip"
... which provides the list of .zip files passed as arguments to the next command.
Alternative 1: find with -exec + Shell Script (unzips.sh)
File unzips.sh:
#!/bin/sh
# Unzip each zip file given as an argument into the directory that contains it
for f in "$@" ; do
unzip -o -d "$(dirname "$f")" "$f"
done
Use this alternative like this:
find . -iname '*.zip' -exec ./unzips.sh {} \;
Alternative 2: find with | xargs + Shell Script (unzips.sh)
Same unzips.sh file.
Use this alternative like this:
find . -iname '*.zip' -print0 | xargs -0 ./unzips.sh
Alternative 3: all commands in the same line (no .sh files)
Use this alternative like this:
find . -iname '*.zip' -print0 | xargs -0 sh -c 'for f in "$@"; do unzip -o -d "$(dirname "$f")" "$f"; done' sh
Of course, there are other alternatives but hope that the above ones can help.
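find -execdir is a shorter route to what the question asks for: it runs the command from the directory that contains each match. -execdir is supported by GNU and BSD find but is not in POSIX, so treat it as an assumption about your find. The sketch below uses empty placeholder files and pwd instead of unzip, so it only shows which directory each command would run in:

```shell
#!/bin/sh
# Demo tree with empty placeholder files (not real zip archives).
demo=$(mktemp -d)
mkdir -p "$demo/folder1" "$demo/folder2"
: > "$demo/folder1/archive.zip"
: > "$demo/folder2/archive.zip"

# -execdir runs the command from the directory holding each match, with {}
# expanded to ./archive.zip; pwd here just reports that directory.
# GNU find refuses -execdir if PATH contains a relative entry such as ".".
find "$demo" -name archive.zip -execdir pwd \; | sort > "$demo/dirs.txt"
cat "$demo/dirs.txt"

# With real archives, the same pattern extracts file.txt next to each zip:
#   find . -name archive.zip -execdir unzip -o {} file.txt \;
```

The commented unzip line is the intended real-world use; -o makes unzip overwrite an existing file.txt without prompting.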

cat files in subdirectories using linux commands

I have the following directories:
P922_101
P922_102
.
.
Each directory, for instance P922_101 has following subdirectories:
140311_AH8MHGADXX 140401_AH8CU4ADXX
Each subdirectory, for instance 140311_AH8MHGADXX has the following files:
1_140311_AH8MH_P922_101_1.fastq.gz 1_140311_AH8MH_P922_101_2.fastq.gz
2_140311_AH8MH_P922_101_1.fastq.gz 2_140311_AH8MH_P922_101_2.fastq.gz
And files in 140401_AH8CU4ADXX are:
1_140401_AH8CU_P922_101_1.fastq.gz 1_140401_AH8CU_P922_4001_2.fastq.gz
2_140401_AH8CU_P922_101_1.fastq.gz 2_140401_AH8CU_P922_4001_2.fastq.gz
I want to do 'cat' for the files in the subdirectories in the following way:
cat 1_140311_AH8MH_P922_101_1.fastq.gz 2_140311_AH8MH_P922_101_1.fastq.gz \
    1_140401_AH8CU_P922_101_1.fastq.gz 2_140401_AH8CU_P922_101_1.fastq.gz > P922_101_1.fastq.gz
which means that files ending with _1.fastq.gz should be concatenated into one file, and files ending with _2.fastq.gz into another.
It should be run for all files in subdirectories in all directories. Could someone give a linux solution to do this?
Since they're compressed, you should probably use gzip -dc (decompress and write to stdout):
find /somePath -type f -name "*.fastq.gz" -exec gzip -dc {} \; | \
tee -a /someOutFolder/out.txt
You can use find for this:
find /top/path -mindepth 2 -type f -name "*_1.fastq.gz" -exec cat {} \; > one_file
find /top/path -mindepth 2 -type f -name "*_2.fastq.gz" -exec cat {} \; > another_file
This looks for all files under /top/path whose names match the pattern *_1.fastq.gz / *_2.fastq.gz and concatenates them into the desired file. -mindepth 2 makes find ignore anything less than two levels below /top/path; this way, files directly in /top/path won't be matched.
Note that you will probably need zcat instead of cat, for gz files.
As you keep adding details in comments, let's see what else we can do:
Say you have the list of directories in a file directories_list, each line containing one:
while IFS= read -r directory
do
find "$directory" -mindepth 2 -type f -name "*_1.fastq.gz" -exec cat {} \; > "$directory/output"
done < directories_list

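Putting the pieces together for the layout in the question, here is a sketch that builds one SAMPLE_1.fastq.gz and one SAMPLE_2.fastq.gz per top-level directory. It assumes GNU find, sort and xargs (for -print0, -z, -0 and -r) and builds a tiny invented stand-in for P922_101 in a throwaway directory. It relies on concatenated gzip streams forming a valid multi-member gzip file, so cat is enough when the output should stay compressed; sorting the paths keeps the lanes in a predictable order:

```shell
#!/bin/sh
# Assumes GNU find/sort/xargs. Builds a tiny invented stand-in for one sample.
demo=$(mktemp -d)
for run in 140311_AH8MHGADXX 140401_AH8CU4ADXX; do
  mkdir -p "$demo/P922_101/$run"
  for lane in 1 2; do
    for read in 1 2; do
      printf '%s lane%s read%s\n' "$run" "$lane" "$read" |
        gzip > "$demo/P922_101/$run/${lane}_${run}_P922_101_${read}.fastq.gz"
    done
  done
done

# For every sample directory, join all *_1 files into SAMPLE_1.fastq.gz and
# all *_2 files into SAMPLE_2.fastq.gz. The outputs are written next to the
# sample directories, so find never picks them up.
for dir in "$demo"/P922_*/; do
  sample=$(basename "$dir")
  for read in 1 2; do
    find "$dir" -mindepth 2 -type f -name "*_${read}.fastq.gz" -print0 |
      LC_ALL=C sort -z |
      xargs -0 -r cat > "$demo/${sample}_${read}.fastq.gz"
  done
done

gzip -dc "$demo/P922_101_1.fastq.gz"
```

For this demo tree, gzip -dc should print the four read-1 records, with the 140311 run before the 140401 run.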
Bash Script to find, process and rename files?

I am trying to put together a script which will run through all the files on my server (under various subdirectories), look for .jpg files and run them through a converter which turns them into non-progressive JPEGs.
I have:
find /home/disk2/ -type f -iname "*.jpg"
Which finds all the files.
Then if it finds for example 1.jpg, I need to run:
/usr/bin/jpegtrans /file location/1.jpg > /file location/1.jpg.temp
The jpegtrans app converts the file to a temp file which needs to replace the original file.
So then I need to delete the original and rename 1.jpg.temp to 1.jpg
rm /file location/1.jpg
mv /file location/1.jpg.temp /file location/1.jpg
I can easily do this for single files, but I need to do it for hundreds of files on my server.
Use find with -exec:
find /home/disk2/ -type f -iname "*.jpg" -exec sh -c "/usr/bin/jpegtrans {} > {}.temp; mv -f {}.temp {}" \;
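EDIT: For handling spaces (or quotes) in filenames, pass each name to the inner shell as an argument instead of pasting it into the command string; the && also keeps the original if the conversion fails:
find /home/disk2/ -type f -iname "*.jpg" -exec sh -c '/usr/bin/jpegtrans "$1" > "$1.temp" && mv -f "$1.temp" "$1"' sh {} \;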

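A variant that can be tried without jpegtrans: file names are passed to the inner sh as arguments, several at a time via +, and a failed conversion never overwrites the original. Assumptions: mktemp for a scratch directory, and sort standing in for the converter (like the jpegtrans call in the question, it reads a file argument and writes to stdout); swap /usr/bin/jpegtrans back in for real use:

```shell
#!/bin/sh
# Scratch directory; names and contents are invented for the demo.
demo=$(mktemp -d)
mkdir -p "$demo/photos"
printf 'zebra\napple\n' > "$demo/photos/my photo's.jpg"
printf 'zebra\napple\n' > "$demo/photos/2.JPG"

# sort stands in for the converter: like the jpegtrans call in the question,
# it reads a file argument and writes the result to stdout.
CONVERTER=sort            # use /usr/bin/jpegtrans on real data

# Names reach the inner sh as arguments, never as script text, so spaces
# and quotes are harmless; the temp file only replaces the original when
# the converter succeeds.
find "$demo/photos" -type f -iname '*.jpg' -exec sh -c '
  conv=$1; shift
  for f do
    if "$conv" "$f" > "$f.temp"; then
      mv -f -- "$f.temp" "$f"
    else
      rm -f -- "$f.temp"
    fi
  done
' sh "$CONVERTER" {} +

cat "$demo/photos/2.JPG"
```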
unzip specific extension only

I have a directory with zip archives containing .jpg, .png and .gif images. I want to unzip each archive, taking only the images and putting them in a folder named after the archive.
So:
files/archive1.zip
files/archive2.zip
files/archive3.zip
files/archive4.zip
Open archive1.zip and take sunflower.jpg and rose_sun.gif. Make a folder files/archive1/ and add the images to that folder, giving files/archive1/sunflower.jpg and files/archive1/rose_sun.gif. Do this for each archive.
I really don't know how this can be done; all suggestions are welcome. I have over 600 archives, so an automatic solution would be a lifesaver, preferably for Linux.
In Short
You can do this with a one-liner find + unzip.
find . -name "*.zip" -type f -exec unzip -jd "images/{}" "{}" "*.jpg" "*.png" "*.gif" \;
In Detail
unzip allows you to specify the files you want:
unzip archive.zip "*.jpg" "*.png" "*.gif"
And -d a target directory:
unzip -d images/ archive.zip "*.jpg" "*.png" "*.gif"
Combine that with a find, and you can extract all the images in all zips:
find . -name "*.zip" -type f -exec unzip -d images/ {} "*.jpg" "*.png" "*.gif" \;
Using unzip -j to junk (discard) the zip's internal directory structure, we can do it all in one command. This gives you the flat image list, separated by zip name, that you want, as a one-liner.
find . -name "*.zip" -type f -exec unzip -jd "images/{}" "{}" "*.jpg" "*.png" "*.gif" \;
A limitation is that unzip -d won't create more than one new level of directories, so just mkdir images first. Enjoy.
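For the exact layout asked for (files/archive1/ rather than a folder named after the whole path), the archive name can be trimmed with ${zip%.zip} inside sh -c. Since files/ already exists, -d only has to create one new level. A sketch, assuming Info-ZIP unzip for real runs; with DRY_RUN=1 (a made-up variable for this demo) it only prints the commands, and the demo archives are empty placeholders:

```shell
#!/bin/sh
# Placeholder archives in a throwaway directory (not real zip files).
demo=$(mktemp -d)
mkdir -p "$demo/files"
: > "$demo/files/archive1.zip"
: > "$demo/files/archive2.zip"

# DRY_RUN is exported to the inner sh through the environment prefix.
DRY_RUN=1 find "$demo/files" -type f -name '*.zip' -exec sh -c '
  for zip do
    dest=${zip%.zip}             # files/archive1.zip -> files/archive1
    # -j drops internal paths, -o overwrites without asking,
    # -d names the per-archive output folder.
    if [ -n "${DRY_RUN:-}" ]; then
      echo unzip -j -o -d "$dest" "$zip" "*.jpg" "*.png" "*.gif"
    else
      unzip -j -o -d "$dest" "$zip" "*.jpg" "*.png" "*.gif"
    fi
  done
' sh {} + | sort > "$demo/plan.txt"

cat "$demo/plan.txt"
```

Drop the DRY_RUN=1 prefix to extract for real; the patterns stay quoted so that unzip, not the shell, matches them against the archive contents.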
7-Zip can do this, and has a Linux version.
mkdir files/archive1
7z e -ofiles/archive1/ files/archive1.zip '*.jpg' '*.png' '*.gif'
(Just tested it, it works.)
Something along the lines of:
#!/bin/bash
cd ~/basedir/files
for file in *.zip ; do
newfile=$(echo "${file}" | sed -e 's/^files.//' -e 's/.zip$//')
echo ":${newfile}:"
mkdir tmp
rm -rf "${newfile}"
mkdir "${newfile}"
cp "${newfile}.zip" tmp
cd tmp
unzip "${newfile}.zip"
find . -name '*.jpg' -exec cp {} "../${newfile}" ';'
find . -name '*.png' -exec cp {} "../${newfile}" ';'
find . -name '*.gif' -exec cp {} "../${newfile}" ';'
cd ..
rm -rf tmp
done
This is tested and will handle spaces in filenames (both the zip files and the extracted files). You may have collisions if the zip file has the same file name in different directories (you can't avoid this if you're going to flatten the directory structure).
You can write a program using a zip library. If you use Mono, you can use DotNetZip.
The code would look like this:
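// Requires "using Ionic.Zip;" (DotNetZip) at the top of the file.
// listOfZips and IsImageFile are placeholders you define yourself.
foreach (var archive in listOfZips)
{
using (var zip = ZipFile.Read(archive))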
{
foreach (ZipEntry e in zip)
{
if (IsImageFile(e.FileName))
{
e.FileName = System.IO.Path.Combine(archive.Replace(".zip",""),
System.IO.Path.GetFileName(e.FileName));
e.Extract("files");
}
}
}
}
Perl's Archive::Zip is a good library for zipping/unzipping.
Here's my take on the first answer...
#!/bin/bash
cd files
for zip_name in *.zip ; do
dir_name=$(echo "${zip_name}" | sed -e 's/^files.//' -e 's/.zip$//')
mkdir -p "${dir_name}"
7z e -o"${dir_name}"/ "${zip_name}" '*.jpg' '*.png' '*.gif'
done
or, if you'd just like to use the regular unzip command...
unzip -d "${dir_name}/" "${zip_name}" '*.jpg' '*.png' '*.gif'
I haven't tested this, but it should work... or something along these lines. Definitely more efficient than the first solution. :)
Hope this helps!

Resources