I have a directory. It has about 500K .gz files.
How can I extract all .gz in that directory and delete the .gz files?
This should do it:
gunzip *.gz
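Note that with around 500K files the *.gz glob can exceed the shell's argument-list limit (ARG_MAX) on some systems. A find-based variant (just a sketch, limited to the current directory and assuming GNU find for -maxdepth) avoids that:
find . -maxdepth 1 -name '*.gz' -exec gunzip {} +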
#techedemic is correct but was missing the '.' for the current directory; note that this command goes through all subdirectories.
find . -name '*.gz' -exec gunzip '{}' \;
There's more than one way to do this obviously.
# This will find files recursively (you can limit it by using some 'find' parameters;
# see the man pages)
# Final backslash required for exec example to work
find . -name '*.gz' -exec gunzip '{}' \;
# This will do it only in the current directory
for a in *.gz; do gunzip "$a"; done
I'm sure there's other ways as well, but this is probably the simplest.
And to remove any leftover archives, just do a rm -f *.gz in the applicable directory (note that gunzip already deletes the .gz files by default, and -r isn't needed for plain files).
Extract all gz files in current directory and its subdirectories:
find . -name "*.gz" | xargs gunzip
If you want to extract a single file use:
gunzip file.gz
It will extract the file and remove .gz file.
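If you would rather keep the compressed copy around as well, reasonably recent versions of gzip accept -k/--keep (a small aside, not part of the original answer):
gunzip -k file.gz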
for foo in *.gz
do
tar xf "$foo"
rm "$foo"
done
Try:
ls -1 | grep -E "\.tar\.gz$" | xargs -n 1 tar xvfz
Then Try:
ls -1 | grep -E "\.tar\.gz$" | xargs -n 1 rm
This will untar all .tar.gz files in the current directory and then delete all the .tar.gz files. If you want an explanation, the "|" takes the stdout of the command before it, and uses that as the stdin of the command after it. Use "man command" w/o the quotes to figure out what those commands and arguments do. Or, you can research online.
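If you prefer to do both steps in one pass, a simple loop that deletes each archive only after tar exits successfully would look like this (plain shell, sketch only):
for f in *.tar.gz; do tar xvzf "$f" && rm "$f"; done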
I want to recursively delete all binary files in a folder under linux using the command-line or a bash script. I found
grep -r -m 1 "^" path/to/folder | grep "^Binary file"
to list all binary files in path/to/folder at How to list all binary file extensions within a directory tree?. I would now like to delete all these files.
I could do
grep -r -m 1 "^" path/to/folder | grep "^Binary file" | xargs rm
but that is rather fishy as it also tries to delete the files 'Binary', 'file', and 'matches' as in
rm: cannot remove ‘Binary’: No such file or directory
rm: cannot remove ‘file’: No such file or directory
rm: cannot remove ‘matches’: No such file or directory
The question is thus how do I delete those files correctly ?
This command will list all binary executable files recursively within a directory; run it first to make sure the output looks right.
find . -type f -executable -exec sh -c "file -i '{}' | grep -q 'x-executable; charset=binary'" \; -print
If that works, you can pass the output to xargs to delete these files.
find . -type f -executable -exec sh -c "file -i '{}' | grep -q 'x-executable; charset=binary'" \; -print | xargs rm -f
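If any of the paths may contain spaces, a null-delimited variant of the same pipeline (a sketch, assuming GNU find and xargs) is safer:
find . -type f -executable -exec sh -c "file -i '{}' | grep -q 'x-executable; charset=binary'" \; -print0 | xargs -0 rm -f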
Hope this helped, have an awesome day! :)
I coded a tool, called blobs, that lists runnable binaries.
Its readme mentions how to pipe to any other command.
This should do the job if you are deleting a lot of binary files in a folder.
find . -type f -executable | xargs rm
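With GNU find you could also skip xargs entirely and let find do the deleting (a sketch; run the plain listing first to check what would be removed):
find . -type f -executable -delete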
I would like to extract all the tar.gz files in a directory.
I have searched in the internet and used the command:
for i in *.tar.gz; do tar -xzvf $i; done
However, the terminal returned with a > at the following line.
Can somebody enlighten me with what is the situation here? I would really appreciate your help.
Do your tar.gz files have spaces or special characters in their names? Just a guess, but try this:
for i in *.tar.gz; do tar -xzvf "$i"; done
I think you might need to specify the output folder for your extracted files. (The > by itself is normally the shell's continuation prompt, meaning it is still waiting for more input, often because of an unmatched quote.)
for i in *.tar.gz; do tar xvzf "$i" -C path/to/output/directory; done
This did the trick when I used it some weeks ago.
Hmm, your solution should work in my opinion. You can try something simpler:
ls *.tar.gz | xargs -L 1 tar -xzvf
Try this
find -name '*.tar.*' | xargs -I% tar -xvf %
No one has suggested find -print0 | xargs -0 for filenames with spaces yet, so here it is:
/tmp/foo$ ls -1
a.tar.gz
b.tar.gz
c and d.tar.gz
/tmp/foo$ find . -name "*.tar.gz" -print0 | xargs -0 -I{} tar -xzvf {}
a
c
d
b
/tmp/foo$ ls -1
a
a.tar.gz
b
b.tar.gz
c
c and d.tar.gz
d
Your solution will fail on my input data because of the file "c and d.tar.gz", which contains a space, but maybe that's not the error you experienced. The one from A.M.D will work.
I have a bunch of zip files, and I'm trying to make a bash script to automate the unzipping of certain files from it.
Thing is, although I know the name of the file I want, I don't know the name of the folder it's in; it is one folder deep inside the archive.
How can I extract these files, preferably discarding the folder?
Here's how to unzip any given file at any depth and junk the folder paths on the way out:
unzip -j somezip.zip "*somefile.txt"
The -j junks any folder structure in the zip file, and the asterisk gives a wildcard to match along any path (quoted so the shell doesn't expand it before unzip sees it).
if you're in:
some_directory/
and the zip files are in any number of subdirectories, say:
some_directory/foo
find ./ -name myfile.zip -exec unzip {} -d /directory \;
Edit: As for the second part, removing the directory that contained the zip file I assume?
find ./ -name myfile.zip -exec unzip {} -d /directory \; -exec sh -c 'echo rm -rf "$(dirname "$1")"' _ {} \;
Notice the "echo". That's a sanity check; I always echo first when executing something destructive like rm -rf in a loop/iterative sequence like this. (The dirname call has to run inside sh -c here; plain backticks would be expanded by your shell once, before find even starts.) Good luck!
Have you tried unzip somefile.zip "*/blah.txt"?
You can use find to find the file that you need to unzip, and xargs to call unzip:
find /path/to/root/ -name 'zipname.zip' -print0 | xargs -0 unzip
-print0 enables the command to work with files or paths that have whitespace in them; -0 is the option that tells xargs to read that null-delimited output.
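You can combine this with the -j option from the earlier answer to junk the folder paths on the way out (a sketch; the file names and /output/dir are placeholders):
find /path/to/root/ -name 'zipname.zip' -exec unzip -j {} "*somefile.txt" -d /output/dir \;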
I untarred something into a directory that already contained a lot of things. I wanted to untar into a separate directory instead. Now there are too many files to distinguish between. However, the files that I have untarred have been created just now (right?) and the original files haven't been modified for long (at least a day). Is there a way to delete just these untarred files based on their creation information?
Tar usually restores file timestamps, so filtering by time is not likely to work.
If you still have the tar file, you can use it to delete what you unpacked with something like:
tar tf file.tar --quoting-style=shell-always | xargs rm -i
The above will work in most cases, but not all (filenames that have a carriage return in them will break it), so be careful.
You could remove the directories by adding -r to that, but it's probably safer to just remove the toplevel directories manually.
find . -mtime -1 -type f | xargs rm
but test first with
find . -mtime -1 -type f | xargs echo
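If any of the freshly created files might have spaces or other odd characters in their names, the null-delimited form (a sketch, assuming GNU find and xargs) is safer:
find . -mtime -1 -type f -print0 | xargs -0 rm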
There are several different answers to this question in order of increasing complexity.
First, if this is a one off, and in this particular instance you are absolutely sure that there are no weird characters in your filenames (spaces are OK, but not tabs, newlines or other control characters, nor unicode characters) this will work:
tar -tf file.tar | egrep '^(\./)?[^/]+(/)?$' | egrep -v '^\./$' | tr '\n' '\0' | xargs -0 rm -r
All that egrepping is there to keep only the top-level entries of the archive and skip everything nested inside the subdirectories.
Another way to do this that works with funky filenames is this:
mkdir foodir
cd foodir
tar -xf ../file.tar
for file in *; do rm -rf ../"$file"; done
That will create a directory in which your archive has been expanded, but it sounds like you wanted that already anyway. It also will not handle any files whose names start with a dot (.).
To make that method work with files that start with ., do this:
mkdir foodir
cd foodir
tar -xf ../file.tar
find . -mindepth 1 -maxdepth 1 -print0 | xargs -0 sh -c 'for file in "$@"; do rm -rf ../"$file"; done' junk
Lastly, taking from Mat's answer, you can do this and it will work for any filename and not require you to untar the directory again:
tar -tf file.tar | egrep '^(\./)?[^/]+(/)?$' | grep -v '^\./$' | tr '\n' '\0' | xargs -0 bash -c 'for fname in "$@"; do fname="$(echo -ne "$fname")"; echo -n "$fname"; echo -ne "\0"; done' junk | xargs -0 rm -r
You can handle files and directories in one pass with:
tar -tf ../test/bob.tar --quoting-style=shell-always | sed -e "s/^\(.*\/\)'$/rmdir \1'/; t; s/^\(.*\)$/rm \1/;" | sort | bash
You can see what is going to happen by leaving off the pipe to 'bash':
tar -tf ../test/bob.tar --quoting-style=shell-always | sed -e "s/^\(.*\/\)'$/rmdir \1'/; t; s/^\(.*\)$/rm \1/;" | sort
To handle filenames with linefeeds you need more processing.
Alright, so simple problem here. I'm working on a simple back up code. It works fine except if the files have spaces in them. This is how I'm finding files and adding them to a tar archive:
find . -type f | xargs tar -czvf backup.tar.gz
The problem is when the file has a space in the name because tar thinks that it's a folder. Basically is there a way I can add quotes around the results from find? Or a different way to fix this?
Use this:
find . -type f -print0 | tar -czvf backup.tar.gz --null -T -
It will:
deal with files with spaces, newlines, leading dashes, and other funniness
handle an unlimited number of files
not repeatedly overwrite your backup.tar.gz the way tar -c with xargs does when you have a large number of files
Also see:
GNU tar manual
How can I build a tar from stdin?, search for null
There could be another way to achieve what you want. Basically,
Use the find command to output the paths to whatever files you're looking for. Redirect stdout to a filename of your choosing.
Then run tar with the -T option, which allows it to take a list of file locations (the one you just created with find!):
find . -name "*.whatever" > yourListOfFiles
tar -cvf yourfile.tar -T yourListOfFiles
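If the filenames might contain newlines, a null-delimited list should also work (a sketch assuming GNU find and GNU tar):
find . -name "*.whatever" -print0 > yourListOfFiles
tar --null -cvf yourfile.tar -T yourListOfFiles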
Try running:
find . -type f | xargs -d "\n" tar -czvf backup.tar.gz
Why not:
tar czvf backup.tar.gz *
Sure it's clever to use find and then xargs, but you're doing it the hard way.
Update: Porges has commented with a find-option that I think is a better answer than my answer, or the other one: find -print0 ... | xargs -0 ....
If you have multiple files or directories and you want to compress them into independent *.gz files, you can do this (the -type f and -mtime tests are optional):
find -name "httpd-log*.txt" -type f -mtime +1 -exec tar -vzcf {}.gz {} \;
This will compress
httpd-log01.txt
httpd-log02.txt
to
httpd-log01.txt.gz
httpd-log02.txt.gz
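Note that each resulting .gz here is really a one-file tar archive. If you just want each file gzipped on its own (with the original removed, as gzip does by default), a plain gzip sketch would be:
find . -name "httpd-log*.txt" -type f -mtime +1 -exec gzip {} +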
Would add a comment to #Steve Kehlet's post but need 50 rep (RIP).
For anyone who has found this post after a lot of googling: I found a way to not only find specific files given a time range, but also NOT include the relative paths OR whitespace that would cause tarring errors. (THANK YOU SO MUCH STEVE.)
find . -name "*.pdf" -type f -mtime 0 -printf "%f\0" | tar -czvf /dir/zip.tar.gz --null -T -
. relative directory
-name "*.pdf" look for pdfs (or any file type)
-type f type to look for is a file
-mtime 0 look for files created in last 24 hours
-printf "%f\0" Regular -print0 OR -printf "%f" did NOT work for me. From man pages:
This quoting is performed in the same way as for GNU ls. This is not the same quoting mechanism as the one used for -ls and -fls. If you are able to decide what format to use for the output of find then it is normally better to use '\0' as a terminator than to use newline, as file names can contain white space and newline characters.
-czvf create archive, filter the archive through gzip, verbosely list files processed, archive name
Edit 2019-08-14:
I would like to add that I was also able to do essentially the same thing using tar itself:
tar -czvf /archiveDir/test.tar.gz --newer-mtime=0 --ignore-failed-read *.pdf
Needed --ignore-failed-read in case there were no new PDFs for today.
Why not give something like this a try: tar cvf scala.tar `find src -name '*.scala'`
Another solution as seen here:
find var/log/ -iname "anaconda.*" -exec tar -cvzf file.tar.gz {} +
The best solution seems to be to create a file list and then archive the files, because then you can use other sources for the list and do something else with it.
For example this allows using the list to calculate size of the files being archived:
#!/bin/sh
backupFileName="backup-big-$(date +"%Y%m%d-%H%M")"
backupRoot="/var/www"
backupOutPath=""
archivePath=$backupOutPath$backupFileName.tar.gz
listOfFilesPath=$backupOutPath$backupFileName.filelist
#
# Make a list of files/directories to archive
#
echo "" > $listOfFilesPath
echo "${backupRoot}/uploads" >> $listOfFilesPath
echo "${backupRoot}/extra/user/data" >> $listOfFilesPath
find "${backupRoot}/drupal_root/sites/" -name "files" -type d >> $listOfFilesPath
#
# Size calculation
#
sizeForProgress=`
cat $listOfFilesPath | while read nextFile;do
if [ ! -z "$nextFile" ]; then
du -sb "$nextFile"
fi
done | awk '{size+=$1} END {print size}'
`
#
# Archive with progress
#
## simple with dump of all files currently archived
#tar -czvf $archivePath -T $listOfFilesPath
## progress bar
sizeForShow=$(($sizeForProgress/1024/1024))
echo -e "\nRunning backup [source files are $sizeForShow MiB]\n"
tar -cPp -T $listOfFilesPath | pv -s $sizeForProgress | gzip > $archivePath
Big warning on several of the solutions (and your own test):
When you do: anything | xargs something
xargs will try to fit "as many arguments as possible" after "something", but then you may end up with multiple invocations of "something".
So your attempt: find ... | xargs tar czvf file.tgz
may end up overwriting "file.tgz" on each invocation of "tar" by xargs, so you end up with only the files from the last invocation! (The chosen solution uses GNU tar's -T parameter to avoid the problem, but not everyone has GNU tar available.)
You could do instead:
find . -type f -print0 | xargs -0 tar -rvf backup.tar
gzip backup.tar
Proof of the problem on cygwin:
$ mkdir test
$ cd test
$ seq 1 10000 | sed -e "s/^/long_filename_/" | xargs touch
# create the files
$ seq 1 10000 | sed -e "s/^/long_filename_/" | xargs tar czvf archive.tgz
# will invoke tar several times as it can't fit 10000 long filenames into one command line
$ tar tzvf archive.tgz | wc -l
60
# on my own machine, I end up with only the last 60 filenames,
# as the last invocation of tar by xargs overwrote the previous one(s)
# proper way to invoke tar: with -r (which appends to an existing tar file, whereas c would overwrite it)
# caveat: you can't have it compressed (you can't add to a compressed archive)
$ seq 1 10000 | sed -e "s/^/long_filename_/" | xargs tar rvf archive.tar #-r, and without z
$ gzip archive.tar
$ tar tzvf archive.tar.gz | wc -l
10000
# we have all our files, despite xargs making several invocations of the tar command
Note: that behavior of xargs is a well-known difficulty, and it is also why, when someone wants to do:
find .... | xargs grep "regex"
they instead have to write it:
find ..... | xargs grep "regex" /dev/null
That way, even if the last invocation of grep by xargs appends only 1 filename, grep sees at least 2 filenames (each time it gets /dev/null, where it won't find anything, plus the filename(s) appended by xargs after it) and thus will always display the file names when something matches "regex". Otherwise you may end up with the last results showing matches without a filename in front.
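With GNU grep (and most modern greps) you can get the same effect with -H, which forces the filename to be printed even when only one file is searched; a small sketch:
find . -type f -print0 | xargs -0 grep -H "regex"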