Why can't a directory be compressed with gzip, bzip, bzip2, xz? - linux

Is there any way to compress a directory in gzip, bzip, bzip2, or xz format? I'm building a command-line tool (in bash) that needs to offer these options.

A command like
tar czf output.tar.gz yourdir/
should work.
c means that tar will create an archive
z means that the output will be compressed (using gzip)
the output filename is after f
at the end, you can specify any number of directories/files (space-separated)
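Since the question asks for several formats, the same pattern extends to bzip2 and xz by swapping the compression flag. The following is a minimal sketch, assuming a tar that supports -j and -J (GNU tar does) and that the bzip2 and xz programs are installed. The function name compress_dir and the demo paths are made up for illustration.

```shell
#!/usr/bin/env bash
# compress_dir FORMAT DIR: hypothetical helper, not part of tar.
# Writes DIR.tar.<ext> in the current directory.
compress_dir() {
  local format=$1 dir=${2%/}
  case $format in
    gz)  tar -czf "$dir.tar.gz"  "$dir" ;;   # z: filter through gzip
    bz2) tar -cjf "$dir.tar.bz2" "$dir" ;;   # j: filter through bzip2
    xz)  tar -cJf "$dir.tar.xz"  "$dir" ;;   # J: filter through xz
    *)   echo "unsupported format: $format" >&2; return 1 ;;
  esac
}

# Demo in a scratch directory so nothing real is touched.
demo_root=$(mktemp -d)
cd "$demo_root" || exit 1
mkdir yourdir
echo "hello" > yourdir/a.txt
compress_dir gz yourdir
tar -tzf yourdir.tar.gz   # lists the archive members
```

A command-line tool could map its own format option onto the first argument and report the non-zero return code for unknown formats.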

To answer the "why" part of your question, it is because of the Unix philosophy of having many small tools that do their job well that you can string together, as opposed to one big tool that doesn't do anything well and is hard to make better. Your examples are a perfect illustration of this philosophy, where you have several compression tools to choose from, and it is easy to add a new compression tool to your tool box. The archiving part, turning a directory of files into a byte stream, is a different task that is its own tool that can be combined with any of those or any future compression tools.
The body of your question then asks "how". You use a pipe with tar, cpio, or pax. tar is the most common. You then name the file accordingly so the consumer of the file can tell what it is from the name. E.g. ending with .tar.gz. Like this:
tar cf - somedirectory | gzip > somedirectory.tar.gz
or
tar cf - somedirectory | xz > somedirectory.tar.xz
These tar up the directory into a byte stream, which is then piped to a compressor. The output of the compressor is then written to the file containing the compressed directory contents.
To decompress:
gzip -dc somedirectory.tar.gz | tar xf -
Here it is done in the reverse order: first decompress the file, then feed the output to tar to extract the files and recreate the directory structure. The - tells tar to write the archive to stdout (when creating) or to read it from stdin (when extracting).
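To see both pipelines work together, here is a small round trip in a scratch directory. It is only a sketch: the file names are demo values, and it assumes a tar that accepts -C to choose the extraction directory (GNU tar and bsdtar both do).

```shell
#!/usr/bin/env bash
# Round trip through an explicit pipeline; all paths are scratch/demo names.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p somedirectory/sub
echo "alpha" > somedirectory/a.txt
echo "beta"  > somedirectory/sub/b.txt

# Archive to stdout (-), compress the byte stream, write it to a file.
tar -cf - somedirectory | gzip > somedirectory.tar.gz

# Reverse order: decompress to stdout, let tar read the archive from stdin (-).
mkdir restored
gzip -dc somedirectory.tar.gz | tar -C restored -xf -

# diff -r exits non-zero if the restored tree differs from the original.
diff -r somedirectory restored/somedirectory && echo "round trip OK"
```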
Having said all that about how much better it is to have small tools that do their job well, this application of tar is so incredibly common that it is built into the tar options. So you can instead run:
tar czf somedirectory.tar.gz somedirectory
tar cJf somedirectory.tar.xz somedirectory
tar will run the gzip or xz executables and pipe the data through them itself.
(J is a relatively late GNU tar addition, so an old or non-GNU tar may not have it.)

Related

How to untar `.tgz` directory, and gzip one of the extracted files in memory?

TL;DR
How can I untar a .tgz file and then selectively gzip the output?
My extracted dir has a few text files and a .nii file. I'd like to gzip the latter.
More details
The first method would be to just do the two steps sequentially. However, I'm dealing with a huge dataset (10k+ tar archives) stored on a BeeGFS file system, and I was told it would be better to do it in memory instead of in two steps, since BeeGFS doesn't like handling big directories like this.
Sequential method:
for tarfile in "${rootdir}"/*.tgz; do
tarpath="${tarfile%.tgz}"
tar zxvf "${tarfile}" # (1) untar directory
gzip "${tarpath}"/*.nii # (2) gzip the .nii file
done
Is there a way to combine (1) and (2)? Or do you have any other tips on how to do this process effectively?
Thanks!
You can extract a single file from the archive (if you know the filename) and have tar write it to standard output with -O instead of to a file, then compress that stream and redirect it to a file. Something like
tar xzOf "$tarfile" "$tarpath/foo.nii" | gzip -c > "$tarpath/foo.nii.gz"
You can then extract everything else in the archive with tar xzf "$tarfile" --exclude "*.nii"
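Put together as a loop, the two commands might look like the sketch below. It assumes GNU tar (or another tar with -O and --exclude), exactly one .nii file per archive, and archives that unpack into a directory of their own. The sample archive, the subj01 name, and the member lookup via tar -t are illustrative additions, not part of the original commands.

```shell
#!/usr/bin/env bash
# Sketch: for each archive, stream the .nii member straight into gzip and
# extract everything else normally. Assumes one .nii file per archive.
rootdir=$(mktemp -d)          # stand-in for the real dataset root
cd "$rootdir" || exit 1

# Build one sample archive so the loop has something to process.
mkdir subj01
echo "notes"      > subj01/info.txt
echo "image data" > subj01/scan.nii
tar -czf subj01.tgz subj01
rm -r subj01

for tarfile in "$rootdir"/*.tgz; do
  # Look up the member name of the .nii file inside the archive.
  member=$(tar -tzf "$tarfile" | grep '\.nii$' | head -n 1)
  [ -n "$member" ] || continue

  # Extract everything except the .nii file (this creates the directory).
  tar --exclude='*.nii' -xzf "$tarfile"

  # -O sends the member to stdout, so the uncompressed .nii never hits disk.
  tar -xzOf "$tarfile" "$member" | gzip -c > "$member.gz"
done
```

Note that this reads each archive three times (list, extract, stream). If the member name is predictable, the listing step can be dropped.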

GZip an entire directory

I used the following:
gzip -9 -c -r <some_directory> > directory.gz
How do I decompress this directory?
I have tried
gunzip directory.gz
but I am just left with a single file and not a directory structure.
As others have already mentioned, gzip is a file compression tool and not an archival tool. It cannot work with directories. When you run it with -r, it will find all files in a directory hierarchy and compress them, i.e. replacing path/to/file with path/to/file.gz. When you pass -c the gzip output is written to stdout instead of creating files. You have effectively created one big file which contains several gzip-compressed files.
Now, you could look for the gzip file header/magic number, which is 1f8b and then reconstruct your files manually.
The sensible thing to do now is to create backups (if you haven't already). Backups always help (especially with problems such as yours). Create a backup of your directory.gz file now. Then read on.
Fortunately, there's an easier way than manually reconstructing all files: using binwalk, a forensics utility which can be used to extract files from within other files. I tried it with a test file, which was created the same way as yours. Running binwalk -e file.gz will create a folder with all extracted files. It even manages to reconstruct the original file names. The hierarchy of the directories is probably lost. But at least you have your file contents and their names back. Good luck!
Remember: backups are essential.
(For completeness' sake: What you probably intended to run: tar czf directory.tar.gz directory and then tar xf directory.tar.gz)
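The effect described in this answer is easy to reproduce on a throwaway directory. A minimal sketch, assuming a gzip that accepts -r together with -c as in the question; the directory and file names are demo values:

```shell
#!/usr/bin/env bash
# Shows why `gzip -c -r dir > directory.gz` cannot be turned back into a tree.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p some_directory/sub
printf 'first\n'  > some_directory/one.txt
printf 'second\n' > some_directory/sub/two.txt

# Each file becomes its own gzip member; the members are concatenated on stdout.
gzip -9 -c -r some_directory > directory.gz

# Decompressing yields one stream with every file's contents run together
# (in whatever order gzip visited the files); the directory layout is gone.
gzip -dc directory.gz
gzip -dc directory.gz | wc -c
```

The byte count equals the combined size of the two input files, which confirms that nothing marks where one file ends and the next begins in the decompressed output.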
gzip will compress one or more files, but it is not meant to function as an archive utility. The posted command line yields N compressed file images concatenated on stdout and redirected to the named output file; unfortunately, the directory structure is not recorded (each member's header keeps at most the file's base name). A pair like this should work:
(create)
tar -czvf dir.tar.gz <some-dir>
(extract)
tar -xzvf dir.tar.gz

How to decompress tensorflow hub module in the terminal?

I want to download, decompress, and use a pretrained model from tensorflow-hub
After downloading I end up with a 1.tar.tar file, which I probably need to extract / decompress in order to be able to use it.
I can't wrap my head around how, I am working in a Linux terminal.
If your tar file is compressed with gzip, use this command to extract it. Make sure you are in the directory containing the tar.tar file; everything will be extracted into the directory you are currently in.
$ tar xvzf 1.tar.tar
Where,
x: This option tells tar to extract the files.
v: The “v” stands for “verbose.” This option lists each file in the archive as it is extracted.
z: The z option is very important and tells the tar command to decompress the archive with gzip.
f: This option tells tar that you are going to give it a file name to work with.
Nice to know:
A tarball is a group or archive of files that are bundled together using the tar command and have the .tar file extension.
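The z flag only fits a gzip-compressed archive, and a name like 1.tar.tar does not say whether the file is compressed. One way to check is to look for the gzip magic number (bytes 1f 8b) at the start of the file. The sketch below builds its own stand-in file; the model directory and saved_model.txt are made-up names, and the else branch assumes that anything without the gzip signature is a plain tar.

```shell
#!/usr/bin/env bash
# Pick the extraction flags from the file's first two bytes instead of its name.
work=$(mktemp -d)
cd "$work" || exit 1

# Stand-in for the download: a gzip-compressed tar with a misleading name.
mkdir model
echo "weights" > model/saved_model.txt
tar -czf 1.tar.tar model
rm -r model

# gzip streams start with the bytes 1f 8b.
magic=$(head -c 2 1.tar.tar | od -An -tx1 | tr -d ' \t\n')
if [ "$magic" = "1f8b" ]; then
  tar -xzvf 1.tar.tar     # gzip-compressed: keep z
else
  tar -xvf 1.tar.tar      # assumed to be a plain tar: drop z
fi
```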

Use of temporary files and memory when using tar to backup very _large_ files with compression

When backing up one or more _very_large_ files using tar with compression (-j or -z) how does GNU tar manage the use of temporary files and memory?
Does it backup and compress the files block by block, file by file, or some other way?
Is there a difference between the way the following two commands use temporary files and memory?
tar -czf data.tar.gz ./data/*
tar -cf - ./data/* | gzip > data.tar.gz
Thank you.
No temporary files are used by either command. tar works completely in a streaming fashion. Packaging and compressing are completely separated from each other and also done in a piping mechanism when using the -z or -j option (or similar).
For each file tar puts into an archive, it computes a file info header which contains the file's path, its user, permissions, etc., and also its size. The size needs to be known up front (that's why putting the output of a stream into a tar archive isn't easy without using a temp file). After this header, the plain contents of the file follow. Since the size is known and already part of the preceding header, the end of the file is unambiguous, so the next file in the archive can follow directly. In this process no temporary files are needed for anything.
This stream of bytes is given to any of the implemented compression algorithms, which also do not create any temporary files. Here I'm going out on a limb a bit because I don't know all compression algorithms by heart, but none that I have ever come across create temporary files.
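A quick way to convince yourself that the two commands from the question are equivalent is to run both on a scratch directory and compare the results. This is only a sketch with made-up file names and sizes. It also checks one property of the format described above: the uncompressed tar stream consists of whole 512-byte blocks (headers plus padded file contents).

```shell
#!/usr/bin/env bash
# Compare tar's built-in gzip filter with an explicit pipe to gzip.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir data
head -c 1048576 /dev/zero > data/big.bin   # 1 MiB demo file
echo "small" > data/small.txt

tar -czf builtin.tar.gz ./data/*
tar -cf - ./data/* | gzip > piped.tar.gz

# Both archives list the same members.
builtin_list=$(tar -tzf builtin.tar.gz)
piped_list=$(tar -tzf piped.tar.gz)
[ "$builtin_list" = "$piped_list" ] && echo "same members"

# The uncompressed tar stream is a whole number of 512-byte blocks.
tar_bytes=$(gzip -dc piped.tar.gz | wc -c | tr -d ' ')
echo "tar stream: $tar_bytes bytes, remainder mod 512: $((tar_bytes % 512))"
```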

Use tar to archive the contents of your home directory in linux

I am trying to archive the contents of my home directory using tar and then compress the tar file with gzip. I know you can uncompress and unarchive the .tar.gz file using cat, tar and gzip, but I don't know how to compress and archive.
Here is a link to a full guide that answers your question:
https://www.howtogeek.com/248780/how-to-compress-and-extract-files-using-the-tar-command-on-linux/
tar -czvf name-of-archive.tar.gz /path/to/directory-or-file
Here’s what those switches actually mean:
-c: Create an archive.
-z: Compress the archive with gzip.
-v: Display progress in the terminal while creating the archive, also known as “verbose” mode. The v is always optional in these commands, but it’s helpful.
-f: Allows you to specify the filename of the archive.
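Applied to the home directory case, the command might look like the sketch below. To keep the demo small and safe, src_dir is a scratch directory standing in for "$HOME", and out_dir and home-backup.tar.gz are made-up names. Two choices here are assumptions about what is wanted: the archive is written outside the directory being archived, and -C with . is used so that dotfiles are included and member names are relative.

```shell
#!/usr/bin/env bash
# src_dir stands in for "$HOME" so the demo stays small; swap it in for real use.
src_dir=$(mktemp -d)
out_dir=$(mktemp -d)          # keep the archive outside the directory being archived
echo "set -o vi" > "$src_dir/.bashrc"
mkdir "$src_dir/docs"
echo "todo" > "$src_dir/docs/notes.txt"

# -C changes into src_dir first, so member names are relative (./docs/notes.txt).
# Archiving "." (rather than "*") also picks up dotfiles such as .bashrc.
tar -czf "$out_dir/home-backup.tar.gz" -C "$src_dir" .

# List the archive without extracting it.
tar -tzf "$out_dir/home-backup.tar.gz"
```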

Resources