GZip an entire directory - linux

I used the following:
gzip -9 -c -r <some_directory> > directory.gz
How do I decompress this directory?
I have tried
gunzip directory.gz
but I am just left with a single file and not a directory structure.

As others have already mentioned, gzip is a file compression tool, not an archival tool, so it cannot work with directories. When you run it with -r, it finds all files in a directory hierarchy and compresses each of them, i.e. replacing path/to/file with path/to/file.gz. When you pass -c, the gzip output is written to stdout instead of creating files. You have effectively created one big file which contains several concatenated gzip-compressed files.
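To make that concrete, here is a minimal sketch reproducing the situation (the demo/ names are hypothetical):
# Two small files become one stream of concatenated gzip members.
# gunzip decompresses every member, but into a single output,
# with the file names and directory hierarchy gone.
mkdir -p demo/sub
echo one > demo/a.txt
echo two > demo/sub/b.txt
gzip -9 -c -r demo > demo.gz
gunzip -c demo.gz    # prints "one" then "two", back to back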
Now, you could look for the gzip file header/magic number, which is 1f 8b, and then reconstruct your files manually.
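For instance, here is a rough way to locate the member boundaries, assuming GNU grep (which can search binary data and report byte offsets):
# Print the byte offset of each gzip member header: 1f 8b is the
# magic number, 08 the deflate compression-method byte.
grep -abo $'\x1f\x8b\x08' directory.gz | cut -d: -f1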
The sensible thing to do now is to create backups (if you haven't already). Backups always help (especially with problems such as yours). Create a backup of your directory.gz file now. Then read on.
Fortunately, there's an easier way than manually reconstructing all files: using binwalk, a forensics utility which can be used to extract files from within other files. I tried it with a test file, which was created the same way as yours. Running binwalk -e file.gz will create a folder with all extracted files. It even manages to reconstruct the original file names. The hierarchy of the directories is probably lost. But at least you have your file contents and their names back. Good luck!
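In practice that looks something like this (the extraction directory name is binwalk's default; your version may differ):
cp directory.gz directory.gz.bak    # back up first, as advised above
binwalk -e directory.gz             # extracted files land in _directory.gz.extracted/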
Remember: backups are essential.
(For completeness' sake: What you probably intended to run: tar czf directory.tar.gz directory and then tar xf directory.tar.gz)

gzip will compress one or more files, but it is not meant to function as an archive utility. The posted command line yields N compressed file images concatenated to stdout, redirected into the named output file; unfortunately, things like file names and directories are not recorded. A pair of commands like this should work instead:
(create)
tar -czvf dir.tar.gz <some-dir>
(extract)
tar -xzvf dir.tar.gz


Bash Scripting with xargs to BACK UP files

I need to copy a file from multiple locations to a backup directory while retaining its directory structure. For example, I have a file "a.txt" at the following locations: /a/b/a.txt, /a/c/a.txt, /a/d/a.txt, /a/e/a.txt. I now need to copy this file from all of these locations to the backup directory /tmp/backup. The end result should be:
when I list /tmp/backup/a, it should contain b/a.txt, c/a.txt, d/a.txt, and e/a.txt.
For this, I used the command: echo /a/*/a.txt | xargs -I {} -n 1 sudo cp --parent -vp {} /tmp/backup. This throws the error "cp: cannot stat '/a/b/a.txt /a/c/a.txt a/d/a.txt a/e/a.txt': No such file or directory".
The -I option is taking the complete input from echo as a single value instead of individual values (the way -n 1 would). I would appreciate help debugging this issue rather than an alternative command.
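Addressing the debugging request directly: echo emits all four paths on a single line, and -I {} consumes whole lines (GNU xargs ignores -n when -I is given), so cp receives one bogus argument. A hedged sketch of a fix, assuming GNU xargs and coreutils:
# One path per line makes -I substitute each path individually.
# (GNU cp spells the option --parents; --parent only works as an
# unambiguous abbreviation.)
printf '%s\n' /a/*/a.txt | xargs -I {} sudo cp --parents -vp {} /tmp/backup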
Use rsync with the --relative (-R) option to keep (parts of) the source paths.
I've used a wildcard for the source to match your example command rather than the explicit list of directories mentioned in your question.
rsync -avR /a/*/a.txt /tmp/backup/
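If you would rather have the copies start below /a, rsync's "/./" marker tells --relative where the preserved part of the path begins; a small variation on the same command:
# Everything left of /./ is stripped from the destination path,
# giving /tmp/backup/b/a.txt, /tmp/backup/c/a.txt, and so on.
rsync -avR /a/./*/a.txt /tmp/backup/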
Do the backups need to be exactly the same as the originals? In most cases, I'd prefer a little compression. [tar](https://man7.org/linux/man-pages/man1/tar.1.html) does a great job of bundling things including the directory structure.
tar cvzf /path/to/backup/tarball.tgz /source/path/
tar can't update compressed archives, so you can skip the compression:
tar uf /path/to/backup/tarball.tar /source/path/
This gives you versioning of a sort, since it only appends files that have changed, but it keeps both the before and after versions.
If you have time and cycles and still want the compression, you can decompress before and recompress after.
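A sketch of that round trip, using the tarball paths from above:
gunzip /path/to/backup/tarball.tgz                  # gunzip knows .tgz and leaves tarball.tar
tar uf /path/to/backup/tarball.tar /source/path/    # update while uncompressed
gzip /path/to/backup/tarball.tar                    # recompress; the result is named tarball.tar.gz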

Use of temporary files and memory when using tar to back up very _large_ files with compression

When backing up one or more _very large_ files using tar with compression (-j or -z), how does GNU tar manage the use of temporary files and memory?
Does it backup and compress the files block by block, file by file, or some other way?
Is there a difference between the way the following two commands use temporary files and memory?
tar -czf data.tar.gz ./data/*
tar -cf - ./data/* | gzip > data.tar.gz
Thank you.
No temporary files are used by either command. tar works in a completely streaming fashion: packaging and compression are separate from each other, and they are connected by a pipe when you use the -z or -j option (or similar).
For each file tar puts into an archive, it first writes a header that contains the file's path, its owner, permissions, etc., and also its size. The size needs to be known up front (which is why putting the output of a stream into a tar archive isn't easy without a temporary file). After this header, the plain contents of the file follow. Since the size is already known from the header, the end of the file is unambiguous, so the next file in the archive can follow directly. No temporary files are needed anywhere in this process.
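You can inspect such a header directly; in the classic ustar layout the name occupies the first 100 bytes and the size is stored as octal text starting at byte offset 124. A quick sketch, assuming some file ./data/somefile exists:
# Dump the first 512-byte header block of a fresh archive.
tar cf sample.tar ./data/somefile
dd if=sample.tar bs=512 count=1 2>/dev/null | od -c | head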
The resulting stream of archive bytes is handed to whichever of the supported compression programs you picked, and these do not create temporary files either. Here I'm going out on a limb a bit, because I don't know every compression algorithm by heart, but all the ones I have ever come in touch with do not create temporary files.
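One way to convince yourself of the streaming behaviour: the archive can be piped straight through a compressor into a checksum without any intermediate file ever being written (./data is a placeholder for your directory):
# tar streams to gzip, gzip streams to sha256sum; nothing in this
# pipeline writes an intermediate file.
tar -cf - ./data | gzip | sha256sum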

Why can't a directory be compressed with gzip, bzip, bzip2, or xz?

Is there any possible way to compress a directory in GZip, BZip, BZip2, or xz format? I'm building a command-line tool (using bash) in which I need these options to be included.
A command like
tar czf output.tar.gz yourdir/
should work.
c means that tar will create an archive
z means that the output will be compressed (using gzip)
the output filename is after f
at the end, you can specify any number of directories/files (space-separated)
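For instance (with hypothetical names), several directories and files can go into one archive:
tar czf output.tar.gz yourdir/ otherdir/ notes.txt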
To answer the "why" part of your question, it is because of the Unix philosophy of having many small tools that do their job well that you can string together, as opposed to one big tool that doesn't do anything well and is hard to make better. Your examples are a perfect illustration of this philosophy, where you have several compression tools to choose from, and it is easy to add a new compression tool to your tool box. The archiving part, turning a directory of files into a byte stream, is a different task that is its own tool that can be combined with any of those or any future compression tools.
The body of your question then asks "how". You use a pipe with tar, cpio, or pax. tar is the most common. You then name the file accordingly so the consumer of the file can tell what it is from the name. E.g. ending with .tar.gz. Like this:
tar cf - somedirectory | gzip > somedirectory.tar.gz
or
tar cf - somedirectory | xz > somedirectory.tar.xz
These tar up the directory into a byte stream, which is then piped to a compressor. The output of the compressor is then written to the file containing the compressed directory contents.
To decompress:
gzip -dc somedirectory.tar.gz | tar xf -
Here it is done in the reverse order: first decompress the file, then feed the output to tar, which extracts the files and recreates the directory structure. The - means to write the archive to stdout or to read it from stdin.
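The same pattern works for the xz archive created above:
xz -dc somedirectory.tar.xz | tar xf -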
Having said all that stuff about how much better it is to have small tools that do their job well, this application of tar is so incredibly common that it is built into the tar options. So you can instead:
tar czf somedirectory.tar.gz somedirectory
tar cJf somedirectory.tar.xz somedirectory
tar will run the gzip or xz executables and pipe the data through them itself.
(J is a relatively recent GNU tar addition, so your tar may not have it.)

Compress files in a directory into a zip file with the shell

I want to compress files from the filesystem to a directory within a new zip archive or update an old one. So here is my example:
directory/
|-file1.ext
|-file2.ext
|-file3.ext
in the zip archive it should look like this:
new_directory/
|-file1.ext
|-file2.ext
|-file3.ext
I could copy the files to a new directory and compress them from there, but I do not want that extra step. I haven't found an answer to this problem on Google, and the man page doesn't mention anything like it either, so I hope somebody can help me.
I don't think the zip utility supports this sort of transformation. A workaround is to use a symbolic link (zip follows symlinks by default, so the files are stored under the link's name):
ln -s directory new_directory
zip -r foo.zip new_directory
rm new_directory
If other archive formats are an option for you, then this would be a bit easier with a tar archive, since GNU tar has a --transform option taking a sed command that it applies to the file name before adding the file to the archive.
In that case, you could use (for example)
tar czf foo.tar.gz --transform 's:^directory/:new_directory/:' directory
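Either way, you can confirm the stored names by listing the archive:
tar tzf foo.tar.gz | head    # entries should begin with new_directory/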

How to update tar (NOT append)

I want to update an existing tar file with newer files.
At GNU, I read:
4.2.3 Updating an Archive
In the previous section, you learned how to use ‘--append’ to add a
file to an existing archive. A related operation is ‘--update’ (‘-u’).
The ‘--update’ operation updates a tar archive by comparing the date
of the specified archive members against the date of the file with the
same name. If the file has been modified more recently than the
archive member, then the newer version of the file is added to the
archive (as with ‘--append’).
However,
When I run my tar update command, the files are appended even though their modification dates are exactly the same. I want to ONLY append files whose modification dates are newer than those already in the tar...
tar -uf ./tarfile.tar /localdirectory/ >/dev/null 2>&1
Currently, every time I update, the tar doubles in size...
The update you describe implies that the file within the archive is replaced. If the new copy were smaller than what's in the archive, it could be rewritten in place. If the new copy is larger, however, tar would have to zero out the existing archive entry and append. Such updates would leave runs of '\0's or other unused bytes, so any normal computer user would want those sections removed, which would mean "moving up" the bytes of the remaining archive contents towards the start of the file (think C's memmove).
Such an in-place move operation, however, would involve seek-read-seek-write cycles, which is costly, especially in the context of tapes (the devices tar was originally designed for), whose seek performance is not comparable to that of hard disks. You'd wear out the tape rather quickly with such move operations. And of course, WORM devices don't support this move operation either.
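You can watch this append-only behaviour directly (a small sketch with a throwaway file):
# After touching the file, -u considers it newer and appends a
# second copy; the listing then shows the member twice.
echo hi > file
tar -cf t.tar file
sleep 1 && touch file
tar -uf t.tar file
tar -tvf t.tar    # "file" appears twice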
If you do not want to use the -P switch, tar -u works correctly when the current directory is the parent of the directory you are going to update and the path passed to tar is not absolute.
For example:
Say we want to update the directory /home/blabla/Dir. We do it like this:
cd /home/blabla
tar -u -f tarfile.tar Dir
In general, the update must be made from the same place as the creation, so that the stored paths agree.
It is also possible to do:
cd /home/blabla/Dir
tar -u -f /path/to/tarfile.tar .
You may simply create (instead of update) the archive each time:
tar -cvpf tarfile.tar *
This will solve the problem of your archive doubling in size each time, but of course it regenerates the whole archive every time.
By default tar strips the leading / from member names, but it does this after deciding what needs to be updated.
Therefore if you are archiving an absolute path, you either need to cd / and use relative paths, or add the -P/--absolute-names option.
cd /
tar -uf "$OLDPWD/tarfile.tar" localdirectory/ >/dev/null 2>&1
tar -cPf tarfile.tar /localdirectory/ >/dev/null 2>&1
tar -uPf tarfile.tar /localdirectory/ >/dev/null 2>&1
However, the updated items will still be appended. A tar (tape archive) file cannot be modified except by appending.
Warning! When speaking about "dates", this means any date, and that includes the access time.
Should your files have been accessed in any such way (a simple ls -l is enough), then tar is right to do what it does!
You need to find another way to do what you want, probably using a sentinel file and checking whether its modification date is less than that of the files you wish to append.
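A minimal sketch of that sentinel idea (the sentinel path is hypothetical; create it once with touch before the first run):
# Append only files modified since the last run, then refresh the
# sentinel so the next run starts from now.
find /localdirectory -type f -newer /var/tmp/backup.sentinel \
    -exec tar -rf tarfile.tar {} +
touch /var/tmp/backup.sentinel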
