How to untar `.tgz` directory, and gzip one of the extracted files in memory? - linux

TL;DR
How can I untar a file .tgz, and then selectively gzip the output?
My extracted dir has a few text files and a .nii file. I'd like to gzip the latter.
More details
The first method would be to just do it sequentially. However, I'm dealing with a huge dataset (10k+ tar archives) stored on a BeeGFS file system, and I was told it would be better to do it in memory rather than in two steps, since BeeGFS doesn't handle big directories like this well.
Sequential method:
for tarfile in "${rootdir}"/*.tgz; do
    tarpath="${tarfile%.tgz}"
    tar zxvf "${tarfile}"     # (1) untar the archive
    gzip "${tarpath}"/*.nii   # (2) gzip the .nii file
done
Is there a way to combine (1) and (2)? Or do you have any other tips on how to do this process effectively?
Thanks!

You can extract a single file from the archive (if you know the filename) and have tar write it to standard output instead of to a file with -O, then compress that stream and redirect it to a file. Something like:
tar xzOf "$tarfile" "$tarpath/foo.nii" | gzip -c > "$tarpath/foo.nii.gz"
You can then extract everything else in the archive with tar xzf "$tarfile" --exclude "*.nii"
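Putting that together with the loop from the question: a sketch that streams each .nii member straight into gzip, so the uncompressed copy never touches the file system. The archive layout and file names ("sample", "scan.nii", "readme.txt") are made up for the demo:

```shell
#!/bin/bash
set -e
# Demo fixture: build one small .tgz shaped like the dataset archives
rootdir=$(mktemp -d)
mkdir "$rootdir/sample"
printf 'fake nifti data' > "$rootdir/sample/scan.nii"
printf 'notes'           > "$rootdir/sample/readme.txt"
tar czf "$rootdir/sample.tgz" -C "$rootdir" sample
rm -r "$rootdir/sample"

cd "$rootdir"
for tarfile in *.tgz; do
    # locate the .nii member's path inside the archive
    nii=$(tar tzf "$tarfile" | grep '\.nii$' | head -n 1)
    mkdir -p "$(dirname "$nii")"
    # (1)+(2) combined: stream the member through gzip, no intermediate file
    tar xzOf "$tarfile" "$nii" | gzip > "$nii.gz"
    # extract everything except the .nii files
    tar xzf "$tarfile" --exclude='*.nii'
done
```

Listing the members first (tar tzf) costs one extra pass over each archive, but avoids hard-coding the .nii filename.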

Related

GZip an entire directory

I used the following:
gzip -9 -c -r <some_directory> > directory.gz
How do I decompress this directory?
I have tried
gunzip directory.gz
but I am just left with a single file and not a directory structure.
As others have already mentioned, gzip is a file compression tool and not an archival tool. It cannot work with directories. When you run it with -r, it will find all files in a directory hierarchy and compress them, i.e. replacing path/to/file with path/to/file.gz. When you pass -c the gzip output is written to stdout instead of creating files. You have effectively created one big file which contains several gzip-compressed files.
Now, you could look for the gzip file header/magic number, which is 1f8b and then reconstruct your files manually.
The sensible thing to do now is to create backups (if you haven't already). Backups always help (especially with problems such as yours). Create a backup of your directory.gz file now. Then read on.
Fortunately, there's an easier way than manually reconstructing all files: using binwalk, a forensics utility which can be used to extract files from within other files. I tried it with a test file, which was created the same way as yours. Running binwalk -e file.gz will create a folder with all extracted files. It even manages to reconstruct the original file names. The hierarchy of the directories is probably lost. But at least you have your file contents and their names back. Good luck!
Remember: backups are essential.
(For completeness' sake: What you probably intended to run: tar czf directory.tar.gz directory and then tar xf directory.tar.gz)
gzip will compress one or more files, but it is not meant to function as an archive utility. The posted command line yields N compressed file images concatenated to stdout and redirected to the named output file; unfortunately, things like filenames and directory structure are not recorded. A pair of commands like this should work instead:
(create)
tar -czvf dir.tar.gz <some-dir>
(extract)
tar -xzvf dir.tar.gz

Compress a dir into same-size split files, decompress files individually?

I need to transfer a 1G folder over the network, and the data formats I send and receive are all under my control. In order to speed up the reception of data, I do this now:
Before transferring, compress the 1G folder and then transfer it.
After downloading everything, decompress it.
This saves some time because the amount of data transferred becomes smaller, but it also takes time to decompress. Is it possible to compress a folder into many files of the same size, download one file and decompress one file, so that when all files are decompressed, the result is the initial folder? My questions are:
Can this be achieved?
How can I decompress a file while downloading it?
How can I reduce the download and decompression time?
Suppose the original folder is in the HOME directory. Make a tar archive of the 1G folder using commands like:
cd;
mkdir tmp;
tar -cvzf tmp/original-folder.tar.gz original-folder
Split the tar archive into small files (xaa, xab, ...) using the split command, e.g.,
split --bytes=100000000 tmp/original-folder.tar.gz
Join the pieces at the destination with the cat command,
cat xaa xab xac [...] > transferred-folder.tar.gz
Untar to get the folder
tar -xvzf transferred-folder.tar.gz
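On the "decompress while downloading" part: since both ends are under your control, you can skip the intermediate file entirely and let the receiver untar the stream as it arrives. Over ssh that would look like tar czf - original-folder | ssh user@dest 'tar xzf -' (host names hypothetical). A local sketch with a pipe standing in for the network:

```shell
#!/bin/bash
set -e
work=$(mktemp -d); cd "$work"
mkdir -p original-folder/sub
echo 'payload' > original-folder/sub/file.txt
mkdir receive
# the pipe plays the role of the network link: the receiver decompresses
# and unpacks while the sender is still producing data
tar czf - original-folder | (cd receive && tar xzf -)
```

This overlaps transfer and decompression, which is usually faster than the three separate passes of compress, copy, then decompress.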

Use tar to archive the contents of your home directory in linux

I am trying to archive the contents of my home directory using tar and then compress the tar file with gzip. I know you can uncompress and unarchive the .tar.gz file using cat, tar, and gzip, but I don't know how to compress and archive.
Here is a full guide for your question:
https://www.howtogeek.com/248780/how-to-compress-and-extract-files-using-the-tar-command-on-linux/
tar -czvf name-of-archive.tar.gz /path/to/directory-or-file
Here’s what those switches actually mean:
-c: Create an archive.
-z: Compress the archive with gzip.
-v: Display progress in the terminal while creating the archive, also known as “verbose” mode. The v is always optional in these commands, but it’s helpful.
-f: Allows you to specify the filename of the archive.
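Applied to the actual question (archiving a home directory), -C lets tar enter the directory first so members are stored with relative paths. A round-trip sketch using a throwaway directory as a stand-in for $HOME:

```shell
#!/bin/bash
set -e
demo=$(mktemp -d)                      # stand-in for $HOME
mkdir -p "$demo/home/docs"
echo 'hello' > "$demo/home/docs/note.txt"
# -C enters the directory first, so members are stored with relative paths
tar -czvf "$demo/home-backup.tar.gz" -C "$demo/home" .
# verify the round trip by extracting into a fresh directory
mkdir "$demo/restore"
tar -xzf "$demo/home-backup.tar.gz" -C "$demo/restore"
```

Storing relative paths means the backup can be restored into any directory, not only the original absolute location.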

extract good files from corrupt tar archive

I have a large tar archive with many XML files in it. A couple of XML files in this archive are corrupt. How can I extract the good files without the program exiting?
There is a tar file within the gz
tar zxf myFile.gz
tar: Unexpected EOF in archive
tar: rmtlseek not stopped at a record boundary
tar: Error is not recoverable: exiting now
It looks, from the filename, as though you're trying to unpack something that isn't a tar archive. Usually a tar file would have a .tar extension, and if it had been then compressed with gzip, it would be .tar.gz or .tgz.
The command you're running, with the z option to tar, tries to undo gzip compression first, and then untar the resulting archive. But from the .gz extension, it rather looks as though you've got a gzipped file rather than a gzipped tar archive.
The best thing to do is to examine the file to find out what sort of file it is:
file myFile.gz
That will tell you whether it's gzipped or whatever. If it's gzipped, then run
gunzip myFile.gz
That will leave you with myFile without the extension; you can then use
file myFile
to probe it to determine whether it's a tar archive or something else.
mv myFile.gz myFile.tar.gz
gunzip myFile.tar.gz
tar xf myFile.tar
You might try bzip2recover to try to recover the block boundaries first (note this only applies if the file is actually bzip2-compressed):
bzip2recover file.bzip
E.g., from the gzip manual:
In case of damage to one member of a .gz file, other members can still be recovered (if the damaged member is removed).
file: Corrupt input. Use zcat to recover some data.
usage: zcat file > recover
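For the original tar-level question, GNU tar also has --ignore-zeros (-i): it keeps scanning past zeroed or premature end-of-archive blocks instead of exiting, which can recover members that sit after a damaged stretch. A sketch using two concatenated archives, where plain tar x would stop after the first one:

```shell
#!/bin/bash
set -e
d=$(mktemp -d); cd "$d"
echo 'one' > a.xml
echo 'two' > b.xml
tar cf part1.tar a.xml
tar cf part2.tar b.xml
# zeroed end-of-archive padding now sits in the middle of joined.tar
cat part1.tar part2.tar > joined.tar
mkdir out
# without -i, tar stops at part1's end-of-archive marker; with it, it scans on
tar xf joined.tar -C out --ignore-zeros
```

This won't repair truly mangled blocks, but it lets tar keep going and pick up the intact members that follow them.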

Why can't a directory be compressed with gzip, bzip, bzip2, xz?

Is there any possible way to compress a directory in gzip, bzip, bzip2, or xz format? I'm building a command-line tool (using bash) in which I need these options to be included.
A command like
tar czf output.tar.gz yourdir/
should work.
c means that tar will create an archive
z means that the output will be compressed (using gzip)
the output filename is after f
at the end, you can specify any number of directories/files (space-separated)
To answer the "why" part of your question, it is because of the Unix philosophy of having many small tools that do their job well that you can string together, as opposed to one big tool that doesn't do anything well and is hard to make better. Your examples are a perfect illustration of this philosophy, where you have several compression tools to choose from, and it is easy to add a new compression tool to your tool box. The archiving part, turning a directory of files into a byte stream, is a different task that is its own tool that can be combined with any of those or any future compression tools.
The body of your question then asks "how". You use a pipe with tar, cpio, or pax. tar is the most common. You then name the file accordingly so the consumer of the file can tell what it is from the name. E.g. ending with .tar.gz. Like this:
tar cf - somedirectory | gzip > somedirectory.tar.gz
or
tar cf - somedirectory | xz > somedirectory.tar.xz
These tar up the directory into a byte stream, which is then piped to a compressor. The output of the compressor is then written to the file containing the compressed directory contents.
To decompress:
gzip -dc somedirectory.tar.gz | tar xf -
Here it is done in the reverse order to first decompress the file and feed the output of that to tar to extract the files and recreate the directory structure. The - means to put the archive to stdout or get the archive from stdin.
Having said all that stuff about how much better it is to have small tools that do their job well, this application of tar is so incredibly common that it is built into the tar options. So you can instead:
tar czf somedirectory.tar.gz somedirectory
tar cJf somedirectory.tar.xz somedirectory
tar will run the gzip or xz executables and pipe the data through them itself.
(J is a relatively recent GNU tar addition, so your tar may not have it.)
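GNU tar can also pick the compressor from the output file's extension with -a (--auto-compress), which saves you from remembering z/j/J:

```shell
#!/bin/bash
set -e
d=$(mktemp -d); cd "$d"
mkdir somedirectory
echo 'x' > somedirectory/file
# -a chooses gzip here because the name ends in .tar.gz
# (.tar.bz2, .tar.xz, .tar.zst work the same if the compressor is installed)
tar caf somedirectory.tar.gz somedirectory
gzip -t somedirectory.tar.gz   # confirms the output really is gzip data
```

The matching extraction side, tar xaf (or just tar xf in modern GNU tar), likewise detects the compression automatically.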
