Compress a directory into same-size split files and decompress the files individually? - Linux

I need to transfer a 1 GB folder over the network, and the data formats I send and receive are entirely under my control. To speed up receiving the data, here is what I do now:
Before transferring, compress the 1 GB folder, then transfer it.
After downloading everything, decompress it.
This saves some time because the amount of data transferred is smaller, but decompression also takes time. Is it possible to compress a folder into many files of the same size, download one file and decompress that one file, so that once all files are decompressed the original folder is restored? My questions are:
Can this be achieved?
How can I decompress a file while it is still downloading?
How can I reduce the combined download and decompression time?

Suppose the original folder is in the HOME directory. Make a tar archive of the 1G folder using commands like,
cd;
mkdir tmp;
tar -cvzf tmp/original-folder.tar.gz original-folder
Split the tar archive into small files like, xaa, xab, ..., using the split command, e.g.,
split --bytes=100000000 tmp/original-folder.tar.gz
Join the pieces at the destination with the cat command,
cat xaa xab xac [...] > transferred-folder.tar.gz
Untar to get the folder
tar -xvzf transferred-folder.tar.gz
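If the goal is to start decompressing before everything has been reassembled into one archive, the join and untar steps can also be combined into a single streaming pipeline. This is only a sketch, relying on the default x* names produced by split above:

# Join the split pieces and untar them in one streaming step,
# so no intermediate transferred-folder.tar.gz is written to disk.
cat x* | tar -xvzf -

And if you happen to have direct ssh access to the source host (an assumption, not part of the question), the same idea lets the receiver decompress while the data is still arriving, with no split/cat round at all:

# Stream the archive over ssh and extract it on the fly at the destination.
ssh user@source-host 'tar -czf - original-folder' | tar -xzf -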

Related

How to untar a `.tgz` archive, and gzip one of the extracted files in memory?

TL;DR
How can I untar a .tgz file, and then selectively gzip part of the output?
My extracted dir has a few text files and a .nii file. I'd like to gzip the latter.
More details
The first method would be to just do it sequentially. However, I'm dealing with a huge dataset (10k+ tar archives) stored on a BeeGFS file system, and I was told it would be better to do it in memory instead of in two steps, since BeeGFS doesn't like handling big directories like this.
Sequential method:
for tarfile in "${rootdir}"/*.tgz; do
    tarpath="${tarfile%.tgz}"
    tar zxvf "${tarfile}"    # (1) untar directory
    gzip "${tarpath}"/*.nii  # (2) gzip the .nii file
done
Is there a way to combine (1) and (2)? Or do you have any other tips on how to do this process effectively?
Thanks!
You can extract a single file from the archive (if you know the filename) and have tar write it to standard output instead of to a file with -O, and then compress that stream and redirect it to a file. Something like
tar xzOf "$tarfile" "$tarpath/foo.nii" | gzip -c > "$tarpath/foo.nii.gz"
You can then extract everything else in the archive with tar xzf "$tarfile" --exclude "*.nii"
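Putting the two commands together, one possible shape for the whole loop is below. This is a sketch only: it assumes exactly one .nii file per archive, and it discovers that file's name with tar tzf, which is not in the snippets above.

for tarfile in "${rootdir}"/*.tgz; do
    # Find the .nii member's path inside the archive (assumes one per archive).
    niipath=$(tar tzf "$tarfile" | grep '\.nii$' | head -n 1)
    # Extract everything except the .nii file; this also creates the directory on disk.
    tar xzf "$tarfile" --exclude '*.nii'
    # Stream the .nii member straight into gzip without ever writing it uncompressed.
    tar xzOf "$tarfile" "$niipath" | gzip -c > "${niipath}.gz"
done

Note that each archive is read more than once here, so whether this is actually faster on BeeGFS than the sequential version is worth measuring.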

GZip an entire directory

I used the following:
gzip -9 -c -r <some_directory> > directory.gz
How do I decompress this directory?
I have tried
gunzip directory.gz
I am just left with a single file and not a directory structure.
As others have already mentioned, gzip is a file compression tool and not an archival tool. It cannot work with directories. When you run it with -r, it will find all files in a directory hierarchy and compress them, i.e. replacing path/to/file with path/to/file.gz. When you pass -c the gzip output is written to stdout instead of creating files. You have effectively created one big file which contains several gzip-compressed files.
Now, you could look for the gzip file header/magic number, which is 1f8b and then reconstruct your files manually.
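As a sketch of what searching for that magic number could look like in practice (assuming GNU grep and bash; the byte sequence can also occur by chance inside compressed data, so expect false positives):

# Print byte offsets of candidate gzip member headers (1f 8b 08) in the blob;
# the offsets could then be fed to dd to carve out individual members.
grep -abo $'\x1f\x8b\x08' directory.gz | cut -d: -f1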
The sensible thing to do now is to create backups (if you haven't already). Backups always help (especially with problems such as yours). Create a backup of your directory.gz file now. Then read on.
Fortunately, there's an easier way than manually reconstructing all files: using binwalk, a forensics utility which can be used to extract files from within other files. I tried it with a test file, which was created the same way as yours. Running binwalk -e file.gz will create a folder with all extracted files. It even manages to reconstruct the original file names. The hierarchy of the directories is probably lost. But at least you have your file contents and their names back. Good luck!
Remember: backups are essential.
(For completeness' sake: What you probably intended to run: tar czf directory.tar.gz directory and then tar xf directory.tar.gz)
gzip will compress one or more files, but it is not meant to function as an archive utility. The posted command line yields N compressed file images concatenated to stdout, redirected to the named output file; unfortunately, information such as file names and directory structure is not recorded. A pair of commands like this should work:
(create)
tar -czvf dir.tar.gz <some-dir>
(extract)
tar -xzvf dir.tar.gz

Use of temporary files and memory when using tar to backup very _large_ files with compression

When backing up one or more _very_large_ files using tar with compression (-j or -z), how does GNU tar manage the use of temporary files and memory?
Does it backup and compress the files block by block, file by file, or some other way?
Is there a difference between the way the following two commands use temporary files and memory?
tar -czf data.tar.gz ./data/*
tar -cf - ./data/* | gzip > data.tar.gz
Thank you.
No temporary files are used by either command. tar works entirely in a streaming fashion. Packaging and compressing are completely separated from each other, and when the -z or -j option (or similar) is used, the compression is done through a pipe as well.
For each file tar puts into an archive, it writes a file info header which contains the file's path, its owner, permissions, etc., and also its size. The size needs to be known up front (which is why putting the output of a stream into a tar archive isn't easy without using a temp file). After this header, the plain contents of the file follow. Since the size is known and already part of the preceding header, the end of the file is unambiguous, so the next file in the archive can follow directly. No temporary files are needed anywhere in this process.
This stream of bytes is handed to whichever compression program is selected, and those do not create temporary files either. Here I'm going out on a limb a bit, because I don't know every compression tool by heart, but all of the ones I have come across do not create temporary files.
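If you want to check this empirically rather than take it on faith, one sketch (assuming GNU tar and strace are available) is to trace file-creating system calls and confirm that nothing besides the archive itself gets created:

# Follow child processes (-f) so the gzip that tar spawns for -z is traced too;
# after filtering out the archive itself, no created files should remain.
strace -f -e trace=openat tar -czf data.tar.gz ./data 2>&1 \
  | grep O_CREAT | grep -v 'data\.tar\.gz'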

How to get the list of files (ls command) from bz2 archive?

What is the Unix bash command to get the list of files (like ls) from an archive file of type .bz2 (without unpacking the archive)?
First, bzip2, gzip, etc. compress only one file, so you probably have a compressed tar file. To list the files you need a command like:
tar tjvf file.bz2
This command decompresses the archive on the fly and lists the contents of the tar file.
Note that bzip2 compresses each file individually, and a simple .bz2 file always contains a single file of the same name with the ".bz2" part stripped off. When using bzip2 to compress a file, there is no option to specify a different name; the original name is used and .bz2 is appended. So there is no list of files, only one file. If that file is a tar archive, it can contain many files, and the whole contents of the .tar.bz2 file can be listed with "tar tf file.tar.bz2" without unpacking the archive.
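The same listing can also be written as an explicit pipeline, which makes it visible that only the compressed stream is read and nothing is unpacked to disk:

# Decompress to stdout and let tar list (-t) the member names from stdin (-f -).
bzcat file.tar.bz2 | tar -tf -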

Fast directory conversion to file

I have a directory D that contains multiple files and folders and consumes a very large amount of disk space. Although I do not care much about disk space consumption, I want to convert D into a single file as fast as possible. The first approach that came to mind was to use a compression tool, but it takes too long to finish.
Is there a faster way?
Thank you for your help.
You can use the tar command with no compression.
With tar -cf you can convert your folder into a single file, with no compression step:
tar -cf your_big_folder.tar /path/to/your/big/folder
and finally you can convert it back to a folder with
tar -xf your_big_folder.tar
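If you want to see what the compression step actually costs on your data before deciding, a quick comparison sketch (paths are placeholders):

# Archive only versus archive + gzip; on data that is already compressed
# or compresses poorly, the second command is typically much slower.
time tar -cf  /tmp/D.tar    /path/to/D
time tar -czf /tmp/D.tar.gz /path/to/D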
