Fast directory conversion to file - Linux

I have a directory D that contains multiple files and folders and consumes a very large amount of disk space. Although I do not care much about disk space consumption, I want to convert D into a single file as fast as possible. The first approach that came to mind was to use a compression tool, but it is taking too long to finish.
Is there a faster way?
Thank you for your help.

You can use the tar command with no compression.
With tar -cf you can pack your folder into a single file without any compression step:
tar -cf your_big_folder.tar /path/to/your/big/folder
and finally you can convert it back into a folder with:
tar -xf your_big_folder.tar
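If you want the contents to land somewhere other than the current directory when unpacking, GNU tar's -C option selects the destination, e.g. (the destination path here is just a placeholder):
mkdir -p /some/destination
tar -xf your_big_folder.tar -C /some/destination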

Related

tar.xz - how to check uncompressed file sizes without decompressing the whole archive

I have a problem checking the uncompressed size of a tar.xz archive without extracting the whole archive.
I know that for tar.gz I can use gzip or zcat, but for tar.xz it doesn't work.
Any suggestion how to do this?
tar tvfa <file> will give you a list including file sizes. Check man tar for details. You should also note that file size does not equate to disk usage, since even a partially filled file system block occupies a full block on disk.
xz -l file.xz will give you the compressed/uncompressed sizes of the archive in a human-readable way (KiB MiB suffixes) without actually decompressing (no CPU or disk load). It won't list the files inside the tar archive, though.
xz -lv file.xz is verbose, and you can filter it to get the exact sizes in bytes.
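If you only need that one figure, you can filter the verbose listing, e.g. (the exact label text may differ between xz versions, so check the output of xz -lv first):
xz -lv file.xz | grep 'Uncompressed size'
For scripting there is also xz --robot --list, which prints a machine-readable table documented in man xz.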

Compress dir into same-size split files, decompress files individually?

I need to transfer a 1G folder over the network, and the data formats I send and receive are all under my control. In order to speed up the reception of data, I do this now:
Before transferring, compress the 1G folder and then transfer it.
After downloading everything, decompress it.
This reduces some time because the amount of data transferred becomes smaller, but it also requires time for decompression. Is it possible to compress a folder into many files of the same size, download one file and decompress one file, so that when all files are decompressed the result is the initial folder? My questions are:
Can this be achieved?
How can I decompress the file while downloading it?
How can I reduce the download and decompression time?
Suppose the original folder is in the HOME directory. Make a tar archive of the 1G folder using commands like:
cd;
mkdir tmp;
tar -cvzf tmp/original-folder.tar.gz original-folder
Split the tar archive into small files like xaa, xab, ..., using the split command, e.g.,
split --bytes=100000000 tmp/original-folder.tar.gz
Join the pieces at the destination with the cat command,
cat xaa xab xac [...] > transferred-folder.tar.gz
Untar to get the folder
tar -xvzf transferred-folder.tar.gz
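If you want to start unpacking before the whole archive has been reassembled, you can also skip the intermediate joined file and pipe the concatenated pieces straight into tar, e.g.:
cat xaa xab xac [...] | tar -xzvf -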

Use of temporary files and memory when using tar to back up very _large_ files with compression

When backing up one or more _very_large_ files using tar with compression (-j or -z), how does GNU tar manage the use of temporary files and memory?
Does it backup and compress the files block by block, file by file, or some other way?
Is there a difference between the way the following two commands use temporary files and memory?
tar -czf data.tar.gz ./data/*
tar -cf - ./data/* | gzip > data.tar.gz
Thank you.
No temporary files are used by either command. tar works completely in a streaming fashion. Packaging and compressing are completely separated from each other and are handled through a pipe-like mechanism when the -z or -j option (or similar) is used.
For each file tar puts into an archive, it writes a file info header which contains the file's path, its user, permissions, etc., and also its size. The size needs to be known up front (which is why putting the output of a stream into a tar archive isn't easy without using a temp file). After this header, the plain contents of the file follow. Since the size is known and already recorded in the file info ahead of the data, the end of the file is unambiguous, so the next file in the archive can follow directly. In this process no temporary files are needed for anything.
This stream of bytes is handed to whichever of the supported compression algorithms is used, and these also do not create any temporary files. Here I'm going out on a limb a bit because I don't know all compression algorithms by heart, but all that I have ever worked with do not create temporary files.
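To make that separation explicit, the -j case can be written as the equivalent hand-built pipeline, analogous to the gzip pipe in the question:
tar -cf - ./data/* | bzip2 > data.tar.bz2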

tar query: How to speed up tar for many files in a .tgz file

I have thousands of files (approximately 20000) in sample.tgz, and decompressing it with tar -xf is taking more than 5 minutes. I want to speed it up to within a minute. The approach I am thinking of is getting all the names of the files in the .tgz file using the -T option and then running tar in parallel in batches of, say, 500 file names.
Could somebody suggest a better approach? Please note that I have to use tar only here and not any other utilities like pigz, parallel, etc.
Similarly, if anyone can suggest an approach to compress it faster, that would also be helpful.
Also note that there is no .tgz file inside my sample.tgz file.
Tarballs are linear archives (mimicking the media they are named after; Tape ARchive) and so don't parallelize well until decompressed. Speeding up the decompression operation by using an algorithm such as LZ4 will help some, but if you're stuck with a gzipped tarball then the only chance you'll have of speeding it up is to use pigz instead of gzip to decompress it to a .tar file and then extract the files from there.
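If pigz does become an option, the usual form is to let it handle the decompression and feed the resulting tar stream into tar, e.g.:
pigz -dc sample.tgz | tar -xf -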

How to limit memory usage during tar

I need to tar (or otherwise archive) more than 2.7 million files (150GB).
However, with this many files the tar command uses way too much memory and my system crashes. What can I do?
tar -cf /path/filename.tar /path_to_file/
I've tried to do it in batches (multiple tar files would be OK) based on file creation date and find, but find takes up even more memory.
Not sure if this is an answer exactly, as it doesn't say how to explicitly lower tar's memory usage, but...
I think you can have tar use pigz (parallel gzip) for the compression step, and then specify the number of threads to use to better manage memory. Maybe something like:
tar cvf - paths-to-archive | pigz -p 4 > archive.tar.gz
where the value passed to -p (4 here) is the number of cores to use.
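As for the batching idea mentioned in the question, one way to keep any single tar invocation small is to build plain file lists and feed them to tar with -T. A rough sketch, assuming file names contain no newlines and using placeholder paths and batch size:
# list all files, then split the list into chunks of 500000 names
find /path_to_file -type f > /tmp/filelist
split -l 500000 /tmp/filelist /tmp/batch_
# create one archive per chunk
for list in /tmp/batch_*; do
  tar -cf "/path/$(basename "$list").tar" -T "$list"
done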
