I need to change the compression level for BZIP2 compression with tar. I found that you can set the compression level through the BZIP2 variable and then run the tar command to compress. I tried different compression levels using the following commands, but it seems like BZIP2=-<compression level> does not change the compression level.
BZIP2=-1
tar -cjvf <output_file> <input_file>
How do I do it correctly?
Since you tagged this "linux", I will assume that you are using GNU tar. Then you can pass the compression command, together with its options, using -I (--use-compress-program):
tar -I "bzip2 -1" -cvf out.tar.bz2 files
Related
Packaging a folder on a SUSE Linux Enterprise Server 12 SP3 system using GNU tar 1.30 always gives different md5 checksums although the file contents do not change.
I run tar to package my folder that contains a simple text file:
tar cf package.tar folder
Nevertheless, although the content is exactly the same, the resulting tar always has a different md5 (or sha1) checksum:
$> rm -rf package.tar && tar cf package.tar folder && md5sum package.tar
e6383218596fffe118758b46e0edad1d package.tar
$> rm -rf package.tar && tar cf package.tar folder && md5sum package.tar
1c5aa972e5bfa2ec78e63a9b3116e027 package.tar
Because the Linux file system seems to deliver files to tar in a random order, I tried using the --sort option, but the checksum still changes. tar's --mtime option does not help here either, since the creation dates are exactly the same.
I appreciate any help on this.
The archives you provided contain pax extended headers.
A quick glance at their structure reveals that they differ in these two fields:
The process ID of the tar process (as part of the name of the extended header stored in the ustar header block, and consequently the checksum of that ustar header block).
The atime (access time) in the extended header.
One workaround for reproducible archive creation is to force the older Unix ustar format (rather than the pax/POSIX format):
tar --format=ustar -cf package.tar folder
The other choice is to set the extended header name manually and delete the atime while keeping the pax format:
tar --format=pax --pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime -cf package.tar folder
Now the md5sum should be the same for both archives.
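A quick way to check either workaround is to build the archive twice and compare the two files byte for byte. This is a minimal sketch assuming GNU tar; the file inside folder is a hypothetical example, and cmp is used instead of md5sum since it answers the same question directly.

```shell
#!/bin/sh
# Sketch: build the same archive twice with the given options and compare.
# "folder" mirrors the question; folder/file.txt is a hypothetical example.
set -e
mkdir -p folder && echo "some text" > folder/file.txt

build_twice() {
  # "$@" holds the extra tar options under test.
  rm -f a.tar b.tar
  tar "$@" -cf a.tar folder
  tar "$@" -cf b.tar folder
  if cmp -s a.tar b.tar; then
    echo "identical: $*"
  else
    echo "DIFFERENT: $*"
  fi
}

build_twice --format=ustar
build_twice --format=pax --pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime
```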
The headers in a tar file contain several fields that may differ each time you re-tar the same set of files. For instance, the last access time and the modification time will likely be different each time.
According to this article it is possible with GNU tar to produce identical output for identical input by doing the following:
# requires GNU Tar 1.28+
$ tar --sort=name \
--mtime="2018-10-05 00:00Z" \
--owner=0 --group=0 --numeric-owner \
-cf product.tar build
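To see what the recipe normalizes, the sketch below archives two copies of the same content that have different timestamps and checks that the results match. It assumes GNU tar 1.28+ and GNU touch; the stage1/stage2 directories and app.txt are hypothetical, and --format=gnu is added here as an assumption so the result does not depend on tar's default format (on a system that defaults to pax/posix, combine the recipe with the pax options from the previous answer).

```shell
#!/bin/sh
# Sketch: same content, different mtimes, identical archives.
# stage1/stage2/app.txt are hypothetical; --format=gnu is an added assumption.
set -e
rm -rf stage1 stage2
mkdir -p stage1/build stage2/build
echo "same content" > stage1/build/app.txt
echo "same content" > stage2/build/app.txt
touch -d "2001-01-01" stage1/build/app.txt stage1/build
touch -d "2020-06-15" stage2/build/app.txt stage2/build

for d in stage1 stage2; do
  # Sorted names, a fixed mtime and numeric owner 0:0 for every entry.
  (cd "$d" && tar --format=gnu --sort=name \
      --mtime="2018-10-05 00:00Z" \
      --owner=0 --group=0 --numeric-owner \
      -cf ../"$d".tar build)
done

cmp stage1.tar stage2.tar && echo "archives are identical"
```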
tar -cpf package.tar --sort=name --no-acls --no-selinux --no-xattrs folder
worked for a similar situation on Slackware 14.2, using GNU tar 1.29.
The p stands for --preserve-permissions (restore the stored file permissions on extraction); it is the default for the root user.
Also consider using --atime-preserve when creating the archive (depending on purpose), so that reading the files does not change their access times.
I am trying to archive the contents of my home directory using tar and then compress the tar file with gzip. I know you can uncompress and unarchive a .tar.gz file using cat, tar and gzip, but I don't know how to archive and compress.
Here is a link to a full guide for your question:
https://www.howtogeek.com/248780/how-to-compress-and-extract-files-using-the-tar-command-on-linux/
tar -czvf name-of-archive.tar.gz /path/to/directory-or-file
Here’s what those switches actually mean:
-c: Create an archive.
-z: Compress the archive with gzip.
-v: Display progress in the terminal while creating the archive, also known as “verbose” mode. The v is always optional in these commands, but it’s helpful.
-f: Allows you to specify the filename of the archive.
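The same switches also cover the reverse direction. Below is a minimal round-trip sketch, assuming GNU tar; project stands in for your home directory and restore is a hypothetical scratch directory. It adds -t (list contents) and -x (extract) along with -C (change to a directory before extracting).

```shell
#!/bin/sh
# Sketch: create, list and extract a .tar.gz.
# "project" and "restore" are hypothetical names.
set -e
mkdir -p project/docs && echo "notes" > project/docs/notes.txt

tar -czvf project.tar.gz project    # c=create, z=gzip, v=verbose, f=archive name
tar -tzf project.tar.gz             # t=list the contents without extracting
mkdir -p restore
tar -xzf project.tar.gz -C restore  # x=extract, -C=change into restore first
diff -r project restore/project && echo "round trip OK"
```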
Is there a way to compress a directory in gzip, bzip, bzip2 or xz format? I'm building a command-line tool (in bash) that needs to include these options.
A command like
tar czf output.tar.gz yourdir/
should work.
c means that tar will create an archive
z means that the output will be compressed (using gzip)
the output filename is after f
at the end, you can specify any number of directories/files (space-separated)
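Since the tool is in bash, the format option can be mapped onto tar's own compression flags: z for gzip, j for bzip2 and J for xz (GNU tar). The helper below is a hypothetical sketch, not an existing command; the original bzip format is left out because tar has no flag for it.

```shell
#!/bin/bash
# Hypothetical helper: compress_dir FORMAT DIR
# Maps a format name to the matching GNU tar compression flag.
compress_dir() {
  local format=$1 dir=${2%/}   # strip a trailing slash from DIR
  local flag ext
  case $format in
    gzip)  flag=z; ext=tar.gz ;;
    bzip2) flag=j; ext=tar.bz2 ;;
    xz)    flag=J; ext=tar.xz ;;
    *)     echo "unsupported format: $format" >&2; return 1 ;;
  esac
  tar -c"$flag"f "$dir.$ext" "$dir"
}

mkdir -p yourdir && echo "data" > yourdir/file.txt
for fmt in gzip bzip2 xz; do
  compress_dir "$fmt" yourdir
done
ls -l yourdir.tar.*
```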
To answer the "why" part of your question, it is because of the Unix philosophy of having many small tools that do their job well that you can string together, as opposed to one big tool that doesn't do anything well and is hard to make better. Your examples are a perfect illustration of this philosophy, where you have several compression tools to choose from, and it is easy to add a new compression tool to your tool box. The archiving part, turning a directory of files into a byte stream, is a different task that is its own tool that can be combined with any of those or any future compression tools.
The body of your question then asks "how". You use a pipe with tar, cpio, or pax; tar is the most common. You then name the file accordingly, e.g. ending with .tar.gz, so the consumer of the file can tell what it is from the name. Like this:
tar cf - somedirectory | gzip > somedirectory.tar.gz
or
tar cf - somedirectory | xz > somedirectory.tar.xz
These tar up the directory into a byte stream, which is then piped to a compressor. The output of the compressor is then written to the file containing the compressed directory contents.
To decompress:
gzip -dc somedirectory.tar.gz | tar xf -
Here it is done in the reverse order: first decompress the file, then feed the output to tar to extract the files and recreate the directory structure. The - tells tar to write the archive to stdout or read it from stdin.
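The pipe pattern works the same way for each compressor, because gzip, bzip2 and xz all accept -d (decompress) and -c (write to stdout). This is a minimal round-trip sketch under that assumption; the out-* directories are hypothetical scratch locations.

```shell
#!/bin/sh
# Sketch: pipe-based archive + compress, then decompress + extract.
# "somedirectory" follows the answer; "out-*" are hypothetical scratch dirs.
set -e
mkdir -p somedirectory && echo "payload" > somedirectory/file.txt

for c in gzip bzip2 xz; do
  case $c in gzip) ext=gz ;; bzip2) ext=bz2 ;; xz) ext=xz ;; esac
  tar cf - somedirectory | "$c" > "somedirectory.tar.$ext"    # archive, then compress
  rm -rf "out-$ext" && mkdir "out-$ext"
  "$c" -dc "somedirectory.tar.$ext" | (cd "out-$ext" && tar xf -)  # decompress, then extract
  diff -r somedirectory "out-$ext/somedirectory"
done
echo "all round trips OK"
```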
Having said all that stuff about how much better it is to have small tools that do their job well, this application of tar is so incredibly common that it is built into the tar options. So you can instead:
tar czf somedirectory.tar.gz somedirectory
tar cJf somedirectory.tar.xz somedirectory
tar will run the gzip or xz executables and pipe the data through them itself.
(J is a relatively recent GNU tar addition, so your tar may not have it.)
I have a file with the extension .tar.xz: wkhtmltox-linux-i386_0.12.0-03c001d.tar.xz
What is the Linux command to uncompress it?
From the Ubuntu site:
tar -xJf wkhtmltox-linux-i386_0.12.0-03c001d.tar.xz
If you have a recent version of tar (1.25 or later), you should be able to just type:
tar xf wkhtmltox-linux-i386_0.12.0-03c001d.tar.xz
And it will correctly determine what type of decompression to use.
In addition, you can use tar caf archive.ext files_to_add to create archives, and it will decide which compression algorithm to use based on the extension of the archive.
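A minimal sketch of both behaviours, assuming GNU tar: -a picks the compressor from the archive suffix, and a plain tar xf detects it again. The per-format -t check is there only to confirm each archive really is in the expected format; files_to_add comes from the answer, while the check-* directories are hypothetical.

```shell
#!/bin/sh
# Sketch: -a chooses the compressor from the suffix; "tar xf" detects it again.
# "check-*" are hypothetical scratch directories.
set -e
mkdir -p files_to_add && echo "x" > files_to_add/x.txt

for pair in gz:gzip bz2:bzip2 xz:xz; do
  ext=${pair%%:*}; tool=${pair#*:}
  tar caf "archive.tar.$ext" files_to_add       # compressor chosen from the suffix
  "$tool" -t "archive.tar.$ext"                 # confirm the stream format
  rm -rf "check-$ext" && mkdir "check-$ext"
  tar xf "archive.tar.$ext" -C "check-$ext"     # no z/j/J flag needed
done
```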
The -J, --xz flags are for that:
tar -xJf file.pkg.tar.xz
You can also install the xz-utils package, run the unxz command on the file, and then extract the resulting .tar with standard tar.
Can someone please explain how to use ">" and "|" in Linux commands, and convert these three lines into one line of code?
mysqldump --user=*** --password=*** $db --single-transaction -R > ${db}-$(date +%m-%d-%y).sql
tar -cf ${db}-$(date +%m-%d-%y).sql.tar ${db}-$(date +%m-%d-%y).sql
gzip ${db}-$(date +%m-%d-%y).sql.tar
rm ${db}-$(date +%m-%d-%y).sql
(After the conversion I guess this line will be useless.)
The GNU tar program can itself do the compression normally done by gzip. You can use the -z flag to enable this. So the tar and gzip could be combined into:
tar -zcf ${db}-$(date +%m-%d-%y).sql.tar.gz ${db}-$(date +%m-%d-%y).sql
Getting tar to read from standard input for archiving is not a simple task, but I would question its necessity in this particular case.
The intent of tar is to be able to package up a multitude of files into a single archive file but, since it's only one file you're processing (the output stream from mysqldump), you don't need to tar it up, you can just pipe it straight into gzip itself:
mysqldump blah blah | gzip > ${db}-$(date +%m-%d-%y).sql.gz
That's because gzip will compress standard input to standard output if you don't give it any file names.
This removes the need for any (possibly very large) temporary files during the compression process.
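Here is a runnable sketch of the same pipe. Since mysqldump needs a live server, fake_dump is a hypothetical stand-in that just prints some SQL, and mydb is a placeholder database name. It also computes the date-stamped file name once, so every use of the name agrees even if the command runs across midnight.

```shell
#!/bin/sh
# Sketch: stream a dump straight into gzip, with no temporary .sql file.
# fake_dump is a hypothetical stand-in for "mysqldump ... $db".
set -e
db=mydb
fake_dump() { printf 'CREATE TABLE t (id INT);\nINSERT INTO t VALUES (1);\n'; }

# Compute the name once so all uses agree.
out="${db}-$(date +%m-%d-%y).sql.gz"
fake_dump | gzip > "$out"

gzip -dc "$out"   # decompress to stdout to check the contents
```

Note that a pipeline's exit status is that of its last command, so a failing mysqldump would go unnoticed; in bash, set -o pipefail makes the pipeline fail if any command in it fails.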
You can use the following script:
#!/bin/sh
USER="***"
PASS="***"
DB="***"
# Dump the database and compress the stream on the fly (no temporary .sql file).
mysqldump --user="$USER" --password="$PASS" "$DB" --single-transaction -R | gzip > "${DB}-$(date +%m-%d-%y).sql.gz"
You can learn more about "|" here - http://en.wikipedia.org/wiki/Pipeline_(Unix). I can say that this construction moves output of mysqldump command to the standard input of gzip command, so that is like you connect output of one command with input of other via pipeline.
I don't see the point in using tar: you have just one file, and you call gzip explicitly for compression anyway. tar is used to archive/pack multiple files into one.
Your command line should be (the dump command is shortened, but I guess you'll get the idea):
mysqldump .... | gzip > filename.sql.gz
To chain the commands on one line, I'd put && between them. That way, if one fails, the remaining commands are not executed. You could also put a semicolon after each command, in which case each one runs regardless of whether the previous command failed.
You should also know that tar will do the gzip for you with a "z" option, so you don't need the extra command.
Paxdiablo makes a good point that you can just pipe mysqldump directly into gzip.
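Here is a minimal sketch of the original steps chained with && and using tar's z option, followed by a two-line demonstration of && versus ;. fake_dump is a hypothetical stand-in for the mysqldump command, and mydb is a placeholder database name.

```shell
#!/bin/sh
# Sketch: the original steps chained with &&, each running only if the
# previous one succeeded. fake_dump is a hypothetical stand-in for mysqldump.
fake_dump() { printf 'SELECT 1;\n'; }
name="mydb-$(date +%m-%d-%y).sql"

fake_dump > "$name" && tar -zcf "$name.tar.gz" "$name" && rm "$name"

# With "&&" the second command does not run after a failure:
false && echo "never printed"
# With ";" it runs anyway:
false ; echo "runs anyway"
```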