tar package has different checksum for exactly the same content - linux

Packaging a folder on a SUSE Linux Enterprise Server 12 SP3 system using GNU tar 1.30 always gives different md5 checksums although the file contents do not change.
I run tar to package my folder that contains a simple text file:
tar cf package.tar folder
Nevertheless, although the content is exactly the same, the resulting tar always has a different md5 (or sha1) checksum:
$> rm -rf package.tar && tar cf package.tar folder && md5sum package.tar
e6383218596fffe118758b46e0edad1d package.tar
$> rm -rf package.tar && tar cf package.tar folder && md5sum package.tar
1c5aa972e5bfa2ec78e63a9b3116e027 package.tar
Because the linux file system seems to deliver files in a random order to tar, I tried using the --sort option. But the resulting command doesn't change the checksum issue for me. Also tar's --mtime option does not help here, since the creation dates are exactly the same.
I appreciate any help on this.

The archives you provided contain pax extended headers.
A quick glance at their structure reveals that they differ in these two fields:
The process ID of the tar process (embedded in the name of the extended header in the ustar header block, and consequently in the checksum of that ustar header block).
The atime (access time) in the extended header.
One of the workarounds you can use for reproducible archive creation is to enforce the old unix ustar format (rather than the pax/posix format):
tar --format=ustar -cf package.tar folder
The other choice is to manually set the extended name and delete the atime while preserving the pax format:
tar --format=pax --pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime -cf package.tar folder
Now the md5sum should be the same for both archives.
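A minimal sketch for checking this yourself, assuming GNU tar and md5sum are available. The scratch directory and file names are made up for illustration. It builds the archive twice with each workaround and compares the checksums:

```shell
#!/bin/sh
# Sketch: verify that both workarounds give repeatable checksums.
# Assumes GNU tar and md5sum; all paths are hypothetical scratch files.
set -eu
work=$(mktemp -d)
cd "$work"
mkdir folder
echo "hello" > folder/file.txt

# Workaround 1: plain ustar headers (no pax extended headers at all).
tar --format=ustar -cf a1.tar folder
tar --format=ustar -cf a2.tar folder
ustar1=$(md5sum < a1.tar)
ustar2=$(md5sum < a2.tar)

# Workaround 2: keep pax, but fix the header name and drop atime.
pax_opts='exthdr.name=%d/PaxHeaders/%f,delete=atime'
tar --format=pax --pax-option="$pax_opts" -cf b1.tar folder
tar --format=pax --pax-option="$pax_opts" -cf b2.tar folder
pax1=$(md5sum < b1.tar)
pax2=$(md5sum < b2.tar)

if [ "$ustar1" = "$ustar2" ]; then echo "ustar: identical"; fi
if [ "$pax1" = "$pax2" ]; then echo "pax: identical"; fi
```

If either pair still differs on your system, comparing the two archives with cmp or a hex dump shows which header field changed.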

The headers in a tar file contain several fields that can differ each time you re-tar a set of files. For instance, the last access time and modification time will likely be different each time.
According to this article it is possible with GNU tar to produce identical output for identical input by doing the following:
# requires GNU Tar 1.28+
$ tar --sort=name \
--mtime="2018-10-05 00:00Z" \
--owner=0 --group=0 --numeric-owner \
-cf product.tar build
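As a rough check of the command above, the following sketch changes only the timestamps of the input and confirms that the archive stays byte-identical. It assumes GNU tar 1.28+ and GNU touch; the directory and file names are hypothetical, and --format=gnu is an addition of mine so the result does not depend on the distribution's default archive format:

```shell
#!/bin/sh
# Sketch: normalised metadata makes the archive depend on content only.
# Assumes GNU tar 1.28+ and GNU touch; paths are hypothetical.
set -eu
work=$(mktemp -d)
cd "$work"
mkdir build
echo "payload" > build/b.txt
echo "payload" > build/a.txt

make_tar() {
    # --format=gnu is not part of the quoted command; it is added here
    # so the sketch behaves the same whatever the default format is.
    tar --format=gnu --sort=name \
        --mtime="2018-10-05 00:00Z" \
        --owner=0 --group=0 --numeric-owner \
        -cf "$1" build
}

make_tar first.tar
# Change only the timestamps; the file contents stay the same.
touch -d "2001-02-03 04:05:06" build/a.txt build
make_tar second.tar

sum1=$(md5sum < first.tar)
sum2=$(md5sum < second.tar)
if [ "$sum1" = "$sum2" ]; then echo "identical"; else echo "different"; fi
```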

tar -p --sort=name --no-acls --no-selinux --no-xattrs
worked for a similar situation in slackware 14.2, using
GNU tar 1.29.
The p stands for --preserve-permissions (keep the file permissions as recorded) and is the default for the root user.
Also consider --atime-preserve when creating the archive, which keeps tar from changing the access times of the files it reads (depending on purpose).

Related

Bash Scripting with xargs to BACK UP files

I need to copy a file from multiple locations to the backup directory while retaining its directory structure. For example, I have a file "a.txt" at the following locations: /a/b/a.txt /a/c/a.txt /a/d/a.txt /a/e/a.txt. I now need to copy this file from these locations to the backup directory /tmp/backup. The end result should be:
When I list /tmp/backup/a, it should contain /b/a.txt /c/a.txt /d/a.txt & /e/a.txt.
For this, I had used the command: echo /a/*/a.txt | xargs -I {} -n 1 sudo cp --parent -vp {} /tmp/backup. This is throwing the error "cp: cannot stat '/a/b/a.txt /a/c/a.txt a/d/a.txt a/e/a.txt': No such file or directory"
-I option is taking the complete input from echo instead of individual values (like -n 1 does). If someone can help debug this issue that would be very helpful instead of providing an alternative command.
Use rsync with the --relative (-R) option to keep (parts of) the source paths.
I've used a wildcard for the source to match your example command rather than the explicit list of directories mentioned in your question.
rsync -avR /a/*/a.txt /tmp/backup/
Do the backups need to be exactly the same as the originals? In most cases, I'd prefer a little compression. [tar](https://man7.org/linux/man-pages/man1/tar.1.html) does a great job of bundling things including the directory structure.
tar cvzf /path/to/backup/tarball.tgz /source/path/
tar can't update compressed archives, so if you want to update the archive in place, skip the compression:
tar uf /path/to/backup/tarball.tar /source/path/
This gives you versioning of a sort, as it only adds changed files but keeps both the before and after versions.
If you have time and cycles and still want the compression, you can decompress before and recompress after.
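On the xargs behaviour the question asks about: with -I, xargs takes one input item per line, and echo prints every match on a single line, so the whole list is substituted as one argument. A minimal sketch of one way around that, assuming GNU cp (for --parents) and using hypothetical relative paths in a scratch directory instead of /a and /tmp/backup:

```shell
#!/bin/sh
# Sketch: feed xargs one path per line so that -I substitutes one path
# at a time. Assumes GNU cp (--parents); paths are hypothetical and
# relative to a scratch directory instead of / and /tmp/backup.
set -eu
work=$(mktemp -d)
cd "$work"
mkdir -p a/b a/c a/d a/e backup
for d in b c d e; do echo "from $d" > "a/$d/a.txt"; done

# echo would print all matches on ONE line, which -I treats as one item.
# printf '%s\n' prints each match on its own line instead.
printf '%s\n' a/*/a.txt | xargs -I {} cp --parents -p {} backup

copied=$(find backup -name a.txt | wc -l)
echo "copied $copied files"
```

File names containing newlines would still break this; for those, printf '%s\0' together with xargs -0 is the usual variant.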

Which tar options do I need to install node from binary on linux?

Background
I downloaded binary for linux 64-bit and I was following several tutorials, each with similar options:
tar -C /usr/local --strip-components 1 -xzf /path/to/node.tar.gz
I always get this error:
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
I've googled and it seems I have manually specified the gzip file format via one of these switches. The file is actually tar.xz, not tar.gz. It was probably tar.gz in older versions.
I wonder what all of these options mean and which ones I need.
Is there an auto-detect format option?
This is what running info tar said:
-C, --directory DIR
change to directory DIR
--strip-components=NUMBER
strip NUMBER leading components from file names on extraction
-x, --extract, --get
extract files from an archive
-z, --gzip, --gunzip, --ungzip
-f, --file ARCHIVE
use archive file or device ARCHIVE
Questions
I don't understand options -f, --strip-components.
-f - What else can it be but a file? What is "device archive"?
--strip-components - What does --strip-components 1 exactly do here?
I don't see any numbers in the file.
Please provide example of filename which would be affected by --strip-components and explain how.
And what's the idea with installing nodejs on linux?
Just unzip to /usr/local or what else needs to be done?
First, when extracting, GNU tar is "smart" enough to detect the compression method used in an archive, so it isn't necessary to specify -z or -j; plain -xf is enough.
-f - What else can it be but a file? What is "device archive"?
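The archive does not have to be a regular file; it could also be a device, such as a tape drive attached to your machine at /dev/your_tar.
--strip-components - What does --strip-components 1 exactly do here?
It removes the given number of leading path components from every member name on extraction. For example,
tar xfz /var/www/site/site.gz --strip-components=2
extracts a member stored in the archive as var/www/site as just site, relative to the current directory (or to the directory given with -C). The node tarball keeps everything under a single top-level directory, so --strip-components 1 together with -C /usr/local places its contents directly under /usr/local.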
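To see the effect on a concrete file name, here is a small sketch, assuming GNU tar with gzip available. The names "pkg-1.0" and "tool" are invented for the example and have nothing to do with the real node tarball:

```shell
#!/bin/sh
# Sketch: what --strip-components does to member names on extraction.
# Assumes GNU tar with gzip; "pkg-1.0" and "tool" are made-up names.
set -eu
work=$(mktemp -d)
cd "$work"
mkdir -p pkg-1.0/bin dest
echo "#!/bin/sh" > pkg-1.0/bin/tool
tar -czf pkg.tar.gz pkg-1.0

# Member names as stored in the archive (note the leading pkg-1.0/).
tar -tf pkg.tar.gz

# No -z needed: GNU tar detects the compression when reading a file.
# The first component (pkg-1.0/) is dropped from every member name.
tar -C dest --strip-components=1 -xf pkg.tar.gz

ls dest/bin
```

The member pkg-1.0/bin/tool ends up as dest/bin/tool, and no dest/pkg-1.0 directory is created.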

Extract tar archive excluding a specific folder and its contents

With PHP I am using exec("tar -xf archive.tar -C /home/user/target/folder") to extract the contents of a specific archive (archive.tar) into the target directory (/home/user/target/folder), so that all existing contents of the target directory will be overwritten by the new ones that are contained in the archive.
It works fine and all files in the target directory are being overwritten after extract, but there is one directory in the archive that I would like to omit (from extracting and thus overwriting the existing one in the target folder)...
For example, the archive.tar contains:
folderA/
folderB/
folderC/
folderD/
fileA.php
fileB.php
fileC.xml
How could I extract (and overwrite) all except (for example) folderC/? In other words, I want folderC and its contents to remain intact in the user's directory and not be overwritten by the one contained in the tar archive.
Any suggestions?
(Tar on the hosting server is GNU version 1.23.)
You can use '--exclude' to omit a folder:
tar -xf archive.tar -C /home/user/target/folder --exclude="folderC"
There is the --exclude PATTERN option in the tar tool.
Check: tar on linuxcommand.org
To be on the safe side, you could remove all write permissions from the folder. For example:
$ chmod 000 folderC/
And then do a normal tar extract (as a regular user). You'll get some error messages on the console, but your folder will remain untouched. When tar has finished, change the folder back to its original permissions. For example:
$ chmod 775 folderC/
Of course the '--exclude' tar option is the right solution to this particular problem, but if you are not 100% sure about a command's syntax and you're handling critical data, my solution puts you on the safe side :-).
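Write the --exclude option at the beginning of the tar command. The pattern has to match the member names as they are stored in the archive (check with tar -tf archive.tar): use './folderC' if the names are stored with a leading ./, otherwise 'folderC'.
In your case, with the names listed in the question, that is,
exec("tar -x --exclude='folderC' -f archive.tar -C /home/user/target/folder")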
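A quick way to convince yourself before running this on real data is a throwaway test like the sketch below. It assumes GNU tar; the directory layout mirrors the question, but all paths and file contents are hypothetical:

```shell
#!/bin/sh
# Sketch: extract everything except folderC and leave the existing
# folderC in the target untouched. Assumes GNU tar; paths are made up.
set -eu
work=$(mktemp -d)
cd "$work"
mkdir -p src/folderA src/folderC target/folderC
echo "new" > src/folderA/a.txt
echo "new" > src/folderC/c.txt
echo "new" > src/fileA.php
echo "old" > target/folderC/c.txt

# Member names are stored without a leading ./ here.
tar -C src -cf archive.tar folderA folderC fileA.php

# The pattern matches the directory and everything below it.
tar -xf archive.tar -C target --exclude='folderC'

# Shows the content of the pre-existing file in the target.
cat target/folderC/c.txt
```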

Is it possible to create a folder with the filename into the tar file you are creating?

Let's say I'm trying to tar.gz all the files and folders in /usr/local/bin/data/*
The file name would be data-2015-10-01.tar.gz. When I untar it, is it possible that the root directory would be data-2015-10-01 followed by the contents of whatever is inside of data/* ?
If not, how can I tar /usr/local/bin/data/* but start at the /data/ folder level?
I can't do this unfortunately since the program spits out /usr/local/bin/data/ and I'm unable to change it.
cd /usr/local/bin
tar ... /data/*
There are a couple of ways to do what I think you're trying to accomplish. First, you can use the -C option to tar when creating the archive. That changes tar's current working directory to that directory before it starts adding files. Not strictly required in your case, but probably helpful. Note that the directory is given as data rather than data/*, because the shell would expand the glob in your current directory, not in /usr/local/bin.
# tar -C /usr/local/bin -czf data-2015-10-01.tar.gz data
That at least gets you to a single directory named data. If you have control of the extraction (manually or via a script you provide to whoever is unpacking this), then you can do something like this on the extraction:
# mkdir -p data-2015-10-01 && tar -C data-2015-10-01 --strip-components=1 -xzf data-2015-10-01.tar.gz
This removes the first path component, which is "data", and extracts everything below it into data-2015-10-01, the directory given with -C. So it isn't specifically tar that's doing the renaming, but you will effectively end up with the same result.
I've accomplished something similar with a symlink. This is not a great solution if you have (or might have) symlinks in the directory structure you're trying to archive. I have to say that I prefer @geis' solution to strip out the top-level directory on extract, but this gives you another option.
ln -s /usr/local/bin/data data-2015-10-01
tar -czvhf data-2015-10-01.tar.gz data-2015-10-01/
rm data-2015-10-01
(Note the additional -h option in the tar invocation.)
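Not mentioned in the answers above, but GNU tar can also rename members while creating the archive with its --transform option, which avoids both the extraction step and the symlink. A minimal sketch under that assumption (GNU tar only; the scratch paths stand in for /usr/local/bin/data and are hypothetical):

```shell
#!/bin/sh
# Sketch: rename the top-level directory at creation time with GNU tar's
# --transform (a sed-style expression applied to member names).
# Paths are hypothetical; "src" stands in for /usr/local/bin.
set -eu
work=$(mktemp -d)
cd "$work"
mkdir -p src/data/sub out
echo "x" > src/data/sub/file.txt

# Replace the leading "data" of every member name with the dated name.
tar -C src --transform 's,^data,data-2015-10-01,' \
    -czf data-2015-10-01.tar.gz data

# The stored names already carry the new top-level directory.
tar -tf data-2015-10-01.tar.gz
tar -C out -xzf data-2015-10-01.tar.gz
```

By default the expression is also applied to symlink targets, so check the listing if the tree contains symlinks.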

How to update tar (NOT append)

I want to update an existing tar file with newer files.
At GNU, I read:
4.2.3 Updating an Archive
In the previous section, you learned how to use ‘--append’ to add a
file to an existing archive. A related operation is ‘--update’ (‘-u’).
The ‘--update’ operation updates a tar archive by comparing the date
of the specified archive members against the date of the file with the
same name. If the file has been modified more recently than the
archive member, then the newer version of the file is added to the
archive (as with ‘--append’).
However,
When I run my tar update command, the files are appended even though their modification dates are exactly the same. I want to ONLY append where modification dates of files to be tarred are newer than those already in the tar...
tar -uf ./tarfile.tar /localdirectory/ >/dev/null 2>&1
Currently, every time I update, the tar doubles in size...
The update you describe implies that the file within the archive is replaced. If the new copy is smaller than what is in the archive, it could be rewritten in place. If the new copy is larger, however, tar would have to zero out the existing archive entry and append the new one. Such updates would leave runs of '\0's or other unused bytes, which any user would want removed, and removing them means "moving up" the bytes of the remaining archive contents towards the start of the file (think C's memmove).
Such an in-place move, which involves seek-read-seek-write cycles, is costly, especially in the context of tapes, which tar was originally designed for: their seek performance is not comparable to that of hard disks, and you would wear out the tape rather quickly with such move operations. And of course, WORM devices don't support this move operation either.
If you do not want to use the "-P" switch, tar -u works correctly provided that the current directory is the parent of the directory being updated and the path given to tar is relative, not absolute.
For example:
We want to update the directory /home/blabla/Dir. We do it like this:
cd /home/blabla
tar -u -f tarfile.tar Dir
In general, the update must be made from the same place as the creation, so that the paths agree.
This is also possible:
cd /home/blabla/Dir
tar -u -f /path/to/tarfile.tar .
You may simply create (instead of update) the archive each time:
tar -cvpf tarfile.tar *
This will solve the problem of your archive doubling in size each time. But of course, it is generating the whole archive every time.
By default tar strips the leading / from member names, but it does this after deciding what needs to be updated.
Therefore if you are archiving an absolute path, you either need to cd / and use relative paths, or add the -P/--absolute-names option.
cd /
tar -uf "$OLDPWD/tarfile.tar" localdirectory/ >/dev/null 2>&1
tar -cPf tarfile.tar /localdirectory/ >/dev/null 2>&1
tar -uPf tarfile.tar /localdirectory/ >/dev/null 2>&1
However, the updated items will still be appended. A tar (tape archive) file cannot be modified except by appending.
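The behaviour described here can be observed with a small experiment. The sketch below assumes GNU tar and GNU touch and uses a relative path throughout, so that member names and file names agree; the directory and file names are hypothetical:

```shell
#!/bin/sh
# Sketch: tar -u skips unchanged files and appends newer ones, leaving
# the old copy in the archive. Assumes GNU tar and GNU touch.
set -eu
work=$(mktemp -d)
cd "$work"
mkdir Dir
echo "one" > Dir/f.txt
touch -d "2020-01-01 00:00:00" Dir/f.txt

tar -cf tarfile.tar Dir
tar -uf tarfile.tar Dir          # f.txt is not newer: nothing appended
unchanged=$(tar -tf tarfile.tar | grep -c '^Dir/f.txt$')

echo "two" > Dir/f.txt
touch -d "2021-01-01 00:00:00" Dir/f.txt
tar -uf tarfile.tar Dir          # f.txt is newer: a second copy is added
updated=$(tar -tf tarfile.tar | grep -c '^Dir/f.txt$')

echo "copies before: $unchanged, after: $updated"
```

On extraction the later copy overwrites the earlier one, so the result is the newest version even though both are stored.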
Warning! When speaking about "dates" it means any date, and that includes the access time.
Should your files have been accessed in any such way (a simple ls -l is enough) then tar is right to do what it does!
You need to find another way to do what you want. Probably use a sentinel file and see if its modification date is less than the files you wish to append.

Resources