Split tarz file not extracting on Linux - linux

I have transferred a large tarz (~17 GB) that was split on Windows using GNU split into 2048 MB parts.
After uploading the parts to Linux (Red Hat 5, 32-bit) and combining them with cat (cat xaa xab xac ... >> final.tarz), checking the result with tar xvfz final.tarz gives an error:
gzip: stdin: invalid compressed data--format violated
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
The file sizes match and so do the MD5 checksums, so I have no clue what went wrong.
Can I resume tar from a later part or skip the bad part, or perhaps check whether the parts were transferred correctly and re-transfer only the bad ones?
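One way to do that per-part check (a sketch, assuming the parts are still named xaa, xab, ... as in the cat command above):

# On the Windows side, record one checksum per part (e.g. with md5sum from Git Bash or Cygwin)
md5sum xa* > parts.md5
# On the Linux side, after transfer, verify every part in one go
md5sum -c parts.md5
# Re-transfer only the parts reported as FAILED, then re-run the cat command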

Related

Using pseudo to retain xattrs when extracting tar

I'm trying the following:
1. Use pseudo to pack an archive (bz2) that has files with security xattrs set.
2. Use pseudo again to unpack the archive and keep the security xattrs of the files.
Unfortunately, the extraction fails with the following message coming from pseudo:
$ pseudo tar -cjpf myarchive.tar.bz2 --xattrs --xattrs-include='*' myFile
# checked the contents in the meantime and looked good
$ pseudo tar -C unpack-folder/ -xpjf myarchive.tar.bz2 --xattrs --xattrs-include='*'
Warning: PSEUDO_PREFIX unset, defaulting to /home/user/tmp/.
got *at() syscall for unknown directory, fd 4
unknown base path for fd 4, path myFile
couldn't allocate absolute path for 'myFile'.
tar: myFile: Cannot open: Bad address
tar: Exiting with failure status due to previous errors
pseudo: 1.9.0
tar: 1.34
Do you have any idea what the problem could be, or another way to preserve the files' xattrs when extracting the contents of the archive?
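One thing worth ruling out first, given the warning at the top of the log: run both commands with PSEUDO_PREFIX set explicitly rather than letting pseudo fall back to its default. A sketch (the path is only an example):

export PSEUDO_PREFIX=/home/user/tmp    # hypothetical prefix; the warning shows what pseudo defaulted to
pseudo tar -cjpf myarchive.tar.bz2 --xattrs --xattrs-include='*' myFile
pseudo tar -C unpack-folder/ -xpjf myarchive.tar.bz2 --xattrs --xattrs-include='*'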

Extracting a .tar file returns "This does not look like a tar archive"

I'm trying to download a dataset from here for my machine learning project.
The data file appears to be a tar, but it does not extract properly:
tar -xvf SNAKE_ALL.tar
tar: This does not look like a tar archive
tar: Exiting with failure status due to previous errors
I tried
gzip -dc SNAKE_ALL.tar | tar -xf -
gzip: SNAKE_ALL.tar: not in gzip format
tar: This does not look like a tar archive
tar: Exiting with failure status due to previous errors
and
file SNAKE_ALL.tar
SNAKE_ALL.tar: HTML document, ASCII text
Link to the data: https://data.mendeley.com/datasets/v88xfw5wyx/1
It's not a tar file. The download fails, and the ".tar" file you get is really an HTML file saying:
"
403 Forbidden
Forbidden
You don't have permission to access /public/snake_toxins/SNAKE_ALL.tar
on this server.
Apache/2.2.15 (Linux/SUSE) mod_ssl/2.2.15 OpenSSL/1.0.0 PHP/5.3.3 Server at www.cbs.dtu.dk Port 80
"
Next time you get something like this, run the command
file SNAKE_ALL.tar
to see what kind of file it really is.
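More generally, you can keep the error page from being saved at all by telling the downloader to fail on HTTP errors. A sketch with curl (the URL is reconstructed from the 403 message above, so treat it as an example):

$ curl -fLO http://www.cbs.dtu.dk/public/snake_toxins/SNAKE_ALL.tar
# -f: exit non-zero on HTTP errors such as 403 instead of saving the error body
# -L: follow redirects; -O: save under the remote file name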
It appears to be corrupted. Note that the file size is 0 bytes.

Is it possible to verify whether a tar.gz file is corrupted? [duplicate]

This question already has answers here:
How to check if a Unix .tar.gz file is a valid file without uncompressing?
(8 answers)
Closed 4 years ago.
We want to check whether tar.gz files are corrupted.
For example, I use the file command:
file spark2-hdp-yarn-archive.tar.gz
spark2-hdp-yarn-archive.tar.gz: gzip compressed data, was "spark2-hdp-yarn-archive.tar", last modified: Wed Aug 1 15:05:27 2018, max compression
In this case it seems the file is OK, but I am not sure the file command is the right approach for checking the compressed file.
Second, what output would indicate that a tar.gz file is corrupted?
Use this:
arch=spark2-hdp-yarn-archive.tar.gz
if gzip -t "$arch" &>/dev/null; then
    echo ok
else
    echo >&2 "File corrupted"
    exit 1
fi
The gzip tool allows you to test an archive. You would enter a command like
gzip -t spark2-hdp-yarn-archive.tar.gz
According to the man page, the -t flag tests the integrity of the file. You may see many different errors when testing a file, but the one I see most commonly for corrupted archives is
gzip: blah.gz: unexpected end of file
(With blah.gz being the file examined).
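Note that gzip -t only validates the compression layer; it says nothing about the tar stream inside. A complementary check (a sketch, not from the original answers) is to list the archive and throw the listing away:

if tar -tzf spark2-hdp-yarn-archive.tar.gz > /dev/null 2>&1; then
    echo ok
else
    echo >&2 "File corrupted"
fi

Like gzip -t, this reads and decompresses the whole archive, but it also verifies the tar headers inside.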

BusyBox tar: append workaround given limited disk space?

I'm on a Linux system with limited resources and BusyBox -- this version of tar does not support --append, -r. Is there a workaround that will allow me to [1] append files from directory B to an existing tar of files from directory A after [2] making the B-files appear to have come from directory A? (Later, when someone extracts the files, they should all end up in the same directory A.)
Situation: I have a list of files that I want to tar, but I must process some of them first. The files might be used by other processes, so I don't want to edit them in place. I also want to be conservative with disk space, so my script copies only the files it needs to change (rather than copying them all, processing some, and then archiving everything with tar; copying them all might run into disk space issues).
This means the files I want to archive end up in two separate locations. But I want the resulting tar file to appear as if they were all in the same location. Near the end of my script, I end up with two text files listing the A and B files by name.
I think this is straightforward with a full-blown version of tar, but I have to work with the BusyBox version (usage below). Thanks in advance for any ideas!
Usage: tar -[cxtzjaZmvO] [-X FILE] [-f TARFILE] [-C DIR] [FILE]...
Create, extract, or list files from a tar file
Operation:
c Create
x Extract
t List
Options:
f Name of TARFILE ('-' for stdin/out)
C Change to DIR before operation
v Verbose
z (De)compress using gzip
j (De)compress using bzip2
a (De)compress using lzma
Z (De)compress using compress
O Extract to stdout
h Follow symlinks
m Don't restore mtime
exclude File to exclude
X File with names to exclude
T File with names to include
In principle, you just need to append a second tar archive containing the additional files to the end of the existing tar file. It is only slightly more difficult than that.
A tar file consists of any number of repetitions of header + file. The header is always a single 512-byte block, and the file data is padded to a multiple of 512 bytes, so you can think of these units as a variable number of 512-byte blocks. Each header + file unit is independent; its header starts with the full pathname of the file, so there is no requirement that files in a directory be tarred together.
There is one complication. At the end of the tar file there are at least two 512-byte blocks completely filled with 0s. When tar is reading a tar file, it will ignore a single zero-filled header, but a second one will cause it to stop reading the file. If it hits EOF before finding them, it will complain, so the terminating empty blocks are required.
There might be more than two zero blocks, because tar actually writes in records that are a multiple of 512 bytes. GNU tar, for example, writes by default in records of 20 512-byte blocks, so the smallest GNU tar file is normally 10240 bytes.
In order to append new data, you need to first truncate the existing file to eliminate the empty blocks.
I believe that if the tar file was produced by BusyBox there will be exactly two empty blocks, but I haven't inspected the code. That case would be easy: you would only need to truncate the last 1024 bytes of the file before appending the additional files.
For general tar files, it is trickier. If you knew that the files themselves didn't have NUL bytes in them (i.e. they were all simple text files), you could remove empty headers until you found a block with a non-0 byte in it, which wouldn't be too difficult.
What I would do is:
1. Truncate the last 1024 bytes of the tar file.
2. Remember the current size of the tar file.
3. Append a test tar file consisting of the tar of a file with a simple short message.
4. Verify that tar tf correctly shows the test file.
5. Truncate the file back to the remembered length.
6. If tar tf found the test file's name, succeed.
7. If the last 512 bytes of the tar file are all 0s, truncate the last 512 bytes of the file and return to step 2.
8. Otherwise, fail.
If the above procedure succeeds, you can proceed to append the tar archive with the new files.
I don't know if you have a truncate command. If not, you can use dd to copy one file over the top of another at a specified offset (see the seek= option); dd truncates the output file at the end of the copy. You can also use dd to read a single 512-byte block (see the skip= and count= options).
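Putting the simple case together, here is a sketch for an uncompressed tar that ends in exactly two zero blocks (archive.tar and the directory names are examples; assumes stat and truncate are available, e.g. as BusyBox applets):

size=$(stat -c %s archive.tar)              # current size in bytes
truncate -s $((size - 1024)) archive.tar    # drop the two 512-byte end-of-archive blocks
tar -C B -cf - . >> archive.tar             # the appended tar brings its own end-of-archive blocks

Using -C B -c . gives the new members the same ./ prefix as members created with -C A -c ., so everything extracts into one directory.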
The best solution is to cut off the last 1024 bytes and concatenate a new tar after them. In order to append one tar to an existing tar file, both must be uncompressed.
For files like:
$ find a b
a
a/file1
b
b/file2
You can:
$ tar -C a -czvf a.tar.gz .
$ gunzip -c a.tar.gz | { head -c -$((512*2)); tar -C b -c .; } | gzip > a+b.tar.gz
With the result:
$ tar -tzvf a+b.tar.gz
drwxr-xr-x 0/0 0 2018-04-20 16:11:00 ./
-rw-r--r-- 0/0 0 2018-04-20 16:11:00 ./file1
drwxr-xr-x 0/0 0 2018-04-20 16:11:07 ./
-rw-r--r-- 0/0 0 2018-04-20 16:11:07 ./file2
Or you can create both tars in the same command:
$ tar -C a -c . | { head -c -$((512*2)); tar -C b -c .; } | gzip > a+b.tar.gz
Note that this applies to tars generated by BusyBox tar. As mentioned in the previous answer, GNU tar pads to a multiple of 20 blocks by default, so you need to force the blocking factor to 1 (--blocking-factor=1) in order to know in advance how many blocks to cut:
$ tar --blocking-factor=1 -C a -c . | { head -c -$((512*2)); tar -C b -c .; } | gzip | tar --blocking-factor=1 -tzv
Anyway, GNU tar does have --append. The last --blocking-factor=1 is only needed if you intend to append to the resulting tar again.
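For reference, the GNU tar route (a sketch; --append only works on uncompressed archives):

$ tar -cf a.tar -C a .
$ tar -rf a.tar -C b .    # -r / --append rewrites the end-of-archive blocks for you
$ gzip a.tar              # compress afterwards if you need a .tar.gz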

Extracting split .tar files on Linux that separately need to be decrypted

I am trying to extract and decrypt 23 .tar files named as follows:
dev_flash_000.tar.aa.2010_07_29_170013
There are 23 of them, and each needs to be decrypted with an app called dePKG before it is extracted.
I tried this bash script:
for i in `ls dev_flash*`; do ./depkg $i $i.tar ; tar -xvf ./$i.tar ; rm $i.tar; done
and get this error for all 23 files:
read 0x800 bytes of pkg
pkg data # 340 with size 3ec
not inflated, writing 1004 bytes
tar: This does not look like a tar archive
tar: Skipping to next header
tar: Exiting with failure status due to previous errors
I just want to save time :D
You should not use ls in a `...` context (see http://porkmail.org/era/unix/award.html#ls). FWIW:
for i in dev_flash*; do
    ./depkg "$i" -
done | tar -xvf -
Check the depkg manual pages for how to make it write to stdout; if it cannot, use /dev/stdout as the output file. Not only does this save you the temporary files, but running a single tar over the concatenation of the decrypted contents also works properly when the original archive was split at arbitrary positions.
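For instance, if depkg insists on a real output filename, the /dev/stdout variant of the loop above would look like this (a sketch):

for i in dev_flash*; do
    ./depkg "$i" /dev/stdout
done | tar -xvf -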
