How to zip folder that contains more than 12GB data - linux

I need to zip a folder that contains a large number of files.
When I tried to zip it on the command line, I got: zip error: Input file read failure
I searched the net and found: "The .ZIP file format only handles file lengths that can be contained in a 32-bit integer." If so, that must be the cause of the error, because my folder is larger than 12GB. Is there any way to raise the size limit for zip?
Or is there another way to solve this?
I am using CentOS 5.
Thanks.

You can use tar for that.
Just try:
$ tar -cvzf compress.tgz /path/to/your/data
and to extract it:
$ tar -xvzf compress.tgz

Gzip can handle any size that your file system can handle. You might want to first "tar" the content into one file; with GNU tar you can use the z option to do the compression in one go.
7-Zip is also a good alternative to ZIP. It has been ported to many platforms and its size limits are much higher.
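A minimal sketch of the tar plus gzip approach from the answers above, assuming tar and gzip are installed. The scratch directory and file names are made up for illustration and stand in for /path/to/your/data.

```shell
set -eu

# Hypothetical scratch area standing in for /path/to/your/data
work=$(mktemp -d)
mkdir -p "$work/data/sub"
printf 'hello\n' > "$work/data/a.txt"
printf 'world\n' > "$work/data/sub/b.txt"

# -C switches to the parent directory first, so the archive stores relative paths
tar -czf "$work/compress.tgz" -C "$work" data

# gzip -t checks the compressed stream without writing anything to disk
gzip -t "$work/compress.tgz"

# Extract into a separate directory and compare the result with the source
mkdir "$work/out"
tar -xzf "$work/compress.tgz" -C "$work/out"
diff -r "$work/data" "$work/out/data" && echo "round trip OK"
```

Running gzip -t before relying on a multi-gigabyte archive is cheap insurance, since it reads the whole file but needs no extra disk space.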

Related

tar.xz - how to check uncompressed files size without decompress whole archive

I have a problem checking the uncompressed size of a tar.xz archive without extracting the whole archive.
I know that for tar.gz I can use gzip or zcat, but for tar.xz that doesn't work.
Any suggestions on how to do this?
tar tvfa <file> will give you a list including file sizes. Check man tar for details. You should also note that file size does not equate to disk usage, since the last part of a file can occupy a full file system block.
xz -l file.xz will give you the compressed/uncompressed sizes of the archive in a human-readable way (KiB MiB suffixes) without actually decompressing (no CPU or disk load). It won't list the files inside the tar archive, though.
xz -lv file.xz is verbose, and you can filter it to get the exact sizes in bytes.
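One way to get the exact byte count is xz's machine-readable listing. This is a sketch assuming XZ Utils and a tar with xz support are installed; sample.tar.xz is a throwaway archive created only for the demonstration. In the --robot --list output, the fifth tab-separated field of the line that starts with "file" is the uncompressed size in bytes.

```shell
set -eu
work=$(mktemp -d)

# Build a small throwaway archive (hypothetical name) around a 100000-byte payload
head -c 100000 /dev/zero > "$work/payload.bin"
tar -cJf "$work/sample.tar.xz" -C "$work" payload.bin

# Read the uncompressed size from the xz index; nothing is decompressed
tar_bytes=$(xz --robot --list "$work/sample.tar.xz" | awk -F'\t' '$1 == "file" { print $5 }')
echo "uncompressed tar size: $tar_bytes bytes"
```

The number is the size of the .tar stream, so it includes tar headers and padding and will be somewhat larger than the sum of the file sizes.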

how to extract files from a large (30Gb+) zip file on linux server

1) extract from large zip file
I want to extract files from a large zip file (30Gb+) on the linux server. There is enough free disk space.
I've tried jar xf dataset.zip. However, it fails with an error that the push back buffer is full, and it does not extract all of the files.
I tried unzip, but it reports that the zipfile is corrupt:
Archive: dataset.zip
warning [dataset.zip]: 35141564204 extra bytes at beginning or within zipfile
(attempting to process anyway)
error [dataset.zip]: start of central directory not found;
zipfile corrupt.
(please check that you have transferred or created the zipfile in the
appropriate BINARY mode and that you have compiled UnZip properly)
I tried zip -FF dataset.zip --out data.zip, and it fails with an "entry too big" error:
zip error: Entry too big to split, read, or write (Poor compression resulted in unexpectedly large entry - try -fz)
Is there any way I can efficiently extract files from a really large zip file?
2) extract certain files from a large zip file
If I only want certain files from this large zip file, is there any way to extract only those? For example, data1.txt from dataset.zip? It seems that I can't use any zip or unzip command (I always get the zipfile corrupt problem).
Thanks!
I've solved the problem. It turns out to be a zip corruption problem. I first fixed the file with:
zip -FF filename1.zip --out filename2.zip -fz
then unzipped the fixed zipfile:
unzip filename2.zip
and successfully extracted all the files!
Many thanks to Fattaneh Talebi for the help!
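You can extract a specific file from a zip:
$ unzip -j "zipedfile.zip" "file.txt"
file.txt is the file you want to extract from zipedfile.zip (-j drops the stored directory path, so the file lands in the current directory).
I had a similar problem, and the unar command solved it.
unar file.zip
Try extracting one directory at a time to retain control and know where you left off.
e.g. (t only lists the matching entries; use x in its place to extract them):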
tar tv --wildcards -f siteRF.tar './Movies/*'
I tried all the steps mentioned above to unzip the file, but failed miserably.
My last resort was to copy my zip file (11.1GB) onto a hard drive and unzip it using 7-Zip on Windows 8.
Worked like a charm :D
I also solved it in a similar manner to Irene W. It was a corrupted zip. I first fixed the file with:
zip -FF original_corrupted.zip --out fixed_file.zip -fz
then unzipped the fixed zip file:
unzip fixed_file.zip

zip command skip errors

zip -r file.zip folder/
This is the typical command I use to zip a directory. However, it runs on an active site, so images are constantly being deleted or updated. The command fails when a file that existed when zip started is gone by the time zip gets to compressing it (at least from what I can see).
I have no option to stop the editing of the files in this case, so my only hope is to skip them. The number of images being edited is insignificant compared to the sheer size of the directory: 2-3 files changing out of 100,000 is nothing, but the error stops the compression altogether.
I tried to look for a way around this but have had no luck. I could just be looking in the wrong direction, but I can't believe this is impossible.
Here is an example error:
zip I/O error: No such file or directory
zip error: Input file read failure (was zipping uploads/2010/03/file.jpg)
Is there some way to use the zip command or something similar to zip a folder, but if it runs into an error when it hits a file, it just skips it?
tar is always a good option for compressing in Linux. Beware that zip may also have a file size limit issue.
tar vcfz file.tar.gz folder
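A sketch of how the tar suggestion might be scripted for a directory that changes during the run, assuming GNU tar. GNU tar documents exit status 1 as "some files differ" (for a create run, files changed while being archived) and 2 as a fatal error, so the wrapper below accepts 0 and 1. The function name archive_live_dir and the sample paths are hypothetical, and the demo directory is static, so it only exercises the success path.

```shell
set -u

# Hypothetical helper: archive a directory, tolerating files that change mid-run
archive_live_dir() {
    src=$1
    out=$2
    tar -czf "$out" -C "$(dirname "$src")" "$(basename "$src")"
    status=$?
    case $status in
        0) echo "archive complete" ;;
        1) echo "archive written, but some files changed during the run" ;;
        *) echo "tar failed with status $status" >&2; return "$status" ;;
    esac
    return 0
}

# Throwaway directory standing in for the live site folder
work=$(mktemp -d)
mkdir -p "$work/folder/uploads"
printf 'img\n' > "$work/folder/uploads/file.jpg"
archive_live_dir "$work/folder" "$work/file.tar.gz"
```

Whether status 1 is acceptable depends on how much a slightly inconsistent snapshot matters for the backup in question.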

Create Zip File Fedora 17

Hi, I work with Fedora 17 and I want to create a zip file.
There are four files in my directory /tmp/manager/:
sos.prj
sos.shp
sos.shx
sbb.shh
I want to create a zip file from the sos.prj, sos.shp and sos.shx files.
I want to use grep. In other words, I want to create the zip file from grep's result.
Can anybody help me?
zip myArchiveName *.{prj,shp,shx}
This will zip all files with the listed extensions into a zip file named myArchiveName.zip
People normally use tar with a compressor, for example:
tar czf manager.tar.gz /tmp/manager/
tar cjf manager.tar.bz2 /tmp/manager/
tar cJf manager.tar.xz /tmp/manager/
The .xz format often yields the highest compression ratio, and it is the compression format used for .rpm in Fedora.
Perhaps the OP was thinking of using grep to find all the sos.* files, but as @Impossibear says, it's easier to just use a wildcard. If you want to focus on the sos files, you could use zip myArchive sos.*

How to extract filename.tar.gz file

I want to extract an archive named filename.tar.gz.
Using tar -xzvf filename.tar.gz doesn't extract the file. It gives this error:
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error exit delayed from previous errors
If file filename.tar.gz gives this message: POSIX tar archive,
the archive is a plain tar, not a gzip archive.
Unpack it without the z option, which is only for gzipped (compressed) archives:
mv filename.tar.gz filename.tar # optional
tar xvf filename.tar
Or try a generic Unpacker like unp (https://packages.qa.debian.org/u/unp.html), a script for unpacking a wide variety of archive formats.
Determine the file type:
$ file ~/Downloads/filename.tbz2
/User/Name/Downloads/filename.tbz2: bzip2 compressed data, block size = 400k
As far as I can tell, the command is correct, ASSUMING your input file is a valid gzipped tar file. Your output says that it isn't. If you downloaded the file from the internet, you probably didn't get the entire file; try again.
Without more knowledge of the source of your file, nobody here is going to be able to give you a concrete solution, just educated guesses.
I had the same error.
The result of the command:
file hadoop-2.7.2.tar.gz
is hadoop-2.7.2.tar.gz: HTML document, ASCII text
So the file is not in gzip format, because of a problem with the download or something else.
This sometimes happens with files downloaded with the "wget" command. Just 10 minutes ago, I was trying to install something on a server from the command line and the same thing happened. As a solution, I downloaded the .tar.gz file to my machine from the web and then uploaded it to the server via FTP. After that, the "tar" command worked as expected.
Internally, tar xzvf <filename> calls the gzip binary found via the PATH environment variable to decompress the archive. Sometimes third-party tools install a custom gzip binary that is not compatible with the tar binary.
It is a good idea to check which gzip binary is on your PATH with which gzip and make sure that the correct one is called.
A tar.gz is a tar file inside a gzip file, so first you must decompress the gzip file with gunzip filename.tar.gz, and then use tar to untar it. However, since gunzip says it isn't in gzip format, you can see what format it is in with file filename.tar.gz, and use the appropriate program to open it.
Check to make sure that the file is complete. This error message can occur if you only partially downloaded a file or if it has major issues. Check the MD5 checksum.
The other thing you must verify is that the file you're trying to unpack is not empty and is valid.
In my case I wasn't downloading the file correctly. After double-checking and making sure I had the right file, I could unpack it without any issues.
So, basically, the supposed tar.gz file is not really in the format it should be. This can be checked with the Linux file command. For example, for a genuine .tgz file, the output looks like this:
root@f562353fc1ab:/app# file kafka_2.13-2.8.0.tgz
kafka_2.13-2.8.0.tgz: gzip compressed data, from FAT filesystem (MS-DOS, OS/2, NT), original size modulo 2^32 75202560
So, the source you received the file from hasn't sent it in the correct format. If you downloaded the supposed .tgz file from a URI, maybe the URI is wrong. In my case, I faced the same issue while extracting the Kafka binary (.tgz file). It turned out that the URI passed to wget was incorrect. At least for Kafka, to get the correct download link from the downloads page (https://kafka.apache.org/downloads.html), we must go to the page that the binary's link points to. Once on that page, we get the exact link to download the binary. Also, during the download, wget displays the type of the file being downloaded. It prints something like this to indicate the type:
Length: unspecified [text/html] --> Incorrect URI.
Length: 71403603 (68M) [application/octet-stream] --> Correct URI.
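Several of the answers above come down to checking what the file really is before extracting it. A rough sketch of that check using only gzip and tar, with gzip -t standing in for the file command; the function name extract_checked and the sample files are hypothetical.

```shell
set -u

# Hypothetical helper: pick tar flags based on what the file actually contains
extract_checked() {
    archive=$1
    dest=$2
    if gzip -t "$archive" 2>/dev/null; then
        tar -xzf "$archive" -C "$dest"      # a real gzip stream
    elif tar -tf "$archive" >/dev/null 2>&1; then
        tar -xf "$archive" -C "$dest"       # readable by tar despite the .gz name
    else
        echo "$archive is neither gzip nor tar; re-download it" >&2
        return 1
    fi
}

work=$(mktemp -d)
mkdir "$work/out"
printf 'data\n' > "$work/hello.txt"

# A plain tar that was misnamed .tar.gz, as in the question
tar -cf "$work/filename.tar.gz" -C "$work" hello.txt
# An HTML error page saved under an archive name, as in the wget cases
printf '<html>Not Found</html>\n' > "$work/bad.tar.gz"

extract_checked "$work/filename.tar.gz" "$work/out" && echo "extracted"
extract_checked "$work/bad.tar.gz" "$work/out" || echo "rejected"
```

The last branch is the one that matters for the wget cases: an HTML page fails both checks, so the script stops before tar prints its less obvious "not in gzip format" error.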

Resources