linux wget download tar.gz as html?

I know the problem may seem naive to most of you, but I couldn't find a solution. I'm using a Linux virtual machine and trying to download apache-drill.tar.gz from the link in the 10-minute tutorial they provide. I ran wget "link", but when I tried to extract the file with tar -xvzf, I got these error messages:
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
Initially I thought the problem was the file format, so I ran mv apache-drill.tar.gz apache-drill.tar and tar -xvf apache-drill.tar, but that still failed (the file format is different).
Then I checked the size of the file. The original .tar.gz is 134MB, but ls -lh apache-drill.tar.gz shows only 34KB, which is much smaller than the real file. So I'm guessing wget is not downloading the file properly: instead of the tarball, it is actually downloading an HTML page.
How can I fix this?
Thanks!

The HTML document you downloaded consists primarily of a bunch of links to mirrors hosting the actual file.
Pick one of the mirrors and download the file from that link instead.
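As a rough sketch of how to find those mirror links without opening a browser, the function below pulls every href ending in .tar.gz out of the saved HTML page. The name list_archive_links is made up for illustration, and the pattern assumes the links are written as double-quoted href attributes, which may not hold for every mirror page.

```shell
# list_archive_links FILE
# Print each distinct link ending in .tar.gz found in href="..." attributes.
# Assumption: links are double-quoted and each sits on a single line.
list_archive_links() {
    grep -o 'href="[^"]*\.tar\.gz"' "$1" \
        | sed 's/^href="//; s/"$//' \
        | sort -u
}

# Example (not run here):
#   list_archive_links apache-drill.tar.gz
#   wget "<one of the printed links>"
```

Relative links would still need the site's base URL prepended before being passed to wget.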

Related

Why does wget change the file type?

I am downloading a tar.gz file using this command:
wget -c http://example.com/example.tar.gz
When the download completes, I check the file type:
file example.tar.gz
The output is: HTML document text. Then I unpack the file:
tar -zxvf example.tar.gz
The output:
gzip: example.tar.gz: not in gzip format
Why would this happen? What should I do to make it right? I am absolutely sure it is a tar.gz file, because when I download it another way it unpacks successfully. This command works:
curl -O http://example.com/example.tar.gz
wget doesn't change the file type but what you got as a response for your request was not what you expected. You expected a tar archive but got an HTML file. wget just stores what it got under the name you specified.
Have a look at the contents of HTML file you got. It probably tells you what error has happened (that's the typical case why you get an HTML file instead of what you expected).
wget doesn't change the file format. http://example.com/example.tar.gz, despite what its name may suggest, isn't a tar.gz archive, but just a plain old HTML page.
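To read what the server actually said, it can help to strip the markup from the saved page. The following is only a crude sketch: html_text is a hypothetical helper name, and the sed expression removes tags that open and close on the same line, so it will not cope with every page.

```shell
# html_text FILE
# Print up to 20 non-blank lines of the file with HTML tags removed.
# Assumption: tags do not span multiple lines.
html_text() {
    sed -e 's/<[^>]*>//g' -e '/^[[:space:]]*$/d' "$1" | head -n 20
}

# Example (not run here):
#   html_text example.tar.gz
```

If the output is an error message such as a 404 or a login prompt, the URL or the request is the thing to fix, not the archive.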

How to create a Linux compatible zip archive of a directory on a Mac

I've tried multiple ways of creating a zip or a tar.gz on the mac using GUI or command lines, and I have tried decompressing on the Linux side and gotten various errors, from things like "File.XML" and "File.xml" both appearing in a directory, to all sorts of others about something being truncated, etc.
Without listing all my experiments on the command line on the Mac and Linux (using tcsh), what would be two bulletproof commands to:
1) make a zip file of a directory (with no __MACOSX folders)
2) unzip / untar (whatever) the Mac zip on Linux with no errors (and no __MACOSX folders)
IT staff on the Linux side said they "usually use .gz and use gzip and gunzip commands".
Thanks!
After much research and experimentation, I found this works every time:
1) Create a zipped tar file with this command on the Mac in Terminal:
tar -cvzf your_archive_name.tar.gz your_folder_name/
2) When you FTP the file from one server to another, make sure you do so with binary mode turned on.
3) Unzip and untar in two steps in your shell on the Linux box (in this case, tcsh):
gunzip your_archive_name.tar.gz
tar -xvf your_archive_name.tar
On my Mac and in bash over ssh I use the following simple commands:
Create the archive (-czf)
tar -czf NAME.tgz FOLDER
Extract the archive (-xzf)
tar -xzf NAME.tgz
Best, Mike
First off, File.XML and File.xml cannot both appear in a directory on a default (case-insensitive) HFS+ file system. It is possible, but very unusual, for someone to format a case-sensitive HFSX file system that would permit that. Can you really create two such files and see them listed separately?
You can use the -X option with zip to prevent resource forks and extended attributes from being saved. You can also add -x "*.DS_Store" (quoted, with the wildcard so it matches in subdirectories) to leave out those files as well.
For tar, precede it with COPYFILE_DISABLE=true or setenv COPYFILE_DISABLE true, depending on your shell. You can also throw in an --exclude=.DS_Store.
Your "IT Staff" gave you a pretty useless answer, since gzip can only compress one file. gzip has to be used in combination with tar to archive a directory.
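Putting the tar advice above together, a minimal sketch for a Bourne-style shell might look like this. The function name make_clean_tarball is invented here; COPYFILE_DISABLE only has an effect with the macOS tar and is ignored elsewhere. Under tcsh you would run setenv COPYFILE_DISABLE true first and then the tar command on its own.

```shell
# make_clean_tarball ARCHIVE DIR
# Create a gzipped tar of DIR without .DS_Store files or __MACOSX folders.
# COPYFILE_DISABLE stops the macOS tar from adding "._" resource-fork files.
make_clean_tarball() {
    COPYFILE_DISABLE=true tar \
        --exclude='.DS_Store' \
        --exclude='__MACOSX' \
        -czf "$1" "$2"
}

# Example (not run here):
#   make_clean_tarball your_archive_name.tar.gz your_folder_name
# On the Linux side:
#   tar -xzf your_archive_name.tar.gz
```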

wget: How to rename all the already downloaded files to the names given by wget --content-disposition?

I have downloaded 1200 JPEG files using wget, but the file names are based on the links they were downloaded from.
For example,
http://www.*.*.*/index.php?id=0MwfTcqbP9dl1_icR3_gVezE8tlpUJt-wumA5hHjpjk downloads the file with the name index.php?id=0MwfTcqbP9dl1_icR3_gVezE8tlpUJt-wumA5hHjpjk.jpg, but its name on the server is different. Now I want all the files to have the names they have on the server.
One way is to delete all the files and re-download them with the wget option --content-disposition, but the total size of the download is 8GB, and downloading it again is not a good option.
How can I rename all the downloaded files to their names on the server?
Edit: The names of the JPEG files downloaded from the links using wget --content-disposition or a browser would be like 2014:08:09_18:07:51_IMG_5543.jpg (not created by wget; it's the original name on the server, the uploader's file name). I want all the files to have their original names without downloading them again.
If the web server supports HEAD requests, you can use a command like wget --server-response --spider $URL; otherwise, you can request only the byte range 0-1 so that almost nothing is downloaded. Once you have the response headers, you can write a script to do the renaming. – Wu Yongzheng
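The renaming script that comment hints at could start from something like the sketch below. filename_from_headers is a hypothetical helper that reads response headers on standard input and prints the filename= value of the Content-Disposition header. It assumes a plain filename="..." or filename=... parameter and does not handle the encoded filename*= form.

```shell
# filename_from_headers
# Read HTTP response headers on stdin; print the Content-Disposition
# filename, if present. Assumption: plain filename= parameter only.
filename_from_headers() {
    tr -d '\r' \
        | sed -n 's/^[[:space:]]*[Cc]ontent-[Dd]isposition:.*filename="\{0,1\}\([^";]*\)"\{0,1\}.*$/\1/p' \
        | head -n 1
}

# Example (not run here). wget prints the headers on stderr, hence 2>&1:
#   name=$(wget --server-response --spider "$URL" 2>&1 | filename_from_headers)
#   [ -n "$name" ] && mv -- "$old_file" "$name"
```

In a real run you would loop over the original list of URLs, since the server name can only be looked up from the URL each file came from.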

How to extract filename.tar.gz file

I want to extract an archive named filename.tar.gz.
Using tar -xzvf filename.tar.gz doesn't extract the file. It gives this error:
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error exit delayed from previous errors
If file filename.tar.gz reports "POSIX tar archive", the archive is a plain tar, not a gzip archive.
Unpack a plain tar without the z flag, which is only for gzipped (compressed) archives:
mv filename.tar.gz filename.tar # optional
tar xvf filename.tar
Or try a generic Unpacker like unp (https://packages.qa.debian.org/u/unp.html), a script for unpacking a wide variety of archive formats.
Determine the file type:
$ file ~/Downloads/filename.tbz2
/User/Name/Downloads/filename.tbz2: bzip2 compressed data, block size = 400k
As far as I can tell, the command is correct, ASSUMING your input file is a valid gzipped tar file. Your output says that it isn't. If you downloaded the file from the internet, you probably didn't get the entire file, so try again.
Without more knowledge of the source of your file, nobody here is going to be able to give you a concrete solution, just educated guesses.
I had the same error.
The result of the command
file hadoop-2.7.2.tar.gz
was hadoop-2.7.2.tar.gz: HTML document, ASCII text
so the file is not in gzip format, because of a problem with the download or something else.
It happens sometimes for the files downloaded with "wget" command. Just 10 minutes ago, I was trying to install something to server from the command screen and the same thing happened. As a solution, I just downloaded the .tar.gz file to my machine from the web then uploaded it to the server via FTP. After that, the "tar" command worked as it was expected.
Internally, tar xzvf <filename> calls the gzip binary found via the PATH environment variable to decompress the archive. Sometimes third-party tools install a custom gzip binary that is not compatible with the tar binary.
It is a good idea to check the gzip binary in your PATH with which gzip and make sure that a correct gzip binary is called.
A tar.gz is a tar file inside a gzip file, so first you must decompress the gzip file with gunzip filename.tar.gz, and then use tar to untar it. However, since gunzip says it isn't in gzip format, you can see what format it is in with file filename.tar.gz, and use the appropriate program to open it.
Check to make sure that the file is complete. This error message can occur if you only partially downloaded a file or if it has major issues. Check the MD5sum.
The other thing you must verify is that the file you're trying to unpack is not empty and is valid.
In my case I wasn't downloading the file correctly. After double-checking and making sure I had the right file, I could unpack it without any issues.
So, basically, the file that looks like a tar.gz is not really in that format. This can be confirmed with the Linux file command. For example, for a genuine .tgz file, the command output looks like this:
root#f562353fc1ab:/app# file kafka_2.13-2.8.0.tgz
kafka_2.13-2.8.0.tgz: gzip compressed data, from FAT filesystem (MS-DOS, OS/2, NT), original size modulo 2^32 75202560
So the source you received the file from didn't send it in the correct format. If you downloaded the supposed .tgz file from a URI, maybe the URI is wrong. In my case, I hit the same issue while extracting the Kafka binary (a .tgz file). It turned out that the URI passed to wget was incorrect. At least for Kafka, to get the correct download link from the downloads page (https://kafka.apache.org/downloads.html), we must follow the link for the binary to the page it points to. On that page, we get the exact link to download the binary. Also, during the download, wget displays the type of the file being downloaded. It prints something like this to indicate the type:
Length: unspecified [text/html] --> Incorrect URI.
Length: 71403603 (68M) [application/octet-stream] --> Correct URI.
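Several of the answers above come down to "find out what the file really is, then pick the matching tar flags". Here is a minimal sketch of that idea which looks at the leading magic bytes instead of trusting the file name. The function names detect_archive and extract_any are invented for this example, and only gzip, bzip2, and plain tar are recognised.

```shell
# detect_archive FILE
# Print gzip, bzip2, tar, or unknown based on the file's magic bytes.
detect_archive() {
    magic=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    case "$magic" in
        1f8b*)  echo gzip ;;    # gzip streams start with bytes 1f 8b
        425a68) echo bzip2 ;;   # bzip2 streams start with "BZh"
        *)
            # tar archives carry the string "ustar" at byte offset 257
            tag=$(dd if="$1" bs=1 skip=257 count=5 2>/dev/null | tr -d '\000')
            if [ "$tag" = "ustar" ]; then echo tar; else echo unknown; fi
            ;;
    esac
}

# extract_any FILE
# Extract with flags matching the detected format; fail on anything else.
extract_any() {
    case "$(detect_archive "$1")" in
        gzip)  tar -xzf "$1" ;;
        bzip2) tar -xjf "$1" ;;
        tar)   tar -xf "$1" ;;
        *)     echo "$1 is not a tar archive; try: file $1" >&2; return 1 ;;
    esac
}
```

An HTML page saved under a .tar.gz name comes back as unknown, which is the cue to recheck the download URL instead of retrying tar with different flags.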

How to zip folder that contains more than 12GB data

I have a requirement to zip a folder which contains a large number of files.
When I tried to zip it on the command line, it showed zip error: Input file read failure
I searched the net and found "The .ZIP file format, only handles file lengths that can be contained in a 32-bit integer." If so, then that must be the cause of the error I got, because my folder size is more than 12GB. Is there any way to extend the file size that can be zipped?
Or is there another way to solve this?
I am using CentOS 5.
Thanks.
You can use tar for that.
Just try:
$ tar -cvzf compress.tgz /path/to/your/data
and to extract it:
$ tar -xvzf compress.tgz
Gzip can handle any size that your file system can handle. You might want to first "tar" the content into one file; using GNU tar, you can use the z option to do the compression in one go.
7Zip is also a good alternative to ZIP. It is ported to many platforms and the size limits are much higher.
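With an archive this large, it may be worth confirming that it is intact before removing the source data. A small sketch, assuming the archive was made with tar -czf as in the answer above (verify_tgz is a name made up for this example):

```shell
# verify_tgz ARCHIVE
# Succeed only if the gzip stream is intact and tar can list every member.
verify_tgz() {
    gzip -t "$1" && tar -tzf "$1" > /dev/null
}

# Example (not run here):
#   tar -czf compress.tgz /path/to/your/data && verify_tgz compress.tgz
```

A truncated or corrupted archive makes gzip -t exit with a non-zero status, so the check is easy to use in a script.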

Resources