Why does wget change the file type? - linux

I am downloading a tar.gz file using this command:
wget -c http://example.com/example.tar.gz
When the download completes, I check the file type:
file example.tar.gz
The output is: HTML document text. Unpacking the file:
tar -zxvf example.tar.gz
The output:
gzip: example.tar.gz: not in gzip format
Why does this happen? What should I do to make it right? I am absolutely sure it is a tar.gz file, because downloading it another way and unpacking it works. This command makes it work:
curl -O http://example.com/example.tar.gz

wget doesn't change the file type, but what you got as a response to your request was not what you expected. You expected a tar archive but got an HTML file; wget just stores whatever it receives under the name you specified.
Have a look at the contents of the HTML file you got. It probably tells you what error happened (that's the typical reason you get an HTML file instead of what you expected).
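A quick way to check both the saved file and what the server actually sent (using the URL from the question):
# Print the server response headers without saving anything
wget -S --spider http://example.com/example.tar.gz
# Inspect what was actually saved; an error page is readable HTML
file example.tar.gz
head -n 20 example.tar.gz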

wget doesn't change the file format. http://example.com/example.tar.gz, despite what its name may suggest, isn't a tar.gz archive, but just a plain old HTML page.

Related

using tar to extract latest .gz file into another directory

Below is my code
#!/bin/bash
# Location for backups to be saved.
EXTRACTTO=/opt/test_script
#stores the latest .gz file to be extracted
EXTRACTFROM= ls -t /opt/scripts/AXDB1.clean_pof_backup* | head -1
echo $EXTRACTFROM
tar -xf $EXTRACTFROM -C $EXTRACTTO
EXTRACTTO contains the path where I want to extract my .gz files to.
EXTRACTFROM contains the latest .gz file, which will be extracted.
However, when I pass these variables, which contain directory paths, to the tar command,
it gives an "invalid directory" error.
Can someone tell me how I can accomplish my task here?
Judging from the error message, it seems that EXTRACTTO is incorrect. Print it out and try running the tar command manually.
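As a rough sketch of how the script could look (paths and file pattern taken from the question; the main changes are command substitution for EXTRACTFROM, quoting, and making sure the target directory exists):
#!/bin/bash
# Where to extract to
EXTRACTTO=/opt/test_script
# Newest matching backup; a bare "EXTRACTFROM= ls ..." does not capture the output of ls
EXTRACTFROM=$(ls -t /opt/scripts/AXDB1.clean_pof_backup* | head -1)
echo "$EXTRACTFROM"
# Make sure the directory exists, then extract into it;
# recent GNU tar auto-detects the gzip layer with plain -xf
mkdir -p "$EXTRACTTO"
tar -xf "$EXTRACTFROM" -C "$EXTRACTTO"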

How to untar a .img file?

I needed to back up a .img file; basically all I had to do was compress the file and copy it to another location. At the beginning this was a +800GB file, so I ran this command:
tar -cvf file.img file.tar
Of course I didn't see the problem with the command until this error was printed:
tar: file.tar: Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors
After that error I ran the command properly:
tar -cvf file.tar file.img
And this time it succeeded... The problem came when I realized the original +800GB file was now a 12KB file.
I don't know if I damaged the file or if it was compressed; if so, how do I get back the original size of the file?
I am using SLES 11 (Linux).
When you ran
tar -cvf file.img file.tar
you overwrote file.img, creating a tar file with no contents, even though the tar command appeared to fail (tar truncates the file named after -f before it tries to read the members). So when you swapped the parameters around, your large image file was already gone. Sorry, but I think you've lost the file.
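For reference, a sketch of the flow the question was aiming for, using the file names from the question (verify the archive before deleting or overwriting anything):
# Source file last, archive name right after -f
tar -czvf file.tar.gz file.img
# List the archive contents and sanity-check the sizes
tar -tzvf file.tar.gz
ls -lh file.img file.tar.gz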
"man tar" will help you
-c = create new tar file
-x = extract tar file
and the compression options are -z, -Z, and -j (-z, -Z, -y on BSD and macOS)

Linux wget downloads tar.gz as HTML?

I know the problem may seem naive to most of you, but I couldn't find a solution. I'm using a Linux virtual machine and I'm trying to download apache-drill.tar.gz from the link in the 10-minute tutorial they provide. I did wget "link", but when I tried to extract the file using tar -xvzf it gave me error messages like:
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
Initially I thought it was the file format, so I ran mv apache-drill.tar.gz apache-drill.tar and tar -xvf apache-drill.tar, but still got an error (file format different).
Then I checked the size of the file: the original .tar.gz is 134MB, but ls -lh apache-drill.tar.gz shows only 34KB, which is much smaller than the real tar.gz. So I'm guessing wget is not downloading the file properly; instead of the tar it actually downloads the HTML for me..
How can I fix this?
Thanks!
The HTML document you downloaded consists primarily of a bunch of links to mirrors hosting the actual file.
Pick one of them and download the archive from there.
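Something along these lines, assuming the saved page is named apache-drill.tar.gz as in the question; the mirror URL below is a placeholder you would copy out of that HTML:
# List the mirror links inside the page wget actually saved
grep -o 'http[^"]*apache-drill[^"]*\.tar\.gz' apache-drill.tar.gz
# Re-download from one of the mirrors, overwriting the HTML copy
wget -O apache-drill.tar.gz 'http://MIRROR.example.org/drill/apache-drill.tar.gz'
tar -xzvf apache-drill.tar.gz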

How to extract a filename.tar.gz file

I want to extract an archive named filename.tar.gz.
Using tar -xzvf filename.tar.gz doesn't extract the file. It gives this error:
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error exit delayed from previous errors
If file filename.tar.gz gives this message: POSIX tar archive,
the archive is a tar, not a gzip archive.
Unpack it without the z flag, which is only for gzipped (compressed) archives:
mv filename.tar.gz filename.tar # optional
tar xvf filename.tar
Or try a generic unpacker like unp (https://packages.qa.debian.org/u/unp.html), a script for unpacking a wide variety of archive formats.
To determine the file type:
$ file ~/Downloads/filename.tbz2
/User/Name/Downloads/filename.tbz2: bzip2 compressed data, block size = 400k
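As a small sketch (assuming GNU tar and the file utility), you can pick the extraction flag from what file reports; recent GNU tar will also auto-detect the compression with a plain tar -xvf:
#!/bin/bash
# Pick the extraction flag from the `file` output; $1 is the archive to unpack
archive="$1"
case "$(file -b "$archive")" in
  gzip*)        tar -xzvf "$archive" ;;
  bzip2*)       tar -xjvf "$archive" ;;
  XZ*)          tar -xJvf "$archive" ;;
  "POSIX tar"*) tar -xvf  "$archive" ;;
  *)            echo "unhandled format: $(file -b "$archive")" >&2 ;;
esac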
As far as I can tell, the command is correct, ASSUMING your input file is a valid gzipped tar file. Your output says that it isn't. If you downloaded the file from the internet, you probably didn't get the entire file, try again.
Without more knowledge of the source of your file, nobody here is going to be able to give you a concrete solution, just educated guesses.
I have the same error.
The result of the command:
file hadoop-2.7.2.tar.gz
is: hadoop-2.7.2.tar.gz: HTML document, ASCII text
The reason the file is not in gzip format is a problem with the download or something similar.
It happens sometimes for files downloaded with the "wget" command. Just 10 minutes ago, I was trying to install something on a server from the command line and the same thing happened. As a solution, I downloaded the .tar.gz file to my machine from the web and then uploaded it to the server via FTP. After that, the "tar" command worked as expected.
Internally, tar xzvf <filename> will call the gzip binary found via the PATH environment variable to decompress the files in the tar archive. Sometimes third-party tools install a custom gzip binary which is not compatible with the tar binary.
It is a good idea to check the gzip binary in your PATH with which gzip and make sure that a correct gzip binary is called.
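For example:
# See which gzip tar will find first on PATH, and what it actually is
which gzip
gzip --version
file "$(which gzip)"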
A tar.gz is a tar file inside a gzip file, so first you must decompress the gzip layer with gunzip filename.tar.gz, and then use tar to untar it. However, since gunzip says it isn't in gzip format, you can see what format it actually is with file filename.tar.gz, and use the appropriate program to open it.
Check to make sure that the file is complete. This error message can occur if you only partially downloaded a file or if it has major issues. Check the MD5sum.
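If the site publishes a checksum (the companion .md5 file name below is an assumption; adjust it to whatever the download page actually provides):
# Compute the local checksum and compare it with the published one
md5sum filename.tar.gz
# Or verify directly against a downloaded checksum file
md5sum -c filename.tar.gz.md5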
The other thing to verify is that the file you're trying to unpack is not empty and is valid.
In my case I wasn't downloading the file correctly; after double-checking and making sure I had the right file, I could unpack it without any issues.
So, basically, the seemingly .tar.gz file is not really in the format it should be. This can be checked with the file Linux command. For example, for a genuine .tgz file, the command output will be as below:
root#f562353fc1ab:/app# file kafka_2.13-2.8.0.tgz
kafka_2.13-2.8.0.tgz: gzip compressed data, from FAT filesystem (MS-DOS, OS/2, NT), original size modulo 2^32 75202560
So the source from which you received the file hasn't sent it in the correct format. If you downloaded the supposedly .tgz file from a URI, maybe the URI is wrong. In my case, I faced the same issue while extracting the Kafka binary (.tgz file). It turned out that the URI given to wget was incorrect. At least for Kafka, to get the correct download link from the downloads page (https://kafka.apache.org/downloads.html), you must follow the link for the binary to the page it points to; that page gives the exact link to download the binary. Also, during download, wget displays the type of the file that will be downloaded. It will print something like this to indicate the type:
Length: unspecified [text/html] --> Incorrect URI.
Length: 71403603 (68M) [application/octet-stream] --> Correct URI.
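You can also check this up front, without downloading anything; the URL below is a placeholder for whatever link you are about to fetch:
# Ask for the headers only. "Content-Type: text/html" means the link serves a
# page; "application/octet-stream" (or a similar binary type) means the real file.
wget --spider --server-response 'https://example.org/path/kafka_2.13-2.8.0.tgz'
curl -sI 'https://example.org/path/kafka_2.13-2.8.0.tgz' | grep -i '^content-type'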

Unable to untar a file?

I have written a shellscript which tries to pull a tar file from an ftp server and untar it locally. I need to extract specific files from the tar archive. The filename of the tarfile contains a date; I need to be able to select a tar file based on this date.
abc_myfile_$date.tar is the format of the file I am pulling from the ftp server.
My current code looks like this:
for host in ftpserver
do
ftp -inv $host <<END_SCRIPT
user username password
prompt
cd remotepath
lcd localpath
mget *myfile_$date*.tar
quit
END_SCRIPT
done
for next in `ls localpath/*.tar`
do
tar zxvf $next *required_file_in_tar_file*.dat
done
When I run the script I am not able to untar the files.
I am able to get a single tar file from the FTP server only if I give the exact name of that file. I would like to get a file which has myfile_$date in its name, and then extract from it, to a local path, the specific files whose names contain my required_files.
You get a plain .tar file, but decompress it with the z option. Compressed files (those that require z) normally have a .tar.gz suffix. Try
tar xvf $next *required_file_in_tar_file*.dat
Firstly, if you want to use wildcards for the file name that you're getting from the server, you need to use mget instead of get. Wildcard file expansion (the *) does not work for the get command.
Once you have pulled the file, the tar operation will work as expected. Most modern versions of Linux/BSD have a 'smart' tar which doesn't need the z flag to specify that the tar file is compressed; they figure out that the tarball is compressed on their own and uncompress it automatically, provided the appropriate compression/decompression tool is on the system (bzip2 for .bz2 files, gzip for .gz files).
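As a sketch, the extraction loop from the question could then look like this (file names and patterns are the ones from the question; --wildcards is needed by GNU tar so the member pattern is matched by tar rather than the shell):
# Plain .tar archives: no z flag needed
for next in localpath/*.tar
do
  tar xvf "$next" --wildcards '*required_file_in_tar_file*.dat'
done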
I'm not quite sure, but doesn't the FTP protocol have an mget command for downloading multiple files (instead of get)?
