Recover a text file in Linux

My project's C source code file got corrupted while making a .tgz of the files. I wanted to make a .tgz of four files: common.c, common.h, myfile.c and myfile.h. I mistyped the tar command; by mistake I used
tar -cvf common.* myfile.* project.tgz
This corrupted the common.c file. Is there any way to recover it?

If the file is really important, then unmount the related block device and look through it with the strings command. If the file contains something relatively unique, you can grep for it, and with a little luck you can save most of the file.
This can also work with mounted block devices, but the chance of losing the data is higher. It only works on systems where you can access the block device directly.
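A rough sketch of that approach (assuming the filesystem lives on /dev/sda1 and that common.c contains a reasonably unique string such as a function name; both are placeholders to adjust):
umount /dev/sda1                      # stop new writes to the device first
strings /dev/sda1 | grep -A 200 -B 200 'some_unique_function_name' > recovered.txt
The strings output for a whole device can be huge, so narrowing it down with grep context (-A/-B) around an identifier you know is in the file is usually the practical way in.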

Related

GZip an entire directory

I used the following:
gzip -9 -c -r <some_directory> > directory.gz
How do I decompress this directory?
I have tried
gunzip directory.gz
I am just left with a single file and not a directory structure.
As others have already mentioned, gzip is a file compression tool and not an archival tool. It cannot work with directories. When you run it with -r, it will find all files in a directory hierarchy and compress them, i.e. replacing path/to/file with path/to/file.gz. When you pass -c the gzip output is written to stdout instead of creating files. You have effectively created one big file which contains several gzip-compressed files.
Now, you could look for the gzip file header/magic number, which is 1f8b and then reconstruct your files manually.
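For example, with GNU grep you could list the byte offsets at which each compressed member starts (a sketch; \x1f\x8b is the gzip magic number, \x08 the usual deflate method byte that follows it):
grep -aboP '\x1f\x8b\x08' directory.gz    # -a binary as text, -b byte offset, -o matched bytes only
Each offset marks the start of one member, which could then be carved out with dd or tail -c and fed to gunzip.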
The sensible thing to do now is to create backups (if you haven't already). Backups always help (especially with problems such as yours). Create a backup of your directory.gz file now. Then read on.
Fortunately, there's an easier way than manually reconstructing all files: using binwalk, a forensics utility which can be used to extract files from within other files. I tried it with a test file, which was created the same way as yours. Running binwalk -e file.gz will create a folder with all extracted files. It even manages to reconstruct the original file names. The hierarchy of the directories is probably lost. But at least you have your file contents and their names back. Good luck!
Remember: backups are essential.
(For completeness' sake: What you probably intended to run: tar czf directory.tar.gz directory and then tar xf directory.tar.gz)
gzip will compress one or more files, but it is not meant to function as an archive utility. The posted command line yields N compressed file images concatenated to stdout and redirected to the named output file; unfortunately, the directory structure is not recorded. A pair like this should work:
(create)
tar -czvf dir.tar.gz <some-dir>
(extract)
tar -xzvf dir.tar.gz

Use of temporary files and memory when using tar to backup very _large_ files with compression

When backing up one or more _very_large_ files using tar with compression (-j or -z) how does GNU tar manage the use of temporary files and memory?
Does it backup and compress the files block by block, file by file, or some other way?
Is there a difference between the way the following two commands use temporary files and memory?
tar -czf data.tar.gz ./data/*
tar -cf - ./data/* | gzip > data.tar.gz
Thank you.
No temporary files are used by either command. tar works in a completely streaming fashion. Packaging and compression are separate steps, and with the -z or -j option (or similar) the compressor is fed through a pipe internally.
For each file tar puts into an archive, it first writes a file-info header containing the file's path, owner, permissions, etc., and also its size. The size has to be known up front (which is why putting the output of a stream into a tar archive isn't easy without a temporary file). After this header, the plain contents of the file follow. Since the size is known and already part of the header, the end of the file is unambiguous, so the next file in the archive can follow directly. No temporary files are needed anywhere in this process.
This stream of bytes is given to any of the implemented compression algorithms which also do not create any temporary files. Here I'm going out on a limb a bit because I don't know all compression algorithms by heart, but all that I ever came in touch with do not create temporary files.
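You can see that per-file header for yourself by dumping the first 512-byte block of any archive (a quick sketch; data.tar is a placeholder name):
dd if=data.tar bs=512 count=1 2>/dev/null | od -c | head -n 12
The dump shows the first member's name, mode, owner and, at offset 124, its size as an octal string, which is exactly what lets tar stream the contents that follow without any temporary file.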

zip command skip errors

zip -r file.zip folder/
This is the typical command I use to zip a directory; however, it is on an active site, so images are constantly deleted or updated. This leads to the command failing because a file was there when the process started but is gone by the time zip actually tries to compress it (at least from what I can see).
I have no option to stop the editing of the files in this case, so my only hope is to just skip them; the number of images being edited is insignificant compared to the sheer size of the directory. 2-3 files changing out of 100,000 is nothing, but the error stops the compression altogether.
I tried to look for a way around this but have had no luck. I could just be looking in the wrong direction, but I feel there must be a way.
Here is an example error:
zip I/O error: No such file or directory
zip error: Input file read failure (was zipping uploads/2010/03/file.jpg)
Is there some way to use the zip command or something similar to zip a folder, but if it runs into an error when it hits a file, it just skips it?
tar is always a good option for compressing on Linux. Beware that zip may also have a file size limit issue.
tar vcfz file.tar.gz folder
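If you stick with GNU tar, it also addresses the original problem more directly: the --ignore-failed-read option makes tar warn about files it cannot read (for example, files deleted mid-run) instead of aborting. A sketch, assuming GNU tar:
tar --ignore-failed-read -czvf file.tar.gz folder/   # unreadable or vanished files produce warnings, not a fatal error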

How to extract filename.tar.gz file

I want to extract an archive named filename.tar.gz.
Using tar -xzvf filename.tar.gz doesn't extract the file; it gives this error:
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error exit delayed from previous errors
If file filename.tar.gz reports "POSIX tar archive", the archive is a plain tar, not a gzip-compressed one.
Unpack it without the z flag, which is only for gzipped (compressed) archives:
mv filename.tar.gz filename.tar # optional
tar xvf filename.tar
Or try a generic unpacker like unp (https://packages.qa.debian.org/u/unp.html), a script for unpacking a wide variety of archive formats.
To determine the file type:
$ file ~/Downloads/filename.tbz2
/User/Name/Downloads/filename.tbz2: bzip2 compressed data, block size = 400k
As far as I can tell, the command is correct, ASSUMING your input file is a valid gzipped tar file. Your output says that it isn't. If you downloaded the file from the internet, you probably didn't get the entire file, try again.
Without more knowledge of the source of your file, nobody here is going to be able to give you a concrete solution, just educated guesses.
I had the same error. The result of the command
file hadoop-2.7.2.tar.gz
was hadoop-2.7.2.tar.gz: HTML document, ASCII text
The reason the file is not in gzip format is a problem with the download or something similar.
This sometimes happens with files downloaded using the wget command. Just 10 minutes ago, I was trying to install something on a server from the command line and the same thing happened. As a solution, I downloaded the .tar.gz file to my machine from the web and then uploaded it to the server via FTP. After that, the tar command worked as expected.
Internally, tar xzvf <filename> will call the gzip binary found via the PATH environment variable to decompress the files in the tar archive. Sometimes third-party tools install a custom gzip binary that is not compatible with the tar binary.
It is a good idea to check which gzip binary is in your PATH with which gzip and make sure the correct one is being called.
A tar.gz is a tar file inside a gzip wrapper, so first you must decompress the gzip file with gunzip -d filename.tar.gz, and then use tar to untar it. However, since gunzip says it isn't in gzip format, you can see what format it actually is with file filename.tar.gz and use the appropriate program to open it.
Check to make sure that the file is complete. This error message can occur if you only partially downloaded a file or if it has major issues. Check the MD5sum.
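For example, if the site publishes checksums, you can compare them locally (a sketch; the checksum file name is a placeholder for whatever the download page provides):
md5sum filename.tar.gz                 # compare the printed hash against the published one
sha256sum -c filename.tar.gz.sha256    # or verify automatically if a checksum file is provided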
The other thing you must verify is that the file you're trying to unpack is not empty and is valid.
In my case I wasn't downloading the file correctly; after double-checking and making sure I had the right file, I could unpack it without any issues.
So, basically, the seemingly .tar.gz file is not really in the format it should be. This can be ascertained using the Linux file command. For example, for a genuine .tgz file, the output will look like this:
root@f562353fc1ab:/app# file kafka_2.13-2.8.0.tgz
kafka_2.13-2.8.0.tgz: gzip compressed data, from FAT filesystem (MS-DOS, OS/2, NT), original size modulo 2^32 75202560
So the source from which you received the file hasn't sent it in the correct format. If you downloaded the supposedly .tgz file from a URI, maybe the URI is wrong. In my case, I faced this issue while extracting the Kafka binary (a .tgz file); it turned out the URI given to wget was incorrect. At least for Kafka, to get the correct download link from the downloads page (https://kafka.apache.org/downloads.html), you must follow the link for the binary to an intermediate page, which then shows the exact download link. Also, during the download, wget displays the type of the file being fetched. It will print something like this to indicate the type:
Length: unspecified [text/html] --> Incorrect URI.
Length: 71403603 (68M) [application/octet-stream] --> Correct URI.
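You can also sanity-check a URL without downloading the whole file by asking only for the headers (a sketch; the URL is a placeholder):
curl -sIL https://example.com/kafka_2.13-2.8.0.tgz | grep -i 'content-type'
A Content-Type of text/html means you would be saving an HTML page, while application/octet-stream (or application/x-gzip) suggests the real archive.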

Can you use tar to apply a patch to an existing web application?

Patches are frequently released for my CMS system. I want to be able to extract the tar file containing the patched files for the latest version directly over the full version on my development system. When I extract a tar file it puts it into a folder with the name of the tar file. That leaves me to manually copy each file over to the main directory. Is there a way to force the tar to extract the files into the current directory and overwrite any files that have the same filenames? Any directories that already exist should not be overwritten, but merged...
Is this possible? If so, what is the command?
Check out the --strip-components (or, in older tar versions, --strip-path) argument to tar; it might be what you're looking for.
EDIT: you might want to throw --keep-newer-files into the mix, so any locally modified files aren't overwritten. And I would suggest testing new releases on a development server, then using rsync or subversion to carry over the changes.
I tried getting --strip-components to work and, while I didn't try that hard, I didn't get it working. It kept flattening the directory structure. In searching, I came across the following command that seems to do exactly what I want:
pax -r -f patch.tar -s'/patch///'
It's not tar, but hey, it works... Replace the words "patch" with whatever your tar file name is.
The option '--strip-components' allows you to trim parts of the embedded filenames. With that it is possible to do what you want.
For more help check out http://www.gnu.org/software/tar/manual/html_section/transform.html
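A minimal sketch, assuming the patch archive wraps everything in a single top-level directory and that the application lives in /var/www/app (both assumptions, adjust to your layout):
tar -xzf patch.tar.gz --strip-components=1 -C /var/www/app   # drop the leading directory and unpack over the app
Existing files with the same names are overwritten, while directories that already exist are simply merged into.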
I have just done:
tar -xzf patch.tar.gz
And it overwrites all the files that the patch contains.
I.e., if the patch was created for the contents of the app folder, I would extract it there. Results would be like this:
tar.gz contains: oldfolder/someoldfile.txt, oldfolder/newfolder/newfile.txt
Before, app looks like:
app/oldfolder/someoldfile.txt
Afterwards, app looks like:
app/oldfolder/someoldfile.txt
app/oldfolder/newfolder/newfile.txt
And the "someoldfile.txt" is actually updated to what was in the tar.gz
Maybe this doesn't work with regular tar, only tar.gz. But I doubt it. I think it should work for everything, as long as user has write permissions.
