Is it possible to remove all files that start with a certain name from a compressed tarball without extracting and recreating the archive?
The tar file format is a streaming format, so it would be possible to do this by reading the old file, skipping over the unwanted file(s), and copying all the data you want to keep to a new tar file. If the tar file is also compressed (e.g. .tar.gz), then you would have to uncompress, filter, recompress, and write.
I don't know of any existing tools to do this, but it should be reasonably straightforward using the Python tarfile module, for example.
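For example, a rough sketch with the tarfile module might look like the following. The archive names and the prefix are just placeholders, not anything specific to your setup:

#!/usr/bin/env python3
# Sketch: copy a .tar.gz to a new archive, dropping every member whose
# name starts with a given prefix. "old.tar.gz", "new.tar.gz" and PREFIX
# are placeholders -- adjust them to your own archive.
import tarfile

PREFIX = "unwanted"   # drop members whose names start with this

with tarfile.open("old.tar.gz", "r:gz") as src, \
     tarfile.open("new.tar.gz", "w:gz") as dst:
    for member in src:
        if member.name.startswith(PREFIX):
            continue                                       # skip unwanted entries
        if member.isfile():
            dst.addfile(member, src.extractfile(member))   # copy header plus data
        else:
            dst.addfile(member)                            # dirs, symlinks etc. carry no data

Once the new archive looks right, you can replace the old one with it.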
You may post-process the tar stream with tardy, which can filter a tar file to remove (or rename) some files. You probably want to use the -EXclude option, or perhaps the -Remove_Prefix one. (tardy has case sensitive options).
I used the following:
gzip -9 -c -r <some_directory> > directory.gz
How do I decompress this directory?
I have tried
gunzip directory.gz
I am just left with a single file, not a directory structure.
As others have already mentioned, gzip is a file compression tool and not an archival tool. It cannot work with directories. When you run it with -r, it will find all files in a directory hierarchy and compress them, i.e. replacing path/to/file with path/to/file.gz. When you pass -c the gzip output is written to stdout instead of creating files. You have effectively created one big file which contains several gzip-compressed files.
Now, you could look for the gzip file header/magic number, which is the byte pair 1f 8b, and then reconstruct your files manually.
The sensible thing to do now is to create backups (if you haven't already). Backups always help (especially with problems such as yours). Create a backup of your directory.gz file now. Then read on.
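If you want to try that manual route, a rough sketch might look like the following. Note that this is only a heuristic (the byte pair 1f 8b can also occur inside compressed data, so expect some false positives), it does not recover the original file names, and the names directory.gz and recovered_NNNN.bin are just placeholders:

#!/usr/bin/env python3
# Heuristic sketch: scan directory.gz for the gzip magic bytes (1f 8b) and
# try to decompress a single gzip member starting at each occurrence.
# Run this on a backup copy of the file.
import zlib

data = open("directory.gz", "rb").read()

count = 0
pos = data.find(b"\x1f\x8b")
while pos != -1:
    d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)    # expect a gzip header
    try:
        content = d.decompress(data[pos:])               # one member; trailing data is ignored
    except zlib.error:
        content = None                                   # not a real member header
    if content is not None:
        with open(f"recovered_{count:04d}.bin", "wb") as out:
            out.write(content)
        count += 1
    pos = data.find(b"\x1f\x8b", pos + 1)

print(f"wrote {count} candidate files")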
Fortunately, there's an easier way than manually reconstructing all files: using binwalk, a forensics utility which can be used to extract files from within other files. I tried it with a test file, which was created the same way as yours. Running binwalk -e file.gz will create a folder with all extracted files. It even manages to reconstruct the original file names. The hierarchy of the directories is probably lost. But at least you have your file contents and their names back. Good luck!
Remember: backups are essential.
(For completeness' sake: What you probably intended to run: tar czf directory.tar.gz directory and then tar xf directory.tar.gz)
gzip will compress one or more files, but it is not meant to function as an archive utility. The posted command line yields N compressed file images concatenated to stdout and redirected to the named output file; unfortunately, things like filenames and directory structure would not be recorded. A pair of commands like this should work:
(create)
tar -czvf dir.tar.gz <some-dir>
(extract)
tar -xzvf dir.tar.gz
I want to download, decompress, and use a pretrained model from tensorflow-hub.
After downloading I end up with a 1.tar.tar file, which I probably need to extract / decompress in order to be able to use it.
I can't wrap my head around how; I am working in a Linux terminal.
If your tar file is compressed with gzip, use this command to decompress it. Make sure to be in the directory of the tar.tar file; it will extract everything into the directory you are currently in.
$ tar xvzf 1.tar.tar
Where,
x: This option tells tar to extract the files.
v: The “v” stands for “verbose”; this option lists each file in the archive one by one as it is processed.
z: The z option is important: it tells tar to decompress the archive with gzip.
f: This option tells tar that the next argument is the name of the archive file to work with.
Nice to know:
A tarball is a group or archive of files that are bundled together using the tar command and have the .tar file extension.
When backing up one or more _very_large_ files using tar with compression (-j or -z), how does GNU tar manage the use of temporary files and memory?
Does it backup and compress the files block by block, file by file, or some other way?
Is there a difference between the way the following two commands use temporary files and memory?
tar -czf data.tar.gz ./data/*
tar -cf - ./data/* | gzip > data.tar.gz
Thank you.
No temporary files are used by either command. tar works completely in a streaming fashion. Packaging and compressing are completely separated from each other, and when you use the -z or -j option (or similar) the compression is likewise done through a pipe.
For each file tar puts into an archive, it first writes a header record which contains the file's path, its owner, permissions, and so on, and also its size. The size needs to be known up front (which is why putting the output of a stream into a tar archive isn't easy without using a temp file). After this header, the plain contents of the file follow. Since the size is already recorded in the header ahead of the data, the end of the file is unambiguous, so the next file in the archive can follow directly. No temporary files are needed anywhere in this process.
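To make that concrete, here is a small sketch (not production code) that reads data.tar.gz as a stream and prints each member's name and size straight from the raw 512-byte header blocks; nothing is ever written to a temporary file. It ignores details such as pax/GNU extension headers for very long names, which would show up as extra pseudo-entries:

#!/usr/bin/env python3
# Sketch: walk a .tar.gz as a stream, printing each member's name and size
# taken directly from the 512-byte tar header that precedes its data.
import gzip

BLOCK = 512

with gzip.open("data.tar.gz", "rb") as stream:            # decompression is streamed, too
    while True:
        header = stream.read(BLOCK)
        if len(header) < BLOCK or header == b"\x00" * BLOCK:
            break                                          # end-of-archive marker
        name = header[0:100].rstrip(b"\x00").decode()
        size = int(header[124:136].rstrip(b"\x00 ") or b"0", 8)   # size is stored as octal text
        print(f"{name}: {size} bytes")
        # skip the member's data, which is padded to a multiple of 512 bytes
        stream.read((size + BLOCK - 1) // BLOCK * BLOCK)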
This stream of bytes is handed to whichever of the implemented compression algorithms you chose, and these also do not create any temporary files. I'm going out on a limb a bit here, because I don't know every compressor by heart, but all the ones I have come across work as plain stream filters and do not create temporary files.
I have a tar file called test.tgz; inside it are the following files:
tool.foo
atest.you
btest.you
ctest.you
t.you
I want to rename the files inside test.tgz to be:
0.foo
0.you
1.you
2.you
3.you
How could I accomplish this without extracting the files and repacking them?
Even though you can't rename the files inside the tar archive itself, you can rename them with a sed expression on the fly while they are being extracted. The option to tar is --transform [sed-expression].
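If you would rather do the same thing with the Python tarfile module, a rough sketch of the rename-on-extract idea could look like this (the name mapping is only an example built from the list in the question):

#!/usr/bin/env python3
# Sketch: extract test.tgz, writing each member out under a new name,
# without rewriting the archive itself. RENAMES is only an example mapping.
import tarfile

RENAMES = {
    "tool.foo": "0.foo",
    "atest.you": "0.you",
    "btest.you": "1.you",
    "ctest.you": "2.you",
    "t.you": "3.you",
}

with tarfile.open("test.tgz", "r:gz") as archive:
    for member in archive:
        if not member.isfile():
            continue                                   # only regular files carry data
        target = RENAMES.get(member.name, member.name)
        with archive.extractfile(member) as src, open(target, "wb") as dst:
            dst.write(src.read())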
You do need to extract the files before you can rename them; while files are inside a tgz, they cannot be modified in place.
Patches are frequently released for my CMS system. I want to be able to extract the tar file containing the patched files for the latest version directly over the full version on my development system. When I extract a tar file it puts it into a folder with the name of the tar file. That leaves me to manually copy each file over to the main directory. Is there a way to force the tar to extract the files into the current directory and overwrite any files that have the same filenames? Any directories that already exist should not be overwritten, but merged...
Is this possible? If so, what is the command?
Check out the --strip-components (or --strippath) argument to tar; it might be what you're looking for.
EDIT: you might want to throw --keep-newer into the mix, so any locally modified files aren't overwritten. And I would suggest testing new releases on a development server, then using rsync or subversion to carry over the changes.
I tried getting --strip-components to work and, while I didn't try that hard, I didn't get it working. It kept flattening the directory structure. In searching, I came across the following command that seems to do exactly what I want:
pax -r -f patch.tar -s '/patch\///'
It's not tar, but hey, it works... Replace the word "patch" with whatever your tar file name is.
The option '--strip-components' allows you to trim parts of the embedded filenames. With that it is possible to do what you want.
For more help check out http://www.gnu.org/software/tar/manual/html_section/transform.html
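If it helps to see the effect, here is a small sketch of the same idea with the Python tarfile module (patch.tar and the single stripped component are assumptions based on the example above): it drops the first path component of every member and extracts the rest into the current directory, overwriting files that already exist but leaving everything else in place.

#!/usr/bin/env python3
# Sketch of the effect of --strip-components=1: remove the leading path
# component (e.g. "patch/") from every member and extract the rest into
# the current directory, overwriting existing files.
import tarfile

with tarfile.open("patch.tar", "r") as archive:
    for member in archive:
        parts = member.name.split("/", 1)
        if len(parts) < 2:
            continue                  # the top-level directory entry itself
        member.name = parts[1]        # "patch/app/file.txt" -> "app/file.txt"
        archive.extract(member, path=".")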
I have just done:
tar -xzf patch.tar.gz
And it overwrites all the files that the patch contains.
I.e., if the patch was created for the contents of the app folder, I would extract it there. Results would be like this:
tar.gz contains: oldfolder/someoldfile.txt, oldfolder/newfolder/newfile.txt
Before, app looks like:
app/oldfolder/someoldfile.txt
Afterwards, app looks like:
app/oldfolder/someoldfile.txt
app/oldfolder/newfolder/newfile.txt
And the "someoldfile.txt" is actually updated to what was in the tar.gz
Maybe this doesn't work with regular tar, only tar.gz, but I doubt it. I think it should work for everything, as long as the user has write permissions.