How can I convert a tar file to zip using stdout/stdin?
zip's -@ option takes a list of file names from stdin; tar -t provides such a list, but doesn't actually extract the files. Using -xv gives me a list of files but extracts them to disk, and I'm trying to avoid touching the disk.
Something like the following, but obviously the command below will not reproduce the same file structure:
tar -xf somefile.tar -O | zip somefile.zip -
I can do it by temporarily writing the files to disk, but I'm trying to avoid that - I'd like to use pipes only.
Basically, I'm not aware of any "standard" utility that does the conversion on the fly.
But a similar question has been discussed here using Python libraries; the script has a simple structure, so you can probably adapt it to your requirements.
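For reference, here is a minimal sketch of that approach using only Python's standard tarfile and zipfile modules (the script name tar2zip.py is made up; it assumes Python 3.6+, where zipfile can write to unseekable streams, and it buffers each member in memory rather than on disk):

#!/usr/bin/env python3
# tar2zip.py (hypothetical name): read a tar stream on stdin,
# write a zip stream on stdout, never touching the disk.
import sys
import tarfile
import zipfile

# "r|*" reads a (possibly compressed) tar as a forward-only stream,
# so stdin does not need to be seekable.
with tarfile.open(fileobj=sys.stdin.buffer, mode="r|*") as tar, \
     zipfile.ZipFile(sys.stdout.buffer, "w", zipfile.ZIP_DEFLATED) as zf:
    for member in tar:
        if member.isfile():
            # each member is buffered in memory, not written to disk
            zf.writestr(member.name, tar.extractfile(member).read())

Invoked as: python3 tar2zip.py < somefile.tar > somefile.zip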
I have never heard of a tool that performs the conversion directly. As a workaround, if your files are not extremely large, you can extract the tar archive into a tmpfs-mounted directory (or similar), that is, a filesystem that resides completely in memory and doesn't touch the disk. It should be much faster than extracting into a directory on disk.
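If you go that route, a sketch of the round trip (assuming /dev/shm is a tmpfs mount, as it is on most Linux distributions; the archive names are the ones from the question):

import os
import subprocess
import tempfile

out = os.path.abspath("somefile.zip")
# /dev/shm is tmpfs on most Linux systems, so the extracted files
# live in RAM (unless swapped) rather than on disk.
with tempfile.TemporaryDirectory(dir="/dev/shm") as work:
    subprocess.run(["tar", "-xf", os.path.abspath("somefile.tar"), "-C", work], check=True)
    subprocess.run(["zip", "-r", out, "."], cwd=work, check=True)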
Related
I'm looking for a way to update a tgz file.
I know that I can update a tgz file by decompressing it, inserting a file into the directory, and re-compressing it.
But I do not want a decompressed file to be created on my disk.
tar has an 'r' option (--append) that appends files to the end of an archive, but when I use tar with the 'r' option, the console logs 'Cannot update compressed archives' and 'Error is not recoverable: exiting now'.
If it is impossible, so be it, but if there is a way, please let me know.
I modified my question according to Basile Starynkevitch's comment.
This is the command I used for testing:
tar -rf myCompressedFile.tgz insert.txt
Result:
tar: Cannot update compressed archives
tar: Error is not recoverable: exiting now
vim myCompressedFile.tgz
./file1.txt
./file2.txt
./file3.txt
./directory1/
./directory1/file4.txt
What I want after the update:
./file1.txt
./file2.txt
./file3.txt
./directory1/
./directory1/file4.txt
./insert.txt <<<< I want to add a file like this
My tgz file is a bit more than 100 megabytes.
Is there a way to add or update a file in a tar.gz (tgz) archive without decompressing it?
No, there is no way.
Compression by gzip happens on the entire tar archive, after the tarball is made, so a compressed archive cannot be appended to in place. Observe the runtime behavior of your tar command using strace(1) (so read syscalls(2)) or ltrace(1).
Read the documentation of tar(1). Study the source code of GNU tar (it is free software).
On my Debian/Sid/x86-64, the libc6-dev package provides the /usr/include/tar.h header. I invite you to look inside that header file; it describes the format of tarballs.
Consider other approaches: an sqlite database; PostgreSQL, MongoDB, or GDBM; or an afio archive (where each file is perhaps compressed before being archived). See also the tardy utility.
I'm looking for a way to update a tgz file
Why?
Did you consider instead using version control on the underlying files before their archival with tar? Something like git?
But I do not want a decompressed file to be created on my disk.
Why?
The decompressed file might be a temporary one...
Of course, you would need different approaches depending on whether you deal with a few megabytes of data or a few petabytes. And if you are developing a critical embedded application (e.g. medical devices, interplanetary satellite software, DO-178C software systems), things could be different.
There are lots of trade-offs to consider (including money, development time, the economic impact of loss of data or of a corrupted archive, and legal regulations regarding the archive and its data integrity: in a flight recorder you should ensure the data is readable after the crash of an aircraft).
My tgz file is a bit more than 100 megabytes.
This is tiny, and in practice it would fit (on most Linux systems in 2020, even a cheap Raspberry Pi) in the page cache. You might use some file system kept in RAM (e.g. tmpfs).
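And if the only hard constraint is that no decompressed copy lands on your disk, you can rewrite the archive in a single streaming pass, appending the new file along the way. A minimal sketch with Python's tarfile module, reusing the filenames from your test (the .new.tgz output name is just an example; the decompressed bytes flow through memory only, never into a file):

import tarfile

# "r|gz" and "w|gz" are forward-only streams: the old archive is
# decompressed and the new one recompressed on the fly, so no
# decompressed tar file is ever written to disk.
with tarfile.open("myCompressedFile.tgz", "r|gz") as src, \
     tarfile.open("myCompressedFile.new.tgz", "w|gz") as dst:
    for member in src:
        # extractfile() returns None for directories and links,
        # which is exactly what addfile() expects for those members
        dst.addfile(member, src.extractfile(member))
    dst.add("insert.txt", arcname="./insert.txt")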
I compressed a large regular Unix file (.dat) using the tar -cvzf command. The file is around 200 GB in size.
After compression it became 27 GB in size. But while reading data in the compressed file I can see anonymous data added at the start of the file.
Is this possible?
I tried to decompress the file again and found that the decompressed file has no such anonymous records.
The GNU tar command is free software. Please study its source code, and of course read its man page, tar(1).
Indeed, a tar archive starts with a header documented in header file tar.h. There is a POSIX standard related to tar.
See also Peter Miller's tardy utility.
Don't confuse tar archives with zip archives, which are handled by Info-ZIP (the zip and unzip commands).
GNU zip - a compressor, the gzip program, which can be started by tar, notably by your tar czvf command - is also free software, and of course you should study its source code if interested.
Some Unix shells (notably sash), and multi-call binaries like busybox, have a builtin tar.
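You can check for yourself that the "anonymous data" at the start is exactly this metadata; a small sketch (the filename is a placeholder for yours):

import gzip

with open("file.dat.tgz", "rb") as f:
    print(f.read(2) == b"\x1f\x8b")     # gzip magic number: first two bytes of any .gz stream

with gzip.open("file.dat.tgz", "rb") as f:
    block = f.read(512)                 # a tar member header is one 512-byte block
    print(block[0:100].rstrip(b"\0"))   # the "name" field of the first member
    print(block[257:262])               # b"ustar": the magic described in tar.h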
I tried to decompress the file again and found that the decompressed file has no such anonymous records.
AFAIK, most Linux filesystems try to implement more or less the POSIX standard, based upon the read(2) and write(2) system calls, and they don't know about records. If you need "records", consider using databases (like sqlite or PostgreSQL) or indexed files (like GDBM), both built above Linux file systems or block devices.
Read also a good textbook on operating systems.
Notice that "a large regular unix file" is just a sequence of bytes. There is no notion of records inside it, except as a convention used by user-space programs through syscalls(2). See also path_resolution(7) and inode(7).
I am trying different ways to update/write an image on a Linux device and am using rsync for this.
For file-system synchronization, rsync checks and transfers only missing/changed files, reducing the bandwidth used.
Similarly, I created a 10 MB binary file (original.bin), made a few changes to produce modified.bin, and tried rsyncing original.bin. The first time, rsync transferred the whole file, as there was no copy on the device. Then modified.bin was renamed to original.bin and rsynced again; this time only the changes were transferred. I want to know whether the same holds for .dd.xz files. I have two .dd.xz files (image1.dd.xz, and image2.dd.xz which has a few DLLs and Mono packages added); when these files are extracted to .dd files, rsync transfers only the changes.
But when I rsync the files as .dd.xz, the whole file is transferred again. Can someone help me understand whether this is expected behaviour, or whether rsync treats .dd files the same as any other files?
.xz is the extension used by the xz compression tool. Compressed files defeat rsync's delta-transfer algorithm: a small change in the input alters essentially the whole compressed stream after the point of the change, so rsync finds almost no matching blocks to reuse (see the sketch below).
Consider whether you're better off using dd images without compressing them. You can (de)compress them faster using the pixz command, which does its job in parallel using all available processors.
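To make that concrete, a self-contained sketch (the exact numbers will vary with the input and compressor settings):

import lzma

data = bytes(range(256)) * 4096        # ~1 MiB of sample input
changed = bytearray(data)
changed[len(changed) // 2] ^= 0xFF     # flip a single byte in the middle
a = lzma.compress(data)                # .xz files use LZMA, like the xz tool
b = lzma.compress(bytes(changed))
same = sum(x == y for x, y in zip(a, b))
print(f"{same} of {min(len(a), len(b))} compressed bytes are positionally identical")
# The two streams match up to roughly the change point and diverge after it,
# which leaves rsync's block matching with almost nothing to reuse.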
Usually I create archives of data on Linux at the command line with tar and gzip (or pigz, which uses parallel processing for compression).
However, listing the contents of such an archive is painfully slow because of the sequential format of tar archives. This is especially true if an archive contains many files that are several GB each.
What is an alternative to this combination for creating gzipped tar archives of files on Linux? In particular, I'm looking for something that allows retrieval of the list or tree of files inside an archive, similar to tar, but much more performant.
zip? The zip file format contains a catalog of the contents (at the end, IIRC), which can be retrieved with zipinfo(1).
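The difference is easy to demonstrate; a sketch using Python's standard modules (the archive names are placeholders):

import tarfile
import zipfile

# zip: the central directory at the end of the file is read directly,
# so listing costs roughly O(number of entries)
with zipfile.ZipFile("archive.zip") as zf:
    print(len(zf.namelist()))

# tar.gz: every byte must be decompressed and every 512-byte header
# visited in sequence, so listing costs O(size of the whole archive)
with tarfile.open("archive.tar.gz", "r:gz") as tf:
    print(len(tf.getnames()))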
7zip is probably the best solution nowadays.
The ZIP format is a bit outdated and was designed for the FAT filesystem, which is where many of its limitations come from.
dar might also be an option. But as far as I can tell there is only one developer and no community around it (unlike 7zip, which has several forks and ports made by independent developers).
As we know, we can use rsync, tar, or cp to transfer directories or files on an NFS file system or between servers.
I'm just curious: which way is best for transferring a complex directory?
Or would writing a customized C program be a simple and fast way to copy the files or recreate the directory tree (including links)?
The best way is to use rsync. It is famous for its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files at the destination. rsync is widely used for backups and mirroring, and as an improved copy command for everyday use.
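For intuition, the weak rolling checksum at the heart of that delta-transfer algorithm looks roughly like this (a simplified sketch after Tridgell's rsync paper, not the actual rsync source, which pairs it with a strong hash per block):

def weak_sum(block):
    # s1: plain byte sum; s2: position-weighted sum (both mod 2**16)
    s1 = sum(block) & 0xFFFF
    s2 = sum((len(block) - i) * x for i, x in enumerate(block)) & 0xFFFF
    return s1, s2

def roll(s1, s2, old_byte, new_byte, blocklen):
    # slide the window one byte in O(1) instead of re-summing it:
    # this is what lets rsync try a match at every byte offset cheaply
    s1 = (s1 - old_byte + new_byte) & 0xFFFF
    s2 = (s2 - blocklen * old_byte + s1) & 0xFFFF
    return s1, s2

data = b"the quick brown fox jumps over the lazy dog"
n = 8
s1, s2 = weak_sum(data[0:n])
for k in range(1, len(data) - n + 1):
    s1, s2 = roll(s1, s2, data[k - 1], data[k - 1 + n], n)
    assert (s1, s2) == weak_sum(data[k:k + n])  # rolling update equals a fresh sum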
The purpose of the tar command is to "store and extract files from an archive file known as a tarfile", not to transfer data between servers. And cp is simpler than the rsync command: it replaces all the files at the destination.
As for a customized C program, I think it is difficult to optimize something that has already been optimized.