bash: sending compressed files while compressing others - linux

I have a simple bash script to download a lot of log files over a pretty slow network. I can compress the logs on the remote side. Basically it's:
ssh: compress whole directory
scp: download archive
ssh: rm archive
Using lzma gives great compression, but compressing the whole directory is slow. Is there any tool or easy way to write a script that lets me compress a single file (or a bunch of files) and start downloading them while other files/chunks are still being compressed? I was thinking about launching compression for every single file in the background, and in a loop downloading/rsyncing the files with the correct extension. But then I don't know how to check whether a compression process has finished its work.

The easiest way would be to compress them in transit using ssh -C. However, if you have a large number of small files, you are better off tarring and gzip/bzipping the whole directory at once using tar zcf or tar jcf. You may be able to start downloading the file while it's still being written, though I haven't tried it.

The best solution I found. In my case it was:
ssh -T user@example.com 'tar ... | lzma -5 -' > big.compressed

Try ssh-ing into your server, changing to the log directory, and using GNU Parallel to compress all the logs in parallel; as each one finishes, rename it to add a .done suffix so you can rsync it safely. On the server you would run:
cd <LOG DIRECTORY>
rm -f ALL_COMPRESSED.marker
parallel 'lzma {}; mv {}.lzma {}.lzma.done' ::: *.log
touch ALL_COMPRESSED.marker
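The same compress-then-rename-on-completion pattern also works with plain background jobs if GNU Parallel isn't available. A minimal local sketch (gzip stands in for lzma here, and the log names are made up):

```shell
# Set up two fake log files to stand in for the real logs.
mkdir -p logs
printf 'one\n' > logs/a.log
printf 'two\n' > logs/b.log

# Compress each log in the background; rename only after compression
# succeeds, so a downloader polling for *.done never sees a half-written file.
for f in logs/*.log; do
  ( gzip -9 "$f" && mv "$f.gz" "$f.gz.done" ) &
done
wait   # block until every background compression has finished
touch logs/ALL_COMPRESSED.marker
ls logs/
```

On the downloading side you could then loop something like `rsync -av --include='*.done' --exclude='*' remote:logs/ .` until `ALL_COMPRESSED.marker` appears and every `.done` file has been fetched.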


GZip an entire directory

I used the following:
gzip -9 -c -r <some_directory> > directory.gz
How do I decompress this directory? I have tried
gunzip directory.gz
but I am just left with a single file and not a directory structure.
As others have already mentioned, gzip is a file compression tool and not an archival tool. It cannot work with directories. When you run it with -r, it will find all files in a directory hierarchy and compress them, i.e. replacing path/to/file with path/to/file.gz. When you pass -c the gzip output is written to stdout instead of creating files. You have effectively created one big file which contains several gzip-compressed files.
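For illustration, the situation can be reproduced in miniature (a local sketch with made-up file names). Note that gunzip happily decompresses the concatenated members into one joined output, which is why the per-file boundaries appear lost:

```shell
# Recreate the problem on a small scale.
mkdir -p demo
printf 'alpha\n' > demo/a.txt
printf 'beta\n'  > demo/b.txt
gzip -9 -c -r demo > directory.gz

# gunzip decompresses all concatenated members into a single stream,
# so the contents of both files come out joined together:
gunzip -c directory.gz
```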
Now, you could look for the gzip file header/magic number, which is 1f8b and then reconstruct your files manually.
The sensible thing to do now is to create backups (if you haven't already). Backups always help (especially with problems such as yours). Create a backup of your directory.gz file now. Then read on.
Fortunately, there's an easier way than manually reconstructing all files: using binwalk, a forensics utility which can be used to extract files from within other files. I tried it with a test file, which was created the same way as yours. Running binwalk -e file.gz will create a folder with all extracted files. It even manages to reconstruct the original file names. The hierarchy of the directories is probably lost. But at least you have your file contents and their names back. Good luck!
Remember: backups are essential.
(For completeness' sake: What you probably intended to run: tar czf directory.tar.gz directory and then tar xf directory.tar.gz)
gzip will compress one or more files, but it is not meant to function as an archive utility. The posted command line yields N compressed file images concatenated to stdout, redirected to the named output file; unfortunately, metadata such as filenames and directories is not recorded. A pair of commands like this should work:
(create)
tar -czvf dir.tar.gz <some-dir>
(extract)
tar -xzvf dir.tar.gz

Use of temporary files and memory when using tar to back up very _large_ files with compression

When backing up one or more _very_large_ files using tar with compression (-j or -z) how does GNU tar manage the use of temporary files and memory?
Does it backup and compress the files block by block, file by file, or some other way?
Is there a difference between the way the following two commands use temporary files and memory?
tar -czf data.tar.gz ./data/*
tar -cf - ./data/* | gzip > data.tar.gz
Thank you.
No temporary files are used by either command. tar works in a completely streaming fashion. Packaging and compression are separated from each other and are also done through a piping mechanism when using the -z or -j option (or similar).
For each file tar puts into an archive, it computes a file info datagram that contains the file's path, owner, permissions, etc., and also its size. The size needs to be known up front (which is why putting the output of a stream into a tar archive isn't easy without using a temp file). After this datagram, the plain contents of the file follow. Since the size is known and already part of the file info ahead of it, the end of the file's data is unambiguous, and the next file in the archive can follow directly. No temporary files are needed anywhere in this process.
This stream of bytes is given to any of the implemented compression algorithms which also do not create any temporary files. Here I'm going out on a limb a bit because I don't know all compression algorithms by heart, but all that I ever came in touch with do not create temporary files.
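The claim that -z is essentially a pipe to gzip can be checked locally: the two commands from the question produce the same tar payload. A small sketch with made-up data:

```shell
# Create some sample data.
mkdir -p data
seq 1 1000 > data/numbers.txt

# The two commands from the question:
tar -czf one.tar.gz data
tar -cf - data | gzip > two.tar.gz

# The gzip wrappers may differ in header details (e.g. stored mtime),
# but the underlying tar streams are byte-identical:
gunzip -c one.tar.gz > one.tar
gunzip -c two.tar.gz > two.tar
cmp one.tar two.tar && echo same
```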

Elegantly send local tarball and untar on remote end

All,
This might be a FAQ, but I can't get my search-fu to find it. Namely, I kind of want to do the "reverse" tar pipe. Usually a tar pipe is used to send a local folder to a remote location as a tarball in a single nice command:
tar zcvf - ~/MyFolder | ssh user@remote "cat > ~/backup/MyFolder.tar.gz"
(I hope I got that right. I typed it from memory.)
I'm wondering about the reverse situation. Let's say I locally have a tarball of a large directory and what I want to do is copy it (rsync? scp?) to a remote machine where it will live as the expanded file, i.e.,:
Local: sourcecode.tar.gz ==> send to Remote and untar ==>
Remote: sourcecode/
I want to do this because the "local" disk has inode pressure so keeping a single bigger file is better than many smaller files. But the remote system is one with negligible inode pressure, and it would be preferable to keep it as an expanded directory.
Now, I can think of various ways to do this with &&-command chaining and the like, but I figure there must be a way to do this with tar-pipes and rsync or ssh/scp that I am just not seeing.
You're most of the way there:
ssh user@remote "tar -C /parent/directory -xz -f-" < sourcecode.tar.gz
Where -f- tells tar to extract from stdin, and the -C flag changes directory before untarring.
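The round trip can be simulated locally before trying it over the network. A sketch with made-up names; the ssh line in the comment assumes a hypothetical user@remote:

```shell
# Build a tarball of a sample source tree.
mkdir -p sourcecode
echo 'int main(void){return 0;}' > sourcecode/main.c
tar -czf sourcecode.tar.gz sourcecode

# Over the network this would be:
#   ssh user@remote "tar -C /parent/directory -xz -f-" < sourcecode.tar.gz
# Simulated locally with a stand-in "remote" directory:
mkdir -p remote
tar -C remote -xz -f- < sourcecode.tar.gz
ls remote/sourcecode/
```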

How to create a Linux compatible zip archive of a directory on a Mac

I've tried multiple ways of creating a zip or a tar.gz on the mac using GUI or command lines, and I have tried decompressing on the Linux side and gotten various errors, from things like "File.XML" and "File.xml" both appearing in a directory, to all sorts of others about something being truncated, etc.
Without listing all my experiments on the command line on the Mac and Linux (using tcsh), what would two bulletproof commands be to:
1) make a zip file of a directory (with no __MACOSX folders)
2) unzip / untar (whatever) the Mac zip on Linux with no errors (and no __MACOSX folders)
IT staff on the Linux side said they "usually use .gz and use gzip and gunzip commands".
Thanks!
After much research and experimentation, I found this works every time:
1) Create a zipped tar file with this command on the Mac in Terminal:
tar -cvzf your_archive_name.tar.gz your_folder_name/
2) When you FTP the file from one server to another, make sure you do so with binary mode turned on
3) Unzip and untar in two steps in your shell on the Linux box (in this case, tcsh):
gunzip your_archive_name.tar.gz
tar -xvf your_archive_name.tar
On my Mac and in ssh bash I use the following simple commands:
Create Zip File (-czf)
tar -czf NAME.tgz FOLDER
Extract Zip File (-xzf)
tar -xzf NAME.tgz
Best, Mike
First off, the File.XML and File.xml cannot both appear in an HFS+ file system. It is possible, but very unusual, for someone to format a case-sensitive HFSX file system that would permit that. Can you really create two such files and see them listed separately?
You can use the -X option with zip to prevent resource forks and extended attributes from being saved. You can also throw in a -x .DS_Store to get rid of those files as well.
For tar, precede it with COPYFILE_DISABLE=true or setenv COPYFILE_DISABLE true, depending on your shell. You can also throw in an --exclude=.DS_Store.
Your "IT Staff" gave you a pretty useless answer, since gzip can only compress one file. gzip has to be used in combination with tar to archive a directory.
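Putting the tar advice above together (a sketch with a made-up folder name; run it on the Mac side -- COPYFILE_DISABLE is harmless on Linux, so the same command is portable):

```shell
# Sample folder with a Finder junk file in it.
mkdir -p MyProject
echo 'hello' > MyProject/code.txt
touch MyProject/.DS_Store

# Suppress AppleDouble ._* members and exclude .DS_Store files:
COPYFILE_DISABLE=true tar --exclude='.DS_Store' -czf MyProject.tar.gz MyProject

tar -tzf MyProject.tar.gz   # listing contains no .DS_Store entries
```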

How to repack zip files without using tmp dir?

I have a lot of zip files that I need to repack/recompress in order to work around a bug in MediaWiki 0.1.18.
I can do it with
#!/bin/bash
for f in *.zip; do
  cd tmp || exit 1   # guard so rm -rf never runs in the wrong directory
  rm -rf *
  unzip ../"$f"
  zip -r ../"$f" *
  cd ..
done
but is there a way to do this e.g. with pipes or perhaps a zip option?
gzip -d -c old.gz | gzip >new.gz
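The gzip recompression one-liner above can be tried end to end (a local sketch with a made-up file):

```shell
# Make a quickly-compressed file, then repack it at maximum
# compression through a pipe, with no temporary directory:
printf 'some log data\n' | gzip -1 > old.gz
gzip -d -c old.gz | gzip -9 > new.gz
gunzip -c new.gz
```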
There is a utility called AdvanceCOMP that does exactly what you're looking for. It recompresses ZIP and GZ files (and some others) without intermediary extraction to disk. (I do believe that the mechanism used is to decompress the data and recompress it, but that does not require writing files to disk or regenerating metadata.)
You can't. If you send some bits to zip, it doesn't have a way to know when one file ends and a new one begins.
Actually, you could write your own program to do the job, but from your description that seems like overkill. Also, you are not telling us exactly which bug you are fixing, so other workarounds cannot be suggested.
A bit late, but it may be helpful for those who come later:
zipsplit -n 2147483648 will repack a zip of up to 2 GiB without extraction. But as this command is meant for splitting zip files, there is no option to overwrite the original or specify an output file, only an output directory.
