Convert tar.gz to zip - linux

I've got a large collection of gzipped archives on my Ubuntu webserver, and I need them converted to zips. I figure this would be done with a script, but what language should I use, and how would I go about unzipping and rezipping files?

I'd do it with a bash(1) one-liner:
for f in *.tar.gz;\
do rm -rf "${f%.tar.gz}" ;\
mkdir "${f%.tar.gz}" ;\
tar -C "${f%.tar.gz}" -zxvf "$f" ;\
zip -r "${f%.tar.gz}.zip" "${f%.tar.gz}" ;\
rm -rf "${f%.tar.gz}" ;\
done
It isn't very pretty because I'm not great at bash(1). Note that it deletes any existing directory named after an archive (minus the .tar.gz suffix), both before and after converting, so be sure you understand what it does before running it.
See the bash(1) reference card for more details on the ${foo%bar} syntax.

A simple bash script would be easiest, surely? That way you can just invoke the tar and zip commands.

The easiest solution on Unix platforms may well be to use FUSE with something like archivemount (built on libarchive): http://en.wikipedia.org/wiki/Archivemount
/iaw

You can use node.js and tar-to-zip for this purpose. All you need to do is:
Install node.js with nvm if you do not have it.
And then install tar-to-zip with:
npm i tar-to-zip -g
And use it with:
tar-to-zip *.tar.gz
Also you can convert .tar.gz files to .zip programmatically.
You should install async and tar-to-zip locally:
npm i async tar-to-zip
And then create converter.js with contents:
#!/usr/bin/env node
'use strict';
const fs = require('fs');
const tarToZip = require('tar-to-zip');
const eachSeries = require('async/eachSeries');
const names = process.argv.slice(2);
eachSeries(names, convert, exitIfError);
function convert(name, done) {
const {stdout} = process;
const onProgress = (n) => {
stdout.write(`\r${n}%: ${name}`);
};
const onFinish = (e) => {
stdout.write('\n');
done();
};
const nameZip = name.replace(/\.tar\.gz$/, '.zip');
const zip = fs.createWriteStream(nameZip)
.on('error', (error) => {
// remove the partial archive first; exitIfError() does not return
fs.unlinkSync(nameZip);
exitIfError(error);
});
const progress = true;
tarToZip(name, {progress})
.on('progress', onProgress)
.on('error', exitIfError)
.getStream()
.pipe(zip)
.on('finish', onFinish);
}
function exitIfError(error) {
if (!error)
return;
console.error(error.message);
process.exit(1);
}

Zip files are handy because they offer random access to their members; tar files offer only sequential access.
My solution for this conversion is the shell script below, which calls itself via the tar(1) "--to-command" option (I prefer that to having two scripts). I admit that "untar and zip -r" is faster, though, because zipnote(1) unfortunately cannot work in place.
#!/bin/zsh -feu
## Convert a tar file into zip:
usage() {
setopt POSIX_ARGZERO
cat <<EOF
usage: ${0##*/} [+-h] [-v] [--] {tarfile} {zipfile}
-v verbose
-h print this message
converts the TAR archive into ZIP archive.
EOF
unsetopt POSIX_ARGZERO
}
while getopts :hv OPT; do
case $OPT in
h|+h)
usage
exit
;;
v)
# todo: ignore TAR_VERBOSE from env?
# Pass to the grand-child process:
export TAR_VERBOSE=y
;;
*)
usage >&2
exit 2
esac
done
shift OPTIND-1
OPTIND=1
# when invoked w/o parameters:
if [ $# = 0 ] # todo: or stdin is not terminal
then
# we are invoked by tar(1)
if [ -n "${TAR_VERBOSE-}" ]; then echo $TAR_REALNAME >&2;fi
zip --grow --quiet $ZIPFILE -
# And rename it:
# fixme: this still makes a full copy, so slow.
printf "# -\n#=$TAR_REALNAME\n" | zipnote -w $ZIPFILE
else
if [ $# != 2 ]; then usage >&2; exit 1;fi
# possibly: rm -f $ZIPFILE
ZIPFILE=$2 tar -xaf $1 --to-command=$0
fi
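To illustrate the random-access point, here is a small Python sketch using the standard zipfile module. The helper names and the sample data are made up for the example.

```python
import io
import zipfile


def read_member(zip_bytes, member):
    """Read a single member from a ZIP archive held in memory.

    zipfile consults the central directory at the end of the archive,
    so it can seek straight to the requested member without
    decompressing the entries stored before it.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return zf.read(member)


def build_demo_zip():
    """Build a small three-member archive (hypothetical sample data)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in ("a.txt", "b.txt", "c.txt"):
            zf.writestr(name, "contents of " + name + "\n")
    return buf.getvalue()
```

With a .tar.gz, reaching the last member generally means decompressing everything stored before it.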

Here is a python solution based on this answer here:
import glob, tarfile, zipfile
def convert_one_archive(file_name):
    out_file = file_name.replace('.tar.gz', '.zip')
    with tarfile.open(file_name, mode='r:gz') as tf:
        with zipfile.ZipFile(out_file, mode='a', compression=zipfile.ZIP_DEFLATED) as zf:
            for m in tf.getmembers():
                f = tf.extractfile(m)
                if f is None:
                    # directories and special files carry no data to copy
                    continue
                fl = f.read()
                fn = m.name
                zf.writestr(fn, fl)
for f in glob.glob('*.tar.gz'):
    convert_one_archive(f)

Here is a script based on @Brad Campbell's answer that works on files passed as command arguments, works with other tar file types (uncompressed or any other compression type supported by tarfile), and handles directories in the source tar file. It also prints a warning if the source file contains a symlink or hardlink, since these are converted to regular files. For symlinks, the link is resolved during conversion. This can lead to an error if the link target is not in the tar; it is also potentially dangerous from a security standpoint, so user beware.
#!/usr/bin/python
import sys, tarfile, zipfile, glob, re
def convert_one_archive(in_file, out_file):
    with tarfile.open(in_file, mode='r:*') as tf:
        with zipfile.ZipFile(out_file, mode='a', compression=zipfile.ZIP_DEFLATED) as zf:
            for m in [m for m in tf.getmembers() if not m.isdir()]:
                if m.issym() or m.islnk():
                    print('warning: symlink or hardlink converted to file')
                f = tf.extractfile(m)
                fl = f.read()
                fn = m.name
                zf.writestr(fn, fl)
for in_file in sys.argv[1:]:
    out_file = re.sub(r'\.((tar(\.(gz|bz2|xz))?)|tgz|tbz|tbz2|txz)$', '.zip', in_file)
    if out_file == in_file:
        print(in_file, '---> [skipped]')
    else:
        print(in_file, '--->', out_file)
        convert_one_archive(in_file, out_file)
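Both Python versions above read each member fully into memory before writing it. For large members, a streaming variant might look like the sketch below. It relies on ZipFile.open(name, mode='w'), available since Python 3.6; the function name is made up, and links and special files are simply skipped here rather than resolved.

```python
import shutil
import tarfile
import zipfile


def tar_to_zip_streaming(in_file, out_file):
    """Copy regular files from a tar archive into a new ZIP archive.

    Returns the number of members written. Directories, links and
    special files are skipped in this sketch.
    """
    count = 0
    with tarfile.open(in_file, mode="r:*") as tf, \
            zipfile.ZipFile(out_file, mode="w",
                            compression=zipfile.ZIP_DEFLATED) as zf:
        for member in tf:  # iterates members in archive order
            if not member.isfile():
                continue
            src = tf.extractfile(member)
            # ZipFile.open(..., "w") returns a writable stream, so the
            # member is copied in chunks rather than read in one go.
            with src, zf.open(member.name, mode="w") as dst:
                shutil.copyfileobj(src, dst)
            count += 1
    return count
```

Members larger than 2 GiB would additionally need force_zip64=True in the zf.open() call.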

Related

How to send stdout/stderr to grandparent process?

I have this node.js code:
const k = cp.spawn('bash');
k.stdin.end(`
set -e;
echo; echo; echo 'du results:';
( cd "${extractDir}" && tar -xzvf "${createTarball.pack.value}" ) > /dev/null;
find "${extractDir}/package" -type f | xargs du --threshold=500KB;
`);
k.stdout.pipe(pt(chalk.redBright('this is a big file (>500KB) according to du: '))).pipe(process.stdout);
k.stderr.pipe(pt(chalk.magenta('du stderr: '))).pipe(process.stderr);
The problem is that every grandchild process writes all of its stdout/stderr to the parent.
I want to do something like this instead:
const pid = process.pid;
const k = cp.spawn('bash');
k.stdin.end(`
set -e;
exec 3<>/proc/${pid}/fd/1
echo >&3 ; echo >&3; echo 'du results:' >&3;
( cd "${extractDir}" && tar -xzvf "${createTarball.pack.value}" ) > /dev/null;
find "${extractDir}/package" -type f | xargs du --threshold=500KB ;
`);
k.stdout.pipe(pt(chalk.redBright('this is a big file (>500KB) according to du: '))).pipe(process.stdout);
k.stderr.pipe(pt(chalk.magenta('du stderr: '))).pipe(process.stderr);
but that technique won't work on MacOS because Mac doesn't have the /proc/pid fs feature. Does anyone know what I am trying to do and maybe a good workaround?
Install libsys or some other means of making raw syscalls from node. Then, do the syscall dup(2) with 1 as its argument, and store the result to variable fd. Once you do that, fd will be a number of a file descriptor that's a duplicate of node's stdout, and it will be inherited by child processes. At that point, just remove exec 3<>/proc/${pid}/fd/1 from bash, and replace all of your >&3s with >&${fd}s.
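For comparison, the same dup-and-inherit idea is easy to show in Python, where os.dup and the pass_fds argument of subprocess are part of the standard library. This is only a sketch of the technique, not the node.js solution itself: the function name and the OUT_FD environment variable are invented for the example, and it assumes a POSIX system with bash on the PATH.

```python
import os
import subprocess


def run_with_duplicated_fd(script, target_fd):
    """Run `script` under bash with a duplicate of `target_fd` inherited.

    The duplicate's number is exposed to the script as $OUT_FD (a name
    chosen for this sketch), so the script and anything it spawns can
    write to it with `>&$OUT_FD`. Returns the shell's exit status.
    """
    dup_fd = os.dup(target_fd)  # same open file, new descriptor number
    try:
        env = dict(os.environ, OUT_FD=str(dup_fd))
        proc = subprocess.run(
            ["bash", "-c", script],
            pass_fds=(dup_fd,),         # keep the descriptor open in the child
            stdout=subprocess.DEVNULL,  # the child's own stdout goes nowhere
            env=env,
        )
        return proc.returncode
    finally:
        os.close(dup_fd)
```

Calling run_with_duplicated_fd("echo 'du results:' >&$OUT_FD", 1) would send that line to the caller's real stdout even though the shell's own stdout is discarded.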
I solved this a different way (the original question was a bit of an XY problem). Since I am on macOS and have no /proc/<pid> filesystem, I used mkfifo:
first we call this:
mkfifo "$fifoDir/fifo"
and then we have
const fifo = cp.spawn('bash');
fifo.stdout.setEncoding('utf8');
fifo.stdout
.pipe(pt(chalk.yellow.bold('warning: this is a big file (>5KB) according to du: ')))
.pipe(process.stdout);
fifo.stdin.end(`
while true; do
cat "${fifoDir}/fifo"
done
`);
const k = cp.spawn('bash');
k.stdin.end(`
set -e;
echo; echo 'du results:';
tar -xzvf "${createTarball.pack.value}" -C "${extractDir}" > /dev/null;
${cmd} > "${fifoDir}/fifo";
kill -INT ${fifo.pid};
`);
k.stdout.pipe(process.stdout);
k.stderr.pipe(pt(chalk.magenta('du stderr: '))).pipe(process.stderr);
k.once('exit', code => {
if (code > 0) {
log.error('Could not run cmd:', cmd);
}
cb(code);
});
},
The "fifo" child process is a sibling of process k, as the code above shows. It ends up handling the stdout of one of k's subprocesses; the two communicate through the named pipe created with mkfifo.
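The named-pipe relay can be reduced to a few lines. Here is a minimal Python sketch of it, assuming a POSIX system with bash available; the function name and the FIFO environment variable are made up for illustration.

```python
import os
import subprocess
import tempfile


def relay_through_fifo(producer_script):
    """Run `producer_script` under bash with $FIFO pointing at a named
    pipe, and return everything the script wrote to that pipe.

    $FIFO is a name chosen for this sketch.
    """
    with tempfile.TemporaryDirectory() as tmp:
        fifo_path = os.path.join(tmp, "fifo")
        os.mkfifo(fifo_path)
        env = dict(os.environ, FIFO=fifo_path)
        producer = subprocess.Popen(["bash", "-c", producer_script], env=env)
        # Opening for reading blocks until the producer opens the pipe
        # for writing; read() returns once every writer has closed it.
        with open(fifo_path, "r") as fifo:
            data = fifo.read()
        producer.wait()
        return data
```

Note that the reader sees end-of-file as soon as the last writer closes the pipe, which is presumably why the bash version above wraps cat in a while loop. This sketch also blocks forever if the script never opens the pipe.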

bash: How to transfer/copy only the file names to separate similar files?

I've some files in a folder A which are named like that:
001_file.xyz
002_file.xyz
003_file.xyz
in a separate folder B I've files like this:
001_FILE_somerandomtext.zyx
002_FILE_somerandomtext.zyx
003_FILE_somerandomtext.zyx
Now I want to rename, if possible, with just a command line in the bash all the files in folder B with the file names in folder A. The file extension must stay different.
There is exactly the same amount of files in each folder A and B and they both have the same order due to numbering.
I'm a total noob, but I hope some easy answer for the problem will show up.
Thanks in advance!
ZVLKX
*Example edited for clarification
An implementation might look a bit like this:
renameFromDir() {
useNamesFromDir=$1
forFilesFromDir=$2
for f in "$forFilesFromDir"/*; do
# Put original extension in $f_ext
f_ext=${f##*.}
# Put number in $f_num
f_num=${f##*/}; f_num=${f_num%%_*}
# look for a file in $useNamesFromDir with the same number
set -- "$useNamesFromDir"/"${f_num}"_*.*
[[ $1 && -e $1 ]] || {
echo "Could not find file number $f_num in $useNamesFromDir" >&2
continue
}
(( $# > 1 )) && {
# there's more than one file with the same number; write an error
echo "Found more than one file with number $f_num in $useNamesFromDir" >&2
printf ' - %q\n' "$@" >&2
continue
}
# extract the parts of our destination filename we want to keep
destName=${1##*/} # remove everything up to the last /
destName=${destName%.*} # and past the last .
# write the command we would run to stdout
printf '%q ' mv "$f" "$forFilesFromDir/$destName.$f_ext"; printf '\n'
## or uncomment this to actually run the command
# mv "$f" "$forFilesFromDir/$destName.$f_ext"
done
}
Now, how would we test this?
mkdir -p A B
touch A/00{1,2,3}_file.xyz B/00{1,2,3}_FILE_somerandomtext.zyx
renameFromDir A B
Given that, the output is:
mv B/001_FILE_somerandomtext.zyx B/001_file.zyx
mv B/002_FILE_somerandomtext.zyx B/002_file.zyx
mv B/003_FILE_somerandomtext.zyx B/003_file.zyx
Sorry if this isn't helpful, but I had fun writing it.
This renames items in folder B to the names in folder A, preserving the extension of B.
A_DIR="./A"
A_FILE_EXT=".xyz"
B_DIR="./B"
B_FILE_EXT=".zyx"
FILES_IN_A=`find $A_DIR -type f -name "*$A_FILE_EXT"`
FILES_IN_B=`find $B_DIR -type f -name "*$B_FILE_EXT"`
for A_FILE in $FILES_IN_A
do
A_BASE_FILE=`basename $A_FILE`
A_FILE_NUMBER=(${A_BASE_FILE//_/ })
A_FILE_WITHOUT_EXTENSION=(${A_BASE_FILE//./ })
for B_FILE in $FILES_IN_B
do
B_BASE_FILE=`basename $B_FILE`
B_FILE_NUMBER=(${B_BASE_FILE//_/ })
if [ ${A_FILE_NUMBER[0]} == ${B_FILE_NUMBER[0]} ]; then
mv $B_FILE $B_DIR/$A_FILE_WITHOUT_EXTENSION$B_FILE_EXT
break
fi
done
done
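If Python is an option, the same matching-by-number idea can be sketched as a dry run that only computes the renames. The function name is made up, and it assumes, as in the question, that the number is everything before the first underscore.

```python
import os


def plan_renames(names_dir, target_dir):
    """Return (old_path, new_path) pairs for files in target_dir.

    Files are matched on the text before the first underscore; the new
    name takes its stem from names_dir and keeps the target's extension.
    """
    stems = {}
    for name in os.listdir(names_dir):
        prefix = name.split("_", 1)[0]
        stems[prefix] = os.path.splitext(name)[0]

    plan = []
    for name in sorted(os.listdir(target_dir)):
        prefix = name.split("_", 1)[0]
        if prefix not in stems:
            continue  # no counterpart in names_dir; leave untouched
        ext = os.path.splitext(name)[1]
        plan.append((os.path.join(target_dir, name),
                     os.path.join(target_dir, stems[prefix] + ext)))
    return plan
```

Once the plan looks right, each pair can be passed to os.rename(old, new).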

renaming many gifs in folder

I have a lot of files that have random generated names like notqr64SC51ruz6zso3_250.gif and I would like to rename them to simply 1.gif, 2.gif etc.
What would be the best way to accomplish this?
A simple UNIX shell script:
N=1; for i in *.gif ; do mv -- "$i" "$N.gif" ; N=$((N+1)); done
If you need padding:
N=1
for i in *.gif; do
printf -v new "%06d.gif" ${N}
mv -- "$i" "$new"
N=$((N+1));
done
Under Windows you can use a (similar) batch file:
Rename Multiple files with in Dos batch file
If you can use Python:
import glob
import os
files = glob.glob("/path/to/folder/*.gif")
n = 1
for fn in files:
    # keep the renamed file in the same folder as the original
    os.rename(fn, os.path.join(os.path.dirname(fn), str(n).zfill(6) + '.gif'))
    n += 1

wkhtmltoimage convert html to image Perl and Centos

I am trying to use wkhtmltoimage to convert html and web pages to images using Perl module WKHTMLTOPDF. The script and code below works from the command line but does not work if I call the script from browser.
Update:
If I run the script from the shell as a root user, it runs without
error, if I switch to the domain user where the script is located,
I get that error, seems it is executable permissions for domain owner.
The error is:
error running '/usr/local/bin/wkhtmltoimage': '/usr/local/bin/wkhtmltoimage http://yahoo.com
/home/xxxx/public_html/pdfwebkit/output.png' died with signal 11, with coredump at
/usr/local/perl-5.18.1/lib/site_perl/5.18.1/MooseX/Role/Cmd.pm line 128.
MooseX::Role::Cmd::run(WKHTMLTOPDF=HASH(0x2714260), "http://yahoo.com",
"/home/xxxx/public_html/pdfwebkit/output.png") called at /usr/local/
perl-5.18.1/lib/site_perl/5.18.1/WKHTMLTOPDF.pm line 645 WKHTMLTOPDF::generate(WKHTMLT
OPDF=HASH(0x2714260)) called at htmltoimage.cgi line xxx main::convert_using_WKHTMLTOPDF_image("http://yahoo.com",
"/home/xxxx/public_html/pdfwebkit/output.png") called at htmltoimage.cgi line xx
The code I am using is:
#!/usr/bin/perl
#!C:\perl\bin\perl.exe
print "Content-type: text/html;charset=utf-8\n\n";
use File::Spec::Functions;
use File::Basename;
BEGIN {
$|=1;
use CGI::Carp qw(fatalsToBrowser set_message);
sub handle_errors {
#print "Content-type: text/html;charset=utf-8\n\n";
my $msg = shift;
print qq!<h1><font color="red">Software Error</font></h1>!;
print qq!<p>$msg</p>!;
}
set_message(\&handle_errors);
}
$|=1;
my ($Script, $Bin);
if ($ENV{SCRIPT_FILENAME}) {
($Script, $Bin) = fileparse($ENV{SCRIPT_FILENAME});
}
else {
($Script, $Bin) = fileparse(__FILE__);
}
use WKHTMLTOPDF;
my $outfile = catfile ($Bin, 'output.jpg');
print "Converting url to image file $outfile...<br>\n";
convert_using_WKHTMLTOPDF_image('http://yahoo.com', $outfile);
print "Finished...<br>\n";
exit;
sub convert_using_WKHTMLTOPDF_image {
my ($page, $output) = @_;
my $pdf = new WKHTMLTOPDF;
my $bin = '/usr/local/bin/wkhtmltoimage';
#my $bin = 'C:/Program Files/wkhtmltopdf/bin/wkhtmltoimage.exe';
$pdf->bin_name($bin);
$pdf->_input_file($page);
$pdf->_output_file($output);
#$pdf->grayscale(1);
$pdf->generate;
}
sub convert_html_to_image_direct {
my ($page, $output) = @_;
my $bin = '/usr/local/bin/wkhtmltoimage --quiet ';
my $out = `$bin $page $output`;
print "out: $out<br>\n";
return $out;
}
The code works normally from the browser on Windows.
I have the same issue if I try to use wkhtmltopdf to convert HTML to PDF.
I installed the binaries following the instructions here:
https://gist.github.com/DaRamirezSoto/5489861
# wget http://wkhtmltopdf.googlecode.com/files/wkhtmltoimage-0.11.0_rc1-static-amd64.tar.bz2
# wget http://wkhtmltopdf.googlecode.com/files/wkhtmltopdf-0.11.0_rc1-static-amd64.tar.bz2
# tar xvjf wkhtmltoimage-0.11.0_rc1-static-amd64.tar.bz2
# tar xvjf wkhtmltopdf-0.11.0_rc1-static-amd64.tar.bz2
# chown root:root wkhtmltopdf-amd64
# chown root:root wkhtmltoimage-amd64
# mv wkhtmltopdf-amd64 /usr/bin/wkhtmltopdf
# mv wkhtmltoimage-amd64 /usr/bin/wkhtmltoimage
// dependencies
# yum install -y libXrender libXext openssl openssl-devel fontconfig
I tested this on a CentOS 6 server (from IE 11 on Windows 8.1), and it worked fine after a small change to the path of the folder where I wanted the output saved.
My output was
Converting url to image file /home/user/../bittertruth.jpg...
Finished...
which is just what you expected.

Efficient transfer of console data, tar & gzip/ bzip2 without creating intermediary files

Linux environment. So, we have this program 't_show', when executed with an ID will write price data for that ID on the console. There is no other way to get this data.
I need to copy the price data for IDs 1-10,000 between two servers, using minimum bandwidth, minimum number of connections. On the destination server the data will be a separate file for each id with the format:
<id>.dat
Something like this would be the long-winded solution:
source:
files=`seq 1 10000`
for id in `echo $files`;
do
./t_show $id > $id
done
tar cf - $files | nice gzip -c > dat.tar.gz
dest:
scp user@source:dat.tar.gz ./
gunzip dat.tar.gz
tar xvf dat.tar
That is, write each output to its own file, compress & tar, send over network, extract.
It has the problem that I need to create a new file for each id. This takes up tonnes of space and doesn't scale well.
Is it possible to write the console output directly to a (compressed) tar archive without creating the intermediate files? Any better ideas (maybe writing compressed data directly across network, skipping tar)?
The tar archive would need to extract as I said on the destination server as a separate file for each ID.
Thanks to anyone who takes the time to help.
You could just send the data formatted in some way and parse it on the receiver.
foo.sh on the sender:
#!/bin/bash
for (( id = 0; id <= 10000; id++ ))
do
data="$(./t_show $id)"
size=$(wc -c <<< "$data")
echo $id $size
cat <<< "$data"
done
On the receiver:
ssh -C user@server 'foo.sh'|while read file size; do
dd of="$file" bs=1 count="$size"
done
ssh -C compresses the data during transfer
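The framing used here (a header line with the id and byte count, followed by exactly that many bytes) can also be parsed without dd. A rough Python sketch of both ends of the format follows; the function names are made up for illustration.

```python
import io


def frame(name, payload):
    """Build one record: a "<name> <size>" header line, then the payload."""
    return (name + " " + str(len(payload)) + "\n").encode() + payload


def split_framed_stream(stream):
    """Yield (name, payload) pairs from a length-prefixed byte stream.

    Each record is a header line "<name> <size>\n" followed by exactly
    <size> bytes of payload.
    """
    while True:
        header = stream.readline()
        if not header:
            return  # clean end of stream
        name, size = header.split()
        payload = stream.read(int(size))
        if len(payload) != int(size):
            raise ValueError("truncated record for " + name.decode())
        yield name.decode(), payload
```

On the receiver, stream would be sys.stdin.buffer fed by the ssh command, with each payload written to "<name>.dat".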
You can at least tar stuff over a ssh connection:
tar -czf - inputfiles | ssh remotecomputer "tar -xzf -"
How to populate the archive without intermediary files however, I don't know.
EDIT: Ok, I suppose you could do it by writing the tar file manually. The header is specified here and doesn't seem too complicated, but that isn't exactly my idea of convenient...
I don't think this will work with a plain bash script, but you could have a look at the Archive::Tar module for Perl, or the equivalent in other scripting languages.
The Perl module has a function add_data that creates a "file" on the fly and adds it to the archive, which can then be streamed across the network.
The documentation is available via perldoc Archive::Tar.
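Python's standard tarfile module can do the same kind of thing. The sketch below writes in-memory data as tar members straight to an output stream, with no intermediate files; the function name is made up, and how the records are produced (for example by capturing the output of t_show) is left out.

```python
import io
import tarfile
import time


def write_tar_stream(records, out_stream):
    """Write (name, payload_bytes) pairs as members of a gzipped tar.

    Mode "w|gz" is tarfile's non-seekable stream mode, so out_stream
    can be a pipe or socket as well as a regular file.
    """
    with tarfile.open(fileobj=out_stream, mode="w|gz") as tf:
        for name, payload in records:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)  # must be set before addfile()
            info.mtime = int(time.time())
            tf.addfile(info, io.BytesIO(payload))
```

Writing to sys.stdout.buffer and piping the script through ssh into "tar -xzf -" on the other side should then extract one file per id.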
You can do better without tar:
#!/bin/bash
for id in `seq 1 1000`
do
./t_show $id
done | gzip
The only difference is that you will not get the boundaries between different IDs.
Now put that in a script, say show_me_the_ids, and run this from the client:
ssh user@source ./show_me_the_ids | gunzip
And there they are!
Alternatively, you can pass the -C flag to compress the SSH connection and drop gzip / gunzip altogether.
If you are really into it, you can try ssh -C, gzip -9 and other compression programs.
Personally, I'd bet on lzma -9.
I would try this:
(for ID in $(seq 1 10000); do echo $ID: $(./t_show $ID); done) | ssh user@destination "ImportscriptOrProgram"
This prints "1: ValueOfID1" and so on to standard output, which is transferred via ssh to the destination host, where you can start your import script or program to read the lines from standard input.
HTH
Thanks all
I've taken the advice 'just send the data formatted in some way and parse it on the receiver', as it seems to be the consensus. I'm skipping tar and using ssh -C for simplicity.
Perl script. Breaks the ids into groups of 1000. IDs are source_id in hash table. All data is sent via single ssh, delimited by 'HEADER', so it writes to the appropriate file. This is a lot more efficient:
sub copy_tickserver_files {
my $self = shift;
my $cmd = 'cd tickserver/ ; ';
my $i = 1;
while ( my ($source_id, $dest_id) = each ( %{ $self->{id_translations} } ) ) {
$cmd .= qq{ echo HEADER $source_id ; ./t_show $source_id ; };
$i++;
if ( $i % 1000 == 0 ) {
$cmd = qq{ssh -C dba\@$self->{source_env}->{tickserver} " $cmd " | };
$self->copy_tickserver_files_subset( $cmd );
$cmd = 'cd tickserver/ ; ';
}
}
$cmd = qq{ssh -C dba\@$self->{source_env}->{tickserver} " $cmd " | };
$self->copy_tickserver_files_subset( $cmd );
}
sub copy_tickserver_files_subset {
my $self = shift;
my $cmd = shift;
my $output = '';
open TICKS, $cmd;
while(<TICKS>) {
if ( m{HEADER [ ] ([0-9]+) }mxs ) {
my $id = $1;
$output = "$self->{tmp_dir}/$id.ts";
close TICKSOP;
open TICKSOP, '>', $output;
next;
}
next unless $output;
print TICKSOP "$_";
}
close TICKS;
close TICKSOP;
}

Resources