wkhtmltoimage: convert HTML to image with Perl on CentOS (Linux)

I am trying to use wkhtmltoimage to convert HTML and web pages to images, using the Perl module WKHTMLTOPDF. The script below works from the command line, but fails when I call it from the browser.
Update:
If I run the script from the shell as the root user, it runs without error. If I switch to the domain user that owns the script, I get the error below, so it seems to be an execute-permission issue for the domain owner.
The error is:
error running '/usr/local/bin/wkhtmltoimage': '/usr/local/bin/wkhtmltoimage http://yahoo.com /home/xxxx/public_html/pdfwebkit/output.png' died with signal 11, with coredump at /usr/local/perl-5.18.1/lib/site_perl/5.18.1/MooseX/Role/Cmd.pm line 128.
MooseX::Role::Cmd::run(WKHTMLTOPDF=HASH(0x2714260), "http://yahoo.com", "/home/xxxx/public_html/pdfwebkit/output.png") called at /usr/local/perl-5.18.1/lib/site_perl/5.18.1/WKHTMLTOPDF.pm line 645
WKHTMLTOPDF::generate(WKHTMLTOPDF=HASH(0x2714260)) called at htmltoimage.cgi line xxx
main::convert_using_WKHTMLTOPDF_image("http://yahoo.com", "/home/xxxx/public_html/pdfwebkit/output.png") called at htmltoimage.cgi line xx
The code I am using is:
#!/usr/bin/perl
#!C:\perl\bin\perl.exe

print "Content-type: text/html;charset=utf-8\n\n";

use File::Spec::Functions;
use File::Basename;

BEGIN {
    $| = 1;
    use CGI::Carp qw(fatalsToBrowser set_message);
    sub handle_errors {
        #print "Content-type: text/html;charset=utf-8\n\n";
        my $msg = shift;
        print qq!<h1><font color="red">Software Error</font></h1>!;
        print qq!<p>$msg</p>!;
    }
    set_message(\&handle_errors);
}
$| = 1;

my ($Script, $Bin);
if ($ENV{SCRIPT_FILENAME}) {
    ($Script, $Bin) = fileparse($ENV{SCRIPT_FILENAME});
}
else {
    ($Script, $Bin) = fileparse(__FILE__);
}

use WKHTMLTOPDF;

my $outfile = catfile($Bin, 'output.jpg');
print "Converting url to image file $outfile...<br>\n";
convert_using_WKHTMLTOPDF_image('http://yahoo.com', $outfile);
print "Finished...<br>\n";
exit;

sub convert_using_WKHTMLTOPDF_image {
    my ($page, $output) = @_;
    my $pdf = WKHTMLTOPDF->new;
    my $bin = '/usr/local/bin/wkhtmltoimage';
    #my $bin = 'C:/Program Files/wkhtmltopdf/bin/wkhtmltoimage.exe';
    $pdf->bin_name($bin);
    $pdf->_input_file($page);
    $pdf->_output_file($output);
    #$pdf->grayscale(1);
    $pdf->generate;
}

sub convert_html_to_image_direct {
    my ($page, $output) = @_;
    my $bin = '/usr/local/bin/wkhtmltoimage --quiet ';
    my $out = `$bin $page $output`;
    print "out: $out<br>\n";
    return $out;
}
The code works normally on Windows from the browser.
I have the same issue if I try to use wkhtmltopdf to convert HTML to PDF.
I installed the binaries following the instructions here:
https://gist.github.com/DaRamirezSoto/5489861
# wget http://wkhtmltopdf.googlecode.com/files/wkhtmltoimage-0.11.0_rc1-static-amd64.tar.bz2
# wget http://wkhtmltopdf.googlecode.com/files/wkhtmltopdf-0.11.0_rc1-static-amd64.tar.bz2
# tar xvjf wkhtmltoimage-0.11.0_rc1-static-amd64.tar.bz2
# tar xvjf wkhtmltopdf-0.11.0_rc1-static-amd64.tar.bz2
# chown root:root wkhtmltopdf-amd64
# chown root:root wkhtmltoimage-amd64
# mv wkhtmltopdf-amd64 /usr/bin/wkhtmltopdf
# mv wkhtmltoimage-amd64 /usr/bin/wkhtmltoimage
# dependencies
# yum install -y libXrender libXext openssl openssl-devel fontconfig
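Since the backtick call in convert_html_to_image_direct above discards stderr, a small diagnostic sketch like the following can help show why the binary dies with signal 11 under the domain user. This is not part of the WKHTMLTOPDF module; the paths are the ones from the question and the rest is an assumption:
#!/usr/bin/perl
# Hedged diagnostic sketch: run wkhtmltoimage the same way the CGI user does,
# merge stderr into the captured output, and decode how the child exited.
use strict;
use warnings;
use POSIX ":sys_wait_h";    # WIFSIGNALED, WTERMSIG, WEXITSTATUS

my $bin  = '/usr/local/bin/wkhtmltoimage';
my $page = 'http://yahoo.com';
my $out  = '/tmp/output.png';          # a path the CGI user can certainly write to

my $log = `$bin "$page" "$out" 2>&1`;  # 2>&1 keeps the binary's own error text
if ( WIFSIGNALED($?) ) {
    print "wkhtmltoimage died with signal ", WTERMSIG($?), "\n";
}
else {
    print "wkhtmltoimage exited with status ", WEXITSTATUS($?), "\n";
}
print "captured output:\n$log\n";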

I tested this (from IE 11 on Windows 8.1) against a CentOS 6 server and it worked fine, with only a small change to the path of the folder where I wanted the image saved.
My output was
Converting url to image file /home/user/../bittertruth.jpg...
Finished...
which is just what you expected.

Related

Finding a file in a directory using a Perl script

I'm trying to develop a Perl script that looks through all of the user's directories for a particular file name, without the user having to specify the entire pathname to the file.
For example, let's say the file of interest is data.list, located in /home/path/directory/project/userabc/data.list. At the command line the user would normally have to specify the full pathname in order to access it, like so:
cd /home/path/directory/project/userabc/data.list
Instead, I want the user to just enter script.pl ABC at the command line; the Perl script should then automatically locate data.list and retrieve the information in it, which in my case means counting the number of lines and uploading the count using curl. The rest is done; the only missing piece is locating the file automatically.
Even though this is very feasible in Perl, it looks more appropriate for Bash:
#!/bin/bash
filename=$(find ~ -name "$1" )
wc -l "$filename"
curl .......
The main issue would of course be if you have multiple files with the same name, say /home/user/dir1/data1 and /home/user/dir2/data1. You will need a way to handle that, and how you handle it depends on your specific situation.
In Perl that would be much more complicated:
#! /usr/bin/perl -w
eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
    if 0; # $running_under_some_shell

use strict;

# Import the module File::Find, which will do all the real work
use File::Find ();

# Set the variable $File::Find::dont_use_nlink if you're using AFS,
# since AFS cheats.

# For the convenience of &wanted calls, including -eval statements:
# Here, we "import" specific variables from the File::Find module.
# The purpose is to be able to just type '$name' instead of the
# complete '$File::Find::name'.
use vars qw/*name *dir *prune/;
*name  = *File::Find::name;
*dir   = *File::Find::dir;
*prune = *File::Find::prune;

# We declare the sub here; the content of the sub will be created later.
sub wanted;

# This is a simple way to get the first argument. There is no
# checking on validity.
our $filename = $ARGV[0];

# Traverse the desired filesystem. /home is the top directory where we
# start our search. The sub wanted will be executed for every file
# we find.
File::Find::find({wanted => \&wanted}, '/home');
exit;

sub wanted {
    # Check if the file is our desired filename
    if ( /^$filename\z/ ) {
        # Open the file, read it and count its lines
        my $lines = 0;
        open(my $F, '<', $name) or die "Cannot open $name";
        while (<$F>) { $lines++; }
        print("$name: $lines\n");
        # Your curl command here
    }
}
You will need to look at the argument parsing, for which I simply used $ARGV[0], and I do not know what your curl command looks like.
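For what it's worth, a hedged sketch of a slightly stricter argument check plus a placeholder curl step might look like this; the upload URL and form fields are purely hypothetical, since the question does not show the real curl command:
use strict;
use warnings;

# Minimal argument check: die with a usage message instead of silently
# searching for an undefined name.
my $filename = shift @ARGV
    or die "usage: $0 <filename>\n";

# Called from wanted() once a file has been found and its lines counted.
# The upload URL and field names are placeholders.
sub upload_count {
    my ($path, $lines) = @_;
    my @curl = ('curl', '-s', '-F', "file=$path", '-F', "lines=$lines",
                'https://example.com/upload');
    system(@curl) == 0
        or warn "curl failed for $path: $?\n";
}
The list form of system avoids shell quoting problems if the found path contains spaces.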
A simpler (though not recommended) way would be to abuse Perl as a sort of shell:
#!/usr/bin/perl
#
my $fn=`find /home -name '$ARGV[0]'`;
chomp $fn;
my $wc=`wc -l '$fn'`;
print "$wc\n";
system ("your curl command");
The following code snippet demonstrates one of many ways to achieve the desired result.
The code takes one parameter, a word to look for inside file(s) named data.list in all subdirectories, and prints a list of the files it found to the terminal.
The code uses the subroutine lookup($dir,$filename,$search), which calls itself recursively whenever it comes across a subdirectory.
The search starts from the current working directory (the question did not specify a starting directory).
use strict;
use warnings;
use feature 'say';

my $search = shift || die "Specify what to look for";
my $fname  = 'data.list';
my $found  = lookup('.', $fname, $search);

if( $found && @$found ) {
    say for @$found;
} else {
    say 'Not found';
}

exit 0;

sub lookup {
    my $dir    = shift;
    my $fname  = shift;
    my $search = shift;

    my $files;
    my @items = glob("$dir/*");

    for my $item (@items) {
        if( -f $item && $item =~ /\b$fname\b/ ) {
            my $found;
            open my $fh, '<', $item or die $!;
            while( my $line = <$fh> ) {
                $found = 1 if $line =~ /\b$search\b/;
                if( $found ) {
                    push @{$files}, $item;
                    last;
                }
            }
            close $fh;
        }
        if( -d $item ) {
            my $ret = lookup($item, $fname, $search);
            push @{$files}, @$ret if $ret;
        }
    }

    return $files;
}
Run as script.pl search_word
Output sample
./capacitor/data.list
./examples/data.list
./examples/test/data.list
Reference:
glob,
Perl file test operators

Perl wkhtmltopdf error: Unable to write to destination

I have the following code running as CGI. It starts to run and returns an empty PDF file to the browser and writes an error message to the error_log.
Does anybody have suggestions on how to solve this?
linux: Linux version 2.6.35.6-48.fc14.i686.PAE (...) (gcc version 4.5.1 20100924 (Red Hat 4.5.1-4) (GCC) ) #1 SMP Fri Oct 22 15:27:53 UTC 2010
wkhtmltopdf: wkhtmltopdf 0.10.0 rc2
perl: This is perl 5, version 12, subversion 2 (v5.12.2) built for i386-linux-thread-multi
Thank You in Advance.
~Donavon
Perl code:
#!/usr/bin/perl
#### takes string containing HTML and outputs PDF to browser to download
#### (otherwise would output to STDOUT)
print "Content-Disposition: attachment; filename='testPDF.pdf'\n";
print "Content-type: application/octet-stream\n\n";
my $htmlToPrint = "<html>a bunch of html</html>";
### open a filehandle and pipe it to wkhtmltopdf
### *the arguments "- -" tell wkhtmltopdf to get
### input from STDIN and send output to STDOUT*
open(my $makePDF, "|-", "/usr/local/bin/wkhtmltopdf", "-", "-") || die("$!");
print $makePDF $htmlToPrint; ## sends my HTML to wkhtmltopdf which streams immediately to STDOUT
error_log message:
Loading pages (1/6)
QPainter::begin(): Returned false============================] 100%
Error: Unable to write to destination
Here is the code that I got to work; hopefully some folks will find it useful.
Make sure the permissions are set up correctly on the server side. We have a sysadmin here who set the module up on the server, so I can't tell you exactly what they need to be, only that getting them wrong can cause problems.
#!/usr/bin/perl
use warnings;
use strict;
use IPC::Open3;
use Symbol;

my $cmd = '/usr/local/bin/wkhtmltopdf - -';
my $err = gensym();
my $in  = gensym();
my $out = gensym();
my $pdf = '';

my $pid = open3($in, $out, $err, $cmd) or die "could not run cmd : $cmd : $!\n";

my $string = '<html><head></head><body>Hello World!!!</body></html>';
print $in $string;
close($in);

while( <$out> ) {
    $pdf .= $_;
}

# for troubleshooting
while( <$err> ) {
    # print "err-> $_<br />\n";
}

# for troubleshooting
waitpid($pid, 0) or die "$!\n";
my $retval = $?;
# print "retval-> $retval<br />\n";

print "Content-Disposition: attachment; filename='testPDF.pdf'\n";
print "Content-type: application/octet-stream\n\n";
print $pdf;

Batch download from a URL

I want to download thousands of files from a URL. Each line in "FileName.txt" contains the name of a file to download. I am using a Perl script that takes each file name from "FileName.txt" and downloads it after a random delay. I run the script as "./program.pl Filename.txt".
Filename.txt
A
B
C
B
program.pl
#!/usr/bin/perl
$file1=$ARGV[0];
open(FP1, $file1);
while($s1=<FP1>)
{
    chomp ($s1);
    $range = 5;
    $minimum = 3;
    $random_number = int(rand($range)) + $minimum;
    `wget --wait="$random_number" "http://URL=$s1"`;
}
I get output for the first few files but not for the remaining ones. For the remaining files, $ emacs fileD.txt gives
[13] 29699
Could you kindly tell me why I am getting "[13] 29699", and what is the best way to download files after a random time interval? Sorry, the while loop in the program above did not paste quite correctly.
You don't show what your URLs look like, but presumably some of them contain &, which puts the wget process in the background (hence output like "[13] 29699"). You should use single quotes around wget's argument, or use the list form of system.
Further, wget's wait parameter is only relevant if you are using wget itself to traverse links from a given URL. In your case, you need your Perl script to sleep between invoking wget for each URL:
#!/usr/bin/env perl
use strict;
use warnings;

use constant WAIT_MINIMUM => 3;
use constant WAIT_RANGE   => 5;

my ($url_list_file) = @ARGV;
defined($url_list_file)
    or die "Need URL list\n";

open my $fh, '<', $url_list_file
    or die "Cannot open '$url_list_file': $!";

while (my $url = <$fh>) {
    $url =~ s/\R\z//;
    my @cmd = (wget => "http://$url");
    print "@cmd\n";
    my $error = system @cmd;
    if ($error) {
        warn "'@cmd' failed: $?";
    }
    sleep WAIT_MINIMUM + rand(WAIT_RANGE);
}
What does URL= mean? wget takes the URL as a simple parameter. It seems you need:
`wget --wait=$random_number 'http://$s1'`;

Efficient transfer of console data, tar & gzip/bzip2, without creating intermediary files

Linux environment. We have this program, t_show; when executed with an ID, it writes price data for that ID to the console. There is no other way to get this data.
I need to copy the price data for IDs 1-10,000 between two servers, using minimum bandwidth and a minimum number of connections. On the destination server the data will be a separate file for each ID, with the format:
<id>.dat
Something like this would be the long-winded solution:
dest:
files=`seq 1 10000`
for id in `echo $files`;
do
./t_show $id > $id
done
tar cf - $files | nice gzip -c > dat.tar.gz
source:
scp user@source:dat.tar.gz ./
gunzip dat.tar.gz
tar xvf dat.tar
That is, write each output to its own file, compress & tar, send over network, extract.
It has the problem that I need to create a new file for each id. This takes up tonnes of space and doesn't scale well.
Is it possible to write the console output directly to a (compressed) tar archive without creating the intermediate files? Any better ideas (maybe writing compressed data directly across network, skipping tar)?
The tar archive would need to extract on the destination server into a separate file for each ID, as described above.
Thanks to anyone who takes the time to help.
You could just send the data formatted in some way and parse it on the receiver.
foo.sh on the sender:
#!/bin/bash
for (( id = 0; id <= 10000; id++ ))
do
    data="$(./t_show $id)"
    size=$(wc -c <<< "$data")
    echo $id $size
    cat <<< "$data"
done
On the receiver:
ssh -C user@server 'foo.sh' | while read file size; do
    dd of="$file" bs=1 count="$size"
done
ssh -C compresses the data during transfer
You can at least tar stuff over an ssh connection:
tar -czf - inputfiles | ssh remotecomputer "tar -xzf -"
How to populate the archive without intermediary files, however, I don't know.
EDIT: Ok, I suppose you could do it by writing the tar file manually. The header is specified here and doesn't seem too complicated, but that isn't exactly my idea of convenient...
I don't think this will work with a plain bash script. But you could have a look at the Archive::Tar module for Perl, or its equivalent in other scripting languages.
The Perl module has a function add_data to create a "file" on the fly and add it to the archive for streaming across the network.
The documentation can be found on CPAN.
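For illustration, a minimal sketch of that Archive::Tar idea, assuming the t_show program and the 1-10,000 ID range from the question; the archive is built in memory, so no per-ID files ever touch the disk:
#!/usr/bin/perl
# Hedged sketch: stream t_show output straight into a gzip-compressed tar.
use strict;
use warnings;
use Archive::Tar;

my $tar = Archive::Tar->new;
for my $id (1 .. 10_000) {
    my $data = `./t_show $id`;          # console output for this ID
    $tar->add_data("$id.dat", $data);   # stored as <id>.dat inside the archive
}
$tar->write('dat.tar.gz', 9);           # 9 = gzip compression level
Note that the whole archive sits in memory until write() is called, which may or may not be acceptable for 10,000 IDs' worth of price data.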
You can do better without tar:
#!/bin/bash
for id in `seq 1 1000`
do
    ./t_show $id
done | gzip
The only difference is that you will not get the boundaries between different IDs.
Now put that in a script, say show_me_the_ids, and run it from the client:
ssh user@source ./show_me_the_ids | gunzip
And there they are!
Alternatively, you can pass the -C flag to compress the SSH connection and drop the gzip/gunzip calls altogether.
If you are really into it, you may try ssh -C, gzip -9 and other compression programs.
Personally, I'd bet on lzma -9.
I would try this:
(for ID in $(seq 1 10000); do echo $ID: $(./t_show $ID); done) | ssh user@destination "ImportscriptOrProgram"
This will print "1: ValueOfID1" to standard output, which is transferred via ssh to the destination host, where you can start your import script or program, which reads the lines from standard input.
HTH
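A hedged sketch of what "ImportscriptOrProgram" could look like on the destination host, assuming lines of the form "ID: data" as printed above, written to one file per ID:
#!/usr/bin/perl
# Reads "ID: data" lines from standard input and writes each to <ID>.dat.
use strict;
use warnings;

while ( my $line = <STDIN> ) {
    chomp $line;
    my ($id, $data) = $line =~ /^(\d+):\s*(.*)$/ or next;
    open my $fh, '>', "$id.dat" or die "Cannot write $id.dat: $!";
    print {$fh} "$data\n";
    close $fh;
}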
Thanks, all.
I've taken the advice to 'just send the data formatted in some way and parse it on the receiver', which seems to be the consensus. I'm skipping tar and using ssh -C for simplicity.
Perl script below. It breaks the IDs into groups of 1000; the IDs are the source_id keys of a hash table. All data is sent over ssh, delimited by 'HEADER' lines, so the receiver writes to the appropriate file. This is a lot more efficient:
sub copy_tickserver_files {
    my $self = shift;
    my $cmd = 'cd tickserver/ ; ';
    my $i = 1;
    while ( my ($source_id, $dest_id) = each ( %{ $self->{id_translations} } ) ) {
        $cmd .= qq{ echo HEADER $source_id ; ./t_show $source_id ; };
        $i++;
        if ( $i % 1000 == 0 ) {
            $cmd = qq{ssh -C dba\@$self->{source_env}->{tickserver} " $cmd " | };
            $self->copy_tickserver_files_subset( $cmd );
            $cmd = 'cd tickserver/ ; ';
        }
    }
    $cmd = qq{ssh -C dba\@$self->{source_env}->{tickserver} " $cmd " | };
    $self->copy_tickserver_files_subset( $cmd );
}

sub copy_tickserver_files_subset {
    my $self = shift;
    my $cmd  = shift;
    my $output = '';
    open TICKS, $cmd;
    while(<TICKS>) {
        if ( m{HEADER [ ] ([0-9]+) }mxs ) {
            my $id = $1;
            $output = "$self->{tmp_dir}/$id.ts";
            close TICKSOP;
            open TICKSOP, '>', $output;
            next;
        }
        next unless $output;
        print TICKSOP "$_";
    }
    close TICKS;
    close TICKSOP;
}

Convert tar.gz to zip

I've got a large collection of gzipped tar archives on my Ubuntu web server, and I need them converted to zip files. I figure this would be done with a script, but what language should I use, and how would I go about unzipping and re-zipping the files?
I'd do it with a bash(1) one-liner:
for f in *.tar.gz;\
do rm -rf ${f%.tar.gz} ;\
mkdir ${f%.tar.gz} ;\
tar -C ${f%.tar.gz} -xzvf $f ;\
zip -r ${f%.tar.gz}.zip ${f%.tar.gz} ;\
rm -rf ${f%.tar.gz} ;\
done
It isn't very pretty because I'm not great at bash(1). Note that this destroys a lot of directories so be sure you know what this does before doing it.
See the bash(1) reference card for more details on the ${foo%bar} syntax.
A simple bash script would be easiest, surely? That way you can just invoke the tar and zip commands.
The easiest solution on Unix platforms may well be to use FUSE and something like archivemount (libarchive): http://en.wikipedia.org/wiki/Archivemount
You can use node.js and tar-to-zip for this purpose. All you need to do is:
Install node.js with nvm if you do not have it.
And then install tar-to-zip with:
npm i tar-to-zip -g
And use it with:
tar-to-zip *.tar.gz
Also you can convert .tar.gz files to .zip programmatically.
You should install async and tar-to-zip locally:
npm i async tar-to-zip
And then create converter.js with contents:
#!/usr/bin/env node
'use strict';

const fs = require('fs');
const tarToZip = require('tar-to-zip');
const eachSeries = require('async/eachSeries');

const names = process.argv.slice(2);

eachSeries(names, convert, exitIfError);

function convert(name, done) {
    const {stdout} = process;
    const onProgress = (n) => {
        stdout.write(`\r${n}%: ${name}`);
    };

    const onFinish = (e) => {
        stdout.write('\n');
        done();
    };

    const nameZip = name.replace(/\.tar\.gz$/, '.zip');
    const zip = fs.createWriteStream(nameZip)
        .on('error', (error) => {
            exitIfError(error);
            fs.unlinkSync(nameZip);
        });

    const progress = true;
    tarToZip(name, {progress})
        .on('progress', onProgress)
        .on('error', exitIfError)
        .getStream()
        .pipe(zip)
        .on('finish', onFinish);
}

function exitIfError(error) {
    if (!error)
        return;

    console.error(error.message);
    process.exit(1);
}
Zip files are handy because they offer random access to their contents; tar files only offer sequential access.
My solution for this conversion is the shell script below, which calls itself via the tar(1) "--to-command" option (I prefer that over having two scripts). But I admit that "untar and zip -r" is faster than this, because zipnote(1) unfortunately cannot work in-place.
#!/bin/zsh -feu
## Convert a tar file into zip:
usage() {
setopt POSIX_ARGZERO
cat <<EOF
usage: ${0##*/} [+-h] [-v] [--] {tarfile} {zipfile}"
-v verbose
-h print this message
converts the TAR archive into ZIP archive.
EOF
unsetopt POSIX_ARGZERO
}
while getopts :hv OPT; do
case $OPT in
h|+h)
usage
exit
;;
v)
# todo: ignore TAR_VERBOSE from env?
# Pass to the grand-child process:
export TAR_VERBOSE=y
;;
*)
usage >&2
exit 2
esac
done
shift OPTIND-1
OPTIND=1
# when invoked w/o parameters:
if [ $# = 0 ] # todo: or stdin is not terminal
then
# we are invoked by tar(1)
if [ -n "${TAR_VERBOSE-}" ]; then echo $TAR_REALNAME >&2;fi
zip --grow --quiet $ZIPFILE -
# And rename it:
# fixme: this still makes a full copy, so slow.
printf "# -\n#=$TAR_REALNAME\n" | zipnote -w $ZIPFILE
else
if [ $# != 2 ]; then usage >&2; exit 1;fi
# possibly: rm -f $ZIPFILE
ZIPFILE=$2 tar -xaf $1 --to-command=$0
fi
Here is a Python solution based on this answer:
import sys, tarfile, zipfile, glob

def convert_one_archive(file_name):
    out_file = file_name.replace('.tar.gz', '.zip')
    with tarfile.open(file_name, mode='r:gz') as tf:
        with zipfile.ZipFile(out_file, mode='a', compression=zipfile.ZIP_DEFLATED) as zf:
            for m in tf.getmembers():
                f = tf.extractfile(m)
                fl = f.read()
                fn = m.name
                zf.writestr(fn, fl)

for f in glob.glob('*.tar.gz'):
    convert_one_archive(f)
Here is a script based on @Brad Campbell's answer that works on files passed as command-line arguments, works with other tar file types (uncompressed, or the other compression types supported by tarfile), and handles directories in the source tar file. It also prints warnings if the source file contains a symlink or hardlink, which are converted to regular files. For symlinks, the link is resolved during conversion; this can lead to an error if the link target is not in the tar, and it is also potentially dangerous from a security standpoint, so user beware.
#!/usr/bin/python
import sys, tarfile, zipfile, glob, re

def convert_one_archive(in_file, out_file):
    with tarfile.open(in_file, mode='r:*') as tf:
        with zipfile.ZipFile(out_file, mode='a', compression=zipfile.ZIP_DEFLATED) as zf:
            for m in [m for m in tf.getmembers() if not m.isdir()]:
                if m.issym() or m.islnk():
                    print('warning: symlink or hardlink converted to file')
                f = tf.extractfile(m)
                fl = f.read()
                fn = m.name
                zf.writestr(fn, fl)

for in_file in sys.argv[1:]:
    out_file = re.sub(r'\.((tar(\.(gz|bz2|xz))?)|tgz|tbz|tbz2|txz)$', '.zip', in_file)
    if out_file == in_file:
        print(in_file, '---> [skipped]')
    else:
        print(in_file, '--->', out_file)
        convert_one_archive(in_file, out_file)
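Since most of this page is Perl, here is a roughly equivalent sketch using the CPAN modules Archive::Tar and Archive::Zip (assuming both are installed); like the Python versions above, it reads each member into memory:
#!/usr/bin/perl
# Hedged sketch: convert each .tar.gz given on the command line to a .zip.
use strict;
use warnings;
use Archive::Tar;
use Archive::Zip qw(:ERROR_CODES);

for my $in_file (@ARGV) {
    (my $out_file = $in_file) =~ s/\.tar\.gz$/.zip/
        or next;                               # skip names that do not match
    my $tar = Archive::Tar->new($in_file, 1);  # 1 => archive may be compressed
    my $zip = Archive::Zip->new;
    for my $member ($tar->get_files) {
        next unless $member->is_file;
        $zip->addString($member->get_content, $member->name);
    }
    $zip->writeToFileNamed($out_file) == AZ_OK
        or die "Could not write $out_file\n";
    print "$in_file ---> $out_file\n";
}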
