Convert HTML to searchable PDF using PhantomJS / Node.js

I am generating PDFs server-side from an HTML template that gets completed with data from the client and server. The code below works, but:
1) The PDF file is 5x bigger than when it is 'Saved as PDF' on the client side.
2) The PDF is not searchable.
I am assuming both of these problems stem from PhantomJS generating a raster rather than a vector-based PDF. What should I do differently? (I am hoping I am just missing a PhantomJS option or two.)
var phantom = require('phantom');
req.body['invoicenumber'] = 15010001;
phantom.create(function(ph) {
  ph.createPage(function(page) {
    page.set('paperSize', { format: 'Letter', orientation: 'portrait', margin: '1cm' });
    page.open("html/template.html", function(status) {
      // Fill in the template inside the page, then render it to PDF
      page.evaluate(function(data) {
        $(function() { populate(data); });
      }, function() {
        page.render('quotes/' + req.body['invoicenumber'] + '.pdf', function() {
          ph.exit();
          res.send(req.body['invoicenumber'] + '.pdf');
        });
      }, req.body);
    });
  });
});
MINOR UPDATE: Increasing the margin so the page is not scaled up reduces the file size, but it is still 2.5x the size of the client-side 'Save as PDF' file.

In the HTML template, try wrapping your content in header tags (h1, h2, ... h6). Content inside these tags is rendered as text in the generated PDF, so it should be searchable. This also reduces the PDF file size considerably. I'm not sure why div, p, table, and similar tags are rendered as images in the PDF.

Related

Convert HTML page to PDF with CSS3 Support

I'm working on a small project whereby I create multiple CVs (or resumes) via an interface I've built in Vue + Laravel, which I can then export to PDF.
I'm having issues though when I export the PDF. Laravel DOMPDF doesn't let me have CSS3 properties inside the PDF, for example flex, or CSS variables. I believe PDFs only support CSS 2.0, but I have seen multiple PDFs being exported that are an exact carbon-copy of the website. For example, resume.io - when you create a CV via their site, they can export it and make it look exactly like the website version.
My question is: does anyone know of a library that I could use that ties into Vue or Laravel that will produce a carbon-copy of the website template into a PDF?
I have tried a few JS libraries that take screenshots of certain elements on the page, then try and fit them together, but it just doesn't work. I basically need a specific element on the page to be selectable and then saved to a PDF. Please see my example below:
As you can see, the white area is the CV preview, so I need that whole section saved to a PDF, minus the right hand side menu and the top-bar. I'm planning on building some really cool templates, but if I can't use modern CSS practices then it's going to be quite hard to make them into a PDF.
At the moment, I've got two views, the CV preview which you can see above, then another view which re-uses partials that are inserted in the PDF template. Obviously though, reusing the partials which have modern CSS applied then makes the PDF break or look broken.
My stack:
Laravel
Vue.js
TailwindCSS
Laravel-DOMPDF
If anyone could advise on the best way to go about this, I'd really appreciate it.
TIA

Since you didn't mention whether the page to convert is in Vue or Blade, I'll explain both ways.
Here's the library; all you need is to design a Blade view and then do something like this:
Route::get('/doc', function () {
    $data = Marketers::all();
    // Option 1: loadView with $data
    $pdf = PDF::loadView('pdf', $data)->setPaper('A4');
    // Option 2: loadView with compact()
    $pdf = PDF::loadView('pdf', compact('data'))->setPaper('A4');
    // Then download it
    return $pdf->download('pdf.pdf');
});
In Vue, you need jsPDF together with html2canvas or html-to-image.
I used html-to-image because I had some character issues with the Persian language, so my explanation is based on the html-to-image library.
<template>
  <!-- The part you want to convert to PDF -->
  <div ref="contentz" id="jsPdf">
    <!-- Contents -->
  </div>
</template>
downloadFull(t) {
  let self = this;
  switch (t) {
    case 1: {
      // Case 1: render the element to a canvas and save it as a PDF
      const doc = new jsPDF("l", "mm", "a4");
      htmlToImage.toCanvas(document.getElementById('jsPdf'))
        .then(function (canvas) {
          var img = canvas.toDataURL("image/jpeg");
          var width = doc.internal.pageSize.getWidth();
          var height = doc.internal.pageSize.getHeight();
          doc.addImage(img, 'JPEG', 0, 0, width, 0);
          doc.save('app.pdf');
        })
        .catch(function (error) {
          self.$notifications.failedNotificationOnGetData(self);
        });
      break;
    }
    case 2:
      // Case 2: render the element to a JPEG and trigger a download
      htmlToImage.toJpeg(document.getElementById('jsPdf'), { quality: 1 })
        .then(function (dataUrl) {
          var link = document.createElement('a');
          link.download = 'kalabala.jpeg';
          link.href = dataUrl;
          link.click();
        })
        .catch(function (error) {
          self.$notifications.failedNotificationOnGetData(self);
        });
      break;
    default:
      break;
  }
}
This is my function, downloadFull(t), where t is the file type. You might not need it, or you could simplify it with an if/else instead of a switch. First, import the libraries:
import jsPDF from 'jspdf';
import * as htmlToImage from 'html-to-image';
Then create the jsPDF document with the page dimensions, use html-to-image to get a canvas, read the page width and height, add the canvas image to the document, and save it. The two cases are similar: the first produces a PDF and the second a JPEG.
If you go the first way (Blade) but want a download link on the Vue page, write the controller as explained above; then, when you make the API call from Vue, use a Blob to download the file. Here's an example:
getPDF(type) {
  axios({
    url: '/api/api_name/exportPDF',
    method: 'POST',
    responseType: 'blob'
  })
    .then(res => {
      const url = window.URL.createObjectURL(new Blob([res.data]));
      const link = document.createElement('a');
      link.href = url;
      link.setAttribute('download', 'pdf.pdf');
      document.body.appendChild(link);
      link.click();
    })
},
Good Luck.

Why is svg generated by MathJax.js different than the svg generated by MathJax-node

I am writing a web app that saves HTML to OneNote. To save math formulas, I plan to convert them to SVG with MathJax.js and then convert the SVG to PNG, because the HTML/CSS supported by the OneNote API is limited.
But it seems the SVG generated by MathJax.js in the browser is not a valid SVG. I tested it with a simple math formula, $$a^2 + b^2 = c^2$$ (demo code), and copied the SVG to jsfiddle, where it displays nothing.
Then I wrote a MathJax-node demo and copied its SVG to jsfiddle, and that one looks good. Here is my demo code; it's almost the same as the GitHub repo demo:
// a simple TeX-input example
const fs = require('fs');
var mjAPI = require("mathjax-node");
mjAPI.config({
  MathJax: {
    // traditional MathJax configuration
  }
});
mjAPI.start();
var yourMath = String.raw`a^2 + b^2 = c^2`;
mjAPI.typeset({
  math: yourMath,
  format: "TeX", // or "inline-TeX", "MathML"
  svg: true, // or mml:true, or html:true
}, function (data) {
  if (!data.errors) { console.log(data.svg); }
  // data.svg holds the SVG markup as a string; write it to a file
  fs.writeFile('math.txt', data.svg, (error) => {
    if (error) { console.log(error); }
  });
});
I also tested the two SVGs with CloudConvert, with the same result. Why are the two SVGs different? Am I missing something?

The difference is due to a specific setting: useGlobalCache
By default, MathJax (docs) sets this to true while mathjax-node (docs) sets this to false.
On the server MathJax-node does not have any document context and produces self-contained SVGs. On the client, MathJax has a full document context and thus can re-use the SVG paths across equations.
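As a rough sketch, and assuming MathJax v2 is what runs in the browser, the client-side output could be made self-contained by switching that setting off before MathJax.js is loaded. The variable name mathJaxConfig is only illustrative, and the option placement (inside the SVG block) follows the v2 SVG output processor options as I understand them, so check it against the version you use.

```javascript
// Assumed MathJax v2 setup: this object must exist before MathJax.js is loaded.
// With the global cache off, each <svg> should carry its own glyph definitions
// instead of referencing paths stored elsewhere in the document.
var mathJaxConfig = {
  SVG: {
    useGlobalCache: false
  }
};

// Only meaningful in a browser; guarded so the snippet also runs under Node.
if (typeof window !== 'undefined') {
  window.MathJax = mathJaxConfig;
}
```

With this in place, an SVG copied out of the page should no longer depend on paths defined outside the copied element.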

svg convert to canvas - can't generate multi pages pdf

I have 12 graphs and I want to generate a PDF with 2 pages, each page containing 6 graphs.
However, when I convert the SVGs to canvas, jsPDF only sees part of the two sub-divs.
$('#downloadx2').click(function() {
  var svgElements = $("#body_id").find('svg');
  //replace all svgs with a temp canvas
  svgElements.each(function() {
    var canvas, xml;
    // canvg doesn't cope very well with em font sizes so find the calculated size in pixels and replace it in the element.
    $.each($(this).find('[style*=em]'), function(index, el) {
      $(this).css('font-size', getStylex(el, 'font-size'));
    });
    canvas = document.createElement("canvas");
    canvas.className = "screenShotTempCanvas";
    //convert SVG into a XML string
    xml = (new XMLSerializer()).serializeToString(this);
    // Removing the name space as IE throws an error
    xml = xml.replace(/xmlns=\"http:\/\/www\.w3\.org\/2000\/svg\"/, '');
    //draw the SVG onto a canvas
    canvg(canvas, xml);
    $(canvas).insertAfter(this);
    //hide the SVG element
    ////this.className = "tempHide";
    $(this).attr('class', 'tempHide');
    $(this).hide();
  });
  var doc = new jsPDF("p", "mm");
  var width = doc.internal.pageSize.width;
  var height = doc.internal.pageSize.height;
  html2canvas($("#div_pdf1"), {
    onrendered: function(canvas) {
      var imgData = canvas.toDataURL('image/png', 0.1);
      doc.addImage(imgData, 'PNG', 5, 0, width, height/2, '', 'FAST');
      doc.addPage();
    }
  });
  html2canvas($("#div_pdf2"), {
    onrendered: function(canvas2) {
      var imgData2 = canvas2.toDataURL('image/png', 0.1);
      doc.addImage(imgData2, 'PNG', 5, 0, width, height/2, '', 'FAST');
      doc.save('.pdf');
    }
  });
});
<body id="body_id">
<div id="div_pdf1" >
<svg></svg>
<svg></svg>
<svg></svg>
</div>
<div id="div_pdf1" >
<svg></svg>
<svg></svg>
<svg></svg>
</div>
</body>
When I run this code, the generated PDF has two pages showing the same canvas, the first div (div_pdf1). How can I get both divs to appear in the PDF as two pages?

You seem to be trying to run the two parts in sequence, but that's not how JavaScript actually runs your code.
No big deal, just a small misunderstanding between your mental model and the engine that executes the code.
A quick temporary debugging tool to see what's going on and verify that there is a discrepancy is to add console.log to key points and check the sequence of their printout once you run the code.
console.log('[1] just before: svgElements.each');
svgElements.each(function() {
console.log('[2] just after: svgElements.each');
And also around this part of the code:
console.log('[3] just before html2canvas-div_pdf1');
html2canvas($("#div_pdf1"), {
console.log('[4] just after html2canvas-div_pdf1');
Finally around this part of the code:
console.log('[5] just before html2canvas-div_pdf2');
html2canvas($("#div_pdf2"), {
console.log('[6] just after html2canvas-div_pdf2');
I suspect you'll see the code doesn't print the log lines in the order you think they will.
Next, you can try wrapping the 2 calls to html2canvas with one setTimeout function and force a delay in the execution of that code by an arbitrary amount of milliseconds.
Note that this is not the recommended final production quality solution but it will make the code output what you want.
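One way to make the ordering explicit, rather than relying on a delay, is to start the second render only after the first has finished. The sketch below shows that idea with Promises. It does not call html2canvas or jsPDF: renderDiv and createFakeDoc are hypothetical stand-ins invented for this example so the ordering can be run and checked anywhere, and the first render is deliberately made slower than the second.

```javascript
// Hypothetical stand-in for an asynchronous renderer such as html2canvas:
// resolves with a fake image string after the given delay.
function renderDiv(id, delayMs) {
  return new Promise(function (resolve) {
    setTimeout(function () { resolve('image-of-' + id); }, delayMs);
  });
}

// Hypothetical stand-in for a PDF document: records images page by page.
function createFakeDoc() {
  var pages = [[]];
  return {
    addImage: function (img) { pages[pages.length - 1].push(img); },
    addPage: function () { pages.push([]); },
    pages: pages
  };
}

function buildPdf() {
  var doc = createFakeDoc();
  // The first render is slower, yet it still lands on page 1 because
  // the second render is not started until the first has resolved.
  return renderDiv('div_pdf1', 30)
    .then(function (img1) {
      doc.addImage(img1);
      doc.addPage();
      return renderDiv('div_pdf2', 5);
    })
    .then(function (img2) {
      doc.addImage(img2);
      return doc.pages; // the real code would call doc.save(...) here
    });
}

buildPdf().then(function (pages) {
  console.log(pages);
});
```

In the real page, the same shape would apply: add the first image and the new page inside the first callback, and only then kick off the second render and the final save.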

Is it possible to export HTML along with SVG markup to the Highcharts Export Server (http://export.highcharts.com/)

I have a case where my page displays HTML and SVG markup together. It's a jsPlumb (http://jsplumbtoolkit.com/demo/home/dom.html) demo.
I need to export this data/design to a PDF or a PNG file. Can this be done using the Highcharts export library or is there any other library which can solve my requirement?

PhantomJs to the rescue.
This script:
var page = require('webpage').create();
page.viewportSize = { width: 1000, height: 600 };
page.open('http://jsplumbtoolkit.com/demo/flowchart/dom.html', function() {
  page.render('jsPlumb.png');
  phantom.exit();
});
Produces this png:

YUI: How can I use ImageCropper without having to specify the image size in the img tag?

Good day.
I use the ImageCropper script.
Script:
<img src="http://test.com/img/1362244329.jpg" id="yui_img" height="768" width="1024">
<script>
(function() {
var Dom = YAHOO.util.Dom,
Event = YAHOO.util.Event;
var crop = new YAHOO.widget.ImageCropper('yui_img');
})();
</script>
result:
But if I do not specify the image size, I get the following (see image):
<img src="http://test.com/img/1362244329.jpg" id="yui_img">
result:
And if I specify the wrong picture size, the crop window shows an enlarged portion of the image:
<img src="http://test.com/img/1362244329.jpg" id="yui_img" height="333" width="500">
result:
How can I make this work without having to specify the image size in the img tag?

First of all I'd like to point you to YUI 3 since YUI 2 is no longer supported. You shouldn't write new code using YUI 2. There's an ImageCropper component I wrote for YUI 3 that works just like the YUI 2 version in the YUI Gallery: http://yuilibrary.com/gallery/show/imagecropper. Since it copies what the YUI 2 ImageCropper did, it shares these issues with the older version.
What to do when the size of the image isn't specified
The reason why you're getting a small ImageCropper is that you're creating the widget before the image has been fetched and so the browser doesn't know its size yet. What you can do is wait for the image's onload event. You can listen to that event and create the ImageCropper after it fires:
(function() {
  var Dom = YAHOO.util.Dom,
      Event = YAHOO.util.Event;
  var yui_img = Dom.get('yui_img');
  Event.addListener(yui_img, 'load', function () {
    var crop = new YAHOO.widget.ImageCropper(yui_img);
  });
})();
Why the ImageCropper doesn't work with images with the wrong size
Neither the YUI 2 ImageCropper nor my YUI 3 version work with images when they don't have the right size. The reason is that both use the background: url() CSS style for showing the image inside the crop area (the non-darkened part of the widget). CSS backgrounds don't let you use a resized/zoomed image.
I plan on using another strategy at some point for the YUI 3 version that will fix the issue. However, you need to keep in mind that the ImageCropper component is designed so that you send the crop coordinates to the server for it to actually crop the image. That means that if you have the wrong size set to the image, the coordinates that the image cropper returns with its getCropCoords method wouldn't be the coordinates that match with the full sized image. Instead you'd also have to send the server the size of the image you've been using and do extra math to crop the image correctly.
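The "extra math" mentioned above amounts to scaling each coordinate by the ratio between the full-size image and the displayed image. Here is a minimal sketch of that arithmetic. The function name scaleCropCoords and the sample numbers are made up for illustration, and the coordinate object is assumed to carry left, top, width and height fields, so adapt the field names to whatever getCropCoords actually returns in your version.

```javascript
// Scale crop coordinates measured on the displayed image up (or down)
// to the coordinate space of the full-size image.
function scaleCropCoords(coords, displayed, natural) {
  var scaleX = natural.width / displayed.width;
  var scaleY = natural.height / displayed.height;
  return {
    left: Math.round(coords.left * scaleX),
    top: Math.round(coords.top * scaleY),
    width: Math.round(coords.width * scaleX),
    height: Math.round(coords.height * scaleY)
  };
}

// Example: image shown at 500x250 while the real file is 1000x500.
var fullSizeCrop = scaleCropCoords(
  { left: 10, top: 20, width: 100, height: 50 },
  { width: 500, height: 250 },
  { width: 1000, height: 500 }
);
console.log(fullSizeCrop); // { left: 20, top: 40, width: 200, height: 100 }
```

The same calculation could be done on the server instead, as long as the displayed size is sent along with the crop coordinates.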
In conclusion, you shouldn't use the image with the wrong size. You can fix the size of the image in two ways:
1) Use the HTML5 naturalWidth and naturalHeight properties of the image. They return the real size of the image even if it's resized. Unfortunately, these properties are not yet supported by all browsers.
2) Create a new image with JS, give it the same src as the image you're using, listen to its load event, and read that image's size.
The second approach looks something like this:
(function () {
  var Dom = YAHOO.util.Dom;
  var yui_img = Dom.get('yui_img'),
      new_img = new Image();
  new_img.onload = function () {
    yui_img.width = new_img.width;
    yui_img.height = new_img.height;
    // create the ImageCropper
  };
  new_img.src = yui_img.src;
}());
A YUI3 version
You can easily do all this with YUI3:
YUI().use('gallery-imagecropper', function (Y) {
var img = Y.one('#yui_img');
img.on('load', function () {
var cropper = new Y.ImageCroper({
srcNode: img,
width: img.get('width'),
height: img.get('height')
});
cropper.render();
});
});

Typo in code. Should be
var cropper = new Y.ImageCropper({
You missed a letter "p".
