I'm searching for a way to get the base64 string representation of a PDFKit document, but I can't find the right way to do it...
Something like this would be extremely convenient.
var doc = new PDFDocument();
doc.addPage();
doc.outputBase64(function (err, pdfAsText) {
console.log('Base64 PDF representation', pdfAsText);
});
I already tried the blob-stream lib, but it doesn't work on a Node server (it says that Blob doesn't exist).
Thanks for your help!
I was in a similar predicament, wanting to generate PDFs on the fly without leaving temporary files lying around. My context is a Node.js API layer (using Express) that a React frontend talks to.
Ironically, a similar discussion for Meteor helped me get to where I needed. Based on that, my solution resembles:
const PDFDocument = require('pdfkit');
const { Base64Encode } = require('base64-stream');
// ...
var doc = new PDFDocument();
// write to PDF
var finalString = ''; // contains the base64 string
var stream = doc.pipe(new Base64Encode());
doc.end(); // will trigger the stream to end
stream.on('data', function(chunk) {
finalString += chunk;
});
stream.on('end', function() {
// the stream is at its end, so push the resulting base64 string to the response
res.json(finalString);
});
Synchronous option not (yet) present in the documentation
const doc = new PDFDocument();
doc.text("Sample text", 100, 100);
doc.end();
const data = doc.read();
console.log(data.toString("base64"));
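If a single read() doesn't hand back the whole document (for larger PDFs the data may arrive in several chunks), a minimal sketch of the chunk-collecting variant looks like this:
const PDFDocument = require('pdfkit');

const doc = new PDFDocument();
doc.text('Sample text', 100, 100);

// Collect every chunk the readable stream emits, then base64-encode the whole thing.
const chunks = [];
doc.on('data', chunk => chunks.push(chunk));
doc.on('end', () => {
  console.log(Buffer.concat(chunks).toString('base64'));
});
doc.end();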
I just made a module for this that you could probably use: js-base64-file.
const Base64File=require('js-base64-file');
const b64PDF=new Base64File;
const file='yourPDF.pdf';
const path=`${__dirname}/path/to/pdf/`;
const doc = new PDFDocument();
doc.addPage();
//save your PDF using the filename and path
//this will load and convert it
const data=b64PDF.loadSync(path,file);
console.log('Base64 PDF representation', data);
//you could also save a copy as base64 if you wanted, like so:
b64PDF.save(data,path,`copy-b64-${file}`);
It's a new module so my documentation isn't complete yet, but there is also an async method.
//this will load and convert if needed, asynchronously
b64PDF.load(
path,
file,
function(err,base64){
if(err){
//handle error here
process.exit(1);
}
console.log('ASYNC: you could send this PDF via ws or http to the browser now\n');
//or as above save it here
b64PDF.save(base64,path,`copy-async-${file}`);
}
);
I suppose I could add a convert-from-memory method too. If this doesn't suit your needs, you could submit a request on the base64 file repo.
Following Grant's answer, here is an alternative that returns a promise instead of writing to the Node response directly (which makes it easier to call outside of a router):
const PDFDocument = require('pdfkit');
const {Base64Encode} = require('base64-stream');
const toBase64 = doc => {
return new Promise((resolve, reject) => {
try {
const stream = doc.pipe(new Base64Encode());
let base64Value = '';
stream.on('data', chunk => {
base64Value += chunk;
});
stream.on('end', () => {
resolve(base64Value);
});
} catch (e) {
reject(e);
}
});
};
The caller should call doc.end() before or after calling this async method.
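A minimal usage sketch, assuming an Express handler with a res object in scope (the route itself is illustrative):
const doc = new PDFDocument();
doc.text('Hello, World!');
doc.end(); // end the document so the Base64Encode stream can finish

// Resolve to the base64 string and send it back to the client.
toBase64(doc).then(base64 => res.json({ pdf: base64 }));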
Related
I'm currently trying to upload an image to Supabase's Storage; this looks fairly simple from the docs:
const { data, error } = await supabase.storage
.from('avatars')
.upload('public/avatar1.png', avatarFile)
Unfortunately Supabase expects a File type.
In my API I have a url that points to the image I want to save, what's the best way for me to get the image at my URL as a File in Node.js?
I have tried this:
let response;
try {
// fetch here is from the isomorphic-unfetch package so I can use it server-side
response = await fetch('https://example.com/image.jpeg');
} catch (err) {
throw new Error(err);
}
let data = await response?.blob();
let metadata = {
type: 'image/jpeg',
};
let file = new File([data], 'test.jpg', metadata);
return file;
But I get a ReferenceError: File is not defined, which leads me to believe only the browser has access to creating a new File().
All I can find are answers about fs, which I think is Google getting confused. I don't think I can use fs to return a File type.
Any ideas?
So what you can do is send an HTTP request to the file:
const fs = require('fs');
const http = require('http'); // maybe https?
const fileStream = fs.createWriteStream('image.png');
const request = http.get('URL_HERE', function(response) {
response.pipe(fileStream);
});
The above code fetches the file from the URL and writes it to your server; then you need to read it and send it to the upload process.
const finalFile = fs.readFileSync( 'image.png', optionsObject );
Now you have your file contents; do your upload, and don't forget to remove the temporary file if it's no longer needed.
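A hedged sketch of that last step, assuming the supabase client from the question (whether your supabase-js version accepts a Node Buffer for upload may vary, so check its docs):
const fs = require('fs');

// Read the downloaded file and upload it, then clean up the temporary copy.
const finalFile = fs.readFileSync('image.png');

const { data, error } = await supabase.storage
  .from('avatars')
  .upload('public/avatar1.png', finalFile, { contentType: 'image/png' });

fs.unlinkSync('image.png'); // remove the temporary file once it's uploaded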
You can do something like this:
const fspromise = require('fs').promises;
let response;
try {
// fetch here is from the isomorphic-unfetch package so I can use it server-side
response = await fetch('https://example.com/image.jpeg');
} catch (err) {
throw new Error(err);
}
let data = await response?.blob();
let metadata = {
type: 'image/jpeg',
};
const file = blob2file(data);
function blob2file(blobData) {
const fd = new FormData();
fd.set('a', blobData);
return fd.get('a');
}
const { data, error } = await supabase.storage
.from('avatars')
.upload('public/avatar1.png', file)
After a lot of research, this isn't actually possible. I've tried a lot of npm packages that advertise being able to convert blobs to Files, but none of them seemed to work.
The only actual solution is to download the file as the other answers have suggested, but in my situation it just wasn't doable.
I'm using TypeScript + Node.js + the pdfkit library to create PDFs and verify that they're consistent.
However, when just creating the most basic PDF, consistency already fails. Here's my test.
import {readFileSync, createWriteStream} from "fs";
const PDFDocument = require('pdfkit');
const assert = require('assert').strict;
const fileName = '/tmp/tmp.pdf'
async function makeSimplePDF() {
return new Promise(resolve => {
const stream = createWriteStream(fileName);
const doc = new PDFDocument();
doc.pipe(stream);
doc.end();
stream.on('finish', resolve);
})
}
describe('test that pdfs are consistent', () => {
it('simple pdf test.', async () => {
await makeSimplePDF();
const data: Buffer = readFileSync(fileName);
await makeSimplePDF(); // make PDF again
const data2: Buffer = readFileSync(fileName);
assert.deepStrictEqual(data, data2); // fails!
});
});
Most of the values in the two Buffers are identical but a few of them are not. What's happening here?
I believe the bytes differ slightly because the creation time gets embedded in the PDF (PDFKit writes a CreationDate into the document metadata by default). When I used mockdate (https://www.npmjs.com/package/mockdate) to fix 'now', I ended up getting consistent Buffers.
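A minimal sketch of that approach, reusing makeSimplePDF and the test scaffolding from the question (mockdate's set/reset API is the only addition):
const MockDate = require('mockdate');

MockDate.set('2020-01-01T00:00:00.000Z'); // freeze Date.now() / new Date()
await makeSimplePDF();
const data = readFileSync(fileName);
await makeSimplePDF(); // make PDF again with the same frozen clock
const data2 = readFileSync(fileName);
MockDate.reset();

assert.deepStrictEqual(data, data2); // now passes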
I need to create a base64 string that I can send to a third-party API. I have the stream and the buffer. From the stream I am able to create an image, so there is no way the stream is corrupted. Here are the two variables:
var newJpeg = new Buffer(newData, "binary");
var fs = require('fs');
let Duplex = require('stream').Duplex;
let _updatedFileStream = new Duplex();
_updatedFileStream.push(newJpeg);
_updatedFileStream.push(null);
No matter whatever I try, I can not convert either of them in base64 string.
_updatedFileStream.toString('base64');
Buffer(newJpeg, 'base64');
Buffer(newData, 'base64');
None of the above works. Sometimes I get Uint8Array[arraySize] or a gibberish string. What am I doing wrong?
Example using promises (but could easily be adapted to other approaches):
return new Promise((resolve, reject) => {
let buffers = [];
let myStream = <...>;
myStream.on('data', (chunk) => { buffers.push(chunk); });
myStream.once('end', () => {
let buffer = Buffer.concat(buffers);
resolve(buffer.toString('base64'));
});
myStream.once('error', (err) => {
reject(err);
});
});
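As a follow-up note: if you already have the data in a Buffer (like newJpeg in the question), you can skip the stream plumbing entirely; a minimal sketch:
// Buffer.from replaces the deprecated new Buffer(); 'binary' is the latin1 encoding.
const newJpeg = Buffer.from(newData, 'binary');
const base64String = newJpeg.toString('base64');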
I am using the fill-pdf npm module to fill template PDFs; it creates a new file, which is read from disk and returned to the callback as a buffer. I have two files for which I do the same operation. I want to combine the two buffers to form a single PDF file that I can send back to the client. I tried different methods of buffer concatenation. The buffers can be concatenated using Buffer.concat, like:
var newBuffer = Buffer.concat([result_pdf.output, result_pdf_new.output]);
The size of the new buffer is the sum of the sizes of the input buffers. But when newBuffer is sent to the client as the response, it shows only the file mentioned last in the array.
res.type("application/pdf");
return res.send(buffer);
Any ideas?
As mentioned by @MechaCode, the creator has ended support for HummusJS.
So I would like to give you 2 solutions.
Using node-pdftk npm module
The following sample code uses the node-pdftk npm module to combine two PDF buffers seamlessly.
const pdftk = require('node-pdftk');
const fs = require('fs');
var pdfBuffer1 = fs.readFileSync("./pdf1.pdf");
var pdfBuffer2 = fs.readFileSync("./pdf2.pdf");
pdftk
.input([pdfBuffer1, pdfBuffer2])
.output()
.then(buf => {
let path = 'merged.pdf';
fs.open(path, 'w', function (err, fd) {
fs.write(fd, buf, 0, buf.length, null, function (err) {
fs.close(fd, function () {
console.log('wrote the file successfully');
});
});
});
});
The node-pdftk npm module requires you to install the PDFtk library. Some of you may find this overhead tedious, so I have another solution using the pdf-lib library.
Using pdf-lib npm module
const PDFDocument = require('pdf-lib').PDFDocument
const fs = require('fs');
var pdfBuffer1 = fs.readFileSync("./pdf1.pdf");
var pdfBuffer2 = fs.readFileSync("./pdf2.pdf");
var pdfsToMerge = [pdfBuffer1, pdfBuffer2]
const mergedPdf = await PDFDocument.create();
for (const pdfBytes of pdfsToMerge) {
const pdf = await PDFDocument.load(pdfBytes);
const copiedPages = await mergedPdf.copyPages(pdf, pdf.getPageIndices());
copiedPages.forEach((page) => {
mergedPdf.addPage(page);
});
}
const buf = await mergedPdf.save(); // Uint8Array
let path = 'merged.pdf';
fs.open(path, 'w', function (err, fd) {
fs.write(fd, buf, 0, buf.length, null, function (err) {
fs.close(fd, function () {
console.log('wrote the file successfully');
});
});
});
Personally I prefer to use pdf-lib npm module.
HummusJS supports combining PDFs using its appendPDFPagesFromPDF method
Example using streams to work with buffers:
const hummus = require('hummus');
const memoryStreams = require('memory-streams');
/**
* Concatenate two PDFs in Buffers
* @param {Buffer} firstBuffer
* @param {Buffer} secondBuffer
* @returns {Buffer} - a Buffer containing the concatenated PDFs
*/
const combinePDFBuffers = (firstBuffer, secondBuffer) => {
var outStream = new memoryStreams.WritableStream();
try {
var firstPDFStream = new hummus.PDFRStreamForBuffer(firstBuffer);
var secondPDFStream = new hummus.PDFRStreamForBuffer(secondBuffer);
var pdfWriter = hummus.createWriterToModify(firstPDFStream, new hummus.PDFStreamForResponse(outStream));
pdfWriter.appendPDFPagesFromPDF(secondPDFStream);
pdfWriter.end();
var newBuffer = outStream.toBuffer();
outStream.end();
return newBuffer;
}
catch(e){
outStream.end();
throw new Error('Error during PDF combination: ' + e.message);
}
};
combinePDFBuffers(PDFBuffer1, PDFBuffer2);
Here's what we use in our Express server to merge a list of PDF blobs.
const { PDFRStreamForBuffer, createWriterToModify, PDFStreamForResponse } = require('hummus');
const { WritableStream } = require('memory-streams');
// Merge the pages of the pdfBlobs (Javascript buffers) into a single PDF blob
const mergePdfs = pdfBlobs => {
if (pdfBlobs.length === 0) throw new Error('mergePdfs called with empty list of PDF blobs');
// This optimization is not necessary, but it avoids the churn down below
if (pdfBlobs.length === 1) return pdfBlobs[0];
// Adapted from: https://stackoverflow.com/questions/36766234/nodejs-merge-two-pdf-files-into-one-using-the-buffer-obtained-by-reading-them?answertab=active#tab-top
// Hummus is useful, but with poor interfaces -- E.g. createWriterToModify shouldn't require any PDF stream
// And Hummus has many Issues: https://github.com/galkahana/HummusJS/issues
const [firstPdfRStream, ...restPdfRStreams] = pdfBlobs.map(pdfBlob => new PDFRStreamForBuffer(pdfBlob));
const outStream = new WritableStream();
const pdfWriter = createWriterToModify(firstPdfRStream, new PDFStreamForResponse(outStream));
restPdfRStreams.forEach(pdfRStream => pdfWriter.appendPDFPagesFromPDF(pdfRStream));
pdfWriter.end();
outStream.end();
return outStream.toBuffer();
};
module.exports = exports = {
mergePdfs,
};
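A hedged usage sketch in an Express route (the route, file names, and module path are illustrative):
const fs = require('fs');
const { mergePdfs } = require('./mergePdfs'); // wherever the module above lives

app.get('/merged.pdf', (req, res) => {
  // Merge two PDFs read from disk and send the result back as one document.
  const merged = mergePdfs([
    fs.readFileSync('./pdf1.pdf'),
    fs.readFileSync('./pdf2.pdf'),
  ]);
  res.type('application/pdf');
  res.send(merged);
});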
I am using pdfkit on my node server, typically creating pdf files, and then uploading them to s3.
The problem is that the pdfkit examples pipe the PDF doc into a Node write stream, which writes the file to disk. I followed the example and it worked correctly; however, my requirement now is to pipe the PDF doc to a memory stream rather than save it to disk (I am uploading to S3 anyway).
I've followed some Node memory stream procedures, but none of them seem to work with the PDF pipe for me; I could only write strings to memory streams.
So my question is: How to pipe the pdf kit output to a memory stream (or something alike) and then read it as an object to upload to s3?
var fsStream = fs.createWriteStream(outputPath + fileName);
doc.pipe(fsStream);
An updated answer for 2020. There is no need to introduce a new memory stream because "PDFDocument instances are readable Node streams".
You can use the get-stream package to make it easy to wait for the document to finish before passing the result back to your caller.
https://www.npmjs.com/package/get-stream
const PDFDocument = require('pdfkit')
const getStream = require('get-stream')
const pdf = async () => {
const doc = new PDFDocument()
doc.text('Hello, World!')
doc.end()
return await getStream.buffer(doc)
}
// Caller could do this:
const pdfBuffer = await pdf()
const pdfBase64string = pdfBuffer.toString('base64')
You don't have to return a buffer if your needs are different. The get-stream readme offers other examples.
There's no need to use an intermediate memory stream¹ – just pipe the pdfkit output stream directly into an HTTP upload stream.
In my experience, the AWS SDK is garbage when it comes to working with streams, so I usually use request.
var upload = request({
method: 'PUT',
url: 'https://bucket.s3.amazonaws.com/doc.pdf',
aws: { bucket: 'bucket', key: ..., secret: ... }
});
doc.pipe(upload);
1 - in fact, it is usually undesirable to use a memory stream because that means buffering the entire thing in RAM, which is exactly what streams are supposed to avoid!
You could try something like this, and upload it to S3 inside the end event.
var pdfkit = require('pdfkit');
var MemoryStream = require('memorystream');
var doc = new pdfkit();
var memStream = new MemoryStream(null, {
readable : false
});
doc.pipe(memStream);
doc.on('end', function () {
var buffer = Buffer.concat(memStream.queue);
awsservice.putS3Object(buffer, fileName, fileType, folder).then(function () { }, reject);
})
A tweak of @bolav's answer worked for me when working with pdfmake rather than pdfkit. First you need to add memorystream to your project using npm or yarn.
const MemoryStream = require('memorystream');
const PdfPrinter = require('pdfmake');
const pdfPrinter = new PdfPrinter();
const docDef = {};
const pdfDoc = pdfPrinter.createPdfKitDocument(docDef);
const memStream = new MemoryStream(null, {readable: false});
const pdfDocStream = pdfDoc.pipe(memStream);
pdfDoc.end();
pdfDocStream.on('finish', () => {
console.log(Buffer.concat(memStream.queue));
});
My code to return a base64 for pdfkit:
import * as PDFDocument from 'pdfkit'
import getStream from 'get-stream'
const pdf = {
createPdf: async (text: string) => {
const doc = new PDFDocument()
doc.fontSize(10).text(text, 50, 50)
doc.end()
const data = await getStream.buffer(doc)
let b64 = Buffer.from(data).toString('base64')
return b64
}
}
export default pdf
Thanks to Troy's answer, mine worked with get-stream as well. The difference was that I did not convert it to a base64 string, but rather uploaded it to AWS S3 as a buffer.
Here is my code:
import PDFDocument from 'pdfkit'
import getStream from 'get-stream';
import { PutObjectCommand } from '@aws-sdk/client-s3';
import s3Client from 'your s3 config file';
const pdfGenerator = () => {
const doc = new PDFDocument();
doc.text('Hello, World!');
doc.end();
return doc;
}
const uploadFile = async () => {
const pdf = pdfGenerator();
const pdfBuffer = await getStream.buffer(pdf)
await s3Client.send(
new PutObjectCommand({
Bucket: 'bucket-name',
Key: 'filename.pdf',
Body: pdfBuffer,
ContentType: 'application/pdf',
})
);
}
uploadFile()