Node js, piping pdfkit to a memory stream - node.js

I am using pdfkit on my node server, typically creating pdf files, and then uploading them to s3.
The problem is that pdfkit examples pipe the pdf doc into a node write stream, which writes the file to the disk, I followed the example and worked correctly, however my requirement now is to pipe the pdf doc to a memory stream rather than save it on the disk (I am uploading to s3 anyway).
I've followed some node memory streams procedures but none of them seem to work with pdf pipe with me, I could just write strings to memory streams.
So my question is: How to pipe the pdf kit output to a memory stream (or something alike) and then read it as an object to upload to s3?
var fsStream = fs.createWriteStream(outputPath + fileName);
doc.pipe(fsStream);

An updated answer for 2020. There is no need to introduce a new memory stream because "PDFDocument instances are readable Node streams".
You can use the get-stream package to make it easy to wait for the document to finish before passing the result back to your caller.
https://www.npmjs.com/package/get-stream
const PDFDocument = require('pdfkit')
const getStream = require('get-stream')
const pdf = () => {
const doc = new PDFDocument()
doc.text('Hello, World!')
doc.end()
return await getStream.buffer(doc)
}
// Caller could do this:
const pdfBuffer = await pdf()
const pdfBase64string = pdfBuffer.toString('base64')
You don't have to return a buffer if your needs are different. The get-stream readme offers other examples.

There's no need to use an intermediate memory stream1 – just pipe the pdfkit output stream directly into a HTTP upload stream.
In my experience, the AWS SDK is garbage when it comes to working with streams, so I usually use request.
var upload = request({
method: 'PUT',
url: 'https://bucket.s3.amazonaws.com/doc.pdf',
aws: { bucket: 'bucket', key: ..., secret: ... }
});
doc.pipe(upload);
1 - in fact, it is usually undesirable to use a memory stream because that means buffering the entire thing in RAM, which is exactly what streams are supposed to avoid!

You could try something like this, and upload it to S3 inside the end event.
var doc = new pdfkit();
var MemoryStream = require('memorystream');
var memStream = new MemoryStream(null, {
readable : false
});
doc.pipe(memStream);
doc.on('end', function () {
var buffer = Buffer.concat(memStream.queue);
awsservice.putS3Object(buffer, fileName, fileType, folder).then(function () { }, reject);
})

A tweak of #bolav's answer worked for me trying to work with pdfmake and not pdfkit. First you need to have memorystream added to your project using npm or yarn.
const MemoryStream = require('memorystream');
const PdfPrinter = require('pdfmake');
const pdfPrinter = new PdfPrinter();
const docDef = {};
const pdfDoc = pdfPrinter.createPdfKitDocument(docDef);
const memStream = new MemoryStream(null, {readable: false});
const pdfDocStream = pdfDoc.pipe(memStream);
pdfDoc.end();
pdfDocStream.on('finish', () => {
console.log(Buffer.concat(memStream.queue);
});

My code to return a base64 for pdfkit:
import * as PDFDocument from 'pdfkit'
import getStream from 'get-stream'
const pdf = {
createPdf: async (text: string) => {
const doc = new PDFDocument()
doc.fontSize(10).text(text, 50, 50)
doc.end()
const data = await getStream.buffer(doc)
let b64 = Buffer.from(data).toString('base64')
return b64
}
}
export default pdf

Thanks to Troy's answer, mine worked with get-stream as well. The difference was I did not convert it to base64string, but rather uploaded it to AWS S3 as a buffer.
Here is my code:
import PDFDocument from 'pdfkit'
import getStream from 'get-stream';
import s3Client from 'your s3 config file';
const pdfGenerator = () => {
const doc = new PDFDocument();
doc.text('Hello, World!');
doc.end();
return doc;
}
const uploadFile = async () => {
const pdf = pdfGenerator();
const pdfBuffer = await getStream.buffer(pdf)
await s3Client.send(
new PutObjectCommand({
Bucket: 'bucket-name',
Key: 'filename.pdf',
Body: pdfBuffer,
ContentType: 'application/pdf',
})
);
}
uploadFile()

Related

How can i save a file i download using fetch with fs

I try downloading files with the fetch() function from github.
Then i try to save the fetched file Stream as a file with the fs-module.
When doing it, i get this error:
TypeError [ERR_INVALID_ARG_TYPE]: The "transform.writable" property must be an instance of WritableStream. Received an instance of WriteStream
My problem is, that i don't know the difference between WriteStream and WritableStream or how to convert them.
This is the code i run:
async function downloadFile(link, filename = "download") {
var response = await fetch(link);
var body = await response.body;
var filepath = "./" + filename;
var download_write_stream = fs.createWriteStream(filepath);
console.log(download_write_stream.writable);
await body.pipeTo(download_write_stream);
}
Node.js: v18.7.0
You can use Readable.fromWeb to convert body, which is a ReadableStream from the web streams API, into a NodeJS Readable stream that can be used with the fs methods.
Note that readable.pipe returns another stream instantly. To wait for it to finish, you can use the promise version of stream.finished to convert it into a Promise, or else you could add listeners for the 'finish' and 'error' events to detect success or failure.
const fs = require('fs');
const { Readable } = require('stream');
const { finished } = require('stream/promises');
async function downloadFile(link, filepath = './download') {
const response = await fetch(link);
const body = Readable.fromWeb(response.body);
const download_write_stream = fs.createWriteStream(filepath);
await finished(body.pipe(download_write_stream));
}
Good question. Web streams are something new, and they are different way of handling streams. WritableStream tells us that we can create WritableStreams as follows:
import {
WritableStream
} from 'node:stream/web';
const stream = new WritableStream({
write(chunk) {
console.log(chunk);
}
});
Then, you could create a custom stream that writes each chunk to disk. An easy way could be:
const download_write_stream = fs.createWriteStream('./the_path');
const stream = new WritableStream({
write(chunk) {
download_write_stream.write(chunk);
},
});
async function downloadFile(link, filename = 'download') {
const response = await fetch(link);
const body = await response.body;
await body.pipeTo(stream);
}

Node.js: new File from fetched PNG URL

I'm currently trying to upload an image to Supabase's Storage, this looks fairly simple from the docs
const { data, error } = await supabase.storage
.from('avatars')
.upload('public/avatar1.png', avatarFile)
Unfortunately Supabase expects a File type.
In my API I have a url that points to the image I want to save, what's the best way for me to get the image at my URL as a File in Node.js?
I have tried this:
let response;
try {
// fetch here is from the isomorphic-unfetch package so I can use it sever-side
response = await fetch('https://example.com/image.jpeg');
} catch (err) {
throw new Error(err);
}
let data = await response?.blob();
let metadata = {
type: 'image/jpeg',
};
let file = new File([data], 'test.jpg', metadata);
return file;
But I get a ReferenceError: File is not defined, which leads me to believe only the browser has access to creating a new File().
All I can find are answers about fs, which I think is Google getting confused. I don't think I can use fs to return a File type.
Any ideas?
So what you can do is: send an HTTP request to the file
const fs = require('fs');
const http = require('http'); // maybe https?
const fileStream = fs.createWriteStream('image.png');
const request = http.get('URL_HERE', function(response) {
response.pipe(fileStream);
});
The above code fetches and writes the file from the URL to your server, and then you need to read it and send it to the upload process.
const finalFile = fs.readFileSync( 'image.png', optionsObject );
And now you have your file object do your upload, then don't forget to remove it if not needed anymore.
You can do something like this:
const fspromise = require('fs').promises;
let response;
try {
// fetch here is from the isomorphic-unfetch package so I can use it sever-side
response = await fetch('https://example.com/image.jpeg');
} catch (err) {
throw new Error(err);
}
let data = await response?.blob();
let metadata = {
type: 'image/jpeg',
};
const file = blob2file(data);
function blob2file(blobData) {
const fd = new FormData();
fd.set('a', blobData);
return fd.get('a');
}
const { data, error } = await supabase.storage
.from('avatars')
.upload('public/avatar1.png', file)
After a lot of research, this isn't actually possible. I've tried a lot of npm packages that advertise being able to convert blobs to Files, but none of them seemed to work.
The only actual solution is to download the file as the other answers have suggested, but in my situation it just wasn't doable.

Two rendered empty PDFs are not identical

I'm using TypeScript + Node.js + the pdfkit library to create PDFs and verify that they're consistent.
However, when just creating the most basic PDF, consistency already fails. Here's my test.
import {readFileSync, createWriteStream} from "fs";
const PDFDocument = require('pdfkit');
const assert = require('assert').strict;
const fileName = '/tmp/tmp.pdf'
async function makeSimplePDF() {
return new Promise(resolve => {
const stream = createWriteStream(fileName);
const doc = new PDFDocument();
doc.pipe(stream);
doc.end();
stream.on('finish', resolve);
})
}
describe('test that pdfs are consistent', () => {
it('simple pdf test.', async () => {
await makeSimplePDF();
const data: Buffer = readFileSync(fileName);
await makeSimplePDF(); // make PDF again
const data2: Buffer = readFileSync(fileName);
assert.deepStrictEqual(data, data2); // fails!
});
});
Most of the values in the two Buffers are identical but a few of them are not. What's happening here?
I believe that the bytes may be slightly different due to the creation time being factored into the Buffer somehow. When I used mockdate(https://www.npmjs.com/package/mockdate) to fix 'now', I ended up getting consistent Buffers.

Node.js copy a stream into a file without consuming

Given a function parses incoming streams:
async onData(stream, callback) {
const parsed = await simpleParser(stream)
// Code handling parsed stream here
// ...
return callback()
}
I'm looking for a simple and safe way to 'clone' that stream, so I can save it to a file for debugging purposes, without affecting the code. Is this possible?
Same question in fake code: I'm trying to do something like this. Obviously, this is a made up example and doesn't work.
const fs = require('fs')
const wstream = fs.createWriteStream('debug.log')
async onData(stream, callback) {
const debugStream = stream.clone(stream) // Fake code
wstream.write(debugStream)
const parsed = await simpleParser(stream)
// Code handling parsed stream here
// ...
wstream.end()
return callback()
}
No you can't clone a readable stream without consuming. However, you can pipe it twice, one for creating file and the other for 'clone'.
Code is below:
let Readable = require('stream').Readable;
var stream = require('stream')
var s = new Readable()
s.push('beep')
s.push(null)
var stream1 = s.pipe(new stream.PassThrough())
var stream2 = s.pipe(new stream.PassThrough())
// here use stream1 for creating file, and use stream2 just like s' clone stream
// I just print them out for a quick show
stream1.pipe(process.stdout)
stream2.pipe(process.stdout)
I've tried to implement the solution provided by #jiajianrong but was struggling to get it work with a createReadStream, because the Readable throws an error when I try to push the createReadStream directly. Like:
s.push(createReadStream())
To solve this issue I have used a helper function to transform the stream into a buffer.
function streamToBuffer (stream: any) {
const chunks: Buffer[] = []
return new Promise((resolve, reject) => {
stream.on('data', (chunk: any) => chunks.push(Buffer.from(chunk)))
stream.on('error', (err: any) => reject(err))
stream.on('end', () => resolve(Buffer.concat(chunks)))
})
}
Below the solution I have found using one pipe to generate a hash of the stream and the other pipe to upload the stream to a cloud storage.
import stream from 'stream'
const Readable = require('stream').Readable
const s = new Readable()
s.push(await streamToBuffer(createReadStream()))
s.push(null)
const fileStreamForHash = s.pipe(new stream.PassThrough())
const fileStreamForUpload = s.pipe(new stream.PassThrough())
// Generating file hash
const fileHash = await getHashFromStream(fileStreamForHash)
// Uploading stream to cloud storage
await BlobStorage.upload(fileName, fileStreamForUpload)
My answer is mostly based on the answer of jiajianrong.

Get PDFKit as base64 string

I'm searching a way to get the base64 string representation of a PDFKit document. I cant' find the right way to do it...
Something like this would be extremely convenient.
var doc = new PDFDocument();
doc.addPage();
doc.outputBase64(function (err, pdfAsText) {
console.log('Base64 PDF representation', pdfAsText);
});
I already tried with blob-stream lib, but it doesn't work on a node server (It says that Blob doesn't exist).
Thanks for your help!
I was in a similar predicament, wanting to generate PDF on the fly without having temporary files lying around. My context is a NodeJS API layer (using Express) which is interacted with via a React frontend.
Ironically, a similar discussion for Meteor helped me get to where I needed. Based on that, my solution resembles:
const PDFDocument = require('pdfkit');
const { Base64Encode } = require('base64-stream');
// ...
var doc = new PDFDocument();
// write to PDF
var finalString = ''; // contains the base64 string
var stream = doc.pipe(new Base64Encode());
doc.end(); // will trigger the stream to end
stream.on('data', function(chunk) {
finalString += chunk;
});
stream.on('end', function() {
// the stream is at its end, so push the resulting base64 string to the response
res.json(finalString);
});
Synchronous option not (yet) present in the documentation
const doc = new PDFDocument();
doc.text("Sample text", 100, 100);
doc.end();
const data = doc.read();
console.log(data.toString("base64"));
I just made a module for this you could probably use. js-base64-file
const Base64File=require('js-base64-file');
const b64PDF=new Base64File;
const file='yourPDF.pdf';
const path=`${__dirname}/path/to/pdf/`;
const doc = new PDFDocument();
doc.addPage();
//save you PDF using the filename and path
//this will load and convert
const data=b64PDF.loadSync(path,file);
console.log('Base64 PDF representation', pdfAsText);
//you could also save a copy as base 64 if you wanted like so :
b64PDF.save(data,path,`copy-b64-${file}`);
It's a new module so my documentation isn't complete yet, but there is also an async method.
//this will load and convert if needed asynchriouniously
b64PDF.load(
path,
file,
function(err,base64){
if(err){
//handle error here
process.exit(1);
}
console.log('ASYNC: you could send this PDF via ws or http to the browser now\n');
//or as above save it here
b64PDF.save(base64,path,`copy-async-${file}`);
}
);
I suppose I could add in a convert from memory method too. If this doesn't suit your needs you could submit a request on the base64 file repo
Following Grant's answer, here is an alternative without using node response but a promise (to ease the call outside of a router):
const PDFDocument = require('pdfkit');
const {Base64Encode} = require('base64-stream');
const toBase64 = doc => {
return new Promise((resolve, reject) => {
try {
const stream = doc.pipe(new Base64Encode());
let base64Value = '';
stream.on('data', chunk => {
base64Value += chunk;
});
stream.on('end', () => {
resolve(base64Value);
});
} catch (e) {
reject(e);
}
});
};
The callee should use doc.end() before or after calling this async method.

Resources