I'm using TypeScript + Node.js + the pdfkit library to create PDFs and verify that they're consistent.
However, even when creating the most basic PDF, the consistency check already fails. Here's my test:
import { readFileSync, createWriteStream } from "fs";
const PDFDocument = require('pdfkit');
const assert = require('assert').strict;

const fileName = '/tmp/tmp.pdf';

async function makeSimplePDF() {
  return new Promise(resolve => {
    const stream = createWriteStream(fileName);
    const doc = new PDFDocument();
    doc.pipe(stream);
    doc.end();
    stream.on('finish', resolve);
  });
}

describe('test that pdfs are consistent', () => {
  it('simple pdf test.', async () => {
    await makeSimplePDF();
    const data: Buffer = readFileSync(fileName);

    await makeSimplePDF(); // make the PDF again
    const data2: Buffer = readFileSync(fileName);

    assert.deepStrictEqual(data, data2); // fails!
  });
});
Most of the bytes in the two Buffers are identical, but a few are not. What's happening here?
I believe the bytes differ because the creation time ends up in the PDF output: PDFKit stamps the current date into the document metadata (the CreationDate in the Info dictionary), so two runs a moment apart produce slightly different bytes. When I used mockdate (https://www.npmjs.com/package/mockdate) to pin 'now' to a fixed value, I got identical Buffers.
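If pinning the clock in tests is acceptable, a minimal sketch of that approach (assuming the mockdate package linked above, Mocha-style beforeEach/afterEach hooks, and the makeSimplePDF helper from the test):
const MockDate = require('mockdate');

describe('test that pdfs are consistent', () => {
  beforeEach(() => MockDate.set('2020-01-01T00:00:00Z')); // pin Date.now() so CreationDate is stable
  afterEach(() => MockDate.reset());

  it('simple pdf test.', async () => {
    await makeSimplePDF();
    const data: Buffer = readFileSync(fileName);

    await makeSimplePDF();
    const data2: Buffer = readFileSync(fileName);

    assert.deepStrictEqual(data, data2); // passes with a fixed clock
  });
});
PDFKit also accepts an info option on the PDFDocument constructor, so passing a fixed CreationDate there (e.g. new PDFDocument({ info: { CreationDate: new Date(0) } })) may avoid mocking the clock entirely.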
In a Nuxt app I need to render a page with a lot of data displayed on a Google map, obtained from a 100 MB .jsonl file. I'm using fs.createReadStream inside asyncData() to parse the data and feed it to the Vue component. Since fs is a server-side-only module, the app errors when it attempts to render that page on the client.
I would like this specific page to be rendered exclusively with SSR so I can keep using fs in the Vue component.
I thought of using custom Express middleware to process the data, but that still results in downloading dozens of MB to the client, which is unacceptable. You can see how I request it with Axios in my example.
async asyncData({ $axios }) {
  const fs = require('fs');
  if (process.server) {
    console.log("Server");

    async function readData() {
      const DelimiterStream = require('delimiter-stream');
      const StringDecoder = require('string_decoder').StringDecoder;
      const decoder = new StringDecoder('utf8');

      let linestream = new DelimiterStream();
      let input = fs.createReadStream('/Users/Chibi/WebstormProjects/OPI/OPIExamen/static/stream.jsonl');

      return new Promise((resolve, reject) => {
        console.log("LETS GO");
        let data = [];

        linestream.on('data', (chunk) => {
          let parsed = JSON.parse(chunk);
          if (parsed.coordinates)
            data.push({
              coordinates: parsed.coordinates.coordinates,
              country: parsed.place && parsed.place.country_code
            });
        });

        linestream.on('end', () => {
          return resolve(data);
        });

        input.pipe(linestream);
      });
    }

    const items = await readData();
    return { items };
  } else {
    console.log("CLIENT");
    // asyncData has no component instance, so use the destructured $axios and await the request
    const items = await $axios.$get('http://localhost:3000/api/stream');
    return { items };
  }
}
Even when it renders correctly, Nuxt shows an error overlay complaining about the issue.
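One approach that may fit here (a sketch, not an accepted answer from this thread): do the parsing in a Nuxt serverMiddleware, so the client only ever downloads the extracted points rather than the raw 100 MB file, and fs never reaches the client bundle. The /api/stream path and the file location below are assumptions based on the question.
// nuxt.config.js
export default {
  serverMiddleware: [
    { path: '/api/stream', handler: '~/server-middleware/stream.js' }
  ]
}

// server-middleware/stream.js
const fs = require('fs');
const DelimiterStream = require('delimiter-stream');

module.exports = function (req, res) {
  const linestream = new DelimiterStream();
  const input = fs.createReadStream('static/stream.jsonl'); // assumed location

  const data = [];
  linestream.on('data', (chunk) => {
    const parsed = JSON.parse(chunk);
    if (parsed.coordinates) {
      data.push({
        coordinates: parsed.coordinates.coordinates,
        country: parsed.place && parsed.place.country_code
      });
    }
  });

  linestream.on('end', () => {
    // Only the extracted points go over the wire, not the raw .jsonl
    res.setHeader('Content-Type', 'application/json');
    res.end(JSON.stringify(data));
  });

  input.pipe(linestream);
};
With that in place, asyncData can simply do const items = await $axios.$get('/api/stream') on both server and client.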
I've written a script in Node using puppeteer to fetch different names and the links to their profiles from a webpage. The script fetches them correctly.
What I wish to do now is write the data to a CSV file, but I can't figure out how. I have come across many tutorials describing this, but most of them are either incomplete or use libraries which are no longer maintained.
This is what I've written so far:
const puppeteer = require('puppeteer');

const link = "https://www.ak-brandenburg.de/bauherren/architekten_architektinnen";

(async () => {
  const browser = await puppeteer.launch();
  const [page] = await browser.pages();
  await page.goto(link);

  const listItem = await page.evaluate(() =>
    [...document.querySelectorAll('.views-table tr')].map(item => ({
      name: item.querySelector('.views-field-title a').innerText.trim(),
      profilelink: "https://www.ak-brandenburg.de" + item.querySelector('.views-field-title a').getAttribute("href"),
    }))
  );
  console.log(listItem);

  await browser.close();
})();
How can I write the data to a CSV file?
There is a far easier way to achieve this. If you use the json2csv library, you can write the data to a CSV file very easily.
Working script:
const fs = require('fs');
const Json2csv = require('json2csv').Parser;
const puppeteer = require('puppeteer');

const link = "https://www.ak-brandenburg.de/bauherren/architekten_architektinnen";

(async () => {
  const browser = await puppeteer.launch();
  const [page] = await browser.pages();
  await page.goto(link);

  const listItem = await page.evaluate(() =>
    [...document.querySelectorAll('.views-table tbody tr')].map(item => ({
      name: item.querySelector('.views-field-title a').innerText.trim(),
      profilelink: "https://www.ak-brandenburg.de" + item.querySelector('.views-field-title a').getAttribute("href"),
    }))
  );

  const j2csv = new Json2csv({ fields: ['name', 'profilelink'] });
  const csv = j2csv.parse(listItem);
  fs.writeFileSync('./output.csv', csv, 'utf-8');

  await browser.close();
})();
I haven't worked with puppeteer, but I have created CSV files in my Node project.
Build your CSV content as a string (for example by joining an array of rows, e.g. csvData).
Then use fs.writeFile to save it:
fs.writeFile(`path/to/csv/${csvName}.csv`, csvData, 'utf8', function (err) {
  if (err) {
    console.log('error', err);
  }
  res.send({
    url: `path/to/csv/${csvName}.csv`
  });
});
Only use res.send if you want to send the CSV file from the server to the client.
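For completeness, a rough sketch of how csvData could be built from the listItem array in the question (column names taken from the question's objects; no extra library):
const fs = require('fs');

// Naive CSV assembly: a header row plus one quoted line per scraped item.
const rows = listItem.map(item => `"${item.name}","${item.profilelink}"`);
const csvData = ['name,profilelink', ...rows].join('\n');

fs.writeFileSync('./output.csv', csvData, 'utf8');
This naive quoting breaks if a value contains a double quote or a newline, which is why a dedicated library such as json2csv (used in the answer above) is usually the safer choice.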
Given a function that parses incoming streams:
async onData(stream, callback) {
  const parsed = await simpleParser(stream);

  // Code handling the parsed stream here
  // ...

  return callback();
}
I'm looking for a simple and safe way to 'clone' that stream, so I can save it to a file for debugging purposes without affecting the rest of the code. Is this possible?
Same question in fake code: I'm trying to do something like this. Obviously, this is a made-up example and doesn't work.
const fs = require('fs');
const wstream = fs.createWriteStream('debug.log');

async onData(stream, callback) {
  const debugStream = stream.clone(stream); // Fake code
  wstream.write(debugStream);

  const parsed = await simpleParser(stream);

  // Code handling the parsed stream here
  // ...

  wstream.end();
  return callback();
}
No, you can't clone a readable stream without consuming it. However, you can pipe it into two PassThrough streams: one for creating the file and the other to act as the 'clone'.
Code is below:
const { Readable, PassThrough } = require('stream');

const s = new Readable();
s.push('beep');
s.push(null);

const stream1 = s.pipe(new PassThrough());
const stream2 = s.pipe(new PassThrough());

// Use stream1 for creating the file, and use stream2 as the 'clone' of s.
// Here they are just printed out for a quick demonstration:
stream1.pipe(process.stdout);
stream2.pipe(process.stdout);
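Applied to the onData function from the question, a sketch of the same idea (simpleParser and the callback signature come from the question; the debug.log name is just for illustration) might look like this:
const fs = require('fs');
const { PassThrough } = require('stream');

async onData(stream, callback) {
  // Fan the incoming stream out into two copies.
  const debugCopy = new PassThrough();
  const parserCopy = new PassThrough();
  stream.pipe(debugCopy);
  stream.pipe(parserCopy);

  // One copy goes to a debug file, the other to the parser.
  debugCopy.pipe(fs.createWriteStream('debug.log'));
  const parsed = await simpleParser(parserCopy);

  // Code handling the parsed stream here
  // ...

  return callback();
}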
I've tried to implement the solution provided by @jiajianrong but was struggling to get it to work with a createReadStream, because the Readable throws an error when I try to push the read stream directly, like this:
s.push(createReadStream())
To solve this issue I used a helper function to transform the stream into a buffer.
function streamToBuffer(stream: any) {
  const chunks: Buffer[] = []
  return new Promise((resolve, reject) => {
    stream.on('data', (chunk: any) => chunks.push(Buffer.from(chunk)))
    stream.on('error', (err: any) => reject(err))
    stream.on('end', () => resolve(Buffer.concat(chunks)))
  })
}
Below is the solution I found, using one pipe to generate a hash of the stream and the other pipe to upload the stream to cloud storage.
import stream from 'stream'
const Readable = require('stream').Readable

const s = new Readable()
s.push(await streamToBuffer(createReadStream()))
s.push(null)

const fileStreamForHash = s.pipe(new stream.PassThrough())
const fileStreamForUpload = s.pipe(new stream.PassThrough())

// Generating the file hash
const fileHash = await getHashFromStream(fileStreamForHash)

// Uploading the stream to cloud storage
await BlobStorage.upload(fileName, fileStreamForUpload)
My answer is mostly based on the answer of @jiajianrong.
I'm searching for a way to get the base64 string representation of a PDFKit document. I can't find the right way to do it...
Something like this would be extremely convenient:
var doc = new PDFDocument();
doc.addPage();
doc.outputBase64(function (err, pdfAsText) {
  console.log('Base64 PDF representation', pdfAsText);
});
I already tried the blob-stream lib, but it doesn't work on a Node server (it says that Blob doesn't exist).
Thanks for your help!
I was in a similar predicament, wanting to generate PDFs on the fly without having temporary files lying around. My context is a Node.js API layer (using Express) that a React frontend talks to.
Ironically, a similar discussion for Meteor helped me get where I needed to be. Based on that, my solution resembles:
const PDFDocument = require('pdfkit');
const { Base64Encode } = require('base64-stream');

// ...

var doc = new PDFDocument();
// write to PDF

var finalString = ''; // contains the base64 string
var stream = doc.pipe(new Base64Encode());

doc.end(); // will trigger the stream to end

stream.on('data', function (chunk) {
  finalString += chunk;
});

stream.on('end', function () {
  // the stream is at its end, so push the resulting base64 string to the response
  res.json(finalString);
});
A synchronous option, not (yet) present in the documentation:
const doc = new PDFDocument();
doc.text("Sample text", 100, 100);
doc.end();
const data = doc.read();
console.log(data.toString("base64"));
I just made a module for this that you could probably use: js-base64-file.
const Base64File = require('js-base64-file');
const b64PDF = new Base64File();

const file = 'yourPDF.pdf';
const path = `${__dirname}/path/to/pdf/`;

const doc = new PDFDocument();
doc.addPage();
// save your PDF to disk first, using the filename and path above

// this will load and convert
const data = b64PDF.loadSync(path, file);
console.log('Base64 PDF representation', data);

// you could also save a copy as base64 if you wanted, like so:
b64PDF.save(data, path, `copy-b64-${file}`);
It's a new module so my documentation isn't complete yet, but there is also an async method.
// this will load and convert if needed, asynchronously
b64PDF.load(
  path,
  file,
  function (err, base64) {
    if (err) {
      // handle error here
      process.exit(1);
    }
    console.log('ASYNC: you could send this PDF via ws or http to the browser now\n');
    // or, as above, save it here
    b64PDF.save(base64, path, `copy-async-${file}`);
  }
);
I suppose I could add a convert-from-memory method too. If this doesn't suit your needs, you could submit a request on the js-base64-file repo.
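If you would rather skip the intermediate file entirely, a minimal sketch using nothing but the PDFKit document's own readable stream (no extra library; docToBase64 is just an illustrative helper name) could be:
const PDFDocument = require('pdfkit');

// Collect the document's chunks in memory and hand back a base64 string.
function docToBase64(doc, callback) {
  const chunks = [];
  doc.on('data', chunk => chunks.push(chunk));
  doc.on('end', () => callback(null, Buffer.concat(chunks).toString('base64')));
  doc.on('error', callback);
}

const doc = new PDFDocument();
doc.text('Hello, World!');

docToBase64(doc, (err, base64) => {
  if (err) throw err;
  console.log('Base64 PDF representation', base64);
});

doc.end();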
Following Grant's answer, here is an alternative that does not use the Node response but a promise (to ease calling it outside of a router):
const PDFDocument = require('pdfkit');
const { Base64Encode } = require('base64-stream');

const toBase64 = doc => {
  return new Promise((resolve, reject) => {
    try {
      const stream = doc.pipe(new Base64Encode());

      let base64Value = '';
      stream.on('data', chunk => {
        base64Value += chunk;
      });

      stream.on('end', () => {
        resolve(base64Value);
      });
    } catch (e) {
      reject(e);
    }
  });
};
The caller should call doc.end() before or after invoking this async method.
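For example, usage inside an async function might look like this (a sketch; the document content is arbitrary):
const doc = new PDFDocument();
doc.text('Hello, World!');

const base64Promise = toBase64(doc); // attaches the Base64Encode pipe
doc.end();                           // finish the document so the stream can end

const pdfAsBase64 = await base64Promise;
console.log(pdfAsBase64.slice(0, 12)); // begins with the base64 encoding of "%PDF-"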
I am using pdfkit on my Node server, typically creating PDF files and then uploading them to S3.
The problem is that the pdfkit examples pipe the PDF doc into a Node write stream, which writes the file to disk. I followed the example and it worked correctly; however, my requirement now is to pipe the PDF doc to a memory stream rather than save it on disk (I am uploading to S3 anyway).
I've followed several Node memory-stream approaches, but none of them seem to work with the PDF pipe; I could only write strings to the memory streams.
So my question is: how do I pipe the pdfkit output to a memory stream (or something alike) and then read it as an object to upload to S3?
var fsStream = fs.createWriteStream(outputPath + fileName);
doc.pipe(fsStream);
An updated answer for 2020. There is no need to introduce a new memory stream because "PDFDocument instances are readable Node streams".
You can use the get-stream package to make it easy to wait for the document to finish before passing the result back to your caller.
https://www.npmjs.com/package/get-stream
const PDFDocument = require('pdfkit')
const getStream = require('get-stream')

const pdf = async () => {
  const doc = new PDFDocument()
  doc.text('Hello, World!')
  doc.end()
  return await getStream.buffer(doc)
}

// Caller could do this:
const pdfBuffer = await pdf()
const pdfBase64string = pdfBuffer.toString('base64')
You don't have to return a buffer if your needs are different. The get-stream readme offers other examples.
There's no need to use an intermediate memory stream [1] – just pipe the pdfkit output stream directly into an HTTP upload stream.
In my experience, the AWS SDK is garbage when it comes to working with streams, so I usually use request.
var upload = request({
  method: 'PUT',
  url: 'https://bucket.s3.amazonaws.com/doc.pdf',
  aws: { bucket: 'bucket', key: ..., secret: ... }
});

doc.pipe(upload);
[1] In fact, it is usually undesirable to use a memory stream, because that means buffering the entire thing in RAM, which is exactly what streams are supposed to avoid!
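Note that the request package has since been deprecated. If you prefer to stay within the AWS SDK, a sketch using the v3 @aws-sdk/lib-storage Upload helper (which accepts a stream body; the region, bucket, and key below are placeholders) might look like this:
const PDFDocument = require('pdfkit');
const { S3Client } = require('@aws-sdk/client-s3');
const { Upload } = require('@aws-sdk/lib-storage');

const doc = new PDFDocument();
doc.text('Hello, World!');

// Stream the document straight to S3 without buffering it all in memory.
const upload = new Upload({
  client: new S3Client({ region: 'us-east-1' }),
  params: {
    Bucket: 'bucket',
    Key: 'doc.pdf',
    Body: doc,
    ContentType: 'application/pdf',
  },
});

doc.end();
upload.done().then(() => console.log('uploaded'));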
You could try something like this, and upload it to S3 inside the end event.
var doc = new pdfkit();

var MemoryStream = require('memorystream');
var memStream = new MemoryStream(null, {
  readable: false
});

doc.pipe(memStream);

doc.on('end', function () {
  var buffer = Buffer.concat(memStream.queue);
  awsservice.putS3Object(buffer, fileName, fileType, folder).then(function () { }, reject);
});
A tweak of @bolav's answer worked for me when working with pdfmake rather than pdfkit. First you need to add memorystream to your project using npm or yarn.
const MemoryStream = require('memorystream');
const PdfPrinter = require('pdfmake');

const pdfPrinter = new PdfPrinter();
const docDef = {};
const pdfDoc = pdfPrinter.createPdfKitDocument(docDef);

const memStream = new MemoryStream(null, { readable: false });
const pdfDocStream = pdfDoc.pipe(memStream);

pdfDoc.end();

pdfDocStream.on('finish', () => {
  console.log(Buffer.concat(memStream.queue));
});
My code to return a base64 for pdfkit:
import * as PDFDocument from 'pdfkit'
import getStream from 'get-stream'

const pdf = {
  createPdf: async (text: string) => {
    const doc = new PDFDocument()
    doc.fontSize(10).text(text, 50, 50)
    doc.end()

    const data = await getStream.buffer(doc)
    let b64 = Buffer.from(data).toString('base64')
    return b64
  }
}

export default pdf
Thanks to Troy's answer, mine worked with get-stream as well. The difference is that I did not convert it to a base64 string, but rather uploaded it to AWS S3 as a buffer.
Here is my code:
import PDFDocument from 'pdfkit'
import getStream from 'get-stream';
import { PutObjectCommand } from '@aws-sdk/client-s3';
import s3Client from 'your s3 config file';

const pdfGenerator = () => {
  const doc = new PDFDocument();
  doc.text('Hello, World!');
  doc.end();
  return doc;
}

const uploadFile = async () => {
  const pdf = pdfGenerator();
  const pdfBuffer = await getStream.buffer(pdf);

  await s3Client.send(
    new PutObjectCommand({
      Bucket: 'bucket-name',
      Key: 'filename.pdf',
      Body: pdfBuffer,
      ContentType: 'application/pdf',
    })
  );
}

uploadFile()