Feeding PDF generated from pdfkit as input to pdf-lib for merging - node.js

I am trying to pass a PDF file generated with pdfkit to pdf-lib for merging, inside an async function. My project uses Sails.js "^1.2.3" on Node "^12.16", with pdfkit "^0.11.0" and pdf-lib "^1.9.0".
This is the code:
const textbytes=fs.readFileSync(textfile);
var bytes1 = new Uint8Array(textbytes);
const textdoc = await PDFDocumentFactory.load(bytes1)
The error I am getting is:
UnhandledPromiseRejectionWarning: Error: Failed to parse PDF document (line:0 col:0 offset=0): No PDF header found
Please help me with this issue.

You don't need this line:
var bytes1 = new Uint8Array(textbytes);
Reading the file and passing textbytes directly is enough.
I use this function to merge an array of pdfBytes to make one big PDF file:
const pdf = require('pdf-lib');
// Merge an array of PDF byte arrays into one PDF and return its bytes.
async function mergePdfs(pdfsToMerge) {
  const mergedPdf = await pdf.PDFDocument.create();
  for (const pdfCopyDoc of pdfsToMerge) {
    const pdfDoc = await pdf.PDFDocument.load(pdfCopyDoc);
    // Copy every page of the source document into the merged document.
    const copiedPages = await mergedPdf.copyPages(pdfDoc, pdfDoc.getPageIndices());
    copiedPages.forEach((page) => {
      mergedPdf.addPage(page);
    });
  }
  const mergedPdfFile = await mergedPdf.save();
  return mergedPdfFile;
}
After adding the function mergePdfs(pdfsToMerge), you can use it like this, passing an array of the PDF bytes you want to merge:
const textbytes = fs.readFileSync(textfile);
let finalPdf = await mergePdfs([textbytes]);
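The "No PDF header found" error in the question means the parser could not find a PDF header in the bytes it received. One possible cause, which the question does not confirm, is that the file is read before pdfkit has finished writing it. Below is a minimal sketch of two checks that use only Node's built-in modules. The helper names whenFinished and hasPdfHeader are hypothetical.

```javascript
const fs = require('fs');

// Resolve once the writable stream has emitted 'finish', i.e. all data
// passed to it has been handed off to the underlying file.
function whenFinished(stream) {
  return new Promise((resolve, reject) => {
    stream.on('finish', resolve);
    stream.on('error', reject);
  });
}

// True if the bytes begin with the "%PDF-" marker that opens a PDF file.
function hasPdfHeader(bytes) {
  return Buffer.from(bytes).subarray(0, 5).toString('latin1') === '%PDF-';
}

// Hypothetical usage with pdfkit (doc is a pdfkit document):
//   const out = fs.createWriteStream(textfile);
//   doc.pipe(out);
//   doc.end();
//   await whenFinished(out);
//   const textbytes = fs.readFileSync(textfile);
//   if (!hasPdfHeader(textbytes)) throw new Error('Not a PDF: ' + textfile);
```

If hasPdfHeader returns false for the bytes you are about to load, the problem lies in how the file was produced or read rather than in the merge function.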

Related

RangeError [ERR_FS_FILE_TOO_LARGE]: File size (3472064213) is greater than 2 GB

const entries = require("../rootsiir.sql")
const objId = 9
const filteredObj = entries.filter((obj) => obj.tc === objId)[0]
const name = filteredObj.name
const id = filteredObj.id
console.log(name)
console.log(id)
I have this code, which searches a JSON file.
But I am getting this error:
RangeError [ERR_FS_FILE_TOO_LARGE]: File size (3472064213) is greater than 2 GB
My JSON file is 2.5 GB. How can I fix that?
Node.js doesn't support require()ing files over 2 GB (the limit named in the error message) because they can't fit in a single buffer. Instead, you'll need to open the file as a stream and parse the stream as JSON.
You could implement this yourself, but libraries already exist to do it. I suggest using the big-json package. Here's the example from the README:
const fs = require('fs');
const path = require('path');
const json = require('big-json');
const readStream = fs.createReadStream('big.json');
const parseStream = json.createParseStream();
parseStream.on('data', function(obj) {
// => receive reconstructed object
});
readStream.pipe(parseStream);
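Applied to the question, the lookup can move into the 'data' handler once the array has been reconstructed. This is a minimal sketch, assuming the file holds an array of objects with tc, name and id fields as in the question. findByTc is a hypothetical helper, not part of big-json.

```javascript
// Return the first entry whose `tc` field equals `tc`, or undefined if none matches.
// Array.prototype.find stops at the first hit, unlike filter(...)[0].
function findByTc(entries, tc) {
  return entries.find((entry) => entry.tc === tc);
}

// Inside the stream handler from the example above it could be used like this:
// parseStream.on('data', (entries) => {
//   const match = findByTc(entries, 9);
//   if (match) console.log(match.name, match.id);
// });
```

Checking the result before reading name and id avoids a TypeError when no entry matches.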

Error: NormalModuleFactory is no longer a waterfall hook?

I am trying to import WalletProvider from "@truffle/hdwallet-provider" in a ReactJS component, but it gives me this error as soon as I execute npm run:
Error: NormalModuleFactory.resolve (NormalModuleFactory) is no longer a waterfall hook, but a bailing hook instead. Do not return the passed object, but modify it instead. Returning false will ignore the request and results in no module created. Returning a Module object will result in this module used as result.
I tried it in a separate JS file and it works fine:
const Web3 = require("web3");
const WalletProvider = require("@truffle/hdwallet-provider");
let provider = new WalletProvider({
  mnemonic: {
    phrase:
      "***************************************************************",
  },
  providerOrUrl: "https://goerli.infura.io/v3/*******************",
});
const web3 = new Web3(provider);
const fetch123 = async () => {
  const accounts = await web3.eth.getAccounts();
  console.log(accounts);
};
fetch123();

How to increase Excel sheet column width when downloading in React JS? I am using a package called "export-from-json"

downloadPropertiesInXl = async () => {
  let API_URL = "something....";
  const property = await axios.get(API_URL);
  const data = property.data;
  const fileName = "download";
  const exportType = "xls";
  exportFromJSON({ data, fileName, exportType });
};
Are there any other packages that can change the column width?
Use ExcelJS; it has many options for customization:
https://www.npmjs.com/package/exceljs#columnsex
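Whichever library ends up setting the width, the widths themselves can be derived from the data. This is a minimal sketch, assuming the rows are flat objects like data in the question. computeColumnWidths is a hypothetical helper, and the { header, key, width } shape is chosen to resemble the column definitions ExcelJS accepts, where width is roughly a character count.

```javascript
// Compute one column definition per key: the width is the longest header or
// cell text in that column, plus some padding.
function computeColumnWidths(rows, padding = 2) {
  const widths = {};
  for (const row of rows) {
    for (const [key, value] of Object.entries(row)) {
      const text = value == null ? '' : String(value);
      const current = widths[key] === undefined ? key.length : widths[key];
      widths[key] = Math.max(current, text.length);
    }
  }
  return Object.entries(widths).map(([key, width]) => ({
    header: key,
    key,
    width: width + padding,
  }));
}
```

With ExcelJS the result could be assigned to worksheet.columns before adding the rows. Treat that wiring as an assumption to verify against the ExcelJS documentation linked above.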

Generating a json for an icon cheatsheet

I'm trying to generate a json file containing the filenames of all the files in a certain directory. I need this to create a cheatsheet for icons.
Currently I'm trying to run a script locally via terminal, to generate the json. That json will be the input for a react component that will display icons. That component works, the create json script doesn't.
Code for generating the json
const fs = require('fs');
const path = require('path');
/**
 * Create JSON file
 */
const CreateJson = () => {
  const files = [];
  const dir = '../icons';
  fs.readdirSync(dir).forEach(filename => {
    const name = path.parse(filename);
    const filepath = path.resolve(dir, filename);
    const stat = fs.statSync(filepath);
    const isFile = stat.isFile();
    if (isFile) files.push({ name });
  });
  const data = JSON.stringify(files, null, 2);
  fs.writeFileSync('../Icons.json', data);
};
module.exports = CreateJson;
I run it in terminal using
"create:json": "NODE_ENV=build node ./scripts/CreateJson.js"
I expect a json file to be created/overridden. But terminal returns:
$ NODE_ENV=build node ./scripts/CreateJson.js
✨ Done in 0.16s.
Any pointers?
You are creating a function CreateJson and exporting it, but you never actually call it.
You can get rid of the module.exports and replace it with CreateJson().
When you execute the file with node, it will then see both the function declaration and a call to it, whereas your current code contains no call.

PDF to Text extractor in nodejs without OS dependencies

Is there a way to extract text from PDFs in nodejs without any OS dependencies (like pdf2text, or xpdf on windows)? I wasn't able to find any 'native' pdf packages in nodejs. They always are a wrapper/util on top of an existing OS command.
Thanks
Have you checked PDF2Json? It is built on top of PDF.js. It does not provide the text output as a single string, but I believe you can reconstruct the final text from the generated JSON output:
'Texts': an array of text blocks with position, actual text and styling information:
'x' and 'y': relative coordinates for positioning
'clr': a color index in color dictionary, same 'clr' field as in 'Fill' object. If a color can be found in color dictionary, 'oc' field will be added to the field as 'original color' value.
'A': text alignment, including:
left
center
right
'R': an array of text run, each text run object has two main fields:
'T': actual text
'S': style index from style dictionary. More info about 'Style Dictionary' can be found at 'Dictionary Reference' section
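The reconstruction mentioned above can be sketched from the fields just listed. This is a rough sketch under several assumptions: blocks with exactly the same 'y' belong to the same line, a smaller 'y' is higher on the page, and the 'T' values are URI-encoded (hence decodeURIComponent). textsToLines is a hypothetical helper, not part of PDF2Json.

```javascript
// Turn a 'Texts' array into an array of lines, ordered top to bottom,
// with the blocks of each line ordered left to right.
function textsToLines(texts) {
  const byRow = new Map();
  for (const block of texts) {
    // Join all text runs of the block; 'T' is assumed to be URI-encoded.
    const text = block.R.map((run) => decodeURIComponent(run.T)).join('');
    if (!byRow.has(block.y)) byRow.set(block.y, []);
    byRow.get(block.y).push({ x: block.x, text });
  }
  return [...byRow.entries()]
    .sort(([y1], [y2]) => y1 - y2)
    .map(([, blocks]) =>
      blocks
        .sort((a, b) => a.x - b.x)
        .map((b) => b.text)
        .join('')
    );
}
```

Real documents often have slightly different 'y' values within one visual line, so a tolerance when grouping rows may be needed.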
After some work, I finally got a reliable function for reading text from PDF using https://github.com/mozilla/pdfjs-dist
To get this to work, first npm install on the command line:
npm i pdfjs-dist
Then create a file with this code (I named the file "pdfExport.js" in this example):
const pdfjsLib = require("pdfjs-dist");
async function GetTextFromPDF(path) {
  let doc = await pdfjsLib.getDocument(path).promise;
  let page1 = await doc.getPage(1);
  let content = await page1.getTextContent();
  let strings = content.items.map(function(item) {
    return item.str;
  });
  return strings;
}
module.exports = { GetTextFromPDF }
Then it can simply be used in any other js file you have like so:
const pdfExport = require('./pdfExport');
pdfExport.GetTextFromPDF('./sample.pdf').then(data => console.log(data));
Thought I'd chime in here for anyone who came across this question in the future.
I had this problem and spent hours over literally all the PDF libraries on NPM. My requirements were that I needed to run it on AWS Lambda so could not depend on OS dependencies.
The code below is adapted from another stackoverflow answer (which I cannot currently find). The only difference being that we import the ES5 version which works with Node >= 12. If you just import pdfjs-dist there will be an error of "Readable Stream is not defined". Hope it helps!
import * as pdfjslib from 'pdfjs-dist/es5/build/pdf.js';
export default class Pdf {
  public static async getPageText(pdf: any, pageNo: number) {
    const page = await pdf.getPage(pageNo);
    const tokenizedText = await page.getTextContent();
    const pageText = tokenizedText.items.map((token: any) => token.str).join('');
    return pageText;
  }
  public static async getPDFText(source: any): Promise<string> {
    const pdf = await pdfjslib.getDocument(source).promise;
    const maxPages = pdf.numPages;
    const pageTextPromises = [];
    for (let pageNo = 1; pageNo <= maxPages; pageNo += 1) {
      pageTextPromises.push(Pdf.getPageText(pdf, pageNo));
    }
    const pageTexts = await Promise.all(pageTextPromises);
    return pageTexts.join(' ');
  }
}
Usage
const fileBuffer = fs.readFileSync('sample.pdf');
const pdfText = await Pdf.getPDFText(fileBuffer);
This solution worked for me on Node 14.20.1 with "pdf-parse": "^1.1.1".
You can install it with:
yarn add pdf-parse
This is the main function which converts the PDF file to text.
const path = require('path');
const fs = require('fs');
const pdf = require('pdf-parse');
const assert = require('assert');
const extractText = async (pathStr) => {
  assert(fs.existsSync(pathStr), `Path does not exist ${pathStr}`)
  const pdfFile = path.resolve(pathStr)
  const dataBuffer = fs.readFileSync(pdfFile);
  const data = await pdf(dataBuffer)
  return data.text
}
module.exports = {
  extractText
}
Then you can use the function like this:
const { extractText } = require('../api/lighthouse/lib/pdfExtraction')
extractText('./data/CoreDeveloper-v5.1.4.pdf').then(t => console.log(t))
Instead of using the proposed PDF2Json, you can also use PDF.js directly (https://github.com/mozilla/pdfjs-dist). This has the advantage that you do not depend on modesty, the owner of PDF2Json, keeping its PDF.js base up to date.

Resources