Convert any document, image, text file into PDF - node.js

I want to convert any document, image, or text file into a PDF, on any OS.
I tried node-msoffice-pdf. It works fine on Windows but not on other operating systems.
Question:
How can I convert documents, images, and text files to PDF in Node.js?

I have used wkhtmltopdf for years to handle PDF conversion.
https://github.com/devongovett/node-wkhtmltopdf
You can either render an HTML file and pass it to the module, or render a PDF directly from a URL.
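
As a rough sketch of both modes, the snippet below shells out to the wkhtmltopdf command-line tool instead of going through the wrapper module linked above. It assumes the wkhtmltopdf binary is installed and on the PATH. The function names buildArgs and renderPdf are illustrative and not part of any library.

```javascript
const { execFile } = require("node:child_process");

// Build the argument list for the wkhtmltopdf CLI: <input> <output>.
// `source` may be a URL or a path to a local HTML file.
function buildArgs(source, outputPdf) {
  if (!source || !outputPdf) {
    throw new Error("both source and outputPdf are required");
  }
  return [source, outputPdf];
}

// Run wkhtmltopdf and resolve with the output path once it exits cleanly.
// Assumption: the `wkhtmltopdf` binary is installed and on the PATH.
function renderPdf(source, outputPdf) {
  return new Promise((resolve, reject) => {
    execFile("wkhtmltopdf", buildArgs(source, outputPdf), (err) => {
      if (err) reject(err);
      else resolve(outputPdf);
    });
  });
}

// Usage (not executed here):
//   renderPdf("https://example.com", "page.pdf");  // render from a URL
//   renderPdf("./report.html", "report.pdf");      // render a local HTML file
```

Note that this only covers HTML input. Office documents and images would first have to be turned into HTML, or handled by a different tool.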

If fidelity (conversion quality) is important to you, for Word documents (doc/docx) you could try our freemium https://www.npmjs.com/package/@nativedocuments/docx-wasm, which performs the conversion locally (i.e. where Node is running), without the need for LibreOffice etc.

Related

Extracting Text from multiple pages of PDF using tesseract OCR in node.js

I am working on a project that extracts text from multi-page PDFs (generally circulars or application forms) using Tesseract OCR in Node.js. Since Tesseract only takes images as input, I cannot pass it the PDF directly. I need code that passes each page of the PDF to Tesseract and returns the result (it is fine if the results for all pages are appended together).
I tried pdf-poppler, but I don't necessarily need to use it.
Technologies used: Tesseract OCR for JS, Node.js
Additional/optional info: Can I get suggestions for a free, open-source OCR engine to use, and for how to parse the text I get back?
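
One possible approach, sketched under assumptions: rasterize each page with Poppler's pdftoppm command-line tool, then run the tesseract CLI on each page image and append the results in page order. This assumes both binaries are installed and on the PATH. The helper names pageNumber, sortPageImages and ocrPdf are hypothetical.

```javascript
const { execFileSync } = require("node:child_process");
const fs = require("node:fs");
const path = require("node:path");

// pdftoppm names its output <prefix>-<page>.png; the page number may be zero-padded.
function pageNumber(fileName) {
  const match = /-(\d+)\.png$/.exec(fileName);
  return match ? Number(match[1]) : NaN;
}

// Keep only the page images for the given prefix and order them by page number.
function sortPageImages(fileNames, prefix) {
  return fileNames
    .filter((name) => name.startsWith(prefix + "-") && !Number.isNaN(pageNumber(name)))
    .sort((a, b) => pageNumber(a) - pageNumber(b));
}

// Assumption: `pdftoppm` and `tesseract` are installed and on the PATH,
// and `workDir` is an existing, empty scratch directory.
function ocrPdf(pdfPath, workDir, lang = "eng") {
  const prefix = "page";
  // Rasterize every page to PNG at 300 DPI.
  execFileSync("pdftoppm", ["-r", "300", "-png", pdfPath, path.join(workDir, prefix)]);
  const images = sortPageImages(fs.readdirSync(workDir), prefix);
  // `tesseract <image> stdout -l <lang>` prints the recognized text to stdout.
  return images
    .map((name) =>
      execFileSync("tesseract", [path.join(workDir, name), "stdout", "-l", lang], {
        encoding: "utf8",
      })
    )
    .join("\n");
}
```

The numeric sort matters because a plain string sort would place page 10 before page 2 when the file names are not zero-padded.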

How to redact texts in a pdf file in NodeJs

I am struggling to apply text redaction to a PDF file in an AWS Lambda function written in Node.js. Here is a list of libraries that I have tried with no success:
pdf-lib: This library fulfils almost all the requirements, except that it cannot redact text permanently, which is one of its limitations: https://github.com/Hopding/pdf-lib/issues/827
PDF.js: To work around the above limitation, I tried to convert the PDF to an image so that the black redaction boxes are applied permanently. Example code here: https://github.com/mozilla/pdf.js/blob/master/examples/node/pdf2png/pdf2png.js However, this library is not reliable, as it fails to extract the contents of most PDFs during the process.
Finally, Pdf2Pic: This library works around the limitation of the first library (pdf-lib) by converting the PDF into images. But internally it uses two non-Node libraries (GraphicsMagick and Ghostscript), which I am trying to avoid.
Is there a Node.js-based solution that can apply redaction permanently to a PDF file, or any solution that can convert a PDF to images to work around the limitation of pdf-lib?

How to get and display a photo from LDAP

I'm using ldap3.
I can connect and read all attributes without any issue, but I don't know how to display the photo stored in the thumbnailPhoto attribute.
If I print(conn.entries[0].thumbnailPhoto), I get a bunch of binary values like b'\xff\xd8\xff\xe0\x00\x10JFIF.....'.
I have to display it on a Bottle web page, so I have to put this value into a JPEG or PNG file.
How can I do that?
The easiest way is to save the raw byte value to a file and open it with a picture editor. The photo is probably a JPEG, but it can be in any format.
Have a look at my answer at "Display thumbnailPhoto from Active Directory in PHP". It is written for PHP, but the concept is the same for Python.
Basically, you either embed the base64-encoded raw data as a data stream, or write it to a temporary file that is then served (or used to determine the MIME type).
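
The data-stream idea is language independent. To stay with the Node.js used elsewhere on this page, here is a minimal sketch in JavaScript rather than Python: it guesses the MIME type from the leading magic bytes and builds a base64 data URI that can be placed in the src attribute of an img tag. The function names are illustrative, and only the JPEG and PNG signatures are checked.

```javascript
// Guess the image MIME type from the leading "magic" bytes of the raw value.
function sniffImageMime(buf) {
  // JPEG files start with FF D8 FF.
  if (buf.length >= 3 && buf[0] === 0xff && buf[1] === 0xd8 && buf[2] === 0xff) {
    return "image/jpeg";
  }
  // PNG files start with the fixed 8-byte signature 89 50 4E 47 0D 0A 1A 0A.
  const pngSignature = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
  if (buf.length >= 8 && pngSignature.every((byte, i) => buf[i] === byte)) {
    return "image/png";
  }
  // Unknown format: fall back to a generic binary type.
  return "application/octet-stream";
}

// Build a data URI from the raw attribute bytes, e.g. for <img src="...">.
function toDataUri(buf) {
  return `data:${sniffImageMime(buf)};base64,${buf.toString("base64")}`;
}
```

The bytes shown in the question (\xff\xd8\xff\xe0 followed by JFIF) match the JPEG signature, so this check would classify that value as image/jpeg.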

NodeJs construct Excel file and export in PDF

As the title says, I'm looking for a Node.js library to build Excel (xlsx) files with cell formatting (color, font size, images...). Importantly, it must be able to export the resulting .xlsx file to .pdf format.
I know of some libraries that call the Excel API, but I'm running a Linux server, so that is not an option for me.
Thanks to jcaron's comment, I finally found out that I can build a PDF file directly without going through xls.
I used pdfmaker for Node.js, which supports creating PDF files pretty well.
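
To illustrate what building the PDF directly can look like, here is a small sketch that assumes "pdfmaker" refers to the pdfmake package, which describes a document as a plain JavaScript object. The helper name buildTableDoc and the sample styling are hypothetical. Only the document definition is built here; the rendering step is not shown.

```javascript
// Build a pdfmake-style document definition: a title followed by a table
// whose header row is bold with a shaded background.
// Assumption: the target library is pdfmake and accepts this object shape.
function buildTableDoc(title, header, rows) {
  const headerRow = header.map((text) => ({
    text: String(text),
    bold: true,
    fillColor: "#dddddd",
  }));
  // Plain strings are used for the body cells.
  const bodyRows = rows.map((row) => row.map((cell) => String(cell)));
  return {
    content: [
      { text: title, fontSize: 16, bold: true, margin: [0, 0, 0, 8] },
      {
        table: {
          headerRows: 1,
          // "*" asks for the available width to be shared between columns.
          widths: header.map(() => "*"),
          body: [headerRow, ...bodyRows],
        },
      },
    ],
  };
}

// Usage (not executed here):
//   const doc = buildTableDoc("Sales", ["Item", "Qty"], [["Tea", 3], ["Rice", 10]]);
```

The resulting object would then be handed to the library for rendering. That step is left out because it depends on the font files configured on the server.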

node.js read images from PDF

I need to use a PDF in a way similar to ZIP/RAR: to hold many images (ancient Tibetan Buddhist literature), ideally 60000, though splitting into 10-100 volumes is OK.
Anything can be used for packing, but for unpacking we need Node.js, because the same PDF file must be served on the web, while some users will need the whole PDF.
So the question is: which Node module can I use to read any single arbitrary image from a huge PDF? An example would really help.
Every image is a single page (in other words, every page is a single image).
We have been using https://github.com/mirkokiefer/Node-Magick for this...
But the PNGs we get out are sometimes fairly low quality.
