CloudConvert API: when converting PDF to PNG, how do I know how many pages have been converted?

When using CloudConvert to convert a PDF to PNGs, i.e. when there is more than one page in the PDF, CloudConvert adds a '-1' / '-2' suffix to each file name (before the file extension, e.g. 'my-image-1.png'). CloudConvert creates a separate PNG for each page, but how can I find this out from the API? (I'm using the official node API.) I do not know the number of pages before I start the conversion.

Ah, I found it! The response to each request (job) contains an array of tasks. The export task has a result object with a files array, and that array lists the filenames that were created. I was using export/s3.
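For anyone landing here, a minimal sketch of reading those filenames out of a job response. The property names (tasks, operation, status, result.files, filename) are assumptions based on the description above, and sampleJob is a made-up, trimmed-down payload, so check both against the response you actually get back:

```javascript
// Collect the filenames produced by every finished export task in a job.
// Property names are assumed from the answer above; verify them against
// a real job response before relying on this.
function exportedFilenames(job) {
  return (job.tasks || [])
    // keep only export tasks such as export/s3 or export/url
    .filter((task) => typeof task.operation === 'string' && task.operation.startsWith('export/'))
    // ignore tasks that have not completed successfully
    .filter((task) => task.status === 'finished')
    // each export task lists its output files under result.files
    .flatMap((task) => (task.result && task.result.files) || [])
    .map((file) => file.filename);
}

// Hypothetical job response for a two-page PDF (not real API output).
const sampleJob = {
  tasks: [
    { operation: 'import/url', status: 'finished', result: { files: [{ filename: 'my-image.pdf' }] } },
    {
      operation: 'convert',
      status: 'finished',
      result: { files: [{ filename: 'my-image-1.png' }, { filename: 'my-image-2.png' }] },
    },
    {
      operation: 'export/s3',
      status: 'finished',
      result: { files: [{ filename: 'my-image-1.png' }, { filename: 'my-image-2.png' }] },
    },
  ],
};

const names = exportedFilenames(sampleJob);
console.log(`pages converted: ${names.length}`, names);
```

The number of converted pages is then just the length of the returned array, provided the job has a single export task.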

Related

Extracting Text from multiple pages of PDF using tesseract OCR in node.js

I am currently working on a project to extract text from multi-page PDFs (these PDFs are generally circulars or application forms) using Tesseract OCR in Node.js. Since Tesseract only takes images as input, I am not able to pass it the PDF directly. I need code that passes each page of the PDF to Tesseract and gets the result back (it is not a problem if the results of the pages are appended together).
I tried using pdf-poppler, but I don't necessarily need to use pdf-poppler.
Technologies used: Tesseract OCR for JS, Node.js
Additional/optional info: can I get suggestions for a free open-source OCR to use, and for how to parse the text I get?

Excel file (.xlsx) to PDF Conversion using microsoft graph api is ignoring page setup instructions

I already have an Excel file in .xlsx format. I am trying to convert it to PDF using the Microsoft Graph API (by uploading the file to OneDrive and then downloading it as PDF). I am using the following API call:
https://graph.microsoft.com/v1.0/me/drive/items/[item-id]/content?format=pdf
I see that the PDF conversion performed by the above API doesn't honor all the page setup parameters that are set in the underlying .xlsx file. More specifically, the converted PDF is always rendered in landscape mode and seems to ignore the fit to width/height/page settings. If I open the same Excel file locally in Excel and save the document as PDF, it renders the document correctly, interpreting all the page setup parameters properly.
Any help would be greatly appreciated as to how I can get the PDF conversion API to render the PDF according to the orientation (portrait/landscape) and page width/height settings in the .xlsx file.
I have tried multiple smaller files with different page setup parameters, but the PDF conversion (using the REST API) always returns the document in landscape mode and seems to ignore the fit to page/width/height settings.

Convert any document, image, text file into PDF

I want to convert any document, image, or text file into PDF on any OS.
I tried the approach with node-msoffice-pdf, and it works fine on Windows but not on other operating systems.
Question:
How can I convert docs, images, and text files to PDF in Node.js?
I have used wkhtmltopdf for years to manage PDF conversion.
https://github.com/devongovett/node-wkhtmltopdf
You can either render an HTML file and pass it to the module, or render a PDF directly from a URL.
If fidelity/conversion quality is important to you, for Word documents (doc/docx) you could try our freemium https://www.npmjs.com/package/@nativedocuments/docx-wasm, which performs the conversion locally (i.e. where node is running), without the need for LibreOffice etc.

How to get and display a photo from LDAP

I'm using ldap3.
I can connect and read all attributes without any issue, but I don't know how to display the photo stored in the thumbnailPhoto attribute.
If I print(conn.entries[0].thumbnailPhoto) I get a bunch of binary values like b'\xff\xd8\xff\xe0\x00\x10JFIF.....'.
I have to display it on a Bottle web page, so I have to put this value in a JPEG or PNG file.
How can I do that?
The easiest way is to save the raw byte value to a file and open it with a picture editor. The photo is probably a JPEG, but it can be in any format.
Have a look at my answer at Display thumbnailPhoto from Active Directory in PHP. It is written for PHP, but the concept is the same for Python.
Basically, it comes down to either using the base64-encoded raw data as a data stream or using a temporary file that is served (or used to determine the MIME type).

Cannot Extract Images from Lotus Notes using Java API

I am working on a data extract from a Lotus Notes application. It stores legal documents which may have attachments and images (not mails). I want to convert Notes documents to HTML. While importing the data using the Java API I am able to extract text, attachments, etc., but I am not able to extract images. I did some research and found two approaches:
1) Extract the document using the generateXML() method. However, the generated document contains a picture tag that holds a reference to a location on the Notes Domino server, whereas I want the image itself so that it can be included in the HTML document.
2) Extract it as a MIME entity. When I try to get images using getMIMEEntity("Body") or any other field, I do not get any image; it always returns null.
There is a question (Extract inline images from Lotus Notes using Lotus Notes Java API) which deals with this, but it does not answer it conclusively and it has been dormant for a long time.
Please help. I have been working on this for a couple of days and still cannot import the images. Thanks in advance.
In LotusScript you can first extract the file to your local system or server and then export it to Excel using the code below.
' Loop through all attachments/documents (by creating an attachment object) and save the image to some path on the server/local system (strSaveAsPath)
Call object.ExtractFile( strSaveAsPath)
' Now activate the Excel row:column range where you want to insert the image
xlApp.Range("1:1").Activate
xlApp.ActiveSheet.Pictures.Insert(strSaveAsPath)
