PDF display garbled in Chrome - linux

When I click a link to a PDF stored on Amazon S3 in Chrome, the content is displayed as garbled text.
If I download the same URL using wget or follow the same link in Firefox, the PDF displays normally.
It looks like Chrome is not interpreting the file as a PDF. Is the problem with the PDF file or with Chrome? The PDF file was generated by wkhtmltopdf 0.12.3 (with patched qt) on Arch Linux.
Edit: it seems like a problem with the PDF, because when I use the file command to identify the format it returns "data", whereas a normal PDF returns something like "PDF document, version 1.6".

I figured it out. I was using PDFKit to generate PDFs with the verbose option on. The verbose option somehow put all of stdout inside the PDF itself which caused Chrome to not detect the file as a PDF.
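A quick way to confirm this kind of damage is to look for the "%PDF-" marker that a PDF file normally starts with. The following is a minimal sketch, assuming the extra output landed before the header; the function names are hypothetical and not part of PDFKit or wkhtmltopdf.

```python
PDF_MAGIC = b"%PDF-"  # marker that normally begins a PDF file


def find_pdf_header(data: bytes) -> int:
    """Return the byte offset of the first '%PDF-' marker, or -1 if absent."""
    return data.find(PDF_MAGIC)


def strip_leading_junk(data: bytes) -> bytes:
    """Drop any bytes that precede the '%PDF-' marker (hypothetical helper)."""
    offset = find_pdf_header(data)
    if offset < 0:
        raise ValueError("no %PDF- header found; this is not a PDF")
    return data[offset:]


if __name__ == "__main__":
    # Simulated file: a log line written ahead of the real PDF content.
    damaged = b"log line\n%PDF-1.4\n"
    print(find_pdf_header(damaged))     # offset of the header; 0 means clean
    print(strip_leading_junk(damaged))  # content starting at the header
```

An offset of 0 means the file begins the way a PDF should. Stripping the leading bytes is only a diagnostic aid: byte offsets recorded inside the PDF may no longer match afterwards, so regenerating the file with the verbose option off is the more reliable fix.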

Related

PDF Files created on Terminal

I created a PDF file in the Ubuntu terminal and added plain text to it. When I use cat, it shows the text I wrote with no problem. But if I try to open the file with a PDF viewer, it says the file is corrupted and I can't see the content.
Why does this happen?
Is there a way to view the file outside the terminal?
I'm new to Linux.
Remember, the file contains only plain text, and I can see the content normally in the terminal.

How can I take high-quality screenshots of a PDF without ImageMagick using Python?

I would like to automate the process of taking screenshots of a PDF file's pages. I want to be able to specify the zoom (optional) so that the overall image size can be controlled. I would also like to be able to specify the dpi of the screenshots being saved.
Sample PDF file can be found at this link.
I have already tried opening the file with the Selenium web driver (Firefox), but scrolling is apparently not supported for rendered PDF files.
Is there a way to render this PDF file and then use an image processing module like Pillow or OpenCV to take the screenshots, or a module that does it directly?
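Whichever renderer ends up being used, the image size follows from the page size and the requested dpi and zoom. PDF page dimensions are given in points, with a default unit of 1/72 inch, so the scale factor is zoom * dpi / 72. The sketch below shows only that arithmetic; the function names are hypothetical and the rendering step itself is not shown.

```python
POINTS_PER_INCH = 72  # default PDF user-space unit is 1/72 inch


def scale_for(dpi: float, zoom: float = 1.0) -> float:
    """Scale factor to hand to a renderer for the given dpi and zoom."""
    return zoom * dpi / POINTS_PER_INCH


def pixel_size(width_pt: float, height_pt: float,
               dpi: float, zoom: float = 1.0) -> tuple[int, int]:
    """Pixel dimensions of a page of the given size in points."""
    width_px = round(width_pt * zoom * dpi / POINTS_PER_INCH)
    height_px = round(height_pt * zoom * dpi / POINTS_PER_INCH)
    return width_px, height_px


if __name__ == "__main__":
    # US Letter is 612 x 792 points (8.5 x 11 inches).
    print(pixel_size(612, 792, dpi=300))
    print(pixel_size(612, 792, dpi=300, zoom=0.5))
```

Computing the size up front makes it easy to cap the zoom before rendering, which keeps very large pages from producing oversized images.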

How can I resize an image from an existing PDF file in Node.js?

I have a PDF file that has an image and some text. I want to read that file and then resize the image, and delete the text.
I tried taking a screenshot of the whole PDF with pdf-poppler and then doing some image processing with Jimp. It worked, but the program takes too long to finish executing because the images are quite big.
Adnane, you can try to use pdf-lib.
I don't use Node.js, but the Poppler library comes with a binary, pdfimages, which extracts all images from a PDF, and there is a Node.js wrapper for Poppler.

GhostScript - ImageMagick converts pdf to image to odd letters when converting Microsoft Print to PDF files

NOTICE: Watch updates at bottom.
I am building an API which is supposed to convert PDFs to base64 images (the type doesn't matter: jpg, jpeg, png, etc.).
The API is built with NodeJS on CentOS 7.5 x64.
I have searched all over the web for npm packages which convert PDFs to images. Most of them use ImageMagick and GhostScript (the others don't seem to work). These packages work well in code, but the problem starts when GhostScript does its job.
For example, a simple PDF page with text comes out with odd letters in place of the text after conversion.
This is the output in shell:
**** Warning: can't process font stream, loading font by the name.
**** This file had errors that were repaired or ignored.
**** The file was produced by:
**** >>>> Microsoft: Print To PDF <<<<
**** Please notify the author of the software that produced this
**** file that it does not conform to Adobe's published PDF
**** specification.
I have tried converting the files with shell commands directly and ended up with the same output.
Thanks in advance.
UPDATE:
Converting a sample PDF file that was probably not produced by Microsoft Print to PDF worked fine. Maybe this is the problem?
UPDATE 2:
After converting a few more PDFs, it turns out that only Microsoft Print to PDF files cause this problem.
This was reported as a bug to the Ghostscript Bugzilla here
As can be seen from the thread, this is due to using an old version of Ghostscript, and has been fixed at some point in the past. So the problem is due to using old (in this case more than 5 years old) software.

How to generate pdf file of text and image in linux?

I am generating a logfile on one of my servers.
It stores a lot of data, which I then send to my mail once a month as a PDF file.
The process I am using is to 'cat' the output of a lot of commands to a text file, then convert it and send it.
Is there any Linux program or some easy way to do something similar and add an image I have stored on the server to the PDF file?
This answer assumes that you just want to put the image at the end of the PDF.
You could first convert the image to a PDF using ImageMagick like this (this also works with other image file types):
convert image.jpg image.pdf
Then, you can use a tool like stapler or pdftk to combine your generated text PDF and the image.pdf (you can add multiple images):
stapler cat text.pdf image.pdf combined.pdf
pdftk text.pdf image.pdf output combined.pdf
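Since the log is produced by a scheduled job, the same two steps could be driven from a script. This is a minimal sketch, assuming convert and pdftk are installed and on the PATH; build_commands and run_all are hypothetical helper names, and the file names are the ones used in the commands above.

```python
import subprocess
from pathlib import Path


def build_commands(text_pdf: str, images: list[str], output: str) -> list[list[str]]:
    """Return argv lists: one `convert` per image, then one `pdftk` merge."""
    commands = []
    image_pdfs = []
    for image in images:
        # image.jpg -> image.pdf, next to the original image
        image_pdf = str(Path(image).with_suffix(".pdf"))
        commands.append(["convert", image, image_pdf])
        image_pdfs.append(image_pdf)
    # pdftk text.pdf image.pdf ... output combined.pdf
    commands.append(["pdftk", text_pdf, *image_pdfs, "output", output])
    return commands


def run_all(commands: list[list[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, as a dry run.
    for argv in build_commands("text.pdf", ["image.jpg"], "combined.pdf"):
        print(" ".join(argv))
```

Keeping command construction separate from execution makes it possible to check what would run before calling run_all on a real server.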
