So, here's the thing, we have a bunch of pdf forms for users on our website to fill out, we have a submission button inside every pdf form that sends the filled data through a http post method. We are already doing a pre-filled form, where the user fill general information on a html form, data that are used to generate a partially filled multiple pdf files, so whatever the pdf form the user has selected to fill, all of them will be regenerated having the pre-filled information that he just typed on the html form. I accomplish this by using pdftk (http://www.pdflabs.com/docs/pdftk-man-page/) that enables me to just do that. But we get to a point that to make this more efficient, the user must be able to draw their signature on the html side at the pre filled form stage, so we are using signature-pad for this (http://thomasjbradley.ca/lab/signature-pad/), than we create an image from the canvas that the plugin generates. But as each pdf has its own position for the signature, we must insert a placeholder into the pdf that'll be replaced for the signature image. So I came up with the idea to create a disabled text field on the pdf with the name of "signature", so through a nodejs application with the pdf2json module (https://github.com/modesty/pdf2json) I can parse the pdf form and get the position (x,y,w,h) of that particular field, being able to attach an image over the text field placeholder. So the problem is that pdftk don't give me support to attach an image to an existing pdf or even attach it in a certain position, I tried to find a nodejs module that would enables me to do that, but the only worthing nodejs module that I found was pdfkit (http://pdfkit.org/) but it only works creating a new pdf, not editing an exiting one, I looked into pdfkit source code and I discarded it because I realize that it wont work to my case of an existing pdf. So I came to a long way, got to the final stage of this implementation, came to the last step to get this working, and I'm just stack.
This is the output that I have from the pdf2json module for nodejs that is helping with the placeholder approach.
{
page: 7,
index: 317,
name: 'signature',
type: 'alpha',
x: 43.806640625,
y: 14.64195833333333,
w: 30.546828125000005,
h: 1.9339166666666756
}
If someone know any server application that I could run through an unix command at my server to attach an image over an existing pdf document, it'll fit my needs, don't need to be an exclusively nodejs module.
Obs.: I already checked it out the adobe echosign product, but it doesn't fit our needs, it's not free and don't solve our problem of attaching a signature to multiple pdf files from a single html form.
I realize that I could use the nodejs module pdfkit to generate a new blank PDF with the signature in the right position and just over the two pdfs, having the blank pdf with the signature image on the top like a stamp. I could do this with pdftk by command line:
pdftk form.pdf stamp signature.pdf output form-signed.pdf
There's another free application tool like pdftk that I just found out, and it's pdfjam. Also, if you can't use nodejs module pdfkit (different from the application pdfkt) to generate a new pdf with an image to the right spot, you have the stampTK tool (http://www.pdflabs.com/tools/stamptk-the-pdf-stamp-maker/) where you can parse through command line the image to be a stamp in an existing pdf, but this tool is paid (not much, and its worth it), but as I have the pdfkit module for nodejs that enables me to do just that along side the pdfkt application for free, I'm using that, and I also have more control of multiple signatures on the pdfkit module for nodejs. Hope this answer helps someone.
Related
I am struggling to apply text redaction in a PDF file in a aws lambda function written in NodeJs. Here is a list of libraries that I have tried with no success:
pdf-lib: This library almost fulfils all the requirements except that it doesn't redact the text permanently as part of its limitations https://github.com/Hopding/pdf-lib/issues/827
PDF.js: To overcome the above limitation, tried to covert the pdf to an image, so the redaction black boxes are applied permanently. Example code here: https://github.com/mozilla/pdf.js/blob/master/examples/node/pdf2png/pdf2png.js However, this lib is not reliable as this cannot extract contents from most pdfs during the process.
Finally, Pdf2Pic: This library helps to overcome the limitation of the first library (pdf-lib) by the converting the pdf into images. But this library internally uses two non node based libraries (graphicsmagick and ghostscript) which I am trying to avoid.
Is there a nodejs based solution that can be used to apply redaction permanently on a pdf file or any solution that can be used to covert a pdf to images to overcome limitation of pdf-lib.
Project Environment
The environment we are currently developing is using Windows 10. nodejs 10.16.0, express web framework. The actual environment being deployed is the Linux Ubuntu server and the rest is the same.
What technology do you want to implement?
The technology that I want to implement is the information that I entered when I joined the membership. For example, I want to automatically put it in the input text box using my name, age, address, phone number, etc. so that the user only needs to fill in the remaining information in the PDF. (PDF is on some of the webpages.)
If all the information is entered, the PDF is saved and the document is sent to another vendor, which is the end.
Current Problems
We looked at about four days for PDFs, and we tried to create PDFs when we implemented the outline, structure, and code, just like it was on this site at https://web.archive.org/web/20141010035745/http://gnupdf.org/Introduction_to_PDF
However, most PDFs seem to be compressed into flatDecode rather than this simple. So I also looked at Data extraction from /Filter /FlateDecode PDF stream in PHP and tried to decompress it using QPDF.
Unzip it for now.Well, I thought it would be easy to find out the difference compared to the PDF without Kim after putting it in the first name.
However, there is too much difference even though only three characters are added... And the PDF structure itself is more difficult and complex to proceed with.
Note : https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf (PDF official document in English)
Is there a way to solve the problem now?
It sounds like you want to create a PDF from scratch and possibly extract data from it and you are finding this a more difficult prospect than you first imagined.
Check out my answer here on why PDF creation and reading is non-trivial and why you should reach for a tool you help you do this:
https://stackoverflow.com/a/53357682/1669243
The Problem
I recieved a pdf file at work which I then printed. In the pdf file there were several optional fields where one could enter information such as "place of birth" etc. If I open the pdf file on my computer, I can see a set of input information A (a travel request with dates from this year 2017).
If I print the pdf on the local printer, the printed document contains a set of information B which for example contained travel request dates from 2015.
This information was not visible when opening the file on my computer.
I have been able to reproduce the error multiple times.
Why is this a problem?
It seems that previous entries into the pdf were yet somehow stored in the pdf contrary to what was visible when opening the pdf. When printing, the printer seems to access only the oldest entries and prints those.
This is a potential breach regarding data privacy and security since the pdf file seems to save all previous entries without anyone knowing.
Especially at work, some of these pdfs contains bank account information and other identity related information.
The Question
Did anyone experience a simliar issue or knows how to delete the invisible old information yet stored in the pdf?
UPDATE1: I could not reproduce the error on other printers. It seems this error is caused by the specific printer. Yet the information must be present in the PDF file, which is the specific cause of my question.
UPDATE2: Using the information from the accepted answer, I used the program "PDF CHAIN" and selected the option "drop XFA from document". I then saved the manipulated document again and printed it on the same printer.
Finally, the correct information was printed.
At a guess (and that's all it is without being able to see the original file) the PDF contains optional content or annotations which contain different field data for Print and Screen.
If you open the file using a PDF consumer (eg Acrobat) then what you see is the 'screen' result. Depending on the consumer you are using it may then either send the screen data to the printer, or substitute with the 'Print' data.
The printer you note as being a problem is capable of direct PDF printing, you haven't stated if that's how you are printing the PDF file, or whether you are using an application, nor whether the other printers are PDF capable or not.
My guess is that there is a different decision being made somewhere in the 2 print paths as to which is the 'correct' information to print.
Note that this does not mean that the PDF 'seems to save all previous entries without anyone knowing'; that's not really possible with a PDF file.
A malicious PDF processing application could do so, by adding comments to the PDF file, but only that application would be able to retrieve it.
But it is possible to have multiple entries of different types for different purposes, and if they aren't the same (because of the tool used to edit the file) then you can get strange results like this.
Note that if this is a problem for you then you probably shouldn't be using PDF, but you can mitigate the issue by digitally signing your documents. Signed PDF files include means (secure cryptographic hash) for verifying that the document has not been tampered with . Of course, you can't then edit the PDF file without re-signing it.
Oh, one other possibility would be that the PDF was actually an XFA form; its possible to have part of the document be a valid PDF which prints 'something' when a PDF consumer can't handle an XFA form, but that need bear no relation to what you see when you use an XFA processor.
My money's on optional content, AcroForm fields, or annotations where the Print data is different from the Screen data though.
In my application I am uploading a PDF file after uploading, I should display the information present in PDF file to a HTML form we are using angular 2 for frontend and node js for backend. Can any one help me with this.
Please remember PDF to HTML.
You can do one thing. Convert your pdf to a JSON. Use pdf2json.
pdf2json is a node.js module that parses and converts PDF from binary to json format, it's built with pdf.js and extends it with
interactive form elements and text content parsing outside browser.
The goal is to enable server side PDF parsing with interactive form
elements when wrapped in web service, and also enable parsing local
PDF to json file when using as a command line utility.
perform npm install pdf2json
Create an empty JSON whose key values will be the main headings from the pdf like a customer, age etc. Its values are obtained from the uploaded pdf.
Using this JSON values fill your form, on saving the form using, node.js save it to your DB. Is this what you want?
Simply what you need is to render a PDF in your application.
You could use this library ng2-pdf-viewer
Almost all the basic functionalities are available as properties to this component. You could manipulate it to your requirement.
I'm looking at adding an image to an existing PDF in Node.js. None of the PDF libraries I found appear to have the ability to modify an existing PDF though, so I'm planning on implementing it myself. I'm trying to figure out if it's too much work, as I can always do it server side using iTextPDF instead, but I'd prefer to do it in my app (Electron which uses Node.js).
If I just want to modify an existing PDF and add an image, will I have to write a complete rendering library or is PDF structured in such a way that I can write a very small parser that just gets the page I want and inserts an image using the correct format?
Specifically, I'm asking because I've previously looked into writing a text extraction library, put in order to get the position of text you have to render pretty much the entire PDF because of how positioning is handled. That's too much work to get around server side processing in this case.
To be clear, just asking if it's possible to do, not how to do it (don't want to be too broad, I'm sure I can figure that part out).
To perform a small manipulation of a PDF, you'll need to implement generalized reading, decompression, encryption and traversal of PDF data structures. Some of the thing you would need to handle include:
basic parsing of PDF syntax
indexing via the cross reference index, and/or cross reference index and object streams
objects (num, byte-string, hex string, dictionary, arrays, booleans...)
filters and variants (LZW, Flate, RunLength, Predictors)
encryption (RC4, AES, Custom security handlers)
page tree traversal
basic handling of page content streams
image handling
serialization, either rewriting of the entire PDF, or incremental updates to an existing PDF
Anything's possible, but realistically, you will need a PDF library or toolkit, client or server-side, to accomplish this.