The Problem
I recieved a pdf file at work which I then printed. In the pdf file there were several optional fields where one could enter information such as "place of birth" etc. If I open the pdf file on my computer, I can see a set of input information A (a travel request with dates from this year 2017).
If I print the pdf on the local printer, the printed document contains a set of information B which for example contained travel request dates from 2015.
This information was not visible when opening the file on my computer.
I have been able to reproduce the error multiple times.
Why is this a problem?
It seems that previous entries into the pdf were yet somehow stored in the pdf contrary to what was visible when opening the pdf. When printing, the printer seems to access only the oldest entries and prints those.
This is a potential breach regarding data privacy and security since the pdf file seems to save all previous entries without anyone knowing.
Especially at work, some of these pdfs contains bank account information and other identity related information.
The Question
Did anyone experience a simliar issue or knows how to delete the invisible old information yet stored in the pdf?
UPDATE1: I could not reproduce the error on other printers. It seems this error is caused by the specific printer. Yet the information must be present in the PDF file, which is the specific cause of my question.
UPDATE2: Using the information from the accepted answer, I used the program "PDF CHAIN" and selected the option "drop XFA from document". I then saved the manipulated document again and printed it on the same printer.
Finally, the correct information was printed.
At a guess (and that's all it is without being able to see the original file) the PDF contains optional content or annotations which contain different field data for Print and Screen.
If you open the file using a PDF consumer (eg Acrobat) then what you see is the 'screen' result. Depending on the consumer you are using it may then either send the screen data to the printer, or substitute with the 'Print' data.
The printer you note as being a problem is capable of direct PDF printing, you haven't stated if that's how you are printing the PDF file, or whether you are using an application, nor whether the other printers are PDF capable or not.
My guess is that there is a different decision being made somewhere in the 2 print paths as to which is the 'correct' information to print.
Note that this does not mean that the PDF 'seems to save all previous entries without anyone knowing'; that's not really possible with a PDF file.
A malicious PDF processing application could do so, by adding comments to the PDF file, but only that application would be able to retrieve it.
But it is possible to have multiple entries of different types for different purposes, and if they aren't the same (because of the tool used to edit the file) then you can get strange results like this.
Note that if this is a problem for you then you probably shouldn't be using PDF, but you can mitigate the issue by digitally signing your documents. Signed PDF files include means (secure cryptographic hash) for verifying that the document has not been tampered with . Of course, you can't then edit the PDF file without re-signing it.
Oh, one other possibility would be that the PDF was actually an XFA form; its possible to have part of the document be a valid PDF which prints 'something' when a PDF consumer can't handle an XFA form, but that need bear no relation to what you see when you use an XFA processor.
My money's on optional content, AcroForm fields, or annotations where the Print data is different from the Screen data though.
Related
Project Environment
The environment we are currently developing is using Windows 10. nodejs 10.16.0, express web framework. The actual environment being deployed is the Linux Ubuntu server and the rest is the same.
What technology do you want to implement?
The technology that I want to implement is the information that I entered when I joined the membership. For example, I want to automatically put it in the input text box using my name, age, address, phone number, etc. so that the user only needs to fill in the remaining information in the PDF. (PDF is on some of the webpages.)
If all the information is entered, the PDF is saved and the document is sent to another vendor, which is the end.
Current Problems
We looked at about four days for PDFs, and we tried to create PDFs when we implemented the outline, structure, and code, just like it was on this site at https://web.archive.org/web/20141010035745/http://gnupdf.org/Introduction_to_PDF
However, most PDFs seem to be compressed into flatDecode rather than this simple. So I also looked at Data extraction from /Filter /FlateDecode PDF stream in PHP and tried to decompress it using QPDF.
Unzip it for now.Well, I thought it would be easy to find out the difference compared to the PDF without Kim after putting it in the first name.
However, there is too much difference even though only three characters are added... And the PDF structure itself is more difficult and complex to proceed with.
Note : https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf (PDF official document in English)
Is there a way to solve the problem now?
It sounds like you want to create a PDF from scratch and possibly extract data from it and you are finding this a more difficult prospect than you first imagined.
Check out my answer here on why PDF creation and reading is non-trivial and why you should reach for a tool you help you do this:
https://stackoverflow.com/a/53357682/1669243
How do I search in MS Access (ver 2010) for data in files attached to records? If I do a "Find" and specify text I KNOW is in an attached txt file to a particular record, there are no hits. While if I have the same data in a Text Field or Memo field, Access finds it. I understood from one of the Access help screens I found that it is possible to search attachments from within Access, but I have not been able to do this yet.
BTW, I did try using the query tool and searching for text I knew was in the attachment, but it was not successful, although it did find the same text within a memo field in another record.
Thx,
jmb
I'm fairly certain that there is no mechanism in Access to find records based on text within a file attachment. A bit of web searching found an earlier question here and the responses seem to agree that there isn't.
One reference from Microsoft here says
By using attachments, you open documents and other non-image files in their parent programs, so from within Access, you can search and edit those files.
but I think that statement could be misinterpreted. I believe what they meant to say was that
"...from within Access you can open an attachment in its parent program and then work on it as usual (e.g., edit it, search it, print it, and so on)."
You can use file system object, open the file as string and search sequentially. That's as close as you'll get
In cases when two attachments with the same filename are attached to a Notes Document, the second file is renamed internally to something like ATTXXXX. Even if the first filename is deleted and document re-saved, the internal filename remains cryptic.
There doesn't seem to be any way to retrieve the original Filename through back-end functions. I have looked high and low in LS but also in the C++ API, and could find nothing. It seems to be a trick that can only be done in the front-end. I am not sure where the information in the file icon graphic is stored, and whether it is accessible. In simple cases it would be possible to do a rename, I suppose (i.e. there is a single attachment and a single file icon graphic).
Could anybody confirm that this is, indeed a limitation of Notes or is there a cool way to solve this?
This is causing me some headaches whilst processing a large number of documents. My customer has trouble believing that there are some things that can only be done in the front end.
You should be able to get the original filename, even with duplicates.
It is not when the file is attached that the name is changed, it is when you detatch it.
You are probably using the .Name property, try the .Source property of the EmbeddedObject, that should return the original filename.
From the help:
If the NotesEmbeddedObject is an embedded object or object link, this property returns the internal name that Notes uses to refer to the source document.
If the NotesEmbeddedObject is a file attachment, this property returns the file name of the original file.
Syntax
To get: source$ = notesEmbeddedObject.Source
It's in the CD records for the rich text -- you will see it if you use NotesPeek to examine the contents of the rich text item. But I don't think it's accessible through the NotesRichText navigator class, so I'm pretty sure you would have to go the C API and parse through the CD records. Or, the MIDAS Rich Text API can probably get it, but that's third party software. I.e., not free.
I have a table with documents saved some of them in pdf, some of them image.
I want to create a web app, to show the images (that can be either pdf, either jpg) in the same control.
I can manage to see pdf, if I set the Response.ContentType = "application/pdf" or image if I set "application/jpg". But the problem is that how can I get the file type, having only the stream saved into the database? Does it have the stream the file type information in it?
Thanks.
No, a stream does not have a content type associated with it. If you had the original filename, you could attempt to derive the content type from that, but it wouldn't be foolproof.
Many file formats have a series of "magic bytes" that allow you to detect what (might) be in the file. PDF, for example, begins with the bytes "%PDF" (note: I'm not an expert on PDF, and there may be situations where that is not true).
If you have no other option, you could attempt to parse the file using various libraries until you found one that worked (System.Drawing.Image.FromStream(), iTextSharp, etc).
This is primarily a question of possibilities more than instructions. I'm a programming consultant working on a WSS project site system for my client. We have a document library in which files are uploaded to go through a complex approval process. With multiple stages in this process, we have an extra field which dictates what the current status of the document is.
Now, my client has become enamored with the idea of PDF watermarking. He wants the document (which is already a PDF) to be affixed with a watermark corresponding to the current status, such that with each stage of the approval process the watermark will change.
One method, the traditional method for PDF watermarking, of accomplishing this is to have one "clean" copy of the document somewhere hidden on the site, and create a new PDF from it that has the watermark at each stage of the approval process. Since the filename will never change, this new PDF can be uploaded continually to a public library, always overwriting the old version and simulating a "dynamically changing watermark". However, in the various stages there will also be people uploading clean copies with corrections and suggestions, nevermind the complex nature of juggling around two libraries and the fact we double the number of files stored. My client and I agree that this is not a practical path to choose.
What we would like to do is be able to "modify" the watermark in a PDF, so that we only have to keep one copy of the file. Unfortunately, from what I've seen, in most cases when you make something like a watermark, which in its nature is supposed to be "unmodifyable", you won't be able to edit it later. So, is it possible to have a part of a PDF which cannot be changed by anyone who downloads the file, but can be changed as part of a workflow or other object model process?
PDF Watermarking in SharePoint is a common request. I have written extensively on this topic. See:
Adding a dynamic watermark to a PDF file from a SharePoint Workflow
Adding a (static) watermark to a PDF file from a SharePoint Workflow
Use SharePoint Workflows to inject JavaScript into PDFs and print the ‘open date’
You could use Event Handlers such that code was run every time a document was checked in. In that code you could perform the fixup/check that made the watermark be what you wanted it to be. This assumes you can write code that manipulates a PDF's internal structure such that it has the watermark that you desire.
It sounds to me like you want to allow people to modify the PDF they download, but not modify its watermark. This is probably going to be nigh on impossible if the watermark is embedded in the PDF (afaict) but what if the watermark image is external to the PDF; is it possible to embed a watermark in a PDF that is sourced via HTTP? Then you could embed:
<watermark image="http://sharepoint/site/_vti_bin/docstatus.asmx?id=5">
Of course, I have no idea about PDFs, so this might not be possible but you get the concept.
-Oisin
It is possible to do so if you use third party tool. Then you can put dynamically binded value from your SharePoint metadata, conditions, rules etc: http://www.pdfsharepoint.com