Changing the file extension from "Demo.xlsx" to "Demo.pdf": does it convert the file from XLSX to PDF?

In the workplace we use different types of documents every day to hold data, for example DOC, XLSX, and PDF files. Sometimes we use software (like Adobe Reader) to convert Excel files to PDF.
As far as I know, another way to convert a document from Excel to PDF is to change the document type in the Save As dialog (correct me if I am wrong) or to change the file extension.
My question is: when we change the document type in the Save As dialog, does it change the code behind the file?
Another silly question: if we can convert the file by changing the extension, why are we paying for third-party software?

Every document type has its own format, so behind the scenes every type stores its content differently. For example, an XLSX file is a ZIP archive of XML files. PDF is a rich document representation format created by Adobe and based on the PostScript imaging model.
When you save a document as XLSX, it is written according to the XLSX standard. When you pick a different type in Save As, a different writer is used. So the answer to your first question is yes: the encoding of the file changes when you save it as another type.
For the second question, changing the file format is not always an easy task. A conversion has to re-encode the content of the file. When you change only the extension, no conversion takes place. You are telling your computer "This is a ... file", but the encoding of the file is still unchanged.
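The point about the extension can be checked by comparing a file's bytes before and after a rename. The following is a minimal Python sketch, not a real workbook: it writes a small ZIP archive (the container XLSX uses) under the name Demo.xlsx, renames it to Demo.pdf, and inspects the leading bytes. The helper names `leading_bytes` and `rename_demo` and the dummy part content are assumptions made for illustration.

```python
import os
import tempfile
import zipfile

def leading_bytes(path, count=4):
    """Return the first `count` bytes of the file at `path`."""
    with open(path, "rb") as handle:
        return handle.read(count)

def rename_demo(folder):
    """Create a ZIP-based stand-in for Demo.xlsx, rename it to Demo.pdf,
    and return the leading bytes before and after the rename."""
    xlsx_path = os.path.join(folder, "Demo.xlsx")
    pdf_path = os.path.join(folder, "Demo.pdf")
    # An XLSX file is a ZIP archive; this stand-in holds one dummy XML part.
    with zipfile.ZipFile(xlsx_path, "w") as archive:
        archive.writestr("xl/workbook.xml", "<workbook/>")
    before = leading_bytes(xlsx_path)
    os.rename(xlsx_path, pdf_path)  # changes the name only, not the content
    after = leading_bytes(pdf_path)
    return before, after

with tempfile.TemporaryDirectory() as folder:
    before, after = rename_demo(folder)
    print(before == after)            # is the content unchanged?
    print(after.startswith(b"PK"))    # is it still a ZIP container?
    print(after.startswith(b"%PDF"))  # does it start like a PDF?
```

A PDF viewer opening the renamed file would still find ZIP data, which is why a real conversion (Save As, or a converter) is needed.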

Related

Data Security Issue - Printing PDF - "deleted" information gets printed

The Problem
I received a PDF file at work which I then printed. In the PDF file there were several optional fields where one could enter information such as "place of birth" etc. If I open the PDF file on my computer, I can see a set of input information A (a travel request with dates from this year, 2017).
If I print the pdf on the local printer, the printed document contains a set of information B which for example contained travel request dates from 2015.
This information was not visible when opening the file on my computer.
I have been able to reproduce the error multiple times.
Why is this a problem?
It seems that previous entries into the PDF were somehow still stored in the file, contrary to what was visible when opening it. When printing, the printer seems to access only the oldest entries and prints those.
This is a potential breach of data privacy and security, since the PDF file seems to save all previous entries without anyone knowing.
Especially at work, some of these PDFs contain bank account information and other identity-related information.
The Question
Has anyone experienced a similar issue, or does anyone know how to delete the invisible old information still stored in the PDF?
UPDATE1: I could not reproduce the error on other printers. It seems this error is caused by the specific printer. Yet the information must be present in the PDF file, which is the specific cause of my question.
UPDATE2: Using the information from the accepted answer, I used the program "PDF CHAIN" and selected the option "drop XFA from document". I then saved the manipulated document again and printed it on the same printer.
Finally, the correct information was printed.
At a guess (and that's all it is without being able to see the original file) the PDF contains optional content or annotations which contain different field data for Print and Screen.
If you open the file using a PDF consumer (eg Acrobat) then what you see is the 'screen' result. Depending on the consumer you are using it may then either send the screen data to the printer, or substitute with the 'Print' data.
The printer you note as being a problem is capable of direct PDF printing. You haven't stated whether that's how you are printing the PDF file or whether you are using an application, nor whether the other printers are PDF capable or not.
My guess is that there is a different decision being made somewhere in the 2 print paths as to which is the 'correct' information to print.
Note that this does not mean that the PDF 'seems to save all previous entries without anyone knowing'; that's not really possible with a PDF file.
A malicious PDF processing application could do so, by adding comments to the PDF file, but only that application would be able to retrieve it.
But it is possible to have multiple entries of different types for different purposes, and if they aren't the same (because of the tool used to edit the file) then you can get strange results like this.
Note that if this is a problem for you then you probably shouldn't be using PDF, but you can mitigate the issue by digitally signing your documents. Signed PDF files include a means (a secure cryptographic hash) of verifying that the document has not been tampered with. Of course, you can't then edit the PDF file without re-signing it.
Oh, one other possibility would be that the PDF was actually an XFA form; it's possible to have part of the document be a valid PDF which prints 'something' when a PDF consumer can't handle an XFA form, but that need bear no relation to what you see when you use an XFA processor.
My money's on optional content, AcroForm fields, or annotations where the Print data is different from the Screen data though.

Detect correct file extension for OpenXmls?

If we have been provided only the XMLs of the document (as an input stream, already unzipped, or in a byte array), can we detect the file extension by parsing the XMLs? My goal is to find out which node in which XML part determines whether this is a DOCX, PPTX, or XLSX file.
I unzipped the documents, dug around, and found this:
In \docProps\app.xml, the Application node defines it:
<Application>Microsoft Excel</Application> for Excel,
<Application>Microsoft Office PowerPoint</Application> for PowerPoint, and
<Application>Microsoft Office Word</Application> for Word.
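The lookup described above could be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `guess_extension` and the mapping table are hypothetical, the three Application values are the ones observed above, and the sample XML is a trimmed-down stand-in for a real app.xml. Since app.xml is optional metadata and other producers may write different Application values, the result is best treated as a heuristic.

```python
import xml.etree.ElementTree as ET

# Namespace of the extended-properties part (docProps/app.xml).
APP_NS = "{http://schemas.openxmlformats.org/officeDocument/2006/extended-properties}"

# Hypothetical mapping built from the Application values observed above.
EXTENSION_BY_APPLICATION = {
    "Microsoft Excel": ".xlsx",
    "Microsoft Office PowerPoint": ".pptx",
    "Microsoft Office Word": ".docx",
}

def guess_extension(app_xml: bytes):
    """Return a likely extension for the given app.xml content, or None
    when the Application node is missing or not in the mapping."""
    root = ET.fromstring(app_xml)
    application = root.findtext(APP_NS + "Application")
    return EXTENSION_BY_APPLICATION.get((application or "").strip())

# Trimmed-down stand-in for the content of docProps/app.xml.
sample = (
    b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    b'<Properties xmlns="http://schemas.openxmlformats.org/'
    b'officeDocument/2006/extended-properties">'
    b'<Application>Microsoft Excel</Application>'
    b'</Properties>'
)
print(guess_extension(sample))
```

Returning None for unknown values keeps the caller in charge of the fallback instead of guessing an extension.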

Opening xlsx file created with SpreadSheetGear

I have created a simple Excel file using SpreadSheetGear. If I save it as an xls file
workbook.SaveAs("file.xls", SpreadsheetGear.FileFormat.Excel8);
and attach it to an email, I can open it on my phone (tested both with iPhone and Android).
If I save it as an xlsx file
workbook.SaveAs("file.xlsx", SpreadsheetGear.FileFormat.OpenXMLWorkbook);
and attach it to an email, I CANNOT open it on my phone.
If I open the xlsx file attachment on my computer and save it with no changes and attach it to an email, I now can open it on my phone.
Apparently Excel saves the file differently than SSG. The file size of the xlsx file attachment is 9 KB. When I open it on my computer and save it, the new file size is 24 KB.
Some of my users prefer the xlsx format. Is there anything I can do to make the SSG-generated file attachment open like an Excel-generated file attachment?
iOS depends on certain attributes being present in the worksheet data of the Open XML file format to properly parse these files. SpreadsheetGear does not write these attributes out because they are listed as optional in the Open XML file format specification and because omitting them reduces file size, as you have noted. Excel, for whatever reason, always writes out these optional attributes, and other third-party components often rely on their presence to function correctly. SpreadsheetGear V5 added a workaround that writes out these attributes when a certain "Experimental" option is enabled. This option was added because the OLE DB provider also exhibits this errant behavior. You might try something like the following and see if it helps SpreadsheetGear work better with your viewer:
// Enable the experimental workaround before opening the workbook.
IWorkbookSet workbookSet = Factory.GetWorkbookSet();
workbookSet.Experimental = "OleDbOpenXmlWorkaround";
// Open the problem file and save it again under a new name.
IWorkbook workbook = workbookSet.Workbooks.Open(@"C:\temp\BadWorkbook.xlsx");
workbook.SaveAs(@"C:\temp\GoodWorkbook.xlsx", FileFormat.OpenXMLWorkbook);
Please see the SpreadsheetGear.IWorkbookSet.Experimental property for more information on this feature.
From what I can tell, iOS, Android, etc. often also depend on certain other optional features of the file formats that SpreadsheetGear either doesn't support or doesn't write out by default. For instance, iOS depends on a "data cache" stored within charts to display chart series data points, and SpreadsheetGear's support for writing out this data cache is limited. This can result in charts not displaying as expected in iOS, Android, etc.

How to determine real format of an XLSX Excel file?

I have a well-known problem that is described in the Microsoft blog entry "Extension Warning On Opening Excel Workbook from a Web Site". I've added a URL rewrite so that the URL is nicely formatted, and my MIME type exactly matches the recommended type for XLSX. However, I still get a warning. I suspect that the service that provides those xlsx files mismatches the real file format and the extension.
Is there a way to determine the real format of an xlsx file? Something that would say what the native extension is for a particular Excel file.
Thanks in advance.
Have you tried changing the mime header from vnd:excel to octet-stream? This will still bring Excel up, albeit not embedded into IE, which vnd:excel does (but I hate vnd:excel anyways because embedding the spreadsheet into the browser screws up the form flow of my web apps).
Did not find an answer for that anyway.
However I've discovered the reason why I get a warning from Excel - any parameter in query string will trigger such a warning, even for static files:
http://localhost/1.xls
works ok
http://localhost/1.xls?testparam=paramvalue
gives a warning.
Will use URL rewrite to encode parameters.

Render image or pdf stream from SQL database in asp.net

I have a table of saved documents, some of them PDFs and some of them images.
I want to create a web app that shows the documents (which can be either PDF or JPG) in the same control.
I can display a PDF if I set Response.ContentType = "application/pdf", or an image if I set "application/jpg". The problem is: how can I get the file type when I have only the stream saved in the database? Does the stream carry the file type information?
Thanks.
No, a stream does not have a content type associated with it. If you had the original filename, you could attempt to derive the content type from that, but it wouldn't be foolproof.
Many file formats have a series of "magic bytes" that allow you to detect what (might) be in the file. PDF, for example, begins with the bytes "%PDF" (note: I'm not an expert on PDF, and there may be situations where that is not true).
If you have no other option, you could attempt to parse the file using various libraries until you found one that worked (System.Drawing.Image.FromStream(), iTextSharp, etc).
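The magic-byte check can be sketched as below. This is a minimal Python illustration of the idea rather than ASP.NET code; the function name `sniff_content_type` and the signature table are assumptions, and it covers only the two formats from the question. It relies on PDF files starting with "%PDF" and JPEG files starting with the bytes FF D8 FF, and it returns image/jpeg, the registered MIME type for JPEG.

```python
import io

# Leading-byte signatures for the two formats discussed here.
SIGNATURES = (
    (b"%PDF", "application/pdf"),
    (b"\xff\xd8\xff", "image/jpeg"),
)

def sniff_content_type(stream, default="application/octet-stream"):
    """Peek at the start of a seekable binary stream and guess its MIME type."""
    start = stream.tell()
    header = stream.read(8)
    stream.seek(start)  # restore the position so the caller can read it all
    for signature, content_type in SIGNATURES:
        if header.startswith(signature):
            return content_type
    return default  # unknown content: let the caller decide what to do

# In-memory stand-ins for the streams loaded from the database.
print(sniff_content_type(io.BytesIO(b"%PDF-1.7\n...")))
print(sniff_content_type(io.BytesIO(b"\xff\xd8\xff\xe0\x00\x10JFIF")))
```

A match on the leading bytes is only a hint, so falling back to a parsing attempt for ambiguous content remains a reasonable second step.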
