SharePoint and Office Open XML interaction question - sharepoint

I've been stuck on this for the entire weekend, plus a day or two, so any help would be greatly appreciated.
I'm trying to write a program that can programmatically go into a SharePoint 2007 doc library, open a file, change the contents of the file, then put the file back. I've gotten all but the last part of this down. The reason Office Open XML is involved is that that's how I'm opening the document and modifying it - through the Office Open XML SDK. My question is: How do I get it from the document back into the library?
The problem as I see it is that there's no save function on the WordprocessingDocument object itself, so I have nothing to pass to the SPFile's SaveBinary function.

You should use streams to write the changed OOXML back into the SPFile.
I hope this example helps!
Stream fs = mySPFile.OpenBinaryStream();
using (WordprocessingDocument ooxmlDoc = WordprocessingDocument.Open(fs, true))
{
    MainDocumentPart mainPart = ooxmlDoc.MainDocumentPart;
    XmlDocument xmlMainDocument = new XmlDocument();
    xmlMainDocument.Load(mainPart.GetStream());
    // change the contents of the ooxmlDoc / xmlMainDocument
    Stream stream = mainPart.GetStream(FileMode.Open, FileAccess.ReadWrite);
    xmlMainDocument.Save(stream);
    // truncate the part so no stale bytes remain if the new XML is shorter
    stream.SetLength(stream.Position);
}
mySPFile.SaveBinary(fs);
fs.Dispose();
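The same read, modify, write-back round trip can be illustrated without the SDK, because a .docx file is a ZIP package. What follows is only a minimal Python sketch, not the Open XML SDK approach shown above. It assumes the document bytes have already been read from the library, and the helper name replace_in_main_part is hypothetical. It rewrites word/document.xml in memory and returns new bytes that could then be handed to whatever upload call is available (SaveBinary in the SharePoint object model).

```python
import io
import zipfile

MAIN_PART = "word/document.xml"  # main document part of a .docx package


def replace_in_main_part(docx_bytes: bytes, old: str, new: str) -> bytes:
    """Return a copy of the package with `old` replaced by `new` in the main part."""
    source = zipfile.ZipFile(io.BytesIO(docx_bytes))
    output = io.BytesIO()
    with source, zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as target:
        for item in source.infolist():
            data = source.read(item.filename)
            if item.filename == MAIN_PART:
                # Naive text replacement; real documents may split text across runs.
                data = data.decode("utf-8").replace(old, new).encode("utf-8")
            # Every other part is copied through unchanged.
            target.writestr(item, data)
    return output.getvalue()
```

The whole package is rebuilt in a fresh in-memory buffer, so the result never contains leftover bytes from a longer original part.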

Yesterday I saw a webcast by Andrew Connell in which he opened a document from a document library, added a watermark, and saved the file again. It sounds like that webcast is worth a look:
https://msevents.microsoft.com/CUI/WebCastRegistrationConfirmation.aspx?culture=en-US&RegistrationID=1299758384&Validate=false
By the way, I found all 10 of the webcasts in that series to be very good.

Related

Downloading with node modifies excel files and causes data loss

I am trying to create a script in node.js which will download an excel file. My code is built upon first making an http.get request to the URL and then write to a file using response.pipe and createWriteStream. My code is as follows:
const fs = require("fs");
const http = require("http");
let url = "http://www.functionalglycomics.org:80/glycomics/HFileServlet?operation=downloadRawFile&fileType=DAT&sideMenu=no&objId=1002183";
http.get(url, response => {
    let file = fs.createWriteStream('file.xls');
    let stream = response.pipe(file);
});
If you download the following file using Firefox, it downloads correctly, and Excel opens it without any errors.
http://www.functionalglycomics.org:80/glycomics/HFileServlet?operation=downloadRawFile&fileType=DAT&sideMenu=no&objId=1002183
Note: the download link above will not work in Chrome because of an issue with the filename containing a comma. Therefore I cannot use Puppeteer for this.
However, if I download the file with my script above and try to open it in Excel, I get an error stating "data may have been lost" five times, although the file eventually opens.
My question is therefore: what is causing this data loss when downloading with Node.js?
Update
Some data about versions:
Node:v12.13.1
Excel: Office 2019
OS: Windows 10 latest
Update 2
Based on the comments below from jarmod, I tried using wget in Windows PowerShell. It downloads the file too, but also produces the Excel error.
I posted this as an issue on the Node.js GitHub. @Hakerh400 provided a good description of what is happening there. Briefly, the Windows NTFS file system has a feature called ADS (Alternate Data Streams), which is used to record which files were downloaded from the internet so that applications can raise security prompts. You can read more in @Hakerh400's comment on that issue.
The proposed workaround is to add a Zone.Identifier ADS to the file after the download is complete, as in the following example:
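http.get(url, response => {
    let file = fs.createWriteStream('file.xls');
    let stream = response.pipe(file);
    // write the ADS only once the download has been fully written to disk
    stream.on('finish', () => {
        fs.writeFileSync(
            'file.xls:Zone.Identifier',
            `[ZoneTransfer]\r\nZoneId=3\r\nHostUrl=${url}`,
        );
    });
});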
Note: this workaround lets you open the Excel file in "Protected View" without the error appearing. However, if you click "Enable Editing" in Excel's security prompt, the "File Error: data may have been lost" message still pops up (Excel 2019). There is no real data loss in the sheets or cell data, though.
I hope this answer helps anyone who faces anything similar.
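For anyone scripting the same workaround outside Node, here is a minimal Python sketch of the idea. The function names zone_identifier_text and mark_as_downloaded are hypothetical, and the code assumes it is called only after the downloaded file has been fully written and closed. ZoneId 3 is the Internet zone, matching the example above.

```python
import os


def zone_identifier_text(host_url: str, zone_id: int = 3) -> str:
    """Build the Zone.Identifier stream content (ZoneId 3 is the Internet zone)."""
    return f"[ZoneTransfer]\r\nZoneId={zone_id}\r\nHostUrl={host_url}"


def mark_as_downloaded(path: str, host_url: str) -> bool:
    """Attach the stream on Windows; return False elsewhere (ADS needs NTFS)."""
    if os.name != "nt":
        return False
    # newline="" stops Python from translating the explicit \r\n pairs.
    with open(f"{path}:Zone.Identifier", "w", newline="") as stream:
        stream.write(zone_identifier_text(host_url))
    return True
```

Keeping the text builder separate from the write makes the content easy to check on any platform, while the write itself only makes sense on an NTFS volume.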

Xref table not zero-indexed. ID numbers for objects will be corrected. won't continue

I am trying to open a pdf to get the number of pages. I am using PyPDF2.
Here is my code:
import PyPDF2

def pdfPageReader(file_name):
    try:
        reader = PyPDF2.PdfReader(file_name, strict=True)
        number_of_pages = len(reader.pages)
        print(f"{file_name} = {number_of_pages}")
        return number_of_pages
    except:
        # fall back to "1" when the file cannot be read
        return "1"
But then I run into this error:
PdfReadWarning: Xref table not zero-indexed. ID numbers for objects will be corrected. [pdf.py:1736]
I tried both strict=True and strict=False. With True, it displays this message and then nothing happens; I waited for 30 minutes. With False, it displays nothing and simply hangs. If I press Ctrl+C in the terminal (cmd, Windows 10), it cancels that file and continues (I run this over a batch of PDF files). Only one file in the batch has this problem.
My questions are: how do I fix this, how do I skip it, or how can I cancel it and move on to the other PDF files?
If somebody has a similar problem and it even crashes the program with this error message:
File "C:\Programy\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 1604, in getObject
    % (indirectReference.idnum, indirectReference.generation, idnum, generation))
PyPDF2.utils.PdfReadError: Expected object ID (14 0) does not match actual (13 0); xref table not zero-indexed.
It helped me to set the strict argument to False for my PDF reader:
pdf_reader = PdfReader(input_file, strict=False)
For anybody else who runs into this problem and finds that strict=False doesn't help: I was able to solve it by re-saving a new copy of the file in Adobe Acrobat Reader. I opened the PDF file in an actual copy of Adobe Acrobat Reader (the plain ol' free version on Windows), did a "Save as...", and gave the file a new name. Then I ran my script again using the newly saved copy of my PDF file.
Apparently the PDF file I was using, which was generated directly by my scanner, was somehow corrupt, even though I could open and view it just fine in Reader. Making a duplicate copy of the file by re-saving it in Acrobat Reader seemed to correct whatever was missing.
I had the same problem and looked for a way to skip it. I am not a programmer, but the documentation about warnings has a piece of code that helps you avoid this hindrance.
Although I wouldn't recommend this as a solution, the piece of code that I used for my purpose is below (copied and pasted from the linked documentation):
import sys
if not sys.warnoptions:
    import warnings
    warnings.simplefilter("ignore")
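A narrower variant of the same idea is to silence warnings only around the call that emits them, so the rest of the program still reports warnings normally. This is a minimal sketch that assumes the library reports the message through Python's standard warnings module; the helper name call_quietly is hypothetical.

```python
import warnings


def call_quietly(func, *args, **kwargs):
    """Call func with all warnings suppressed, then restore the previous filters."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return func(*args, **kwargs)
```

For example, call_quietly(PdfReader, "some.pdf", strict=False) would construct the reader without printing the warning, assuming PdfReader has been imported from PyPDF2. It does not fix the file or prevent a hang; it only hides the message.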
This happens to me when the file was created in a printer / scanner combo that generates PDFs. I could read in the PDF with only a warning though so I read it in, and then rewrote it as a new file. I could append that new one.
from PyPDF2 import PdfMerger, PdfReader, PdfWriter

reader = PdfReader("scanner_generated.pdf", strict=False)
writer = PdfWriter()
for page in reader.pages:
    writer.add_page(page)
with open("fixedPDF.pdf", "wb") as fp:
    writer.write(fp)
merger = PdfMerger()
merger.append("fixedPDF.pdf")
I had the exact same problem. The solutions above (setting strict=False and re-saving the document in Acrobat Reader) helped but didn't solve the problem completely.
I still got a stream error, but I was able to fix it with an online PDF repair tool. I used sejda.com, but please be aware that you are uploading your PDF to a website, so make sure there is nothing sensitive in it.
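For the batch case in the question, where one file hangs and the rest should still be processed, one option is to count pages in a child process with a time limit. This is a minimal sketch, not something proposed in the answers above. The function name count_pages_with_timeout, the PAGE_COUNT_SCRIPT constant, and the child_code parameter are all hypothetical, and the 30 second default is an arbitrary assumption.

```python
import subprocess
import sys

# Child script: prints the page count of the PDF given as its first argument.
PAGE_COUNT_SCRIPT = """
import sys
from PyPDF2 import PdfReader
print(len(PdfReader(sys.argv[1], strict=False).pages))
"""


def count_pages_with_timeout(path, timeout_s=30.0, child_code=PAGE_COUNT_SCRIPT):
    """Return the page count, or None if the child hangs, fails, or prints junk."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", child_code, path],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None  # subprocess.run kills the child when the timeout expires
    if result.returncode != 0:
        return None
    try:
        return int(result.stdout.strip())
    except ValueError:
        return None
```

A batch loop can then treat None as "skip this file and log it" rather than waiting indefinitely or pressing Ctrl+C by hand.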

Wrong fonts and wrong autofit when using Syncfusion's ExcelToPdfConverter

I am currently using Syncfusion's ExcelToPdfConverter to convert an XLSX document to a PDF.
I first create the XLSX document and then convert it to PDF with the following code:
var converter = new ExcelToPdfConverter(workbook);
//Initialize the PdfDocument
var pdfDoc = new PdfDocument();
//Initialize the ExcelToPdfConverterSettings
var settings = new ExcelToPdfConverterSettings();
//Assign the PDFDocument to the TemplateDocument property of ExcelToPdfConverterSettings
settings.TemplateDocument = pdfDoc;
settings.EmbedFonts = true;
//Convert Excel Document into PDF document
pdfDoc = converter.Convert(settings);
//Save the pdf file
pdfDoc.Save(PDFFullPath);
The resulting XLSX is correct and it looks like it should.
The converted PDF isn't correct, though. It looks as if it is using the wrong fonts and, as a consequence, the rows that were autofitted with AutofitRow get truncated.
Has anyone else had this issue before?
Is there any way to tell the PDFConverter to use the correct fonts? (Please note that I'm using Arial and Calibri and they're both installed in the system's fonts)
Regards.
We suspect that the issue might be caused by assigning the wrong row index. In XlsIO, the row and column indexes are one-based.
Please refer to the documentation below to learn more about AutofitRow and AutofitColumn.
Documentation Link: https://help.syncfusion.com/file-formats/xlsio/worksheet-rows-and-columns-manipulation#auto-fit-a-single-row-or-column
Regards,
Abirami.
This was actually a bug in Syncfusion XlsIO that has been fixed in a patch due out in February 2017.
The above code is correct.

Cannot Retrieve Data from Excel File Created using Aspose.Cells

I create an Excel file (.xlsx) using the Aspose.Cells library. But I'm not able to read the data (retrieve rows) using OleDb commands after that, until I open the file and save it manually. I'm running something as simple as this one:
new OleDbDataAdapter("select * from [Sheet1$]", conn); // etc...
Saving the file increases the size of the file as well. Please note that this happens only with the .xlsx format, for the old .xls everything works fine. I even tried the demo code that they have on their website, but the result is the same. Am I missing something?
It seems you need to set the ExportCellName property to true before saving to the xlsx/xlsm format.
Please see the following sample.
//Create your workbook
Workbook workbook = new Workbook(filePath);
//Do your processing
//Save your workbook with export cell as true
OoxmlSaveOptions opts = new OoxmlSaveOptions();
opts.ExportCellName = true;
workbook.Save("output.xlsx", opts);
Note: I am working as Developer Evangelist at Aspose

openxml can't open docx file through sharepoint rest

I'm using the sharepoint rest api to get the contents of a docx file like so
_api/web/getfolderbyserverrelativeurl('openxmlJsPoc')/files('TemplateDocument.docx')/$value
I get the contents of the file, but I'm having trouble reading it with the Open XML JavaScript API.
This is a sample of the returned data:
PK ... [Content_Types].xml ... _rels/.rels ... word/_rels/document.xml.rels ... (binary ZIP data, truncated)
I'm positive this data is correct, because when I save it as a docx file it opens correctly.
I tried using
openXml.OpenXmlPackage(result);
// and
doc = new openXml.OpenXmlPackage();
doc.openFromArrayBuffer
but I keep getting errors.
Please help!
The problem was with the JZIP.js that comes packaged with the SDK.
A better approach is to save the template as a Word XML file, then download it through Ajax and open it.
That worked for me.
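When debugging this kind of problem, it can help to confirm that the bytes coming back from the REST call are still an intact package before blaming the library that opens them. A .docx is a ZIP archive, which is why the sample in the question starts with "PK". The following is a minimal Python sketch of such a check; the function name list_package_parts is hypothetical, and it assumes the response body is available as raw bytes rather than as decoded text.

```python
import io
import zipfile

ZIP_MAGIC = b"PK\x03\x04"  # local file header signature that starts a ZIP archive


def list_package_parts(data: bytes) -> list[str]:
    """Return the part names of an OOXML package, or raise ValueError."""
    if not data.startswith(ZIP_MAGIC):
        raise ValueError("payload does not start with the ZIP signature")
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as package:
            names = package.namelist()
    except zipfile.BadZipFile as exc:
        raise ValueError("payload is not a readable ZIP archive") from exc
    if "[Content_Types].xml" not in names:
        raise ValueError("ZIP archive is not an OOXML package")
    return names
```

If this check passes on the downloaded bytes, the transport is probably fine and the unzip step on the client is the more likely suspect, which is consistent with the answer above.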

Resources