Extracting text from a OneNote document in "archive format"

I have a OneNote document in archive format, downloaded from SharePoint using CSOM. I am trying to extract its plain text content using IFilter, but the process fails with a FILTER_E_UNKNOWNFORMAT (0x8004170C) error. I have OneNote installed, and the IFilter for .one files is registered properly.
If I open the document in OneNote, it displays a small banner that offers conversion to an editable format. After the conversion, I am able to load the file into IFilter and read the text. I am also able to do the same with notebooks created locally.
I'd like to achieve the same result programmatically, without user interaction. I've tried using the OneNote interop libraries to convert the notebook to PDF and then extract the text from the PDF file, but that seems like overkill to me.
Microsoft.Office.Interop.OneNote.IApplication app = new Microsoft.Office.Interop.OneNote.ApplicationClass();
try
{
    // Open the section file and obtain its hierarchy ID.
    app.OpenHierarchy("d:\\Note.one", string.Empty, out string hierarchyId, Microsoft.Office.Interop.OneNote.CreateFileType.cftNone);
    app.SyncHierarchy(hierarchyId);
    // Publish the section as a PDF file.
    app.Publish(hierarchyId, "d:\\Note.pdf", Microsoft.Office.Interop.OneNote.PublishFormat.pfPDF);
}
finally
{
    System.Runtime.InteropServices.Marshal.ReleaseComObject(app);
}
I know I can access the OneNote document's content directly, without converting to PDF, but I want to avoid using interop libraries if possible.
Does anybody have experience reading OneNote documents programmatically, or know of a tool that performs the above-mentioned conversion? Or is there a different way of downloading OneNote documents from SharePoint that does not produce such archived files?
Any suggestions would be appreciated.

Related

Excel file (.xlsx) to PDF Conversion using microsoft graph api is ignoring page setup instructions

I already have an Excel file in .xlsx format. I am trying to convert it to PDF using the Microsoft Graph API (by uploading the file to OneDrive and then downloading it as PDF). I am using the following API call:
https://graph.microsoft.com/v1.0/me/drive/items/[item-id]/content?format=pdf
The PDF conversion performed by the above API does not honor all the page setup parameters that are set in the underlying .xlsx file. More specifically, the converted PDF is always rendered in landscape mode and seems to ignore the fit to width/height/page settings. If I open the same file locally in Excel and save the document as PDF, it renders correctly, interpreting all the page setup parameters properly.
Any help would be greatly appreciated as to how I can get the PDF conversion API to render the PDF according to the orientation (portrait/landscape) and page width/height settings in the .xlsx file.
I have tried multiple smaller files with different page setup parameters, but the PDF conversion (using the REST API) always returns the document in landscape mode and seems to ignore the fit to page/width/height settings.
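For reference, the call described in the question can be sketched in Python as below. This only builds the request; it does not address the page setup problem. The function name, the item ID, and the access token are hypothetical placeholders, and the only endpoint used is the one quoted in the question.

```python
from urllib.parse import quote
from urllib.request import Request

# Base URL taken from the API call quoted in the question.
GRAPH_ROOT = "https://graph.microsoft.com/v1.0"


def build_pdf_request(item_id: str, access_token: str) -> Request:
    """Build a GET request for the PDF rendition of a OneDrive item.

    item_id and access_token are placeholders supplied by the caller;
    obtaining a token is outside the scope of this sketch.
    """
    url = f"{GRAPH_ROOT}/me/drive/items/{quote(item_id, safe='')}/content?format=pdf"
    return Request(url, headers={"Authorization": f"Bearer {access_token}"})
```

The returned object can be passed to urllib.request.urlopen, or its URL and headers can be reused with whichever HTTP client is already in the project.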

How can we show excel and doc files in angular6

I am using the Outlook Mail API to receive and show mail content, which can also include attachments of any type. I have managed to show PDF files using the pdf-view module, but for Excel and Word files no such module seems to exist. Could anyone suggest something for this? Any help would be appreciated. Thank you.

What I found from my research is that you cannot display a Word file directly using JavaScript or Angular; the browser only opens a dialog box to download the file.
As an alternative, you can convert the Word file to HTML and preview it in the browser. Mammoth.js is the best option for this.

How to attach pdf embedded in excel in outlook email with vba

I have embedded a PDF in Excel using Insert > Object > Create from File > Browse > Display as icon.
I would like to use the embedded PDF as an attachment to my Outlook email using VBA code. I have tried the .Attachments.Add method, but it fails to detect the embedded object.
Could anyone advise on the correct code? Thanks!

Get the file from the source?
I don't understand why you would need to embed the .PDF object in the workbook if you're going to be emailing it separately anyhow...
Regardless, you could just grab the actual/original .PDF to attach a copy to the email, directly from the same location from where it was embedded. (If it's not there, what happened to it?)
Another option:
As soon as you right-click the embedded object, Excel 2016 "gets ready" for you to open it by extracting it to your local temp folder. (I'm unsure whether this applies to previous versions.)
Therefore, you could programmatically right-click the embedded icon, and then check the temp folder located at the path that you'll find stored in Environ("temp"). One or more copies of your file will be located there (and it should be the 'newest' PDF).
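The "newest PDF in the temp folder" lookup can be sketched as follows. This is a Python illustration rather than VBA, and it only covers the search step, not the programmatic right-click. The function name is hypothetical, and the sketch assumes the extracted copy sits directly in the temp folder rather than in a subfolder.

```python
import tempfile
from pathlib import Path
from typing import Optional


def newest_pdf(folder: Optional[str] = None) -> Optional[Path]:
    """Return the most recently modified .pdf directly inside folder.

    Defaults to the user's temp folder (the same location that
    Environ("temp") points to in VBA). Returns None if no PDF is found.
    """
    root = Path(folder or tempfile.gettempdir())
    pdfs = [p for p in root.iterdir() if p.is_file() and p.suffix.lower() == ".pdf"]
    return max(pdfs, key=lambda p: p.stat().st_mtime, default=None)
```

Other programs also write PDFs to the temp folder, so it is worth checking the file name or modification time before attaching the result.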
Yet another option:
An Excel XLSM file is simply a compressed ZIP archive, which you can open as such if you change the extension. You could programmatically make a copy of the file, changing its extension to .ZIP.
Embedded objects are stored as .BIN files within the ZIP file, in the xl\embeddings\ folder. Each one would have to be extracted and then renamed back to a PDF. Note that this method is a little flaky and won't work with all PDFs.
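A rough Python sketch of the ZIP approach is shown below. It reads the workbook as a ZIP archive directly (no renaming needed) and carves out everything between the first "%PDF-" marker and the last "%%EOF" marker in each .BIN file. The function name is hypothetical, and the carving step is an assumption about how the PDF bytes sit inside the .BIN container; it will fail when the data is not stored contiguously, which is one reason this method is unreliable.

```python
import zipfile
from typing import Dict


def extract_embedded_pdfs(workbook) -> Dict[str, bytes]:
    """Map each xl/embeddings/*.bin member to the PDF bytes carved from it.

    workbook is a path to an .xlsx/.xlsm file or a binary file object.
    Members without a recognizable PDF payload are skipped.
    """
    found = {}
    with zipfile.ZipFile(workbook) as archive:
        for name in archive.namelist():
            if not (name.startswith("xl/embeddings/") and name.lower().endswith(".bin")):
                continue
            blob = archive.read(name)
            start = blob.find(b"%PDF-")   # PDF header marker
            end = blob.rfind(b"%%EOF")    # last end-of-file marker
            if start != -1 and end > start:
                found[name] = blob[start:end + len(b"%%EOF")]
    return found
```

Each value can then be written to a .pdf file and attached to the email in the usual way.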
More Information:
VBA Express : Save embedded PDF file as a separate PDF file
How-to-Geek : How to Extract Images, Text, and Embedded Files from Office Documents

How could I access the source code of a .one OneNote file?

How could I access the source code of a .one OneNote file?
I've tried renaming the .one file to .zip, which works with .docx files for accessing their source code, but .one doesn't seem to work like that.
Also, I've tried opening it with Notepad++, but it isn't in a plain-text format.
I regard this as a programming question because:
I'm using content-editing automation scripts (e.g. RegEx-based find-and-replace scripts). Accessing the source code of .one files would help me apply bulk automated edits to their content using RegEx.

.one files aren't technically source code - they contain the data that describes the pages in a section and their content.
Opening them as text won't show you anything meaningful as they are binary data.
Microsoft has published the structure of this data in the following documentation. You can use it to parse the binary file and obtain the information you need.
https://msdn.microsoft.com/en-us/library/dd924743(v=office.12).aspx
https://support.office.com/en-us/article/File-format-changes-in-OneNote-2016-for-Windows-a9129622-1755-470b-91e7-b2a461194036
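As a first step in parsing, a Python sketch can check the file-type GUID stored in the first 16 bytes of the header. The two GUID values are the ones given in the published format documentation for section (.one) and table-of-contents (.onetoc2) files; the function name is hypothetical, and this only identifies the file, it does not read any page content.

```python
import uuid
from typing import Optional

# File-type GUIDs as listed in the published file format documentation.
KNOWN_TYPES = {
    uuid.UUID("7B5C52E4-D88C-4DA7-AEB1-5378D02996D3"): ".one",
    uuid.UUID("43FF2FA1-EFD9-4C76-9EE2-10EA5722765F"): ".onetoc2",
}


def onenote_file_type(header: bytes) -> Optional[str]:
    """Return ".one" or ".onetoc2" if the header starts with a known
    OneNote file-type GUID, otherwise None."""
    if len(header) < 16:
        return None
    # GUIDs are stored in little-endian (Windows) byte order on disk.
    file_type = uuid.UUID(bytes_le=header[:16])
    return KNOWN_TYPES.get(file_type)
```

Typical usage is to open the file in binary mode, read the first 16 bytes, and pass them to the function before attempting any further parsing.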

The .one file format is very complicated, as it has to store images and all revisions, so it's binary and not XML-based like the rest of the Office suite.
That said, if you do want to see the XML structure of the notebook or of specific page content, you can use OMSpy:
https://blogs.msdn.microsoft.com/johnguin/2011/07/28/onenote-spy-omspy-for-onenote-2010/
It works fine for 2016 Desktop.

Sharepoint List to PDF report

I have a SharePoint list and I need to transform it into a document (any type) and export it to PDF. Would you have any tips on the best way to do this? I have Crystal Reports but not sure if this is the correct use case for this.

You can programmatically access the document library using the object model or via web services.
If you use the object model, you can use the SPContext object to get the current site/list. From there, you can iterate through the items, or you can use a method on the SPList object to turn it into a dataset, which you could then use to generate a PDF with some kind of PDF library (e.g. PDF4NET). If you go this route, the best way to roll it out is to package it as a feature in a solution file (.WSP) that you can deploy to your farm. In this case the code would run in the SharePoint environment. You can get pretty fancy with this and have something like a "Print PDF" option in the action menu for all lists.
On the other hand, you could also access the list remotely using the web services. In that case you could just use it as a data provider for your reporting package.

The PDFsharepoint tool (http://www.pdfsharepoint.com) can be used to generate the PDF output. The nice thing about this tool is that you "design" or "import" a template using a WYSIWYG editor and only map the data, without the mess of coding your own PDF generator. It is not a free tool, though...

I have had much success using MS-Access for creating PDF reports from SharePoint lists. You can even embed the report as a view in the list. When you select the view, it opens Access for you. Plus, you can join multiple lists and even other data from within Access.
Access 2007 will save a report as a PDF or you can use a PDF printer adapter such as PDFCreator.

The easiest way to export a SharePoint list to PDF is to first export the list as an Excel file, then save the Excel file as a PDF document.

There is a third-party product that automates this.
i-PMO's "SharePoint Data Miner" can be used to create an RS report across any list data; you can then use their SharePoint site Report Viewer and Document Publisher to output the report as a PDF into a document library.
