How can I push Lotus Notes documents with rich text fields and file attachments to a MySQL database? - lotus-notes

We are using v9.0.1 and have one application that we must preserve for the time being. I am hoping to expose this database's documents with all content to a MySQL database for which we will develop a web front-end to present the records read-only. I know I can export the data to text files or to XML, but I want to retain the formatting of the rich text fields and the files that have been added to them (there are several per document). I've emailed the documents in an agent to a document management system, but the content of the rich text fields is stripped of any formatting and all the data I output is presented in a plain .txt file, but the attachments are preserved. I am hoping to provide a more elegant solution.
Any ideas will be appreciated.
Ginni

The Domino router can do rich text to HTML conversion, so if set up properly you won't lose all your formatting -- but you still won't get great fidelity. I used that technique in an archiving product that I built 15 years ago, but it was only "good-enough" fidelity, and it was only for Notes email messages. There would be more to do if you're dealing with application documents. It's been too long, so I can't recall any specific details.
If you can spend money on this, the migration tool from Genii Software will help. Look up their AppsFidelity Migrate product. There may be some other third-party products that you could work with, but this one comes from a company that has been 100% dedicated to dealing with Notes rich text for decades. If you call them, tell Ben that I sent you.

Related

Extract Keywords from Office Documents with Sharepoint Flow

I am trying to implement a document management system using Sharepoint. One major issue is that colleagues cannot find documents in the current setup (local fileserver). They have asked that we have a system that scans uploaded documents and automatically looks for keywords in them and then populates a "Meta" column.
I have had sort of success with OCR on image files, but getting keywords out of office documents (doc, xls etc.) I have had no success until now.
Is there a way to setup a flow to do this task for me?
any help is much aprechiated.
i tried "Get file metadata" and Azure "Text analysis", but it seems to take the raw data of the files (XML I assume) and returns that the document is to large to analyse.
There is something vague about this requirement - how is a keyword defined in a document?
Therefore, first obvious solution would be to assign keywords for each file upon uploading it. You may create a process for this with flow - have tasks, reminders and so on.
Automating this with OCR first means that you need to user OCR that works with MS flow you have only one choice - ElasticOCR. Then, in your flow
- feed the document content to the ElasticOCR action
- keep in mind that OCR is not 100% accurate
- analyze the generated text content according to your keyword definition
- finally write the meta back to the library in the corresponding columns.
Having worked on a similar requirement, we asked uploaders to publish their documents with a short abstract(column from the content type). The assumption is the abstract contains the keywords and is stored in a multi-line column - making it searchable site wide.

Convert IBM Lotus Notes file to text

How can I convert an .nsf lotus file to a text file? i want to write a java program to read .nsf file which is on my system. i have tried it simply but it is showing non english character is their any way to get access them normaly.
EDIT:
That code is in .net and using any server's session, I just want to read .nsf file by java without creating any server's session in fact i have .nsf database. i just want to read as a text file.if there a way to parse .nsf with javacc,it would be better enough......
lotus notes database is full of proprietary design components. Assuming you just want to export the data, you will need to write an agent, (aka batch process), that would look at all documents in the database and then export all the fields into a plain text file.
If you're into XML, you can export data in that format as well, but again, you will need to write an agent for that.
Alternatively, there are some basic builtin mechanisms in Lotus Notes to export data but this is restricted to running them from views. Views do no necessarily get all the documents. You can design a view to do that though.
Providing the size of the database is not extremely large, (less than 200k documents), you can create view listing all the columns you want to export and a view formula that has "Select #All", will give you all documents. Then, the "quickest" way to get data out from a view can be found here using simple export procedure.
There is still the issues of exporting rich media, you can have a look here for that.
You can export some data easily by selecting all the documents in a view, and then going to Edit > Copy Selected As Table. Then just paste the content into Excel or a text file.
To access the data beyond what is showing in a view, you can try a few other things:
Connect to the data using the NotesSQL driver
Connect via the COM api (using Java, C#, VB, etc)
Create some views in Lotus Notes that DO have the data you need, and use my recommendation above.
Install and and my Export to Excel application

Exporting data from Lotus Notes: C# interop, C, Java or LotusScript?

I'm about to export a lot of data from a Lotus Notes db, and I'm wondering if anyone can shed any light on how exactly I can move forward on this point.
Notes has some views (lists with custom templates?) of some kind - are these saved in .nsf files on the Domino server, or are the .nsf files for email only?
If the .nsf files are actually the database files, what would be the best language / development pack to use to pull data from them?
If you need full-time synchronization between an existing Notes infrastructure and a RDBMS, LEI (Lotus Enterprise Integrator) or a third-party tool like Notrix would be your best bet -- it's as simple as defining a job and a schedule/trigger to run it. If you need to occasionally pull (or push) a subset of the data, then NotesSQL is probably the easiest approach. If you're not afraid of learning the structure of the NSF (Notes Storage Facility), then the LotusScript/COM API or the Java/CORBA API would give you finer-grained control.
If what you really need is a one-time dump of everything, then exporting all of the data notes to DXL (Domino XML) would give you the most complete version of the data you're going to get, and in a way that would let you recover and convert formatted Notes Rich Text, file attachments, and so on in a way that would be incredibly difficult to achieve otherwise. DXL is verbose, so don't say I didn't warn you, but it is pretty comprehensive as well. (The DOmino Designer Help entry on the NotesDXLExporter class has example code that is exactly on point.)
It all depends on what language you're familiar with.
If you know LotusScript well, then that would be my first choice since it's the most integrated with the platform.
If you don't know LotusScript that well, but you know C#/Java/C really well...then you shouldn't have any trouble using any of those APIs (and they should all be able to get the job done equally as well).
In Lotus Notes Domino all the data is stored in the .nsf files. This is true for all Notes databases, not just email. The data is all stored in documents which are basically collections of named fields containing values. The views are simply ways of indexing and displaying collections of documents based of specific criteria. The views can also calculate values based on the value of a field in the documents.
The Notes LotusScript and Java APIs are essentially identical and would be the simplest way to programmatically access the data. The C API is much lower level and probably overkill for this kind of thing.
You could look at NotesSQL, if you want to create an ODBC connection to an NSF file to pull data into SQL or Access. If all the data is contained within the view you could simply select all the documents and click Edit > Copy Selected As Table and paste into Excel.
To answer your other questions: Notes views are similar to SQL views - essentially a query on the data stored within the NSF. NSF files contain both the data and the structure of the application in one file.

Append a dynamically changing watermark to a PDF in SharePoint

This is primarily a question of possibilities more than instructions. I'm a programming consultant working on a WSS project site system for my client. We have a document library in which files are uploaded to go through a complex approval process. With multiple stages in this process, we have an extra field which dictates what the current status of the document is.
Now, my client has become enamored with the idea of PDF watermarking. He wants the document (which is already a PDF) to be affixed with a watermark corresponding to the current status, such that with each stage of the approval process the watermark will change.
One method, the traditional method for PDF watermarking, of accomplishing this is to have one "clean" copy of the document somewhere hidden on the site, and create a new PDF from it that has the watermark at each stage of the approval process. Since the filename will never change, this new PDF can be uploaded continually to a public library, always overwriting the old version and simulating a "dynamically changing watermark". However, in the various stages there will also be people uploading clean copies with corrections and suggestions, nevermind the complex nature of juggling around two libraries and the fact we double the number of files stored. My client and I agree that this is not a practical path to choose.
What we would like to do is be able to "modify" the watermark in a PDF, so that we only have to keep one copy of the file. Unfortunately, from what I've seen, in most cases when you make something like a watermark, which in its nature is supposed to be "unmodifyable", you won't be able to edit it later. So, is it possible to have a part of a PDF which cannot be changed by anyone who downloads the file, but can be changed as part of a workflow or other object model process?
PDF Watermarking in SharePoint is a common request. I have written extensively on this topic. See:
Adding a dynamic watermark to a PDF file from a SharePoint Workflow
Adding a (static) watermark to a PDF file from a SharePoint Workflow
Use SharePoint Workflows to inject JavaScript into PDFs and print the ‘open date’
You could use Event Handlers such that code was run every time a document was checked in. In that code you could perform the fixup/check that made the watermark be what you wanted it to be. This assumes you can write code that manipulates a PDF's internal structure such that it has the watermark that you desire.
It sounds to me like you want to allow people to modify the PDF they download, but not modify its watermark. This is probably going to be nigh on impossible if the watermark is embedded in the PDF (afaict) but what if the watermark image is external to the PDF; is it possible to embed a watermark in a PDF that is sourced via HTTP? Then you could embed:
<watermark image="http://sharepoint/site/_vti_bin/docstatus.asmx?id=5">
Of course, I have no idea about PDFs, so this might not be possible but you get the concept.
-Oisin
It is possible to do so if you use third party tool. Then you can put dynamically binded value from your SharePoint metadata, conditions, rules etc: http://www.pdfsharepoint.com

How to generate application forms/documents programmatically?

At the moment, we use MS WORD and MS EXCEL to mail merge documents that needs to be sent to multiple recepients.
For example, say there is a complaint form where the complainant needs to fill in his/her name, address, etc. So we have a .doc file set up with the content and the dynamic entities set up for mail merging, with the name and address details put in an excel file, from where we can happily mail merge to generate all or just the necessary forms/documents.
However, I would like to automate this process, like a form in a website where the complainant can fill in his/her name, address and other details, and we could use that to generate the complaint form automatically and offer it to be downloaded (preferrably as a pdf).
Now, the only solution that comes to mind, is Latex, so that I can just replace the needed entities and just compile to PDF. However, that bit has to be negotiated with the webhost, if they are offering Latex or not.
Is there any other solution? Any other way we could get this done, with something that shouldn't be a problem for most webhosting solutions to offer?
EDIT: I would prefer a non .NET or rather non microsoft solution since, the servers are running linux and while mono might be capable of getting the job done, none of our devs know any .NET languages. However, if required we might have to dwelve into it.
Generating PDF using an XSL. Check the following: Apoc XSL-FO
You will need to create an XML file with the required fields and transform that with this tool.
If you wish to avoid .NET then XSL-FO is worth a look. Try the FOray project.
XSLT can be a steep learn if you do not have experience already. Also users will not be able to change the templates without asking the XSLT guru to do it.
If your templates are already in MS Word and MS Excel then I would stick with generating MS docs on the server. These are now easy to work with from code since OpenXML - check out OfficeOpenXML and OpenXMLDeveloper
Apache FOP : http://xmlgraphics.apache.org/fop/
I suggest generating rtf on the server: it's easy enough to automatically generate using cpan's RTF::Writer, has converters generating good pdf, can be edited by hand in word, oo-writer & TextEdit, doesn't have any really bad compatibility issues between the main editing applications, and has decent text & resource extraction tools, with text extraction being rather better than pdf.
There's some support for moving between rtf & latex, although the best rtf -> latex converter, docx2tex, depends on the System.IO.Packaging .net module, whose mono implementation isn't yet rock solid.
Postscript — Not a recommendation: it's too much of an unwieldy sledgehammer for this job, but iText will generate the pdf directly from the form data. If you wanted to do fancy things like signed pdf, that would be the way to go.
Postscript #2 — If you break up the Word document into individual files using word's master document representation, then you can clobber one of the parts with hand-generated content. This makes it easy to do something approximating form-filling on word .doc files using just standard file-utils and some trivial rtf->doc tweaking.

Resources