I am looking for a tool or a way (.NET) to add custom XMP fields. Also, can someone explain the purpose of needing to know if the XMP tag is a textfield, textarea or a select?
XMP is written inside files as an XML packet or as a separate XML file. The XMP specification uses a subset of RDF/XML. So you could look at (RDF/)XML manipulation tools.
For embedded XPackets however, the packet length needs to be calculated and written at the start of the packet, so it may help to have a purpose built library. Adobe provides an XMP SDK (C++) for that.
XMP supports several content types for fields, like Text, Number or URL. Text fields, for example, could be restricted to values from a controlled vocabulary, for which it may make sense to use a select or dropdown form element in a GUI.
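To make the packet structure concrete, here is a minimal sketch in C# that builds an XMP packet containing one custom field with System.Xml.Linq. The custom namespace and the ProjectCode property are made-up examples, and the sketch only produces a standalone (sidecar) packet; it does not handle the xpacket wrappers and padding bookkeeping needed when embedding into an existing file.

using System;
using System.Xml.Linq;

class XmpSketch
{
    static void Main()
    {
        XNamespace x = "adobe:ns:meta/";
        XNamespace rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
        // Hypothetical custom schema namespace, for illustration only.
        XNamespace custom = "http://example.com/ns/custom/1.0/";

        var packet = new XElement(x + "xmpmeta",
            new XAttribute(XNamespace.Xmlns + "x", x),
            new XElement(rdf + "RDF",
                new XAttribute(XNamespace.Xmlns + "rdf", rdf),
                new XElement(rdf + "Description",
                    new XAttribute(rdf + "about", ""),
                    new XAttribute(XNamespace.Xmlns + "my", custom),
                    new XElement(custom + "ProjectCode", "ABC-123"))));

        // Written out as a sidecar XML file; embedding it in a host file would also
        // require the <?xpacket ... ?> processing instructions and padding management.
        Console.WriteLine(packet);
    }
}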
I am new to Solr, but I suppose that there is an easy way to index SVG files with Solr. I have installed Solr 6.3.0 and I am using an example 'files' core. It works well, but it seems that it parses the SVG files as plain text.
Is there an easy way to extract only the text between the <text> tags?
Ideally, I want to combine some meta data from a JSON file with the text from the SVG files. The JSON file looks like:
{
"id":"000001",
"title":"Some diagram",
...
} ...
The associated SVG file is 000001.svg. Is there a way to create a schema in Solr that can take the fields from the JSON and merge a field with the text from the SVG file?
The most flexible way that will do what you want is to write a custom indexing utility that parses your JSON, picks up the SVG and extracts the relevant elements, then submits the complete structure to Solr. Depending on your programming language of choice you'll do this with something like SolrJ, Solrnet or another client library.
This is way more flexible and maintainable than integrating it directly into Solr, but if you want to do custom SVG indexing (without the additional JSON), you could use the XSLT support in the regular update handler, or an XPathEntityProcessor in a DataImportHandler configuration.
My choice would be the custom indexing code.
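To make that concrete, here is a minimal sketch of that approach in C#: read the JSON metadata, pull the contents of the <text> elements out of the matching SVG, and post the merged document to Solr's JSON update handler over HTTP. The core URL, field names and file layout are assumptions based on the question, not anything Solr prescribes.

using System;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Text;
using System.Xml.Linq;
using Newtonsoft.Json.Linq;

class SvgIndexer
{
    static void Main()
    {
        // Assumed layout: metadata in 000001.json, drawing in 000001.svg.
        var meta = JObject.Parse(File.ReadAllText("000001.json"));

        // Collect the text content of every <text> element, ignoring the rest of the SVG.
        XNamespace svg = "http://www.w3.org/2000/svg";
        var svgDoc = XDocument.Load((string)meta["id"] + ".svg");
        string drawingText = string.Join(" ",
            svgDoc.Descendants(svg + "text").Select(t => t.Value.Trim()));

        // Merge metadata and extracted text into one Solr document.
        var solrDoc = new JObject
        {
            ["id"] = (string)meta["id"],
            ["title"] = (string)meta["title"],
            ["content"] = drawingText   // "content" is an example field name
        };

        // Post to the JSON update handler of the (assumed) "files" core.
        using (var http = new HttpClient())
        {
            var body = new StringContent("[" + solrDoc + "]", Encoding.UTF8, "application/json");
            var response = http.PostAsync(
                "http://localhost:8983/solr/files/update?commit=true", body).Result;
            Console.WriteLine(response.StatusCode);
        }
    }
}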
In LotusScript you can manipulate design elements - create them, change them, rename them, etc.
Are you able to do the same thing for XPage and Custom Control design elements?
====================================================================
My question should have been clearer. What I want to accomplish is to copy an existing cc and give it a new name, programmatically. The app will then close and reopen (or refresh or get rebuilt) so that the app can "see" the new cc. If I copy the cc it will only have one field on it. I will add custom code later. I could just create a new cc with no code in it, that would work too.
I am not familiar with the DXL exporter but I can research it. Using that can I just export the design of the cc to an XML file in a temp directory, use the transform to change the name, and then import the control?
I think the XPage or Custom Control design elements are probably under MISC_CODE or MISC_FORMAT design elements in a NoteCollection.
However, accessing that design element is the easy part. Doing a create / rename / change etc is a much bigger task.
Remember that the XPage or Custom Control XML file is only a starting point:
XPages and Custom Controls also have a .xsp.metadata file, as you'll see with source control.
Custom Controls will also have (and need) a .xsp-config file.
There are corresponding .java files for every XPage and Custom Control in the Local source folder. They're created by a builder based on parsing the XML. I don't think you'll be able to create those programmatically. I'm not sure of the impact of renaming them.
For Custom Controls, even if you can rename the .java file, it's referenced in the .java files of relevant XPages. Updating those is going to be a significant task.
The XPages runtime doesn't even use those .java files. Instead it uses the .class files in WebContent\WEB-INF (you need to use Project Explorer view and modify the filter to see those files). This is compiled byte code, so you won't be able to update the .class files for XPages containing renamed Custom Controls, as far as I know.
Even if you can rename the .class files, the XPages runtime almost certainly won't use them until either a Clean (which will overwrite anything you've done) or an HTTP restart. As far as I can tell they're cached.
Depending on your use cases, it's possible not all these points will be an issue, e.g. if you're modifying the XML files and building with headless designer.
I suspect this is why nothing was added to the NoteCollection object and no specific NotesXPage / NotesCustomControl API class was created.
In LotusScript you can manipulate design elements - create them, change them, rename them, etc.
This is only partially true. There is an LS API to create/alter views and outlines. Good luck with other design elements - although they're standard "notes" and you can access their items, in most cases you won't be able to compile them and there will be some problems with signatures (real experience with TeamStudio CIAO).
Your question can be read from two points of view - do you want to alter design elements during the design process, or alter a running application?
To help a designer you can go the route of Eclipse extensions and enrich the tools in IBM Designer to help the developer - something like TeamStudio Designer. In this case you need to look at the source design elements mentioned by Paul.
To enrich a running application you don't need to alter source design elements. IBM Designer transforms the XML source into Java code (the JSF framework) - so you can generate your Java code from anything you wish. Take a look inside the Local\xsp folder of the NSF in Package Explorer: you will find the Java sources generated from your XPages and Custom Controls. So if you don't need to work with design elements, go for Java components - they can be built on the fly.
And of course, there is always the option of DXL framework - so you can clone/alter design of the application through XML transformations. Good starting point: http://www-10.lotus.com/ldd/ddwiki.nsf/dx/ls-design-programming.htm
With the Windows 8.1 release there are some new and changed APIs. One addition is a feature called "XAML Binary Format" (XBF), which improves the performance of rendering on screen. The XamlBinaryWriter class is responsible for converting to XAML Binary Format, and all XAML files get converted to XBF. Has anyone tried converting an XBF file back into a XAML file? I have a dependency on the XAML file and cannot proceed without it in XAML format. Please let me know how to convert XBF to a XAML file.
As a starting point, download and install Microsoft's .NET Native; the ReducerEngine.dll installed as part of it includes a primitive implementation of a decompiler.
However, Microsoft's implementation is very poor: it doesn't even support XAML namespaces. You can use Microsoft's implementation to learn the structure of an XBF file, but for decompiling I suggest you implement your own solution. It's not that hard; mine is about 1000 lines of code across 12 C# files.
XBF files are rather simple. They contain a fixed header, followed by 6 lookup tables (strings, assemblies, type namespaces, types, properties, XML namespaces), followed by the DOM tree part, where objects reference values from those tables by integer keys.
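To illustrate that shape (the in-memory model, not the exact byte layout, which you will have to work out from Microsoft's implementation), a decompiler can be built around something like the following sketch. All type and field names here are my own, and real XBF property values can be richer than plain strings; the point is just the six lookup tables plus nodes that reference them by integer index.

using System.Collections.Generic;

// Rough in-memory model of a parsed XBF file: six lookup tables plus a node tree
// whose entries refer back into the tables by integer index.
class XbfFile
{
    public List<string> Strings = new List<string>();
    public List<string> Assemblies = new List<string>();
    public List<string> TypeNamespaces = new List<string>();
    public List<string> Types = new List<string>();
    public List<string> Properties = new List<string>();
    public List<string> XmlNamespaces = new List<string>();
    public XbfNode Root;
}

class XbfNode
{
    public int TypeIndex;                          // index into Types
    public Dictionary<int, int> PropertyValues =   // property index -> string index
        new Dictionary<int, int>();
    public List<XbfNode> Children = new List<XbfNode>();
}

class XamlWriterSketch
{
    // Turning the node tree back into XAML is then a straightforward recursive walk.
    public static string ToXaml(XbfFile file, XbfNode node, int depth = 0)
    {
        var indent = new string(' ', depth * 2);
        var sb = new System.Text.StringBuilder();
        sb.Append(indent).Append('<').Append(file.Types[node.TypeIndex]);
        foreach (var pair in node.PropertyValues)
            sb.Append(' ').Append(file.Properties[pair.Key])
              .Append("=\"").Append(file.Strings[pair.Value]).Append('"');
        sb.AppendLine(">");
        foreach (var child in node.Children)
            sb.Append(ToXaml(file, child, depth + 1));
        sb.Append(indent).Append("</").Append(file.Types[node.TypeIndex]).AppendLine(">");
        return sb.ToString();
    }
}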
P.S. The most interesting question I have about this is why Microsoft chose to reinvent the wheel instead of using their .NET Binary XML format, or a subset of it. They have had a binary XML implementation for many years, and technically it's a better format than XBF.
I am new to this topic, but my requirement is to parse documents of different types (HTML, PDF, TXT) using a crawler. Please suggest which crawler to use for my requirement, and point me to some tutorials or examples of how to parse documents with a crawler.
Thank you.
This is a very broad question, so my answer is also very broad and only touches the surface.
It all comes down to two steps, (1) extracting the data from its source, and (2) matching and parsing the relevant data.
1a. Extracting data from the web
There are many ways to scrape data from the web. Different strategies can be used depending on whether the source is static or dynamic.
If the data is on static pages, you can download the HTML source for all the pages (automated, not manually) and then extract the data out of the HTML source. Downloading the HTML source can be done with many different tools (in many different languages), even a simple wget or curl will do.
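If you would rather do the download from code than shell out to wget or curl, a minimal C# sketch (the URL and output file name are just placeholders) could look like this:

using System;
using System.IO;
using System.Net.Http;

class PageDownloader
{
    static void Main()
    {
        // Placeholder URL; in practice you would loop over the list of pages you want.
        var url = "http://example.com/page1.html";

        using (var http = new HttpClient())
        {
            string html = http.GetStringAsync(url).Result;
            // Save the raw source so the parsing step can run separately later.
            File.WriteAllText("page1.html", html);
        }
    }
}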
If the data is on a dynamic page (for example, if the data is behind some forms that you need to do a database query to view it) then a good strategy is to use an automated web scraping or testing tool. There are many of these.
See this list of Automated Data Collection resources [1]. If you use such a tool, you can extract the data right away, you usually don't have the intermediate step of explicitly saving the HTML source to disk and then parsing it afterwards.
1b. Extracting data from PDF
Try Tabula first. It's an open source web application that lets you visually extract tabular data from PDFs.
If your PDF doesn't have its data neatly structured in simple tables or you have way too much data for Tabula to be feasible, then I recommend using the *NIX command-line tool pdftotext for converting Portable Document Format (PDF) files to plain text.
Use the command man pdftotext to see the manual page for the tool. One useful option is the -layout option which tries to preserve the original layout in the text output. The default option is to "undo" the physical layout of the document, and instead output the text in reading order.
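If you need to drive pdftotext from a program rather than from the shell, wrapping the call is straightforward; this sketch assumes pdftotext is installed and on the PATH, and that input.pdf exists.

using System.Diagnostics;

class PdfToTextWrapper
{
    static void Main()
    {
        // -layout tries to preserve the original physical layout of the page.
        var process = Process.Start(new ProcessStartInfo
        {
            FileName = "pdftotext",
            Arguments = "-layout input.pdf output.txt",
            UseShellExecute = false
        });
        process.WaitForExit();
    }
}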
1c. Extracting data from spreadsheet
Try xls2text for converting to text.
2. Parsing the (HTML/text) data
For parsing the data, there are also many options. For example, you can use a combination of grep and sed, or the BeautifulSoup Python library if you're dealing with HTML source, but don't limit yourself to these options; you can use any language or tool that you're familiar with.
When you're parsing and extracting the data, you're essentially doing pattern matching.
Look for unique patterns that make it easy to isolate the data you're after.
One method of course is regular expressions. Say I want to extract email addresses from a text file named file.
egrep -io "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b" file
The above command will print the email addresses [2]. If you instead want to save them to a file, append > filename to the end of the command.
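If it fits your pipeline better, the same pattern matching can be done in code; here is a rough C# equivalent of the command above (the input file name is the same example as before).

using System;
using System.IO;
using System.Text.RegularExpressions;

class EmailExtractor
{
    static void Main()
    {
        // Same pattern as the egrep example, matched case-insensitively.
        var pattern = new Regex(@"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b",
                                RegexOptions.IgnoreCase);

        foreach (Match match in pattern.Matches(File.ReadAllText("file")))
            Console.WriteLine(match.Value);
    }
}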
[1] Note that this list is not an exhaustive list. It's missing many options.
[2] This regular expression isn't bulletproof, there are some extreme cases it will not cover.
Alternatively, you can use a script that I've created which is much better for extracting email addresses from text files. It's more accurate at finding email addresses, easier to use, and you can pass it multiple files at once. You can access it here: https://gist.github.com/dideler/5219706
At the moment, we use MS Word and MS Excel to mail merge documents that need to be sent to multiple recipients.
For example, say there is a complaint form where the complainant needs to fill in his/her name, address, etc. So we have a .doc file set up with the content and the dynamic entities set up for mail merging, with the name and address details kept in an Excel file, from where we can happily mail merge to generate all or just the necessary forms/documents.
However, I would like to automate this process, like a form on a website where the complainant can fill in his/her name, address and other details, and we could use that to generate the complaint form automatically and offer it for download (preferably as a PDF).
Now, the only solution that comes to mind is LaTeX, so that I can just replace the needed entities and compile to PDF. However, that bit has to be negotiated with the webhost, depending on whether they offer LaTeX or not.
Is there any other solution? Any other way we could get this done, with something that shouldn't be a problem for most webhosting solutions to offer?
EDIT: I would prefer a non-.NET, or rather non-Microsoft, solution since the servers are running Linux, and while Mono might be capable of getting the job done, none of our devs know any .NET languages. However, if required, we might have to delve into it.
Generate the PDF using an XSL transform. Check the following: Apoc XSL-FO.
You will need to create an XML file with the required fields and transform that with this tool.
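The transform step itself is processor-agnostic. As a rough sketch of its shape (shown here with .NET's built-in XslCompiledTransform only because it is short, and with made-up file names), you would render the submitted form data to XML, transform it into XSL-FO, and hand the .fo output to whichever FO processor you choose to produce the PDF:

using System.Xml.Xsl;

class FoTransformSketch
{
    static void Main()
    {
        // complaint-data.xml holds the submitted form fields;
        // complaint-form.xsl maps them onto an XSL-FO page layout.
        var xslt = new XslCompiledTransform();
        xslt.Load("complaint-form.xsl");
        xslt.Transform("complaint-data.xml", "complaint-form.fo");

        // The resulting .fo file is then fed to the FO processor
        // (Apoc XSL-FO, FOray, Apache FOP, ...) to produce the PDF.
    }
}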
If you wish to avoid .NET then XSL-FO is worth a look. Try the FOray project.
XSLT can be a steep learning curve if you do not have experience with it already. Also, users will not be able to change the templates without asking the XSLT guru to do it.
If your templates are already in MS Word and MS Excel then I would stick with generating MS docs on the server. These are now easy to work with from code since OpenXML - check out OfficeOpenXML and OpenXMLDeveloper
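To give a feel for what that looks like, here is a minimal sketch using the Open XML SDK that fills a placeholder in a Word template. The template name and the {{NAME}} placeholder convention are assumptions of mine, and a real implementation has to cope with placeholders that Word has split across several runs.

using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

class FormFiller
{
    static void Main()
    {
        // Hypothetical template containing the literal placeholder {{NAME}}.
        using (var doc = WordprocessingDocument.Open("complaint-template.docx", true))
        {
            foreach (var text in doc.MainDocumentPart.Document.Body.Descendants<Text>())
            {
                // Note: Word sometimes splits a placeholder across several runs,
                // so a production version needs smarter matching than this.
                text.Text = text.Text.Replace("{{NAME}}", "Jane Complainant");
            }
            doc.MainDocumentPart.Document.Save();
        }
    }
}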
Apache FOP : http://xmlgraphics.apache.org/fop/
I suggest generating RTF on the server: it's easy enough to generate automatically using CPAN's RTF::Writer, there are converters that produce good PDF from it, it can be edited by hand in Word, OO Writer & TextEdit, it doesn't have any really bad compatibility issues between the main editing applications, and it has decent text & resource extraction tools, with text extraction being rather better than it is for PDF.
There's some support for moving between RTF & LaTeX, although the best RTF -> LaTeX converter, docx2tex, depends on the System.IO.Packaging .NET module, whose Mono implementation isn't yet rock solid.
Postscript - not a recommendation: it's too much of an unwieldy sledgehammer for this job, but iText will generate the PDF directly from the form data. If you wanted to do fancy things like signed PDFs, that would be the way to go.
Postscript #2 - if you break up the Word document into individual files using Word's master document representation, then you can clobber one of the parts with hand-generated content. This makes it easy to do something approximating form-filling on Word .doc files using just standard file utils and some trivial rtf -> doc tweaking.