Crafter CMS Search Attachment With External Metadata Post Processor in repeating Group - crafter-cms

I have the following data model.
A Page type with a repeating group named files. Inside the repeating group there is a node-selector, named file, for selecting files.
Then I need to index the metadata of the page with the metadata of the file in order to do a search by files.
To accomplish this I'm using org.craftercms.cstudio.publishing.processor.SearchAttachmentWithExternalMetadataPostProcessor
The first file I attach is indexed correctly, but none of the other files are indexed with the metadata of the page.
This is the reference list I'm using:
<property name="referenceXpathList">
    <list>
        <value>//file/item/value</value>
    </list>
</property>
Even though my XPath expression should match every file/item/value node, I only get the first match.

SearchAttachmentWithExternalMetadataPostProcessor expects each XML document to have just one associated binary file. In most cases that makes sense, because the XML document contains metadata that is specific to that file. So if the XPath returns several nodes, the processor selects only the first one. You can always extend the processor so that the same XML metadata is associated with several files.
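To make the difference concrete, here is a minimal Python sketch (not Crafter code) that evaluates the same XPath against a made-up page descriptor. The XML layout, the file paths and the helper name matched_references are assumptions for illustration only; the point is that the expression does match every file, and that keeping only the first match gives the behaviour described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical page descriptor: a repeating group "files" whose items
# each contain a node-selector named "file". Paths are invented.
PAGE_XML = """
<page>
  <title>Annual report</title>
  <files>
    <item>
      <file><item><value>/static-assets/docs/a.pdf</value></item></file>
    </item>
    <item>
      <file><item><value>/static-assets/docs/b.pdf</value></item></file>
    </item>
  </files>
</page>
"""


def matched_references(xml_text, xpath=".//file/item/value"):
    """Return the text of every node matched by the XPath, in document order."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.findall(xpath)]


refs = matched_references(PAGE_XML)

# Keeping only the first match mirrors the one-file-per-document assumption.
first_only = refs[:1]

# An extended processor would instead loop over every match and attach
# the same page metadata to each referenced file.
for ref in refs:
    print("would index", ref, "with the page metadata")
```

An extended processor would do the equivalent of the final loop in Java, inside a subclass of the stock processor.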

Related

Flatten File + Component + Page Metadata into one

I'm working with Crafter 2.5.10 with the following content model
Pages -> Includes Files and Components (File Component)
File Component -> Also includes other files
I need to perform a search (Solr query) for one or more keywords against the files' metadata. For example, if I search for "Potato" and I have a PDF file with the word "Potato" inside it, that file should come back as a match.
When I get the result of that query, I need the information of the page where the file is located (included).
Using the SearchAttachmentWithExternalMetadataPostProcessor I'm able to get the metadata of the files that are included directly in the page. But for the files that are included in File Components, I only get the information that comes from the component that includes the file.
Is there a way to merge the metadata of the file + the parent component + the parent page?
If you want something like page XML + component XML associated to file + file content itself in a single Solr document, it's not possible. There is no access to the extracted file content when indexing: the extraction is done by Solr and is completely separate from page indexing.
I think you basically have 2 options: search for the page associated with the component/file after doing the first query, or create a processor that adds some of the page metadata when indexing the component/file.
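The first option amounts to a reverse lookup after the file search. The Python sketch below only illustrates that idea with an in-memory include graph. The paths, the INCLUDES mapping and the helper names are hypothetical, and treating everything under /site/website/ as a page is an assumption about the site layout. In a real setup the graph would come from a second query or from the content store.

```python
# Hypothetical include graph: each item mapped to the items it includes.
INCLUDES = {
    "/site/website/products/index.xml": [
        "/static-assets/docs/menu.pdf",
        "/site/components/files/recipes.xml",
    ],
    "/site/components/files/recipes.xml": [
        "/static-assets/docs/potato.pdf",
    ],
}


def build_parent_index(includes):
    """Invert the include graph: child path -> set of parents that include it."""
    parents = {}
    for parent, children in includes.items():
        for child in children:
            parents.setdefault(child, set()).add(parent)
    return parents


def is_page(path):
    # Assumption: pages live under /site/website/.
    return path.startswith("/site/website/")


def owning_pages(path, parents):
    """Walk up through components until pages are reached."""
    found, seen, stack = set(), set(), [path]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        for parent in parents.get(current, ()):
            if is_page(parent):
                found.add(parent)
            else:
                stack.append(parent)  # a component: keep climbing
    return found


PARENTS = build_parent_index(INCLUDES)
pages_for_hit = owning_pages("/static-assets/docs/potato.pdf", PARENTS)
```

Here pages_for_hit contains the page that includes the component holding the matched PDF, which is the extra lookup the first option calls for.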

Array of attachment type - how to get a filename for highlighted fragment?

I use ElasticSearch to index resources. I create a document for each indexed resource. Each resource can contain meta-data and an array of binary files. I decided to handle these binary files with the attachment type. Meta-data is mapped to simple fields of string type. Binary files are mapped to an array field of attachment type (a field named attachments). Everything works fine: I can find my resources based on the contents of the binary files.
Another ElasticSearch feature I use is highlighting. I managed to configure highlighting for both meta-data and binary files, but...
When I ask for highlighted fragments of my attachments field, I only get fragments of these files, without any information about the source of each fragment (there are many files in the attachments array field). I need a mapping between each highlighted fragment and an element of the attachments array, for instance the name of the file or at least its index in the array.
What I get:
"attachments" => ["Fragment <em>number</em> one", "Fragment <em>number</em> two"]
What I need:
"attachments" => [("file_one.pdf", "Fragment <em>number</em> one"), ("file_two.pdf", "Fragment <em>number</em> two")]
Without such a mapping, the user of the application knows that a particular resource contains files with the keyword, but has no indication of which file it is.
Is it possible to achieve what I need using ElasticSearch? How?
Thanks in advance.
So what you want here is to store the filename.
Did you send the filename in your json document? Something like:
{
  "my_attachment" : {
    "_content_type" : "application/pdf",
    "_name" : "resource/name/of/my.pdf",
    "content" : "... base64 encoded attachment ..."
  }
}
If so, you can probably ask for field my_attachment._name.
If that's not the right answer, can you refine your question a little and give a sample JSON document (without the base64 content) and your mapping, if any?
UPDATE:
When the fragment comes from an array of attachments, you can't tell which file it came from, because everything is flattened behind the scenes. If you really need that, you may want to have a look at nested fields instead.
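The flattening can be illustrated with a simplified Python model. This is not Elasticsearch code: the function names and the sample data are invented, and the real attachment type does more work, but the sketch shows why a match on the content can no longer be paired with a file name once an array of objects is stored as one multi-valued list per field, and why keeping each element as its own nested unit preserves the pairing.

```python
def flatten_object_array(field, items):
    """Model of flattening: one multi-valued list per sub-field,
    with the link between values of the same element lost."""
    flat = {}
    for item in items:
        for key, value in item.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


def search_flat(flat, field, term):
    """Matches in the flattened form: fragments only, no owning element."""
    return [text for text in flat.get(f"{field}.content", []) if term in text]


def search_nested(items, term):
    """Matches in the nested form: each element is searched on its own,
    so the name travels with the fragment."""
    return [(item["_name"], item["content"]) for item in items if term in item["content"]]


attachments = [
    {"_name": "file_one.pdf", "content": "Fragment number one"},
    {"_name": "file_two.pdf", "content": "Fragment number two"},
    {"_name": "file_three.pdf", "content": "Nothing relevant here"},
]

flat = flatten_object_array("attachments", attachments)
flat_hits = search_flat(flat, "attachments", "number")
nested_hits = search_nested(attachments, "number")
```

In this model flat_hits holds only the two fragments, while nested_hits holds (name, fragment) pairs, which is the shape asked for in the question.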

Custom Metadatafield in Document and Web content in Liferay

I want a metadata field whose values come from a database record. This metadata field should be added to a document.
Can anyone suggest a way to do this?
I presume you are using Liferay 6.1.
Web Content Structures
As for Web Content, you could programmatically create a JournalStructure (see JournalStructureLocalServiceUtil) and populate the list of possible values for your structure field with values from the database. You can put this "import code" inside a batch job, so that your structure field and the values in the external database are always in sync.
Document Metadata
How to do this with Metadata Sets is probably more interesting: in Liferay 6.1 they are used by Dynamic Data Lists and Documents & Media, and as of 6.2 Web Content structures will use the same metadata API instead of the old Journal API.
To implement this, check out the xsd column of the DDMStructure table. It has more or less the same format as the XML for a JournalStructure, but with more options available. Use DDMStructureLocalServiceUtil#addStructure to add the new structure. Again, run this inside a batch job so that you always have the latest external DB values.

XML Schema: How to validate an attribute with multiple keys concatenated?

Let's say I can get XML like this:
<Property Name="Title"/>
<Property Name="Content"/>
<Property Name="Address"/>
<Source properties="Title,Content,Address"/>
How could I validate the "properties" attribute of "Source", so that any combination of the "Property" items listed above is accepted? (For example, "Title" and "Title,Content" are both correct, while "Title, URL" is not.)
You can't do that within XML Schema. You can do it with your own higher level of validation based on XSLT, XQuery or Schematron, for example.
xan is right: validating always means matching an XML file against a given schema. But there is no schema involved here. Your problem is instead to read a data file and validate later entries against earlier ones (if the box above is supposed to represent one file), or one data file against another (if the gap is supposed to be a file separator). Beyond that, a schema defines the structure of elements and attributes and, optionally, data types (specific values only if there is a strict enumeration of valid values). That does not apply here either: you want to verify data against data. Sorry, a schema is the wrong tool for this problem.
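As a rough illustration of such a higher-level check, here is a small Python sketch rather than XSLT, XQuery or Schematron. It assumes the fragment from the question sits under a single wrapper element (called Root here, a made-up name) and that whitespace around the commas is tolerated. Neither assumption comes from the question.

```python
import xml.etree.ElementTree as ET

# The sample from the question, wrapped in a hypothetical <Root> element
# so that it parses as a single document.
SAMPLE = """
<Root>
  <Property Name="Title"/>
  <Property Name="Content"/>
  <Property Name="Address"/>
  <Source properties="Title,Content,Address"/>
</Root>
"""


def invalid_source_properties(xml_text):
    """Return every name listed in a Source/@properties attribute that is
    not declared by some Property/@Name in the same document."""
    root = ET.fromstring(xml_text)
    declared = {prop.get("Name") for prop in root.iter("Property")}
    problems = []
    for source in root.iter("Source"):
        for name in source.get("properties", "").split(","):
            name = name.strip()  # assumption: surrounding whitespace is ignored
            if name not in declared:
                problems.append(name)
    return problems


ok = invalid_source_properties(SAMPLE)
bad = invalid_source_properties(SAMPLE.replace("Title,Content,Address", "Title, URL"))
```

An empty result means every listed name refers to a declared Property. For the "Title, URL" case the check reports URL, because no Property with that name exists.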

Redefine folder structure of document library with metadata

I have a problem with my SharePoint document library structure. Currently the document library consists of a folder and sub-folder structure that stores documents by category. Now our client wants to replace this folder structure with a metadata structure.
Can anyone tell me how I can use metadata instead of the folder and sub-folder structure?
Any related articles or links will be appreciated.
Thanks
Sachin
As already stated, you need to use columns for the metadata, preferably through a new Content Type. After creating this Content Type, you need to attach it to the library and convert all documents to it. Lastly, you also need to modify the views of the library, e.g. depending on your metadata you might only want to display certain columns or filter them.
There is an excellent whitepaper from Microsoft on Content Types available here:
http://technet.microsoft.com/en-us/library/cc262729.aspx
You can also read more about content type planning on Technet:
http://technet.microsoft.com/en-us/library/cc262735.aspx
And here's some info about Views:
http://office.microsoft.com/en-us/sharepointtechnology/HA100215771033.aspx
You must define columns for the metadata fields you want to have, create a content type that includes these columns, and assign this content type to your documents.
You might also change the default view of your document library, or create a new view, to make the new metadata columns visible.

Resources