Flatten File + Component + Page Metadata into one - crafter-cms

I'm working with Crafter 2.5.10 with the following content model:
Pages -> include Files and Components (File Components)
File Components -> also include other files
I need to perform a search (Solr query) for one or more keywords against files and their metadata. For example, if I search for "Potato" and I have a PDF file with the word "Potato" inside it, that file should come back as a match.
When I get the results of that query, I need the information of the page where the file is located (included).
Using the SearchAttachmentWithExternalMetadataPostProcessor I'm able to get the metadata of the files that are included directly in the page. But for the files that are included in File Components, I only get the information coming from the component that includes the file.
Is there a way to merge the metadata of the file + the parent component + the parent page?

If you want something like page XML + component XML associated with the file + the file content itself in a single Solr document, it's not possible: there's no access to the extracted file content at indexing time, because the extraction is done by Solr and is completely separate from page indexing.
I think you basically have 2 options: search for the page associated with the component/file after running the first query, or create a processor that adds some of the page metadata when indexing the component/file.
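Both options come down to walking the inclusion chain from the matched file back up to the page. The sketch below is plain Python, not Crafter code: INCLUDES and METADATA are hypothetical stand-ins for the references and fields you would read from the page and component XML, and the prefixed keys (page., component1., file.) are just one possible naming scheme for the flattened document.

```python
from typing import Dict, List

# Hypothetical inclusion graph: item path -> paths it includes directly.
# In a real site these references would be read from the page/component XML.
INCLUDES: Dict[str, List[str]] = {
    "/site/website/products/index.xml": [
        "/site/components/file-component/brochures.xml",
        "/static-assets/docs/menu.pdf",
    ],
    "/site/components/file-component/brochures.xml": [
        "/static-assets/docs/potato.pdf",
    ],
}

# Hypothetical metadata for each item (e.g. fields from its own index document).
METADATA: Dict[str, Dict[str, str]] = {
    "/site/website/products/index.xml": {"title": "Products"},
    "/site/components/file-component/brochures.xml": {"title": "Brochures"},
    "/static-assets/docs/potato.pdf": {"title": "Potato facts"},
}


def chains_to(path: str) -> List[List[str]]:
    """Return every inclusion chain [top-level item, ..., path] ending at `path`.

    Assumes the inclusion graph has no cycles.
    """
    parents = [p for p, children in INCLUDES.items() if path in children]
    if not parents:
        return [[path]]  # nothing includes it: it is the top of the chain
    return [chain + [path] for parent in parents for chain in chains_to(parent)]


def flatten(chain: List[str]) -> Dict[str, str]:
    """Merge metadata along one chain into a single flat dict with prefixed keys."""
    merged: Dict[str, str] = {}
    for depth, item in enumerate(chain):
        if depth == len(chain) - 1:
            prefix = "file"
        elif depth == 0:
            prefix = "page"
        else:
            prefix = f"component{depth}"
        merged[f"{prefix}.path"] = item
        for key, value in METADATA.get(item, {}).items():
            merged[f"{prefix}.{key}"] = value
    return merged
```

With the first option, chains_to would run for each hit of the initial query; with the second, a custom processor would add the flattened fields to the file's document at indexing time.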

Related

Crafter CMS Search Attachment With External Metadata Post Processor in repeating Group

I have the following data model.
A Page type with a repeating group named files. Inside the repeating group there is a node-selector named file to select files.
I need to index the metadata of the page together with the metadata of each file in order to search by file.
To accomplish this I'm using org.craftercms.cstudio.publishing.processor.SearchAttachmentWithExternalMetadataPostProcessor
The first file I attach is indexed correctly, but none of the other files are indexed with the metadata of the page.
This is the reference list I'm using:
<property name="referenceXpathList">
    <list>
        <value>//file/item/value</value>
    </list>
</property>
Even though my XPath expression should match every file/item/value node, I only get the first match.
SearchAttachmentWithExternalMetadataPostProcessor expects each XML document to have just one associated binary file. In most cases this makes sense, because the XML document contains metadata that is specific to that file. So if the XPath returns a list, the processor selects only the first match. You can always extend the processor so that the same XML metadata is associated with several files.
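The difference between using the first XPath match and using all of them can be sketched with Python's ElementTree. This is not Crafter code: the page XML is a hypothetical shape for the repeating group in the question, and an extended processor would apply the same idea (loop over every match and attach the page metadata to each file) in Crafter's own Java classes.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Hypothetical page XML: a repeating group "files" whose items hold a node-selector "file".
PAGE_XML = """
<page>
  <title>Annual reports</title>
  <files>
    <item><file><item><value>/static-assets/docs/2015.pdf</value></item></file></item>
    <item><file><item><value>/static-assets/docs/2016.pdf</value></item></file></item>
  </files>
</page>
"""

# ElementTree supports only a subset of XPath and rejects absolute paths on an
# element, so "//file/item/value" is written as ".//file/item/value" here.
XPATH = ".//file/item/value"


def first_reference(root: ET.Element) -> Optional[str]:
    """Behaviour described in the answer: only the first match is used."""
    node = root.find(XPATH)
    return None if node is None else node.text


def all_references(root: ET.Element) -> List[str]:
    """Extended behaviour: every match is used."""
    return [node.text for node in root.iterfind(XPATH)]


def metadata_per_file(xml_text: str) -> List[Dict[str, str]]:
    """Pair the same page metadata with each referenced file, one record per file."""
    root = ET.fromstring(xml_text.strip())
    page_meta = {"title": root.findtext("title")}
    return [dict(page_meta, file=path) for path in all_references(root)]
```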

How to prevent saving same named file on CouchDb?

I am using CouchDB with Divan, a C# client library for CouchDB.
The same file can be uploaded to CouchDB many times. Each time the file is uploaded it gets a new "id", but the "rev" stays the same.
This happens even when all the custom attributes defined for the file being uploaded are the same as those of an existing file with the same name on CouchDB.
Is there any way to avoid uploading a file with the same name if all custom attributes are the same? Fetching all files and checking them for repeated file names would be one way, but it is definitely not preferable because of the time it takes, depending on other factors.
Thanking you.
Let's say a file has 3 attributes:
name
size in bytes
Date of modification
I see two main possibilities to avoid duplicates in your database.
Client approach
Use a view to query the database for a document with the same attributes. If none exists, create it.
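The client approach is a check-then-create step. A minimal sketch, assuming a view keyed by [name, size, lastModified]; InMemoryDb is a hypothetical stand-in for the database and its view (it is not the Divan API), and the generated ids only imitate server-assigned ones.

```python
from typing import Dict, Optional, Tuple

# Key emitted by a hypothetical view: [name, size, lastModified].
Key = Tuple[str, str, str]


class InMemoryDb:
    """Stand-in for a CouchDB database plus one view indexed by file attributes."""

    def __init__(self) -> None:
        self.docs: Dict[str, dict] = {}
        self.view: Dict[Key, str] = {}  # plays the role of the view index
        self._counter = 0

    def query_view(self, key: Key) -> Optional[str]:
        return self.view.get(key)

    def create(self, doc: dict) -> str:
        self._counter += 1
        doc_id = f"doc-{self._counter}"  # imitates a server-generated id
        self.docs[doc_id] = doc
        self.view[(doc["name"], doc["size"], doc["lastModified"])] = doc_id
        return doc_id


def upload_if_new(db: InMemoryDb, doc: dict) -> Tuple[str, bool]:
    """Check the view first; create the document only if no match exists."""
    key = (doc["name"], doc["size"], doc["lastModified"])
    existing = db.query_view(key)
    if existing is not None:
        return existing, False
    return db.create(doc), True
```

The check and the create are two separate requests, so two clients uploading at the same moment could both pass the check; a user-defined id avoids this because the id itself is the uniqueness constraint.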
User defined id
You could generate an id from the attributes as this library is doing.
For example, if my document has these attributes:
"name":"test.txt",
"size":"512",
"lastModified":"2016-11-08T15:44:29.563Z"
You could build a unique id like this:
"_id":"test.txt/2016-11-08T15:44:29.563Z/512"

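The user-defined id approach can be sketched as follows. A plain dict stands in for the database (this is not the Divan API); creating a document whose _id already exists is modelled as the 409 Conflict that CouchDB returns for a PUT without a _rev. Because the id contains "/", it is percent-encoded before being used as a single URL path segment.

```python
from typing import Dict
from urllib.parse import quote


class Conflict(Exception):
    """Stand-in for the 409 Conflict CouchDB returns when the _id is already taken."""


def attribute_id(name: str, size: str, last_modified: str) -> str:
    """Build the document id from the attributes, in the name/lastModified/size order shown."""
    return f"{name}/{last_modified}/{size}"


def put_new(store: Dict[str, dict], doc: dict) -> None:
    """Simulate creating a document with PUT and no _rev."""
    if doc["_id"] in store:
        raise Conflict(doc["_id"])
    store[doc["_id"]] = doc


def id_for_url(doc_id: str) -> str:
    """Percent-encode the id (including '/') so it stays one URL path segment."""
    return quote(doc_id, safe="")
```

A second upload with identical attributes produces the same id and is rejected by the database itself, so no client-side scan of existing files is needed.
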
Array of attachment type - how to get a filename for highlighted fragment?

I use ElasticSearch to index resources. I create a document for each indexed resource. Each resource can contain metadata and an array of binary files. I decided to handle these binary files with the attachment type. Metadata is mapped to simple fields of string type. Binary files are mapped to an array field of attachment type (a field named attachments). Everything works fine: I can find my resources based on the contents of the binary files.
Another ElasticSearch feature I use is highlighting. I managed to configure highlighting for both metadata and binary files, but...
When I ask for highlighted fragments of my attachments field, I only get fragments of these files without any information about the source of each fragment (there are many files in the attachments array field). I need a mapping between each highlighted fragment and its element of the attachments array, for instance the name of the file or at least its index in the array.
What I get:
"attachments" => ["Fragment <em>number</em> one", "Fragment <em>number</em> two"]
What I need:
"attachments" => [("file_one.pdf", "Fragment <em>number</em> one"), ("file_two.pdf", "Fragment <em>number</em> two")]
Without such a mapping, the user of the application knows that a particular resource contains files with the keyword but has no indication of the file name.
Is it possible to achieve what I need using ElasticSearch? How?
Thanks in advance.
So what you want here is to store the filename.
Did you send the filename in your JSON document? Something like:
{
    "my_attachment" : {
        "_content_type" : "application/pdf",
        "_name" : "resource/name/of/my.pdf",
        "content" : "... base64 encoded attachment ..."
    }
}
If so, you can probably request the field my_attachment._name.
If that's not the right answer, can you refine your question a little and provide a sample JSON document (without the base64 content) and your mapping, if any?
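For reference, a document in that shape could be assembled like this. The field names are the ones from the JSON example; the function name is hypothetical, and sending the document to the index is left out.

```python
import base64
import json


def attachment_doc(path: str, data: bytes, content_type: str = "application/pdf") -> str:
    """Build a JSON document shaped like the example, including the _name field."""
    doc = {
        "my_attachment": {
            "_content_type": content_type,
            "_name": path,
            "content": base64.b64encode(data).decode("ascii"),  # base64 as a JSON string
        }
    }
    return json.dumps(doc)
```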
UPDATE:
When the fragments come from an array of attachments, you can't tell which file each one comes from, because everything is flattened behind the scenes. If you really need that, you may want to have a look at nested fields instead.
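The flattening can be illustrated with a toy model in plain Python (no Elasticsearch involved): an array of objects ends up as one list of values per field, so a matching content value no longer knows which _name it was stored next to, whereas matching each element as its own unit, which is what nested fields do, keeps the pair together. In Elasticsearch itself this would mean mapping attachments as a nested type and querying it with a nested query; as far as I know, the inner_hits of such a query can carry their own highlight section, which is what gives per-file fragments.

```python
from typing import Dict, List, Tuple

# One resource with two attachments, reduced to the fields that matter here.
ATTACHMENTS = [
    {"_name": "file_one.pdf", "content": "Fragment number one"},
    {"_name": "file_two.pdf", "content": "Fragment number two"},
]


def flatten_object_array(objects: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Model of a plain object array: each field becomes one multi-valued list."""
    flat: Dict[str, List[str]] = {}
    for obj in objects:
        for field, value in obj.items():
            flat.setdefault(field, []).append(value)
    return flat


def mark(text: str, term: str) -> str:
    """Wrap every occurrence of term in <em> tags."""
    return text.replace(term, f"<em>{term}</em>")


def flattened_highlights(objects: List[Dict[str, str]], term: str) -> List[str]:
    """Only the matching content values come back; nothing ties them to a _name."""
    content = flatten_object_array(objects)["content"]
    return [mark(value, term) for value in content if term in value]


def nested_highlights(objects: List[Dict[str, str]], term: str) -> List[Tuple[str, str]]:
    """Each element is matched on its own, so its _name stays attached to the fragment."""
    return [(o["_name"], mark(o["content"], term)) for o in objects if term in o["content"]]
```

Searching for "number" with the flattened model reproduces the "What I get" output from the question, while the nested model produces the "What I need" pairs.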

Custom Metadatafield in Document and Web content in Liferay

I want a metadata field that gets its values from database records. This metadata field should be added to documents.
Can anyone suggest a solution for this requirement?
I presume you are using Liferay 6.1.
Web Content Structures
As for Web Content, you could programmatically create a JournalStructure (see JournalStructureLocalServiceUtil) and populate the list of possible values for your structure field with values coming from the database. You can put this "import code" inside a batch job, so that your structure field and the values in the external database are always in sync.
Document Metadata
Doing this with Metadata Sets is probably more interesting: in Liferay 6.1 they are used not only by Dynamic Data Lists and Documents & Media, and as of 6.2 Web Content structures will use the same metadata API instead of the old Journal API.
To implement this, check out the xsd column of the DDMStructure table. It has more or less the same format as the XML of a JournalStructure, but more options are available. Use DDMStructureLocalServiceUtil#addStructure to add such a new structure. Again, run this inside a batch job so you always have the latest external DB values.
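The core step of such a batch job, reading the current values from the external database and turning them into the option list of a select field, might look like this. It is a sketch under several assumptions: sqlite3 stands in for the external database, the department table is hypothetical, and the XML shape is only an approximation of the DDMStructure xsd format (element and attribute names should be checked against a structure exported from your portal). The resulting string would then be handed to DDMStructureLocalServiceUtil from the Java side.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import List, Tuple


def fetch_options(conn: sqlite3.Connection) -> List[Tuple[str, str]]:
    """Read (value, label) pairs from a hypothetical external table."""
    return conn.execute("SELECT code, label FROM department ORDER BY code").fetchall()


def build_select_xsd(field: str, options: List[Tuple[str, str]], locale: str = "en_US") -> str:
    """Build an approximate DDM-style XML definition for a select field.

    Element and attribute names are an assumption modelled loosely on the
    DDMStructure xsd format; compare with a structure exported from your portal.
    """
    root = ET.Element("root", {"available-locales": locale, "default-locale": locale})
    select = ET.SubElement(root, "dynamic-element",
                           {"name": field, "type": "select", "dataType": "string"})
    for value, label in options:
        option = ET.SubElement(select, "dynamic-element",
                               {"name": f"option_{value}", "type": "option", "value": value})
        meta = ET.SubElement(option, "meta-data", {"locale": locale})
        entry = ET.SubElement(meta, "entry", {"name": "label"})
        entry.text = label
    return ET.tostring(root, encoding="unicode")
```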

Redefine folder structure of document library with metadata

I have a problem with my SharePoint document library structure. Currently the document library consists of a folder/sub-folder structure that stores documents by category. Now our client wants to replace this folder structure with a metadata structure.
Can anyone tell me how I can use metadata instead of the folder/sub-folder structure?
Any related articles or links would be appreciated.
Thanks
Sachin
As already stated, you need to use columns for the metadata, preferably through a new Content Type. After creating this Content Type, you need to attach it to the library and convert all documents to it. Lastly, you also need to modify the views of the library, e.g. depending on your metadata you might only want to display certain columns or filter them.
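Before converting the documents, the values for the new columns can often be derived from the folder each document currently sits in. A small sketch of that mapping, assuming one column per folder level; the library path and the column names Category and SubCategory are hypothetical, and writing the values back into SharePoint is not shown.

```python
from pathlib import PurePosixPath
from typing import Dict, List


def metadata_from_folder_path(file_path: str, library_root: str,
                              columns: List[str]) -> Dict[str, str]:
    """Map each folder level below the library root to one metadata column.

    Folders deeper than the number of columns are ignored; a file stored
    directly in the library root gets no values.
    """
    relative = PurePosixPath(file_path).relative_to(library_root)
    folders = relative.parts[:-1]  # drop the file name itself
    return dict(zip(columns, folders))
```

For example, "/sites/docs/Shared Documents/Finance/Invoices/inv-001.pdf" under the root "/sites/docs/Shared Documents" maps to Category "Finance" and SubCategory "Invoices".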
There is an excellent whitepaper from Microsoft on Content Types available here:
http://technet.microsoft.com/en-us/library/cc262729.aspx
You can also read more about content type planning on Technet:
http://technet.microsoft.com/en-us/library/cc262735.aspx
And here's some info about Views:
http://office.microsoft.com/en-us/sharepointtechnology/HA100215771033.aspx
You must define columns for the metadata fields you want to have, create a content type that includes these columns, and assign this content type to your documents.
You might also change the default view of your document library, or create a new view, to make the new metadata columns visible.
