Issue in retrieving Metatags - Nutch 2.3 version - meta-tags

I'm using the Nutch 2.3 source release (Nutch2.3-src). I am able to crawl the web pages, but only the description is extracted, not other metatags such as LastModified or Author.
I have updated the index.metadata and metatags.names properties, but still no luck: the value is always null.
<property>
<name>metatags.names</name>
<value>*</value>
<description>Names of the metatags to extract, separated by ','.
Use '*' to extract all metatags. Prefixes the names with 'meta_' in
the parse-metadata. For instance, to index description and keywords,
you need to activate the plugins parse-metadata and index-metadata
and set the value of the properties 'metatags.names' and
'index.metadata' to 'description,keywords'.
</description>
</property>
<property>
<name>index.metadata</name>
<value>description,LastModified,Created,WCMCategories,WCMKeywords,Authors,SiteName,title,lastmodified,created,wcmcategories,wcmkeywords,authors,sitename,meta_description,meta_LastModified,meta_Created,meta_WCMCategories,meta_WCMKeywords,meta_Authors,meta_SiteName,meta_title,meta_lastmodified,meta_created,meta_wcmcategories,meta_wcmkeywords,meta_authors,meta_sitename</value>
<description>
Comma-separated list of keys to be taken from the metadata to generate fields.
Can be used e.g. for 'description' or 'keywords' provided that these values are generated
by a parser (see parse-metatags plugin), and property 'metatags.names'.
</description>
</property>

Resolved this issue. Metatag names are case-sensitive: the name must match exactly between the web page and nutch-site.xml.
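To make the case-sensitivity point concrete, here is a minimal sketch of an exact-match lookup. It is not Nutch code: the dictionary, the function name `pick_fields`, and the sample values are all hypothetical and only illustrate why a configured name with different casing comes back as null.

```python
# Hypothetical illustration only; this is not how Nutch is implemented.
# Keys are spelled exactly as they might appear in the page's <meta> tags.
parsed_metatags = {
    "description": "Sample page",
    "LastModified": "2015-03-01",
    "Author": "Jane",
}


def pick_fields(metadata, wanted_names):
    """Return wanted_name -> value using exact, case-sensitive key matching."""
    return {name: metadata.get(name) for name in wanted_names}


# Configured names whose case differs from the page yield None (null).
mismatched = pick_fields(parsed_metatags, ["description", "lastmodified", "author"])

# Configured names that match the page's spelling yield the values.
matched = pick_fields(parsed_metatags, ["description", "LastModified", "Author"])

print(mismatched)
print(matched)
```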

Related

What is the use of "hmcIndexField" in Hybris?

I noticed that some attributes contain within their definition a hmc custom property:
<custom-properties>
<property name="hmcIndexField">
<value>"thefield"</value>
</property>
</custom-properties>
Can someone explain why this custom property is needed and when it should be used?
It is a deprecated attribute for hmc search configuration. You can find all information here:
Lucene Search HMC Hybris4
In short, the page says that you can define customized searches in the hmc. With this property you define which attributes will be searchable with the LuceneSearch.

Crafter CMS Search Attachment With External Metadata Post Processor in repeating Group

I have the following data model.
A Page type with a repeating group named files. Inside the repeating group, there is a node-selector, named file, for selecting files.
I then need to index the metadata of the page together with the metadata of each file so that I can search by file.
To accomplish this I'm using org.craftercms.cstudio.publishing.processor.SearchAttachmentWithExternalMetadataPostProcessor
The first document I attached works fine but any other File is not being indexed with the metadata of the page.
This is the Reference list I'm using
<property name="referenceXpathList">
<list>
<value>//file/item/value</value>
</list>
</property>
Even though my XPath expression should match all file.item.value I'm just getting the first match.
SearchAttachmentWithExternalMetadataPostProcessor expects each XML document to just have one associated binary file. In most cases it makes sense because the XML document contains metadata that's just specific to that file. So if the XPath returns a list it will select the first one. You can always extend the processor and make it so that the same XML metadata is associated to different files.
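The difference between taking the first XPath match and taking all of them can be sketched outside Crafter. The snippet below uses Python's standard `xml.etree.ElementTree`, not the Crafter processor; the descriptor layout and the file paths are hypothetical and only mirror the `//file/item/value` expression from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical page descriptor: a repeating group "files" whose items each
# hold a "file" node-selector. The paths are made up for illustration.
PAGE_XML = """
<page>
  <files>
    <item><file><item><value>/static-assets/a.pdf</value></item></file></item>
    <item><file><item><value>/static-assets/b.pdf</value></item></file></item>
  </files>
</page>
"""

root = ET.fromstring(PAGE_XML)

# ElementTree spells the document-wide search as ".//" instead of "//".
xpath = ".//file/item/value"

# Taking only the first match mirrors the behavior described above.
first_match = root.find(xpath).text

# An extended processor would instead iterate over every match.
all_matches = [node.text for node in root.findall(xpath)]

print(first_match)
print(all_matches)
```

An extension of the processor would presumably loop over all matches in the same way and attach the page metadata to each file, but the actual extension points depend on the Crafter version in use.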

BDC model/search connector and multi value field with refinement

BDC model:
My BDC model's entity has a property named Color.
The TypeName is specified as System.String[].
<TypeDescriptor Name="Color" TypeName="System.String[]">
<Properties>
<Property Name="RequiredInForms" Type="System.Boolean">false</Property>
</Properties>
</TypeDescriptor>
Database:
In my database (my BDC content source) I added column values like this one:
;#Blue;#Green;#Yellow;#
Search Schema
I created a new managed property and enabled multiple values (and also refinable - active, queryable, retrievable, safe).
Search Results
Filtering on a specific color via search works.
Example: RsExpAdvWorksProductColor:"blue"
Search Refinement
However I cannot refine on colors.
Adding a refiner on my Managed Property shows up like this:
Color
;#Blue;#Green;#Yellow;#
;#Green;#Yellow;#
;#Red;#Green;#Yellow;#Blue;#Black;#Cyan;#
Obviously the single values are not treated as such: the whole string of ";#"-separated values is shown as one refinement criterion.
Any hints?
Update 2015-03-20: I took a closer look at the built-in multi choice columns. In search results they are returned as "Value1;#Value2;#" and so on. Basically there is a trailing separator (Red;#Blue;#) and no leading one (;#Red;#Blue;#). Much to my regret, changing my data to that format didn't solve my problem.
Update 2015-03-20: Surprise surprise. It is in fact "working as designed" (like so many things in SharePoint :P). What I am looking for has to be dealt with separately. It behaves exactly the same with built-in multi choice fields so there is nothing wrong with my BDC/Search integration.
Regarding the refiner, have a look at the following links...
http://www.eliostruyf.com/part-6-create-multi-value-search-refiner-control/
https://hyankov.wordpress.com/2014/12/15/sharepoint-2013-refiner-multi-value-contains-instead-of-an-equals/
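Whatever refiner control is used, the core step is splitting each ";#"-delimited string into individual values before counting them. The following is a minimal sketch of that step only; it is not SharePoint code, the function name `split_multi_value` is hypothetical, and the input strings are the refiner entries from the question.

```python
from collections import Counter


def split_multi_value(raw, delimiter=";#"):
    """Split a ';#'-delimited string, dropping the empty leading/trailing parts."""
    return [part for part in raw.split(delimiter) if part]


# Refiner entries exactly as they appear in the question.
rows = [
    ";#Blue;#Green;#Yellow;#",
    ";#Green;#Yellow;#",
    ";#Red;#Green;#Yellow;#Blue;#Black;#Cyan;#",
]

# Count each color once per row it occurs in, as a refiner would.
counts = Counter(value for row in rows for value in split_multi_value(row))

for color, count in sorted(counts.items()):
    print(color, count)
```

The same function handles the built-in format with only a trailing separator, since empty parts are discarded either way.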

Making subtags dependent on an attribute of a parent in XML Schema

I have created an XML file like the following
<monitor>
<widget name="Widgets/TestWidget1">
<state code="VIC" />
<state code="TAS" />
</widget>
<widget name="Widgets/TestWidget2">
<client code="someclient" />
</widget>
</monitor>
The name attribute of the <widget> tag tells the parser what widget to load (they are asp.net user controls).
I am trying to create a schema file for the above. The problem is that the subtags supported inside <widget> depend on the name attribute: TestWidget1 supports the <state> tag and TestWidget2 supports the <client> tag.
Currently my XML Schema file just displays all possible <widget> subtags regardless of whether they are supported or not.
How can I write an XML schema file that will only allow specific subtags based on the name attribute? If this is not possible, what options do I have?
You have several options. The simplest and most direct is to re-think your problem a bit. If the legal content of element E1 and the legal content of element E2 are different, then the simplest design is to call them different things, because in XSD as in DTDs the legal content of an element depends on the element type name. A devil's advocate would ask you "if you want different kinds of widget to obey different rules, why are you telling the validator that they are the same kind of widget? Tell the validator the truth, by giving them different names. So don't call them <widget name="Widgets/TestWidget1"> and so on, call them <TestWidget1> and <TestWidget2>."
In XSD 1.1 you can also use conditional type assignment or assertions to define constraints on the legal combinations of attributes and children, but not every schema-aware editor is going to have the chops necessary to analyse the conditional type assignment rules and attributes and understand what to prompt you with.
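If neither renaming the elements nor XSD 1.1 is available, the rule can also be checked in application code after parsing. This is a different technique from the two schema-based options above, shown here only as a sketch: the `ALLOWED_CHILDREN` table and the function name are hypothetical, and the widget names come from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: which child tags each widget name may contain.
ALLOWED_CHILDREN = {
    "Widgets/TestWidget1": {"state"},
    "Widgets/TestWidget2": {"client"},
}


def find_invalid_children(xml_text):
    """Return (widget name, child tag) pairs that break the rule table."""
    root = ET.fromstring(xml_text)
    problems = []
    for widget in root.iter("widget"):
        name = widget.get("name")
        # Unknown widget names are treated as allowing no children.
        allowed = ALLOWED_CHILDREN.get(name, set())
        for child in widget:
            if child.tag not in allowed:
                problems.append((name, child.tag))
    return problems


VALID = """
<monitor>
  <widget name="Widgets/TestWidget1"><state code="VIC" /><state code="TAS" /></widget>
  <widget name="Widgets/TestWidget2"><client code="someclient" /></widget>
</monitor>
"""

INVALID = """
<monitor>
  <widget name="Widgets/TestWidget1"><client code="someclient" /></widget>
</monitor>
"""

print(find_invalid_children(VALID))
print(find_invalid_children(INVALID))
```

A check like this runs only at load time, so it does not give editors the prompting that a schema would.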

XML Schema: How to validate an attribute with multiple keys concatenated?

Let's say I can get XML like this:
<Property Name="Title"/>
<Property Name="Content"/>
<Property Name="Address"/>
<Source properties="Title,Content,Address"/>
How could I validate the "properties" attribute of "Source" so that any combination of the "Property" items listed above is accepted? (For example, "Title" and "Title,Content" are correct, as is any other concatenation of those names, while "Title, URL" is not.)
You can't do that within XML Schema. You can do it with your own higher level of validation based on XSLT, XQuery or Schematron, for example.
xan is right: validating always means matching an XML file against a given schema. But no schema is involved here. Your problem is instead to read a data file and validate later entries against earlier ones (if the box above is supposed to represent one file), or one data file against another (if the gap is supposed to be a file separator). Beyond that, a schema defines the structure of elements and attributes and, optionally, data types (specific values only if there is a strict enumeration of valid values). That does not match here either: you want to verify data against data. Sorry, a schema is the wrong tool for this problem.
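The "higher level of validation" can be sketched in a few lines of application code. The example below uses Python's standard library rather than XSLT, XQuery or Schematron. The `<root>` wrapper element and the function name are hypothetical, and trimming whitespace around each name is an assumption rather than something the question specifies.

```python
import xml.etree.ElementTree as ET


def unknown_property_refs(xml_text):
    """Return names listed in Source/@properties that no Property/@Name declares."""
    root = ET.fromstring(xml_text)
    known = {prop.get("Name") for prop in root.iter("Property")}
    unknown = []
    for source in root.iter("Source"):
        for ref in source.get("properties", "").split(","):
            ref = ref.strip()  # assumption: whitespace around names is ignored
            if ref not in known:
                unknown.append(ref)
    return unknown


def make_doc(properties_value):
    # Hypothetical wrapper: the question's snippet has no single root element.
    return (
        "<root>"
        '<Property Name="Title"/>'
        '<Property Name="Content"/>'
        '<Property Name="Address"/>'
        f'<Source properties="{properties_value}"/>'
        "</root>"
    )


print(unknown_property_refs(make_doc("Title,Content,Address")))
print(unknown_property_refs(make_doc("Title, URL")))
```

An empty list means every referenced name was declared; any returned names are the ones to report as errors.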
