Stormcrawler: Storing Outlinks of crawled Pages in Elasticsearch

I would like to store the outlink array (URLs, anchors) in the content index of Elasticsearch. Can I just add a new line to indexer.md.mapping for this, or is it necessary to create a new ParseFilter to store the outlinks of crawled pages?

That's a great question, thanks!
The indexer generates the fields for ES from the metadata. To store the outlinks in the content index, you'll need to create a custom ParseFilter that turns the outlinks from the ParseResult into key/value pairs in the metadata, then configure indexer.md.mapping accordingly.
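A minimal sketch of the transformation such a ParseFilter would perform, assuming the outlinks are available as (URL, anchor) pairs. In a real filter the list would come from the ParseResult and the map would be the page's metadata; here the Outlink class, the map type and the key names outlinks.url / outlinks.anchor are hypothetical stand-ins so the snippet runs without StormCrawler on the classpath.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Core of a hypothetical outlinks ParseFilter: copy each outlink's URL and
 * anchor into multi-valued metadata so the indexer can map them to ES fields.
 * Outlink and the metadata map are simplified stand-ins, not StormCrawler classes.
 */
public class OutlinksToMetadata {

    /** Hypothetical stand-in for an outlink: target URL plus anchor text. */
    public static final class Outlink {
        final String targetURL;
        final String anchor;

        public Outlink(String targetURL, String anchor) {
            this.targetURL = targetURL;
            this.anchor = anchor;
        }
    }

    // Hypothetical metadata keys; pick your own and reference them in indexer.md.mapping.
    public static final String URL_KEY = "outlinks.url";
    public static final String ANCHOR_KEY = "outlinks.anchor";

    /**
     * Appends one value per outlink under URL_KEY and ANCHOR_KEY. The two lists
     * stay aligned by position; a missing anchor is stored as an empty string.
     */
    public static void addOutlinks(List<Outlink> outlinks, Map<String, List<String>> metadata) {
        List<String> urls = metadata.computeIfAbsent(URL_KEY, k -> new ArrayList<>());
        List<String> anchors = metadata.computeIfAbsent(ANCHOR_KEY, k -> new ArrayList<>());
        for (Outlink link : outlinks) {
            urls.add(link.targetURL);
            anchors.add(link.anchor == null ? "" : link.anchor);
        }
    }

    public static void main(String[] args) {
        List<Outlink> outlinks = List.of(
            new Outlink("https://example.com/a", "Page A"),
            new Outlink("https://example.com/b", null));
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        addOutlinks(outlinks, metadata);
        System.out.println(metadata);
    }
}
```

Keeping URLs and anchors as two parallel multi-valued keys (rather than one concatenated string) means each can be mapped to its own ES field, e.g. with entries of the form `outlinks.url=outlinks` in the indexer.md.mapping list, assuming your configuration uses the usual `key=field` mapping style.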

Related

Is there any API in Liferay that can be used to create the Journal Article XML Content?

We have a custom Journal Article web content structure and are currently using hard-coded XML content to populate the structure with values. Is there any API in Liferay that we can use to create the XML content for a Journal Article?
I guess you are looking for this guy: com.liferay.dynamic.data.mapping.util.FieldsToDDMFormValuesConverter
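// Build the form values for the structure (further arguments omitted)
DDMFormValues ddmFormValues = DDMUtil.getDDMFormValues(ddmStructure.getStructureId(), ...);
// Convert the form values into a Fields object
Fields fields = DDMUtil.getFields(ddmStructure.getStructureId(), ddmFormValues);
// Serialize the fields into the Journal Article XML content via a JournalConverter instance
String content = _journalConverter.getContent(ddmStructure, fields);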

GSA: index the e-commerce sites and display the results in sort by price?

We want to index a few public e-commerce sites. When our customers search for a product, the results should be sorted by price across all indexed e-commerce sites.
From my understanding, the public e-commerce sites use different meta tags for pricing, so I cannot consolidate them into one meta tag.
Is it possible to feed the data via XML? I don't have much idea how to achieve that, and we don't have database access to extract only the required data.
How can I use entity recognition to index the price as a meta tag?
Could you please advise whether this is achievable? If so, which is the best solution, and which documentation should we refer to?
Ignoring the sorting issue and concentrating only on the problem of normalising the price metadata: you need a way to read the price from whichever metadata field it is in and create a new metadata field with a common name and the same value.
There are a few ways to do this but the simplest are probably:
Generate a Meta-and-url feed for each document and add in the normalised metadata
Crawl via a proxy that can add an X-GSA-External-Metadata header containing the normalised metadata
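A minimal sketch of the first option, assuming you already have each page's raw meta tags (for example from your own fetcher). It looks the price up under a list of site-specific tag names and writes one metadata-and-url feed record per page with a single normalised `price` meta. The tag names, the `price` field name and the `web` datasource are assumptions; the element names follow the GSA feed format as I recall it, so check them against the Feeds Protocol guide for your appliance version.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PriceFeedSketch {

    // Hypothetical per-site meta tag names that carry a price, checked in order.
    public static final List<String> PRICE_TAG_NAMES =
        List.of("price", "product:price:amount", "og:price:amount", "itemprice");

    /** Returns the trimmed value of the first known price tag present, or null if none. */
    public static String normalisedPrice(Map<String, String> metaTags) {
        for (String name : PRICE_TAG_NAMES) {
            String value = metaTags.get(name);
            if (value != null && !value.trim().isEmpty()) {
                return value.trim();
            }
        }
        return null;
    }

    /** Minimal XML escaping for attribute and text values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Builds a metadata-and-url feed with one record per page that has a recognisable price. */
    public static String buildFeed(String datasource, Map<String, Map<String, String>> pages) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n")
           .append("<!DOCTYPE gsafeed PUBLIC \"-//Google//DTD GSA Feeds//EN\" \"\">\n")
           .append("<gsafeed>\n  <header>\n")
           .append("    <datasource>").append(escape(datasource)).append("</datasource>\n")
           .append("    <feedtype>metadata-and-url</feedtype>\n  </header>\n  <group>\n");
        for (Map.Entry<String, Map<String, String>> page : pages.entrySet()) {
            String price = normalisedPrice(page.getValue());
            if (price == null) {
                continue; // skip pages where no known price tag was found
            }
            xml.append("    <record url=\"").append(escape(page.getKey()))
               .append("\" mimetype=\"text/html\">\n")
               .append("      <metadata>\n")
               .append("        <meta name=\"price\" content=\"").append(escape(price)).append("\"/>\n")
               .append("      </metadata>\n    </record>\n");
        }
        xml.append("  </group>\n</gsafeed>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> pages = new LinkedHashMap<>();
        pages.put("http://shop-a.example/item/1", Map.of("product:price:amount", "19.99"));
        pages.put("http://shop-b.example/p?id=7&lang=en", Map.of("itemprice", " 24.50 "));
        pages.put("http://shop-c.example/about", Map.of("description", "no price here"));
        System.out.println(buildFeed("web", pages));
    }
}
```

The sketch copies the price text as-is; if the sites use different currencies or number formats, those would still need converting to a common form before the values are useful for sorting.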

How do we eager fetch content fields from the Orchard CMS Queries user interface

When using the Orchard UI to create queries, how do we eager fetch fields? For instance, we have a content type that has an input field, link field, media library picker field, date time field, and text field. None of these fields are terribly complex. How do we eager load fields from the UI? I've looked at the Eager Load filter, and based on intuition, have considered eager loading the FieldIndexPartRecord.
You don't need to: fields are stored in the Infoset, which is an XML data blob stored on the content item's record, so they are always eager-fetched. Do you have profiling data indicating otherwise?

Custom Metadatafield in Document and Web content in Liferay

I want a metadata field that gets its values from a database record. This metadata field should be added to a document.
Can anyone suggest a solution for this requirement?
I presume you are using Liferay 6.1.
Web Content Structures
As for Web Content, you could programmatically create a JournalStructure (see JournalStructureLocalServiceUtil) and populate the list of possible values for your structure field with values coming out of the database. You can put this "import code" inside a batch job, so that your structure field and the values in the external database are always in sync.
Document Metadata
Doing this with Metadata Sets is probably more interesting: in Liferay 6.1 they are used not only by Dynamic Data Lists but also by Documents & Media, and as of 6.2, Web Content structures will use the same metadata API instead of the old Journal API.
To implement this, check out the xsd column of the DDMStructure table. It has more or less the same format as the XML for a JournalStructure, but more options are available. Use DDMStructureLocalServiceUtil#addStructure to add such a new structure. Again, run this inside a batch job so you always have the latest external DB values.
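A minimal sketch of generating that structure XML from database values, assuming the values have already been read (e.g. via JDBC) into a List. The element and attribute names (`dynamic-element` with `type="select"` / `type="option"`, `meta-data`, `entry name="label"`) approximate the Liferay 6.1 DDM xsd layout and are assumptions: compare them with the xsd column of a structure you created through the UI before relying on them. The resulting string is what you would hand to DDMStructureLocalServiceUtil#addStructure; that call is not shown here.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class SelectFieldXsdSketch {

    /**
     * Builds structure XML with one select field whose options are the given values.
     * Element and attribute names approximate a Liferay 6.1 DDM xsd (assumption).
     */
    public static String buildXsd(String fieldName, String locale, List<String> dbValues) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            root.setAttribute("available-locales", locale);
            root.setAttribute("default-locale", locale);
            doc.appendChild(root);

            // The select field itself.
            Element select = doc.createElement("dynamic-element");
            select.setAttribute("dataType", "string");
            select.setAttribute("name", fieldName);
            select.setAttribute("type", "select");
            root.appendChild(select);

            // One option per database value; the value doubles as its label.
            for (int i = 0; i < dbValues.size(); i++) {
                String value = dbValues.get(i);
                Element option = doc.createElement("dynamic-element");
                option.setAttribute("name", fieldName + "_option" + i);
                option.setAttribute("type", "option");
                option.setAttribute("value", value);

                Element metaData = doc.createElement("meta-data");
                metaData.setAttribute("locale", locale);
                Element label = doc.createElement("entry");
                label.setAttribute("name", "label");
                label.appendChild(doc.createCDATASection(value));
                metaData.appendChild(label);
                option.appendChild(metaData);
                select.appendChild(option);
            }

            // Serialize the DOM; the serializer escapes attribute values such as "&".
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build structure XML", e);
        }
    }

    public static void main(String[] args) {
        // In a batch job these values would come from the external database.
        System.out.println(buildXsd("category", "en_US", List.of("Red", "Green & Blue")));
    }
}
```

Building the XML through DOM rather than string concatenation keeps database values containing `&`, `<` or quotes from producing a malformed structure.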

ows_editor missing in Metadata Property Mappings in SharePoint 2010

I'm trying to map a managed property to the Modified By field in a document library. I can see that the internal name of the field is Editor, but I can't find ows_editor in the crawled properties.
Is there a way of adding it to the crawled properties, and how?
Upload any document to the library where the mapped column exists.
Then perform a full crawl and you will be able to see the required crawled properties.
