Implement Search Everything using Solr - search

How does a search-everything kind of application index and keep track of data in its search indexes?
Recently I have been working with Apache Solr, which is producing amazing search results. But that was for one particular section being searched, the product catalog. Since Solr stores its data as documents, we indexed the searchable fields as a document in Solr. I'm not sure how it can be used to build a search-everything kind of search, or how I should index the data into Solr.
By search everything I mean searching across different modules for information such as Customers, Services, Accounts, Orders, Catalog, Support Tickets, etc. The search returns combined results from a single search form, so the user doesn't need to go into a different form to search each module.
Do I need to build a different index for each such data model, or store them in Solr as a single document? What is the best strategy to implement this?

You can store all that data in a single index with each document having an extra field that stores its type (Customer, Order, etc.). For the within-module search, just restrict the search query to documents of that type. For the Search All functionality, use copyField to copy all the relevant fields in each document type into one big field, and search with the document type field unconstrained.
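
As a rough sketch of the query side of this approach, the following builds the request parameters for both modes. The field names `doc_type` (the type field) and `all_text` (the copyField destination) are assumptions for illustration only; `q`, `fq`, and `df` are standard Solr request parameters.

```python
from typing import Optional
from urllib.parse import urlencode


def build_search_params(text: str, doc_type: Optional[str] = None) -> dict:
    """Build Solr select parameters for a module search or a search-all query."""
    params = {
        "q": text,
        "df": "all_text",  # hypothetical catch-all field filled via copyField
    }
    if doc_type is not None:
        # Restrict to one module; leaving fq out searches every document type.
        params["fq"] = f'doc_type:"{doc_type}"'
    return params


# Search everything, then search only within the Order module.
print(urlencode(build_search_params("acme")))
print(urlencode(build_search_params("acme", doc_type="Order")))
```

Keeping the type restriction in a filter query rather than in `q` means the same search text can be reused unchanged for both the per-module and the search-all forms.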


Smart search results behaviour of compound index of multiple page types

Can someone confirm the behaviour of the Smart search results webpart when using a Smart search filter on a particular field, when the index, and the expected results, are composed of multiple page types?
In my scenario I have two page types, one of which is always a child of the other; as a hypothetical example, take Folder and File types.
I've configured the index with the Pages type and the Standard analyzer to include all Folder and File types under the path /MyOS/% on the tree.
The search page includes the Smart search results webpart and a Smart search filter, a checkbox for the File's field FileIsHidden.
What I'm trying to ascertain is whether the results can include all folders that have a hidden field, as well as the files.
The client has a v8.2 license and now has a requirement similar to this scenario.
Thanks so much for any help in advance.
Firstly, what I would do is download the latest version of Luke, a Lucene index inspector that lets you run queries, inspect the data, etc.
Your search indexes are in App_Data/Modules/SmartSearch/[SearchName]. I am not sure whether Luke can query two indexes at the same time, but you can run the same query against both and see whether it filters out results one way or the other.
If you are trying to query where a field must have a value, and the other page type does not have the field, it is probably filtered out. What you need to do is use the Lucene syntax to say, so to speak, "(classname = 'cms.file' and fileonlyproperty = '' OR classname <> 'cms.file')".
You'll have to test, but say the class names are cms.file and cms.folder, and the property is FileIsHidden; I think the syntax would be:
+((FileIsHidden:(true) AND classname:('cms.file')) OR (NOT classname:('cms.file')))
But you'll have to test that.
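
The intended logic can be sanity-checked outside the index with a few sample records. This sketch only models the boolean condition from the answer; it does not call Kentico or Lucene, and the sample documents are made up.

```python
def matches(doc: dict) -> bool:
    """Keep hidden files, plus every document that is not a file."""
    is_file = doc.get("classname") == "cms.file"
    return (is_file and doc.get("FileIsHidden") is True) or not is_file


# Hypothetical records standing in for indexed pages.
docs = [
    {"classname": "cms.folder", "name": "Reports"},
    {"classname": "cms.file", "name": "a.txt", "FileIsHidden": True},
    {"classname": "cms.file", "name": "b.txt", "FileIsHidden": False},
]
print([d["name"] for d in docs if matches(d)])
```

Comparing a list like this with what Luke returns for the real query is one way to tell whether the index is filtering out the page type that lacks the field.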

Can data in Solr be extended with manually defined meta data?

I have several documents in a Solr collection that I want to be able to search through. Most of the data comes from web sites I can easily crawl; however, I need to add some attributes manually because they are not present on the crawled sites.
So as an example I get the following info from a site (all attributes returned from crawled site):
Name: Porsche Boxter
Year: 1996
I want to add additional fields through a web interface (info not present on crawled sites):
Cool: yes
foo: bar
My questions:
Does it make sense at all to store additional information alongside the indexed data within Solr (inside the documents), or would best practice be to keep only the crawled data in Solr and merge it with an externally managed database at query time? To me it makes more sense to have all the data that is eventually queried in Solr, as some of the manually added attributes are required search criteria (e.g. look only for cool cars from the 90s).
Is it possible to use Solr to store additional information about indexed documents? I know the entire schema in advance; perhaps this is useful?
If I store my data exclusively in Solr, how can I ensure that during the next crawl the manually added data is not overwritten? Would a partial update be required?
Since I am new to Solr, it would also be very helpful if someone could simply mention what to look for in the documentation that describes my use case.
That depends on how often the external data changes: the more often it changes, the less sense it makes. Generally it is a good idea to store such data alongside the indexed data, because you get it without an additional database query.
Yes. Use indexed:false and stored:true for fields you only need to retrieve; any attribute you want to search or filter on also needs indexed:true. If you do not know all such fields in advance, you could use a dynamicField like <dynamicField name="*_stored" type="string" indexed="false" stored="true" />.
Yes. You have to use partial updates. This is no problem in your case, because the fields that are not updated have stored:true.
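
A minimal sketch of what such a partial (atomic) update body could look like is shown below. It assumes the unique key field is called `id`, that the manual attributes use the `*_stored` dynamic field pattern, and that the document ID shown is hypothetical. The `set` modifier is part of Solr's atomic update syntax; the resulting JSON would be posted to the collection's update handler.

```python
import json


def build_atomic_update(doc_id: str, manual_fields: dict) -> str:
    """Return a JSON body that sets only the given fields on one document."""
    doc = {"id": doc_id}  # assumes the schema's unique key is "id"
    for name, value in manual_fields.items():
        # "set" replaces this field's value and leaves other fields untouched.
        doc[name] = {"set": value}
    return json.dumps([doc])


# Hypothetical document ID and field names, for illustration only.
body = build_atomic_update(
    "porsche-boxter-1996",
    {"cool_stored": "yes", "foo_stored": "bar"},
)
print(body)
```

Note that this only protects the manual fields if the crawler does not re-add the whole document on the next crawl; a full re-index of a document replaces it entirely, so the crawler would need to send its own fields the same way.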

Solr, managing entities

I have the following situation when using Solr. My documents contain "entities", for example "peanut butter", and I have a list of such entities. These are items that go together and are not to be treated as two individual words. During indexing, I want Solr to recognize this and treat "peanut butter" as one entity. For example, if someone searches for
peanut
then documents that have the word peanut should rank higher than documents that have "peanut butter". However, if someone searches for
"peanut butter"
then the documents that have peanut butter should show up higher than the ones that have just peanut. Is there a config setting that can be modified so that the entity list can be specified in a file and Solr handles the rest?
Configure that field to use a StrField type instead of a TextField. TextField is designed to handle tokenization and full-text search on textual content. StrField treats its contents as a single keyword, and so does not tokenize.
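
The difference between the two field types can be pictured with a small stand-alone model. This is not Solr code: the two functions below only approximate a whitespace-tokenized field and an untokenized one, and their names are made up for the illustration.

```python
def text_field_terms(value: str) -> list:
    """Roughly mimic a tokenized field: lowercase and split on whitespace."""
    return value.lower().split()


def str_field_terms(value: str) -> list:
    """Mimic an untokenized field: the whole value is a single term."""
    return [value]


value = "peanut butter"
print(text_field_terms(value))  # two separate terms
print(str_field_terms(value))   # one term covering the whole value
```

One consequence of the single-term behaviour is that a query for just peanut would not match an untokenized value of "peanut butter"; only the exact value matches.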

Invalid Magento Search result

When searching Magento with the fulltext search engine and the "like" method, it stores results in the catalogsearch_fulltext table, in the "data_index" field, in a format like the following, where each searchable attribute is separated with |:
3003|Enabled|None||Product name|1.99|yellow|0
Here it stores the SKU, status, tax class, product name, price, color, etc.
It stores all searchable attribute values.
Now the issue is that for a configurable product, it also stores the associated products' names, prices, and statuses in the same field, like:
3003|Enabled|Enabled|Enabled|Enabled|None|None|None|None|Product name|Product name|associated Product name1|associated Product name2|associated Product name3|1.99|2.00|2.99|3.99|yellow|black|yellow|green|0|0|0|0
So what happens is that if I search for any word from an associated product, the main configurable product is also listed, as it has the word in its "data_index" field.
I need some suggestions on how I can avoid associated products being included in data_index, so that I can get accurate search results.
We are looking into our search as well and it has been surprising to see the inefficiencies included in the fulltext table. We have some configurable products as well that have MANY variations and their population in the fulltext search is downright horrendous.
As for solutions, I can only offer my approach to fixing the problem (not complete yet, but in progress).
I am extending Magento to include an event listener on the product indexing process (because catalog search indexing is when the fulltext table is populated). Once that process occurs, my own module removes duplicate entries from the associated products and also adds additional search keyword terms populated from a CSV file.
This should increase search speed dramatically and also return more relevant search results, because as of now configurable products are getting "search bias" in the search results.
This is more of a comment than an answer, but it was too lengthy to fit in the comments and I thought it might be beneficial to you. Once I get my module working, if you would like, I can give you directions on how you could implement a similar module yourself.
Hope that helped (if only for moral support in Magento's search struggle).
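
The de-duplication step described above could look roughly like this. It is a language-neutral illustration written in Python, not Magento module code, and the function name is made up; it simply drops repeated values from a pipe-separated string while keeping the first occurrence of each.

```python
def dedupe_data_index(data_index: str) -> str:
    """Drop repeated values from a pipe-separated string, keeping first occurrences in order."""
    seen = set()
    kept = []
    for value in data_index.split("|"):
        if value not in seen:
            seen.add(value)
            kept.append(value)
    return "|".join(kept)


sample = "3003|Enabled|Enabled|None|None|Product name|Product name|1.99|yellow|yellow|0|0"
print(dedupe_data_index(sample))
```

This reduces the size of the field but keeps distinct values that come from associated products, so on its own it would not stop a configurable product from matching words that only appear in its associated products.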

How can I configure Sitecore search to retrieve custom values from the search index

I am using the AdvancedDatabaseCrawler as a base for my search page. I have configured it so that I can search for what I want and it is very fast. The problem is that as soon as you want to do anything with the search results that requires accessing field values, performance drops sharply.
The main search results part is fine: even if 1000 results are returned from the search, I am only showing 10 or 20 results per page, which means I only have to retrieve 10 or 20 items. However, in the sidebar I am listing various filtering options with the number of results associated with each filtering option (eBay style). In order to retrieve these filter options I perform a relationship search based on the search results. Since the search results only contain SkinnyItems, it has to call GetItem() on every single result to get the actual item in order to get the value that I'm filtering by. In other words, it will call Database.GetItem(id) 1000 times! Obviously that is not terribly efficient.
Am I missing something here? Is there any way to configure Sitecore search to retrieve custom values from the search index? If I can search for the values in the index why can't I also retrieve them? If I can't, how else can I process the results without getting each individual item from the database?
Here is an idea of the functionality that I’m after:
Klaus answered on SDN: use faceting with Apache Solr or similar.
I've currently resolved this by defining dynamic fields for every field that I will need to filter by or return in the search result collection. That way I can achieve the faceted searching that is required without needing to grab field values from the database. I'm assuming that by adding the dynamic fields we are taking a performance hit when rebuilding the index, but I can live with that.
In the future we'll probably look at utilizing a product like Apache Solr.
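
Once the filter values come back with each search hit, the sidebar counts can be computed without any per-item database lookups. The sketch below shows that counting step in isolation; it is not Sitecore code, and the hit dictionaries and the `brand_facet` field name are hypothetical stand-ins for values stored in the index.

```python
from collections import Counter


def facet_counts(results: list, field: str) -> Counter:
    """Count results per value of `field`, using values stored in the index."""
    return Counter(hit[field] for hit in results if field in hit)


# Made-up search hits; each dict stands in for the fields stored with one result.
hits = [
    {"id": "1", "brand_facet": "Acme"},
    {"id": "2", "brand_facet": "Acme"},
    {"id": "3", "brand_facet": "Globex"},
    {"id": "4"},  # no value for this field, so it is skipped
]
print(facet_counts(hits, "brand_facet"))
```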
